10 Installation and Configuration Guide
This training is tested on Ubuntu 16.04 LTS.
You will need to install or have access to the following software to complete this training.
The Anaconda Distribution of Python will give you access to the Python libraries, Python environment and Jupyter Notebooks.
- Install nb_conda using the conda install command so you can select your Python environments from Jupyter.
The PostgreSQL database is used as the datastore for this training.
- Install a PostgreSQL database.
- Install a database client such as pgAdmin so you can configure the database and write queries against it.
You will use Git to step through different versions of the training.
Some of the Unofficial Extensions for Jupyter may be helpful. In particular, the execution time extension is useful for time-consuming machine learning runs.
- Install the extensions following the instructions here. Using Anaconda for installation is preferred.
- You will need a GitHub account so you can access this project's code and contribute.
- Set up a .pgpass password file as described here or use an alternative connection method that allows you to avoid writing database credentials into your code.
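As a minimal sketch of the password-file approach, the snippet below formats one .pgpass entry and builds a connection URL that contains no password. The host, port, database name, and user shown are hypothetical placeholders, and the helper names pgpass_entry and connection_url are illustrative only, not part of this project.

```python
from urllib.parse import quote


def pgpass_entry(host, port, database, user, password):
    """Format one .pgpass line: hostname:port:database:username:password."""
    def escape(value):
        # Backslashes and colons inside a field must be backslash-escaped.
        return str(value).replace("\\", "\\\\").replace(":", "\\:")

    return ":".join(escape(v) for v in (host, port, database, user, password))


def connection_url(host, port, database, user):
    """Build a SQLAlchemy-style PostgreSQL URL that omits the password."""
    return "postgresql+psycopg2://{}@{}:{}/{}".format(
        quote(user, safe=""), host, port, quote(database, safe="")
    )


if __name__ == "__main__":
    # Hypothetical values for illustration; substitute your own settings.
    print(pgpass_entry("localhost", 5432, "training_db", "analyst", "s3cr:et"))
    print(connection_url("localhost", 5432, "training_db", "analyst"))
```

Passing such a URL to sqlalchemy.create_engine leaves the password lookup to the PostgreSQL client library, which reads ~/.pgpass when no password is supplied. On Linux the file must have 0600 permissions or it is ignored.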
A key principle of Guerrilla Analytics and of sound data science is reproducibility. To that end, you should be able to reproduce your coding environment (Python version and associated package versions).
The root of the project contains an Anaconda environment definition file, environment.yml. Execute the following command to create this environment on your machine:
conda env create -f environment.yml
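Once the environment exists, it can help to confirm which environment a Python session or Jupyter kernel is actually running in. The sketch below is one way to check, assuming the environment lives in a directory named after it (for example .../envs/proj001_lfb). The helper name environment_name is illustrative.

```python
import os
import sys


def environment_name(prefix):
    """Return the last path component of an interpreter prefix."""
    return os.path.basename(os.path.normpath(prefix))


if __name__ == "__main__":
    # sys.prefix is the root directory of the running interpreter's environment.
    print("Interpreter:", sys.executable)
    print("Environment:", environment_name(sys.prefix))
```

Running this in a notebook cell shows whether the kernel is using the project environment or a different one.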
The following commands were used to create the original environment and export it to a file.
- conda create --no-default-packages -n proj001_lfb python=3.5
  - Note that the environment is given the same name as the project ID. This is another zero-documentation convention that avoids confusion in a working environment where you may want many Anaconda environments.
  - Note that the environment is created as a bare-bones --no-default-packages environment to keep size and bloat to a minimum.
- source activate proj001_lfb
  - This activates the environment just created. On conda 4.4 and later, conda activate proj001_lfb does the same.
- conda install ipykernel
  - This allows Jupyter to see the environment you have defined. Note: if you have not activated the correct environment in Jupyter, you will probably see an error such as ImportError: No module named...
- conda install -y psycopg2 sqlalchemy pandas seaborn numpy matplotlib
  - This installs all packages that the project depends on.
- conda env export -f environment.yml
  - This exports the environment definition to a file so it can be re-imported by other project contributors.
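To confirm that the active environment provides the project's dependencies, a check along the following lines can be run from a Python session or notebook cell. It is a sketch only: the list mirrors the packages named in the install command in this guide, and the helper name missing_packages is illustrative.

```python
import importlib.util

# Import names of the packages from the conda install step in this guide.
REQUIRED = ["psycopg2", "sqlalchemy", "pandas", "seaborn", "numpy", "matplotlib"]


def missing_packages(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

A non-empty result usually means the wrong environment is active or the kernel selected in Jupyter is not the project environment.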