10 Installation and Configuration Guide

Guerrilla Analytics edited this page Apr 8, 2018 · 5 revisions

Pre-requisites

This training is tested on Ubuntu 16.04 LTS.

Installations

You will need to install or have access to the following software to complete this training.

Anaconda distribution of Python

The Anaconda distribution of Python provides the Python libraries, the conda environment manager and Jupyter Notebooks used in this training.

nb_conda extension for Jupyter

  • Install nb_conda with the conda install command (for example, conda install nb_conda) so that you can select your Python environments from within Jupyter.

PostgreSQL database

The PostgreSQL database is used as the datastore for this training.

  • Install a PostgreSQL database.
  • Install a database client such as pgAdmin so you can configure the database and write queries against it.

Git

You will use Git to step through different versions of the training.

Optional installations

NBExtensions for Jupyter

Some of the unofficial extensions for Jupyter may be helpful. In particular, the execution time extension is useful for timing long-running machine learning cells.

  • Install the extensions following the instructions here. Using Anaconda for installation is preferred.

Configuration

GitHub

  • You will need a GitHub account so that you can access this project's code and contribute to it.

PostgreSQL

  • Set up a .pgpass password file as described here or use an alternative connection method that allows you to avoid writing database credentials into your code.

Anaconda Python environment

Reproducibility is a key principle of Guerrilla Analytics and of sound data science. To that end, you should be able to reproduce your coding environment, meaning the Python version and the versions of all associated packages.

The root of the project contains an Anaconda environment definition file, environment.yml. Execute the following command to create this environment on your machine:

  • conda env create -f environment.yml
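
Once the environment has been created and activated, a quick check like the following can confirm that the interpreter sees the packages this guide lists for the project. This is only a sketch: the helper name missing_packages is made up for illustration, and the check tests whether each package can be located, not whether its version matches environment.yml.

```python
import importlib.util
import sys

# Packages named in this guide's conda install step.
PROJECT_PACKAGES = ["psycopg2", "sqlalchemy", "pandas", "seaborn", "numpy", "matplotlib"]


def missing_packages(names):
    """Return the top-level package names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python {}.{}".format(sys.version_info[0], sys.version_info[1]))
missing = missing_packages(PROJECT_PACKAGES)
if missing:
    print("Not importable here: " + ", ".join(missing))
else:
    print("All project packages are importable.")
```

If any package is reported as not importable, the most likely cause is that the script or notebook is running outside the project environment.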

Note on creating the Anaconda Python environment

The following commands were used to create the original environment and export it to a file.

  • conda create --no-default-packages -n proj001_lfb python=3.5
    • Note that the environment is given the same name as the project ID. This is another zero-documentation convention that avoids confusion when you work with many Anaconda environments.
    • Note that the environment is created as a bare-bones --no-default-packages environment to keep size and bloat to a minimum.
  • source activate proj001_lfb
    • This activates the environment just created. On conda 4.4 and later, conda activate proj001_lfb does the same.
  • conda install ipykernel
    • This allows Jupyter to see the environment you have defined. Note that if the correct environment is not selected in Jupyter, you will probably see an error of the form ImportError: No module named...
  • conda install -y psycopg2 sqlalchemy pandas seaborn numpy matplotlib
    • This installs all the packages that the project depends on. The -y flag skips the confirmation prompt.
  • conda env export -f environment.yml
    • This exports the environment definition to a file so that other project contributors can recreate the environment from it.
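
To check from inside a notebook which environment the kernel is actually using, a cell along these lines may help. It is a sketch under one assumption: that the environment's directory carries the environment name, which is conda's behaviour when an environment is created with -n. The helper names are hypothetical.

```python
import os
import sys


def env_name_from_prefix(prefix):
    """Return the last path component of an interpreter prefix."""
    return os.path.basename(os.path.normpath(prefix))


def kernel_uses_env(expected_name, prefix=None):
    """True if the interpreter prefix ends in the expected environment name."""
    prefix = sys.prefix if prefix is None else prefix
    return env_name_from_prefix(prefix) == expected_name


print("Interpreter:", sys.executable)
print("Environment prefix:", sys.prefix)
if not kernel_uses_env("proj001_lfb"):
    print("This kernel is not running in proj001_lfb; switch kernels in Jupyter.")
```

Running this before any imports makes it easier to tell a wrong-kernel problem apart from a genuinely missing package.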