Python Data Science Workspace

This repository contains a Workspace for doing Data Science in Python.

Table of Contents

Requirements

Installation and setup

How to set up the workspace for the first time

  1. Create a conda environment, if it does not already exist:

     conda create -n data_science python=3.5
    
  2. Activate the environment:

     eval $(make setup)
    
  3. Set up the workspace:

     make initialize
    
  4. Reactivate the environment:

     eval $(make setup)
    
  5. GPU support for Jupyter:

    On Linux computers with NVIDIA Optimus, the Jupyter kernel must be started through "optirun" to use GPU acceleration. To set this up, go to the following folder:

     cd ~/.local/share/jupyter/kernels/
    

    Then edit the file python3/kernel.json to add "optirun" as the first entry of the argv array (keep the Python path already present in your file):

     {
         "language": "python",
         "display_name": "Python 3",
         "argv": [
             "optirun",
             "/home/fabien/.conda/envs/data_science/bin/python",
             "-m",
             "ipykernel",
             "-f",
             "{connection_file}"
         ]
     }
    
  6. Load the submodules:

     git submodule init
     git submodule update
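
The kernel.json edit from step 5 can also be scripted. The following is a minimal sketch, assuming the kernel spec is valid JSON with an argv array as shown above; the function name add_optirun is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def add_optirun(kernel_json: Path) -> bool:
    """Prepend "optirun" to the argv array of a Jupyter kernel spec.

    Returns True if the file was changed, False if "optirun" was
    already the first entry.
    """
    spec = json.loads(kernel_json.read_text())
    argv = spec.get("argv", [])
    if argv and argv[0] == "optirun":
        return False  # nothing to do, avoid adding it twice
    spec["argv"] = ["optirun"] + argv
    # Write the spec back, keeping every other key untouched.
    kernel_json.write_text(json.dumps(spec, indent=4) + "\n")
    return True
```

For the folder used in step 5, the call would be add_optirun(Path.home() / ".local/share/jupyter/kernels/python3/kernel.json"). Running it a second time leaves the file unchanged.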
    

How to use the workspace

  1. Activate the environment (if not already activated in this session):

     eval $(make setup)
    
  2. Set the Spark environment variables:

     export SPARK_HOME=/opt/spark
     export PATH=$SPARK_HOME/bin:$PATH
    
  3. Start Jupyter Notebook with the Makefile's start target:

     make
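
When the workspace is launched from a script rather than an interactive shell, the Spark variables from step 2 can be prepared in Python. This is a small sketch, assuming Spark is installed under /opt/spark as above; the helper name spark_environment is hypothetical.

```python
import os


def spark_environment(spark_home="/opt/spark", base_env=None):
    """Return a copy of the environment with the Spark variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SPARK_HOME"] = spark_home
    spark_bin = os.path.join(spark_home, "bin")
    # Prepend Spark's bin directory, mirroring the export in step 2.
    env["PATH"] = spark_bin + os.pathsep + env.get("PATH", "")
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run, for example subprocess.run(["make"], env=spark_environment()), so the current process environment is left untouched.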
    

How to update the workspace (after an upstream update)

  1. Get the latest changes from upstream:

     git pull
    
  2. Activate the environment (if not already activated in this session):

     eval $(make setup)
    
  3. Update the dependencies:

     make update
    
  4. Reactivate the environment:

     eval $(make setup)
    
  5. Update the submodules:

     git submodule init
     git submodule update
    

How to upgrade the workspace (upgrading python packages)

  1. Activate the environment (if not already activated in this session):

     eval $(make setup)
    
  2. Upgrade the dependencies:

     make upgrade
    
  3. Reactivate the environment:

     eval $(make setup)
    

Facets

Facets is a tool for the visual exploration of datasets. It can be installed as follows:

jupyter nbextension install facets/facets-dist/ --user

Then Jupyter Notebook should be started with an additional command-line option:

--NotebookApp.iopub_data_rate_limit=10000000

The visualization can then be loaded as explained in the demo notebook.
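
As an illustration, the full command line with the raised data rate limit can be assembled as follows. This is a sketch that calls jupyter notebook directly; how the Makefile's target forwards options is not shown here, and the helper name notebook_command is hypothetical.

```python
import shlex


def notebook_command(rate_limit=10000000):
    """Build the Jupyter Notebook command line with a raised IOPub data rate limit."""
    return [
        "jupyter",
        "notebook",
        "--NotebookApp.iopub_data_rate_limit={}".format(rate_limit),
    ]


# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in notebook_command()))
```

The list form can be handed to subprocess.run as is, which avoids any shell quoting issues.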

Interesting notebook extensions

I recommend installing the following notebook extensions:

  • Code prettify
  • Codefolding
  • Collapsible Headings
  • contrib_nbextensions_help_item
  • Execute time
  • Initialization cells
  • Nbextensions dashboard tab
  • Nbextensions edit menu item
  • Notify
  • Python Markdown
  • Runtools
  • ScrollDown
  • Skip-Traceback
  • spellchecker
  • table_beautifier
  • Table of Contents (2)
  • Tree Filter
  • VIM binding