This repo holds the contents developed for the tutorial, Exploratory Data Analysis in Python, presented at PyCon 2017 on May 17, 2017.
We suggest setting up your environment and testing it (as detailed below), then following along with the video of the tutorial found here.
As there was limited time for instruction, we also recommend pausing throughout and practicing some of the methods discussed as you go.
We welcome any PRs with other demonstrations of how you would perform EDA on the provided datasets.
Before the tutorial
Microsoft Azure option
If you would rather not set up your own environment, or you run into problems with the instructions below, you can work through the tutorial in Microsoft Azure Notebooks by creating an account and cloning the tutorial library found here (all of this is free, forever).
1. Clone this repo
Clone this repository locally on your laptop.
- Go to the green Clone or download button at the top of the repository page and copy the HTTPS link.
- From the command line run the command:
git clone https://github.com/cmawer/pycon-2017-eda-tutorial.git
2. Set up your Python environment
Install conda or miniconda
We recommend using conda to manage your Python environments. Specifically, we like miniconda, which is the most lightweight installation. You can install miniconda here. However, the full Anaconda distribution is good for beginners because it comes with many packages already installed.
Create your environment
Once conda is installed, you can create the environment needed for this tutorial by running the following commands from the command line in the setup/ directory of this repository:
conda update conda
conda env create -f environment.yml
This command will create a new environment named eda3.
Activate your environment
To activate the environment you can run this command from any directory:
source activate eda3 (Mac/Linux)
activate eda3 (Windows)
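To confirm that the activation took effect, a quick check from Python can help. The sketch below assumes conda has set the CONDA_DEFAULT_ENV environment variable, which it normally does when an environment is activated. The helper name active_conda_env is hypothetical and used only for illustration.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset.

    Assumption: conda exports CONDA_DEFAULT_ENV on activation.
    """
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    # After activating, this is expected to report eda3.
    print("Active conda environment:", active_conda_env())
    # The interpreter path should point inside the environment's directory.
    print("Python executable:", sys.executable)
```

If the reported name is not eda3, the environment was probably not activated in the current terminal session.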
If you are experienced in Python and do not use conda, a requirements.txt file is also available in the setup/ directory for pip installation. It is our environment frozen as-is on a Mac. If you are using Windows or Linux, you may need to remove some of the version requirements.
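If the pinned versions fail to install on your platform, one option is to strip the version specifiers and let pip resolve compatible releases. The following is a minimal sketch of that idea, not part of the tutorial's own tooling. The function name strip_version_pins and the example package lines are hypothetical, and the sketch does not handle environment markers or URL-based requirements.

```python
import re


def strip_version_pins(lines):
    """Return requirement lines with version specifiers removed."""
    unpinned = []
    for line in lines:
        stripped = line.strip()
        # Leave blank lines and comments as they are.
        if not stripped or stripped.startswith("#"):
            unpinned.append(stripped)
            continue
        # Keep only the text before the first version operator.
        name = re.split(r"\s*(?:==|>=|<=|~=|!=|<|>)", stripped, maxsplit=1)[0]
        unpinned.append(name)
    return unpinned


if __name__ == "__main__":
    # Hypothetical lines; the real requirements.txt will differ.
    example = ["pandas==0.20.1", "numpy>=1.12", "# plotting", "seaborn"]
    print(strip_version_pins(example))
    # prints ['pandas', 'numpy', '# plotting', 'seaborn']
```

Removing every pin is a blunt approach. In practice it may be enough to unpin only the packages that pip reports as failing.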
We will be using widgets to create interactive visualizations. They will have been installed during your environment setup, but you still need to run the following from the command line:
jupyter nbextension enable --py --sys-prefix widgetsnbextension
4. Test your Python environment
Now that your environment is set up, let's check that it works.
- Go to the setup/ directory from the command line and start a Jupyter notebook instance:
A lot of text should appear. You need to leave this terminal running for your Jupyter instance to work.
Assuming this worked, open up the notebook titled
Once the notebook is open, go to the Cell menu and select
Check that every cell in the notebook ran (i.e., did not produce an error as output).
test-my-environment.html shows what the notebook should look like after running.
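As a quick complement to the test notebook, you can also check from a plain Python session that the packages you expect are importable. This is only a sketch: the helper name missing_packages is hypothetical, and the package list below is an assumption rather than the contents of environment.yml, so adjust it to match the file in setup/.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package list for illustration; edit to match environment.yml.
    expected = ["pandas", "numpy", "matplotlib", "ipywidgets"]
    missing = missing_packages(expected)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages were found.")
```

This only confirms that each package can be located. Running the test notebook remains the more thorough check, because it actually exercises the libraries.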