Hands-on in-person workshop for Data Analysis with Python

Quick Start

The workshop code is available as Jupyter notebooks. You can run the notebooks in the cloud (no installation required) by clicking the "launch binder" button:

Binder

Why

For people who struggle to get started with data analysis in Python.

Description

This hands-on in-person workshop is based on the Data Analysis with Python course by IBM Cognitive Class.

Learn how to prepare data for analysis, perform simple statistical analyses, create meaningful data visualizations, and predict future trends from data, all in a Jupyter-based environment.

Outline

The workshop covers the following core topics:

01 Intro Colab

Figures: Problem, Attributes, Types
  • Understanding the Domain
  • Understanding the Dataset
  • Python Packages for Data Science
  • Importing and Exporting Data in Python
  • Basic Insights from Datasets
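As a rough illustration of the importing, exporting, and basic-insights topics above, the following sketch loads a small table with pandas and inspects it. The inline CSV text and its column names are made up for this example and are not taken from the workshop data.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real data file.
csv_text = """make,horsepower,price
audi,102,13950
bmw,101,16430
toyota,62,5348
"""

# Importing: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Basic insights: column types, table shape, and summary statistics.
print(df.dtypes)
print(df.shape)
print(df.describe())

# Exporting: with no path given, to_csv returns the CSV as a string.
# Pass a file path instead to write the table to disk.
exported = df.to_csv(index=False)
```

Reading from a string keeps the example self-contained; in practice the same calls would typically point at a file in the data directory.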

02 Data Wrangling Colab

Figures: Distribution, Bins, Histogram
  • Identify and Handle Missing Values
  • Data Formatting
  • Data Normalization Sets
  • Binning
  • Indicator variables
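The wrangling steps listed above can be sketched with pandas roughly as follows. The data frame, its column names, and the choice of mean imputation and min-max scaling are assumptions made for illustration; the notebooks may use different data and techniques.

```python
import pandas as pd

# Hypothetical data: horsepower is stored as text and one value is missing.
df = pd.DataFrame({
    "fuel": ["gas", "diesel", "gas", "gas"],
    "horsepower": ["100", "150", None, "200"],
})

# Data formatting: convert the text column to numbers (None becomes NaN).
df["horsepower"] = pd.to_numeric(df["horsepower"])

# Missing values: replace NaN with the column mean.
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())

# Normalization: min-max scaling into the range [0, 1].
hp = df["horsepower"]
df["horsepower_scaled"] = (hp - hp.min()) / (hp.max() - hp.min())

# Binning: split the value range into three equal-width, labeled bins.
df["horsepower_bin"] = pd.cut(hp, bins=3, labels=["low", "medium", "high"])

# Indicator variables: one column per fuel category.
df = pd.concat([df, pd.get_dummies(df["fuel"], prefix="fuel")], axis=1)

print(df)
```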

03 EDA Colab

Figures: Heatmap, Scatterplot, Boxplot
  • Descriptive Statistics
  • Basics of Grouping
  • ANOVA
  • Correlation
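A minimal pandas sketch of three of the topics above (descriptive statistics, grouping, and correlation) might look like this. The values and column names are invented for the example.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    "drive": ["fwd", "rwd", "fwd", "rwd", "fwd", "rwd"],
    "engine_size": [90, 150, 100, 180, 110, 200],
    "price": [7000, 16000, 8000, 20000, 9000, 24000],
})

# Descriptive statistics: count, mean, std, min, quartiles, max.
summary = df["price"].describe()

# Grouping: average price for each drive type.
mean_price_by_drive = df.groupby("drive")["price"].mean()

# Correlation: Pearson correlation between engine size and price.
corr = df["engine_size"].corr(df["price"])

print(summary)
print(mean_price_by_drive)
print(corr)
```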

04 Model Development Colab

Figures: 3rd Polynomial, Actual/Fitted, 11th Polynomial
  • Simple and Multiple Linear Regression
  • Model Evaluation Using Visualization
  • Polynomial Regression and Pipelines
  • R-squared and MSE for In-Sample Evaluation
  • Prediction and Decision Making

05 Model Evaluation Colab

Figures: 5th Polynomial, R^2, 4 Features
  • Model Evaluation
  • Over-fitting, Under-fitting and Model Selection
  • Ridge Regression
  • Grid Search
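The topics above can be combined in one short scikit-learn sketch: hold out a test set, then let a grid search choose the polynomial degree and the ridge penalty by cross-validation. The synthetic data, the candidate grid, and the pipeline step names are assumptions for illustration and are not taken from the notebooks.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data from a known quadratic plus noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=120)

# Keep a test set aside so the final score is out-of-sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge()),
])

# Candidate settings: a low degree may under-fit, a high degree may over-fit,
# and alpha controls how strongly ridge shrinks the coefficients.
param_grid = {
    "poly__degree": [1, 2, 5],
    "ridge__alpha": [0.01, 1.0, 100.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

# score() uses the best pipeline, refit on the full training set.
test_r2 = search.score(X_test, y_test)
print(search.best_params_, test_r2)
```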

Prerequisite

Pre-workshop

You will need a laptop that can access the internet

1: Installation

Install Miniconda or the (larger) Anaconda distribution.

Install Python 3.7 using Miniconda

OR Install Python 3.7 using Anaconda

2: Setup

2.1: Download workshop code & materials

Clone the repository

git clone git@github.com:aymanibrahim/dapy.git

OR Download the repository as a .zip file

2.2: Change directory to dapy

Change the current directory to the dapy directory:

cd dapy

2.3: Install Python with required packages

Install Python 3.7 and the required packages into an environment named dapy, as specified in the environment.yml file.

conda env create -f environment.yml

When conda asks if you want to proceed, type "y" and press Enter.

3: Activate environment

Switch from the default environment (base) to the dapy environment.

conda activate dapy

4: Install & Enable ipywidgets extensions

Enable ipywidgets Jupyter Notebook extension

jupyter contrib nbextension install --user
jupyter nbextension enable --py widgetsnbextension
jupyter nbextension enable python-markdown/main

# Notebooks w/ extensions that auto-run code must be "trusted" to work the first time
jupyter trust ./notebooks/05_Model_Evaluation.ipynb

Install ipywidgets JupyterLab extension

jupyter labextension install @jupyter-widgets/jupyterlab-manager

Enable widgetsnbextension

jupyter nbextension enable --py widgetsnbextension --sys-prefix

5: Check installation

Use the check_environment.py script to make sure everything was installed correctly. Open a terminal and change directory (cd) so that your working directory is the dapy workshop directory you cloned or downloaded. Then enter the following:

python check_environment.py

If everything is OK, you will get the following message:

Your workshop environment is set up
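The contents of check_environment.py are not shown here, so the following is only a sketch of how such a check could work in general: it asks Python whether each package can be found without actually importing it. The package list and the helper names `missing_packages` and `python_is_at_least` are assumptions; the authoritative list of requirements is environment.yml.

```python
import importlib.util
import sys

# Assumed package list for illustration; see environment.yml for the real one.
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_at_least(major, minor):
    """Return True if the running interpreter is at least major.minor."""
    return sys.version_info >= (major, minor)


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if not python_is_at_least(3, 7):
        print("Python 3.7 or newer is expected")
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found")
```

If a package is reported missing, make sure the dapy environment is activated before running the check.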

6: Start JupyterLab

Start JupyterLab using:

jupyter lab

JupyterLab will open automatically in your browser.

If it does not, you can open JupyterLab by entering the notebook server’s URL, which is printed in the terminal, into your browser.

7: Stop JupyterLab

Press CTRL + C in the terminal to stop JupyterLab.

8: Deactivate environment

Switch from the current environment (dapy) back to the previous environment.

conda deactivate

Workshop Instructor

Ayman Ibrahim, PMP

Contributing

Thanks for your interest in contributing! There are many ways to contribute to this project. Get started here.

License

Workshop Code

License: MIT

Workshop Materials

Creative Commons License

Data Analysis with Python Workshop by Ayman Ibrahim is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at IBM Cognitive Class Data Analysis with Python by Joseph Santarcangelo, PhD. and Mahdi Noorian, PhD.
