The workshop code is available as Jupyter notebooks. You can run the notebooks in the cloud (no installation required) by clicking the "launch binder" button.
For people who are struggling to get started with data analysis in Python
This hands-on, in-person workshop is based on the Data Analysis with Python course by IBM Cognitive Class.
Learn how to prepare data for analysis, perform simple statistical analyses, create meaningful data visualizations, and predict future trends from data, all in a Jupyter-based environment.
The workshop covers the following core topics:
- Understanding the Domain
- Understanding the Dataset
- Python packages for data science
- Importing and Exporting Data in Python
- Basic Insights from Datasets
- Identify and Handle Missing Values
- Data Formatting
- Data Normalization Sets
- Indicator Variables
- Descriptive Statistics
- Basics of Grouping
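As a rough illustration of several data-wrangling topics above (missing values, normalization, indicator variables, descriptive statistics, grouping), here is a minimal pandas sketch. The tiny DataFrame, its column names (`horsepower`, `fuel`, `price`) and its values are hypothetical and are not the workshop's dataset; the notebooks may handle these steps differently.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not the workshop dataset); "?" marks a missing entry
df = pd.DataFrame({
    "horsepower": ["111", "?", "154", "102"],
    "fuel": ["gas", "diesel", "gas", "gas"],
    "price": [13495.0, 16500.0, 16500.0, 13950.0],
})

# Identify missing values: mark "?" as NaN, convert the column to numbers, count the gaps
df["horsepower"] = df["horsepower"].replace("?", np.nan).astype(float)
print("missing horsepower values:", df["horsepower"].isna().sum())

# Handle missing values: replace NaN with the column mean
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())

# Data normalization (simple feature scaling): divide by the column maximum,
# so positive values end up in (0, 1]
df["horsepower_scaled"] = df["horsepower"] / df["horsepower"].max()

# Indicator variables: one 0/1 column per category of "fuel"
df = pd.concat([df, pd.get_dummies(df["fuel"], prefix="fuel", dtype=int)], axis=1)

# Descriptive statistics for the numeric columns
stats = df[["horsepower", "price"]].describe()

# Grouping: average price per fuel type
avg_price = df.groupby("fuel")["price"].mean()
print(avg_price)
```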
[Figures: 3rd-degree polynomial fit; actual vs. fitted values; 11th-degree polynomial fit]
- Simple and Multiple Linear Regression
- Model Evaluation Using Visualization
- Polynomial Regression and Pipelines
- R-squared and MSE for In-Sample Evaluation
- Prediction and Decision Making
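A minimal scikit-learn sketch of simple vs. multiple linear regression, scored in-sample with R-squared and MSE, then used for a prediction. The data are synthetic and generated from an assumed linear rule (the coefficients 120 and 40, the feature ranges and the new observation are all hypothetical), so the fitted coefficients can be checked against known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data (hypothetical): two features and a target built from a known
# linear rule plus seeded random noise
rng = np.random.default_rng(0)
X = rng.uniform(50, 200, size=(100, 2))
y = 120 * X[:, 0] + 40 * X[:, 1] + rng.normal(0, 500, size=100)

# Simple linear regression uses only the first feature; multiple uses both
slr = LinearRegression().fit(X[:, [0]], y)
mlr = LinearRegression().fit(X, y)

# In-sample evaluation: score each model on the same data it was fitted to
scores = {}
for name, model, features in [("simple", slr, X[:, [0]]), ("multiple", mlr, X)]:
    y_hat = model.predict(features)
    scores[name] = (r2_score(y, y_hat), mean_squared_error(y, y_hat))
    print(f"{name:>8}: R^2 = {scores[name][0]:.3f}, MSE = {scores[name][1]:.1f}")

# Prediction for a new (hypothetical) observation with feature values 150 and 120
new_price = mlr.predict(np.array([[150.0, 120.0]]))
print("predicted value:", round(float(new_price[0]), 1))
```

Because both models are scored on their own training data, these are in-sample numbers: adding a predictor to an ordinary least-squares model cannot lower its in-sample R-squared, which is why out-of-sample evaluation matters when choosing between models.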
[Figures: 5th-degree polynomial fit; R^2; 4 features]
- Model Evaluation
- Over-fitting, Under-fitting and Model Selection
- Ridge Regression
- Grid Search
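The model-evaluation topics can be tied together in one short sketch: a train/test split, a pipeline of polynomial features, scaling and ridge regression, and a grid search over the polynomial degree and the ridge penalty `alpha`. The cubic synthetic data, the degree grid (1, 3, 11) and the alpha values are assumptions chosen for illustration, not the workshop's settings.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic 1-D data (hypothetical): a cubic trend plus seeded random noise
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * x[:, 0] ** 3 - x[:, 0] + rng.normal(0, 1.0, size=200)

# Hold out 30% of the data for out-of-sample evaluation
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Pipeline: polynomial features -> standardization -> ridge regression
pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge()),
])

# Grid search: every combination of degree and alpha, scored by 5-fold CV R^2
param_grid = {
    "poly__degree": [1, 3, 11],
    "ridge__alpha": [0.01, 0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(x_train, y_train)

# Evaluate the refitted best pipeline on the untouched test set
test_r2 = search.score(x_test, y_test)
print("best parameters:", search.best_params_)
print("test R^2:", round(test_r2, 3))
```

Degree 1 under-fits the cubic trend, while a high degree such as 11 risks over-fitting unless the ridge penalty keeps the coefficients small; cross-validated grid search selects the combination that scores best on held-out folds.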
You will need a laptop with internet access.
Install Miniconda or the (larger) Anaconda distribution.
2.1: Download workshop code & materials
Clone the repository
git clone git@github.com:aymanibrahim/dapy.git
2.2: Change directory to dapy
Change the current directory to the dapy directory:
cd dapy
2.3: Install Python with required packages
Install Python 3.7 and the required packages into an environment named dapy, as specified in the environment.yml file:
conda env create -f environment.yml
When conda asks if you want to proceed, type "y" and press Enter.
3: Activate environment
Switch from the default environment (base) to the dapy environment:
conda activate dapy
4: Install & Enable ipywidgets extensions
Enable the ipywidgets Jupyter Notebook extension:
jupyter contrib nbextension install --user
jupyter nbextension enable --py widgetsnbextension
jupyter nbextension enable python-markdown/main
# Notebooks w/ extensions that auto-run code must be "trusted" to work the first time
jupyter trust ./notebooks/05_Model_Evaluation.ipynb
Install the ipywidgets JupyterLab extension:
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter nbextension enable --py widgetsnbextension --sys-prefix
5: Check installation
Use the check_environment.py script to make sure everything was installed correctly. Open a terminal and change its directory (cd) so that your working directory is the dapy workshop directory you cloned or downloaded. Then enter the following:
python check_environment.py
If everything is OK, you will get the following message:
Your workshop environment is set up
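The check_environment.py script ships with the repository; the sketch below is only a hypothetical illustration of how such a check can work, not that script's code. It tries to import each package and compares the running Python version against 3.7. The package list is an assumption based on the tools this README mentions.

```python
import importlib
import sys

# Hypothetical package list, assumed from the tools this README mentions;
# the real environment.yml may include more packages or pin versions
REQUIRED = ("numpy", "pandas", "matplotlib", "seaborn", "sklearn")


def check_environment(packages=REQUIRED, min_python=(3, 7)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append(f"missing package: {name}")
            continue
        # Not every module defines __version__, so fall back to "unknown"
        print(f"{name:<12} {getattr(module, '__version__', 'unknown')}")
    return problems


issues = check_environment()
print("\n".join(issues) if issues else "Your workshop environment is set up")
```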
6: Start JupyterLab
Start JupyterLab using:
jupyter lab
JupyterLab will open automatically in your browser.
If it does not, you can open JupyterLab by entering the notebook server’s URL into your browser.
7: Stop JupyterLab
Press CTRL + C in the terminal to stop JupyterLab.
8: Deactivate environment
Switch from the current environment (dapy) back to the previous environment:
conda deactivate
Ayman Ibrahim, PMP
- Python: Programming language
- Conda: Package and environment manager
- Anaconda: Python distribution
- Miniconda: Minimal installer for conda
- NumPy: Fundamental package for scientific computing with Python
- Matplotlib: Python 2D plotting library
- seaborn: Statistical Data Visualization
- pandas: Python data analysis library
- scikit-learn: Machine Learning in Python
- Jupyter Notebook: Web application to create documents with code, equations, visualizations and text
- JupyterLab: Web-based development environment for Jupyter Notebooks
- Python for Data Science: Course by IBM Cognitive Class
- Data Analysis with Python: Course by IBM Cognitive Class
Thanks for your interest in contributing! There are many ways to contribute to this project. Get started here.
Data Analysis with Python Workshop by Ayman Ibrahim is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at IBM Cognitive Class Data Analysis with Python by Joseph Santarcangelo, PhD. and Mahdi Noorian, PhD.