Timing: 4-6 pm PST on Tuesdays and Fridays from Nov 2nd, 2021 to Nov 23rd, 2021 (7 sessions in total)
Where: Online on Zoom
The workshop series focuses on the practical aspects of machine learning, using real-world datasets and the tools in the Python ecosystem. It is aimed at machine learning beginners who are familiar with Python, and it is designed adaptively so that you will be challenged even if you already have some familiarity with machine learning tools.
You will learn the minimal but most useful tools for exploring datasets using pandas and then be gently introduced to neural networks. Some concepts from natural language processing will also be covered as you train neural network models on textual data. You will also learn more involved architectures such as Convolutional Neural Networks (CNNs) and apply them to real-world image datasets.
Register using this Google form to save your seat. Please also register for the Zoom meeting here. After registering, you will receive a confirmation email containing information about joining the Zoom meeting.
Each session of the workshop will build on the previous ones. It is important that you attend all the sessions of the series for it to be useful. The learning material and solutions will be made available in this GitHub repository after each session.
- The workshop will cover the data science and deep learning tools in the Python ecosystem from scratch. Some familiarity with Python is a prerequisite. If you have a grip on the basics of coding in some other language such as JavaScript, that should suffice too.
- Basics of Probability and Statistics
- Basics of Calculus
- Basics of Linear Algebra
Here is an optional quiz to brush up on your Python skills before the workshop.
Please download and install Anaconda with Python 3.8 on your laptop ahead of the workshop.
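If you would like to confirm your setup before the first session, a minimal sketch along these lines may help. It only checks whether the libraries named in the outline below can be imported. The package list and the helper name `check_environment` are assumptions made for illustration, not an official requirements list; in particular, it assumes Keras is used through TensorFlow, which may need to be installed separately from Anaconda.

```python
import sys
from importlib.util import find_spec

# Assumed package list, taken from the libraries mentioned in the outline.
# "sklearn" is the import name of scikit-learn; Keras is assumed to be
# provided by the "tensorflow" package.
REQUIRED = ["pandas", "matplotlib", "seaborn", "sklearn", "tensorflow"]


def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Any package reported as missing can usually be added with `conda install` or `pip install` inside your Anaconda environment.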
- Introduction to Jupyter Notebook
- Pandas dataframes as a data structure
- Indexing and slicing data frames
- Data exploration
- Basic statistical plots using matplotlib and seaborn
- Detecting and filling missing values
- Regular expressions for text mining
- More on pandas:
- Groupby operations
- One-hot encoding for categorical features
- An exercise on preprocessing the movie reviews from the IMDb dataset using regular expressions
- Binary classification algorithm: Logistic Regression
- Underfitting and Overfitting to the training dataset; Model cross-validation
- Natural language processing (NLP) concepts: Bag Of Words (BOW) model, TF-IDF vectorizer, using word n-grams, etc.
- Application of Logistic Regression and NLP concepts using scikit-learn on the IMDb dataset to predict the sentiment (positive or negative) of the movie reviews
- Linear Regression
- Neural networks: Building the intuition of the architecture and the iterative learning process
- An exercise on implementing AND, OR and XOR gates with neural networks by trial-and-error
- Multi-Layer Perceptron: Forward and Backward propagation
- A primer on Keras
- Training a neural network on the IMDb dataset for sentiment analysis
- Vanishing gradients and exploding gradients in deep networks
- Activation functions
- Weight Initialization
- Regularization - L1 and L2, Dropout
- Tuning other hyper-parameters such as learning rate, number of epochs, etc.
- Exploring the TensorFlow Playground
- Application of the above concepts to the IMDb dataset for training a neural network for sentiment analysis
- Image preprocessing for neural networks
- Feature extraction using convolution filters
- Convolutional Neural Network (CNN) architecture
- Training a CNN model on the CIFAR-10 dataset
- Imbalanced datasets and classification metrics:
- Confusion matrix
- Decision Threshold
- Precision/Recall
- F1-score
- Area Under ROC curve
- Mini-project: Building a spam detector using a dataset from Kaggle
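As a small preview of why the last session looks beyond accuracy on imbalanced datasets, here is a sketch in plain Python. The labels and scores are made up for illustration (1 = spam, 0 = not spam), and the helper names `predict`, `confusion_counts` and `precision_recall_f1` are hypothetical, not taken from the workshop material. In practice, scikit-learn provides equivalent metrics.

```python
def predict(scores, threshold):
    """Turn predicted probabilities into 0/1 labels using a decision threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1, returning 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up toy data: only 2 of the 10 messages are spam (imbalanced).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1, 0.2, 0.3, 0.05, 0.1, 0.2, 0.15]

for threshold in (0.5, 0.35):
    tp, fp, fn, tn = confusion_counts(y_true, predict(scores, threshold))
    accuracy = (tp + tn) / len(y_true)
    precision, recall, f1 = precision_recall_f1(tp, fp, fn)
    print(f"threshold={threshold}: accuracy={accuracy:.2f} "
          f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With this toy data, a threshold of 0.5 catches only one of the two spam messages even though accuracy is 0.8. Lowering the threshold to 0.35 catches both, which is the kind of trade-off the decision threshold controls.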
This page will be frequently updated with more information.