A Hands-on Workshop series in Machine Learning

Timing: 4-6 pm PST on Tuesdays and Fridays from Nov 2nd, 2021 to Nov 23rd, 2021 (7 sessions in total)
Where: Online on Zoom

The workshop series focuses on the practical aspects of machine learning using real-world datasets and the tools in the Python ecosystem. It is aimed at machine learning beginners who are familiar with Python, and it is designed adaptively so that you will be challenged even if you already have some familiarity with machine learning tools.

You will learn a minimal but highly useful set of tools for exploring datasets with pandas and will then be gently introduced to neural networks. Some concepts from natural language processing will also be covered, as you will train neural network models on textual data. You will also learn more involved architectures such as Convolutional Neural Networks (CNNs) and apply them to real-world image datasets.

Register using this Google form to save your seat. Please also register for the Zoom meeting here. After registering, you will receive a confirmation email containing information about joining the Zoom meeting.

Each session of the workshop will build on the previous ones, so it is important that you attend all the sessions of the series for it to be useful. The learning material and solutions will be made available in this GitHub repository after each session.


Prerequisites:

  1. The workshop will cover the data science and deep learning tools in the Python ecosystem from scratch. Some familiarity with Python is a prerequisite. If you have a grip on the basics of coding in some other language such as JavaScript, that should suffice too.
  2. Basics of Probability and Statistics
  3. Basics of Calculus
  4. Basics of Linear Algebra

Here is an optional quiz to brush up on your Python skills before the workshop.

Please download and install Anaconda with Python 3.8 on your laptop ahead of the workshop.

Topics to be covered:

1. Data Manipulation using pandas (Tuesday, Nov 2nd, 2021)

  • Introduction to Jupyter Notebook
  • Pandas dataframes as a data structure
  • Indexing and slicing data frames
  • Data exploration
  • Basic statistical plots using matplotlib and seaborn
  • Detecting and filling missing values
  • Regular expressions for text mining
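
As a flavor of this session, here is a minimal sketch of detecting and filling missing values with pandas. The toy DataFrame and its column names are made up for illustration and are not the workshop's dataset; the choice of mean and median as fill values is also just one reasonable option.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the workshop dataset).
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": [7.5, np.nan, 6.0, np.nan],
    "year": [2001, 2005, np.nan, 2010],
})

# Detect: count the missing values in each column.
missing_counts = df.isna().sum()

# Fill: use the column mean for rating and the column median for year.
filled = df.copy()
filled["rating"] = filled["rating"].fillna(filled["rating"].mean())
filled["year"] = filled["year"].fillna(filled["year"].median())

print(missing_counts)
print(filled)
```

Working on a copy keeps the original DataFrame untouched, which makes it easy to compare different filling strategies side by side.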

2. More on pandas and Regular Expressions (Friday, Nov 5th, 2021)

  • More on pandas - Groupby operations
  • One hot encoding for categorical features
  • An exercise on preprocessing the movie reviews from the IMDb dataset using regular expressions
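
The kind of preprocessing meant here can be sketched with Python's built-in `re` module. The helper name `clean_review` and the sample text are hypothetical, and the exact cleaning rules used in the workshop exercise may differ.

```python
import re

def clean_review(text: str) -> str:
    """Strip HTML tags and non-letter characters, then lowercase the text."""
    text = re.sub(r"<[^>]+>", " ", text)      # remove tags such as <br />
    text = re.sub(r"[^a-zA-Z']+", " ", text)  # keep only letters and apostrophes
    return re.sub(r"\s+", " ", text).strip().lower()

# Hypothetical sample in the style of a raw movie review.
sample = "Great movie!<br /><br />I'd watch it again... 10/10"
print(clean_review(sample))
```

Removing the tags before the punctuation matters: if the angle brackets were stripped first, the tag name `br` would survive as a spurious word.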

3. Logistic Regression (Tuesday, Nov 9th, 2021)

  • Binary classification algorithm: Logistic Regression
  • Underfitting and Overfitting to the training dataset; Model cross-validation
  • Natural language processing (NLP) concepts: Bag of Words (BoW) model, TF-IDF vectorizer, using word n-grams, etc.
  • Application of Logistic Regression and NLP concepts using scikit-learn on the IMDb dataset to predict the sentiment (positive or negative) of the movie reviews
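
A minimal sketch of how these pieces fit together in scikit-learn is shown below. The four toy reviews and their labels are invented for illustration; the workshop itself uses the IMDb dataset, and the settings chosen here (unigrams plus bigrams, default regularization) are assumptions rather than the workshop's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews; 1 = positive, 0 = negative.
reviews = [
    "a wonderful and moving film",
    "great acting and a great story",
    "boring and far too long",
    "terrible plot and awful acting",
]
labels = [1, 1, 0, 0]

# TF-IDF features over word unigrams and bigrams, fed to logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(reviews, labels)

new_reviews = ["a great film", "boring and awful"]
predictions = model.predict(new_reviews)
print(predictions)
```

Wrapping the vectorizer and the classifier in one pipeline means the same text transformation is applied at training and prediction time, which also keeps cross-validation free of leakage from the held-out folds.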

4. A Gentle Introduction to Neural Networks (Friday, Nov 12th, 2021)

  • Linear Regression
  • Neural networks: Building the intuition of the architecture and the iterative learning process
  • An exercise on implementing AND, OR and XOR gates with neural networks by trial-and-error
  • Multi-Layer Perceptron: Forward and Backward propagation
  • A primer on Keras
  • Training a neural network on the IMDb dataset for sentiment analysis
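
To illustrate the logic-gate exercise, here is one possible set of hand-picked weights, assuming a simple step activation. A single neuron can represent AND or OR, but XOR needs a hidden layer; the particular weights and biases below are just one choice that works, not necessarily the ones used in the session.

```python
def step(z: float) -> int:
    """Step activation: fire (1) only when the weighted input is positive."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias) -> int:
    """A single neuron: weighted sum plus bias, passed through the step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor(a: int, b: int) -> int:
    h_or = neuron((a, b), (1, 1), -0.5)          # fires if a OR b
    h_nand = neuron((a, b), (-1, -1), 1.5)       # fires unless a AND b
    return neuron((h_or, h_nand), (1, 1), -1.5)  # AND of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```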

5. Fine-tuning Neural Networks (Tuesday, Nov 16th, 2021)

  • Vanishing gradients and exploding gradients in deep networks
  • Activation functions
  • Weight Initialization
  • Regularization - L1 and L2, Dropout
  • Tuning other hyper-parameters such as learning rate, number of epochs, etc.
  • Exploring the TensorFlow Playground
  • Applying the above concepts to the IMDb dataset to train a neural network for sentiment analysis
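
The vanishing gradient problem can be previewed with a few lines of arithmetic. The sketch below assumes a chain of sigmoid layers with all weights set to 1 and every pre-activation equal to `z`, which is a deliberate simplification; the function names are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def gradient_scale(num_layers: int, z: float = 0.0) -> float:
    """Product of sigmoid derivatives along the chain (weights assumed to be 1)."""
    return sigmoid_grad(z) ** num_layers

for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Even in this best case for the sigmoid, the factor shrinks geometrically with depth, which is one motivation for alternative activation functions and careful weight initialization.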

6. Convolutional Neural Networks (Friday, Nov 19th, 2021)

  • Image preprocessing for neural networks
  • Feature extraction using convolution filters
  • Convolutional Neural Network (CNN) architecture
  • Training a CNN model on the CIFAR-10 dataset
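
Feature extraction with a convolution filter can be sketched in plain NumPy. The helper `conv2d_valid`, the tiny image, and the edge-detecting kernel are all illustrative assumptions; like the convolution layers in deep learning libraries, the sketch slides the kernel without flipping it (strictly speaking, a cross-correlation).

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the current image patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 4x4 image: dark left half (0), bright right half (1).
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)

# A simple vertical-edge filter: responds where brightness rises left to right.
kernel = np.array([[-1, 1], [-1, 1]], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is nonzero only in the middle column, where the window straddles the dark-to-bright boundary. In a CNN the kernel values are learned during training rather than chosen by hand.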

7. Classification Metrics (Tuesday, Nov 23rd, 2021)

  • Imbalanced datasets and classification metrics:
    • Confusion matrix
    • Decision Threshold
    • Precision/Recall
    • F1-score
    • Area Under ROC curve
  • Mini-project: Building a spam detector using a dataset from Kaggle
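
The relationship between the decision threshold, the confusion matrix, and precision, recall, and F1 can be sketched in pure Python. The labels and scores below are invented for an imbalanced toy problem (3 positives out of 10), and the function names are hypothetical.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Return (tp, fp, fn, tn) after thresholding the predicted scores."""
    tp = fp = fn = tn = 0
    for truth, score in zip(y_true, scores):
        pred = 1 if score >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 for undefined ratios."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical imbalanced data: 1 = spam, 0 = not spam.
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.1, 0.4, 0.3, 0.05, 0.8, 0.7, 0.15]

for threshold in (0.5, 0.35):
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    print(threshold, (tp, fp, fn, tn), precision_recall_f1(tp, fp, fn))
```

Lowering the threshold from 0.5 to 0.35 recovers the positive example that was missed, raising recall, which shows why the threshold is a tunable part of the classifier rather than a fixed constant.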

This page will be frequently updated with more information.

