Notebook for quick search

Machine Learning Specialization

Course can be found in Coursera

Partial notes can be found in my blog SSQ

1. Machine Learning Foundations: A Case Study Approach

Course can be found in Coursera

Programming Assignments

Slides and more details about this course can be found in my Github SSQ

  • Week 1 Introduction

    • Regression. Case study: Predicting house prices
    • Classification. Case study: Analyzing sentiment
    • Clustering & Retrieval. Case study: Finding documents
    • Matrix Factorization & Dimensionality Reduction. Case study: Recommending Products
    • Capstone. An intelligent application using deep learning
    • Getting familiar with IPython Notebook and SFrame
  • Week 2 Regression Predicting House Prices

  • Week 3 Classification Analyzing Sentiment

2. Machine Learning: Regression

Course can be found in Coursera

Description Programming Assignments
  • Linear regression
  • Regularization: Ridge (L2), Lasso (L1)
  • Nearest neighbor and kernel regression
  • Gradient descent
  • Coordinate descent
  • Loss functions, bias-variance tradeoff
  • Cross-validation, sparsity, overfitting
  • Model selection, feature selection

Slides and more details about this course can be found in my Github SSQ

  • Week 1: Simple Linear Regression:

    • Describe the input (features) and output (real-valued predictions) of a regression model
    • Calculate a goodness-of-fit metric (e.g., RSS)
    • Estimate model parameters to minimize RSS using gradient descent
    • Interpret estimated model parameters
    • Exploit the estimated model to form predictions
    • Discuss the possible influence of high leverage points
    • Describe intuitively how the fitted line might change when assuming different goodness-of-fit metrics
    • Fitting a simple linear regression model on housing data
  • Week 2: Multiple Regression: Linear regression with multiple features

    • Describe polynomial regression
    • Detrend a time series using trend and seasonal components
    • Write a regression model using multiple inputs or features thereof
    • Cast both polynomial regression and regression with multiple inputs as regression with multiple features
    • Calculate a goodness-of-fit metric (e.g., RSS)
    • Estimate model parameters of a general multiple regression model to minimize RSS:
      • In closed form
      • Using an iterative gradient descent algorithm
    • Interpret the coefficients of a non-featurized multiple regression fit
    • Exploit the estimated model to form predictions
    • Explain applications of multiple regression beyond house price modeling
    • Exploring different multiple regression models for house price prediction
    • Implementing gradient descent for multiple regression
  • Week 3: Assessing Performance

    • Describe what a loss function is and give examples
    • Contrast training, generalization, and test error
    • Compute training and test error given a loss function
    • Discuss issue of assessing performance on training set
    • Describe tradeoffs in forming training/test splits
    • List and interpret the 3 sources of avg. prediction error
      • Irreducible error, bias, and variance
    • Discuss issue of selecting model complexity on test data and then using test error to assess generalization error
    • Motivate use of a validation set for selecting tuning parameters (e.g., model complexity)
    • Describe overall regression workflow
    • Exploring the bias-variance tradeoff
  • Week 4: Ridge Regression

    • Describe what happens to magnitude of estimated coefficients when model is overfit
    • Motivate form of ridge regression cost function
    • Describe what happens to estimated coefficients of ridge regression as tuning parameter λ is varied
    • Interpret coefficient path plot
    • Estimate ridge regression parameters:
      • In closed form
      • Using an iterative gradient descent algorithm
    • Implement K-fold cross validation to select the ridge regression tuning parameter λ
    • Observing effects of L2 penalty in polynomial regression
    • Implementing ridge regression via gradient descent
  • Week 5: Lasso Regression: Regularization for feature selection

    • Perform feature selection using “all subsets” and “forward stepwise” algorithms
    • Analyze computational costs of these algorithms
    • Contrast greedy and optimal algorithms
    • Formulate lasso objective
    • Describe what happens to estimated lasso coefficients as tuning parameter λ is varied
    • Interpret lasso coefficient path plot
    • Contrast ridge and lasso regression
    • Describe geometrically why L1 penalty leads to sparsity
    • Estimate lasso regression parameters using an iterative coordinate descent algorithm
    • Implement K-fold cross validation to select lasso tuning parameter λ
    • Using LASSO to select features
    • Implementing LASSO using coordinate descent
  • Week 6: Going nonparametric: Nearest neighbor and kernel regression

    • Motivate the use of nearest neighbor (NN) regression
    • Define distance metrics in 1D and multiple dimensions
    • Perform NN and k-NN regression
    • Analyze computational costs of these algorithms
    • Discuss sensitivity of NN to lack of data, dimensionality, and noise
    • Perform weighted k-NN and define weights using a kernel
    • Define and implement kernel regression
    • Describe the effect of varying the kernel bandwidth λ or # of nearest neighbors k
    • Select λ or k using cross validation
    • Compare and contrast kernel regression with a global average fit
    • Define what makes an approach nonparametric and why NN and kernel regression are considered nonparametric methods
    • Analyze the limiting behavior of NN regression
    • Use NN for classification
    • Predicting house prices using k-nearest neighbors regression
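
As a rough illustration of the Week 1 objectives above (computing RSS and estimating parameters that minimize it), here is a minimal sketch in plain Python. It is not taken from the course assignments: the function names, the toy data, the step size, and the tolerance are all assumptions chosen for this example.

```python
def rss(xs, ys, intercept, slope):
    """Residual sum of squares for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def fit_closed_form(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_gradient_descent(xs, ys, step_size=0.01, tolerance=1e-9,
                         max_iter=100_000):
    """Minimize RSS by gradient descent, starting from (0, 0).

    step_size and tolerance are hypothetical values that suit the toy
    data below; other data would likely need different settings.
    """
    intercept, slope = 0.0, 0.0
    for _ in range(max_iter):
        # Prediction errors under the current parameters.
        errors = [(intercept + slope * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of RSS with respect to each parameter.
        grad_intercept = 2 * sum(errors)
        grad_slope = 2 * sum(e * x for e, x in zip(errors, xs))
        intercept -= step_size * grad_intercept
        slope -= step_size * grad_slope
        # Stop once the gradient magnitude is negligible.
        if (grad_intercept ** 2 + grad_slope ** 2) ** 0.5 < tolerance:
            break
    return intercept, slope


# Toy data generated from y = 1 + 2x with no noise (an assumption).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

closed_intercept, closed_slope = fit_closed_form(xs, ys)
gd_intercept, gd_slope = fit_gradient_descent(xs, ys)
```

On this noise-free data both fits should land very close to an intercept of 1 and a slope of 2, and `rss(xs, ys, closed_intercept, closed_slope)` should be essentially zero. Too large a step size makes the gradient descent variant diverge, which is why the step size is treated as a tuning choice here.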

3. Machine Learning: Classification

Course can be found in Coursera

Description Programming Assignments
  • Linear classifiers
  • Logistic regression
  • Decision trees
  • Ensembles
  • Stochastic gradient descent
  • Recursive greedy
  • Boosting
  • Decision boundaries, MLE
  • Ensemble methods, online learning
Core ML
  • Alleviating overfitting
  • Handling missing data
  • Precision-recall
  • Online learning

Slides and more details about this course can be found in my Github

  • Week 1:
    • Linear Classifiers & Logistic Regression
      • decision boundaries
      • linear classifiers
      • class probability
      • logistic regression
      • impact of coefficient values on logistic regression output
      • 1-hot encoding
      • multiclass classification using the 1-versus-all approach
      • Predicting sentiment from product reviews
  • Week 2:
    • Learning Linear Classifiers
      • Maximum likelihood estimation
      • Gradient ascent algorithm for learning logistic regression classifier
      • Choosing step size for gradient ascent/descent
      • (VERY OPTIONAL LESSON) Deriving gradient of logistic regression
      • Implementing logistic regression from scratch
    • Overfitting & Regularization in Logistic Regression
  • Week 3:
    • Decision Trees
      • Predicting loan defaults with decision trees
      • Learning decision trees
        • Recursive greedy algorithm
        • Learning a decision stump
        • Selecting best feature to split on
        • When to stop recursing
      • Using the learned decision tree
        • Traverse a decision tree to make predictions: Majority class predictions; Probability predictions; Multiclass classification
      • Learning decision trees with continuous inputs
        • Threshold splits for continuous inputs
        • (OPTIONAL) Picking the best threshold to split on
      • Identifying safe loans with decision trees
      • Implementing binary decision trees from scratch
  • Week 4
    • Overfitting in decision trees
      • Identify when overfitting occurs in decision trees
      • Prevent overfitting with early stopping
        • Limit tree depth
        • Do not consider splits that do not reduce classification error
        • Do not split intermediate nodes with only a few points
      • Prevent overfitting by pruning complex trees
        • Use a total cost formula that balances classification error and tree complexity
        • Use total cost to merge potentially complex trees into simpler ones
      • Decision Trees in Practice for preventing overfitting
    • Handling missing data
      • Describe common ways to handle missing data:
        1. Skip all rows with any missing values
        2. Skip features with many missing values
        3. Impute missing values using other data points
      • Modify learning algorithm (decision trees) to handle missing data:
        1. Missing values get added to one branch of split
        2. Use classification error to determine where missing values go
  • Week 5
    • Boosting
      • Identify the notion of ensemble classifiers
      • Formalize ensembles as the weighted combination of simpler classifiers
      • Outline the boosting framework – sequentially learn classifiers on weighted data
      • Describe the AdaBoost algorithm
        • Learn each classifier on weighted data
        • Compute coefficient of classifier
        • Recompute data weights
        • Normalize weights
      • Implement AdaBoost to create an ensemble of decision stumps
      • Discuss convergence properties of AdaBoost & how to pick the maximum number of iterations T
      • Exploring Ensemble Methods with pre-implemented gradient boosted trees
      • Implement your own boosting module
  • Week 6
    • Evaluating classifiers: Precision & Recall
      • Classification accuracy/error are not always the right metrics
      • Precision captures fraction of positive predictions that are correct
      • Recall captures fraction of positive data correctly identified by the model
      • Trade-off precision & recall by setting probability thresholds
      • Plot precision-recall curves
      • Compare models by computing precision at k
      • Exploring precision and recall
  • Week 7
    • Scaling to Huge Datasets & Online Learning
      • Significantly speed up the learning algorithm using stochastic gradient
      • Describe intuition behind why stochastic gradient works
      • Apply stochastic gradient in practice
      • Describe online learning problems
      • Relate stochastic gradient to online learning
      • Training Logistic Regression via Stochastic Gradient Ascent
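
To make the Week 2 topic above (gradient ascent for learning a logistic regression classifier) more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function names, the toy data, the step size, and the iteration count are invented for this example and do not come from the course assignments. Labels are assumed to be +1 or -1, and each feature row is assumed to start with a constant 1.0 for the intercept.

```python
import math


def sigmoid(score):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_probability(features, coefficients):
    """P(y = +1 | x, w) for one feature row."""
    score = sum(w * h for w, h in zip(coefficients, features))
    return sigmoid(score)


def fit_logistic_regression(feature_rows, labels, step_size=0.1, n_iter=500):
    """Maximize the log-likelihood by batch gradient ascent.

    step_size and n_iter are hypothetical values that suit the small
    toy data set below.
    """
    coefficients = [0.0] * len(feature_rows[0])
    for _ in range(n_iter):
        # Indicator of the positive class minus the predicted probability.
        errors = [
            (1.0 if label == 1 else 0.0) - predict_probability(row, coefficients)
            for row, label in zip(feature_rows, labels)
        ]
        # Derivative of the log-likelihood with respect to coefficient j
        # is the sum over data points of feature j times the error.
        for j in range(len(coefficients)):
            derivative = sum(e * row[j] for e, row in zip(errors, feature_rows))
            coefficients[j] += step_size * derivative
    return coefficients


# Toy data: [constant, x]. Larger x tends to be positive, with one
# deliberately "noisy" point so the classes are not perfectly separable.
feature_rows = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0],
                [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
labels = [-1, -1, 1, -1, 1, 1]

coefficients = fit_logistic_regression(feature_rows, labels)
p_high = predict_probability([1.0, 3.0], coefficients)
p_low = predict_probability([1.0, -2.0], coefficients)
```

After fitting, `p_high` should be above 0.5 and `p_low` below 0.5, since the learned coefficient on x is positive. The stochastic variant from Week 7 would update the coefficients using one data point (or a small batch) at a time instead of the full sum.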

4. Machine Learning: Clustering & Retrieval

Course can be found in Coursera

Description Programming Assignments
  • Nearest neighbors
  • Clustering, mixtures of Gaussians
  • Latent Dirichlet allocation (LDA)
  • K-means, MapReduce
  • K-NN, KD-trees, locality-sensitive hashing (LSH)
  • Expectation-maximization (EM)
  • Gibbs sampling
  • Distance metrics, approximation algorithms, hashing, sampling algorithms, scaling up with MapReduce
Core ML
  • Unsupervised learning
  • Probabilistic modeling
  • Data parallel problems
  • Bayesian inference

Slides and more details about this course can be found in my Github SSQ

  • Week 1 Intro

  • Week 2 Nearest Neighbor Search: Retrieving Documents

    • Implement nearest neighbor search for retrieval tasks
    • Contrast document representations (e.g., raw word counts, tf-idf,…)
      • Emphasize important words using tf-idf
    • Contrast methods for measuring similarity between two documents
      • Euclidean vs. weighted Euclidean
      • Cosine similarity vs. similarity via unnormalized inner product
    • Describe complexity of brute force search
    • Implement KD-trees for nearest neighbor search
    • Implement LSH for approximate nearest neighbor search
    • Compare pros and cons of KD-trees and LSH, and decide which is more appropriate for given dataset
    • Choosing features and metrics for nearest neighbor search
    • Implementing Locality Sensitive Hashing from scratch
  • Week 3 Clustering with k-means

    • Describe potential applications of clustering
    • Describe the input (unlabeled observations) and output (labels) of a clustering algorithm
    • Determine whether a task is supervised or unsupervised
    • Cluster documents using k-means
    • Interpret k-means as a coordinate descent algorithm
    • Define data parallel problems
    • Explain Map and Reduce steps of MapReduce framework
    • Use existing MapReduce implementations to parallelize k-means, understanding what’s being done under the hood
    • Clustering text data with k-means
  • Week 4 Mixture Models: Model-Based Clustering

    • Interpret a probabilistic model-based approach to clustering using mixture models
    • Describe model parameters
    • Motivate the utility of soft assignments and describe what they represent
    • Discuss issues related to how the number of parameters grows with the number of dimensions
      • Interpret diagonal covariance versions of mixtures of Gaussians
    • Compare and contrast mixtures of Gaussians and k-means
    • Implement an EM algorithm for inferring soft assignments and cluster parameters
      • Determine an initialization strategy
      • Implement a variant that helps avoid overfitting issues
    • Implementing EM for Gaussian mixtures
    • Clustering text data with Gaussian mixtures
  • Week 5 Latent Dirichlet Allocation: Mixed Membership Modeling

    • Compare and contrast clustering and mixed membership models
    • Describe a document clustering model for the bag-of-words doc representation
    • Interpret the components of the LDA mixed membership model
    • Analyze a learned LDA model
      • Topics in the corpus
      • Topics per document
    • Describe Gibbs sampling steps at a high level
    • Utilize Gibbs sampling output to form predictions or estimate model parameters
    • Implement collapsed Gibbs sampling for LDA
    • Modeling text topics with Latent Dirichlet Allocation
  • Week 6 Hierarchical Clustering & Closing Remarks

    • Bonus content: Hierarchical clustering
      • Divisive clustering
      • Agglomerative clustering
        • The dendrogram for agglomerative clustering
        • Agglomerative clustering details
    • Hidden Markov models (HMMs): Another notion of “clustering”
    • Modeling text data with a hierarchy of clusters
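
The Week 3 view of k-means as a coordinate descent algorithm (alternate between assigning points to the nearest centroid and recomputing each centroid) can be sketched in plain Python as follows. This is a minimal illustration, not the course's implementation: the function names, the toy points, and the choice to pass initial centroids explicitly are assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_clusters(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)),
            key=lambda k: squared_distance(point, centroids[k]))
        for point in points
    ]


def update_centroids(points, assignments, centroids):
    """Update step: move each centroid to the mean of its members."""
    new_centroids = []
    for k, old_centroid in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == k]
        if not members:
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(old_centroid)
            continue
        dim = len(old_centroid)
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return new_centroids


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate the two steps until the assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iter):
        new_assignments = assign_clusters(points, centroids)
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        centroids = update_centroids(points, assignments, centroids)
    return centroids, assignments


# Two well-separated toy groups (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = kmeans(points, initial_centroids=[points[0], points[3]])
```

With these points and starting centroids, the first three points end up in cluster 0 and the last three in cluster 1. Results in general depend on the initialization, so a real run would usually try several starting points or a smarter seeding scheme.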