Dmitry Efimov committed 0fde3cc Mar 31, 2017


MTH594 Advanced data mining: theory and applications

Materials for the course MTH 594 Advanced data mining: theory and applications, taught by Dmitry Efimov at the American University of Sharjah, UAE, in the Spring 2016 semester. The course syllabus can be downloaded from the syllabus folder.

To compose these lectures I mainly used ideas from three sources:

  1. Stanford lectures by Andrew Ng on YouTube
  2. The book "The Elements of Statistical Learning" by T. Hastie, R. Tibshirani and J. Friedman
  3. Lectures by Andrew Ng on Coursera

All uploaded PDF lectures are adapted to help students understand the material.

The supplementary files in the ipython folder are meant to teach students how to use built-in methods to train the models in Python 2.7.
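To give a feel for what a built-in fitting routine computes, here is a minimal sketch of the first topic in the outline: linear regression solved in closed form (the normal equations, specialised to one input variable). It is not taken from the notebooks. The function name `fit_simple_linear_regression` and the toy data are hypothetical, and the code is plain Python with no libraries, so it should run on Python 2.7 as well as Python 3.

```python
from __future__ import division, print_function  # keeps the sketch valid on Python 2.7 and 3


def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y ~ intercept + slope * x for a single feature.

    Closed-form normal-equations solution for one input variable.
    The function name is hypothetical and not part of the course code.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sum of squares and sum of cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (an assumption for illustration) generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)
```

In practice a library routine would handle many features and numerical edge cases; this sketch only shows the one-variable case.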

If you find any mistakes or typos, please email me. This course is new for me, so there are probably some :)

The content of the lectures:

Supervised learning

Linear and logistic regressions, perceptrons

Linear regression

Analytical minimization: normal equations

Statistical interpretation

Logistic regression


Bayesian interpretation and regularization

Python implementation

Linear regression
Logistic regression

Methods of optimization

Gradient descent

Examples of gradient descent

Newton's method

Python implementation

Batch gradient descent
Stochastic gradient descent

Generalized linear models (GLM)

Exponential family

Generalized Linear Models (GLM)

Python implementation

Softmax regression

Generative learning algorithms

General idea of generative algorithms


Gaussian discriminant analysis

Generative vs discriminative comparison

Naive Bayes

Laplace smoothing

Event models

Python implementation

Gaussian discriminant analysis
Naive Bayes

Neural networks



Python implementation

Support vector machines

Support vector machines: intuition

Primal/dual optimization problem and KKT

SVM dual


Kernel examples

Kernel testing

SVM with kernels

Soft margin

SMO algorithm

Python implementation

Coordinate ascent
SMO algorithm

Nonparametric methods

Locally weighted regression

Generalized additive models (GAM)

GAM for regression

GAM for classification

Tree-based methods

Regression trees

Classification trees


Exponential loss
Gradient boosting
Gradient tree boosting

Python implementation

Locally weighted regression
GAM for regression
GAM for classification
Regression decision trees
Classification decision trees
Gradient tree boosting

Learning theory

Bias / variance

Empirical risk minimization (ERM)

Union bound / Hoeffding inequality

Uniform convergence

VC dimension

Model selection

Feature selection

Python implementation

Cross validation

Online learning

Advice for applying ML algorithms

Unsupervised learning



Python implementation

Mixture of Gaussians and EM algorithm

Mixture of Gaussians

Jensen's inequality

General EM algorithm

EM algorithm for the mixture of Gaussians

EM algorithm for the mixture of Naive Bayes

Python implementation

Mixture of Gaussians
EM algorithm for mixture of Gaussians

Factor analysis


Marginal and conditionals for Gaussians

Factor analysis model

EM steps for factor analysis

Python implementation

Principal component analysis

PCA algorithm

Latent semantic indexing (LSI)

Python implementation

Independent component analysis (ICA)





