
Machine Learning From Scratch


This is my repository for learning Machine Learning from scratch. If you want to check out my repository on Deep Learning with TensorFlow, click here 👉: TensorFlow-Deep-Learning

Table of contents

| Number | Notebook | Description | Extras |
| ------ | -------- | ----------- | ------ |
| 00 | Basic ML Intuition | What is ML? Bias and variance | |
| 01 | Data Preprocess Template | Data preprocessing template | |
| 02 | Regression | Simple linear regression, multiple, polynomial, ... | |
| 03 | Classification | Logistic regression, KNN, SVM, ... | |
| 04 | Clustering | K-Means clustering, hierarchical clustering | |
| 05 | Association Rule Learning | Apriori, Eclat | |
| 06 | Reinforcement Learning | UCB, Thompson Sampling | |
| 07 | NLP | Introduction to NLP | |
| 08 | Dimensionality Reduction | PCA, Kernel PCA, LDA | |
| ## | Model Selection | Model selection: regression, classification | |
| ## | Case Study | Case study | |

Details

Basic Intuition

Math

Machine learning Fundamentals


Regression

| Number | Notebook | Extras |
| ------ | -------- | ------ |
| 01 | Simple Linear Regression | |
| 02 | Multiple Linear Regression | When to use multiple linear regression |
| 03 | Polynomial Regression | Polynomial Regression |
| 04 | Support Vector Regression | Introduction to SVR, kernels |
| 05 | Decision Tree Regression | Decision Tree Regression, Decision Tree ML |
| 06 | Random Forest Regression | Random Forest, Random Forest ML |

Regression: Pros and Cons

| Regression Model | Pros | Cons |
| ---------------- | ---- | ---- |
| Linear Regression | Works on any size of dataset, gives information about the relevance of features. | The linear regression assumptions. |
| Polynomial Regression | Works on any size of dataset, works very well on non-linear problems. | Need to choose the right polynomial degree for a good bias/variance trade-off. |
| SVR | Easily adaptable, works very well on non-linear problems, not biased by outliers. | Feature scaling is compulsory, not well documented, more difficult to understand. |
| Decision Tree Regression | Interpretability, no need for feature scaling, works on both linear and non-linear problems. | Poor results on very small datasets; overfitting can easily occur. |
| Random Forest Regression | Powerful and accurate, good performance on many problems, including non-linear ones. | Poor results on very small datasets; overfitting can easily occur. |
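
To make the trade-offs above concrete, here is a minimal sketch that cross-validates these regressors side by side with scikit-learn. The data is synthetic and the hyperparameters are illustrative, not the notebooks' exact code; note how SVR gets its compulsory feature scaling via a pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, purely for illustration
rng = np.random.default_rng(42)
X = rng.random((200, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + 0.1 * rng.standard_normal(200)

models = {
    "Linear Regression": LinearRegression(),
    # SVR needs feature scaling (see cons above), hence the pipeline
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    "Decision Tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```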

Classification

| Number | Notebook | Extras |
| ------ | -------- | ------ |
| 01 | Logistic Regression | StatQuest: Logistic Regression |
| 02 | K-Nearest Neighbours | StatQuest: KNN |
| 03 | Support Vector Machine | StatQuest: SVM |
| 04 | Kernel SVM | StatQuest: Polynomial Kernel, StatQuest: RBF Kernel |
| 05 | Naive Bayes | StatQuest: Naive Bayes, StatQuest: Gaussian Naive Bayes |
| 06 | Decision Tree | StatQuest: Decision Tree Regression |
| 07 | Random Forest Classification | StatQuest: Random Forest |

Classification: Pros and Cons

| Classification Model | Pros | Cons |
| -------------------- | ---- | ---- |
| Logistic Regression | Probabilistic approach, gives information about the statistical significance of features. | The logistic regression assumptions. |
| K-NN | Simple to understand, fast and efficient. | Need to choose the number of neighbours K. |
| SVM | Performant, not biased by outliers, not sensitive to overfitting. | Not appropriate for non-linear problems, not the best choice for a large number of features. |
| Kernel SVM | High performance on non-linear problems, not biased by outliers, not sensitive to overfitting. | Not the best choice for a large number of features, more complex. |
| Naive Bayes | Efficient, not biased by outliers, works on non-linear problems, probabilistic approach. | Based on the assumption that features have the same statistical relevance. |
| Decision Tree Classification | Interpretability, no need for feature scaling, works on both linear and non-linear problems. | Poor results on very small datasets; overfitting can easily occur. |
| Random Forest Classification | Powerful and accurate, good performance on many problems, including non-linear ones. | No interpretability, overfitting can easily occur, need to choose the number of trees. |
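
As with regression, the classifiers above can be compared in a few lines. This is a minimal sketch on synthetic data with illustrative settings; the distance- and margin-based models (K-NN, kernel SVM) are wrapped with scaling:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class dataset, purely for illustration
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    # scale features for the distance/margin-based models
    "K-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Kernel SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: accuracy = {acc.mean():.3f}")
```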

Clustering

| Number | Notebook | Extras |
| ------ | -------- | ------ |
| 01 | K-Means | StatQuest: K-Means Clustering, WCSS and the elbow method |
| 02 | Hierarchical | StatQuest: Hierarchical Clustering, dendrogram method |

Clustering: Pros and Cons

| Clustering Model | Pros | Cons |
| ---------------- | ---- | ---- |
| K-Means | Simple to understand, easily adaptable, works well on small or large datasets, fast, efficient and performant. | Need to choose the number of clusters. |
| Hierarchical Clustering | The optimal number of clusters can be obtained by the model itself, practical visualization with the dendrogram. | Not appropriate for large datasets. |
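
Here is a minimal sketch of choosing K with the WCSS/elbow method mentioned above, plus a hierarchical fit, on made-up blob data:

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Synthetic blobs with 4 true clusters, purely for illustration
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# K-Means: compute WCSS (inertia_) for k = 1..10 and look for the "elbow"
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in range(1, 11)]
print([round(w) for w in wcss])  # the drop flattens around the true k

# Hierarchical clustering with Ward linkage; a dendrogram
# (e.g. via scipy.cluster.hierarchy) visualizes the merge order
labels = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X)
print(labels[:10])
```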

Association Rule Learning

| Number | Notebook | Extras |
| ------ | -------- | ------ |
| 01 | Apriori | Apriori Algorithm |
| 02 | Eclat | |
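
scikit-learn does not implement Apriori or Eclat, so these notebooks rely on another implementation. Purely as an illustration, and assuming the third-party mlxtend package (not necessarily what the notebooks use), mining frequent itemsets looks like this:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder

# Toy market-basket transactions, made up for illustration
transactions = [
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "bread", "butter"],
    ["milk", "butter"],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# Frequent itemsets with support >= 0.5; rules can then be derived
# from these itemsets by thresholding confidence or lift
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
print(frequent)
```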

Reinforcement Learning

| Number | Notebook | Extras |
| ------ | -------- | ------ |
| 01 | Upper Confidence Bound | Confidence Bounds, UCB and the multi-armed bandit problem |
| 02 | Thompson Sampling | Thompson Sampling |

The Multi-Armed Bandit Problem
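
A minimal NumPy sketch of the UCB1 strategy on a Bernoulli multi-armed bandit; the arm reward probabilities are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_probs = np.array([0.2, 0.5, 0.75])  # hypothetical reward rates per arm
n_arms, n_rounds = len(true_probs), 1000

counts = np.zeros(n_arms)  # times each arm was pulled
values = np.zeros(n_arms)  # running mean reward per arm

for t in range(1, n_rounds + 1):
    if t <= n_arms:
        arm = t - 1  # pull each arm once to initialize
    else:
        # exploit the mean reward, explore via the confidence bound
        ucb = values + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.binomial(1, true_probs[arm])
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print("pulls per arm:", counts)  # most pulls should go to the best arm
```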


NLP

| Number | Notebook | Extras |
| ------ | -------- | ------ |
| 01 | Introduction to NLP | |
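
A typical intro-NLP setup is a bag-of-words representation feeding a Naive Bayes classifier. Here is a minimal sketch on a made-up toy corpus (not necessarily the notebook's dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus: 1 = positive review, 0 = negative review
texts = [
    "great movie, loved it",
    "terrible plot, waste of time",
    "loved the acting, great cast",
    "boring and a waste of money",
]
labels = [1, 0, 1, 0]

# Bag-of-words features + Naive Bayes classifier in one pipeline
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["loved the plot", "boring movie"]))
```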

Dimensionality Reduction

| Number | Notebook | Extras |
| ------ | -------- | ------ |
| 01 | Principal Component Analysis | setosa-PCA example, StatQuest-PCA, plotly-PCA visualization |
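
A minimal PCA sketch on scikit-learn's built-in iris dataset (an illustrative choice, not necessarily the notebook's data): standardize first, project to two components, then inspect how much variance those components retain:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale features, then project the 4-D data down to 2 components
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
X2 = pipe.fit_transform(X)

print(X2.shape)  # (150, 2)
# Fraction of total variance captured by each retained component
print(pipe.named_steps["pca"].explained_variance_ratio_)
```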

Model selection

| Number | Notebooks | Extras |
| ------ | --------- | ------ |
| 01 | Regression | |
| 02 | Classification | The accuracy paradox, AUC-ROC and CAP curves, precision, recall and F1 score |
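
As a sketch of model selection in practice: a grid search with 5-fold cross-validation, followed by a classification report giving the precision, recall, and F1 scores mentioned above. The dataset and grid values are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Grid-search SVM hyperparameters with 5-fold CV (grid values illustrative)
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}
grid = GridSearchCV(pipe, param_grid, cv=5).fit(X_train, y_train)

print(grid.best_params_)
# Precision, recall, and F1 per class on the held-out test set
print(classification_report(y_test, grid.predict(X_test)))
```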

Case study

| Number | Notebooks | Extras |
| ------ | --------- | ------ |
| 01 | Logistic Regression | Breast Cancer classifier |
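
A compressed sketch of what such a case study can look like, using scikit-learn's built-in breast cancer dataset (the notebook's actual preprocessing and evaluation may differ):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Scale features, then fit a logistic regression classifier
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print(confusion_matrix(y_test, clf.predict(X_test)))
```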

Extras

Datasets:

  1. ICU dataset
  2. Repo datasets

Blogs:

Acknowledgements:

  • Thanks to Kirill Eremenko and Hadelin de Ponteves for creating such an awesome online course about machine learning.
  • Thanks to Josh Starmer, aka StatQuest, for your brilliant videos about machine learning, which helped me a lot in understanding the math behind the ML algorithms.
  • Thanks to Mr. Vũ Hữu Tiệp for your brilliant blogs about machine learning, which helped me a lot from the days when I didn't even know what machine learning was.
  • Thanks to Mr. Phạm Đình Khánh for your blogs about machine learning and deep learning.
