# Welcome to the BITS-ACM ML SIG 2019-20

[Link to GitHub repository](https://www.github.com/coolsidd/ML_SIG_2019_Lecture/)

# What is Machine Learning?

Let's play a game. Head over to [Quick Draw](https://quickdraw.withgoogle.com/#)

“A field of study that gives computers the ability to learn without being explicitly programmed.”
-Arthur Samuel

**Definition (Tom Mitchell):** A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

### A concrete example - the MNIST dataset
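Suppose you are given an image containing a single handwritten digit. Write a program that outputs the digit (a number from 0 to 9) shown in the image. In terms of the definition above, the task T is recognizing the digit, a natural performance measure P is the fraction of images classified correctly, and the experience E is a collection of example images labelled with their correct digits.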
![MNIST](./mnist.png "MNIST")

# 1. Linear Regression using Gradient Descent

## The core algorithm - Gradient descent
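The main algorithm that allows models such as linear regression and neural networks to learn is Gradient Descent: starting from some initial parameters, it repeatedly adjusts them in the direction that reduces the model's error.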

Let us move on to a simple example. Consider the following problem:  
Given n data points $(x_{0},y_{0}), (x_{1},y_{1}), \ldots, (x_{n-1},y_{n-1})$ such that  
$y_{i} = mx_{i} + c + \eta_{i}$, where $\eta_{i} \sim \mathcal{N}(0,1)$,  
how would we go about finding m and c?  
We take an iterative approach. We start from arbitrary (for example random) values $m_{0}$, $c_{0}$ and measure the error of our model as the sum of squared differences between its predictions and the data:  
$E(m_{0}, c_{0}) = \sum_{i=0}^{n-1}\left((m_{0}x_{i}+c_{0})-y_{i}\right)^2$  
We then tweak the model's parameters m and c so that the error E decreases over time. We do so by moving in the direction opposite to the gradient of E with respect to the parameters, $\left(\frac{\partial E}{\partial m}, \frac{\partial E}{\partial c}\right)$, because the negative gradient is the direction in which E decreases fastest. Our goal is to minimize the error.  
Note that  
$\frac{\partial E}{\partial m}(m_{j},c_{j}) = \sum_{i=0}^{n-1}2x_{i}\left((m_{j}x_{i}+c_{j})-y_{i}\right)$  
$\frac{\partial E}{\partial c}(m_{j},c_{j}) = \sum_{i=0}^{n-1}2\left((m_{j}x_{i}+c_{j})-y_{i}\right)$  
Thus, our update rule becomes  
$m_{j+1} = m_{j} - \alpha \frac{\partial E}{\partial m}(m_{j},c_{j})$  
$c_{j+1} = c_{j} - \alpha \frac{\partial E}{\partial c}(m_{j},c_{j})$  
where $\alpha$ is a small positive step size called the **learning rate**.  
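The update rule above translates almost line for line into NumPy. This is a minimal sketch on assumed synthetic data (the line $y = 3 - 5x$ plus $\mathcal{N}(0,1)$ noise, with x drawn uniformly from [0, 1]). The function names `sse`, `gradient` and `gradient_descent` are hypothetical and not part of the `gradient_visualization` module used below. Because `gradient` implements the two partial derivatives, comparing it against finite differences of `sse` is an easy way to check the formulas.

```python
import numpy as np

def sse(m, c, x, y):
    """E(m, c): sum of squared errors of the line y = m*x + c."""
    return np.sum(((m * x + c) - y) ** 2)

def gradient(m, c, x, y):
    """Analytic partial derivatives (dE/dm, dE/dc) from the formulas above."""
    residual = (m * x + c) - y           # (m x_i + c) - y_i for every point
    return np.sum(2 * x * residual), np.sum(2 * residual)

def gradient_descent(x, y, m0=0.0, c0=0.0, alpha=1e-3, iterations=5000):
    """Repeatedly step against the gradient, scaled by the learning rate alpha."""
    m, c = m0, c0
    for _ in range(iterations):
        grad_m, grad_c = gradient(m, c, x, y)
        m, c = m - alpha * grad_m, c - alpha * grad_c
    return m, c

# Assumed synthetic data: y = 3 - 5x + noise, noise ~ N(0, 1)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100)
y = 3.0 - 5.0 * x + rng.normal(0.0, 1.0, size=100)

m, c = gradient_descent(x, y)
print(f"m = {m:.3f}, c = {c:.3f}, E = {sse(m, c, x, y):.3f}")
```

With α = 0.001 and 5000 iterations this converges to the same least-squares line that `np.polyfit(x, y, 1)` returns. That line is close to, but not exactly, m = -5, c = 3, because of the noise.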

## Visualization - 1.1: Linear Regression
Run the 2 cells below to see gradient descent in action interactively!  

Choose the target line by setting the `m_slope` and `c_intercept` parameters. For example, if you would like your target line to be $y = 3-5x$, set:  
`m_slope = -5`  
`c_intercept = 3`  

Other values that you can change are:  
* `initial_m`, `initial_b`: the starting values $m_{0}$ and $c_{0}$ (the printed output calls the intercept b)  
* `num_pts`: number of points to consider (max: 200)  
* `learning_rate`: the step size parameter $\alpha$ discussed earlier. In general, a small $\alpha$ leads to slow convergence, while too large a value makes the updates overshoot and diverge until the parameters become NaN. For this demo, a reasonable range is \[0.0001, 0.001\].

```python
# imports
from gradient_visualization import get_line
import numpy as np
np.random.seed(42)
```

```python
%matplotlib notebook
get_line(
    m_slope=-1,
    c_intercept=60,

    initial_m=-5,
    initial_b=10,

    num_pts=100,
    learning_rate=0.0007
)
```

```
Starting gradient descent at:
 b = 10, m = -5, error = 26185.13058985488
After 5000 iterations:
 b = 49.23219703482182, m = -0.7023283812180963, error = 97.816929365528
```
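The learning-rate advice above can be made precise for the squared error E(m, c). E is quadratic in (m, c), so its Hessian H is constant, and gradient descent converges exactly when $0 < \alpha < 2/\lambda_{\max}(H)$, where $\lambda_{\max}$ is the largest eigenvalue of H. The sketch below uses an assumed noise-free line $y = 3 - 5x$ with 100 points in [0, 10], which is not the demo's data. It computes that limit and runs gradient descent at half the limit and at 1.5 times the limit. The helper names are hypothetical.

```python
import numpy as np

def run_gradient_descent(x, y, alpha, iterations=5000):
    """Gradient descent on E(m, c) = sum((m*x + c - y)**2), starting from m = c = 0."""
    m, c = 0.0, 0.0
    with np.errstate(over="ignore", invalid="ignore"):  # a diverging run overflows to inf/NaN
        for _ in range(iterations):
            residual = (m * x + c) - y
            m, c = m - alpha * np.sum(2 * x * residual), c - alpha * np.sum(2 * residual)
    return m, c

def max_stable_learning_rate(x):
    """Largest alpha for which gradient descent on the quadratic E(m, c) still converges."""
    # E is quadratic in (m, c), so its Hessian is constant: H = 2 * [[sum x^2, sum x], [sum x, n]]
    hessian = 2 * np.array([[np.sum(x * x), np.sum(x)],
                            [np.sum(x), len(x)]])
    # Convergence requires |1 - alpha * lam| < 1 for every eigenvalue lam of H
    return 2.0 / np.linalg.eigvalsh(hessian).max()

x = np.linspace(0.0, 10.0, 100)
y = 3.0 - 5.0 * x  # noise-free target line, so the exact answer is m = -5, c = 3
limit = max_stable_learning_rate(x)

m_ok, c_ok = run_gradient_descent(x, y, alpha=0.5 * limit)
m_bad, c_bad = run_gradient_descent(x, y, alpha=1.5 * limit)
print(f"limit ~ {limit:.2e}")
print(f"alpha = 0.5 * limit: m = {m_ok:.4f}, c = {c_ok:.4f}")
print(f"alpha = 1.5 * limit: m = {m_bad}, c = {c_bad}")
```

For this data the limit comes out at roughly 3e-4. The first run recovers m = -5, c = 3, while the second overflows to inf and then NaN, which is the failure mode described above. The limit shrinks as the x-values grow, because $\sum x_{i}^2$ enters the Hessian. This is why a "reasonable" learning rate depends on the data.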

### Advantages and disadvantages of Linear Regression

Advantages:
* Fast (few computations per update)
* Easy to implement

Disadvantages:
* Assumes the output depends (approximately) linearly on the inputs
* Can only be used to predict continuous values

# 2. Understanding Polynomial Regression
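Suppose you are given some n = 200 data points $(x_{0},y_{0}), (x_{1},y_{1}), \ldots, (x_{n-1},y_{n-1})$ sampled from some unknown function.  
How would you estimate that function's value at some other point (say $\pi$)?  
<br>
As a first step, you could try to approximate the unknown function using a polynomial, i.e.  
find a polynomial $f(x) = a_{0} + a_{1}x + a_{2}x^{2} + \ldots + a_{n-1}x^{n-1}$ such that  
$f(x_{0}) = y_{0}$  
$f(x_{1}) = y_{1}$  
...  
$f(x_{n-1}) = y_{n-1}$  

With n coefficients and n equations (and distinct $x_{i}$), such a polynomial passes exactly through every point; this is called interpolation. In practice we usually choose a much lower degree d, which the visualization below lets you pick, and find the coefficients $a_{0}, \ldots, a_{d}$ that minimize the squared error $\sum_{i=0}^{n-1}(f(x_{i})-y_{i})^2$. This is what makes it polynomial *regression* rather than interpolation.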
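The sketch below fits polynomials by least squares using NumPy's `np.vander` and `np.linalg.lstsq`. It is not necessarily how the `Polynomial_Fit` module works internally, and the helper names `fit_polynomial` and `evaluate` are hypothetical. It assumes 5 noise-free samples of $\sqrt{x}$ on [1, 5]. With degree n - 1 = 4 the fit interpolates every point; with degree 2 it only approximates them. Either fit can then be evaluated at $\pi$.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Coefficients a_0..a_degree of f(x) = a_0 + a_1 x + ... minimising sum (f(x_i) - y_i)^2."""
    # Vandermonde matrix: row i is [1, x_i, x_i^2, ..., x_i^degree]
    A = np.vander(x, N=degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def evaluate(coeffs, x):
    """Evaluate f at the point(s) x; always returns a 1-D array."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.vander(x, N=len(coeffs), increasing=True) @ coeffs

x = np.linspace(1.0, 5.0, 5)
y = np.sqrt(x)

exact = fit_polynomial(x, y, degree=4)   # degree n - 1: passes through all 5 points
approx = fit_polynomial(x, y, degree=2)  # lower degree: least-squares approximation only

print("f(pi), degree 4:", evaluate(exact, np.pi)[0])
print("f(pi), degree 2:", evaluate(approx, np.pi)[0])
print("true sqrt(pi):  ", np.sqrt(np.pi))
```

With only 5 samples, the degree-4 estimate of $f(\pi)$ is already close to $\sqrt{\pi}$. On noisy data, however, pushing the degree up towards n - 1 tends to make the curve chase the noise and oscillate between the points.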

## Visualization - 2.1: Polynomial regression 
Run the 2 cells below to interactively see how a given set of data points can be approximated by polynomials. You are free to choose the degree of the polynomial and the amount of noise in the data by adjusting the corresponding sliders.

* `func_to_fit`: the function to be approximated.  
  Functions to try:  
    * y = $\sqrt x$:  `np.sqrt`
    * y = x:  `polynomial(0,1)`
    * y = $3-4x+5x^2$:   `polynomial(3,-4,5)`
* `x_range`: the interval (start, end) on the x-axis from which points are taken
* `num_points`: number of points to consider (max: 200)

```python
# imports - no need to worry about these
%matplotlib notebook
import warnings
import Polynomial_Fit
from Polynomial_Fit import polynomial
import numpy as np
warnings.filterwarnings('ignore')
```

```python
Polynomial_Fit.poly_regression(func_to_fit=np.sqrt, x_range=(1,50))
```

### So why do we require more complex algorithms when we can always use Polynomial Regression?
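* Strong assumptions about the data: it must be well described by a polynomial of the chosen degree
* The degree (and which terms to include) must be chosen by hand; too high a degree fits the noise instead of the underlying function
* High sensitivity to outliers
* State explosion in the multivariate case: the number of possible terms grows combinatorially with the number of input variables
* Computationally expensive: fitting a polynomial of degree n-1 through n points requires solving (or inverting) an n×n linear system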
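The "state explosion" point can be made concrete. A polynomial of total degree at most d in k input variables has $\binom{k+d}{d}$ possible terms (monomials), one for each multiset of at most d variable indices. The sketch below counts them both by the closed form and by explicit enumeration; the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def count_terms(num_vars, degree):
    """Number of monomials of total degree <= degree in num_vars variables."""
    return comb(num_vars + degree, degree)

def list_terms(num_vars, degree):
    """Each monomial x_i * x_j * ... (i <= j <= ...) is one multiset of variable indices."""
    terms = []
    for total_degree in range(degree + 1):
        terms.extend(combinations_with_replacement(range(num_vars), total_degree))
    return terms

# () is the constant term, (0, 1) is x_0 * x_1, (1, 1) is x_1 ** 2
print(list_terms(2, 2))  # [(), (0,), (1,), (0, 0), (0, 1), (1, 1)]
for k in (1, 10, 100):
    print(f"{k:>3} variables, degree 3: {count_terms(k, 3)} terms")
```

The loop prints 4, 286 and 176851 terms. A cubic model in 100 input variables therefore already has more than 176,000 coefficients to fit.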

# 3. What are Neural Networks?


## 3.1 Why plain (fully connected) Neural Networks don't work well on images - CNNs
![CNN](./cnn_animation.gif "CNN anim")

## 3.2 Applying Neural Networks to sequential data - RNNs
![RNN](./seq_model.gif "RNN anim")

# 4. Clustering Data Points (Unsupervised)

### What is clustering?  
Clustering is the task of dividing a population of data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.

Suppose you are given n points $(x_{0},y_{0}), (x_{1},y_{1}), \ldots, (x_{n-1},y_{n-1})$ on a plane. You can probably cluster them easily by eye.  
But how would you go about writing a program that can cluster any set of points?  
This is where clustering algorithms come into the picture. They are given only the input data, with no labels or other additional information (which is why they are called *unsupervised*), and they extract useful structure from it.

## Visualization 4.1 - Clustering points
Run the cell below to see the DBSCAN clustering algorithm in action.  
Click on the white canvas to add a point; the algorithm continuously re-clusters all the points on the screen.  
Points belonging to the same cluster have the same colour.  
DBSCAN (**D**ensity-**B**ased **S**patial **C**lustering of **A**pplications with **N**oise) finds core samples in regions of high density and expands clusters from them; points that are not reachable from any core sample are labelled as noise.
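To make "core samples" and "expands clusters" concrete, here is a minimal from-scratch sketch in NumPy. We don't know which implementation `cluster_pts` uses internally. The parameter names `eps` (neighbourhood radius) and `min_samples` (points within that radius, including the point itself, needed for a core sample) are borrowed from scikit-learn's `DBSCAN` and are assumptions here. The toy data (two small grids and one isolated point) is hypothetical.

```python
import numpy as np

NOISE = -1

def dbscan(points, eps=0.5, min_samples=4):
    """Minimal DBSCAN sketch: returns a cluster id (0, 1, ...) or NOISE for every point."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances (O(n^2) memory, fine for a few hundred points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Neighbourhood of each point within radius eps (includes the point itself)
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbours])

    labels = np.full(n, NOISE)
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue  # only an unassigned core point can start a new cluster
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()  # p is always a core point
            for q in neighbours[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster_id  # core or border point joins the cluster
                    if is_core[q]:
                        stack.append(q)     # only core points expand the cluster further
        cluster_id += 1
    return labels

# Two tight 5x5 grids of points far apart, plus one isolated point
grid = np.array([(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)])
points = np.vstack([grid, grid + 5.0, [[10.0, -10.0]]])
labels = dbscan(points, eps=0.5, min_samples=4)
print(labels)
```

For this input the first grid is labelled 0, the second grid 1, and the isolated point -1 (noise). Border points are assigned to whichever cluster reaches them first, so their labels can depend on the input order. The same is true of standard DBSCAN.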

```python
%matplotlib notebook
from Clustering_interactive import cluster_pts
cluster_pts()
```

Comparison of different clustering algorithms 
![comparison_of_clusters](./cluster_comparison.png "Comparison of different clustering algorithms")


# Table of Contents

1.  [Resources](#orgc2333fa)
    1.  [Prerequisites](#org938ebca)
        1.  [Python](#org7ce6723)
        2.  [Jupyter](#orgcd5df0b)
        3.  [Github](#org4100c87)
        4.  [Numpy and other Libraries](#orga5103c5)
        5.  [Common Terminologies](#orgcc02c15)
        6.  [Links](#org4d4f432)
    2.  [Resource Hubs](#orga30d318)
    3.  [Machine Learning Intro](#org1d02013)
    4.  [Neural Networks](#org612036b)
    5.  [Clustering](#org65b14fe)
    6.  [What are Genetic Algorithms?](#org3c605fe)
    7.  [Cool Applications of AI](#orga1e6d9c)



<a id="orgc2333fa"></a>

# Resources


<a id="org938ebca"></a>

## Prerequisites


<a id="org7ce6723"></a>

### Python

-   [Corey Schafer Videos](https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7)
-   [Automate the Boring Stuff](https://automatetheboringstuff.com/): for other interesting Python projects


<a id="orgcd5df0b"></a>

### Jupyter

-   [Jupyter Notebook Quickstart](https://jupyter.readthedocs.io/en/latest/install.html)


<a id="org4100c87"></a>

### Github

-   [Github Guides](https://guides.github.com/)
-   [Git tower videos](https://www.git-tower.com/learn/git/videos)
-   [Git tower ebook](https://www.git-tower.com/learn/git/ebook/en/command-line/introduction)


<a id="orga5103c5"></a>

### Numpy and other Libraries

-   [Numpy Quickstart guide](https://docs.scipy.org/doc/numpy/user/quickstart.html)
-   [CS231N Tutorials](http://cs231n.github.io/python-numpy-tutorial/)


<a id="orgcc02c15"></a>

### Common Terminologies

Learn about the following terminologies:

-   Accuracy, Precision and F-Score
-   Curse of Dimensionality
-   Bias-Variance Tradeoff
-   Loss Functions


<a id="org4d4f432"></a>

### Links

-   [Wikipedia Accuracy vs Precision](https://en.wikipedia.org/wiki/Accuracy_and_precision)
-   [Wikipedia Bias vs Variance](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff)
-   [Wikipedia Curse of Dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality)


<a id="orga30d318"></a>

## Resource Hubs

-   [Awesome Deep Learning - Github \*\*](https://github.com/ChristosChristofidis/awesome-deep-learning)


<a id="org1d02013"></a>

## Machine Learning Intro

-   [Visual Introduction to Machine Learning Part 1](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/)
-   [Visual Introduction to Machine Learning Part 2](http://www.r2d3.us/visual-intro-to-machine-learning-part-2/)
-   [Introduction to Curse of Dimensionality](http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/)
-   [Google's Crash Course](https://developers.google.com/machine-learning/crash-course)


<a id="org612036b"></a>

## Neural Networks
-   [Tensorflow Playground visualisation](https://playground.tensorflow.org/)
-   [Neural Networks as arbitrary function fitters](http://neuralnetworksanddeeplearning.com/chap4.html)
-   [3B1B What is a Neural Network?](https://www.youtube.com/watch?v=aircAruvnKk)
-   [Computational Graphs Explained](https://medium.com/tebs-lab/deep-neural-networks-as-computational-graphs-867fcaa56c9)


<a id="org65b14fe"></a>

## Clustering

-   [Wikipedia Clustering](https://en.wikipedia.org/wiki/Cluster_analysis)
-   [5 Clustering Algorithms](https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68)


<a id="org3c605fe"></a>

## What are Genetic Algorithms?

-   [UC Davis Introduction to Genetic Algorithms](https://web.cs.ucdavis.edu/~vemuri/classes/ecs271/Genetic%20Algorithms%20Short%20Tutorial.htm)
-   [Evolutionary computation course (AEC 02 and 03 only)](https://github.com/lmarti/evolutionary-computation-course)
-   [Genetic Algorithms simply explained](https://lethain.com/genetic-algorithms-cool-name-damn-simple)
-   [David Goldberg's Book on Genetic Algorithms and Soft Computing](./David_E_Goldberg.pdf)


<a id="orga1e6d9c"></a>

## Cool Applications of AI

-   [Pixsrv](http://affinelayer.com/pixsrv)
-   [This person does not exist](https://thispersondoesnotexist.com/)
-   [Quick Draw](https://quickdraw.withgoogle.com/)
-   [Google-AI Experiments](https://experiments.withgoogle.com/collection/ai)

