# <center>Hands-on Introduction to Machine Learning</center>

<img src="images/ml_.jpg" alt="Machine Learning" width="500px"/>

## What is Machine Learning?
Machine learning is the field of computing aimed at helping computers learn without being explicitly programmed.
This means we don't write a set of rules by hand; instead, we let the computer learn from data by tuning certain parameters.


## Categories of Machine Learning

### Supervised Learning

Supervised learning is a technique where both the features (input data) and the expected results (labels) are fed to the machine learning algorithm until the algorithm learns which result a given set of features maps to. The trained algorithm is then tested on new data to validate its performance.

**Sample Supervised Algorithms:**
* Linear Regression
* Logistic Regression
* Support Vector Machines (SVMs)
* Decision Trees and Random Forests
* Neural networks

<img src="images/super.png" alt="Supervised Learning" width="500px"/>
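
As a minimal sketch of this fit-then-predict workflow, the example below trains scikit-learn's `LogisticRegression` on a tiny made-up dataset and then asks it to label two unseen examples. The data (hours studied versus pass/fail) and the variable names are invented purely for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied (feature) and pass/fail (label)
hours_studied = [[1], [2], [3], [4], [6], [7], [8], [9]]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

# Training: the algorithm sees both the features and the expected results
model = LogisticRegression()
model.fit(hours_studied, passed)

# Testing: the trained model only sees features it has not seen before
new_students = [[2.5], [7.5]]
predictions = model.predict(new_students)
print(predictions)
```

The model never sees labels for `new_students`; it has to infer them from what it learned during `fit()`.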


### Unsupervised Learning

Unsupervised learning feeds the learning algorithm data without the expected output; the algorithm relies on the features alone to find patterns in the data. It is generally used for finding hidden patterns or groups in a dataset or population.

**Sample Unsupervised Algorithms:**
* k-Means
* Hierarchical Cluster Analysis (HCA)
* Expectation Maximization

<img src="images/unsuper.png" alt="Unsupervised Learning" width="500px"/>
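
To make the idea of finding patterns without labels concrete, here is a small sketch using scikit-learn's `KMeans`. The six points are made up for illustration, and the choice of two clusters is an assumption we pass in ourselves; the algorithm is never told which point belongs to which group.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: two visibly separate groups of points
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])

# We only tell k-means how many groups to look for, not what they are
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print(labels)                   # cluster index assigned to each point
print(kmeans.cluster_centers_)  # the center of each discovered group
```

The cluster numbers themselves are arbitrary; what matters is that nearby points end up with the same number.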

### Reinforcement Learning

<img src="images/rein.png" alt="Reinforcement Learning" width="500px"/>

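Reinforcement Learning trains an agent by letting it interact with an environment: the agent takes actions, receives rewards or penalties in return, and gradually learns a policy (a strategy for choosing actions) that maximizes its total reward.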
Many robots implement Reinforcement Learning algorithms to learn how to walk. DeepMind’s AlphaGo program is also a good example of Reinforcement Learning: it made the headlines in March 2016 when it beat the world champion Lee Sedol at the game of Go.
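
As a rough illustration of an agent learning from rewards, the sketch below uses an epsilon-greedy strategy on a two-option "slot machine" problem. This is a deliberately tiny toy, not how AlphaGo or walking robots are trained; the reward probabilities, the exploration rate and all names are assumptions made up for the example.

```python
import random

random.seed(0)

# Hypothetical environment: action 1 pays off more often than action 0
true_reward_prob = [0.2, 0.8]

estimates = [0.0, 0.0]  # the agent's current estimate of each action's value
counts = [0, 0]         # how many times each action has been tried
epsilon = 0.1           # fraction of the time the agent explores at random

for step in range(2000):
    if random.random() < epsilon:
        action = random.randrange(2)                          # explore
    else:
        action = max(range(2), key=lambda a: estimates[a])    # exploit
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    counts[action] += 1
    # Move the running average for this action toward the observed reward
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = max(range(2), key=lambda a: estimates[a])
print("Estimated values:", estimates)
print("Preferred action:", best_action)
```

Nobody tells the agent which action is better; it works that out from the rewards it collects.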

## Tools for Machine Learning

<img src="images/stats.png" alt="Machine Learning Tools" width="500px"/>

### Numerical Analysis

1. NumPy installed via **pip install numpy**

### Data Wrangling and Analysis Tools

1. Pandas installed via **pip install pandas**

### Data Visualization Tools

1. Matplotlib installed via **pip install matplotlib**
2. Seaborn installed via **pip install seaborn**

### Deep Learning Tools and Libraries

1. TensorFlow installed via **pip install tensorflow**
2. Keras installed via **pip install keras**
3. PyTorch installed via **pip install torch**

### Statistical Modelling

1. Scikit-learn installed via **pip install scikit-learn**

### <center> Wait, I thought this was supposed to be hands-on? </center>
<center> <img src="images/tenor.gif" alt="Hold Up" width="500px"/> </center>

## Let's get our hands dirty with code

We will be working on the iris dataset, using petal length, petal width, sepal length and sepal width to tell three iris species (iris setosa, iris virginica and iris versicolor) apart.

To follow along you will need to install numpy, pandas, seaborn, matplotlib and scikit-learn.
<center> <img src="images/iris.jpg" alt="Iris dataset" width="500px"/> </center>
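
One quick way to check that everything is installed before going further is sketched below. It assumes the usual import names (note that scikit-learn is imported as `sklearn`) and only reports what it finds; it does not install anything.

```python
import importlib.util

# Import names of the packages used in this session
packages = ["numpy", "pandas", "seaborn", "matplotlib", "sklearn"]

# find_spec returns None when a package cannot be found
status = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

Anything reported as missing can be installed with the matching **pip install** command from the tools list.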

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

iris = pd.read_csv("dataset/Iris.csv") #Read in data
iris.head(10) #Show the first 10 rows of data

In [None]:
#Remove the ID column as it doesn't give any useful information about the flowers we are trying to classify
iris.drop('Id',axis=1,inplace=True)

In [None]:
iris.head()

In [None]:
#Check for data inconsistency like missing data etc
iris.info()

In [None]:
#Get Statistical data regarding the dataset
iris.describe()
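
If you prefer an explicit count of missing values per column, `isnull().sum()` is a common companion to `info()`. The sketch below uses a small made-up frame with one deliberately missing value so the result is easy to see; the frame and its values are hypothetical and not taken from the iris file.

```python
import numpy as np
import pandas as pd

# Hypothetical mini dataset with one missing petal length
sample = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 6.3],
    "PetalLengthCm": [1.4, np.nan, 6.0],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# Count the missing entries in each column
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

Running the same call on `iris` gives one count per column of the real dataset.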

## Visualize our data using Matplotlib and Seaborn

As the popular saying goes, a picture is worth a thousand words. Data visualization gives us a pictorial representation of the trends within the data.

In [None]:
#plot() returns a Matplotlib Axes object; we reuse it so all three species share one chart
ax = iris[iris.Species=='Iris-setosa'].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='orange', label='Setosa')
iris[iris.Species=='Iris-versicolor'].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='blue', label='Versicolor',ax=ax)
iris[iris.Species=='Iris-virginica'].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='green', label='Virginica', ax=ax)
ax.set_xlabel("Sepal Length (cm)")
ax.set_ylabel("Sepal Width (cm)")
ax.set_title("Sepal Length VS Sepal Width")
fig=plt.gcf() #Get the current figure so we can resize it
fig.set_size_inches(10,6)
plt.show()

In [None]:
plt.figure(figsize=(15,10))
plt.subplot(2,2,1)
sns.violinplot(x='Species',y='SepalWidthCm',data=iris)
plt.subplot(2,2,2)
sns.violinplot(x='Species',y='SepalLengthCm',data=iris)
plt.subplot(2,2,3)
sns.violinplot(x='Species',y='PetalWidthCm',data=iris)
plt.subplot(2,2,4)
sns.violinplot(x='Species',y='PetalLengthCm',data=iris)

## Training and testing our model

Scikit-learn (sklearn) provides numerous algorithms for solving classification problems. We will be focusing on **support vector machines (SVM)** in this hands-on session.

The support vector machine is an algorithm that finds boundaries between classes of data. Using a technique called the kernel trick, it can implicitly transform the data so that classes that cannot be separated by a straight line become separable. It is a very good classification algorithm.
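
To get a feel for what the kernel trick buys us, the sketch below compares a linear SVM with one using the RBF kernel on a synthetic "circle inside a circle" dataset generated by scikit-learn's `make_circles`. The dataset and its settings are assumptions chosen for illustration; they have nothing to do with the iris data.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: one class forms a small circle inside the other class
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A straight line cannot separate the two circles...
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
# ...but the RBF kernel lets the SVM draw a curved boundary
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print("Linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:   ", rbf_acc)
```

The only difference between the two models is the `kernel` argument, which is why the kernel choice matters so much for data that is not linearly separable.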

To train our model, we will:
1. Divide our data into a training set and a test set. The training set is fed to the **fit()** function of the SVM along with its labels, while the test set is fed to the **predict()** function without its labels.
2. Compare the predictions on the test set with the true labels. The fraction of correct predictions is the accuracy of our model.

**But not before we import the necessary packages**

In [None]:
from sklearn import svm  #import Support Vector Machine (SVM) Algorithm from scikit learn
from sklearn import metrics #import metrics from sklearn, used to evaluate the SVM's predictions against the true labels
from sklearn.model_selection import train_test_split #train_test_split splits our data into train and test sets (it lived in sklearn.cross_validation in old releases)

In [None]:
print("Shape of original data: ", iris.shape)
train, test = train_test_split(iris, test_size = 0.3) #splitting the data into 30% testing and hence 70% training
print("Shape of train data: ", train.shape)
print("Shape of test data: ", test.shape)
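
Because the split is random, the rows that land in each set (and therefore the accuracy) change on every run. A hedged sketch of one way to make it repeatable is shown below: `random_state` fixes the shuffle and `stratify` keeps the species proportions equal in both sets. It uses the copy of iris bundled with scikit-learn so that it runs on its own, which means the column names differ from `Iris.csv`; the seed value 42 is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled iris data as a DataFrame; 'target' holds the species as 0, 1, 2
iris_frame = load_iris(as_frame=True).frame

train_df, test_df = train_test_split(
    iris_frame,
    test_size=0.3,
    random_state=42,                # arbitrary seed, fixes the shuffle
    stratify=iris_frame["target"],  # same species mix in train and test
)

print("Shape of train data: ", train_df.shape)
print("Shape of test data: ", test_df.shape)
print(test_df["target"].value_counts())
```

With the CSV used in this notebook, the equivalent would be passing `stratify=iris.Species` to the existing call.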

In [None]:
train_X = train[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']] #Separating feature data from expected label or class
train_y=train.Species #Storing expected label separately

train_X.head()

In [None]:
test_X = test[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']] #Separating feature data from expected label or class
test_y=test.Species #Storing expected label separately

test_y.head()

In [None]:
model = svm.SVC() #create instance of SVM
model.fit(train_X,train_y) #Feed the training features and their labels to the fit method for training
prediction=model.predict(test_X) #Pass test data to the predict method to see how it performs on data whose labels it doesn't know
print('SVM Model accuracy: ',metrics.accuracy_score(test_y,prediction)) #Compare the true test labels with the model's predictions
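
Accuracy is a single number, so it cannot tell us which species get confused with each other. One possible next step is a confusion matrix and a per-class report, sketched below. To keep the example self-contained it retrains an SVM on the iris copy bundled with scikit-learn rather than reusing the notebook's variables; the seed and variable names are assumptions.

```python
from sklearn import metrics, svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = svm.SVC().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are the true species, columns are the predicted species
cm = metrics.confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall and F1-score for each species
print(metrics.classification_report(y_test, y_pred))
```

The numbers on the diagonal of the matrix are the correct predictions; anything off the diagonal is a flower assigned to the wrong species.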

## Congrats, you've built and tested your first model

Now let's use our model to make predictions.

Sample Iris-virginica measurements to try (sepal length, sepal width, petal length, petal width): 5.4, 2.6, 4.7, 1.9

In [None]:
#Collect sepal and petal data
sepal_len = float(input("Enter the sepal length: "))
sepal_wid = float(input("Enter the sepal width: "))
petal_len = float(input("Enter the petal length: "))
petal_wid = float(input("Enter the petal width: "))

In [None]:
pred = model.predict([[sepal_len, sepal_wid, petal_len, petal_wid]])
print(pred)
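
Since the model was trained on a DataFrame with named columns, recent scikit-learn versions may warn when you predict from a plain list. A minimal sketch of one way around this is to wrap the measurements in a DataFrame with the same column names. To stay runnable on its own, the example rebuilds a model from the iris copy bundled with scikit-learn, renamed to match this notebook's columns; its species labels therefore lack the `Iris-` prefix.

```python
import pandas as pd
from sklearn import svm
from sklearn.datasets import load_iris

columns = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']

# Bundled iris data, relabelled with the column names used in this notebook
data = load_iris()
features = pd.DataFrame(data.data, columns=columns)
species = pd.Series(data.target_names[data.target])

clf = svm.SVC().fit(features, species)

# One new flower, passed with the same column names used during training
sample = pd.DataFrame([[5.4, 2.6, 4.7, 1.9]], columns=columns)
pred = clf.predict(sample)
print(pred)
```

In the notebook itself, the same idea would be `model.predict(pd.DataFrame([[sepal_len, sepal_wid, petal_len, petal_wid]], columns=train_X.columns))`.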