# Introduction to Machine Learning

Machine learning "gives computers the ability to learn without being explicitly programmed".

**How is it related to AI?**

AI is about building *intelligent* machines. ML is about building *intelligent* machines that *learn* (from data). 

**How is it different from traditional programming?**

First, what is a *program*? A program specifies the computation required to generate an output from an input. Normally it is we, the programmers, who specify that computation.

In machine learning, however, that computation is "learned" by the machine itself, i.e., it finds the program (the required computation) from data.

That is extremely useful because, for many complex (and even seemingly simple) problems, we cannot specify the required computation exactly.

**Example 1**: Converting a temperature in Celsius (input) to a temperature in Kelvin (output) can be explicitly programmed. But suppose that, from the temperature in Celsius (input), we need to predict the number of buckets of ice cream sold at a store. There is no known formula for that, so it cannot be explicitly programmed.
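To make the contrast in Example 1 concrete, here is a minimal sketch. The Celsius-to-Kelvin rule is known (K = °C + 273.15), so it can be written directly. For ice cream sales there is no such rule, so the sketch fits a straight line to a handful of observations with `np.polyfit`. The function names and the sales figures are made up for illustration; they are not real data.

```python
import numpy as np

def celsius_to_kelvin(celsius):
    """Explicitly programmed: the rule is known in advance."""
    return celsius + 273.15

# Made-up observations (illustrative only): temperature in Celsius
# and buckets of ice cream sold on that day.
temps = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
buckets = np.array([12.0, 18.0, 26.0, 31.0, 40.0])

# "Learned" program: fit a straight line to the observations.
slope, intercept = np.polyfit(temps, buckets, deg=1)

def predict_buckets(celsius):
    """Learned from data: the rule comes from the fitted line."""
    return slope * celsius + intercept
```

In the first function we wrote the computation ourselves; in the second, the computation (the slope and intercept) was found from the data.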

**Example 2**: How would we create a program to recognize "2"? 

Could it handle all these variations?

<img src="../images/twos.png" width="400"/>

## Types of Machine Learning problems

There are many ways to divide ML problems, but generally we can think in terms of two axes: whether the data is labeled, and whether the output is continuous or categorical.

I. Supervised (has labeled data):
- Regression (continuous valued)
- Classification (categorical)

II. Unsupervised (has unlabeled data):
- Dimensionality reduction, etc. (continuous valued)
- Clustering (categorical)

## Goalkeeper or not? 

Given a player's height and weight, predict whether they are a goalkeeper or an outfield player.

<img src="../images/gk.jpg" width="300"/>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn

In [None]:
df = pd.read_csv('../data/players.csv')

In [None]:
df.head()

### Exercise 1

Draw a scatterplot of `height` against `weight` using `sns.lmplot`, and pass `Position` as the `hue` parameter.

In [None]:
# Code Here

### Exercise 2

Create a new column `Target` by mapping the `Position` column: `GK` becomes `1` and `Outfield` becomes `0`.
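One possible approach is `Series.map` with a dictionary. The sketch below runs on a small stand-in DataFrame (`toy_df`, a hypothetical name) rather than on the real `players.csv`, and it assumes that `Position` contains exactly the labels `GK` and `Outfield`.

```python
import pandas as pd

# Stand-in data (hypothetical), not the real players.csv.
toy_df = pd.DataFrame({'Position': ['GK', 'Outfield', 'Outfield', 'GK']})

# Map each label to its numeric target; labels missing from the
# dictionary would become NaN.
toy_df['Target'] = toy_df['Position'].map({'GK': 1, 'Outfield': 0})
```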

## Train-Test Split

### Exercise 3

Split `df` into `train_df` (70%) and `test_df` (30%). 

Hint: Use indexing with `loc`.
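A minimal sketch of a 70/30 split with `loc`, run on a stand-in DataFrame (`toy_df`, `toy_train` and `toy_test` are hypothetical names). Shuffling first is an added precaution in case the rows are ordered, for example by position; whether `players.csv` is ordered is not stated here.

```python
import numpy as np
import pandas as pd

# Stand-in data (hypothetical): 10 rows with made-up values.
toy_df = pd.DataFrame({'height': np.arange(170, 180),
                       'weight': np.arange(70, 80)})

# Shuffle the rows, then reset the index to 0..n-1 so that
# label-based slicing with .loc matches row positions.
shuffled = toy_df.sample(frac=1, random_state=0).reset_index(drop=True)

n_train = round(0.7 * len(shuffled))

# .loc slices are inclusive of the end label, hence n_train - 1.
toy_train = shuffled.loc[:n_train - 1]
toy_test = shuffled.loc[n_train:]
```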

In [None]:
# Code Here
# train_df = ...
# test_df = ...

## Self-Made Classifier

We will define a classifier ourselves by picking thresholds that make sense.

### Exercise 4

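i) Plot the `height` distribution of the two groups. Call `sns.distplot` twice: first with the data for `GK`s only, then with the data for `Outfield`s only. (`distplot` is deprecated since seaborn 0.11; in newer versions use `sns.histplot` or `sns.kdeplot` instead.)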

ii) Do the same for `weight`. 

In [None]:
# Code Here

### Exercise 5

Define a function `classify` that takes a 2D NumPy array (where each row contains the height and weight of one player) and returns a list of predictions. A prediction is `1` if the classifier thinks the person is a `GK`, else `0`.

Choose the thresholds yourself from the plots above.

E.g., if `height` > ... and `weight` > ... , predict `1`.
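The shape of such a function might look like the sketch below. The two threshold values are arbitrary placeholders, not values read from the data; replace them with the ones you pick from your own plots.

```python
import numpy as np

# Placeholder thresholds (hypothetical); choose your own from the plots.
HEIGHT_THRESHOLD = 185.0
WEIGHT_THRESHOLD = 80.0

def classify(players):
    """Return 1 (GK) or 0 (Outfield) for each [height, weight] row."""
    players = np.asarray(players)
    # A row is predicted as GK only if both values exceed their thresholds.
    is_gk = (players[:, 0] > HEIGHT_THRESHOLD) & (players[:, 1] > WEIGHT_THRESHOLD)
    return is_gk.astype(int).tolist()
```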

In [None]:
# Code Here

### Exercise 6

Compute the accuracy of the classifier on the test set.

You can do it manually by comparing the `Target` column with the generated predictions, or you can use `sklearn.metrics.accuracy_score`.
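Both routes give the same number: accuracy is the fraction of predictions that match the true labels. A small sketch on made-up label arrays (`y_true` and `y_pred` are hypothetical stand-ins for the `Target` column and your predictions):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration: 4 of the 5 predictions are correct.
y_true = np.array([1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0])

# Manual: the mean of the element-wise matches.
manual_accuracy = (y_true == y_pred).mean()

# Library: same quantity via scikit-learn.
sklearn_accuracy = accuracy_score(y_true, y_pred)
```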

In [None]:
# Code Here

## Classification with KNN

In [None]:
from sklearn.neighbors import KNeighborsClassifier

In [None]:
classifier = KNeighborsClassifier(n_neighbors=7)

In [None]:
train_x, train_y = train_df[['height', 'weight']].values, train_df['Target'].values

In [None]:
classifier.fit(train_x, train_y)
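For intuition: a k-nearest-neighbors classifier predicts the label of a new point by majority vote among the `n_neighbors` closest training points. The sketch below shows this on a tiny made-up dataset (all values and the `toy_` names are hypothetical), using `n_neighbors=3` so that the vote is easy to follow.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up [height, weight] rows: two well-separated groups.
toy_x = np.array([[170, 65], [172, 68], [171, 66],
                  [190, 85], [192, 88], [191, 86]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

toy_classifier = KNeighborsClassifier(n_neighbors=3)
toy_classifier.fit(toy_x, toy_y)

# Each query point takes the majority label of its 3 nearest neighbors.
toy_predictions = toy_classifier.predict(np.array([[171, 67], [191, 87]]))
```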

### Exercise 7

- Extract `test_x` and `test_y` from `test_df`.
- Make predictions using the classifier's `predict` method.
- Compute the accuracy.

In [None]:
# Code Here

## Resources

* [Intro to ML: DAT4](https://github.com/justmarkham/DAT4/blob/master/slides/06_ml_knn.pdf)