# Introduction to Machine Learning (ML)

## The two approaches of ML
1. Supervised Learning 
2. Unsupervised Learning

<img height="50%" width="50%" src = "https://camo.githubusercontent.com/a537edffaf330adda5e556ddd87e5a61bd78914fac9120ada446594abae696a7/68747470733a2f2f692e766173336b2e72752f3777312e6a7067" />

## 1. Supervised Learning

<p>Tn this approach, the algorithm learns from a <b>training dataset</b> while the human-teacher acts as a supervisor in the learning process. Thus, as the algorithm makes predictions, the teacher can correct them, until the algorithm achieves an <b>acceptable level</b> of performance. Generally, supervised learning algorithms aim to model the <b>relationship between independent variables and a target variable</b>, whereas the result is either <b>classified</b> (discrete target variable) or a <b>regression</b> (continuous target variable).    

### The scikit-learn library

The proposed library provides not only a consistent interface to Machine Learning models, but also a very exceptional documentation and an active community that supports users

#### Description of datasets
<p> Each datasets consists of rows and columns:
    <ol>
    <li> Each row is related to an <b>observation</b> (e.g. particles, plants, crops) </li>
        <li> Each column considered to be a <b>feature</b>, knows also as predictor, attribute, independent variable, input, regressor, covariate (e.g. cm, weight, salary et cetera) </li>
    </ol>
    
<p> One of the these features has to be the target value needed to be predicted 
    

#### Loading datasets

In [1]:
import pandas as pd

In [2]:
filename = 'weatherAUS.csv'
df = pd.read_csv(filename)

#### Explore dataset

In [3]:
df.head()

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,...,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,W,...,71.0,22.0,1007.7,1007.1,8.0,,16.9,21.8,No,No
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,NNW,...,44.0,25.0,1010.6,1007.8,,,17.2,24.3,No,No
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,W,...,38.0,30.0,1007.6,1008.7,,2.0,21.0,23.2,No,No
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,SE,...,45.0,16.0,1017.6,1012.8,,,18.1,26.5,No,No
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,ENE,...,82.0,33.0,1010.8,1006.0,7.0,8.0,17.8,29.7,No,No


In [4]:
df.describe()

Unnamed: 0,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustSpeed,WindSpeed9am,WindSpeed3pm,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm
count,143975.0,144199.0,142199.0,82670.0,75625.0,135197.0,143693.0,142398.0,142806.0,140953.0,130395.0,130432.0,89572.0,86102.0,143693.0,141851.0
mean,12.194034,23.221348,2.360918,5.468232,7.611178,40.03523,14.043426,18.662657,68.880831,51.539116,1017.64994,1015.255889,4.447461,4.50993,16.990631,21.68339
std,6.398495,7.119049,8.47806,4.193704,3.785483,13.607062,8.915375,8.8098,19.029164,20.795902,7.10653,7.037414,2.887159,2.720357,6.488753,6.93665
min,-8.5,-4.8,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,980.5,977.1,0.0,0.0,-7.2,-5.4
25%,7.6,17.9,0.0,2.6,4.8,31.0,7.0,13.0,57.0,37.0,1012.9,1010.4,1.0,2.0,12.3,16.6
50%,12.0,22.6,0.0,4.8,8.4,39.0,13.0,19.0,70.0,52.0,1017.6,1015.2,5.0,5.0,16.7,21.1
75%,16.9,28.2,0.8,7.4,10.6,48.0,19.0,24.0,83.0,66.0,1022.4,1020.0,7.0,7.0,21.6,26.4
max,33.9,48.1,371.0,145.0,14.5,135.0,130.0,87.0,100.0,100.0,1041.0,1039.6,9.0,9.0,40.2,46.7


#### Name of columns (Features)

In [5]:
features_names = df.columns
features_names

Index(['Date', 'Location', 'MinTemp', 'MaxTemp', 'Rainfall', 'Evaporation',
       'Sunshine', 'WindGustDir', 'WindGustSpeed', 'WindDir9am', 'WindDir3pm',
       'WindSpeed9am', 'WindSpeed3pm', 'Humidity9am', 'Humidity3pm',
       'Pressure9am', 'Pressure3pm', 'Cloud9am', 'Cloud3pm', 'Temp9am',
       'Temp3pm', 'RainToday', 'RainTomorrow'],
      dtype='object')

#### Dimensions

In [6]:
rows, cols = df.values.shape[0],df.values.shape[1]
rows,cols

(145460, 23)

#### Data Cleaning and Preparation

In [7]:
df = df.drop('Location',1)
df = df.drop('WindGustDir',1)
df = df.drop('WindDir9am',1)
df = df.drop('WindDir3pm',1)
df = df.drop('Date',1)
df = df.fillna(0.0)
df.loc[df.RainToday == "Yes", 'RainToday'] = 1
df.loc[df.RainToday == "No", 'RainToday'] = 1
df.loc[df.RainTomorrow == "Yes", 'RainTomorrow'] = 1
df.loc[df.RainTomorrow == "No", 'RainTomorrow'] = 1

In [13]:
dataForPrediction = df.iloc[0,:]
df = df.iloc[1:,:]

#### Split features from target column

In [14]:
features = df.iloc[:, 0:-1]
target = df.iloc[:,-1]

### K-nearest neighbors (KNN) classification
1. Choose the k-value.
2. Identify the k-observations included in the training data, which have the minimum distance from the measurements of the value needed to be predicted.
3. Group by label these k-observation and select the one with the higher frequency to be set as the predicted response value.

#### Basic Steps
1. Load library
2. Initialize the estimator
3. Train the model (or fit the data)
4. Make prediction for a new observation

In [15]:
features.head()

Unnamed: 0,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustSpeed,WindSpeed9am,WindSpeed3pm,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday
1,7.4,25.1,0.0,0.0,0.0,44.0,4.0,22.0,44.0,25.0,1010.6,1007.8,0.0,0.0,17.2,24.3,1
2,12.9,25.7,0.0,0.0,0.0,46.0,19.0,26.0,38.0,30.0,1007.6,1008.7,0.0,2.0,21.0,23.2,1
3,9.2,28.0,0.0,0.0,0.0,24.0,11.0,9.0,45.0,16.0,1017.6,1012.8,0.0,0.0,18.1,26.5,1
4,17.5,32.3,1.0,0.0,0.0,41.0,7.0,20.0,82.0,33.0,1010.8,1006.0,7.0,8.0,17.8,29.7,1
5,14.6,29.7,0.2,0.0,0.0,56.0,19.0,24.0,55.0,23.0,1009.2,1005.4,0.0,0.0,20.6,28.9,1


In [21]:
dataForPrediction = dataForPrediction.drop('RainTomorrow')

In [22]:
dataForPrediction.values.shape, features.values.shape

((17,), (145459, 17))

In [28]:
target = target.astype('int')

In [29]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(features, target)

KNeighborsClassifier(n_neighbors=3)

In [31]:
prediction = knn.predict([dataForPrediction])

In [33]:
prediction[0]

1