# Introduction

# Table of contents
[1. Loading and preparing data](#1.-Loading-and-preparing-data)

[2. Standardization](#2.-Standardization)

[3. Normalization](#3.-Normalization)

[4. Predict with new data](#4.-Predict-with-new-data)

Let's get started with the study...

# 1. Loading and preparing data

In [8]:
import pandas as pd
import numpy as np

from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [9]:
data = pd.read_csv('https://raw.githubusercontent.com/MaksonViini/Teste/master/Machine%20Learning%20Models/Datasets/Advertising.csv')

In [10]:
data.head()

   Unnamed: 0     TV  Radio  Newspaper  Sales
0           1  230.1   37.8       69.2   22.1
1           2   44.5   39.3       45.1   10.4
2           3   17.2   45.9       69.3    9.3
3           4  151.5   41.3       58.5   18.5
4           5  180.8   10.8       58.4   12.9


### Splitting data

In [None]:
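# 'Unnamed: 0' is a leftover row counter from the CSV, not a feature; 'Sales' is the target
X = data.drop(['Unnamed: 0', 'Sales'], axis=1)
y = data['Sales']
# Hold out 25% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)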
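
The effect of `test_size=0.25` and `random_state=42` can be checked on a small stand-in array. This is only a sketch: `X_demo` and `y_demo` are hypothetical placeholders, and the 200-row size is an assumption chosen for illustration, not a claim about the CSV above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 200 rows, 3 feature columns (assumed sizes)
X_demo = np.arange(200 * 3).reshape(200, 3)
y_demo = np.arange(200)

# Same arguments as in the cell above
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42)

# Repeating the call with the same random_state yields the same split
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42)

shapes = (X_tr.shape, X_te.shape)
reproducible = np.array_equal(X_te, X_te2)
print(shapes, reproducible)
```

With 200 rows, a quarter of them (50) go to the test set and the remaining 150 to the training set.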

# 2. Standardization

## 2.1 Standard Scaler
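
Standardization rescales each column to zero mean and unit variance, z = (x - mean) / std. The sketch below compares `StandardScaler` with that formula computed by hand in NumPy. The array `X_toy` is made-up illustration data, not taken from the Advertising dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values, for illustration only
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 60.0]])

scaler = StandardScaler().fit(X_toy)
X_std = scaler.transform(X_toy)

# Manual version: subtract the column mean, divide by the
# population standard deviation (ddof=0, NumPy's default)
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

same_as_manual = np.allclose(X_std, X_manual)
col_means = X_std.mean(axis=0)
col_stds = X_std.std(axis=0)
print(same_as_manual, col_means, col_stds)
```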

In [74]:
pipe_scaler = Pipeline([('scaler', StandardScaler()),
                        ('lr', LinearRegression())])

In [79]:
pipe_scaler.fit(X_train, y_train)

Pipeline(steps=[('scaler', StandardScaler()), ('lr', LinearRegression())])

In [80]:
pipe_scaler.score(X_test, y_test)

0.8935163320163659

In [81]:
y_pred = pipe_scaler.predict(X_test)

In [82]:
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print('Mean Absolute Error: ', mae)
print('Mean Squared Error: ', mse)
print('Root Mean Squared Error: ', rmse)
print('R2 score: ', r2)

Mean Absolute Error:  1.4023124989385065
Mean Squared Error:  2.88002373009419
Root Mean Squared Error:  1.6970632663793621
R2 score:  0.8935163320163659


# 3. Normalization

## 3.1 Min Max Scaler
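
Min-max normalization maps each column to the range [0, 1] using (x - min) / (max - min). The sketch below checks `MinMaxScaler` (with its default `feature_range`) against that formula. `X_toy` is made-up illustration data, not taken from the Advertising dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up values, for illustration only
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 60.0]])

scaler = MinMaxScaler().fit(X_toy)
X_norm = scaler.transform(X_toy)

# Manual version: shift by the column minimum, divide by the column range
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
X_manual = (X_toy - col_min) / (col_max - col_min)

same_as_manual = np.allclose(X_norm, X_manual)
print(same_as_manual, X_norm.min(axis=0), X_norm.max(axis=0))
```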

In [71]:
pipe_norm = Pipeline([('norm', MinMaxScaler()),
                      ('lr', LinearRegression())])

In [72]:
pipe_norm.fit(X_train, y_train)

Pipeline(steps=[('norm', MinMaxScaler()), ('lr', LinearRegression())])

In [68]:
pipe_norm.score(X_test, y_test)

0.8935163320163658

In [69]:
y_pred = pipe_norm.predict(X_test)

In [56]:
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print('Mean Absolute Error: ', mae)
print('Mean Squared Error: ', mse)
print('Root Mean Squared Error: ', rmse)
print('R2 score: ', r2)

Mean Absolute Error:  1.402312498938507
Mean Squared Error:  2.8800237300941918
Root Mean Squared Error:  1.6970632663793628
R2 score:  0.8935163320163658
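
The metrics for the two pipelines agree to roughly 15 digits. That is expected rather than a coincidence: both scalers apply a per-column shift and rescale, and an ordinary least squares fit with an intercept absorbs such a change into its coefficients, so the predictions stay the same. The sketch below illustrates this on synthetic data. `X_syn`, `y_syn` and the helper `fit_predict` are hypothetical names introduced here, and the coefficients used to generate the data are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic data, for illustration only (arbitrary coefficients and noise)
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 300, size=(100, 3))
y_syn = X_syn @ np.array([0.05, 0.2, 0.0]) + 3.0 + rng.normal(0, 1.0, size=100)

def fit_predict(steps):
    """Fit a pipeline on the first 75 rows and predict the last 25."""
    model = Pipeline(steps).fit(X_syn[:75], y_syn[:75])
    return model.predict(X_syn[75:])

pred_raw = fit_predict([('lr', LinearRegression())])
pred_std = fit_predict([('scaler', StandardScaler()), ('lr', LinearRegression())])
pred_norm = fit_predict([('norm', MinMaxScaler()), ('lr', LinearRegression())])

# Predictions match up to floating-point rounding, with or without scaling
predictions_match = (np.allclose(pred_raw, pred_std)
                     and np.allclose(pred_raw, pred_norm))
print(predictions_match)
```

Scaling tends to matter more for models that are sensitive to feature magnitudes, such as regularized or distance-based ones; for plain `LinearRegression` it mainly changes how the coefficients are read.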


# 4. Predict with new data

In [62]:
print(X_test.head())
print()
print(y_test.head())

        TV  Radio  Newspaper
95   163.3   31.6       52.9
15   195.4   47.7       52.9
30   292.9   28.3       43.2
158   11.7   36.9       45.2
128  220.3   49.0        3.2

95     16.9
15     22.4
30     21.4
158     7.3
128    24.7
Name: Sales, dtype: float64


In [60]:
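# Use numeric values (not strings) and keep the training column names
input_data = pd.DataFrame([[163.3, 31.6, 52.9]], columns=['TV', 'Radio', 'Newspaper'])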

In [70]:
pipe_norm.predict(input_data)

array([16.38348211])
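
Calling `predict` on the pipeline first transforms the new row with the scaler that was fitted on the training data, then passes the result to the regression step. The sketch below makes the two steps explicit on synthetic data. `X_syn`, `y_syn` and `pipe` are hypothetical stand-ins, so the predicted number will differ from the one above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic training data, for illustration only
rng = np.random.default_rng(1)
X_syn = rng.uniform(0, 300, size=(50, 3))
y_syn = X_syn @ np.array([0.05, 0.2, 0.01]) + 3.0

pipe = Pipeline([('norm', MinMaxScaler()),
                 ('lr', LinearRegression())]).fit(X_syn, y_syn)

new_row = np.array([[163.3, 31.6, 52.9]])

# One call through the whole pipeline
via_pipeline = pipe.predict(new_row)

# The same thing done step by step with the fitted components
scaled_row = pipe.named_steps['norm'].transform(new_row)
via_steps = pipe.named_steps['lr'].predict(scaled_row)

steps_agree = np.allclose(via_pipeline, via_steps)
print(steps_agree)
```

Because the scaler lives inside the pipeline, new data should be passed in its original units; scaling it by hand beforehand would apply the transformation twice.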