# Introduction to Scikit-Learn (sklearn)

0. An end-to-end sklearn workflow
1. Getting the data ready
2. Choose the right estimator/algorithm for our problems
3. Fit the model/algorithm and use it to make predictions on our data
4. => Evaluating the model
5. Improve the model
6. Save and load trained model
7. Putting it all together!

### 4.4 Regression Model Evaluation Metrics
1. R^2 (r-squared) or coefficient of determination
2. Mean absolute error (MAE)
3. Mean squared error (MSE)

In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_boston

In [34]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# load the dataset
# (load_boston was removed in scikit-learn 1.2, so this cell needs an earlier version)
boston = load_boston()
boston_df = pd.DataFrame(boston['data'], columns=boston['feature_names'])
boston_df['target'] = pd.Series(boston['target'])

# split into features (X) and labels (y)
X = boston_df.drop("target", axis=1)
y = boston_df['target']

np.random.seed(42)
# create train and test split
X_train, X_test, y_train, y_test = train_test_split(X, y)

# select model
model = RandomForestRegressor()

# fit the model
model.fit(X_train, y_train);

# predict on the test set
y_pred = model.predict(X_test)

#### Default Score method

In [35]:
score = model.score(X_test, y_test)
score

0.8471696005277883
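
For scikit-learn regressors, the default `score` method returns R^2, which is why the value above matches `r2_score` in the next section. The following is a minimal sketch of that equivalence. It assumes synthetic data from `make_regression` and a `LinearRegression` model (neither is part of this notebook), and the names `X_demo`, `y_demo` and `demo_model` are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data, an assumption for illustration (not the notebook's dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

demo_model = LinearRegression().fit(X_tr, y_tr)

# Default score of a regressor vs. R^2 computed explicitly from predictions
default_score = demo_model.score(X_te, y_te)
explicit_r2 = r2_score(y_te, demo_model.predict(X_te))

print(np.isclose(default_score, explicit_r2))
```

The same comparison should hold for other regressors, including the `RandomForestRegressor` used in this notebook.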

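#### R^2 Method
Shows how close the predictions are to the actual values, compared with a baseline that always predicts the mean of the targets. A score of 1.0 means the predictions match the actual values exactly, 0.0 means the model does no better than predicting the mean, and a negative score means it does worse.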

In [69]:
from sklearn.metrics import r2_score
score = r2_score(y_test, y_pred)
score

0.8471696005277883

In [74]:
# custom r2 score by me
def r2_score_method(y_test, y_pred):
    # sum of squared errors between the predictions and the actual values
    squared_error_with_line = np.sum((y_pred - y_test)**2)
    # sum of squared errors between the mean of the actual values and the actual values
    squared_error_with_mean = np.sum((y_test.mean() - y_test)**2)

    how_far_they_are = squared_error_with_line / squared_error_with_mean
    how_close_they_are = 1 - how_far_they_are

    return how_close_they_are

In [75]:
score = r2_score_method(y_test, y_pred)

In [81]:
y_pred_mean = np.full(len(y_test), y_test.mean())


In [88]:
r2_score(y_test, y_pred_mean) == 0

True
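
The baseline check above gives one reference point for R^2. As a small sketch on made-up numbers (the array `y_toy` is hypothetical and unrelated to the Boston data), the three typical cases can be worked out by hand: perfect predictions, always predicting the mean, and predictions that are worse than the mean.

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy targets (hypothetical values); their mean is 6.0
y_toy = np.array([3.0, 5.0, 7.0, 9.0])

# Perfect predictions: residual sum of squares is 0, so R^2 = 1 - 0/20 = 1.0
r2_perfect = r2_score(y_toy, y_toy)

# Always predicting the mean: residual sum equals total sum, so R^2 = 1 - 20/20 = 0.0
r2_mean = r2_score(y_toy, np.full(len(y_toy), y_toy.mean()))

# Reversed predictions: residual sum is 80, so R^2 = 1 - 80/20 = -3.0
r2_reversed = r2_score(y_toy, y_toy[::-1])

print(r2_perfect, r2_mean, r2_reversed)
```

The negative value in the last case shows that R^2 is not bounded below by zero.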

#### Mean Absolute Error
The mean of the absolute differences between the actual and predicted values.

In [89]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test, y_pred)

2.123362204724411

In [96]:
def mean_abs_error(y_test, y_pred):
    # average of the absolute differences between predictions and actual values
    mae = np.mean(np.abs(y_pred - y_test))

    return mae

mean_abs_error(y_test, y_pred)

2.1233622047244114
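
To make the arithmetic concrete, here is a small worked sketch on made-up numbers (the arrays `y_toy` and `y_toy_pred` are hypothetical, not taken from the dataset above). It computes MAE step by step and compares the result with `mean_absolute_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Toy actual and predicted values (hypothetical)
y_toy = np.array([3.0, 5.0, 7.0, 9.0])
y_toy_pred = np.array([2.0, 5.0, 9.0, 9.0])

# Absolute errors per sample: [1, 0, 2, 0]
abs_errors = np.abs(y_toy - y_toy_pred)

# MAE is their average: (1 + 0 + 2 + 0) / 4 = 0.75
mae_by_hand = abs_errors.sum() / len(abs_errors)
mae_sklearn = mean_absolute_error(y_toy, y_toy_pred)

print(mae_by_hand, mae_sklearn)
```

Because MAE is in the same units as the target, a value of 0.75 here means the predictions are off by 0.75 on average.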

#### Mean Squared Error
The mean of the squared differences between the actual and predicted values.

In [98]:
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, y_pred)


10.70227633858268

In [101]:
def mean_sq_error(y_test, y_pred):
    # average of the squared differences between predictions and actual values
    mse = np.mean(np.square(y_pred - y_test))

    return mse

mean_sq_error(y_test, y_pred)

10.702276338582683
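
One practical difference between the two error metrics is that squaring makes MSE weigh large errors more heavily than MAE does. The sketch below illustrates this on made-up numbers (the arrays `y_actual`, `pred_even` and `pred_spike` are hypothetical): both sets of predictions have the same total absolute error, but one concentrates it in a single sample.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Toy targets (hypothetical): all zeros, so each prediction equals its own error
y_actual = np.zeros(4)
pred_even = np.array([1.0, 1.0, 1.0, 1.0])   # four errors of size 1
pred_spike = np.array([0.0, 0.0, 0.0, 4.0])  # one error of size 4

# MAE is identical: (1+1+1+1)/4 = 1.0 and (0+0+0+4)/4 = 1.0
mae_even = mean_absolute_error(y_actual, pred_even)
mae_spike = mean_absolute_error(y_actual, pred_spike)

# MSE differs: (1+1+1+1)/4 = 1.0 versus (0+0+0+16)/4 = 4.0
mse_even = mean_squared_error(y_actual, pred_even)
mse_spike = mean_squared_error(y_actual, pred_spike)

print(mae_even, mae_spike, mse_even, mse_spike)
```

This suggests reaching for MSE when large errors should be penalised more than small ones, and MAE when every unit of error should count equally.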