### Model Evaluation

This is a demonstration of evaluating a model.

#### Step 1

Load the required dependencies.


In [1]:
# Import frameworks
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('../style_Matplotlib_charts.mplstyle')
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import pickle

#### Step 9. Multiple Variable Linear Regression

Loading the multi-variable linear regression model for testing

In [3]:
filename = 'PolynomialRegression_Models/A_PR_model_v6.sav'
model_C = pickle.load(open(filename, 'rb'))

Opening and parsing the test data CSV file, with the specific input features collected

In [9]:
mx_test = ['value_score', 'tot_rooms', 'ds_float', 'suburb_lat', 'suburb_lng']
testing_data = pd.read_csv('A_testing_data_new_2.csv')
mx_test = np.array(testing_data[poly_features])
my_test = np.array(testing_data['Target'])

#### Step 10

- Individually plotting test data features, targets and predictions from the model, on a graph for each input feature, with the purpose of visually evaluating the accuracy of the model

In [10]:
my_pred = model_C.predict(mx_test)

# plot predictions and targets vs original features    
fig,ax=plt.subplots(1,len(mx_test),figsize=(12,3),sharey=True)
for i in range(len(ax)):
    ax[i].scatter(mx_test[:,i],my_test, label = 'target')
    ax[i].set_xlabel(mx_test[i])
    ax[i].scatter(mx_test[:,i],my_pred,color="orange", label = 'predict')
    ax[i].set_ylabel("Target"); ax[i].legend();
fig.suptitle("Diabetes Disease Progress")
plt.show()

ValueError: X has 5 features, but LinearRegression is expecting 461 features as input.

#### Step 11

- Predict the score

In [6]:
test_score = model_C.score(mx_test, my_test)
print(f'Training data score: {test_score}')

Training data score: 0.419107384182133


- This is quite a low score, though already higher than polynomial and linear regression algorithms, there are improvements which can be made

#### Step 11

- You can do predictions to get a feel for the model

In [8]:
print(model_C.predict([[0.1,0.1,0.1, 0.1, 0.1, 0.1]]))

[-513284.3344197]


#### Step 12 

- Manually calculating the loss and cost of the model using testing data

In [9]:
data = {col: mx_test[:, i] for i, col in enumerate(mx_col)}
data['Target'] = my_test
data['Predicted result'] = model_C.predict(mx_test)
data['Cost'] = my_test - model_C.predict(mx_test).round(2)

table = pd.DataFrame(data)
print(table)

      num_bed  num_bath  property_size  value_score  tot_rooms  \
0         3.0       1.0       0.434959     0.176236        4.0   
1         3.0       1.0       0.507317     0.369173        4.0   
2         5.0       3.0       0.558537     0.426005        8.0   
3         4.0       1.0       0.450407     0.249957        5.0   
4         3.0       2.0       0.502439     0.389032        5.0   
...       ...       ...            ...          ...        ...   
2329      2.0       1.0       0.355285     0.314162        3.0   
2330      5.0       3.0       0.378049     0.322278        8.0   
2331      4.0       3.0       0.450407     0.352563        7.0   
2332      5.0       4.0       0.565041     0.406127        9.0   
2333      5.0       3.0       0.444715     0.253907        8.0   

      suburb_median_income   Target  Predicted result       Cost  
0                  25844.0   537000      5.433852e+03  531566.15  
1                  38792.0  1150000      1.365840e+06 -215840.27  
2     

Shows the x axis intercepts, which are the values when inputs are 0 and the coefficients which is the influence a feature has on the prediction

In [None]:
print(f'X Axis intercept: {model_C.intercept_}')
print(f'Coefficient: {model_C.coef_}')

X Axis intercept: -795725.0294041773
Coefficient: [-1.16143017e+05  2.20660965e+05 -3.32049776e+06  5.93583364e+06
  1.04517948e+05  3.51736449e+01]


AttributeError: 'NoneType' object has no attribute 'reshape'

- The differences in coefficients illuminate how the different input features  influence the model, features like value_score have a high coefficient meaning they have a large influence on the model prediction, while others have quite low predicitons or even negative ones