<style>
*
{
	text-align: justify;
	line-height: 1.5;
	font-family: "Arial", sans-serif;
	font-size: 12px;
}

h2, h3, h4, h5, h6
{
	font-family: "Arial", sans-serif;
	font-size: 12px;
	font-weight: bold;
}
h2
{
	font-size: 14px;
}
h1
{
	font-family: "Wingdings", sans-serif;
	font-size: 16px;
}
</style>

## Bovine Tuberculosis Herd Rate Prediction Model Builder

<!--
import data_analytics.github as github
print(github.create_jupyter_notebook_header("tahirawwad", "agriculture-data-analytics", "notebooks/notebook-5-01-ml-irish-bovine-tuberculosis.ipynb", "master"))
-->
<table style="margin: auto;"><tr><td><a href="https://mybinder.org/v2/gh/tahirawwad/agriculture-data-analytics/master?filepath=notebooks/notebook-5-01-ml-irish-bovine-tuberculosis.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open In Binder"/></a></td><td>online editors</td><td><a href="https://colab.research.google.com/github/tahirawwad/agriculture-data-analytics/blob/master/notebooks/notebook-5-01-ml-irish-bovine-tuberculosis.ipynbynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a></td></tr></table>

### Objective
Create a machine learning model to predict rate of Herd Tuberculosis.

### Setup

Import required third party Python libraries, import supporting functions and sets up data source file paths.

In [1]:
# Local
#!pip install -r script/requirements.txt
# Remote option
#!pip install -r https://raw.githubusercontent.com/tahirawwad/agriculture-data-analytics/requirements.txt
#Options: --quiet --user

In [14]:
from agriculture_data_analytics.project_manager import *
from agriculture_data_analytics.dataframe_labels import *
from keras_tuner.tuners import RandomSearch
from pandas import read_csv, DataFrame
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras import layers
from xgboost import XGBRegressor
from data_analytics import github
import numpy as np
import os
import pandas
import pickle
import shutil

In [15]:
pandas.options.display.float_format = '{:.5f}'.format

In [16]:
READ_BINARY = "rb"
WRITE_BINARY = "wb"

In [17]:
model_directory: str = "county-bovine-tb-models"
directory: str = f'./../artifacts/{model_directory}/'

In [18]:
artifact_manager: ProjectArtifactManager = ProjectArtifactManager()
artifact_manager.is_remote = True
github.display_jupyter_notebook_data_sources(
    [artifact_manager.get_county_bovine_tuberculosis_eda_filepath()])
artifact_manager.is_remote = False

https://github.com/markcrowe-com/agriculture-data-analytics/artifacts/county-bovine-tuberculosis-eda-output.csv?raw=true


### Load dataframe

In [41]:
dataframe_filepath: str = artifact_manager.get_county_bovine_tuberculosis_eda_filepath()
dataframe: DataFrame = read_csv(dataframe_filepath)

In [42]:
print("Row, Column Count:", dataframe.shape)

Row, Column Count: (319, 11)


In [43]:
dataframe

Unnamed: 0,Year,Veterinary Office,Animal Count,Herd Incidence Rate,Restricted Herds at end of Year,Restricted Herds at start of Year,Herds Tested,Herds Count,Reactors per 1000 Tests A.P.T.,Reactors to date,Tests on Animals
0,2010,Carlow,86258.00000,4.02000,28.00000,52.00000,1295.00000,1353.00000,1.14000,124.00000,108584.00000
1,2010,Cavan,202119.00000,5.32000,124.00000,257.00000,4832.00000,4915.00000,3.13000,981.00000,313822.00000
2,2010,Clare,237260.00000,5.71000,175.00000,350.00000,6134.00000,6282.00000,5.05000,1947.00000,385705.00000
3,2010,Cork North,462707.00000,4.43000,119.00000,259.00000,5849.00000,5986.00000,1.62000,1078.00000,664648.00000
4,2010,Cork South,417478.00000,6.30000,216.00000,385.00000,6107.00000,6310.00000,2.72000,1592.00000,586105.00000
...,...,...,...,...,...,...,...,...,...,...,...
314,2020,Waterford,272528.00000,2.62000,42.00000,56.00000,2141.00000,2189.00000,0.52000,196.00000,287221.00000
315,2020,Westmeath,214210.00000,6.77000,115.00000,203.00000,2998.00000,3047.00000,3.26000,1192.00000,304207.00000
316,2020,Wexford,310311.00000,4.54000,63.00000,137.00000,3016.00000,3088.00000,1.48000,701.00000,366765.00000
317,2020,Wicklow E,80707.00000,9.11000,47.00000,96.00000,1054.00000,1070.00000,3.85000,648.00000,131187.00000


### Check the types for machine learning

In [36]:
dataframe.dtypes

Year                                   int64
Veterinary Office                     object
Animal Count                         float64
Herd Incidence Rate                  float64
Restricted Herds at end of Year      float64
Restricted Herds at start of Year    float64
Herds Tested                         float64
Herds Count                          float64
Reactors per 1000 Tests A.P.T.       float64
Reactors to date                     float64
Tests on Animals                     float64
dtype: object

Veterinary Office is an object, specifically a string. We must encode it as a number for machine learning.

In [37]:
dummy_values_dataframe = dataframe[[
    "Veterinary Office"
]]

dataframe.drop('Veterinary Office',
                                          axis=1,
                                          inplace=True)

dummy_values_dataframe = pandas.get_dummies(dummy_values_dataframe,
                                            columns=["Veterinary Office"],
                                            prefix=["Veterinary Office "])

In [38]:
dataframe = dataframe.join(dummy_values_dataframe)

### Set Year as Index

In [39]:
dataframe.set_index(YEAR, drop=True, inplace=True)
dataframe.head()

Unnamed: 0_level_0,Animal Count,Herd Incidence Rate,Restricted Herds at end of Year,Restricted Herds at start of Year,Herds Tested,Herds Count,Reactors per 1000 Tests A.P.T.,Reactors to date,Tests on Animals,Veterinary Office _Carlow,...,Veterinary Office _Offaly,Veterinary Office _Roscommon,Veterinary Office _Sligo,Veterinary Office _Tipperary North,Veterinary Office _Tipperary South,Veterinary Office _Waterford,Veterinary Office _Westmeath,Veterinary Office _Wexford,Veterinary Office _Wicklow E,Veterinary Office _Wicklow W
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2010,86258.0,4.02,28.0,52.0,1295.0,1353.0,1.14,124.0,108584.0,1,...,0,0,0,0,0,0,0,0,0,0
2010,202119.0,5.32,124.0,257.0,4832.0,4915.0,3.13,981.0,313822.0,0,...,0,0,0,0,0,0,0,0,0,0
2010,237260.0,5.71,175.0,350.0,6134.0,6282.0,5.05,1947.0,385705.0,0,...,0,0,0,0,0,0,0,0,0,0
2010,462707.0,4.43,119.0,259.0,5849.0,5986.0,1.62,1078.0,664648.0,0,...,0,0,0,0,0,0,0,0,0,0
2010,417478.0,6.3,216.0,385.0,6107.0,6310.0,2.72,1592.0,586105.0,0,...,0,0,0,0,0,0,0,0,0,0


In [13]:
dataframe.dtypes

Animal Count                          float64
Herd Incidence Rate                   float64
Restricted Herds at end of Year       float64
Restricted Herds at start of Year     float64
Herds Tested                          float64
Herds Count                           float64
Reactors per 1000 Tests A.P.T.        float64
Reactors to date                      float64
Tests on Animals                      float64
Veterinary Office _Carlow               uint8
Veterinary Office _Cavan                uint8
Veterinary Office _Clare                uint8
Veterinary Office _Cork North           uint8
Veterinary Office _Cork South           uint8
Veterinary Office _Donegal              uint8
Veterinary Office _Dublin               uint8
Veterinary Office _Galway               uint8
Veterinary Office _Kerry                uint8
Veterinary Office _Kildare              uint8
Veterinary Office _Kilkenny             uint8
Veterinary Office _Laois                uint8
Veterinary Office _Leitrim        

In [14]:
dataframe.isnull().sum()

Animal Count                          0
Herd Incidence Rate                   0
Restricted Herds at end of Year       0
Restricted Herds at start of Year     0
Herds Tested                          0
Herds Count                           0
Reactors per 1000 Tests A.P.T.        0
Reactors to date                      0
Tests on Animals                      0
Veterinary Office _Carlow             0
Veterinary Office _Cavan              0
Veterinary Office _Clare              0
Veterinary Office _Cork North         0
Veterinary Office _Cork South         0
Veterinary Office _Donegal            0
Veterinary Office _Dublin             0
Veterinary Office _Galway             0
Veterinary Office _Kerry              0
Veterinary Office _Kildare            0
Veterinary Office _Kilkenny           0
Veterinary Office _Laois              0
Veterinary Office _Leitrim            0
Veterinary Office _Limerick           0
Veterinary Office _Longford           0
Veterinary Office _Louth              0


In [15]:
print("dataset dimensions", dataframe.shape)

dataset dimensions (319, 38)


### Prepare Model Data

#### Select the feature set and the target

Select the feature set and the target 

In [16]:
feature_values = dataframe.drop(columns=['Herd Incidence Rate']).values
target_values = dataframe['Herd Incidence Rate'].values.reshape(-1, 1)

print(f'Features dimension: {np.shape(feature_values)[1]} Columns, {np.shape(feature_values)[0]} Rows')
print(f'Target dimension:   {np.shape(target_values)[1]} Column,  {np.shape(target_values)[0]} Rows')

Features dimension: 37 Columns, 319 Rows
Target dimension:   1 Column,  319 Rows


In [17]:
feature_values

array([[8.62580e+04, 2.80000e+01, 5.20000e+01, ..., 0.00000e+00,
        0.00000e+00, 0.00000e+00],
       [2.02119e+05, 1.24000e+02, 2.57000e+02, ..., 0.00000e+00,
        0.00000e+00, 0.00000e+00],
       [2.37260e+05, 1.75000e+02, 3.50000e+02, ..., 0.00000e+00,
        0.00000e+00, 0.00000e+00],
       ...,
       [3.10311e+05, 6.30000e+01, 1.37000e+02, ..., 1.00000e+00,
        0.00000e+00, 0.00000e+00],
       [8.07070e+04, 4.70000e+01, 9.60000e+01, ..., 0.00000e+00,
        1.00000e+00, 0.00000e+00],
       [4.47430e+04, 4.60000e+01, 7.30000e+01, ..., 0.00000e+00,
        0.00000e+00, 1.00000e+00]])

In [18]:
target_values

array([[ 4.02],
       [ 5.32],
       [ 5.71],
       [ 4.43],
       [ 6.3 ],
       [ 3.34],
       [ 7.16],
       [ 3.98],
       [ 2.22],
       [ 5.23],
       [ 6.19],
       [ 5.46],
       [ 2.74],
       [ 3.12],
       [ 3.43],
       [ 6.01],
       [ 1.95],
       [ 8.  ],
       [ 3.56],
       [ 6.47],
       [ 4.69],
       [ 3.11],
       [ 6.92],
       [ 5.23],
       [ 5.55],
       [ 6.75],
       [ 8.08],
       [15.72],
       [ 4.81],
       [ 4.98],
       [ 3.96],
       [ 4.68],
       [ 5.45],
       [ 5.8 ],
       [ 3.38],
       [ 5.43],
       [ 3.34],
       [ 2.67],
       [ 3.74],
       [ 6.7 ],
       [ 4.48],
       [ 2.35],
       [ 2.83],
       [ 4.28],
       [ 4.33],
       [ 1.88],
       [ 7.65],
       [ 2.7 ],
       [ 4.68],
       [ 3.95],
       [ 2.57],
       [ 4.33],
       [ 3.94],
       [ 5.14],
       [ 5.24],
       [ 9.47],
       [11.1 ],
       [ 7.6 ],
       [ 4.71],
       [ 3.86],
       [ 5.06],
       [ 5.09],
       [

In [19]:
target_values

array([[ 4.02],
       [ 5.32],
       [ 5.71],
       [ 4.43],
       [ 6.3 ],
       [ 3.34],
       [ 7.16],
       [ 3.98],
       [ 2.22],
       [ 5.23],
       [ 6.19],
       [ 5.46],
       [ 2.74],
       [ 3.12],
       [ 3.43],
       [ 6.01],
       [ 1.95],
       [ 8.  ],
       [ 3.56],
       [ 6.47],
       [ 4.69],
       [ 3.11],
       [ 6.92],
       [ 5.23],
       [ 5.55],
       [ 6.75],
       [ 8.08],
       [15.72],
       [ 4.81],
       [ 4.98],
       [ 3.96],
       [ 4.68],
       [ 5.45],
       [ 5.8 ],
       [ 3.38],
       [ 5.43],
       [ 3.34],
       [ 2.67],
       [ 3.74],
       [ 6.7 ],
       [ 4.48],
       [ 2.35],
       [ 2.83],
       [ 4.28],
       [ 4.33],
       [ 1.88],
       [ 7.65],
       [ 2.7 ],
       [ 4.68],
       [ 3.95],
       [ 2.57],
       [ 4.33],
       [ 3.94],
       [ 5.14],
       [ 5.24],
       [ 9.47],
       [11.1 ],
       [ 7.6 ],
       [ 4.71],
       [ 3.86],
       [ 5.06],
       [ 5.09],
       [

#### Define Training and Test Sets

The data is liner data and should not be shuffled. Set test set size to 20%.

In [20]:
test_size: float = 0.8
X_train, X_test, Y_train, Y_test = train_test_split(feature_values,
                                                    target_values,
                                                    test_size=test_size,
                                                    shuffle=False)

#### Scale & Transform

In [21]:
features_scaler = MinMaxScaler()

features_scaler.fit(X_train)

xtest_scale = features_scaler.transform(X_test)
xtrain_scale = features_scaler.transform(X_train)

In [22]:
target_scaler = MinMaxScaler()

target_scaler.fit(Y_train)

ytrain_scale = target_scaler.transform(Y_train)
ytest_scale = target_scaler.transform(Y_test)

##### Save Scalers

In [23]:
filename: str = 'features-scaler.pickle'
features_scaler_filepath: str = f'{directory}{filename}'

with open(features_scaler_filepath, WRITE_BINARY) as file:
    pickle.dump(features_scaler, file)

In [38]:
filename: str = 'target-scaler.pickle'
target_scaler_filepath: str = f'{directory}{filename}'

with open(target_scaler_filepath, WRITE_BINARY) as file:
    pickle.dump(target_scaler, file)

### Model Testing

#### Score Board

In [25]:
model_scores_dataframe = DataFrame(
    columns=['Model', 'Mean Absolute Error Score(%)'])

#### Random Forest Regressor

##### Train Model

Hyper parameter tuning via Grid Search Cross Validation

In [26]:
xtrain_scale

array([[0.13938889, 0.07655502, 0.07013575, ..., 0.        , 0.        ,
        0.        ],
       [0.38183374, 0.53588517, 0.53393665, ..., 0.        , 0.        ,
        0.        ],
       [0.45536802, 0.77990431, 0.74434389, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.49287905, 0.66507177, 0.65384615, ..., 0.        , 0.        ,
        0.        ],
       [1.        , 0.73205742, 0.60859729, ..., 0.        , 0.        ,
        0.        ],
       [0.88034594, 1.        , 0.78280543, ..., 0.        , 0.        ,
        0.        ]])

In [27]:
random_forest_regressor = RandomForestRegressor()

random_forest_regressor_paramaters_grid = {
    'bootstrap': [True, False],
    'criterion': ['squared_error', 'absolute_error', 'poisson'],
    'max_depth': [1, 3, 5],
    'max_features': ["auto", "sqrt", "log2"],
    'n_estimators': [100, 500, 800],  # Number of trees
}

grid_search_cv = GridSearchCV(
    estimator=random_forest_regressor,
    param_grid=random_forest_regressor_paramaters_grid,
    n_jobs=-1,  # Use all processors on CPU
    cv=5)  #cross validation 5 fold of datasets

grid_search_cv.fit(xtrain_scale, ytrain_scale.reshape(-1))

GridSearchCV(cv=5, estimator=RandomForestRegressor(), n_jobs=-1,
             param_grid={'bootstrap': [True, False],
                         'criterion': ['squared_error', 'absolute_error',
                                       'poisson'],
                         'max_depth': [1, 3, 5],
                         'max_features': ['auto', 'sqrt', 'log2'],
                         'n_estimators': [100, 500, 800]})

##### Test and Score Model

In [28]:
print('Best training model', grid_search_cv.best_estimator_)
print('Best training model score, coefficient of determination R squared',
      grid_search_cv.best_score_)

y_predict = target_scaler.inverse_transform(
    grid_search_cv.predict(xtest_scale).reshape(-1, 1))
mae_score = mean_absolute_error(Y_test, y_predict)

Best training model RandomForestRegressor(bootstrap=False, max_depth=5, max_features='sqrt')
Best training model score, coefficient of determination R squared 0.6608566936156757


##### Save Model Score

In [29]:
values = ['Random Forest Regressor', mae_score]
model_scores_dataframe.loc[len(model_scores_dataframe)] = values
model_scores_dataframe.head()

Unnamed: 0,Model,Mean Absolute Error Score(%)
0,Random Forest Regressor,1.10382


#### XGBOOST Regressor

##### Train Model

In [30]:
# define XGBRegressor
xgb_regressor_milk = XGBRegressor(random_state=2021)

# define parameters space to loop over
params_xgb_milk = {
    'n_estimators': [20, 40, 80, 160, 340, 500],
    'max_depth': [3, 6, 9],
    'gamma': [0.01, 0.1],
    'learning_rate': [0.001, 0.01, 0.1, 1]
}

# Hyper parameter tuning via Grid Search Cross Validation
grid_xgb_milk = GridSearchCV(
    estimator=xgb_regressor_milk,
    param_grid=params_xgb_milk,
    #n_jobs=-1,
    scoring=['r2', 'neg_root_mean_squared_error'],
    refit='r2',
    n_jobs=-1,
    cv=5,
    verbose=4)

# fit grid to training scaled set
grid_xgb_milk.fit(xtrain_scale, ytrain_scale)

# print best training model & R squared score
print('Best training model ', grid_xgb_milk.best_estimator_)
print('Best model Parameters', grid_xgb_milk.best_params_)
print('Best training model score, coefficient of determination R squared',
      grid_xgb_milk.best_score_)

Fitting 5 folds for each of 144 candidates, totalling 720 fits
Best training model  XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
             gamma=0.01, gpu_id=-1, importance_type=None,
             interaction_constraints='', learning_rate=0.1, max_delta_step=0,
             max_depth=3, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=340, n_jobs=8,
             num_parallel_tree=1, predictor='auto', random_state=2021,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)
Best model Parameters {'gamma': 0.01, 'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 340}
Best training model score, coefficient of determination R squared 0.4370026552468394


##### Test and Score Model

In [31]:
y_predict = target_scaler.inverse_transform(
    grid_xgb_milk.predict(xtest_scale).reshape(-1, 1))

mae_score = mean_absolute_error(Y_test, y_predict)

##### Save Model Score

In [32]:
model_scores_dataframe.loc[len(model_scores_dataframe)] = ['XGBOOST', mae_score]
model_scores_dataframe.head()

Unnamed: 0,Model,Mean Absolute Error Score(%)
0,Random Forest Regressor,1.10382
1,XGBOOST,1.16595


#### ANN Artificial Neural Network

##### Train Model

In [33]:
#Training & Keras Parameter Tuning

temp_directory: str = './../temp/ANN-tuner/'


# Define ANN model with Hyper parameter variable
def build_model(hp):
    model = keras.Sequential()
    for i in range(hp.Int('num_layers', 2, 23)):
        model.add(
            layers.Dense(units=hp.Int('units_' + str(i),
                                      min_value=23,
                                      max_value=600,
                                      step=32),
                         activation='relu'))
        model.add(layers.Dense(1, activation='linear'))
        model.compile(optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
                      loss='mean_absolute_error',
                      metrics=['mean_absolute_error'])
        return model


if os.path.isdir(temp_directory):
    try:
        shutil.rmtree(temp_directory)
    except OSError as exception:
        print(f"Error: {exception.filename} - {exception.strerror}.")

# create a directory to store each iteration of modelling
tuner = RandomSearch(build_model,
                     objective='val_mean_absolute_error',
                     max_trials=5,
                     executions_per_trial=3,
                     directory=temp_directory,
                     project_name='Milk production')

# Defined parameter space to search in
tuner.search_space_summary()

# train trial models and compare with validation set
tuner.search(xtrain_scale,
             ytrain_scale,
             epochs=50,
             validation_data=(xtest_scale, ytest_scale))

# print best 10 models according to val_mean_absolute_error
print('\n')
tuner.results_summary()

# get best model from training trials
bestANNModel = tuner.get_best_models(num_models=1)[0]

# fit best model to training scaled data and scaled test data
bestANNModel.fit(xtrain_scale,
                 ytrain_scale,
                 epochs=50,
                 validation_data=(xtest_scale, ytest_scale))
#Clean up
if os.path.isdir(temp_directory):
    try:
        shutil.rmtree(temp_directory)
    except OSError as exception:
        print(f"Error: {exception.filename} - {exception.strerror}.")

Trial 5 Complete [00h 00m 11s]
val_mean_absolute_error: 0.05927479391296705

Best val_mean_absolute_error So Far: 0.05437584097186724
Total elapsed time: 00h 01m 19s
INFO:tensorflow:Oracle triggered exit


Results summary
Results in ./../temp/ANN-tuner/Milk production
Showing 10 best trials
Objective(name='val_mean_absolute_error', direction='min')
Trial summary
Hyperparameters:
num_layers: 14
units_0: 215
learning_rate: 0.01
Score: 0.05437584097186724
Trial summary
Hyperparameters:
num_layers: 4
units_0: 55
learning_rate: 0.01
Score: 0.05927479391296705
Trial summary
Hyperparameters:
num_layers: 22
units_0: 279
learning_rate: 0.001
Score: 0.06580159316460292
Trial summary
Hyperparameters:
num_layers: 18
units_0: 471
learning_rate: 0.0001
Score: 0.07589597503344218
Trial summary
Hyperparameters:
num_layers: 9
units_0: 407
learning_rate: 0.0001
Score: 0.0782822494705518
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 1

##### Test and Score Model

In [34]:
# Predict Milk Production and un-scale back to original values

y_predict = target_scaler.inverse_transform(
    bestANNModel.predict(xtest_scale).reshape(-1, 1))

print('predicted milk production values \n', y_predict)
print('actual milk production values \n', Y_test)

# Calculate Mean Absolute Error
mae_score = mean_absolute_error(Y_test, y_predict)
#print(MAE_xgb)

predicted milk production values 
 [[ 3.9955287]
 [ 6.327525 ]
 [ 2.534208 ]
 [ 2.630229 ]
 [ 4.807317 ]
 [ 6.6609488]
 [ 4.9839983]
 [ 2.3957696]
 [ 3.3127103]
 [ 3.8197966]
 [ 3.550305 ]
 [ 1.9722961]
 [ 7.7193027]
 [ 2.9001968]
 [ 4.3407664]
 [ 4.6506767]
 [ 3.7580168]
 [ 4.641457 ]
 [ 3.3300593]
 [ 5.0384455]
 [ 6.1742516]
 [ 6.296391 ]
 [11.716482 ]
 [ 5.9296327]
 [ 3.7409413]
 [ 4.502376 ]
 [ 4.8580513]
 [ 4.568509 ]
 [ 4.69051  ]
 [ 3.397685 ]
 [ 6.999357 ]
 [ 2.6328554]
 [ 2.5847578]
 [ 4.6601524]
 [ 5.2183337]
 [ 4.503792 ]
 [ 2.4357126]
 [ 3.1486387]
 [ 3.1311932]
 [ 3.7285075]
 [ 1.9706503]
 [ 6.981721 ]
 [ 3.1721892]
 [ 4.260887 ]
 [ 4.3662143]
 [ 2.6511135]
 [ 4.90421  ]
 [ 2.95642  ]
 [ 4.754788 ]
 [ 4.9307137]
 [ 6.806156 ]
 [ 9.128482 ]
 [ 5.184237 ]
 [ 3.3168488]
 [ 3.6170993]
 [ 5.059202 ]
 [ 4.526993 ]
 [ 3.6838682]
 [ 2.8880286]
 [ 6.208103 ]
 [ 2.7835135]
 [ 2.723317 ]
 [ 5.167187 ]
 [ 4.3743353]
 [ 4.275838 ]
 [ 2.8210168]
 [ 2.8566432]
 [ 2.8518467]
 [ 2.9675229]

##### Save Model Score

In [35]:
model_scores_dataframe.loc[len(model_scores_dataframe)] = ['ANN', mae_score]
model_scores_dataframe.head()

Unnamed: 0,Model,Mean Absolute Error Score(%)
0,Random Forest Regressor,1.10382
1,XGBOOST,1.16595
2,ANN,0.67648


### Save Artifacts

Save trained model into binary pickle file to use the model later with new input data from web app

#### Save Models

In [37]:
filename: str = 'ann-model.h5'
ann_filepath: str = f'{directory}{filename}'
bestANNModel.save(ann_filepath, save_format='h5')