## Beef & cattle export model training

<!--
import data_analytics.github as github
print(github.create_jupyter_notebook_header("markcrowe-com", "agriculture-data-analytics", "notebooks/notebook-3-01-ml-beef-exports.ipynb", "master"))
-->
<table style="margin: auto;"><tr><td><a href="https://mybinder.org/v2/gh/markcrowe-com/agriculture-data-analytics/master?filepath=notebooks/notebook-3-01-ml-beef-exports.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open In Binder"/></a></td><td>online editors</td><td><a href="https://colab.research.google.com/github/markcrowe-com/agriculture-data-analytics/blob/master/notebooks/notebook-3-01-ml-beef-exports.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a></td></tr></table>

The objective is to build a machine learning (ML) model of the [bovine-tuberculosis-eda-output.csv](./../artifacts/bovine-tuberculosis-eda-output.csv).

### Setup

Import required third party Python libraries, import supporting functions and sets up data source file paths.

In [1]:
# Local
#!pip install -r script/requirements.txt 
# Remote option
#!pip install -r https://github.com/markcrowe-com/agriculture-data-analytics/blob/master/notebooks/script/requirements.txt --quiet --user
# Command Line options include:--quiet, --user

In [3]:
from agriculture_data_analytics.project_manager import *
from agriculture_data_analytics.dataframe_labels import *
import pandas 
import numpy
from sklearn.preprocessing import MinMaxScaler
#import data_analytics.exploratory_data_analysis as eda
#import data_analytics.exploratory_data_analysis_reports as eda_reports

In [None]:
artifact_manager: ProjectArtifactManager = ProjectArtifactManager()
artifact_manager.is_remote = True
github.display_jupyter_notebook_data_sources(
    [artifact_manager.get_cattle_beef_exports_eda_filepath()])
artifact_manager.is_remote = False

### Load dataframe

In [2]:
df = pd.read_csv("./../artifacts/cattle-beef-exports-eda-output.csv")
print("data dimensions \n",df.shape)
print()
print("data column info \n",df.info)

data dimensions 
 (32, 57)

data column info 
 <bound method DataFrame.info of     Unnamed: 0  Year          UNIT  Agricultural Output at Basic Prices  \
0            0  1990  Euro Million                               5200.0   
1            1  1991  Euro Million                               4994.4   
2            2  1992  Euro Million                               5374.0   
3            3  1993  Euro Million                               5625.9   
4            4  1994  Euro Million                               5781.5   
5            5  1995  Euro Million                               6035.5   
6            6  1996  Euro Million                               6134.7   
7            7  1997  Euro Million                               5753.1   
8            8  1998  Euro Million                               5831.7   
9            9  1999  Euro Million                               5651.4   
10          10  2000  Euro Million                               5985.5   
11          11  2001 

In [3]:
df.head()
#eda_reports.print_dataframe_analysis_report(df)

Unnamed: 0.1,Unnamed: 0,Year,UNIT,Agricultural Output at Basic Prices,All Cereals,All Crops,All Livestock,All Livestock Products,All Livestock Products - Milk,All Livestock Products Other Products (excluding Milk),...,Livestock - Horses,Livestock - Pig,Livestock - Poultry,Livestock - Sheep,Net Value Added at Basic Prices,Operating Surplus,Other Subsidies Less Taxes on Production,Subsidies less Taxes on Products,Subsidies on Products,Taxes on Products
0,0,1990,Euro Million,5200.0,216.7,1123.5,2201.9,1360.0,1316.3,43.7,...,83.2,237.2,117.1,188.8,2306.3,1943.5,14.8,333.9,408.9,75.0
1,1,1991,Euro Million,4994.4,218.6,1127.3,2114.0,1301.9,1258.9,43.0,...,70.2,242.1,125.5,198.9,2126.2,1774.0,10.3,279.1,357.3,78.2
2,2,1992,Euro Million,5374.0,235.4,1154.4,2262.5,1410.7,1373.1,37.6,...,61.4,280.4,124.5,192.8,2484.9,2179.4,31.4,366.6,446.0,79.5
3,3,1993,Euro Million,5625.9,165.7,1101.7,2449.6,1476.7,1439.0,37.7,...,84.5,257.9,116.0,232.4,2560.6,2247.9,26.3,398.4,466.4,68.0
4,4,1994,Euro Million,5781.5,127.3,1157.0,2319.3,1487.6,1446.2,41.4,...,75.9,264.5,134.6,230.2,2513.7,2278.5,110.0,612.3,666.0,53.7


## Production of Milk

In [4]:
## Extract milk production dataset
# drop redundunt columns
df = df.drop('Unnamed: 0',axis = 1)

# extract milk dataset
df_milk = df[['Year',
#              'UNIT',
              'All Livestock Products - Milk',
              'Taxes on Products',
              'Subsidies on Products',
              'Compensation of Employees',
              'Contract Work',
              'Entrepreneurial Income',
              'Factor Income',
              'Fixed Capital Consumption - Farm Buildings',
              'Fixed Capital Consumption - Machinery, Equipment, etc',
              'Interest less FISIM',
              'Operating Surplus',
              'Livestock - Cattle',
              'Livestock - Sheep',
              'Land Rental',
              'Intermediate Consumption - Contract Work',
              'Intermediate Consumption - Crop Protection Products',
              'Intermediate Consumption - Energy and Lubricants',
              'Intermediate Consumption - Feeding Stuffs',
              'Intermediate Consumption - Fertilisers',
              'Intermediate Consumption - Financial Intermediation Services Indirect',
              'Intermediate Consumption - Forage Plants',
              'Intermediate Consumption - Maintenance and Repairs',
              'Intermediate Consumption - Seeds',
              #'Intermediate Consumption - Services',
              'Intermediate Consumption - Veterinary Expenses',
              'Intermediate Consumption - Other Goods (Detergents, Small Tools, etc)',
              #'Intermediate Consumption - Other Goods and Services'
              
             ]]
# Assign year as index
df_milk.set_index('Year',drop=True,inplace=True)

print("Milk production dataset dimenssions \n", df_milk.shape)

Milk production dataset dimenssions 
 (32, 25)


In [5]:
#eda_reports.print_dataframe_analysis_report(df_milk)

### Define 20% Training set 80% Test set

In [13]:
# define target & feature variables

X = df_milk.iloc[:,2:].values
Y = df_milk.iloc[:,1].values.reshape(-1,1)
print('features shape ',np.shape(X))
print('target shape ',np.shape(Y))

# impute mean value for NA
from sklearn.impute import SimpleImputer
imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
X = imp_mean.fit_transform(X)
Y = imp_mean.fit_transform(Y)


# split train test split 20
from sklearn.model_selection import train_test_split
X_train, X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2,random_state=2021)
print()
print('x_train shape ', X_train.shape)
print('y_train shape ', Y_train.shape)
print()
print('x_test shape ', X_test.shape)
print('y_test shape ', Y_test.shape)

features shape  (32, 23)
target shape  (32, 1)

x_train shape  (25, 23)
y_train shape  (25, 1)

x_test shape  (7, 23)
y_test shape  (7, 1)


In [14]:
scaler_x = MinMaxScaler()
scaler_y = MinMaxScaler()
print(scaler_x.fit(X_train))
xtrain_scale=scaler_x.transform(X_train)

print(scaler_y.fit(Y_train))
ytrain_scale=scaler_y.transform(Y_train)

print(scaler_x.fit(X_test))
xtest_scale=scaler_x.transform(X_test)

print(scaler_x.fit(Y_test))
ytest_scale=scaler_y.transform(Y_test)

MinMaxScaler()
MinMaxScaler()
MinMaxScaler()
MinMaxScaler()


In [11]:
# fill NAN values with the average  mean scaled
#np.isnan(np.sum(xtrain_scale))
#xtrain_scale[np.isnan(xtrain_scale)==True]= np.nanmean(xtrain_scale)
#np.isnan(np.sum(ytrain_scale))
#ytrain_scale[np.isnan(ytrain_scale)==True]= np.nanmean(ytrain_scale)
#np.isnan(np.sum(xtest_scale))
#xtest_scale[np.isnan(xtest_scale)==True]= np.nanmean(xtest_scale)
#np.isnan(np.sum(ytest_scale))
#ytest_scale[np.isnan(ytest_scale)==True]= np.nanmean(ytest_scale)

### Model 1 RandomForest Regressor

In [19]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor


In [41]:
rf_model_milk = RandomForestRegressor(random_state=2021)

In [71]:
params_rf_milk = {'n_estimators':[100,200,500],
                  'criterion':['squared_error', 'absolute_error', 'poisson'],
                  'max_features':["auto"]
                  
            }

In [72]:
GS_rf_milk = GridSearchCV(estimator= rf_model_milk,
                     param_grid= params_rf_milk
                     )

In [73]:
GS_rf_milk.fit(xtrain_scale,ytrain_scale.reshape(-1))

Traceback (most recent call last):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 387, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  

Traceback (most recent call last):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 387, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  

Traceback (most recent call last):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 387, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  

Traceback (most recent call last):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 387, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  

Traceback (most recent call last):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 387, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  

Traceback (most recent call last):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 387, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  

Traceback (most recent call last):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 387, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  

Traceback (most recent call last):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 387, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  

Traceback (most recent call last):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 387, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/admin/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  

GridSearchCV(estimator=RandomForestRegressor(random_state=2021),
             param_grid={'criterion': ['squared_error', 'absolute_error',
                                       'poisson'],
                         'max_features': ['auto'],
                         'n_estimators': [100, 200, 500, 1000]})

In [74]:
# print best model
print(GS_rf_milk.best_estimator_)
print('Best model score', GS_rf_milk.best_score_)

RandomForestRegressor(criterion='poisson', n_estimators=1000, random_state=2021)
Best model score -2.151782053681818


### Model 2 XGBOOST Regressor

In [15]:
# xgboost
!pip install xgboost



In [16]:
from xgboost import XGBRegressor
xgb_model_milk = XGBRegressor(random_state=2021)


In [20]:
# make a search space of parameters to loop over

params_xgb_milk = {'n_estimators':[20,40,80,160,340,500],
             'max_depth':[3,6,9],
             'gamma':[0.01,0.1],
             'learning_rate':[0.001,0.01,0.1,1]
             }

In [21]:
GS_xgb_milk = GridSearchCV(estimator=xgb_model_milk,
                     param_grid=params_xgb_milk,
                     #n_jobs=-1,
                     scoring=['r2','neg_root_mean_squared_error'],
                     refit= 'r2',
                     cv=5,
                     verbose=4
                     )

In [22]:
GS_xgb_milk.fit(xtrain_scale,ytrain_scale);

Fitting 5 folds for each of 144 candidates, totalling 720 fits
[CV 1/5] END gamma=0.01, learning_rate=0.001, max_depth=3, n_estimators=20; neg_root_mean_squared_error: (test=-0.280) r2: (test=-1.607) total time=   0.1s
[CV 2/5] END gamma=0.01, learning_rate=0.001, max_depth=3, n_estimators=20; neg_root_mean_squared_error: (test=-0.305) r2: (test=-23.320) total time=   0.0s
[CV 3/5] END gamma=0.01, learning_rate=0.001, max_depth=3, n_estimators=20; neg_root_mean_squared_error: (test=-0.186) r2: (test=-2.664) total time=   0.0s
[CV 4/5] END gamma=0.01, learning_rate=0.001, max_depth=3, n_estimators=20; neg_root_mean_squared_error: (test=-0.314) r2: (test=-8.101) total time=   0.0s
[CV 5/5] END gamma=0.01, learning_rate=0.001, max_depth=3, n_estimators=20; neg_root_mean_squared_error: (test=-0.364) r2: (test=-0.126) total time=   0.0s
[CV 1/5] END gamma=0.01, learning_rate=0.001, max_depth=3, n_estimators=40; neg_root_mean_squared_error: (test=-0.276) r2: (test=-1.527) total time=   0.0s


[CV 4/5] END gamma=0.01, learning_rate=0.001, max_depth=6, n_estimators=340; neg_root_mean_squared_error: (test=-0.298) r2: (test=-7.200) total time=   0.2s
[CV 5/5] END gamma=0.01, learning_rate=0.001, max_depth=6, n_estimators=340; neg_root_mean_squared_error: (test=-0.347) r2: (test=-0.022) total time=   0.1s
[CV 1/5] END gamma=0.01, learning_rate=0.001, max_depth=6, n_estimators=500; neg_root_mean_squared_error: (test=-0.198) r2: (test=-0.299) total time=   0.3s
[CV 2/5] END gamma=0.01, learning_rate=0.001, max_depth=6, n_estimators=500; neg_root_mean_squared_error: (test=-0.199) r2: (test=-9.278) total time=   0.3s
[CV 3/5] END gamma=0.01, learning_rate=0.001, max_depth=6, n_estimators=500; neg_root_mean_squared_error: (test=-0.176) r2: (test=-2.295) total time=   0.3s
[CV 4/5] END gamma=0.01, learning_rate=0.001, max_depth=6, n_estimators=500; neg_root_mean_squared_error: (test=-0.299) r2: (test=-7.228) total time=   0.3s
[CV 5/5] END gamma=0.01, learning_rate=0.001, max_depth=6,

[CV 3/5] END gamma=0.01, learning_rate=0.01, max_depth=3, n_estimators=160; neg_root_mean_squared_error: (test=-0.248) r2: (test=-5.553) total time=   0.1s
[CV 4/5] END gamma=0.01, learning_rate=0.01, max_depth=3, n_estimators=160; neg_root_mean_squared_error: (test=-0.352) r2: (test=-10.416) total time=   0.1s
[CV 5/5] END gamma=0.01, learning_rate=0.01, max_depth=3, n_estimators=160; neg_root_mean_squared_error: (test=-0.340) r2: (test=0.019) total time=   0.1s
[CV 1/5] END gamma=0.01, learning_rate=0.01, max_depth=3, n_estimators=340; neg_root_mean_squared_error: (test=-0.096) r2: (test=0.695) total time=   0.2s
[CV 2/5] END gamma=0.01, learning_rate=0.01, max_depth=3, n_estimators=340; neg_root_mean_squared_error: (test=-0.087) r2: (test=-0.963) total time=   0.1s
[CV 3/5] END gamma=0.01, learning_rate=0.01, max_depth=3, n_estimators=340; neg_root_mean_squared_error: (test=-0.303) r2: (test=-8.768) total time=   0.2s
[CV 4/5] END gamma=0.01, learning_rate=0.01, max_depth=3, n_estim

[CV 5/5] END gamma=0.01, learning_rate=0.01, max_depth=9, n_estimators=80; neg_root_mean_squared_error: (test=-0.337) r2: (test=0.035) total time=   0.1s
[CV 1/5] END gamma=0.01, learning_rate=0.01, max_depth=9, n_estimators=160; neg_root_mean_squared_error: (test=-0.110) r2: (test=0.600) total time=   0.1s
[CV 2/5] END gamma=0.01, learning_rate=0.01, max_depth=9, n_estimators=160; neg_root_mean_squared_error: (test=-0.096) r2: (test=-1.392) total time=   0.1s
[CV 3/5] END gamma=0.01, learning_rate=0.01, max_depth=9, n_estimators=160; neg_root_mean_squared_error: (test=-0.245) r2: (test=-5.380) total time=   0.1s
[CV 4/5] END gamma=0.01, learning_rate=0.01, max_depth=9, n_estimators=160; neg_root_mean_squared_error: (test=-0.352) r2: (test=-10.410) total time=   0.1s
[CV 5/5] END gamma=0.01, learning_rate=0.01, max_depth=9, n_estimators=160; neg_root_mean_squared_error: (test=-0.340) r2: (test=0.019) total time=   0.1s
[CV 1/5] END gamma=0.01, learning_rate=0.01, max_depth=9, n_estimat

[CV 3/5] END gamma=0.01, learning_rate=0.1, max_depth=6, n_estimators=40; neg_root_mean_squared_error: (test=-0.298) r2: (test=-8.438) total time=   0.0s
[CV 4/5] END gamma=0.01, learning_rate=0.1, max_depth=6, n_estimators=40; neg_root_mean_squared_error: (test=-0.406) r2: (test=-14.211) total time=   0.0s
[CV 5/5] END gamma=0.01, learning_rate=0.1, max_depth=6, n_estimators=40; neg_root_mean_squared_error: (test=-0.346) r2: (test=-0.017) total time=   0.0s
[CV 1/5] END gamma=0.01, learning_rate=0.1, max_depth=6, n_estimators=80; neg_root_mean_squared_error: (test=-0.093) r2: (test=0.713) total time=   0.1s
[CV 2/5] END gamma=0.01, learning_rate=0.1, max_depth=6, n_estimators=80; neg_root_mean_squared_error: (test=-0.083) r2: (test=-0.795) total time=   0.1s
[CV 3/5] END gamma=0.01, learning_rate=0.1, max_depth=6, n_estimators=80; neg_root_mean_squared_error: (test=-0.297) r2: (test=-8.404) total time=   0.1s
[CV 4/5] END gamma=0.01, learning_rate=0.1, max_depth=6, n_estimators=80; ne

[CV 5/5] END gamma=0.01, learning_rate=1, max_depth=3, n_estimators=40; neg_root_mean_squared_error: (test=-0.339) r2: (test=0.024) total time=   0.0s
[CV 1/5] END gamma=0.01, learning_rate=1, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.108) r2: (test=0.612) total time=   0.1s
[CV 2/5] END gamma=0.01, learning_rate=1, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.104) r2: (test=-1.811) total time=   0.0s
[CV 3/5] END gamma=0.01, learning_rate=1, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.330) r2: (test=-10.561) total time=   0.0s
[CV 4/5] END gamma=0.01, learning_rate=1, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.405) r2: (test=-14.120) total time=   0.0s
[CV 5/5] END gamma=0.01, learning_rate=1, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.339) r2: (test=0.024) total time=   0.0s
[CV 1/5] END gamma=0.01, learning_rate=1, max_depth=3, n_estimators=160; neg_root_mean_sq

[CV 2/5] END gamma=0.01, learning_rate=1, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.104) r2: (test=-1.811) total time=   0.0s
[CV 3/5] END gamma=0.01, learning_rate=1, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.330) r2: (test=-10.561) total time=   0.0s
[CV 4/5] END gamma=0.01, learning_rate=1, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.405) r2: (test=-14.120) total time=   0.0s
[CV 5/5] END gamma=0.01, learning_rate=1, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.339) r2: (test=0.024) total time=   0.0s
[CV 1/5] END gamma=0.01, learning_rate=1, max_depth=9, n_estimators=80; neg_root_mean_squared_error: (test=-0.108) r2: (test=0.612) total time=   0.1s
[CV 2/5] END gamma=0.01, learning_rate=1, max_depth=9, n_estimators=80; neg_root_mean_squared_error: (test=-0.104) r2: (test=-1.811) total time=   0.1s
[CV 3/5] END gamma=0.01, learning_rate=1, max_depth=9, n_estimators=80; neg_root_mean_sq

[CV 3/5] END gamma=0.1, learning_rate=0.001, max_depth=6, n_estimators=40; neg_root_mean_squared_error: (test=-0.184) r2: (test=-2.585) total time=   0.0s
[CV 4/5] END gamma=0.1, learning_rate=0.001, max_depth=6, n_estimators=40; neg_root_mean_squared_error: (test=-0.312) r2: (test=-8.003) total time=   0.0s
[CV 5/5] END gamma=0.1, learning_rate=0.001, max_depth=6, n_estimators=40; neg_root_mean_squared_error: (test=-0.362) r2: (test=-0.115) total time=   0.0s
[CV 1/5] END gamma=0.1, learning_rate=0.001, max_depth=6, n_estimators=80; neg_root_mean_squared_error: (test=-0.267) r2: (test=-1.376) total time=   0.1s
[CV 2/5] END gamma=0.1, learning_rate=0.001, max_depth=6, n_estimators=80; neg_root_mean_squared_error: (test=-0.289) r2: (test=-20.832) total time=   0.1s
[CV 3/5] END gamma=0.1, learning_rate=0.001, max_depth=6, n_estimators=80; neg_root_mean_squared_error: (test=-0.180) r2: (test=-2.450) total time=   0.1s
[CV 4/5] END gamma=0.1, learning_rate=0.001, max_depth=6, n_estimator

[CV 5/5] END gamma=0.1, learning_rate=0.01, max_depth=3, n_estimators=40; neg_root_mean_squared_error: (test=-0.344) r2: (test=-0.004) total time=   0.0s
[CV 1/5] END gamma=0.1, learning_rate=0.01, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.185) r2: (test=-0.135) total time=   0.1s
[CV 2/5] END gamma=0.1, learning_rate=0.01, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.169) r2: (test=-6.459) total time=   0.0s
[CV 3/5] END gamma=0.1, learning_rate=0.01, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.197) r2: (test=-3.120) total time=   0.0s
[CV 4/5] END gamma=0.1, learning_rate=0.01, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.313) r2: (test=-8.048) total time=   0.0s
[CV 5/5] END gamma=0.1, learning_rate=0.01, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.340) r2: (test=0.019) total time=   0.0s
[CV 1/5] END gamma=0.1, learning_rate=0.01, max_depth=3, n_estimators=160; ne

[CV 3/5] END gamma=0.1, learning_rate=0.01, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.173) r2: (test=-2.182) total time=   0.0s
[CV 4/5] END gamma=0.1, learning_rate=0.01, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.299) r2: (test=-7.232) total time=   0.0s
[CV 5/5] END gamma=0.1, learning_rate=0.01, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.344) r2: (test=-0.004) total time=   0.0s
[CV 1/5] END gamma=0.1, learning_rate=0.01, max_depth=9, n_estimators=80; neg_root_mean_squared_error: (test=-0.185) r2: (test=-0.135) total time=   0.1s
[CV 2/5] END gamma=0.1, learning_rate=0.01, max_depth=9, n_estimators=80; neg_root_mean_squared_error: (test=-0.169) r2: (test=-6.459) total time=   0.1s
[CV 3/5] END gamma=0.1, learning_rate=0.01, max_depth=9, n_estimators=80; neg_root_mean_squared_error: (test=-0.197) r2: (test=-3.120) total time=   0.1s
[CV 4/5] END gamma=0.1, learning_rate=0.01, max_depth=9, n_estimators=80; ne

[CV 3/5] END gamma=0.1, learning_rate=0.1, max_depth=6, n_estimators=40; neg_root_mean_squared_error: (test=-0.208) r2: (test=-3.604) total time=   0.0s
[CV 4/5] END gamma=0.1, learning_rate=0.1, max_depth=6, n_estimators=40; neg_root_mean_squared_error: (test=-0.271) r2: (test=-5.759) total time=   0.0s
[CV 5/5] END gamma=0.1, learning_rate=0.1, max_depth=6, n_estimators=40; neg_root_mean_squared_error: (test=-0.356) r2: (test=-0.075) total time=   0.0s
[CV 1/5] END gamma=0.1, learning_rate=0.1, max_depth=6, n_estimators=80; neg_root_mean_squared_error: (test=-0.146) r2: (test=0.294) total time=   0.1s
[CV 2/5] END gamma=0.1, learning_rate=0.1, max_depth=6, n_estimators=80; neg_root_mean_squared_error: (test=-0.082) r2: (test=-0.752) total time=   0.1s
[CV 3/5] END gamma=0.1, learning_rate=0.1, max_depth=6, n_estimators=80; neg_root_mean_squared_error: (test=-0.208) r2: (test=-3.615) total time=   0.1s
[CV 4/5] END gamma=0.1, learning_rate=0.1, max_depth=6, n_estimators=80; neg_root_m

[CV 5/5] END gamma=0.1, learning_rate=1, max_depth=3, n_estimators=40; neg_root_mean_squared_error: (test=-0.359) r2: (test=-0.094) total time=   0.0s
[CV 1/5] END gamma=0.1, learning_rate=1, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.120) r2: (test=0.522) total time=   0.0s
[CV 2/5] END gamma=0.1, learning_rate=1, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.063) r2: (test=-0.047) total time=   0.0s
[CV 3/5] END gamma=0.1, learning_rate=1, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.248) r2: (test=-5.516) total time=   0.1s
[CV 4/5] END gamma=0.1, learning_rate=1, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.333) r2: (test=-9.210) total time=   0.0s
[CV 5/5] END gamma=0.1, learning_rate=1, max_depth=3, n_estimators=80; neg_root_mean_squared_error: (test=-0.359) r2: (test=-0.094) total time=   0.0s
[CV 1/5] END gamma=0.1, learning_rate=1, max_depth=3, n_estimators=160; neg_root_mean_squared_e

[CV 2/5] END gamma=0.1, learning_rate=1, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.063) r2: (test=-0.047) total time=   0.0s
[CV 3/5] END gamma=0.1, learning_rate=1, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.248) r2: (test=-5.516) total time=   0.0s
[CV 4/5] END gamma=0.1, learning_rate=1, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.333) r2: (test=-9.210) total time=   0.0s
[CV 5/5] END gamma=0.1, learning_rate=1, max_depth=9, n_estimators=40; neg_root_mean_squared_error: (test=-0.359) r2: (test=-0.094) total time=   0.0s
[CV 1/5] END gamma=0.1, learning_rate=1, max_depth=9, n_estimators=80; neg_root_mean_squared_error: (test=-0.120) r2: (test=0.522) total time=   0.0s
[CV 2/5] END gamma=0.1, learning_rate=1, max_depth=9, n_estimators=80; neg_root_mean_squared_error: (test=-0.063) r2: (test=-0.047) total time=   0.0s
[CV 3/5] END gamma=0.1, learning_rate=1, max_depth=9, n_estimators=80; neg_root_mean_squared_er

In [23]:
# print best model
print(GS_xgb_milk.best_estimator_)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
             gamma=0.1, gpu_id=-1, importance_type=None,
             interaction_constraints='', learning_rate=0.1, max_delta_step=0,
             max_depth=3, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=340, n_jobs=4,
             num_parallel_tree=1, predictor='auto', random_state=2021,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)


In [24]:
# print best parameters
print('Best model Parameters',GS_xgb_milk.best_params_)
# best score
print('Best model R2 score',GS_xgb_milk.best_score_)

# write the Grid Search results to csv to choose best model with least resource consumption
GS_xgb_df_milk = pd.DataFrame(GS_xgb_milk.cv_results_)
GS_xgb_df_milk = GS_xgb_df_milk.sort_values('rank_test_r2')
#GS_xgb_df_milk.to_csv('./../artifacts/grid-search-xgb-milk-results.csv')

Best model Parameters {'gamma': 0.1, 'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 340}
Best model R2 score -1.9548865307455963


In [35]:
print(GS_xgb_milk.predict(X_test))
print(Y_test)

[0.28771797 0.28771797 0.28771797 0.28771797 0.28771797 0.28771797
 0.28771797]
[[36.8]
 [39.3]
 [25. ]
 [53.7]
 [27.9]
 [39.4]
 [49.5]]


## ANN

In [None]:
#!pip install --upgrade tensorflow
import math
import matplotlib.pyplot as plt
#import numpy as np
from numpy.random import seed
seed(2021)
#import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import tensorflow
tensorflow.random.set_seed(1)
from tensorflow.python.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.wrappers.scikit_learn import KerasRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
#from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

In [None]:
model = Sequential()
# input layers  = Number of features in the training set + 1
model.add(Dense(24, input_dim=24, kernel_initializer='normal', activation='relu'))
# hidden layers = Training Data Samples/Factor * (Input Neurons + Output Neurons)
model.add(Dense(30, activation='relu'))
model.add(Dense(1, activation='linear'))
model.summary()

In [None]:
model.compile(loss='mse', optimizer='adam', metrics=['mse','mae'])
history=model.fit(xtrain_scale, ytrain_scale, epochs=30, batch_size=150, verbose=1, validation_split=0.2)
predictions = model.predict(xtest_scale)

In [None]:
print(history.history.keys())
# "Loss"
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

In [None]:
predictions = scaler_y.inverse_transform(predictions)
predictions

## ANN hyper parameter tuning

In [None]:
#!pip install tensorflow
!pip install keras-tuner
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras_tuner.tuners import RandomSearch

In [38]:
def build_model(hp):
    model= keras.Sequential()
    for i in range(hp.Int('num_layers',2,23)):
        model.add(layers.Dense(units=hp.Int('units_' + str(i),
                                           min_value=23,
                                           max_value=600,
                                           step=32),
                              activation='relu'))
        model.add(layers.Dense(1,activation='linear'))
        model.compile(
            optimizer=keras.optimizers.Adam(
                hp.Choice('learning_rate',[1e-2,1e-3,1e-4])),
        loss='mean_absolute_error',
        metrics=['mean_absolute_error'])
        return model

In [39]:
# create a directory to store each iteration of modelling
tuner = RandomSearch(
        build_model,
        objective='val_mean_absolute_error',
        max_trials=5,
        executions_per_trial=3,
        directory='CA2',
        project_name='Milk production forecast')

2022-01-20 21:01:52.615371: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [84]:
# parameter space to search in
tuner.search_space_summary()

Search space summary
Default search space size: 3
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 2, 'max_value': 23, 'step': 1, 'sampling': None}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': None}
learning_rate (Choice)
{'default': 0.01, 'conditions': [], 'values': [0.01, 0.001, 0.0001], 'ordered': True}


In [85]:
# train the model
tuner.search(xtrain_scale,ytrain_scale,epochs=20,validation_data=(xtest_scale,ytest_scale))

Trial 5 Complete [00h 00m 04s]
val_mean_absolute_error: 0.06291941305001576

Best val_mean_absolute_error So Far: 0.05973196029663086
Total elapsed time: 00h 00m 24s
INFO:tensorflow:Oracle triggered exit


In [86]:
# print best 10 models according to previously selected metric
tuner.results_summary()

Results summary
Results in CA2/Milk production forecast
Showing 10 best trials
Objective(name='val_mean_absolute_error', direction='min')
Trial summary
Hyperparameters:
num_layers: 4
units_0: 160
learning_rate: 0.01
Score: 0.05973196029663086
Trial summary
Hyperparameters:
num_layers: 15
units_0: 512
learning_rate: 0.01
Score: 0.06291941305001576
Trial summary
Hyperparameters:
num_layers: 6
units_0: 448
learning_rate: 0.01
Score: 0.06576412667830785
Trial summary
Hyperparameters:
num_layers: 22
units_0: 96
learning_rate: 0.01
Score: 0.10544607043266296
Trial summary
Hyperparameters:
num_layers: 20
units_0: 96
learning_rate: 0.0001
Score: 0.14712721357742944
