# Machine Learning with SAP Datasphere, Hands-On Workshop
## Create first forecast

Retrieve the credentials to connect to SAP Datasphere

In [None]:
import json

# Read the connection details from the local credentials file
with open('credentials.json', 'r') as file:
    credentials = json.load(file)
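
The cell above expects `credentials.json` to be a flat JSON object containing the four keys that the connection cell reads. The sketch below checks that structure before connecting. It assumes that layout, and all values shown are placeholders rather than real endpoints (443 is the port SAP HANA Cloud normally uses for SQL connections):

```python
import json

# Placeholder content only; the real file holds your own database user's credentials.
EXAMPLE_CREDENTIALS = """
{
  "hana_address": "<your-host>",
  "hana_port": 443,
  "hana_user": "<your-user>",
  "hana_password": "<your-password>"
}
"""

# The keys the ConnectionContext cell reads from the dictionary
REQUIRED_KEYS = {"hana_address", "hana_port", "hana_user", "hana_password"}


def load_credentials(text: str) -> dict:
    """Parse the JSON text and fail early if a required key is missing."""
    creds = json.loads(text)
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise KeyError(f"credentials.json is missing: {sorted(missing)}")
    return creds


credentials_example = load_credentials(EXAMPLE_CREDENTIALS)
```

A failure at this stage gives a clearer message than a failed connection attempt later on.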

Establish a connection with SAP Datasphere

In [None]:
import hana_ml.dataframe as dataframe
conn = dataframe.ConnectionContext(address  = credentials['hana_address'],
                                   port     = credentials['hana_port'], 
                                   user     = credentials['hana_user'], 
                                   password = credentials['hana_password'], 
                                  )
conn.connection.isconnected()

Point a hana_ml DataFrame to the view in SAP Datasphere that was created in the previous notebook.

In [None]:
df_remote = conn.table('V2_LUCERNEELECTRICITY')

Retrieve and display a few rows of data from SAP Datasphere

In [None]:
df_remote.head(5).collect()

Split the data into a training set and a test set. The Machine Learning model is trained on one part of the data (df_rem_train), and the accuracy of its forecast is then tested on the other part (df_rem_test).

In [None]:
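# Training data: after 2022-01-01, up to (but not including) 2023-06-25
df_rem_train = df_remote.filter(''' "DATEHOUR" > '2022-01-01' AND "DATEHOUR" < '2023-06-25' ''')
# Test data: 2023-06-25 and 2023-06-26
df_rem_test = df_remote.filter(''' "DATEHOUR" >= '2023-06-25' AND "DATEHOUR" < '2023-06-27' ''')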
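
To see what the two filters select, the sketch below applies the same time boundaries to a small, made-up hourly pandas DataFrame that stands in for `V2_LUCERNEELECTRICITY` (the data is synthetic, not from the workshop). For time series the split is by time, not random, so the model never trains on records that come after the test period. Note that the strict `>` on the lower bound excludes a record stamped exactly 2022-01-01 00:00.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series standing in for the Datasphere view (values are made up)
hours = pd.date_range("2023-06-20 00:00", "2023-06-27 23:00", freq=pd.offsets.Hour())
df_local = pd.DataFrame({"DATEHOUR": hours,
                         "CONSUMPTION_H": np.arange(len(hours), dtype=float)})

# Same boundaries as the hana_ml filters
train_start = pd.Timestamp("2022-01-01")
split_point = pd.Timestamp("2023-06-25")
test_end = pd.Timestamp("2023-06-27")

train = df_local[(df_local["DATEHOUR"] > train_start) & (df_local["DATEHOUR"] < split_point)]
test = df_local[(df_local["DATEHOUR"] >= split_point) & (df_local["DATEHOUR"] < test_end)]

# The sets are disjoint and ordered in time: no future data leaks into training
assert train["DATEHOUR"].max() < test["DATEHOUR"].min()
```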

Train the Machine Learning model on the training data. We use the AdditiveModelForecast algorithm, which is part of the Predictive Analysis Library (PAL) in SAP HANA Cloud. This algorithm is based on the same concept as Facebook's Prophet, a popular algorithm for time series forecasting. See the documentation at https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2023_2_QRC/en-US/pal/algorithms/hana_ml.algorithms.pal.tsa.additive_model_forecast.AdditiveModelForecast.html

In [None]:
from hana_ml.algorithms.pal.tsa.additive_model_forecast import AdditiveModelForecast
amf = AdditiveModelForecast()
amf.fit(data=df_rem_train.drop('HOUR'))
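
The Prophet concept decomposes a series additively, roughly y(t) = g(t) + s(t) + h(t) + error, where g is the trend, s the seasonality, and h the holiday effects. The sketch below is a strongly simplified illustration of that idea and not PAL's implementation. It fits only a linear trend plus a daily seasonality built from Fourier terms, estimated by ordinary least squares with NumPy. Prophet additionally models trend changepoints, weekly and yearly seasonality, and holidays. The function names are hypothetical and the data is synthetic.

```python
import numpy as np


def fourier_terms(t_hours: np.ndarray, period: float, order: int) -> np.ndarray:
    """Sine/cosine columns for a seasonality with the given period (in hours)."""
    cols = []
    for k in range(1, order + 1):
        angle = 2 * np.pi * k * t_hours / period
        cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)


def design_matrix(t_hours: np.ndarray, order: int = 3) -> np.ndarray:
    """Intercept and linear trend g(t), plus daily seasonality s(t)."""
    return np.column_stack([np.ones_like(t_hours), t_hours,
                            fourier_terms(t_hours, 24.0, order)])


def fit_additive(t_hours: np.ndarray, y: np.ndarray, order: int = 3) -> np.ndarray:
    """Least-squares estimate of the trend and seasonality coefficients."""
    beta, *_ = np.linalg.lstsq(design_matrix(t_hours, order), y, rcond=None)
    return beta


def predict_additive(beta: np.ndarray, t_hours: np.ndarray, order: int = 3) -> np.ndarray:
    """Forecast = trend + seasonality evaluated at future time points."""
    return design_matrix(t_hours, order) @ beta


# Synthetic noise-free series: trend plus daily cycle, two weeks of hourly points
t_train = np.arange(24 * 14, dtype=float)
y_train = 100 + 0.05 * t_train + 10 * np.sin(2 * np.pi * t_train / 24)

beta = fit_additive(t_train, y_train)
t_future = np.arange(24 * 14, 24 * 16, dtype=float)  # the next two days
y_future = predict_additive(beta, t_future)
```

Because the synthetic series contains no noise and lies exactly inside the model's function space, the forecast reproduces the true continuation. Real consumption data will of course leave residual error.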

Look at the trained model

In [None]:
import json
json.loads(amf.model_.select('MODEL_CONTENT').collect().iloc[0,0])

Predict the consumption for the time period of the test dataset

In [None]:
df_rem_predicted = amf.predict(data=df_rem_test)
df_rem_predicted.head(5).collect()

Combine the known consumption in the test data with the prediction to assess the forecast accuracy.

In [None]:
df_rem_predicted = df_rem_test.set_index('DATEHOUR').join(df_rem_predicted.set_index('DATEHOUR'))
df_rem_predicted.head(5).collect()
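
The join above aligns each actual value with the forecast for the same timestamp. Here is a local pandas analogue of that step, using an inner join on the `DATEHOUR` index and made-up numbers. A forecast timestamp without a matching actual is dropped, so every remaining row can be scored.

```python
import pandas as pd

# Illustrative values only
actuals = pd.DataFrame({
    "DATEHOUR": pd.to_datetime(["2023-06-25 00:00", "2023-06-25 01:00"]),
    "CONSUMPTION_H": [120.0, 110.0],
})
forecast = pd.DataFrame({
    "DATEHOUR": pd.to_datetime(["2023-06-25 00:00", "2023-06-25 01:00", "2023-06-25 02:00"]),
    "YHAT": [118.0, 113.0, 105.0],
    "YHAT_LOWER": [108.0, 103.0, 95.0],
    "YHAT_UPPER": [128.0, 123.0, 115.0],
})

# Match rows on the timestamp; only timestamps present in both frames survive
combined = (actuals.set_index("DATEHOUR")
            .join(forecast.set_index("DATEHOUR"), how="inner")
            .reset_index())
```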

Plot the predicted values versus the actual values to visually compare the actuals with the forecast.

In [None]:
# Download the sorted, joined data to a local pandas DataFrame for plotting
df_data = df_rem_predicted.sort("DATEHOUR").collect()

from matplotlib import pyplot as plt
plt.plot(df_data['DATEHOUR'], df_data['YHAT'])            # forecast
plt.plot(df_data['DATEHOUR'], df_data['CONSUMPTION_H'])   # actual consumption
# Shaded band between the lower and upper forecast bounds
plt.fill_between(df_data['DATEHOUR'], df_data['YHAT_LOWER'], df_data['YHAT_UPPER'], alpha=.3)
plt.legend(['Forecast', 'Actual'])
plt.xticks(rotation=45);

Calculate an error metric. We ask for the MAPE, which stands for "Mean Absolute Percentage Error". Other error metrics are listed at https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2023_2_QRC/en-US/pal/algorithms/hana_ml.algorithms.pal.tsa.accuracy_measure.accuracy_measure.html

In [None]:
from hana_ml.algorithms.pal.tsa.accuracy_measure import accuracy_measure
# Compare the actual values (first column) with the forecast (second column)
accuracy_measure(df_rem_predicted.select(['CONSUMPTION_H', 'YHAT']),
                 evaluation_metric='mape').collect()
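
MAPE averages the relative deviation of the forecast from the actual value: MAPE = mean(|actual - forecast| / |actual|). The NumPy sketch below computes that formula locally for a few made-up numbers. It returns a fraction, so multiply by 100 for a percentage. Whether PAL reports the value as a fraction or as a percent is not assumed here; check the linked documentation when comparing the two results.

```python
import numpy as np


def mape(actual, forecast) -> float:
    """Mean absolute percentage error as a fraction (multiply by 100 for percent).

    MAPE is undefined when an actual value is 0, so such inputs are rejected.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined for zero actual values")
    return float(np.mean(np.abs((actual - forecast) / actual)))


# Relative errors 0.10, 0.05 and 0.00 average to 0.05, i.e. 5 %
score = mape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

Because each error is divided by the actual value, hours with low consumption weigh more heavily than hours with high consumption.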

Combine the training dataset with the predicted data. This UNION of hana_ml DataFrames requires both DataFrames to have the same structure. Hence the training data is extended with the columns that hold the predictions, filled with NULL.

In [None]:
df_rem_train = df_rem_train.select('*', ('NULL', 'YHAT'),
                                  ('NULL', 'YHAT_LOWER'),
                                  ('NULL', 'YHAT_UPPER')
                                  )
df_rem_all = df_rem_predicted.union(df_rem_train)
df_rem_all.head(5).collect()
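
The same idea in local pandas terms, with made-up values: the training rows get empty forecast columns and are then stacked underneath the predicted rows. A SQL UNION matches columns by position, not by name, so the sketch also reorders the training columns to match the predicted frame before stacking. `pd.concat` keeps duplicate rows, like UNION ALL.

```python
import numpy as np
import pandas as pd

# Illustrative values only
train = pd.DataFrame({
    "DATEHOUR": pd.to_datetime(["2023-06-24 22:00", "2023-06-24 23:00"]),
    "CONSUMPTION_H": [105.0, 100.0],
})
predicted = pd.DataFrame({
    "DATEHOUR": pd.to_datetime(["2023-06-25 00:00"]),
    "CONSUMPTION_H": [120.0],
    "YHAT": [118.0], "YHAT_LOWER": [108.0], "YHAT_UPPER": [128.0],
})

# Mirror select('*', ('NULL', 'YHAT'), ...): add empty forecast columns
train = train.assign(YHAT=np.nan, YHAT_LOWER=np.nan, YHAT_UPPER=np.nan)

# Stack the rows, with the training columns in the same order as the predicted ones
combined = pd.concat([predicted, train[predicted.columns]], ignore_index=True)
```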

Save the combined dataset as a physical table in SAP Datasphere.

In [None]:
df_rem_all.save('LUCERNEELECTRICITY_FORECAST', force=True)