<a href="https://colab.research.google.com/github/AntonioMoradoRamos/meia-masterdegree-utils/blob/main/tarefas_ml_3b_train_and_save_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Serializing Objects with Pickle


The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling”, or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.
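As a minimal sketch of that round trip, the snippet below pickles a small nested structure to a byte stream and rebuilds it. The `record` contents are made up for illustration and are not part of the dataset used later.

```python
import pickle
from datetime import date

# Hypothetical nested object: a dict holding a date, a list and nested tuples.
record = {
    "date": date(2019, 1, 1),
    "readings": [2985, 2258, 2266],
    "zones": {"hvac": (8, 0), "sockets": (171, 232)},
}

# Pickling: object hierarchy -> byte stream.
byte_stream = pickle.dumps(record)

# Unpickling: byte stream -> a new, equal object hierarchy.
restored = pickle.loads(byte_stream)

print(type(byte_stream))
print(restored == record)   # same content
print(restored is record)   # but a distinct object
```

The restored object compares equal to the original but is a separate copy, which is what makes pickling useful for saving state to disk and reloading it later.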

<br/>

**Warning:** The pickle module is not secure. Only unpickle data you trust.
It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.
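The following harmless sketch illustrates why this matters. The class name `Innocent` is hypothetical; its `__reduce__` method tells pickle to "rebuild" the object by calling a function, here the built-in `print`. An attacker could name any importable callable in its place.

```python
import pickle


class Innocent:
    """Hypothetical class whose pickle payload calls a function on load."""

    def __reduce__(self):
        # Instructs pickle to reconstruct the object by calling print(...).
        # A malicious payload would reference a dangerous callable instead.
        return (print, ("this ran during unpickling",))


payload = pickle.dumps(Innocent())

# Loading the bytes runs the callable; no Innocent instance is created.
result = pickle.loads(payload)  # prints the message as a side effect
print(type(result))
```

The call happens inside `pickle.loads` itself, before the caller has any chance to inspect the data.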

Consider signing data with hmac if you need to ensure that it has not been tampered with.
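One possible way to do this is sketched below, assuming a shared secret key and SHA-256 as the digest. The names `SECRET_KEY`, `sign_pickle` and `verify_and_load` are illustrative, not part of any library. Signing only detects tampering; it does not make a pickle from an untrusted party safe.

```python
import hashlib
import hmac
import pickle

# Hypothetical key for illustration; a real key must be kept secret.
SECRET_KEY = b"replace-with-a-real-secret"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign_pickle(obj, key):
    """Pickle obj and prepend an HMAC-SHA256 signature of the payload."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload


def verify_and_load(blob, key):
    """Check the signature first; unpickle only if it matches."""
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


signed = sign_pickle({"consumption_w": 2985}, SECRET_KEY)
restored = verify_and_load(signed, SECRET_KEY)

# Flip one bit in the payload to simulate tampering.
tampered = signed[:-1] + bytes([signed[-1] ^ 0x01])
try:
    verify_and_load(tampered, SECRET_KEY)
    tamper_detected = False
except ValueError:
    tamper_detected = True

print(restored, tamper_detected)
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading bytes matched.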

Safer serialization formats such as json may be more appropriate if you are processing untrusted data. See the “Comparison with json” section of the pickle documentation.
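A brief sketch of the trade-off, using a made-up `record`: json round-trips plain data as text and never executes code on load, but it only handles basic types, so richer objects such as dates need to be converted first.

```python
import json
from datetime import date

# Hypothetical record made only of JSON-friendly types.
record = {"date": "2019-01-01", "consumption_w": 2985, "temperature_c": 8.7}

text = json.dumps(record)      # human-readable text, not bytes
restored = json.loads(text)    # parsing only builds dicts, lists, strings, numbers

# Objects outside the basic types are rejected unless converted first.
try:
    json.dumps({"date": date(2019, 1, 1)})
    date_rejected = False
except TypeError:
    date_rejected = True

print(text)
print(restored == record, date_rejected)
```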


- https://docs.python.org/3/library/pickle.html#pickle.dumps

## Libraries

In [None]:
import pandas as pd
import numpy as np

import datetime
from datetime import date, timedelta
from datetime import time

In [None]:
#from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestRegressor, 
                              RandomForestClassifier)

In [None]:
# Python version
!python --version

Python 3.7.13


In [None]:
# pandas version
print(pd.__version__)

1.3.5


## Load Dataset

In [None]:
csv_file_name = "jan_2019_havc_light_socket_sum_period_day.csv"
ds_merged = pd.read_csv(csv_file_name, sep = ';')

In [None]:
ds_merged.head()

| | Unnamed: 0 | date | time | consumption_w | generation_w | temperature_c | humidity_perc | radiation_Wm_2 | power_w_zone_1_hvac1 | power_w_zone_1_hvac2 | ... | power_w_zone_4_light | power_w_zone_4_sockets | power_w_zone_5_hvac | power_w_zone_5_light | power_w_zone_5_sockets | power_w_hvac | power_w_light | power_w_socket | isoweekday | period_of_day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2019-01-01 | 00:05:00 | 2985 | 0 | 8.7 | 76.0 | 0.0 | 8 | 0 | ... | 0 | 171 | 0 | 0 | 232 | 898 | 0 | 2087 | 2 | 1 |
| 1 | 1 | 2019-01-01 | 00:10:00 | 2258 | 0 | 8.6 | 76.0 | 0.0 | 4 | 0 | ... | 0 | 169 | 0 | 0 | 233 | 182 | 0 | 2076 | 2 | 1 |
| 2 | 2 | 2019-01-01 | 00:15:00 | 2266 | 0 | 8.7 | 76.0 | 0.0 | 8 | 0 | ... | 0 | 169 | 0 | 0 | 233 | 186 | 0 | 2080 | 2 | 1 |
| 3 | 3 | 2019-01-01 | 00:20:00 | 3016 | 0 | 8.7 | 76.0 | 0.0 | 8 | 0 | ... | 0 | 167 | 0 | 0 | 232 | 950 | 0 | 2066 | 2 | 1 |
| 4 | 4 | 2019-01-01 | 00:25:00 | 2265 | 0 | 8.6 | 76.0 | 0.0 | 7 | 0 | ... | 0 | 167 | 0 | 0 | 230 | 186 | 0 | 2079 | 2 | 1 |


In [None]:
# Drop the "Unnamed: 0" attribute
ds_merged.drop(columns=['Unnamed: 0'], inplace=True)

In [None]:
# Convert the "date" attribute to the datetime data type
ds_merged['date'] = pd.to_datetime(ds_merged['date'])

In [None]:
ds_merged.head()

| | date | time | consumption_w | generation_w | temperature_c | humidity_perc | radiation_Wm_2 | power_w_zone_1_hvac1 | power_w_zone_1_hvac2 | power_w_zone_1_hvac3 | ... | power_w_zone_4_light | power_w_zone_4_sockets | power_w_zone_5_hvac | power_w_zone_5_light | power_w_zone_5_sockets | power_w_hvac | power_w_light | power_w_socket | isoweekday | period_of_day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-01-01 | 00:05:00 | 2985 | 0 | 8.7 | 76.0 | 0.0 | 8 | 0 | 0 | ... | 0 | 171 | 0 | 0 | 232 | 898 | 0 | 2087 | 2 | 1 |
| 1 | 2019-01-01 | 00:10:00 | 2258 | 0 | 8.6 | 76.0 | 0.0 | 4 | 0 | 0 | ... | 0 | 169 | 0 | 0 | 233 | 182 | 0 | 2076 | 2 | 1 |
| 2 | 2019-01-01 | 00:15:00 | 2266 | 0 | 8.7 | 76.0 | 0.0 | 8 | 0 | 0 | ... | 0 | 169 | 0 | 0 | 233 | 186 | 0 | 2080 | 2 | 1 |
| 3 | 2019-01-01 | 00:20:00 | 3016 | 0 | 8.7 | 76.0 | 0.0 | 8 | 0 | 0 | ... | 0 | 167 | 0 | 0 | 232 | 950 | 0 | 2066 | 2 | 1 |
| 4 | 2019-01-01 | 00:25:00 | 2265 | 0 | 8.6 | 76.0 | 0.0 | 7 | 0 | 0 | ... | 0 | 167 | 0 | 0 | 230 | 186 | 0 | 2079 | 2 | 1 |


In [None]:
ds_merged.info()

In [None]:
# Isolate X (the features that contribute to the prediction) and y (the value to predict)
#x_features = ['power_w_hvac','power_w_light','power_w_socket','isoweekday','temperature_c', 'humidity_perc', 'period_of_day']
x_features = ['power_w_hvac','power_w_light','power_w_socket','temperature_c', 'humidity_perc']


y = ds_merged['consumption_w']     # Target to predict
X = ds_merged[x_features]

In [None]:
# Partition data
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2, 
                                                    random_state=1)

In [None]:
print(f"Training features shape: {X_train.shape}")
print(f"Training target shape: {y_train.shape}\n")
print(f"Test features shape: {X_test.shape}")
print(f"Test target shape: {y_test.shape}")
display(X_train.head())

Training features shape: (7141, 5)
Training target shape: (7141,)

Test features shape: (1786, 5)
Test target shape: (1786,)


| | power_w_hvac | power_w_light | power_w_socket | temperature_c | humidity_perc |
|---|---|---|---|---|---|
| 784 | 200 | 436 | 4038 | 12.0 | 65.0 |
| 3836 | 634 | 0 | 2172 | 5.7 | 83.0 |
| 1512 | 833 | 0 | 2089 | 5.4 | 69.0 |
| 6320 | 188 | 0 | 2142 | 10.8 | 91.0 |
| 6858 | 1280 | 712 | 2510 | 13.1 | 89.0 |


In [None]:
# Train a simple model
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

RandomForestRegressor(random_state=42)

In [None]:
# Predict the values

In [None]:
prediction = model.predict(X_test)

In [None]:
prediction

array([2570.98, 5101.21, 5616.25, ..., 3443.88, 3782.86, 7747.54])

Dump the machine learning model

In [None]:
import pickle

ml_model_file_name = 'rf_ml_model.pkl'
# Write the trained model to a binary file
with open(ml_model_file_name, 'wb') as fid:
    pickle.dump(model, fid)

Load the machine learning model

In [None]:
# Read the model back from the binary file
with open(ml_model_file_name, 'rb') as fid:
    reconstructed = pickle.load(fid)
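The dump and load steps can be wrapped in two small helpers, sketched here with a plain dictionary standing in for the trained model. The names `save_object`, `load_object` and `stand_in` are hypothetical, and the file is written to a temporary directory so that the example leaves nothing behind.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    """Serialize obj to path; the file must be opened in binary write mode."""
    with open(path, "wb") as fid:
        pickle.dump(obj, fid, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Deserialize and return the object stored at path (trusted files only)."""
    with open(path, "rb") as fid:
        return pickle.load(fid)


# Stand-in for a trained model; any picklable object works the same way.
stand_in = {"name": "rf_ml_model", "n_features": 5, "weights": [0.1, 0.2, 0.3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_object.pkl"
    save_object(stand_in, demo_path)
    file_size = demo_path.stat().st_size
    restored = load_object(demo_path)

print(file_size > 0, restored == stand_in)
```

Passing `protocol=pickle.HIGHEST_PROTOCOL` is an optional choice made in this sketch; it usually gives smaller files, at the cost of not being readable by older Python versions.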

In [None]:
# Alternative: serialize and deserialize in memory, without a file
#import pickle
#pickled_model = pickle.dumps(model)
#reconstructed = pickle.loads(pickled_model)




In [None]:
prediction_reconstructed_model = reconstructed.predict(X_test)

In [None]:
prediction_reconstructed_model

array([2570.98, 5101.21, 5616.25, ..., 3443.88, 3782.86, 7747.54])
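Rather than comparing the two printed arrays by eye, the match can be checked programmatically. The sketch below is self-contained and therefore trains a small forest on synthetic data (`X_demo`, `y_demo` and `demo_model` are made up for illustration); in the notebook, the same comparison could be applied to `prediction` and `prediction_reconstructed_model`.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 200 rows, 5 features, noisy linear target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([1.0, 2.0, 3.0, 0.5, -1.0]) + rng.normal(scale=0.1, size=200)

demo_model = RandomForestRegressor(n_estimators=10, random_state=42)
demo_model.fit(X_demo, y_demo)

# Round-trip the fitted model through pickle.
restored_model = pickle.loads(pickle.dumps(demo_model))

# The restored model should reproduce the original predictions.
predictions_match = np.array_equal(
    demo_model.predict(X_demo), restored_model.predict(X_demo)
)
print(predictions_match)
```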

# Bibliography

***A Gentle Introduction to Serialization for Python***
- https://machinelearningmastery.com/a-gentle-introduction-to-serialization-for-python/


***Saving Machine Learning And Deep Learning Models***
- https://medium.com/analytics-vidhya/save-machine-learning-and-deep-learning-models-3b27f1ea550


***pickle — Python object serialization***
- https://docs.python.org/3/library/pickle.html#module-pickle


***GSFC Python Bootcamp***
- https://colab.research.google.com/github/astg606/py_materials/blob/master/useful_modules/introduction_pickle.ipynb