# Summary:

The following notebook describes the exploration of a dataset used to build a machine learning model intended to run on a microcontroller. The dataset used to train the model was obtained from the Kaggle platform. Among other variables, it records two particularly relevant quantities: the SoC (state of charge) and the battery temperature of the BMW i3 electric car. The goal is to use this data to estimate the battery temperature, and the model is trained with the PyCaret library, which automates the machine learning (ML) workflow.

# Theoretical Framework:

## Dataset Description:

Electric vehicle batteries are subject to variations caused by differences in how drivers drive, as well as by auxiliary loads that draw energy to perform their functions, such as the heating and air-conditioning systems. These factors significantly reduce the vehicle's total range.

The dataset used to train the model, named "SoC and Temperature Prediction", was found by searching the Kaggle platform.
It records 72 real trips with a BMW i3, covering both the heating and the propulsion systems. Each trip contains relevant data about the environment (temperature, elevation, etc.), the vehicle (acceleration, velocity, etc.), the battery (SoC, voltage, current, temperature) and the heating circuit (interior temperature, heating power, etc.).

The data are stored in multiple CSV files split into two categories, A and B. Category A was recorded in summer and, because of problems with the measurement system, does not contain all of the variables listed above; category B contains all of them consistently.
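
The notebook reads a single combined file, AllTrips.csv. If one starts instead from the per-trip CSV files described above, a sketch like the following could stack them into one DataFrame while keeping the A/B category and the trip name as columns. The file-name pattern (TripA01.csv, TripB12.csv, ...), the helper name `load_trips` and the added `Category`/`Trip` columns are assumptions for illustration, not taken from the dataset documentation; the separator is left configurable because it is not confirmed here.

```python
import glob
import os
import re

import pandas as pd


def load_trips(folder, pattern="Trip*.csv", sep=","):
    """Read every per-trip CSV in `folder` and stack them into one DataFrame.

    Hypothetical naming assumption: files look like TripA01.csv / TripB12.csv,
    where the letter after "Trip" is the category (A = summer, B = complete).
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        name = os.path.splitext(os.path.basename(path))[0]  # e.g. "TripA01"
        match = re.match(r"Trip([AB])(\d+)$", name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        df = pd.read_csv(path, sep=sep)
        df["Category"] = match.group(1)  # "A" or "B"
        df["Trip"] = name                # keeps track of which trip a row came from
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    # ignore_index=True gives the combined table a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

Keeping `Trip` as a column is a design choice: it preserves which rows belong together, information that is otherwise lost once all trips are concatenated.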


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn

*Load the dataset into a variable for processing.*

In [2]:
data = pd.read_csv('AllTrips.csv')

*Displaying the dataset shows that the last two columns contain NaN values, so they need to be removed.*

In [3]:
data

,Time [s],Velocity [km/h],Elevation [m],Throttle [%],Motor Torque [Nm],Longitudinal Acceleration [m/s^2],Regenerative Braking Signal,Battery Voltage [V],Battery Current [A],Battery Temperature [°C],...,Temperature Feetvent Co-Driver [°C],Temperature Feetvent Driver [°C],Temperature Head Co-Driver [°C],Temperature Head Driver [°C],Temperature Vent right [°C],Temperature Vent central right [°C],Temperature Vent central left [°C],Temperature Vent right [°C].1,Velocity [km/h]]],Unnamed: 23
0,0.0,0.0,575.0,0.0,0.0,-0.23,0.0,390.60,-13.10,8.0,...,16.02,15.85,1.97,3.28,5.11,3.02,1.97,5.64,,
1,0.1,0.0,575.0,0.0,0.0,-0.21,0.0,390.60,-13.10,8.0,...,16.02,15.85,1.97,3.28,5.11,3.02,1.97,5.64,,
2,0.2,0.0,575.0,0.0,0.0,-0.32,0.0,390.60,-13.10,8.0,...,16.02,15.85,1.97,3.28,5.11,3.02,1.97,5.64,,
3,0.3,0.0,575.0,0.0,0.0,-0.23,0.0,390.60,-13.10,8.0,...,16.02,15.85,1.97,3.28,5.11,3.02,1.97,5.64,,
4,0.4,0.0,575.0,0.0,0.0,-0.23,0.0,390.58,-13.17,8.0,...,16.02,15.87,1.97,3.27,5.11,3.02,1.97,5.64,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1094788,1351.6,0.0,513.0,0.0,0.0,0.07,0.0,384.40,-0.70,13.0,...,,,,,,,,,,
1094789,1351.7,0.0,513.0,0.0,0.0,0.07,0.0,384.40,-0.70,13.0,...,,,,,,,,,,
1094790,1351.8,0.0,513.0,0.0,0.0,0.09,0.0,384.40,-0.70,13.0,...,,,,,,,,,,
1094791,1351.9,0.0,513.0,0.0,0.0,0.11,0.0,384.40,-0.70,13.0,...,,,,,,,,,,


*Removing the last two columns (Velocity [km/h]]] and Unnamed: 23)*

In [4]:
data = data.iloc[:, :-2]  # drop the last two columns (Velocity [km/h]]] and Unnamed: 23), which hold NaN values
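
`iloc[:, :-2]` removes the two columns by position, so it would silently drop the wrong columns if the file layout changed. As an alternative sketch, assuming the two trailing columns are empty throughout the file (only the first and last rows are displayed above), the empty columns can be found by their content and dropped by name. The helper `drop_empty_columns` and the toy frame are hypothetical; `data.dropna(axis=1, how="all")` is an equivalent pandas one-liner.

```python
import numpy as np
import pandas as pd


def drop_empty_columns(df):
    """Drop every column whose values are all missing; return (frame, names)."""
    empty = df.columns[df.isna().all()]  # per-column "all NaN" mask -> column names
    return df.drop(columns=empty), list(empty)


# Toy frame mimicking the two trailing empty columns of AllTrips.csv
toy = pd.DataFrame({
    "Time [s]": [0.0, 0.1],
    "Velocity [km/h]]]": [np.nan, np.nan],
    "Unnamed: 23": [np.nan, np.nan],
})
clean, dropped = drop_empty_columns(toy)
print(dropped)  # ['Velocity [km/h]]]', 'Unnamed: 23']
```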

*The goal of training is to predict the battery temperature, so we check whether that column contains NaN values; the target column must have no missing data.*

In [8]:
data['Battery Temperature [°C]'].isnull().values.any()

False
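
The check above only tells whether the target has any NaN at all. To see how missing values are spread across the other columns, a per-column report can be sketched as below. The helper `missing_report` and the toy values are made up for illustration; on the real data it would be called as `missing_report(data)`.

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Return the fraction of missing values per column, highest first."""
    return df.isna().mean().sort_values(ascending=False)


toy = pd.DataFrame({
    "Battery Temperature [°C]": [8.0, 8.0, 13.0, 13.0],
    "Velocity [km/h]": [0.0, np.nan, 40.0, 41.0],
    "Temperature Head Driver [°C]": [3.28, 3.27, np.nan, np.nan],
})
report = missing_report(toy)
print(report)
```

Here the target column reports 0.0 missing (the same conclusion as the `False` above), while the other two columns report 0.25 and 0.5.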

In [9]:
data

,Time [s],Velocity [km/h],Elevation [m],Throttle [%],Motor Torque [Nm],Longitudinal Acceleration [m/s^2],Regenerative Braking Signal,Battery Voltage [V],Battery Current [A],Battery Temperature [°C],...,Temperature Footweel Driver [°C],Temperature Footweel Co-Driver [°C],Temperature Feetvent Co-Driver [°C],Temperature Feetvent Driver [°C],Temperature Head Co-Driver [°C],Temperature Head Driver [°C],Temperature Vent right [°C],Temperature Vent central right [°C],Temperature Vent central left [°C],Temperature Vent right [°C].1
0,0.0,0.0,575.0,0.0,0.0,-0.23,0.0,390.60,-13.10,8.0,...,2.49,4.06,16.02,15.85,1.97,3.28,5.11,3.02,1.97,5.64
1,0.1,0.0,575.0,0.0,0.0,-0.21,0.0,390.60,-13.10,8.0,...,2.49,4.06,16.02,15.85,1.97,3.28,5.11,3.02,1.97,5.64
2,0.2,0.0,575.0,0.0,0.0,-0.32,0.0,390.60,-13.10,8.0,...,2.49,4.06,16.02,15.85,1.97,3.28,5.11,3.02,1.97,5.64
3,0.3,0.0,575.0,0.0,0.0,-0.23,0.0,390.60,-13.10,8.0,...,2.49,4.06,16.02,15.85,1.97,3.28,5.11,3.02,1.97,5.64
4,0.4,0.0,575.0,0.0,0.0,-0.23,0.0,390.58,-13.17,8.0,...,2.50,4.06,16.02,15.87,1.97,3.27,5.11,3.02,1.97,5.64
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1094788,1351.6,0.0,513.0,0.0,0.0,0.07,0.0,384.40,-0.70,13.0,...,,,,,,,,,,
1094789,1351.7,0.0,513.0,0.0,0.0,0.07,0.0,384.40,-0.70,13.0,...,,,,,,,,,,
1094790,1351.8,0.0,513.0,0.0,0.0,0.09,0.0,384.40,-0.70,13.0,...,,,,,,,,,,
1094791,1351.9,0.0,513.0,0.0,0.0,0.11,0.0,384.40,-0.70,13.0,...,,,,,,,,,,


*With the dataset cleaned, a statistical description of the collected data is produced with the .describe() method.*

In [9]:
data.describe().transpose()

,count,mean,std,min,25%,50%,75%,max
Time [s],1094793.0,1030.379323,894.872554,0.0,391.5,812.1,1380.9,5610.1
Velocity [km/h],1078364.0,44.697899,35.483078,0.0,14.19,40.89,67.9,152.26
Elevation [m],1094793.0,532.208922,45.282726,437.0,487.13,530.91,567.26,664.99
Throttle [%],1094793.0,28.07981,18.757409,0.0,10.33,33.24,42.97499,135.25
Motor Torque [Nm],1094793.0,10.768902,34.803679,-87.9,0.0,6.5,22.36,249.5
Longitudinal Acceleration [m/s^2],1094793.0,-0.004173,0.632472,-9.03,-0.24,-0.02,0.2,4.46
Regenerative Braking Signal,1094793.0,0.052119,0.221979,0.0,0.0,0.0,0.0,1.0
Battery Voltage [V],1094793.0,376.3793,11.859489,301.8,369.13,379.51,385.4,394.75852
Battery Current [A],1094793.0,-17.303016,44.435063,-404.38,-31.27,-11.46,-1.69,144.49
Battery Temperature [°C],1094793.0,15.416753,7.430944,-1.0,9.0,14.0,21.0,32.0
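
The table above reports, for each column, the number of non-missing values (count), the mean, the standard deviation, the extremes and the quartiles. count excludes NaN values, which is why Velocity [km/h] shows 1078364 entries instead of 1094793. The sketch below, on made-up temperatures, recomputes the same statistics with Python's `statistics` module to show how pandas defines them: the standard deviation is the sample one (divided by n - 1) and the quartiles use linear interpolation.

```python
import statistics

import pandas as pd

# Toy temperatures for illustration only (not taken from AllTrips.csv)
temps = pd.Series([8.0, 9.0, 14.0, 21.0, 32.0])
summary = temps.describe()

values = temps.tolist()  # plain Python floats for the statistics module
quartiles = statistics.quantiles(values, n=4, method="inclusive")  # linear interpolation
manual = {
    "count": len(values),
    "mean": statistics.mean(values),
    "std": statistics.stdev(values),  # sample standard deviation (n - 1)
    "min": min(values),
    "25%": quartiles[0],
    "50%": statistics.median(values),
    "75%": quartiles[2],
    "max": max(values),
}
for key, value in manual.items():
    print(f"{key:>5}: pandas={summary[key]:.4f}  manual={value:.4f}")
```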


## Training:

The training target is the column named Battery Temperature [°C] in the dataset, and the task is regression. PyCaret's 'setup' function configures the training target; by default it uses 70% of the data for training and holds out the remaining 30% as a test set, which already meets the requirements of the class project.
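
The 70/30 split can be checked against the shapes reported by setup: 70% of 1,094,793 rows is 766,355.1, which rounds down to 766,355 training rows and leaves 328,438 test rows. The standard-library sketch below reproduces those counts with a seeded shuffle. It illustrates a shuffled hold-out split in general, not PyCaret's internal code, and `split_indices` is a hypothetical helper.

```python
import math
import random


def split_indices(n_rows, train_size=0.7, seed=123):
    """Shuffle row indices and split them into train/test index lists."""
    n_train = math.floor(train_size * n_rows)  # training rows, rounded down
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # seeded, so the split is reproducible
    return idx[:n_train], idx[n_train:]


train_idx, test_idx = split_indices(1_094_793)
print(len(train_idx), len(test_idx))  # 766355 328438
```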

*The output also shows additional information, such as the dataset size and the percentage of rows with missing values (49.2%). These rows are not removed because the missing values are scattered across the dataset; dropping them would discard valuable information that other features contribute in those same rows. Instead, PyCaret's preprocessing imputes the missing numeric values (by default with the column mean).*

In [15]:
from pycaret.regression import *
reg1 = setup(data, target = 'Battery Temperature [°C]', session_id=123, log_experiment=True, experiment_name='insurance1')

,Description,Value
0,Session id,123
1,Target,Battery Temperature [°C]
2,Target type,Regression
3,Original data shape,"(1094793, 48)"
4,Transformed data shape,"(1094793, 48)"
5,Transformed train set shape,"(766355, 48)"
6,Transformed test set shape,"(328438, 48)"
7,Numeric features,47
8,Rows with missing values,49.2%
9,Preprocess,True


2024/02/15 15:06:32 INFO mlflow.tracking.fluent: Experiment with name 'insurance1' does not exist. Creating a new experiment.
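
setup reports that 49.2% of the rows have at least one missing value. The sketch below, on a made-up frame, shows how such a share is computed and what mean imputation does. PyCaret's default numeric imputation (`numeric_imputation='mean'`) works in this spirit, although inside PyCaret the means are fitted on the training split, whereas this toy example uses the whole frame.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    "Velocity [km/h]": [0.0, np.nan, 40.0, 44.0],
    "Temperature Head Driver [°C]": [3.0, 3.5, np.nan, np.nan],
    "Battery Temperature [°C]": [8.0, 8.0, 13.0, 13.0],
})

# Share of rows with at least one missing value
rows_missing = toy.isna().any(axis=1).mean()
print(f"Rows with missing values: {rows_missing:.1%}")  # 75.0%

# Mean imputation on the features: each NaN becomes the mean of its own column
features = toy.drop(columns="Battery Temperature [°C]")
imputed = features.fillna(features.mean())
print(imputed)
```

Imputing instead of dropping keeps all four rows here; dropping incomplete rows would have kept only one of them.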


*PyCaret's compare_models function trains and evaluates a wide range of predefined machine learning models using cross-validation. One of the parameters set here is 'fold', the number of folds used in cross-validation; 5 folds are used in this case. The number of folds controls how many partitions the data is split into and, therefore, how many training and validation iterations are run to evaluate each model: the training data is split into 5 parts of (nearly) equal size, and each model is trained and evaluated 5 times, each time using a different combination of 4 folds for training and the remaining fold for validation.*
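
The fold mechanics described above can be sketched in a few lines. This is a plain, unshuffled k-fold splitter written for illustration (`kfold_indices` is a hypothetical name): it assigns every sample to exactly one validation fold and, when the number of samples is not divisible by the number of folds, makes the first folds one element larger, which is also how scikit-learn's `KFold` sizes its folds.

```python
def kfold_indices(n_samples, n_folds=5):
    """Yield (train, validation) index lists for plain, unshuffled k-fold CV."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for k in range(n_folds):
        # The first `extra` folds get one extra sample
        size = base + (1 if k < extra else 0)
        stop = start + size
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, val
        start = stop


for k, (train, val) in enumerate(kfold_indices(12, n_folds=5)):
    print(f"fold {k}: validation={val}  (train size {len(train)})")
```

With 12 samples and 5 folds, the validation folds have sizes 3, 3, 2, 2 and 2, and each model is fitted 5 times on the remaining samples.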

*After evaluating all the models with cross-validation, compare_models() returns the best model according to a default performance metric (R2 for regression) and displays a ranking with several error measures. Since the model used here is a regression model, this ranking includes the two metrics requested by the project, RMSE and MAE.*
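
MAE and RMSE, the two metrics requested by the project, can be written out directly. In the sketch below (toy values, hypothetical helper names), a single 2 °C miss gives an MAE of 0.5 °C but an RMSE of 1.0 °C, because RMSE squares the errors before averaging. This is why a model can have a low MAE and still a comparatively high RMSE, as K Neighbors Regressor does in the ranking.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: squares errors first, so large misses weigh more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy battery temperatures in °C (illustrative values, not model output)
y_true = [14.0, 15.0, 21.0, 9.0]
y_pred = [14.0, 15.0, 21.0, 11.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))  # 0.5 1.0
```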

In [17]:
best_model = compare_models(fold=5)

,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
dt,Decision Tree Regressor,0.0006,0.0004,0.02,1.0,0.0022,0.0001,5.128
rf,Random Forest Regressor,0.0008,0.0003,0.0169,1.0,0.0019,0.0001,337.042
et,Extra Trees Regressor,0.0004,0.0001,0.0109,1.0,0.0012,0.0001,118.282
lightgbm,Light Gradient Boosting Machine,0.1311,0.0359,0.1895,0.9993,0.0244,0.0144,6.706
gbr,Gradient Boosting Regressor,0.3069,0.1381,0.3716,0.9975,0.0448,0.0311,145.566
lr,Linear Regression,0.4175,0.2296,0.4791,0.9958,0.0575,0.0412,2.964
ridge,Ridge Regression,0.4174,0.2296,0.4791,0.9958,0.0575,0.0412,1.184
knn,K Neighbors Regressor,0.0694,0.2624,0.5121,0.9952,0.0607,0.011,246.78
llar,Lasso Least Angle Regression,0.4511,0.2769,0.5262,0.995,0.0762,0.047,1.182
lasso,Lasso Regression,0.4511,0.277,0.5263,0.995,0.0763,0.0471,1.558


In [18]:
lightgbm = create_model('lightgbm')  # no fold argument, so setup's default of 10 folds is used (folds 0-9 below)

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.1309,0.0359,0.1894,0.9994,0.0251,0.0148
1,0.1278,0.0353,0.1878,0.9994,0.0238,0.0143
2,0.1316,0.0356,0.1887,0.9994,0.0244,0.0147
3,0.1304,0.0359,0.1895,0.9994,0.0252,0.0146
4,0.1304,0.0355,0.1883,0.9994,0.0248,0.0145
5,0.1303,0.0358,0.1891,0.9994,0.0259,0.0148
6,0.1297,0.0353,0.188,0.9994,0.024,0.0142
7,0.1291,0.0351,0.1874,0.9994,0.024,0.0143
8,0.129,0.0353,0.1879,0.9994,0.0243,0.0144
9,0.1292,0.0352,0.1876,0.9994,0.0239,0.0141


In [19]:
# One LightGBM model per learning rate: 0.1, 0.2, ..., 0.9 (one results table each)
lgbms = [create_model('lightgbm', learning_rate=i) for i in np.arange(0.1, 1, 0.1)]

*learning_rate = 0.1*

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.1309,0.0359,0.1894,0.9994,0.0251,0.0148
1,0.1278,0.0353,0.1878,0.9994,0.0238,0.0143
2,0.1316,0.0356,0.1887,0.9994,0.0244,0.0147
3,0.1304,0.0359,0.1895,0.9994,0.0252,0.0146
4,0.1304,0.0355,0.1883,0.9994,0.0248,0.0145
5,0.1303,0.0358,0.1891,0.9994,0.0259,0.0148
6,0.1297,0.0353,0.188,0.9994,0.024,0.0142
7,0.1291,0.0351,0.1874,0.9994,0.024,0.0143
8,0.129,0.0353,0.1879,0.9994,0.0243,0.0144
9,0.1292,0.0352,0.1876,0.9994,0.0239,0.0141


*learning_rate = 0.2*

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.0853,0.0179,0.134,0.9997,0.0169,0.0093
1,0.0852,0.019,0.1378,0.9997,0.0165,0.0092
2,0.0855,0.0178,0.1333,0.9997,0.0165,0.009
3,0.0894,0.0193,0.1389,0.9997,0.0178,0.0097
4,0.0865,0.0182,0.1348,0.9997,0.0163,0.0091
5,0.0861,0.0187,0.1366,0.9997,0.0169,0.0092
6,0.0845,0.0182,0.1348,0.9997,0.0166,0.0089
7,0.0857,0.0178,0.1334,0.9997,0.0164,0.0091
8,0.0865,0.0184,0.1355,0.9997,0.0164,0.0091
9,0.0855,0.0185,0.1361,0.9997,0.0172,0.0092


*learning_rate = 0.3*

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.0678,0.012,0.1097,0.9998,0.0139,0.0073
1,0.0673,0.013,0.1141,0.9998,0.0139,0.0074
2,0.0677,0.012,0.1094,0.9998,0.0137,0.0071
3,0.0683,0.0124,0.1114,0.9998,0.014,0.0074
4,0.0675,0.0119,0.1093,0.9998,0.0139,0.0071
5,0.0679,0.0122,0.1106,0.9998,0.0141,0.0072
6,0.0677,0.0123,0.1108,0.9998,0.0134,0.0072
7,0.0667,0.0118,0.1088,0.9998,0.0135,0.0071
8,0.0695,0.0126,0.1123,0.9998,0.0144,0.0073
9,0.0685,0.0125,0.1118,0.9998,0.0132,0.007


*learning_rate = 0.4*

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.058,0.0091,0.0952,0.9998,0.0119,0.0062
1,0.0569,0.0099,0.0994,0.9998,0.011,0.006
2,0.0597,0.0093,0.0964,0.9998,0.0121,0.0063
3,0.0593,0.0097,0.0986,0.9998,0.0121,0.0062
4,0.0593,0.0095,0.0975,0.9998,0.0113,0.0061
5,0.0597,0.0093,0.0964,0.9998,0.013,0.0064
6,0.0579,0.0092,0.0959,0.9998,0.012,0.0061
7,0.0578,0.0092,0.0959,0.9998,0.0115,0.0061
8,0.0583,0.0092,0.096,0.9998,0.0119,0.0061
9,0.0569,0.009,0.0951,0.9998,0.0111,0.006


*learning_rate = 0.5*

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.0525,0.0077,0.0876,0.9999,0.0115,0.0057
1,0.0514,0.0083,0.0911,0.9998,0.011,0.0057
2,0.0533,0.0079,0.0888,0.9999,0.0113,0.0057
3,0.0534,0.0078,0.0884,0.9999,0.012,0.0058
4,0.0539,0.008,0.0894,0.9999,0.0106,0.0054
5,0.0552,0.0081,0.09,0.9999,0.012,0.006
6,0.051,0.0073,0.0853,0.9999,0.0107,0.0054
7,0.053,0.008,0.0893,0.9999,0.0118,0.0055
8,0.0531,0.0078,0.0885,0.9999,0.011,0.0055
9,0.0514,0.0077,0.0876,0.9999,0.0111,0.0055


*learning_rate = 0.6*

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.0479,0.0068,0.0826,0.9999,0.0105,0.0051
1,0.0475,0.0076,0.087,0.9999,0.0109,0.0052
2,0.0498,0.0071,0.0843,0.9999,0.0111,0.0054
3,0.0489,0.0069,0.0833,0.9999,0.0105,0.0053
4,0.0504,0.0071,0.0845,0.9999,0.0116,0.0055
5,0.051,0.0072,0.0849,0.9999,0.0114,0.0055
6,0.0523,0.0075,0.0864,0.9999,0.0114,0.0055
7,0.0507,0.0072,0.0846,0.9999,0.011,0.0054
8,0.0508,0.0074,0.0862,0.9999,0.01,0.0052
9,0.0516,0.0073,0.0854,0.9999,0.0117,0.0056


*learning_rate = 0.7*

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.0508,0.0072,0.085,0.9999,0.0109,0.0056
1,0.0457,0.0073,0.0852,0.9999,0.01,0.005
2,0.0467,0.0066,0.0815,0.9999,0.0107,0.0051
3,0.0482,0.0069,0.0829,0.9999,0.0108,0.0053
4,0.0473,0.0065,0.0808,0.9999,0.0106,0.0052
5,0.0477,0.0069,0.0831,0.9999,0.0114,0.0054
6,0.0479,0.0066,0.0814,0.9999,0.0109,0.0054
7,0.0475,0.0067,0.0821,0.9999,0.0103,0.0051
8,0.0503,0.0071,0.0844,0.9999,0.0099,0.0053
9,0.0468,0.0066,0.081,0.9999,0.0107,0.0051


*learning_rate = 0.8*

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.0468,0.0067,0.0819,0.9999,0.0107,0.005
1,0.0476,0.008,0.0896,0.9999,0.011,0.0055
2,0.0474,0.0073,0.0854,0.9999,0.0113,0.0052
3,0.0487,0.007,0.0834,0.9999,0.0123,0.0056
4,0.0484,0.0073,0.0856,0.9999,0.0109,0.0052
5,0.0486,0.0072,0.0848,0.9999,0.0118,0.0056
6,0.047,0.0066,0.0814,0.9999,0.01,0.005
7,0.0492,0.0072,0.0849,0.9999,0.0102,0.0052
8,0.0483,0.0071,0.0844,0.9999,0.012,0.0054
9,0.0461,0.0069,0.0832,0.9999,0.0105,0.005


*learning_rate = 0.9*

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.0465,0.0073,0.0853,0.9999,0.0113,0.0053
1,0.0461,0.0081,0.0899,0.9999,0.011,0.0054
2,0.0427,0.0063,0.0797,0.9999,0.0107,0.0048
3,0.0483,0.0078,0.0883,0.9999,0.0114,0.0054
4,0.0498,0.0081,0.0901,0.9999,0.0115,0.0055
5,0.05,0.0076,0.0875,0.9999,0.0117,0.0057
6,0.0501,0.0079,0.0888,0.9999,0.0119,0.0056
7,0.0499,0.0075,0.0867,0.9999,0.0114,0.0056
8,0.0469,0.0072,0.085,0.9999,0.0116,0.0052
9,0.0486,0.0073,0.0854,0.9999,0.0114,0.0055
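
In [19] prints one table per learning rate (0.1 to 0.9, in the order produced by `np.arange(0.1, 1, 0.1)`) without a side-by-side comparison. As a sketch, the mean and standard deviation of the per-fold RMSE can be computed from the values printed above. The numbers below are copied from those tables; the dictionary and variable names are illustrative.

```python
import statistics

# Per-fold RMSE copied from the nine create_model tables printed by In [19]
fold_rmse = {
    0.1: [0.1894, 0.1878, 0.1887, 0.1895, 0.1883, 0.1891, 0.188, 0.1874, 0.1879, 0.1876],
    0.2: [0.134, 0.1378, 0.1333, 0.1389, 0.1348, 0.1366, 0.1348, 0.1334, 0.1355, 0.1361],
    0.3: [0.1097, 0.1141, 0.1094, 0.1114, 0.1093, 0.1106, 0.1108, 0.1088, 0.1123, 0.1118],
    0.4: [0.0952, 0.0994, 0.0964, 0.0986, 0.0975, 0.0964, 0.0959, 0.0959, 0.096, 0.0951],
    0.5: [0.0876, 0.0911, 0.0888, 0.0884, 0.0894, 0.09, 0.0853, 0.0893, 0.0885, 0.0876],
    0.6: [0.0826, 0.087, 0.0843, 0.0833, 0.0845, 0.0849, 0.0864, 0.0846, 0.0862, 0.0854],
    0.7: [0.085, 0.0852, 0.0815, 0.0829, 0.0808, 0.0831, 0.0814, 0.0821, 0.0844, 0.081],
    0.8: [0.0819, 0.0896, 0.0854, 0.0834, 0.0856, 0.0848, 0.0814, 0.0849, 0.0844, 0.0832],
    0.9: [0.0853, 0.0899, 0.0797, 0.0883, 0.0901, 0.0875, 0.0888, 0.0867, 0.085, 0.0854],
}

# (mean, sample standard deviation) of the fold RMSE for each learning rate
summary = {lr: (statistics.mean(v), statistics.stdev(v)) for lr, v in fold_rmse.items()}
for lr, (mean, std) in summary.items():
    print(f"learning_rate={lr:.1f}  RMSE mean={mean:.4f}  std={std:.4f}")

best_lr = min(summary, key=lambda lr: summary[lr][0])
print("lowest mean RMSE at learning_rate =", best_lr)
```

With these values, learning_rate = 0.7 gives the lowest mean cross-validation RMSE (about 0.0827 °C). The rates from 0.6 to 0.9, however, differ by only a few thousandths of a degree, which is comparable to the fold-to-fold spread. These are cross-validation scores on the training split, not results on the held-out test set.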
