# SoH computation for Mercedes-Benz

In this notebook, we try to compute the battery state of health (SoH) from charging phases.


In [None]:
from core.spark_utils import *
import os
from core.s3.s3_utils import S3Service
import plotly.express as px
import pandas as pd

## Load data

In [None]:
spark_session = create_spark_session(os.environ.get('S3_KEY'), os.environ.get('S3_SECRET'))

In [None]:
s3 = S3Service()

In [None]:
processed_phase = s3.read_parquet_df_spark(spark_session, 'processed_phases/processed_phases_mercedes_benz.parquet').toPandas()


In [None]:
raw_ts = s3.read_parquet_df_spark(spark_session, 'raw_ts/mercedes-benz/time_series/raw_ts_spark.parquet').toPandas()


In [None]:
processed_phase.columns

In [None]:
# Columns used in the computations below, cast to float in one pass
numeric_cols = ['BATTERY_NET_CAPACITY', 'CHARGING_DURATION_OEM', 'CHARGING_RATE_MEAN', 'ODOMETER_FIRST', 'ODOMETER_LAST',
                'RANGE', 'SOC_DIFF', 'SOC_FIRST', 'SOC_LAST', 'TOTAL_ENERGY_CHARGED']
processed_phase[numeric_cols] = processed_phase[numeric_cols].astype(float)

In [None]:
raw_ts['charging_rate'] = raw_ts['charging_rate'].astype(float)
raw_ts['energy_charged'] = raw_ts['energy_charged'].astype(float)
raw_ts['battery_level'] = raw_ts['battery_level'].astype(float)


In [None]:
px.scatter(processed_phase, x='DATETIME_BEGIN', y="CHARGING_DURATION_OEM")

## SoH computation

## First method

We can estimate the energy charged with `CHARGING_DURATION_OEM * CHARGING_RATE_MEAN`, where the duration is converted from seconds to hours. Dividing that energy by the **SOC_DIFF** gained during the charge, expressed as a fraction, gives an estimated battery capacity.

So:

$Capacity\_estimated = \frac{\frac{CHARGING\_DURATION\_OEM}{3600} \times CHARGING\_RATE\_MEAN}{\frac{SOC\_DIFF}{100}}$

and $SoH = \frac{Capacity\_estimated}{BATTERY\_NET\_CAPACITY}$
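
As a quick sanity check of these two formulas, here is a minimal sketch on made-up numbers. It assumes `CHARGING_DURATION_OEM` is in seconds (the code in this notebook divides it by 3600); the function name `estimate_soh_from_rate` and the sample values are hypothetical.

```python
def estimate_soh_from_rate(duration_s, rate, soc_diff_pct, net_capacity):
    """Return (estimated capacity, SoH as a ratio) for one charging phase."""
    energy = (duration_s / 3600) * rate        # energy delivered during the charge
    capacity = energy / (soc_diff_pct / 100)   # scaled up to a full 0-100 % charge
    return capacity, capacity / net_capacity


# Hypothetical phase: 2 h at a mean rate of 11 for a 30-point SoC gain,
# on a battery with a nominal net capacity of 80.
capacity, soh = estimate_soh_from_rate(7200, 11.0, 30.0, 80.0)
print(round(capacity, 2), round(soh, 3))
```

Because the SoC gain sits in the denominator, small values of `SOC_DIFF` amplify any error in the duration or the charging rate.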


In [None]:
df_1 = processed_phase.copy()

In [None]:
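# dropna() returns a new DataFrame, so the result must be assigned back;
# only the columns used in the computation are checked for missing values
df_1 = df_1.dropna(subset=['CHARGING_DURATION_OEM', 'CHARGING_RATE_MEAN', 'SOC_DIFF', 'BATTERY_NET_CAPACITY'])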

In [None]:
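# Energy charged (duration converted from seconds to hours, times the mean rate), scaled to a full 0-100 % charge
df_1['estimated_capacity'] = (df_1['CHARGING_DURATION_OEM'] / 3600) * df_1['CHARGING_RATE_MEAN'] / (df_1['SOC_DIFF'] / 100)
# SoH expressed as a ratio of the nominal net capacity
df_1['soh'] = df_1['estimated_capacity'] / df_1['BATTERY_NET_CAPACITY']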


In [None]:
px.scatter(df_1[df_1['soh'] > 0], x='PHASE_INDEX', y='soh', color='VIN')

### Conclusion

The SoH values computed with this method vary too much to be usable. The following factors can explain this:

- **CHARGING_DURATION_OEM** is much shorter than the actual phase duration.

- A large number of rows have a **CHARGING_RATE** value equal to 0.

- There is a time shift in `raw_ts` between the reception of the **total_charging_duration** value and a **soc** value, which could imply that the vehicle was used after this information was retrieved.
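
The first two factors can be quantified before trusting the estimate. The sketch below runs on a small made-up DataFrame; `DATETIME_END` is an assumed column (only `DATETIME_BEGIN` is used in this notebook), and `phase_diagnostics` is a hypothetical helper.

```python
import pandas as pd


def phase_diagnostics(phases: pd.DataFrame) -> dict:
    """Summarise how often the inputs of the first method look unreliable."""
    # Observed phase duration in seconds, from the phase boundaries.
    phase_s = (phases["DATETIME_END"] - phases["DATETIME_BEGIN"]).dt.total_seconds()
    # Share of the phase covered by the OEM-reported charging duration.
    coverage = phases["CHARGING_DURATION_OEM"] / phase_s
    return {
        "median_duration_coverage": float(coverage.median()),
        "zero_rate_share": float((phases["CHARGING_RATE_MEAN"] == 0).mean()),
    }


# Made-up phases, for illustration only.
toy = pd.DataFrame({
    "DATETIME_BEGIN": pd.to_datetime(["2024-01-01 08:00", "2024-01-02 08:00"]),
    "DATETIME_END": pd.to_datetime(["2024-01-01 10:00", "2024-01-02 12:00"]),
    "CHARGING_DURATION_OEM": [3600.0, 3600.0],  # seconds (assumed unit)
    "CHARGING_RATE_MEAN": [11.0, 0.0],
})
diagnostics = phase_diagnostics(toy)
print(diagnostics)
```

A median coverage well below 1 or a large share of zero rates would support the explanations listed above.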


## Second method

We have the **TOTAL_ENERGY_CHARGED** during a charge, so we can use it together with the **SOC_DIFF** to estimate the battery capacity.

So:

$Capacity\_estimated = \frac{TOTAL\_ENERGY\_CHARGED}{\frac{SOC\_DIFF}{100}}$

In [None]:
df_2 = processed_phase.copy()

In [None]:
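def calcul_soh(energie_kwh, delta_soc, capacite_nominale_kwh):
    """Return the SoH in percent from the energy charged and the SoC gained."""
    if pd.isna(delta_soc) or delta_soc == 0:
        return float('nan')  # the capacity cannot be estimated without a SoC change
    estimated_capacity = energie_kwh / (abs(delta_soc) / 100)
    soh = estimated_capacity / capacite_nominale_kwh
    return round(soh * 100, 2)


# The assignment aligns on the index, so a trailing dropna() would have no effect
df_2['soh'] = df_2.apply(lambda x: calcul_soh(x['TOTAL_ENERGY_CHARGED'], x['SOC_DIFF'], x['BATTERY_NET_CAPACITY']), axis=1)
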


In [None]:
px.scatter(df_2, x='PHASE_INDEX', y='soh', color='VIN')

In [None]:
px.scatter(df_2, x='PHASE_INDEX', y='TOTAL_ENERGY_CHARGED', color='VIN')

In [None]:
px.histogram(df_2[['TOTAL_ENERGY_CHARGED']].dropna(), x='TOTAL_ENERGY_CHARGED', nbins=100)

Issues with this method:

- The **TOTAL_ENERGY_CHARGED** value is not consistently recorded immediately after a charging session, so the period it covers may not match the charging phase.
- The **TOTAL_ENERGY_CHARGED** data is inconsistent and contains absurd values.
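
One possible way to limit the impact of these issues is to discard physically implausible phases before estimating the SoH, then aggregate per vehicle. The sketch below is only an illustration on made-up data: the thresholds (`min_soc_diff` and the 20 % margin) and the helper name `robust_soh_per_vin` are assumptions, not values validated on this dataset.

```python
import pandas as pd


def robust_soh_per_vin(phases: pd.DataFrame, min_soc_diff: float = 10.0) -> pd.Series:
    """Median SoH (%) per VIN, computed on plausible charging phases only."""
    soc_gain = phases["SOC_DIFF"].abs()
    # Upper bound: energy a new battery would need for this SoC gain,
    # with an assumed 20 % margin.
    max_energy = phases["BATTERY_NET_CAPACITY"] * soc_gain / 100 * 1.2
    plausible = (
        (soc_gain >= min_soc_diff)  # small SoC gains amplify noise
        & (phases["TOTAL_ENERGY_CHARGED"] > 0)
        & (phases["TOTAL_ENERGY_CHARGED"] <= max_energy)
    )
    kept = phases[plausible]
    capacity = kept["TOTAL_ENERGY_CHARGED"] / (kept["SOC_DIFF"].abs() / 100)
    soh = 100 * capacity / kept["BATTERY_NET_CAPACITY"]
    return soh.groupby(kept["VIN"]).median()


# Made-up phases: the third has a tiny SoC gain, the fourth an absurd energy.
toy = pd.DataFrame({
    "VIN": ["A", "A", "A", "B"],
    "SOC_DIFF": [50.0, 40.0, 2.0, 50.0],
    "TOTAL_ENERGY_CHARGED": [36.0, 30.0, 5.0, 400.0],
    "BATTERY_NET_CAPACITY": [80.0, 80.0, 80.0, 80.0],
})
soh_by_vin = robust_soh_per_vin(toy)
print(soh_by_vin)
```

Using the median rather than the mean keeps a few remaining outliers from dominating the per-vehicle value.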