# SoH estimation experiment on Renault vehicles
There are two candidate methods for calculating the SoH.

Based on the battery level:
```
soh = charging.battery_energy / (charging.battery_level * model_battery_capacity) 
```
Based on the estimated range:
```
soh = estimated_range / (soc * model_battery_range)
```
The best estimate is probably a combination of the two.
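
As a minimal sketch of the two formulas above, the functions below compute each estimate and blend them. The weighted mean in `combined_soh`, the `energy_weight` parameter, the assumption that `battery_level` and `soc` are fractions between 0 and 1, and all the numbers in the example are hypothetical and only illustrate the idea; they are not the method used later in this notebook.

```python
def soh_from_energy(battery_energy_kwh: float, battery_level: float,
                    model_capacity_kwh: float) -> float:
    """SoH from the reported energy; battery_level is assumed to be a 0-1 fraction."""
    return battery_energy_kwh / (battery_level * model_capacity_kwh)


def soh_from_range(estimated_range_km: float, soc: float,
                   model_range_km: float) -> float:
    """SoH from the estimated range; soc is assumed to be a 0-1 fraction."""
    return estimated_range_km / (soc * model_range_km)


def combined_soh(soh_energy: float, soh_range: float,
                 energy_weight: float = 0.5) -> float:
    """Hypothetical combination: weighted mean of the two estimates."""
    return energy_weight * soh_energy + (1.0 - energy_weight) * soh_range


# Made-up readings for a 41 kWh car with an assumed 380 km nominal range.
soh_e = soh_from_energy(18.45, 0.5, 41.0)
soh_r = soh_from_range(152.0, 0.5, 380.0)
print(soh_e, soh_r, combined_soh(soh_e, soh_r))
```

An equal weighting is only a starting point; the weight could later be tuned once it is clear which of the two signals is noisier.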

## Setup

### Imports

In [None]:
from rich import print
import pandas as pd
from pandas import DataFrame as DF
import plotly.express as px

from core.config import *
from transform.ayvens.ayvens_fleet_info import get_fleet_info
from transform.high_mobility.high_mobility_raw_tss import get_raw_tss

We must ensure that the data points of the time series can be compared with one another.  
To do this, we will extract the car model of each vehicle from `fleet_info.csv` ("List finale des vin a activer" on the drive).

In [None]:
fleet_info = (
    get_fleet_info()
    .query("make == 'renault'")
)

fleet_info #[["model", "version"]].value_counts()

Then we will use data found online to get the default battery capacity of each model.  
Note: *Here a model is a combination of the `model` and `version` fleet_info variables, since cars of the same model but different versions can have different battery capacities*.

In [None]:
COLS_TO_CPY_FROM_FLEET_INFO = [
    "make",
    "model",
    "version",
    "dummy_soh_maker_offset",
    "dummy_soh_model_offset",
    "dummy_soh_model_slope",
    "dummy_soh_vehicle_offset",
    "kwh_capacity",
    "vin",
]

KEYS = [
    "slope",
    "intercept",
    "r_value",
    "p_value",
    "std_err",
]

# Duplicate keys removed: in a dict literal only the last value of a repeated key is kept,
# so "charging.battery_level" maps to "battery_level" (not "soc"), as before.
RENAME_COLS_DICT: dict[str, str] = {
    "date_of_value": "date",
    "diagnostics.odometer": "odometer",
    "odometer.value": "odometer",
    "mileage_km": "odometer",
    "mileage": "odometer",
    "charging.battery_energy": "battery_energy",
    "charging.estimated_range": "estimated_range",
    "charging.battery_level": "battery_level",
    "soc_hv_header": "soc",
}

COLS_TO_KEEP = [
    "date",
    "soc",
    "odometer",
    "estimated_range",
    "battery_energy",
    "vin",
]

COL_DTYPES = {
    "soc": "float",
    "odometer": "float",
    "estimated_range": "float",
    "battery_energy": "float",
    "vin": "string",
    "capacity": "float",
}
# Default battery capacity in kWh, keyed by model and then by version.
KWH_BATTERY_CAPCITY_DICT = {
    "ZOE": {
        "R90 Life (batterijkoop) 5d": 41,
        "R135 Edition One (batterijkoop) 5d": 52,
        "R135 Intens (batterijkoop) 5d": 52,
        "R135": 52,
    }
}

KNOW_MODEL_TYPES = [
    "R90 Life (batterijkoop) 5d",
    "R135 Edition One (batterijkoop) 5d",
    "R135 Intens (batterijkoop) 5d",
    "R135",
]

Let's remove the VINs for which we don't have a known default battery capacity.
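
One possible way to do this filtering is sketched below on a toy table. The shortened version names, the placeholder VINs, the `capacity_by_model_version` dictionary and the `lookup_capacity` helper are all hypothetical stand-ins for illustration; the real data comes from `get_fleet_info()` and the capacity dictionary defined above.

```python
import pandas as pd

# Hypothetical stand-in for the capacity dictionary, with shortened version names.
capacity_by_model_version = {"ZOE": {"R90": 41, "R135": 52}}

# Toy fleet table with placeholder VINs.
fleet = pd.DataFrame({
    "vin": ["VIN_A", "VIN_B", "VIN_C"],
    "model": ["ZOE", "ZOE", "ZOE"],
    "version": ["R90", "R135", "UNKNOWN"],
})


def lookup_capacity(row):
    """Return the default capacity in kWh, or None when the version is unknown."""
    return capacity_by_model_version.get(row["model"], {}).get(row["version"])


# Attach the capacity, then drop vehicles without a known value.
fleet["capacity"] = fleet.apply(lookup_capacity, axis=1)
known_fleet = fleet.dropna(subset=["capacity"])
print(known_fleet)
```

Keying the lookup on both `model` and `version` matches the note above that the same model can ship with different battery capacities.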

Let's extract the raw time series of all the cars we have into a multi-indexed DataFrame.

In [None]:
raw_tss = get_raw_tss("renault")

In [None]:
# display(raw_tss["vin"].unique())
display(fleet_info["vin"])

In [None]:
# Count the number of unique VINs
nombre_vin_uniques = raw_tss['vin'].nunique()

print(f"Number of distinct VINs in raw_tss: {nombre_vin_uniques}")

### Time series processing
Let's implement a naive SoH estimation pipeline.  

In [None]:
tss: DF = (
    raw_tss
    # Attach the static vehicle information (model, version, capacity) to each data point.
    .merge(fleet_info[COLS_TO_CPY_FROM_FLEET_INFO], on="vin", how="left")
    .rename(columns={
        "charging.battery_energy": "battery_energy",
        "diagnostics.odometer": "odometer",
        "charging.battery_level": "battery_level",
        "charging.estimated_range": "estimated_range",
    })
    .eval("soc = battery_level * 100")
    .eval("expected_battery_energy = kwh_capacity * battery_level")
    # Dividing by 115 normalizes the battery capacity.
    .eval("soh = 100 * expected_battery_energy / battery_energy / 115")
)
tss.count()

In [None]:
# tss[tss['vin']=='VF1AG000366046670'].tail(10).head(25)
# columns_of_interest = ['vin', 'soc', 'battery_energy']  # Replace with your desired columns
# value_counts_specific = tss[columns_of_interest].agg('value_counts')
# print(value_counts_specific)

## EDA

### Assumption verification
First, we will verify that `soc` and `battery_energy` are two "real" variables.  
That is, neither of them is calculated from the other.

In [None]:
# Count the number of unique VINs
nombre_vin_uniques = tss['vin'].nunique()

print(f"Number of distinct VINs in tss: {nombre_vin_uniques}")


In [None]:
px.scatter(tss, x="soc", y="battery_energy", color="vin")


Looking at this scatter plot, we can see that:
- `soc` and `battery_energy` are indeed two real variables; neither is a synthetic variable calculated from the other.  
- The divergence between the two is much more pronounced at high `soc` values.
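
The visual check can be complemented with a rough numeric one. The sketch below assumes that a derived variable would be strictly proportional to `soc`, so the ratio `battery_energy / soc` would be constant for a given vehicle; it would not detect a derived variable that also includes an offset. The toy data, the placeholder VINs and the column names `energy_per_soc` and `cv` are hypothetical.

```python
import pandas as pd

samples = pd.DataFrame({
    "vin": ["VIN_A"] * 4 + ["VIN_B"] * 4,
    "soc": [20.0, 40.0, 60.0, 80.0] * 2,
    # VIN_A: energy exactly proportional to soc (looks derived).
    # VIN_B: energy drifts away from proportionality (looks independently reported).
    "battery_energy": [8.0, 16.0, 24.0, 32.0, 9.0, 17.0, 24.0, 30.0],
})

# Energy per percentage point of soc; constant if one variable is a scaled copy of the other.
samples["energy_per_soc"] = samples["battery_energy"] / samples["soc"]

# Coefficient of variation of that ratio for each vehicle.
ratio_stats = samples.groupby("vin")["energy_per_soc"].agg(["mean", "std"])
ratio_stats["cv"] = ratio_stats["std"] / ratio_stats["mean"]
print(ratio_stats)
```

A coefficient of variation close to zero for every vehicle would suggest that one column is computed from the other; clearly non-zero values support treating them as separate signals.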

Let's verify that the `soh` is not correlated with the `soc` or `odometer`.

In [None]:
px.scatter(tss, x="soc", y="soh", color="vin")

In [None]:
px.scatter(tss, x="odometer", y="soh", color="vin")
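
To put numbers next to the two scatter plots, one option is to compute the Pearson correlation of `soh` with `soc` and with `odometer` for each vehicle. This is only a sketch on made-up data: the placeholder VINs, the values and the `soh_correlations` helper are hypothetical, and Pearson correlation only captures linear relationships.

```python
import pandas as pd

toy = pd.DataFrame({
    "vin": ["VIN_A"] * 5 + ["VIN_B"] * 5,
    "soc": [10.0, 30.0, 50.0, 70.0, 90.0] * 2,
    "odometer": [1000.0, 1100.0, 1200.0, 1300.0, 1400.0,
                 5000.0, 5200.0, 5400.0, 5600.0, 5800.0],
    # VIN_A: soh roughly flat. VIN_B: soh rises linearly with soc (suspicious).
    "soh": [95.0, 94.0, 96.0, 95.0, 95.0, 90.0, 91.0, 92.0, 93.0, 94.0],
})


def soh_correlations(group: pd.DataFrame) -> pd.Series:
    """Pearson correlation of soh with soc and with odometer for one vehicle."""
    return pd.Series({
        "soh_vs_soc": group["soh"].corr(group["soc"]),
        "soh_vs_odometer": group["soh"].corr(group["odometer"]),
    })


# One row per VIN, one column per correlation.
correlations = pd.DataFrame(
    {vin: soh_correlations(group) for vin, group in toy.groupby("vin")}
).T
print(correlations)
```

Values near zero are consistent with the assumption being tested; a value near 1 or -1 for a vehicle, like the made-up `VIN_B` here, would mean the naive `soh` depends on the state of charge and needs correcting.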
