# SoH estimation experiments on Mercedes vehicles
In this notebook, we try to express the SoH at any point in time as the energy the battery actually holds divided by the energy it would hold at the same charge level if it had 100% SoH.  
```
soh = charging.battery_energy / (charging.battery_level * model_battery_capacity) 
```

This method relies on the assumption that `charging.battery_energy` represents the energy actually stored in the battery, rather than being computed as `charging.battery_level * model_battery_capacity` (in which case the ratio above would always equal 1).
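As a minimal sketch of the formula above, the snippet applies it to a toy table. It assumes `battery_level` is stored as a fraction in [0, 1] and that energies and capacity are in kWh; the values and the 90 kWh `model_battery_capacity` are hypothetical.

```python
import pandas as pd

# Toy charging records; column names mirror `charging.battery_energy` and
# `charging.battery_level`, values are invented for illustration.
charging = pd.DataFrame({
    "battery_energy": [40.5, 64.8, 16.2],  # kWh actually stored (assumed unit)
    "battery_level": [0.5, 0.8, 0.2],      # state of charge as a fraction (assumed)
})
model_battery_capacity = 90.0  # hypothetical nominal capacity in kWh

# Energy the battery would hold at this charge level if it had 100% SoH
nominal_energy = charging["battery_level"] * model_battery_capacity
charging["soh"] = charging["battery_energy"] / nominal_energy
print(charging)
```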

## Setup

In [None]:
! mkdir -p data_cache

### Imports

In [None]:
import plotly.express as px
from scipy import stats
import numpy as np

from core.pandas_utils import *
from transform.fleet_info.main import fleet_info
from transform.processed_tss.main import get_processed_tss

### Data extraction

In [None]:
tss = get_processed_tss("mercedes-benz", force_update=False)

In [None]:
sanity_check(tss)

In [None]:
fleet_info.query("make == 'mercedes-benz'")["range"].value_counts(dropna=False, sort=True, ascending=False)

In [None]:
fleet_info.query("make == 'mercedes-benz' & range.isna()")[["model", "version"]].value_counts(sort=True, ascending=False).sort_index()

In [None]:
# Restrict to Mercedes vehicles so that identically named models of other makes are not counted
mercedes_fleet = fleet_info.query("make == 'mercedes-benz'")

for model in mercedes_fleet["model"].unique():
    model_fleet = mercedes_fleet[mercedes_fleet["model"] == model]
    nb_with_range = model_fleet["range"].notna().sum()
    nb_total = len(model_fleet)
    print(f"Model: {model}")
    print(f"Number of {model} with range: {nb_with_range}")
    print(f"Number of {model}: {nb_total}")
    print(f"ratio: {nb_with_range / nb_total:.2f}")
    print()
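The same coverage figures can also be computed in one table with a grouped aggregation instead of a loop. This is a sketch on a hypothetical extract of `fleet_info` (the model names `model_a`/`model_b` and all values are invented); `"count"` skips missing ranges while `"size"` counts every row.

```python
import pandas as pd

# Hypothetical extract of fleet_info: one row per vehicle, `range` may be missing.
fleet_info = pd.DataFrame({
    "make": ["mercedes-benz"] * 5 + ["other-make"],
    "model": ["model_a", "model_a", "model_a", "model_b", "model_b", "model_a"],
    "range": [400.0, None, 410.0, None, None, 300.0],
})

range_coverage = (
    fleet_info[fleet_info["make"] == "mercedes-benz"]
    .groupby("model")["range"]
    .agg(nb_with_range="count", nb_total="size")  # "count" ignores NaN, "size" does not
    .assign(ratio=lambda df: df["nb_with_range"] / df["nb_total"])
)
print(range_coverage)
```

Filtering on `make` first matters: the `other-make` row shares the name `model_a` but is excluded from the counts.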

## Time series

In [None]:
# VIN with the largest number of data points
most_common_vin = tss.groupby("vin").size().idxmax()
ts = tss.query(f"vin == '{most_common_vin}'")
most_common_vin

In [None]:
px.scatter(ts, x="date", y="soc", title=f"{most_common_vin}")

In [None]:
px.scatter(ts, x="date", y="estimated_range", title=f"{most_common_vin}")

In [None]:
px.scatter(ts, x="date", y="max_range", title=f"{most_common_vin}")

In [None]:
ts = ts.eval("estimated_range_by_soc = estimated_range.ffill() / soc.ffill()")
px.scatter(ts, x="date", y="estimated_range_by_soc", title=f"{most_common_vin}")
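For a single vehicle, `estimated_range / soc` extrapolates the car's range estimate to a full charge, assuming `soc` is a fraction in [0, 1]. The sketch below uses an invented time series in which range and SoC are not always reported together, to show what the forward fill does before the division.

```python
import numpy as np
import pandas as pd

# Toy time series for one VIN; values are invented.
ts = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=4, freq="D"),
    "soc": [0.80, np.nan, 0.50, 0.40],
    "estimated_range": [240.0, 236.0, np.nan, 120.0],
})

# Carry the last known value of each signal forward, then extrapolate to 100% SoC
ts["estimated_range_by_soc"] = ts["estimated_range"].ffill() / ts["soc"].ffill()
print(ts)
```

The third row pairs a stale range (236 km) with a fresh SoC (0.5) and yields 472 km instead of roughly 300 km, which is the kind of outlier forward filling can introduce when the two signals are sampled at different times.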

In [None]:
corr = ts.corr(numeric_only=True)
corr["max_range"].sort_values(ascending=False)


## SOH

### Estimation

In [None]:
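# Mercedes SoH: estimated_range / soc extrapolates the car's range estimate to a full charge,
# and dividing it by the default `range` gives the SoH (in %, assuming `soc` is a fraction in [0, 1]).
tss: DF = (
    tss
    .eval("soh = estimated_range / soc / range * 100")
    .eval("soh2 = (estimated_range / soc) / (range / max_range)")
    # Forward-fill the odometer within each vehicle so that readings never leak from one VIN to the next
    .assign(odometer=lambda df: df.groupby("vin")["odometer"].ffill())
)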
#tss.loc[tss.eval("model == 'vito' | model == 'sprinter'"), "soh2"] *= 2 

# Calculate average SOH and last odometer reading for each VIN
soh_per_vehicle = (
    tss
    .reset_index(drop=True)
    .groupby("vin")
    .agg({
        "soh2": "mean",
        "soh": "mean",
        "odometer": "last",
        "model": Series.mode,
        "date": "last",
        "estimated_range": "max",
    })
    .reset_index()
)
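The estimation cell above can be sketched end to end on toy data: the SoH formula per record, then the per-vehicle aggregation. Everything here is invented for illustration, it assumes `soc` is a fraction in [0, 1] and `range` is the default range in km, and it omits the `soh2` and `model` columns.

```python
import numpy as np
import pandas as pd

# Toy processed time series for two vehicles; all values are invented.
tss = pd.DataFrame({
    "vin": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01", "2024-01-15", "2024-02-15"]),
    "estimated_range": [150.0, 210.0, np.nan, 120.0, 180.0],
    "soc": [0.5, 0.7, 0.6, 0.4, 0.6],
    "range": [350.0, 350.0, 350.0, 400.0, 400.0],
    "odometer": [10_000.0, 12_000.0, 13_500.0, 40_000.0, np.nan],
})

# 150 km at 50% SoC -> 300 km on a full charge -> 300 / 350 * 100 ≈ 85.7 % SoH
tss["soh"] = tss["estimated_range"] / tss["soc"] / tss["range"] * 100

soh_per_vehicle = (
    tss
    .groupby("vin")
    .agg({
        "soh": "mean",       # rows without a range estimate (NaN SoH) are skipped
        "odometer": "last",  # last non-null value in row order
        "date": "last",
    })
    .reset_index()
)
print(soh_per_vehicle)
```

Because `"last"` skips missing values column by column, vehicle B ends up with a January odometer next to a February date; the aggregated odometer and date are only guaranteed to come from the same record when the odometer has no trailing gaps.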

### Visualization

In [None]:
# Create scatter plot
fig = (
    px.scatter(
        tss.dropna(subset=['odometer', 'soh']).eval("model_vin = model.astype('string') + vin"), #.query("soh > 70"),
        x="odometer",
        y="soh",
        color="model_vin",
        height=1000,
        title="State-of-Health (SoH) vs Mileage",
        trendline="ols",
        trendline_scope="overall",
    )
    .update_traces(line=dict(color='black', dash='dash'))
    #.update_layout(
    #    yaxis_scaleanchor="x",
    #    yaxis_scaleratio=1
    #)
)

fig.show()
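With `trendline="ols"` and `trendline_scope="overall"`, plotly fits a single ordinary least-squares line over all points regardless of `color` (it relies on statsmodels for this). For one predictor, `scipy.stats.linregress` (the notebook already imports `scipy.stats`) fits the same line, which makes it easy to read the slope as a degradation rate. The sketch uses invented, perfectly linear data.

```python
import numpy as np
from scipy import stats

# Toy (odometer, SoH) pairs; in the notebook these would come from
# tss.dropna(subset=["odometer", "soh"]). Values are invented.
odometer = np.array([0.0, 20_000.0, 40_000.0, 60_000.0, 80_000.0])
soh = np.array([100.0, 98.0, 96.0, 94.0, 92.0])

fit = stats.linregress(odometer, soh)
# Slope is in SoH points per km; rescale to a more readable distance
loss_per_10k_km = -fit.slope * 10_000
print(f"SoH loss: {loss_per_10k_km:.2f} points per 10,000 km, intercept {fit.intercept:.1f} %")
```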

In [None]:
from transform.raw_tss.main import get_raw_tss
raw_tss = get_raw_tss("mercedes-benz", force_update=False)
raw_tss.columns

In [None]:
tss.columns

In [None]:
tss.corr(numeric_only=True)["soh"].sort_values(ascending=False)

We can see that the SoH estimates of the Vitos and Sprinters are off.  
Let's try dividing their default range by 2.  
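Since the SoH is computed as `estimated_range / soc / range * 100`, it is inversely proportional to the default range, so halving the range is the same as doubling the SoH. A quick check with hypothetical values for a single van:

```python
import math

# Hypothetical values for a single van
estimated_range = 108.0  # km
soc = 0.6                # charge as a fraction (assumed)
default_range = 400.0    # km, as listed in fleet_info (hypothetical)

soh_default = estimated_range / soc / default_range * 100
soh_halved_range = estimated_range / soc / (default_range / 2) * 100

# Halving the default range doubles the SoH
assert math.isclose(soh_halved_range, 2 * soh_default)
print(f"{soh_default:.1f} % -> {soh_halved_range:.1f} %")
```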

In [None]:
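# Instead of dividing the default range by 2, we multiply the SoH by 2, which leaves the default range untouched.
# The model match is case-insensitive, and the correction is applied to a copy so that re-running this cell does not double the SoH again.
is_van = soh_per_vehicle["model"].astype("string").str.lower().isin(["sprinter", "vito"])
corrected_soh_per_vehicle = soh_per_vehicle.copy()
corrected_soh_per_vehicle.loc[is_van, "soh"] *= 2
fig = px.scatter(
    corrected_soh_per_vehicle.query("model != 'vito' & soh > 70"),
    x="odometer",
    y="soh",
    trendline="ols",
    color="model",
    trendline_scope="overall",
)
fig.update_traces(line=dict(color='black', dash='dash'))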

The corrected SoH values follow the overall trend, which is far more plausible than the previous results.  
This suggests that the default ranges recorded in fleet_info for these models are wrong.

## Conclusion

SoH derived from the estimated range looks promising and could be used for our final results for Ayvens.  
However, the accuracy of the estimator still needs to be improved.  