# SoH estimation evaluation
The goal of this notebook is to programmatically evaluate the state-of-health (SoH) estimation of the Tesla vehicles.  
The estimation is evaluated with the following factors:  
- sign of the slope returned by `scipy.stats.linregress` (SoH regressed on odometer).  
- `stderr` of that `scipy.stats.linregress` slope.  
- variance of the SoH estimation within a single charge.
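The three factors can be sketched in plain Python for a single vehicle. The helper names (`linregress_slope_and_stderr`, `evaluate_soh_estimation`) are hypothetical, and the slope and its standard error are recomputed by hand with the same definitions as the `slope` and `stderr` fields of `scipy.stats.linregress`, so the sketch runs without SciPy. The within-charge factor is assumed here to be the population variance of each charge's SoH samples, averaged over charges.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance


def linregress_slope_and_stderr(x, y):
    """Least-squares slope of y on x and its standard error.

    Same definitions as the `slope` and `stderr` fields of
    scipy.stats.linregress: stderr = sqrt(SSE / (n - 2) / Sxx).
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the slope stderr")
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no spread, the slope is undefined")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares around the fitted line, with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(sse / (n - 2) / sxx)


def evaluate_soh_estimation(odometer, soh, charge_ids):
    """Hypothetical helper scoring one vehicle's SoH samples with the three factors."""
    slope, stderr = linregress_slope_and_stderr(odometer, soh)
    soh_by_charge = defaultdict(list)
    for charge_id, value in zip(charge_ids, soh):
        soh_by_charge[charge_id].append(value)
    return {
        # -1, 0 or 1 (a negative slope means SoH decreases as the odometer grows).
        "slope_sign": (slope > 0) - (slope < 0),
        "slope_stderr": stderr,
        # Population variance of each charge's samples, averaged over charges.
        "within_charge_variance": fmean(pvariance(v) for v in soh_by_charge.values()),
    }


print(evaluate_soh_estimation(
    odometer=[0, 10_000, 20_000, 30_000],
    soh=[1.0, 0.98, 0.96, 0.94],
    charge_ids=["a", "a", "b", "b"],
))
```

On this toy series the slope sign is -1 and the slope stderr is close to zero, since the points lie on a line. With real data, the inputs would presumably be one VIN's in-charge rows of `tss`, for example with `in_charge_perf_idx` as the charge identifier.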

## Setup

### Data cache directory creation

In [None]:
# -p also creates the parent data_cache directory
! mkdir -p data_cache/plots

### Imports

In [None]:
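import pandas as pd
import plotly.express as px
from pandas import Series

# series_start_end_diff, used below, comes from this star import
from core.pandas_utils import *
from transform.processed_tss.tesla_processed_tss import get_processed_tss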

### Data extraction

#### First SoH estimation

In theory, we would express the SoH as `soh = total_energy_currently_storable / total_energy_originally_storable`.  
However, we usually don't have recordings of the battery at 100% SoC, so we express it as `soh = (energy_currently_stored / soc) / (original_energy_storable / 100)`.  
We don't have `energy_currently_stored` either, so we estimate it as `energy_currently_stored = battery_range * charge_energy_added / charge_miles_added_ideal`.  
Together, these workarounds result in the following steps:
```
energy_by_range_added = charge_energy_added / charge_miles_added_ideal 
range_by_soc = battery_range / soc
energy_by_soc = energy_by_range_added * range_by_soc
soh = energy_by_soc / (default_kwh_capacity / 100)
```
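The four steps can be checked on made-up numbers with a small pure-Python function. The name `first_soh` mirrors the column computed in the next cell; the input values are illustrative, not measurements, and `soc` is assumed to be in percent.

```python
def first_soh(charge_energy_added, charge_miles_added_ideal, battery_range, soc, capacity):
    """SoH as a fraction, following the four steps above.

    Units assumed: kWh for energies and capacity, ideal miles for ranges, % for soc.
    """
    energy_by_range_added = charge_energy_added / charge_miles_added_ideal  # kWh per ideal mile
    range_by_soc = battery_range / soc  # ideal miles per % of SoC
    energy_by_soc = energy_by_range_added * range_by_soc  # kWh per % of SoC today
    return energy_by_soc / (capacity / 100)  # divided by the original kWh per % of SoC


# Illustrative values: 20 kWh added for 80 ideal miles, 200 ideal miles shown at 80 % SoC,
# and a pack originally rated at 75 kWh.
print(first_soh(20, 80, 200, 80, 75))
```

Here `energy_by_soc` is 0.625 kWh per % of SoC against 0.75 kWh per % originally, so the estimated SoH is about 0.83, i.e. the pack would now hold about 62.5 kWh.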

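While this is the "perfect" way to calculate the SoH from a physics point of view, we might be able to skip some of the calculation steps.  
For example, we can divide `energy_by_range_added` by the original energy per ideal mile, `original_capacity / original_range`. This drops the `range_by_soc` term, so it implicitly assumes that the full-charge range `battery_range / soc * 100` still equals the original range.  

In [None]:
tss = (
    get_processed_tss(force_update=False)
    # Full estimation: kWh per % of SoC today / original kWh per % of SoC
    .eval("energy_by_range_added = charge_energy_added / charge_miles_added_ideal")
    .eval("range_by_soc = battery_range / soc")
    .eval("energy_by_soc = energy_by_range_added * range_by_soc")
    .eval("first_soh = energy_by_soc / (capacity / 100)")
    # Shortcut: kWh per ideal mile today / original kWh per ideal mile (dimensionless)
    .eval("capacity_by_range = capacity / range")
    .eval("simple_soh = energy_by_range_added / capacity_by_range")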
)
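The shortcut can be compared with the full formula on made-up numbers. The sketch below is plain Python with hypothetical function names; `original_range` stands for the original range and `capacity` for the original capacity in kWh. It illustrates that both estimates agree only while the full-charge range shown by the car (`battery_range / soc * 100`) still equals the original range.

```python
def full_soh(charge_energy_added, charge_miles_added_ideal, battery_range, soc, capacity):
    """Full estimation: kWh per % of SoC today / original kWh per % of SoC."""
    energy_by_range_added = charge_energy_added / charge_miles_added_ideal
    return energy_by_range_added * (battery_range / soc) / (capacity / 100)


def shortcut_soh(charge_energy_added, charge_miles_added_ideal, original_range, capacity):
    """Shortcut: kWh per ideal mile today / original kWh per ideal mile."""
    energy_by_range_added = charge_energy_added / charge_miles_added_ideal
    return energy_by_range_added / (capacity / original_range)


# 20 kWh added for 80 ideal miles on a pack originally rated at 75 kWh and 300 ideal miles.
print(full_soh(20, 80, 240, 80, 75), shortcut_soh(20, 80, 300, 75))  # 300 mi at 100 % SoC
print(full_soh(20, 80, 200, 80, 75), shortcut_soh(20, 80, 300, 75))  # 250 mi at 100 % SoC
```

In the first line both estimates are 1.0. In the second, the displayed full-charge range has shrunk, so the full estimate drops to about 0.83 while the shortcut stays at 1.0.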

In [None]:
px.box(
    tss.query("in_charge & soc > 20"),
    x="odometer",
    y="first_soh",
    color="vin",
    title="First SoH estimation",
)

In [None]:
tss.columns

In [None]:
energy_added_in_charge = (
    tss
    .query("in_charge_perf_mask")
    .groupby(["vin", "in_charge_perf_idx"])
    .agg(
        energy_added=pd.NamedAgg("charge_energy_added", series_start_end_diff),
        soc_diff=pd.NamedAgg("soc", series_start_end_diff),
        soc_start=pd.NamedAgg("soc", "first"),
        soc_end=pd.NamedAgg("soc", "last"),
        temp=pd.NamedAgg("inside_temp", "mean"),
        capacity=pd.NamedAgg("capacity", "first"),
        odometer=pd.NamedAgg("odometer", "first"),
        fast_charger_type=pd.NamedAgg("fast_charger_type", Series.mode),
        size=pd.NamedAgg("soc", "size"),
        model=pd.NamedAgg("model", "first"),
        version=pd.NamedAgg("version", "first"),
    )
    .reset_index(drop=False)
    .eval("soh = energy_added / (soc_diff / 100 * capacity)")
)
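`series_start_end_diff` is imported from `core.pandas_utils`, which is not shown here. Assuming it returns the last value of a group minus its first value, the per-charge SoH is the energy added during the charge divided by the energy the original pack would hold over the same SoC window. The pure-Python sketch below (with the hypothetical helpers `start_end_diff` and `soh_per_charge`) reproduces that computation on made-up rows.

```python
from itertools import groupby
from operator import itemgetter


def start_end_diff(values):
    """Hypothetical stand-in for series_start_end_diff: last value minus first value."""
    return values[-1] - values[0]


def soh_per_charge(rows, capacity):
    """SoH per charge as a fraction.

    rows: (charge_id, charge_energy_added_kwh, soc_percent) tuples, sorted by charge_id
    and by time within each charge (itertools.groupby only groups consecutive rows).
    capacity: original capacity of the pack in kWh.
    """
    soh = {}
    for charge_id, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        energy_added = start_end_diff([row[1] for row in group])
        soc_diff = start_end_diff([row[2] for row in group])
        # Same formula as the cell above: energy_added / (soc_diff / 100 * capacity)
        soh[charge_id] = energy_added / (soc_diff / 100 * capacity)
    return soh


rows = [
    (0, 0.0, 20), (0, 15.0, 40),                 # 15 kWh for 20 % of a 75 kWh pack
    (1, 0.0, 30), (1, 10.0, 50), (1, 27.0, 70),  # 27 kWh for 40 % of a 75 kWh pack
]
print(soh_per_charge(rows, capacity=75))
```

Because `soc_diff` is in the denominator, the same absolute SoC error weighs more on charges that cover a small SoC window.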

In [None]:
energy_added_in_charge.groupby('vin')[["model", "version"]].first()

In [None]:
energy_added_in_charge.groupby('vin')[["soh", "soc_diff", "capacity"]].count()

In [None]:
px.scatter(
    energy_added_in_charge,
    x="odometer",
    y="soh",
    color="vin",
    title="SoH estimation in charge",
)

In [None]:
px.box(
    energy_added_in_charge,
    x="fast_charger_type",
    y="soh",
    color="fast_charger_type",
    title="SoH estimation in charge per fast charger type",
)

In [None]:
px.scatter(
    energy_added_in_charge.query("fast_charger_type == 'Combo'"),
    x="odometer",
    y="soh",
    color="vin",
    trendline="ols",
    trendline_scope="overall",
    title="SoH estimation in charge",
)

In [None]:
display(energy_added_in_charge["fast_charger_type"].value_counts())
display(energy_added_in_charge.groupby("fast_charger_type")["vin"].count())

In [None]:
px.box(
    energy_added_in_charge,
    x="size",
    y="soh",
    #color="vin",
    title="SoH estimation in charge per number of samples",
)

In [None]:
energy_added_in_charge[["vin"]].value_counts(ascending=False)

In [None]:
px.scatter(
    energy_added_in_charge,
    x="soc_diff",
    y="soh",
    color="vin",
    title="SoH estimation in charge per SoC difference",
)

In [None]:
px.scatter(
    energy_added_in_charge,
    x="soc_diff",
    y="energy_added",
    color="vin",
    title="Energy added per SoC difference",
)

In [None]:
px.scatter(
    energy_added_in_charge.query("size >= 3 & size <= 15 & soc_diff > 20"),
    x="odometer",
    y="soh",
    color="vin",
    trendline="ols",
    trendline_scope="overall",
    title="SoH estimation in charge",
)

In [None]:
energy_added_in_charge.groupby('vin')["soh"].count()

In [None]:
energy_added_in_charge.query("size > 30 & soc_diff > 20")

In [None]:
px.scatter(
    tss,
    x="soc",
    y="simple_soh",
    color="vin",
    title="Simple SoH estimation per SoC",
)

In [None]:
soh_energy_added_per_vehicle = (
    energy_added_in_charge
    .groupby("vin")
    .agg(
        soh=pd.NamedAgg("soh", "mean"),
        odometer=pd.NamedAgg("odometer", "last"),
        model=pd.NamedAgg("model", "first"),
        version=pd.NamedAgg("version", "first"),
    )
    .reset_index(drop=False)
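    # String concatenation done in pandas rather than in eval, with a separator for readable labels
    .assign(model_version=lambda df: df["model"].astype(str) + " " + df["version"].astype(str))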
)

In [None]:
fig = px.scatter(
    soh_energy_added_per_vehicle,
    x="odometer",
    y="soh",
    color="vin",
    title="SoH per odometer",
    # soh is a fraction (energy_added / (soc_diff / 100 * capacity)), not a percentage
    labels={"odometer": "Odometer (km)", "soh": "SoH", "vin": "VIN"},
)
fig.write_html("data_cache/plots/soh_per_odometer.html")
fig.show()

In [None]:
px.box(
    soh_energy_added_per_vehicle,
    x="model_version",
    y="soh",
    color="model_version",
    title="SoH per model and version",
)