# Estimated range SOH
The goal of this notebook is to compute the soh from the estimated range, `charge_energy_added` and `soc`.

## Setup

### Imports

In [None]:
import scipy.interpolate as interpolate
import scipy.optimize as optimize
import pandas as pd
from pandas import DataFrame as DF
from pandas import Series
import plotly.express as px
import numpy as np

from transform.tesla.tesla_fleet_info import get_fleet_info
from transform.tesla.tesla_processed_tss import get_processed_tss
from transform.tesla.tesla_config import *
from core.pandas_utils import floor_to, uniques_as_series, series_start_end_diff
from core.plt_utils import plt_3d_df

### Data extraction

In [None]:
fleet_info = get_fleet_info()
tss = get_processed_tss()

Let's check the sparsity of the data we will need to estimate the SOH (fraction of non-missing values per column):

In [None]:
tss[["charge_miles_added_ideal", "charge_energy_added", "battery_range", "soc"]].count() / len(tss)
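As a minimal sketch of what this ratio measures, the toy frame below (made-up values, not fleet data) shows that `count()` skips missing values, so dividing by the number of rows gives the share of populated rows per column.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with some missing entries.
toy = pd.DataFrame({
    "battery_range": [250.0, np.nan, 240.0, 238.0],
    "soc": [90.0, 85.0, np.nan, np.nan],
})

# count() ignores NaN, so this is the share of non-missing rows per column.
fill_ratio = toy.count() / len(toy)
print(fill_ratio)
```

A value close to 1 means the column is almost fully populated.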

Great, we won't need to do any further preprocessing.

In [None]:
tss = tss.query("model == 'Model 3 Rear-Wheel Drive'")

We also restrict the data to the most common model to make sure that we are making apples-to-apples comparisons.

## SOH estimation

In theory, we would express the soh as `soh = total_energy_currently_storable / total_energy_originally_storable`.  
However, we usually don't have recordings of the battery at 100% soc, so we express it as `soh = (energy_currently_stored / soc) / (original_energy_storable / 100)`.  
We don't have `energy_currently_stored` either, so we estimate it as `energy_currently_stored = battery_range * charge_energy_added / charge_miles_added_ideal`.  
All of these workarounds result in the following steps:
```
energy_by_range_added = charge_energy_added / charge_miles_added_ideal 
range_by_soc = battery_range / soc
energy_by_soc = energy_by_range_added * range_by_soc
soh = energy_by_soc / (default_capacity / 100)
```
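To make the arithmetic concrete, here is a small worked example of the steps above on a single reading. All numbers are hypothetical and chosen only so the intermediate values are easy to follow; the original capacity in particular is an assumed value, not one taken from the fleet data.

```python
# Hypothetical single reading, chosen only to illustrate the arithmetic.
charge_energy_added = 30.0        # kWh added during the charge
charge_miles_added_ideal = 120.0  # ideal miles added during the same charge
battery_range = 180.0             # miles currently reported
soc = 80.0                        # percent
default_capacity = 60.0           # kWh, assumed original capacity

energy_by_range_added = charge_energy_added / charge_miles_added_ideal  # kWh per mile
range_by_soc = battery_range / soc                                      # miles per soc point
energy_by_soc = energy_by_range_added * range_by_soc                    # kWh per soc point
soh = energy_by_soc / (default_capacity / 100)                          # dimensionless ratio

print(round(soh, 4))
```

With these inputs, the battery stores about 0.5625 kWh per soc point against 0.6 kWh per soc point originally, which gives an soh slightly below 1.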

While this is the "perfect" way to calculate the soh from a "physics" point of view, we might be able to shortcut some calculation steps.  
For example, by dividing `energy_by_range_added` by `original_capacity / original_range` (the original energy per unit of range), so that the result is a dimensionless ratio.  

### Variables calculations

In [None]:
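tss: DF = (
    tss
    # Energy needed per ideal mile added while charging
    .eval("energy_by_range_added = charge_energy_added / charge_miles_added_ideal")
    # Range available per soc point
    .eval("range_by_soc = battery_range / soc")
    # Energy stored per soc point
    .eval("energy_by_soc = energy_by_range_added * range_by_soc")
    .eval("soh = energy_by_soc / (default_capacity / 100)")
)
# Expose the first index level (the vin) as a regular column for plotting
tss["vin"] = tss.index.get_level_values(0)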

The calculation introduces infinite soh values, most likely because of divisions by zero (for example when `soc` or `charge_miles_added_ideal` is 0).  
Those values would introduce errors down the line, so we remove the corresponding rows.  


In [None]:
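# soh is energy_by_soc divided by the capacity, so both are infinite on the same rows
# (assuming default_capacity is finite and non-zero)
is_inf_mask = np.isinf(tss["energy_by_soc"].values)
print(f"nb inf soh values: {is_inf_mask.sum()}")
tss: DF = tss[~is_inf_mask]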
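One plausible source of these infinite values is a division by zero. The sketch below uses a toy frame with hypothetical values to show how pandas turns a non-zero value divided by zero into `inf`, and how the same `np.isinf` mask removes those rows.

```python
import numpy as np
import pandas as pd

# Toy readings (hypothetical values); the last two rows have soc == 0.
toy = pd.DataFrame({
    "battery_range": [200.0, 150.0, 0.0],
    "soc": [80.0, 0.0, 0.0],
})

# 150 / 0 gives inf, while 0 / 0 gives NaN.
toy["range_by_soc"] = toy["battery_range"] / toy["soc"]

is_inf_mask = np.isinf(toy["range_by_soc"].to_numpy())
cleaned = toy[~is_inf_mask]
print(int(is_inf_mask.sum()), len(cleaned))
```

Note that `np.isinf` leaves the `NaN` row (0 / 0) in place; such rows would need a separate `dropna` if they turn out to matter.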

### SOH Visualization

In [None]:
px.scatter(tss.sample(frac=0.5), x="odometer", y="soh", color='vin')

By looking at the soh over soc, we can see that the variance of the soh shrinks as the soc increases (it appears to be inversely related to the soc).  

In [None]:
px.scatter(tss.sample(frac=0.5), x="soc", y="soh", color='vin')
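One way to quantify this beyond the scatter plot is to bin the soc and compute the standard deviation of the soh per bin. The sketch below runs on synthetic data in which the soh noise is deliberately made to scale with `1 / soc`; that scaling is an assumption for illustration, not a measured property of the fleet.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
soc = rng.uniform(5, 100, size=5_000)

# Synthetic assumption: a fixed absolute error on the range reading turns into
# an soh error that scales with 1 / soc.
soh = 0.95 + rng.normal(0, 1, size=soc.size) / soc
toy = pd.DataFrame({"soc": soc, "soh": soh})

# Bin soc in steps of 10 points and compute the soh spread per bin.
toy["soc_bin"] = (toy["soc"] // 10 * 10).astype(int)
soh_std_by_soc = toy.groupby("soc_bin")["soh"].std()
print(soh_std_by_soc)
```

Applied to `tss`, the same `groupby` would show whether the spread really drops as the soc rises.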

We can also see that the soh variance is (surprisingly) a lot higher during charging.

In [None]:
px.box(tss.sample(frac=0.5), x="soc", y="soh", color='in_charge')

Let's take a look at the time series of a few vehicles separately.

In [None]:
vins_to_plot = uniques_as_series(tss["vin"]).sample(n=3)
fig = (
    px.scatter(
        tss.loc[vins_to_plot],
        x="date",
        y="soh",
        facet_col="vin",
        facet_col_wrap=1,
        color="in_charge"
    )
    .update_yaxes(matches=None)
    .update_xaxes(matches=None)
    .update_layout(height=1000)
)
fig

### energy_by_range_added visualization
Let's look at the `energy_by_range_added` alone.

In [None]:
px.scatter(
    tss.query("in_charge & soc > 50"),
    x="odometer",
    y="energy_by_range_added",
    color="vin",
    trendline="ols",
    trendline_scope="trace",
    opacity=0.6,
)

There doesn't seem to be any evolution of `energy_by_range_added` over a 55k odometer difference, so it is safe to assume that `energy_by_range_added` cannot be used alone to skip some steps in the soh calculation.  
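The "no evolution" reading can be checked numerically by fitting a line per vehicle and looking at the change the fit predicts over the odometer span. The sketch below uses `np.polyfit` rather than the plotly `ols` trendline, and runs on synthetic data with hypothetical vehicle identifiers and an assumed constant ratio.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
frames = []
for vin in ["VIN_A", "VIN_B"]:  # hypothetical vehicle identifiers
    odometer = np.linspace(10_000, 65_000, 200)
    # Synthetic assumption: a constant ratio plus a small measurement noise.
    ratio = 0.25 + rng.normal(0, 0.002, size=odometer.size)
    frames.append(
        pd.DataFrame({"vin": vin, "odometer": odometer, "energy_by_range_added": ratio})
    )
toy = pd.concat(frames, ignore_index=True)


def ols_slope(group: pd.DataFrame) -> float:
    # np.polyfit with deg=1 returns [slope, intercept].
    slope, _intercept = np.polyfit(group["odometer"], group["energy_by_range_added"], deg=1)
    return float(slope)


slopes = {vin: ols_slope(group) for vin, group in toy.groupby("vin")}
# Total change predicted by the linear fit over a 55k odometer span.
change_over_span = {vin: slope * 55_000 for vin, slope in slopes.items()}
print(change_over_span)
```

A predicted change that is small compared to the typical value of the ratio supports the flat-trend reading.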

### range_by_soc visualization

In [None]:
px.scatter(
    tss.query("in_discharge & soc > 50"),
    x="odometer",
    y="range_by_soc",
    color="vin",
    trendline="ols",
    trendline_scope="trace",
    opacity=0.6,
)

Let's compare this to `soh`.

In [None]:
px.scatter(
    tss.query("in_discharge & soc > 50"),
    x="odometer",
    y="soh",
    color="vin",
    trendline="ols",
    trendline_scope="trace",
    opacity=0.6,
)

There does not seem to be any difference in the shape of the two variables, so we might be able to drop the `energy_by_range_added` factor from the soh calculation and express it as `soh = range_by_soc / (original_range / 100)`.
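The reason the two curves can share the same shape is that, when `energy_by_range_added` is constant, the full soh is just `range_by_soc` times a constant. The sketch below checks this on synthetic data; `original_range`, the constant ratio and the capacity derived from them are hypothetical values chosen to be mutually consistent.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 500

# Hypothetical constants, chosen so that capacity = range * energy per mile.
original_range = 260.0                              # miles at 100% soc when new
energy_per_mile = 0.25                              # kWh per mile, assumed constant
default_capacity = original_range * energy_per_mile  # 65 kWh

true_soh = rng.uniform(0.85, 1.0, size=n)
toy = pd.DataFrame({"soc": rng.uniform(50, 100, size=n)})
toy["energy_by_range_added"] = energy_per_mile
toy["battery_range"] = true_soh * original_range * toy["soc"] / 100

toy["range_by_soc"] = toy["battery_range"] / toy["soc"]
# Full calculation, as in the steps of this notebook.
toy["soh_full"] = toy["energy_by_range_added"] * toy["range_by_soc"] / (default_capacity / 100)
# Shortcut based on the range only.
toy["soh_shortcut"] = toy["range_by_soc"] / (original_range / 100)

max_gap = (toy["soh_full"] - toy["soh_shortcut"]).abs().max()
print(max_gap)
```

Under these assumptions the two expressions agree up to floating point error; on real data the gap would reflect how far `energy_by_range_added` drifts from a constant.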