# SOH estimation EDA
The goal of this notebook is to find a lead on how to compute the state of health (SoH) of Tesla vehicles.  
We are using data received from the personal API, not to be confused with the fleet telematics API, which is what we hope to use eventually.

## Setup

### Imports

In [None]:
from ydata_profiling import ProfileReport
import plotly.express as px
import pandas as pd
from pandas import DataFrame as DF
from pandas import Series

from analysis.tesla.tesla_fleet_info import get_fleet_info
from analysis.tesla.tesla_raw_tss import get_raw_tss
from analysis.tesla.tesla_constants import *
from core.time_series_processing import compute_cum_integrals_of_current_vars
from core.pandas_utils import floor_to, uniques_as_series, series_start_end_diff
from core.plt_utils import plt_3d_df

### Data extraction

In [None]:
fleet_info = get_fleet_info()
raw_tss = get_raw_tss()
raw_tss.loc[:, ["model", "default_capacity"]] = fleet_info.loc[raw_tss["vin"], ["model", "default_kwh_energy_capacity"]].values # Use .values so that pandas ignores the index
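The lookup above repeats one `fleet_info` row per time series row and then assigns the result by position. Here is a minimal sketch of the same idea on toy data, assuming `fleet_info` is indexed by VIN (the VINs and capacities below are made up for illustration):

```python
import pandas as pd

# Toy stand-ins; the real frames come from get_fleet_info() and get_raw_tss().
fleet_info = pd.DataFrame(
    {"model": ["Model 3", "Model Y"], "default_kwh_energy_capacity": [57.5, 75.0]},
    index=pd.Index(["VIN_A", "VIN_B"], name="vin"),
)
raw_tss = pd.DataFrame({"vin": ["VIN_A", "VIN_B", "VIN_A"], "battery_level": [80, 55, 79]})

# .loc with a Series of labels returns one fleet_info row per time series row.
looked_up = fleet_info.loc[raw_tss["vin"], ["model", "default_kwh_energy_capacity"]]

# The looked-up frame is indexed by vin, not by the raw_tss index, so we strip
# the index with .to_numpy() to make the assignment positional.
raw_tss["model"] = looked_up["model"].to_numpy()
raw_tss["default_capacity"] = looked_up["default_kwh_energy_capacity"].to_numpy()
```

Without stripping the index, pandas would try to align on labels, and the VIN labels do not match the integer index of `raw_tss`.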

## Raw time series analysis

### Visualization
Let's view some time series to check that everything seems normal.

In [None]:
vins = uniques_as_series(raw_tss["vin"]).sample(n=4)
raw_tss_to_plot = raw_tss.set_index("vin", drop=False).loc[vins]
fig = px.scatter(raw_tss_to_plot, x="date", y="battery_level", facet_col="vin", facet_col_wrap=1)
fig.update_layout(height=1000)

In [None]:
fig = px.scatter(raw_tss_to_plot, x="date", y="power", facet_col="vin", facet_col_wrap=1)
fig.update_layout(height=1000)

In [None]:
fig = px.scatter(raw_tss_to_plot, x="date", y="charger_power", facet_col="vin", facet_col_wrap=1)
fig.update_layout(height=1000)

We can see that the data is there but that it is fairly sparse.  
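One way to put a number on "sparse" is the median time gap between consecutive samples of each vehicle. This is a rough sketch on toy data, assuming only the `vin` and `date` columns; it is not a measurement taken on the real dataset.

```python
import pandas as pd

# Toy time stamps for two made-up vehicles.
ts = pd.DataFrame({
    "vin": ["VIN_A"] * 3 + ["VIN_B"] * 3,
    "date": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:10", "2024-01-01 02:10",
        "2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00",
    ]),
})

# Gap in seconds between consecutive samples of the same vehicle.
gap_seconds = (
    ts.sort_values(["vin", "date"])
    .groupby("vin")["date"]
    .diff()
    .dt.total_seconds()
)

# Median gap per vehicle; the first sample of each vehicle has no gap (NaN) and is ignored.
median_gap = gap_seconds.groupby(ts["vin"]).median()
```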

### Dataset skewness


Skewness over VINs and models:

In [None]:
vins_stats = raw_tss["vin"].value_counts().sort_values(ascending=False).to_frame()
vins_stats[["model", "default_kwh_energy_capacity"]] = fleet_info.loc[vins_stats.index, ["model", "default_kwh_energy_capacity"]]
px.pie(vins_stats, values="count", names="model")

The number of rows per model is very skewed.  
We will try to implement a solution that handles all models, but this might end up being possible only for the most common ones.
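To complement the pie chart with numbers, the share of rows per model can be read directly from `value_counts`. A minimal sketch on toy data (the row counts are invented):

```python
import pandas as pd

# Toy stand-in for raw_tss with a deliberately skewed model distribution.
raw_tss = pd.DataFrame({"model": ["Model 3"] * 6 + ["Model Y"] * 3 + ["Model S"]})

# Fraction of rows belonging to each model, sorted from most to least common.
model_share = raw_tss["model"].value_counts(normalize=True)

# The model we would focus on first if we can only support the most common one.
most_common_model = model_share.idxmax()
```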

## Raw time series processing

In [None]:
# Process the raw time series of a single vehicle.

def process_ts(raw_ts: DF) -> DF:
    return (
        raw_ts
        # `columns=` is required, otherwise rename targets the index labels.
        .rename(columns={"battery_level": "soc"})
        .sort_values(by="date")
        # Callables are evaluated on the renamed, date-sorted frame.
        .assign(
            # New index every time the forward-filled SoC changes value.
            soc_idx=lambda df: df["soc"].ffill().diff().ne(0).cumsum(),
            ffilled_odometer=lambda df: df["odometer"].ffill(),
            floored_soc=lambda df: floor_to(df["soc"].ffill(), CHARGING_POINTS_GRP_BY_SOC_QUANTIZATION),
        )
        .pipe(compute_cum_integrals_of_current_vars)
    )

tss: DF = raw_tss.groupby("vin").apply(process_ts)
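The `soc_idx` column labels runs of constant state of charge. Here is the `ffill().diff().ne(0).cumsum()` chain on a toy series. The flooring line is only a hypothetical stand-in for `floor_to`, assuming it rounds down to a multiple of the quantization step (5 here, chosen arbitrarily):

```python
import pandas as pd

# Toy SoC readings with missing values, as in the sparse raw data.
soc = pd.Series([80.0, None, 80.0, 79.0, None, 79.0, 78.0])

filled = soc.ffill()                 # 80, 80, 80, 79, 79, 79, 78
changed = filled.diff().ne(0)        # True wherever the SoC differs from the previous row
soc_idx = changed.cumsum()           # running count of changes = run label

# Assumed behaviour of floor_to: round down to a multiple of the step.
step = 5
floored_soc = (filled // step) * step
```

The first row is always flagged as a change because its diff is NaN, so the labels start at 1.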

## Energy distribution
We will try to implement an SoH estimation similar to the one we used for watea.  

### Discharge energy distribution
For now we will focus on only the most common model.  

In [None]:

discharge_points = (
    tss
    .query("power > 0 & model == 'Model 3 Rear-Wheel Drive'")
    .groupby(["vin", ""])
    .agg(
        odometer=pd.NamedAgg("ffilled_odometer", "mean"),
        energy_added=pd.NamedAgg("cum_energy", series_start_end_diff),
        voltage=pd.NamedAgg("voltage", "median"),
        # current=pd.NamedAgg("current", "median"),
        temperature=pd.NamedAgg("temp", "median"),
        sec_duration=pd.NamedAgg("date", lambda s: series_start_end_diff(s).total_seconds()),
        date=pd.NamedAgg("date", lambda s: s.iat[0]),
        soc=pd.NamedAgg("floored_soc", "mean"),
        # estimated_range=pd.NamedAgg("ffilled_estimated_range", "median"),
        estimated_range_diff=pd.NamedAgg("ffilled_estimated_range", series_start_end_diff),
    )
)
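To make the aggregation above concrete, here is a sketch of a per-segment "end minus start" aggregation on toy data. `start_end_diff` is a hypothetical stand-in for `series_start_end_diff`, assuming it returns the last value minus the first. Grouping by `soc_idx` is also an assumption made for this illustration, and the numbers are invented.

```python
import pandas as pd

def start_end_diff(s: pd.Series) -> float:
    """Hypothetical stand-in for series_start_end_diff: last value minus first."""
    return s.iat[-1] - s.iat[0]

# Toy processed time series for one made-up vehicle with two SoC segments.
ts = pd.DataFrame({
    "vin": ["VIN_A"] * 4,
    "soc_idx": [1, 1, 2, 2],
    "cum_energy": [0.0, 0.4, 0.4, 1.0],
    "ffilled_odometer": [1000.0, 1002.0, 1002.0, 1006.0],
})

# One row per (vin, segment): energy and distance covered within the segment.
segments = ts.groupby(["vin", "soc_idx"]).agg(
    energy=pd.NamedAgg("cum_energy", start_end_diff),
    distance=pd.NamedAgg("ffilled_odometer", start_end_diff),
)
```

Each row of `segments` then describes how much energy and distance a single segment accounts for, which is the kind of per-segment point the distribution analysis works on.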

In [None]:
print(*raw_tss.columns, sep="\n")
