# SoH estimation evaluation
The goal of this notebook is to implement a function that evaluates our State of Health (SoH) estimations.  
> Note:  
> Our implementation assumes that the SoH changes linearly with the odometer.  
> This is because the time span covered by our data is short enough that the evolution of SoH over odometer can be approximated by a linear slope.   
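
Written as a formula, the assumption is that for each vehicle $v$ the SoH is approximately an affine function of the odometer reading $d$. The fit has a per-vehicle intercept $a_v$ and slope $b_v$, where the slope is the SoH change per km and is negative when the battery degrades:

$$
\mathrm{SoH}_v(d) \approx a_v + b_v \, d
$$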

The evaluation has three scores:
-   Standard deviation of the SoH points around a per-vehicle regression line:
    1. For each vehicle, we fit a linear regression (LR) of SoH over odometer.
    1. We compute the difference between each SoH point and its vehicle's regression line.
    1. We compute the std of these differences, pooled over all vehicles.
-   The mean absolute difference between each vehicle's slope and a "base slope", i.e. the average SoH lost per km traveled.  
-   The mean absolute residual between each vehicle's median SoH estimate and the physical SoH readouts sent by Ayvens.
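
The scores can be written out with the following notation:
- $\hat s_{v,i}$ is the estimated SoH of vehicle $v$ at odometer reading $d_{v,i}$.
- $\hat a_v$ and $\hat b_v$ are that vehicle's fitted intercept and slope.
- $r_v$ is its physical SoH readout.
- $V$ is the number of vehicles.
- $\ell_0 = 0.8 / 10^4$ is the base SoH loss per km, the magnitude of `BASE_SLOPE` in the code.

Since $\hat b_v$ is negative for a degrading battery, $-\hat b_v$ is the fitted loss per km. The three scores are:

$$
\begin{aligned}
\sigma_{\mathrm{LR}} &= \operatorname{std}_{v,i}\left(\hat a_v + \hat b_v \, d_{v,i} - \hat s_{v,i}\right) \\
D_{\mathrm{base}} &= \frac{1}{V} \sum_{v} \left| -\hat b_v - \ell_0 \right| \\
R_{\mathrm{readout}} &= \frac{1}{V} \sum_{v} \left| r_v - \operatorname{median}_i \hat s_{v,i} \right|
\end{aligned}
$$

$\sigma_{\mathrm{LR}}$, $D_{\mathrm{base}}$ and $R_{\mathrm{readout}}$ correspond to the `soh_to_lr_std`, `abs_diff_to_base_trendline` and `abs_soh_residual` outputs. The std is the sample standard deviation over all points of all vehicles (pandas' default `ddof=1`). Vehicles without a readout are skipped by the mean.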

## Setup

### Imports

In [None]:
import warnings

import numpy as np
import pandas as pd

# The star imports presumably provide the `DF`/`Series` aliases and `lr_params_as_series` used below.
from core.pandas_utils import *
from core.stats_utils import *
from transform.raw_results.tesla_results import get_results
from transform.raw_results.get_tesla_soh_readouts import get_aviloo_soh_readouts
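
`lr_params_as_series` comes from one of the star imports, and its source isn't shown here. The evaluation below plugs its `intercept` and `slope` outputs into `intercept + odometer * slope`, so it presumably fits a least-squares line per group. The following is a hypothetical stand-in under that assumption, not the project's implementation. It uses `np.polyfit` and returns NaNs when a vehicle has fewer than two distinct odometer values.

```python
import numpy as np
import pandas as pd


def lr_params_as_series(df: pd.DataFrame, x_col: str, y_col: str) -> pd.Series:
    """Hypothetical stand-in: least-squares fit of y = intercept + slope * x.

    Rows with a missing x or y are dropped; with fewer than two distinct x
    values the line is undefined, so NaNs are returned instead of fitting.
    """
    data = df[[x_col, y_col]].dropna()
    if data[x_col].nunique() < 2:
        return pd.Series({"intercept": np.nan, "slope": np.nan})
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(data[x_col], data[y_col], deg=1)
    return pd.Series({"intercept": intercept, "slope": slope})


# Toy data: vehicle A loses 1 SoH unit per 10,000 km, vehicle B loses 0.5.
toy = pd.DataFrame({
    "vin": ["A", "A", "A", "B", "B"],
    "odometer": [10_000, 20_000, 30_000, 5_000, 15_000],
    "soh": [99.0, 98.0, 97.0, 100.0, 99.5],
})
# Selecting the value columns keeps "vin" out of each group without needing include_groups.
params = (
    toy.groupby("vin")[["odometer", "soh"]]
    .apply(lr_params_as_series, "odometer", "soh")
    .reset_index()
)
print(params)
```

Selecting the columns before `.apply` gives the same per-vehicle groups as the notebook's `include_groups=False`. That argument requires pandas 2.2 or later.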

### Data extraction

In [None]:
results = get_results()

## Evaluation

In [None]:
warnings.filterwarnings("ignore", message="invalid value encountered in subtract")

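# Base SoH change per km: a loss of 0.8 SoH units per 10,000 km.
# Negative, like the fitted slopes of a degrading battery, so that `slope - BASE_SLOPE` compares like with like.
BASE_SLOPE = -0.8 / 1e4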

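def evaluate_estimations(results: DF, soh_cols: list[str]) -> DF:
    """Evaluate each SoH column of `results`; returns one row of scores per column."""
    # Load the Aviloo readouts once instead of once per evaluated column.
    aviloo_soh_readouts = get_aviloo_soh_readouts()
    return pd.DataFrame({
        soh_col: evaluate_single_estimation(results, soh_col, aviloo_soh_readouts)
        for soh_col in soh_cols
    }).T

def evaluate_single_estimation(results: DF, soh_col: str, aviloo_soh_readouts: DF) -> Series:
    # Per-vehicle linear regression of SoH over odometer: one row per VIN with its intercept and slope.
    lr_params: DF = (
        results
        .groupby("vin")
        .apply(lr_params_as_series, "odometer", soh_col, include_groups=False)
        .reset_index(drop=False)
    )
    return Series({
        # Std of the SoH points around their own vehicle's regression line, pooled over all vehicles.
        "soh_to_lr_std": (
            results
            .merge(lr_params, how="left", on="vin")
            .eval(f"intercept + odometer * slope - {soh_col}")
            .std()
        ),
        # Mean absolute difference between each vehicle's median estimated SoH and its physical readout.
        "abs_soh_residual": (
            results
            .groupby("vin")
            .agg({soh_col: "median"})
            .merge(aviloo_soh_readouts, how="left", on="vin")
            .eval(f"soh_readout - {soh_col}")
            .abs()
            .mean()
        ),
        # Mean absolute difference between each vehicle's fitted slope and the base slope.
        "abs_diff_to_base_trendline": (
            lr_params
            .eval("slope - @BASE_SLOPE")
            .abs()
            .mean()
        ),
    })

evaluations = (
    results
    # Uniformly random "estimation", used as a baseline that a real estimation should beat.
    .assign(random_soh=np.random.rand(len(results)))
    .pipe(evaluate_estimations, ["soh", "random_soh"])
)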
evaluations
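
This self-contained sketch shows the three scores on a synthetic fleet, without the project data or helpers. Everything in it is hypothetical:
- `fit_line`, `toy_scores` and `BASE_LOSS_PER_KM` stand in for `lr_params_as_series`, `evaluate_single_estimation` and `BASE_SLOPE`.
- The simulated vehicles and readouts stand in for `results` and the Ayvens readouts.

The sketch assumes SoH is on a 0 to 100 scale and that the fitted slope is the SoH change per km. The random baseline is drawn over the toy SoH range rather than with `np.random.rand`, so the comparison isn't driven by scale alone.

```python
import numpy as np
import pandas as pd

# Hypothetical name: the base SoH loss per km, expressed as a positive number.
BASE_LOSS_PER_KM = 0.8 / 1e4


def fit_line(group: pd.DataFrame, soh_col: str) -> pd.Series:
    """Least-squares line soh = intercept + slope * odometer for one vehicle."""
    slope, intercept = np.polyfit(group["odometer"], group[soh_col], deg=1)
    return pd.Series({"intercept": intercept, "slope": slope})


def toy_scores(results: pd.DataFrame, soh_col: str, readouts: pd.DataFrame) -> pd.Series:
    """The three scores for one SoH column, computed on toy data."""
    params = (
        results.groupby("vin")[["odometer", soh_col]]
        .apply(fit_line, soh_col)
        .reset_index()
    )
    # Residuals of every point to its own vehicle's line, pooled over all vehicles.
    merged = results.merge(params, how="left", on="vin")
    residuals = merged["intercept"] + merged["odometer"] * merged["slope"] - merged[soh_col]
    # Per-vehicle median estimate compared with the physical readout.
    medians = results.groupby("vin")[soh_col].median().reset_index()
    vs_readout = medians.merge(readouts, how="left", on="vin")
    return pd.Series({
        "soh_to_lr_std": residuals.std(),
        "abs_soh_residual": (vs_readout["soh_readout"] - vs_readout[soh_col]).abs().mean(),
        # slope is the SoH change per km (negative when SoH drops), so -slope is the loss per km.
        "abs_diff_to_base_trendline": (-params["slope"] - BASE_LOSS_PER_KM).abs().mean(),
    })


# Synthetic fleet: 3 vehicles, 20 points each, linear decline plus N(0, 0.2) noise.
rng = np.random.default_rng(0)
frames = []
for vin, start_soh in [("VIN_A", 98.0), ("VIN_B", 95.0), ("VIN_C", 97.0)]:
    odometer = np.linspace(10_000, 60_000, 20)
    soh = start_soh - BASE_LOSS_PER_KM * (odometer - 10_000) + rng.normal(0, 0.2, odometer.size)
    frames.append(pd.DataFrame({"vin": vin, "odometer": odometer, "soh": soh}))
toy = pd.concat(frames, ignore_index=True)
# Random baseline drawn uniformly over the toy SoH range.
toy["random_soh"] = rng.uniform(toy["soh"].min(), toy["soh"].max(), len(toy))

# Hypothetical physical readouts: the noise-free SoH at the median odometer (35,000 km).
readouts = pd.DataFrame({"vin": ["VIN_A", "VIN_B", "VIN_C"], "soh_readout": [96.0, 93.0, 95.0]})

toy_evaluations = pd.DataFrame(
    {col: toy_scores(toy, col, readouts) for col in ["soh", "random_soh"]}
).T
print(toy_evaluations)
```

On such data, the noisy but linear `soh` column should score lower than `random_soh` on all three metrics. If a real estimation doesn't, either the estimation or the evaluation deserves a closer look.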

## Conclusion
This is a first implementation, and it is still rough around the edges.  
For now, the only way I found to test the function is to compare the evaluation of our estimation with that of a random estimation.  
Once we start testing new estimation methods, we should (ironically) also be able to test the evaluation method in more depth.