# Filtering SoH results
The goal of this notebook is to show how we filter out SoH results that are not valid.  
As of writing (2024-11-26), all the criteria are arbitrary.

## Setup

### Imports

In [None]:
import plotly.express as px

from core.pandas_utils import *
from transform.results.config import *
from transform.results.tesla_results import get_results

### Data extraction

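The results will be used to fill the `vehicle_data` table, so we aggregate them to match its expected update frequency (`UPDATE_FREQUENCY`): for each vehicle and time bucket, we keep the last odometer reading and the median SoH.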

In [None]:
results = (
    get_results()
    # Align timestamps on the update frequency of the vehicle_data table.
    .assign(date=lambda df: df["date"].dt.floor(UPDATE_FREQUENCY))
    .groupby(["vin", "date"])
    .agg({
        "odometer": "last",
        "soh": "median",
    })
    .reset_index(drop=False)
)
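To make the aggregation above concrete, here is a minimal sketch on toy data. The VINs, readings, and the daily `UPDATE_FREQUENCY` value are assumptions made up for illustration; the real frequency is defined in `transform.results.config`.

```python
import pandas as pd

# Hypothetical stand-in for UPDATE_FREQUENCY; the real value lives in transform.results.config.
UPDATE_FREQUENCY = "D"

# Toy data: three readings for one vehicle and one for another, all on the same day.
raw = pd.DataFrame({
    "vin": ["VIN_A", "VIN_A", "VIN_A", "VIN_B"],
    "date": pd.to_datetime([
        "2024-11-25 08:00", "2024-11-25 12:00", "2024-11-25 18:00", "2024-11-25 09:30",
    ]),
    "odometer": [1000.0, 1020.0, 1050.0, 5000.0],
    "soh": [0.95, 0.99, 0.96, 0.90],
})

toy_results = (
    raw
    # Floor every timestamp to the start of its bucket.
    .assign(date=lambda df: df["date"].dt.floor(UPDATE_FREQUENCY))
    .groupby(["vin", "date"])
    # One row per vehicle and bucket: last odometer, median SoH.
    .agg({"odometer": "last", "soh": "median"})
    .reset_index()
)
print(toy_results)
```

The three `VIN_A` readings collapse into a single row, so one unusually high or low SoH estimate within a bucket has less influence than it would with a mean.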

Let's visualize the raw results.

In [None]:
px.scatter(results, x="odometer", y="soh", color="vin")

It's pretty clear that some results are outliers.  
While this is *okay* for a statistical analysis, we would prefer not to show them to our clients.  

## Filtering

To filter the results we will use the following criteria:
- SoH must be strictly between 0.5 and 1.0
- SoH must lie between two lines in the (odometer, SoH) plane: a maximum line and a minimum line, each described by a slope and an intercept.   
Each line is defined by two points, A and B, stored in `transform.results.config.VALID_SOH_POINTS`.   
A and B were chosen arbitrarily.  
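
As a minimal sketch of the criteria above, the check can be written in plain Python. The A and B coordinates below are hypothetical placeholders, not the values stored in `VALID_SOH_POINTS`, and the helper names `line_through` and `soh_is_valid` are made up for this illustration.

```python
# Hypothetical (odometer, soh) points; the real ones live in
# transform.results.config.VALID_SOH_POINTS.
MAX_A, MAX_B = (0.0, 1.0), (200_000.0, 0.95)
MIN_A, MIN_B = (0.0, 0.9), (200_000.0, 0.7)


def line_through(a, b):
    """Return (intercept, slope) of the line passing through points a and b."""
    slope = (b[1] - a[1]) / (b[0] - a[0])
    intercept = a[1] - slope * a[0]
    return intercept, slope


def soh_is_valid(odometer, soh):
    """Check both criteria: inside the envelope and strictly between 0.5 and 1.0."""
    max_intercept, max_slope = line_through(MAX_A, MAX_B)
    min_intercept, min_slope = line_through(MIN_A, MIN_B)
    upper = max_intercept + max_slope * odometer
    lower = min_intercept + min_slope * odometer
    return lower <= soh <= upper and 0.5 < soh < 1.0


print(soh_is_valid(100_000, 0.90))  # True: between the two lines at this odometer
print(soh_is_valid(100_000, 0.99))  # False: above the maximum line
```

With these placeholder points, the valid band narrows from [0.9, 1.0) at 0 to [0.7, 0.95] at 200,000 odometer units.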

In [None]:
def filter_results(results: DF) -> DF:
    """Keep only the rows whose SoH is inside the valid envelope and strictly between 0.5 and 1.0."""
    logger.debug("Filtering results.")
    max_intercept, max_slope = intercept_and_slope_from_points(VALID_SOH_POINTS.xs("max", level=0, drop_level=True))
    min_intercept, min_slope = intercept_and_slope_from_points(VALID_SOH_POINTS.xs("min", level=0, drop_level=True))
    logger.debug(f"max_intercept: {max_intercept}, max_slope: {max_slope}")
    logger.debug(f"min_intercept: {min_intercept}, min_slope: {min_slope}")
    return (
        results
        # Reference the local variables with `@` instead of formatting them with `:f`,
        # which rounds to 6 decimals and can distort a small slope.
        .eval("max_valid_soh = odometer * @max_slope + @max_intercept")
        .eval("min_valid_soh = odometer * @min_slope + @min_intercept")
        .eval("soh_is_valid = (soh <= max_valid_soh) & (soh >= min_valid_soh) & (soh > 0.5) & (soh < 1.0)")
        .pipe(debug_df, subset=["soh", "max_valid_soh", "min_valid_soh", "soh_is_valid"], logger=logger)
        .query("soh_is_valid")
        .dropna(subset=["soh", "odometer"], how="any")
    )

def intercept_and_slope_from_points(points: DF) -> tuple[float, float]:
    logger.debug(f"points:\n{points}")
    slope = (points.at["B", "soh"] - points.at["A", "soh"]) / (points.at["B", "odometer"] - points.at["A", "odometer"])
    intercept = points.at["A", "soh"] - slope * points.at["A", "odometer"]
    return intercept, slope

filtered_results = filter_results(results)
px.scatter(filtered_results, x="odometer", y="soh", color="vin", hover_data=["date"])