# Soh estimation using vehicle charging data

### Introduction:
We want to estimate the state of health (soh) of the Watea batteries.  
The vast majority of the electrical data lies in the charging periods.  
Therefore we base our solution on charging data, although in theory it could also be applied to discharging periods.  
### Vocabulary:
- charging point: Time series samples aggregated by soc, using the quantization `CHARGING_POINTS_GRP_BY_SOC_QUANTIZATION` defined in `watea_constant`.
- `energy_added`: Energy received by the battery during a charging point.
- `default_100_soh energy_added`: Expected `energy_added` of a battery with 100% soh.
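To make the "charging point" definition concrete, here is a minimal sketch of how raw samples could be grouped into charging points. It assumes a raw frame with hypothetical columns `charge_id`, `soc`, `energy` (energy received since the previous sample), `current` and `temperature`, and a hypothetical soc step standing in for `CHARGING_POINTS_GRP_BY_SOC_QUANTIZATION`, whose real value and aggregation logic are not shown here.

```python
import pandas as pd

# Hypothetical stand-in for CHARGING_POINTS_GRP_BY_SOC_QUANTIZATION (assumed to be a soc step, in %)
SOC_QUANTIZATION = 5


def to_charging_points(samples: pd.DataFrame, soc_step: float = SOC_QUANTIZATION) -> pd.DataFrame:
    """Group raw time series samples into charging points by quantized soc.

    Assumes hypothetical columns: charge_id, soc, energy (energy received since
    the previous sample), current and temperature.
    """
    # Snap each sample's soc down to the lower edge of its soc bin
    samples = samples.assign(soc_bin=(samples["soc"] // soc_step) * soc_step)
    return (
        samples.groupby(["charge_id", "soc_bin"], as_index=False)
        .agg(
            energy_added=("energy", "sum"),       # energy received during the charging point
            current=("current", "mean"),          # average current over the point
            temperature=("temperature", "mean"),  # average temperature over the point
        )
        .rename(columns={"soc_bin": "soc"})
    )
```

With a step of 5, samples at soc 10 and 12 of the same charge end up in one charging point (soc 10), and samples at soc 16 and 19 in another (soc 15).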

### Assumptions:
Our main assumption is that *a battery that requires less energy than another battery to gain the same amount of soc has a lower soh*.  
Our second assumption is that *charges made at 3k odometer or less can be used to define the expected energy needed to gain a given amount of soc for a 100% soh battery*.  

### Observations:
1. The energy required to gain a given amount of soc depends on multiple factors, namely:
    - voltage/soc
    - temperature
    - current

The relationship between `energy_added` and the aforementioned factors is discontinuous, forming distinct clusters of charging points.  
We call these clusters *charging regimes*, as they most likely correspond to different charger types/brands and charging modes (AC/DC and so on).

### Main idea:
We estimate the soh of a charging point as its `energy_added` divided by the `default_100_soh energy_added`.  
The `default_100_soh energy_added` for a given charging point is estimated using linear regression.  
Note: ideally there would be one regressor per charging regime, but here we implement a single regressor for a single charging regime.
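Written out, the main idea amounts to the following. The notation is introduced here for illustration only: $E_i$ is the `energy_added` of charging point $i$, $\mathbf{x}_i$ its input features (e.g. soc, current, temperature), $(\mathbf{w}, b)$ the fitted regression coefficients and $\hat{E}^{100}_i$ the estimated `default_100_soh energy_added`.

$$
\hat{E}^{100}_i = \mathbf{w}^\top \mathbf{x}_i + b, \qquad \widehat{\text{soh}}_i = \frac{E_i}{\hat{E}^{100}_i}
$$

With hypothetical numbers: a charging point that received 0.92 kWh where the model predicts 1.00 kWh for a 100% soh battery gets an estimated soh of $0.92 / 1.00 = 0.92$, i.e. 92%. With a polynomial fit, $\mathbf{x}_i$ simply also contains the polynomial terms of the features.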

### Imports

In [None]:
import plotly.express as px
import pandas as pd
from pandas import DataFrame as DF

from core.plt_utils import plt_3d_df
from transform.watea.watea_config import *
from transform.watea.soh_estimation import *

## Extraction
First we extract all the charging points of the processed time series dataset.  

In [None]:
charging_points = get_raw_charging_points()
charging_points

Here we visualize all of the fleet's charging points (minus some outliers).

In [None]:
plt_3d_df(charging_points, "soc", "current", "energy_added", color="temperature", colorscale="Rainbow", size=2.5)

## Preprocessing
We want to select a regime (cluster) of charging points in which the relationship between the input features (soc, voltage, current, ...) and the target feature (`energy_added`) is continuous.  
We do this by computing a `regime_seperation_feature`.  
This feature is computed as the voltage predicted by a degree-6 polynomial fit of voltage as a function of soc, minus the actual voltage.  
This feature would deserve a small notebook of its own, but I am running out of time... sorry :).
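A minimal sketch of this feature, assuming the polynomial is fitted once over all the charging points passed in. It uses numpy's `Polynomial.fit`, which rescales soc internally and is better conditioned than `np.polyfit` at degree 6; the function name is hypothetical, not the one used in `transform.watea.soh_estimation`.

```python
import numpy as np
from numpy.polynomial import Polynomial


def regime_separation_feature(soc, voltage, deg: int = 6) -> np.ndarray:
    """Predicted voltage (degree-`deg` polynomial fit of voltage vs soc) minus actual voltage."""
    soc = np.asarray(soc, dtype=float)
    voltage = np.asarray(voltage, dtype=float)
    fit = Polynomial.fit(soc, voltage, deg)  # least-squares fit of voltage as a function of soc
    return fit(soc) - voltage                # > 0: point sits below the curve, < 0: above it
```

Points from a regime whose voltage sits above the fleet-wide soc/voltage curve get clearly negative values, which is what makes the feature usable to separate regimes.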

In [None]:
preprocessed_charging_points = get_preprocessed_charging_points()

preprocessed_charging_points

In [None]:
plt_3d_df(preprocessed_charging_points, "regime_seperation_feature", "current", "energy_added", "temperature")

Using `regime_seperation_feature` and `current`, we can easily isolate a dataset with a *fairly* continuous relationship between input and target features.  
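The selection criteria behind the `to_use_for_soh_estimation` flag are not shown in this notebook, so the following is only a sketch of what such a selection could look like: a band on `regime_seperation_feature` combined with a current threshold. Both bounds are hypothetical, and the current threshold assumes charging current is positive in this dataset.

```python
import pandas as pd

# Hypothetical bounds, expressed in the same units as the corresponding columns
MAX_ABS_SEPARATION = 0.05
MIN_CURRENT = 50.0


def flag_soh_points(
    points: pd.DataFrame,
    max_abs_sep: float = MAX_ABS_SEPARATION,
    min_current: float = MIN_CURRENT,
) -> pd.DataFrame:
    """Return a copy of `points` with a boolean `to_use_for_soh_estimation` column."""
    in_regime = points["regime_seperation_feature"].abs() <= max_abs_sep  # close to the soc/voltage curve
    high_current = points["current"] >= min_current                       # assumes positive charging current
    return points.assign(to_use_for_soh_estimation=in_regime & high_current)
```

The resulting frame can then be filtered with `.query("to_use_for_soh_estimation")`, as in the cells that follow.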

In [None]:
plt_3d_df(preprocessed_charging_points, "regime_seperation_feature", "current", "energy_added", "to_use_for_soh_estimation")

In [None]:
plt_3d_df(preprocessed_charging_points.query("to_use_for_soh_estimation"), "regime_seperation_feature", "current", "energy_added", "temperature")

Note: this filtering does make us lose the ability to estimate the soh of certain vehicles, namely those whose charging points all fall outside the selected regime.  

## Soh estimation
Now that we have a dataset from which to estimate `energy_added` using our features (including `regime_seperation_feature`, which *might* be useless or even counterproductive), we can start fitting a model to it.  
In our case this is a simple polyfit.  
The fitting is performed in two parts:
1. We fit the model to the entirety of the dataset.
1. We adjust the model's intercept to the `default_100_soh energy_added` points (the charges made at 3k odometer or less) to estimate the actual `default_100_soh energy_added`.
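The two steps can be sketched as follows, as one plausible reading of the procedure: an ordinary least-squares linear model (extra polynomial columns can be appended to `X` to turn it into a polyfit), a reference mask of points at 3k odometer or less following our second assumption, and an intercept shift that zeroes the mean residual on those reference points. The function names are hypothetical, not the ones in `transform.watea.soh_estimation`.

```python
import numpy as np


def _design(X: np.ndarray) -> np.ndarray:
    """Append a column of ones so the last coefficient is the intercept."""
    return np.column_stack([X, np.ones(len(X))])


def fit_default_100_soh_model(X, energy_added, odometer, max_odometer=3_000) -> np.ndarray:
    """Step 1: least squares on every point. Step 2: shift the intercept onto the reference points."""
    design = _design(X)
    coefs, *_ = np.linalg.lstsq(design, energy_added, rcond=None)
    ref = odometer <= max_odometer                          # charges assumed to reflect a 100% soh battery
    residuals = energy_added[ref] - design[ref] @ coefs
    coefs[-1] += residuals.mean()                           # zero mean residual on the reference points
    return coefs


def predict_default_100_soh(X, coefs) -> np.ndarray:
    """Estimated `default_100_soh energy_added` for each charging point."""
    return _design(X) @ coefs
```

The soh of each point is then `energy_added / predict_default_100_soh(X, coefs)`. Only the intercept is adjusted: the slopes still come from the whole (partly degraded) fleet, so they can be slightly biased.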

In [None]:
processed_cluster = get_processed_cluster()

processed_cluster

In [None]:
px.scatter(processed_cluster, 'odometer', 'soh', 'id', trendline="ols", trendline_scope="overall")

## Estimation by charge
To see the trend more clearly, we aggregate the points by charge.  
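A minimal sketch of such an aggregation, assuming a hypothetical `charge_id` column identifying each charge. The median is used here because it is robust to the noisy points discussed below; the aggregation actually performed by `get_soh_per_charges` may differ.

```python
import pandas as pd


def soh_per_charge(points: pd.DataFrame) -> pd.DataFrame:
    """One row per (battery, charge): median soh of the charge's charging points."""
    return points.groupby(["id", "charge_id"], as_index=False).agg(
        odometer=("odometer", "mean"),  # odometer is essentially constant within a charge
        soh=("soh", "median"),          # robust to a few outlier charging points
        n_points=("soh", "count"),      # number of charging points behind each estimate
    )
```

Keeping `n_points` makes it easy to discard charges whose soh rests on only a handful of points.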

In [None]:
charges = get_soh_per_charges()

In [None]:
px.scatter(charges, 'odometer', 'soh', 'id', trendline="ols", trendline_scope="overall")

## Estimation evaluation

Here we visualize the estimated `default_100_soh energy_added` across the soh estimation features.

In [None]:
plt_3d_df(processed_cluster, "energy_added", "odometer", "general_energy_added", color="temperature")

We can see that one particular charge has a very large spread of soh values.  
Here we take a look at one specific battery (`bob432`) to try to interpret the noisy soh estimations.  
This is a very "minimalist" evaluation...  

In [None]:
px.box(processed_cluster.query("id == 'bob432'"), x='odometer', y='soh', color='id')

These low-soh charging points seem to lie in a much lower `current` region than the rest.  
It might be worth checking whether the other batteries also show an abnormally low soh in this `current` region.  
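One way to run that check is sketched below. It assumes a hypothetical `current_threshold` separating the low-current region (its value is not given here and would have to be read off the plots) and uses medians as the summary statistic. For each battery, it reports how much lower the soh estimates are in the low-current region than elsewhere.

```python
import pandas as pd


def low_current_soh_gap(points: pd.DataFrame, current_threshold: float) -> pd.Series:
    """Per battery: median soh below `current_threshold` minus median soh of the other points."""
    low = points["current"] < current_threshold
    low_median = points[low].groupby("id")["soh"].median()
    rest_median = points[~low].groupby("id")["soh"].median()
    # Subtraction aligns on battery id; batteries missing one of the two regions become NaN
    return (low_median - rest_median).dropna()
```

A consistently negative gap across batteries would point to a bias of the model in that region rather than to genuine battery degradation.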

In [None]:
plt_3d_df(processed_cluster.query("id == 'bob432'"), "voltage", "current", "soh", color="odometer", colorscale="Rainbow", size=2.5)

## Conclusion

This is the best soh estimation we have implemented so far.  
It could be combined with other insights, such as the difference in energy required to charge the battery over a range of temperatures, to add value to the data.  
There is a lot of room for improvement.  
Once those improvements are made, estimating the causes of soh loss seems feasible.  