# Correlation Check

The goal is to check that the computed SoH (State of Health) values are not correlated with the measured SoC (State of Charge):
- at the start of the charging session,
- at the end of the charging session,
- as the average over the entire charging session.

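A correlation close to zero in all three cases is consistent with the SoH estimate being independent of the battery's charge level. It does not prove independence, because a correlation coefficient only captures linear (Pearson) or monotonic (Spearman) relationships.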
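The three checks can be sketched with pandas on a per-session table. This is a minimal sketch on synthetic data. The column names `SOC_START`, `SOC_END` and `SOC_MEAN` are hypothetical and may not match the real `result_phases` schema. It computes the Pearson correlation of `SOH` with each SoC summary. Values near 0 are what the check expects.

```python
import numpy as np
import pandas as pd

# Synthetic charging sessions; the column names are hypothetical stand-ins
# for whatever the real result_phases schema uses.
rng = np.random.default_rng(0)
n_sessions = 500
soc_start = rng.uniform(10, 60, n_sessions)             # SoC (%) at session start
soc_end = soc_start + rng.uniform(20, 40, n_sessions)   # SoC (%) at session end
sessions = pd.DataFrame({
    "SOC_START": soc_start,
    "SOC_END": soc_end,
    # Simplification: treat the session-average SoC as the start/end midpoint.
    "SOC_MEAN": (soc_start + soc_end) / 2,
    # SoH drawn independently of SoC, so all three correlations should be near 0.
    "SOH": rng.normal(0.92, 0.02, n_sessions),
})


def soh_soc_correlations(
    df: pd.DataFrame,
    soh_col: str = "SOH",
    soc_cols: tuple = ("SOC_START", "SOC_END", "SOC_MEAN"),
) -> pd.Series:
    """Pearson correlation between the SoH column and each SoC summary column."""
    return df[list(soc_cols)].corrwith(df[soh_col])


corrs = soh_soc_correlations(sessions)
print(corrs.round(3))
```

Using the start/end midpoint as the session-average SoC is a simplification. If the SoC time series of each session is available, averaging it directly is more faithful.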

## Load data

In [None]:
from core.s3.s3_utils import S3Service
from core.spark_utils import create_spark_session
import os
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd

In [None]:
spark_session = create_spark_session(os.environ.get('S3_KEY'), os.environ.get('S3_SECRET'))

In [None]:
s3 = S3Service()


### Result phases

In [None]:
# Load the result_phases table of each make from S3 and convert it to a pandas DataFrame
result_phases_bmw = s3.read_parquet_df_spark(spark_session, "result_phases/result_phases_bmw.parquet").toPandas()
result_phases_mercedes = s3.read_parquet_df_spark(spark_session, "result_phases/result_phases_mercedes_benz.parquet").toPandas()
result_phases_tesla = s3.read_parquet_df_spark(spark_session, "result_phases/result_phases_tesla_fleet_telemetry.parquet").toPandas()
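Three makes are loaded, so the same correlation can be compared across them. The sketch below stacks the frames with a `make` column and computes the SOH/SOC_DIFF correlation per make. It assumes, without confirmation from the source, that the BMW and Mercedes-Benz tables share the `SOH` and `SOC_DIFF` columns used for Tesla. The helper `make_fake_phases` is hypothetical and generates synthetic stand-ins so the example runs without S3.

```python
import numpy as np
import pandas as pd


def make_fake_phases(seed: int, n: int = 200) -> pd.DataFrame:
    """Hypothetical stand-in for a result_phases table (synthetic values)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "SOH": rng.normal(0.9, 0.03, n),
        "SOC_DIFF": rng.uniform(5, 80, n),
    })


# In the notebook these would be result_phases_bmw, result_phases_mercedes
# and result_phases_tesla.
frames = {
    "bmw": make_fake_phases(1),
    "mercedes_benz": make_fake_phases(2),
    "tesla": make_fake_phases(3),
}

# Stack the makes into one frame, moving the dict keys into a "make" column.
all_phases = (
    pd.concat(frames, names=["make", "row"])
    .reset_index(level="make")
    .reset_index(drop=True)
)

# Pearson correlation of SOH with SOC_DIFF, computed separately for each make.
per_make_corr = all_phases.groupby("make")[["SOH", "SOC_DIFF"]].apply(
    lambda g: g["SOH"].corr(g["SOC_DIFF"])
)
print(per_make_corr.round(3))
```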

## Correlation between SoH and SoC

In [None]:
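# Pearson correlation between all numeric columns (Tesla fleet telemetry only)
corr = result_phases_tesla.corr(numeric_only=True)
selected_column = "SOH"
# Keep only the correlations with SOH and drop its trivial self-correlation of 1
selected_corr = (
    corr[[selected_column]]
    .drop(index=selected_column)
    .sort_values(by=selected_column, ascending=False)
)

# Heat map of the correlations with SOH
px.imshow(selected_corr, title=f"Correlation with {selected_column}")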
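`DataFrame.corr` defaults to Pearson, which measures only linear association, so a monotonic but non-linear dependence of SOH on SoC could be understated. A minimal sketch that reports Pearson and Spearman side by side is shown below. `soh_correlations` is a hypothetical helper. The demo data is synthetic, and `TEMP_MEAN` is an invented column used only as an unrelated control.

```python
import numpy as np
import pandas as pd


def soh_correlations(df: pd.DataFrame, target: str = "SOH") -> pd.DataFrame:
    """Pearson and Spearman correlation of every other numeric column with `target`."""
    numeric = df.select_dtypes(include="number")
    table = pd.DataFrame({
        "pearson": numeric.corr(method="pearson")[target],
        # Spearman = Pearson correlation of the ranks; catches monotonic, non-linear links.
        "spearman": numeric.corr(method="spearman")[target],
    })
    # Drop the trivial self-correlation and order by absolute Spearman coefficient.
    table = table.drop(index=target)
    return table.sort_values("spearman", key=lambda s: s.abs(), ascending=False)


# Synthetic example: SOH falls monotonically but non-linearly with SOC_DIFF,
# and is unrelated to TEMP_MEAN (a hypothetical column).
rng = np.random.default_rng(42)
n = 300
soc_diff = rng.uniform(5, 80, n)
demo = pd.DataFrame({
    "SOC_DIFF": soc_diff,
    "TEMP_MEAN": rng.normal(25, 5, n),
    "SOH": 0.95 - 0.1 * (soc_diff / 80) ** 6,
})
print(soh_correlations(demo).round(3))
```

In this demo the Spearman coefficient for `SOC_DIFF` is close to -1 while Pearson is noticeably weaker. That gap is the kind of dependence a Pearson-only heat map can hide.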

In [None]:
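# Visual check: SOH against SOC_DIFF; no clear trend is expected if they are independent
px.scatter(result_phases_tesla, x="SOH", y="SOC_DIFF", title="SOH vs. SOC_DIFF (Tesla)")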
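A scatter plot with many points can hide a weak trend. A complementary numeric check, sketched below under the assumption that `SOC_DIFF` and `SOH` are numeric columns, bins `SOC_DIFF` into equal-population quantile bins and reports the median `SOH` per bin. The helper name `binned_soh_profile` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def binned_soh_profile(
    df: pd.DataFrame, x: str = "SOC_DIFF", y: str = "SOH", n_bins: int = 5
) -> pd.DataFrame:
    """Median and sample count of `y` within equal-population (quantile) bins of `x`."""
    bins = pd.qcut(df[x], q=n_bins)
    return df.groupby(bins, observed=True)[y].agg(["median", "count"])


# Synthetic data in which SOH does not depend on SOC_DIFF (hypothetical values).
rng = np.random.default_rng(7)
phases = pd.DataFrame({
    "SOC_DIFF": rng.uniform(5, 80, 1000),
    "SOH": rng.normal(0.90, 0.01, 1000),
})
profile = binned_soh_profile(phases)
print(profile)
```

A roughly flat median across bins is consistent with SOH not depending on `SOC_DIFF`. A steady rise or fall from the first bin to the last would point to a dependence worth investigating.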