# SoH Estimation - Kia


### Validated methodology

The selected method has two steps:

1. **Initial SoH calculation**: the battery capacity is estimated from the available energy and the state of charge (SoC)
   $$\text{Capacity}_{estimated} = \frac{\text{BatteryRemain\_Value}}{\text{BatteryRemain\_Ratio}/100}$$

2. **Correction of the SoC dependence**: the SoH is adjusted with a linear regression fitted per model/version to remove the bias linked to the state of charge

### Next steps

- Within a single vehicle, the computed SoH values form groups about 0.05 apart. The cause of these groups has not been identified yet.
- A few points in discharge phases are identified as charging.
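
To start investigating the SoH groups mentioned in the first item, one possible first step (a sketch, not part of the validated method) is to split a vehicle's sorted SoH values wherever two consecutive values are further apart than a chosen gap. The function name `split_into_groups`, the 0.03 gap, and the sample values are all assumptions made for illustration.

```python
def split_into_groups(values, min_gap=0.03):
    """Split values into clusters separated by jumps larger than min_gap."""
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > min_gap:
            groups.append([current])  # jump detected: start a new cluster
        else:
            groups[-1].append(current)
    return groups


# Hypothetical SoH values for one vehicle, forming two levels about 0.05 apart.
soh_values = [0.951, 0.949, 0.902, 0.950, 0.899, 0.901]
groups = split_into_groups(soh_values)
print([round(sum(g) / len(g), 3) for g in groups])
```

Looking at which phases, dates, or SoC ranges fall into each cluster might then hint at the cause of the split.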

This methodology can be used to monitor battery aging across the Kia fleet.


# Data import

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px

from core.pandas_utils import *
from core.s3.s3_utils import S3Service
from core.s3.settings import S3Settings
from core.spark_utils import create_spark_session
from core.sql_utils import *
from core.stats_utils import *
from transform.fleet_info.main import fleet_info

# Configuration
settings = S3Settings()
spark = create_spark_session(
    settings.S3_KEY,
    settings.S3_SECRET
)
s3 = S3Service()
company = "kia"

In [None]:
raw_tss = s3.read_parquet_df_spark(
    spark,
    "raw_ts/kia/time_series/raw_ts_spark.parquet",
    exclude_columns=[
        "Vehicle_Green_BatteryManagement_BatteryConditioning",
        "Vehicle_Green_BatteryManagement_BatteryPreCondition_Status",
        "Vehicle_Green_ChargingInformation_ConnectorFastening_State",
        "Vehicle_Green_ChargingInformation_DTE_TargetSoC_Quick",
        "Vehicle_Green_ChargingInformation_DTE_TargetSoC_Standard",
        "Vehicle_Green_ChargingInformation_EstimatedTime_ICCB",
        "Vehicle_Green_ChargingInformation_EstimatedTime_Standard",
        "Vehicle_Green_ChargingInformation_ExpectedTime_EndDay",
        "Vehicle_Green_ChargingInformation_ExpectedTime_EndHour",
        "Vehicle_Green_ChargingInformation_ExpectedTime_EndMin",
        "Vehicle_Green_ChargingInformation_ExpectedTime_StartDay",
        "Vehicle_Green_ChargingInformation_ExpectedTime_StartHour",
        "Vehicle_Green_ChargingInformation_ExpectedTime_StartMin",
        "Vehicle_Green_ChargingInformation_SequenceDetails",
        "Vehicle_Green_ChargingInformation_SequenceSubcode",
    ],
).toPandas()

In [None]:
processed_phase = s3.read_parquet_df_spark(
    spark, "processed_phases/processed_phases_kia.parquet"
).toPandas()


In [None]:
raw_tss.rename(columns={'Vehicle_Green_BatteryManagement_BatteryCapacity_Value': "max_energy_battery",
       #'Vehicle_Green_BatteryManagement_BatteryConditioning': "flag_battery_conditioning",
       #'Vehicle_Green_BatteryManagement_BatteryPreCondition_Status': "preconditionning_status",
       'Vehicle_Green_BatteryManagement_BatteryPreCondition_TemperatureLevel': "TemperatureLevel",
       'Vehicle_Green_BatteryManagement_BatteryRemain_Ratio': "soc",
       'Vehicle_Green_BatteryManagement_BatteryRemain_Value': "available_battery_capacity",
       'Vehicle_Green_BatteryManagement_SoH_Ratio': "soh_oem",
       'Vehicle_Green_ChargingInformation_Charging_RemainTime': "remaining_time_charging",
       #'Vehicle_Green_ChargingInformation_ConnectorFastening_State': 'charger_connection_state',
       #'Vehicle_Green_ChargingInformation_DTE_TargetSoC_Quick': "estimated_DTE_fast_charger_DC",
       #'Vehicle_Green_ChargingInformation_DTE_TargetSoC_Standard': "estimated_DTE_slow_charger_AC",
       'Vehicle_Green_ChargingInformation_ElectricCurrentLevel_State': "electric_current_level", # 0 default, 1 max, 2 decreasing, 3 min
       #'Vehicle_Green_ChargingInformation_EstimatedTime_ICCB': "charging_remaining_time_120V",
       'Vehicle_Green_ChargingInformation_EstimatedTime_Quick': "remaining_time_fast_charging",
       # 'Vehicle_Green_ChargingInformation_EstimatedTime_Standard': "charging_remaining_time_240V",
       # 'Vehicle_Green_ChargingInformation_ExpectedTime_EndDay': "ExpectedTime_EndDay",
       # 'Vehicle_Green_ChargingInformation_ExpectedTime_EndHour': "ExpectedTime_EndHour  ",
       # 'Vehicle_Green_ChargingInformation_ExpectedTime_EndMin': "ExpectedTime_EndMin",
       #'Vehicle_Green_ChargingInformation_ExpectedTime_StartDay': "ExpectedTime_StartDay",
       # 'Vehicle_Green_ChargingInformation_ExpectedTime_StartHour': "ExpectedTime_StartHour",
       # 'Vehicle_Green_ChargingInformation_ExpectedTime_StartMin': "ExpectedTime_StartMin",
       # 'Vehicle_Green_ChargingInformation_SequenceDetails': "SequenceDetails", # Provides 'charge/discharge operation status' # 0: Not Charging # 1: High Voltage Applied # 2: Slow Connector # Connected (AC) # 3: Fast Connector # Connected (DC) # 4: V2L Connector Connected # 5: Wireless Charging # Connector Connected # 6: Charging Waiting # (Scheduled) # 7: Scheduled Charging # (Start) # 8: Normal Charging (Start) # 9: V2L (Start) # 10: Wireless Charging (Start) # 11: V2G (Start) # 12: Normal End # 13: Charging Failed (AC) # 14: Charging Failed (DC) # 15: V2L/V2G Failed # 501: Not Charging # 510: Invalid
       # 'Vehicle_Green_ChargingInformation_SequenceSubcode': "SequenceSubcode", # detailed information about the charging sequence
        'Vehicle_Green_PowerConsumption_Prediction_Climate': "Climate_power_consumption",
       "Vehicle_Drivetrain_Odometer": "odometer"}, inplace=True)
raw_tss.columns

In [None]:
# Convert to kWh; dividing by 3600 presumes the raw values are in kJ (1 kWh = 3600 kJ)
raw_tss["max_energy_battery_kwh"] = raw_tss["max_energy_battery"] / 3600
raw_tss["available_battery_capacity_kwh"] = raw_tss["available_battery_capacity"] / 3600

In [None]:
raw_tss.columns

In [None]:

nombre_vin_uniques = raw_tss["vin"].nunique()
print(f"Number of distinct VINs in raw_tss: {nombre_vin_uniques}")


In [None]:
# Fetch vehicle information from the database
with get_connection() as con:
    cursor = con.cursor()
    cursor.execute("""
        SELECT vm.model_name, vm.type, vm.autonomy, v.vin, b.net_capacity 
        FROM vehicle v 
        LEFT JOIN vehicle_model vm ON v.vehicle_model_id = vm.id
        LEFT JOIN battery b ON b.id = vm.battery_id
        LEFT join make m on m.id = vm.make_id
        WHERE m.make_name = 'kia'
    """)
    dbeaver_df = cursor.fetchall()
    dbeaver_df = pd.DataFrame(
        dbeaver_df, 
        columns=[desc[0] for desc in cursor.description]
    )

In [None]:
# Merge vehicle information into the time series
raw_tss = raw_tss.merge(dbeaver_df, on="vin", how="left")

# Merge with the processed phases, then keep only the rows whose timestamp falls inside the phase
raw_tss_with_phase = raw_tss.merge(processed_phase, left_on='vin', right_on='VIN', how='left')
raw_tss_with_phase = raw_tss_with_phase[raw_tss_with_phase['date'].between(raw_tss_with_phase['DATETIME_BEGIN'], raw_tss_with_phase['DATETIME_END'])]

In [None]:
raw_tss.columns

# 1. Exploratory analysis of the time series


In [None]:
# Pick the VIN with the most rows
most_common_vin = raw_tss_with_phase.groupby("vin").size().idxmax()
print(f"Selected VIN: {most_common_vin}")
# Copy so that columns can be added later without a SettingWithCopyWarning
ts = raw_tss_with_phase.query(f"vin == '{most_common_vin}'").copy()

## 1.1 Visualization of the available capacity by phase



In [None]:
px.scatter(ts, x="date", y="available_battery_capacity_kwh", title=f"{most_common_vin}", color="PHASE_STATUS", hover_data=["PHASE_INDEX"])

## 1.2 Visualization of the SoC by phase


In [None]:
px.scatter(ts, x="date", y="soc", title=f"{most_common_vin}", color="PHASE_STATUS")

## 1.3 Visualization of the maximum capacity per vehicle



In [None]:
px.scatter(raw_tss, x="date", y="max_energy_battery_kwh", color="vin")

# 2. SoH (State of Health) estimation

## 2.1 SoH calculation method

**Formula used:**

$$\text{Capacity}_{estimated} = \frac{\text{BatteryRemain\_Value}}{\text{BatteryRemain\_Ratio}/100}$$

$$\text{SoH} = \frac{\text{Capacity}_{estimated}}{\text{Capacity}_{max}}$$
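
As a worked example, the two formulas can be sketched in plain Python. The function name `estimate_soh` and the sample values are illustrative assumptions, not part of the pipeline. The guard on a non-positive SoC or capacity is an added precaution, since the ratio is undefined there.

```python
import math


def estimate_soh(remaining_kwh: float, soc_percent: float, net_capacity_kwh: float) -> float:
    """Return SoH = (remaining energy / (SoC / 100)) / nominal capacity.

    Returns NaN when the SoC or the nominal capacity is not strictly positive,
    because the ratio is undefined in that case (guard added for illustration).
    """
    if not (soc_percent > 0 and net_capacity_kwh > 0):
        return math.nan
    estimated_capacity_kwh = remaining_kwh / (soc_percent / 100)
    return estimated_capacity_kwh / net_capacity_kwh


# Hypothetical reading: 36 kWh remaining at 50 % SoC on a 77.4 kWh pack.
soh_example = estimate_soh(36.0, 50.0, 77.4)
print(round(soh_example, 3))
```

In this example the estimated capacity is 72 kWh, so the SoH is 72 / 77.4.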



In [None]:
raw_tss.columns

In [None]:
raw_tss["soh"] = ( raw_tss["available_battery_capacity_kwh"] / (raw_tss["soc"] / 100)) / raw_tss["net_capacity"].astype(float)
raw_tss_with_phase["soh"] = (raw_tss_with_phase["available_battery_capacity_kwh"] / (raw_tss_with_phase["soc"] / 100)) / raw_tss_with_phase["net_capacity"].astype(float)
ts["soh"] = ( ts["available_battery_capacity_kwh"] / (ts["soc"] / 100)) / ts["net_capacity"].astype(float)

## 2.2 Visualization of the SoH as a function of the SoC

**Conclusion**: the estimated SoH increases as the SoC increases.


In [None]:
px.scatter(raw_tss_with_phase, x="soc", y="soh", color="vin")


In [None]:
corr_ = raw_tss_with_phase.select_dtypes(include=['float64', 'int64']).corr()

# Keep the correlations with the SoH, sorted and rounded
mat = corr_[['soh']].dropna().sort_values(
    by='soh', ascending=False
).round(2)

fig = px.imshow(
    mat,
    text_auto=True,      
    aspect="auto",        
    width=800,            
    height=1000
)

fig.update_layout(
    font=dict(size=16)
)

fig.show()

# 3. SoH dependencies

## 3.1 Adjusting the SoH to correct the SoC dependence

**Problem identified:** the estimated SoH depends on the SoC, which can bias the results.

**Solution:** adjust the SoH with a linear regression fitted per model/version to correct this dependence.

**Method:**
1. Fit a linear regression SoH ~ SoC for each model/version
2. Adjust the SoH by dividing it by the value predicted by the regression
3. Use an adjusted minimum to avoid outliers
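
The first two steps can be illustrated on synthetic data. This is a minimal sketch under stated assumptions: the data is simulated (a constant SoH plus a linear SoC bias and noise), and `np.polyfit` is swapped in for scikit-learn's `LinearRegression`, which gives the same line for a degree-1 fit. The column names mirror those of the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data (assumption): true SoH of 0.95 plus a linear SoC bias and noise.
soc = rng.uniform(10, 100, size=500)
soh_raw = 0.95 + 0.0005 * (soc - 55) + rng.normal(0, 0.002, size=500)
demo = pd.DataFrame({"MODEL": "demo", "VERSION": "v1", "soc": soc, "soh": soh_raw})


def fit_line(group: pd.DataFrame) -> pd.Series:
    """Fit soh ~ coef * soc + intercept for one model/version group."""
    coef, intercept = np.polyfit(group["soc"], group["soh"], deg=1)
    return pd.Series({"coef": coef, "intercept": intercept})


# Step 1: one regression per model/version.
coefs = demo.groupby(["MODEL", "VERSION"])[["soc", "soh"]].apply(fit_line).reset_index()
demo = demo.merge(coefs, on=["MODEL", "VERSION"], how="left")

# Step 2: divide the raw SoH by the value predicted by the regression.
demo["soh_updated"] = demo["soh"] / (demo["coef"] * demo["soc"] + demo["intercept"])

corr_before = demo["soh"].corr(demo["soc"])
corr_after = demo["soh_updated"].corr(demo["soc"])
print(f"corr(soh, soc) = {corr_before:.2f}, corr(soh_updated, soc) = {corr_after:.2f}")
```

Note that the adjusted values are centered near 1, because each point is expressed relative to the fitted line of its group.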


In [None]:
raw_tss_bis = raw_tss_with_phase.copy()

from sklearn.linear_model import LinearRegression

def find_equation(group):
    """Fit the linear regression SoH ~ SoC for one model/version group."""
    group = group.dropna(subset=['soc', 'soh'])
    X = group[['soc']]
    y = group['soh']
    if len(X) > 1:
        model = LinearRegression()
        model.fit(X, y)
        return model.coef_[0], model.intercept_
    else:
        # Not enough points to fit: (0, 0) marks the group as "no correction"
        return 0, 0


# Regression coefficients per model/version
adjusted_coef = raw_tss_bis.groupby(['MODEL', 'VERSION']).apply(find_equation).apply(pd.Series)
adjusted_coef.columns = ['coef', 'intercept']

# Merge the coefficients back into the time series
raw_tss_bis = raw_tss_bis.merge(adjusted_coef, on=['MODEL', 'VERSION'], how='left')

# Adjusted SoH; groups without a fitted regression (coef == 0) keep the raw SoH
raw_tss_bis['soh_updated'] = (
    raw_tss_bis['soh'] / (raw_tss_bis['coef'] * raw_tss_bis['soc'] + raw_tss_bis['intercept'])
).where(raw_tss_bis['coef'] != 0, raw_tss_bis['soh'])


In [None]:
raw_tss_bis.dropna(subset=['soh']).vin.nunique()

In [None]:
px.scatter(raw_tss_bis, x="odometer", y="soh_updated", color="vin", trendline="ols", trendline_scope="overall")

In [None]:
corr_ = raw_tss_bis[['soh_updated', 'soc']].select_dtypes(include=['float64', 'int64']).corr()
px.imshow(corr_[['soh_updated', 'soc']].dropna().sort_values(by='soh_updated', ascending=False))


## 3.2 Correlation between SoH and electric_current_level

- It does not appear to have a direct impact



In [None]:
import pandas as pd
from scipy import stats
from scipy.stats import pearsonr, spearmanr, kruskal
import plotly.graph_objects as go
from plotly.subplots import make_subplots

df_analysis = raw_tss[['soh', 'electric_current_level', 'vin', 'soc']].copy()

df_analysis = df_analysis.dropna(subset=['soh', 'electric_current_level'])

level_mapping = {
    0: 'Default',
    1: 'Max',
    2: 'Decreasing',
    3: 'Min'
}
df_analysis['electric_current_level_label'] = df_analysis['electric_current_level'].map(level_mapping)

print(f"Total number of observations: {len(df_analysis)}")
print(df_analysis['electric_current_level_label'].value_counts())


In [None]:
px.scatter(df_analysis, x="electric_current_level", y="soh")

## 3.3 SoH analysis by phase (PHASE_STATUS)

**Results:** Values are only available during discharge, so no relationship with the phase can be established. The few points labeled as charging are misidentified discharge points, caused by a slight rise in SoC during the discharge.
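
One way to avoid this mislabeling, offered here only as a sketch, is to require the SoC to rise by more than a tolerance before a sample counts as charging. The function name `label_samples`, the 1-point tolerance, and the labels are assumptions. The actual phase-detection logic is not shown in this notebook.

```python
def label_samples(soc_series, tolerance=1.0):
    """Label each SoC sample relative to the last confirmed reference level.

    A sample is 'charging' only if the SoC exceeds the reference by more than
    `tolerance` points; smaller rebounds keep the 'discharging' label.
    """
    labels = []
    reference = soc_series[0]
    for soc in soc_series:
        if soc - reference > tolerance:
            labels.append("charging")
            reference = soc
        else:
            labels.append("discharging")
            reference = min(reference, soc)  # follow the SoC downwards
    return labels


# Hypothetical trace: a discharge with a 0.5-point rebound, then a real charge.
trace = [80.0, 79.0, 78.0, 78.5, 77.0, 76.0, 82.0, 90.0]
print(label_samples(trace))
```

With this tolerance, the 0.5-point rebound at 78.5 stays in the discharge phase, while the jump to 82 is labeled as charging.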


In [None]:
px.scatter(ts, x="odometer", y="soh", color="PHASE_STATUS")


## 3.4 Analysis of the preconditioning status

**Results:** No correlation with preconditioning.

In [None]:
raw_tss_ter = raw_tss_bis.copy()
raw_tss_ter['soh_updated'] = raw_tss_ter.groupby('VIN')['soh_updated'].ffill()

In [None]:

px.scatter(raw_tss_ter[raw_tss_ter["Vehicle_Drivetrain_FuelSystem_AverageFuelEconomy_Drive"] < 200].dropna(subset="Vehicle_Drivetrain_FuelSystem_AverageFuelEconomy_Drive"), x="Vehicle_Drivetrain_FuelSystem_AverageFuelEconomy_Drive", y="soh_updated", trendline="ols", trendline_scope="overall")


# 4. Alternative method: using only full-charge points

**Approach:**
- Keep only the points where the battery is close to full charge (SoC > 95%)
- These points are more reliable for estimating the actual capacity


In [None]:
# Select the points where the battery is close to full charge,
# which are more reliable for estimating the capacity
raw_tss_full_charge = raw_tss_with_phase[
    (raw_tss_with_phase['soc'] > 95) &  # nearly full charge
    (raw_tss_with_phase['net_capacity'].notna())
].copy()

# Compute the SoH on these points only
raw_tss_full_charge['soh_full_charge'] = (
    raw_tss_full_charge['available_battery_capacity_kwh'] / 
    (raw_tss_full_charge['soc'] / 100)
) / raw_tss_full_charge['net_capacity'].astype(float)

# Aggregate per vehicle (median for robustness)
soh_by_vin_full_charge = raw_tss_full_charge.groupby('vin')['soh_full_charge'].median()


## 4.1 Visualization of the SoH computed on the full-charge points



In [None]:
px.scatter(raw_tss_full_charge, x="odometer", y="soh_full_charge", color="vin")


In [None]:
px.scatter(raw_tss_full_charge, x="soc", y="soh_full_charge", color="vin")


# 5. Aggregation by phase

The regression-per-model method has been validated.

Aggregating by phase looks correct:
- most VINs have a SoH spread below 0.05
- a few outliers remain and will be filtered out later
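
The spread criterion can be checked with a few lines of pandas. The sketch below runs on a toy table whose column names mirror the phase-level table built in this section. The values are made up, and the 0.05 threshold is the one quoted in the list.

```python
import pandas as pd

# Toy phase-level table (assumption): one row per (VIN, phase) with a median SoH.
toy_phases = pd.DataFrame({
    "VIN": ["A", "A", "A", "B", "B", "C", "C"],
    "soh": [0.95, 0.96, 0.94, 0.90, 0.97, 0.99, 0.98],
})

# Spread per vehicle: difference between the highest and lowest phase-level SoH.
spread = toy_phases.groupby("VIN")["soh"].agg(soh_min="min", soh_max="max")
spread["soh_diff"] = spread["soh_max"] - spread["soh_min"]
spread["within_tolerance"] = spread["soh_diff"] < 0.05

share_ok = spread["within_tolerance"].mean()
print(spread)
print(f"Share of VINs with a SoH spread below 0.05: {share_ok:.0%}")
```

Vehicles that fail the check (here the hypothetical VIN `B`) are natural candidates for the outlier filtering mentioned in the list.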


## 5.1 Aggregation by phase

In [None]:
soh_phases = raw_tss_bis.groupby(["PHASE_INDEX", "VIN"], as_index=False).agg(
    soh = ("soh_updated", "median"),
    count = ("soh_updated", "count"),
    odometer = ("odometer", "max"),
    net_capacity = ("net_capacity", "first"),
    date = ("date", "max"),
)



In [None]:
px.scatter(soh_phases, x="odometer", y="soh", color="VIN")

## 5.2 Variance calculation


In [None]:
soh_variance_by_vin = soh_phases.groupby("VIN", as_index=False).agg(
    soh_variance=("soh", "var"),
    soh_std=("soh", "std"),
    soh_mean=("soh", "mean"),
    count=("soh", "count"),
    soh_median=("soh", "median"),
    soh_min=("soh", "min"),
    soh_max=("soh", "max"),
).eval("soh_diff = soh_max - soh_min")




In [None]:
px.scatter(soh_variance_by_vin, x="count", y="soh_variance", color="VIN")

In [None]:
px.scatter(soh_variance_by_vin, x="soh_median", y="soh_diff", color="VIN")

# 6. Weekly aggregation
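
A note on how the weekly bins are formed: flooring timestamps to a fixed 7-day frequency aligns the bins on multiples of 7 days counted from the Unix epoch (1970-01-01, a Thursday), so each bin starts on a Thursday rather than on a Monday. The short sketch below shows this on a few illustrative dates.

```python
import pandas as pd

UPDATE_FREQUENCY = pd.Timedelta(days=7)

# Illustrative dates: a Monday, a Wednesday, a Thursday and the following Wednesday.
dates = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-04", "2024-01-10"]))
week_start = dates.dt.floor(UPDATE_FREQUENCY)

print(pd.DataFrame({
    "date": dates,
    "week_start": week_start,
    "weekday": week_start.dt.day_name(),
}))
```

If Monday-aligned weeks are preferred, `dt.to_period("W").dt.start_time` is one possible alternative.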

In [None]:
UPDATE_FREQUENCY = pd.Timedelta(days=7)


soh_phases["date"] = (
    pd.to_datetime(soh_phases["date"], format='mixed')
    .dt.floor(UPDATE_FREQUENCY)
    .dt.tz_localize(None)
    .dt.date
    .astype('datetime64[ns]')
)

In [None]:
result = soh_phases.groupby(['date', 'VIN'], as_index=False).agg(
    soh = ("soh", "median"),
    count = ("soh", "count"),
    odometer = ("odometer", "max"),
    net_capacity = ("net_capacity", "first"),
)

In [None]:
px.scatter(result, x="odometer", y="soh", color="VIN")

# 7. SoH BiB vs SoH_oem

In [None]:
raw_tss["soh_oem"] = raw_tss.groupby('vin')["soh_oem"].ffill() / 100


In [None]:
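# Deduplicate the OEM values before merging to avoid a large many-to-many join
soh_oem_by_vin = raw_tss[['soh_oem', 'vin']].dropna().drop_duplicates()
soh_bib_oem = result.merge(soh_oem_by_vin, left_on='VIN', right_on='vin', how='left').drop_duplicates()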

In [None]:
px.scatter(soh_bib_oem, x="soh", y="soh_oem", color="vin")