## VIN Study

This notebook is meant for investigating one specific VIN.
The workflow is simple:

- Select a VIN
- Each section corresponds to one stage of the pipeline

For now, the steps are fairly basic. The goal is to add charts as the need arises, so that they are kept here and save time later.
The idea is to change the VIN, run the whole notebook (Run all), and see where the potential problem lies.

Everything should already be imported for all OEMs, but some imports may be missing.

## Import

In [None]:
from core.pandas_utils import *
from core.spark_utils import create_spark_session
from core.s3.s3_utils import S3Service
from core.s3.settings import S3Settings

settings = S3Settings()

spark = create_spark_session(
    settings.S3_KEY,
    settings.S3_SECRET
)

s3 = S3Service()

In [None]:
import pandas as pd
import plotly.express as px
from core.sql_utils import get_connection

## VIN selection

In [None]:
vin_number = "5YJ3E7EB1KF334219"
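Since the whole notebook is driven by this single value, a quick format check before Run all can catch a typo early. The cell below is a minimal sketch, not part of the existing pipeline: `is_plausible_vin` is a hypothetical helper that only checks the standard VIN shape (17 characters, digits and capital letters except I, O and Q). It does not verify the check digit or that the VIN exists in the data.

```python
import re

# Standard VIN shape: 17 characters, digits and capital letters except I, O, Q.
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")


def is_plausible_vin(vin: str) -> bool:
    """Return True if `vin` has the 17-character VIN shape (no check-digit test)."""
    return VIN_PATTERN.fullmatch(vin) is not None


vin_number = "5YJ3E7EB1KF334219"
if not is_plausible_vin(vin_number):
    raise ValueError(f"Unexpected VIN format: {vin_number!r}")
```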

## Database

In [None]:

with get_connection() as con:
    cursor = con.cursor()
    # Fetch the vehicle_data rows for the selected VIN, together with
    # the make, model and battery metadata.
    cursor.execute(f"""
        SELECT m.make_name, vm.model_name, vm.type, vm.version,
               b.net_capacity, b.capacity,
               vd.odometer, vd.speed, vd.soh, vd.cycles, vd.consumption,
               vd.soh_comparison, vd.timestamp,
               vd.level_1, vd.level_2, vd.level_3, vd.soh_oem
        FROM vehicle_data vd
        JOIN vehicle v ON v.id = vd.vehicle_id
        JOIN vehicle_model vm ON vm.id = v.vehicle_model_id
        JOIN battery b ON b.id = vm.battery_id
        JOIN fleet f ON f.id = v.fleet_id
        JOIN make m ON m.id = vm.make_id
        WHERE vin IN ('{vin_number}');
    """)
    result = cursor.fetchall()

    # Column names, in the same order as the SELECT list above.
    columns = [
        "make_name", "model_name", "type", "version",
        "net_capacity", "capacity",
        "odometer", "speed", "soh", "cycles", "consumption",
        "soh_comparison", "timestamp",
        "level_1", "level_2", "level_3", "soh_oem",
    ]
    dbeaver = pd.DataFrame(columns=columns, data=result)
dbeaver.head()
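The query above interpolates the VIN directly into the SQL string. A possible alternative is to bind it as a query parameter and read the column names from `cursor.description`, so the column list cannot drift from the `SELECT`. The sketch below shows the idea on an in-memory SQLite database so that it runs on its own. The `vehicle` table, its two columns and the `VIN_A`/`VIN_B` rows are placeholder data, not the real schema. The `?` placeholder is specific to `sqlite3`; the driver behind `get_connection` may expect a different style (for example `%s`).

```python
import sqlite3


def fetch_rows(con, vin):
    """Return the rows for one VIN as a list of dicts, using a bound parameter."""
    cursor = con.cursor()
    # "?" is sqlite3's placeholder; other DB-API drivers may use "%s" instead.
    cursor.execute("SELECT vin, soh FROM vehicle WHERE vin = ?", (vin,))
    # cursor.description holds one entry per selected column; item 0 is its name.
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Toy in-memory table with placeholder values, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE vehicle (vin TEXT, soh REAL)")
con.executemany(
    "INSERT INTO vehicle VALUES (?, ?)",
    [("VIN_A", 0.95), ("VIN_B", 0.90)],
)

rows = fetch_rows(con, "VIN_A")
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)` if a DataFrame is needed.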

## Result_phases

In [None]:
result_phases = s3.read_parquet_df_spark(spark, f"result_phases/result_phases_tesla_fleet_telemetry.parquet/VIN={vin_number}")

In [None]:
px.scatter(result_phases, x='ODOMETER_FIRST', y='SOH')

## Processed_phases

In [None]:
processed_phases = s3.read_parquet_df_spark(spark, f"processed_phases/processed_phases_tesla_fleet_telemetry.parquet/VIN={vin_number}")

In [None]:
px.scatter(processed_phases, x='ODOMETER_FIRST', y='SOC_FIRST')

## Raw_tss

In [None]:
raw_tss = s3.read_parquet_df_spark(spark, f"raw_ts/tesla-fleet-telemetry/time_series/raw_ts_spark.parquet/vin={vin_number}")

In [None]:
raw_tss.columns

In [None]:
px.scatter(raw_tss, x='date', y='BatteryLevel')
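The three reads above each build their S3 path by hand, and the partition key is not spelled the same way everywhere (`VIN=` for the phases, `vin=` for the raw time series). One way to keep this in a single place is a small lookup table. This is a sketch only: `S3_PATH_TEMPLATES` and `stage_path` are hypothetical names, and the templates are copied from the cells above.

```python
# Path templates copied from the cells above; note the VIN= / vin= difference.
S3_PATH_TEMPLATES = {
    "result_phases": "result_phases/result_phases_tesla_fleet_telemetry.parquet/VIN={vin}",
    "processed_phases": "processed_phases/processed_phases_tesla_fleet_telemetry.parquet/VIN={vin}",
    "raw_tss": "raw_ts/tesla-fleet-telemetry/time_series/raw_ts_spark.parquet/vin={vin}",
}


def stage_path(stage: str, vin: str) -> str:
    """Return the S3 path of one pipeline stage for the given VIN."""
    return S3_PATH_TEMPLATES[stage].format(vin=vin)
```

With this in place, a read would look like `s3.read_parquet_df_spark(spark, stage_path("raw_tss", vin_number))`.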