### To run this notebook
1. Install Python 3.10.7
2. Create a directory **my_project**: **mkdir my_project**
3. Copy the git project into the above directory: **git link**
4. Go to the git project and create a virtual environment using: **python -m venv venv**
5. Activate the virtual environment using: **.\venv\Scripts\activate**
6. Install packages using: **pip install -r requirements.txt**
7. Start Jupyter Lab using: **jupyter lab**
8. Open the superchargers.ipynb notebook and review the source data paths
9. Run the notebook

### The Assessment - This notebook tries to answer the following questions

**Tesla Energy Service Engineering Data Evaluation, 5/10/23**

Welcome to the Data Evaluation. This is an equivalent set of tasks to those you will be expected to perform as part of the Energy Service Engineering Infrastructure and Analytics team. Please complete these exercises using Python with pandas and present your results in a Jupyter notebook. In this hypothetical, we are investigating a subset of the fleet of Superchargers to review possible temperature issues in the handles in March 2022. Please reach out to your recruiter to be provided with a link to the files needed for this exercise, and reach out if you have any questions.

1. Using the signals_data files, provide a list of components that have an average temperature above 105 degrees during any 30-second period. Perform this analysis for all temperature signals in the dataset.
2. Provide the maximum power (given by the LM_PowerLimit signal) that occurs during any 30 second period when a component experiences an average temperature of 105 degrees over that period.
3. Provide the average power and temperature when a temperature-related alert fires (see alerts_data for the alerts information and timestamps). Detail your method for aligning the alert timestamps with the signal timestamps, and your method for determining the value of the signal that occurs when the alert fires.
4. Create exploratory data visualizations that indicate the general power, temperature and alert trends of the sites, assets and components in the data. Additionally, please provide visualizations that allow items that are concerning and need investigation to be easily identified for further follow-up. The hypothetical audience for these visualizations would be engineering stakeholders who are looking to understand any trends or correlations in the provided signals / alerts that could indicate poorly performing equipment and possible causes. You can use any tool or platform you wish (Tableau is preferred) but please present the results with some explanations in the Jupyter notebook submission. Please note that providing an individual temperature graph for every asset, site and component does not achieve these goals.
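
For question 3, one way to align alert timestamps with signal timestamps is an "as-of" join: each alert takes the most recent signal sample of the same component at or before the alert time, within a tolerance. The sketch below uses `pandas.merge_asof`; the alert column names (`alert_timestamp`, `alert_name`), the 5-second tolerance and the sample rows are hypothetical, since the alerts_data layout is not described here.

```python
import pandas as pd

# Hypothetical sample data: a 1-second temperature stream and one alert for the same component.
signals = pd.DataFrame({
    "event_timestamp": pd.to_datetime(["2022-03-15 22:07:00", "2022-03-15 22:07:01",
                                       "2022-03-15 22:07:02", "2022-03-15 22:07:03"]),
    "component": ["399812473"] * 4,
    "temp_c": [104.8, 105.4, 106.1, 105.9],
})
alerts = pd.DataFrame({
    "alert_timestamp": pd.to_datetime(["2022-03-15 22:07:02.600"]),
    "component": ["399812473"],
    "alert_name": ["HANDLE_OVERTEMP"],  # hypothetical alert name
})

def align_alerts(alerts: pd.DataFrame, signals: pd.DataFrame, tolerance: str = "5s") -> pd.DataFrame:
    """Attach to each alert the latest signal sample of the same component at or before the alert."""
    return pd.merge_asof(
        alerts.sort_values("alert_timestamp"),   # merge_asof needs both sides sorted on the key
        signals.sort_values("event_timestamp"),
        left_on="alert_timestamp",
        right_on="event_timestamp",
        by="component",                          # only match samples from the same component
        direction="backward",                    # never use a sample recorded after the alert
        tolerance=pd.Timedelta(tolerance),       # no match (NaN) if the last sample is too old
    )

aligned = align_alerts(alerts, signals)
```

`direction="backward"` reflects the value the alert could have seen when it fired; `direction="nearest"` is an alternative if alert and signal clocks are not strictly ordered.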

### My research and understanding about the problem
#### Source: https://www.tesla.com/charging

After reviewing the source material and Tesla's charging website, the problem appears to concern the analysis of data from Tesla's Superchargers. Although Tesla offers several ways to charge its vehicles, this exercise is assumed to focus specifically on Superchargers. Tesla claims to have about 50,000 Superchargers at this time. Superchargers are installed at sites, each identified by a name or ID; each charger (asset) consists of one or more components that send signal data to Tesla at second-level intervals.
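
The assumed second-level reporting cadence can be checked directly from the data. A minimal sketch, assuming a dataframe with the lowercase `site`, `asset`, `component`, `signal_name` columns and a datetime `event_timestamp` column (as produced later in this notebook); `sampling_interval_stats` is a hypothetical helper:

```python
import pandas as pd

KEYS = ["site", "asset", "component", "signal_name"]

def sampling_interval_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Median and max gap (in seconds) between consecutive samples of each signal stream."""
    df = df.sort_values(KEYS + ["event_timestamp"])
    # time since the previous sample of the same stream (NaN for each stream's first sample)
    gap_s = df.groupby(KEYS)["event_timestamp"].diff().dt.total_seconds()
    return (
        df.assign(gap_s=gap_s)
        .groupby(KEYS)["gap_s"]
        .agg(median_gap_s="median", max_gap_s="max")
    )

# Hypothetical stream with one 3-second gap
sample = pd.DataFrame({
    "site": "s1", "asset": "a1", "component": "c1", "signal_name": "LM_powerLimit",
    "event_timestamp": pd.to_datetime([0, 1000, 2000, 5000], unit="ms"),
})
stats = sampling_interval_stats(sample)
```

A large `max_gap_s` flags streams with missing data, which matters for any 30-second average.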


Based on this assessment, the business users appear to want to identify potential issues with Supercharger components and need an operational dashboard for executives. Review meetings are assumed to occur weekly. Speaking with the business teams would help gather more insight and requirements on how to organize, process and store the data more effectively. Storing the data on a distributed platform for processing and analysis would be beneficial; for this particular exercise, however, Python and pandas are used.


### My notes on March 2022 superchargers data

1. Data is available in .\data\Energy Fleet Analyst\
2. It is a subset of March 2022 fleet of superchargers
3. It looks like one CSV file is posted per signal and event_date
4. CSV files whose names contain 'TempDegC' hold temperatures, recorded in degrees Celsius, for each site/asset/component combination
5. CSV files whose names contain 'powerLimit' hold the power output of each site/asset/component combination
6. CSV files whose names contain 'analyst_alerts' hold the alerts that need to be investigated for high temperature and power
7. All CSV files contain identical columns; the column descriptions are below
   - blank column name: sequence number
   - timestamp: epoch timestamp of the event
   - site: the location where the charger is installed
   - asset: the charger (mobile/wall/supercharger/battery?)
   - component: an asset is made of one or more components; this is the component that sends the signal data
   - signal_name: signal sent by the component
   - value: a metric that depends on the signal name (e.g. temperature, power)
   - event_date: date the event was created
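
The file-naming observations above can be turned into a small classifier so that each file is routed to the right processing step. This is a sketch: the markers come from the notes, while `classify_src_file`, `group_src_files` and the example file names are hypothetical.

```python
import os

# Filename markers taken from the notes above; anything else is reported as "unknown".
SIGNAL_MARKERS = {
    "TempDegC": "temperature",
    "powerLimit": "power",
    "analyst_alerts": "alerts",
}

def classify_src_file(file_name: str) -> str:
    """Map a CSV file name to the kind of data its name indicates."""
    base = os.path.basename(file_name)
    for marker, kind in SIGNAL_MARKERS.items():
        if marker in base:
            return kind
    return "unknown"

def group_src_files(src_dir: str) -> dict:
    """Group every CSV file in src_dir by the kind of data its name indicates."""
    groups = {}
    for name in sorted(os.listdir(src_dir)):
        if name.lower().endswith(".csv"):
            groups.setdefault(classify_src_file(name), []).append(name)
    return groups
```

For example, `classify_src_file("2022-03-15_LM_handleNegCoreTempDegC.csv")` returns `"temperature"`.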
     

### My assumptions on the data model - A Conceptual interpretation

1. Tesla energy fleet consists of chargers (mobile/wall/superchargers) installed across the globe
2. Site -- The location where the charger is installed
3. Asset -- The charger (mobile/wall/supercharger/battery?)
4. Component -- Specific part of an asset that can send signal data to Tesla
5. In a Supercharger network, a site can have multiple assets. For mobile/wall chargers, the site-to-asset relationship is mostly 1:1
6. When a vehicle is charged at a site using an asset, the asset's components send data to the Tesla network. They may also send data while idle
7. The data needs to be analyzed to see if the asset components showed any unwanted behavior
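
The assumed site > asset > component hierarchy can be checked against the data before relying on it. A minimal sketch, assuming the lowercase `site`, `asset` and `component` columns used later in this notebook; both helpers and the sample rows are hypothetical:

```python
import pandas as pd

def hierarchy_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Number of distinct assets and components per site."""
    return df.groupby("site").agg(
        n_assets=("asset", "nunique"),
        n_components=("component", "nunique"),
    )

def components_belong_to_one_asset(df: pd.DataFrame) -> bool:
    """True if every component id appears under exactly one asset."""
    return bool((df.groupby("component")["asset"].nunique() == 1).all())

# Hypothetical rows: site s1 has two assets, site s2 has one.
sample = pd.DataFrame({
    "site":      ["s1", "s1", "s1", "s2"],
    "asset":     ["a1", "a1", "a2", "a3"],
    "component": ["c1", "c2", "c3", "c4"],
})
summary = hierarchy_summary(sample)
```

If `components_belong_to_one_asset` returns `False` on the real data, component ids are not globally unique and should always be used together with site and asset, as the grouping keys below already do.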


```python
import os

import numpy as np
import pandas as pd

# define source directory
src_dir = "./data/Energy Fleet Analyst/"
```

```python
def get_src_files(src_dir, src_file_pat):
    """Return files in src_dir whose names (ignoring quote characters) end with src_file_pat."""
    return [file for file in os.listdir(src_dir) if file.replace("'", "").endswith(src_file_pat)]

def get_src_schema():
    # define source schema; the blank-named sequence column is consumed by index_col=0 in get_cons_df
    return {
        "timestamp": np.int64,
        "SITE": str,
        "ASSET": str,
        "COMPONENT": str,
        "signal_name": str,
        "VALUE": float,
    }

def get_cons_df(src_dir, src_files, src_schema):
    """Read every source file and concatenate them into one dataframe."""
    df_cons = [
        pd.read_csv(os.path.join(src_dir, src_file), index_col=0, dtype=src_schema)
        for src_file in src_files
    ]
    return pd.concat(df_cons, ignore_index=True)

# some dataframe operations - lowercase and rename columns, add a datetime column
def set_col_ops(df, rename_cols):
    df = df.rename(columns=str.lower).rename(columns=rename_cols)
    # epoch timestamps are in milliseconds
    df["event_timestamp"] = pd.to_datetime(df["event_epoch"].astype("int64"), unit="ms")
    # optional Fahrenheit column for temperature data:
    # df["temp_f"] = df["temp_c"] * 9 / 5 + 32
    return df

def get_indexed_df(df, index_col):
    return df.set_index(index_col).sort_index()
```
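
The helpers above assume the file layout described in the notes: a blank-named sequence column first, uppercase `SITE`/`ASSET`/`COMPONENT`/`VALUE` headers, and millisecond epoch timestamps. A self-contained sketch of the same read-and-clean steps on an in-memory sample (the two rows are hypothetical, built from values in the outputs below) makes it easy to confirm the column handling:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-row sample in the assumed source layout.
SAMPLE_CSV = """,timestamp,SITE,ASSET,COMPONENT,signal_name,VALUE,event_date
0,1647382020000,-1492083459,1996369523,399812473,LM_handleNegCoreTempDegC,105.8,2022-03-15
1,1647382021000,-1492083459,1996369523,399812473,LM_handleNegCoreTempDegC,106.2,2022-03-15
"""

SRC_SCHEMA = {
    "timestamp": np.int64,
    "SITE": str,        # ids stay strings so negative hash-like ids are kept verbatim
    "ASSET": str,
    "COMPONENT": str,
    "signal_name": str,
    "VALUE": float,
}

def read_signal_csv(path_or_buf) -> pd.DataFrame:
    # index_col=0 consumes the blank-named sequence column instead of keeping "Unnamed: 0"
    df = pd.read_csv(path_or_buf, index_col=0, dtype=SRC_SCHEMA)
    df = df.rename(columns=str.lower).rename(columns={"timestamp": "event_epoch"})
    df["event_timestamp"] = pd.to_datetime(df["event_epoch"], unit="ms")
    return df.reset_index(drop=True)

df_sample = read_signal_csv(io.StringIO(SAMPLE_CSV))
```

Epoch `1647382020000` ms converts to `2022-03-15 22:07:00`, which matches the timestamps in the outputs below.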


#### 1. List of components that have an average temperature above 105 degrees during any 30-second period

```python
# define input file pattern for extraction
src_file_pat = "CoreTempDegC.csv"

# get list of input files
src_files = get_src_files(src_dir, src_file_pat)

# get source schema
src_schema = get_src_schema()

# construct consolidated dataframe from all input files
df = get_cons_df(src_dir, src_files, src_schema)
rename_cols = {"timestamp": "event_epoch", "value": "temp_c"}
df_col_ops = set_col_ops(df, rename_cols)
df_indexed = get_indexed_df(df_col_ops, "event_timestamp")

# average temperature per fixed 30-second window for each site/asset/component/signal
df_temp_resample = (
    df_indexed.groupby(["site", "asset", "component", "signal_name"])
    .resample("30s")["temp_c"]
    .agg(avg_temp_c="mean")
)
# keep only windows whose average temperature is above 105 °C (empty windows are NaN and drop out)
df_overtemp = df_temp_resample[df_temp_resample["avg_temp_c"] > 105]
df_overtemp.head()
```

| site | asset | component | signal_name | event_timestamp | avg_temp_c |
|---|---|---|---|---|---|
| -1492083459 | 1996369523 | 399812473 | LM_handleNegCoreTempDegC | 2022-03-15 22:07:00 | 105.795769 |
| -1492083459 | 1996369523 | 399812473 | LM_handleNegCoreTempDegC | 2022-03-15 22:07:30 | 106.209674 |
| -1492083459 | 1996369523 | 399812473 | LM_handleNegCoreTempDegC | 2022-03-15 22:08:00 | 106.013432 |
| -1492083459 | 1996369523 | 399812473 | LM_handleNegCoreTempDegC | 2022-03-15 22:08:30 | 105.485101 |
| -152918872 | 365951380 | -1996083301 | LM_handlePosCoreTempDegC | 2022-03-13 20:32:00 | 105.566328 |
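
The resample above checks fixed windows aligned to :00 and :30, so a hot spell straddling a boundary (e.g. 22:07:15 to 22:07:45) can be split across two windows. If "any 30-second period" is read as a sliding window instead, a time-based rolling mean covers every window (t - 30 s, t]. This is a sketch of that alternative interpretation, assuming the `df_col_ops` column layout; `components_over_threshold` is hypothetical, and windows at the start of a stream span less than 30 seconds of data.

```python
import pandas as pd

KEYS = ["site", "asset", "component", "signal_name"]

def components_over_threshold(df: pd.DataFrame, threshold: float = 105.0,
                              window: str = "30s") -> pd.DataFrame:
    """Peak trailing-window mean temperature per stream, keeping only streams above threshold."""
    # time-based rolling needs a DatetimeIndex that is sorted within each group
    df = df.sort_values(KEYS + ["event_timestamp"]).set_index("event_timestamp")
    rolling_mean = (
        df.groupby(KEYS)["temp_c"]
        .rolling(window)        # window covers (t - 30s, t] at every sample t
        .mean()
        .rename("avg_temp_c")
    )
    peak = rolling_mean.groupby(level=KEYS).max()
    return peak[peak > threshold].reset_index()

# Hypothetical data: one hot and one cool component sampled every second for 40 s.
ts = pd.date_range("2022-03-15 22:07:00", periods=40, freq="s")
hot = pd.DataFrame({"site": "s1", "asset": "a1", "component": "c_hot",
                    "signal_name": "LM_handleNegCoreTempDegC",
                    "event_timestamp": ts, "temp_c": 106.0})
cool = hot.assign(component="c_cool", temp_c=100.0)
result = components_over_threshold(pd.concat([hot, cool], ignore_index=True))
```

`result["component"].unique()` then gives the list of components asked for in question 1.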


#### Resample the data into 30-second windows and calculate the maximum value of the power_kw column for each group (site, asset, component, signal_name)

```python
# define input file pattern for extraction
src_file_pat = "powerLimit.csv"

# get list of input files
src_files = get_src_files(src_dir, src_file_pat)

# get source schema
src_schema = get_src_schema()

# construct consolidated dataframe from all input files
df = get_cons_df(src_dir, src_files, src_schema)
rename_cols = {"timestamp": "event_epoch", "value": "power_kw"}
df_col_ops = set_col_ops(df, rename_cols)
df_indexed = get_indexed_df(df_col_ops, "event_timestamp")

# maximum power per fixed 30-second window for each site/asset/component/signal
df_power_resample = (
    df_indexed.groupby(["site", "asset", "component", "signal_name"])
    .resample("30s")["power_kw"]
    .agg(max_power_kw="max")
)
df_power_resample.head()
```

| site | asset | component | signal_name | event_timestamp | max_power_kw |
|---|---|---|---|---|---|
| -1004574621 | 751395753 | -1644468160 | LM_powerLimit | 2022-03-12 00:13:00 | 92.718633 |
| -1004574621 | 751395753 | -1644468160 | LM_powerLimit | 2022-03-12 00:13:30 | 98.649275 |
| -1004574621 | 751395753 | -1644468160 | LM_powerLimit | 2022-03-12 00:14:00 | 97.250263 |
| -1004574621 | 751395753 | -1644468160 | LM_powerLimit | 2022-03-12 00:14:30 | 95.810577 |
| -1004574621 | 751395753 | -1644468160 | LM_powerLimit | 2022-03-12 00:15:00 | 95.992763 |
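
Joining the temperature and power windows by timestamp label only works because both resamples use the same `30s` frequency: pandas labels fixed-frequency bins by their left edge with left-closed intervals by default, and since 30 seconds divides a day evenly, bins land on :00 and :30 for every group. The sketch below checks this on hypothetical samples taken at different times for the two signals.

```python
import pandas as pd

# Hypothetical samples: the two signals are not recorded at the same instants.
temp = pd.Series([105.2, 106.0],
                 index=pd.to_datetime(["2022-03-15 22:07:03", "2022-03-15 22:07:41"]))
power = pd.Series([203.7, 195.4, 190.0],
                  index=pd.to_datetime(["2022-03-15 22:07:05", "2022-03-15 22:07:29",
                                        "2022-03-15 22:07:35"]))

# Window labelled 22:07:00 covers [22:07:00, 22:07:30) for both signals.
temp_w = temp.resample("30s").mean()
power_w = power.resample("30s").max()

# Align the two window series on their shared labels.
joined = pd.concat({"avg_temp_c": temp_w, "max_power_kw": power_w}, axis=1)
```

Changing `closed`, `label` or `origin` on only one of the two resamples would silently shift windows against each other, so both should keep the same settings.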


#### 2. Maximum power that occurs during any 30-second period in which a component's average temperature is above 105 degrees

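```python
# attach the max power of the same 30-second window to each over-temperature window;
# signal_name differs between the two frames, so it is not used as a join key
df_overtemp_power = pd.merge(
    df_overtemp,
    df_power_resample,
    on=["site", "asset", "component", "event_timestamp"],
    how="left",
)
df_overtemp_power.head()
```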

| site | asset | component | event_timestamp | avg_temp_c | max_power_kw |
|---|---|---|---|---|---|
| -1492083459 | 1996369523 | 399812473 | 2022-03-15 22:07:00 | 105.795769 | 203.661286 |
| -1492083459 | 1996369523 | 399812473 | 2022-03-15 22:07:30 | 106.209674 | 195.367797 |
| -1492083459 | 1996369523 | 399812473 | 2022-03-15 22:08:00 | 106.013432 | 189.514206 |
| -1492083459 | 1996369523 | 399812473 | 2022-03-15 22:08:30 | 105.485101 | 183.284727 |
| -152918872 | 365951380 | -1996083301 | 2022-03-13 20:32:00 | 105.566328 | 241.155983 |
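
The merged frame still lists one row per hot window. A possible way to reduce it to one answer per component (number of hot windows, peak window temperature, maximum power) plus a single fleet-wide maximum is sketched below; `summarise_overtemp_power` is hypothetical, and the sample reuses only the five rows shown above.

```python
import pandas as pd

def summarise_overtemp_power(df_overtemp_power: pd.DataFrame) -> pd.DataFrame:
    """Per component: count of hot windows, peak window temperature and max power during them."""
    return (
        df_overtemp_power
        .groupby(level=["site", "asset", "component"])
        .agg(hot_windows=("avg_temp_c", "count"),
             peak_avg_temp_c=("avg_temp_c", "max"),
             max_power_kw=("max_power_kw", "max"))
        .sort_values("max_power_kw", ascending=False)
    )

# The five rows shown above, rebuilt with the same MultiIndex layout.
rows = [
    ("-1492083459", "1996369523", "399812473", "2022-03-15 22:07:00", 105.795769, 203.661286),
    ("-1492083459", "1996369523", "399812473", "2022-03-15 22:07:30", 106.209674, 195.367797),
    ("-1492083459", "1996369523", "399812473", "2022-03-15 22:08:00", 106.013432, 189.514206),
    ("-1492083459", "1996369523", "399812473", "2022-03-15 22:08:30", 105.485101, 183.284727),
    ("-152918872", "365951380", "-1996083301", "2022-03-13 20:32:00", 105.566328, 241.155983),
]
sample = pd.DataFrame(rows, columns=["site", "asset", "component", "event_timestamp",
                                     "avg_temp_c", "max_power_kw"])
sample["event_timestamp"] = pd.to_datetime(sample["event_timestamp"])
sample = sample.set_index(["site", "asset", "component", "event_timestamp"])

summary = summarise_overtemp_power(sample)
overall_max_kw = summary["max_power_kw"].max()   # single answer across all components
```

Sorting by `max_power_kw` puts the components drawing the most power while hot at the top, which is a natural starting list for follow-up.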
