# ACN dataset: EV data

This notebook generates EV charging profiles from the open-access ACN dataset published by Caltech. The raw data for a specified period is downloaded with the API client described on [the ACN webpage](https://ev.caltech.edu/dataset).
For details on the required files to use the Python API client, please refer to `README.md`.

The parameters `start_time` and `end_time` specify the period for which data is downloaded via the API client. The column `chargingCurrent` in the raw data contains, for each event, the time series of the charging current in amperes in time steps of 4 seconds. The function `resample_timeseries()` resamples this time series into longer intervals (e.g. 30 min), set by the parameter `time_resolution`, and converts it to power (in kW). The result is then saved as a `csv` file. The resampling is based on `pd.DataFrame.resample()`, with `mean()` used to aggregate the data over each resampling interval.
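The resampling and conversion step can be illustrated on synthetic data. The following is a minimal sketch, not the notebook's actual data: the one-hour event and its current values are made up, and the voltage of 220 V is simply the default used by `resample_timeseries()`.

```python
import numpy as np
import pandas as pd

# Synthetic one-hour charging event sampled every 4 seconds (made-up values):
# 16 A during the first half hour, 8 A during the second.
timestamps = pd.date_range("2021-05-01 08:00", periods=900, freq="4s")
current = np.where(np.arange(900) < 450, 16.0, 8.0)  # amperes
event = pd.DataFrame({"timestamps": timestamps, "current": current})

voltage = 220  # volts, the default assumed in resample_timeseries()

# Average the current over each 30 min interval, then convert to power in kW.
mean_current = event.set_index("timestamps")["current"].resample("30min").mean()
power_kw = mean_current * voltage / 1000.0
print(power_kw)
```

Because the mean is taken before the conversion, each value is the average power over its interval, so multiplying it by the interval length in hours gives the energy delivered in that interval.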

The output filename encodes the site, the start and end dates, and the time resolution: `EV-ACN_<site>--start-<YYYYmmdd>--end-<YYYYmmdd>-<time_resolution>.csv`, where `<time_resolution>` is the resampling interval, e.g. `30min`.
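As a small illustration of this naming pattern, the sketch below builds a filename from the session parameters. The helper `build_filename` is hypothetical and not part of the notebook; it assumes that `time_resolution` is a string such as `"30min"`.

```python
from datetime import datetime


def build_filename(site, start, end, time_resolution):
    """Hypothetical helper mirroring the naming pattern described above."""
    return (
        f"EV-ACN_{site}--start-{start:%Y%m%d}--end-{end:%Y%m%d}"
        f"-{time_resolution}.csv"
    )


name = build_filename("caltech", datetime(2021, 5, 1), datetime(2021, 5, 5), "30min")
print(name)  # EV-ACN_caltech--start-20210501--end-20210505-30min.csv
```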

The time resolution is set by the parameter `time_resolution`, which is also used to generate the timestamps for the dataframe.
The first column of the `csv` contains the timestamps (time of day). Each of the remaining columns holds the profile of one individual charging event. The column name is the `sessionID`, which includes the date and the `stationID` (which uniquely identifies the EVSE).
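To make the file layout concrete, the sketch below reads a small made-up CSV with the structure described above. The column names `session_a` and `session_b` are placeholders rather than real session IDs, and the 30 min interval is an assumption.

```python
import io

import pandas as pd

# Made-up CSV content in the layout described above (placeholder session IDs).
csv_text = """Timestamps,session_a,session_b
00:00:00,0.0,1.76
00:30:00,3.52,1.76
01:00:00,3.52,0.0
"""
profiles = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Total load of all events in each interval, in kW.
total_kw = profiles.sum(axis=1)

# Energy per event in kWh: average power (kW) times interval length (0.5 h).
energy_kwh = profiles.sum() * 0.5
print(total_kw)
print(energy_kwh)
```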

In [None]:
from acnportal.acndata import data_client
from datetime import date, datetime, time, timedelta
import pytz
import pandas as pd
import flat_table
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import logging 
pd.options.mode.chained_assignment = None  # default='warn', check later


# Set up logging
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

## Parameters

Specify parameters for the data we want to access (site, start and end time/date etc.).

In [None]:
# Specify parameters for the session

start_time = datetime(2021, 5, 1, 0)
end_time = datetime(2021, 5, 5, 0)
time_resolution = "30min"  # pandas offset alias used for re-sampling the timeseries
save_output_file = True

# Timezone of the ACN we are using.
timezone = pytz.timezone("America/Los_Angeles")
# add timezone to start and end time/date
start = timezone.localize(start_time) 
end = timezone.localize(end_time)
# identifier of the site where data will be gathered.
site = "caltech"
min_energy = None 
timeseries = True

## Functions


In [None]:
def resample_timeseries(df, session_id, time_resolution, voltage=220):
    """Resample the charging current of a single charging event and convert it to power.

    The current is averaged over the specified time resolution before being
    converted to power.

    Parameters
    -----------
    df: pd.DataFrame
        The dataframe containing the timeseries of the charging current
        for a single charging event, with columns "timestamps" and "current".
    session_id: str
        Identifier for the charging session, used to name the output column.
    time_resolution: str
        Resampling interval as a pandas offset alias, e.g. "30min".
    voltage: int, optional
        Voltage of the network in volts, used to convert current into power.

    Returns
    -------
    timeseries: pd.DataFrame
        Dataframe containing the resampled power in kW.
    """
    timeseries = df.set_index("timestamps").resample(time_resolution).mean()
    timeseries = timeseries.assign(timestamps = timeseries.index.time)
    timeseries = timeseries.set_index("timestamps")
    # convert current into power in kW
    timeseries["current"] = timeseries["current"]*voltage/1000.0
    timeseries = timeseries.rename(columns={"current": f"{session_id}"})
    return timeseries


In [None]:
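def save_df(df_to_save, start_session, end_session, time_resolution, path=None):
    """Save the dataframe containing all resampled charging profiles to .csv.

    The site name is read from the notebook-level variable `site`.

    Parameters
    -----------
    df_to_save: pd.DataFrame
        The dataframe to be saved.
    start_session: datetime.datetime object
        Start of the queried session
    end_session: datetime.datetime object
        End of the queried session
    time_resolution: str
        Time resolution of resampled timeseries, e.g. "30min"
    path: str or pathlib.Path, optional
        Directory where the file is saved. Defaults to the current
        working directory.
    """
    # Follow the documented pattern EV-ACN_<site>--start-<YYYYmmdd>--end-<YYYYmmdd>-<time_resolution>
    file_out = (
        f"EV-ACN_{site}--start-{start_session:%Y%m%d}"
        f"--end-{end_session:%Y%m%d}-{time_resolution}"
    )
    # Accept both str and Path, and fall back to the working directory
    path = Path.cwd() if path is None else Path(path)
    path_out = path / f"{file_out}.csv"
    df_to_save.to_csv(path_out, index=True)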


In [None]:
def plot_power(df_to_save, num_events=0):
    """Plot the charging power of individual events in one plot.

    Parameters
    -----------
    df_to_save: pd.DataFrame
        The dataframe containing all charging events, indexed by time of day.
    num_events: int
        Number of events that are plotted. Plots all events if
        num_events is 0.

    """
    # Hours since midnight, derived from the time-of-day index
    x = [t.hour + t.minute / 60 for t in df_to_save.index]
    if num_events == 0:
        num_events = len(df_to_save.columns)
    fig, ax = plt.subplots(figsize=(12, 6))
    # Columns start at 0 because the timestamps are the index, not a column
    for ind in range(num_events):
        ax.plot(x, df_to_save.iloc[:, ind])
    ax.set_xticks(np.arange(0, 24))
    ax.set_xlabel("Hours")
    ax.set_ylabel("Power in kW")
    ax.set_title("Charging profiles")
    plt.show()


## Access ACN dataset 

The ACN dataset is accessed via the provided Python API client.
A session is the collection of individual charging events that took place in the specified time interval.


In [None]:
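# Set up client to access data via the API client
with open(Path.cwd().parent / "secrets" / "acn_api_token.txt") as file:
    # strip the trailing newline so it does not end up in the token
    api_token = file.readline().strip()
# Note: this rebinds the name `data_client` from the imported module to the
# client instance, so re-running this cell requires re-running the imports first.
data_client = data_client.DataClient(api_token)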



The API client returns a Python generator over the sessions that match the specified parameters. This generator is converted into a pandas dataframe.
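The conversion itself can be tried without API access. The sketch below uses a stand-in generator, `fake_sessions`, which is hypothetical and yields made-up records. Only the field names `sessionID` and `chargingCurrent` are taken from this notebook; real sessions contain more fields.

```python
import pandas as pd


def fake_sessions():
    """Stand-in generator yielding made-up session records (not real ACN data)."""
    yield {"sessionID": "session_a", "chargingCurrent": None}
    yield {
        "sessionID": "session_b",
        "chargingCurrent": {
            "timestamps": ["2021-05-01 08:00:00", "2021-05-01 08:00:04"],
            "current": [16.0, 15.8],
        },
    }


# pandas consumes the generator and creates one row per yielded record.
df_fake = pd.DataFrame(fake_sessions())
has_timeseries = df_fake["chargingCurrent"].notnull()
print(df_fake.shape[0], "sessions,", has_timeseries.sum(), "with a timeseries")
```

Sessions without a recorded current show up as null entries in `chargingCurrent`, which is why that column is worth checking before resampling.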

In [None]:
# Get data for the specified session using the API client
session = data_client.get_sessions_by_time(site, start, end, min_energy, timeseries)
df_session = pd.DataFrame(session)
logger.info(f"Number of charging sessions in queried time period: {df_session.shape[0]}")
if df_session.empty:
    logger.error("No events recorded for this time period.")
df_session.head()

## Re-sample timeseries and save charging profile

Each row in the dataframe corresponds to an individual charging event, and the column `chargingCurrent` contains the time series of the current for that event in time steps of 4 seconds. For each event, this time series is re-sampled to the interval set by `time_resolution` (30 minutes here), converted into power, and added as a new column to the dataframe `charging_profiles`, which collects all resampled charging profiles.
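The way a single resampled event is aligned with the time-of-day index can be sketched on made-up data. The column name `session_a` and the power values below are placeholders, and the 30 min resolution is assumed.

```python
from datetime import time

import pandas as pd

# Time-of-day index at 30 min resolution, covering one day (48 rows).
times = pd.date_range("00:00", "23:59", freq="30min").time
profiles = pd.DataFrame(index=pd.Index(times, name="Timestamps"))

# Made-up resampled event covering 08:00 to 09:00, already converted to kW.
event_times = pd.date_range("2021-05-01 08:00", periods=2, freq="30min").time
event = pd.DataFrame(
    {"session_a": [3.52, 1.76]},
    index=pd.Index(event_times, name="timestamps"),
)

# The left join keeps all 48 intervals; intervals without charging become 0.
profiles = profiles.join(event).fillna(0)
print(profiles.loc[time(8, 0), "session_a"])
```

Joining on the time of day places every event on the same 24-hour axis regardless of the date on which it took place.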

In [None]:
# Initialise dataframe to combine all events
timestamps = pd.date_range("00:00", "23:59", freq=time_resolution) # change freq to modify time resolution

charging_profiles = pd.DataFrame(timestamps, columns=['Timestamps']) 
charging_profiles['Timestamps'] = charging_profiles['Timestamps'].apply(lambda x: x.time() )
charging_profiles.set_index("Timestamps", inplace=True)

In [None]:
# go through all charging sessions, resample and add them to charging profiles
for index, row in df_session.iterrows():
    if pd.notnull(row['chargingCurrent']):
        df = pd.DataFrame(row['chargingCurrent'])
        session_id = row['sessionID']
        # use a separate name so the query parameter `timeseries` is not overwritten
        profile = resample_timeseries(df, session_id, time_resolution)
        charging_profiles = charging_profiles.join(profile)
# intervals without charging have no data after the join; set them to zero
charging_profiles.fillna(0, inplace=True)

In [None]:
charging_profiles

In [None]:
if save_output_file:
    save_df(charging_profiles, start_time, end_time, time_resolution)

## Plot individual charging profiles

In [None]:
# plot_power(charging_profiles)