# Aggregating on Weekly Level

This script performs weekly aggregation on charger data, creating a complete daily data frame and aggregating it on a weekly level. It fills missing values, aggregates data by zip code and week, and calculates various key performance indicators (KPIs).

Usage:
    - Make sure to have the necessary input files available: df_charger_22_23.csv and data_independent.csv.
    - Update the file paths if necessary.
    - Run the script.

Dependencies:
    - pandas

Original file is located at
    https://colab.research.google.com/drive/14Kq7M8dhqZbUmIM5SOmyo18DXfESDH7F

## Creating a Complete Daily Data Frame
Before we aggregate on a weekly level, we need to make sure that every charger/date combination is present, from the first to the last date on which a charger appears in the data. Not all combinations are necessarily captured in the observations, so the missing days have to be added explicitly.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import pandas as pd

file_path = '/content/drive/My Drive/df_charger_22_23.csv'
df_charger = pd.read_csv(file_path)

In [None]:
print(df_charger)

               ChargePoint_ID        Date  Transaction_ID    Provider  \
0                 ALF-0002080  2022-01-01               1  VATTENFALL   
1                 ALF-0002080  2022-01-02               1  VATTENFALL   
2                 ALF-0002080  2022-01-03               4  VATTENFALL   
3                 ALF-0002080  2022-01-04               4  VATTENFALL   
4                 ALF-0002080  2022-01-05               5  VATTENFALL   
...                       ...         ...             ...         ...   
1207281  TOTAL-NN001858-003-3  2023-03-05               1       TOTAL   
1207282  TOTAL-NN001858-003-3  2023-03-07               1       TOTAL   
1207283  TOTAL-NN001858-003-3  2023-03-08               1       TOTAL   
1207284  TOTAL-NN001858-003-3  2023-03-11               1       TOTAL   
1207285  TOTAL-NN001858-003-3  2023-03-19               1       TOTAL   

              City                    Address    District  IsFastCharger  \
0        Amsterdam  Johan van Kuyckstraat 111  

In [None]:
# check dates
earliest_date = df_charger['Date'].min()
latest_date = df_charger['Date'].max()

print("Earliest Date:", earliest_date)
print("Latest Date:", latest_date)

Earliest Date: 2022-01-01
Latest Date: 2023-03-31


In [None]:
# creating the complete data frame with all the combinations
# Make sure that Date is a datetime before sorting and building date ranges
df_charger['Date'] = pd.to_datetime(df_charger['Date'])

# start by sorting the data
df_charger = df_charger.sort_values(['ChargePoint_ID', 'Date']).reset_index(drop=True)

# Identify the first and last date for each ChargePoint_ID
first_dates = df_charger.groupby('ChargePoint_ID')['Date'].first()
last_dates = df_charger.groupby('ChargePoint_ID')['Date'].last()

# Collect one small DataFrame per ChargePoint_ID and concatenate once at the end;
# calling pd.concat inside the loop copies the growing frame on every iteration
frames = []
for charger_id in df_charger['ChargePoint_ID'].unique():
    # Get all dates from the first date to the last date for this ChargePoint_ID
    all_dates = pd.date_range(start=first_dates[charger_id],
                              end=last_dates[charger_id])

    # Create a DataFrame with all combinations of date and this ChargePoint_ID
    frames.append(pd.DataFrame({'Date': all_dates, 'ChargePoint_ID': charger_id}))

df_daily_completed = pd.concat(frames, ignore_index=True)



In [None]:
# Merge the original data into the completed DataFrame
df_daily_completed = df_daily_completed.merge(df_charger, 
                                              on=['Date', 'ChargePoint_ID'], 
                                              how='left')

print(df_daily_completed)

              Date      ChargePoint_ID  Transaction_ID Provider       City  \
0       2022-07-01          1000022923             2.0   EQUANS  Amsterdam   
1       2022-07-02          1000022923             5.0   EQUANS  Amsterdam   
2       2022-07-03          1000022923             4.0   EQUANS  Amsterdam   
3       2022-07-04          1000022923             5.0   EQUANS  Amsterdam   
4       2022-07-05          1000022923             3.0   EQUANS  Amsterdam   
...            ...                 ...             ...      ...        ...   
1306120 2022-03-24  TOTAL-NN001858-015             NaN      NaN        NaN   
1306121 2022-03-25  TOTAL-NN001858-015             NaN      NaN        NaN   
1306122 2022-03-26  TOTAL-NN001858-015             NaN      NaN        NaN   
1306123 2022-03-27  TOTAL-NN001858-015             NaN      NaN        NaN   
1306124 2022-03-28  TOTAL-NN001858-015             7.0    TOTAL  Amsterdam   

                 Address District IsFastCharger IsAppPayment  \
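
As a minimal sketch of the completion step on toy data (the chargers "A" and "B" and their values are made up for illustration), the following shows how a missing day turns into a row with NaN measurements after the left merge:

```python
import pandas as pd

# Toy observations: charger "A" has no row for 2022-01-03, charger "B" has a single day
obs = pd.DataFrame({
    "ChargePoint_ID": ["A", "A", "B"],
    "Date": pd.to_datetime(["2022-01-02", "2022-01-04", "2022-01-02"]),
    "kWh": [10.0, 12.5, 7.0],
})

# First and last observed date per charger
spans = obs.groupby("ChargePoint_ID")["Date"].agg(["min", "max"])

# One row per charger and calendar day between its first and last observation
grid = pd.concat(
    [
        pd.DataFrame({"Date": pd.date_range(row["min"], row["max"]), "ChargePoint_ID": cid})
        for cid, row in spans.iterrows()
    ],
    ignore_index=True,
)

# Left merge keeps every grid row; days without an observation get NaN in kWh
completed = grid.merge(obs, on=["Date", "ChargePoint_ID"], how="left")
print(completed)
```

The grid has four rows (three days for "A", one for "B"), and only the row for charger "A" on 2022-01-03 has a missing kWh value.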

To fill in the known, static attributes of each charger (such as its zip code) on the newly added days, we create a data frame that captures one set of attributes per ChargePoint_ID. Later on we use this data frame to fill the missing values.

In [None]:
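# Create a new DataFrame to capture all needed attributes per ChargePoint_ID.
# 'zipcode' is included so that the added days inherit the charger's zip code
# instead of being filled with 0 later on.
chargers = df_charger.groupby('ChargePoint_ID').agg({
    'zipcode': 'first',
    'Provider': 'first',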
    'City': 'first',
    'Address': 'first',
    'District': 'first',
    'IsFastCharger': 'first',
    'Total': 'first',
    'Vattenfall': 'first',
    'Allego': 'first',
    'Equans': 'first',
    'EvBox': 'first',
    'Nuon': 'first',
    'WDS': 'first',
    'Engie': 'first',
    'Pitpoint': 'first',
    'Ecotap': 'first',
    'power': 'first',
    'MaxPower': 'first',
    'MaxOccupancy': 'first',
    'Country': 'first'
}).reset_index()

# Print the new DataFrame
print(chargers)

            ChargePoint_ID        Provider       City              Address  \
0               1000022923          EQUANS  Amsterdam      Voornsestraat 7   
1                  1703183  WE DRIVE SOLAR  Amsterdam  Kleefkruidstraat 17   
2                  1706237  WE DRIVE SOLAR  Amsterdam    Joan Muyskenweg 4   
3                  1706513  WE DRIVE SOLAR  Amsterdam    Joan Muyskenweg 4   
4                  1707443  WE DRIVE SOLAR  Amsterdam    Joan Muyskenweg 4   
...                    ...             ...        ...                  ...   
3519  TOTAL-NN001858-003-2           TOTAL  Amsterdam      Vorticellaweg 2   
3520  TOTAL-NN001858-003-3           TOTAL  Amsterdam      Vorticellaweg 2   
3521    TOTAL-NN001858-013           TOTAL  Amsterdam      Vorticellaweg 2   
3522    TOTAL-NN001858-014           TOTAL  Amsterdam      Vorticellaweg 2   
3523    TOTAL-NN001858-015           TOTAL  Amsterdam      Vorticellaweg 2   

        District  IsFastCharger  Total  Vattenfall  Allego  Equ

In [None]:
# Merge the per-charger attributes (chargers) into df_daily_completed on ChargePoint_ID
merged_df = df_daily_completed.merge(chargers, on='ChargePoint_ID', how='left', suffixes=('_old', ''))

# Select the columns to update
columns_to_update = ['zipcode',
                     'Provider', 
                     'City', 
                     'Address', 
                     'District', 
                     'IsFastCharger', 
                     'Total', 
                     'Vattenfall', 
                     'Allego', 
                     'Equans', 
                     'EvBox', 
                     'Nuon', 
                     'WDS', 
                     'Engie', 
                     'Pitpoint', 
                     'Ecotap', 
                     'power', 
                     'MaxPower', 
                     'MaxOccupancy', 
                     'Country']

In [None]:
# Fill df_daily_completed with the per-charger attributes from merged_df;
# where an attribute is missing there, keep the existing value
df_daily_completed[columns_to_update] = merged_df[columns_to_update].fillna(df_daily_completed[columns_to_update])

# Print the updated df_daily_completed
print(df_daily_completed)

              Date      ChargePoint_ID  Transaction_ID Provider       City  \
0       2022-07-01          1000022923             2.0   EQUANS  Amsterdam   
1       2022-07-02          1000022923             5.0   EQUANS  Amsterdam   
2       2022-07-03          1000022923             4.0   EQUANS  Amsterdam   
3       2022-07-04          1000022923             5.0   EQUANS  Amsterdam   
4       2022-07-05          1000022923             3.0   EQUANS  Amsterdam   
...            ...                 ...             ...      ...        ...   
1306120 2022-03-24  TOTAL-NN001858-015             NaN    TOTAL  Amsterdam   
1306121 2022-03-25  TOTAL-NN001858-015             NaN    TOTAL  Amsterdam   
1306122 2022-03-26  TOTAL-NN001858-015             NaN    TOTAL  Amsterdam   
1306123 2022-03-27  TOTAL-NN001858-015             NaN    TOTAL  Amsterdam   
1306124 2022-03-28  TOTAL-NN001858-015             7.0    TOTAL  Amsterdam   

                 Address District  IsFastCharger IsAppPayment  
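
The idea of propagating static attributes to the added days can be illustrated on toy data. This is a sketch of an alternative formulation using `Series.map` rather than the merge used in this notebook; the frame and its values are hypothetical:

```python
import pandas as pd

# Toy completed frame: the second row is an added day without an observation
completed = pd.DataFrame({
    "ChargePoint_ID": ["A", "A", "A", "B"],
    "zipcode": ["1011 AB", None, "1011 AB", "1012 CD"],
    "kWh": [10.0, None, 12.5, 7.0],
})

# One row of static attributes per charger; 'first' returns the first non-null value
static = completed.groupby("ChargePoint_ID")[["zipcode"]].first()

# Map the attribute back onto every row of the charger; measurements stay untouched
completed["zipcode"] = completed["ChargePoint_ID"].map(static["zipcode"])
print(completed)
```

After the mapping every row of charger "A" carries its zip code, while the kWh value of the added day is still missing and is handled separately.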

In [None]:
# Transform "IsFastCharger" column
df_daily_completed["IsFastCharger"] = df_daily_completed["IsFastCharger"].astype(int)


The remaining missing values belong to charger/date combinations without any observation. To make this explicit, we fill them with zeros.

In [None]:
# Fill remaining NaN values with zeros
df_daily_completed = df_daily_completed.fillna(0)
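
Note that a blanket `fillna(0)` also writes 0 into non-numeric columns. The sketch below, on a hypothetical two-row frame, contrasts this with filling only the numeric columns; whether the selective variant is preferable depends on how the text columns are used downstream:

```python
import pandas as pd

# Toy frame with one text column and one numeric column (object dtype set explicitly)
df = pd.DataFrame({
    "City": pd.Series(["Amsterdam", None], dtype=object),
    "kWh": [5.0, None],
})

# Blanket fill: the missing City becomes the number 0
blanket = df.fillna(0)

# Selective fill: only numeric columns are filled, text columns keep their missing marker
numeric_cols = df.select_dtypes(include="number").columns
selective = df.copy()
selective[numeric_cols] = selective[numeric_cols].fillna(0)

print(blanket)
print(selective)
```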

In [None]:
df_filled = df_daily_completed
df_filled

Unnamed: 0,Date,ChargePoint_ID,Transaction_ID,Provider,City,Address,District,IsFastCharger,IsAppPayment,ConnectionTimeHours,...,MaxOccupancy,MaxPower,Blocked_kWh,SpareCap_Effective,SpareCap_Occup_kWh,SpareCap_Hrs,Effective%,Occupancy_kwh%,Occupancy_h%,Country
0,2022-07-01,1000022923,2.0,EQUANS,Amsterdam,Voornsestraat 7,Oost,0,False,6.05,...,48,529.92,66.7920,459.910,463.1280,41.95,0.132114,0.126042,0.126042,Netherlands
1,2022-07-02,1000022923,5.0,EQUANS,Amsterdam,Voornsestraat 7,Oost,0,False,24.07,...,48,529.92,265.7328,444.110,264.1872,23.93,0.161930,0.501458,0.501458,Netherlands
2,2022-07-03,1000022923,4.0,EQUANS,Amsterdam,Voornsestraat 7,Oost,0,False,23.60,...,48,529.92,260.5440,451.870,269.3760,24.40,0.147286,0.491667,0.491667,Netherlands
3,2022-07-04,1000022923,5.0,EQUANS,Amsterdam,Voornsestraat 7,Oost,0,False,29.75,...,48,529.92,328.4400,484.160,201.4800,18.25,0.086353,0.619792,0.619792,Netherlands
4,2022-07-05,1000022923,3.0,EQUANS,Amsterdam,Voornsestraat 7,Oost,0,False,18.56,...,48,529.92,204.9024,524.160,325.0176,29.44,0.010870,0.386667,0.386667,Netherlands
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1306120,2022-03-24,TOTAL-NN001858-015,0.0,TOTAL,Amsterdam,Vorticellaweg 2,Noord,0,0,0.00,...,72,529.92,0.0000,0.000,0.0000,0.00,0.000000,0.000000,0.000000,Netherlands
1306121,2022-03-25,TOTAL-NN001858-015,0.0,TOTAL,Amsterdam,Vorticellaweg 2,Noord,0,0,0.00,...,72,529.92,0.0000,0.000,0.0000,0.00,0.000000,0.000000,0.000000,Netherlands
1306122,2022-03-26,TOTAL-NN001858-015,0.0,TOTAL,Amsterdam,Vorticellaweg 2,Noord,0,0,0.00,...,72,529.92,0.0000,0.000,0.0000,0.00,0.000000,0.000000,0.000000,Netherlands
1306123,2022-03-27,TOTAL-NN001858-015,0.0,TOTAL,Amsterdam,Vorticellaweg 2,Noord,0,0,0.00,...,72,529.92,0.0000,0.000,0.0000,0.00,0.000000,0.000000,0.000000,Netherlands


In [None]:
# Ensure that date is a datetime object
df_filled['Date'] = pd.to_datetime(df_filled['Date'])

# Set date as the index
df_filled = df_filled.set_index('Date')

## Aggregation on the Zipcode and Week Level
In the next step we aggregate the observations by zip code and week. This is the final aggregation level of the data frame that goes into the prediction model.

In [None]:
# Group by 'zipcode', then resample the Date index into weekly bins ('W-MON') and apply aggregation functions
df_weekly = df_filled.groupby('zipcode').resample('W-MON').agg({  
    'City': 'first', 
    'District': 'first',
    'ConnectionTimeHours': 'sum',
    'kWh': 'sum',
    'effective_charging_hrs': 'sum',
    'power': 'sum',
    'MaxOccupancy': 'sum',
    'MaxPower': 'sum',
    'Blocked_kWh': 'sum',
    'Total': 'sum',  # provider indicator, summed like the other provider columns
    'Vattenfall': 'sum',
    'Allego': 'sum',
    'Equans': 'sum',
    'EvBox': 'sum',
    'Nuon': 'sum',
    'WDS': 'sum',
    'Engie': 'sum', 
    'Pitpoint': 'sum',
    'Ecotap': 'sum',
    'ChargeSocket_ID_count': 'sum',
    'IsFastCharger': 'sum',
    **{f"{i}-{i+1}": 'sum' for i in range(24)},
    **{f"effective_charging_hrs{i}-{i+1}": 'sum' for i in range(24)}})

In [None]:
# Reset the index
df_weekly.reset_index(inplace=True)

# 'W-MON' labels each week by the Monday on which it ends (bins run Tuesday through Monday).
# To label a week by the first day of its bin instead, subtract 6 days:
# df_weekly['Date'] = df_weekly['Date'] - pd.to_timedelta(6, unit='d')
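
How `resample('W-MON')` assigns days to weeks can be checked on a small made-up series. In this sketch every day contributes 1.0 kWh, so each weekly sum equals the number of days in the bin; 2022-01-01 is a Saturday:

```python
import pandas as pd

# Eleven consecutive days starting on Saturday 2022-01-01, each with 1.0 kWh
days = pd.date_range("2022-01-01", "2022-01-11", freq="D")
daily = pd.DataFrame({"kWh": 1.0}, index=days)

# Default weekly resampling: bins are closed and labelled on the right (the Monday)
weekly = daily.resample("W-MON").sum()
print(weekly)
# 2022-01-03 -> 3.0 (Sat, Sun, Mon), 2022-01-10 -> 7.0, 2022-01-17 -> 1.0

# Shifting the label back by 6 days gives the Tuesday on which each bin starts
week_start = weekly.index - pd.Timedelta(days=6)
print(week_start)
```

The Monday label therefore marks the end of a Tuesday-to-Monday bin, which is why the last label can lie after the latest date in the daily data.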

In [None]:
df_weekly

Unnamed: 0,zipcode,Date,City,District,ConnectionTimeHours,kWh,effective_charging_hrs,power,MaxOccupancy,MaxPower,...,effective_charging_hrs14-15,effective_charging_hrs15-16,effective_charging_hrs16-17,effective_charging_hrs17-18,effective_charging_hrs18-19,effective_charging_hrs19-20,effective_charging_hrs20-21,effective_charging_hrs21-22,effective_charging_hrs22-23,effective_charging_hrs23-24
0,0,2022-01-03,Amsterdam,Zuidoost,0.00,0.00,0.000000,1528.600000,11616,77204.16,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,0,2022-01-10,Amsterdam,Nieuw West,0.00,0.00,0.000000,8682.843810,65712,440953.92,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,0,2022-01-17,Amsterdam,Oost,0.00,0.00,0.000000,8103.960000,58368,400322.88,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,0,2022-01-24,Amsterdam,Nieuw West,0.00,0.00,0.000000,7942.373333,56256,385527.36,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,0,2022-01-31,Amsterdam,Nieuw West,0.00,0.00,0.000000,12315.268571,85920,589624.32,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154988,3114 VP,2023-03-06,Amsterdam,Oost,156.70,336.80,30.507246,77.280000,336,3709.44,...,0.859783,0.000000,0.066667,3.473913,5.104348,4.800000,3.512862,1.672826,1.500000,1.118116
154989,3114 VP,2023-03-13,Amsterdam,Oost,203.43,381.95,34.596920,77.280000,336,3709.44,...,2.250000,2.731884,3.370833,3.498188,4.519203,2.898007,3.219203,2.931522,2.603623,1.204348
154990,3114 VP,2023-03-20,Amsterdam,Oost,159.44,216.41,19.602355,77.280000,336,3709.44,...,1.200000,2.066667,2.283333,4.233333,4.728986,1.111957,0.500000,1.000000,1.783333,1.501087
154991,3114 VP,2023-03-27,Amsterdam,Oost,163.19,325.62,29.494565,77.280000,336,3709.44,...,3.577717,1.000000,1.783152,3.590217,2.360688,2.683333,4.282246,3.000000,3.000000,2.198913


In [None]:
file_path_zip = '/content/drive/My Drive/data_independent.csv'
df_zip = pd.read_csv(file_path_zip)

In [None]:
df_weekly[df_weekly.isna().any(axis=1)]

Unnamed: 0,zipcode,Date,City,District,ConnectionTimeHours,kWh,effective_charging_hrs,power,MaxOccupancy,MaxPower,...,effective_charging_hrs14-15,effective_charging_hrs15-16,effective_charging_hrs16-17,effective_charging_hrs17-18,effective_charging_hrs18-19,effective_charging_hrs19-20,effective_charging_hrs20-21,effective_charging_hrs21-22,effective_charging_hrs22-23,effective_charging_hrs23-24
187,1011 CB,2022-04-04,,,0.0,0.0,0.0,0.0,0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
198,1011 CB,2022-06-20,,,0.0,0.0,0.0,0.0,0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
241,1011 CR,2022-01-24,,,0.0,0.0,0.0,0.0,0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
242,1011 CR,2022-01-31,,,0.0,0.0,0.0,0.0,0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
244,1011 CR,2022-02-14,,,0.0,0.0,0.0,0.0,0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154781,1384 AK,2023-01-02,,,0.0,0.0,0.0,0.0,0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
154782,1384 AK,2023-01-09,,,0.0,0.0,0.0,0.0,0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
154783,1384 AK,2023-01-16,,,0.0,0.0,0.0,0.0,0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
154784,1384 AK,2023-01-23,,,0.0,0.0,0.0,0.0,0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


We need to create a row for every combination of week and zip code in Amsterdam, with the zip codes taken from the 'data_independent' file. Because the weekly data was resampled with 'W-MON', each week is labelled by the Monday on which it ends.

In [None]:
# rename and change structure
df_zip['PC6'] = df_zip['PC6'].apply(lambda x: x[:-2] + ' ' + x[-2:])
df_zip = df_zip.rename(columns={"PC6": "zipcode"})

# Get a list of all unique zipcodes from df_zip instead of df_weekly
all_zipcodes = df_zip['zipcode'].unique()

# Get a list of all weeks from the earliest to the latest date in your DataFrame
all_weeks = pd.date_range(start=df_weekly['Date'].min(), 
                          end=df_weekly['Date'].max(), 
                          freq='W-MON')

# Create a MultiIndex with all combinations of zipcode and week
complete_index = pd.MultiIndex.from_product([all_zipcodes, all_weeks], 
                                            names=['zipcode', 'Date'])

# Reindex your DataFrame using the complete index
df_weekly_complete = df_weekly.set_index(['zipcode', 'Date']).reindex(complete_index).reset_index()
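
The reindexing step can be sketched on toy data. The zip codes and values below are hypothetical; "1013 EF" stands for a zip code that has no chargers at all:

```python
import pandas as pd

# Toy weekly data: "1012 CD" has no row for the first week
weekly = pd.DataFrame({
    "zipcode": ["1011 AB", "1011 AB", "1012 CD"],
    "Date": pd.to_datetime(["2022-01-03", "2022-01-10", "2022-01-10"]),
    "kWh": [125.8, 457.95, 80.0],
})

# PC6 codes without a space are rewritten to the "1011 AB" form used in the weekly data
raw_codes = ["1011AB", "1012CD", "1013EF"]
all_zipcodes = [pc[:-2] + " " + pc[-2:] for pc in raw_codes]

# Every Monday label between the first and last week
all_weeks = pd.date_range(weekly["Date"].min(), weekly["Date"].max(), freq="W-MON")

# Cartesian product of zip codes and weeks, then reindex onto it
full_index = pd.MultiIndex.from_product([all_zipcodes, all_weeks], names=["zipcode", "Date"])
complete = weekly.set_index(["zipcode", "Date"]).reindex(full_index).reset_index()
print(complete)
```

Three zip codes times two weeks give six rows; the three combinations without data appear with NaN in kWh.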

In [None]:
df_weekly_complete

Unnamed: 0,zipcode,Date,City,District,ConnectionTimeHours,kWh,effective_charging_hrs,power,MaxOccupancy,MaxPower,...,effective_charging_hrs14-15,effective_charging_hrs15-16,effective_charging_hrs16-17,effective_charging_hrs17-18,effective_charging_hrs18-19,effective_charging_hrs19-20,effective_charging_hrs20-21,effective_charging_hrs21-22,effective_charging_hrs22-23,effective_charging_hrs23-24
0,1011 AB,2022-01-03,Amsterdam,Centrum,67.59,125.80,22.789855,27.60,240.0,1324.80,...,3.000000,1.820290,1.519928,0.776087,2.000000,1.001449,1.533333,2.000000,2.000000,1.966667
1,1011 AB,2022-01-10,Amsterdam,Centrum,163.83,457.95,82.961957,71.76,624.0,3444.48,...,4.959058,2.844928,2.850000,5.583696,4.716667,4.569928,4.000000,2.702899,1.682246,0.983333
2,1011 AB,2022-01-17,Amsterdam,Centrum,235.20,525.80,95.253623,77.28,672.0,3709.44,...,9.450000,8.726449,9.096377,8.315942,6.650000,5.355797,5.145652,4.582246,4.435870,2.950000
3,1011 AB,2022-01-24,Amsterdam,Centrum,336.12,707.90,128.242754,77.28,672.0,3709.44,...,11.898188,11.068841,8.728986,11.021739,11.170290,12.266667,15.121739,12.266667,13.433333,11.321377
4,1011 AB,2022-01-31,Amsterdam,Centrum,259.77,571.31,103.498188,77.28,672.0,3709.44,...,3.264493,4.984420,2.639493,4.775725,5.893478,4.252174,3.703261,4.614130,4.000000,3.119203
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1220995,1045 BA,2023-03-06,,,,,,,,,...,,,,,,,,,,
1220996,1045 BA,2023-03-13,,,,,,,,,...,,,,,,,,,,
1220997,1045 BA,2023-03-20,,,,,,,,,...,,,,,,,,,,
1220998,1045 BA,2023-03-27,,,,,,,,,...,,,,,,,,,,


Here we also fill all missing values with zeros, because a missing combination means that no power could have been supplied and no power has been used. Note that the text columns City and District are set to 0 as well.

In [None]:
# Fill NaN values with zeros
df_weekly_complete = df_weekly_complete.fillna(0)

In [None]:
df_weekly_filled = df_weekly_complete
df_weekly_filled

Unnamed: 0,zipcode,Date,City,District,ConnectionTimeHours,kWh,effective_charging_hrs,power,MaxOccupancy,MaxPower,...,effective_charging_hrs14-15,effective_charging_hrs15-16,effective_charging_hrs16-17,effective_charging_hrs17-18,effective_charging_hrs18-19,effective_charging_hrs19-20,effective_charging_hrs20-21,effective_charging_hrs21-22,effective_charging_hrs22-23,effective_charging_hrs23-24
0,1011 AB,2022-01-03,Amsterdam,Centrum,67.59,125.80,22.789855,27.60,240.0,1324.80,...,3.000000,1.820290,1.519928,0.776087,2.000000,1.001449,1.533333,2.000000,2.000000,1.966667
1,1011 AB,2022-01-10,Amsterdam,Centrum,163.83,457.95,82.961957,71.76,624.0,3444.48,...,4.959058,2.844928,2.850000,5.583696,4.716667,4.569928,4.000000,2.702899,1.682246,0.983333
2,1011 AB,2022-01-17,Amsterdam,Centrum,235.20,525.80,95.253623,77.28,672.0,3709.44,...,9.450000,8.726449,9.096377,8.315942,6.650000,5.355797,5.145652,4.582246,4.435870,2.950000
3,1011 AB,2022-01-24,Amsterdam,Centrum,336.12,707.90,128.242754,77.28,672.0,3709.44,...,11.898188,11.068841,8.728986,11.021739,11.170290,12.266667,15.121739,12.266667,13.433333,11.321377
4,1011 AB,2022-01-31,Amsterdam,Centrum,259.77,571.31,103.498188,77.28,672.0,3709.44,...,3.264493,4.984420,2.639493,4.775725,5.893478,4.252174,3.703261,4.614130,4.000000,3.119203
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1220995,1045 BA,2023-03-06,0,0,0.00,0.00,0.000000,0.00,0.0,0.00,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1220996,1045 BA,2023-03-13,0,0,0.00,0.00,0.000000,0.00,0.0,0.00,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1220997,1045 BA,2023-03-20,0,0,0.00,0.00,0.000000,0.00,0.0,0.00,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1220998,1045 BA,2023-03-27,0,0,0.00,0.00,0.000000,0.00,0.0,0.00,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


Now we recalculate our KPIs from the weekly totals, because KPIs such as the percentage columns cannot simply be summed up over the days of a week.

In [None]:
# add spare capacity in kwh regarding effective charging
df_weekly_filled['SpareCap_Effective'] = df_weekly_filled['MaxPower'] - df_weekly_filled['kWh']

# add spare capacity in kwh regarding occupancy
df_weekly_filled['SpareCap_Occup_kWh'] = df_weekly_filled['MaxPower'] - (df_weekly_filled['power'] * df_weekly_filled['ConnectionTimeHours'])

# add spare capacity in h regarding occupancy 
df_weekly_filled['SpareCap_Hrs'] = df_weekly_filled['MaxOccupancy'] - df_weekly_filled['ConnectionTimeHours']

# add % effective charging
df_weekly_filled['Effective%'] = df_weekly_filled['kWh'] / df_weekly_filled['MaxPower']

# add % occupancy kwh
df_weekly_filled['Occupancy_kwh%'] = (df_weekly_filled['power'] * df_weekly_filled['ConnectionTimeHours']) / df_weekly_filled['MaxPower']
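
A small worked example, with made-up numbers for two days, shows why a percentage has to be recomputed from the summed totals, and what happens for a zip code whose totals are all zero:

```python
import pandas as pd

# Two toy days for one zip code
daily = pd.DataFrame({
    "kWh": [10.0, 90.0],
    "MaxPower": [100.0, 300.0],
})
daily["Effective%"] = daily["kWh"] / daily["MaxPower"]   # 0.10 and 0.30

# Weekly KPI recomputed from the weekly totals: 100 / 400
weekly_ratio = daily["kWh"].sum() / daily["MaxPower"].sum()

# Neither the sum nor the mean of the daily percentages reproduces that value
summed = daily["Effective%"].sum()
averaged = daily["Effective%"].mean()
print(weekly_ratio, summed, averaged)

# A zip code without chargers has 0 kWh and 0 MaxPower, so the ratio is 0/0 = NaN
empty = pd.Series([0.0]) / pd.Series([0.0])
print(empty)
```

Here the ratio of the totals is 0.25, while summing the daily percentages gives about 0.40 and averaging them about 0.20. The NaN produced by the 0/0 case is the kind of value that a later `fillna(0)` replaces.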

In [None]:
print(df_weekly_filled)

         zipcode       Date       City District  ConnectionTimeHours     kWh  \
0        1011 AB 2022-01-03  Amsterdam  Centrum                67.59  125.80   
1        1011 AB 2022-01-10  Amsterdam  Centrum               163.83  457.95   
2        1011 AB 2022-01-17  Amsterdam  Centrum               235.20  525.80   
3        1011 AB 2022-01-24  Amsterdam  Centrum               336.12  707.90   
4        1011 AB 2022-01-31  Amsterdam  Centrum               259.77  571.31   
...          ...        ...        ...      ...                  ...     ...   
1220995  1045 BA 2023-03-06          0        0                 0.00    0.00   
1220996  1045 BA 2023-03-13          0        0                 0.00    0.00   
1220997  1045 BA 2023-03-20          0        0                 0.00    0.00   
1220998  1045 BA 2023-03-27          0        0                 0.00    0.00   
1220999  1045 BA 2023-04-03          0        0                 0.00    0.00   

         effective_charging_hrs  power 

In [None]:
# assign a running number to each week (not the calendar week number); the column is called 'index'
df_weekly_filled['index'] = df_weekly_filled['Date'].rank(method='dense').astype(int)
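
The effect of `rank(method='dense')` on the week labels can be seen on a toy frame with two zip codes and two weeks (values are illustrative only):

```python
import pandas as pd

weekly = pd.DataFrame({
    "zipcode": ["1011 AB", "1012 CD", "1011 AB", "1012 CD"],
    "Date": pd.to_datetime(["2022-01-03", "2022-01-03", "2022-01-10", "2022-01-10"]),
})

# Dense ranking gives equal dates the same rank and leaves no gaps between ranks
weekly["index"] = weekly["Date"].rank(method="dense").astype(int)
print(weekly)
```

Both rows of the first week get 1 and both rows of the second week get 2, so the column counts weeks from the start of the data regardless of the calendar year.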


In [None]:
# fill na with 0 (the KPI ratios are NaN where both the numerator and MaxPower are 0)
df_weekly_filled = df_weekly_filled.fillna(0)

In [None]:
# saving DF to google drive
df_weekly_filled.to_csv('/content/drive/My Drive/df_weekly_22_23.csv', index=False)