In this notebook we read each consumer's CSV file into a dataframe and reduce it to a single line of summary data: the mean KWH for each hour of the day, written out once per season.
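
To make the "single line of summary data" concrete, here is a minimal sketch on made-up readings. The consumer id `MAC_EXAMPLE` and the KWH values are hypothetical, and grouping on the hour of day (rather than the full time) is an assumption of this sketch; only the column names `HourlyDateTime` and `KWH` come from the notebook's own code.

```python
import pandas as pd

# Hypothetical data: two days of hourly readings for one consumer.
# Day 1 uses the hour number as the reading, day 2 uses hour + 2.
hours = list(range(48))
df = pd.DataFrame({
    "HourlyDateTime": pd.Timestamp("2013-01-01") + pd.to_timedelta(hours, unit="h"),
    "KWH": [float(h % 24) + (2.0 if h >= 24 else 0.0) for h in hours],
})

# Column names "00" .. "23", as in the notebook
hourly_cols = [str(hour).zfill(2) for hour in range(24)]

# Mean KWH per hour of day; reindex keeps all 24 slots even if an hour is absent
hourly_mean = (
    df.groupby(df["HourlyDateTime"].dt.hour)["KWH"].mean().reindex(range(24))
)

# One summary row: the consumer id followed by 24 hourly means
summary_row = {"LCLid": "MAC_EXAMPLE", **dict(zip(hourly_cols, hourly_mean))}
print(summary_row["00"], summary_row["23"])
```

Each consumer file collapses to one such 25-field row, regardless of how many days of readings it holds.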

In [1]:
import pandas as pd
import os

seasons = ["winter_wd", "winter_we", "summer_wd", "summer_we"]
# FIRST set up a summary dataframe which we can write to a csv at the end.
# The summary df will include a row for each LCLid.
# Each row will contain the LCLid in the first column, then the following
# 24 columns will be the hourly mean() for the season.

# Create a list of strings containing the two-digit number from 00 to 23
# These will be our column names for the hourly data columns.
hourly_cols = [str(hour).zfill(2) for hour in range(24)]

# THEN loop through the seasons to create winter we/wd and summer we/wd summary dataframes
for season in seasons:
    directory = "data_2013/" + season

    # Set up the summary df for each season
    summary_df = pd.DataFrame(columns=["LCLid"] + hourly_cols)

    # Now loop through each seasonal folder, read each consumer's csv file
    # into a temporary dataframe and calculate the mean KWH for each hour.
    # Then add the mean KWH values to the summary row.

    for filename in os.listdir(directory):

        if filename.startswith("MAC") and filename.endswith(".csv"):
            LCLid = filename.split(".")[0]
            print("Processing: ", season, LCLid)
            df = pd.read_csv(directory + "/" + filename, parse_dates=["HourlyDateTime"])

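            # Extract the hour of day (0-23) from the timestamp to group on
            df["Hour"] = df["HourlyDateTime"].dt.hour
            df.drop("HourlyDateTime", axis=1, inplace=True)

            # Now calculate the mean KWH for each hour into a list; reindex so that an
            # hour missing from the file becomes NaN instead of shifting later hours
            mean_kwh_list = list(df.groupby("Hour")["KWH"].mean().reindex(range(24)))

            # Now make a dictionary of column names and values by zipping the two lists
            mean_kwh = dict(zip(hourly_cols, mean_kwh_list))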

            # Add a summary row to summary_df which contains "LCLid" and hourly mean values
            summary_row = {"LCLid": LCLid, **mean_kwh}

            # Now add the summary_row to the summary_df using .loc[len(summary_df)]
            summary_df.loc[len(summary_df)] = summary_row

    summary_df.to_csv(f"data_2013/summary/{season}.csv", index=False)
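
The loop above grows `summary_df` one row at a time with `.loc`. As a hedged alternative, the sketch below collects plain dictionaries and builds the dataframe once at the end. The helper names `hourly_means`, `build_summary` and `make_frame`, the ids `MAC_A` and `MAC_B`, and all readings are hypothetical; the data is built in memory so the sketch does not depend on the `data_2013` folders.

```python
import pandas as pd

HOURLY_COLS = [f"{hour:02d}" for hour in range(24)]


def hourly_means(df):
    """Mean KWH per hour of day as a Series labelled "00".."23" (NaN if an hour is absent)."""
    means = df.groupby(df["HourlyDateTime"].dt.hour)["KWH"].mean()
    means = means.reindex(range(24))
    means.index = HOURLY_COLS
    return means


def build_summary(frames):
    """Build one summary row per consumer from a dict of {LCLid: dataframe}."""
    rows = [
        {"LCLid": lclid, **hourly_means(df).to_dict()}
        for lclid, df in frames.items()
    ]
    return pd.DataFrame(rows, columns=["LCLid"] + HOURLY_COLS)


def make_frame(level, skip_hour=None):
    """Hypothetical one-day file with a constant reading, optionally missing one hour."""
    hours = [h for h in range(24) if h != skip_hour]
    return pd.DataFrame({
        "HourlyDateTime": pd.Timestamp("2013-01-01") + pd.to_timedelta(hours, unit="h"),
        "KWH": [level] * len(hours),
    })


frames = {"MAC_A": make_frame(0.5), "MAC_B": make_frame(1.5, skip_hour=5)}
summary = build_summary(frames)
print(summary[["LCLid", "04", "05", "06"]])
```

Building the frame once avoids repeated enlargement of `summary_df`, and the second consumer shows how a missing hour surfaces as NaN in its own column rather than shifting later values to the left.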





Processing:  winter_wd MAC000002
Processing:  winter_wd MAC000003
Processing:  winter_wd MAC000007
Processing:  winter_wd MAC000009
Processing:  winter_wd MAC000010
Processing:  winter_wd MAC000011
Processing:  winter_wd MAC000018
Processing:  winter_wd MAC000019
Processing:  winter_wd MAC000021
Processing:  winter_wd MAC000023
Processing:  winter_wd MAC000027
Processing:  winter_wd MAC000033
Processing:  winter_wd MAC000034
Processing:  winter_wd MAC000039
Processing:  winter_wd MAC000040
Processing:  winter_wd MAC000045
Processing:  winter_wd MAC000049
Processing:  winter_wd MAC000054
Processing:  winter_wd MAC000055
Processing:  winter_wd MAC000057
Processing:  winter_wd MAC000059
Processing:  winter_wd MAC000060
Processing:  winter_wd MAC000063
Processing:  winter_wd MAC000067
Processing:  winter_wd MAC000072
Processing:  winter_wd MAC000074
Processing:  winter_wd MAC000077
Processing:  winter_wd MAC000083
Processing:  winter_wd MAC000084
Processing:  winter_wd MAC000085
Processing