Exploring the Impacts of Residential and Commercial Solar Power Production on Grid Demand
-------
team: Group G – Watt’s Up Down Under
session: Hexamester 2, 2024
coursecode: ZZSC9020
author: 
- Bernard Lo (z3464235)
- Andrew Ryan (z2251397)
- Chadi Abi Fadel (z5442788)
- Joshua Evans (z5409600)


# Abstract

The relationship between electricity demand and temperature is well known in the electricity industry, and most commercial power suppliers use temperature to forecast energy demand. As more Australian homes consider adding solar panels as a source of renewable energy, the team investigated whether adding solar power output as a further variable improves the accuracy of the temperature-based model currently in use. Using convolutional neural network (CNN) and long short-term memory (LSTM) models, we improved the accuracy of energy forecasting by adding the solar power output dataset to the temperature dataset originally used. With temperature and solar power datasets from 2017 to 2021, the team concluded that both CNN and LSTM models produced more accurate energy forecasts when solar output was included and that, of the two, the LSTM outperformed the CNN. These findings suggest that energy providers should consider incorporating datasets from various renewable sources to improve their modelling accuracy, which in turn could improve energy pricing and reduce wastage.


# Loading the data


## Loading the given dataset

Our group's own Python module is in `./src/watts_up`. It contains all the code required to run the functions called from `wup`.

In [None]:
# Import the group's Python module.
import src.watts_up as wup

### Unzipping Programmatically

- `wup.extract_all_zips(source_dir, dest_dir)`: Extracts all ZIP files from a specified source directory to a destination directory, creating the destination if it doesn't exist.


In [None]:
source_directory = '../data/Australia'  # Must be defined before the call below
destination_directory = '../extracted_zips'
wup.extract_all_zips(source_directory, destination_directory)
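
The implementation of `wup.extract_all_zips` lives in the group's module and is not reproduced here. As a rough illustration of the behaviour described above, a minimal sketch might look like the following. The function name `extract_all_zips_sketch` and the choice to give each archive its own subfolder are assumptions made for this example, not the module's confirmed behaviour.

```python
import zipfile
from pathlib import Path


def extract_all_zips_sketch(source_dir, dest_dir):
    """Extract every ZIP archive found directly in source_dir into dest_dir."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the destination if needed

    extracted = []
    for zip_path in sorted(source.glob("*.zip")):
        # Assumption: each archive is unpacked into a subfolder named after it.
        target = dest / zip_path.stem
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target)
        extracted.append(target)
    return extracted
```

Returning the list of target folders makes it easy to confirm how many archives were unpacked before moving on to the import step.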

### Importing CSVs to a dictionary

- `wup.create_dataframes_dict(base_directory)`: Creates a dictionary of DataFrames from CSV files found in subdirectories of a base directory, keyed by CSV file names.

In [None]:
base_directory = '../extracted_zips'  # Change this to your actual directory path
dataframes_dict = wup.create_dataframes_dict(base_directory)
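
For readers without access to the module, the idea behind `wup.create_dataframes_dict` can be sketched roughly as below. This is an illustration only: the name `create_dataframes_dict_sketch` is hypothetical, and keying by the file name without its `.csv` extension is an assumption inferred from keys such as `forecastdemand_vic` used later in this notebook.

```python
from pathlib import Path

import pandas as pd


def create_dataframes_dict_sketch(base_directory):
    """Read every CSV under base_directory (searched recursively) into a dict."""
    dataframes = {}
    for csv_path in sorted(Path(base_directory).rglob("*.csv")):
        # Assumption: the key is the file name without its extension.
        # Two files with the same name would overwrite each other here.
        dataframes[csv_path.stem] = pd.read_csv(csv_path)
    return dataframes
```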

#### Check Import


- `wup.display_dataframes(dataframes)`: Displays basic information and the first few rows for each DataFrame in a given dictionary of DataFrames.

In [None]:
wup.display_dataframes(dataframes_dict)

### Dataframes organised by state in a dict


- `wup.organize_and_print_dataframes(dataframes_dict)`: Organizes DataFrames by state based on naming conventions and prints out each DataFrame's name under its corresponding state.

In [None]:
data_by_state=wup.organize_and_print_dataframes(dataframes_dict)
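
The naming convention that `wup.organize_and_print_dataframes` relies on is not spelled out in this notebook. Assuming each name ends in a state code (as `forecastdemand_vic` does), the grouping could be sketched as follows. The function name, the list of state codes, and the `UNKNOWN` bucket are all assumptions made for illustration.

```python
from collections import defaultdict


def organize_by_state_sketch(dataframes_dict, states=("NSW", "QLD", "SA", "VIC")):
    """Group entries by the state code that ends each name, e.g. '..._vic'."""
    by_state = defaultdict(dict)
    for name, frame in dataframes_dict.items():
        # Take the text after the last underscore and compare it to the codes.
        suffix = name.rsplit("_", 1)[-1].upper()
        state = suffix if suffix in states else "UNKNOWN"
        by_state[state][name] = frame

    # Print each name under its state, mirroring the described behaviour.
    for state, frames in by_state.items():
        print(state)
        for name in frames:
            print(f"  {name}")
    return dict(by_state)
```

The result is a nested dictionary, so a single table can be reached with `data_by_state['VIC']['forecastdemand_vic']`.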

### Accessing our data


- `wup.get_dataframe_from_state(data_by_state, state, dataframe_key)`: Retrieves a specific DataFrame from a nested dictionary structure based on state and DataFrame key.

In [None]:
# Example: access a DataFrame from the nested dict using the helper function
forecast_demand_vic=wup.get_dataframe_from_state(data_by_state, 'VIC', 'forecastdemand_vic')
forecast_demand_vic.head()

## Scraping PV data

### Downloading

In [None]:
import requests
from pathlib import Path

base_url = "https://nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/"

years = range(2017, 2023)  
months = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]

file_prefix = "PUBLIC_DVD_ROOFTOP_PV_ACTUAL_"
file_suffix = ".zip"

download_dir = Path("../data/PV_Data/raw_data")  # The specified path
download_dir.mkdir(parents=True, exist_ok=True)

def download_file(url, path):
    # Report the target file's own name rather than relying on a global variable.
    response = requests.get(url, timeout=60)
    if response.status_code == 200:
        with open(path, 'wb') as file:
            file.write(response.content)
        print(f"Downloaded {path.name} to {path}")
    else:
        print(f"Failed to download {path.name}: HTTP Status Code {response.status_code}")

for year in years:
    for month in months:
        file_name = f"{file_prefix}{year}{month}010000{file_suffix}"
        full_url = f"{base_url}{year}/MMSDM_{year}_{month}/MMSDM_Historical_Data_SQLLoader/DATA/{file_name}"
        file_path = download_dir / file_name
        
        print(f"Downloading {file_name}...")
        download_file(full_url, file_path)

print("Downloads complete.")

### Unzipping

In [None]:
import zipfile
from pathlib import Path

# Specify the directory with the zip files
zip_files_dir = Path("../data/PV_Data/raw_data")
unzip_dir = zip_files_dir / "unzipped_data"

# Create a directory for the unzipped files if it doesn't exist
unzip_dir.mkdir(parents=True, exist_ok=True)

# Loop through each zip file in the directory and unzip it
for zip_file in zip_files_dir.glob("*.zip"):
    with zipfile.ZipFile(zip_file, 'r') as zip_ref:
        # Extract all the contents of zip file into the unzipped directory
        zip_ref.extractall(unzip_dir)
        print(f"Unzipped {zip_file} into {unzip_dir}")

print("Unzipping complete.")


### Importing CSVs to a dictionary

In [None]:
base_directory = '../extracted_zips'  # Change this to your actual directory path
dataframes_dict = wup.create_dataframes_dict(base_directory)

### Check Import

In [None]:
wup.display_dataframes(dataframes_dict)

### Dataframes organised by state

In [None]:
data_by_state=wup.organize_and_print_dataframes(dataframes_dict)

### Accessing our data

In [None]:
# Example: access a DataFrame from the nested dict using the helper function
forecast_demand_vic=wup.get_dataframe_from_state(data_by_state, 'VIC', 'forecastdemand_vic')
forecast_demand_vic.head()

# Describing the data

## Inspect and Clean data

In [None]:
wup.display_dataframes(dataframes_dict)

## Dataframes organised by state

In [None]:
data_by_state=wup.organize_and_print_dataframes(dataframes_dict)

## Accessing our data

In [None]:
# Example: access a DataFrame from the nested dict using the helper function
forecast_demand_vic=wup.get_dataframe_from_state(data_by_state, 'VIC', 'forecastdemand_vic')
forecast_demand_vic.head()

## Converting to Datetime

In [None]:
columns_to_convert = ['LASTCHANGED', 'DATETIME']
wup.convert_df_columns_to_datetime(data_by_state, columns_to_convert)
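
The conversion helper is defined in the group's module. A minimal sketch of the same idea, assuming `data_by_state` maps each state to a dictionary of DataFrames, is shown below. The function name is hypothetical, and coercing unparseable values to `NaT` is a design choice of this sketch rather than the module's confirmed behaviour.

```python
import pandas as pd


def convert_columns_to_datetime_sketch(data_by_state, columns):
    """Convert the named columns to datetime64 in every DataFrame, in place."""
    for frames in data_by_state.values():
        for frame in frames.values():
            for column in columns:
                # Not every table has every column, so skip the missing ones.
                if column in frame.columns:
                    # errors="coerce" turns unparseable strings into NaT.
                    frame[column] = pd.to_datetime(frame[column], errors="coerce")
```

Coercing rather than raising keeps the loop running across all tables; the rows that failed to parse then show up as missing values in the later checks.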


## Sanity Check

In [None]:

# Assuming 'data_by_state' is your nested dictionary structure
columns_to_check = ['LASTCHANGED', 'DATETIME']
wup.check_datetime_conversions(data_by_state, columns_to_check)

## Missing Values

In [None]:
wup.print_missing_values_summary(data_by_state)
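
As an illustration of what a missing-values summary over the nested dictionary could compute, the sketch below returns one row per column that contains missing values. It is not the module's code: the function name and the output columns are assumptions made for this example.

```python
import pandas as pd


def missing_values_summary_sketch(data_by_state):
    """Return one row per DataFrame column that has missing values."""
    rows = []
    for state, frames in data_by_state.items():
        for name, frame in frames.items():
            counts = frame.isna().sum()
            # Keep only the columns with at least one missing value.
            for column, n_missing in counts[counts > 0].items():
                rows.append({
                    "state": state,
                    "dataframe": name,
                    "column": column,
                    "missing": int(n_missing),
                    "percent": round(100 * n_missing / len(frame), 2),
                })
    return pd.DataFrame(rows, columns=["state", "dataframe", "column", "missing", "percent"])
```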

----------
# Appendix A: Another attempt at handling duplicate QLD forecasts


In [None]:
# TODO: automate the check that duplicate rows do not have high variance; if the variance is low, merge by mean
import numpy as np

# Assumes forecastdemand_qld (the QLD forecast-demand DataFrame) is already loaded.
sorted_unique_datetimes = np.sort(forecastdemand_qld['DATETIME'].unique())

high_var = 0
cv = []

# Coefficient of variation (std / mean) of the duplicate readings at each timestamp
for dt in sorted_unique_datetimes:
    # Filter the QLD frame with its own mask (the original used the VIC frame's mask here).
    duplicate_readings = forecastdemand_qld[forecastdemand_qld['DATETIME'] == dt][['DATETIME', 'FORECASTDEMAND']]
    std = duplicate_readings['FORECASTDEMAND'].std()
    mean = duplicate_readings['FORECASTDEMAND'].mean()
#     range_ratio=-(min(duplicate_readings['FORECASTDEMAND'])-max(duplicate_readings['FORECASTDEMAND']))/mean*2
    coefficient_of_variation = std / mean
    if coefficient_of_variation >= 0.1:
        print(dt, '\tVariance information')
        print("std:", std, "mean:", mean, "coefficient_of_variation:", coefficient_of_variation, "min:", duplicate_readings['FORECASTDEMAND'].min(), "max:", duplicate_readings['FORECASTDEMAND'].max())
        display(duplicate_readings)
        high_var += 1
        # Stop after showing five high-variance timestamps.
        if high_var >= 5:
            break
    cv.append(coefficient_of_variation)

In [None]:
average_cv = sum(cv) / len(cv)

print("Average:", average_cv)
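
The per-timestamp loop above computes the coefficient of variation, $CV = \sigma / \mu$, one `DATETIME` at a time. A vectorised sketch of the same quantity using a single `groupby` is given below. The function name is hypothetical, and dropping timestamps with only one reading is a choice of this sketch (their sample standard deviation is undefined, so they would otherwise contribute `NaN`).

```python
import pandas as pd


def coefficient_of_variation_by_datetime(df, key="DATETIME", value="FORECASTDEMAND"):
    """Return std / mean of `value` for each `key` that has duplicate rows."""
    stats = df.groupby(key)[value].agg(["count", "mean", "std"])
    # A single reading has no spread, so keep only keys with duplicates.
    stats = stats.loc[stats["count"] > 1]
    # pandas' std is the sample standard deviation (ddof=1), as in the loop above.
    return stats.assign(cv=stats["std"] / stats["mean"])
```

With this table, the timestamps flagged by the loop correspond to the rows where `cv >= 0.1`, and the average is `stats["cv"].mean()`.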

## Grouping by mean

In [None]:
# Group by 'DATETIME', averaging 'FORECASTDEMAND' across duplicates and keeping the first 'PERIODID'
unique_forcastdemand_qld = forecastdemand_qld.groupby('DATETIME', as_index=False).agg({'FORECASTDEMAND': 'mean', 'PERIODID': 'first'})

# Display the resulting DataFrame
print("\nDataFrame after merging duplicate 'DATETIME' rows by mean 'FORECASTDEMAND':")
display(unique_forcastdemand_qld)
