<div style="background-color: #ccccff ; padding:10px; font-size:x-large; font-weight: bold">
EGD103 Assignment 2 - Analysing Energy Demand Data
</div>

In this assignment, you will be analysing energy demand data collected by the Australian Energy Market Operator (AEMO) in 2023. Demand and generation values are measured in megawatts (MW), and prices are measured in dollars per megawatt-hour ($/MWh).

The assignment is split into four parts which are equally weighted:
* **Part A:** Here you will wrangle the data to ensure it is suitable for analysis.
* **Part B:** Here you will use aggregation and visualisation to answer some questions about the data.
* **Part C:** Here you will integrate some programming theory with the data to create a program.
* **Part D:** Here you will reflect on your learning from completing the assignment.

You will also be marked on your code quality and an in-class activity after assignment submission. View the CRA on Canvas for more details on how you will be graded.

<div style="background-color: #ffcccc; padding:10px">
    
## General Rules and Restrictions
* You must use Jupyter in the cloud (on https://jupyter.eres.qut.edu.au) to develop your solution.
* For the assignments you cannot work with friends or colleagues, or get help from anyone other than the EGD103 teaching team - it needs to be entirely your own work.
* You cannot use AI tools such as ChatGPT or Copilot to help develop your solution.
* You can use the imports included in this template to complete this assignment. If you would like to use additional imports you will need approval from the unit coordinator.
* You should be using formatted Markdown cells to communicate your process and outcomes in each section of the assignment. Clear communication and formatting are required to achieve high marks.
* You may add extra code cells and Markdown cells to this template at your discretion, but do not remove any cells.
* You should only use Python language features that have been taught within this unit. If you use other features we will suspect academic misconduct and require you to attend a meeting to authenticate your learning.
</div>

<div style="background-color: #b2efb2; padding:10px; font-size:small; font-weight: bold">
    
## Student Details
<p>Task: Using the Markdown cell below, write your name and student number. Also briefly describe any previous programming experience you may have had (if any).</p>

* Name: 
* Student number:
* Previous programming experience: 

## Allowed Imports
You are allowed to use the imports given in the cell below when completing the assignment. Run the cell to import. Do not repeat yourself by adding import statements later in the assignment.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

***


<div style="background-color: #b2efb2; padding:10px; font-size:small; font-weight: bold">

## Part A: Inspecting and preparing data



### A1: Importing data
Import the two datasets as pandas dataframes. Make sure that columns with dates and/or times are imported in datetime format. Inspect the data to ensure it has been imported correctly.

Useful functions: <code>pd.read_csv</code>.
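
As a minimal sketch of datetime parsing with <code>pd.read_csv</code>, the example below reads a small in-memory CSV. The column names (<code>timestamp</code>, <code>location</code>, <code>demand</code>) and values are made up for illustration and will differ from the real datasets. The imports are repeated only so the sketch runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a real file; names and values are invented.
csv_text = """timestamp,location,demand
2023-01-01 00:05:00,A,100.0
2023-01-01 00:10:00,A,103.0
"""

# parse_dates tells pandas which columns to convert to datetime on import.
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Inspecting dtypes is one way to confirm the conversion worked.
print(toy.dtypes)
```

If a date column still shows as <code>object</code> in the dtypes, it was read as text rather than as a datetime.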

In [None]:
# write your code here

### A2: Joining data
Combine the two datasets together by joining them based on the datetime and location. Inspect the result to make sure it joined correctly.

Useful methods: <code>merge</code>.
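
A minimal sketch of joining on two key columns with <code>merge</code> is shown below. Both dataframes and their column names are hypothetical toy data, and the choice of an inner join is an assumption made for the example rather than a requirement of the task.

```python
import pandas as pd

# Two hypothetical tables sharing the keys "timestamp" and "location".
times = pd.to_datetime(["2023-01-01 00:05", "2023-01-01 00:05", "2023-01-01 00:10"])
demand = pd.DataFrame({"timestamp": times, "location": ["A", "B", "A"],
                       "demand": [100.0, 200.0, 110.0]})
price = pd.DataFrame({"timestamp": times, "location": ["A", "B", "A"],
                      "price": [50.0, 55.0, 52.0]})

# Passing a list to `on` joins rows only where both keys match.
joined = demand.merge(price, on=["timestamp", "location"], how="inner")
print(joined)
```

Comparing the row count of the result with the row counts of the inputs is a quick check that no rows were lost or multiplied by the join.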

In [None]:
# write your code here

### A3: Duplicate values
AEMO has informed you that there will occasionally be duplicate observations for a given datetime and location. In these instances the last observation is correct and the others should be removed. Check the data for these duplicates and correct if you find any.

Useful methods: <code>duplicated</code>, <code>sum</code>, <code>drop_duplicates</code>.
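
The sketch below shows one way to count and remove repeated key combinations while keeping the last observation. The dataframe and its column names are hypothetical toy data.

```python
import pandas as pd

# Hypothetical data: the first two rows share the same timestamp and location.
toy = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-01 00:05", "2023-01-01 00:05",
                                 "2023-01-01 00:10"]),
    "location": ["A", "A", "A"],
    "demand": [100.0, 101.0, 110.0],
})

keys = ["timestamp", "location"]

# With keep="last", every repeat except the final one is flagged as True.
n_extra = toy.duplicated(subset=keys, keep="last").sum()

# Drop the flagged rows so only the last observation per key remains.
deduped = toy.drop_duplicates(subset=keys, keep="last")
print(n_extra)
print(deduped)
```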

In [None]:
# write your code here

### A4: Organisation
Organise the dataframe to make sure it's sorted based on datetime. When multiple entries have the same datetime, it should sort based on the location.

Useful methods: <code>sort_values</code>.
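
A minimal sketch of sorting by more than one column is given below, using hypothetical toy data. Columns earlier in the list take priority, and later columns break ties.

```python
import pandas as pd

# Hypothetical unsorted data.
toy = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-01 00:10", "2023-01-01 00:05",
                                 "2023-01-01 00:05"]),
    "location": ["A", "B", "A"],
})

# Sort by timestamp first, then by location within each timestamp.
# reset_index(drop=True) renumbers the rows after sorting (optional).
ordered = toy.sort_values(["timestamp", "location"]).reset_index(drop=True)
print(ordered)
```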

In [None]:
# write your code here

### A5: Further inspection
Conduct some further inspection to ensure the data is ready for analysis. You may want to consider the following:
* Are there any missing values?
* Are there any outliers?
* Domain / format checks
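
The checks above could be sketched as follows on hypothetical toy data. The rule that demand should not be negative is an assumption made for this example only; the appropriate domain checks depend on what each column in the real data represents.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one suspicious value.
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "demand": [100.0, np.nan, 200.0, -5.0],
})

# Count missing values in each column.
missing_per_column = toy.isna().sum()

# Summary statistics (min, max, quartiles) can hint at outliers.
summary = toy["demand"].describe()

# Example domain check, assuming demand should be non-negative.
negative_rows = toy[toy["demand"] < 0]

print(missing_per_column)
print(summary)
print(negative_rows)
```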

In [None]:
# write your code here

### A6: Summary
Write a summary of your data preparation process in this Markdown cell.

The summary should discuss:
* Any possible errors you identified in the data.
* Any modifications you made to the data and why.
* A summary of what information is contained in the cleaned data.

Your summary should use clear language that is code agnostic. It should use Markdown formatting features to ensure it's presentable.

<div style="background-color: #b2efb2; padding:10px; font-size:small; font-weight: bold">

## Part B: Summarising data



### B1: What was the most expensive price in the data? When and where did it occur?

Write your worded response here

In [None]:
# write code here

### B2: How do prices compare between each location in the data? 
Write your worded response here.

In [None]:
# write code here

### B3: How does daily demand vary with time in the data? Are there any parts of the year that have noticeably higher or lower demand than others?
Write your worded response here.

In [None]:
# write code here

### B4: Which two variables have the highest correlation in the data? How strong does their relationship look when plotted?
Write your worded response here.

In [None]:
# write code here

### B5: Your own research question
Write your worded response here.

In [None]:
# write code here

<div style="background-color: #b2efb2; padding:10px; font-size:small; font-weight: bold">

## Part C: Data integration

Demand forecasting is important for energy companies to ensure they can operate as profitably as possible. AEMO publishes their own demand forecasts to help companies with their decisions, but many companies will also have their own in-house forecasting tools to give them an advantage. In this part we will get you to develop your own forecasting tool and compare it to AEMO's forecasts.


### C1: A simple forecasting model
Create a user-defined function that accepts three inputs: a dataframe, a location ID and a datetime. It should return the average of the last 5 changes in demand that occurred for that location. If there aren't 5 previous changes, then it should return a missing value, e.g. <code>None</code> or <code>NaN</code>. Your function should include exceptions that get raised if an invalid region or datetime is used when calling the function.

Hint: the pandas method <code>diff</code> is useful for computing the changes in demand.
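
To illustrate what <code>diff</code> produces, the sketch below applies it to a short, made-up series of demand values for a single location and averages the most recent 5 changes. It is a toy illustration under those assumptions, not the required function.

```python
import pandas as pd

# Hypothetical demand readings for one location, in time order.
demand = pd.Series([100.0, 103.0, 101.0, 106.0, 110.0, 109.0, 115.0])

# diff() subtracts the previous value from each value.
# The first entry has no previous value, so it is NaN.
changes = demand.diff()

# Drop the leading NaN, keep the 5 most recent changes, and average them.
last_five = changes.dropna().tail(5)
average_change = last_five.mean()

print(changes.tolist())
print(average_change)
```

Note that <code>tail(5)</code> silently returns fewer than 5 values when fewer are available, so the length would need checking before deciding whether to return a missing value.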

In [None]:
def my_demand_forecast(df, region, datetime):
    # write your code here
    pass  # placeholder so the cell runs before the function is implemented

In [None]:
# add your own test cases here to make sure the function is working correctly.

### C2: Creating a forecast column
Add a new column to the dataframe containing demand forecasts using the approach described in C1. This will allow you to compare your own forecasts to AEMO's.

Hint: the pandas method <code>diff</code> can again be useful here.
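
When a dataframe holds several locations, a plain <code>diff</code> would subtract values across different locations. A minimal sketch of computing changes within each location is shown below; the data and column names are hypothetical, and using <code>groupby</code> here is one possible approach rather than the prescribed one.

```python
import pandas as pd

# Hypothetical data with two locations interleaved in time order.
toy = pd.DataFrame({
    "location": ["A", "B", "A", "B", "A", "B"],
    "demand": [100.0, 200.0, 103.0, 198.0, 101.0, 205.0],
})

# Grouping first means each change is computed against the previous
# row of the same location, not the previous row of the dataframe.
toy["change"] = toy.groupby("location")["demand"].diff()
print(toy)
```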

In [None]:
# write your code here

### C3: Compute forecasting error columns
You should now have two forecast columns - one with AEMO's forecasts and one with your own. You now need to calculate the error for each forecast. This can be done by:
1. Finding the demand in the next 5-minute time period for that location.
2. Computing the actual change that occurred in the demand.
3. Subtracting this from the forecast change to get the forecasting error.

Perform the process outlined above for each forecast column to compute forecast error columns.

Hint: the pandas method <code>shift</code> is very useful in this task. This can allow you to get the next demand in the same row as the current demand.
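
The three steps above could be sketched as follows. The dataframe, its column names and the forecast values are all hypothetical, and the rows are assumed to be in time order within each location.

```python
import pandas as pd

# Hypothetical data: demand plus a made-up forecast of the next change.
toy = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "demand": [100.0, 103.0, 101.0, 200.0, 198.0, 205.0],
    "forecast_change": [2.0, -1.0, 0.5, -1.0, 5.0, 1.0],
})

# Step 1: shift(-1) pulls the next row's demand up into the current row.
# Grouping by location stops values leaking from one location to another.
toy["next_demand"] = toy.groupby("location")["demand"].shift(-1)

# Step 2: the actual change is next demand minus current demand.
toy["actual_change"] = toy["next_demand"] - toy["demand"]

# Step 3: the error is the forecast change minus the actual change.
toy["error"] = toy["forecast_change"] - toy["actual_change"]
print(toy)
```

The last row of each location has no next demand, so its error is a missing value.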

In [None]:
# write code here

### C4: Analysing model errors
A common metric used to measure the accuracy of a model is Mean Absolute Error (MAE). This does the following:
* Find the absolute value of each error.
* Find the average (mean) of the absolute errors.
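
As a small worked sketch of the two steps above, the example below computes the MAE of a made-up series of errors. The values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical forecast errors, including missing values.
errors = pd.Series([-1.0, 1.0, np.nan, 1.0, -2.0, np.nan])

# abs() removes the sign of each error; mean() averages the results.
# pandas skips missing values by default when computing the mean.
mae = errors.abs().mean()
print(mae)
```

Here the four non-missing absolute errors are 1, 1, 1 and 2, which average to 1.25.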

Do this for each of your error columns to compute the MAE for each forecasting model.

In [None]:
# write code here

### C5: Model summary
Write a summary of your forecasting model in this Markdown cell. The summary should discuss the following:
* How to use the model to forecast how much demand will change in the next 5 minutes.
* The average error to expect when using the model.
* How accurate your model is compared to AEMO's.
* Any possible improvements that could be made to the model.

<div style="background-color: #b2efb2; padding:10px; font-size:small; font-weight: bold">

## Part D: Learning reflection (approx 500 words)



<p>Write a reflection of your learning from completing this assignment in this Markdown cell. 
    
Examples of things you might want to discuss include:
* Programming / data processing concepts you learned.
* Errors you made and how you troubleshot them.
* Earlier drafts of code and how you improved them.
* Resources from EGD103 you found most useful when completing the assignment.
* Good programming / data processing practices you developed.
* Areas of improvement.

The word count is a guide and won't be rigidly enforced as a limit. Any supplementary code you provide will not be counted towards the word count.
</p>

In [None]:
# Can insert any code that supports your reflection here
# (eg. initial errors you made, earlier versions of code that you improved on)