# Collect weather and load data 

This notebook provides all the code and documentation to collect the Ausgrid load dataset and the Meteostat Weather dataset. 


# Ausgrid Load Data

The load datasets can be downloaded manually from the Ausgrid website, which also hosts further documentation.
- https://www.ausgrid.com.au/Industry/Our-Research/Data-to-share/Solar-home-electricity-data

The half-hourly electricity data covers 300 homes with rooftop solar systems, each measured by a gross meter that records the total amount of solar power generated every 30 minutes. The data comes from 300 randomly selected solar customers in Ausgrid's electricity network area who were billed on a domestic tariff and had a gross-metered solar system installed for the whole period from 1 July 2010 to 30 June 2013. The customers chosen had a full set of actual data for the period from 1 July 2010 to 30 June 2011, gathered through Ausgrid's meter reading processes. Ausgrid also carried out some data quality checks and excluded customers at the high and low ends of household consumption and solar generation performance during the first year.
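As a quick sanity check on the description above, the number of half-hour intervals each home should contribute over the full period can be derived with pandas. This is only a sketch: it assumes timezone-naive timestamps and intervals labelled by their start time (00:00 to 23:30), neither of which is confirmed by the source, and `expected_index` is a hypothetical name.

```python
import pandas as pd

# Assumption: naive timestamps, each interval labelled by its start time
expected_index = pd.date_range("2010-07-01 00:00", "2013-06-30 23:30", freq="30min")

n_days = expected_index.normalize().nunique()   # calendar days in the period
intervals_per_home = len(expected_index)        # 48 half-hour intervals per day

print(n_days, intervals_per_home)  # 1096 52608
```

An index like this can later be compared against the timestamps of a single customer to spot missing or duplicated readings.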

# Meteostat Weather Data

The weather data is collected from Meteostat (https://dev.meteostat.net/), an open platform that provides free access to historical weather and climate data. This notebook uses the `meteostat` Python library.

In [3]:
# Install or upgrade meteostat before importing it
%pip install meteostat -U

# Required libraries
import pandas as pd
from datetime import datetime
from meteostat import Daily, Hourly

Note: you may need to restart the kernel to use updated packages.





In [9]:
# Set time period for the dataset
start = datetime(2012, 6, 30)
end = datetime(2013, 7, 1)

# Get hourly data
# Provide the Meteostat station ID (this is not a zip code)
# For station details see: https://meteostat.net/en/station/94755?t=2023-10-31/2023-11-07
# 94755 -> weather station in the Australia/Sydney time zone
hourly_data = Hourly('94755', start, end)
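The requested window starts a day before 1 July 2012 and ends a day after 30 June 2013. One plausible reason, assuming the load data is in Sydney local time and Meteostat returns UTC timestamps when no time zone is passed, is that the local study year begins in the afternoon of the previous UTC day. The sketch below checks this; `local_start`, `local_end`, and `covers` are hypothetical names, and the choice of study year is an assumption.

```python
import pandas as pd

# Assumption: the study year is the last Ausgrid year, in Sydney local time
local_start = pd.Timestamp("2012-07-01 00:00", tz="Australia/Sydney")
local_end = pd.Timestamp("2013-06-30 23:30", tz="Australia/Sydney")

# Convert the local bounds to UTC
utc_start = local_start.tz_convert("UTC")
utc_end = local_end.tz_convert("UTC")

# Naive bounds of the request, copied from the cell above
request_start = pd.Timestamp("2012-06-30")
request_end = pd.Timestamp("2013-07-01")

# True if the requested window contains the whole local study year
covers = (request_start <= utc_start.tz_localize(None)
          and utc_end.tz_localize(None) <= request_end)

print(utc_start, utc_end, covers)
```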

In [10]:
# Download the data; timestamps are in UTC unless a timezone is passed to Hourly
df = hourly_data.fetch()
df.head()

| time | temp | dwpt | rhum | prcp | snow | wdir | wspd | wpgt | pres | tsun | coco |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2012-06-30 02:00:00 | 16.0 | 6.7 | 54.0 | NaN | NaN | 350.0 | 7.6 | NaN | 1012.6 | NaN | NaN |
| 2012-06-30 05:00:00 | 16.8 | 4.1 | 43.0 | NaN | NaN | 220.0 | 11.2 | NaN | 1011.5 | NaN | NaN |
| 2012-06-30 08:00:00 | 10.2 | 3.5 | 63.0 | NaN | NaN | 170.0 | 5.4 | NaN | 1013.1 | NaN | NaN |
| 2012-06-30 11:00:00 | 6.7 | 4.4 | 85.0 | NaN | NaN | 360.0 | 3.6 | NaN | 1015.0 | NaN | NaN |
| 2012-06-30 14:00:00 | 5.1 | 4.5 | 96.0 | NaN | NaN | NaN | 0.0 | NaN | 1015.2 | NaN | NaN |
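The first rows above are three hours apart, while the load data is half-hourly. One possible way to bring the two onto the same grid is sketched below, using the temperature values copied from the output above. It assumes the timestamps are UTC and that the load data uses Sydney local time; linear interpolation is an illustrative choice, not necessarily the method used later in this notebook. The names `sample`, `local`, and `half_hourly` are hypothetical.

```python
import pandas as pd

# Temperature values copied from the first rows of the output above
sample = pd.DataFrame(
    {"temp": [16.0, 16.8, 10.2, 6.7, 5.1]},
    index=pd.to_datetime([
        "2012-06-30 02:00", "2012-06-30 05:00", "2012-06-30 08:00",
        "2012-06-30 11:00", "2012-06-30 14:00",
    ]),
)
sample.index.name = "time"

# Assumption: timestamps are UTC; convert them to Sydney local time
local = sample.tz_localize("UTC").tz_convert("Australia/Sydney")

# Upsample to a regular 30-minute grid and fill the gaps linearly
# (the grid is evenly spaced, so row-wise linear interpolation is linear in time)
half_hourly = local.resample("30min").asfreq().interpolate()

print(half_hourly.head(7))
```

Read this way, the 16.0 °C value falls at midday local time and the 5.1 °C value at midnight, which is consistent with the UTC assumption. Columns that are entirely empty, such as `prcp` or `tsun` in these rows, would need separate handling before interpolation.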
