# NSW Air Quality Monthly Averages 2000 - 2024 | Combine Partial Datasets
The dataset was downloaded in 4-year chunks (the final file covers 2020 to 2024) to avoid gateway timeouts on airquality.nsw.gov.au. This notebook outlines the process for combining these partial datasets into one.

No additional pre-processing or cleaning is completed in this workflow. That work takes place in a separate file.


## Dependencies

Ensure that the required libraries have been installed locally as per the README.md file included in this project.

Run the following cell to import the required dependencies for this notebook.

In [30]:
import pandas as pd

# xlrd is the engine pandas uses to read the legacy .xls format of the raw files.
%pip install xlrd
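The open question about whether `xlrd` is needed can be checked from inside the notebook. The sketch below only tests whether a module is importable; `engine_available` is a hypothetical helper name, and `openpyxl` is included because the save step later in this notebook passes `engine='openpyxl'`.

```python
import importlib.util


def engine_available(module_name: str) -> bool:
    """Return True if the named module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# xlrd is used for reading .xls files, openpyxl for writing .xlsx files.
for module_name in ("xlrd", "openpyxl"):
    status = "installed" if engine_available(module_name) else "missing"
    print(f"{module_name}: {status}")
```

If `xlrd` is reported as missing, `pd.read_excel` on an `.xls` file would be expected to raise an import error, so the install line is worth keeping.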




## Load Raw Disjointed Data

In [31]:
# Load the raw data
data_2000 = pd.read_excel('partial-datasets/2000-2003-raw.xls')
data_2004 = pd.read_excel('partial-datasets/2004-2007-raw.xls')
data_2008 = pd.read_excel('partial-datasets/2008-2011-raw.xls')
data_2012 = pd.read_excel('partial-datasets/2012-2015-raw.xls')
data_2016 = pd.read_excel('partial-datasets/2016-2019-raw.xls')
data_2020 = pd.read_excel('partial-datasets/2020-2024-raw.xls')

# List of datasets
datasets = [data_2000, data_2004, data_2008, data_2012, data_2016, data_2020]
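The six file names follow one pattern, so the paths could also be generated rather than typed out. This is a minimal sketch, not the notebook's method: `CHUNK_YEARS` and `chunk_paths` are hypothetical names, and the year pairs are copied from the file names used above.

```python
from pathlib import Path

# (first year, last year) of each downloaded chunk, taken from the file names above.
CHUNK_YEARS = [(2000, 2003), (2004, 2007), (2008, 2011),
               (2012, 2015), (2016, 2019), (2020, 2024)]


def chunk_paths(folder: str = "partial-datasets") -> list[Path]:
    """Build the expected path of every raw chunk file."""
    return [Path(folder) / f"{start}-{end}-raw.xls" for start, end in CHUNK_YEARS]


# Report missing files before attempting to load anything.
missing = [path for path in chunk_paths() if not path.exists()]
print(f"{len(missing)} of {len(CHUNK_YEARS)} files missing")
```

With the files in place, `[pd.read_excel(path) for path in chunk_paths()]` would produce the same list as `datasets`.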



## Exploratory Analysis of Raw Disjointed Data

In [32]:
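# Start year of the chunk being displayed.
year = 2000
# Display the head of each dataset.
# The label assumes every chunk spans 4 years, so the final file
# (2020-2024) is labelled "2020-2023" in the output.
for dataset in datasets:
    print(f"Partial Dataset {year}-{year + 3} Head:")
    display(dataset.head())
    year += 4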

Partial Dataset 2000-2003 Head:


Unnamed: 0,Monthly Averages Time Range: 01/01/2000 00:00 to 01/01/2004 00:00,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 246,Unnamed: 247,Unnamed: 248,Unnamed: 249,Unnamed: 250,Unnamed: 251,Unnamed: 252,Unnamed: 253,Unnamed: 254,Unnamed: 255
0,Initial Data,RANDWICK SO2 1h average,ROZELLE SO2 1h average,LINDFIELD SO2 1h average,LIVERPOOL SO2 1h average,BRINGELLY SO2 1h average,CHULLORA SO2 1h average,WYONG SO2 1h average,WALLSEND SO2 1h average,CARRINGTON SO2 1h average,...,BERESFIELD PM10 1h average,TAMWORTH PM10 1h average,WOLLONGONG PM10 1h average,KEMBLA GRANGE PM10 1h average,RICHMOND PM10 1h average,BARGO PM10 1h average,ALBURY PM10 1h average,WAGGA WAGGA PM10 1h average,ST MARYS PM10 1h average,VINEYARD PM10 1h average
1,Date,RANDWICK SO2 monthly average [pphm],ROZELLE SO2 monthly average [pphm],LINDFIELD SO2 monthly average [pphm],LIVERPOOL SO2 monthly average [pphm],BRINGELLY SO2 monthly average [pphm],CHULLORA SO2 monthly average [pphm],WYONG SO2 monthly average [pphm],WALLSEND SO2 monthly average [pphm],CARRINGTON SO2 monthly average [pphm],...,BERESFIELD PM10 monthly average [µg/m³],TAMWORTH PM10 monthly average [µg/m³],WOLLONGONG PM10 monthly average [µg/m³],KEMBLA GRANGE PM10 monthly average [µg/m³],RICHMOND PM10 monthly average [µg/m³],BARGO PM10 monthly average [µg/m³],ALBURY PM10 monthly average [µg/m³],WAGGA WAGGA PM10 monthly average [µg/m³],ST MARYS PM10 monthly average [µg/m³],VINEYARD PM10 monthly average [µg/m³]
2,31/01/2000,0.1,,0.1,,0,,,0.2,,...,16.8,,18.3,,15.3,,,,16.9,15.5
3,29/02/2000,0.1,,0.1,,0.1,,,0.2,,...,20.3,,27.8,,21.2,,,,21.5,18.5
4,31/03/2000,,,0.1,,0,,,0.2,,...,17.9,,21.8,,15,,,,15.7,14.5


Partial Dataset 2004-2007 Head:


Unnamed: 0,Monthly Averages Time Range: 01/01/2004 00:00 to 01/01/2008 00:00,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 246,Unnamed: 247,Unnamed: 248,Unnamed: 249,Unnamed: 250,Unnamed: 251,Unnamed: 252,Unnamed: 253,Unnamed: 254,Unnamed: 255
0,Initial Data,RANDWICK SO2 1h average,ROZELLE SO2 1h average,LINDFIELD SO2 1h average,LIVERPOOL SO2 1h average,BRINGELLY SO2 1h average,CHULLORA SO2 1h average,WYONG SO2 1h average,WALLSEND SO2 1h average,CARRINGTON SO2 1h average,...,BERESFIELD PM10 1h average,TAMWORTH PM10 1h average,WOLLONGONG PM10 1h average,KEMBLA GRANGE PM10 1h average,RICHMOND PM10 1h average,BARGO PM10 1h average,ALBURY PM10 1h average,WAGGA WAGGA PM10 1h average,ST MARYS PM10 1h average,VINEYARD PM10 1h average
1,Date,RANDWICK SO2 monthly average [pphm],ROZELLE SO2 monthly average [pphm],LINDFIELD SO2 monthly average [pphm],LIVERPOOL SO2 monthly average [pphm],BRINGELLY SO2 monthly average [pphm],CHULLORA SO2 monthly average [pphm],WYONG SO2 monthly average [pphm],WALLSEND SO2 monthly average [pphm],CARRINGTON SO2 monthly average [pphm],...,BERESFIELD PM10 monthly average [µg/m³],TAMWORTH PM10 monthly average [µg/m³],WOLLONGONG PM10 monthly average [µg/m³],KEMBLA GRANGE PM10 monthly average [µg/m³],RICHMOND PM10 monthly average [µg/m³],BARGO PM10 monthly average [µg/m³],ALBURY PM10 monthly average [µg/m³],WAGGA WAGGA PM10 monthly average [µg/m³],ST MARYS PM10 monthly average [µg/m³],VINEYARD PM10 monthly average [µg/m³]
2,31/01/2004,0.1,,0.1,,0,,,,,...,20.4,18.4,24.5,,22.8,,,25.3,23.6,21.1
3,29/02/2004,0.1,,0.1,,0,,,0.1,,...,24.6,20.4,27.4,,23.9,,,33.3,25.1,21.9
4,31/03/2004,0.1,,0.1,,0,,,0.1,,...,22.2,17.8,24.2,,21.7,,,44.4,20.1,20.2


Partial Dataset 2008-2011 Head:


Unnamed: 0,Monthly Averages Time Range: 01/01/2008 00:00 to 01/12/2011 00:00,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 246,Unnamed: 247,Unnamed: 248,Unnamed: 249,Unnamed: 250,Unnamed: 251,Unnamed: 252,Unnamed: 253,Unnamed: 254,Unnamed: 255
0,Initial Data,RANDWICK SO2 1h average,ROZELLE SO2 1h average,LINDFIELD SO2 1h average,LIVERPOOL SO2 1h average,BRINGELLY SO2 1h average,CHULLORA SO2 1h average,WYONG SO2 1h average,WALLSEND SO2 1h average,CARRINGTON SO2 1h average,...,BERESFIELD PM10 1h average,TAMWORTH PM10 1h average,WOLLONGONG PM10 1h average,KEMBLA GRANGE PM10 1h average,RICHMOND PM10 1h average,BARGO PM10 1h average,ALBURY PM10 1h average,WAGGA WAGGA PM10 1h average,ST MARYS PM10 1h average,VINEYARD PM10 1h average
1,Date,RANDWICK SO2 monthly average [pphm],ROZELLE SO2 monthly average [pphm],LINDFIELD SO2 monthly average [pphm],LIVERPOOL SO2 monthly average [pphm],BRINGELLY SO2 monthly average [pphm],CHULLORA SO2 monthly average [pphm],WYONG SO2 monthly average [pphm],WALLSEND SO2 monthly average [pphm],CARRINGTON SO2 monthly average [pphm],...,BERESFIELD PM10 monthly average [µg/m³],TAMWORTH PM10 monthly average [µg/m³],WOLLONGONG PM10 monthly average [µg/m³],KEMBLA GRANGE PM10 monthly average [µg/m³],RICHMOND PM10 monthly average [µg/m³],BARGO PM10 monthly average [µg/m³],ALBURY PM10 monthly average [µg/m³],WAGGA WAGGA PM10 monthly average [µg/m³],ST MARYS PM10 monthly average [µg/m³],VINEYARD PM10 monthly average [µg/m³]
2,31/01/2008,0,,,,0.1,0.1,,0.1,,...,19,10.9,22.4,21.2,17.1,,22,,18.8,17.1
3,29/02/2008,0,,0,,0.1,0,,0.2,,...,16.7,11.4,17,15.7,12.5,,18.4,23.1,13.8,13.3
4,31/03/2008,0.1,,0,,0,0,,0.1,,...,19.9,16.4,18.7,19.3,14.5,,27.4,36.6,16.1,15.1


Partial Dataset 2012-2015 Head:


Unnamed: 0,Monthly Averages Time Range: 01/01/2012 00:00 to 01/01/2016 00:00,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 246,Unnamed: 247,Unnamed: 248,Unnamed: 249,Unnamed: 250,Unnamed: 251,Unnamed: 252,Unnamed: 253,Unnamed: 254,Unnamed: 255
0,Initial Data,RANDWICK SO2 1h average,ROZELLE SO2 1h average,LINDFIELD SO2 1h average,LIVERPOOL SO2 1h average,BRINGELLY SO2 1h average,CHULLORA SO2 1h average,WYONG SO2 1h average,WALLSEND SO2 1h average,CARRINGTON SO2 1h average,...,BERESFIELD PM10 1h average,TAMWORTH PM10 1h average,WOLLONGONG PM10 1h average,KEMBLA GRANGE PM10 1h average,RICHMOND PM10 1h average,BARGO PM10 1h average,ALBURY PM10 1h average,WAGGA WAGGA PM10 1h average,ST MARYS PM10 1h average,VINEYARD PM10 1h average
1,Date,RANDWICK SO2 monthly average [pphm],ROZELLE SO2 monthly average [pphm],LINDFIELD SO2 monthly average [pphm],LIVERPOOL SO2 monthly average [pphm],BRINGELLY SO2 monthly average [pphm],CHULLORA SO2 monthly average [pphm],WYONG SO2 monthly average [pphm],WALLSEND SO2 monthly average [pphm],CARRINGTON SO2 monthly average [pphm],...,BERESFIELD PM10 monthly average [µg/m³],TAMWORTH PM10 monthly average [µg/m³],WOLLONGONG PM10 monthly average [µg/m³],KEMBLA GRANGE PM10 monthly average [µg/m³],RICHMOND PM10 monthly average [µg/m³],BARGO PM10 monthly average [µg/m³],ALBURY PM10 monthly average [µg/m³],WAGGA WAGGA PM10 monthly average [µg/m³],ST MARYS PM10 monthly average [µg/m³],VINEYARD PM10 monthly average [µg/m³]
2,31/01/2012,0.1,,0,,0,0,,0.1,,...,20.6,12.5,20.2,18.9,15,16.4,13.9,,15.5,15.4
3,29/02/2012,0.1,,0.1,,0,0.1,,0.1,,...,16.5,11.4,15.7,13.1,11.6,12.8,12.7,,11.6,11.9
4,31/03/2012,0.1,,0,,0,0.1,,0,,...,17.2,12.3,16.5,17,11.4,12.3,13.3,,12.9,12


Partial Dataset 2016-2019 Head:


Unnamed: 0,Monthly Averages Time Range: 01/01/2016 00:00 to 01/01/2020 00:00,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 246,Unnamed: 247,Unnamed: 248,Unnamed: 249,Unnamed: 250,Unnamed: 251,Unnamed: 252,Unnamed: 253,Unnamed: 254,Unnamed: 255
0,Initial Data,RANDWICK SO2 1h average,ROZELLE SO2 1h average,LINDFIELD SO2 1h average,LIVERPOOL SO2 1h average,BRINGELLY SO2 1h average,CHULLORA SO2 1h average,WYONG SO2 1h average,WALLSEND SO2 1h average,CARRINGTON SO2 1h average,...,BERESFIELD PM10 1h average,TAMWORTH PM10 1h average,WOLLONGONG PM10 1h average,KEMBLA GRANGE PM10 1h average,RICHMOND PM10 1h average,BARGO PM10 1h average,ALBURY PM10 1h average,WAGGA WAGGA PM10 1h average,ST MARYS PM10 1h average,VINEYARD PM10 1h average
1,Date,RANDWICK SO2 monthly average [pphm],ROZELLE SO2 monthly average [pphm],LINDFIELD SO2 monthly average [pphm],LIVERPOOL SO2 monthly average [pphm],BRINGELLY SO2 monthly average [pphm],CHULLORA SO2 monthly average [pphm],WYONG SO2 monthly average [pphm],WALLSEND SO2 monthly average [pphm],CARRINGTON SO2 monthly average [pphm],...,BERESFIELD PM10 monthly average [µg/m³],TAMWORTH PM10 monthly average [µg/m³],WOLLONGONG PM10 monthly average [µg/m³],KEMBLA GRANGE PM10 monthly average [µg/m³],RICHMOND PM10 monthly average [µg/m³],BARGO PM10 monthly average [µg/m³],ALBURY PM10 monthly average [µg/m³],WAGGA WAGGA PM10 monthly average [µg/m³],ST MARYS PM10 monthly average [µg/m³],VINEYARD PM10 monthly average [µg/m³]
2,31/01/2016,0.1,0.1,0,,0,0,0.1,0.1,0.2,...,19.6,13.4,21.9,23,15.1,15.2,14.6,,16.5,16.7
3,29/02/2016,0.1,0.1,0.1,,0,0.1,0.1,0.1,0.2,...,22.1,17.9,21.7,24.7,18,16.6,20.8,,21.4,20.7
4,31/03/2016,0.1,0.1,0.1,,0.1,0.1,0.1,0.1,0.2,...,17.4,15.7,18.6,21,15.6,14.7,20.1,,14.8,17.5


Partial Dataset 2020-2023 Head:


Unnamed: 0,Monthly Averages Time Range: 01/01/2020 00:00 to 01/09/2024 00:00,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 246,Unnamed: 247,Unnamed: 248,Unnamed: 249,Unnamed: 250,Unnamed: 251,Unnamed: 252,Unnamed: 253,Unnamed: 254,Unnamed: 255
0,Initial Data,RANDWICK SO2 1h average,ROZELLE SO2 1h average,LINDFIELD SO2 1h average,LIVERPOOL SO2 1h average,BRINGELLY SO2 1h average,CHULLORA SO2 1h average,WYONG SO2 1h average,WALLSEND SO2 1h average,CARRINGTON SO2 1h average,...,BERESFIELD PM10 1h average,TAMWORTH PM10 1h average,WOLLONGONG PM10 1h average,KEMBLA GRANGE PM10 1h average,RICHMOND PM10 1h average,BARGO PM10 1h average,ALBURY PM10 1h average,WAGGA WAGGA PM10 1h average,ST MARYS PM10 1h average,VINEYARD PM10 1h average
1,Date,RANDWICK SO2 monthly average [pphm],ROZELLE SO2 monthly average [pphm],LINDFIELD SO2 monthly average [pphm],LIVERPOOL SO2 monthly average [pphm],BRINGELLY SO2 monthly average [pphm],CHULLORA SO2 monthly average [pphm],WYONG SO2 monthly average [pphm],WALLSEND SO2 monthly average [pphm],CARRINGTON SO2 monthly average [pphm],...,BERESFIELD PM10 monthly average [µg/m³],TAMWORTH PM10 monthly average [µg/m³],WOLLONGONG PM10 monthly average [µg/m³],KEMBLA GRANGE PM10 monthly average [µg/m³],RICHMOND PM10 monthly average [µg/m³],BARGO PM10 monthly average [µg/m³],ALBURY PM10 monthly average [µg/m³],WAGGA WAGGA PM10 monthly average [µg/m³],ST MARYS PM10 monthly average [µg/m³],VINEYARD PM10 monthly average [µg/m³]
2,31/01/2020,0,0,,,0,0.1,0.1,0.3,0.1,...,35.6,41.9,45.5,50.3,47.6,54,78.6,,52.7,
3,29/02/2020,0.1,0,,0.1,0,0.1,0.1,0.1,0,...,19.4,12.5,24.5,23.8,18.1,18.3,32.1,,19.6,
4,31/03/2020,0.1,0.1,,0,,0,0,0.1,0,...,15.8,11.9,17.3,20,14.1,14,18.3,,16.4,
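Each chunk carries its own time range in the title cell (for example `Monthly Averages Time Range: 01/01/2020 00:00 to 01/09/2024 00:00`), so a label could be read from the data instead of computed from a counter. The sketch below parses that title string; `RANGE_PATTERN` and `parse_time_range` are hypothetical names, and the day-first date format is an assumption based on the dates shown in the output above.

```python
import re
from datetime import datetime

# Matches the two dd/mm/yyyy dates in the title cell of a raw chunk.
RANGE_PATTERN = re.compile(
    r"Time Range: (\d{2}/\d{2}/\d{4}) \d{2}:\d{2} to (\d{2}/\d{2}/\d{4}) \d{2}:\d{2}"
)


def parse_time_range(header: str) -> tuple[datetime, datetime]:
    """Extract the start and end dates from a chunk's title string."""
    match = RANGE_PATTERN.search(header)
    if match is None:
        raise ValueError(f"No time range found in: {header!r}")
    start, end = (datetime.strptime(text, "%d/%m/%Y") for text in match.groups())
    return start, end


header = "Monthly Averages Time Range: 01/01/2020 00:00 to 01/09/2024 00:00"
start, end = parse_time_range(header)
print(f"Partial Dataset {start:%m/%Y} to {end:%m/%Y}")
```

In the notebook the title could be located with something like `next(c for c in dataset.columns if "Time Range" in str(c))`. Reading the ranges this way also shows that the 2008-2011 chunk ends at 01/12/2011 rather than 01/01/2012, so it may be worth confirming that December 2011 is present.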


## Combine Datasets

In [33]:
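# Promote the row holding the monthly-average labels (row 1) to column names.
# The column assignment modifies each DataFrame in place.
for dataset in datasets:
    dataset.columns = dataset.iloc[1]
    # Note: this only rebinds the loop variable. The frames stored in
    # `datasets` keep their two header rows, so those rows remain in the
    # combined output.
    dataset = dataset.iloc[2:]

# Concatenate the datasets.
combined_dataset = pd.concat(datasets)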
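Assigning to a loop variable does not change the list being iterated, which is why the `Initial Data` and `Date` rows of each chunk still appear in the combined head output. A minimal sketch of one way to drop them is to build a new list of cleaned frames. `promote_header` is a hypothetical helper, and the two tiny frames stand in for the real chunks with illustrative values only.

```python
import pandas as pd


def promote_header(raw: pd.DataFrame, header_row: int = 1) -> pd.DataFrame:
    """Return a copy of `raw` with `header_row` as column names and the header rows dropped."""
    cleaned = raw.iloc[header_row + 1:].copy()
    cleaned.columns = raw.iloc[header_row].tolist()
    return cleaned


# Stand-ins for two raw chunks (illustrative values, not real measurements).
chunk_a = pd.DataFrame([["Initial Data", "X 1h average"],
                        ["Date", "X monthly average"],
                        ["31/01/2000", 0.1]])
chunk_b = pd.DataFrame([["Initial Data", "X 1h average"],
                        ["Date", "X monthly average"],
                        ["31/01/2004", 0.2]])

# Collect the cleaned frames in a new list, then concatenate.
cleaned_chunks = [promote_header(chunk) for chunk in (chunk_a, chunk_b)]
combined = pd.concat(cleaned_chunks, ignore_index=True)
print(combined)
```

Passing `ignore_index=True` renumbers the rows from 0. Omit it to keep each chunk's own row numbers, as the notebook's output does.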

## Output Combined Raw Dataset

In [34]:
# Save the concatenated dataset.
combined_dataset.to_excel('raw-combined.xlsx', index=False, engine='openpyxl')

## View Combined Raw Dataset

In [36]:
print("Combined Dataset Head:")
display(combined_dataset.head())

print("Combined Dataset Tail:")
display(combined_dataset.tail())

Combined Dataset Head:


1,Date,RANDWICK SO2 monthly average [pphm],ROZELLE SO2 monthly average [pphm],LINDFIELD SO2 monthly average [pphm],LIVERPOOL SO2 monthly average [pphm],BRINGELLY SO2 monthly average [pphm],CHULLORA SO2 monthly average [pphm],WYONG SO2 monthly average [pphm],WALLSEND SO2 monthly average [pphm],CARRINGTON SO2 monthly average [pphm],...,BERESFIELD PM10 monthly average [µg/m³],TAMWORTH PM10 monthly average [µg/m³],WOLLONGONG PM10 monthly average [µg/m³],KEMBLA GRANGE PM10 monthly average [µg/m³],RICHMOND PM10 monthly average [µg/m³],BARGO PM10 monthly average [µg/m³],ALBURY PM10 monthly average [µg/m³],WAGGA WAGGA PM10 monthly average [µg/m³],ST MARYS PM10 monthly average [µg/m³],VINEYARD PM10 monthly average [µg/m³]
0,Initial Data,RANDWICK SO2 1h average,ROZELLE SO2 1h average,LINDFIELD SO2 1h average,LIVERPOOL SO2 1h average,BRINGELLY SO2 1h average,CHULLORA SO2 1h average,WYONG SO2 1h average,WALLSEND SO2 1h average,CARRINGTON SO2 1h average,...,BERESFIELD PM10 1h average,TAMWORTH PM10 1h average,WOLLONGONG PM10 1h average,KEMBLA GRANGE PM10 1h average,RICHMOND PM10 1h average,BARGO PM10 1h average,ALBURY PM10 1h average,WAGGA WAGGA PM10 1h average,ST MARYS PM10 1h average,VINEYARD PM10 1h average
1,Date,RANDWICK SO2 monthly average [pphm],ROZELLE SO2 monthly average [pphm],LINDFIELD SO2 monthly average [pphm],LIVERPOOL SO2 monthly average [pphm],BRINGELLY SO2 monthly average [pphm],CHULLORA SO2 monthly average [pphm],WYONG SO2 monthly average [pphm],WALLSEND SO2 monthly average [pphm],CARRINGTON SO2 monthly average [pphm],...,BERESFIELD PM10 monthly average [µg/m³],TAMWORTH PM10 monthly average [µg/m³],WOLLONGONG PM10 monthly average [µg/m³],KEMBLA GRANGE PM10 monthly average [µg/m³],RICHMOND PM10 monthly average [µg/m³],BARGO PM10 monthly average [µg/m³],ALBURY PM10 monthly average [µg/m³],WAGGA WAGGA PM10 monthly average [µg/m³],ST MARYS PM10 monthly average [µg/m³],VINEYARD PM10 monthly average [µg/m³]
2,31/01/2000,0.1,,0.1,,0,,,0.2,,...,16.8,,18.3,,15.3,,,,16.9,15.5
3,29/02/2000,0.1,,0.1,,0.1,,,0.2,,...,20.3,,27.8,,21.2,,,,21.5,18.5
4,31/03/2000,,,0.1,,0,,,0.2,,...,17.9,,21.8,,15,,,,15.7,14.5


Combined Dataset Tail:


1,Date,RANDWICK SO2 monthly average [pphm],ROZELLE SO2 monthly average [pphm],LINDFIELD SO2 monthly average [pphm],LIVERPOOL SO2 monthly average [pphm],BRINGELLY SO2 monthly average [pphm],CHULLORA SO2 monthly average [pphm],WYONG SO2 monthly average [pphm],WALLSEND SO2 monthly average [pphm],CARRINGTON SO2 monthly average [pphm],...,BERESFIELD PM10 monthly average [µg/m³],TAMWORTH PM10 monthly average [µg/m³],WOLLONGONG PM10 monthly average [µg/m³],KEMBLA GRANGE PM10 monthly average [µg/m³],RICHMOND PM10 monthly average [µg/m³],BARGO PM10 monthly average [µg/m³],ALBURY PM10 monthly average [µg/m³],WAGGA WAGGA PM10 monthly average [µg/m³],ST MARYS PM10 monthly average [µg/m³],VINEYARD PM10 monthly average [µg/m³]
53,30/04/2024,0.1,0.1,,0.1,0,,0.0,0.2,0.2,...,15.5,12.1,15.8,21.8,13.6,14.1,23.5,,15.6,
54,31/05/2024,0.1,0.1,,0.1,0,,0.0,0.2,0.2,...,14.8,13.0,12.9,16.5,11.3,10.7,24.2,,11.7,
55,30/06/2024,0.1,0.0,,0.1,0,,0.0,0.2,0.3,...,12.8,12.3,9.5,11.4,9.2,8.6,11.7,,10.3,
56,31/07/2024,0.1,0.0,,0.0,0,,0.0,0.2,0.2,...,13.1,10.5,9.7,14.5,8.6,7.9,10.5,,10.9,
57,31/08/2024,0.1,0.1,,0.1,0,,0.1,0.1,0.2,...,16.2,13.8,14.8,22.6,13.1,13.8,13.3,,13.2,
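The head output above still contains the `Initial Data` and `Date` rows, so a quick check for non-date rows can be useful before the file is passed to the cleaning step. The sketch below assumes the `Date` column holds `dd/mm/yyyy` strings as displayed; `non_date_rows` is a hypothetical helper, and the sample frame reuses a few values from the output above.

```python
import pandas as pd


def non_date_rows(frame: pd.DataFrame, date_column: str = "Date") -> pd.DataFrame:
    """Return the rows whose date cell cannot be parsed as dd/mm/yyyy."""
    parsed = pd.to_datetime(frame[date_column], format="%d/%m/%Y", errors="coerce")
    return frame[parsed.isna()]


# Small sample shaped like the combined output (first two rows are leftover headers).
sample = pd.DataFrame({
    "Date": ["Initial Data", "Date", "31/01/2000", "29/02/2000"],
    "RANDWICK SO2 monthly average [pphm]": [
        "RANDWICK SO2 1h average",
        "RANDWICK SO2 monthly average [pphm]",
        0.1,
        0.1,
    ],
})

leftovers = non_date_rows(sample)
print(f"{len(leftovers)} non-date rows found")
```

Run against `combined_dataset`, a non-zero count would indicate header rows that still need to be removed.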
