Code to concatenate all climate data and filter out unwanted stations.

In [2]:
import os
import pandas as pd

## Code to concatenate

    Columns Explained:    
    Station Name: The name of the weather station.
    Longitude: The longitude coordinate of the weather station.
    Latitude: The latitude coordinate of the weather station.
    Climate ID: An identifier for the weather station.
    Province: The province where the weather station is located.
    Year: The year of the observation.
    Month: The month of the observation.
    Tm: Mean temperature for the month (degrees Celsius).
    DwTm: Days without valid mean temperature.
    D: Mean temperature difference from normal (degrees Celsius).
    Tx: Maximum temperature for the month (degrees Celsius).
    DwTx: Days without valid maximum temperature.
    Tn: Minimum temperature for the month (degrees Celsius).
    DwTn: Days without valid minimum temperature.
    S: Snowfall for the month (cm).
    DwS: Days without valid snowfall.
    S%N: Percent of normal snowfall.
    P: Total precipitation for the month (mm).
    DwP: Days without valid precipitation.
    P%N: Percent of normal precipitation.
    S_G: Snow on the ground at the end of the month (cm).
    Pd: Number of days with precipitation.
    BS: Bright sunshine hours for the month.
    DwBS: Days without valid sunshine.
    BS%: Percent of normal bright sunshine.
    HDD: Heating degree days (a measure of how much heating is required).
    CDD: Cooling degree days (a measure of how much cooling is required).
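
To make the HDD and CDD definitions concrete, here is a minimal sketch of how degree days can be accumulated from daily mean temperatures. The function name `degree_days` and the sample temperatures are hypothetical, and the 18 degrees Celsius base is an assumption about the convention behind these columns, not something stated in the data files used here.

```python
def degree_days(daily_mean_temps, base=18.0):
    """Return (HDD, CDD) for a sequence of daily mean temperatures in degrees Celsius.

    Assumption: degree days are measured against a base of 18 degrees Celsius.
    """
    # Heating degree days: how far each day's mean falls below the base.
    hdd = sum(max(base - t, 0.0) for t in daily_mean_temps)
    # Cooling degree days: how far each day's mean rises above the base.
    cdd = sum(max(t - base, 0.0) for t in daily_mean_temps)
    return hdd, cdd


# Three made-up days: one cold, one exactly at the base, one warm.
hdd, cdd = degree_days([10.0, 18.0, 21.5])
print(hdd, cdd)
```

A day whose mean equals the base contributes to neither total, so only the cold day adds to HDD and only the warm day adds to CDD.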

In [2]:
# Directory paths for all provinces (local)
directory_path_ON = 'C:/Users/sixte/University of Toronto/ClimateDataMonthlyON/'
directory_path_QC = 'C:/Users/sixte/University of Toronto/ClimateDataMonthlyQC/'
directory_path_MB = 'C:/Users/sixte/University of Toronto/ClimateDataMonthlyMB/'

# Initialize list to store dataframes.
dataframes = []

# Loop through the files in the directory of each of the three provinces.
for directory_path in (directory_path_ON, directory_path_QC, directory_path_MB):
    for filename in os.listdir(directory_path):
        file_path = os.path.join(directory_path, filename)

        # Some CSV files have formatting problems, so a try/except block is used to skip them instead of crashing.
        try:
            # Files are encoded in latin1.
            df = pd.read_csv(file_path, encoding='latin1')
            dataframes.append(df)
        except Exception as e:
            # Report which file failed so it can be inspected later.
            print(f"Error reading file {file_path}: {e}")

# All monthly data is concatenated into one dataframe.
climateM = pd.concat(dataframes, ignore_index = True)

# A Date column is added to make future pairing easier. Each observation is dated the first day of its month.
climateM['Date'] = pd.to_datetime(climateM[['Year', 'Month']].assign(DAY=1))

# Drop columns that aren't needed (see explanation above) to make the final file easier to manage.
columns_to_drop = ['Province', 'DwTm', 'DwTx', 'DwTn', 'DwS', 'DwP', 'DwBS', 'Year', 'Month', 'D', 'S', 'S%N', 'S_G', 'HDD', 'CDD']

climateM = climateM.drop(columns = columns_to_drop)

# Rename Climate ID to Station ID. This matches the name of the ID column in a separate hydrological dataset, which was later omitted because it made the project too complex.
climateM.rename(columns={'Climate ID': 'Station ID'}, inplace = True)

# Save the Climate DataFrame as one file.
climateM.to_csv('C:/Users/sixte/University of Toronto/ClimateDataMonthly/concatenated_climate_monthly.csv', index = False)
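
As a quick check of the date, drop and rename steps above, the following sketch applies them to a tiny hand-made table. The station names, ID values and temperatures are invented for illustration only; the column names follow the ones used in the notebook.

```python
import pandas as pd

# Hypothetical miniature version of the concatenated monthly data.
toy = pd.DataFrame({
    'Station Name': ['A', 'A', 'B'],
    'Climate ID': ['0000001', '0000001', '0000002'],  # made-up IDs
    'Year': [2000, 2000, 2001],
    'Month': [1, 2, 12],
    'Tm': [-15.2, -12.0, -18.4],
    'DwTm': [0, 1, 0],
})

# pd.to_datetime can assemble dates from year/month/day columns;
# adding a constant DAY=1 column dates every row the first of its month.
toy['Date'] = pd.to_datetime(toy[['Year', 'Month']].assign(DAY=1))

# Drop the columns that are no longer needed and rename the ID column.
toy = toy.drop(columns=['Year', 'Month', 'DwTm'])
toy = toy.rename(columns={'Climate ID': 'Station ID'})

print(toy)
```

Because `Date` is built before `Year` and `Month` are dropped, the order of these steps matters: dropping first would leave nothing to assemble the date from.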


## Code to filter

Here we filter out the Manitoba and Quebec stations that are far from Ontario's borders.\
Some stations in these provinces are closer to parts of Ontario than any Ontario station is, so their data will be useful to the model we are building.\
Stations that are far away can be omitted, which means most of them are filtered out.

In [3]:
# Read file
# climateM = pd.read_csv('C:/Users/sixte/University of Toronto/ClimateDataMonthly/concatenated_climate_monthly.csv')

# Longitude filtering
climateM = climateM[climateM['Longitude'] >= -96]
climateM = climateM[climateM['Longitude'] <= -78.5]

# Latitude filtering
climateM = climateM[climateM['Latitude'] <= 57]

# Removing data from 1997, 1998 and 1999 that was originally downloaded. These years won't be included in the model anyway.
climateM = climateM[climateM['Date'] >= '2000-01-01']

# Save as new file
climateM.to_csv('C:/Users/sixte/University of Toronto/ClimateDataMonthly/all_filtered_monthly.csv', index = False)
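
The filtering above can also be written as a single boolean mask, which makes the bounding box easier to adjust. This is a minimal sketch, not a replacement for the cell above: the helper name `filter_stations` and the toy rows are hypothetical, the default bounds are the ones used in the notebook, and it assumes `Date` is already a datetime column (which is not the case if the CSV is re-read without parsing dates).

```python
import pandas as pd


def filter_stations(df, lon_min=-96.0, lon_max=-78.5, lat_max=57.0,
                    start='2000-01-01'):
    """Keep rows inside the longitude/latitude bounds and on or after `start`."""
    mask = (
        # Series.between is inclusive on both ends by default,
        # matching the >= and <= comparisons used above.
        df['Longitude'].between(lon_min, lon_max)
        & (df['Latitude'] <= lat_max)
        & (df['Date'] >= pd.Timestamp(start))
    )
    return df[mask]


# Made-up rows: one that passes and one that fails each condition.
toy = pd.DataFrame({
    'Station Name': ['inside', 'too far west', 'too far east',
                     'too far north', 'too early'],
    'Longitude': [-85.0, -100.0, -70.0, -85.0, -85.0],
    'Latitude': [48.0, 50.0, 47.0, 60.0, 48.0],
    'Date': pd.to_datetime(['2005-06-01', '2005-06-01', '2005-06-01',
                            '2005-06-01', '1999-12-01']),
})

kept = filter_stations(toy)
print(kept['Station Name'].tolist())
```

Note that the mask is applied to every row regardless of province, so any station outside the box is dropped, not only those in Manitoba and Quebec.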