# Environment Canada weather station data

*March 24, 2022*

In [The Pudding newsletter today](https://mailchi.mp/pudding/dune-1280156?e=fc6ae8c1cd), there was a fantastic visualization titled "How many days since a record-high temperature?". I wanted to recreate the same idea, but for Canada, where I live. Here we go.

Start by importing pandas.

```python
import pandas as pd
```

Rather than importing our data right away, I'm going to import a master list of weather stations across Canada, which we'll use to grab the data from Environment Canada programmatically.

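```python
# header=2: the column names are on the third line of the file, so the first two lines are skipped
stations = pd.read_csv('../raw/RAW 2021 ENVIRONMENT CANADA WEATHER STATIONS.csv', encoding="latin-1", header=2)

display(stations.head())
```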

| | Name | Province | Climate ID | Station ID | WMO ID | TC ID | Latitude (Decimal Degrees) | Longitude (Decimal Degrees) | Latitude | Longitude | Elevation (m) | First Year | Last Year | HLY First Year | HLY Last Year | DLY First Year | DLY Last Year | MLY First Year | MLY Last Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACTIVE PASS | BRITISH COLUMBIA | 1010066 | 14 | | | 48.87 | -123.28 | 485200000 | -1231700000 | 4.0 | 1984 | 1996 | | | 1984.0 | 1996.0 | 1984.0 | 1996.0 |
| 1 | ALBERT HEAD | BRITISH COLUMBIA | 1010235 | 15 | | | 48.4 | -123.48 | 482400000 | -1232900000 | 17.0 | 1971 | 1995 | | | 1971.0 | 1995.0 | 1971.0 | 1995.0 |
| 2 | BAMBERTON OCEAN CEMENT | BRITISH COLUMBIA | 1010595 | 16 | | | 48.58 | -123.52 | 483500000 | -1233100000 | 85.3 | 1961 | 1980 | | | 1961.0 | 1980.0 | 1961.0 | 1980.0 |
| 3 | BEAR CREEK | BRITISH COLUMBIA | 1010720 | 17 | | | 48.5 | -124.0 | 483000000 | -1240000000 | 350.5 | 1910 | 1971 | | | 1910.0 | 1971.0 | 1910.0 | 1971.0 |
| 4 | BEAVER LAKE | BRITISH COLUMBIA | 1010774 | 18 | | | 48.5 | -123.35 | 483000000 | -1232100000 | 61.0 | 1894 | 1952 | | | 1894.0 | 1952.0 | 1894.0 | 1952.0 |


Next, we keep only the weather stations at airports. This is a quick and lazy way of getting one climate station for every major city in Canada, but you could also hunt down the ones you want manually. We also keep only stations whose records run through 2021 (`Last Year == 2021`), which filters out inactive stations.
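
To see what the name filter and the `Last Year` check do, here is a minimal sketch on a hand-made table. The station names and IDs below are made up for illustration. Only the column names (`Name`, `Station ID`, `Last Year`) mirror the real EC station list.

```python
import pandas as pd

# Hypothetical mini station list; only the column names mirror the real EC file
toy = pd.DataFrame({
    "Name": ["VICTORIA INTL A", "HALIFAX STANFIELD INT'L A",
             "BEAR CREEK", "OLD INTERNATIONAL FIELD"],
    "Station ID": [1, 2, 3, 4],
    "Last Year": [2021, 2021, 2021, 1995],
})

# Regex alternation matches any of the three spellings; case=False ignores case
is_airport = toy["Name"].str.contains("int'l|international|INTL", case=False)
# Stations whose records stop before 2021 are treated as inactive
is_active = toy["Last Year"] == 2021

airports = toy.loc[is_airport & is_active, "Station ID"].to_list()
print(airports)  # [1, 2]
```

"OLD INTERNATIONAL FIELD" passes the name test but fails the `Last Year` test. "BEAR CREEK" fails the name test.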

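```python
# Stations whose name mentions an international airport (any spelling, any case)
# and whose records extend to 2021, i.e. still reporting
airports_list = (
    stations
    .loc[
        stations["Name"].str.contains("int'l|international|INTL", case=False)
        & (stations["Last Year"] == 2021),
        "Station ID",
    ]
    .to_list()
)

airports_list
```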

[51337,
 51442,
 27793,
 50149,
 50430,
 27211,
 51441,
 50091,
 51097,
 49568,
 51459,
 51457,
 26892,
 51157,
 30165,
 49608,
 48568,
 50309,
 54282,
 53938,
 50620,
 50088,
 50089]

Now comes the real data import from Environment Canada (EC). A double loop (yikes, I know) requests daily records (`timeframe=2`) for every year from 1980 through 2022, for every airport in the list above. That is 23 stations × 43 years, or 989 separate CSV downloads. It takes a few minutes to run, but it gives us all the data we need to continue.
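
The loop makes one request per station-year. It can therefore help to build the URL in one place and to skip station-years that fail to download, so a single error doesn't stop the whole run. This is a minimal sketch. The helper names `bulk_url` and `fetch_daily` are hypothetical. The query parameters are copied from the URL in the loop below, with `timeframe=2` selecting daily data.

```python
import pandas as pd

BASE = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html"


def bulk_url(station_id: int, year: int, timeframe: int = 2) -> str:
    """Build the EC bulk-download URL used in the loop (timeframe=2 -> daily)."""
    return f"{BASE}?format=csv&stationID={station_id}&Year={year}&timeframe={timeframe}"


def fetch_daily(station_ids, years):
    """Download every station-year and tag rows with their Station ID (hypothetical helper).

    Failed downloads are collected in `failed` instead of aborting the run.
    """
    frames, failed = [], []
    for station_id in station_ids:
        for year in years:
            try:
                df = pd.read_csv(bulk_url(station_id, year))
            except OSError as exc:  # URLError and HTTPError are subclasses of OSError
                failed.append((station_id, year, str(exc)))
                continue
            df.insert(0, "Station ID", station_id)
            frames.append(df)
    raw = pd.concat(frames, axis=0, ignore_index=True) if frames else pd.DataFrame()
    return raw, failed
```

Usage would look like `raw, failed = fetch_daily(airports_list, range(1980, 2023))`. Afterwards, `failed` lists any station-years worth retrying.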

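```python
li = []

# One request per station per year; timeframe=2 asks EC for daily data
for station_id in airports_list:
    for year in range(1980, 2023):
        url = (
            "https://climate.weather.gc.ca/climate_data/bulk_data_e.html"
            f"?format=csv&stationID={station_id}&Year={year}&timeframe=2"
        )
        df = pd.read_csv(url)
        df.insert(0, "Station ID", station_id)  # tag rows with the station they came from
        li.append(df)

# Stack every station-year into one long table
raw = pd.concat(li, axis=0, ignore_index=True)
# Treat Climate IDs as strings so numeric and alphanumeric IDs compare consistently
raw["Climate ID"] = raw["Climate ID"].astype(str)

display(raw.head())
```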

| | Station ID | Longitude (x) | Latitude (y) | Station Name | Climate ID | Date/Time | Year | Month | Day | Data Quality | ... | Total Snow (cm) | Total Snow Flag | Total Precip (mm) | Total Precip Flag | Snow on Grnd (cm) | Snow on Grnd Flag | Dir of Max Gust (10s deg) | Dir of Max Gust Flag | Spd of Max Gust (km/h) | Spd of Max Gust Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 51337 | -123.43 | 48.65 | VICTORIA INTL A | 1018621 | 1980-01-01 | 1980 | 1 | 1 | | ... | | | | | | | | | | |
| 1 | 51337 | -123.43 | 48.65 | VICTORIA INTL A | 1018621 | 1980-01-02 | 1980 | 1 | 2 | | ... | | | | | | | | | | |
| 2 | 51337 | -123.43 | 48.65 | VICTORIA INTL A | 1018621 | 1980-01-03 | 1980 | 1 | 3 | | ... | | | | | | | | | | |
| 3 | 51337 | -123.43 | 48.65 | VICTORIA INTL A | 1018621 | 1980-01-04 | 1980 | 1 | 4 | | ... | | | | | | | | | | |
| 4 | 51337 | -123.43 | 48.65 | VICTORIA INTL A | 1018621 | 1980-01-05 | 1980 | 1 | 5 | | ... | | | | | | | | | | |


Now we can get into some analysis.

### Days since max temp record

Let's start with the number of days since a daily maximum-temperature record was last set. We're not looking for the last time a station recorded its highest temperature ever. Instead, we compare each calendar day with the same day in every previous year back to 1980 and find the year that holds that day's record.
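
Here is the idea on a tiny synthetic example: three years of made-up highs for two calendar days at one hypothetical station. Pivoting puts years in rows and calendar days in columns. `idxmax()` then returns, for each calendar day, the year that holds the record. `idxmax()` returns the first occurrence when there is a tie, so a tie is credited to the earlier year.

```python
import pandas as pd

# Hypothetical daily highs: two calendar days (Jan 1, Jan 2) over three years
raw_toy = pd.DataFrame({
    "Climate ID": ["X"] * 6,
    "Station Name": ["TOY STATION"] * 6,
    "Year": [1980, 1980, 1981, 1981, 1982, 1982],
    "Month": [1] * 6,
    "Day": [1, 2, 1, 2, 1, 2],
    "Max Temp (°C)": [5.0, 7.0, 9.0, 7.0, 6.0, 3.0],
})

# Rows = years, columns = (station, calendar day)
station_data = raw_toy.pivot(
    columns=["Climate ID", "Station Name", "Month", "Day"],
    index="Year",
    values="Max Temp (°C)",
)

# For each calendar day (column), the year (row label) with the highest value
record_years = station_data.idxmax()
print(record_years.to_list())  # [1981, 1980]
```

January 2 is a tie between 1980 and 1981 (7.0 °C), and `idxmax()` picks 1980.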

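```python
lis_max = []

for climate_id in raw["Climate ID"].astype(str).unique():

    # Rows = years, columns = (station, calendar day); drop days with no data at all
    station_data = (raw[raw["Climate ID"] == climate_id]
                    .pivot(columns=["Climate ID", "Station Name", "Month", "Day"], index="Year", values="Max Temp (°C)")
                    .dropna(how="all", axis=1)
                    )

    # For each calendar day, the year with the highest max temp is the record year
    # (named `records` rather than `max` to avoid shadowing the built-in)
    records = pd.DataFrame(station_data.idxmax()).reset_index().rename(columns={0: "Year"})
    records["date"] = pd.to_datetime(records[["Year", "Month", "Day"]])
    # pd.datetime is deprecated; pd.Timestamp.today() replaces it
    records["days_since_record"] = -(records["date"] - pd.Timestamp.today()).dt.days

    records = records[["Station Name", "date", "days_since_record"]].set_index("date")

    lis_max.append(records)

df = pd.concat(lis_max)
display(df.head())
```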



| date | Station Name | days_since_record |
|---|---|---|
| 2020-01-01 | VICTORIA INTL A | 820 |
| 2021-01-02 | VICTORIA INTL A | 453 |
| 2020-01-03 | VICTORIA INTL A | 818 |
| 2019-01-04 | VICTORIA INTL A | 1182 |
| 2015-01-05 | VICTORIA INTL A | 2642 |


Now we have the "days since last record" value for every calendar day at every station. Next, we group by station name and take the minimum. The smallest value for each station is the number of days since its most recent daily record.
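
A small sketch of this step follows. It uses made-up record dates for two hypothetical stations. It also uses a fixed reference date instead of `pd.Timestamp.today()`, so the numbers are reproducible. The sketch uses plain `(reference - date).dt.days`, which counts whole days elapsed. The notebook's `-(date - today).dt.days` form can come out one day higher, because `today` includes the current time of day.

```python
import pandas as pd

# Hypothetical record dates: one row per calendar day per station
records = pd.DataFrame({
    "Station Name": ["STATION A", "STATION A", "STATION B", "STATION B"],
    "date": pd.to_datetime(["2020-01-01", "2022-03-20", "2019-07-04", "2021-06-29"]),
})

reference = pd.Timestamp("2022-03-24")  # fixed "today" for reproducibility
records["days_since_record"] = (reference - records["date"]).dt.days

# The smallest gap per station is the time since its most recent record
latest = (records.groupby("Station Name")["days_since_record"]
          .min()
          .sort_values())
print(latest)
```

STATION A's most recent record is 4 days before the reference date. STATION B's is 268 days before.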

```python
max_values = df.groupby("Station Name").min().sort_values("days_since_record")

display(max_values.head())
```

| Station Name | days_since_record |
|---|---|
| VANCOUVER INTL A | 2 |
| GANDER INTL A | 2 |
| ST. JOHN'S INTL A | 3 |
| FREDERICTON INTL A | 4 |
| MONCTON / GREATER MONCTON ROMEO LEBLANC INTL A | 4 |


It might be nice to map this information, so we'll grab the lat/long data from the raw dataframe and join it to our max values dataframe.
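
The join lines up rows by index, so both frames need to be indexed by station name. The minimal sketch below uses made-up values, with shortened placeholder station names. It shows why `drop_duplicates` matters: the raw table has one row per station per day, so joining it without deduplication would repeat each station once per daily row.

```python
import pandas as pd

# Hypothetical per-station results (index = station name)
max_values = pd.DataFrame(
    {"days_since_record": [2, 7]},
    index=pd.Index(["VANCOUVER", "WINNIPEG"], name="Station Name"),
)

# Hypothetical raw rows: one per station per day, so names repeat
raw_toy = pd.DataFrame({
    "Station Name": ["VANCOUVER", "VANCOUVER", "WINNIPEG", "WINNIPEG"],
    "Latitude (y)": [49.19, 49.19, 49.91, 49.91],
    "Longitude (x)": [-123.18, -123.18, -97.24, -97.24],
})

# Keep one coordinate row per station, indexed by name for the join
locations = (raw_toy
             .drop_duplicates("Station Name")
             .set_index("Station Name"))

final = max_values.join(locations)
print(final)
```

The join matches names exactly. Two spellings of the same airport therefore come through as two separate stations.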

```python
# One coordinate row per station name, indexed by name so it can be joined
locations = (raw
             .loc[:, ["Station Name", "Latitude (y)", "Longitude (x)"]]
             .drop_duplicates("Station Name")
             .set_index("Station Name")
             )

final = max_values.join(locations)

display(final)
```

| Station Name | days_since_record | Latitude (y) | Longitude (x) |
|---|---|---|---|
| VANCOUVER INTL A | 2 | 49.19 | -123.18 |
| GANDER INTL A | 2 | 48.94 | -54.57 |
| ST. JOHN'S INTL A | 3 | 47.62 | -52.75 |
| FREDERICTON INTL A | 4 | 45.87 | -66.54 |
| MONCTON / GREATER MONCTON ROMEO LEBLANC INTL A | 4 | 46.11 | -64.68 |
| MONCTON/GREATER MONCTON ROMEO LEBLANC INTL A | 4 | 46.11 | -64.68 |
| HALIFAX STANFIELD INT'L A | 5 | 44.88 | -63.51 |
| WINNIPEG INTL A | 7 | 49.91 | -97.24 |
| EDMONTON INTL A | 7 | 53.31 | -113.58 |
| SASKATOON INTL A | 8 | 52.17 | -106.7 |


And there we have it: for each airport climate station, the number of days since a daily maximum-temperature record (compared against every year since 1980) was last broken, counted from the day the notebook was run.

### Days since min temp record

Now the same thing, but for minimum temperatures.
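
The minimum-temperature code repeats the maximum-temperature logic almost line for line. One option is to fold both into a single function. This is a sketch, not the notebook's code. The function name `days_since_records` and its `kind` and `today` parameters are hypothetical. `today` defaults to `pd.Timestamp.today()` but can be pinned for reproducible output. The function keeps the notebook's `-(date - today).dt.days` arithmetic.

```python
from typing import Optional

import pandas as pd


def days_since_records(raw: pd.DataFrame, column: str, kind: str = "max",
                       today: Optional[pd.Timestamp] = None) -> pd.DataFrame:
    """Days since the most recent daily record at each station (hypothetical helper).

    kind="max" uses idxmax (record highs); kind="min" uses idxmin (record lows).
    """
    if kind not in ("max", "min"):
        raise ValueError("kind must be 'max' or 'min'")
    if today is None:
        today = pd.Timestamp.today()

    ids = raw["Climate ID"].astype(str)
    per_day = []
    for climate_id in ids.unique():
        # Rows = years, columns = (station, calendar day); drop days with no data
        station_data = (raw[ids == climate_id]
                        .pivot(columns=["Climate ID", "Station Name", "Month", "Day"],
                               index="Year", values=column)
                        .dropna(how="all", axis=1))
        # Record year for each calendar day (first occurrence wins on ties)
        record_year = station_data.idxmax() if kind == "max" else station_data.idxmin()
        records = record_year.rename("Year").reset_index()
        records["date"] = pd.to_datetime(records[["Year", "Month", "Day"]])
        # Same arithmetic as the notebook cells
        records["days_since_record"] = -(records["date"] - today).dt.days
        per_day.append(records[["Station Name", "days_since_record"]])

    # Smallest gap per station = time since its most recent record
    return (pd.concat(per_day)
            .groupby("Station Name")
            .min()
            .sort_values("days_since_record"))
```

With the notebook's data, `days_since_records(raw, "Max Temp (°C)", kind="max")` would correspond to `max_values`. `days_since_records(raw, "Min Temp (°C)", kind="min")` would correspond to `min_values`.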

```python
lis_min = []

for climate_id in raw["Climate ID"].astype(str).unique():
    # Same pivot as before, but on the daily minimum temperature
    station_data = (raw[raw["Climate ID"] == climate_id]
                    .pivot(columns=["Climate ID", "Station Name", "Month", "Day"], index="Year", values="Min Temp (°C)")
                    .dropna(how="all", axis=1)
                    )

    # For each calendar day, the year with the lowest min temp is the record year
    records = pd.DataFrame(station_data.idxmin()).reset_index().rename(columns={0: "Year"})
    records["date"] = pd.to_datetime(records[["Year", "Month", "Day"]])
    records["days_since_record"] = -(records["date"] - pd.Timestamp.today()).dt.days

    records = records[["Station Name", "date", "days_since_record"]].set_index("date")
    lis_min.append(records)

df_min = pd.concat(lis_min)

# Most recent record per station = smallest days_since_record
min_values = df_min.groupby("Station Name").min().sort_values("days_since_record")

locations = (raw
             .loc[:, ["Station Name", "Latitude (y)", "Longitude (x)"]]
             .drop_duplicates("Station Name")
             .set_index("Station Name")
             )
final_min = min_values.join(locations)

display(final_min)
```



| Station Name | days_since_record | Latitude (y) | Longitude (x) |
|---|---|---|---|
| MONTREAL MIRABEL INTL A | 2 | 45.68 | -74.04 |
| QUEBEC INTL A | 2 | 46.79 | -71.39 |
| OTTAWA INTL A | 2 | 45.32 | -75.67 |
| HALIFAX STANFIELD INT'L A | 2 | 44.88 | -63.51 |
| MONCTON / GREATER MONCTON ROMEO LEBLANC INTL A | 2 | 46.11 | -64.68 |
| TORONTO INTL A | 3 | 43.68 | -79.63 |
| MONTREAL/PIERRE ELLIOTT TRUDEAU INTL | 3 | 45.47 | -73.74 |
| MONTREAL INTL A | 3 | 45.47 | -73.74 |
| WINNIPEG INTL A | 3 | 49.91 | -97.24 |
| REGINA INTL A | 6 | 50.43 | -104.67 |


That's all for now!

\-30\-