# Environment Canada weather station data

*March 24, 2022*

In [The Pudding newsletter today](https://mailchi.mp/pudding/dune-1280156?e=fc6ae8c1cd), there was a fantastic visualization titled "How many days since a record-high temperature?". I wanted to recreate the same idea, but for Canada, where I live. Here we go.

Start by importing pandas.

In [1]:
import pandas as pd
from datetime import timedelta

Rather than import our data right away, I'm going to import a master list of weather stations across Canada, which we'll use to programmatically grab the data from Environment Canada.

In [2]:
stations = pd.read_csv('../raw/RAW 2021 ENVIRONMENT CANADA WEATHER STATIONS.csv', encoding="latin-1", header=2)

display(stations.head())

,Name,Province,Climate ID,Station ID,WMO ID,TC ID,Latitude (Decimal Degrees),Longitude (Decimal Degrees),Latitude,Longitude,Elevation (m),First Year,Last Year,HLY First Year,HLY Last Year,DLY First Year,DLY Last Year,MLY First Year,MLY Last Year
0,ACTIVE PASS,BRITISH COLUMBIA,1010066,14,,,48.87,-123.28,485200000,-1231700000,4.0,1984,1996,,,1984.0,1996.0,1984.0,1996.0
1,ALBERT HEAD,BRITISH COLUMBIA,1010235,15,,,48.4,-123.48,482400000,-1232900000,17.0,1971,1995,,,1971.0,1995.0,1971.0,1995.0
2,BAMBERTON OCEAN CEMENT,BRITISH COLUMBIA,1010595,16,,,48.58,-123.52,483500000,-1233100000,85.3,1961,1980,,,1961.0,1980.0,1961.0,1980.0
3,BEAR CREEK,BRITISH COLUMBIA,1010720,17,,,48.5,-124.0,483000000,-1240000000,350.5,1910,1971,,,1910.0,1971.0,1910.0,1971.0
4,BEAVER LAKE,BRITISH COLUMBIA,1010774,18,,,48.5,-123.35,483000000,-1232100000,61.0,1894,1952,,,1894.0,1952.0,1894.0,1952.0


Next, we grab only weather stations at airports. This is a quick and lazy way of getting one climate station for every major city in Canada, but you could also hunt down the ones you want to use manually. We also use a filter to make sure we only get active weather stations.

In [3]:
airports_list = (stations
                .loc[(stations["Name"].str.contains("int'l|international|INTL", case=False)) & (stations["Last Year"] == 2021), ["Name", "Station ID"]]
                .set_index("Name")
                .drop(["MONCTON / GREATER MONCTON ROMEO LEBLANC INTL A", "MONTREAL MIRABEL INTL A", "MONTREAL/PIERRE ELLIOTT TRUDEAU INTL", "QUEBEC/JEAN LESAGE INTL", "GANDER INTL A", "CALGARY INT'L CS", "EDMONTON INTERNATIONAL CS"])
                .set_index("Station ID")
                .drop(50620)
                .index
                .to_list()
                )

airports_list

[51337,
 51442,
 50149,
 50430,
 51441,
 50091,
 51097,
 49568,
 51459,
 51457,
 51157,
 48568,
 50309,
 53938,
 50089]
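
To make the filter above easier to follow, here is a minimal sketch of the same two conditions on a toy table. The station names, IDs and years below are made up for illustration; only the regex pattern and the `Last Year == 2021` check come from the cell above.

```python
import pandas as pd

# Toy stand-in for the station list; every value here is hypothetical
toy_stations = pd.DataFrame({
    "Name": ["EXAMPLE INTL A", "EXAMPLE INT'L CS", "EXAMPLE HARBOUR", "OLDTOWN INTERNATIONAL A"],
    "Station ID": [1, 2, 3, 4],
    "Last Year": [2021, 2021, 2021, 1995],
})

# Case-insensitive match on any of the three spellings of "international"
is_airport = toy_stations["Name"].str.contains("int'l|international|INTL", case=False)
# Keep only stations that were still reporting in 2021
is_active = toy_stations["Last Year"] == 2021

selected = toy_stations.loc[is_airport & is_active, "Station ID"].to_list()
print(selected)  # [1, 2]
```

Both "EXAMPLE INTL A" and "EXAMPLE INT'L CS" survive here, which is why the real cell goes on to drop duplicate stations for the same city by hand.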

Now comes the real data import from EC. A nested loop (yikes, I know) grabs daily records for every year from 1980 to now, for every airport in our list above. The code takes a few minutes to run, but it gives us all the data we need to continue.

In [4]:
li = []

for station_id in airports_list:
    for year in range(1980, 2023):
        df = pd.read_csv(f'https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID={str(station_id)}&Year={year}&timeframe=2')
        df.insert(0, "Station ID", station_id)
        li.append(df)

raw = pd.concat(li, axis=0, ignore_index=True)
raw["Climate ID"] = raw["Climate ID"].astype(str)

display(raw.head())

,Station ID,Longitude (x),Latitude (y),Station Name,Climate ID,Date/Time,Year,Month,Day,Data Quality,...,Total Snow (cm),Total Snow Flag,Total Precip (mm),Total Precip Flag,Snow on Grnd (cm),Snow on Grnd Flag,Dir of Max Gust (10s deg),Dir of Max Gust Flag,Spd of Max Gust (km/h),Spd of Max Gust Flag
0,51337,-123.43,48.65,VICTORIA INTL A,1018621,1980-01-01,1980,1,1,,...,,,,,,,,,,
1,51337,-123.43,48.65,VICTORIA INTL A,1018621,1980-01-02,1980,1,2,,...,,,,,,,,,,
2,51337,-123.43,48.65,VICTORIA INTL A,1018621,1980-01-03,1980,1,3,,...,,,,,,,,,,
3,51337,-123.43,48.65,VICTORIA INTL A,1018621,1980-01-04,1980,1,4,,...,,,,,,,,,,
4,51337,-123.43,48.65,VICTORIA INTL A,1018621,1980-01-05,1980,1,5,,...,,,,,,,,,,
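
If the nested loop bothers you, one possible alternative is to build the full list of station/year requests up front with `itertools.product` and then loop over that list once. This is only a sketch: the helper names `bulk_csv_url` and `build_requests` are my own, and the snippet builds the URLs without downloading anything. The URL pattern is the one used in the cell above.

```python
from itertools import product

BASE_URL = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html"


def bulk_csv_url(station_id, year):
    """Build the CSV URL for one station and one year (same query string as above)."""
    return f"{BASE_URL}?format=csv&stationID={station_id}&Year={year}&timeframe=2"


def build_requests(station_ids, first_year, last_year):
    """Return (station_id, year, url) tuples for every station/year pair, inclusive."""
    years = range(first_year, last_year + 1)
    return [(sid, yr, bulk_csv_url(sid, yr)) for sid, yr in product(station_ids, years)]


# Two station IDs from the airport list, 1980 through 2022
requests_to_make = build_requests([51337, 51442], 1980, 2022)
print(len(requests_to_make))  # 2 stations x 43 years = 86
print(requests_to_make[0][2])
```

Each tuple could then be passed to `pd.read_csv` in a single loop, which also makes it easier to wrap the download in a `try`/`except` or to skip files you have already saved locally.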


Now we can get into some analysis.

### Days since max temp record

Let's start by looking at the number of days since a daily maximum temperature record was last broken. Note that we're not looking for the last time a station recorded its HIGHEST temperature ever, but rather comparing each calendar day to that same day in previous years, going back to 1980.

In [16]:
lis_max = []

for climate_id in raw["Climate ID"].astype(str).unique():
    
    station_data = (raw[raw["Climate ID"] == climate_id]
                    .pivot(columns=["Climate ID", "Station Name", "Month", "Day"], index="Year", values="Max Temp (°C)")
                    .dropna(how="all", axis=1)
                    )
    
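    # For each calendar day (one column each), find the year with the highest max temperature
    records = pd.DataFrame(station_data.idxmax()).reset_index().rename(columns={0: "Year"})
    
    records["date"] = pd.to_datetime(records[["Year", "Month", "Day"]])
    # pd.datetime was removed in pandas 2.0, so use pd.Timestamp.today() instead
    records["days_since_record"] = -(records["date"] - pd.Timestamp.today()).dt.days

    records = records[["Station Name", "date", "days_since_record"]].set_index("date")
    
    lis_max.append(records)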
    
df = pd.concat(lis_max).reset_index()
display(df.head())


,date,Station Name,days_since_record
0,2020-01-01,VICTORIA INTL A,832
1,2021-01-02,VICTORIA INTL A,465
2,2020-01-03,VICTORIA INTL A,830
3,2019-01-04,VICTORIA INTL A,1194
4,2015-01-05,VICTORIA INTL A,2654
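
The pivot-then-`idxmax` step is the heart of this analysis, so here is a small sketch of it on toy data. The temperatures are made up, and I use a fixed reference date instead of today's date so the numbers are reproducible; both are assumptions for illustration only.

```python
import pandas as pd

# Toy data: two calendar days (Jan 1 and Jan 2) observed over three years
toy = pd.DataFrame({
    "Year":  [2019, 2020, 2021, 2019, 2020, 2021],
    "Month": [1, 1, 1, 1, 1, 1],
    "Day":   [1, 1, 1, 2, 2, 2],
    "Max Temp (°C)": [3.0, 7.5, 5.0, 4.0, 2.0, 9.0],
})

# One row per year, one column per calendar day
by_year = toy.pivot(index="Year", columns=["Month", "Day"], values="Max Temp (°C)")

# For each calendar day, the year in which the highest value occurred
record_years = by_year.idxmax().rename("Year").reset_index()
record_years["date"] = pd.to_datetime(record_years[["Year", "Month", "Day"]])

# Fixed reference date (hypothetical) in place of "today"
reference = pd.Timestamp("2022-04-11")
record_years["days_since_record"] = (reference - record_years["date"]).dt.days

print(record_years[["date", "days_since_record"]])
```

Here January 1 was warmest in 2020 and January 2 was warmest in 2021, giving 831 and 464 days respectively. Note that `idxmax` returns the first occurrence of the maximum, so when two years tie, the earlier year is kept as the record holder.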


Now that we've got the "days since last record" information for every day of the year, we need to group by station name and return the minimum value.

In [6]:
max_values = df.pivot_table(index="Station Name", values=["days_since_record"], aggfunc="min").sort_values("days_since_record")
max_values["date"] = max_values["days_since_record"].apply(lambda x: pd.Timestamp.today() - timedelta(days=x)).astype(str).str.slice(0, 10)


display(max_values)


Station Name,days_since_record,date
SASKATOON INTL A,3,2022-04-08
EDMONTON INTL A,4,2022-04-07
MONCTON/GREATER MONCTON ROMEO LEBLANC INTL A,4,2022-04-07
CALGARY INTL A,5,2022-04-06
FREDERICTON INTL A,6,2022-04-05
OTTAWA INTL A,6,2022-04-05
QUEBEC INTL A,8,2022-04-03
ST. JOHN'S INTL A,8,2022-04-03
HALIFAX STANFIELD INT'L A,9,2022-04-02
REGINA INTL A,12,2022-03-30


It might be nice to map this information, so we'll grab the lat/long data from the raw dataframe and join it to our max values dataframe.

In [7]:
locations = (raw
             .loc[:, ["Station Name", "Latitude (y)", "Longitude (x)"]]
             .drop_duplicates("Station Name")
             .set_index("Station Name")
             )

final = (max_values
         .join(locations)
         )

final.index = (final.index
               .str.replace(" INTL A", "")
               .str.replace(" INT'L A", "")
               .str.replace(" INT'L CS", "")
               .str.replace(" INTERNATIONAL CS", "")
               .str.replace(" STANFIELD", "")
               .str.replace("/GREATER MONCTON ROMEO LEBLANC", "")
               )

display(final)

Station Name,days_since_record,date,Latitude (y),Longitude (x)
SASKATOON,3,2022-04-08,52.17,-106.7
EDMONTON,4,2022-04-07,53.31,-113.58
MONCTON,4,2022-04-07,46.11,-64.68
CALGARY,5,2022-04-06,51.12,-114.01
FREDERICTON,6,2022-04-05,45.87,-66.54
OTTAWA,6,2022-04-05,45.32,-75.67
QUEBEC,8,2022-04-03,46.79,-71.39
ST. JOHN'S,8,2022-04-03,47.62,-52.75
HALIFAX,9,2022-04-02,44.88,-63.51
REGINA,12,2022-03-30,50.43,-104.67
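
The chain of `.str.replace` calls used to tidy the station names comes up more than once in this notebook, so it could be wrapped in a small helper. This is a sketch only: `clean_station_names` and `SUFFIXES` are names I made up, and the list of strings to strip is taken directly from the cell above.

```python
import pandas as pd

# Substrings removed from station names, in the same order as the cell above
SUFFIXES = [
    " INTL A",
    " INT'L A",
    " INT'L CS",
    " INTERNATIONAL CS",
    " STANFIELD",
    "/GREATER MONCTON ROMEO LEBLANC",
]


def clean_station_names(names):
    """Strip airport suffixes from a pandas Index of station names."""
    for suffix in SUFFIXES:
        # regex=False treats each suffix as literal text, not a pattern
        names = names.str.replace(suffix, "", regex=False)
    return names


example = pd.Index([
    "HALIFAX STANFIELD INT'L A",
    "MONCTON/GREATER MONCTON ROMEO LEBLANC INTL A",
    "ST. JOHN'S INTL A",
])
print(clean_station_names(example).tolist())  # ['HALIFAX', 'MONCTON', "ST. JOHN'S"]
```

With a helper like this, tidying an index becomes a single line such as `final.index = clean_station_names(final.index)`.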


And there we have it: the number of days since a daily maximum temperature record (measured against data going back to 1980) was last broken at various airport climate stations.

### Days since min temp record

Now the same thing, but for minimum temperatures.

In [8]:
lis_min = []

for climate_id in raw["Climate ID"].astype(str).unique():
    station_data = raw[raw["Climate ID"] == climate_id].pivot(columns=["Climate ID", "Station Name", "Month", "Day"], index="Year", values="Min Temp (°C)").dropna(how="all", axis=1)
    
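    # For each calendar day (one column each), find the year with the lowest min temperature
    records = pd.DataFrame(station_data.idxmin()).reset_index().rename(columns={0: "Year"})
    records["date"] = pd.to_datetime(records[["Year", "Month", "Day"]])
    # pd.datetime was removed in pandas 2.0, so use pd.Timestamp.today() instead
    records["days_since_record"] = -(records["date"] - pd.Timestamp.today()).dt.days

    records = records[["Station Name", "date", "days_since_record"]].set_index("date")
    lis_min.append(records)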
    
df_min = pd.concat(lis_min)



min_values = df_min.groupby("Station Name").min().sort_values("days_since_record")

locations = raw.loc[:, ["Station Name", "Latitude (y)", "Longitude (x)"]].drop_duplicates("Station Name").set_index("Station Name")
final_min = min_values.join(locations)

final_min.index = (final_min.index
               .str.replace(" INTL A", "")
               .str.replace(" INT'L A", "")
               .str.replace(" INT'L CS", "")
               .str.replace(" INTERNATIONAL CS", "")
               .str.replace(" STANFIELD", "")
               .str.replace("/GREATER MONCTON ROMEO LEBLANC", "")
               )

display(final_min)


Station Name,days_since_record,Latitude (y),Longitude (x)
CALGARY,2,51.12,-114.01
HALIFAX,5,44.88,-63.51
QUEBEC,6,46.79,-71.39
VICTORIA,6,48.65,-123.43
MONTREAL,13,45.47,-73.74
OTTAWA,13,45.32,-75.67
TORONTO,13,43.68,-79.63
WINNIPEG,15,49.91,-97.24
REGINA,18,50.43,-104.67
VANCOUVER,19,49.19,-123.18


Let's now join this to our max table!

In [9]:
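# Named "combined" rather than "all" to avoid shadowing Python's built-in all()
combined = final.join(final_min, rsuffix="_min", how="left").drop_duplicates()
combined.index = combined.index.str.capitalize()

combined[["days_since_record", "days_since_record_min"]].to_clipboard()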
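
For anyone unsure what `rsuffix` and `how="left"` do in that join, here is a minimal sketch on two tiny tables. The day counts are borrowed from the outputs above, but the tables themselves are cut down for illustration.

```python
import pandas as pd

# Cut-down "max" table: two stations
max_side = pd.DataFrame(
    {"days_since_record": [3, 5]},
    index=pd.Index(["SASKATOON", "CALGARY"], name="Station Name"),
)

# Cut-down "min" table: only one of those stations
min_side = pd.DataFrame(
    {"days_since_record": [2]},
    index=pd.Index(["CALGARY"], name="Station Name"),
)

# Both tables share a column name, so the right-hand one gets the "_min" suffix;
# how="left" keeps every row of max_side even when min_side has no match
joined = max_side.join(min_side, rsuffix="_min", how="left")
print(joined)
```

Saskatoon has no row in the right-hand table, so its `days_since_record_min` comes back as `NaN` rather than the row being dropped.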

That's all for now!

\-30\-

### Temperatures over time

In [10]:


pivot = (raw
        .pivot_table(index="Date/Time", columns="Station Name", values="Max Temp (°C)", aggfunc="max")
)

pivot.columns = (pivot.columns
               .str.replace(" INTL A", "")
               .str.replace(" INT'L A", "")
               .str.replace(" INT'L CS", "")
               .str.replace(" INTERNATIONAL CS", "")
               .str.replace(" STANFIELD", "")
               .str.replace("/GREATER MONCTON ROMEO LEBLANC", "")
               )

pivot.dropna()

Date/Time,CALGARY,EDMONTON,FREDERICTON,HALIFAX,MONCTON,MONTREAL,OTTAWA,QUEBEC,REGINA,SASKATOON,ST. JOHN'S,TORONTO,VANCOUVER,VICTORIA,WINNIPEG
2019-03-20,16.8,10.1,5.1,4.3,3.3,7.2,6.7,4.3,10.7,9.2,-1.8,9.7,13.7,21.4,3.8
2019-03-21,13.7,7.0,8.9,6.7,8.1,7.2,6.0,5.9,11.1,11.6,5.2,7.9,16.7,15.7,5.7
2019-03-22,15.2,7.1,4.6,7.4,6.0,3.2,2.7,2.5,11.0,14.8,4.0,3.6,17.3,13.2,4.9
2019-03-23,10.8,2.8,4.5,3.9,4.1,1.2,0.9,5.0,15.1,8.8,12.4,5.1,13.4,13.4,6.6
2019-03-26,14.1,2.4,-0.1,1.8,-0.5,0.1,0.4,-3.1,16.2,13.5,-1.4,4.6,13.0,12.1,2.9
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-04-06,8.7,9.5,13.0,11.1,11.9,14.1,16.1,5.2,8.0,5.0,2.5,12.1,10.6,11.5,1.6
2022-04-07,17.5,16.2,11.6,10.1,7.5,7.5,7.4,4.3,8.9,10.5,1.9,13.8,13.3,16.5,4.2
2022-04-08,21.1,24.0,7.9,7.7,11.7,7.1,10.0,1.9,13.7,17.3,3.9,9.7,11.8,12.8,5.1
2022-04-09,6.8,9.5,15.3,12.8,13.9,7.9,10.7,5.1,11.0,14.2,2.3,7.8,9.5,10.7,8.5
