![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcurriculum-notebooks&branch=master&subPath=Science/ClimateAcrossProvinces/climate-across-provinces.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"></a>

# Climate Across Provinces

Studying climate patterns across Canadian provinces offers insight into how weather varies across the country. By analyzing temperatures in specific months, we can learn how climate affects agriculture, biodiversity, and disaster preparedness.

*Curriculum Connections*
- [Investigating factors affecting Climate](https://www.alberta.ca/curriculum-science) 

*Investigating Questions*
- Which provinces experience the highest and lowest annual temperatures? What geographical distinctions contribute to these temperature differences?
- Does the variation in elevation significantly affect temperatures across provinces?
- What natural elements, such as proximity to large water bodies or the presence of mountain ranges, can play a role in influencing temperatures across provinces?

***

### Import the Data
The code below imports the Python libraries we need to load, organize, and visualize the data that will answer our questions. `▶Run` the code cell below.

In [None]:
import pandas as pd
import plotly.express as px 
import geopandas as gpd
import folium
print('Libraries imported.')

### Analysis

Let's begin exploring our dataset containing information about provincial temperatures across Canada. The dataset was obtained from [Kaggle](https://www.kaggle.com/datasets/hemil26/canada-weather), and the data was originally scraped from [Wikipedia](https://en.wikipedia.org/wiki/Temperature_in_Canada).

When the dataset is imported, it is stored as a dataframe. You can think of a dataframe as essentially a spreadsheet, where each row represents a record or an observation, while each column represents a different attribute or variable.
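
To make the spreadsheet comparison concrete, here is a minimal sketch of a dataframe built by hand. The town names and elevations are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical values, for illustration only (not from the real dataset)
example = pd.DataFrame({
    "City": ["Town A", "Town B", "Town C"],   # one column per attribute
    "Elevation (m)": [12.0, 650.0, 1084.0],
})

print(example.shape)           # (number of rows, number of columns)
print(list(example.columns))   # the column names
```

Each row is one observation (a town) and each column is one variable, just like the rows and columns of a spreadsheet.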

In [None]:
climate_data = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/Science/ClimateAcrossProvinces/canada_weather.csv')
climate_data

We notice that in our dataset, we have *10* different columns. 

|Column|Description|
|-|-|
|Community|city and province of the climate data|
|Weather station|3-character airport code|
|Location|latitude and longitude coordinates of the station|
|Elevation|height of the station above sea level|

The remaining columns hold average high and low temperatures for January, for July, and for the whole year.

### Data Cleaning

Sometimes mistakes or errors can happen when collecting information. Data cleaning involves finding and fixing these mistakes. It's important because clean data helps us make better decisions and find meaningful patterns.

In our particular situation, we have a couple of things we want to adjust within our dataframe. These include separating provincial codes (e.g. AB, BC, NU) from city names and cleaning the temperature and elevation columns. We want each entry in the temperature and elevation columns to be a single number, which makes these columns easier to work with later.

To begin, let's take the provincial codes from each city and put the codes into their own column. 

Note: To reduce showing large outputs of **climate_data** every time something is adjusted, we will be using *.sample(5)* to show a random sample of 5 different rows in the data as a means of showing change in the dataframe.
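
As a small sketch of how the provincial codes can be pulled out, the example below slices the last two characters of each string. It assumes every entry ends with a two-letter code, in a "City, XX" format; the three sample strings are illustrative.

```python
import pandas as pd

# Assumed format: "City, XX", where XX is the two-letter provincial code
communities = pd.Series(["Calgary, AB", "Vancouver, BC", "Resolute, NU"])

# .str[-2:] takes the last two characters of every string in the Series
codes = communities.str[-2:]
print(codes.tolist())
```

This approach only works when the code is always the final two characters; an entry with trailing spaces or a longer suffix would need extra cleaning first.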

In [None]:
climate_data['Province'] = climate_data.Community.str[-2:]
climate_data.sample(5)

Perfect! We now have a new column called `Province` which contains only provincial codes. The next issue we want to tackle is in regard to how our data is presented in columns, particularly the temperature-related columns. 

In these columns, we only want individual temperature values without brackets. This would allow us to store a number, or float (decimal number), for each column entry instead of a string (text). We can fix this by finding the first `(` bracket and cutting off the string at that point. Afterwards, we replace the Unicode minus sign (`−`) used in the source data with the ordinary hyphen-minus (`-`) that Python recognizes in numbers. We can then convert our string to a float value.
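
The steps can be traced on a single value. The entry below is hypothetical, written in the "°C (°F)" style the column names suggest, and it starts with the Unicode minus sign (U+2212) rather than the keyboard hyphen.

```python
# Hypothetical entry: Unicode minus (U+2212), Celsius value, Fahrenheit in brackets
raw = "\u2212" + "12.5 (9.5)"

number_part = raw.split("(")[0].strip()         # keep the text before the first "("
ascii_minus = number_part.replace("\u2212", "-")  # swap U+2212 for the ASCII "-"
value = float(ascii_minus)                      # now Python can parse it

print(value)
```

Without the replacement step, `float()` would reject the string, because it does not treat U+2212 as a sign character.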

In [None]:
def clean_and_convert_to_float(s):
    cleaned_value = s.split('(')[0].strip().replace('−', '-') # keep the text before '(' and swap the Unicode minus for '-'
    try:
        return float(cleaned_value)
    except ValueError:
        return None  

columns_to_clean = [
    'Annual(Avg. high °C (°F))',
    'January(Avg. high °C (°F))',
    'January(Avg. low °C (°F))',
    'July(Avg. high °C (°F))',
    'July(Avg. low °C (°F))',
    'Annual(Avg. low °C (°F))'
]

for column in columns_to_clean:
    climate_data[column] = climate_data[column].apply(clean_and_convert_to_float)
climate_data.sample(5)

Now each temperature entry is a single decimal number. The next thing we need to do is fix the `Elevation` column.

As with our temperature columns, we want each entry in this column to be a single decimal number for easier data manipulation later on. We achieve this in a similar way: we find the first instance of an *m* and cut the string off at that point. Some numbers also contain commas, such as 1,084m. This is an issue because Python cannot convert text containing a comma into a number, so 1,084m stays a string even though it is meant to be a number. To get around this, whenever we see a comma, we replace it with an empty string, which removes it.
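
Here is the same idea traced on one value. The entry is hypothetical and assumes the elevation is written in metres first, followed by feet in brackets.

```python
# Hypothetical entry, assuming a "metres (feet)" layout
raw = "1,084 m (3,556 ft)"

before_m = raw.split("m")[0]            # "1,084 " (everything before the first "m")
no_comma = before_m.replace(",", "")    # "1084 "  (comma removed)
elevation = float(no_comma.strip())     # strip the space, then convert

print(elevation)
```

Replacing the comma with an empty string deletes it, whereas replacing it with a space would leave "1 084", which `float()` cannot parse.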

In [None]:
def fix_elevation(text):
    new_elevation = text.split('m')[0].replace(',', '').strip()
    try:
        return float(new_elevation)
    except ValueError:
        return None

climate_data['Elevation'] = climate_data['Elevation'].apply(fix_elevation)
climate_data.sample(5)

The last thing we need to do is separate the *latitude* and *longitude* values from the `Location` column. The process is a bit complicated, but it can be summed up as identifying patterns in the text of each entry. After obtaining our latitude and longitude values and converting them into numeric values, we can drop the `Location` column as we no longer need it.
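
To show what "identifying patterns" means here, the sketch below runs the same kind of split and regular expression on one hypothetical location string. It assumes each entry lists degrees and minutes first, then a `/`, then decimal degrees ending in `°N` and `°W`.

```python
import pandas as pd

# Hypothetical entry: degrees/minutes, then "/", then decimal degrees
locations = pd.Series(["51°07′N 114°01′W / 51.117°N 114.017°W"])

# Keep only the part after the "/" (the decimal-degree form)
decimal_part = locations.str.split("/", expand=True)[1]

# (\d+\.\d+) captures digits, a dot, and more digits right before °N or °W
latitude = pd.to_numeric(decimal_part.str.extract(r"(\d+\.\d+)°N")[0])
longitude = pd.to_numeric(decimal_part.str.extract(r"(\d+\.\d+)°W")[0])

print(latitude.iloc[0], longitude.iloc[0])
```

Note that the longitude is stored as a positive number of degrees west, so larger values mean locations farther west.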

In [None]:
location_parts = climate_data["Location"].str.split("/", expand=True)[1]

latitude = location_parts.str.extract(r'(\d+\.\d+)°N')[0]
longitude = location_parts.str.extract(r'(\d+\.\d+)°W')[0]

latitude = pd.to_numeric(latitude)
longitude = pd.to_numeric(longitude)

climate_data["Latitude"] = latitude
climate_data["Longitude"] = longitude

climate_data = climate_data.drop("Location", axis=1)
climate_data.sample(5)

### Highest and Lowest Temperatures

Our data has been cleaned, so we can move on to analyzing and visualizing it. Since we have data for January and July, which represent *winter* and *summer* respectively, let's first find which city/province has the coldest and hottest temperatures in our dataframe.

Let's do this by finding the lowest average temperature found in the column `January(Avg. low °C (°F))` (representing winter) and the highest average temperature found in the column `July(Avg. high °C (°F))`  (representing summer).
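
One way to find the row holding the lowest value is to compare every entry with the column minimum and keep the rows that match. The sketch below uses a made-up table and a shortened column name; `idxmin` is shown as an alternative that returns the index label of the first minimum.

```python
import pandas as pd

# Hypothetical data, for illustration only
temps = pd.DataFrame({
    "Community": ["Town A", "Town B", "Town C"],
    "January low": [-8.0, -35.1, -2.3],
})

# Boolean mask: True only where the value equals the column minimum
mask = temps["January low"] == temps["January low"].min()
coldest_name = temps[mask]["Community"].iloc[0]

# Alternative: idxmin() gives the index label of the first minimum
same_row = temps.loc[temps["January low"].idxmin(), "Community"]

print(coldest_name, same_row)
```

The mask keeps every row that ties for the minimum, while `idxmin` returns only the first one.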

In [None]:
lowest_jan_temp_data = climate_data[climate_data['January(Avg. low °C (°F))'] == climate_data['January(Avg. low °C (°F))'].min()]
highest_july_temp_data = climate_data[climate_data['July(Avg. high °C (°F))'] == climate_data['July(Avg. high °C (°F))'].max()]

location_jan = lowest_jan_temp_data['Community'].to_string(index=False)
location_july = highest_july_temp_data['Community'].to_string(index=False)

jan_temp = lowest_jan_temp_data['January(Avg. low °C (°F))'].to_string(index=False)
july_temp = highest_july_temp_data['July(Avg. high °C (°F))'].to_string(index=False)

print(f"The lowest average temperature recorded in January in the dataset was in {location_jan}, with a recorded temperature of {jan_temp}°C")
print(f"The highest average temperature recorded in July in the dataset was in {location_july}, with a recorded temperature of {july_temp}°C")

It appears that the lowest average January temperature was recorded in Resolute, Nunavut and the highest average July temperature was recorded in Kamloops, British Columbia.

This makes sense due to the geographical and climatic differences between Nunavut and British Columbia. Nunavut is a territory located in northern Canada, characterized by its high latitude and Arctic climate. In January, during the heart of winter, Nunavut experiences extreme cold temperatures due to its proximity to the Arctic Circle and its exposure to polar air masses.

On the other hand, British Columbia is a province situated on the west coast of Canada, with a much lower latitude compared to Nunavut. The Pacific Ocean moderates temperatures along its coast, keeping coastal regions milder than inland areas. Kamloops, however, lies in the province's southern interior, far from the coast, in a dry valley sheltered by mountains. Without the ocean's cooling influence, summer days there get considerably hotter than along the coastline.

However, considering the averages of the highest and lowest temperatures for each month might provide a more comprehensive understanding of the overall climate in these regions. Let's do this by taking the average between both the January and July temperature columns. Let's also take the average of annual temperatures for later use.
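
Averaging a low column and a high column row by row can be sketched as follows, using two made-up rows and shortened column names.

```python
import pandas as pd

# Hypothetical low/high pairs, for illustration only
df = pd.DataFrame({"low": [-10.0, 4.0], "high": [-2.0, 12.0]})

# axis=1 averages across the columns within each row
df["average"] = df[["low", "high"]].mean(axis=1)

print(df["average"].tolist())
```

With `axis=1` the mean is taken across columns for each row; the default `axis=0` would instead average each column down all rows.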

In [None]:
climate_data['January Average'] = climate_data[['January(Avg. low °C (°F))', 'January(Avg. high °C (°F))']].mean(axis=1)
climate_data['July Average'] = climate_data[['July(Avg. low °C (°F))', 'July(Avg. high °C (°F))']].mean(axis=1)
climate_data['Annual Average'] = climate_data[['Annual(Avg. high °C (°F))', 'Annual(Avg. low °C (°F))']].mean(axis=1)

climate_data.sample(5)

In [None]:
avg_jan_temp = climate_data[climate_data['January Average'] == climate_data['January Average'].min()]
avg_july_temp = climate_data[climate_data['July Average'] == climate_data['July Average'].max()]

location_jan = avg_jan_temp['Community'].to_string(index=False)
location_july = avg_july_temp['Community'].to_string(index=False)

jan_temp = avg_jan_temp['January Average'].to_string(index=False)
july_temp = avg_july_temp['July Average'].to_string(index=False)

print(f"The lowest overall average temperature recorded in January in the dataset was in {location_jan}, with a recorded temperature of {jan_temp}°C")
print(f"The highest overall average temperature recorded in July in the dataset was in {location_july}, with a recorded temperature of {july_temp}°C")

It appears that there is a difference in our output: the highest overall average temperature in July is now in Windsor, Ontario. This means that Windsor has a higher `July(Avg. low °C (°F))` value than Kamloops, by enough to outweigh Kamloops' higher `July(Avg. high °C (°F))` value.

As for general reasons why Windsor has a high overall average temperature, Windsor is one of Canada's southernmost cities and sits close to the Great Lakes. Large bodies of water store heat and release it slowly, which helps keep summer nights warm.

Let's also find the highest and lowest annual average temperatures in our dataframe and see whether they differ from our findings for January and July.

In [None]:
overall_highest_temp = climate_data[climate_data['Annual Average'] == climate_data['Annual Average'].max()]
overall_lowest_temp = climate_data[climate_data['Annual Average'] == climate_data['Annual Average'].min()]

highest_overall_loc = overall_highest_temp['Community'].to_string(index=False)
lowest_overall_loc = overall_lowest_temp['Community'].to_string(index=False)

highest_temp = overall_highest_temp['Annual Average'].to_string(index=False)
lowest_temp = overall_lowest_temp['Annual Average'].to_string(index=False)

print(f"The average lowest annual temperature recorded in the dataset was in {lowest_overall_loc}, with a recorded temperature of {lowest_temp}°C")
print(f"The average highest annual temperature recorded in the dataset was in {highest_overall_loc}, with a recorded temperature of {highest_temp}°C")

Looking at the output, we see one difference: the highest annual average temperature was recorded in Vancouver, B.C. This isn't very surprising, as Vancouver's coastal location on the Pacific Ocean keeps its winters mild, which raises its average over the whole year.

Aside from this minor difference, the output generally makes sense as the province that records the coldest temperature in January, a month characterized by winter, naturally tends to have a climate that sustains cold conditions throughout the year. In contrast, the province boasting the highest July temperatures, a month representing the peak of summer, naturally maintains a warmer climate on an annual basis. 

### Geographical Influences: Elevation, Latitude, and Longitude

Exploring the provinces/cities with the coldest and hottest average temperatures has provided valuable insights. We can now shift our focus to geographical factors to see whether they have a significant influence on local climates. To start, we'll investigate whether *elevation* has a significant impact on the temperatures in the dataframe. In this context, elevation refers to a city's height above sea level. We can do this by plotting the elevations of cities in the dataframe against their annual average temperatures.

In [None]:
px.scatter(climate_data, x='Elevation', y='Annual Average', color='Province', hover_data=['Community'], labels={'Elevation': 'Elevation (m)', 'Annual Average': 'Annual Average (°C)'}, title='Elevation Effects on Annual Average Temperature')

Looking at the figure, there doesn't seem to be a definitive case that elevation is a main driver of temperature. There appear to be cities with low elevation, such as Resolute and Victoria, that have staggeringly different annual average temperatures. The former has an annual average of -15.65°C while the latter has an annual average of 10°C.

In our next visualizations, we'll shift our focus towards assessing whether latitude or longitude has any significant effects on temperature. *Latitude* refers to a city's position north or south of the equator, with higher latitudes being closer to the poles and lower latitudes closer to the equator. On the other hand, *longitude* represents a city's position east or west of the Prime Meridian. 

In [None]:
px.scatter(climate_data, x='Latitude', y='Annual Average', color='Province', hover_data=['Community'], labels={'Latitude': 'Latitude °N', 'Annual Average': 'Annual Average (°C)'}, title='Latitude Effects on Annual Average Temperature').show()
px.scatter(climate_data, x='Longitude', y='Annual Average', color='Province', hover_data=['Community'], labels={'Longitude': 'Longitude °W', 'Annual Average': 'Annual Average (°C)'}, title='Longitude Effects on Annual Average Temperature').show()

Looking at the output of both figures, it appears that *latitude* has a significantly larger impact on climate compared to *longitude*. Many cities with lower latitude values average higher annual temperatures, and cities with higher latitude values average lower annual temperatures. In other words, latitude has an *inverse* relationship with temperature: higher latitude values go with lower temperatures. Longitude, however, does not appear to have any significant relationship with temperature. Cities with high longitude values such as Vancouver, Inuvik, and Whitehorse all average drastically different annual temperatures (10.35°C, -8.2°C, and -0.05°C respectively).
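
One way to put a number on a trend seen in a scatter plot is the Pearson correlation coefficient, which ranges from -1 (perfect inverse relationship) to 1 (perfect direct relationship). This is a sketch using made-up latitude and temperature pairs, not values from the dataset; with the real dataframe, the same `.corr()` call could be applied to its `Latitude` and `Annual Average` columns.

```python
import pandas as pd

# Hypothetical latitude/temperature pairs, for illustration only
sketch = pd.DataFrame({
    "Latitude": [43.0, 49.0, 53.0, 60.0, 68.0, 74.0],
    "Annual Average": [9.5, 10.0, 4.0, -0.5, -8.0, -15.5],
})

# Series.corr computes the Pearson correlation by default
r = sketch["Latitude"].corr(sketch["Annual Average"])
print(round(r, 2))
```

A value close to -1 indicates a strong inverse relationship, while a value near 0 would indicate little or no linear relationship.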


Scientifically, this makes sense: latitude has more impact on climate because of the way solar energy is distributed across Earth's surface. Near the equator, the sun's rays strike the surface more directly, concentrating their energy in a smaller area and leading to warmer temperatures. Near the poles, the same sunlight arrives at a shallow angle and is spread over a larger area, resulting in colder temperatures. Longitude, on the other hand, only describes east-west position, which does not affect the angle at which the sun's rays hit the Earth.
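
The effect of the sun's angle can be sketched with a simplified model. It assumes the sun is directly above the equator (as at an equinox at noon) and ignores the atmosphere, seasons, and day length; under those assumptions, the energy reaching flat ground is proportional to the cosine of the latitude. The function name is hypothetical.

```python
import math

def relative_noon_intensity(latitude_deg):
    """Fraction of overhead-sun intensity on flat ground (simplified equinox model)."""
    return math.cos(math.radians(latitude_deg))

# Compare the equator with increasingly northern latitudes
for lat in (0, 45, 60, 75):
    print(lat, round(relative_noon_intensity(lat), 2))
```

In this model, a location at 60°N receives half the intensity of a location on the equator, while changing longitude has no effect at all.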

### Reflection Questions:

1. How do the temperature differences between cities like Resolute and Victoria challenge the assumption that elevation is a major determinant of temperature?
   
2. What factors other than elevation, latitude, and longitude could contribute to the differences in temperature patterns between various cities?
   
3. Considering your own location, how does the combination of elevation, latitude, and longitude contribute to the climate you experience, and how does this compare to other regions?

### Visualizing Provinces

In this final section, we'll visualize climate differences between provinces using folium. To begin, let's read the GeoJSON data for the provinces of Canada. GeoJSON is a file format that describes geographical shapes; here, it stores the boundaries of each province.
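
For a sense of what GeoJSON looks like, here is a minimal hand-written example expressed as a Python dictionary. The province code "XX" and the rectangle's coordinates are made up; real provincial boundaries contain many more points.

```python
import json

# A minimal, hypothetical GeoJSON document with one rectangular "province"
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"Province": "XX"},   # attributes of the shape
            "geometry": {
                "type": "Polygon",
                # [longitude, latitude] pairs; the ring ends where it starts
                "coordinates": [[
                    [-110.0, 49.0], [-110.0, 60.0],
                    [-120.0, 60.0], [-120.0, 49.0],
                    [-110.0, 49.0],
                ]],
            },
        }
    ],
}

print(json.dumps(feature_collection)[:60])
```

Each feature pairs a `geometry` (the shape) with `properties` (its attributes), so a path such as `feature.properties.Province` points at the province code of one shape.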

In [None]:
prov_data = gpd.read_file('https://raw.githubusercontent.com/callysto/data-files/main/Science/ClimateAcrossProvinces/geopandas.geojson')
prov_data

Looking at the imported data, let's convert the names of the provinces to their abbreviations in order to match the format used in our *climate_data* dataframe.

In [None]:
# Assign the result back to the column; calling replace(..., inplace=True) on a
# column selected from the dataframe may not update the dataframe itself
prov_data['prov_name_fr'] = prov_data['prov_name_fr'].replace(
    {
        'Alberta': 'AB',
        'Manitoba': 'MB',
        'Yukon': 'YT',
        'Terre-Neuve-et-Labrador': 'NL',
        'Nouveau-Brunswick': 'NB',
        'Saskatchewan': 'SK',
        'Nouvelle-Écosse': 'NS',
        'Territoires du Nord-Ouest': 'NT',
        'Île-du-Prince-Édouard': 'PE',
        'Nunavut': 'NU',
        'Québec': 'QC',
        'Ontario': 'ON',
        'Colombie-Britannique': 'BC'
    }
)

prov_data.rename(columns={'prov_name_fr': "Province"}, inplace=True)  
prov_data

We can also merge our two dataframes *climate_data* and *prov_data* into one dataframe, since both contain a `Province` column. This attaches the right `geometry` data to each city. Because we use a left join with *prov_data* on the left, every province keeps its geometry data in the resulting dataframe, named *data_w_climate*, and gains the climate details of each matching city.
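
The behaviour of a left join can be seen on two small made-up tables. The strings in the `geometry` column are placeholders standing in for real shapes.

```python
import pandas as pd

# Hypothetical tables; the "geometry" strings are placeholders for real shapes
shapes = pd.DataFrame({
    "Province": ["AB", "BC", "PE"],
    "geometry": ["shape_AB", "shape_BC", "shape_PE"],
})
climate = pd.DataFrame({
    "Province": ["AB", "AB", "BC"],
    "Community": ["Town A", "Town B", "Town C"],
})

# how="left" keeps every row of the left table (shapes)
joined = shapes.merge(climate, on="Province", how="left")
print(joined)
```

AB appears twice because it matches two communities, and PE is kept with a missing `Community` value because nothing in the right table matches it.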

In [None]:
data_w_climate = prov_data.merge(climate_data, left_on='Province', right_on='Province', how='left')
data_w_climate.sample(5)

Now we have a merged dataframe that contains provincial geometry data for each entry. One last step before plotting our data is to average the cities within each province so that we have a single number per province. Afterwards, we can start visualizing provincial differences, starting with `Annual Average`.
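
Averaging cities within each province is a group-by operation. Here is a minimal sketch on made-up numbers:

```python
import pandas as pd

# Hypothetical city-level values, for illustration only
cities = pd.DataFrame({
    "Province": ["AB", "AB", "BC"],
    "Annual Average": [4.0, 2.0, 10.0],
})

# Group the rows by province, average each group, and turn the result back into columns
per_province = cities.groupby("Province")["Annual Average"].mean().reset_index()
print(per_province)
```

The two AB cities collapse into one row with their mean, leaving one value per province. A province with more cities in the data contributes more measurements to its average, so provinces with few stations are represented less reliably.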

In [None]:
agg_data = data_w_climate.groupby("Province")["Annual Average"].mean().reset_index()
merged_data = prov_data.merge(agg_data, on="Province")
merged_data

In [None]:
# Initialize the map and store it in annual_map
annual_map = folium.Map(location=[50, -65], zoom_start=3)

folium.Choropleth(
    geo_data=prov_data,
    data=merged_data,
    name="choropleth",
    columns=["Province", "Annual Average"],
    fill_color="RdPu",
    key_on="feature.properties.Province",
    legend_name="Annual Average Temperature (°C)",
).add_to(annual_map)

folium.LayerControl().add_to(annual_map)
annual_map

Looking at the folium map, provinces like British Columbia, Ontario, Quebec, New Brunswick, and Nova Scotia appear to have higher annual temperatures than other provinces. In comparison, Nunavut and the Northwest Territories are light in colour, indicating a lower average annual temperature than the other provinces.

One immediate difference between the colder and warmer regions is that Nunavut and the Northwest Territories are higher in latitude (closer to the North Pole). This lines up with our previous analysis, where many cities at higher latitudes had colder annual temperatures than cities at lower latitudes.

In [None]:
agg_data = data_w_climate.groupby("Province")["January Average"].mean().reset_index()
merged_data_january = prov_data.merge(agg_data, on="Province")

# Initialize the map and store it in january_map
january_map = folium.Map(location=[50, -65], zoom_start=3)

folium.Choropleth(
    geo_data=prov_data,
    data=merged_data_january,
    name="choropleth",
    columns=["Province", "January Average"],
    fill_color="PuBu",
    key_on="feature.properties.Province",
    legend_name="January Average Temperature (°C)",
).add_to(january_map)

folium.LayerControl().add_to(january_map)
january_map

In [None]:
agg_data = data_w_climate.groupby("Province")["July Average"].mean().reset_index()
merged_data_july = prov_data.merge(agg_data, on="Province")

# Initialize the map and store it in july_map
july_map = folium.Map(location=[50, -65], zoom_start=3)

folium.Choropleth(
    geo_data=prov_data,
    data=merged_data_july,
    name="choropleth",
    columns=["Province", "July Average"],
    fill_color="YlOrRd",
    key_on="feature.properties.Province",
    legend_name="July Average Temperature (°C)",
).add_to(july_map)

folium.LayerControl().add_to(july_map)
july_map

Looking at the January (winter) and July (summer) maps, there appear to be slight differences compared to the overall annual temperature map.

Starting with the summer map, the obvious change is that the overall temperature range has shifted upward, as expected in summer. Beyond this, Ontario appears to have the highest temperature of all the provinces on the map. Alberta and Saskatchewan also show comparable increases relative to their annual averages. Nunavut clearly has the lowest temperature in summer.

Looking at the winter map, the overall temperature range has shifted downward. Nunavut and the Northwest Territories appear to have the lowest temperatures, with Yukon and Manitoba slightly higher. British Columbia appears to have the highest temperature in winter.

### Reflection Questions:

1. How do the temperature differences between winter and summer maps highlight the seasonal variability in the climate of different provinces?

2. Can you identify any patterns or connections between the climatic characteristics of provinces during different seasons and the geographical features of these regions?

3. Based on the observed patterns, what are the potential implications of climate change on provinces with higher and lower average temperatures? How might these changes impact their local ecosystems?

# Conclusion

In this notebook, we imported data that identifies climate trends in the provinces of Canada. We cleaned the data, compared different geographical variables with temperature, and visualized winter, summer, and annual temperature maps for all provinces. If you would like to learn more about climate data in Canada, the Government of Canada publishes [historical climate data](https://climate.weather.gc.ca/).

As an extension to this notebook, you can compare local climate differences to the provincial data represented and discover the variations in climate patterns that affect your city.

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)