# **Final Project**

Name: Osaka Khamphavong

Student ID:  200700130

# **Introduction**

The goal of this project is to collect data on a variety of important aspects of Ontario, such as the number of tourists, COVID-19 cases, unemployment rates, and greenhouse gas emissions.

To begin, I want to investigate the impact of the COVID-19 pandemic on the tourism industry in Ontario by looking at the number of tourists who visited prior to the pandemic.

Second, I'm interested in following the progression of COVID-19 cases in Ontario to learn more about the pandemic's impact on public health in the region.

The research then moves on to the number of unemployed people in Ontario, as many people are struggling to find work during the pandemic.

Finally, I chose greenhouse gas emission data to visualize emissions from regulated facilities in Ontario over the years, to understand the impact of human activities on the environment, and to support strategies for reducing emissions.

# **Tourism Dataset #1**

## Question: How many visitors came to Ontario, Canada, before COVID-19?

The following dataset includes the number of inbound tourists to Ontario.

https://data.ontario.ca/dataset/11bccdd9-6de7-40fb-916f-021f5eff1683/resource/bbbed5b9-f67a-4c3b-a60b-8ddf6712c493/download/tourism_outlook_-_inbound.csv

The plot uses a bar chart to show how many visitors from all origins came to Ontario each year from 2013 through 2019. The graph shows that many people visited Ontario in the years preceding COVID-19.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

# Read data
df = pd.read_csv('data/tourism_outlook_-_inbound.csv')

# Filter data so we only keep the rows that cover visitors from all origins
people = df[df['Origin'] == 'All Origins']
# Convert the visit counts to millions for a readable y-axis
total_visit = people['Total Visits'] / 1000000

plt.bar(x=people['Year'], height=total_visit, color='green')
# Add a title and axis labels
plt.title('Visitors from All Origins to Ontario, 2013-2019')
plt.xlabel('Year')
plt.ylabel('Number of Visitors (millions)')

# Show the plot
plt.show()



# **Tourism Dataset #2**

## Question: Is there a big difference between visitors from All Overseas and Other US States?

This section uses the same dataset as above.

This plot shows the difference between tourists from All Overseas and Other US States who visited Ontario from 2015 to 2018. Grouped bars make the two groups easy to compare. Looking at the visualization, the difference is not large, but the two groups can still be told apart.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

# Read data
df = pd.read_csv('data/tourism_outlook_-_inbound.csv')

# Convert 'Year' column to datetime object
df['Year'] = pd.to_datetime(df['Year'], format='%Y')
# Keep only the rows from 2015 to 2018, both endpoints included
# (inclusive='both' needs pandas 1.3 or later; inclusive=True is no longer accepted)
year = df[df['Year'].between('2015', '2018', inclusive='both')]
# Split by origin, building each mask from the filtered frame so the indexes match
Other = year[year['Origin'] == 'All Overseas']
US_people = year[year['Origin'] == 'Other US States']
Other_visit = Other['Total Visits'] / 1000000
US_visit = US_people['Total Visits'] / 1000000

# Create a grouped bar chart
fig, ax = plt.subplots()
width = 0.35  # the width of the bars
x = year['Year'].dt.year.unique()   # unique years in the filtered data

# Plot the data for All Overseas
rects1 = ax.bar(x - width/2, Other_visit, width, label='All Overseas')

# Plot the data for Other US States
rects2 = ax.bar(x + width/2, US_visit, width, label='Other US States')

# Add a title and axis labels
ax.set_xlabel('Year')
ax.set_ylabel('Number of Visitors (millions)')
ax.set_title('Visitors from All Overseas and Other US States visiting Ontario, 2015-2018')
ax.set_xticks(x)
ax.legend()

# Show the plot
plt.show()
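
The `x - width/2` and `x + width/2` offsets in the cell above can be worked through on their own. The sketch below is only an illustration: `grouped_positions` is a hypothetical helper, not a matplotlib function, and it uses plain Python lists instead of the NumPy array from the cell. Because `ax.bar` centres each bar on its x value by default, shifting one series left and the other right by half a bar width makes the two bars meet exactly at the year tick.

```python
def grouped_positions(centers, width):
    """Return the x positions of the left and right bars in each pair."""
    left = [c - width / 2 for c in centers]
    right = [c + width / 2 for c in centers]
    return left, right


# Years used in the plot above; 0.35 is the same bar width as in the cell
years = [2015, 2016, 2017, 2018]
left, right = grouped_positions(years, width=0.35)
for year, l, r in zip(years, left, right):
    # Each pair of bar centres sits one bar width apart, around the year tick
    print(year, round(l, 3), round(r, 3))
```

Keeping the bar width below 0.5 leaves a gap between neighbouring years, which is what lets the pairs read as groups.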

# **COVID-19 Case Dataset #1**

## Question: How many confirmed positive cases has Ontario had since COVID-19 began?

The following dataset includes the number of COVID-19 cases in Ontario.

https://data.ontario.ca/dataset/f4f86e54-872d-43f8-8a86-3892fd3cb5e6/resource/ed270bb8-340b-41f9-a7c6-e8ef587e6d11/download/covidtesting.csv

The plot depicts confirmed positive cases in Ontario from 2020 to 2022 as a time series, smoothed with a rolling mean over 5 rows. We skipped January because the dataset shows no data for that month. The x-axis shows a tick labelled 2023-01, but the data ends in 2022-12, so no data from 2023 is included.

In [None]:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Read data
df = pd.read_csv('data/covidtesting.csv')

# Convert Reported Date to datetime format
df['Reported Date'] = pd.to_datetime(df['Reported Date'])

# Keep only the rows reported from 2020 through 2022
data = df[(df['Reported Date'] >= '2020-01-01') & (df['Reported Date'] <= '2022-12-31')]
data = data.set_index('Reported Date')['Confirmed Positive']
# Smooth with a rolling mean over 5 rows (5 days if there is one row per day)
final_data = data.rolling(window=5).mean()
# Plot the time series
plt.figure(figsize=(12, 7))
plt.plot(final_data, label='Covid cases', color='green')

# Format x-axis labels to show year and month
date_form = mdates.DateFormatter('%Y-%m')
plt.gca().xaxis.set_major_formatter(date_form)

# Add labels and title
plt.legend()
plt.xlabel('Date')
plt.ylabel('Case Number')
plt.title('COVID-19 Confirmed Positive Cases in Ontario from 2020 to 2022')
plt.show()
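
To make the smoothing step concrete, here is a minimal pure-Python sketch of what `rolling(window=5).mean()` computes with its default settings. `rolling_mean` is a hypothetical helper written for illustration, and the sample counts are made up rather than taken from `covidtesting.csv`. It uses `None` where pandas would report `NaN`.

```python
def rolling_mean(values, window):
    """Mean of each trailing window; None until a full window is available."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            # Fewer than `window` values seen so far: pandas would give NaN here
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


# Made-up daily counts, used only to show the arithmetic
sample = [10, 20, 30, 40, 50, 60]
print(rolling_mean(sample, window=5))
# prints [None, None, None, None, 30.0, 40.0]
```

The first four entries have no full window behind them, so the smoothed line in the plot starts four rows later than the raw data.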

# **Unemployed People Dataset #1**

## Question: How many people were unemployed in Ontario in 2020?

The following dataset includes the number of people who are unemployed in Ontario:

https://data.ontario.ca/dataset/4edcd678-6f24-4064-970b-b7254fc6671e/resource/8f786bb5-01ac-4e7d-b89f-3437a601941b/download/mltsd_v0906_18_tab2qq.csv

This visualization uses grouped bars to show the number of unemployed males and females aged 25-44 in Ontario regions in 2020, the year COVID-19 began. The plot displays the totals for the months of March, June, and September.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Read data
df = pd.read_csv('data/mltsd_v0906_18_tab2qq.csv')

# Keep total unemployment for ages 25-44 across all Ontario regions
# (the age group value is matched exactly, including its leading spaces)
unemploy = df[(df['DURATION'] == 'Total unemployed') & (df['GEOGRAPHY'] == 'Total, Ontario regions')
              & (df['AGE GROUP'] == '  25-44')]
# Keep only the 2020 rows, building the mask from the filtered frame so the indexes match
year = unemploy[unemploy['MONTH'].str.contains('2020')]

# Extract data for each month
male = year['Male']
female = year['Female']
months = year['MONTH']

# Grouped bar chart: female bars sit one bar width to the right of male bars
barWidth = 0.4
r1 = np.arange(len(male))
r2 = [x + barWidth for x in r1]

plt.bar(r1, male, color='steelblue', width=barWidth, edgecolor='white', label='Male')
plt.bar(r2, female, color='lightcoral', width=barWidth, edgecolor='white', label='Female')

# Add legend and title
plt.legend()
plt.title('Unemployed Males and Females Aged 25-44 in Ontario in 2020')

# Add axis labels and place each tick midway between the two bars of a month
plt.xlabel('Month')
plt.ylabel('Number of Unemployed')
plt.xticks([r + barWidth / 2 for r in range(len(male))], months)

# Show plot
plt.show()


# **Greenhouse Gas Emission Dataset #1**

## Question: What is the growth rate of greenhouse gas emissions in Toronto?

The following dataset includes greenhouse gas emissions data from regulated facilities under Ontario Regulation:

https://files.ontario.ca/moe_mapping/downloads/1Air/GHG_by_year/GHG_Data_2010_2020_data_Dec162021.csv

This visualization displays the amount of CO2 emissions released into the atmosphere from sources other than biological materials (such as wood or crops).

It specifically shows data for one facility in Toronto, the 'Pearl Street Steam Plant', from 2010 to 2018 using a scatterplot, allowing us to see how its emissions changed from year to year.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
# Read data
df = pd.read_csv('data/GHG_Data_2010_2020_data_Dec162021.csv', encoding='ISO-8859-1')

# Keep only the Pearl Street Steam Plant rows in Toronto
city = df[(df['Facility City'] == 'Toronto') & (df['Facility Name'] == 'Pearl Street Steam Plant')]
# Sort by year so the x-axis runs in chronological order
# (sorting by emission value would make any series look like steady growth)
city = city.sort_values('Year')

data = city['Carbon dioxide (CO2) from non-biomass in CO2e (t)']

# Plot scatterplot, one point per reporting year
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x=range(len(data)), y=data, alpha=0.5)
ax.set_title('Carbon Dioxide Emissions from Non-Biomass in Toronto from 2010-2018')
ax.set_xlabel('Year')
ax.set_ylabel('Emissions (t CO2e)')
ax.set_xticks(range(len(city)))
ax.set_xticklabels(city['Year'], ha='center')
plt.show()
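
Since the question asks about a growth rate, one way to put a number on it is the year-over-year percentage change. The sketch below is an assumption about how that could be computed, not part of the original analysis: `yearly_growth` is a hypothetical helper, the sample values are made up rather than read from the GHG file, and it assumes no value is zero.

```python
def yearly_growth(values):
    """Percentage change from each value to the next; None for the first entry."""
    if not values:
        return []
    growth = [None]  # the first year has no previous year to compare against
    for prev, curr in zip(values, values[1:]):
        # Assumes prev is non-zero; a zero would need special handling
        growth.append((curr - prev) * 100 / prev)
    return growth


# Made-up emission totals in chronological order, for illustration only
sample = [100.0, 110.0, 99.0]
print(yearly_growth(sample))
# prints [None, 10.0, -10.0]
```

A positive value means emissions rose compared with the previous year and a negative value means they fell, which is easier to compare across years than the raw totals in the scatterplot.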

# **Conclusion**

In this project, I collected data on four different Ontario-related topics: tourism, COVID-19 cases, unemployment rates, and greenhouse gas emissions. I looked at the number of tourists who visited Ontario prior to the COVID-19 pandemic, as this helps me understand the pandemic's impact on the tourism industry.

In addition, I looked into COVID-19 cases in Ontario, as well as the region's unemployment, which has been impacted by the pandemic.

Finally, I included data on greenhouse gas emissions to better understand the environmental impact of human activities in Ontario. By researching these topics, I hope to gain a better understanding of the COVID-19 pandemic's impact on various aspects of life in Ontario.