# Pandas Analysis Project

So far in this crash course, you have learned how to use the Python library Pandas to tackle different data processing and analysis tasks. In this section, you will apply what you have learned to a simple data analysis project and extract useful insights from multiple datasets.

As we all know, the world has been facing a significant health crisis since 2019. Many people around the world have had to change the way they work, study and travel in order to fight the virus. In this project, we will examine health and human activity datasets to see how COVID-19 affected our daily activities. The following datasets will be used:

* Community Mobility Reports: Aggregated datasets generated by Google to report movement trends over time by geographical location. The dataset covers different types of locations such as workplaces, parks, residential areas and so on.

* COVID-19 Confirmed Cases: A daily aggregated dataset collected and maintained by Our World in Data.

We start by loading the datasets into two Pandas DataFrame objects. The mobility reports dataset must first be downloaded to your local computer, while the COVID-19 cases dataset can be read directly from its GitHub repository. Next, we carry out a typical exploratory analysis to investigate the content of our datasets using the DataFrame methods head() and info(), as shown in the code below:

In [2]:
import pandas as pd

In [None]:
# Access dataset from local computer
df_mobility_data = pd.read_csv('Datasets/Global_Mobility_Report.csv')

# Access dataset from GitHub repository
df_covid_cases = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv')
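
The mobility file is fairly large, so it may help to load only the columns you need and to parse the dates while reading. The following is a minimal sketch of that idea, not the approach used in the rest of this project. It reads a tiny in-memory CSV whose rows are made up for illustration, and the column selection is an assumption about what the analysis needs. `usecols` and `parse_dates` are standard `pd.read_csv` parameters.

```python
import io
import pandas as pd

# Made-up sample standing in for Global_Mobility_Report.csv (illustrative values only)
sample_csv = io.StringIO(
    "country_region_code,sub_region_1,date,workplaces_percent_change_from_baseline\n"
    "NZ,,2020-03-01,2\n"
    "NZ,Auckland,2020-03-01,3\n"
    "NZ,,2020-03-02,-1\n"
)

# usecols limits what is loaded; parse_dates converts the date strings to datetime64
df_sample = pd.read_csv(
    sample_csv,
    usecols=["country_region_code", "sub_region_1", "date",
             "workplaces_percent_change_from_baseline"],
    parse_dates=["date"],
)

print(df_sample.dtypes)
```

With the real file, the same two parameters can be passed alongside the file path.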

In [None]:
# Explore the content of the mobility dataset
df_mobility_data.info()
# Show sample records
df_mobility_data.head()

In [None]:
# Explore the content of the Covid cases dataset
df_covid_cases.info()

# Show sample records
df_covid_cases.head()

The results show that the mobility report dataset contains 15 columns describing the geographical location and how much people's movement changed compared to the period before the COVID-19 pandemic. For more information about the values of each column, see the dataset description page. The COVID-19 cases dataset contains 62 columns describing infection and vaccination figures for each country, in addition to other demographic figures. To learn more about these values, see the dataset description page. We notice that both datasets contain date values recorded at the daily level. We will use these values to build time series figures that show whether COVID-19 outbreaks affected community movement behaviour.
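
Beyond head() and info(), a few quick checks can confirm what the date column looks like before plotting. The sketch below is one possible way to do this, using a small made-up DataFrame rather than the real data. It checks the date range, counts missing values, and verifies that one country's records are one day apart.

```python
import pandas as pd

# Made-up records for illustration only
df = pd.DataFrame({
    "iso_code": ["NZL", "NZL", "NZL", "AUS"],
    "date": ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-01"],
    "new_cases": [1.0, None, 3.0, 5.0],
})

# Dates read from CSV are plain strings until converted
df["date"] = pd.to_datetime(df["date"])

# Date range covered by the data
first_day, last_day = df["date"].min(), df["date"].max()

# Number of missing values per column
missing = df.isnull().sum()

# Gap between consecutive dates for one country; one day everywhere means daily records
nz_dates = df.loc[df["iso_code"] == "NZL", "date"].sort_values()
is_daily = (nz_dates.diff().dropna() == pd.Timedelta(days=1)).all()

print(first_day, last_day, missing["new_cases"], is_daily)
```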

To demonstrate this process, let's examine how the emergence of COVID-19 cases in New Zealand affected normal traffic at workplaces and outdoor parks. We first need to set the date column as the index of both DataFrame objects to support our visualization task, as shown in the code below:

In [None]:
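# Parse the date strings and set them as the index for both DataFrames
df_mobility_data['date'] = pd.to_datetime(df_mobility_data['date'])
df_covid_cases['date'] = pd.to_datetime(df_covid_cases['date'])
df_mobility_data.set_index('date', inplace=True)
df_covid_cases.set_index('date', inplace=True)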
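
If the date column is converted to datetime before it becomes the index, Pandas builds a DatetimeIndex, which allows rows to be selected by date strings. The sketch below illustrates this with made-up daily values; the variable names are hypothetical.

```python
import pandas as pd

# Made-up daily values for illustration
df = pd.DataFrame({
    "date": ["2020-03-30", "2020-03-31", "2020-04-01", "2020-04-02"],
    "new_cases": [10, 12, 8, 5],
})

# Convert to datetime before indexing so Pandas builds a DatetimeIndex
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date").sort_index()

# Partial-string indexing: select all rows in April 2020
april = df.loc["2020-04"]

# Slice a date range (both endpoints are included)
window = df.loc["2020-03-31":"2020-04-01"]

print(april)
print(window)
```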

Let's use the Pandas plotting feature to visualize the COVID-19 outbreak using the new_cases column, as shown in the code below:

In [None]:
# Visualize daily new COVID-19 cases in New Zealand
df_covid_cases[df_covid_cases['iso_code']=='NZL']['new_cases'].plot(figsize = (14,8));

Notice how the Pandas expression above uses iso_code to keep only the records that belong to a specific country code, then selects the new_cases column and calls the Pandas visualization function plot() on it. The results show that New Zealand had two large waves of COVID-19 cases, from March to April 2020 and from August to September 2021. In other periods, the outbreak appears to be under control, with an average of fewer than 10 cases per day. Next, let's investigate whether people's normal work and travel behaviour in New Zealand changed during the same periods. The following code filters the mobility report DataFrame for New Zealand records and plots the traffic patterns for workplaces and outdoor parks.
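
Daily case counts can be noisy, so it is sometimes easier to read the trend after smoothing. The sketch below is an optional addition rather than part of the analysis above. It applies the same boolean filter on iso_code and then computes a 7-day rolling mean. The data is made up for illustration and the window size is an assumption.

```python
import pandas as pd

# Made-up daily counts for two countries (illustration only)
dates = pd.date_range("2020-03-01", periods=10, freq="D")
df = pd.DataFrame({
    "iso_code": ["NZL"] * 10 + ["AUS"] * 10,
    "new_cases": list(range(10)) + [100] * 10,
}, index=dates.append(dates))
df.index.name = "date"

# Boolean mask keeps only the rows for one country code
nz_cases = df.loc[df["iso_code"] == "NZL", "new_cases"]

# 7-day rolling mean; the first six values are NaN because the window is incomplete
nz_smooth = nz_cases.rolling(window=7).mean()

print(nz_smooth.tail(3))
```

Calling plot() on the smoothed Series would draw the trend line in the same way as the raw counts.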

In [None]:
# Visualize workplace and outdoor parks' traffic patterns in New Zealand
# Rows with no sub_region_1 value are kept, i.e. the country-level records
nz_national = ((df_mobility_data['country_region_code'] == 'NZ')
               & (df_mobility_data['sub_region_1'].isnull()))
df_mobility_data.loc[nz_national, ['workplaces_percent_change_from_baseline',
                                   'parks_percent_change_from_baseline']].plot(figsize=(14, 8));

The results above show a clear pattern of reduced traffic during the same periods as the COVID-19 waves in New Zealand. The figure also shows other periods with noticeably increased or decreased traffic, most clearly around the Christmas period and other major holidays.
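
To compare the two datasets side by side rather than in separate figures, one option is to align them on their shared date index. The sketch below shows this under stated assumptions: the values are made up, and the choice of an inner join is only one reasonable way to keep the dates present in both datasets.

```python
import pandas as pd

dates = pd.date_range("2020-03-20", periods=4, freq="D", name="date")

# Made-up mobility rows: one national row (sub_region_1 missing) and one regional row per day
mobility = pd.DataFrame({
    "country_region_code": ["NZ"] * 8,
    "sub_region_1": [None, "Auckland"] * 4,
    "workplaces_percent_change_from_baseline": [-5, -4, -20, -18, -50, -55, -60, -62],
}, index=dates.repeat(2))

# Made-up case counts
cases = pd.DataFrame({"iso_code": ["NZL"] * 4, "new_cases": [3, 8, 20, 35]}, index=dates)

# Keep national-level mobility rows only
national = mobility[(mobility["country_region_code"] == "NZ")
                    & (mobility["sub_region_1"].isnull())]

# Align both datasets on the date index, keeping dates present in both
combined = national[["workplaces_percent_change_from_baseline"]].join(
    cases.loc[cases["iso_code"] == "NZL", ["new_cases"]], how="inner"
)

print(combined)
```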

How did the COVID-19 outbreak and response unfold in your area? You can use the datasets from this project to investigate cases and mobility patterns in your own country.
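
As a starting point, the two filters used in this project can be wrapped in a small helper. The function name `country_view` and the sample rows below are hypothetical. Note that the mobility dataset uses two-letter country codes (such as 'NZ'), while the cases dataset uses three-letter ISO codes (such as 'NZL').

```python
import pandas as pd

def country_view(df_mobility, df_cases, mobility_code, iso_code):
    """Return national-level mobility rows and case rows for one country."""
    mob_mask = ((df_mobility["country_region_code"] == mobility_code)
                & (df_mobility["sub_region_1"].isnull()))
    return df_mobility[mob_mask], df_cases[df_cases["iso_code"] == iso_code]

# Made-up rows for illustration only
df_m = pd.DataFrame({
    "country_region_code": ["NZ", "NZ", "AU"],
    "sub_region_1": [None, "Otago", None],
    "parks_percent_change_from_baseline": [10, 12, -3],
})
df_c = pd.DataFrame({"iso_code": ["NZL", "AUS"], "new_cases": [4, 9]})

mob_au, cases_au = country_view(df_m, df_c, "AU", "AUS")
print(mob_au)
print(cases_au)
```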