# A Mobility Analysis Of Austria, Belgium & Germany

## Good Practices In Constructing Time Series

In any craft there are basic principles which one must learn in order to lay the foundation of good work. Time series visualisation is no different, relying on many important decisions by the data scientist before coming to fruition.

We can trace the history of these graphs back to Scottish economist William Playfair. Playfair combined his love of art and data to create the graph we now know as a time series. His first publication in 1786 looks incredibly modern, plotting the cost of wheat against the cost of labour in England. His graph disproved a hypothesis that wages were driving the price of wheat up. With this very clever display, Playfair showed that wages were actually rising much more slowly than the price of wheat, ushering in a new dawn for data visualisations of time periods.

![Playfair's First Time Series](res/playfair_time_series.jpg)

When we introduce time into a graph, it gives rise to many distinct components. 
Common components of a time series include:
* **Trend** - The general tendency to increase or decrease over time.

* **Seasonality** - Peaks / troughs that occur at regular intervals. This can be daily, weekly, monthly or even yearly cycles.

* **Noise** - Random fluctuations in the data which are left when all the other components have been removed.

In constructing our time series, we will pay respect to each of these components. Seasonality can be checked with statistical tests, while trends are less regular and harder to pin down. Noise occurs in any real-world data set, and can be dealt with through methods such as smoothing and resampling. We will be constructing time series both with smoothing/resampling and without, in order to gain a complete picture of the data. The aim is to show what the data means, rather than merely what it looks like plotted.
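
As a rough illustration of how these components can be pulled apart, here is a minimal sketch on synthetic data. The series, its 7-day cycle and every variable name are assumptions made for this example only; none of it comes from our mobility data.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: linear trend + weekly cycle + random noise
rng = np.random.default_rng(0)
dates = pd.date_range("2020-02-15", periods=56, freq="D")
t = np.arange(len(dates))

trend = 0.5 * t
seasonal = 5.0 * np.sin(2 * np.pi * t / 7)
noise = rng.normal(0.0, 1.0, size=len(dates))
series = pd.Series(trend + seasonal + noise, index=dates)

# A centred 7-day mean averages over one full weekly cycle,
# so what remains is an estimate of the trend
trend_est = series.rolling(window=7, center=True).mean()

# Averaging the detrended values by weekday estimates the weekly pattern
detrended = series - trend_est
weekly_est = detrended.groupby(detrended.index.dayofweek).transform("mean")

# Whatever is left over is treated as noise
residual = detrended - weekly_est
```

The first and last three days of `trend_est` are empty because a centred 7-day window does not fit there. This is a simple moving-average decomposition, chosen for readability rather than statistical rigour.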

I have decided to use three different types of visualisations to display each attribute. I believe that these three visualisations will give us a strong sense of the magnitude of change in our data and intuitively represent these changes for analysis.

### Universal Line Plot
This plot will contain our original data and represents a starting point for our understanding. This is the root of the visualisations to come, as it will give us the clearest picture of the trends, seasonality and noise within our dataset. 

Additionally, annotations will be added to this plot in order to mark key moments related to the attribute. These may give us insight into particular peaks or troughs in relation to that specific attribute.

### Density Plot W/ Moving Average Smoothing
My aim with this density plot is to give us a better feel for the major changes that occurred during the time period. A moving average smoothing technique will be applied in order to dampen the effect of outliers on the data and filter out noise.

### Resampled Bar W/ Differencing
This graph will apply differencing, where each value represents the change from one day to the next. We can set the window of change so that it tells us the difference across as large a gap as we would like. Resampling will be applied to our bar chart to make the visualisation more discrete and reduce the number of bins.
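
To make the two operations concrete, here is a small sketch on made-up data (the values and variable names are assumptions for illustration). It uses the `"MS"` rule, which labels each monthly bin by the first day of the month; that rule is a choice made for this sketch only.

```python
import pandas as pd

# Two months of made-up daily values: 0, 1, 2, ... 60
dates = pd.date_range("2020-03-01", periods=61, freq="D")
values = pd.Series(range(61), index=dates, dtype="float64")

# periods=1 compares each day with the day before;
# periods=7 compares each day with the same weekday a week earlier
daily_change = values.diff(periods=1)
weekly_change = values.diff(periods=7)

# Resampling collapses the daily values into one mean per calendar month
monthly_mean = values.resample("MS").mean()
print(monthly_mean.tolist())  # [15.0, 45.5]
```

Note that differencing leaves the first `periods` entries empty, since there is no earlier value to compare them with.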

## Structuring Our Code
When visualising the data, we don't want to write the same block of code repeatedly in order to get different results. We want to quickly move from attribute to attribute without having to worry too much about the underlying code.

To ensure we can focus on the visualisations, I'm going to set up a class that will give us the above plots for each attribute as simply as we would like. This overlying class I'm going to call the "Mobility Suite". This suite will use plotly, pandas and numpy in order to give us the results that we need from the data.

There are different aspects to this suite such as:
* **Mobility Manager** - Loads the data from CSV files.
* **Graph** - Basic parent class to create a graph.
    * **Transformer** - Performs Resampling, Smoothing & Differencing
    * **Visualisations**:
        * **Visualisation 1**: Universal Line Plot 
        * **Visualisation 2**: Density Plot W/ Rolling Mean
        * **Visualisation 3**: Resampled Bar W/ Differencing
        
#### Standardising Calls To Country/Attribute
Additionally, we will standardise our call to each attribute and country.
Rather than ever using a string to refer to an attribute or country, which only works if we happen to type the string correctly every time, we will use enums.
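
The benefit can be sketched as follows. `DemoAttribute`, `column_for` and `column_name` are hypothetical names invented for this illustration, as are the column strings; they are not part of the suite.

```python
import enum

class DemoAttribute(enum.Enum):
    Parks = 5
    Transit = 6

# Hypothetical column names, used only for this illustration
column_for = {DemoAttribute.Parks: "parks", DemoAttribute.Transit: "transit"}

def column_name(attribute):
    # Reject anything that is not a DemoAttribute member, such as a raw string
    if not isinstance(attribute, DemoAttribute):
        raise TypeError(f"expected a DemoAttribute, got {attribute!r}")
    return column_for[attribute]

print(column_name(DemoAttribute.Parks))  # parks

# A misspelt member fails straight away instead of silently matching nothing
try:
    DemoAttribute.Park
except AttributeError as err:
    print("caught:", type(err).__name__)  # caught: AttributeError
```

A typo in an enum member is an error at the point where it is written, whereas a typo in a string only surfaces later, when the lookup fails or returns the wrong column.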
    
![Mobility Suite](./res/mobility_suite_structure.png)

### The Mobility Manager
We'll begin by setting up the class to load in our data. It acts as the intermediary between the programmer and the data, ensuring we don't run into any problems in our interactions. 

Firstly, let's create two enums to reference each Country and attribute in our data. We will use the enum class for this.

In [59]:
import enum

class Country(enum.Enum):
	Belgium = 0
	Germany = 1
	Austria = 2

class Attribute(enum.Enum):
	ID = 0
	Country = 1
	Date = 2
	Retail_And_Rec = 3
	Grocery_And_Pharma = 4
	Parks = 5
	Transit = 6
	Workplaces = 7
	Residential = 8

print("Attributes:\n")
for att in Attribute:
	print(att)
print("\n\n")
    
print("Countries:\n")
for c in Country:
	print(c)
print("\n\n")

Attributes:

Attribute.ID
Attribute.Country
Attribute.Date
Attribute.Retail_And_Rec
Attribute.Grocery_And_Pharma
Attribute.Parks
Attribute.Transit
Attribute.Workplaces
Attribute.Residential



Countries:

Country.Belgium
Country.Germany
Country.Austria





Next we will write the implementation for our Mobility Manager. Rather than write a markdown paragraph for each part, I will include Python comments that make clear what each section does. The most useful method, as we will see, is **get_attribute**, which loads a particular column from one of our datasets.

In [60]:
import pandas as pd

class MobilityManager:

	#Our CSV Files
	austria_file = "austria.csv"
	belgium_file = "belgium.csv"
	germany_file = "germany.csv"

	"""
	In order to be able to use the standardised 
	attribute enum we created, we will need a dictionary to 
	convert from these attribute enums to the column 
	name that we need from the dataframe.

	For our Countries, we'll store each dataframe itself 
	inside the dict as this is simpler.

	TLDR:
	Dicts Convert 
	(Attribute Enum) => (Column String)
	(Country Enum) => (Country DataFrame)
	"""
	attribute_converter = {}
	country_converter = {}

	def __init__(self):
		#Load Our Dataset From CSV File
		austria_set = self.load_dataset(self.austria_file)
		belgium_set = self.load_dataset(self.belgium_file)
		germany_set = self.load_dataset(self.germany_file)

		#Store Our Dataset In Dict With Enums
		self.country_converter[Country.Austria] = austria_set
		self.country_converter[Country.Belgium] = belgium_set
		self.country_converter[Country.Germany] = germany_set

		#Store Our Attributes in Dict
		attributes = austria_set.columns
		for att_id, att_str in zip(Attribute, attributes):
			self.attribute_converter[str(att_id)] = att_str

	#Load In A Dataset From CSV File
	def load_dataset(self, f):
		return pd.read_csv("./datasets/{}".format(f))

	#Get A Saved Dataset
	def get_set(self, country):
		return self.country_converter[country]

	#Get Attribute Data For A Particular Country
	def get_attribute(self, country, attribute):
		att_str = self.attribute_converter[str(attribute)]
		return self.get_set(country)[att_str]
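
The enum-to-column mapping built in `__init__` relies on `zip`, which pairs items purely by position. The sketch below shows the idea on an in-memory frame; `DemoAttribute`, the column names and the values are assumptions for illustration and do not reflect the real CSV headers.

```python
import enum
import pandas as pd

class DemoAttribute(enum.Enum):
    Date = 0
    Parks = 1

# Hypothetical column names standing in for a CSV header
frame = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02"],
    "parks_change": [-5, -12],
})

# zip pairs the first enum member with the first column, and so on,
# so the mapping is only right if both are declared in the same order
converter = {member: column for member, column in zip(DemoAttribute, frame.columns)}

parks = frame[converter[DemoAttribute.Parks]]
print(parks.tolist())  # [-5, -12]
```

If the columns of a file were ever reordered, this positional pairing would map members to the wrong columns without raising an error, so it is worth checking the header order against the enum.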

## The Transformer
I've named the next class like so due to the manipulations it performs on the data. This class will perform three very important steps for us:
* Smoothing (Rolling Average)
* Resampling
* Differencing

These will be used in various combinations in our visualisations in order to present the clearest picture of what the data is telling us rather than the clearest picture of the original data.

In [61]:
class Transformer:

	def __init__(self):
		return

	#Combines Two Series/Attributes Into One Dataframe
	def _combine(self, A, B, name_A, name_B):
		#Combine Name With Series
		df = { 
		name_A : A, 
		name_B : B
		}

		#Concatenate These Series Into Dataset
		return  pd.concat(df,axis=1)

	#Performs Differencing
	def get_difference(self, y, periods):
		return y.diff(periods=periods)

	#Performs Smoothing
	def get_rolling_mean(self, dates, y, windows):
		#Combine Our Dates & Target Series
		rolling_df = self._combine(dates, y, 'date', 'target')

		#Create Rolling Mean On Target Attribute
		rolling_mean = rolling_df['target'].rolling(
			window=windows).mean()
		return dates, rolling_mean

	#Performs Resampling
	def get_resample(self, dates, y, rule):
		#Combine Our Dates & Target Series
		df = self._combine(dates, y, 'date', 'target')

		#Convert to datetime and set as index
		df['date'] = pd.to_datetime(df.date, format='%Y-%m-%d')
		df = df.set_index('date')

		#Resample our data
		resample = df.target.resample(rule).mean()
		return resample.index, resample.values

## Graph Class & Visualisations
The crux of our visualisations will lie with plotly and its excellent plotting library. We can see the structure of this part of our mobility suite below.

* Graph Parent Class:
    * Visualisation 1: Universal Line Plot 
    * Visualisation 2: Density Plot W/ Rolling Mean
    * Visualisation 3: Resampled Bar W/ Differencing
    
When we put this all together, we aim to have all three plots as subplots in a figure. This will form a very elegant and informative picture of any given attribute in any given Country. 

Firstly, let's set up a very basic graph parent class. This will serve to create a transformer for use by any of our plots.

In [62]:
class Graph(object):
	transformer = Transformer()

	def __init__(self):
		return

Not the most complex code we have seen. However, it's good to create a backbone for our more specific plotting classes. 

### Universal Line Plot
This line plot serves to show us our original data. There will be no transformations applied to this data as we want to keep it completely in line (excuse the pun..) with its original form.

At this point I must mention a very important point that will be relevant for every graph that our suite will create. These graphs are in the context of the pandemic that has swept our world, and thus we see huge declines in most if not all of the attributes. This means that if we plotted the data exactly as it is, all of our graphs would move underneath the x axis and look a bit, well, upside down.

To reconcile this, I've chosen to make each graph represent the decline in an attribute rather than the increase. Every data point will be multiplied by negative 1, thus higher numbers will mean greater decline. I believe this serves to improve the viewer's understanding by not throwing them off with graphs that look strange.

We can see below our code for this Line Plot, which inherits from our previous Graph class.

In [63]:
import numpy as np
import plotly.graph_objects as go

class LinePlot(Graph):
	def plot(self, dates, target):
		#Represent The Decline In An Attribute
		decrease_target = np.multiply(target,-1)

		#Create Line Plot
		return go.Scatter(x=dates,y=decrease_target, showlegend=False)

### Density Plot
Our density plot will be more complex than our previous plot. It will apply smoothing to the data by using a rolling mean. A rolling mean slides a fixed-size window along our data and uses the mean of each window to represent the date at the window's end. The larger our window, the less any single day can move the curve from one period to the next. This will give us a better sense of the significance of change, rather than the confusion that noise and outliers often cause in the original data.
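
Before looking at the class, a small sketch of the window effect may help. The flat series and the single spike are made up for illustration, and the window sizes are arbitrary.

```python
import pandas as pd

# Flat series with one extreme day
values = pd.Series([10.0] * 21)
values.iloc[10] = 80.0

peaks = {}
for window in (3, 7):
    smoothed = values.rolling(window=window).mean()
    # The spike adds (80 - 10) / window to each mean whose window contains it
    peaks[window] = smoothed.max()
    # The first (window - 1) entries are empty: the window is not yet full
    print(window, int(smoothed.isna().sum()), round(peaks[window], 2))
```

The wider window spreads the outlier over more days, so its peak is lower. The trade-off is that a wider window also leaves more empty values at the start of the series.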

In [64]:
class DensityPlot(Graph):
	def plot(self, dates, target, windows):
		#Represent Decline 
		target = np.multiply(target,-1)

		#Retrieve Rolling Mean
		dates, roll_mean = self.transformer.get_rolling_mean(dates, 
			target, windows)

		#Ensure Density Is Filled In
		fill = 'tozeroy'
		return go.Scatter(x=dates,y=roll_mean, fill=fill,showlegend=False)

### Resampled Bar W/ Differencing
Differencing is the crux of why this graph will be so useful. Given a particular time span, it will tell us the difference from time period A to time period B. This should show us when the biggest falls due to the pandemic were and the biggest climbs back up. Our data will be flipped again, so that the bigger the fall the higher the value. 

Additionally, we will apply resampling to this data so that we can have the average for each month rather than working with each day. We will have 8 bins for the 8 months in our data. This is much better than plotting each individual day, and will give us a broader perspective. We will call this class a Resampled Bar.

In [65]:
class ResampledBar(Graph):
	def plot(self, dates, target, rule):
		#Flip The Values
		target = np.multiply(target,-1)

		#Apply Resampling (get_difference is not called here)
		dates, target = self.transformer.get_resample(dates, 
			target, rule)

		#Create The Bar Graph
		return go.Bar(x=dates, y=target,showlegend=False)

## The Mobility Suite
We have now created all the individual elements of our mobility suite. The work we have done thus far really pays off here, as we will be able to easily create new graphs for various Countries and attributes.

The **plot** function will really carry the weight of our visualisations and will relieve us of worrying about the programming details when analysing the graphs. Our interface for working with the data and visualising it has been completed!

In [66]:
from plotly.subplots import make_subplots
import numpy as np

class MobilitySuite:

	#Number Of Rows/Cols Of Subplots
	subplot_rows = None
	subplot_cols = None

	figure = None

	#Mobility Manager Created
	data_manager = MobilityManager()

	def __init__(self, rows, cols, graph_prefs):
        
        #Set Our Class Variables
		self.subplot_rows = rows
		self.subplot_cols = cols
        
		#Create Our Graph Preferences
		self.setup_graph_prefs(rows, cols, graph_prefs)

	"""
	This plotting function below is the powerhouse of our 
	suite. It combines everything we have worked on thus
	far into one function.
	"""
	def plot(self, country, attribute, density_windows, resampling_bar_rule,
            subplot_prefs):
		dates = self.data_manager.get_attribute(country, Attribute.Date)
        
		target = self.data_manager.get_attribute(country, attribute)
        
		#Create A Line Graph
		graph_line = self.get_plot_line(dates, target)

		#Create A Density Graph
		graph_density = self.get_plot_density(dates, 
			target, density_windows)

		#Create A Bar Graph
		graph_bar = self.get_plot_resampled_bar(dates,
			target, resampling_bar_rule)
        
		#Add These Plots As Subplots
		self.add_plots([graph_line, graph_density, graph_bar], subplot_prefs)

    #Call To Lineplot Class
	def get_plot_line(self, dates, target):
		graph_line = LinePlot()
		return graph_line.plot(dates, target)

    #Call To Density Plot Class
	def get_plot_density(self, dates, target, windows):
		graph_density = DensityPlot()
		return graph_density.plot(dates, target, windows)

    #Call To Resampled Bar Class
	def get_plot_resampled_bar(self, dates, target, rule):
		graph_resampled_bar = ResampledBar()
		return graph_resampled_bar.plot(dates, target, rule)

	#Add A List Of Subplots
	def add_plots(self, plots, prefs):
		for i in range(0, self.subplot_rows):
			nxt_plot = plots[i]

			self.figure.add_trace(nxt_plot, 
				row=i+1, col=1)
            
			xaxis_ttl = prefs[i]["xaxis"]
			yaxis_ttl = prefs[i]["yaxis"]
			self.figure.update_xaxes(title_text=xaxis_ttl, row=i+1,col=1)
			self.figure.update_yaxes(title_text=yaxis_ttl, row=i+1,col=1)

	def setup_graph_prefs(self, rows, cols, graph_prefs):     
		line_ttl = graph_prefs["line_title"]
		density_ttl = graph_prefs["density_title"]
		bar_ttl = graph_prefs["bar_title"]
		graph_height = graph_prefs["height"]
		graph_width = graph_prefs["width"]
		graph_title = graph_prefs["title"]
        
		self.figure = make_subplots(rows = rows, 
		cols = cols, subplot_titles=[line_ttl, density_ttl, bar_ttl])
		self.figure.update_layout(title=graph_title, height=graph_height, width=graph_width)
        
    #Plot Correlation Between Two Attributes
	def plot_correlation(self, country, att_one, att_two, subplot_prefs):
		#Load Our Two Comparing Attributes
		series_one = self.data_manager.get_attribute(
			country, att_one)

		series_two = self.data_manager.get_attribute(
			country, att_two)
        
		#Create A Scatter Plot
		scatter = go.Scatter(x=series_one,y=series_two,
			mode='markers')
        
		#Add Our Correlation Plot
		self.add_plots([scatter], subplot_prefs)
        
		#Calculate & Return The Correlation Matrix
		return np.corrcoef(x=series_one,y=series_two)


	#Show The Graph
	def show(self):
		self.figure.show()

# Characterisation & Visualisation
We have finally completed the quite large task of setting up a data visualisation system for our data. This system allows us to easily and safely work through our data and visualise its characteristics.

We will analyse each country separately, moving from the western-most countries to the eastern-most. Let's have a look at a map and work from that!

![Belgium, Germany & Austria](./res/european_countries.png)

We can see that Belgium is the western-most country of the three and therefore we will begin with it. Let's jump into the data!

# Belgium 2020: A Mobility Analysis
In analysing a time series well, we must comment on and characterise a few aspects of the graph.
For each visualisation, I will comment on each of these characteristics in order to ensure we have covered all bases.

What are these characteristics you may ask? Well, here's a list of them:
* **Is there a trend?** On average do the data decrease or increase over time?
* **Is there seasonality?** Do we see regularly repeating patterns in different time periods?
* **Are there many outliers?**
* **Are there any abrupt changes** to the variance in the series?

Here's a refresher on the relevant mobility-related attributes for this part of the project:
* **Retail & Recreation**
* **Grocery & Pharmaceuticals**
* **Parks**
* **Transit**
* **Workplaces**
* **Residential**

## Belgium 2020: Retail & Recreation

We'll begin by setting up our mobility suite and plotting the retail and recreation data for Belgium in 2020. We can finally get a view of our hard work thus far!

In [67]:
#Belgium 2020: Retail & Recreation
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Belgium
attribute = Attribute.Retail_And_Rec
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Representation Of Decrease In Retail & Recreation 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

We have lift off! We can see our universal line plot of the original data, our density plot displaying the rolling mean and lastly our bar chart with differencing and resampling applied.

#### Trend
Even in the original, noisy data we can see a very clear trend in this data. We can see a rapid percentage decrease in the amount of retail and recreational mobility starting approximately March 13th. Our original data is very useful here as we can see exact dates of changes. 

#### Seasonality
We can see weekly spikes and declines that represent how people live differently during the week versus at the weekend. This is normal and would occur without the coronavirus pandemic. This seasonality is filtered out in our rolling mean graph and doesn't exist in our differencing bar plot.

#### Outliers
There are multiple notable outliers that we can see from the original data. A particularly noticeable data point is August 15th, where we see a rapid percentage decline and then a quick restoration. Viewed in isolation, March 14th would also appear to be an outlier, but it is merely the beginning of the decline in retail and recreation.

#### Abrupt Changes
We see many abrupt changes to variance due to the coronavirus pandemic, which produces some large deviations from the mean. On a more local level we do see constant variance; however, our differencing chart makes it clear that there are many quite significant changes from month to month.

## Belgium 2020: Grocery & Pharma
We'll now examine the grocery and pharmaceutical mobility for Belgium in 2020. 

In [68]:
#Belgium 2020: Grocery & Pharma
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Belgium
attribute = Attribute.Grocery_And_Pharma
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Representation Of Decrease In Grocery & Pharmaceutical 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trend
There is a clear trend in the months of March and April as we see a rapid percentage decrease in grocery and pharmaceutical mobility. This levels off however and we see less of a trend for the remainder of the year.

#### Seasonality
There is a clear lack of long-term seasonality in these graphs. Once we account for the trends of the pandemic, we can see a relatively stable month-by-month level of grocery and pharma mobility. We can, however, again see a weekly shift from weekday to weekend. Perhaps people tend to do their grocery shopping at the weekend.

#### Outliers
There are some really interesting outliers here. On April 13th, May 1st, May 21st, June 1st, July 21st and August 15th we see quick spikes (i.e. percentage declines) in the grocery and pharmaceutical mobility.

#### Abrupt Changes
In the second half of the year we see far fewer abrupt changes to the data. It is primarily in the months of March and April that we see a huge impact on the grocery and pharma mobility data.

## Belgium 2020: Parks

We'll now visualise and characterise the park mobility data for Belgium in 2020.

In [69]:
#Belgium 2020: Parks
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Belgium
attribute = Attribute.Parks
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Representation Of Decrease In Park Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

We can immediately see a few distinct differences of this graph to the previous ones. We will explore these differences further in the next section.

#### Trend
There is a small trend towards a decrease in the number of people in parks in the early stages of the pandemic. However, the more interesting results are in seasonality.

#### Seasonality
We see a very good example of seasonality here during the months of July/August/September. This is likely due to the yearly increase in parks being used every summer due to the good weather.

#### Outliers
There are two significant outliers on June 1st and July 21st. We see a rapid increase in the amount of mobility for the dates of these two outliers.

#### Abrupt Changes
We notice that there are fewer abrupt changes than in previous graphs. The changes to park mobility are less of a trend from the pandemic and more a result of seasonal changes in temperature. However, we do still see some smaller abrupt changes during the beginning of the pandemic.

## Belgium 2020: Transit
We will now look at the transit mobility data for Belgium in 2020.

In [70]:
#Belgium 2020: Transit
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Belgium
attribute = Attribute.Transit
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Representation Of Decrease In Transit Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trend
We can see a significant decline at the beginning of the year in transit due to the coronavirus pandemic. This trend levels off then as the year continues.

#### Seasonality
There is a weekly seasonality that can be accounted for in this data due to the classic weekday/weekend disparity. Rush hour traffic will always greatly influence the data for this attribute.

#### Outliers
There are not many significant outliers in this case, meaning that a significant spike in transit is less likely perhaps than other attributes. Two notable outliers are perhaps April 13th and May 1st.

#### Abrupt Changes
In our differencing graph we can see the abrupt percentage decreases in the early months of 2020 due to the pandemic. Many people worked from home as Belgium entered lockdown and thus the seasonality of this data is also affected by these abrupt changes.

## Belgium 2020: Workplaces
We will now characterise and visualise the workplace mobility data from Belgium in 2020.

In [71]:
#Belgium 2020: Workplaces
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Belgium
attribute = Attribute.Workplaces
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Representation Of Decrease In Workplace Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trend
We see a trend towards decline in the amount of mobility at the beginning of the pandemic. This eases off, but never returns to baseline (pre-covid) levels.

#### Seasonality
We see a clear seasonality in this data both by week and by yearly season. Each week there is a decline in workplace mobility at the weekend. During the summer there is a decline in workplace mobility due to people often taking time off during these months. However, perhaps that effect has been reduced with the lack of holiday options for people during this time.

#### Outliers
We see outliers on April 13th, May 1st, May 21st, June 1st and July 21st.

#### Abrupt Changes
There are far fewer abrupt changes to workplace mobility in this case, perhaps due to the need of many workplaces for staff to be present. We see slower inclines and declines in the data, perhaps representing a gradual easing of workers towards remote working.

## Belgium 2020: Residential
We will now characterise and visualise the residential data for Belgium in 2020.

In [72]:
#Belgium 2020: Residential
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Belgium
attribute = Attribute.Residential
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Representation Of Decrease In Residential Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trends
There is a noticeable difference here to our previous graphs. At the beginning of the pandemic we see a large trend towards an increase in overall residential mobility (i.e. a decline in our curve, in which higher values represent a decrease in residential mobility). This is clearly the effect of many more people isolating themselves in their homes.

#### Seasonality
We see constant fluctuations from week to week in residential mobility representing it as a very seasonal attribute. We also see a change during the summer months.

#### Outliers
There are noticeably fewer outliers in these graphs. The largest outlier is on May 1st.

#### Abrupt Changes
We see an abrupt increase in overall residential mobility at the beginning of the pandemic, due to people deciding to stay in their homes.

# Comparison Of Belgium's Attributes
We have looked in detail at the characterisation and visualisation of each attribute; however, we would like to understand more about how these attributes relate to each other.

## The Early Months
Firstly, due to the pandemic we can see that there is a strong correlation (either positive or negative) in the earlier months of the year between the various attributes. For example, as overall residential mobility abruptly increased in April & May, overall workplace mobility abruptly fell.

This is likely to be a recurring theme in our datasets for the year 2020. We will see strong correlations in these months of April and May that we wouldn't have seen in any other scenario. This universal change to every attribute could only have been caused by something as all-encompassing as a pandemic or similar large event.

This abrupt change from the pandemic eases over time in every attribute. This is perhaps surprising, as Belgium [entered a second lockdown][1] which we may expect to cause a similar abrupt change. However, many countries were much more prepared for the second lockdown than the first.

## Recurring Patterns
There are more ways that the data changed as time went on. We saw some recurring patterns in the data. Often these patterns were due to seasonality, such as workplace mobility's weekly ups and downs.

We often saw a common humpback due to the two general waves of the pandemic that have spread across Belgium. We can represent this pattern as shown below.

![Humpback Pattern](./res/belgium_humpback.png)

[1]: https://www.politico.eu/article/belgium-announces-second-coronavirus-lockdown/

## Correlation Between Attributes
I believe that there is medium/strong correlation between many attributes in the dataset due in part to the universal effects of the coronavirus pandemic. This is what we will examine with three different pairs of attributes.

I will use a function that I have not previously used from our Mobility Suite: **plot_correlation**.
This function will return **Pearson product-moment correlation coefficients** and additionally plot our two attributes on a graph.
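For readers who have not met this matrix before, here is a minimal sketch of how such a matrix can be computed. I am assuming that **plot_correlation** relies on something like NumPy's `np.corrcoef`; that is an assumption on my part, and the values below are illustrative rather than taken from the dataset.

```python
import numpy as np

# Illustrative values only (NOT the real Belgian mobility data).
workplaces = np.array([-60.0, -52.0, -40.0, -25.0, -18.0, -10.0])
residential = np.array([24.0, 21.0, 15.0, 11.0, 6.0, 5.0])

# np.corrcoef treats each input as one variable and returns a 2x2 matrix.
# The diagonal holds each variable's correlation with itself (always 1);
# both off-diagonal entries hold the coefficient between the two variables.
corrcoef_mat = np.corrcoef(workplaces, residential)
r = corrcoef_mat[0, 1]

print(corrcoef_mat)
print(f"r = {r:.3f}")
```

Because the matrix is symmetric, the single number we care about can be read from either off-diagonal position.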

We will begin with a comparison of workplace mobility and residential mobility.

In [73]:
#Belgium 2020: Workplaces vs Residential
num_subplot_rows = 1
num_subplot_cols = 1

country = Country.Belgium
attribute_one = Attribute.Workplaces
attribute_two = Attribute.Residential

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Correlation Of Workplace & Residential Mobility In 2020",
    "line_title" : None,
    "density_title" : None,
    "bar_title" : None,
    "width" : 700,
    "height" : 700
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Workplaces (% Change)",
        "yaxis" : "Residential (% Change)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
corrcoef_mat = suite.plot_correlation(country, attribute_one, attribute_two, subplot_prefs)
suite.show()

print("Pearson Product-Moment Correlation Coefficient Matrix")
print(corrcoef_mat)

Pearson Product-Moment Correlation Coefficient Matrix
[[ 1.         -0.85346254]
 [-0.85346254  1.        ]]


Interesting!
Our matrix has given us a correlation of approximately -0.85 between Workplace & Residential mobility. What does this tell us? Well, the absolute value of 0.85 tells us that there is a very strong correlation between these two variables. The negative sign tells us that this correlation is negative/inverse. Therefore, we can say that there is a **strong inverse correlation between workplace and residential mobility**.
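The reading of sign and magnitude described above can be written down as a small helper. This is a sketch only: the function name and the cut-offs (0.8 for strong, 0.5 for medium) are rule-of-thumb assumptions of mine, and other texts draw the lines elsewhere.

```python
def describe_correlation(r):
    """Return (direction, strength) for a Pearson coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a Pearson coefficient must lie in [-1, 1]")

    # Strength depends only on the absolute value.
    # Assumed rule-of-thumb cut-offs, not a universal standard.
    magnitude = abs(r)
    if magnitude >= 0.8:
        strength = "strong"
    elif magnitude >= 0.5:
        strength = "medium"
    else:
        strength = "weak"

    # Direction depends only on the sign.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "inverse"
    else:
        direction = "none"
    return direction, strength


print(describe_correlation(-0.85346254))
```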

This makes sense! As more people moved to remote work from home, fewer people worked in the workplace. Additionally, even in normal times we would expect a negative correlation between these two variables. This means that each attribute would predict the other pretty well! Linear Regression, anyone?
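To make the link to linear regression concrete: for a straight-line fit with an intercept, the share of variance explained (R²) equals the square of the Pearson coefficient, so a coefficient of about -0.85 corresponds to roughly 73% of the variance explained. The sketch below checks that identity on illustrative values (not the real data), using `np.polyfit` for the fit.

```python
import numpy as np

# Illustrative values only (NOT the real mobility data).
workplaces = np.array([-60.0, -52.0, -40.0, -25.0, -18.0, -10.0])
residential = np.array([24.0, 19.0, 17.0, 9.0, 8.0, 3.0])

# Least-squares fit of: residential = slope * workplaces + intercept
slope, intercept = np.polyfit(workplaces, residential, deg=1)
predicted = slope * workplaces + intercept

# R^2: share of the variance in residential explained by the fitted line.
ss_res = np.sum((residential - predicted) ** 2)
ss_tot = np.sum((residential - residential.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Pearson coefficient of the same two variables.
r = np.corrcoef(workplaces, residential)[0, 1]

print(f"slope = {slope:.3f}, R^2 = {r_squared:.3f}, r^2 = {r ** 2:.3f}")
```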

We will also look at the correlation between Grocery & Pharma mobility with Retail & Recreational data. We may intuitively expect a positive correlation here.

In [74]:
#Belgium 2020: Grocery & Pharma vs Retail & Recreation
num_subplot_rows = 1
num_subplot_cols = 1

country = Country.Belgium
attribute_one = Attribute.Grocery_And_Pharma
attribute_two = Attribute.Retail_And_Rec

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Correlation Of Grocery/Pharma & Retail/Rec Mobility In 2020",
    "line_title" : None,
    "density_title" : None,
    "bar_title" : None,
    "width" : 700,
    "height" : 700
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Grocery/Pharma (% Change)",
        "yaxis" : "Retail/Rec (% Change)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
corrcoef_mat = suite.plot_correlation(country, attribute_one, attribute_two, subplot_prefs)
suite.show()

print("Pearson Product-Moment Correlation Coefficient Matrix")
print(corrcoef_mat)

Pearson Product-Moment Correlation Coefficient Matrix
[[1.        0.6987586]
 [0.6987586 1.       ]]


Here we have a correlation coefficient of approximately 0.7 between the two variables. This indicates a **medium/high positive correlation** between them. These two variables would be okay predictors of each other, but not excellent.

We can see that when people are more likely to be grocery/pharma shopping, they are additionally more likely to be in retail/recreational outlets. This makes sense, as these are complementary events.

## Why The Difference In Outliers?
You may have noticed during our characterisation of the data that some outliers occurred regularly in particular attributes and never in others. Why do these differences occur?

There are multiple recurring outliers in these graphs that we have recorded in our characterisations. One notable outlier is July 21st. It occurs in attributes such as workplace and grocery/pharma, but not in transit or residential. 

As I characterised each graph, I was more and more confused as to why July 21st kept reappearing in some and not others. I assumed that it must be to do with the restrictions.

I decided to do some investigation. Using the [Wayback Machine][1], which allows one to go "back in time" on the internet, I navigated to [The Brussels Times][2] for July 21st. What I found was this main front page.

![Brussels Times Front Page](res/belgium_brussels_times_main.png)

"Ah, okay!", I thought. It must be due to the announcement of new restrictions. Everyone's trying to get out of the house before they come in.

However, this didn't fully make sense. The announcement was only about the possibility of new restrictions, so I didn't see why it would cause such a universal outlier in the dataset compared to other announcements.
Then, I noticed another article hidden on the bottom right of the page...

![Brussels Times Article](res/belgium_brussels_times_small.png)


I found [the article][3], and finally our outlier of July 21st made sense. It's a national holiday in Belgium!

![National Day](res/belgium_national_day.png)

This explains the differences in this outlier appearing in some attributes and not others. A national holiday will affect some attributes and not others, while a pandemic tends to affect them all.

[1]: https://archive.org/
[2]: https://www.brusselstimes.com/
[3]: https://www.brusselstimes.com/news/belgium-all-news/122631/what-does-belgium-celebrate-on-its-national-day/

# Germany 2020: A Mobility Analysis
We now move on to the next country in our analysis: Germany! Germany has been widely applauded and also criticised for its approaches in fighting the pandemic. Let's dive into the data and find out how their year has truly panned out.

## Germany 2020: Retail & Recreation
We'll begin by characterising and visualising retail and recreation in Germany for the year 2020. 

In [75]:
#Germany 2020: Retail & Recreation
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Germany
attribute = Attribute.Retail_And_Rec
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Germany: Representation Of Decrease In Retail & Recreation Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trends
We see a trend towards an overall decline in retail and recreational mobility towards the beginning of the pandemic. This eventually returns quite close to its baseline, which is perhaps surprising.

#### Seasonality
We don't see as much seasonality in this data. There is perhaps the weekly seasonality of people that tend towards retail and recreation at the weekend, however it's not as obvious. Sundays in Germany have a very significant lack of retail, as every business closes for the day (except in some cities). This may offset any seasonality that would be present at the weekend vs weekday.

#### Outliers
We see significant outliers on October 3rd and May 1st.
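The outliers in this analysis were picked out by eye. As a rough sketch of how the same step could be automated, the snippet below flags days that lie far from the median, measured in multiples of the median absolute deviation (MAD). The data is synthetic, with a spike deliberately placed on October 3rd, and both the function name and the threshold `k` are assumptions of mine.

```python
from datetime import date, timedelta
from statistics import median

# Synthetic daily values (NOT the real data), with one spike on 3 October.
start = date(2020, 9, 26)
values = [10, 12, 11, 13, 12, 11, 10, 45, 12, 11, 13, 12, 10, 11]
series = {start + timedelta(days=i): v for i, v in enumerate(values)}


def flag_outliers(series, k=5.0):
    """Return the dates whose value is more than k MADs from the median."""
    centre = median(series.values())
    mad = median(abs(v - centre) for v in series.values())
    # Note: if mad is 0, every value that differs from the median is flagged.
    return [d for d, v in series.items() if abs(v - centre) > k * mad]


print(flag_outliers(series))
```

This simple version ignores weekly seasonality, so on strongly seasonal attributes it would be better applied after the weekly pattern has been removed.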

#### Abrupt Changes
We see an abrupt change in retail and recreational mobility in Germany at the beginning of the pandemic. However, after this change there are primarily slow and gradual changes for the rest of the year.

## Germany 2020: Grocery & Pharma
We will now characterise and visualise the grocery and pharmaceutical data for Germany in 2020.

In [76]:
#Germany 2020: Grocery & Pharma
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Germany
attribute = Attribute.Grocery_And_Pharma
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Germany: Representation Of Decrease In Grocery & Pharma Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trends 
This dataset clearly stands out as different from the rest. Towards the beginning of the pandemic we actually see increases in overall grocery and pharma mobility. We see a stark shift away from this as the pandemic progresses and then a trend back towards more grocery and pharma mobility as the summer comes to an end. 

#### Seasonality
Through the months July-October we see a clear seasonality. The mobility data has weekly seasonality, as people like to grocery shop at the weekend.

#### Outliers
We see very significant outliers in this data. We can tell these are significant as they are affecting our rolling mean graph's smoothness. We see early-pandemic outliers on April 10th, 13th and May 1st. We see a more recent outlier on October 3rd, towards the beginning of new restrictions in Germany.
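To see why a single outlier can disturb the rolling mean's smoothness, consider the sketch below. It uses a synthetic series (not the real data) and the same window of 20 as `rolling_mean_windows`. I am assuming a trailing window here; the suite may well centre its window instead.

```python
import pandas as pd

# Synthetic series (NOT the real data): flat at 10 with a single spike of 80.
days = pd.date_range("2020-03-01", periods=60, freq="D")
daily = pd.Series(10.0, index=days)
daily.iloc[30] = 80.0

# Trailing 20-day rolling mean. The first 19 values are NaN because the
# window is not yet full.
smoothed = daily.rolling(window=20).mean()

# The spike raises every window that contains it by (80 - 10) / 20 = 3.5,
# so one extreme day leaves a 20-day plateau in the smoothed curve.
plateau_days = int((smoothed > 10.5).sum())

print(smoothed.max(), plateau_days)
```

A larger window would lower the plateau but stretch it over more days, which is the usual trade-off when choosing the window size.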

#### Abrupt Changes
We see a clear change in our differencing graph from July to August 2020. People move from less grocery shopping to more quite suddenly, and this trend continues into recent times. Notice that the percentage change stays within the 0-20 percent range, which is quite low for the effects of the pandemic.

## Germany 2020: Parks
We will now characterise and visualise the park mobility data for Germany in 2020.

In [77]:
#Germany 2020: Parks
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Germany
attribute = Attribute.Parks
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Germany: Representation Of Decrease In Park Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trends
It is far less clear in our original data, but our differencing graph clearly displays a downward trend in the decrease of park mobility, representing a monthly increase in overall mobility. 

#### Seasonality
Seasonality is more difficult to make out here than one would expect; however, we can see the effect of the summer on park mobility in our rolling mean graph.

#### Outliers
We see two significant outliers on May 21st and June 21st. Interestingly, May 21st is Ascension Day and June 21st is the Summer Solstice. These may have caused people to increase their overall park mobility.

#### Abrupt Changes
Our differencing graph shows far fewer abrupt changes in this dataset than in others. Park mobility shows steady inclines and declines as the year progresses.

## Germany 2020: Transit
We will now characterise and visualise the transit mobility data for Germany in 2020.

In [78]:
#Germany 2020: Transit
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Germany
attribute = Attribute.Transit
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Germany: Representation Of Decrease In Transit Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trends
After the initial change due to the pandemic we see a gradual trend downward in the curve, implying an overall gradual increase in mobility as the year progresses. 

#### Seasonality
While more difficult to see in the earlier months, we begin to see a seasonality in the transit data for Germany from early summer onwards. This is a weekly seasonality.

#### Outliers
There are a few minor outliers such as April 13th and May 1st. 

#### Abrupt Changes
On our rolling mean graph we can see very clearly the abrupt change in April 2020 towards an overall decrease in transit use (represented by the upward curve). This is the only very abrupt change we see this year.

## Germany 2020: Workplaces
We will now characterise and visualise the workplace data for Germany in 2020.

In [79]:
#Germany 2020: Workplaces
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Germany
attribute = Attribute.Workplaces
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Germany: Representation Of Decrease In Workplace Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trends
Our graph appears to have less of a trend, but this is due to the change in axis. We see that the range has increased from previous graphs due to the very large outliers. While our trends appear smaller, they are actually larger than in previous graphs.

#### Seasonality
We see a clear workplace seasonality by week. This indicates to us the difference in workplace mobility during the weekends and during weekdays. Our differencing graph flattens this out.

#### Outliers
We see huge outliers that affect our data tremendously in this example. Examples are April 13th and May 21st. These two dates are common outliers in the dataset.

#### Abrupt Changes
On our differencing graph we can clearly see the abrupt changes in May of 2020. After this, the changes stabilise.
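The sketch below illustrates why a monthly differencing view flattens the weekly cycle while still exposing an abrupt change. It uses a synthetic series (not the real data) with a weekly pattern and a sudden shift on 1 May. The order of operations, monthly mean first and difference second, is my assumption about what the suite does.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (NOT the real data): a weekday/weekend pattern,
# plus a sudden drop of 10 points from 1 May onwards.
days = pd.date_range("2020-03-01", "2020-06-30", freq="D")
weekly = np.where(days.dayofweek >= 5, 30.0, 20.0)
shift = np.where(days >= pd.Timestamp("2020-05-01"), -10.0, 0.0)
daily = pd.Series(weekly + shift, index=days)

# One mean value per calendar month, then the change between
# consecutive months (the first month has no predecessor, so it is NaN).
monthly = daily.groupby(daily.index.to_period("M")).mean()
monthly_change = monthly.diff()

print(monthly_change)
```

The weekly swing of 10 points all but vanishes in the monthly means, while the May shift stands out as by far the largest bar.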

## Germany 2020: Residential
We will now characterise and visualise the residential data for Germany in 2020.

In [80]:
#Germany 2020: Residential
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Germany
attribute = Attribute.Residential
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Germany: Representation Of Decrease In Residential Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trend
Our rolling mean shows the downward curve this year which represents an overall upward trend in residential mobility. It is one of the few attributes where the trend is in this direction.

#### Seasonality
We see a clear weekly seasonality occurring as people tend towards less or more residential mobility during different parts of the week.

#### Outliers
Two examples of distinct outliers are April 10th and May 1st. We are beginning to notice a pattern in the outliers in each data attribute.

#### Abrupt Changes
We can see the abrupt increase in overall residential mobility through our differencing bar chart in May 2020. This increase gradually eases off as the year progresses. However it bounces back in the recent months, likely due to the second German lockdown.

# Comparison Of Germany's Attributes
Germany is the country of most interest to me of these three, as I was actually working on a construction project there for the month of August 2020. It's really interesting to see the data points that correspond to this period on the graphs above. It was notably hot during this time, and I imagine this has had an effect on these datasets.

## Changes During Summer
Let's for a moment examine only the **changes** to the curve during the German summer of 2020, with respect to the park mobility data. Note that higher values in this data still represent **a decrease in park mobility**. Thus, a lower value represents higher park mobility. Park mobility is especially interesting during this time as it is greatly influenced by the hotter parts of the year.

![Germany Summer Curve](./res/germany_park_structure.png)

We see that there is an overall general increase in park mobility during the Summer, represented by the downward curve. Note that the later section of curve, as we move into August, is a considerably popular time for park mobility. This is due to the fact that it was the hottest month they had experienced that Summer, and it certainly felt it!

## Correlation Between Attributes
We will now examine some of the correlations or lack thereof between our location attributes in Germany for 2020. I believe an interesting comparison would be workplace mobility and transit mobility. Regularly, these would have a strong positive correlation. This is due to the high transit mobility seen during the work-week. However, will we see that this time? The shift to remote work may reduce this effect. Let's find out!

In [87]:
#Germany 2020: Transit vs Workplaces
num_subplot_rows = 1
num_subplot_cols = 1

country = Country.Germany
attribute_one = Attribute.Transit
attribute_two = Attribute.Workplaces

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Germany: Correlation Of Transit & Workplace Mobility In 2020",
    "line_title" : None,
    "density_title" : None,
    "bar_title" : None,
    "width" : 700,
    "height" : 700
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Transit (% Change)",
        "yaxis" : "Workplaces (% Change)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
corrcoef_mat = suite.plot_correlation(country, attribute_one, attribute_two, subplot_prefs)
suite.show()

print("Pearson Product-Moment Correlation Coefficient Matrix")
print(corrcoef_mat)

Pearson Product-Moment Correlation Coefficient Matrix
[[1.         0.75095538]
 [0.75095538 1.        ]]


The effect remains strong! We can see a medium/strong positive correlation between transit and workplace mobility. This correlation isn't perfect, however; it's bordering on a medium correlation. I would hypothesise that the effect in a regular year would be even stronger, but it's difficult to know for sure.

Let's also examine the correlation between residential mobility and grocery/pharma mobility. The reason I am interested in these two attributes is that there were notable grocery panics when each lockdown came into effect in Germany. This may correlate to upticks in people isolating.

In [88]:
#Germany 2020: Residential vs Grocery & Pharma
num_subplot_rows = 1
num_subplot_cols = 1

country = Country.Germany
attribute_one = Attribute.Residential
attribute_two = Attribute.Grocery_And_Pharma

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Germany: Correlation Of Residential & Grocery/Pharma Mobility In 2020",
    "line_title" : None,
    "density_title" : None,
    "bar_title" : None,
    "width" : 700,
    "height" : 700
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Residential (% Change)",
        "yaxis" : "Grocery/Pharma (% Change)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
corrcoef_mat = suite.plot_correlation(country, attribute_one, attribute_two, subplot_prefs)
suite.show()

print("Pearson Product-Moment Correlation Coefficient Matrix")
print(corrcoef_mat)

Pearson Product-Moment Correlation Coefficient Matrix
[[ 1.         -0.61059779]
 [-0.61059779  1.        ]]


Ah, interesting! This is not what I expected. We can thank the scientific method for proving me wrong.

We see here that our Pearson Product-Moment Correlation Coefficient Matrix (try saying that quickly...) has given us an approximate correlation coefficient of -0.61.

This means that there is a **medium inverse correlation between Grocery/Pharma mobility and Residential mobility**. As people stay more in their homes, they are somewhat less likely to be out in the shop or pharmacy.

The reasoning for this is actually quite clear, now that we have examined the correlation. We can see that there was a [huge increase in online shopping in Germany][1] during the year 2020. These increases were driven by people's worry about leaving the home and thus dramatically reduced grocery/pharma mobility during these periods.

[1]: https://ecommercenews.eu/ecommerce-in-germany-e103-4-billion-in-2020/#:~:text=Ecommerce%20in%20Germany%20is%20expected,94%20billion%20euros%20last%20year.&text=Last%20year%2C%2084%20percent%20of,of%2085%20percent%20is%20expected.

# Differences Between Attributes: What's With These Outliers!?
When examining Belgium, we noted the differences in outliers between various attributes. In that example, there was a mix between outliers affecting some attributes and not others. However, with Germany there are outliers that remain present for almost every attribute.

One notable example is October 3rd. We see October 3rd appearing as a spike in most attributes and will investigate to find the root cause. I hypothesise that this is, as in Belgium, a significant day in Germany. I do not believe it is pandemic-related, as I was following the news at the time and do not remember this being the date of new restrictions.

After a quick moment of research, October 3rd turns out to be [German Unity Day][1], a day that celebrates the reunification of Germany. This was surprising, as I had believed that the fall of the Berlin Wall (November 9th) was the date celebrated. This is a huge national holiday in Germany and explains the spikes we see across location attributes on this day.

![German Unity Day](./res/germany_unity_day.jpg)
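Checking outliers against public holidays by hand worked for two dates, but it could be turned into a lookup. The sketch below is a minimal illustration: the table is hand-written and deliberately contains only the two holidays identified in this analysis, and the function name is hypothetical. A real check would need a complete holiday calendar for each country.

```python
from datetime import date

# A deliberately tiny, hand-written table holding only the two holidays
# identified in this analysis. It is NOT a complete holiday calendar.
KNOWN_HOLIDAYS = {
    ("Belgium", date(2020, 7, 21)): "Belgian National Day",
    ("Germany", date(2020, 10, 3)): "German Unity Day",
}


def explain_outliers(country, outlier_dates):
    """Map each outlier date to a holiday name, or None if not in the table."""
    return {d: KNOWN_HOLIDAYS.get((country, d)) for d in outlier_dates}


german_outliers = [date(2020, 4, 13), date(2020, 10, 3)]
for day, holiday in explain_outliers("Germany", german_outliers).items():
    print(day.isoformat(), "->", holiday or "not in table")
```

A date that comes back as "not in table" is not necessarily unexplained; it only means this small table has no entry for it.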

[1]: https://en.wikipedia.org/wiki/German_Unity_Day

# Austria 2020: A Mobility Analysis
We've now characterised and visualised each attribute for Belgium and Germany. We are noticing many distinct similarities and differences between countries and their approach to tackling the coronavirus pandemic. We will now move on to our final country: Austria. 

## Austria 2020: Retail & Recreational Mobility
We will begin by characterising and visualising the retail and recreation data for Austria in 2020.

In [81]:
#Austria 2020: Retail & Recreation
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Austria
attribute = Attribute.Retail_And_Rec
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Austria: Representation Of Decrease In Retail & Recreational Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trend
Immediately we can see the trend towards an overall decline in retail and recreational mobility in the earlier months of the year. This is due to the effects of the pandemic. As the year progresses we can see on our rolling mean a gradual tendency towards more overall retail and recreational mobility, represented by a decline in our curve.

#### Seasonality
We see a rough form of weekly seasonality in the data. This is not as clear and distinct as in perhaps workplace mobility.

#### Outliers
There are not as many relatively large outliers in this dataset. We can see on August 15th there was quite a large spike in retail and recreational mobility.

#### Abrupt Changes
We can see the abrupt change due to the pandemic in the early months of 2020. This eases off as the year progresses.

## Austria 2020: Grocery & Pharma Mobility
We will now characterise and visualise the grocery and pharma mobility data for Austria in 2020.

In [82]:
#Austria 2020: Grocery & Pharma
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Austria
attribute = Attribute.Grocery_And_Pharma
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Austria: Representation Of Decrease In Grocery & Pharma Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trend
We see a very quick trend towards an overall decline in grocery and pharmacy mobility at the beginning of the year and then a gradual movement towards baseline. This is most clear in our differencing graph, where we can see May 2020 as a stark contrast to other months.

#### Seasonality
We see a weekly seasonality in the grocery and pharmacy mobility data indicating the different parts of the week that people are likely to go shopping. We do not see any seasonality with respect to the differences in yearly seasons.

#### Outliers
We see significant outliers on May 1st, May 21st and August 15th. Note that August 15th was an outlier we have seen previously.

#### Abrupt Change
We see a very stark decline in overall mobility due to the coronavirus pandemic at the beginning of the year.

## Austria 2020: Park Mobility
We will now characterise and visualise the park mobility data for Austria in 2020.

In [83]:
#Austria 2020: Parks
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Austria
attribute = Attribute.Parks
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Austria: Representation Of Decrease In Park Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trend
At the beginning of the pandemic we see a general trend towards less time spent outdoors, however this quickly develops into far more time spent in parks than usual. This is likely due to the effect of the lockdown making people use parks to their advantage and additionally the change of seasons.

#### Seasonality
There is a clear seasonality corresponding to the yearly summer season. Summer is hotter and therefore park mobility trends upward. 

#### Outliers
There aren't as many clear outliers in this data as previous attributes. We can see an outlier occurring on June 1st.

#### Abrupt Changes
There is a 50% decline in park mobility at the beginning of the pandemic and then a very large overall 100% increase in park mobility. However this change is not as abrupt as in previous attributes.

## Austria 2020: Transit Mobility
We will now characterise and visualise the transit mobility data for Austria in 2020.

In [84]:
#Austria 2020: Transit
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Austria
attribute = Attribute.Transit
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Austria: Representation Of Decrease In Transit Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trend
After the initial quick decline of overall transit mobility, we see a trend back towards the baseline as the year progresses.

#### Seasonality
There is a weekly seasonality in transit mobility data for Austria in 2020, due to the difference in the use of transit during the working week and on the weekend.
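
One rough way to confirm a weekly pattern like this is to average the series by day of the week. The sketch below is only an illustration: the dates and values are synthetic stand-ins rather than the real Austrian data, and `mean_by_weekday` is a hypothetical helper that is not part of the `MobilitySuite`.

```python
from collections import defaultdict
from datetime import date, timedelta

# Synthetic daily series (NOT the real Austrian data): weekdays sit at -30,
# weekends at -10, for four full weeks starting on 2 March 2020.
start = date(2020, 3, 2)
dates = [start + timedelta(days=i) for i in range(28)]
values = [-10.0 if d.weekday() >= 5 else -30.0 for d in dates]


def mean_by_weekday(dates, values):
    """Return {weekday_index: mean value}, where Monday is 0 and Sunday is 6."""
    buckets = defaultdict(list)
    for d, v in zip(dates, values):
        buckets[d.weekday()].append(v)
    return {day: sum(vs) / len(vs) for day, vs in sorted(buckets.items())}


weekday_means = mean_by_weekday(dates, values)
print(weekday_means)
```

If the weekend means sit clearly apart from the weekday means, that supports a weekly seasonality rather than random noise.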

#### Outliers
We do not see as many stark outliers in this dataset. Two examples of outliers would be June 11th and October 26th.

#### Abrupt Changes
Our rolling mean shows quite nicely the abrupt change in transit mobility as Austria entered its first lockdown.

## Austria 2020: Workplace Mobility
We will now characterise and visualise the workplace mobility data in Austria for 2020.

In [85]:
#Austria 2020: Workplaces
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Austria
attribute = Attribute.Workplaces
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Austria: Representation Of Decrease In Workplace Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trends
There are fewer visible trends in this data, besides the initial decline in workplace mobility due to the large switch to remote working. The differencing graph at the bottom is quite stable.

#### Seasonality
There is a clear yearly summer seasonality here, with fewer people in the workplace during the summertime. Additionally, we have the weekly seasonality due to weekday/weekend workplace changes.

#### Outliers
We can see very stark outliers on May 21st, June 1st and June 11th. These have been common outliers in our attributes thus far.
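
Outliers like these are picked out by eye here, but they could also be flagged programmatically. The following is a minimal sketch under stated assumptions: it compares each point with the median of its neighbours, and the window size of 7 and threshold of 15 percentage points are arbitrary illustrative choices. `flag_outliers` is a hypothetical helper and the values are synthetic.

```python
from statistics import median


def flag_outliers(values, window=7, threshold=15.0):
    """Return indices whose value differs from the centred rolling median
    by more than `threshold` (both parameters are illustrative guesses)."""
    half = window // 2
    flagged = []
    for i, v in enumerate(values):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        if abs(v - median(values[lo:hi])) > threshold:
            flagged.append(i)
    return flagged


# Synthetic series (NOT real data): a flat -30 with one sharp single-day dip.
values = [-30.0] * 10 + [-75.0] + [-30.0] * 10
outlier_idx = flag_outliers(values)
print(outlier_idx)
```

The median is used rather than the mean so that the outlier itself does not drag the reference level towards it.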

#### Abrupt Changes
Our differencing graph shows nicely the abrupt initial change of the pandemic and then the gradual changes that occur the rest of the year in Austria.

## Austria 2020: Residential Mobility
We will now characterise and visualise residential mobility in Austria during 2020.

In [86]:
#Austria 2020: Residential
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Austria
attribute = Attribute.Residential
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Austria: Representation Of Decrease In Residential Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trends
Our rolling mean shows us the stark abrupt change at the beginning of the graph and then the trend towards less residential mobility as the year went on and people began to self-isolate less.

#### Seasonality 
We see a weekly seasonality in the residential mobility data for Austria. This is because people are more or less likely to be in their homes during the working week versus at the weekend.

#### Outliers
We see the outliers of April 13th, May 1st and June 11th standing out in the graph. These have been common outliers and represent significant days in relation to holidays or the coronavirus pandemic.

#### Abrupt Changes
Our differencing graph clearly shows the abrupt initial increase in residential mobility in Austria at the beginning of 2020. This then gradually eases off as the year progresses.
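
To make the idea behind a differencing-and-resampling chart concrete, here is a small sketch. It assumes the bar chart takes the first difference of the daily series and then averages it per month; the internals of the `MobilitySuite` may well differ. The helper names and the data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def first_difference(values):
    """Day-on-day change: element i is values[i + 1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


def monthly_mean(dates, values):
    """Average the values within each (year, month) bucket."""
    buckets = defaultdict(list)
    for d, v in zip(dates, values):
        buckets[(d.year, d.month)].append(v)
    return {key: sum(vs) / len(vs) for key, vs in sorted(buckets.items())}


# Synthetic series (NOT real data): flat at 0, then a jump to +20 on 16 March.
start = date(2020, 2, 1)
dates = [start + timedelta(days=i) for i in range(60)]  # 1 Feb to 31 Mar 2020
values = [20.0 if d >= date(2020, 3, 16) else 0.0 for d in dates]

diffs = first_difference(values)
# Each difference belongs to the later of the two days it compares.
monthly = monthly_mean(dates[1:], diffs)
print(monthly)
```

A single abrupt jump shows up as one large bar in its month, while gradual easing afterwards produces small bars of the opposite sign.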

# Comparison Of Austria's Attributes
Austria is more similar to Germany than Belgium is. We see multiple similarities between the two countries' locational attributes, as well as patterns that we did not originally see in Belgium.

These two countries are very historically linked to each other and have fought the coronavirus pandemic in a similar fashion. Let's explore some of the changes over time we see in Austria's dataset.

## Multiple Patterns
We see many familiar patterns as the locational attributes change over time, and some of these are summarised below. These recurring patterns across countries show us that the effects of the pandemic and of seasonal change occur internationally, not only in one country. This is why unique outliers can be so interesting: we expect neighbouring countries to have similar patterns. When something significant occurs in Germany, it's worth investigating why it hasn't happened in Belgium or Austria, for example.

![Austria Transit Mobility Pattern](./res/austria_transit_pattern.png)
![Austria Workplace Mobility Pattern](./res/austria_workplace_pattern.png)
![Austria Park Mobility Pattern](./res/austria_park_pattern.png)

## Correlation Between Attributes
We would expect similar correlations as in previous countries between the various locational attributes. Let's examine if the correlation we've seen previously between retail/recreational mobility and grocery/pharma mobility holds true in Austria.

In [90]:
#Austria 2020: Retail/Rec & Grocery/Pharma Correlation
num_subplot_rows = 1
num_subplot_cols = 1

country = Country.Austria
attribute_one = Attribute.Retail_And_Rec
attribute_two = Attribute.Grocery_And_Pharma

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Austria: Correlation Of Retail/Rec & Grocery/Pharma Mobility In 2020",
    "line_title" : None,
    "density_title" : None,
    "bar_title" : None,
    "width" : 700,
    "height" : 700
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Retail/Rec (% Change)",
        "yaxis" : "Grocery/Pharma (% Change)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
corrcoef_mat = suite.plot_correlation(country, attribute_one, attribute_two, subplot_prefs)
suite.show()

print("Pearson Product-Moment Correlation Coefficient Matrix")
print(corrcoef_mat)

Pearson Product-Moment Correlation Coefficient Matrix
[[1.         0.67544123]
 [0.67544123 1.        ]]


Here we see a medium-to-strong positive correlation between grocery/pharma mobility and retail/recreational mobility. The correlation is almost identical to Belgium's for these same attributes. Recall that Belgium's was approximately 0.7, a difference of 0.02 from Austria's 0.68.

The small difference in their correlation is not significant, and could be down to small differences in each country that aren't relevant.
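
For readers unfamiliar with the matrix printed above, the sketch below shows how such a 2x2 matrix can be produced and read. It assumes `plot_correlation` relies on something like NumPy's `corrcoef`, which is a guess since its internals are not shown here, and the two series are synthetic stand-ins rather than the real Austrian data.

```python
import numpy as np

# Synthetic stand-ins for two mobility attributes (NOT the real Austrian data).
rng = np.random.default_rng(seed=0)
retail = rng.normal(loc=-30.0, scale=15.0, size=300)
grocery = 0.5 * retail + rng.normal(loc=0.0, scale=8.0, size=300)

# Rows and columns are ordered (retail, grocery). The diagonal is each series
# correlated with itself; the off-diagonal entries are the coefficient we want.
corrcoef_mat = np.corrcoef(retail, grocery)
r = corrcoef_mat[0, 1]

print(corrcoef_mat)
print("Pearson r:", r)
```

The matrix is symmetric, so `corrcoef_mat[0, 1]` and `corrcoef_mat[1, 0]` always hold the same value.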

Our analysis of the correlation between Germany's attributes showed us that the move to online shopping has perhaps caused a medium inverse relationship between grocery shopping and remaining at home. We will see if this holds true for Austria.

In [92]:
#Austria 2020: Grocery/Pharma & Residential Correlation
num_subplot_rows = 1
num_subplot_cols = 1

country = Country.Austria
attribute_one = Attribute.Grocery_And_Pharma
attribute_two = Attribute.Residential

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Austria: Correlation Of Grocery/Pharma & Residential Mobility In 2020",
    "line_title" : None,
    "density_title" : None,
    "bar_title" : None,
    "width" : 700,
    "height" : 700
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Grocery/Pharma (% Change)",
        "yaxis" : "Residential (% Change)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
corrcoef_mat = suite.plot_correlation(country, attribute_one, attribute_two, subplot_prefs)
suite.show()

print("Pearson Product-Moment Correlation Coefficient Matrix")
print(corrcoef_mat)

Pearson Product-Moment Correlation Coefficient Matrix
[[ 1.         -0.57228136]
 [-0.57228136  1.        ]]


Aha! We see that this medium negative correlation holds true for Austria too. Recall that our correlation in Germany for these two attributes was -0.61, and here we have an approximate correlation of -0.57. 

This difference is larger than I expected, and may represent a difference in the scale of panic shopping in each country as they entered lockdown. Additionally, people in one country may have been more hesitant to move to online shopping.

## Differences Between Locational Attributes
We would now like to examine any differences we have seen between Austria's locational attributes. Previously, we looked at how outliers played a role in these differences. Instead of a further examination of Austria's outliers, we're going to look closer at the differing patterns we see between the attributes.

These patterns, as I have displayed above, are strikingly different for various attributes. Recall again Austria's transit mobility pattern.

![Austria's Transit Mobility](./res/austria_transit_pattern.png)

We see the huge effect of the pandemic on Austria's transit data. A very swift and sudden halt in transit caused this upward tick, which represents an overall decline in the attribute. This stands in stark contrast to Austria's park mobility pattern.

![Austria's Park Mobility](./res/austria_park_pattern.png)

We see an initial small uptick in park mobility, representing a decline in park usage at the beginning of lockdown, and then a huge swoosh into overall high park usage as we enter the rest of the year. This is a heavily seasonal attribute compared to transit, and the summer season has a huge effect on people's usage of parks. Lastly, we can contrast this with workplace mobility in Austria for the year 2020, whose pattern is shown again below.

![Austria's Workplace Mobility](./res/austria_workplace_pattern.png)

Here we see the previously mentioned humpback pattern. In addition to seeing this here in Austria's dataset, this was also a recurring pattern in Belgium's data. We had examined this recurring humpback form all the way back in our analysis of Belgium.

I have found these differing patterns between attributes really fascinating. They really stand out, and show our differing needs as humans at various stages of the year. They are particularly striking this year due to the effects of the pandemic and how it has emphasised different attributes so dramatically.

## Conclusion On Individual Analyses
Thus concludes my individual analyses of Belgium, Germany and Austria. I really enjoyed looking at each country's attributes, and found the aforementioned patterns to be of particular interest. They show how data means nothing until we can abstract from it. We can do this with tools such as these data visualisation techniques and the wonderful libraries that have helped me hugely in this project and the previous. 

The data has always contained these hidden insights that we have found in this analysis. However, it is up to us as data scientists to draw the curtain and reveal them.

# Task 2: Between Country Analysis
In this task we will be combining the attributes from all three of our countries and extracting insights about the overall mobility data of Belgium, Germany and Austria for the year 2020.

We will begin by setting up a new set of tools that will help us analyse the attributes. We will name these new tools the Central European Suite (even though Belgium isn't really in Central Europe).

These tools will plot three graphs:
* A MultiScatter Plot - Plots all three countries separately for attribute A
* An Average Plot (Rolling Mean Applied) - Draws the average of the three countries for attribute A
* Three-Dimensional Correlation Plot - My personal favourite, this will show us how "in-tune" each country was with each other.

Let's set up our individual plotting classes first.

In [99]:
#Plots Three Countries Separately
class MultilinePlot(Graph):
	def plot(self, dates, country_A, country_B, country_C,
		name_A, name_B, name_C):
        
        #Create A Scatter Trace For Country A
		scatter_A = go.Scatter(x=dates, y=country_A, 
			mode='markers', name=name_A)

        #Create A Scatter Trace For Country B
		scatter_B = go.Scatter(x=dates,y=country_B,
			mode='markers', name=name_B)

        #Create A Scatter Trace For Country C
		scatter_C = go.Scatter(x=dates,y=country_C,
			mode='markers', name=name_C)

		return scatter_A, scatter_B, scatter_C
    
#Draws The Average Of The Three Countries
class CombinedAveragePlot(Graph):

	def plot(self, dates, cA_data, cB_data, cC_data, name):
        
        #Calculate The Average (Sum(x1,x2..xN) / N)
		avg_series = (cA_data + cB_data + cC_data) / 3.0

		#Retrieve Rolling Mean
		dates, roll_mean = self.transformer.get_rolling_mean(dates, 
			avg_series, 20)
        
        #Plot As Line
		combined_avg = go.Scatter(x=dates, 
			y=roll_mean,
			name=name)

		return combined_avg
    
#Shows The Correlation Between The Attribute For The Three Countries
class CorrScatter(Graph):

	def plot(self, att_A, att_B, att_C):
		scatter = go.Scatter3d(x=att_A,
			y=att_B, z=att_C, mode='markers')

		return scatter

Now we'll jump straight into creating our main class. It will combine our three countries into three graphs for us.

In [100]:
class CentralEuropeanSuite:

	#Number Of Rows/Cols Of Subplots
	subplot_rows = None
	subplot_cols = None

	figure = None
	prefs = None

	#Mobility Manager Created
	data_manager = MobilityManager()

	def __init__(self, rows, cols, height, width, prefs):
		self.prefs = prefs
        
        #Set Titles For First Two Subplots
		scatter_ttl = self.prefs["scatter_ttl"]
		average_ttl = self.prefs["average_ttl"]

		#Create First Two Subplots
		self.figure = make_subplots(rows = rows, 
		cols = cols, subplot_titles=[scatter_ttl, average_ttl])
        
        #Set Width/Height Of Graph
		self.figure.update_layout(height=height, width=width)

		self.subplot_rows = rows
		self.subplot_cols = cols

	def plot(self, attribute, country_A, country_B, country_C):
		#Retrieve Our Data
		dates = self.data_manager.get_attribute(
			country_A, Attribute.Date)
        
        #Retrieve Country Data 
		cA_data = self.data_manager.get_attribute(
			country_A, attribute)
		cB_data = self.data_manager.get_attribute(
			country_B, attribute)
		cC_data = self.data_manager.get_attribute(
			country_C, attribute)
        
        #Name Each Scatter Trace
		scatter_A_ttl = self.prefs["scatter_trace_A"]
		scatter_B_ttl = self.prefs["scatter_trace_B"]
		scatter_C_ttl = self.prefs["scatter_trace_C"]

		scatter_A, scatter_B, scatter_C = self.get_multiline_plot(
			dates, cA_data, cB_data, cC_data,
			scatter_A_ttl, scatter_B_ttl, scatter_C_ttl)
        
        #X & Y Axis For Scatter Plot
		x_scatter_t = self.prefs["scatter_x_ttl"]
		y_scatter_t = self.prefs["scatter_y_ttl"]
        
        #X & Y Axis For Average Plot
		ttl_avg = self.prefs["average_ttl"]
		x_avg_t = self.prefs["average_x_axis"]
		y_avg_t = self.prefs["average_y_axis"]

		self.add_traces([scatter_A, scatter_B, scatter_C],1,1,
			 x_scatter_t, y_scatter_t)

		combined_avg_plot = self.get_combined_average_plot(dates, 
			cA_data, cB_data, cC_data, ttl_avg)
		self.add_traces([combined_avg_plot], 2,1,
			 x_avg_t, y_avg_t)

    #Retrieve Combined Average Plot
	def get_combined_average_plot(self, dates, country_A, country_B, country_C, name):
		avg_plot = CombinedAveragePlot()
		return avg_plot.plot(dates, country_A, country_B, country_C, name)

    #Retrieve Multiline Plot
	def get_multiline_plot(self, dates, country_A, country_B, country_C,
		name_A, name_B, name_C):
		ml_plot = MultilinePlot()
		plot = ml_plot.plot(dates, country_A, country_B, country_C,
			name_A, name_B, name_C)
		return plot 
    
    #Retrieve Correlation Scatter Plot
	def get_corr_scatter(self, attribute, 
		country_A, country_B, country_C):

		corr_fig = go.Figure()

		#Retrieve Our Data
		dates = self.data_manager.get_attribute(
			country_A, Attribute.Date)

		cA_data = self.data_manager.get_attribute(
			country_A, attribute)
		cB_data = self.data_manager.get_attribute(
			country_B, attribute)
		cC_data = self.data_manager.get_attribute(
			country_C, attribute)

		ttl = self.prefs["corr_title"]
		x_ttl = self.prefs["corr_x_ttl"]
		y_ttl = self.prefs["corr_y_ttl"]
		z_ttl = self.prefs["corr_z_ttl"]

		plotter = CorrScatter()
		corr_scat = plotter.plot(
			cA_data, cB_data, cC_data)

		corr_fig.add_trace(corr_scat)

		corr_fig.update_layout(scene = dict(
		    xaxis_title=x_ttl,
		    yaxis_title=y_ttl,
		    zaxis_title=z_ttl,
		))

		corr_fig.update_layout(title=ttl)

		return corr_fig
    
    #Add Traces To Figure
	def add_traces(self, traces, row, col, x_ttl, y_ttl):
		for trace in traces:
			self.figure.add_trace(trace,
				row=row, col=col)

		#Label The Axes Once, Using The Subplot's Own Column
		self.figure.update_xaxes(
			title_text=x_ttl, row=row, col=col)

		self.figure.update_yaxes(
			title_text=y_ttl, row=row, col=col)

	#Show The Graph
	def show(self):
		self.figure.show()

## Structuring Our Analyses
We have added a solid structure to our visualisation of each attribute. However, we must also add this structure to our characterisations. How will we characterise the attributes that we're working with?

For each attribute, we will characterise it from multiple angles:
* Trend, Seasonality & Outliers
* Correlation Between Countries
* Variance From The Mean

These three headings will form the basis of our characterisation of each attribute for this overall mobility of Belgium, Germany and Austria. 

## Retail And Recreation
We will begin with an analysis of overall retail and recreation. This also introduces the three graphs that will make insights easier for us as data scientists.

When you run the below code, note that in the third graph (the three-dimensional scatter plot) **you can interact and move the camera angle**. This will be necessary to see the correlation between attributes.

In [106]:
prefs = {
	"scatter_ttl" : "Belgium, Austria & Germany",
	"scatter_x_ttl" : "Dates",
	"scatter_y_ttl" : "Change (%)",
	"scatter_trace_A" : "Belgium",
	"scatter_trace_B" : "Germany",
	"scatter_trace_C" : "Austria",
	"average_ttl" : "Average Of Three Countries (Rolling Mean)",
	"average_x_axis" : "Dates",
	"average_y_axis" : "Change (%)",
	"corr_title" : "Correlation Between Countries",
	"corr_x_ttl" : "Belgium",
	"corr_y_ttl" : "Germany",
	"corr_z_ttl" : "Austria"
	}

#The Attribute We're Examining
attribute = Attribute.Retail_And_Rec

#Our Three Countries
country_A = Country.Belgium
country_B = Country.Germany
country_C = Country.Austria

#Create The Central European Suite 
ce_suite = CentralEuropeanSuite(3,1, 1000, 1000, prefs)

#Plot The First Two Plots
ce_suite.plot(attribute, country_A, country_B, country_C)

#Our 3D Plot Must Be Plotted Separately
corr_scatt = ce_suite.get_corr_scatter(
    attribute, country_A, country_B, country_C)
   
#Display The Three Plots
ce_suite.show()
corr_scatt.show()

## Trend, Seasonality & Outliers
Note that in this part of the project, I have not reversed the sign of the data points. Therefore each decrease represents a decrease and each increase represents an increase in percentage mobility. We can immediately see the striking decline in retail and recreation due to the coronavirus pandemic. However, we also notice a trend back to baseline. I would almost call this a bouncing back of retail/recreational mobility. There is a very significant movement back to the baseline as soon as the data reaches its deepest trough. The outliers here have less of an effect than in the previous part of the project, and are smoothed out nicely by our rolling mean graph.
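
To see why the rolling mean tames outliers, consider the sketch below. It averages three synthetic country series and applies a 20-point window, matching the window size hard-coded in `CombinedAveragePlot`. The `rolling_mean` function here is a hypothetical NumPy stand-in; the real `get_rolling_mean` of the transformer is not shown in this notebook and may behave differently at the edges.

```python
import numpy as np


def rolling_mean(values, window):
    """Mean of each full window; the result has len(values) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


# Synthetic stand-ins for three countries (NOT the real mobility data).
belgium = np.full(60, -40.0)
germany = np.full(60, -40.0)
austria = np.full(60, -40.0)
austria[30] = -100.0  # a single-day outlier in one country

# Average the three countries first, then smooth, as CombinedAveragePlot does.
avg_series = (belgium + germany + austria) / 3.0
smoothed = rolling_mean(avg_series, 20)

print("Raw dip:", avg_series.min())
print("Smoothed dip:", smoothed.min())
```

The outlier is diluted twice: once by averaging across the three countries, and again by averaging across the 20 days in each window.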

## Correlation
Allow yourself time to interact with the correlation graph. You should be able to pan in and out, and rotate the camera around a centrepoint. We can see here the clear strong correlation between each country for the retail and recreational mobility data. The 3D aspect of the graph represents this quite nicely, and suits the fact that we are working with three countries.
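
The 3D scatter plot gives a visual impression of the correlation, but it can also be put into numbers with a pairwise coefficient matrix. This is a sketch on synthetic data, not the real mobility series, and it is an addition for illustration rather than something the `CentralEuropeanSuite` computes.

```python
import numpy as np

# Synthetic stand-ins (NOT real data): a shared signal plus country-level noise.
rng = np.random.default_rng(seed=1)
shared = rng.normal(loc=-30.0, scale=20.0, size=250)
belgium = shared + rng.normal(loc=0.0, scale=5.0, size=250)
germany = shared + rng.normal(loc=0.0, scale=5.0, size=250)
austria = shared + rng.normal(loc=0.0, scale=5.0, size=250)

# Each row is one variable, so the result is a 3x3 matrix ordered
# (Belgium, Germany, Austria) on both axes.
pairwise = np.corrcoef(np.vstack([belgium, germany, austria]))

print(pairwise)
```

When the points in the 3D plot cluster along a straight diagonal line, all three off-diagonal coefficients will be close to 1.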

## Variance
We notice that the variance is at its lowest at the very beginning of the pandemic, as every country was facing the same problems. It's interesting to note how this changes as the year goes on. The variance in our data begins to increase as we see countries diverging from each other. There is still a very strong tendency to stay near the mean, however, and the correlation remains strong.

## Grocery And Pharma Mobility
We will now analyse the grocery and pharma mobility data for all three countries together in 2020. We will begin by plotting the data as we have previously.

In [107]:
prefs = {
	"scatter_ttl" : "Belgium, Austria & Germany",
	"scatter_x_ttl" : "Dates",
	"scatter_y_ttl" : "Change (%)",
	"scatter_trace_A" : "Belgium",
	"scatter_trace_B" : "Germany",
	"scatter_trace_C" : "Austria",
	"average_ttl" : "Average Of Three Countries (Rolling Mean)",
	"average_x_axis" : "Dates",
	"average_y_axis" : "Change (%)",
	"corr_title" : "Correlation Between Countries",
	"corr_x_ttl" : "Belgium",
	"corr_y_ttl" : "Germany",
	"corr_z_ttl" : "Austria"
	}

#The Attribute We're Examining
attribute = Attribute.Grocery_And_Pharma

#Our Three Countries
country_A = Country.Belgium
country_B = Country.Germany
country_C = Country.Austria

#Create The Central European Suite 
ce_suite = CentralEuropeanSuite(3,1, 1000, 1000, prefs)

#Plot The First Two Plots
ce_suite.plot(attribute, country_A, country_B, country_C)

#Our 3D Plot Must Be Plotted Separately
corr_scatt = ce_suite.get_corr_scatter(
    attribute, country_A, country_B, country_C)
   
#Display The Three Plots
ce_suite.show()
corr_scatt.show()

## Trends, Seasonality & Outliers
Interestingly, we are seeing far more outliers in the grocery and pharma mobility data than we did in the previous retail and recreational data. Note that these outliers are in both directions and not heavily skewed towards higher or lower values. We see the big decline at the beginning of the pandemic, however it is not as clear because the outliers stretch the range of our graph. We don't see any seasonality with respect to the change of the yearly seasons.

## Correlation
As we examine our correlation graph, we can still see this correlation between the three countries. However, we immediately notice the large and varied spread of outliers in different directions. It's clear that grocery and pharma mobility is more individual to each country than retail and recreational mobility.

## Variance
We see a much higher variance from the mean for this attribute compared to retail and recreation. The outliers largely affect this, however there is also a natural larger spread of the data.
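
A comparison like this can be made numerically by measuring, for each day, how far the three countries sit from their shared mean. The sketch below is an illustration only: `mean_cross_country_variance` is a hypothetical helper and both arrays are small synthetic examples, not the real retail or grocery data.

```python
import numpy as np


def mean_cross_country_variance(data):
    """data has shape (days, countries). For each day, take the variance of
    the countries around that day's mean, then average over all days."""
    return float(np.mean(np.var(data, axis=1)))


# Columns are Belgium, Germany, Austria (synthetic values, NOT real data).
tight = np.array([[-60.0, -58.0, -62.0],
                  [-40.0, -42.0, -38.0]])
spread = np.array([[-20.0, -5.0, -35.0],
                   [10.0, -10.0, 0.0]])

tight_var = mean_cross_country_variance(tight)
spread_var = mean_cross_country_variance(spread)

print("Tight attribute:", tight_var)
print("Spread attribute:", spread_var)
```

Because variance squares each deviation, a handful of extreme outliers can dominate the result, which fits the observation that the outliers largely drive the spread for this attribute.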

## Parks
We will now examine the data for overall park mobility for the three countries in 2020. We would expect very seasonal data here as the three countries would all experience upticks in heat during the summer season.

Let's begin with our graphs.

In [108]:
prefs = {
	"scatter_ttl" : "Belgium, Austria & Germany",
	"scatter_x_ttl" : "Dates",
	"scatter_y_ttl" : "Change (%)",
	"scatter_trace_A" : "Belgium",
	"scatter_trace_B" : "Germany",
	"scatter_trace_C" : "Austria",
	"average_ttl" : "Average Of Three Countries (Rolling Mean)",
	"average_x_axis" : "Dates",
	"average_y_axis" : "Change (%)",
	"corr_title" : "Correlation Between Countries",
	"corr_x_ttl" : "Belgium",
	"corr_y_ttl" : "Germany",
	"corr_z_ttl" : "Austria"
	}

#The Attribute We're Examining
attribute = Attribute.Parks

#Our Three Countries
country_A = Country.Belgium
country_B = Country.Germany
country_C = Country.Austria

#Create The Central European Suite 
ce_suite = CentralEuropeanSuite(3,1, 1000, 1000, prefs)

#Plot The First Two Plots
ce_suite.plot(attribute, country_A, country_B, country_C)

#Our 3D Plot Must Be Plotted Separately
corr_scatt = ce_suite.get_corr_scatter(
    attribute, country_A, country_B, country_C)
   
#Display The Three Plots
ce_suite.show()
corr_scatt.show()

## Trends, Seasonality & Outliers
We can clearly see the seasonality that we expected, and it is present for all three countries. We can see a peak in park mobility during the summer, especially in August, when temperatures reached their peaks. Note that the increase is huge, reaching double what it was at baseline during January. There are multiple outliers that we can spot in this dataset, however nothing like those in our previous grocery and pharma mobility data.

## Correlation
On examining our correlation graph, we can see another strong correlation between the countries. Our plot is a bit more spread out in this example where we see a general correlation but some deviation.

## Variance
The variance is quite high for park mobility in the three countries. We can see a strong deviation from the mean throughout the year, however this tightens as we enter summer and the three traces line up more tightly.

## Transit
We will now examine the transit data for the three countries in 2020. Transit is an attribute that can vary quite a bit depending on what country you're in. However, perhaps the closeness of these countries will cause it to be more aligned. Let's see what the data says!

In [109]:
prefs = {
	"scatter_ttl" : "Belgium, Austria & Germany",
	"scatter_x_ttl" : "Dates",
	"scatter_y_ttl" : "Change (%)",
	"scatter_trace_A" : "Belgium",
	"scatter_trace_B" : "Germany",
	"scatter_trace_C" : "Austria",
	"average_ttl" : "Average Of Three Countries (Rolling Mean)",
	"average_x_axis" : "Dates",
	"average_y_axis" : "Change (%)",
	"corr_title" : "Correlation Between Countries",
	"corr_x_ttl" : "Belgium",
	"corr_y_ttl" : "Germany",
	"corr_z_ttl" : "Austria"
	}

#The Attribute We're Examining
attribute = Attribute.Transit

#Our Three Countries
country_A = Country.Belgium
country_B = Country.Germany
country_C = Country.Austria

#Create The Central European Suite 
ce_suite = CentralEuropeanSuite(3,1, 1000, 1000, prefs)

#Plot The First Two Plots
ce_suite.plot(attribute, country_A, country_B, country_C)

#Our 3D Plot Must Be Plotted Separately
corr_scatt = ce_suite.get_corr_scatter(
    attribute, country_A, country_B, country_C)
   
#Display The Three Plots
ce_suite.show()
corr_scatt.show()

## Trends, Seasonality & Outliers
The pandemic has a very clear entry into our data here as we see a free-fall in transit mobility towards the early months of 2020 in every country. There is a notable climb back up to baseline, however this droops down again in the summer. This could be due to the decreased transit usage during the summer holidays having an effect, even though the effect of the pandemic was still dropping. There are fewer outliers in this data compared to many previous examples.

## Correlation
We can see a strong correlation between the three countries for transit mobility in 2020. 

## Variance
While the overall variance is low, we do see some differences in each country for this data in how it takes its own path. Notice Germany does not fall as steeply as Belgium and Austria at the beginning of the pandemic. 

## Workplaces
We will now examine the workplace mobility data for the three countries in 2020. We can expect very strong drops in workplace mobility towards the beginning of the pandemic.

In [110]:
prefs = {
	"scatter_ttl" : "Belgium, Austria & Germany",
	"scatter_x_ttl" : "Dates",
	"scatter_y_ttl" : "Change (%)",
	"scatter_trace_A" : "Belgium",
	"scatter_trace_B" : "Germany",
	"scatter_trace_C" : "Austria",
	"average_ttl" : "Average Of Three Countries (Rolling Mean)",
	"average_x_axis" : "Dates",
	"average_y_axis" : "Change (%)",
	"corr_title" : "Correlation Between Countries",
	"corr_x_ttl" : "Belgium",
	"corr_y_ttl" : "Germany",
	"corr_z_ttl" : "Austria"
	}

#The Attribute We're Examining
attribute = Attribute.Workplaces

#Our Three Countries
country_A = Country.Belgium
country_B = Country.Germany
country_C = Country.Austria

#Create The Central European Suite 
ce_suite = CentralEuropeanSuite(3,1, 1000, 1000, prefs)

#Plot The First Two Plots
ce_suite.plot(attribute, country_A, country_B, country_C)

#Our 3D Plot Must Be Plotted Separately
corr_scatt = ce_suite.get_corr_scatter(
    attribute, country_A, country_B, country_C)
   
#Display The Three Plots
ce_suite.show()
corr_scatt.show()

## Trends, Seasonality & Outliers
Our rolling mean shows very starkly the abrupt decline of workplace mobility due to the onset of the pandemic and the huge shift towards remote working. We can see this climb steadily back up to baseline as the year progresses. This is notable, as many expect the shift to remote work to be permanent for a lot of workers. The climb continues until about mid-summer, when mobility begins to decline again, likely due to the seasonality of summertime and holidays being taken from work. There are multiple outliers in this dataset, and most if not all of them represent a huge one-day decrease in workplace mobility.

## Correlation
Interestingly, we see a weaker correlation here between the countries. While a correlation still remains, it is skewed and not quite the straight line we are looking for. This could be influenced by the size of the sectors that can easily move online in each country. For example, Germany's huge tech sector would be able to move to remote work more easily than a large agricultural sector.

## Variance
We see a large variance from the mean in our plot of the three countries. The workplace mobility attribute is clearly not nearly as stable as an attribute such as retail and recreation.

## Residential
We will now examine the residential mobility data for the three countries in 2020. This is our final combined attribute for this dataset. Let's dive straight in!

In [111]:
prefs = {
	"scatter_ttl" : "Belgium, Austria & Germany",
	"scatter_x_ttl" : "Dates",
	"scatter_y_ttl" : "Change (%)",
	"scatter_trace_A" : "Belgium",
	"scatter_trace_B" : "Germany",
	"scatter_trace_C" : "Austria",
	"average_ttl" : "Average Of Three Countries (Rolling Mean)",
	"average_x_axis" : "Dates",
	"average_y_axis" : "Change (%)",
	"corr_title" : "Correlation Between Countries",
	"corr_x_ttl" : "Belgium",
	"corr_y_ttl" : "Germany",
	"corr_z_ttl" : "Austria"
	}

#The Attribute We're Examining
attribute = Attribute.Residential

#Our Three Countries
country_A = Country.Belgium
country_B = Country.Germany
country_C = Country.Austria

#Create The Central European Suite 
ce_suite = CentralEuropeanSuite(3,1, 1000, 1000, prefs)

#Plot The First Two Plots
ce_suite.plot(attribute, country_A, country_B, country_C)

#Our 3D Plot Must Be Plotted Separately
corr_scatt = ce_suite.get_corr_scatter(
    attribute, country_A, country_B, country_C)
   
#Display The Three Plots
ce_suite.show()
corr_scatt.show()

## Trends, Seasonality & Outliers
The pattern here reminds me strongly of our retail and recreational data. We see a tight trend upwards at the beginning of the pandemic and then a gradual "swoosh" downwards in all three countries. Each country remains fairly in unison with the others until that return to baseline, where they start to follow their own paths. We can see at the end of the graph a movement back up again as countries begin to enter their second lockdown.

## Correlation
We can see a strong correlation between all three countries. It is perhaps not as perfect as other correlations, but it reflects how similarly residential mobility was affected in each country. These countries, as neighbours, have definitely had similar responses to the pandemic and experienced similar results.

## Variance
We see a large variance from the mean in the plot of the residential mobility data for each country. Belgium appears to remain consistently above the other two countries throughout the course of the pandemic in terms of residential mobility. This is interesting, perhaps reflecting a larger section of the workforce being unable to move easily to remote working.