# A Mobility Analysis Of Austria, Belgium & Germany

## Good Practices In Constructing Time Series

In any craft there are basic principles which one must learn in order to lay the foundation of good work. Time series visualisation is no different, relying on many important decisions by the data scientist before coming to fruition.

We can trace the history of these graphs back to the Scottish economist William Playfair. Playfair combined his love of art and data to create the graph we now know as a time series. His first publication in 1786 looks incredibly modern, plotting the cost of wheat against the cost of labour in England. His graph disproved a hypothesis that wages were driving the price of wheat up. With this very clever display, Playfair showed that wages were actually rising much more slowly than the price of wheat, ushering in a new dawn for data visualisations of time periods.

![Playfair's First Time Series](res/playfair_time_series.jpg)

When we introduce time into a graph, it gives rise to many distinct components. 
Common components of a time series include:
* **Trend** - The general tendency to increase or decrease over time.

* **Seasonality** - Peaks / troughs that occur at regular intervals. This can be daily, weekly, monthly or even yearly cycles.

* **Noise** - Random fluctuations in data which are left when all the other components have been removed.

In constructing our time series, we will pay respect to each of these components. Seasonality can be checked with statistical tests, while trends are harder to pin down formally and are often judged by eye. Noise occurs in any real-world data set, and can be reduced through methods such as smoothing and resampling. We will be constructing time series both with smoothing/resampling and without, in order to gain a complete picture of the data. The aim is to show what the data means, rather than merely what it looks like plotted.
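
To make the three components concrete, here is a minimal sketch that builds a synthetic daily series out of a trend, a weekly cycle and random noise. Every number in it (the slope, the amplitude, the dates, the seed) is made up for illustration and has nothing to do with the mobility data itself.

```python
import numpy as np
import pandas as pd

# Hypothetical daily index covering eight weeks
dates = pd.date_range("2020-03-01", periods=56, freq="D")
t = np.arange(len(dates))

# Trend: a steady decline of half a point per day
trend = -0.5 * t

# Seasonality: a cycle that repeats every 7 days
seasonality = 5.0 * np.sin(2 * np.pi * t / 7)

# Noise: random fluctuations around zero (seeded so it is repeatable)
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=1.0, size=len(t))

# The observed series is the sum of the three components
series = pd.Series(trend + seasonality + noise, index=dates, name="mobility")

# Removing the known trend and seasonality leaves only the noise
residual = series - trend - seasonality
```

With real data we do not know the components in advance, which is why smoothing and resampling are used to approximate them.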

I have decided to use three different types of visualisations to display each attribute. I believe that these three visualisations will give us a strong sense of the magnitude of change in our data and intuitively represent these changes for analysis.

### Universal Line Plot
This plot will contain our original data and represents a starting point for our understanding. This is the root of the visualisations to come, as it will give us the clearest picture of the trends, seasonality and noise within our dataset. 

Additionally, annotations will be added to this plot in order to mark key moments related to the attribute. These may give us insight into particular peaks or troughs in relation to that specific attribute.

### Density Plot W/ Moving Average Smoothing
My aim with this density plot is to give us a better feel for the major changes that occurred during the time period. A moving average smoothing technique will be applied in order to dampen the effect of outliers on the data and filter out noise. 
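
As a small illustration of what a moving average does to an outlier, the sketch below smooths seven made-up values with a window of three using pandas' `rolling`. The values and the window size are assumptions chosen to keep the arithmetic easy to follow.

```python
import pandas as pd

# Made-up daily values with a single outlier on the fourth day
values = pd.Series([10.0, 10.0, 10.0, 40.0, 10.0, 10.0, 10.0])

# Each point becomes the mean of itself and the two preceding points
smoothed = values.rolling(window=3).mean()

# The first window-1 entries are NaN because their window is incomplete
leading_gaps = int(smoothed.isna().sum())

# The spike of 40 is spread over three days, each with a mean of 20
peak_after_smoothing = smoothed.max()
```

The outlier does not disappear, but its height is halved and shared across the window, which is the dampening effect we are after.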

### Resampled Bar W/ Differencing
This graph will apply differencing, where each value represents the change from one day to the next. We can set the window of change so that it tells us the difference across as large a gap as we would like. Resampling will be applied to our bar chart to make the visualisation more discrete and reduce the number of bins.
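
The "window of change" can be sketched with pandas' `Series.diff`, whose `periods` argument sets the gap between the two values being compared. The six values below are invented for the example.

```python
import pandas as pd

# Made-up daily percentage changes from baseline
dates = pd.date_range("2020-03-10", periods=6, freq="D")
values = pd.Series([0.0, -10.0, -30.0, -60.0, -65.0, -70.0], index=dates)

# periods=1 compares each day with the day before
one_day = values.diff(periods=1)

# periods=3 compares each day with the value three days earlier
three_day = values.diff(periods=3)
```

Here `one_day` has its largest drop (-30) on the fourth day, while `three_day` reports a drop of -60 for that same day because it measures the whole fall since the first day. The first `periods` entries are NaN, as there is nothing earlier to compare against.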

## Structuring Our Code
When visualising the data, we don't want to write the same block of code repeatedly in order to get different results. We want to quickly move from attribute to attribute without having to worry too much about the underlying code.

To ensure we can focus on the visualisations, I'm going to set up a class that will give us the above plots for each attribute as simply as we would like. This overlying class I'm going to call the "Mobility Suite". This suite will use plotly, pandas and numpy in order to give us the results that we need from the data.

There are different aspects to this suite such as:
* **Mobility Manager** - Loads the data from CSV files.
* **Graph** - Basic parent class to create a graph.
    * **Transformer** - Performs Resampling, Smoothing & Differencing
    * **Visualisations**:
        * **Visualisation 1**: Universal Line Plot 
        * **Visualisation 2**: Density Plot W/ Rolling Mean
        * **Visualisation 3**: Resampled Bar W/ Differencing
        
#### Standardising Calls To Country/Attribute
Additionally, we will standardise our call to each attribute and country.
Rather than ever using a string to refer to an attribute or country, which only works if we happen to type the string correctly every time, we will use enums. 
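
The difference can be seen in a short sketch. The two lookup dictionaries below are hypothetical and exist only to contrast how a typo behaves with a string key versus an enum member.

```python
import enum

class Country(enum.Enum):
    Belgium = 0
    Germany = 1
    Austria = 2

# Hypothetical lookup tables keyed two different ways
by_string = {"Belgium": "belgium.csv"}
by_enum = {Country.Belgium: "belgium.csv"}

# A misspelled string is still a valid string, so .get() quietly returns None
missing = by_string.get("Belgum")

# A misspelled enum member fails at the exact point of the typo
try:
    Country.Belgum
    typo_caught = False
except AttributeError:
    typo_caught = True
```

With the enum, the mistake surfaces as an `AttributeError` on the line where it was made, rather than as a missing value somewhere further down the pipeline.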
    
![Mobility Suite](./res/mobility_suite_structure.png)

### The Mobility Manager
We'll begin by setting up the class to load in our data. It acts as the intermediary between the programmer and the data, ensuring we don't run into any problems in our interactions. 

Firstly, let's create two enums to reference each Country and attribute in our data. We will use Python's enum module for this.

In [580]:
import enum

class Country(enum.Enum):
	Belgium = 0
	Germany = 1
	Austria = 2

class Attribute(enum.Enum):
	ID = 0
	Country = 1
	Date = 2
	Retail_And_Rec = 3
	Grocery_And_Pharma = 4
	Parks = 5
	Transit = 6
	Workplaces = 7
	Residential = 8

print("Attributes:\n")
for att in Attribute:
	print(att)
print("\n\n")
    
print("Countries:\n")
for c in Country:
	print(c)
print("\n\n")

Attributes:

Attribute.ID
Attribute.Country
Attribute.Date
Attribute.Retail_And_Rec
Attribute.Grocery_And_Pharma
Attribute.Parks
Attribute.Transit
Attribute.Workplaces
Attribute.Residential



Countries:

Country.Belgium
Country.Germany
Country.Austria





Next we will write the implementation for our Mobility Manager. Rather than write a markdown paragraph for each part, I will include Python comments that make clear what each section does. The most useful method, as we will see, is **get_attribute**, which loads a single attribute of a single country's dataset.

In [581]:
import pandas as pd

class MobilityManager:

	#Our CSV Files
	austria_file = "austria.csv"
	belgium_file = "belgium.csv"
	germany_file = "germany.csv"

	"""
	In order to be able to use the standardised 
	attribute enum we created, we will need a dictionary to 
	convert from these attribute enums to the column 
	name that we need from the dataframe.

	For our Countries, we'll store each dataframe itself 
	inside the dict as this is simpler.

	TLDR:
	Dicts Convert 
	(Attribute Enum) => (Column String)
	(Country Enum) => (Country DataFrame)
	"""
	attribute_converter = {}
	country_converter = {}

	def __init__(self):
		#Load Our Dataset From CSV File
		austria_set = self.load_dataset(self.austria_file)
		belgium_set = self.load_dataset(self.belgium_file)
		germany_set = self.load_dataset(self.germany_file)

		#Store Our Dataset In Dict With Enums
		self.country_converter[Country.Austria] = austria_set
		self.country_converter[Country.Belgium] = belgium_set
		self.country_converter[Country.Germany] = germany_set

		#Store Our Attributes in Dict
		attributes = austria_set.columns
		for att_id, att_str in zip(Attribute, attributes):
			self.attribute_converter[str(att_id)] = att_str

	#Load In A Dataset From CSV File
	def load_dataset(self, f):
		return pd.read_csv("./datasets/{}".format(f))

	#Get A Saved Dataset
	def get_set(self, country):
		return self.country_converter[country]

	#Get Attribute Data For A Particular Country
	def get_attribute(self, country, attribute):
		att_str = self.attribute_converter[str(attribute)]
		return self.get_set(country)[att_str]
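
The enum-to-column mapping can be tried without any CSV files. The sketch below uses a small in-memory dataframe with hypothetical column names (the real CSV headers are not shown in this notebook) and a shortened version of the Attribute enum. It keys the dictionary by the enum member itself rather than by its string form, which is an alternative to the approach above, not a description of it.

```python
import enum
import pandas as pd

class Attribute(enum.Enum):
    ID = 0
    Country = 1
    Date = 2
    Parks = 3

# Hypothetical column names standing in for the real CSV headers
df = pd.DataFrame({
    "id": [0, 1],
    "country": ["Belgium", "Belgium"],
    "date": ["2020-03-01", "2020-03-02"],
    "parks_percent_change": [-5, -12],
})

# zip pairs enum members with columns by position, so the order must match
converter = {att: col for att, col in zip(Attribute, df.columns)}

# Look a column up by enum member instead of by its string name
parks = df[converter[Attribute.Parks]]
```

Because `zip` pairs items purely by position and stops at the shorter input, this only works when the enum is declared in the same order as the CSV columns.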

## The Transformer
I've named the next class for the manipulations it performs on the data. This class will perform three very important steps for us:
* Smoothing (Rolling Average)
* Resampling
* Differencing

These will be used in various combinations in our visualisations in order to present the clearest picture of what the data is telling us, rather than the clearest picture of the original data.

In [582]:
class Transformer:

	def __init__(self):
		return

	#Combines Two Series/Attributes Into One Dataframe
	def _combine(self, A, B, name_A, name_B):
		#Combine Name With Series
		df = { 
		name_A : A, 
		name_B : B
		}

		#Concatenate These Series Into Dataset
		return  pd.concat(df,axis=1)

	#Performs Differencing
	def get_difference(self, y, periods):
		return y.diff(periods=periods)

	#Performs Smoothing
	def get_rolling_mean(self, dates, y, windows):
		#Combine Our Dates & Target Series
		rolling_df = self._combine(dates, y, 'date', 'target')

		#Create Rolling Mean On Target Attribute
		rolling_mean = rolling_df['target'].rolling(
			window=windows).mean()
		return dates, rolling_mean

	#Performs Resampling
	def get_resample(self, dates, y, rule):
		#Combine Our Dates & Target Series
		df = self._combine(dates, y, 'date', 'target')

		#Convert to datetime and set as index
		df['date'] = pd.to_datetime(df.date, format='%Y-%m-%d')
		df = df.set_index('date')

		#Resample our data
		resample = df.target.resample(rule).mean()
		return resample.index, resample.values
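
To see the combine, parse and resample steps in isolation, here is a sketch on four made-up rows that straddle a month boundary. It uses the `"MS"` (month start) rule rather than `'M'`, purely because `"MS"` behaves the same across pandas versions; the only difference is that bins are labelled by the first day of the month instead of the last.

```python
import pandas as pd

# Hypothetical inputs: dates as strings, as they would arrive from a CSV
dates = pd.Series(["2020-03-30", "2020-03-31", "2020-04-01", "2020-04-02"])
target = pd.Series([-10.0, -20.0, -40.0, -60.0])

# A dict of Series concatenated along axis=1 yields one column per key
df = pd.concat({"date": dates, "target": target}, axis=1)

# resample() needs a DatetimeIndex, so parse the strings and index by them
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
df = df.set_index("date")

# "MS" groups by calendar month and labels each bin by its first day
monthly = df["target"].resample("MS").mean()
```

The four daily rows collapse into two monthly bins: the March mean of -15 and the April mean of -50.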

## Graph Class & Visualisations
The crux of our visualisations will lie with plotly, an excellent plotting library. We can see the structure of this part of our mobility suite below.

* Graph Parent Class:
    * Visualisation 1: Universal Line Plot 
    * Visualisation 2: Density Plot W/ Rolling Mean
    * Visualisation 3: Resampled Bar W/ Differencing
    
When we put this all together, we aim to have all three plots as subplots in a figure. This will form a very elegant and informative picture of any given attribute in any given Country. 

Firstly, let's set up a very basic graph parent class. This will serve to create a transformer for use by any of our plots.

In [583]:
class Graph(object):
	transformer = Transformer()

	def __init__(self):
		return

Not the most complex code we have seen. However, it's good to create a backbone for our more specific plotting classes. 

### Universal Line Plot
This line plot serves to show us our original data. There will be no transformations applied to this data as we want to keep it completely in line (excuse the pun..) with its original form.

At this point I must mention a very important point that will be relevant for every graph that our suite will create. These graphs are in the context of the pandemic that has swept our world, and thus we see huge declines in most if not all of the attributes. This means that if we plotted the data exactly as it is, all of our graphs would move underneath the x axis and look a bit, well, upside down.

To reconcile this, I've chosen to make each graph represent the decline in an attribute rather than the increase. Every data point will be multiplied by negative one, so higher numbers will mean greater decline. I believe this improves the viewer's understanding by not throwing them off with graphs that look strange. 
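
A quick sketch of the flip on four made-up values shows what the viewer ends up reading. Note the last value: a day that was above baseline becomes negative once flipped.

```python
import numpy as np
import pandas as pd

# Made-up percentage changes from baseline; negative means less mobility
change = pd.Series([-5.0, -40.0, -62.0, 3.0])

# Multiplying by -1 turns "change" into "decline"; equivalent to -change
decline = np.multiply(change, -1)
```

So a 62% drop is plotted as 62, and a 3% rise is plotted as -3.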

We can see below our code for this Line Plot, which inherits from our previous Graph class.

In [584]:
import numpy as np
import plotly.graph_objects as go

class LinePlot(Graph):
	def plot(self, dates, target):
		#Represent The Decline In An Attribute
		decrease_target = np.multiply(target,-1)

		#Create Line Plot
		return go.Scatter(x=dates,y=decrease_target, showlegend=False)

### Density Plot
Our density plot will be more complex than our previous plot. It will apply smoothing to the data by using a rolling mean. A rolling mean slides a fixed-size window along our data and uses the mean of each window to represent the date at the end of that window. The larger our window, the less any single day can move the smoothed value, so only sustained changes show up. This will give us a better sense of the significance of change, rather than the confusion that noise and outliers often cause in the original data.

In [585]:
class DensityPlot(Graph):
	def plot(self, dates, target, windows):
		#Represent Decline 
		target = np.multiply(target,-1)

		#Retrieve Rolling Mean
		dates, roll_mean = self.transformer.get_rolling_mean(dates, 
			target, windows)

		#Ensure Density Is Filled In
		fill = 'tozeroy'
		return go.Scatter(x=dates,y=roll_mean, fill=fill,showlegend=False)

### Resampled Bar W/ Differencing
Differencing is the crux of why this graph will be so useful. Given a particular time span, it will tell us the difference from time period A to time period B. This should show us when the biggest falls due to the pandemic were and the biggest climbs back up. Our data will be flipped again, so that the bigger the fall the higher the value. 

Additionally, we will apply resampling to this data so that we can have the average for each month rather than working with each day. We will have 8 bins for the 8 months in our data. This is much better than plotting each individual day, and will give us a broader perspective. We will call this class a Resampled Bar.

In [586]:
class ResampledBar(Graph):
	def plot(self, dates, target, rule):
		#Flip The Values
		target = np.multiply(target,-1)

		#Apply Resampling
		dates, target = self.transformer.get_resample(dates, 
			target, rule)

		#Create The Bar Graph
		return go.Bar(x=dates, y=target,showlegend=False)

## The Mobility Suite
We have now created all the individual elements of our mobility suite. The work we have done thus far really pays off here, as we will be able to easily create new graphs for various Countries and attributes.

The **plot** function will really carry the weight of our visualisations and will relieve us of worrying about the programming details when analysing the graphs. Our interface for working with the data and visualising it has been completed!

In [587]:
from plotly.subplots import make_subplots
import numpy as np

class MobilitySuite:

	#Number Of Rows/Cols Of Subplots
	subplot_rows = None
	subplot_cols = None

	figure = None

	#Mobility Manager Created
	data_manager = MobilityManager()

	def __init__(self, rows, cols, graph_prefs):
        
		#Set Our Class Variables
		self.subplot_rows = rows
		self.subplot_cols = cols
        
		#Create Our Graph Preferences
		self.setup_graph_prefs(rows, cols, graph_prefs)

	"""
	This plotting function below is the powerhouse of our 
	suite. It combines everything we have worked on thus
	far into one function.
	"""
	def plot(self, country, attribute, density_windows, resampling_bar_rule,
            subplot_prefs):
		dates = self.data_manager.get_attribute(country, Attribute.Date)
        
		target = self.data_manager.get_attribute(country, attribute)
        
		#Create A Line Graph
		graph_line = self.get_plot_line(dates, target)

		#Create A Density Graph
		graph_density = self.get_plot_density(dates, 
			target, density_windows)

		#Create A Bar Graph
		graph_bar = self.get_plot_resampled_bar(dates,
			target, resampling_bar_rule)
        
		#Add These Plots As Subplots
		self.add_plots([graph_line, graph_density, graph_bar], subplot_prefs)

	#Call To Lineplot Class
	def get_plot_line(self, dates, target):
		graph_line = LinePlot()
		return graph_line.plot(dates, target)

	#Call To Density Plot Class
	def get_plot_density(self, dates, target, windows):
		graph_density = DensityPlot()
		return graph_density.plot(dates, target, windows)

	#Call To Resampled Bar Class
	def get_plot_resampled_bar(self, dates, target, rule):
		graph_resampled_bar = ResampledBar()
		return graph_resampled_bar.plot(dates, target, rule)

	#Add A List Of Subplots
	def add_plots(self, plots, prefs):
		for i in range(0, self.subplot_rows):
			nxt_plot = plots[i]

			self.figure.add_trace(nxt_plot, 
				row=i+1, col=1)
            
			xaxis_ttl = prefs[i]["xaxis"]
			yaxis_ttl = prefs[i]["yaxis"]
			self.figure.update_xaxes(title_text=xaxis_ttl, row=i+1,col=1)
			self.figure.update_yaxes(title_text=yaxis_ttl, row=i+1,col=1)

	def setup_graph_prefs(self, rows, cols, graph_prefs):     
		line_ttl = graph_prefs["line_title"]
		density_ttl = graph_prefs["density_title"]
		bar_ttl = graph_prefs["bar_title"]
		graph_height = graph_prefs["height"]
		graph_width = graph_prefs["width"]
		graph_title = graph_prefs["title"]
        
		self.figure = make_subplots(rows = rows, 
		cols = cols, subplot_titles=[line_ttl, density_ttl, bar_ttl])
		self.figure.update_layout(title=graph_title, height=graph_height, width=graph_width)


	#Show The Graph
	def show(self):
		self.figure.show()

# Characterisation & Visualisation
We have finally completed the quite large task of setting up a data visualisation system for our data. This system allows us to easily and safely work through our data and visualise its characteristics.

We will analyse each country separately, moving from the western-most country to the eastern-most. Let's have a look at a map and work from that!

![Belgium, Germany & Austria](./res/european_countries.png)

We can see that Belgium is the western-most country of the three and therefore we will begin with it. Let's jump into the data!

# Belgium 2020: A Mobility Analysis
In analysing a time series well, we must comment on and characterise a few aspects of the graph.
For each visualisation, I will comment on each of these characteristics in order to ensure we have covered all bases.

What are these characteristics you may ask? Well, here's a list of them:
* **Is there a trend?** On average do the data decrease or increase over time?
* **Is there seasonality?** Do we see regularly repeating patterns in different time periods?
* **Are there many outliers?**
* **Are there any abrupt changes** to the variance in the series?
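
Two of these questions can also be checked numerically rather than by eye. The sketch below is one possible approach on a synthetic series, not the method used for the analysis that follows: it measures weekly seasonality with the lag-7 autocorrelation and flags outliers by their distance from a centred rolling median. The series, the window of 7 and the threshold of 30 are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic four-week series starting on a Monday: weekdays 0, weekends -20
dates = pd.date_range("2020-06-01", periods=28, freq="D")
weekend = dates.dayofweek >= 5
series = pd.Series(np.where(weekend, -20.0, 0.0), index=dates)

# Seasonality: a weekly pattern lines up with itself 7 days later,
# but not 3 days later
lag7 = series.autocorr(lag=7)
lag3 = series.autocorr(lag=3)

# Outliers: inject one spike, then flag points far from the rolling median
spiky = series.copy()
spiky.iloc[10] = -60.0
median = spiky.rolling(window=7, center=True, min_periods=1).median()
outliers = spiky[(spiky - median).abs() > 30]
```

On this toy series the lag-7 autocorrelation is essentially 1 while the lag-3 value is negative, and only the injected spike on June 11th is flagged. Real data will be far less clean, so the threshold needs tuning per attribute.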

Here's a refresher on the relevant mobility-related attributes for this part of the project:
* **Retail & Recreation**
* **Grocery & Pharmaceuticals**
* **Parks**
* **Transit**
* **Workplaces**
* **Residential**

## Belgium 2020: Retail & Recreation

We'll begin by setting up our mobility suite and plotting the retail and recreation data for Belgium in 2020. We can finally get a view of our hard work thus far!

In [595]:
#Belgium 2020: Retail & Recreation
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Belgium
attribute = Attribute.Retail_And_Rec
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Representation Of Decrease In Retail & Recreation 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

We have lift off! We can see our universal line plot of the original data, our density plot displaying the rolling mean and lastly our bar chart with differencing and resampling applied.

#### Trend
Even in the original, noisy data we can see a very clear trend in this data. We can see a rapid percentage decrease in the amount of retail and recreational mobility starting approximately March 13th. Our original data is very useful here as we can see exact dates of changes. 

#### Seasonality
We can see weekly spikes and declines that represent how people live differently during the week versus at the weekend. This is normal and would occur without the coronavirus pandemic. This seasonality is filtered out in our rolling mean graph and doesn't exist in our differencing bar plot.

#### Outliers
There are multiple notable outliers that we can see from the original data. A particularly noticeable data point is August 15th, where we see a rapid percentage decline and then a quick restoration. Viewed in isolation, March 14th would also appear to be an outlier, but it is merely the beginning of the decline in retail and recreation. 

#### Abrupt Changes
We see many abrupt changes to variance due to the coronavirus pandemic, with some large deviations from the mean. On a more local level we do see constant variance; however, our differencing chart makes it clear that there are many quite significant changes from month to month.

## Belgium 2020: Grocery & Pharma
We'll now examine the grocery and pharmaceutical mobility for Belgium in 2020. 

In [597]:
#Belgium 2020: Grocery & Pharma
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Belgium
attribute = Attribute.Grocery_And_Pharma
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Representation Of Decrease In Grocery & Pharmaceutical 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

#### Trend
There is a clear trend in the months of March and April as we see a rapid percentage decrease in grocery and pharmaceutical mobility. This levels off however and we see less of a trend for the remainder of the year.

#### Seasonality
There is a clear lack of long-term seasonality in these graphs. Once we account for the trends of the pandemic we can see, month by month, a relatively stable level of grocery and pharma mobility. We can, however, again see a weekly shift from weekday to weekend, perhaps because people tend to do their grocery shopping at the weekend.

#### Outliers
There are some really interesting outliers here. On April 13th, May 1st, May 21st, June 1st, July 21st and August 15th we see quick spikes (i.e. percentage declines) in the grocery and pharmaceutical mobility.

#### Abrupt Changes
In the second half of the year we see far fewer abrupt changes to the data. It is primarily in March and April that we see a huge impact on the grocery and pharma mobility data.

## Belgium 2020: Parks

We'll now visualise and characterise the park mobility data for Belgium in 2020.

In [599]:
#Belgium 2020: Parks
num_subplot_rows = 3
num_subplot_cols = 1

country = Country.Belgium
attribute = Attribute.Parks
rolling_mean_windows = 20
resampling_freq = 'M'

#Our Overall Graph Preferences
graph_prefs = {
    "title" : "Belgium: Representation Of Decrease In Park Mobility 2020",
    "line_title" : "Universal Line Plot (Original Data)",
    "density_title" : "Density Plot (Rolling Mean)",
    "bar_title" : "Bar Chart (Differencing & Resampling)",
    "width" : 1000,
    "height" : 1000
}

#Our Subplot-Specific Preferences
subplot_prefs = [
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Date",
        "yaxis" : "Decrease (%)"
    },
    {
        "xaxis" : "Month",
        "yaxis" : "Decrease (%)"
    }
]

#Create The Mobility Suite
suite = MobilitySuite(num_subplot_rows, num_subplot_cols, graph_prefs)

#Plot & Show Our Graphs
suite.plot(country, attribute, rolling_mean_windows, resampling_freq, subplot_prefs)
suite.show()

We can immediately see a few distinct differences between this graph and the previous ones. We will explore these differences below.

#### Trend
There is a small trend towards a decrease in the number of people in parks in the early stages of the pandemic. However, the more interesting results are in seasonality.

#### Seasonality
We see a very good example of seasonality here during the months of July, August and September. This is likely due to the increase in park use every summer as the weather improves.

#### Outliers
There are two significant outliers on June 1st and July 21st. We see a rapid increase in the amount of mobility for the dates of these two outliers.

#### Abrupt Changes
We notice that there are fewer abrupt changes than in previous graphs. The changes to park mobility are less a trend driven by the pandemic and more a result of seasonal changes in temperature. However, we do still see some smaller abrupt changes during the beginning of the pandemic.