<a href="https://colab.research.google.com/github/chantalskye/example/blob/master/CA_Vodafone_Masterclass_Final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Coder Academy - Vodafone Masterclass

This notebook has the following sections...

* Part 0: Problem introduction
* Part 1: Using this notebook
* Part 2: Playing with time series
* Part 3: Visualising time series data
* Part 4: Classifying time series data
* Part 5: Detecting changes to the network environment

You can pull up the **Table of contents** on the left-hand side in Colab to skip around the notebook. We also **strongly** recommend going to **Runtime->Change Runtime Type** and switching the **Hardware Accelerator** to **GPU**.

# Part 0: Problem introduction

---

Vodafone collects network data to gauge how well a network is performing against expectations. By constantly analysing performance, Vodafone can be **_proactive_** rather than **_reactive_** in responding to environmental changes that might affect its mobile network.

<img src="https://d2bs8hqp6qvsw6.cloudfront.net/article/images/800x800/dimg/vodafone_24.jpg" width="500">


## Scenario introduction

Today we'll analyse the following scenario. Imagine a cell tower with two antennas, one that serves commuter traffic from a nearby train station, and another antenna that serves customers in an apartment building within close proximity to the cell tower.

---

<img src="https://github.com/CoderAcademyEdu/data_science_sc_student/blob/master/img/Problem_Scenario.png?raw=1" width="700">

---

An antenna that serves commuter traffic shows a distinctly different mobile traffic pattern from one that serves residential traffic. We can graph what's called a **time series** to look at the average daily traffic pattern for each of these antennas.

---
<img src="https://raw.githubusercontent.com/CoderAcademyEdu/data_science_sc_student/master/img/Problem_Scenario_w_Graphs.png" width="700">

---

The question becomes, if we can **teach a computer to recognise the _normal traffic pattern_ for each of these antennas, will a computer subsequently be able to detect when mobile traffic _differs_ from the normal pattern?** Consider the following scenario, where _new_ apartment buildings are built in between the railway station and the antenna serving commuters...

---

<img src="https://github.com/CoderAcademyEdu/data_science_sc_student/blob/master/img/Problem_Scenario_w_new_apt.png?raw=1" width="700">

---

This environmental change might affect the line of sight, or the direct path from an antenna to a customer. If Vodafone can **detect the effects of the environmental changes**, it can make the necessary cell tower adjustments and respond to these changes, continuing a seamless customer experience.

### Thought exercise

Take 10-15 minutes and work with the partner next to you and research how we can use data science to build a network monitoring system. A few questions that might be relevant...

* What is a time series?
* How can a computer learn about patterns within a time series?

# Part 1: Using this notebook
---

## What is Python?

Python is an _interpreted_ programming language that was conceived in the late 1980s and first released in 1991. It's actually named after the British comedy group Monty Python. In this class we'll be using Python to build our machine learning algorithms.

### Why learn Python?

Python has gained popularity because it has an easier syntax (rules to follow while coding) than many other programming languages. Python is very diverse in its applications which has led to its adoption in areas such as data science and web development.

All of the following companies actively use Python:

![Image](https://www.probytes.net/wp-content/uploads/2018/08/appl.png)

## How do I interact with this notebook?

A Jupyter Notebook is an interactive way to work with code in a web browser. Jupyter is a pseudo-acronym for three programming languages: Julia, Python and R. Notebooks provide a format to combine instructions and code in one file, which is why we're using one!

We'll quickly do some practice to introduce you to using this notebook. For a list of keyboard shortcuts you can take a look at [Max Melnick's](http://maxmelnick.com/2016/04/19/python-beginner-tips-and-tricks.html) beginner tips for Jupyter Notebook.

Here's a quick run down of some of the most basic commands to use:

- A cell with a **<span style="color:blue">blue</span>** background is in **Command Mode**. This will allow you to move up/down between cells using the arrow keys. You can press enter/return on a cell in command mode to enter edit mode

- A cell with a **<span style="color:green">green</span>** background is in **Edit Mode**. This will allow you to change the content of cells. You can press the escape key on a cell in edit mode to return to command mode

- To run the contents of a cell, you can type:
  - `cmd + enter`, which will run the contents of a cell and keep the cursor in place
  - `shift + enter`, which will run the contents of a cell, and move the cursor to the next cell (or create a new cell)

### Exercise

Enter edit mode on the cell below, change "Gretchen" to your own name, and then run the cell using the directions above.

In [0]:
print("Hello, Gretchen!")

# Part 2: Playing with time series

---

## Opening time series data

Let's dive in by opening up a dataset that we'll use throughout the entire masterclass. This dataset represents simulated network loads across multiple antennas. To open up our dataset and manipulate it, we'll use the [pandas, or pan(el)-da(ta) + s](https://pandas.pydata.org/) library. We need to `import` the pandas library into our Python session to be able to use it.

Run the following code (using `shift + enter`) to

* import pandas
* upload our dataset
* print out some information about the data including...
  * The number of rows
  * The **data types** (do columns contain numbers or words?) within each column, as well as the column names
  * The first five rows of data

In [0]:
# Import pandas
import pandas as pd

# Read the data
timeseries_data = pd.read_csv('https://raw.githubusercontent.com/CoderAcademyEdu/data_science_sc_student/master/data/traffic_workshop_data.csv')

# Print the shape of the data
print('This dataset has %d rows and %d columns \n' % (timeseries_data.shape[0], timeseries_data.shape[1]))

# Data types 
print('Column names and data types')
print(timeseries_data.dtypes)

# Print out the first five rows
timeseries_data.head()

Each row in the data represents a captured network load, from a specific antenna, at a specific time. There are four columns within the data. We can summarise them here...

| Column name | Description |
| ----------- | ----------- |
| ANTENNA | The ID for a specific antenna. In this dataset, we have two antennas, labelled with IDs `0` and `1` |
| SERVICE | The type of service the antenna is intended to cover. In our data, antenna `0` is expected to cover commuter service, representing the <span style="color:green;">**green**</span> antenna, while the service for antenna `1` is residential, representing the <span style="color:red;">**red**</span> antenna|
| TIMESTAMP | The date and time when the specific load was recorded |
| LOAD | The current **number of users** of the mobile network at the specific timestamp |

Because this is timeseries data, it might be helpful to also learn a little bit about _when_ recordings took place. For instance...

* Over what time period do we have data for each antenna?
* On average, how many records were recorded per antenna, per day?
* Do these records typically span the entire day, or only certain sections of the day?

To do this, it will be easier to convert the `TIMESTAMP` column to a `datetime` object, and then we can extract specific details about the time each record occurred. We can use the [`pd.to_datetime()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) function to do this extraction.

Run the code below to convert the timestamp to a datetime object, and then extract specific information around the timestamp.

**Side note:** If you're interested in why we convert the `TIMESTAMP` to a `datetime` object, it's because it will allow us to use a bunch of internal Python capabilities and manipulations of dates. It's like someone giving you a PDF instead of a word document...it's a lot easier to change the text in a word document than it is a PDF.

In [0]:
# Convert to datetime
timeseries_data['TIMESTAMP'] = pd.to_datetime(timeseries_data['TIMESTAMP'])

# Extract month, hour, day of year, day of week, week number and year
timeseries_data['MONTH'] = [d.month for d in timeseries_data['TIMESTAMP']]
timeseries_data['HOUR'] = [d.hour for d in timeseries_data['TIMESTAMP']]
timeseries_data['DAY'] = [d.dayofyear for d in timeseries_data['TIMESTAMP']]
timeseries_data['DAYOFWEEK'] = [d.dayofweek for d in timeseries_data['TIMESTAMP']]
timeseries_data['WEEKNUMBER'] = [d.isocalendar()[1] for d in timeseries_data['TIMESTAMP']]
timeseries_data['YEAR'] = [d.year for d in timeseries_data['TIMESTAMP']]
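
To see what the conversion buys us, here is a small sketch on a made-up two-row table (the timestamps are invented for illustration and are not from the workshop dataset). It also uses the pandas `.dt` accessor, which is an alternative to the list comprehensions above and should give the same values.

```python
import pandas as pd

# Hypothetical toy timestamps, stored as plain strings (as they would be in a CSV)
toy = pd.DataFrame({'TIMESTAMP': ['2017-01-02 08:30:00', '2017-01-07 17:45:00']})

# While the column holds strings, pandas has no notion of "hour" or "day of week"
toy['TIMESTAMP'] = pd.to_datetime(toy['TIMESTAMP'])

# After conversion, the .dt accessor exposes date parts for the whole column at once
toy['HOUR'] = toy['TIMESTAMP'].dt.hour
toy['DAYOFWEEK'] = toy['TIMESTAMP'].dt.dayofweek  # Monday is 0, Sunday is 6

print(toy)
```

The first row is a Monday morning and the second is a Saturday evening, which is exactly the kind of detail we need to compare commuter and residential traffic.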

Now we can extract the metrics listed above and learn more about our dataset. You'll see a lot of usage of the [pandas groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) function, which allows us to summarise data by different categorical variables.

In [0]:
# Extract min and max date per cell
print('********************************************************************')
print('Min/max per antenna:\n')
print(timeseries_data.groupby(['ANTENNA'])['TIMESTAMP'].agg(['min', 'max']))
print('********************************************************************')

print('Average number of daily recordings:\n')
print(
    timeseries_data.groupby(['ANTENNA', 'DAY'], as_index=False)['LOAD'].count().groupby(['ANTENNA'])['LOAD'].mean()
)
print('********************************************************************')

print('Count of recorded time points:\n')
print(
    timeseries_data.groupby(['ANTENNA', 'HOUR'], as_index=False)['LOAD'].count()
)
print('********************************************************************')
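
If `groupby` still feels a bit magical, here is a minimal sketch on a tiny made-up table (the numbers are invented for illustration). It splits the rows by `ANTENNA`, then summarises the `LOAD` column within each group.

```python
import pandas as pd

# Hypothetical toy data: three recordings for antenna 0, two for antenna 1
toy = pd.DataFrame({
    'ANTENNA': [0, 0, 0, 1, 1],
    'LOAD': [10, 20, 60, 5, 7],
})

# One row per antenna, one column per summary statistic
summary = toy.groupby('ANTENNA')['LOAD'].agg(['count', 'min', 'max'])
print(summary)
```

Each name passed to `agg` becomes a column in the result, so adding another statistic only means adding another name to the list.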

It might be nice to also calculate the minimum, maximum, median and mean network load. Here's an example of calculating the minimum and maximum load per antenna.

In [0]:
# Calculate the minimum and maximum load per cell
timeseries_data.groupby(['ANTENNA'])['LOAD'].agg(['min', 'max'])

### Exercise

Your turn to code!! (I know...scary thought). Your job is to **change the code cell below** such that instead of the `'min'` and `'max'` network load, we calculate...

* the `'median'` network load and
* the `'mean'` network load per antenna

In [0]:
# CHANGE CODE BELOW to calculate the mean/median network load on each cell
timeseries_data.groupby(['ANTENNA'])['LOAD'].agg(['min', 'max'])

**What do all of these statistics tell us about commuter mobile traffic (`antenna 0`) vs. residential mobile traffic (`antenna 1`)?**

# Part 3: Visualising time series data

---

Now we have some basic idea about what makes commuter traffic differ from residential traffic. Let's now **visualise** our time series to really get into the details about how the network load differs for each antenna.

There are two main libraries we'll use for data visualisation that we need to import. These are the...

* [matplotlib](https://matplotlib.org/) library, which is the main Python plotting library
* [seaborn](https://seaborn.pydata.org/index.html), which is a library built on top of matplotlib, specifically for statistics visualisation

Let's `import` these libraries.

In [0]:
# Imports
import matplotlib.pyplot as plt
import seaborn as sns
# allows the figures to be rendered within the notebook
%matplotlib inline 

Let's try to make a graph of our two antennas' data, using the [sns.lineplot](https://seaborn.pydata.org/generated/seaborn.lineplot.html) function. We'll graph the `TIMESTAMP` on the x-axis, and the `LOAD` on the y-axis, meaning we'll see how the network load changes over time. We'll also separate out the commuter traffic from residential.

In [0]:
# Make a figure
plt.figure(figsize=(15, 7))

# Graph the lineplot
sns.lineplot(
    x='TIMESTAMP',
    y='LOAD',
    hue='ANTENNA',
    data=timeseries_data,
    palette=['green', 'red'],
    err_style="bars",
    ci="sd"
)
plt.title('Two year network load over entire data collected')
sns.despine()

This is **pretty ugly**...and doesn't really give us much information. Let's maybe visualise the data at more **micro-timescales** to see whether we can characterise different traffic types.

## Playing with time windows

> For our purposes, a **time window** is a section of a time series, that has a specific length. For example, a time window with length 22 time points, starting and including time point 3, would end at time point 24. A time window might also be defined by a length of time. For instance, we might take a time window to be a single day. **Question**: Would we always be guaranteed to have the same number of time points per day?
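
The window arithmetic above can be checked with a short sketch. The helper name `time_window` is made up for this illustration, and time points are numbered from 1 to match the example in the text.

```python
def time_window(series, start, length):
    """Return `length` consecutive points, beginning at time point `start`.

    Time points are numbered from 1, so we subtract 1 to get a Python index.
    """
    first_index = start - 1
    return series[first_index:first_index + length]


# Hypothetical series of 30 time points, labelled 1 to 30
time_points = list(range(1, 31))

window = time_window(time_points, start=3, length=22)
print(window[0], window[-1], len(window))
```

The window starts at time point 3, ends at time point 24, and contains 22 points, because both end points are included.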

Let's play with our time windows a little bit. What we'll do is graph the following per antenna...

* The mean daily time series traffic with its standard deviation (more on this later)
* The mean weekly time series traffic

In [0]:
# Make a figure
fig = plt.figure(figsize=(15, 10))

# Graph the daily traffic
fig.add_subplot(2, 1, 1)

sns.lineplot(
    x='HOUR',
    y='LOAD',
    hue='ANTENNA',
    data=timeseries_data,
    palette=['green', 'red'],
    err_style="bars",
    ci="sd"
)
plt.title('Daily time series traffic')
plt.xticks(range(24))
sns.despine()

# Graph the weekly traffic
fig.add_subplot(2, 1, 2)
sns.lineplot(
    x='DAYOFWEEK',
    y='LOAD',
    hue='ANTENNA',
    data=timeseries_data,
    palette=['green', 'red'],
    err_style="bars",
    ci="sd"
)
plt.title('Weekly time series traffic')
sns.despine()

fig.subplots_adjust(hspace=.3)

### Thought exercise

Work with the person next to you to answer the following questions...

* Which days/hours have the highest/lowest network load for each type of antenna? (Commuter/residential)
* What numbers indicate days that are weekdays/weekends?
* What days/hours is there the **most variation** in network load? (see the following **side-note**)

#### Side-note: Standard deviation

If you had trouble answering the last question, you can take a detour below (beware...it's a mathy one!). You might have noticed **little bars** on each of our time traffic patterns. Let's zoom in on one...

---

<img src="https://github.com/CoderAcademyEdu/data_science_sc_student/blob/master/img/Standard_Dev.png?raw=1" width="700">

---

The vertical bar represents **one standard deviation** away from the mean of our dataset, specifically both one standard deviation **higher** and **lower** than our mean. Even better...

> If our data is **normally distributed**, then we know that about 68% of our data falls within one standard deviation of the mean.

Ok...back to English. This means that if we take all the data that was **collected at 8AM** for our **antenna 0** data, then roughly 68% of the **network load data points** should fall within the area defined by the bars (assuming the loads are roughly normally distributed). Thus, the standard deviation helps us say which hours have **a lot of varied traffic**, and which have **little variation**. Here's another visual to help...

---

<img src="https://github.com/CoderAcademyEdu/data_science_sc_student/blob/master/img/Standard_Dev_w_normal.png?raw=1" width="600">

---
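
If you'd like to check the 68% rule yourself, here is a small sketch using simulated data. The loads are randomly generated from a normal distribution (mean 100, standard deviation 15, both chosen arbitrarily), so this is an illustration and not a statement about the workshop dataset.

```python
import random
import statistics

# Simulate 10,000 hypothetical network loads from a normal distribution
random.seed(0)
loads = [random.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(loads)
sd = statistics.stdev(loads)

# Fraction of points that fall within one standard deviation of the mean
within = sum(mean - sd <= x <= mean + sd for x in loads) / len(loads)
print('Share within one standard deviation: %.2f' % within)
```

The share should come out close to 0.68. For real network loads, which may not be normally distributed, the share can be noticeably different.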

**Why would this matter if we're trying to build algorithms to differentiate these traffic patterns?**
* If there's a lot of variation, would it be easier or harder to do this classification?

## Smoothing the curve to look at macro trends

Thus far, we have established weekly and daily trends, but there still might be interesting trends we find by looking at the _entire time period_ of data we have. For instance...

* Maybe usage is _increasing_ or _decreasing_ throughout the year
* Maybe there are specific time periods throughout a year that are unique to each type of traffic

Let's re-graph the curve we initially made.

In [0]:
# Make a figure
plt.figure(figsize=(15, 7))

# Graph the lineplot
sns.lineplot(
    x='TIMESTAMP',
    y='LOAD',
    hue='ANTENNA',
    data=timeseries_data,
    palette=['green', 'red'],
    err_style="bars",
    ci="sd"
)
plt.title('Two year network load over entire data collected')
sns.despine()

It's not super easy to get any relevant info out of this... The issue is, the small micro daily trends inhibit us from seeing the **macro trends** within the data. We call the process of trying to eliminate different trends that are not relevant to our current model view **smoothing**.

From [wikipedia](https://en.wikipedia.org/wiki/Smoothing)

> To **smooth** a data set is to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures/rapid phenomena

Here's a pretty simple example of using smoothing to show macro-trends in temperature over time. **Ignore the red arrow**.

<br>

![](https://dawn.cs.stanford.edu/assets/img/2017-08-07-asap/temp.png)

<br>

As you can see, we are eliminating the details to see the larger picture.

### Creating a smoothing function

Let's create a **filter**, or a function that **smooths** our curve. This function will smooth a curve using either a...

* `monthly`, `daily` or `weekly` average
* using either a `median` or `mean` filter

In [0]:
def curve_smoothing(
    data, time_interval='daily', filter_type='median', col='LOAD', 
    title='Network load over entire data collected'
):
    """
    Smooth then graph a curve.
    
    :param data: The data to smooth
    :param time_interval: either 'daily', 'weekly' or 'monthly' to smooth
    :param filter_type: either 'median' or 'mean', saying to take either the median or the mean filter
    :param col: the column we want to graph
    :param title: the title for the plot
    """
    # Copy the data
    data_copy = data.copy()
    data_copy['DAY_2'] = [d.day for d in data_copy.TIMESTAMP]
    
    # Extract monthly
    if time_interval == 'monthly':
        data_copy['FILTER_TIMESTAMP'] = pd.to_datetime(
            data_copy['MONTH'].astype(str) + '-' + data_copy['YEAR'].astype(str)
        )
    # Extract daily
    elif time_interval == 'daily':
        data_copy['FILTER_TIMESTAMP'] = pd.to_datetime(
            data_copy['MONTH'].astype(str) + '-' + data_copy['DAY_2'].astype(str) + '-'
            + data_copy['YEAR'].astype(str)
        )
    # Extract weekly
    elif time_interval == 'weekly':
        # Get the 0 and the week numbers
        week_start = data_copy.loc[
            data_copy['DAYOFWEEK'] == 0, ['TIMESTAMP', 'WEEKNUMBER', 'YEAR']
        ].drop_duplicates()
        # Rename
        week_start.rename(columns={'TIMESTAMP': 'FILTER_TIMESTAMP'}, inplace=True)
        # Merge
        data_copy = pd.merge(left=data_copy, right=week_start, on=['WEEKNUMBER', 'YEAR'], how='left')        
    else:
        return None
        
    # Then groupby type of filter
    grouped_data = data_copy.groupby(['ANTENNA', 'FILTER_TIMESTAMP'], as_index=False)[col].agg(filter_type)
    
    # Graph
    plt.figure(figsize=(15, 7))
    # Graph the lineplot
    sns.lineplot(
        x='FILTER_TIMESTAMP',
        y=col,
        hue='ANTENNA',
        data=grouped_data,
        palette=['green', 'red'],
        err_style="bars",
        ci="sd"
    )
    plt.title(title)
    sns.despine()

> A **filter**, in this case, aggregates multiple data into a single data point. For example, if we apply a **mean** filter over each **day** in our dataset, we are aggregating all the data points in a single day by taking the average over these data points. We call a filtered curve **smoothed** if we are able to visualise a trend in our dataset post-filtering.
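
Here is a minimal sketch of a daily filter on a made-up table (five invented recordings over two days). It shows how the `mean` and `median` filters can disagree when one recording is much larger than the others.

```python
import pandas as pd

# Hypothetical recordings: three on 2 January, two on 3 January
toy = pd.DataFrame({
    'TIMESTAMP': pd.to_datetime([
        '2017-01-02 08:00', '2017-01-02 12:00', '2017-01-02 18:00',
        '2017-01-03 08:00', '2017-01-03 18:00',
    ]),
    'LOAD': [10, 20, 90, 30, 50],
})

# normalize() sets every time to midnight, so each day becomes one group
day = toy['TIMESTAMP'].dt.normalize()

# Aggregate all recordings within a day into a single data point
daily = toy.groupby(day)['LOAD'].agg(['mean', 'median'])
print(daily)
```

On the first day the single large recording of 90 pulls the mean up to 40, while the median stays at 20. This is worth keeping in mind when choosing a `filter_type`.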

### Exercise

The following function will **run curve smoothing**. You can change the following parameters...

* change `time_interval` from `daily` to `weekly` to `monthly`
* change `filter_type` between `mean` and `median`

Check out the resulting graph. What overall trends do you notice within the data?

In [0]:
# CHANGE THE PARAMETERS BELOW
time_interval = 'monthly'
filter_type = 'mean'

curve_smoothing(timeseries_data, time_interval=time_interval, filter_type=filter_type)

# Part 4: Classifying time series data

---

Now that we have an idea about what our network data looks like, and what differentiates an antenna that picks up commuter network traffic vs. residential traffic, let's work on a **classification** method to train a computer to recognise what a commuter vs. a residential antenna looks like. Once we train a computer to recognise these traffic patterns, maybe we can use the developed algorithm to report on network traffic changes.

So you might be asking, how do we do this? Let's take a little detour into **classification**.

### What is classification?

> **Classification** is an area of machine learning that tries to build **algorithms** that take a set of data as input and output a class label. In our case, we want to build algorithms that **take a time series as input** and output whether the time series follows its normal pattern, or changes outside the norm.

If this is too complicated...this [example](https://www.youtube.com/watch?v=vIci3C4JkL0) might help.

<img src="https://d3ansictanv2wj.cloudfront.net/Figure_1-71076f8ac360d6a065cf19c6923310d2.jpg" width="500">

To do this, we can train what is called a **long short-term memory neural network**.

### What is a neural network?


A **neural network** is a type of machine learning algorithm that can be used to classify things. It was actually created to resemble how neurons in the brain connect with each other.

![](https://cdn-images-1.medium.com/max/1200/1*SJPacPhP4KDEB1AdhOFy_Q.png)


More complex _artificial_ neural network models look like the gif below.

![](https://thumbs.gfycat.com/DeadlyDeafeningAtlanticblackgoby-max-1mb.gif)

In the image, we're feeding the **image with the number 7** to the **input layer** of the network. The image has a certain number of pixels, which are fed into the **input** layer. The network has been trained to recognise the seven, and you can see that certain nodes/edges in the **middle/hidden layers** and **output layers** are activated specifically when a seven is fed into the network.

Since this is a network with input, hidden and output layers, we call this type of neural network a **deep neural network**, and this type of machine learning **deep learning**.

#### What does this have to do with our time series?

Instead of training a network to recognise an image, maybe we can **train a neural network** to recognise the **patterns** in our data that constitute **normal commuter or residential traffic** on an antenna. Once it recognises these normal patterns, maybe we can train the same network to recognise an environmental change that might affect the network.

---

<img src="https://github.com/CoderAcademyEdu/data_science_sc_student/blob/master/img/ANN.png?raw=1" width="700">

---

## Training a basic network

Let's train a basic neural network to recognise our two traffic patterns. What we'll do is the following...

* We'll divide our dataset into two datasets. The first dataset will include the first year of our time series data, and the second dataset will use the second year of data
* We'll then _train_ our network on the first year of data, and then use the second year of data to _assess_ model performance

We also need to define a _time window_ in a variable called `window_width`...meaning, how many time points will we feed to our network? This translates into the _size of the input layer_ of the network. Really, we're trying to tell the network how much data is needed to distinguish the type of traffic.

Run the code below to create a function that will make a dataset with a given time window.

In [0]:
def shift_data(data, window_width, col):
    """
    Shift data with a given window width
    
    :param data: the dataset
    :param window_width: the window widths to shift by
    :param col: the column to shift
    
    :return data_copy: the data with shifted columns
    :return feature_cols: the names of the created columns
    """
    # Copy data
    data_copy = data.copy()
    data_copy.sort_values(['ANTENNA', 'TIMESTAMP'], inplace=True)
    data_copy.reset_index(drop=True, inplace=True)
    
    # Shift within each antenna so a window never mixes data from two antennas
    feature_cols = [col] + [col + '_' + str(i) for i in range(1, window_width)]
    all_df = []
    for a in data_copy.ANTENNA.unique():
        # Copy the slice so that adding columns does not raise a SettingWithCopyWarning
        temp = data_copy.loc[data_copy['ANTENNA'] == a, :].copy()
        for i in range(1, window_width):
            # Column i holds the value recorded i time points earlier
            temp[col + '_' + str(i)] = temp[col].shift(i)
        all_df.append(temp)
                
    return pd.concat(all_df).reset_index(drop=True).dropna(), feature_cols
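
To see what the shifted columns look like, here is a toy sketch of the same shifting idea on five made-up loads with a window width of 3. It is a simplified illustration (one antenna only) and does not call `shift_data` itself.

```python
import pandas as pd

# Hypothetical loads for a single antenna, already sorted by time
toy = pd.DataFrame({'LOAD': [10, 20, 30, 40, 50]})
window_width = 3

# LOAD_1 is the previous time point, LOAD_2 the one before that
for i in range(1, window_width):
    toy['LOAD_' + str(i)] = toy['LOAD'].shift(i)

# The first rows have no history yet, so they contain NaN and are dropped
windows = toy.dropna()
print(windows)
```

Each remaining row is one time window: the current load plus the two loads before it. Five time points with a window width of 3 leave three complete windows.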

Since antenna `0` and `1` had an average of 22 and 11 timepoints per day respectively, let's choose the larger of the two numbers as our time window (so we always capture at least one day of data per antenna). We'll come back to the idea of a time window later. Run the code below to prep the data. We'll also create the `train` and `test` datasets.

In [0]:
# Create shifts
timeseries_w_shift, feature_cols = shift_data(timeseries_data, window_width=22, col='LOAD')

# Now create dummy variables
timeseries_w_shift = pd.concat([timeseries_w_shift, pd.get_dummies(timeseries_w_shift[['SERVICE']])], axis=1)

# Split data in half
train = timeseries_w_shift.loc[timeseries_w_shift['TIMESTAMP'] < '2017-06-30', :]
test = timeseries_w_shift.loc[timeseries_w_shift['TIMESTAMP'] >= '2017-06-30', :]

# Show result
timeseries_w_shift[feature_cols].head()

#### Side-note: train vs. test sets

Why did we split our data into two datasets, a `train` and a `test` set? Why do we not train and test model performance on the same dataset? Think about studying for a math test...if we practiced using just the problems we already have completed, we'd get really good at understanding those problems, but not necessarily be able to understand new information.

In machine learning, we do not want computers to just understand the data we have at hand; we want to see how they will predict new data they have not been exposed to. Thus, test sets are held out of model training, and are used to simulate what it's like to expose our algorithms to new information.

#### Another side-note...dummy variables

You may have noticed we used the [pd.get_dummies](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html) function in the code above. Dummy variables are binary variables created out of categorical variables. What we did was take the first column on the left-hand side, and turn it into the two columns on the right-hand side...

| SERVICE | --------------- | SERVICE_commuter | SERVICE_residential |
| - | - | - | - |
| commuter | - | 1 | 0 | 
| residential | - | 0 | 1 |
| residential | - | 0 | 1 |
| commuter | - | 1 | 0 |
| ... | - | ... | ... | 

This is because computers do not like **words (or strings)**, but do like numbers. Also, creating **two columns** will allow us to train a neural network that can recognise whether a traffic pattern is more like a commuter _or_ residential load.
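
Here is a minimal sketch that reproduces the table above on four made-up rows. The `dtype=int` argument is an addition for this illustration: without it, newer pandas versions show `True`/`False` instead of `1`/`0`.

```python
import pandas as pd

# Hypothetical SERVICE column with the same four values as the table above
toy = pd.DataFrame({'SERVICE': ['commuter', 'residential', 'residential', 'commuter']})

# One new binary column per category, named <column>_<category>
dummies = pd.get_dummies(toy[['SERVICE']], dtype=int)
print(dummies)
```

Exactly one of the two columns is `1` in every row, which is what lets the network treat them as two competing labels.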

#### One last side-note...promise!!

We'll actually train a network with **two different nodes** in the output layer. One node gives the _probability_ that a signal is carrying commuter traffic, and the other node gives the probability that a signal is carrying residential traffic.

We could have trained just one node that distinguishes either commuter or residential traffic. High probabilities would designate one type of traffic, and low probabilities would designate another type of traffic. 

So...why do you think we made two output nodes?
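
As background for that question, the output layer in the training code uses a `softmax` activation, which turns the raw scores of the two nodes into two probabilities. Here is a hand-written sketch of softmax; the input scores are made up, and this is an illustration of the formula rather than the code Keras runs internally.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the largest score avoids overflow and does not change the result
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the [commuter, residential] output nodes
probs = softmax([2.0, 0.5])
print(['%.2f' % p for p in probs])
```

The larger score gets the larger probability, and the two probabilities always add up to 1.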

### Network training

Let's actually train the network...the code below will train a network based upon a given `window_width`. To train the network, we'll use the [Keras](https://keras.io/) library. Keras is a library, often powered by [TensorFlow](https://www.tensorflow.org/) (Google's main neural network engine), that can be used to create and train neural networks.

In [0]:
# Import libraries
from keras.models import Sequential
from keras.layers import Dense

# Create function
def create_ann(data, window_width):
    """
    Build a neural network using a given dataset and window_width
    
    :param data: the dataset
    :param window_width: the window width
    
    :return model: a fully trained network
    """
    # Build the network
    model = Sequential()
    model.add(Dense(units=window_width, activation='relu'))
    model.add(Dense(units=10, activation='relu'))
    model.add(Dense(units=2, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    # Train the network
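    # feature_cols comes from the earlier shift_data call.
    # Cast to float32 so Keras receives plain numeric arrays.
    # Keras expects class_weight to be a dict of class index -> weight; the string
    # 'balanced' is a scikit-learn convention, so it is left out here.
    model.fit(
        x=data[feature_cols].values.astype('float32'),
        y=data[['SERVICE_commuter', 'SERVICE_residential']].values.astype('float32'),
        epochs=20,
        batch_size=round(data.shape[0] / 10),
        validation_split=0.05,
        verbose=0
    )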
    
    return model

model = create_ann(train, window_width=22)

Let's see how well our network performs. We can do this by predicting the _type_ of traffic using _just the second year of data_ that we did not use for model training. Ideally, all of the `antenna 0` data would be marked as `commuter`, and all of the `antenna 1` data would be marked as `residential`.

We'll graph our results as well, and use the [sklearn roc_auc score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html). Ideally, this score should be a "1" if our model is doing well. If the score is 0.5, the model is not really able to differentiate a commuter/residential signal. You can read more about ROC AUC scores [here](https://medium.com/greyatom/lets-learn-about-auc-roc-curve-4a94b4d88152).
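
To build some intuition for the score, here is a hand-written sketch of what ROC AUC measures for a single binary label: the chance that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The function name `pairwise_auc` and the example scores are made up for illustration; this is not how scikit-learn computes it internally.

```python
def pairwise_auc(labels, scores):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels: 1 = commuter, 0 = residential
labels = [1, 1, 0, 0]

perfect = pairwise_auc(labels, [0.9, 0.8, 0.2, 0.1])     # commuters always score higher
undecided = pairwise_auc(labels, [0.5, 0.5, 0.5, 0.5])   # the model cannot tell them apart
reversed_ = pairwise_auc(labels, [0.1, 0.2, 0.8, 0.9])   # the model has it backwards
print(perfect, undecided, reversed_)
```

A model that ranks every commuter example above every residential one scores 1, a model that cannot separate them scores 0.5, and a model that gets the ranking backwards scores 0.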

In [0]:
# Import roc AUC
from sklearn.metrics import roc_auc_score

# Predict results
y_pred_comm = pd.DataFrame(
    model.predict(test[feature_cols].values), columns=['Commuter Prediction', 'Residential Prediction']
)
y_pred_comm.index = test.index

# Get AUC score
auc = roc_auc_score(test[['SERVICE_commuter', 'SERVICE_residential']].values, y_pred_comm.values)
print('AUC Score: %.2f' % auc)

# Add to the DataFrame
test_copy = test.copy()
test_copy = pd.concat([test_copy, y_pred_comm], axis=1, sort=False)

curve_smoothing(
    test_copy, time_interval='weekly', filter_type='mean', col='Commuter Prediction', 
    title='Commuter prediction on test data'
)
curve_smoothing(
    test_copy, time_interval='weekly', filter_type='mean', col='Residential Prediction',
    title='Residential prediction on test data'
)

This model is...not doing too well. Ideally, we'd want all the `ANTENNA = 0` (commuter) lines to be at `1` for the commuter prediction, and `0` for the residential prediction. We'd want the opposite for the `ANTENNA = 1` (residential) lines.

Let's dive into the concept of **window widths** a little bit more, which might solve this problem.

### Window widths

Currently, we've given our model a `window_width = 22`, meaning the model is given multiple chunks of **22 time points** to learn whether a given time series is a commuter or residential time series. Is this the right amount of data to give our model?

---

<img src="https://github.com/CoderAcademyEdu/data_science_sc_student/blob/master/img/Window_Widths.png?raw=1" width="700">

---

As you can see, a **larger width** will give more of our data at once to a model to train.

### Exercise

Maybe we're not giving enough data to the input layer of our model.

Adjust the window widths using the `window_width` parameter, and run the below code to retrain and plot a model. Does our model training improve?

In [0]:
# CHANGE WINDOW WIDTH HERE
window_width = 100

# Create shifts
timeseries_w_shift, feature_cols = shift_data(timeseries_data, window_width=window_width, col='LOAD')

# Now create dummy variables
timeseries_w_shift = pd.concat([timeseries_w_shift, pd.get_dummies(timeseries_w_shift[['SERVICE']])], axis=1)

# Split data in half
train = timeseries_w_shift.loc[timeseries_w_shift['TIMESTAMP'] < '2017-06-30', :]
test = timeseries_w_shift.loc[timeseries_w_shift['TIMESTAMP'] >= '2017-06-30', :]

# Train model
model = create_ann(train, window_width=window_width)

# Predict results
y_pred_comm = pd.DataFrame(
    model.predict(test[feature_cols].values), columns=['Commuter Prediction', 'Residential Prediction']
)
y_pred_comm.index = test.index

# Get AUC score
auc = roc_auc_score(test[['SERVICE_commuter', 'SERVICE_residential']].values, y_pred_comm.values)
print('AUC Score: %.2f' % auc)

# Add to the DataFrame
test_copy = test.copy()
test_copy = pd.concat([test_copy, y_pred_comm], axis=1, sort=False)

curve_smoothing(
    test_copy, time_interval='weekly', filter_type='mean', col='Commuter Prediction', 
    title='Commuter prediction on test data'
)
curve_smoothing(
    test_copy, time_interval='weekly', filter_type='mean', col='Residential Prediction',
    title='Residential prediction on test data'
)

## Building memory into a network

The main issue with using neural networks for this problem is that every **time window is treated as a new example** in the network. Our current network has no way to link signals that follow each other within the same antenna unless they fall within the same time window. There is a way to train a computer to connect the dots between two different but consecutive time windows, and allow it to **remember features** that make a time series distinct. We'll need to teach a computer to **remember** linkages between different time windows.

### What is memory to a computer?

Let's think about how you read a sentence. Pretend I have the following sentence, and it's your job to fill in the blank...

> I promised to feed Linda's cat for the week. I had to run across town yesterday to buy food for **____** so I could go over to Linda's house this afternoon to feed it.

Let's think about how your brain fills in the blank. There are a few key words, prior to the blank, that allow you to infer the missing word, highlighted below.

> I promised to **feed** Linda's **cat** for the week. I had to run across town yesterday to **buy food** for **____** so I could go over to Linda's house this afternoon to feed it.

Now pretend we had a neural network that was trying to predict the blank word, and it was only given a fixed number of words prior to the blank. Here are the words that the network would be given for different window widths.

| Window width | Word features |
| ------------ | ------------- |
| 3 | "buy", "food", "for" |
| 10 | "had", "to", "run", "across", "town", "yesterday", "to", "buy", "food", "for" |
| 17 | "feed", "Linda's", "cat", "for", "the", "week", "I", "had", "to", "run", "across", "town", "yesterday", "to", "buy", "food", "for" |
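
The rows of the table can be reproduced with a few lines of Python. This is only a sketch: the `words_before_blank` helper is a made-up name, and it does nothing smarter than splitting on whitespace and stripping full stops and commas.

```python
def words_before_blank(text, window_width, blank='____'):
    """Return the last `window_width` words that appear before the blank."""
    before = text.split(blank)[0]
    words = [w.strip('.,') for w in before.split()]
    return words[-window_width:]

sentence = (
    "I promised to feed Linda's cat for the week. I had to run across "
    "town yesterday to buy food for ____ so I could go over to Linda's "
    "house this afternoon to feed it."
)

for width in (3, 10, 17):
    print(width, words_before_blank(sentence, width))
```

Only at a width of 17 do both "feed" and "cat" make it into the window.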

So...you might conclude that we should have about 17 words as our window width. The issue is that, just as we **trained** on one dataset and **tested** on another above, we are not going to deploy the algorithm to always fill in the blank for the same sentence (that would be useless...). We might want our algorithm to fill in the blank within the following sentences instead.

* **Sentence 1:** Dave has a cat named Alfred. He often goes to the pet-store once a month to buy food for his ___.
* **Sentence 2:** My sister often watches the neighbours' two children and their cat, usually about once a month. The last time she babysat, she was extremely grumpy, as there was no food for the cat, and thus she had to run home and borrow mum's car to buy ___ food.

The window width for sentence 1 would be about 17, but the width for sentence 2 would be _much_ longer. Imagine if we had an entire story that our computer was ingesting information from, and the main character's cat was mentioned early in the story, but not brought up again until 100 pages later!!

**Side note:** We're treating sentences like time series in the examples above. This is actually what's done in the field of [natural language processing](https://en.wikipedia.org/wiki/Natural_language_processing). There's a really [cool article](https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html) from the NY Times that talks about how modeling a sentence as a time series, and then using a neural network, changed the face of Google's Translate algorithm.

So...a logical answer to this problem would be to make the time window in our network really big, so that we cover all these potential cases. Let's see what happens when we fit our network with a really big time window. We'll compare training times for different window widths using the [time](https://docs.python.org/3/library/time.html) library.

In [0]:
import time

# Window width of 22
start = time.time()
window_width = 22
timeseries_w_shift, feature_cols = shift_data(timeseries_data, window_width=window_width, col='LOAD')
timeseries_w_shift = pd.concat([timeseries_w_shift, pd.get_dummies(timeseries_w_shift[['SERVICE']])], axis=1)
train = timeseries_w_shift.loc[timeseries_w_shift['TIMESTAMP'] < '2017-06-30', :]
test = timeseries_w_shift.loc[timeseries_w_shift['TIMESTAMP'] >= '2017-06-30', :]
model = create_ann(train, window_width=window_width)
y_pred_comm = pd.DataFrame(
    model.predict(test[feature_cols].values), columns=['Commuter Prediction', 'Residential Prediction']
)
y_pred_comm.index = test.index
auc = roc_auc_score(test[['SERVICE_commuter', 'SERVICE_residential']].values, y_pred_comm.values)
end = time.time()
print('AUC Score: %.2f' % auc)
print('Total training time was %.2f seconds\n' % (end - start))

# Window width of 100
start = time.time()
window_width = 100
timeseries_w_shift, feature_cols = shift_data(timeseries_data, window_width=window_width, col='LOAD')
timeseries_w_shift = pd.concat([timeseries_w_shift, pd.get_dummies(timeseries_w_shift[['SERVICE']])], axis=1)
train = timeseries_w_shift.loc[timeseries_w_shift['TIMESTAMP'] < '2017-06-30', :]
test = timeseries_w_shift.loc[timeseries_w_shift['TIMESTAMP'] >= '2017-06-30', :]
model = create_ann(train, window_width=window_width)
y_pred_comm = pd.DataFrame(
    model.predict(test[feature_cols].values), columns=['Commuter Prediction', 'Residential Prediction']
)
y_pred_comm.index = test.index
auc = roc_auc_score(test[['SERVICE_commuter', 'SERVICE_residential']].values, y_pred_comm.values)
end = time.time()
print('AUC Score: %.2f' % auc)
print('Total training time was %.2f seconds\n' % (end - start))
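
The two timing blocks above repeat the same start/stop bookkeeping. One way to tidy that up is a small helper like the sketch below. The `timed` and `fake_training` names are hypothetical; `fake_training` is just a stand-in workload so the example runs without the notebook's data or model.

```python
import time

def timed(fn, *args, **kwargs):
    """Call `fn` and return (result, seconds elapsed)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def fake_training(window_width):
    # Stand-in workload whose cost grows with the window width
    return sum(i * i for i in range(window_width * 1000))

for width in (22, 100):
    _, seconds = timed(fake_training, width)
    print('window_width=%d took %.4f seconds' % (width, seconds))
```

In the notebook, the function passed to `timed` would wrap the shift, split, train and predict steps for one window width.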

So, TL;DR: a larger window width _increases_ the training time dramatically, and _does not help the accuracy of the model!_ So...is there a way to...

* Have a computer somehow link smaller/bigger window widths together?
* Even better, is it possible for a computer to somehow **dynamically piece together the relevant information of various time windows??**
* ...and also a shorter runtime would be lovely

## Long short-term memory (LSTM) networks

There is a way to do this, using a **long short-term memory (LSTM)** network. The LSTM network is a type of **Recurrent Neural Network (RNN)**. RNNs are a special network architecture that feeds information from the previous example's output into the processing of the next example.

![](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png)

Imagine the box on the left is a first time point with a specific window width, the center box is the next time point with the consecutive window, and so-on and so-forth. What we can do is train the network on one time point, and then _pass_ its information into training the network on the following time point.
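
Here is a toy version of that idea with a single number as the "hidden state". It is a sketch only: the weights `w_x`, `w_h` and `b` are arbitrary values picked for illustration, and a real RNN layer uses weight matrices that are learned during training.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """Combine the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, **weights):
    """Feed the inputs through the cell one time point at a time."""
    h = 0.0
    states = []
    for x in inputs:
        h = rnn_step(x, h, **weights)
        states.append(h)
    return states

# A single spike followed by silence
with_memory = run_rnn([1.0, 0.0, 0.0])
without_memory = run_rnn([1.0, 0.0, 0.0], w_h=0.0)

print(with_memory)
print(without_memory)
```

With `w_h=0.0` nothing is passed along, so the state drops to zero as soon as the input does. With a non-zero `w_h`, an echo of the spike is still visible in the later states, though it fades with every step.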

The main issue with basic RNNs is that they do well at passing on information between time points that are close together, but not between time points that are **far apart**. LSTMs fix this by carrying forward more information from the previous steps, and then selectively deciding what to keep and what to forget along the way. Here's a picture of the inside of an LSTM.

![](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png)

It's pretty complex, and I'm not going to go into too much detail on the inner workings of an LSTM, but [this post](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) does a great job of teaching LSTMs and how they can be advantageous. The main points are...

* LSTMs pass information between each "cell" (box **A**) in the network. This information is not just a previous output, but a "state" that gets updated with each subsequent example in time
* Within each subsequent cell, the LSTM looks at the current cell state and the new information, and then decides what to selectively change about the cell state
* The LSTM then outputs an answer for the current cell ($h_t$), and passes this output and the updated cell state on to the next cell
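
The three points above can be sketched with a scalar toy cell that follows the standard LSTM gate equations. The `lstm_step` function and the hand-picked gate parameters are assumptions for illustration; the keras layer used later works on vectors with learned weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; `params` maps gate name to (w_x, w_h, b)."""
    def gate(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x + w_h * h_prev + b)

    f = gate('forget', sigmoid)        # how much of the old cell state to keep
    i = gate('input', sigmoid)         # how much new information to let in
    g = gate('candidate', math.tanh)   # the new information itself
    o = gate('output', sigmoid)        # how much of the cell state to expose

    c = f * c_prev + i * g             # updated cell state
    h = o * math.tanh(c)               # output for this cell (h_t)
    return h, c

# Hand-picked parameters: one cell that remembers, one that overwrites
remember = {'forget': (0.0, 0.0, 20.0), 'input': (0.0, 0.0, -20.0),
            'candidate': (1.0, 0.0, 0.0), 'output': (0.0, 0.0, 20.0)}
overwrite = {'forget': (0.0, 0.0, -20.0), 'input': (0.0, 0.0, 20.0),
             'candidate': (1.0, 0.0, 0.0), 'output': (0.0, 0.0, 20.0)}

h_keep, c_keep = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, params=remember)
h_new, c_new = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, params=overwrite)
print(c_keep, c_new)
```

With the forget gate wide open and the input gate shut, the old cell state of 0.7 passes through almost untouched. Flip the two gates and the state is replaced by the new candidate value instead.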

### Building an LSTM

Let's actually build an LSTM. We'll need to import the right keras module, but then we can build a function to make the network.

#### Side note: dropout

You will see in the LSTM building below that we add something called **Dropout**. Dropout helps us control overfitting. Overfit models work _really well_ on the data we used to train the model, but not on other datasets.
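
As a rough picture of what a dropout layer does during training, the sketch below randomly zeroes a fraction of the values it is given and scales up the survivors so the expected total stays the same. The `dropout` function here is a plain-Python stand-in written for this explanation, not the keras implementation.

```python
import random

def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale the rest by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(0)  # seeded so the example is repeatable
activations = [1.0] * 10

dropped = dropout(activations, rate=0.2, rng=rng)
print(dropped)
```

Because a different random subset is silenced on every pass, the network cannot lean too heavily on any single unit, which is what discourages it from memorising the training data.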

In [0]:
from keras.layers import LSTM, Dropout

def create_lstm(data, data_labels, window_width, num_features):
    """
    Build an LSTM using a given sequence length and feature length
    
    :param data: the data for the model
    :param data_labels: the data labels for the model
    :param window_width: the amount of data to feed to each cell
    :param num_features: the number of features inputted per time point
    
    :return model: the model
    """
    # Build the network
    model = Sequential()
    model.add(LSTM(input_shape=(window_width, num_features), units=100, return_sequences=True))
    model.add(Dropout(0.2))

    model.add(LSTM(units=50, return_sequences=False))
    model.add(Dropout(0.2))

    model.add(Dense(units=2, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    # Fit
    model.fit(
        x=data,
        y=data_labels,
        epochs=10,
        batch_size=round(data.shape[0] / 10), 
        validation_split=0.05, 
        verbose=1
    )

    return model

We'll need to do some data prep for our LSTM to work. The LSTM takes in an array that is the following size...

`(num_data_samples, window_width, features_per_width)`

This is actually a **3D array**. Thank you to [this azure code on GitHub](https://github.com/Azure/lstms_for_predictive_maintenance/blob/master/Deep%20Learning%20Basics%20for%20Predictive%20Maintenance.ipynb) for the help with building these input arrays.

In [0]:
def gen_sequence(data, window_width, features):
    """
    Generate an input array based upon a dataframe, window_width, and columns
    
    :param data: the data
    :param window_width: the window width to use
    :param features: the features to use
    
    :return data_array[start:stop, :]: the array with the correct features in the right shape
    """
    data_array = data[features].values
    num_elements = data_array.shape[0]
    for start, stop in zip(range(0, num_elements - window_width), range(window_width, num_elements)):
        yield data_array[start:stop, :]
        
def gen_labels(data, window_width, label):
    """
    Generate labels for the input
    
    :param data: the data
    :param window_width: the window width to use
    :param label: the label column for output
    
    :return data_array[window_width:num_elements]: the correct section of the data array
    """
    data_array = data[label].values
    num_elements = data_array.shape[0]
    return data_array[window_width:num_elements]
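
To see how the windows and labels line up, here is the same slicing applied to a handful of plain Python lists. The `toy_sequences` and `toy_labels` helpers and the values in `rows` are made up; they only mirror the indexing used in `gen_sequence` and `gen_labels`.

```python
def toy_sequences(rows, window_width):
    """Mirror gen_sequence: one window per start position, stopping before the last row."""
    return [rows[start:start + window_width] for start in range(len(rows) - window_width)]

def toy_labels(labels, window_width):
    """Mirror gen_labels: skip the first `window_width` labels."""
    return labels[window_width:]

rows = [[10, 0], [20, 1], [30, 2], [40, 3], [50, 4]]  # each row is [LOAD, HOUR]
labels = ['commuter'] * 5

sequences = toy_sequences(rows, 2)
targets = toy_labels(labels, 2)

# (num_data_samples, window_width, features_per_width)
shape = (len(sequences), len(sequences[0]), len(sequences[0][0]))
print(shape, len(targets))
```

Five rows with a window width of 2 give three windows and three labels, so each window is paired with the label of the row that comes right after it.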

Let's now run the code below that will let us format inputs to the LSTM. We will use the [numpy](https://www.numpy.org/) library as well, which has many mathematical functions useful for data analysis.

In [0]:
# Import numpy
import numpy as np

def generate_train_test_sets(data, window_width, features, timestamp_cutoff='2017-06-30'):
    """
    Prep data for the LSTM
    
    :param data: the timeseries data
    :param window_width: the window width to use
    :param features: the features to use
    :param timestamp_cutoff: when to cut our features
    
    :return train_array: the training data
    :return train_label_array: the training labels
    :return test_array: the testing data
    :return test: the test dataframe
    """

    # Sort data
    data.sort_values(['ANTENNA', 'TIMESTAMP'], inplace=True)

    # Split
    train = data.loc[data['TIMESTAMP'] < timestamp_cutoff, :]
    test = data.loc[data['TIMESTAMP'] >= timestamp_cutoff, :]
    
    # Set test
    if test.shape[0] == 0:
        test = train.copy()

    # Make data
    train_gen = (list(gen_sequence(train.loc[train['ANTENNA'] == a, :], window_width, features)) for a in [0, 1])
    train_array = np.concatenate(list(train_gen)).astype(np.float32)
    test_gen = (list(gen_sequence(test.loc[test['ANTENNA'] == a, :], window_width, features)) for a in [0, 1])
    test_array = np.concatenate(list(test_gen)).astype(np.float32)

    # Generate labels
    train_label_gen = [gen_labels(train.loc[train['ANTENNA'] == a, :], window_width, 'SERVICE') for a in [0, 1]]
    train_label_array = pd.get_dummies(np.concatenate(train_label_gen)).values
    
    return train_array, train_label_array, test_array, test

Now we can train our LSTM on the first year of data (as before) and then predict the output on the final year. Let's make a function to do this.

In [0]:
def run_lstm(data, window_width, features, timestamp_cutoff='2017-06-30'):
    """
    Run LSTM
    
    :param data: the timeseries data
    :param window_width: the window width to use
    :param features: the features to use
    :param timestamp_cutoff: when to cut our features
    
    :return model: the model used
    """
    # Generate data
    train_array, train_label_array, test_array, test = generate_train_test_sets(
        data, window_width, features, timestamp_cutoff
    )
    
    # Build model
    model = create_lstm(
        train_array, train_label_array, 
        window_width=test_array.shape[1], num_features=test_array.shape[2]
    )

    # Predict
    y_pred = pd.DataFrame(
        model.predict(test_array), columns=['Commuter Prediction', 'Residential Prediction']
    )

    # Add to the DataFrame
    test_copy = test.copy()
    test_copy['Commuter Prediction'] = None
    test_copy['Residential Prediction'] = None
    indices = []
    for a in test_copy.ANTENNA.unique():
        temp = test_copy.loc[test_copy['ANTENNA'] == a, :].index.tolist()
        indices += temp[window_width:len(temp)]
    y_pred.index = indices
    test_copy.loc[y_pred.index, 'Commuter Prediction'] = y_pred['Commuter Prediction']
    test_copy.loc[y_pred.index, 'Residential Prediction'] = y_pred['Residential Prediction']
    test_copy['Commuter Prediction'] = test_copy['Commuter Prediction'].astype(float)
    test_copy['Residential Prediction'] = test_copy['Residential Prediction'].astype(float)
    test_copy.dropna(inplace=True)
    test_label_array = pd.get_dummies(test_copy['SERVICE'])
    
    # Get AUC score
    auc = roc_auc_score(
        test_label_array[['commuter', 'residential']].values, 
        test_copy[['Commuter Prediction', 'Residential Prediction']].values
    )
    print('\nAUC Score: %.2f\n' % auc)

    # Plot
    curve_smoothing(
        test_copy, time_interval='weekly', filter_type='mean', col='Commuter Prediction', 
        title='Commuter prediction on test data'
    )
    curve_smoothing(
        test_copy, time_interval='weekly', filter_type='mean', col='Residential Prediction',
        title='Residential prediction on test data'
    )

    # Return the model
    return model


Now let's run the model.

In [0]:
# Set parameters
window_width = 22
features = ['LOAD']

# Run
run_lstm(timeseries_data, window_width, features)

In [0]:
model = run_lstm(timeseries_data, window_width, features)

A lot better! We're definitely starting to get the separation we want between the two classes. Let's play around with a couple of things.

### Exercise

In the cell below, play around with the `window_width` and try to choose an optimal width for training your model. Use the graphs to decide if a specific window width is sufficient.

In [0]:
# Set parameters
window_width = 100
features = ['LOAD']

# Run
run_lstm(timeseries_data, window_width, features)

Let's run a final model. This time, we'll use _all of our data_ to train the model. We'll even include some extra features about the date.

In [0]:
# Set parameters
window_width = 22
features = ['LOAD', 'MONTH', 'HOUR', 'DAY', 'DAYOFWEEK', 'WEEKNUMBER', 'YEAR']
timestamp_cutoff = '2019-01-01'

# Run
end_model = run_lstm(timeseries_data, window_width, features, timestamp_cutoff)

# Final window width
final_window_width = window_width

# Part 5: Detecting changes to the network environment

---

Let's review what we've learned thus far...

1. We learned about time series, and different methods of visualising our specific dataset within specific time ranges. We also learned how we can apply different smoothing techniques which remove irrelevant microtrends within our dataset.
2. We developed a simple method that tried to classify normal trends within a dataset, but it did not capture enough information about the relationships between subsequent parts of the same time series
3. We then created a more advanced classifier using the concept of **memory** in a neural network, that was able to recognise and characterise our two types of mobile network traffic data

And now for the true power of what we've done...**can we detect changes to network usage when the environment around a cell tower changes?**

Let's return to the environmental change described in the picture below...

---

<img src="https://github.com/CoderAcademyEdu/data_science_sc_student/blob/master/img/Problem_Scenario_w_new_apt.png?raw=1" width="700">

---

What do you think happens to our commuter traffic pattern when an apartment complex begins to be built between the antenna and the train station the antenna serves? Let's find out, and see if we can catch when the captured network load changes.

First we'll need to import some data, and prep it.

In [0]:
# Upload new data
new_data = pd.read_csv('https://raw.githubusercontent.com/CoderAcademyEdu/data_science_sc_student/master/data/traffic_workshop_testing_data.csv')
new_data['TIMESTAMP'] = pd.to_datetime(new_data['TIMESTAMP'])
new_data['MONTH'] = [d.month for d in new_data['TIMESTAMP']]
new_data['HOUR'] = [d.hour for d in new_data['TIMESTAMP']]
new_data['DAY'] = [d.dayofyear for d in new_data['TIMESTAMP']]
new_data['DAYOFWEEK'] = [d.dayofweek for d in new_data['TIMESTAMP']]
new_data['WEEKNUMBER'] = [d.isocalendar()[1] for d in new_data['TIMESTAMP']]
new_data['YEAR'] = [d.year for d in new_data['TIMESTAMP']]

# Capture just the antenna
# Sort by time first, then reset the index so that positions 0..n-1 follow time order
antenna_0 = new_data.loc[new_data['ANTENNA'] == 0, :].sort_values(by=['TIMESTAMP']).reset_index(drop=True)

The following code will create a simulation of observing the traffic throughout a year as the apartment complex is built up.

In [0]:
%%capture
from matplotlib.animation import FuncAnimation
from IPython.display import HTML

# Plot data as traffic progresses throughout time
plot_width = 500

# First set up the figure, the axis, and the plot element we want to animate
fig, ax = plt.subplots(figsize=(15, 7))
ax.set_xlim((antenna_0.TIMESTAMP.min(), antenna_0.TIMESTAMP.max()))
ax.set_ylim((0, 15000))
ax.set_xlabel('TIMESTAMP')
ax.set_ylabel('LOAD')
ax.set_title('Network load over time')
line, = ax.plot([], [], lw=2)

# animation function. This is called sequentially
def animate(i):
    """
    Animate the network traffic simulation
    
    :param i: the chunk for the current plotting window
    """
    end = min(i * 100 + plot_width, antenna_0.shape[0] - 1)
    data_copy = antenna_0.loc[0:end, :].copy()
    # Truncate each timestamp to midnight so rows can be grouped by calendar day
    data_copy['FILTER_TIMESTAMP'] = data_copy['TIMESTAMP'].dt.normalize()
    grouped_data = data_copy.groupby(['FILTER_TIMESTAMP'], as_index=False)['LOAD'].agg('mean')
    line.set_data(grouped_data['FILTER_TIMESTAMP'], grouped_data['LOAD'])
    return (line,)


# call the animator. blit=True means only re-draw the parts that have changed.
anim = FuncAnimation(
    fig, animate,
    frames=round((antenna_0.shape[0] - plot_width) / 100), blit=True
)

### Thought exercise

Run the simulation below. Visually, at what time do you believe the apartment complex impacts the antenna?

In [0]:
HTML(anim.to_jshtml())

## Adding the prediction

Let's use our model to analyse the prediction. We'll need to build in a real-time prediction using the `end_model` we already built. The following function will build a data prep and prediction layer into our visualisation code.

In [0]:
%%capture

# Plot data as traffic changes throughout time
plot_width = 500

# First set up the figure, the axis, and the plot element we want to animate
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 8))
ax1.set_xlim((antenna_0.TIMESTAMP.min(), antenna_0.TIMESTAMP.max()))
ax1.set_ylim((0, 15000))
ax1.set_ylabel('LOAD')
ax1.set_title('Simulation of network monitoring')
ax2.set_ylabel('Likelihood of normal traffic')
ax2.set_xlabel('TIMESTAMP')

ax2.set_xlim((antenna_0.TIMESTAMP.min(), antenna_0.TIMESTAMP.max()))
ax2.set_ylim((-0.05, 1.05))

line1, = ax1.plot([], [], lw=2)
line2, = ax2.plot([], [], lw=2, color='r')

def animate_w_prediction(i):
    """
    Animate with predictions
    
    :param i: the chunk for the current plotting window
    """
    end = min(i * 100 + plot_width, antenna_0.shape[0] - 1)
    data_copy = antenna_0.loc[0:end, :].copy()
    
    # Prep data
    antenna_0_gen = (list(
        gen_sequence(
            data_copy.loc[data_copy['ANTENNA'] == a, :], final_window_width, features)
    ) for a in [0])
    antenna_0_array = np.concatenate(list(antenna_0_gen)).astype(np.float32)
    
    # Predict
    y_pred = pd.DataFrame(
        end_model.predict(antenna_0_array), columns=['Commuter Prediction', 'Residential Prediction']
    )
    # Use .values so timestamps are assigned by position rather than aligned on the index
    y_pred['TIMESTAMP'] = data_copy.loc[final_window_width:end, 'TIMESTAMP'].values
    
    # Group data
    data_copy['DAY_2'] = [d.day for d in data_copy.TIMESTAMP]
    data_copy['FILTER_TIMESTAMP'] = pd.to_datetime(
            data_copy['MONTH'].astype(str) + '-' + data_copy['DAY_2'].astype(str) + '-'
            + data_copy['YEAR'].astype(str)
    )
    grouped_data = data_copy.groupby(['FILTER_TIMESTAMP'], as_index=False)['LOAD'].agg('mean')
    
    # Plot
    line1.set_data(grouped_data['FILTER_TIMESTAMP'], grouped_data['LOAD'])
    line2.set_data(y_pred['TIMESTAMP'], y_pred['Commuter Prediction'])
    
    return (line1, line2)

# call the animator. blit=True means only re-draw the parts that have changed.
anim_w_pred = FuncAnimation(
    fig, animate_w_prediction,
    frames=round((antenna_0.shape[0] - plot_width) / 100), blit=True
)

Finally, let's build our animation, and see how our algorithm would perform in practice.

In [0]:
HTML(anim_w_pred.to_jshtml())
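
The animation leaves it to the eye to spot when the likelihood of normal commuter traffic falls away. One hypothetical way to turn the red line into an automatic alert is to look for the first point where the prediction stays below a threshold for several readings in a row. The `first_sustained_drop` helper, the threshold of 0.5, the run length of 3 and the sample values below are all assumptions for illustration, not part of the workshop code.

```python
def first_sustained_drop(probabilities, threshold=0.5, min_run=3):
    """Return the index where the first run of `min_run` low readings starts, or None."""
    run_start = None
    for index, p in enumerate(probabilities):
        if p < threshold:
            if run_start is None:
                run_start = index
            if index - run_start + 1 >= min_run:
                return run_start
        else:
            # A reading back above the threshold resets the run
            run_start = None
    return None

commuter_likelihood = [0.9, 0.8, 0.4, 0.9, 0.3, 0.2, 0.1, 0.2]  # made-up values
alert_index = first_sustained_drop(commuter_likelihood)
print(alert_index)
```

Requiring a run of low readings rather than a single one means a one-off dip (like the 0.4 above) does not raise an alert. In the notebook, the input would be the `Commuter Prediction` column and the returned index could be used to look up the matching timestamp.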

### Exercise

Let's put everything together! The following code has three steps that can be run in sequence to...

1. Train a model given a specific window width
2. Create the animation
3. Run the simulation

Play with adjusting the window width and seeing how it impacts the final simulation. **YOU WILL NEED TO RUN ALL THREE CELLS TO SEE THE ENTIRE EXERCISE RUN END-TO-END!**

**1. Run this cell** to train a model.

In [0]:
# Set parameters
window_width = 100
features = ['LOAD', 'MONTH', 'HOUR', 'DAY', 'DAYOFWEEK', 'WEEKNUMBER', 'YEAR']
timestamp_cutoff = '2019-01-01'

# Run
print('Training model...')
end_model = run_lstm(timeseries_data, window_width, features, timestamp_cutoff)
print('Model trained')

# Final window width
final_window_width = window_width

---

**2. Run this cell** to set up the animation

---

In [0]:
# call the animator. blit=True means only re-draw the parts that have changed.
anim_w_pred = FuncAnimation(
    fig, animate_w_prediction,
    frames=round((antenna_0.shape[0] - plot_width) / 100), blit=True
)

---
**3. Run this cell** to build the animation

---

In [0]:
# Animate
HTML(anim_w_pred.to_jshtml())

# Wrap-up

Thank you for attending our masterclass! We hope we _demystified_ a little bit of what actually occurs when you build a machine learning process. Big thank you to Martin Oliveiro, Monica Munoz Castillo and Jackie Archer from Vodafone for lending us their time.

If you would like to save your work for today, make sure to **save a copy of the notebook in Google Drive**.

## Survey

We would appreciate it if you could complete a quick feedback survey for tonight's Masterclass. You can find the survey here: http://bit.ly/ml_vodafone_survey

THANK YOU!!