# Enter `fbprophet` 🔮

[fbprophet](https://facebook.github.io/prophet/docs/quick_start.html) is a **Python library** developed by Facebook for time series forecasting. As a reminder, **time series** analysis is the field of data analytics and data science that deals with *sequentially recorded data* (such as monthly sales, daily temperatures, daily stock prices...).

**Quick reminder about Jupyter notebooks** 

* Type inside the empty cells to write code. Code cells have an `In [ ]:` prefix in front of them
* Press the `return/enter ⏎` key to add a new line inside the cell
* To display your results, use Python's built-in `print(STUFF_YOU_WANT_TO_PRINT)` function, or simply put what you want to display on the last line of the cell. The result of the **last line** appears as the `Out[ ]:` (output) of the cell :)
* Press `shift` + `return/enter ⏎` to run your code 🤓 This runs the code inside the currently selected cell and shows anything passed to `print()`, as well as the result of the cell's last line
* To add a new cell below, select any cell and press the `b` key (press `Esc` first, so you are not just typing the letter `b` inside the cell)
* To delete a cell, press the `d` key twice (again, press `Esc` first, so you are not just typing the letter `d` inside the cell)

**Try to run the cells below!**

In [None]:
1 + 1

In [None]:
print("Are you ready to do your first prediction?")
"Ready!" # last line of each cell is automatically displayed as Output, no need for print()

## Our first prediction - sales forecasting 🚗

As always, we start by importing the necessary Python libraries.

In [None]:
from fbprophet import Prophet
import pandas as pd
# import seaborn as sns
# import matplotlib.pyplot as plt

Next, we import our data. For this exercise, we will look at how Python and `fbprophet` can be used for **sales forecasting**, an important task in any business that wants to plan ahead.

Let's look at the monthly sales of an *unnamed* car company 😉

In [None]:
df = pd.read_csv('monthly-car-sales.csv')
df.head()

### Understanding our data

Whenever we start with a new dataset, our first goal is to **explore** what's in it! 🔍

**Your turn! 🚀** Check how many rows and columns we have in the dataset.

In [None]:
# your code here

<details><summary>Solution</summary>
    
<pre>
df.shape
</pre>
    
</details>


**Your turn! 🚀** Let's see what the **data types** of our columns are

In [None]:
# your code here

<details><summary>Solution</summary>
    
<pre>
df.dtypes
</pre>
    
</details>


**Your turn! 🚀** Plot a **line chart** of the monthly sales using the `df.plot()` function

In [None]:
# your code here

<details><summary>Solution</summary>
    
<pre>
df.plot(x='Month', y='Sales', figsize=(15,6)) # we change the figsize to make the chart bigger
</pre>
    
</details>


We can already start seeing some patterns, but let's see how we can use machines to make predictions for us!

### Preparing our data for Prophet

If we check the official [Prophet docs](https://facebook.github.io/prophet/docs/quick_start.html), we will see that there are a few requirements:

1. Prophet needs the **datetime** column to be called `ds`
2. Prophet needs the **observations** column to be called `y`

First, let's rename our columns. That's as easy as assigning a new list of column names, in the same order as the existing columns:

```python
df.columns = ['column1', 'column2']
```
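Assigning a full list relies on the new names being in the same order as the existing columns. A minimal sketch of a more explicit alternative uses `DataFrame.rename()` with an old-to-new mapping. It assumes the original columns are named `Month` and `Sales` (the names used in the plotting solution above), and the two rows of data are made up; it uses a separate `toy_df` so it does not overwrite our real `df`:

```python
import pandas as pd

# Two made-up rows with the same column names as monthly-car-sales.csv
toy_df = pd.DataFrame({"Month": ["1960-01", "1960-02"], "Sales": [100, 200]})

# Map each old name to the name Prophet expects; column order no longer matters
toy_df = toy_df.rename(columns={"Month": "ds", "Sales": "y"})
print(list(toy_df.columns))
```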

In [None]:
# Assign the right column names here :)
df.columns = ['ds', 'y']

<details><summary>Solution</summary>
    
<pre>
df.columns = ['ds', 'y']
</pre>
    
</details>


You might have noticed that our `Month` column (now called `ds`) doesn't have the right data type: its current type is `object`. That's because `pd.read_csv()` does not parse dates unless we ask it to, so the column is read in as text (which `pandas` stores with the `object` type).

We can use the built-in [pandas.to_datetime()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) function to quickly change that! Notice that `pd.to_datetime()` does not modify the column in place: it returns a *new*, converted column, so we need to assign it back to the original column, like so:

In [None]:
df['ds'] = pd.to_datetime(df['ds'])
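Another option is to parse the dates while reading the file, using the `parse_dates` parameter of `pd.read_csv()`. The sketch below assumes the CSV has a `Month` column. It reads a tiny made-up CSV from a string (via `io.StringIO`) so it runs on its own, and the sales values are not the real data:

```python
import io

import pandas as pd

# A tiny stand-in for monthly-car-sales.csv (values are illustrative, not the real data)
csv_text = "Month,Sales\n1960-01,100\n1960-02,200\n1960-03,300\n"

# parse_dates converts the listed column to datetime64 while reading
toy_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Month"])
print(toy_df.dtypes)
```

With the real file, the same idea would be `pd.read_csv('monthly-car-sales.csv', parse_dates=['Month'])`.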

Normally, we would also have to reshape our DataFrame to match the structure we saw in the slides, like this one:

<img src="timeseriesdata.png" width=300/>

**But** the amazing `fbprophet` does that for us! Which means we are ready to start **Machine Learning** 🤖

### Creating the Prophet prediction model

Most Machine Learning tasks follow the same steps:

1. Creating the model
2. Training the model (usually the most time-consuming step)
3. Using the model to predict
4. Evaluating the performance of the model

So, what are we waiting for?

**1. Creating the model**

At the top of this notebook we imported the `Prophet` *class*. This class is the "factory" that creates Prophet models. This is how we use it:

In [None]:
model = Prophet()

**2. Training the model**

We have a `model` ready, but it is not trained yet. That's our next step: we pass our `df` DataFrame to the model's `fit()` method, and the model finds the parameters that best fit all the historic observations we have.

In [None]:
model.fit(df)

**Boom 💥** our `model` has found a "logic" that best explains the historic sales data. A common caveat of Machine Learning in general is that this "logic" is often a **black box** ◼️: the model finds a formula that works for it, but as people we don't have much visibility into it.

But we can leverage it for **predictions!** 🔮

### Forecasting and predictions

When we make predictions programmatically, we distinguish two types of predictions:

1. **In-sample** - we predict periods that the model already saw during training and compare the predictions with the real values, to see how well the model fits our history
2. **Out-of-sample** (future predictions) - after verifying that our model "makes sense", we use it to predict periods it has never seen, such as the future

**In-sample** - Let's take the last 12 months of our existing dataset (which were part of the data we trained on) and see how close our model's predictions are to the real sales

We can slice our DataFrame not only by columns (e.g. `df['column']`) but also by rows, like this:

In [None]:
df[100:102]

**Your turn! 🚀** So how do we get the **last 12 months** of our DataFrame? 🤔



In [None]:
# your code here
sample = df[96:108]

<details><summary>Solution</summary>
    
<pre>
sample = df[96:108]
</pre>
    
</details>
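A hard-coded slice like `df[96:108]` only works because our dataset has exactly 108 rows. Counting from the end with negative positions works for any length, and the same idea gives a true out-of-sample holdout: fit a new `Prophet()` on everything except the last 12 months, then predict those 12. The sketch below uses a made-up 108-row DataFrame (`toy_df`) so it runs on its own:

```python
import pandas as pd

# Illustrative stand-in: 108 monthly rows, like the car-sales dataset (values are made up)
toy_df = pd.DataFrame({
    "ds": pd.date_range("1960-01-01", periods=108, freq="MS"),
    "y": range(108),
})

last_12 = toy_df.iloc[-12:]    # negative positions count from the end
same_thing = toy_df.tail(12)   # equivalent, and easier to read
hard_coded = toy_df[96:108]    # only correct because there are exactly 108 rows

# A true out-of-sample split: a model would be fit on `train` only, then asked to predict `test`
train, test = toy_df.iloc[:-12], toy_df.iloc[-12:]
print(len(train), len(test), test["ds"].min().date())
```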


We are now ready to create a **forecast** - a DataFrame with predictions from our Prophet model.

*P.S.* We want the input to be a DataFrame with one column, `ds`, not just a single list (Series) of dates. That's why we select the column below with double brackets, like `[['ds']]`
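A quick way to see the difference between single and double brackets is to check the type of each selection. The tiny `toy_df` below is made up for illustration:

```python
import pandas as pd

toy_df = pd.DataFrame({"ds": pd.date_range("1968-01-01", periods=3, freq="MS"),
                       "y": [1, 2, 3]})

as_series = toy_df["ds"]   # single brackets -> a Series (one column of values)
as_frame = toy_df[["ds"]]  # double brackets -> a DataFrame with one column named 'ds'
print(type(as_series).__name__, type(as_frame).__name__)
```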

In [None]:
forecast = model.predict(sample[['ds']])
forecast.head()

Wow, that's a lot of stuff 🤯 We will explore what is what in a minute, but first let's **visualize** our predictions, which is much easier for humans to read.

In [None]:
model.plot(forecast);

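The black dots in the chart above are the real data points.

The dark blue line is the predicted value: a line connecting the `yhat` values that we saw in the `forecast` DataFrame above.

The light blue area is the **uncertainty interval**, bounded by the `yhat_lower` and `yhat_upper` columns. By default, Prophet draws an 80% interval; this can be changed with the `interval_width` parameter when creating the model (for example, `Prophet(interval_width=0.95)`).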

**[Bonus 🥇]** We can also add a **line chart** for the historic data to make it more visual:

In [None]:
ax = model.plot(forecast).gca()
df.plot(ax=ax, x='ds', y='y', label='actual', color='g');

**[Bonus Bonus 💥]** Looks like our model is fitting quite well! But **how well?** 🤔

We have the real data (`y` column) from our `sample`, and we have the predicted `yhat` from `forecast`.  Let's calculate the difference!

In [None]:
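difference = forecast['yhat'].values - sample['y'].values
print(difference.mean())       # average signed error: over- and under-predictions cancel out
print(abs(difference).mean())  # mean absolute error (MAE): average size of the error
difference

From the above result we can see the **errors**: the differences between the predicted and actual car sales for the last 12 months.

The first printed number is the **mean** (average) of those differences. Because positive and negative errors cancel each other out, it mainly tells us whether the model tends to over- or under-predict; the second number, the **mean absolute error**, ignores the sign and better reflects how far off a typical prediction is. Considering our car sales in those months are between about 15,000 and 25,000 cars, an average difference of around 190 cars looks really good 🥳, but keep in mind that these 12 months were part of the training data, so this is an optimistic estimate.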

Remember that this was the **in-sample** prediction. Now let's move on to **future forecasting! 🕵️‍♂️**

### Future Forecasting - out-of-sample (new data) predictions

Luckily, `fbprophet` comes with some amazing tools for predictions, including the [`make_future_dataframe()`](https://facebook.github.io/prophet/docs/non-daily_data.html) method, which quickly creates a set of future dates to use for prediction and returns them as a `DataFrame` 🙌

In [None]:
future = model.make_future_dataframe(freq='MS', periods=12)
future

You'll see that it **appends** the 12 new months to the existing dates (this is the default, controlled by the `include_history` parameter). That's very useful, since we want to connect the two for an uninterrupted chart.
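As a rough sketch of what `make_future_dataframe(freq='MS', periods=12)` produces (an approximation of the idea, not Prophet's own source code), we can build the same dates with `pd.date_range()`. `'MS'` is the pandas frequency alias for *month start*. The history below assumes monthly dates from January 1960 to December 1968, i.e. 108 rows like our dataset:

```python
import pandas as pd

# Assumed history: month-start dates from 1960-01 to 1968-12
history_ds = pd.Series(pd.date_range("1960-01-01", "1968-12-01", freq="MS"))
last_date = history_ds.max()

# 13 month-start dates beginning at last_date; drop the first one (it is last_date itself)
new_dates = pd.date_range(start=last_date, periods=12 + 1, freq="MS")[1:]

# Keeping the history in front of the new dates mirrors include_history=True
future_ds = pd.concat([history_ds, pd.Series(new_dates)], ignore_index=True)
print(len(future_ds), future_ds.iloc[-1].date())
```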

**Your turn! 🚀** The prediction code is the same as before, only now we will use the `future` DataFrame instead of `sample`.

In [None]:
# Your code here
future_forecast = model.predict(future)

<details><summary>Solution</summary>
    
<pre>
future_forecast = model.predict(future)
</pre>
    
</details>

**Your turn! 🚀** Let's go ahead and plot it the same way as we did before, with the `model`.

In [None]:
# Your code here
model.plot(future_forecast);

<details><summary>Solution</summary>
    
<pre>
model.plot(future_forecast);
</pre>
    
</details>

### Congrats! 👏 This is your first prediction on unseen data!

You can also see that the model captures the patterns in the historic data rather than chasing every single point (**overfitting**): some black dots fall outside the blue area. That's normal, because not every month of car sales follows the same pattern; we've all had better and worse months 😉
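Because Prophet's default interval is 80% wide, we would expect roughly 1 in 5 historic points to fall outside the blue area. A minimal sketch of measuring this *coverage* is below. The numbers are made up, and only the `yhat_lower`/`yhat_upper` column names come from Prophet's forecast output; with the real data, you would first line up the actual `y` values with the forecast rows by date:

```python
import pandas as pd

# Made-up actuals and interval bounds, using Prophet's column names for the bounds
toy = pd.DataFrame({
    "y":          [100, 120, 130, 90, 150],
    "yhat_lower": [95, 100, 135, 80, 140],
    "yhat_upper": [110, 125, 160, 95, 145],
})

# A point is covered if it falls inside [yhat_lower, yhat_upper] (bounds included)
inside = toy["y"].between(toy["yhat_lower"], toy["yhat_upper"])
coverage = inside.mean()
print(f"{coverage:.0%} of points fall inside the interval")
```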

**Let's explore our results further**

### Seasonality and Trend

To understand our results better, we need to explore the **components** that influence our car sales. The two main ones are seasonality and trend:

- **Seasonality** is the repeating wave pattern in our data. Looking at it helps us find regular highs and lows.
- **Trend** is the general direction of the data. This helps us see whether we're increasing or decreasing over time.
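Prophet's authors describe its model as a sum of components like these. In simplified form, with $g(t)$ the trend, $s(t)$ the seasonality, $h(t)$ the holiday effects and $\varepsilon_t$ the error term:

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

We did not add any holidays, so $h(t)$ is zero here, and for monthly data Prophet turns off its weekly and daily seasonality, leaving the yearly pattern as $s(t)$.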

And again, `fbprophet` has helpful functions ready to use. Let's call the [`plot_components()`](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html) method on our `model`.

In [None]:
model.plot_components(future_forecast);

**[Bonus 🥇]** We can make our plots interactive with another Python library called [Plotly](https://plotly.com/python/):

In [None]:
from fbprophet.plot import plot_plotly, plot_components_plotly

plot_plotly(model, future_forecast)

### Model Evaluation

We shouldn't jump to conclusions about our model and trust it blindly. Even though we got a low error (difference) on data the model had already seen, we need to look at some other important **metrics** before we can confidently say that our model can help the company! 🏎️

Luckily for us, `fbprophet` comes with a handy [Diagnostics](https://facebook.github.io/prophet/docs/diagnostics.html) module built in.

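Let's start with [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)), a statistical technique that repeats the train-and-predict process multiple times on different portions of the data (instead of just once, as we did above) to see how consistent our results are. For time series, Prophet does this by training only on the data before a *cutoff* date and predicting the period right after it. Run the two cells below: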

In [None]:
from fbprophet.diagnostics import cross_validation

In [None]:
df_cv = cross_validation(model, period='180 days', horizon='365 days')

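You may notice that this took longer than a regular model training and prediction. That's because it ran the process **10 times**: each time it trained a model on the data up to a cutoff date (the "seen" data) and then predicted up to one year (the `horizon`) past that cutoff, as though those months were "unseen". Cutoffs are spaced 180 days apart (the `period`), and by default the earliest cutoff still leaves at least three horizons (`initial`) of training data. Robots are tricky, eh? 🤖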
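To see where the **10** comes from, here is a simplified sketch of how cutoff dates can be placed (an illustration of the idea, not Prophet's exact code; `simulated_cutoffs` is a hypothetical helper). It assumes our data runs from January 1960 to December 1968 and uses Prophet's default `initial` of three times the `horizon`:

```python
from datetime import date, timedelta


def simulated_cutoffs(first, last, initial_days, period_days, horizon_days):
    """Hypothetical helper: step back from (last - horizon) by `period`,
    keeping only cutoffs that leave at least `initial` days of training data."""
    earliest_allowed = first + timedelta(days=initial_days)
    cutoff = last - timedelta(days=horizon_days)
    cutoffs = []
    while cutoff >= earliest_allowed:
        cutoffs.append(cutoff)
        cutoff -= timedelta(days=period_days)
    return sorted(cutoffs)


# Assumed data range 1960-01 to 1968-12; initial = 3 x horizon
cutoffs = simulated_cutoffs(date(1960, 1, 1), date(1968, 12, 1),
                            initial_days=3 * 365, period_days=180, horizon_days=365)
print(len(cutoffs), cutoffs[0], cutoffs[-1])
```

Each cutoff is one "round" of training and predicting, so 10 cutoffs means the whole process runs 10 times.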

In [None]:
df_cv

If you look inside the `df_cv` variable, you'll see that it stores the real car sales (`y` column), the predicted sales (`yhat`, with its `yhat_lower`/`yhat_upper` interval) and the `cutoff` column, the last date of "seen" data used for each simulated forecast, so we can better understand our model's performance.

You might already notice that not everything is as rosy as it seemed 🥀

But we did **all this in 1 hour**! Imagine what **you can do with more time!!! 🤩**

### Wrapping up - interpretation

To close our current exploration and prediction, let's visualize the `mae`: the [**Mean Absolute Error**](https://en.wikipedia.org/wiki/Mean_absolute_error). This is the average absolute difference between the real car sales and the predicted ones, regardless of whether we predicted too high or too low.
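In formula form, $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$. The small sketch below, with made-up numbers, shows why the absolute value matters: a plain average of signed differences can be zero even when every prediction is wrong:

```python
def mean_error(actual, predicted):
    """Average signed difference: over- and under-predictions cancel out."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average size of the difference, ignoring its sign (MAE)."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers: one month over-predicted by 100, one under-predicted by 100
actual = [1000, 2000]
predicted = [1100, 1900]
print(mean_error(actual, predicted), mean_absolute_error(actual, predicted))
```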

Remember, when we looked at the *in-sample* prediction, our average error was only around 200 cars. Let's see how much harder *real* predictions get 🙈

In [None]:
from fbprophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='mae')

### Explaining these results to your boss 💹

Let's explore the chart above:
* Our horizontal (`x`) axis is how many days into the future we're predicting. As you can see, we tried to predict up to a maximum of 365 days (1 year). That's our **horizon**
* Our vertical (`y`) axis is the Mean Absolute Error - the average error in our predictions in car sales. We can see we are typically making errors of around 1000-2000 cars.
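Behind the chart, each row of `df_cv` can be tagged with its horizon (`ds - cutoff`), and the absolute errors can be averaged per horizon. A minimal sketch of that idea is below, using a few made-up rows with the same column names as `df_cv` (`ds`, `cutoff`, `y`, `yhat`). The chart additionally smooths the line over a rolling window of horizons, and `fbprophet.diagnostics.performance_metrics()` computes metrics like this for you:

```python
import pandas as pd

# Made-up rows with the same columns that cross_validation() returns
toy_cv = pd.DataFrame({
    "cutoff": pd.to_datetime(["1967-06-01", "1967-06-01", "1967-09-01", "1967-09-01"]),
    "ds":     pd.to_datetime(["1967-07-01", "1967-08-01", "1967-10-01", "1967-11-01"]),
    "y":      [1000, 1200, 1100, 1300],
    "yhat":   [1100, 1000, 1150, 1300],
})

# Horizon = how far past the cutoff each prediction was made
toy_cv["horizon"] = toy_cv["ds"] - toy_cv["cutoff"]
toy_cv["abs_error"] = (toy_cv["y"] - toy_cv["yhat"]).abs()

# Average absolute error for each horizon (no rolling-window smoothing here)
mae_by_horizon = toy_cv.groupby("horizon")["abs_error"].mean()
print(mae_by_horizon)
```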

**Our best case interpretation:**

We can see some clear highs and lows in our errors. It seems that our predictions are best at `around 3 to 6 months into the future`, where our average error is `around 1,300 cars`.

Before and after that, the **volatility** (waviness) of our errors becomes too big to say anything with certainty. Your boss will be happy you did your research 🤓

# Congrats on completing this challenge! 🙌 There's always more to learn!