# <p style="background-color:skyblue; font-family:calibri; font-size:200%; text-align:center"> London bike share prediction - Comparing Prophet, XGB, LSTM 🚴‍♀️📈  </p>

![](https://www.pertemps.co.uk/media/2042/london_banner.png)

**The goal of this notebook is to predict the number of bike shares.
I'll be comparing models from the Prophet libraries against an XGBoost and an LSTM model.**


# <p style="background-color:skyblue; font-family:calibri; font-size:150%; text-align:center"> Table of contents</p>

* [1.Introduction](#intro)
    * [1.1.Goals](#goals)
    * [1.2.Libraries](#libraries)
* [2.The Data](#data)
    * [2.1.Data preprocessing](#dataprep)
    * [2.2.Exploratory Data Analysis](#eda)
    * [2.3.Feature engineering](#feature)
* [3.Prophet modelling](#prophet)
    * [3.1.Univariate Prophet](#univariateprophet)
    * [3.2.Multivariate Prophet](#multivariateprophet)
* [4.Neural Prophet modelling](#neuralprophet)
* [5.XGBoost modelling](#xgb)
* [6.LSTM modelling](#LSTM)
* [7.Conclusion](#conclusion)


# <p style="background-color:skyblue; font-family:calibri; font-size:150%; text-align:center" id="intro"> 1. Introduction</p>


The dataset we'll be analyzing is the ["Bike Share" data](https://www.kaggle.com/hmavrodiev/london-bike-sharing-dataset). It shows the number of bikes rented every hour in London and its related parameters. 

The metadata of the columns:

* "timestamp" - timestamp field for grouping the data
* "cnt" - the count of new bike shares
* "t1" - real temperature in C
* "t2" - temperature in C "feels like"
* "hum" - humidity in percentage
* "wind_speed" - wind speed in km/h
* "weather_code" - category of the weather
* "is_holiday" - boolean field - 1 holiday / 0 non holiday
* "is_weekend" - boolean field - 1 if the day is weekend
* "season" - category field meteorological seasons: 0-spring ; 1-summer; 2-fall; 3-winter.

* "weather_code" category description:
    * 1 = Clear; mostly clear but may include haze/fog/patches of fog/fog in vicinity
    * 2 = Scattered clouds / few clouds
    * 3 = Broken clouds
    * 4 = Cloudy
    * 7 = Rain / light rain shower / light rain
    * 10 = Rain with thunderstorm
    * 26 = Snowfall
    * 94 = Freezing fog
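
The weather codes are easier to read in plots and tables once they're mapped to short labels. Here's a minimal sketch of how that could look; the dictionary name `WEATHER_LABELS` and the toy series are my own assumptions, and the labels are shortened versions of the description above.

```python
import pandas as pd

# Shortened labels taken from the "weather_code" description (hypothetical helper dict)
WEATHER_LABELS = {
    1: "clear",
    2: "scattered clouds",
    3: "broken clouds",
    4: "cloudy",
    7: "rain",
    10: "rain with thunderstorm",
    26: "snowfall",
    94: "freezing fog",
}

# Toy stand-in for df["weather_code"]; assumed to hold the codes as numbers
codes = pd.Series([1.0, 3.0, 7.0, 26.0])
labels = codes.astype(int).map(WEATHER_LABELS)
print(labels.tolist())
```

On the real data the same `map` call could be applied to the `weather_code` column to create an extra, human-readable column.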

As I mentioned in the title, the main goal of this notebook is to train multiple models coming from a number of libraries. I wanted to see which models performed best and which were the easiest to use. In the end we'll put them side by side and see which model does the best job at creating a forecast!

Libraries used:
* [FB Prophet](https://facebook.github.io/prophet/docs/quick_start.html#python-api)
* [Neural Prophet](https://neuralprophet.com/)
* [XGBoost](https://xgboost.readthedocs.io/en/latest/)
* [Keras](https://keras.io/) (LSTM)

In [None]:
# Importing Libraries

import numpy as np 
import pandas as pd
import os

df = pd.read_csv("/kaggle/input/london-bike-sharing-dataset/london_merged.csv")
df.head()

# <p style="background-color:skyblue; font-family:calibri; font-size:150%; text-align:center" id="data"> 2. The Data</p>

<a id="dataprep"></a>
## 2.1. Data preprocessing

In this chapter we'll check how "clean" the data is. We'll be looking for missing values, incorrect values, duplicates, outliers, ...

To quickly perform this analysis I'll be trying out **ProfileReport()** from **pandas_profiling**.

In [None]:
import pandas_profiling as pp 

profile = pp.ProfileReport(df)
profile

Above you can see the **ProfileReport()**. When looking at the report it's clear that we have a clean dataset. 

One thing I did notice is that the timestamp column isn't of a datetime type. We'll convert it and also set the timestamp as the index.

In [None]:
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")

df.head()
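
The checks mentioned at the start of this chapter (missing values, duplicates) can also be sketched with plain pandas, which is handy when the profiling report is too slow. This is only a sketch on a made-up toy frame; the variable names are my own and the numbers are not taken from the bike share data.

```python
import pandas as pd

# Toy hourly frame with one missing value, one duplicated timestamp and one missing hour
toy = pd.DataFrame(
    {"cnt": [10, 12, 12, None, 9]},
    index=pd.to_datetime([
        "2015-01-04 00:00", "2015-01-04 01:00", "2015-01-04 01:00",
        "2015-01-04 02:00", "2015-01-04 04:00",
    ]),
)

missing_per_column = toy.isna().sum()                 # NaN count per column
duplicate_timestamps = toy.index.duplicated().sum()   # repeated index entries

# Hours that should exist between the first and last timestamp but are absent
expected = pd.date_range(toy.index.min(), toy.index.max(), freq=pd.Timedelta(hours=1))
missing_hours = expected.difference(toy.index)

print(missing_per_column["cnt"], duplicate_timestamps, list(missing_hours))
```

The gap check only makes sense once the timestamp is a real datetime index, which is why it comes after the conversion.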

We'll also create a few calendar features (hour, day of month, day of week and month), since they could come in handy when training the models later on.


In [None]:
df["hour"] = df.index.hour
df["day_of_month"] = df.index.day
df["day_of_week"]  = df.index.dayofweek
df["month"] = df.index.month
df.head()
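
A small sketch of what these index attributes return, on two hand-picked timestamps (a Monday and a Saturday). The toy frame and the `weekend_check` column are assumptions for illustration only.

```python
import pandas as pd

idx = pd.to_datetime(["2015-01-05 08:00", "2015-01-10 17:00"])  # Monday, Saturday
toy = pd.DataFrame({"cnt": [100, 50]}, index=idx)

toy["hour"] = toy.index.hour
toy["day_of_week"] = toy.index.dayofweek      # Monday=0 ... Sunday=6
toy["month"] = toy.index.month
toy["weekend_check"] = toy["day_of_week"] >= 5  # hypothetical sanity check against "is_weekend"

print(toy)
```

Knowing that `dayofweek` starts at 0 for Monday helps when reading the "day of the week" plots later on.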

<a id="eda"></a>
## 2.2. Exploratory Data Analysis

Exploring the data is key to understanding it and finding some usable insights!

One of the most important things to learn from this dataset is how the number of bike shares evolved over time.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [None]:
plt.figure(figsize=(15, 7))
ax = sns.lineplot(x=df.index, y=df.cnt,data=df)
ax.set_title("Amount of bike shares vs date", fontsize=25)
ax.set_xlabel("Date", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.show()

The plot above shows the bike shares over time. Even though it shows all the bike shares, it's hard to gain insights from it. Let's see if we can use the features we created to draw some conclusions about the shares!

### 2.2.1. Amount of bike shares per month


Can we see a distinct difference in the shares per month?

In [None]:
# Resample timeseries, for plotting timeseries month frequency
df_by_month = df.resample("M").sum()

plt.figure(figsize=(16,6))
ax = sns.lineplot(data=df_by_month,x=df_by_month.index,y=df_by_month.cnt)
ax.set_title("Amount of bike shares per month", fontsize=25)
ax.set_xlabel("Month", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.show()
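
To make the resampling step a bit more concrete, here's a tiny sketch of what `resample(...).sum()` does to hourly counts. The values are made up and the series name is only chosen to mirror the `cnt` column.

```python
import pandas as pd

# Four hourly observations that straddle midnight
idx = pd.date_range("2015-01-04 22:00", periods=4, freq=pd.Timedelta(hours=1))
hourly = pd.Series([5, 7, 3, 4], index=idx, name="cnt")

# Each calendar day becomes one row holding the sum of its hourly counts
daily = hourly.resample("D").sum()
print(daily)
```

Summing is a sensible aggregation for counts like `cnt`; for columns such as temperature a mean would usually be the more meaningful choice.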

In [None]:
df_by_day = df.resample("D").sum()
plt.figure(figsize=(16,6))
ax = sns.lineplot(data=df_by_day,x=df_by_day.index,y=df_by_day.cnt)
ax.set_title("Amount of bike shares per day", fontsize=25)
ax.set_xlabel("Time", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.show()

We can clearly see a dip in shares around the winter period of each year. As the year progresses, bike shares increase towards the summer months (see below).

In [None]:
plt.figure(figsize=(16,6))
ax = sns.pointplot(data=df,hue=df.season,y=df.cnt,x=df.month)
ax.set_title("Amount of bike shares per season", fontsize=25)
ax.set_xlabel("Month", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.show()

It's clear that more bikes are rented in summer than in winter. Could this be related to the temperature? (See chapter 2.2.4)

### 2.2.2. Amount of bike shares in a week


So we see a clear pattern when looking at the bike shares per month. What about bike shares on a day to day basis?

In [None]:
plt.figure(figsize=(16, 6))
ax = sns.pointplot(x='day_of_week', y='cnt',data=df)
ax.set_title("Amount of bike shares in a week", fontsize=25)
ax.set_xlabel("Day of the week", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.show()

If we look at a timespan of one week, it's clear that fewer bikes are rented during the weekend.

### 2.2.3. Amount of bike shares in a day


What about a single day? Since there are two types of days (holiday/normal day) let's compare these.

In [None]:
plt.figure(figsize=(16, 6))
ax = sns.pointplot(x='hour', y='cnt',hue='is_holiday',data=df)
ax.set_title("Amount of bike shares per hour in a day", fontsize=25)
ax.set_xlabel("Hour of the day", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.show()

It's clear that on a normal day most bikes are rented during the rush hour peaks. On a holiday this clearly changes: the rush hour peaks are not present and most bikes are rented in the afternoon.

It can be concluded that there's a distinct difference in shares when it's not a working day. If this is true, we should also see it occurring in the weekend.

In [None]:
plt.figure(figsize=(16, 6))
ax = sns.pointplot(x='hour', y='cnt',data=df[df["is_weekend"]==1])
ax.set_title("Amount of bike shares per hour on a weekend day", fontsize=25)
ax.set_xlabel("Hour of the day", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.show()

The weekend pattern above confirms the difference between bike shares on normal working days and on non-working days such as holidays.

### 2.2.4. Amount of bike shares related to temperature

In [None]:
plt.figure(figsize=(20,10))

ax = sns.pointplot(x='t1', y='cnt',data=df)
ax.set_title("Amount of bike shares vs real temperature", fontsize=25)
ax.set_xlabel("Real temperature (°C)", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.locator_params(axis='x', nbins=10)

In [None]:
plt.figure(figsize=(20,10))

ax = sns.pointplot(x='t2', y='cnt',data=df)
ax.set_title("Amount of bike shares vs feeling temperature", fontsize=25)
ax.set_xlabel("Feeling temperature (°C)", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.locator_params(axis='x', nbins=10)
plt.show()



Like we thought, it's clear that people rent more bikes when the temperature is high. Next to this, the two plots look very similar, which points to a high correlation between the real and the "feels like" temperature.

### 2.2.5. Amount of bike shares related to the humidity

In [None]:
plt.figure(figsize=(20,10))

ax = sns.pointplot(x='hum', y='cnt',data=df)
ax.set_title("Amount of bike shares vs humidity", fontsize=25)
ax.set_xlabel("Humidity (%)", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.locator_params(axis='x', nbins=10)
plt.show()

An increase in humidity results in a decrease of the bikes shared.

This made me wonder whether humidity is related to the outside temperature. Let's compute the Pearson correlation between the two.



In [None]:
print("Temperature and humidity have a weak negative correlation:")
df["t1"].corr(df["hum"], method = "pearson")
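
For reference, this is roughly what the Pearson correlation computes: the covariance of the two series divided by the product of their standard deviations. The sketch below uses made-up temperature and humidity values, and `pearson` is a hypothetical helper of my own, compared against pandas' built-in result.

```python
import numpy as np
import pandas as pd

def pearson(x, y):
    """Pearson r: centred cross-product divided by the product of the centred norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Made-up values, only meant to exercise the function
temp = pd.Series([3.0, 8.0, 12.0, 18.0, 25.0])
hum = pd.Series([93.0, 85.0, 80.0, 60.0, 45.0])

r_manual = pearson(temp, hum)
r_pandas = temp.corr(hum, method="pearson")
print(r_manual, r_pandas)
```

A value close to -1 or 1 indicates a strong linear relation, a value near 0 a weak one; the sign gives the direction.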

Two more features are left to analyse (wind_speed and weather_code). Let's check them out!


### 2.2.6. Amount of bike shares related to the windspeed

In [None]:
plt.figure(figsize=(20,10))

ax = sns.pointplot(x='wind_speed', y='cnt',data=df)
ax.set_title("Amount of bike shares vs windspeed", fontsize=25)
ax.set_xlabel("Windspeed (km/h)", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.locator_params(axis='x', nbins=10)
plt.show()

It seems like there is a small peak when the wind speed is around 25 km/h.


### 2.2.7. Amount of bike shares related to the weather 

In [None]:
plt.figure(figsize=(16,6))
ax = sns.histplot(data=df,y=df.cnt,x=df.weather_code)
ax.set_title("Amount of bike shares vs the weather", fontsize=25)
ax.set_xlabel("Weather code", fontsize=20)
ax.set_ylabel('Amount of bike shares', fontsize=20)
plt.show()

It's clear that when the weather is good (1, 2, 3, 4, 7) more bikes are rented than when the weather is bad (10, 26).

### 2.2.8. Feature correlations

Ending this EDA with an overview of all feature correlations.

In [None]:
plt.figure(figsize=(16,6))
sns.heatmap(df.corr(),cmap="YlGnBu",square=True,linewidths=.5,center=0)

# <p style="background-color:skyblue; font-family:calibri; font-size:150%; text-align:center" id="prophet"> 3. Prophet</p>

Prophet is an open-source library for univariate time series forecasting. It implements an additive model in which the forecast is the sum of a trend, seasonal components and holiday effects. More information about the Prophet library can be found [here](https://facebook.github.io/prophet/).
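
To get a feeling for what "additive" means here, the sketch below builds a synthetic daily series as a plain sum of a trend, a yearly pattern and a weekly pattern. All numbers are invented for illustration and this is not how Prophet estimates its components; it only shows the shape of the model. The `ds`/`y` column names follow the convention Prophet expects.

```python
import numpy as np
import pandas as pd

days = pd.date_range("2015-01-01", periods=730, freq="D")
t = np.arange(len(days))

trend = 20000 + 5.0 * t                                   # slow linear growth
yearly = 8000 * np.sin(2 * np.pi * (t - 90) / 365.25)     # peaks around early July
weekly = np.where(days.dayofweek >= 5, -3000.0, 1000.0)   # fewer rides in the weekend

# Additive model: the components are simply summed
y = trend + yearly + weekly
toy = pd.DataFrame({"ds": days, "y": y})
print(toy.head())
```

Because the components are summed, each one can be inspected on its own, which is what the component plots later in this chapter do.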


In the documentation of Prophet I found the following.

> Prophet will by default fit weekly and yearly seasonalities, if the time series is more than two cycles long. It will also fit daily seasonality for a sub-daily time series. You can add other seasonalities (monthly, quarterly, hourly) using the add_seasonality method (Python) or function (R).
([link Specific Custom Seasonalities](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html))

When you're using Prophet it's important to know that you need at least two full yearly periods of data so that the model can completely capture the yearly seasonality. Although it is possible to add your own seasonalities manually to the model ([add_seasonality](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#specifying-custom-seasonalities)), I won't be doing this for now.

If you're dealing with holidays, check this [link](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#modeling-holidays-and-special-events) to add these to your model.
Non-daily data (for example hourly observations) can be handled with this [method](https://facebook.github.io/prophet/docs/non-daily_data.html#sub-daily-data). As you can see, a lot of customization can be applied to a model, giving you full control over the forecast it creates. That's exactly why I wanted to test this library! 


<a id="univariateprophet"></a>

## 3.1. Univariate Prophet

We'll try to predict both the daily and the hourly bike shares. We'll have to use all of the data since we barely have two full yearly periods of bike shares. Unfortunately this means that evaluating the model with unseen data is not possible.

In [None]:
data = pd.read_csv("../input/london-bike-sharing-dataset/london_merged.csv")
data["timestamp"] = pd.to_datetime(data["timestamp"])

mydata = data[['timestamp', 'cnt']].copy()
mydata["timestamp"] = pd.to_datetime(mydata["timestamp"])
mydata = mydata.set_index("timestamp")

# Daily resampling
daydata = mydata.resample("D").sum()

daydata['timestamp'] = daydata.index
daydata.index = range(0,len(daydata['cnt'].to_numpy()))

# No resampling for hourly prediction
hourdata = mydata
hourdata['timestamp'] = hourdata.index
hourdata.index = range(0,len(hourdata['cnt'].to_numpy()))

The prophet library requires some specific transformations.

In [None]:
daydf = daydata[['timestamp','cnt']].copy()
daydf.columns = ['ds','y']

hourdf = hourdata[['timestamp','cnt']].copy()
hourdf.columns = ['ds', 'y']

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error
import math
from fbprophet import Prophet  # from Prophet 1.0 onwards the package is named `prophet`

# Create the models
daymodel = Prophet()
hourmodel = Prophet()

# Fit the models on all available data
daymodel.fit(daydf) 
hourmodel.fit(hourdf)

dayfuture = daymodel.make_future_dataframe(periods=365)
hourfuture = hourmodel.make_future_dataframe(periods=365, freq='H')

daypred = daymodel.predict(dayfuture)
hourpred = hourmodel.predict(hourfuture)

Unfortunately, since we needed to use all our data for training, it's not possible to evaluate the models on unseen data. We'll have to evaluate them visually instead.

In [None]:
# Plot the day forecast
f, ax = plt.subplots(1)
f.set_figheight(6)
f.set_figwidth(15)

daymodel.plot(daypred, ax=ax)

ax.set_title('Bike share prediction per day', fontsize=14)
ax.set_xlabel(xlabel='Date', fontsize=14)
ax.set_ylabel(ylabel='Bikes shares', fontsize=14)

plt.show()

In [None]:
# Plot the hour forecast
f, ax = plt.subplots(1)
f.set_figheight(6)
f.set_figwidth(15)

hourmodel.plot(hourpred, ax=ax)

ax.set_title('Bike share prediction per hour', fontsize=14)
ax.set_xlabel(xlabel='Date', fontsize=14)
ax.set_ylabel(ylabel='Bikes shares', fontsize=14)

plt.show()

The forecasts follow the seasonal pattern of the historical data closely, which looks promising. Keep in mind that this is only a visual check: without a hold-out set we can't quantify how reliable the predictions will be. At this stage we can also check the model components (trend, cyclic components [yearly and weekly seasonality] and, if they were added, the effects of the holidays).

Just so you know, Prophet also has interactive plotting! (It slowed down this notebook, so I added those calls as comments.)

In [None]:
# Python
from fbprophet.plot import plot_plotly, plot_components_plotly

#plot_plotly(daymodel, daypred)

In [None]:
#plot_plotly(hourmodel, hourpred)

Both forecasts already look pretty good without much effort! Let's view their components!

In [None]:
#plot_components_plotly(daymodel, daypred)
fig = daymodel.plot_components(daypred)

In [None]:
#plot_components_plotly(hourmodel, hourpred)
fig = hourmodel.plot_components(hourpred)

<a id="multivariateprophet"></a>
## 3.2. Multivariate Prophet

A multivariate model is a time series model whose input consists of multiple features varying over time. Here we'll be using the hourly data, since that is the resolution at which the weather features are recorded.

To add these features we'll use the [.add_regressor()](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#additional-regressors) function.

In [None]:
data["timestamp"] = pd.to_datetime(data["timestamp"])
feature_columns = [
    't1',
    't2',
    'hum',
    'wind_speed',
]
target_column = ['cnt']

multidata = data[['timestamp'] + target_column + feature_columns].copy()
multidata.columns = ['ds', 'y'] + feature_columns

In [None]:
# Train the model
multimodel = Prophet()
multimodel.add_regressor('t1')
multimodel.add_regressor('t2')
multimodel.add_regressor('hum')
multimodel.add_regressor('wind_speed')

# Fit the model with train set
multimodel.fit(multidata)

multifuture = multimodel.make_future_dataframe(periods=365, freq='H')

# Predict on valid set
# multipred = multimodel.predict(multifuture)

Above you can see that we trained a model with additional regressors. Unfortunately we can't test this further: Prophet needs the value of every regressor at the timestamps it predicts for, and the future dataframe only contains timestamps. The only solution would be to split the data and define our own function for seasonality, add this to the trained model and then test it again. But for now I'm considering this "out of scope".
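
Just to show what such a split could look like, here's a minimal sketch of a chronological hold-out on a toy frame shaped like `multidata`. The helper `holdout_split`, the cutoff date and the toy values are all assumptions of mine; the Prophet calls are only shown as comments and were not run in this notebook.

```python
import numpy as np
import pandas as pd

def holdout_split(frame, cutoff):
    """Split a Prophet-style frame (column 'ds') into rows before and from `cutoff`."""
    cutoff = pd.Timestamp(cutoff)
    before = frame[frame["ds"] < cutoff].copy()
    after = frame[frame["ds"] >= cutoff].copy()
    return before, after

# Toy stand-in for `multidata`: hourly rows with a target and one regressor
toy = pd.DataFrame({
    "ds": pd.date_range("2016-12-30", periods=96, freq=pd.Timedelta(hours=1)),
    "y": np.arange(96, dtype=float),
    "t1": np.linspace(2.0, 6.0, 96),
})
train_part, test_part = holdout_split(toy, "2017-01-01")
print(len(train_part), len(test_part))

# With Prophet this could continue roughly as follows (untested here):
#   m = Prophet(); m.add_regressor("t1"); m.fit(train_part)
#   forecast = m.predict(test_part.drop(columns="y"))  # regressor values are known here
#   then compare forecast["yhat"] with test_part["y"]
```

The hold-out rows already contain the observed weather values, so they can serve as the "future" regressors. The price is that the model sees less than two full years of data, which is exactly the trade-off discussed above.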

# <p style="background-color:skyblue; font-family:calibri; font-size:150%; text-align:center" id="neuralprophet"> 4. Neural Prophet Modelling</p>

[Neural Prophet](https://neuralprophet.com/model-overview/#when-to-use-neuralprophet) is another time series forecasting tool based on Facebook Prophet. It's developed with a fully modular architecture, which makes it easily scalable! This library is pretty young and continues to add more features. 

More information about how this model works can be found [here](https://neuralprophet.com/model-overview/#when-to-use-neuralprophet). One clear difference is that it now uses a neural network called [AR-Net](https://github.com/ourownstory/AR-Net) to handle the auto-regression part.


In [None]:
!pip install neuralprophet
from neuralprophet import NeuralProphet

In [None]:
neuraldaymodel = NeuralProphet()
metrics = neuraldaymodel.fit(daydf, freq='D')

In [None]:
neuralhourmodel = NeuralProphet()
hourmetrics = neuralhourmodel.fit(hourdf, freq='H')

In [None]:
neuralhourfuture = neuralhourmodel.make_future_dataframe(hourdf, periods=365)
neuralhourforecast = neuralhourmodel.predict(neuralhourfuture)

neuraldayfuture = neuraldaymodel.make_future_dataframe(daydf, periods=365)
neuraldayforecast = neuraldaymodel.predict(neuraldayfuture)

dayforecasts_plot = neuraldaymodel.plot(neuraldayforecast)


In [None]:
fig_comp = neuraldaymodel.plot_components(neuraldayforecast)

In [None]:
forecasts_plot = neuralhourmodel.plot(neuralhourforecast)

In [None]:
fig_comp = neuralhourmodel.plot_components(neuralhourforecast)

Once again both models did a great job at capturing the trend. Even though the NeuralProphet library has fewer features than the regular Prophet library, it has a powerful underlying model which does a great, maybe even better, job at capturing the trend.

# <p style="background-color:skyblue; font-family:calibri; font-size:150%; text-align:center" id="xgb"> 5. XGBoost Modelling</p>

In [None]:
data

In [None]:
X = data.drop(['timestamp','cnt'],axis=1)
y = data['cnt']
#scaler_x = preprocessing.MinMaxScaler()
#X =  pd.DataFrame(scaler_x.fit_transform(X), columns = X.columns)

In [None]:
def df_split(df,train_percent):
    split_index = int(train_percent * len(df))
    train = df.iloc[:split_index]
    test = df.iloc[split_index:]
    return train,test

In [None]:
X_train,X_test = df_split(X,0.7)
y_train,y_test = df_split(y,0.7)

In [None]:
from xgboost.sklearn import XGBRegressor
from sklearn.metrics import mean_absolute_error,r2_score, mean_squared_log_error,mean_squared_error, make_scorer

In [None]:
xgbmodel = XGBRegressor()
xgbmodel.fit(X_train,y_train)

preds = xgbmodel.predict(X_test)

In [None]:
rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE: %f" % (rmse))
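
As a reminder of what the RMSE measures, here's a small sketch that computes it by hand next to the MAE. The `rmse` and `mae` helpers and the numbers are made up for illustration; they are not results from the model above.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(residuals)))

actual = [100, 200, 300, 400]
predicted = [110, 190, 300, 500]   # three small misses and one large miss

print(rmse(actual, predicted), mae(actual, predicted))
```

Because the residuals are squared, the single large miss dominates the RMSE, while the MAE treats all misses proportionally. For bike shares with sharp rush hour peaks, that means a few badly predicted peaks can push the RMSE up considerably.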

In [None]:
y_test.values

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(16,6))
plt.plot(y_test.values,marker=".",label="actual")
plt.plot(preds,marker=".",label="prediction",color="r")

plt.title('Bike share prediction per hour', fontsize=22)
plt.xlabel(xlabel='Date', fontsize=14)
plt.ylabel(ylabel='Bikes shares', fontsize=14)
plt.legend(['Ground truth', 'Predicted'])

plt.show()

The XGBoost model did a good job at capturing the trend, but there is still quite a big difference with the ground truth. Let's see if an LSTM neural network can do a better job at this forecasting task.

# <p style="background-color:skyblue; font-family:calibri; font-size:150%; text-align:center" id="LSTM"> 6. LSTM Modelling</p>

In this chapter we'll be creating a bidirectional LSTM model. It's basically an LSTM model (RNN) which processes each training sequence both forwards and backwards. Thanks to this, it picks up sequential information from the points before and after each time step. [Here's](https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/) another example where a Bi-LSTM is used to classify sequences.

If you're not familiar with recurrent networks I recommend reading this [notebook](https://www.kaggle.com/thebrownviking20/intro-to-recurrent-neural-networks-lstm-gru).

We'll be using the Keras library to implement this model.

In [None]:
df

In [None]:
import math
from sklearn.preprocessing import RobustScaler

training_data_len = math.ceil(len(df) *.9) # taking 90% of data to train and 10% of data to test
testing_data_len = len(df) - training_data_len

time_steps = 24
train, test = df.iloc[0:training_data_len], df.iloc[(training_data_len-time_steps):len(df)]
print(df.shape, train.shape, test.shape)

In [None]:
# Scale the feature columns; the scaler is fitted on the training set only to avoid leakage
scaled_cols = ['t1', 't2', 'hum', 'wind_speed']
train, test = train.copy(), test.copy()  # work on copies, not on slices of df

feature_scaler = RobustScaler() # Handles outliers
train.loc[:, scaled_cols] = feature_scaler.fit_transform(train[scaled_cols])
test.loc[:, scaled_cols] = feature_scaler.transform(test[scaled_cols])

# Scale 'cnt' with its own scaler so the predictions can be inverse-transformed later
scaler = RobustScaler()
train['cnt'] = scaler.fit_transform(train[['cnt']])
test['cnt'] = scaler.transform(test[['cnt']])
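
For the curious, this is roughly what robust scaling does under the hood: with its default settings, `RobustScaler` subtracts the median and divides by the interquartile range (IQR). The sketch below redoes that with NumPy on made-up counts; the three helper functions are hypothetical names of mine. It also shows that the statistics learned on the training data are the ones to reuse for the test data and for undoing the scaling.

```python
import numpy as np

def robust_fit(values):
    """Return (median, IQR) of a 1-D array; these are the statistics to reuse later."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return median, q3 - q1

def robust_transform(values, median, iqr):
    return (np.asarray(values, dtype=float) - median) / iqr

def robust_inverse(scaled, median, iqr):
    return np.asarray(scaled, dtype=float) * iqr + median

train_cnt = np.array([100, 200, 300, 400, 5000])   # made-up counts with one outlier
test_cnt = np.array([250, 350])

median, iqr = robust_fit(train_cnt)                     # statistics from train only
test_scaled = robust_transform(test_cnt, median, iqr)   # reuse them on test
restored = robust_inverse(test_scaled, median, iqr)     # undo the scaling

print(median, iqr, test_scaled, restored)
```

The outlier of 5000 barely influences the median and IQR, which is the reason for picking a robust scaler over a min-max or standard scaler here.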

In [None]:
from tqdm.notebook import tqdm

#Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in tqdm(range(len(train) - time_steps)):
    x_train.append(train.drop(columns='cnt').iloc[i:i + time_steps].to_numpy())
    y_train.append(train.loc[:,'cnt'].iloc[i + time_steps])

#Convert x_train and y_train to numpy arrays
x_train = np.array(x_train)
y_train = np.array(y_train)
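
To make the windowing step above concrete, here's a minimal sketch on a toy array. `make_windows` is a hypothetical helper name of my own, not part of the notebook or of Keras; it follows the same idea as the loop above: each sample holds `time_steps` consecutive rows of features, and its target is the value right after that window.

```python
import numpy as np

def make_windows(features, target, time_steps):
    """Build (samples, time_steps, n_features) inputs and the target following each window."""
    X, y = [], []
    for i in range(len(features) - time_steps):
        X.append(features[i:i + time_steps])   # rows i .. i + time_steps - 1
        y.append(target[i + time_steps])       # value right after the window
    return np.array(X), np.array(y)

features = np.arange(20, dtype=float).reshape(10, 2)   # 10 time steps, 2 features
target = np.arange(10, dtype=float) * 10               # 0, 10, ..., 90

X, y = make_windows(features, target, time_steps=3)
print(X.shape, y.shape)
```

With 10 rows and a window of 3 this gives 7 samples, which is why the test set in the notebook starts `time_steps` rows before the actual test period: otherwise the first test targets would have no complete window in front of them.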

In [None]:
#Create the x_test and y_test data sets
x_test = []
y_test = df.loc[:,'cnt'].iloc[training_data_len:len(df)]

for i in tqdm(range(len(test) - time_steps)):
    x_test.append(test.drop(columns='cnt').iloc[i:i + time_steps].to_numpy())
    # y_test.append(test.loc[:,'cnt'].iloc[i + time_steps])

#Convert x_test and y_test to numpy arrays
x_test = np.array(x_test)
y_test = np.array(y_test)

In [None]:
# All 12 columns of the data
print('Train size:')
print(x_train.shape, y_train.shape)
print('Test size:')
print(x_test.shape, y_test.shape)

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Dropout , LSTM , Bidirectional 

model = Sequential()
model.add(Bidirectional(LSTM(50,input_shape=(x_train.shape[1],x_train.shape[2]))))
model.add(Dropout(0.2))
model.add(Dense(units=1))

model.compile(optimizer="adam",loss="mse")

# prepared_model = model.fit(X_train,y_train,batch_size=32,epochs=100,validation_data=[X_test,y_test])

history = model.fit(x_train, y_train, epochs=150, batch_size=24, validation_split=0.1, shuffle=True)

In [None]:
y_pred = model.predict(x_test)
y_pred = scaler.inverse_transform(y_pred)#Undo scaling
y_pred

In [None]:
from sklearn.metrics import mean_squared_error, r2_score
rmse_lstm = np.sqrt(mean_squared_error(y_test, y_pred))
rmse_lstm

In [None]:
plt.figure(figsize=(16, 8))
plt.plot(y_test[1200:1500], label='true')
plt.plot(y_pred[1200:1500], label='predicted')
plt.legend()

In [None]:
plt.plot(history.history["loss"],label="loss")
plt.plot(history.history["val_loss"],label="val_loss")
plt.legend(loc="best")
plt.xlabel("No. Of Epochs")
plt.ylabel("mse score")

# <p style="background-color:skyblue; font-family:calibri; font-size:150%; text-align:center" id="conclusion"> 7. Conclusion</p>

It's clear that there are multiple ways to perform forecasting. Both Prophet libraries are very easy to use and have some great features! Unfortunately, due to a lack of data we weren't able to test the created models further. The Prophet models are definitely not final; we could add more features like holidays to get even better performance. I'm honestly really looking forward to using these extra features in future projects!

Below you can see a summary of the trained models:

| **Model**  | **RMSE**  | **Remark** |
|---|---|---|
| FB Prophet  | /  | Not enough data |
| Neural Prophet | / | Not enough data|
| XGBoost | 1015 | / |
| LSTM model | 414 | / | 


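The best performing model was, as expected, the LSTM model. Keep in mind that XGBoost was evaluated on the last 30% of the data and the LSTM on the last 10%, so the two RMSE values are not strictly comparable. It took me a little while to get the parameters right, but in the end the prediction looks pretty good and it was well worth the effort! 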

Overall I found this a really fun and interesting project, testing out multiple libraries on the same dataset.

### Thank you for reading this notebook! Please comment if you have any questions and an upvote would be appreciated!