# Predict values using classic machine learning

---

In this notebook, we load the tuned model from the previous notebook, check its predictions against the training and test data, and use it to predict future velocity values with a classic machine learning model.

### 1. Load packages, open dataset, load model

Open the AI-ready dataframe that includes the series column for prediction. Load the tuned model from the previous notebook.

In [115]:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import numpy as np
from pycaret.regression import load_model, predict_model  # only the functions used below

In [116]:
point_name = 'middle' # 'upstream', 'middle', or 'terminus'
df = pd.read_csv('../data/ai_ready/' + point_name + '_timeseries.csv')

df.head()

| | series | time | velocity |
|---|---|---|---|
| 0 | 1 | 2015-01-30 12:00:00 | 779.53534 |
| 1 | 2 | 2015-02-11 12:00:00 | 738.42413 |
| 2 | 3 | 2015-07-29 12:00:00 | 741.9333 |
| 3 | 4 | 2015-09-03 12:00:00 | 736.29333 |
| 4 | 5 | 2015-09-15 12:00:00 | 731.44696 |


In [117]:
model = load_model('../models/' + point_name + '_model')

Transformation Pipeline and Model Successfully Loaded


In [118]:
df['time'] = pd.to_datetime(df['time'])  # Ensure date is datetime type
df = df.set_index('time').resample('6D').mean(numeric_only=True)  # Resample to every 6 days frequency
df = df.interpolate()  # Interpolate missing values

df.reset_index(inplace=True)
df.head()

| | time | series | velocity |
|---|---|---|---|
| 0 | 2015-01-30 | 1.0 | 779.53534 |
| 1 | 2015-02-05 | 1.5 | 758.979735 |
| 2 | 2015-02-11 | 2.0 | 738.42413 |
| 3 | 2015-02-17 | 2.035714 | 738.549458 |
| 4 | 2015-02-23 | 2.071429 | 738.674785 |
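
To see what the resampling step does, here is a minimal sketch on a toy frame built from the first two observations shown earlier. The name `toy` is hypothetical and only used for illustration. The 6-day bins start at midnight of the first day, the bin with no observation becomes NaN, and `interpolate()` fills it linearly from its neighbours.

```python
import pandas as pd

# Toy irregular series: the first two observations from the dataframe above
toy = pd.DataFrame({
    "time": pd.to_datetime(["2015-01-30 12:00:00", "2015-02-11 12:00:00"]),
    "series": [1, 2],
    "velocity": [779.53534, 738.42413],
})

# Bin into 6-day windows; the window starting 2015-02-05 has no data and becomes NaN
resampled = toy.set_index("time").resample("6D").mean(numeric_only=True)

# Linear interpolation fills the empty window from the values on either side
filled = resampled.interpolate()
print(filled)
```

The interpolated middle row (series 1.5, velocity 758.979735) matches the second row of the resampled dataframe above. Note that interpolated rows are not real observations, so long gaps in the record are filled with straight lines.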


### 2. Visualize test data

The data were split into training and test sets. Here we plot the model's predictions alongside the observed velocity for both sets, with the test period shaded in gray.

In [119]:
# Ensure the 'time' column is in datetime format
df['time'] = pd.to_datetime(df['time'])

# Generate predictions and use the original time column
predictions = predict_model(model, data=df)
predictions['time'] = df['time']  # Ensure this matches your data

# Ensure the lengths match
predictions = predictions.iloc[:len(df)]

# Plot
fig = px.line(predictions, x='time', y=["velocity", "prediction_label"], template='plotly_dark')
fig.add_vrect(x0="2021-06-01", x1="2024-04-18", fillcolor="grey", opacity=0.25, line_width=0)
fig.show()
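
The plot gives a visual impression of the fit. To put a number on it, we could compute error metrics over the shaded window. The sketch below is a minimal example on a hypothetical stand-in frame (`toy_predictions`, with made-up values) that uses the same column names as `predictions`. It assumes the test period starts at 2021-06-01, the left edge of the shaded region.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `predictions`; the values are invented for illustration
toy_predictions = pd.DataFrame({
    "time": pd.date_range("2021-05-20", periods=5, freq="6D"),
    "velocity": [740.0, 750.0, 760.0, 770.0, 780.0],
    "prediction_label": [742.0, 747.0, 764.0, 765.0, 786.0],
})

# Assumed start of the test period (left edge of the shaded region in the plot)
test_start = pd.Timestamp("2021-06-01")
in_test = toy_predictions["time"] >= test_start

# Residuals = observed minus predicted, restricted to the test window
residuals = (
    toy_predictions.loc[in_test, "velocity"]
    - toy_predictions.loc[in_test, "prediction_label"]
)

mae = residuals.abs().mean()             # mean absolute error
rmse = np.sqrt((residuals ** 2).mean())  # root mean squared error
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

Replacing `toy_predictions` with the real `predictions` frame would give the test-window error for this model, provided the shaded dates match the actual split.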

### 3. Predict future values

With the trained model loaded, we can predict into the future. Let's start with a 4-year horizon at a 6-day frequency, matching the resampled data.

Here we create a dataset with time and series values, like our original dataframe. The model will fill in the velocity prediction values.

In [120]:
future_dates = pd.date_range(start='2024-04-18', end='2028-04-18', freq='6D')

future_df = pd.DataFrame()

future_df['time'] = future_dates
# Continue the series counter from 145 (hard-coded starting value)
future_df['series'] = np.arange(145, 145 + len(future_dates))

future_df.head()

| | time | series |
|---|---|---|
| 0 | 2024-04-18 | 145 |
| 1 | 2024-04-24 | 146 |
| 2 | 2024-04-30 | 147 |
| 3 | 2024-05-06 | 148 |
| 4 | 2024-05-12 | 149 |
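
The starting value of 145 is hard-coded in the cell above. A more portable option, sketched here, is to derive it from the last `series` value in the observed data so that the cell still works for the other points ('upstream', 'terminus'). The frame `toy_df` is a hypothetical stand-in, and the assumption that the observed series ends at 144 is inferred only from the hard-coded 145.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tail of the resampled `df`
# (assumes the observed series counter ends at 144)
toy_df = pd.DataFrame({
    "time": pd.to_datetime(["2024-04-06", "2024-04-12", "2024-04-18"]),
    "series": [143.0, 143.5, 144.0],
})

# Same start date and frequency as in the notebook, shortened to 4 steps
future_dates = pd.date_range(start="2024-04-18", periods=4, freq="6D")

# Continue the counter from the last observed value instead of hard-coding it
next_series = int(toy_df["series"].max()) + 1

toy_future = pd.DataFrame({
    "time": future_dates,
    "series": np.arange(next_series, next_series + len(future_dates)),
})
print(toy_future)
```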


Make predictions on the future values.

In [121]:
predictions_future = predict_model(model, data=future_df)
predictions_future.head()

| | time | series | prediction_label |
|---|---|---|---|
| 0 | 2024-04-18 | 145 | 846.870675 |
| 1 | 2024-04-24 | 146 | 841.662524 |
| 2 | 2024-04-30 | 147 | 841.3532 |
| 3 | 2024-05-06 | 148 | 847.316025 |
| 4 | 2024-05-12 | 149 | 850.571864 |


Plot the original data and the future predictions.

In [122]:
# Stack observed data and future predictions; both frames carry a 'time' column
concat_df = pd.concat([df, predictions_future], axis=0, ignore_index=True)

# Plot against the actual timestamps rather than a regenerated date index,
# which would shift the future rows by one 6-day step
fig = px.line(concat_df, x='time', y=["velocity", "prediction_label"], template='plotly_dark')
fig.show()
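
When two frames with different columns are stacked, pandas fills the missing column with NaN, which is why the observed and predicted lines occupy separate parts of the plot. The sketch below shows this on two hypothetical toy frames (`observed` with made-up velocities, `forecast` with the first two predictions from above). It also checks for timestamps that appear in both frames, since the observed data appears to end on the same date the forecast starts.

```python
import pandas as pd

# Hypothetical tail of the observed data; velocities are invented for illustration
observed = pd.DataFrame({
    "time": pd.to_datetime(["2024-04-12", "2024-04-18"]),
    "velocity": [840.0, 845.0],
})

# First two future predictions from the table above
forecast = pd.DataFrame({
    "time": pd.to_datetime(["2024-04-18", "2024-04-24"]),
    "prediction_label": [846.870675, 841.662524],
})

# Stack the frames; columns missing from one frame are filled with NaN
combined = pd.concat([observed, forecast], axis=0, ignore_index=True)

# Timestamps present in both frames
overlap = combined.loc[combined["time"].duplicated(keep=False), "time"].unique()

print(combined)
print("Overlapping timestamps:", list(overlap))
```

A shared timestamp is harmless when plotting by the `time` column, but it matters if the combined frame is later given a freshly generated date index, because every row after the duplicate would be shifted by one step.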

Overall, the predictions look reasonable, with seasonal spikes during the melt season. They even capture the two subtle spikes in velocity that we have been observing in recent years. As mentioned earlier, the predictions do not fully capture the intensity of the spikes, but they do follow the general trend.