# Prediction in Time Series

## MLForecast

MLForecast is a time series forecasting library that uses machine learning models to predict future values based on historical data. It provides a simple yet powerful framework to build and evaluate forecasting models, such as Linear Regression, Random Forest, or Gradient Boosting Machines, among others. The library aims to simplify the process of feature engineering and model selection, making it easier to create accurate and reliable forecasts.

[MLForecast video](https://www.youtube.com/live/EnhyJx8l2LE)

MLForecast offers:

- Feature Engineering: MLForecast automates the creation of relevant features from your time series data. It can generate lag features, rolling statistics (like moving averages), and more complex transformations that help the model understand temporal patterns.

- Model Selection: MLForecast supports a variety of machine learning models, giving you the flexibility to choose the best one for your forecasting task. This can include traditional linear models, as well as more complex tree-based models like Random Forests and Gradient Boosting Machines.

- Hyperparameter Tuning: The library facilitates hyperparameter tuning, which is the process of optimizing the parameters of the machine learning model to achieve the best performance. This can be done through cross-validation and other techniques to ensure that the model generalizes well to unseen data.

- Evaluation Metrics: MLForecast provides a range of metrics to evaluate the performance of your forecasting models, such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and others. This helps in comparing different models and choosing the best one.

- Visualization Tools: The companion `utilsforecast` package, from which the example below imports `plot_series`, provides tools to visualize your time series data and the resulting forecasts. This helps in understanding the trends and patterns in the data, and in presenting the results in a clear and interpretable manner.
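
To make the feature engineering item concrete, here is a minimal sketch of what lag, rolling, and calendar features look like when built by hand with pandas. It is not MLForecast's implementation; the toy values and the column names `lag1` and `rolling_mean_lag1_window3` are assumptions chosen for illustration.

```python
import pandas as pd

# Toy monthly series in the long format used in this notebook
# (unique_id, ds, y). The values are made up for illustration.
toy = pd.DataFrame({
    "unique_id": "toy",
    "ds": pd.date_range("2020-01-01", periods=6, freq="MS"),
    "y": [10.0, 12.0, 11.0, 15.0, 14.0, 18.0],
})

# Group by series so features never mix values from different series.
grouped = toy.groupby("unique_id")["y"]

# Lag feature: the value observed one step earlier.
toy["lag1"] = grouped.shift(1)

# Rolling mean over the 3 observations before the current one.
# Shifting first keeps the current target value out of its own feature.
toy["rolling_mean_lag1_window3"] = grouped.transform(
    lambda s: s.shift(1).rolling(window=3).mean()
)

# Calendar feature derived from the timestamp.
toy["month"] = toy["ds"].dt.month

print(toy)
```

The first rows contain missing values because there is not enough history to compute the features, which is why such rows are usually dropped before training.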

Example

In [None]:
!pip install mlforecast
!pip install datasetsforecast



In [None]:
import pandas as pd
from utilsforecast.plotting import plot_series
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt


In [None]:
df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/air-passengers.csv', parse_dates=['ds'])
df.head()


Unnamed: 0,unique_id,ds,y
0,AirPassengers,1949-01-01,112
1,AirPassengers,1949-02-01,118
2,AirPassengers,1949-03-01,132
3,AirPassengers,1949-04-01,129
4,AirPassengers,1949-05-01,121


In [None]:
df['unique_id'].value_counts()


Unnamed: 0_level_0,count
unique_id,Unnamed: 1_level_1
AirPassengers,144


In [None]:
fig = plot_series(df)

In [None]:
# Use a wider default figure size for the plots that follow
plt.rcParams['figure.figsize'] = (16, 6)



In [None]:
fcst = MLForecast(
    models=LinearRegression(),
    freq='MS',  # our series has a monthly frequency
    lags=[12],
    target_transforms=[Differences([1])],
)
fcst.fit(df)

MLForecast(models=[LinearRegression], freq=MS, lag_features=['lag12'], date_features=[], num_threads=1)
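
`Differences([1])` means the model is trained on first differences of the target instead of the raw values, and the predictions are mapped back to the original scale afterwards. The NumPy sketch below illustrates that idea only; it is not the library's code, and the predicted differences are hypothetical numbers.

```python
import numpy as np

# First five AirPassengers values from the table above.
y = np.array([112.0, 118.0, 132.0, 129.0, 121.0])

# First difference: y_t - y_{t-1}. This removes a slowly changing level.
diffs = np.diff(y)

# Suppose a model predicted these future differences (hypothetical values).
predicted_diffs = np.array([5.0, -2.0, 7.0])

# Invert the transform: cumulative sum anchored on the last observed value.
forecast = y[-1] + np.cumsum(predicted_diffs)

print(diffs)     # differences: 6, 14, -3, -8
print(forecast)  # forecasts on the original scale: 126, 124, 131
```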

In [None]:
preds = fcst.predict(12)
preds

Unnamed: 0,unique_id,ds,LinearRegression
0,AirPassengers,1961-01-01,444.656555
1,AirPassengers,1961-02-01,417.470734
2,AirPassengers,1961-03-01,446.903046
3,AirPassengers,1961-04-01,491.01413
4,AirPassengers,1961-05-01,502.622223
5,AirPassengers,1961-06-01,568.751465
6,AirPassengers,1961-07-01,660.044312
7,AirPassengers,1961-08-01,643.343323
8,AirPassengers,1961-09-01,540.666687
9,AirPassengers,1961-10-01,491.462708


In [None]:
fig = plot_series(df, preds)
fig  # plt.show() does not take a figure; ending the cell with `fig` displays it
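
The forecast above extends past the end of the data, so there are no actual values to compare it with. A common way to estimate accuracy is to hold out the last observations, forecast them, and compute error metrics such as MAE and MSE. The sketch below shows this with NumPy on a synthetic series, using a seasonal naive forecast as a stand-in for a trained model; the series and the helper names `mae` and `mse` are assumptions for illustration.

```python
import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(actual - predicted)))

def mse(actual, predicted):
    """Mean Squared Error."""
    return float(np.mean((actual - predicted) ** 2))

# Synthetic monthly series: linear trend plus a 12-month seasonal pattern.
months = np.arange(48)
series = 100 + 2.0 * months + 10 * np.sin(2 * np.pi * months / 12)

# Hold out the last 12 months as a test set.
horizon = 12
train, test = series[:-horizon], series[-horizon:]

# Seasonal naive forecast: repeat the last observed season.
forecast = train[-horizon:]

print(f"MAE: {mae(test, forecast):.2f}")
print(f"MSE: {mse(test, forecast):.2f}")
```

The seasonal naive forecast reproduces the seasonal shape but ignores the trend, so every error here equals the trend growth over 12 months (24). A model that captures the trend should beat this baseline.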

## XGBoost as regressor

XGBoost, short for eXtreme Gradient Boosting, is a popular and powerful machine learning algorithm that is widely used for classification, regression, and ranking tasks. It is an implementation of gradient boosted decision trees designed for speed and performance:

- Boosting Technique: XGBoost is based on the boosting technique, which sequentially builds an ensemble of models, usually decision trees, where each new model attempts to correct the errors made by the previous models. This leads to a strong overall model with improved predictive accuracy.

- Gradient Boosting: In XGBoost, gradient boosting is used to minimize the loss function. The algorithm iteratively adds decision trees to the model, and at each step it fits the new tree using the gradient of the loss with respect to the current predictions (hence "gradient boosting"). XGBoost also uses second-order (Hessian) information when choosing splits and leaf values.

- Regularization: XGBoost includes regularization parameters to control overfitting. It allows fine-tuning of the model's complexity, which helps in balancing the trade-off between bias and variance and leads to better generalization on unseen data.

- Tree Pruning: The `max_depth` parameter caps how deep each tree can grow, and splits whose loss reduction falls below the `gamma` (`min_split_loss`) threshold are pruned. Both keep the trees from becoming too deep and complex, which can lead to overfitting.

- Handling Missing Values: XGBoost has built-in mechanisms to handle missing data efficiently, making it robust in practical scenarios where datasets often have missing or incomplete information.

- Parallelization and Speed: One of the key strengths of XGBoost is that it parallelizes the work inside each tree, such as the search for the best split across features. The trees themselves are still added one after another. This makes training fast in practice, and XGBoost also supports distributed computing, which allows it to scale well with large datasets.

- Scalability and Flexibility: XGBoost is highly scalable and can be used with large datasets. It also supports various objective functions and evaluation metrics, making it flexible for different types of machine learning tasks.
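
The boosting idea described above can be sketched from scratch. The code below is a simplified illustration, not XGBoost's algorithm: it uses one-split trees (stumps), squared error loss, and a learning rate for shrinkage, and it leaves out second-order information, regularization terms, and missing-value handling. All function names are hypothetical.

```python
import numpy as np

def fit_stump(x, residuals):
    """Find the threshold on a 1-D feature that best splits the residuals."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in np.unique(x)[:-1]:
        left = residuals[x <= threshold]
        right = residuals[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def fit_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    base = float(y.mean())
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - f)^2 the negative gradient is the residual.
        residuals = y - prediction
        stump = fit_stump(x, residuals)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

# Toy regression problem: learn sin(x) from 50 points.
x = np.linspace(0, 10, 50)
y = np.sin(x)

model = fit_boosting(x, y)
baseline_mse = float(np.mean((y - y.mean()) ** 2))
boosted_mse = float(np.mean((y - predict_boosting(model, x)) ** 2))
print(f"MSE predicting the mean: {baseline_mse:.4f}")
print(f"MSE after boosting:      {boosted_mse:.4f}")
```

Each round corrects part of the error left by the previous rounds, so the training error shrinks as stumps are added. A smaller learning rate slows this down, which is one way boosting libraries trade training speed for generalization.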




In [None]:
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import pandas as pd

# Sample data (replace with your actual data)
data = {'feature1': [1, 2, 3, 4, 5],
        'feature2': [6, 7, 8, 9, 10],
        'target': [10, 15, 20, 25, 30]}
df = pd.DataFrame(data)

X = df[['feature1', 'feature2']]
y = df['target']

# Split data into training and testing sets
# (with only 5 rows, test_size=0.2 leaves a single test row)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the XGBoost Regressor
model = xgb.XGBRegressor(objective='reg:squarederror',  # Objective for regression
                         n_estimators=100,             # Number of boosting rounds
                         learning_rate=0.1,            # Step size shrinkage
                         max_depth=3,                  # Maximum tree depth
                         random_state=42)

model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")

Mean Squared Error: 24.33837127685547
