## Feature Selection_Pyflux
## Table of Contents:
* [0. Importing dependencies](#dependencies)
* [1. Feature Extraction](#1.0)
    * [1.1 Pyflux - ARIMA](#1.1)


This notebook was not run to completion because of the high computational cost of the ARIMA fitting.

When an ARIMA (AutoRegressive Integrated Moving Average) model is used for feature extraction, its fitted values capture the main dynamics of each series, while its residuals capture the remaining volatility and anomalies. Taken together, however, these two features still mainly reflect the trend and seasonality of the data, which overlap with what Prophet already captures.

Nonetheless, this notebook is kept in the repository for the record and for reference.
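This is a minimal sketch of the fitted-value/residual split. In place of `auto_arima`, it uses the simplest possible case: an AR(1) model fitted by ordinary least squares, written with only the standard library. By construction, each observation equals its fitted value plus its residual, and an injected spike surfaces as the largest residual. The functions `fit_ar1` and `simulate_ar1` and the simulated data are hypothetical illustrations, not part of the original notebook.

```python
import random


def fit_ar1(series):
    """Fit x_t = c + phi * x_{t-1} + e_t by ordinary least squares.

    Returns (c, phi, fitted, resid); fitted[i] and resid[i] refer to series[i + 1].
    """
    x_prev, x_curr = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_curr = sum(x_curr) / n
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(x_prev, x_curr))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    fitted = [c + phi * a for a in x_prev]           # the "main dynamics"
    resid = [b - f for b, f in zip(x_curr, fitted)]  # what the model cannot explain
    return c, phi, fitted, resid


def simulate_ar1(n, phi, sigma, seed=0):
    """Simulate a zero-mean AR(1) series of length n (hypothetical test data)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


series = simulate_ar1(500, phi=0.7, sigma=0.3)
series[250] += 5.0  # inject one additive anomaly at t = 250
c, phi_hat, fitted, resid = fit_ar1(series)

# The anomaly shows up as the residual with the largest magnitude
spike_t = max(range(len(resid)), key=lambda i: abs(resid[i])) + 1
print(spike_t)
```

A large outlier also pulls the estimated coefficient away from its true value. This is one reason residual-based anomaly features are usually read alongside the fitted values rather than in isolation.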

# 0. Importing dependencies  <a class="anchor" id="dependencies"></a>

```python
# Install dependencies (Colab shell command)
!pip install pandas numpy matplotlib statsmodels pmdarima

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX  # imported but not used below
from pmdarima import auto_arima
from google.colab import drive

# Mount Google Drive so the CSV files can be read and written
drive.mount('/content/gdrive/', force_remount=True)
```

```text
Collecting pmdarima
  Downloading pmdarima-2.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (2.1 MB)
Installing collected packages: pmdarima
Successfully installed pmdarima-2.0.4
```


# 1. Feature Extraction  <a class="anchor" id="1.0"></a>

## 1.1 PyFlux - ARIMA  <a class="anchor" id="1.1"></a>

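```python
# Load the merged dataframe (assumes the first CSV column holds the timestamps)
file_path = '/content/gdrive/My Drive/df_merged_m.csv'
df_merged_m = pd.read_csv(file_path, index_col=0, parse_dates=True)

# Regular one-minute grid ('min' replaces the deprecated 'T' alias); gaps become NaN,
# which auto_arima cannot fit, so fill them by linear interpolation (an assumption)
metrics = ['videoConsumption', 'impression', 'uniqueDevice']
df_merged_m = df_merged_m.asfreq('min')
df_merged_m[metrics] = df_merged_m[metrics].interpolate()
df = df_merged_m  # alias: columns added to df_merged_m are also visible through df
```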
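At one-minute frequency, a daily cycle spans 24 * 60 = 1,440 observations. That is where `m=1440` in `auto_fit_arima` comes from. The stdlib-only sketch below shows one round of seasonal differencing, `y[t] - y[t - m]`, which is what a seasonal ARIMA applies when D = 1. The helper `seasonal_difference` and the synthetic series are hypothetical. In this example, the repeating daily pattern cancels and only the trend increment remains. Each seasonal term reaches back a full m steps. In a state-space implementation such as statsmodels' SARIMAX, which pmdarima uses under the hood, the state vector grows roughly in proportion to m. This is one plausible reason these fits are so expensive.

```python
import math

MINUTES_PER_DAY = 24 * 60  # 1440: the seasonal period for minute-level data with a daily cycle


def seasonal_difference(series, m):
    """Return y[t] - y[t - m] for t = m..n-1 (one round of seasonal differencing, D = 1)."""
    if len(series) <= m:
        raise ValueError("series must be longer than the seasonal period m")
    return [series[t] - series[t - m] for t in range(m, len(series))]


# Hypothetical example: three days of a pure daily sine wave plus a linear trend
n = 3 * MINUTES_PER_DAY
y = [0.01 * t + math.sin(2 * math.pi * t / MINUTES_PER_DAY) for t in range(n)]
d = seasonal_difference(y, MINUTES_PER_DAY)

# The daily cycle cancels, leaving the constant trend increment 0.01 * 1440
print(len(d), round(min(d), 6), round(max(d), 6))
```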

```python
# Fit an ARIMA model to one series (used for each metric: vc, i, ud) and return
# its in-sample fitted values and residuals as new features.
# auto_arima searches the (p, d, q)(P, D, Q) orders by stepwise AIC minimisation;
# m=1440 is one seasonal cycle per day at one-minute frequency, which makes the search slow.
def auto_fit_arima(series, seasonal=True, m=1440):
    auto_model = auto_arima(series, seasonal=seasonal, m=m, trace=True,
                            error_action='ignore', suppress_warnings=True)
    print(auto_model.summary())
    return auto_model.predict_in_sample(), auto_model.resid()
```
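`auto_arima` fits candidate models and keeps the one with the lowest information criterion (AIC by default). The stdlib-only sketch below shows only that selection principle, on a much smaller problem: pure AR(p) models with an intercept are fitted by OLS and scored by AIC. It is not pmdarima's stepwise algorithm, which also chooses differencing orders, MA terms, and seasonal terms. The names `solve`, `ar_aic`, `select_ar_order`, and `simulate_ar2` are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ar_aic(series, p, max_p):
    """Fit AR(p) with intercept by OLS on t = max_p..n-1 (same sample for every p); return AIC."""
    rows = [[1.0] + [series[t - k] for k in range(1, p + 1)] for t in range(max_p, len(series))]
    y = series[max_p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yt - sum(bv * v for bv, v in zip(beta, r))) ** 2 for r, yt in zip(rows, y))
    n = len(y)
    # Gaussian AIC up to a constant; parameters = p lags + intercept + noise variance
    return n * math.log(rss / n) + 2 * (p + 2)


def select_ar_order(series, max_p=5):
    """Return the AR order with the lowest AIC, plus all scores."""
    scores = {p: ar_aic(series, p, max_p) for p in range(max_p + 1)}
    return min(scores, key=scores.get), scores


def simulate_ar2(n, phi1, phi2, sigma=1.0, seed=0):
    """Simulate a stationary AR(2) series (hypothetical test data)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n - 2):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, sigma))
    return x


series = simulate_ar2(1000, 0.6, -0.3, seed=42)
best_p, scores = select_ar_order(series, max_p=5)
print(best_p, {p: round(s, 1) for p, s in scores.items()})
```

AIC trades goodness of fit against the number of parameters, so it can occasionally pick a slightly larger order than the true one. It will almost never pick a smaller one when a real lag effect is present.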

```python
# Run the model for each metric and save the results
# videoConsumption
df_merged_m['vc_fitted'], df_merged_m['vc_resid'] = auto_fit_arima(df_merged_m['videoConsumption'])
```

```text
Performing stepwise search to minimize aic
```


```python
# impression
df_merged_m['i_fitted'], df_merged_m['i_resid'] = auto_fit_arima(df_merged_m['impression'])
```

```python
# uniqueDevice
df_merged_m['ud_fitted'], df_merged_m['ud_resid'] = auto_fit_arima(df_merged_m['uniqueDevice'])
```

```python
# Save the extracted features
features = df_merged_m[['vc_fitted', 'vc_resid', 'i_fitted', 'i_resid', 'ud_fitted', 'ud_resid']]
path_to_save = '/content/gdrive/My Drive/extracted_features_arima.csv'
features.to_csv(path_to_save, index=False)
```

```python
# Visualize original vs. fitted values for each metric
# (plots from df_merged_m, where the fitted columns were added)
panels = [('videoConsumption', 'vc_fitted', 'Video Consumption Fitting'),
          ('impression', 'i_fitted', 'Impression Fitting'),
          ('uniqueDevice', 'ud_fitted', 'Unique Device Fitting')]

plt.figure(figsize=(12, 8))
for pos, (col, fitted_col, title) in enumerate(panels, start=1):
    plt.subplot(3, 1, pos)
    plt.plot(df_merged_m[col], label='Original')
    plt.plot(df_merged_m[fitted_col], label='Fitted')
    plt.title(title)
    plt.legend()

plt.tight_layout()
plt.show()
```