Process:
- Dataset is first loaded into the `Amodely` class
- Perform data preparation to:
    - Collapse the rows down to a single dimension
    - Remove anomalous categories - e.g. "Unknown" categories, "NT" when using the State dimension, etc.
    - Create a conversion rate column based on aggregated data
- Find anomalies based on ARIMA model:
    - Split the data into training and test sets (historical data vs. data from the past month)
    - Fit candidate ARIMA(p, d, q) models to the training data
    - Select the best model via a grid search over the parameters p, d, q
        - Rank candidates by AICc score (lower is better)
    - Use ARIMA model to forecast up to the end of the test dataset
    - Compare forecast confidence intervals to test dataset values; flag points as anomalies if they lie outside the confidence interval
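
The model-selection and flagging steps above can be sketched in plain Python. This is a minimal illustration, not the `amodely` implementation: the names `aicc`, `best_order`, and `flag_anomalies` are hypothetical, and `fit` is an assumed callable that fits one ARIMA order and returns its log-likelihood and parameter count (in practice it would wrap whatever ARIMA library is used).

```python
import math
from itertools import product


def aicc(log_likelihood, n_params, n_obs):
    """Corrected AIC: the usual AIC plus a small-sample penalty term."""
    aic = 2 * n_params - 2 * log_likelihood
    denom = n_obs - n_params - 1
    if denom <= 0:
        # Too few observations for this many parameters; treat as unusable.
        return math.inf
    return aic + (2 * n_params * (n_params + 1)) / denom


def best_order(fit, n_obs, p_values, d_values, q_values):
    """Grid-search (p, d, q) and return (order, score) with the lowest AICc.

    `fit` is a hypothetical callable: fit(order) -> (log_likelihood, n_params).
    It may raise ValueError for orders that fail to fit; those are skipped.
    Returns None if no order could be fitted.
    """
    best = None
    for order in product(p_values, d_values, q_values):
        try:
            log_likelihood, n_params = fit(order)
        except ValueError:
            continue
        score = aicc(log_likelihood, n_params, n_obs)
        if best is None or score < best[1]:
            best = (order, score)
    return best


def flag_anomalies(actuals, lower, upper):
    """Return indices of test points that fall outside the forecast interval."""
    return [
        i
        for i, (y, lo, hi) in enumerate(zip(actuals, lower, upper))
        if y < lo or y > hi
    ]
```

Here `lower` and `upper` would be the confidence bounds of the forecast over the test period, aligned element by element with the observed `actuals`.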

Import modules:

In [None]:
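import os
import sys

import pandas as pd
from dotenv import load_dotenv

# Make the parent directory importable so the local amodely package can be found
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from amodely.amodely import *

# Load variables from the .env file, then read the dataset location
load_dotenv()
DATASET_PATH = os.environ.get("DATASET_PATH")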

Load the dataset into the `Amodely` class and detect anomalies:

In [None]:
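# os.path.join works whether or not DATASET_PATH ends with a path separator
dataset_file = os.path.join(DATASET_PATH, "Conversion Data Extended Period.xlsx")
model = Amodely(pd.read_excel(dataset_file), measure="conversion_rate")

# Run ARIMA-based anomaly detection across the STATE_CODE dimension
anomalies = model.detect_anomalies(method="arima", dimension="STATE_CODE", steps=10)
anomalies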

Plot the conversion rate and detected anomalies for each state:

In [None]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime, timedelta

categories = sorted(set(model.df["STATE_CODE"]))
measure_col = model.measure.upper()

# Highlighted window: the 10 weeks ending 31 October 2021
window_end = datetime(2021, 10, 31)
window_start = window_end - timedelta(weeks=10)

plt.rcParams["figure.figsize"] = (12, 6)

for category in categories:
    # Restrict the data to this category (pl comes from the amodely star import)
    df = pl.category_pipeline("STATE_CODE", [category]).fit_transform(model.df)
    anomaly_df = anomalies[anomalies["STATE_CODE"] == category]

    # Anomalies in red, highlighted window in orange, full series as a line
    plt.scatter(anomaly_df["QUOTE_DATE"], anomaly_df[measure_col], c="red")
    plt.axvspan(*mdates.date2num([window_start, window_end]), color="orange", alpha=0.5)
    plt.plot(df["QUOTE_DATE"], df[measure_col])
    plt.title(category)
    plt.ylim(0.05, 0.3)
    plt.show()