# Detect and display anomalies in the observed metrics

This notebook analyzes time series data from the observability metrics of an e-commerce platform.

We will conduct the following analyses for some of the columns:
- Generate a time series plot of the data.
- Flag anomalous values with the median absolute deviation (MAD) method.
- Perform a week-by-week correlation analysis between the price and quantity.
- Decompose the time series into trend, seasonality, and residuals.


In [None]:
import pandas as pd

## Data Loading and Preliminary Inspection

First, we load our dataset and display the first few rows.

In [None]:
import pandas as pd
df = pd.read_csv('../data/anomalies.csv')
df.head()

## Anomaly Detection in `buyer.cookie.duration`

Using statistical methods, we identify days where the average time buyers spend on the website (`buyer.cookie.duration`) deviates significantly from the norm. This can help us detect technical issues or exceptional user behavior.


In [None]:
import matplotlib.pyplot as plt
df['date'] = pd.to_datetime(df['date'])

plt.figure(figsize=(10, 6))
plt.plot(df['date'], df['buyer.cookie.duration'], marker='o', linestyle='-')

plt.title('buyer.cookie.duration over time')
plt.xlabel('Date')
plt.ylabel('buyer.cookie.duration')

# Show the plot
plt.show()

The plot shows a clear drop in June. Let's use the median absolute deviation (MAD) method to confirm it statistically.

In [None]:
# Import and instantiate the MAD (median absolute deviation) detector
from pyod.models.mad import MAD
mad = MAD()

# Fit the model on the 'buyer.cookie.duration' column
mad.fit(df[['buyer.cookie.duration']])

# Get the anomaly scores
scores = mad.decision_scores_

# Plotting the scores to visualize anomalies
plt.figure(figsize=(10, 6))
plt.plot(df['date'], scores, marker='o')
plt.title('Anomaly Scores from MAD for buyer.cookie.duration')
plt.xlabel('Date')
plt.ylabel('Anomaly Score')
plt.show()
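To make the scores above easier to interpret, here is a minimal sketch of the modified z-score that a MAD-based detector typically computes. It uses only NumPy and a small synthetic array (the values are made up for illustration, not taken from `anomalies.csv`). The `0.6745` scaling constant and the `3.5` cutoff are the commonly used conventions; we assume, without having verified it against the installed version, that pyod's `MAD` follows the same convention.

```python
import numpy as np

def modified_z_scores(values):
    """Return |0.6745 * (x - median) / MAD| for each value."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # The score is undefined when at least half the values equal the median
        return np.full_like(x, np.nan)
    return np.abs(0.6745 * (x - median) / mad)

# Synthetic daily durations with one obvious drop (illustrative values only)
durations = np.array([30.0, 31.0, 29.0, 30.5, 29.5, 30.0, 12.0, 31.5])

scores = modified_z_scores(durations)
is_outlier = scores > 3.5  # conventional cutoff, assumed here

print(np.round(scores, 2))
print(is_outlier)
```

Because both the center (the median) and the spread (the MAD) are robust statistics, a single extreme day barely shifts them, which is why this score isolates the drop more cleanly than a mean and standard deviation would.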

## Correlation Analysis Methodology

We focus on the relationship between `orders.price.mean` and `orders.quantity.mean`. We calculate the Pearson correlation coefficient for these variables on a weekly basis to observe how their relationship evolves over time.

In [None]:

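import numpy as np  # needed for np.nan below
from scipy.stats import pearsonr

df_correlated = df.set_index('date')

# Pearson correlation per calendar week; weeks with fewer than two rows yield NaN
df_correlated = df_correlated.resample('W').apply(
    lambda x: pearsonr(x['orders.price.mean'], x['orders.quantity.mean'])[0] if len(x) > 1 else np.nan
)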

plt.figure(figsize=(12, 6))
plt.plot(df_correlated.index, df_correlated, marker='o', linestyle='-')
plt.title('Weekly Correlation between Price and Quantity')
plt.xlabel('Week')
plt.ylabel('Pearson Correlation Coefficient')
plt.axhline(0, color='gray', linestyle='--')
plt.show()
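A weekly Pearson coefficient rests on at most seven points, so it can jump sharply from one week to the next. As a complementary view, the sketch below computes a 7-day rolling correlation with pandas' `Series.rolling(...).corr(...)`. The `price` and `quantity` series are synthetic stand-ins (an assumption for illustration), not the notebook's `orders.price.mean` and `orders.quantity.mean` columns.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: quantity falls as price rises, plus a little noise
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=28, freq="D")
price = pd.Series(rng.normal(50, 5, size=28), index=dates)
quantity = 100 - 1.5 * price + rng.normal(0, 1, size=28)

# Pearson correlation over a sliding 7-day window;
# the first 6 entries are NaN because the window is not yet full
rolling_corr = price.rolling(window=7).corr(quantity)

print(rolling_corr.tail())
```

Unlike the weekly resample, the rolling window produces one value per day and does not depend on where calendar weeks happen to start.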


## Time Series Decomposition

We apply time series decomposition to the `nrows` column to separate it into trend, seasonal, and residual components. This helps us understand the underlying patterns and spot outliers in the series. The first cell below plots the full series over time, the second zooms in on the first six weeks with day-of-week labels, and the third runs the seasonal decomposition on those six weeks, where an outlier stands out clearly in the residuals.

In [None]:
import matplotlib.pyplot as plt
df['date'] = pd.to_datetime(df['date'])


plt.figure(figsize=(10, 6))
plt.plot(df['date'], df['nrows'], marker='o', linestyle='-')


plt.title('nrows Over Time')
plt.xlabel('Date')
plt.ylabel('nrows')

plt.show()


In [None]:

# Take a copy so that adding a column does not trigger SettingWithCopyWarning
six_weeks_data = df[df['date'] < (df['date'].min() + pd.Timedelta(weeks=6))].copy()

six_weeks_data['day_of_week'] = six_weeks_data['date'].dt.strftime('%a')
plt.figure(figsize=(12, 6))
plt.plot(six_weeks_data['date'], six_weeks_data['nrows'], marker='o', linestyle='-')

plt.title('nrows Over the First 6 Weeks with Days of the Week')
plt.xlabel('Date')
plt.ylabel('nrows')

for i in range(len(six_weeks_data)):
    plt.text(six_weeks_data['date'].iloc[i], 
             six_weeks_data['nrows'].iloc[i] + 150,  # Adjust vertical position for more space
             six_weeks_data['day_of_week'].iloc[i], 
             fontsize=8, ha='right')

plt.xticks(rotation=45)
plt.tight_layout()

plt.show()





In [None]:
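import statsmodels.api as sm
import matplotlib.pyplot as plt

# Index by date so the decomposition plots show dates on the x-axis;
# period=7 models a weekly seasonal cycle on daily data
nrows_series = six_weeks_data.set_index('date')['nrows']
decomposition = sm.tsa.seasonal_decompose(nrows_series, model='additive', period=7)
fig = decomposition.plot()
plt.show()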
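To show what the additive decomposition does under the hood and how the residuals point to an outlier, here is a simplified hand-rolled sketch using only pandas and NumPy. It is not statsmodels' implementation, only an approximation of the classical approach (centered moving-average trend, then per-weekday seasonal means). The series is synthetic, with a hypothetical anomaly injected on day 20.

```python
import numpy as np
import pandas as pd

# Synthetic 6-week daily series with a weekly pattern (illustrative values only)
dates = pd.date_range("2024-01-01", periods=42, freq="D")
weekly_pattern = np.array([0, 50, 100, 50, 0, -100, -100], dtype=float)
nrows = pd.Series(1000 + np.tile(weekly_pattern, 6), index=dates)
nrows.iloc[20] -= 400  # injected anomaly (hypothetical)

period = 7

# Trend: centered moving average over one full period (NaN at both edges)
trend = nrows.rolling(window=period, center=True).mean()

# Seasonal: mean of the detrended values per weekday, centered to sum to zero
detrended = nrows - trend
weekday_means = detrended.groupby(detrended.index.dayofweek).mean()
weekday_means = weekday_means - weekday_means.mean()
seasonal = pd.Series(
    weekday_means.loc[nrows.index.dayofweek].to_numpy(), index=nrows.index
)

# Residual: what remains after removing trend and seasonality
resid = nrows - trend - seasonal

# The largest absolute residual marks the most suspicious day
suspect_day = resid.abs().idxmax()
print(suspect_day, round(resid.loc[suspect_day], 1))
```

Note that a single outlier also leaks into the trend and seasonal estimates, so neighboring days show smaller residual bumps. The injected day still has by far the largest residual in this sketch.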
