#### Statistical Data Analysis
Dataset: 

- _aisles_clean.csv_
- _departments_clean.csv_
- _orders_clean.csv_
- _order_products_clean.csv_
- _products_clean.csv_

Author: Luis Sergio Pastrana Lemus  
Date: 2025-05-05

# Statistical Data Analysis – Purchasing Activity Dataset

## __1. Libraries__

```python
from IPython.display import display, HTML
import os
import pandas as pd
from pathlib import Path
import scipy.stats as st
from scipy.stats import ttest_ind
import sys


# Define the project root dynamically: take the notebook's working directory and move one level up
project_root = Path.cwd().parent

# Add the project root to sys.path (so the `src` package can be imported) if it is not already there
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Import the helper function explicitly (more controlled than `import *`)
from src import load_dataset_from_csv
```

## __2. Path to Data file__

```python
# Build the path to the data folder and load the dataset
data_file_path = project_root / "data" / "processed" / "product_activity"
df_product_reorder_rate_by_position = load_dataset_from_csv(data_file_path, "product_reorder_rate_by_position.csv", sep=',', header='infer')
```
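The loaded file already holds a `reorder_rate` for each `add_to_cart_order`, but this notebook does not show how it was produced. The sketch below is one plausible way to derive such a table from _order_products_clean.csv_, assuming (hypothetically) that it has an `add_to_cart_order` column and a 0/1 `reordered` flag, as in the public Instacart schema. The reorder rate of a position is then the mean of `reordered` over all items placed at that position. The data and the names `order_products` and `reorder_rate_by_position` are illustrative only.

```python
import pandas as pd

# Hypothetical toy stand-in for order_products_clean.csv:
# one row per (order, product), with the product's position in the cart
# and a 0/1 flag telling whether the product had been ordered before.
order_products = pd.DataFrame({
    "order_id":          [1, 1, 1, 2, 2, 3, 3, 3],
    "product_id":        [10, 20, 30, 10, 40, 20, 10, 50],
    "add_to_cart_order": [1, 2, 3, 1, 2, 1, 2, 3],
    "reordered":         [1, 1, 0, 1, 0, 1, 1, 0],
})

# Reorder rate per cart position = share of items at that position that were reorders
reorder_rate_by_position = (
    order_products
    .groupby("add_to_cart_order", as_index=False)["reordered"]
    .mean()
    .rename(columns={"reordered": "reorder_rate"})
)
print(reorder_rate_by_position)
```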


## __3. Statistical Data Analysis__

### 3.1  Inferential Tests

Hypothesis: “Are products added earlier to the cart reordered more frequently than products added later?”

#### 3.1.1 Hypothesis testing: Reorder rate varies by cart position

```python
# Hypothesis: products added earlier to the cart are reordered more frequently than products added later.

# 1. State the hypotheses H0 and H1
# H0: products added earlier to the cart are reordered at the same rate as products added later (==)
# H1: products added earlier to the cart are reordered more frequently than products added later (>)

# Split the reorder rates by cart position: early = positions 1–5, late = positions 6+
early_cart = df_product_reorder_rate_by_position.loc[df_product_reorder_rate_by_position['add_to_cart_order'] <= 5, 'reorder_rate']
late_cart = df_product_reorder_rate_by_position.loc[df_product_reorder_rate_by_position['add_to_cart_order'] > 5, 'reorder_rate']

# 2. Specify the significance level (and the corresponding confidence level)
# alpha = 5%
# confidence = 95%
alpha = 0.05
```
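Written formally, with $\mu_{\text{early}}$ and $\mu_{\text{late}}$ (notation introduced here) denoting the mean reorder rate of cart positions 1–5 and 6+, the hypotheses above define a one-sided test at significance level $\alpha$:

$$
H_0:\ \mu_{\text{early}} = \mu_{\text{late}}
\qquad \text{vs.} \qquad
H_1:\ \mu_{\text{early}} > \mu_{\text{late}},
\qquad \alpha = 0.05
$$

Because $H_1$ is directional, only large positive values of the t-statistic count as evidence against $H_0$; the matching p-value is the right-tail probability $p = P(T \ge t_{\text{obs}} \mid H_0)$.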

```python
# Levene's test checks whether the two samples have equal variances.
# This matters because the classic (Student's) t-test assumes equal variances;
# if they differ, Welch's t-test (equal_var=False) should be used instead.
# Note: SciPy's default (center='median') is the Brown–Forsythe variant.

levene_stat, levene_p = st.levene(early_cart, late_cart)
display(HTML(f"<b>Levene's Test</b> – Statistic: {levene_stat:.4f}, P-value: {levene_p:.4f}"))

# Decide whether the variances can be treated as equal
if levene_p < alpha:
    equal_var = False
    display(HTML("<i>Levene's null hypothesis (equal variances) is rejected: the variances differ → use equal_var=False (Welch's t-test)</i>"))
else:
    equal_var = True
    display(HTML("<i>Levene's null hypothesis (equal variances) is not rejected: the variances can be treated as equal → use equal_var=True</i>"))
```
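To make Levene's statistic less of a black box, the sketch below recomputes it by hand on synthetic data (not the project data) and compares it with `scipy.stats.levene`. It assumes SciPy's default `center='median'`, i.e. the Brown–Forsythe variant: each observation is replaced by its absolute deviation from its group median, and a one-way ANOVA F statistic is computed on those deviations. The helper name `brown_forsythe_levene` is hypothetical.

```python
import numpy as np
from scipy import stats


def brown_forsythe_levene(*samples):
    """Levene's W statistic with median centering (Brown–Forsythe variant)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    # Absolute deviations of each observation from its own group median
    z = [np.abs(np.asarray(s, dtype=float) - np.median(s)) for s in samples]
    z_group_means = [zi.mean() for zi in z]
    z_grand_mean = np.concatenate(z).mean()
    # Between-group spread of the deviations vs. within-group spread
    between = sum(len(zi) * (m - z_grand_mean) ** 2 for zi, m in zip(z, z_group_means))
    within = sum(((zi - m) ** 2).sum() for zi, m in zip(z, z_group_means))
    w = (n_total - k) / (k - 1) * between / within
    # Under H0 (equal variances), W is approximately F(k - 1, N - k) distributed
    p = stats.f.sf(w, k - 1, n_total - k)
    return w, p


rng = np.random.default_rng(0)
early = rng.normal(0.60, 0.05, size=50)  # synthetic sample, narrower spread
late = rng.normal(0.50, 0.10, size=80)   # synthetic sample, wider spread

print(brown_forsythe_levene(early, late))
print(stats.levene(early, late))  # default center='median' should give the same values
```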

```python
# 3. Calculate the test statistic and p-value
# equal_var comes from Levene's test above; alternative='greater' makes the test one-sided,
# matching H1 (early > late). The `alternative` parameter requires SciPy >= 1.6.

t_stat, p_val = ttest_ind(early_cart, late_cart, equal_var=equal_var, alternative='greater')

display(HTML(f"T-statistic: <b>{t_stat:.4f}</b>"))
display(HTML(f"P-value: <b>{p_val:.4f}</b>"))

# 4. Decision and conclusion

if p_val < alpha:
    display(HTML("The <i>null hypothesis is rejected</i> in favor of the <b>alternative hypothesis</b>: there is sufficient statistical evidence to conclude that <b>early cart additions are reordered more frequently than later ones.</b>"))
else:
    display(HTML("The <i>null hypothesis is not rejected</i>: there is insufficient evidence to conclude that <b>early cart additions are reordered more frequently than later ones</b>."))
```
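With `equal_var=False`, `ttest_ind` runs Welch's t-test. The sketch below reproduces the one-sided (`alternative='greater'`) version by hand on synthetic data, to show where the statistic, the Welch–Satterthwaite degrees of freedom, and the right-tail p-value come from. `welch_t_test_greater` is a hypothetical helper, not part of SciPy.

```python
import numpy as np
from scipy import stats


def welch_t_test_greater(a, b):
    """One-sided Welch t-test of H1: mean(a) > mean(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard errors of each mean, using sample variances (ddof=1)
    se2_a = a.var(ddof=1) / len(a)
    se2_b = b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    # Welch–Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    # Right-tail probability: only large positive t counts against H0
    p = stats.t.sf(t, df)
    return t, p


rng = np.random.default_rng(1)
early = rng.normal(0.60, 0.05, size=50)  # synthetic data
late = rng.normal(0.55, 0.10, size=80)

print(welch_t_test_greater(early, late))
print(stats.ttest_ind(early, late, equal_var=False, alternative="greater"))
```

For a two-sided test the p-value would be `2 * stats.t.sf(abs(t), df)`, so whenever `t > 0` the one-sided p-value is exactly half of the two-sided one.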

## 4. Conclusion of Statistical Data Analysis – Order and Product activity

The results of the two-sample t-test revealed a statistically significant difference between the reorder rates of products added early to the cart (positions 1–5) and those added later (positions 6+), with a p-value below the 0.05 threshold.

This finding leads us to reject the null hypothesis, supporting the conclusion that:

🧠 Products added earlier in the cart are significantly more likely to be reordered than those added later.

This reinforces the behavioral insight that early cart placement correlates with habitual purchasing. From a business perspective, this suggests that:

- Frequently reordered products tend to be mentally “top-of-mind” for users.
- Optimizing cart experiences and product positioning could positively influence reorder behavior.
- These products are strong candidates for featured placement, bundle offers, and inventory prioritization.
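A p-value indicates whether the observed difference is unlikely under H₀, but not how large it is. A minimal sketch for reporting the size and direction of the difference next to the test result, assuming the `early_cart` and `late_cart` Series from section 3.1.1; here they are replaced by small hypothetical values so the snippet runs on its own:

```python
import pandas as pd

# Hypothetical stand-ins for early_cart / late_cart (reorder rates per cart position)
early_cart = pd.Series([0.68, 0.66, 0.64, 0.62, 0.61])
late_cart = pd.Series([0.59, 0.58, 0.56, 0.55, 0.53, 0.52])

# Group sizes and mean reorder rates side by side
summary = pd.DataFrame({
    "group": ["early (1–5)", "late (6+)"],
    "n": [early_cart.size, late_cart.size],
    "mean_reorder_rate": [early_cart.mean(), late_cart.mean()],
})
diff = early_cart.mean() - late_cart.mean()

print(summary)
print(f"Difference in mean reorder rate (early - late): {diff:.4f}")
```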