# Time Series Modeling for Menstrual Cycle Forecasting

## Introduction

This notebook forecasts menstrual cycle lengths using time series models. Accurate cycle prediction helps users anticipate their periods, manage symptoms, and detect irregularities early. We implement three baseline models (Moving Average, ARIMA, and Prophet), compare them by mean absolute error (MAE) on held-out cycles, and save a fitted model for integration into the RituCare app.

### Why Cycle Prediction is Useful
- Enables proactive health management.
- Identifies patterns linked to conditions like PCOS.
- Improves user experience in menstrual tracking apps.

### Models Overview
- **Moving Average**: Simple smoothing technique using rolling means.
- **ARIMA**: Statistical model for time series forecasting (AutoRegressive Integrated Moving Average).
- **Prophet**: Additive model by Meta for forecasting with seasonality and trends.
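
For reference, the two statistical baselines can be written compactly. This is a sketch under stated assumptions: $y_t$ denotes the length of cycle $t$, $w$ is the rolling window size, and $h$ is the forecast horizon. The ARIMA equation shows the order $(1, 1, 1)$ that the notebook uses later, written without a constant term. The symbols are illustrative and do not appear in the notebook's code.

$$
\hat{y}_{t+h} = \frac{1}{w} \sum_{i=0}^{w-1} y_{t-i}, \qquad h = 1, 2, \dots
$$

$$
\Delta y_t = \phi_1 \, \Delta y_{t-1} + \varepsilon_t + \theta_1 \, \varepsilon_{t-1}, \qquad \Delta y_t = y_t - y_{t-1}
$$

The first expression is the flat moving-average forecast: every future step repeats the mean of the last $w$ observed cycle lengths. The second models the change in cycle length from one cycle to the next, using one autoregressive coefficient $\phi_1$ and one moving-average coefficient $\theta_1$.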

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from sklearn.metrics import mean_absolute_error
import pickle

# Suppress warnings
warnings.filterwarnings('ignore')

# Set visual style
sns.set_style("whitegrid")
palette = ['#FFB6C1', '#DDA0DD', '#FF69B4', '#FFC0CB', '#E6E6FA']
sns.set_palette(palette)
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12

# Prophet is optional; when it is missing, prophet_forecast falls back to ARIMA
try:
    from prophet import Prophet
    prophet_available = True
except ImportError:
    prophet_available = False
    print("Prophet not available; the Prophet entry will fall back to ARIMA.")

from statsmodels.tsa.arima.model import ARIMA

## 1. Import & Load Dataset

Load the processed cycles data, parse the cycle start dates if the column is present, and sort each user's cycles in chronological order.

In [None]:
# Load dataset
df = pd.read_csv('../dataset/processed_cycles.csv')

# Parse dates if available
if 'FirstDayOfCycle' in df.columns:
    df['FirstDayOfCycle'] = pd.to_datetime(df['FirstDayOfCycle'], errors='coerce')
    df = df.sort_values(by=['ClientID', 'FirstDayOfCycle'])
else:
    # Assume cycle index based on order
    df['cycle_index'] = df.groupby('ClientID').cumcount()
    df = df.sort_values(by=['ClientID', 'cycle_index'])

print(f"Loaded dataset with {len(df)} rows.")
print(df.head())

## 2. Prepare Time Series

Group by `ClientID`, keep users with at least five recorded cycles, pick one user for demonstration, and use `cycle_length` as the target variable.
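
The filtering step can be checked by hand with a dependency-free sketch. The records below are hypothetical toy data, not rows from the dataset, and plain Python stands in for the pandas `groupby` used in the notebook.

```python
from collections import Counter

# Hypothetical toy records: (client_id, cycle_length_in_days)
records = [
    ("u1", 28), ("u1", 30), ("u1", 27), ("u1", 29), ("u1", 31),
    ("u2", 26), ("u2", 33),
]

MIN_CYCLES = 5  # same threshold the notebook applies

# Count cycles per user, then keep users that meet the threshold
cycle_counts = Counter(client_id for client_id, _ in records)
valid_users = sorted(u for u, n in cycle_counts.items() if n >= MIN_CYCLES)

# Build the series for one user, indexed by position (cycle_index)
selected_user = valid_users[0]
series = [length for client_id, length in records if client_id == selected_user]
ts_data = list(enumerate(series))  # [(cycle_index, cycle_length), ...]
```

Only `u1` passes the threshold here, so `ts_data` holds that user's five cycles with indices 0 through 4.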

In [None]:
# Group by ClientID and filter users with >=5 cycles for modeling
user_groups = df.groupby('ClientID')
valid_users = user_groups.filter(lambda x: len(x) >= 5)['ClientID'].unique()

if len(valid_users) == 0:
    print("No users with sufficient cycles. Using a user with max cycles.")
    valid_users = [df['ClientID'].value_counts().idxmax()]

# Pick one user for demonstration
selected_user = valid_users[0]
user_data = df[df['ClientID'] == selected_user].reset_index(drop=True)
user_data['cycle_index'] = range(len(user_data))

# Target: cycle_length
ts_data = user_data[['cycle_index', 'cycle_length']].set_index('cycle_index')

print(f"Selected user: {selected_user}, Cycles: {len(ts_data)}")
print(ts_data.head())

## 3. Baseline Models

Define functions for each model.
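
As a quick illustration of the simplest baseline, here is a minimal plain-Python sketch of a flat moving-average forecast. The function name `moving_average_sketch` and the cycle lengths are hypothetical and exist only for this example.

```python
def moving_average_sketch(values, window=3, steps=3):
    """Repeat the mean of the last `window` values for `steps` future cycles."""
    recent = values[-window:]  # uses fewer values if the history is short
    mean = sum(recent) / len(recent)
    return [mean] * steps


history = [28, 30, 27, 29, 31]  # hypothetical cycle lengths in days
forecast = moving_average_sketch(history)
# The last three values are 27, 29 and 31, so every forecast step is 29.0
```

Because the forecast is flat, it cannot follow a drift in cycle length within the forecast horizon, which is why it serves as a baseline rather than a final model.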

In [None]:
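# Moving Average: repeat the mean of the last `window` observations
def moving_average_forecast(series, window=3, steps=3):
    # tail() avoids a NaN result when the series is shorter than the window
    ma = series.tail(window).mean()
    forecast = [ma] * steps
    return forecast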

# ARIMA
def arima_forecast(series, steps=3):
    model = ARIMA(series, order=(1, 1, 1))
    model_fit = model.fit()
    forecast = model_fit.forecast(steps=steps)
    return forecast.tolist()

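# Prophet (falls back to ARIMA when Prophet is not installed)
def prophet_forecast(series, steps=3):
    if not prophet_available:
        return arima_forecast(series, steps)
    # Prophet needs datetime values in 'ds'. The series is indexed by cycle
    # number, so use placeholder daily dates, one per cycle (an assumption:
    # they only preserve order and are not real calendar dates).
    ds = pd.date_range('2000-01-01', periods=len(series), freq='D')
    df_prophet = pd.DataFrame({'ds': ds, 'y': series.values})
    # Calendar seasonality is meaningless on placeholder dates, so disable it
    model = Prophet(yearly_seasonality=False, weekly_seasonality=False,
                    daily_seasonality=False)
    model.fit(df_prophet)
    future = model.make_future_dataframe(periods=steps)
    forecast = model.predict(future)
    return forecast['yhat'].tail(steps).tolist()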

# Models dict
models = {
    'Moving Average': moving_average_forecast,
    'ARIMA': arima_forecast,
    'Prophet': prophet_forecast
}

## 4. Train + Forecast

Fit each model on the training portion, forecast the held-out cycles, plot the forecasts against the actual values, and store the results.
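
The chronological split can be sketched in plain Python to see how many cycles end up in each part. The helper name `split_for_forecast` and the data are hypothetical; the 80% fraction and the cap of three forecast steps mirror the values used in this notebook.

```python
def split_for_forecast(values, train_fraction=0.8, max_steps=3):
    """Return (train, test, steps) using a chronological split."""
    train_size = int(len(values) * train_fraction)
    train, test = values[:train_size], values[train_size:]
    steps = min(max_steps, len(test))  # never forecast more than we can score
    return train, test, steps


cycle_lengths = [28, 30, 27, 29, 31, 28, 26, 30, 29, 27]  # hypothetical data
train, test, steps = split_for_forecast(cycle_lengths)
# 10 cycles: 8 for training, 2 held out, so 2 forecast steps
```

With the minimum of five cycles, the same rule leaves four cycles for training and a single held-out cycle, so the resulting error estimate rests on one observation.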

In [None]:
# Split data: train on the first 80% of cycles, evaluate on up to 3 held-out cycles
train_size = int(len(ts_data) * 0.8)
train = ts_data.iloc[:train_size]
test = ts_data.iloc[train_size:]
steps = min(3, len(test))

forecasts = {}
for name, func in models.items():
    try:
        pred = func(train['cycle_length'], steps=steps)
        forecasts[name] = pred[:steps]
    except Exception as e:
        print(f"Error in {name}: {e}")
        forecasts[name] = [np.nan] * steps

# Plot
plt.figure(figsize=(12, 6))
plt.plot(ts_data.index, ts_data['cycle_length'], label='Actual', color=palette[0])
for i, (name, pred) in enumerate(forecasts.items()):
    plt.plot(range(train_size, train_size + len(pred)), pred, label=f'{name} Forecast', color=palette[i+1])
plt.title('Cycle Length Forecast')
plt.xlabel('Cycle Index')
plt.ylabel('Cycle Length (Days)')
plt.legend()
plt.grid(True)
plt.show()

print("Forecasts:", forecasts)

## 5. Evaluate

Compute the mean absolute error (MAE) of each model on the held-out cycles and display the scores in a table.
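
MAE is the average absolute difference, in days, between actual and forecast cycle lengths. The plain-Python sketch below works through one small case; the function name `mean_absolute_error_sketch` and the numbers are hypothetical, and the notebook itself relies on scikit-learn's `mean_absolute_error`.

```python
import math


def mean_absolute_error_sketch(actual, predicted):
    """Average of |actual - predicted|; NaN if the inputs are unusable."""
    if len(actual) == 0 or len(actual) != len(predicted):
        return math.nan
    if any(math.isnan(p) for p in predicted):
        return math.nan  # a failed forecast cannot be scored
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [28, 30, 27]           # hypothetical held-out cycle lengths (days)
predicted = [29.0, 29.0, 29.0]  # e.g. a flat moving-average forecast
mae = mean_absolute_error_sketch(actual, predicted)
# Absolute errors are 1, 1 and 2 days, so the MAE is 4/3 days
```

Returning NaN for unusable input, instead of raising, keeps a single failed model from stopping the comparison.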

In [None]:
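# Evaluate MAE on the held-out cycles
actual = test['cycle_length'].values[:steps]
mae_scores = {}
for name, pred in forecasts.items():
    # Skip forecasts that failed (NaN) or have the wrong length;
    # mean_absolute_error raises an error on NaN input
    if len(actual) > 0 and len(pred) == len(actual) and not np.isnan(pred).any():
        mae = mean_absolute_error(actual, pred)
        mae_scores[name] = mae
    else:
        mae_scores[name] = np.nan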

# Display table
mae_df = pd.DataFrame(list(mae_scores.items()), columns=['Model', 'MAE'])
print(mae_df)

## 6. Save Best Model

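Identify the model with the lowest MAE. The cell below then pickles a fitted ARIMA model as an example, regardless of which model scored best, because the baselines are wrapped as plain functions and only ARIMA is refit here.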
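
Two details of this step are easy to get wrong: selecting a minimum when some scores are NaN, and confirming that a pickled object loads back intact. The sketch below illustrates both with hypothetical scores and a stand-in dictionary in place of a real fitted model, writing to a temporary directory rather than the project's model path.

```python
import math
import os
import pickle
import tempfile

# Hypothetical scores; NaN marks a model whose forecast failed
mae_scores = {"Moving Average": 1.33, "ARIMA": 0.9, "Prophet": math.nan}

# Drop NaN scores first: min() is unreliable when NaN values are present
valid_scores = {name: s for name, s in mae_scores.items() if not math.isnan(s)}
best_model = min(valid_scores, key=valid_scores.get)

# Round-trip a stand-in object through pickle (a dict here, not a real model)
artifact = {"model_name": best_model, "mae": valid_scores[best_model]}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cycle_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(artifact, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)
```

When the pickled object is a real fitted model, the loading environment needs compatible library versions, and pickle files should only be loaded from trusted sources.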

In [None]:
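# Best model: lowest MAE, ignoring models whose score is NaN
valid_scores = {name: mae for name, mae in mae_scores.items() if not np.isnan(mae)}
if valid_scores:
    best_model = min(valid_scores, key=valid_scores.get)
    print(f"Best model: {best_model} with MAE {valid_scores[best_model]:.2f}")
else:
    best_model = None
    print("No model produced a valid MAE.")

# Save a fitted ARIMA model as an example, regardless of which model scored
# best (the baselines above are functions, not fitted model objects)
model = ARIMA(train['cycle_length'], order=(1, 1, 1))
fitted_model = model.fit()
with open('../models/cycle_model.pkl', 'wb') as f:
    pickle.dump(fitted_model, f)

print("Model saved to ../models/cycle_model.pkl")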

## 7. Visualizations

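Additional plots: a rolling average of cycle length and, if Prophet is installed, a forecast with its uncertainty interval.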

In [None]:
# Rolling average chart
plt.figure(figsize=(10, 6))
sns.lineplot(data=ts_data, x=ts_data.index, y='cycle_length', label='Actual', color=palette[0])
sns.lineplot(data=ts_data, x=ts_data.index, y=ts_data['cycle_length'].rolling(3).mean(), label='Rolling Avg', color=palette[1])
plt.title('Rolling Average Cycle Length')
plt.xlabel('Cycle Index')
plt.ylabel('Cycle Length (Days)')
plt.legend()
plt.grid(True)
plt.show()

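# Forecast with uncertainty interval (Prophet, if available)
if prophet_available:
    # Prophet needs datetime values in 'ds'; use placeholder daily dates,
    # one per cycle (an assumption: they are not real calendar dates)
    ds = pd.date_range('2000-01-01', periods=len(ts_data), freq='D')
    df_prophet = pd.DataFrame({'ds': ds, 'y': ts_data['cycle_length'].values})
    # Calendar seasonality is meaningless on placeholder dates, so disable it
    model = Prophet(yearly_seasonality=False, weekly_seasonality=False,
                    daily_seasonality=False)
    model.fit(df_prophet)
    future = model.make_future_dataframe(periods=3)
    forecast = model.predict(future)
    fig = model.plot(forecast)
    plt.title('Prophet Forecast with Uncertainty Interval')
    plt.show()
else:
    print("Prophet not available for the uncertainty plot.")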

## Interpretation of Results

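The models forecast future cycle lengths, and a lower MAE (in days) indicates better accuracy on the held-out cycles. Moving Average is simple but lags behind sudden changes; ARIMA models autocorrelation and, through differencing, gradual drift; Prophet fits trend and seasonality components. Because the comparison uses a single user and at most three held-out cycles, treat the ranking as indicative rather than conclusive. Large gaps between forecast and observed cycle lengths can flag cycles worth reviewing as potentially irregular.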

## Conclusion

Time-series training completed: model saved 🎯

Next steps: integrate the model into the app, incorporate user logs, and evaluate on more users.