# Cumulative Operations and Rolling Windows in Pandas

## What Are Cumulative Operations and Rolling Windows?

In data analysis and time series, **cumulative operations** and **rolling windows** help us observe running totals, averages, and other statistics over a sequence of values. These techniques are essential for detecting **trends**, **seasonality**, **anomalies**, and **local patterns** in our dataset.

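The Titanic dataset is not a time series and has no date column, so in this notebook we treat row order (sorted by `PassengerId`) as a stand-in for time. We apply cumulative statistics and rolling windows to numerical columns such as `Age`, `Fare`, and `Survived`. If a real or simulated date column were available, it could serve as the time axis instead.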
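If a date column is available or simulated, pandas can also use time-based windows, where the window is a duration rather than a fixed number of rows. The sketch below assumes a hypothetical `BoardingDate` column on a few toy rows; `train.csv` contains no such column, and the fares are illustrative values only.

```python
import pandas as pd

# Toy rows; 'BoardingDate' is HYPOTHETICAL and does not exist in train.csv
df = pd.DataFrame({
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05, 8.46],
    "BoardingDate": pd.to_datetime([
        "1912-04-01", "1912-04-02", "1912-04-02",
        "1912-04-05", "1912-04-08", "1912-04-09",
    ]),
})

# Offset-based windows need a sorted datetime index (stable sort keeps tie order)
ts = df.sort_values("BoardingDate", kind="stable").set_index("BoardingDate")

# '3D': sum the fares of all rows in the 3 days ending at (and including) each row's date
ts["FareSum3D"] = ts["Fare"].rolling("3D").sum()
print(ts)
```

Unlike integer windows, offset windows default to `min_periods=1`, so the first rows are not NaN.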

## Cumulative Functions in Pandas

Cumulative functions return, for each row, the result of applying an operation (sum, product, maximum, or minimum) to all values up to and including that row.

In [None]:
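import pandas as pd

# Load the Titanic training data
df = pd.read_csv("data/train.csv")

# Treat PassengerId order as the time axis
df = df.sort_values('PassengerId').reset_index(drop=True)

# Running total of fares paid so far
df['CumulativeFare'] = df['Fare'].cumsum()
# Running count of survivors (Survived is 0/1)
df['CumulativeSurvival'] = df['Survived'].cumsum()
# Oldest age seen so far; rows with a missing Age stay NaN
df['CumulativeMaxAge'] = df['Age'].cummax()
print(df[['PassengerId', 'Fare', 'CumulativeFare', 'CumulativeSurvival', 'CumulativeMaxAge']].head())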

Pandas provides four cumulative methods:

- `.cumsum()` → Cumulative sum
- `.cumprod()` → Cumulative product
- `.cummax()` → Cumulative max
- `.cummin()` → Cumulative min
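A tiny Series makes the four methods easy to check by hand. The sketch also shows two behaviours worth knowing: pandas has no `cummean()`, so a running mean is written as an expanding window, and missing values are skipped (left as NaN) rather than breaking the accumulation.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 5, 1, 7])

print(s.cumsum().tolist())   # [3, 8, 9, 16]
print(s.cumprod().tolist())  # [3, 15, 15, 105]
print(s.cummax().tolist())   # [3, 5, 5, 7]
print(s.cummin().tolist())   # [3, 3, 1, 1]

# Running mean: an expanding window grows by one row at a time
running_mean = s.expanding().mean()
print(running_mean.tolist())  # [3.0, 4.0, 3.0, 4.0]

# With the default skipna=True, NaN stays NaN and later sums continue
with_gap = pd.Series([1.0, np.nan, 3.0])
print(with_gap.cumsum().tolist())  # [1.0, nan, 4.0]
```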

## Rolling Window Functions

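Rolling functions slide a fixed-size window over the data and compute a statistic at each window position. This is commonly used in time series to smooth out short-term noise. With the default settings, the first `window - 1` rows are NaN because the window is not yet full.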

In [None]:
# Rolling mean on Fare
df['RollingFare'] = df['Fare'].rolling(window=5).mean()

# Rolling survival rate (average over last 5 passengers)
df['RollingSurvival'] = df['Survived'].rolling(window=5).mean()
print(df[['Fare', 'RollingFare', 'Survived', 'RollingSurvival']].head(10))

Common functions with `.rolling()`:

- `.mean()`, `.sum()`, `.std()`, `.min()`, `.max()`
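Three parameters control how the window is placed. The toy Series below (illustrative values, not Titanic data) shows the default trailing window, `min_periods` to allow partial windows at the start, and `center=True` to align each result with the middle of its window.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Trailing window of 3: the first two rows have too few values, so they are NaN
trailing = s.rolling(window=3).mean()
print(trailing.tolist())   # [nan, nan, 20.0, 30.0, 40.0]

# Accept windows with at least 1 value, so the start is filled in
partial = s.rolling(window=3, min_periods=1).mean()
print(partial.tolist())    # [10.0, 15.0, 20.0, 30.0, 40.0]

# Centered window: row i uses rows i-1, i and i+1
centered = s.rolling(window=3, center=True).mean()
print(centered.tolist())   # [nan, 20.0, 30.0, 40.0, nan]
```

A centered window uses future rows, so it suits smoothing for plots but not features for prediction.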

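### Handling Missing Values

- Rolling results are NaN for the first `window - 1` rows, because the window is not yet full.
- By default `min_periods` equals the window size, so a NaN anywhere inside the window (for example a missing `Age`) also makes that result NaN.
- Lower `min_periods` to compute a result from fewer valid values, or use `.fillna()` (or `.ffill()` / `.bfill()`) to fill the remaining gaps afterwards.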
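The sketch below uses a handful of made-up ages with one missing value to compare the default behaviour, a lower `min_periods`, and a final `.fillna()` with the overall mean (the fill value is just one reasonable choice, not a rule).

```python
import numpy as np
import pandas as pd

age = pd.Series([22.0, 38.0, np.nan, 35.0, 35.0, 54.0])

# Default min_periods=3: every window that is incomplete or touches the NaN is NaN
strict = age.rolling(window=3).mean()

# Require only 2 valid values per window
lenient = age.rolling(window=3, min_periods=2).mean()

# Fill whatever is still missing (here only the first row) with the overall mean
filled = lenient.fillna(age.mean())

print(pd.DataFrame({"Age": age, "strict": strict, "lenient": lenient, "filled": filled}))
```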

## AI/ML Use Case: Time-Based Feature Engineering

Cumulative and rolling operations are heavily used in machine learning to create **derived features** that track user behavior, trends, or group statistics over time.

Examples:

- Cumulative sum of purchases
- Rolling average temperature
- Rolling survival rate or rolling average delay
- Customer engagement trends over time

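Cumulative features capture long-term patterns and rolling features capture short-term spikes; both can be strongly **predictive** and often improve model accuracy. Because a default cumulative or rolling value includes the current row, shift the series by one row (`.shift(1)`) before aggregating whenever the feature must not see the target it is meant to predict.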
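A minimal sketch of leakage-safe features, assuming passengers arrive in row order: `.shift(1)` removes the current passenger's own label, an expanding mean gives the survival rate of everyone before them, and `groupby(...).transform(...)` repeats the same computation within each class. The `Pclass`/`Survived` values are illustrative toy data.

```python
import pandas as pd

# Toy passengers in arrival order (illustrative values)
df = pd.DataFrame({
    "Pclass":   [1, 3, 1, 3, 3, 1],
    "Survived": [1, 0, 1, 1, 0, 0],
})

# Survival rate of all *earlier* passengers: shift(1) drops the current row's label
df["PrevSurvivalRate"] = df["Survived"].shift(1).expanding().mean()

# Same feature computed separately within each class
df["PrevClassSurvivalRate"] = (
    df.groupby("Pclass")["Survived"]
      .transform(lambda s: s.shift(1).expanding().mean())
)
print(df)
```

The first passenger overall, and the first passenger in each class, get NaN because there is no history yet.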

## Exercises

**Q1.** Compute the cumulative sum of `Fare`.

In [None]:
df['CumFare'] = df['Fare'].cumsum()
print(df[['PassengerId', 'Fare', 'CumFare']].head())

**Q2.** Compute the rolling mean of `Age` with a window of 3.

In [None]:
df['RollingAge'] = df['Age'].rolling(window=3).mean()
print(df[['PassengerId', 'Age', 'RollingAge']].head(10))

**Q3.** Add a column for the cumulative count of passengers.

In [None]:
df['CumulativeCount'] = range(1, len(df) + 1)
print(df[['PassengerId', 'CumulativeCount']].head())
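A related per-group count uses `groupby(...).cumcount()`, which numbers rows within each group starting at 0. The sketch uses a few toy rows as a stand-in for the Titanic DataFrame; on the real data the same `groupby` line works on `df` directly, since it has a `Pclass` column.

```python
import pandas as pd

# Toy rows standing in for the Titanic DataFrame
df = pd.DataFrame({"PassengerId": [1, 2, 3, 4, 5], "Pclass": [3, 1, 3, 1, 3]})

# Overall running count, as in the answer above
df["CumulativeCount"] = range(1, len(df) + 1)

# Running count within each class; cumcount() starts at 0, so add 1
df["ClassCumCount"] = df.groupby("Pclass").cumcount() + 1
print(df)
```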

**Q4.** Create a rolling standard deviation of `Fare` with a window of 4.

In [None]:
df['RollingFareStd'] = df['Fare'].rolling(window=4).std()
print(df[['Fare', 'RollingFareStd']].head(10))

**Q5.** Fill missing values in rolling results using forward fill.

In [None]:
# Forward fill (fillna(method='ffill') is deprecated); leading NaNs with no earlier value remain
df['RollingAge'] = df['RollingAge'].ffill()
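Forward fill only copies earlier values downward, so the NaNs at the very start of a rolling column remain. A small sketch with made-up values shows the gap and one common remedy, chaining `.bfill()` after `.ffill()`.

```python
import numpy as np
import pandas as pd

rolling_age = pd.Series([np.nan, np.nan, 30.0, np.nan, 36.0])

# Forward fill: the leading NaNs have no earlier value to copy
ffilled = rolling_age.ffill()
print(ffilled.tolist())  # [nan, nan, 30.0, 30.0, 36.0]

# Backward fill the leading gap from the first available value
both = rolling_age.ffill().bfill()
print(both.tolist())     # [30.0, 30.0, 30.0, 30.0, 36.0]
```

Backward filling copies a later value into earlier rows, so avoid it for features that must not look ahead.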

## Summary

Cumulative and rolling operations are key tools in Pandas for analyzing trends and smoothing out short-term fluctuations. Cumulative functions like `.cumsum()`, `.cummax()` help track growth over time, while rolling windows allow for local averages and volatility measurement.

These operations are especially useful in time-series analysis, stock price forecasting, customer behavior modeling, and more. Rolling statistics can be combined with `.groupby()` and `.shift()` for even more powerful feature engineering.

When used properly, these techniques lead to **better model inputs**, **smoother data visualization**, and **insightful analytics**. Always visualize and validate these derived columns before using them in production ML pipelines.