# Advanced Time Series Operations and Resampling in Pandas

### What Are Advanced Time Series Operations?

When working with time-series data, it's often necessary to **resample**, **smooth**, or compute **rolling** statistics over time to see trends, patterns, or summaries. These operations allow us to:

- Downsample (e.g., daily → weekly)
- Upsample (e.g., hourly → minute-level data)
- Smooth noisy data using rolling averages
- Handle missing dates or irregular intervals
- Apply time-window-based statistics (such as moving averages)

**Pandas provides tools like `.resample()` and `.rolling()` to perform these operations efficiently.**
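The snippet below shows downsampling and upsampling side by side. It is a minimal sketch on a made-up hourly series; the dates and values are hypothetical, and the notebook's own data comes later. Downsampling aggregates several readings into one bin. Upsampling creates new, empty time slots that must then be filled.

```python
import pandas as pd

# Hypothetical hourly readings over one day (values 0, 1, ..., 23)
hourly = pd.Series(
    range(24),
    index=pd.date_range("2024-01-01", periods=24, freq="h"),
)

# Downsample: hourly -> 6-hour bins, summing the readings in each bin
six_hourly = hourly.resample("6h").sum()

# Upsample: hourly -> 15-minute steps; each new slot takes the last known value
quarter_hourly = hourly.resample("15min").ffill()

print(six_hourly)
print(quarter_hourly.head(6))
```

For upsampling you must choose a fill rule (here `.ffill()`). Other options are `.asfreq()` to leave the new slots as `NaN`, or `.interpolate()` after `.asfreq()`.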

### Why Are These Important in AI/ML?

In machine learning, especially with time-series datasets like:

- Stock prices
- Sales data
- Sensor readings
- Server logs
- Event timestamps

...we often need to engineer features such as:

- Rolling means, trends
- Time-window aggregates
- Resampled values
- Temporal seasonality indicators

These help models understand **temporal structure** in the data.
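Temporal seasonality indicators can be read straight off a `DatetimeIndex`. Below is a minimal sketch on a hypothetical two-week daily sales series; the values and the column names `day_of_week`, `is_weekend`, and `month` are illustrative choices, not a fixed convention.

```python
import pandas as pd

# Hypothetical daily sales for two weeks starting Monday 1 Jan 2024 (made-up values)
idx = pd.date_range("2024-01-01", periods=14, freq="D")
features = pd.DataFrame({"sales": range(14)}, index=idx)

# Calendar-based seasonality indicators derived from the DatetimeIndex
features["day_of_week"] = features.index.dayofweek  # Monday=0 ... Sunday=6
features["is_weekend"] = features["day_of_week"] >= 5
features["month"] = features.index.month

print(features.head(7))
```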

### **Setup — Simulate a Booking Date Column**

The Titanic dataset has no date column, so we simulate one by assigning each passenger a random journey date between April 1 and April 15, 1912.

In [None]:
import pandas as pd
import numpy as np

df = pd.read_csv("data/train.csv")

# Simulate a 'JourneyDate' column
np.random.seed(42)
df['JourneyDate'] = pd.to_datetime(
    np.random.choice(pd.date_range("1912-04-01", "1912-04-15"), size=len(df))
)

# Set as index for resampling
df.set_index('JourneyDate', inplace=True)
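The setup cell moves `JourneyDate` into the index. Alternatively, `DataFrame.resample` accepts an `on=` argument that resamples by a datetime column directly. The sketch below checks that both routes give the same daily counts. It uses a small made-up stand-in for the Titanic table (20 hypothetical passengers), so it runs without `data/train.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data: 20 passengers, no CSV needed
rng = np.random.default_rng(0)
dates = pd.date_range("1912-04-01", "1912-04-15").to_numpy()
passengers = pd.DataFrame({
    "PassengerId": np.arange(1, 21),
    "JourneyDate": rng.choice(dates, size=20),
})

# Option 1: keep JourneyDate as an ordinary column and pass it via `on=`
counts_on = passengers.resample("D", on="JourneyDate")["PassengerId"].count()

# Option 2: move it into the index (sorted, so inspection is chronological)
indexed = passengers.set_index("JourneyDate").sort_index()
counts_idx = indexed["PassengerId"].resample("D").count()

print(counts_on.equals(counts_idx))
```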

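### 1. Resampling Data (Using `.resample()`)

Resampling **groups data by time intervals**: `.resample()` assigns each row to a time bin, and an aggregation such as `.count()` then summarizes each bin. Here, `'D'` means one bin per calendar day: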

In [None]:
daily_counts = df['PassengerId'].resample('D').count()
print(daily_counts.head())

Weekly counts. The `'W'` alias bins by weeks ending on Sunday and labels each bin with that Sunday:

In [None]:
weekly_counts = df['PassengerId'].resample('W').count()
print(weekly_counts)
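Weekly bins depend on which day anchors the week. The sketch below uses a hypothetical series with one booking per day from Monday 1 April to Monday 15 April 1912. It compares the default `'W'` (an alias for `'W-SUN'`) with `'W-MON'`.

```python
import pandas as pd

# Hypothetical: exactly one booking per day, Mon 1 Apr to Mon 15 Apr 1912
bookings = pd.Series(1, index=pd.date_range("1912-04-01", "1912-04-15", freq="D"))

# 'W' == 'W-SUN': weeks end on Sunday and are labelled by that Sunday
weekly_sun = bookings.resample("W").sum()

# 'W-MON': weeks end on Monday instead, so the bin boundaries move
weekly_mon = bookings.resample("W-MON").sum()

print(weekly_sun)
print(weekly_mon)
```

The weekly totals change with the anchor even though the underlying days are identical. Pick the anchor that matches the reporting week you care about.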

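### 2. Forward Fill Missing Dates (`.reindex()` + `.ffill()`)

Sometimes a series skips dates entirely. Reindexing onto a complete daily calendar exposes the gaps as `NaN`, and `.ffill()` fills each gap with the most recent known value. Because `df` has one row per passenger, its index repeats dates and cannot be reindexed directly, so we aggregate to one value per date first.

In [None]:
# Aggregate to one value per date that actually occurs (mean fare per date)
daily_fare = df.groupby(level=0)['Fare'].mean()

# Reindex to every calendar day in the range; absent dates become NaN
# (daily_fare.asfreq('D') is an equivalent shortcut)
full_range = pd.date_range(start=daily_fare.index.min(), end=daily_fare.index.max(), freq='D')
daily_reindexed = daily_fare.reindex(full_range)

# Forward fill: carry the last known value into any missing day
daily_filled = daily_reindexed.ffill()
print(daily_filled.head())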
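Forward filling is only one way to treat a gap, and the right choice depends on what the values mean. Below is a minimal sketch on hypothetical data where 3 January is missing. A count (such as bookings) gets zero, because no record means nothing happened. A level (such as a price) is either carried forward or linearly interpolated.

```python
import pandas as pd

# Hypothetical daily data with 3 Jan 2024 missing entirely
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"])
counts = pd.Series([5, 3, 4], index=idx)          # e.g. bookings per day
price = pd.Series([10.0, 12.0, 11.0], index=idx)  # e.g. a daily price level

# asfreq('D') inserts the missing day as NaN (requires a unique index)
counts_daily = counts.asfreq("D").fillna(0)     # no record -> zero bookings
price_ffill = price.asfreq("D").ffill()         # carry the last level forward
price_interp = price.asfreq("D").interpolate()  # straight line between neighbours

print(pd.DataFrame({
    "counts": counts_daily,
    "price_ffill": price_ffill,
    "price_interp": price_interp,
}))
```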

### 3. Rolling Window (Moving Average)

Smooth the daily count trend using a 3-day moving average:

In [None]:
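daily_counts = df['PassengerId'].resample('D').count()
# Each value is the mean of that day and the two days before it;
# the first two days have no full 3-day window, so they are NaN
rolling_avg = daily_counts.rolling(window=3).mean()

print(rolling_avg.head())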
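`.rolling()` has a few options worth knowing. The sketch below uses a hypothetical week of daily counts (made-up values). It compares the default trailing window, `min_periods=1` (no leading `NaN`), `center=True`, and a time-based `'3D'` window.

```python
import pandas as pd

# Hypothetical daily counts for one week (made-up values)
daily = pd.Series(
    [4, 8, 6, 10, 2, 7, 9],
    index=pd.date_range("1912-04-01", periods=7, freq="D"),
)

# Default: trailing window; the first window-1 results are NaN
trailing = daily.rolling(window=3).mean()

# min_periods=1: average whatever values are available at the start
trailing_partial = daily.rolling(window=3, min_periods=1).mean()

# center=True: label each window by its middle day (uses the next day's value)
centered = daily.rolling(window=3, center=True).mean()

# Time-based window: all observations within the last 3 calendar days
by_time = daily.rolling("3D").mean()

print(pd.DataFrame({"trailing": trailing, "partial": trailing_partial,
                    "centered": centered, "3D": by_time}))
```

A centered window looks one day into the future, so it suits visual smoothing but can leak information into forecasting features. The `'3D'` form matters when dates are irregular: it covers three calendar days rather than three rows. On this gap-free series it matches `min_periods=1`.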

### 4. Shifting Data in Time (Lag Features)

`.shift()` moves values forward or backward in time, which is useful for **lag-based prediction**.

In [None]:
daily_counts = df['PassengerId'].resample('D').count()
# Shift by one row (one day here, since the index is daily); the first day becomes NaN
prev_day = daily_counts.shift(1)

print(pd.DataFrame({
    'Actual': daily_counts,
    'Previous_Day': prev_day
}).head())
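Positive shifts create lag features from past values. A negative shift pulls a future value back, which is a common way to build a "next day" forecasting target. The sketch below uses hypothetical daily counts; `lag_1`, `lag_2`, and `target_next_day` are illustrative names.

```python
import pandas as pd

# Hypothetical daily counts (made-up values)
daily = pd.Series(
    [4, 8, 6, 10, 2],
    index=pd.date_range("1912-04-01", periods=5, freq="D"),
    name="count",
)

frame = daily.to_frame()
# Lag features: values from 1 and 2 days earlier (past information only)
frame["lag_1"] = daily.shift(1)
frame["lag_2"] = daily.shift(2)
# Forecasting target: tomorrow's value, via a negative shift
frame["target_next_day"] = daily.shift(-1)

# Rows with incomplete lags or a missing target cannot be used for training
train_rows = frame.dropna()
print(train_rows)
```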

### 5. Difference Between Time Periods

Compute the change between each day's count and the previous day's count with `.diff()`:

In [None]:
diff = daily_counts.diff()
print(diff.head())
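`.diff()` is shorthand for "value minus the value one step earlier". `.pct_change()` expresses the same step as a relative change. Here is a minimal check on hypothetical daily counts:

```python
import pandas as pd

# Hypothetical daily counts (made-up values)
daily = pd.Series(
    [4, 8, 6, 12],
    index=pd.date_range("1912-04-01", periods=4, freq="D"),
)

# Absolute change: identical to subtracting the 1-step lag
change = daily.diff()
manual = daily - daily.shift(1)

# Relative change: (today / yesterday) - 1
growth = daily.pct_change()

print(pd.DataFrame({"count": daily, "diff": change, "pct_change": growth}))
```

Use absolute differences when the scale is stable, and percentage change when the level varies a lot over time.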

### AI/ML Use Case: Time-Aware Feature Engineering

In real-world datasets, time-aware features are extremely powerful:

- **Rolling averages** show short-term vs long-term patterns.
- **Lagged values** help predict future behavior (e.g., demand tomorrow).
- **Resampled trends** reduce noise and highlight seasonal effects.
- **Differences** help measure change over time.

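Classical models such as ARIMA build lags and differencing into the model itself. General-purpose models (e.g., linear regression or gradient-boosted trees) need these features engineered explicitly, and LSTMs or Prophet can also take them as additional inputs.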
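The features above can be combined into one training table. The sketch below assumes a hypothetical 30-day demand series of random integers. Every feature is built from `demand.shift(1)`, so it only uses days strictly before the target day, and the train/test split is chronological rather than shuffled. Both choices guard against leaking future information into the model.

```python
import numpy as np
import pandas as pd

# Hypothetical daily demand for 30 days (random integers, illustration only)
rng = np.random.default_rng(0)
demand = pd.Series(
    rng.integers(20, 60, size=30),
    index=pd.date_range("2024-01-01", periods=30, freq="D"),
    name="demand",
)

# Target: today's demand. Features: built only from days before today.
past = demand.shift(1)
data = pd.DataFrame({
    "target": demand,
    "lag_1": past,
    "rolling_mean_7": past.rolling(window=7).mean(),
    "diff_1": past.diff(),
}).dropna()  # early rows lack enough history for every feature

# Chronological split: earlier dates for training, later dates for testing
split_point = int(len(data) * 0.8)
train, test = data.iloc[:split_point], data.iloc[split_point:]
print(len(train), len(test))
```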

### Exercises

Q1. How many passengers boarded each week?

In [None]:
weekly_counts = df['PassengerId'].resample('W').count()
print(weekly_counts)

Q2. Smooth daily counts using a 3-day rolling average

In [None]:
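daily_counts = df['PassengerId'].resample('D').count()
# Keep daily features in their own DataFrame: df has one row per passenger,
# so assigning a daily Series to it would copy each date's value onto every
# passenger row for that date instead of producing a daily table
daily_df = daily_counts.to_frame(name='Count')
daily_df['Rolling3'] = daily_df['Count'].rolling(window=3).mean()
print(daily_df['Rolling3'].head())

Q3. Shift the daily count column by 1 day

In [None]:
daily_df['ShiftedBy1'] = daily_df['Count'].shift(1)
print(daily_df['ShiftedBy1'].head())

Q4. Compute the difference between each day and the previous day

In [None]:
daily_df['DailyDiff'] = daily_df['Count'].diff()
print(daily_df['DailyDiff'].head())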

### Summary

Advanced time series operations in Pandas are essential tools for analyzing and understanding datasets that evolve over time. These operations go beyond basic date extraction and allow us to **resample, smooth, shift, and compare values across time periods**. Even though the Titanic dataset does not contain original date fields, we simulated a `JourneyDate` column to demonstrate how powerful these operations can be.

At the core of time series handling is the `.resample()` method, which groups data into custom time intervals, for instance summarizing daily passenger counts into weekly totals. This is especially useful for identifying high-level trends, seasonality, or fluctuations over time. When dates are missing from a series, reindexing onto a complete calendar and forward filling with `.ffill()` produces a continuous timeline. Forward filling suits levels such as prices, whereas for counts a missing day usually means zero, so `.fillna(0)` is often the better choice.

Another critical technique is **rolling window analysis**, enabled via `.rolling()`. This is often used to calculate moving averages, which smooth out short-term variations and highlight long-term patterns. For example, a 3-day moving average of daily passenger counts gives a clearer view of trends and reduces the noise from day-to-day randomness.

We also explored **shifting and differencing**, which are foundational techniques in time-series feature engineering. The `.shift()` function creates lag features, such as “passenger count from the previous day”, which can be highly predictive in forecasting models. The `.diff()` method measures change over time by comparing each value to its previous entry. These derived features (rolling mean, lag, diff) matter both for classical models like ARIMA, which build lags and differencing into their formulation, and for machine learning models (including LSTMs and Prophet with extra regressors) that take them as explicit inputs.

In machine learning workflows, these advanced time series features are particularly useful for tasks such as **sales forecasting, demand prediction, behavioral modeling, anomaly detection**, and more. By converting raw timestamps into engineered time-based insights, we give our models a much better chance at capturing real-world patterns. Whether we’re dealing with financial data, sensor logs, or booking trends, mastering these time series operations in Pandas makes our analysis deeper, more accurate, and much more informative.