# DateTime Handling and Time Series Basics in Pandas

### What Is DateTime Handling and Time Series in Pandas?

Many real-world datasets involve dates and times, such as birth dates, timestamps, or ticket booking times. The Titanic dataset has no such field, but a hypothetical `JourneyDate` column (one we add ourselves) can help illustrate patterns like group boarding, survival trends over time, or weekly cycles.

**Pandas offers a rich set of DateTime tools** to parse, format, manipulate, and analyze time-based data. These tools are essential for time series analysis and AI/ML tasks involving temporal trends.

### Sample Setup (Custom Date Column)

Since the Titanic dataset doesn't include a Date column by default, we'll simulate one by creating a new column.

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv("data/train.csv")

# Simulate a 'JourneyDate' column between 1912-04-01 and 1912-04-15
np.random.seed(42)
df['JourneyDate'] = pd.to_datetime(
    np.random.choice(pd.date_range("1912-04-01", "1912-04-15"), size=len(df))
)

print(df[['Name', 'JourneyDate']].head())

                                                Name JourneyDate
0                            Braund, Mr. Owen Harris  1912-04-07
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  1912-04-04
2                             Heikkinen, Miss. Laina  1912-04-13
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  1912-04-15
4                           Allen, Mr. William Henry  1912-04-11


### Why Is DateTime Important in AI/ML?

In AI/ML and data analytics, DateTime columns are often engineered into features such as:

- Weekday/weekend indicators
- Time since event
- Monthly or hourly patterns
- Lag-based features for time series forecasting

These are critical for tasks like:

- **Survival prediction** (boarding time)
- **Demand prediction** (by hour/day/month)
- **Churn or retention modeling** (time since signup)
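
As a rough illustration of the lag-based features mentioned above, the sketch below derives a previous-day value and a day-over-day change from a small daily series. The `counts` values and the column names `Lag1` and `Change` are illustrative assumptions, not part of the Titanic data.

```python
import pandas as pd

# Illustrative daily passenger counts indexed by date (values chosen for the example).
counts = pd.Series(
    [72, 46, 69, 59, 61],
    index=pd.date_range("1912-04-01", periods=5, freq="D"),
    name="PassengerCount",
)

features = counts.to_frame()
# Lag feature: the previous day's count. The first row has no predecessor, so it is NaN.
features["Lag1"] = counts.shift(1)
# Day-over-day change, equivalent to counts - counts.shift(1).
features["Change"] = counts.diff()

print(features)
```

Rows with a missing lag are usually dropped or imputed before training, depending on the model.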

### Common DateTime Functions in Pandas

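Pandas has a `.dt` accessor for datetime columns, just like `.str` for strings. It lets us apply datetime operations to a whole column, provided the column already has a datetime dtype.

1. `pd.to_datetime()`: Convert to Pandas DateTime format (a top-level function, not part of `.dt`)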

In [2]:
df['JourneyDate'] = pd.to_datetime(df['JourneyDate'])
print(df.dtypes['JourneyDate'])  # Should show datetime64[ns]

datetime64[ns]
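
In practice dates often arrive as strings rather than as an already-typed column. The following is a minimal sketch of parsing such strings, assuming a hypothetical `raw` Series in day/month/year order that also contains one invalid entry.

```python
import pandas as pd

# Hypothetical raw strings in day/month/year order; the last entry is not a date.
raw = pd.Series(["07/04/1912", "15/04/1912", "not a date"])

# format= states the expected layout explicitly; errors="coerce" turns
# unparseable values into NaT instead of raising an exception.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed)
print("Unparseable values:", parsed.isna().sum())
```

Counting the `NaT` values after parsing is a quick way to check how many entries failed to convert.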


2. `.dt.year`, `.dt.month`, `.dt.day`: Extract parts of date

In [3]:
df['JourneyYear'] = df['JourneyDate'].dt.year
df['JourneyMonth'] = df['JourneyDate'].dt.month
df['JourneyDay'] = df['JourneyDate'].dt.day
    
print(df[['JourneyDate', 'JourneyYear', 'JourneyMonth', 'JourneyDay']].head())

  JourneyDate  JourneyYear  JourneyMonth  JourneyDay
0  1912-04-07         1912             4           7
1  1912-04-04         1912             4           4
2  1912-04-13         1912             4          13
3  1912-04-15         1912             4          15
4  1912-04-11         1912             4          11


3. `.dt.is_month_start`, `.dt.is_month_end`: Month boundary flags

In [4]:
df['IsMonthStart'] = df['JourneyDate'].dt.is_month_start
df['IsMonthEnd'] = df['JourneyDate'].dt.is_month_end
    
print(df[['JourneyDate', 'IsMonthStart', 'IsMonthEnd']].head())

  JourneyDate  IsMonthStart  IsMonthEnd
0  1912-04-07         False       False
1  1912-04-04         False       False
2  1912-04-13         False       False
3  1912-04-15         False       False
4  1912-04-11         False       False
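
Day-of-week information comes from the same accessor. Here is a minimal sketch using a few hand-picked dates; the `dates` Series is illustrative and stands in for a column such as `JourneyDate`.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["1912-04-07", "1912-04-04", "1912-04-15"]))

# .dt.weekday numbers the days from 0 (Monday) through 6 (Sunday).
weekday_number = dates.dt.weekday
# .dt.day_name() returns the English day name by default.
weekday_name = dates.dt.day_name()

print(pd.DataFrame({"Date": dates, "Weekday": weekday_number, "DayName": weekday_name}))
```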


4. Subtracting Dates: Time Delta

In [5]:
latest = df['JourneyDate'].max()
df['DaysFromLast'] = (latest - df['JourneyDate']).dt.days
    
print(df[['JourneyDate', 'DaysFromLast']].head())

  JourneyDate  DaysFromLast
0  1912-04-07             8
1  1912-04-04            11
2  1912-04-13             2
3  1912-04-15             0
4  1912-04-11             4
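
Date subtraction produces timedelta values, which have their own `.dt` properties and can be added back to dates. A small sketch on illustrative dates (the reference date `1912-04-01` is an assumption chosen for the example):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["1912-04-07", "1912-04-13", "1912-04-15"]))

# Subtracting a Timestamp from a datetime Series yields a timedelta Series.
elapsed = dates - pd.Timestamp("1912-04-01")
print(elapsed.dt.days.tolist())             # elapsed time in whole days
print(elapsed.dt.total_seconds().tolist())  # the same spans in seconds

# Adding a Timedelta shifts every date by a fixed amount.
one_week_later = dates + pd.Timedelta(days=7)
print(one_week_later.dt.strftime("%Y-%m-%d").tolist())
```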


5. Filtering Based on Date

In [6]:
# Passengers who boarded after April 10
filtered = df[df['JourneyDate'] > "1912-04-10"]
print(filtered[['Name', 'JourneyDate']].head())

                                            Name JourneyDate
2                         Heikkinen, Miss. Laina  1912-04-13
3   Futrelle, Mrs. Jacques Heath (Lily May Peel)  1912-04-15
4                       Allen, Mr. William Henry  1912-04-11
6                        McCarthy, Mr. Timothy J  1912-04-13
12                Saundercock, Mr. William Henry  1912-04-11
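
Filtering on a date window rather than a single cutoff follows the same idea. The sketch below uses a hypothetical `df_demo` frame with placeholder names, and shows `Series.between` next to the equivalent boolean mask.

```python
import pandas as pd

# Hypothetical frame; names and dates are placeholders for illustration.
df_demo = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "JourneyDate": pd.to_datetime(
        ["1912-04-03", "1912-04-08", "1912-04-10", "1912-04-14"]
    ),
})

start, end = pd.Timestamp("1912-04-08"), pd.Timestamp("1912-04-10")

# Series.between includes both endpoints by default.
in_window = df_demo[df_demo["JourneyDate"].between(start, end)]

# Equivalent filter written as an explicit boolean mask.
same_window = df_demo[
    (df_demo["JourneyDate"] >= start) & (df_demo["JourneyDate"] <= end)
]

print(in_window)
```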


### Basic Time Series Grouping (Resampling & Aggregation)

To demonstrate basic time-based grouping, we'll assume each row is a booking. Let's see how many people boarded per day:

In [7]:
daily_counts = df.groupby('JourneyDate').size().reset_index(name='PassengerCount')
print(daily_counts.head())

  JourneyDate  PassengerCount
0  1912-04-01              72
1  1912-04-02              46
2  1912-04-03              69
3  1912-04-04              59
4  1912-04-05              61


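We could also use `.resample()`, which requires a datetime index, so `JourneyDate` is set as the index first. Unlike `groupby`, `.resample('D')` also returns days that have no rows, with a count of 0: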

In [8]:
df.set_index('JourneyDate', inplace=True)
daily_summary = df['PassengerId'].resample('D').count()

print(daily_summary.head())

JourneyDate
1912-04-01    72
1912-04-02    46
1912-04-03    69
1912-04-04    59
1912-04-05    61
Freq: D, Name: PassengerId, dtype: int64
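
The difference between the two approaches shows up when some days have no rows, and `.resample()` can also aggregate at coarser frequencies. A minimal sketch on a hypothetical `bookings` frame with a gap in its dates:

```python
import pandas as pd

# Hypothetical bookings; note that no rows fall on 1912-04-02.
bookings = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "JourneyDate": pd.to_datetime(
        ["1912-04-01", "1912-04-01", "1912-04-03", "1912-04-08"]
    ),
})

# groupby only reports dates that actually occur in the data.
by_group = bookings.groupby("JourneyDate").size()

# set_index without inplace=True returns a new frame and leaves bookings unchanged.
indexed = bookings.set_index("JourneyDate")

# Daily resampling fills the gaps with zero counts.
by_day = indexed["PassengerId"].resample("D").count()

# Weekly resampling; the "W" alias uses weeks ending on Sunday by default.
by_week = indexed["PassengerId"].resample("W").count()

print(by_group, by_day, by_week, sep="\n\n")
```

Using `set_index` without `inplace=True` keeps `JourneyDate` available as a regular column on the original frame for later steps.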


### AI/ML Use Case: Feature Engineering with DateTime

Date columns in machine learning often get transformed into:

- **Categorical features** like Weekday, Month, etc.
- **Time-lag features** (days since last event)
- **Time of day** or **seasonal** indicators

These features can improve model accuracy when behavior varies by time, for example:

- Passengers on weekdays vs weekends
- Last-minute ticket booking patterns
- Group boarding on specific dates
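
One way to bundle these transformations is a small helper that derives several calendar features at once and then one-hot encodes the categorical one. The function name `add_date_features` and the output column names are hypothetical, chosen for this sketch only.

```python
import pandas as pd


def add_date_features(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of frame with calendar features derived from a datetime column."""
    out = frame.copy()
    dates = out[column]
    out["Weekday"] = dates.dt.day_name()       # categorical feature
    out["Month"] = dates.dt.month              # numeric calendar feature
    out["IsWeekend"] = dates.dt.weekday >= 5   # Saturday (5) or Sunday (6)
    # Elapsed time since the earliest date in the column, in whole days.
    out["DaysSinceFirst"] = (dates - dates.min()).dt.days
    return out


demo = pd.DataFrame(
    {"JourneyDate": pd.to_datetime(["1912-04-06", "1912-04-08", "1912-04-13"])}
)
features = add_date_features(demo, "JourneyDate")

# One-hot encode the weekday so that models expecting numeric input can use it.
encoded = pd.get_dummies(features, columns=["Weekday"])
print(encoded)
```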

### Exercise

Q1. What day of the week did each passenger board?

In [9]:
# Make sure JourneyDate is a column
if 'JourneyDate' not in df.columns:
    df.reset_index(inplace=True)

df['DayOfWeek'] = df['JourneyDate'].dt.day_name()
print(df[['JourneyDate', 'DayOfWeek']].head())

  JourneyDate DayOfWeek
0  1912-04-07    Sunday
1  1912-04-04  Thursday
2  1912-04-13  Saturday
3  1912-04-15    Monday
4  1912-04-11  Thursday


Q2. Create a boolean column `IsWeekend` that is `True` when the day is Saturday or Sunday

In [10]:
df['IsWeekend'] = df['JourneyDate'].dt.weekday >= 5
print(df[['JourneyDate', 'IsWeekend']].head())

  JourneyDate  IsWeekend
0  1912-04-07       True
1  1912-04-04      False
2  1912-04-13       True
3  1912-04-15      False
4  1912-04-11      False


Q3. Create a column `BookingPeriod` with values:

- `"Early"` for days <= April 7
- `"Late"` for days > April 7

In [11]:
# Compare the day of month; this works here only because every simulated date falls in April 1912
df['BookingPeriod'] = df['JourneyDate'].apply(
    lambda x: "Early" if x.day <= 7 else "Late"
)
print(df[['JourneyDate', 'BookingPeriod']].head())

  JourneyDate BookingPeriod
0  1912-04-07         Early
1  1912-04-04         Early
2  1912-04-13          Late
3  1912-04-15          Late
4  1912-04-11          Late
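
For a simple two-way split like this, a vectorized comparison is a possible alternative to a row-by-row function. The sketch below assumes a small illustrative `dates` Series and a `cutoff` of April 7, 1912.

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime(["1912-04-07", "1912-04-04", "1912-04-13"]))

cutoff = pd.Timestamp("1912-04-07")
# Comparing full dates (not just the day of month) stays correct
# even if the data spans more than one month.
period = pd.Series(np.where(dates <= cutoff, "Early", "Late"), index=dates.index)

print(period.tolist())
```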


Q4. Group passengers by JourneyDate and count

In [12]:
booking_counts = df.groupby('JourneyDate').size().reset_index(name='PassengerCount')
print(booking_counts.head())

  JourneyDate  PassengerCount
0  1912-04-01              72
1  1912-04-02              46
2  1912-04-03              69
3  1912-04-04              59
4  1912-04-05              61


### Summary

DateTime operations in Pandas are foundational when working with any **temporal data**, whether it's journeys, bookings, timestamps, or log events. While the Titanic dataset doesn't come with a date column, we simulated one to demonstrate how flexible date-based features can be.

Key takeaways:

- Use `pd.to_datetime()` to convert columns into true datetime types.
- Access date parts using `.dt.year`, `.dt.day`, `.dt.day_name()`, etc.
- Generate derived features like `IsWeekend`, `DaysFromLast`, or `BookingPeriod`.
- Use `.apply()` for more advanced transformations based on date values.

In ML workflows, time-based features support **trend analysis, forecasting**, and **behavior prediction**. Pandas keeps this work concise by applying date operations to whole columns at once, which makes it a practical starting point for data scientists and ML engineers.