# Time Series Data in Pandas

This notebook covers working with time series data in Pandas, including date/time handling, indexing, and time-based operations.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

Pandas version: 2.2.3
NumPy version: 2.2.4


## Creating Time Series Data

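Pandas represents timestamps with the `datetime64` dtype and generates fixed-frequency sequences of them with `pd.date_range()`. Let's start by generating a range of dates, building a DataFrame around it, and making the date column the index.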

In [2]:
# Create date range
print("Date range:")
dates = pd.date_range('2023-01-01', periods=10, freq='D')
print(dates)

# Create time series DataFrame
np.random.seed(42)
ts_data = pd.DataFrame({
    'Date': pd.date_range('2023-01-01', periods=100, freq='D'),
    'Value': np.random.randn(100).cumsum() + 100,
    'Category': np.random.choice(['A', 'B', 'C'], 100)
})

print("\nTime series DataFrame:")
print(ts_data.head())

# Set Date as index (reassign rather than mutate in place)
ts_data = ts_data.set_index('Date')
print("\nDataFrame with DateTime index:")
print(ts_data.head())

Date range:
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
               '2023-01-09', '2023-01-10'],
              dtype='datetime64[ns]', freq='D')

Time series DataFrame:
        Date       Value Category
0 2023-01-01  100.496714        A
1 2023-01-02  100.358450        B
2 2023-01-03  101.006138        A
3 2023-01-04  102.529168        A
4 2023-01-05  102.295015        C

DataFrame with DateTime index:
                 Value Category
Date                           
2023-01-01  100.496714        A
2023-01-02  100.358450        B
2023-01-03  101.006138        A
2023-01-04  102.529168        A
2023-01-05  102.295015        C
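
Dates often arrive as plain strings rather than from `date_range()`. The following is a minimal sketch of that case, assuming a small hypothetical frame named `raw` with made-up values; it parses the strings with `pd.to_datetime()` and then sets and sorts the index.

```python
import pandas as pd

# Hypothetical raw records with dates stored as plain strings (illustrative values)
raw = pd.DataFrame({
    'Date': ['2023-01-03', '2023-01-01', '2023-01-02'],
    'Value': [101.0, 100.5, 100.4],
})

# Parse the strings into datetime64 values
raw['Date'] = pd.to_datetime(raw['Date'], format='%Y-%m-%d')

# Use the parsed dates as the index and put the rows in chronological order
parsed = raw.set_index('Date').sort_index()

print(parsed.index)
print(parsed)
```

Sorting the index is worth doing before slicing by date, since label slices on a `DatetimeIndex` are most predictable when the index is monotonic.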


## DateTime Indexing

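A `DatetimeIndex` lets you select rows with date strings. A full date returns the matching row, a partial string such as `'2023'` or `'2023-01'` returns every row in that period, and a slice between two dates includes both endpoints.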

In [3]:
# Select data for specific date
print("Data for 2023-01-05:")
print(ts_data.loc['2023-01-05'])

# Select date range
print("\nData for January 2023:")
print(ts_data.loc['2023-01-01':'2023-01-31'].head())

# Select by year
print("\nData for 2023:")
print(ts_data.loc['2023'].head())

# Select by month
print("\nData for January:")
print(ts_data.loc['2023-01'].head())

# Partial string indexing
print("\nData for January 10-15:")
print(ts_data.loc['2023-01-10':'2023-01-15'])

Data for 2023-01-05:
Value       102.295015
Category             C
Name: 2023-01-05 00:00:00, dtype: object

Data for January 2023:
                 Value Category
Date                           
2023-01-01  100.496714        A
2023-01-02  100.358450        B
2023-01-03  101.006138        A
2023-01-04  102.529168        A
2023-01-05  102.295015        C

Data for 2023:
                 Value Category
Date                           
2023-01-01  100.496714        A
2023-01-02  100.358450        B
2023-01-03  101.006138        A
2023-01-04  102.529168        A
2023-01-05  102.295015        C

Data for January:
                 Value Category
Date                           
2023-01-01  100.496714        A
2023-01-02  100.358450        B
2023-01-03  101.006138        A
2023-01-04  102.529168        A
2023-01-05  102.295015        C

Data for January 10-15:
                 Value Category
Date                           
2023-01-10  104.480611        A
2023-01-11  104.017193        A
2023-01-
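
The selection rules in this section can be checked on a tiny frame. This is a sketch under assumptions: `demo` is a hypothetical 40-row daily frame whose `Value` column is just a row counter, not the notebook's `ts_data`.

```python
import pandas as pd

# Hypothetical frame: 40 daily rows starting 2023-01-01, Value = row position
idx = pd.date_range('2023-01-01', periods=40, freq='D')
demo = pd.DataFrame({'Value': range(40)}, index=idx)

# Full date on a daily index: an exact match, returned as a Series for that row
one_day = demo.loc['2023-01-05']

# Partial string: every row that falls in January 2023
january = demo.loc['2023-01']

# Label slice: both the start and the end date are included
window = demo.loc['2023-01-10':'2023-01-15']

print(type(one_day).__name__, one_day['Value'])
print(len(january), len(window))
```

Unlike positional slicing, the label slice keeps the end date, so the window above spans six days rather than five.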

## Resampling

Resampling allows you to change the frequency of your time series data (upsampling or downsampling).

In [4]:
# Downsample to weekly data
# 'Category' holds strings, so restrict the mean to numeric columns; otherwise
# pandas 2.x raises "TypeError: agg function failed [how->mean,dtype->object]"
print("Weekly mean values:")
weekly_data = ts_data.resample('W').mean(numeric_only=True)
print(weekly_data.head())

# Downsample to monthly data ('ME' = month end; the 'M' alias is deprecated in pandas 2.2)
print("\nMonthly mean values:")
monthly_data = ts_data.resample('ME').mean(numeric_only=True)
print(monthly_data.head())

# Custom aggregation during resampling
print("\nWeekly statistics:")
weekly_stats = ts_data.resample('W').agg({
    'Value': ['mean', 'min', 'max', 'std']
})
print(weekly_stats.head())

# Upsampling (from daily to 12-hourly) with forward fill
# ('12h' replaces the '12H' alias, which is deprecated in pandas 2.2)
print("\nUpsampling to 12-hourly (forward fill):")
half_daily_data = ts_data.resample('12h').ffill()
print(half_daily_data.head())
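
To make the two directions of resampling concrete, here is a small sketch on a hypothetical four-value daily Series `s` (the values are illustrative). It assumes a pandas version that accepts the lowercase `'12h'` alias, as pandas 2.2 does.

```python
import pandas as pd

# Hypothetical daily series with four observations
idx = pd.date_range('2023-01-01', periods=4, freq='D')
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Downsampling: group the four days into 2-day bins and aggregate each bin
two_day_sum = s.resample('2D').sum()

# Upsampling: create 12-hour slots; each new slot copies the last known daily value
half_day = s.resample('12h').ffill()

print(two_day_sum)
print(half_day)
```

Downsampling always needs an aggregation because several rows collapse into one, while upsampling needs a fill rule because the new rows have no observed value.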

## Time Series Operations

Pandas provides operations that rely on the ordering of a time series: rolling windows, percentage change, shifting, differencing, and time-based grouping.

In [None]:
# Calculate rolling statistics
print("7-day rolling mean:")
rolling_mean = ts_data['Value'].rolling(window=7).mean()
print(rolling_mean.head(10))

# Calculate percentage change
print("\nDaily percentage change:")
pct_change = ts_data['Value'].pct_change()
print(pct_change.head())

# Shift data
print("\nShifted data (lag of 1):")
shifted = ts_data['Value'].shift(1)
print(shifted.head())

# Calculate difference
print("\nFirst difference:")
diff = ts_data['Value'].diff()
print(diff.head())

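# Time-based grouping
# Select 'Value' so the mean skips the string 'Category' column;
# 'ME' (month end) replaces the 'M' alias deprecated in pandas 2.2
print("\nAverage by month:")
monthly_avg = ts_data.groupby(pd.Grouper(freq='ME'))['Value'].mean()
print(monthly_avg)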
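
The operations in this cell are closely related, which a short sketch can show. It assumes a hypothetical five-value Series `s` with illustrative numbers; `diff()` and `pct_change()` are both expressed in terms of `shift(1)`.

```python
import pandas as pd

# Hypothetical daily series (illustrative values)
idx = pd.date_range('2023-01-01', periods=5, freq='D')
s = pd.Series([10.0, 12.0, 9.0, 15.0, 18.0], index=idx)

lagged = s.shift(1)                  # previous day's value; the first entry is NaN
diff = s.diff()                      # equivalent to s - s.shift(1)
pct = s.pct_change()                 # equivalent to s / s.shift(1) - 1
roll = s.rolling(window=3).mean()    # NaN until three observations are available

print(pd.DataFrame({'value': s, 'lag1': lagged, 'diff': diff,
                    'pct': pct, 'roll3': roll}))
```

Each result starts with missing values because the first row has no predecessor, and the 3-day window is incomplete for the first two rows.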

## Summary

You have learned key time series operations in Pandas:

- **Creating Time Series**: Using `date_range()` and DateTime indexing
- **DateTime Indexing**: Selecting data by date ranges and partial strings
- **Resampling**: Changing frequency with `resample()` (upsampling/downsampling)
- **Time Series Operations**: Rolling statistics, percentage changes, shifting, differencing, and time-based grouping with `pd.Grouper`

These tools are essential for analyzing temporal data and time-dependent patterns.