# Level 10: Time Series (Datetime Handling)

Pandas was originally developed for financial data analysis, so it has excellent capabilities for working with time series data. This level covers how to parse, manipulate, and analyze data that is indexed by time.

In [1]:
import pandas as pd

## 10.1 Parsing Dates

### `pd.to_datetime()`
This is the main function for converting strings or other objects into datetime objects.

In [2]:
dates = ['2023-01-01', '2023-01-02', '03-Jan-2023', '2023/01/04']
# The strings use different layouts, so ask pandas to infer the format per element
pd.to_datetime(dates, format='mixed')

Without `format='mixed'`, pandas 2.0 and later infer a single format from the first element and apply it to every value, so this list fails with `ValueError: time data "03-Jan-2023" doesn't match format "%Y-%m-%d", at position 2`. If day-first strings such as `03/01/2023` may appear, consider passing `dayfirst=True` as well.

When every string shares one layout, pass an explicit `format`: parsing is faster, and values that do not match are rejected instead of being guessed at.

In [None]:
pd.to_datetime('20230105', format='%Y%m%d')
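
Real inputs often contain a few malformed entries. The sketch below shows one way to handle them with `errors='coerce'`, which converts values that do not match the format into `NaT` instead of raising. The `raw` Series and its contents are made up for illustration.

```python
import pandas as pd

# Hypothetical raw input with one entry that is not a date
raw = pd.Series(['20230105', '20230106', 'not a date'])

# Values that do not match the format become NaT instead of raising
parsed = pd.to_datetime(raw, format='%Y%m%d', errors='coerce')
print(parsed)

# Count how many values failed to parse
print(parsed.isna().sum())
```

Checking the number of `NaT` values afterwards is a cheap way to notice when a supposedly consistent column is not.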

## 10.2 DatetimeIndex

For time series analysis, you'll typically want your dates to be the index of your DataFrame.

In [None]:
# Create a sample time series DataFrame
date_rng = pd.date_range(start='2023-01-01', end='2023-01-10', freq='D')
ts_df = pd.DataFrame(date_rng, columns=['date'])
ts_df['data'] = range(len(date_rng))
ts_df.set_index('date', inplace=True)
ts_df
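
One practical benefit of a datetime index is label-based selection with date strings. The following is a minimal sketch that rebuilds an equivalent frame under the assumed name `demo` so it runs on its own.

```python
import pandas as pd

# Stand-in for the frame built above (the name `demo` is illustrative)
idx = pd.date_range(start='2023-01-01', end='2023-01-10', freq='D')
demo = pd.DataFrame({'data': range(len(idx))}, index=idx)

# Slicing with date strings includes both endpoints
first_days = demo.loc['2023-01-03':'2023-01-05']
print(first_days)

# A partial string selects every row in that period (here, January 2023)
january = demo.loc['2023-01']
print(len(january))
```

Unlike positional slicing, slicing by label on a sorted datetime index includes the end date.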

### The `.dt` Accessor
When a Series has a datetime dtype, you can use the `.dt` accessor to get properties of the dates, like year, month, day, day of the week, etc.

In [None]:
s = pd.Series(pd.date_range('2023-01-01', periods=5, freq='D'))
print(f"Year: {s.dt.year.tolist()}")
print(f"Month: {s.dt.month.tolist()}")
print(f"Day of week: {s.dt.day_name().tolist()}")

If the datetimes are in the index, the same properties are available directly on the `DatetimeIndex`, without `.dt`.

In [None]:
ts_df.index.year
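
Index properties can also drive boolean filtering. The sketch below keeps only weekend rows, using `dayofweek` (Monday is 0, Sunday is 6); the frame `demo` is an illustrative stand-in for the one above.

```python
import pandas as pd

idx = pd.date_range('2023-01-01', periods=10, freq='D')
demo = pd.DataFrame({'data': range(10)}, index=idx)

# dayofweek numbers Monday as 0 and Sunday as 6, so >= 5 means Saturday or Sunday
weekend = demo[demo.index.dayofweek >= 5]
print(weekend)
```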

## 10.3 Time-Based Operations

### Resampling (`.resample()`)
Resampling changes the frequency of your time series data (e.g., from daily to weekly). Conceptually it is a `groupby` in which the groups are consecutive time bins.

- **Downsampling:** Aggregating data to a lower frequency (e.g., daily to monthly). You need an aggregation function.
- **Upsampling:** Converting data to a higher frequency (e.g., daily to hourly). You need a fill method.

In [None]:
# Downsample from daily to 3-day frequency, taking the mean
ts_df.resample('3D').mean()

In [None]:
# Upsample from daily to 12-hour frequency, using forward-fill
# (lowercase 'h' is the hourly alias; uppercase 'H' is deprecated since pandas 2.2)
ts_df.resample('12h').ffill()
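
To make the bin behaviour concrete, here is a small sketch that applies several aggregations at once with `.agg()` and checks the size of an upsampled result. It assumes the same ten daily values as above, stored in a Series named `s` for brevity.

```python
import pandas as pd

idx = pd.date_range('2023-01-01', periods=10, freq='D')
s = pd.Series(range(10), index=idx, name='data')

# Downsample: each 3-day bin is summarised by several statistics
summary = s.resample('3D').agg(['mean', 'sum', 'count'])
print(summary)

# Upsample: each new 12:00 row repeats the value from 00:00 on the same day
upsampled = s.resample('12h').ffill()
print(upsampled.head(4))
```

The last bin holds only one observation (January 10), which is why inspecting `count` alongside `mean` can be worthwhile.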

### Shifting (`.shift()`)
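Shifts the values by a given number of periods while leaving the index unchanged; the vacated positions become `NaN`. This lets you compare each observation with an earlier one, for example to compute differences or percent changes.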

In [None]:
ts_df['previous_data'] = ts_df['data'].shift(1)
ts_df
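
The differences and percent changes mentioned above can be sketched with `.shift()` directly. The `prices` values are invented for illustration; the built-in `.diff()` and `.pct_change()` methods give the same results.

```python
import pandas as pd

idx = pd.date_range('2023-01-01', periods=5, freq='D')
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 105.0], index=idx)  # made-up values

# Day-over-day change: current value minus the previous one
change = prices - prices.shift(1)

# Day-over-day percent change, as a fraction
pct = prices / prices.shift(1) - 1

print(pd.DataFrame({'price': prices, 'change': change, 'pct': pct}))
```

The first row is `NaN` in both derived columns because there is no earlier observation to compare against.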

### Rolling Windows (`.rolling()`)
Provides rolling window calculations. Useful for calculating moving averages, standard deviations, etc.

In [None]:
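# Calculate the rolling mean over 3 rows (3 days here, since the data is daily);
# the first two rows are NaN because their windows are incomplete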
ts_df['rolling_mean_3D'] = ts_df['data'].rolling(window=3).mean()
ts_df
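
An integer window counts rows, not time. The sketch below contrasts it with `min_periods` and with an offset-based window such as `'3D'`, which requires a datetime index. The Series `s` is a small illustrative example.

```python
import pandas as pd

idx = pd.date_range('2023-01-01', periods=6, freq='D')
s = pd.Series([0, 1, 2, 3, 4, 5], index=idx, dtype='float64')

# Fixed 3-row window: results start once 3 observations are available
fixed = s.rolling(window=3).mean()

# Allow partial windows at the start instead of producing NaN
partial = s.rolling(window=3, min_periods=1).mean()

# Time-based window: all rows within the 3 days ending at each timestamp
by_time = s.rolling('3D').mean()

print(pd.DataFrame({'fixed': fixed, 'partial': partial, 'by_time': by_time}))
```

On regularly spaced daily data the offset window matches the `min_periods=1` result; the two differ once the index has gaps, because the offset window then covers fewer rows.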

### Time Zones
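Pandas distinguishes timezone-naive timestamps (no zone attached) from timezone-aware ones. Use `tz_localize()` to attach a zone to naive data and `tz_convert()` to express aware data in another zone.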

In [None]:
# Create a timezone-naive index
idx_naive = pd.date_range('2023-01-01 09:00:00', periods=3, freq='h')
ts_naive = pd.Series([1, 2, 3], index=idx_naive)
ts_naive

In [None]:
# Localize to a specific timezone (e.g., UTC)
ts_utc = ts_naive.tz_localize('UTC')
ts_utc

In [None]:
# Convert to another timezone
ts_est = ts_utc.tz_convert('US/Eastern')
ts_est
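
Converting between zones changes the displayed wall-clock time but not the underlying instant. A minimal sketch of that property follows; it uses `America/New_York`, the IANA name for the zone that `US/Eastern` aliases, and the variable names are illustrative.

```python
import pandas as pd

idx = pd.date_range('2023-01-01 09:00:00', periods=3, freq='h')
naive = pd.Series([1, 2, 3], index=idx)

utc = naive.tz_localize('UTC')
eastern = utc.tz_convert('America/New_York')

# Same moments in time, shown with different UTC offsets
print(utc.index[0])
print(eastern.index[0])

# Aware timestamps compare by instant, so the two indexes are element-wise equal
same_instants = bool((utc.index == eastern.index).all())
print(same_instants)
```

Note that `tz_localize()` only applies to naive data; once a Series is timezone-aware, use `tz_convert()` to move it to another zone.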