# Python for Time-Series Data (Optional)

This notebook introduces time-series analysis in Python for researchers working with data that changes over time, such as energy use, weather, or occupancy logs.

## 1. Introduction

**Time-series data** is any data where each value is associated with a specific date and/or time.

**Examples:**
- Electricity consumption measured every hour
- Indoor temperature logs
- Survey responses with timestamps

Python's pandas library can filter, aggregate, and plot time-indexed data in a few lines of code, which is typically faster and easier to repeat than doing the same steps by hand in a spreadsheet.

## 2. Loading and Preparing the Data

Let's create a small sample dataset (in practice, you'd load your own CSV).

In [None]:
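import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data: hourly electricity use for 10 days (240 hours)
# 'h' is the hourly alias; the uppercase 'H' is deprecated since pandas 2.2
rng = pd.date_range('2023-01-01', periods=240, freq='h')
np.random.seed(0)  # fixed seed so the random sample is reproducible
usage = np.random.normal(loc=5, scale=2, size=len(rng))
df = pd.DataFrame({'timestamp': rng, 'electricity_kWh': usage})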

# Save to CSV for demonstration
df.to_csv('energy_data.csv', index=False)

# Load the data
df = pd.read_csv('energy_data.csv')
df.head()

In [None]:
# Convert the timestamp column to datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Set the datetime column as the index
df = df.set_index('timestamp')
df.head()
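The two steps above (convert the column, then set it as the index) can also be done while reading the file. The sketch below assumes a CSV with a `timestamp` column like the sample file; the inline `csv_text` is a made-up stand-in so the example runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as 'energy_data.csv'
csv_text = """timestamp,electricity_kWh
2023-01-01 00:00:00,4.8
2023-01-01 01:00:00,5.3
2023-01-01 02:00:00,6.1
"""

# Parse the timestamp column and use it as the index in one call
df_direct = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['timestamp'],
    index_col='timestamp',
)

print(type(df_direct.index).__name__)
print(df_direct.head())
```

With a real file, pass the file path in place of `io.StringIO(csv_text)`.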

## 3. Exploring and Visualizing Time-Series

Let's plot the data and zoom in to a specific time range.

In [None]:
# Plot the full time-series
df['electricity_kWh'].plot(figsize=(10,4), title='Electricity Use Over Time')
plt.ylabel('kWh')
plt.show()

In [None]:
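# Filter data for January 3rd
# Use .loc for row selection by date string; df['2023-01-03'] looks for a
# column with that name and raises KeyError in pandas 2.0 and later
df_jan3 = df.loc['2023-01-03']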
df_jan3['electricity_kWh'].plot(title='Electricity Use on Jan 3')
plt.ylabel('kWh')
plt.show()
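Selecting a single day extends naturally to date ranges and to times of day. The following is a minimal sketch on a made-up frame named `demo` (not the notebook's `df`), using label-based slicing with `.loc` and `DataFrame.between_time`.

```python
import numpy as np
import pandas as pd

# Illustrative data: 10 days of hourly values on a DatetimeIndex
idx = pd.date_range('2023-01-01', periods=240, freq='h')
demo = pd.DataFrame({'electricity_kWh': np.arange(240, dtype=float)}, index=idx)

# Date-string slices include both endpoints, so this covers all of
# January 3rd and all of January 4th
two_days = demo.loc['2023-01-03':'2023-01-04']

# Rows whose time of day falls between 18:00 and 22:00 (inclusive), on every day
evenings = demo.between_time('18:00', '22:00')

print(len(two_days), len(evenings))
```

Unlike ordinary Python slices, the end label of a `.loc` slice is included in the result.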

In [None]:
# Plot a 24-hour moving average (the first 23 values are NaN until the window fills)
df['moving_avg'] = df['electricity_kWh'].rolling(window=24).mean()
df[['electricity_kWh', 'moving_avg']].plot(figsize=(10,4), title='Electricity Use and 24h Moving Average')
plt.ylabel('kWh')
plt.show()
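A fixed window of 24 rows equals 24 hours only when the data is strictly hourly with no missing rows. As a hedged sketch on a made-up series `s`, the variants below show how `min_periods` and a time-based window such as `'24h'` change the result.

```python
import numpy as np
import pandas as pd

# Illustrative hourly series covering three days
idx = pd.date_range('2023-01-01', periods=72, freq='h')
s = pd.Series(np.arange(72, dtype=float), index=idx, name='electricity_kWh')

# Count-based window: needs 24 observations, so the first 23 results are NaN
fixed = s.rolling(window=24).mean()

# Same window, but accept partial windows at the start
partial = s.rolling(window=24, min_periods=1).mean()

# Time-based window: averages whatever falls in the preceding 24 hours,
# which keeps its meaning even if some rows are absent
time_based = s.rolling('24h').mean()

print(fixed.isna().sum(), partial.isna().sum(), time_based.isna().sum())
```

On regular hourly data the count-based and time-based windows agree once the window is full; they differ only at the start of the series and around gaps.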

## 4. Resampling and Aggregating

You can easily summarize time-series data by day, week, or month.

In [None]:
# Daily average electricity use
daily_avg = df['electricity_kWh'].resample('D').mean()
print(daily_avg.head())

# Weekly total electricity use
weekly_sum = df['electricity_kWh'].resample('W').sum()
print(weekly_sum.head())

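# Plot monthly totals (the 10-day sample yields a single bar; use data spanning several months)
# Note: pandas 2.2 and later prefer the month-end alias 'ME'; 'M' is deprecated there
monthly_sum = df['electricity_kWh'].resample('M').sum()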
monthly_sum.plot(kind='bar', title='Monthly Total Electricity Use')
plt.ylabel('kWh')
plt.show()
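When one statistic per period is not enough, `resample` can compute several at once with `.agg`. This is a small sketch on a made-up series `s`, not the notebook's data.

```python
import numpy as np
import pandas as pd

# Illustrative hourly series covering two days
idx = pd.date_range('2023-01-01', periods=48, freq='h')
s = pd.Series(np.arange(48, dtype=float), index=idx, name='electricity_kWh')

# One row per day, one column per statistic
daily_stats = s.resample('D').agg(['mean', 'min', 'max', 'sum'])

print(daily_stats)
```

The result is a DataFrame indexed by day, so it can be plotted or saved like any other table.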

## 5. Handling Missing or Gappy Data

Let's see how to find and fill missing values.

In [None]:
# Simulate missing data
df_missing = df.copy()
df_missing.iloc[10:20] = np.nan

# Count missing values in each column
print(df_missing.isnull().sum())

# Fill gaps with previous value (forward fill)
# .ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas
df_filled = df_missing.ffill()

# Or interpolate missing values
df_interp = df_missing.interpolate()

# Drop missing rows if needed
df_dropped = df_missing.dropna()
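Counting NaN values only finds readings that are present but empty. A logger outage often removes the rows altogether, so the timestamps themselves are missing. The sketch below, on a made-up series `s`, assumes the data should be strictly hourly and compares the index against the full expected range.

```python
import numpy as np
import pandas as pd

# Illustrative hourly series for one day
idx = pd.date_range('2023-01-01', periods=24, freq='h')
s = pd.Series(np.arange(24, dtype=float), index=idx, name='electricity_kWh')

# Simulate an outage by dropping three rows entirely (05:00 to 07:00)
gappy = s.drop(idx[5:8])

# Build the complete expected index and list the timestamps that are absent
expected = pd.date_range(gappy.index.min(), gappy.index.max(), freq='h')
missing_stamps = expected.difference(gappy.index)
print(missing_stamps)

# Reindex so the gaps become explicit NaN rows, then interpolate in time.
# limit=6 (an arbitrary choice here) avoids filling very long outages
regular = gappy.reindex(expected)
filled = regular.interpolate(method='time', limit=6)
```

Whether filling is appropriate depends on the analysis; for totals such as daily kWh, interpolated values can hide the fact that data was lost.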

## 6. Extracting Features from Timestamps

You can extract useful information from the datetime index, like hour of day or weekday.

In [None]:
# Add hour and weekday columns
df['hour'] = df.index.hour
df['weekday'] = df.index.dayofweek

# Average profile by hour of day
hourly_profile = df.groupby('hour')['electricity_kWh'].mean()
hourly_profile.plot(title='Average Electricity Use by Hour of Day')
plt.ylabel('kWh')
plt.show()

# Average use for each day of the week (0=Mon ... 6=Sun; 5 and 6 are the weekend)
weekday_profile = df.groupby('weekday')['electricity_kWh'].mean()
weekday_profile.plot(kind='bar', title='Average Use by Day of Week (0=Mon)')
plt.ylabel('kWh')
plt.show()
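To compare weekdays and weekends directly rather than each day separately, one option is a boolean flag derived from `dayofweek`. The sketch below uses a made-up frame `demo` with random values; the column names `is_weekend` and `hour` are illustrative choices.

```python
import numpy as np
import pandas as pd

# Illustrative data: 10 days of hourly values (2023-01-01 is a Sunday)
idx = pd.date_range('2023-01-01', periods=240, freq='h')
gen = np.random.default_rng(0)
demo = pd.DataFrame({'electricity_kWh': gen.normal(5, 2, len(idx))}, index=idx)

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 mark the weekend
demo['is_weekend'] = demo.index.dayofweek >= 5
demo['hour'] = demo.index.hour

# Overall mean for weekdays (False) and weekends (True)
weekend_vs_weekday = demo.groupby('is_weekend')['electricity_kWh'].mean()

# Hourly profile split by day type: 24 rows, one column per group
profile = demo.groupby(['hour', 'is_weekend'])['electricity_kWh'].mean().unstack()

print(weekend_vs_weekday)
print(profile.head())
```

Calling `profile.plot()` would draw the weekday and weekend profiles as two lines on the same axes.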

## 7. Saving Results

You can export your cleaned or aggregated data, and save plots for reports.

In [None]:
# Save daily averages to CSV
daily_avg.to_csv('daily_avg.csv')

# Save a plot as an image
fig, ax = plt.subplots()
df['electricity_kWh'].plot(ax=ax, title='Electricity Use Over Time')
ax.set_ylabel('kWh')
fig.savefig('electricity_timeseries.png')

## 8. Practice Task (Optional)

**Challenge:**
- Resample to weekly averages
- Plot a moving average
- Filter the data for January

## 9. Summary & Resources

**Key skills:**
- Convert strings to datetime
- Resample and aggregate time-series
- Plot and filter by date

### Further Reading
- [pandas time-series documentation](https://pandas.pydata.org/docs/user_guide/timeseries.html)
- [matplotlib documentation](https://matplotlib.org/stable/users/index.html)

With these basics, you can confidently analyze and visualize timestamped data in your research!