# 📓 Lesson 10: Time Series and Date-Based Analysis
📘 What you will learn:

In this lesson, you’ll learn how to:
1. Convert columns to datetime format
2. Extract date parts (year, month, day, hour)
3. Use resample() for time-based grouping
4. Use rolling() and expanding() for moving averages and trends
5. Decide when and why to use time series functions in real projects

## 🧠 Why is this useful?
Many datasets contain date and time information:
- Sales logs
- Website visits and server logs
- Sensor or IoT data
- Financial and stock prices

With data like this, you often need to group by month, see trends per day, or calculate a rolling average.

Understanding how to work with time helps you:
- Analyze trends (monthly, weekly, hourly)
- Detect patterns over time
- Forecast future behavior

## 🧪 Step 1: Load and prepare the dataset

In [None]:
import pandas as pd

df = pd.read_csv('../data/Sales_January_2019.csv')

# Convert columns
df['Quantity Ordered'] = pd.to_numeric(df['Quantity Ordered'], errors='coerce')
df['Price Each'] = pd.to_numeric(df['Price Each'], errors='coerce')
df['Order Date'] = pd.to_datetime(df['Order Date'], format='%m/%d/%y %H:%M', errors='coerce')

# Drop invalid rows
df = df.dropna(subset=['Quantity Ordered', 'Price Each', 'Order Date'])

# Add total column
df['Total Price'] = df['Quantity Ordered'] * df['Price Each']


💡 Tip: Why did we pass format='%m/%d/%y %H:%M' for Order Date?

Without a format argument, pd.to_datetime() has to guess the date format.

If the values do not follow one clear, consistent format (e.g. MM/DD/YY HH:MM), pandas may fall back to parsing each row separately with the dateutil library (the exact behavior depends on the pandas version), which:
- is slower
- may interpret rows inconsistently (for example, month-first for one row and day-first for another)
- and may convert some records incorrectly or incompletely
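
To make the ambiguity concrete, here is a minimal sketch using a made-up value (it is not taken from the sales file). The same string yields two different dates depending on which format you declare, which is why stating the format explicitly is safer than guessing:

```python
import pandas as pd

# Hypothetical sample value: is this March 4th or April 3rd?
raw = pd.Series(['03/04/19 13:45'])

# Month-first, as in the format used in this lesson
month_first = pd.to_datetime(raw, format='%m/%d/%y %H:%M')

# Day-first, as used by many non-US exports
day_first = pd.to_datetime(raw, format='%d/%m/%y %H:%M')

print(month_first.iloc[0])  # 2019-03-04 13:45:00
print(day_first.iloc[0])    # 2019-04-03 13:45:00
```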

📅 Common date formats:

| Example value    | Format string    |
| ---------------- | ---------------- |
| `01/22/19`       | `%m/%d/%y`       |
| `2023-12-31`     | `%Y-%m-%d`       |
| `31/12/2023`     | `%d/%m/%Y`       |
| `01/22/19 13:45` | `%m/%d/%y %H:%M` |
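
As a quick check of the table, the sketch below parses each example string with its format string. The last part uses a hypothetical stray text value to show what errors='coerce' does when a row does not match the format:

```python
import pandas as pd

# Each pair is (example string, format string) from the table above
examples = [
    ('01/22/19', '%m/%d/%y'),
    ('2023-12-31', '%Y-%m-%d'),
    ('31/12/2023', '%d/%m/%Y'),
    ('01/22/19 13:45', '%m/%d/%y %H:%M'),
]

parsed = [pd.to_datetime(text, format=fmt) for text, fmt in examples]
for (text, fmt), ts in zip(examples, parsed):
    print(f'{text!r} with {fmt!r} -> {ts}')

# A value that does not match the format becomes NaT when errors='coerce'
# ('Order Date' here is a made-up example of stray text in a date column)
mixed = pd.Series(['01/22/19 13:45', 'Order Date'])
coerced = pd.to_datetime(mixed, format='%m/%d/%y %H:%M', errors='coerce')
print(coerced.isna().sum())  # number of rows that failed to parse
```

Counting the NaT values before calling dropna() is a cheap way to see how many rows you are about to lose.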


## ⌚ Step 2: Extract time components
You can get year, month, day, hour, etc. from a datetime column:

In [None]:
# Extract month, day, hour, and weekday name
df['Month'] = df['Order Date'].dt.month
df['Day'] = df['Order Date'].dt.day
df['Hour'] = df['Order Date'].dt.hour
df['Weekday'] = df['Order Date'].dt.day_name()

print(df[['Order Date', 'Month', 'Day', 'Hour', 'Weekday']].head())

📌 Use these new columns to analyze seasonal trends, peak hours, etc.
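
One possible way to build on these columns is sketched below. The mini dataset is invented for illustration, and the "Is Weekend" and "Day Part" columns (including the bin edges) are assumptions, not part of the lesson's dataset:

```python
import pandas as pd

# Hypothetical mini dataset with the same column names as the lesson's df
sample = pd.DataFrame({
    'Order Date': pd.to_datetime([
        '2019-01-04 09:15',  # Friday
        '2019-01-05 19:30',  # Saturday
        '2019-01-06 19:05',  # Sunday
        '2019-01-07 12:00',  # Monday
    ]),
    'Total Price': [11.95, 150.00, 23.90, 600.00],
})

# dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday
sample['Is Weekend'] = sample['Order Date'].dt.dayofweek >= 5

# Bucket the hour into coarse parts of the day (bin edges are an arbitrary choice)
sample['Day Part'] = pd.cut(
    sample['Order Date'].dt.hour,
    bins=[0, 6, 12, 18, 24],
    labels=['Night', 'Morning', 'Afternoon', 'Evening'],
    right=False,  # intervals are [0, 6), [6, 12), [12, 18), [18, 24)
)

print(sample[['Order Date', 'Is Weekend', 'Day Part']])
```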

## 📅 Step 3: Group by date parts

In [None]:
# Total sales per month
print(df.groupby('Month')['Total Price'].sum())

Or per hour:

In [None]:
# Average value of an order line, by hour of day
print(df.groupby('Hour')['Total Price'].mean())
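
If you want more than one statistic per group, agg() can compute several at once. The sketch below uses invented numbers; only the column names "Hour" and "Total Price" come from the lesson:

```python
import pandas as pd

# Hypothetical rows; 'Hour' and 'Total Price' mirror the lesson's columns
sample = pd.DataFrame({
    'Hour': [9, 9, 12, 19, 19, 19],
    'Total Price': [10.0, 30.0, 500.0, 20.0, 40.0, 60.0],
})

# Several statistics per hour in one pass
hourly = sample.groupby('Hour')['Total Price'].agg(['count', 'sum', 'mean'])
print(hourly)

# The hour with the highest revenue is not always the hour with the most orders
busiest_by_revenue = hourly['sum'].idxmax()
busiest_by_orders = hourly['count'].idxmax()
print(busiest_by_revenue, busiest_by_orders)
```

Comparing sum and count side by side helps separate "many small orders" from "a few large ones".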

## 🔁 Step 4: Use resample() for time-based grouping
By default, resample() works on a datetime index, so set the datetime column as the index first:

In [None]:
df = df.set_index('Order Date')

# Resample daily sales (total sales per day)
daily_sales = df['Total Price'].resample('D').sum()
print(daily_sales.head())

📌 Common resample frequencies:

'D' = Daily, 'W' = Weekly, 'M' = Monthly, 'H' = Hourly, etc. In pandas 2.2 and later, 'M' and 'H' are deprecated in favor of 'ME' (month end) and 'h'.
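
The sketch below, built on a small made-up series, shows how the frequency changes the result and how resample() differs from a plain groupby on the date: resample() creates a row for every period in the range, even when no orders fall into it. Only 'D' and 'W' are used here, since those aliases behave the same across pandas versions:

```python
import pandas as pd

# Hypothetical orders; note that nothing was sold on 2019-01-03
orders = pd.Series(
    [10.0, 20.0, 5.0, 40.0, 25.0],
    index=pd.to_datetime([
        '2019-01-01 09:00',
        '2019-01-01 17:30',
        '2019-01-02 11:00',
        '2019-01-04 08:45',
        '2019-01-08 14:00',
    ]),
    name='Total Price',
)

# Daily totals: every calendar day in the range gets a row,
# so days without orders appear with a total of 0
daily = orders.resample('D').sum()
print(daily)

# Weekly totals: 'W' bins end on Sunday by default
weekly = orders.resample('W').sum()
print(weekly)

# groupby on the date only lists days that actually have rows
grouped = orders.groupby(orders.index.date).sum()
print(len(daily), len(grouped))
```

The gap-free daily index is what makes a later rolling(window=7) correspond to seven calendar days rather than seven rows that happen to exist.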

## 📈 Step 5: Use rolling() for moving averages
A moving average smooths out short-term noise and makes trends easier to see:

In [None]:
# 7-day moving average
rolling_avg = daily_sales.rolling(window=7).mean()

print(rolling_avg.head(10))

🧠 What is a Moving Average?

A Moving Average is the average of a fixed number of recent data points that slides forward through the dataset.

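🔸 window=7: average over the last 7 days (the current day and the 6 before it); the first 6 results are NaN because the window is not yet full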

🔸 Helps you see patterns more clearly than the raw, jumpy daily data
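
To see exactly what the window does, here is a small worked example on invented daily totals. It also shows the min_periods parameter of rolling(), which is not used in the lesson's code but is one way to avoid the leading NaN values:

```python
import pandas as pd

# Hypothetical daily totals for ten days
daily = pd.Series(
    [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    index=pd.date_range('2019-01-01', periods=10, freq='D'),
    dtype='float64',
)

# Default behavior: a full window of 7 values is required before a result appears
strict = daily.rolling(window=7).mean()

# min_periods=1 averages whatever is available so far
lenient = daily.rolling(window=7, min_periods=1).mean()

print(pd.DataFrame({'daily': daily, 'strict': strict, 'lenient': lenient}))

# The 7th value is just the plain mean of the first seven days
print(strict.iloc[6], daily.iloc[:7].mean())
```

Whether partial windows are acceptable depends on the analysis: they fill the start of the chart, but those early values are averages over fewer days.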

📈 It's commonly used in:
- Stock prices
- Website traffic
- Sales trends
- Sensor data

You can also plot the daily sales together with the moving average.

❗ Charts require the matplotlib package. Install it first if you don't have it:

In [None]:
%pip install matplotlib

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
daily_sales.plot(label='Daily Sales')
rolling_avg.plot(label='7-Day Avg')
plt.legend()
plt.title("Daily Sales and Moving Average")
plt.show()
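
The lesson goals also mention expanding(). Where rolling() looks at a fixed-size window, expanding() grows the window from the first row up to the current one, which gives running statistics. A minimal sketch on invented daily totals:

```python
import pandas as pd

# Hypothetical daily totals for five days
daily = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0],
    index=pd.date_range('2019-01-01', periods=5, freq='D'),
)

# Running average: each value is the mean of all days up to and including that day
running_mean = daily.expanding().mean()

# Running total: gives the same values as daily.cumsum()
running_total = daily.expanding().sum()

print(pd.DataFrame({
    'daily': daily,
    'running_mean': running_mean,
    'running_total': running_total,
}))
```

A running mean reacts more and more slowly as data accumulates, so it suits "average so far" questions, while a rolling mean is better for tracking recent changes.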