# Module 5.2: Time Series Forecasting

So far, we've dealt with data where the order of rows doesn't matter. But what if the data is a sequence of observations recorded over time? This could be daily stock prices, monthly sales figures, or hourly temperature readings. This is **Time Series Data**. 📈

**Time Series Forecasting** is the process of analyzing this historical data to identify meaningful patterns and using those patterns to predict future values. It's a critical field used in finance, weather forecasting, supply chain management, and many other industries.

**Goal of this Notebook:**
We will go deep into the theory and practical application of analyzing time series data. We will cover:

1.  **The Anatomy of a Time Series:** Understanding its core components (Trend, Seasonality, Residuals).
2.  **The Importance of the Datetime Index:** How to properly structure time series data in Pandas.
3.  **Time Series Decomposition:** A powerful technique to visually separate the components of our data.

## 1. The Anatomy of a Time Series

A time series can be broken down into three fundamental components. Understanding these is the key to effective forecasting.

* **Trend:** The long-term direction of the data. Is it generally increasing, decreasing, or staying flat over time? For example, the total number of internet users globally has a clear upward trend.

* **Seasonality:** A repeating, fixed, and periodic pattern in the data. For example, ice cream sales are consistently higher in the summer months and lower in the winter. This pattern repeats every year.

* **Residuals (or Noise):** The random, irregular fluctuations in the data that are not explained by the trend or seasonality. This is the unpredictable component.

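**The Model:** The simplest way to combine these parts is the **additive model**:

$$ \text{Time Series} = \text{Trend} + \text{Seasonality} + \text{Residuals} $$

This is the model we use for decomposition in section 3. When the seasonal swings grow in proportion to the level of the series, a multiplicative model ($\text{Trend} \times \text{Seasonality} \times \text{Residuals}$) is the usual alternative.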

Our goal in analysis is to isolate and understand each of these components.
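
To make the additive model concrete, here is a minimal sketch that builds a synthetic monthly series by literally adding the three parts together. The start date, trend slope, seasonal amplitude and noise level are arbitrary assumptions chosen for illustration; they are not properties of the milk data used below.

```python
import numpy as np
import pandas as pd

# Illustrative parameters only (assumptions, not taken from any real dataset)
rng = np.random.default_rng(seed=0)
index = pd.date_range("2000-01-01", periods=48, freq="MS")  # 4 years of month starts
t = np.arange(len(index))

trend = 100 + 0.5 * t                                         # steady upward drift
seasonality = 10 * np.sin(2 * np.pi * t / 12)                 # repeats every 12 months
residuals = rng.normal(loc=0.0, scale=2.0, size=len(index))   # random noise

# Additive model: the observed series is the sum of the three components
series = pd.Series(trend + seasonality + residuals, index=index, name="value")

print(series.head())
```

Decomposition works in the opposite direction: it starts from something like `series` and tries to recover `trend`, `seasonality` and `residuals`.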

### Dataset Setup

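We'll use a classic dataset of monthly milk production. It has both a clear upward trend and a strong yearly seasonality, which makes it a good teaching example. The excerpt below covers only the first two years (24 months). That is the minimum `seasonal_decompose` accepts for a 12-month cycle, so if you have a longer copy of the dataset, use it instead.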

➡️ **Action:** Go to the `02_Data_Analysis_and_Wrangling/data/` folder. Create a new file named `monthly_milk_production.csv` and paste the following content into it:

```csv
Date,Production
1962-01,589
1962-02,561
1962-03,640
1962-04,656
1962-05,727
1962-06,697
1962-07,640
1962-08,599
1962-09,568
1962-10,577
1962-11,553
1962-12,582
1963-01,600
1963-02,566
1963-03,653
1963-04,673
1963-05,742
1963-06,716
1963-07,660
1963-08,617
1963-09,583
1963-10,587
1963-11,565
1963-12,598
```

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Jupyter magic: render plots directly below each cell
%matplotlib inline
# Style name as used by Matplotlib 3.6+
plt.style.use('seaborn-v0_8-whitegrid')
```

## 2. The Datetime Index

**Theory:** For Pandas to recognize that we are working with time series data, the index of our DataFrame must be a special `DatetimeIndex` type. This unlocks Pandas' time-based functionality, such as resampling, rolling windows, and plotting with a properly formatted date axis.

When we load our CSV, the 'Date' column is read as plain text strings, not dates. We need to explicitly tell Pandas to parse it as dates and set it as the index.

```python
# Load the data
df = pd.read_csv('../02_Data_Analysis_and_Wrangling/data/monthly_milk_production.csv')

# Step 1: Convert the 'Date' column (e.g. "1962-01") to datetime objects
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m')

# Step 2: Set the 'Date' column as the index
df.set_index('Date', inplace=True)

print("--- DataFrame Info ---")
df.info()
print("\n--- First 5 Rows ---")
df.head()
```
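
The two steps above can also be folded into the `read_csv` call itself. The sketch below inlines the first three rows of the CSV with `io.StringIO` so it runs without the file on disk; with the real file you would pass the same path as above instead. The name `df_alt` is just for illustration.

```python
import io

import pandas as pd

# First three rows of monthly_milk_production.csv, inlined for a self-contained example
csv_text = """Date,Production
1962-01,589
1962-02,561
1962-03,640
"""

# parse_dates converts the column to datetimes; index_col makes it the index in the same call
df_alt = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(df_alt.index).__name__)  # DatetimeIndex
print(df_alt)
```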

```python
# With a DatetimeIndex, plotting is easy and the x-axis is automatically formatted.
df.plot(figsize=(12, 6))
plt.title('Monthly Milk Production')
plt.ylabel('Production')
plt.show()
```
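
Resampling and rolling windows, mentioned in the theory above, also work directly once the index is a `DatetimeIndex`. Here is a minimal sketch on the milk data; the 24 values are copied from the CSV so the block runs on its own, and the yearly aggregation and 12-month window are illustrative choices.

```python
import pandas as pd

# The 24 monthly values from monthly_milk_production.csv, copied inline
production = pd.Series(
    [589, 561, 640, 656, 727, 697, 640, 599, 568, 577, 553, 582,
     600, 566, 653, 673, 742, 716, 660, 617, 583, 587, 565, 598],
    index=pd.date_range("1962-01-01", periods=24, freq="MS"),
    name="Production",
)

# Resampling: aggregate the months of each calendar year ("YS" = year-start labels)
yearly_mean = production.resample("YS").mean()

# Rolling window: trailing 12-month average, which smooths out the seasonal swings
rolling_12 = production.rolling(window=12).mean()

print(yearly_mean)
print(rolling_12.dropna())
```

The rolling mean here is trailing (each value averages the current month and the 11 before it), so its first 11 values are `NaN`. The trend that `seasonal_decompose` estimates uses a centered window instead.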

## 3. Time Series Decomposition

**Theory:** Now that our data is properly formatted, we can use a statistical technique to break it down into the three components we discussed: Trend, Seasonality, and Residuals. This is called **decomposition**.

It helps us verify our initial assumptions from the plot and quantify the different patterns in the data. We'll use the `seasonal_decompose` function from the `statsmodels` library.

```python
# Decompose the time series
# The function returns an object with attributes for each component
# period=12: one seasonal cycle spans 12 monthly observations
decomposition = seasonal_decompose(df['Production'], model='additive', period=12)

# Plot the decomposed components
fig = decomposition.plot()
fig.set_size_inches(12, 8)
plt.show()
```
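
To show roughly what `seasonal_decompose` computes with `model='additive'`, here is a from-scratch sketch of classical additive decomposition using NumPy and Pandas. It follows the textbook recipe: a centered moving average for the trend, per-position averages of the detrended series for the seasonal part, and the remainder as residuals. The helper name `classical_additive_decompose` is hypothetical and it only handles an even period, so treat it as an illustration, not a replacement for statsmodels.

```python
import numpy as np
import pandas as pd


def classical_additive_decompose(series: pd.Series, period: int = 12) -> pd.DataFrame:
    """Hypothetical helper: classical additive decomposition for an even period."""
    x = series.to_numpy(dtype=float)
    n = len(x)
    if period % 2 != 0:
        raise ValueError("this sketch only handles an even period")
    if n < 2 * period:
        raise ValueError("need at least two full cycles of data")

    # 1. Trend: centered 2 x period moving average (half weight on both ends).
    #    The first and last period // 2 points have no full window and stay NaN.
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    half = period // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")

    # 2. Seasonal: average the detrended values at each position in the cycle
    #    (positions counted from the first observation, ignoring NaNs),
    #    then center them so one full cycle sums to zero.
    detrended = x - trend
    cycle_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle_means -= cycle_means.mean()
    seasonal = np.tile(cycle_means, n // period + 1)[:n]

    # 3. Residuals: what neither the trend nor the seasonal pattern explains.
    resid = detrended - seasonal

    return pd.DataFrame(
        {"observed": x, "trend": trend, "seasonal": seasonal, "resid": resid},
        index=series.index,
    )


# The 24 monthly values from monthly_milk_production.csv, copied inline
production = pd.Series(
    [589, 561, 640, 656, 727, 697, 640, 599, 568, 577, 553, 582,
     600, 566, 653, 673, 742, 716, 660, 617, 583, 587, 565, 598],
    index=pd.date_range("1962-01-01", periods=24, freq="MS"),
    name="Production",
)

parts = classical_additive_decompose(production, period=12)
print(parts.round(2))
```

On this 24-month excerpt each position in the cycle has only one non-missing detrended value, so the seasonal part absorbs it entirely and the residuals come out constant. If statsmodels is installed, comparing `parts["trend"]` with `decomposition.trend` is a quick way to check the sketch.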

**Interpretation of the Plots:**
* **Observed:** This is our original data.
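* **Trend:** The trend rises steadily, reflecting the long-term increase in milk production. It only covers the middle 12 months, because the centered 12-month moving average used to estimate it has no full window for the first and last six months.
* **Seasonal:** A strong, repeating yearly pattern. Production peaks in May and is lowest in November and February.
* **Resid:** The fluctuations left over after removing the trend and seasonal components. With only two years of data, each month of the cycle has a single detrended value, which the seasonal component absorbs completely, so the residuals here form a flat line (and are missing for the first and last six months). A longer series is needed to see genuinely random residuals.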
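
As a rough cross-check of the seasonal reading, you can average each calendar month of the raw data. This does not remove the trend, so it only approximates the seasonal shape; the values are again copied from the CSV so the block runs on its own.

```python
import pandas as pd

# The 24 monthly values from monthly_milk_production.csv, copied inline
production = pd.Series(
    [589, 561, 640, 656, 727, 697, 640, 599, 568, 577, 553, 582,
     600, 566, 653, 673, 742, 716, 660, 617, 583, 587, 565, 598],
    index=pd.date_range("1962-01-01", periods=24, freq="MS"),
    name="Production",
)

# Average each calendar month (1 = January ... 12 = December) across both years.
# The trend is NOT removed, so this is only a rough view of the seasonal pattern.
monthly_profile = production.groupby(production.index.month).mean()

print(monthly_profile)
print("Peak month:", monthly_profile.idxmax())    # 5 (May)
print("Lowest month:", monthly_profile.idxmin())  # 11 (November)
```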


## ✅ What's Next?

You now have a solid theoretical and practical understanding of how to handle and analyze time series data. You can identify its core components and use decomposition to isolate them.

This is the end of the **`05_Advanced_Topics`** module. The next logical step in a real project would be to use forecasting models (like ARIMA or Prophet) to predict future values based on these identified patterns.

Congratulations on completing all the core learning modules! You are now ready to apply these skills to the **`06_Portfolio_Projects`**.