# PPG10_s3 - TIME SERIES



1. **Definition and components**  
   Explain what a time series is and list its main components. Provide a real-life example for each component.



# 1. **Definition and Components**  

## **What is a Time Series?**  
A **time series** is a sequence of data points observed at successive time intervals (daily, monthly, yearly, etc.). It is used in fields like finance, economics, meteorology, and signal processing.  

---

## **Main Components of a Time Series**  

1. **Trend**  
   - A long-term movement in the data that shows an overall increase or decrease.  
   - **Example:** The steady rise in global temperatures due to climate change.  

2. **Seasonality**  
   - Regular and predictable patterns that repeat over a fixed period.  
   - **Example:** Retail sales increase every December due to the holiday season.  

3. **Cyclic Patterns**  
   - Long-term fluctuations that occur at irregular intervals, influenced by economic, political, or social factors.  
   - **Example:** Stock market cycles that follow economic booms and recessions.  

4. **Irregular Variations**  
   - Unpredictable fluctuations caused by unexpected events.  
   - **Example:** A sudden drop in airline travel during the COVID-19 pandemic.  
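
The four components can be made concrete with a small synthetic series. The sketch below assumes an additive model (series = trend + seasonality + cycle + irregular) and uses made-up parameter values; the variable names are illustrative only, and a fixed 50-month sine wave is just a stand-in for a real cycle, whose length would not be constant.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the sketch is reproducible
t = np.arange(120)                    # 120 monthly observations (10 years)

trend = 0.5 * t                                  # long-term upward movement
seasonal = 10 * np.sin(2 * np.pi * t / 12)       # repeats exactly every 12 months
cycle = 5 * np.sin(2 * np.pi * t / 50)           # slower swing (simplified: real cycles are irregular)
irregular = rng.normal(scale=2.0, size=t.size)   # unpredictable noise

series = trend + seasonal + cycle + irregular    # additive composition (an assumption)
```

Plotting `series` next to each component shows how the observed values hide the individual pieces.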

2. **Stationarity**  
   What does it mean for a time series to be stationary? Mention two methods to test if a series is stationary.



* A time series is (weakly) stationary if its mean and variance are constant over time and its autocovariance depends only on the lag between two observations, not on the time at which they are observed.

* **Augmented Dickey-Fuller (ADF) Test** (a unit root test): It is the most widely used statistical test for stationarity. It tests the following hypotheses:

    • **Null Hypothesis (H0)**: The series has a unit root (it is non-stationary)
    
    • **Alternative Hypothesis (HA)**: The series is stationary
    
    • **p-value > 0.05**: Fail to reject H0; the series is treated as non-stationary
    
    • **p-value <= 0.05**: Reject H0 in favor of HA; the series is treated as stationary

* **Rolling Statistics (Moving Average & Variance)**

    Plot the rolling mean and variance over time.
    
    If they remain constant, the series is likely stationary.
    
    If they show trends or shifts, the series is non-stationary.
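
Both checks can be sketched with NumPy alone. Note the simplification: `dickey_fuller_stat` below is the plain Dickey-Fuller regression with a constant and no lagged differences, not the full augmented test, and it compares against the asymptotic 5% critical value of about -2.86 instead of computing a p-value. The function names and the two synthetic series are assumptions made for illustration.

```python
import numpy as np

def rolling_mean_var(x, window):
    """Rolling mean and variance over every full window of length `window`."""
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    return windows.mean(axis=1), windows.var(axis=1)

def dickey_fuller_stat(x):
    """t-statistic of rho in  dx_t = alpha + rho * x_{t-1} + e_t  (no lagged differences)."""
    dx = np.diff(x)
    X = np.column_stack([np.ones(dx.size), x[:-1]])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    sigma2 = resid @ resid / (dx.size - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
white_noise = rng.normal(size=500)              # stationary by construction
random_walk = np.cumsum(rng.normal(size=500))   # has a unit root by construction

CRITICAL_5PCT = -2.86  # asymptotic 5% Dickey-Fuller critical value (constant, no trend)

for name, x in [("white noise", white_noise), ("random walk", random_walk)]:
    mean, var = rolling_mean_var(x, window=50)
    spread = mean.max() - mean.min()
    stat = dickey_fuller_stat(x)
    print(f"{name}: rolling-mean spread = {spread:.2f}, DF statistic = {stat:.2f}")
```

A statistic below the critical value rejects the unit root. For white noise the statistic is strongly negative, while for a random walk it usually stays above the critical value and the rolling mean wanders.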


3. **AR and MA Models**  
   Briefly define AR(p) and MA(q) models. Which graph is used to determine the order of each model?



# 3. **AR and MA Models**  

## **Autoregressive Model (AR(p))**  
- Expresses the current value of a time series as a linear combination of its past values.  
- **Formula:**  
  \[
  X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + ... + \phi_p X_{t-p} + \epsilon_t
  \]
- **Example:** Predicting a stock's closing price based on past prices.  

## **Moving Average Model (MA(q))**  
- Expresses the current value of a time series as a linear combination of past error terms.  
- **Formula:**  
  \[
  X_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + ... + \theta_q \epsilon_{t-q}
  \]
- **Example:** The effect of economic news on stock prices dissipating over a few days.  

## **Which Graph is Used to Determine Model Order?**  
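- **Autocorrelation Function (ACF):** Used to determine \( q \) in an MA(q) model, because the ACF of an MA(q) process cuts off after lag \( q \).  
- **Partial Autocorrelation Function (PACF):** Used to determine \( p \) in an AR(p) model, because the PACF of an AR(p) process cuts off after lag \( p \).  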
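
The cut-off behaviour can be checked on simulated data. The sketch below is a minimal NumPy version under stated assumptions: `sample_acf` and `sample_pacf` are hypothetical helper names, the PACF is obtained with the Durbin-Levinson recursion, and the coefficients 0.8 and 0.7 are arbitrary.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:x.size - k]) / denom for k in range(max_lag + 1)])

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(x, max_lag)
    pacf = np.ones(max_lag + 1)
    phi = np.array([])                       # AR coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = rho[k] - np.dot(phi, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi, rho[1:k])
        phi_kk = num / den
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf

rng = np.random.default_rng(1)
n = 5000
eps = rng.normal(size=n + 1)

ma1 = eps[1:] + 0.8 * eps[:-1]        # MA(1) with theta_1 = 0.8

ar1 = np.zeros(n)                     # AR(1) with phi_1 = 0.7
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t - 1] + eps[t]

print("MA(1) ACF :", np.round(sample_acf(ma1, 4), 2))
print("AR(1) PACF:", np.round(sample_pacf(ar1, 4), 2))
```

In theory the MA(1) autocorrelation is \( \theta_1 / (1 + \theta_1^2) \approx 0.49 \) at lag 1 and zero afterwards, and the AR(1) partial autocorrelation is 0.7 at lag 1 and zero afterwards; the sample values should be close to these.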


4. **Trend**  
   Explain what "detrending" is and mention two different methods to remove the trend from a time series.



# 4. **Trend**  

## **What is Detrending?**  
**Detrending** is the process of removing the trend component from a time series to analyze underlying patterns such as seasonality and residual variations.  

## **Methods to Remove Trend**  

1. **Differencing**  
   - Computes the difference between consecutive observations:  
     \[
     Y_t = X_t - X_{t-1}
     \]
   - Used to convert non-stationary series into stationary ones.  
   - **Example:** Removing the trend in stock prices by calculating daily price changes.  

2. **Regression-Based Detrending**  
   - Fits a regression model (linear or polynomial) to estimate and remove the trend.  
   - **Example:** Removing a linear trend in temperature data over the years.  
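
Both methods take only a few lines in NumPy. This is a minimal sketch on a synthetic series (a straight line plus noise, with made-up slope and intercept), not a recipe for any particular dataset.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(200)
x = 3.0 + 0.5 * t + rng.normal(size=t.size)   # synthetic series: linear trend plus noise

# Method 1: first differencing, Y_t = X_t - X_{t-1} (one observation is lost)
differenced = np.diff(x)

# Method 2: fit a straight line by least squares and subtract it
slope, intercept = np.polyfit(t, x, deg=1)
detrended = x - (slope * t + intercept)
```

After differencing, the trend shows up only as a constant mean close to the slope. After regression-based detrending, the residuals fluctuate around zero and keep the original length.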

5. **Transformations**  
   Explain the Box-Cox transformation. When is it applied and what are its advantages in time series analysis?



# 5. **Transformations**  

## **Box-Cox Transformation**  
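The **Box-Cox transformation** is a family of power transformations, indexed by \( \lambda \), that stabilizes variance and brings the data closer to a normal distribution. It requires strictly positive values.  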

**Formula:**  
\[
X' =  
\begin{cases}  
\frac{X^\lambda - 1}{\lambda}, & \text{if } \lambda \neq 0  \\  
\ln(X), & \text{if } \lambda = 0  
\end{cases}
\]  

### **When is it Applied?**  
- When a time series has **heteroscedasticity** (non-constant variance).  
- When the data is **skewed** and needs normalization.  
- Before fitting **ARIMA models**, since differencing can remove a trend but not a variance that changes with the level of the series.  

### **Advantages**  
✅ Stabilizes variance.  
✅ Makes the data closer to a normal distribution.  
✅ Helps in achieving stationarity in variance, which forecasting models such as ARIMA assume.  
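
The formula translates directly into code. The sketch below assumes a fixed, user-chosen \( \lambda \) (in practice it is usually estimated from the data), and `box_cox` and `yearly_swing` are hypothetical helper names. The synthetic series has a seasonal swing that grows with its level.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

def yearly_swing(series):
    """Max minus min within each block of 12 observations."""
    blocks = series.reshape(-1, 12)
    return blocks.max(axis=1) - blocks.min(axis=1)

t = np.arange(120)
# Synthetic series with multiplicative seasonality: the swing grows with the level
x = np.exp(0.02 * t) * (1.0 + 0.2 * np.sin(2 * np.pi * t / 12))

raw_swing = yearly_swing(x)                 # grows year after year
log_swing = yearly_swing(box_cox(x, 0))     # constant after the lambda = 0 (log) transform
```

Comparing `raw_swing` with `log_swing` shows the variance-stabilizing effect: the seasonal amplitude no longer depends on the level of the series.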

6. **Comparison between ARIMA and SARIMA**  
   Describe the differences between an ARIMA model and a SARIMA model. Provide an example of when to use each one.



# 6. **Comparison Between ARIMA and SARIMA**  

| Feature  | ARIMA  | SARIMA  |
|----------|--------|---------|
| Seasonality Handling | ❌ No | ✅ Yes |
| Parameters | \( p, d, q \) | \( p, d, q, P, D, Q, m \) |
| When to Use | Non-seasonal time series | Seasonal time series |
| Example | Stock price prediction | Monthly sales forecasting |

## **Example Use Cases**  
✅ **ARIMA:** Predicting monthly inflation, which does not follow a strong seasonal pattern.  
✅ **SARIMA:** Predicting monthly electricity demand, which repeats a yearly seasonal pattern (\( m = 12 \)). A single SARIMA model handles one seasonal period.  
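
The difference is easiest to see in backshift notation, where \( B X_t = X_{t-1} \) (a symbol introduced here for convenience). With \( \phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p \) and \( \theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^q \), matching the AR and MA formulas above, the two models can be written as:

\[
\phi_p(B)\,(1 - B)^d X_t = \theta_q(B)\,\epsilon_t \quad \text{(ARIMA)}
\]

\[
\Phi_P(B^m)\,\phi_p(B)\,(1 - B)^d (1 - B^m)^D X_t = \theta_q(B)\,\Theta_Q(B^m)\,\epsilon_t \quad \text{(SARIMA)}
\]

SARIMA adds the seasonal difference \( (1 - B^m)^D \) and the seasonal polynomials \( \Phi_P \) and \( \Theta_Q \), which act on lags that are multiples of \( m \). Setting \( P = D = Q = 0 \) recovers ARIMA.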

7. **Autocorrelation**  
   What is the difference between autocorrelation and autocovariance? What does a high value in the ACF at early lags indicate?



# 7. **Autocorrelation**  

## **Autocorrelation vs. Autocovariance**  
- **Autocovariance** measures the linear dependence between a time series and a lagged copy of itself, expressed in the squared units of the data:  
  \[
  \gamma(h) = E[(X_t - \mu)(X_{t-h} - \mu)]
  \]
  where \( h \) is the lag, \( \mu \) is the mean, and \( X_t \) is the time series.  

- **Autocorrelation** is the normalized version of autocovariance, measuring the strength of the relationship on a scale of \([-1,1]\):  
  \[
  \rho(h) = \frac{\gamma(h)}{\gamma(0)}
  \]
  where \( \gamma(0) \) is the variance of the series.  

## **What Does a High ACF Value at Early Lags Indicate?**  
- A high ACF value at early lags suggests **strong short-term correlation**, meaning past values significantly influence future values.  
- If the ACF slowly decays, the series is likely **non-stationary** and may require **differencing**.  
- If ACF exhibits periodic spikes, the series has **seasonality**. 
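
The practical difference between the two quantities is scale. The sketch below uses sample estimates that divide by \( n \) (a common convention, assumed here) and a short made-up series; `autocovariance` and `autocorrelation` are hypothetical helper names.

```python
import numpy as np

def autocovariance(x, h):
    """Sample autocovariance at lag h (divides by n)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[h:] * d[:x.size - h]) / x.size

def autocorrelation(x, h):
    """Sample autocorrelation: autocovariance normalised by its lag-0 value."""
    return autocovariance(x, h) / autocovariance(x, 0)

x = np.array([2.0, 4.0, 6.0, 5.0, 3.0, 4.0, 7.0, 8.0])   # made-up values
scaled = 10 * x                                           # same series in different units

print(autocovariance(x, 1), autocovariance(scaled, 1))     # differ by a factor of 100
print(autocorrelation(x, 1), autocorrelation(scaled, 1))   # identical
```

Changing the units of the series changes the autocovariance but leaves the autocorrelation untouched, which is why the ACF is the quantity that gets plotted and compared across series.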

8. **Panel Data vs. Time Series**  
   Compare panel data with time series data. Explain how time behaves in each type and give an example of use for each.



# 8. **Panel Data vs. Time Series**  

| Feature  | Time Series  | Panel Data  |
|----------|-------------|-------------|
| Structure | Observations of a single entity over time | Observations of multiple entities over time |
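| Time Behavior | Time is the only index; observations are ordered and usually correlated with their own past | Time is one of two indexes (entity and time), so variation can be studied both across entities and over time |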
| Example | Monthly sales of a company over 10 years | Monthly sales of 100 companies over 10 years |

## **Example Use Cases**  
✅ **Time Series:** Forecasting GDP growth of the U.S. over the next decade.  
✅ **Panel Data:** Analyzing the impact of education policies on student performance across different states over time.  
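
In array terms the difference is one extra dimension. A minimal sketch with randomly generated (not real) sales figures, mirroring the table's example of 100 companies observed monthly for 10 years:

```python
import numpy as np

rng = np.random.default_rng(3)
n_months, n_companies = 120, 100

# Time series: one entity, indexed by time only -> shape (T,)
company_sales = rng.normal(loc=100, scale=10, size=n_months)

# Panel data: many entities, indexed by (entity, time) -> shape (N, T)
panel_sales = rng.normal(loc=100, scale=10, size=(n_companies, n_months))

one_company_over_time = panel_sales[0, :]     # a single time series inside the panel
all_companies_one_month = panel_sales[:, 0]   # a cross-section at one point in time
```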


9. **Decomposition and full modeling**  
   Given a series with trend, seasonality, and heteroskedasticity (e.g., monthly U.S. air passenger numbers from 1949–1960), explain how you would decompose it and what kind of model you would use to forecast it. Justify each step.



# 9. **Decomposition and Full Modeling**  

### **Given a Time Series with Trend, Seasonality, and Heteroskedasticity (e.g., U.S. Air Passenger Data 1949–1960), How to Model It?**  

### **Step 1: Decomposition**  
1. **Trend Extraction**  
   - Use a moving average or **low-pass filter** to remove short-term fluctuations and extract the trend.  
   - **Alternative:** Apply a **polynomial regression** to model the trend.  

2. **Seasonality Identification**  
   - Apply **STL (Seasonal and Trend decomposition using Loess)** to separate the seasonal component.  
   - Examine **Autocorrelation Function (ACF)** to detect seasonal patterns.  

3. **Heteroskedasticity Treatment**  
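   - Apply a **Box-Cox transformation** (a log transform is the \( \lambda = 0 \) case) to stabilize variance. In practice this is done before extracting the trend and seasonality, so that a seasonal swing that grows with the level becomes roughly constant.  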

---

### **Step 2: Model Selection**  
✅ **SARIMA** \((p, d, q) \times (P, D, Q)_m\) for trend + seasonality.  
✅ **GARCH (Generalized Autoregressive Conditional Heteroskedasticity)** to model variance that still changes over time in the residuals after the transformation.  

---

### **Step 3: Forecasting**  
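- **If the residuals of SARIMA on the transformed series look like white noise with constant variance:** Use **SARIMA** alone to predict future values, then invert the transformation.  
- **If the residual variance still changes over time:** Combine SARIMA + GARCH.  

**Justification:**  
- The Box-Cox transformation stabilizes the variance that grows with the level of the series.  
- SARIMA handles trend and seasonality through regular and seasonal differencing.  
- GARCH captures dynamic variance changes.  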
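
The decomposition step can be sketched with NumPy on a synthetic stand-in for the passenger series. Everything here is an assumption made for illustration: the data are generated, not the real dataset; the log transform is used as the \( \lambda = 0 \) Box-Cox case; and a classical centred moving-average decomposition replaces STL to keep the example dependency-free.

```python
import numpy as np

rng = np.random.default_rng(11)
n, m = 144, 12                                   # 12 years of monthly data
t = np.arange(n)
true_seasonal = 0.2 * np.sin(2 * np.pi * t / m)
# Synthetic multiplicative series with made-up parameters
y = np.exp(4.7 + 0.01 * t + true_seasonal + rng.normal(scale=0.02, size=n))

# 1) Log transform: multiplicative structure becomes additive, variance is stabilised
log_y = np.log(y)

# 2) Trend: centred 2x12 moving average (half weights at both ends)
weights = np.r_[0.5, np.ones(m - 1), 0.5] / m
trend = np.convolve(log_y, weights, mode="valid")     # covers t = 6 .. n-7
half = m // 2
detrended = log_y[half:n - half] - trend

# 3) Seasonality: average the detrended values month by month, then centre them
months = t[half:n - half] % m
seasonal = np.array([detrended[months == k].mean() for k in range(m)])
seasonal -= seasonal.mean()

# 4) Remainder: what is left after removing trend and seasonality
remainder = detrended - seasonal[months]
```

If `remainder` looks like noise with constant spread, a SARIMA model on `log_y` is a reasonable next step; forecasts are then mapped back with `np.exp`.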

10. **Model selection and validation**  
   You are given a non-stationary time series with a strong seasonal component. Explain how you would identify and validate a suitable SARIMA model. Include parameter selection criteria and model validation methods.

# 10. **Model Selection and Validation for a Non-Stationary Seasonal Time Series**  

## **Step 1: Identify the Need for a SARIMA Model**  
- **Check for Seasonality:**  
  - Use **ACF** to detect repeating seasonal patterns.  
  - Use **seasonal decomposition** (STL) to visualize trends and seasonality.  

- **Check for Stationarity:**  
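  - Apply the **Augmented Dickey-Fuller (ADF) test**:  
    - If it fails to reject non-stationarity, apply differencing (regular, seasonal, or both).  
  - Apply the **KPSS test** after differencing as a cross-check: its null hypothesis is stationarity, so a large p-value supports a stationary series.  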
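
Regular and seasonal differencing can be sketched on a toy series. The values below are made up: a linear trend plus a pattern that repeats exactly every 12 months, so the effect of each difference is easy to see.

```python
import numpy as np

m = 12
t = np.arange(10 * m)
monthly_pattern = np.array([5, 3, 0, -2, -4, -5, -4, -2, 0, 3, 5, 6], dtype=float)
# Synthetic series: linear trend plus a pattern that repeats every 12 months
x = 50.0 + 2.0 * t + monthly_pattern[t % m]

seasonal_diff = x[m:] - x[:-m]        # X_t - X_{t-12}: removes the repeating pattern (D = 1)
both_diff = np.diff(seasonal_diff)    # first difference on top: removes what is left of the trend (d = 1)
```

Here the seasonal difference leaves a constant (the trend gained over 12 months) and the additional regular difference leaves zeros. With \( d = 1 \) and \( D = 1 \), a total of \( m + 1 = 13 \) observations are lost.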

---

## **Step 2: Selecting SARIMA Parameters**  

A SARIMA model is defined as **SARIMA(p, d, q) × (P, D, Q, m)**, where:  
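- **(p, d, q):** Non-seasonal AR order, differencing order, and MA order.  
- **(P, D, Q, m):** Seasonal AR order, seasonal differencing order, seasonal MA order, and the number of periods per season (e.g., \( m = 12 \) for monthly data with a yearly pattern).  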

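- **Determine the orders as follows:**  
  - \( d \) and \( D \): the number of regular and seasonal differences needed before the ADF/KPSS tests indicate stationarity.  
  - \( p \) (from the PACF) and \( q \) (from the ACF) of the differenced series, looking at the first few lags.  
  - \( P \) and \( Q \): from spikes in the PACF and ACF at the seasonal lags \( m, 2m, \dots \).  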

---

## **Step 3: Model Validation**  

### **1. Goodness of Fit**  
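✅ Use the **Akaike Information Criterion (AIC)** and **Bayesian Information Criterion (BIC)** to compare candidate models; lower values are better.  
✅ Check the **residuals** with a **Ljung-Box test**: they should behave like white noise, so the test should not reject its null hypothesis of no autocorrelation.  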
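
The Ljung-Box statistic is simple enough to compute by hand. The sketch below is a minimal NumPy version on two simulated residual series; `ljung_box_q` is a hypothetical helper, and the statistic is compared with the 95% chi-square quantile for 10 degrees of freedom (18.307). For residuals of a fitted model, the degrees of freedom are usually reduced by the number of estimated AR and MA coefficients, which this sketch ignores.

```python
import numpy as np

def ljung_box_q(resid, lags):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..lags} rho_k^2 / (n - k)."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n = e.size
    denom = np.dot(e, e)
    rho = np.array([np.dot(e[k:], e[:n - k]) / denom for k in range(1, lags + 1)])
    return n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, lags + 1)))

CHI2_95_DF10 = 18.307   # 95% quantile of the chi-square distribution with 10 degrees of freedom

rng = np.random.default_rng(5)
white = rng.normal(size=1000)            # what good residuals should look like
correlated = np.zeros(1000)              # residuals with structure left unmodelled
for i in range(1, 1000):
    correlated[i] = 0.8 * correlated[i - 1] + white[i]

print("white noise Q:", round(ljung_box_q(white, 10), 1))
print("correlated  Q:", round(ljung_box_q(correlated, 10), 1))
```

A value of Q above the quantile rejects the hypothesis of no autocorrelation, which signals that the model has missed some structure.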

### **2. Forecast Evaluation**  
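✅ Compare actual vs. predicted values on a held-out test set using **Mean Absolute Error (MAE)**, **Mean Squared Error (MSE)**, and **Root Mean Squared Error (RMSE)**.  
✅ Perform time series **cross-validation** using rolling-window forecasting, always training on the past and testing on the observations that follow.  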
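
The error metrics and the cross-validation loop can be sketched as follows. Two assumptions to note: the training window expands at each fold (a rolling origin) instead of sliding with a fixed length, and `seasonal_naive` is only a placeholder forecaster standing in for a fitted SARIMA model. All function names are hypothetical.

```python
import numpy as np

def mae(actual, predicted):
    return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted)))

def mse(actual, predicted):
    return np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)

def rmse(actual, predicted):
    return np.sqrt(mse(actual, predicted))

def seasonal_naive(train, horizon, m):
    """Placeholder forecaster: repeat the last observed season of length m."""
    last_season = np.asarray(train)[-m:]
    return np.array([last_season[h % m] for h in range(horizon)])

def rolling_origin_rmse(x, initial, horizon, m):
    """Train on x[:end], forecast the next `horizon` points, and collect the RMSE of each fold."""
    scores = []
    for end in range(initial, len(x) - horizon + 1, horizon):
        forecast = seasonal_naive(x[:end], horizon, m)
        scores.append(rmse(x[end:end + horizon], forecast))
    return np.array(scores)

x = np.tile([10.0, 12.0, 15.0, 11.0], 10)   # toy quarterly series, perfectly periodic
scores = rolling_origin_rmse(x, initial=8, horizon=4, m=4)
```

Because the toy series repeats exactly, every fold has zero error. On real data, the average of `scores` gives a more honest estimate of forecast accuracy than a single train/test split.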

---

## **Step 4: Final Model Deployment**  
Once validated, the SARIMA model can be used for future forecasting.  

### **Note for Students:**

You must work **inside the `week10_s3` folder of your personal GitHub repository** for this activity.

At the end of the session, you are required to **commit and push your work** to your personal GitHub repository. Your submission must include the `PPG10_s3.ipynb` notebook with your answers.

Use a clear and descriptive commit message (e.g., `"PPG10_s3 submission"`).