# What is Time Series Data and Time Series Analysis?

1. **Definition**
2. **Real-World Examples**
3. **Key Characteristics of Time Series Data**
4. **Goals of Time Series Analysis**
5. **Components**
   - Trend
   - Seasonality
   - Cyclic
   - Residuals



# Stationarity

1. **Why do we need stationarity**
2. **Types of Stationarity**  
   - Weak stationarity  
   - Strong stationarity  
3. **Testing for Weak Stationarity**
4. **Testing for Strict Stationarity**  
5. **Making Time Series Stationary**
   1. **Differencing**  
      - First order  
      - Second order  
   2. **Transformation**  
      - Logarithmic  
      - Power  
      - Box-Cox  
   3. **Detrending**  
      - Linear  
      - Moving Average  

6. **Seasonal Adjustment**


# Stationarity:
- In time series analysis, a stochastic process is stationary when its statistical properties (mean, variance, autocorrelation) remain constant over time.
- A stochastic process is a mathematical model of a system that evolves over time in a random, not fully predictable way.
- A stationary series has no trend or seasonality, which makes it easier to analyze and model.
   ![Stationarity](https://cdn-images-1.medium.com/max/1600/1*tkx0_wwQ2JT7pSlTeg4yzg.png)


1. **Why do we need stationarity:** Many statistical smoothing and time series modeling techniques assume that the data is stationary.

2. **Types of Stationarity**
   ### Strict Stationarity
   - The **joint probability distribution** of observations remains unchanged when shifted in time.
   - All statistical properties (**mean, variance, and higher-order moments such as skewness and kurtosis**) are constant over time.
   - It is a stricter condition, rarely observed in real-world data.

   ### Weak Stationarity (Wide-Sense Stationarity)
   - **Constant Mean:** The mean does not vary with time.
   - **Constant Variance:** The variance is fixed and does not change over time.
   - **Time-Invariant Autocovariance:** Autocovariance depends only on the lag, not the specific time points.

   **Note:** Weak stationarity is more practical and commonly used in time series analysis.

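3. **Weak Stationarity Tests:**
   - Augmented Dickey-Fuller (ADF) Test: the null hypothesis is a unit root (non-stationary).
   - KPSS Test: the null hypothesis is stationarity, so it complements ADF.
   - Phillips-Perron (PP) Test: a unit-root test like ADF, with a nonparametric correction for serial correlation.
   - ACF & PACF Plots: a visual check; a slowly decaying ACF suggests non-stationarity.
   - Ljung-Box Test: tests for remaining autocorrelation (whether a series behaves like white noise) rather than stationarity itself.
   - Variance Ratio Test: tests the random-walk hypothesis.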

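4. **Strict (Strong) Stationarity Tests:** No single standard test establishes strict stationarity; the following are indirect checks on the distribution or its stability over time.
   - Detrended Fluctuation Analysis (DFA): estimates long-range dependence (scaling behavior) in the series.
   - Jarque-Bera Test: tests normality via skewness and kurtosis; it can be applied to separate sub-periods.
   - Kolmogorov-Smirnov (KS) Test: the two-sample version compares the distributions of two segments of the series.
   - Cusum Test: detects structural change in the mean or in model parameters.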

5. **Making Time Series Stationary**

   a. **Differencing** makes a time series stationary by subtracting previous values from current ones.
      - **1st Order Differencing:** Subtracts value at time `t-1` from value at time `t` to remove linear trends.
      - **2nd Order Differencing:** Applies differencing again to remove quadratic trends or further non-stationarity.

      **Cons:**
      - Removes level information, so long-term trends and long-run relationships between series can be lost.
      - Does not address changing variance (high volatility) or structural breaks.

   b. **Transformations:**
      - **Logarithmic:** Applies the natural logarithm to the data to stabilize variance.
      - **Power:** Raises the data to a power (e.g., square root or cube root) to stabilize variance.
      - **Box-Cox:**
         * The Box-Cox transformation is a family of power transformations used to stabilize variance.
         * It makes data more normally distributed by applying a parameterized function to positive continuous data.
           - \( Y(\lambda) = \frac{Y^{\lambda} - 1}{\lambda} \),  for \( \lambda \neq 0 \)
           - \( Y(0) = \ln(Y) \),  for \( \lambda = 0 \)
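The Box-Cox formula above can be checked against `scipy.stats.boxcox`, which also estimates λ by maximum likelihood when none is supplied. This is a sketch on synthetic, strictly positive, right-skewed data; the lognormal sample and the seed are assumptions for illustration.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

rng = np.random.default_rng(1)
y = rng.lognormal(mean=2.0, sigma=0.6, size=300)   # strictly positive, right-skewed

# With no lambda given, boxcox returns the transformed data and the fitted lambda
y_bc, lam = stats.boxcox(y)

# The same transformation written out from the formula
manual = np.log(y) if lam == 0 else (y**lam - 1) / lam

# The transformation is invertible, so results can be mapped back to the original scale
y_back = inv_boxcox(y_bc, lam)

print("fitted lambda:", lam)
print("skewness before:", stats.skew(y), "after:", stats.skew(y_bc))
```

The logarithmic transformation is the special case λ = 0, and the square root corresponds to λ = 0.5 up to a shift and scale.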

   c. **Detrending**: It is used to remove trends from time series data. 
      - **Linear Detrending:** This method involves fitting a linear regression line to the data and subtracting the fitted values from the original data. This helps remove any linear trends.
      - **Moving Average Detrending:** This method involves calculating the moving average of the data over a specified window size. The moving average is then subtracted from the original data to remove any trends.
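Both detrending methods can be sketched with NumPy and pandas. The synthetic series (a linear trend plus noise), the seed, and the window size of 21 are assumptions for illustration, not recommended settings.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
t = np.arange(200)
y = pd.Series(0.5 * t + 10 + rng.normal(scale=2.0, size=t.size))

# Linear detrending: fit y = slope * t + intercept by least squares, subtract the fit
slope, intercept = np.polyfit(t, y.to_numpy(), deg=1)
linear_detrended = y - (slope * t + intercept)

# Moving-average detrending: subtract a centered rolling mean
window = 21  # illustrative window size
rolling_mean = y.rolling(window=window, center=True).mean()
ma_detrended = (y - rolling_mean).dropna()   # the edges have no full window

print("estimated slope:", slope)
print("points kept after moving-average detrending:", len(ma_detrended))
```

A centered window loses `window // 2` points at each end, which matters for short series.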


6. **Seasonal Adjustment**:
   - Removing the seasonal component from time series data helps isolate the underlying trend and irregular components.
   - STL decomposition can be used to estimate the seasonal component, which is then subtracted from the series.




In [None]:
! pip install yfinance

In [None]:
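import pandas as pd
import yfinance as yf

# Download data for a specific stock (e.g., Apple Inc.)
df = yf.download('AAPL', start='2020-01-01', end='2025-01-01')

# Some yfinance versions return two-level columns (field, ticker) even for a
# single ticker; keep only the field names so that df['Volume'] is a Series.
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)

# Display the first and last rows
print(df.head(), "\n\n\n", df.tail())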


# Time Series Decomposition

1. **Types of Decomposition Methods**
   - Additive
   - Multiplicative
2. **STL Decomposition**

## Time Series Decomposition
### There are 2 types of Decomposition:

#### Additive Seasonal Decomposition
- Assumes $Y_t = T_t + S_t + R_t$, with seasonal variation of constant size regardless of the trend level.
- Can work with negative or zero values.

1. Trend Calculation (T):
    * $T_t = \text{Moving Average of the time series at time } t$
    * Smooth the data using a moving average to capture the long-term trend.
2. De-trended Data (D):
    *  $D_t = Y_t - T_t$
    * Subtract the trend (T) from the original data (Y) to get the de-trended data.
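3. Seasonal Component (S):
    * $S_p = \frac{1}{N_p} \sum_{i=1}^{N_p} D_{p,i}$, where $D_{p,i}$ is the $i$-th de-trended value that falls in period $p$ (e.g., the $i$-th January)
    * Calculate the average of de-trended values for each period (e.g., the average of all January values) and use it as $S_t$ for every $t$ in that period.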
4. Residual Component (R):
    * $R_t = Y_t - T_t - S_t$
    * Subtract both the trend (T) and seasonal (S) components from the original data (Y).

#### Multiplicative Seasonal Decomposition 
- Assumes $Y_t = T_t \times S_t \times R_t$.
- Requires strictly positive data: the components are obtained by division, so zero or negative values make the ratios meaningless.
- Best when seasonal variation grows in proportion to the trend.

1. Trend Calculation (T): 
    * $T_t = \text{Moving Average of the time series at time } t$
    * Smooth the data using a moving average to capture the long-term trend.
2. De-trended Data (D): 
    * $D_t = \frac{Y_t}{T_t}$
    * Divide the original data (Y) by the trend (T) to get the de-trended data.
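3. Seasonal Component (S): 
    * $S_p = \frac{1}{N_p} \sum_{i=1}^{N_p} D_{p,i}$, where $D_{p,i}$ is the $i$-th de-trended value that falls in period $p$ (e.g., the $i$-th January)
    * Calculate the average of the de-trended data for each period, just like in additive decomposition, and use it as $S_t$ for every $t$ in that period.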
4. Residual Component (R): 
    * $R_t = \frac{Y_t}{T_t \cdot S_t}$
    * Divide the original data (Y) by both the trend (T) and seasonal (S) components to get the residuals.
    

<small>

##### Exceptional Cases:

- **Outliers**: Both models are sensitive to extreme outliers.
- **Non-stationary Data**: Models may struggle with data having a changing trend.
- **Short Time Series**: Both methods may not perform well with very few data points.

</small>

![Additive decomposition & Multiplicative decomposition](https://cf3.ppt-online.org/files3/slide/k/KB2ir9by6pTG3LU78tZquAENfman0HPCdQOIwF/slide-4.jpg)


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose


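# Additive Decomposition
# period=12 is an illustrative choice; for daily trading data, a period such as
# 5 (trading week) or 21 (trading month) may be more meaningful.
additive_decomposition = seasonal_decompose(df['Volume'], model='additive', period=12)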

# Plot Additive Decomposition
fig, axes = plt.subplots(4, 1, figsize=(10, 8))
additive_decomposition.observed.plot(ax=axes[0], title='Observed Data')
additive_decomposition.trend.plot(ax=axes[1], title='Trend Component')
additive_decomposition.seasonal.plot(ax=axes[2], title='Seasonal Component')
additive_decomposition.resid.plot(ax=axes[3], title='Residual Component')
plt.tight_layout()
plt.show()

# Multiplicative Decomposition
multiplicative_decomposition = seasonal_decompose(df['Volume'], model='multiplicative', period=12)

# Plot Multiplicative Decomposition
fig, axes = plt.subplots(4, 1, figsize=(10, 8))
multiplicative_decomposition.observed.plot(ax=axes[0], title='Observed Data')
multiplicative_decomposition.trend.plot(ax=axes[1], title='Trend Component')
multiplicative_decomposition.seasonal.plot(ax=axes[2], title='Seasonal Component')
multiplicative_decomposition.resid.plot(ax=axes[3], title='Residual Component')
plt.tight_layout()
plt.show()


<small>

## STL Decomposition Overview

- STL breaks a time series into **Trend**, **Seasonal**, and **Residual** components using the LOESS (Locally Estimated Scatterplot Smoothing) method.
- It is robust, flexible, and highly suitable for data with changing seasonality or outliers.
- STL decomposition is designed for additive data:

  \[
  Y_t = T_t + S_t + R_t
  \]

- For multiplicative data, apply a logarithmic transformation to convert it into an additive form:

  \[
  \log(Y_t) = \log(T_t) + \log(S_t) + \log(R_t)
  \]

  After STL decomposition, exponentiate the components to return to the original scale.

---

## STL Decomposition Process

#### 1. **Seasonal Component (S)**:
- STL uses LOESS to estimate the seasonal component.
- The seasonal pattern is smoothed across cycles (e.g., all Januaries, or all first quarters).
- LOESS smoothing allows the seasonal pattern to change over time, unlike classical additive decomposition, where it is fixed.

  \[
  S_t = \text{LOESS smoothing of the time series by seasonal position}
  \]

#### 2. **De-seasonalized Data (D)**:
- Subtracting the seasonal component gives the de-seasonalized data.
- The de-seasonalized data contains only the trend and residuals, without any seasonal influence.

  \[
  D_t = Y_t - S_t
  \]

#### 3. **Trend Component (T)**:
- LOESS is applied again to the de-seasonalized data to estimate the trend component.
- LOESS captures local variations in the trend, which makes it more adaptive than the simple moving average used in classical decomposition.

  \[
  T_t = \text{LOESS smoothing of } D_t
  \]

#### 4. **Residual Component (R)**:
- The residuals represent the noise or errors after extracting the trend and seasonal components.
- The residuals are calculated by subtracting both the trend and seasonal components from the original data.

  \[
  R_t = Y_t - T_t - S_t
  \]

---

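## Iterative Refinement
- STL repeats these steps iteratively (the inner loop), refining the seasonal and trend estimates on each pass.
- An optional outer loop computes robustness weights that down-weight observations with large residuals, so outliers have less influence on the trend and seasonal estimates.
- In `statsmodels`, this robust fitting is enabled with `robust=True`; outliers are not removed but end up in the residual component.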

</small>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL


# STL Decomposition
# 'seasonal' is the length of the seasonal LOESS smoother and must be odd;
# period=12 is an illustrative choice for this daily series.
stl = STL(df['Volume'], period=12, seasonal=13)
stl_result = stl.fit()

# Plot STL Decomposition
fig, axes = plt.subplots(4, 1, figsize=(10, 8))
stl_result.observed.plot(ax=axes[0], title='Observed Data')
stl_result.trend.plot(ax=axes[1], title='Trend Component')
stl_result.seasonal.plot(ax=axes[2], title='Seasonal Component')
stl_result.resid.plot(ax=axes[3], title='Residual Component')
plt.tight_layout()
plt.show()


# Stationarity

1. **Why do we need stationarity**
2. **Types of Stationarity**  
   - Weak stationarity  
   - Strong stationarity  
3. **Testing for Weak Stationarity**
4. **Testing for Strict Stationarity**  
5. **Making Time Series Stationary**
   1. **Differencing**  
      - First order  
      - Second order  
   2. **Transformation**  
      - Logarithmic  
      - Power  
      - Box-Cox  
   3. **Detrending**  
      - Linear  
      - Moving Average  

6. **Seasonal Adjustment**

7. **Choosing the Right Method**


### Stationarity:
- In time series analysis, a stochastic process is stationary when its statistical properties (mean, variance, autocorrelation) remain constant over time.
- A stochastic process is a mathematical model of a system that evolves over time in a random, not fully predictable way.
- A stationary series has no trend or seasonality, which makes it easier to analyze and model.
   ![Stationarity](https://cdn-images-1.medium.com/max/1600/1*tkx0_wwQ2JT7pSlTeg4yzg.png)


1. **Why do we need stationarity:** Many statistical smoothing and time series modeling techniques assume that the data is stationary.

2. **Types of Stationarity**
   ### Strict Stationarity
   - The **joint probability distribution** of observations remains unchanged when shifted in time.
   - All statistical properties (**mean, variance, and higher-order moments such as skewness and kurtosis**) are constant over time.
   - It is a stricter condition, rarely observed in real-world data.

   ### Weak Stationarity (Wide-Sense Stationarity)
   - **Constant Mean:** The mean does not vary with time.
   - **Constant Variance:** The variance is fixed and does not change over time.
   - **Time-Invariant Autocovariance:** Autocovariance depends only on the lag, not the specific time points.

   **Note:** Weak stationarity is more practical and commonly used in time series analysis.

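3. **Weak Stationarity Tests:**
   - Augmented Dickey-Fuller (ADF) Test: the null hypothesis is a unit root (non-stationary).
   - KPSS Test: the null hypothesis is stationarity, so it complements ADF.
   - Phillips-Perron (PP) Test: a unit-root test like ADF, with a nonparametric correction for serial correlation.
   - ACF & PACF Plots: a visual check; a slowly decaying ACF suggests non-stationarity.
   - Ljung-Box Test: tests for remaining autocorrelation (whether a series behaves like white noise) rather than stationarity itself.
   - Variance Ratio Test: tests the random-walk hypothesis.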

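4. **Strict (Strong) Stationarity Tests:** No single standard test establishes strict stationarity; the following are indirect checks on the distribution or its stability over time.
   - Detrended Fluctuation Analysis (DFA): estimates long-range dependence (scaling behavior) in the series.
   - Jarque-Bera Test: tests normality via skewness and kurtosis; it can be applied to separate sub-periods.
   - Kolmogorov-Smirnov (KS) Test: the two-sample version compares the distributions of two segments of the series.
   - Cusum Test: detects structural change in the mean or in model parameters.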

5. **Making Time Series Stationary**

   a. **Differencing** makes a time series stationary by subtracting previous values from current ones.
      - **1st Order Differencing:** Subtracts value at time `t-1` from value at time `t` to remove linear trends.
      - **2nd Order Differencing:** Applies differencing again to remove quadratic trends or further non-stationarity.

      **Cons:**
      - Removes level information, so long-term trends and long-run relationships between series can be lost.
      - Does not address changing variance (high volatility) or structural breaks.

   b. **Transformations:**
      - **Logarithmic:** Applies the natural logarithm to the data to stabilize variance.
      - **Power:** Raises the data to a power (e.g., square root or cube root) to stabilize variance.
      - **Box-Cox:**
         * The Box-Cox transformation is a family of power transformations used to stabilize variance.
         * It makes data more normally distributed by applying a parameterized function to positive continuous data.
           - \( Y(\lambda) = \frac{Y^{\lambda} - 1}{\lambda} \),  for \( \lambda \neq 0 \)
           - \( Y(0) = \ln(Y) \),  for \( \lambda = 0 \)
