# AR Simulation

#### Objective
- Forecast the next $k$ time steps using the testing data, repeating the forecast 10 times.
- Compare forecasting performance with and without normalization.
 
## Simulation Data

We simulate a time series dataset $\mathbf{X} = (\mathbf{X}_1, \dots, \mathbf{X}_n)^\top \in \mathbb{R}^{n \times L}$, where each series $\mathbf{X}_i = (X_{i1}, \dots, X_{iL})^\top$ follows the AR(1) process

$$
	X_{it} = \mu^0 + \phi X_{i(t-1)} + \epsilon_{it}	
$$

where
- $i = 1, \dots, n$
- $t = 1, \dots, L$
- $\epsilon_{it} \sim \mathcal{N}(0, \sigma^2)$ is independent of $X_{i(t-1)}$ for all $i, t$
- $\mathbf{X}_i$ and $\mathbf{X}_j$ are independent for all $i \neq j$
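
Because $|\phi| = 0.8 < 1$ under the settings below, the process is stationary. Assuming each series starts from (or has reached) its stationary distribution, which the setup above does not state explicitly, taking expectations of both sides, and taking variances using the independence of $\epsilon_{it}$ and $X_{i(t-1)}$, gives:

$$
\begin{aligned}
\mathbb{E}[X_{it}] &= \mu^0 + \phi\,\mathbb{E}[X_{i(t-1)}] &&\Longrightarrow\quad \mathbb{E}[X_{it}] = \frac{\mu^0}{1-\phi} = \frac{10}{1-0.8} = 50, \\
\operatorname{Var}(X_{it}) &= \phi^2 \operatorname{Var}(X_{i(t-1)}) + \sigma^2 &&\Longrightarrow\quad \operatorname{Var}(X_{it}) = \frac{\sigma^2}{1-\phi^2} = \frac{1}{1-0.64} \approx 2.78.
\end{aligned}
$$

So the raw series fluctuate around 50 with a standard deviation of about $1.67$, a scale far from the zero-mean, unit-variance scale produced by the normalization described below.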

## Simulation Settings

- Number of series: $n = 1024$
    - 80% for training data
    - 20% for testing data
- Length of each series: $L = 200$
- Intercept: $\mu^0 = 10$
- Coefficient: $\phi = 0.8$
- Variance of error term: $\sigma^2 = 1$
- Data splitting:
    - Training data: $\mathbf{X}_{\text{train}}$
        - $\mathbf{X}_{\text{train}} = \{\mathbf{X}_i\}_{i \in \mathcal{I}(\text{train})}$
    - Testing data: $\mathbf{X}_{\text{test}}$
        - $\mathbf{X}_{\text{test}} = \{\mathbf{X}_i\}_{i \in \mathcal{I}(\text{test})}$

- Diffusion sample size: 50
- Forecast horizon: $k = 24$

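
The data-generating process and settings above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the original simulation code: the initial value $X_{i0}$ (drawn here from the stationary distribution), the random 80/20 split, the rounding of $0.8n$ to an integer, and the seed are all assumptions, and the function names are hypothetical.

```python
import numpy as np

# Settings from the "Simulation Settings" list.
N_SERIES = 1024   # n
LENGTH = 200      # L
MU0 = 10.0        # intercept mu^0
PHI = 0.8         # AR coefficient phi
SIGMA2 = 1.0      # error variance sigma^2
TRAIN_FRAC = 0.8  # share of series used for training


def simulate_ar1(n, length, mu0, phi, sigma2, rng):
    """Simulate n independent AR(1) series X_it = mu0 + phi * X_i(t-1) + eps_it.

    Assumption: X_i0 is drawn from the stationary distribution
    N(mu0 / (1 - phi), sigma2 / (1 - phi**2)); the setup does not specify it.
    """
    sigma = np.sqrt(sigma2)
    x_prev = rng.normal(mu0 / (1 - phi), np.sqrt(sigma2 / (1 - phi**2)), size=n)
    X = np.empty((n, length))
    for t in range(length):
        # One AR(1) step for all n series at once; errors are i.i.d. N(0, sigma2).
        x_prev = mu0 + phi * x_prev + rng.normal(0.0, sigma, size=n)
        X[:, t] = x_prev
    return X


def split_indices(n, train_frac, rng):
    """Randomly partition series indices into I(train) and I(test) (assumed random split)."""
    perm = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return perm[:n_train], perm[n_train:]


rng = np.random.default_rng(0)
X = simulate_ar1(N_SERIES, LENGTH, MU0, PHI, SIGMA2, rng)
train_idx, test_idx = split_indices(N_SERIES, TRAIN_FRAC, rng)
X_train, X_test = X[train_idx], X[test_idx]
print(X.shape, X_train.shape, X_test.shape)  # (1024, 200) (819, 200) (205, 200)
```

Splitting by whole series, rather than by time points, keeps every test series unseen during training, which matches the index sets $\mathcal{I}(\text{train})$ and $\mathcal{I}(\text{test})$ defined above.
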
### Normalization for Data Preprocessing

The steps for normalization are as follows:

1. **Compute the Mean and Standard Deviation:**
    - Calculate the global mean ($\mu_g$) and global standard deviation ($\sigma_g$) across all data points in the training set:
  
    $$
    \mu_g = \frac{1}{n_{\text{train}} \cdot L} \sum_{i \in \mathcal{I}(\text{train})} \sum_{t=1}^{L} X_{it}
    $$
    $$
    \sigma_g = \sqrt{\frac{1}{n_{\text{train}} \cdot L} \sum_{i \in \mathcal{I}(\text{train})} \sum_{t=1}^{L} (X_{it} - \mu_g)^2}
    $$

2. **Normalize the Data:**
    - Subtract the global mean from each data point and divide by the global standard deviation:

    $$
    X'_{it} = \frac{X_{it} - \mu_g}{\sigma_g}
    $$
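
The two normalization steps above can be sketched in NumPy as follows. This is a minimal illustration, not the code used to produce the results below; reusing the training-set statistics on the test data and the `denormalize` inverse are assumptions added for completeness, and the toy arrays are hypothetical.

```python
import numpy as np


def fit_global_stats(X_train):
    """Step 1: global mean and standard deviation over all training points."""
    mu_g = X_train.mean()
    sigma_g = X_train.std()  # ddof=0 matches the 1 / (n_train * L) factor
    return mu_g, sigma_g


def normalize(X, mu_g, sigma_g):
    """Step 2: X'_it = (X_it - mu_g) / sigma_g, applied elementwise."""
    return (X - mu_g) / sigma_g


def denormalize(X_norm, mu_g, sigma_g):
    """Inverse of step 2 (assumed use: mapping forecasts back to the original scale)."""
    return X_norm * sigma_g + mu_g


# Toy example with hypothetical data (not the simulated series).
X_train = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
X_test = np.array([[7.0, 8.0, 9.0]])
mu_g, sigma_g = fit_global_stats(X_train)     # mu_g = 3.5
X_train_n = normalize(X_train, mu_g, sigma_g)
X_test_n = normalize(X_test, mu_g, sigma_g)   # reuse training statistics (assumption)
```

Computing $\mu_g$ and $\sigma_g$ from the training set only avoids leaking information from the test series into preprocessing.
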

### Metrics
Let $X_j$ denote an actual value and $\hat{X}_j$ the corresponding predicted value, for $j = 1, \dots, m$, where $m$ is the number of forecasted values being evaluated. We use the following two metrics:
- Mean Squared Error (MSE)

$$\text{MSE} = \frac{1}{m} \sum_{j=1}^{m} (X_j - \hat{X}_j)^2$$

- Mean Absolute Percentage Error (MAPE)

$$
\text{MAPE} = \frac{1}{m} \sum_{j=1}^{m} \left| \frac{X_j - \hat{X}_j}{X_j} \right|
$$
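
A minimal NumPy sketch of the two metrics, assuming actual and predicted values are passed as equal-length arrays. MAPE is returned as a fraction, matching the formula (no factor of 100). The function names `mse` and `mape` and the sample values are illustrative only.

```python
import numpy as np


def mse(actual, predicted):
    """Mean squared error over all m forecasted points."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((actual - predicted) ** 2)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; undefined if any actual value is 0."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual))


actual = [50.0, 40.0, 25.0, 100.0]
predicted = [45.0, 44.0, 25.0, 110.0]
print(mse(actual, predicted))   # 35.25
print(mape(actual, predicted))  # approximately 0.075
```

Because MAPE divides by $X_j$, it is undefined when an actual value is zero and can become very large when actual values are close to zero, whereas MSE depends directly on the scale of the data.
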
    


## Results

##### $N(0, 1)$

###### Overall Pattern
::::{grid}
::: {grid-item-card}
```{figure} artifact/ar1/normal_0_1_forecast.png
---
width: 100%
name: fig-forecast-0-1
---
Without normalization: mean trends of the true and generated series over the entire time series.
```
:::
::: {grid-item-card}
```{figure} artifact/ar1/global_normalization_normal_0_1_forecast.png
---
width: 100%
name: fig-global-forecast-0-1
---
With global normalization: mean trends of the true and generated series over the entire time series.
```
:::
::::
###### Last Missing Period


| Metric                         | Without Normalization | With Normalization |
|--------------------------------|-----------------------|--------------------|
| Mean Squared Error             | 0.568274              | 2.988              |
| Mean Absolute Percentage Error | 1.094803              | 2.157              |

::::{grid}
::: {grid-item-card}
```{figure} artifact/ar1/normal_0_1_forecast_only.png
---
width: 400px
name: fig-forecast-only-0-1
---
Without normalization: forecasted mean trend, with bands representing the standard deviation.
```
:::
::: {grid-item-card}
```{figure} artifact/ar1/global_normalization_normal_0_1_forecast_only.png
---
width: 400px
name: fig-global-forecast-only-0-1
---
With global normalization: forecasted mean trend, with bands representing the standard deviation.
```
:::
::::

##### $N(0, 100)$
###### Overall Pattern
::::{grid}
::: {grid-item-card}
```{figure} artifact/ar1/normal_0_10_forecast.png
---
width: 400px
name: fig-forecast-0-10
---
Without normalization: mean trends of the true and generated series over the entire time series.
```
:::
::: {grid-item-card}
```{figure} artifact/ar1/global_normalization_normal_0_10_forecast.png
---
width: 400px
name: fig-global-forecast-0-10
---
With global normalization: mean trends of the true and generated series over the entire time series.
```
:::
::::
###### Last Missing Period


| Metric                         | Without Normalization | With Normalization |
|--------------------------------|-----------------------|--------------------|
| Mean Squared Error             | 269.326               | 2.973090           |
| Mean Absolute Percentage Error | 5.730                 | 0.013571           |

::::{grid}
::: {grid-item-card}
```{figure} artifact/ar1/normal_0_10_forecast_only.png
---
width: 400px
name: fig-forecast-only-0-10
---
Without normalization: forecasted mean trend, with bands representing the standard deviation.
```
:::

::: {grid-item-card}
```{figure} artifact/ar1/global_normalization_normal_0_10_forecast_only.png
---
width: 400px
name: fig-forecast-only
---
With global normalization: forecasted mean trend, with bands representing the standard deviation.
```
:::
::::


##### $N(100, 1)$
###### Overall Pattern
::::{grid}
::: {grid-item-card}
```{figure} artifact/ar1/normal_100_1_forecast.png
---
width: 400px
name: fig-forecast-100-1
---
Without normalization: mean trends of the true and generated series over the entire time series.
```
:::
::: {grid-item-card}
```{figure} artifact/ar1/global_normalization_normal_100_1_forecast.png
---
width: 400px
name: fig-global-forecast-100-1
---
With global normalization: mean trends of the true and generated series over the entire time series.
```
:::
::::
###### Last Missing Period


| Metric                         | Without Normalization | With Normalization |
|--------------------------------|-----------------------|--------------------|
| Mean Squared Error             | 9017.191              | 2.973              |
| Mean Absolute Percentage Error | 0.950                 | 0.014              |

::::{grid}
::: {grid-item-card}
```{figure} artifact/ar1/normal_100_1_forecast_only.png
---
width: 400px
name: fig-forecast-only-100-1
---
Without normalization: forecasted mean trend, with bands representing the standard deviation.
```
:::
::: {grid-item-card}
```{figure} artifact/ar1/global_normalization_normal_100_1_forecast_only.png
---
width: 400px
name: fig-global-forecast-only-100-1
---
With global normalization: forecasted mean trend, with bands representing the standard deviation.
```
:::
::::