# How to prepare time series data for machine learning by creating a window dataset

- Creating a window dataset is a common technique for preparing time series data for machine learning. It involves splitting the data into overlapping segments (or windows) that capture the temporal relationships within the data.

## What is a window dataset?

- A window dataset breaks a time series into input-output pairs using a fixed "window size". Each window holds several consecutive time steps of the series as the input and, optionally, the next time step as the output. This makes the data suitable for machine learning algorithms, which typically expect fixed-size inputs.

## Steps to prepare a window dataset

1. Define the window size
- Decide how many consecutive time steps each input window will contain. For example, a window size of 3 means each input consists of 3 time steps.

2. Slide the window across the series
- Use a sliding window approach to generate overlapping input-output pairs:
- Inputs: the window of time steps
- Outputs: the value you want to predict (usually the next time step)

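3. Handle the train-test split
- Split the dataset into training, validation, and test sets in chronological order. To avoid data leakage, make sure no window in the validation or test set overlaps time steps used for training.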
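One way to keep the split leak-free is to cut the raw series by time before building any windows, so that no window straddles a boundary. The sketch below assumes a 1-D NumPy series; the helper name `chronological_split` and the default fractions are illustrative choices, not part of any library.

```python
import numpy as np

def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split a 1-D series into train/validation/test segments in time order.

    Hypothetical helper for illustration; the fractions are arbitrary defaults.
    """
    n = len(series)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    # No shuffling: earlier values go to training, later values to testing
    return series[:train_end], series[train_end:val_end], series[val_end:]

series = np.arange(100)
train, val, test = chronological_split(series)
print(len(train), len(val), len(test))
```

Windows can then be created separately within each segment, which guarantees that a training window never contains a value from the validation or test period.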

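4. Feature scaling (optional)
- Normalize or standardize the data to improve model performance. Compute the scaling parameters from the training set only, then apply them to the validation and test sets.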
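As a rough illustration of standardization, the sketch below computes the mean and standard deviation on a training segment and reuses them for the test segment. The arrays `train` and `test` are made-up sample values, and the code assumes the training data is not constant (otherwise the standard deviation would be zero).

```python
import numpy as np

# Made-up sample segments, already split in time order
train = np.array([10.0, 12.0, 14.0, 16.0])
test = np.array([18.0, 20.0])

# Scaling statistics come from the training segment only
mean = train.mean()
std = train.std()

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics keeps information about later values out of the training data, at the cost of test values possibly falling outside the range seen during training.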

5. Convert to arrays
- Machine learning models typically expect data in a structured format such as NumPy arrays, TensorFlow datasets, or PyTorch tensors.
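If the windows were collected as plain Python lists, a conversion along these lines turns them into NumPy arrays. This is a minimal sketch; the sample values and the choice of `float32` are assumptions made for illustration.

```python
import numpy as np

# Windows and targets collected as plain Python lists (sample values)
windows = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
targets = [4, 5, 6]

# float32 is a common choice for deep learning frameworks
X = np.asarray(windows, dtype=np.float32)
y = np.asarray(targets, dtype=np.float32)

print(X.shape, X.dtype)  # (3, 3) float32
print(y.shape, y.dtype)  # (3,) float32
```

From here, arrays like these can usually be handed to other frameworks, for example through `torch.from_numpy` in PyTorch or `tf.data.Dataset.from_tensor_slices` in TensorFlow.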

6. Data augmentation (optional)
- Augment the data by creating additional features such as:
- Lagged values (e.g., values from t-1, t-2)
- Moving averages or rolling statistics
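The sketch below shows one possible way to build lagged values and a moving average with plain NumPy. The variable names and the 3-step averaging window are illustrative assumptions. Rows are aligned to time steps t = 2 onward, the first steps for which both lags exist.

```python
import numpy as np

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Align everything to time steps t = 2, 3, 4, 5
current = series[2:]    # value at t
lag_1 = series[1:-1]    # value at t-1
lag_2 = series[:-2]     # value at t-2

# Mean of the three values ending at t (includes the value at t itself)
rolling_mean_3 = np.convolve(series, np.ones(3) / 3, mode="valid")

# One row per time step, one column per feature
features = np.column_stack([lag_1, lag_2, rolling_mean_3])
print(features.shape)  # (4, 3)
```

Because this rolling mean includes the value at t, it should only be used as an input when the target lies after t; otherwise the feature would leak the answer.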

7. Reshape for the model (optional)
- For certain models (e.g., recurrent neural networks), the input may need reshaping to include dimensions for samples, time steps, and features.
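For a univariate series, the reshaping step might look like the sketch below, which adds a trailing feature dimension of size 1. The `(samples, time steps, features)` layout is an assumption about what the model expects, so check it against the framework you use.

```python
import numpy as np

# 2-D window matrix: 3 samples, 3 time steps each
X = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5]])

# Add a trailing axis so each time step carries one feature
X_3d = X.reshape((X.shape[0], X.shape[1], 1))

print(X_3d.shape)  # (3, 3, 1)
```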

The following example applies steps 1 and 2 to a short series:

```python
import numpy as np

# Sample time series data
time_series = np.array([1, 2, 3, 4, 5, 6])

# Function to create windows
def create_window_dataset(data, window_size):
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i + window_size])  # Input window
        y.append(data[i + window_size])    # Next value (output)
    return np.array(X), np.array(y)

# Define window size
window_size = 3

# Create the window dataset
X, y = create_window_dataset(time_series, window_size)

# Output
print("Inputs (X):", X)
print("Outputs (y):", y)
```

```
Inputs (X): [[1 2 3]
 [2 3 4]
 [3 4 5]]
Outputs (y): [4 5 6]
```
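For longer series, the same input-output pairs can be produced without a Python loop. The sketch below is one alternative, assuming NumPy 1.20 or newer, where `numpy.lib.stride_tricks.sliding_window_view` is available. It builds windows of length `window_size + 1` and splits off the last column as the target.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

time_series = np.array([1, 2, 3, 4, 5, 6])
window_size = 3

# Each row holds window_size inputs followed by the next value
windows = sliding_window_view(time_series, window_size + 1)

X_vec = windows[:, :-1]  # first window_size values of each row
y_vec = windows[:, -1]   # last value of each row (the target)

print("Inputs (X):", X_vec)
print("Outputs (y):", y_vec)
```

The result is a read-only view into the original array, so call `.copy()` on it before modifying the windows in place.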
