The code below defines a Python function `apply_windowing` that takes a 2D numpy array `X` (time steps by features) and builds overlapping rolling windows from it, together with one target value per window. This is a common way to prepare time series data for sequence prediction tasks.

Here's a detailed explanation of the code:

1. Importing Libraries:
   - The code begins by importing `pandas` and `numpy`. The function itself uses only `numpy`; `pandas` is needed only for the example DataFrame further down.

2. Function Definition:
   - The `apply_windowing` function is defined with the following parameters:
     - `X`: A 2D numpy array in which rows are consecutive time steps and columns are features.
     - `initial_time_step`: The row index at which the first window starts.
     - `max_time_step`: The offset of the last window relative to `initial_time_step`; the function builds `max_time_step + 1` windows.
     - `window_size`: The number of consecutive rows in each window.
     - `target_idx`: The index of the target variable in the dataset.

3. Input Data Validation:
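   - The function validates its inputs with `assert` statements: `target_idx` must be a valid column index of `X`, `initial_time_step` must be non-negative, and `max_time_step` must be at least `initial_time_step`. It does not check that the windows stay within the bounds of `X`; a window index past the last row raises an `IndexError` when `X` is indexed.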

4. Window Creation:
   - The code builds a 2D index array `sub_windows` of shape `(max_time_step + 1, window_size)`. `np.arange(window_size)` gives the offsets within a window as a row vector, `np.arange(max_time_step + 1)` gives the window start positions as a column vector (via `np.expand_dims` and a transpose), and broadcasting adds the two together with `initial_time_step`. Row `i` of `sub_windows` therefore holds the row indices `initial_time_step + i` through `initial_time_step + i + window_size - 1`.

5. Data Slicing:
   - The function indexes `X` with `sub_windows` to create `X_temp`, a 3D array of shape `(max_time_step + 1, window_size, n_features)` that holds the rolling windows.
   - It also creates `y_temp`, a 1D array with one value per window taken from column `target_idx` of `X`. Each target is meant to be the value at the time step immediately after its window, so every window is paired with the next value to predict.

6. Handling Missing Values:
   - The code locates the non-NaN entries of `y_temp` and stores their positions in `idx_y_train_not_nan`. It does not filter anything out: the `assert` that follows requires every target to be present, so the function fails if any target value is NaN.

7. Handling NaN Values in Features:
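   - The code also computes `x_train_is_nan_idx`, the unique index values at which `X_temp` contains NaN. Because `np.unique` flattens the per-axis index arrays returned by `np.where`, the result mixes window, time step, and feature indices. The array is neither used nor returned, so NaN values in the features are left in `X_temp`.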

8. Return Values:
   - Finally, the function returns two values:
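     - `X_temp`: The rolling windows of feature data, with shape `(max_time_step + 1, window_size, n_features)`. The target column is included among the features.
     - `y_temp`: The corresponding target values, one per window.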

This function is designed to facilitate the preparation of data for sequence prediction tasks where rolling windows of data are used, and it handles the extraction of feature windows and corresponding target values.
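
The index construction from step 4 can be sketched in isolation. The snippet below is a minimal illustration, not part of the original notebook; the names `n_windows`, `offsets`, and `starts` are introduced here for readability, and the small array `X` is made-up data.

```python
import numpy as np

window_size = 3   # length of each window
n_windows = 4     # illustrative count; plays the role of max_time_step + 1
start = 0         # plays the role of initial_time_step

# Row vector of within-window offsets: shape (1, window_size)
offsets = np.expand_dims(np.arange(window_size), 0)
# Column vector of window start positions: shape (n_windows, 1)
starts = np.expand_dims(np.arange(n_windows), 0).T

# Broadcasting adds the two into an (n_windows, window_size) index grid
sub_windows = start + offsets + starts
print(sub_windows)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]

# Indexing a 2D array with this grid yields a 3D array of windows
X = np.arange(12).reshape(6, 2)
print(X[sub_windows].shape)  # (4, 3, 2)
```

Each row of the grid is one window's row indices, which is why indexing `X` with it adds a leading window axis.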

In [15]:
import pandas as pd
import numpy as np

def apply_windowing(X,
                    initial_time_step,
                    max_time_step,
                    window_size,
                    target_idx):

    assert target_idx >= 0 and target_idx < X.shape[1]
    assert initial_time_step >= 0
    assert max_time_step >= initial_time_step

    start = initial_time_step

    sub_windows = (
        start +
        np.expand_dims(np.arange(window_size), 0) +
        np.expand_dims(np.arange(max_time_step + 1), 0).T
    )

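    # Slice the windows, and take each window's target from the row that
    # immediately follows it. The target slice is offset by `start` so that
    # targets stay aligned with the windows when initial_time_step is not 0.
    X_temp = X[sub_windows]
    y_temp = X[start + window_size:
               start + max_time_step + window_size + 1, target_idx]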

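    # Every target must be present; fail fast if any is NaN.
    idx_y_train_not_nan = np.where(~np.isnan(y_temp))[0]
    assert len(idx_y_train_not_nan) == len(y_temp)

    # Index values where the feature windows contain NaN
    # (computed for inspection only; not used or returned).
    x_train_is_nan_idx = np.unique(np.where(np.isnan(X_temp)))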

    return X_temp, y_temp
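
If the goal is to know which windows contain NaN in their features, one possible approach is to reduce the NaN mask over the time step and feature axes. The sketch below is an assumption about the intent, not the original author's method; the array contents and the names `window_has_nan`, `nan_window_idx`, and `X_clean` are made up for illustration.

```python
import numpy as np

# Hypothetical batch of windows: (n_windows, window_size, n_features)
X_temp = np.arange(24, dtype=float).reshape(4, 3, 2)
X_temp[1, 0, 1] = np.nan   # inject a NaN into window 1
X_temp[3, 2, 0] = np.nan   # and another into window 3

# True for each window that contains at least one NaN
window_has_nan = np.isnan(X_temp).any(axis=(1, 2))
nan_window_idx = np.flatnonzero(window_has_nan)
print(nan_window_idx)  # [1 3]

# Keep only fully observed windows (one possible policy, not the original's)
X_clean = X_temp[~window_has_nan]
print(X_clean.shape)  # (2, 3, 2)
```

Dropping windows this way would also require dropping the matching entries of `y_temp` with the same boolean mask so that features and targets stay aligned.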

In [17]:
import pandas as pd
import numpy as np

# Sample multivariate time series data in a DataFrame
data = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'feature2': [10, 20, 30, 40, 50, 60, 70, 80, 90]
}, index=pd.date_range(start='2023-01-01', periods=9, freq='D'))

print(data)

            feature1  feature2
2023-01-01         1        10
2023-01-02         2        20
2023-01-03         3        30
2023-01-04         4        40
2023-01-05         5        50
2023-01-06         6        60
2023-01-07         7        70
2023-01-08         8        80
2023-01-09         9        90


In [18]:
window_size = 3
apply_windowing(
    X = data.to_numpy(), 
    initial_time_step = 0, 
    max_time_step = len(data) - window_size - 1, 
    window_size = window_size,
    target_idx = 1)

(array([[[ 1, 10],
         [ 2, 20],
         [ 3, 30]],
 
        [[ 2, 20],
         [ 3, 30],
         [ 4, 40]],
 
        [[ 3, 30],
         [ 4, 40],
         [ 5, 50]],
 
        [[ 4, 40],
         [ 5, 50],
         [ 6, 60]],
 
        [[ 5, 50],
         [ 6, 60],
         [ 7, 70]],
 
        [[ 6, 60],
         [ 7, 70],
         [ 8, 80]]]),
 array([40, 50, 60, 70, 80, 90]))
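
As a cross-check of the output above, the same windows can be rebuilt with `numpy.lib.stride_tricks.sliding_window_view`. This is an alternative technique, not what `apply_windowing` does internally, and it assumes NumPy 1.20 or later. The array `X` reproduces the sample data; `views`, `X_win`, and `y` are names chosen for this sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Same values as the sample DataFrame: shape (9, 2)
X = np.column_stack([np.arange(1, 10), np.arange(10, 100, 10)])
window_size = 3
target_idx = 1

# sliding_window_view puts the window axis last: shape (7, 2, 3)
views = sliding_window_view(X, window_shape=window_size, axis=0)
# Reorder to (n_windows, window_size, n_features) and drop the final
# window, which has no following row to serve as its target
X_win = views.transpose(0, 2, 1)[:-1]
y = X[window_size:, target_idx]

print(X_win.shape, y.shape)  # (6, 3, 2) (6,)
print(X_win[0].tolist())     # [[1, 10], [2, 20], [3, 30]]
print(y.tolist())            # [40, 50, 60, 70, 80, 90]
```

The shapes and values match the result printed above: six windows of three time steps each, paired with the `feature2` value that follows each window.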