
### Rolling Window Sequence
A rolling window sequence is a fixed-size window of consecutive time steps that moves ("rolls") across a time series. At each position, the window captures one segment of the data (for example, the last 30 cycles of sensor readings), which then serves as a single input to a model or calculation. The window advances by one or more time steps and always covers the same number of points, so every segment reflects recent context while preserving temporal order.

- Why do we generate rolling window sequences?

  - Sequence models such as LSTMs and GRUs require input shaped as fixed-length sequences rather than individual time points, so this step is essential before training them.

  - Rolling windows create these context-rich, fixed-size sequences from the continuous stream of data for each engine, capturing temporal dependencies and trends.

  - It allows models to learn from patterns that span multiple cycles, rather than isolated measurements.

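  - Rolling statistics from earlier steps (such as the `rollmean5` and `rollstd5` columns) still leave the data as one row per cycle. Window generation is the step that reshapes those rows into the `(num_sequences, window_size, num_features)` array used for model training.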
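
To make the idea concrete, here is a minimal sketch on a toy one-dimensional series. The helper `toy_windows` and its `step` parameter are illustrative names that do not appear elsewhere in this notebook; the six readings stand in for six cycles of a single sensor.

```python
import numpy as np

def toy_windows(values, window_size, step=1):
    """Return every run of `window_size` consecutive items, moving `step` items at a time."""
    values = np.asarray(values)
    starts = range(0, len(values) - window_size + 1, step)
    return np.array([values[s:s + window_size] for s in starts])

readings = np.arange(1, 7)  # stand-in for six cycles of one sensor: 1..6
windows = toy_windows(readings, window_size=3)
print(windows.shape)  # (4, 3)
print(windows)        # rows: [1 2 3], [2 3 4], [3 4 5], [4 5 6]
```

With `step=2` the same series would yield only the windows starting at positions 0 and 2, which is one way to reduce overlap between neighboring sequences.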

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [8]:
# Import necessary modules
import pandas as pd
import numpy as np

# Load the dataset
df = pd.read_csv('/content/drive/MyDrive/Dataset of ai/cmapss_feature_engineered_FD0010.csv')
df.head()

,engine_id,cycle,op_setting_1,op_setting_2,op_settings_3,sensor_1,sensor_2,sensor_3,sensor_4,sensor_5,...,sensor_17_rollmean5,sensor_17_rollstd5,sensor_18_rollmean5,sensor_18_rollstd5,sensor_19_rollmean5,sensor_19_rollstd5,sensor_20_rollmean5,sensor_20_rollstd5,sensor_21_rollmean5,sensor_21_rollstd5
0,1,1,-0.0007,-0.0004,100.0,518.67,641.82,1589.7,1400.6,14.62,...,392.0,,2388.0,,100.0,,39.06,,23.419,
1,1,2,0.0019,-0.0003,100.0,518.67,642.15,1591.82,1403.14,14.62,...,392.0,0.0,2388.0,0.0,100.0,0.0,39.03,0.042426,23.4213,0.003253
2,1,3,-0.0043,0.0003,100.0,518.67,642.35,1587.99,1404.2,14.62,...,391.333333,1.154701,2388.0,0.0,100.0,0.0,39.003333,0.055076,23.3956,0.044573
3,1,4,0.0007,0.0,100.0,518.67,642.35,1582.79,1401.87,14.62,...,391.5,1.0,2388.0,0.0,100.0,0.0,38.9725,0.076322,23.390175,0.037977
4,1,5,-0.0019,-0.0002,100.0,518.67,642.37,1582.85,1406.22,14.62,...,391.8,1.095445,2388.0,0.0,100.0,0.0,38.958,0.073621,23.39302,0.033498


In [9]:
# Columns to use as features: everything except the identifier columns.
# If a target column is added later, list it in exclude_cols as well.
exclude_cols = ['engine_id', 'cycle']
feature_cols = [col for col in df.columns if col not in exclude_cols]

# Sort data by engine_id and cycle to ensure correct temporal order
df = df.sort_values(['engine_id', 'cycle']).reset_index(drop=True)
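
If the DataFrame later gains a target or a non-numeric column, the list comprehension above would silently include it as a feature. The following is a hedged sketch of a stricter selector; `select_feature_columns`, the `RUL` target, and the `note` column are hypothetical and only serve the toy example.

```python
import pandas as pd

def select_feature_columns(frame, exclude=("engine_id", "cycle"), target=None):
    """Return numeric columns that are neither identifiers nor the target."""
    skipped = set(exclude)
    if target is not None:
        skipped.add(target)
    numeric_cols = frame.select_dtypes(include="number").columns
    return [col for col in numeric_cols if col not in skipped]

toy = pd.DataFrame({
    "engine_id": [1, 1],
    "cycle": [1, 2],
    "sensor_1": [518.67, 518.67],
    "RUL": [100, 99],      # hypothetical target column
    "note": ["a", "b"],    # non-numeric column, filtered out by select_dtypes
})
toy_features = select_feature_columns(toy, target="RUL")
print(toy_features)  # ['sensor_1']
```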


In [10]:
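def generate_rolling_windows(data, engine_col, features, window_size=30):
    """Build fixed-length rolling windows for each engine.

    Returns an array of shape (num_sequences, window_size, num_features)
    plus the engine ID and last cycle of every window. Assumes `data` is
    already sorted by engine and cycle. Engines with fewer than
    `window_size` rows contribute no windows.
    """
    sequences = []
    engine_ids = []
    cycle_ids = []

    for engine in data[engine_col].unique():
        engine_data = data[data[engine_col] == engine]
        engine_features = engine_data[features].values

        # Window ending at row i covers rows i - window_size + 1 .. i (inclusive)
        for i in range(window_size - 1, len(engine_data)):
            seq = engine_features[i - window_size + 1 : i + 1]
            sequences.append(seq)
            engine_ids.append(engine)
            # Row access upcasts mixed dtypes, so the cycle comes back as a float
            cycle_ids.append(engine_data.iloc[i]['cycle'])

    # Stack into a single 3D array for modeling
    return np.array(sequences), engine_ids, cycle_ids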
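
The nested loop above is easy to follow but copies one slice per window. As an alternative, the sketch below uses NumPy's `sliding_window_view` to build all windows of an engine at once. It is not the method used in this notebook; the function name `rolling_windows_vectorized`, the extra `cycle_col` parameter, and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_windows_vectorized(data, engine_col, cycle_col, features, window_size=30):
    """Sketch of a vectorized variant; assumes `data` is sorted by engine and cycle."""
    all_windows, all_meta = [], []
    for engine, group in data.groupby(engine_col, sort=False):
        if len(group) < window_size:
            continue  # sliding_window_view raises if the window exceeds the input
        values = group[features].to_numpy()
        # The view has shape (n - window_size + 1, num_features, window_size);
        # transpose it to (n - window_size + 1, window_size, num_features).
        windows = sliding_window_view(values, window_size, axis=0).transpose(0, 2, 1)
        all_windows.append(windows)
        all_meta.append(pd.DataFrame({
            engine_col: engine,
            cycle_col: group[cycle_col].to_numpy()[window_size - 1:],
        }))
    if not all_windows:
        empty = np.empty((0, window_size, len(features)))
        return empty, pd.DataFrame(columns=[engine_col, cycle_col])
    return np.concatenate(all_windows), pd.concat(all_meta, ignore_index=True)

toy = pd.DataFrame({
    "engine_id": [1, 1, 1, 1, 1, 2, 2],
    "cycle":     [1, 2, 3, 4, 5, 1, 2],
    "a": [10.0, 11.0, 12.0, 13.0, 14.0, 20.0, 21.0],
    "b": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
})
toy_seqs, toy_meta = rolling_windows_vectorized(
    toy, "engine_id", "cycle", ["a", "b"], window_size=3
)
print(toy_seqs.shape)              # (3, 3, 2): engine 2 is too short for a window
print(toy_meta["cycle"].tolist())  # [3, 4, 5]
```

Returning the metadata as a DataFrame keeps the cycle column in its original integer dtype, which may be convenient when joining back to the source data.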


In [11]:
window_size = 30  # Typical rolling window length; adjust as needed
sequences, engine_ids, cycle_ids = generate_rolling_windows(df, 'engine_id', feature_cols, window_size)

print("Shape of rolling window sequences:", sequences.shape)  # (num_sequences, window_size, num_features)
print("Example sequence shape:", sequences[0].shape)


Shape of rolling window sequences: (17731, 30, 66)
Example sequence shape: (30, 66)
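
The first dimension of that shape can be checked independently. With a step of one cycle, an engine with `n` rows yields `max(n - window_size + 1, 0)` windows, and the total is the sum over engines. A small sketch of that arithmetic follows; `expected_window_count` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd

def expected_window_count(data, engine_col, window_size):
    """Sum max(n - window_size + 1, 0) over the row count n of each engine."""
    lengths = data.groupby(engine_col).size()
    return int((lengths - window_size + 1).clip(lower=0).sum())

# Toy engines with 5, 2, and 3 cycles respectively
toy = pd.DataFrame({"engine_id": [1] * 5 + [2] * 2 + [3] * 3})
toy_count = expected_window_count(toy, "engine_id", window_size=3)
print(toy_count)  # 3 + 0 + 1 = 4
```

Under the same assumptions, comparing `expected_window_count(df, 'engine_id', window_size)` with `sequences.shape[0]` would be a quick consistency check on the real data.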


In [12]:
# Print the first sequence info
print(f"Engine ID: {engine_ids[0]}, Cycle: {cycle_ids[0]}")
print("Sequence data for first time window (shape {}):".format(sequences[0].shape))
print(sequences[0])


Engine ID: 1, Cycle: 30.0
Sequence data for first time window (shape (30, 66)):
[[-7.00000000e-04 -4.00000000e-04  1.00000000e+02 ...             nan
   2.34190000e+01             nan]
 [ 1.90000000e-03 -3.00000000e-04  1.00000000e+02 ...  4.24264069e-02
   2.34213000e+01  3.25269119e-03]
 [-4.30000000e-03  3.00000000e-04  1.00000000e+02 ...  5.50757055e-02
   2.33956000e+01  4.45730860e-02]
 ...
 [-2.40000000e-03  5.00000000e-04  1.00000000e+02 ...  6.14003257e-02
   2.33889200e+01  6.52350136e-02]
 [ 1.20000000e-03 -1.00000000e-04  1.00000000e+02 ...  5.84807661e-02
   2.33869400e+01  6.61123513e-02]
 [-2.20000000e-03  0.00000000e+00  1.00000000e+02 ...  7.42967025e-02
   2.33833800e+01  6.37311305e-02]]
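
The printed window contains `nan` entries in its first row. That row is cycle 1, where the `rollstd5` columns are empty in the `df.head()` output. Most sequence models cannot train on NaN inputs, so these values usually need handling before modeling. One option, sketched here with a hypothetical helper `drop_nan_windows` and toy data, is to discard every window that contains a NaN; filling the missing values before windowing is another option and is a modeling choice this notebook does not make.

```python
import numpy as np

def drop_nan_windows(sequences, engine_ids, cycle_ids):
    """Keep only windows without any NaN, filtering the metadata in step."""
    keep = ~np.isnan(sequences).any(axis=(1, 2))
    return sequences[keep], np.asarray(engine_ids)[keep], np.asarray(cycle_ids)[keep]

# Three toy windows of shape (2, 2); only the first one contains a NaN
toy_sequences = np.array([
    [[1.0, np.nan], [2.0, 0.1]],
    [[2.0, 0.1], [3.0, 0.2]],
    [[3.0, 0.2], [4.0, 0.3]],
])
clean, clean_engines, clean_cycles = drop_nan_windows(toy_sequences, [1, 1, 1], [2, 3, 4])
print(np.isnan(toy_sequences).sum())  # 1
print(clean.shape)                    # (2, 2, 2)
print(clean_cycles.tolist())          # [3, 4]
```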


In [13]:
# Check that sequence length matches window size
assert sequences.shape[1] == window_size, "Sequence window length mismatch"

# Check that cycles strictly increase within each engine
assert all(
    cycle_ids[i] > cycle_ids[i - 1] or engine_ids[i] != engine_ids[i - 1]
    for i in range(1, len(cycle_ids))
), "Cycle order violation"

print("Basic validation checks passed.")


Basic validation checks passed.
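
A further check, offered as a sketch, looks at the window contents instead of the metadata. When the window advances one cycle at a time, each window for an engine should share all but one row with the previous window. The helper `windows_overlap_correctly` is hypothetical and assumes a step of one.

```python
import numpy as np

def windows_overlap_correctly(sequences, engine_ids):
    """True if each window equals the previous one shifted by a single row (same engine only)."""
    for i in range(1, len(sequences)):
        if engine_ids[i] != engine_ids[i - 1]:
            continue  # the first window of a new engine has no predecessor to compare with
        if not np.array_equal(sequences[i][:-1], sequences[i - 1][1:], equal_nan=True):
            return False
    return True

rows = np.arange(10, dtype=float).reshape(5, 2)  # five toy cycles, two features
toy_sequences = np.array([rows[0:3], rows[1:4], rows[2:5]])
print(windows_overlap_correctly(toy_sequences, [1, 1, 1]))        # True
print(windows_overlap_correctly(toy_sequences[::-1], [1, 1, 1]))  # False
```

`equal_nan=True` treats NaN entries in matching positions as equal, so windows that contain missing values do not fail the check for that reason alone.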


In [15]:
# Save sequences and metadata for modeling
np.save('/content/drive/MyDrive/Dataset of ai/rolling_window_sequnces.npy', sequences)
pd.DataFrame({'engine_id': engine_ids, 'cycle': cycle_ids}).to_csv('/content/drive/MyDrive/Dataset of ai/sequence_metadata.csv', index=False)
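
When the saved files are loaded again for modeling, it is worth confirming that the array and the metadata still line up row for row. The sketch below shows the round trip with toy data in a temporary directory; the file names `sequences.npy` and `metadata.csv` are placeholders, not the paths used above.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy stand-ins for the saved sequences and their metadata
sequences_toy = np.random.default_rng(0).normal(size=(4, 3, 2))
meta_toy = pd.DataFrame({"engine_id": [1, 1, 2, 2], "cycle": [3, 4, 3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    seq_path = os.path.join(tmp, "sequences.npy")
    meta_path = os.path.join(tmp, "metadata.csv")
    np.save(seq_path, sequences_toy)
    meta_toy.to_csv(meta_path, index=False)

    loaded_seq = np.load(seq_path)
    loaded_meta = pd.read_csv(meta_path)

# One metadata row per window, and the values survive the round trip
assert len(loaded_seq) == len(loaded_meta)
assert np.array_equal(loaded_seq, sequences_toy)
print(loaded_seq.shape, loaded_meta.shape)  # (4, 3, 2) (4, 2)
```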