IMPLEMENTATION
- LSTM/RNN FIRST, THEN TRANSFORMERS INSTEAD OF RNN
- MODIFY THE GENERATOR AND INTRODUCE A WASSERSTEIN LOSS + GRADIENT PENALTY
- IMPLEMENTATION IN THE ORIGINAL TIMEGAN CODE

In [1]:
#Import Data

#Import libraries
import scipy.io as sio
from sklearn.preprocessing import MinMaxScaler
import numpy as np
import glob


# Look for files named signals_for_GAN_XX.mat
data_dir = r"C:\Users\Dario\Desktop\ThesiS JBP\Data"  # the raw string (r"...") avoids problems with backslashes
file_list = sorted(glob.glob(f"{data_dir}/signals_for_GAN_*.mat"))

print(f"Found {len(file_list)} file(s):")
for f in file_list:
    print(f)

if len(file_list) == 0:
    raise FileNotFoundError("No files matching 'signals_for_GAN_*.mat' were found.")

# Concatenate the data vertically.
# The global data must be assembled before normalization.
all_data_concat = []
for file in file_list:
    mat = sio.loadmat(file)
    data = mat['data_all']  # [N x 6]
    all_data_concat.append(data)

data_global = np.vstack(all_data_concat)
print(f"✅ Concatenated data shape: {data_global.shape}")

# Check that every test has the same number of samples
shapes = [d.shape for d in all_data_concat]
print("Sizes per test:", shapes)
assert len({s[0] for s in shapes}) == 1, "❌ Tests have different lengths"
print("✅ All tests have the same length")

from lib.data_preprocess import load_data
# ------------------------------------------------------------
# 🔄 Create fixed-length sequences for TimeGAN
# ------------------------------------------------------------
seq_len = 256  # adjust according to the paper selected as reference
num_seq = data_global.shape[0] // seq_len

# truncate so the number of samples is an exact multiple of seq_len
data_global = data_global[:num_seq * seq_len]

# build the 3D sequences (num_seq, seq_len, features)
data_sequences = data_global.reshape(num_seq, seq_len, -1).astype(np.float32)

print(f"✅ Created {len(data_sequences)} sequences (each shape = {data_sequences[0].shape})")

# 💾 save for next runs
np.save("data_sequences.npy", data_sequences)
print("💾 Saved 'data_sequences.npy' for next runs.")

def safe_generation(model, num_samples, batch_size=64):
    """
    Generate synthetic data in small blocks to avoid index or memory errors.
    Compatible with the original TimeGAN code, which is left unmodified.
    """
    generated_all = []
    n_batches = (num_samples + batch_size - 1) // batch_size

    print(f"🧩 Generating {num_samples} samples in {n_batches} mini-batches of {batch_size}...")
    for b in range(n_batches):
        start = b * batch_size
        end = min((b + 1) * batch_size, num_samples)
        current_n = end - start

        # use the real window lengths for this block only
        model.T = model.ori_time[start:end]

        # generate the current block (uses the model's original method)
        gen_batch = model.generation(num_samples=current_n)
        generated_all.extend(gen_batch)

        print(f"  ✅ Block {b+1}/{n_batches} generated ({current_n} samples)")

    return generated_all





Found 1 file(s):
C:\Users\Dario\Desktop\ThesiS JBP\Data\signals_for_GAN_01.mat
✅ Concatenated data shape: (6144001, 6)
Sizes per test: [(6144001, 6)]
✅ All tests have the same length
✅ Created 24000 sequences (each shape = (256, 6))
💾 Saved 'data_sequences.npy' for next runs.


In [None]:

# =============================================
#  Full TimeGAN Training (All Phases)
# =============================================

import sys
sys.path.append('./')

import torch
import numpy as np
from options_TGAN import Options
from lib.TimeGAN import TimeGAN  # 👈 make sure you import the right class
from generation_TGAN import safe_generation
from visualization_TGAN import visualization

# ------------------------------------------------------------
# 1️⃣ Load your preprocessed rotor data
# ------------------------------------------------------------

data_sequences = np.load("data_sequences.npy", allow_pickle=True)
print(f"✅ Loaded {len(data_sequences)} sequences (each shape = {data_sequences[0].shape})")

# ------------------------------------------------------------
# 2️⃣ Parse the default options
# ------------------------------------------------------------
opt_parser = Options()
opt = opt_parser.parser.parse_args(args=[])
# ------------------------------------------------------------
# 3️⃣ Initialize and train the full TimeGAN
# ------------------------------------------------------------
print("\n🧩 Initializing TimeGAN model...")
model = TimeGAN(opt, ori_data=data_sequences)
print("✅ Model initialized successfully.\n")

print("🚀 Starting FULL TimeGAN training (all phases)...\n")
model.train()  # 👈 this automatically runs ER → S → Joint (G+D)
print("\n✅ Full TimeGAN training completed successfully.")

# ------------------------------------------------------------
# 4️⃣ Generate synthetic sequences
# ------------------------------------------------------------
print("\n🎨 Generating synthetic sequences...")
generated_data = safe_generation(model, num_samples=2000, batch_size=64)
print(f"✅ Generated {len(generated_data)} synthetic sequences.")

# ------------------------------------------------------------
# 5️⃣ Visualization (PCA + t-SNE)
# ------------------------------------------------------------
print("\n📊 Visualizing results (PCA & t-SNE)...")
visualization(data_sequences[:1000], generated_data[:1000], analysis='pca')
visualization(data_sequences[:1000], generated_data[:1000], analysis='tsne')

print("\n🎉 All phases completed successfully!")



In [None]:
# Modification 1:
# Discriminator: converted to a WGAN-GP critic. Spectral norm and the final sigmoid are removed, and LayerNorm is added.
# The goal is to avoid saturation and collapse in the discriminator. The standard TimeGAN discriminator uses sigmoid + BCE to classify real vs. fake, so it is a classifier, not a critic that regresses a score.
# Without the sigmoid the network outputs unbounded real-valued scores: it no longer classifies, it estimates the Wasserstein distance. Spectral norm is dropped and LayerNorm is added for stability.
# Motivation: BCE-based training tends to collapse, whereas WGAN-GP produces smoother gradients and more stable training, which suits vibration signals.
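
A minimal sketch of what such a critic could look like, under stated assumptions: the class name `Critic`, the hidden size of 24, and the choice to average the per-step scores into one score per sequence are illustrative and are not taken from the modified TimeGAN code. The random tensors only stand in for real and generated latent sequences.

```python
import torch
import torch.nn as nn


class Critic(nn.Module):
    """GRU critic that returns an unbounded score per sequence (no sigmoid)."""

    def __init__(self, hidden_dim: int, num_layers: int = 1):
        super().__init__()
        self.rnn = nn.GRU(hidden_dim, hidden_dim, num_layers, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)  # normalizes the features of each time step
        self.head = nn.Linear(hidden_dim, 1)  # raw score, no activation

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(h)            # (batch, seq_len, hidden_dim)
        out = self.norm(out)
        scores = self.head(out)         # (batch, seq_len, 1)
        return scores.mean(dim=(1, 2))  # assumption: one averaged score per sequence


torch.manual_seed(0)
critic = Critic(hidden_dim=24)
h_real = torch.randn(8, 256, 24)  # stand-in for real latent sequences
h_fake = torch.randn(8, 256, 24)  # stand-in for generated latent sequences

score_real = critic(h_real)
score_fake = critic(h_fake)

# Wasserstein critic loss (before the gradient penalty): push real scores up, fake scores down.
loss_critic = score_fake.mean() - score_real.mean()
# Generator loss: make the critic score generated sequences highly.
loss_gen = -score_fake.mean()
print(score_real.shape, loss_critic.item(), loss_gen.item())
```

Because the scores are unbounded, neither loss is passed through BCE; the sign convention above is the usual WGAN one.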

# Modification 2:
# No LSTM is introduced. Instead, LayerNorm is added to all five sub-networks (their forward functions are modified accordingly) to reduce instability and collapse, especially for vibration signals.
# The original TimeGAN uses plain GRUs without normalization, so activations vary widely; combined with noisy data and long sequences, this leads to collapse and instability.
# LayerNorm is therefore added after the GRU in the Encoder, Recovery, Generator, Supervisor and Discriminator. It normalizes the activations at every time step, which makes the GRUs more robust to real noise, smooths training and improves convergence.
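
The GRU + LayerNorm pattern described above could be sketched as one reusable block. This is an illustration under assumptions: `NormedGRU`, its `squash` flag and the layer sizes are hypothetical names and values, and the actual sub-networks in `lib` may be organized differently.

```python
import torch
import torch.nn as nn


class NormedGRU(nn.Module):
    """GRU followed by LayerNorm over the feature axis of every time step."""

    def __init__(self, input_dim, hidden_dim, output_dim, num_layers=1, squash=True):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, num_layers, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.squash = squash  # sigmoid output for data scaled to [0, 1]

    def forward(self, x):
        out, _ = self.rnn(x)   # (batch, seq_len, hidden_dim)
        out = self.norm(out)   # each time step is normalized independently
        out = self.fc(out)
        return torch.sigmoid(out) if self.squash else out


torch.manual_seed(0)
x = torch.rand(4, 256, 6)           # 4 windows, 256 steps, 6 channels
encoder = NormedGRU(6, 24, 24)      # data space -> latent space
recovery = NormedGRU(24, 24, 6)     # latent space -> data space

h = encoder(x)
x_tilde = recovery(h)
print(h.shape, x_tilde.shape)

# After LayerNorm (at initialization) every time step has roughly zero mean across features.
normed = encoder.norm(encoder.rnn(x)[0])
max_abs_mean = normed.mean(dim=-1).abs().max().item()
print(max_abs_mean)
```

The same block, with `squash=False` and a single output unit, would give a critic without a sigmoid.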
# 
# 
# Modification 3:
# Add a gradient penalty to TimeGAN and replace the discriminator's backward_d with the WGAN-GP loss.
# The original TimeGAN uses BCE, which scores the probability of real vs. fake; its outputs saturate toward 0 or 1, where the gradients become uninformative. WGAN is used instead, necessarily together with the gradient penalty, which enforces the critic's Lipschitz constraint.
# WGAN measures a distance between the real and generated distributions rather than probabilities. This helps avoid collapse, is more stable for continuous signals and gives smoother training.
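
A hedged sketch of the gradient penalty term follows. The function name `gradient_penalty`, the weight `lambda_gp = 10.0` and the linear `toy_critic` are assumptions made for illustration; the toy critic is chosen because its input gradient equals its weight vector, so the penalty can be checked by hand.

```python
import torch
import torch.nn as nn


def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP term: penalize critic gradient norms that deviate from 1."""
    batch = real.size(0)
    # One interpolation coefficient per sequence, broadcast over time and features.
    eps = torch.rand(batch, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).detach().requires_grad_(True)
    scores = critic(x_hat)
    grads, = torch.autograd.grad(
        outputs=scores.sum(),
        inputs=x_hat,
        create_graph=True,  # keep the graph so the penalty itself can be backpropagated
    )
    grad_norm = grads.reshape(batch, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()


torch.manual_seed(0)
seq_len, n_feat = 16, 6
# Toy linear critic (assumption, for checking only): its input gradient is its weight vector.
toy_critic = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * n_feat, 1))
real = torch.randn(8, seq_len, n_feat)
fake = torch.randn(8, seq_len, n_feat)

gp = gradient_penalty(toy_critic, real, fake)
loss_critic = toy_critic(fake).mean() - toy_critic(real).mean() + gp
loss_critic.backward()

expected = 10.0 * (toy_critic[1].weight.norm() - 1.0) ** 2
print(gp.item(), expected.item())
```

In TimeGAN the discriminator receives latent sequences, so `real` and `fake` would presumably be the real and generated latent codes. With a GRU critic on CUDA, the second-order gradient through cuDNN RNN kernels may be unavailable; computing the penalty inside `torch.backends.cudnn.flags(enabled=False)` is a common workaround.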

# Modification 4:
# WGAN training loop. The original TimeGAN trains G and D at a 1:1 ratio, which is a poor fit for WGAN because an undertrained critic passes poor gradients to the generator. In the new loop the critic is updated 5 times per generator update.
# A well-trained critic sends higher-quality gradients to the generator, so convergence is more stable.
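
The 5:1 update schedule could be sketched as follows. Everything here is a toy stand-in chosen for brevity: the MLP `gen` and `critic`, the flat 6-dimensional samples (instead of sequences), the Adam settings and the 20 iterations are assumptions, not values from the modified training loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_critic, lambda_gp = 5, 10.0  # 5 critic updates per generator update

gen = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 6))
critic = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4, betas=(0.5, 0.9))
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4, betas=(0.5, 0.9))
real_data = torch.randn(512, 6) * 0.5 + 2.0  # toy "real" samples


def penalty(real, fake):
    """Gradient penalty on random interpolations between real and fake samples."""
    eps = torch.rand(real.size(0), 1)
    x_hat = (eps * real + (1 - eps) * fake).detach().requires_grad_(True)
    grads, = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()


critic_steps = gen_steps = 0
for it in range(20):
    # Critic phase: several updates with the generator frozen.
    for _ in range(n_critic):
        real = real_data[torch.randint(0, 512, (64,))]
        fake = gen(torch.randn(64, 4)).detach()  # no gradient into the generator
        loss_c = critic(fake).mean() - critic(real).mean() + penalty(real, fake)
        opt_c.zero_grad()
        loss_c.backward()
        opt_c.step()
        critic_steps += 1

    # Generator phase: a single update against the freshly trained critic.
    loss_g = -critic(gen(torch.randn(64, 4))).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    gen_steps += 1

print(critic_steps, gen_steps)  # 100 20
```

In the joint phase of TimeGAN the generator update would also carry the supervised and moment losses; they are omitted here to keep the schedule visible.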


# Modification 5:
# Real-data loading in the preprocessing step now uses windows with 75% overlap. Vibration signals change quickly and are not very stable, so overlapping windows make sure no information is lost at window boundaries; papers typically use 50%, 75% or 90% overlap
# so that the model learns correctly. For example, with 1000 points, a window of 200 and a failure between samples 350 and 400, non-overlapping windows capture the event in a single window, whereas 75% overlap (stride 50) yields four windows that contain it entirely (starting at 200, 250, 300 and 350). There is therefore more relevant information, and transitions are captured better. This improves GAN training, increases the number of windows
# and avoids losing important parts of the signal at window edges.
# The original TimeGAN does not use overlapping, so it produces fewer windows and fewer transitions, and therefore a poorer representation of the vibration. With 75% overlap about 4 times more windows are generated, which capture more information
# and give TimeGAN richer training data.
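
The windowing can be checked with a short NumPy sketch. The helper `sliding_windows` is a hypothetical name, not the function used in `lib.data_preprocess`, and the event is taken as samples 350 to 399 of a 1000-point toy signal. The code counts the windows that contain the whole event.

```python
import numpy as np


def sliding_windows(data, seq_len, overlap):
    """Cut (N, features) data into windows of seq_len with the given fractional overlap."""
    stride = max(1, int(round(seq_len * (1.0 - overlap))))
    starts = np.arange(0, len(data) - seq_len + 1, stride)
    windows = np.stack([data[s:s + seq_len] for s in starts])
    return windows, starts


signal = np.arange(1000, dtype=np.float32).reshape(-1, 1)  # toy 1000-point signal
no_overlap, starts_0 = sliding_windows(signal, seq_len=200, overlap=0.0)
overlap_75, starts_75 = sliding_windows(signal, seq_len=200, overlap=0.75)
print(no_overlap.shape, overlap_75.shape)  # (5, 200, 1) (17, 200, 1)

event_start, event_end = 350, 400  # event occupies samples 350..399


def covering(starts, seq_len=200):
    """Start indices of the windows that contain the whole event."""
    return [int(s) for s in starts if s <= event_start and s + seq_len >= event_end]


print(covering(starts_0))   # [200]
print(covering(starts_75))  # [200, 250, 300, 350]
```

Applied to the 6,144,001-sample recording with `seq_len = 256`, a stride of 64 gives (6144001 - 256) // 64 + 1 = 95,997 windows, compared with 24,000 without overlap.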