# Synthetic Data Generation (Sim2Real Strategy)

This notebook generates the training and validation sets for KalmanNet/KalmanFormer.
The data are generated from statistics of a real trajectory (`data.mat`), so that the model trains on data as close to reality as possible.

**Key features:**
- Resampling of speeds and yaw rates from real data (preserving their correlation).
- Trajectory smoothing (inertia).
- Validation with a UKF (filters out trajectories that are physically implausible or too difficult for the filter).
- Data are generated with bias and noise, but validated on data without bias.
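
The first feature, resampling with a shared index, can be illustrated on toy data. The sketch below is not the notebook's generator: the arrays `speeds` and `yaw_rates` are hypothetical stand-ins for the real statistics, constructed so that faster samples turn less. Drawing both quantities with the same index keeps that relationship, while drawing them independently destroys it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real statistics: higher speed -> smaller yaw rate.
n = 5000
speeds = rng.uniform(5.0, 30.0, size=n)
yaw_rates = rng.normal(0.0, 1.0, size=n) / speeds

def speed_turn_corr(v, w):
    """Correlation between speed and yaw-rate magnitude."""
    return float(np.corrcoef(v, np.abs(w))[0, 1])

# Joint resampling: one index selects the (speed, yaw_rate) pair.
idx = rng.integers(0, n, size=20000)
corr_joint = speed_turn_corr(speeds[idx], yaw_rates[idx])

# Independent resampling: separate indices break the pairing.
idx_v = rng.integers(0, n, size=20000)
idx_w = rng.integers(0, n, size=20000)
corr_indep = speed_turn_corr(speeds[idx_v], yaw_rates[idx_w])

print(f"original: {speed_turn_corr(speeds, yaw_rates):.2f}")
print(f"joint:    {corr_joint:.2f}")
print(f"indep:    {corr_indep:.2f}")
```

With the shared index, the resampled pairs should show roughly the same negative correlation as the source data; with independent indices the correlation should be close to zero.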

In [None]:
from pathlib import Path
from scipy.io import loadmat
import sys
import os

# Robust path finding for data.mat
current_path = Path.cwd()
possible_data_paths = [
    current_path / 'data' / 'data.mat',
    current_path.parent / 'data' / 'data.mat',
    current_path.parent.parent / 'data' / 'data.mat',
    # Fallback absolute path
    Path('/home/luky/skola/KalmanNet-main/data/data.mat')
]

dataset_path = None
for p in possible_data_paths:
    if p.exists():
        dataset_path = p
        break

if dataset_path is None or not dataset_path.exists():
    print("Warning: data.mat not found automatically.")
    dataset_path = Path('data/data.mat')

print(f"Dataset path: {dataset_path}")

# Add project root to sys.path (2 levels up from debug/test)
notebook_dir = os.getcwd()
project_root = os.path.abspath(os.path.join(notebook_dir, '..', '..'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)
print(f"Project root added: {project_root}")
import numpy as np
import torch
import matplotlib.pyplot as plt
from scipy.interpolate import RegularGridInterpolator
import random
from tqdm import tqdm
import shutil
import Filters
from Systems import DynamicSystemTAN
import torch.nn.functional as func


Dataset path: /home/luky/skola/KalmanNet-for-state-estimation/data/data.mat
Project root added: /home/luky/skola/KalmanNet-for-state-estimation


In [2]:
# Set seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Device: {device}")

Device: cpu




In [3]:
# Load the data and the terrain map
mat_data = loadmat(dataset_path)

souradniceX_mapa = mat_data['souradniceX']
souradniceY_mapa = mat_data['souradniceY']
souradniceZ_mapa = mat_data['souradniceZ']
souradniceGNSS = mat_data['souradniceGNSS'] 
x_axis_unique = souradniceX_mapa[0, :]
y_axis_unique = souradniceY_mapa[:, 0]

print(f"Map X: {x_axis_unique.shape}, Y: {y_axis_unique.shape}, Z: {souradniceZ_mapa.shape}")

# Interpolator (for use outside torch, if needed)
terMap_interpolator = RegularGridInterpolator(
    (y_axis_unique, x_axis_unique),
    souradniceZ_mapa,
    bounds_error=False, 
    fill_value=np.nan
)

Map X: (2500,), Y: (2500,), Z: (2500, 2500)
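
The interpolator above is built on `(y_axis_unique, x_axis_unique)`, so queries must be given in `(y, x)` order. A minimal sketch on a toy grid (the axis values and the planar height function are assumptions for illustration, not the real map) shows the ordering and the `NaN` returned outside the grid:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Toy axes (assumed values); Z[i, j] is the height at (y_axis[i], x_axis[j]).
y_axis = np.array([0.0, 1.0, 2.0])
x_axis = np.array([0.0, 10.0, 20.0, 30.0])
Z = 100.0 * y_axis[:, None] + x_axis[None, :]   # shape (3, 4)

interp = RegularGridInterpolator(
    (y_axis, x_axis), Z, bounds_error=False, fill_value=np.nan
)

inside = interp([[1.0, 5.0]])[0]    # query order is (y, x)
outside = interp([[5.0, 5.0]])[0]   # y = 5 lies outside the grid

print(inside, outside)
```

Because the toy surface is a plane, linear interpolation reproduces it exactly, which makes a swapped `(x, y)` query easy to spot.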


In [4]:
# System model definition
state_dim = 4
obs_dim = 3
dT = 1
q = 1

F = torch.tensor([[1.0, 0.0, dT, 0.0],
                   [0.0, 1.0, 0.0, dT],
                   [0.0, 0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0]])

Q = q* torch.tensor([[dT**3/3, 0.0, dT**2/2, 0.0],
                   [0.0, dT**3/3, 0.0, dT**2/2],
                   [dT**2/2, 0.0, dT, 0.0],
                   [0.0, dT**2/2, 0.0, dT]])
R = torch.tensor([[3.0**2, 0.0, 0.0],
                   [0.0, 1.0**2, 0.0],
                   [0.0, 0.0, 1.0**2]])

initial_velocity = torch.from_numpy(np.array([0,0]))
initial_position = torch.from_numpy(souradniceGNSS[:2, 0])
x_0 = torch.cat([
    initial_position,
    initial_velocity
]).float()

P_0 = torch.tensor([[25.0, 0.0, 0.0, 0.0],
                    [0.0, 25.0, 0.0, 0.0],
                    [0.0, 0.0, 0.5, 0.0],
                    [0.0, 0.0, 0.0, 0.5]])

# Differentiable measurement function h(x) for the terrain
def h_nl_differentiable(x: torch.Tensor, map_tensor, x_min, x_max, y_min, y_max) -> torch.Tensor:
    batch_size = x.shape[0]

    px = x[:, 0]
    py = x[:, 1]

    px_norm = 2.0 * (px - x_min) / (x_max - x_min) - 1.0
    py_norm = 2.0 * (py - y_min) / (y_max - y_min) - 1.0

    sampling_grid = torch.stack((px_norm, py_norm), dim=1).view(batch_size, 1, 1, 2)

    vyska_terenu_batch = func.grid_sample(
        map_tensor.expand(batch_size, -1, -1, -1),
        sampling_grid, 
        mode='bilinear', 
        padding_mode='border',
        align_corners=True
    )

    vyska_terenu = vyska_terenu_batch.view(batch_size)

    eps = 1e-12
    vx_w, vy_w = x[:, 2], x[:, 3]
    norm_v_w = torch.sqrt(vx_w**2 + vy_w**2).clamp(min=eps)
    cos_psi = vx_w / norm_v_w
    sin_psi = vy_w / norm_v_w

    vx_b = cos_psi * vx_w - sin_psi * vy_w 
    vy_b = sin_psi * vx_w + cos_psi * vy_w

    result = torch.stack([vyska_terenu, vx_b, vy_b], dim=1)
    return result

terMap_tensor = torch.from_numpy(souradniceZ_mapa).float().unsqueeze(0).unsqueeze(0).to(device)
x_min, x_max = x_axis_unique.min(), x_axis_unique.max()
y_min, y_max = y_axis_unique.min(), y_axis_unique.max()

h_wrapper = lambda x: h_nl_differentiable(
    x, 
    map_tensor=terMap_tensor, 
    x_min=x_min, 
    x_max=x_max, 
    y_min=y_min, 
    y_max=y_max
)

system_model = DynamicSystemTAN(
    state_dim=state_dim,
    obs_dim=obs_dim,
    Q=Q.float(),
    R=R.float(),
    Ex0=x_0.float(),
    P0=P_0.float(),
    F=F.float(),
    h=h_wrapper,
    x_axis_unique=x_axis_unique, 
    y_axis_unique=y_axis_unique,
    device=device
)

print("System Model Initialized")

INFO: DynamicSystemTAN initialized with map bounds:
  X: [1476611.42, 1489541.47]
  Y: [6384032.63, 6400441.34]
System Model Initialized
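
The entries of `Q` in the cell above match the standard discretization of a constant-velocity model driven by white-noise acceleration, arranged for the state order `[px, py, vx, vy]`. The sketch below checks this numerically under that assumption; `cv_block` and `cv_process_noise` are hypothetical helper names, not part of the project.

```python
import numpy as np

def cv_block(dT, steps=10_000):
    """Midpoint-rule integral of Phi(tau) G G^T Phi(tau)^T over [0, dT]
    for one axis, with Phi = [[1, tau], [0, 1]] and G = [0, 1]^T."""
    h = dT / steps
    tau = (np.arange(steps) + 0.5) * h
    cols = np.stack([tau, np.ones_like(tau)])   # Phi(tau) @ G = [tau, 1]^T
    return (cols @ cols.T) * h

def cv_process_noise(dT, q):
    """4x4 process noise for state [px, py, vx, vy] (two independent axes)."""
    return q * np.kron(cv_block(dT), np.eye(2))

dT, q = 1.0, 1.0
Q_numeric = cv_process_noise(dT, q)
Q_closed = q * np.array([
    [dT**3 / 3, 0.0, dT**2 / 2, 0.0],
    [0.0, dT**3 / 3, 0.0, dT**2 / 2],
    [dT**2 / 2, 0.0, dT, 0.0],
    [0.0, dT**2 / 2, 0.0, dT],
])
print(np.allclose(Q_numeric, Q_closed, atol=1e-6))
```

The Kronecker product with the 2x2 identity interleaves the per-axis block so that position and velocity terms of the same axis are coupled and the two axes stay independent.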


In [5]:
# === 1. EXTRACT STATISTICS FROM THE REAL TRAJECTORY ===
def extract_driving_stats(real_traj_tensor):
    positions = real_traj_tensor[:, :2]
    deltas = positions[1:] - positions[:-1]
    speeds = torch.norm(deltas, dim=1)
    headings = torch.atan2(deltas[:, 1], deltas[:, 0])
    yaw_rates = headings[1:] - headings[:-1]
    # Handle the wrap-around at -pi/pi
    yaw_rates = (yaw_rates + np.pi) % (2 * np.pi) - np.pi
    speeds = speeds[:-1]
    
    # Keep only reasonable speeds (e.g. when the car is stationary, yaw_rate is just noise)
    valid_mask = (torch.isfinite(speeds) & torch.isfinite(yaw_rates) & (speeds > 0.5))
    clean_speeds = speeds[valid_mask]
    clean_yaw_rates = yaw_rates[valid_mask]
    
    print(f"✅ Statistics: {len(clean_speeds)} samples. AvgSpeed: {clean_speeds.mean():.2f} m/s")
    return clean_speeds, clean_yaw_rates

# Prepare the real data
real_traj_np = souradniceGNSS[:2, :].T 
real_traj_tensor = torch.from_numpy(real_traj_np).float().to(device)
real_speeds, real_yaws = extract_driving_stats(real_traj_tensor)

✅ Statistics: 1263 samples. AvgSpeed: 26.76 m/s
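
The wrap step in `extract_driving_stats` matters whenever the heading crosses the -pi/pi boundary. A small stand-alone sketch of the same formula (the helper name `wrap_to_pi` is an assumption for illustration):

```python
import math

def wrap_to_pi(angle):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

heading_prev = math.radians(170.0)
heading_next = math.radians(-170.0)

raw_step = heading_next - heading_prev   # -340 degrees: looks like a huge turn
wrapped_step = wrap_to_pi(raw_step)      # +20 degrees: the actual small turn

print(math.degrees(raw_step), math.degrees(wrapped_step))
```

Without the wrap, a gentle 20 degree turn across the boundary would enter the statistics as a 340 degree jump in one step.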


In [6]:
# === 2. VECTORIZED TRAJECTORY GENERATOR ===
def simulate_batch_ackermann(system, batch_size, seq_len, stats_speeds, stats_yaws, speed_scale=0.99):
    device = system.Ex0.device
    dt = 1.0
    
    margin = 150.0
    x_min, x_max = system.min_x + margin, system.max_x - margin
    y_min, y_max = system.min_y + margin, system.max_y - margin
    
    start_x = (torch.rand(batch_size, 1, device=device) * (x_max - x_min)) + x_min
    start_y = (torch.rand(batch_size, 1, device=device) * (y_max - y_min)) + y_min
    start_psi = (torch.rand(batch_size, 1, device=device) * 2 * np.pi) - np.pi
    
    # Resample from the real statistics
    # The same index is used for both v and omega -> preserves their correlation
    rand_idx = torch.randint(0, len(stats_speeds), (batch_size, seq_len), device=device)
    chosen_v = stats_speeds[rand_idx] * speed_scale
    chosen_omega = stats_yaws[rand_idx]
    
    # Smoothing (simulates inertia)
    alpha_v = 0.3      
    alpha_omega = 0.4  
    
    smooth_v = torch.zeros_like(chosen_v)
    smooth_omega = torch.zeros_like(chosen_omega)
    
    curr_v = chosen_v[:, 0]
    curr_omega = chosen_omega[:, 0]
    
    for t in range(seq_len):
        curr_v = (1 - alpha_v) * curr_v + alpha_v * chosen_v[:, t]
        curr_omega = (1 - alpha_omega) * curr_omega + alpha_omega * chosen_omega[:, t]
        smooth_v[:, t] = curr_v
        smooth_omega[:, t] = curr_omega
        
    traj_x = []
    traj_y = []
    traj_vx = []
    traj_vy = []
    
    # squeeze(1) keeps the batch dimension even when batch_size == 1
    curr_x, curr_y, curr_psi = start_x.squeeze(1), start_y.squeeze(1), start_psi.squeeze(1)
    
    for t in range(seq_len):
        v = smooth_v[:, t]
        omega = smooth_omega[:, t]
        vx = v * torch.cos(curr_psi)
        vy = v * torch.sin(curr_psi)
        
        traj_x.append(curr_x)
        traj_y.append(curr_y)
        traj_vx.append(vx)
        traj_vy.append(vy)
        
        curr_x = curr_x + vx * dt
        curr_y = curr_y + vy * dt
        curr_psi = curr_psi + omega * dt
        
    X = torch.stack(traj_x, dim=1)
    Y = torch.stack(traj_y, dim=1)
    VX = torch.stack(traj_vx, dim=1)
    VY = torch.stack(traj_vy, dim=1)
    
    return torch.stack([X, Y, VX, VY], dim=2)
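
The generator above combines two simple steps: exponential smoothing of the resampled speed and yaw rate, and forward-Euler integration of a speed and heading model. The following is a scalar, pure-Python sketch of both steps for a single trajectory; the function names `ema` and `integrate_unicycle` are assumptions for illustration and are not used elsewhere in the notebook.

```python
import math

def ema(values, alpha):
    """First-order low-pass: y_t = (1 - alpha) * y_{t-1} + alpha * x_t,
    seeded with the first input value."""
    out, curr = [], values[0]
    for x in values:
        curr = (1 - alpha) * curr + alpha * x
        out.append(curr)
    return out

def integrate_unicycle(x, y, psi, speeds, yaw_rates, dt=1.0):
    """Forward-Euler integration; records the state before each update."""
    states = []
    for v, omega in zip(speeds, yaw_rates):
        vx, vy = v * math.cos(psi), v * math.sin(psi)
        states.append((x, y, vx, vy))
        x, y, psi = x + vx * dt, y + vy * dt, psi + omega * dt
    return states

# A step in the commanded speed is followed gradually, not instantly.
smooth_v = ema([0.0, 1.0, 1.0, 1.0], alpha=0.3)

# With zero yaw rate and heading 0 the vehicle moves along the x axis.
straight = integrate_unicycle(0.0, 0.0, 0.0, [2.0, 2.0, 2.0], [0.0, 0.0, 0.0])
print(smooth_v)
print([s[0] for s in straight])
```

A smaller `alpha` gives a slower response, which is why the notebook uses separate constants for speed and yaw rate.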

In [7]:
# === 3. MAIN GENERATION LOOP ===

DATA_DIR = './generated_data_clean_motion'
if os.path.exists(DATA_DIR): shutil.rmtree(DATA_DIR)
os.makedirs(DATA_DIR)

BATCH_SIZE_GEN = 256  
filter_model = Filters.UnscentedKalmanFilter(system_model)
filter_name = "UKF"

# Dataset configurations
CONFIGS = [
    # (Length, Train target, Val target, Max RMSE [m])
    (10,  2200, 500, 10.0),
    (100, 2200, 500, 30.0), 
    (300, 2200, 500, 70.0) 
]

for seq_len, n_train, n_val, filter_thresh in CONFIGS:
    subset_dir = os.path.join(DATA_DIR, f'len_{seq_len}')
    os.makedirs(subset_dir, exist_ok=True)
    
    print(f"\n🚀 Generating set (Clean Motion): Length={seq_len} | Target={n_train}/{n_val} | Max {filter_name} RMSE < {filter_thresh}m")
    
    for split, target_count in [('train', n_train), ('val', n_val)]:
        valid_x_list = []
        valid_y_list = []
        
        stats = {
            'generated': 0, 'dropped_bounds': 0, 'dropped_flat': 0,
            'dropped_filter_crash': 0, 'dropped_rmse_high': 0,
            'accepted': 0, 'avg_rmse_rejected': 0.0
        }
        
        pbar = tqdm(total=target_count, desc=f"  -> {split.upper()}")
        batch_counter = 0
        
        while len(valid_x_list) < target_count:
            batch_counter += 1
            x_batch = simulate_batch_ackermann(system_model, BATCH_SIZE_GEN, seq_len, real_speeds, real_yaws)
            stats['generated'] += BATCH_SIZE_GEN
            
            # Pre-filter (map bounds)
            in_bounds = (
                (x_batch[:, :, 0].min(dim=1).values > system_model.min_x) &
                (x_batch[:, :, 0].max(dim=1).values < system_model.max_x) &
                (x_batch[:, :, 1].min(dim=1).values > system_model.min_y) &
                (x_batch[:, :, 1].max(dim=1).values < system_model.max_y)
            )
            x_cands = x_batch[in_bounds]
            stats['dropped_bounds'] += (BATCH_SIZE_GEN - len(x_cands))
            if len(x_cands) == 0: continue
            
            try:
                batch_size_real = x_cands.shape[0]
                flat_x = x_cands.reshape(-1, 4)
                flat_y = system_model.measure(flat_x) 
                y_ideal = flat_y.reshape(batch_size_real, seq_len, 3)
                
                # 1. White noise (these std values are larger than sqrt(diag(R)) = [3, 1, 1])
                noise_std = torch.tensor([9.0, 3.0, 3.0], device=system_model.Ex0.device)
                white_noise = torch.randn_like(y_ideal) * noise_std
                
                # Version without bias and without odometry scale (used for validation)
                y_for_validation = y_ideal + white_noise

                # 2. Barometer bias (range -5 to +5)
                bias_min = -5.0
                bias_max = 5.0
                bias_span = bias_max - bias_min
                
                baro_bias = (torch.rand(batch_size_real, 1, 1, device=system_model.Ex0.device) * bias_span) + bias_min
                bias_tensor = torch.zeros_like(y_ideal)
                bias_tensor[:, :, 0] = baro_bias[:, :, 0]
                
                # --- DATASET VERSION (hardcore: noise + bias) ---
                # Note: the odometry scale error was removed at the user's request.
                y_for_dataset = y_for_validation.clone() # Start from the noisy data
                y_for_dataset += bias_tensor             # Add the bias

            except Exception as e:
                print(f"Measurement error: {e}")
                continue

            # Filter validation
            rmse_sum_rejected = 0
            rmse_count_rejected = 0
            
            for i in range(len(x_cands)):
                if len(valid_x_list) >= target_count: break
                
                # Flat-terrain check (computed on the noisy terrain-height channel)
                if torch.std(y_for_dataset[i, :, 0]) < 0.5: 
                    stats['dropped_flat'] += 1
                    continue
                
                try:
                    x_gt = x_cands[i]
                    
                    # Validate on the cleaner data (y_for_validation)
                    res = filter_model.process_sequence(y_for_validation[i], Ex0=x_gt[0], P0=system_model.P0)
                    x_est = res['x_filtered']
                    
                    len_est = x_est.shape[0]
                    len_gt = x_gt.shape[0]
                    min_len = min(len_est, len_gt)
                    
                    diff = x_est[:min_len, :2] - x_gt[:min_len, :2]
                    rmse = torch.sqrt(torch.mean(torch.sum(diff**2, dim=1))).item()
                    
                    if rmse < filter_thresh:
                        valid_x_list.append(x_gt.cpu())
                        valid_y_list.append(y_for_dataset[i].cpu())
                        stats['accepted'] += 1
                        pbar.update(1)
                    else:
                        stats['dropped_rmse_high'] += 1
                        rmse_sum_rejected += rmse
                        rmse_count_rejected += 1
                        
                except Exception:
                    stats['dropped_filter_crash'] += 1
                    continue
            
            if rmse_count_rejected > 0:
                current_avg = rmse_sum_rejected / rmse_count_rejected
                if stats['avg_rmse_rejected'] == 0: stats['avg_rmse_rejected'] = current_avg
                else: stats['avg_rmse_rejected'] = 0.9 * stats['avg_rmse_rejected'] + 0.1 * current_avg

            if batch_counter % 5 == 0:
                acc_rate = (stats['accepted'] / stats['generated']) * 100
                tqdm.write(f"\n--- DEBUG (Batch {batch_counter}) ---")
                tqdm.write(f"   Gen: {stats['generated']} | Acc: {stats['accepted']} ({acc_rate:.2f}%)")
                tqdm.write(f"   Drop: RMSE>{filter_thresh}m={stats['dropped_rmse_high']} (Avg rej: {stats['avg_rmse_rejected']:.1f}m)")
                    
        pbar.close()
        torch.save({'x': torch.stack(valid_x_list), 'y': torch.stack(valid_y_list)}, os.path.join(subset_dir, f'{split}.pt'))

print("\n✨ Done. Dataset (Synthetic) generated.")


🚀 Generating set (Clean Motion): Length=10 | Target=2200/500 | Max UKF RMSE < 10.0m


  -> TRAIN:  47%|████▋     | 1039/2200 [00:07<00:07, 150.87it/s]


--- DEBUG (Batch 5) ---
   Gen: 1280 | Acc: 1014 (79.22%)
   Drop: RMSE>10.0m=262 (Avg rej: 12.9m)


  -> TRAIN:  94%|█████████▎| 2059/2200 [00:14<00:01, 139.12it/s]


--- DEBUG (Batch 10) ---
   Gen: 2560 | Acc: 2049 (80.04%)
   Drop: RMSE>10.0m=499 (Avg rej: 13.5m)


  -> TRAIN: 100%|██████████| 2200/2200 [00:15<00:00, 144.56it/s]
  -> VAL: 100%|██████████| 500/500 [00:03<00:00, 150.22it/s]



🚀 Generating set (Clean Motion): Length=100 | Target=2200/500 | Max UKF RMSE < 30.0m


  -> TRAIN:  35%|███▍      | 765/2200 [00:53<02:04, 11.52it/s]


--- DEBUG (Batch 5) ---
   Gen: 1280 | Acc: 762 (59.53%)
   Drop: RMSE>30.0m=269 (Avg rej: 63.2m)


  -> TRAIN:  70%|███████   | 1543/2200 [01:49<00:38, 16.99it/s]


--- DEBUG (Batch 10) ---
   Gen: 2560 | Acc: 1543 (60.27%)
   Drop: RMSE>30.0m=526 (Avg rej: 57.2m)


  -> TRAIN: 100%|██████████| 2200/2200 [02:39<00:00, 13.83it/s]



--- DEBUG (Batch 15) ---
   Gen: 3840 | Acc: 2200 (57.29%)
   Drop: RMSE>30.0m=745 (Avg rej: 54.2m)


  -> VAL: 100%|██████████| 500/500 [00:36<00:00, 13.67it/s]



🚀 Generating set (Clean Motion): Length=300 | Target=2200/500 | Max UKF RMSE < 70.0m


  -> TRAIN:  27%|██▋       | 595/2200 [01:38<04:12,  6.36it/s]


--- DEBUG (Batch 5) ---
   Gen: 1280 | Acc: 594 (46.41%)
   Drop: RMSE>70.0m=30 (Avg rej: 84.3m)


  -> TRAIN:  54%|█████▍    | 1188/2200 [03:17<02:47,  6.03it/s]


--- DEBUG (Batch 10) ---
   Gen: 2560 | Acc: 1187 (46.37%)
   Drop: RMSE>70.0m=54 (Avg rej: 83.7m)


  -> TRAIN:  81%|████████  | 1785/2200 [04:54<01:24,  4.90it/s]


--- DEBUG (Batch 15) ---
   Gen: 3840 | Acc: 1784 (46.46%)
   Drop: RMSE>70.0m=72 (Avg rej: 84.4m)


  -> TRAIN: 100%|██████████| 2200/2200 [06:06<00:00,  6.00it/s]
  -> VAL: 100%|██████████| 500/500 [01:25<00:00,  5.82it/s]


--- DEBUG (Batch 5) ---
   Gen: 1280 | Acc: 500 (39.06%)
   Drop: RMSE>70.0m=21 (Avg rej: 83.9m)

✨ Done. Dataset (Synthetic) generated.
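
The generation loop keeps two measurement versions per trajectory: a noisy one used for the UKF acceptance test, and a noisy plus biased one that is stored in the dataset. The NumPy sketch below reproduces only that bookkeeping and the position RMSE used as the acceptance criterion. The array `y_ideal` is a random placeholder for the noise-free measurements, and `position_rmse` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, obs_dim = 4, 50, 3

# Placeholder for noise-free measurements (the notebook gets them from system_model.measure).
y_ideal = rng.normal(size=(batch, seq_len, obs_dim))

# Per-channel white noise, same std values as in the notebook.
noise_std = np.array([9.0, 3.0, 3.0])
y_for_validation = y_ideal + rng.normal(size=y_ideal.shape) * noise_std

# One bias per trajectory, constant in time, applied to channel 0 only.
bias = rng.uniform(-5.0, 5.0, size=(batch, 1))
y_for_dataset = y_for_validation.copy()
y_for_dataset[:, :, 0] += bias

def position_rmse(x_est, x_gt):
    """RMSE of the 2D position error over one trajectory of shape (T, 4)."""
    diff = x_est[:, :2] - x_gt[:, :2]
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))

delta = y_for_dataset - y_for_validation
print(np.allclose(delta[:, :, 0], bias), np.allclose(delta[:, :, 1:], 0.0))
```

The two versions differ only by the constant per-trajectory offset on the first channel, so the filter is judged on data it can reasonably track while the stored data stay harder.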



