 """
# Abu Dhabi Traffic Flow Vehicle Behavior Classification Project

## Research Objective:
Train a CNN-LSTM hybrid model using actual vehicle labels from Aimsun data:
- HDV Aggressive → Aggressive behavior
- HDV Conventional Gipps Model → Normal behavior
- HDV Cooperative → Cooperative behavior
- CAV → Autonomous vehicle (excluded from training)

## Model Validation:
Compare CNN-LSTM predictions with actual vehicle labels to evaluate accuracy and F1 scores. Split the data into 80% training and 20% testing. Get the data from BOX using BOX API
"""
print("🚗 Abu Dhabi Traffic Flow Vehicle Behavior Classification System")
print("📊 Training with Actual Vehicle Labels from Data Files")


CHeck the GPU

In [28]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Sat Jul 26 17:07:17 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   67C    P0             34W /   70W |    7562MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [29]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 54.8 gigabytes of available RAM

You are using a high-RAM runtime!


flowchart TD

    A[Raw CSV Data] --> B[Data Parsing & Cleaning]
    B --> C[Feature Extraction]
    C --> D[Data Preparation]
    D --> E[Model Training (CNN-LSTM)]
    E --> F[Prediction & Continuous Learning]
    F --> G[Behavior Output: Aggressive/Cooperative/Normal]


In [30]:
!pip install --upgrade tensorflow




In [31]:
!pip install boxsdk



In [32]:
!pip install "boxsdk[jwt]"



In [33]:
import os, json, numpy as np, pandas as pd, warnings
import tensorflow as tf
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import (Input, LSTM, Conv1D, Dense, Dropout, BatchNormalization, Concatenate, GlobalMaxPooling1D, Masking)
from sklearn.preprocessing import StandardScaler, LabelEncoder
import joblib
import matplotlib.pyplot as plt
import io
import json
import hashlib
from tensorflow.keras.layers import Lambda
from sklearn.metrics import f1_score, precision_score, recall_score, classification_report
from boxsdk import Client, JWTAuth
import pandas as pd
import random
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.saving import register_keras_serializable
from sklearn.model_selection import train_test_split


auth = JWTAuth.from_settings_file('key.json')
client = Client(auth)


@register_keras_serializable(package="custom_layers")
def compute_accel(speed_tensor):
    # … your numpy/tf logic …
    return acceleration_tensor


warnings.filterwarnings('ignore')
np.random.seed(42)
tf.random.set_seed(42)
print("✅ All libraries imported successfully")


✅ All libraries imported successfully


Data Processing functions

In [34]:
def parse_array_string(array_str):
    """Parse array strings with error handling"""
    try:
        if pd.isna(array_str): return []
        cleaned = str(array_str).replace('inf', '0').replace('-inf', '0').replace('nan', 'None')
        return eval(cleaned)
    except:
        return []

def interpolate_missing_values(sequence):
    """Handle missing values using linear interpolation"""
    """
    Interpolate only None or np.nan values in the sequence.
    Leave -inf and +inf unchanged.
    """
    if not sequence:
        return []
    arr = np.array(sequence, dtype=float)
    # Identify missing values (None or np.nan)
    missing_mask = np.isnan(arr)
    # Valid values are finite and not missing
    valid_mask = ~missing_mask & np.isfinite(arr)
    valid_indices = np.where(valid_mask)[0]

    # If all are missing, return zeros except for -inf/+inf
    if np.all(missing_mask | ~np.isfinite(arr)):
        return [v if np.isinf(v) else 0.0 for v in arr]

    # If only one valid value, fill missing with that value (but keep infs)
    if len(valid_indices) == 1:
        fill_value = arr[valid_indices[0]]
        return [
            v if not np.isnan(v) else fill_value
            for v in arr
        ]

    # Standard case: interpolate only missing values, leave infs untouched
    interp_values = np.copy(arr)
    # Indices to interpolate: missing and not inf
    interp_indices = np.where(missing_mask & ~np.isinf(arr))[0]
    if len(valid_indices) >= 2 and len(interp_indices) > 0:
        # Interpolate only where needed
        interp_result = np.interp(
            interp_indices, valid_indices, arr[valid_mask]
        )
        interp_values[interp_indices] = interp_result

    # Convert to list and return
    return interp_values.tolist()

def parse_coordinate_string(coord_str):
    """Parse coordinate data with interpolation"""
    try:
        if pd.isna(coord_str): return []
        cleaned_str = str(coord_str).replace('inf', '0').replace('-inf', '0').replace('nan', 'None')
        coords = eval(cleaned_str)
        x_coords = [c[0] for c in coords]
        y_coords = [c[1] for c in coords]
        return list(zip(
            interpolate_missing_values(x_coords),
            interpolate_missing_values(y_coords)
        ))
    except:
        return []

def calculate_lateral_speeds(front_coords, rear_coords):
    """Calculate lateral movement speeds"""
    if len(front_coords) < 2: return []
    lateral_speeds = []
    for i in range(1, len(front_coords)):
        try:
            dx = front_coords[i][0] - front_coords[i-1][0]
            dy = front_coords[i][1] - front_coords[i-1][1]
            lateral_speeds.append(np.sqrt(dx**2 + dy**2))
        except:
            lateral_speeds.append(0)
    return interpolate_missing_values(lateral_speeds)


In [35]:
def extract_vehicle_labels(df):
    """Extract labels directly from VehTypeName column"""
    def map_vehicle_type(veh_type_name):
        veh_type = str(veh_type_name).strip()
        if 'CAV' in veh_type:
            return 'autonomous'  # Will be excluded from training
        elif 'Aggressive' in veh_type:
            return 'aggressive'
        elif 'Cooperative' in veh_type:
            return 'cooperative'
        elif 'Conventional' in veh_type or 'Gipps' in veh_type:
            return 'normal'
        else:
            return 'normal'  # Default classification

    df['behavior_label'] = df['VehTypeName'].apply(map_vehicle_type)
    return df

Feature extraction and if any record is more than half empty then it is discarded

In [36]:
def extract_speed_position_features(vehicle_row, min_sequence_length=5):
    """
    Extract only speed and position sequences from vehicle data row.
    Returns None if data is insufficient.
    """
    try:
        # Parse speed sequence
        speeds = parse_array_string(vehicle_row['Speeds'])
        # Parse front coordinates (positions)
        front_coords = parse_coordinate_string(vehicle_row['VehFrontCoords'])

        # Validation: Check for at least 30 invalid values in any sequence
        def count_invalid(seq):
            return sum(
                (v is None) or
                (isinstance(v, float) and (np.isnan(v) or np.isinf(v)))
                for v in seq
            )

        if count_invalid(speeds) >= 30 or count_invalid(front_coords) >= 30:
            return None

        # Clean sequences
        speeds_clean = interpolate_missing_values(speeds)
        positions_clean = front_coords  # Already interpolated by parse_coordinate_string

        # Ensure consistent length
        min_length = min(len(speeds_clean), len(positions_clean))
        if min_length < min_sequence_length:
            return None

        # Truncate to same length
        speeds_clean = speeds_clean[:min_length]
        positions_clean = positions_clean[:min_length]

        # Convert to arrays
        speeds_arr = np.array(speeds_clean).reshape(-1, 1)
        positions_arr = np.array(positions_clean).reshape(-1, 2)

        return {
            'speeds': speeds_arr,          # shape (N, 1)
            'positions': positions_arr,    # shape (N, 2)
            'sequence_length': min_length
        }
    except Exception as e:
        print(f"Feature extraction error: {e}")
        return None


Data reading function

In [37]:
def read_aimsun_data(file_path, column_map=None):
    """Read CSV with flexible column mapping"""
    DEFAULT_MAP = {
        'vehicle_id': 'VehNr', 'timestep': 'Timestep', 'speed': 'Speeds',
        'acceleration': 'Accelerations', 'front_coords': 'VehFrontCoords',
        'rear_coords': 'VehRearCoords', 'vehicle_type': 'VehTypeName', 'length': 'Length'
    }
    col_map = column_map or DEFAULT_MAP
    try:
        df = pd.read_csv(file_path)
        return df.rename(columns={v: k for k, v in col_map.items()})
    except Exception as e:
        print(f"Error reading {file_path}: {str(e)}")
        return None

def extract_vehicle_table(df, vehicle_id):
    """Extract time-series table for specific vehicle"""
    vehicle_data = df[df['vehicle_id'] == vehicle_id].copy()
    vehicle_data['front_coords'] = vehicle_data['front_coords'].apply(parse_coordinate_string)
    vehicle_data['rear_coords'] = vehicle_data['rear_coords'].apply(parse_coordinate_string)
    vehicle_data['position'] = vehicle_data.apply(
        lambda x: np.mean([x['front_coords'], x['rear_coords']], axis=0), axis=1)
    return vehicle_data[['timestep', 'speed', 'acceleration', 'position']]

print("✅ Data reading functions ready")


✅ Data reading functions ready


In [38]:
def process_labeled_data_speed_position(csv_files, folder_path, max_files=None):
    """
    Process data files using only speed and position features.
    """
    all_features = []
    all_labels = []
    #vehicle_details = []
    processed_count = 0

    files_to_process = csv_files[:max_files] if max_files else csv_files

    for filename in files_to_process:
        try:
            print(f"📄 Processing {filename}...")
            file_path = os.path.join(folder_path, filename)
            df = pd.read_csv(file_path)

            # Extract labels from VehTypeName
            df = extract_vehicle_labels(df)

            print(f"   Labels in {filename}:")
            file_labels = df['behavior_label'].value_counts()
            for label, count in file_labels.items():
                print(f"     {label}: {count}")

            for idx, row in df.iterrows():
                if row['behavior_label'] == 'autonomous':
                    continue

                features = extract_speed_position_features(row)
                if features is not None:
                    all_features.append(features)
                    all_labels.append(row['behavior_label'])
                    processed_count += 1

            print(f"   ✅ Extracted {processed_count} valid vehicles so far")
        except Exception as e:
            print(f"   ❌ Error: {e}")

    return all_features, all_labels


In [39]:
def prepare_speed_position_data(features_list, max_sequence=60):
    """
    Prepare padded speed and position arrays for model input.
    Returns:
        X_speed: (num_samples, max_sequence, 1)
        X_pos: (num_samples, max_sequence, 2)
    """
    X_speed = []
    X_pos = []
    for features in features_list:
        speeds = features['speeds'][:max_sequence]
        positions = features['positions'][:max_sequence]
        seq_len = min(len(speeds), max_sequence)

        # Pad if needed
        speed_padded = np.zeros((max_sequence, 1))
        pos_padded = np.zeros((max_sequence, 2))
        speed_padded[:seq_len] = speeds[:seq_len]
        pos_padded[:seq_len] = positions[:seq_len]

        X_speed.append(speed_padded)
        X_pos.append(pos_padded)
    return np.array(X_speed), np.array(X_pos)


In [40]:
# Replace your build_speed_position_model_fixed function with this version:

def build_speed_position_model_fixed(sequence_length=60, num_classes=3):
    """
    Fixed model that properly handles multiple classes and computes
    acceleration/lateral movement correctly. Fixed cuDNN compatibility.
    """
    # Inputs
    speed_input = Input(shape=(sequence_length, 1), name='speed')
    position_input = Input(shape=(sequence_length, 2), name='position')

    # Compute acceleration (difference between consecutive speeds)
    def compute_acceleration(speed_tensor):
        # Pad with zeros at the beginning for the first timestep
        zeros = tf.zeros_like(speed_tensor[:, :1, :])
        accel = speed_tensor[:, 1:, :] - speed_tensor[:, :-1, :]
        return tf.concat([zeros, accel], axis=1)

    accel = Lambda(compute_acceleration, name='acceleration')(speed_input)

    # Compute lateral movement (Euclidean distance between consecutive positions)
    def compute_lateral_movement(pos_tensor):
        # Calculate displacement vectors
        displacement = pos_tensor[:, 1:, :] - pos_tensor[:, :-1, :]
        # Calculate magnitudes (lateral speed)
        lateral_speed = tf.norm(displacement, axis=-1, keepdims=True)
        # Pad with zeros at the beginning
        zeros = tf.zeros_like(lateral_speed[:, :1, :])
        return tf.concat([zeros, lateral_speed], axis=1)

    lateral = Lambda(compute_lateral_movement, name='lateral')(position_input)

    # Concatenate all features
    features = Concatenate(axis=-1)([speed_input, accel, lateral])  # shape: (batch, seq, 3)

    # REMOVE MASKING - this causes cuDNN issues
    # masked = Masking(mask_value=0.0)(features)

    # CNN branch - work directly on features
    cnn = Conv1D(64, 3, activation='relu', padding='same')(features)
    cnn = BatchNormalization()(cnn)
    cnn = Dropout(0.3)(cnn)
    cnn = Conv1D(128, 3, activation='relu', padding='same')(cnn)
    cnn = BatchNormalization()(cnn)
    cnn_out = GlobalMaxPooling1D()(cnn)

    # LSTM branch - disable cuDNN to avoid masking issues
    lstm = LSTM(128, return_sequences=True, use_cudnn=False)(features)  # FIX: Disable cuDNN
    lstm = BatchNormalization()(lstm)
    lstm_out = LSTM(64, use_cudnn=False)(lstm)  # FIX: Disable cuDNN

    # Feature fusion
    combined = Concatenate()([cnn_out, lstm_out])
    x = Dense(256, activation='relu')(combined)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Dense(128, activation='relu')(x)

    # FIXED: Output layer should have num_classes neurons, not 1
    output = Dense(num_classes, activation='softmax')(x)

    model = Model(inputs=[speed_input, position_input], outputs=output)

    # Use appropriate optimizer and learning rate
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    print("🧠 Fixed Model Architecture:")
    model.summary()
    return model

# Also fix your prepare_speed_position_data function:

def prepare_speed_position_data(features_list, max_sequence=60):
    """
    Prepare padded speed and position arrays for model input.
    Fixed for cuDNN compatibility.
    Returns:
        X_speed: (num_samples, max_sequence, 1)
        X_pos: (num_samples, max_sequence, 2)
    """
    X_speed = []
    X_pos = []
    for features in features_list:
        speeds = features['speeds'][:max_sequence]
        positions = features['positions'][:max_sequence]
        seq_len = min(len(speeds), max_sequence)

        # Pad if needed - ensure proper dtype
        speed_padded = np.zeros((max_sequence, 1), dtype=np.float32)  # FIX: Add dtype
        pos_padded = np.zeros((max_sequence, 2), dtype=np.float32)    # FIX: Add dtype
        speed_padded[:seq_len] = speeds[:seq_len]
        pos_padded[:seq_len] = positions[:seq_len]

        X_speed.append(speed_padded)
        X_pos.append(pos_padded)
    return np.array(X_speed, dtype=np.float32), np.array(X_pos, dtype=np.float32)  # FIX: Ensure dtype

graph TD

    TS[Time Series Input<br/>60 x 3] --> Mask[Masking Layer]
    Mask --> CNN1[Conv1D 64<br/>Kernel=3]
    CNN1 --> BN1[BatchNorm]
    BN1 --> Drop1[Dropout 0.3]
    Drop1 --> CNN2[Conv1D 128<br/>Kernel=3]
    CNN2 --> BN2[BatchNorm]
    BN2 --> GMP[GlobalMaxPooling1D]
    
    Mask --> LSTM1[LSTM 128<br/>return_sequences=True]
    LSTM1 --> BN3[BatchNorm]
    BN3 --> LSTM2[LSTM 64]
    
    S[Static Features<br/>13 dims] --> Dense1[Dense 64]
    Dense1 --> BN4[BatchNorm]
    
    GMP --> Concat[Concatenate]
    LSTM2 --> Concat
    BN4 --> Concat
    
    Concat --> Dense2[Dense 256]
    Dense2 --> BN5[BatchNorm]
    BN5 --> Drop2[Dropout 0.5]
    Drop2 --> Dense3[Dense 128]
    Dense3 --> Output[Softmax Output<br/>3 Classes]


Calculate the F1 score

In [41]:
def classify_vehicle_behavior_speed_position(vehicle_data, model, max_sequence=60):
    """
    Classify vehicle behavior using only speed and position sequences.
    """
    features = extract_speed_position_features(vehicle_data)
    if not features:
        return "unknown"

    speeds = features['speeds'][:max_sequence]
    positions = features['positions'][:max_sequence]
    seq_len = min(len(speeds), max_sequence)

    # Pad
    speed_padded = np.zeros((1, max_sequence, 1))
    pos_padded = np.zeros((1, max_sequence, 2))
    speed_padded[0, :seq_len] = speeds
    pos_padded[0, :seq_len] = positions

    prediction = model.predict([speed_padded, pos_padded])
    class_id = np.argmax(prediction)
    return {0: 'aggressive', 1: 'cooperative', 2: 'normal'}.get(class_id, 'unknown')


In [42]:
def analyze_f1_performance(comparisons, actual_labels, predictions):
    """Detailed F1 score analysis with visualizations"""

    # Classification report
    print("\n📋 Detailed Classification Report:")
    print("=" * 60)
    print(classification_report(actual_labels, predictions))

    # F1 scores by confidence ranges
    df_comp = pd.DataFrame(comparisons)
    confidence_ranges = [(0.0, 0.5), (0.5, 0.7), (0.7, 0.9), (0.9, 1.0)]

    print("\n📊 F1 Score by Confidence Ranges:")
    print("-" * 50)
    for low, high in confidence_ranges:
        mask = (df_comp['confidence'] >= low) & (df_comp['confidence'] < high)
        subset = df_comp[mask]
        if len(subset) > 0:
            subset_f1 = f1_score(subset['actual'], subset['predicted'], average='macro')
            print(f"Confidence {low:.1f}-{high:.1f}: F1={subset_f1:.3f} (n={len(subset)})")

    # Misclassified vehicles analysis
    print(f"\n❌ Misclassified Vehicles Analysis:")
    misclassified = df_comp[df_comp['correct'] == False]
    if not misclassified.empty:
        print(f"Total misclassified: {len(misclassified)}")
        for behavior in ['aggressive', 'cooperative', 'normal']:
            behavior_miss = misclassified[misclassified['actual'] == behavior]
            if len(behavior_miss) > 0:
                print(f"  {behavior}: {len(behavior_miss)} vehicles")

def plot_f1_results(comparisons, actual_labels, predictions):
    """Visualize F1 score results"""

    fig, axes = plt.subplots(2, 2, figsize=(15, 12))

    # 1. F1 scores by class
    classes = sorted(set(actual_labels + predictions))
    if len(classes) > 1:
        f1_per_class = f1_score(actual_labels, predictions, average=None, labels=classes)
        axes[0,0].bar(classes, f1_per_class, color=['#ff9999','#66b3ff','#99ff99'])
        axes[0,0].set_title('F1 Score by Vehicle Behavior Class')
        axes[0,0].set_ylabel('F1 Score')
        axes[0,0].set_ylim(0, 1.1)

    # 2. Prediction confidence distribution
    df_comp = pd.DataFrame(comparisons)
    axes[0,1].hist([df_comp[df_comp['correct']]['confidence'],
                   df_comp[~df_comp['correct']]['confidence']],
                  bins=20, alpha=0.7, label=['Correct', 'Incorrect'])
    axes[0,1].set_title('Prediction Confidence Distribution')
    axes[0,1].set_xlabel('Confidence')
    axes[0,1].legend()

    # 3. Confusion matrix
    if len(classes) > 1:
        from sklearn.metrics import confusion_matrix
        import seaborn as sns
        cm = confusion_matrix(actual_labels, predictions, labels=classes)
        sns.heatmap(cm, annot=True, fmt='d', xticklabels=classes, yticklabels=classes, ax=axes[1,0])
        axes[1,0].set_title('Confusion Matrix')
        axes[1,0].set_xlabel('Predicted')
        axes[1,0].set_ylabel('Actual')

    # 4. F1 vs Accuracy comparison
    accuracy = df_comp['correct'].mean()
    f1_macro = f1_score(actual_labels, predictions, average='macro') if len(classes) > 1 else accuracy

    metrics = ['Accuracy', 'F1-Macro']
    values = [accuracy, f1_macro]
    axes[1,1].bar(metrics, values, color=['lightblue', 'lightcoral'])
    axes[1,1].set_title('Accuracy vs F1 Score')
    axes[1,1].set_ylabel('Score')
    axes[1,1].set_ylim(0, 1.1)

    plt.tight_layout()
    plt.show()

print("✅ F1 analysis and visualization functions ready")


✅ F1 analysis and visualization functions ready


In [43]:
def compare_predictions_with_labels_f1_speed_position(model, features_list, vehicle_details, label_encoder, max_sequence=60):
    """
    Compare model predictions with actual vehicle labels using F1 score.
    """
    predictions = []
    actual_labels = []
    comparisons = []

    print("🔍 Comparing predictions with actual labels (F1 Score Evaluation)...")
    print("-" * 80)
    print(f"{'VehNr':<8} {'Actual Label':<15} {'Predicted':<15} {'Match':<8} {'Confidence':<12}")
    print("-" * 80)

    correct_predictions = 0

    X_speed, X_pos = prepare_speed_position_data(features_list, max_sequence=max_sequence)

    for i, vehicle_info in enumerate(vehicle_details):
        # Predict
        prediction_probs = model.predict([X_speed[i:i+1], X_pos[i:i+1]], verbose=0)
        predicted_class_idx = np.argmax(prediction_probs)
        predicted_label = label_encoder.inverse_transform([predicted_class_idx])[0]
        confidence = np.max(prediction_probs)
        actual_label = vehicle_info['actual_label']
        is_correct = predicted_label == actual_label

        if is_correct:
            correct_predictions += 1

        predictions.append(predicted_label)
        actual_labels.append(actual_label)
        comparison = {
            'VehNr': vehicle_info['VehNr'],
            'actual': actual_label,
            'predicted': predicted_label,
            'correct': is_correct,
            'confidence': confidence,
            'file': vehicle_info['file']
        }
        comparisons.append(comparison)

        if i < 20:
            match_symbol = "✅" if is_correct else "❌"
            print(f"{vehicle_info['VehNr']:<8} {actual_label:<15} {predicted_label:<15} {match_symbol:<8} {confidence:.3f}")

    # Calculate F1 scores
    try:
        f1_macro = f1_score(actual_labels, predictions, average='macro')
        f1_weighted = f1_score(actual_labels, predictions, average='weighted')
        f1_per_class = f1_score(actual_labels, predictions, average=None, labels=label_encoder.classes_)
        precision_macro = precision_score(actual_labels, predictions, average='macro')
        recall_macro = recall_score(actual_labels, predictions, average='macro')
        accuracy = correct_predictions / len(vehicle_details)

        print("-" * 80)
        print(f"🎯 Overall Accuracy: {accuracy:.3f} ({correct_predictions}/{len(vehicle_details)})")
        print(f"📊 F1 Score (Macro): {f1_macro:.3f}")
        print(f"📊 F1 Score (Weighted): {f1_weighted:.3f}")
        print(f"📊 Precision (Macro): {precision_macro:.3f}")
        print(f"📊 Recall (Macro): {recall_macro:.3f}")

        print("\n🏷️ F1 Score per Class:")
        for i, class_name in enumerate(label_encoder.classes_):
            if i < len(f1_per_class):
                print(f"  {class_name:15s}: {f1_per_class[i]:.3f}")
    except Exception as e:
        print(f"⚠️ Could not calculate F1 scores: {e}")
        f1_macro = f1_weighted = 0.0

    return comparisons, accuracy, f1_macro, f1_weighted


In [44]:
import io
import json
import os
import hashlib

class ContinuousLearningSystem:

    def __init__(self, model_path, scaler_path, le_path, data_path, processed_path, box_client=None, box_folder=None):
        self.model_path = model_path
        self.scaler_path = scaler_path
        self.le_path = le_path
        self.data_path = data_path
        self.processed_path = processed_path
        self.processed_hashes = self.load_processed_hashes()
        self.box_client = box_client
        self.box_folder = box_folder

        # Load existing model artifacts if they exist
        self.model = self.load_model()
        self.scaler = self.load_scaler()
        self.label_encoder = self.load_label_encoder()


    def load_processed_hashes(self):
        try:
            if os.path.exists(self.processed_path):
                with open(self.processed_path, 'r') as f:
                    return set(json.load(f))
            else:
                return set()
        except Exception as e:
            print(f"Error loading processed hashes: {e}")
            return set()
    def load_model(self):
        try:
            if os.path.exists(self.model_path):
                print(f"Loading existing model from {self.model_path}")
                return load_model(self.model_path)
            else:
                print("No existing model found. A new model will be built.")
                return None
        except Exception as e:
            print(f"Error loading model from {self.model_path}: {e}")
            return None

    def load_scaler(self):
        try:
            if os.path.exists(self.scaler_path):
                print(f"Loading existing scaler from {self.scaler_path}")
                return joblib.load(self.scaler_path)
            else:
                print("No existing scaler found.")
                return None
        except Exception as e:
            print(f"Error loading scaler from {self.scaler_path}: {e}")
            return None

    def load_label_encoder(self):
        try:
            if os.path.exists(self.le_path):
                print(f"Loading existing label encoder from {self.le_path}")
                return joblib.load(self.le_path)
            else:
                print("No existing label encoder found.")
                return None
        except Exception as e:
            print(f"Error loading label encoder from {self.le_path}: {e}")
            return None



    def save_processed_hashes(self):
        try:
            with open(self.processed_path, 'w') as f:
                json.dump(list(self.processed_hashes), f)
        except Exception as e:
            print(f"Error saving processed hashes: {e}")

    def csv_stream_hash(self, csv_stream):
        pos = csv_stream.tell()
        csv_stream.seek(0)
        content = csv_stream.read()
        csv_stream.seek(pos)  # Restore position
        return hashlib.sha256(content.encode('utf-8')).hexdigest()

    def stream_all_csv_files_from_box(self, folder):
        """Yield (file_name, csv_stream) for every CSV file in Box folder and subfolders."""
        for item in folder.get_items(limit=2000):
            if item.type == 'folder':
                yield from self.stream_all_csv_files_from_box(self.box_client.folder(item.id))
            elif item.type == 'file' and item.name.endswith('.csv'):
                file_content = item.content()
                csv_stream = io.StringIO(file_content.decode('utf-8'))
                yield item.name, csv_stream

    def get_new_box_files(self):
        """Get new CSV files from Box that have not been processed yet."""
        new_files = []
        if not self.box_client or not self.box_folder:
            print("Box client or folder not configured.")
            return new_files

        for file_name, csv_stream in self.stream_all_csv_files_from_box(self.box_folder):
            file_hash = self.csv_stream_hash(csv_stream)
            if file_hash not in self.processed_hashes:
                new_files.append((file_name, csv_stream, file_hash))
        return new_files


    def update_model(self):
        model = self.model
        scaler = self.scaler
        label_encoder = self.label_encoder

        # fetch all file lists
        train_files = [...]    # wherever you get these
        test_files  = [...]

        # Process existing training & testing data
        train_features, train_labels, _ = self.process_labeled_data_streams(train_files)
        test_features,  test_labels,  _ = self.process_labeled_data_streams(test_files)

        # Check for newly arrived data
        new_files = self.get_new_box_files()
        if not new_files:
            print("No new files to process from Box")
            return model, scaler, label_encoder

        new_features, new_labels, _ = self.process_labeled_data_streams(new_files)
        if not new_features:
            print("No valid features extracted from new Box files")
            return model, scaler, label_encoder

        # (re)fit scaler & encoder if needed
        if scaler is None:
            print("Fitting new scaler on new data")
            static_feats = [f['static'] for f in new_features]
            scaler = StandardScaler().fit(static_feats)
            self.scaler = scaler

        if label_encoder is None:
            print("Fitting new label encoder on new data")
            label_encoder = LabelEncoder().fit(new_labels)
            self.label_encoder = label_encoder

        # —— NEW: split incoming new_features/new_labels 80/20 —— #
        new_train_feats, new_test_feats, new_train_lbls, new_test_lbls = train_test_split(
            new_features,
            new_labels,
            test_size=0.2,
            random_state=42,
            stratify=new_labels  # if you want to preserve class balance
        )

        # Merge splits into your main train/test sets
        train_features.extend(new_train_feats)
        train_labels.extend(new_train_lbls)
        test_features.extend(new_test_feats)
        test_labels.extend(new_test_lbls)

        # Prepare model inputs
        X_ts_train, X_static_train, y_train = self.prepare_new_data(
            train_features, train_labels, scaler, label_encoder
        )
        X_ts_test,  X_static_test,  y_test  = self.prepare_new_data(
            test_features, test_labels, scaler, label_encoder
        )

        # Build model if first time
        if model is None:
            print("Building a new model for initial training.")
            num_features         = X_ts_train.shape[-1] if X_ts_train.size else 3
            static_feature_count = X_static_train.shape[-1] if X_static_train.size else 13
            num_classes          = len(label_encoder.classes_) if label_encoder else 3

            model = build_speed_position_model(
                sequence_length=X_ts_train.shape[1],
                num_features=num_features,
                static_feature_count=static_feature_count,
                num_classes=num_classes
            )
            self.model = model

        # Train on the enlarged training set
        print(f"Training model on {len(train_features)} total training samples")
        model.fit(
            [X_ts_train, X_static_train],
            y_train,
            epochs=3,
            batch_size=32,
            validation_split=0.2
        )

        # (Optional) Evaluate on the enlarged test set
        loss, acc = model.evaluate([X_ts_test, X_static_test], y_test, verbose=0)
        print(f"Test loss: {loss:.4f}, Test accuracy: {acc:.4f}")

        # Save everything
        self.save_model()
        self.save_scaler()
        self.save_label_encoder()

        # Mark new files as processed
        for _, _, fhash in new_files:
            self.processed_hashes.add(fhash)
        self.save_processed_hashes()

        print(f"Updated model with {len(new_features)} new samples "
              f"({len(new_train_feats)}→train, {len(new_test_feats)}→test)")
        return self.model, self.scaler, self.label_encoder

    def process_labeled_data_streams(self, file_streams):
        """Process labeled data from a list of (file_name, csv_stream) tuples."""
        all_features = []
        all_labels = []
        vehicle_details = []
        processed_count = 0

        for file_name, csv_stream, _ in file_streams:
            try:
                print(f"📄 Processing {file_name} from Box stream...")
                df = pd.read_csv(csv_stream)
                df = extract_vehicle_labels(df)
                print(f"   Labels in {file_name}:")
                file_labels = df['behavior_label'].value_counts()
                for label, count in file_labels.items():
                    print(f"     {label}: {count}")
                for idx, row in df.iterrows():
                    if row['behavior_label'] == 'autonomous':
                        continue
                    features = extract_speed_position_features(row)
                    if features is not None:
                        all_features.append(features)
                        all_labels.append(row['behavior_label'])
                        vehicle_details.append({
                            'VehNr': row['VehNr'],
                            'VehTypeName': row['VehTypeName'],
                            'actual_label': row['behavior_label'],
                            'file': file_name
                        })
                        processed_count += 1
                print(f"   ✅ Extracted {processed_count} valid vehicles so far")
            except Exception as e:
                print(f"   ❌ Error processing {file_name}: {e}")

        return all_features, all_labels, vehicle_details


    def save_model(self):
        try:
            if self.model:
                self.model.save(self.model_path)
                print(f"Model saved to {self.model_path}")
        except Exception as e:
            print(f"Error saving model to {self.model_path}: {e}")

    def save_scaler(self):
        try:
            if self.scaler:
                joblib.dump(self.scaler, self.scaler_path)
                print(f"Scaler saved to {self.scaler_path}")
        except Exception as e:
            print(f"Error saving scaler to {self.scaler_path}: {e}")

    def save_label_encoder(self):
        try:
            if self.label_encoder:
                joblib.dump(self.label_encoder, self.le_path)
                print(f"Label encoder saved to {self.le_path}")
        except Exception as e:
            print(f"Error saving label encoder to {self.le_path}: {e}")

    def load_all_data_from_box(self):
        """
        Load all vehicle data from Box into a single list.
        Each item: {'features': ..., 'label': ..., 'details': {...}}
        """
        all_data = []
        file_counter = 0

        for file_name, csv_stream in self.stream_all_csv_files_from_box(self.box_folder):
            fhash = self.csv_stream_hash(csv_stream)
            if fhash in self.processed_hashes:
                continue

            try:
                csv_stream.seek(0)
                df = pd.read_csv(csv_stream)
                df = extract_vehicle_labels(df)

                for _, row in df.iterrows():
                    if row['behavior_label'] == 'autonomous':
                        continue
                    features = extract_speed_position_features(row)
                    if features is not None:
                        record = {
                            'features': features,
                            'label': row['behavior_label'],
                            'details': {
                                'VehNr': row['VehNr'],
                                'VehTypeName': row['VehTypeName'],
                                'file': file_name
                            }
                        }
                        all_data.append(record)
                self.processed_hashes.add(fhash)
                file_counter += 1
                if file_counter ==500: return all_data
                if file_counter % 10 == 0:
                    print(f"Files loaded: {file_counter}")
                    self.save_processed_hashes()
            except Exception as e:
                print(f"Error processing {file_name}: {e}")

        self.save_processed_hashes()
        return all_data


Extract the data from the folder

In [45]:
from google.colab import drive
drive.mount('/content/drive')

DRIVE_PATH = '/content/drive/MyDrive/Test_Output'
MODEL_PATH = f'{DRIVE_PATH}/traffic_model.keras'
SCALER_PATH = f'{DRIVE_PATH}/scaler.joblib'
LE_PATH = f'{DRIVE_PATH}/label_encoder.joblib'
PROCESSED_PATH = f'{DRIVE_PATH}/processed.json'

os.makedirs(DRIVE_PATH, exist_ok=True)

print("✅ All output and data will be stored in:", DRIVE_PATH)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✅ All output and data will be stored in: /content/drive/MyDrive/Test_Output


In [46]:
main_folder_id = '326383492292'
main_folder = client.folder(main_folder_id).get()

# Find the "Parsed Time Series Data" subfolder
parsed_folder = None
for item in main_folder.get_items(limit=100):
    if item.type == 'folder' and item.name == 'Parsed Time Series Data':
        parsed_folder = client.folder(item.id)
        break

if not parsed_folder:
    raise Exception("Parsed Time Series Data folder not found.")

In [47]:
def debug_scenario_composition(scenario_name='scen3_1rep3', check_all_files=True):
    """
    Debug the vehicle composition across all files in a scenario
    """
    print(f"🔍 COMPREHENSIVE SCENARIO ANALYSIS: {scenario_name}")
    print("=" * 80)

    # Navigate to the specific folder
    main_folder = client.folder(main_folder_id)
    parsed_folder = None

    for item in main_folder.get_items():
        if item.name == 'Parsed Time Series Data':
            parsed_folder = client.folder(item.id)
            break

    target_folder = None
    for item in parsed_folder.get_items():
        if item.name == scenario_name:
            target_folder = client.folder(item.id)
            break

    if not target_folder:
        print(f"❌ Folder {scenario_name} not found")
        return

    # Aggregate statistics across all files
    total_vehicles = {}
    all_vehicle_types = set()
    file_count = 0

    # Check all CSV files in the folder
    for file_item in target_folder.get_items(limit=1000):
        if file_item.type == 'file' and file_item.name.endswith('.csv'):
            file_count += 1

            try:
                content = file_item.content()
                df = pd.read_csv(io.StringIO(content.decode('utf-8')))

                if 'VehTypeName' in df.columns:
                    # Count vehicle types in this file
                    veh_counts = df['VehTypeName'].value_counts()

                    for vtype, count in veh_counts.items():
                        vtype_clean = str(vtype).strip()
                        all_vehicle_types.add(vtype_clean)

                        if vtype_clean not in total_vehicles:
                            total_vehicles[vtype_clean] = 0
                        total_vehicles[vtype_clean] += count

                if not check_all_files and file_count >= 5:  # Sample first 5 files if not checking all
                    break

            except Exception as e:
                print(f"❌ Error reading {file_item.name}: {e}")

    print(f"\n📊 Analyzed {file_count} files")
    print(f"\n🚗 All Vehicle Types Found:")
    for vtype in sorted(all_vehicle_types):
        print(f"   '{vtype}'")

    # Calculate percentages
    grand_total = sum(total_vehicles.values())
    print(f"\n📈 Vehicle Distribution (Total: {grand_total} vehicles):")
    for vtype, count in sorted(total_vehicles.items(), key=lambda x: x[1], reverse=True):
        percentage = (count / grand_total) * 100
        print(f"   {vtype}: {count} ({percentage:.1f}%)")

    # Map to behavior categories
    behavior_counts = {
        'autonomous': 0,
        'aggressive': 0,
        'cooperative': 0,
        'normal': 0
    }

    for vtype, count in total_vehicles.items():
        if 'CAV' in vtype:
            behavior_counts['autonomous'] += count
        elif 'Aggressive' in vtype:
            behavior_counts['aggressive'] += count
        elif 'Cooperative' in vtype:
            behavior_counts['cooperative'] += count
        elif 'Conventional' in vtype:
            behavior_counts['normal'] += count


    print(f"\n🏷️ Behavior Category Distribution:")
    for behavior, count in behavior_counts.items():
        percentage = (count / grand_total) * 100 if grand_total > 0 else 0
        print(f"   {behavior}: {count} ({percentage:.1f}%)")


    return all_vehicle_types, total_vehicles

# Run the analysis
all_types, counts = debug_scenario_composition('scen3_5rep3', check_all_files=True)

🔍 COMPREHENSIVE SCENARIO ANALYSIS: scen3_5rep3

📊 Analyzed 61 files

🚗 All Vehicle Types Found:
   'CAV'
   'HDV Aggressive'
   'HDV Conventional Gipps Model'
   'HDV Cooperative'

📈 Vehicle Distribution (Total: 35133 vehicles):
   HDV Conventional Gipps Model: 14028 (39.9%)
   HDV Cooperative: 8891 (25.3%)
   HDV Aggressive: 8671 (24.7%)
   CAV: 3543 (10.1%)

🏷️ Behavior Category Distribution:
   autonomous: 3543 (10.1%)
   aggressive: 8671 (24.7%)
   cooperative: 8891 (25.3%)
   normal: 14028 (39.9%)


In [48]:
# RATE-LIMITED ULTRA TRAINER - RESPECTS BOX API LIMITS
# Fixed version that won't hit rate limits

import os
import time
import gc
import pickle
import json
import numpy as np
import pandas as pd
import io
import threading
import random
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple, Optional
import warnings
warnings.filterwarnings('ignore')

# ====== GPU MEMORY MANAGEMENT ======
print("🔧 Setting up GPU memory management...")
import tensorflow as tf
try:
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        tf.config.experimental.set_memory_growth(gpus[0], True)
        print(f"✅ GPU memory growth enabled: {gpus[0]}")
    else:
        print("⚠️ No GPU detected - using CPU")
except Exception as e:
    print(f"⚠️ GPU setup warning: {e}")

# ====== RATE LIMITER CLASS ======
class BoxAPIRateLimiter:
    """
    Handles Box API rate limiting and retry logic
    """
    def __init__(self, max_requests_per_second=8, max_retries=5):
        self.max_requests_per_second = max_requests_per_second
        self.max_retries = max_retries
        self.last_request_time = 0
        self.request_count = 0
        self.window_start = time.time()
        self.lock = threading.Lock()

    def wait_if_needed(self):
        """Wait if we're approaching rate limits"""
        with self.lock:
            current_time = time.time()

            # Reset counter every second
            if current_time - self.window_start >= 1.0:
                self.request_count = 0
                self.window_start = current_time

            # If we're at the limit, wait
            if self.request_count >= self.max_requests_per_second:
                sleep_time = 1.0 - (current_time - self.window_start)
                if sleep_time > 0:
                    time.sleep(sleep_time)
                self.request_count = 0
                self.window_start = time.time()

            self.request_count += 1

    def execute_with_retry(self, func, *args, **kwargs):
        """Execute function with automatic retry on rate limit"""
        for attempt in range(self.max_retries):
            try:
                self.wait_if_needed()
                return func(*args, **kwargs)

            except Exception as e:
                error_str = str(e).lower()

                # Check if it's a rate limit error
                if '429' in error_str or 'rate limit' in error_str or 'too many requests' in error_str:
                    wait_time = 2 ** attempt + random.uniform(0, 1)  # Exponential backoff
                    print(f"⚠️ Rate limited (attempt {attempt + 1}), waiting {wait_time:.1f}s...")
                    time.sleep(wait_time)
                    continue
                else:
                    # Not a rate limit error, re-raise
                    raise e

        # All retries exhausted
        raise Exception(f"Max retries ({self.max_retries}) exceeded for Box API call")

# Global rate limiter
rate_limiter = BoxAPIRateLimiter(max_requests_per_second=6)  # Conservative limit

# ====== RUNTIME KEEP-ALIVE SYSTEM ======
class RuntimeKeepAlive:
    def __init__(self):
        self.running = False
        self.thread = None

    def start(self):
        if not self.running:
            self.running = True
            self.thread = threading.Thread(target=self._keep_alive, daemon=True)
            self.thread.start()
            print("🔄 Runtime keep-alive started")

    def stop(self):
        self.running = False
        print("🛑 Runtime keep-alive stopped")

    def _keep_alive(self):
        while self.running:
            time.sleep(1800)  # 30 minutes
            if self.running:
                print(f"💓 Runtime alive at {time.strftime('%H:%M:%S')}")

# Start keep-alive
keep_alive = RuntimeKeepAlive()
keep_alive.start()

# ====== PROGRESS MONITOR ======
class ProgressMonitor:
    def __init__(self, checkpoint_dir):
        self.checkpoint_dir = checkpoint_dir
        self.progress_file = f"{checkpoint_dir}/progress.json"

    def update(self, **kwargs):
        try:
            progress = {
                'timestamp': time.time(),
                'time_str': time.strftime('%Y-%m-%d %H:%M:%S'),
                **kwargs
            }

            with open(self.progress_file, 'w') as f:
                json.dump(progress, f, indent=2)

        except Exception as e:
            print(f"⚠️ Progress update failed: {e}")

    def show(self):
        try:
            if os.path.exists(self.progress_file):
                with open(self.progress_file, 'r') as f:
                    progress = json.load(f)

                print(f"📊 CURRENT PROGRESS:")
                print(f"   Time: {progress.get('time_str', 'Unknown')}")
                print(f"   Files processed: {progress.get('files_processed', 0):,}")
                print(f"   Samples collected: {progress.get('samples_collected', 0):,}")
                print(f"   Current batch: {progress.get('batch_num', 0)}")
                print(f"   Phase: {progress.get('phase', 'Unknown')}")
                return progress
            else:
                print("📊 No progress data found")
                return {}
        except Exception as e:
            print(f"❌ Error reading progress: {e}")
            return {}

# ====== RATE-LIMITED TRAINER ======
class RateLimitedTrainer:
    """
    Trainer that respects Box API rate limits
    """

    def __init__(self, client, main_folder_id, checkpoint_dir='/content/drive/MyDrive/Training_Checkpoints'):
        self.client = client
        self.main_folder_id = main_folder_id
        self.checkpoint_dir = checkpoint_dir

        # RATE-LIMITED SETTINGS (Respects API limits)
        self.batch_size = 100           # Smaller batches to reduce API calls
        self.max_workers = 4            # Fewer workers to avoid overwhelming API
        self.checkpoint_interval = 500   # Checkpoint more frequently
        self.memory_cleanup_interval = 300
        self.max_rows_per_file = 500    # Balance between speed and data quality
        self.max_sequence_length = 50   # Good balance

        # Rate limiting
        self.rate_limiter = rate_limiter

        # Create directories
        os.makedirs(checkpoint_dir, exist_ok=True)

        # File paths
        self.processed_files_log = f"{checkpoint_dir}/processed_files.json"
        self.features_checkpoint = f"{checkpoint_dir}/features_checkpoint.pkl"
        self.progress_monitor = ProgressMonitor(checkpoint_dir)

        print(f"🚀 Rate-Limited Trainer initialized")
        print(f"   Batch size: {self.batch_size} (API-friendly)")
        print(f"   Workers: {self.max_workers} (rate-limited)")
        print(f"   API rate limit: 6 requests/second")
        print(f"   Retry logic: Enabled")

    def save_checkpoint(self, processed_files, all_features, all_labels, batch_num):
        """Save training progress"""
        try:
            print(f"💾 Saving checkpoint at batch {batch_num}...")

            with open(self.processed_files_log, 'w') as f:
                json.dump(processed_files, f)

            checkpoint_data = {
                'features': all_features,
                'labels': all_labels,
                'batch_num': batch_num,
                'timestamp': time.time()
            }

            with open(self.features_checkpoint, 'wb') as f:
                pickle.dump(checkpoint_data, f, protocol=pickle.HIGHEST_PROTOCOL)

            self.progress_monitor.update(
                batch_num=batch_num,
                files_processed=len(processed_files),
                samples_collected=len(all_features),
                phase='data_processing'
            )

            print(f"✅ Checkpoint saved: {len(processed_files):,} files, {len(all_features):,} samples")

        except Exception as e:
            print(f"❌ Checkpoint save failed: {e}")

    def load_checkpoint(self):
        """Load previous training progress"""
        print("🔄 Checking for previous checkpoints...")

        if not os.path.exists(self.processed_files_log):
            print("No previous checkpoint found. Starting fresh.")
            return [], [], [], 0

        try:
            with open(self.processed_files_log, 'r') as f:
                processed_files = json.load(f)

            with open(self.features_checkpoint, 'rb') as f:
                checkpoint_data = pickle.load(f)

            print(f"📚 Checkpoint loaded!")
            print(f"   Files processed: {len(processed_files):,}")
            print(f"   Samples: {len(checkpoint_data['features']):,}")
            print(f"   Last batch: {checkpoint_data['batch_num']}")

            return (processed_files, checkpoint_data['features'],
                   checkpoint_data['labels'], checkpoint_data['batch_num'])

        except Exception as e:
            print(f"❌ Error loading checkpoint: {e}")
            return [], [], [], 0

    def get_csv_files_rate_limited(self, max_files=80000):
        """Rate-limited CSV discovery with rep1/rep2/rep3 filtering"""
        print(f"📡 RATE-LIMITED CSV DISCOVERY (max {max_files:,} files)")
        print("🎯 FILTERING: Only rep1, rep2, rep3 scenarios")
        print("=" * 60)

        self.progress_monitor.update(phase='file_discovery', files_target=max_files)

        # Check cache
        """
        cache_file = f"{self.checkpoint_dir}/csv_files_cache_filtered.json"

        if os.path.exists(cache_file):
            print("📁 Loading cached filtered file list...")
            try:
                with open(cache_file, 'r') as f:
                    cached_files = json.load(f)

                if len(cached_files) >= max_files:
                    print(f"✅ Using {len(cached_files[:max_files]):,} files from cache (rep1/rep2/rep3 only)")
                    return cached_files[:max_files]
            except:
                print("Cache corrupted, rescanning...")


        """
        start_time = time.time()
        # Navigate with rate limiting
        def get_folder_items_safe(folder, limit=1000):
            """Get folder items with rate limiting"""
            return self.rate_limiter.execute_with_retry(
                lambda: list(folder.get_items(limit=limit))
            )

        # Get main folder
        main_folder = self.client.folder(self.main_folder_id)

        # Find Parsed Time Series Data
        print("🔍 Finding Parsed Time Series Data folder...")
        parsed_folder = None

        main_items = get_folder_items_safe(main_folder, limit=100)
        for item in main_items:
            if item.name == 'Parsed Time Series Data':
                parsed_folder = self.client.folder(item.id)
                break

        if not parsed_folder:
            print("❌ Parsed Time Series Data folder not found")
            return []

        # Get scenario folders with rate limiting and filtering
        print("📁 Getting scenario folders (filtering for rep1/rep2/rep3)...")
        scenario_folders = []
        filtered_out_folders = []

        parsed_items = get_folder_items_safe(parsed_folder, limit=500)
        for item in parsed_items:
            if item.type == 'folder':
                folder_name = item.name

                # FILTER: Only include rep1, rep2, rep3
                if (folder_name.endswith('rep1') or
                    folder_name.endswith('rep2') or
                    folder_name.endswith('rep3')):
                    scenario_folders.append((folder_name, item.id))
                    print(f"   ✅ Including: {folder_name}")
                else:
                    filtered_out_folders.append(folder_name)
                    print(f"   ❌ Filtering out: {folder_name}")

        print(f"\n📊 Filtering results:")
        print(f"   ✅ Included scenarios: {len(scenario_folders)}")
        print(f"   ❌ Filtered out scenarios: {len(filtered_out_folders)}")

        if filtered_out_folders:
            print(f"   📋 Filtered out scenarios:")
            for folder in filtered_out_folders[:10]:  # Show first 10
                print(f"      - {folder}")
            if len(filtered_out_folders) > 10:
                print(f"      ... and {len(filtered_out_folders) - 10} more")

        if not scenario_folders:
            print("❌ No rep1/rep2/rep3 scenarios found!")
            return []

        # Rate-limited folder scanning
        all_csv_files = []

        def scan_folder_safe(folder_info):
            """Scan folder with rate limiting"""
            folder_name, folder_id = folder_info
            csv_files = []

            try:
                folder = self.client.folder(folder_id)

                # Rate-limited file listing
                items = self.rate_limiter.execute_with_retry(
                    lambda: list(folder.get_items(limit=2000))
                )

                for item in items:
                    if item.type == 'file' and item.name.endswith('.csv'):
                        csv_files.append({
                            'name': item.name,
                            'id': item.id,
                            'scenario': folder_name
                        })

                print(f"   📊 {folder_name}: {len(csv_files)} CSV files")

            except Exception as e:
                print(f"⚠️ Folder {folder_name} failed: {e}")

            return csv_files

        # Sequential processing to respect rate limits
        print(f"\n📄 Scanning {len(scenario_folders)} filtered folders for CSV files...")
        for folder_info in scenario_folders:
            csv_files = scan_folder_safe(folder_info)
            all_csv_files.extend(csv_files)

            if len(all_csv_files) % 1000 == 0:
                elapsed = time.time() - start_time
                print(f"   Found {len(all_csv_files):,} CSV files ({elapsed:.1f}s)")

            if len(all_csv_files) >= max_files:
                break

            # Small delay between folders to be extra safe
            time.sleep(0.1)

        # Analyze scenario distribution
        scenario_counts = {}
        for file_info in all_csv_files:
            scenario = file_info['scenario']
            scenario_counts[scenario] = scenario_counts.get(scenario, 0) + 1

        print(f"\n📊 CSV files by scenario (rep1/rep2/rep3 only):")
        for scenario, count in sorted(scenario_counts.items()):
            print(f"   {scenario}: {count:,} files")

        # Cache results
        try:
            with open(cache_file, 'w') as f:
                json.dump(all_csv_files[:max_files], f)
            print(f"💾 Cached {len(all_csv_files):,} filtered files")
        except:
            pass

        elapsed = time.time() - start_time
        print(f"\n✅ Filtered discovery complete:")
        print(f"   Total CSV files: {len(all_csv_files):,}")
        print(f"   Only rep1/rep2/rep3: ✅")
        print(f"   Discovery time: {elapsed:.1f}s")

        return all_csv_files[:max_files]

    def process_single_file_safe(self, file_info):
        """Process single file with rate limiting"""
        try:
            file_id = file_info['id']
            filename = file_info['name']

            # Rate-limited file download
            def download_file():
                file_obj = self.client.file(file_id)
                return file_obj.content()

            content = self.rate_limiter.execute_with_retry(download_file)

            # Fast CSV parsing
            df = pd.read_csv(
                io.StringIO(content.decode('utf-8')),
                usecols=[ 'VehTypeName', 'Speeds', 'VehFrontCoords'],
                nrows=self.max_rows_per_file,
                dtype={'VehTypeName': 'str'},
                low_memory=False,
                engine='c'
            )


            # DEBUG: Check what vehicle types we have
            if self.debug_first_file and filename == self.debug_first_file:
                print(f"\n🔍 DEBUG: Analyzing {filename}")
                print(f"Total rows in file: {len(df)}")
                print(f"Unique VehTypeName values:")
                for vtype in df['VehTypeName'].unique():
                    print(f"  - '{vtype}'")
                self.debug_first_file = None  # Only debug once

            if len(df) == 0:
                return [], [], filename

            # Fast label mapping
            def fast_map(veh_type):
                s = str(veh_type)
                if 'CAV' in s: return 'autonomous'
                elif 'Aggressive' in s: return 'aggressive'
                elif 'Cooperative' in s: return 'cooperative'
                elif 'Conventional' in s: return 'normal'

            df['behavior_label'] = df['VehTypeName'].apply(fast_map)


            # DEBUG: Check label distribution after mapping
            if not hasattr(self, 'file_count'):
                self.file_count = 0
            self.file_count += 1

            if self.file_count <= 5:  # Debug first 5 files
                print(f"\n📊 File {self.file_count}: {filename}")
                label_counts = df['behavior_label'].value_counts()
                print(f"Labels after mapping:")
                for label, count in label_counts.items():
                    print(f"  {label}: {count}")

            # Filter and process
            valid_df = df[
                (df['behavior_label'] != 'autonomous') &
                (df['Speeds'].notna()) &
                (df['VehFrontCoords'].notna())
            ]


            # DEBUG: Check what happened after filtering
            if self.file_count <= 5:
                if len(valid_df) > 0:
                    valid_label_counts = valid_df['behavior_label'].value_counts()
                    print(f"Labels after filtering:")
                    for label, count in valid_label_counts.items():
                        print(f"  {label}: {count}")
                else:
                    print(f"  No valid rows after filtering!")

            # Check if we lost any normal vehicles
            if 'normal' in label_counts and ('normal' not in valid_label_counts or valid_label_counts.get('normal', 0) == 0):
                print(f"  ⚠️ WARNING: Lost all 'normal' vehicles during filtering!")

            if len(valid_df) == 0:
                return [], [], filename

            # Feature extraction
            features_list = []
            labels_list = []

            for _, row in valid_df.iterrows():
                try:
                    speeds_str = str(row['Speeds']).replace('inf', '0').replace('nan', '0').replace('-inf', '0')
                    coords_str = str(row['VehFrontCoords']).replace('inf', '0').replace('nan', '0').replace('-inf', '0')

                    speeds = eval(speeds_str)
                    coords = eval(coords_str)

                    if len(speeds) >= 5 and len(coords) >= 5:
                        max_len = self.max_sequence_length
                        speeds = speeds[:max_len]
                        coords = coords[:max_len]

                        min_len = min(len(speeds), len(coords))

                        features = {
                            'speeds': np.array(speeds[:min_len], dtype=np.float32).reshape(-1, 1),
                            'positions': np.array(coords[:min_len], dtype=np.float32),
                            'sequence_length': min_len
                        }

                        features_list.append(features)
                        labels_list.append(row['behavior_label'])

                except:
                    continue

            return features_list, labels_list, filename

        except Exception as e:
            return [], [], file_info.get('name', 'unknown')

    def process_files_rate_limited(self, csv_files):
        """Process files with proper rate limiting"""
        print(f"📡 RATE-LIMITED PROCESSING {len(csv_files):,} files")
        print(f"Workers: {self.max_workers} (API-safe)")
        print("=" * 60)

        # Load checkpoint
        processed_files, all_features, all_labels, start_batch = self.load_checkpoint()
        processed_files_set = set(processed_files)

        # Filter remaining files
        remaining_files = [f for f in csv_files if f['name'] not in processed_files_set]

        if not remaining_files:
            print("✅ All files already processed!")
            return all_features, all_labels

        print(f"📊 Processing {len(remaining_files):,} remaining files")

        batch_size = self.batch_size
        total_batches = (len(remaining_files) + batch_size - 1) // batch_size

        start_time = time.time()
        files_processed = len(processed_files)

        for batch_idx in range(start_batch, total_batches):
            batch_start = batch_idx * batch_size
            batch_end = min(batch_start + batch_size, len(remaining_files))
            batch_files = remaining_files[batch_start:batch_end]

            print(f"\n📦 RATE-LIMITED BATCH {batch_idx + 1}/{total_batches}")
            print(f"   Files {batch_start + 1:,}-{batch_end:,}")

            batch_start_time = time.time()

            # Rate-limited parallel processing
            with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
                future_to_file = {
                    executor.submit(self.process_single_file_safe, file_info): file_info
                    for file_info in batch_files
                }

                batch_features = []
                batch_labels = []
                batch_processed = []

                for future in as_completed(future_to_file):
                    try:
                        features_list, labels_list, filename = future.result(timeout=60)

                        if features_list:
                            batch_features.extend(features_list)
                            batch_labels.extend(labels_list)

                        batch_processed.append(filename)
                        files_processed += 1

                    except Exception as e:
                        print(f"⚠️ File processing failed: {e}")
                        continue

            # Add batch results
            all_features.extend(batch_features)
            all_labels.extend(batch_labels)
            processed_files.extend(batch_processed)

            batch_time = time.time() - batch_start_time
            total_time = time.time() - start_time

            # Progress stats
            rate = files_processed / total_time if total_time > 0 else 0
            eta = (len(remaining_files) - files_processed) / rate if rate > 0 else 0

            print(f"   ✅ Batch complete: {len(batch_features):,} samples ({batch_time:.1f}s)")
            print(f"   📊 Progress: {files_processed:,}/{len(remaining_files):,} files")
            print(f"   📡 Rate: {rate:.1f} files/sec (API-safe)")
            print(f"   ⏱️ ETA: {eta/60:.1f} min")
            print(f"   💾 Total samples: {len(all_features):,}")

            # Update progress
            self.progress_monitor.update(
                batch_num=batch_idx + 1,
                files_processed=files_processed,
                samples_collected=len(all_features),
                processing_rate=rate,
                eta_minutes=eta/60
            )

            # Checkpoint more frequently
            if (batch_idx + 1) % (self.checkpoint_interval // batch_size) == 0:
                self.save_checkpoint(processed_files, all_features, all_labels, batch_idx + 1)

            # Memory cleanup
            if (batch_idx + 1) % (self.memory_cleanup_interval // batch_size) == 0:
                print("🧹 Memory cleanup...")
                del batch_features, batch_labels
                gc.collect()

        # Final checkpoint
        self.save_checkpoint(processed_files, all_features, all_labels, total_batches)

        total_time = time.time() - start_time
        print(f"\n✅ RATE-LIMITED PROCESSING COMPLETE")
        print(f"   Files processed: {files_processed:,}")
        print(f"   Total samples: {len(all_features):,}")
        print(f"   Total time: {total_time:.1f}s ({total_time/60:.1f} min)")
        print(f"   Safe rate: {files_processed/total_time:.1f} files/sec")

        return all_features, all_labels

    def train_with_rate_limits(self, max_files=80000):
        """Train with proper rate limiting"""
        print("📡 RATE-LIMITED 80K FILE TRAINING")
        print("=" * 70)

        # Initialize debug flags
        self.debug_first_file = True  # Will be set to first filename
        self.file_count = 0
        self.unexpected_types = set()



        training_start = time.time()

        self.progress_monitor.update(phase='starting', max_files=max_files)

        # Step 1: Rate-limited file discovery
        print("🔍 Step 1: Rate-limited file discovery...")
        csv_files = self.get_csv_files_rate_limited(max_files=max_files)

        if not csv_files:
            print("❌ No CSV files found")
            return None

        print(f"🎯 Will process {len(csv_files):,} files with rate limiting")

        # Step 2: Rate-limited processing
        print("\n📡 Step 2: Rate-limited processing...")
        all_features, all_labels = self.process_files_rate_limited(csv_files)

        if not all_features:
            print("❌ No features extracted")
            return None


        # Step 3: Data analysis
        print(f"\n📊 Step 3: Analyzing dataset...")
        from collections import Counter
        label_distribution = Counter(all_labels)

        print(f"RATE-LIMITED DATASET SUMMARY:")
        print(f"   Files processed: {len(csv_files):,}")
        print(f"   Total vehicles: {len(all_features):,}")
        print(f"   Label distribution:")
        for label, count in label_distribution.items():
            percentage = (count / len(all_labels)) * 100
            print(f"     {label}: {count:,} ({percentage:.1f}%)")

        self.progress_monitor.update(
            phase='data_preparation',
            total_samples=len(all_features),
            label_distribution=dict(label_distribution)
        )

        # Step 4: Data preparation
        print(f"\n🔧 Step 4: Data preparation...")

        def prepare_data_safe(features_list, max_sequence=50):
            total_samples = len(features_list)

            X_speed = np.zeros((total_samples, max_sequence, 1), dtype=np.float32)
            X_pos = np.zeros((total_samples, max_sequence, 2), dtype=np.float32)

            for i, features in enumerate(features_list):
                try:
                    speeds = features['speeds'][:max_sequence]
                    positions = features['positions'][:max_sequence]
                    seq_len = min(len(speeds), len(positions), max_sequence)

                    X_speed[i, :seq_len, 0] = speeds[:seq_len, 0]
                    X_pos[i, :seq_len, :] = positions[:seq_len, :]

                except:
                    continue

                if i % 10000 == 0:
                    print(f"   Processed {i:,}/{total_samples:,} samples")

            return X_speed, X_pos

        X_speed, X_pos = prepare_data_safe(all_features, max_sequence=self.max_sequence_length)

        # Clean up
        del all_features
        gc.collect()

        print(f"✅ Data prepared: {X_speed.shape[0]:,} samples")

        # Continue with training (same as before)
        split_idx = int(0.8 * len(all_labels))

        X_speed_train = X_speed[:split_idx]
        X_speed_test = X_speed[split_idx:]
        X_pos_train = X_pos[:split_idx]
        X_pos_test = X_pos[split_idx:]

        train_labels = all_labels[:split_idx]
        test_labels = all_labels[split_idx:]

        print(f"Train samples: {len(train_labels):,}")
        print(f"Test samples: {len(test_labels):,}")

        # Label encoding
        from sklearn.preprocessing import LabelEncoder
        label_encoder = LabelEncoder()
        label_encoder.fit(all_labels)

        y_train = label_encoder.transform(train_labels)
        y_test = label_encoder.transform(test_labels)


        from collections import Counter
        print(f"Label distribution in all_labels: {Counter(all_labels)}")
        print(f"Unique labels: {set(all_labels)}")
        print(f"Total samples: {len(all_labels)}")

        print(f"Classes: {label_encoder.classes_}")

        self.progress_monitor.update(
            phase='model_training',
            train_samples=len(y_train),
            test_samples=len(y_test),
            classes=list(label_encoder.classes_)
        )

        # Model training
        print(f"\n🚀 Step 5: Model training...")

        num_classes = len(label_encoder.classes_)
        model = build_speed_position_model_fixed(
            sequence_length=self.max_sequence_length,
            num_classes=num_classes
        )

        print(f"Training with rate-limited dataset...")
        print(f"Training samples: {len(y_train):,}")

        try:
            callbacks = [
                tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
                tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3),
            ]

            history = model.fit(
                [X_speed_train, X_pos_train], y_train,
                epochs=20,
                batch_size=256,
                validation_split=0.15,
                callbacks=callbacks,
                verbose=1
            )

            print("✅ Training completed!")

            # Evaluation
            print(f"\n📊 Evaluation...")

            y_pred_probs = model.predict([X_speed_test, X_pos_test], batch_size=512, verbose=0)
            y_pred = np.argmax(y_pred_probs, axis=1)

            accuracy = np.mean(y_test == y_pred)

            from sklearn.metrics import classification_report, f1_score
            f1_macro = f1_score(y_test, y_pred, average='macro')

            print(f"RATE-LIMITED RESULTS:")
            print(f"   Test Accuracy: {accuracy:.3f}")
            print(f"   F1 Score (macro): {f1_macro:.3f}")

            print(f"\nClassification Report:")
            print(classification_report(y_test, y_pred, target_names=label_encoder.classes_))

            # Save model
            print(f"\n💾 Saving model...")

            model.save(f"{self.checkpoint_dir}/rate_limited_80k_model.keras")

            cl_system.model = model
            cl_system.label_encoder = label_encoder
            cl_system.save_model()
            cl_system.save_label_encoder()

            total_time = time.time() - training_start

            self.progress_monitor.update(
                phase='completed',
                total_time=total_time,
                final_accuracy=accuracy,
                f1_macro=f1_macro
            )

            print(f"\n🎉 RATE-LIMITED TRAINING COMPLETE!")
            print(f"=" * 60)
            print(f"Files processed: {len(csv_files):,}")
            print(f"Total vehicles: {len(all_labels):,}")
            print(f"Training time: {total_time:.1f}s ({total_time/3600:.1f} hours)")
            print(f"Final accuracy: {accuracy:.3f}")
            print(f"API-safe: No rate limit errors")

            return {
                'model': model,
                'label_encoder': label_encoder,
                'history': history,
                'accuracy': accuracy,
                'f1_macro': f1_macro,
                'files_processed': len(csv_files),
                'total_vehicles': len(all_labels),
                'training_time': total_time,
                'label_distribution': label_distribution
            }

        except Exception as e:
            print(f"❌ Training failed: {e}")
            import traceback
            traceback.print_exc()
            return None

        # At the end, check how many normal vehicles made it through
        normal_extracted = sum(1 for label in labels_list if label == 'normal')
        print(f"   Normal vehicles: {normal_in_valid} in valid_df -> {normal_extracted} extracted")


        # DEBUG: Print summary of what we found
        print(f"\n🔍 DEBUG SUMMARY:")
        print(f"Total files processed: {files_processed}")
        print(f"Total samples collected: {len(all_features)}")

        # Check label distribution
        from collections import Counter
        label_counts = Counter(all_labels)
        print(f"Final label distribution:")
        for label, count in label_counts.items():
            print(f"  {label}: {count}")

        if hasattr(self, 'unexpected_types') and self.unexpected_types:
            print(f"\nUnexpected vehicle types found:")
            for vtype in sorted(self.unexpected_types):
                print(f"  - '{vtype}'")

    def debug_process_single_file_ultra_fast(self, file_info):
      """Debug version with label tracking"""
      try:
          file_id = file_info['id']
          filename = file_info['name']

          # Get file and download
          file_obj = self.client.file(file_id)
          content = file_obj.content()

          # Ultra-fast CSV parsing
          df = pd.read_csv(
              io.StringIO(content.decode('utf-8')),
              usecols=['VehNr', 'VehTypeName', 'Speeds', 'VehFrontCoords'],
              nrows=self.max_rows_per_file,
              dtype={'VehNr': 'str', 'VehTypeName': 'str'},
              low_memory=False,
              engine='c'
          )

          if len(df) == 0:
              return [], [], filename, {}

          # Debug: Check unique VehTypeName values
          unique_types = df['VehTypeName'].unique()

          # Ultra-fast label mapping with debug info
          def ultra_fast_map(veh_type):
              s = str(veh_type)
              if 'CAV' in s: return 'autonomous'
              elif 'Aggressive' in s: return 'aggressive'
              elif 'Cooperative' in s: return 'cooperative'
              elif 'Conventional' in s or 'Gipps' in s or 'Normal' in s: return 'normal'
              else: return 'normal'

          df['behavior_label'] = df['VehTypeName'].apply(ultra_fast_map)

          # Debug: Count labels before filtering
          label_counts_before = df['behavior_label'].value_counts().to_dict()

          # Filter and process
          valid_df = df[
              (df['behavior_label'] != 'autonomous') &
              (df['Speeds'].notna()) &
              (df['VehFrontCoords'].notna())
          ]

          # Debug: Count labels after filtering
          label_counts_after = valid_df['behavior_label'].value_counts().to_dict()

          debug_info = {
              'unique_types': list(unique_types),
              'labels_before_filter': label_counts_before,
              'labels_after_filter': label_counts_after
          }

          if len(valid_df) == 0:
              return [], [], filename, debug_info

          # Rest of processing...
          features_list = []
          labels_list = []

          for _, row in valid_df.iterrows():
              try:
                  speeds_str = str(row['Speeds']).replace('inf', '0').replace('nan', '0').replace('-inf', '0')
                  coords_str = str(row['VehFrontCoords']).replace('inf', '0').replace('nan', '0').replace('-inf', '0')

                  speeds = eval(speeds_str)
                  coords = eval(coords_str)

                  if len(speeds) >= 3 and len(coords) >= 3:
                      max_len = self.max_sequence_length
                      speeds = speeds[:max_len]
                      coords = coords[:max_len]

                      min_len = min(len(speeds), len(coords))

                      features = {
                          'speeds': np.array(speeds[:min_len], dtype=np.float32).reshape(-1, 1),
                          'positions': np.array(coords[:min_len], dtype=np.float32),
                          'sequence_length': min_len
                      }

                      features_list.append(features)
                      labels_list.append(row['behavior_label'])

              except:
                  continue

          return features_list, labels_list, filename, debug_info

      except Exception as e:
          print(f"Error processing file: {e}")
          return [], [], file_info.get('name', 'unknown'), {}
# ====== MONITORING FUNCTIONS ======
def show_progress():
    """Show current training progress"""
    try:
        monitor = ProgressMonitor('/content/drive/MyDrive/Training_Checkpoints')
        return monitor.show()
    except Exception as e:
        print(f"❌ Error showing progress: {e}")

def resume_training():
    """Resume training from checkpoint"""
    print("🔄 RESUMING RATE-LIMITED TRAINING FROM CHECKPOINT")
    trainer = RateLimitedTrainer(client, main_folder_id)
    return trainer.train_with_rate_limits(max_files=80000)

# ====== MAIN EXECUTION ======
print("📡 RATE-LIMITED 80K TRAINER READY")
print("=" * 70)
print("🔧 Rate limiting features:")
print("   ✅ 6 requests/second limit")
print("   ✅ Exponential backoff retry")
print("   ✅ 4 workers (API-safe)")
print("   ✅ 100 files per batch")
print("   ✅ Connection pool management")
print("   ✅ Auto-retry on 429 errors")
print()
print("📋 Available commands:")
print("   show_progress() - Check current progress")
print("   resume_training() - Resume from checkpoint")

# Create and start rate-limited trainer
print("\n📡 STARTING RATE-LIMITED 80K FILE TRAINING")
print("=" * 70)

trainer = RateLimitedTrainer(client, main_folder_id)

print(f"Rate-Limited Configuration:")
print(f"   Max files: 80,000")
print(f"   Workers: 4 (API-safe)")
print(f"   Rate limit: 6 requests/second")
print(f"   Batch size: 100 files")
print(f"   Auto-retry: Enabled")
print(f"   Checkpoints: Every 500 files")

# Start rate-limited training
result = trainer.train_with_rate_limits(max_files=80000)

if result:
    print(f"\n🎉 RATE-LIMITED SUCCESS!")
    print(f"   Files processed: {result['files_processed']:,}")
    print(f"   Total vehicles: {result['total_vehicles']:,}")
    print(f"   Training time: {result['training_time']/3600:.1f} hours")
    print(f"   Final accuracy: {result['accuracy']:.3f}")
    print(f"   No API errors: ✅")

    # Stop keep-alive
    keep_alive.stop()
else:
    print(f"\n❌ Training failed - run resume_training() to continue")

print(f"\n💾 All checkpoints saved to: /content/drive/MyDrive/Training_Checkpoints")
print(f"📡 API-safe trainer - no more 429 errors!")

🔧 Setting up GPU memory management...
✅ GPU memory growth enabled: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
🔄 Runtime keep-alive started
📡 RATE-LIMITED 80K TRAINER READY
🔧 Rate limiting features:
   ✅ 6 requests/second limit
   ✅ Exponential backoff retry
   ✅ 4 workers (API-safe)
   ✅ 100 files per batch
   ✅ Connection pool management
   ✅ Auto-retry on 429 errors

📋 Available commands:
   show_progress() - Check current progress
   resume_training() - Resume from checkpoint

📡 STARTING RATE-LIMITED 80K FILE TRAINING
🚀 Rate-Limited Trainer initialized
   Batch size: 100 (API-friendly)
   Workers: 4 (rate-limited)
   API rate limit: 6 requests/second
   Retry logic: Enabled
Rate-Limited Configuration:
   Max files: 80,000
   Workers: 4 (API-safe)
   Rate limit: 6 requests/second
   Batch size: 100 files
   Auto-retry: Enabled
   Checkpoints: Every 500 files
📡 RATE-LIMITED 80K FILE TRAINING
🔍 Step 1: Rate-limited file discovery...
📡 RATE-LIMITED CSV DISCOVERY (m

Training with rate-limited dataset...
Training samples: 40,000
Epoch 1/20
[1m133/133[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 68ms/step - accuracy: 0.6714 - loss: 0.7032 - val_accuracy: 0.4883 - val_loss: 0.7472 - learning_rate: 0.0010
Epoch 2/20
[1m133/133[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 18ms/step - accuracy: 0.7131 - loss: 0.6099 - val_accuracy: 0.5175 - val_loss: 0.7290 - learning_rate: 0.0010
Epoch 3/20
[1m133/133[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 17ms/step - accuracy: 0.7229 - loss: 0.5888 - val_accuracy: 0.5318 - val_loss: 0.7673 - learning_rate: 0.0010
Epoch 4/20
[1m133/133[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 17ms/step - accuracy: 0.7330 - loss: 0.5744 - val_accuracy: 0.5347 - val_loss: 0.7629 - learning_rate: 0.0010
Epoch 5/20
[1m133/133[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 17ms/step - accuracy: 0.7357 - loss: 0.5718 - val_accuracy: 0.5360 - val_loss: 0.7784 - learning_rate: 0.0010
E



RATE-LIMITED RESULTS:
   Test Accuracy: 0.510
   F1 Score (macro): 0.463

Classification Report:
              precision    recall  f1-score   support

  aggressive       0.50      0.82      0.62      4912
 cooperative       0.55      0.21      0.30      5088

    accuracy                           0.51     10000
   macro avg       0.53      0.52      0.46     10000
weighted avg       0.53      0.51      0.46     10000


💾 Saving model...
❌ Training failed: name 'cl_system' is not defined

❌ Training failed - run resume_training() to continue

💾 All checkpoints saved to: /content/drive/MyDrive/Training_Checkpoints
📡 API-safe trainer - no more 429 errors!


Traceback (most recent call last):
  File "/tmp/ipython-input-48-2476193913.py", line 817, in train_with_rate_limits
    cl_system.model = model
    ^^^^^^^^^
NameError: name 'cl_system' is not defined
