<a href="https://colab.research.google.com/github/Micro-Maxis/Intrusion-detection-system-dli/blob/main/Mateen_revised.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
!git clone https://github.com/ICL-ml4csec/Mateen.git
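# A `!cd` here would run in its own subshell and leave the notebook's working
# directory unchanged (use `%cd` to change it), so `find` runs from the current directory.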
!find . -name "*.py" -type f | head -10

fatal: destination path 'Mateen' already exists and is not an empty directory.
./Mateen/Mateen.py
./Mateen/MateenUtils/main.py
./Mateen/MateenUtils/nsl_preprocessing.py
./Mateen/MateenUtils/AE.py
./Mateen/MateenUtils/merge_utils.py
./Mateen/MateenUtils/utils.py
./Mateen/MateenUtils/selection_utils.py
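
Each `!` command in a notebook runs in its own subshell, so a directory change made by one command does not carry over to the next. That is consistent with the listing above, where the paths are relative to the clone's parent directory (`./Mateen/...`). A minimal sketch of the difference in plain Python follows. The temporary directory is a hypothetical stand-in for `/content/Mateen`, and `os.chdir` is used as a rough equivalent of `%cd`.

```python
import os
import subprocess
import tempfile

start_dir = os.getcwd()
target = tempfile.mkdtemp()  # hypothetical stand-in for /content/Mateen

# Rough equivalent of `!cd <target>`: the directory change happens in a child
# shell and is discarded as soon as that shell exits.
subprocess.run(f'cd "{target}"', shell=True, check=True)
unchanged = os.getcwd() == start_dir

# Rough equivalent of `%cd <target>`: the change applies to this Python process.
os.chdir(target)
changed = os.path.realpath(os.getcwd()) == os.path.realpath(target)

os.chdir(start_dir)  # restore the original working directory
print(unchanged, changed)
```

In a notebook, this is why `%cd` is the right choice whenever later cells depend on the working directory.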


In [None]:
# Quick fix for Mateen - run this in Colab

%cd /content/Mateen/MateenUtils

# Read the current main.py
with open('main.py', 'r') as f:
    content = f.read()

# Find and replace the ensemble_training function call in adaptive_ensemble
# The issue is that load_mode parameter is not being passed

# Replace the problematic line
old_line = 'model = ensemble_training(x_train, y_train=y_train, num_epochs=100, mode="init", scenario=args.dataset_name)'
new_line = 'model = ensemble_training(x_train, y_train=y_train, num_epochs=100, mode="init", scenario=args.dataset_name, load_mode="new")'

if old_line in content:
    content = content.replace(old_line, new_line)
    print("✓ Fixed ensemble_training call to force new model")
else:
    print("⚠ Line not found, trying alternative fix...")
    # Alternative: replace the load_model function itself. Its current body
    # trains a new autoencoder when load_mode == "new" and otherwise calls
    # torch.load on Models/{scenario}.pth, which fails if no checkpoint exists.

    new_load_func = '''def load_model(load_mode, input_shape, scenario, train_loader, data, num_epochs):
    # Always train a fresh model. load_mode is deliberately ignored so that a
    # missing Models/{scenario}.pth checkpoint can never cause a failure.
    model = model_base.autoencoder(input_shape)
    model = model_update(data, num_epochs=num_epochs, model=model)
    return model'''

    # Replace the load_model function
    if 'def load_model(' in content:
        # Find the function and replace it
        lines = content.split('\n')
        new_lines = []
        in_load_model = False
        skip_lines = False

        for line in lines:
            if 'def load_model(' in line:
                in_load_model = True
                skip_lines = True
                new_lines.extend(new_load_func.split('\n'))
            elif in_load_model and (line.startswith('def ') or line.startswith('class ')):
                in_load_model = False
                skip_lines = False
                new_lines.append(line)
            elif not skip_lines:
                new_lines.append(line)
            elif in_load_model and line.strip() == 'return model':
                skip_lines = False  # End of function

        content = '\n'.join(new_lines)
        print("✓ Modified load_model function to always create new models")

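# Write main.py back only if one of the patches above actually changed it,
# so the success message is never printed for an unmodified file
with open('main.py', 'r') as f:
    original = f.read()

if content != original:
    with open('main.py', 'w') as f:
        f.write(content)
    print("✅ main.py has been fixed to force new model creation!")
else:
    print("⚠ No changes were made to main.py")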

# Also ensure the Models directory exists
import os
os.makedirs('/content/Mateen/Models', exist_ok=True)
print("✅ Models directory created")

/content/Mateen/MateenUtils
✓ Fixed ensemble_training call to force new model
✅ main.py has been fixed to force new model creation!
✅ Models directory created
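
A plain `str.replace` patch like the one above changes every match, and changes nothing when the target line has drifted. One way to make such a patch stricter is to require exactly one match before replacing. The sketch below assumes a hypothetical helper, `replace_exactly_once`, and a shortened stand-in for the contents of `main.py`. Neither is part of the Mateen repository.

```python
def replace_exactly_once(text: str, old: str, new: str) -> str:
    """Return text with old replaced by new, requiring exactly one match."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly 1 occurrence, found {count}")
    return text.replace(old, new, 1)


# Hypothetical, shortened stand-in for the line patched in main.py
source = 'model = ensemble_training(x_train, mode="init")\n'

patched = replace_exactly_once(
    source,
    'mode="init")',
    'mode="init", load_mode="new")',
)
print(patched, end="")

# Applying the same patch a second time fails loudly instead of passing silently.
try:
    replace_exactly_once(patched, 'mode="init")', 'mode="init", load_mode="new")')
    second_run_failed = False
except ValueError:
    second_run_failed = True
```

Raising on zero or multiple matches makes a re-run of the cell, or an upstream change to `main.py`, visible immediately.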


In [None]:
%cd /content/Mateen

# Make the patched MateenUtils modules importable (avoid duplicate path entries on re-runs)
import sys
if 'MateenUtils/' not in sys.path:
    sys.path.append('MateenUtils/')

# Drop any cached copy so the patched main.py is re-imported.
# The module is cached under its file name, 'main', not under the alias 'Mateen_main'.
sys.modules.pop('main', None)

# Import the fixed modules
import nsl_preprocessing as dp
import utils
import main as Mateen_main
import pandas as pd
import numpy as np

class Args:
    dataset_name = "NSLKDD"
    window_size = 50000
    performance_thres = 0.99
    max_ensemble_length = 3
    selection_budget = 0.01
    mini_batch_size = 1000
    retention_rate = 0.3
    lambda_0 = 0.1
    shift_threshold = 0.05
    n_features = 15

args = Args()

print("=== Running Mateen with Enhanced NSL-KDD (Fixed Version) ===")

try:
    # Load enhanced preprocessed data
    print("1. Loading enhanced preprocessed NSL-KDD data...")
    x_train, x_test, y_train, y_test = dp.prepare_data("NSLKDD")
    print(f"✓ Data loaded: Train {x_train.shape}, Test {x_test.shape}")

    # Partition data into windows
    print("2. Partitioning data into windows...")
    x_slice, y_slice = dp.partition_array(x_data=x_test, y_data=y_test, slice_size=args.window_size)
    print(f"✓ Data partitioned into {len(x_slice)} windows")

    # Run Mateen adaptive ensemble
    print("3. Running Mateen adaptive ensemble...")
    print("   (This will train a new autoencoder model from scratch)")
    predictions, probs_list = Mateen_main.adaptive_ensemble(x_train, y_train, x_slice, y_slice, args)
    print("✓ Ensemble training completed!")

    # Evaluate results
    print("4. Evaluating results...")
    result = utils.getResult(y_test, predictions)
    auc_rocs = utils.auc_roc_in_chunks(y_test, probs_list, chunk_size=args.window_size)

    # Display final results
    print(f'\n🎯 FINAL RESULTS WITH ENHANCED PREPROCESSING:')
    print(f'   Average AUC-ROC: {np.mean(auc_rocs):.4f}')
    print(f'   Standard Deviation: {np.std(auc_rocs):.4f}')
    print(f'   Total Predictions: {len(predictions)}')
    print(f'   Test Samples: {len(y_test)}')

    # Save results
    print("5. Saving results...")
    import os
    os.makedirs('Results', exist_ok=True)

    df = pd.DataFrame({
        'Probabilities': probs_list,
        'Predictions': predictions,
        'True_Labels': y_test[:len(predictions)]  # Match lengths
    })

    result_file = f'Results/NSLKDD-enhanced-{args.n_features}feat-{args.selection_budget}.csv'
    df.to_csv(result_file, index=False)

    print(f'💾 Results saved to: {result_file}')

    # Summary of improvements
    print(f'\n🚀 ENHANCED PREPROCESSING SUMMARY:')
    print(f'   ✅ Consensus feature selection: {args.n_features} features from 54 engineered')
    print(f'   ✅ AutoEncoder optimized: Normal samples only for training')
    print(f'   ✅ Robust outlier handling and scaling')
    print(f'   ✅ Advanced feature engineering (ratios, logs, security scores)')
    print(f'   ✅ Mateen ensemble successfully trained and evaluated')

    print(f'\n🎉 SUCCESS! Enhanced NSL-KDD preprocessing integrated with Mateen!')

except Exception as e:
    print(f"\n❌ Error during execution: {e}")
    import traceback
    traceback.print_exc()

    # Any step in the try block can end up here, so do not claim which steps
    # succeeded; the last numbered message printed above marks the failing step.
    print("\n📊 Execution stopped during the last numbered step printed above.")

/content/Mateen
=== Running Mateen with Enhanced NSL-KDD (Fixed Version) ===
1. Loading enhanced preprocessed NSL-KDD data...
=== Universal NSL-KDD Preprocessing ===
1. Loading and cleaning data...
   Original: Train (125973, 39), Test (22544, 39)
2. Engineering features...
   After engineering: (125973, 54)
3. Handling outliers...
4. Encoding categorical features...
5. Scaling features...
6. Selecting top 15 features...
Selected 15 features using consensus:
   1. root_shell
   2. duration
   3. num_root
   4. num_file_creations
   5. num_shells
   6. num_access_files
   7. is_guest_login
   8. count
   9. srv_count
  10. serror_rate
  11. srv_serror_rate
  12. rerror_rate
  13. srv_rerror_rate
  14. same_srv_rate
  15. diff_srv_rate
   Final: Train (125973, 15), Test (22544, 15)
   Class balance: Normal 67343, Attack 58630
Enhanced NSLKDD for Mateen: Train (67343, 15), Test (22544, 15)
✓ Data loaded: Train (67343, 15), Test (22544, 15)
2. Partitioning data into windows...
Data partiti

100%|██████████| 100/100 [04:39<00:00,  2.79s/it]


Updating Models Process Started!
Step 1/1
✓ Ensemble training completed!
4. Evaluating results...
Predicted Labels Counter({np.int64(0): 15245, np.int64(1): 7299})
True Labels Counter({np.int64(1): 12833, np.int64(0): 9711})
Positive label: 0
General Accuracy: 72.6224
Recall: 96.7151
Precision: 61.6071
F1 Score: 75.2685
True Negative Rate: 54.3910
True Positive Rate: 96.72%
Macro Recall: 75.5530
Macro Precision: 78.6183
Macro F1 Score: 72.3054
Balanced Accuracy: 75.5530

🎯 FINAL RESULTS WITH ENHANCED PREPROCESSING:
   Average AUC-ROC: 0.7344
   Standard Deviation: 0.0000
   Total Predictions: 22544
   Test Samples: 22544
5. Saving results...
💾 Results saved to: Results/NSLKDD-enhanced-15feat-0.01.csv

🚀 ENHANCED PREPROCESSING SUMMARY:
   ✅ Consensus feature selection: 15 features from 54 engineered
   ✅ AutoEncoder optimized: Normal samples only for training
   ✅ Robust outlier handling and scaling
   ✅ Advanced feature engineering (ratios, logs, security scores)
   ✅ Mateen ensemble s