# GPS Spoofing Detection - Quick Demo

This notebook demonstrates the complete pipeline for GPS spoofing detection using synthetic data.

## Pipeline Steps:
1. Generate synthetic GPS signals
2. Extract features from correlation profiles
3. Visualize feature distributions
4. Train a Random Forest classifier
5. Evaluate performance
6. Visualize results
7. Analyze feature importance

In [None]:
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Add the project root (parent directory) to the path so the `src` package is importable
sys.path.insert(0, '..')

from src.utils.synthetic_data import generate_synthetic_dataset, create_synthetic_features_dataframe
from src.features.feature_pipeline import build_feature_vector
from src.models.training import train_model
from src.models.evaluation import evaluate_model, generate_evaluation_report
from src.utils.plots import (
    plot_confusion_matrix,
    plot_roc_curves,
    plot_feature_distributions,
    plot_correlation_profile
)

print("Imports successful!")

## Step 1: Generate Synthetic Dataset

Generate 100 authentic and 100 spoofed GPS signals (0.5 s each, sampled at 5 MHz, PRN 1), with a fixed seed for reproducibility.

In [None]:
# Generate dataset
print("Generating synthetic GPS signals...")
signals, labels = generate_synthetic_dataset(
    n_authentic=100,
    n_spoofed=100,
    duration_s=0.5,
    fs=5e6,
    prn=1,
    seed=42
)

print(f"Generated {len(signals)} signals")
print(f"Authentic: {np.sum(labels == 0)}")
print(f"Spoofed: {np.sum(labels == 1)}")
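
To give a feel for what separates the two classes, here is a minimal standard-library sketch of one common spoofing model: the receiver sees the authentic code plus a delayed, scaled replica. This is an illustration only, not the implementation of `generate_synthetic_dataset`. The random ±1 chips stand in for a real C/A code, and `make_signal`, `spoof_delay`, and `spoof_gain` are hypothetical names.

```python
import random

def circular_correlation(signal, code):
    """Correlate the signal against every cyclic shift of the code."""
    n = len(code)
    return [
        sum(signal[(i + lag) % n] * code[i] for i in range(n)) / n
        for lag in range(n)
    ]

rng = random.Random(42)
n_chips = 511
# Stand-in spreading code: random +/-1 chips, NOT a real GPS C/A Gold code.
code = [rng.choice((-1.0, 1.0)) for _ in range(n_chips)]

def make_signal(spoof_delay=None, spoof_gain=0.8, noise_std=0.5):
    """Authentic code plus Gaussian noise; optionally add a delayed replica."""
    samples = []
    for i in range(n_chips):
        value = code[i] + rng.gauss(0.0, noise_std)
        if spoof_delay is not None:
            # Hypothetical spoofer: same code, shifted and scaled.
            value += spoof_gain * code[(i - spoof_delay) % n_chips]
        samples.append(value)
    return samples

authentic_profile = circular_correlation(make_signal(), code)
spoofed_profile = circular_correlation(make_signal(spoof_delay=20), code)
```

Under these assumptions the spoofed profile shows a second peak of roughly `spoof_gain` at lag 20, while the authentic profile contains only noise there. Distortions of this kind are what correlation-based features try to capture.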

## Step 2: Extract Features

Extract correlation, temporal, and C/N0-variation features from the signals.

In [None]:
# Extract features
print("Extracting features...")
features_df = build_feature_vector(
    signals,
    fs=5e6,
    prn=1,
    include_correlation=True,
    include_temporal=True,
    include_cn0_variation=True
)

# Add labels
features_df['label'] = labels

print(f"Features shape: {features_df.shape}")
print("\nFirst few rows:")
features_df.head()
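
The exact feature definitions live in `build_feature_vector` and are not shown in this notebook. As a rough sketch, three of the correlation features could be computed from a single correlation profile as follows. The function name, the guard window `exclude`, and the sample-count approximation of FWHM are assumptions made for illustration.

```python
def correlation_features(profile, exclude=2):
    """Peak height, secondary-to-main peak ratio, and a coarse FWHM in samples."""
    n = len(profile)
    mags = [abs(v) for v in profile]
    peak_idx = max(range(n), key=mags.__getitem__)
    peak = mags[peak_idx]

    # Largest value outside a guard window around the main peak.
    # Linear distance is used; a circular profile would need a wrapped distance.
    secondary = max(
        (m for i, m in enumerate(mags) if abs(i - peak_idx) > exclude),
        default=0.0,
    )

    # Coarse FWHM: contiguous samples around the peak at or above half maximum.
    half = peak / 2.0
    left = peak_idx
    while left > 0 and mags[left - 1] >= half:
        left -= 1
    right = peak_idx
    while right < n - 1 and mags[right + 1] >= half:
        right += 1

    return {
        "peak_height": peak,
        "peak_ratio": secondary / peak if peak > 0 else 0.0,
        "fwhm_samples": right - left + 1,
    }

# Hand-made profile: triangular main peak plus a smaller secondary peak.
profile = [0.0, 0.1, 0.5, 1.0, 0.5, 0.1, 0.0, 0.0, 0.4, 0.0]
features = correlation_features(profile)
# features: peak_height 1.0, peak_ratio 0.4, fwhm_samples 3
```

A larger peak ratio or a wider main peak than expected for a clean signal would hint at a second, overlapping correlation peak.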

## Step 3: Visualize Feature Distributions

Compare feature distributions between authentic and spoofed signals.

In [None]:
# Select a few key features to visualize
key_features = [
    'corr_peak_height',
    'corr_fwhm',
    'corr_peak_ratio',
    'corr_asymmetry',
    'temp_cn0_estimate',
    'cn0var_cn0_std'
]

fig = plot_feature_distributions(
    features_df,
    label_column='label',
    feature_columns=key_features,
    class_names={0: 'Authentic', 1: 'Spoofed'},
    ncols=3
)
plt.tight_layout()
plt.show()
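
Histograms show separation visually. A single number per feature can complement them. One option, sketched below with made-up values, is Cohen's d: the difference in class means divided by the pooled standard deviation. The project does not necessarily compute this; the helper `cohens_d` is hypothetical.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / pooled_var ** 0.5

# Made-up feature values, for illustration only.
authentic_values = [1.0, 2.0, 3.0]
spoofed_values = [3.0, 4.0, 5.0]
effect = cohens_d(authentic_values, spoofed_values)  # 2.0
```

With the real data, the same function could be applied to each column of `features_df` after splitting on `label`.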

## Step 4: Train Classification Model

Train a Random Forest classifier with balanced class weights, holding out 30% of the samples for testing.

In [None]:
# Prepare data
X = features_df.drop(['segment_id', 'label'], axis=1, errors='ignore').values
y = features_df['label'].values

print(f"Training data shape: {X.shape}")
print(f"Number of features: {X.shape[1]}")

# Train model
print("\nTraining Random Forest model...")
model, info = train_model(
    X, y,
    model_name='random_forest',
    test_size=0.3,
    random_state=42
)

print("Training complete!")
print(f"Training samples: {info['n_train_samples']}")
print(f"Test samples: {info['n_test_samples']}")
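
"Balanced class weights" usually refers to the heuristic that scikit-learn documents for `class_weight='balanced'`: each class gets weight `n_samples / (n_classes * count)`. Whether `train_model` applies exactly this is an assumption. The sketch below only illustrates the formula.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# The demo dataset: 100 authentic (0) and 100 spoofed (1) samples.
balanced = balanced_class_weights([0] * 100 + [1] * 100)  # {0: 1.0, 1: 1.0}

# A hypothetical skewed dataset, to show the effect of the weighting.
skewed = balanced_class_weights([0] * 150 + [1] * 50)
```

Because this demo uses equal class sizes, the weights are all 1.0 and the weighting has no effect. It starts to matter on skewed data, where the rarer class (here the 50 spoofed samples) receives weight 2.0.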

## Step 5: Evaluate Model Performance

In [None]:
# Evaluate model
metrics = evaluate_model(
    model,
    info['X_test'],
    info['y_test'],
    info['X_train'],
    info['y_train'],
    class_names=['Authentic', 'Spoofed']
)

# Generate report
report = generate_evaluation_report(
    model,
    info['X_test'],
    info['y_test'],
    info['X_train'],
    info['y_train'],
    class_names=['Authentic', 'Spoofed']
)

print(report)
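
The headline numbers in such a report can all be derived from the confusion matrix. The sketch below assumes the common layout `[[TN, FP], [FN, TP]]` (rows are true classes, columns are predictions, with Spoofed as the positive class). The layout returned by `evaluate_model` is not confirmed here, and the counts are made up.

```python
def binary_metrics(confusion):
    """Derive standard metrics from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = confusion
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,  # detection rate for spoofed signals
        "f1": f1,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Made-up counts for a 60-sample test set, for illustration only.
example = binary_metrics([[27, 3], [5, 25]])
```

For a spoofing detector, recall (missed attacks) and the false alarm rate (authentic signals flagged) are often more informative than accuracy alone.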

## Step 6: Visualize Results

In [None]:
# Confusion Matrix
fig_cm = plot_confusion_matrix(
    metrics['confusion_matrix'],
    class_names=['Authentic', 'Spoofed'],
    title='Confusion Matrix - Random Forest'
)
plt.show()

In [None]:
# ROC Curve
if 'roc_curve' in metrics:
    roc_data = {
        'Random Forest': {
            'fpr': metrics['roc_curve']['fpr'],
            'tpr': metrics['roc_curve']['tpr'],
            # Fall back to NaN so a missing AUC is not displayed as 0
            'auc': metrics.get('roc_auc', np.nan)
        }
    }
    fig_roc = plot_roc_curves(roc_data, title='ROC Curve')
    plt.show()
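
For reference, the `fpr`, `tpr`, and AUC values plotted above can be reproduced from predicted scores by sweeping a decision threshold. This is a minimal standard-library sketch, not the project's `evaluate_model`. It assumes binary labels with both classes present, and the scores are a toy example.

```python
def roc_curve_points(y_true, scores):
    """Sweep the threshold over every distinct score, highest first."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr

def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
        for i in range(1, len(fpr))
    )

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # toy predicted probabilities of "spoofed"
fpr, tpr = roc_curve_points(y_true, scores)
auc = trapezoid_auc(fpr, tpr)  # 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking of spoofed above authentic samples.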

## Step 7: Feature Importance

Analyze which features are most important for classification.

In [None]:
from src.models.evaluation import compute_feature_importance
from src.utils.plots import plot_feature_importance

# Get feature names
feature_names = [col for col in features_df.columns if col not in ['segment_id', 'label']]

# Compute importance
importance_data = compute_feature_importance(
    model,
    feature_names=feature_names,
    top_n=15
)

# Plot
importance_dict = dict(importance_data['top_features'])
fig = plot_feature_importance(
    importance_dict,
    top_n=15,
    title='Top 15 Most Important Features'
)
plt.tight_layout()
plt.show()

print("\nTop 5 Features:")
for name, score in importance_data['top_features'][:5]:
    print(f"  {name}: {score:.4f}")
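
The loop above treats `importance_data['top_features']` as a list of `(name, score)` pairs sorted from most to least important. A sketch of how such a ranking could be built from a fitted model's importance scores follows. The helper name and the values are hypothetical, and `compute_feature_importance` may work differently.

```python
def rank_features(names, importances, top_n=15):
    """Pair names with scores and keep the top_n highest-scoring features."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]

# Placeholder names and made-up scores, for illustration only.
names = ["feat_a", "feat_b", "feat_c", "feat_d"]
importances = [0.10, 0.35, 0.25, 0.30]
top = rank_features(names, importances, top_n=3)
# top == [("feat_b", 0.35), ("feat_d", 0.30), ("feat_c", 0.25)]
```

Impurity-based importances from tree ensembles can be split between correlated features, so they are best read as a ranking hint rather than a precise measure.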

## Summary

This notebook demonstrated the complete GPS spoofing detection pipeline:

1. **Data Generation**: Created synthetic GPS signals with authentic and spoofed characteristics
2. **Feature Extraction**: Computed correlation-based and temporal features
3. **Model Training**: Trained Random Forest classifier with balanced classes
4. **Evaluation**: Measured performance on the held-out test set (report, confusion matrix, and ROC curve above)
5. **Analysis**: Identified most important features for detection

### Key Takeaways:
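- On this synthetic dataset, correlation features (FWHM, peak ratio) are highly discriminative
- C/N0 variation is a strong indicator of spoofing in the simulated signals
- Random Forest with class balancing performs well here; these results still need to be confirmed on real recordings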

### Next Steps:
- Test with real GPS data (FGI-SpoofRepo)
- Experiment with multi-PRN analysis
- Explore deep learning approaches