# PCAP Analysis Test Notebook

This notebook tests the PCAP/CSV analysis pipeline.

**What it does:**
1. Loads the trained models
2. Extracts features from PCAP files (or uses existing CSV)
3. Runs threat detection
4. Shows results

**Prerequisites:**
- Trained models in `models/` directory
- NFStream installed: `pip install nfstream`
- For Windows: Npcap must be installed (https://npcap.com/)

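The prerequisites above can be checked before running any cell, without importing the packages themselves. The sketch below uses `importlib.util.find_spec` to see whether each package is installed. It treats `models/` as ready when the directory contains at least one file. That readiness rule and the `check_prerequisites` name are assumptions for illustration; they are not part of the project's code.

```python
import importlib.util
from pathlib import Path


def check_prerequisites(models_dir: Path, packages=("nfstream", "pandas", "sklearn")) -> dict:
    """Report which prerequisites are present, without importing the packages."""
    status = {}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        status[name] = importlib.util.find_spec(name) is not None
    # Hypothetical readiness rule: the directory exists and holds at least one file
    status["models"] = models_dir.is_dir() and any(p.is_file() for p in models_dir.iterdir())
    return status


# Assumes the notebook runs from notebooks/, so models/ sits one level up
for item, ok in check_prerequisites(Path.cwd().parent / "models").items():
    print(f"{'✓' if ok else '✗'} {item}")
```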

In [None]:
# Setup path and imports
import sys
from pathlib import Path

# Add src to path
sys.path.insert(0, str(Path.cwd().parent / 'src'))

# Check imports
try:
    from predictor import NetworkThreatPredictor
    from feature_extractor import PCAPFeatureExtractor, CICIDS2017_FEATURES
    from analyzer import NetworkThreatAnalyzer
    print("✓ All modules imported successfully!")
except ImportError as e:
    print(f"✗ Import error: {e}")
    print("Make sure you're running from the notebooks/ directory")

# Check NFStream availability
try:
    import nfstream
    print(f"✓ NFStream version: {nfstream.__version__}")
except ImportError:
    print("⚠ NFStream not installed. Install with: pip install nfstream")
    print("  On Windows, you also need Npcap: https://npcap.com/")


## 1. Initialize the Analyzer


In [None]:
# Initialize the analyzer (loads models automatically)
analyzer = NetworkThreatAnalyzer()


## 2. Test with CSV File (Quick Test)

First, let's test the prediction pipeline using one of the existing CSV files from the CICIDS2017 dataset.

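The CICIDS2017 CSV headers carry stray leading spaces, which is why the loading cell strips the column names. Here is a minimal sketch of the same loading step, run on a tiny inline stand-in CSV. The sample rows and the `load_flows` helper are made up for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for a CICIDS2017 CSV; the headers are padded with a leading space
raw = io.StringIO(
    " Destination Port, Flow Duration, Total Fwd Packets, Label\n"
    "80,1200,3,BENIGN\n"
    "80,95,1,DDoS\n"
)


def load_flows(source, nrows=None) -> pd.DataFrame:
    df = pd.read_csv(source, nrows=nrows)
    # Normalise headers so ' Label' and 'Label' refer to the same column
    df.columns = df.columns.str.strip()
    return df


flows = load_flows(raw)
print(list(flows.columns))
```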

In [None]:
# Test with a small sample from the DDoS file
import pandas as pd
from pathlib import Path

DATASET_DIR = Path.cwd().parent / 'dataset'

# Load a small sample for quick testing
test_file = DATASET_DIR / 'Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv'
print(f"Loading test file: {test_file.name}")

df_test = pd.read_csv(test_file, nrows=1000)
df_test.columns = df_test.columns.str.strip()
print(f"Loaded {len(df_test)} rows")
print("\nActual labels in sample:")
print(df_test['Label'].value_counts())


In [None]:
# Run predictions using the predictor directly
predictions = analyzer.predictor.predict(df_test, model_type='multiclass')
print(f"Predictions made: {len(predictions)}")
print("\nPrediction distribution:")
# Count each predicted label once and print the counts
for label, count in pd.Series(predictions).value_counts().items():
    print(f"  {label}: {count}")

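Raw counts are easier to read next to each label's share of all flows. This small sketch assumes `predictions` is any sequence of label strings. The `prediction_distribution` helper and the example labels are hypothetical.

```python
import pandas as pd


def prediction_distribution(predictions) -> pd.DataFrame:
    """Count each predicted label and its percentage of all flows."""
    counts = pd.Series(predictions).value_counts()
    return pd.DataFrame({
        "count": counts,
        "percent": (counts / counts.sum() * 100).round(2),
    })


print(prediction_distribution(["BENIGN", "BENIGN", "DDoS", "BENIGN"]))
```

In this notebook it would be called as `prediction_distribution(predictions)`.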

In [None]:
# Compare predictions with actual labels
from sklearn.metrics import accuracy_score, classification_report

# Ground-truth labels from the CSV
actual_labels = df_test['Label'].values

# Overall accuracy on the sample
accuracy = accuracy_score(actual_labels, predictions)
print(f"Accuracy on test sample: {accuracy:.4f} ({accuracy*100:.2f}%)")

# zero_division=0 avoids warnings for classes that never appear in the predictions
print("\nDetailed report:")
print(classification_report(actual_labels, predictions, zero_division=0))

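Multiclass accuracy can hide whether the model at least separates attacks from normal traffic. One way to check this is to collapse every non-benign class into a single "attack" class before scoring. This sketch assumes benign flows are labelled `BENIGN` (as in CICIDS2017) in both the ground truth and the model output. `binary_accuracy` is a hypothetical helper.

```python
import numpy as np


def binary_accuracy(actual, predicted, benign_label="BENIGN") -> float:
    """Accuracy after collapsing every attack class into a single 'attack' class."""
    actual_attack = np.asarray(actual) != benign_label
    predicted_attack = np.asarray(predicted) != benign_label
    # Fraction of flows where benign/attack status agrees
    return float((actual_attack == predicted_attack).mean())


print(binary_accuracy(["BENIGN", "DDoS"], ["BENIGN", "PortScan"]))
```

`binary_accuracy(actual_labels, predictions)` would give the benign-vs-attack accuracy for the sample above. A DDoS flow predicted as PortScan counts as correct here, so this measures detection rather than classification.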

## 3. Test the Full Analysis Pipeline (CSV)


In [None]:
# Run quick_scan for a fast analysis; max_flows limits how many flows are processed
results = analyzer.quick_scan(test_file, max_flows=5000)
print("\nQuick scan complete!")
print(f"Threat detected: {results['threat_detected']}")
print(f"Summary: {results['summary']}")

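`quick_scan` returns a `threat_detected` flag and a `summary`, but this notebook does not show how they are computed. As an illustration only, the sketch below shows how a summary of that kind could be derived from per-flow predictions. The field names other than `threat_detected` and the `BENIGN` convention are assumptions. This is not the analyzer's actual logic.

```python
from collections import Counter


def summarize_predictions(predictions, benign_label="BENIGN") -> dict:
    """Hypothetical threat summary; not necessarily how quick_scan builds its result."""
    counts = Counter(str(p) for p in predictions)
    total = sum(counts.values())
    # Every label other than the benign one is treated as a threat type
    threats = {label: n for label, n in counts.items() if label != benign_label}
    n_threat = sum(threats.values())
    return {
        "threat_detected": n_threat > 0,
        "total_flows": total,
        "threat_flows": n_threat,
        "threat_ratio": n_threat / total if total else 0.0,
        "threat_types": threats,
    }


print(summarize_predictions(["BENIGN", "DDoS", "DDoS", "BENIGN"]))
```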

## 4. Test PCAP Feature Extraction

If you have a PCAP file, you can test the feature extraction here.

**Note:** This requires:
- NFStream installed (`pip install nfstream`)
- On Windows: Npcap installed (https://npcap.com/)

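NFStream turns a capture into a table of flows, for example with `NFStreamer(source="capture.pcap").to_pandas()`. Its column names and units differ from the CICIDS2017 features the models were trained on, so the extractor has to translate them. The sketch below shows that idea for a handful of columns.

The mapping is hypothetical and partial, and it is not necessarily what `PCAPFeatureExtractor` does. It also assumes CICIDS2017 stores Flow Duration in microseconds, while NFStream reports `bidirectional_duration_ms` in milliseconds.

```python
import pandas as pd

# Hypothetical, partial mapping from NFStream flow columns to CICIDS2017 column names.
# Byte-based columns are left out on purpose: the two tools may not count bytes the same way.
NFSTREAM_TO_CICIDS = {
    "dst_port": "Destination Port",
    "src2dst_packets": "Total Fwd Packets",
    "dst2src_packets": "Total Backward Packets",
}


def to_cicids_columns(flows: pd.DataFrame) -> pd.DataFrame:
    """Rename a few NFStream columns to CICIDS2017 names and convert the flow duration."""
    out = flows[list(NFSTREAM_TO_CICIDS)].rename(columns=NFSTREAM_TO_CICIDS).copy()
    # Assumption: CICIDS2017 Flow Duration is in microseconds, NFStream's duration in milliseconds
    out["Flow Duration"] = flows["bidirectional_duration_ms"] * 1000
    return out
```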

In [None]:
# Check if NFStream is available
extractor = PCAPFeatureExtractor()

if extractor.nfstream_available:
    print("✓ NFStream is available!")
    print("  Ready to process PCAP files")
    print(f"\nRequired features for model: {len(CICIDS2017_FEATURES)}")
else:
    print("⚠ NFStream not available")
    print("To install:")
    print("  1. pip install nfstream")
    print("  2. On Windows, install Npcap: https://npcap.com/")

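Whatever the extractor produces, the model expects exactly the `CICIDS2017_FEATURES` columns in a fixed order. The minimal sketch below aligns a feature table to such a list and reports which columns were missing. `align_features` and the zero fill are assumptions. Filling absent features with 0 can skew predictions, so the `missing` list is worth inspecting.

```python
import pandas as pd


def align_features(df: pd.DataFrame, required: list, fill_value=0.0):
    """Return df reduced to `required` columns in order, plus the columns that had to be filled."""
    missing = [c for c in required if c not in df.columns]
    # reindex drops extra columns, reorders, and fills absent ones with fill_value
    aligned = df.reindex(columns=required, fill_value=fill_value)
    return aligned, missing


frame = pd.DataFrame({"Flow Duration": [10], "Destination Port": [80], "extra": [1]})
aligned, missing = align_features(frame, ["Destination Port", "Flow Duration", "Total Fwd Packets"])
print(missing)
```

Here it would be called as `align_features(df, CICIDS2017_FEATURES)`.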

In [None]:
# OPTIONAL: Test with a PCAP file
# Uncomment and modify the path to test with your PCAP file

# pcap_file = Path("path/to/your/file.pcap")
# 
# if pcap_file.exists() and extractor.nfstream_available:
#     print(f"Analyzing PCAP: {pcap_file}")
#     results = analyzer.analyze_pcap(pcap_file, model_type='multiclass', max_flows=10000)
#     print(f"\nResults: {results['summary']}")
# else:
#     print("PCAP file not found or NFStream not available")

print("Uncomment the code above to test with a PCAP file")


## 5. Example: Full CSV Analysis with Results Saved


In [None]:
# Full analysis with results saved to disk
# Using a small sample for demonstration

import pandas as pd

# Load a sample and save as a test file
sample_df = pd.read_csv(test_file, nrows=2000)
sample_file = Path.cwd().parent / 'data_processed' / 'test_sample.csv'
sample_file.parent.mkdir(exist_ok=True)
sample_df.to_csv(sample_file, index=False)
print(f"Created test sample: {sample_file}")

# Run full analysis
print("\nRunning full analysis...")
results = analyzer.analyze_csv(sample_file, model_type='multiclass', save_results=True)

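With `save_results=True`, `analyze_csv` writes its output to disk, but the location and format are not shown here. For reference, here is a hedged sketch of one common layout: the input flows plus a `Prediction` column, written to a timestamped CSV. The `save_predictions` helper, the output directory and the naming scheme are hypothetical.

```python
from datetime import datetime
from pathlib import Path

import pandas as pd


def save_predictions(df: pd.DataFrame, predictions, out_dir: Path, stem: str = "analysis") -> Path:
    """Write the input flows plus a Prediction column to a timestamped CSV (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    result = df.copy()
    result["Prediction"] = list(predictions)
    # Timestamp in the file name keeps repeated runs from overwriting each other
    path = out_dir / f"{stem}_{datetime.now():%Y%m%d_%H%M%S}.csv"
    result.to_csv(path, index=False)
    return path
```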

In [None]:
# View the results dataframe
results_df = results['dataframe']
print(f"Results DataFrame shape: {results_df.shape}")
print(f"\nPrediction column added: {'Prediction' in results_df.columns}")
print("\nSample of predictions:")
print(results_df[['Destination Port', 'Flow Duration', 'Total Fwd Packets', 'Prediction']].head(10))

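Once the results table has a `Prediction` column, it is often more useful to see which destination ports the flagged flows target. This small pandas sketch assumes the `Destination Port` and `Prediction` columns shown above, with `BENIGN` as the benign label. `threats_by_port` is a hypothetical helper.

```python
import pandas as pd


def threats_by_port(results_df: pd.DataFrame, benign_label: str = "BENIGN") -> pd.DataFrame:
    """Count non-benign predictions per (Destination Port, Prediction) pair, most frequent first."""
    flagged = results_df[results_df["Prediction"] != benign_label]
    return (
        flagged.groupby(["Destination Port", "Prediction"])
        .size()
        .reset_index(name="flows")
        .sort_values("flows", ascending=False, ignore_index=True)
    )
```

`threats_by_port(results_df)` would list the most frequent (port, attack type) pairs first.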

## Summary

✅ **Pipeline Components:**
1. `NetworkThreatPredictor` - Loads models and makes predictions
2. `PCAPFeatureExtractor` - Extracts CICIDS2017 features from PCAP files
3. `NetworkThreatAnalyzer` - End-to-end analysis pipeline

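The three components form a chain. Features are extracted, the predictor labels each flow, and the analyzer wraps both and summarizes the result. The sketch below models that chain with plain callables so it runs without the project's modules. `MiniPipeline`, its fields and the toy duration rule in the usage lines are hypothetical. Only the `dataframe` and `threat_detected` result keys mirror names used in this notebook.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import pandas as pd


@dataclass
class MiniPipeline:
    """Hypothetical stand-in for the extractor -> predictor -> analyzer chain."""
    extract: Callable[[str], pd.DataFrame]            # role of PCAPFeatureExtractor
    predict: Callable[[pd.DataFrame], Sequence[str]]  # role of NetworkThreatPredictor
    benign_label: str = "BENIGN"

    def run(self, source: str) -> dict:
        flows = self.extract(source)
        preds = list(self.predict(flows))
        flows = flows.assign(Prediction=preds)
        threats = sum(p != self.benign_label for p in preds)
        return {"dataframe": flows, "threat_detected": threats > 0, "threat_flows": threats}


# Toy usage: a fake extractor and a made-up duration rule in place of trained models
pipe = MiniPipeline(
    extract=lambda src: pd.DataFrame({"Flow Duration": [5, 5000000]}),
    predict=lambda df: ["DDoS" if d > 1000000 else "BENIGN" for d in df["Flow Duration"]],
)
print(pipe.run("ignored.pcap")["threat_detected"])
```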
**Next Steps:**
1. Install NFStream (plus Npcap on Windows) to enable PCAP analysis
2. Test with your real PCAP files
3. Integrate into your web application

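For the web-application step, analysis results usually need to be sent as JSON. One pitfall is that `json.dumps` rejects NumPy scalars such as `numpy.int64` and `numpy.bool_`, which pandas and scikit-learn often return. The hedged sketch below drops the DataFrame and converts NumPy values through the `default` hook. It assumes the rest of the results dict holds plain values. `results_to_json` is a hypothetical helper.

```python
import json

import numpy as np


def _to_builtin(obj):
    """json.dumps fallback: convert NumPy scalars and arrays into plain Python types."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def results_to_json(results: dict) -> str:
    # Leave out the per-flow DataFrame; a web response usually only needs the summary
    payload = {k: v for k, v in results.items() if k != "dataframe"}
    return json.dumps(payload, default=_to_builtin)


print(results_to_json({"threat_detected": np.bool_(True), "threat_flows": np.int64(3)}))
```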