# 📊 Anomaly Profiling for Security Event Analysis

This notebook provides comprehensive analysis and profiling of anomalies detected in security logs. It helps security analysts understand anomaly patterns, validate detection algorithms, and tune anomaly scoring thresholds.

## Anomaly Types
- **Statistical Outliers**: Events that deviate significantly from normal patterns
- **Behavioral Deviations**: Unusual user/entity behavior compared to historical baselines
- **Rare Entity Combinations**: Uncommon combinations of entities in security events
- **Time Pattern Anomalies**: Events occurring at unusual times for specific entities
- **Privilege Escalation**: Patterns indicating potential privilege abuse
- **Lateral Movement**: Patterns suggesting unauthorized access expansion
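
As an illustration of the first category, a z-score test flags values that sit far from the mean. The sketch below is an assumption about how such a check could look, not the logic inside `StatisticalAnomalyDetector`; the function name `zscore_outliers` and the sample counts are hypothetical, and the 2.5 cut-off simply mirrors the `statistical_threshold` value configured later in this notebook.

```python
import numpy as np

def zscore_outliers(values, threshold=2.5):
    """Return a boolean mask marking values whose |z-score| exceeds threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # No spread at all: nothing can be an outlier
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > threshold

# Hypothetical hourly API-call counts for one user; the last hour is a burst
hourly_calls = [12, 15, 11, 14, 13, 12, 16, 14, 13, 95]
mask = zscore_outliers(hourly_calls)
print(mask)  # only the final hour (95 calls) is flagged
```

With small samples a single extreme value inflates the standard deviation, so a robust alternative (median and MAD) may be preferable in practice.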

## Analysis Features
- Anomaly scoring and interactive visualization
- Historical trend analysis
- Entity behavior profiling
- False positive analysis
- Threshold optimization
- Security incident correlation

In [None]:
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output
from datetime import datetime, timedelta
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from typing import List, Dict, Tuple, Optional
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")

# Import project modules
import sys
sys.path.append('../')
from src.detector.scoring import (
    AnomalyScoringEngine, StatisticalAnomalyDetector, BehavioralAnomalyDetector,
    RareEntityCombinationDetector, PrivilegeEscalationDetector, AnomalyType
)
from src.detector.schemas import CloudTrailEvent, ExtractedEntity, AnomalyScore
from src.detector.core import EnrichmentEngine

print("📊 Anomaly Profiling System loaded successfully!")
print("🔧 Available anomaly detectors ready for analysis")

## 🔧 Anomaly Detection Engine Setup

In [None]:
class AnomalyProfiler:
    """Comprehensive anomaly profiling and analysis tool."""
    
    def __init__(self):
        """Initialize the anomaly profiler."""
        # Initialize anomaly scoring engine
        self.config = {
            'anomaly_threshold': 5.0,
            'statistical_threshold': 2.5
        }
        self.scoring_engine = AnomalyScoringEngine(self.config)
        
        # Data storage
        self.events_data = []
        self.anomalies_data = []
        self.normal_events = []  # populated by analyze_anomalies()
        self.analysis_results = {}
        
        # Visualization settings
        self.color_map = {
            'STATISTICAL_OUTLIER': '#FF6B6B',
            'RARE_ENTITY_COMBO': '#4ECDC4',
            'BEHAVIORAL_DEVIATION': '#45B7D1',
            'TIME_PATTERN_ANOMALY': '#96CEB4',
            'PRIVILEGE_ESCALATION': '#FFEAA7',
            'LATERAL_MOVEMENT': '#DDA0DD'
        }
    
    def load_synthetic_data(self, num_events: int = 1000) -> None:
        """Load synthetic security events for analysis."""
        print(f"🔄 Generating {num_events} synthetic security events...")
        
        # Generate synthetic events with varying patterns
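        # Anchor at midnight 30 days ago so the hour offsets below map to real clock hours
        base_time = (datetime.now() - timedelta(days=30)).replace(
            hour=0, minute=0, second=0, microsecond=0
        )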
        
        users = ['alice.smith', 'bob.jones', 'carol.white', 'david.brown', 'eve.wilson']
        roles = ['AdminRole', 'PowerUserRole', 'ReadOnlyRole', 'S3FullAccessRole']
        actions = ['AssumeRole', 'CreateUser', 'AttachUserPolicy', 'ListUsers', 'DescribeInstances', 
                  'CreateRole', 'DeleteUser', 'CreateAccessKey', 'GetCredentialsForIdentity']
        ips = ['192.168.1.100', '10.0.1.50', '172.16.0.25', '203.0.113.1', '198.51.100.1']
        regions = ['us-east-1', 'us-west-2', 'eu-west-1', 'ap-southeast-1']
        
        events = []
        for i in range(num_events):
            # Create time patterns (some users more active during business hours)
            if np.random.random() < 0.7:  # 70% business hours
                hour_offset = np.random.randint(8, 18)  # 8 AM to 6 PM
            else:  # 30% off hours (potential anomalies)
                hour_offset = np.random.choice([0, 1, 2, 22, 23])  # Late night/early morning
            
            event_time = base_time + timedelta(
                days=np.random.randint(0, 30),
                hours=hour_offset,
                minutes=np.random.randint(0, 60)
            )
            
            # Introduce some anomalous patterns (a single draw keeps the shares exact)
            pattern_draw = np.random.random()
            if pattern_draw < 0.10:  # 10% privilege escalation patterns
                user = np.random.choice(users[:2])  # Limit to first 2 users
                action = np.random.choice(['CreateRole', 'AttachUserPolicy', 'AssumeRole'])
            elif pattern_draw < 0.15:  # next 5% rare combinations
                user = 'system.automated'  # Unusual user
                action = np.random.choice(actions)
            else:  # Normal patterns
                user = np.random.choice(users)
                action = np.random.choice(actions)
            
            # Create event
            from src.detector.schemas import UserIdentity
            
            event = CloudTrailEvent(
                eventTime=event_time,
                eventName=action,
                userIdentity=UserIdentity(
                    type="IAMUser",
                    principalId=f"AIDACKCEVSQ6C2{i:06d}",
                    arn=f"arn:aws:iam::123456789012:user/{user}",
                    accountId="123456789012",
                    userName=user
                ),
                awsRegion=np.random.choice(regions),
                sourceIPAddress=np.random.choice(ips),
                userAgent="aws-cli/2.0.0",
                requestParameters={},
                responseElements={}
            )
            
            # Create corresponding entities
            entities = [
                ExtractedEntity(
                    entity_id=user,
                    entity_type="USER",
                    context={"source": "userIdentity"},
                    confidence=0.9
                ),
                ExtractedEntity(
                    entity_id=event.sourceIPAddress,
                    entity_type="IP",
                    context={"source": "sourceIPAddress"},
                    confidence=0.95
                )
            ]
            
            # Add role entity if relevant
            if 'Role' in action:
                entities.append(ExtractedEntity(
                    entity_id=np.random.choice(roles),
                    entity_type="ROLE",
                    context={"source": "requestParameters"},
                    confidence=0.85
                ))
            
            events.append((event, entities))
        
        self.events_data = events
        print(f"✅ Generated {len(events)} synthetic events")
    
    def analyze_anomalies(self) -> None:
        """Analyze events for anomalies and collect results."""
        if not self.events_data:
            print("❌ No events loaded. Please load data first.")
            return
        
        print("🔍 Analyzing events for anomalies...")
        
        anomalies = []
        normal_events = []
        
        for i, (event, entities) in enumerate(self.events_data):
            # Score the event
            anomaly_scores = self.scoring_engine.score_event(event, entities)
            
            # Calculate aggregate score
            aggregate_score = self.scoring_engine.calculate_aggregate_score(anomaly_scores)
            
            event_data = {
                'index': i,
                'timestamp': event.eventTime,
                'user': event.userIdentity.userName,
                'action': event.eventName,
                'source_ip': event.sourceIPAddress,
                'region': event.awsRegion,
                'aggregate_score': aggregate_score,
                'anomaly_scores': anomaly_scores,
                'entities': entities,
                'event': event
            }
            
            if aggregate_score >= self.config['anomaly_threshold']:
                anomalies.append(event_data)
            else:
                normal_events.append(event_data)
        
        self.anomalies_data = anomalies
        self.normal_events = normal_events
        
        print(f"📊 Analysis complete:")
        print(f"  • Total events: {len(self.events_data)}")
        print(f"  • Anomalies detected: {len(anomalies)} ({len(anomalies)/len(self.events_data)*100:.1f}%)")
        print(f"  • Normal events: {len(normal_events)} ({len(normal_events)/len(self.events_data)*100:.1f}%)")
    
    def create_anomaly_summary_df(self) -> pd.DataFrame:
        """Create a summary DataFrame of anomalies."""
        if not self.anomalies_data:
            return pd.DataFrame()
        
        summary_data = []
        for anomaly in self.anomalies_data:
            # Collect contributing factors and the highest individual score
            # (per-type information is not read here; only factors and scores are used)
            factors = []
            max_score = 0
            
            for score in anomaly['anomaly_scores']:
                if score.is_anomaly:
                    factors.extend(score.factors)
                    max_score = max(max_score, score.score)
            
            summary_data.append({
                'timestamp': anomaly['timestamp'],
                'user': anomaly['user'],
                'action': anomaly['action'],
                'source_ip': anomaly['source_ip'],
                'region': anomaly['region'],
                'aggregate_score': anomaly['aggregate_score'],
                'max_individual_score': max_score,
                'num_anomaly_factors': len(factors),
                'factors': '; '.join(factors[:3]) if factors else 'Unknown',
                'hour': anomaly['timestamp'].hour,
                'day_of_week': anomaly['timestamp'].strftime('%A')
            })
        
        return pd.DataFrame(summary_data)
    
    def plot_anomaly_timeline(self) -> None:
        """Plot anomaly detection timeline."""
        if not self.anomalies_data:
            print("❌ No anomalies to plot")
            return
        
        df = self.create_anomaly_summary_df()
        
        fig = make_subplots(
            rows=2, cols=1,
            subplot_titles=['Anomaly Score Timeline', 'Anomaly Count by Hour'],
            specs=[[{"secondary_y": False}], [{"secondary_y": False}]]
        )
        
        # Timeline plot
        fig.add_trace(
            go.Scatter(
                x=df['timestamp'],
                y=df['aggregate_score'],
                mode='markers',
                marker=dict(
                    size=8,
                    color=df['aggregate_score'],
                    colorscale='Reds',
                    showscale=True,
                    colorbar=dict(title="Anomaly Score")
                ),
                text=df['factors'],
                hovertemplate='<b>%{text}</b><br>' +
                             'Time: %{x}<br>' +
                             'Score: %{y:.2f}<br>' +
                             '<extra></extra>',
                name='Anomalies'
            ),
            row=1, col=1
        )
        
        # Hourly distribution
        hourly_counts = df.groupby('hour').size().reset_index(name='count')
        fig.add_trace(
            go.Bar(
                x=hourly_counts['hour'],
                y=hourly_counts['count'],
                marker_color='lightcoral',
                name='Hourly Anomalies'
            ),
            row=2, col=1
        )
        
        fig.update_layout(
            height=800,
            title_text="Anomaly Detection Analysis",
            showlegend=False
        )
        
        fig.update_xaxes(title_text="Time", row=1, col=1)
        fig.update_yaxes(title_text="Anomaly Score", row=1, col=1)
        fig.update_xaxes(title_text="Hour of Day", row=2, col=1)
        fig.update_yaxes(title_text="Number of Anomalies", row=2, col=1)
        
        fig.show()
    
    def plot_user_behavior_analysis(self) -> None:
        """Analyze and plot user behavior patterns."""
        if not self.anomalies_data:
            print("❌ No anomalies to analyze")
            return
        
        df = self.create_anomaly_summary_df()
        
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=[
                'Anomalies by User', 'Anomalies by Action',
                'Anomalies by Source IP', 'Score Distribution'
            ]
        )
        
        # User analysis
        user_counts = df.groupby('user').agg({
            'aggregate_score': ['count', 'mean']
        }).round(2)
        user_counts.columns = ['count', 'avg_score']
        user_counts = user_counts.reset_index().sort_values('count', ascending=False)
        
        fig.add_trace(
            go.Bar(x=user_counts['user'], y=user_counts['count'], name='User Anomalies'),
            row=1, col=1
        )
        
        # Action analysis
        action_counts = df['action'].value_counts().head(10)
        fig.add_trace(
            go.Bar(x=action_counts.index, y=action_counts.values, name='Action Anomalies'),
            row=1, col=2
        )
        
        # IP analysis
        ip_counts = df['source_ip'].value_counts().head(10)
        fig.add_trace(
            go.Bar(x=ip_counts.index, y=ip_counts.values, name='IP Anomalies'),
            row=2, col=1
        )
        
        # Score distribution
        fig.add_trace(
            go.Histogram(x=df['aggregate_score'], nbinsx=20, name='Score Distribution'),
            row=2, col=2
        )
        
        fig.update_layout(height=800, title_text="User Behavior Analysis", showlegend=False)
        fig.show()
        
        # Print top anomalous users
        print("\n🔍 Top Anomalous Users:")
        for _, row in user_counts.head().iterrows():
            print(f"  • {row['user']}: {row['count']} anomalies (avg score: {row['avg_score']:.2f})")

# Initialize the profiler
profiler = AnomalyProfiler()
print("📊 Anomaly Profiler initialized!")

## 📈 Data Loading and Analysis
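
`load_synthetic_data` draws from NumPy's global random state (`np.random.random`, `np.random.choice`, `np.random.randint`), so each run produces a different dataset. Seeding that state before loading should make results repeatable. A minimal sketch of the effect, where the seed value 42 is arbitrary:

```python
import numpy as np

# Seeding the global state makes the legacy np.random.* calls repeatable
np.random.seed(42)
first = np.random.randint(0, 24, size=5).tolist()

np.random.seed(42)
second = np.random.randint(0, 24, size=5).tolist()

print(first == second)  # the two draws are identical
```

Calling `np.random.seed(...)` once at the top of the loading cell is enough for this notebook.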

In [None]:
# Load synthetic data
profiler.load_synthetic_data(num_events=2000)

# Analyze for anomalies
profiler.analyze_anomalies()

## 📊 Anomaly Visualization and Analysis

In [None]:
# Display anomaly timeline
profiler.plot_anomaly_timeline()

In [None]:
# User behavior analysis
profiler.plot_user_behavior_analysis()

## 📋 Detailed Anomaly Investigation

In [None]:
# Create detailed anomaly summary
anomaly_df = profiler.create_anomaly_summary_df()

if not anomaly_df.empty:
    print("🔍 Top 10 Highest Scoring Anomalies:")
    top_anomalies = anomaly_df.nlargest(10, 'aggregate_score')
    
    display(top_anomalies[[
        'timestamp', 'user', 'action', 'source_ip', 
        'aggregate_score', 'factors'
    ]].style.format({
        'aggregate_score': '{:.2f}',
        'timestamp': lambda x: x.strftime('%Y-%m-%d %H:%M:%S')
    }).background_gradient(subset=['aggregate_score'], cmap='Reds'))
    
    # Summary statistics
    print(f"\n📊 Anomaly Statistics:")
    print(f"  • Mean anomaly score: {anomaly_df['aggregate_score'].mean():.2f}")
    print(f"  • Median anomaly score: {anomaly_df['aggregate_score'].median():.2f}")
    print(f"  • Highest anomaly score: {anomaly_df['aggregate_score'].max():.2f}")
    print(f"  • Most anomalous user: {anomaly_df['user'].value_counts().index[0]}")
    print(f"  • Most anomalous action: {anomaly_df['action'].value_counts().index[0]}")
else:
    print("❌ No anomalies detected in the current dataset")

## ⚙️ Threshold Optimization
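
Because an event's score does not depend on the threshold it is compared against, a sweep can score once and then compare in bulk. The sketch below works on a plain array of scores. The helper names `threshold_for_rate` and `anomaly_rates` are hypothetical, and the gamma-distributed scores are a synthetic stand-in rather than output from `AnomalyScoringEngine`.

```python
import numpy as np

def threshold_for_rate(scores, target_rate):
    """Pick the score cut-off that flags roughly target_rate of events."""
    scores = np.asarray(scores, dtype=float)
    return float(np.quantile(scores, 1.0 - target_rate))

def anomaly_rates(scores, thresholds):
    """Fraction of scores at or above each threshold, computed in one pass."""
    scores = np.asarray(scores, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Broadcasting: rows are thresholds, columns are events
    return (scores[None, :] >= thresholds[:, None]).mean(axis=1)

rng = np.random.default_rng(0)
scores = rng.gamma(shape=2.0, scale=1.5, size=2000)  # stand-in aggregate scores

cutoff = threshold_for_rate(scores, target_rate=0.02)
rates = anomaly_rates(scores, np.arange(1.0, 10.0, 0.5))
print(f"cut-off for ~2% of events: {cutoff:.2f}")
```

Reading the cut-off from a quantile avoids being limited to the 0.5-wide grid of candidate thresholds.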

In [None]:
def analyze_threshold_impact():
    """Analyze the impact of different anomaly thresholds."""
    if not profiler.events_data:
        print("❌ No events data available")
        return
    
    thresholds = np.arange(1.0, 10.0, 0.5)
    results = []
    
    print("🔄 Analyzing threshold impact...")
    
    # Score every event once: scores do not depend on the threshold, and
    # re-scoring per threshold could drift if the engine updates its baselines
    all_scores = np.array([
        profiler.scoring_engine.calculate_aggregate_score(
            profiler.scoring_engine.score_event(event, entities)
        )
        for event, entities in profiler.events_data
    ])
    avg_score = float(all_scores.mean())
    
    for threshold in thresholds:
        # Count the events at or above this candidate threshold
        anomaly_count = int((all_scores >= threshold).sum())
        anomaly_rate = anomaly_count / len(all_scores)
        
        results.append({
            'threshold': threshold,
            'anomaly_count': anomaly_count,
            'anomaly_rate': anomaly_rate,
            'avg_score': avg_score
        })
    
    results_df = pd.DataFrame(results)
    
    # Plot threshold analysis
    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=['Anomaly Rate vs Threshold', 'Anomaly Count vs Threshold']
    )
    
    fig.add_trace(
        go.Scatter(
            x=results_df['threshold'],
            y=results_df['anomaly_rate'] * 100,
            mode='lines+markers',
            name='Anomaly Rate (%)',
            line=dict(color='red')
        ),
        row=1, col=1
    )
    
    fig.add_trace(
        go.Scatter(
            x=results_df['threshold'],
            y=results_df['anomaly_count'],
            mode='lines+markers',
            name='Anomaly Count',
            line=dict(color='blue')
        ),
        row=1, col=2
    )
    
    # Add current threshold line
    current_threshold = profiler.config['anomaly_threshold']
    fig.add_vline(x=current_threshold, line_dash="dash", line_color="green", 
                  annotation_text=f"Current: {current_threshold}", row=1, col=1)
    fig.add_vline(x=current_threshold, line_dash="dash", line_color="green", 
                  annotation_text=f"Current: {current_threshold}", row=1, col=2)
    
    fig.update_layout(height=400, title_text="Threshold Optimization Analysis")
    fig.update_xaxes(title_text="Threshold")
    fig.update_yaxes(title_text="Anomaly Rate (%)", row=1, col=1)
    fig.update_yaxes(title_text="Number of Anomalies", row=1, col=2)
    
    fig.show()
    
    # Recommendations
    print("\n💡 Threshold Recommendations:")
    
    # Find the candidate threshold whose anomaly rate is closest to the 2% target
    target_rate = 0.02  # 2%
    closest_idx = np.abs(results_df['anomaly_rate'] - target_rate).idxmin()
    recommended_threshold = results_df.loc[closest_idx, 'threshold']
    recommended_rate = results_df.loc[closest_idx, 'anomaly_rate']
    
    print(f"  • For ~2% anomaly rate: threshold = {recommended_threshold:.1f} ({recommended_rate*100:.1f}% actual)")
    
    # Current threshold performance
    current_idx = np.abs(results_df['threshold'] - current_threshold).idxmin()
    current_rate = results_df.loc[current_idx, 'anomaly_rate']
    current_count = results_df.loc[current_idx, 'anomaly_count']
    
    print(f"  • Current threshold {current_threshold}: {current_rate*100:.1f}% rate ({current_count} anomalies)")
    
    return results_df

# Run threshold analysis
threshold_results = analyze_threshold_impact()

## 🎯 Interactive Anomaly Investigation

In [None]:
class InteractiveAnomalyInvestigator:
    """Interactive widget for investigating specific anomalies."""
    
    def __init__(self, profiler: AnomalyProfiler):
        self.profiler = profiler
        self.current_anomaly_idx = 0
        
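        # Define the widget up front so display() is safe even without anomalies
        self.widget = widgets.HTML("<b>No anomalies available for investigation</b>")
        
        if not self.profiler.anomalies_data:
            print("❌ No anomalies available for investigation")
            return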
        
        # Create widgets
        self.anomaly_selector = widgets.IntSlider(
            value=0,
            min=0,
            max=len(self.profiler.anomalies_data) - 1,
            description='Anomaly #:',
            style={'description_width': 'initial'}
        )
        
        self.details_output = widgets.Output()
        
        # Connect events
        self.anomaly_selector.observe(self.on_anomaly_change, names='value')
        
        # Layout
        self.widget = widgets.VBox([
            widgets.HTML("<h3>🔍 Interactive Anomaly Investigation</h3>"),
            self.anomaly_selector,
            self.details_output
        ])
        
        # Show initial anomaly
        self.show_anomaly_details(0)
    
    def on_anomaly_change(self, change):
        """Handle anomaly selection change."""
        self.show_anomaly_details(change['new'])
    
    def show_anomaly_details(self, idx: int):
        """Show detailed information about a specific anomaly."""
        with self.details_output:
            # wait=True avoids flicker while the slider is being dragged
            clear_output(wait=True)
            
            if not 0 <= idx < len(self.profiler.anomalies_data):
                print("❌ Invalid anomaly index")
                return
            
            anomaly = self.profiler.anomalies_data[idx]
            event = anomaly['event']
            
            print(f"📋 Anomaly #{idx + 1} Details")
            print(f"{'='*50}")
            print(f"⏰ Timestamp: {event.eventTime.strftime('%Y-%m-%d %H:%M:%S')}")
            print(f"👤 User: {event.userIdentity.userName}")
            print(f"🎯 Action: {event.eventName}")
            print(f"🌐 Source IP: {event.sourceIPAddress}")
            print(f"📍 Region: {event.awsRegion}")
            print(f"📊 Aggregate Score: {anomaly['aggregate_score']:.2f}")
            
            print(f"\n🏷️ Extracted Entities:")
            for entity in anomaly['entities']:
                print(f"  • {entity.entity_type}: {entity.entity_id} (confidence: {entity.confidence:.2f})")
            
            print(f"\n🚨 Anomaly Factors:")
            for i, score in enumerate(anomaly['anomaly_scores']):
                if score.is_anomaly:
                    print(f"  {i+1}. Score: {score.score:.2f}")
                    for factor in score.factors:
                        print(f"     • {factor}")
            
            # Contextual information
            print(f"\n📈 Contextual Analysis:")
            
            # Time context (business hours are 08:00-17:59, matching the synthetic data generator)
            hour = event.eventTime.hour
            if hour < 8 or hour >= 18:
                print(f"  ⚠️ Unusual time: {hour:02d}:xx (outside business hours)")
            else:
                print(f"  ✅ Normal time: {hour:02d}:xx (business hours)")
            
            # User context
            user_anomalies = [a for a in self.profiler.anomalies_data if a['user'] == anomaly['user']]
            print(f"  👤 User has {len(user_anomalies)} total anomalies")
            
            # Action context
            action_anomalies = [a for a in self.profiler.anomalies_data if a['action'] == anomaly['action']]
            print(f"  🎯 Action '{event.eventName}' appears in {len(action_anomalies)} anomalies")
            
            # Investigation recommendations
            print(f"\n💡 Investigation Recommendations:")
            if anomaly['aggregate_score'] > 8.0:
                print(f"  🔴 HIGH PRIORITY: Immediate investigation recommended")
                print(f"  📞 Consider contacting user: {event.userIdentity.userName}")
                print(f"  🔒 Review account permissions and recent changes")
            elif anomaly['aggregate_score'] > 6.0:
                print(f"  🟡 MEDIUM PRIORITY: Monitor for additional suspicious activity")
                print(f"  📊 Review user's activity over the past 24 hours")
            else:
                print(f"  🟢 LOW PRIORITY: Flag for routine review")
                print(f"  📝 Document pattern for future reference")
    
    def display(self):
        """Display the investigation widget."""
        return self.widget

# Create and display the interactive investigator
if profiler.anomalies_data:
    investigator = InteractiveAnomalyInvestigator(profiler)
    # A bare expression inside an if-block is not auto-rendered, so display it explicitly
    display(investigator.display())
else:
    print("❌ No anomalies to investigate. Try running the analysis first.")

## 📊 Anomaly Detection Performance Metrics
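
The detection rate reported in this section is the share of events flagged, which says nothing about whether the flags are correct. If ground-truth labels were available (an assumption: the synthetic generator in this notebook does not store any), precision and recall could be computed along these lines. The function name `precision_recall` and the sample values are hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall of 'score >= threshold' against boolean labels."""
    tp = fp = fn = 0
    for score, is_malicious in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_malicious:
            tp += 1  # correctly flagged
        elif flagged:
            fp += 1  # false alarm
        elif is_malicious:
            fn += 1  # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical aggregate scores with analyst-confirmed labels
sample_scores = [7.2, 1.1, 5.4, 0.8, 6.0, 2.3]
sample_labels = [True, False, False, False, True, True]
precision, recall = precision_recall(sample_scores, sample_labels, threshold=5.0)
print(f"precision={precision:.2f} recall={recall:.2f}")
```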

In [None]:
def calculate_detection_metrics():
    """Calculate performance metrics for anomaly detection."""
    if not profiler.events_data:
        print("❌ No events data available")
        return
    
    total_events = len(profiler.events_data)
    detected_anomalies = len(profiler.anomalies_data)
    detection_rate = detected_anomalies / total_events
    
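    # Calculate score distribution. Reuse the scores from analyze_anomalies() so the
    # figures match the reported anomaly count; re-score only if no analysis has run.
    scored = profiler.anomalies_data + getattr(profiler, 'normal_events', [])
    if scored:
        all_scores = [item['aggregate_score'] for item in scored]
    else:
        all_scores = [
            profiler.scoring_engine.calculate_aggregate_score(
                profiler.scoring_engine.score_event(event, entities))
            for event, entities in profiler.events_data
        ]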
    
    all_scores = np.array(all_scores)
    
    metrics = {
        'total_events': total_events,
        'detected_anomalies': detected_anomalies,
        'detection_rate': detection_rate,
        'mean_score': np.mean(all_scores),
        'std_score': np.std(all_scores),
        'median_score': np.median(all_scores),
        'max_score': np.max(all_scores),
        'min_score': np.min(all_scores),
        # Cast to a plain int so the JSON export writes a number rather than a string
        'scores_above_threshold': int(np.sum(all_scores >= profiler.config['anomaly_threshold'])),
        'score_95th_percentile': np.percentile(all_scores, 95),
        'score_99th_percentile': np.percentile(all_scores, 99)
    }
    
    print("📈 Anomaly Detection Performance Metrics")
    print("=" * 50)
    print(f"📊 Dataset Overview:")
    print(f"  • Total events analyzed: {metrics['total_events']:,}")
    print(f"  • Anomalies detected: {metrics['detected_anomalies']:,}")
    print(f"  • Detection rate: {metrics['detection_rate']:.2%}")
    
    print(f"\n📊 Score Distribution:")
    print(f"  • Mean score: {metrics['mean_score']:.2f}")
    print(f"  • Median score: {metrics['median_score']:.2f}")
    print(f"  • Standard deviation: {metrics['std_score']:.2f}")
    print(f"  • Score range: {metrics['min_score']:.2f} - {metrics['max_score']:.2f}")
    
    print(f"\n📈 Percentile Analysis:")
    print(f"  • 95th percentile: {metrics['score_95th_percentile']:.2f}")
    print(f"  • 99th percentile: {metrics['score_99th_percentile']:.2f}")
    
    print(f"\n⚙️ Threshold Analysis:")
    print(f"  • Current threshold: {profiler.config['anomaly_threshold']:.1f}")
    print(f"  • Events above threshold: {metrics['scores_above_threshold']:,}")
    
    # Quality assessment
    print(f"\n💯 Quality Assessment:")
    if metrics['detection_rate'] > 0.10:
        print(f"  ⚠️ High detection rate ({metrics['detection_rate']:.1%}) - consider raising threshold")
    elif metrics['detection_rate'] < 0.01:
        print(f"  ⚠️ Low detection rate ({metrics['detection_rate']:.1%}) - consider lowering threshold")
    else:
        print(f"  ✅ Good detection rate ({metrics['detection_rate']:.1%}) - threshold well calibrated")
    
    if metrics['std_score'] < 1.0:
        print(f"  ⚠️ Low score spread (std < 1.0) - detection may lack sensitivity")
    else:
        print(f"  ✅ Good score spread (std >= 1.0) - scores separate events well")
    
    return metrics

# Calculate and display metrics
performance_metrics = calculate_detection_metrics()

## 💾 Export Analysis Results

In [None]:
def export_analysis_results():
    """Export analysis results for reporting."""
    if not profiler.anomalies_data:
        print("❌ No anomalies to export")
        return
    
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    
    # Export anomaly summary
    anomaly_df = profiler.create_anomaly_summary_df()
    anomaly_filename = f'anomaly_analysis_{timestamp}.csv'
    anomaly_df.to_csv(anomaly_filename, index=False)
    print(f"📊 Exported anomaly analysis to {anomaly_filename}")
    
    # Export detailed results
    detailed_results = []
    for i, anomaly in enumerate(profiler.anomalies_data):
        event = anomaly['event']
        
        # Extract all factors
        all_factors = []
        individual_scores = []
        
        for score in anomaly['anomaly_scores']:
            individual_scores.append(score.score)
            all_factors.extend(score.factors)
        
        detailed_results.append({
            'anomaly_id': i + 1,
            'timestamp': event.eventTime.isoformat(),
            'user_arn': event.userIdentity.arn,
            'user_name': event.userIdentity.userName,
            'event_name': event.eventName,
            'source_ip': event.sourceIPAddress,
            'aws_region': event.awsRegion,
            'user_agent': event.userAgent,
            'aggregate_score': anomaly['aggregate_score'],
            'max_individual_score': max(individual_scores) if individual_scores else 0,
            'num_scoring_factors': len(all_factors),
            'scoring_factors': '; '.join(all_factors),
            'entities_detected': '; '.join([f"{e.entity_type}:{e.entity_id}" for e in anomaly['entities']]),
            'hour_of_day': event.eventTime.hour,
            'day_of_week': event.eventTime.weekday(),
            'is_weekend': event.eventTime.weekday() >= 5,
            'is_business_hours': 8 <= event.eventTime.hour < 18  # 08:00-17:59
        })
    
    detailed_df = pd.DataFrame(detailed_results)
    detailed_filename = f'detailed_anomalies_{timestamp}.csv'
    detailed_df.to_csv(detailed_filename, index=False)
    print(f"📋 Exported detailed anomalies to {detailed_filename}")
    
    # Export configuration and metrics
    config_and_metrics = {
        'analysis_timestamp': datetime.now().isoformat(),
        'configuration': profiler.config,
        'scoring_engine_summary': profiler.scoring_engine.get_scoring_summary(),
        'performance_metrics': globals().get('performance_metrics') or {},  # also covers None
        'dataset_summary': {
            'total_events': len(profiler.events_data),
            'total_anomalies': len(profiler.anomalies_data),
            'anomaly_rate': len(profiler.anomalies_data) / len(profiler.events_data) if profiler.events_data else 0
        }
    }
    
    config_filename = f'analysis_config_{timestamp}.json'
    with open(config_filename, 'w') as f:
        json.dump(config_and_metrics, f, indent=2, default=str)
    print(f"⚙️ Exported configuration and metrics to {config_filename}")
    
    print(f"\n✅ Analysis export complete! Files saved:")
    print(f"  • {anomaly_filename} - Summary analysis")
    print(f"  • {detailed_filename} - Detailed anomaly records")
    print(f"  • {config_filename} - Configuration and metrics")

# Export button
export_btn = widgets.Button(description="💾 Export Results", button_style='success')
export_btn.on_click(lambda x: export_analysis_results())
display(export_btn)

## 🎯 Summary and Next Steps

### 📊 Analysis Summary

This notebook provides comprehensive anomaly profiling capabilities for security event analysis:

1. **Anomaly Detection**: Score events using multiple detection algorithms
2. **Behavioral Analysis**: Profile user and entity behavior patterns
3. **Interactive Investigation**: Drill down into specific anomalies
4. **Threshold Optimization**: Find optimal detection thresholds
5. **Performance Metrics**: Evaluate detection system performance

### 🔍 Key Insights Available

- **User Behavior Patterns**: Identify users with unusual activity
- **Time-based Anomalies**: Detect off-hours or unusual timing patterns
- **Entity Combinations**: Find rare or suspicious entity relationships
- **Privilege Escalation**: Identify potential privilege abuse patterns
- **Statistical Outliers**: Detect events that deviate from normal baselines
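
To make the entity-combination idea concrete, one common approach scores a combination by how rarely it has been seen. The sketch below uses surprisal (negative log2 of the relative frequency); this is an assumption for illustration, not a description of `RareEntityCombinationDetector`, and the name `rarity_scores` and the sample pairs are hypothetical.

```python
import math
from collections import Counter

def rarity_scores(pairs):
    """Score each (user, action) pair by surprisal: -log2 of its relative frequency."""
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: -math.log2(n / total) for pair, n in counts.items()}

# Hypothetical history: one common pair and two pairs seen only once
observed = (
    [("alice.smith", "ListUsers")] * 6
    + [("bob.jones", "DescribeInstances")]
    + [("system.automated", "CreateAccessKey")]
)
pair_scores = rarity_scores(observed)
for pair, bits in sorted(pair_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pair[0]:<18} {pair[1]:<18} {bits:.2f} bits")
```

Pairs seen once in eight events score 3 bits, while the common pair scores well under 1 bit.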

### 🛠️ Customization Options

- **Adjust Detection Thresholds**: Fine-tune sensitivity vs false positive rates
- **Configure Anomaly Types**: Enable/disable specific detection algorithms
- **Custom Scoring Weights**: Adjust importance of different anomaly types
- **Time Window Analysis**: Focus on specific time periods
- **Entity-specific Analysis**: Analyze specific users, IPs, or resources
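
As a sketch of what custom scoring weights could look like, the function below combines per-detector scores with a weighted mean. How `calculate_aggregate_score` actually aggregates is not shown in this notebook, so the function `weighted_aggregate`, its parameters, and the weight values are all hypothetical.

```python
def weighted_aggregate(scores_by_type, weights, default_weight=1.0):
    """Weighted mean of per-detector scores; unlisted detectors get default_weight."""
    weighted_sum = 0.0
    total_weight = 0.0
    for anomaly_type, score in scores_by_type.items():
        weight = weights.get(anomaly_type, default_weight)
        weighted_sum += weight * score
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0

# Hypothetical per-detector scores for one event
detector_scores = {
    "STATISTICAL_OUTLIER": 4.0,
    "PRIVILEGE_ESCALATION": 8.0,
    "TIME_PATTERN_ANOMALY": 2.0,
}
# Double the weight of privilege escalation; a weight of 0 disables a detector
custom_weights = {"PRIVILEGE_ESCALATION": 2.0, "TIME_PATTERN_ANOMALY": 0.0}
combined = weighted_aggregate(detector_scores, custom_weights)
print(f"{combined:.2f}")
```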

### 📈 Next Steps for Security Teams

1. **Integrate with SIEM**: Connect findings to existing security workflows
2. **Create Alerting Rules**: Set up automated alerts for high-scoring anomalies
3. **Develop Playbooks**: Create investigation procedures for different anomaly types
4. **Train Models**: Use labeled data to improve detection accuracy
5. **Regular Reviews**: Schedule periodic analysis to catch evolving threats
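
An alerting rule can start from the same score tiers the interactive investigator prints (above 8.0 high, above 6.0 medium, otherwise low). A minimal sketch, with the function name `triage_priority` being hypothetical:

```python
def triage_priority(aggregate_score, high=8.0, medium=6.0):
    """Map an aggregate anomaly score to a priority tier."""
    if aggregate_score > high:
        return "HIGH"
    if aggregate_score > medium:
        return "MEDIUM"
    return "LOW"

# Boundary values fall into the lower tier because the comparisons are strict
for score in (8.5, 8.0, 6.5, 6.0, 5.2):
    print(score, triage_priority(score))
```

Keeping the tier boundaries as parameters makes it easier to retune them alongside the anomaly threshold.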

### 💡 Pro Tips

- **Start Conservative**: Begin with higher thresholds and adjust based on results
- **Focus on Context**: Always consider business context when investigating anomalies
- **Track False Positives**: Keep records to improve detection algorithms
- **Collaborate**: Share findings with security teams and business stakeholders
- **Automate Where Possible**: Use results to build automated response workflows

Happy anomaly hunting! 🔍🛡️