# 🎓 Audio Deepfake Detection using FMSL - ASVspoof2019 LA Dataset

## Overview
This notebook provides a comprehensive implementation and evaluation of Frequency-Modulated Spectral Loss (FMSL) for audio deepfake detection using the **ASVspoof2019 LA dataset**.

### Key Features:
- **8 Baseline Models** (Maze1-Maze8) 
- **8 FMSL-Enhanced Models**
- **ASVspoof2019 LA Dataset** (Logical Access track)
- **Professional Evaluation Framework**
- **Publication-Ready Visualizations**

### Dataset Information:
- **ASVspoof2019 LA**: Logical Access track
- **Training**: 2,580 bonafide + 22,800 spoofed utterances (25,380 total)
- **Evaluation**: 7,355 bonafide + 63,882 spoofed utterances
- **Sample Rate**: 16 kHz
- **Format**: FLAC audio files

---


## 1. Environment Setup

### 1.1 Google Colab Setup & Drive Mount

In [None]:
# Mount Google Drive and setup paths
from google.colab import drive
drive.mount('/content/drive')

import os
import sys
from pathlib import Path

# ASVspoof2019 LA Dataset paths
PROJECT_ROOT = "/content/drive/MyDrive/ASVspoof2019/LA/2021/LA/Baseline-RawNet2"
DATA_ROOT = "/content/sample_data/data"
ASVSPOOF_LA_ROOT = "/content/drive/MyDrive/ASVspoof2019/Extract/LA"
PROTOCOLS_PATH = f"{ASVSPOOF_LA_ROOT}/ASVspoof2019_LA_cm_protocols"

# Training and evaluation data paths
TRAIN_DATA_PATH = f"{ASVSPOOF_LA_ROOT}/ASVspoof2019_LA_train"
EVAL_DATA_PATH = f"{ASVSPOOF_LA_ROOT}/ASVspoof2019_LA_eval"
DEV_DATA_PATH = f"{ASVSPOOF_LA_ROOT}/ASVspoof2019_LA_dev"

# Add project to Python path
sys.path.append(PROJECT_ROOT)
os.chdir(PROJECT_ROOT)

print("✅ ASVspoof2019 LA Environment Setup Complete!")
print(f"📁 Project Root: {PROJECT_ROOT}")
print(f"📊 Data Root: {DATA_ROOT}")
print(f"🎯 ASVspoof2019 LA Root: {ASVSPOOF_LA_ROOT}")
print(f"📋 Protocols: {PROTOCOLS_PATH}")


### 1.2 Package Installation for ASVspoof2019

In [None]:
# Install packages optimized for ASVspoof2019
!pip install --upgrade pip setuptools wheel

# Core ML packages
!pip install torch==2.3.1+cu118 torchvision==0.18.1+cu118 torchaudio==2.3.1+cu118 --index-url https://download.pytorch.org/whl/cu118
!pip install transformers==4.41.2
!pip install librosa==0.9.2 soundfile pandas pyyaml scikit-learn
!pip install tensorboardX==2.6 numba==0.58.1
!pip install matplotlib seaborn plotly

print("✅ ASVspoof2019 packages installed successfully!")


### 1.3 Import Libraries

In [None]:
# Import libraries for ASVspoof2019 LA
import os
import sys
import time
import json
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from typing import Dict, List, Tuple, Optional

# Audio processing
import librosa
import soundfile as sf
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchaudio

# Transformers for Wav2Vec2
from transformers import Wav2Vec2Model, Wav2Vec2Config

# Evaluation metrics
from sklearn.metrics import roc_curve, auc, confusion_matrix, classification_report
from sklearn.metrics import accuracy_score, precision_recall_curve

# Suppress warnings
warnings.filterwarnings('ignore')

print("✅ All libraries imported successfully!")
print(f"🔧 PyTorch: {torch.__version__}")
print(f"🔧 CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🔧 GPU: {torch.cuda.get_device_name(0)}")


## 2. ASVspoof2019 LA Dataset Preparation

### 2.1 Dataset Verification


In [None]:
# Verify ASVspoof2019 LA dataset structure
def verify_asvspoof2019_la():
    """Verify ASVspoof2019 LA dataset availability and structure"""
    
    print("üîç Verifying ASVspoof2019 LA Dataset...")
    print("=" * 50)
    
    # Check main directories
    directories = {
        "ASVspoof2019 LA Root": ASVSPOOF_LA_ROOT,
        "Training Data": TRAIN_DATA_PATH,
        "Evaluation Data": EVAL_DATA_PATH,
        "Development Data": DEV_DATA_PATH,
        "Protocols": PROTOCOLS_PATH
    }
    
    for name, path in directories.items():
        if os.path.exists(path):
            print(f"✅ {name}: {path}")
            # Count audio files for data directories. Search recursively,
            # since the audio may sit in a subdirectory such as flac/.
            if "Data" in name:
                flac_files = sum(1 for _ in Path(path).rglob('*.flac'))
                print(f"   📊 FLAC files: {flac_files}")
        else:
            print(f"❌ {name}: {path} (Not found)")
    
    # Check protocol files
    protocol_files = [
        "ASVspoof2019.LA.cm.train.trn.txt",
        "ASVspoof2019.LA.cm.dev.trl.txt", 
        "ASVspoof2019.LA.cm.eval.trl.txt"
    ]
    
    print(f"\nüìã Protocol Files:")
    for protocol in protocol_files:
        protocol_path = os.path.join(PROTOCOLS_PATH, protocol)
        if os.path.exists(protocol_path):
            print(f"‚úÖ {protocol}")
        else:
            print(f"‚ùå {protocol} (Not found)")
    
    return all(os.path.exists(path) for path in directories.values())

# Verify dataset
dataset_ready = verify_asvspoof2019_la()
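
Once the protocol files are found, counting the labels in each one is a quick way to check the bonafide/spoof balance of a split. The sketch below assumes the usual five-column CM protocol layout (speaker ID, file ID, an unused field, attack ID, label) with `bonafide` or `spoof` in the last column. `count_labels` is a hypothetical helper, not part of the project code, and the sample lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def count_labels(lines: Iterable[str]) -> Counter:
    """Count the labels found in the last column of protocol lines."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[-1]] += 1
    return counts


# Made-up lines in the assumed layout (not real trial IDs).
sample = [
    "LA_0001 LA_T_0000001 - - bonafide",
    "LA_0001 LA_T_0000002 - A01 spoof",
    "LA_0002 LA_T_0000003 - A02 spoof",
]
print(count_labels(sample))
```

With the real data, an open file can be passed directly, for example `count_labels(open(os.path.join(PROTOCOLS_PATH, "ASVspoof2019.LA.cm.train.trn.txt")))`.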



# 🎓 Audio Deepfake Detection using FMSL - Complete Thesis Notebook

## Overview
This notebook provides a comprehensive implementation and evaluation of Frequency-Modulated Spectral Loss (FMSL) for audio deepfake detection. It includes:

- **8 Baseline Models** (Maze1-Maze8)
- **8 FMSL-Enhanced Models** 
- **Comprehensive Evaluation Framework**
- **Professional Analysis and Visualization**

## Table of Contents
1. [Environment Setup](#1-environment-setup)
2. [Data Preparation](#2-data-preparation)
3. [Model Training](#3-model-training)
4. [Model Evaluation](#4-model-evaluation)
5. [Results Analysis](#5-results-analysis)
6. [Thesis Visualization](#6-thesis-visualization)

---


## 1. Environment Setup

### 1.1 Google Colab Setup


In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Set up paths
import os
import sys
from pathlib import Path

# Project paths
PROJECT_ROOT = "/content/drive/MyDrive/ASVspoof2019/LA/2021/LA/Baseline-RawNet2"
DATA_ROOT = "/content/sample_data/data"
PROTOCOLS_PATH = "/content/drive/MyDrive/ASVspoof2019/Extract/LA/ASVspoof2019_LA_cm_protocols"

# Add project to Python path
sys.path.append(PROJECT_ROOT)

# Change to project directory
os.chdir(PROJECT_ROOT)

print("✅ Environment setup complete!")
print(f"📁 Project root: {PROJECT_ROOT}")
print(f"📊 Data root: {DATA_ROOT}")
print(f"📋 Protocols: {PROTOCOLS_PATH}")


### 1.2 Package Installation


In [None]:
# Install required packages
!pip install --upgrade pip setuptools wheel

# Install core ML packages
!pip install torch==2.3.1+cu118 torchvision==0.18.1+cu118 torchaudio==2.3.1+cu118 --index-url https://download.pytorch.org/whl/cu118
!pip install transformers==4.41.2
!pip install librosa==0.9.2 soundfile pandas pyyaml scikit-learn
!pip install tensorboardX==2.6
!pip install numba==0.58.1

print("✅ All packages installed successfully!")


### 1.3 Import Required Libraries


In [None]:
# Import core libraries
import os
import sys
import time
import json
import warnings
from pathlib import Path
from typing import Dict, List, Tuple, Optional

# Data processing
import numpy as np
import pandas as pd
import librosa
import soundfile as sf

# Deep learning
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchaudio

# Transformers
from transformers import Wav2Vec2Model, Wav2Vec2Config

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import roc_curve, auc, confusion_matrix, classification_report

# Suppress warnings
warnings.filterwarnings('ignore')

print("✅ All libraries imported successfully!")


## 2. Data Preparation

### 2.1 Data Setup and Verification


In [None]:
# Data preparation and verification
def setup_data_environment():
    """Setup data environment and verify data availability"""
    
    # Create data directory
    os.makedirs(DATA_ROOT, exist_ok=True)
    
    # Check if data exists
    data_sources = {
        "Training Data": f"{DATA_ROOT}/ASVspoof2019_LA_train",
        "Evaluation Data": f"{DATA_ROOT}/ASVspoof2019_LA_eval", 
        "Protocols": PROTOCOLS_PATH
    }
    
    print("üîç Checking data availability...")
    for name, path in data_sources.items():
        if os.path.exists(path):
            print(f"‚úÖ {name}: {path}")
        else:
            print(f"‚ùå {name}: {path} (Not found)")
    
    return data_sources

# Setup data environment
data_status = setup_data_environment()


## 3. Model Training

### 3.1 Training Configuration


In [None]:
# Training configuration
TRAINING_CONFIG = {
    'architecture': {
        'filts': [128, [128, 128], [128, 256]],
        'nb_fc_node': 1024,
        'nb_classes': 2,
        'sample_rate': 16000,
        'first_conv': 251,
        'dropout_rate': 0.3
    },
    'wav2vec2': {
        'wav2vec2_model_name': 'facebook/wav2vec2-base-960h',
        'wav2vec2_output_dim': 768,
        'wav2vec2_freeze': True
    },
    'fmsl': {
        'fmsl_type': 'prototype',
        'fmsl_n_prototypes': 3,
        'fmsl_s': 32.0,
        'fmsl_m': 0.45,
        'fmsl_enable_lsa': False
    },
    'training': {
        'batch_size': 12,
        'lr': 0.0001,
        'weight_decay': 0.0001,
        'grad_clip_norm': 1.0,
        'num_epochs': 5,
        'seed': 1234
    }
}

print("✅ Training configuration loaded!")
print(f"📊 Architecture: {TRAINING_CONFIG['architecture']}")
print(f"🎯 Training params: {TRAINING_CONFIG['training']}")


### 3.2 Model Training Commands


In [None]:
# Training commands for all models
def get_training_commands():
    """Generate training commands for all models"""
    
    # Baseline models
    baseline_models = ['maze1', 'maze2', 'maze3', 'maze4', 'maze5', 'maze6', 'maze7', 'maze8']
    
    print("üöÄ BASELINE MODEL TRAINING COMMANDS")
    print("=" * 50)
    
    for model in baseline_models:
        cmd = f"python {model}.py --track=LA --loss=cce --lr={TRAINING_CONFIG['training']['lr']} --batch_size={TRAINING_CONFIG['training']['batch_size']} --num_epochs={TRAINING_CONFIG['training']['num_epochs']} --database_path={DATA_ROOT} --protocols_path={PROTOCOLS_PATH}"
        print(f"\n{model}:")
        print(f"  {cmd}")
    
    print("\n" + "=" * 50)
    print("üöÄ FMSL-ENHANCED MODEL TRAINING COMMANDS")
    print("=" * 50)
    
    for model in baseline_models:
        cmd = f"python {model}_fmsl_standardized.py --track=LA --lr={TRAINING_CONFIG['training']['lr']} --batch_size={TRAINING_CONFIG['training']['batch_size']} --num_epochs={TRAINING_CONFIG['training']['num_epochs']} --database_path={DATA_ROOT} --protocols_path={PROTOCOLS_PATH}"
        print(f"\n{model}_fmsl:")
        print(f"  {cmd}")
    
    print("\n✅ All training commands generated!")
    print("💡 Copy and run specific commands as needed.")

# Generate training commands
get_training_commands()
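
Building each command as an argument list rather than one long f-string makes it easier to quote paths safely and to launch a script straight from the notebook. The sketch below reuses the flags printed above; `build_train_command` is a hypothetical helper and the paths in the example are placeholders.

```python
import shlex
import sys
from typing import List


def build_train_command(script: str, lr: float, batch_size: int, num_epochs: int,
                        database_path: str, protocols_path: str,
                        loss: str = "") -> List[str]:
    """Return the training command as a list of arguments."""
    cmd = [sys.executable, script, "--track=LA"]
    if loss:  # only the baseline commands above pass --loss
        cmd.append(f"--loss={loss}")
    cmd += [
        f"--lr={lr}",
        f"--batch_size={batch_size}",
        f"--num_epochs={num_epochs}",
        f"--database_path={database_path}",
        f"--protocols_path={protocols_path}",
    ]
    return cmd


cmd = build_train_command("maze1.py", lr=0.0001, batch_size=12, num_epochs=5,
                          database_path="/path/to/data",
                          protocols_path="/path/to/protocols", loss="cce")
print(shlex.join(cmd))  # shell-safe string for copy and paste
# To launch from the notebook: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems when a Drive path contains spaces.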


## 4. Model Evaluation

### 4.1 Evaluation Commands


In [None]:
# Evaluation commands for all models
def get_evaluation_commands():
    """Generate evaluation commands for all models"""
    
    baseline_models = ['maze1', 'maze2', 'maze3', 'maze4', 'maze5', 'maze6', 'maze7', 'maze8']
    
    print("üîç BASELINE MODEL EVALUATION COMMANDS")
    print("=" * 50)
    
    for model in baseline_models:
        cmd = f"python {model}_eval.py --model_type {model} --model_path /path/to/{model}_model.pth --batch_size 128"
        print(f"\n{model}:")
        print(f"  {cmd}")
    
    print("\n" + "=" * 50)
    print("üîç FMSL-ENHANCED MODEL EVALUATION COMMANDS")
    print("=" * 50)
    
    for model in baseline_models:
        cmd = f"python {model}_eval.py --model_type {model}_fmsl --model_path /path/to/{model}_fmsl_model.pth --batch_size 128"
        print(f"\n{model}_fmsl:")
        print(f"  {cmd}")
    
    print("\n" + "=" * 50)
    print("üìä COMPREHENSIVE EVALUATION")
    print("=" * 50)
    
    comp_cmd = f"python comprehensive_evaluation.py --data_dir {DATA_ROOT} --protocol_file {PROTOCOLS_PATH}/ASVspoof2019.LA.cm.eval.trl.txt --output_dir evaluation_results --batch_size 128"
    print(f"\nComprehensive Evaluation:")
    print(f"  {comp_cmd}")
    
    print("\n✅ All evaluation commands generated!")
    print("💡 Update model paths with actual trained model locations.")

# Generate evaluation commands
get_evaluation_commands()
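
The results in Section 5 are expressed as equal error rate (EER), the operating point where the false acceptance rate on spoofed trials equals the false rejection rate on bonafide trials. A minimal pure-Python sketch, assuming higher scores mean "more likely bonafide"; `compute_eer` is a hypothetical helper, not the project's or the official ASVspoof scoring code, and the scores in the example are made up.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold), trying every observed score as a threshold."""
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")
    bona = sorted(bonafide_scores)
    spoof = sorted(spoof_scores)
    best_gap, best_eer, best_thr = float("inf"), 1.0, bona[0]
    for thr in sorted(set(bona) | set(spoof)):
        # A trial is accepted as bonafide when its score is >= thr.
        frr = bisect_left(bona, thr) / len(bona)                   # bonafide rejected
        far = (len(spoof) - bisect_left(spoof, thr)) / len(spoof)  # spoof accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Made-up scores for illustration only.
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Because the two error rates rarely cross exactly at an observed score, the sketch reports the mean of the two rates at the threshold where they are closest; interpolating implementations can give slightly different values.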


## 5. Results Analysis

### 5.1 Performance Visualization


In [None]:
# Performance visualization function
def create_performance_plots():
    """Create performance comparison plots"""
    print("üìä Creating performance plots...")
    
    # Set up the plot style
    plt.style.use('seaborn-v0_8-whitegrid')
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    fig.suptitle('Audio Deepfake Detection Performance Comparison', fontsize=16, fontweight='bold')
    
    # Sample data (replace with actual results)
    models = ['Maze1', 'Maze2', 'Maze3', 'Maze4', 'Maze5', 'Maze6', 'Maze7', 'Maze8']
    baseline_eer = [0.15, 0.12, 0.10, 0.08, 0.07, 0.06, 0.05, 0.04]  # Placeholder values
    fmsl_eer = [0.12, 0.09, 0.07, 0.05, 0.04, 0.03, 0.02, 0.01]      # Placeholder values
    improvement = [(b-f)/b*100 for b, f in zip(baseline_eer, fmsl_eer)]
    
    # Plot 1: EER Comparison
    x = np.arange(len(models))
    width = 0.35
    
    axes[0, 0].bar(x - width/2, baseline_eer, width, label='Baseline', alpha=0.8, color='skyblue')
    axes[0, 0].bar(x + width/2, fmsl_eer, width, label='FMSL-Enhanced', alpha=0.8, color='lightcoral')
    axes[0, 0].set_title('EER Comparison')
    axes[0, 0].set_ylabel('Equal Error Rate (EER)')
    axes[0, 0].set_xticks(x)
    axes[0, 0].set_xticklabels(models, rotation=45)
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    
    # Plot 2: Improvement
    axes[0, 1].bar(models, improvement, alpha=0.8, color='green')
    axes[0, 1].set_title('FMSL Improvement')
    axes[0, 1].set_ylabel('Improvement (%)')
    axes[0, 1].tick_params(axis='x', rotation=45)
    axes[0, 1].grid(True, alpha=0.3)
    
    # Plot 3: Performance Distribution
    axes[1, 0].hist(baseline_eer, alpha=0.7, label='Baseline', bins=5, color='skyblue', edgecolor='black')
    axes[1, 0].hist(fmsl_eer, alpha=0.7, label='FMSL-Enhanced', bins=5, color='lightcoral', edgecolor='black')
    axes[1, 0].set_title('Performance Distribution')
    axes[1, 0].set_xlabel('EER')
    axes[1, 0].set_ylabel('Frequency')
    axes[1, 0].legend()
    axes[1, 0].grid(True, alpha=0.3)
    
    # Plot 4: Model Complexity vs Performance
    complexity = [1, 2, 3, 4, 5, 6, 7, 8]
    axes[1, 1].scatter(complexity, baseline_eer, alpha=0.8, label='Baseline', s=100, color='skyblue', edgecolor='black')
    axes[1, 1].scatter(complexity, fmsl_eer, alpha=0.8, label='FMSL-Enhanced', s=100, color='lightcoral', edgecolor='black')
    axes[1, 1].set_title('Model Complexity vs Performance')
    axes[1, 1].set_xlabel('Model Complexity')
    axes[1, 1].set_ylabel('EER')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("✅ Performance plots created successfully!")
    print("💡 Replace placeholder data with actual evaluation results.")

# Create performance plots
create_performance_plots()
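
To swap the placeholder lists for measured values, the results can be collected in a dictionary (for example, one loaded from a JSON file) and turned into the same four lists used by the plots. The sketch assumes a hypothetical layout of `{"maze1": {"baseline_eer": ..., "fmsl_eer": ...}, ...}`; the keys and the `summarize_results` helper are illustrative, not an output format that the evaluation scripts are known to produce, and the numbers are made up.

```python
import json
from typing import Dict, List, Tuple


def summarize_results(
    results: Dict[str, Dict[str, float]]
) -> Tuple[List[str], List[float], List[float], List[float]]:
    """Return (models, baseline_eer, fmsl_eer, improvement_percent)."""
    models = sorted(results)
    baseline = [results[m]["baseline_eer"] for m in models]
    fmsl = [results[m]["fmsl_eer"] for m in models]
    # Relative EER reduction in percent; 0.0 when the baseline EER is 0
    # to avoid dividing by zero.
    improvement = [(b - f) / b * 100 if b > 0 else 0.0
                   for b, f in zip(baseline, fmsl)]
    return models, baseline, fmsl, improvement


# Made-up values in the assumed layout, parsed the way a results file would be.
demo = json.loads(
    '{"maze1": {"baseline_eer": 0.10, "fmsl_eer": 0.08},'
    ' "maze2": {"baseline_eer": 0.0, "fmsl_eer": 0.0}}'
)
models, baseline_eer, fmsl_eer, improvement = summarize_results(demo)
print(models, [round(v, 1) for v in improvement])
```

The returned lists can replace `models`, `baseline_eer`, `fmsl_eer`, and `improvement` inside `create_performance_plots()`.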
