<a href="https://colab.research.google.com/github/Ashif26/Momenta_Audio_Deepfake/blob/main/Momenta_Audio_Deepfake.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Part 1: Research & Selection

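### 1️⃣ RawNet2
- **Innovation**: Operates directly on the raw waveform instead of hand-crafted spectrogram features; its first layer is a bank of sinc-based band-pass filters (as in SincNet).
- **Performance**: Reported to achieve 90%+ accuracy on the ASVspoof dataset (ASVspoof results are more commonly reported as equal error rate, EER).
- **Why It's Promising**: Skipping a separate feature-extraction step keeps the pipeline simple, which suits real-time applications.
- **Challenges**: May need additional training data or retraining to handle new, unseen attack types.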
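RawNet2's raw-waveform front end can be illustrated with a minimal NumPy sketch of a SincNet-style band-pass filterbank. This is a simplification under stated assumptions, not the RawNet2 implementation: the band edges are fixed and linearly spaced, and the kernel size (129), filter count (8), and lowest edge (30 Hz) are hypothetical choices. Each filter is the difference of two windowed sinc low-pass filters.

```python
import numpy as np

def sinc_bandpass_kernel(f_low_hz, f_high_hz, sr=16000, kernel_size=129):
    """Windowed-sinc band-pass FIR kernel: difference of two ideal low-pass filters."""
    n = np.arange(kernel_size) - (kernel_size - 1) / 2  # sample indices centered on 0
    f1, f2 = f_low_hz / sr, f_high_hz / sr               # cutoffs in cycles per sample
    # np.sinc is the normalized sinc, sin(pi x) / (pi x)
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return h * np.hamming(kernel_size)                   # window to reduce ripple

def sinc_filterbank(waveform, sr=16000, n_filters=8, kernel_size=129, f_min=30.0):
    """Filter a 1-D raw waveform with n_filters linearly spaced band-pass filters.

    Returns an array of shape (n_filters, len(waveform)) for waveforms longer than the kernel.
    """
    edges = np.linspace(f_min, sr / 2 - 100, n_filters + 1)  # band edges in Hz (hypothetical)
    return np.stack([
        np.convolve(waveform, sinc_bandpass_kernel(lo, hi, sr, kernel_size), mode="same")
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
```

For example, a 500 Hz test tone sampled at 16 kHz ends up with most of its energy in the first band (roughly 30 to 1014 Hz). In the actual model the filterbank output feeds further convolutional and recurrent layers.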

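### 2️⃣ Wav2Vec2
- **Innovation**: Pretrained with self-supervised learning on large amounts of unlabeled speech, then fine-tuned for spoof detection.
- **Performance**: Its learned representations can capture subtle artifacts left by deepfake audio manipulation.
- **Why It's Promising**: Pretrained representations can generalize across datasets.
- **Challenges**: Requires significant compute for fine-tuning and inference.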
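The self-supervised objective behind Wav2Vec2 is a contrastive task: for each masked time step, the model must identify the true quantized target among distractors sampled from other masked steps. A minimal NumPy sketch of that loss for a single time step follows; it omits the quantizer and the diversity loss, and the temperature of 0.1 is a hypothetical default.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(context, target, distractors, temperature=0.1):
    """Contrastive loss for one masked step, in the style of wav2vec 2.0 pretraining.

    context:     transformer output at the masked step
    target:      true quantized latent for that step
    distractors: quantized latents taken from other masked steps
    """
    candidates = [target] + list(distractors)
    logits = np.array([cosine_sim(context, q) for q in candidates]) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    # Negative log-softmax probability of the true target (index 0)
    return float(-(logits[0] - np.log(np.exp(logits).sum())))
```

The loss is near zero when the context vector points at the true target and grows as a distractor becomes more similar than the target.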

### 3️⃣ ResNet-based CNN
- **Innovation**: Deep residual convolutional layers learn forgery artifacts from time-frequency inputs such as spectrograms.
- **Performance**: Reported strong results on multiple datasets.
- **Why It's Promising**: Works well with short speech samples.
- **Challenges**: High computational cost makes real-time use difficult.
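The core idea of a ResNet is the residual block, which computes `y = ReLU(F(x) + x)` so that each block only has to learn a correction to its input. A minimal single-channel NumPy sketch on a 2-D input such as a spectrogram follows; it is a simplification that omits batch normalization, multiple channels, and striding, and assumes odd-sized kernels.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive single-channel 2-D cross-correlation with zero 'same' padding (odd kernel sizes)."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def residual_block(x, k1, k2):
    """y = ReLU(F(x) + x), where F = conv -> ReLU -> conv."""
    h = np.maximum(conv2d_same(x, k1), 0.0)
    return np.maximum(conv2d_same(h, k2) + x, 0.0)  # skip connection adds the input back
```

With all-zero kernels, the block reduces to `ReLU(x)`, which is why stacking many residual blocks does not make the identity mapping hard to learn.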


In [None]:

# Download and extract the ASVspoof 2019 logical access (LA) training set
!wget https://zenodo.org/record/14498691/files/ASVspoof2019_LA_train.zip
!unzip -q ASVspoof2019_LA_train.zip
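To train a detector, each audio file needs a label. ASVspoof 2019 ships countermeasure protocol files with one whitespace-separated line per utterance; the sketch below assumes the usual five-column layout (speaker ID, file name, unused field, attack ID, key), with the key being `bonafide` or `spoof`. It maps labels to the `0: real, 1: fake` convention used in the evaluation cell. The sample lines in the test are made-up illustrations of the format, not real entries.

```python
KEY_TO_LABEL = {"bonafide": 0, "spoof": 1}  # 0: real, 1: fake

def parse_cm_protocol(lines):
    """Map utterance file name -> label from ASVspoof-style protocol lines (assumed format)."""
    labels = {}
    for line in lines:
        parts = line.split()
        # Skip blank, malformed, or unrecognized lines
        if len(parts) < 5 or parts[-1] not in KEY_TO_LABEL:
            continue
        utt_id, key = parts[1], parts[-1]
        labels[utt_id] = KEY_TO_LABEL[key]
    return labels
```

Usage: open the protocol file from the extracted archive (its exact path depends on how the archive unpacks) and pass the file object, e.g. `with open(protocol_path) as f: labels = parse_cm_protocol(f)`, where `protocol_path` is hypothetical.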


In [None]:

import librosa
import numpy as np
import librosa.display
import matplotlib.pyplot as plt

# Load an audio file
audio_path = "path_to_audio_file.wav"
y, sr = librosa.load(audio_path, sr=16000)

# Compute Mel spectrogram
mel_spec = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)

# Display spectrogram
plt.figure(figsize=(10, 4))
librosa.display.specshow(mel_spec_db, sr=sr, x_axis="time", y_axis="mel")
plt.colorbar(format="%+2.0f dB")
plt.title("Mel Spectrogram")
plt.show()
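`librosa.power_to_db(mel_spec, ref=np.max)` converts power to decibels relative to the loudest bin, so the peak maps to 0 dB, and then clips the dynamic range. The NumPy sketch below spells out that computation, mirroring librosa's default `amin=1e-10` and `top_db=80.0`; it is an illustration, not a replacement for the library call.

```python
import numpy as np

def power_to_db_sketch(S, amin=1e-10, top_db=80.0):
    """Approximate librosa.power_to_db(S, ref=np.max) for a real-valued power spectrogram."""
    S = np.asarray(S, dtype=float)
    ref = S.max()
    log_spec = 10.0 * np.log10(np.maximum(amin, S))     # floor tiny values at amin
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))  # make the peak 0 dB
    # Clip everything more than top_db below the peak
    return np.maximum(log_spec, log_spec.max() - top_db)
```

Applied to `mel_spec` from the cell above, the result is expected to match `mel_spec_db` up to floating-point error, which makes the -80 dB floor in the plotted spectrogram easier to interpret.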


In [None]:
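# Install necessary libraries (already available on Colab)
!pip install torch torchaudio

import torch
import torchaudio

# Load wav2vec 2.0 Base with pretrained weights.
# Note: torchaudio.models.wav2vec2_base() only builds the architecture with random weights.
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model()
model.eval()

# Load an audio sample, mix it down to mono, and resample to the rate the model expects (16 kHz)
waveform, sample_rate = torchaudio.load("path_to_audio_file.wav")
waveform = waveform.mean(dim=0, keepdim=True)  # shape: (1, time)
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

# forward() returns (features, lengths); features has shape (batch, frames, 768)
with torch.inference_mode():
    features, _ = model(waveform)

print(features.shape)  # Output features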
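Wav2Vec2 produces one feature vector per frame, but detection needs one decision per utterance. A common simple approach is to mean-pool the frames and apply a linear (logistic) classification head. The NumPy sketch below assumes that design; the weights and bias are hypothetical and would in practice be learned, for example by fitting scikit-learn's `LogisticRegression` on pooled embeddings of labeled real and fake clips.

```python
import numpy as np

def mean_pool(features):
    """Average frame-level features (frames, dim) into one utterance embedding (dim,)."""
    return np.asarray(features, dtype=float).mean(axis=0)

def predict_fake_probability(features, weights, bias):
    """Logistic head on the pooled embedding: probability that the clip is fake (label 1)."""
    z = float(mean_pool(features) @ weights + bias)
    return 1.0 / (1.0 + np.exp(-z))
```

Usage: pass the frame-level features for one utterance, for example the first batch item of the wav2vec 2.0 output above converted with `.numpy()`, together with trained `weights` of length 768 and a scalar `bias`.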


In [None]:
from sklearn.metrics import accuracy_score, classification_report

# Placeholder labels for illustration (0: real, 1: fake); replace with real model outputs
y_true = [0, 1, 0, 1]  # Ground-truth labels
y_pred = [0, 1, 1, 1]  # Model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Classification Report:\n", classification_report(y_true, y_pred, target_names=["real", "fake"]))
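Accuracy depends on a single decision threshold, whereas ASVspoof-style evaluation usually reports the equal error rate (EER): the operating point where the rate of real clips flagged as fake equals the rate of fake clips missed. The sketch below sweeps every observed score as a threshold, assuming higher scores mean "more likely fake" (label 1, matching the labels above). It does not interpolate between thresholds, so on small sets it is only an approximation.

```python
import numpy as np

def compute_eer(labels, scores):
    """Approximate EER by sweeping thresholds; higher score = more likely fake (label 1)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    best_gap, eer = np.inf, None
    for t in np.unique(scores):
        pred_fake = scores >= t
        fpr = np.mean(pred_fake[labels == 0])    # real clips flagged as fake
        fnr = np.mean(~pred_fake[labels == 1])   # fake clips missed
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return float(eer)
```

Unlike `y_pred`, this needs continuous scores (for example the predicted fake probabilities) rather than hard 0/1 predictions.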


In [None]:
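# Save the model weights (the wav2vec 2.0 model loaded above; no deepfake classifier has been trained yet)
torch.save(model.state_dict(), "deepfake_model.pth")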
