# The FaRi Project - Project Code by Faith Villarreal and Ricky Zapata
## Voice-To-Voice Voice Replication using a CycleGAN (Cycle Generative Adversarial Network)

A Cycle Generative Adversarial Network (CycleGAN) is a deep learning model that extends the Generative Adversarial Network (GAN) framework by pairing two GANs. A GAN is made up of a Generator and a Discriminator: the Generator creates data, and the Discriminator attempts to differentiate between real and generated data. The two models improve through their adversarial interaction, with each one trying to "outsmart" the other.
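
As a point of reference, this adversarial game is usually written as the minimax objective from the original GAN formulation. Here $x$ is a real sample, $z$ is the Generator's input, $G$ is the Generator and $D$ is the Discriminator's estimate that its input is real. The symbols are notation introduced for this note, and the training code in this project uses a least-squares variant (`nn.MSELoss`) rather than this log form.

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$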


In a CycleGAN, the goal is to transform data from one domain to another (e.g., images of cats to dogs) without needing paired examples. The model consists of two GANs: one that maps from domain X to Y (cats to dogs) and one that maps back from Y to X (dogs to cats). A cycle consistency loss ensures that after converting an image from X to Y and then back to X, the result resembles the original image. This keeps the Generator from producing arbitrary outputs that fool the Discriminator but ignore the input, so the translated result stays tied to the content of the source.
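
The round trip described above can be illustrated with a minimal sketch. The two "generators" here are toy arithmetic functions, not neural networks, and every name in the block (`l1`, `cycle_consistency_loss`, `G`, `F_good`, `F_bad`) is hypothetical and chosen only for illustration.

```python
def l1(a, b):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_consistency_loss(G, F, x, y):
    """Sum of the X -> Y -> X and Y -> X -> Y reconstruction errors."""
    forward_cycle = l1(F(G(x)), x)
    backward_cycle = l1(G(F(y)), y)
    return forward_cycle + backward_cycle

# Toy stand-ins for the two generators.
G = lambda xs: [2.0 * v + 1.0 for v in xs]         # maps X to Y
F_good = lambda ys: [(v - 1.0) / 2.0 for v in ys]  # exact inverse of G
F_bad = lambda ys: [0.0 for _ in ys]               # ignores its input

x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0]

loss_good = cycle_consistency_loss(G, F_good, x, y)
loss_bad = cycle_consistency_loss(G, F_bad, x, y)
print(loss_good, loss_bad)
```

A backward mapping that inverts the forward one gives zero loss, while a mapping that discards its input is penalized, which is the pressure that keeps the translation tied to the source.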


In this project, we aim to use a CycleGAN to train models on voice recordings of prominent figures in culture and society, so that a user can input a recording of their own voice and have it converted to mimic the voice of a figure of their choosing. Our project code is below.

## Project Code

### Importing Libraries

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import librosa
import numpy as np
import os
from torch.utils.data import DataLoader, Dataset

### Data Pre-Processing


In [None]:
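def load_audio(file_path, sr=22050):
    # Load a mono waveform resampled to `sr` Hz.
    audio, _ = librosa.load(file_path, sr=sr)
    return audio

def extract_mel_spectrogram(audio, sr=22050, n_mels=128, hop_length=512, win_length=1024, n_fft=1024, top_db=80.0):
    # Mel power spectrogram of shape (n_mels, frames); n_fft is passed explicitly so it matches the inverse below.
    mel_spec = librosa.feature.melspectrogram(
        y=audio, sr=sr, n_fft=n_fft, n_mels=n_mels, hop_length=hop_length, win_length=win_length)
    # Convert to dB relative to the peak, giving values in [-top_db, 0].
    mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max, top_db=top_db)
    # Rescale to [-1, 1] so the data matches the range of the Generator's Tanh output.
    return mel_spec_db / (top_db / 2) + 1.0

def inverse_mel_spectrogram(mel_spec, sr=22050, n_fft=1024, hop_length=512, win_length=1024, top_db=80.0):
    # Undo the [-1, 1] scaling, then convert dB back to power.
    # The absolute level is not recovered because the dB values were taken relative to the peak.
    mel_spec_db = (mel_spec - 1.0) * (top_db / 2)
    mel_power = librosa.db_to_power(mel_spec_db)
    # mel_to_audio inverts the mel filter bank and estimates phase with Griffin-Lim.
    return librosa.feature.inverse.mel_to_audio(
        mel_power, sr=sr, n_fft=n_fft, hop_length=hop_length, win_length=win_length)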
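
To get a feel for the shapes these functions produce, the sketch below estimates how many frames a clip yields. It assumes librosa's default centered framing (`center=True`), for which the frame count is `1 + n_samples // hop_length`; the helper name `mel_frames` is hypothetical.

```python
def mel_frames(n_samples, hop_length=512):
    """Number of frames from a centered STFT with the given hop length."""
    return 1 + n_samples // hop_length

sr = 22050      # sample rate used throughout this project
n_mels = 128    # mel bins used throughout this project

for seconds in (1, 3):
    n_samples = seconds * sr
    # Expected spectrogram shape is (n_mels, frames).
    print(f"{seconds} s -> ({n_mels}, {mel_frames(n_samples)})")
```

With these settings each second of audio contributes roughly 43 frames, which is useful when choosing how long the training segments should be.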

### CycleGAN (Cycle Generative Adversarial Network) Architecture

In [None]:
class ResBlock(nn.Module):
    def __init__(self, channels):
        super(ResBlock, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm1d(channels),
            nn.ReLU(True),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm1d(channels)
        )
    
    def forward(self, x):
        return x + self.conv(x)

class Generator(nn.Module):
    def __init__(self, input_dim=128, num_residual_blocks=6):
        super(Generator, self).__init__()
        
        model = [
            nn.Conv1d(input_dim, 64, kernel_size=7, padding=3),
            nn.InstanceNorm1d(64),
            nn.ReLU(True)
        ]
        
        in_channels = 64
        out_channels = 128
        for _ in range(2):
            model += [
                nn.Conv1d(in_channels, out_channels, kernel_size=4, stride=2, padding=1),
                nn.InstanceNorm1d(out_channels),
                nn.ReLU(True)
            ]
            in_channels = out_channels
            out_channels *= 2
        
        for _ in range(num_residual_blocks):
            model += [ResBlock(in_channels)]
        
        out_channels = in_channels // 2
        for _ in range(2):
            model += [
                nn.ConvTranspose1d(in_channels, out_channels, kernel_size=4, stride=2, padding=1),
                nn.InstanceNorm1d(out_channels),
                nn.ReLU(True)
            ]
            in_channels = out_channels
            out_channels //= 2
        
        model += [nn.Conv1d(in_channels, input_dim, kernel_size=7, padding=3), nn.Tanh()]
        
        self.model = nn.Sequential(*model)
    
    def forward(self, x):
        return self.model(x)

class Discriminator(nn.Module):
    def __init__(self, input_dim=128):
        super(Discriminator, self).__init__()
        model = [
            nn.Conv1d(input_dim, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True)
        ]
        
        model += [
            nn.Conv1d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm1d(128),
            nn.LeakyReLU(0.2, inplace=True)
        ]
        
        model += [
            nn.Conv1d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm1d(256),
            nn.LeakyReLU(0.2, inplace=True)
        ]
        
        model += [
            nn.Conv1d(256, 512, kernel_size=4, stride=1, padding=1),
            nn.InstanceNorm1d(512),
            nn.LeakyReLU(0.2, inplace=True)
        ]
        
        model += [nn.Conv1d(512, 1, kernel_size=4, stride=1, padding=1)]
        
        self.model = nn.Sequential(*model)
    
    def forward(self, x):
        return self.model(x)
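
Because the Generator downsamples twice with stride 2 and then upsamples twice, its output only has the same number of frames as its input when that number is a multiple of 4. The sketch below checks this with the standard output-length formulas for `nn.Conv1d` and `nn.ConvTranspose1d` (dilation 1, no output padding). The helper names are hypothetical, and the layer settings are copied from the classes above.

```python
def conv1d_len(length, kernel_size, stride=1, padding=0):
    """Output length of a 1D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1

def conv_transpose1d_len(length, kernel_size, stride=1, padding=0):
    """Output length of a 1D transposed convolution (no output padding)."""
    return (length - 1) * stride - 2 * padding + kernel_size

def generator_len(length):
    length = conv1d_len(length, 7, 1, 3)                # stem keeps the length
    for _ in range(2):
        length = conv1d_len(length, 4, 2, 1)            # two downsampling stages
    # The residual blocks (kernel 3, padding 1) keep the length unchanged.
    for _ in range(2):
        length = conv_transpose1d_len(length, 4, 2, 1)  # two upsampling stages
    return conv1d_len(length, 7, 1, 3)                  # output layer keeps the length

def discriminator_len(length):
    for _ in range(3):
        length = conv1d_len(length, 4, 2, 1)            # three stride-2 layers
    for _ in range(2):
        length = conv1d_len(length, 4, 1, 1)            # two stride-1 layers
    return length

print(generator_len(128), generator_len(130), discriminator_len(128))
```

An input of 130 frames comes back as 128, so the L1 cycle and identity losses would see mismatched shapes unless training segments are cropped or padded to a multiple of 4. The Discriminator returns one score per position along the reduced time axis rather than a single number.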

### Training our CycleGAN

In [None]:
def train_cycleGAN(data_loader, num_epochs=100):
    # Initialize models
    generator_A2B = Generator()
    generator_B2A = Generator()
    discriminator_A = Discriminator()
    discriminator_B = Discriminator()

    # Loss functions
    adversarial_loss = nn.MSELoss()
    cycle_loss = nn.L1Loss()

    # Optimizers
    optimizer_G = optim.Adam(list(generator_A2B.parameters()) + list(generator_B2A.parameters()), lr=0.0002, betas=(0.5, 0.999))
    optimizer_D_A = optim.Adam(discriminator_A.parameters(), lr=0.0002, betas=(0.5, 0.999))
    optimizer_D_B = optim.Adam(discriminator_B.parameters(), lr=0.0002, betas=(0.5, 0.999))

    for epoch in range(num_epochs):
        for i, (real_A, real_B) in enumerate(data_loader):
            
            # Train Generators A2B and B2A
            optimizer_G.zero_grad()
            identity_A = generator_B2A(real_A)
            identity_B = generator_A2B(real_B)
            loss_identity_A = cycle_loss(identity_A, real_A)
            loss_identity_B = cycle_loss(identity_B, real_B)
            
            fake_B = generator_A2B(real_A)
            fake_A = generator_B2A(real_B)
            pred_fake_B = discriminator_B(fake_B)
            pred_fake_A = discriminator_A(fake_A)
            loss_GAN_A2B = adversarial_loss(pred_fake_B, torch.ones_like(pred_fake_B))
            loss_GAN_B2A = adversarial_loss(pred_fake_A, torch.ones_like(pred_fake_A))
            
            recovered_A = generator_B2A(fake_B)
            recovered_B = generator_A2B(fake_A)
            loss_cycle_A = cycle_loss(recovered_A, real_A)
            loss_cycle_B = cycle_loss(recovered_B, real_B)
            
            loss_G = loss_identity_A + loss_identity_B + loss_GAN_A2B + loss_GAN_B2A + loss_cycle_A + loss_cycle_B
            loss_G.backward()
            optimizer_G.step()
            
            # Train Discriminator A
            optimizer_D_A.zero_grad()
            pred_real_A = discriminator_A(real_A)
            pred_fake_A = discriminator_A(fake_A.detach())
            loss_D_A_real = adversarial_loss(pred_real_A, torch.ones_like(pred_real_A))
            loss_D_A_fake = adversarial_loss(pred_fake_A, torch.zeros_like(pred_fake_A))
            loss_D_A = (loss_D_A_real + loss_D_A_fake) * 0.5
            loss_D_A.backward()
            optimizer_D_A.step()
            
            # Train Discriminator B
            optimizer_D_B.zero_grad()
            pred_real_B = discriminator_B(real_B)
            pred_fake_B = discriminator_B(fake_B.detach())
            loss_D_B_real = adversarial_loss(pred_real_B, torch.ones_like(pred_real_B))
            loss_D_B_fake = adversarial_loss(pred_fake_B, torch.zeros_like(pred_fake_B))
            loss_D_B = (loss_D_B_real + loss_D_B_fake) * 0.5
            loss_D_B.backward()
            optimizer_D_B.step()
            
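        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss G: {loss_G.item()}, Loss D_A: {loss_D_A.item()}, Loss D_B: {loss_D_B.item()}')

    # Save the trained generator weights so they can be reloaded for conversion.
    torch.save(generator_A2B.state_dict(), 'generator_A2B.pth')
    torch.save(generator_B2A.state_dict(), 'generator_B2A.pth')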
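
The training loop above adds the adversarial, cycle, and identity terms with equal weight. CycleGAN setups commonly weight the cycle term more heavily. The sketch below shows how such a weighting changes the total, using a cycle weight of 10 and an identity weight of 5 as assumed defaults rather than values taken from this project. The function name `generator_objective` and the loss values are hypothetical.

```python
def generator_objective(gan_terms, cycle_terms, identity_terms,
                        lambda_cycle=10.0, lambda_identity=5.0):
    """Weighted sum of the three groups of generator loss terms."""
    return (sum(gan_terms)
            + lambda_cycle * sum(cycle_terms)
            + lambda_identity * sum(identity_terms))

# Made-up loss values for one training step, for illustration only.
gan = [0.5, 0.5]        # loss_GAN_A2B, loss_GAN_B2A
cycle = [0.2, 0.2]      # loss_cycle_A, loss_cycle_B
identity = [0.1, 0.1]   # loss_identity_A, loss_identity_B

equal = generator_objective(gan, cycle, identity, lambda_cycle=1.0, lambda_identity=1.0)
weighted = generator_objective(gan, cycle, identity)
print(equal, weighted)
```

With the heavier weights the reconstruction terms dominate the total, which pushes the generators toward preserving the content of the input. The same function works on PyTorch loss tensors, since it only uses addition and multiplication.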

### Griffin-Lim Vocoder

In [None]:
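import soundfile as sf  # librosa.output.write_wav was removed in librosa 0.8

def griffin_lim(mel_spec, sr=22050, n_fft=1024, hop_length=512, win_length=1024, iterations=60, top_db=80.0):
    # Map the Generator's Tanh output from [-1, 1] to dB in [-top_db, 0], then to power.
    mel_power = librosa.db_to_power((mel_spec - 1.0) * (top_db / 2))
    # librosa.griffinlim expects a linear-frequency magnitude spectrogram,
    # so the mel filter bank is inverted first.
    stft_mag = librosa.feature.inverse.mel_to_stft(mel_power, sr=sr, n_fft=n_fft)
    return librosa.griffinlim(stft_mag, n_iter=iterations, hop_length=hop_length, win_length=win_length)

# Example of converting generated Mel-spectrogram to waveform
def convert_to_audio(fake_mel_spec, sr=22050):
    # Drop the batch dimension so the array has shape (n_mels, frames).
    mel = fake_mel_spec.detach().cpu().numpy().squeeze()
    output_audio = griffin_lim(mel, sr=sr)
    sf.write('converted_voice.wav', output_audio, sr)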
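
The Generator ends in `Tanh`, so its output lies in [-1, 1], while `librosa.power_to_db` with `ref=np.max` and its default `top_db=80` produces values in [-80, 0] dB. One way to reconcile the two, assumed here rather than prescribed, is a linear rescaling between the ranges before converting dB back to power with `10 ** (dB / 10)`. The helper names below are hypothetical.

```python
def db_to_unit(db, top_db=80.0):
    """Map dB values in [-top_db, 0] to [-1, 1]."""
    return db / (top_db / 2.0) + 1.0

def unit_to_db(value, top_db=80.0):
    """Map values in [-1, 1] back to dB in [-top_db, 0]."""
    return (value - 1.0) * (top_db / 2.0)

def db_to_power(db):
    """Convert decibels to power relative to a reference of 1.0."""
    return 10.0 ** (db / 10.0)

for unit_value in (-1.0, 0.0, 1.0):
    db = unit_to_db(unit_value)
    print(unit_value, "->", db, "dB ->", db_to_power(db))
```

The two mappings are inverses of each other, so a spectrogram scaled for training can be returned to dB before the vocoder step.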

### Data Loader

In [None]:
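class VoiceDataset(Dataset):
    def __init__(self, audio_paths_A, audio_paths_B, sr=22050, n_frames=128):
        self.audio_paths_A = audio_paths_A  # List of file paths for user's voice
        self.audio_paths_B = audio_paths_B  # List of file paths for celebrity's voice
        self.sr = sr
        # Fixed segment length in frames; a multiple of 4 so the Generator's two
        # stride-2 stages reproduce the input length.
        self.n_frames = n_frames

    def __len__(self):
        return min(len(self.audio_paths_A), len(self.audio_paths_B))

    def _fixed_length(self, mel):
        # Pad short clips with their minimum value, then take a random crop of n_frames.
        if mel.shape[1] < self.n_frames:
            mel = np.pad(mel, ((0, 0), (0, self.n_frames - mel.shape[1])), constant_values=mel.min())
        start = np.random.randint(0, mel.shape[1] - self.n_frames + 1)
        return mel[:, start:start + self.n_frames]

    def __getitem__(self, idx):
        audio_A = load_audio(self.audio_paths_A[idx], sr=self.sr)
        audio_B = load_audio(self.audio_paths_B[idx], sr=self.sr)

        mel_A = self._fixed_length(extract_mel_spectrogram(audio_A, sr=self.sr))
        mel_B = self._fixed_length(extract_mel_spectrogram(audio_B, sr=self.sr))

        # Conv1d expects (channels, length) per sample; the 128 mel bins act as the
        # channels, so no extra dimension is added.
        mel_A = torch.tensor(mel_A, dtype=torch.float32)
        mel_B = torch.tensor(mel_B, dtype=torch.float32)

        return mel_A, mel_B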

### Main

In [None]:
def main():
    # Paths to datasets
    user_voice_dir = 'data/user_voice/'
    celebrity_voice_dir = 'data/celebrity_voice/'
    
    audio_paths_A = [os.path.join(user_voice_dir, f) for f in os.listdir(user_voice_dir)]
    audio_paths_B = [os.path.join(celebrity_voice_dir, f) for f in os.listdir(celebrity_voice_dir)]
    
    # Create dataset and data loader
    dataset = VoiceDataset(audio_paths_A, audio_paths_B)
    data_loader = DataLoader(dataset, batch_size=1, shuffle=True)

    # Train the CycleGAN model
    print("Training CycleGAN model...")
    train_cycleGAN(data_loader, num_epochs=100)

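    # Example: converting a single sample after training
    print("Converting example voice...")
    audio = load_audio('data/user_voice/example.wav')
    mel_spectrogram = extract_mel_spectrogram(audio)
    # Add only a batch dimension: Conv1d expects (batch, n_mels, frames).
    mel_spectrogram_tensor = torch.tensor(mel_spectrogram, dtype=torch.float32).unsqueeze(0)

    # Load trained model and perform conversion. A freshly constructed Generator has
    # random weights, so trained weights are loaded when a checkpoint is available.
    generator_A2B = Generator()
    checkpoint_path = 'generator_A2B.pth'  # assumed file name for the saved A2B weights
    if os.path.exists(checkpoint_path):
        generator_A2B.load_state_dict(torch.load(checkpoint_path))
    generator_A2B.eval()  # Set the generator to evaluation mode
    with torch.no_grad():  # No gradients are needed for conversion
        fake_mel_spectrogram = generator_A2B(mel_spectrogram_tensor)

    # Convert the Mel-spectrogram back to audio
    convert_to_audio(fake_mel_spectrogram)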

    print("Converted voice saved to 'converted_voice.wav'.")

if __name__ == '__main__':
    main()

# Credits and Thanks

While searching for resources to learn more about deep learning, adversarial networks, and generative adversarial networks for this project, the following videos were very helpful. They are cataloged below with appreciation:

[Deep Learning Crash Course for Beginners](https://www.youtube.com/watch?v=VyWAvY2CF9c&t=3387s&ab_channel=freeCodeCamp.org) by freeCodeCamp.org

[Generative Adversarial Networks (GANs) - Computerphile](https://www.youtube.com/watch?v=Sw9r8CL98N0&ab_channel=Computerphile) by Computerphile

[Zebras, Horses & CycleGAN - Computerphile](https://www.youtube.com/watch?v=T-lBMrjZ3_0&ab_channel=Computerphile) by Computerphile