# Demo of our project

First, download our sample dataset from here:

Then unzip the archive and move the resulting folder into this folder, alongside the other project files.
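If you would rather do the unzip step from Python, a minimal sketch using only the standard library could look like the one below. The helper name `extract_dataset` and the archive name in the usage note are assumptions for illustration; the real archive name depends on the download.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, project_dir):
    """Extract a zip archive into project_dir and return the extracted top-level paths.

    Hypothetical helper for this demo; it is not part of the project code.
    """
    archive_path = Path(archive_path)
    project_dir = Path(project_dir)
    project_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(project_dir)
        # Collect the top-level names so the caller can check where the data landed.
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return [project_dir / name for name in top_level]
```

For example, `extract_dataset("sample_ears_wham.zip", ".")` (archive name assumed) run from the project folder would place the extracted folder next to the other files, which is where `data_path` in the next cell expects it.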

Second, you can load our models and run them on the demo dataset to get a sense of their performance. You have some options when loading and running the models:

**NOTE**: To actually hear the audio files, you have to run this notebook locally.

In [2]:
from load_models import get_model_predictions_and_data
from wav_generator import save_to_wav
import torch
import os
from IPython.display import Audio, HTML, display

mock = False # Set to True to use a random datapoint for testing the code
save_memory = True # Set to True to save memory (recommended if you have less than 16 GB of RAM available)
datapoints = 1 # Number of datapoints to run. We demo just a single one here.
deterministic = True # True makes the code deterministic. Set to False to hear different results.
data_path = "/Users/{__CHANGE_TO_YOUR_PATH__}/DL-CausalSpeechProject/sample_ears_wham" # Change to the full dataset, if you have it

results = get_model_predictions_and_data(
    mock = mock, 
    save_memory = save_memory, 
    datapoints = datapoints, 
    deterministic = deterministic, 
    data_path = data_path
)

## The results here are normalized, so that we (as humans) can make sense of them. Let's convert them to WAV and listen to them:

def save_and_display_audio(audio_data, filename, tmp_dir, title = None):
    save_to_wav(audio_data, output_filename=f"{tmp_dir}/{filename}")
    if title:
        display(HTML(f"<h3>{title}</h3>"))
    display(Audio(filename=f"{tmp_dir}/{filename}"))
    os.remove(f"{tmp_dir}/{filename}")

for (predictions, inputs, outputs) in results:
    for i, (prediction, model_load_string) in enumerate(predictions):
        model_name = model_load_string.split('/')[-1].replace('.pth', '')
        save_and_display_audio(prediction[0:1, :].cpu().detach().numpy(), 
                             f"prediction_{i}d_{model_load_string.split('/')[-1]}.wav", 
                             "tmp",
                             f"Model Output: {model_name}")
    inputs = inputs / (torch.max(torch.abs(inputs)) + 1e-9)
    outputs = outputs / (torch.max(torch.abs(outputs)) + 1e-9)
    save_and_display_audio(inputs[0:1, :].cpu().detach().numpy(), 
                          "prediction_input.wav", 
                          "tmp",
                          "Input Audio (Mixed)")
    save_and_display_audio(outputs[0:1, :].cpu().detach().numpy(), 
                          "prediction_output.wav", 
                          "tmp", 
                          "Ground Truth (Clean)")

Loading models...: 100%|██████████| 5/5 [00:00<00:00, 10.21it/s]


Loading dataset...
Dataset loaded


Getting model predictions for 1 datapoints: 100%|██████████| 1/1 [00:02<00:00,  2.53s/it]

WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/prediction_0d_student_only_labels_dropout.pth.wav





WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/prediction_1d_student_only_labels.pth.wav


WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/prediction_2d_student_only_teacher.pth.wav


WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/prediction_3d_student_partly_teacher.pth.wav


WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/prediction_4d_student_only_teacher_e2e.pth.wav


WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/prediction_input.wav


WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/prediction_output.wav
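The cell above peak-normalizes the input and ground-truth signals before handing them to `save_to_wav`. The implementation of `save_to_wav` is not shown in this notebook, so the following is only a rough standard-library sketch of the same two steps. The function names, the 16 kHz sample rate and the 16-bit mono format are assumptions, not the project's actual settings.

```python
import struct
import wave


def peak_normalize(samples, eps=1e-9):
    """Scale samples so the largest absolute value is just below 1."""
    peak = max(abs(s) for s in samples)
    # eps avoids a division by zero for an all-silent signal.
    return [s / (peak + eps) for s in samples]


def write_wav(samples, filename, sample_rate=16000):
    """Write float samples in [-1, 1] as 16-bit mono PCM (assumed format)."""
    # Clip before converting so out-of-range values cannot overflow int16.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(filename, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(ints)}h", *ints))
```

Normalizing first means that quiet signals become audible and loud ones do not clip, which is why the input and ground truth are rescaled before playback.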


### Overfit run:

See the readme for how to run the training files properly. Here we will just demo the overfit run.

Ideally, all changes should be made in the config file; for this demo we override a few values directly in code instead.

Keep an eye out for lines like: **Logged train_loss/teacher with value tensor([[8.6946]])**.

These show how the teacher's training loss evolves over time.
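If you capture the training output as text, the logged losses can be pulled out with a small regular expression. This is a sketch that assumes the exact line format printed in this notebook (`Logged <name> with value tensor([[<number>]]...`) and plain decimal values; `collect_losses` is a hypothetical helper, not part of the project.

```python
import re

# Matches lines such as:
# Logged train_loss/teacher with value tensor([[8.6946]], grad_fn=<...>) at step None
LOSS_PATTERN = re.compile(r"Logged (\S+) with value tensor\(\[\[(-?\d+(?:\.\d+)?)\]\]")


def collect_losses(log_text):
    """Return a dict mapping each logged tensor metric to its values, in log order."""
    losses = {}
    for name, value in LOSS_PATTERN.findall(log_text):
        losses.setdefault(name, []).append(float(value))
    return losses
```

Calling `collect_losses(log_text)["train_loss/teacher"]` then gives one value per epoch, which makes it easy to check that the loss is actually going down. Non-tensor entries such as `time` and `learning_rate` are skipped by the pattern.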

In [3]:
from load_config import load_config
from train import train

config_path = "/Users/{__CHANGE_TO_YOUR_PATH__}/DL-CausalSpeechProject/configs/overfit_run.yaml"

config = load_config(config_path)

# We make a few changes to speed up the training for this demo.
# Training is faster on a GPU, and often also on CPU when run from the terminal; here we limit the run to 10 epochs.
config.training_params.epochs = 10
config.debug.save_memory = True

train(config)

Using device: cpu


Loading models...: 100%|██████████| 2/2 [00:00<00:00, 24.04it/s]


[neptune] [info   ] Neptune initialized. Open in the app: offline/8544427a-2986-49f7-8516-8b89a26c5f7b


Loading models...: 100%|██████████| 2/2 [00:00<00:00, 25.87it/s]
W1222 13:48:26.359880 68099 torch/_logging/_internal.py:1081] [5/0] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored


Logged train_loss/teacher with value tensor([[45.1312]], grad_fn=<CompiledFunctionBackward>) at step None




Logged eval_loss/teacher with value tensor([[7.7582]]) at step None
Logged eval_loss/student with value tensor([[48.0731]]) at step None
Logged time with value 72.42414689064026 at step None
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/teacher_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/student_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/labels.wav
Logged learning_rate with value 0.001 at step None


 10%|█         | 1/10 [01:24<12:42, 84.78s/it]

Logged train_loss/teacher with value tensor([[8.6946]], grad_fn=<CompiledFunctionBackward>) at step None
Logged eval_loss/teacher with value tensor([[1.9837]]) at step None
Logged eval_loss/student with value tensor([[48.0731]]) at step None
Logged time with value 22.411501169204712 at step None
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/teacher_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/student_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/labels.wav
Logged learning_rate with value 0.001 at step None


 20%|██        | 2/10 [01:56<07:09, 53.66s/it]

Logged train_loss/teacher with value tensor([[3.6358]], grad_fn=<CompiledFunctionBackward>) at step None
Logged eval_loss/teacher with value tensor([[-2.2187]]) at step None
Logged eval_loss/student with value tensor([[48.0731]]) at step None
Logged time with value 19.355607986450195 at step None
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/teacher_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/student_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/labels.wav
Logged learning_rate with value 0.001 at step None


 30%|███       | 3/10 [02:24<04:52, 41.80s/it]

Logged train_loss/teacher with value tensor([[0.4666]], grad_fn=<CompiledFunctionBackward>) at step None
Logged eval_loss/teacher with value tensor([[-0.9619]]) at step None
Logged eval_loss/student with value tensor([[48.0731]]) at step None
Logged time with value 17.674758911132812 at step None
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/teacher_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/student_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/labels.wav
Logged learning_rate with value 0.001 at step None


 40%|████      | 4/10 [02:51<03:35, 35.86s/it]

Logged train_loss/teacher with value tensor([[1.1379]], grad_fn=<CompiledFunctionBackward>) at step None
Logged eval_loss/teacher with value tensor([[-1.9448]]) at step None
Logged eval_loss/student with value tensor([[48.0731]]) at step None
Logged time with value 20.213778018951416 at step None
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/teacher_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/student_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/labels.wav
Logged learning_rate with value 0.001 at step None


 50%|█████     | 5/10 [03:20<02:48, 33.66s/it]

Logged train_loss/teacher with value tensor([[0.3740]], grad_fn=<CompiledFunctionBackward>) at step None
Logged eval_loss/teacher with value tensor([[-3.4712]]) at step None
Logged eval_loss/student with value tensor([[48.0731]]) at step None
Logged time with value 16.423042058944702 at step None
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/teacher_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/student_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/labels.wav
Logged learning_rate with value 0.001 at step None


 60%|██████    | 6/10 [03:46<02:03, 30.99s/it]

Logged train_loss/teacher with value tensor([[-0.6649]], grad_fn=<CompiledFunctionBackward>) at step None
Logged eval_loss/teacher with value tensor([[-3.6363]]) at step None
Logged eval_loss/student with value tensor([[48.0731]]) at step None
Logged time with value 18.210422039031982 at step None
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/teacher_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/student_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/labels.wav
Logged learning_rate with value 0.001 at step None


 70%|███████   | 7/10 [04:12<01:28, 29.46s/it]

Logged train_loss/teacher with value tensor([[-0.8135]], grad_fn=<CompiledFunctionBackward>) at step None
Logged eval_loss/teacher with value tensor([[-3.2157]]) at step None
Logged eval_loss/student with value tensor([[48.0731]]) at step None
Logged time with value 19.249557971954346 at step None
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/teacher_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/student_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/labels.wav
Logged learning_rate with value 0.001 at step None


 80%|████████  | 8/10 [04:41<00:58, 29.16s/it]

Logged train_loss/teacher with value tensor([[-0.5689]], grad_fn=<CompiledFunctionBackward>) at step None
Logged eval_loss/teacher with value tensor([[-3.8268]]) at step None
Logged eval_loss/student with value tensor([[48.0731]]) at step None
Logged time with value 26.75652813911438 at step None
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/teacher_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/student_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/labels.wav
Logged learning_rate with value 0.001 at step None


 90%|█████████ | 9/10 [05:19<00:32, 32.04s/it]

Logged train_loss/teacher with value tensor([[-1.0085]], grad_fn=<CompiledFunctionBackward>) at step None
Logged eval_loss/teacher with value tensor([[-4.9132]]) at step None
Logged eval_loss/student with value tensor([[48.0731]]) at step None
Logged time with value 30.717599868774414 at step None
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/teacher_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/student_out.wav
WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/labels.wav
Logged learning_rate with value 0.001 at step None


100%|██████████| 10/10 [05:59<00:00, 36.00s/it]


The training is now done. Throughout the training, the latest models are saved. Let's listen to the final output:

In [6]:
from load_models import load_models

models = load_models( # We load the teacher model, as this is the one we trained
    models_load_strings = [
        "/Users/{__CHANGE_TO_YOUR_PATH__}/DL-CausalSpeechProject/tmp/teacher.pth"
    ]
)

results = get_model_predictions_and_data(
    models=models,
    mock = False, 
    save_memory = True, 
    datapoints = 1, 
    deterministic = True, 
    data_path = data_path
)

for (predictions, inputs, outputs) in results:
    for i, (prediction, model_load_string) in enumerate(predictions):
        model_name = model_load_string.split('/')[-1].replace('.pth', '')
        save_and_display_audio(prediction[0:1, :].cpu().detach().numpy(), f"prediction_{i}d_{model_load_string.split('/')[-1]}.wav", "tmp", f"Model Output: {model_name}")
    save_and_display_audio(inputs[0:1, :].cpu().detach().numpy(), "prediction_input.wav", "tmp", "Input Audio (Mixed)")
    save_and_display_audio(outputs[0:1, :].cpu().detach().numpy(), "prediction_output.wav", "tmp", "Ground Truth (Clean)")


Loading models...: 100%|██████████| 1/1 [00:00<00:00,  3.90it/s]


Loading dataset...
Dataset loaded


Getting model predictions for 1 datapoints: 100%|██████████| 1/1 [00:00<00:00,  1.90it/s]

WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/prediction_0d_teacher.pth.wav





WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/prediction_input.wav


WAV file saved to: /Users/lucasvilsen/Documents/Documents/DL-CausalSpeechProject/tmp/prediction_output.wav


The output is not great, but it is an okay start, given that we only trained for 10 epochs.
To get reliable results, you should train for at least 200 epochs.

That's it! Everything else is just some other ways to train the models.