<a href="https://colab.research.google.com/github/Bateyjosue/AI-Saturday/blob/main/hw1_starter_code.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Style transfer using a Linear Transformation

Here we have three pieces of audio. The first two are `Synth.wav` (audio A) and `Piano.wav` (audio B), which are recordings of a chromatic scale in a single octave played by a synthesizer and a piano respectively. The third piece of audio is the intro melody of “Blinding Lights” (audio C) by The Weeknd, played with the same synth tone used to generate `Synth.wav`.

All audio files are in the `hw1/audio` folder. 

From these files, you can obtain the spectrogram $M_{A}$ , $M_{B}$ and $M_{C}$ . Your objective is to find the spectrogram of the piano version of the song “Blinding Lights” ($M_{D}$).

In this problem, we assume that style can be transferred using a linear transformation. Formally, we need
to find the matrix **T** such that:
$$
TM_{A} ≈ M_{B}
$$

1. Write code to determine matrix **T** and report the value of $∥TM_{A} − M_{B} ∥^{2}_{F}$ .
**Submit the matrix T as $problem3t.csv$ and your code**
2. Our model assumes that **T** can transfer style from synthesizer music to piano music. Applying **T** on $M_{C}$ should give us a estimation of “Blinding Lights” played by Piano, getting an estimation of $M_{D}$. Using this matrix and phase matrix of C, synthesize an audio signal.
**Submit your code, your estimation of the matrix $M_{D}$ as $problem3md.csv$ and the sythensized audio named as $problem3.wav$**

---

In [2]:
# mounts your google drive to help you access files stored on your drive directly like you would on you local machine
from google.colab import drive

drive.mount("/content/gdrive")
%cd /content/gdrive/MyDrive/AI Saturnday/hw1

Mounted at /content/gdrive
/content/gdrive/MyDrive/AI Saturnday/hw1


## Solution

You need to compute the spectrogram of the music file. First, read and load the audio file at its correct sample rate (44100 in our case).

If you are using Python, you can use [Librosa](https://librosa.org/doc/latest/index.html) to load the wav file as follows (we also recommend using the numpy package for matrix operations below if you use python):

```
import librosa
audio, sr = librosa.load(filename, sr = None) # sr = None means the audio will be loaded with its original sr - sample rate
assert sr = 44100 # we want to make sure the computed sr from the music file is what we expect - 44100 Hz
```

Next, we can compute the complex Short-Time Fourier Transform (STFT) of the signal and its magnitude spectrogram. Use `2048` sample windows, which correspond to 64 ms analysis windows; overlap/hop length of `256` samples to 64 frames by second of signal. Different toolboxes should provide similar spectrograms. If
you are using the Python Librosa library, you can use the following command:

```
spectrogram = librosa.stft(audio, n fft=2048, hop length=256, center=False, win length=2048)
M = abs(spectrogram)
phase = spectrogram/(M + 2.2204e-16)
```

In this case, **M** represents the music file and should be a matrix, where the rows correspond to the frequencies and the columns to time. A visualization (see the documentation online for librosa.display.specshow) of this matrix (spectrogram) should look like in Figure 1.

In [3]:
import numpy as np
import librosa

In [35]:
# Audio A Spectogram
audioA, sr = librosa.load('./audio/Synth.wav', sr=44100)

spectoA = librosa.stft(audioA, n_fft=2048, hop_length=256, center=False, win_length=2048)
MA =abs(spectoA)
phase_A = spectoA / (MA + 2.2204e-16)

# Audio B Spectogram
audioB, sr = librosa.load('./audio/Piano.wav', sr=44100)

spectoB = librosa.stft(audioB, n_fft=2048, hop_length=256, center=False, win_length=2048)
MB =abs(spectoB)
phase_B = spectoB / (MB + 2.2204e-16)

audioC, sr = librosa.load('./audio/BlindingLights.wav')

spectoC = librosa.stft(audioC, n_fft=2048, hop_length=265, center=False, win_length=2048)
MC = abs(spectoC)
phase_C = spectoC / (MC + 2.2204e-16)

In [8]:
print(spectoC.shape)
print(MC.shape)
print(phase_C.shape)

(1025, 815)
(1025, 815)
(1025, 815)


### Computing $M_{A}$, $M_{B}$, $M_{C}$

In [None]:
# your code here

# TODO: Compute MA and phase_A
MA = np.zeros((2,2)) # change this
phase_A = np.zeros((2,2)) # change this

# TODO: Compute MB and phase_B
MB = np.zeros((2,2)) # change this
phase_B = np.zeros((2,2)) # change this

# TODO: Compute MC and phase_C
MC = np.zeros((2,2)) # change this
phase_C = np.zeros((2,2)) # change this

### Learning T

We were given that:

$$
TM_{A} ≈ M_{B}
$$

From basic math, we know that:

$$
T = \frac{M_{B}}{M_{A}} = M_{B} * \frac{1}{M_{A}}
$$

$\frac{1}{M_{A}}$ is known as the inverse of A, which can also be written as $M_{A}^{-1}.$ So our final formula will be:
$$
T = M_{B} * pinv(M_{A})
$$

where $pinv$ meanse the [pseudo-inverse](https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html) of a matrix. We use the pseudo-inverse ($pinv$) because **$M_{A}$ is not a square matrix**

**Step 1**: Compute T

Hint: use [matrix multiplication](https://numpy.org/doc/stable/reference/generated/numpy.matmul.html) not element-wise multiplcation for your multipliction operation to avoid broadcast error. Try and unsrestand why.

In [9]:
!ls


audio  hw1writeup.pdf


In [36]:
# your code here

# TODO: compute T
T = np.matmul(MB,(np.linalg.pinv(MA)))
np.savetxt('problem3t.csv', T)

# TODO: save your matrix T in a "problem3t.csv" file

### Computing error

We also would like to know how well the **T** we got represents a good style transfer from synthesizer music to piano music. We can do ths by computing the error. In this case, we would use the [frobenius norm](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html).

**Step 2:** Compute error $∥TM_{A} − M_{B} ∥^{2}_{F}$

Hint: use [matrix multiplication](https://numpy.org/doc/stable/reference/generated/numpy.matmul.html) not element-wise multiplcation for your multipliction operation to avoid broadcast error. Try and understand why.

In [37]:
# your code here

# TODO: compute the error (frobenius norm)
error = np.linalg.norm(T, ord='fro')

error

294221.03

### Finding $M_{D}$

Our model assumes that **T** can transfer style from synthesizer music to piano music. Applying **T** on $M_{C}$ should give us an estimation of “Blinding Lights” played by Piano, getting an estimation of $M_{D}$.

$$
M_{D} = T * M_{C}
$$

**Step 3:** Compute $M_{D}$

Hint: use [matrix multiplication](https://numpy.org/doc/stable/reference/generated/numpy.matmul.html) not element-wise multiplcation for your multipliction operation to avoid broadcast error. Try and understand why.

In [38]:
# your code here

# TODO: compute MD
MD = np.zeros((2,2)) # change this
MD = np.matmul(T, MC)
np.savetxt('problem3md.csv', MD)

print(MD.shape)
# TODO: save your matrix MD in a "problem3md.csv" file

(1025, 815)


### Recover $M_{D}$ signal

To recover the signal from the constructed spectrogram $M_{D}$ we need to use the `phase` matrix we computed earlier from the original signal. Combine both and compute the Inverse-STFT to obtain a vector and then write them into a wav file. To compute the STFT and then write the wav file you can use the following python command:

```
# Latest Librosa doesn't have an audio write function. Use PySoundFile instead.
import soundfile as sf
signal = librosa.istft(spectrogram * phase, hop length=256, center=False, win length=2048)
sf.write("problem3.wav", spectrogram, 44100) # here we use the original sr which is 44100 Hz
```

**Step 4:** Using the matrix, $M_{D}$ and **phase matrix of $C$**, synthesize an audio signal.

Hint: Here use **element-wise multiplication**. Try and understand why.

In [39]:
# your code here
import soundfile as sf

# TODO: recover the signal from MD
MD_signal = librosa.istft(MD * phase_C, hop_length=256, center=False, win_length=1024)

sf.write('problem3.wav', MD_signal, 44100)

# TODO: save your matrix MD in a "problem3.wav" file

### Bonus: Check your reconstructed signal music:



In [40]:
import IPython.display as ipd
ipd.Audio('problem3.wav') # load a local WAV file