# How to become a Spacebar Artist w/ Python

## Getting Started

A lot of people love music, but not many know how to start working with it. I come from a purely engineering background and know close to nothing about music theory. My goal here is to put together some documents that serve as an introduction to Python and also as an introduction to signals and machine learning applications. We will avoid math in this notebook and simply focus on applications.

To get started, you will need a few things. Ideally you are using VSCode or Linux, so interacting with the terminal will be straightforward (the commands below assume a Debian/Ubuntu-style system with `apt`).
First, we need to make sure that you can work with Jupyter Notebook from an IDE. To do this, we create a virtual environment and register it as a Jupyter kernel. If you don't have one already, run the following in your terminal:
```
python3 -m venv PW
source PW/bin/activate
pip install librosa spleeter ipykernel matplotlib scipy numpy sounddevice
python -m ipykernel install --user --name=PW
sudo apt install ffmpeg
sudo apt-get install portaudio19-dev
```

Once those are done installing, we can start cooking. To test if there are any dependencies missing, just import them:

In [1]:
import librosa
import spleeter
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
from scipy.signal import butter, cheby2, sosfilt, lfilter, resample
from spleeter.audio.adapter import AudioAdapter
from spleeter.separator import Separator
import gc



Finally, we should also import os to conveniently work with data in folders. For your convenience, we will format a lot of this for you, but it might be insightful to learn how to use the os library eventually.

In [2]:
import os
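
If one of the imports above fails, it can help to see every missing package at once instead of fixing them one error at a time. The sketch below is one way to do that with the standard library. `missing_packages` is a hypothetical helper name, and the package list simply mirrors the `pip install` line from the setup step (assuming each import name matches its pip name).

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    # find_spec looks a top-level module up without importing it
    return [name for name in names if importlib.util.find_spec(name) is None]

# Packages installed during setup (import names assumed to match the pip names)
required = ["librosa", "spleeter", "ipykernel", "matplotlib", "scipy", "numpy", "sounddevice"]
print("Missing:", missing_packages(required))
```

An empty list means everything from the setup step is importable from the active environment.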

## Getting Data

When producing, you ultimately need data to manipulate in order to make your songs/remixes. Gathering this data is relatively straightforward if you are just mixing songs: you sample the song as you like and then you do magic.

On the other hand, getting live samples, i.e. your voice or some instrument, can be difficult. To help with this, engineers have developed a lot of straightforward recording techniques, especially for instruments that can easily be sampled digitally (keyboards, guitars, etc.). Some instruments still require a good old microphone to capture them properly, so this section focuses on capturing microphone input and ignores other signals (if you like this, maybe we can do a part 2 on capturing input from certain instruments directly in software).

### What do we need?

We need to sample the data that the microphone is capturing. Depending on what you have available, your laptop may already have a built-in (crappy) microphone, which is perfect! You may also have an external microphone connected to your computer via USB, which works just as well. The difficult part about using a microphone is actually sampling it.

In [None]:
%pip install sounddevice
import sounddevice as sd # type: ignore
import numpy as np
from scipy.io.wavfile import write

# Set the duration of the recording in seconds
duration = 5.0  # seconds

# Set the sample rate
sample_rate = 44100

# Record audio
recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=2)

# Wait for the recording to finish
sd.wait()

# Save the recording to a file
write('sample.wav', sample_rate, recording)
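
`sd.rec` returns floating-point samples by default, roughly in the range -1 to 1, and `scipy.io.wavfile.write` stores them as a floating-point WAV file. Some software may expect integer PCM instead, so a conversion to 16-bit can be useful. A minimal sketch, assuming the input is a float array in that range (`to_int16` is a hypothetical helper name, and the tone is a stand-in for a real recording):

```python
import numpy as np

def to_int16(recording):
    """Convert float samples in [-1, 1] to 16-bit integers, clipping anything louder."""
    clipped = np.clip(recording, -1.0, 1.0)
    return np.round(clipped * 32767).astype(np.int16)

# Stand-in for a recording: one second of a quiet 440 Hz tone in stereo
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
fake_recording = 0.1 * np.sin(2 * np.pi * 440 * t)[:, None] * np.ones((1, 2))
pcm = to_int16(fake_recording)
print(pcm.dtype, pcm.shape, np.abs(pcm).max())
```

The result can be passed to `write('sample.wav', sample_rate, pcm)` in place of `recording`.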

In [None]:
import sounddevice as sd
import numpy as np

# Set the duration of the recording in seconds
duration = 5.0  # seconds

# Set the sample rate
sample_rate = 44100

# Set the gain
gain = 0.5

# Define a callback function to process the audio
def callback(indata, outdata, frames, time, status):
    outdata[:] = indata * gain

# Create an input-output stream
with sd.Stream(callback=callback, samplerate=sample_rate, channels=2):
    print("Press Enter to quit")
    input()  # block until Enter is pressed; the stream keeps running in the meantime

If you'd like to work with the data in real time, like doing auto-tune for a performance (which is overkill for most people), then you could do something like the example below. To break it down, we create an *audio stream object* which streams the microphone data into the *callback* function for real-time processing. The data arrives in the *indata* buffer in blocks of size *blocksize*, meaning that *frames* samples per channel are handed to the callback on every call. Finally, the processed data is appended to the list *recording*. Before the stream can run we need a filter to apply, so we design one first.

In [None]:
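from scipy.signal import butter, lfilter

def create_filter(lowcut, highcut, fs, order=2):
    # Band-pass Butterworth filter with cutoffs given in Hz.
    # The order is kept low: in (b, a) form, a high-order band-pass with
    # cutoffs this close to 0 Hz can run into numerical problems.
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    b, a = butter(order, [low, high], btype='band')
    return b, a

b, a = create_filter(20, 200, 44100)  # bass filter

# Quick sanity check on a constant test signal
fake_audio = np.ones(1000)

y = lfilter(b, a, fake_audio)
print(len(b))
print(len(a))
print(len(y))
print(y)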
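
A word of caution on the filter order: SciPy's documentation recommends second-order sections (`output='sos'`) over the `(b, a)` form because they are numerically more robust, which matters for steep band-pass filters with cutoffs close to 0 Hz like this one. The sketch below is an alternative to the cell above, not something the rest of the notebook depends on. `create_filter_sos` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, sosfilt, sos2zpk

def create_filter_sos(lowcut, highcut, fs, order=12):
    """Band-pass Butterworth filter returned as second-order sections."""
    return butter(order, [lowcut, highcut], btype='band', fs=fs, output='sos')

sos = create_filter_sos(20, 200, 44100)  # bass filter
# A digital filter is stable when every pole lies inside the unit circle
_, poles, _ = sos2zpk(sos)
print("stable:", bool(np.all(np.abs(poles) < 1)))

# Same sanity check as before, on a constant test signal
y = sosfilt(sos, np.ones(1000))
print(len(y), bool(np.all(np.isfinite(y))))
```

Note that `sosfilt` takes the place of `lfilter` when the filter is stored this way.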

These are a few checks to make sure that your microphone can actually handle the processing you will be doing.

In [5]:
#check for audio devices
sd.query_devices()

   0 Razer Kraken 7.1 Chroma: USB Audio (hw:0,0), ALSA (2 in, 0 out)
   1 NexiGo N60 FHD Webcam: USB Audio (hw:1,0), ALSA (1 in, 0 out)
   2 HDA Intel PCH: ALC1220 Analog (hw:2,0), ALSA (2 in, 6 out)
   3 HDA Intel PCH: ALC1220 Digital (hw:2,1), ALSA (0 in, 2 out)
   4 HDA Intel PCH: ALC1220 Alt Analog (hw:2,2), ALSA (2 in, 0 out)
   5 HDA Intel PCH: HDMI 0 (hw:2,3), ALSA (0 in, 2 out)
   6 HDA Intel PCH: HDMI 1 (hw:2,7), ALSA (0 in, 8 out)
   7 HDA Intel PCH: HDMI 2 (hw:2,8), ALSA (0 in, 8 out)
   8 HDA Intel PCH: HDMI 3 (hw:2,9), ALSA (0 in, 8 out)
   9 HDA NVidia: HDMI 0 (hw:3,3), ALSA (0 in, 8 out)
  10 HDA NVidia: HDMI 1 (hw:3,7), ALSA (0 in, 8 out)
  11 HDA NVidia: HDMI 2 (hw:3,8), ALSA (0 in, 8 out)
  12 HDA NVidia: HDMI 3 (hw:3,9), ALSA (0 in, 8 out)
  13 sysdefault, ALSA (128 in, 0 out)
  14 spdif, ALSA (2 in, 0 out)
  15 samplerate, ALSA (128 in, 0 out)
  16 speexrate, ALSA (128 in, 0 out)
  17 pulse, ALSA (32 in, 32 out)
  18 upmix, ALSA (8 in, 0 out)
  19 vdownmix, ALSA (6 

In [7]:
#check if your input device can sample at 44100 Hz
#check if your input device can use 1 or 2 channels
sd.query_devices(0, 'input')

{'name': 'Razer Kraken 7.1 Chroma: USB Audio (hw:0,0)',
 'index': 0,
 'hostapi': 0,
 'max_input_channels': 2,
 'max_output_channels': 0,
 'default_low_input_latency': 0.008684807256235827,
 'default_low_output_latency': -1.0,
 'default_high_input_latency': 0.034829931972789115,
 'default_high_output_latency': -1.0,
 'default_samplerate': 44100.0}
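
With this many devices listed, picking a usable input by eye gets tedious. The sketch below filters a list of device descriptions by input channel count and default sample rate. It assumes dictionaries with the same keys as the output above (`index`, `max_input_channels`, `default_samplerate`). `pick_input_devices` is a hypothetical helper, and the two entries are abbreviated stand-ins for devices from the listing.

```python
def pick_input_devices(devices, channels=2, samplerate=44100):
    """Return the indices of devices with enough input channels at the given default rate."""
    return [
        d["index"]
        for d in devices
        if d["max_input_channels"] >= channels
        and int(d["default_samplerate"]) == samplerate
    ]

# Abbreviated stand-ins; on a real machine use devices = list(sd.query_devices()).
# The webcam's sample rate is not shown in the listing and is assumed here.
devices = [
    {"index": 0, "name": "Razer Kraken 7.1 Chroma: USB Audio (hw:0,0)",
     "max_input_channels": 2, "default_samplerate": 44100.0},
    {"index": 1, "name": "NexiGo N60 FHD Webcam: USB Audio (hw:1,0)",
     "max_input_channels": 1, "default_samplerate": 44100.0},
]
print(pick_input_devices(devices))  # → [0]
```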

In [11]:
import sounddevice as sd
import numpy as np
from scipy.signal import butter, lfilter
import soundfile as sf

# Set the sample rate, the gain and the number of channels
sample_rate = 44100
gain = 0.5  # Gain factor
channels = 2

# Design the bass band-pass filter in this cell so that b and a are always defined
b, a = butter(2, [20, 200], btype='band', fs=sample_rate)

# Filter state per channel, carried over from one block to the next
zi = np.zeros((max(len(a), len(b)) - 1, channels))

# Create a list to store the audio data
recording = []

# Define a callback function to process the audio
def callback(indata, outdata, frames, time, status):
    if status:
        print(status)
    # Filter along the time axis (axis 0), continuing from the previous block
    processed, zf = lfilter(b, a, indata, axis=0, zi=zi)
    zi[:] = zf
    processed = gain * processed
    # Write the processed data into the output buffer (in place, not by rebinding)
    outdata[:] = processed
    # Append the processed data to the recording
    recording.append(processed.copy())

# Create an input-output stream
with sd.Stream(callback=callback, samplerate=sample_rate, channels=channels, blocksize=2048):
    print("Press Enter to quit")
    input()

# Convert the list to a numpy array and save it to a file
recording_array = np.concatenate(recording)  # concatenate the blocks
# Write the recording array to the output file with the specified sample rate
sf.write('recording.wav', recording_array, sample_rate)

Press Enter to quit
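
The callback only ever sees one block at a time, so a filter has to remember its internal state between blocks. Otherwise every block starts from silence and the seams between blocks can become audible. The sketch below simulates the stream offline, with no microphone needed: it filters a signal block by block while carrying the state `zi` along, and compares the result with filtering everything in one go. The filter settings, the random test signal, and the name `filter_in_blocks` are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, lfilter

def filter_in_blocks(b, a, signal, blocksize):
    """Filter a (samples, channels) array block by block, carrying the filter state along."""
    zi = np.zeros((max(len(a), len(b)) - 1, signal.shape[1]))
    blocks = []
    for start in range(0, len(signal), blocksize):
        block = signal[start:start + blocksize]
        out, zi = lfilter(b, a, block, axis=0, zi=zi)
        blocks.append(out)
    return np.concatenate(blocks)

sample_rate = 44100
b, a = butter(2, [20, 200], btype='band', fs=sample_rate)
rng = np.random.default_rng(0)
fake_audio = rng.standard_normal((10000, 2))  # stand-in for stereo microphone data

blockwise = filter_in_blocks(b, a, fake_audio, blocksize=2048)
whole = lfilter(b, a, fake_audio, axis=0)
print(np.allclose(blockwise, whole))
```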

## Working With The Data

A brief explanation of the data you will receive: I manually labeled the files I am working with in a convenient way that 1) holds order information and 2) contains relevant meta-information (mostly for readability and interpretability). A file name might therefore look something like `1_songName_author.mp3`.

In [None]:
folder_path = 'Songs'  # Replace with the actual folder path

file_list = []
for file_name in os.listdir(folder_path):
    file_list.append(file_name)

file_list.sort()
print(file_list)

As you can see above, we collected all the song files in the folder and sorted them into the order in which they will be mixed.
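
One caveat: `file_list.sort()` compares names as text, so `10_...` lands before `2_...` once there are ten or more songs. A minimal sketch of sorting by the numeric prefix instead, assuming every file follows the `1_songName_author.mp3` naming scheme described above (`parse_name` is a hypothetical helper, and the file names are made up):

```python
import os

def parse_name(file_name):
    """Split '1_songName_author.mp3' into (order, song name, author)."""
    stem, _ = os.path.splitext(file_name)
    order, song_name, author = stem.split("_", 2)
    return int(order), song_name, author

# Made-up file names following the naming scheme
file_list = ["10_songC_artistC.mp3", "2_songB_artistB.mp3", "1_songA_artistA.mp3"]

print(sorted(file_list))                                  # text order: 10, 1, 2
print(sorted(file_list, key=lambda f: parse_name(f)[0]))  # numeric order: 1, 2, 10
```

Files without a numeric prefix would raise a `ValueError` here, so this only fits folders where every file is labeled.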

The next step is to get the components of the songs that we will be using. For DJ Mixing, the most useful parts of the songs to interact with are **Vocals** and **Accompaniment**. We will use spleeter to do all of the hard machine learning work to split the song into these 2 stems.

In [None]:
audio_adapter = AudioAdapter.default()
# Create the separator once, outside the loop, so it is not set up again for every song
separator = Separator('spleeter:2stems')

# Ensure the directories exist
os.makedirs('Songs/vocals', exist_ok=True)
os.makedirs('Songs/accompaniment', exist_ok=True)

# [9:] skips the first nine entries of file_list; use file_list to process every song
for file_name in file_list[9:]:
    if not file_name.endswith('.mp3'):
        continue
    file_path = os.path.join(folder_path, file_name)
    waveform, fs = audio_adapter.load(file_path)
    print(waveform.shape, fs)

    # prediction maps each stem name to a (samples, channels) array
    prediction = separator.separate(waveform)

    del waveform
    gc.collect()

    print(prediction)
    print(prediction['vocals'].shape)
    print(prediction['accompaniment'].shape)

    audio_adapter.save(os.path.join('Songs/vocals', file_name), prediction['vocals'], fs)
    audio_adapter.save(os.path.join('Songs/accompaniment', file_name), prediction['accompaniment'], fs)

    plt.figure(figsize=(12, 8))
    plt.subplot(2, 1, 1)
    plt.title('Vocals')
    plt.plot(prediction['vocals'])
    plt.subplot(2, 1, 2)
    plt.title('Accompaniment')
    plt.plot(prediction['accompaniment'])
    plt.show()

    # Delete the variables and free up the memory
    del prediction
    gc.collect()

Cool! So now we have all the songs split into the different parts that most DJs work with throughout their songs. The next DJ concept to tackle is "Signal Processing".

## Signal Processing

To mix, DJs typically play with a few knobs and buttons on their mixer board, which let them either control frequency content or apply interesting effects to a song. In general, an understanding of these 2 is the bare minimum to properly DJ a set (live or pre-made), so let's dive into them.

### Effects

Adding effects or "Fx" to an audio signal is a creative way to alter your audio so that it sounds better. We will discuss a few techniques that are used to spice up a song or transition into a new one.

#### Repeat/Loop

The simplest effect is to loop a part of a song. There are many musical implications to this, but we do not need to dive into them. Consider the portion of a song from $t = k$ to $t = k + T$: looping plays that portion `reps` times in a row before the song continues.

In [None]:
fakeSong = [i for i in range(1000)]
k = 250
T = 50
reps = 3 # Number of repetitions
newFakeSong = fakeSong[:k] + fakeSong[k:k+T] * reps + fakeSong[k+T:]
plt.figure();
plt.plot(newFakeSong);
plt.plot(fakeSong);
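
The same idea carries over to real audio, where a song is a NumPy array and positions are easier to give in seconds. A sketch under those assumptions (`loop_segment` and its parameters are hypothetical names, and the array has shape `(samples,)` or `(samples, channels)`):

```python
import numpy as np

def loop_segment(song, start_s, length_s, reps, sr=44100):
    """Play the segment starting at start_s (seconds) reps times in a row."""
    k = int(round(start_s * sr))    # first sample of the loop
    T = int(round(length_s * sr))   # loop length in samples
    segment = song[k:k + T]
    return np.concatenate([song[:k]] + [segment] * reps + [song[k + T:]])

sr = 1000                                   # tiny sample rate to keep the demo small
fake_song = np.arange(1000, dtype=float)    # one "second" of fake audio
looped = loop_segment(fake_song, start_s=0.25, length_s=0.05, reps=3, sr=sr)
print(len(fake_song), len(looped))          # → 1000 1100
```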

#### Frequency Control
When mixing, the **MOST** essential effect in DJing is controlling frequency, which here means tempo: how many beats per minute (BPM) a song has. To successfully mix songs together, the songs must have the same BPM (or one BPM must be an integer multiple of the other). Songs are produced for the public and often their tempos do not match. Fixing this is actually really simple: we find the BPM of each song and then stretch or squeeze one song until its BPM matches the other's.

In [None]:
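def apply_speed_change(song, speed_change_factor):
    # Resample to len(song) * speed_change_factor samples. Played back at the
    # original sample rate, a factor > 1 makes the song longer (slower) and a
    # factor < 1 makes it shorter (faster). The pitch shifts along with the tempo.
    y_changed = resample(song, int(len(song) * speed_change_factor))
    return y_changed

def get_bpm(song, sr=44100):
    # Assumed helper (the notebook never defines get_bpm): estimate the tempo
    # with librosa's beat tracker on a mono mix of the (samples, channels) array.
    mono = song.mean(axis=1) if song.ndim > 1 else song
    tempo, _ = librosa.beat.beat_track(y=mono, sr=sr)
    return float(np.atleast_1d(tempo)[0])

def bpm_match(song1, song2, sr=44100):
    # Get the BPM of both songs
    song1_bpm = get_bpm(song1, sr=sr)
    song2_bpm = get_bpm(song2, sr=sr)
    # Length factor that brings song1 to song2's tempo: a faster target
    # needs a shorter song1, so the factor is old BPM / new BPM
    speed_change_factor = song1_bpm / song2_bpm
    # Apply the speed change to song1
    return apply_speed_change(song1, speed_change_factor)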

In [None]:
# Sometimes the BPM of a song does not suit your mix and you want to change it.
# apply_speed_change scales the length, so the factor is old BPM / new BPM:
# 87.5/75 (> 1) stretches the song and slows it down by that ratio (e.g. 175 BPM becomes 150 BPM).
audio_adapter = AudioAdapter.default()
song_path = 'Songs/Baddadan.mp3'
song, fs = audio_adapter.load(song_path)
new_song = apply_speed_change(song, (87.5/75))
audio_adapter.save('Songs/Baddadan_150.mp3', new_song, fs)


In [None]:
# Same idea in the other direction: 70/87 (< 1) shortens the song and speeds it up
song_path = 'Songs/Tears_70.mp3'
song, fs = audio_adapter.load(song_path)
new_song = apply_speed_change(song, (70/87))
audio_adapter.save('Songs/Tears_174.mp3', new_song, fs)
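
The arithmetic behind these factors is worth spelling out. `apply_speed_change` scales the number of samples, so the factor is the old BPM divided by the new BPM. And because tempos that are integer multiples of each other also mix, a song at 87.5 BPM can sit under one at 175 BPM. A small sketch with hypothetical helper names:

```python
def length_factor(bpm_from, bpm_to):
    """Factor by which the sample count must change to move bpm_from to bpm_to."""
    return bpm_from / bpm_to

def tempos_compatible(bpm_a, bpm_b, tolerance=0.01):
    """True when one tempo is (nearly) an integer multiple of the other."""
    ratio = max(bpm_a, bpm_b) / min(bpm_a, bpm_b)
    return abs(ratio - round(ratio)) <= tolerance

print(length_factor(175, 150))        # about 1.167: longer, hence slower
print(tempos_compatible(87.5, 175))   # True (double time)
print(tempos_compatible(125, 135))    # False
```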

#### Equalization/Filtering
One of the most essential effects in DJing is controlling frequency content. There are many times when a DJ may want to control how loud one part of a song is with respect to the rest of the song. For example, if you want to emphasize a song's vocals, you may consider amplifying the treble and attenuating the bass.

In [None]:
# T1 and T2 are the start and end times (in seconds) of the song segment to be equalized.
# Pass the sample rate your song was loaded with as sr.
def apply_equalization(song, T1, T2, bass_gain=1, other_gain=1, treble_gain=1, sr=22050):
    def cheby2_bandpass(lowcut, highcut, fs, attenuation, order=5):
        # For a Chebyshev type II filter the second argument is the minimum
        # stopband attenuation in dB, and the band edges are stopband edges
        nyq = 0.5 * fs
        low = lowcut / nyq
        high = highcut / nyq
        sos = cheby2(order, attenuation, [low, high], btype='band', output='sos')
        return sos
    def cheby2_bandpass_filter(data, lowcut, highcut, fs, attenuation, order=5):
        sos = cheby2_bandpass(lowcut, highcut, fs, attenuation, order=order)
        # Filter along the time axis so that stereo (samples, channels) data works too
        y = sosfilt(sos, data, axis=0)
        return y
    N1, N2 = librosa.time_to_samples([T1, T2], sr=sr)
    # Band edges in Hz; the top edge must stay below the Nyquist frequency sr/2
    lowbass, highbass, lowtreble = 20, 300, 2000
    hightreble = min(20000, 0.45 * sr)
    segment = song[N1:N2]
    bass = cheby2_bandpass_filter(segment, lowbass, highbass, sr, attenuation=20, order=6)
    other = cheby2_bandpass_filter(segment, highbass, lowtreble, sr, attenuation=20, order=6)
    treble = cheby2_bandpass_filter(segment, lowtreble, hightreble, sr, attenuation=20, order=6)
    equalized_song = song.copy()
    equalized_song[N1:N2] = bass_gain*bass + other_gain*other + treble_gain*treble
    return equalized_song
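
To check that an equalizer setting does what you expect, you can measure how much energy a signal carries in each band before and after. The sketch below does this with an FFT on a synthetic test signal. `band_energy` is a hypothetical helper, and the band edges reuse the 20/300/2000/20000 Hz split from the cell above.

```python
import numpy as np

def band_energy(signal, sr, low_hz, high_hz):
    """Sum of squared FFT magnitudes of a mono signal between low_hz and high_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    return spectrum[(freqs >= low_hz) & (freqs < high_hz)].sum()

sr = 44100
t = np.arange(sr) / sr
# Test signal: a loud 100 Hz "bass" tone plus a quieter 5000 Hz "treble" tone
test_signal = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 5000 * t)

bass = band_energy(test_signal, sr, 20, 300)
other = band_energy(test_signal, sr, 300, 2000)
treble = band_energy(test_signal, sr, 2000, 20000)
print(bass > treble > other)
```

Running the same measurement on the output of an equalizer shows whether the gains moved the energy between bands as intended.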

#### Reverb

Reverb is a great effect to use on any song. It simulates the reflections of sound in a physical space, and it can make audio sound like you are inside a barrel or a concert hall, depending on the desired outcome.

In [None]:
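from scipy.signal import fftconvolve

def apply_reverb(song, reverb_time, decay, sr=44100):
    # Generate the reverb impulse response: an exponential decay lasting reverb_time seconds
    t = np.arange(0, reverb_time, 1/sr)
    reverb = np.exp(-decay * t)
    # Convolve each channel with the impulse response. Keeping the first len(song)
    # samples of the full convolution makes the tail follow the sound instead of preceding it.
    channels = song.T if song.ndim > 1 else [song]
    wet = np.stack([fftconvolve(ch, reverb)[:len(ch)] for ch in channels], axis=-1)
    # Apply reverb to the signal and normalize the peak to 1
    reverb_signal = song + wet.reshape(song.shape)
    reverb_signal = reverb_signal / np.max(np.abs(reverb_signal))
    return reverb_signal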
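
A handy way to understand a reverb is to feed it a single click (a unit impulse): whatever comes out is the impulse response itself. The sketch below builds the same kind of exponentially decaying response and adds a `wet` parameter for blending processed and dry signal, which is an addition for illustration and not part of the function above. The helper names are hypothetical, and a tiny sample rate keeps the example fast.

```python
import numpy as np

def exp_impulse_response(reverb_time, decay, sr):
    """Exponentially decaying impulse response lasting reverb_time seconds."""
    t = np.arange(0, reverb_time, 1 / sr)
    return np.exp(-decay * t)

def reverb_mix(signal, ir, wet=0.5):
    """Blend the dry signal with its convolution with ir (same length as the input)."""
    tail = np.convolve(signal, ir, mode='full')[:len(signal)]
    return (1 - wet) * signal + wet * tail

sr = 100                                   # tiny rate for the demo
ir = exp_impulse_response(0.5, decay=5.0, sr=sr)
click = np.zeros(200)
click[0] = 1.0                             # unit impulse
out = reverb_mix(click, ir, wet=1.0)
print(np.allclose(out[:len(ir)], ir))      # the click turns into the decaying tail
```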

#### Delay/Echoes

Delay is another effect which is interesting to listen to. Just as the name implies, it echoes past samples of the song, which can give an interesting effect similar to that of filtering. It can cluster sounds and ultimately add some "chaos".

In [None]:
def apply_echoes(signal, delay, decay, sr=44100):
    # Calculate the number of samples to delay
    delay_samples = int(delay * sr)
    # Start from a copy of the dry signal so that the first delay_samples samples are kept
    echoed_signal = signal.astype(float)
    # Add a delayed, attenuated copy of the signal (a single echo)
    if 0 < delay_samples < len(signal):
        echoed_signal[delay_samples:] += decay * signal[:-delay_samples]
    return echoed_signal
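
The function above produces a single repeat. Delay effects often feed their output back into the input, so each echo is followed by a quieter one. The sketch below shows that feedback variant, which is a different technique from the cell above and uses hypothetical names:

```python
import numpy as np

def apply_feedback_echo(signal, delay, decay, sr=44100):
    """Echo with feedback: y[n] = x[n] + decay * y[n - delay_samples]."""
    delay_samples = int(round(delay * sr))
    echoed = np.asarray(signal, dtype=float).copy()
    if delay_samples <= 0:
        return echoed
    # Work through the signal one delay-length chunk at a time: each chunk
    # receives the (already echoed) chunk before it, scaled by decay
    for start in range(delay_samples, len(echoed), delay_samples):
        stop = min(start + delay_samples, len(echoed))
        echoed[start:stop] += decay * echoed[start - delay_samples:stop - delay_samples]
    return echoed

click = np.zeros(10)
click[0] = 1.0
# With a 3-sample delay the click repeats at samples 3, 6 and 9, halving each time
print(apply_feedback_echo(click, delay=3, decay=0.5, sr=1))
```

Keep `decay` below 1 here, otherwise the echoes grow instead of fading out.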

#### Transient

In [None]:
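def apply_transient(signal, attack_time_percent, fs=44100):
    # attack_time_percent is the fraction (0 to 1) of the signal spent fading in;
    # the rest fades out. fs is unused but kept so existing calls keep working.
    # Calculate the number of samples for attack and decay
    attack_samples = int(attack_time_percent * len(signal))
    decay_samples = len(signal) - attack_samples
    # Create the transient envelope: linear rise from 0 to 1, then linear fall to 0
    envelope = np.concatenate([np.linspace(0, 1, attack_samples),
                               np.linspace(1, 0, decay_samples)])
    # Give the envelope a trailing axis for (samples, channels) input
    if signal.ndim > 1:
        envelope = envelope[:, None]
    # Apply the envelope sample by sample (a gain curve, not a filter)
    transient_signal = signal * envelope
    return transient_signal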

#### Mixing

Mixing itself can be pretty difficult. It is mostly done by ear, but there are some rules that all EDM tends to follow. I'm not sure if there's an explicit name for it, but I'd like to call it the $2^n$ beat rule: events in a song happen on multiples of $2^n$ beats, i.e. every 4-16 beats something new will happen in a song. To learn this, I highly suggest looking at the beat markers on a song in mixing software like Serato DJ Lite or Rekordbox (or whatever you use). What I'd like to implement is a function that mixes 2 (or more) audio signals in a weighted manner.

In [None]:
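def audio_mix(songs, weights, max_amplitude=1):
    # Normalize the weights (accepts plain lists as well as arrays)
    weights = np.asarray(weights, dtype=float)
    weights = weights / np.sum(weights)
    # Trim every song to the shortest one so that the arrays can be added
    n = min(len(song) for song in songs)
    # Mix the songs
    mixed_song = np.zeros_like(songs[0][:n], dtype=float)
    for i in range(len(songs)):
        mixed_song += weights[i] * songs[i][:n]
    # Normalize the amplitude
    mixed_song = max_amplitude * mixed_song / np.max(np.abs(mixed_song))
    return mixed_song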
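
The $2^n$ beat rule also tells you *where* to start a mix: on a phrase boundary. Given a BPM, the sample positions of those boundaries follow from simple arithmetic. A sketch, assuming a constant tempo and a first beat at sample 0 (both simplifications; `phrase_boundaries` is a hypothetical helper):

```python
def phrase_boundaries(bpm, total_samples, beats_per_phrase=16, sr=44100):
    """Sample indices at which a new phrase of beats_per_phrase beats begins."""
    samples_per_beat = sr * 60.0 / bpm
    samples_per_phrase = samples_per_beat * beats_per_phrase
    boundaries = []
    position = 0.0
    while position < total_samples:
        boundaries.append(int(round(position)))
        position += samples_per_phrase
    return boundaries

# 30 seconds of audio at 120 BPM: a 16-beat phrase lasts 8 seconds
print(phrase_boundaries(120, total_samples=30 * 44100))
# → [0, 352800, 705600, 1058400]
```

Slicing two songs at such boundaries before mixing them keeps their phrases lined up, provided their BPMs already match.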

### Try it Yourself

With all these tools you've learned how to:

1) Decompose songs into stems (for sampling)

2) Apply effects (Fx)

3) Mix samples