### MIR-Libraries
- https://essentia.upf.edu/
- https://librosa.org/
- https://madmom.readthedocs.io/en/latest/
- https://www.audeering.com/research/opensmile/
- ...
- https://qmro.qmul.ac.uk/xmlui/bitstream/handle/123456789/13075/Moffatt%20AN%20EVALUATION%20OF%20AUDIO%20FEATURE%202015%20Published.pdf?sequence=2

### Installation
- Essentia on Mac/Linux: `pip install` or Homebrew formula
- Essentia Docker image for running a Jupyter notebook: https://hub.docker.com/r/mtgupf/essentia/
- Essentia Docker image with Librosa: https://github.com/Maerdm/MIR-toolbox-docker
- Pull the image: `docker pull maedd/mir-toolbox_librosa`
- Run the image: `docker run -d --name MIR_Tutorial -p 8888:8888 -e JUPYTER_TOKEN="mir" --mount type=bind,source=$(pwd),target=/notebooks maedd/mir-toolbox_librosa`
- Anaconda/Jupyter setup (video): https://www.youtube.com/watch?v=tFVjzORFmdI

In [None]:
! pip install essentia-tensorflow
! pip install librosa
! pip install matplotlib

In [None]:
import essentia.standard as es
import essentia
import librosa
import librosa.display  # needed for librosa.display.specshow below
import numpy as np
import IPython
from pylab import plot, show, figure, imshow
%matplotlib inline
import matplotlib.pyplot as plt
import soundfile as sf
import math

In [None]:
print(dir(es))
help(es.MFCC)

### Loading Audio:
- AudioLoader (stereo)
- MonoLoader (mono)
- EasyLoader (mono, normalized)
- EqloudLoader (mono, normalized, equal-loudness filter)

In [7]:
sr = 44100

file = '../AudioFiles/DontStop.mp3'

audio = es.MonoLoader(filename=file, sampleRate=sr)()
y, _ = librosa.load(file, sr=sr)  # without sr=..., librosa resamples to 22050 Hz and would overwrite sr

In [None]:
IPython.display.Audio('../AudioFiles/DontStop.mp3')

In [None]:
plot(audio[:1000])

### Low-Level Features (Time Domain)

In [None]:
z_essentia = es.ZeroCrossingRate()(audio)
z_librosa = librosa.feature.zero_crossing_rate(audio)
print(z_essentia)
print(np.mean(z_librosa))
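Both calls count sign changes: Essentia over the whole signal, librosa frame by frame (hence the mean). A minimal NumPy sketch of the idea, assuming a plain sign-change count; the libraries may additionally apply thresholds, padding or frame centering, so their values can differ slightly:

```python
import numpy as np

def zero_crossing_rate(x: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose sign differs (one sign change = one zero crossing)."""
    negative = np.signbit(x)
    return float(np.mean(negative[1:] != negative[:-1]))

def framewise_zcr(x, frame_length=2048, hop_length=512):
    """ZCR per unpadded frame, similar in spirit to librosa's frame-wise output."""
    n_frames = 1 + (len(x) - frame_length) // hop_length
    return np.array([zero_crossing_rate(x[i * hop_length : i * hop_length + frame_length])
                     for i in range(n_frames)])

sr_demo = 8000                                   # separate name so the notebook's sr is untouched
t = np.arange(sr_demo) / sr_demo
tone = np.sin(2 * np.pi * 100 * t + 0.1)         # 100 Hz: about 200 zero crossings per second
print(zero_crossing_rate(tone) * sr_demo)        # roughly 200
print(framewise_zcr(tone).mean() * sr_demo)      # roughly 200 as well
```

Multiplying the rate by the sample rate turns it into crossings per second, which for a pure tone is twice its frequency.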

### Computing the Spectrum

<img src="../Sonstiges/Bilder/FFT.png" align="left" width="450" />

In [None]:
w = es.Windowing(type='hann')
spectrum = es.Spectrum()
lin2db = es.UnaryOperator(type='lin2db')
 
frame = audio[44100*2 : 44100*2+1024]

spec = lin2db(spectrum(w(frame)))

plot(spec)
plt.title("The spectrum of a frame:")
show()

imshow(np.array([spec]).T, aspect = 'auto', origin='lower')
plt.title("Spectrum")
show()
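The cell above windows one 1024-sample frame and takes its magnitude spectrum. The same steps in plain NumPy, as a sketch: a synthetic 1 kHz tone stands in for the audio, `np.hanning` stands in for Essentia's Hann window (its normalization may differ), and bin k corresponds to k · sampleRate / frameSize Hz:

```python
import numpy as np

sr = 44100
frame_size = 1024
t = np.arange(frame_size) / sr
frame = np.sin(2 * np.pi * 1000 * t)              # 1 kHz test tone

window = np.hanning(frame_size)                   # Hann window (NumPy's symmetric variant)
magnitude = np.abs(np.fft.rfft(frame * window))   # frame_size // 2 + 1 = 513 bins
magnitude_db = 20 * np.log10(np.maximum(magnitude, 1e-10))  # amplitude -> dB (= 10*log10 of power)

freqs = np.fft.rfftfreq(frame_size, d=1 / sr)     # bin k sits at k * sr / frame_size Hz
peak_hz = freqs[np.argmax(magnitude)]
print(len(magnitude), round(peak_hz, 1))          # 513 bins, peak in the bin nearest 1 kHz
```

With about 43 Hz between bins, the 1 kHz tone lands in the nearest bin rather than exactly at 1000 Hz.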

In [None]:
w = es.Windowing(type='hann')
spectrum = es.Spectrum()
logNorm = es.UnaryOperator(type='lin2db') # dB scale: 10 * log10(x)
pool = essentia.Pool()

frameSize=1024
hopSize=512

for frame in es.FrameGenerator(audio[:44100], frameSize=frameSize, hopSize=hopSize):
    spec = logNorm(spectrum(w(frame)))
    pool.add('spec', spec)

imshow(pool['spec'].T, aspect = 'auto', origin='lower', interpolation='none')
plt.title("Spectrum")
plt.show()

# number of frequency bins = frameSize/2 + 1; frequency resolution = sampleRate/frameSize (about 43 Hz for 1024 samples at 44.1 kHz)
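A NumPy sketch of the frame loop above (frame, window, FFT, dB), assuming unpadded frames. Essentia's FrameGenerator with startFromZero=False centers the first frame on sample 0 and zero-pads, so it returns a few more frames than this version:

```python
import numpy as np

def stft_db(x, frame_size=1024, hop_size=512):
    """Cut x into overlapping frames, apply a Hann window, return (frames, bins) magnitudes in dB."""
    n_frames = 1 + (len(x) - frame_size) // hop_size
    window = np.hanning(frame_size)
    frames = np.stack([x[i * hop_size : i * hop_size + frame_size] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(np.maximum(magnitude, 1e-10))

sr = 44100
x = np.random.default_rng(0).standard_normal(sr)   # one second of white noise
S = stft_db(x)
print(S.shape)        # (85, 513): 85 frames, frame_size // 2 + 1 bins
print(sr / 1024)      # frequency resolution: about 43.07 Hz per bin
```

Transposing `S` gives the frequency-by-time layout used with `imshow` above.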

### Log-Frequency Spectrum and Pitch Chroma

In [None]:
chromatic = '../AudioFiles/chromaTones.mp3'  # same folder name (case-sensitive on Linux/Docker) as the other paths
chromatic_audio = es.MonoLoader(filename=chromatic, sampleRate=sr)()
IPython.display.Audio(chromatic)

In [None]:

frameSize = 1024
hopSize = 512

w = es.Windowing(type='hann')
spectrum = es.Spectrum()
logNorm = es.UnaryOperator(type='lin2db', scale=1) # dB scale: 10 * log10(x)
pool = essentia.Pool()
# LogSpectrum's frameSize is the size of the input spectrum (frameSize/2 + 1 bins), not of the audio frame
spectrum_logfreq = es.LogSpectrum(binsPerSemitone=3, frameSize=frameSize // 2 + 1)

for frame in es.FrameGenerator(chromatic_audio, frameSize=frameSize, hopSize=hopSize):
    spec = spectrum(w(frame))
    frame_spec, _, _  = spectrum_logfreq(spec) # logarithmic frequency axis
    pool.add('spec', logNorm(spec))
    pool.add('log_spec', logNorm(frame_spec))
    

imshow(pool['spec'].T[1:,:], aspect = 'auto', origin='lower', interpolation='none')
plt.title("Spectrum")
show()

imshow(pool['log_spec'].T, aspect = 'auto', origin='lower', interpolation='none')
plt.title("Log-frequency spectrum")
plt.show()
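On a log-frequency axis every semitone gets the same number of bins, so with binsPerSemitone=3 an octave spans 36 bins in any register. A small sketch of that mapping, assuming the usual A4 = 440 Hz reference and MIDI numbering (it only illustrates the axis; it is not how Essentia's LogSpectrum builds its kernels):

```python
import numpy as np

def hz_to_midi(f_hz):
    """Fractional MIDI pitch: 69 = A4 = 440 Hz, one unit per equal-tempered semitone."""
    return 69 + 12 * np.log2(np.asarray(f_hz, dtype=float) / 440.0)

bins_per_semitone = 3
freqs = np.array([110.0, 220.0, 440.0, 880.0])     # A2, A3, A4, A5: one octave apart
midi = hz_to_midi(freqs)                           # 45, 57, 69, 81
log_bins = np.round((midi - midi[0]) * bins_per_semitone).astype(int)
print(log_bins)   # equally spaced: 12 semitones x 3 bins = 36 bins per octave
```

On the linear FFT axis, the same octaves are 110, 220 and 440 Hz wide, which is why low notes are squeezed into a few bins in the plain spectrogram.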

#### Chromagram
- Splits pitch into 12 pitch classes (C, C#, D, D#, ...); octaves are folded together
- --> more robust, since the same note in different octaves lands in the same bin

In [None]:
chroma = librosa.feature.chroma_stft(y=chromatic_audio, sr=sr)  # pass sr: librosa assumes 22050 Hz otherwise
img = librosa.display.specshow(chroma, y_axis='chroma', x_axis='time', sr=sr)
# --> Essentia equivalent: HPCP algorithm (https://essentia.upf.edu/tutorial_tonal_hpcpkeyscale.html), which takes spectral peaks as input
# --> related Essentia algorithms: Key(), ChordsDescriptors(), ChordsDetection()
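Chroma folds all octaves onto 12 pitch classes. A simplified sketch of the folding step for a list of spectral peaks (the kind of input HPCP receives), assuming A4 = 440 Hz, nearest-semitone assignment and squared-magnitude weighting; the real HPCP adds harmonic weighting and a smoother bin assignment:

```python
import numpy as np

PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def chroma_from_peaks(freqs_hz, mags, ref_hz=440.0):
    """Fold spectral peaks into a 12-bin pitch-class profile (octaves merged), max-normalized."""
    profile = np.zeros(12)
    for f, m in zip(freqs_hz, mags):
        midi = 69 + 12 * np.log2(f / ref_hz)
        pitch_class = int(np.round(midi)) % 12      # MIDI 60 (C4) -> class 0 (C)
        profile[pitch_class] += m ** 2              # accumulate energy per pitch class
    return profile / profile.max() if profile.max() > 0 else profile

# C major triad spread over several octaves: C3, E4, G5 and C5
peaks_hz = [130.81, 329.63, 783.99, 523.25]
profile = chroma_from_peaks(peaks_hz, mags=[1.0, 1.0, 1.0, 1.0])
print([PITCH_CLASSES[i] for i in np.flatnonzero(profile)])   # ['C', 'E', 'G']
```

C3 and C5 end up in the same bin, which is exactly the octave folding described above.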

### Mel Spectrogram and Mel-Frequency Cepstral Coefficients (MFCCs)

##### Mel Scale:
<img src="../Sonstiges/Bilder/Mel_Scale.png" align="left" width="450" />
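The mel scale is roughly linear below 1 kHz and logarithmic above. One common definition is the HTK formula m = 2595 · log10(1 + f / 700). librosa defaults to the Slaney variant (pass htk=True for HTK), and Essentia's MelBands offers a choice of warping formula in recent versions, as far as we know, which is one reason mel outputs of the two libraries differ. A sketch of the HTK conversion:

```python
import numpy as np

def hz_to_mel(f_hz):
    """HTK mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

print(hz_to_mel(1000.0))   # about 1000 mel: the scale is anchored so that 1 kHz is ~1000 mel
# 10 band centres equally spaced in mel between 0 Hz and 8 kHz
centres_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 10))
print(np.round(np.diff(centres_hz)))   # spacing in Hz grows with frequency
```

Equal steps in mel become ever wider steps in Hz, so a mel filterbank has narrow bands at low and wide bands at high frequencies.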

In [None]:
w = es.Windowing(type='hann')
spectrum = es.Spectrum()

mel = es.MelBands(numberBands=96)
mfcc = es.MFCC(numberBands=96, numberCoefficients=13)
logNorm = es.UnaryOperator(type='lin2db')

frameSize=2048
hopSize=512

pool.clear()

for frame in es.FrameGenerator(audio, frameSize=frameSize, hopSize=hopSize, startFromZero=False):
    spec = spectrum(w(frame))
    melspec = logNorm(mel(spec))
    bands, coeffs = mfcc(spec)

    pool.add('mel', melspec)
    pool.add('mfccs', coeffs)

imshow(pool['mel'].T, aspect = 'auto', origin='lower', interpolation='none')
plt.title("Mel")
show()

imshow(pool['mfccs'].T[1:,:], aspect = 'auto', origin='lower', interpolation='none')  # skip coefficient 0 (roughly the overall log energy), which would dominate the color scale
plt.title("MFCCs")
show()

In [None]:
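# spectrogram: stft returns magnitudes, so convert with amplitude_to_db (power_to_db expects power)
spec_lib = librosa.amplitude_to_db(np.abs(librosa.stft(audio, n_fft=frameSize, hop_length=hopSize, window='hann')))

# mel spectrogram (power), centered frames like the stft and mfcc calls
# note: librosa's mel filterbank defaults (formula, normalization) may differ from Essentia's MelBands
mel_lib = librosa.feature.melspectrogram(y=audio, n_fft=frameSize, hop_length=hopSize, sr=sr, n_mels=96)
mel_lib = librosa.power_to_db(mel_lib)

# mfccs, computed from the same number of mel bands as the Essentia example
mfccs = librosa.feature.mfcc(y=audio, n_fft=frameSize, hop_length=hopSize, sr=sr, n_mfcc=13, n_mels=96)

imshow(spec_lib, aspect = 'auto', origin='lower', interpolation='none')
plt.title("Spectrogram (librosa)")
show()

imshow(mel_lib, aspect = 'auto', origin='lower', interpolation='none')
plt.title("Mel (librosa)")
show()

imshow(mfccs, aspect = 'auto', origin='lower', interpolation='none')
plt.title("MFCCs (librosa)")
show()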
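MFCCs are the discrete cosine transform (DCT) of the log mel-band energies; both cells above keep the first 13 coefficients. A minimal sketch using an orthonormal DCT-II and log10, with made-up mel energies. Libraries differ in the log type (Essentia's MFCC has a logType parameter), DCT normalization and liftering, so absolute values will not match theirs:

```python
import numpy as np

def dct_ii_ortho(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs coefficients."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)                       # orthonormal scaling of coefficient 0
    return x @ basis.T

def mfcc_from_mel(mel_energies, n_coeffs=13, eps=1e-10):
    """MFCCs = DCT of the log mel-band energies: (frames, bands) -> (frames, n_coeffs)."""
    return dct_ii_ortho(np.log10(mel_energies + eps), n_coeffs)

rng = np.random.default_rng(0)
mel_frames = rng.uniform(0.1, 1.0, size=(5, 96))   # 5 made-up frames with 96 mel bands
coeffs = mfcc_from_mel(mel_frames)
print(coeffs.shape)                                 # (5, 13)
```

A perfectly flat mel spectrum produces only coefficient 0; the higher coefficients describe the shape of the spectral envelope.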

### Low-Level Features (Spectral)
- Spectral Centroid: the magnitude-weighted mean frequency of the spectrum (its "center of mass"); correlates with perceived brightness
- Spectral Rolloff: the frequency below which a given share of the total spectral energy lies (often 85%)
- Spectral Flux: how strongly the spectrum changes from one frame to the next (a norm of the difference between consecutive magnitude spectra)
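A toy NumPy version of the three definitions, assuming rolloff on energy (|X|²) with a cutoff of 0.85 and flux as the L2 norm of the difference between spectra. Essentia's Centroid, RollOff and Flux expose parameters for such choices (a cutoff, a norm, and for Centroid a `range` that must be the Nyquist frequency to get Hz), so check their reference pages for the exact defaults:

```python
import numpy as np

def spectral_centroid(mag, freqs):
    """Magnitude-weighted mean frequency (centre of mass of the spectrum)."""
    return float(np.sum(freqs * mag) / np.sum(mag))

def spectral_rolloff(mag, freqs, cutoff=0.85):
    """Lowest frequency at which the cumulative energy (mag**2) reaches cutoff * total."""
    energy = np.cumsum(mag ** 2)
    return float(freqs[np.searchsorted(energy, cutoff * energy[-1])])

def spectral_flux(mag, prev_mag):
    """L2 norm of the difference between consecutive magnitude spectra."""
    return float(np.linalg.norm(mag - prev_mag))

freqs = np.linspace(0, 4000, 5)              # toy 5-bin spectrum: 0, 1000, ..., 4000 Hz
mag = np.array([0.0, 2.0, 0.0, 1.0, 0.0])
print(spectral_centroid(mag, freqs))         # about 1666.7 Hz
print(spectral_rolloff(mag, freqs))          # 3000.0
print(spectral_rolloff(mag, freqs, 0.5))     # 1000.0: the 50% energy point
print(spectral_flux(mag, np.zeros(5)))       # about 2.236 (sqrt(5))
```

In this example the centroid (about 1.67 kHz) and the 50% energy point (1 kHz) clearly differ, which is why the centroid is a mean frequency rather than an energy median.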


In [None]:
w = es.Windowing(type='hann')
spectrum = es.Spectrum()
centroid = es.Centroid(range=22050)
flux = es.Flux()
rolloff = es.RollOff()
rms = es.RMS()

pool = essentia.Pool()

frameSize = 1024
hopSize = 512

pool.clear()

for frame in es.FrameGenerator(audio, frameSize=frameSize, hopSize=hopSize, startFromZero=False):
    spec = spectrum(w(frame))          # windowed magnitude spectrum, computed once per frame
    pool.add('centroid', centroid(spec))
    pool.add('flux', flux(spec))       # same windowed spectrum as the other spectral features
    pool.add('rolloff', rolloff(spec))
    pool.add('rms', rms(w(frame)))
#cent_lib = librosa.feature.spectral_centroid(y=audio, n_fft=frameSize, hop_length=hopSize, sr=sr)

fig, ax = plt.subplots(4, 1, sharex=True, sharey=False, figsize=(15, 16))
ax[0].set_title("spectral centroid")
ax[0].plot(pool['centroid'].T)
ax[1].set_title("flux")
ax[1].plot(pool['flux'])
ax[2].set_title("rolloff")
ax[2].plot(pool['rolloff'])
ax[3].set_title("rms")
ax[3].plot(pool['rms'])

plt.show()

### Saving

In [44]:
# Save as JSON: aggregated statistics (mean, stdev) and the frame-wise values
statistics = es.PoolAggregator(defaultStats = [ 'mean', 'stdev' ])(pool)
# --> https://essentia.upf.edu/reference/std_PoolAggregator.html

es.YamlOutput(filename = '../AudioFiles/' + 'features.json', format='json')(statistics)
es.YamlOutput(filename = '../AudioFiles/' + 'features_frames.json', format='json')(pool)
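PoolAggregator reduces each frame-wise descriptor to summary statistics before saving. A minimal sketch of the same idea with NumPy and the standard json module, assuming a plain dict of lists instead of an essentia.Pool and the population standard deviation for 'stdev' (check the PoolAggregator reference for Essentia's exact definitions):

```python
import json
import numpy as np

STAT_FUNCS = {'mean': np.mean, 'stdev': np.std, 'min': np.min, 'max': np.max, 'median': np.median}

def aggregate(frames: dict, stats=('mean', 'stdev')) -> dict:
    """Summarise frame-wise descriptors (name -> list of values) into per-descriptor statistics."""
    return {name: {s: float(STAT_FUNCS[s](np.asarray(values))) for s in stats}
            for name, values in frames.items()}

frames = {'centroid': [1000.0, 2000.0, 3000.0], 'rms': [0.1, 0.1, 0.1]}   # made-up values
summary = aggregate(frames)
print(json.dumps(summary, indent=2))
```

Aggregating first turns a variable-length track into a fixed-length feature vector, which is what most classifiers need.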

# pool.clear()

### Onset Detection

<img src="../Sonstiges/Bilder/onset.png" align="left" width="450" />


1. Compute a novelty function:
    - https://essentia.upf.edu/reference/std_OnsetDetection.html
2. Peak picking / detecting onsets:
    - https://essentia.upf.edu/reference/std_Onsets.html
    - --> several novelty functions can be passed as input (one weight per function)
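The two steps above can be sketched in NumPy: an HFC novelty function (one common definition, the sum of k · |X_k|² over bins) followed by a deliberately simple peak picker. Assumptions: unpadded frames, half-wave-rectified frame-to-frame differences and a fixed threshold at half the largest increase, which is not Essentia's adaptive Onsets algorithm; the test signal is two synthetic noise bursts:

```python
import numpy as np

def hfc_novelty(x, frame_size=1024, hop_size=512):
    """High Frequency Content per frame: sum over bins k of k * |X_k|^2."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(x) - frame_size) // hop_size
    k = np.arange(frame_size // 2 + 1)
    hfc = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop_size : i * hop_size + frame_size] * window
        hfc[i] = np.sum(k * np.abs(np.fft.rfft(frame)) ** 2)
    return hfc

def pick_peaks(novelty, rel_threshold=0.5):
    """Frames where the rectified increase of the novelty is a local maximum
    above rel_threshold * (largest increase). A fixed rule, not Essentia's."""
    diff = np.maximum(np.diff(novelty, prepend=novelty[0]), 0.0)   # keep increases only
    centre = diff[1:-1]
    is_peak = (centre > diff[:-2]) & (centre >= diff[2:]) & (centre > rel_threshold * diff.max())
    return np.flatnonzero(is_peak) + 1

sr, hop = 44100, 512
rng = np.random.default_rng(0)
x = np.zeros(sr)
for start in (10000, 30000):                       # two synthetic noise bursts
    x[start:start + 2000] = rng.standard_normal(2000)

onset_frames = pick_peaks(hfc_novelty(x, hop_size=hop))
onset_times = onset_frames * hop / sr              # frame start times in seconds
print(onset_times)                                 # one detection within about one hop of each burst
```

Multiplying by hop / sr converts frame indices to seconds; the onsets from `es.Onsets` are already in seconds, hence the `* 44100` when they are drawn on the waveform below.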

In [None]:
help(es.Onsets)

In [None]:
# OnsetDetection can compute several novelty functions, selected via `method`
o_hfc = es.OnsetDetection(method='hfc') # spectral novelty (HFC = High Frequency Content of a spectrum)
o_rms = es.OnsetDetection(method='rms') # energy-based novelty
onsets = es.Onsets() # preprocesses the novelty function(s) and picks onset positions; arguments include alpha and delay (e.g. alpha=0.001, delay=2)

w = es.Windowing(type='hann')
fft = es.FFT()
c2p = es.CartesianToPolar()
pool = essentia.Pool()

# Compute HFC and RMS novelty functions
for frame in es.FrameGenerator(audio, frameSize=1024, hopSize=512):
    magnitude, phase = c2p(fft(w(frame)))
    pool.add('hfc', o_hfc(magnitude, phase))
    pool.add('rms', o_rms(magnitude, phase))

# Peak picking / selecting onsets
onsets_hfc = onsets(essentia.array([pool['hfc']]), [1])
onsets_rms = onsets(essentia.array([pool['rms']]), [1])
 
# Plotting
n_frames = len(pool['hfc'])
frames_position_samples = np.array(range(n_frames)) * 512
fig, ax = plt.subplots(3, 1, sharex=True, sharey=False, figsize=(15, 8))
ax[0].set_title("hfc novelty function")
ax[0].plot(frames_position_samples, pool['hfc'].T)
ax[1].set_title("rms novelty function")
ax[1].plot(frames_position_samples, pool['rms'].T)
ax[2].set_title("onsets and waveform")
ax[2].plot(audio)
for onset in onsets_hfc:
    ax[2].axvline(x=onset*44100, color='magenta')

plt.show()


### Essentia Extractors

- Extractor: all low-, mid- and high-level descriptors
- MusicExtractor: the music extractor as used by https://acousticbrainz.org/
- FreesoundExtractor: the extractor as used by https://freesound.org/
- LowLevelSpectralEqloudExtractor: spectral features that require equal-loudness filtering
- LowLevelSpectralExtractor: spectral features that do not require equal-loudness filtering

In [None]:
# Extractor (not MusicExtractor): all low-, mid- and high-level descriptors of the loaded audio
pool = essentia.Pool()
extractor = es.Extractor()(audio)

#aggrpool = es.PoolAggregator(defaultStats = ['mean', 'stdev', 'min', 'max'])(extractor) # 'stdev', 'min', 'max', 'median'
es.YamlOutput(filename = '../AudioFiles/' + 'features_extractor.json', format='json')(extractor)

# extractor['lowLevel.spectral_centroid']

In [None]:
print(extractor.descriptorNames())  # the Extractor returns its own pool; `pool` above stays empty
help(es.Extractor)

In [None]:
features, features_frames = es.MusicExtractor(lowlevelStats=["mean"], rhythmStats=["mean"], tonalStats=["mean"])(file)

### ML Models

--> https://essentia.upf.edu/models.html#

- Audio Event Recognition
- Music Style Classification
- Music Auto Tagging
- Transfer learning Classifiers
- Feature Extractors
- Pitch Detection
- Source Separation
- Tempo Estimation

In [None]:
from essentia.standard import TensorflowPredictMusiCNN
import json

with open('../Models/msd-musicnn-1.json', 'r') as json_file:
    metadata = json.load(json_file)

metadata

In [None]:
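file = '../AudioFiles/DontStop.mp3'
modelPath = '../Models/msd-musicnn-1.pb'

# MusiCNN expects 16 kHz input; a separate variable keeps the 44.1 kHz `audio` for later sections
audio_16k = es.MonoLoader(filename=file, sampleRate=16000)()
model = TensorflowPredictMusiCNN(graphFilename=modelPath)
activations = model(audio_16k)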

In [None]:
len(activations[0])

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
ax.matshow(activations.T, aspect='auto')

ax.set_yticks(range(len(metadata['classes'])))
ax.set_yticklabels(metadata['classes'])
ax.set_xlabel('patch number')
ax.xaxis.set_ticks_position('bottom')
plt.title('Activations')

plt.show()

In [None]:
modelPath = '../Models/gender-musicnn-msd-2.pb'

audio_16k = es.MonoLoader(filename=file, sampleRate=16000)()
model = TensorflowPredictMusiCNN(graphFilename=modelPath)
activations_gender = model(audio_16k)

plot(activations_gender)
# average the per-patch probabilities over the whole track
for label, probability in zip(['female', 'male'], activations_gender.mean(axis=0)):
    print(f'{label}: {100 * probability:.1f}%')

In [None]:
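from essentia.standard import MonoLoader, TensorflowPredictVGGish

audio_16k = MonoLoader(filename=file, sampleRate=16000)()  # the VGGish model also expects 16 kHz input
activations_dance = TensorflowPredictVGGish(graphFilename='../Models/danceability-vggish.pb')(audio_16k)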

plot(activations_dance)
for label, probability in zip(['danceable', 'not_danceable'], activations_dance.mean(axis=0)):
    print(f'{label}: {100 * probability:.1f}%')

#### Spleeter 
- https://spleeter.online/
- https://essentia.upf.edu/models.html#music-style-classification
- --> separates up to 5 stems (instrument groups)


### Streaming Mode

In [None]:
import essentia.streaming as ess

file = '../AudioFiles/DontStop.mp3'
pool = essentia.Pool()  # fresh pool for the streaming network

# instantiate
loader = ess.MonoLoader(filename = file)
frameCutter = ess.FrameCutter(frameSize = 1024, hopSize = 512)
w = ess.Windowing(type = 'hann')
spec = ess.Spectrum()
mfcc = ess.MFCC()
centroid = ess.Centroid(range=22050)
rolloff = ess.RollOff()

# connect algorithms
loader.audio >> frameCutter.signal
frameCutter.frame >> w.frame >> spec.frame
spec.spectrum >> mfcc.spectrum
spec.spectrum >> centroid.array
spec.spectrum >> rolloff.spectrum

# connect to Pool
mfcc.bands >> None
mfcc.mfcc >> (pool, 'lowlevel.mfcc')
centroid.centroid >> (pool, 'lowlevel.centroid')
rolloff.rollOff >> (pool, 'lowlevel.rolloff')

# run network
essentia.run(loader)
print(pool['lowlevel.rolloff'])

### Essentia real-time pitch tracker

In [None]:
import soundcard as sc
from collections import Counter

hopSize = 128
frameSize = 2048
sampleRate = 44100
buffer_size = frameSize * 4

# instantiate algorithms
mics = sc.all_microphones()
buffer = np.zeros(buffer_size, dtype='float32')
vimp = ess.VectorInput(buffer)
pitch = ess.PredominantPitchMelodia(guessUnvoiced=False,frameSize=frameSize,hopSize=hopSize, sampleRate=sampleRate)
pitchMel = ess.MultiPitchMelodia(frameSize=frameSize,hopSize=hopSize, sampleRate=sampleRate)
filter = ess.PitchFilter(useAbsolutePitchConfidence=False)
pool = essentia.Pool()

# connect algorithms
vimp.data   >> pitch.signal
pitch.pitch >> filter.pitch
pitch.pitchConfidence >> filter.pitchConfidence
pitch.pitch    >> (pool, 'pitch')
pitch.pitchConfidence  >> (pool, 'confidence')
filter.pitchFiltered >> (pool, 'filterPitch')

def process(data):
    buffer[:] = data.flatten()
    essentia.reset(vimp)
    essentia.run(vimp)

    confidence = np.mean(list(pool['confidence']))
    if confidence > 0.0001:
        freqs = np.array(list(pool['filterPitch'])).flatten()
        freqs = freqs[freqs != 0.0]  # drop unvoiced frames (pitch 0)
        print(Counter(np.around(freqs, 1)))  # histogram of pitch values rounded to 0.1 Hz
    pool.clear()  # clear after every block so old frames don't accumulate

# capture microphone input; the device index (here 1) depends on your system, see `mics`
with mics[1].recorder(samplerate=sampleRate) as mic:
    while True:
        process(mic.record(numframes=buffer_size).mean(axis=1))
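The tracker prints a histogram of rounded pitch values in Hz. To read those as notes, a small sketch that maps Hz to the nearest equal-tempered note name, assuming A4 = 440 Hz and MIDI octave numbering (MIDI 60 = C4); the pitch values are made up for illustration:

```python
import numpy as np
from collections import Counter

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def hz_to_note(f_hz, ref_hz=440.0):
    """Nearest equal-tempered note name with octave, e.g. 440.0 -> 'A4'."""
    midi = int(np.round(69 + 12 * np.log2(f_hz / ref_hz)))
    return f'{NOTE_NAMES[midi % 12]}{midi // 12 - 1}'

# made-up pitch track in Hz, 0 = unvoiced (as in the filtered pitch output)
pitch_track = np.array([0.0, 261.5, 262.1, 263.0, 0.0, 440.2, 439.8])
voiced = pitch_track[pitch_track > 0]
print(Counter(hz_to_note(f) for f in voiced))   # Counter({'C4': 3, 'A4': 2})
```

Counting note names instead of raw Hz values merges small pitch fluctuations into one bin.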

### Data Augmentation

In [None]:
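from essentia.standard import NoiseAdder, HighPass

# reload at `sr`, since `audio` may have been replaced by a 16 kHz version in the ML section
audio = es.MonoLoader(filename=file, sampleRate=sr)()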
x = np.arange(len(audio))
y = np.sin(2 * np.pi * 8 *  x / sr)

audio_sine = audio * y
sf.write('../AudioFiles/sine.wav', audio_sine, samplerate=sr)
IPython.display.Audio('../AudioFiles/sine.wav')

In [None]:
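# zero-mean uniform noise in [-1, 1); np.random.rand alone (values in [0, 1)) would add a DC offset
noise = np.random.uniform(-1.0, 1.0, len(audio)).astype('float32')

audio_rand = audio + noise * 0.1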
sf.write('../AudioFiles/1_Random.wav', audio_rand, samplerate=sr)
IPython.display.Audio('../AudioFiles/1_Random.wav')

In [None]:
addNoise = NoiseAdder(level=-25)(audio)  # noise level in dB
sf.write('../AudioFiles/1_NoiseAdder.wav', addNoise, samplerate=sr)
IPython.display.Audio('../AudioFiles/1_NoiseAdder.wav')
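NoiseAdder sets the noise level in dB (relative to full scale, as we understand its `level` parameter). For augmentation it is often more useful to fix the signal-to-noise ratio relative to each clip's own level. A NumPy sketch of that alternative, which is a different technique from NoiseAdder and uses white Gaussian noise:

```python
import numpy as np

def add_noise_snr(x, snr_db, seed=None):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) is approximately snr_db."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.standard_normal(len(x)) * np.sqrt(noise_power)
    return (x + noise).astype(x.dtype), noise

sr = 44100
t = np.arange(sr) / sr
x = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)
noisy, noise = add_noise_snr(x, snr_db=20, seed=0)
print(10 * np.log10(np.mean(x.astype(np.float64) ** 2) / np.mean(noise ** 2)))   # close to 20 dB
```

Because the noise scales with the signal, quiet and loud clips are degraded by the same relative amount.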

In [None]:
hpf = HighPass(cutoffFrequency=600, sampleRate=sr)(audio)
sf.write('../AudioFiles/highpass.wav', hpf, samplerate=sr)
IPython.display.Audio('../AudioFiles/highpass.wav')

# Also available: LowPass, BandPass, ...

In [None]:
# rate < 1 slows down (here: half speed, twice as long), rate > 1 speeds up
audio_slow = librosa.effects.time_stretch(audio, rate=0.5)
sf.write('../AudioFiles/timestretch.wav', audio_slow, samplerate=sr)
IPython.display.Audio('../AudioFiles/timestretch.wav')

In [None]:
audio_pitch = librosa.effects.pitch_shift(audio, sr=sr, n_steps = 4)
sf.write('../AudioFiles/pitch.wav', audio_pitch, samplerate=sr)
IPython.display.Audio('../AudioFiles/pitch.wav')
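Pitch shifting by n semitones multiplies every frequency by 2^(n/12), so n_steps=4 above gives a factor of about 1.26. librosa's pitch_shift combines a phase-vocoder time stretch with resampling so that the duration stays the same. The sketch below does only the naive resampling step (linear interpolation), which raises the pitch and shortens the signal by the same factor:

```python
import numpy as np

def semitones_to_ratio(n_steps: float) -> float:
    """Frequency ratio for a shift of n_steps equal-tempered semitones."""
    return 2.0 ** (n_steps / 12.0)

def naive_resample_shift(x, n_steps):
    """Read the signal faster by the pitch ratio (linear interpolation).
    Unlike librosa's pitch_shift, this also shortens the signal by the same ratio."""
    ratio = semitones_to_ratio(n_steps)
    positions = np.arange(0, len(x) - 1, ratio)
    return np.interp(positions, np.arange(len(x)), x)

sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
y = naive_resample_shift(x, 4)
print(semitones_to_ratio(4))   # about 1.26
print(len(x) / len(y))         # about 1.26: the duration changed as well
```

The extra time-stretch step in librosa exists precisely to undo this change in duration.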

##### PyTorch Data Augmentation

- https://pypi.org/project/torchaudio-augmentations/

### Working with Features: Example

In [28]:
import pandas as pd
from sklearn.manifold import TSNE
from scipy.stats import zscore
import seaborn as sns

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

d_path = '../Sonstiges/HSD_Rec_2500_Dimensions_Genre.csv'  # relative to the notebook folder, like the other paths
df = pd.read_csv(d_path)
df = df.drop(index = [131, 294, 295, 296, 297, 298, 299])

In [None]:
df.head(2)

In [30]:
# T-SNE
non_numeric = ['Unnamed: 0', 'Track']

# drop non-numeric columns and z-score each feature
df_numeric = df.drop(non_numeric, axis=1)
df_z = zscore(df_numeric)
m = TSNE(learning_rate = 50, n_components = 2)  # 2 components for the 2D plot below
tsne_features = m.fit_transform(df_z)

In [None]:
# 2D T-SNE
df['x'] = tsne_features[:,0]
df['y'] = tsne_features[:,1]
ax = sns.scatterplot(x='x', y='y', hue='voice', data=df)  # don't assign to `plt`, which would shadow matplotlib.pyplot
plt.show()

### Further Tools, Libraries, ...

MIR Related:
- https://ismir.net/resources/datasets/
- https://www.audiolabs-erlangen.de/resources/MIR/FMP/C0/C0.html
- https://ismir.net/resources/software-tools/
- https://github.com/jordipons/musicnn-training
- https://music-classification.github.io/tutorial/landing-page.html
- https://developer.spotify.com/documentation/web-api
- https://www.youtube.com/@ValerioVelardoTheSoundofAI
- https://mtg.github.io/essentia-labs/news/tensorflow/2023/02/08/fsdsinet-models/
- https://essentia.upf.edu/api/docs/

Audio Synthesis/Programming Tools:
- https://juce.com/
- https://puredata.info/
- https://supercollider.github.io/
- https://cycling74.com/
- https://csound.com/

Hardware Stuff:
- https://blokas.io/pisound/
- https://bela.io/

Also nice:
- https://openframeworks.cc/ --> also has an Essentia add-on
- https://derivative.ca/

