# Essentia Python tutorial

This is a hands-on tutorial for complete newcomers to Essentia. Essentia combines the computation speed of its core C++ code with the convenience of the Python environment, which makes fast prototyping and scientific research very easy.

To follow this tutorial (and [various Python examples we provide](https://essentia.upf.edu/python_examples.html)) interactively, we provide [Jupyter](http://jupyter.org/) Python notebooks. They are located in the `src/examples/python` folder [in the source code](https://github.com/MTG/essentia/tree/master/src/examples/python). The notebook for this tutorial is `essentia_python_tutorial.ipynb` ([link](https://github.com/MTG/essentia/blob/master/src/examples/python/essentia_python_tutorial.ipynb)). If you are not familiar with Python notebooks, read how to use them [here](http://jupyter.readthedocs.io).


You should have the [NumPy](http://numpy.scipy.org/) package installed for computations with vectors and matrices in Python and [Matplotlib](http://matplotlib.sourceforge.net/) for plotting. Other recommended packages for scientific computing not used in this tutorial but often used with Essentia are [SciPy](http://www.scipy.org/), [scikit-learn](http://scikit-learn.org/), [pandas](https://pandas.pydata.org/), and [seaborn](https://stanford.edu/~mwaskom/software/seaborn/) for visualization.


The big strength of Essentia is its extensive collection of optimized and tested algorithms for audio processing and analysis, all conveniently available within the same library. You can parametrize and re-use these algorithms for specific use-cases and your custom analysis pipelines. For more details on the algorithms, see the [algorithms overview](https://essentia.upf.edu/algorithms_overview.html) and the complete [algorithms reference](https://essentia.upf.edu/algorithms_reference.html).

## Using Essentia in standard mode

There are two modes of using Essentia, *standard* and *streaming*. This section focuses on the standard mode; see the next section for the streaming mode.

We will have a look at some basic functionality:

 - how to load audio
 - how to perform some numerical operations such as FFT
 - how to plot results
 - how to output results to a file
 
### Exploring the Python module

Let’s take a closer look at the Essentia package.

In [None]:
import os
!pip install essentia
try:
  import essentia
except Exception as _e:
  os.system('git clone https://github.com/DJStompZone/essentia --recurse-submodules')
  os.chdir('essentia')
  os.system("pip install .")  # build and install the package from the current checkout
import essentia

if "essentia_python_tutorial.ipynb" not in os.listdir():
  if 'essentia' in os.listdir():
    os.chdir('essentia')
  if os.path.exists("src/examples/python/essentia_python_tutorial.ipynb"):
    os.chdir("src/examples/python")
  else:
    raise FileNotFoundError("Unable to find tutorial path (src/examples/python/essentia_python_tutorial.ipynb)")

print(dir(essentia))
# there are two operating modes in essentia which (mostly) have the same algorithms
# they are accessible via two submodules:
import essentia.standard
import essentia.streaming

# let's have a look at what is in there
print(dir(essentia.standard))

# you can also do it by using autocompletion in Jupyter/IPython, typing "essentia.standard." and pressing Tab
    

This list contains all Essentia algorithms available in standard mode. You can get inline help for any algorithm you are interested in with the `help` command (in Jupyter/IPython you can also type `essentia.standard.MFCC?`). You can also use our online [algorithm reference](http://essentia.upf.edu/documentation/algorithms_reference.html).

In [None]:
help(essentia.standard.MFCC)

### Instantiating our first algorithm, loading some audio

Before you can use algorithms in Essentia, you first need to instantiate (create) them. When doing so, you can give them parameters which they may need to work properly, such as the filename of the audio file in the case of an audio loader.

Once you have instantiated an algorithm, nothing has happened yet, but your algorithm is ready to be used and works like a function, that is, *you have to call it to make stuff happen* (technically, it is a [function object](http://en.wikipedia.org/wiki/Function_object)).
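
To make the "instantiate first, call later" idea concrete, here is a minimal plain-Python sketch of a function object. The `Scale` class and its `factor` parameter are hypothetical and are not part of Essentia; they only mirror the pattern of configuring at creation time and computing on call.

```python
class Scale:
    """Toy function object: configured once, then called like a function."""

    def __init__(self, factor=1.0):
        # Parameters are fixed at instantiation time; no processing happens here.
        self.factor = factor

    def __call__(self, samples):
        # The actual work happens only when the instance is called.
        return [self.factor * s for s in samples]


halve = Scale(factor=0.5)         # instantiate: nothing is computed yet
result = halve([2.0, 4.0, 6.0])   # call: the computation runs now
print(result)
```

Essentia algorithms are used in the same two steps, as the loader example below shows.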

Essentia has a selection of audio loaders:

- [AudioLoader](http://essentia.upf.edu/documentation/reference/std_AudioLoader.html): the most generic one, returns the audio samples, sampling rate and number of channels, and some other related information
- [MonoLoader](http://essentia.upf.edu/documentation/reference/std_MonoLoader.html): returns audio, down-mixed and resampled to a given sampling rate
- [EasyLoader](http://essentia.upf.edu/documentation/reference/std_EasyLoader.html): a MonoLoader which can optionally trim start/end slices and rescale according to a ReplayGain value
- [EqloudLoader](http://essentia.upf.edu/documentation/reference/std_EqloudLoader.html): an EasyLoader that applies an equal-loudness filtering to the audio


In [None]:
# we start by instantiating the audio loader:
loader = essentia.standard.MonoLoader(filename='../../../test/audio/recorded/dubstep.wav')

# and then we actually perform the loading:
audio = loader()

In [None]:
# This is what the audio we want to process sounds like
import IPython
IPython.display.Audio('../../../test/audio/recorded/dubstep.wav')

By default, MonoLoader outputs audio downmixed to mono at a 44100 Hz sample rate. To make sure that this actually worked, let's plot a 1-second slice of audio, from t = 1 sec to t = 2 sec:

In [None]:
# pylab contains the plot() function, as well as figure, etc... (same names as Matlab)
from pylab import plot, show, figure, imshow
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (15, 6) # set plot sizes to something larger than default

plot(audio[1*44100:2*44100])
plt.title("This is how the 2nd second of this audio looks:")
show()
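
The slice indices used above are just times multiplied by the sample rate. As a small sketch of that arithmetic (the helper names `seconds_to_samples` and `time_slice` are ours, not Essentia functions), assuming the default 44100 Hz rate:

```python
SAMPLE_RATE = 44100  # MonoLoader's default output rate, as stated above


def seconds_to_samples(seconds, sample_rate=SAMPLE_RATE):
    """Convert a time in seconds to the nearest sample index."""
    return int(round(seconds * sample_rate))


def time_slice(start_s, end_s, sample_rate=SAMPLE_RATE):
    """Build a slice object selecting the samples in [start_s, end_s)."""
    return slice(seconds_to_samples(start_s, sample_rate),
                 seconds_to_samples(end_s, sample_rate))


second_two = time_slice(1, 2)
print(second_two)
# audio[second_two] would select the same samples as audio[1*44100:2*44100]
```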

### Computing spectrum, mel bands energies, and MFCCs
So let's say that we want to compute spectral energy in mel bands and the associated [MFCCs](http://en.wikipedia.org/wiki/Mel-frequency_cepstral_coefficient) for the frames in our audio.

We will need the following algorithms: [Windowing](https://essentia.upf.edu/reference/std_Windowing.html), [Spectrum](https://essentia.upf.edu/reference/std_Spectrum.html), and [MFCC](https://essentia.upf.edu/reference/std_MFCC.html). For windowing, we'll use a [Hann](https://en.wikipedia.org/wiki/Hann_function) window.

In [None]:
from essentia.standard import *
w = Windowing(type = 'hann')
spectrum = Spectrum()  # FFT() would return the complex FFT, here we just want the magnitude spectrum
mfcc = MFCC()
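
For readers who have not met windowing before, here is a rough plain-Python sketch of a textbook symmetric Hann window and how it tapers a frame. It is only an illustration: the `hann` and `apply_window` helpers are hypothetical, and Essentia's `Windowing` has its own parameters (check its reference page), so its output values may differ from this sketch.

```python
import math


def hann(n_points):
    """Textbook symmetric Hann window of length n_points."""
    if n_points == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame, window):
    """Multiply a frame by a window, sample by sample."""
    return [x * w for x, w in zip(frame, window)]


window = hann(5)
tapered = apply_window([1.0] * 5, window)
print(tapered)  # the frame edges are pulled towards zero
```

Tapering the frame edges this way reduces the spectral leakage that a hard cut would cause.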

Once algorithms have been instantiated, they work like normal functions. Note that the MFCC algorithm returns two values: the band energies and the coefficients. Let's compute and plot the spectrum, mel band energies, and MFCCs for a frame of audio:

In [None]:
frame = audio[6*44100 : 6*44100 + 1024]
spec = spectrum(w(frame))
mfcc_bands, mfcc_coeffs = mfcc(spec)

plot(spec)
plt.title("The spectrum of a frame:")
show()

plot(mfcc_bands)
plt.title("Mel band spectral energies of a frame:")
show()

plot(mfcc_coeffs)
plt.title("First 13 MFCCs of a frame:")
show()
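
When reading the spectrum plot, it helps to know which frequency each bin stands for. The sketch below assumes an even frame size and a one-sided magnitude spectrum with `frame_size / 2 + 1` values, which is the usual convention for real-valued input; the helper name `bin_frequencies` is ours.

```python
def bin_frequencies(frame_size, sample_rate):
    """Center frequency in Hz of each bin of a one-sided spectrum."""
    n_bins = frame_size // 2 + 1  # DC up to and including Nyquist
    return [k * sample_rate / frame_size for k in range(n_bins)]


freqs = bin_frequencies(frame_size=1024, sample_rate=44100)
print(len(freqs))   # number of spectrum bins
print(freqs[1])     # spacing between neighbouring bins, in Hz
print(freqs[-1])    # Nyquist frequency
```

You can compare `len(freqs)` with `len(spec)` from the cell above to check that the assumption holds for your setup.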

In the case of mel band energies, sometimes you may want to apply log normalization, which can be done using [UnaryOperator](http://essentia.upf.edu/documentation/reference/std_UnaryOperator.html). Using this algorithm we can do different types of normalization on vectors.

In [None]:
logNorm = UnaryOperator(type='log')
plot(logNorm(mfcc_bands))
plt.title("Log-normalized mel band spectral energies of a frame:")
show()
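
Log normalization is useful because band energies often span several orders of magnitude, so the quiet bands vanish on a linear scale. Here is a minimal plain-Python sketch of the idea. It assumes a natural logarithm, and the `floor` argument is our own addition to avoid `log(0)`; neither is claimed to match what `UnaryOperator` does internally.

```python
import math


def log_compress(values, floor=1e-10):
    """Natural log of each value, clamped below at `floor` to avoid log(0)."""
    return [math.log(max(v, floor)) for v in values]


bands = [1.0, 0.1, 0.001, 0.0]
compressed = log_compress(bands)
print(compressed)  # equal ratios between energies become equal steps
```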

### Computations on frames
Now let's compute the mel band energies and MFCCs in all frames.

A typical Matlab-like way to do this is to slice the frames manually (the first frame starts at time 0, that is, with the first audio sample):

In [None]:
mfccs = []
melbands = []
frameSize = 1024
hopSize = 512

for fstart in range(0, len(audio) - frameSize + 1, hopSize):  # +1 keeps a final frame that fits exactly
    frame = audio[fstart:fstart+frameSize]
    mfcc_bands, mfcc_coeffs = mfcc(spectrum(w(frame)))
    mfccs.append(mfcc_coeffs)
    melbands.append(mfcc_bands)
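
The index arithmetic behind manual slicing is easy to get wrong by one, so here is a small sketch that lists the start positions of all complete frames and counts them. The helper names are hypothetical, and only complete frames are considered (no zero-padding), so the count can differ from what a frame cutter that pads the signal would produce.

```python
def frame_starts(n_samples, frame_size, hop_size):
    """Start indices of all complete frames, first frame at sample 0."""
    # The last valid start is n_samples - frame_size, hence the +1 in range().
    return list(range(0, n_samples - frame_size + 1, hop_size))


def count_frames(n_samples, frame_size, hop_size):
    """Number of complete frames that fit without padding."""
    if n_samples < frame_size:
        return 0
    return (n_samples - frame_size) // hop_size + 1


starts = frame_starts(n_samples=4096, frame_size=1024, hop_size=512)
print(starts)
print(count_frames(4096, 1024, 512))
```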

This is OK, but there is a much nicer way of computing frames in Essentia by using *FrameGenerator*, the [FrameCutter](http://essentia.upf.edu/documentation/reference/std_FrameCutter.html) algorithm wrapped into a Python generator:

In [None]:
mfccs = []
melbands = []
melbands_log = []

for frame in FrameGenerator(audio, frameSize=1024, hopSize=512, startFromZero=True):
    mfcc_bands, mfcc_coeffs = mfcc(spectrum(w(frame)))
    mfccs.append(mfcc_coeffs)
    melbands.append(mfcc_bands)
    melbands_log.append(logNorm(mfcc_bands))

# transpose to have it in a better shape
# we need to convert the list to an essentia.array first (== numpy.array of floats)
mfccs = essentia.array(mfccs).T
melbands = essentia.array(melbands).T
melbands_log = essentia.array(melbands_log).T

# and plot
imshow(melbands[:,:], aspect = 'auto', origin='lower', interpolation='none')
plt.title("Mel band spectral energies in frames")
show()

imshow(melbands_log[:,:], aspect = 'auto', origin='lower', interpolation='none')
plt.title("Log-normalized mel band spectral energies in frames")
show()

imshow(mfccs[1:,:], aspect='auto', origin='lower', interpolation='none')
plt.title("MFCCs in frames")
show()

You can configure the frame size and hop size of the frame generator, and whether the first frame starts at time zero or is centered on it. For the complete list of available parameters, see the documentation for [FrameCutter](http://essentia.upf.edu/documentation/reference/std_FrameCutter.html).
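
To picture the two placements of the first frame, here is a toy sketch. The `first_frame` helper is hypothetical and only illustrates the idea of zero-padding a centered frame; it does not try to reproduce how FrameCutter handles the end of the signal or any of its other parameters.

```python
def first_frame(signal, frame_size, start_from_zero):
    """Return the first frame under either placement convention."""
    if start_from_zero:
        start = 0                    # frame begins at the first sample
    else:
        start = -(frame_size // 2)   # frame is centered on the first sample
    frame = []
    for i in range(start, start + frame_size):
        # Positions outside the signal are filled with zeros.
        frame.append(signal[i] if 0 <= i < len(signal) else 0.0)
    return frame


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
aligned = first_frame(signal, frame_size=4, start_from_zero=True)
centered = first_frame(signal, frame_size=4, start_from_zero=False)
print(aligned)
print(centered)
```

With a centered first frame, frame `i` describes the audio around time `i * hopSize` rather than the audio starting there, which matters when you align descriptors with timestamps.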

Note that when plotting MFCCs, we ignored the first coefficient to disregard the power of the signal and only plot its spectral shape.


### Storing results to Pool
A **Pool** is a container similar to a C++ map or Python dict which can contain any type of values (easy in Python, not as much in C++...). Values are stored in there using a name which represents the full path to these values with dot (`.`) characters used as separators. You can think of it as a directory tree, or as namespace(s) plus a local name.

Examples of valid names are: ``"bpm"``, ``"lowlevel.mfcc"``, ``"highlevel.genre.rock.probability"``, etc...
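
If it helps to have a mental model, the sketch below mimics the naming scheme with a plain dictionary. `TinyPool` and its methods are a toy stand-in written for this explanation; the real `essentia.Pool` has its own API and returns arrays rather than lists.

```python
class TinyPool:
    """Toy stand-in for a pool: dotted names map to lists of values."""

    def __init__(self):
        self._values = {}

    def add(self, name, value):
        # Each call appends one value (for example, one frame) under `name`.
        self._values.setdefault(name, []).append(value)

    def __getitem__(self, name):
        return self._values[name]

    def descriptor_names(self, namespace=""):
        # List names, optionally restricted to one namespace prefix.
        prefix = namespace + "." if namespace else ""
        return sorted(n for n in self._values if n.startswith(prefix))


toy = TinyPool()
toy.add("lowlevel.mfcc", [1.0, 2.0])
toy.add("lowlevel.mfcc", [3.0, 4.0])
toy.add("rhythm.bpm", 120.0)
print(toy.descriptor_names("lowlevel"))
print(toy["lowlevel.mfcc"])
```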

Let's redo the previous computations using a pool. The pool has the nice advantage that the data you get out of it is already in an ``essentia.array`` format (which is equal to numpy.array of floats), so you can call transpose (``.T``) directly on it.

In [None]:
pool = essentia.Pool()

for frame in FrameGenerator(audio, frameSize = 1024, hopSize = 512, startFromZero=True):
    mfcc_bands, mfcc_coeffs = mfcc(spectrum(w(frame)))
    pool.add('lowlevel.mfcc', mfcc_coeffs)
    pool.add('lowlevel.mfcc_bands', mfcc_bands)
    pool.add('lowlevel.mfcc_bands_log', logNorm(mfcc_bands))

imshow(pool['lowlevel.mfcc_bands'].T, aspect = 'auto', origin='lower', interpolation='none')
plt.title("Mel band spectral energies in frames")
show()

imshow(pool['lowlevel.mfcc_bands_log'].T, aspect = 'auto', origin='lower', interpolation='none')
plt.title("Log-normalized mel band spectral energies in frames")
show()

imshow(pool['lowlevel.mfcc'].T[1:,:], aspect='auto', origin='lower', interpolation='none')
plt.title("MFCCs in frames")
show()

### Aggregation and file output
As we are using Python, we could use its own facilities for writing data to a file, but for the sake of this tutorial let's do it using the [YamlOutput](http://essentia.upf.edu/documentation/reference/std_YamlOutput.html) algorithm, which writes a pool to a file in the [YAML](http://yaml.org/) or [JSON](http://en.wikipedia.org/wiki/JSON) format.

In [None]:
output = YamlOutput(filename = 'mfcc.sig.json', format="json") # format="json" selects JSON output
output(pool)

# or as a one-liner, this time writing YAML:
YamlOutput(filename = 'mfcc.sig', format="yaml")(pool)

This can take a while as we actually write the MFCCs for all the frames, which
can be quite heavy depending on the duration of your audio file.
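
Dotted descriptor names map naturally onto nested documents. The sketch below shows that mapping with the standard `json` module. It assumes that each dot introduces one level of nesting; the exact layout that YamlOutput writes, and any metadata it adds, is not reproduced here, so treat this purely as an illustration. The `nest` helper is hypothetical.

```python
import json


def nest(flat):
    """Turn {'a.b.c': value} into {'a': {'b': {'c': value}}}."""
    tree = {}
    for name, value in flat.items():
        node = tree
        *namespaces, leaf = name.split(".")
        for part in namespaces:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


flat = {"lowlevel.mfcc": [[1.0, 2.0], [3.0, 4.0]], "rhythm.bpm": 120.0}
tree = nest(flat)
text = json.dumps(tree, indent=2)
print(text)
```

Opening the files written above in a text editor is the quickest way to see the actual structure.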

Now let's assume we do not want all the frames but only the mean and standard deviation of those frames. We can do this by running the [PoolAggregator](http://essentia.upf.edu/documentation/reference/std_PoolAggregator.html) algorithm on our pool of frame values to get a new pool with the aggregated descriptors (check the documentation for this algorithm to get an idea of the other statistics it can compute):

In [None]:
# compute mean and standard deviation of the frames
aggrPool = PoolAggregator(defaultStats = [ 'mean', 'stdev' ])(pool)

print('Original pool descriptor names:')
print(pool.descriptorNames())
print('')
print('Aggregated pool descriptor names:')
print(aggrPool.descriptorNames())
print('')

# and output those results in a file
YamlOutput(filename = 'mfccaggr.sig')(aggrPool)

This is what the file with aggregated descriptors looks like:

In [None]:
!cat mfccaggr.sig
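
Conceptually, the aggregation step collapses a list of per-frame vectors into one summary vector per statistic. Here is a rough plain-Python sketch of that idea. The `aggregate` helper is hypothetical, the `<name>.<stat>` naming is assumed for illustration, and we use the population standard deviation (`pstdev`) without claiming that this is the estimator PoolAggregator uses.

```python
from statistics import mean, pstdev


def aggregate(frames_by_name, stats=("mean", "stdev")):
    """Collapse per-frame vectors into per-dimension summary statistics."""
    functions = {"mean": mean, "stdev": pstdev}  # pstdev is an assumption
    out = {}
    for name, frames in frames_by_name.items():
        columns = list(zip(*frames))  # one tuple per vector dimension
        for stat in stats:
            out[name + "." + stat] = [functions[stat](col) for col in columns]
    return out


frames = {"lowlevel.mfcc": [[1.0, 10.0], [3.0, 10.0]]}
summary = aggregate(frames)
print(sorted(summary))
print(summary["lowlevel.mfcc.mean"])
```

The aggregated result has a fixed size regardless of how many frames the audio contains, which is what makes it practical to store or to compare across files.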

### Summary and more examples
There is not much more to know about using Essentia in standard mode in Python. The basics are:

* instantiate and configure algorithms
* use them to compute some results
* and that's pretty much it!

You can find various Python examples in the `src/examples/python` folder [in the source code](https://github.com/MTG/essentia/tree/master/src/examples/python), for example:

* computing spectral centroid ([example_spectral_spectralcentroid.py](https://github.com/MTG/essentia/blob/master/src/examples/python/example_spectral_spectralcentroid.py))
* onset detection ([example_rhythm_onsetdetection.py](https://github.com/MTG/essentia/blob/master/src/examples/python/example_rhythm_onsetdetection.py))
* predominant melody detection ([example_pitch_predominantmelody.py](https://github.com/MTG/essentia/blob/master/src/examples/python/example_pitch_predominantmelody.py) and [example_pitch_predominantmelody_by_steps.py](https://github.com/MTG/essentia/blob/master/src/examples/python/example_pitch_predominantmelody_by_steps.py))



## Using Essentia in streaming mode

In this section, we will consider how to use Essentia in streaming mode.

The main difference between standard and streaming is that the standard mode is imperative while the streaming mode is declarative. That means that in standard mode, you tell the computer exactly what to do, whereas in streaming mode, you "declare" what needs to be done and let the computer work out how to do it. One big advantage of the streaming mode is that memory consumption is greatly reduced, as you don't need to load the entire audio into memory. Streaming mode can also reduce the amount of code, which may be very significant for larger projects. Let's have a look at it.
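
Before touching Essentia itself, here is a toy plain-Python sketch of the declarative idea: connections are declared first with `>>`, and data only flows once the network is run. The `Node` class and the `run` function are hypothetical and much simpler than Essentia's real scheduler, which connects named inputs and outputs and processes audio in chunks.

```python
class Node:
    """Toy processing node: wraps a function and remembers its consumers."""

    def __init__(self, func):
        self.func = func
        self.consumers = []

    def __rshift__(self, other):
        # `a >> b` only records the connection; nothing is computed yet.
        self.consumers.append(other)
        return other

    def push(self, value):
        result = self.func(value)
        for consumer in self.consumers:
            consumer.push(result)


def run(source, values):
    """Feed values through the network one at a time."""
    for value in values:
        source.push(value)


collected = []
double = Node(lambda x: 2 * x)
increment = Node(lambda x: x + 1)
sink = Node(collected.append)

double >> increment >> sink   # declare the graph
run(double, [1, 2, 3])        # then let the runner drive it
print(collected)
```

Because values are pushed through one at a time, the whole input never has to sit in memory at once, which is the same property that keeps memory consumption low in streaming mode.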



As usual, first import the essentia module:

In [None]:
import essentia
from essentia.streaming import *

Instantiate our algorithms:

In [None]:
loader = MonoLoader(filename = '../../../test/audio/recorded/dubstep.wav')
frameCutter = FrameCutter(frameSize = 1024, hopSize = 512)
w = Windowing(type = 'hann')
spec = Spectrum()
mfcc = MFCC()

In streaming, instead of calling algorithms like functions, we need to connect their inputs and outputs. This is done using the `>>` operator.

For example, the graph we want to connect looks like this:

```
----------      ------------      -----------      --------------      --------------
MonoLoader      FrameCutter       Windowing        Spectrum            MFCC
     audio ---> signal frame ---> frame frame ---> frame spectrum ---> spectrum bands ---> ???
                                                                                mfcc  ---> ???
----------      ------------      -----------      --------------      --------------
```


In [None]:
loader.audio >> frameCutter.signal
frameCutter.frame >> w.frame >> spec.frame
spec.spectrum >> mfcc.spectrum

When building a network, all inputs and outputs need to be connected, no matter what; otherwise the network cannot be started and we get an error message:

In [None]:
try:
  essentia.run(loader)
except Exception as expected:
  print(expected)

In our case, the outputs of the MFCC algorithm were not connected anywhere. Let's store *mfcc* values in the pool and ignore *bands* values.

```
----------      ------------      -----------      --------------      --------------
MonoLoader      FrameCutter       Windowing        Spectrum            MFCC
     audio ---> signal frame ---> frame frame ---> frame spectrum ---> spectrum bands ---> NOWHERE
                                                                                mfcc  ---> Pool: lowlevel.mfcc
----------      ------------      -----------      --------------      --------------
```

In [None]:
pool = essentia.Pool()

mfcc.bands >> None
mfcc.mfcc >> (pool, 'lowlevel.mfcc')

essentia.run(loader)

print('Pool contains %d frames of MFCCs' % len(pool['lowlevel.mfcc']))

### Let's try writing directly to a text file instead of a pool and yaml files

We first need to disconnect the old connection to the pool to avoid putting the same data in there again.

In [None]:
mfcc.mfcc.disconnect((pool, 'lowlevel.mfcc'))

We create a [FileOutput](https://essentia.upf.edu/reference/streaming_FileOutput.html) and connect it. It is a special connection: we do not name an input on the `FileOutput` side, because it can take any type of input (the other algorithms will complain if you try to connect an output to an input of a different type).

In [None]:
fileout = FileOutput(filename = 'mfccframes.txt')
mfcc.mfcc >> fileout

Reset the network, otherwise the loader in particular will not do anything useful, and then rerun it:

In [None]:
essentia.reset(loader)
essentia.run(loader)

This is the resulting file (the first 10 lines correspond to the first 10 frames):

In [None]:
!head -n 10 mfccframes.txt

### Examples

* extracting key by steps ([example_key_by_steps_streaming.py](https://github.com/MTG/essentia/blob/master/src/examples/python/example_tonal_key_by_steps_streaming.py))