
LOLA header logo

LOLA


Play with the project here


Welcome to my final project, made in the context of CORE School.

LOLA is a tool for classifying audio clips into different flamenco styles (palos).

Table of contents

  • Project and motivation 💃
  • Dataset used 📚
  • Models used 🪐
  • Local deployment 🔨🔧
  • License

Project and motivation 💃

Flamenco is an art composed of guitar, singing, and dancing. Its origins date back to the 18th century.

Flamenco is divided into styles, called palos. Each has its own guitar style, rhythm, and lyrics.

Let me illustrate this division with an image.

Flamenco styles division shaped like a tree

Even this image does not contain all flamenco styles. If you take a look at this one:

Flamenco styles division shaped as a mind map

Some styles appear in only one of them. All of this is just to illustrate how many different styles there are in flamenco.

Even for experienced listeners, classifying a specific audio clip is hard. That's why I decided to build an expert system to try to solve this task.

I'm from Andalusia, where flamenco comes from, so contributing in any way to that culture is a major motivation for me.

Dataset used 📚

The dataset is self-built. I manually selected, split, and tagged audio files from three different palos:

  • Bulerías
  • Alegrías
  • Sevillanas

The full dataset is composed of 577 alegrías clips, 589 bulerías, and 518 sevillanas.

I've built the largest flamenco dataset on the internet. You can download it from here.

The dataset structure is:

```
.
├── alegrias
│   ├── 0.mp3
│   ├── 1.mp3
│   ├── ...
│   └── n.mp3
├── bulerias
│   ├── 0.mp3
│   ├── 1.mp3
│   ├── ...
│   └── n.mp3
└── sevillanas
    ├── 0.mp3
    ├── 1.mp3
    ├── ...
    └── n.mp3
```
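
For convenience, here is a minimal sketch of walking that layout using only the standard library; the `dataset` root directory and the `iter_dataset` helper are hypothetical names of mine, not part of the project:

```python
# Hypothetical helper for the layout above: one directory per palo,
# numbered .mp3 files inside. The "dataset" root is an assumed name.
from pathlib import Path

def iter_dataset(root):
    """Yield (palo_label, mp3_path) for every clip in the dataset."""
    for palo_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for mp3 in sorted(palo_dir.glob("*.mp3")):
            yield palo_dir.name, mp3

for label, path in iter_dataset("dataset"):
    print(label, path.name)  # e.g. "alegrias 0.mp3"
```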

There is another flamenco dataset available, called TONAS, but it contains only 72 a cappella audio samples.

Models used 🪐

I've used two different approaches in this project. The first is pure deep learning, while the second is a hybrid between a deep learning model and a classical machine learning one.

Deep learning 🌌

Deep learning model

This approach takes an audio file as input. The audio is split into overlapping segments of 10-15 seconds each.

For each segment, a tempogram, a spectrogram, and a normalized tempogram are extracted. Each of those images is fed into a convolutional neural network, which outputs a prediction for each one.

Finally, the label with the most occurrences among all predictions is taken as the final one.
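
To make the flow concrete, here is a minimal sketch of that pipeline. It assumes librosa for the feature extraction; `cnn_predict` is a stub standing in for the trained network, and the segment lengths and normalization are illustrative assumptions, not the project's exact values:

```python
# Sketch of the deep-learning pipeline: segment -> features -> CNN -> majority vote.
from collections import Counter

import librosa
import numpy as np

def cnn_predict(image):
    """Placeholder for the trained CNN; a real model classifies the feature image."""
    return "bulerias"

def classify_clip(path, segment_s=12.0, hop_s=6.0, sr=22050):
    y, _ = librosa.load(path, sr=sr)
    seg, hop = int(segment_s * sr), int(hop_s * sr)
    predictions = []
    for start in range(0, max(len(y) - seg, 0) + 1, hop):
        segment = y[start:start + seg]
        # The three image-like features described above
        spec = librosa.feature.melspectrogram(y=segment, sr=sr)
        tempo = librosa.feature.tempogram(y=segment, sr=sr)
        tempo_norm = tempo / (np.abs(tempo).max() + 1e-9)  # assumed normalization
        for feature in (spec, tempo, tempo_norm):
            predictions.append(cnn_predict(feature))
    # Majority vote: the most frequent label across all predictions wins
    return Counter(predictions).most_common(1)[0][0]
```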

Machine learning hybrid 👽

Machine learning hybrid model

The second approach is a hybrid process involving a neural network called VGGish, which takes an audio clip, splits it into samples of 0.9 seconds, and, for each sample, generates a 128-dimensional vector of numerical features.

Those vectors are fed into a random forest, which generates a prediction.
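
As an illustration only, this is roughly what the second stage could look like with scikit-learn. The embeddings below are random stand-ins for real VGGish output, and the forest hyperparameters and per-clip majority vote are my own assumptions:

```python
# Illustrative only: per-sample VGGish embeddings -> random forest.
# Real embeddings would come from a VGGish implementation (not shown here).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X_train = rng.random((1000, 128))  # one 128-dim vector per 0.9 s sample
y_train = rng.choice(["alegrias", "bulerias", "sevillanas"], size=1000)

forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X_train, y_train)

# Classify a new clip: predict each of its samples, then majority-vote the labels
clip_embeddings = rng.random((12, 128))
labels, counts = np.unique(forest.predict(clip_embeddings), return_counts=True)
print(labels[counts.argmax()])
```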

Local deployment 🔨🔧

Docker 🐳

To deploy this project using Docker, you only have to clone the repository and bring it up with docker-compose:

```bash
git clone https://github.com/alesanmed/lola.git
cd lola
docker-compose up
```

From source ⛲

Open in Visual Studio Code

If you want to take this path, I recommend:

  1. Clone the repository
  2. Open it in your favorite IDE
  3. Follow the steps in each part's README

License

Apache 2.0