<a href="https://colab.research.google.com/gist/dadoonet/60e000d41ba0719a68d14d49bbe5345e/elastic_music_search.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Humming search
Code from the blog post [Searching by Music: Leveraging Vector Search for Music Information Retrieval](https://www.elastic.co/fr/blog/searching-by-music-leveraging-vector-search-audio-information-retrieval)

**Author:** Alex Salgado

**Modified by:** David Pilato

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dadoonet/music-search/blob/main/elastic_music_search.ipynb)

## Settings

In [64]:
github_url = "https://github.com/dadoonet/music-search.git"
index_name = "my-audio-index"
audio_path = "/content/music-search/dataset"

## Setup

### Install (git clone and dependencies)

In [65]:
!git clone $github_url

fatal: destination path 'music-search' already exists and is not an empty directory.


In [66]:
!pip install elasticsearch



In [67]:
!pip install -qU panns-inference librosa

In [68]:
# Import necessary modules for audio display from IPython
from IPython.display import Audio, display

### Load the ML model

In [69]:
# Import the PANNs audio tagging model
from panns_inference import AudioTagging

# Load the default checkpoint. device='cuda' requests the GPU;
# the library falls back to the CPU when no GPU is available.
model = AudioTagging(checkpoint_path=None, device='cuda')

Checkpoint path: /root/panns_data/Cnn14_mAP=0.431.pth
Using CPU.
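
The output above shows the model running on the CPU even though `'cuda'` was requested. If you prefer to make that choice explicit, a small helper can pick the device before the model is loaded. This is a minimal sketch: `pick_device` is a hypothetical helper name, and it assumes PyTorch (a dependency of `panns-inference`) is the right place to ask about GPU availability.

```python
def pick_device(preferred="cuda"):
    """Return "cuda" only when it is requested and a GPU is usable, else "cpu".

    Hypothetical helper, not part of panns_inference.
    """
    if preferred != "cuda":
        return "cpu"
    try:
        import torch  # panns-inference depends on PyTorch
    except ImportError:
        # Without PyTorch there is no GPU backend to use
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

The result can then be passed to the loader, for example `AudioTagging(checkpoint_path=None, device=pick_device())`.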


### Test connection with Elasticsearch

In [70]:
# Connect to elasticsearch
from elasticsearch import Elasticsearch
import getpass

es_cloud_id = getpass.getpass('Enter Elastic Cloud ID:  ')
es_user = getpass.getpass('Enter cluster username:  ')
es_pass = getpass.getpass('Enter cluster password:  ')
es = Elasticsearch(cloud_id=es_cloud_id,
                   basic_auth=(es_user, es_pass)
                   )
es.info() # should return cluster info

Enter Elastic Cloud ID:  ··········
Enter cluster username:  ··········
Enter cluster password:  ··········


ObjectApiResponse({'name': 'instance-0000000000', 'cluster_name': 'fef139f8c01d46ed8832ed084ba4d27f', 'cluster_uuid': 'lvWGFWRlTvmZ4aTPuAnf4A', 'version': {'number': '8.9.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '8aa461beb06aa0417a231c345a1b8c38fb498a0d', 'build_date': '2023-07-19T14:43:58.555259655Z', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'})

### Create Elasticsearch index

In [71]:
from elasticsearch import Elasticsearch

# Specify index configuration
index_config = {
  "mappings": {
    "_source": {
          "excludes": ["audio-embedding"]
      },
    "properties": {
      "audio-embedding": {
        "type": "dense_vector",
        "dims": 2048,
        "index": True,
        "similarity": "cosine"
      },
      "path": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "timestamp": {
        "type": "date"
      },
      "title": {
        "type": "text"
      },
      "genre": {
        "type": "text"
      }
    }
  }
}

# Delete existing index
if es.indices.exists(index=index_name):
    index_deletion = es.indices.delete(index=index_name)
    print("index deleted: ", index_deletion)

# Create index (pass the mappings directly; the `body` parameter is deprecated in the 8.x client)
index_creation = es.indices.create(index=index_name, mappings=index_config["mappings"])
print("index created: ", index_creation)

index deleted:  {'acknowledged': True}


index created:  {'acknowledged': True, 'shards_acknowledged': True, 'index': 'my-audio-index'}
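
The mapping declares `dims: 2048` for the `audio-embedding` field, and Elasticsearch rejects documents whose vector has a different length. A quick local check before indexing can surface that mismatch earlier. The sketch below is an illustration only: `expected_dims` and `check_embedding` are hypothetical helpers, and the config shown is a trimmed copy of the mapping above.

```python
# Trimmed copy of the index configuration, keeping only the vector field
index_config = {
    "mappings": {
        "properties": {
            "audio-embedding": {
                "type": "dense_vector",
                "dims": 2048,
                "index": True,
                "similarity": "cosine",
            }
        }
    }
}


def expected_dims(config, field):
    """Read the declared number of dimensions for a dense_vector field."""
    return config["mappings"]["properties"][field]["dims"]


def check_embedding(vector, config, field="audio-embedding"):
    """Raise ValueError if the vector length differs from the mapping's dims."""
    dims = expected_dims(config, field)
    if len(vector) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(vector)}")
    return True


print(check_embedding([0.0] * 2048, index_config))
```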


## Code

### List audio files from music dir

In [72]:
import os

def list_audio_files(directory):
    # The list to store the names of .wav files
    audio_files = []

    # Check if the path exists
    if os.path.exists(directory):
        # Walk the directory tree (subdirectories included)
        for root, dirs, files in os.walk(directory):
            for file in files:
                # Keep only .wav files
                if file.endswith('.wav'):
                    # Print the file name without its extension
                    filename = os.path.splitext(file)[0]
                    print(filename)

                    # Add the file name (with extension, without directory) to the list
                    audio_files.append(file)
    else:
        print(f"The directory '{directory}' does not exist.")

    # Return the list of .wav file names
    return audio_files
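
`list_audio_files` returns bare file names, which works as long as every file sits directly in `audio_path`. If the dataset ever gains subdirectories, returning full paths avoids rebuilding them later. A minimal sketch using `pathlib`, where `list_wav_paths` is a hypothetical alternative and not the function used in the rest of this notebook:

```python
from pathlib import Path


def list_wav_paths(directory):
    """Return the sorted full paths of all .wav files under directory.

    Hypothetical variant of list_audio_files that keeps the directory part.
    Returns an empty list when the directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    # rglob searches the directory and all of its subdirectories
    return sorted(str(p) for p in root.rglob("*.wav"))
```

With this variant, each returned path could be passed to the embedding function directly.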

### Generate embeddings from a Wav file and normalize

In [73]:
import numpy as np
import librosa

# L2-normalize a vector: scale it to unit length (magnitude 1) while keeping its direction.
def normalize(v):
    # np.linalg.norm computes the Euclidean norm (length) of the vector.
    norm = np.linalg.norm(v)
    # A zero vector has no direction and cannot be scaled; return it unchanged.
    if norm == 0:
        return v

    # Return the unit-length vector.
    return v / norm

# Get an embedding of an audio file: a fixed-size numeric vector that summarizes its content.
def get_embedding(audio_file):

  # Load the audio file with librosa, which returns the audio time series and its sample rate.
  # The audio is resampled to 44100 Hz (the panns_inference README example loads audio at sr=32000).
  a, _ = librosa.load(audio_file, sr=44100)

  # Add a leading batch dimension, which the model's inference function requires.
  query_audio = a[None, :]

  # Run inference. The model returns (clipwise_output, embedding); only the embedding is used.
  _, emb = model.inference(query_audio)

  # Normalize the embedding to unit length while keeping its direction.
  normalized_v = normalize(emb[0])

  # Return the normalized embedding. Unit length is required by the dot_product similarity;
  # with the cosine similarity configured in the index mapping it is optional but harmless.
  return normalized_v
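
To see why normalization matters for the choice of similarity, note that for unit-length vectors the dot product and the cosine similarity are the same number. The sketch below checks this on random 2048-dimensional vectors. The vectors are synthetic stand-ins, not real audio embeddings, and `cosine_similarity` is a helper defined only for this illustration.

```python
import numpy as np


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = np.linalg.norm(v)
    return v if norm == 0 else v / norm


def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic vectors with the same dimensionality as the index mapping
rng = np.random.default_rng(seed=0)
a = rng.random(2048)
b = rng.random(2048)

a_unit, b_unit = normalize(a), normalize(b)

# Normalized vectors have length 1
print(np.isclose(np.linalg.norm(a_unit), 1.0))

# Dot product of the unit vectors equals the cosine similarity of the originals
print(np.isclose(np.dot(a_unit, b_unit), cosine_similarity(a, b)))
```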

### Index into Elasticsearch

In [74]:
from datetime import datetime

# Store a song in Elasticsearch together with its vector embedding.
# Note: vec_field is not used; the embedding is always written to 'audio-embedding'.
def store_in_elasticsearch(song, embedding, path, index_name, genre, vec_field):
  body = {
      'audio-embedding': embedding,
      'title': song,
      'timestamp': datetime.now(),
      'path': path,
      'genre': genre
  }

  # Index one document per song
  es.index(index=index_name, document=body)
  print("stored...", song, embedding, path, genre, index_name)
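
Indexing one document per request is fine for a dataset of this size. For larger collections, the Python client's bulk helper can send many documents at once. The sketch below only builds the actions, so it runs without a cluster. `build_bulk_actions` and the shape of the `songs` dictionaries are assumptions made for this example.

```python
from datetime import datetime


def build_bulk_actions(songs, index_name, vec_field="audio-embedding"):
    """Yield one bulk action per song.

    Each song is assumed to be a dict with the keys
    'title', 'embedding', 'path' and 'genre' (hypothetical layout).
    """
    for song in songs:
        yield {
            "_index": index_name,
            "_source": {
                # Convert to plain floats so the document is JSON-serializable
                vec_field: [float(x) for x in song["embedding"]],
                "title": song["title"],
                "timestamp": datetime.now(),
                "path": song["path"],
                "genre": song["genre"],
            },
        }


# Tiny illustrative input (a real embedding would have 2048 values)
example_songs = [
    {"title": "a.wav", "embedding": [0.0, 1.0], "path": "/tmp/a.wav", "genre": "jazz"},
]
actions = list(build_bulk_actions(example_songs, "my-audio-index"))
print(actions[0]["_index"], actions[0]["_source"]["title"])
```

With a connected client, the generator could be passed to `elasticsearch.helpers.bulk(es, build_bulk_actions(songs, index_name))`.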

### Search for a similar song

In [92]:
# Define a function to query audio vector in Elasticsearch
def query_audio_vector(es, emb, field_key, index_name):
    # Filter query: match only documents that contain the vector field.
    # Filter clauses do not contribute to the score.
    query = {
        "bool": {
            "filter": [{
                "exists": {
                    "field": field_key
                }
            }]
        }
    }

    # kNN search parameters
    # field is the dense_vector field to search
    # k is the number of nearest neighbors to return
    # num_candidates is the number of candidates to consider per shard (more means slower but potentially more accurate results)
    # query_vector is the vector to find nearest neighbors for
    # boost is the multiplier applied to the kNN similarity score (100 rescales a 0-1 cosine score to 0-100)
    knn = {
        "field": field_key,
        "k": 5,
        "num_candidates": 100,
        "query_vector": emb,
        "boost": 100
    }

    # The fields to retrieve from the matching documents (all defined in the index mapping)
    fields = ["title", "path", "genre"]

    # The name of the index to search
    index = index_name

    # Perform the search
    # index is the name of the index to search
    # query is the query to use to find matching documents
    # knn is the parameters for KNN search
    # fields is the fields to retrieve from the matching documents
    # size is the maximum number of matches to return
    # source is whether to include the source document in the results
    resp = es.search(index=index,
                     query=query,
                     knn=knn,
                     fields=fields,
                     size=5,
                     source=False)

    # Return the search results
    return resp


## Run

### List audio files

In [76]:
audio_files = list_audio_files(audio_path)

mozart_symphony25_jazz-with-saxophone
mozart_symphony25_tribal-drums-and-flute
mozart_symphony25_piano-solo
bella_ciao_piano-solo
bella_ciao_a-cappella-chorus
bella_ciao_tribal-drums-and-flute
mozart_symphony25_string-quartet
mozart_symphony25_guitar-solo
mozart_symphony25_prompt
mozart_symphony25_electronic-synth-lead
a-cappella-chorus
bella_ciao_string-quartet
bella_ciao_guitar-solo
mozart_symphony25_opera-singer
bella_ciao_humming
bella_ciao_opera-singer
bella_ciao_electronic-synth-lead
bella_ciao_jazz-with-saxophone


### Read each file and index it

In [77]:

# Keywords used to derive a genre label from the file name (for testing).
# 'capella' and 'eletronic' do not match the 'a-cappella' and 'electronic' file names,
# so those files fall back to "generic".
genre_lst = ['jazz', 'opera', 'piano','prompt', 'humming', 'string', 'capella', 'eletronic', 'guitar']

for filename in audio_files:
  audio_file = os.path.join(audio_path, filename)

  emb = get_embedding(audio_file)

  song = filename.lower()

  # Use the first keyword found in the file name, or "generic" if none matches
  genre = next((g for g in genre_lst if g in song), "generic")

  store_in_elasticsearch(song, emb, audio_file, index_name, genre, "audio-embedding")


stored... mozart_symphony25_jazz-with-saxophone.wav [0.        0.        0.        ... 0.0032542 0.        0.       ] /content/music-search/dataset/mozart_symphony25_jazz-with-saxophone.wav jazz my-audio-index
stored... mozart_symphony25_tribal-drums-and-flute.wav [0.         0.         0.         ... 0.03785052 0.03278063 0.        ] /content/music-search/dataset/mozart_symphony25_tribal-drums-and-flute.wav generic my-audio-index
stored... mozart_symphony25_piano-solo.wav [0.         0.00863423 0.         ... 0.00270792 0.02372581 0.        ] /content/music-search/dataset/mozart_symphony25_piano-solo.wav piano my-audio-index
stored... bella_ciao_piano-solo.wav [0.         0.         0.         ... 0.01568016 0.04890013 0.        ] /content/music-search/dataset/bella_ciao_piano-solo.wav piano my-audio-index
stored... bella_ciao_a-cappella-chorus.wav [0.         0.03191019 0.         ... 0.03001107 0.00014891 0.        ] /content/music-search/dataset/bella_ciao_a-cappella-chorus.wav gen
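
The output shows the a cappella recording labeled `generic`, because the keyword `capella` is not a substring of `a-cappella`. One way to make keyword matching more forgiving is to map each genre label to several accepted spellings. This is a sketch under assumptions: `GENRE_KEYWORDS` and `detect_genre` are hypothetical names, and the labels are illustrative rather than the ones stored in the index above.

```python
# Each genre label maps to the spellings accepted in a file name (hypothetical table)
GENRE_KEYWORDS = {
    "jazz": ["jazz"],
    "opera": ["opera"],
    "piano": ["piano"],
    "prompt": ["prompt"],
    "humming": ["humming"],
    "string": ["string"],
    "a-cappella": ["cappella", "capella"],
    "electronic": ["electronic", "eletronic"],
    "guitar": ["guitar"],
}


def detect_genre(filename, keywords=GENRE_KEYWORDS, default="generic"):
    """Return the first genre whose spelling appears in the file name."""
    name = filename.lower()
    for genre, spellings in keywords.items():
        if any(spelling in name for spelling in spellings):
            return genre
    return default


for name in [
    "bella_ciao_a-cappella-chorus.wav",
    "mozart_symphony25_electronic-synth-lead.wav",
    "mozart_symphony25_tribal-drums-and-flute.wav",
]:
    print(name, "->", detect_genre(name))
```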

### Find a similar song (Bella Ciao)

In [95]:
# Path to the query audio file
audio_file = "/content/music-search/dataset/bella_ciao_humming.wav"

# Display the audio file in the notebook
Audio(audio_file)

In [96]:
# Generate the embedding vector from the provided audio file
# 'get_embedding' (defined above) converts the audio file into a normalized numerical vector
emb = get_embedding(audio_file)

In [None]:
# Display the embedding generated from the audio file as a Python list
emb.tolist()

In [97]:
# Query the Elasticsearch instance 'es' with the embedding vector 'emb', field key 'audio-embedding',
# and index name 'my-audio-index'.
# 'query_audio_vector' performs a kNN search in Elasticsearch using the vector embedding.
# 'tolist()' converts the NumPy array to a Python list so it can be serialized to JSON.
resp = query_audio_vector(es, emb.tolist(), "audio-embedding", index_name)

In [None]:
resp['hits']

In [98]:
NUM_MUSIC = 5  # maximum number of results to display

# Print score, genre, and path for each hit, left-aligned in 10-character columns
for hit in resp['hits']['hits'][:NUM_MUSIC]:
    path = hit['fields']['path'][0]
    genre = hit['fields']['genre'][0]
    score = hit['_score']
    print(f'{score:<10}', f'{genre:<10}', path)


100.0      humming    /content/music-search/dataset/bella_ciao_humming.wav
86.1148    opera      /content/music-search/dataset/bella_ciao_opera-singer.wav
85.94842   generic    /content/music-search/dataset/bella_ciao_a-cappella-chorus.wav
85.65134   generic    /content/music-search/dataset/a-cappella-chorus.wav
84.48779   opera      /content/music-search/dataset/mozart_symphony25_opera-singer.wav
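
The scores above can be read back as cosine similarities. For a `dense_vector` field with `cosine` similarity, Elasticsearch documents the kNN score as `(1 + cosine) / 2`, and the query multiplies it by the `boost` of 100. The sketch below inverts that formula. It assumes the filter clause adds nothing to the score, and `score_to_cosine` is a hypothetical helper name.

```python
def score_to_cosine(score, boost=100.0):
    """Convert a boosted kNN score back to a cosine similarity.

    Assumes score = boost * (1 + cosine) / 2, as for cosine similarity
    on a dense_vector field, with no other scoring clause involved.
    """
    return 2.0 * (score / boost) - 1.0


# Scores copied from the results above
for label, score in [("humming", 100.0), ("opera", 86.1148), ("generic", 85.94842)]:
    print(f"{label:<10} {score_to_cosine(score):.4f}")
```

Read this way, the top hit (the query file itself) has a cosine similarity of 1, and the opera version of Bella Ciao is at roughly 0.72.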


In [100]:
# Listen to the 2nd result (skip the first, which is the query file itself)
print(resp['hits']['hits'][1]['fields']['path'][0])
Audio(resp['hits']['hits'][1]['fields']['path'][0])

/content/music-search/dataset/bella_ciao_opera-singer.wav


In [101]:
print(resp['hits']['hits'][2]['fields']['path'][0])
Audio(resp['hits']['hits'][2]['fields']['path'][0])

/content/music-search/dataset/bella_ciao_a-cappella-chorus.wav


In [102]:
print(resp['hits']['hits'][3]['fields']['path'][0])
Audio(resp['hits']['hits'][3]['fields']['path'][0])

/content/music-search/dataset/a-cappella-chorus.wav


In [103]:
print(resp['hits']['hits'][4]['fields']['path'][0])
Audio(resp['hits']['hits'][4]['fields']['path'][0])

/content/music-search/dataset/mozart_symphony25_opera-singer.wav


### Find a similar song (Mozart)

In [104]:
# Path to the query audio file
audio_file = "/content/music-search/dataset/mozart_symphony25_string-quartet.wav"

# Display the audio file in the notebook
Audio(audio_file)

In [105]:
# Generate the embedding vector from the provided audio file
# 'get_embedding' (defined above) converts the audio file into a normalized numerical vector
emb = get_embedding(audio_file)

In [106]:
# Query the Elasticsearch instance 'es' with the embedding vector 'emb', field key 'audio-embedding',
# and index name 'my-audio-index'.
# 'query_audio_vector' performs a kNN search in Elasticsearch using the vector embedding.
# 'tolist()' converts the NumPy array to a Python list so it can be serialized to JSON.
resp = query_audio_vector(es, emb.tolist(), "audio-embedding", index_name)

In [107]:
# Print score, genre, and path for each hit, left-aligned in 10-character columns
for hit in resp['hits']['hits'][:NUM_MUSIC]:
    path = hit['fields']['path'][0]
    genre = hit['fields']['genre'][0]
    score = hit['_score']
    print(f'{score:<10}', f'{genre:<10}', path)


100.0      string     /content/music-search/dataset/mozart_symphony25_string-quartet.wav
93.18552   string     /content/music-search/dataset/bella_ciao_string-quartet.wav
84.62818   jazz       /content/music-search/dataset/mozart_symphony25_jazz-with-saxophone.wav
82.56258   piano      /content/music-search/dataset/mozart_symphony25_piano-solo.wav
81.66269   piano      /content/music-search/dataset/bella_ciao_piano-solo.wav


In [108]:
# Listen to the 2nd result (skip the first, which is the query file itself)
print(resp['hits']['hits'][1]['fields']['path'][0])
Audio(resp['hits']['hits'][1]['fields']['path'][0])

/content/music-search/dataset/bella_ciao_string-quartet.wav


In [109]:
print(resp['hits']['hits'][2]['fields']['path'][0])
Audio(resp['hits']['hits'][2]['fields']['path'][0])

/content/music-search/dataset/mozart_symphony25_jazz-with-saxophone.wav


In [110]:
print(resp['hits']['hits'][3]['fields']['path'][0])
Audio(resp['hits']['hits'][3]['fields']['path'][0])

/content/music-search/dataset/mozart_symphony25_piano-solo.wav


In [111]:
print(resp['hits']['hits'][4]['fields']['path'][0])
Audio(resp['hits']['hits'][4]['fields']['path'][0])

/content/music-search/dataset/bella_ciao_piano-solo.wav
