# Configuration

## Import DALI code

(run this cell only once, to install the package on your computer)

In [1]:
%pip install dali-dataset

Note: you may need to restart the kernel to use updated packages.


## Load the DALI data

In [2]:
import DALI as dali_code

In [4]:
dali_data_path = r'D:\0_Maud\Etudes\4_ECL\G2\S8\Cours Bologne\Knowledge Engineering\Project\ke_final_project-july24\dali\DALI_v1.0'
dali_data = dali_code.get_the_DALI_dataset(dali_data_path, skip=[], keep=[])

# Example usage: the song with ID 00c4f2efb0b143728113fad1a78d6ce3

In [5]:
entry = dali_data['00c4f2efb0b143728113fad1a78d6ce3']

## Metadata of the song

In [6]:
entry.info

{'dataset_version': 1.0,
 'ground-truth': False,
 'artist': 'Tool',
 'title': 'Sober',
 'scores': {'NCC': 0.8695741846161744, 'manual': 0.0},
 'audio': {'url': 'Rh98c5BZqiQ', 'path': 'None', 'working': True},
 'id': '00c4f2efb0b143728113fad1a78d6ce3',
 'metadata': {'album': '72826', 'release_date': '1991', 'language': 'english'}}
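The `scores` and `metadata` fields make it easy to pick a subset of songs before looking at alignments. Below is a minimal sketch of such a filter. The function name `select_songs`, the 0.8 NCC threshold and the choice to filter on language are illustrative assumptions, not part of the DALI API; the code only relies on the dictionary layout shown above. As far as we understand, NCC is the automatic alignment quality score computed by the DALI authors, so a higher value should indicate a more reliable alignment.

```python
from typing import Dict, List


def select_songs(infos: Dict[str, dict], language: str = "english",
                 min_ncc: float = 0.8) -> List[str]:
    """Return the ids of songs in `language` whose NCC score is >= `min_ncc`.

    Hypothetical helper: `infos` maps a DALI id to a dict shaped like `entry.info`.
    The 0.8 default is an arbitrary illustrative threshold.
    """
    selected = []
    for song_id, info in infos.items():
        lang = info.get("metadata", {}).get("language")
        ncc = info.get("scores", {}).get("NCC", 0.0)
        if lang == language and ncc >= min_ncc:
            selected.append(song_id)
    return selected


# The metadata shown above, trimmed to the fields the filter reads.
infos = {
    "00c4f2efb0b143728113fad1a78d6ce3": {
        "artist": "Tool",
        "title": "Sober",
        "scores": {"NCC": 0.8695741846161744, "manual": 0.0},
        "metadata": {"album": "72826", "release_date": "1991", "language": "english"},
    },
}
print(select_songs(infos))  # ['00c4f2efb0b143728113fad1a78d6ce3']
```

With the full dataset, the input could be built as `{song_id: e.info for song_id, e in dali_data.items()}`, assuming `dali_data` supports `.items()` like a regular dictionary (it is indexed by id in this notebook).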

## Audio-alignment data

### Full annotation (all levels, from notes up to paragraphs)

In [7]:
entry.annotations

{'type': 'horizontal',
 'annot': {'notes': [{'text': "there's",
    'freq': [440.0, 440.0],
    'time': [53.41, 53.60973501820917],
    'index': 0},
   {'text': 'a',
    'freq': [440.0, 440.0],
    'time': [53.60973501820917, 54.00920505462752],
    'index': 1},
   {'text': 'sha',
    'freq': [440.0, 440.0],
    'time': [54.00920505462752, 54.2089400728367],
    'index': 2},
   {'text': 'dow',
    'freq': [391.9954359817493, 391.9954359817493],
    'time': [54.2089400728367, 54.608410109255054],
    'index': 2},
   {'text': 'just',
    'freq': [391.9954359817493, 391.9954359817493],
    'time': [54.608410109255054, 54.80814512746423],
    'index': 3},
   {'text': 'be',
    'freq': [349.2282314330039, 349.2282314330039],
    'time': [54.80814512746423, 55.007880145673404],
    'index': 4},
   {'text': 'hind',
    'freq': [349.2282314330039, 349.2282314330039],
    'time': [55.007880145673404, 55.407350182091754],
    'index': 4},
   {'text': 'me',
    'freq': [293.6647679174075, 293.664

### Note by note (third note)

In [8]:
my_annot = entry.annotations['annot']['notes']
my_annot[2]

{'text': 'sha',
 'freq': [440.0, 440.0],
 'time': [54.00920505462752, 54.2089400728367],
 'index': 2}
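The `freq` values look like equal-tempered pitches in Hz (440 Hz is A4) and `time` looks like `[start, end]` in seconds. Assuming that reading, a note can be turned into a note name and a duration with the standard MIDI conversion `m = 69 + 12 * log2(f / 440)`. The helper names below are hypothetical, not DALI functions.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_midi(freq_hz: float) -> int:
    """Nearest MIDI note number, with A4 = 440 Hz = MIDI 69 (equal temperament)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def midi_to_name(midi: int) -> str:
    """Note name with octave, using the convention MIDI 60 = C4 (so 69 -> 'A4')."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def describe_note(note: dict) -> str:
    """Summarize a DALI-style note dict as 'text: pitch (MIDI n), duration'."""
    start, end = note["time"]
    midi = freq_to_midi(note["freq"][0])  # min and max are equal for a single note here
    return f"{note['text']}: {midi_to_name(midi)} (MIDI {midi}), {end - start:.3f} s"


note = {"text": "sha", "freq": [440.0, 440.0],
        "time": [54.00920505462752, 54.2089400728367], "index": 2}
print(describe_note(note))  # sha: A4 (MIDI 69), 0.200 s
```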

### Word by word (third word)

In [9]:
my_annot = entry.annotations['annot']['words']
my_annot[2]

{'text': 'shadow',
 'freq': [391.9954359817493, 440.0],
 'time': [54.00920505462752, 54.608410109255054],
 'index': 0}
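Comparing the outputs above suggests how the levels are linked: the notes 'sha' and 'dow' both carry `'index': 2`, and the third word (`words[2]`) is 'shadow'; that word in turn carries `'index': 0`, the index of the first line. The word's `freq` is the [min, max] of its notes' frequencies, and its `time` runs from the first note's start to the last note's end. The sketch below rebuilds words from notes under that assumption; `merge_children` is a hypothetical helper, and to rebuild lines from words you would pass `sep=" "`.

```python
from typing import Dict, List


def merge_children(children: List[dict], sep: str = "") -> Dict[int, dict]:
    """Group annotations by their parent 'index' and rebuild the parent units.

    Hypothetical helper: assumes 'index' points to the enclosing unit one level
    up (note -> word, word -> line), as the outputs above suggest.
    """
    groups: Dict[int, List[dict]] = {}
    for child in children:
        groups.setdefault(child["index"], []).append(child)
    parents = {}
    for parent_index, members in groups.items():
        freqs = [f for m in members for f in m["freq"]]
        parents[parent_index] = {
            "text": sep.join(m["text"] for m in members),   # syllables -> word
            "freq": [min(freqs), max(freqs)],                 # pitch range
            "time": [min(m["time"][0] for m in members),      # earliest start
                     max(m["time"][1] for m in members)],     # latest end
        }
    return parents


# The first seven notes from the output above.
notes = [
    {"text": "there's", "freq": [440.0, 440.0], "time": [53.41, 53.60973501820917], "index": 0},
    {"text": "a", "freq": [440.0, 440.0], "time": [53.60973501820917, 54.00920505462752], "index": 1},
    {"text": "sha", "freq": [440.0, 440.0], "time": [54.00920505462752, 54.2089400728367], "index": 2},
    {"text": "dow", "freq": [391.9954359817493, 391.9954359817493], "time": [54.2089400728367, 54.608410109255054], "index": 2},
    {"text": "just", "freq": [391.9954359817493, 391.9954359817493], "time": [54.608410109255054, 54.80814512746423], "index": 3},
    {"text": "be", "freq": [349.2282314330039, 349.2282314330039], "time": [54.80814512746423, 55.007880145673404], "index": 4},
    {"text": "hind", "freq": [349.2282314330039, 349.2282314330039], "time": [55.007880145673404, 55.407350182091754], "index": 4},
]
words = merge_children(notes)
print(words[2])
# {'text': 'shadow', 'freq': [391.9954359817493, 440.0], 'time': [54.00920505462752, 54.608410109255054]}
```

The rebuilt 'shadow' matches the `words[2]` output above in text, frequency range and time span, which supports the reading of `index` as a parent pointer.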

### Line by line (first line)

In [10]:
my_annot = entry.annotations['annot']['lines']
my_annot[0]

{'text': "there's a shadow just behind me",
 'freq': [293.6647679174075, 440.0],
 'time': [53.41, 55.806820218510104],
 'index': 0}

### Paragraph by paragraph (first paragraph)

In [11]:
my_annot = entry.annotations['annot']['paragraphs']
my_annot[0]

{'text': "there's a shadow just behind me shrouding every step i take making every promise empty pointing every finger at me",
 'freq': [293.6647679174075, 466.1637615180898],
 'time': [53.41, 65.39410109255056]}
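All four levels (notes, words, lines, paragraphs) share the same `time` field, so one lookup function can answer "what is being sung at time t?" at any granularity. A minimal sketch using binary search over start times, assuming the segments are sorted by start time and do not overlap; `segment_at` is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Optional


def segment_at(segments: List[dict], t: float) -> Optional[dict]:
    """Return the segment whose [start, end) interval contains t, or None.

    Assumes segments are sorted by start time and do not overlap.
    For many queries on the same list, precompute `starts` once instead.
    """
    starts = [seg["time"][0] for seg in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i]["time"][1]:
        return segments[i]
    return None


# Excerpts from the outputs above (other fields omitted).
notes = [
    {"text": "there's", "time": [53.41, 53.60973501820917]},
    {"text": "a", "time": [53.60973501820917, 54.00920505462752]},
    {"text": "sha", "time": [54.00920505462752, 54.2089400728367]},
    {"text": "dow", "time": [54.2089400728367, 54.608410109255054]},
]
paragraphs = [{"text": "there's a shadow just behind me shrouding every step i take "
                       "making every promise empty pointing every finger at me",
               "time": [53.41, 65.39410109255056]}]

print(segment_at(notes, 54.1)["text"])    # sha
print(segment_at(paragraphs, 60.0) is not None)  # True
print(segment_at(notes, 50.0))            # None
```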

## Export to JSON

In [12]:
path_save = r'D:\0_Maud\Etudes\4_ECL\G2\S8\Cours Bologne\Knowledge Engineering\Project\ke_final_project-july24\dali\dali_json'
name = 'my_annot_name_00c4f2efb0b143728113fad1a78d6ce3'

In [13]:
# export this entry as JSON into the folder path_save, under the name `name`
entry.write_json(path_save, name)

/!\ Everything below this line is out of date; I will remove it once the newest version works well. The conversion is now handled outside of DALI's folder.

## Step 1: Scrape the lyrics and paragraph structure from Genius

In [1]:
%pip install lyricsgenius

Note: you may need to restart the kernel to use updated packages.


In [1]:
import utils
import pprint

In [2]:
song = utils.genius.search_song("Sober", "Tool")

print(song.lyrics)

Searching for "Sober" by Tool...
Done.
110 ContributorsTranslationsTürkçeSober Lyrics[Verse 1]
There's a shadow just behind me
Shrouding every step I take
Making every promise empty
Pointing every finger at me
Waiting like a stalking butler
Who upon the finger rests
Murder now the pattern, must we
Just because the son has come
[Pre-Chorus]
Jesus, won't you fucking whistle
Something but the past and done?
Jesus, won't you fucking whistle
Something but the past and done?

[Chorus]
Why can't we not be sober?
I just want to start this over
Why can't we drink forever?
I just want to start this over

[Verse 2]
I am just a worthless liar
I am just an imbecile
I will only complicate you
Trust in me and fall as well
I will find a center in you
I will chew it up and leave
I will work to elevate you
Just enough to bring you down
You might also like[Pre-Chorus]
Mother Mary, won't you whisper
Something but the past and done?
Mother Mary, won't you whisper
Something but the past and done?

[Chorus]
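The raw lyrics above contain text scraped from the Genius page rather than from the song: the header `110 ContributorsTranslationsTürkçeSober Lyrics` glued to `[Verse 1]`, and `You might also like` glued to the second `[Pre-Chorus]`. The notebook does not show whether `utils` already removes them. If it does not, a regex-based sketch like the following could strip these two patterns; the function name and the patterns are assumptions, checked only against the text shown here.

```python
import re


def clean_genius_lyrics(raw: str) -> str:
    """Remove the two page artifacts visible in the raw lyrics above (hypothetical helper)."""
    # Page header glued to the first section header, e.g.
    # "110 ContributorsTranslationsTürkçeSober Lyrics[Verse 1]".
    # Without re.DOTALL, ".*?" cannot cross a newline, so only the first line is affected.
    text = re.sub(r"^\d+\s*Contributors.*?Lyrics", "", raw, count=1)
    # "You might also like" widget text glued to a section header.
    text = text.replace("You might also like", "")
    return text.strip()


# Excerpt of the raw output above.
raw = ("110 ContributorsTranslationsTürkçeSober Lyrics[Verse 1]\n"
       "There's a shadow just behind me\n"
       "Just enough to bring you down\n"
       "You might also like[Pre-Chorus]\n"
       "Mother Mary, won't you whisper")
print(clean_genius_lyrics(raw))
```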


In [3]:
directory = "./json_lyrics_genius/"
utils.save_song_lyrics_as_json("Tool", "Sober", directory)

Searching for "Sober" by Tool...
Done.
Lyrics saved to ./json_lyrics_genius/Tool_Sober.json
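The source of `utils.save_song_lyrics_as_json` is not shown here. The following hypothetical stand-in only reproduces the `<artist>_<title>.json` naming visible in the output; the JSON keys (`artist`, `title`, `lyrics`) are assumptions. It also replaces characters that Windows forbids in file names, which matters for titles containing `?` or `/`.

```python
import json
import os
import re
import tempfile


def save_lyrics_json(artist: str, title: str, lyrics: str, directory: str) -> str:
    """Hypothetical stand-in: write the lyrics to <directory>/<artist>_<title>.json."""
    os.makedirs(directory, exist_ok=True)
    # Replace characters that are not allowed in Windows file names.
    stem = re.sub(r'[\\/:*?"<>|]', "_", f"{artist}_{title}")
    path = os.path.join(directory, stem + ".json")
    with open(path, "w", encoding="utf-8") as f:
        # Assumed layout; ensure_ascii=False keeps characters such as "ç" readable.
        json.dump({"artist": artist, "title": title, "lyrics": lyrics},
                  f, ensure_ascii=False, indent=2)
    return path


# Usage in a throwaway directory (the notebook itself uses "./json_lyrics_genius/").
path = save_lyrics_json("Tool", "Sober", "There's a shadow just behind me",
                        tempfile.mkdtemp())
print(os.path.basename(path))  # Tool_Sober.json
```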


## Step 2: Parse the Genius lyrics into paragraphs and export them to JSON

In [4]:
utils.parse_genius_lyrics("./json_lyrics_genius/Tool_Sober.json", "./json_parsed_lyrics_genius/")

Paragraph Name: Verse 1
Content: There's a shadow just behind me Shrouding every step I take Making every promise empty Pointing every finger at me Waiting like a stalking butler Who upon the finger rests Murder now the pattern, must we Just because the son has come
Singer(s): Tool
------
Paragraph Name: Pre-Chorus
Content: Jesus, won't you fucking whistle Something but the past and done? Jesus, won't you fucking whistle Something but the past and done?
Singer(s): Tool
------
Paragraph Name: Chorus
Content: Why can't we not be sober? I just want to start this over Why can't we drink forever? I just want to start this over
Singer(s): Tool
------
Paragraph Name: Verse 2
Content: I am just a worthless liar I am just an imbecile I will only complicate you Trust in me and fall as well I will find a center in you I will chew it up and leave I will work to elevate you Just enough to bring you down
Singer(s): Tool
------
Paragraph Name: Pre-Chorus 2
Content: Mother Mary, won't you whisper Some
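The parsed output suggests the rule being applied: a new paragraph starts at each `[Section]` header, its lines are joined with spaces, and a repeated name gets a counter (`Pre-Chorus`, then `Pre-Chorus 2`). Below is a sketch of that rule, assuming cleaned lyrics as input. It also reads the singer from headers of the form `[Verse 1: Singer]`, a common Genius convention, and otherwise falls back to the song's artist, as the `Singer(s): Tool` lines above seem to do. `split_sections` is a hypothetical name, not the function in `utils`.

```python
import re
from collections import Counter
from typing import Dict, List

# A header line such as "[Verse 1]" or "[Verse 1: Some Singer]".
HEADER_RE = re.compile(r"^\[(?P<name>[^\]:]+)(?::\s*(?P<singers>[^\]]+))?\]$")


def split_sections(lyrics: str, artist: str) -> List[Dict[str, str]]:
    """Split cleaned Genius lyrics into paragraphs, one per [Section] header."""
    sections = []
    seen: Counter = Counter()
    current = None  # the section currently being filled
    for raw_line in lyrics.splitlines():
        line = raw_line.strip()
        match = HEADER_RE.match(line)
        if match:
            base = match.group("name").strip()
            seen[base] += 1
            # Number repeated names: "Pre-Chorus", "Pre-Chorus 2", ...
            name = base if seen[base] == 1 else f"{base} {seen[base]}"
            current = {"name": name,
                       "singers": (match.group("singers") or artist).strip(),
                       "lines": []}
            sections.append(current)
        elif line and current is not None:
            current["lines"].append(line)
    # Join each section's lines with spaces, as in the parsed output above.
    return [{"name": s["name"], "content": " ".join(s["lines"]), "singers": s["singers"]}
            for s in sections]


lyrics = ("[Verse 1]\nThere's a shadow just behind me\nShrouding every step I take\n"
          "[Pre-Chorus]\nSomething but the past and done?\n\n"
          "[Pre-Chorus]\nMother Mary, won't you whisper")
for section in split_sections(lyrics, artist="Tool"):
    print(section["name"], "|", section["content"], "|", section["singers"])
```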

## Step 3: Merge the three JSON files (DALI, Genius lyrics and parsed Genius lyrics) into the final output JSON file

Note: this function creates the three JSON files and then merges them, so you do not need to run Steps 1 and 2 beforehand.

In [2]:
import utils
import pprint

In [5]:
utils.convert_dali_genius_json('00c4f2efb0b143728113fad1a78d6ce3')
# Not working yet

KeyboardInterrupt: 
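One difficulty for the merge is visible in the outputs above: DALI's first paragraph only covers the first four lines of the Genius `Verse 1`, and DALI text is lowercase without punctuation, so exact string matching fails. One hypothetical way to pair each DALI paragraph with a Genius section is a word-level coverage score built on Python's `difflib.SequenceMatcher`. This is an illustrative sketch with made-up helper names, not what `convert_dali_genius_json` does.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase, unify curly apostrophes, keep only ASCII letters, digits and apostrophes.

    Dropping non-ASCII letters is a simplification that suits this English example.
    """
    return re.findall(r"[a-z0-9']+", text.lower().replace("\u2019", "'"))


def coverage(dali_text: str, genius_text: str) -> float:
    """Fraction of the DALI words found, in order, in the Genius section."""
    a, b = normalize(dali_text), normalize(genius_text)
    if not a:
        return 0.0
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    return sum(block.size for block in blocks) / len(a)


def best_section(dali_text: str, sections: List[Tuple[str, str]]) -> Tuple[str, float]:
    """Return (name, score) of the Genius section with the highest coverage."""
    scored = [(name, coverage(dali_text, content)) for name, content in sections]
    return max(scored, key=lambda pair: pair[1])


dali_paragraph = ("there's a shadow just behind me shrouding every step i take "
                  "making every promise empty pointing every finger at me")
# Excerpts of the parsed Genius sections shown in Step 2.
genius_sections = [
    ("Verse 1", "There's a shadow just behind me Shrouding every step I take Making every "
                "promise empty Pointing every finger at me Waiting like a stalking butler"),
    ("Chorus", "Why can't we not be sober? I just want to start this over"),
]
print(best_section(dali_paragraph, genius_sections))  # ('Verse 1', 1.0)
```

The score is deliberately asymmetric: it measures how much of the DALI paragraph appears in the Genius section, so a short DALI paragraph can still match a longer Genius section with a score of 1.0.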