# Scraping song lyrics with LyricsGenius: another way to collect and preprocess lyrics automatically

Analyzing song lyrics can reveal the themes, styles, and messages that artists convey. Collecting and preprocessing a large number of lyrics by hand, however, is tedious and time-consuming. This notebook uses the LyricsGenius package and a few lines of code to collect and preprocess lyrics automatically through the Genius API.

### Our need

Collecting and preprocessing lyrics underpins many applications, such as analyzing musical trends, building personalized playlists, and studying the themes artists address. Doing so efficiently and reliably is difficult, especially for a large number of songs drawn from different sources. We therefore need an automated solution that collects and preprocesses lyrics quickly and accurately.

In [None]:
# Package for lyrics scraping
!pip install git+https://github.com/johnwmillr/LyricsGenius.git

Collecting git+https://github.com/johnwmillr/LyricsGenius.git
  Cloning https://github.com/johnwmillr/LyricsGenius.git to /tmp/pip-req-build-m2h7yabv
  Running command git clone --filter=blob:none --quiet https://github.com/johnwmillr/LyricsGenius.git /tmp/pip-req-build-m2h7yabv
  Resolved https://github.com/johnwmillr/LyricsGenius.git to commit bec02665b807941ca95e045be910e861789fc4a7
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: lyricsgenius
  Building wheel for lyricsgenius (setup.py) ... done
  Created wheel for lyricsgenius: filename=lyricsgenius-3.0.1-py3-none-any.whl size=44689 sha256=34a78ed76062d63fae20b355ffe7fc7c059c3e602ac2f7200807a621822ad0e6
  Stored in directory: /tmp/pip-ephem-wheel-cache-vuothdc7/wheels/22/d8/54/6e236bc9517965a346a5b6cd223931b13f1eb6ba8676f1078c
Successfully built lyricsgenius
Installing collected packages: lyricsgenius
Successfully installed lyricsgenius-3.0.1

In [None]:
import os 
import requests
import lyricsgenius
import pandas as pd
import numpy as np
import json
from bs4 import BeautifulSoup as bs

# Load the dataset created in the previous notebook
df = pd.read_csv('TESTMERGE.csv', index_col=[0])

df

Unnamed: 0,Unnamed: 0.1,Nom,Artiste,Genre,Annee,Paroles,country_mb
0,0,As It Was,Harry Styles,pop,2022,[Intro] Come on Harry we wanna say goodn...,United Kingdom
1,1,Heat Waves,Glass Animals,pop,2022,[Intro] (Last night all I think about is y...,United Kingdom
2,2,STAY (with Justin Bieber),The Kid LAROI,hip hop,2022,[Chorus: The Kid LAROI] I do the same thing ...,
3,3,Me Porto Bonito,Bad Bunny,reggae,2022,"[Letra de ""Me Porto Bonito""] [Intro: Bad ...",United States
4,4,Tití Me Preguntó,Bad Bunny,reggae,2022,"[Letra de ""Tití Me Preguntó""] [Intro: Bad...",United States
...,...,...,...,...,...,...,...
2445,2445,Maria Maria (feat. The Product G&B),Santana,rock,2000,[Intro: Wyclef Jean] Ladies and gents turn...,United States
2446,2446,Incomplete,Sisqo,r&b,2000,[Intro] Ooh-ooh ooh-ooh ooh Oh yeah ye...,
2447,2447,Country Grammar (Hot Shit),Nelly,pop,2000,Unknown,United States
2448,2448,"Jumpin', Jumpin'",Destiny's Child,pop,2000,Unknown,


In [None]:
# Connect to the Genius API with a client access token
client_access_token='UqUkTFLy9laYU_FmkLexf2_VGBs1XF0q5WfUN_cYVHWydENkb5Zc8pAoSiPb03Zi'
LyricsGenius = lyricsgenius.Genius(client_access_token)

# Requests to Genius can time out with the default settings, so raise the limits before scraping
# Maximum wait per request: 30 seconds
LyricsGenius.timeout = 30
# Pause between requests: 5 seconds (the client reads `sleep_time`; a plain `sleep` attribute is ignored)
LyricsGenius.sleep_time = 5
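
The access token above is written directly in the notebook, so anyone who can read the file can reuse it. One common alternative is to read it from an environment variable. The sketch below is only an illustration: the helper `load_genius_token` and the variable name `GENIUS_ACCESS_TOKEN` are assumptions, not part of the original notebook or of LyricsGenius.

```python
import os


def load_genius_token(env=os.environ, var_name="GENIUS_ACCESS_TOKEN"):
    """Return the Genius token stored in `var_name` (a hypothetical variable name).

    `env` defaults to the process environment; any mapping works, which keeps
    the helper easy to test.
    """
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token
```

In this notebook, the result would replace the literal string, for example `lyricsgenius.Genius(load_genius_token())`. The `timeout` and `sleep_time` values could then be set on the client exactly as in the cell above.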


In [None]:
# Create a new DataFrame containing only the song titles
lyrics_df = df.Nom
lyrics_df = pd.DataFrame(lyrics_df)
lyrics_df.head()

Unnamed: 0,Nom
0,As It Was
1,Heat Waves
2,STAY (with Justin Bieber)
3,Me Porto Bonito
4,Tití Me Preguntó


In [None]:
# Get the first song
sample_song = lyrics_df.Nom[0]
print(f"Name of the first song in the database is: {sample_song}")

# Search for the song's lyric
searched_song = LyricsGenius.search_song(sample_song)
print(f"The lyric is:\n{searched_song.lyrics}")

Name of the first song in the database is: As It Was
Searching for "As It Was"...
Done.
The lyric is:
205 ContributorsTranslationsTürkçeEspañolPortuguês日本語ItalianoΕλληνικάDeutschFrançaisEnglishEnglishNederlandsShqipPolski한국어As It Was Lyrics[Intro]
Come on, Harry, we wanna say goodnight to you

[Verse 1]
Holdin' me back
Gravity's holdin' me back
I want you to hold out the palm of your hand
Why don't we leave it at that?
Nothin' to say
When everything gets in the way
Seems you cannot be replaced
And I'm the one who will stay, oh-oh-oh

[Chorus]
In this world, it's just us
You know it's not the same as it was
In this world, it's just us
You know it's not the same as it was
As it was, as it was
You know it's not the same

[Verse 2]
Answer the phone
"Harry, you're no good alone
Why are you sitting at home on the floor?
What kind of pills are you on?"
Ringin' the bell
And nobody's coming to help
Your daddy lives by himself
He just wants to know that you're well, oh-oh-oh
See Harry Styles Liv
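
The raw text returned for this song starts with a page header (contributor count, translation languages, title followed by "Lyrics") and carries a ticket banner in the middle. A minimal cleanup sketch follows, assuming the residue always looks like the samples printed in this notebook. The function name `strip_genius_residue` and the patterns are inferred from those samples only and may not cover other pages.

```python
import re


def strip_genius_residue(raw: str) -> str:
    """Remove the page header and banner text seen in this notebook's output."""
    # Header: everything on the first line up to and including the first "Lyrics".
    text = re.sub(r"^.*?Lyrics", "", raw, count=1)
    # Ticket banner, e.g. "See Harry Styles LiveGet tickets as low as $60".
    text = re.sub(r"See .+? LiveGet tickets as low as \$\d+", "", text)
    # Recommendation label that follows the banner.
    text = text.replace("You might also like", "")
    return text.strip()


sample = (
    "205 ContributorsTranslationsEspañolAs It Was Lyrics[Intro]\n"
    "Come on, Harry, we wanna say goodnight to you\n"
    "See Harry Styles LiveGet tickets as low as $60You might also like[Chorus]\n"
    "In this world, it's just us"
)
cleaned = strip_genius_residue(sample)
print(cleaned)
```

Applying this to `searched_song.lyrics` before any newline replacement keeps the first-line anchor meaningful. A title that itself contains the word "Lyrics" would be cut too early, so spot-check the results.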

In [0]:
# Create an array to store each song's lyric
lyrics_arr = []

# Traverse through the database, get the song's lyrics from title, and do some preprocessing
for i in range(len(lyrics_df)):
    # get title
    song_title = lyrics_df.Nom.iloc[i]
    
    # search for song in genius.com
    searched_song = LyricsGenius.search_song(song_title)
    
    # if we can't find a song's lyrics then skip and append empty string
    if searched_song is None:
        lyrics_arr.append("")
        continue
        
    # get the lyric
    lyric = searched_song.lyrics
    
    # replace the lyrics newline with ". "
    lyric = lyric.replace("\n", ". ")
    
    # (disabled) remove the non-lyrics header that precedes the first period:
    # Source: https://thispointer.com/remove-string-before-a-specific-character-in-python/
    # lyric = lyric[lyric.index('.') + 1 :]

    # append the processed lyric to the array
    lyrics_arr.append(lyric)

Done.
Searching for "Dilemma"...
Done.
Searching for "Without Me"...
Done.
Searching for "Complicated"...
Done.
Searching for "A Thousand Miles"...
Done.
Searching for "Wherever You Will Go"...
Done.
Searching for "Underneath Your Clothes"...
Done.
Searching for "Underneath It All"...
Done.
Searching for "In the End"...
Done.
Searching for "Hero (feat. Josey Scott)"...
Done.
Searching for "Can't Stop"...
Done.
Searching for "Die Another Day"...
Done.
Searching for "Nessaja"...
Done.
Searching for "Tainted Love"...
Done.
Searching for "Till I Collapse"...
Done.
Searching for "A Little Less Conversation - JXL Radio Edit Remix"...
Done.
Searching for "Sk8er Boi"...
Done.
Searching for "The Logical Song"...
Done.
Searching for "By the Way"...
Done.
Searching for "The Ketchup Song (Aserejé) - Spanglish Version"...
Done.
Searching for "Anyone of Us (Stupid Mistake)"...
Done.
Searching for "What I Go To School For"...
Done.
Searching for "Kiss Kiss"...
Done.
Searching for "Move Bitch"...
Done
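
The loop above searches by title alone and stops at the first exception, which is costly on a run of this length. Searching by title only can also return a different song with the same name. A hedged sketch of a more defensive lookup follows, assuming the client exposes `search_song(title, artist)` as LyricsGenius does. The helper `fetch_lyrics` and its `attempts` and `wait` parameters are hypothetical names introduced here.

```python
import time


def fetch_lyrics(client, title, artist="", attempts=3, wait=1.0):
    """Return the lyrics for a song, or "" if it is not found or every attempt fails.

    `client` is any object with a `search_song(title, artist)` method.
    """
    for attempt in range(1, attempts + 1):
        try:
            song = client.search_song(title, artist)
        except Exception:
            # Broad catch for this sketch only; narrow it to the errors you observe.
            time.sleep(wait * attempt)  # wait a little longer after each failure
            continue
        return song.lyrics if song is not None else ""
    return ""
```

Inside the loop, this could be called as `fetch_lyrics(LyricsGenius, df.Nom.iloc[i], df.Artiste.iloc[i])`, so that the `Artiste` column helps disambiguate songs that share a title.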

In [None]:
# Check length
# len(lyrics_arr)

Unnamed: 0,country_mb,Total Genres
0,Argentina,hip hop
1,Australia,popindieindiepoppoppoppopdancedancehip hophip ...
2,Austria,popdance
3,Bahamas,pop
4,Barbados,pop
5,Belgium,dancepoppoppopdancepoppophousedancedance
6,Brazil,pophousepop
7,British Virgin Islands,pop
8,Canada,popr&bpoppopr&bhip hoprappoppopr&br&bhip hoppo...
9,China,dance


In [None]:
lyrics_df

Unnamed: 0,Nom
0,As It Was
1,Heat Waves
2,STAY (with Justin Bieber)
3,Me Porto Bonito
4,Tití Me Preguntó
...,...
2445,Maria Maria (feat. The Product G&B)
2446,Incomplete
2447,Country Grammar (Hot Shit)
2448,"Jumpin', Jumpin'"


In [None]:
lyrics_arr

['205 ContributorsTranslationsTürkçeEspañolPortuguês日本語ItalianoΕλληνικάDeutschFrançaisEnglishEnglishNederlandsShqipPolski한국어As It Was Lyrics[Intro]. Come on, Harry, we wanna say goodnight to you. . [Verse 1]. Holdin\' me back. Gravity\'s holdin\' me back. I want you to hold out the palm of your hand. Why don\'t we leave it at that?. Nothin\' to say. When everything gets in the way. Seems you cannot be replaced. And I\'m the one who will stay, oh-oh-oh. . [Chorus]. In this world, it\'s just us. You know it\'s not the same as it was. In this world, it\'s just us. You know it\'s not the same as it was. As it was, as it was. You know it\'s not the same. . [Verse 2]. Answer the phone. "Harry, you\'re no good alone. Why are you sitting at home on the floor?. What kind of pills are you on?". Ringin\' the bell. And nobody\'s coming to help. Your daddy lives by himself. He just wants to know that you\'re well, oh-oh-oh. See Harry Styles LiveGet tickets as low as $60You might also like[Chorus]. 
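
The entry above shows a side effect of replacing every newline with ". ": blank lines become ". . " and lines that already end in punctuation become "?. ". A small sketch of an alternative join follows, assuming it receives the raw multi-line lyrics. The function name `join_lines` is hypothetical, and it adds a period after the last line, which the original replacement does not.

```python
def join_lines(lyrics: str) -> str:
    """Join lyric lines into one string, one sentence per line."""
    # Drop blank lines so section breaks do not produce empty sentences.
    lines = [ln.strip() for ln in lyrics.splitlines() if ln.strip()]
    # Add a period only when the line does not already end a sentence.
    sentences = [ln if ln.endswith((".", "!", "?")) else ln + "." for ln in lines]
    return " ".join(sentences)


raw = "[Verse 1]\nWhy don't we leave it at that?\nNothin' to say\n\n[Chorus]\nIn this world"
print(join_lines(raw))
```

Lines that end with a closing quote after the punctuation (for example `on?"`) still receive an extra period. Handling that case is left out to keep the sketch short.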

In [None]:
# Add the lyrics_arr to the dataframe
df['Lyrics'] = lyrics_arr

In [None]:
df

Unnamed: 0,Unnamed: 0.1,Nom,Artiste,Genre,Annee,Paroles,country_mb,Lyrics
0,0,As It Was,Harry Styles,pop,2022,[Intro] Come on Harry we wanna say goodn...,United Kingdom,205 ContributorsTranslationsTürkçeEspañolPortu...
1,1,Heat Waves,Glass Animals,pop,2022,[Intro] (Last night all I think about is y...,United Kingdom,135 ContributorsTranslationsItalianoDeutschFra...
2,2,STAY (with Justin Bieber),The Kid LAROI,hip hop,2022,[Chorus: The Kid LAROI] I do the same thing ...,,1 ContributorSTAY (with Justin Bieber) LyricsI...
3,3,Me Porto Bonito,Bad Bunny,reggae,2022,"[Letra de ""Me Porto Bonito""] [Intro: Bad ...",United States,29 ContributorsTranslationsEnglishDeutschFranç...
4,4,Tití Me Preguntó,Bad Bunny,reggae,2022,"[Letra de ""Tití Me Preguntó""] [Intro: Bad...",United States,39 ContributorsTranslationsEnglishDeutschItali...
...,...,...,...,...,...,...,...,...
2445,2445,Maria Maria (feat. The Product G&B),Santana,rock,2000,[Intro: Wyclef Jean] Ladies and gents turn...,United States,"2 ContributorsMaria, Maria LyricsLadies and ge..."
2446,2446,Incomplete,Sisqo,r&b,2000,[Intro] Ooh-ooh ooh-ooh ooh Oh yeah ye...,,11 ContributorsIncomplete Lyrics[Intro]. Ooh-o...
2447,2447,Country Grammar (Hot Shit),Nelly,pop,2000,Unknown,United States,87 ContributorsCountry Grammar (Hot Shit) Lyri...
2448,2448,"Jumpin', Jumpin'",Destiny's Child,pop,2000,Unknown,,"43 ContributorsJumpin’, Jumpin’ Lyrics[Chorus:..."


In [None]:
# Save the dataframe as a CSV file
df.to_csv('my_data.csv', index=False)


This automated approach collects and preprocesses song lyrics through the Genius API in an efficient, reproducible way. The preprocessed data can then feed further analyses; in our case, theme and sentiment analysis and visualization of trends in the lyrics.
