# Lyrics Spotify API

Using the official Spotify API together with Spotify's unofficial color-lyrics API, we can retrieve the lyrics of tracks by artists in a given genre.
The resulting dataset can be useful for text classification or topic modeling.

The first part uses the official Spotify API to find artists for a given genre and then retrieve each artist's top 10 tracks.

The second part uses an "unofficial" API to retrieve the lyrics of each track.

First, obtain a *client-id* and a *client-secret* by creating an app at https://developer.spotify.com/dashboard.

Then assign the *client-id* to the *spotifyCid* variable and the *client-secret* to the *spotifySecret* variable.

Next, open the following link in your browser: https://open.spotify.com/get_access_token?reason=transport&productType=web_player.

Manually copy the *accessToken* value from the response and assign it to the *lyricsAccessToken* variable.

A pre-made list of genres is provided, but you can add or remove entries to match your needs.

In [None]:
spotifyCid = ''
spotifySecret = ''
lyricsAccessToken = ''

# Genres to query; each genre's position in this list is later stored as its numeric label
genres = ['rap', 'death-metal', 'country', 'pop', 'rock', 'punk', 'r&b', 'blues', 'folk', 'jazz', 'reggae', 'indie', 'disco', 'edm']
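If you would rather not paste secrets directly into the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the variable names `SPOTIFY_CID`, `SPOTIFY_SECRET` and `SPOTIFY_LYRICS_TOKEN` and the helper `load_credentials` are hypothetical and not part of the original notebook.

```python
import os

def load_credentials(env=None):
    """Return the three secrets as a dict, defaulting to empty strings.

    The environment variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    return {
        'spotifyCid': env.get('SPOTIFY_CID', ''),
        'spotifySecret': env.get('SPOTIFY_SECRET', ''),
        'lyricsAccessToken': env.get('SPOTIFY_LYRICS_TOKEN', ''),
    }

creds = load_credentials()

# Report which values still need to be filled in
missing = [name for name, value in creds.items() if not value]
if missing:
    print('Missing credentials:', ', '.join(missing))
```

The returned values can then be assigned to *spotifyCid*, *spotifySecret* and *lyricsAccessToken* as before.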

The following function retrieves an access token from the Spotify API using the Client Credentials Flow.

In [None]:
import requests

def credentials_retriever() -> str :
  # The url of the Api request
  url = 'https://accounts.spotify.com/api/token'

  # A header to describe the format of the data we are sending
  headers = {
      'Content-Type':'application/x-www-form-urlencoded'
  }

  # The data required to be sent to retrieve our credentials
  data = {
      'grant_type':'client_credentials',
      'client_id':spotifyCid,
      'client_secret':spotifySecret
  }

  # We here perform the post request and retrieve the response of the request
  response = requests.post(url, headers=headers, data=data)
  access_token = ''

  # We ensure here that the request was successful and store the access token
  if response.status_code == 200:
    response_data = response.json()
    access_token = response_data['access_token']

  return access_token
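The token response also carries an `expires_in` field (a lifetime in seconds), which matters because the full run over every genre can take a long time. A minimal sketch of how the expiry could be tracked is shown below; the helper names `parse_token_response` and `token_is_fresh` and the 60-second safety margin are assumptions made for illustration.

```python
import time

def parse_token_response(payload, now=None):
    """Return (access_token, expires_at) from a token response dict."""
    now = time.time() if now is None else now
    token = payload.get('access_token', '')
    # 'expires_in' is the token lifetime in seconds
    expires_at = now + float(payload.get('expires_in', 0))
    return token, expires_at

def token_is_fresh(expires_at, now=None, margin=60.0):
    """True while the token has more than `margin` seconds left."""
    now = time.time() if now is None else now
    return now < expires_at - margin
```

With this in place, a long-running loop could call `credentials_retriever()` again whenever `token_is_fresh` returns `False`.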

The next function retrieves the current top 10 tracks of an artist by querying the Spotify API. The track information is then used to request the lyrics of each song from the color-lyrics API.

In [None]:
import pandas as pd

def get_artist_songs(artist_id : str, access_token : str, genre : str) -> pd.DataFrame :
  # The url to retrieve the artist top tracks
  url = 'https://api.spotify.com/v1/artists/' + artist_id + '/top-tracks'

  # Through the header we send our access_token to the Api
  headers = {
    'Authorization': 'Bearer ' + access_token
  }

  # We perform the get request to retrieve all the songs
  search = requests.get(url, headers=headers)

  # This header makes the request look like it comes from a browser,
  # so that the color-lyrics api grants us access
  lyricsHeaders = {
      'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
      'App-platform': 'WebPlayer',
      'Authorization': 'Bearer ' + lyricsAccessToken
  }

  tracks = []
  artists = []
  lyrics = []
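  # An error response has no 'tracks' key, in which case we return no rows
  topTracks = search.json().get('tracks', [])

  # We loop through all the tracks to request their lyrics
  for x in topTracks:
    lyricsResponse = requests.get('https://spclient.wg.spotify.com/color-lyrics/v2/track/' + x['id'], headers=lyricsHeaders)
    # Skip tracks whose lyrics request failed or returned an empty body
    if lyricsResponse.status_code != 200 or not lyricsResponse.text:
      continue
    tracks.append(x['name'])
    artists.append(x['artists'][0]['name'])
    lyrics.append(lyricsResponse.text)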

  # Dataframe containing track data
  tracksDf = pd.DataFrame({
    'lyrics':lyrics,
    'track':tracks,
    'artist':artists,
    'genre':genres.index(genre)
  })

  return tracksDf
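Note that the `lyrics` column holds the raw response body, not plain text. Since the color-lyrics API is unofficial, its format is not documented; the sketch below assumes the body is JSON shaped like `{"lyrics": {"lines": [{"words": "..."}]}}` and returns an empty string for anything else. The helper name `extract_lyric_text` is hypothetical.

```python
import json

def extract_lyric_text(raw):
    """Join the 'words' of each lyric line into one newline-separated string.

    Assumes an (undocumented) payload of the form
    {"lyrics": {"lines": [{"words": "..."}, ...]}}.
    """
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return ''
    if not isinstance(payload, dict):
        return ''
    lyrics = payload.get('lyrics')
    if not isinstance(lyrics, dict):
        return ''
    lines = lyrics.get('lines') or []
    words = [str(line.get('words') or '') for line in lines if isinstance(line, dict)]
    # Drop empty lines, which this sketch treats as instrumental gaps
    return '\n'.join(w for w in words if w.strip())
```

It could be applied to the whole column with `tracksDf['lyrics'].map(extract_lyric_text)` before the text is used for classification or topic modeling.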

This short function searches for a given number of artists by genre and collects the lyrics of their top tracks.

In [None]:
def get_artists_from_genre(genre : str, access_token : str, limit : int=15,
                           offset : int=5, types : list[str] = ['artist'], market: str = 'GB') -> pd.DataFrame :
  # The url to perform a search on the Spotify Api
  url = 'https://api.spotify.com/v1/search'

  # Through the header we send our access_token to the Api
  headers = {
    'Authorization':'Bearer ' + access_token
  }

  # The search parameters for the Api request
  # A full description of each parameter can be found here: https://developer.spotify.com/documentation/web-api/reference/search
  params = {
    'q':'genre:' + genre,
    'type':','.join(types),
    'limit':limit,
    'offset':offset,
    'market':market,
  }

  # We perform the get request on the api and retrieve the response from the servers
  search = requests.get(url, headers=headers, params=params)

  ids = []
  df = pd.DataFrame(search.json())
  tracksDf = df['artists']
  # For each artist we store the artist id for later use
  for x in tracksDf['items']:
    ids.append(x['id'])

  finalTracksDf = pd.DataFrame()
  for id in ids:
    tracksDf = get_artist_songs(id, access_token, genre)
    finalTracksDf = pd.concat([finalTracksDf, tracksDf])

  return finalTracksDf
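The search endpoint returns results one page at a time, so collecting more artists than a single request allows means issuing several requests with increasing `offset` values. The sketch below only builds the parameter dictionaries for such a series of requests. The helper name `search_param_pages` is hypothetical, and the page size of 50 is an assumption based on the largest `limit` used in this notebook.

```python
def search_param_pages(genre, total, page_size=50, market='GB'):
    """Build one params dict per page needed to cover `total` artists."""
    pages = []
    for offset in range(0, total, page_size):
        pages.append({
            'q': 'genre:' + genre,
            'type': 'artist',
            # The last page may be smaller than the others
            'limit': min(page_size, total - offset),
            'offset': offset,
            'market': market,
        })
    return pages

for params in search_param_pages('rock', 120):
    print(params['offset'], params['limit'])
```

Each dictionary can be passed as `params` to `requests.get` in the same way as in the function above.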

Now that all the functions are defined, we can run each step in turn.

First we retrieve the access token used for the official API calls. Once we have it, for each genre, we get a list of artists and obtain the lyrics of each artist's top 10 tracks.

We then zip and export the result for later use.

There are many parameters you can adjust: for example, the number of artists to look for or the market in which the search is performed.

You can also change some of the API calls to fit the kind of information you are looking for, following the Spotify Web API documentation: https://developer.spotify.com/documentation/web-api.

In [None]:
# We retrieve the access_token
access_token = credentials_retriever()

# credentials_retriever returns an empty string on failure, so test for emptiness
if not access_token:
  print('Issue with access token retrieval')
  exit()

finalLyricsDf = pd.DataFrame()
# We are going to loop through each genre and retrieve lyrics
for genre in genres:
  lyricsDf = get_artists_from_genre(
    genre,
    access_token=access_token,
    limit=50,
    offset=0
  )
  finalLyricsDf = pd.concat([finalLyricsDf, lyricsDf])

# We will create a csv file containing all the lyrics and then compress it to facilitate download and transfer
compression_opts = dict(method='zip', archive_name='Spotify_Lyrics.csv')
finalLyricsDf.to_csv('Spotify_Lyrics.zip', index=False, compression=compression_opts)
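Because the `genre` column stores a position in the `genres` list and not a name, it can help to map it back when reading the export. The following is a minimal sketch using only the standard library, assuming the archive layout produced above (a single `Spotify_Lyrics.csv` member); the helper name `read_lyrics_archive` is hypothetical.

```python
import csv
import io
import zipfile

def read_lyrics_archive(zip_path, genres, member='Spotify_Lyrics.csv'):
    """Read the zipped CSV and add a 'genre_name' key to every row."""
    rows = []
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as handle:
            text = io.TextIOWrapper(handle, encoding='utf-8', newline='')
            for row in csv.DictReader(text):
                # The label may have been written as '3' or '3.0'
                row['genre_name'] = genres[int(float(row['genre']))]
                rows.append(row)
    return rows
```

For example, `read_lyrics_archive('Spotify_Lyrics.zip', genres)` would return a list of dictionaries with the keys `lyrics`, `track`, `artist`, `genre` and `genre_name`.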