#### danceability
Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. 

#### energy
Energy is a measure from 0.0 to 1.0 that represents the perceived intensity and activity of a track. Energetic tracks typically feel fast, loud, and noisy: death metal, for example, has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

#### tempo
The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
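Because BPM counts beats per 60 seconds, tempo is the reciprocal of the average beat duration: BPM = 60 / (mean beat length in seconds). For row 0 of the dataset, a tempo of 149.976 BPM corresponds to an average beat of about 0.4 s. The sketch below assumes we already have beat onset times in seconds (a hypothetical input: `data.csv` only stores the final `tempo` value), and it is not a description of how Spotify actually estimates tempo.

```python
def tempo_bpm(beat_times):
    """Estimate tempo in BPM from beat onset times given in seconds.

    Assumes at least two beats, sorted in ascending order.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beat times")
    # durations between consecutive beats
    intervals = [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]
    mean_beat = sum(intervals) / len(intervals)
    if mean_beat <= 0:
        raise ValueError("beat times must be strictly increasing")
    return 60.0 / mean_beat  # seconds per minute / seconds per beat


print(tempo_bpm([0.0, 0.5, 1.0, 1.5]))  # → 120.0
```

Since the intervals telescope, the mean interval is simply (last - first) / (n - 1), so only the first and last beat actually matter in this simple estimator.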

#### valence
A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
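Since valence is the only audio feature kept for the lyrics analysis below, a coarse positive/negative label is a natural first use of it. This is a minimal sketch: the 0.5 cut-off is an illustrative assumption, not part of Spotify's definition, and `mood_label` is a hypothetical helper.

```python
def mood_label(valence, threshold=0.5):
    """Map a valence score (0.0-1.0) to a coarse mood label.

    The 0.5 threshold is an illustrative assumption.
    """
    if not 0.0 <= valence <= 1.0:
        raise ValueError(f"valence must be in [0.0, 1.0], got {valence}")
    return "positive" if valence >= threshold else "negative"


# valence values taken from the first rows of data.csv
for title, v in [("I Put A Spell On You", 0.95), ("Broken Puppet - Original Mix", 0.119)]:
    print(title, mood_label(v))
```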

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tswift import Song
import re
from langdetect import detect
import pycountry

In [3]:
db_tracks=pd.read_csv('data.csv')

In [4]:
db_tracks.head(10)

| | acousticness | artists | danceability | duration_ms | energy | explicit | id | instrumentalness | key | liveness | loudness | mode | name | popularity | release_date | speechiness | tempo | valence | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.991 | ['Mamie Smith'] | 0.598 | 168333 | 0.224 | 0 | 0cS0A1fUEUd1EW3FcF8AEI | 0.000522 | 5 | 0.379 | -12.628 | 0 | Keep A Song In Your Soul | 12 | 1920 | 0.0936 | 149.976 | 0.634 | 1920 |
| 1 | 0.643 | ["Screamin' Jay Hawkins"] | 0.852 | 150200 | 0.517 | 0 | 0hbkKFIJm7Z05H8Zl9w30f | 0.0264 | 5 | 0.0809 | -7.261 | 0 | I Put A Spell On You | 7 | 1920-01-05 | 0.0534 | 86.889 | 0.95 | 1920 |
| 2 | 0.993 | ['Mamie Smith'] | 0.647 | 163827 | 0.186 | 0 | 11m7laMUgmOKqI3oYzuhne | 1.8e-05 | 0 | 0.519 | -12.098 | 1 | Golfing Papa | 4 | 1920 | 0.174 | 97.6 | 0.689 | 1920 |
| 3 | 0.000173 | ['Oscar Velazquez'] | 0.73 | 422087 | 0.798 | 0 | 19Lc5SfJJ5O1oaxY0fpwfh | 0.801 | 2 | 0.128 | -7.311 | 1 | True House Music - Xavier Santos & Carlos Gomi... | 17 | 1920-01-01 | 0.0425 | 127.997 | 0.0422 | 1920 |
| 4 | 0.295 | ['Mixe'] | 0.704 | 165224 | 0.707 | 1 | 2hJjbsLCytGsnAHfdsLejp | 0.000246 | 10 | 0.402 | -6.036 | 0 | Xuniverxe | 2 | 1920-10-01 | 0.0768 | 122.076 | 0.299 | 1920 |
| 5 | 0.996 | ['Mamie Smith & Her Jazz Hounds'] | 0.424 | 198627 | 0.245 | 0 | 3HnrHGLE9u2MjHtdobfWl9 | 0.799 | 5 | 0.235 | -11.47 | 1 | Crazy Blues - 78rpm Version | 9 | 1920 | 0.0397 | 103.87 | 0.477 | 1920 |
| 6 | 0.992 | ['Mamie Smith'] | 0.782 | 195200 | 0.0573 | 0 | 5DlCyqLyX2AOVDTjjkDZ8x | 2e-06 | 5 | 0.176 | -12.453 | 1 | Don't You Advertise Your Man | 5 | 1920 | 0.0592 | 85.652 | 0.487 | 1920 |
| 7 | 0.996 | ['Mamie Smith & Her Jazz Hounds'] | 0.474 | 186173 | 0.239 | 0 | 02FzJbHtqElixxCmrpSCUa | 0.186 | 9 | 0.195 | -9.712 | 1 | Arkansas Blues | 0 | 1920 | 0.0289 | 78.784 | 0.366 | 1920 |
| 8 | 0.996 | ['Francisco Canaro'] | 0.469 | 146840 | 0.238 | 0 | 02i59gYdjlhBmbbWhf8YuK | 0.96 | 8 | 0.149 | -18.717 | 1 | La Chacarera - Remasterizado | 0 | 1920-07-08 | 0.0741 | 130.06 | 0.621 | 1920 |
| 9 | 0.00682 | ['Meetya'] | 0.571 | 476304 | 0.753 | 0 | 06NUxS2XL3efRh0bloxkHm | 0.873 | 8 | 0.092 | -6.943 | 1 | Broken Puppet - Original Mix | 0 | 1920-01-01 | 0.0446 | 126.993 | 0.119 | 1920 |


# Let's keep only the columns we need

In [5]:
db = db_tracks[['name', 'artists', 'valence']].copy()

# 'artists' holds a stringified list such as "['Mamie Smith']": keep the text between the brackets
db['artist'] = db['artists'].str.extract(r"\[(.+)\]", expand=False)
# empty placeholder for the lyrics ("testo" is Italian for text/lyrics), filled in later
db['testo'] = ''
db = db[['testo', 'artist', 'name', 'valence']]

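The bracket regex keeps the inner quotes (`'Mamie Smith'`, `"Screamin' Jay Hawkins"`) and, for multi-artist tracks, the whole comma-separated list. Assuming the `artists` column always holds a Python-literal list, as the `head()` output suggests, `ast.literal_eval` can parse it instead; `first_artist` is a hypothetical helper that falls back to the same regex when parsing fails.

```python
import ast
import re


def first_artist(artists_field):
    """Return the first artist from a stringified list such as "['Mamie Smith']"."""
    try:
        artists = ast.literal_eval(artists_field)
    except (ValueError, SyntaxError):
        artists = None  # not a valid Python literal
    if isinstance(artists, (list, tuple)) and artists:
        return str(artists[0])
    # fallback: the bracket regex used above (keeps any inner quotes)
    match = re.search(r"\[(.+)\]", artists_field)
    return match.group(1) if match else artists_field


print(first_artist("['Mamie Smith']"))
print(first_artist('["Screamin\' Jay Hawkins"]'))
```

It could be applied with `db['artist'] = db_tracks['artists'].apply(first_artist)`. Unquoted names are probably closer to what a lyrics lookup expects, but that has not been verified against tswift here.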


In [6]:
db.head(10)

| | testo | artist | name | valence |
|---|---|---|---|---|
| 0 | | 'Mamie Smith' | Keep A Song In Your Soul | 0.634 |
| 1 | | "Screamin' Jay Hawkins" | I Put A Spell On You | 0.95 |
| 2 | | 'Mamie Smith' | Golfing Papa | 0.689 |
| 3 | | 'Oscar Velazquez' | True House Music - Xavier Santos & Carlos Gomi... | 0.0422 |
| 4 | | 'Mixe' | Xuniverxe | 0.299 |
| 5 | | 'Mamie Smith & Her Jazz Hounds' | Crazy Blues - 78rpm Version | 0.477 |
| 6 | | 'Mamie Smith' | Don't You Advertise Your Man | 0.487 |
| 7 | | 'Mamie Smith & Her Jazz Hounds' | Arkansas Blues | 0.366 |
| 8 | | 'Francisco Canaro' | La Chacarera - Remasterizado | 0.621 |
| 9 | | 'Meetya' | Broken Puppet - Original Mix | 0.119 |


# Let's get the lyrics

In [None]:
def getLyrics(track, artist):
    """Fetch the lyrics with tswift; return the sentinel "0" if anything goes wrong."""
    try:
        s = Song(track, artist)
        return s.lyrics
    except Exception:  # network errors, song not found, parsing failures
        return "0"

# call the function on every track (slow on the full dataset)
db['testo'] = [getLyrics(name, artist) for name, artist in zip(db['name'], db['artist'])]

# remove the tracks whose lyrics were not found
db = db[db['testo'] != '0']

# remove duplicates (drop_duplicates returns a new frame, so assign it back
# instead of combining it with inplace=True, which returns None)
db = db.drop_duplicates(subset=['testo'])

db = db.reset_index(drop=True)

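The fetch, filter, and de-duplicate steps above can be sketched without pandas or network access. In this minimal version, `collect_lyrics`, `FAKE_LYRICS`, and `fake_fetch` are hypothetical: the fake fetcher stands in for tswift and follows the same `"0"` sentinel convention as `getLyrics`. Like `drop_duplicates` with its default `keep='first'`, it keeps the first track that returned a given text.

```python
def collect_lyrics(tracks, fetch):
    """Fetch lyrics for (title, artist) pairs, dropping misses and duplicate texts.

    `fetch` is any callable returning the lyrics string, or "0" when nothing was found.
    """
    seen = set()
    kept = []
    for title, artist in tracks:
        lyrics = fetch(title, artist)
        if lyrics == "0" or lyrics in seen:
            continue  # not found, or identical text already kept
        seen.add(lyrics)
        kept.append((title, artist, lyrics))
    return kept


# hypothetical offline stand-in for tswift, so the sketch runs without network access
FAKE_LYRICS = {
    ("Golfing Papa", "Mamie Smith"): "la la la",
    ("Golfing Papa (Remastered)", "Mamie Smith"): "la la la",
}


def fake_fetch(title, artist):
    return FAKE_LYRICS.get((title, artist), "0")


tracks = [
    ("Golfing Papa", "Mamie Smith"),
    ("Golfing Papa (Remastered)", "Mamie Smith"),
    ("Arkansas Blues", "Mamie Smith & Her Jazz Hounds"),
]
print(collect_lyrics(tracks, fake_fetch))
# → [('Golfing Papa', 'Mamie Smith', 'la la la')]
```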


# Remove the non-English lyrics

In [None]:
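from langdetect import DetectorFactory, detect
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded

def is_english(text):
    """Return True if langdetect classifies the lyrics as English ('en')."""
    if not isinstance(text, str) or text.strip() in ('', '<>'):
        return False  # missing, empty or placeholder lyrics
    try:
        # compare the code directly: codes such as 'zh-cn' have no pycountry alpha_2 entry
        return detect(text) == 'en'
    except LangDetectException:
        return False  # no detectable features (e.g. only digits or punctuation)

mask = db['testo'].apply(is_english)
print(f"dropping {(~mask).sum()} of {len(db)} tracks")
db = db[mask].reset_index(drop=True)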
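Lyrics that mix languages can still come out as `'en'` from `detect`. One option, sketched here under stated assumptions, is to require a minimum probability for English. `is_confidently_english` is a hypothetical helper that takes plain `(code, probability)` pairs, for example `[(l.lang, l.prob) for l in detect_langs(text)]` built from langdetect's `detect_langs`; the 0.9 cut-off is arbitrary.

```python
def is_confidently_english(candidates, min_prob=0.9):
    """Decide from (language_code, probability) pairs whether a text is English.

    min_prob=0.9 is an arbitrary, hypothetical cut-off.
    """
    return any(code == "en" and prob >= min_prob for code, prob in candidates)


print(is_confidently_english([("en", 0.999)]))               # → True
print(is_confidently_english([("en", 0.57), ("it", 0.43)]))  # → False
```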

In [None]:
db.to_csv("CleanEnglishLyrics.csv", index=False)  # index=False avoids an extra "Unnamed: 0" column on reload