# Pop and Country Songs Analysis:

### Background:

Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. MIR is a small but growing field of research with many real-world applications.

There is a substantial literature on song genres and on the characteristics that define songs and genres.
For this project we will specifically look at three song features, Arousal, Valence, and Depth, from the write-up by [McGill University](https://phys.org/news/2016-05-scientists-categorize-music.html). The researchers found that the characteristics of a single genre could be grouped into these same three categories.

__The following song features or characteristics were defined as:__<br>
<br>Arousal - describes intensity and energy in music
<br>Valence - describes the spectrum of emotions in music
<br>Depth - describes intellect and sophistication in music

<br>For this project, we will extract Spotify features and lyrics text from Genius. The following data will be used:

- Energy
- Valence
- Flesch-Kincaid Grade: computed from the song lyrics text. We will use this grade as our 'Intellect' feature.
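The Flesch-Kincaid Grade Level is 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. Below is a minimal sketch of computing it from lyrics. Two assumptions here are not from the original notebooks: lyrics rarely have sentence punctuation, so each non-empty lyric line is treated as one sentence, and syllables are estimated by counting vowel groups, which is only approximate. The function names are hypothetical; a library such as `textstat` (`textstat.flesch_kincaid_grade`) is an alternative.

```python
import re

# a word is a run of letters, optionally with apostrophes inside (e.g. "I'm")
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def count_syllables(word):
    """Rough syllable estimate: the number of vowel groups (a, e, i, o, u, y)."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    # treat a final 'e' as silent (e.g. 'love') when the word has other vowel groups
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(1, groups)


def flesch_kincaid_grade(lyrics):
    """Flesch-Kincaid Grade Level, assuming each non-empty lyric line is one sentence."""
    sentences = [line for line in lyrics.splitlines() if line.strip()]
    words = WORD.findall(lyrics)
    if not sentences or not words:
        return None  # nothing to score
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


print(round(flesch_kincaid_grade("I love you\nYou love me"), 2))  # prints -2.62
```

Very simple, short-lined lyrics can score below zero, so the grade is best read as a relative measure between songs rather than an actual school grade.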

For this project, we will analyze songs that have been tagged with the word 'pop' or 'country' in the Million Song Database (https://labrosa.ee.columbia.edu/millionsong/).

Finally, we will specifically explore whether there are differences between Taylor Swift's country songs from before her move to pop and her pop songs.


### Problem Definition:

- How do country and pop music differ?  
- Are there measurable differences between the two music genres? 
- Can we find patterns (in audio features and in lyric themes) that differentiate these two genres?







### Limitations
- No specific criteria were found for deriving the Arousal and Intellect song characteristics. For this project we will use Energy (and possibly Loudness) to derive the Arousal feature, and the Flesch-Kincaid Grade Level for Intellect.

- Since we rely on the tags from the Million Song Database to filter for pop and country songs, the data may be noisy: rock songs, or songs from other genres, may have been given incorrect tags. These mis-tagged songs should become visible when we cluster the songs.  
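One possible way to combine Energy and Loudness into a single Arousal score is sketched below. The equal weighting and the function name `arousal_score` are hypothetical, not from the original. Spotify's `energy` is already on a 0 to 1 scale while `loudness` is in decibels, so loudness is min-max scaled to 0 to 1 before the two are averaged.

```python
import pandas as pd


def arousal_score(df):
    """Hypothetical Arousal score: mean of energy and min-max scaled loudness."""
    loudness = df["loudness"]
    span = loudness.max() - loudness.min()
    if span:
        loud_scaled = (loudness - loudness.min()) / span  # quietest -> 0, loudest -> 1
    else:
        loud_scaled = pd.Series(0.0, index=df.index)  # all songs equally loud
    return (df["energy"] + loud_scaled) / 2


songs = pd.DataFrame({"energy": [0.2, 0.8], "loudness": [-20.0, -5.0]})
print(arousal_score(songs).round(2).tolist())  # [0.1, 0.9]
```

Because the scaling depends on the quietest and loudest song in the frame, scores are only comparable within one dataset; scaling pop and country together keeps them on the same footing.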



 



### Acquiring Data Set:

Data sets were acquired from the following sources:

(1) To get the list of artists, songs, and genre tags, records were extracted from the Million Song Database:

https://labrosa.ee.columbia.edu/millionsong/ 

(2) To get the audio features of the songs, the Spotify API was used: given the artist name and song title, the Spotify track ID can be retrieved, which can then be used to query the audio features for the song.
<br><br>Spotify audio features are:
- acousticness
- danceability
- duration_ms
- energy
- instrumentalness
- key
- liveness
- loudness
- mode
- speechiness
- tempo
- time_signature
- valence

(3) To get the lyrics of the songs, the Genius API was used to search for each song by artist name and song title.
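The lookups in steps (2) and (3) can be sketched as small helpers that work on already-parsed JSON responses, so they can be tried without API credentials. This assumes the response shapes of Spotify's track search (`tracks.items[].id`) and of the Genius `/search` endpoint (`response.hits[].result.primary_artist.name` and `result.url`); the helper names are hypothetical and not taken from getFeatures.ipynb or getLyrics.ipynb. With the spotipy client, for example, the search response would come from `sp.search(q=query, type='track', limit=1)` and the features from `sp.audio_features([track_id])`; note that Spotify restricted the audio-features endpoint for new applications in late 2024.

```python
def spotify_query(artist, title):
    """Build a Spotify search query using the track: and artist: field filters."""
    return 'track:"{}" artist:"{}"'.format(title, artist)


def first_track_id(search_response):
    """Return the ID of the top track in a parsed Spotify search response, or None."""
    items = search_response.get("tracks", {}).get("items", [])
    return items[0]["id"] if items else None


def genius_song_url(search_response, artist):
    """Return the page URL of the first Genius hit whose primary artist matches, or None."""
    wanted = artist.strip().lower()
    for hit in search_response.get("response", {}).get("hits", []):
        result = hit.get("result", {})
        name = result.get("primary_artist", {}).get("name", "")
        if name.strip().lower() == wanted:
            return result.get("url")
    return None


# made-up responses, trimmed to the fields the helpers read
spotify_resp = {"tracks": {"items": [{"id": "abc123", "name": "Halo"}]}}
genius_resp = {"response": {"hits": [
    {"result": {"primary_artist": {"name": "Someone Else"}, "url": "https://example.com/song-a"}},
    {"result": {"primary_artist": {"name": "Beyoncé"}, "url": "https://example.com/song-b"}},
]}}

print(spotify_query("Beyoncé", "Halo"))         # track:"Halo" artist:"Beyoncé"
print(first_track_id(spotify_resp))              # abc123
print(genius_song_url(genius_resp, "beyoncé"))   # https://example.com/song-b
```

The Genius API returns song metadata and a page URL rather than the lyrics text itself, so the lyrics still have to be read from that page. Checking the artist name guards against a cover or a same-titled song ranking first, although exact matching will miss spelling variants such as "Beyonce" without the accent.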


The following notebooks show the processing for each of these data acquisition steps:
- getArtistSong.ipynb: get artists/songs from the Million Song Database
- getFeatures.ipynb: get audio features via the Spotify API
- getLyrics.ipynb: get song lyrics via the Genius API

These notebooks were based on the following open-source code:


### Cleaning The Data

1) Combine the song data from the Million Song Database with the Spotify features and the lyrics collected from Genius.
   Only keep records that have both Spotify features and lyrics text from Genius.
   
   
2) The Spotify features were extracted in JSON format. The records were reformatted to create a column for every Spotify feature attribute.

3) Added a column 'genre' derived from the tags of each song record. A song was classified as 'pop' if tags containing 'pop' occurred most often, 'country' if tags containing 'country' occurred most often, and 'other' otherwise. This column is only used for verifying output; it cannot provide a reliable accuracy score because songs can be mis-tagged, e.g. Beyoncé's song records are tagged with 'country' instead of 'pop'.

4) Lyrics text was cleaned to remove section tags, e.g. [Verse], [Chorus].
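Step 4 can be done with a regular expression that removes anything in square brackets. A minimal sketch, assuming section tags are always enclosed in `[...]` as in Genius lyrics; the name `clean_lyrics` is hypothetical, and it also drops the blank lines the tags leave behind.

```python
import re

# matches a bracketed section tag such as [Verse 1], [Chorus] or [Bridge: Artist]
SECTION_TAG = re.compile(r"\[[^\]]*\]")


def clean_lyrics(text):
    """Remove bracketed section tags and the blank lines they leave behind."""
    without_tags = SECTION_TAG.sub("", text)
    lines = (line.strip() for line in without_tags.splitlines())
    return "\n".join(line for line in lines if line)


raw = "[Verse 1]\nHello there\n\n[Chorus]\nLa la la"
print(clean_lyrics(raw))
```

Applied to a whole column it would be `pop_DF['lyrics_text'].map(clean_lyrics)`, after rows with null lyrics have been dropped.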





In [2]:
import pandas as pd
import operator

In [1]:
def getGenre(df):
    """ This function reads and counts through the tags of a song record, and sets the genre to the type (pop, country
        or other) with the most counts. It returns the dataframe with these new columns added:
        genre: country, pop or other
        country_count: number of tags containing 'country'
        pop_count: number of tags containing 'pop'
        other_count: number of tags containing neither """
    
    for row in df.itertuples():
     
        rtagcount = {'country': 0, 'pop': 0, 'other': 0}
        
        for tag in row.tags:
            #each tag is a one-element tuple, e.g. ('pop rock',)
            str1 = tag[0].strip().lower()
            
            #substring match: 'synthpop' counts as pop, and a tag containing
            #both words (e.g. 'country pop') is counted as pop only
            if 'pop' in str1:
                rtagcount['pop'] += 1
            elif 'country' in str1:
                rtagcount['country'] += 1
            else:
                rtagcount['other'] += 1

        #set the new columns by row label (row.Index) rather than position, so the right
        #rows are updated even after rows were dropped (set_value was removed in pandas 1.0).
        #Ties resolve in dict order (country, then pop), so a song with no tags is 'country'
        df.loc[row.Index, 'genre'] = max(rtagcount, key=rtagcount.get)
        df.loc[row.Index, 'country_count'] = rtagcount['country']
        df.loc[row.Index, 'pop_count'] = rtagcount['pop']
        df.loc[row.Index, 'other_count'] = rtagcount['other']
    
    return df
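The same labelling rule can also be expressed without writing to the DataFrame cell by cell: count the tags per song with a plain function, expand the counts into columns, and take the column with the highest count. This is a sketch of an alternative, not what the notebook ran; `count_tags` and `add_genre_columns` are hypothetical names, and it assumes, like getGenre, that each `tags` entry is a list of one-element tuples.

```python
import pandas as pd


def count_tags(tags):
    """Count tags containing 'pop', else 'country', else neither (the getGenre rule)."""
    counts = {"country": 0, "pop": 0, "other": 0}
    for tag in tags:
        text = tag[0].strip().lower()  # each tag is a one-element tuple, e.g. ('pop rock',)
        if "pop" in text:
            counts["pop"] += 1
        elif "country" in text:
            counts["country"] += 1
        else:
            counts["other"] += 1
    return counts


def add_genre_columns(df):
    """Return a copy of df with country_count, pop_count, other_count and genre columns."""
    counts = pd.DataFrame(df["tags"].map(count_tags).tolist(), index=df.index)
    out = df.join(counts.add_suffix("_count"))
    # idxmax picks the first column with the highest count, so ties resolve in
    # column order (country, pop, other), as in getGenre
    out["genre"] = counts.idxmax(axis=1)
    return out


songs = pd.DataFrame({"tags": [
    [("pop rock",), ("pop",)],
    [("country",)],
    [("rock",), ("country pop",), ("folk",)],
]})
print(add_genre_columns(songs)[["country_count", "pop_count", "other_count", "genre"]])
```

Here the third song is labelled 'other' because two of its three tags contain neither word, even though one tag mentions both country and pop.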

#### Gather and combine the extracted song feature and lyrics files

In [16]:
#Read the file with Spotify Features
pop_Features_A = pd.read_pickle('../data/msongs/out/popFeatures_1106_0to20000.p')
pop_Features_B = pd.read_pickle('../data/msongs/out/popFeatures_1107_20to40000.p')
#Read the file with Genius Lyrics
pop_Lyrics_A = pd.read_pickle('../data/msongs/out/DFPopLyrics_110618_0to20000.p')
pop_Lyrics_B = pd.read_pickle('../data/msongs/out/DFPopLyrics_110718_20to40000.p')

In [17]:
popF_A = pop_Features_A[0:20000].copy()
popF_B = pop_Features_B[20000:40000].copy()

popL_A = pop_Lyrics_A[0:20000].copy()
popL_B = pop_Lyrics_B[20000:40000].copy()

In [22]:
# concatenate the two feature batches
# (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
popF_DF = pd.concat([popF_A, popF_B], ignore_index=True)

In [28]:
# concatenate the two lyrics batches
# (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
popL_DF = pd.concat([popL_A, popL_B], ignore_index=True)

In [32]:
null_Lyrics_idx = popL_DF.index[popL_DF['lyrics_text'].isnull()]
popL_DF.drop(null_Lyrics_idx, inplace=True)

null_Features_idx = popF_DF.index[popF_DF['songFeatures'].isnull()]
popF_DF.drop(null_Features_idx, inplace=True)

#### Combine records where song is in both the Lyrics File and the Features File

In [154]:
pop_DF = popL_DF.merge(popF_DF, on=['artist_id','track_id', 'song_id'], how='inner').reset_index()
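Before relying on the inner join, it can help to see how many records fail to match. A sketch using `merge(..., how='outer', indicator=True)`, which adds a `_merge` column labelling each row `both`, `left_only` or `right_only`; the tiny frames below are made-up stand-ins for `popL_DF` and `popF_DF`.

```python
import pandas as pd

keys = ["artist_id", "track_id", "song_id"]

# made-up stand-ins for the lyrics and features tables
lyrics = pd.DataFrame({"artist_id": ["A1", "A1", "A2"],
                       "track_id": ["T1", "T2", "T3"],
                       "song_id": ["S1", "S2", "S3"],
                       "lyrics_text": ["la", "da", "na"]})
features = pd.DataFrame({"artist_id": ["A1", "A3"],
                         "track_id": ["T1", "T4"],
                         "song_id": ["S1", "S4"],
                         "songFeatures": [[{"energy": 0.5}], [{"energy": 0.7}]]})

audit = lyrics.merge(features, on=keys, how="outer", indicator=True)
# both: kept by the inner join; left_only: lyrics without features; right_only: the reverse
counts = audit["_merge"].value_counts()
print(counts)

matched = lyrics.merge(features, on=keys, how="inner")
print(len(matched))  # 1
```

Passing `validate="one_to_one"` to the inner merge would additionally raise an error if either file contained duplicate key combinations, which would otherwise silently multiply rows.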

In [155]:
pop_DF.columns

Index(['index', 'index_x', 'artist_id', 'tags_x', 'track_id', 'title_x',
       'song_id', 'release_x', 'artist_mbid_x', 'artist_name_x', 'duration_x',
       'artist_familiarity_x', 'artist_hotttnesss_x', 'year_x',
       'track_7digitalid_x', 'shs_perf_x', 'shs_work_x', 'lyrics_text',
       'index_y', 'tags_y', 'title_y', 'release_y', 'artist_mbid_y',
       'artist_name_y', 'duration_y', 'artist_familiarity_y',
       'artist_hotttnesss_y', 'year_y', 'track_7digitalid_y', 'shs_perf_y',
       'shs_work_y', 'spotifyURI', 'songFeatures'],
      dtype='object')

In [156]:
#### add features
#### clean column id names to remove _x

#### Format Spotify Features and add columns to the Pop DataFrame

In [157]:
for row in pop_DF.itertuples():
    #save each Spotify feature as its own column, addressed by the row label
    #(DataFrame.set_value was removed in pandas 1.0; .loc replaces it)
    try:
        features = row.songFeatures[0]
        for key, value in features.items():
            pop_DF.loc[row.Index, key] = value
    except (TypeError, AttributeError, IndexError):
        #e.g. Spotify returned [None] instead of a features dict for this track
        print('error processing features for row {}'.format(row.Index))

error processing features for row 10557
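The row-by-row loop above can also be replaced by building one DataFrame from the feature dictionaries and joining it back, which avoids the per-cell writes. A sketch, assuming, as the loop does, that each `songFeatures` value is a list whose first element is the Spotify audio-features dict (or `None` when Spotify had no features); `expand_features` is a hypothetical helper. Rows without features get NaN in every feature column, so the `valence` null check that follows still removes them.

```python
import pandas as pd


def expand_features(df, col="songFeatures"):
    """Join the first audio-features dict of each row as separate columns."""
    def first_dict(value):
        # keep the dict only when the value is a non-empty list starting with a dict
        if isinstance(value, list) and value and isinstance(value[0], dict):
            return value[0]
        return {}

    feats = pd.DataFrame(df[col].map(first_dict).tolist(), index=df.index)
    return df.join(feats)  # raises if a feature key already exists as a column


songs = pd.DataFrame({"title": ["Halo", "Poison"],
                      "songFeatures": [[{"energy": 0.5, "valence": 0.3}], [None]]})
out = expand_features(songs)
print(out[["title", "energy", "valence"]])
```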


In [158]:
none_Features_idx = pop_DF.index[pop_DF['valence'].isnull()]
pop_DF.drop(none_Features_idx, inplace=True)

In [159]:
pop_DF = pop_DF[['index', 'artist_id', 'tags_x', 'track_id', 'title_x',
       'song_id', 'release_x', 'artist_mbid_x', 'artist_name_x', 'duration_x',
       'artist_familiarity_x', 'artist_hotttnesss_x', 'year_x',
       'track_7digitalid_x', 'shs_perf_x', 'shs_work_x', 'lyrics_text','spotifyURI', 'songFeatures', 'danceability', 'energy',
       'key', 'loudness', 'mode', 'speechiness', 'acousticness',
       'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri',
       'track_href', 'analysis_url', 'duration_ms', 'time_signature']].copy()

In [160]:
# Rename columns to remove the _x suffix added by the merge
# (assign a new list rather than editing columns.values in place)
pop_DF.columns = [col[:-2] if col.endswith('_x') else col for col in pop_DF.columns]

In [161]:
pop_DF.columns

Index(['index', 'artist_id', 'tags', 'track_id', 'title', 'song_id', 'release',
       'artist_mbid', 'artist_name', 'duration', 'artist_familiarity',
       'artist_hotttnesss', 'year', 'track_7digitalid', 'shs_perf', 'shs_work',
       'lyrics_text', 'spotifyURI', 'songFeatures', 'danceability', 'energy',
       'key', 'loudness', 'mode', 'speechiness', 'acousticness',
       'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri',
       'track_href', 'analysis_url', 'duration_ms', 'time_signature'],
      dtype='object')

In [162]:
pop_DF = getGenre(pop_DF)

In [163]:
pop_DF.columns

Index(['index', 'artist_id', 'tags', 'track_id', 'title', 'song_id', 'release',
       'artist_mbid', 'artist_name', 'duration', 'artist_familiarity',
       'artist_hotttnesss', 'year', 'track_7digitalid', 'shs_perf', 'shs_work',
       'lyrics_text', 'spotifyURI', 'songFeatures', 'danceability', 'energy',
       'key', 'loudness', 'mode', 'speechiness', 'acousticness',
       'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri',
       'track_href', 'analysis_url', 'duration_ms', 'time_signature', 'genre',
       'country_count', 'pop_count', 'other_count'],
      dtype='object')

In [164]:
pop_DF.head()

|  | index | artist_id | tags | track_id | title | song_id | release | artist_mbid | artist_name | duration | ... | id | uri | track_href | analysis_url | duration_ms | time_signature | genre | country_count | pop_count | other_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | AR009211187B989185 | [(pop rock,), (pop,), (synthpop,), (reggae pop,)] | TRDBNUI128F933DE6E | I'm So Sorry | SOZCBYK12AB0180B4D | The Best Of Original British Lovers Rock Volum... | 9dfe78a6-6d91-454e-9b95-9d7722cbc476 | Carroll Thompson | 260.91057 | ... | 19mWMfe0fX4lm3V4fXkyA6 | spotify:track:19mWMfe0fX4lm3V4fXkyA6 | https://api.spotify.com/v1/tracks/19mWMfe0fX4l... | https://api.spotify.com/v1/audio-analysis/19mW... | 426520.0 | 4.0 | pop | 0.0 | 4.0 | 0.0 |
| 1 | 1.0 | AR00FOZ1187FB5C9F3 | [(synthpop,), (pop,), (pukkelpop,)] | TRWTSUW12903CD2DEA | Stay The Same | SOBGFYV12AB018DEAD | Stay The Same | 92337972-f0c5-4ebd-be8c-f6b23d596ae1 | autoKratz | 404.97587 | ... | 1wZY4LPJsABndQeoqzsyOX | spotify:track:1wZY4LPJsABndQeoqzsyOX | https://api.spotify.com/v1/tracks/1wZY4LPJsABn... | https://api.spotify.com/v1/audio-analysis/1wZY... | 283333.0 | 4.0 | pop | 0.0 | 3.0 | 0.0 |
| 2 | 2.0 | AR00FOZ1187FB5C9F3 | [(synthpop,), (pop,), (pukkelpop,)] | TRNZCDV128F92F00A1 | Stay The Same | SOYUZXU12A58A78201 | Animal | 92337972-f0c5-4ebd-be8c-f6b23d596ae1 | autoKratz | 311.06567 | ... | 1gZ4TP1pQwRD5WhYKDG0jw | spotify:track:1gZ4TP1pQwRD5WhYKDG0jw | https://api.spotify.com/v1/tracks/1gZ4TP1pQwRD... | https://api.spotify.com/v1/audio-analysis/1gZ4... | 311200.0 | 4.0 | pop | 0.0 | 3.0 | 0.0 |
| 3 | 3.0 | AR00FOZ1187FB5C9F3 | [(synthpop,), (pop,), (pukkelpop,)] | TRPGUDC12903CD2DEC | Stay The Same | SOBZAGY12AB018E2A5 | Stay The Same | 92337972-f0c5-4ebd-be8c-f6b23d596ae1 | autoKratz | 277.73342 | ... | 26wPSNT05P58sCin1svhtd | spotify:track:26wPSNT05P58sCin1svhtd | https://api.spotify.com/v1/tracks/26wPSNT05P58... | https://api.spotify.com/v1/audio-analysis/26wP... | 291000.0 | 4.0 | pop | 0.0 | 3.0 | 0.0 |
| 4 | 4.0 | AR00FOZ1187FB5C9F3 | [(synthpop,), (pop,), (pukkelpop,)] | TRTMPTG128F92F00A0 | Always More | SOUQQXB12A8C140687 | Animal | 92337972-f0c5-4ebd-be8c-f6b23d596ae1 | autoKratz | 255.26812 | ... | 1gZ4TP1pQwRD5WhYKDG0jw | spotify:track:1gZ4TP1pQwRD5WhYKDG0jw | https://api.spotify.com/v1/tracks/1gZ4TP1pQwRD... | https://api.spotify.com/v1/audio-analysis/1gZ4... | 311200.0 | 4.0 | pop | 0.0 | 3.0 | 0.0 |


In [166]:
### Save pre-processed Pop File

pop_DF.to_pickle('../data/msongs/out/Pop_Preprocessed_1109')

#### Process other batches of files

In [1]:
### Process Country features and Lyrics Files

In [167]:
#Read the file with Spotify Features
country_Features = pd.read_pickle('countryFeatures.p')

In [168]:
#Read the file with Genius Lyrics
country_Lyrics = pd.read_pickle('countryLyrics_1105.p')

In [169]:
null_Lyrics_idx = country_Lyrics.index[country_Lyrics['lyrics_text'].isnull()]
country_Lyrics.drop(null_Lyrics_idx, inplace=True)

In [170]:
len(country_Lyrics)

27579

In [171]:
null_Features_idx = country_Features.index[country_Features['songFeatures'].isnull()]
country_Features.drop(null_Features_idx, inplace=True)

In [172]:
len(country_Features)

23312

In [173]:
country_DF = country_Lyrics.merge(country_Features, on=['artist_id','track_id', 'song_id'], how='inner')

In [174]:
len(country_DF)

13006

In [175]:
country_DF.columns

Index(['index_x', 'artist_id', 'tags_x', 'track_id', 'title_x', 'song_id',
       'release_x', 'artist_mbid_x', 'artist_name_x', 'duration_x',
       'artist_familiarity_x', 'artist_hotttnesss_x', 'year_x',
       'track_7digitalid_x', 'shs_perf_x', 'shs_work_x', 'lyrics_text',
       'index_y', 'tags_y', 'title_y', 'release_y', 'artist_mbid_y',
       'artist_name_y', 'duration_y', 'artist_familiarity_y',
       'artist_hotttnesss_y', 'year_y', 'track_7digitalid_y', 'shs_perf_y',
       'shs_work_y', 'spotifyURI', 'songFeatures'],
      dtype='object')

- format features to create new columns - done
- create new features, e.g. intelligence, emotions
- run clustering against features
- look for patterns
- run PCA
- set labels
- get NLP features for lyric processing
- clean text data, filter for English songs only

In [176]:
#Reset index of DataFrame prior to processing
country_DF.reset_index(inplace=True)

In [177]:
country_DF.columns

Index(['index', 'index_x', 'artist_id', 'tags_x', 'track_id', 'title_x',
       'song_id', 'release_x', 'artist_mbid_x', 'artist_name_x', 'duration_x',
       'artist_familiarity_x', 'artist_hotttnesss_x', 'year_x',
       'track_7digitalid_x', 'shs_perf_x', 'shs_work_x', 'lyrics_text',
       'index_y', 'tags_y', 'title_y', 'release_y', 'artist_mbid_y',
       'artist_name_y', 'duration_y', 'artist_familiarity_y',
       'artist_hotttnesss_y', 'year_y', 'track_7digitalid_y', 'shs_perf_y',
       'shs_work_y', 'spotifyURI', 'songFeatures'],
      dtype='object')

#### Format songFeatures retrieved from Spotify to create new feature columns

In [178]:
for row in country_DF.itertuples():
    #save each Spotify feature as its own column, addressed by the row label
    #(DataFrame.set_value was removed in pandas 1.0; .loc replaces it)
    try:
        features = row.songFeatures[0]
        for key, value in features.items():
            country_DF.loc[row.Index, key] = value
    except (TypeError, AttributeError, IndexError):
        #e.g. Spotify returned [None] instead of a features dict for this track
        print('error processing features for row {}'.format(row.Index))
    

error processing features for row 772
error processing features for row 773
error processing features for row 774
error processing features for row 775
error processing features for row 776
error processing features for row 777
error processing features for row 778
error processing features for row 779
error processing features for row 780
error processing features for row 781
error processing features for row 782
error processing features for row 783
error processing features for row 784
error processing features for row 785
error processing features for row 786
error processing features for row 787
error processing features for row 788
error processing features for row 789
error processing features for row 790
error processing features for row 791
error processing features for row 792
error processing features for row 794
error processing features for row 795
error processing features for row 796
error processing features for row 797
error processing features for row 798
error proces

In [179]:
none_Features_idx = country_DF.index[country_DF['valence'].isnull()]
country_DF.drop(none_Features_idx, inplace=True)

In [180]:
dfCountry = country_DF[['index', 'artist_id', 'tags_x', 'track_id', 'title_x',
       'song_id', 'release_x', 'artist_mbid_x', 'artist_name_x', 'duration_x',
       'artist_familiarity_x', 'artist_hotttnesss_x', 'year_x',
       'track_7digitalid_x', 'shs_perf_x', 'shs_work_x', 'lyrics_text','spotifyURI', 'songFeatures', 'danceability', 'energy',
       'key', 'loudness', 'mode', 'speechiness', 'acousticness',
       'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri',
       'track_href', 'analysis_url', 'duration_ms', 'time_signature']].copy()

In [181]:
# Rename columns to remove the _x suffix added by the merge
# (assign a new list rather than editing columns.values in place)
dfCountry.columns = [col[:-2] if col.endswith('_x') else col for col in dfCountry.columns]

In [182]:
dfCountry.columns

Index(['index', 'artist_id', 'tags', 'track_id', 'title', 'song_id', 'release',
       'artist_mbid', 'artist_name', 'duration', 'artist_familiarity',
       'artist_hotttnesss', 'year', 'track_7digitalid', 'shs_perf', 'shs_work',
       'lyrics_text', 'spotifyURI', 'songFeatures', 'danceability', 'energy',
       'key', 'loudness', 'mode', 'speechiness', 'acousticness',
       'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri',
       'track_href', 'analysis_url', 'duration_ms', 'time_signature'],
      dtype='object')

In [183]:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(dfCountry[dfCountry.year > 2005][['artist_id','artist_name']].drop_duplicates().sort_values(by='artist_name'))

                artist_id                          artist_name
124    AR08IOK1187FB5B78E                           A.A. Bondy
9125   ARCN61O1187B9ABBE8                        Aaron Neville
8619   ARBZHTB1187FB3A725                         Aaron Watson
70     AR02T3I1187FB4D0E5                            Aberfeldy
5357   AR7DO6G1187FB41056                           Adam Faith
3988   AR60LUY1187B98C8CD                           Afterhours
10032  ARDUFS01187FB4CF5E                          Against Me!
1467   AR2BS1V1187B9AA66D                         Akron/Family
5676   AR80BY41187B98DECF                          Albert King
3234   AR4TJDF1187FB59D3D                           Albert Pla
8419   ARBOBNN1187B98E706                           Aled Jones
12712  ARGMKGE1187FB44E41                         Alesha Dixon
550    AR0TZ1F1187FB4AD1E                      Alexandre Pires
1022   AR1PYXY1187FB42576                        Alexi Murdoch
10230  ARE2XN81187B98BB83                            Al

In [187]:
dfCountry = dfCountry[pd.notnull(dfCountry.tags)].reset_index(drop=True)


#### Populate the genre, country_count, pop_count and other_count columns

In [2]:
#call getGenre to populate features and counts

dfCountry = getGenre(dfCountry)


In [189]:
dfCountry.head()

|  | level_0 | index | artist_id | tags | track_id | title | song_id | release | artist_mbid | artist_name | ... | id | uri | track_href | analysis_url | duration_ms | time_signature | genre | country_count | pop_count | other_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | AR00B1I1187FB433EB | [(country rock,)] | TRHYMAO128E0792260 | Crashing Down | SOAHIFF12A6701E01B | Sub Rosa | 4a5777b3-f55b-437c-8b23-d9ee7791c7fc | Eagle-Eye Cherry | ... | 30SxJBAOgFVIT7ZsWNhA0T | spotify:track:30SxJBAOgFVIT7ZsWNhA0T | https://api.spotify.com/v1/tracks/30SxJBAOgFVI... | https://api.spotify.com/v1/audio-analysis/30Sx... | 322067.0 | 4.0 | country | 1.0 | 0.0 | 0.0 |
| 1 | 1 | 1 | AR00B1I1187FB433EB | [(country rock,)] | TRQCBTB128E079225F | Feels So Right | SONAEPC12A6701E01A | Sub Rosa | 4a5777b3-f55b-437c-8b23-d9ee7791c7fc | Eagle-Eye Cherry | ... | 3rEXei8vuirQ6V3pzzDafE | spotify:track:3rEXei8vuirQ6V3pzzDafE | https://api.spotify.com/v1/tracks/3rEXei8vuirQ... | https://api.spotify.com/v1/audio-analysis/3rEX... | 261040.0 | 4.0 | country | 1.0 | 0.0 | 0.0 |
| 2 | 2 | 2 | AR00J9R1187B98D920 | [(country gospel,), (country,)] | TRBUNNH128F932E25F | Space Pt. 5 | SOXMPDS12AB0181393 | Time Is Running Out | 41da462e-e441-4bc0-952b-5a66387df5be | Farm Fresh | ... | 2dw9TO8A7YSwfP8gDjeZAP | spotify:track:2dw9TO8A7YSwfP8gDjeZAP | https://api.spotify.com/v1/tracks/2dw9TO8A7YSw... | https://api.spotify.com/v1/audio-analysis/2dw9... | 285067.0 | 4.0 | country | 2.0 | 0.0 | 0.0 |
| 3 | 3 | 3 | AR00J9R1187B98D920 | [(country gospel,), (country,)] | TRBLRQM128F932E231 | Time Is Running Out | SODLFKY12AB0187BD6 | Time Is Running Out | 41da462e-e441-4bc0-952b-5a66387df5be | Farm Fresh | ... | 4JAYSrK78qTC8u3pE0qlGC | spotify:track:4JAYSrK78qTC8u3pE0qlGC | https://api.spotify.com/v1/tracks/4JAYSrK78qTC... | https://api.spotify.com/v1/audio-analysis/4JAY... | 222773.0 | 4.0 | country | 2.0 | 0.0 | 0.0 |
| 4 | 4 | 4 | AR00J9R1187B98D920 | [(country gospel,), (country,)] | TRFAMCY128F932E22A | Fresh As That | SOFSYJL12AB0187BBC | Time Is Running Out | 41da462e-e441-4bc0-952b-5a66387df5be | Farm Fresh | ... | 3CIrY2AXTQ7JLd6XMSG2v1 | spotify:track:3CIrY2AXTQ7JLd6XMSG2v1 | https://api.spotify.com/v1/tracks/3CIrY2AXTQ7J... | https://api.spotify.com/v1/audio-analysis/3CIr... | 231067.0 | 4.0 | country | 2.0 | 0.0 | 0.0 |


In [34]:
dfCountry[dfCountry.artist_id == 'AR65K7A1187FB4DAA4']

|  | level_0 | index | artist_id | tags | track_id | title | song_id | release | artist_mbid | artist_name | ... | uri | track_href | analysis_url | duration_ms | time_signature | genre | genre_list | country_count | pop_count | other_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4049 | 4083 | 4083 | AR65K7A1187FB4DAA4 | [(country,)] | TRMHLAL128F92D5D3A | Halo | SONQBQA12A8C141D2C | Halo | 183105b5-3e68-4748-9086-2c1c11bf7a3d | Beyoncé | ... | spotify:track:4JehYebiI9JE8sR8MisGVb | https://api.spotify.com/v1/tracks/4JehYebiI9JE... | https://api.spotify.com/v1/audio-analysis/4Jeh... | 261640.0 | 4.0 | country | [country] | 1.0 | 0.0 | 0.0 |
| 4050 | 4084 | 4084 | AR65K7A1187FB4DAA4 | [(country,)] | TRWRRFC128F931FD82 | Green Light | SOSYXDE12A8AE45E45 | B'Day Deluxe Edition | 183105b5-3e68-4748-9086-2c1c11bf7a3d | Beyoncé | ... | spotify:track:3JdDBUiZ1UJv7J7RQqxEFF | https://api.spotify.com/v1/tracks/3JdDBUiZ1UJv... | https://api.spotify.com/v1/audio-analysis/3JdD... | 210240.0 | 4.0 | country | [country] | 1.0 | 0.0 | 0.0 |
| 4051 | 4085 | 4085 | AR65K7A1187FB4DAA4 | [(country,)] | TRWJCLJ128F42B84A1 | Encore For The Fans | SOSEABB12AF72A0F70 | B'Day | 183105b5-3e68-4748-9086-2c1c11bf7a3d | Beyoncé | ... | spotify:track:2M16xKUgNTgi8W3Cpv18M2 | https://api.spotify.com/v1/tracks/2M16xKUgNTgi... | https://api.spotify.com/v1/audio-analysis/2M16... | 39880.0 | 4.0 | country | [country] | 1.0 | 0.0 | 0.0 |
| 4052 | 4086 | 4086 | AR65K7A1187FB4DAA4 | [(country,)] | TRGQFUZ128F92ED646 | That's Why You're Beautiful | SORBUNE12AB017CDD2 | I AM...SASHA FIERCE | 183105b5-3e68-4748-9086-2c1c11bf7a3d | Beyoncé | ... | spotify:track:22yMWV37oR5GTuP6XMummO | https://api.spotify.com/v1/tracks/22yMWV37oR5G... | https://api.spotify.com/v1/audio-analysis/22yM... | 221467.0 | 4.0 | country | [country] | 1.0 | 0.0 | 0.0 |
| 4053 | 4087 | 4087 | AR65K7A1187FB4DAA4 | [(country,)] | TRGNSTB128F92D5D39 | Halo | SOGXLJB12A8C141D21 | Halo | 183105b5-3e68-4748-9086-2c1c11bf7a3d | Beyoncé | ... | spotify:track:4JehYebiI9JE8sR8MisGVb | https://api.spotify.com/v1/tracks/4JehYebiI9JE... | https://api.spotify.com/v1/audio-analysis/4Jeh... | 261640.0 | 4.0 | country | [country] | 1.0 | 0.0 | 0.0 |
| 4054 | 4088 | 4088 | AR65K7A1187FB4DAA4 | [(country,)] | TRGLRYX128F9340D39 | Poison | SOFXOAX12AB0181A75 | I AM...SASHA FIERCE THE BONUS TRACKS | 183105b5-3e68-4748-9086-2c1c11bf7a3d | Beyoncé | ... | spotify:track:0yGlYMCTNE1aKRd5YAXEdg | https://api.spotify.com/v1/tracks/0yGlYMCTNE1a... | https://api.spotify.com/v1/audio-analysis/0yGl... | 244893.0 | 4.0 | country | [country] | 1.0 | 0.0 | 0.0 |
| 4055 | 4089 | 4089 | AR65K7A1187FB4DAA4 | [(country,)] | TRGELPA128F9327BCF | Naughty Girl | SOPXVXL12A58A7EDF3 | Naughty Girl | 183105b5-3e68-4748-9086-2c1c11bf7a3d | Beyoncé | ... | spotify:track:0YGQ3hZcRLC5YX7o0hdmHg | https://api.spotify.com/v1/tracks/0YGQ3hZcRLC5... | https://api.spotify.com/v1/audio-analysis/0YGQ... | 208573.0 | 4.0 | country | [country] | 1.0 | 0.0 | 0.0 |
| 4056 | 4090 | 4090 | AR65K7A1187FB4DAA4 | [(country,)] | TRGXSEO128F92E5880 | Irreplaceable | SOZQKLX12A8C1436CF | Irreplaceable (remixes) | 183105b5-3e68-4748-9086-2c1c11bf7a3d | Beyoncé | ... | spotify:track:6RX5iL93VZ5fKmyvNXvF1r | https://api.spotify.com/v1/tracks/6RX5iL93VZ5f... | https://api.spotify.com/v1/audio-analysis/6RX5... | 227853.0 | 4.0 | country | [country] | 1.0 | 0.0 | 0.0 |
| 4057 | 4091 | 4091 | AR65K7A1187FB4DAA4 | [(country,)] | TRGKIAX128F92FFC89 | Once In A Lifetime | SOHABYI12AF72A9C93 | Music From The Motion Picture Cadillac Records | 183105b5-3e68-4748-9086-2c1c11bf7a3d | Beyoncé | ... | spotify:track:4HPfLaqHCxqMmwSg7VXiRt | https://api.spotify.com/v1/tracks/4HPfLaqHCxqM... | https://api.spotify.com/v1/audio-analysis/4HPf... | 239533.0 | 3.0 | country | [country] | 1.0 | 0.0 | 0.0 |
| 4058 | 4092 | 4092 | AR65K7A1187FB4DAA4 | [(country,)] | TRHQYDD128F92E5882 | Irreplaceable | SOIQRCE12A8C1436D6 | Irreplaceable (remixes) | 183105b5-3e68-4748-9086-2c1c11bf7a3d | Beyoncé | ... | spotify:track:6RX5iL93VZ5fKmyvNXvF1r | https://api.spotify.com/v1/tracks/6RX5iL93VZ5f... | https://api.spotify.com/v1/audio-analysis/6RX5... | 227853.0 | 4.0 | country | [country] | 1.0 | 0.0 | 0.0 |


In [35]:
### Save pre-processed Country File
dfCountry.to_pickle('../data/msongs/out/Country_Preprocessed_1109.p')
