# Analysis of Taylor Swift songs!

Getting Spotify data:
 - https://developer.spotify.com/documentation/web-api
 - [Extracting Song Data From the Spotify API Using Python](https://towardsdatascience.com/extracting-song-data-from-the-spotify-api-using-python-b1e79388d50)

Package:
 - [Spotipy](https://spotipy.readthedocs.io/en/2.22.1/)

Getting lyric data:
 - [lyricsgenius 3.0.1](https://pypi.org/project/lyricsgenius/?source=post_page-----a5563ef7f7b1--------------------------------)
 - [How To Access and Use the Spotify and LyricGenius API](https://raizelb.medium.com/how-to-access-and-use-the-spotify-and-lyricgenius-api-a5563ef7f7b1)

In [1]:
from config import SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET, GENIUS_ACCESS_TOKEN
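The `config` module imported above is not shown in the notebook. Here is a minimal sketch of what it could look like. It assumes the credentials live in environment variables with the same names; both the variable names and the use of environment variables are assumptions.

```python
# config.py -- hypothetical sketch; the notebook only shows that this module
# exports SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET and GENIUS_ACCESS_TOKEN.
import os

# Read each credential from an environment variable of the same name,
# falling back to an empty string so that importing the module never fails.
SPOTIPY_CLIENT_ID = os.environ.get("SPOTIPY_CLIENT_ID", "")
SPOTIPY_CLIENT_SECRET = os.environ.get("SPOTIPY_CLIENT_SECRET", "")
GENIUS_ACCESS_TOKEN = os.environ.get("GENIUS_ACCESS_TOKEN", "")
```

Keeping the keys out of the notebook means the notebook can be shared without leaking them. Spotipy's `SpotifyClientCredentials()` can also read `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` from the environment on its own when it is called without arguments.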

## 1. Get data for all albums from Spotify

In [2]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import os

In [3]:
auth_manager = SpotifyClientCredentials(client_id=SPOTIPY_CLIENT_ID, client_secret=SPOTIPY_CLIENT_SECRET)
sp = spotipy.Spotify(auth_manager=auth_manager)

In [4]:
artist_id = "06HL4z0CvFAxyc27GXpf02"  # Taylor Swift
tay_all_albums = sp.artist_albums(artist_id, album_type='album', limit=50)

In [5]:
for item in tay_all_albums['items']:
    print("{} : {} --- {}".format(item['id'], item['name'], item['release_date']))

1o59UpKw81iHR0HPiSkJR0 : 1989 (Taylor's Version) [Deluxe] --- 2023-10-27
64LU4c1nfjz1t4VnGhagcg : 1989 (Taylor's Version) --- 2023-10-26
5AEDGbliTTfjOB8TSm1sxt : Speak Now (Taylor's Version) --- 2023-07-07
1fnJ7k0bllNfL1kVdNVW1A : Midnights (The Til Dawn Edition) --- 2023-05-26
3lS1y25WAhcqJDATJK70Mq : Midnights (3am Edition) --- 2022-10-22
151w1FgRZfnKZA9FEcg9Z3 : Midnights --- 2022-10-21
6kZ42qRrzov54LcAk4onW9 : Red (Taylor's Version) --- 2021-11-12
4hDok0OAJd57SGIT8xuWJH : Fearless (Taylor's Version) --- 2021-04-09
6AORtDjduMM3bupSWzbTSG : evermore (deluxe version) --- 2021-01-07
2Xoteh7uEpea4TohMxjtaq : evermore --- 2020-12-11
0PZ7lAru5FDFHuirTkWe9Z : folklore: the long pond studio sessions (from the Disney+ special) [deluxe edition] --- 2020-11-25
1pzvBxYgT6OVwJLtHkrdQK : folklore (deluxe version) --- 2020-08-18
2fenSS68JI1h4Fo296JfGr : folklore --- 2020-07-24
1NAmidJlEaVgA3MpcPFYGq : Lover --- 2019-08-23
6DEjYFkNZh67HP7R9PSZvv : reputation --- 2017-11-10
1MPAXuTVL2Ej5x0JHiSPq8 : 
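A single `artist_albums` call returns only one page of results. As far as I know, `limit=50` is the maximum for this endpoint, so an artist with more releases than that would be cut off. The sketch below follows the `next` link of a paged Spotify response until it runs out. `collect_all_items` is a hypothetical helper. In this notebook, `fetch_next` would be Spotipy's `sp.next`, which returns `None` when there is no further page.

```python
def collect_all_items(first_page, fetch_next):
    """Gather the 'items' of every page of a paged Spotify-style response.

    first_page: dict with an 'items' list and a 'next' key (None on the last page)
    fetch_next: callable that takes a page and returns the following page,
                or None when there is none
    """
    items = list(first_page["items"])
    page = first_page
    # Keep requesting pages while the current one points to a next page.
    while page.get("next"):
        page = fetch_next(page)
        if page is None:
            break
        items.extend(page["items"])
    return items
```

For example, `all_albums = collect_all_items(tay_all_albums, sp.next)` would collect every page of the album listing.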

In [6]:
import pandas as pd

df = pd.DataFrame(tay_all_albums['items'], columns=['id', 'name', 'release_date', 'total_tracks'])

In [7]:
# check the data
df

|   | id | name | release_date | total_tracks |
|---|----|------|--------------|--------------|
| 0 | 1o59UpKw81iHR0HPiSkJR0 | 1989 (Taylor's Version) [Deluxe] | 2023-10-27 | 22 |
| 1 | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 2023-10-26 | 21 |
| 2 | 5AEDGbliTTfjOB8TSm1sxt | Speak Now (Taylor's Version) | 2023-07-07 | 22 |
| 3 | 1fnJ7k0bllNfL1kVdNVW1A | Midnights (The Til Dawn Edition) | 2023-05-26 | 23 |
| 4 | 3lS1y25WAhcqJDATJK70Mq | Midnights (3am Edition) | 2022-10-22 | 20 |
| 5 | 151w1FgRZfnKZA9FEcg9Z3 | Midnights | 2022-10-21 | 13 |
| 6 | 6kZ42qRrzov54LcAk4onW9 | Red (Taylor's Version) | 2021-11-12 | 30 |
| 7 | 4hDok0OAJd57SGIT8xuWJH | Fearless (Taylor's Version) | 2021-04-09 | 26 |
| 8 | 6AORtDjduMM3bupSWzbTSG | evermore (deluxe version) | 2021-01-07 | 17 |
| 9 | 2Xoteh7uEpea4TohMxjtaq | evermore | 2020-12-11 | 15 |


In [8]:
# hand-picked row labels in df: one album per era
eras = [1, 2, 4, 6, 7, 8, 11, 13, 14, 25]
eras_albums = df.loc[eras]

In [9]:
eras_albums

|    | id | name | release_date | total_tracks |
|----|----|------|--------------|--------------|
| 1 | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 2023-10-26 | 21 |
| 2 | 5AEDGbliTTfjOB8TSm1sxt | Speak Now (Taylor's Version) | 2023-07-07 | 22 |
| 4 | 3lS1y25WAhcqJDATJK70Mq | Midnights (3am Edition) | 2022-10-22 | 20 |
| 6 | 6kZ42qRrzov54LcAk4onW9 | Red (Taylor's Version) | 2021-11-12 | 30 |
| 7 | 4hDok0OAJd57SGIT8xuWJH | Fearless (Taylor's Version) | 2021-04-09 | 26 |
| 8 | 6AORtDjduMM3bupSWzbTSG | evermore (deluxe version) | 2021-01-07 | 17 |
| 11 | 1pzvBxYgT6OVwJLtHkrdQK | folklore (deluxe version) | 2020-08-18 | 17 |
| 13 | 1NAmidJlEaVgA3MpcPFYGq | Lover | 2019-08-23 | 18 |
| 14 | 6DEjYFkNZh67HP7R9PSZvv | reputation | 2017-11-10 | 15 |
| 25 | 5eyZZoQEFQWRHkV2xgAeBw | Taylor Swift | 2006-10-24 | 15 |
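Selecting by row label works. However, the labels shift whenever Spotify adds a new release to the top of the list. The sketch below picks the same albums by name instead. `pick_albums` is a hypothetical helper that operates on the raw `items` dicts. With pandas, the same idea is `df[df['name'].isin(names)]`.

```python
def pick_albums(albums, wanted_names):
    """Return the album dicts whose 'name' is in wanted_names, in wanted_names order.

    Names that are not found are skipped, so compare the result length with
    len(wanted_names) to detect albums that were renamed or removed.
    """
    by_name = {}
    for album in albums:
        # Keep the first album seen for a given name.
        by_name.setdefault(album["name"], album)
    return [by_name[name] for name in wanted_names if name in by_name]
```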


### Save data to CSV

In [10]:
df_all = pd.DataFrame(tay_all_albums['items'])
df_all.to_csv("taylor_all_albums.csv", index=False)

## Get data for all album tracks

In [11]:
all_album_tracks = []
for alb_id, alb_name in zip(list(eras_albums['id']), list(eras_albums['name'])):
    album_tracks = sp.album_tracks(alb_id)
    for track in album_tracks['items']:
        all_album_tracks.append([track['id'], track['name'], alb_id, alb_name, track['track_number']])

In [12]:
all_album_tracks_df = pd.DataFrame(all_album_tracks, columns=["track_id", "track_name", "album_id", "album_name", "track_number"])

In [13]:
all_album_tracks_df.head(10)

|   | track_id | track_name | album_id | album_name | track_number |
|---|----------|------------|----------|------------|--------------|
| 0 | 1hR8BSuEqPCCZfv93zzzz9 | Welcome To New York (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 1 |
| 1 | 45wMBGri1PORPjM9PwFfrS | Blank Space (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 2 |
| 2 | 1hjRhYpWyqDpPahmSlUTlc | Style (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 3 |
| 3 | 045ZeOHPIzhxxsm8bq5kyE | Out Of The Woods (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 4 |
| 4 | 6GXgd1BPD9bUpqw5AntGV5 | All You Had To Do Was Stay (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 5 |
| 5 | 3pv7Q5v2dpdefwdWIvE7yH | Shake It Off (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 6 |
| 6 | 43y1WpBdnEy5TR9aZoSQL9 | I Wish You Would (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 7 |
| 7 | 64FzgoLZ3oXu2SriZblHic | Bad Blood (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 8 |
| 8 | 1K39ty6o1sHwwlZwO6a7wK | Wildest Dreams (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 9 |
| 9 | 75W3SngKzTuoQ94uLf3y82 | How You Get The Girl (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 1989 (Taylor's Version) | 10 |
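Each album's `total_tracks` from the album listing gives a cheap sanity check that no tracks were lost, for example to the page size of `album_tracks`. The sketch below assumes rows shaped like the entries of `all_album_tracks`, with the album ID at position 2. `find_incomplete_albums` is a hypothetical helper.

```python
from collections import Counter


def find_incomplete_albums(track_rows, expected_totals):
    """Compare collected track counts with each album's advertised total.

    track_rows: rows like [track_id, track_name, album_id, album_name, track_number]
    expected_totals: dict mapping album_id -> total_tracks
    Returns {album_id: (collected, expected)} for every album that does not match.
    """
    counts = Counter(row[2] for row in track_rows)  # row[2] is the album ID
    return {
        album_id: (counts[album_id], total)
        for album_id, total in expected_totals.items()
        if counts[album_id] != total
    }
```

In the notebook, this could be called as `find_incomplete_albums(all_album_tracks, dict(zip(eras_albums['id'], eras_albums['total_tracks'])))`. An empty dict means every album's track count matches.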


### Save to CSV

In [14]:
all_album_tracks_df.to_csv("taylor_all_album_tracks.csv")

## Get lyrics for all the songs from Genius

In [15]:
import lyricsgenius

genius = lyricsgenius.Genius(GENIUS_ACCESS_TOKEN, timeout=20)
genius.remove_section_headers = True

In [16]:
artist_id = 1177  # Taylor Swift (Genius artist ID)

# Genius returns at most 50 albums per request. To get every album, request
# successive pages until a page comes back with fewer than 50 albums.
def download_album_info(page):
    arr = []
    g_albums = genius.artist_albums(artist_id, per_page=50, page=page)
    for i, album_dict in enumerate(g_albums['albums']):
        print("{}, {}, {}".format(i, album_dict['name'], album_dict['id']))
        arr.append([album_dict['name'], album_dict['id']])
    return arr


page = 1
g_all_albums_info = []
while True:
    albums_in_page = download_album_info(page)
    g_all_albums_info = g_all_albums_info + albums_in_page
    # a short page (fewer than per_page albums) is the last one
    if len(albums_in_page) < 50:
        break
    page += 1

0, The Tortured Poets Department, 1140058
1, The Tortured Poets Department + Bonus Track “The Albatross” , 1148040
2, The Tortured Poets Department (Physical Version), 1140061
3, The Tortured Poets Department + Bonus Track “The Bolter”, 1144980
4, 1989 (Taylor’s Version) [Webstore Deluxe], 1105019
5, 1989 (Taylor’s Version) [Deluxe], 1099677
6, 1989 (Taylor’s Version), 754738
7, 1989 (Taylor’s Version) [Tangerine Edition], 1082316
8, The Cruelest Summer, 1096172
9, Speak Now (Taylor’s Version) [Digital Deluxe], 1058580
10, Speak Now (Taylor’s Version), 758025
11, Midnights (The Late Night Edition), 1040217
12, Midnights (The Til Dawn Edition), 1040211
13, folklore: the long pond studio sessions (Record Store Day Exclusive), 1027134
14, The More Fearless (Taylor’s Version) Chapter, 1013718
15, The More Lover Chapter, 1013715
16, The More Red (Taylor’s Version) Chapter, 1013719
17, Lavender Haze (Remixes), 1008313
18, Lover (Live From Paris) Heart Shaped Vinyl, 1032700
19, Anti-Hero (Rem
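The loop above stops as soon as a page holds fewer albums than were requested. The sketch below expresses the same idea as a reusable helper. `fetch_all_pages` is hypothetical, and `fetch_page` stands in for a call such as `genius.artist_albums(artist_id, per_page=50, page=page)['albums']`. There is one edge case: when the total is an exact multiple of the page size, the helper requests one extra, empty page before it can stop.

```python
def fetch_all_pages(fetch_page, per_page=50):
    """Call fetch_page(page_number) for pages 1, 2, ... and concatenate the results.

    Stops after the first page that returns fewer than per_page items,
    which is taken to be the last page.
    """
    results = []
    page = 1
    while True:
        batch = fetch_page(page)
        results.extend(batch)
        if len(batch) < per_page:
            break
        page += 1
    return results
```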

In [17]:
g_all_albums_df = pd.DataFrame(g_all_albums_info, columns=["Name", "id"])

In [18]:
g_all_albums_df.to_csv("all_albums_genius.csv")

**NOTE:** Genius lists several variants of each album release (an *era*). I want to pick one album for each *era*, as I did for the Spotify data.
I tried to look them up using the names in `eras_albums`, but none of the *Taylor's Version* titles were found.

In [19]:
g_all_albums_df['Name'].head(5)

0                        The Tortured Poets Department
1    The Tortured Poets Department + Bonus Track “T...
2     The Tortured Poets Department (Physical Version)
3    The Tortured Poets Department + Bonus Track “T...
4            1989 (Taylor’s Version) [Webstore Deluxe]
Name: Name, dtype: object

In [20]:
eras_albums['name'].head(5)

1         1989 (Taylor's Version)
2    Speak Now (Taylor's Version)
4         Midnights (3am Edition)
6          Red (Taylor's Version)
7     Fearless (Taylor's Version)
Name: name, dtype: object

In [21]:
# The album names from Spotify and Genius don't match: Spotify writes "(Taylor's Version)"
# with a straight apostrophe, while Genius writes "(Taylor’s Version)" with a curly one.
# Swap the apostrophe in the Spotify names and turn them into a list.
eras_albums_corrected = [n.replace("'", "’") for n in list(eras_albums['name'])]
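Replacing one apostrophe fixes these titles. However, some other Genius titles in the listing above also use typographic double quotes (“The Albatross”). The sketch below is slightly more general: it maps curly quotes to their straight ASCII forms, so it can be applied to both sides before comparing. `normalize_quotes` is a hypothetical helper, and the set of characters it maps is an assumption.

```python
# Map typographic (curly) quotes to plain ASCII quotes.
_QUOTE_TABLE = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (the apostrophe Genius uses)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def normalize_quotes(title):
    """Return title with curly quotes replaced by straight ones."""
    return title.translate(_QUOTE_TABLE)
```

Normalizing both lists means it no longer matters which service uses which style. For example: `g_all_albums_df[g_all_albums_df['Name'].map(normalize_quotes).isin([normalize_quotes(n) for n in eras_albums['name']])]`.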

In [22]:
g_eras_albums_df = g_all_albums_df[g_all_albums_df['Name'].isin(eras_albums_corrected)]

**NOTE:** Pick only the appropriate albums from the CSV by hand (each album release comes in several variants).

In [23]:
# check the data
g_eras_albums_df

|    | Name | id |
|----|------|----|
| 6 | 1989 (Taylor’s Version) | 754738 |
| 10 | Speak Now (Taylor’s Version) | 758025 |
| 23 | Midnights (3am Edition) | 962334 |
| 32 | Red (Taylor’s Version) | 758022 |
| 38 | Fearless (Taylor’s Version) | 734107 |
| 42 | evermore (deluxe version) | 710147 |
| 54 | folklore (deluxe version) | 659926 |
| 60 | Lover | 520929 |
| 65 | reputation | 350247 |
| 92 | Taylor Swift | 12682 |
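Before downloading lyrics, it is worth confirming that every Spotify era matched exactly one Genius album. A title that differs in any other way would silently drop out of `isin`. `check_matches` is a hypothetical helper that takes the corrected Spotify names and the matched Genius names.

```python
from collections import Counter


def check_matches(wanted_names, matched_names):
    """Return (missing, duplicated) album names.

    missing: wanted names that do not appear in matched_names
    duplicated: names that matched more than one Genius album
    """
    counts = Counter(matched_names)
    missing = [name for name in wanted_names if counts[name] == 0]
    duplicated = [name for name, n in counts.items() if n > 1]
    return missing, duplicated
```

In the notebook, this could be called as `check_matches(eras_albums_corrected, g_eras_albums_df['Name'])`. Two empty lists mean each era maps to exactly one Genius album.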


## Get all lyrics from all albums

In [24]:
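for g_album_id, g_album_title in zip(list(g_eras_albums_df['id']), list(g_eras_albums_df['Name'])):
    # passing album_id fetches that exact Genius album
    g_album_noheader = genius.search_album(g_album_title, "Taylor Swift", album_id=g_album_id)
    if g_album_noheader is None:
        print("Could not fetch", g_album_title)
        continue
    # save_lyrics() sanitizes the file name (quotes and parentheses are dropped)
    lyrics_filename = "lyrics_" + g_album_title
    g_album_noheader.save_lyrics(lyrics_filename, overwrite=True, extension='json')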

Searching for "1989 (Taylor’s Version)" by Taylor Swift...
Wrote lyrics_1989 Taylors Version.json.
Searching for "Speak Now (Taylor’s Version)" by Taylor Swift...
Wrote lyrics_Speak Now Taylors Version.json.
Searching for "Midnights (3am Edition)" by Taylor Swift...
Wrote lyrics_Midnights 3am Edition.json.
Searching for "Red (Taylor’s Version)" by Taylor Swift...
Wrote lyrics_Red Taylors Version.json.
Searching for "Fearless (Taylor’s Version)" by Taylor Swift...
Wrote lyrics_Fearless Taylors Version.json.
Searching for "evermore (deluxe version)" by Taylor Swift...
Wrote lyrics_evermore deluxe version.json.
Searching for "folklore (deluxe version)" by Taylor Swift...
Wrote lyrics_folklore deluxe version.json.
Searching for "Lover" by Taylor Swift...
Wrote lyrics_Lover.json.
Searching for "reputation" by Taylor Swift...
Wrote lyrics_reputation.json.
Searching for "Taylor Swift" by Taylor Swift...
Wrote lyrics_Taylor Swift.json.
