# Analyzing Lana Del Rey's discography

### Having been a huge fan of Grammy-nominated alternative singer Lana Del Rey since middle school, I wanted to analyze her chart statistics alongside her album reviews and look for any positive relationship between how her albums were reviewed and how their songs charted. For the chart data, I used her Billboard Hot 100 chart history (https://www.billboard.com/artist/lana-del-rey/chart-history/hsi/), and for the album reviews, I used the Kaggle dataset "Contemporary album ratings and reviews" (https://www.kaggle.com/datasets/kauvinlucas/30000-albums-aggregated-review-ratings). I scraped the Billboard page and merged the chart information with the dataset's Lana Del Rey album reviews.

In [1]:
import pandas as pd
from altair import Chart, X, Y, Color, Scale
import altair as alt
import requests

In [2]:
df = pd.read_csv('/Users/abraryaser/Downloads/archive/album_ratings.csv')
df.head()

| | Artist | Title | Release Month | Release Day | Release Year | Format | Label | Genre | Metacritic Critic Score | Metacritic Reviews | Metacritic User Score | Metacritic User Reviews | AOTY Critic Score | AOTY Critic Reviews | AOTY User Score | AOTY User Reviews |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Neko Case | Middle Cyclone | March | 3 | 2009 | LP | ANTI- | Alt-Country | 79.0 | 31.0 | 8.7 | 31.0 | 79 | 25 | 78 | 55 |
| 1 | Jason Isbell & The 400 Unit | Jason Isbell & The 400 Unit | February | 17 | 2009 | LP | Thirty Tigers | Country Rock | 70.0 | 14.0 | 8.4 | 7.0 | 73 | 11 | 73 | 8 |
| 2 | Animal Collective | Merriweather Post Pavilion | January | 20 | 2009 | LP | Domino | Psychedelic Pop | 89.0 | 36.0 | 8.5 | 619.0 | 92 | 30 | 87 | 1335 |
| 3 | Bruce Springsteen | Working on a Dream | January | 27 | 2009 | LP | Columbia Records | Rock | 72.0 | 29.0 | 7.9 | 101.0 | 70 | 23 | 66 | 38 |
| 4 | Andrew Bird | Noble Beast | January | 20 | 2009 | LP | Fat Possum | Singer-Songwriter | 79.0 | 29.0 | 8.7 | 47.0 | 74 | 24 | 78 | 44 |


In [3]:
# keeping only Lana Del Rey's albums with boolean masking; .copy() makes df an
# independent DataFrame, so later edits to it don't raise SettingWithCopyWarning
df = df[df['Artist'] == 'Lana Del Rey'].copy()
df

| | Artist | Title | Release Month | Release Day | Release Year | Format | Label | Genre | Metacritic Critic Score | Metacritic Reviews | Metacritic User Score | Metacritic User Reviews | AOTY Critic Score | AOTY Critic Reviews | AOTY User Score | AOTY User Reviews |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2256 | Lana Del Rey | Born To Die | January | 31 | 2012 | LP | Interscope | Chamber Pop | 62.0 | 37.0 | 8.4 | 1802.0 | 62 | 36 | 74 | 1337 |
| 3357 | Lana Del Rey | Paradise | November | 13 | 2012 | EP | Interscope | Chamber Pop | 64.0 | 9.0 | 8.4 | 1059.0 | 60 | 7 | 74 | 516 |
| 8655 | Lana Del Rey | Ultraviolence | June | 17 | 2014 | LP | Polydor / Interscope | Art Pop | 74.0 | 35.0 | 8.5 | 1742.0 | 72 | 33 | 77 | 1271 |
| 14490 | Lana Del Rey | Honeymoon | September | 18 | 2015 | LP | Interscope, Polydor | Art Pop | 78.0 | 31.0 | 8.5 | 1882.0 | 75 | 33 | 74 | 1003 |
| 19931 | Lana Del Rey | Lust for Life | July | 21 | 2017 | LP | Interscope | Dream Pop | 77.0 | 26.0 | 8.0 | 3314.0 | 75 | 36 | 72 | 1209 |
| 20884 | Lana Del Rey | Lana Del Rey EP | January | 10 | 2012 | EP | Stranger, Interscope | Art Pop | | | | | 90 | 1 | 78 | 33 |
| 24031 | Lana Del Rey | Norman Fucking Rockwell! | August | 30 | 2019 | LP | Polydor, Interscope | Art Pop | 87.0 | 28.0 | 9.2 | 8492.0 | 86 | 33 | 82 | 2470 |


### Now, I am going to scrape the Billboard chart-history page using requests and BeautifulSoup. The goal is to find her most popular songs on the Hot 100 and, for each one, the debut date, the peak date, the peak position, and the number of weeks it spent on the chart.

In [4]:
# scraping billboard to get crucial information about lana's chart data

# getting webpage to be scraped

res = requests.get('https://www.billboard.com/artist/lana-del-rey/chart-history/hsi/')
res.status_code

200

In [5]:
chart_data = res.text

In [6]:
from bs4 import BeautifulSoup

In [7]:
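# naming the parser explicitly avoids bs4's "no parser was explicitly specified" warning
chart_soup = BeautifulSoup(chart_data, 'html.parser')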

## Upon exploration, I found that most of the information I needed was in `span` tags with the class name `c-label` and in `h3` tags.

In [8]:
# getting the debut dates of Lana's songs that have appeared on the Billboard Hot 100
# the date links come in (debut, peak) pairs for each of the 14 songs,
# so the even positions hold the debut dates
date_links = chart_soup.select('span.c-label a.c-label__link')
debut_list = [date_links[i].get_text().split('\t')[1] for i in range(0, 28, 2)]

In [9]:
# the odd positions of the same (debut, peak) date pairs hold the peak dates
date_links = chart_soup.select('span.c-label a.c-label__link')
peak_list = [date_links[i].get_text().split('\t')[1] for i in range(1, 28, 2)]

In [10]:
debut_list

['07.27.13',
 '05.11.13',
 '11.05.22',
 '09.28.19',
 '06.03.23',
 '08.29.15',
 '09.19.15',
 '03.11.17',
 '05.03.14',
 '09.14.19',
 '01.28.12',
 '06.14.14',
 '06.21.14',
 '05.13.17']

In [11]:
peak_list

['09.21.13',
 '06.01.13',
 '11.05.22',
 '09.28.19',
 '06.03.23',
 '08.29.15',
 '09.19.15',
 '03.11.17',
 '05.03.14',
 '09.14.19',
 '01.28.12',
 '06.14.14',
 '06.21.14',
 '05.13.17']
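The debut and peak dates come back as strings such as '07.27.13'. Assuming the format is month.day.two-digit-year (which fits the 2012 to 2023 release years in the review data), parsing them into real dates makes it possible to measure, for example, how long each song took to climb to its peak. This is a minimal plain-Python sketch; `parse_billboard_date` and `days_to_peak` are hypothetical helper names, not part of the notebook.

```python
from datetime import date, datetime


def parse_billboard_date(text: str) -> date:
    """Parse a chart date written as MM.DD.YY (e.g. '07.27.13') into a date.

    Assumption: Billboard's labels use month.day.two-digit-year; %y maps
    '12'..'23' to 2012..2023.
    """
    return datetime.strptime(text, "%m.%d.%y").date()


def days_to_peak(debut: str, peak: str) -> int:
    """Number of days between a song's debut week and its peak week."""
    return (parse_billboard_date(peak) - parse_billboard_date(debut)).days


# first two entries of debut_list / peak_list above
print(days_to_peak("07.27.13", "09.21.13"))  # Summertime Sadness
print(days_to_peak("05.11.13", "06.01.13"))  # Young And Beautiful
```

In pandas, the same conversion can be applied to a whole column with `pd.to_datetime(chart_df['SONG PEAK DATE'], format='%m.%d.%y')`.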

In [12]:
# getting the names of the songs appearing on the Hot 100
# h3 elements 3 through 16 hold the 14 song titles
headings = chart_soup.select('h3')
song_list = [headings[i].get_text().split('\t')[9] for i in range(3, 17)]

In [13]:
song_list

['Summertime Sadness',
 'Young And Beautiful',
 'Snow On The Beach',
 "Don't Call Me Angel (Charlie's Angels)",
 'Say Yes To Heaven',
 'High By The Beach',
 'Prisoner',
 'Love',
 'West Coast',
 "Doin' Time",
 'Video Games',
 'Shades Of Cool',
 'Ultraviolence',
 'Lust For Life']
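The scraping cells above extract each value by splitting the element text on tab characters and taking a fixed position (`split('\t')[1]`, `split('\t')[9]`), which silently breaks if Billboard changes how its HTML is indented. A whitespace-tolerant alternative is sketched below. `clean_label` is a hypothetical helper, and the sample strings are made-up stand-ins for what `get_text()` might return, not captured Billboard markup. BeautifulSoup's `get_text(strip=True)` gives a similar result directly on a tag.

```python
def clean_label(raw: str) -> str:
    """Collapse the whitespace around a scraped label and return the bare text.

    Hypothetical helper: str.split() with no argument splits on any run of
    tabs, newlines or spaces, and re-joining with single spaces keeps
    multi-word titles such as "Summertime Sadness" intact.
    """
    return " ".join(raw.split())


# made-up examples of indented text as it might come back from get_text()
samples = ["\n\t\t\t07.27.13\t\t", "\n\tSummertime Sadness\n\t", "  6  "]
print([clean_label(s) for s in samples])
```

Unlike a fixed split index, this does not depend on how many tabs precede the value.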

In [14]:
# getting the peak chart position of each of the 14 songs
peak_labels = chart_soup.select('span.c-label.artist-chart-row-peak-pos')
peak_pos_list = [peak_labels[i].get_text().split('\n')[2].split('\t')[1] for i in range(14)]

In [15]:
peak_pos_list

['6',
 '22',
 '4',
 '13',
 '54',
 '51',
 '47',
 '44',
 '17',
 '59',
 '91',
 '79',
 '70',
 '64']

In [16]:
# getting the number of weeks each of the 14 songs spent on the chart
week_labels = chart_soup.select('span.c-label.artist-chart-row-week-on-chart')
weeks_list = [week_labels[i].get_text().split('\n')[2].split('\t')[1] for i in range(14)]

In [17]:
weeks_list

['23', '21', '8', '7', '3', '3', '3', '2', '2', '2', '1', '1', '1', '1']
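Every value in `weeks_list` (and in `peak_pos_list`) is a string, so comparisons and sorting happen character by character. This short demo, using the first few scraped week counts, shows how that misorders the data and how converting to integers fixes it.

```python
weeks_as_text = ['23', '21', '8', '7', '3']

# strings compare character by character, so '8' beats '23' because '8' > '2'
print(max(weeks_as_text))
print(sorted(weeks_as_text))

# converting to int restores numeric ordering
weeks_as_int = [int(w) for w in weeks_as_text]
print(max(weeks_as_int))
print(sorted(weeks_as_int))
```

For a DataFrame column, `pd.to_numeric(...)` or `.astype(int)` performs the same conversion.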

### I made a new dataset using the scraped data to merge with my original dataset.

In [18]:
d = {'Popular Songs': pd.Series(song_list), 'SONG DEBUT DATE': pd.Series(debut_list), 'SONG PEAK POS.': pd.Series(peak_pos_list), 'SONG PEAK DATE': pd.Series(peak_list), 'SONG WKS ON CHART': pd.Series(weeks_list)}
chart_df = pd.DataFrame(d)
chart_df

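| | Popular Songs | SONG DEBUT DATE | SONG PEAK POS. | SONG PEAK DATE | SONG WKS ON CHART |
|---|---|---|---|---|---|
| 0 | Summertime Sadness | 07.27.13 | 6 | 09.21.13 | 23 |
| 1 | Young And Beautiful | 05.11.13 | 22 | 06.01.13 | 21 |
| 2 | Snow On The Beach | 11.05.22 | 4 | 11.05.22 | 8 |
| 3 | Don't Call Me Angel (Charlie's Angels) | 09.28.19 | 13 | 09.28.19 | 7 |
| 4 | Say Yes To Heaven | 06.03.23 | 54 | 06.03.23 | 3 |
| 5 | High By The Beach | 08.29.15 | 51 | 08.29.15 | 3 |
| 6 | Prisoner | 09.19.15 | 47 | 09.19.15 | 3 |
| 7 | Love | 03.11.17 | 44 | 03.11.17 | 2 |
| 8 | West Coast | 05.03.14 | 17 | 05.03.14 | 2 |
| 9 | Doin' Time | 09.14.19 | 59 | 09.14.19 | 2 |
| 10 | Video Games | 01.28.12 | 91 | 01.28.12 | 1 |
| 11 | Shades Of Cool | 06.14.14 | 79 | 06.14.14 | 1 |
| 12 | Ultraviolence | 06.21.14 | 70 | 06.21.14 | 1 |
| 13 | Lust For Life | 05.13.17 | 64 | 05.13.17 | 1 |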


In [19]:
# dropping collaborations (Snow On The Beach, Don't Call Me Angel, Prisoner, Lust For Life)
# and singles that are not on any album in the review dataset (Young And Beautiful, Say Yes To Heaven)
chart_df = chart_df.drop([1, 2, 3, 4, 6, 13])
chart_df

| | Popular Songs | SONG DEBUT DATE | SONG PEAK POS. | SONG PEAK DATE | SONG WKS ON CHART |
|---|---|---|---|---|---|
| 0 | Summertime Sadness | 07.27.13 | 6 | 09.21.13 | 23 |
| 5 | High By The Beach | 08.29.15 | 51 | 08.29.15 | 3 |
| 7 | Love | 03.11.17 | 44 | 03.11.17 | 2 |
| 8 | West Coast | 05.03.14 | 17 | 05.03.14 | 2 |
| 9 | Doin' Time | 09.14.19 | 59 | 09.14.19 | 2 |
| 10 | Video Games | 01.28.12 | 91 | 01.28.12 | 1 |
| 11 | Shades Of Cool | 06.14.14 | 79 | 06.14.14 | 1 |
| 12 | Ultraviolence | 06.21.14 | 70 | 06.21.14 | 1 |


## I filtered out some songs so that only the most popular song from each album is mapped onto that album. I also filtered out albums that did not manage to chart.

In [20]:
# albums to merge on, in the same order as the songs kept below
album_list = ['Born To Die', 'Honeymoon', 'Lust for Life', 'Ultraviolence', 'Norman Fucking Rockwell!']
# keeping only the highest-charting song per album
# (Video Games, Shades Of Cool and Ultraviolence charted lower than their albums' top songs)
chart_df = chart_df.drop([10, 11, 12])
# renumbering the remaining rows 0..4
chart_df = chart_df.reset_index(drop=True)
chart_df['Album'] = album_list

In [21]:
chart_df

| | Popular Songs | SONG DEBUT DATE | SONG PEAK POS. | SONG PEAK DATE | SONG WKS ON CHART | Album |
|---|---|---|---|---|---|---|
| 0 | Summertime Sadness | 07.27.13 | 6 | 09.21.13 | 23 | Born To Die |
| 1 | High By The Beach | 08.29.15 | 51 | 08.29.15 | 3 | Honeymoon |
| 2 | Love | 03.11.17 | 44 | 03.11.17 | 2 | Lust for Life |
| 3 | West Coast | 05.03.14 | 17 | 05.03.14 | 2 | Ultraviolence |
| 4 | Doin' Time | 09.14.19 | 59 | 09.14.19 | 2 | Norman Fucking Rockwell! |


In [22]:
# dropping the repackaged releases (Lana Del Rey EP and Paradise)
df = df.drop([20884, 3357])

### Merging the datasets

In [23]:
# a left join keeps df's album order; every remaining album has a matching song
lana_df = df.merge(chart_df, how='left', left_on='Title', right_on='Album')
lana_df = lana_df.drop(columns=['Album'])

In [24]:
# dropping columns not needed for graphical analysis
lana_df = lana_df.drop(columns=['Format', 'Label', 'Metacritic Reviews', 'Metacritic User Reviews', 'AOTY Critic Reviews', 'AOTY User Reviews'])

In [25]:
lana_df

| | Artist | Title | Release Month | Release Day | Release Year | Genre | Metacritic Critic Score | Metacritic User Score | AOTY Critic Score | AOTY User Score | Popular Songs | SONG DEBUT DATE | SONG PEAK POS. | SONG PEAK DATE | SONG WKS ON CHART |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Lana Del Rey | Born To Die | January | 31 | 2012 | Chamber Pop | 62.0 | 8.4 | 62 | 74 | Summertime Sadness | 07.27.13 | 6 | 09.21.13 | 23 |
| 1 | Lana Del Rey | Ultraviolence | June | 17 | 2014 | Art Pop | 74.0 | 8.5 | 72 | 77 | West Coast | 05.03.14 | 17 | 05.03.14 | 2 |
| 2 | Lana Del Rey | Honeymoon | September | 18 | 2015 | Art Pop | 78.0 | 8.5 | 75 | 74 | High By The Beach | 08.29.15 | 51 | 08.29.15 | 3 |
| 3 | Lana Del Rey | Lust for Life | July | 21 | 2017 | Dream Pop | 77.0 | 8.0 | 75 | 72 | Love | 03.11.17 | 44 | 03.11.17 | 2 |
| 4 | Lana Del Rey | Norman Fucking Rockwell! | August | 30 | 2019 | Art Pop | 87.0 | 9.2 | 86 | 82 | Doin' Time | 09.14.19 | 59 | 09.14.19 | 2 |


### I wrote a helper function that labels each album as a "Critic Favorite" or a "User Favorite", depending on whether its AOTY critic score is higher than its AOTY user score. I explore this further with Altair below.

In [26]:
def calculate_favorite(critic_score, user_score):
    """Label an album by which group rated it higher (ties count as a user favorite)."""
    if critic_score > user_score:
        return "Critic Favorite"
    return "User Favorite"


In [27]:
lana_df["Favorite"] = lana_df.apply(lambda row: calculate_favorite(row["AOTY Critic Score"], row["AOTY User Score"]), axis=1)

### Using aggregate statistics

In [28]:
# numeric_only=True averages only the numeric columns and skips the text columns
lana_df.groupby('Favorite').mean(numeric_only=True).drop(columns='Release Year')

| Favorite | Metacritic Critic Score | Metacritic User Score | AOTY Critic Score | AOTY User Score |
|---|---|---|---|---|
| Critic Favorite | 80.666667 | 8.566667 | 78.666667 | 76.0 |
| User Favorite | 68.0 | 8.45 | 67.0 | 75.5 |


In [29]:
# the scraped week counts are strings, so they were compared as text ('8' > '23');
# converting them to integers lets Altair treat the column as a quantitative axis
lana_df['SONG WKS ON CHART'] = lana_df['SONG WKS ON CHART'].astype(int)
lana_df

| | Artist | Title | Release Month | Release Day | Release Year | Genre | Metacritic Critic Score | Metacritic User Score | AOTY Critic Score | AOTY User Score | Popular Songs | SONG DEBUT DATE | SONG PEAK POS. | SONG PEAK DATE | SONG WKS ON CHART | Favorite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Lana Del Rey | Born To Die | January | 31 | 2012 | Chamber Pop | 62.0 | 8.4 | 62 | 74 | Summertime Sadness | 07.27.13 | 6 | 09.21.13 | 23 | User Favorite |
| 1 | Lana Del Rey | Ultraviolence | June | 17 | 2014 | Art Pop | 74.0 | 8.5 | 72 | 77 | West Coast | 05.03.14 | 17 | 05.03.14 | 2 | User Favorite |
| 2 | Lana Del Rey | Honeymoon | September | 18 | 2015 | Art Pop | 78.0 | 8.5 | 75 | 74 | High By The Beach | 08.29.15 | 51 | 08.29.15 | 3 | Critic Favorite |
| 3 | Lana Del Rey | Lust for Life | July | 21 | 2017 | Dream Pop | 77.0 | 8.0 | 75 | 72 | Love | 03.11.17 | 44 | 03.11.17 | 2 | Critic Favorite |
| 4 | Lana Del Rey | Norman Fucking Rockwell! | August | 30 | 2019 | Art Pop | 87.0 | 9.2 | 86 | 82 | Doin' Time | 09.14.19 | 59 | 09.14.19 | 2 | Critic Favorite |


## This chart shows which album and genre are most favored by critics. One limitation is that only three genres are represented, and the dataset covers only five of her albums because it is relatively old.

In [30]:
alt.Chart(df).mark_bar().encode(
    x = "Title:O",
    y = "Metacritic Critic Score:Q",
    color="Genre:N"  # genres are unordered categories, so nominal (N) fits better than ordinal (O)
).properties(
    width=300,
    height=200
)
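To put a number on the genre comparison, the critic scores can be averaged per genre. This sketch uses plain Python and the Metacritic critic scores of the five albums in the chart (copied from the review table earlier in this notebook); in pandas the equivalent would be `df.groupby('Genre')['Metacritic Critic Score'].mean()`.

```python
from collections import defaultdict
from statistics import mean

# (album, genre, Metacritic critic score) for the five albums in the chart above
albums = [
    ("Born To Die", "Chamber Pop", 62.0),
    ("Ultraviolence", "Art Pop", 74.0),
    ("Honeymoon", "Art Pop", 78.0),
    ("Lust for Life", "Dream Pop", 77.0),
    ("Norman Fucking Rockwell!", "Art Pop", 87.0),
]

# group the scores by genre
scores_by_genre = defaultdict(list)
for _title, genre, score in albums:
    scores_by_genre[genre].append(score)

# average each group and print from highest to lowest
genre_means = {genre: round(mean(scores), 2) for genre, scores in scores_by_genre.items()}
for genre, avg in sorted(genre_means.items(), key=lambda item: item[1], reverse=True):
    print(f"{genre}: {avg} (n={len(scores_by_genre[genre])})")
```

With only one album in two of the three genres, those averages simply restate single scores, which is the limitation noted above.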

## This chart shows how each album's Metacritic user score relates to the number of weeks its most popular song spent on the Hot 100. Summertime Sadness, a viral hit, is an outlier; apart from it, there does not seem to be a strong relationship between the two variables.

In [31]:
alt.Chart(lana_df).mark_point(filled=True).encode(
    alt.X('Metacritic User Score'),
    alt.Y('SONG WKS ON CHART'),
    alt.Size('Metacritic Critic Score'),
    alt.Color('Genre'),
    alt.OpacityValue(0.7),
    tooltip = [alt.Tooltip('Title'),
               alt.Tooltip('Popular Songs'),
               alt.Tooltip('SONG DEBUT DATE'),
               alt.Tooltip('SONG PEAK DATE')
              ]
)
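"No strong relationship" can be quantified with a Pearson correlation coefficient between the Metacritic user score and the weeks on chart, with and without the Summertime Sadness outlier. The sketch below implements the standard formula in plain Python on the five plotted values (Python 3.10+ also offers `statistics.correlation`). With five albums, or four without the outlier, the result is descriptive only, not a significance test.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Metacritic user score and weeks on chart, in album order:
# Born To Die, Ultraviolence, Honeymoon, Lust for Life, Norman Fucking Rockwell!
user_scores = [8.4, 8.5, 8.5, 8.0, 9.2]
weeks_on_chart = [23, 2, 3, 2, 2]

r_all = pearson(user_scores, weeks_on_chart)
# drop Born To Die (Summertime Sadness, 23 weeks) to see the fit without the outlier
r_without_outlier = pearson(user_scores[1:], weeks_on_chart[1:])
print(f"r (all albums): {r_all:.2f}")
print(f"r (without Born To Die): {r_without_outlier:.2f}")
```

Both coefficients come out close to zero, which matches what the scatter plot suggests.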

## Lastly, this graph compares each album's AOTY critic score with its AOTY user score to show which albums are the most acclaimed by both groups. Interestingly, Born to Die is her least critically acclaimed album despite producing her most commercially successful single, Summertime Sadness, as seen above.

In [32]:
Chart(lana_df).mark_point().encode(x='AOTY User Score', y='AOTY Critic Score', tooltip='Title').interactive()
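The scatter plot can also be summarized by averaging each album's AOTY critic and user scores (both on a 0 to 100 scale) and setting that against the chart run of its top song. This plain-Python sketch uses the values from `lana_df`; weighting critics and users equally is an assumption made for illustration, not something the dataset defines.

```python
# (album, AOTY critic score, AOTY user score, weeks on chart for its top song)
albums = [
    ("Born To Die", 62, 74, 23),
    ("Ultraviolence", 72, 77, 2),
    ("Honeymoon", 75, 74, 3),
    ("Lust for Life", 75, 72, 2),
    ("Norman Fucking Rockwell!", 86, 82, 2),
]

# assumption: critics and users weighted equally
combined = {title: (critic + user) / 2 for title, critic, user, _weeks in albums}
least_acclaimed = min(albums, key=lambda a: a[1])[0]  # lowest AOTY critic score
most_charted = max(albums, key=lambda a: a[3])[0]     # longest Hot 100 run

for title, score in sorted(combined.items(), key=lambda item: item[1], reverse=True):
    print(f"{title}: {score}")
print("Lowest critic score:", least_acclaimed)
print("Longest chart run:", most_charted)
```

Born To Die comes out both lowest on critic score and longest on the chart, the contrast described above.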

## Lana has released three more albums since this dataset was published, so I am very interested to see how their reviews and chart runs fit into this analysis. One crucial thing I also want to explore is the timeline of these releases and whether the rise of TikTok has affected any of her ratings or chart standings. Thank you so much!