Wrangling the Billboard Top 100

Consider the data in billboard.csv containing every song to appear on the weekly Billboard Top 100 chart since 1958, up through the middle of 2021. Each row of this data corresponds to a single song in a single week. For our purposes, the relevant columns here are:

performer: who performed the song
song: the title of the song
year: year (1958 to 2021)
week: chart week of that year (1, 2, etc.)
week_position: what position that song occupied that week on the Billboard Top 100 chart

Use your skills in data wrangling and plotting to answer the following three questions.

Part A: Make a table of the top 10 most popular songs since 1958, as measured by the total number of weeks that a song spent on the Billboard Top 100. Note that these data end in week 22 of 2021, so the most popular songs of 2021 will not have up-to-the-minute data; please send our apologies to The Weeknd.

Your table should have 10 rows and 3 columns: performer, song, and count, where count represents the number of weeks that song appeared in the Billboard Top 100. Make sure the entries are sorted in descending order of the count variable, so that the more popular songs appear at the top of the table. Give your table a short caption describing what is shown in the table.

(Note: you'll want to use both performer and song in any group_by operations, to account for the fact that multiple unique songs can share the same title.)
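To see why the note above matters, here is a minimal sketch on invented toy data (the performers and the title "Hold On" are made up for illustration and are not taken from billboard.csv). It compares grouping by title alone with grouping by performer and title together.

```python
import pandas as pd

# Toy data, invented for illustration: two different performers
# each have a song titled "Hold On".
toy = pd.DataFrame({
    "performer": ["Artist A", "Artist A", "Artist B", "Artist B", "Artist B"],
    "song": ["Hold On"] * 5,
    "week": [1, 2, 1, 2, 3],
})

# Grouping by title alone merges both songs into a single row.
by_title = toy.groupby("song").size()

# Grouping by performer and title keeps the two songs separate.
by_performer_and_title = toy.groupby(["performer", "song"]).size()

print(by_title)
print(by_performer_and_title)
```

Grouping by title alone would credit a single "Hold On" with all five weeks, while the two-key grouping reports two weeks for one song and three for the other.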

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from tabulate import tabulate

In [2]:
billboard = pd.read_csv("billboard.csv")

# Group by performer and song, then count the weeks each song spent on the chart
top_songs = billboard.groupby(["performer", "song"])["week"].count().reset_index()
# Sort from most weeks to fewest and keep the top 10
top_songs = top_songs.sort_values(by="week", ascending=False).head(10)
top_songs = top_songs.rename(columns={"week": "count"})
print(top_songs.to_string(index=False))

FileNotFoundError: [Errno 2] No such file or directory: 'billboard.csv'
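Part A also asks for a short caption, which the cell above does not print. One simple way to attach one, sketched here on a made-up stand-in table (top_songs_demo and its values are hypothetical, not real chart results), is to print a caption line directly above the table text.

```python
import pandas as pd

# Hypothetical stand-in for the top_songs table; all values are made up.
top_songs_demo = pd.DataFrame({
    "performer": ["Artist A", "Artist B", "Artist C"],
    "song": ["Song X", "Song Y", "Song Z"],
    "count": [12, 30, 21],
})

# Sort descending so the longest-charting songs come first.
top_songs_demo = top_songs_demo.sort_values("count", ascending=False)

# Put the caption on its own line above the table.
caption = "Table 1: Songs ranked by total weeks on the Billboard Top 100 (toy data)."
table_text = caption + "\n" + top_songs_demo.to_string(index=False)
print(table_text)
```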

Part B: Is the "musical diversity" of the Billboard Top 100 changing over time? Let's find out. We'll measure the musical diversity of a given year as the number of unique songs that appeared in the Billboard Top 100 that year. Make a line graph that plots this measure of musical diversity over the years. The x axis should show the year, while the y axis should show the number of unique songs appearing at any position on the Billboard Top 100 chart in any week that year. For this part, please filter the data set so that it excludes the years 1958 and 2021, since we do not have complete data on either of those years. Give the figure an informative caption in which you explain what is shown in the figure and comment on any interesting trends you see.

There are a number of ways to accomplish the data wrangling here. For example, you could use two distinct sets of data-wrangling steps. The first set of steps would get you a table that counts the number of times that a given song appears on the Top 100 in a given year. The second set of steps operates on the result of the first; it would count the number of unique songs that appeared on the Top 100 in each year, irrespective of how many times each song had appeared.
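The two sets of steps described above can be sketched on a small invented data set (the performers, titles, and years below are placeholders, not real chart entries). Step 1 builds one row per song per year. Step 2 counts those rows per year.

```python
import pandas as pd

# Toy data, invented for illustration. Note that performers "A" and "B"
# both have a song titled "S1".
toy = pd.DataFrame({
    "performer": ["A", "A", "B", "B", "A", "C"],
    "song":      ["S1", "S1", "S1", "S2", "S1", "S3"],
    "year":      [1990, 1990, 1990, 1990, 1991, 1991],
    "week":      [1, 2, 1, 1, 1, 1],
})

# Step 1: how many weeks each (performer, song) pair charted in each year.
appearances = (
    toy.groupby(["year", "performer", "song"])
    .size()
    .reset_index(name="weeks_on_chart")
)

# Step 2: each row of `appearances` is one unique song in one year,
# so counting rows per year gives the number of unique songs.
unique_per_year = (
    appearances.groupby("year")
    .size()
    .reset_index(name="unique_songs")
)

print(unique_per_year)
```

Because step 1 groups on performer as well as title, the two different songs called "S1" in 1990 are counted separately.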

In [None]:
# Exclude 1958 and 2021, which have incomplete data
filtered_year_data = billboard[~billboard["year"].isin([1958, 2021])]
# Count unique (performer, song) pairs per year, so that different songs
# sharing the same title are not merged into one
musical_diversity = (
    filtered_year_data.drop_duplicates(["year", "performer", "song"])
    .groupby("year")
    .size()
    .reset_index(name="unique_songs")
)

# Line graph of unique songs per year
plt.figure(figsize=(10, 6))
plt.plot(musical_diversity["year"], musical_diversity["unique_songs"], marker="o")
plt.xlabel("Year")
plt.ylabel("Number of Unique Songs")
plt.title("Musical Diversity on Billboard Top 100 from 1959 to 2020")
plt.grid(True)
plt.show()

Part C: Let's define a "ten-week hit" as a single song that appeared on the Billboard Top 100 for at least ten weeks. There are 19 artists in U.S. musical history since 1958 who have had at least 30 songs that were "ten-week hits." Make a bar plot for these 19 artists, showing how many ten-week hits each one had in their musical career. Give the plot an informative caption in which you explain what is shown.

Notes:

You might find this easier to accomplish in two distinct sets of data wrangling steps.

Make sure that the individual names of the artists are readable in your plot, and that they're not all jumbled together. If you find that your plot isn't readable with vertical bars, you can add a coord_flip() layer to your plot to make the bars (and labels) run horizontally instead.

By default a bar plot will order the artists in alphabetical order. This is acceptable to turn in. But if you'd like to order them according to some other variable, you can use the fct_reorder function, described in this blog post. This is optional.
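The notes above refer to coord_flip() and fct_reorder, which come from R. A rough pandas analogue of the two-step wrangling and the reordering is sketched below on invented toy data. The thresholds MIN_WEEKS and MIN_HITS are hypothetical stand-ins, shrunk so the example stays small; the assignment itself uses 10 weeks and 30 hits.

```python
import pandas as pd

# Hypothetical demo thresholds; the assignment uses 10 weeks and 30 hits.
MIN_WEEKS = 2
MIN_HITS = 2

# Build toy song-week rows from (performer, song, weeks on chart) triples.
rows = []
for performer, song, n_weeks in [
    ("A", "S1", 3), ("A", "S2", 2), ("A", "S3", 1),
    ("B", "S4", 2), ("B", "S5", 2), ("B", "S6", 4),
    ("C", "S7", 5),
]:
    rows += [{"performer": performer, "song": song, "week": w + 1}
             for w in range(n_weeks)]
toy = pd.DataFrame(rows)

# Step 1: weeks on chart per (performer, song); keep only the hits.
weeks_per_song = toy.groupby(["performer", "song"]).size()
hits = weeks_per_song[weeks_per_song >= MIN_WEEKS]

# Step 2: hits per performer; keep performers with enough hits.
hits_per_performer = hits.groupby(level="performer").size()

# Sorting by value stands in for fct_reorder.
top_performers = hits_per_performer[hits_per_performer >= MIN_HITS].sort_values()

print(top_performers)
```

Calling top_performers.plot(kind="barh") on the sorted result draws horizontal bars, which plays the role of coord_flip(). With an ascending sort, the performer with the most hits ends up at the top of the plot.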

In [None]:
# Count the weeks each (performer, song) pair spent on the chart. Each row is one
# song-week, so count rows; week numbers repeat across years, so nunique() can undercount
artist_weeks_on_chart = billboard.groupby(["performer", "song"])["week"].count()
# Keep the songs that were ten-week hits
artist10weeks = artist_weeks_on_chart[artist_weeks_on_chart >= 10]
# Count the ten-week hits for each artist
artist10weeks = artist10weeks.groupby(level="performer").count()
# Keep the artists with at least 30 ten-week hits
artist30hits = artist10weeks[artist10weeks >= 30]
#artist30hits.count()

#Creating Bar Plot
plt.figure(figsize = (12, 8))
#Making the bar plot horizontal to ensure that the artists' names are readable
artist30hits.sort_values().plot(kind = "barh")
plt.xlabel("Number of Ten-Week Hits")
plt.ylabel("Artist")
plt.title("Number of Ten-Week Hits for Top Artists")
plt.tight_layout()
plt.show()