In [None]:
import pandas as pd

In [48]:
# Read only the columns needed for the analysis.
columns_to_read = ['track_name', 'artist(s)_name', 'streams', 'bpm', 'key', 'mode']
df = pd.read_csv('spotify-2023.csv', usecols=columns_to_read, encoding='latin-1')

# Sort the rows by the 'streams' column in descending order.
df_desc = df.sort_values(by='streams', ascending=False)
print(df_desc)

                                  track_name      artist(s)_name  \
574      Love Grows (Where My Rosemary Goes)   Edison Lighthouse   
33                                 Anti-Hero        Taylor Swift   
625                                   Arcade     Duncan Laurence   
253                            Glimpse of Us                Joji   
455                           Seek & Destroy                 SZA   
..                                       ...                 ...   
366                                  Revenge        XXXTENTACION   
744                                 Right On            Lil Baby   
515                             Best Friends          The Weeknd   
500                               ýýýabcdefu               Gayle   
301  Arcï¿½ï¿½ngel: Bzrp Music Sessions, Vol  Arcangel, Bizarrap   

                                               streams  bpm key   mode  
574  BPM110KeyAModeMajorDanceability53Valence75Ener...  110   A  Major  
33                                   

Cleaning 1: Columns and Sorting. 

The code above cleans the dataset by reading only the columns we need. The 'track_name' and 'artist(s)_name' columns identify the song and artist, the 'streams' column measures the song's popularity, and the 'bpm', 'key', and 'mode' columns describe the song's musical attributes.

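Then, we use .sort_values to put the 'streams' column in descending order so that the most popular songs appear at the top of the dataframe. Note, however, that the output above shows a malformed text value in 'streams' (row 574), which indicates the column was read as text rather than as numbers. The rows are therefore ordered alphabetically, and 'streams' must be converted to a numeric type before the ordering reflects popularity.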
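One possible way to get a numeric sort is sketched below. It is a minimal illustration on a made-up toy dataframe (the track names and stream values are hypothetical, not taken from spotify-2023.csv), and it assumes that dropping unparseable rows is acceptable for the analysis. pd.to_numeric with errors='coerce' turns any value it cannot parse into NaN instead of raising an error.

```python
import pandas as pd

# Toy stand-in for the real file; all values are made up for illustration.
toy = pd.DataFrame({
    'track_name': ['Song A', 'Song B', 'Song C', 'Song D'],
    'streams': ['95', '1200', 'BPM110KeyAModeMajor', '300'],
})

# Sorting the text column compares character by character, so the
# malformed entry and '95' land ahead of '1200'.
text_sorted = toy.sort_values(by='streams', ascending=False)

# Convert to numbers; anything unparseable becomes NaN instead of raising.
toy['streams'] = pd.to_numeric(toy['streams'], errors='coerce')

# Drop the rows that could not be parsed, then sort numerically.
numeric_sorted = (
    toy.dropna(subset=['streams'])
       .sort_values(by='streams', ascending=False)
)
print(numeric_sorted)
```

The same two calls could be applied to df['streams'] before sorting; whether to drop or repair the malformed row is a judgment call that depends on the analysis.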

In [None]:
common_bpm = df['bpm'].mode()[0]
common_key = df['key'].mode()[0]
common_mode = df['mode'].mode()[0]

print(common_bpm)
print(common_key)
print(common_mode)

Interpretation/Analysis 1: Most Common Attributes.

Rather than tracking the songs by popularity alone, we will also look at each song's musical attributes.
With the initial dataset cleaned, we want to find the most common musical attributes across the entire dataset.
We use .mode() to find the most common bpm, key, and mode across all songs. Because .mode() returns a Series (there can be ties), we take the first entry with [0]. The results are bpm=120, key=C#, mode=Major.
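To make the role of [0] concrete, here is a small sketch on a made-up Series (the values are hypothetical and chosen so that two keys tie). When several values share the highest count, .mode() returns all of them in sorted order, so [0] silently picks the first one.

```python
import pandas as pd

# Hypothetical keys: 'C#' and 'G' each appear twice, 'A' once.
keys = pd.Series(['C#', 'G', 'C#', 'G', 'A'])

# .mode() returns a Series holding every value tied for most frequent.
modes = keys.mode()
print(modes.tolist())

# [0] takes the first of the tied values.
first = modes[0]
print(first)
```

If ties matter for the analysis, checking len(df['key'].mode()) before indexing is one simple safeguard.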

In [49]:
count_bpm = df['bpm'].value_counts()[common_bpm]
total_bpm = df['bpm'].count()
percentage_bpm = (count_bpm / total_bpm) * 100
percentage_bpm = round(percentage_bpm, 2)

count_key = df['key'].value_counts()[common_key]
total_key = df['key'].count()
percentage_key = (count_key / total_key) * 100
percentage_key = round(percentage_key, 2)

count_mode = df['mode'].value_counts()[common_mode]
total_mode = df['mode'].count()
percentage_mode = (count_mode / total_mode) * 100
percentage_mode = round(percentage_mode, 2)


print(common_bpm, 'is the most common bpm of all the songs in the dataset. It occurs', count_bpm, 'times.', 'This is', percentage_bpm, 'percent of the entire dataset.')
print(common_key, 'is the most common key of all the songs in the dataset. It occurs', count_key, 'times.', 'This is', percentage_key, 'percent of the entire dataset.')
print(common_mode, 'is the most common mode of all the songs in the dataset. It occurs', count_mode, 'times.', 'This is', percentage_mode, 'percent of the entire dataset.')

120 is the most common bpm of all the songs in the dataset. It occurs 39 times. This is 4.09 percent of the entire dataset.
C# is the most common key of all the songs in the dataset. It occurs 120 times. This is 13.99 percent of the entire dataset.
Major is the most common mode of all the songs in the dataset. It occurs 550 times. This is 57.71 percent of the entire dataset.


Interpretation/Analysis 2: Percentage of Attributes.

Now, we will calculate how often the most common attributes occur in the dataset. This takes four lines of code per attribute:

1: .value_counts() counts how many times each value occurs in the column; indexing it with the most common value gives that value's count.

2: .count() counts the number of non-missing entries in the column, so the percentage is relative to the songs that have a value for that attribute.

3: We divide the attribute count by the total count, then multiply by 100 to get the percentage.

4: The round() function rounds the percentage to two decimal places.
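The four steps can be traced on a small made-up column, sketched below. The values are hypothetical and include one missing entry to show that .count() and len() differ when data is missing. The last line shows a shorter alternative, value_counts(normalize=True), which returns shares of the non-missing entries directly; it is offered as an option, not as the method used above.

```python
import pandas as pd

# Hypothetical column with one missing entry.
modes = pd.Series(['Major', 'Minor', 'Major', 'Major', None, 'Minor'])

common = modes.mode()[0]                     # most frequent value
count_common = modes.value_counts()[common]  # step 1: occurrences of that value
total = modes.count()                        # step 2: non-missing entries only
percentage = round((count_common / total) * 100, 2)  # steps 3 and 4

# Share of ALL rows, counting the missing entry in the denominator.
share_all_rows = round((count_common / len(modes)) * 100, 2)

# Alternative: normalize=True yields fractions of the non-missing entries.
shortcut = round(modes.value_counts(normalize=True)[common] * 100, 2)

print(common, count_common, total, percentage, share_all_rows, shortcut)
```

When a column has missing values, the two denominators give different percentages, so it is worth stating which one a reported figure uses.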