In [None]:
import pandas as pd

In [None]:
# Read only the columns needed for the analysis
columns_to_read = ['track_name', 'artist(s)_name', 'streams', 'bpm', 'key', 'mode']
df = pd.read_csv('spotify-2023.csv', usecols=columns_to_read, encoding='latin-1')

# Convert 'streams' column to numeric, forcing errors to NaN
df['streams'] = pd.to_numeric(df['streams'], errors='coerce')

# Drop rows with a missing value in any selected column (this includes
# the 'streams' entries that could not be converted above)
df = df.dropna()

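# Convert 'streams' to 64-bit integers; plain int is 32-bit on some platforms
# (e.g. Windows with NumPy < 2), which is too small for billions of streams
df['streams'] = df['streams'].astype('int64')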

# Sort the DataFrame by 'streams' column in descending order
df_desc = df.sort_values(by='streams', ascending=False)

print(df_desc)

**Cleaning 1: Selecting, Converting, and Sorting Columns.**

1. The code above reads only the columns the analysis needs: 'track_name' and 'artist(s)_name' identify the song and artist, 'streams' measures the song's popularity, and 'bpm', 'key', and 'mode' describe its musical attributes.

2. Convert the 'streams' column to numeric with pd.to_numeric(..., errors='coerce'), which turns any value that cannot be parsed as a number into NaN instead of raising an error.

3. Drop rows with NaN using .dropna(). This removes a row if any of the selected columns is missing, not only 'streams'.

4. Convert the 'streams' column back to whole numbers using .astype(); the coercion in step 2 had left it as floats.

5. Sort the DataFrame by the 'streams' column in descending order using .sort_values(ascending=False), so the most popular songs appear at the top.
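
The same cleaning steps can be checked on a tiny in-memory DataFrame before running them on the full CSV. This is a minimal sketch: the sample rows and the names `raw`, `clean`, and `clean_desc` are made up for illustration and are not taken from spotify-2023.csv. It shows that an unparseable stream count becomes NaN and that `.dropna()` also removes a row whose only gap is in 'key'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows (not real data): one unparseable stream count
# and one missing key, to show what each cleaning step removes.
raw = pd.DataFrame({
    'track_name': ['Song A', 'Song B', 'Song C', 'Song D'],
    'streams': ['3000000000', 'not a number', '150000', '2500000'],
    'key': ['C#', 'G', np.nan, 'A'],
})

# Step 2: unparseable strings become NaN instead of raising an error
raw['streams'] = pd.to_numeric(raw['streams'], errors='coerce')

# Step 3: drop any row that has a NaN in any column (Song B and Song C)
clean = raw.dropna()

# Step 4: the NaNs forced a float dtype, so cast back to whole numbers;
# 'int64' is used because counts above about 2.1 billion overflow int32
clean = clean.assign(streams=clean['streams'].astype('int64'))

# Step 5: most-streamed songs first
clean_desc = clean.sort_values(by='streams', ascending=False)
print(clean_desc)
```

Using `.assign()` returns a new DataFrame rather than writing into the result of `.dropna()`, which sidesteps pandas' chained-assignment warnings.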

In [None]:
common_bpm = df['bpm'].mode()[0]
common_key = df['key'].mode()[0]
common_mode = df['mode'].mode()[0]

print(common_bpm)
print(common_key)
print(common_mode)

**Interpretation/Analysis 1: Most Common Attributes.**

Rather than ranking the songs by popularity alone, we also look at their musical attributes.

With the dataset cleaned, we first find the most common musical attributes across the entire dataset.

We use .mode() to find the most common bpm, key, and mode across all songs. They are bpm=120, key=C#, and mode=Major.
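
Note that .mode() returns a Series rather than a single value, because a column can have several equally common values; the results come back sorted, so `[0]` picks the first of any tie. A small sketch with made-up values (not from the dataset) illustrates this:

```python
import pandas as pd

# Hypothetical values, chosen so that two keys tie for most common
keys = pd.Series(['C#', 'G', 'C#', 'G', 'A'])
bpms = pd.Series([120, 120, 95, 140, 120])

print(keys.mode())     # both tied values, 'C#' and 'G', in sorted order
print(keys.mode()[0])  # 'C#', the first of the sorted tied values
print(bpms.mode()[0])  # 120, the single most frequent bpm
```

If ties matter for the analysis, checking `len(df['key'].mode())` before relying on `[0]` shows whether more than one value shares the top spot.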

In [None]:
count_bpm = df['bpm'].value_counts()[common_bpm]
total_bpm = df['bpm'].count()
percentage_bpm = (count_bpm / total_bpm) * 100
percentage_bpm = round(percentage_bpm, 2)

count_key = df['key'].value_counts()[common_key]
total_key = df['key'].count()
percentage_key = (count_key / total_key) * 100
percentage_key = round(percentage_key, 2)

count_mode = df['mode'].value_counts()[common_mode]
total_mode = df['mode'].count()
percentage_mode = (count_mode / total_mode) * 100
percentage_mode = round(percentage_mode, 2)


print(common_bpm, 'is the most common bpm of all the songs in the dataset. It occurs', count_bpm, 'times.', 'This is', percentage_bpm, 'percent of the entire dataset.')
print(common_key, 'is the most common key of all the songs in the dataset. It occurs', count_key, 'times.', 'This is', percentage_key, 'percent of the entire dataset.')
print(common_mode, 'is the most common mode of all the songs in the dataset. It occurs', count_mode, 'times.', 'This is', percentage_mode, 'percent of the entire dataset.')

**Interpretation/Analysis 2: Counts & Percentages of Attributes.**

Now we calculate how often the most common attributes occur in the entire dataset. This takes four steps.

1. Count how many times the most common value occurs in the column using .value_counts(), indexed by that value.

2. Count the total number of non-null entries in the column using .count().

3. Divide the attribute count by the total count, then multiply by 100 to get the percentage.

4. Round the percentage to two decimal places using the round() function.
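
Because the four steps repeat for every column, they can be folded into one loop. This sketch swaps in `value_counts(normalize=True)`, which returns each value's share of the non-null entries directly, in place of the separate count and division; `most_common_share` is a hypothetical helper name and `sample` is made-up data, not the Spotify dataset. On the real data, the loop would run over `df` instead of `sample`.

```python
import pandas as pd

def most_common_share(series):
    """Return (most common value, its count, its percentage of non-null entries)."""
    value = series.mode()[0]                                   # same choice as .mode()[0] above
    count = int(series.value_counts()[value])                  # step 1
    share = series.value_counts(normalize=True)[value] * 100   # steps 2 and 3 in one call
    return value, count, round(share, 2)                       # step 4

# Hypothetical sample data for illustration only
sample = pd.DataFrame({
    'bpm': [120, 120, 95, 140],
    'key': ['C#', 'C#', 'G', 'C#'],
    'mode': ['Major', 'Minor', 'Major', 'Major'],
})

for column in ['bpm', 'key', 'mode']:
    value, count, pct = most_common_share(sample[column])
    print(f"{value} is the most common {column}. It occurs {count} times ({pct} percent of the entries).")
```

Like .count(), `value_counts(normalize=True)` ignores NaN by default, so the percentages match the step-by-step calculation.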