# Song Characteristics and Predicting Popularity
>Our project examines song characteristics and how well they predict popularity. We use the Song Popularity dataset from Kaggle (https://www.kaggle.com/datasets/yasserh/song-popularity-dataset/data).

>**How do various song characteristics influence the popularity of a song?**
>-----------------------------------------------------------------------------

# Import necessary libraries
To begin our analysis, we need to import the necessary libraries, load our dataset, and format the data appropriately.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

*Load the dataset*

In [None]:
songs_df = pd.read_csv("song_data.csv")

*Display the first few rows of the dataframe*

In [None]:
songs_df.head()

Summary: With the dataset loaded, the next step is to check the columns for null values and drop the columns that aren't relevant to our project.
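
One way to check which columns contain nulls is to count them per column. The sketch below uses a small made-up frame (`demo_df`, with hypothetical values) in place of `songs_df`; the same calls on `songs_df` would report the real counts.

```python
import pandas as pd

# Hypothetical stand-in for songs_df; the values are made up for illustration.
demo_df = pd.DataFrame({
    "song_name": ["Song A", "Song B", None],
    "song_popularity": [73, 66, 80],
    "danceability": [0.50, None, 0.74],
})

# Count missing values in each column.
null_counts = demo_df.isnull().sum()
print(null_counts)

# Keep only the rows that have no missing values.
demo_complete = demo_df.dropna()
print(len(demo_complete))
```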

# Clean Data
Drop unnecessary columns

In [None]:
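# Drop from songs_df (the dataframe loaded above) the columns we won't use in this part of the analysis
df_drop = songs_df.drop(['song_duration_ms', 'audio_mode', 'time_signature', 'audio_valence'], axis='columns')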
df_drop

Remove song name duplicates

In [None]:
df_clean = df_drop.drop_duplicates(subset='song_name')
df_clean
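
`drop_duplicates(subset='song_name')` keeps the first row for each name and drops the rest, so different songs that happen to share a title are collapsed into one row as well. A minimal sketch on made-up rows (`demo_df` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical rows; three of them share the same song_name.
demo_df = pd.DataFrame({
    "song_name": ["Stay", "Stay", "Hello", "Stay"],
    "song_popularity": [80, 80, 65, 42],
})

# Count how many rows repeat an earlier song_name.
n_repeats = demo_df.duplicated(subset="song_name").sum()

# Keep only the first occurrence of each song_name (the default keep='first').
deduped = demo_df.drop_duplicates(subset="song_name")

print(n_repeats)
print(deduped)
```

Checking the repeat count before dropping gives a sense of how many rows the cleaning step removes.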

# Data Exploration
**Top 10 Most Popular Songs**

In [None]:
# Sort songs by popularity (most popular first) and display the top 10
df_sort = df_clean.sort_values('song_popularity', ascending=False)
df_sort.head(10)


# Danceability Analysis
*Display the top 10 most popular songs and check for similarities between them*

In [None]:
songs_10 = df_sort.head(10)  # df_sort is already sorted by popularity
songs_10

![1.png](attachment:1.png)

*Sort top 10 most popular songs by danceability*

In [None]:
dance_10 = songs_10[['song_name', 'song_popularity', 'danceability']].sort_values('danceability', ascending=False)[:10]
dance_10

![4.png](attachment:4.png)

*Conclusion: Most of the top 10 most popular songs have high danceability scores, which suggests that danceability is related to how popular a song is.*
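
A high score among the top 10 is easier to interpret next to a baseline. One possible check, sketched here on made-up numbers (`demo_df` and `top_n` are hypothetical), compares the mean danceability of the most popular songs with the mean over all songs.

```python
import pandas as pd

# Hypothetical stand-in for df_clean; the values are made up.
demo_df = pd.DataFrame({
    "song_popularity": [95, 90, 88, 60, 55, 40, 30, 20],
    "danceability": [0.80, 0.70, 0.75, 0.60, 0.50, 0.65, 0.40, 0.40],
})

top_n = 3  # the notebook uses 10; 3 keeps this toy example small
top_songs = demo_df.sort_values("song_popularity", ascending=False).head(top_n)

# Mean danceability of the most popular songs versus all songs.
top_mean = top_songs["danceability"].mean()
overall_mean = demo_df["danceability"].mean()

print(f"top {top_n} mean: {top_mean:.2f}, overall mean: {overall_mean:.2f}")
```

If the two means are close, high danceability among the top songs says little on its own, because most songs in the data would then score similarly.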

# Graph Danceability
Select the columns to plot for the top 10 most popular songs (already sorted by danceability)

In [None]:
dance_10_graph = dance_10[['song_name', 'danceability']][:10]

In [None]:
dance_10_graph.plot(kind='bar', x='song_name', y='danceability')
plt.xlabel('Song Name')
plt.ylabel('Danceability')
plt.title('Top 10 Songs by Danceability')
plt.xticks(rotation=45, ha='right')

In [None]:
plt.show()

![Danceability .png](<attachment:Danceability .png>)

*Conclusion: The bar chart shows that the top 10 most popular songs have high danceability scores.*

In [None]:
song_data = pd.read_csv('song_data.csv')

*Display the first few rows of the dataframe and the column names*

In [None]:
song_data.head(), song_data.columns

**Calculate correlation of song popularity with selected features**

In [None]:
correlation_data = song_data[['song_popularity', 'danceability', 'energy', 'acousticness','instrumentalness', 'loudness', 'speechiness', 'tempo', 'audio_valence']].corr()

**Focusing on the correlation with song popularity**

In [None]:
correlation_with_popularity = correlation_data['song_popularity'].sort_values(ascending=False)
correlation_with_popularity
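
By default, `DataFrame.corr()` computes the Pearson correlation coefficient for each pair of columns. The sketch below computes the same quantity by hand for short made-up lists (the helper `pearson` and all values are hypothetical) to show what each number in `correlation_with_popularity` represents.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

popularity = [30, 50, 70, 90]

# Made-up feature that rises in step with popularity.
danceability = [0.2, 0.4, 0.6, 0.8]
r_up = pearson(danceability, popularity)

# Made-up feature that falls as popularity rises.
instrumentalness = [0.9, 0.6, 0.3, 0.0]
r_down = pearson(instrumentalness, popularity)

print(round(r_up, 3), round(r_down, 3))
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one. Real song data would be expected to fall well inside those extremes.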

**Set up the matplotlib figure**

In [None]:
plt.figure(figsize=(18, 5))

# Plot 1: Song Popularity vs Danceability

In [None]:
plt.subplot(1, 3, 1)
sns.scatterplot(x='danceability', y='song_popularity', data=song_data)
plt.title('Song Popularity vs Danceability')

![2.png](attachment:2.png)

Conclusion: There seems to be a positive trend: songs with higher danceability scores tend to have higher popularity.

# Plot 2: Song Popularity vs Loudness

In [None]:
plt.subplot(1, 3, 2)
sns.scatterplot(x='loudness', y='song_popularity', data=song_data)
plt.title('Song Popularity vs Loudness')

![3.png](attachment:3.png)

Conclusion: There seems to be a positive correlation between loudness and song popularity. Louder songs tend to be more popular, possibly because they capture listeners' attention more readily.

# Plot 3: Song Popularity vs Instrumentalness

In [None]:
plt.subplot(1, 3, 3)
sns.scatterplot(x='instrumentalness', y='song_popularity', data=song_data)
plt.title('Song Popularity vs Instrumentalness')

In [None]:
plt.tight_layout()
plt.show()

![4.png](attachment:4.png)

Conclusion: There is a negative trend: as instrumentalness increases, song popularity tends to decrease.
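
The three scatter plots are set up across separate cells. With Jupyter's default inline backend, each cell typically renders its own figure, so the panels may not end up side by side. One alternative, sketched here with plain matplotlib on a made-up frame (`demo_df` and its values are hypothetical), builds all three panels in a single cell.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for song_data; the values are made up.
demo_df = pd.DataFrame({
    "song_popularity": [20, 35, 50, 65, 80],
    "danceability": [0.30, 0.45, 0.55, 0.70, 0.80],
    "loudness": [-14.0, -11.0, -9.0, -6.5, -5.0],
    "instrumentalness": [0.80, 0.40, 0.10, 0.02, 0.00],
})

features = ["danceability", "loudness", "instrumentalness"]

# One figure with three side-by-side axes, created and filled in one place.
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for ax, feature in zip(axes, features):
    ax.scatter(demo_df[feature], demo_df["song_popularity"], alpha=0.5)
    ax.set_xlabel(feature)
    ax.set_ylabel("song_popularity")
    ax.set_title(f"Song Popularity vs {feature.capitalize()}")

fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; in a script, a call to `plt.show()` after `fig.tight_layout()` would display it.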

# Conclusion
The analysis suggests that danceability is positively associated with song popularity. The scatter plots also show a positive trend for loudness and a negative trend for instrumentalness, which gives further insight into the factors related to song popularity.