# Introduction

#### Objective & Scope
* Share Insights and Findings
* Collect, Clean and Analyze the 2023 Spotify Dataset
* Curate and Develop a Web Application Based on the Dataset
* Deploy the Application to Render

#### Initial Questions:
* Can we identify variables of interest that can be used to predict whether or not a song will make it into the top 50, top 100, or top 500?
* Are there commonalities in artists' "music magic" (i.e., energy and liveness) that keep a track relevant regardless of the year it was released?
* According to the dataset, is there an apparent formula to making a good, catchy song?
* Which artists make up the top 10, top 5, and top 3 by number of songs?
* Do these top artists share a similar music magic formula that keeps them in the top charts?

In [None]:
#Libraries are imported here
import pandas as pd
import streamlit as st
import numpy as np
import plotly.express as px
import altair as alt

In [None]:
#The data loads here
df = pd.read_csv("spotify-2023.csv", encoding='latin1')

In [None]:
#Before locating NaN/Missing values
df.info()

In [None]:
#After locating NaN/Missing values
df.isnull().sum()

In [None]:
# After finding the NaN values, I removed the rows that contain them so that
# every row has a value in each of the columns and
# the data is therefore uniform.
df.dropna(inplace=True)
df.info()
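
Before dropping rows, it can help to measure how much data the step removes. The following is a minimal sketch on a small made-up frame; the values are hypothetical and only stand in for the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Spotify data
toy = pd.DataFrame({
    "track_name": ["A", "B", "C", "D"],
    "key": ["C#", np.nan, "F", "A"],
    "bpm": [120, 95, np.nan, 140],
})

rows_before = len(toy)
missing_per_column = toy.isnull().sum()  # NaN count per column
cleaned = toy.dropna()                   # returns a new frame; toy is unchanged
rows_dropped = rows_before - len(cleaned)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {rows_before} rows")
```

If the share of dropped rows turns out to be large, filling or flagging the missing values may be worth considering instead of removing them.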

In [None]:
df.head()

In [None]:
print(f"There are {df['artist(s)_name'].nunique()} artists and/or groups on Spotify's Most Streamed Songs list for 2023.")

In [None]:
num_of_artists = df['artist(s)_name'].value_counts()
num_of_artists

In [None]:
num_of_artists.head(10)

There are 8 solo artists and 2 groups (one K-pop group and one duo) that each had between 7 and 29 popular songs charting on Spotify.
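
Note that `value_counts()` treats each distinct string in `artist(s)_name` as one entry, so a collaboration is counted separately from each artist's solo tracks. Assuming collaborators are separated by commas (an assumption about the column's format, shown here on made-up names), a per-artist count could be sketched like this:

```python
import pandas as pd

# Hypothetical artist names; the comma separator is an assumption
toy = pd.DataFrame({
    "artist(s)_name": ["Artist A", "Artist A, Artist B", "Artist B", "Artist A"],
})

# Counting the raw strings treats "Artist A, Artist B" as its own entry
raw_counts = toy["artist(s)_name"].value_counts()

# Split on commas, give each artist its own row, then strip whitespace
per_artist = (
    toy["artist(s)_name"]
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)

print(raw_counts)
print(per_artist)
```

Whether the raw or the per-artist count is the right one depends on whether a feature should count toward an artist's total.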

In [None]:
df.duplicated().sum()

In [None]:
print(f"Here are the {df['key'].nunique()} unique keys popular songs were played in: \n {df['key'].unique()} ")

In [None]:
print(f"Here are the {df['released_year'].nunique()} distinct release years of songs that were still relevant in 2023: \n {df['released_year'].sort_values().unique()} ")

In [None]:
fig = px.histogram(df, x='streams', y='speechiness_%', histfunc='avg')
fig.show()

In [None]:
fig = px.histogram(df, x='streams', y='valence_%', histfunc='avg')
fig.show()
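
These histograms bin on `streams`, which only behaves as intended when the column is numeric. If `df.info()` reports `streams` as `object`, one option is to coerce it first. This is a minimal sketch on made-up values; the malformed entry is hypothetical.

```python
import pandas as pd

# Hypothetical values, including one entry that cannot be parsed as a number
toy = pd.DataFrame({
    "streams": ["1000", "2500", "not-a-number", "4000"],
    "valence_%": [40, 55, 60, 70],
})

# errors="coerce" turns unparseable entries into NaN instead of raising
toy["streams"] = pd.to_numeric(toy["streams"], errors="coerce")

# Keep only the rows whose stream count could be parsed
numeric = toy.dropna(subset=["streams"])

print(numeric.dtypes)
print(numeric)
```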

In [None]:
fig = px.scatter_matrix(df, dimensions=["energy_%", "bpm", "valence_%"])
fig.show()

In [None]:
fig = px.scatter_matrix(df, dimensions=["danceability_%", "bpm", "valence_%"])
fig.show()

In [None]:
# I could add something that aggregates values together, but the idea hasn't hit just yet.
# I must think in relation to the objectives above: what gets songs to top the charts?

df.describe()
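
One possible way to aggregate values in line with the objectives is to rank tracks by streams, label rank tiers, and compare average audio features per tier. The sketch below uses made-up data and scaled-down tiers; the tier edges and feature values are hypothetical, and on the full data the edges might be 50, 100, and 500.

```python
import pandas as pd

# Hypothetical tracks; "streams" is assumed to be numeric already
toy = pd.DataFrame({
    "track_name": list("ABCDEF"),
    "streams": [900, 100, 500, 700, 300, 200],
    "energy_%": [80, 40, 60, 70, 50, 45],
    "danceability_%": [75, 50, 65, 70, 55, 60],
})

# Rank 1 is the most-streamed track
toy["rank"] = toy["streams"].rank(ascending=False, method="first").astype(int)

# Bin the ranks into tiers: (0, 2], (2, 4], (4, 6]
toy["tier"] = pd.cut(
    toy["rank"],
    bins=[0, 2, 4, len(toy)],
    labels=["ranks 1-2", "ranks 3-4", "rest"],
)

# Average feature values within each tier
tier_means = toy.groupby("tier", observed=True)[["energy_%", "danceability_%"]].mean()
print(tier_means)
```

Differences between tier averages would only hint at a pattern; they would not by themselves show that a feature causes a song to chart higher.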