<b>Basic Flow</b>
- Background of task and data
- Summary of technique used
- Simple data analysis (showing the original data features)
- Preprocessing
- Feature Extraction (no feature extraction since the features are already given from the second data set)

<b>Background and Related Works</b>

In recent years, music revenue in the United States has seen substantial growth. In 2017, revenues from recorded music in the United States increased 16.5% at estimated retail value to $8.7 billion, continuing the growth from the previous year. Paid subscriptions from streaming services like Spotify and Apple Music were the biggest growth driver for the music industry in 2017. The revenues from streaming platforms made up 65% of total industry revenues. [1] Online streaming can now be seen as 
the new norm for accessing and distributing music. Therefore, a fundamental understanding of what makes a song popular 
has major implications for musicians and record labels that thrive on stream count and song popularity. 

Machine learning techniques offer one way to make accurate predictions of song popularity. 

Pham, Kyauk and Park used both acoustic features and metadata to build classification and regression models: the former 
determine whether or not a song is popular, while the latter predict its popularity score. Among SVMs, neural 
networks and logistic regression for classification, the SVM with a Gaussian kernel yielded the highest F1 score. For regression, 
they fitted the models using a standard multiple linear regression and applied feature selection methods to achieve the best 
coefficient estimates. It was Logistic Lasso regression that yielded the smallest test error. The research 
concluded that the acoustic features aren’t nearly as predictive as the metadata features. A likely reason is that 
acoustic features vary considerably within a single song, which makes it difficult to extract metrics that represent 
the entire song.[2]

<b>Objective</b>

Classify a song as popular (1) or unpopular (0) based on the given audio features.

<b>Machine Learning Techniques</b>

A. Classification:<br>
1. Naive Bayes<br>
2. SVM<br>
3. Decision Tree<br>
4. Logistic Regression<br>
5. KNN

<b>Description of the Data</b>

This research uses data from two Kaggle datasets, namely Spotify’s Worldwide Daily Song Ranking [1] and Top Spotify Tracks of 2017 [2]. The former is a collection of Spotify’s most streamed songs in different regions across the world for each day of 2017. Each row contains a ranking position on a specific day and region. There are roughly 200 entries per day for each region, although some of Spotify's data is missing on a few occasions. Because of this, the researchers focused only on the streams (stream count) column of each entry. The latter is a collection of the audio features of the songs found in Spotify's Top Spotify Tracks of 2017 playlist. Aside from the song title, artist and song URL, each song has values for the 13 audio features listed below, as defined by the Spotify API.
<style>
table, th, td {
    border: 1px solid black;
    border-collapse: collapse;
    padding: 15px;
    text-align: left;
}
</style>
<table>
  <tr>
    <th>Feature</th>
    <th>Description</th> 
  </tr>
  <tr>
    <td>Danceability</td>
    <td>Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.</td> 
  </tr>
  <tr>
    <td>Energy</td>
    <td>A measure from 0.0 to 1.0 that represents the perceived intensity and activity of a track. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.</td>
  </tr>
  <tr>
    <td>Key</td>
    <td>The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on.</td> 
  </tr>
  <tr>
    <td>Loudness</td>
    <td>The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.</td>
  </tr>
  <tr>
    <td>Mode</td>
    <td>Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.</td> 
  </tr>
  <tr>
    <td>Speechiness</td>
    <td>Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.</td> 
  </tr>
  <tr>
    <td>Acousticness</td>
    <td>A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.</td> 
  </tr>
  <tr>
    <td>Instrumentalness</td>
    <td>Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.</td> 
  </tr>
  <tr>
    <td>Liveness</td>
    <td>Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.</td> 
  </tr>
  <tr>
    <td>Valence</td>
    <td>A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).</td> 
  </tr>
  <tr>
    <td>Tempo</td>
    <td>The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.</td> 
  </tr>
  <tr>
    <td>Duration</td>
    <td>The duration of the track in milliseconds.</td> 
  </tr>
  <tr>
    <td>Time Signature</td>
    <td>An estimated overall time signature of a track. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure).</td>
  </tr>
</table>

<b>Exploratory Data Analysis</b>

Create a distribution plot for the values for each audio feature.<br>
Create a feature correlation heat map.<br>
See the following examples:<br>
https://medium.com/mlreview/spotify-analyzing-and-predicting-songs-58827a0fa42b <br>
https://www.kaggle.com/cihanoklap/top-songs-on-spotify-what-makes-them-popular <br>
https://www.kaggle.com/nadintamer/what-makes-top-spotify-songs-popular
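
The numbers behind a feature correlation heat map can be sketched with the standard library alone. The snippet below is a minimal illustration, not the researchers' analysis: the helper names `pearson` and `correlation_matrix` are hypothetical, and the feature values are made up purely to show the computation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for three audio features (illustration only, not real data).
features = {
    "energy": [0.2, 0.4, 0.6, 0.8],
    "loudness": [-12.0, -9.0, -6.0, -3.0],
    "acousticness": [0.8, 0.6, 0.4, 0.2],
}

corr = correlation_matrix(features)
for name, row in corr.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

The resulting matrix can be handed to whichever plotting library is used for the heat map; each cell is a value between -1 and 1.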

<b>Data Preprocessing</b>

Since the research examines the association between a song’s stream count and its features, each song must exist in both datasets. The researchers mapped the 100 songs from the Top Spotify Tracks dataset to the entries in the Spotify’s Worldwide Daily Song Ranking dataset and pulled the entries with matching song titles. The resulting 256718 entries serve as the researchers’ working dataset. This dataset contains multiple entries per song, since a song can stay in the top 200 for several days and in several regions. The researchers therefore sum the stream counts of each song to get its total for the whole year.<br><br>
With stream count as the metric of popularity, songs with a total stream count higher than the mean are labeled as popular (1) and the rest are labeled as unpopular (0).
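
The aggregation and labeling steps can be sketched as follows. This is a minimal illustration under assumptions: the function name `label_by_mean_streams` is hypothetical, each ranking entry is reduced to a `(track_name, streams)` pair, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean


def label_by_mean_streams(rows):
    """Sum streams per track, then label tracks above the mean total as popular.

    rows: iterable of (track_name, streams) pairs, one per daily ranking entry.
    Returns (totals, labels), where labels maps each track to 1 or 0.
    """
    totals = defaultdict(int)
    for track, streams in rows:
        totals[track] += streams  # collapse the per-day, per-region entries

    threshold = mean(totals.values())
    labels = {track: int(total > threshold) for track, total in totals.items()}
    return dict(totals), labels


# Made-up ranking entries (illustration only, not real data).
rows = [
    ("Song A", 500),
    ("Song A", 700),
    ("Song B", 300),
    ("Song C", 100),
    ("Song C", 200),
]

totals, labels = label_by_mean_streams(rows)
print(totals)
print(labels)
```

Because the mean is sensitive to a few very heavily streamed songs, this rule can label fewer than half of the songs as popular, as it does in the toy data above.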

<b>Feature Selection</b>

Choose one of the following:<br>
1) Forward Stepwise Selection - Forward selection greedily builds a feature subset. It starts with an empty subset and, at each step, adds the feature whose inclusion gives the best cross-validated performance. This step is repeated until the generalization error stops decreasing, and the best subset of features is reported.<br>
2) Backward Stepwise Selection - Backward stepwise selection works similarly to forward stepwise selection; however, instead of starting with an empty subset of features, it begins by evaluating the use of all features and incrementally removes features until the model is optimized.<br>
3) Regularization - Regularization is a shrinkage method that constrains the coefficient estimates by shrinking them towards zero. Regularization often improves the fit because shrinking the coefficient estimates can significantly reduce their variance.<br>
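
As a rough sketch of the greedy search described in option 1, the function below adds one feature at a time for as long as a scoring function improves. The names `forward_stepwise` and `toy_score` are hypothetical; in practice the score would be a cross-validated metric of a real model, whereas `toy_score` uses made-up gains purely to make the example runnable.

```python
def forward_stepwise(features, score):
    """Greedy forward selection.

    features: candidate feature names.
    score: callable taking a list of feature names and returning a number
           to maximize (for example, cross-validated accuracy).
    Returns (selected_features, best_score).
    """
    selected = []
    remaining = list(features)
    best_score = float("-inf")

    while remaining:
        # Score every subset formed by adding one more candidate feature.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda c: c[0])
        if top_score <= best_score:
            break  # no candidate improves the score, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score

    return selected, best_score


# Hypothetical stand-in for a cross-validated score: two features help,
# every other feature slightly hurts. The gains are made up.
GAINS = {"energy": 0.3, "key": 0.2}


def toy_score(subset):
    return sum(GAINS.get(feature, -0.05) for feature in subset)


chosen, best = forward_stepwise(["tempo", "energy", "valence", "key"], toy_score)
print(chosen, best)
```

Backward stepwise selection mirrors this loop: it starts from the full feature list and removes the feature whose removal improves the score most, stopping when no removal helps.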

To select the appropriate features for the machine learning models, we use backward stepwise selection to see which features actually contribute most to the accuracy of the model. We first selected a machine learning model at random, which ended up being a decision tree. We then trained the model on only one of the thirteen features at a time and tabulated the resulting test-set accuracy, as seen in the table below.

<table>
  <tr>
    <th>Feature</th>
    <th>Accuracy</th> 
  </tr>
  <tr>
    <td>Danceability</td>
    <td>0.375</td> 
  </tr>
  <tr>
    <td>Energy</td>
    <td>0.5833</td> 
  </tr>
  <tr>
    <td>Key</td>
    <td>0.625</td> 
  </tr>
  <tr>
    <td>Loudness</td>
    <td>0.2917</td> 
  </tr>
  <tr>
    <td>Mode</td>
    <td>0.4583</td> 
  </tr>
  <tr>
    <td>Speechiness</td>
    <td>0.5417</td> 
  </tr>
  <tr>
    <td>Acousticness</td>
    <td>0.4583</td> 
  </tr>
  <tr>
    <td>Instrumentalness</td>
    <td>0.625</td> 
  </tr>
  <tr>
    <td>Liveness</td>
    <td>0.375</td> 
  </tr>
  <tr>
    <td>Valence</td>
    <td>0.375</td> 
  </tr>
  <tr>
    <td>Tempo</td>
    <td>0.3333</td> 
  </tr>
  <tr>
    <td>Duration</td>
    <td>0.5833</td> 
  </tr>
</table>

We can see that many features produced very low accuracies. We therefore removed the features with an accuracy below 0.5, since they are unreliable for classifying the test data. This leaves five features to be used, namely duration, energy, instrumentalness, key, and speechiness.
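
The thresholding step can be reproduced directly from the table. The accuracies below are copied from it; the variable names and lowercase feature keys are illustrative choices.

```python
# Single-feature decision tree accuracies, copied from the table above.
single_feature_accuracy = {
    "danceability": 0.375,
    "energy": 0.5833,
    "key": 0.625,
    "loudness": 0.2917,
    "mode": 0.4583,
    "speechiness": 0.5417,
    "acousticness": 0.4583,
    "instrumentalness": 0.625,
    "liveness": 0.375,
    "valence": 0.375,
    "tempo": 0.3333,
    "duration": 0.5833,
}

THRESHOLD = 0.5  # features scoring below this are dropped

selected_features = sorted(
    name for name, accuracy in single_feature_accuracy.items() if accuracy >= THRESHOLD
)
print(selected_features)
# ['duration', 'energy', 'instrumentalness', 'key', 'speechiness']
```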

<b>Metrics</b>

Precision, recall, and F1 score are used to capture how well our model performs the classification task. Precision measures the proportion of examples classified as popular that are truly popular, while recall measures the proportion of truly popular examples that our model classified as popular. The F1 score is the harmonic mean of these two values.
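
A minimal sketch of these three metrics for the binary popular/unpopular labels is shown below. The function name `precision_recall_f1` is hypothetical and the labels are made up for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = popular)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Made-up labels: four popular songs, two unpopular ones.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here precision is 2/3 and recall is 1/2, so the F1 score is 4/7, which sits closer to the lower of the two values than a plain average would.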

The AUC is a metric used to evaluate the performance of a binary classifier. It is the area under the ROC curve, which is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at different probability thresholds. The AUC represents the probability that the classifier ranks a random positive example higher than a random negative one.
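
The ranking interpretation of the AUC can be computed directly by comparing every positive example with every negative one. The sketch below assumes the classifier outputs a score per song; the function name `auc_by_ranking` and the sample scores are hypothetical.

```python
def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up labels and predicted scores (illustration only).
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]

auc = auc_by_ranking(y_true, scores)
print(auc)  # 0.75
```

Three of the four positive/negative pairs are ordered correctly in this toy example, which gives 0.75. The pairwise loop is quadratic in the number of examples, which is acceptable for a dataset of about 100 songs.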