#  <u> Spotify Track Analysis Tutorial</u>

 ### By: Anuoluwapo Faboro, John Gansallo & Zikora Anyaoku

##  <u> **Table of contents:**</u>
1. Introduction
    - About the data
    - Libraries Used
2. Data Collection
    - Accessing the Data and Set Up
    - Tidying the Data
3. Exploratory Data Analysis
4. Hypothesis Testing
5. Conclusion
6. Resources

--------

#  <u> Introduction:</u>

Thanks to streaming services such as Spotify and Apple Music, music has never been more personalized and accessible. In this project, we explore what makes songs popular on Spotify based on several factors, such as:

- Danceability
- Duration
- Tempo

The dataset contains nearly 600,000 songs gathered from the Spotify Web API. For each song it records song, artist, and release date information, along with audio features such as acousticness, danceability, loudness, and tempo. Almost all of the tracks were released between 1922 and 2021.
For each track in the dataset, we can obtain track information such as the track name, artists, release date, duration, and popularity. More importantly, Spotify’s API allows us to extract a number of audio features such as danceability, energy, instrumentalness, liveness, loudness, speechiness, acousticness, and tempo.

We live in the Big Data era: we can collect a large amount of data, which allows us to derive useful conclusions and make well-informed strategic decisions. However, as the volume of data grows, analyzing and exploring it becomes more difficult. When used effectively and responsibly, visualizations can be powerful tools in exploratory data analysis. Visualizations can also be used to convey a message or to inform our audience about our results. Because there is no one-size-fits-all approach to visualization, different tasks call for different types of visualizations. In this study, we'll look at the Spotify dataset, which is available on Kaggle.

## <u> About the Data:</u>

This dataset contains 586,672 tracks, almost all released between 1922 and 2021. It also includes each track's title, track id, release date, artists, and audio features. The data is obtained from Kaggle (https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-160k-tracks).

## <u> Libraries Used:</u>


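   - opendatasets: Used to download the dataset from Kaggle
   - Pandas: Used to load, organize, and display the data in DataFrames
   - NumPy: Used for numerical helpers such as fitting regression lines
   - re: Used to parse the artist strings
   - Seaborn: Used to create plots
   - Matplotlib: Used to create and format plots
   - statsmodels: Imported for linear regression (`ols`)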


In [54]:
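!pip install opendatasets                # notebook shell command: install the Kaggle download helper
import opendatasets as od                # download datasets from Kaggle
import pandas as pd                      # load and manipulate the data as DataFrames
import seaborn as sns                    # statistical plotting
import matplotlib.pyplot as plt          # create and format plots
import numpy as np                       # numerical helpers such as np.polyfit
import re                                # regular expressions for parsing the artists strings
from statsmodels.formula.api import ols  # formula-based linear regression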



------------------------

# <u>Data Collection</u>

## <u> Accessing the Data and Set Up</u>

To download this dataset, you will need a Kaggle account and a Kaggle API token. Generate a token if you do not already have one. opendatasets will ask for your Kaggle username and key when you run the download. We then download the data from the web as CSV files and use pandas to load and manipulate it.

In [5]:
od.download('https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-160k-tracks') #Retrieves dataset from Kaggle with an API Token

Skipping, found downloaded files in "./spotify-dataset-19212020-160k-tracks" (use force=True to force download)


In [6]:
tfile = 'spotify-dataset-19212020-160k-tracks/tracks.csv' # Spotify Dataset for audio features of tracks
tracks = pd.read_csv(tfile)
print(tracks.shape)
tracks.head()

(586672, 20)


|  | id | name | popularity | duration_ms | explicit | artists | id_artists | release_date | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35iwgR4jXetI318WEWsa1Q | Carve | 6 | 126903 | 0 | ['Uli'] | ['45tIt06XoI0Iio4LBEVpls'] | 1922-02-22 | 0.645 | 0.445 | 0 | -13.338 | 1 | 0.451 | 0.674 | 0.744 | 0.151 | 0.127 | 104.851 | 3 |
| 1 | 021ht4sdgPcrDgSk7JTbKY | Capítulo 2.16 - Banquero Anarquista | 0 | 98200 | 0 | ['Fernando Pessoa'] | ['14jtPCOoNZwquk5wd9DxrY'] | 1922-06-01 | 0.695 | 0.263 | 0 | -22.136 | 1 | 0.957 | 0.797 | 0.0 | 0.148 | 0.655 | 102.009 | 1 |
| 2 | 07A5yehtSnoedViJAZkNnc | Vivo para Quererte - Remasterizado | 0 | 181640 | 0 | ['Ignacio Corsini'] | ['5LiOoJbxVSAMkBS2fUm3X2'] | 1922-03-21 | 0.434 | 0.177 | 1 | -21.18 | 1 | 0.0512 | 0.994 | 0.0218 | 0.212 | 0.457 | 130.418 | 5 |
| 3 | 08FmqUhxtyLTn6pAh6bk45 | El Prisionero - Remasterizado | 0 | 176907 | 0 | ['Ignacio Corsini'] | ['5LiOoJbxVSAMkBS2fUm3X2'] | 1922-03-21 | 0.321 | 0.0946 | 7 | -27.961 | 1 | 0.0504 | 0.995 | 0.918 | 0.104 | 0.397 | 169.98 | 3 |
| 4 | 08y9GfoqCWfOGsKdwojr5e | Lady of the Evening | 0 | 163080 | 0 | ['Dick Haymes'] | ['3BiJGZsyX9sJchTqcSA7Su'] | 1922 | 0.402 | 0.158 | 3 | -16.9 | 0 | 0.039 | 0.989 | 0.13 | 0.311 | 0.196 | 103.22 | 4 |


After loading the data, we find that the tracks dataset has 586,672 rows (one per track) and 20 columns (the tracks' attributes). Because the dataset is so large, we will not use all of it, so some of the data will be left out of our analysis.

## <U> Tidying the Data</U>

Datasets often have missing data, which can impact our analysis. Before we begin analyzing the data, we want to check whether the tracks dataset has any missing values. If it does, we need to consider the circumstances under which they are missing in order to decide how to proceed.

In [7]:
tracks.isnull().sum()  # Count the missing (null) values in each column

id                   0
name                71
popularity           0
duration_ms          0
explicit             0
artists              0
id_artists           0
release_date         0
danceability         0
energy               0
key                  0
loudness             0
mode                 0
speechiness          0
acousticness         0
instrumentalness     0
liveness             0
valence              0
tempo                0
time_signature       0
dtype: int64

From the output above, we notice that 71 names are missing, but no other column in the DataFrame contains missing values. We assume that the data is missing completely at random (MCAR) for the following reasons:

   1. Only the name column has missing data.
   2. The id column is the unique identifier for each track, so even if the name is missing, one can still look up the track's name through Spotify's API using its track ID.
   3. We have no reason to believe there is a relationship between whether a name is missing and any other values in the dataset, missing or observed.

Since we've determined that the data is missing completely at random and that the missing names are not consequential to our analysis, we keep these rows in the dataset as they are.
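
A quick way to gauge how much of the data is affected is to compute the share of missing values per column and then look at the affected rows. The sketch below uses a small, hypothetical stand-in DataFrame called `toy`. On the real data, the same calls would be `tracks.isnull().mean()` and `tracks[tracks['name'].isnull()]`. With 71 missing names out of 586,672 rows, the share for `name` works out to roughly 0.012%.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `tracks`, for illustration only.
toy = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4"],
    "name": ["Carve", np.nan, "Nola", "Midnight rose"],
    "popularity": [6, 0, 0, 0],
})

# Share of missing values per column (the mean of a boolean mask is a proportion).
missing_share = toy.isnull().mean()

# Rows with a missing name; the `id` column still identifies each track.
missing_rows = toy[toy["name"].isnull()]

print(missing_share)
print(missing_rows)
```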

In [8]:
tracks['release_date'].head(10)

0    1922-02-22
1    1922-06-01
2    1922-03-21
3    1922-03-21
4          1922
5          1922
6          1922
7          1922
8          1922
9    1922-03-29
Name: release_date, dtype: object

Looking at the initial dataset, some columns need to be cleaned. For example, the ```release_date``` column isn't consistent: some rows include the day and month of the track's release, while others include only the year. To streamline the data, we will consider only the year of release.
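
Before slicing off the first four characters, it may be worth confirming that every value really starts with a four-digit year. The sketch below runs that check on a few values in the two formats seen above. The pattern also accepts a YYYY-MM form, which is an assumption about what else could appear. On the real data, `release_date` would be `tracks['release_date']`.

```python
import pandas as pd

# A few release dates in the formats seen above (day-level and year-only).
release_date = pd.Series(["1922-02-22", "1922", "1922-03-21", "2021"], name="release_date")

# Confirm every value looks like YYYY, YYYY-MM or YYYY-MM-DD before slicing,
# so an unexpected format surfaces here instead of silently producing a bad year.
well_formed = release_date.str.fullmatch(r"\d{4}(-\d{2}){0,2}")
assert well_formed.all(), release_date[~well_formed].tolist()

# Keep the first four characters (the year) and convert them to integers.
year = release_date.str[:4].astype(int)
print(year.tolist())  # [1922, 1922, 1922, 2021]
```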

In [9]:
tracks['release_date'] = tracks['release_date'].str[:4].astype(int)
tracks.sort_values(by='release_date', inplace = True)
tracks.rename(columns = {'release_date' : 'year'}, inplace = True)
tracks.reset_index(drop=True, inplace=True)
tracks.head(5)

|  | id | name | popularity | duration_ms | explicit | artists | id_artists | year | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 74CSJTE5QQp1e4bHzm3wti | Maldita sea la primera vez | 19 | 233920 | 0 | ['Los Pincheira del Sur'] | ['1BnQrx8p0bHBpidjIGq26z'] | 1900 | 0.659 | 0.791 | 2 | -4.895 | 1 | 0.0295 | 0.139 | 2e-06 | 0.161 | 0.956 | 141.999 | 4 |
| 1 | 3AwlEhAkDDwKuTaNlgmMNQ | Nola | 0 | 233426 | 0 | ['Vincent Lopez and his Orchestra', 'Vincent L... | ['1NElogFmaZxxGVsKS6hvl2', '3wxzXhMAoYbpJDXtBx... | 1922 | 0.567 | 0.663 | 2 | -5.334 | 1 | 0.0318 | 0.992 | 0.878 | 0.268 | 0.853 | 103.394 | 4 |
| 2 | 32Y9PU9JqxYFqzFaIdCQOs | Midnight rose | 0 | 195862 | 0 | ['Abe Lyman’s Orchestra', 'Abe Lyman'] | ['6LxnbCQ3ZrKj1lvC1lylS5', '3cNzWID6yZ1HN8qj4g... | 1922 | 0.483 | 0.06 | 1 | -9.499 | 1 | 0.042 | 0.982 | 8.9e-05 | 0.0498 | 0.381 | 136.044 | 4 |
| 3 | 2zRV6Vk6ZQYDokmiv5QEoP | California blues | 0 | 195470 | 0 | ['Abe Lyman’s Orchestra', 'Abe Lyman'] | ['6LxnbCQ3ZrKj1lvC1lylS5', '3cNzWID6yZ1HN8qj4g... | 1922 | 0.578 | 0.462 | 8 | -7.217 | 1 | 0.0398 | 0.995 | 0.903 | 0.0767 | 0.513 | 89.876 | 4 |
| 4 | 2uqaxtC6Usy7QeKfoD1jhB | Good evenin' | 0 | 189649 | 0 | ['Isham Jones & His Orchestra', 'Isham Jones'] | ['65A1WinXDUhVkZD98s8kKU', '4OWTlYl5kkhaZEsyjU... | 1922 | 0.565 | 0.334 | 10 | -6.802 | 1 | 0.0309 | 0.978 | 0.0329 | 0.256 | 0.55 | 97.167 | 4 |


In [10]:
tracks.tail(5)

|  | id | name | popularity | duration_ms | explicit | artists | id_artists | year | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 586667 | 4XZ0ow6wtWOxDy0WMMzcBG | Somos Iguales | 1 | 236587 | 0 | ['Jhay Cortez', 'Zion & Lennox'] | ['0EFisYRi20PTADoJrifHrz', '21451j1KhjAiaYKflx... | 2021 | 0.777 | 0.714 | 11 | -4.296 | 1 | 0.0532 | 0.16 | 0.0 | 0.115 | 0.59 | 90.987 | 4 |
| 586668 | 4RU0r5hnonG58XU4NqCBto | No Me Conoce - Remix | 0 | 309120 | 0 | ['Jhay Cortez', 'J Balvin', 'Bad Bunny'] | ['0EFisYRi20PTADoJrifHrz', '1vyhD5VmyZ7KMfW5gq... | 2021 | 0.804 | 0.786 | 10 | -3.837 | 0 | 0.0735 | 0.144 | 0.0 | 0.0928 | 0.575 | 91.992 | 4 |
| 586669 | 4HdpcPFATP6heiXec1GWH0 | CÓMO SE SIENTE - Remix | 0 | 227520 | 1 | ['Jhay Cortez', 'Bad Bunny'] | ['0EFisYRi20PTADoJrifHrz', '4q3ewBCX7sLwd24euu... | 2021 | 0.807 | 0.606 | 3 | -8.871 | 0 | 0.0872 | 0.0946 | 0.0 | 0.119 | 0.304 | 92.988 | 4 |
| 586670 | 6cI7wJTCwrvcfIuapt9JCC | Imaginaste - Remix | 1 | 246653 | 0 | ['Jhay Cortez', 'Wisin & Yandel'] | ['0EFisYRi20PTADoJrifHrz', '1wZtkThiXbVNtj6hee... | 2021 | 0.855 | 0.71 | 1 | -5.321 | 1 | 0.0939 | 0.0426 | 0.0 | 0.337 | 0.591 | 89.977 | 4 |
| 586671 | 3kNWyHdLVW1x6pn9EnSQ1H | Didn't Know | 71 | 168897 | 0 | ['Tom Zanetti'] | ['73Msd8rknjBghcGQiZ1mgh'] | 2021 | 0.896 | 0.459 | 1 | -8.937 | 1 | 0.0515 | 0.0737 | 8.4e-05 | 0.0981 | 0.484 | 125.939 | 4 |


Only one track in the dataset was released in 1900, and the next earliest release year is 1922. We will therefore drop that row as an outlier that could affect our analysis.
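
Dropping `tracks.index[0]` works only because the DataFrame was sorted by year first. A position-independent alternative is to count releases per year and then filter on the value itself. The sketch below shows this on a hypothetical miniature DataFrame `toy`. On the real data, the calls would be `tracks['year'].value_counts().sort_index().head()` and `tracks[tracks['year'] != 1900]`.

```python
import pandas as pd

# Hypothetical miniature of `tracks` (title and year only).
toy = pd.DataFrame({
    "name": ["Maldita sea la primera vez", "Nola", "Midnight rose", "Didn't Know"],
    "year": [1900, 1922, 1922, 2021],
})

# Count tracks per year to confirm the 1900 release is a one-off.
per_year = toy["year"].value_counts().sort_index()
print(per_year.head())

# Filter by value rather than by position, so the result does not
# depend on the DataFrame having been sorted first.
cleaned = toy[toy["year"] != 1900]
print(len(cleaned))  # 3
```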

In [11]:
tracks = tracks.drop([tracks.index[0]])
tracks.head()

|  | id | name | popularity | duration_ms | explicit | artists | id_artists | year | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3AwlEhAkDDwKuTaNlgmMNQ | Nola | 0 | 233426 | 0 | ['Vincent Lopez and his Orchestra', 'Vincent L... | ['1NElogFmaZxxGVsKS6hvl2', '3wxzXhMAoYbpJDXtBx... | 1922 | 0.567 | 0.663 | 2 | -5.334 | 1 | 0.0318 | 0.992 | 0.878 | 0.268 | 0.853 | 103.394 | 4 |
| 2 | 32Y9PU9JqxYFqzFaIdCQOs | Midnight rose | 0 | 195862 | 0 | ['Abe Lyman’s Orchestra', 'Abe Lyman'] | ['6LxnbCQ3ZrKj1lvC1lylS5', '3cNzWID6yZ1HN8qj4g... | 1922 | 0.483 | 0.06 | 1 | -9.499 | 1 | 0.042 | 0.982 | 8.9e-05 | 0.0498 | 0.381 | 136.044 | 4 |
| 3 | 2zRV6Vk6ZQYDokmiv5QEoP | California blues | 0 | 195470 | 0 | ['Abe Lyman’s Orchestra', 'Abe Lyman'] | ['6LxnbCQ3ZrKj1lvC1lylS5', '3cNzWID6yZ1HN8qj4g... | 1922 | 0.578 | 0.462 | 8 | -7.217 | 1 | 0.0398 | 0.995 | 0.903 | 0.0767 | 0.513 | 89.876 | 4 |
| 4 | 2uqaxtC6Usy7QeKfoD1jhB | Good evenin' | 0 | 189649 | 0 | ['Isham Jones & His Orchestra', 'Isham Jones'] | ['65A1WinXDUhVkZD98s8kKU', '4OWTlYl5kkhaZEsyjU... | 1922 | 0.565 | 0.334 | 10 | -6.802 | 1 | 0.0309 | 0.978 | 0.0329 | 0.256 | 0.55 | 97.167 | 4 |
| 5 | 2tKVyDsEVSrAibTVhFGDGH | I can't believe It's true | 0 | 197407 | 0 | ['Isham Jones & His Orchestra', 'Isham Jones'] | ['65A1WinXDUhVkZD98s8kKU', '4OWTlYl5kkhaZEsyjU... | 1922 | 0.78 | 0.6 | 5 | -4.948 | 0 | 0.194 | 0.988 | 0.0121 | 0.115 | 0.843 | 123.595 | 4 |


To make the duration of a track more readable, we will convert the column ```duration_ms``` from milliseconds to minutes.
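
The conversion is a single division, because one minute is 60 seconds, or 60,000 milliseconds. The helpers below (`ms_to_minutes` and `ms_to_mm_ss` are hypothetical names, not part of the original notebook) check the arithmetic against the "Nola" row, where `duration_ms` is 233426. They also show an m:ss format that may be easier to read.

```python
def ms_to_minutes(ms):
    """Convert a duration in milliseconds to fractional minutes."""
    return ms / 60_000  # 1 minute = 60 s = 60,000 ms


def ms_to_mm_ss(ms):
    """Format a duration in milliseconds as m:ss, rounded to the nearest second."""
    total_seconds = round(ms / 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


# "Nola" has duration_ms = 233426 in the table above.
print(round(ms_to_minutes(233426), 6))  # 3.890433
print(ms_to_mm_ss(233426))              # 3:53
```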

In [12]:
tracks['duration_ms'] = tracks['duration_ms'] / 60000  # 60,000 ms per minute (vectorized division)
tracks.rename(columns = {'duration_ms' : 'duration_min'}, inplace = True)
tracks.head()

|  | id | name | popularity | duration_min | explicit | artists | id_artists | year | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3AwlEhAkDDwKuTaNlgmMNQ | Nola | 0 | 3.890433 | 0 | ['Vincent Lopez and his Orchestra', 'Vincent L... | ['1NElogFmaZxxGVsKS6hvl2', '3wxzXhMAoYbpJDXtBx... | 1922 | 0.567 | 0.663 | 2 | -5.334 | 1 | 0.0318 | 0.992 | 0.878 | 0.268 | 0.853 | 103.394 | 4 |
| 2 | 32Y9PU9JqxYFqzFaIdCQOs | Midnight rose | 0 | 3.264367 | 0 | ['Abe Lyman’s Orchestra', 'Abe Lyman'] | ['6LxnbCQ3ZrKj1lvC1lylS5', '3cNzWID6yZ1HN8qj4g... | 1922 | 0.483 | 0.06 | 1 | -9.499 | 1 | 0.042 | 0.982 | 8.9e-05 | 0.0498 | 0.381 | 136.044 | 4 |
| 3 | 2zRV6Vk6ZQYDokmiv5QEoP | California blues | 0 | 3.257833 | 0 | ['Abe Lyman’s Orchestra', 'Abe Lyman'] | ['6LxnbCQ3ZrKj1lvC1lylS5', '3cNzWID6yZ1HN8qj4g... | 1922 | 0.578 | 0.462 | 8 | -7.217 | 1 | 0.0398 | 0.995 | 0.903 | 0.0767 | 0.513 | 89.876 | 4 |
| 4 | 2uqaxtC6Usy7QeKfoD1jhB | Good evenin' | 0 | 3.160817 | 0 | ['Isham Jones & His Orchestra', 'Isham Jones'] | ['65A1WinXDUhVkZD98s8kKU', '4OWTlYl5kkhaZEsyjU... | 1922 | 0.565 | 0.334 | 10 | -6.802 | 1 | 0.0309 | 0.978 | 0.0329 | 0.256 | 0.55 | 97.167 | 4 |
| 5 | 2tKVyDsEVSrAibTVhFGDGH | I can't believe It's true | 0 | 3.290117 | 0 | ['Isham Jones & His Orchestra', 'Isham Jones'] | ['65A1WinXDUhVkZD98s8kKU', '4OWTlYl5kkhaZEsyjU... | 1922 | 0.78 | 0.6 | 5 | -4.948 | 0 | 0.194 | 0.988 | 0.0121 | 0.115 | 0.843 | 123.595 | 4 |


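Each entry in the ```artists``` column is stored as a string that looks like a list, for example "['Abe Lyman’s Orchestra', 'Abe Lyman']". To make the artist data easier to parse, we will convert each of these strings into an actual Python list of artist names.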
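
Splitting on commas and quotes works for the rows shown here, but it can mangle artist names that themselves contain a comma, and similarly names that contain an apostrophe. One alternative is Python's `ast.literal_eval`, which parses the string as a Python list literal. In the sketch below, `convert_with_regex` mirrors the splitting logic of `convertStrArray` for comparison. The comma-containing entry is hypothetical.

```python
import ast
import re


def convert_with_regex(s):
    # Same splitting logic as convertStrArray: split on commas, quotes and
    # square brackets, then drop the empty and single-space fragments.
    return [v for v in re.split(r",|'|]|\[", s) if v not in ("", " ")]


def parse_artist_list(s):
    # Parse a string such as "['A', 'B']" as a Python list literal.
    return ast.literal_eval(s)


example = "['Vincent Lopez and his Orchestra', 'Vincent Lopez']"
tricky = "['Tyler, The Creator']"  # hypothetical entry whose name contains a comma

print(parse_artist_list(example))  # ['Vincent Lopez and his Orchestra', 'Vincent Lopez']
print(convert_with_regex(tricky))  # ['Tyler', ' The Creator']
print(parse_artist_list(tricky))   # ['Tyler, The Creator']
```

On the real column this would be `tracks['artists'].apply(ast.literal_eval)`. That call raises an error on any entry that is not a valid list literal, so malformed rows surface instead of being silently split.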

In [59]:
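def convertStrArray(s):
    # Split a string such as "['A', 'B']" on commas, quotes and square brackets
    # (raw string avoids an invalid "\[" escape warning), then drop the empty
    # and single-space fragments the split leaves behind.
    arr = re.split(r",|'|]|\[", s)
    fixed = []
    for val in arr:
        if (val != '') and (val != ' '):
            fixed.append(val)
    return fixed

convertStrArray(tracks['artists'].iloc[0])  # quick check on the first row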

['Vincent Lopez and his Orchestra', 'Vincent Lopez']

In [61]:
tracks['artists'] = tracks['artists'].apply(convertStrArray)
tracks['artists'].head()

1    [Vincent Lopez and his Orchestra, Vincent Lopez]
2                  [Abe Lyman’s Orchestra, Abe Lyman]
3                  [Abe Lyman’s Orchestra, Abe Lyman]
4          [Isham Jones & His Orchestra, Isham Jones]
5          [Isham Jones & His Orchestra, Isham Jones]
Name: artists, dtype: object

-----------------------------------------------------------

# Exploratory Data Analysis

In this section, we examine the relationship between the popularity of songs and the audio features identified above. Exploratory Data Analysis (EDA) is a chance to examine the dataset and identify patterns within it. Occasionally, while examining the dataset, you may find that some features are not related at all.

### Attribute Categories

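- Danceability: How suitable a track is for dancing, based on tempo, rhythm stability, beat strength, and overall regularity (0.0 to 1.0)
- Energy: A perceptual measure of intensity and activity (0.0 to 1.0); energetic tracks feel fast, loud, and noisy
- Instrumentalness: Predicts whether a track contains no vocals; values closer to 1.0 indicate a greater likelihood of no vocal content
- Liveness: Detects the presence of an audience in the recording; higher values indicate a greater likelihood that the track was performed live
- Loudness: The overall loudness of a track in decibels (dB), averaged across the track; values typically range between -60 and 0 dB
- Speechiness: Detects the presence of spoken words; values above 0.66 suggest tracks made almost entirely of spoken words, while values below 0.33 most likely represent music
- Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic
- Tempo: The overall estimated tempo of a track in beats per minute (BPM)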

In [None]:
years = range(2010, 2020)
for i in years:
    # Scatterplot of danceability vs popularity for tracks released in year i (2010 to 2019)
    by_year = tracks[tracks['year'] == i]
    plt.figure(figsize=(15, 10))
    plt.scatter(x = by_year['danceability'], y = by_year['popularity'], color = '#85D5E9')
    plt.ylabel('Popularity', size = 16)
    plt.xlabel('Danceability', size = 16)
    plt.title('Popularity vs Danceability of Tracks From ' + str(i), size = 22)
    
    # Fit a least-squares line (degree-1 polynomial) and draw it over the scatterplot
    d = np.polyfit(by_year['danceability'], by_year['popularity'], 1)
    f = np.poly1d(d)
    x_linreg = np.linspace(by_year['danceability'].min(),
                           by_year['danceability'].max(),
                           200)
    y_linreg = f(x_linreg)
    plt.plot(x_linreg, y_linreg, color="#ff6b61", linewidth=5.0)
    plt.show()

We've looked at the relationship between popularity and danceability among tracks released over the ten years from 2010 to 2019, and the fitted lines show no clear linear relationship between the two. Based on this set of graphs, danceability alone does not appear to have a meaningful effect on the popularity of a track on Spotify. Next, we'll see whether there's a relationship between popularity and the tempo of a track.
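
To put a number on "no clear linear relationship", one option is the Pearson correlation coefficient between danceability and popularity for each year. Values near 0 indicate little linear association, though not necessarily no relationship of any kind. The sketch below uses randomly generated, independent toy data (the `toy` DataFrame is hypothetical), so the true correlation is zero by construction. On the real data, the input would be `tracks[tracks['year'].between(2010, 2019)]`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `tracks`: independent random features for two years,
# so the true correlation is zero by construction.
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "year": np.repeat([2018, 2019], n),
    "danceability": rng.uniform(0, 1, 2 * n),
    "popularity": rng.integers(0, 101, 2 * n),
})

# Pearson correlation between danceability and popularity, computed separately per year.
corr_by_year = toy.groupby("year")[["danceability", "popularity"]].apply(
    lambda g: g["danceability"].corr(g["popularity"])
)
print(corr_by_year.round(3))
```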

In [None]:
years = range(2010, 2020)
for i in years:
    # Scatterplot of tempo vs popularity for tracks released in year i (2010 to 2019)
    by_year = tracks[tracks['year'] == i]
    plt.figure(figsize=(15, 10))
    plt.scatter(x = by_year['tempo'], y = by_year['popularity'])
    plt.ylabel('Popularity', size = 16)
    plt.xlabel('Tempo (BPM)', size = 16)
    plt.title('Popularity vs Tempo of Songs Released in ' + str(i), size = 22)
    
    # Fit a least-squares line (degree-1 polynomial) and draw it over the scatterplot
    d = np.polyfit(by_year['tempo'], by_year['popularity'], 1)
    f = np.poly1d(d)
    x_linreg = np.linspace(by_year['tempo'].min(),
                           by_year['tempo'].max(),
                           200)
    y_linreg = f(x_linreg)
    plt.plot(x_linreg, y_linreg, color="#ff6b61", linewidth=5.0)
    plt.show()

After plotting the tempo and popularity of tracks released between 2010 and 2019, there also appears to be no clear relationship, which suggests that tempo is not a major factor in what makes a song popular on Spotify. Next, we'll see whether the duration of a track has an impact on its popularity. Based on the trend so far, however, there may be no relationship between the duration and the popularity of a song either.
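
A flat regression line only rules out a *linear* trend. One way to check for non-linear patterns is to bin the x-axis with `pd.cut` and average popularity within each bin. The sketch below uses hypothetical data in which popularity peaks at mid-range tempos. The fitted slope comes out close to zero even though the binned means clearly rise and then fall. The 20-BPM bin width is an arbitrary choice. On the real data, `toy` would be replaced by `by_year` from the loop above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: popularity peaks around 130 BPM, symmetric over 60-200 BPM,
# so a straight-line fit is nearly flat even though a relationship exists.
rng = np.random.default_rng(1)
tempo = rng.uniform(60, 200, 2000)
popularity = 60 - 0.01 * (tempo - 130) ** 2 + rng.normal(0, 5, tempo.size)
toy = pd.DataFrame({"tempo": tempo, "popularity": popularity})

# Group tempos into seven 20-BPM bins (edges 60, 80, ..., 200) and average popularity per bin.
bins = np.arange(60, 201, 20)
toy["tempo_bin"] = pd.cut(toy["tempo"], bins=bins, include_lowest=True)
mean_pop_by_bin = toy.groupby("tempo_bin", observed=True)["popularity"].mean()

# Straight-line fit for comparison (same approach as the plots above).
slope, intercept = np.polyfit(toy["tempo"], toy["popularity"], 1)
print(mean_pop_by_bin.round(1))
print(f"fitted slope: {slope:.3f}")
```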

In [None]:
years = range(2010, 2020)
for i in years:
    # Scatterplot of duration vs popularity for tracks released in year i (2010 to 2019)
    by_year = tracks[tracks['year'] == i]
    plt.figure(figsize = (15, 10))
    plt.scatter(x = by_year['duration_min'], y = by_year['popularity'])
    plt.ylabel('Popularity', size = 16)
    plt.xlabel('Duration (in minutes)', size = 16)
    plt.title('Popularity vs Duration of Songs Released in ' + str(i), size = 22)
    
    # Fit a least-squares line (degree-1 polynomial) and draw it over the scatterplot
    d = np.polyfit(by_year['duration_min'], by_year['popularity'], 1)
    f = np.poly1d(d)
    x_linreg = np.linspace(by_year['duration_min'].min(),
                           by_year['duration_min'].max(),
                           200)
    y_linreg = f(x_linreg)
    plt.plot(x_linreg, y_linreg, color="#ff6b61", linewidth=5.0)
    plt.show()

In [None]:
limit = tracks[tracks.year < 2020]  # Keep only tracks released before 2020
limit.head()

In [None]:
mean_danceability = limit.groupby(['year'])['danceability'].mean()
mean_danceability = mean_danceability.to_frame()
mean_danceability.reset_index(inplace = True)
mean_danceability.index = mean_danceability.index + 1
mean_danceability.head()

In [None]:
fig = plt.figure(figsize=(15, 10))
plt.plot(mean_danceability['year'], mean_danceability['danceability'])
plt.xlabel('Year', fontsize = 16)
plt.ylabel('Average Danceability', fontsize = 16)
plt.title('Average Danceability Over the Years', fontsize = 22)

In [None]:
mean_loudness = limit.groupby(['year'])['loudness'].mean()
mean_loudness = mean_loudness.to_frame()
mean_loudness.reset_index(inplace = True)
mean_loudness.index = mean_loudness.index + 1
mean_loudness.head()

In [None]:
fig = plt.figure(figsize=(15, 10))
plt.plot(mean_loudness['year'], mean_loudness['loudness'])
plt.xlabel('Year', fontsize = 16)
plt.ylabel('Average Loudness (dB)', fontsize = 16)
plt.title('Average Loudness From 1922 to 2019', fontsize = 22)

In [None]:
mean_duration = tracks.groupby(['year'])['duration_min'].mean()
mean_duration = mean_duration.to_frame()
mean_duration.reset_index(inplace = True)
mean_duration.index = mean_duration.index + 1
mean_duration.head()

In [None]:
fig = plt.figure(figsize=(15, 10))
plt.plot(mean_duration['year'], mean_duration['duration_min'])
plt.xlabel('Year', fontsize = 16)
plt.ylabel('Average Duration (in minutes)', fontsize = 16)
plt.title('Average Duration of a Track Over the Years', fontsize = 22)

-----------

# Hypothesis Testing

We now want to put our theory to the test, but first let's get a clear picture of what hypothesis testing entails. Hypothesis testing is a statistical approach for assessing whether the data provide enough evidence against a default assumption. It involves two hypotheses. The null hypothesis (H0) is the default position, such as "there is no relationship", and the alternative hypothesis (H1) is the claim we are trying to find evidence for. We therefore set up our hypotheses so that rejecting the null hypothesis supports the claim we are interested in. So what does it mean to reject the null hypothesis? This is where the significance level comes into play. In addition to the hypotheses, you must choose a significance level α (commonly 0.05) while planning your experiment. The p-value of your test statistic is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as yours. You reject the null hypothesis if this p-value is less than the significance level. When deciding whether or not to reject your null hypothesis, make sure to consider which kind of test you're running. A one-tailed test has an alternative that specifies a direction, such as "greater than". A two-tailed test has an alternative that allows a difference in either direction.

For our study, we want to collect data on every song Beyoncé has ever released. The API can be used to produce such a dataset in a few different ways. We could get a list of the artist's albums and then loop over each album's tracks.
Alternatively, we could cycle through a playlist on Spotify that contains every track Beyoncé has released, which would possibly be more efficient.

Looking at Beyoncé's tracklist on Spotify, our null hypothesis is that there is no linear relationship between the song features. The alternative hypothesis is that there is a linear relationship between the song features.
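
The `ols` function imported from statsmodels in the setup cell would report a t-test p-value for a regression slope. As a distribution-free alternative that needs only NumPy, the sketch below swaps in a permutation test. It shuffles one feature to break any link with the other, then asks how often a shuffled slope is at least as steep as the observed one. The function name `slope_permutation_test` and the example data are hypothetical. The symmetric, curved example (`y = x**2`) has a slope of about zero, so this test finds no evidence of a *linear* relationship even though the two variables are clearly related.

```python
import numpy as np


def slope_permutation_test(x, y, n_permutations=1000, seed=0):
    """Two-tailed permutation test of H0: no linear association between x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Observed least-squares slope (np.polyfit returns [slope, intercept] for degree 1).
    observed = np.polyfit(x, y, 1)[0]
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any link between x and y, simulating the null hypothesis.
        shuffled_slope = np.polyfit(x, rng.permutation(y), 1)[0]
        if abs(shuffled_slope) >= abs(observed):
            extreme += 1
    # Adding 1 to both counts includes the observed arrangement, so p is never exactly 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical example data.
rng = np.random.default_rng(42)
x = rng.uniform(0, 1, 200)
y_linear = 3 * x + rng.normal(0, 0.5, 200)  # linear relationship plus noise
x_sym = np.linspace(-1, 1, 41)
y_curved = x_sym ** 2                       # related to x_sym, but not linearly

slope_lin, p_lin = slope_permutation_test(x, y_linear)
slope_curved, p_curved = slope_permutation_test(x_sym, y_curved)
print(f"linear: slope={slope_lin:.2f}, p={p_lin:.4f}")
print(f"curved: slope={slope_curved:.2f}, p={p_curved:.4f}")
```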

Specifically, we look at Beyoncé's past releases and use linear regression to form a hypothesis about whether the danceability of her tracks is greater than or less than 0.5.
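
For the narrower question of whether typical danceability exceeds 0.5, a one-sample test is a simpler tool than regression. The sketch below swaps in an exact sign test, a binomial test on how many values fall above 0.5. It applies the test to the danceability values of the five rows in the `bey_tracks.head()` output below, purely to illustrate the one-tailed vs two-tailed distinction. At α = 0.05, the one-tailed test rejects the null hypothesis while the two-tailed test does not. Five rows, two of which are duplicate entries of the same song, are far too few to support a conclusion about Beyoncé's catalogue. In practice the input would be `bey_tracks['danceability']`, ideally after removing duplicates. The `sign_test` function name is hypothetical.

```python
from math import comb


def sign_test(values, threshold):
    """Exact sign test of H0: median == threshold (values equal to threshold are ignored).

    Returns (one-tailed p-value for H1: median > threshold,
             two-tailed p-value for H1: median != threshold).
    """
    above = sum(v > threshold for v in values)
    below = sum(v < threshold for v in values)
    n = above + below
    # Under H0 each non-tied value lies above the threshold with probability 1/2,
    # so the count above follows Binomial(n, 0.5).
    p_upper = sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n
    p_lower = sum(comb(n, k) for k in range(0, above + 1)) / 2 ** n
    p_two = min(1.0, 2 * min(p_upper, p_lower))
    return p_upper, p_two


# Danceability of the five rows shown in the bey_tracks.head() output below.
danceability = [0.759, 0.658, 0.658, 0.747, 0.664]
alpha = 0.05
p_one, p_two = sign_test(danceability, 0.5)
print(p_one, p_two)                  # 0.03125 0.0625
print(p_one < alpha, p_two < alpha)  # True False
```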


In [None]:
fastfwd = tracks[tracks.year > 2001]  # Keep only tracks released from 2002 onward

In [86]:
# Keep tracks whose list of artists includes Beyoncé (solo releases and features)
bey_tracks = fastfwd[fastfwd['artists'].apply(lambda x: 'Beyoncé' in x)]
bey_tracks.head()

|  | id | name | popularity | duration_min | explicit | artists | id_artists | year | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 397035 | 5ljCWsDlSyJ41kwqym2ORw | 03' Bonnie & Clyde | 66 | 3.426 | 1 | [JAY-Z, Beyoncé] | ['3nFkdlSjzX9mRTtwJOzDYB', '6vWDO969PvNqNYHIOW... | 2002 | 0.759 | 0.678 | 9 | -5.148 | 0 | 0.314 | 0.23 | 0.0 | 0.15 | 0.327 | 89.64 | 4 |
| 397958 | 29LHe8kG3PraghUZOZYsw4 | Baby Boy (feat. Beyoncé ) | 60 | 4.111783 | 0 | [Sean Paul, Beyoncé] | ['3Isy6kedDrgPYoTS1dazA9', '6vWDO969PvNqNYHIOW... | 2002 | 0.658 | 0.621 | 1 | -5.725 | 1 | 0.231 | 0.0667 | 2e-06 | 0.142 | 0.817 | 91.267 | 4 |
| 397990 | 1uVfUdVv0h9MWia3tdZo5G | Baby Boy (feat. Beyoncé ) | 60 | 4.111783 | 0 | [Sean Paul, Beyoncé] | ['3Isy6kedDrgPYoTS1dazA9', '6vWDO969PvNqNYHIOW... | 2002 | 0.658 | 0.621 | 1 | -5.725 | 1 | 0.231 | 0.0667 | 2e-06 | 0.142 | 0.817 | 91.267 | 4 |
| 398085 | 3pxJuMLjNPtiC0fX8EHFlF | Me, Myself and I | 59 | 5.01955 | 0 | [Beyoncé] | ['6vWDO969PvNqNYHIOW5v0m'] | 2003 | 0.747 | 0.47 | 1 | -9.08 | 1 | 0.0819 | 0.228 | 9.9e-05 | 0.159 | 0.555 | 83.615 | 4 |
| 398266 | 0WqIKmW4BTrj3eJFmnCKMv | Crazy In Love (feat. Jay-Z) | 25 | 3.932217 | 0 | [Beyoncé, JAY-Z] | ['6vWDO969PvNqNYHIOW5v0m', '3nFkdlSjzX9mRTtwJO... | 2003 | 0.664 | 0.758 | 2 | -6.583 | 0 | 0.21 | 0.00238 | 0.0 | 0.0598 | 0.701 | 99.259 | 4 |


# Conclusion

# Resources