# API Report
### Connor Farrar - Oct. 14, 2023

In [624]:
import requests
import pandas as pd
import base64
import json
import urllib.parse

# Hypothesis
### I believe that songs from playlists that contain the word "Happy" in the title will have a valence score of 0.5 or higher on average. 

# Endpoints to Use
### Search
I will use the Search endpoint to look for playlists that match the keyword 'happy'. This narrows the results to playlists that mostly have 'happy' in the title, but it includes Spotify-generated playlists along with user-generated playlists.

### Playlists (Get User's Playlists)
I will use this endpoint to find information on one of the user playlists that I gathered with the Search endpoint. I will specifically be looking for the items response in order to view the tracks on that playlist.

### Tracks (Audio Features)
Finally, I will use the Get Tracks' Audio Features endpoint to get the audio_features response that contains the valences of the tracks. This will then allow me to test my hypothesis on those tracks.
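
The three endpoints described above can be sketched as small URL builders. This is only an illustration: the function names and the `API_BASE` constant are made up for this example, while the URL patterns are the ones used in the cells of this report.

```python
from urllib.parse import quote, urlencode

# Base address shared by the three endpoints used in this report.
API_BASE = "https://api.spotify.com/v1"


def search_url(keyword, item_type="playlist", limit=50):
    """Build a Search URL; urlencode escapes the keyword safely."""
    query = urlencode({"q": keyword, "type": item_type, "limit": limit})
    return f"{API_BASE}/search?{query}"


def user_playlists_url(user_id):
    """Build the Get User's Playlists URL for one user id."""
    return f"{API_BASE}/users/{quote(user_id, safe='')}/playlists"


def audio_features_url(track_ids):
    """Build the audio-features URL for a list of track ids."""
    return f"{API_BASE}/audio-features?ids={','.join(track_ids)}"
```

Keeping the URL construction in one place makes it easier to see which values change between requests (keyword, user id, track ids) and which stay fixed.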

# Data Reliability

### Who collected this data?

This data is collected by Spotify.

### Why was this data collected?

This data was collected because I accessed the API in order to search for it.

### In what ways may this data be reliable?

This data is reliable in that it comes directly from Spotify, yet this is also what makes it potentially unreliable: it is only as trustworthy as Spotify's own records. So long as Spotify is not altering their data, it is reliable.

### In what ways may this data be unreliable?

The data may also be unreliable as Spotify could provide me with whatever information they wanted to, and as they are a company this information could be swayed by financial incentives. 

The data could also be unreliable because I was unable to locate the playlist by searching in the Spotify app. The data provided to me through the API may therefore not exactly match what the app shows, but I have no way to confirm this.

### Limitations

There were no major limitations that I faced while finding my data. Some steps could have been condensed if the API were simpler or if certain endpoints returned more fields, but I was ultimately able to find all the information I needed.

### Create the session headers and access the API using the Spotify keys

In [625]:
my_header = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"}

In [626]:
Client_ID = pd.read_csv('spotify_keys')['Client_ID'].iloc[0]

In [627]:
Client_Secret = pd.read_csv('spotify_keys')['Client_Secret'].iloc[0]

In [628]:
client_cred = base64.b64encode(str(Client_ID + ":" + Client_Secret).encode("ascii"))

In [629]:
# client_cred

In [630]:
headers = {"Authorization": "Basic {}".format(client_cred.decode("ascii"))}

In [631]:
payload = {'grant_type' : 'client_credentials'}
url = 'https://accounts.spotify.com/api/token'

In [632]:
session_key_response = requests.post(url = url, data = payload, headers = headers)

In [633]:
session_key_response.status_code

200

In [634]:
# session_key_response.json()['access_token']

In [635]:
session_header_key = session_key_response.json()

In [636]:
key = session_header_key['access_token']

In [637]:
session_headers = {"Authorization": "Bearer {}".format(key)}

In [638]:
# session_headers
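
The header-building steps above can be collected into two small helpers. This is a minimal sketch of the same logic; the helper names `basic_auth_header` and `bearer_header` are hypothetical and not part of any library.

```python
import base64


def basic_auth_header(client_id, client_secret):
    """Return the Basic header sent with the token request."""
    raw = f"{client_id}:{client_secret}".encode("ascii")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def bearer_header(access_token):
    """Return the Bearer header sent with every later API request."""
    return {"Authorization": f"Bearer {access_token}"}
```

With these, the cells above would reduce to `headers = basic_auth_header(Client_ID, Client_Secret)` and `session_headers = bearer_header(key)`.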

### Access the Spotify API and use the search endpoint to find 50 playlists that have 'Happy' in the title

In [639]:
search_url = 'https://api.spotify.com/v1/search?q={}&type={}&limit={}'\
.format(urllib.parse.quote('Happy'), 'playlist', 50)

In [640]:
search_response = requests.get(url = search_url, headers = session_headers)
search_response.status_code

200

In [641]:
search_data = search_response.json()

### Create a dataframe of those playlists and use keys to single out the names of the playlists

In [642]:
search_df = pd.DataFrame(search_data['playlists']['items'])
pname_df = pd.DataFrame(search_df['name'])
pname_df

Unnamed: 0,name
0,Happy Beats
1,Happy Mix
2,Happy Hits!
3,HAPPY CAT AND BANANA CAT
4,Happy Days
5,Banana cat songs
6,Happy Pop Hits
7,HAPPY BIRTHDAY GRIMACE
8,Happy Drive
9,happy birthday Grimace!


### Create a data frame that displays the usernames of the playlist creators that are not 'Spotify'

In [643]:
new_search = [x['owner']['display_name'] for x in search_data['playlists']['items']]
search_df['name'] = new_search

In [644]:
# search_df['name'] = new_search
ownername_df = pd.DataFrame(search_df['name'])
ownername_df[ownername_df['name'] != 'Spotify']

Unnamed: 0,name
3,ASAFXD
5,Denise Rosaroso-Tapia
7,💌
9,lechonk 42
11,g!
14,Electro Posé
15,skye
17,Grandma Fox
20,Haqeem Afif
22,John Timothy Heist


### Merge the data frame that displays the playlist names and the user names so that they match up

In [645]:
merged_df = pd.merge(pname_df, ownername_df, suffixes=['_playlist', '_owner'], left_index=True, right_index=True)
merged_df[merged_df['name_owner'] != 'Spotify']

Unnamed: 0,name_playlist,name_owner
3,HAPPY CAT AND BANANA CAT,ASAFXD
5,Banana cat songs,Denise Rosaroso-Tapia
7,HAPPY BIRTHDAY GRIMACE,💌
9,happy birthday Grimace!,lechonk 42
11,Happy Birthday Songs for Kids,g!
14,Happy Vibes 2023 ☀️,Electro Posé
15,Happy songs everyone knows 😄 🙌,skye
17,Happy songs(CLEAN),Grandma Fox
20,Banana cat and HAPI,Haqeem Afif
22,Happy/Upbeat Christian Music,John Timothy Heist
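
The same name and owner table could also be built in a single pass over the search response, which avoids overwriting the `name` column and merging afterwards. This is a sketch under the assumption that each item has the `name` and `owner` fields seen above; the function names and the `example_user_id` value in the usage lines are placeholders.

```python
import pandas as pd


def playlists_table(search_data):
    """Return one row per playlist: its name, owner name, and owner id."""
    rows = [
        {
            "name_playlist": item["name"],
            "name_owner": item["owner"]["display_name"],
            "owner_id": item["owner"]["id"],
        }
        for item in search_data["playlists"]["items"]
    ]
    return pd.DataFrame(rows)


def user_playlists(table):
    """Keep only the playlists that were not created by Spotify."""
    return table[table["name_owner"] != "Spotify"]


# Abbreviated stand-in for the real search response (placeholder owner id).
sample = {
    "playlists": {
        "items": [
            {"name": "Happy Beats",
             "owner": {"display_name": "Spotify", "id": "spotify"}},
            {"name": "HAPPY CAT AND BANANA CAT",
             "owner": {"display_name": "ASAFXD", "id": "example_user_id"}},
        ]
    }
}
user_only = user_playlists(playlists_table(sample))
```

Carrying the owner id in the same table means the user id needed for the next endpoint can be read from the selected row.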


### Select one user playlist and identify the user's id

I selected the playlist at index 42 in the list, titled simply 'happy', by the user Belina Bellwood.

In [646]:
search_df['owner'][42]['id']

'spotify'

### Use that user id to access the User Playlists endpoint

In [647]:
id_url = 'https://api.spotify.com/v1/users/{}/playlists'\
.format(urllib.parse.quote('31ogj7kcdegclqzp3ejqaznocw3i'))

In [648]:
id_response = requests.get(url = id_url, headers = session_headers)
id_response.status_code

200

### Find the 'happy' playlist among all the user's playlists

In [649]:
id_data = id_response.json()
id_data['items']

[{'collaborative': False,
  'description': 'lemme be your dj',
  'external_urls': {'spotify': 'https://open.spotify.com/playlist/6NLcjPubMX81BoMUQkCgiE'},
  'href': 'https://api.spotify.com/v1/playlists/6NLcjPubMX81BoMUQkCgiE',
  'id': '6NLcjPubMX81BoMUQkCgiE',
  'images': [{'height': None,
    'url': 'https://image-cdn-ak.spotifycdn.com/image/ab67706c0000bebbc7a14bdd8593043a331c208e',
    'width': None}],
  'name': ' a 2013 party',
  'owner': {'display_name': 'Belina Bellwood',
   'external_urls': {'spotify': 'https://open.spotify.com/user/31ogj7kcdegclqzp3ejqaznocw3i'},
   'href': 'https://api.spotify.com/v1/users/31ogj7kcdegclqzp3ejqaznocw3i',
   'id': '31ogj7kcdegclqzp3ejqaznocw3i',
   'type': 'user',
   'uri': 'spotify:user:31ogj7kcdegclqzp3ejqaznocw3i'},
  'primary_color': None,
  'public': True,
  'snapshot_id': 'NzEsODc0NTJmOTg4MDNhYjUyYjIwNGQ2M2ExNTUxMTZiOTczMDkwN2I5Yg==',
  'tracks': {'href': 'https://api.spotify.com/v1/playlists/6NLcjPubMX81BoMUQkCgiE/tracks',
   'total': 60

In [650]:
id_data.keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

### Find the href for the tracks in that playlist and use that href to access the tracks data

In [652]:
id_data['items'][6]['tracks']

{'href': 'https://api.spotify.com/v1/playlists/5maTxNYkbtZkKbnmXarjA8/tracks',
 'total': 88}
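
The cell above picks the playlist by its position in the list (index 6). A lookup by name is less fragile if the user reorders or adds playlists. The sketch below assumes each item has a `name` field, as in the output shown earlier; `find_playlist` is a hypothetical helper, and the sample list is abbreviated to two entries.

```python
def find_playlist(items, name):
    """Return the first playlist whose name matches, ignoring case and
    surrounding spaces, or None if there is no match."""
    wanted = name.strip().lower()
    for playlist in items:
        if playlist["name"].strip().lower() == wanted:
            return playlist
    return None


# Abbreviated stand-in for id_data['items'].
sample_items = [
    {"name": " a 2013 party",
     "tracks": {"href": "https://api.spotify.com/v1/playlists/6NLcjPubMX81BoMUQkCgiE/tracks",
                "total": 60}},
    {"name": "happy",
     "tracks": {"href": "https://api.spotify.com/v1/playlists/5maTxNYkbtZkKbnmXarjA8/tracks",
                "total": 88}},
]
happy = find_playlist(sample_items, "happy")
tracks_href = happy["tracks"]["href"]
```

In the notebook this would be called as `find_playlist(id_data['items'], 'happy')`.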

In [653]:
tracks_url = 'https://api.spotify.com/v1/playlists/5maTxNYkbtZkKbnmXarjA8/tracks'

In [654]:
tracks_response = requests.get(url = tracks_url, headers = session_headers)
tracks_response.status_code

200

In [655]:
tracks_data = tracks_response.json()
tracks_data['items'][0]['track']['id']

'2nGFzvICaeEWjIrBrL2RAx'

### List all the track IDs rather than just one

In [656]:
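# tracks_df is not defined in the cells shown; building it from the playlist items is an assumption
tracks_df = pd.DataFrame(tracks_data['items'])
new_track_id = [x['track']['id'] for x in tracks_data['items']]
tracks_df['id_list'] = new_track_id
tracks_df['id_list']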

0     2nGFzvICaeEWjIrBrL2RAx
1     5MEYDJVJMaGAXfddTo0D6J
2     2DnJjbjNTV9Nd5NOa1KGba
3     4lLtanYk6tkMvooU0tWzG8
4     1ARkKt39O6WQqE0QEpZntu
               ...          
83    3Ejes8GtOct6T6UF24rzRY
84    1f92lJLwj8eMyZWEA11Gqe
85    6A9mKXlFRPMPem6ygQSt7z
86    0iwf1pdKI2os27cTX88dIt
87    1PckUlxKqWQs3RlWXVBLw3
Name: id_list, Length: 88, dtype: object

In [657]:
','.join(list(tracks_df['id_list']))

'2nGFzvICaeEWjIrBrL2RAx,5MEYDJVJMaGAXfddTo0D6J,2DnJjbjNTV9Nd5NOa1KGba,4lLtanYk6tkMvooU0tWzG8,1ARkKt39O6WQqE0QEpZntu,1EzrEOXmMH3G43AXT1y7pA,2S5FeDvQmmI9iLq8SdCsB2,1vTlUeZQx3G063arrFeyrT,4poYvYKOKtpAOoXSIYeU0A,3Ve0ag71EeiQalsl7Ha4Dw,1mCsF9Tw4AkIZOjvZbZZdT,6FE2iI43OZnszFLuLtvvmg,599hlIX8JXS7RNYvQ3EPCQ,5HQVUIKwCEXpe7JIHyY734,6oHDvarQSp0mf5AD1SyNH0,1FCQEg7wOK9IIBuxx63krr,3uCAZ7WcpYCeEqBXqfi8Uu,2bJvI42r8EF3wxjOuDav4r,0puf9yIluy9W0vpMEUoAnN,1wZcp22XAyFnSw82R65UUB,5jSlcXdUGLWOV2pSfYYiBs,3LI4MmibTkXH5cGpCGZgyw,5EbKmAd0qWa7nl7E2ys6Cp,4u7EnebtmKWzUH433cf5Qv,1XZa6MDzWqCTeNATvtxzZY,2L6hCTpUlR2p7Su3hiSB0s,7s3VxmIizcDgDTRCA4kDzd,5UFbfXuj4TjirzmcvTFBBy,4JHvpv9UsX8AOJk28mUbPC,1iihbPAWlTbGW3UTbz22Th,4IYNzoZ63Am2XYkxMFXsN7,0G7vexduCvboPyIGjJXQIC,56VwY79h8kyYfSEHyCecFd,1PS1QMdUqOal0ai3Gt7sDQ,3ZFTkvIE7kyPt6Nu3PEa7V,2Cd9iWfcOpGDHLz6tVA3G4,14iN3o8ptQ8cFVZTEmyQRV,7LqjznQwfrax7MjQXmxqdQ,4fXGWiVhlOLdhwRDP6pIFG,1orDTlpI02AkHtNN3RAtLV,79z9QkhHePTFsSeVw9uyj0,1BEMASUZQDqN5UGxMb1m6A,5VrcvDIi2IvTwZFtWbnty1,5TFVfE1zNb

### Create a new data set to look at the audio features of each of the tracks

In [658]:
valence_url = 'https://api.spotify.com/v1/audio-features?ids={}'\
.format(','.join(list(tracks_df['id_list'])))

In [659]:
valence_response = requests.get(url = valence_url, headers = session_headers)
valence_response.status_code

200
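
The request above works with a single URL because the playlist has only 88 tracks. To my knowledge the audio-features endpoint accepts a limited number of IDs per request (I believe 100, but this should be checked against the Spotify documentation), so a longer playlist would need its IDs split into batches. The helper names below are hypothetical, and the default `size=100` reflects that assumption.

```python
def chunked(ids, size=100):
    """Yield successive lists of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def audio_features_urls(track_ids, size=100):
    """Return one audio-features URL per batch of track ids."""
    base = "https://api.spotify.com/v1/audio-features?ids={}"
    return [base.format(",".join(chunk)) for chunk in chunked(track_ids, size)]
```

Each URL would then be requested in turn and the `audio_features` lists from the responses concatenated before building the data frame.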

In [698]:
valence_data = valence_response.json()

### Create a data frame displaying all the audio features

In [661]:
valence_df = pd.DataFrame(valence_data['audio_features'])
valence_df

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.739,0.511,9,-7.844,1,0.0362,0.1670,0.000001,0.1330,0.542,96.038,audio_features,2nGFzvICaeEWjIrBrL2RAx,spotify:track:2nGFzvICaeEWjIrBrL2RAx,https://api.spotify.com/v1/tracks/2nGFzvICaeEW...,https://api.spotify.com/v1/audio-analysis/2nGF...,215360,4
1,0.704,0.780,1,-4.743,1,0.1420,0.3740,0.000000,0.3090,0.864,79.483,audio_features,5MEYDJVJMaGAXfddTo0D6J,spotify:track:5MEYDJVJMaGAXfddTo0D6J,https://api.spotify.com/v1/tracks/5MEYDJVJMaGA...,https://api.spotify.com/v1/audio-analysis/5MEY...,184227,4
2,0.659,0.678,0,-8.180,1,0.0313,0.1570,0.000007,0.0784,0.647,106.186,audio_features,2DnJjbjNTV9Nd5NOa1KGba,spotify:track:2DnJjbjNTV9Nd5NOa1KGba,https://api.spotify.com/v1/tracks/2DnJjbjNTV9N...,https://api.spotify.com/v1/audio-analysis/2DnJ...,258411,4
3,0.704,0.558,2,-7.273,0,0.0542,0.1480,0.000000,0.1070,0.245,110.444,audio_features,4lLtanYk6tkMvooU0tWzG8,spotify:track:4lLtanYk6tkMvooU0tWzG8,https://api.spotify.com/v1/tracks/4lLtanYk6tkM...,https://api.spotify.com/v1/audio-analysis/4lLt...,222091,4
4,0.842,0.679,1,-5.876,1,0.0853,0.7760,0.000000,0.0891,0.918,137.196,audio_features,1ARkKt39O6WQqE0QEpZntu,spotify:track:1ARkKt39O6WQqE0QEpZntu,https://api.spotify.com/v1/tracks/1ARkKt39O6WQ...,https://api.spotify.com/v1/audio-analysis/1ARk...,197347,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
83,0.625,0.589,0,-7.683,1,0.1540,0.0967,0.000000,0.1640,0.133,174.063,audio_features,3Ejes8GtOct6T6UF24rzRY,spotify:track:3Ejes8GtOct6T6UF24rzRY,https://api.spotify.com/v1/tracks/3Ejes8GtOct6...,https://api.spotify.com/v1/audio-analysis/3Eje...,166757,4
84,0.691,0.663,11,-6.034,0,0.0347,0.0170,0.000031,0.9390,0.591,119.915,audio_features,1f92lJLwj8eMyZWEA11Gqe,spotify:track:1f92lJLwj8eMyZWEA11Gqe,https://api.spotify.com/v1/tracks/1f92lJLwj8eM...,https://api.spotify.com/v1/audio-analysis/1f92...,158402,4
85,0.814,0.482,9,-10.493,1,0.0588,0.0111,0.000002,0.0476,0.615,148.404,audio_features,6A9mKXlFRPMPem6ygQSt7z,spotify:track:6A9mKXlFRPMPem6ygQSt7z,https://api.spotify.com/v1/tracks/6A9mKXlFRPMP...,https://api.spotify.com/v1/audio-analysis/6A9m...,180267,4
86,0.831,0.649,7,-6.851,1,0.0412,0.1430,0.000274,0.0986,0.960,109.014,audio_features,0iwf1pdKI2os27cTX88dIt,spotify:track:0iwf1pdKI2os27cTX88dIt,https://api.spotify.com/v1/tracks/0iwf1pdKI2os...,https://api.spotify.com/v1/audio-analysis/0iwf...,141468,4


### Create a list of all the valence values for the tracks

In [662]:
valence_list = [x['valence'] for x in valence_data['audio_features']]
valence_df['valences'] = valence_list
valence_df['valences']

0     0.542
1     0.864
2     0.647
3     0.245
4     0.918
      ...  
83    0.133
84    0.591
85    0.615
86    0.960
87    0.722
Name: valences, Length: 88, dtype: float64

### Create a data frame that displays all the valences of the tracks

In [663]:
valence_final_df = pd.DataFrame(valence_df['valences'])
valence_final_df

Unnamed: 0,valences
0,0.542
1,0.864
2,0.647
3,0.245
4,0.918
...,...
83,0.133
84,0.591
85,0.615
86,0.960


### List all of the track names in order

In [664]:
new_tracks = [x['track']['name'] for x in tracks_data['items']]
tracks_df['tname'] = new_tracks
tracks_df['tname']

0     Put Your Records On
1     Dear Future Husband
2          You're So Vain
3                 Grenade
4           Right As Rain
             ...         
83            Geldautomat
84         On My Way Home
85     Three Little Birds
86                   Dumb
87        About Damn Time
Name: tname, Length: 88, dtype: object

### Create a data frame that displays all the track names

In [665]:
tracks_name = pd.DataFrame(tracks_df['tname'])
tracks_name

Unnamed: 0,tname
0,Put Your Records On
1,Dear Future Husband
2,You're So Vain
3,Grenade
4,Right As Rain
...,...
83,Geldautomat
84,On My Way Home
85,Three Little Birds
86,Dumb


### Merge the valence data frame with the track name data frame to display the song names in order with their respective valences

In [668]:
combined_df = pd.merge(tracks_name, valence_final_df, left_index=True, right_index=True)
combined_df

Unnamed: 0,tname,valences
0,Put Your Records On,0.542
1,Dear Future Husband,0.864
2,You're So Vain,0.647
3,Grenade,0.245
4,Right As Rain,0.918
...,...,...
83,Geldautomat,0.133
84,On My Way Home,0.591
85,Three Little Birds,0.615
86,Dumb,0.960


### Display only the tracks that have a valence of 0.5 or greater

In [697]:
combined_df.query('valences >= 0.5')

Unnamed: 0,tname,valences
0,Put Your Records On,0.542
1,Dear Future Husband,0.864
2,You're So Vain,0.647
4,Right As Rain,0.918
5,I'm Yours,0.712
...,...,...
80,Nur ein Wort,0.947
84,On My Way Home,0.591
85,Three Little Birds,0.615
86,Dumb,0.960


# Analysis

## Conclusion
The data I collected points to my hypothesis holding true. Seventy-six of the eighty-eight tracks in the tested playlist had a valence greater than or equal to 0.5, which strongly suggests that the playlist's average valence is also at least 0.5, as I predicted. Strictly speaking, the hypothesis concerns the average, so the mean valence should be computed directly to confirm it. In addition, I was only able to test one user playlist. If given the opportunity to evaluate more playlists, I may find that my hypothesis does not stand up to further scrutiny. Therefore I cannot say that my hypothesis has been fully tested and shown to be correct.
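
Because the hypothesis is about the average, the mean can be checked directly alongside the share of tracks at or above the threshold. The sketch below is illustrative only: `summarize_valence` is a hypothetical helper, and the sample input is just the first five valences displayed above, not the full set of 88, so its numbers are not the result for the playlist.

```python
import pandas as pd


def summarize_valence(valences, threshold=0.5):
    """Return the track count, mean valence, and share at or above threshold."""
    s = pd.Series(valences, dtype="float64")
    return {
        "n_tracks": int(s.size),
        "mean_valence": float(s.mean()),
        "share_at_or_above": float((s >= threshold).mean()),
    }


# First five valences from the table above, used only as sample input.
first_five = [0.542, 0.864, 0.647, 0.245, 0.918]
summary = summarize_valence(first_five)
```

In the notebook, `summarize_valence(valence_final_df['valences'])` would give the figures for the whole playlist.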

## Possible Next Steps
As mentioned in my conclusion, the best way to improve upon my research would be to study many more user playlists. Limitations in time and in my skill restricted me to studying only one playlist, but the validity of my hypothesis could be greatly improved if a much larger sample of playlists were examined.
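
Once valences have been gathered for several playlists, the comparison could be sketched as below. This assumes the data has already been collected into (playlist, valence) pairs; the function name is hypothetical, and the values in the usage lines are placeholders rather than real results.

```python
import pandas as pd


def mean_valence_by_playlist(records):
    """Take (playlist_name, valence) pairs and return the mean valence of
    each playlist plus the average of those playlist means."""
    df = pd.DataFrame(records, columns=["playlist", "valence"])
    per_playlist = df.groupby("playlist")["valence"].mean()
    return per_playlist, float(per_playlist.mean())


# Placeholder values, only to show the shape of the input.
records = [("playlist_a", 0.2), ("playlist_a", 0.4), ("playlist_b", 0.9)]
per_playlist, overall = mean_valence_by_playlist(records)
```

Averaging the playlist means, rather than pooling every track, keeps one very long playlist from dominating the result.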