# API Report
## Ash S. Copeland (Holloway)
### 10/11/2024
My hypothesis, testable with the Spotify API, is that tracks released in summer gain more traction and are more likely to become a “greatest hit” than tracks released in winter.

The theoretical application is that an artist could improve a song's chances of becoming a "greatest hit" by timing its release. The statistical application is the ability to graph and track songs by release date to see how timing relates to their success.

**Spotify Endpoints:** I used the Get Album Tracks and Get Several Tracks endpoints.

**Why endpoints are suitable to test hypothesis:** These endpoints allowed me to take Queen's Greatest Hits album, extract the songs and their track IDs, and use those IDs to get the songs' release dates to compare against one another.

**Ways data is reliable and unreliable?**
In this report, I used Queen because the length of their active music career gives a good pool of data. The long span between greatest-hit releases and the wide array of data make this test reliable, yet the data covers only one artist and one genre, which makes it questionable.

**Limitations or caveats to response that limit hypothesis?**
There certainly are limitations. Because this pull was done from a single artist for the sake of simplicity, the results are rather limited. In an actual test, it would be better to pull multiple years of greatest hits across multiple genres and analyze the trends between them.

**For this report, we begin as we always do, with the imports. Some of these tools are not strictly necessary, but I included them just in case.**

In [76]:
import urllib
import requests
import pandas as pd
import json
import base64

Now that we have imported all the packages we will need, I next defined the functions that will be used in this API call. Please see the comments within this cell for the organization of each function.

In [77]:
#Definition of all Functions

#Function to formulate access
def get_session_token(SessionID, SessionKey):
    url = 'https://accounts.spotify.com/api/token'
    urldata = {'grant_type':'client_credentials'}
    encoded_key = base64.b64encode(str(SessionID + ":" + SessionKey).encode("ascii"))
    urlheader = {'Authorization': 'Basic {}'.format(encoded_key.decode("ascii"))}
    response = requests.post(url, data = urldata, headers = urlheader)
    print(response.status_code)
    return response.json()['access_token']
    
#Function to Make API Calls
def api_call(endpoint_url, api_header):
    response = requests.get(endpoint_url, headers = api_header)
    print(response.status_code)
    return response.json()
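
As a side note on `get_session_token`, the sketch below isolates only the header-building step of the token request so it can be checked without a network call. The helper name `build_basic_auth_header` and the placeholder credentials are assumptions for illustration; they are not part of the original notebook.

```python
import base64

def build_basic_auth_header(client_id, client_secret):
    """Return an HTTP Basic Authorization header for a client ID and secret."""
    # Join the credentials as "id:secret", then Base64-encode the bytes.
    raw = f"{client_id}:{client_secret}".encode("ascii")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}

# Placeholder values only; real credentials should stay in an external file.
header = build_basic_auth_header("my_client_id", "my_client_secret")
print(header["Authorization"].startswith("Basic "))
```

Keeping this step in its own function makes it possible to verify the encoding by decoding the header value back to `id:secret`.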

Once the functions are created, it is time to define our variables. To simplify this report, I have condensed all variables into one cell. This is how I normally organize my work. I will break it down in order here:

**Access Keys:**
This uses an external file to pull my Spotify *Client ID* and *Client Secret*. It then passes them to the session token function from above to get our access token for the API call and get ready to pull some data!

**API Session Variables:**
Next we use the access token we received to create a session header, which is one part of our API call. Success can be seen from the 200 status codes listed below the variables!

**IDs:**
This section happens in two parts, the album part and the track part, split between our two endpoints: Get Album Tracks (GAT) and Get Several Tracks (GST). Since we do not have the track IDs yet, as they come from the GAT response, we will focus on the album section first. This uses the album ID of the Greatest Hits album on Spotify.

**Endpoints:**
This is where we list and organize our endpoints. The endpoints come from the Spotify Web API documentation. They were then formatted using the variables that store the respective track and album IDs for each endpoint, even though we did not have the track IDs at this time.

**Responses:**
Here we go! Our API calls! We now use the API call function created above to pull our data from Spotify.

**DataFrames:**
After receiving the raw data, I placed it in the respective raw dataframe to be reorganized later. I then displayed this dataframe, copied all the Spotify track IDs from it, and pasted them into the track IDs variable in the IDs area above. The second API call was then made, pulled into a raw dataframe, and normalized using the normalize process I know from previous positions of employment and freelance work.
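The copy-and-paste step described above could also be done in code. The following is a minimal sketch, assuming the Get Album Tracks response is a dictionary with an `items` list whose entries each carry an `id`, which is how the notebook reads it. The helper name `track_ids_param`, the sample data, and the default `limit` of 50 are assumptions for illustration; check the current Spotify documentation for the real cap on IDs per request.

```python
def track_ids_param(album_tracks_response, limit=50):
    """Build the comma-separated `ids` value from a Get Album Tracks response."""
    # Collect the ID of every track listed under "items".
    ids = [item["id"] for item in album_tracks_response.get("items", [])]
    # Truncate to the assumed per-request maximum before joining.
    return ",".join(ids[:limit])

# Hypothetical sample shaped like the relevant part of a response.
sample_response = {
    "items": [
        {"id": "id_one", "name": "Track One"},
        {"id": "id_two", "name": "Track Two"},
    ]
}
ids_value = track_ids_param(sample_response)
print(ids_value)  # id_one,id_two
```

The resulting string could be passed to `End_GST.format(...)` in place of a hand-pasted list, which removes the chance of missing or mistyping an ID.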

In [131]:
#Definition of all Variables

#Access Keys
keys = pd.read_csv("Spot_WebAPI_ClientID.txt") #Reading from an External File for Security Purposes
access_token = get_session_token(keys['Client_ID'].iloc[0], keys['Client_Secret'].iloc[0]) #Runs the session token function defined above

#API Session Variables
session_header = {'Authorization': 'Bearer {}'.format(access_token)}

#IDs
artist_album_ID = '6a8nlV9V8kPUbTTCJNVSsh'
track_IDs = '6l8GvAyoUZwWDgF1e4822w,5Lsg8jlCoTyxRch9LvJo3E,7GqWnsKhMtEW0nzki5o0d8,0NZ90au4uU11IkvReTOGYJ,3lUx27TOwV2nAiKwnYYXxe,4OKf7CcYuw5H2HptkcKxcP,0DrDcqWpokMlhKYJSwoT4B,1jq8WXj8zBaNhcq3S4yadE,6aU0F03DR257LCPAXjtg42,1vfyi0Du06IjkakfSdXqGm,0P7YJ9fxIOM0Rq4pZ2qU42,1N8UEhbh2LXPvIymWwjmi6,4b0mX1GtrQLiUW9jpb6Xcx,6tYYT8zNxkadSCujCdR6Ur,1e9Tt3nKBwRbuaU79kN3dn,3bCjss1Y0kPPaSgd9cb89K,6ceLJHWkvMM3oc0Ftodrdm'

#Endpoints: Get Album Tracks (GAT), Get Several Tracks (GST)
End_GAT = 'https://api.spotify.com/v1/albums/{}/tracks' 
End_GST = 'https://api.spotify.com/v1/tracks?ids={}'
End_GAT_Final = End_GAT.format(artist_album_ID)
End_GST_Final = End_GST.format(track_IDs)

#Responses
GAT_response = api_call(End_GAT_Final, session_header)
GST_response = api_call(End_GST_Final, session_header)

#DataFrames
GAT_df_raw = pd.DataFrame(GAT_response['items'])
GST_df_raw = pd.DataFrame(GST_response['tracks'])
GST_df_raw_norm = pd.json_normalize(GST_response['tracks'])

200
200
200


With all that done, it's tidy time! I simply dropped all the columns that were irrelevant to the search or that duplicated each other so that I could combine the frames. I also added a column with the artist's name.

In [128]:
#Tidy GAT Response Dataframe
GAT_df_final = GAT_df_raw.drop(columns = ['artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_urls', 'href', 'track_number', 'type', 'uri', 'is_local'])
#Tidy GST Response Dataframe
GST_df_final = GST_df_raw_norm.drop(columns = ['track_number', 'type', 'uri', 'artists', 'album.album_type', 'album.artists', 'album.available_markets', 'album.href', 'album.uri', 'album.external_urls.spotify', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'href', 'id', 'is_local', 'name', 'popularity', 'album.id', 'album.images', 'album.name', 'album.release_date_precision', 'album.total_tracks', 'album.type', 'external_ids.isrc', 'external_urls.spotify'])
GST_df_final['artist'] = 'Queen'

Next up, I combined the dataframes using concat to place them side by side along the horizontal axis! I also renamed the final columns to make them more readable!

In [136]:
#Combine frames side by side (concat with axis=1 aligns rows by index position, not by track ID)
final_df = pd.concat([GAT_df_final, GST_df_final], axis=1)
#Rename columns to be more readable
final_df = final_df.rename(columns={
    'name': 'Song',
    'album.release_date': 'Release Date',
    'id': 'Spotify Track ID',
    'artist': 'Artist',
})
#Remove the remaster tag from the song titles
final_df['Song'] = final_df['Song'].str.replace('Remastered 2011', '', regex=False)
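
Because `pd.concat` with `axis=1` lines rows up by position, the pairing is only correct when both responses list the tracks in the same order. The plain-Python sketch below shows one way to pair rows by Spotify ID instead and to strip the remaster tag together with its separator. The helper names `clean_title` and `join_on_id` and the sample IDs are hypothetical, and the sketch assumes each track object nests its release date under `album`, as the normalized column name `album.release_date` suggests.

```python
import re

def clean_title(title):
    """Strip a trailing '- Remastered YYYY' or '/ Remastered YYYY' tag."""
    return re.sub(r"\s*[-/]\s*Remastered \d{4}\s*$", "", title)

def join_on_id(album_items, track_items):
    """Pair each album track with its release date by matching Spotify IDs."""
    dates = {t["id"]: t["album"]["release_date"] for t in track_items}
    return [
        {
            "Spotify Track ID": item["id"],
            "Song": clean_title(item["name"]),
            "Release Date": dates.get(item["id"]),  # None when no ID matches
        }
        for item in album_items
    ]

# Hypothetical IDs; the two lists are deliberately in different orders.
album_items = [
    {"id": "id_a", "name": "Bohemian Rhapsody - Remastered 2011"},
    {"id": "id_b", "name": "Killer Queen - Remastered 2011"},
]
track_items = [
    {"id": "id_b", "album": {"release_date": "1974-11-08"}},
    {"id": "id_a", "album": {"release_date": "2006-01-01"}},
]
rows = join_on_id(album_items, track_items)
print(rows[0])
```

Matching on the ID means a reordered or partial second response produces a missing date rather than a silently wrong one.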

And that's it! All that is left is to display the final dataframe!

In [137]:
final_df

| | Spotify Track ID | Song | Release Date | Artist |
|---|---|---|---|---|
| 0 | 6ljkRMigoNtu0x1mlTEsKc | Bohemian Rhapsody - | 2006-01-01 | Queen |
| 1 | 6hfNDGNTJBR029RmV63IoO | Another One Bites The Dust - | 2006-01-01 | Queen |
| 2 | 6Oj0XnWrDEl3KrwZuMQqVj | Killer Queen - | 1974-11-08 | Queen |
| 3 | 52ZQTzXbbWjS4kjOcV3z5b | Fat Bottomed Girls - Single Version / | 2006-01-01 | Queen |
| 4 | 6I55r9WyH1wV1whBRacLFa | Bicycle Race - | 1978-11-10 | Queen |
| 5 | 6CVzXxIHDIDdyzlgfEWSZr | You're My Best Friend - | 1975-11-21 | Queen |
| 6 | 064C5ivM2FUsY0ghkyt4YK | Don't Stop Me Now - | 2020-08-10 | Queen |
| 7 | 5V890judRbpVT6X5AEYZc8 | Save Me - | 2019-07-26 | Queen |
| 8 | 0q8IUBbw0iedjCbzs7vT6U | Crazy Little Thing Called Love - | 2006-01-01 | Queen |
| 9 | 4RJdwSqHapVcW5DaRtTkv0 | Somebody To Love - | 2004 | Queen |


After viewing the data above, with the exception of item 9, which lists only a release year, we can see a clear pattern! Only 3 of the 16 songs were released in the summertime, and 11 were released during the winter months. This contradicts my hypothesis, at least for this sample! It appears Queen received more greatest hits in the winter than in the summer! As stated above, this analysis is limited because it pulls from only one artist. I believe the alternate approach described at the beginning would still provide a better outcome. Next steps would be a deeper analysis of a larger and wider pull to test the new hypothesis that winter months bring more greatest hits! And then we begin the process again!
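The season tally described above can be sketched in code rather than counted by eye. The report does not state which months count as summer or winter, so the month-to-season mapping below (Northern Hemisphere meteorological seasons) is an assumption, as are the helper names `season_of` and `season_counts`. Dates with only a year, like item 9, are counted as `unknown` instead of being guessed.

```python
from collections import Counter

# Assumed mapping: Northern Hemisphere meteorological seasons.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def season_of(release_date):
    """Return the season for 'YYYY-MM' or 'YYYY-MM-DD', or None for 'YYYY'."""
    parts = release_date.split("-")
    if len(parts) < 2:
        return None  # Year-only precision carries no month to classify.
    return SEASONS[int(parts[1])]

def season_counts(release_dates):
    """Count release dates per season, grouping year-only dates as 'unknown'."""
    return Counter(season_of(d) or "unknown" for d in release_dates)

# A few dates taken from the table above, used here only as sample input.
sample_dates = ["1974-11-08", "2019-07-26", "2004"]
print(season_counts(sample_dates))
```

Changing the `SEASONS` mapping is the only edit needed to test a different definition of summer and winter, which matters because the conclusion depends on where those boundaries are drawn.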