# Goal

In [1]:
# The goal of this project is to demonstrate skills in extracting and storing data from APIs

In [2]:
# This project will be collecting data from the iTunes API.
# Documentation: https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/iTuneSearchAPI

In [3]:
# Importing necessary modules
import requests
import json

In [4]:
# Specify base route
base_url = "https://itunes.apple.com/search"

In [5]:
# Specify the query parameters we will be using
params = {
    "term": "the beatles",  # the search text; requests URL-encodes the space for us
    "country": "US",  # the country code of the iTunes store we want to query
    "media": "music",  # we only want to collect data on music
    "limit": 200,  # upper bound on how many results we want to get
}
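A note on how the search term is encoded: the sketch below compares a term written with a literal `+` against one written with a space. It uses the standard library's `urlencode` as a stand-in so it runs without a network call; to my understanding `requests` applies the same form-style encoding to `params`, but treat that as an assumption and check the final URL with `r.url` if it matters.

```python
from urllib.parse import urlencode

# The same query parameters, written once with a literal "+" and once with a space.
with_plus = {"term": "the+beatles", "country": "US", "media": "music", "limit": 200}
with_space = {"term": "the beatles", "country": "US", "media": "music", "limit": 200}

query_plus = urlencode(with_plus)
query_space = urlencode(with_space)

print(query_plus)   # term=the%2Bbeatles&country=US&media=music&limit=200
print(query_space)  # term=the+beatles&country=US&media=music&limit=200
```

A literal `+` is percent-encoded as `%2B`, while a space is what ends up as `+` in the query string.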

In [6]:
# Make a request
r = requests.get(base_url, params=params)
r.status_code

200

In [9]:
# Store the response
info = r.json()

## Inspecting the structure of the response

In [12]:
#print(json.dumps(info,indent=4))

In [11]:
info.keys() # The data we want is located in 'results'

dict_keys(['resultCount', 'results'])

In [13]:
# Storing the results containing relevant data in a new variable

results = info['results']
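Before relying on `results`, it can be worth checking that it agrees with `resultCount`. The following is a minimal sketch of such a check; `get_results` and `sample_info` are hypothetical names, and the payload is a heavily trimmed stand-in for `r.json()`.

```python
def get_results(payload):
    """Return the 'results' list after checking it against 'resultCount'."""
    results = payload.get("results", [])
    expected = payload.get("resultCount")
    if expected is not None and expected != len(results):
        raise ValueError(f"resultCount is {expected} but got {len(results)} results")
    return results


# Hypothetical, trimmed payload standing in for the real response
sample_info = {
    "resultCount": 2,
    "results": [
        {"trackName": "Here Comes the Sun"},
        {"trackName": "Let It Be"},
    ],
}

sample_results = get_results(sample_info)
print(len(sample_results))  # 2
```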

## Examining the structure of a single result

In [14]:
results[0]

{'wrapperType': 'track',
 'kind': 'song',
 'artistId': 136975,
 'collectionId': 1474815798,
 'trackId': 1474815898,
 'artistName': 'The Beatles',
 'collectionName': 'Abbey Road (2019 Mix)',
 'trackName': 'Here Comes the Sun',
 'collectionCensoredName': 'Abbey Road (2019 Mix)',
 'trackCensoredName': 'Here Comes the Sun (2019 Mix)',
 'artistViewUrl': 'https://music.apple.com/us/artist/the-beatles/136975?uo=4',
 'collectionViewUrl': 'https://music.apple.com/us/album/here-comes-the-sun-2019-mix/1474815798?i=1474815898&uo=4',
 'trackViewUrl': 'https://music.apple.com/us/album/here-comes-the-sun-2019-mix/1474815798?i=1474815898&uo=4',
 'previewUrl': 'https://audio-ssl.itunes.apple.com/itunes-assets/AudioPreview125/v4/78/18/cc/7818cce5-1c20-1270-a9bf-b24e4be8c152/mzaf_7407478495292373909.plus.aac.p.m4a',
 'artworkUrl30': 'https://is2-ssl.mzstatic.com/image/thumb/Music114/v4/2a/f9/04/2af904cd-f05a-d5a1-8014-c1387855b56c/source/30x30bb.jpg',
 'artworkUrl60': 'https://is2-ssl.mzstatic.com/image/

## Identifying data we want to collect

In [16]:
# All keys of a result
results[0].keys()

dict_keys(['wrapperType', 'kind', 'artistId', 'collectionId', 'trackId', 'artistName', 'collectionName', 'trackName', 'collectionCensoredName', 'trackCensoredName', 'artistViewUrl', 'collectionViewUrl', 'trackViewUrl', 'previewUrl', 'artworkUrl30', 'artworkUrl60', 'artworkUrl100', 'collectionPrice', 'trackPrice', 'releaseDate', 'collectionExplicitness', 'trackExplicitness', 'discCount', 'discNumber', 'trackCount', 'trackNumber', 'trackTimeMillis', 'country', 'currency', 'primaryGenreName', 'isStreamable'])

We will build one list of values for each of a few of these keys. The fields we will focus on are:
* artistName
* collectionName
* trackName
* trackId
* trackViewUrl
* releaseDate
* primaryGenreName
* trackPrice
* currency
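The per-field lists can also be built in a single pass. This is only a sketch of an alternative, not the approach used in the cells that follow: `FIELDS`, `collect_fields`, and the sample records are hypothetical, and `dict.get` is used so that a record missing a key yields `None` instead of raising `KeyError`.

```python
FIELDS = [
    "artistName", "collectionName", "trackName", "trackId", "trackViewUrl",
    "releaseDate", "primaryGenreName", "trackPrice", "currency",
]


def collect_fields(results, fields=FIELDS):
    """Build one list per field; keys missing from a record become None."""
    return {field: [result.get(field) for result in results] for field in fields}


# Hypothetical, trimmed records standing in for the API results
sample_results = [
    {"artistName": "The Beatles", "trackName": "Here Comes the Sun",
     "trackPrice": 1.29, "currency": "USD"},
    {"artistName": "The Beatles", "trackName": "Let It Be"},  # no price fields
]

columns = collect_fields(sample_results)
print(columns["trackName"])   # ['Here Comes the Sun', 'Let It Be']
print(columns["trackPrice"])  # [1.29, None]
```

The trade-off is that `result['key']` fails loudly on an incomplete record, whereas `.get` keeps every list the same length and leaves the gap visible as `None`.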

# Collecting our Data

## Artist Names

In [17]:
artist_names = [result['artistName'] for result in results]

In [18]:
len(artist_names) # Correct number of names

200

In [19]:
# Previewing the collected names

artist_names[:10]

['The Beatles',
 'The Beatles',
 'The Beatles',
 'The Beatles',
 'The Beatles',
 'The Beatles',
 'The Beatles',
 'The Beatles',
 'The Beatles',
 'The Beatles']

In [20]:
set(artist_names) # Looks like there are only two distinct artist names but that is fine

{'The Beatles', 'The Beatles & Billy Preston'}

## Collection Names

In [21]:
collection_names = [result['collectionName'] for result in results]

In [22]:
len(collection_names) # Correct number

200

In [23]:
collection_names[:10] # Looks good

['Abbey Road (2019 Mix)',
 'Let It Be (2021 Mix)',
 'Abbey Road (2019 Mix)',
 'The Beatles 1967-1970 (The Blue Album)',
 '1 (2015 Version)',
 '1 (2015 Version)',
 'The Beatles (The White Album)',
 'Rubber Soul',
 '1 (2015 Version)',
 '1 (2015 Version)']

## Track Names

In [24]:
track_names = [result['trackName'] for result in results]

In [25]:
len(track_names) # Correct number

200

In [26]:
track_names[:10] # Looks good

['Here Comes the Sun',
 'Let It Be',
 'Come Together',
 'Hey Jude',
 'Hey Jude',
 'All You Need Is Love',
 'Blackbird',
 'In My Life',
 'Yesterday',
 'I Want to Hold Your Hand']

## Track IDs

In [27]:
track_ids = [result['trackId'] for result in results]

In [28]:
len(track_ids) # Correct number

200

In [29]:
track_ids[:10]

[1474815898,
 1582223667,
 1474815799,
 1441133277,
 1440834224,
 1440833920,
 1441133834,
 1441164829,
 1440833891,
 1440833542]

## Track URLs

In [30]:
track_urls = [result['trackViewUrl'] for result in results]

In [31]:
len(track_urls)

200

In [32]:
track_urls[:10]

['https://music.apple.com/us/album/here-comes-the-sun-2019-mix/1474815798?i=1474815898&uo=4',
 'https://music.apple.com/us/album/let-it-be-2021-mix/1582219768?i=1582223667&uo=4',
 'https://music.apple.com/us/album/come-together-2019-mix/1474815798?i=1474815799&uo=4',
 'https://music.apple.com/us/album/hey-jude/1441133100?i=1441133277&uo=4',
 'https://music.apple.com/us/album/hey-jude-2015-stereo-mix/1440833098?i=1440834224&uo=4',
 'https://music.apple.com/us/album/all-you-need-is-love-2015-stereo-mix/1440833098?i=1440833920&uo=4',
 'https://music.apple.com/us/album/blackbird/1441133180?i=1441133834&uo=4',
 'https://music.apple.com/us/album/in-my-life/1441164359?i=1441164829&uo=4',
 'https://music.apple.com/us/album/yesterday-2015-stereo-mix/1440833098?i=1440833891&uo=4',
 'https://music.apple.com/us/album/i-want-to-hold-your-hand-2015-stereo-mix/1440833098?i=1440833542&uo=4']

## Release Dates

In [33]:
release_dates = [result['releaseDate'] for result in results]

In [34]:
len(release_dates)

200

In [35]:
release_dates[:10] # ISO 8601 strings; converting them to dates is left to the data cleaning stage

['1969-09-26T12:00:00Z',
 '2021-08-28T07:00:00Z',
 '1969-09-26T12:00:00Z',
 '1968-08-26T12:00:00Z',
 '1968-08-26T12:00:00Z',
 '1967-07-07T12:00:00Z',
 '1968-11-22T12:00:00Z',
 '1965-12-03T12:00:00Z',
 '1965-09-13T12:00:00Z',
 '1963-11-29T12:00:00Z']
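Should these strings need to become date objects later, a minimal sketch with the standard library might look like the following. It assumes every value follows the `YYYY-MM-DDTHH:MM:SSZ` pattern seen above; `parse_release_date` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_release_date(value):
    """Parse a UTC timestamp such as '1969-09-26T12:00:00Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)  # the trailing 'Z' means UTC


sample_dates = ["1969-09-26T12:00:00Z", "2021-08-28T07:00:00Z"]
parsed_dates = [parse_release_date(d) for d in sample_dates]

print(parsed_dates[0].date(), parsed_dates[0].time())  # 1969-09-26 12:00:00
```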

## Primary Genre Names

In [36]:
genre_names = [result['primaryGenreName'] for result in results]

In [37]:
len(genre_names)

200

In [38]:
genre_names[:10]

['Rock',
 'Rock',
 'Rock',
 'Rock',
 'Rock',
 'Rock',
 'Rock',
 'Rock',
 'Rock',
 'Rock']

In [39]:
set(genre_names) # There are only two distinct genre names in our current result set

{'Pop', 'Rock'}

## Track Prices

It might be useful to combine each track price and its currency into a single value (e.g. 19.99 USD), but that is usually done during data cleaning, so we will keep them as separate fields for now.

In [45]:
track_prices = [result['trackPrice'] for result in results]

In [46]:
len(track_prices)

200

In [47]:
track_prices[:10]

# It looks like we have negative values in our track prices. It might be safe to assume that the API uses -1 to indicate
# missing data. We can clean that up later if we want to.

[1.29, 1.29, 1.29, 1.29, 1.29, -1.0, 1.29, 1.29, 1.29, 1.29]
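If the assumption that negative prices mark missing data holds, a later cleaning step could mask them. The sketch below is one way to do that with plain lists; `mask_negative_prices` is a hypothetical helper, and whether -1 really is a sentinel is not confirmed here.

```python
def mask_negative_prices(prices):
    """Replace negative prices with None, leaving other values untouched."""
    return [None if price is not None and price < 0 else price for price in prices]


sample_prices = [1.29, 1.29, -1.0, 1.29]
cleaned_prices = mask_negative_prices(sample_prices)

print(cleaned_prices)  # [1.29, 1.29, None, 1.29]
```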

## Track Currencies

In [48]:
track_currencies = [result['currency'] for result in results]

In [49]:
len(track_currencies)

200

In [50]:
track_currencies[:10]

['USD', 'USD', 'USD', 'USD', 'USD', 'USD', 'USD', 'USD', 'USD', 'USD']

In [51]:
set(track_currencies) # Looks like the only currency in our data set is USD (probably because we specified 'country' as 'US')

{'USD'}

# Creating our Data Frame

In [52]:
import pandas as pd

In [53]:
theBeatlesData = pd.DataFrame()

In [54]:
# Adding our lists of data to the data frame
theBeatlesData['Artist'] = artist_names
theBeatlesData['Collection'] = collection_names
theBeatlesData['Track Name'] = track_names
theBeatlesData['Track ID'] = track_ids
theBeatlesData['Track URL'] = track_urls
theBeatlesData['Release Date'] = release_dates
theBeatlesData['Genre'] = genre_names
theBeatlesData['Track Price'] = track_prices
theBeatlesData['Currency'] = track_currencies

In [55]:
# Taking a look at our data frame
theBeatlesData

| | Artist | Collection | Track Name | Track ID | Track URL | Release Date | Genre | Track Price | Currency |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The Beatles | Abbey Road (2019 Mix) | Here Comes the Sun | 1474815898 | https://music.apple.com/us/album/here-comes-th... | 1969-09-26T12:00:00Z | Rock | 1.29 | USD |
| 1 | The Beatles | Let It Be (2021 Mix) | Let It Be | 1582223667 | https://music.apple.com/us/album/let-it-be-202... | 2021-08-28T07:00:00Z | Rock | 1.29 | USD |
| 2 | The Beatles | Abbey Road (2019 Mix) | Come Together | 1474815799 | https://music.apple.com/us/album/come-together... | 1969-09-26T12:00:00Z | Rock | 1.29 | USD |
| 3 | The Beatles | The Beatles 1967-1970 (The Blue Album) | Hey Jude | 1441133277 | https://music.apple.com/us/album/hey-jude/1441... | 1968-08-26T12:00:00Z | Rock | 1.29 | USD |
| 4 | The Beatles | 1 (2015 Version) | Hey Jude | 1440834224 | https://music.apple.com/us/album/hey-jude-2015... | 1968-08-26T12:00:00Z | Rock | 1.29 | USD |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | The Beatles | A Hard Day's Night | Can't Buy Me Love | 1441164423 | https://music.apple.com/us/album/cant-buy-me-l... | 1964-03-16T12:00:00Z | Rock | 1.29 | USD |
| 196 | The Beatles | Love | Being For the Benefit of Mr. Kite! / I Want Yo... | 1441133408 | https://music.apple.com/us/album/being-for-the... | 2006-11-17T12:00:00Z | Rock | 1.29 | USD |
| 197 | The Beatles | Let It Be (2021 Mix) | Dig a Pony | 1582221004 | https://music.apple.com/us/album/dig-a-pony-20... | 1970-05-08T12:00:00Z | Rock | 1.29 | USD |
| 198 | The Beatles | A Hard Day's Night | I Should Have Known Better | 1441164418 | https://music.apple.com/us/album/i-should-have... | 1964-07-10T12:00:00Z | Rock | 1.29 | USD |


Our data looks to be in good shape. A few columns might need cleaning before being used in a machine learning application:
* 'Release Date': We might want to split this into one column holding the yyyy-mm-dd part and another holding the hh:mm:ss part, and convert both from strings to date/time dtypes.
* 'Track Price': We saw earlier that at least one negative price (-1) exists. Someone cleaning our dataset might want to replace those negative values with NaN (the floating-point "not a number" value that pandas and NumPy use to mark missing data). They might also decide to combine the 'Track Price' and 'Currency' columns into one column, as mentioned before.

We will refrain from doing that here as this project focuses primarily on data collection. We want to keep our collected data as raw as possible and let the cleaning/transformation decisions be made in relation to specific projects or applications.

# Saving and Exporting our Data

In [56]:
# Saving our collected data as a CSV file
theBeatlesData.to_csv('iTunes_API_Data.csv', index=False, encoding='utf-8')
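One thing to keep in mind about the exported file is that CSV stores every value as text, so numeric and date types have to be reapplied when the file is read back. The sketch below illustrates this with the standard library's `csv` module and an in-memory buffer instead of pandas and a file on disk; the rows are hypothetical, and the second track name is made up to show that embedded commas survive the round trip.

```python
import csv
import io

# Hypothetical rows standing in for the data frame's contents
rows = [
    {"Artist": "The Beatles", "Track Name": "Here Comes the Sun", "Track Price": 1.29},
    {"Artist": "The Beatles", "Track Name": "Track, With Comma", "Track Price": -1.0},
]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["Artist", "Track Name", "Track Price"])
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)  # rewind so the same buffer can be read back
read_back = list(csv.DictReader(buffer))

print(read_back[1]["Track Name"])         # Track, With Comma
print(repr(read_back[0]["Track Price"]))  # '1.29' (a string, not a float)
```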