# Web Scraping using Python

This project shows how to retrieve data with Python.
We use the iTunes Search API as the data source for this web scraping,
load the results into a pandas data frame,
and export them to a CSV file for further processing.

#### iTunes API page
 https://performance-partners.apple.com/search-api

In [1]:
# first we import the main modules we are going to use:
import pandas as pd
import requests

In [2]:
# let's save in a variable the base url of the iTunes API we are going to use
base_url = "https://itunes.apple.com/search"

In [3]:
# this API has two required parameters, "term" and "country". The "limit"
# param caps the number of results and the "media" param restricts them to
# music. We build the url by hand and check the request is ok using .get
url = base_url + "?term=the+beatles&country=us&limit=100&media=music"

requests.get(url)

<Response [200]>

In [4]:
# a cleaner way of building the url is to pass the parameters to the
# .get method as a dictionary and let requests encode them
r = requests.get(
    base_url,
    params={"term": "the beatles", "country": "us", "media": "music", "limit": 100},
)
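
To see what the dictionary form produces without sending a request, the query string can be built with the standard library. This is only a sketch: it uses `urllib.parse.urlencode` rather than `requests` itself, on the assumption that both encode these simple parameters the same way (the `r.url` output later in this notebook agrees).

```python
from urllib.parse import urlencode

base_url = "https://itunes.apple.com/search"
params = {"term": "the beatles", "country": "us", "media": "music", "limit": 100}

# urlencode turns the dictionary into "key=value" pairs joined by "&";
# the space in "the beatles" is encoded as "+"
query_string = urlencode(params)
full_url = base_url + "?" + query_string
print(full_url)
```

The parameters appear in the order they were written in the dictionary, which is why this url differs from the hand-built one only in the position of `limit` and `media`.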

## Checking the response

In [5]:
# now store the response of the first request (hand-built url) so we can inspect it
response = requests.get(url)

In [6]:
# check the response obtained the first way (hand-built url)
response.ok

True

In [7]:
response.status_code

200

In [8]:
# check the response obtained the second way (params dictionary), stored in "r"
r.ok

True

In [9]:
r.status_code

200
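
In a script, rather than inspecting `.ok` or `.status_code` by hand, the check can be folded into a small helper. The sketch below is one possible way to do it; the function name `fetch_json`, the optional `session` argument and the 10-second timeout are illustrative choices, not part of the original notebook.

```python
import requests


def fetch_json(url, params=None, session=None, timeout=10):
    """Send a GET request and return the parsed JSON body.

    Raises requests.HTTPError for 4xx/5xx responses.
    """
    # fall back to the module-level requests.get when no session is given
    get = session.get if session is not None else requests.get
    response = get(url, params=params, timeout=timeout)
    # raise_for_status() does nothing on success and raises on 4xx/5xx
    response.raise_for_status()
    return response.json()
```

Calling `fetch_json(base_url, params={"term": "the beatles", "country": "us"})` would then either return the parsed dictionary or fail loudly, instead of silently continuing with a bad response.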

In [10]:
# checking url is obtained correctly in the first way
response.url

'https://itunes.apple.com/search?term=the+beatles&country=us&limit=100&media=music'

In [11]:
# checking url is obtained correctly in the second way
r.url

'https://itunes.apple.com/search?term=the+beatles&country=us&media=music&limit=100'

## Checking outputs and parameters

In [12]:
# import the json module in order to pretty-print JSON data
import json

In [13]:
# parse the JSON body of the response and pretty-print it
info = r.json()
print(json.dumps(info, indent=4))

{
    "resultCount": 73,
    "results": [
        {
            "wrapperType": "track",
            "kind": "song",
            "artistId": 136975,
            "collectionId": 1440833098,
            "trackId": 1440834225,
            "artistName": "The Beatles & Billy Preston",
            "collectionName": "1",
            "trackName": "Get Back",
            "collectionCensoredName": "1",
            "trackCensoredName": "Get Back",
            "collectionArtistName": "The Beatles",
            "artistViewUrl": "https://music.apple.com/us/artist/the-beatles/136975?uo=4",
            "collectionViewUrl": "https://music.apple.com/us/album/get-back/1440833098?i=1440834225&uo=4",
            "trackViewUrl": "https://music.apple.com/us/album/get-back/1440833098?i=1440834225&uo=4",
            "previewUrl": "https://audio-ssl.itunes.apple.com/itunes-assets/AudioPreview126/v4/89/27/0d/89270d7a-514d-fb4d-471b-6f77d2b53325/mzaf_4421583623311051659.plus.aac.p.m4a",
            "artworkUrl30":

In [14]:
# obtained output is a dictionary of two items
info.keys()

dict_keys(['resultCount', 'results'])

In [15]:
# inspect the first item, the number of results returned
info["resultCount"]

73
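
Note that `resultCount` is 73 even though the request asked for `limit=100`, so `limit` acts as an upper bound rather than a guarantee. A quick sanity check is to compare the count against the length of the `results` list. The sketch below uses a small hand-written dictionary with the same two top-level keys so that it runs without a network call; only two track names from this notebook's output are kept and all other fields are omitted.

```python
# a trimmed stand-in for r.json(): same top-level keys, far fewer fields
sample_info = {
    "resultCount": 2,
    "results": [
        {"kind": "song", "trackName": "Get Back"},
        {"kind": "song", "trackName": "Here Comes the Sun"},
    ],
}

# the count reported by the API should match the number of entries returned
count_matches = sample_info["resultCount"] == len(sample_info["results"])
print(count_matches)
```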

In [16]:
# inspect the first entry of the second item, "results" (the first item only contains a count)
print(json.dumps(info['results'][0], indent=4))

{
    "wrapperType": "track",
    "kind": "song",
    "artistId": 136975,
    "collectionId": 1440833098,
    "trackId": 1440834225,
    "artistName": "The Beatles & Billy Preston",
    "collectionName": "1",
    "trackName": "Get Back",
    "collectionCensoredName": "1",
    "trackCensoredName": "Get Back",
    "collectionArtistName": "The Beatles",
    "artistViewUrl": "https://music.apple.com/us/artist/the-beatles/136975?uo=4",
    "collectionViewUrl": "https://music.apple.com/us/album/get-back/1440833098?i=1440834225&uo=4",
    "trackViewUrl": "https://music.apple.com/us/album/get-back/1440833098?i=1440834225&uo=4",
    "previewUrl": "https://audio-ssl.itunes.apple.com/itunes-assets/AudioPreview126/v4/89/27/0d/89270d7a-514d-fb4d-471b-6f77d2b53325/mzaf_4421583623311051659.plus.aac.p.m4a",
    "artworkUrl30": "https://is1-ssl.mzstatic.com/image/thumb/Music116/v4/f2/98/fb/f298fb48-1e0e-6ad4-4cff-fb824b77f02e/15UMGIM59587.rgb.jpg/30x30bb.jpg",
    "artworkUrl60": "https://is1-ssl.mzsta
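
Each entry in `results` is a plain dictionary, so individual fields can be pulled out before building a data frame. Some keys appear to be missing from some entries (for example, `collectionArtistName` is empty for most rows in the table further down), so `dict.get` is safer than indexing with square brackets. A minimal sketch with hand-trimmed sample entries; `summarise_track` is a hypothetical helper name.

```python
def summarise_track(track):
    """Return a (track, artist, collection artist) tuple for one result."""
    return (
        track.get("trackName"),
        track.get("artistName"),
        # .get returns None instead of raising KeyError when the key is absent
        track.get("collectionArtistName"),
    )


sample_results = [
    {"trackName": "Get Back", "artistName": "The Beatles & Billy Preston",
     "collectionArtistName": "The Beatles"},
    {"trackName": "In My Life", "artistName": "The Beatles"},
]

summaries = [summarise_track(t) for t in sample_results]
for row in summaries:
    print(row)
```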

## Structure and explore the data

In [None]:
# now let's set some display options in pandas in order
# to be able to see all rows and columns of the data
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", 100)

In [19]:
# we can create a data frame with the "results" object by accessing
# the information stored in the "info" variable
songs_df = pd.DataFrame(info["results"])
songs_df

Unnamed: 0,wrapperType,kind,artistId,collectionId,trackId,artistName,collectionName,trackName,collectionCensoredName,trackCensoredName,collectionArtistName,artistViewUrl,collectionViewUrl,trackViewUrl,previewUrl,artworkUrl30,artworkUrl60,artworkUrl100,collectionPrice,trackPrice,releaseDate,collectionExplicitness,trackExplicitness,discCount,discNumber,trackCount,trackNumber,trackTimeMillis,country,currency,primaryGenreName,isStreamable,collectionArtistId,collectionArtistViewUrl
0,track,song,136975,1440833098,1440834225,The Beatles & Billy Preston,1,Get Back,1,Get Back,The Beatles,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/get-back/1440...,https://music.apple.com/us/album/get-back/1440...,https://audio-ssl.itunes.apple.com/itunes-asse...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,12.99,1.29,1969-04-11T12:00:00Z,notExplicit,notExplicit,1,1,27,22,191773,USA,USD,Rock,True,,
1,track,song,136975,1474815798,1474815898,The Beatles,Abbey Road (2019 Mix),Here Comes the Sun,Abbey Road (2019 Mix),Here Comes the Sun (2019 Mix),,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/here-comes-th...,https://music.apple.com/us/album/here-comes-th...,https://audio-ssl.itunes.apple.com/itunes-asse...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,12.99,1.29,1969-09-26T12:00:00Z,notExplicit,notExplicit,1,1,17,7,185707,USA,USD,Rock,True,,
2,track,song,136975,1441164359,1441164829,The Beatles,Rubber Soul,In My Life,Rubber Soul,In My Life,,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/in-my-life/14...,https://music.apple.com/us/album/in-my-life/14...,https://audio-ssl.itunes.apple.com/itunes-asse...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,12.99,1.29,1965-12-03T12:00:00Z,notExplicit,notExplicit,1,1,14,11,146333,USA,USD,Rock,True,,
3,track,song,136975,1440833098,1440834249,The Beatles,1,Let It Be,1,Let It Be,,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/let-it-be/144...,https://music.apple.com/us/album/let-it-be/144...,https://audio-ssl.itunes.apple.com/itunes-asse...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,12.99,1.29,1970-03-06T12:00:00Z,notExplicit,notExplicit,1,1,27,26,230440,USA,USD,Rock,True,,
4,track,song,136975,1440833098,1440833891,The Beatles,1,Yesterday,1,Yesterday,,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/yesterday/144...,https://music.apple.com/us/album/yesterday/144...,https://audio-ssl.itunes.apple.com/itunes-asse...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,12.99,1.29,1965-09-13T12:00:00Z,notExplicit,notExplicit,1,1,27,11,125320,USA,USD,Rock,True,,
5,track,song,136975,1440833098,1440834224,The Beatles,1,Hey Jude,1,Hey Jude,,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/hey-jude/1440...,https://music.apple.com/us/album/hey-jude/1440...,https://audio-ssl.itunes.apple.com/itunes-asse...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,12.99,1.29,1968-08-26T12:00:00Z,notExplicit,notExplicit,1,1,27,21,425653,USA,USD,Rock,True,,
6,track,song,136975,1440833098,1440833905,The Beatles,1,Eleanor Rigby,1,Eleanor Rigby,,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/eleanor-rigby...,https://music.apple.com/us/album/eleanor-rigby...,https://audio-ssl.itunes.apple.com/itunes-asse...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,12.99,1.29,1966-08-05T12:00:00Z,notExplicit,notExplicit,1,1,27,16,125867,USA,USD,Rock,True,,
7,track,song,12224,585701590,585701990,Paul McCartney,12-12-12 The Concert for Sandy Relief,Helter Skelter,12-12-12 The Concert for Sandy Relief,Helter Skelter (Live),Various Artists,https://music.apple.com/us/artist/paul-mccartn...,https://music.apple.com/us/album/helter-skelte...,https://music.apple.com/us/album/helter-skelte...,https://audio-ssl.itunes.apple.com/itunes-asse...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,12.99,-1.0,2012-12-18T12:00:00Z,notExplicit,notExplicit,1,1,24,23,250293,USA,USD,Rock,False,4035426.0,
8,track,song,136975,1440833098,1440833542,The Beatles,1,I Want to Hold Your Hand,1,I Want to Hold Your Hand,,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/i-want-to-hol...,https://music.apple.com/us/album/i-want-to-hol...,https://audio-ssl.itunes.apple.com/itunes-asse...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,12.99,1.29,1963-11-29T12:00:00Z,notExplicit,notExplicit,1,1,27,4,145747,USA,USD,Rock,True,,
9,track,song,136975,1440833098,1440833560,The Beatles,1,Help!,1,Help!,,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/help/14408330...,https://music.apple.com/us/album/help/14408330...,https://audio-ssl.itunes.apple.com/itunes-asse...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,https://is1-ssl.mzstatic.com/image/thumb/Music...,12.99,1.29,1965-07-19T12:00:00Z,notExplicit,notExplicit,1,1,27,10,139240,USA,USD,Rock,True,,
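
Two columns in the table above are easier to explore after conversion: `releaseDate` is an ISO 8601 string and `trackTimeMillis` is a duration in milliseconds. A possible sketch, using three rows copied from the output above; the new column names `releaseYear` and `trackMinutes` are made up for this example.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "trackName": ["Get Back", "Here Comes the Sun", "Hey Jude"],
    "releaseDate": [
        "1969-04-11T12:00:00Z",
        "1969-09-26T12:00:00Z",
        "1968-08-26T12:00:00Z",
    ],
    "trackTimeMillis": [191773, 185707, 425653],
})

# parse the ISO 8601 strings into timezone-aware timestamps and keep the year
sample_df["releaseYear"] = pd.to_datetime(sample_df["releaseDate"], utc=True).dt.year
# convert milliseconds to minutes
sample_df["trackMinutes"] = sample_df["trackTimeMillis"] / 60_000

print(sample_df[["trackName", "releaseYear", "trackMinutes"]])
```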


In [20]:
# looking at the dataset above, we can drop some columns that do not
# contain information relevant to our analysis
songs_df = songs_df.drop(columns=[
    "artistId",
    "collectionId",  
    "collectionExplicitness", 
    "trackId", 
    "trackCensoredName",
    "collectionArtistId",
    "artworkUrl60",
    "artworkUrl100",
    "artworkUrl30",
    "trackViewUrl",
    "collectionArtistName",
    "collectionArtistViewUrl"])

display(songs_df)

Unnamed: 0,wrapperType,kind,artistName,collectionName,trackName,collectionCensoredName,artistViewUrl,collectionViewUrl,previewUrl,collectionPrice,trackPrice,releaseDate,trackExplicitness,discCount,discNumber,trackCount,trackNumber,trackTimeMillis,country,currency,primaryGenreName,isStreamable
0,track,song,The Beatles & Billy Preston,1,Get Back,1,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/get-back/1440...,https://audio-ssl.itunes.apple.com/itunes-asse...,12.99,1.29,1969-04-11T12:00:00Z,notExplicit,1,1,27,22,191773,USA,USD,Rock,True
1,track,song,The Beatles,Abbey Road (2019 Mix),Here Comes the Sun,Abbey Road (2019 Mix),https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/here-comes-th...,https://audio-ssl.itunes.apple.com/itunes-asse...,12.99,1.29,1969-09-26T12:00:00Z,notExplicit,1,1,17,7,185707,USA,USD,Rock,True
2,track,song,The Beatles,Rubber Soul,In My Life,Rubber Soul,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/in-my-life/14...,https://audio-ssl.itunes.apple.com/itunes-asse...,12.99,1.29,1965-12-03T12:00:00Z,notExplicit,1,1,14,11,146333,USA,USD,Rock,True
3,track,song,The Beatles,1,Let It Be,1,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/let-it-be/144...,https://audio-ssl.itunes.apple.com/itunes-asse...,12.99,1.29,1970-03-06T12:00:00Z,notExplicit,1,1,27,26,230440,USA,USD,Rock,True
4,track,song,The Beatles,1,Yesterday,1,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/yesterday/144...,https://audio-ssl.itunes.apple.com/itunes-asse...,12.99,1.29,1965-09-13T12:00:00Z,notExplicit,1,1,27,11,125320,USA,USD,Rock,True
5,track,song,The Beatles,1,Hey Jude,1,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/hey-jude/1440...,https://audio-ssl.itunes.apple.com/itunes-asse...,12.99,1.29,1968-08-26T12:00:00Z,notExplicit,1,1,27,21,425653,USA,USD,Rock,True
6,track,song,The Beatles,1,Eleanor Rigby,1,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/eleanor-rigby...,https://audio-ssl.itunes.apple.com/itunes-asse...,12.99,1.29,1966-08-05T12:00:00Z,notExplicit,1,1,27,16,125867,USA,USD,Rock,True
7,track,song,Paul McCartney,12-12-12 The Concert for Sandy Relief,Helter Skelter,12-12-12 The Concert for Sandy Relief,https://music.apple.com/us/artist/paul-mccartn...,https://music.apple.com/us/album/helter-skelte...,https://audio-ssl.itunes.apple.com/itunes-asse...,12.99,-1.0,2012-12-18T12:00:00Z,notExplicit,1,1,24,23,250293,USA,USD,Rock,False
8,track,song,The Beatles,1,I Want to Hold Your Hand,1,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/i-want-to-hol...,https://audio-ssl.itunes.apple.com/itunes-asse...,12.99,1.29,1963-11-29T12:00:00Z,notExplicit,1,1,27,4,145747,USA,USD,Rock,True
9,track,song,The Beatles,1,Help!,1,https://music.apple.com/us/artist/the-beatles/...,https://music.apple.com/us/album/help/14408330...,https://audio-ssl.itunes.apple.com/itunes-asse...,12.99,1.29,1965-07-19T12:00:00Z,notExplicit,1,1,27,10,139240,USA,USD,Rock,True
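
`DataFrame.drop` raises a `KeyError` if any listed column is missing, which could happen if a different search returns a slightly different set of fields. One hedged alternative is to pass `errors="ignore"`. The sketch uses a tiny hand-built frame; `unwanted` is a hypothetical list name.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "trackName": ["Get Back", "Yesterday"],
    "artistId": [136975, 136975],
    "trackPrice": [1.29, 1.29],
})

# "collectionArtistId" is deliberately not a column of sample_df
unwanted = ["artistId", "collectionArtistId"]

# errors="ignore" drops the columns that exist and skips the ones that do not
trimmed_df = sample_df.drop(columns=unwanted, errors="ignore")
print(list(trimmed_df.columns))
```

The trade-off is that a misspelled column name is also skipped silently, so this form suits exploratory work better than a strict pipeline.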


In [None]:
# Export the information to a csv file
songs_df.to_csv("songs_info.csv")
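
By default `to_csv` also writes the data frame's index as an unnamed first column, which is reported as `Unnamed: 0` when the file is read back with `pd.read_csv`. If that column is not wanted, `index=False` leaves it out. A small sketch that writes to in-memory buffers instead of a file, so it has no side effects:

```python
import io

import pandas as pd

sample_df = pd.DataFrame({
    "trackName": ["Get Back", "Yesterday"],
    "trackPrice": [1.29, 1.29],
})

# default behaviour: the index is written as an extra, unnamed column
with_index = io.StringIO()
sample_df.to_csv(with_index)
with_index.seek(0)
columns_with_index = list(pd.read_csv(with_index).columns)

# index=False writes only the data columns
without_index = io.StringIO()
sample_df.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = list(pd.read_csv(without_index).columns)

print(columns_with_index)
print(columns_without_index)
```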