# Lab 2a: Data Collection (cont.)

### Introduction:
In this lab, you'll be using Python and the Spotify API to find the 10 most popular tracks by your favorite artist on Spotify and export your results to a CSV file on your computer.

The steps you will take to do this are as follows:
1. Get an access token using your Spotify credentials.
2. Collect the data from the Spotify API using Python's `requests` library.
3. Transform the collected data into tabular form.
4. Save the data to a CSV file.

Run the following cell to get started.

In [None]:
import requests
import pandas as pd

from music1030.collection.spotify import SpotifyCrawler

**Task 0**: Create a project at https://developer.spotify.com/my-applications/ to get your client ID and client secret, then assign them to the corresponding variables below.

In [None]:
client_ID = '<YOUR_SPOTIFY_CLIENT_ID>'          # replace with your client ID
client_secret = '<YOUR_SPOTIFY_CLIENT_SECRET>'  # replace with your client secret
access_token = SpotifyCrawler(client_ID, client_secret).access_token
print(access_token)

We want to find your favorite artist's top 10 songs, but the [Spotify API's artist endpoints](https://developer.spotify.com/web-api/console/artists/) require us to know the artist's Spotify ID. Therefore, we need to look up that ID before we can make this query. Let's do this now via the [Spotify API's search endpoint](https://developer.spotify.com/web-api/console/get-search-item/).

**Task 1**: Use the form Spotify provides to search for your favorite artist by name. Limit your search to five results. Then, in the cell below, copy and paste the cURL command that is generated for your query.

You should get something that looks similar to this. If your request does not work, ask a TA for help.
   
<img src="https://i.imgur.com/Pgykpyg.png" />

The cURL command above is a bit hard to read, so let's break it down! cURL is a command-line tool (i.e., you run it from the terminal) for making HTTP requests.

The command above is sending a GET request to the **https://api.spotify.com/v1/search** endpoint. 

Within the Spotify URL, the parameter **q** corresponds to the keyword we're searching (Star-Spangled Banner), **type** corresponds to the type of query (track), and **limit** corresponds to the maximum number of results we want to receive (1).

Items following `-H` are HTTP headers. Headers are operating parameters sent along with your HTTP request. The **Accept** header specifies the Content-Types we expect from the response. In this case, we want data to be returned in the form of JSON, a common syntax for storing nested content. The **Authorization** header specifies the authentication credentials via the access token.
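
To make this breakdown concrete, here is a minimal sketch that splits a search URL into its endpoint and query parameters using Python's standard `urllib.parse` module. The URL is an assumption reconstructed from the parameters described above (it is not copied from the screenshot), and the token is a placeholder.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URL, reconstructed from the q/type/limit values described above.
url = "https://api.spotify.com/v1/search?q=Star-Spangled+Banner&type=track&limit=1"

parts = urlsplit(url)

# Everything before the "?" is the endpoint.
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"

# Everything after the "?" is a list of key=value pairs ("+" decodes to a space).
params = dict(parse_qsl(parts.query))

# Each -H flag in the cURL command maps onto one header name and value.
headers = {
    "Accept": "application/json",
    "Authorization": "Bearer <ACCESS_TOKEN>",  # placeholder, not a real token
}

print(endpoint)
print(params)
```

Note that every query parameter value comes back as a string, including `limit`.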

To see cURL in action, paste the cURL command from Task 1 into your terminal and run it. You should get a JSON string that looks like this:
<img src="https://i.imgur.com/JzDEa4u.png">

Now that you've learned a bit more about making HTTP requests through cURL, we're going to work on making HTTP requests through Python.

**Task 2**: Using the Python **requests** library, translate the cURL command you created to search for your artist into an equivalent Python command and store the HTTP response in a variable named **response**.

Please refer to the [quickstart guide for the Python requests library](http://docs.python-requests.org/en/master/user/quickstart/) to learn how to make HTTP requests and get the response in JSON format.

In [None]:
# HERE IS AN EXAMPLE, MODIFY IT TO MATCH YOUR FAVORITE ARTIST
query = 'Radiohead'   # your search string (e.g. Star-Spangled Banner)
item_type = 'artist'  # the type of item you are searching for (e.g. track)
limit = '5'           # maximum number of search results you want
search_endpoint_url = 'https://api.spotify.com/v1/search'
params = {
    'q': query,
    'type': item_type,
    'limit': limit,
}
headers = {
    'Authorization': 'Bearer {}'.format(access_token),
}

response = requests.get(search_endpoint_url, params=params, headers=headers)
print(response.text)

You should see the same five artists that were returned when you ran the cURL command in your terminal. Open the link under `external_urls` to confirm which search result is your artist. Your artist will most likely be the first one in the list.

JSON is easily converted to a Python dictionary, which allows us to pull out individual fields. Instead of storing the data in a string, use `response.json()` to store the data in a Python variable called `data`.

In [None]:
data = response.json()
print(data)
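
As a sketch of how nested JSON maps onto Python dictionaries and lists, the example below parses a small JSON string with the standard `json` module. The payload and its field names are made up for illustration and are not Spotify's schema; the same indexing style applies to the `data` variable above.

```python
import json

# A made-up JSON string; the field names are illustrative only.
raw = '{"library": {"books": [{"title": "Dune", "year": 1965}, {"title": "Emma", "year": 1815}]}}'

# JSON objects become dicts and JSON arrays become lists.
parsed = json.loads(raw)

# Chain keys and indexes to walk down into the nested structure.
first_book = parsed["library"]["books"][0]
title = first_book["title"]
year = first_book["year"]

print(title, year)
```

Printing `parsed.keys()` at each level is a handy way to discover which fields are available before indexing further.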

**Task 3**: Use the dictionary to access the artist ID and the artist name. Store them in variables called `artist_id` and `artist_name`.

In [None]:
# YOUR CODE HERE
print("artist_name: ", artist_name, "\nartist_id: ", artist_id)

Now that you've gotten your favorite artist's ID, we can use it to make more interesting API requests.

**Task 4**: Using the Python **requests** library, write a query to get your favorite artist's top 10 tracks. Make the API request, then store the track names in a new list variable named `track_list` and the corresponding album names in a new list variable named `album_list`.

Note: all the endpoints are described [here](https://developer.spotify.com/web-api/console/).

The format you will use is very similar to the one above.

In [None]:
track_list = ...
album_list = ...
endpoint_url = 'https://api.spotify.com/v1/' + ...
country = 'US'
params2 = {'country': country}

# YOUR CODE HERE
print(track_list)
print(album_list)

Great! You've found your favorite artist's top tracks in JSON. Although JSON is a great format for storing and exchanging data, it is hard for a person to read. A better format for displaying this data is CSV, which lays out information in tabular form (like a spreadsheet).
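
To illustrate the tabular layout, here is a minimal sketch that writes a few records as CSV text using Python's built-in `csv` module. It deliberately uses made-up records and the standard library rather than pandas, so it only shows what the format looks like.

```python
import csv
import io

# Made-up records standing in for parsed JSON; not real Spotify data.
records = [
    {"city": "Providence", "state": "RI"},
    {"city": "Boston", "state": "MA"},
]

# Write to an in-memory buffer instead of a file so the result can be inspected.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "state"], lineterminator="\n")
writer.writeheader()       # first row: the column names
writer.writerows(records)  # one row per record

csv_text = buffer.getvalue()
print(csv_text)
```

Each record becomes one line and each field becomes one comma-separated column, which is why spreadsheet programs can open CSV files directly.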

**Task 5**: Using **pandas**, create a DataFrame named `df` with three columns: `Artist`, `Track`, and `Album`. Then save your data to a CSV file called `Artist_Tracks.csv`.

In [None]:
# YOUR CODE HERE
print(pd.read_csv('Artist_Tracks.csv'))

**Task 6**: As a sanity check, correct the code below so that it uses regular expressions to count the number of times your artist's name appears. The artist name should appear at least 10 times.

In [None]:
import re

pattern = re.compile(<YOUR_PATTERN_HERE>)

for index, name in df[<YOUR_ARTIST_COLUMN>].iteritems():
    match =  pattern.match(name)
    if match:
        print(matches.group())

In general, to use regular expressions in Python, you would do the following:

1. [Compile](https://docs.python.org/3/library/re.html#re.compile) your pattern
    
    - `pattern = re.compile(...)`
    
2. [Match](https://docs.python.org/3/library/re.html#re.regex.match) your string to the pattern
    
    - `match = pattern.match(...)`
    
3. Extract your [groups](https://docs.python.org/3/library/re.html#re.match.group)
    
    - `match.group(0)`
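
The three steps above can be sketched as follows. The pattern and the input strings are made up for illustration and are unrelated to the lab data.

```python
import re

# Step 1: compile the pattern once so it can be reused.
pattern = re.compile(r"(\w+)-(\d+)")

# Step 2: match() only succeeds if the pattern matches at the START of the string.
match = pattern.match("lab-2 is due Friday")
no_match = pattern.match("today: lab-2")  # pattern is not at the start, so this is None

# Step 3: extract groups. group(0) is the whole match; group(1), group(2), ...
# are the parenthesized sub-patterns, numbered from left to right.
if match:
    whole, name, number = match.group(0), match.group(1), match.group(2)
    print(whole, name, number)
```

Because `match()` returns `None` when nothing matches, check the result before calling `group()`. To find a pattern anywhere in a string rather than only at the start, `pattern.search(...)` is the usual alternative.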

## Summary

### API requests in Python:

1. `r = requests.get()`
2. `content = r.json()`
3. `content[FIELD_1][FIELD_2]...`

### Regular expressions in Python

1. `p = re.compile()`
2. `m = p.match()`
3. `m.group()`