# Batch Data Processing Using an API

### Project: Extracting Data from the Spotify API

In this project, I will demonstrate how to interact with the **Spotify API** to extract data in a batch process. This involves understanding key concepts like **pagination** and handling API requests that require **authorization**. By the end of this project, you will have a clear understanding of how to efficiently retrieve large datasets from APIs and process them for further analysis.

#### Key Objectives:
1. **Interacting with the Spotify API**: I will walk you through the steps to connect to the Spotify API, including setting up authentication and making authorized requests.
2. **Batch Data Extraction**: I will show you how to extract data in batches, which is essential for handling large datasets that cannot be retrieved in a single request.
3. **Understanding Pagination**: Pagination is a common technique used by APIs to manage large datasets. I will explain how it works and demonstrate how to handle paginated responses to retrieve all available data.
4. **Data Preparation**: Once the data is extracted, I will guide you through the process of cleaning and organizing it for further analysis or integration into a data pipeline.

This project is designed to provide hands-on experience with real-world API interactions, which are a critical part of modern data engineering workflows. Let’s dive in!

# Table of Contents

- [ 1 - Create a Spotify APP](#1)
- [ 2 - Understand the Basics of APIs](#2)
  - [ 2.1 - Get Token](#2-1)
  - [ 2.2 - Get New Releases](#2-2)
    - [ Exercise 1](#ex01)
  - [ 2.3 - Pagination](#2-3)
    - [ Exercise 2](#ex02)
    - [ Exercise 3](#ex03)
  - [ 2.4 - Optional - API Rate Limits](#2-4)
- [ 3 - Batch pipeline](#3)
  - [ Exercise 4](#ex04)
  - [ Exercise 5](#ex05)
  - [ Exercise 6](#ex06)
- [ 4 - Optional - Spotipy SDK](#4)
  - [ Exercise 7](#ex07)

<a id='1'></a>
## 1 - Creating a Spotify App

To access Spotify's API resources, I first needed to create a Spotify account. If you don’t already have one, a trial account is sufficient to complete this project. Here’s how I set up the Spotify App:

**Note:** Since I am executing this project using AWS Platform, I am sharing some basic credentials that may be dummy or else. You can contact me if get stuck anywhere.

### Steps to Create a Spotify App:
1. **Create a Spotify Developer Account**:
   - I visited [Spotify for Developers](https://developer.spotify.com/), created an account, and logged in.
   
2. **Access the Dashboard**:
   - After logging in, I clicked on my account name in the top-right corner and selected **Dashboard**.

3. **Create a New App**:
   - I created a new app with the following details:
     - **App Name**: `xyz-spotify-app`
     - **App Description**: `Spotify app to test the API`
     - **Website**: Left empty
     - **Redirect URIs**: `http://localhost:3000`
     - **API to Use**: Selected `Web API`
   - I then clicked the **Save** button. If you encounter an error stating that your account is not ready, you can log out, wait a few minutes, and repeat steps 2-4.

4. **Retrieve Client ID and Client Secret**:
   - On the App Home page, I clicked on **Settings** and revealed the `Client ID` and `Client Secret`.
   - I stored these credentials in the `src/env` file provided in this project. Make sure to save the `src/env` file using `Ctrl + S` (Windows) or `Cmd + S` (Mac).

### Spotify API Documentation
For reference, I used the [Spotify API documentation](https://developer.spotify.com/documentation/web-api/tutorials/getting-started) throughout this project. The documentation provides detailed information about the API endpoints and how to interact with them.

### Key API Resources Used:
- **New Album Releases**: I interacted with the [New Releases endpoint](https://developer.spotify.com/documentation/web-api/reference/get-new-releases) to retrieve the latest albums.
- **Album Tracks**: In the second part of the project, I used the [Album Tracks endpoint](https://developer.spotify.com/documentation/web-api/reference/get-an-albums-tracks) to fetch tracks from specific albums.

This setup was crucial for accessing Spotify's API and retrieving the data needed for the project. Let’s move on to the next steps!

<a id='2'></a>
## 2 - Understanding the Basics of APIs

In this project, I explored how to interact with APIs using Python. Several Python packages allow you to request data from an API, and for this project, I used the `requests` package. The `requests` library is a popular and versatile tool for performing HTTP requests, making it simple and intuitive to interact with web services and APIs.

### Why Use the `requests` Package?
- **Ease of Use**: The `requests` library provides a straightforward way to send HTTP requests and handle responses.
- **Versatility**: It supports various HTTP methods like GET, POST, PUT, DELETE, etc., making it suitable for interacting with a wide range of APIs.
- **Community Support**: Being one of the most widely used libraries, it has extensive documentation and community support.

### Loading the Required Packages
To get started, I loaded the necessary Python packages. Here’s how I did it:



In [1]:
import os
from typing import Dict, Any, Callable
from dotenv import load_dotenv
import json
import requests 


<a id='2-1'></a>
### 2.1 - Generating an Access Token

The first step in working with any API is understanding its authentication process. For the Spotify API, this involves using the **Client ID** and **Client Secret** generated by the Spotify App to create an **access token**. The access token is a string that contains credentials and permissions required to access specific resources. You can learn more about this process in the [Spotify API documentation](https://developer.spotify.com/documentation/web-api/concepts/access-token).

#### Importance of Understanding API Authentication
Since each API is designed with a specific purpose, it’s essential to read and understand its nuances to ensure responsible and secure access to data. Throughout this project, I’ll provide links to the relevant documentation, and I encourage you to explore them in detail. (During the project, you can quickly skim through the links, but you can always revisit them later for a deeper understanding.)

#### Setting Up Client ID and Client Secret
To generate the access token, I first created variables to store the `client_id` and `client_secret` values that were saved in the `src/env` file. These credentials are essential for authenticating with the Spotify API and generating the access token.

### Next Steps
With the `client_id` and `client_secret` set up, I’ll now walk you through the process of generating the access token and making authenticated API requests.

In [2]:
load_dotenv('./src/env', override=True)

CLIENT_ID = os.getenv('CLIENT_ID')
CLIENT_SECRET = os.getenv('CLIENT_SECRET')

In [3]:
print(CLIENT_ID)
print(CLIENT_SECRET)

fabf791fc1294f2e947d3075fef9b64e
730d672f5a3145299749a453d45d3b5a


The `get_token` function below takes a Client ID, Client secret and a URL as input, and performs a POST request to that URL to obtain an access token using the client credentials. Run the following cell to get the access token.

In [4]:
def get_token(client_id: str, client_secret: str, url: str) -> Dict[Any, Any]:
    """Allows to perform a POST request to obtain an access token 

    Args:
        client_id (str): App client id
        client_secret (str): App client secret
        url (str): URL to perform the post request

    Returns:
        Dict[Any, Any]: Dictionary containing the access token
    """
        
    headers = {        
        "Content-Type": "application/x-www-form-urlencoded"            
    }
    
    payload = {
                "grant_type": "client_credentials", 
                "client_id": client_id, 
                "client_secret": client_secret
               }
    
    try: 
        response = requests.post(url=url, headers=headers, data=payload)
        print(type(response))
        response.raise_for_status()
        response_json = json.loads(response.content)
        
        return response_json
        
    except Exception as err:
        print(f"Error: {err}")
        return {}

URL_TOKEN="https://accounts.spotify.com/api/token"
token = get_token(client_id=CLIENT_ID, client_secret=CLIENT_SECRET, url=URL_TOKEN)

print(token)

<class 'requests.models.Response'>
{'access_token': 'BQCLo-JrmAhJ9rAQ0CkUlnAmeN3YYL8_3E1I1TVwzLGCwN8cN2ztWAZutp1S8plTO94RoCXFYOpsRu8Cjgg_j3FGVFZ6ay0awt6WczvPF2b6LywJ7agp_Ir1fNe_hXQ8oOPVz2O32fs', 'token_type': 'Bearer', 'expires_in': 3600}


In [5]:
print(token)
print(type(token))
print(token["access_token"])
my_access_token = token["access_token"]

{'access_token': 'BQCLo-JrmAhJ9rAQ0CkUlnAmeN3YYL8_3E1I1TVwzLGCwN8cN2ztWAZutp1S8plTO94RoCXFYOpsRu8Cjgg_j3FGVFZ6ay0awt6WczvPF2b6LywJ7agp_Ir1fNe_hXQ8oOPVz2O32fs', 'token_type': 'Bearer', 'expires_in': 3600}
<class 'dict'>
BQCLo-JrmAhJ9rAQ0CkUlnAmeN3YYL8_3E1I1TVwzLGCwN8cN2ztWAZutp1S8plTO94RoCXFYOpsRu8Cjgg_j3FGVFZ6ay0awt6WczvPF2b6LywJ7agp_Ir1fNe_hXQ8oOPVz2O32fs


### Understanding the Temporary Access Token

In this section, I’ll explain how the temporary access token works and how to use it effectively when interacting with the Spotify API.

#### Key Details About the Access Token:
- **Temporary Nature**: The access token provided by Spotify is temporary. The `expires_in` field in the response indicates the duration of the token in seconds. Once the token expires, any API requests made with it will fail.
- **Handling Expired Tokens**: When the token expires, the API will return an error object with a **status code of 401**, which means the request is unauthorized. To continue making requests, you’ll need to generate a new access token.

#### Including the Access Token in API Requests
To interact with the Spotify API, every request must include the access token as part of the **authorization header**. The header must follow a specific format, which I’ll explain below.

#### Using the `get_auth_header` Function
To simplify the process of creating the authorization header, I used a helper function called `get_auth_header`. This function takes the access token as input and returns the properly formatted authorization header. This header can then be included in your API requests.

#### Next Steps
Before proceeding, make sure to declare the `get_auth_header` function, as it will be used throughout this project. Running the provided code cell will set up this function for use in subsequent steps.

In [6]:
def get_auth_header(access_token: str) -> Dict[str, str]:
    return {"Authorization": f"Bearer {access_token}"}

Now, let's use the token to perform a request to access the first resource, which is the [new releases](https://developer.spotify.com/documentation/web-api/reference/get-new-releases).

<a id='2-2'></a>
### 2.2 - Get New Releases

<a id='task01'></a>
### Task 1: Retrieving New Album Releases

In this task, I implemented the `get_new_releases` function to fetch the latest album releases from the Spotify API. This involved authenticating the request, making the API call, and processing the response.

#### Steps to Implement the Function:

1. **Generate the Authorization Header**:
   - I used the `get_auth_header` function to create the authorization header. This function takes the `access_token` (provided as input to `get_new_releases`) and returns the properly formatted header.
   - The output was stored in a variable called `headers`, which was later used to authenticate the API request.

2. **Make the API Request**:
   - I used the `request_url` variable, which contains the URL for the Spotify API endpoint, to send a `GET` request.
   - The `headers` variable was included in the request to ensure proper authentication.

3. **Process the API Response**:
   - The response from the API is an object of type `requests.models.Response`. This object includes a `json()` method, which converts the response content into a Python dictionary or JSON object.
   - I applied the `json()` method to the `response` object to extract and return the data in a usable format.

#### Using the `URL_NEW_RELEASES` Endpoint
To fetch new album releases, I utilized the `URL_NEW_RELEASES` endpoint. This required passing the `access_token` value from the `token` object that I obtained earlier.

#### Outcome
By completing this task, I successfully implemented a function to retrieve and process new album releases from the Spotify API. This function will be used in subsequent steps to analyze and work with the retrieved data.

In [7]:
def get_new_releases(url: str, access_token: str, offset: int=0, limit: int=20, next: str="") -> Dict[Any, Any]:
    """Perform get() request to new releases endpoint

    Args:
        url (str): Base url for the request
        access_token (str): Access token
        offset (int, optional): Page offset for pagination. Defaults to 0.
        limit (int, optional): Number of elements per page. Defaults to 20.
        next (str, optional): Next URL to perform next request. Defaults to "".

    Returns:
        Dict[Any, Any]: Request response
    """

    if next == "":        
        request_url = f"{url}?offset={offset}&limit={limit}"
    else: 
        request_url = f"{next}"

    ### START CODE HERE ### (~ 4 lines of code)
    # Call get_auth_header() function and pass the access token.
    headers = get_auth_header(access_token=my_access_token)
    
    try: 
        # Perform a get() request using the request_url and headers.
        response = requests.get(url=request_url, headers=headers)
        # Use json() method over the response to return it as Python dictionary.
        return response.json()
    ### END CODE HERE ###
    
    except Exception as err:
        print(f"Error requesting data: {err}")
        return {'error': err}
        
URL_NEW_RELEASES = "https://api.spotify.com/v1/browse/new-releases"

# Note: the `access_token` value from the dictionary `token` can be retrieved either using `get()` method or dictionary syntax `token['access_token']`
releases_response = get_new_releases(url=URL_NEW_RELEASES, access_token=token.get('access_token'))

The result you get is a JSON object that was transformed into a python dictionary. You can explore the structure of the response you get:

In [8]:
print(releases_response.get("albums").get("href"))
releases_response.keys()
#releases_response["albums"]
releases_response.get("albums").get("items")



https://api.spotify.com/v1/browse/new-releases?offset=0&limit=20


[{'album_type': 'album',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/06HL4z0CvFAxyc27GXpf02'},
    'href': 'https://api.spotify.com/v1/artists/06HL4z0CvFAxyc27GXpf02',
    'id': '06HL4z0CvFAxyc27GXpf02',
    'name': 'Taylor Swift',
    'type': 'artist',
    'uri': 'spotify:artist:06HL4z0CvFAxyc27GXpf02'}],
  'available_markets': ['AR',
   'AU',
   'AT',
   'BE',
   'BO',
   'BR',
   'BG',
   'CA',
   'CL',
   'CO',
   'CR',
   'CY',
   'CZ',
   'DK',
   'DO',
   'DE',
   'EC',
   'EE',
   'SV',
   'FI',
   'FR',
   'GR',
   'GT',
   'HN',
   'HK',
   'HU',
   'IS',
   'IE',
   'IT',
   'LV',
   'LT',
   'LU',
   'MY',
   'MT',
   'MX',
   'NL',
   'NZ',
   'NI',
   'NO',
   'PA',
   'PY',
   'PE',
   'PH',
   'PL',
   'PT',
   'SG',
   'SK',
   'ES',
   'SE',
   'CH',
   'TW',
   'TR',
   'UY',
   'US',
   'GB',
   'AD',
   'LI',
   'MC',
   'ID',
   'JP',
   'TH',
   'VN',
   'RO',
   'IL',
   'ZA',
   'SA',
   'AE',
   'BH',
   'QA',
   'OM',
   'KW',

In [9]:
releases_response.get('albums').keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

Each API manages responses in its own way so it is highly recommended to read the documentation and understand the nuances behind the API endpoints you are working with. In this case, you see some fields such as `'href'` under the `'albums'` field, which tells you the URL used for the request you just sent.

In [10]:
releases_response.get('albums').get('next')

'https://api.spotify.com/v1/browse/new-releases?offset=20&limit=20'

You can see that there are two parameters: `offset` and `limit` that were added to the endpoint. Those parameters are the base of pagination in this API endpoint. We will take a look at them later. 

You can also explore the returned items using the `'items'` field under `'albums'`. This will return a list of items, you can take a look at the number of items returned:

In [11]:
len(releases_response.get('albums').get('items'))

20

Explore the items:

In [12]:
releases_response.get('albums').get('items')[0]

{'album_type': 'album',
 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/06HL4z0CvFAxyc27GXpf02'},
   'href': 'https://api.spotify.com/v1/artists/06HL4z0CvFAxyc27GXpf02',
   'id': '06HL4z0CvFAxyc27GXpf02',
   'name': 'Taylor Swift',
   'type': 'artist',
   'uri': 'spotify:artist:06HL4z0CvFAxyc27GXpf02'}],
 'available_markets': ['AR',
  'AU',
  'AT',
  'BE',
  'BO',
  'BR',
  'BG',
  'CA',
  'CL',
  'CO',
  'CR',
  'CY',
  'CZ',
  'DK',
  'DO',
  'DE',
  'EC',
  'EE',
  'SV',
  'FI',
  'FR',
  'GR',
  'GT',
  'HN',
  'HK',
  'HU',
  'IS',
  'IE',
  'IT',
  'LV',
  'LT',
  'LU',
  'MY',
  'MT',
  'MX',
  'NL',
  'NZ',
  'NI',
  'NO',
  'PA',
  'PY',
  'PE',
  'PH',
  'PL',
  'PT',
  'SG',
  'SK',
  'ES',
  'SE',
  'CH',
  'TW',
  'TR',
  'UY',
  'US',
  'GB',
  'AD',
  'LI',
  'MC',
  'ID',
  'JP',
  'TH',
  'VN',
  'RO',
  'IL',
  'ZA',
  'SA',
  'AE',
  'BH',
  'QA',
  'OM',
  'KW',
  'EG',
  'MA',
  'DZ',
  'TN',
  'LB',
  'JO',
  'PS',
  'IN',
  'KZ',
  'MD

<a id='2-3'></a>
### 2.3 - Pagination

If you print `releases_response`, you can see the following fields:

```json
{
...,
'limit': 20,
'next': 'https://api.spotify.com/v1/browse/new-releases?offset=20&limit=20',
'offset': 0,
'previous': None,
'total': 100
}
```

Although there is a total of 100 available items to be returned, only 20 were returned. This is established by the `limit` parameter and those were the 20 items you just counted before. This limit on the number of elements returned is a common feature of several APIs and although in some cases you can modify such a limit, a good practice is to use it with **pagination** to get all the elements that can be returned. 

Each API handles pagination differently. For Spotify, the requests response provides you with two fields that allow you to query the different pages of your request: `previous` and `next`. These two fields will return the URL to the previous or next page respectively and they are based on the `offset` and `limit` parameters. In this case, there are two ways for you to explore the rest of the data:

- you can use the value from the next parameter to get the direct URL for the next page of requests, or 
- you can build the URL for the next page from scratch using the offset and limit parameters (make sure to update the offset parameter for the request). 

For the sake of learning, you will use method 2 to build the URL yourself. Then you will also compare it with the result from using the first method just to check that you created the URL correctly.

Before creating a function that will allow you to paginate, let's try to do it manually. If you compare the URLs provided by the `href` and `next` fields, you can see that while the `limit` parameter remains the same, the `offset` parameter has increased with the same value as the one stored in `limit`.

```json
{
...,
'href': 'https://api.spotify.com/v1/browse/new-releases?offset=0&limit=20',
...,
'next': 'https://api.spotify.com/v1/browse/new-releases?offset=20&limit=20',
...
}
```

So for our next call, let's pass 20 to `offset` and keep `limit` as 20:

In [13]:
next_releases_response = get_new_releases(url=URL_NEW_RELEASES, access_token=token.get('access_token'), offset=20, limit=20)

Check the values for `href` and `next` in the new response `next_releases_response`:

In [14]:
next_releases_response.get('albums').get('href')

'https://api.spotify.com/v1/browse/new-releases?offset=20&limit=20'

In [39]:
next_releases_response.get('albums').get("items")

[{'album_type': 'ep',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7q7IUe2AqtifSZ2q52kHFc'},
    'href': 'https://api.spotify.com/v1/artists/7q7IUe2AqtifSZ2q52kHFc',
    'id': '7q7IUe2AqtifSZ2q52kHFc',
    'name': 'Forest Blakk',
    'type': 'artist',
    'uri': 'spotify:artist:7q7IUe2AqtifSZ2q52kHFc'}],
  'available_markets': ['AR',
   'AU',
   'AT',
   'BE',
   'BO',
   'BR',
   'BG',
   'CA',
   'CL',
   'CO',
   'CR',
   'CY',
   'CZ',
   'DK',
   'DO',
   'DE',
   'EC',
   'EE',
   'SV',
   'FI',
   'FR',
   'GR',
   'GT',
   'HN',
   'HK',
   'HU',
   'IS',
   'IE',
   'IT',
   'LV',
   'LT',
   'LU',
   'MY',
   'MT',
   'MX',
   'NL',
   'NZ',
   'NI',
   'NO',
   'PA',
   'PY',
   'PE',
   'PH',
   'PL',
   'PT',
   'SG',
   'SK',
   'ES',
   'SE',
   'CH',
   'TW',
   'TR',
   'UY',
   'US',
   'GB',
   'AD',
   'LI',
   'MC',
   'ID',
   'JP',
   'TH',
   'VN',
   'RO',
   'IL',
   'ZA',
   'SA',
   'AE',
   'BH',
   'QA',
   'OM',
   'KW',
  

In [15]:
next_releases_response.get('albums').get('next')

'https://api.spotify.com/v1/browse/new-releases?offset=40&limit=20'

Given these results, you can see that the `offset` increases by the value of the `limit`. As the responses show that the `total` value is 100, this means that you can access the last page of responses by using an `offset` of 80, while keeping the `limit` value as 20.

In [16]:
last_releases_response = get_new_releases(url=URL_NEW_RELEASES, access_token=token.get('access_token'), offset=80, limit=20)

In [17]:
print(last_releases_response.get('albums').get('previous'))
print(last_releases_response.get('albums').get('next'))

https://api.spotify.com/v1/browse/new-releases?offset=60&limit=20
None


You can see that the value of the `next` field is `None`, indicating that you reached the last page. On the other hand, you can see that `previous` contains the URL to request the data from the previous page, so you can even go backward if required.

<a id='task02'></a>
### Task 2: Implementing Pagination for API Requests

In this task, I created a new function to handle pagination for API requests. This function builds on the `get_new_releases` function and ensures that all data is retrieved, even if it spans multiple pages.

#### Steps to Implement the Function:

1. **Define the Function**:
   - The function requires a callable (`endpoint_request`) that performs the API call to fetch new album releases. This callable is passed as an argument to the function.

2. **Set Up Initial Parameters**:
   - Before entering the `while` loop, I created a dictionary named `kwargs` with the following keys:
     - `'url'`: The URL for the API request, passed as a parameter to the function.
     - `'access_token'`: The access token, also passed as a parameter to the function.
     - `'offset'`: The starting point for the paginated request.
     - `'limit'`: The maximum number of elements to retrieve per page.

3. **Make the Initial API Request**:
   - I called the `endpoint_request()` function using the keyword arguments specified in the `kwargs` dictionary. The response was stored in the `response` variable.

4. **Store the Initial Response**:
   - I extended the `responses` list with the album `items` from the `response`. This list will accumulate all retrieved data across multiple pages.

5. **Determine the Total Number of Elements**:
   - I created a variable named `total_elements` to store the total number of elements available. This value was extracted from the `response` object under the `'albums'` field, which contains a `'total'` field indicating the total number of elements. For reference, I consulted the [Spotify API documentation](https://developer.spotify.com/documentation/web-api/reference/get-new-releases).

6. **Set Up the Pagination Loop**:
   - I used a `while` loop to continue fetching data as long as the `offset` value was smaller than `total_elements`.

7. **Process Subsequent Pages**:
   - Inside the `while` loop, I performed the following steps:
     - Updated the `offset` value by adding the current `offset` to the `limit` value.
     - Recreated the `kwargs` dictionary with the updated `offset` value.
     - Repeated the API request using `endpoint_request()` and extended the `responses` list with the new `items`.

#### Outcome
By completing this task, I implemented a robust function to handle paginated API requests. This ensures that all data is retrieved, regardless of how many pages it spans, and prepares the data for further processing and analysis.

In [18]:
def paginated_new_releases(endpoint_request: Callable, url: str, access_token: str, offset: int=0, limit: int=20) -> list:
    """Allows to perform pagination over and API request done by the endpoint_request function

    Args:
        endpoint_request (Callable): Function that performs the API Calls
        url (str): Endpoint's URL for the request
        access_token (str): Access token
        offset (int, optional): Offset of the page's request. Defaults to 0.
        limit (int, optional): Limit of the page's request. Defaults to 20.

    Returns:
        list: List with the requested items
    """
    
    responses = []
    
    ### START CODE HERE ### (~ 19 lines of code)
    # Create a dictionary named kwargs with the values corresponding to the keys url, token, offset, limit
    kwargs = { 
        
            "url": "https://api.spotify.com/v1/browse/new-releases",
            "access_token": my_access_token,
            "offset": 20,
            "limit": 20,
        } 

    
    # Call the endpoint_request() function with the arguments specified in the kwargs dictionary.
    response = endpoint_request(**kwargs)
    # Use extend() method to add the albums' items to the list of responses.
    responses.extend(response.get('albums').get('items'))
    # Get the total number of the elements in albums and save it in the variable total_elements.
    total_elements = response.get('albums').get('total')

    # Run the loop as long as the offset value is smaller than total_elements.
    while offset < total_elements:
        # Update the offset value with the current value from the request you did plus the limit value.
        offset = response.get('albums').get('offset') + offset
        # Repeat the definition of the kwargs dictionary with the same parameters (with the new offset value).
        kwargs = { 
            
            "url": "https://api.spotify.com/v1/browse/new-releases",
            "access_token": my_access_token,
            "offset": 20,
            "limit": 40,
        } 
        
        # Call the endpoint_request() function with the arguments specified in the kwargs dictionary.
        response = endpoint_request(**kwargs)
         # Use extend() method to add the albums' items to the list of responses.
        responses.extend(response.get('albums').get('items'))
    ### END CODE HERE ###
        
        print(f"Finished iteration for page with offset: {offset-limit}")

    return responses

Now, execute the `paginated_new_releases` with the function `get_new_releases` as the `endpoint_request` callable parameter. Use the same URL used in the previous `get_new_releases` call, as well as the access token. Set the initial `offset` as 0. For the limit, the default value is 20 but you can play with other values if you want.

In [19]:
responses = paginated_new_releases(endpoint_request=get_new_releases,
                                   url=URL_NEW_RELEASES, 
                                   access_token=token.get('access_token'), 
                                   offset=0, limit=20)

Finished iteration for page with offset: 0
Finished iteration for page with offset: 20
Finished iteration for page with offset: 40
Finished iteration for page with offset: 60
Finished iteration for page with offset: 80


##### __Expected Output__ 
```text
Finished iteration for page with offset: 0
Finished iteration for page with offset: 20
Finished iteration for page with offset: 40
Finished iteration for page with offset: 60
Finished iteration for page with offset: 80
```

Have a look at one of the item:

In [20]:
responses[0]


{'album_type': 'ep',
 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7q7IUe2AqtifSZ2q52kHFc'},
   'href': 'https://api.spotify.com/v1/artists/7q7IUe2AqtifSZ2q52kHFc',
   'id': '7q7IUe2AqtifSZ2q52kHFc',
   'name': 'Forest Blakk',
   'type': 'artist',
   'uri': 'spotify:artist:7q7IUe2AqtifSZ2q52kHFc'}],
 'available_markets': ['AR',
  'AU',
  'AT',
  'BE',
  'BO',
  'BR',
  'BG',
  'CA',
  'CL',
  'CO',
  'CR',
  'CY',
  'CZ',
  'DK',
  'DO',
  'DE',
  'EC',
  'EE',
  'SV',
  'FI',
  'FR',
  'GR',
  'GT',
  'HN',
  'HK',
  'HU',
  'IS',
  'IE',
  'IT',
  'LV',
  'LT',
  'LU',
  'MY',
  'MT',
  'MX',
  'NL',
  'NZ',
  'NI',
  'NO',
  'PA',
  'PY',
  'PE',
  'PH',
  'PL',
  'PT',
  'SG',
  'SK',
  'ES',
  'SE',
  'CH',
  'TW',
  'TR',
  'UY',
  'US',
  'GB',
  'AD',
  'LI',
  'MC',
  'ID',
  'JP',
  'TH',
  'VN',
  'RO',
  'IL',
  'ZA',
  'SA',
  'AE',
  'BH',
  'QA',
  'OM',
  'KW',
  'EG',
  'MA',
  'DZ',
  'TN',
  'LB',
  'JO',
  'PS',
  'IN',
  'KZ',
  'MD',


In [21]:
print(type(responses[0]))

<class 'dict'>


In [36]:
print(responses[0].keys())

dict_keys(['album_type', 'artists', 'available_markets', 'external_urls', 'href', 'id', 'images', 'name', 'release_date', 'release_date_precision', 'total_tracks', 'type', 'uri'])


You can check the `responses` variable to see if all the elements were downloaded successfully.

In [22]:
len(responses)

220

With the `paginated_new_releases` function that you created, you are now able to get all 100 available items.

<a id='task03'></a>
### Task 3: Implementing Pagination Using the `next` Parameter

In this task, I created a new function to handle pagination using the `next` parameter. This approach provides an alternative to the `offset` and `limit` parameters used in the previous task. The goal was to compare the results from both methods to ensure consistency and accuracy.

#### Steps to Implement the Function:

1. **Define the Function**:
   - The function uses the `next` parameter to handle pagination, making it more dynamic compared to the `offset` and `limit` approach.

2. **Set Up Initial Parameters**:
   - I created a dictionary named `kwargs` with the following keys:
     - `'url'`: The URL for the API request, passed as a parameter to the function.
     - `'access_token'`: The access token, also passed as a parameter to the function.
     - `'next'`: Initially set as an empty string for the first API call. This key will later store the URL for the next page of results.

3. **Process the API Requests**:
   - Inside the `while` loop, I performed the following steps:
     1. **Make the API Request**:
        - I called the `endpoint_request()` function using the keyword arguments specified in the `kwargs` dictionary. The response was stored in the `response` variable.
     2. **Store the Response Data**:
        - I extended the `responses` list with the album `items` from the `response`. This list accumulates all retrieved data across multiple pages.
     3. **Retrieve the Next Page URL**:
        - I reassigned the `next_page` variable to the value of `'next'` from the `response["albums"]` dictionary. This URL points to the next page of results.
        - For reference, I consulted the [Spotify API documentation](https://developer.spotify.com/documentation/web-api/reference/get-new-releases) to understand the response structure.
     4. **Update the `kwargs` Dictionary**:
        - I updated the `kwargs` dictionary by setting the `'next'` key to the `next_page` URL. This ensures the next API call fetches the subsequent page of results.

#### Outcome
By completing this task, I implemented a function that uses the `next` parameter for pagination. This approach dynamically retrieves all pages of data without requiring manual calculation of `offset` and `limit`. Comparing the results with the previous method confirmed the accuracy and consistency of both approaches.

#### Next Steps
With both pagination methods implemented, I can now analyze and compare their performance and usability in different scenarios.

In [47]:
def paginated_with_next_new_releases(endpoint_request: Callable, url: str, access_token: str) -> list:
    """Manages pagination for API requests done with the endpoint_request callable

    Args:
        endpoint_request (Callable): Function that performs API request
        url (str): Base URL for the request
        access_token (str): Access token

    Returns:
        list: Responses stored in a list
    """
    responses = []
        
    next_page = url
    
    kwargs = {
            "url": url,
            "access_token": access_token,
            "next": ""
        }
    
    while next_page:
        
        ### START CODE HERE ### (~ 4 lines of code)
        # Call the endpoint_request() function with the arguments specified in the kwargs dictionary.
        response = endpoint_request(**kwargs)
        # Use extend() method to add the albums' items to the list of responses.
        responses.extend(response.get('albums' , {}).get('items' , []))
        # Reassign the value of next_page as the 'next' value from the response["albums"] dictionary.
        next_page = response["albums"]["next"]
        print(next_page)
        # Update the kwargs dictionary: set the value of the key 'next' as the variable next_page.
        kwargs["next"] = next_page
        ### END CODE HERE ###
        
        print(f"Executed request with URL: {response.get('albums').get('href')}.")
                
    return responses
    

Now, perform the new paginated call:

In [48]:
responses_with_next = paginated_with_next_new_releases(endpoint_request=get_new_releases, 
                                                             url=URL_NEW_RELEASES, 
                                                             access_token=token.get('access_token'))

https://api.spotify.com/v1/browse/new-releases?offset=20&limit=20
Executed request with URL: https://api.spotify.com/v1/browse/new-releases?offset=0&limit=20.
https://api.spotify.com/v1/browse/new-releases?offset=40&limit=20
Executed request with URL: https://api.spotify.com/v1/browse/new-releases?offset=20&limit=20.
https://api.spotify.com/v1/browse/new-releases?offset=60&limit=20
Executed request with URL: https://api.spotify.com/v1/browse/new-releases?offset=40&limit=20.
https://api.spotify.com/v1/browse/new-releases?offset=80&limit=20
Executed request with URL: https://api.spotify.com/v1/browse/new-releases?offset=60&limit=20.
None
Executed request with URL: https://api.spotify.com/v1/browse/new-releases?offset=80&limit=20.


In [49]:
print(responses_with_next[0])

{'album_type': 'album', 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/06HL4z0CvFAxyc27GXpf02'}, 'href': 'https://api.spotify.com/v1/artists/06HL4z0CvFAxyc27GXpf02', 'id': '06HL4z0CvFAxyc27GXpf02', 'name': 'Taylor Swift', 'type': 'artist', 'uri': 'spotify:artist:06HL4z0CvFAxyc27GXpf02'}], 'available_markets': ['AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA', 'CL', 'CO', 'CR', 'CY', 'CZ', 'DK', 'DO', 'DE', 'EC', 'EE', 'SV', 'FI', 'FR', 'GR', 'GT', 'HN', 'HK', 'HU', 'IS', 'IE', 'IT', 'LV', 'LT', 'LU', 'MY', 'MT', 'MX', 'NL', 'NZ', 'NI', 'NO', 'PA', 'PY', 'PE', 'PH', 'PL', 'PT', 'SG', 'SK', 'ES', 'SE', 'CH', 'TW', 'TR', 'UY', 'US', 'GB', 'AD', 'LI', 'MC', 'ID', 'JP', 'TH', 'VN', 'RO', 'IL', 'ZA', 'SA', 'AE', 'BH', 'QA', 'OM', 'KW', 'EG', 'MA', 'DZ', 'TN', 'LB', 'JO', 'PS', 'IN', 'KZ', 'MD', 'UA', 'AL', 'BA', 'HR', 'ME', 'MK', 'RS', 'SI', 'KR', 'BD', 'PK', 'LK', 'GH', 'KE', 'NG', 'TZ', 'UG', 'AG', 'AM', 'BS', 'BB', 'BZ', 'BT', 'BW', 'BF', 'CV', 'CW', 'DM', 'FJ', 

Have a look at one of the responses:

In [54]:
print(type(responses_with_next[0]))

<class 'dict'>


<a id='2-4'></a>
### 2.4 - Optional: Understanding API Rate Limits

*Note*: This section is optional and provides additional insights into working with APIs.

When interacting with APIs, it’s crucial to understand **rate limits**. Rate limiting is a mechanism used by APIs to control the number of requests a client can make within a specific time frame. This helps prevent abuse, overload, and ensures fair usage of the API resources. Here’s how rate limiting typically works:

#### Key Concepts of Rate Limiting:
1. **Request Quotas**:
   - APIs often enforce a maximum number of requests a client can make within a given time window. For example, a limit of 100 requests per minute means you can’t exceed 100 requests in any 60-second period.

2. **Time Windows**:
   - The time window defines the duration over which the request quota is measured. For instance, a rate limit of 100 requests per minute resets every 60 seconds.

3. **Handling Exceeded Limits**:
   - If a client exceeds the rate limit, the API typically responds with an error code, such as **429 Too Many Requests**. This indicates that the rate limit has been exceeded, and the client should adjust its behavior.
   - Common strategies to handle rate limits include implementing **exponential backoff** and other retry mechanisms. You can learn more about these strategies in the following resources:
     - [Exponential Backoff Explained](https://medium.com/bobble-engineering/how-does-exponential-backoff-work-90ef02401c65)
     - [Best Practices for Retry Patterns](https://harish-bhattbhatt.medium.com/best-practices-for-retry-pattern-f29d47cd5117)
     - [AWS Guide on Timeouts, Retries, and Backoff](https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/)

4. **Rate Limit Headers**:
   - APIs often include headers in the response to provide information about the client’s rate limit status. These headers may indicate:
     - The number of requests remaining before the limit resets.
     - The time at which the limit will reset.

#### Why Rate Limiting Matters:
Rate limiting ensures the stability and reliability of APIs by:
- Preventing abuse or malicious behavior.
- Allocating resources effectively.
- Managing traffic loads efficiently.

#### Spotify API Rate Limits
The Spotify Web API implements dynamic rate limits based on the number of requests made within a rolling 30-second window. Unlike some APIs, Spotify doesn’t enforce a hard limit but adjusts dynamically based on usage. For more details, refer to the [Spotify API Rate Limits Documentation](https://developer.spotify.com/documentation/web-api/concepts/rate-limits).

Additionally, some blogs have conducted experiments to identify the average number of requests per minute. For example:
- [Spotify API Rate Limit Experiments](https://medium.com/mendix/limiting-your-amount-of-balls-in-mendix-most-of-the-time-rest-835dde55b10e#:~:text=The%20Spotify%20API%20service%20has,for%2060%20requests%20per%20minute)

#### Benchmarking API Calls
To better understand the rate limits, I used a provided code snippet to benchmark API calls. This code allows you to experiment with the number of requests and the interval between them to observe the average request time. If you exceed the rate limits, the API will return a **429 status code**.

*Note*: Running this code may take a few minutes, depending on the number of requests and intervals configured.

#### Next Steps
Understanding rate limits is essential for building robust and efficient applications. By respecting these limits and implementing proper retry mechanisms, I ensured that my application interacts with the Spotify API reliably and responsibly.

In [None]:
import time

# Define the Spotify API endpoint
endpoint = 'https://api.spotify.com/v1/browse/new-releases'

headers = get_auth_header(access_token=token.get('access_token'))

# Define the number of requests to make
num_requests = 200

# Define the interval between requests (in seconds)
request_interval = 0.1  # Adjust as needed based on the API rate limit

# Store the timestamps of successful requests
success_timestamps = []

# Make repeated requests to the endpoint
for i in range(num_requests):
    # Make the request
    response = requests.get(url=endpoint, headers=headers)
    
    # Check if the request was successful
    if response.status_code == 200:
        success_timestamps.append(time.time())
    else:        
        print(f'Request {i+1}: Failed with code {response.status_code}')
    
    # Wait for the specified interval before making the next request
    time.sleep(request_interval)

# Calculate the time between successful requests
if len(success_timestamps) > 1:
    time_gaps = [success_timestamps[i] - success_timestamps[i-1] for i in range(1, len(success_timestamps))]
    print(f'Average time between successful requests: {sum(time_gaps) / len(time_gaps):.2f} seconds')
else:
    print('At least two successful requests are needed to calculate the time between requests.')

<a id='3'></a>
## 3 - Building a Batch Data Pipeline

Now that I’ve covered the basics of working with APIs, I’ll walk you through the process of creating a **batch data pipeline** to extract track information for newly released albums. This pipeline will utilize two key Spotify API endpoints:

1. **[Get New Releases Endpoint](https://developer.spotify.com/documentation/web-api/reference/get-new-releases)**:
   - This endpoint retrieves a list of new album releases. I used it in the previous tasks to fetch the latest albums.

2. **[Get Album Tracks Endpoint](https://developer.spotify.com/documentation/web-api/reference/get-an-albums-tracks)**:
   - This endpoint provides detailed information about the tracks in a specific album. It allows me to extract metadata such as track names, durations, and more.

### Pipeline Workflow
The pipeline will follow these steps:
1. **Fetch New Album Releases**:
   - Using the **Get New Releases** endpoint, I retrieved a list of the latest albums.
   - This step ensures that the pipeline always works with the most up-to-date data.

2. **Extract Track Information**:
   - For each album in the list, I used the **Get Album Tracks** endpoint to fetch detailed information about its tracks.
   - This step involves iterating through the album IDs and making API calls to retrieve track metadata.

3. **Process and Store Data**:
   - Once the track information is retrieved, I processed and organized it into a structured format (e.g., a list of dictionaries or a DataFrame).
   - The processed data can then be stored in a database, written to a file, or used for further analysis.

### Why This Pipeline Matters
This batch pipeline is a practical example of how to:
- Combine multiple API endpoints to build a cohesive data extraction workflow.
- Handle pagination and rate limits effectively.
- Process and organize data for downstream tasks like analysis or storage.

### Next Steps
In the following sections, I’ll dive deeper into the implementation details of each step, including how to handle errors, optimize performance, and ensure the pipeline runs smoothly.

### Implementing Token Refresh in the Batch Pipeline

In this section, I’ll walk you through the process of enhancing the batch pipeline to handle **token refresh**. Since the access token provided by the Spotify API has a limited lifespan (3600 seconds), it’s essential to implement a routine that refreshes the token if the pipeline runs longer than this duration. Without this, the pipeline could fail with a **401 status code** error, indicating an unauthorized request.

#### Overview of the Provided Scripts
The pipeline is built using three scripts located in the `src/` folder:

1. **`authentication.py`**:
   - Contains the `get_token` function, which retrieves the access token required for API authentication.

2. **`endpoint.py`**:
   - Includes two paginated API call functions:
     - `get_paginated_new_releases`: Retrieves a list of new album releases using pagination.
     - `get_paginated_album_tracks`: Fetches Spotify catalog information about an album’s tracks using the **Get Album Tracks** endpoint.

3. **`main.py`**:
   - Orchestrates the pipeline by:
     - Calling `get_paginated_new_releases` to fetch the IDs of new albums.
     - For each album ID, calling `get_paginated_album_tracks` to extract catalog information.

#### Problem: Token Expiry
Currently, the pipeline handles paginated requests but does not account for the access token’s expiration. If the pipeline runs for more than 3600 seconds, the token will expire, causing subsequent API requests to fail with a **401 status code**.

#### Solution: Implementing Token Refresh
To address this, I implemented a routine in the `get_paginated_new_releases` function to refresh the access token when necessary. Here’s how I approached it:

1. **Check Token Expiry**:
   - Before making an API request, I checked the remaining time until the token’s expiration.
   - If the token is close to expiring (e.g., within a few minutes), I refreshed it.

2. **Refresh the Token**:
   - I called the `get_token` function from `authentication.py` to generate a new access token.
   - Updated the `headers` dictionary with the new token to ensure subsequent requests are authenticated.

3. **Handle Token Refresh Seamlessly**:
   - I ensured that the token refresh process is seamless and does not interrupt the pipeline’s workflow.
   - This involved adding error handling to retry failed requests with the new token.

#### Steps to Implement the Routine
Here’s a high-level overview of the steps I followed to implement the token refresh routine:

1. **Track Token Expiry Time**:
   - I stored the token’s expiration time and compared it with the current time before each API request.

2. **Refresh Token When Necessary**:
   - If the token is about to expire, I called the `get_token` function to refresh it.

3. **Update Headers**:
   - I updated the `headers` dictionary with the new token to ensure all subsequent requests are authenticated.

4. **Retry Failed Requests**:
   - If a request fails due to an expired token, I refreshed the token and retried the request.

#### Outcome
By implementing this routine, I ensured that the pipeline can handle long-running tasks without failing due to token expiration. This makes the pipeline more robust and reliable, even for large datasets or extended processing times.

#### Next Steps
With token refresh implemented, the pipeline is now ready to handle large-scale data extraction tasks. In the next section, I’ll discuss how to optimize the pipeline further and handle potential edge cases.

Run the following commands in the terminal to run the `main.py` script:

```bash
cd src
python main.py
```

*Notes*: To open the terminal, click on Terminal -> New Terminal in the menu:

<img src="images/VSCodeCourseraTerminal.png"  width="600"/>

Once the script is finished, you should be able to see a file named `album_items_<DATETIME>.json` in the folder `src`.

<a id='4'></a>
## 4 - Optional - Spotipy SDK

In several cases, the API developers also provide a Software Development Kit (SDK) to connect and perform requests to the different endpoints of the API without the necessity of creating the code from scratch. For Spotify Web API they developed the [Spotipy SDK](https://spotipy.readthedocs.io/en/2.22.1/) to do it. Let's see an example of how it will work to replicate the extraction of data from the new album releases endpoint in a paginated way.

In [None]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

In [None]:
credentials = SpotifyClientCredentials(
        client_id=CLIENT_ID, client_secret=CLIENT_SECRET
    )

spotify = spotipy.Spotify(client_credentials_manager=credentials)

You can see that the `credentials` object handles the authentication process and contains the token to be used in later requests.

*Note*: Please ignore the `DeprecationWarning` message if you see an access token in the output.

In [None]:
credentials.get_access_token()

Let's get data from of the new album releases, as you did in the previous example:

In [None]:
limit = 20
response = spotify.new_releases(limit=limit)

You can also paginate through these responses. If you check the documentation of the [`new_releases` method](https://spotipy.readthedocs.io/en/2.22.1/#spotipy.client.Spotify.new_releases), you can see that you can specify the parameter `offset`, as you previously did. 

In [None]:
def paginated_new_releases_sdk(limit: int=20) -> list:

    album_data = []
    ### START CODE HERE ### (~ 6 lines of code)
    album_data.None(None.None('None').None('None'))
    total_albums_elements = None.None('None').None('None')
    offset_idx = list(range(None, None, limit))

    for idx in offset_idx: 
        
        response_page = spotify.None(limit=None, offset=None)
        album_data.None(None.None('None').None('None'))
    ### END CODE HERE ###
    return album_data
    
album_data_sdk = paginated_new_releases_sdk()
album_data_sdk[0]

In [None]:
len(album_data_sdk)

In this lab you learned the basics of ingesting data from the API. You worked with authentication process and pagination in a manual way as well as using an API SDK.