# Spotify Batch Data Processing

A Python project for batch processing data from the Spotify Web API, demonstrating authentication, pagination, and data extraction techniques.

# Table of Contents

- [1 - Setup Spotify App](#1)
- [2 - API Basics](#2)
  - [2.1 - Authentication](#2-1)
  - [2.2 - New Releases Endpoint](#2-2)
  - [2.3 - Pagination Implementation](#2-3)
  - [2.4 - API Rate Limits](#2-4)
- [3 - Batch Data Pipeline](#3)
- [4 - Spotipy SDK Alternative](#4)

<a id='1'></a>
## 1 - Setup Spotify App

### Prerequisites
- Spotify account (free account is sufficient)
- Basic Python knowledge

### Spotify Developer Setup
1. Go to https://developer.spotify.com/ and log in
2. Navigate to **Dashboard** from your account menu
3. Create a new app with these settings:
   - App name: `spotify-batch-processing`
   - App description: `Batch data processing from Spotify API`
   - Redirect URIs: `http://127.0.0.1:3000`
   - API: Select `Web API`
4. Save your `Client ID` and `Client Secret` in `src/.env`

### API Endpoints Used
- [New Releases](https://developer.spotify.com/documentation/web-api/reference/get-new-releases): Get latest album releases
- [Album Tracks](https://developer.spotify.com/documentation/web-api/reference/get-an-albums-tracks): Get tracks for specific albums

<a id='2'></a>
## 2 - API Basics

This section demonstrates core API concepts using Python's `requests` library for HTTP communication with the Spotify Web API.

In [29]:
%%capture

!python -m pip install requests python-dotenv spotipy

In [1]:
import os
from typing import Dict, Any, Callable

from dotenv import load_dotenv
import json
import requests 

<a id='2-1'></a>
### 2.1 - Authentication

The Spotify Web API uses the OAuth 2.0 Client Credentials flow for authentication. This flow exchanges the app's client ID and client secret for an access token, which every API request must carry.

In [2]:
load_dotenv('./src/.env', override=True)

CLIENT_ID = os.getenv('CLIENT_ID')
CLIENT_SECRET = os.getenv('CLIENT_SECRET')

The `get_token` function performs OAuth 2.0 client credentials authentication to obtain an access token.

In [3]:
def get_token(client_id: str, client_secret: str, url: str) -> Dict[Any, Any]:
    """Perform a POST request to obtain an access token.

    Args:
        client_id (str): App client id
        client_secret (str): App client secret
        url (str): URL to perform the post request

    Returns:
        Dict[Any, Any]: Dictionary containing the access token
    """
        
    headers = {        
        "Content-Type": "application/x-www-form-urlencoded"            
    }
    
    payload = {
                "grant_type": "client_credentials", 
                "client_id": client_id, 
                "client_secret": client_secret
               }
    
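    try:
        response = requests.post(url=url, headers=headers, data=payload)
        print(type(response))
        # Raise an HTTPError for 4xx/5xx responses
        response.raise_for_status()
        # Parse the JSON body (access_token, token_type, expires_in)
        return response.json()

    except (requests.RequestException, ValueError) as err:
        # Network/HTTP failures or a body that is not valid JSON
        print(f"Error: {err}")
        return {}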

URL_TOKEN="https://accounts.spotify.com/api/token"
token = get_token(client_id=CLIENT_ID, client_secret=CLIENT_SECRET, url=URL_TOKEN)

print(token)

<class 'requests.models.Response'>
{'access_token': 'BQC1uK4d0fFYmYArbNRq1mzoFdHxlqg6vxRVLIc2w5K3NiHMTAx7O9Rz6BHEm8AZ7p-fgBM6CAZ8EIklaVuVfRAb3S646aVNT-Io8c3_LnHa8uGp5srFiGfdja999yStgb4jvYR8Gro', 'token_type': 'Bearer', 'expires_in': 3600}


Access tokens are temporary (this one expires after 3600 seconds) and must be sent in the `Authorization` header of each API request. The `get_auth_header` function builds that header.

In [4]:
def get_auth_header(access_token: str) -> Dict[str, str]:
    return {"Authorization": f"Bearer {access_token}"}
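
Because the token expires after `expires_in` seconds, it can help to record when it was obtained and check that time before a long run. The following is a minimal sketch under that assumption; the helper names `token_expires_at` and `is_expired` and the 60-second safety margin are hypothetical and not part of the project code.

```python
import time


def token_expires_at(token: dict, obtained_at: float) -> float:
    """Return the UNIX time at which the token stops being valid."""
    return obtained_at + token.get("expires_in", 0)


def is_expired(expires_at: float, now: float, margin: float = 60.0) -> bool:
    """Treat the token as expired `margin` seconds early to stay safe."""
    return now >= expires_at - margin


# Example with a token dictionary shaped like the one printed above
obtained_at = time.time()
expires_at = token_expires_at({"expires_in": 3600}, obtained_at)
print(is_expired(expires_at, now=obtained_at))  # False
```

A check like this avoids sending a request that is certain to fail, but it does not replace handling a 401 response from the API.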

The token can now be used to access the [new releases endpoint](https://developer.spotify.com/documentation/web-api/reference/get-new-releases).

<a id='2-2'></a>
### 2.2 - New Releases Endpoint

The `get_new_releases` function handles API requests to retrieve new album releases with pagination support.

In [5]:
def get_new_releases(url: str, access_token: str, offset: int=0, limit: int=20, next: str="") -> Dict[Any, Any]:
    """Perform get() request to new releases endpoint

    Args:
        url (str): Base url for the request
        access_token (str): Access token
        offset (int, optional): Page offset for pagination. Defaults to 0.
        limit (int, optional): Number of elements per page. Defaults to 20.
        next (str, optional): Next URL to perform next request. Defaults to "".

    Returns:
        Dict[Any, Any]: Request response
    """

    if next == "":        
        request_url = f"{url}?offset={offset}&limit={limit}"
    else: 
        request_url = f"{next}"

    headers = get_auth_header(access_token=access_token)
    
    try: 
        response = requests.get(url=request_url, headers=headers)
        return response.json()
    except Exception as err:
        print(f"Error requesting data: {err}")
        return {'error': err}
        
URL_NEW_RELEASES = "https://api.spotify.com/v1/browse/new-releases"

# Note: the `access_token` value from the dictionary `token` can be retrieved either using `get()` method or dictionary syntax `token['access_token']`
releases_response = get_new_releases(url=URL_NEW_RELEASES, access_token=token.get('access_token'))

In [6]:
releases_response

{'albums': {'href': 'https://api.spotify.com/v1/browse/new-releases?offset=0&limit=20',
  'items': [{'album_type': 'album',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7owHrEghIYMf5fTVPPwkVB'},
      'href': 'https://api.spotify.com/v1/artists/7owHrEghIYMf5fTVPPwkVB',
      'id': '7owHrEghIYMf5fTVPPwkVB',
      'name': 'Loun',
      'type': 'artist',
      'uri': 'spotify:artist:7owHrEghIYMf5fTVPPwkVB'}],
    'available_markets': ['AR', 'AU', 'AT', 'BE', 'BO', ...
(output truncated)

The result you get is a JSON object that was transformed into a python dictionary. You can explore the structure of the response you get:

In [7]:
releases_response.keys()

dict_keys(['albums'])

In [8]:
releases_response.get('albums').keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

Each API manages responses in its own way so it is highly recommended to read the documentation and understand the nuances behind the API endpoints you are working with. In this case, you see some fields such as `'href'` under the `'albums'` field, which tells you the URL used for the request you just sent.

In [9]:
releases_response.get('albums').get('total')

100

The `total` field shows that 100 albums are available. The `href` URL also contains two query parameters, `offset` and `limit`, which are the basis of pagination in this endpoint and are covered in section 2.3.

You can also explore the returned items using the `'items'` field under `'albums'`. It holds a list of albums, so you can check how many items were returned:

In [10]:
len(releases_response.get('albums').get('items'))

20

Explore the items:

In [11]:
releases_response.get('albums').get('items')[0]

{'album_type': 'album',
 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7owHrEghIYMf5fTVPPwkVB'},
   'href': 'https://api.spotify.com/v1/artists/7owHrEghIYMf5fTVPPwkVB',
   'id': '7owHrEghIYMf5fTVPPwkVB',
   'name': 'Loun',
   'type': 'artist',
   'uri': 'spotify:artist:7owHrEghIYMf5fTVPPwkVB'}],
 'available_markets': ['AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA', 'CL', 'CO',
  'CR', 'CY', 'CZ', 'DK', 'DO', 'DE', 'EC', 'EE', 'SV', 'FI', 'FR', 'GR', 'GT',
  'HN', 'HK', 'HU', 'IS', 'IE', 'IT', 'LV', 'LT', 'LU', 'MY', 'MT', 'MX', 'NL',
  'NZ', 'NI', 'NO', 'PA', 'PY', 'PE', 'PH', 'PL', 'PT', 'SG', 'SK', 'ES', 'SE',
  'CH', 'TW', 'TR', 'UY', 'US', 'GB', 'AD', 'LI', 'MC', 'ID', 'JP', 'TH', 'VN',
  'RO', 'IL', 'ZA', 'SA', 'AE', 'BH', 'QA', 'OM', 'KW', 'EG', 'MA', 'DZ', 'TN',
  'LB', 'JO', 'PS', 'IN', 'BY', 'KZ', 'MD', ...
(output truncated)

<a id='2-3'></a>
### 2.3 - Pagination Implementation

The Spotify API returns paginated results with `limit`, `offset`, `next`, and `total` fields. Two pagination approaches are demonstrated:

1. **Offset-based**: Manually calculate next page using `offset` + `limit`
2. **URL-based**: Use the provided `next` URL directly

Example pagination response structure:
```json
{
  "limit": 20,
  "next": "https://api.spotify.com/v1/browse/new-releases?offset=20&limit=20",
  "offset": 0,
  "previous": null,
  "total": 100
}
```

In [12]:
next_releases_response = get_new_releases(url=URL_NEW_RELEASES, access_token=token.get('access_token'), offset=20, limit=20)

Check the values for `href` and `next` in the new response `next_releases_response`:

In [13]:
next_releases_response.get('albums').get('href')

'https://api.spotify.com/v1/browse/new-releases?offset=20&limit=20'

In [14]:
next_releases_response.get('albums').get('next')

'https://api.spotify.com/v1/browse/new-releases?offset=40&limit=20'

Given these results, you can see that the `offset` increases by the value of the `limit`. As the responses show that the `total` value is 100, this means that you can access the last page of responses by using an `offset` of 80, while keeping the `limit` value as 20.

In [15]:
last_releases_response = get_new_releases(url=URL_NEW_RELEASES, access_token=token.get('access_token'), offset=80, limit=20)

In [16]:
print(last_releases_response.get('albums').get('previous'))
print(last_releases_response.get('albums').get('next'))

https://api.spotify.com/v1/browse/new-releases?offset=60&limit=20
None


You can see that the value of the `next` field is `None`, indicating that you reached the last page. On the other hand, you can see that `previous` contains the URL to request the data from the previous page, so you can even go backward if required.
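
The offsets visited in this walkthrough follow directly from `total` and `limit`. As a small sketch of that arithmetic, assuming the values 100 and 20 seen in the responses above (the helper name `page_offsets` is hypothetical):

```python
def page_offsets(total: int, limit: int) -> list:
    """Return the offset of every page needed to cover `total` items."""
    return list(range(0, total, limit))


offsets = page_offsets(total=100, limit=20)
print(offsets)      # [0, 20, 40, 60, 80]
print(offsets[-1])  # 80, the offset of the last page
```

If `total` is not a multiple of `limit`, the last page simply holds fewer items; for example, a total of 101 would add one more page at offset 100.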

#### Method 1: Offset-based Pagination

This function iterates through all pages using manual offset calculation:
  

In [17]:
def paginated_new_releases(endpoint_request: Callable, url: str, access_token: str, offset: int=0, limit: int=20) -> list:
    """Perform pagination over API request using offset-based approach

    Args:
        endpoint_request (Callable): Function that performs the API Calls
        url (str): Endpoint's URL for the request
        access_token (str): Access token
        offset (int, optional): Offset of the page's request. Defaults to 0.
        limit (int, optional): Limit of the page's request. Defaults to 20.

    Returns:
        list: List with the requested items
    """
    
    responses = []
    total_elements = None

    # Request pages until the offset reaches the total reported by the API
    while total_elements is None or offset < total_elements:
        response = endpoint_request(url=url,
                                    access_token=access_token,
                                    offset=offset,
                                    limit=limit)
        albums = response.get('albums') or {}
        responses.extend(albums.get('items', []))
        # A response without 'albums' (e.g. an error) ends the loop
        total_elements = albums.get('total', 0)

        print(f"Finished iteration for page with offset: {offset}")
        offset += limit

    return responses

Execute the offset-based pagination function to retrieve all new releases:

In [18]:
responses = paginated_new_releases(endpoint_request=get_new_releases,
                                   url=URL_NEW_RELEASES, 
                                   access_token=token.get('access_token'), 
                                   offset=0, limit=20)

Finished iteration for page with offset: 0
Finished iteration for page with offset: 20
Finished iteration for page with offset: 40
Finished iteration for page with offset: 60
Finished iteration for page with offset: 80



Take a look at one of the items:

In [19]:
responses[0]

{'album_type': 'album',
 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7owHrEghIYMf5fTVPPwkVB'},
   'href': 'https://api.spotify.com/v1/artists/7owHrEghIYMf5fTVPPwkVB',
   'id': '7owHrEghIYMf5fTVPPwkVB',
   'name': 'Loun',
   'type': 'artist',
   'uri': 'spotify:artist:7owHrEghIYMf5fTVPPwkVB'}],
 'available_markets': ['AR', 'AU', 'AT', 'BE', 'BO', ...
(output truncated)

You can check the `responses` variable to see if all the elements were downloaded successfully.

In [20]:
len(responses)

100

With the `paginated_new_releases` function that you created, you are now able to get all 100 available items.

#### Method 2: URL-based Pagination

This function uses the `next` URL provided by the API for simpler pagination:

In [21]:
def paginated_with_next_new_releases(endpoint_request: Callable, url: str, access_token: str) -> list:
    """Manage pagination using the 'next' URL from API responses

    Args:
        endpoint_request (Callable): Function that performs API request
        url (str): Base URL for the request
        access_token (str): Access token

    Returns:
        list: Responses stored in a list
    """
    responses = []
        
    next_page = url
    
    kwargs = {
            "url": url,
            "access_token": access_token,
            "next": ""
        }
    
    while next_page:
        response = endpoint_request(**kwargs)
        responses.extend(response.get('albums').get('items'))
        next_page = response.get('albums').get('next')
        kwargs["next"] = next_page
        
        print(f"Executed request with URL: {response.get('albums').get('href')}.")
                
    return responses
    

Execute the URL-based pagination function:

In [22]:
responses_with_next = paginated_with_next_new_releases(endpoint_request=get_new_releases, 
                                                             url=URL_NEW_RELEASES, 
                                                             access_token=token.get('access_token'))

Executed request with URL: https://api.spotify.com/v1/browse/new-releases?offset=0&limit=20.
Executed request with URL: https://api.spotify.com/v1/browse/new-releases?offset=20&limit=20.
Executed request with URL: https://api.spotify.com/v1/browse/new-releases?offset=40&limit=20.
Executed request with URL: https://api.spotify.com/v1/browse/new-releases?offset=60&limit=20.
Executed request with URL: https://api.spotify.com/v1/browse/new-releases?offset=80&limit=20.


Have a look at one of the responses:

In [23]:
responses_with_next[0]

{'album_type': 'album',
 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7owHrEghIYMf5fTVPPwkVB'},
   'href': 'https://api.spotify.com/v1/artists/7owHrEghIYMf5fTVPPwkVB',
   'id': '7owHrEghIYMf5fTVPPwkVB',
   'name': 'Loun',
   'type': 'artist',
   'uri': 'spotify:artist:7owHrEghIYMf5fTVPPwkVB'}],
 'available_markets': ['AR', 'AU', 'AT', 'BE', 'BO', ...
(output truncated)

<a id='2-4'></a>
### 2.4 - API Rate Limits

The Spotify API applies dynamic rate limiting based on the number of calls an app makes within a rolling 30-second window. When the limit is exceeded, the API responds with a 429 status code.

**Rate Limiting Best Practices:**
- Implement exponential backoff for retry logic
- Monitor response headers for rate limit status
- Add delays between requests during bulk operations
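
The first two practices above can be combined in a small retry helper. This is a sketch rather than the project's implementation: `backoff_delay` and `get_with_retry` are hypothetical names, `send` is any zero-argument callable that returns a response object with `status_code` and `headers` (for example a `requests.get` call wrapped in a lambda), and it assumes 429 responses may carry a `Retry-After` header with a number of seconds, as Spotify's rate limit documentation describes.

```python
import random
import time
from typing import Any, Callable, Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Prefer the server's hint when it is a plain number of seconds
            return float(retry_after)
        except ValueError:
            pass
    # Exponential growth with a little random jitter, limited to `cap`
    delay = base * (2 ** attempt) + random.uniform(0, base * 0.1)
    return min(delay, cap)


def get_with_retry(send: Callable[[], Any], max_retries: int = 5,
                   sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call `send()` again while the response is a 429, up to `max_retries`."""
    response = send()
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        sleep(backoff_delay(attempt, response.headers.get("Retry-After")))
        response = send()
    return response


# Possible usage with the names from the benchmark cell (not executed here):
# response = get_with_retry(lambda: requests.get(url=endpoint, headers=headers))
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; in normal use the default `time.sleep` applies.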

**Benchmarking API Performance:**

In [24]:
import time

# Define the Spotify API endpoint
endpoint = 'https://api.spotify.com/v1/browse/new-releases'

headers = get_auth_header(access_token=token.get('access_token'))

# Define the number of requests to make
num_requests = 200

# Define the interval between requests (in seconds)
request_interval = 0.1  # Adjust as needed based on the API rate limit

# Store the timestamps of successful requests
success_timestamps = []

# Make repeated requests to the endpoint
for i in range(num_requests):
    # Make the request
    response = requests.get(url=endpoint, headers=headers)
    
    # Check if the request was successful
    if response.status_code == 200:
        success_timestamps.append(time.time())
    else:        
        print(f'Request {i+1}: Failed with code {response.status_code}')
    
    # Wait for the specified interval before making the next request
    time.sleep(request_interval)

# Calculate the time between successful requests
if len(success_timestamps) > 1:
    time_gaps = [success_timestamps[i] - success_timestamps[i-1] for i in range(1, len(success_timestamps))]
    print(f'Average time between successful requests: {sum(time_gaps) / len(time_gaps):.2f} seconds')
else:
    print('At least two successful requests are needed to calculate the time between requests.')

Average time between successful requests: 0.55 seconds


<a id='3'></a>
## 3 - Batch Data Pipeline

Now that you have learned the basics of working with APIs, let's create a pipeline that extracts the track information for newly released albums. For that, you will use two endpoints:
* The same [Get New Releases endpoint](https://developer.spotify.com/documentation/web-api/reference/get-new-releases) you used in the previous exercises.
* The [Get Album Tracks endpoint](https://developer.spotify.com/documentation/web-api/reference/get-an-albums-tracks). This endpoint allows you to get Spotify catalog information about an album’s tracks.

In the `src/` folder, you are given three scripts (`authentication.py`, `endpoint.py` and `main.py`) that will allow you to perform such extraction.
- The `endpoint.py` file contains two paginated API calls. The first one, `get_paginated_new_releases`, gets the list of new album releases using the same paginated call as in the first part. The second one, `get_paginated_album_tracks`, gets Spotify catalog information about an album's tracks using the Get Album Tracks endpoint.
- The `authentication.py` file contains the script of the `get_token` function that returns an access token.
- The `main.py` file calls the first paginated API call to get the ids of the new albums. Then for each album id, the second paginated API call is performed to extract the catalog information for each album id. 

The code so far manages paginated requests but does not account for the access token's limited lifetime: if the pipeline runs for longer than 3600 seconds, requests start failing with a 401 status code. The paginated calls in `src/endpoint.py` therefore need a routine that refreshes the token, as described below.

### Token Refresh Implementation

The pipeline includes automatic token refresh functionality to handle expired access tokens (401 errors). This ensures uninterrupted data collection for long-running batch operations.

**Key Features:**
- Automatic detection of token expiration
- Seamless token refresh without data loss
- Retry mechanism for failed requests
- Graceful error handling for refresh failures
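
One way to picture the refresh step is a wrapper that retries a request once with a new token after a 401. The sketch below is not the code in `src/endpoint.py`: `get_with_token_refresh` is a hypothetical name, `send` is assumed to be a callable that takes an access token and returns a response with a `status_code`, and `get_token` is assumed to behave like the function from section 2.1 (returning a dictionary with an `access_token` key, or an empty dictionary on failure).

```python
from typing import Any, Callable, Dict, Tuple


def get_with_token_refresh(send: Callable[[str], Any], access_token: str,
                           get_token: Callable[..., Dict[str, Any]],
                           **token_kwargs: Any) -> Tuple[Any, str]:
    """Send a request, refreshing the token once if the API answers 401.

    Returns the response together with the token that is currently valid,
    so the caller can keep using it for the following pages.
    """
    response = send(access_token)
    if response.status_code == 401:
        # Ask for a new token with the stored credentials
        new_token = get_token(**token_kwargs).get("access_token")
        if new_token:
            access_token = new_token
            response = send(access_token)  # retry the same request once
    return response, access_token


# Possible usage with names from section 2 (not executed here):
# response, current_token = get_with_token_refresh(
#     lambda tok: requests.get(url=URL_NEW_RELEASES, headers=get_auth_header(tok)),
#     token.get("access_token"), get_token,
#     client_id=CLIENT_ID, client_secret=CLIENT_SECRET, url=URL_TOKEN)
```

Returning the token alongside the response lets a pagination loop carry the refreshed token forward. If the refresh itself fails, the original 401 response is returned so the caller can decide how to stop.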

### Album Track Extraction

After retrieving new album releases, the pipeline extracts detailed track information for each album using the [Get Album Tracks endpoint](https://developer.spotify.com/documentation/web-api/reference/get-an-albums-tracks).

**Process Flow:**
1. Extract album IDs from new releases response
2. For each album ID, construct the tracks endpoint URL
3. Paginate through all tracks for the album
4. Collect comprehensive track metadata

**Endpoint Structure:**
```
https://api.spotify.com/v1/albums/{album_id}/tracks
```
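
The first two steps of the process flow can be sketched with two small helpers. The names `extract_album_ids` and `album_tracks_url` are hypothetical, and the sample items are placeholders standing in for the album dictionaries returned by the new releases endpoint, each of which carries an `id` field.

```python
from typing import Any, Dict, List

# Base URL taken from the endpoint structure shown above
BASE_ALBUMS_URL = "https://api.spotify.com/v1/albums"


def extract_album_ids(items: List[Dict[str, Any]]) -> List[str]:
    """Collect the `id` of every album item that has one."""
    return [item["id"] for item in items if item.get("id")]


def album_tracks_url(base_url: str, album_id: str) -> str:
    """Build the Get Album Tracks URL for a single album."""
    return f"{base_url.rstrip('/')}/{album_id}/tracks"


# Placeholder items, not real Spotify album IDs
sample_items = [{"id": "album_a", "name": "First"},
                {"id": "album_b", "name": "Second"}]
for album_id in extract_album_ids(sample_items):
    print(album_tracks_url(BASE_ALBUMS_URL, album_id))
# https://api.spotify.com/v1/albums/album_a/tracks
# https://api.spotify.com/v1/albums/album_b/tracks
```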

### Pagination Implementation for Album Tracks

The `get_paginated_album_tracks` function in `src/endpoint.py` handles:

- **URL Construction**: Builds the complete endpoint URL with album ID
- **Authorization Headers**: Formats the access token for API requests  
- **Pagination Loop**: Iterates through all pages of track results
- **Token Refresh**: Automatically handles expired tokens (401 errors)
- **Data Collection**: Aggregates all tracks from paginated responses
- **Error Handling**: Manages network errors and API failures

This function ensures complete track data extraction for each album, regardless of the number of tracks or API pagination requirements.

### Pipeline Orchestration

The main pipeline in `src/main.py` orchestrates the complete data extraction process:

**Pipeline Steps:**
1. **Authentication**: Initialize OAuth 2.0 credentials and obtain access token
2. **New Releases**: Extract all new album releases using pagination
3. **Album Processing**: For each album, extract complete track information
4. **Data Aggregation**: Combine all track data into a structured format
5. **Export**: Save results to timestamped JSON files

**Function Parameters:**
- `base_url`: Albums endpoint base URL
- `access_token`: OAuth 2.0 access token
- `album_id`: Spotify album identifier
- `get_token`: Token refresh function
- `**kwargs`: Authentication parameters for token refresh

The pipeline aggregates track data for each album into a structured dictionary format, using album IDs as keys. The final dataset is exported to a timestamped JSON file to prevent filename collisions and maintain data versioning.
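
The export step might look like the sketch below. It is an illustration, not the contents of `src/main.py`: the function name `export_album_items` and the `%Y%m%d_%H%M%S` timestamp format are assumptions, and only the `album_items_<DATETIME>.json` naming pattern comes from this README.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List


def export_album_items(album_items: Dict[str, List[Any]], folder: str = ".") -> Path:
    """Write the album-id -> tracks mapping to a timestamped JSON file."""
    # Assumed timestamp format; the project may use a different one
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(folder) / f"album_items_{timestamp}.json"
    with path.open("w", encoding="utf-8") as file:
        json.dump(album_items, file, indent=2)
    return path


# Possible usage (not executed here), with album IDs as dictionary keys:
# album_items = {"album_a": [{"name": "Track 1"}], "album_b": []}
# print(export_album_items(album_items, folder="src"))
```

Keying the dictionary by album ID keeps each album's tracks together. The timestamp keeps one run from overwriting another, as long as two runs do not start within the same second.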

Run the following commands in the terminal to run the `main.py` script:

```bash
cd src
python main.py
```

Once the script is finished, you should be able to see a file named `album_items_<DATETIME>.json` in the folder `src`.

<a id='4'></a>
## 4 - Spotipy SDK Alternative (Optional)

Many APIs also have a Software Development Kit (SDK) or client library that connects to the endpoints without requiring you to write the request code from scratch. For the Spotify Web API, the open-source [Spotipy library](https://spotipy.readthedocs.io/en/2.22.1/) fills this role. The example below replicates the paginated extraction from the new album releases endpoint.

In [30]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

In [31]:
credentials = SpotifyClientCredentials(
        client_id=CLIENT_ID, client_secret=CLIENT_SECRET
    )

spotify = spotipy.Spotify(client_credentials_manager=credentials)

You can see that the `credentials` object handles the authentication process and contains the token to be used in later requests.

*Note*: Calling `get_access_token()` may emit a `DeprecationWarning` alongside the access token in the output; you can ignore it here.

In [32]:
credentials.get_access_token()


{'access_token': 'BQBAL4L5su6SstsWfXZQOA5srnSZmHeUaK4tJR8KaxHdGRDH0SXlkzvkMT3S6d0KpRMczdOmBMoixSRuLQCAva4T5i9AwiEac7LpdVzel4BOK9SIqtPeEDzGo-wI1dM2oITKQumIYAI',
 'token_type': 'Bearer',
 'expires_in': 3600,
 'expires_at': 1756949879}

Let's get data from the new album releases endpoint, as you did in the previous example:

In [33]:
limit = 20
response = spotify.new_releases(limit=limit)

You can also paginate through these responses. If you check the documentation of the [`new_releases` method](https://spotipy.readthedocs.io/en/2.22.1/#spotipy.client.Spotify.new_releases), you can see that you can specify the parameter `offset`, as you previously did. 

### SDK Pagination Implementation

Implementing pagination with the Spotipy SDK provides a simplified approach compared to manual API calls. The SDK handles much of the complexity while still allowing fine-grained control over pagination parameters.

**Key Benefits of SDK Approach:**
- **Simplified Authentication**: Automatic token management
- **Built-in Pagination**: Native support for offset-based pagination
- **Error Handling**: Integrated retry and error handling mechanisms
- **Discoverability**: Documented methods such as `new_releases` instead of hand-built URLs

**Implementation Details:**
- Collect initial response data and metadata
- Calculate pagination parameters based on total available items
- Iterate through all pages using offset increments
- Aggregate results for comprehensive data collection

In [34]:
def paginated_new_releases_sdk(limit: int=20) -> list:
    """Retrieve all new releases using Spotipy SDK with pagination
    
    Args:
        limit (int): Number of items per page (max 50)
        
    Returns:
        list: List of all new release albums
    """
    album_data = []
    
    # Get initial response
    response = spotify.new_releases(limit=limit)
    album_data.extend(response['albums']['items'])
    total_albums_elements = response['albums']['total']
    
    # Calculate offset indices for remaining pages
    offset_idx = list(range(limit, total_albums_elements, limit))

    # Paginate through remaining pages
    for idx in offset_idx:
        response_page = spotify.new_releases(limit=limit, offset=idx)
        album_data.extend(response_page['albums']['items'])
        
    return album_data
    
# Execute SDK pagination
album_data_sdk = paginated_new_releases_sdk()
print(f"Total albums retrieved: {len(album_data_sdk)}")
album_data_sdk[0]

Total albums retrieved: 100


{'album_type': 'album',
 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7owHrEghIYMf5fTVPPwkVB'},
   'href': 'https://api.spotify.com/v1/artists/7owHrEghIYMf5fTVPPwkVB',
   'id': '7owHrEghIYMf5fTVPPwkVB',
   'name': 'Loun',
   'type': 'artist',
   'uri': 'spotify:artist:7owHrEghIYMf5fTVPPwkVB'}],
 'available_markets': ['AR', 'AU', 'AT', 'BE', 'BO', ...
(output truncated)

In [35]:
len(album_data_sdk)

100
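
The SDK can also follow the `next` URL instead of computing offsets, mirroring Method 2 from section 2.3. The sketch below assumes Spotipy's `Spotify.next(result)` method, which takes a paged result and returns the following page or `None`; the function name `paginated_new_releases_next` is hypothetical, and `client` stands for the `spotify` object created above.

```python
from typing import Any, Dict, List


def paginated_new_releases_next(client: Any, limit: int = 20) -> List[Dict[str, Any]]:
    """Collect all new releases by following each page's `next` URL."""
    album_data = []
    # The paging object sits under the 'albums' key of the response
    page = client.new_releases(limit=limit)["albums"]
    while page:
        album_data.extend(page["items"])
        # `next` returns None once the page has no `next` URL
        following = client.next(page)
        page = following["albums"] if following else None
    return album_data


# Possible usage (not executed here):
# album_data_next = paginated_new_releases_next(spotify)
# print(len(album_data_next))
```

This version does not depend on `total`, so it keeps working if the number of available items changes between requests.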

## Summary

This project demonstrates comprehensive API data ingestion techniques including:

- **OAuth 2.0 Authentication**: Secure token-based API access
- **Manual Pagination**: Custom implementation for complete data retrieval
- **SDK Integration**: Simplified API interaction using the Spotipy library
- **Token Management**: Automatic refresh handling for long-running operations
- **Data Processing**: Structured extraction and export of music metadata

The implementation is intended as a learning resource and as a starting point for building music analytics pipelines.