# Extraction Notebook

**Title:**
- Data extraction from the Spotify API

**Description:**
- This notebook retrieves data from the Spotify API, including the Top 50 songs per country, the top songs per genre, and other data relevant to the analysis. Get ready for the big comeback! In this section, our data travels from the cloud into a pandas DataFrame, ready for action. And that's not all: as part of our data show, we also generate a file in PICKLE format. This promises to be epic!

**Returns:**
- `df_unprocessed.pkl` (DataFrame): The unprocessed DataFrame obtained in the extraction phase.

## Imports

In [1]:
import pandas as pd
import numpy as np

from requests import get, post
import base64
import json

import time
from datetime import datetime, timedelta

In [2]:
from dotenv import load_dotenv
import os

load_dotenv()

True

## Credentials from .env

In [3]:
# Client ID and Client Secret
client_id = os.getenv("SPOTIFY_CLIENT_ID")
client_secret = os.getenv("SPOTIFY_CLIENT_SECRET")
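Because `os.getenv` returns `None` when a variable is missing, a typo in `.env` would only surface later as a failed token request. A minimal sketch of a fail-fast check follows; `require_env` and the `DEMO_SPOTIFY_CLIENT_ID` variable are hypothetical names used for illustration, not part of this notebook.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Demo with a placeholder value (hypothetical; real values come from .env)
os.environ["DEMO_SPOTIFY_CLIENT_ID"] = "placeholder-id"
demo_client_id = require_env("DEMO_SPOTIFY_CLIENT_ID")
```

In the notebook this could be used as `client_id = require_env("SPOTIFY_CLIENT_ID")`, so that a missing credential stops execution at this cell.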

## Spotify API - Documentation

https://developer.spotify.com/documentation/web-api

## Color Codes

ANSI escape codes used to color the `print` output.

In [4]:
colors = {
    "black": "\033[30m",
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
    "blue": "\033[34m",
    "purple": "\033[35m",
    "cyan": "\033[36m",
    "white": "\033[37m",
    "light_gray": "\033[37;1m",
    "reset": "\033[0m"
}
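As an illustration of how these codes are used, the sketch below wraps a message in a color code and resets the style afterwards. `colorize` is a hypothetical helper (the notebook interpolates the codes directly into f-strings), and only a subset of the table is repeated so the block runs on its own.

```python
# Subset of the notebook's color table, repeated so the block runs on its own
colors = {"green": "\033[32m", "red": "\033[31m", "reset": "\033[0m"}


def colorize(text, color):
    """Wrap text in an ANSI color code and reset the style afterwards."""
    return f"{colors[color]}{text}{colors['reset']}"


print(colorize("Token OK", "green"))
```

Ending every message with the `reset` code keeps the color from leaking into later output.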

## Country Codes

| Code   | Country                           | Code   | Country                           |
|--------|-----------------------------------|--------|-----------------------------------|
| AD     | Andorra                           | MD     | Moldova, Republic of              |
| AE     | United Arab Emirates              | ME     | Montenegro                        |
| AF     | Afghanistan                       | MF     | Saint Martin (French part)        |
| AG     | Antigua and Barbuda               | MG     | Madagascar                        |
| AI     | Anguilla                          | MH     | Marshall Islands                  |
| AL     | Albania                           | MK     | North Macedonia                   |
| AM     | Armenia                           | ML     | Mali                              |
| AO     | Angola                            | MM     | Myanmar                           |
| AQ     | Antarctica                        | MN     | Mongolia                          |
| AR     | Argentina                         | MO     | Macao                             |
| AS     | American Samoa                    | MP     | Northern Mariana Islands          |
| AT     | Austria                           | MQ     | Martinique                        |
| AU     | Australia                         | MR     | Mauritania                        |
| AW     | Aruba                             | MS     | Montserrat                        |
| AX     | Åland Islands                     | MT     | Malta                             |
| AZ     | Azerbaijan                        | MU     | Mauritius                         |
| BA     | Bosnia and Herzegovina            | MV     | Maldives                          |
| BB     | Barbados                          | MW     | Malawi                            |
| BD     | Bangladesh                        | MX     | Mexico                            |
| BE     | Belgium                           | MY     | Malaysia                          |
| BF     | Burkina Faso                      | MZ     | Mozambique                        |
| BG     | Bulgaria                          | NA     | Namibia                           |
| BH     | Bahrain                           | NC     | New Caledonia                     |
| BI     | Burundi                           | NE     | Niger                             |
| BJ     | Benin                             | NF     | Norfolk Island                    |
| BL     | Saint Barthélemy                  | NG     | Nigeria                           |
| BM     | Bermuda                           | NI     | Nicaragua                         |
| BN     | Brunei Darussalam                 | NL     | Netherlands, Kingdom of           |
| BO     | Bolivia, Plurinational State of   | NO     | Norway                            |
| BQ     | Bonaire, Sint Eustatius and Saba  | NP     | Nepal                             |
| BR     | Brazil                            | NR     | Nauru                             |
| BS     | Bahamas                           | NU     | Niue                              |
| BT     | Bhutan                            | NZ     | New Zealand                       |
| BV     | Bouvet Island                     | OM     | Oman                              |
| BW     | Botswana                          | PA     | Panama                            |
| BY     | Belarus                           | PE     | Peru                              |
| BZ     | Belize                            | PF     | French Polynesia                  |
| CA     | Canada                            | PG     | Papua New Guinea                  |
| CC     | Cocos (Keeling) Islands           | PH     | Philippines                       |
| CD     | Congo, Democratic Republic of     | PK     | Pakistan                          |
| CF     | Central African Republic          | PL     | Poland                            |
| CG     | Congo                             | PM     | Saint Pierre and Miquelon         |
| CH     | Switzerland                       | PN     | Pitcairn                          |
| CI     | Côte d'Ivoire                     | PR     | Puerto Rico                       |
| CK     | Cook Islands                      | PS     | Palestine, State of               |
| CL     | Chile                             | PT     | Portugal                          |
| CM     | Cameroon                          | PW     | Palau                             |
| CN     | China                             | PY     | Paraguay                          |
| CO     | Colombia                          | QA     | Qatar                             |
| CR     | Costa Rica                        | RE     | Réunion                           |
| CU     | Cuba                              | RO     | Romania                           |
| CV     | Cabo Verde                        | RS     | Serbia                            |
| CW     | Curaçao                           | RU     | Russian Federation                |
| CX     | Christmas Island                  | RW     | Rwanda                            |
| CY     | Cyprus                            | SA     | Saudi Arabia                      |
| CZ     | Czechia                           | SB     | Solomon Islands                   |
| DE     | Germany                           | SC     | Seychelles                        |
| DJ     | Djibouti                          | SD     | Sudan                             |
| DK     | Denmark                           | SE     | Sweden                            |
| DM     | Dominica                          | SG     | Singapore                         |
| DO     | Dominican Republic                | SH     | Saint Helena, Ascension and Tristan da Cunha |
| DZ     | Algeria                           | SI     | Slovenia                          |
| EC     | Ecuador                           | SJ     | Svalbard and Jan Mayen            |
| EE     | Estonia                           | SK     | Slovakia                          |
| EG     | Egypt                             | SL     | Sierra Leone                      |
| EH     | Western Sahara                    | SM     | San Marino                        |
| ER     | Eritrea                           | SN     | Senegal                           |
| ES     | Spain                             | SO     | Somalia                           |
| ET     | Ethiopia                          | SR     | Suriname                          |
| FI     | Finland                           | SS     | South Sudan                       |
| FJ     | Fiji                              | ST     | Sao Tome and Principe             |
| FK     | Falkland Islands (Malvinas)       | SV     | El Salvador                       |
| FM     | Micronesia, Federated States of   | SX     | Sint Maarten (Dutch part)         |
| FO     | Faroe Islands                     | SY     | Syrian Arab Republic              |
| FR     | France                            | SZ     | Eswatini                          |
| GA     | Gabon                             | TC     | Turks and Caicos Islands          |
| GB     | United Kingdom of Great Britain and Northern Ireland | TD | Chad            |
| GD     | Grenada                           | TF     | French Southern Territories       |
| GE     | Georgia                           | TG     | Togo                              |
| GF     | French Guiana                     | TH     | Thailand                          |
| GG     | Guernsey                          | TJ     | Tajikistan                        |
| GH     | Ghana                             | TK     | Tokelau                           |
| GI     | Gibraltar                         | TL     | Timor-Leste                       |
| GL     | Greenland                         | TM     | Turkmenistan                      |
| GM     | Gambia                            | TN     | Tunisia                           |
| GN     | Guinea                            | TO     | Tonga                             |
| GP     | Guadeloupe                        | TR     | Türkiye                           |
| GQ     | Equatorial Guinea                 | TT     | Trinidad and Tobago               |
| GR     | Greece                            | TV     | Tuvalu                            |
| GS     | South Georgia and the South Sandwich Islands |        |                                   |


In [5]:
countries_dict = {'Andorra': 'AD', 'Moldova, Republic of': 'MD', 'United Arab Emirates': 'AE', 'Montenegro': 'ME',
                  'Afghanistan': 'AF', 'Saint Martin (French part)': 'MF', 'Antigua and Barbuda': 'AG',
                  'Madagascar': 'MG', 'Anguilla': 'AI', 'Marshall Islands': 'MH', 'Albania': 'AL', 'North Macedonia': 'MK',
                  'Armenia': 'AM', 'Mali': 'ML', 'Angola': 'AO', 'Myanmar': 'MM', 'Antarctica': 'AQ', 'Mongolia': 'MN',
                  'Argentina': 'AR', 'Macao': 'MO', 'American Samoa': 'AS', 'Northern Mariana Islands': 'MP', 'Austria': 'AT',
                  'Martinique': 'MQ', 'Australia': 'AU', 'Mauritania': 'MR', 'Aruba': 'AW', 'Montserrat': 'MS', 'Åland Islands': 'AX',
                  'Malta': 'MT', 'Azerbaijan': 'AZ', 'Mauritius': 'MU', 'Bosnia and Herzegovina': 'BA', 'Maldives': 'MV',
                  'Barbados': 'BB', 'Malawi': 'MW', 'Bangladesh': 'BD', 'Mexico': 'MX', 'Belgium': 'BE', 'Malaysia': 'MY',
                  'Burkina Faso': 'BF', 'Mozambique': 'MZ', 'Bulgaria': 'BG', 'Namibia': 'NA', 'Bahrain': 'BH', 'New Caledonia': 'NC',
                  'Burundi': 'BI', 'Niger': 'NE', 'Benin': 'BJ', 'Norfolk Island': 'NF', 'Saint Barthélemy': 'BL', 'Nigeria': 'NG',
                  'Bermuda': 'BM', 'Nicaragua': 'NI', 'Brunei Darussalam': 'BN', 'Netherlands, Kingdom of': 'NL',
                  'Bolivia, Plurinational State of': 'BO', 'Norway': 'NO', 'Bonaire, Sint Eustatius and Saba': 'BQ',
                  'Nepal': 'NP', 'Brazil': 'BR', 'Nauru': 'NR', 'Bahamas': 'BS', 'Niue': 'NU', 'Bhutan': 'BT', 'New Zealand': 'NZ',
                  'Bouvet Island': 'BV', 'Oman': 'OM', 'Botswana': 'BW', 'Panama': 'PA', 'Belarus': 'BY', 'Peru': 'PE', 'Belize': 'BZ',
                  'French Polynesia': 'PF', 'Canada': 'CA', 'Papua New Guinea': 'PG', 'Cocos (Keeling) Islands': 'CC',
                  'Philippines': 'PH', 'Congo, Democratic Republic of': 'CD', 'Pakistan': 'PK', 'Central African Republic': 'CF',
                  'Poland': 'PL', 'Congo': 'CG', 'Saint Pierre and Miquelon': 'PM', 'Switzerland': 'CH', 'Pitcairn': 'PN',
                  "Côte d'Ivoire": 'CI', 'Puerto Rico': 'PR', 'Cook Islands': 'CK', 'Palestine, State of': 'PS', 'Chile': 'CL',
                  'Portugal': 'PT', 'Cameroon': 'CM', 'Palau': 'PW', 'China': 'CN', 'Paraguay': 'PY', 'Colombia': 'CO', 'Qatar': 'QA',
                  'Costa Rica': 'CR', 'Réunion': 'RE', 'Cuba': 'CU', 'Romania': 'RO', 'Cabo Verde': 'CV', 'Serbia': 'RS',
                  'Curaçao': 'CW', 'Russian Federation': 'RU', 'Christmas Island': 'CX', 'Rwanda': 'RW', 'Cyprus': 'CY',
                  'Saudi Arabia': 'SA', 'Czechia': 'CZ', 'Solomon Islands': 'SB', 'Germany': 'DE', 'Seychelles': 'SC', 'Djibouti': 'DJ',
                  'Sudan': 'SD', 'Denmark': 'DK', 'Sweden': 'SE', 'Dominica': 'DM', 'Singapore': 'SG', 'Dominican Republic': 'DO',
                  'Saint Helena, Ascension and Tristan da Cunha': 'SH', 'Algeria': 'DZ', 'Slovenia': 'SI', 'Ecuador': 'EC',
                  'Svalbard and Jan Mayen': 'SJ', 'Estonia': 'EE', 'Slovakia': 'SK', 'Egypt': 'EG', 'Sierra Leone': 'SL',
                  'Western Sahara': 'EH', 'San Marino': 'SM', 'Eritrea': 'ER', 'Senegal': 'SN', 'Spain': 'ES', 'Somalia': 'SO',
                  'Ethiopia': 'ET', 'Suriname': 'SR', 'Finland': 'FI', 'South Sudan': 'SS', 'Fiji': 'FJ', 'Sao Tome and Principe': 'ST',
                  'Falkland Islands (Malvinas)': 'FK', 'El Salvador': 'SV', 'Micronesia, Federated States of': 'FM',
                  'Sint Maarten (Dutch part)': 'SX', 'Faroe Islands': 'FO', 'Syrian Arab Republic': 'SY', 'France': 'FR',
                  'Eswatini': 'SZ', 'Gabon': 'GA', 'Turks and Caicos Islands': 'TC', 'United Kingdom of Great Britain and Northern Ireland': 'GB',
                  'Chad': 'TD', 'Grenada': 'GD', 'French Southern Territories': 'TF', 'Georgia': 'GE', 'Togo': 'TG', 'French Guiana': 'GF',
                  'Thailand': 'TH', 'Guernsey': 'GG', 'Tajikistan': 'TJ', 'Ghana': 'GH', 'Tokelau': 'TK', 'Gibraltar': 'GI',
                  'Timor-Leste': 'TL', 'Greenland': 'GL', 'Turkmenistan': 'TM', 'Gambia': 'GM', 'Tunisia': 'TN', 'Guinea': 'GN', 'Tonga': 'TO',
                  'Guadeloupe': 'GP', 'Türkiye': 'TR', 'Equatorial Guinea': 'GQ', 'Trinidad and Tobago': 'TT', 'Greece': 'GR', 'Tuvalu': 'TV', 'Italy': 'IT'}
inv_countries_dict = {v: k for k, v in countries_dict.items()}
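The two dictionaries allow lookups in both directions: name to code and code to name. The sketch below shows one possible way to resolve user input that may be either a name or a code, matching names case-insensitively. `resolve_country` is a hypothetical helper, and only a small sample of `countries_dict` is repeated so the block runs on its own.

```python
# Small sample of the notebook's countries_dict, repeated so the block runs on its own
countries_dict = {"Spain": "ES", "Bosnia and Herzegovina": "BA", "Italy": "IT"}
inv_countries_dict = {v: k for k, v in countries_dict.items()}

# Lower-cased names -> canonical names, for case-insensitive matching
_names_lower = {name.lower(): name for name in countries_dict}


def resolve_country(text):
    """Return (name, code) for a country name or code, or None if unknown."""
    cleaned = text.strip()
    if cleaned.upper() in inv_countries_dict:
        code = cleaned.upper()
        return inv_countries_dict[code], code
    name = _names_lower.get(cleaned.lower())
    if name is not None:
        return name, countries_dict[name]
    return None
```

Matching on lower-cased names avoids a pitfall of `str.title()`, which turns "Bosnia and Herzegovina" into "Bosnia And Herzegovina" and therefore misses the dictionary key.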

# Functions

## Get Token
### 'get_token()'
The function checks the token's remaining lifetime, so that we neither request a new token on every call nor end up without a valid one.

With this check, a new token is generated only if none exists or if the remaining time has reached 0, so the function can be called before every API request.

___
*Note that the access token is valid for 1 hour (3600 seconds).
After that time, the token expires and you need to request a new one.*

___
Documentation for the Base64 encoding step:

https://developer.spotify.com/documentation/web-api/tutorials/client-credentials-flow

https://github.com/spotify/web-api-examples/blob/master/authorization/client_credentials/app.js
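The encoding step described in those links amounts to Base64-encoding the string `client_id:client_secret` and sending it in a `Basic` authorization header. A minimal sketch, with a hypothetical helper name and placeholder credentials:

```python
import base64


def basic_auth_header(client_id, client_secret):
    """Build the HTTP Basic Authorization header from a client ID and secret."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Placeholder credentials (hypothetical values, not real ones)
header = basic_auth_header("abc", "123")
```

Base64 is an encoding, not encryption, so the header should only ever be sent over HTTPS.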

In [6]:
# Global variables that cache the token and its expiry time
global_access_token = None
token_expiry_time = 0

def get_token(print_token = False):
    """
    Gets an access token for the Spotify API using the client credentials flow.
    Args:
        print_token (bool, optional): Whether to print the token's remaining lifetime.
            Defaults to False.
    Returns:
        str: The Spotify API access token.
    """
    global global_access_token
    global token_expiry_time
    
    current_time = int(time.time())
    if global_access_token is None or current_time >= token_expiry_time:
        token_url = 'https://accounts.spotify.com/api/token'
        client_credentials = f"{client_id}:{client_secret}"
        client_credentials_b64 = base64.b64encode(client_credentials.encode()).decode()
        headers = {'Authorization': f'Basic {client_credentials_b64}'}
        params = {'grant_type': 'client_credentials'}
        response = post(token_url, headers=headers, data=params)
        response.raise_for_status()  # raise on HTTP errors such as invalid credentials
        token_data = response.json()
        global_access_token = token_data['access_token']
        # Treat the token as expired 60 seconds early, as a safety margin
        token_expiry_time = current_time + token_data['expires_in'] - 60

    if print_token:
        # Compute the token's remaining lifetime
        tiempo_restante = max(token_expiry_time - int(time.time()), 0)
        minutos_restantes, segundos_restantes = divmod(tiempo_restante, 60)
        print(f"{colors['light_gray']}Tiempo restante de token: {minutos_restantes} minutos {segundos_restantes} segundos{colors['reset']}")
        
    return global_access_token
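The caching logic can be exercised without network access by injecting the token source and the clock. The sketch below is an alternative formulation under those assumptions, not the notebook's implementation: `TokenCache`, `fake_fetch`, and the fake clock are hypothetical names introduced for the demo.

```python
import time


class TokenCache:
    """Cache a token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.time, margin=60):
        self._fetch_token = fetch_token  # callable returning (token, expires_in)
        self._clock = clock
        self._margin = margin            # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = now + expires_in - self._margin
        return self._token


# Demo with a fake fetcher and a fake clock (no network access)
calls = []
fake_now = [0.0]


def fake_fetch():
    calls.append(1)
    return f"token-{len(calls)}", 3600


cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()       # fetches a token
second = cache.get()      # served from the cache
fake_now[0] = 3540.0      # 3600 s lifetime minus the 60 s margin
third = cache.get()       # cache expired, fetches again
```

Keeping the state in an object instead of module-level globals makes the expiry rule easy to test, at the cost of passing the cache around.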

## Headers   
### 'get_headers()'

In [7]:
def get_headers(token):
    """
    Builds the headers needed to send a request to the Spotify API using an access token.
    Args:
        token (str): The Spotify API access token.
    Returns:
        dict: A dictionary with the headers required for the request, including the access token.
    Example:
    >>> access_token = get_token()
    >>> headers = get_headers(access_token)
    """
    return {"Authorization": f"Bearer {token}"}

## Country Codes
### get_country_code()

In [8]:
def get_country_code():
    """
    Prompts the user for a country name or code and returns the country code.
    Returns:
    str or None: The country code entered by the user, or None if it is not in the list.
    Note:
    For the full list of countries and codes, see the country code table earlier in this notebook.
    """
    user_input = input(f"{colors['yellow']}Ingresa el nombre o el código del país para buscar información: {colors['reset']}").strip()
    # Match names case-insensitively: str.title() would turn "Bosnia and Herzegovina"
    # into "Bosnia And Herzegovina", which is not a key of countries_dict
    names_lower = {name.lower(): code for name, code in countries_dict.items()}
    if user_input.lower() in names_lower:
        return names_lower[user_input.lower()]
    elif user_input.upper() in countries_dict.values():
        return user_input.upper()
    else:
        print("El país o código ingresado no se encuentra en la lista.")
        return None

## Track Attributes
### get_track_feature(id)

In [9]:
# Musical key dictionary (pitch class number -> key name)
key_dict = {
    -1: "No detectada",
    0: "C",
    1: "C♯/D♭",
    2: "D",
    3: "D♯/E♭",
    4: "E",
    5: "F",
    6: "F♯/G♭",
    7: "G",
    8: "G♯/A♭",
    9: "A",
    10: "A♯/B♭",
    11: "B"
}

In [10]:
# Helper function for acousticness
def is_acoustic(acousticness, umbral=0.5):
    """
    Determines whether a song is acoustic based on its acousticness value.
    Args:
    - acousticness (float): The song's acousticness value.
    - umbral (float): The threshold above which a song is considered acoustic. Defaults to 0.5.
    Returns:
    - bool: True if the song is acoustic, False otherwise.
    """
    return acousticness > umbral

In [11]:
# Print helper for get_track_feature
def print_features(features):
    """
    Prints the features of a Spotify track.
    Args:
    - features (dict): A dictionary with the track's features, such as acousticness, danceability, duration_ms, etc.
    """
    key = key_dict.get(features['key'], "No válida")
    print(f"Key {key}")
    dance_perc = "{:.2f}".format(features['danceability'] * 100)
    print(f"Bailable al {dance_perc}%")
    duracion_ms = features['duration_ms']
    duracion_seg = duracion_ms / 1000
    minutos = int(duracion_seg // 60)
    segundos = int(duracion_seg % 60)
    print(f"Duracion: {minutos}mins {segundos}segs")
    # Uses a lower threshold (0.3) than the default of 0.5 in is_acoustic
    acoustic = is_acoustic(features['acousticness'], umbral=0.3)
    print(f"Acústica? {acoustic}")
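The duration arithmetic in `print_features` can also be written with `divmod`, which returns minutes and seconds in one step. The sketch below is an illustrative alternative; `format_duration` is a hypothetical helper, not a function of this notebook.

```python
def format_duration(duration_ms):
    """Convert a duration in milliseconds to a 'minutes:seconds' string."""
    total_seconds = int(duration_ms // 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


example = format_duration(215000)
```

The `:02d` format pads the seconds with a leading zero, so 61 seconds reads as `1:01` rather than `1:1`.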

In [12]:
def get_track_feature(id, print_feat = False):
    """
    Gets the audio features of a Spotify track by its ID.
    Args:
        id (str): The Spotify track ID.
        print_feat (bool, optional): Whether to print the track's features.
            Defaults to False.
    Returns:
        dict: A dictionary with the track's features.
    """
    endpoint = f"https://api.spotify.com/v1/audio-features/{id}"
    token = get_token()
    headers = get_headers(token)
    response = get(url = endpoint, headers = headers)
    data = response.json()
    if print_feat:
        print_features(data)
    return data

## Top 50 by Country
### top50_by_country()

In [13]:
def top50_by_country(country=None, print_feat=False):
    """
    Gets the Top 50 playlist of a given country from Spotify.
    Args:
    - country (str, optional): The name of the country whose Top 50 is wanted.
    If omitted, the user is prompted for the country via standard input.
    - print_feat (bool, optional): Whether to print each track's popularity, artist and audio features.
    Returns:
    - tuple: A list with the IDs of the songs in the country's Top 50, and a list of dictionaries,
    one per track, holding the track's information together with its audio features.
    """
    if country is None:
        country = input(f"{colors['yellow']}Top 50 de...{colors['reset']}")
    token = get_token()
    headers = get_headers(token)
    # Search for the playlist; passing params lets requests URL-encode the query
    search_url = "https://api.spotify.com/v1/search"
    params = {"q": f"top 50 {country}", "type": "playlist"}
    response = get(url=search_url, headers=headers, params=params)
    data = response.json()
    # The first search result is taken as the country's Top 50 playlist
    playlist_id = data['playlists']['items'][0]['id']
    playlist_url = f"https://api.spotify.com/v1/playlists/{playlist_id}"
    response = get(url=playlist_url, headers=headers)
    data = response.json()
    ids = []
    tracks_info = []
    print(f"{colors['green']}-----   TOP 50 de {country.upper()}     -----{colors['reset']}")
    for i, song in enumerate(data['tracks']['items'], 1):
        print(f"{colors['cyan']}{i}.{colors['reset']}")
        print(f"{colors['blue']}Cancion > {colors['reset']}{song['track']['name']}")
        track_id = song['track']['id']
        ids.append(track_id)
        # Fetch the full track object to read its popularity
        endpoint = f"https://api.spotify.com/v1/tracks/{track_id}"
        track_response = get(url=endpoint, headers=headers)
        track_data = track_response.json()
        if print_feat:
            print(f"{colors['blue']}Popularidad > {colors['reset']}{track_data['popularity']}")
            print(f"{colors['blue']}Artista > {colors['reset']}{song['track']['artists'][0]['name']}")
        features = get_track_feature(track_id, print_feat)
        track_info = {
            'Pais': country.upper(),
            'Orden en el Top 50': i,
            'Nombre de la Cancion': song['track']['name'],
            'Popularidad': track_data['popularity'],
            'Artista': song['track']['artists'][0]['name']
        }
        # Merge the audio features into the same row
        track_info.update(features)
        tracks_info.append(track_info)
    return ids, tracks_info
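`top50_by_country` issues two requests per track, so a run over many countries can hit the API's rate limit. The sketch below shows one possible retry wrapper, assuming the API answers a rate-limited request with status 429 and a `Retry-After` header giving the wait in seconds. `get_with_retry` and `FakeResponse` are hypothetical names, and the demo uses fake responses instead of real requests.

```python
import time


def get_with_retry(fetch, max_retries=3, sleep=time.sleep):
    """Call fetch() and retry while the response status is 429."""
    for attempt in range(max_retries + 1):
        response = fetch()
        if response.status_code != 429 or attempt == max_retries:
            return response
        # Retry-After holds the number of seconds to wait; default to 1 if absent
        wait_seconds = int(response.headers.get("Retry-After", 1))
        sleep(wait_seconds)


class FakeResponse:
    """Stand-in for a requests.Response (hypothetical, for the demo only)."""

    def __init__(self, status_code, headers=None):
        self.status_code = status_code
        self.headers = headers or {}


# Demo: the first call is rate limited, the second succeeds
queue = [FakeResponse(429, {"Retry-After": "2"}), FakeResponse(200)]
waits = []
result = get_with_retry(lambda: queue.pop(0), sleep=waits.append)
```

Inside the loop a call could then look like `get_with_retry(lambda: get(url=endpoint, headers=headers))`.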

## Top Artist by Genre

### top_artist_by_genre(genre)
### get_artist_top10_tracks_features(artist_id)

In [14]:
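def get_artist_by_genre(genre):
    """
    Searches Spotify for artists of a given genre.
    Args:
        genre (str): The music genre to search for.
    Returns:
        requests.Response: The raw search response, limited to 10 artists.
    """
    token = get_token()
    url = 'https://api.spotify.com/v1/search'
    headers = get_headers(token)
    params = {
        'q': f'genre:"{genre}"',
        'type': 'artist',
        'limit': 10
    }

    response = get(url, headers=headers, params=params)
    return response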

In [15]:
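def get_artist_top10_tracks_features(href):
    """
    Gets the top tracks of an artist.
    Args:
        href (str): The artist's API URL (https://api.spotify.com/v1/artists/{id}).
    Returns:
        dict: The parsed JSON response of the top-tracks endpoint.
    """
    token = get_token()
    headers = get_headers(token)
    # https://api.spotify.com/v1/artists/{id}/top-tracks
    endpoint = f"{href}/top-tracks"
    response = get(url=endpoint, headers=headers)
    top_info = response.json()
    return top_info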
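The two functions above are meant to be chained: the artist search yields an `href` per artist, which is then extended with `/top-tracks`. The sketch below illustrates that hand-off, assuming the search payload nests artists under `artists` → `items` with `name` and `href` fields. `extract_artists` is a hypothetical helper and the payload is made-up sample data, not real API output.

```python
def extract_artists(search_payload):
    """Return (name, href) pairs from an artist search payload."""
    items = search_payload.get("artists", {}).get("items", [])
    return [(artist["name"], artist["href"]) for artist in items]


# Hypothetical payload with the shape assumed above (not real API output)
sample_payload = {
    "artists": {
        "items": [
            {"name": "Artist A", "href": "https://api.spotify.com/v1/artists/idA"},
            {"name": "Artist B", "href": "https://api.spotify.com/v1/artists/idB"},
        ]
    }
}

pairs = extract_artists(sample_payload)
top_tracks_urls = [f"{href}/top-tracks" for _, href in pairs]
```

Using `.get` with empty defaults means an error payload without an `artists` key yields an empty list instead of a `KeyError`.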

# DATAFRAMES

## DataFrame Creation - Top 50 by Country
### df_create_top50()

In [16]:
def df_normalize_country(country):
    """
    Normalizes a country name: converts a country code into its full name.
    Parameters:
    country (str): The name or code of the country.
    Returns:
    str: The normalized country name, or the original input if it is not a code in the inverse dictionary.
    """
    normalized_name = inv_countries_dict.get(country.upper())
    if normalized_name:
        return normalized_name
    return country

In [17]:
def df_get_country_list():
    """
    Gets a list of countries entered by the user.
    Returns:
    list: A list of normalized country names entered by the user.
    """
    countries = []
    # Lower-cased name -> canonical name, so multi-word names such as
    # "bosnia and herzegovina" match regardless of capitalization
    canonical = {name.lower(): name for name in countries_dict}
    print(f"{colors['blue']}¿De qué países deseas obtener el TOP 50?{colors['reset']}")
    print(f"{colors['yellow']}Inserta 'STOP' cuando hayas acabado{colors['reset']}")
    while True:
        answer = input().strip()
        if answer.lower() == "stop":
            break
        normalized_country = df_normalize_country(answer)
        normalized_country = canonical.get(normalized_country.lower(), normalized_country.title())
        if normalized_country in countries_dict:
            if normalized_country not in countries:
                countries.append(normalized_country)
                print(f"{colors['green']}'{normalized_country}' agregado a la lista{colors['reset']}")
            else:
                print(f"{colors['yellow']}El país '{normalized_country}' ya está en la lista.{colors['reset']}")
        else:
            print(f"{colors['red']}El país '{normalized_country}' no está en la lista de países válidos.\nPor favor, intenta de nuevo.{colors['reset']}")
    print(countries)
    return countries

In [18]:
def df_create_top50():
    """
    Creates a new DataFrame combining the Top 50 songs of several countries.
    Uses df_get_country_list() to get the list of countries and top50_by_country(country) to get the Top 50 songs of each country.
    Returns:
    pandas.DataFrame: A DataFrame with the combined Top 50 information of several countries.
    """
    # Ask which countries' Top 50 should be retrieved
    countries = df_get_country_list()
    # Collect the Top 50 data; top50_by_country returns (ids, tracks_info)
    all_tracks_info = []
    for country in countries:
        _, tracks_info = top50_by_country(country)
        all_tracks_info.extend(tracks_info)
    # Build the DataFrame, one row per track
    df = pd.DataFrame(all_tracks_info)
    # Save the unprocessed DataFrame as a pickle file
    directory = "./df_unprocessed/"
    os.makedirs(directory, exist_ok=True)
    df.to_pickle(os.path.join(directory, "df_top50_unprocessed.pkl"))
    return df
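Since `top50_by_country` returns a tuple `(ids, tracks_info)`, the per-country results have to be merged into one flat list of row dictionaries before building the DataFrame. The sketch below isolates that step in plain Python; `flatten_tracks` is a hypothetical helper and the IDs and song titles are made-up sample values.

```python
def flatten_tracks(results):
    """Merge (ids, tracks_info) tuples from several countries into one row list."""
    rows = []
    for _ids, tracks_info in results:
        rows.extend(tracks_info)
    return rows


# Hypothetical results for two countries (made-up IDs and titles)
results = [
    (["id1", "id2"], [{"Pais": "SPAIN", "Nombre de la Cancion": "Song 1"},
                      {"Pais": "SPAIN", "Nombre de la Cancion": "Song 2"}]),
    (["id3"], [{"Pais": "ITALY", "Nombre de la Cancion": "Song 3"}]),
]

rows = flatten_tracks(results)  # rows can be passed directly to pd.DataFrame(rows)
```

A list of dictionaries with the same keys maps directly onto DataFrame rows, so no per-key copying is needed.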

In [19]:
def feature_explanation(columna):
    """
    Prints the explanation of the specified column.
    Args:
    columna (str): The name of the column to look up.
    Returns:
    None
    """
    explicaciones = {
        'Pais': 'The country the song belongs to.',
        'Orden en el Top 50': "The song's position in its country's Top 50.",
        'Nombre de la Cancion': 'The name of the song.',
        'Popularidad': 'The popularity of the song.',
        'Artista': 'The name of the artist or group performing the song.',
        'danceability': 'Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.',
        'energy': 'Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.',
        'key': 'The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.\nRange: -1 - 11',
        'loudness': 'The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 db.',
        'mode': 'Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.',
        'speechiness': 'Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.',
        'acousticness': 'A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.\nRange: 0 - 1\nExample: 0.00242',
        'instrumentalness': 'Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.',
        'liveness': 'Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.',
        'valence': 'A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).\nRange: 0 - 1',
        'tempo': 'The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.',
        'duration_ms': 'The duration of the track in milliseconds.',
        'time_signature': 'An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7 indicating time signatures of "3/4", to "7/4".\nRange: 3 - 7',
        'duration_mins': 'The duration of the track in minutes:seconds'
    }

    if columna in explicaciones:
        print(f"{colors['cyan']}'{columna}':{colors['reset']} {explicaciones[columna]}")
    else:
        print(f"{colors['red']}No se encontró una explicación para la columna '{columna}'.{colors['reset']}")

In [20]:
%%time

df = df_create_top50()

¿De qué países deseas obtener el TOP 50?
Inserta 'STOP' cuando hayas acabado
 es
'Spain' agregado a la lista
 argentina
'Argentina' agregado a la lista
 colombia
'Colombia' agregado a la lista
 peru
'Peru' agregado a la lista
 mexico
'Mexico' agregado a la lista
 portugal
'Portugal' agregado a la lista
 brazil
'Brazil' agregado a la lista
 france
'France' agregado a la lista
 italy
'Italy' agregado a la lista
 germany
'Germany' agregado a la lista
 greece
'Greece' agregado a la lista
 sweeden
El país 'Sweeden' no está en la lista de países válidos.
Por favor, intenta de nuevo.
 sweden
'Sweden' agregado a la lista
 norway
'Norway' agregado a la lista
 au
'Australia' agregado a la lista
 ca
'Canada' agregado a la lista
 egypt
'Egypt' agregado a la lista
 stop
['Spain', 'Argentina', 'Colombia', 'Peru', 'Mexico', 'Portugal', 'Brazil', 'France', 'Italy', 'Germany', 'Greece', 'Sweden', 'Norway', 'Australia', 'Canada', 'Egypt']
-----   TOP 50 de SPAIN     -----
1.
Cancion > BADGYAL
2.
Cancion > Santa
3.
Cancion > ADIVINO
4.
Cancion > X'CLUSIVO - REMIX
5.
Cancion > El Conjuntito
6.
Cancion > Gata Only
7.
Cancion > YO LO SOÑÉ
8.
Cancion > La Vida Sin Ti
9.
Cancion > LUNA
10.
Cancion > LA RANGER (feat. Myke Towers)
11.
Cancion > BBY BOO - REMIX
12.
Cancion > LA SEVILLANA - SEVILLANAS
13.
Cancion > 100xCiento
14.
Cancion > Guay
15.
Cancion > Lo Que Tiene
16.
Cancion > FARDOS
17.
Cancion > LA FALDA
18.
Cancion > CRUSH

## DataFrame Creation - Top 10 by Genre
### get_top10_data()

In [21]:
def top_artist_by_genre(genre):
    """
    Look up the top artists of a given genre through the Spotify API.
    Args:
        genre (str): Musical genre whose top artists are searched.
    Returns:
        DataFrame or None: One row per (artist, top track) pair, or None if the request failed.
    """
    response = get_artist_by_genre(genre)

    if response.status_code != 200:
        print(f'Error al buscar artistas: {response.status_code}')
        return None

    data = response.json()
    rows = []
    for i, subdata in enumerate(data['artists']['items']):
        href = subdata['href']
        # One request per artist; the top tracks are read from the 'tracks' key.
        artist_new_data = get_artist_top10_tracks_features(href)
        for j, info in enumerate(artist_new_data['tracks']):
            # Build one complete row per track so every column has the same length.
            rows.append({
                'artists_top10': subdata['name'],
                'followers_top10': subdata['followers']['total'],
                'position_top10': i + 1,
                'hrefs': href,
                'track_position': j + 1,
                'track_name': info['name'],
                'track_id': info['id'],
                'album': info['album']['name'],
                'release_date': info['album']['release_date'],
            })

    return pd.DataFrame(rows)
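
A DataFrame built from nested artist and track data needs one complete record per (artist, track) pair, so that every column ends up with the same number of values. The sketch below shows that flattening step on mock data, without calling the API. The helper name `flatten_top_tracks` and all values are made up for illustration; only the field names (`name`, `followers.total`, `href`, `tracks`, `album.release_date`) follow the ones used in this notebook.

```python
def flatten_top_tracks(artists, tracks_by_href):
    """Return one dict per (artist, track) pair (hypothetical helper)."""
    rows = []
    for artist_pos, artist in enumerate(artists, start=1):
        tracks = tracks_by_href[artist["href"]]["tracks"]
        for track_pos, track in enumerate(tracks, start=1):
            rows.append({
                "artists_top10": artist["name"],
                "followers_top10": artist["followers"]["total"],
                "position_top10": artist_pos,
                "track_position": track_pos,
                "track_name": track["name"],
                "track_id": track["id"],
                "album": track["album"]["name"],
                "release_date": track["album"]["release_date"],
            })
    return rows


# Mock payloads with made-up values, shaped like the fields read above.
artists = [
    {"name": "Artist A", "followers": {"total": 100}, "href": "href-a"},
    {"name": "Artist B", "followers": {"total": 50}, "href": "href-b"},
]
tracks_by_href = {
    "href-a": {"tracks": [
        {"name": "Song 1", "id": "t1", "album": {"name": "Album X", "release_date": "2020-01-01"}},
        {"name": "Song 2", "id": "t2", "album": {"name": "Album X", "release_date": "2020-01-01"}},
    ]},
    "href-b": {"tracks": [
        {"name": "Song 3", "id": "t3", "album": {"name": "Album Y", "release_date": "2021-06-15"}},
    ]},
}

rows = flatten_top_tracks(artists, tracks_by_href)
print(len(rows))  # 3: two tracks for Artist A, one for Artist B
```

A list of such dicts can be passed directly to `pd.DataFrame`, which avoids keeping several parallel lists in sync.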

In [22]:
def ask_genres():
    generos_musicales = [
        "Pop",
        "Rock",
        "Hip-Hop/Rap",
        "Electrónica/Dance",
        "Reggae",
        "Country",
        "Jazz",
        "Clásica",
        "R&B/Soul",
        "Indie",
        "Folk",
        "Reguetón",
        "Metal",
        "Blues",
        "Punk",
        "Funk",
        "Ska",
        "Gospel",
        "Dubstep",
        "Ambient"
    ]
    
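    # Case-insensitive lookup: str.capitalize() would turn "hip-hop/rap" into
    # "Hip-hop/rap", which never matches "Hip-Hop/Rap" (same for "R&B/Soul").
    lookup = {g.lower(): g for g in generos_musicales}

    genres = []
    while True:
        genre = input("Ingresa un género musical (o 'fin' para terminar): ").strip()
        if genre.lower() == 'fin':
            break
        elif genre.lower() in lookup:
            genres.append(lookup[genre.lower()])
        else:
            print(f"El género '{genre}' no es válido. Por favor, intenta nuevamente.")

    return genres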
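
Matching user input against mixed-case names such as `Hip-Hop/Rap` or `R&B/Soul` is easiest with a case-insensitive lookup, because `str.capitalize()` lowercases everything after the first character. A minimal sketch, assuming a shortened genre list; `VALID_GENRES` and `normalize_genre` are hypothetical names used only for this example:

```python
# Shortened, illustrative list; the notebook defines its own full list.
VALID_GENRES = ["Pop", "Hip-Hop/Rap", "R&B/Soul", "Electrónica/Dance"]


def normalize_genre(raw, valid=VALID_GENRES):
    """Return the canonical genre name for `raw`, or None if it is not in `valid`."""
    lookup = {name.casefold(): name for name in valid}  # casefolded key -> canonical name
    return lookup.get(raw.strip().casefold())


print("hip-hop/rap".capitalize())        # Hip-hop/rap (no longer equal to "Hip-Hop/Rap")
print(normalize_genre(" hip-hop/rap "))  # Hip-Hop/Rap
print(normalize_genre("polka"))          # None
```

Returning the canonical spelling keeps the values stored in the `genre` column consistent regardless of how the user typed them.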

In [23]:
def get_top10_data():
    """
    Ask for genres, collect the top tracks of each genre's top artists together
    with their audio features, and save the merged, unprocessed DataFrame as a pickle.
    Returns:
        DataFrame: Track metadata left-merged with the audio features on 'id'.
    """
    genres = ask_genres()

    top10_dict = {
        'genre': [],
        'artists_top10': [],
        'followers_top10': [],
        'position_top10': [],
        'track_position': [],
        'track_name': [],
        'id': [],
        'album': [],
        'release_date': []
    }

    all_tracks_info = []

    for genre in genres:
        print(genre)
        response = get_artist_by_genre(genre)
        if response.status_code != 200:
            # Skip this genre; otherwise `data` would be undefined or left over
            # from the previous genre.
            print(f'Error al buscar artistas: {response.status_code}')
            continue
        data = response.json()
        for i, subdata in enumerate(data['artists']['items']):
            href = subdata['href']
            artist_new_data = get_artist_top10_tracks_features(href)
            for j, info in enumerate(artist_new_data['tracks']):
                track_id = info['id']
                top10_dict['genre'].append(genre)
                top10_dict['artists_top10'].append(subdata['name'])
                top10_dict['followers_top10'].append(subdata['followers']['total'])
                top10_dict['position_top10'].append(i + 1)
                top10_dict['track_position'].append(j + 1)
                top10_dict['track_name'].append(info['name'])
                top10_dict['id'].append(track_id)
                top10_dict['album'].append(info['album']['name'])
                top10_dict['release_date'].append(info['album']['release_date'])
                all_tracks_info.append(get_track_feature(track_id))
    df1 = pd.DataFrame(top10_dict)
    df2 = pd.DataFrame(all_tracks_info)
    merged_df = pd.merge(df1, df2, on='id', how='left')
    directory = "./df_unprocessed/"
    os.makedirs(directory, exist_ok=True)
    merged_df.to_pickle(os.path.join(directory, "df_top10_unprocessed.pkl"))
    return merged_df
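
A left merge on `id` emits one output row for every matching pair of rows. If the feature list contains the same track `id` more than once (for example, because one track sits in two artists' top tracks), every matching track row is repeated, which is one plausible cause of identical consecutive rows in a merged result. The sketch below reproduces the effect with plain lists of dicts instead of pandas; `left_join_on_id` and `dedupe_by_id` are hypothetical helpers and the data is made up. In pandas, the analogous step would be calling `drop_duplicates(subset='id')` on the features frame before merging.

```python
def left_join_on_id(left_rows, right_rows):
    """Left join two lists of dicts on 'id'; every right-side match yields one row."""
    joined = []
    for left in left_rows:
        matches = [right for right in right_rows if right["id"] == left["id"]]
        if not matches:
            joined.append(dict(left))        # keep unmatched left rows, as a left join does
        for right in matches:
            joined.append({**left, **right})
    return joined


def dedupe_by_id(rows):
    """Keep only the first row seen for each 'id'."""
    first_seen = {}
    for row in rows:
        first_seen.setdefault(row["id"], row)
    return list(first_seen.values())


# The same track appears under two artists, so its features were collected twice.
tracks = [{"id": "t1", "artist": "Artist A"}, {"id": "t1", "artist": "Artist B"}]
features = [{"id": "t1", "tempo": 120.0}, {"id": "t1", "tempo": 120.0}]

print(len(left_join_on_id(tracks, features)))                # 4: each track row is doubled
print(len(left_join_on_id(tracks, dedupe_by_id(features))))  # 2: one row per track row
```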

In [24]:
%%time

df2 = get_top10_data()

Ingresa un género musical (o 'fin' para terminar):  pop
Ingresa un género musical (o 'fin' para terminar):  rock
Ingresa un género musical (o 'fin' para terminar):  metal
Ingresa un género musical (o 'fin' para terminar):  indie
Ingresa un género musical (o 'fin' para terminar):  blues
Ingresa un género musical (o 'fin' para terminar):  jazz
Ingresa un género musical (o 'fin' para terminar):  ambient
Ingresa un género musical (o 'fin' para terminar):  country
Ingresa un género musical (o 'fin' para terminar):  dubstep
Ingresa un género musical (o 'fin' para terminar):  fin


Pop
Rock
Metal
Indie
Blues
Jazz
Ambient
Country
Dubstep
CPU times: total: 46.2 s
Wall time: 5min 17s


In [25]:
df2

|  | genre | artists_top10 | followers_top10 | position_top10 | track_position | track_name | id | album | release_date | danceability | ... | instrumentalness | liveness | valence | tempo | type | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pop | Taylor Swift | 109763338 | 1 | 1 | Fortnight (feat. Post Malone) | 2OzhQlSqBEmt7hmkYxfT6m | THE TORTURED POETS DEPARTMENT | 2024-04-18 | 0.504 | ... | 0.000015 | 0.0961 | 0.281 | 192.004 | audio_features | spotify:track:2OzhQlSqBEmt7hmkYxfT6m | https://api.spotify.com/v1/tracks/2OzhQlSqBEmt... | https://api.spotify.com/v1/audio-analysis/2Ozh... | 228965 | 4 |
| 1 | Pop | Taylor Swift | 109763338 | 1 | 1 | Fortnight (feat. Post Malone) | 2OzhQlSqBEmt7hmkYxfT6m | THE TORTURED POETS DEPARTMENT | 2024-04-18 | 0.504 | ... | 0.000015 | 0.0961 | 0.281 | 192.004 | audio_features | spotify:track:2OzhQlSqBEmt7hmkYxfT6m | https://api.spotify.com/v1/tracks/2OzhQlSqBEmt... | https://api.spotify.com/v1/audio-analysis/2Ozh... | 228965 | 4 |
| 2 | Pop | Taylor Swift | 109763338 | 1 | 2 | I Can Do It With a Broken Heart | 4q5YezDOIPcoLr8R81x9qy | THE TORTURED POETS DEPARTMENT | 2024-04-18 | 0.701 | ... | 0.000000 | 0.1500 | 0.220 | 129.994 | audio_features | spotify:track:4q5YezDOIPcoLr8R81x9qy | https://api.spotify.com/v1/tracks/4q5YezDOIPco... | https://api.spotify.com/v1/audio-analysis/4q5Y... | 218005 | 4 |
| 3 | Pop | Taylor Swift | 109763338 | 1 | 3 | Down Bad | 2F3N9tdombb64aW6VtZOdo | THE TORTURED POETS DEPARTMENT | 2024-04-18 | 0.541 | ... | 0.000001 | 0.0946 | 0.168 | 159.707 | audio_features | spotify:track:2F3N9tdombb64aW6VtZOdo | https://api.spotify.com/v1/tracks/2F3N9tdombb6... | https://api.spotify.com/v1/audio-analysis/2F3N... | 261228 | 4 |
| 4 | Pop | Taylor Swift | 109763338 | 1 | 4 | Cruel Summer | 1BxfuPKGuaTgP7aM0Bbdwr | Lover | 2019-08-23 | 0.552 | ... | 0.000021 | 0.1050 | 0.564 | 169.994 | audio_features | spotify:track:1BxfuPKGuaTgP7aM0Bbdwr | https://api.spotify.com/v1/tracks/1BxfuPKGuaTg... | https://api.spotify.com/v1/audio-analysis/1Bxf... | 178427 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1105 | Dubstep | SVDDEN DEATH | 223166 | 10 | 6 | Behemoth VIP | 0tkwUKjhuBA5UuA95JY2bU | VOYD Vol. 1.5 | 2019-08-09 | 0.850 | ... | 0.567000 | 0.0302 | 0.564 | 139.878 | audio_features | spotify:track:0tkwUKjhuBA5UuA95JY2bU | https://api.spotify.com/v1/tracks/0tkwUKjhuBA5... | https://api.spotify.com/v1/audio-analysis/0tkw... | 114857 | 4 |
| 1106 | Dubstep | SVDDEN DEATH | 223166 | 10 | 7 | Behemoth | 4o7Rszx7VVCzrCr1RPlPot | VOYD Vol. I | 2018-07-30 | 0.891 | ... | 0.142000 | 0.0768 | 0.555 | 139.933 | audio_features | spotify:track:4o7Rszx7VVCzrCr1RPlPot | https://api.spotify.com/v1/tracks/4o7Rszx7VVCz... | https://api.spotify.com/v1/audio-analysis/4o7R... | 192000 | 4 |
| 1107 | Dubstep | SVDDEN DEATH | 223166 | 10 | 8 | Burn It Down | 7q8P4LHGckqsQt4uFnUQ5N | MELLODEATH Tapes Vol. I | 2024-03-08 | 0.550 | ... | 0.000114 | 0.5050 | 0.568 | 140.024 | audio_features | spotify:track:7q8P4LHGckqsQt4uFnUQ5N | https://api.spotify.com/v1/tracks/7q8P4LHGckqs... | https://api.spotify.com/v1/audio-analysis/7q8P... | 185143 | 4 |
| 1108 | Dubstep | SVDDEN DEATH | 223166 | 10 | 9 | Blood On Me | 4dblKUfR2u2iQXCQ82awv6 | Blood On Me | 2020-08-21 | 0.720 | ... | 0.000447 | 0.1200 | 0.346 | 140.043 | audio_features | spotify:track:4dblKUfR2u2iQXCQ82awv6 | https://api.spotify.com/v1/tracks/4dblKUfR2u2i... | https://api.spotify.com/v1/audio-analysis/4dbl... | 192000 | 4 |


# Saving the Unprocessed DataFrames

In [26]:
# This is already built into the create function
# directory = "./df_unprocessed/"
# if not os.path.exists(directory):
#     os.makedirs(directory)
# df.to_pickle("./df_unprocessed/df_top50_unprocessed.pkl")
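
The commented-out lines follow a create-directory-then-pickle pattern. A minimal sketch of the same pattern using only the standard library, with `pathlib` and `pickle` standing in for `os.makedirs` and `DataFrame.to_pickle`; the helper name `save_unprocessed`, the file name `demo.pkl`, and the sample data are assumptions made for this example:

```python
import pickle
import tempfile
from pathlib import Path


def save_unprocessed(obj, directory, filename):
    """Pickle `obj` into `directory`, creating the directory first if needed."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    target = target_dir / filename
    with target.open("wb") as fh:
        pickle.dump(obj, fh)
    return target


# Round trip inside a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = save_unprocessed([{"id": "t1"}], Path(tmp) / "df_unprocessed", "demo.pkl")
    with path.open("rb") as fh:
        restored = pickle.load(fh)

print(restored)  # [{'id': 't1'}]
```

Passing `exist_ok=True` removes the need for a separate existence check before creating the directory.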