This notebook is to show the process of gathering the top K-pop artists' Spotify ID for the last 25-20 years.
Since there is not objective and comprehensive list of all famous K-pop artists, I used the following two approaches to get as many and objective information as possible.

(1) Ask ChatGPT about the top 50 K-pop artists as of 2010, 2012, 2015, 2018, 2020
(2) Get the all artists from Spotify official playlists "Top-KPop Artists of 2022", "Millenium K-Pop", "Best of 2018: K-Pop" 



In [1]:
import pandas as pd
import numpy as np
!pip install ipykernel




In [3]:
# Config to use spotify API
!pip install spotipy
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import sys
import pprint

from config import my_id, my_secret

ccm = SpotifyClientCredentials(my_id, my_secret)
spotify = spotipy.Spotify(client_credentials_manager = ccm, language='en')

Collecting spotipy
  Downloading spotipy-2.23.0-py3-none-any.whl (29 kB)
Collecting redis>=3.5.3 (from spotipy)
  Downloading redis-4.5.5-py3-none-any.whl (240 kB)
     -------------------------------------- 240.3/240.3 kB 1.1 MB/s eta 0:00:00
Collecting async-timeout>=4.0.2 (from redis>=3.5.3->spotipy)
  Using cached async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Installing collected packages: async-timeout, redis, spotipy
  Attempting uninstall: async-timeout
    Found existing installation: async-timeout 4.0.1
    Uninstalling async-timeout-4.0.1:
      Successfully uninstalled async-timeout-4.0.1
Successfully installed async-timeout-4.0.2 redis-4.5.5 spotipy-2.23.0


In [18]:
# Using Spotify API to get the artists' ID from the playlist and save it in a csv file

# Create a list of the playlist's ID
playlists = ["37i9dQZF1DWUtMIvjJU4QQ", "37i9dQZF1DX5CHqBODmn2c", "37i9dQZF1DWUoY6Ih7vsxr"]


# How do I get the artist ID from the playlist?
# Create a function to get the artist ID from the playlist
def get_artist_id(playlist_id):
    artist_id = []
    for playlist in playlists:
        results = spotify.playlist(playlist_id)
        for item in results['tracks']['items']:
            track = item['track']
            artist_id.append(track['artists'][0]['id'])
    return artist_id

# Using the function to get the artist ID from the playlist

artists_id = []
for playlist in playlists:
    artist_id = get_artist_id(playlist)
    print(artist_id)
    artists_id.append(artist_id)

['3Nrfpe0tUJi4K4DXYWgMUX', '41MozSoPIsD1dJM0CLPjZF', '7n2Ycct7Beij7Dj7meI4X0', '2dIgFjalVxs4ThymZ67YCE', '7nqOGRxlXj7N2JYbgNEjYH', '0ghlgldX5Dd6720Q3qFyQB', '5t5FqBwTcgKTaWmfEbwQY9', '2KC9Qb60EaY0kW4eH68vr3', '6HaGTQPmzraVmaVxvz6EUc', '2AfmfGFbe0A0WsTYm0SDTx', '1z4g3DjTBBZKhvAroFlhOM', '6YVMFz59CuY7ngCxTxjpxE', '5L1lO4eRHmJ7a0Q6csE5cT', '6RHTUrRF63xao58xh9FXYJ', '1gBUSTR3TyDdTVFIaQnc02', '0b1sIQumIAsNbqAoIClSpy', '3HqSLMAZ3g3d5poNaI7GOU', '68KmkJeZGfwe1OUaivBa2L', '7f4ignuCJhLXfZ9giKT7rH', '3cjEqqelV9zb4BYE3qDQ4O', '3KonOYiLsU53m4yT7gNotP', '3JsHnjpbhX4SnySpvpa9DK', '3qNVuliS40BLgXGxhdBdqu', '4SpbR6yFEvexJuaBpgAU5p', '6HvZYsbFfjnjFrWF950C9d', '01XYiBYaoMJcNhPokrg0l0', '2dd5mrQZvg6SmahdgVKDzh', '4Kxlr1PRlDKEB0ekOCyHgX', '4TnGh5PKbSjpYqpIdlW5nz', '5R7AMwDeroq6Ls0COQYpS4', '52zMTJCKluDlFwMQWmccY7', '64k5e9kV9MdukXjFrR5R37', '5RmQ8k4l3HZ8JoPb4mNsML', '4XDi67ZENZcbfKnvMnTYsI', '3OBkZ9NG8F0Fn4oNpg0yuU', '0Sadg1vgvaPqGTOjxu0N6c', '0XATRDCYuuGhk0oE7C0o5G', '1VwDG9aBflQupaFNjUru9A', '1oSPZhvZMI

In [28]:
# Now we have a list of artists' ID from the playlist
artists_id

# Now artists_id has 3 lists of artists' ID from three playlists. I need to combine them into one list and make it a dataframe
# Create a function to combine the lists
def combine_list(lists):
    combined_list = []
    for list in lists:
        combined_list += list
    return combined_list

# Using the function to combine the lists
combined_list = combine_list(artists_id)

# make it a dataframe
df = pd.DataFrame(combined_list, columns=['Spotify_ID'])


In [29]:
df2 = pd.read_csv("Kpop_artist.csv")
df2 = df2.drop(columns=['Artist', 'artist_lower'])

# Combine the two dataframes 
df3 = pd.concat([df, df2], axis=0, ignore_index=True)

In [31]:
# Remove the duplicates
df3 = df3.drop_duplicates(subset=['Spotify_ID'], keep='first')

In [33]:
# Now we have a dataframe with all the artists' ID from the playlist and the Kpop artists' ID
print(df3)

# Export to csv
df3.to_csv("kpop_artist_id.csv")

                 Spotify_ID
0    3Nrfpe0tUJi4K4DXYWgMUX
1    41MozSoPIsD1dJM0CLPjZF
2    7n2Ycct7Beij7Dj7meI4X0
3    2dIgFjalVxs4ThymZ67YCE
4    7nqOGRxlXj7N2JYbgNEjYH
..                      ...
843  1Y02Ts0nV3DTziLQZYVX4V
846  0CmvFWTX9zmMNCUi6fHtAx
849  2GB76Lm833jVI5kLvKEB7Z
850  2HDRkPTO5A59sFcC2ROGbw
854  6lGfLCig2b5mvDTtsPSrb0

[186 rows x 1 columns]


Now I gathered Spotify artists ID of 185 top K-pop artists.
I might need a data cleaning because it includes non-Kpop artists with whom K-pop artists collaborated.