# Model Plan

As Satisfaction Scouts, we want to generate the ultimate ordered lineup for a concert. To do that, we need to measure similarities among artists so that we can predict a cohesive lineup from the user's preferences.

To run this model, we convert our dataframe of tracks into a dataframe of artists (average_data) and a dataframe of genres (genre_data), using means for quantitative features and lists for qualitative features. The two dataframes are independent of each other. That way, as the user inputs their preferences, the program can narrow down the options before applying K-Nearest Neighbors.

There are three important components that we want to consider: artist, genre, and explicit content, all of which will be retrieved from the user. When running the code, the program will first ask the user to input an artist.

## User Input: Artist 

When the user inputs the artist, the program will look for the artist in average_data, and will output that artist's list of genres. If the artist is not in the dataframe, the program will state that the artist cannot be found and will be terminated. The program will then ask the user to input a genre from that list.

## User Input: Genre

When the user inputs a genre from that list, a 2-Nearest Neighbors model will be implemented using genre_data to retrieve two similar genres. The two similar genres, as well as the user input genre, will be appended to a list of three potential genres for the five similar artists. Using those three genres, a new dataframe will be created (new_df) from average_data, where it will be a dataframe of all artists within those three genres. The program will then ask the user whether they want their five recommended artists to have explicit content. If the user doesn't input a genre from the list given, the program will state that the genre is not listed for that artist and will be terminated.

## User Input: Explicit

When the user inputs their explicit preference and either inputs "no" or an invalid response, the program will refer to new_df and remove all the artists that have "True" in their explicit lists. This is to ensure no recommended artist has explicit content. If the user inputs "yes", new_df will remain the same. Note that if the user says "yes", this does not guarantee that the five similar artists will be explicit. Also, if the user inputs an explicit artist, but doesn't want the recommendations to be explicit, the code ensures that the artist will not be removed from new_df.

## Output

Using our cleaned new_df, a 5-Nearest Neighbors model will be implemented to find 5 similar artists. The program will output those 5 artists.

## Limitations and Potential Issues

The dataframes used may contain dead or retired artists. Some tracks also have multiple artists, in which case the artist_popularity statistic refers to the first listed artist. This may affect the average statistics for artists and genres.

After extensive cleaning of the original dataframe, we are left with very few tracks to work with, which makes the recommendations less accurate. We would need more tracks in our dataframe after cleaning to provide more accurate recommendations.
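
One possible way to soften the multi-artist issue, sketched on a toy frame rather than on dataset.csv: split the `artists` string on `;` and explode it so that each credited artist gets their own row before averaging. The column name `artist` and the toy values are assumptions made for illustration, and this sketch does not address artist_popularity, which would still belong to the first listed artist.

```python
import pandas as pd

# Toy stand-in for the track-level dataframe (values are illustrative only)
tracks = pd.DataFrame({
    "artists": ["A;B", "A", "B;C"],
    "danceability": [0.4, 0.6, 0.8],
})

# One row per (track, credited artist); "artist" is a hypothetical new column
exploded = tracks.assign(artist=tracks["artists"].str.split(";")).explode("artist")

# Average each feature per individual artist instead of per artist combination
per_artist = exploded.groupby("artist")["danceability"].mean()
print(per_artist)
```

With this layout, a collaboration such as `A;B` contributes to the averages of both `A` and `B` instead of forming a separate "artist" of its own.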

In [37]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from datetime import datetime, timedelta

# Load Data

In [38]:
data = pd.read_csv("../Data/dataset.csv")

In [39]:
data

| | track_id | artists | album_name | track_name | popularity | artist_popularity | release_dates | duration_ms | explicit | danceability | ... | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature | track_genre |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5SuOikwiRyPMVoIQDJUgSV | Gen Hoshino | Comedy | Comedy | 73 | 58 | 2022-04-08 | 230666 | False | 0.676 | ... | -6.746 | 0 | 0.1430 | 0.03220 | 0.000001 | 0.3580 | 0.7150 | 87.917 | 4 | World/Folk |
| 1 | 4qPNDBW1i3p13qLCt0Ki3A | Ben Woodward | Ghost (Acoustic) | Ghost - Acoustic | 55 | 42 | 2021-04-30 | 149610 | False | 0.420 | ... | -17.235 | 1 | 0.0763 | 0.92400 | 0.000006 | 0.1010 | 0.2670 | 77.489 | 4 | World/Folk |
| 2 | 1iJBSr7s7jYXzM8EGcbK5b | Ingrid Michaelson;ZAYN | To Begin Again | To Begin Again | 57 | 54 | 2021-03-17 | 210826 | False | 0.438 | ... | -9.734 | 1 | 0.0557 | 0.21000 | 0.000000 | 0.1170 | 0.1200 | 76.332 | 4 | World/Folk |
| 3 | 6lfxq3CG4xtTiEg7opyCyx | Kina Grannis | Crazy Rich Asians (Original Motion Picture Sou... | Can't Help Falling In Love | 71 | 57 | 2018-08-10 | 201933 | False | 0.266 | ... | -18.515 | 1 | 0.0363 | 0.90500 | 0.000071 | 0.1320 | 0.1430 | 181.740 | 3 | World/Folk |
| 4 | 5vjLSffimiIP26QG5WcN2K | Chord Overstreet | Hold On | Hold On | 82 | 59 | 2017-02-03 | 198853 | False | 0.618 | ... | -9.681 | 1 | 0.0526 | 0.46900 | 0.000000 | 0.0829 | 0.1670 | 119.949 | 4 | World/Folk |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 29234 | 6X6wIzuxsh7GVNMPz1xTNa | Hillsong Worship | No Other Name | Depths | 37 | 70 | 2014-07-01 | 377697 | False | 0.363 | ... | -8.232 | 1 | 0.0284 | 0.00887 | 0.000009 | 0.7060 | 0.0687 | 80.003 | 4 | World/Folk |
| 29235 | 5y8ARSg47Yx52xvQQAlS35 | Mosaic MSC | HUMAN (Deluxe) [Live] | Fountain (I Am Good) - Live | 22 | 52 | 2020-10-09 | 318874 | False | 0.438 | ... | -8.285 | 1 | 0.0357 | 0.02060 | 0.000013 | 0.2530 | 0.1140 | 139.983 | 4 | World/Folk |
| 29236 | 6PM55W7WiUmHVPdUebJP55 | Planetshakers | Greater (Live) | Stay (You Are Good) - Live | 38 | 56 | 2022-09-02 | 462397 | False | 0.296 | ... | -5.696 | 1 | 0.0548 | 0.07240 | 0.000003 | 0.3740 | 0.1460 | 139.051 | 4 | World/Folk |
| 29237 | 0XEgJiDryoDd2gIJhVXghd | Bryan & Katie Torwalt;Brock Human | I've Got Good News (Live) [Deluxe] | Hallelujah On My Knees - Live | 22 | 47 | 2022-07-01 | 380344 | False | 0.495 | ... | -12.070 | 1 | 0.0316 | 0.39200 | 0.000000 | 0.6620 | 0.2060 | 127.731 | 4 | World/Folk |


In [40]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29239 entries, 0 to 29238
Data columns (total 22 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   track_id           29239 non-null  object 
 1   artists            29239 non-null  object 
 2   album_name         29239 non-null  object 
 3   track_name         29239 non-null  object 
 4   popularity         29239 non-null  int64  
 5   artist_popularity  29239 non-null  int64  
 6   release_dates      29239 non-null  object 
 7   duration_ms        29239 non-null  int64  
 8   explicit           29239 non-null  bool   
 9   danceability       29239 non-null  float64
 10  energy             29239 non-null  float64
 11  key                29239 non-null  int64  
 12  loudness           29239 non-null  float64
 13  mode               29239 non-null  int64  
 14  speechiness        29239 non-null  float64
 15  acousticness       29239 non-null  float64
 16  instrumentalness   292

# Data Cleaning and Organization

## Peter's Genre Mapping + Data Cleaning

In [41]:
# Mapping dictionary
genre_mapping = {
    'acoustic': 'World/Folk',
    'afrobeat': 'World/Folk',
    'alt-rock': 'Rock',
    'ambient': 'Electronic',
    'anime': 'World/Folk',
    'black-metal': 'Rock',
    'bluegrass': 'World/Folk',
    'blues': 'World/Folk',
    'breakbeat': 'Electronic',
    'british': 'Rock',
    'chicago-house': 'Electronic',
    'chill': 'Electronic',
    'classical': 'Classical',
    'club': 'Electronic',
    'comedy': 'Other/Miscellaneous',
    'country': 'World/Folk',
    'dance': 'Pop',
    'dancehall': 'World/Folk',
    'death-metal': 'Rock',
    'deep-house': 'Electronic',
    'detroit-techno': 'Electronic',
    'disco': 'Pop',
    'drum-and-bass': 'Electronic',
    'dub': 'Electronic',
    'dubstep': 'Electronic',
    'edm': 'Pop',
    'electro': 'Electronic',
    'electronic': 'Electronic',
    'emo': 'Rock',
    'folk': 'World/Folk',
    'forro': 'World/Folk',
    'funk': 'World/Folk',
    'garage': 'World/Folk',
    'goth': 'Rock',
    'grindcore': 'Rock',
    'groove': 'World/Folk',
    'grunge': 'Rock',
    'guitar': 'World/Folk',
    'happy': 'Other/Miscellaneous',
    'hard-rock': 'Rock',
    'hardcore': 'Rock',
    'hardstyle': 'Electronic',
    'heavy-metal': 'Rock',
    'hip-hop': 'Hip-Hop/Rap',
    'honky-tonk': 'World/Folk',
    'house': 'Electronic',
    'idm': 'Electronic',
    'indie': 'Rock',
    'industrial': 'Rock',
    'j-dance': 'Electronic',
    'j-pop': 'Pop',
    'j-rock': 'Rock',
    'jazz': 'Jazz',
    'malay': 'World/Folk',
    'mandopop': 'Pop',
    'metal': 'Rock',
    'metalcore': 'Rock',
    'minimal-techno': 'Electronic',
    'mpb': 'World/Folk',
    'new-age': 'Classical',
    'party': 'Other/Miscellaneous',
    'piano': 'Classical',
    'pop-film': 'Pop',
    'pop': 'Pop',
    'power-pop': 'Pop',
    'progressive-house': 'Electronic',
    'psych-rock': 'Rock',
    'punk-rock': 'Rock',
    'punk': 'Rock',
    'r-n-b': 'Pop',
    'reggae': 'World/Folk',
    'reggaeton': 'World/Folk',
    'rock-n-roll': 'Rock',
    'rock': 'Rock',
    'rockabilly': 'World/Folk',
    'romance': 'Other/Miscellaneous',
    'sad': 'Other/Miscellaneous',
    'show-tunes': 'Other/Miscellaneous',
    'singer-songwriter': 'World/Folk',
    'ska': 'World/Folk',
    'sleep': 'Other/Miscellaneous',
    'soul': 'World/Folk',
    'study': 'Other/Miscellaneous',
    'synth-pop': 'Pop',
    'tango': 'World/Folk',
    'techno': 'Electronic',
    'trance': 'Electronic',
    'trip-hop': 'Electronic',
    'world-music': 'World/Folk'
}

# Apply the mapping
data['gen_genre'] = data['track_genre'].replace(genre_mapping)
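
Note that `Series.replace` with a dictionary leaves any value that is not a key of the dictionary unchanged, which may be why a lowercase `alternative` row shows up in genre_data later on. A minimal sketch of how such pass-through labels could be detected, using a hypothetical two-entry miniature of the mapping and a made-up `track_genre` column:

```python
import pandas as pd

# Hypothetical miniature of the mapping and of the track_genre column
genre_mapping = {"acoustic": "World/Folk", "alt-rock": "Rock"}
track_genre = pd.Series(["acoustic", "alt-rock", "alternative"])

gen_genre = track_genre.replace(genre_mapping)

# Any value that is not one of the mapping's target labels passed through unmapped
known_labels = set(genre_mapping.values())
unmapped = sorted(set(gen_genre) - known_labels)
print(unmapped)
```

Running the same check against the full mapping would list every source genre that still needs an entry in `genre_mapping`.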



## Rachel's Data Cleaning

For the most accurate (and current) results, we need to do further data cleaning. The dataset has many features that a concertgoer doesn't really think about, and some artists in the dataset may be dead or no longer perform. As a group, we've decided to focus on popularity, artist popularity, genre, acousticness, danceability, energy, instrumentalness, tempo, and valence. The artist must also have an artist popularity of at least 50, so that the lineup draws on well-known artists. We will have the user decide whether the concert will have explicit content.

In [42]:
#Drop duration_ms, key, loudness, mode, speechiness, liveness, and time_signature
data = data.drop(['duration_ms', 'key', 'loudness', 'mode', 'speechiness', 'liveness', 'time_signature'], axis=1)

#Drop rows of tracks where artist popularity is less than 50
data = data[data['artist_popularity'] >= 50]

#Drop Other/Misc
data = data[data['gen_genre'] != 'Other/Miscellaneous']


In [43]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10820 entries, 0 to 29238
Data columns (total 16 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   track_id           10820 non-null  object 
 1   artists            10820 non-null  object 
 2   album_name         10820 non-null  object 
 3   track_name         10820 non-null  object 
 4   popularity         10820 non-null  int64  
 5   artist_popularity  10820 non-null  int64  
 6   release_dates      10820 non-null  object 
 7   explicit           10820 non-null  bool   
 8   danceability       10820 non-null  float64
 9   energy             10820 non-null  float64
 10  acousticness       10820 non-null  float64
 11  instrumentalness   10820 non-null  float64
 12  valence            10820 non-null  float64
 13  tempo              10820 non-null  float64
 14  track_genre        10820 non-null  object 
 15  gen_genre          10820 non-null  object 
dtypes: bool(1), float64(6), int

## Future Notes for Data Cleaning

Further data cleaning needs to be performed to ensure that the artist is alive and active. The artist's latest track's release date should fall within the last 20 years.

In [44]:
#Adjust for datetime and clean data
def fill_missing_date_parts(date_string):
    #try to parse the date as a full date
    try: 
        return datetime.strptime(date_string, '%Y-%m-%d')
    except ValueError:
        try:
            return datetime.strptime(date_string, '%Y').replace(month=1, day = 1)
        except ValueError:
            return None

data['release_dates']=data['release_dates'].apply(fill_missing_date_parts)
data["release_dates"] = pd.to_datetime(data["release_dates"])

#Add 20 year cut off
cutoff_date = datetime.now() - timedelta(days=20*365)
most_recent_release = data.groupby('artists')['release_dates'].max()
data = data[~data['artists'].isin(most_recent_release[most_recent_release <= cutoff_date].index)].copy()
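
A small caveat on the cutoff: `timedelta(days=20*365)` counts days and ignores leap days, so it lands a few days short of 20 calendar years. If an exact calendar cutoff matters, `pd.DateOffset(years=20)` is one option. The sketch below compares the two, using a fixed date (an assumption, chosen only to make the example reproducible) in place of the current time.

```python
import pandas as pd

# Fixed "today" so the example is reproducible; the notebook uses the current time
today = pd.Timestamp("2024-06-01")

cutoff_days = today - pd.Timedelta(days=20 * 365)   # day-count approach
cutoff_years = today - pd.DateOffset(years=20)      # calendar-year approach

print(cutoff_days.date(), cutoff_years.date())
```

For a 20-year window the difference is only a handful of days, so this is unlikely to change which artists are kept.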

## Organizing for New Dataframes

## Create a List of Genres and Explicit for Each Artist

This is to prepare for organizing our current dataset by artist, rather than by track. This also keeps categorical variables consistent with each other.

In [45]:
artist_genres = data.groupby('artists')['gen_genre'].agg(list).reset_index()
#Duplicate Removal and Sorting
artist_genres['gen_genre'] = artist_genres['gen_genre'].apply(lambda x: sorted(list(set(x))))

artist_genres

| | artists | gen_genre |
|---|---|---|
| 0 | &ME;Rampa;Adam Port;Sofie Royer | [Electronic] |
| 1 | 1991 | [Electronic] |
| 2 | 220 KID;LANY | [Electronic] |
| 3 | 24kGoldn;iann dior | [Pop] |
| 4 | 3 Doors Down | [Rock] |
| ... | ... | ... |
| 4255 | yaeow;Gustixa | [Electronic] |
| 4256 | yaeow;Neptune | [Electronic] |
| 4257 | yaeow;Roiael | [Electronic] |
| 4258 | yaeow;Rxseboy | [Electronic] |


In [46]:
artist_explicit = data.groupby('artists')['explicit'].agg(list).reset_index()

#Duplicate Removal and Sorting
artist_explicit['explicit'] = artist_explicit['explicit'].apply(lambda x: sorted(list(set(x))))

artist_explicit


| | artists | explicit |
|---|---|---|
| 0 | &ME;Rampa;Adam Port;Sofie Royer | [False] |
| 1 | 1991 | [False] |
| 2 | 220 KID;LANY | [False] |
| 3 | 24kGoldn;iann dior | [True] |
| 4 | 3 Doors Down | [False] |
| ... | ... | ... |
| 4255 | yaeow;Gustixa | [False] |
| 4256 | yaeow;Neptune | [False] |
| 4257 | yaeow;Roiael | [False] |
| 4258 | yaeow;Rxseboy | [False] |
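
Because the explicit column holds lists such as `[False, True]`, later filtering has to test `True in x`. As an alternative sketch (not what this notebook does), the same information can be kept as a single boolean per artist with `groupby(...).any()`. The frame and the name `has_explicit` below are hypothetical.

```python
import pandas as pd

# Toy track-level data: artist A has one clean and one explicit track
tracks = pd.DataFrame({
    "artists": ["A", "A", "B"],
    "explicit": [False, True, False],
})

# True if the artist has at least one explicit track
has_explicit = tracks.groupby("artists")["explicit"].any()
print(has_explicit)
```

A boolean column like this could then be filtered directly, at the cost of losing the distinction between "only explicit tracks" and "some explicit tracks".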


Merge the categorical features into a single dataframe to make the later merge with the numerical averages easier.

In [47]:
categorical_features  = pd.merge(artist_explicit, artist_genres, on='artists')


In [48]:
categorical_features

| | artists | explicit | gen_genre |
|---|---|---|---|
| 0 | &ME;Rampa;Adam Port;Sofie Royer | [False] | [Electronic] |
| 1 | 1991 | [False] | [Electronic] |
| 2 | 220 KID;LANY | [False] | [Electronic] |
| 3 | 24kGoldn;iann dior | [True] | [Pop] |
| 4 | 3 Doors Down | [False] | [Rock] |
| ... | ... | ... | ... |
| 4255 | yaeow;Gustixa | [False] | [Electronic] |
| 4256 | yaeow;Neptune | [False] | [Electronic] |
| 4257 | yaeow;Roiael | [False] | [Electronic] |
| 4258 | yaeow;Rxseboy | [False] | [Electronic] |


# Make a New Dataframe Organized by Artist and Corresponding Average Statistics

Get numerical features

In [49]:
numerical_features = ['popularity', 'artist_popularity', 'danceability', 'energy', 'acousticness', 'instrumentalness', 'valence', 'tempo']

Group dataset.csv by artist by taking the average of track statistics for each artist, named average_data

In [50]:
average_data = data.groupby('artists')[numerical_features].mean().reset_index()

Sort by highest to lowest artist popularity for organizational purposes

In [51]:
average_data = average_data.sort_values(by='artist_popularity', ascending = False)

Merge categorical dataframe to average_data dataset

In [52]:
average_data = pd.merge(average_data, categorical_features, on='artists')


In [53]:
average_data

| | artists | popularity | artist_popularity | danceability | energy | acousticness | instrumentalness | valence | tempo | explicit | gen_genre |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Taylor Swift | 86.000000 | 100.0 | 0.532000 | 0.623000 | 0.538000 | 0.000073 | 0.4030 | 89.937000 | [False] | [Pop] |
| 1 | Drake;Travis Scott | 83.000000 | 93.0 | 0.666000 | 0.465000 | 0.050300 | 0.000000 | 0.2920 | 167.937000 | [True] | [Hip-Hop/Rap] |
| 2 | Drake;21 Savage | 91.000000 | 93.0 | 0.529000 | 0.673000 | 0.000307 | 0.000002 | 0.3660 | 165.921000 | [True] | [Hip-Hop/Rap] |
| 3 | The Weeknd;Daft Punk | 3.000000 | 91.0 | 0.773000 | 0.820000 | 0.394000 | 0.000000 | 0.5550 | 92.996000 | [False] | [Pop] |
| 4 | Travis Scott;HVME | 81.000000 | 90.0 | 0.841000 | 0.593000 | 0.418000 | 0.000000 | 0.8080 | 124.917000 | [True] | [Hip-Hop/Rap] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4255 | Noisia;The Outsiders | 28.000000 | 50.0 | 0.493000 | 0.739000 | 0.082000 | 0.139000 | 0.1480 | 114.473000 | [False] | [Electronic] |
| 4256 | Noisia;Skrillex;josh pan;Dylan Brady | 48.000000 | 50.0 | 0.416000 | 0.506000 | 0.040400 | 0.020200 | 0.0613 | 172.028000 | [False] | [Electronic] |
| 4257 | Noisia;NickBee | 15.000000 | 50.0 | 0.752000 | 0.760000 | 0.007030 | 0.872000 | 0.1280 | 171.987000 | [False] | [Electronic] |
| 4258 | Noisia;Nami | 29.000000 | 50.0 | 0.720000 | 0.950000 | 0.002930 | 0.653000 | 0.5190 | 174.028000 | [False] | [Electronic] |


# Make a New Dataframe Organized by Genre and Corresponding Average Statistics

All performers in the concert lineup don't necessarily have to be part of the same genre. When the user selects a genre, we can implement a KNN model on the genres to find 2 similar genres. Using the artist genre and 2 similar genres, the program will then find 5 similar artists within those 3 genres.

In [54]:
genre_data = data.groupby('gen_genre')[numerical_features].mean().reset_index()

In [55]:
genre_data

| | gen_genre | popularity | artist_popularity | danceability | energy | acousticness | instrumentalness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Classical | 28.183417 | 63.59799 | 0.383638 | 0.232918 | 0.812447 | 0.602179 | 0.330703 | 112.923462 |
| 1 | Electronic | 46.713243 | 60.179701 | 0.586163 | 0.640287 | 0.263953 | 0.294265 | 0.351985 | 123.955766 |
| 2 | Hip-Hop/Rap | 60.185455 | 69.305455 | 0.692847 | 0.694964 | 0.235709 | 0.022028 | 0.536811 | 120.101505 |
| 3 | Jazz | 44.714286 | 59.857143 | 0.590571 | 0.287729 | 0.708571 | 0.162766 | 0.382771 | 119.716 |
| 4 | Pop | 55.392857 | 67.80662 | 0.623386 | 0.661229 | 0.291776 | 0.026446 | 0.520079 | 119.320132 |
| 5 | Rock | 50.79669 | 64.632624 | 0.497906 | 0.722808 | 0.160878 | 0.099073 | 0.412126 | 124.227524 |
| 6 | World/Folk | 43.73494 | 60.330374 | 0.568499 | 0.593934 | 0.364937 | 0.089631 | 0.487335 | 120.968816 |
| 7 | alternative | 52.867188 | 68.945312 | 0.599031 | 0.718727 | 0.129419 | 0.021977 | 0.462616 | 120.378727 |


# Predict 5 Similar Artists Using User Input, genre_data, and average_data

We will implement two KNN models. The user inputs an artist, and the program outputs the genres associated with that artist. The user then selects one of those genres, and the program finds two similar genres using genre_data. From those three genres, a new dataframe is constructed containing all artists associated with any of them. This new dataframe may be cleaned further depending on the user's explicit preferences. A second KNN model is then applied to the new dataframe to find 5 similar artists.

Since the first KNN model will be implemented on genre_data, we will (for now) focus on genre_data.

## Preprocessing genre_data

Define features and target for genres

In [56]:
#Genre
features_genre = genre_data.drop(columns=['gen_genre'])
target_genre = genre_data['gen_genre']

Scale numerical features for genres

In [57]:
scaler = StandardScaler()

X_genre = genre_data[numerical_features]  # quantitative features of genres
X_genre_scaled = scaler.fit_transform(X_genre)  # Scale the quantitative features

## Train KNN Model for genre_data

In [58]:
#Split data into training and testing sets
X_genre_train, X_genre_test, y_genre_train, y_genre_test = train_test_split(features_genre, target_genre, test_size=0.2, random_state=440)

In [59]:
#Train the k-NN model
knn_genre = NearestNeighbors(n_neighbors=2,metric='cosine') #cosine is used for recommendation systems
knn_genre.fit(X_genre_train)

## Model Execution

In [60]:
# Get input artist from the user
input_artist = input("Enter the name of the artist: ").lower()

# Find the index of the input artist in the dataset
input_index = average_data.index[average_data['artists'].str.lower() == input_artist].tolist()

# If input artist not found
if not input_index:
    print("Artist not found.")
else:
    # Find genre list for input_artist
    input_artist_genres = average_data.loc[input_index, 'gen_genre'].values[0]
    
    # Print genres for user    
    print(f"Genres of {average_data.loc[input_index, 'artists'].values[0]}: {input_artist_genres}")

    # Get preferred genre of artist from the user
    selected_genre = input("Select a genre from the list above: ").lower()

    if selected_genre not in [genre.lower() for genre in input_artist_genres]: 
        print("Genre not found for selected artist.")
    else:
        potential_genres = [] #used to make new dataframe of artists
        
        #-------------------START: 2-Nearest Neighbors for selected Genre, using genre_data -------------------
        
        # Index of selected genre in genre_data
        genre_index = genre_data.index[genre_data['gen_genre'].str.lower() == selected_genre].tolist()
        
        # Get the features of the selected genre
        query_features_genre = features_genre.iloc[genre_index]
        
        # Find 2 similar genres
        distances, indices = knn_genre.kneighbors(query_features_genre)
        
        #Get the statistics of similar genres from genre_data
        similar_genres = genre_data.iloc[indices[0]]
        
        #Drop genre identical to user input
        row_drop =[]
        similar_genres_distances = distances[0] 
        for i in range(len(similar_genres_distances)): 
            if similar_genres_distances[i] == 0: 
                row_drop.append(similar_genres.iloc[i].name) 
                similar_genres = similar_genres.drop(index = row_drop) 
                
        #-------------------END: 2-Nearest Neighbors for selected Genre, using genre_data -------------------

        #Store genre neighbors and user input genres in potential_genres
        for index in indices[0]:
            genre_name = genre_data.iloc[index]['gen_genre']
            potential_genres.append(genre_name)
        potential_genres.append(genre_data.loc[genre_index, 'gen_genre'].values[0])
        print(f"The 5 similar artists will come from the following genres: {potential_genres}")
        
        #Make dataframe of potential artists with the potential genres
        new_df = average_data[average_data['gen_genre'].apply(lambda x: potential_genres[0] in x or 
                                                                potential_genres[1] in x or
                                                                potential_genres[2] in x)]
        
        #Ask user if they prefer explicit content
        input_explicit = str(input("Do you want the recommendations to include explicit artists? Type Yes or No: "))
        true = ['yes']
        false = ['no']
        
        # Convert user input to lowercase for case-insensitive comparison
        input_explicit = input_explicit.lower()
        
        # Check if user input is 'yes' or 'no' and convert it to a boolean
        if input_explicit in true:
            input_explicit = True
            print("Based on your response, some artists may have explicit content.")
        elif input_explicit in false:
            input_explicit = False
            print("Based on your response, no artists will have explicit content.")
            
            # Remove explicit artists
            new_df = new_df[(new_df['artists'].str.lower() == input_artist.lower()) | (~new_df['explicit'].apply(lambda x: True in x))]
        else:
            print("Invalid input. No artists will have explicit content.")
            input_explicit = False
            
            # Remove explicit artists
            new_df = new_df[(new_df['artists'].str.lower() == input_artist.lower()) | (~new_df['explicit'].apply(lambda x: True in x))]

            
        #-------------------START: 5-Nearest Neighbors for selected artist, using new_df -------------------
        
        #------------Preprocessing new_df------------
        
        #Define features and target for new_df 
        
        #explicit and genre won't count as features since the user inputted their preferences
        features_artist = new_df.drop(columns=['artists','gen_genre', 'explicit'])
        target_artist = new_df['artists']
        
        #Scale numerical features for artists
        X_artist = new_df[numerical_features]  # quantitative features of artists
        X_artist_scaled = scaler.fit_transform(X_artist)  # Scale the quantitative features
        
        #------------Train KNN Model for new_df------------
        
        #Split data into training and testing sets
        X_artist_train, X_artist_test, y_artist_train, y_artist_test = train_test_split(
            features_artist, target_artist, test_size=0.2, random_state=440)
        
        #Train the KNN Model
        knn_artist = NearestNeighbors(n_neighbors=5,metric='cosine') #cosine is used for recommendation systems
        knn_artist.fit(X_artist_train)
        
        #------------Execution------------
        
        # Find the index of the input artist in new_df (has to be in new_df since selected genre from artist is included)
        new_input_index = new_df.index[new_df['artists'].str.lower() == input_artist].tolist()
        
        # Get the features of the input artist
        query_features_artist = features_artist.iloc[new_input_index]

        # Find k similar artists
        distances, indices = knn_artist.kneighbors(query_features_artist)

        # Get the details of similar artists from the dataset
        similar_artists = new_df.iloc[indices[0]]
        
        #-------------------END: 5-Nearest Neighbors for selected artist, using new_df -------------------

        # Print the details of similar artists
        print("Similar Artists:")
        print(similar_artists)

Genres of Coldplay: ['Pop']
The 5 similar artists will come from the following genres: ['Classical', 'Jazz', 'Pop']
Based on your response, some artists may have explicit content.
Similar Artists:
                                                artists  popularity  \
1972                                   Eric Prydz;Floyd        54.0   
218                          The Chainsmokers;Bob Moses        60.0   
146                             Elton John;Dua Lipa;PS1        64.0   
1494                         Amit Trivedi;Jasleen Royal        65.0   
1492  Shaan;Udit Narayan;Shreya Ghoshal;Sunidhi Chau...        59.0   

      artist_popularity  danceability  energy  acousticness  instrumentalness  \
1972               61.0         0.537   0.937       0.00102          0.035000   
218                79.0         0.707   0.582       0.01530          0.000005   
146                80.0         0.655   0.877       0.00191          0.000000   
1494               65.0         0.688   0.701       

## Current Issues With the Model that Need to be Addressed

For potential_genres, we want the user's selected genre plus 2 similar genres, for a total of 3 unique genres. Right now, despite the code that removes a neighbor identical to the user input, there are still cases where a neighbor is identical to the user input. We need to fix the code so that this does not happen.

Because of this issue, the model's code does not yet handle the same possibility for the artist, where one of the 5 nearest neighbors could be identical to the user input. Once we figure out the fix for the genres, we can do the same for the artists.
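
One likely cause, offered as a diagnosis rather than a confirmed fix: knn_genre is fit on X_genre_train, so the indices returned by `kneighbors` are row positions in the training split, not in genre_data, and the selected genre may have been held out entirely, in which case no neighbor has distance 0. The sketch below fits on every genre, asks for one extra neighbor, and drops the query by position. It uses three columns of genre_data rounded from the table above; the function name `similar_genres` and the choice to scale before applying cosine distance are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Three columns of genre_data, rounded from the table shown earlier
genre_data = pd.DataFrame({
    "gen_genre": ["Classical", "Electronic", "Hip-Hop/Rap", "Jazz", "Pop", "Rock"],
    "danceability": [0.384, 0.586, 0.693, 0.591, 0.623, 0.498],
    "energy": [0.233, 0.640, 0.695, 0.288, 0.661, 0.723],
    "acousticness": [0.812, 0.264, 0.236, 0.709, 0.292, 0.161],
})
feature_cols = ["danceability", "energy", "acousticness"]

def similar_genres(selected_genre, k=2):
    """Return the k genres closest to selected_genre, never the genre itself."""
    X = StandardScaler().fit_transform(genre_data[feature_cols])
    # Fit on every genre so returned indices are row positions in genre_data
    knn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(X)
    # Row position of the selected genre, independent of index labels
    matches = (genre_data["gen_genre"] == selected_genre).to_numpy()
    query_pos = np.flatnonzero(matches)[0]
    _, indices = knn.kneighbors(X[[query_pos]])
    # Drop the query by position rather than by testing distance == 0
    neighbors = [i for i in indices[0] if i != query_pos][:k]
    return genre_data.iloc[neighbors]["gen_genre"].tolist()

potential_genres = similar_genres("Pop") + ["Pop"]
print(potential_genres)
```

Since this is a similarity lookup rather than a supervised prediction, there is no held-out target to evaluate, so the sketch skips `train_test_split` altogether.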
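
The same idea could carry over to the artist step. A second thing to watch there is that new_input_index is taken from `new_df.index` (labels) but is then used with `.iloc` (positions), and the two differ once rows have been filtered out. The following is a minimal sketch under stated assumptions: the frame is a made-up stand-in for new_df, only two features are used, and `similar_artists` is a hypothetical helper, not the notebook's code.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Toy stand-in for new_df; artist names and values are made up
new_df = pd.DataFrame({
    "artists": ["A", "B", "C", "D", "E", "F", "G"],
    "energy": [0.62, 0.60, 0.95, 0.30, 0.65, 0.20, 0.58],
    "valence": [0.40, 0.42, 0.10, 0.80, 0.35, 0.90, 0.45],
}, index=[10, 3, 7, 42, 5, 8, 21])  # non-sequential labels, as after filtering

def similar_artists(df, input_artist, feature_cols, k=5):
    """Return the k rows of df nearest to input_artist, excluding that artist."""
    X = StandardScaler().fit_transform(df[feature_cols])
    n_neighbors = min(k + 1, len(df))
    knn = NearestNeighbors(n_neighbors=n_neighbors, metric="cosine").fit(X)
    # Positional index of the query row, independent of df's index labels
    matches = (df["artists"].str.lower() == input_artist.lower()).to_numpy()
    query_pos = np.flatnonzero(matches)[0]
    _, indices = knn.kneighbors(X[[query_pos]])
    # Remove the query row by position, then keep at most k neighbors
    keep = [i for i in indices[0] if i != query_pos][:k]
    return df.iloc[keep]

result = similar_artists(new_df, "a", ["energy", "valence"])
print(result["artists"].tolist())
```

Requesting `k + 1` neighbors and discarding the query guarantees up to `k` distinct recommendations even when the input artist is its own nearest neighbor.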