Dear xxxxxxxx,

We are thrilled to welcome you as a Data Analyst for Gnoosic!

As you know, we are trying to come up with ways to enhance our music recommendations. One of the new features we'd like to research is to recommend songs (not only bands). We're also aware of the limitations of our collaborative filtering algorithms, and would like to give users two new possibilities when searching for recommendations:

- Songs that are actually similar to the ones they picked from an acoustic point of view.
- Songs that are popular around the world right now, independently of their tastes.

Coming up with the perfect song recommender will take us months - no need to stress out too much. In this first week, we want you to explore new data sources for songs. The internet is full of information, and our first step is to acquire it and do an initial exploration. Feel free to use APIs or directly scrape the web to collect as much information as possible about popular songs. Eventually, we'll need to collect data from millions of songs, but we can start with a few hundred or a few thousand from each source and see if the collected features are useful.

Once the data is collected, we want you to create clusters of songs that are similar to each other. The idea is that if a user inputs a song from one group, we'll prioritize giving them recommendations of songs from that same group.

On Friday, you will present your work to me and Marek, the CEO and founder. Full disclosure: I need you to be very convincing about this whole song recommender, as this has been my personal push and the main reason we hired you!

Be open-minded about this process: we are agile, and that means that we define our products and features on the go, while exploring the tools and the data available to us. We'd love for you to provide your own vision of the product and the next steps to be taken.

Lots of luck and strength for this first week with us!

Jane


## Imports

### Libraries and Spotify Connection

In [1]:
import pandas as pd
import numpy as np
import random
import pickle
import spotipy
from random import randint
from IPython.display import clear_output

from spotipy.oauth2 import SpotifyClientCredentials

In [2]:
secrets_file = open("/Users/NicolasVollmerMac/Documents/Ironhack-Lessons/6.5 API wrappers, Spotipy/secrets.txt", "r")

In [3]:
string = secrets_file.read()
secrets_file.close()

In [4]:
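# Parse "key: value" lines from the secrets file into a dictionary
secrets_dict = {}
for line in string.split('\n'):
    if len(line) > 0:
        # Split on the first colon only, so values that contain ':' stay intact
        key, value = line.split(':', 1)
        secrets_dict[key] = value.strip()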
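
The later cells call `sp.search` and `sp.audio_features`, but the cell that creates `sp` is not shown in this notebook. The following is a minimal sketch of how the parsed secrets could be turned into a client, not necessarily how it was done here. The helper names `parse_secrets` and `make_spotify_client`, and the key names `clientid` and `clientsecret`, are assumptions; the real key names depend on what the secrets file contains.

```python
def parse_secrets(text):
    """Parse 'key: value' lines into a dict, skipping blank lines."""
    secrets = {}
    for line in text.split('\n'):
        if line.strip():
            # Split on the first colon only, so values may contain ':'
            key, value = line.split(':', 1)
            secrets[key.strip()] = value.strip()
    return secrets


def make_spotify_client(secrets, id_key='clientid', secret_key='clientsecret'):
    """Build a spotipy client. The default key names are hypothetical."""
    # Imported here so the parsing helper works without spotipy installed
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth = SpotifyClientCredentials(client_id=secrets[id_key],
                                    client_secret=secrets[secret_key])
    return spotipy.Spotify(client_credentials_manager=auth)


# Made-up credentials, only to illustrate the parsing step
example = parse_secrets("clientid: abc123\nclientsecret: xyz:789\n")
```

With the notebook's own dictionary, the client could then be created with something like `sp = make_spotify_client(secrets_dict)`, adjusting the key names to match the file.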

### Datafiles

In [9]:
spotify_df = pd.read_csv('CSV Files/spotify_clustered.csv')

In [22]:
hot100_df = pd.read_csv('CSV Files/Hot100_Tracks_Oct_10_2022.csv')

### Pickled Model & Scaler Imports

In [20]:
# Load the fitted scaler; the with block closes the file afterwards
with open('Model Files/scaler.sav', 'rb') as scaler_pickle:
    scaler = pickle.load(scaler_pickle)

In [21]:
# Load the fitted KMeans model; the with block closes the file afterwards
with open('Model Files/kmeans_cluster_model.sav', 'rb') as model_pickle:
    model = pickle.load(model_pickle)

## Recommender Build

In [24]:
# Takes either an artist or a song title as input and suggests a random title/artist pair from the Hot 100
# Known limitation: the suggestion can be the same title/artist pair the user entered

# Ask for user input (input() already returns a string)
srch = input("Please enter a popular song title or artist you like and our tool will make a suggestion you might enjoy: \n")

# Check whether the search term exactly matches a value in the title or artist column
found = any(srch in hot100_df[col].values for col in ['title', 'artist'])

# If a match was found, pick a random row index and print the title and artist of that row
if found:
    i = random.randint(0, len(hot100_df.index) - 1)
    print("\nYou should try this hot track:", hot100_df.at[hot100_df.index[i],'title'], "\nby Artist(s):", hot100_df.at[hot100_df.index[i],'artist'])

else:
    print("Input Error, please try again with alternate search term")

Please enter a popular song title or artist you like and our tool will make a suggestion you might enjoy: 
Everywhere

You should try this hot track: Hold Me Closer 
by Artist(s): Elton John & Britney Spears
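
The recommender cell above can return the very title/artist pair the user typed in. One possible way around this, sketched here under assumptions rather than taken from the notebook, is to drop every row that matches the search term before drawing at random. The helper `suggest_other` and its sample records are hypothetical; it works on a list of dictionaries with `title` and `artist` keys, so it does not depend on the contents of the Hot 100 file.

```python
import random


def suggest_other(records, query, rng=random):
    """Return a random record whose title and artist both differ from query.

    records: list of dicts with 'title' and 'artist' keys.
    Returns None if query matches nothing, or if no other record is left.
    """
    matches = [r for r in records if query in (r['title'], r['artist'])]
    if not matches:
        return None  # the search term is not in the data
    # Keep only the rows that do not match the user's input
    candidates = [r for r in records if query not in (r['title'], r['artist'])]
    if not candidates:
        return None  # every row matches, so there is nothing else to suggest
    return rng.choice(candidates)


# Made-up sample data, only to illustrate the call
sample = [
    {'title': 'Song A', 'artist': 'Artist 1'},
    {'title': 'Song B', 'artist': 'Artist 1'},
    {'title': 'Song C', 'artist': 'Artist 2'},
]
pick = suggest_other(sample, 'Song A')
```

With the notebook's data this could be called as `suggest_other(hot100_df.to_dict('records'), srch)`. Filtering first, rather than redrawing until the pick differs, also guarantees the function terminates when every row matches.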


In [25]:
def song_uri(song_id):
    # Search Spotify for the given term and return the track ID of the first hit,
    # or 'Null' if nothing is found or the request fails
    try:
        track = sp.search(q=song_id, limit=1)
        return track['tracks']['items'][0]['uri'].split('spotify:track:')[1]
    except Exception:
        return 'Null'


In [26]:
# Function to get the audio features of a specific track URI or ID
def get_details(uri):
    # Create a dataframe with the columns that we need
    playlist_lst = ['danceability','energy','key','loudness','mode', 'speechiness',
                    'instrumentalness','liveness','valence','tempo',
                    'duration_ms','time_signature']
    
    playlist_df = pd.DataFrame(columns = playlist_lst)
    # Get the audio features and add them to the respective columns
    audio_features = sp.audio_features(uri)[0]
    playlist_df.loc[len(playlist_df)] = [audio_features[i] for i in playlist_lst]
    return playlist_df

In [28]:
state = True
while state:
    
    # User inserts search term
    print('Insert song or search term:')
    search = input()
    if search == 'quit':
        state = False
    
    if state:
        # Flag that records whether the search string exists in the Hot 100 dataset
        exist = False

        # Test whether the search string appears in the title or artist column
        # (the columns of hot100_df used in the first recommender cell)
        for i in ['title', 'artist']:
            if len(hot100_df[hot100_df[i].str.contains(search, case = False, regex = False, na = False)]) != 0:
                exist = True
        # If the search term exists in the dataset we suggest a random hot song, otherwise we search Spotify
        if exist:
            clear_output()
            # randrange excludes the upper bound, so the index is always valid
            index = random.randrange(len(hot100_df))
            print('\nI have a suggestion! \n\nSong: ',hot100_df['title'].values[index],  '\nArtist: ', hot100_df['artist'].values[index])
        else:
            # Using a try clause because the user can enter a song that's not on Spotify, which would otherwise crash the app
            try:
                clear_output()
                # Create a single-row dataframe with the audio features of the song the user entered
                df = get_details(song_uri(search))
                # Scale the features with the fitted scaler and predict the cluster number for the song
                cluster = model.predict(pd.DataFrame(scaler.transform(df), columns = df.columns))
                print('Spotify suggestion!')
                # Keep only the rows of our dataset that belong to the same cluster as the user's song
                element = spotify_df[spotify_df['cluster'] == int(cluster[0])]
                # Pick a random row from that subset
                index = random.randrange(len(element))
                print('\nLink: ','https://open.spotify.com/track/'+element['track_id'].values[index],  '\nSong: ',element['track_name'].values[index],  '\nArtist: ', element['artist'].values[index])
            except Exception:
                # Raised when the song doesn't exist on Spotify or its features can't be retrieved
                print('Invalid song. Sorry!')