## MR Music Recommender v.1

### Case Study - The site for recommendations - "Gnod"

#### Scenario

You have been hired as a Data Analyst for “Gnod”.

“Gnod” is a site that provides recommendations for music, art, literature and products based on collaborative filtering algorithms. Their flagship product is the music recommender, which you can try at www.gnoosic.com. The site asks users to input 3 bands they like and computes similarity scores with the rest of the users. It then recommends bands that users with similar tastes have picked.
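The paragraph above describes user-based collaborative filtering. Gnod's actual similarity score is not described, so the sketch below assumes Jaccard overlap between users' band sets. The function names `jaccard` and `recommend_bands` and the toy users are hypothetical and exist only for illustration.

```python
# Hypothetical sketch of user-based collaborative filtering.
# Assumption: similarity = Jaccard overlap of band sets (not Gnod's documented method).
from collections import Counter


def jaccard(a, b):
    """Share of bands two users have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend_bands(user_bands, other_users, n_neighbours=2, n_recs=3):
    """Recommend bands picked by the users most similar to `user_bands`.

    user_bands:  the bands the user typed in (3 on Gnoosic)
    other_users: dict mapping user id -> list of bands that user picked
    """
    # rank the other users by similarity to the current user
    ranked = sorted(other_users.items(),
                    key=lambda item: jaccard(user_bands, item[1]),
                    reverse=True)
    # keep the closest users that share at least one band
    neighbours = [bands for _, bands in ranked[:n_neighbours]
                  if jaccard(user_bands, bands) > 0]

    # count bands the neighbours picked that the user has not already named
    votes = Counter(band for bands in neighbours
                    for band in bands if band not in user_bands)
    return [band for band, _ in votes.most_common(n_recs)]


# toy data, made up for illustration
users = {
    "u1": ["radiohead", "muse", "placebo"],
    "u2": ["radiohead", "muse", "coldplay"],
    "u3": ["abba", "boney m", "queen"],
}
print(recommend_bands(["radiohead", "muse", "tool"], users))
```

Here `u1` and `u2` each share two of the user's three bands, so their remaining picks are recommended, while `u3` shares none and is ignored.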

“Gnod” is a small company, and so far its only revenue stream is ads on the site. In the future, they would like to explore partnership options with music apps (such as Deezer, SoundCloud or even Apple Music and Spotify). However, for that to be possible, they need to expand and improve their recommendations.

That’s precisely where you come in. They have hired you as a Data Analyst, and they expect you to bring a mix of technical expertise and business mindset to the table.

Jane, CTO of Gnod, has sent you an email assigning you with your first task.

#### Task(s)

This is the e-mail Jane, CTO of Gnod, sent to your inbox during your first week there.

Dear xxxxxxxx, We are thrilled to welcome you as a Data Analyst for Gnoosic!

As you know, we are trying to come up with ways to enhance our music recommendations. One of the new features we’d like to research is to recommend songs (not only bands). We’re also aware of the limitations of our collaborative filtering algorithms, and would like to give users two new possibilities when searching for recommendations:

* Songs that are actually similar to the ones they picked from an acoustic point of view.
* Songs that are popular around the world right now, independently of their tastes.

Coming up with the perfect song recommender will take us months - no need to stress out too much. In this first week, we want you to explore new data sources for songs. The Internet is full of information, and our first step is to acquire it and do an initial exploration. Feel free to use APIs or directly scrape the web to collect as much information as possible about popular songs. Eventually, we’ll need to collect data from millions of songs, but we can start with a few hundred or a few thousand from each source and see if the collected features are useful.

Once the data is collected, we want you to create clusters of songs that are similar to each other. The idea is that if a user inputs a song from one group, we’ll prioritize giving them recommendations of songs from that same group.

On Friday, you will present your work to me and Marek, the CEO and founder. Full disclosure: I need you to be very convincing about this whole song recommender, as this has been my personal push and the main reason we hired you!

Be open-minded about this process: we are agile, and that means that we define our products and features on the go, while exploring the tools and the data that are available to us. We’d love you to provide your own vision of the product and the next steps to be taken.

Lots of luck and strength for this first week with us!

-Jane
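Jane asks for clusters of acoustically similar songs, so that a song typed in by a user is answered with songs from the same group. Spotify's audio features sit on very different scales (tempo in BPM, loudness in dB, most others between 0 and 1), so the sketch below standardizes them before clustering. The notebook loads its actual model from a pickle whose type is not shown here; the plain NumPy k-means below is a stand-in for illustration, and all names and toy values are hypothetical.

```python
import numpy as np


def standardize(X):
    """Scale every column to mean 0 and standard deviation 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at 0 instead of dividing by 0
    return (X - mean) / std, mean, std


def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # start from k distinct songs picked at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # move each centroid to the mean of its songs (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign(X, centroids)


# toy rows: [danceability, energy, tempo]; the values are made up
songs = np.array([[0.80, 0.90, 125.0],
                  [0.85, 0.88, 128.0],
                  [0.20, 0.15, 70.0],
                  [0.25, 0.10, 72.0]])
X, mean, std = standardize(songs)
centroids, labels = kmeans(X, k=2)

# a new song is scaled with the training mean/std, then matched to a cluster
new_song = (np.array([[0.82, 0.91, 126.0]]) - mean) / std
cluster = assign(new_song, centroids)[0]
print("recommend from rows:", np.flatnonzero(labels == cluster))
```

Keeping the training `mean` and `std` matters: scaling a single new song by its own statistics would not place it in the same space as the clustered songs.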

In [1]:
import pandas as pd

import spotipy #spotify API 
from spotipy.oauth2 import SpotifyClientCredentials

from time import sleep

import pickle

import sys

import webbrowser
from IPython.display import Markdown, display

In [2]:
from Credentials import Client_ID, Client_Secret  # local module holding the Spotify API keys


In [3]:
# importing the Kaggle dataset of ~160,000 songs, clustered
data_k = pd.read_csv('kaggle_data_predictions_100.csv').drop(columns = 'Unnamed: 0')

In [4]:
#importing top 100, clustered, and with song of  
top = pd.read_csv('top_features_predictions_8').drop(columns = 'Unnamed: 0')
top_id = pd.read_csv('top_features_id.csv').drop(columns = 'Unnamed: 0')

In [5]:
# helper that renders a Markdown string (e.g. bold text) in the notebook output
def printmd(string):
    display(Markdown(string))

## Functions

Helper functions: clean the artist name for search, fetch a track's audio features from the Spotify API, predict its cluster, and recommend songs either from the same cluster or from the hot list.

In [6]:
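def clean_artist(artist_name):
    '''Replaces collaboration separators in an artist name with a space
    so the name works as a Spotify search filter, e.g.
    'lady gaga & ariana grande' -> 'lady gaga ariana grande'
    The separators are listed in break_list:
    break_list = [' featuring ', ' x ', ' & ', ' with ']'''

    # ' with ' keeps its trailing space so names such as 'bill withers' stay intact
    break_list = [' featuring ', ' x ', ' & ', ' with ']
    for break_point in break_list:
        artist_name = artist_name.replace(break_point, ' ')

    return artist_name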
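Plain substring replacement depends on the exact spacing around each separator: a pattern such as `' with'` without a trailing space would also match inside "bill withers". A minimal alternative, sketched under the assumption that separators always appear as whole words surrounded by whitespace, uses one regular expression and also collapses repeated spaces. The name `clean_artist_re` and the extra `feat.` separator are hypothetical additions, not part of the notebook.

```python
import re

# separators from break_list, plus "feat." (an assumption about how
# collaborations are written); each must be a whole word between spaces
SEPARATORS = re.compile(r"\s+(?:featuring|feat\.|x|&|with)\s+", re.IGNORECASE)


def clean_artist_re(artist_name):
    """Replace collaboration separators with one space and collapse
    runs of whitespace, leaving words such as 'withers' untouched."""
    cleaned = SEPARATORS.sub(" ", artist_name)
    return " ".join(cleaned.split())


print(clean_artist_re("drake featuring rihanna"))
print(clean_artist_re("bill withers"))
```

Because the pattern requires whitespace on both sides, a trailing token like the "x" in "lil nas x" is also left alone.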

In [7]:
def get_audio_features(song_name, artist, music_features):
    '''Function takes song title, artist name, list of music features
    and returns a one-row dataframe with those audio features
    i.e.
    song_name = 'rain on me'
    artist = 'lady gaga'
    music_features = ['danceability', 'energy', 'key', 'loudness',
      'mode', 'speechiness', 'acousticness', 'instrumentalness',
      'liveness', 'valence', 'tempo']'''

    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=Client_ID,
                                                               client_secret=Client_Secret))

    artist = clean_artist(artist)

    # form the query for the song with the artist filter
    results = sp.search(q='artist:' + artist + ' track:' + song_name)

    items = results['tracks']['items']
    if not items:
        raise ValueError('No Spotify match for {!r} by {!r}'.format(song_name, artist))

    # take item [0] as the best match; audio_features returns a list
    # with one dict per requested track, hence the [0] at the end
    features = sp.audio_features(items[0]['uri'])[0]

    # keep only the wanted features, in the order of music_features,
    # as a one-row DataFrame
    feature_dataframe = pd.DataFrame([{key: features[key] for key in music_features}])

    return feature_dataframe

In [8]:
def prediction(audio_features):
    '''Takes the audio features returned by get_audio_features,
    loads the model previously saved in a pickle file and
    predicts the cluster of the song'''
    with open('my_kaggle_model_100.pkl', 'rb') as model_file:
        model = pickle.load(model_file)
    return model.predict(audio_features)
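`prediction()` unpickles the model from disk on every call. A minimal sketch that loads it once per path with `functools.lru_cache` follows; `load_model` and `predict_cluster` are hypothetical names, the file name is the one used above, and the pickled object is assumed to expose a scikit-learn-style `predict` method.

```python
import functools
import pickle


@functools.lru_cache(maxsize=None)
def load_model(path):
    """Unpickle the model stored at `path` once; later calls reuse it."""
    with open(path, "rb") as model_file:
        return pickle.load(model_file)


def predict_cluster(audio_features, path="my_kaggle_model_100.pkl"):
    """Cluster id of a one-row feature DataFrame, as a plain int."""
    model = load_model(path)
    # predict() returns an array with one label per row
    return int(model.predict(audio_features)[0])
```

If the model is retrained and the pickle overwritten, `load_model.cache_clear()` drops the cached copy. As with any pickle, only load files from a trusted source.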

In [9]:
def recommendation(song, name):
    '''Recommends 3 songs from dataset data_k that fall
    in the same cluster as the input song'''

    music_features = ['danceability', 'energy', 'key',
                      'loudness',
                      'mode',
                      'speechiness',
                      'acousticness',
                      'instrumentalness',
                      'liveness',
                      'valence',
                      'tempo']

    audio_features = get_audio_features(song, name, music_features)

    # identify the cluster of the song (predict returns a one-element array)
    cluster_id = int(prediction(audio_features)[0])

    # sample 3 songs from that cluster
    x = data_k[data_k.cluster == cluster_id].sample(n=3).reset_index()

    printmd("**Great**, we think you'll like these ones from our collection")

    for i in range(len(x)):
        # 'artists' holds a list literal as text, e.g. "['Oasis']"; strip the list syntax
        artist_list = str(x['artists'][i]).replace("'", '').replace('[', '').replace(']', '')
        printmd('* **{}** by: **{}**'.format(x['name'][i], artist_list))

    # pause so the list can be read, then open the first recommendation on Spotify
    sleep(8)
    url = 'https://open.spotify.com/track/' + x['id'][0]
    webbrowser.open(url)
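`recommendation()` calls `.sample(n=3)` on the chosen cluster, which raises a `ValueError` when that cluster holds fewer than three songs, and the sample may include the very song the user typed in. The sketch below caps `n` at the pool size and can drop the input title; `sample_from_cluster` is a hypothetical helper, and comparing the lower-cased `name` column against the lower-cased user input is an assumption about how the titles are stored in `data_k`.

```python
import pandas as pd


def sample_from_cluster(songs, cluster_id, n=3, exclude_name=None, seed=None):
    """Pick up to n songs whose 'cluster' equals cluster_id.

    songs:        DataFrame with at least 'name' and 'cluster' columns (as in data_k)
    exclude_name: lower-case title to leave out, e.g. the user's own song
    """
    pool = songs[songs["cluster"] == cluster_id]
    if exclude_name is not None:
        pool = pool[pool["name"].str.lower() != exclude_name]
    if pool.empty:
        return pool.reset_index(drop=True)
    # sampling without replacement cannot take more rows than exist, so cap n
    return pool.sample(n=min(n, len(pool)), random_state=seed).reset_index(drop=True)
```

A caller would still need to handle an empty result, for instance by falling back to the hot list.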

In [10]:
def from_the_hot(song, name):
    '''Recommends 3 songs from the Billboard top 100 hot list'''

    # exclude songs with the same title and all songs by the same artist
    x = top_id[(top_id.title_lower != song) & (top_id.artist_lower != name)].sample(n=3).reset_index()

    printmd("**Awesome**, it's on the hot list - we think you'll like these:")

    for i in range(len(x)):
        printmd('* **{}** by **{}**'.format(x['title'][i], x['artist'][i]))

    # pause so the list can be read, then open the first recommendation on Spotify
    sleep(8)
    url = 'https://open.spotify.com/track/' + x['id'][0]
    webbrowser.open_new(url)
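The filter in `from_the_hot()` keeps a row only when the title differs *and* the artist differs, so by De Morgan's law it drops every row where either one matches: all songs by the same artist disappear, not just the user's song. If the goal is to exclude only the user's own track, a hypothetical helper can drop the single (title, artist) pair instead; which behavior the hot list should have is a product decision the notebook does not state.

```python
import pandas as pd


def exclude_own_song(hot, song, artist):
    """Drop only the row matching both the title and the artist;
    other songs by the same artist stay eligible."""
    same_song = (hot["title_lower"] == song) & (hot["artist_lower"] == artist)
    return hot[~same_song]


# toy hot list, made up for illustration
hot = pd.DataFrame({"title_lower": ["positions", "34+35", "levitating"],
                    "artist_lower": ["ariana grande", "ariana grande", "dua lipa"]})
print(exclude_own_song(hot, "positions", "ariana grande"))
```

With the notebook's filter, the same input would leave only "levitating".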

In [11]:
def recommender():
    '''UI - takes track names and artists and recommends songs:
    if the track is on the top 100 list, the recommendation comes
    from that list;
    if not, it suggests similar songs from the database
    '''

    flag = 'y'
    while flag == 'y':

        # input from the user
        print('What song do you like right now?')
        print('Artist:')
        name = input().strip().lower()
        print('Song:')
        song = input().strip().lower()
        print()

        # plain substring match (regex=False) so names with '(' or '?' don't break the search
        filter_artist = top[top.artist_lower.str.contains(name, regex=False, na=False)]

        if song in filter_artist.values:
            from_the_hot(song, name)
        else:
            recommendation(song, name)

        print()
        print('----------------------------------------')
        print()
        print('Want to play again?')
        print('y/n')
        flag = input().strip().lower()

        if flag != 'y':
            print()
            print('----------------------------------------')
            print()
            print('Bye bye, see you soon!')
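Two details decide whether a track counts as "hot": pandas' `str.contains` treats its pattern as a regular expression unless `regex=False` is passed, and `song in filter_artist.values` compares the title against every column of the filtered rows, so a song typed in with the same text as an artist name would also match. A standalone lookup that checks only the title column is sketched below; `is_hot` is a hypothetical helper, and the presence of a `title_lower` column in `top` is an assumption (it is visible in `top_id`, but `top`'s columns are not shown).

```python
import pandas as pd


def is_hot(top, artist, song):
    """True if `song` is a title on the hot list by an artist whose name
    contains `artist` as a plain substring (not a regex)."""
    by_artist = top[top["artist_lower"].str.contains(artist, regex=False, na=False)]
    return bool((by_artist["title_lower"] == song).any())


# toy hot list, made up for illustration
top = pd.DataFrame({"artist_lower": ["ariana grande", "(hed) p.e."],
                    "title_lower": ["positions", "bartender"]})
print(is_hot(top, "ariana grande", "positions"))
```

Treated as a regex, "(hed) p.e." would turn "(hed)" into a capturing group and fail to match the literal parentheses; with `regex=False` it matches as typed.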

## Recommender 

In [None]:
recommender()

What song do you like right now?
Artist:
a perfect circle
Song:
the outsider



**Great**, we think you'll like these ones from our collection

* **The Swamp Song - Version 1** by: **Oasis**

* **What's Behind The Mask - Remastered** by: **The Cramps**

* **Space Lord** by: **Monster Magnet**


----------------------------------------

Want to play again?
y/n
y
What song do you like right now?
Artist:
air
Song:
all i need



**Great**, we think you'll like these ones from our collection

* **Mpaglamades (paradosiako)** by: **Kostas Karipis**

* **Old Cape Cod** by: **Jerry Vale**

* **Mr. Guder** by: **Carpenters**


----------------------------------------

Want to play again?
y/n


### Test tracks:

#### Hot ones:
* artist: ariana grande
* track: positions

#### Not hot ones: 
* artist: solomon burke
* track: cry to me