# Project 5: Alternative R&B for Me
Purpose of project: Create a 2 pronged recommendation system for Alternative R&B songs. You can get songs based on similarities in lyrics or on similar audio sounds.

**Packages used:**
+ Keras
+ Gensim
+ pymongo
+ Flask

In [75]:
import lyricsgenius
from Dans_Genius_API import my_api_token #personal token, go toe Genius.com to get your own!
from pymongo import MongoClient
import pandas as pd
import pickle
import re
import string
from nltk.stem import WordNetLemmatizer
from ast import literal_eval

import matplotlib.pyplot as plt
%matplotlib inline

### List of Alternative R&B Artists
Created by cross referencing 2 lists found onlines as well as adding/removing artist based on domain knowledge.

In [2]:
rnb_artists = [
"11:11", "Abra", "Active Child", "Alessia Cara", "Alex Clare", "Allan Kingdom", "Aloe Blacc", "AlunaGeorge",
"Always Never", "Amber Coffman", "Anders", "Anderson Paak", "Anna Wise", "Ari Lennox", "Arlissa", "Autre Ne Veut",
"Banks", "Black Atlass", "Blackbear", "Blood Orange", "Boots", "Bryson Tiller", "Chet Faker", "Childish Gambino",
"Clarence Clarity", "Cocaine 80s", "D'Angelo", "Daley", "Daniel Caesar", "Danny!", "Dawn Richard", "Dean",
"Dvsn", "Elijah Blake", "Erykah Badu", "Estelle", "FKA twigs", "Francis and the Lights", "Frank Ocean",
"Gallant", "GoldLink", "Grimes", "Groove Theory", "H.E.R.", "Hiatus Kaiyote", "How To Dress Well", "Ibeyi",
"Illangelo", "ILoveMakonnen", "Inc.", "Jack Garratt", "Jai Paul", "James Fauntleroy", "Jamie Woon", "Jamila Woods",
"Janelle Monáe", "Jesse Boykins III", "Jessy Lanza", "Jhené Aiko", "JMSN", "Jon Bellion", "Jorja Smith", "Kacy Hill",
"Kali Uchis", "Kaytranada", "Kehlani", "Kelela", "Kelis", "Kenna", "Kevin Abstract", "Khalid", "Kiana Ledé",
"Kid Cudi", "Kiiara", "Kilo Kish", "Kimbra", "King", "Lance Skiiiwalker", "Lapalux", "Låpsley", "Lauv",
"Lion Babe", "Little Dragon", "Lykke Li", "M.I.A.", "Mabel", "Mac Ayres", "Maejor", "Mahalia", "Majid Jordan",
"Malay", "Marian Hill", "Mateo", "Matt Martians", "Maxwell", "Miguel", "Mila J", "Mr Hudson", "Nao",
"Nick Murphy", "NxWorries", "Oh Wonder", "PARTYNEXTDOOR", "Pell", "Perfume Genius", "Quadron","R.LUM.R",
"Rainy Milo", "Raleigh Ritchie", "Raury", "Reggie Sears", "Rhye", "River Tiber", "Ro James", "Rosie Lowe",
"Roy Woods", "Sabrina Claudio", "Samantha Urbani", "Sampha", "Seinabo Sey", "Sevdaliza", "Sevyn Streeter",
"Shura", "Shy Girls", "Sia Furler", "Sinéad Harnett", "Snakehips", "SOHN", "Solange", "Spooky Black",
"Steve Lacy", "Syd tha Kid", "SZA", "Tei Shi", "The Internet", "The Neighbourhood", "The Weeknd", "Thee Satisfaction",
"THEY.", "Thundercat", "Tinashe", "Toro y Moi", "Tory Lanez", "Travis Scott", "Wet", "William Singe",
"Willow Smith", "Yummy Bingham", "Yuna", "Zayn",
]

### Querying data from MongoDB to create a dataframe
**NOTE: This has to be done after running the Song_lyric_scrape.py file and successfully storing data in your mongoDB**

In [3]:
client = MongoClient()
db = client.sep_19_songs #create connection with database

df = pd.DataFrame(list(db.sep_19_songs.find({}, {'_id':0}))) #extract all except mongo auto-generated id field
print("Shape before dropping the dups: ", df.shape)
df.drop_duplicates(inplace=True) ###remove duplicates incase you had to run song_lyric_scrape multiple times..
print("Shape after dropping the dups: ", df.shape)

10105
Shape before dropping the dups:  (10105, 4)
Shape after dropping the dups:  (10093, 4)


### CHECKPOINT: Save/Load dataframe created from MongoDB

In [2]:

'''
#save file: unccoment to do so
with open('orig_df.pkl', 'wb') as picklefile:
   pickle.dump(df, picklefile)    
'''
#
'''
#load file: uncomment to do so
with open('orig_df.pkl', 'rb') as picklefile:
    df = pickle.load(picklefile)
'''

In [3]:
print("shape of df", df.shape)

shape of df (10093, 4)


### Correcting Artist spelling (due to unicode characters)/ Removing out-of-genre artists
Removing artists that shouldnt be included in the modeling process

In [4]:
#dict to change names to proper names w/o unicode/ change special letters to regular letters
name_mapper = {
 '\u200banders': 'anders',
 '\u200bblackbear': 'blackbear',
 '\u200bdvsn': 'dvsn',
 '\u200biLoveMakonnen': 'iLoveMakonnen',
 '\u200b¿\u200bT\u200be\u200bo\u200b?\u200b': 'Teo',
 'Sinéad Harnett': 'Sinead Harnett',
 'Jhené Aiko': 'Jhene Aiko',
 'Kiana Ledé': 'Kiana Lede'
}

df['Artist_name'] = df['Artist_name'].replace(name_mapper) 

In [5]:
#remove some artists that dont fit the genre/ were grabbed with fuzzy matching from API
wrong_artists = ['iLoveMakonnen','6ix9ine', 'Alex Clare', 'Allan Kingdom','D’Angelo','Francis and the Lights']

print("shape before removing artists", df.shape)
df = df[~df["Artist_name"].isin(wrong_artists)].reset_index(drop=True)
print("shape after removing artists", df.shape)

shape before removing artists (10093, 4)
shape after removing artists (9639, 4)


### Removing duplicate songs and alt-versions/remixes/skits/etc
The Genius API returned quite abit of duplicate songs that need to be filtered out. These include:
+ Acoustic
+ Remixes
+ Skits
+ Demos/Previews
+ etc

In [6]:
#General list types of songs to remove from the dataframe
gen_dup_songs = ['freestyle',
'remix', '\(live', '- live', 'acoustic', 'skit', 'no lyrics yet!', 'version', 'türkçe', 'turkce', 'demo\)', 
             'mix\)', 'radio edit', 'remaster', 'cover\)', '\(unreleased', '\(single\)', '\(snippet\)', 
             '\(spotify session\)', '\(spotify singles\)', 'session', 'without justin bieber', 'trailer',
             'en español\)', 'edit\)','re-work', 'rework\)', 'bonus track', 'accapeela', 'acapella\)',
             '- acapella', 'remake\)', 'godspeed screenplay', 'mix\]', 'cover\]', 
             '\(teaser', '\[demo', 'leak', 'short film','bbc radio', 'reimagined', 'tedx talk', 'music video',
             'lollapalooza', '\(unfinished\)', 'radio', 'reprise\)','mixture', '\[duplicate\]', 'medley',
             'tba\*','mashup', 'freshman', 'stripped', 'refix', 'dub', 'uncut', '- cut', 'edit', 'speech', '\@',
              'untitled', 'interlude', 'dates' ]

#specific cases for this project/ songs extracted
specific_cases= ['talk \(disclosure vip\)', 'intro - \(let’s talk about it\)', 
                 'travis scott takes over hot 97 in the am', 'khalid - better \(official music video\)',
              'Home Going - 2354122', 'cliche r&b joint with auto tune \(i do\)', 
             'waiting game \(kaytranada edition\)', 'love & feeling \(sleep d dub\)',
             'because the internet', 'both hands \(black rainbow\)', 'clapping for the wrong reasons',
             'childish gambino @ the atrium', 'complex party house freestyle',
             'complex photo shoot', 'drexel university performance', 'i love clothes \(deadbeat summer\)',
             'leaving one direction', 'song notes', 'be on my \(interlude\)', 'going \(interlude\)',
             'lady luck - royce wood jr retwix', 'betty \(for boogie\)', 'locked inside -walsh', 'jhene aiko’s tattoos',
             'my name is jhene', '4th of july \(fireworks\)', 'milkshake 2', 'segue', 'drake diss', 'twitter note',
             'fire fire \(piracy funds terrorism\)', 'the p is mine', 'voice memo', 'chppd', 'breakdown\)', 
             'open letter to fans', 'royalty', 'false skull 7', 'bitches talk \(repeat\)', 'show respect', 
             'girl with the tattoo enter.lewd', "coachella interlude", '102 hours of introductions', 'sampler',
             'tumblr post on adjectives', 'the deep web tour dates', 'not on doasm 03', 'copernicus landing',
             'warm up \(cloud 9\)', 'tathagātagarbha', 'response to grammy awards producers', 'beltway', 'ibeyi',
             'beltway',
                ]

#combine the lists together
dup_songs =  gen_dup_songs + specific_cases


print("shape before removing dup songs", df.shape)
for song_remove in dup_songs:
    df = df[~df['Song'].str.lower().str.contains(song_remove)]
print("shape after removing dup songs", df.shape)
#replace weird unicode in song titles
df['Song'] = df['Song'].str.replace('\u200b', '')

shape before removing dup songs (9639, 4)
shape after removing dup songs (8216, 4)


### Remove songs that do not have lyrics

In [7]:
#returns of lyrics from songs that dont have any lyrics listed in the API
bad_lyrics = ['lyrics for this song have yet to be released',
              'no lyrics yet!', '\[instrumental\]', '\[spoken interlude\]', '\(instrumental\)',
              '\(instrumental with vocals\)', '\[instrumental w/ vocalisations\]',
              'lyrics will be available upon release.' 'lyrics are yet to be released',
              'stay tuned']

In [8]:
#remove tagged bad lyrics
print("shape before removing bad lyrics", df.shape)
for song_remove in bad_lyrics: #remove songs that have any of the above "lyrics"
    df = df[~(df['Lyrics'].str.lower().str.contains(song_remove, na=False))]
    
df = df[df['Lyrics']!=""] #removes those with empty lyrics
df =df[df['Lyrics'].str.len() > 300] #removed songs w/ less than 300 characters (really short songs skew results)
print("shape after removing bad lyrics", df.shape)
df.reset_index(drop=True, inplace=True)

shape before removing bad lyrics (8216, 4)
shape after removing bad lyrics (7421, 4)


### Cleaning text of lyrics
Processing the data and performing the following steps:
+ Getting rid of song structure tags (ie [chorus])
+ Replace '\n' escape character from lyrics
+ Lemmatize words to better aid doc2vec model
+ Remove punctuation and transform to lowercase
+ Replace double spaces with single spaces

In [9]:
##Cleaning up additional stuff in the lyrics
df['Lyrics'] = df['Lyrics'].str.replace("[\[].*?[\]]", "") #1: gets rid of verse/chorus in brackets
df['Lyrics'] = df['Lyrics'].str.replace("\n", " ").str.lower() #2: replace '\n' with space + lower

lemmatizer = WordNetLemmatizer()
df['Lyrics'] = df['Lyrics'].apply(lambda x : lemmatizer.lemmatize(str(x))) #3: lemmatize words before removing punc

#alphanumeric = (lambda x: re.sub('\w*\d\w*', ' ', x)) #remove non alpha numeric values/ OFFF
punc_lower = (lambda x: re.sub('[%s]' % re.escape(string.punctuation), ' ', str(x))) #get rid of punctuation+lower
df['Lyrics'] = df['Lyrics'].map(punc_lower)  #4: applying above to remove punc

df['Lyrics'] = df['Lyrics'].str.replace("  ", " ") #5: replace double spaces (do after replace punc)
print(df.shape)


(7421, 4)


In [19]:
#checking results of cleaning > looks great
df.iloc[2351].Lyrics

' i love to fly it s just you re alone with peace and quiet nothing around you but clear blue sky no one to hassle you no one to tell you where to go or what to do the only bad part about flying is having to come back down to the fucking world  i wasn t mistreated  he whispered as he came and now you re just sailing on you re sending on your pain i was undressed in all your shame you re sailing waters too deep for me to care have you ever wondered why you stress so hard you can t even seem to wonder what s on your mind have you ever held yourself on a secret all in there have you ever had yourself for all one time have you ever asked you why are you cheapening yourself  have you ever let a look of goodness spread across your face have you ever loved yourself out of a secret all in there say my name or say whatever i was in the streets when what s his name  came sailing on you re resting on your name how was i to rest under all your weight you say my love was undressed from all your str

### CHECKPOINT: Saving cleaned dataframe

In [None]:
df.to_csv('final_df.csv')

# Lyrical Recommendation - Using Doc2vec
Steps taken:
+ Create a class used to tokentize and create TaggedDocuments where tags are "Artist|Song"
+ Create and train Doc2Vec model
+ Analyze results

In [9]:
import csv
#from nltk.stem import WordNetLemmatizer
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.tokenize import word_tokenize

In [11]:
#borrowed function to iterate over docs and not kill RAM..
#https://tmthyjames.github.io/2018/january/Analyzing-Rap-Lyrics-Using-Word-Vectors/

class Sentences(object):
    
    def __init__(self, filename, column):
        self.filename = filename
        self.column = column
        
    @staticmethod
    def get_tokens(text):
        """Helper function for tokenizing data"""
        #return [wnl.lemmatize(r.lower()) for r in text.split()]
        return word_tokenize(text) #returns 
    
    def __iter__(self):
        reader = csv.DictReader(open(self.filename, 'r' ))
        for row in reader:
            words = self.get_tokens(row[self.column])
            tags = ['%s|%s' % (row['Artist_name'], row['Song'])]
            yield TaggedDocument(words=words, tags=tags)

In [12]:
filename = 'final_df.csv'
sentences = Sentences(filename=filename, column='Lyrics') #column with lyrics
# for song lookups
orig_table = pd.read_csv(filename) # dont need artist song order in there

In [21]:
#Testin to make sure that teh tokenization function of the class works well
print(sentences.get_tokens(orig_table.loc[0]["Lyrics"]))

['they', 'say', 'a', 'good', 'thing', 'won', 't', 'go', 'away', 'why', 'did', 'you', 'go', 'away', 'baby', 'you', 'had', 'a', 'role', 'to', 'play', 'why', 'were', 'you', 'led', 'astray', 'seems', 'all', 'love', 'that', 'you', 'were', 'given', 'didn', 't', 'have', 'a', 'second', 'for', 'me', 'but', 'a', 'second', 's', 'all', 'i', 'need', 'to', 'tell', 'you', 'the', 'truth', 'so', 'maybe', 'you', 'll', 'see', 'you', 'were', 'my', 'ally', 'ally', 'cause', 'were', 'there', 'was', 'you', 'there', 'would', 'be', 'me', 'you', 'were', 'my', 'ally', 'whether', 'wrong', 'or', 'right', 'you', 'were', 'by', 'my', 'side', 'and', 'i', 'still', 'want', 'you', 'want', 'you', 'and', 'i', 'don', 't', 'know', 'if', 'you', 'll', 'ever', 'really', 'feel', 'the', 'same', 'or', 'if', 'i', 'll', 'ever', 'forgive', 'the', 'fact', 'that', 'maybe', 'i', 'm', 'the', 'one', 'to', 'blame', 'cause', 'loving', 'you', 'were', 'given', 'didn', 't', 'have', 'a', 'second', 'to', 'breathe', 'but', 'a', 'second', 's', 'all

In [74]:
model = Doc2Vec(
    alpha=0.05,
    min_alpha=0.025,
    workers=15, 
    min_count=10,
    window=5,
    vector_size=300,
    epochs=30,
    sample=0.001,
    negative=5
)

In [75]:
model.build_vocab(sentences) #build corpus of words by inserting sentences
model.train(sentences, total_examples=model.corpus_count, epochs=model.epochs) #train model


  'See the migration notes for details: %s' % _MIGRATION_NOTES_URL


### CHECKPOINT: Saving Doc2vec trained model

In [None]:
model.save('alt_rnb_lyrics.doc2vec') #save model for flask app/later usage

In [8]:
#LOAD the model 
model = Doc2Vec.load('alt_rnb_lyrics.doc2vec')

  'See the migration notes for details: %s' % _MIGRATION_NOTES_URL


### Analyze the model NLP
Using model.docvecs.most_similar will return the top n songs that have the most similar doc vectors.  
NOTE: Returns the input song but we will make sure to exlcude that in the Flask app

In [90]:
model.docvecs.most_similar([model.docvecs['The Weeknd|Call Out My Name']], topn=14)

[('The Weeknd|Call Out My Name', 0.9999998807907104),
 ('The Weeknd|Call Out My Name (A Cappella)', 0.9659292697906494),
 ('BANKS|Fall Over', 0.36231181025505066),
 ('Sinead Harnett|Ally', 0.3450368642807007),
 ('Rhye|Song for You', 0.3423432409763336),
 ('Autre Ne Veut|On & On', 0.331125408411026),
 ('Toro y Moi|Imprint After', 0.31505703926086426),
 ('Kevin Abstract|Runner (Original)', 0.3132482171058655),
 ('Kevin Abstract|Runner', 0.31315815448760986),
 ('blackbear|froze over', 0.30821770429611206),
 ('DEAN|instagram', 0.30410879850387573),
 ('Rhye|Shed Some Blood', 0.30292677879333496),
 ('Dawn Richard|Castles', 0.30236953496932983),
 ('Kid Cudi|Fuchsia Butterflies', 0.29958975315093994)]

# Preprocessing audio files
Steps done to preporcess mp3 files for audio signal similarity:
+ Create list of avaialable mp3s for modeling
+ 

In [2]:
from pydub import AudioSegment
from pydub.playback import play
import librosa 
import librosa.display



### Working on conveting to spectrograms 

In [6]:
#create list of mp3 files store in the audio_files folder 
mp3_files = !ls audio_files/*.mp3 
print("number of songs", len(mp3_files))

number of songs 138


### Convert to spectros 
The create_save_spectrog function loads in a mp3 file, takes an n sec segement with a n sec delay (to take a segment that isnt from the start of the song) converts it to a spectrogram and saves it as a png file in the spectrograms subfolder.

In [24]:
def create_save_spectrog(file_name,song_start=0,song_duration=20):
    """
    Creates a spectrogram from the inputted mp3 file path and saves it in folder
    """
    y, sr = librosa.load(file_name, offset=song_start, duration=song_duration)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)#,fmax=8000
    plt.figure(figsize=(10, 4))
    librosa.display.specshow(librosa.power_to_db(S ,ref=np.max))
    plt.tight_layout()
    temp_path = file_name.split('/')[1]
    file_pref = temp_path.split('.mp3')[0]
    output_f = "spectrograms/" + file_pref + ".png"  #save in appropriate folder 
    plt.savefig(output_f, transparent=True, pad_inches=0.0)
    plt.close()

In [25]:
#add for loop to transform songs in with a 20 sec delay start and a 30 sec duration
for songs in mp3_files:
    '''
    Find a way to go through many iterations of a song and split
    '''
    create_save_spectrog(songs, song_start=20, song_duration=30)


# Using VGG-16 Transfer Learning

In [28]:
import numpy as np
from keras.applications import VGG16
from keras.models import Model
from keras.layers import Dense, Activation, Flatten
from keras.losses import categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator
import os
import cv2
from scipy.spatial.distance import cosine

In [9]:
#Using VGG-16 trained on imagenet, set input waits
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))  # leaving out b/c optional, input_shape=(720, 288, 3)

# Freeze convolutional layers: we do not want to re-train them 
for layer in base_model.layers:
    layer.trainable = False    

In [10]:
x = base_model.output 
x = Flatten(name='features')(x) # flatten from convolution tensor output, we will use this for feature extraction
x = Dense(64,activation='relu')(x) #dense layer 2
preds = Dense(5 ,activation='softmax')(x) #final layer with softmax activation / doesn't 

# this is the model we will train
model = Model(inputs=base_model.input, outputs=preds)

In [12]:
#checking the layers of the model 
model.summary()

Model: "model_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0   

In [18]:
!pwd

/Users/danielobennett/metis/work/projects/Project05


In [13]:
Train_dir = '/Users/danielobennett/metis/work/projects/Project05/Train_dir'
train_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        Train_dir,
        target_size=(224, 224),
        batch_size=20,
        class_mode='categorical')

Found 138 images belonging to 5 classes.


In [14]:
#compile and run model
model.compile(loss='binary_crossentropy',
              optimizer='Adam',
              metrics=['acc'])

history = model.fit_generator(
      train_generator,
      steps_per_epoch=7,
      epochs=20,
      verbose=1)

Instructions for updating:
Use tf.cast instead.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


### CHECKPOINT: Saving cnn model 

In [40]:
#save the model so there is no need to rerun
#model.save('cnn_music.h5')
from keras.models import load_model
model = load_model('cnn_music.h5')

In [5]:
#check summary of model once again
model.summary()

Model: "model_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0   

In [41]:
#Extract features from the flattened layer
feature_extractor = Model(inputs=model.input, outputs=model.get_layer('features').output)

In [175]:
#save feature model since it will be smaller in size for flask app 
feature_extractor.save('model_predict.h5')

In [42]:
#get the paths of the spectrograms created above
spectros = !ls spectrograms/*

In [108]:
features_vectors = {}

for item in spectros:
    img = cv2.imread(item)
    img = cv2.resize(img,(224,224))
    img = np.reshape(img,[1,224,224,3])
    img_vector = feature_extractor.predict(img)
    song = item.split('/')[1]
    features_vectors[song] = img_vector.tolist() #creates all as a python list 

In [154]:
df = pd.DataFrame.from_dict(features_vectors, orient='index')
df.reset_index(inplace=True)
df.columns = ['song', 'vectors']
df.to_pickle('flask_df.pkl') #need to pickle so that the vector column is a series of lists and not strings

In [140]:
#extract vector for new item, need to convert to spectrogram before hand
def extract_vector(new_path):
    y, sr = librosa.load(new_path, offset=20, duration=30)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)#,fmax=8000
    plt.figure(figsize=(10, 4))
    librosa.display.specshow(librosa.power_to_db(S ,ref=np.max))
    plt.tight_layout()
    temp_path = new_path.split('/')[1]
    file_pref = temp_path.split('.mp3')[0]
    output_f = "test_songs/" + file_pref + ".png"  #save in appropriate folder 
    plt.savefig(output_f, transparent=True, pad_inches=0.0)
    plt.close()
        #save the file and open with opencv######## 
    img = cv2.imread(output_f) #no need to read in 
    img = cv2.resize(img,(224,224))
    img = np.reshape(img,[1,224,224,3])
    img_vector = feature_extractor.predict(img)
    return img_vector

#calculate cosign distance
def cosine_similarity(row, recommendation_vector):
    distance = cosine(row["vectors"], recommendation_vector)
    return distance
    
#get recos from model 
def plot_recommendations(new_path, recommendation_name):
    rec_vector = extract_vector(new_path)
    df[recommendation_name] = df.apply(cosine_similarity, axis=1, recommendation_vector = rec_vector)
    top_recommendations = df.sort_values(recommendation_name, ascending=False)#.head(10)
    return top_recommendations

"    \n    plt.figure(figsize=[8,8])\n    im = mpimg.imread(new_path)\n    plt.imshow(im)\n    plt.title('Input')\n    plt.axis('off');\n    \n    rec1 = top_recommendations.index[1]\n    rec2 = top_recommendations.index[2]\n    rec3 = top_recommendations.index[3]\n    rec4 = top_recommendations.index[4]\n    rec5 = top_recommendations.index[5]\n    rec6 = top_recommendations.index[6]\n"

In [193]:
feed1 = 'test_songs/Ibeyi-River.mp3' #name of input mp3 to use for recommendation system
try1 = plot_recommendations(feed1, recommendation_name = "recommendation")

In [194]:
try1 #current one : #treat me like im yours 10 sec max > arlissa whats it gonna be 10 sec max

Unnamed: 0,song,vectors,recommendation
84,R.LUM.R_-_Suddenly_altered.png,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.511550
114,The_Weeknd_-_Tears_In_The_Rain.png,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.415480
28,Jhene_Aiko-_Wading_(Souled_Out).png,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.407254
23,Ibeyi_ft._Kamasi_Washington_-_Deathless.png,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.389001
95,The_Weeknd_-_Devil_May_Cry.png,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.371913
10,Childish_Gambino_-_All_The_Shine.png,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.367026
87,Sinead_Harnett_-_Body_Acoustic.png,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.363244
74,Majid_Jordan_-_King_City.png,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.361324
70,Mac_Ayres_-_Calvin's_Joint.png,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.359122
100,The_Weeknd_-_I_Was_Never_There_(Official_Audio...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.348378


### End of Notebook :)