# Artist Similarity with Graph Neural Networks: 2nd Notebook

This notebook shows the performance of the networks trained as described in the first notebook.  
In addition to what the authors report, we look at the quality of the recommended artists, given a query artist, with the aid of a K-NN computation.  
We are interested in comparing all the architectures and in seeing how the GAT layer outperforms the GraphSAGE layer in the results.
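
The K-NN query mentioned above can be illustrated on a toy example. The sketch below is dependency-free and assumes a small hand-made embedding table; the artist names, the coordinates, and the helper `k_nearest` are hypothetical and are not part of the project code.

```python
import math

# Hypothetical 2-D embeddings; the real ones come from the trained GNNs.
toy_embeddings = {
    "artist_a": (0.0, 0.0),
    "artist_b": (0.1, 0.0),
    "artist_c": (0.0, 0.3),
    "artist_d": (2.0, 2.0),
}

def k_nearest(embeddings, query, k):
    """Return the k artists closest to `query` by Euclidean distance."""
    q = embeddings[query]
    distances = [
        (name, math.dist(q, vec))
        for name, vec in embeddings.items()
        if name != query  # the query itself is never a recommendation
    ]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]

print(k_nearest(toy_embeddings, "artist_a", 2))
```

Here `artist_b` and `artist_c` are returned, in that order, because they lie closest to the query in the embedding space.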

* Another important aspect of this notebook is the possibility of creating non-existing artists simply by specifying fake relationships in the graph with existing artists. The procedure is simple: we create a feature vector for the fake artist and then embed it together with the other samples. The procedure also requires specifying one or more existing artists related to the fake one; in this way it is possible to mix musical genres and see which artists are recommended.

In [None]:
import os
import torch
os.environ['TORCH'] = torch.__version__
print(torch.__version__)
!pip install torchmetrics
!pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q git+https://github.com/pyg-team/pytorch_geometric.git
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import json
import ipywidgets as widgets
import torch.nn as nn
import torch.nn.functional as F
from torchmetrics.functional import pairwise_euclidean_distance
from torch_geometric.nn import GATConv, SAGEConv
from torch.optim import lr_scheduler
import random
from random import choice,randrange
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
import math
import time
from torch_geometric import seed_everything


random_seed=280085

seed_everything(random_seed)

In [None]:
from google.colab import drive
drive.mount("/content/drive")
%cd drive/My Drive/asproject
from utils import *  # This file contains the most useful helper functions
from architectures import *

In [None]:
import IPython
js_code = '''
function ClickConnect(){
console.log("Working");
document.querySelector("colab-toolbar-button#connect").click()
}
setInterval(ClickConnect,60000)
'''
display(IPython.display.Javascript(js_code))


In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

With the help of the [PyTorch Geometric framework](https://pytorch-geometric.readthedocs.io/en/latest/), it was really easy to handle the graph nodes and attributes, and then to train the GNNs.
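
PyTorch Geometric expects edges as an index tensor of shape `[2, E]` (COO format) rather than as a dense `N x N` adjacency matrix. The sketch below shows the correspondence using plain Python lists; `dense_to_coo` is a hypothetical helper written only for illustration, while the notebook itself loads a precomputed COO tensor from disk.

```python
def dense_to_coo(adjacency):
    """Convert a dense 0/1 adjacency matrix into [sources, targets] lists."""
    sources, targets = [], []
    for i, row in enumerate(adjacency):
        for j, value in enumerate(row):
            if value:  # an edge i -> j exists
                sources.append(i)
                targets.append(j)
    return [sources, targets]

# Undirected toy graph 0 - 1 - 2: each edge appears in both directions.
toy_adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(dense_to_coo(toy_adjacency))
```

With PyTorch installed, wrapping the result in `torch.tensor(..., dtype=torch.long)` gives a tensor in the layout that `Data(edge_index=...)` expects.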

In [None]:
X = torch.load('instance').T.to(device)      # Instance (feature) matrix, one row per artist
A = torch.load('adjacencyCOO').to(device)    # Adjacency matrix in COO format, the one supported by PyTorch Geometric
A1 = torch.load('adjacency').to(device)      # Adjacency matrix in the normal (non-COO) format
num_samples = X.shape[0]
print(num_samples)

11261


##### The data in the dataset correspond to a set of artists, whose names we have stored in two dictionaries. These two data structures are fundamental for keeping track of the names and, if needed, for running more specific experiments.
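
A minimal sketch of the two lookup tables, assuming the JSON file maps string ids to artist names and that some entries may be `null` (the K-NN helper later in this notebook checks for `None`). The sample JSON and the names in it are made up.

```python
import json

# Hypothetical content; JSON object keys are always strings.
raw = '{"0": "artist_a", "1": null, "2": "artist_c"}'
id_to_name = json.loads(raw)

# Invert the mapping, skipping ids with no name so None never becomes a key.
name_to_id = {name: idx for idx, name in id_to_name.items() if name is not None}

print(name_to_id)
row = int(name_to_id["artist_c"])  # cast to int before indexing a feature matrix
print(row)
```

Because the ids stay strings after loading, every lookup into a tensor needs the `int(...)` cast, which is the pattern the functions below follow.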

In [None]:
''' These variables map each artist's position in the dataset to its name and vice versa; this makes it easy to look up names and to draw better conclusions at inference time '''
num2artist = load_data('dizofartist.json')
artist2num = {num2artist[key]:key for key in num2artist}

print(num2artist)
print(artist2num)

## Import the data with PyTorch Geometric:


In [None]:
''' To conduct the experiments it was fundamental to split the dataset, both the nodes and the edges.
    The split was performed according to the information in the paper, taking into account that a lot of data were lost in the preprocessing step.'''

from torch_geometric.data import Data
from torch_geometric.utils import structured_negative_sampling



data = Data(x=X, edge_index = A)


In [None]:
# Number of layers #
G1 = GraphSage(1)
G2 = GraphSage(2)
G3 = GraphSage(3)

Gat1 = GAT1()
Gat2 = GAT2()

diz_of_models = {'G1':{'path': "./models/one_layerSAGE.pt", 'model': G1, 'accuracy':load_model("./models/one_layerSAGE.pt", G1, device)['accuracy']}, 
                 'G2':{'path': "./models/two_layerSAGE.pt", 'model': G2, 'accuracy':load_model("./models/two_layerSAGE.pt", G2, device)['accuracy']},
                 'G3':{'path': "./models/three_layerSAGE.pt", 'model': G3, 'accuracy':load_model("./models/three_layerSAGE.pt", G3, device)['accuracy']},
                 'GAT1':{'path': "./models/GAT1.pt", 'model': Gat1, 'accuracy':load_model("./models/GAT1.pt", Gat1, device)['accuracy']},
                 'GAT2':{'path': "./models/GAT2.pt", 'model': Gat2, 'accuracy':load_model("./models/GAT2.pt", Gat2, device)['accuracy']}}

## What happens if we create an unreal artist?
* Since we are now able to embed artists, it is easy to augment our graph data structure with new information, and we would expect to obtain plausible results as well.
* Thus we insert a new artist by specifying its connections in the graph, and then we compute its features as a linear combination of its neighbors' features, namely the average.

* In the next cell we can either look up the neighbors of an existing artist or create a new artist together with its neighborhood.
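
The construction described above can be sketched without any framework: the new node's features are the element-wise mean of its neighbors' features, and each relationship is stored as two directed edges. The helper `add_averaged_node` and the toy data are hypothetical; the notebook's own implementation operates on tensors.

```python
def add_averaged_node(features, edge_index, neighbor_ids):
    """Append a node whose features are the mean of `neighbor_ids`' features.

    features:   list of equal-length feature lists, one per node
    edge_index: [sources, targets] lists (COO format)
    Returns new copies; the inputs are left untouched.
    """
    if not neighbor_ids:
        raise ValueError("at least one neighbor is required")
    new_id = len(features)
    dim = len(features[0])
    mean = [
        sum(features[n][d] for n in neighbor_ids) / len(neighbor_ids)
        for d in range(dim)
    ]
    sources, targets = list(edge_index[0]), list(edge_index[1])
    for n in neighbor_ids:
        # store the relationship in both directions
        sources += [new_id, n]
        targets += [n, new_id]
    return features + [mean], [sources, targets], new_id

toy_features = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
toy_edges = [[0, 1], [1, 0]]
new_features, new_edges, new_id = add_averaged_node(toy_features, toy_edges, [0, 1])
print(new_id, new_features[new_id])
print(new_edges)
```

Choosing neighbors from different genres places the new node halfway between them in feature space, which is what makes the genre-mixing experiment possible.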

In [None]:
#@title


def get_nearest_artists(embedding, artist_name, K, artist_to_id, id_to_artist):
  '''Return the K nearest artists to a query artist, using the K-NN algorithm:
      - embedding:   embedding matrix of all the artists,
      - artist_name: query artist whose neighbors we are looking for,
      - K:           number of neighbors to return,
      - artist_to_id, id_to_artist: dictionaries that map artist names to ids and back. '''
  T = embedding.detach().cpu().numpy()
  artist_id = int(artist_to_id[artist_name])
  # Ask for extra candidates, because ids without a name are skipped below
  Knew = min(K + 50, T.shape[0])
  neigh = NearestNeighbors(n_neighbors=Knew, algorithm='kd_tree').fit(T)
  dist, ind = neigh.kneighbors(T[artist_id].reshape((1, -1)))

  neighbors_ = []
  for idx, d in zip(ind[0], dist[0]):
    if len(neighbors_) >= K:
      break
    if idx == artist_id:  # The query artist itself is not a recommendation
      continue
    name = id_to_artist.get(str(idx))
    if name is not None:
      neighbors_.append((name, round(float(d), 4)))

  return neighbors_

def get_embeddings(model_name, data):
  ''' Compute the node embeddings with the model registered under model_name, on the given data '''

  model = diz_of_models[model_name]['model'].to(device)
  load_model(diz_of_models[model_name]['path'], model, device)
  model.eval()            # Inference mode: disables dropout, if the model uses it
  with torch.no_grad():   # Gradients are not needed to compute the embeddings
    embedding = model(data.x, data.edge_index.to(device))

  return embedding

def add_new_artist(artist_query, friend_artist_list):
  ''' Augment the dataset with artist_query if it is not already present.
      - artist_query:       artist for which we conduct the search,
      - friend_artist_list: list of existing artists related to the new one.
      Returns the (possibly augmented) data and the two name/id dictionaries. '''

  artist2num_new = artist2num.copy()
  num2artist_new = num2artist.copy()
  if artist_query in artist2num:
    print("{} is an existing artist".format(artist_query))
    return Data(x=X, edge_index=A), artist2num_new, num2artist_new

  # Only artists that are in the dataset can be neighbors of the new one
  valid_friends = []
  for artist in friend_artist_list:
    if artist in artist2num:
      valid_friends.append(artist)
    else:
      print("{} is not in the dataset, so it is not valid for the neighbors list".format(artist))
  if len(valid_friends) == 0:
    print("{} is not in the dataset and has no valid neighbors, so it cannot be created".format(artist_query))
    return Data(x=X, edge_index=A), artist2num_new, num2artist_new

  print("{} does not exist in the dataset, or in real life. \n But we can still create it!".format(artist_query))
  new_id = X.shape[0]
  artist2num_new[artist_query] = str(new_id)
  num2artist_new[str(new_id)] = artist_query
  friend_ids = [int(artist2num[artist]) for artist in valid_friends]

  # The features of the new artist are the average of its neighbors' features
  X_new = torch.cat((X, X[friend_ids].mean(dim=0, keepdim=True)), dim=0)
  # Each relationship is added to the edge list in both directions
  new_edges = torch.tensor([[new_id] * len(friend_ids) + friend_ids,
                            friend_ids + [new_id] * len(friend_ids)], dtype=A.dtype, device=A.device)
  A_new = torch.cat((A, new_edges), dim=1)

  print("\n{} has been created considering its neighbors:\n {}\n".format(artist_query, valid_friends))
  return Data(x=X_new, edge_index=A_new), artist2num_new, num2artist_new

def append_models(model1, model2, model3, model4, model5, neighbors, artist):
  ''' Callback for the interactive widgets: prints the neighbors found by each selected model. '''

  neighbors = int(neighbors)
  if artist == '': # Do nothing until an artist name is typed
    return
  # friend_artist_list is a global variable, defined in the last cell of the notebook
  data_n, artist2num_new, num2artist_new = add_new_artist(artist, friend_artist_list)
  if artist not in artist2num_new: # The artist is unknown and could not be created
    print("{} is not available: add existing artists to friend_artist_list to create it".format(artist))
    return

  selected_models = [(model1, 'G1', 'GraphSAGE1'), (model2, 'G2', 'GraphSAGE2'), (model3, 'G3', 'GraphSAGE3'),
                     (model4, 'GAT1', 'GAT1'), (model5, 'GAT2', 'GAT2')]
  for selected, key, label in selected_models:
    if selected:
      print("\nThe {}-nearest neighbors for {}, computed with the {} model are: \n".format(neighbors, artist, label))
      embedding = get_embeddings(key, data_n)
      print(get_nearest_artists(embedding, artist, neighbors, artist2num_new, num2artist_new))



SAGE1 = widgets.Checkbox(description = 'GraphSAGE 1 layer', style = {'description_width':'initial'})
SAGE2 = widgets.Checkbox(description = 'GraphSAGE 2 layers', style = {'description_width':'initial'})
SAGE3 = widgets.Checkbox(description = 'GraphSAGE 3 layers', style = {'description_width':'initial'})
GAT_1 = widgets.Checkbox(description = 'Graph Attention Network (1)', style = {'description_width':'initial'})
GAT_2 = widgets.Checkbox(description = 'Graph Attention Network (2)', style = {'description_width':'initial'})

num_of_neighbors = widgets.FloatSlider(
                                  value=10,
                                  min=0,
                                  max=50,
                                  step=1,
                                  description='Number of neighbors',
                                  disabled=False,
                                  continuous_update=False,
                                  orientation='horizontal',
                                  readout=True,
                                  readout_format='.1f',
                              )
Artist_Name = widgets.Text(
    value='',
    placeholder='Type something',
    description='String:',
    disabled=False
)

# We construct the widget layout
box_layout = widgets.Layout(display = 'inline-flex', flex_flow = 'row', align_items = 'stretch',
                            border = 'solid', width = '100%')

ui = widgets.HBox([Artist_Name, num_of_neighbors, SAGE1, SAGE2, SAGE3, GAT_1, GAT_2], layout = box_layout)

out = widgets.interactive_output(append_models, {'model1': SAGE1,
                                                 'model2': SAGE2,
                                                 'model3': SAGE3,
                                                 'model4': GAT_1,
                                                 'model5': GAT_2,
                                                 'neighbors': num_of_neighbors,
                                                 'artist': Artist_Name})

data_n = data
display(ui, out)



Please run the cell that defines 'friend_artist_list' before using the interactive widget. To create a fictitious artist, the list must contain the existing artists that should be related to it.

In [None]:
friend_artist_list = []