# Artist Similarity with Graph Neural Networks: 2nd Notebook

This notebook reports the performance of the networks trained as described in the first notebook.  
Going beyond the original authors' evaluation, we also inspect the quality of the recommended artists: given a query artist, we retrieve its nearest neighbors in the embedding space with K-NN.  
We are interested in comparing all the architectures and, in particular, in seeing how the GAT layer outperforms the GraphSAGE layer in the results.

* Another important aspect of this notebook is the possibility of creating non-existent artists just by specifying fake relationships with existing artists in the graph. The procedure is simple: we create a feature vector for the fake artist and embed it together with the other samples. It requires specifying one or more existing artists related to the fake one; this makes it possible to mix musical genres and see which artists are recommended.
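As a rough illustration of the K-NN lookup used throughout this notebook, the sketch below ranks the rows of a toy embedding matrix by Euclidean distance to a query row. It is a brute-force NumPy stand-in, not the `kd_tree` search from scikit-learn used later; the function name `knn_query` and the toy values are assumptions made for the example.

```python
import numpy as np

def knn_query(embeddings, query_idx, k):
    """Return (index, distance) pairs of the k rows closest to row query_idx."""
    # Euclidean distance from the query row to every row (including itself).
    dists = np.linalg.norm(embeddings - embeddings[query_idx], axis=1)
    # Sort by distance, drop the query itself, keep the first k candidates.
    order = [int(i) for i in np.argsort(dists, kind="stable") if i != query_idx]
    return [(i, round(float(dists[i]), 4)) for i in order[:k]]

# Toy 2-D embeddings for five hypothetical artists.
toy_embeddings = np.array([
    [0.0, 0.0],   # 0: query artist
    [0.1, 0.0],   # 1
    [0.0, 0.5],   # 2
    [2.0, 2.0],   # 3
    [0.6, 0.8],   # 4
])

neighbors = knn_query(toy_embeddings, query_idx=0, k=3)
print(neighbors)
```

Brute force is fine for a handful of rows; a tree-based index is usually preferable once the embedding matrix grows.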

In [1]:
import os
import torch
os.environ['TORCH'] = torch.__version__
print(torch.__version__)
import numpy as np
# import plotly.express as px
# import plotly.graph_objects as go
import json
import torch.nn as nn
import torch.nn.functional as F
from torchmetrics.functional import pairwise_euclidean_distance
from torch_geometric.nn import GATConv, SAGEConv
from torch.optim import lr_scheduler
import random
from random import choice,randrange
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
import math
import time
from torch_geometric import seed_everything
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader  # DataLoader under torch_geometric.data is deprecated
import pandas as pd

random_seed=280085

seed_everything(random_seed)

2.5.1+cu118


In [2]:
from src.architectures import *
from src.utils import *

In [3]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

The [PyTorch Geometric framework](https://pytorch-geometric.readthedocs.io/en/latest/) made it easy to handle the graph nodes and attributes and to train the GNNs.
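PyTorch Geometric stores the graph connectivity as an `edge_index` of shape `[2, num_edges]` (COO format) rather than as a dense matrix. The following sketch shows the correspondence on a toy 4-node graph; it uses NumPy so it runs without a GPU stack, and the graph itself is made up for the example.

```python
import numpy as np

# Toy undirected graph on 4 nodes with edges 0-1, 1-2, 2-3.
dense_adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# COO layout: row 0 holds the source nodes, row 1 the target nodes.
# With torch tensors, dense_adj.nonzero().t() gives the same layout.
edge_index = np.vstack(np.nonzero(dense_adj))
print(edge_index.shape)
print(edge_index)
```

Each undirected edge appears twice, once per direction, which is why three edges yield six columns.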

## What happens if we create an unreal artist?
* Now that we can embed artists, it is easy to augment our graph data structure with new information, and we expect the results to remain plausible.
* We therefore insert a new artist by specifying its connections in the graph, and we compute its features as a linear combination of its neighbors' features, namely their average.

* In the next cells we can either look up the neighbors of an existing artist or create a new artist together with its neighborhood.
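The two steps described above (average the neighbors' features, then connect the new node in both directions) can be sketched on toy arrays as follows. This is a NumPy illustration under assumed names (`add_synthetic_node`, `friend_ids`), not the notebook's own `add_new_artist` function, which operates on the torch tensors `X` and `A`.

```python
import numpy as np

def add_synthetic_node(features, edge_index, friend_ids):
    """Append a node whose features are the mean of its friends' features."""
    new_id = features.shape[0]                       # next free row index
    new_feat = features[friend_ids].mean(axis=0, keepdims=True)
    features_new = np.concatenate([features, new_feat], axis=0)

    friends = np.asarray(friend_ids)
    new_col = np.full_like(friends, new_id)
    # One edge per direction: new -> friend and friend -> new.
    new_edges = np.concatenate(
        [np.stack([new_col, friends]), np.stack([friends, new_col])], axis=1
    )
    edge_index_new = np.concatenate([edge_index, new_edges], axis=1)
    return features_new, edge_index_new, new_id

# Three toy artists with 2-D features and a single existing edge 0-1.
toy_features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_edges = np.array([[0, 1], [1, 0]])

feats, edges, new_id = add_synthetic_node(toy_features, toy_edges, friend_ids=[0, 2])
print(new_id, feats[new_id], edges.shape)
```

Averaging places the synthetic artist between its friends in feature space, which is what lets a mix of genres influence the recommendations.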

In [4]:
X = torch.load('data/instance.pt').to(device)      # Instance (node feature) matrix
A = torch.load('data/adjacencyCOO.pt').to(device)    # Adjacency matrix in COO format, the format supported by torch geometric
A1 = torch.load('data/adjacency.pt').to(device)      # Adjacency matrix in the standard (non-COO) format
num_samples = X.shape[0]
data = Data(x=X, edge_index=A, edge_attr=None, y=None).to(device)

path = './Models/'
diz_models = {
	'GAT': ['GAT_7_3_1_0.001_0.0_triplet_', 1, 3],
	'SAGE': ['SAGE_7_3_0.0001_1e-06_triplet_', 0, 3],
	'GCN': ['GCN_7_3_0.0001_0.0_triplet_', 0, 3],
	'GIN': ['GIN_7_2_0.0001_0.0_triplet_', 0, 2],
}





In [5]:



def get_nearest_artists(embedding, artist_name, K, artist_to_id, id_to_artist, friend_artist_list):
  '''Return the K nearest artists to artist_name, found with the K-NN algorithm:
      - embedding:          embedding matrix, one row per artist,
      - artist_name:        query artist whose neighbors we are looking for,
      - K:                  number of neighbors to return,
      - artist_to_id, id_to_artist: dictionaries that map artist names to ids and back,
      - friend_artist_list: artists to leave out of the result. '''
  Knew = K+50  # ask for extra candidates, since some are filtered out below
  T = embedding.detach().to(torch.device("cpu")).numpy()
  neigh = NearestNeighbors(n_neighbors=Knew, algorithm='kd_tree').fit(T)
  dist, ind = neigh.kneighbors(T[int(artist_to_id[artist_name])].reshape((1,-1)))

  # Drop the first hit, which is normally the query artist itself (distance 0)
  neighbors_list = list(ind[0])[1:]
  dist_list = list(dist[0])[1:]
  neighbors_ = []
  # c starts at 1, so neighbors_list[0] (the closest remaining hit) is never considered
  c = 1
  # Stop at K results or when the candidates run out, whichever comes first
  while len(neighbors_) < K and c < len(neighbors_list):
    name = id_to_artist.get(str(neighbors_list[c]))
    if name is not None and name not in friend_artist_list:
      neighbors_.append((name, round(dist_list[c], 4)))
    c += 1

  return neighbors_

def get_embeddings(model, data):
  ''' Compute the embeddings of all the nodes in data with the given, already loaded, model '''
  
  embedding = model(data.x, data.edge_index.to(device))

  return embedding

def add_new_artist(artist_query, friend_artist_list, artist_to_id, id_to_artist): 
  ''' Augment the dataset with artist_query if it is not already present in the dataset. 
      - artist_query:       artist for which we conduct the search,
      - friend_artist_list: list of existing artists to connect the new artist to.
      Returns the (possibly augmented) Data object and the updated name/id dictionaries. '''

  artist2num_new = artist_to_id.copy()
  num2artist_new = id_to_artist.copy()

  if artist_query in artist_to_id:
    print("{} is an existing artist".format(artist_query))
    return Data(x=X, edge_index = A), artist2num_new, num2artist_new

  # Keep only the friends that exist in the dataset and report the others
  valid_friends = []
  for artist in friend_artist_list:
    if artist in artist_to_id:
      valid_friends.append(artist)
    else:
      print("{} is not in the dataset, so it is not valid for the neighbors list".format(artist))

  if len(valid_friends) == 0:
    print("{} is not in the dataset and no valid friend artist was given, so it cannot be created".format(artist_query))
    return Data(x=X, edge_index = A), artist2num_new, num2artist_new

  print("{} does not exist in the dataset, or in real life. \n But we still can create it!".format(artist_query))
  X_new = X.clone()
  A_new = A.clone()
  new_id = X_new.shape[0]  # the new artist takes the next free row index
  artist2num_new[artist_query] = str(new_id)
  num2artist_new[str(new_id)] = artist_query

  feat_sum = torch.zeros(X_new.shape[1], device = device)  # feature size read from X instead of hard-coded
  for artist in valid_friends:
    friend_id = int(artist2num_new[artist])
    # Add the edge in both directions: new -> friend and friend -> new
    A_new = torch.cat((A_new, torch.tensor([[new_id, friend_id],[friend_id, new_id]], device = device)), dim = 1)
    feat_sum += X_new[friend_id]
  # The features of the new artist are the average of its friends' features
  feat_sum /= len(valid_friends)
  X_new = torch.cat((X_new, feat_sum.unsqueeze(0)), dim = 0)

  data = Data(x=X_new, edge_index = A_new)
  print("\n{} has been created considering its neighbors:\n {}\n".format(artist_query, valid_friends))
  return data, artist2num_new, num2artist_new



Please define `friend_artist_list` before running the query cell. To create a fictitious artist, the list must name at least one existing artist to relate it to.
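Before creating a fictitious artist it can help to check which names in the friend list actually exist in the dataset. The helper below is a hypothetical sketch (`split_friends` is not part of the notebook or of `src.utils`), and the toy dictionary stands in for the real `artist_to_id` mapping.

```python
def split_friends(friend_artist_list, artist_to_id):
    """Split a friend list into names present in the dataset and names that are not."""
    known = [a for a in friend_artist_list if a in artist_to_id]
    unknown = [a for a in friend_artist_list if a not in artist_to_id]
    return known, unknown

# Toy mapping standing in for the real artist_to_id dictionary.
toy_artist_to_id = {'Pink Floyd': '0', 'Syd Barrett': '1'}

known, unknown = split_friends(['Pink Floyd', 'Not A Real Band'], toy_artist_to_id)
print(known, unknown)
```

Only the names in `known` can contribute edges and features to the new artist.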

In [6]:
artist_data = pd.read_csv('dataset_construction/olga_augmented_labels_.csv')

id_to_artist = {str(i): artist for i, artist in enumerate(artist_data['artist_name'])}

artist_to_id = {artist: str(i) for i, artist in enumerate(artist_data['artist_name'])}


artist2num_new = artist_to_id.copy()
num2artist_new = id_to_artist.copy()

In [7]:
friend_artist_list = []
topK = 10


model_name = 'GAT'
random_feat = True

model = load_model(model_name, random_feat)

embedding = get_embeddings(model = model, data = data)
artist_name = 'Pink Floyd'

if artist_name in artist2num_new:
    print(get_nearest_artists(embedding, artist_name, K = topK, artist_to_id = artist2num_new, id_to_artist = num2artist_new, friend_artist_list = friend_artist_list))
else:
    data, artist2num_new, num2artist_new = add_new_artist(artist_name, friend_artist_list = friend_artist_list, artist_to_id = artist2num_new, id_to_artist = num2artist_new)
    print('Run the cell again!')


Let's load GAT's weights...




[('Tintern Abbey', 0.4837), ('The Nice', 0.515), ('Mike Batt', 0.5238), ('Roy Wood', 0.6002), ('Roy Harper', 0.605), ('The Bonzo Dog Band', 0.6177), ('Strawbs', 0.6233), ('The Moody Blues', 0.6421), ('Arthur Brown', 0.6497), ('Syd Barrett', 0.6765)]
