<a href="https://colab.research.google.com/github/nazareno/redes-do-spotify/blob/main/gera-rede-para-gephi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Based on [this notebook](https://gist.github.com/Kautenja/71f139eee58099b77e91a0d775e42b47#file-visualizing-spotify-related-artists-with-gephi-ipynb) by @Kautenja, published as a gist.

# Spotify API

Instead of interacting directly with the Spotify RESTful API, I use spotipy, a lightweight Python wrapper.

In [1]:
!pip install spotipy

Collecting spotipy
  Downloading spotipy-2.16.1-py3-none-any.whl (24 kB)
Installing collected packages: spotipy
Successfully installed spotipy-2.16.1


In [2]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
# spotify credentials for the application:
# "Related Artist Network Visualizer"
client_id = ' '
client_secret = ' '
# create a credential manager and api layer
client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
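
The cell above leaves `client_id` and `client_secret` blank. One way to avoid pasting secrets into a notebook is to read them from environment variables. The following is a minimal sketch, not part of the original notebook: the helper name `load_credentials` is hypothetical, and the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are an assumption of this example that can be changed freely.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    Hypothetical helper: it raises a clear error instead of silently
    passing blank strings on to the API client.
    """
    # variable names are an assumption of this sketch
    client_id = env.get("SPOTIPY_CLIENT_ID", "").strip()
    client_secret = env.get("SPOTIPY_CLIENT_SECRET", "").strip()
    if not client_id or not client_secret:
        raise RuntimeError("set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET first")
    return client_id, client_secret
```

The returned pair can then be passed to `SpotifyClientCredentials(client_id=..., client_secret=...)` exactly as in the cell above.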

## Root Node

I'll start with a root node: one of my favorite artists, [Bassnectar](https://www.bassnectar.net).

In [7]:
# the spotify id for the artist Bassnectar
seed_artist = 'spotify:artist:0sWTkzCrdEvuX7Du6MFLzc'

# Building A Network

To build the network, I'll use a modified form of depth-limited DFS that produces a dictionary mapping each artist to a list of related artists.

In [8]:
def related_network(artist_id, depth: int=3) -> dict:
    """
    Return a dictionary of artist names to lists of their related artists.
    
    Args:
        artist_id: the id of the artist to start the graph from
        depth: the depth into the related artist network 
    
    Returns: a dictionary of strings (artist name) to lists (related artists)
    """
    graph = dict()
    _related_network(artist_id, depth, graph)
    return graph

def _related_network(artist_id, depth, graph):
    """
    Recursively collect related artists and store the results in the graph.
    
    Args:
        artist_id: the artist to get the related artists of
        depth: the current depth in the graph
        graph: the dictionary to put the related artist results into
    """
    if depth == 0:
        return
    name = sp.artist(artist_id)['name']
    if name in graph:
        print("revisit for " + name)
        return
    print("fetching for " + name)
    like_artist = sp.artist_related_artists(artist_id)
    graph[name] = [related['name'] for related in like_artist['artists']]
    # recurse into each related artist, one level shallower
    for related in like_artist['artists']:
        _related_network(related['id'], depth - 1, graph)

In [9]:
# check the base case
related_network(seed_artist, 1)

fetching for Luedji Luna


{'Luedji Luna': ['Serena Assumpção',
  'Xênia França',
  'Omulu',
  'Metá Metá',
  'ÀTTØØXXÁ',
  'Letrux',
  'As Baías',
  'Elza Soares',
  'Anelis Assumpção',
  'Otto',
  'Mahmundi',
  'Larissa Luz',
  'Céu',
  'Moreno Veloso',
  'Mariana Aydar',
  'Alessandra Leao',
  'Fino Coletivo',
  'Luiza Lian',
  'Itamar Assumpção',
  'Curumin']}
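
The same traversal can be tried without API access. The sketch below replaces the Spotify calls with a small in-memory dictionary; `TOY_RELATED` and `toy_related_network` are made-up names for illustration, and the logic is intended to mirror `_related_network` (stop at depth 0, skip artists already expanded) rather than reproduce it exactly.

```python
# Toy adjacency data standing in for Spotify's related-artists lookup.
TOY_RELATED = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}


def toy_related_network(start, depth=3, related=TOY_RELATED):
    """Depth-limited DFS that records each expanded node's neighbour list."""
    graph = {}

    def visit(node, remaining):
        # stop when the depth budget is spent or the node was already expanded
        if remaining == 0 or node in graph:
            return
        graph[node] = list(related.get(node, []))
        for neighbour in graph[node]:
            visit(neighbour, remaining - 1)

    visit(start, depth)
    return graph


print(toy_related_network("A", 1))  # {'A': ['B', 'C']}
print(toy_related_network("A", 2))  # {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['D']}
```

With depth 2, `D` shows up only inside the lists and never as a key: artists at the frontier are recorded as related but are not expanded themselves. Because a node is marked as visited the first time DFS reaches it, an artist first reached near the depth limit is also not expanded again later, even if a shorter path to it exists.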

In [10]:
like_seed = related_network(seed_artist, 5)

fetching for Luedji Luna
fetching for Serena Assumpção
fetching for Metá Metá
revisit for Serena Assumpção
fetching for Giovani Cidreira
revisit for Metá Metá
fetching for Ava Rocha
fetching for Carne Doce
fetching for Xênia França
fetching for Maria Beraldo
fetching for Bruna Mendez
fetching for Baleia
fetching for Trombone de Frutas
fetching for Boogarins
fetching for Saulo Duarte e a Unidade
fetching for Kiko Dinucci
fetching for Itamar Assumpção
fetching for Mãeana
fetching for Letrux
fetching for Ventre
fetching for Karina Buhr
fetching for Terno Rei
fetching for Anelis Assumpção
fetching for Josyara
fetching for Luiza Lian
revisit for Kiko Dinucci
fetching for Rodrigo Campos
fetching for Passo Torto
fetching for Duo Moviola
fetching for Juçara Marçal e Kiko Dinucci
fetching for Romulo Fróes
revisit for Metá Metá
revisit for Itamar Assumpção
revisit for Kiko Dinucci
fetching for Alessandra Leao
fetching for Juçara Marçal
fetching for Letuce
fetching for Junio Barreto
fetching for 

# Formatting the Network for Gephi

In [12]:
import pandas as pd
import numpy as np

## nodes.csv

In [13]:
def nodes(graph: dict) -> pd.DataFrame:
    """
    Return a dataframe of nodes for the given graph.
    
    Args:
        graph: the graph to generate a unique table of nodes from
        
    Returns: a dataframe with nodes and unique ids
    """
    _nodes = []

    # collect every artist in the graph, plus everyone in their related lists
    for artist, related_list in graph.items():
        _nodes.append(artist)
        _nodes.extend(related_list)

    # keep only unique nodes
    _nodes = np.unique(_nodes)
    # make a dataframe to generate ids
    _nodes = pd.DataFrame(_nodes, columns=['label'])
    # use the index column as the id
    _nodes['id'] = _nodes.index
    return _nodes

In [14]:
network_nodes = nodes(like_seed)
network_nodes.head()

|   | label | id |
|---|---|---|
| 0 | 3 Na Massa | 0 |
| 1 | A Banda Mais Bonita da Cidade | 1 |
| 2 | A Barca | 2 |
| 3 | A Bolha | 3 |
| 4 | A Cor Do Som | 4 |


## edges.csv

In [15]:
def edges(graph: dict, nodes: pd.DataFrame) -> pd.DataFrame:
    """
    Return a dataframe of edges based on the graph and table of node ids.
    
    Args:
        graph: the graph to find edges in
        nodes: the table of nodes with unique node ids
        
    Returns: a table of edges (Source to Target) by unique node id
    """
    _edges = []

    for artist, related_list in graph.items():
        artist_node = nodes['id'][nodes['label'] == artist].values[0]
        for related in related_list:
            related_node = nodes['id'][nodes['label'] == related].values[0]
            _edges.append((artist_node, related_node))

    return pd.DataFrame(_edges, columns=['Source','Target'])

In [16]:
network_edges = edges(like_seed, network_nodes)
network_edges.head()

|   | Source | Target |
|---|---|---|
| 0 | 361 | 582 |
| 1 | 361 | 666 |
| 2 | 361 | 480 |
| 3 | 361 | 423 |
| 4 | 361 | 682 |
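
The `edges` function scans the node table once for every lookup, which can become slow as the network grows. A label-to-id dictionary yields the same kind of pairs with one dictionary lookup per endpoint. The following is a pure-Python sketch of that idea, not the notebook's method: `build_nodes_and_edges` and the toy graph are illustrative names, and ids are assigned in sorted label order to mirror the sorted output of `np.unique`.

```python
def build_nodes_and_edges(graph):
    """Return (nodes, edges) for a dict mapping names to lists of related names.

    nodes: list of (id, label) tuples, ids assigned in sorted label order
    edges: list of (source_id, target_id) tuples
    """
    labels = set(graph)
    for related_list in graph.values():
        labels.update(related_list)
    # one dictionary lookup per endpoint instead of a table scan
    label_to_id = {label: i for i, label in enumerate(sorted(labels))}
    nodes = [(i, label) for label, i in label_to_id.items()]
    edges = [
        (label_to_id[artist], label_to_id[related])
        for artist, related_list in graph.items()
        for related in related_list
    ]
    return nodes, edges


toy = {"B": ["A", "C"], "A": ["C"]}
nodes_list, edges_list = build_nodes_and_edges(toy)
print(nodes_list)  # [(0, 'A'), (1, 'B'), (2, 'C')]
print(edges_list)  # [(1, 0), (1, 2), (0, 2)]
```

If the DataFrame form is still wanted, the two lists could be wrapped with `pd.DataFrame(nodes_list, columns=['id', 'label'])` and `pd.DataFrame(edges_list, columns=['Source', 'Target'])`.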


## Save CSVs 

In [18]:
#from google.colab import files
network_nodes.to_csv('nodes.csv', index = False) 
#files.download('nodes.csv')

In [19]:
network_edges.to_csv('edges.csv', index = False) 
#files.download('edges.csv')
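
Before importing the files elsewhere, it may be worth checking that every edge endpoint refers to an id present in `nodes.csv`. The sketch below uses only the standard `csv` module and assumes the column names written above (`label`, `id`, `Source`, `Target`); the function name `dangling_edges` is hypothetical.

```python
import csv


def dangling_edges(nodes_path, edges_path):
    """Return (Source, Target) pairs whose ids are missing from the node file."""
    with open(nodes_path, newline="", encoding="utf-8") as f:
        known_ids = {row["id"] for row in csv.DictReader(f)}
    with open(edges_path, newline="", encoding="utf-8") as f:
        return [
            (row["Source"], row["Target"])
            for row in csv.DictReader(f)
            if row["Source"] not in known_ids or row["Target"] not in known_ids
        ]
```

Calling `dangling_edges('nodes.csv', 'edges.csv')` should return an empty list when the two files are consistent. Ids are compared as strings, since `csv` does not convert types.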