# Building the graph

We have already collected the repertoire from https://www.allmusic.com/ for the artists on our list. The data contain five fields:

- Artist identifier
- Artist name
- Identifier of the artist they share a song with
- Name of the artist they share a song with
- Shared song

The goal is to obtain a document that contains only the names of the two artists at the ends of each edge and the weight of that edge, in the following form:

|Artist|Collaborator|Number collaborations|
|--|--|--|
|Metallica|ACDC|6|

In [183]:
import pandas as pd
import unidecode

In [185]:
# First, read the repertoire
repertorio = pd.read_csv('./graph_project_crawl/Repertorio-Final.csv',
                         sep=',',
                         encoding='utf-8',
                         index_col=None)

# Convert all song titles to lowercase
repertorio['song'] = repertorio['song'].str.lower()

# Strip accents from the song titles
repertorio['song'] = repertorio['song'].str.normalize('NFD').str.encode('ascii', errors='ignore').str.decode('utf-8')
repertorio.head(5)

Unnamed: 0,artist_id,artist_name,feat_artist_id,feat_artist_name,song
0,mn0000690254,ZZ Top,mn0000503563,The Moving Sidewalks,you make me shake
1,mn0000690254,ZZ Top,mn0000503563,The Moving Sidewalks,you were so close to me
2,mn0000246960,Wham!,mn0000545074,George Michael,young guns
3,mn0000246960,Wham!,mn0000545074,George Michael,young guns
4,mn0000246960,Wham!,mn0000545074,George Michael,young guns (go for it!)
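To make the accent-stripping step easier to follow, here is a minimal sketch of what the `normalize` / `encode` / `decode` chain does to a single string, using only the standard library. The helper name `strip_accents` is hypothetical and the sample strings are illustrative. NFD splits an accented letter into its base letter plus a combining mark, and encoding to ASCII with `errors='ignore'` then discards the mark. One side effect is that a character with no such decomposition (for example "ø") disappears entirely instead of being transliterated.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper mirroring the pandas chain for one string."""
    # NFD decomposes "é" into "e" followed by a combining acute accent
    decomposed = unicodedata.normalize("NFD", text)
    # Non-ASCII code points (the combining marks) are silently dropped
    return decomposed.encode("ascii", errors="ignore").decode("utf-8")


print(strip_accents("café tacuba"))    # cafe tacuba
print(strip_accents("víctor manuel"))  # victor manuel
print(repr(strip_accents("ø")))        # '' (no decomposition, so it is dropped)
```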


In [186]:
# Sort the repertoire by artist name
repertorio = repertorio.sort_values(by='artist_name')
repertorio.head()

Unnamed: 0,artist_id,artist_name,feat_artist_id,feat_artist_name,song
5671,mn0000516929,*NSYNC,mn0000567809,Modern Talking,you got it
5770,mn0000516929,*NSYNC,mn0000101895,Joe,i'll never stop(vcd)
5771,mn0000516929,*NSYNC,mn0000101895,Joe,if i'm not the one
5772,mn0000516929,*NSYNC,mn0000101895,Joe,if i'm not the one
5773,mn0000516929,*NSYNC,mn0000101895,Joe,if only in heaven's eyes


In [187]:
# Some songs (featurings) are duplicated; drop the duplicates
repertorio = repertorio.drop_duplicates(subset=repertorio.columns).reset_index(drop=True)
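As a small illustration of what this deduplication does and does not catch, the sketch below uses the three Wham! rows from the first output. `drop_duplicates` compares rows exactly, so the repeated "young guns" row is removed, while the title variant "young guns (go for it!)" survives and will still count as a separate collaboration. The variable names are illustrative.

```python
import pandas as pd

# Three rows taken from the repertoire preview shown earlier
rows = pd.DataFrame({
    "artist_name": ["Wham!", "Wham!", "Wham!"],
    "feat_artist_name": ["George Michael"] * 3,
    "song": ["young guns", "young guns", "young guns (go for it!)"],
})

# Exact-match deduplication across all columns (the default subset)
deduplicated = rows.drop_duplicates().reset_index(drop=True)
print(deduplicated)
```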

In [188]:
repertorio['Number collaborations'] = 1
repertorio.head()

Unnamed: 0,artist_id,artist_name,feat_artist_id,feat_artist_name,song,Number collaborations
0,mn0000516929,*NSYNC,mn0000567809,Modern Talking,you got it,1
1,mn0000516929,*NSYNC,mn0000101895,Joe,i'll never stop(vcd),1
2,mn0000516929,*NSYNC,mn0000101895,Joe,if i'm not the one,1
3,mn0000516929,*NSYNC,mn0000101895,Joe,if only in heaven's eyes,1
4,mn0000516929,*NSYNC,mn0000101895,Joe,in conversation,1


In [189]:
f = {'Number collaborations': 'sum', 'artist_name': 'first', 'feat_artist_name': 'first'}
test = repertorio.groupby(['artist_id', 'feat_artist_id'],
                          as_index=False).agg(f)


test.head()

Unnamed: 0,artist_id,feat_artist_id,Number collaborations,artist_name,feat_artist_name
0,mn0000000534,mn0000344634,10,All Saints,Melanie Blatt
1,mn0000000534,mn0000642542,6,All Saints,Burt Bacharach
2,mn0000000534,mn0000815862,8,All Saints,All-Saints Ensemble
3,mn0000000534,mn0001233067,50,All Saints,Danny Thompson
4,mn0000002578,mn0000236246,19,Tanita Tikaram,Mark Isham
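The two steps above (adding a column of ones and summing it per pair of identifiers) amount to counting rows per pair. The sketch below shows an equivalent formulation with named aggregation on a toy frame built from rows of the first preview. The variable names `songs` and `edges` are illustrative, and this is one possible way to write it rather than the notebook's own code.

```python
import pandas as pd

# Toy repertoire built from rows shown in the first preview
songs = pd.DataFrame({
    "artist_id": ["mn0000246960"] * 2 + ["mn0000690254"] * 2,
    "artist_name": ["Wham!", "Wham!", "ZZ Top", "ZZ Top"],
    "feat_artist_id": ["mn0000545074"] * 2 + ["mn0000503563"] * 2,
    "feat_artist_name": ["George Michael"] * 2 + ["The Moving Sidewalks"] * 2,
    "song": ["young guns", "young guns (go for it!)",
             "you make me shake", "you were so close to me"],
})

# Count songs per (artist, collaborator) pair; the dict unpacking is only
# needed because the output column name contains a space
edges = songs.groupby(["artist_id", "feat_artist_id"], as_index=False).agg(**{
    "artist_name": ("artist_name", "first"),
    "feat_artist_name": ("feat_artist_name", "first"),
    "Number collaborations": ("song", "count"),
})
print(edges)
```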


In [191]:
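# For each edge (A, B), look up the reciprocal edge (B, A)
output = pd.DataFrame()
for _, row in test.iterrows():
    inverted = test[(test['artist_name'] == row['feat_artist_name']) & (test['feat_artist_name'] == row['artist_name'])]
    # Assumption: the rest of the loop body is cut off in the source;
    # printing the non-empty matches is consistent with the output below
    if not inverted.empty:
        print(inverted)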

Unnamed: 0,artist_id,feat_artist_id,Number collaborations,artist_name,feat_artist_name
1745,mn0000239827,mn0000005307,7,John Mayer,Alicia Keys
125,mn0000021009,mn0000005953,6,Slash,Alice Cooper
1808,mn0000255210,mn0000006334,85,Diana Krall,Tony Bennett
2243,mn0000317093,mn0000006334,41,Nat King Cole,Tony Bennett
4186,mn0000796734,mn0000006334,30,Elton John,Tony Bennett
...
458,mn0000070929,mn0000083013,69,The Jacksons,The Jackson 5
2774,mn0000467203,mn0000083013,332,Michael Jackson,The Jackson 5
191,mn0000029884,mn0000085915,56,Paul McCartney,Billy Joel
315,mn0000046861,mn0000085915,56,Ray Charles,Billy Joel
3275,mn0000606283,mn0000088035,7,Al Jarreau,Kelly Price
...
1796,mn0000252978,mn0000159611,26,Pedro Guerra,Bebe
4345,mn0000831236,mn0000167517,187,Mark Knopfler,Dire Straits
155,mn0000026061,mn0000172360,102,Ana Belén,Víctor Manuel
983,mn0000138672,mn0000172360,21,Joan Manuel Serrat,Víctor Manuel
448,mn0000069986,mn0000175286,95,Kenny Rogers,Dolly Parton
...
4409,mn0000840402,mn0000233066,77,The Velvet Underground,Lou Reed
1315,mn0000184502,mn0000234518,42,Ella Fitzgerald,Louis Armstrong
243,mn0000032438,mn0000237205,2,Andrea Corr,Madonna
2513,mn0000361541,mn0000237773,11,Kasabian,Mark Ronson
3434,mn0000627026,mn0000237773,4,Amy Winehouse,Mark Ronson
...
2368,mn0000346336,mn0000316834,119,Pink Floyd,Marvin Gaye
3209,mn0000594665,mn0000316834,81,Diana Ross,Marvin Gaye
62,mn0000006334,mn0000317093,29,Tony Bennett,Nat King Cole
361,mn0000052062,mn0000317865,20,Chayanne,Jennifer Lopez
3627,mn0000673486,mn0000317865,3,Marc Anthony,Jennifer Lopez
...
1581,mn0000219203,mn0000446509,19,U2,Metallica
1682,mn0000233066,mn0000446509,48,Lou Reed,Metallica
3565,mn0000664817,mn0000449578,51,Gloria Estefan,Miami Sound Machine
460,mn0000070929,mn0000467203,29,The Jacksons,Michael Jackson
594,mn0000083013,mn0000467203,201,The Jackson 5,Michael Jackson
...
4499,mn0000858827,mn0000531986,5,Queen,David Bowie
4715,mn0000926548,mn0000531986,45,Iggy Pop,David Bowie
3457,mn0000635848,mn0000536894,4,David Bisbal,David Civera
1792,mn0000246960,mn0000545074,48,Wham!,George Michael
4196,mn0000796734,mn0000545074,76,Elton John,George Michael
...
3490,mn0000639397,mn0000652255,146,Caetano Veloso,Gilberto Gil
3332,mn0000610166,mn0000659277,43,Tiziano Ferro,Franco Battiato
3460,mn0000635848,mn0000664144,26,David Bisbal,Gisela
4006,mn0000775966,mn0000664144,12,Chenoa,Gisela
1357,mn0000186312,mn0000664817,55,Celia Cruz,Gloria Estefan
...
213,mn0000031442,mn0000810166,9,Anastacia,Vonda Shepard
3296,mn0000607448,mn0000810166,33,Al Green,Vonda Shepard
2176,mn0000307461,mn0000815039,20,Van Morrison,John Lee Hooker
395,mn0000066915,mn0000816890,51,Bob Dylan,Johnny Cash
2313,mn0000332141,mn0000816890,68,Jerry Lee Lewis,Johnny Cash
...
2591,mn0000379125,mn0000939567,12,Neil Young,Buffalo Springfield
2740,mn0000424078,mn0000942595,36,Miliki,Café Quijano
2444,mn0000354733,mn0000942680,12,Los Lobos,Café Tacuba
2634,mn0000391586,mn0000944700,3,James Blunt,Leona Lewis
5153,mn0002996996,mn0000945951,1,John Newman,Calvin Harris
...
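The loop above finds, for each edge, the edge that runs in the opposite direction, and the output shows that the two directions can carry different counts (Tony Bennett → Nat King Cole has 29, Nat King Cole → Tony Bennett has 41). One possible way to collapse both directions into a single undirected edge without iterating row by row is sketched below. The source does not state how the two weights should be combined, so keeping the larger one is purely an assumption made for illustration. The `Artist` and `Collaborator` column names follow the target table, and the other variable names are hypothetical.

```python
import pandas as pd

# Three directed edges taken from the outputs above
edges = pd.DataFrame({
    "artist_name": ["Tony Bennett", "Nat King Cole", "Wham!"],
    "feat_artist_name": ["Nat King Cole", "Tony Bennett", "George Michael"],
    "Number collaborations": [29, 41, 48],
})

a, b = edges["artist_name"], edges["feat_artist_name"]
swap = a > b  # True where the pair is not in alphabetical order

# Put each pair in alphabetical order so (A, B) and (B, A) share one key,
# then keep one row per unordered pair
undirected = (
    edges.assign(Artist=a.where(~swap, b), Collaborator=b.where(~swap, a))
    .groupby(["Artist", "Collaborator"], as_index=False)["Number collaborations"]
    .max()  # assumption: keep the larger of the two directed weights
)
print(undirected)
```

Replacing `.max()` with `.sum()` or `.min()` changes only the combination rule; the pairing logic stays the same.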