# Adding node attributes gender and country to the dataset

To give the ERGM more attributes to work with, we added node attributes to the dataset. The Spotify API was very restricted and did not provide this kind of information, so a different source was used: MusicBrainz (https://musicbrainz.org), which holds a lot of information about artists, including gender and country. For each artist, a request link returned an XML structure containing the artist information. Of the final 384 nodes, 191 were checked manually, either because they still contained Unknown values after the initial data collection or because the artist retrieved by the API did not match the artist in the Spotify data.

### Data preprocessing

In [1]:
#Importing the required libraries
import pandas as pd
import requests
import xmltodict
from xml.parsers.expat import ExpatError
import time

In [2]:
#Reading in the collaboration CSV into a dataframe
df_10_collab = pd.read_csv("df_10_collabs_no_dupl.csv")
df_10_collab.head()

Unnamed: 0,Ego,Alter1,Ego_id,Ego_popularity,Alter1_id,Alter1_popularity,N
0,Martin Garrix,AREA21,60d24wfXkVzDSfLS6hyCjZ,74,76YIoWHp3Ri3q1ocOPtFzp,47,24
1,David Guetta,AFROJACK,1Cs0zKBU1kc0i8ypK3B9ai,85,4D75GcNG95ebPtNvoNVXhz,69,30
2,David Guetta,Chris Willis,1Cs0zKBU1kc0i8ypK3B9ai,85,2qSEpijpT3YSXgxcXac1ly,55,65
3,David Guetta,Jack Back,1Cs0zKBU1kc0i8ypK3B9ai,85,4bXUaTjc7TQTvLqqCAlfYt,45,26
4,David Guetta,MORTEN,1Cs0zKBU1kc0i8ypK3B9ai,85,19HFRWmRCl27kTk6LeqAO8,60,49


In [3]:
#Removing all collaborations where both artists have the same ID (meaning the artist collaborated with themselves)
df_10_collab_no_loop = df_10_collab[df_10_collab["Ego_id"] != df_10_collab["Alter1_id"]]
df_10_collab_no_loop = df_10_collab_no_loop.reset_index(drop = True)
df_10_collab_no_loop.head()

Unnamed: 0,Ego,Alter1,Ego_id,Ego_popularity,Alter1_id,Alter1_popularity,N
0,Martin Garrix,AREA21,60d24wfXkVzDSfLS6hyCjZ,74,76YIoWHp3Ri3q1ocOPtFzp,47,24
1,David Guetta,AFROJACK,1Cs0zKBU1kc0i8ypK3B9ai,85,4D75GcNG95ebPtNvoNVXhz,69,30
2,David Guetta,Chris Willis,1Cs0zKBU1kc0i8ypK3B9ai,85,2qSEpijpT3YSXgxcXac1ly,55,65
3,David Guetta,Jack Back,1Cs0zKBU1kc0i8ypK3B9ai,85,4bXUaTjc7TQTvLqqCAlfYt,45,26
4,David Guetta,MORTEN,1Cs0zKBU1kc0i8ypK3B9ai,85,19HFRWmRCl27kTk6LeqAO8,60,49


In [4]:
#Some names appear abbreviated in the dataset, which makes them harder to find, so these are changed manually
#The result is assigned back to the column, because an in-place replace on a selected column may not update the dataframe in recent pandas versions
df_10_collab_no_loop_adjusted = df_10_collab_no_loop.copy()
df_10_collab_no_loop_adjusted['Ego'] = df_10_collab_no_loop_adjusted['Ego'].replace({"Armando": "Armando Manzanero", "24h": "24hrs", "DBX": "DMX", 
                                             "Thailand": "Thailand Philharmonic Orchestra", "BiD": "Bad Bunny", 
                                             "Cari": "Carin Leon", "Charles B": "Charles Bradley", "Dragen": "Drake", 
                                             "Dajiro": "Dairon La Formule", "Mainx": "Maino", "Miss B": "Miss Bashful", 
                                             "Susana": "Susana Baca"})
df_10_collab_no_loop_adjusted['Alter1'] = df_10_collab_no_loop_adjusted['Alter1'].replace({"Cari": "Carin Leon"})

In [5]:
#Removing collaborations of which one Artist has unknown node attributes after manual search
mysterious_artists_lst = ["Dj Victor", "James Perry", "Nashville Kids' Sound", "The Elite", "Power-Haus", "Avalok",
                         "Violet Light", "louisette", "nom de plume", "The Inala Ensemble", "Axan", "ORIGINS", "Bee"]
df_10_collab_no_loop_adjusted = df_10_collab_no_loop_adjusted[~(df_10_collab_no_loop_adjusted['Ego'].isin(mysterious_artists_lst))]
df_10_collab_no_loop_adjusted = df_10_collab_no_loop_adjusted[~(df_10_collab_no_loop_adjusted['Alter1'].isin(mysterious_artists_lst))]
df_10_collab_no_loop_adjusted = df_10_collab_no_loop_adjusted.reset_index(drop = True)
df_10_collab_no_loop_adjusted.head()

Unnamed: 0,Ego,Alter1,Ego_id,Ego_popularity,Alter1_id,Alter1_popularity,N
0,Martin Garrix,AREA21,60d24wfXkVzDSfLS6hyCjZ,74,76YIoWHp3Ri3q1ocOPtFzp,47,24
1,David Guetta,AFROJACK,1Cs0zKBU1kc0i8ypK3B9ai,85,4D75GcNG95ebPtNvoNVXhz,69,30
2,David Guetta,Chris Willis,1Cs0zKBU1kc0i8ypK3B9ai,85,2qSEpijpT3YSXgxcXac1ly,55,65
3,David Guetta,Jack Back,1Cs0zKBU1kc0i8ypK3B9ai,85,4bXUaTjc7TQTvLqqCAlfYt,45,26
4,David Guetta,MORTEN,1Cs0zKBU1kc0i8ypK3B9ai,85,19HFRWmRCl27kTk6LeqAO8,60,49


In [6]:
#Stacking the Ego and Alter1 columns below each other to create a dataframe with only node information and its attributes
s_artist = pd.concat([df_10_collab_no_loop_adjusted["Ego"], df_10_collab_no_loop_adjusted["Alter1"]])
s_popularity = pd.concat([df_10_collab_no_loop_adjusted["Ego_popularity"], df_10_collab_no_loop_adjusted["Alter1_popularity"]])
df_nodes = pd.concat([s_artist, s_popularity], axis=1) #Put artist names and popularity scores next to each other as two columns
df_nodes = df_nodes.reset_index(drop = True)
df_nodes = df_nodes.rename(columns={0: 'Artist', 1: 'Popularity'}) #Give appropriate column names

#Drop duplicate artist and popularity score combinations
df_nodes_no_dup = df_nodes.drop_duplicates(subset=['Artist', 'Popularity'], keep= "first")
#Then keep one row per artist: a mistake in the Spotify data collection made Bad Bunny appear twice
df_nodes_no_dup = df_nodes_no_dup.drop_duplicates(subset=['Artist'], keep= "first")
df_nodes_no_dup = df_nodes_no_dup.reset_index(drop = True)

#Drop two entries that were retrieved incorrectly from the Spotify API
df_nodes_no_dup = df_nodes_no_dup[df_nodes_no_dup["Artist"] != "Aly & Fila FSOE Radio"] #Aly & Fila are already in the data
df_nodes_no_dup = df_nodes_no_dup[df_nodes_no_dup["Artist"] != "Future Sound of Egypt"] #This is a song
df_nodes_no_dup = df_nodes_no_dup.reset_index(drop = True)

In [7]:
#Preparing the data for the request link: every space in an artist name is replaced with a %
adjusted_artist_lst = [artist.replace(" ", "%") for artist in df_nodes_no_dup["Artist"]] #Store all adjusted artist names in a list
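
The notebook substitutes a bare `%` for each space, which the search endpoint evidently accepted. Standard percent-encoding would instead write a space as `%20` and would also escape characters such as `&`. The sketch below shows that alternative with `urllib.parse.quote`; `build_request_link` is a hypothetical helper name, not part of the notebook, and the base URL is the one used in the request cell.

```python
from urllib.parse import quote

# Same endpoint as in the notebook's request cell
BASE_URL = "https://musicbrainz.org/ws/2/artist/?query="


def build_request_link(artist_name: str) -> str:
    """Hypothetical helper: percent-encode an artist name for the query string."""
    # safe="" makes quote() escape every reserved character, including "&" and "/"
    return BASE_URL + quote(artist_name, safe="")


print(build_request_link("Dimitri Vegas & Like Mike"))
```

An unescaped `&` starts a new query parameter, so only the text before it is sent as the search term. That is possibly why names such as "Above & Beyond" and "Aly & Fila" came back as "Above" and "Aly" in the output of the request cell, although this was not verified here.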

### Request artist information via API

In [8]:
final_artist_dict = {}
for str_artist in adjusted_artist_lst:
    time.sleep(1) #The API has a limit of 1 request per second, so a delay of 1 second was added before every request
    request_link = "https://musicbrainz.org/ws/2/artist/?query=" + str_artist #Creating the request link
    response = requests.get(request_link)
    #The first try and except makes sure the loop continues even if there are issues with the API structure
    try:
        artist_dict = xmltodict.parse(response.content) #Translate XML output to a dictionary
        additional_info = []
        for artist_info in artist_dict["metadata"]["artist-list"]["artist"]: #Get into the layer where the needed info is stored
            name = artist_info["name"] #Retrieve artist name according to the API such that later the correctness can be checked
            try:
                country = artist_info["area"]["name"]  #Retrieve the country name
            except KeyError:
                country = "Unknown"  #Fill Unknown if there is no country name stored in the API output
            try:
                gender = artist_info["gender"]["#text"] #Retrieve the gender 
            except KeyError:
                gender = "Unknown"  #Fill Unknown if there is no gender stored in the API output
            additional_info.extend([name, gender, country]) #Add name, gender and country retrieved from the API
            final_artist_dict[str_artist] = additional_info #Store artist name from the used dataset as key together with the attributes retrieved from the API
            break
    except (ExpatError, TypeError, KeyError):
        final_artist_dict[str_artist] = ["Unknown", "Unknown", "Unknown"] #Add Unknown for API artist name, gender and country if the API output cannot be parsed for that particular artist
final_artist_dict

{'Martin%Garrix': ['Martin Garrix', 'male', 'Netherlands'],
 'David%Guetta': ['David Guetta', 'male', 'France'],
 'Dimitri%Vegas%&%Like%Mike': ['Dimitri Vegas', 'male', 'Belgium'],
 '24hrs': ['24hrs', 'male', 'United States'],
 '2WEI': ['2WEI', 'Unknown', 'Germany'],
 'A%Boogie%Wit%da%Hoodie': ['A Boogie Wit da Hoodie', 'male', 'United States'],
 'A%Touch%Of%Class': ['ATC', 'Unknown', 'Germany'],
 'Above%&%Beyond': ['Above', 'Unknown', 'Unknown'],
 'AFROJACK': ['Afrojack', 'male', 'Netherlands'],
 'Alan%Walker': ['Alan Walker', 'male', 'Norway'],
 'Alex%M.O.R.P.H.': ['Alex M.O.R.P.H.', 'male', 'Germany'],
 'Aloe%Blacc': ['Aloe Blacc', 'male', 'United States'],
 'ALOTT': ['ALOTT', 'Unknown', 'Berlin'],
 'Aly%&%Fila': ['Aly', 'male', 'Unknown'],
 'Ana%Criado': ['Ana Criado', 'female', 'United Kingdom'],
 'Antoine%Clamaran': ['Antoine Clamaran', 'male', 'France'],
 'Anuel%AA': ['400+', 'male', 'Chicago'],
 'Armando%Manzanero': ['Armando Manzanero', 'male', 'Mexico'],
 'Barbara%Tucker': ['
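
The extraction step in the request loop can also be written without nested `try`/`except` blocks. The sketch below works on an already parsed dictionary and assumes the shape that the notebook code implies (`metadata` > `artist-list` > `artist`, with the gender stored under `#text`). The helper names and the sample dictionary are made up for illustration; the sample is not real API output.

```python
def first_artist(parsed: dict):
    """Return the first artist entry of a parsed response, or None if there is none."""
    artists = ((parsed.get("metadata") or {}).get("artist-list") or {}).get("artist")
    if isinstance(artists, dict):  # a single match is parsed as a dict, not a list
        return artists
    if isinstance(artists, list) and artists:
        return artists[0]
    return None


def extract_attributes(artist_info: dict) -> list:
    """Return [name, gender, country], using "Unknown" for any missing field."""
    name = artist_info.get("name", "Unknown")
    gender = artist_info.get("gender")
    if isinstance(gender, dict):  # element with attributes: the text sits under "#text"
        gender = gender.get("#text")
    area = artist_info.get("area")
    country = area.get("name") if isinstance(area, dict) else None
    return [name, gender or "Unknown", country or "Unknown"]


# Made-up fragment in the assumed shape (not real API output)
sample = {"metadata": {"artist-list": {"artist": [
    {"name": "Example Artist", "gender": {"@id": "x", "#text": "female"}, "area": {"name": "Norway"}},
    {"name": "Other Artist"},
]}}}

print(extract_attributes(first_artist(sample)))
```

One design point: `xmltodict` returns a dictionary rather than a list when an element occurs only once, so a response with exactly one match needs the `isinstance` check above. In the original loop such a response raises a `TypeError` and ends up in the outer `except` branch.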

### Data Preparation

Here, the obtained dictionary is transformed into a dataframe. 

In [9]:
#Put the dictionary output into a dataframe, with the Spotify Artist, API artist name, gender and country as columns
df_combined = pd.DataFrame(final_artist_dict.items(), columns = ["Artist", "Combined_list"])
df_split = pd.DataFrame(df_combined['Combined_list'].to_list(), columns=["API_Artist", 'Gender','Country'])

In [10]:
#Replace the % back to spaces in the dataframe
df_temporary = df_combined.join(df_split)
df_temporary['Artist'] = df_temporary['Artist'].str.replace('%',' ', regex=False)
df_temporary = df_temporary[["Artist", "API_Artist", "Gender", "Country"]]
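
With the Spotify name and the API name side by side, rows that need a manual check can be flagged automatically before looking them up. The following is a minimal sketch using `difflib` from the standard library; `names_match` and the threshold of 0.8 are assumptions chosen for illustration, not something the notebook uses. The example pairs are taken from the request output above.

```python
from difflib import SequenceMatcher


def names_match(spotify_name: str, api_name: str, threshold: float = 0.8) -> bool:
    """Hypothetical check: True if the two names are similar enough, ignoring case."""
    a = spotify_name.casefold().strip()
    b = api_name.casefold().strip()
    return SequenceMatcher(None, a, b).ratio() >= threshold


# (Spotify name, API name) pairs from the request output
pairs = [("AFROJACK", "Afrojack"), ("Anuel AA", "400+"), ("Above & Beyond", "Above")]

# Keep only the pairs whose names differ too much and therefore need a manual look
to_review = [pair for pair in pairs if not names_match(*pair)]
print(to_review)
```

A similarity threshold will miss legitimate aliases such as "A Touch Of Class" and "ATC", so it can only narrow down the manual work, not replace it.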

#### Replacing API artist 400+

Some artists that do exist on the API website were matched to the API artist name 400+, which means all of them were classified as male and from Chicago. These rows were fixed manually using the information on the API website directly and, where needed, some Googling. The 400+ match occurred 21 times.

In [11]:
df_final = df_temporary.copy() #Make a copy in which the adjustments are made

In [12]:
#Example of the 400+ classifications
df_final[df_final["API_Artist"] == "400+"].head()

Unnamed: 0,Artist,API_Artist,Gender,Country
16,Anuel AA,400+,male,Chicago
20,Benny Benassi,400+,male,Chicago
67,J Balvin,400+,male,Chicago
81,Miss Bashful,400+,male,Chicago
90,Richard Bedford,400+,male,Chicago


In [13]:
annuel = {"API_Artist":"Anuel AA", "Gender": "male", "Country": "Puerto Rico"}
df_final.loc[16, annuel.keys()] = tuple(annuel.values())

benny_benassi = {"API_Artist":"Benny Benassi", "Gender": "male", "Country": "Italy"}
df_final.loc[20, benny_benassi.keys()] = tuple(benny_benassi.values())

j_balvin = {"API_Artist":"J Balvin", "Gender": "male", "Country": "Colombia"}
df_final.loc[67, j_balvin.keys()] = tuple(j_balvin.values())

miss_b = {"API_Artist":"Miss Bashful", "Country": "United States"}
df_final.loc[81, miss_b.keys()] = tuple(miss_b.values())

rich_bedform = {"API_Artist":"Richard Bedford", "Gender": "male", "Country": "United Kingdom"}
df_final.loc[90, rich_bedform.keys()] = tuple(rich_bedform.values())

susana_baca = {"API_Artist":"Susana Baca", "Gender": "female", "Country": "Peru"}
df_final.loc[103, susana_baca.keys()] = tuple(susana_baca.values())

jack_back = {"API_Artist":"Jack Back", "Gender": "male", "Country": "United States"}
df_final.loc[110, jack_back.keys()] = tuple(jack_back.values())

paul_denton = {"API_Artist":"Paul Denton", "Gender": "male", "Country": "Ireland"}
df_final.loc[144, paul_denton.keys()] = tuple(paul_denton.values())

arman_baez = {"API_Artist":"Armando Baez", "Gender": "male", "Country": "Mexico"}
df_final.loc[155, arman_baez.keys()] = tuple(arman_baez.values())

jason_caesar = {"API_Artist":"Jason Caesar", "Gender": "male", "Country": "United States"}
df_final.loc[174, jason_caesar.keys()] = tuple(jason_caesar.values())

egbert_derix = {"API_Artist":"Egbert Derix", "Gender": "male", "Country": "Netherlands"}
df_final.loc[220, egbert_derix.keys()] = tuple(egbert_derix.values())

holland_baroque = {"API_Artist":"Holland Baroque", "Gender": "mixed", "Country": "Netherlands"}
df_final.loc[229, holland_baroque.keys()] = tuple(holland_baroque.values())

colin_benders = {"API_Artist":"Colin Benders", "Gender": "male", "Country": "Netherlands"}
df_final.loc[235, colin_benders.keys()] = tuple(colin_benders.values())

michel_banabila = {"API_Artist":"Michel Banabila", "Gender": "male", "Country": "Netherlands"}
df_final.loc[240, michel_banabila.keys()] = tuple(michel_banabila.values())

joey_baron = {"API_Artist":"Joey Baron", "Gender": "male", "Country": "United States"}
df_final.loc[243, joey_baron.keys()] = tuple(joey_baron.values())

pieter_bast = {"API_Artist":"Pieter Bast", "Gender": "male", "Country": "United States"}
df_final.loc[249, pieter_bast.keys()] = tuple(pieter_bast.values())

db_bantino = {"API_Artist":"Db Bantino", "Gender": "male", "Country": "United States"}
df_final.loc[251, db_bantino.keys()] = tuple(db_bantino.values())

dj_ace = {"API_Artist":"DJ Ace", "Gender": "male", "Country": "United States"}
df_final.loc[302, dj_ace.keys()] = tuple(dj_ace.values())

lil_baby = {"API_Artist":"Lil Baby", "Gender": "male", "Country": "United States"}
df_final.loc[307, lil_baby.keys()] = tuple(lil_baby.values())

petra_berger = {"API_Artist":"Petra Berger", "Gender": "female", "Country": "Netherlands"}
df_final.loc[312, petra_berger.keys()] = tuple(petra_berger.values())

lady_bee = {"API_Artist":"Lady Bee", "Gender": "female", "Country": "Netherlands"}
df_final.loc[355, lady_bee.keys()] = tuple(lady_bee.values())
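
The row-by-row assignments above could also be collected in one mapping and applied in a loop, which keeps all manual fixes in a single reviewable structure. This is a sketch, not the notebook's method: `apply_corrections` is a hypothetical helper, and the toy dataframe stands in for `df_final` with two of the rows shown above.

```python
import pandas as pd


def apply_corrections(df: pd.DataFrame, corrections: dict) -> pd.DataFrame:
    """Return a copy of df with each {row_label: {column: value}} correction applied."""
    fixed = df.copy()
    for row_label, values in corrections.items():
        if row_label not in fixed.index:
            # Guard against index labels that shifted after upstream filtering
            raise KeyError(f"Row {row_label!r} not found; the index may have shifted")
        for column, value in values.items():
            fixed.at[row_label, column] = value
    return fixed


# Toy frame standing in for df_final (two of the 400+ rows)
toy = pd.DataFrame(
    {"Artist": ["Benny Benassi", "J Balvin"],
     "API_Artist": ["400+", "400+"],
     "Gender": ["male", "male"],
     "Country": ["Chicago", "Chicago"]},
    index=[20, 67],
)

corrections = {
    20: {"API_Artist": "Benny Benassi", "Gender": "male", "Country": "Italy"},
    67: {"API_Artist": "J Balvin", "Gender": "male", "Country": "Colombia"},
}

fixed = apply_corrections(toy, corrections)
print(fixed)
```

Hard-coded row labels only stay valid as long as the preprocessing steps produce the same row order, so keying the corrections on the artist name instead might be more robust.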

#### Filling unknown countries

Here, the rows where only the country is unknown are filled in manually. This occurred 18 times. All information was double-checked while searching for the country, so in some cases the gender or artist name was adjusted as well.

In [14]:
df_final[(df_final["Country"] == "Unknown") & (df_final["Gender"] != "Unknown")].head()

Unnamed: 0,Artist,API_Artist,Gender,Country
13,Aly & Fila,Aly,male,Unknown
27,Bruno Martini,Bruno Martini,male,Unknown
44,Djay W,Djay,male,Unknown
111,MORTEN,MORTEN,male,Unknown
117,Ali Christenhusz,Ali Christenhusz,male,Unknown
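
The manual passes in this notebook split the rows by which attribute is missing: country only, gender only, or both. A quick tally of these groups can be sketched as follows. `missing_pattern` is a hypothetical helper, and the toy dataframe reuses four rows from the request output as a stand-in for `df_final`.

```python
import pandas as pd


def missing_pattern(row: pd.Series) -> str:
    """Label a row by which of Gender and Country is still "Unknown"."""
    no_gender = row["Gender"] == "Unknown"
    no_country = row["Country"] == "Unknown"
    if no_gender and no_country:
        return "both"
    if no_gender:
        return "gender only"
    if no_country:
        return "country only"
    return "complete"


# Four rows taken from the request output, standing in for df_final
toy = pd.DataFrame({
    "Artist": ["Aly & Fila", "2WEI", "Above & Beyond", "Alan Walker"],
    "Gender": ["male", "Unknown", "Unknown", "male"],
    "Country": ["Unknown", "Germany", "Unknown", "Norway"],
})

# Count how many rows fall into each group
counts = toy.apply(missing_pattern, axis=1).value_counts()
print(counts)
```

Running the same tally after the manual fixes gives a simple check that no row is left with an Unknown value.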


In [15]:
#Unknown countries filled
aly_fila = {"API_Artist":"Aly & Fila", "Country": "Egypt"}
df_final.loc[13, aly_fila.keys()] = tuple(aly_fila.values())

bruno_martini = {"API_Artist":"Bruno Martini", "Country": "Brazil"}
df_final.loc[27, bruno_martini.keys()] = tuple(bruno_martini.values())

djay_w = {"API_Artist":"Djay W", "Country": "Brazil"}
df_final.loc[44, djay_w.keys()] = tuple(djay_w.values())

morten = {"API_Artist":"MORTEN", "Country": "Denmark"}
df_final.loc[111, morten.keys()] = tuple(morten.values())

ali_c = {"API_Artist":"Ali Christenhusz", "Country": "Germany"}
df_final.loc[117, ali_c.keys()] = tuple(ali_c.values())

edda_hayes = {"API_Artist":"Edda Hayes", "Country": "United States"}
df_final.loc[118, edda_hayes.keys()] = tuple(edda_hayes.values())

nlw = {"API_Artist":"NLW", "Country": "Netherlands"}
df_final.loc[124, nlw.keys()] = tuple(nlw.values())

omar_sherif = {"API_Artist":"Omar Sherif", "Country": "Egypt"}
df_final.loc[133, omar_sherif.keys()] = tuple(omar_sherif.values())

craig_con = {"API_Artist":"Craig Connelly", "Country": "United Kingdom"}
df_final.loc[134, craig_con.keys()] = tuple(craig_con.values())

ciaran_mc = {"API_Artist":"Ciaran McAuley", "Country": "Ireland"}
df_final.loc[148, ciaran_mc.keys()] = tuple(ciaran_mc.values())

princess_ros = {"API_Artist":"PRINCE$$ ROSIE", "Gender": "male", "Country": "United States"}
df_final.loc[192, princess_ros.keys()] = tuple(princess_ros.values())

john_hutch = {"Artist":"John Hutchinson","API_Artist":"John Hutchinson", "Country": "United Kingdom"}
df_final.loc[201, john_hutch.keys()] = tuple(john_hutch.values())

juan_pablo = {"API_Artist":"Juan Pablo Dobal", "Country": "Argentina"}
df_final.loc[227, juan_pablo.keys()] = tuple(juan_pablo.values())

kris_spirit = {"API_Artist":"Kris the $pirit", "Country": "Canada"}
df_final.loc[250, kris_spirit.keys()] = tuple(kris_spirit.values())

lucky_luke = {"API_Artist":"Lucky Luke", "Country": "Lithuania"}
df_final.loc[272, lucky_luke.keys()] = tuple(lucky_luke.values())

oj_d_juice = {"API_Artist":"OJ Da Juiceman", "Country": "United States"}
df_final.loc[276, oj_d_juice.keys()] = tuple(oj_d_juice.values())

adina_butar = {"API_Artist":"Adina Butar", "Country": "Romania"}
df_final.loc[325, adina_butar.keys()] = tuple(adina_butar.values())

azra = {"API_Artist":"Az-Ra", "Country": "United Kingdom"}
df_final.loc[381, azra.keys()] = tuple(azra.values())

#### Filling unknown genders 

Here, the rows where only the gender is unknown are filled in manually. This occurred 82 times. All information was double-checked while searching for the gender, so in some cases the country or artist name was adjusted as well.

In [16]:
df_final[(df_final["Country"] != "Unknown") & (df_final["Gender"] == "Unknown")].head()

Unnamed: 0,Artist,API_Artist,Gender,Country
4,2WEI,2WEI,Unknown,Germany
6,A Touch Of Class,ATC,Unknown,Germany
12,ALOTT,ALOTT,Unknown,Berlin
31,"Cerf, Mitiska & Jaren","Cerf, Mitiska & Jaren",Unknown,United States
33,Cheat Codes,Cheat Codes,Unknown,United States


In [17]:
#Unknown genders filled
two_wei = {"API_Artist":"2WEI", "Gender": "male"}
df_final.loc[4, two_wei.keys()] = tuple(two_wei.values())

atc = {"API_Artist":"A Touch Of Class", "Gender": "mixed"}
df_final.loc[6, atc.keys()] = tuple(atc.values())

alott = {"API_Artist":"ALOTT", "Gender": "male", "Country": "Germany"}
df_final.loc[12, alott.keys()] = tuple(alott.values())

cmj = {"API_Artist":"Cerf, Mitiska & Jaren", "Gender": "mixed"}
df_final.loc[31, cmj.keys()] = tuple(cmj.values())

cheat_codes = {"API_Artist":"Cheat Codes", "Gender": "mixed"}
df_final.loc[33, cheat_codes.keys()] = tuple(cheat_codes.values())

cosmic_gate = {"API_Artist":"Cosmic Gate", "Gender": "male"}
df_final.loc[36, cosmic_gate.keys()] = tuple(cosmic_gate.values())

dmitriy = {"API_Artist":"Dmitriy Mityukhin", "Gender": "male", "Country": "Russia"}
df_final.loc[45, dmitriy.keys()] = tuple(dmitriy.values())

faithless = {"API_Artist":"Faithless", "Gender": "mixed"}
df_final.loc[50, faithless.keys()] = tuple(faithless.values())

fatum = {"API_Artist":"Fatum", "Gender": "male"}
df_final.loc[51, fatum.keys()] = tuple(fatum.values())

frenna = {"Artist":"Frenna", "API_Artist":"Frenna", "Gender": "male", "Country": "Netherlands"}
df_final.loc[58, frenna.keys()] = tuple(frenna.values())

gaullin = {"Artist":"Gaullin", "API_Artist":"Gaullin", "Gender": "male", "Country": "Lithuania"}
df_final.loc[61, gaullin.keys()] = tuple(gaullin.values())

gitd = {"Artist":"GLOWINTHEDARK", "API_Artist":"GLOWINTHEDARK", "Gender": "male", "Country": "Netherlands"}
df_final.loc[62, gitd.keys()] = tuple(gitd.values())

inner_city = {"API_Artist":"Inner City", "Gender": "mixed"}
df_final.loc[66, inner_city.keys()] = tuple(inner_city.values())

lbm = {"API_Artist":"Ladysmith Black Mambazo", "Gender": "male", "Country": "South Africa" }
df_final.loc[74, lbm.keys()] = tuple(lbm.values())

mark_six = {"API_Artist":"Mark Sixma", "Gender": "male" }
df_final.loc[77, mark_six.keys()] = tuple(mark_six.values())

sleazy_ster = {"API_Artist":"Sleazy Stereo", "Gender": "male", "Country": "Netherlands"}
df_final.loc[98, sleazy_ster.keys()] = tuple(sleazy_ster.values())

thailand = {"API_Artist":"Thailand Philharmonic Orchestra", "Gender": "mixed"}
df_final.loc[104, thailand.keys()] = tuple(thailand.values())

the_egg = {"API_Artist":"The Egg", "Gender": "male"}
df_final.loc[105, the_egg.keys()] = tuple(the_egg.values())

wnw = {"API_Artist":"W&W", "Gender": "male", "Country": "Netherlands"}
df_final.loc[107, wnw.keys()] = tuple(wnw.values())

bassjack = {"API_Artist":"Bassjackers", "Gender": "male"}
df_final.loc[112, bassjack.keys()] = tuple(bassjack.values())

dj_jam = {"API_Artist":"DJ JAM", "Gender": "male", "Country": "Japan"}
df_final.loc[113, dj_jam.keys()] = tuple(dj_jam.values())

madein = {"API_Artist":"MadeinTYO", "Gender": "male", "Country": "United States"}
df_final.loc[114, madein.keys()] = tuple(madein.values())

slf = {"Artist":"Slick LaFlare", "API_Artist":"Slick LaFlare", "Gender": "male", "Country": "United States"}
df_final.loc[116, slf.keys()] = tuple(slf.values())

oceanlab = {"Artist":"OceanLab", "API_Artist":"OceanLab", "Gender": "mixed", "Country": "United Kingdom"}
df_final.loc[123, oceanlab.keys()] = tuple(oceanlab.values())

cradle = {"Artist":"Cradle Orchestra", "API_Artist":"Cradle Orchestra", "Gender": "mixed", "Country": "Japan"}
df_final.loc[127, cradle.keys()] = tuple(cradle.values())

emanon = {"Artist":"Emanon", "API_Artist":"Emanon", "Gender": "male", "Country": "United States"}
df_final.loc[128, emanon.keys()] = tuple(emanon.values())

exile = {"Artist":"Exile", "API_Artist":"Exile", "Gender": "male", "Country": "United States"}
df_final.loc[129, exile.keys()] = tuple(exile.values())

vize = {"API_Artist":"VIZE", "Gender": "male"}
df_final.loc[130, vize.keys()] = tuple(vize.values())

daxson = {"Artist":"Daxson", "API_Artist":"Daxson", "Gender": "male", "Country": "United Kingdom"}
df_final.loc[131, daxson.keys()] = tuple(daxson.values())

milk = {"Artist":"Milkwish", "API_Artist":"Milkwish", "Gender": "male", "Country": "Poland"}
df_final.loc[136, milk.keys()] = tuple(milk.values())

dylhen = {"Artist":"Dylhen", "API_Artist":"Dylhen", "Gender": "male", "Country": "United Kingdom"}
df_final.loc[137, dylhen.keys()] = tuple(dylhen.values())

fuenka = {"Artist":"Fuenka", "API_Artist":"Fuenka", "Gender": "male", "Country": "United Kingdom"}
df_final.loc[138, fuenka.keys()] = tuple(fuenka.values())

blaze = {"Artist":"Blaze", "API_Artist":"Blaze", "Gender": "male", "Country": "United States"}
df_final.loc[163, blaze.keys()] = tuple(blaze.values())

udaufl = {"Artist":"U.D.A.U.F.L.", "API_Artist":"U.D.A.U.F.L.", "Gender": "mixed", "Country": "United States"}
df_final.loc[164, udaufl.keys()] = tuple(udaufl.values())

andy_kauf = {"API_Artist":"Andy Kaufhold", "Gender": "male"}
df_final.loc[170, andy_kauf.keys()] = tuple(andy_kauf.values())

equador = {"API_Artist":"Equador", "Gender": "mixed", "Country": "United Kingdom"}
df_final.loc[176, equador.keys()] = tuple(equador.values())

blade = {"Artist":"Blademasterz","API_Artist":"Blademasterz", "Gender": "male", "Country": "Netherlands"}
df_final.loc[177, blade.keys()] = tuple(blade.values())

tnt = {"API_Artist":"TNT", "Gender": "male"}
df_final.loc[178, tnt.keys()] = tuple(tnt.values())

silk = {"API_Artist":"Silk Sonic", "Gender": "male"}
df_final.loc[180, silk.keys()] = tuple(silk.values())

funk_wav = {"API_Artist":"Funk Wav", "Gender": "male", "Country": "United Kingdom"}
df_final.loc[182, funk_wav.keys()] = tuple(funk_wav.values())

msb = {"API_Artist":"Menahan Street Band", "Gender": "male"}
df_final.loc[191, msb.keys()] = tuple(msb.values())

tm = {"API_Artist":"Tin Machine", "Gender": "male"}
df_final.loc[202, tm.keys()] = tuple(tm.values())

phil_orch = {"API_Artist":"Philadelphia Orchestra", "Gender": "mixed"}
df_final.loc[206, phil_orch.keys()] = tuple(phil_orch.values())

blackb = {"API_Artist":"Blackburner", "Gender": "male", "Country": "United States"}
df_final.loc[208, blackb.keys()] = tuple(blackb.values())

tw_juke = {"API_Artist":"Twisted Jukebox", "Gender": "mixed", "Country": "Mexico"}
df_final.loc[212, tw_juke.keys()] = tuple(tw_juke.values())

boelie = {"API_Artist":"Boelie Vis", "Gender": "male", "Country": "Netherlands"}
df_final.loc[222, boelie.keys()] = tuple(boelie.values())

crq = {"API_Artist":"Calefax Reed Quintet", "Gender": "male"}
df_final.loc[224, crq.keys()] = tuple(crq.values())

nso = {"API_Artist":"Netherlands Symphony Orchestra", "Gender": "mixed"}
df_final.loc[233, nso.keys()] = tuple(nso.values())

kyte = {"Artist":"Kytecrash","API_Artist":"Kytecrash", "Gender": "male", "Country": "Netherlands"}
df_final.loc[234, kyte.keys()] = tuple(kyte.values())

gate = {"API_Artist":"Gatecrash", "Gender": "male", "Country": "Netherlands"}
df_final.loc[237, gate.keys()] = tuple(gate.values())

mat_quar = {"API_Artist":"Matangi Quartet", "Gender": "male"}
df_final.loc[238, mat_quar.keys()] = tuple(mat_quar.values())

aphro = {"API_Artist":"Aphrohead", "Gender": "male", "Country": "United States"}
df_final.loc[258, aphro.keys()] = tuple(aphro.values())

gour = {"API_Artist":"Gouryella", "Gender": "male"}
df_final.loc[260, gour.keys()] = tuple(gour.values())

tens = {"API_Artist":"Tensnake", "Gender": "male", "Country": "Germany"}
df_final.loc[265, tens.keys()] = tuple(tens.values())

diq = {"API_Artist":"Diquenza", "Gender": "male", "Country": "Netherlands"}
df_final.loc[269, diq.keys()] = tuple(diq.values())

sfb = {"API_Artist":"SFB", "Gender": "male", "Country": "Netherlands"}
df_final.loc[274, sfb.keys()] = tuple(sfb.values())

bwd = {"API_Artist":"BigWalkDog", "Gender": "male", "Country": "United States"}
df_final.loc[279, bwd.keys()] = tuple(bwd.values())

foog = {"API_Artist":"Foogiano", "Gender": "male", "Country": "United States"}
df_final.loc[286, foog.keys()] = tuple(foog.values())

migos = {"API_Artist":"Migos", "Gender": "male", "Country": "United States"}
df_final.loc[288, migos.keys()] = tuple(migos.values())

cnote = {"API_Artist":"C-Note", "Gender": "male", "Country": "United States"}
df_final.loc[294, cnote.keys()] = tuple(cnote.values())

ridwelle = {"API_Artist":"Ridwello", "Gender": "male", "Country": "France"}
df_final.loc[313, ridwelle.keys()] = tuple(ridwelle.values())

theroots = {"API_Artist":"The Roots", "Gender": "male", "Country": "United States"}
df_final.loc[314, theroots.keys()] = tuple(theroots.values())

dakota = {"API_Artist":"Dakota", "Gender": "female", "Country": "United Kingdom"}
df_final.loc[326, dakota.keys()] = tuple(dakota.values())

thevamps = {"API_Artist":"The Vamps", "Gender": "male", "Country": "United Kingdom"}
df_final.loc[327, thevamps.keys()] = tuple(thevamps.values())

fort = {"API_Artist":"Fort Minor", "Gender": "male"}
df_final.loc[329, fort.keys()] = tuple(fort.values())

sob = {"API_Artist":"Styles Of Beyond", "Gender": "male"}
df_final.loc[330, sob.keys()] = tuple(sob.values())

dbbd = {"API_Artist":"DBBD", "Gender": "male", "Country": "Germany"}
df_final.loc[332, dbbd.keys()] = tuple(dbbd.values())

monocule = {"API_Artist":"Monocule", "Gender": "male", "Country": "Netherlands"}
df_final.loc[333, monocule.keys()] = tuple(monocule.values())

astronau = {"API_Artist":"Astronautalis", "Gender": "male", "Country": "United States"}
df_final.loc[335, astronau.keys()] = tuple(astronau.values())

shredd = {"API_Artist":"Shredders", "Gender": "male", "Country": "United States"}
df_final.loc[337, shredd.keys()] = tuple(shredd.values())

gp = {"API_Artist":"Guaranteed Pure", "Gender": "male", "Country": "United Kingdom"}
df_final.loc[342, gp.keys()] = tuple(gp.values())

stils = {"API_Artist":"Stiltskin", "Gender": "male"}
df_final.loc[343, stils.keys()] = tuple(stils.values())

tbse = {"API_Artist":"The Berlin Symphony Ensemble", "Gender": "mixed", "Country": "Germany"}
df_final.loc[344, tbse.keys()] = tuple(tbse.values())

tho = {"API_Artist":"The Heritage Orchestra", "Gender": "mixed"}
df_final.loc[345, tho.keys()] = tuple(tho.values())

shabo = {"API_Artist":"Shaboozey", "Gender": "male", "Country": "United States"}
df_final.loc[351, shabo.keys()] = tuple(shabo.values())

loneb = {"API_Artist":"lonelyboy", "Gender": "male", "Country": "United States"}
df_final.loc[359, loneb.keys()] = tuple(loneb.values())

mowe = {"API_Artist":"MOUNT WESTMORE", "Gender": "male"}
df_final.loc[361, mowe.keys()] = tuple(mowe.values())

tdp = {"API_Artist":"Tha Dogg Pound", "Gender": "male"}
df_final.loc[370, tdp.keys()] = tuple(tdp.values())

dof7 = {"API_Artist":"7 Days Of Funk", "Gender": "male"}
df_final.loc[374, dof7.keys()] = tuple(dof7.values())

theau = {"API_Artist":"Theaudience", "Gender": "mixed", "Country": "United Kingdom"}
df_final.loc[376, theau.keys()] = tuple(theau.values())

tacol = {"API_Artist":"Tattoo Colour", "Gender": "male"}
df_final.loc[380, tacol.keys()] = tuple(tacol.values())

allure = {"API_Artist":"Allure", "Gender": "male"}
df_final.loc[382, allure.keys()] = tuple(allure.values())

#### Filling unknown gender and country columns

Here, the rows where both gender and country are unknown are filled in manually. This occurred 40 times. All information was double-checked while searching, so whenever there was doubt about whether the API had returned the right artist, this was checked and adjusted as well.

In [18]:
df_final[(df_final["Country"] == "Unknown") & (df_final["Gender"] == "Unknown")].head()

Unnamed: 0,Artist,API_Artist,Gender,Country
7,Above & Beyond,Above,Unknown,Unknown
23,Blank & Jones,blank blank,Unknown,Unknown
38,Dairon La Formule,Dairon,Unknown,Unknown
86,Prezioso,Prezioso,Unknown,Unknown
93,Shato,Shato,Unknown,Unknown


In [19]:
#Missing both gender and country in the initial API pull
a_and_b = {"API_Artist":"Above & Beyond", "Gender": "male", "Country": "United Kingdom & Finland"}
df_final.loc[7, a_and_b.keys()] = tuple(a_and_b.values())

b_and_j = {"API_Artist":"Blank & Jones", "Gender": "male", "Country": "Germany"}
df_final.loc[23, b_and_j.keys()] = tuple(b_and_j.values())

dlf = {"Artist":"Dairon La Formula", "API_Artist":"Dairon La Formula", "Gender": "male", "Country": "United States"}
df_final.loc[38, dlf.keys()] = tuple(dlf.values())

prezioso = {"API_Artist":"Prezioso", "Gender": "male", "Country": "Italy"}
df_final.loc[86, prezioso.keys()] = tuple(prezioso.values())

shato = {"API_Artist":"Shato", "Gender": "male", "Country": "Slovakia"}
df_final.loc[93, shato.keys()] = tuple(shato.values())

area21 = {"API_Artist":"AREA21", "Gender": "male", "Country": "Netherlands and United States"}
df_final.loc[108, area21.keys()] = tuple(area21.values())

lekoma = {"API_Artist":"Le Koma", "Gender": "male", "Country": "France"}
df_final.loc[120, lekoma.keys()] = tuple(lekoma.values())

james_dym = {"API_Artist":"James Dymond", "Gender": "male", "Country": "United Kingdom"}
df_final.loc[140, james_dym.keys()] = tuple(james_dym.values())

s_and_t = {"API_Artist":"Stoneface & Terminal", "Gender": "male", "Country": "Germany"}
df_final.loc[145, s_and_t.keys()] = tuple(s_and_t.values())

asg = {"API_Artist":"Agua Sin Gas", "Gender": "male", "Country": "France"}
df_final.loc[152, asg.keys()] = tuple(asg.values())

eedm = {"API_Artist":"Eje Ejecutantes de México", "Gender": "male", "Country": "Mexico"}
df_final.loc[154, eedm.keys()] = tuple(eedm.values())

tuccillo = {"API_Artist":"Tuccillo", "Gender": "male", "Country": "Italy"}
df_final.loc[162, tuccillo.keys()] = tuple(tuccillo.values())

thebiz = {"API_Artist":"The Biz", "Gender": "mixed", "Country": "Italy"}
df_final.loc[165, thebiz.keys()] = tuple(thebiz.values())

jap_jon = {"API_Artist":"Jaspa Jones", "Gender": "male", "Country": "Germany"}
df_final.loc[171, jap_jon.keys()] = tuple(jap_jon.values())

cerf = {"API_Artist":"Cerf", "Gender": "female", "Country": "United States"}
df_final.loc[188, cerf.keys()] = tuple(cerf.values())

da_tw = {"API_Artist":"Da Tweekaz", "Gender": "male", "Country": "Norway"}
df_final.loc[195, da_tw.keys()] = tuple(da_tw.values())

shh = {"API_Artist":"Sted-E & Hybrid Heights", "Gender": "male", "Country": "United States"}
df_final.loc[199, shh.keys()] = tuple(shh.values())

the_yabo = {"API_Artist":"The Yabo", "Gender": "male", "Country": "Cuba"}
df_final.loc[200, the_yabo.keys()] = tuple(the_yabo.values())

rav_bas = {"API_Artist":"Ravelli Brass", "Gender": "male", "Country": "Netherlands"}
df_final.loc[218, rav_bas.keys()] = tuple(rav_bas.values())

evq = {"API_Artist":"Eric Vloeimans Quartet", "Gender": "male", "Country": "Netherlands"}
df_final.loc[239, evq.keys()] = tuple(evq.values())

evf = {"API_Artist":"Eric Vloeimans' Fugimundi", "Gender": "male", "Country": "Netherlands"}
df_final.loc[244, evf.keys()] = tuple(evf.values())

arnold_do = {"API_Artist":"Arnold Dooyeweerd", "Gender": "male", "Country": "Netherlands"}
df_final.loc[248, arnold_do.keys()] = tuple(arnold_do.values())

r_plus = {"API_Artist":"R Plus", "Gender": "male", "Country": "United Kingdom"}
df_final.loc[252, r_plus.keys()] = tuple(r_plus.values())

am_fox = {"API_Artist":"Amelia Fox", "Gender": "female", "Country": "United States"}
df_final.loc[253, am_fox.keys()] = tuple(am_fox.values())

judah = {"API_Artist":"Judah", "Gender": "male", "Country": "United States"}
df_final.loc[256, judah.keys()] = tuple(judah.values())

ge_ris = {"API_Artist":"Gemini Rising", "Gender": "mixed", "Country": "United States"}
df_final.loc[264, ge_ris.keys()] = tuple(ge_ris.values())

ap3 = {"API_Artist":"AP3", "Gender": "male", "Country": "Canada"}
df_final.loc[266, ap3.keys()] = tuple(ap3.values())

priceless = {"API_Artist":"Priceless", "Gender": "male", "Country": "Netherlands"}
df_final.loc[268, priceless.keys()] = tuple(priceless.values())

phil_mor = {"API_Artist":"Philly Moré", "Gender": "male", "Country": "Netherlands"}
df_final.loc[273, phil_mor.keys()] = tuple(phil_mor.values())

figg_pan = {"API_Artist":"Figg Panamera", "Gender": "male", "Country": "United States"}
df_final.loc[304, figg_pan.keys()] = tuple(figg_pan.values())

ysl = {"API_Artist":"Young Stoner Life", "Gender": "male", "Country": "United States"}
df_final.loc[308, ysl.keys()] = tuple(ysl.values())

lob_boy = {"API_Artist":"Lobby Boyz", "Gender": "male", "Country": "United States"}
df_final.loc[320, lob_boy.keys()] = tuple(lob_boy.values())

m6 = {"API_Artist":"M6", "Gender": "male", "Country": "Netherlands"}
df_final.loc[323, m6.keys()] = tuple(m6.values())

mcrt = {"API_Artist":"MCR-T", "Gender": "male", "Country": "Germany"}
df_final.loc[331, mcrt.keys()] = tuple(mcrt.values())

fo_fi = {"API_Artist":"Four Fists", "Gender": "male", "Country": "United States"}
df_final.loc[334, fo_fi.keys()] = tuple(fo_fi.values())

disfu = {"API_Artist":"Disfunktion", "Gender": "male", "Country": "Netherlands"}
df_final.loc[339, disfu.keys()] = tuple(disfu.values())

cro_ct = {"API_Artist":"Crowd+Ctrl", "Gender": "male", "Country": "Australia"}
df_final.loc[346, cro_ct.keys()] = tuple(cro_ct.values())

kinoh = {"API_Artist":"Kinoh", "Gender": "male", "Country": "Netherlands"}
df_final.loc[360, kinoh.keys()] = tuple(kinoh.values())

oc_lo = {"API_Artist":"October London", "Gender": "male", "Country": "United States"}
df_final.loc[367, oc_lo.keys()] = tuple(oc_lo.values())

nwyr = {"API_Artist":"NWYR", "Gender": "male", "Country": "Netherlands"}
df_final.loc[383, nwyr.keys()] = tuple(nwyr.values())
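
The repeated dictionary-and-assign pattern in the cell above could also be collected into one mapping and applied in a loop. The following is only a minimal sketch on a toy stand-in for `df_final`; the helper name `apply_manual_fixes` and the two-row frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def apply_manual_fixes(df, fixes):
    """Overwrite selected columns for each row label in `fixes` (hypothetical helper)."""
    out = df.copy()  # leave the input frame untouched
    for row_label, values in fixes.items():
        for column, value in values.items():
            out.loc[row_label, column] = value
    return out

# Toy stand-in for df_final, using two row labels from the cell above
toy = pd.DataFrame(
    {
        "Artist": ["Kinoh", "NWYR"],
        "API_Artist": ["Unknown", "Unknown"],
        "Gender": ["Unknown", "Unknown"],
        "Country": ["Unknown", "Unknown"],
    },
    index=[360, 383],
)

# One entry per manually checked row: row label -> corrected values
manual_fixes = {
    360: {"API_Artist": "Kinoh", "Gender": "male", "Country": "Netherlands"},
    383: {"API_Artist": "NWYR", "Gender": "male", "Country": "Netherlands"},
}

fixed = apply_manual_fixes(toy, manual_fixes)
```

Keeping all corrections in a single mapping makes it easier to count and review them, at the cost of losing the short variable names used above.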

#### Replacing Country values

Here, rows in which the country is listed as a city or another kind of place are corrected to the country that place is in. If the gender or artist name was wrong as well, it is corrected too. This occurred 21 times.

In [20]:
#Example
df_final[df_final["Country"] == "Kingdom of the Netherlands"]

Unnamed: 0,Artist,API_Artist,Gender,Country
25,Brennan Heart,Brennan Heart,male,Kingdom of the Netherlands
126,Woody van Eyden,Woody van Eyden,male,Kingdom of the Netherlands


In [21]:
#Fixing the Country column where cities were added instead of the country

#Listed as Kingdom of the Netherlands
bren_h = {"API_Artist":"Brennan Heart", "Country": "Netherlands"}
df_final.loc[25, bren_h.keys()] = tuple(bren_h.values())

wve = {"API_Artist":"Woody van Eyden", "Country": "Netherlands"}
df_final.loc[126, wve.keys()] = tuple(wve.values())

#Listed as Scotland
calvin_h = {"API_Artist":"Calvin Harris", "Country": "United Kingdom"}
df_final.loc[28, calvin_h.keys()] = tuple(calvin_h.values())

lo_reg = {"API_Artist":"Love Regenerator", "Country": "United Kingdom"}
df_final.loc[183, lo_reg.keys()] = tuple(lo_reg.values())

#Listed as England
david_bow = {"API_Artist":"David Bowie", "Country": "United Kingdom"}
df_final.loc[39, david_bow.keys()] = tuple(david_bow.values())

#Listed as Minneapolis
pos = {"API_Artist":"P.O.S", "Country": "United States"}
df_final.loc[84, pos.keys()] = tuple(pos.values())

sims = {"API_Artist":"Sims", "Country": "United States"}
df_final.loc[336, sims.keys()] = tuple(sims.values())

#Listed as Newcastle upon Tyne
suemc = {"API_Artist":"Sue McLaren", "Country": "United Kingdom"}
df_final.loc[146, suemc.keys()] = tuple(suemc.values())

#Listed as Mt. Juliet
a_and_r = {"API_Artist":"Adrian&Raz", "Country": "United States"}
df_final.loc[150, a_and_r.keys()] = tuple(a_and_r.values())

#Listed as Reynosa
reyn = {"API_Artist":"Marco Antonio Denis", "Country": "Mexico"}
df_final.loc[157, reyn.keys()] = tuple(reyn.values())

#Listed as Berlin
ma_ro = {"API_Artist":"Martin Roth", "Country": "Germany"}
df_final.loc[175, ma_ro.keys()] = tuple(ma_ro.values())

#Listed as Mazatlan
emlal = {"API_Artist":"El Mimoso Luis Antonio López", "Country": "Mexico"}
df_final.loc[184, emlal.keys()] = tuple(emlal.values())

luis_a = {"API_Artist":'Luis Angel "El Flaco"', "Country": "Mexico"}
df_final.loc[185, luis_a.keys()] = tuple(luis_a.values())

#Listed as Amsterdam
juvi = {"API_Artist":"Julian Vincent", "Country": "Netherlands"}
df_final.loc[187, juvi.keys()] = tuple(juvi.values())

#Listed as Los Angeles
emma_hew = {"API_Artist":"Emma Hewitt", "Country": "United States"}
df_final.loc[197, emma_hew.keys()] = tuple(emma_hew.values())

jos_tra = {"API_Artist":"Joseph Trapanese", "Country": "United States"}
df_final.loc[328, jos_tra.keys()] = tuple(jos_tra.values())

#Listed as Toulouse
monty = {"API_Artist":"Monty", "Country": "France"}
df_final.loc[262, monty.keys()] = tuple(monty.values())

#Listed as Atlanta
wdak = {"Artist":"Wooh da Kid", "Country": "United States"}
df_final.loc[301, wdak.keys()] = tuple(wdak.values())

#Listed as Studio City
da_au = {"API_Artist":"Dave Audé", "Country": "United States"}
df_final.loc[348, da_au.keys()] = tuple(da_au.values())

#Listed as Oakland
too_s = {"API_Artist":"Too $hort", "Country": "United States"}
df_final.loc[364, too_s.keys()] = tuple(too_s.values())

#Listed as Frankfurt am Main
chris_l = {"API_Artist":"Chris Liebing", "Country": "Germany"}
df_final.loc[377, chris_l.keys()] = tuple(chris_l.values())
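
As an alternative to fixing each row by its index, place names can be mapped to countries by value. This is a sketch only, assuming a toy frame with the same `Country` column; the mapping `place_to_country` is hypothetical and covers just a few of the place names mentioned in the comments above.

```python
import pandas as pd

# A few place names from the comments above, mapped to their countries
place_to_country = {
    "Kingdom of the Netherlands": "Netherlands",
    "Amsterdam": "Netherlands",
    "Scotland": "United Kingdom",
    "England": "United Kingdom",
    "Newcastle upon Tyne": "United Kingdom",
    "Minneapolis": "United States",
    "Los Angeles": "United States",
    "Berlin": "Germany",
    "Toulouse": "France",
}

# Toy stand-in for df_final
toy = pd.DataFrame(
    {
        "Artist": ["Brennan Heart", "Calvin Harris", "Martin Roth", "David Guetta"],
        "Country": ["Kingdom of the Netherlands", "Scotland", "Berlin", "France"],
    }
)

# Values that are not keys of the mapping (here "France") are left unchanged
toy["Country"] = toy["Country"].replace(place_to_country)
```

A value-based mapping changes every row with that place name, so it only suits names that are unambiguous; the row-by-row approach above keeps each correction explicit.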

#### Checking the gender list

Here, it is checked that the data contain only three gender categories: male, female and mixed. There was one occurrence where the gender was listed as other.

In [22]:
#Gender listed as other
#Eva Shaw is referred to as "she" in several internet sources (Wikipedia, djanetop), so her gender is changed to female
eva_s = {"API_Artist":"Eva Shaw", "Gender": "female"}
df_final.loc[49, eva_s.keys()] = tuple(eva_s.values())
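
The category check described above is not shown in the notebook. One way it might be done is sketched below on a toy frame; the constant `ALLOWED_GENDERS` and the sample rows are assumptions for illustration.

```python
import pandas as pd

# The three categories the node list is expected to contain
ALLOWED_GENDERS = {"male", "female", "mixed"}

# Toy stand-in for df_final before the correction
toy = pd.DataFrame(
    {
        "Artist": ["Eva Shaw", "The Biz", "Tuccillo"],
        "Gender": ["other", "mixed", "male"],
    }
)

# Rows whose gender falls outside the expected categories
unexpected = toy[~toy["Gender"].isin(ALLOWED_GENDERS)]
```

Running the same filter after the correction should give an empty frame.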

#### Fixing misalignments between Spotify data Artists and API Artists

Some artists were retrieved incorrectly from the API, either because the artist did not exist on the API website or because another artist with a similar name was more popular and appeared higher in the search results. This was checked manually and occurred 7 times. One artist had a wrongly classified country, which is corrected as well.

In [23]:
df_final[(df_final["Artist"] == "Martine Thomas") | (df_final["Artist"] == "Rihanna")] #Example of a misalignment (Martine Thomas) and of a wrongly classified country (Rihanna)

Unnamed: 0,Artist,API_Artist,Gender,Country
92,Rihanna,Rihanna,female,United States
166,Martine Thomas,Mr Hilter,male,Germany


In [24]:
g_and_d = {"API_Artist":"Gabriel & Dresden", "Gender": "male", "Country": "United States"}
df_final.loc[59, g_and_d.keys()] = tuple(g_and_d.values())

s_and_s = {"API_Artist":"Simone & Simaria", "Gender": "female", "Country": "Brazil"}
df_final.loc[96, s_and_s.keys()] = tuple(s_and_s.values())

mart_tho = {"API_Artist":"Martine Thomas", "Gender": "female", "Country": "South Africa"}
df_final.loc[166, mart_tho.keys()] = tuple(mart_tho.values())

j_and_rw = {"API_Artist":"Julian & Roman Wasserfuhr", "Gender": "male", "Country": "Germany"}
df_final.loc[168, j_and_rw.keys()] = tuple(j_and_rw.values())

mayra = {"API_Artist":"Mayra", "Gender": "female", "Country": "Brazil"}
df_final.loc[181, mayra.keys()] = tuple(mayra.values())

stef_chri = {"API_Artist":"Steffanie Christi'an", "Gender": "female", "Country": "United States"}
df_final.loc[310, stef_chri.keys()] = tuple(stef_chri.values())

marvin = {"API_Artist":"Marvin", "Gender": "male", "Country": "Italy"}
df_final.loc[341, marvin.keys()] = tuple(marvin.values())

#Rihanna was classified as being from the United States, but she is from Barbados
rihanna = {"API_Artist":"Rihanna", "Country": "Barbados"}
df_final.loc[92, rihanna.keys()] = tuple(rihanna.values())
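
The misalignments were found manually, but candidate rows could be shortlisted by comparing the `Artist` and `API_Artist` columns. The sketch below assumes a toy frame; the helper `normalise` is hypothetical, and the `AFROJACK`/`Afrojack` row is made up to show that pure case differences are ignored.

```python
import pandas as pd

# Toy stand-in for df_final; the third row is invented for illustration
toy = pd.DataFrame(
    {
        "Artist": ["Rihanna", "Martine Thomas", "AFROJACK"],
        "API_Artist": ["Rihanna", "Mr Hilter", "Afrojack"],
    }
)

def normalise(names):
    """Lower-case and strip whitespace so that pure case differences are ignored."""
    return names.str.strip().str.lower()

# Rows where the Spotify name and the API name still differ after normalising
mismatch = normalise(toy["Artist"]) != normalise(toy["API_Artist"])
candidates = toy[mismatch]
```

Such a comparison only produces a list to review: it would not catch a wrong artist with an identical name, and it would not flag a wrong country such as the Rihanna row.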

### Saving the Nodelist and Edge list to CSV files

Setting up the node list needed to build the network in RStudio.

In [25]:
df_final["Popularity"] = df_nodes_no_dup["Popularity"]
df_final = df_final[["Artist", "Gender", "Country", "Popularity"]]
df_final.head()

Unnamed: 0,Artist,Gender,Country,Popularity
0,Martin Garrix,male,Netherlands,74
1,David Guetta,male,France,85
2,Dimitri Vegas & Like Mike,male,Belgium,67
3,24hrs,male,United States,49
4,2WEI,male,Germany,60


In [26]:
df_final.to_csv("Nodelist_Final3.csv", index = False)

#### Finalizing the Edge list

In [27]:
#Using the previously defined dataframe

#Removing edges whose nodes were already removed from the node list but not yet from the edge list
df_10_collab_no_loop_adjusted = df_10_collab_no_loop_adjusted[df_10_collab_no_loop_adjusted["Alter1"] != "Aly & Fila FSOE Radio"] #Aly & Fila are already in the data
df_10_collab_no_loop_adjusted = df_10_collab_no_loop_adjusted[df_10_collab_no_loop_adjusted["Alter1"] != "Future Sound of Egypt"] #This is a song

df_10_collab_no_loop_adjusted = df_10_collab_no_loop_adjusted.reset_index(drop = True)
df_10_collab_no_loop_adjusted = df_10_collab_no_loop_adjusted[["Ego", "Alter1"]]

#Adjusting names such that they align with the names in the node list
#The result is assigned back to the column, because replace with inplace = True on a column selection triggers warnings in newer pandas versions
df_10_collab_no_loop_adjusted['Alter1'] = df_10_collab_no_loop_adjusted['Alter1'].replace({"John 'Hutch' Hutchinson": "John Hutchinson", 
                                                "SlickLaFlare": "Slick LaFlare", "Cradle": "Cradle Orchestra", 
                                                "UDAUFL": "U.D.A.U.F.L.", "KYTECRASH": "Kytecrash", 
                                                "Wooh The Kid": "Wooh da Kid", "Oceanlab": "OceanLab", 
                                                "Dairon La Formule": "Dairon La Formula"})

df_10_collab_no_loop_adjusted['Ego'] = df_10_collab_no_loop_adjusted['Ego'].replace({"Dairon La Formule": "Dairon La Formula"})

In [28]:
df_10_collab_no_loop_adjusted.to_csv("Edgelist_Final3.csv", index = False)
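
Because the network is built from both files, it may be worth confirming that every name in the edge list also occurs in the node list. A minimal sketch on toy data follows; the helper `missing_from_nodes` and the small frames are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the node list and the edge list
nodes = pd.DataFrame({"Artist": ["Martin Garrix", "Wooh da Kid", "OceanLab"]})
edges = pd.DataFrame(
    {
        "Ego": ["Martin Garrix", "Martin Garrix"],
        "Alter1": ["Wooh The Kid", "Oceanlab"],
    }
)

def missing_from_nodes(edges, nodes):
    """Return the edge-list names that do not occur in the node list."""
    edge_names = set(edges["Ego"]) | set(edges["Alter1"])
    return sorted(edge_names - set(nodes["Artist"]))

# Before renaming, two names do not match the node list
before = missing_from_nodes(edges, nodes)

# After aligning the spelling, no names should be left over
edges["Alter1"] = edges["Alter1"].replace(
    {"Wooh The Kid": "Wooh da Kid", "Oceanlab": "OceanLab"}
)
after = missing_from_nodes(edges, nodes)
```

An empty result means every edge endpoint has a matching row in the node list.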