<h1><center> Billboard's Latin-streaming Songs</center></h1>
<h2><center> Data Preparation</center></h2>
<h3><center> For Network Analysis</center></h3>
<h3><center> 2013 - 2020 </center></h3>

In this section I will use both csv files: 'weekly_charts' and 'year_end_charts' to create new variables in order to proceed to the network creation

In [1]:
#First we import the modules we will be using 
import pandas as pd
import numpy as np
import os

In [26]:
#First, we load bith dataframes: 'year_end_charts' and 'weekly_charts' from github.
weekly = pd.read_csv(r'https://raw.githubusercontent.com/Franciscojmara/Latin-Artists-Network/main/Data/weekly_charts.csv')
year_end = pd.read_csv(r'https://raw.githubusercontent.com/Franciscojmara/Latin-Artists-Network/main/Data/year_end_charts.csv')

First we create a dummy variable which will take the value of 1 if the song in the weekly chart of year $x$ made into the year-end chart in year $x$

In [27]:
#Create the new variable 'hot' if the song in weekly is in year_end
weekly['Hot'] = weekly.Title.isin(year_end['Title'])
weekly['Hot'] = weekly['Hot'].astype(int)
weekly

Unnamed: 0,Year,Month,Day,Rank,Title,Principal,Collaborator,Peak_position,Weeks_in_chart,Hot
0,2013,4,20,1,Hips Don't Lie,Shakira,Wyclef Jean,1,1,1
1,2013,4,20,2,Waka Waka (Esto Es Africa),Shakira,Freshlyground,2,1,1
2,2013,4,20,3,Danza Kuduro,Don Omar,Lucenzo,3,1,1
3,2013,4,20,4,Ai Se Eu Te Pego,Michel Telo,,4,1,1
4,2013,4,20,5,Loca,Shakira,El Cata,5,1,1
...,...,...,...,...,...,...,...,...,...,...
13485,2020,12,26,22,La Toxica,Farruko,,6,8,0
13486,2020,12,26,23,Si Veo A Tu Mama,Bad Bunny,,1,40,1
13487,2020,12,26,24,Maldita Pobreza,Bad Bunny,,12,3,0
13488,2020,12,26,25,Safaera,Bad Bunny,Jowell-Randy,2,40,1


Now, for each song we create a weight. The weight will consider the peak position, rank, weeks in chart, and if the song entered the year-end chart. This new variable takes the form:

$$ Weight = \frac{\frac{\sum_{i=1}^{n} Ranking_{i}}{Weeks\, in\, Chart}(Peak\,Position)^{-1} + Hot}{Total\,Songs}$$

Which can be rewriten in the form of:

$$ Weight = \frac{Average\, Rank}{(Total\, Songs)(Peak\, Position)} + \frac{Hot}{Total\, Songs} $$

where $Average\, Rank = \frac{\sum_{i=1}^{n} Ranking_{i}}{Weeks\, in\, Chart}$

However, because 'Hot' can take either 1 or 0, thus the weight is:

$$ Weight = \begin{cases}
      \frac{Average\, Rank}{(Total\, Songs)(Peak\, Position)} + \frac{1}{Total\, songs} & \text{if $Hot = 1$} \\
      \frac{Average\, Rank}{(Total\, Songs)(Peak\, Position)} & \text{if $Hot = 0$} \\
\end{cases} $$


After defining the weight theoretically, we now implement it to the 'weekly' df.

In [28]:
#To create the weights we will first define a function.

def weights(df):
    w = df.groupby('Title')
    #Average Rank
    av1 = w['Rank'].sum()
    av2 = w['Weeks_in_chart'].max()
    avt = av1 / av2
    #Peak Position
    pp = w['Rank'].min()
    #Total songs
    ts = w['Rank'].min() #Función 'min()' es arbitraria, podría ser 'mean()', 'sum()', etc. La cosa es obtener una serie.
    ts = len(ts.index)
    #Hot
    h = []
    if w['Hot'] == 1:
        h = 1
    else:
        h = 0
    #Weights calculation
    w = (avt / (ts * pp)) + (h / ts)
    return w

In [29]:
#Ahora usamos la función recién creada para calcular los pesos y agregarlos al dataframe

weights_to_weekly = weights(weekly)
weights_to_weekly = round(weights_to_weekly, 6)

print('Weight number 1:', weights_to_weekly[0])
print('Length:', len(weights_to_weekly))

Weight number 1: 0.003137
Length: 420


In [30]:
#Como la longitud de los pesos es de 420 (número de canciones totales), necesitamos repetir cada peso tantas veces 
#se repita la canción en el 'weekly' df.

title = weekly['Title']
title = title.value_counts()
weights_to_weekly = list(weights_to_weekly.repeat(title)) #Lo repite, pero sin el orden del dataframe
title = list(weekly['Title'])

print('Type weights_to_weekly: ', type(weights_to_weekly))
print('weights_to_weekly length:', len(weights_to_weekly))
print('Weekly length:', len(weekly))

Type weights_to_weekly:  <class 'list'>
weights_to_weekly length: 13490
Weekly length: 13490


In [31]:
#Ahora debemos agregar 'weights_to_weekly' al df 'weekly', pero para hacerlo primero ordenamos el df por 'Title'.
weekly = weekly.sort_values(by = ['Title'])
weekly

Unnamed: 0,Year,Month,Day,Rank,Title,Principal,Collaborator,Peak_position,Weeks_in_chart,Hot
10107,2019,6,22,21,11 PM,Maluma,,17,4,1
10198,2019,7,6,24,11 PM,Maluma,,17,6,1
10280,2019,7,20,20,11 PM,Maluma,,17,7,1
10065,2019,6,15,21,11 PM,Maluma,,17,3,1
10023,2019,6,8,21,11 PM,Maluma,,17,2,1
...,...,...,...,...,...,...,...,...,...,...
10784,2019,10,5,18,"Yo X Ti, Tu X Mi",Rosalia,Ozuna,12,6,1
10826,2019,10,12,17,"Yo X Ti, Tu X Mi",Rosalia,Ozuna,12,7,1
23,2013,4,20,22,You,Romeo Santos,,22,1,0
50,2013,4,27,22,You,Romeo Santos,,22,2,0


In [32]:
#Una vez ordenado por 'Title', agregamos los pesos.
weekly['Weight'] = weights_to_weekly

#Reordenamos por año, mes, día y ranking.
weekly = weekly.sort_values(by = ['Year', 'Month', 'Day', 'Rank'])

weekly

Unnamed: 0,Year,Month,Day,Rank,Title,Principal,Collaborator,Peak_position,Weeks_in_chart,Hot,Weight
0,2013,4,20,1,Hips Don't Lie,Shakira,Wyclef Jean,1,1,1,0.008508
1,2013,4,20,2,Waka Waka (Esto Es Africa),Shakira,Freshlyground,2,1,1,0.005556
2,2013,4,20,3,Danza Kuduro,Don Omar,Lucenzo,3,1,1,0.002381
3,2013,4,20,4,Ai Se Eu Te Pego,Michel Telo,,4,1,1,0.002698
4,2013,4,20,5,Loca,Shakira,El Cata,5,1,1,0.024082
...,...,...,...,...,...,...,...,...,...,...,...
13485,2020,12,26,22,La Toxica,Farruko,,6,8,0,0.002381
13486,2020,12,26,23,Si Veo A Tu Mama,Bad Bunny,,1,40,1,0.014857
13487,2020,12,26,24,Maldita Pobreza,Bad Bunny,,12,3,0,0.003061
13489,2020,12,26,25,Safaera,Bad Bunny,Nengo Flow,2,40,1,0.004348


Now, to increase the information of the network, we will add the genre of the Principal and Collaborators

In [33]:
weekly['Principal'].unique()

array(['Shakira', 'Don Omar', 'Michel Telo', 'Romeo Santos', 'Belinda',
       'Wisin-Yandel', 'Roberto Junior y Su Bandeno', 'Prince Royce',
       'Daddy Yankee', 'Rey Sanchez', 'Arcangel', 'Pitbull',
       'Voz de Mando', 'Gerardo Ortiz', 'Selena', 'Malu', 'Maria Jose',
       'J Alvarez', 'Zion', 'Marc Anthony', 'Denise de Kalafe', 'Pesado',
       'Aventura', 'Loona', 'Luis Coronel', 'Reik',
       'La Arrolladora Banda el Limon de Rene Camacho',
       'Banda Sinaloense MS de Sergio Lizarraga', 'Alejandro Fernandez ',
       'Xtreme', 'Santana', 'Vicente Fernandez', 'Enrique Iglesias',
       'Los Primos MX', 'Tacabro', 'Banda Los Recoditos',
       'Banda El Recodo de Cruz Lizarraga', 'Jose Feliciano',
       'Coral Voces Blancas', 'Michael Buble', 'Dora The Explorer',
       'Tony Camargo', 'Raulin Rodriguez', 'Chiquis', 'Wisin',
       'Julion Alvarez y Su Norteno Banda', 'J Balvin', 'Descemer Bueno',
       'Ricky Martin', 'Becky G', 'Tito Torbellino',
       'Banda Tierra S

In [34]:
#Artistas principales
regueton = ['Don Omar', 'Wisin-Yandel', 'Daddy Yankee', 'Arcangel', 'J Alvarez', 'Zion', 'Tacabro', 'Wisin',
            'J Balvin', 'Nicky Jam', 'Plan B', 'Kings Of Regueton', 'Maluma', 'Chino', 'Farruko', 'Joey Montana',
            'Zion-Lennox', 'Ozuna', 'Mambo Kingz', 'Chris Jeday', 'Danny Ocean', 'Karol G', 'Bad Bunny', 
            'Alex Sensation', 'Nacho', 'Natti Natasha', 'Malu Trevejo', 'DJ Luian', 'Anitta', 'Almighty', 'DJ Kass',
            'Casper Magico', 'Cosculluela', 'Wolfine', 'Anuel AA', 'Alex Rose', 'Myke Towers', 'Rosalia', 'Sean Paul', 
            'Sech', 'Lunay', 'Jhay Cortez', 'Sebastian Yatra', 'Tainy', 'Rauw Alejandro', 'Bryant Myers', 'Rvssian',
            'Nio Garcia', 'Feid', 'DJ Nelson', 'Jerry Di', 'Manuel Turizo']

pop_esp = ['Shakira', 'Michel Telo', 'Belinda', 'Malu', 'Maria Jose', 'Denise de Kalafe', 'Enrique Iglesias', 
           'Ricky Martin','Sandra Echeverria', 'Carmen Sarahi', 'Carlos Vives', 'Thalia', 
           'Bomba Estereo', 'Lin-Manuel Miranda', 'Jose Feliciano', 'CNCO', 'Luis Fonsi', 'Lele Pons', 'Pedro Capo']

pop_en = ['Black Eyed Peas', 'Pitbull', 'Jennifer Lopez', 'JLo']

regional_mexicana = ['Roberto Junior y Su Bandeno', 'Rey Sanchez', 'Voz de Mando', 'Gerardo Ortiz', 'Pesado',
                    'La Arrolladora Banda el Limon de Rene Camacho', 'Banda Sinaloense MS de Sergio Lizarraga', 
                     'Alejandro Fernandez ', 'Vicente Fernandez', 'Los Primos MX', 'Banda Los Recoditos', 
                     'Banda El Recodo de Cruz Lizarraga', 'Chiquis', 'Julion Alvarez y Su Norteno Banda',
                    'Banda Tierra Sagrada', 'El Komander', 'Ariel Camacho y Los Plebes del Rancho', 'Joan Sebastian',
                    'La Adictiva Banda San Jose de Mesillas', 'Los Plebes del Rancho de Ariel Camacho', 'Juan Gabriel',
                    'Ulices Chaidez y Sus Plebes', 'Christian Nodal', 'Banda La Misma Tierra', 'Los Tucanes de Tijuana',
                    'Fuerza Regida', 'Banda Los Sebastianes de Mazatlan Sinaloa', 'Eslabon Armado', 'Natanael Cano',
                    'Banda MS de Sergio Lizarraga', 'Los Dos Carnales', 'El Fantasma', 'Luis Coronel', 'Tito Torbellino',
                    'T3r Elemento', 'Tony Camargo']

cumbia_bachata = ['Romeo Santos', 'Prince Royce', 'Aventura', 'Marc Anthony', 'Marco Antonio Solis', 'Los Bukis', 
                  'Raymix', 'Los Angeles Azules', 'Selena', 'Descemer Bueno', 'Trio Vegabajeno', 'Raulin Rodriguez']

rap_hiphop_rb = ['Tory Lanez', 'Frank Ocean', 'Lil Pump', '6ix9ine']

electronica = ['DJ Snake', 'Major Lazer']

rock = ['Santana']

kpop = ['Loona']

otros = ['Xtreme', 'Coral Voces Blancas', 'Dora The Explorer', 'Pulcino Pio']

In [35]:
weekly['Collaborator'].unique()

array(['Wyclef Jean', 'Freshlyground', 'Lucenzo', nan, 'El Cata', 'Usher',
       'T-Pain', 'Chris Brown', 'Pitbull', 'Raulin Rodriguez',
       'Luis Varges', 'Anthony "El Mayimbe" Santos', 'Ken-Y', 'Jory',
       'J. Alvarez', 'Natalia Jimenez', 'Christina Aguilera',
       'Romeo Santos', 'Thalia', 'Marco Antonio Solis', 'Drake',
       'Ricky Martin', 'Jennifer Lopez', 'Farruko', 'Carlos Santana',
       'Gente de Zona', 'Marc Anthony', 'Descemer Bueno', 'EP', 'III',
       'Don Miguelo', 'Tomatito', 'Enrique Iglesias', 'Los Bukis',
       'Sensato', 'Lil Jon', 'Osmani Garcia', 'Will Smith', 'Wisin',
       'Nacho', 'Daddy Yankee', 'Yandel', 'Mohombi', 'Akon', 'Shakira',
       'Ky-Mani Marley', 'Juanes', 'J Balvin', 'Pharrell Williams', 'Sky',
       'BIA', 'Yotuel', 'Maluma', 'Juhn', 'Bryant Myers', 'Noriel',
       'Don Omar', 'Fifth Harmony', 'DJ Luian', 'Arcangel', 'Bad Bunny',
       'Justin Bieber', 'Zion-Lennox', 'Sean Paul', 'Camila Cabello',
       'Anuel', 'Cardi B', 'Of

In [36]:
#Artistas colaboradores
regueton2 = ['Lucenzo', 'El Cata', 'Jory', 'J. Alvarez', 'Farruko', 'Gente de Zona', 'Don Miguelo', 'Osmani Garcia',
            'Wisin', 'Nacho', 'Daddy Yankee', 'Yandel', 'J Balvin', 'Sky', 'Maluma', 'Bryant Myers', 'Noriel', 
            'Don Omar', 'DJ Luian', 'Arcangel', 'Bad Bunny', 'Zion-Lennox', 'Sean Paul', 'Anuel', 'Cardi B', 'Ozuna',
            'Rvssian', 'Nicky Jam', 'Jowell-Randy', 'Mambo Kingz,',  'Nego do Borel', 'Anitta', 'Jeon', 'Brytiago',
            'El Chombo', 'Karol G', 'Natti Natasha', 'Darell', 'Manuel Turizo', 'Wisin-Yandel', 'Rauw Alejandro', 
            'Becky G', 'Mambo Kingz', 'Lunay', 'Lyanno', 'El Alfa', 'El Guincho', 'Tainy', 'Sech', 'Nengo Flow',
            'Mora', 'Kendo Kaponi', 'Myke Towers', 'Tego Calderon', 'Juanka', 'Rosalia', 'Justin Quiles', 'Camilo',
            'Jay Wheeler', 'Cosculluela', 'Jhay Cortez']

pop_esp2 = ['Natalia Jimenez', 'Thalia', 'Ricky Martin', 'Enrique Iglesias', 'Shakira', 'Juanes', 'Arthur Hanlon',
           'Natalia LaFourcade']

pop_en2 = ['Usher', 'Pitbull', 'Christina Aguilera','Jennifer Lopez', 'Pharrell Williams', 'Fifth Harmony', 'Justin Bieber',
          'Camila Cabello', 'Beyonce', 'Demi Lovato', 'Selena Gomez', 'Jonas Brothers','Dua Lipa', 'The Weeknd', 'Sia']

regional_mexicana2 = ['Gerardo Ortiz', 'Los Dos Carnales', 'Angela Aguilar']

cumbia_bachata2 = ['Raulin Rodriguez', 'Anthony "El Mayimbe" Santos', 'Romeo Santos', 'Marco Antonio Solis', 'Marc Anthony',
                  'Descemer Bueno', 'Los Bukis', 'Prince Royce']

rap_hiphop_rb2 = ['Wyclef Jean', 'Freshlyground', 'T-Pain', 'Chris Brown', 'Drake', 'Sensato', 'Lil Jon', 'Will Smith',
                 'Akon', 'Yotuel', 'Cardi B', 'Offset', '21 Savage', 'Nicki Minaj', 'Haze', 'Snow', 'Tyga', 
                 'Pablo Chil-e', 'Duki', 'Juice WRLD', 'Snoop Dogg', 'Travis Scott', 'Brray', 'J.Rey Soul',
                 'Doja Cat', 'ABRA', 'Mohombi', 'Yaviah']

electronica2 = ['Willy William', 'Diplo']

rock2 = ['Carlos Santana', 'Marciano Cantero']

reggae = ['Ky-Mani Marley', 'Cutty Ranks']

otros2 = ['EP', 'III', 'Tomatito', 'BIA', 'Artists For Puerto Rico']

In [37]:
#Agregamos el género musical del artista principal y de los colaboradores.

##Artista principal
genres = [pop_esp, pop_en, regional_mexicana, cumbia_bachata, rap_hiphop_rb, electronica, rock, kpop, otros, regueton]

#Creamos una variable por cada género, reemplazando los valores booleanos a strings del género.
weekly['pop_esp'] = weekly.Principal.isin(genres[0])
d = {True: 'Spanish Pop', False: ''}
weekly['pop_esp'] = weekly['pop_esp'].map(d)

weekly['pop_en'] = weekly.Principal.isin(genres[1])
d = {True: 'English pop', False: ''}
weekly['pop_en'] = weekly['pop_en'].map(d)

weekly['regional_mexicana'] = weekly.Principal.isin(genres[2])
d = {True: 'Regional Mexicana', False: ''}
weekly['regional_mexicana'] = weekly['regional_mexicana'].map(d)

weekly['cumbia_bachata'] = weekly.Principal.isin(genres[3])
d = {True: 'Cumbia/Bachata', False: ''}
weekly['cumbia_bachata'] = weekly['cumbia_bachata'].map(d)

weekly['rap_hiphop_rb'] = weekly.Principal.isin(genres[4])
d = {True: 'Rap/Hip-Hop/R&B', False: ''}
weekly['rap_hiphop_rb'] = weekly['rap_hiphop_rb'].map(d)

weekly['electronica'] = weekly.Principal.isin(genres[5])
d = {True: 'EDM', False: ''}
weekly['electronica'] = weekly['electronica'].map(d)

weekly['rock'] = weekly.Principal.isin(genres[6])
d = {True: 'Rock', False: ''}
weekly['rock'] = weekly['rock'].map(d)

weekly['kpop'] = weekly.Principal.isin(genres[7])
d = {True: 'K-pop', False: ''}
weekly['kpop'] = weekly['kpop'].map(d)

weekly['otros'] = weekly.Principal.isin(genres[8])
d = {True: 'Other', False: ''}
weekly['otros'] = weekly['otros'].map(d)

weekly['regueton'] = weekly.Principal.isin(genres[9])
d = {True: 'Reggaeton', False: ''}
weekly['regueton'] = weekly['regueton'].map(d)

#Concatenamos las variables recien creadas en una sola
weekly['Gen_Principal'] = (weekly['pop_esp'] + weekly['pop_en'] + weekly['regional_mexicana'] + weekly['cumbia_bachata'] +
    weekly['rap_hiphop_rb'] + weekly['electronica'] + weekly['rock'] + weekly['kpop'] + weekly['otros'] + weekly['regueton'])

#Eliminamos las variables de género musical
weekly = weekly.drop(['pop_esp', 'pop_en', 'regional_mexicana', 'cumbia_bachata', 'rap_hiphop_rb','electronica', 
                      'rock', 'kpop', 'otros', 'regueton'], axis = 1)

weekly

Unnamed: 0,Year,Month,Day,Rank,Title,Principal,Collaborator,Peak_position,Weeks_in_chart,Hot,Weight,Gen_Principal
0,2013,4,20,1,Hips Don't Lie,Shakira,Wyclef Jean,1,1,1,0.008508,Spanish Pop
1,2013,4,20,2,Waka Waka (Esto Es Africa),Shakira,Freshlyground,2,1,1,0.005556,Spanish Pop
2,2013,4,20,3,Danza Kuduro,Don Omar,Lucenzo,3,1,1,0.002381,Reggaeton
3,2013,4,20,4,Ai Se Eu Te Pego,Michel Telo,,4,1,1,0.002698,Spanish Pop
4,2013,4,20,5,Loca,Shakira,El Cata,5,1,1,0.024082,Spanish Pop
...,...,...,...,...,...,...,...,...,...,...,...,...
13485,2020,12,26,22,La Toxica,Farruko,,6,8,0,0.002381,Reggaeton
13486,2020,12,26,23,Si Veo A Tu Mama,Bad Bunny,,1,40,1,0.014857,Reggaeton
13487,2020,12,26,24,Maldita Pobreza,Bad Bunny,,12,3,0,0.003061,Reggaeton
13489,2020,12,26,25,Safaera,Bad Bunny,Nengo Flow,2,40,1,0.004348,Reggaeton


In [38]:
##Artista colaborador
genres = [pop_esp2, pop_en2, regional_mexicana2, cumbia_bachata2, rap_hiphop_rb2, electronica2, rock2, reggae, otros2,
         regueton2]

#Creamos una variable por cada género, reemplazando los valores booleanos a strings del género.
weekly['pop_esp2'] = weekly.Collaborator.isin(genres[0])
d = {True: 'Spanish Pop', False: np.nan}
weekly['pop_esp2'] = weekly['pop_esp2'].map(d)

weekly['pop_en2'] = weekly.Collaborator.isin(genres[1])
d = {True: 'English Pop', False: np.nan}
weekly['pop_en2'] = weekly['pop_en2'].map(d)

weekly['regional_mexicana2'] = weekly.Collaborator.isin(genres[2])
d = {True: 'Regional Mexicana', False: np.nan}
weekly['regional_mexicana2'] = weekly['regional_mexicana2'].map(d)

weekly['cumbia_bachata2'] = weekly.Collaborator.isin(genres[3])
d = {True: 'Cumbia/Bachata', False: np.nan}
weekly['cumbia_bachata2'] = weekly['cumbia_bachata2'].map(d)

weekly['rap_hiphop_rb2'] = weekly.Collaborator.isin(genres[4])
d = {True: 'Rap/Hip-Hop/R&B', False: np.nan}
weekly['rap_hiphop_rb2'] = weekly['rap_hiphop_rb2'].map(d)

weekly['electronica2'] = weekly.Collaborator.isin(genres[5])
d = {True: 'EDM', False: np.nan}
weekly['electronica2'] = weekly['electronica2'].map(d)

weekly['rock2'] = weekly.Collaborator.isin(genres[6])
d = {True: 'Rock', False: np.nan}
weekly['rock2'] = weekly['rock2'].map(d)

weekly['reggae'] = weekly.Collaborator.isin(genres[7])
d = {True: 'Reggae', False: np.nan}
weekly['reggae'] = weekly['reggae'].map(d)

weekly['otros2'] = weekly.Collaborator.isin(genres[8])
d = {True: 'Other', False: np.nan}
weekly['otros2'] = weekly['otros2'].map(d)

weekly['regueton2'] = weekly.Collaborator.isin(genres[9])
d = {True: 'Reggaeton', False: np.nan}
weekly['regueton2'] = weekly['regueton2'].map(d)

#Concatenamos las variables recien creadas en una sola
weekly['Gen_Collaborator'] = (weekly['pop_esp2'] + weekly['pop_en2'] + weekly['regional_mexicana2'] + 
                              weekly['cumbia_bachata2']+ weekly['rap_hiphop_rb2'] + weekly['electronica2'] + 
                              weekly['rock2'] + weekly['reggae'] + weekly['otros2'] + weekly['regueton2'])

#Eliminamos las variables de género musical
weekly = weekly.drop(['pop_esp2', 'pop_en2', 'regional_mexicana2', 'cumbia_bachata2', 'rap_hiphop_rb2','electronica2', 
                      'rock2', 'reggae', 'otros2', 'regueton2'], axis = 1)

weekly

Unnamed: 0,Year,Month,Day,Rank,Title,Principal,Collaborator,Peak_position,Weeks_in_chart,Hot,Weight,Gen_Principal,Gen_Collaborator
0,2013,4,20,1,Hips Don't Lie,Shakira,Wyclef Jean,1,1,1,0.008508,Spanish Pop,
1,2013,4,20,2,Waka Waka (Esto Es Africa),Shakira,Freshlyground,2,1,1,0.005556,Spanish Pop,
2,2013,4,20,3,Danza Kuduro,Don Omar,Lucenzo,3,1,1,0.002381,Reggaeton,
3,2013,4,20,4,Ai Se Eu Te Pego,Michel Telo,,4,1,1,0.002698,Spanish Pop,
4,2013,4,20,5,Loca,Shakira,El Cata,5,1,1,0.024082,Spanish Pop,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
13485,2020,12,26,22,La Toxica,Farruko,,6,8,0,0.002381,Reggaeton,
13486,2020,12,26,23,Si Veo A Tu Mama,Bad Bunny,,1,40,1,0.014857,Reggaeton,
13487,2020,12,26,24,Maldita Pobreza,Bad Bunny,,12,3,0,0.003061,Reggaeton,
13489,2020,12,26,25,Safaera,Bad Bunny,Nengo Flow,2,40,1,0.004348,Reggaeton,


In [39]:
#Df con los artistas principales.
art_principal = weekly['Principal'].unique()
art_principal = pd.DataFrame(art_principal, columns = ['Artista'])

#Df con los colaboradores
art_colaborador = weekly['Collaborator'].unique()
art_colaborador = pd.DataFrame(art_colaborador, columns = ['Artista'])

#Como aún no soy tan pro en Python, exportaré los dos df a csv y pondré el sexo a mano
path_data = r'C:\Users\Francisco Martínez\Desktop\Economía de redes\Trabajo final\Data'
os.chdir(path_data)

art_principal.to_csv('sexo_principales.csv', index = False)
art_colaborador.to_csv('sexo_colaboradores.csv', index = False)

In [40]:
#Cargamos las bases que contienen el sexo de los artistas
principal = pd.read_csv(r'https://raw.githubusercontent.com/Franciscojmara/Latin-Artists-Network/main/Data/sexo_principales.csv')
colab = pd.read_csv(r'https://raw.githubusercontent.com/Franciscojmara/Latin-Artists-Network/main/Data/sexo_colaboradores.csv')

In [41]:
#Variable Sexo a int en vez de float
principal.Sexo = principal.Sexo.astype(int)
colab.Sexo = colab.Sexo.astype(int)

In [42]:
#Agregamos el sexo al dataframe principal
weekly = weekly.merge(principal, how = 'left', left_on='Principal', right_on='Artista')
weekly = weekly.rename(columns = {'Sexo': 'Principal_Sex'}, inplace = False)
weekly = weekly.merge(colab, how = 'left', left_on='Collaborator', right_on='Artista')
weekly = weekly.rename(columns = {'Sexo': 'Collaborator_Sex'}, inplace = False)
weekly = weekly.drop(['Artista_x', 'Artista_y'], axis=1)

In [20]:
weekly

Unnamed: 0,Year,Month,Day,Rank,Title,Principal,Collaborator,Peak_position,Weeks_in_chart,Hot,Weight,Gen_Principal,Gen_Collaborator,Principal_Sex,Collaborator_Sex
0,2013,4,20,1,Hips Don't Lie,Shakira,Wyclef Jean,1,1,1,0.008508,Spanish Pop,Rap/Hip-Hop/R&B,0,1.0
1,2013,4,20,2,Waka Waka (Esto Es Africa),Shakira,Freshlyground,2,1,1,0.005556,Spanish Pop,Rap/Hip-Hop/R&B,0,1.0
2,2013,4,20,3,Danza Kuduro,Don Omar,Lucenzo,3,1,1,0.002381,Reggaeton,Reggaeton,1,1.0
3,2013,4,20,4,Ai Se Eu Te Pego,Michel Telo,,4,1,1,0.002698,Spanish Pop,,1,
4,2013,4,20,5,Loca,Shakira,El Cata,5,1,1,0.024082,Spanish Pop,Reggaeton,0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13485,2020,12,26,22,La Toxica,Farruko,,6,8,0,0.002381,Reggaeton,,1,
13486,2020,12,26,23,Si Veo A Tu Mama,Bad Bunny,,1,40,1,0.014857,Reggaeton,,1,
13487,2020,12,26,24,Maldita Pobreza,Bad Bunny,,12,3,0,0.003061,Reggaeton,,1,
13488,2020,12,26,25,Safaera,Bad Bunny,Nengo Flow,2,40,1,0.004348,Reggaeton,Reggaeton,1,1.0


In [43]:
#Because there are songs with no collaborators, we replace the NaN values with the name of the principal singer 
weekly.Collaborator.fillna(weekly.Principal, inplace=True)
#Because there are songs with no collaborators, we replace the NaN values with the name of the gender of the principal
weekly.Gen_Collaborator.fillna(weekly.Gen_Principal, inplace=True)
#The same for the sex of the collaborator
weekly.Collaborator_Sex.fillna(weekly.Principal_Sex, inplace=True)

In [44]:
weekly

Unnamed: 0,Year,Month,Day,Rank,Title,Principal,Collaborator,Peak_position,Weeks_in_chart,Hot,Weight,Gen_Principal,Gen_Collaborator,Principal_Sex,Collaborator_Sex
0,2013,4,20,1,Hips Don't Lie,Shakira,Wyclef Jean,1,1,1,0.008508,Spanish Pop,Spanish Pop,0,1.0
1,2013,4,20,2,Waka Waka (Esto Es Africa),Shakira,Freshlyground,2,1,1,0.005556,Spanish Pop,Spanish Pop,0,1.0
2,2013,4,20,3,Danza Kuduro,Don Omar,Lucenzo,3,1,1,0.002381,Reggaeton,Reggaeton,1,1.0
3,2013,4,20,4,Ai Se Eu Te Pego,Michel Telo,Michel Telo,4,1,1,0.002698,Spanish Pop,Spanish Pop,1,1.0
4,2013,4,20,5,Loca,Shakira,El Cata,5,1,1,0.024082,Spanish Pop,Spanish Pop,0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13485,2020,12,26,22,La Toxica,Farruko,Farruko,6,8,0,0.002381,Reggaeton,Reggaeton,1,1.0
13486,2020,12,26,23,Si Veo A Tu Mama,Bad Bunny,Bad Bunny,1,40,1,0.014857,Reggaeton,Reggaeton,1,1.0
13487,2020,12,26,24,Maldita Pobreza,Bad Bunny,Bad Bunny,12,3,0,0.003061,Reggaeton,Reggaeton,1,1.0
13488,2020,12,26,25,Safaera,Bad Bunny,Nengo Flow,2,40,1,0.004348,Reggaeton,Reggaeton,1,1.0


#### Export data set to csv

In [46]:
path_data = r'C:\Users\Francisco Martínez\Desktop\Economía de redes\Trabajo final\Data'
os.chdir(path_data)

weekly.to_csv('network_latin_streaming.csv', index = False)