## In-Genre and Out-Genre Influence

We want to study the distribution of in-genre versus out-genre influence. For each artist, we count the outgoing edges to artists within the same genre and the outgoing edges to artists outside that genre. This lets us identify the artists who are most influential within their genre and those who are most influential outside it. Studying the two separately reveals patterns that a single combined count would hide.
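The dictionaries `in_genre_influence` and `out_genre_influence` used in the next cell are built earlier in the notebook. As a minimal sketch of how such counts can be obtained, assuming the graph is a `networkx.DiGraph` whose edges point from influencer to follower and whose nodes carry a `genre` attribute (the toy graph and the helper name `count_genre_influence` are illustrative, not the notebook's own code):

```python
import networkx as nx


def count_genre_influence(graph):
    """Count each node's outgoing edges to same-genre and other-genre nodes."""
    in_genre = {node: 0 for node in graph.nodes}
    out_genre = {node: 0 for node in graph.nodes}
    for influencer, follower in graph.edges():
        same = graph.nodes[influencer]['genre'] == graph.nodes[follower]['genre']
        if same:
            in_genre[influencer] += 1
        else:
            out_genre[influencer] += 1
    return in_genre, out_genre


# Toy graph with hypothetical data: two rock artists and one jazz artist
toy = nx.DiGraph()
toy.add_nodes_from([
    ('a', {'genre': 'rock'}),
    ('b', {'genre': 'rock'}),
    ('c', {'genre': 'jazz'}),
])
toy.add_edges_from([('a', 'b'), ('a', 'c'), ('c', 'b')])

toy_in, toy_out = count_genre_influence(toy)
print(toy_in)
print(toy_out)
```

Artists with no outgoing edges keep a count of zero in both dictionaries, which is why many rows in the table below show 0 and 0.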

In [65]:
import pandas as pd

# Initialize lists to store artist names, in-genre, and out-genre influence counts
artist_names = []
in_genre_counts = []
out_genre_counts = []

# Iterate through the nodes (artists) in the graph
for artist_id in G.nodes:

    # Store the artist name
    artist_names.append(G.nodes[artist_id]['artist_name'])

    # Store the in-genre and out-genre influence counts for the current artist
    in_genre_counts.append(in_genre_influence[artist_id])
    out_genre_counts.append(out_genre_influence[artist_id])

# Create a DataFrame with the results
results_df = pd.DataFrame({
    'Artist Name': artist_names,
    'In-Genre Influence Count': in_genre_counts,
    'Out-Genre Influence Count': out_genre_counts
})

# Display the DataFrame as a table
print(results_df)


             Artist Name  In-Genre Influence Count  Out-Genre Influence Count
0          Frank Sinatra                        31                         40
1      Vladimir Horowitz                         0                          0
2            Johnny Cash                        52                         60
3         Billie Holiday                        34                         72
4              Bob Dylan                       322                         67
...                  ...                       ...                        ...
5849     Natalie La Rose                         0                          0
5850          Sarah Ross                         0                          0
5851              Rotimi                         0                          0
5852  Jillian Jacqueline                         0                          0
5853         Jaira Burns                         0                          0

[5854 rows x 3 columns]


Let us sort the values with respect to out-genre influence. 

In [67]:
# Sort the DataFrame by the 'Out-Genre Influence Count' column in descending order
sorted_results_df = results_df.sort_values('Out-Genre Influence Count', ascending=False)

# Reset the index of the sorted DataFrame
sorted_results_df.reset_index(drop=True, inplace=True)

print(sorted_results_df.head(10))


      Artist Name  In-Genre Influence Count  Out-Genre Influence Count
0   Hank Williams                        97                         87
1    Muddy Waters                        33                         80
2     Miles Davis                        83                         77
3       Kraftwerk                        31                         77
4     James Brown                        78                         76
5    Howlin' Wolf                        25                         74
6  Billie Holiday                        34                         72
7     Marvin Gaye                        99                         70
8     Ray Charles                        44                         69
9       Bob Dylan                       322                         67


The analysis shows that Hank Williams has the highest out-genre influence count among the artists in the dataset. This suggests that Hank Williams has had a significant impact on artists from genres other than his own. Out-genre influence is an important factor to consider in understanding the overall impact of an artist on the music industry.

Out-genre influence deserves separate attention from in-genre influence for several reasons:

- Cross-genre inspiration: A high out-genre influence count shows that an artist has inspired musicians beyond their own genre. This cross-genre inspiration can lead to new sub-genres, musical styles, and ideas, enriching the music landscape.
- Broadening audience reach: Artists with a high out-genre influence plausibly reach a wider audience, since their music resonates across genres. Note that the influence graph itself does not measure popularity, album sales, or concert attendance.
- Cultural impact: A high out-genre influence suggests that an artist has transcended their own genre and left a mark on the broader cultural landscape, which may help their music reach new generations of listeners.
- Industry recognition: Artists with significant out-genre influence are often acknowledged by the music industry for their contributions, which can further enhance their reputation and legacy.

The importance of this analysis lies in identifying artists who have had a far-reaching impact on the music industry, transcending their own genres and influencing musicians from diverse backgrounds. Understanding out-genre influence can provide insights into the factors that contribute to an artist's success and the evolution of music over time.

Now let us observe the artists with the highest in-genre influence. 

In [69]:
# Sort the DataFrame by the 'In-Genre Influence Count' column in descending order
sorted_results_df1 = results_df.sort_values('In-Genre Influence Count', ascending=False)

# Reset the index of the sorted DataFrame
sorted_results_df1.reset_index(drop=True, inplace=True)

print(sorted_results_df1.head(10))

              Artist Name  In-Genre Influence Count  Out-Genre Influence Count
0             The Beatles                       553                         61
1               Bob Dylan                       322                         67
2      The Rolling Stones                       304                         15
3             David Bowie                       224                         14
4            Led Zeppelin                       213                          8
5               The Kinks                       191                          0
6          The Beach Boys                       179                          6
7  The Velvet Underground                       175                          6
8           Black Sabbath                       169                          2
9               The Byrds                       153                          5


The analysis of the artists with the highest in-genre and out-genre influence provides interesting insights into the impact and reach of these musicians. Comparing the top 10 artists with the highest in-genre influence and out-genre influence reveals some key observations:

Diverse musical styles: The artists with the highest in-genre influence predominantly belong to the rock and pop genres, with bands like The Beatles, The Rolling Stones, and Led Zeppelin leading the list. On the other hand, the top artists with the highest out-genre influence come from a more diverse range of musical styles, including country (Hank Williams), blues (Muddy Waters, Howlin' Wolf), jazz (Miles Davis, Billie Holiday), electronic (Kraftwerk), funk (James Brown), and soul (Marvin Gaye, Ray Charles). This highlights the cross-genre impact of artists with high out-genre influence and their ability to inspire musicians from various backgrounds.

Unique appeal: The artists with high out-genre influence counts demonstrate a unique appeal that transcends their own genres. Hank Williams, for example, has significantly influenced musicians from other genres, indicating the universality of his music and lyrics. Similarly, Muddy Waters and Miles Davis have had a substantial impact on musicians beyond their respective genres of blues and jazz, showcasing the power of their artistry.

Pioneers and innovators: The list of artists with the highest out-genre influence includes several pioneers and innovators in their respective genres. For example, Kraftwerk is known for its groundbreaking work in electronic music, while James Brown revolutionized funk and soul. These artists have had a lasting impact on the music industry and have inspired musicians across genres to experiment with new sounds and styles.

Overlapping artists: Bob Dylan is the only artist who appears in both the top 10 lists for in-genre and out-genre influence. This indicates his exceptional ability to not only dominate his own genre but also inspire musicians from other genres, highlighting his influence and versatility as an artist.
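The overlap between the two top 10 lists can be checked directly with a set intersection. The sketch below copies the artist names from the two printed tables above; the variable names are illustrative.

```python
# Names copied from the printed top 10 tables above
top_out_genre = [
    "Hank Williams", "Muddy Waters", "Miles Davis", "Kraftwerk", "James Brown",
    "Howlin' Wolf", "Billie Holiday", "Marvin Gaye", "Ray Charles", "Bob Dylan",
]
top_in_genre = [
    "The Beatles", "Bob Dylan", "The Rolling Stones", "David Bowie", "Led Zeppelin",
    "The Kinks", "The Beach Boys", "The Velvet Underground", "Black Sabbath", "The Byrds",
]

# Artists present in both lists
overlap = set(top_in_genre) & set(top_out_genre)
print(overlap)
```

With the notebook's DataFrames in scope, the same check could be run on `sorted_results_df.head(10)['Artist Name']` and `sorted_results_df1.head(10)['Artist Name']` instead of the hard-coded lists.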

In conclusion, the analysis of in-genre and out-genre influence provides valuable insights into the impact and reach of different artists. While artists with high in-genre influence are often leaders in their respective genres, those with high out-genre influence showcase the power of transcending musical boundaries and inspiring musicians from a variety of backgrounds. Understanding these influence patterns can help in identifying the factors contributing to an artist's success and the evolution of music over time.

In this study, we analyze the influence of musical artists within the same genre (in-genre) and across different genres (out-genre). A straightforward way to identify the most influential artists is to add the in-genre and out-genre influence counts. However, when the in-genre count is much larger than the out-genre count, it dominates the sum and overshadows the out-genre contribution. To address this, we weight the two counts before combining them. The weights are derived from a network centrality measure called eigenvector centrality, chosen because it accounts not only for the number of influence relationships an artist has (its degree) but also for the importance of the artists it is connected to.

Eigenvector centrality is a measure of an artist's influence based on the idea that the importance of a node (artist) is determined by the importance of its neighbors (other artists they influence or are influenced by). In other words, an artist's influence is more significant if it influences other highly influential artists. This centrality measure allows us to capture the relative importance of artists within the network more accurately than simply counting the number of influence relationships.

To calculate separate weights for in-genre and out-genre influences, we first create two subgraphs: one for in-genre influence relationships and another for out-genre influence relationships. The in-genre influence subgraph contains only the edges where both the influencer and follower artists belong to the same genre, representing the influence within the same genre. The out-genre influence subgraph contains only the edges where the influencer and follower artists belong to different genres, representing the influence between different genres.

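Next, we calculate eigenvector centralities for each node in both subgraphs using NetworkX's nx.eigenvector_centrality() function. Note that for a directed graph this function scores a node by its incoming edges; if edges point from influencer to follower, calling it on the reversed graph (G.reverse()) would instead score artists by whom they influence. We then compute the average eigenvector centrality for the in-genre and out-genre subgraphs by taking the mean of the values in the centrality dictionaries. This gives us a measure of the overall importance of in-genre and out-genre influence relationships in the network.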

To obtain the weights, we normalize the average centralities by dividing each average centrality by the total centrality (the sum of average in-genre and out-genre centralities). This results in two normalized values, one for in-genre influence and one for out-genre influence, which can be interpreted as the relative importance of these influence types within the network.

Finally, we calculate the weighted combined influence score for each artist by multiplying their in-genre influence count by the in-genre weight and adding it to their out-genre influence count multiplied by the out-genre weight. This produces a score that takes into account both the number of influence relationships and the relative importance of in-genre and out-genre influences, as determined by the eigenvector centrality measure.
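The weighting scheme described above can be written compactly. The symbols are introduced here for illustration and do not appear in the notebook: $\bar{c}_{\text{in}}$ and $\bar{c}_{\text{out}}$ are the average centralities of the in-genre and out-genre subgraphs, and $n_{\text{in}}(a)$ and $n_{\text{out}}(a)$ are the two influence counts of artist $a$.

$$
w_{\text{in}} = \frac{\bar{c}_{\text{in}}}{\bar{c}_{\text{in}} + \bar{c}_{\text{out}}}, \qquad
w_{\text{out}} = \frac{\bar{c}_{\text{out}}}{\bar{c}_{\text{in}} + \bar{c}_{\text{out}}}, \qquad
S(a) = w_{\text{in}}\, n_{\text{in}}(a) + w_{\text{out}}\, n_{\text{out}}(a)
$$

By construction $w_{\text{in}} + w_{\text{out}} = 1$, and the same pair of weights is applied to every artist.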

By following this methodology, we can identify the most influential artists overall with respect to both in-genre and out-genre influence, providing insights into the dynamics of musical influence across genres.

In [70]:
import numpy as np

# Create in-genre and out-genre influence subgraphs
in_genre_edges = [(u, v) for u, v in G.edges() if G.nodes[u]['genre'] == G.nodes[v]['genre']]
out_genre_edges = [(u, v) for u, v in G.edges() if G.nodes[u]['genre'] != G.nodes[v]['genre']]

in_genre_G = G.edge_subgraph(in_genre_edges)
out_genre_G = G.edge_subgraph(out_genre_edges)

# Calculate eigenvector centralities for both subgraphs
in_genre_eigenvector_centrality = nx.eigenvector_centrality(in_genre_G)
out_genre_eigenvector_centrality = nx.eigenvector_centrality(out_genre_G)

# Compute the average eigenvector centralities for in-genre and out-genre subgraphs
avg_in_genre_centrality = np.mean(list(in_genre_eigenvector_centrality.values()))
avg_out_genre_centrality = np.mean(list(out_genre_eigenvector_centrality.values()))

# Normalize the average centralities to obtain the weights
total_centrality = avg_in_genre_centrality + avg_out_genre_centrality
in_genre_weight = avg_in_genre_centrality / total_centrality
out_genre_weight = avg_out_genre_centrality / total_centrality

# Calculate the weighted combined influence score for each artist
results_df['Weighted Combined Influence Count'] = (results_df['In-Genre Influence Count'] * in_genre_weight) + (results_df['Out-Genre Influence Count'] * out_genre_weight)

# Sort the DataFrame by the 'Weighted Combined Influence Count' column in descending order
sorted_results_df_weighted = results_df.sort_values('Weighted Combined Influence Count', ascending=False)

# Reset the index of the sorted DataFrame
sorted_results_df_weighted.reset_index(drop=True, inplace=True)

# Display the top artists in the sorted DataFrame
print(sorted_results_df_weighted.head(10))


          Artist Name  In-Genre Influence Count  Out-Genre Influence Count  \
0         The Beatles                       553                         61   
1           Bob Dylan                       322                         67   
2  The Rolling Stones                       304                         15   
3         David Bowie                       224                         14   
4       Hank Williams                        97                         87   
5        Jimi Hendrix                       151                         50   
6        Led Zeppelin                       213                          8   
7         Marvin Gaye                        99                         70   
8         Miles Davis                        83                         77   
9         James Brown                        78                         76   

   Weighted Combined Influence Count  
0                         251.231841  
1                         165.595771  
2                       

The table shows that The Beatles have the highest weighted combined influence count (251.23), indicating that they are the most influential artists overall when considering both in-genre and out-genre relationships. Bob Dylan comes in second place with a weighted combined influence count of 165.60, followed by The Rolling Stones with a count of 126.74. 

The results presented in the table have important implications for understanding the dynamics of musical influence and the impact of artists across different genres.

First, the weighted combined influence count provides a more balanced and comprehensive perspective on an artist's influence, compared to only considering either in-genre or out-genre influence counts. By accounting for both types of relationships, we can identify artists who have had a significant impact not only within their own genre but also in bridging the gap between different genres. This highlights the role of these artists in fostering cross-genre exchange and innovation.

Second, the list of top 10 artists based on the weighted combined influence count features artists from various genres and time periods, demonstrating the lasting influence and appeal of their work. It is noteworthy that some artists, like Hank Williams, have a higher out-genre influence count than in-genre influence count, suggesting their music has transcended the boundaries of their own genre and has had a broader impact on the music landscape.

These findings underscore the importance of examining both in-genre and out-genre influence relationships when assessing an artist's impact on the music world. Moreover, they offer insights into the creative exchange between genres and the factors that contribute to the enduring influence of certain artists. This understanding can be useful for musicologists, historians, and industry professionals interested in the evolution of music and the key players shaping its development.

In [71]:
print("In-Genre Weight:", in_genre_weight)
print("Out-Genre Weight:", out_genre_weight)


In-Genre Weight: 0.3866500841960266
Out-Genre Weight: 0.6133499158039734


These weights are the normalized average eigenvector centralities of the two subgraphs: approximately 0.3867 for in-genre and 0.6133 for out-genre.

Here is what these weights mean:

- In-Genre Weight (0.3867): the share of the summed average centrality contributed by the in-genre subgraph. Each in-genre edge adds about 0.39 to an artist's weighted score.
- Out-Genre Weight (0.6133): the share of the summed average centrality contributed by the out-genre subgraph. Each out-genre edge adds about 0.61 to an artist's weighted score.

Because the out-genre weight is larger, an out-genre edge counts for more than an in-genre edge in the combined score, which partly offsets the much larger in-genre counts of the top rock artists. The weights should not be read as the fraction of all influence in the network that is in-genre or out-genre: they compare average centralities, and each average is taken over the nodes of its own subgraph, since edge_subgraph keeps only the nodes incident to the selected edges.
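As a quick check on how the weights enter the score, the first three rows of the eigenvector-weighted table can be recomputed by hand from the printed weights and counts. A small sketch (the names `w_in`, `w_out`, and `counts` are illustrative and chosen so as not to overwrite the notebook's variables):

```python
# Weights as printed by the notebook for the eigenvector case
w_in = 0.3866500841960266
w_out = 0.6133499158039734

# (in-genre count, out-genre count) copied from the table above
counts = {
    "The Beatles": (553, 61),
    "Bob Dylan": (322, 67),
    "The Rolling Stones": (304, 15),
}

# Weighted combined score: w_in * in-genre count + w_out * out-genre count
scores = {name: w_in * n_in + w_out * n_out for name, (n_in, n_out) in counts.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
# The Beatles: 251.23
# Bob Dylan: 165.60
# The Rolling Stones: 126.74
```

These values agree with the scores reported for the top three artists. With these weights, one out-genre edge is worth roughly 1.6 in-genre edges.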

Now let us repeat the analysis, this time deriving the weights from betweenness centrality.

In [72]:
import numpy as np

# Create in-genre and out-genre influence subgraphs
in_genre_edges = [(u, v) for u, v in G.edges() if G.nodes[u]['genre'] == G.nodes[v]['genre']]
out_genre_edges = [(u, v) for u, v in G.edges() if G.nodes[u]['genre'] != G.nodes[v]['genre']]

in_genre_G = G.edge_subgraph(in_genre_edges)
out_genre_G = G.edge_subgraph(out_genre_edges)

# Calculate betweenness centralities for both subgraphs
in_genre_betweenness_centrality = nx.betweenness_centrality(in_genre_G)
out_genre_betweenness_centrality = nx.betweenness_centrality(out_genre_G)

# Compute the average betweenness centralities for in-genre and out-genre subgraphs
avg_in_genre_centrality = np.mean(list(in_genre_betweenness_centrality.values()))
avg_out_genre_centrality = np.mean(list(out_genre_betweenness_centrality.values()))

# Normalize the average centralities to obtain the weights
total_centrality = avg_in_genre_centrality + avg_out_genre_centrality
in_genre_weight = avg_in_genre_centrality / total_centrality
out_genre_weight = avg_out_genre_centrality / total_centrality

# Calculate the weighted combined influence score for each artist
results_df['Weighted Combined Influence Count'] = (results_df['In-Genre Influence Count'] * in_genre_weight) + (results_df['Out-Genre Influence Count'] * out_genre_weight)

# Sort the DataFrame by the 'Weighted Combined Influence Count' column in descending order
sorted_results_df_weighted = results_df.sort_values('Weighted Combined Influence Count', ascending=False)

# Reset the index of the sorted DataFrame
sorted_results_df_weighted.reset_index(drop=True, inplace=True)

# Display the top artists in the sorted DataFrame
print(sorted_results_df_weighted.head(10))


          Artist Name  In-Genre Influence Count  Out-Genre Influence Count  \
0         The Beatles                       553                         61   
1           Bob Dylan                       322                         67   
2  The Rolling Stones                       304                         15   
3       Hank Williams                        97                         87   
4         David Bowie                       224                         14   
5        Jimi Hendrix                       151                         50   
6         Marvin Gaye                        99                         70   
7         Miles Davis                        83                         77   
8        Led Zeppelin                       213                          8   
9         James Brown                        78                         76   

   Weighted Combined Influence Count  
0                         227.787603  
1                         153.444794  
2                       

Comparison:

- The Beatles, Bob Dylan, and The Rolling Stones hold the top three positions under both the eigenvector and betweenness centrality weightings.
- Hank Williams ranks higher under betweenness centrality, moving up to 4th place from 5th under eigenvector centrality.
- David Bowie drops from 4th place under eigenvector centrality to 5th under betweenness centrality.
- Jimi Hendrix (6th) and James Brown (10th) keep their positions, while Marvin Gaye and Miles Davis each move up one place, to 7th and 8th.
- Led Zeppelin drops from 7th place under eigenvector centrality to 9th place under betweenness centrality.

Comments:

- The results are fairly consistent between the eigenvector and betweenness centrality calculations, indicating that the artist rankings are relatively robust to the choice between these two measures.
- Since the same pair of weights is applied to every artist, the differences in ranking come from the change in weights. They are consistent with the betweenness-based weights favoring out-genre edges somewhat more: artists with high out-genre counts (Hank Williams, Marvin Gaye, Miles Davis) rise, while artists whose influence is almost entirely in-genre (David Bowie, Led Zeppelin) fall.
- The top artists in both calculations consistently have a high in-genre influence count, emphasizing their dominance within their own genres.
- Overall, the results show that the chosen weighting method, combined with different centrality measures, can provide a comprehensive understanding of the most influential artists across genres.
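The cells for the different centrality measures differ only in the centrality function they call, so the shared steps can be factored into one helper. The following is a minimal sketch, not the notebook's own code: the function name `genre_weights` is hypothetical, and it assumes a `networkx.DiGraph` with a `genre` node attribute and at least one in-genre and one out-genre edge.

```python
import networkx as nx
import numpy as np


def genre_weights(graph, centrality_fn):
    """Return (in_genre_weight, out_genre_weight) for a given centrality function."""
    in_edges = [(u, v) for u, v in graph.edges()
                if graph.nodes[u]['genre'] == graph.nodes[v]['genre']]
    out_edges = [(u, v) for u, v in graph.edges()
                 if graph.nodes[u]['genre'] != graph.nodes[v]['genre']]

    # Average centrality over the nodes of each edge-induced subgraph
    avg_in = np.mean(list(centrality_fn(graph.edge_subgraph(in_edges)).values()))
    avg_out = np.mean(list(centrality_fn(graph.edge_subgraph(out_edges)).values()))

    total = avg_in + avg_out
    if total == 0:
        # Happens, for example, when every node has zero betweenness
        raise ValueError("Both average centralities are zero; weights are undefined.")
    return float(avg_in / total), float(avg_out / total)


# Toy graph with hypothetical data: three rock artists and one jazz artist
toy = nx.DiGraph()
toy.add_nodes_from([
    ('a', {'genre': 'rock'}), ('b', {'genre': 'rock'}),
    ('c', {'genre': 'rock'}), ('d', {'genre': 'jazz'}),
])
toy.add_edges_from([('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'd'), ('b', 'd')])

toy_w_in, toy_w_out = genre_weights(toy, nx.degree_centrality)
print(toy_w_in, toy_w_out)
```

On the real graph, the helper would be called as `genre_weights(G, nx.eigenvector_centrality)`, `genre_weights(G, nx.betweenness_centrality)`, or `genre_weights(G, nx.degree_centrality)`, and the returned pair would multiply the two count columns of `results_df` exactly as in the cells above.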

Now let us look at the results using degree centrality.

In [73]:
import numpy as np

# Create in-genre and out-genre influence subgraphs
in_genre_edges = [(u, v) for u, v in G.edges() if G.nodes[u]['genre'] == G.nodes[v]['genre']]
out_genre_edges = [(u, v) for u, v in G.edges() if G.nodes[u]['genre'] != G.nodes[v]['genre']]

in_genre_G = G.edge_subgraph(in_genre_edges)
out_genre_G = G.edge_subgraph(out_genre_edges)

# Calculate degree centralities for both subgraphs
in_genre_degree_centrality = nx.degree_centrality(in_genre_G)
out_genre_degree_centrality = nx.degree_centrality(out_genre_G)

# Compute the average degree centralities for in-genre and out-genre subgraphs
avg_in_genre_centrality = np.mean(list(in_genre_degree_centrality.values()))
avg_out_genre_centrality = np.mean(list(out_genre_degree_centrality.values()))

# Normalize the average centralities to obtain the weights
total_centrality = avg_in_genre_centrality + avg_out_genre_centrality
in_genre_weight = avg_in_genre_centrality / total_centrality
out_genre_weight = avg_out_genre_centrality / total_centrality

# Calculate the weighted combined influence score for each artist
results_df['Weighted Combined Influence Count'] = (results_df['In-Genre Influence Count'] * in_genre_weight) + (results_df['Out-Genre Influence Count'] * out_genre_weight)

# Sort the DataFrame by the 'Weighted Combined Influence Count' column in descending order
sorted_results_df_weighted = results_df.sort_values('Weighted Combined Influence Count', ascending=False)

# Reset the index of the sorted DataFrame
sorted_results_df_weighted.reset_index(drop=True, inplace=True)

# Display the top artists in the sorted DataFrame
print(sorted_results_df_weighted.head(10))


              Artist Name  In-Genre Influence Count  \
0             The Beatles                       553   
1               Bob Dylan                       322   
2      The Rolling Stones                       304   
3             David Bowie                       224   
4            Led Zeppelin                       213   
5               The Kinks                       191   
6            Jimi Hendrix                       151   
7          The Beach Boys                       179   
8  The Velvet Underground                       175   
9           Black Sabbath                       169   

   Out-Genre Influence Count  Weighted Combined Influence Count  
0                         61                         337.191788  
1                         67                         210.148183  
2                         15                         177.234607  
3                         14                         131.886739  
4                          8                         123.079912 

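We can see that using degree centrality changes the top 10 list. The Kinks, The Beach Boys, The Velvet Underground, and Black Sabbath enter the list, while Hank Williams, Marvin Gaye, Miles Davis, and James Brown drop out. The resulting list contains the same artists as the in-genre top 10, except that Jimi Hendrix replaces The Byrds.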
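The notebook prints the weights only for the eigenvector case. Because the two weights sum to 1, the in-genre weight behind each table can be recovered from any printed row. A rough sketch using The Beatles' row from each of the three outputs (the helper name `implied_in_genre_weight` is illustrative, and the result is only as precise as the printed scores):

```python
def implied_in_genre_weight(score, n_in, n_out):
    """Solve score = w * n_in + (1 - w) * n_out for the in-genre weight w."""
    return (score - n_out) / (n_in - n_out)


# The Beatles: 553 in-genre edges, 61 out-genre edges; scores copied from the outputs
beatles_scores = {
    "eigenvector": 251.231841,
    "betweenness": 227.787603,
    "degree": 337.191788,
}

implied = {
    measure: implied_in_genre_weight(score, 553, 61)
    for measure, score in beatles_scores.items()
}

for measure, weight in implied.items():
    print(f"{measure}: in-genre weight ~ {weight:.3f}")
```

The recovered eigenvector weight matches the printed value of about 0.387, which supports the approach. The betweenness weight comes out slightly lower, while the degree-based in-genre weight exceeds 0.5. That would explain why the degree-weighted ranking closely follows the in-genre ranking.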