## Part 2:

### Sub-genres
So far we have investigated how the rock artists are connected to each other through references on their wiki pages. Each artist is also associated with one or more sub-genres, taken from the infobox on their wiki page. Artists who share a sub-genre likely have the same inspirations, or may have collaborated. We therefore hypothesize that artists characterized by the same sub-genre are more likely to be linked to each other than to other artists.

### Testing the hypothesis with modularity

One way to test this hypothesis is to use modularity. Modularity compares the number of edges that fall inside communities with the number expected if edges were placed at random while preserving node degrees. If we treat the artists within each sub-genre as a community, a high modularity would indicate strong community structure: artists would be more densely connected within their sub-genres than between different sub-genres, which would support our hypothesis.
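
To make the measure concrete, here is a minimal sketch on a toy graph (not the artist network): two triangles joined by a single edge. The graph and both partitions are made up for illustration. For each community, modularity adds the fraction of edges that fall inside it and subtracts the squared fraction of all edge ends attached to its nodes.

```python
import networkx as nx

# Toy graph (illustrative only): two triangles joined by one bridge edge.
G_toy = nx.Graph()
G_toy.add_edges_from([(0, 1), (0, 2), (1, 2),   # first triangle
                      (3, 4), (3, 5), (4, 5),   # second triangle
                      (2, 3)])                  # bridge

# A partition that follows the triangles vs. one that cuts across them.
aligned = [{0, 1, 2}, {3, 4, 5}]
misaligned = [{0, 3}, {1, 4}, {2, 5}]

q_aligned = nx.community.modularity(G_toy, aligned)
q_misaligned = nx.community.modularity(G_toy, misaligned)

print(f"Aligned partition:    Q = {q_aligned:.4f}")
print(f"Misaligned partition: Q = {q_misaligned:.4f}")
```

The aligned partition gives Q = 5/14 (about 0.357), while the misaligned one, which has no internal edges at all, gives a negative value. A genre partition that matches the link structure should behave like the first case.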


## Detecting the communities and calculating the modularity

In [None]:
import json
import pickle
from collections import Counter, defaultdict

import networkx as nx
import numpy as np
import requests


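# Load the pickled artist network (only unpickle data from a source you trust)
url = 'https://raw.githubusercontent.com/LucasJuel/socialgraphs2025/main/assignments/answers/rock_artists_network.pkl'
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed

G_rock = pickle.loads(response.content)

# Load the mapping from artist name to list of genres
url2 = 'https://raw.githubusercontent.com/LucasJuel/socialgraphs2025/main/assignments/Assignment%202/artist_genres.json'
response2 = requests.get(url2)
response2.raise_for_status()
artist_genres = response2.json()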

# Build a new network containing only the nodes that have genres,
# and store each node's primary genre as a node attribute.

# Work on an undirected copy, so a link in either direction counts as one edge
G_rock = G_rock.to_undirected()

G_filtered = nx.Graph()

# Map nodes to their primary genre
node_to_primary_genre = {}

for node in G_rock.nodes():
    # Check if node has genres (handle various naming formats)
    node_name = str(node)
    
    # Try different variations of the node name
    variations = [
        node_name,
        node_name.replace('_', ' '),
        node_name.replace(' ', '_')
    ]
    
    genres = None
    for variant in variations:
        if variant in artist_genres and artist_genres[variant]:
            genres = artist_genres[variant]
            break
    
    if genres:
        # Use first genre as primary genre
        primary_genre = genres[0]
        node_to_primary_genre[node] = primary_genre
        G_filtered.add_node(node, genre=primary_genre, all_genres=genres)

# Add edges between nodes that both have genres, keeping any edge attributes
G_filtered.add_edges_from(
    (u, v, data)
    for u, v, data in G_rock.edges(data=True)
    if u in node_to_primary_genre and v in node_to_primary_genre
)


genre_to_nodes = defaultdict(list)

for node, genre in node_to_primary_genre.items():
    genre_to_nodes[genre].append(node)

# Convert to list of communities (sets)
communities = [set(nodes) for nodes in genre_to_nodes.values()]


# Summary statistics for the original and the filtered network

print(f"Original network: {G_rock.number_of_nodes()} nodes, {G_rock.number_of_edges()} edges")

print(f"Filtered network: {G_filtered.number_of_nodes()} nodes, {G_filtered.number_of_edges()} edges")
print(f"Removed {G_rock.number_of_nodes() - G_filtered.number_of_nodes()} nodes without genres")

# Print community statistics
print(f"\nFound {len(communities)} genre communities:")
for genre, nodes in sorted(genre_to_nodes.items(), key=lambda x: len(x[1]), reverse=True):
    print(f"  {genre}: {len(nodes)} nodes")

# Additional analysis
print("\nAdditional statistics:")

# Average degree by genre
print("\nAverage degree by genre:")
for genre, nodes in sorted(genre_to_nodes.items(), key=lambda x: len(x[1]), reverse=True)[:10]:
    degrees = [G_filtered.degree(node) for node in nodes if node in G_filtered]
    if degrees:
        avg_degree = np.mean(degrees)
        print(f"  {genre}: {avg_degree:.2f}")

# Inter-genre connectivity
print("\nInter-genre edge statistics:")
intra_genre_edges = 0
inter_genre_edges = 0

for u, v in G_filtered.edges():
    if node_to_primary_genre[u] == node_to_primary_genre[v]:
        intra_genre_edges += 1
    else:
        inter_genre_edges += 1

total_edges = G_filtered.number_of_edges()
print(f"  Intra-genre edges: {intra_genre_edges} ({100*intra_genre_edges/total_edges:.1f}%)")
print(f"  Inter-genre edges: {inter_genre_edges} ({100*inter_genre_edges/total_edges:.1f}%)")


with open("filtered_rock_artists_network.pkl", 'wb') as f:
    pickle.dump(G_filtered, f)



Original network: 484 nodes, 5943 edges
Filtered network: 366 nodes, 3592 edges
Removed 118 nodes without genres

Found 81 genre communities:
  rock music: 56 nodes
  alternative rock: 49 nodes
  hard rock: 33 nodes
  pop rock: 18 nodes
  rock and roll: 15 nodes
  new wave music: 10 nodes
  blues rock: 9 nodes
  heavy metal music: 9 nodes
  folk rock: 8 nodes
  post-grunge: 8 nodes
  pop-punk: 8 nodes
  indie rock: 8 nodes
  punk rock: 7 nodes
  art rock: 6 nodes
  progressive rock: 6 nodes
  alternative metal: 6 nodes
  southern rock: 6 nodes
  pop music: 4 nodes
  soft rock: 4 nodes
  jazz rock: 3 nodes
  nu metal: 3 nodes
  post-punk: 3 nodes
  instrumental rock: 3 nodes
  roots rock: 3 nodes
  progressive pop: 2 nodes
  beat music: 2 nodes
  funk rock: 2 nodes
  arena rock: 2 nodes
  groove metal: 2 nodes
  grunge: 2 nodes
  industrial rock: 2 nodes
  rockabilly: 2 nodes
  rock: 2 nodes
  glam metal: 2 nodes
  hip-hop: 2 nodes
  country music: 2 nodes
  neo-psychedelia: 2 nodes
  m

In [20]:
# NetworkX expects communities as list of sets
modularity = nx.community.modularity(G_filtered, communities)

print(f"Modularity of genre-based communities: {modularity:.4f}")


Modularity of genre-based communities: 0.0885
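
A single modularity value is hard to judge on its own. One possible sanity check, which is not part of the analysis above, is to compare it with the modularity obtained after randomly shuffling the genre labels across nodes, keeping both the network and the community sizes fixed. The sketch below uses a hypothetical helper, `shuffled_label_modularities`, and demonstrates it on NetworkX's built-in karate club graph (with its `club` attribute standing in for genres), since the artist network is not loaded in this snippet.

```python
import random
from collections import defaultdict

import networkx as nx
import numpy as np


def labels_to_communities(node_to_label):
    """Group nodes by label into a list of sets (a partition of the nodes)."""
    groups = defaultdict(set)
    for node, label in node_to_label.items():
        groups[label].add(node)
    return list(groups.values())


def shuffled_label_modularities(G, node_to_label, n_iter=200, seed=0):
    """Modularity of n_iter partitions obtained by permuting labels over nodes.

    Hypothetical helper: the graph and the group sizes stay fixed, only the
    assignment of labels to nodes is randomized.
    """
    rng = random.Random(seed)
    nodes = list(node_to_label)
    labels = [node_to_label[n] for n in nodes]
    values = []
    for _ in range(n_iter):
        rng.shuffle(labels)
        shuffled = dict(zip(nodes, labels))
        values.append(nx.community.modularity(G, labels_to_communities(shuffled)))
    return values


# Demo on a stand-in graph; with the artist data one would pass
# G_filtered and node_to_primary_genre instead.
G_demo = nx.karate_club_graph()
demo_labels = nx.get_node_attributes(G_demo, "club")

observed = nx.community.modularity(G_demo, labels_to_communities(demo_labels))
null_values = shuffled_label_modularities(G_demo, demo_labels)

print(f"Observed modularity:    {observed:.4f}")
print(f"Shuffled labels (mean): {np.mean(null_values):.4f}")
print(f"Shuffled labels (std):  {np.std(null_values):.4f}")
```

If the observed value lies well above the shuffled values, the labels carry more structure than chance would produce; a value inside the shuffled range would suggest they do not.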


In [19]:
# Get unique genres
genres = list(set(nx.get_node_attributes(G_filtered, 'genre').values()))
genres.sort()

# Count edges for each unordered pair of genres
edge_counts = Counter()
for u, v in G_filtered.edges():
    genre_u = G_filtered.nodes[u]['genre']
    genre_v = G_filtered.nodes[v]['genre']
    # Order genres alphabetically for consistent counting
    edge_key = tuple(sorted([genre_u, genre_v]))
    edge_counts[edge_key] += 1

# Print top genre connections
print("\nTop 15 genre connections:")
for (g1, g2), count in edge_counts.most_common(15):
    if g1 == g2:
        print(f"  {g1} (internal): {count} edges")
    else:
        print(f"  {g1} <-> {g2}: {count} edges")


Top 15 genre connections:
  rock music (internal): 216 edges
  hard rock <-> rock music: 111 edges
  alternative rock (internal): 101 edges
  hard rock (internal): 101 edges
  rock and roll <-> rock music: 100 edges
  alternative rock <-> rock music: 94 edges
  hard rock <-> heavy metal music: 71 edges
  alternative rock <-> hard rock: 67 edges
  pop rock <-> rock music: 60 edges
  rock and roll (internal): 45 edges
  alternative rock <-> pop-punk: 40 edges
  alternative rock <-> pop rock: 38 edges
  hard rock <-> post-grunge: 37 edges
  alternative rock <-> punk rock: 35 edges
  folk rock <-> rock music: 31 edges
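
Raw edge counts are dominated by the largest genres, so they are hard to compare directly. As a complementary view, and as a suggestion rather than something done in the analysis above, NetworkX provides `nx.attribute_mixing_matrix` and `nx.attribute_assortativity_coefficient`, which summarize how often edges join nodes with the same attribute value. The sketch below uses a toy graph with made-up `genre` labels, not the artist network.

```python
import networkx as nx

# Toy graph (illustrative only): two triangles joined by one bridge edge,
# with a made-up 'genre' attribute per triangle.
G_toy = nx.Graph()
G_toy.add_edges_from([(0, 1), (0, 2), (1, 2),
                      (3, 4), (3, 5), (4, 5),
                      (2, 3)])
toy_genres = {0: "hard rock", 1: "hard rock", 2: "hard rock",
              3: "folk rock", 4: "folk rock", 5: "folk rock"}
nx.set_node_attributes(G_toy, toy_genres, "genre")

# Fix the row/column order of the mixing matrix explicitly.
mapping = {"folk rock": 0, "hard rock": 1}

# Entry [i, j] is the fraction of edges from a genre-i node to a genre-j node,
# with each undirected edge counted once in each direction.
mixing = nx.attribute_mixing_matrix(G_toy, "genre", mapping=mapping)

# 1 means edges only join same-genre nodes; 0 means no preference.
r = nx.attribute_assortativity_coefficient(G_toy, "genre")

print(mixing)
print(f"Genre assortativity: {r:.4f}")
```

For the artist data, the same two calls could be applied to `G_filtered` with the `genre` node attribute set earlier. Because the coefficient is normalized by the share of edge ends in each genre, it is less sensitive to a few very large genres than the raw counts are.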


### Detected communities