# Assignment 2

Loading libraries

In [30]:
import pickle
import json
import networkx as nx
import matplotlib.pyplot as plt
from collections import defaultdict
import community.community_louvain as community
from networkx.algorithms.community.quality import modularity

## Part 1: Genres and communities and plotting 

> **Write about genres and modularity.**


* Modularity quantifies how well a network is divided into communities by comparing the fraction of edges that fall inside communities with the fraction expected in a random network with the same node degree distribution. High modularity values indicate that there are more connections within communities than would be expected by chance. This makes modularity a useful metric for identifying meaningful communities in networks.

* In the context of country music artists, we analyse a previously created network built from data extracted from Wikipedia. We then analyse the communities of the network based on the first genre of each artist.
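
For reference, the modularity described above can be written in the per-community form that the calculation later in this notebook implements. This is one common way to state it for an undirected, unweighted graph:

$$
M = \sum_{c=1}^{n_c} \left[ \frac{L_c}{L} - \left( \frac{k_c}{2L} \right)^2 \right]
$$

Here $n_c$ is the number of communities, $L_c$ is the number of edges inside community $c$, $k_c$ is the sum of the degrees of the nodes in community $c$, and $L$ is the total number of edges in the graph. The first term is the observed fraction of edges inside $c$; the second is the fraction expected if edges were placed at random while preserving node degrees.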

> **Detect the communities, discuss the value of modularity in comparison to the genres.**

In [31]:
def load_artist_genres_dict():
    artist_genres_dict = {}
    with open('artists_genres_dictionary.txt', 'r') as f:
        artist_genres_dict = json.load(f)
    return artist_genres_dict

artists = load_artist_genres_dict()

In [32]:
# Load the graph from the pickle file
with open('../lab5/artist_graph.pkl', 'rb') as f:
    G = pickle.load(f)

# make the graph undirected
G = G.to_undirected()

rm_nodes = []

# Keep only nodes that appear in the artists dictionary and attach a genre label
for node in G.nodes():
    name = node.replace('_', ' ')
    if name in artists:
        genres = artists[name]
        # Use the second listed genre when the first is not 'country' and a
        # second one exists; otherwise use the first genre. The length check
        # avoids an IndexError for artists with a single genre.
        if genres[0] != 'country' and len(genres) > 1:
            G.nodes[node]['genre'] = genres[1]
        else:
            G.nodes[node]['genre'] = genres[0]
    else:
        rm_nodes.append(node)

# Remove every node that has no genre information
G.remove_nodes_from(rm_nodes)

print(f"Number of nodes: {G.number_of_nodes()}")
print(f"Number of edges: {G.number_of_edges()}")
print(G.nodes(data=True))

Number of nodes: 1829
Number of edges: 14060
[('Haley_&_Michaels', {'len_content': 1370, 'genre': 'country'}), ('Dickey_Betts', {'len_content': 4911, 'genre': 'rock'}), ('Two_Tons_of_Steel', {'len_content': 522, 'genre': 'country'}), ('Bacon_Brothers', {'len_content': 101, 'genre': 'country rock'}), ('Cledus_T._Judd', {'len_content': 5373, 'genre': 'country'}), ('Charlie_Major', {'len_content': 932, 'genre': 'country'}), ('Caryl_Mack_Parker', {'len_content': 752, 'genre': 'country'}), ('Tenille_Arts', {'len_content': 4681, 'genre': 'country'}), ('Tyler_Hubbard', {'len_content': 4453, 'genre': 'country'}), ('Steven_Lee_Olsen', {'len_content': 2802, 'genre': 'country'}), ('O._B._McClinton', {'len_content': 1789, 'genre': 'country'}), ('JT_Hodges', {'len_content': 1991, 'genre': 'country'}), ('Hank_Flamingo', {'len_content': 1029, 'genre': 'country'}), ('Shawn_Camp_(singer)', {'len_content': 1678, 'genre': 'country'}), ('Valerie_June', {'len_content': 4844, 'genre': 'americana'}), ('Roxie
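
The labelling rule in the cell above can be isolated and checked on a few entries. The sketch below assumes the genres file maps each artist name to a non-empty list of genre strings; the function name `pick_genre` and the sample artists are hypothetical, and the explicit length check is an added guard for single-genre entries rather than something confirmed by the original data.

```python
def pick_genre(genres):
    """Return the genre label for one artist.

    Assumed rule: use the second listed genre when the first one is not
    'country' and a second one exists; otherwise use the first genre.
    """
    if genres[0] != "country" and len(genres) > 1:
        return genres[1]
    return genres[0]


# Hypothetical entries, only for illustration
sample = {
    "Artist A": ["country"],
    "Artist B": ["country", "pop"],
    "Artist C": ["rock", "blues"],
    "Artist D": ["americana"],
}

labels = {name: pick_genre(genres) for name, genres in sample.items()}
for name, genre in labels.items():
    print(f"{name}: {genre}")
```

Keeping the rule in one small function makes it easy to try an alternative labelling, for example skipping 'country' whenever another genre is available, and to recompute the modularity afterwards.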

In [33]:
# Create a dictionary to hold communities based on the 'genre' attribute
communities = defaultdict(list)

# Group nodes by the 'genre' attribute
for node, data in G.nodes(data=True):
    communities[data['genre']].append(node)

# Convert communities to a list of lists
communities_list = list(communities.values())

# Calculate modularity
mod = modularity(G, communities_list)
print(f"Modularity of the graph: {mod}")

# Calculate modularity from the equation M = sum over c of [L_c / L - (k_c / (2L))^2]
# M   is the modularity
# n_c is the number of communities
# L_c is the number of edges within community c
# L   is the total number of edges in the graph
# k_c is the sum of the degrees of all nodes in community c

# Total number of edges in the graph
L = G.number_of_edges()

# Modularity calculation based on the given formula
M = 0
for c in communities_list:
    # Subgraph of the current community
    subgraph = G.subgraph(c)
    
    # L_c: Number of edges inside this community
    L_c = subgraph.number_of_edges()
    
    # k_c: Sum of the degrees of the nodes in this community
    k_c = sum(dict(G.degree(c)).values())
    
    # Apply the modularity formula
    M += (L_c / L) - (k_c / (2 * L))**2

# Print the modularity
print(f"Modularity based on formula: {M}")


Modularity of the graph: 0.07113789914793135
Modularity based on formula: 0.07113789914793132


In [34]:
# First compute the best partition based on the Louvain algorithm
partition = community.best_partition(G)

# Calculate the modularity of the partition
mod = community.modularity(partition, G)
print(f"Modularity of the partition by Louvain algorithm: {mod}")

Modularity of the partition by Louvain-algorithm: 0.3890048213407688


> **Explain the comparison and discuss the modularity value.**

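The higher M is for a partition, the better the corresponding community structure. The genre-based partition gives M ≈ 0.071, which is close to zero: grouping artists by genre describes the link structure only slightly better than a random assignment would. The Louvain partition of the same graph reaches M ≈ 0.389, so the network does contain community structure, but the genre labels do not capture it. As mentioned in the teacher's notebook, this is because country is the first genre for most of the artists, so the genre partition is dominated by one very large community (a partition with a single community has M = 0). Removing this genre might improve the modularity.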
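
The effect of a dominant label can be reproduced on a toy graph. The sketch below is dependency-free and uses the per-community formula M = Σ_c [L_c / L - (k_c / (2L))²]; the graph (two triangles joined by one bridge edge), the labels, and the helper name `modularity_from_labels` are all hypothetical and only meant to illustrate the argument, not to reproduce the notebook's numbers.

```python
from collections import defaultdict


def modularity_from_labels(edges, labels):
    """Modularity of an undirected, unweighted graph given one label per node."""
    L = len(edges)
    inside = defaultdict(int)  # L_c: edges with both endpoints in community c
    degree = defaultdict(int)  # k_c: summed degree of the nodes in community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    return sum(inside[c] / L - (degree[c] / (2 * L)) ** 2 for c in degree)


# Toy graph: triangle 0-1-2 and triangle 3-4-5, joined by the bridge 2-3
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

# Partition that follows the link structure (one community per triangle)
structural = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
# Every node carries the same label
single = {node: "country" for node in range(6)}
# One label dominates and ignores the link structure
dominant = {0: "country", 1: "country", 2: "country",
            3: "country", 4: "country", 5: "rock"}

m_structural = modularity_from_labels(edges, structural)
m_single = modularity_from_labels(edges, single)
m_dominant = modularity_from_labels(edges, dominant)

print(f"structural partition: {m_structural:.3f}")
print(f"single community:     {m_single:.3f}")
print(f"dominant label:       {m_dominant:.3f}")
```

On this toy graph the structural partition scores about 0.357, the single-community partition scores 0, and the dominant-label partition scores about -0.041. This mirrors the pattern above: a labelling dominated by one category stays near zero even when the graph has clear communities.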

> **Calculate the matrix $D$ and discuss your findings.**