## Mentions network:
* [Network construction](#Costruzione-rete)  
    * [Algorithms applied only to the largest connected component](#Componente-connessa)
        * [Degree](#Degree)
        * [Closeness](#Closeness)
        * [Louvain](#Louvain)
        * [Infomap](#Infomap)
        * [Label propagation](#Label-propagation)
        * [Leiden](#Leiden)
    * [Full-graph analysis](#grafo-completo)
        * [Degree centrality](#Degree2)
        * [Closeness](#Closeness2)
        * [Louvain](#Louvain2)
        * [Infomap](#Infomap2)
        * [Label propagation](#Label-propagation2)
        * [Leiden](#Leiden2)

In [1]:
import json
import re
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import powerlaw

 
import math

from collections import Counter
from collections import defaultdict
from itertools import combinations


import networkx as nx
from conformity import attribute_conformity

from cdlib import algorithms
from cdlib import evaluation
from cdlib.algorithms import louvain
from cdlib import NodeClustering


import warnings
warnings.filterwarnings('ignore')

Note: to be able to use all crisp methods, you need to install some additional packages:  {'wurlitzer', 'bayanpy', 'graph_tool', 'infomap'}
Note: to be able to use all crisp methods, you need to install some additional packages:  {'pyclustering', 'ASLPAw'}
Note: to be able to use all crisp methods, you need to install some additional packages:  {'wurlitzer', 'infomap'}


In [2]:
path = r'C:\Users\naomi\Documents\tesi 2025\prova ufficiale\dataset\\'

output_path = r"C:\Users\naomi\Documents\tesi 2025\prova ufficiale\grafi\IG\general\rete commenti sotto lo stesso post\\"


In [3]:
with open(path+'general_IG.json', 'r', encoding = 'utf-8') as f:
    dataset = json.load(f)

## Network construction

The network is built in the following steps:
1. comment texts are stripped of emoji; comments that end up empty or contain only whitespace and digits are discarded
2. the list of mentions is extracted from each comment
   - only mentions of users already present in the node_data file are valid:
       this removes all "inactive" mentions, i.e. users who were mentioned but never interacted in the network. If a node belongs to the node set in node_data, that node has itself written at least one comment
3. edge construction: an edge is added between two nodes only if one of them tagged the other at least 3 times across the whole network
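
Steps 2 and 3 can be illustrated on a toy input. This is a minimal sketch rather than the notebook's pipeline: the comment tuples, the `known_users` set and the helper `count_valid_mentions` are hypothetical; only the `@(\w+)` pattern and the threshold of 3 are taken from the notebook.

```python
import re
from collections import Counter

MIN_MENTIONS = 3  # threshold stated in step 3

def count_valid_mentions(comments, known_users):
    """Count (author, mentioned) pairs, skipping self-mentions and unknown users."""
    counts = Counter()
    for author, text in comments:
        for mentioned in re.findall(r'@(\w+)', text):
            if mentioned != author and mentioned in known_users:
                counts[(author, mentioned)] += 1
    return counts

# Hypothetical toy data: (author, comment text)
comments = [
    ("anna", "@bob look at this"),
    ("anna", "@bob and @carl"),
    ("anna", "thanks @bob"),
    ("bob", "@anna ok"),
    ("anna", "@ghost are you there?"),  # 'ghost' never wrote a comment
]
known_users = {"anna", "bob", "carl"}

counts = count_valid_mentions(comments, known_users)
# Keep only the ordered pairs that reach the threshold
edges = {pair for pair, n in counts.items() if n >= MIN_MENTIONS}
print(edges)
```

Only the pair `("anna", "bob")` reaches the threshold; the mention of `ghost` is never counted because that user is not in `known_users`.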
   

In [12]:
emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U0001F1E6-\U0001F1FF"  # flags
        u"\U0001F600-\U0001F64F"
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"  # NOTE: very broad range, it also spans CJK ideographs
        u"\U0001f926-\U0001f937"
        u"\U0001F1F2"
        u"\U0001F1F4"
        u"\U0001F620"
        u"\u200d"
        u"\u2640-\u2642"
        u"\U0001F1E6\U0001F1E8"  # Instagram flags
        u"\U0001F1E9\U0001F1EA"
        u"\U0001F1EA\U0001F1F8"
        u"\U0001F1EB\U0001F1F7"
        u"\U0001F1EC\U0001F1E7"
        u"\U0001F1EE\U0001F1F9"
        u"\U0001F1EF\U0001F1F5"
        u"\U0001F1F0\U0001F1F7"
        u"\U0001F1F7\U0001F1FA"
        u"\U0001F1FA\U0001F1F8"
        u"\U0001F400-\U0001F4F0"  # animals & nature emoji
        u"\U0001F980-\U0001F9FF"  # animal emoji (some newer additions)
        u"\U0001F493-\U0001F49E"  # colored hearts
        u"\U0001F9E1-\U0001F9FF"  # other hearts
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F700-\U0001F77F"  # alchemical and zodiac symbols
        u"\U0001F780-\U0001F7FF"  # geometric emoji
        u"\U0001F900-\U0001F9FF"  # miscellaneous symbols and emoji
        u"\U00002660-\U000026FF"  # miscellaneous symbols
        u"\U00002700-\U000027BF"  # miscellaneous emoji
        u"\U0001F300-\U0001F9FF"  # supplementary miscellaneous symbols
        u"\U00002B50-\U00002B55"  # other emoji
        u"\U0001F91D-\U0001F93F"  # hand emoji
        "]+", flags=re.UNICODE)

In [13]:
only_spaces_or_numbers = re.compile(r'^[\s\d]*$')

def clean_comments(dataset):
    stats = {}

    for account, posts in dataset.items():
        cleaned = 0
        removed = 0

        for post_id in posts:
            post = posts[post_id]
            new_comments = []

            for comment_info in post['interactions_post']:
                original_comment = comment_info['comment']
                cleaned_comment = emoji_pattern.sub('', original_comment).strip()

                # Empty, or only whitespace/digits → drop the comment
                if not cleaned_comment or only_spaces_or_numbers.fullmatch(cleaned_comment):
                    removed += 1
                    continue
                else:
                    if cleaned_comment != original_comment:
                        cleaned += 1  # emoji removed, comment kept
                    comment_info['comment'] = cleaned_comment
                    new_comments.append(comment_info)

            post['interactions_post'] = new_comments


    return dataset

In [14]:
dataset_pulito = clean_comments(dataset)
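
The rule applied to each comment can be checked on a few sample strings. The sketch below is illustrative only: `demo_emoji` is a reduced character class assumed for the example (it does not reproduce every range of `emoji_pattern`), and `clean_one` is a hypothetical helper that mirrors the keep/drop decision inside `clean_comments`.

```python
import re

# Reduced emoji class used only for this illustration (assumption):
# it covers the main emoji blocks, not every range listed in emoji_pattern.
demo_emoji = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]+")
only_spaces_or_digits = re.compile(r'[\s\d]*')

def clean_one(comment):
    """Return the cleaned comment, or None when the comment should be dropped."""
    cleaned = demo_emoji.sub('', comment).strip()
    if not cleaned or only_spaces_or_digits.fullmatch(cleaned):
        return None
    return cleaned

samples = ["great post \U0001F600", "\U0001F525\U0001F525", " 123 ", "top 10 recipes"]
results = [clean_one(s) for s in samples]
print(results)
```

The emoji-only and digits-only samples are dropped, while a comment that mixes digits with text is kept.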

#### Network with internal mentions only (accounts included)

In [15]:
with open(output_path + 'nodes_attributes_general_IG_nuovo2.json', 'r') as f:
    node_data = json.load(f)

def extract_mentions(text):
    return re.findall(r'@(\w+)', text)

mention_counts = defaultdict(int)           # (author, mentioned) → number of mentions
post_counts = defaultdict(set)              # author → set of (account, post_id) where the author mentioned someone
mention_accounts = defaultdict(set)         # (author, mentioned) → set of accounts

# STEP 1: extract valid mentions (the mentioned user is in node_data)
for account, posts in dataset_pulito.items():
    for post_id, post in posts.items():
        for interaction in post['interactions_post']:
            author = interaction['user_name']
            mentions = [m for m in extract_mentions(interaction.get('comment', '')) if m != author]

            if not mentions:
                continue

            post_counts[author].add((account, post_id))

            for mentioned in mentions:
                if mentioned not in node_data:
                    continue

                mention_counts[(author, mentioned)] += 1
                mention_accounts[(author, mentioned)].add(account)

# STEP 2: build the edges, requiring at least 3 mentions in total
edges = {}

for (a1, a2), count in mention_counts.items():
    if count < 3:
        continue

    max_posts = len(post_counts[a1])
    if max_posts == 0:
        continue

    accounts = mention_accounts[(a1, a2)]
    weight = (count / max_posts) * math.log(1 + len(accounts))

    edges[(a1, a2)] = weight  # direction author → mentioned only

# STEP 3: collect the nodes involved
all_nodes = set()
for a1, a2 in edges:
    all_nodes.add(a1)
    all_nodes.add(a2)

# STEP 4: save in edgelist format
output_edges = output_path + 'mentions_directed_network_from_comments_general_IG_with_account_nuovo.edgelist'
with open(output_edges, 'w') as f:
    for (a1, a2), weight in edges.items():
        f.write(f"{a1} {a2} {weight:.4f}\n")
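
The edge weight computed in step 2 is the number of mentions divided by the number of posts on which the author mentioned someone, scaled by the natural logarithm of one plus the number of distinct accounts where the pair occurs. As a small worked example (the function name `edge_weight`, its parameter names and the input values are made up for illustration):

```python
import math

def edge_weight(count, author_posts, n_accounts):
    """(mentions / posts where the author mentioned someone) * ln(1 + distinct accounts)."""
    return (count / author_posts) * math.log(1 + n_accounts)

# Hypothetical pair: 6 mentions, author active on 4 posts, seen under 2 accounts
w = edge_weight(count=6, author_posts=4, n_accounts=2)
print(f"{w:.4f}")  # 1.6479
```

With a single account the logarithmic factor is ln 2 ≈ 0.693, so pairs that recur under several accounts receive a larger weight for the same mention rate.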


In [16]:
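# Undirected graph: edge direction is dropped here. If both "a b" and "b a"
# appear in the edgelist, the weight read last overwrites the earlier one.
g = nx.Graph()
with open(output_path+'mentions_directed_network_from_comments_general_IG_with_account_nuovo.edgelist') as f:
    for line in f:
        parts = line.split()
        if len(parts) == 3:
            g.add_edge(parts[0], parts[1], weight=float(parts[2]))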


with open(output_path + 'nodes_attributes_general_IG_nuovo2.json', 'r') as f:
    node_data = json.load(f)

# Attach the attributes to the nodes present in the network
for node, attributes in node_data.items():
    if node in g.nodes:
        g.nodes[node].update(attributes)

print("Nodes in the network:", g.number_of_nodes())
print("Edges in the network:", g.number_of_edges())
print("Density:", nx.density(g))
print("Assortativity:", nx.degree_assortativity_coefficient(g))

Nodes in the network: 8476
Edges in the network: 9034
Density: 0.0002515239976557095
Assortativity: -0.5568264973600392
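
The reported density can be reproduced by hand from the node and edge counts, using the standard formula for an undirected simple graph. The variable names below are chosen for this example only.

```python
n_nodes, n_edges = 8476, 9034  # counts printed by the notebook

# Density of an undirected simple graph: 2m / (n (n - 1))
density = 2 * n_edges / (n_nodes * (n_nodes - 1))

# Average degree: 2m / n
avg_degree = 2 * n_edges / n_nodes

print(density, avg_degree)
```

The average degree is only slightly above 2, which fits a very sparse network.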


### Connected component

In [17]:
largest_ccG = max(nx.connected_components(g), key=len)

G_connected = g.subgraph(largest_ccG).copy()
len(G_connected)

7215
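
For readers unfamiliar with the operation, selecting the largest connected component amounts to a graph traversal from every unvisited node. The following is a pure-Python sketch of that idea on a hypothetical adjacency dictionary; it is not how NetworkX is implemented internally, and the function here is a stand-alone helper, not `nx.connected_components`.

```python
from collections import deque

def connected_components(adj):
    """Yield the node set of each connected component of an undirected graph."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        yield component

# Hypothetical toy graph: a path a-b-c, a pair d-e, and an isolated node f
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}, "f": set()}
largest = max(connected_components(adj), key=len)
print(sorted(largest))
```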

#### Degree

In [18]:
degree_sequence = sorted(G_connected.degree(), key=lambda item: item[1], reverse=True)
degree_sequence[0:10]

[('general_33', 1311),
 ('general_39', 1068),
 ('general_15', 827),
 ('general_36', 732),
 ('general_24', 679),
 ('general_26', 557),
 ('general_20', 551),
 ('general_30', 543),
 ('general_14', 380),
 ('general_23', 370)]

#### Closeness

In [19]:
closeness = nx.closeness_centrality(G_connected)  # compute the closeness centrality of all nodes

In [20]:
ranks = sorted(closeness.items(), key=lambda item: item[1], reverse=True)
ranks[0:30]

[('user_572416', 0.4652692679780716),
 ('user_255658', 0.4320536623345511),
 ('general_39', 0.4266367023478621),
 ('healthy_16', 0.4245277467192373),
 ('user_111881', 0.42128007474889045),
 ('user_393808', 0.4192236169223617),
 ('general_30', 0.41769440101904926),
 ('user_501464', 0.4164886553894117),
 ('user_370283', 0.41627236006924406),
 ('user_680020', 0.4162483411228435),
 ('user_339221', 0.4160562892900398),
 ('general_14', 0.4146930328811221),
 ('user_117459', 0.41421681212677997),
 ('user_333699', 0.4101660222879236),
 ('user_36960', 0.40939787753248963),
 ('user_198875', 0.4067892184504342),
 ('general_33', 0.406514144032458),
 ('general_37', 0.4050305990679917),
 ('general_38', 0.4049169286035025),
 ('user_327150', 0.3997118794326241),
 ('user_291841', 0.3997118794326241),
 ('general_36', 0.3991810535635237),
 ('user_523878', 0.39904856731939375),
 ('general_4', 0.39788208041475925),
 ('general_41', 0.39724669603524226),
 ('general_24', 0.39591679929751383),
 ('user_66656', 0

In [21]:
def evaluation_algo(graph, algo, algo_name):
    """Print a fixed set of cdlib fitness scores for the clustering `algo` on `graph`."""
    transitivity = evaluation.avg_transitivity(graph, algo)
    hub_dominance = evaluation.hub_dominance(graph, algo)
    embeddedness = evaluation.avg_embeddedness(graph, algo)
    conductance = evaluation.conductance(graph, algo)
    modularity = evaluation.newman_girvan_modularity(graph, algo)
    internal_density = evaluation.internal_edge_density(graph, algo)

    print(f"Results with {algo_name} algorithm")
    print("Transitivity:", transitivity.score)
    print("Hub Dominance:", hub_dominance.score)
    print("Embeddedness:", embeddedness.score)
    print("Conductance:", conductance.score)
    print("Modularity:", modularity.score)
    print("Internal Edge Density:", internal_density.score)
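
To make two of these scores concrete, the sketch below computes internal edge density and conductance for a single community on a toy graph, using their textbook definitions. It is a pure-Python illustration under the assumption of an unweighted, undirected edge list; cdlib's own functions may differ in details such as how very small communities are handled or how per-community values are aggregated.

```python
def internal_edge_density(edges, community):
    """Internal edges divided by the number of possible node pairs (needs >= 2 nodes)."""
    n = len(community)
    internal = sum(1 for u, v in edges if u in community and v in community)
    return internal / (n * (n - 1) / 2)

def conductance(edges, community):
    """Boundary edges divided by the total degree (volume) of the community."""
    internal = sum(1 for u, v in edges if u in community and v in community)
    boundary = sum(1 for u, v in edges if (u in community) != (v in community))
    return boundary / (2 * internal + boundary)

# Hypothetical toy graph: two triangles joined by the edge c-d
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
community = {"a", "b", "c"}

ied = internal_edge_density(toy_edges, community)
cond = conductance(toy_edges, community)
print(ied, cond)
```

The triangle has all three possible internal edges (density 1.0) and one boundary edge out of a total degree of 7, so its conductance is 1/7.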

### Louvain

In [22]:
louvain_coms2 = algorithms.louvain(G_connected, weight='weight')

In [23]:
louvain_coms2.overlap 

False

In [24]:
louvain_coms2.node_coverage

1.0

In [25]:
louvain_communities2 = louvain_coms2.communities
len(louvain_communities2)

94

In [26]:
evaluation_algo(G_connected, louvain_coms2, 'Louvain')

Results with Louvain algorithm
Transitivity: 0.001119195418564091
Hub Dominance: 0.982209390992366
Embeddedness: 0.7769463373271249
Conductance: 0.3120636897060713
Modularity: 0.8959875702550097
Internal Edge Density: 0.722601700357156


In [27]:
reduct_communities_louvain2 = [c for c in louvain_communities2 if len(c) >= 5]
len(reduct_communities_louvain2)

24

In [28]:
# 1. Collect the nodes that belong to the retained communities
nodes_in_communities_louvain = set(node for community in reduct_communities_louvain2 for node in community)

# 2. Build a subgraph from the original graph
subgraph = louvain_coms2.graph.subgraph(nodes_in_communities_louvain).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_louvain2 = NodeClustering(
    communities=reduct_communities_louvain2,
    graph=subgraph,  # use the subgraph, not the whole graph
    method_name=louvain_coms2.method_name + "_reduct"
)

In [29]:
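# Note: the scores are computed against G_connected, not against `subgraph`,
# so the reduced clustering covers only part of the evaluated graph.
evaluation_algo(G_connected, reduct_nodeclustering_louvain2, 'Louvain')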

Results with Louvain algorithm
Transitivity: 0.004383515389376023
Hub Dominance: 0.9442090036089893
Embeddedness: 0.9286300989756834
Conductance: 0.15717008626941437
Modularity: 0.473938346948567
Internal Edge Density: 0.05241221528774975


### Infomap

In [30]:
infomap_coms2 = algorithms.infomap(G_connected)

In [31]:
infomap_coms2.method_parameters

{'igraph': True}

In [32]:
infomap_coms2.overlap 

False

In [33]:
infomap_coms2.average_internal_degree()

FitnessResult(min=0.0, max=2.09375, score=1.1363868113335154, std=0.5214383309569611)

In [34]:
infomap_communities2 = infomap_coms2.communities
len(infomap_communities2)

166

In [35]:
evaluation_algo(G_connected, infomap_coms2, 'Infomap')

Results with Infomap algorithm
Transitivity: 0.00450231630789986
Hub Dominance: 0.9796739411196789
Embeddedness: 0.6810190848212554
Conductance: 0.4012712821878046
Modularity: 0.847011726509347
Internal Edge Density: 0.6765600076051527


In [36]:
reduct_communities_infomap2 = [c for c in infomap_communities2 if len(c) >= 5]
len(reduct_communities_infomap2)

33

In [37]:
# 1. Collect the nodes that belong to the retained communities
nodes_in_communities_infomap2 = set(node for community in reduct_communities_infomap2 for node in community)

# 2. Build a subgraph from the original graph
subgraph = infomap_coms2.graph.subgraph(nodes_in_communities_infomap2).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_infomap2 = NodeClustering(
    communities=reduct_communities_infomap2,
    graph=subgraph,  # use the subgraph, not the whole graph
    method_name=infomap_coms2.method_name + "_reduct"
)

In [38]:
evaluation_algo(G_connected, reduct_nodeclustering_infomap2, 'Infomap')

Results with Infomap algorithm
Transitivity: 0.004971247690243744
Hub Dominance: 0.948012863675308
Embeddedness: 0.8699538012010837
Conductance: 0.22353890178565072
Modularity: 0.38083707298563124
Internal Edge Density: 0.09522104835723266


### Label propagation

In [39]:
lp_coms2 = algorithms.label_propagation(G_connected)

In [40]:
lp_coms2.overlap 

False

In [41]:
lp_coms2.average_internal_degree()

FitnessResult(min=1.0, max=2.096774193548387, score=1.2123177648883, std=0.36901839556807814)

In [42]:
lp_coms_communities2 = lp_coms2.communities
len(lp_coms_communities2)

149

In [43]:
evaluation_algo(G_connected, lp_coms2, 'Label propagation')

Results with Label propagation algorithm
Transitivity: 0.0049137659103851575
Hub Dominance: 0.9961980480635517
Embeddedness: 0.7652621956052474
Conductance: 0.34154788664369384
Modularity: 0.8389082621618876
Internal Edge Density: 0.793958757449137


In [44]:
reduct_communities_lp_coms2 = [c for c in lp_coms_communities2 if len(c) >= 5]
len(reduct_communities_lp_coms2)

27

In [45]:
# 1. Collect the nodes that belong to the retained communities
nodes_in_communities_lp_coms2 = set(node for community in reduct_communities_lp_coms2 for node in community)

# 2. Build a subgraph from the original graph
subgraph = lp_coms2.graph.subgraph(nodes_in_communities_lp_coms2).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_lp_coms2 = NodeClustering(
    communities=reduct_communities_lp_coms2,
    graph=subgraph,  # use the subgraph, not the whole graph
    method_name=lp_coms2.method_name + "_reduct"
)

In [46]:
evaluation_algo(G_connected, reduct_nodeclustering_lp_coms2, 'Label propagation')

Results with Label propagation algorithm
Transitivity: 0.005511769900520558
Hub Dominance: 0.9913645368445377
Embeddedness: 0.9317181621298191
Conductance: 0.19034980124888548
Modularity: 0.3648438112770099
Internal Edge Density: 0.07900697012054587


### Leiden

In [47]:
leiden_coms2 = algorithms.leiden(G_connected, weights='weight')

In [48]:
leiden_coms2.method_parameters

{'initial_membership': None, 'weights': 'weight'}

In [49]:
leiden_coms2.overlap 

False

In [50]:
leiden_coms2.average_internal_degree()

FitnessResult(min=1.0, max=2.1043478260869564, score=1.2914068328463304, std=0.4174981296510239)

In [51]:
leiden_communities2 = leiden_coms2.communities
len(leiden_communities2)

95

In [52]:
evaluation_algo(G_connected, leiden_coms2, 'Leiden')

Results with Leiden algorithm
Transitivity: 0.0010695303010729849
Hub Dominance: 0.9815544429894792
Embeddedness: 0.7777087177908966
Conductance: 0.3134108371057448
Modularity: 0.8960905671269853
Internal Edge Density: 0.7111265480618131


In [53]:
reduct_communities_leiden_coms2 = [c for c in leiden_communities2 if len(c) >= 5]
len(reduct_communities_leiden_coms2)

25

In [54]:
# 1. Collect the nodes that belong to the retained communities
nodes_in_communities_leiden_coms2 = set(node for community in reduct_communities_leiden_coms2 for node in community)

# 2. Build a subgraph from the original graph
subgraph = leiden_coms2.graph.subgraph(nodes_in_communities_leiden_coms2).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_leiden_coms2 = NodeClustering(
    communities=reduct_communities_leiden_coms2,
    graph=subgraph,  # use the subgraph, not the whole graph
    method_name=leiden_coms2.method_name + "_reduct"
)

In [55]:
evaluation_algo(G_connected, reduct_nodeclustering_leiden_coms2, 'Leiden')

Results with Leiden algorithm
Transitivity: 0.004064215144077342
Hub Dominance: 0.9432402166933542
Embeddedness: 0.9229597942720731
Conductance: 0.1684849905256398
Modularity: 0.4713476901022875
Internal Edge Density: 0.055614215968223364


In [56]:
communities_dict = {str(i): community for i, community in enumerate(reduct_communities_leiden_coms2)}

# Save to a JSON file
with open(output_path+'communities_leiden_general_IG_connected_component_MENTIONS_nuovo.json', 'w', encoding='utf-8') as f:
    json.dump(communities_dict, f, ensure_ascii=False, indent=4)

## Full-graph analysis

#### Degree2

In [57]:
degree_sequence = sorted(g.degree(), key=lambda item: item[1], reverse=True)
degree_sequence[0:10]

[('general_33', 1311),
 ('general_39', 1068),
 ('general_15', 827),
 ('general_36', 732),
 ('general_24', 679),
 ('general_26', 557),
 ('general_20', 551),
 ('general_30', 543),
 ('general_14', 380),
 ('general_23', 370)]

#### Closeness2

In [58]:
closeness = nx.closeness_centrality(g)  # compute the closeness centrality of all nodes

In [59]:
ranks = sorted(closeness.items(), key=lambda item: item[1], reverse=True)
ranks[0:30]

[('user_572416', 0.39604159282522816),
 ('user_255658', 0.36776815576182326),
 ('general_39', 0.36315718828760796),
 ('healthy_16', 0.36136202534897677),
 ('user_111881', 0.35859757631132694),
 ('user_393808', 0.3568470999973944),
 ('general_30', 0.35554541698541847),
 ('user_501464', 0.3545190749238013),
 ('user_370283', 0.3543349623055489),
 ('user_680020', 0.3543145171516452),
 ('user_339221', 0.35415104081868404),
 ('general_14', 0.3529906240949162),
 ('user_117459', 0.352585260493521),
 ('user_333699', 0.3491371899451423),
 ('user_36960', 0.348483337878393),
 ('user_198875', 0.3462628226432369),
 ('general_33', 0.34602867670208287),
 ('general_37', 0.34476586922436486),
 ('general_38', 0.3446691118519961),
 ('user_327150', 0.34023852486453693),
 ('user_291841', 0.34023852486453693),
 ('general_36', 0.339786680874013),
 ('user_523878', 0.339673907332402),
 ('general_4', 0.33868098266809127),
 ('general_41', 0.3381401374865177),
 ('general_24', 0.3370081168297658),
 ('user_66656', 0
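
The ranking is the same as in the largest component, but every score is lower. This is consistent with NetworkX's default `wf_improved=True`, which on a disconnected graph scales each node's closeness by the fraction of nodes it can reach (the Wasserman-Faust correction). A quick arithmetic check with the figures reported in this notebook, assuming the node belongs to the largest component (variable names are chosen for the example):

```python
n_component, n_total = 7215, 8476          # component size and full-graph size reported above
scale = (n_component - 1) / (n_total - 1)  # Wasserman-Faust factor for nodes in that component

closeness_in_component = 0.4652692679780716  # user_572416 in the largest component
closeness_in_full_graph = closeness_in_component * scale
print(scale, closeness_in_full_graph)
```

The product reproduces the full-graph value listed for `user_572416` (about 0.3960), so the two rankings differ only by this constant factor for nodes inside the largest component.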

### Louvain2

In [60]:
louvain_coms = algorithms.louvain(g, weight='weight')

In [61]:
louvain_coms.overlap 

False

In [62]:
louvain_coms.node_coverage

1.0

In [63]:
louvain_communities = louvain_coms.communities
len(louvain_communities)

676

In [64]:
evaluation_algo(g, louvain_coms, 'Louvain')

Results with Louvain algorithm
Transitivity: 0.001675596955363517
Hub Dominance: 0.9952339452944828
Embeddedness: 0.9740077442451253
Conductance: 0.03674793890841177
Modularity: 0.9556074568180929
Internal Edge Density: 0.939539620282196


In [65]:
reduct_communities_louvain = [c for c in louvain_communities if len(c) >= 5]
len(reduct_communities_louvain)

25

In [66]:
# 1. Collect the nodes that belong to the retained communities
nodes_in_communities_louvain = set(node for community in reduct_communities_louvain for node in community)

# 2. Build a subgraph from the original graph
subgraph = louvain_coms.graph.subgraph(nodes_in_communities_louvain).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_louvain = NodeClustering(
    communities=reduct_communities_louvain,
    graph=subgraph,  # use the subgraph, not the whole graph
    method_name=louvain_coms.method_name + "_reduct"
)

In [67]:
evaluation_algo(g, reduct_nodeclustering_louvain, 'Louvain')

Results with Louvain algorithm
Transitivity: 0.005308141673029497
Hub Dominance: 0.9111258807628161
Embeddedness: 0.9560582932770788
Conductance: 0.11261664903583536
Modularity: 0.3621398143323317
Internal Edge Density: 0.09848466576391407


### Infomap2

In [68]:
infomap_coms = algorithms.infomap(g)

In [69]:
infomap_coms.overlap 

False

In [70]:
infomap_coms.average_internal_degree()

FitnessResult(min=0.0, max=2.09375, score=1.0566945128012928, std=0.268625533473743)

In [71]:
infomap_communities = infomap_coms.communities
len(infomap_communities)

760

In [72]:
evaluation_algo(g, infomap_coms, 'Infomap')

Results with Infomap algorithm
Transitivity: 0.0022880314044483384
Hub Dominance: 0.993859520597869
Embeddedness: 0.9313149642794845
Conductance: 0.08657545595985428
Modularity: 0.9244632866191989
Internal Edge Density: 0.9037442841320085


In [73]:
reduct_communities_infomap = [c for c in infomap_communities if len(c) >= 5]
len(reduct_communities_infomap)

38

In [74]:
# 1. Collect the nodes that belong to the retained communities
nodes_in_communities_infomap = set(node for community in reduct_communities_infomap for node in community)

# 2. Build a subgraph from the original graph
subgraph = infomap_coms.graph.subgraph(nodes_in_communities_infomap).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_infomap = NodeClustering(
    communities=reduct_communities_infomap,
    graph=subgraph,  # use the subgraph, not the whole graph
    method_name=infomap_coms.method_name + "_reduct"
)

In [75]:
evaluation_algo(g, reduct_nodeclustering_infomap, 'Infomap')

Results with Infomap algorithm
Transitivity: 0.0040939614223001
Hub Dominance: 0.9324074559161731
Embeddedness: 0.8861933389047948
Conductance: 0.19498029714194753
Modularity: 0.2641353968550222
Internal Edge Density: 0.13190322649982


### Label propagation2

In [76]:
lp_coms = algorithms.label_propagation(g)

In [77]:
lp_coms.overlap 

False

In [78]:
lp_coms.average_internal_degree()

FitnessResult(min=1.0, max=2.096774193548387, score=1.0691682467224766, std=0.2064792855705289)

In [79]:
lp_coms_communities = lp_coms.communities
len(lp_coms_communities)

747

In [80]:
evaluation_algo(g, lp_coms, 'Label propagation')

Results with Label propagation algorithm
Transitivity: 0.0023188100678010557
Hub Dominance: 0.9983491867400301
Embeddedness: 0.9519045224314497
Conductance: 0.0700135996244099
Modularity: 0.9193864553461703
Internal Edge Density: 0.9334223849084178


In [81]:
reduct_communities_lp_coms = [c for c in lp_coms_communities if len(c) >= 5]
len(reduct_communities_lp_coms)

29

In [82]:
# 1. Collect the nodes that belong to the retained communities
nodes_in_communities_lp_coms = set(node for community in reduct_communities_lp_coms for node in community)

# 2. Build a subgraph from the original graph
subgraph = lp_coms.graph.subgraph(nodes_in_communities_lp_coms).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_lp_coms = NodeClustering(
    communities=reduct_communities_lp_coms,
    graph=subgraph,  # use the subgraph, not the whole graph
    method_name=lp_coms.method_name + "_reduct"
)

In [83]:
evaluation_algo(g, reduct_nodeclustering_lp_coms, 'Label propagation')

Results with Label propagation algorithm
Transitivity: 0.005131647838415692
Hub Dominance: 0.991960086027673
Embeddedness: 0.9364272543967281
Conductance: 0.17722222874896235
Modularity: 0.24859558657123373
Internal Edge Density: 0.10114442045705994


### Leiden2

In [84]:
leiden_coms = algorithms.leiden(g,weights='weight')

In [85]:
leiden_coms.method_parameters

{'initial_membership': None, 'weights': 'weight'}

In [86]:
leiden_coms.overlap 

False

In [87]:
leiden_coms.average_internal_degree()

FitnessResult(min=1.0, max=2.144578313253012, score=1.0648734126992614, std=0.2024480654445327)

In [88]:
leiden_communities = leiden_coms.communities
len(leiden_communities)

679

In [89]:
evaluation_algo(g, leiden_coms, 'Leiden')

Results with Leiden algorithm
Transitivity: 0.0016628889707419714
Hub Dominance: 0.9953728942001581
Embeddedness: 0.9737975497989891
Conductance: 0.03733187088529337
Modularity: 0.9557416665534699
Internal Edge Density: 0.9370735827468565


In [90]:
reduct_communities_leiden_coms = [c for c in leiden_communities if len(c) >= 5]
len(reduct_communities_leiden_coms)

26

In [91]:
# 1. Collect the nodes that belong to the retained communities
nodes_in_communities_leiden_coms = set(node for community in reduct_communities_leiden_coms for node in community)

# 2. Build a subgraph from the original graph
subgraph = leiden_coms.graph.subgraph(nodes_in_communities_leiden_coms).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_leiden_coms = NodeClustering(
    communities=reduct_communities_leiden_coms,
    graph=subgraph,  # use the subgraph, not the whole graph
    method_name=leiden_coms.method_name + "_reduct"
)

In [92]:
evaluation_algo(g, reduct_nodeclustering_leiden_coms, 'Leiden')

Results with Leiden algorithm
Transitivity: 0.004965446582069179
Hub Dominance: 0.9176228908425887
Embeddedness: 0.9564714821436888
Conductance: 0.11678598343113207
Modularity: 0.35623273338680034
Internal Edge Density: 0.10024215455572806


In [93]:
communities_dict = {str(i): community for i, community in enumerate(reduct_communities_leiden_coms)}

# Save to a JSON file
with open(output_path+'communities_leiden_general_IG_all_graph_MENTIONS_nuovo.json', 'w', encoding='utf-8') as f:
    json.dump(communities_dict, f, ensure_ascii=False, indent=4)