## Global network
* [Network construction](#Costruzione-rete)
    * [Creating the node-attribute file](#Creazione-file-node-attribute)
    * [Global network with accounts](#Rete-globale-con-account)
    * [Algorithms run on the largest connected component only](#Componente-connessa)
        * [Degree](#Degree)
        * [Closeness](#Closeness)
        * [Louvain](#Louvain)
        * [Infomap](#Infomap)
        * [Label propagation](#Label-propagation)
        * [Leiden](#Leiden)
    * [Full-graph analysis](#grafo-completo)
        * [Degree centrality](#Degree2)
        * [Closeness](#Closeness2)
        * [Louvain](#Louvain2)
        * [Infomap](#Infomap2)
        * [Label propagation](#Label-propagation2)
        * [Leiden](#Leiden2)

In [1]:
import json
import re
import csv
import os
import math

from collections import Counter, defaultdict
from itertools import combinations

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import powerlaw
import networkx as nx

from cdlib import algorithms
from cdlib import evaluation
from cdlib.algorithms import louvain
from cdlib import NodeClustering

import warnings
warnings.filterwarnings('ignore')

Note: to be able to use all crisp methods, you need to install some additional packages:  {'graph_tool', 'infomap', 'wurlitzer', 'bayanpy'}
Note: to be able to use all crisp methods, you need to install some additional packages:  {'pyclustering', 'ASLPAw'}
Note: to be able to use all crisp methods, you need to install some additional packages:  {'infomap', 'wurlitzer'}


In [2]:
path = r'C:\Users\naomi\Documents\tesi 2025\prova ufficiale\dataset\\'

output_path = r"C:\Users\naomi\Documents\tesi 2025\prova ufficiale\grafi\IG\general\rete commenti sotto lo stesso post\\"
output_path2 = r"C:\Users\naomi\Documents\tesi 2025\prova ufficiale\grafi\IG\general\rete commenti sotto lo stesso post\reti singole\\"

In [3]:
with open(path+'General_IG.json', 'r', encoding = 'utf-8') as f:
    dataset = json.load(f)

## Network construction

The network is built in the following steps:

1. Emoji and leading/trailing whitespace are removed from the comment texts; comments that are left empty or contain only whitespace and digits are discarded.  
2. A separate network is built for each account: two users are connected when they both commented under the same post, and the edge is kept only if the pair co-occurs under more than 3 of that account's posts.  
   The edge weight is given by:

   $$
   \text{weight}(u_1, u_2) = \frac{\text{number of co-occurrences of } u_1 \text{ and } u_2}{\max(\text{count}_{u_1}, \text{count}_{u_2})}
   $$

   where \( \text{count}_{u} \) is the number of the account's posts under which user \( u \) commented.

3. All per-account networks are read back in order to keep only the pairs of nodes that appear together in at least 2 different accounts.  
4. The edge weight is computed with the following formula:

   $$
   W_{i,j} = \left( \frac{1}{N} \sum_{a=1}^{N} w_{i,j}^{(a)} \right) \cdot \log(1 + A_{i,j}) \cdot \log(1 + C_{i,j})
   $$

Where:

- \( W_{i,j} \): composite weight between nodes \( i \) and \( j \)  
- \( w_{i,j}^{(a)} \): normalized weight between \( i \) and \( j \) in account \( a \) (0 if the pair does not appear there)  
- \( N \): total number of accounts  
- \( A_{i,j} \): number of accounts in which the pair \( (i, j) \) appears  
- \( C_{i,j} \): total number of co-occurrences of \( i \) and \( j \) (absolute value)
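To make the step-4 formula concrete, here is a worked computation for one pair. It is a minimal sketch that follows the same arithmetic as `compute_multiplicative_weight` further down (natural logarithm, sum divided by all \( N \) accounts); the function name `composite_weight` and every number in it are invented for illustration.

```python
import math

def composite_weight(per_account_weights, n_accounts, n_cooccurrences):
    """W_ij = (sum of w_ij^(a) / N) * log(1 + A_ij) * log(1 + C_ij), natural log.

    per_account_weights: w_ij^(a) for the accounts where the pair appears
                         (accounts where it is absent contribute 0 to the sum)
    n_accounts:          N, total number of accounts
    n_cooccurrences:     C_ij, passed in directly
    """
    if not per_account_weights:
        return 0.0
    avg_weight = sum(per_account_weights) / n_accounts
    a_ij = len(per_account_weights)  # A_ij: accounts in which the pair appears
    return avg_weight * math.log(1 + a_ij) * math.log(1 + n_cooccurrences)

# Hypothetical pair (all numbers invented):
# account 1: 4 shared posts, users active under 8 and 5 posts  -> w = 4 / 8
# account 2: 4 shared posts, users active under 10 and 6 posts -> w = 4 / 10
w_acc1 = 4 / max(8, 5)
w_acc2 = 4 / max(10, 6)
W = composite_weight([w_acc1, w_acc2], n_accounts=10, n_cooccurrences=4 + 4)
print(round(W, 4))  # 0.2173
```

Dividing by \( N \) rather than \( A_{i,j} \) means a pair seen in few accounts is penalised even if its per-account weights are high; the two log factors then reward breadth (accounts) and volume (co-occurrences).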


In [4]:
emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U0001F1E6-\U0001F1FF"  # flags
        u"\U0001F600-\U0001F64F"
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        u"\U0001f926-\U0001f937"
        u"\U0001F1F2"
        u"\U0001F1F4"
        u"\U0001F620"
        u"\u200d"
        u"\u2640-\u2642"
        u"\U0001F1E6\U0001F1E8"  # Instagram flags
        u"\U0001F1E9\U0001F1EA"
        u"\U0001F1EA\U0001F1F8"
        u"\U0001F1EB\U0001F1F7"
        u"\U0001F1EC\U0001F1E7"
        u"\U0001F1EE\U0001F1F9"
        u"\U0001F1EF\U0001F1F5"
        u"\U0001F1F0\U0001F1F7"
        u"\U0001F1F7\U0001F1FA"
        u"\U0001F1FA\U0001F1F8"
        u"\U0001F400-\U0001F4F0"  # animals & nature emoji
        u"\U0001F980-\U0001F9FF"  # animal emoji (some newer additions)
        u"\U0001F493-\U0001F49E"  # coloured hearts
        u"\U0001F9E1-\U0001F9FF"  # other hearts
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F700-\U0001F77F"  # alchemical & zodiac symbols
        u"\U0001F780-\U0001F7FF"  # geometric emoji
        u"\U0001F900-\U0001F9FF"  # miscellaneous symbols & emoji
        u"\U00002660-\U000026FF"  # miscellaneous symbols
        u"\U00002700-\U000027BF"  # miscellaneous emoji
        u"\U0001F300-\U0001F9FF"  # supplementary miscellaneous symbols
        u"\U00002B50-\U00002B55"  # other emoji
        u"\U0001F91D-\U0001F93F"  # hand emoji
        "]+", flags=re.UNICODE)

### Creating the node-attribute file

In [5]:
# Collect the sentiment scores
sentiment_scores = []
for account, posts in dataset.items():
    for post_id, post_info in posts.items():
        for interaction in post_info.get('interactions_post', []):
            score = interaction.get('sentiment', {}).get('score')
            if isinstance(score, (int, float)):
                sentiment_scores.append(score)

# Split the observed score range into 3 equal-width bins
df = pd.DataFrame({'sentiment_score': sentiment_scores})
sentiment_bins = pd.cut(df['sentiment_score'], bins=3)
bin_edges = sentiment_bins.cat.categories
sentiment_labels = ['negative', 'neutral', 'positive']
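`pd.cut(..., bins=3)` splits the observed score range into three equal-width, right-closed intervals, and `analyze_users` then labels a score by comparing it with the right edges of the first two bins. The stdlib sketch below reproduces that rule on toy scores; it ignores the small widening pandas applies to the outermost edge, and the function name `equal_width_labels` is hypothetical.

```python
def equal_width_labels(scores, labels=("negative", "neutral", "positive")):
    """Assign each score to one of len(labels) equal-width, right-closed bins."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / len(labels)
    # right edges of every bin except the last one
    edges = [lo + width * (i + 1) for i in range(len(labels) - 1)]
    assigned = []
    for s in scores:
        for edge, label in zip(edges, labels):
            if s <= edge:  # right-closed, like `score <= bin_edges[k].right`
                assigned.append(label)
                break
        else:
            assigned.append(labels[-1])  # above every inner edge
    return edges, assigned

edges, assigned = equal_width_labels([-0.9, -0.2, 0.0, 0.35, 0.9])
print([round(e, 6) for e in edges])  # [-0.3, 0.3]
print(assigned)  # ['negative', 'neutral', 'neutral', 'positive', 'positive']
```

Because the bins depend on the minimum and maximum of the data, the "neutral" band moves if the score distribution changes; fixed thresholds would be an alternative design.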

In [6]:
import math
import re
import json
from collections import defaultdict, Counter

def extract_mentions(text):
    return re.findall(r'@(\w+)', text)

def analyze_users(dataset, path, output_file, bin_edges, emoji_pattern):

    user_data = defaultdict(lambda: {
        'n_comments': 0,
        'n_mentions': 0,
        'unique_mentions': set(),
        'mentioned_accounts': set(),
        'sentiment_cats': Counter(),
        'hate_scores': [],
        'toxicity_scores': [],
        'emoji_count': 0,
        'sentiment_scores': []  # <-- stores every raw sentiment score
    })

    for account_name, posts in dataset.items():
        for post_id, post_data in posts.items():
            for interaction in post_data.get('interactions_post', []):
                author = interaction.get('user_name')
                comment = interaction.get('comment', '')
                if not author or not comment.strip() or comment.strip().isdigit():
                    continue

                user_data[author]['n_comments'] += 1

                if emoji_pattern.search(comment):
                    user_data[author]['emoji_count'] += 1

                mentions = extract_mentions(comment)
                user_data[author]['n_mentions'] += len(mentions)
                user_data[author]['unique_mentions'].update(mentions)
                if mentions:
                    user_data[author]['mentioned_accounts'].add(account_name)

                sentiment_score = interaction.get('sentiment', {}).get('score')
                hate = interaction.get('hate')

                # Convert the hate output into P(hateful), whatever its format
                hate_score = None
                if hate:
                    # Case 1: format {"label": "hateful", "score": 0.88}
                    if 'label' in hate and 'score' in hate:
                        if hate['label'] == 'hateful':
                            hate_score = hate['score']
                        elif hate['label'] == 'non-hateful':
                            hate_score = 1 - hate['score']

                    # Case 2: format {"hateful": 0.02, "non-hateful": 0.97}
                    elif 'hateful' in hate:
                        hate_score = hate['hateful']

                if sentiment_score is not None:
                    # preliminary label from the sentiment bin
                    if sentiment_score <= bin_edges[0].right:
                        label = 'negative'
                    elif sentiment_score <= bin_edges[1].right:
                        label = 'neutral'
                    else:
                        label = 'positive'

                    # reclassification: neutral but with high hate → negative
                    if label == 'neutral' and hate_score is not None and hate_score > 0.5:
                        label = 'negative'
                
                    user_data[author]['sentiment_scores'].append(sentiment_score)
                    user_data[author]['sentiment_cats'][label] += 1
                
                if hate_score is not None:
                    user_data[author]['hate_scores'].append(hate_score)
                
                tox_score = interaction.get('toxicity', {}).get('toxicity')
                if tox_score is not None:
                    user_data[author]['toxicity_scores'].append(tox_score)


    output_data = {}
    for user, data in user_data.items():
        n_comments = data['n_comments']
        emoji_count = data['emoji_count']
        emoji_ratio = round(emoji_count / n_comments, 4) if n_comments else 0.0

        sentiment_total = sum(data['sentiment_cats'].values())
        negative_count = data['sentiment_cats'].get('negative', 0)
        neutral_count = data['sentiment_cats'].get('neutral', 0)
        positive_count = data['sentiment_cats'].get('positive', 0)        
    
        negative_pct = round(negative_count / sentiment_total * 100, 2) if sentiment_total else 0.0
        neutral_pct = round(neutral_count / sentiment_total * 100, 2) if sentiment_total else 0.0
        positive_pct = round(positive_count / sentiment_total * 100, 2) if sentiment_total else 0.0
        avg_hate = round(sum(data['hate_scores']) / len(data['hate_scores']), 4) if data['hate_scores'] else None
        

        if negative_count > positive_count:
            node_label = 'negative'
        elif negative_count == positive_count:
            if neutral_count > 0 and avg_hate is not None and avg_hate > 0.5:
                node_label = 'negative'
            else:
                node_label = 'positive' 
        else:
            node_label = 'positive'
        
        
        # Store the user's data
        output_data[user] = {
            'n_comments': n_comments,
            'n_mentions': data['n_mentions'],
            'n_unique_mentions': len(data['unique_mentions']),
            'n_mentioned_accounts': len(data['mentioned_accounts']),
            'negative_comment_count': negative_count,
            'neutral_comment_count': neutral_count,
            'positive_comment_count': positive_count,
            'negative_comment_percentage': negative_pct,
            'neutral_comment_percentage': neutral_pct,
            'positive_comment_percentage': positive_pct,
            'avg_hate': avg_hate,
            'avg_toxicity': round(sum(data['toxicity_scores']) / len(data['toxicity_scores']), 4) if data['toxicity_scores'] else None,
            'emoji_count': emoji_count,
            'emoji_ratio': emoji_ratio,
            'node_label': node_label
        }

    # Save the JSON file
    with open(path + output_file, 'w', encoding='utf-8') as jsonfile:
        json.dump(output_data, jsonfile, ensure_ascii=False, indent=2)

    print(f"✅ File saved in JSON format: {output_file}")


In [7]:
analyze_users(dataset, output_path, 'nodes_attributes_general_IG_nuovo2.json', bin_edges, emoji_pattern)

✅ File saved in JSON format: nodes_attributes_general_IG_nuovo2.json
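The two rules inside `analyze_users` that decide a node's label can be isolated as pure functions, which makes them easy to check in isolation. This is a sketch that mirrors the notebook's logic; the names `hate_probability` and `node_label` are hypothetical helpers, not part of the notebook.

```python
def hate_probability(hate):
    """Return P(hateful) from either supported hate-model format, or None."""
    if not hate:
        return None
    if "label" in hate and "score" in hate:  # {"label": ..., "score": ...}
        if hate["label"] == "hateful":
            return hate["score"]
        if hate["label"] == "non-hateful":
            return 1 - hate["score"]
        return None
    if "hateful" in hate:  # {"hateful": p, "non-hateful": q}
        return hate["hateful"]
    return None

def node_label(neg, neu, pos, avg_hate):
    """Majority of negative vs positive comments; ties go negative only if
    the user has neutral comments and an average hate score above 0.5."""
    if neg > pos:
        return "negative"
    if neg == pos and neu > 0 and avg_hate is not None and avg_hate > 0.5:
        return "negative"
    return "positive"

print(hate_probability({"hateful": 0.02, "non-hateful": 0.97}))  # 0.02
print(node_label(1, 3, 1, 0.7))  # negative
print(node_label(1, 3, 1, 0.2))  # positive
```

Note that a user with only neutral comments (`neg == pos == 0`) ends up "positive" unless the hate tie-break applies.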


In [6]:
only_spaces_or_numbers = re.compile(r'^[\s\d]*$')

def clean_comments(dataset):
    """Strip emoji from every comment (in place) and drop comments that end up
    empty or made only of whitespace/digits. Returns the dataset and per-account counts."""
    stats = {}

    for account, posts in dataset.items():
        cleaned = 0
        removed = 0

        for post in posts.values():
            new_comments = []

            for comment_info in post['interactions_post']:
                original_comment = comment_info['comment']
                cleaned_comment = emoji_pattern.sub('', original_comment).strip()

                # Empty, or only whitespace/digits -> drop the comment
                if not cleaned_comment or only_spaces_or_numbers.fullmatch(cleaned_comment):
                    removed += 1
                    continue

                if cleaned_comment != original_comment:
                    cleaned += 1  # text changed (emoji and/or outer whitespace removed), comment kept
                comment_info['comment'] = cleaned_comment
                new_comments.append(comment_info)

            post['interactions_post'] = new_comments

        stats[account] = {'cleaned': cleaned, 'removed': removed}

    return dataset, stats

In [7]:
dataset_pulito, cleaning_stats = clean_comments(dataset)
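The per-comment rule applied by `clean_comments` can be tried on a few strings. The sketch below uses a deliberately reduced emoji class (two emoji blocks plus U+2764) instead of the full `emoji_pattern`, so it illustrates the rule rather than reproducing the notebook's exact filter; `clean_or_drop` is a hypothetical name.

```python
import re

# Reduced emoji class for illustration only (NOT the full emoji_pattern above)
EMOJI = re.compile("[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF\u2764]+")
ONLY_SPACES_OR_DIGITS = re.compile(r"[\s\d]*")

def clean_or_drop(comment):
    """Return the cleaned comment, or None if it would be removed."""
    cleaned = EMOJI.sub("", comment).strip()
    if not cleaned or ONLY_SPACES_OR_DIGITS.fullmatch(cleaned):
        return None
    return cleaned

samples = ["Great post \U0001F60D", "\U0001F602\U0001F602", "  123 ", "2 likes \U0001F44F"]
print([clean_or_drop(s) for s in samples])  # ['Great post', None, None, '2 likes']
```

Digits are only a reason to drop a comment when nothing else is left, so "2 likes" survives while "123" does not.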

In [80]:

for account, posts in dataset_pulito.items():
    author_post_count = defaultdict(set)      # author -> set of post_ids (this account only)
    user_pairs_count = defaultdict(int)       # (u1, u2) -> number of posts where both commented

    # Step 1: counts for the current account
    for post_id, post_data in posts.items():
        interactions = post_data['interactions_post']
        authors = set(interaction['user_name'] for interaction in interactions)

        for author in authors:
            author_post_count[author].add(post_id)

        for u1, u2 in combinations(sorted(authors), 2):
            user_pairs_count[(u1, u2)] += 1

    # Step 2: write the edge list for the current account
    file_path = os.path.join(output_path2, f'general_IG_{account}_with_normalized_weight_comment_cleaned_big_graph.edgelist')
    with open(file_path, 'w') as grafo:
        for (u1, u2), count in user_pairs_count.items():
            if count > 3:  # keep only pairs that share more than 3 posts
                max_posts = max(len(author_post_count[u1]), len(author_post_count[u2]))
                if max_posts > 0:
                    weight = count / max_posts
                    grafo.write(f"{u1} {u2} {weight:.4f} {account}\n")

    # Optional: explicitly free this account's data structures
    del author_post_count
    del user_pairs_count


In [81]:

# Folder containing the per-account files
input_folder = output_path2  # same path where they were saved
output_file = os.path.join(output_path2, "general_IG_merged_edges_with_global_weights_big_graph.edgelist")

# Dictionary: (u1, u2) -> set of accounts in which the pair appears
pair_to_accounts = defaultdict(set)

# Dictionary: (u1, u2, account) -> peso_vecchio (per-account normalized weight)
pair_weight_per_account = dict()

# Read every edgelist file in input_folder
for filename in os.listdir(input_folder):
    if filename.endswith("_with_normalized_weight_comment_cleaned_big_graph.edgelist"):
        account = filename.replace("general_IG_", "").replace("_with_normalized_weight_comment_cleaned_big_graph.edgelist", "")
        file_path = os.path.join(input_folder, filename)

        with open(file_path, "r") as f:
            for line in f:
                parts = line.strip().split()
                if len(parts) != 4:
                    continue  # skip malformed lines

                u1, u2, weight_str, acc_from_file = parts
                weight = float(weight_str)

                # Sort (u1, u2) so that each pair has a single canonical key
                key = tuple(sorted((u1, u2)))

                pair_to_accounts[key].add(account)
                pair_weight_per_account[(key[0], key[1], account)] = weight

# Write the merged file: per-account weight and number of accounts for each pair
with open(output_file, "w") as out:
    for (u1, u2), accounts in pair_to_accounts.items():
        peso_nuovo = len(accounts)  # number of accounts in which the pair appears
        for account in accounts:
            peso_vecchio = pair_weight_per_account.get((u1, u2, account))
            if peso_vecchio is not None:
                out.write(f"{u1} {u2} {peso_vecchio:.4f} {peso_nuovo}\n")


In [83]:
# Count how many pairs appear in exactly X accounts
frequency_count = defaultdict(int)

for pair, accounts in pair_to_accounts.items():
    freq = len(accounts)  # number of accounts in which the pair appears
    frequency_count[freq] += 1

# Print sorted by frequency
for freq in sorted(frequency_count.keys()):
    print(f"Pairs present in {freq} accounts: {frequency_count[freq]}")

Pairs present in 2 accounts: 5634
Pairs present in 3 accounts: 78745
Pairs present in 4 accounts: 3
Pairs present in 5 accounts: 276
Pairs present in 6 accounts: 2553
Pairs present in 8 accounts: 67
Pairs present in 9 accounts: 588
Pairs present in 10 accounts: 1
Pairs present in 11 accounts: 11
Pairs present in 12 accounts: 183
Pairs present in 14 accounts: 9
Pairs present in 15 accounts: 53
Pairs present in 18 accounts: 18
Pairs present in 20 accounts: 1
Pairs present in 21 accounts: 4
Pairs present in 27 accounts: 1
Pairs present in 30 accounts: 1
Pairs present in 33 accounts: 1


In [84]:
output_filtered_file = os.path.join(output_path, "general_IG_merged_edges_filtered_min2account_big_graph.edgelist")

with open(output_filtered_file, "w") as out:
    for (u1, u2), accounts in pair_to_accounts.items():
        peso_nuovo = len(accounts)
        if peso_nuovo < 2:
            continue  # skip pairs that appear in a single account

        for account in accounts:
            peso_vecchio = pair_weight_per_account.get((u1, u2, account))
            if peso_vecchio is not None:
                out.write(f"{u1} {u2} {peso_vecchio:.4f} {peso_nuovo}\n")


In [88]:
# Function computing the multiplicative composite weight W_ij
def compute_multiplicative_weight(pair, weights_by_account, account_count, absolute_counts, N):
    normalized_weights = [
        weights[pair]
        for acc, weights in weights_by_account.items()
        if pair in weights
    ]
    if not normalized_weights:
        return 0
    avg_weight = sum(normalized_weights) / N
    A_ij = account_count.get(pair, 0)
    C_ij = absolute_counts.get(pair, 0)
    weight = avg_weight * math.log(1 + A_ij) * math.log(1 + C_ij)
    return round(weight, 4)

# === Part 1: read the files ===
input_folder = output_path2
output_file = os.path.join(output_path2, "general_IG_merged_edges_with_global_weights_big_graph.edgelist")

# Data structures
pair_to_accounts = defaultdict(set)
pair_weight_per_account = dict()
weights_by_account = defaultdict(dict)
absolute_counts = defaultdict(int)

# Read every edgelist file
for filename in os.listdir(input_folder):
    if filename.endswith("_with_normalized_weight_comment_cleaned_big_graph.edgelist"):
        account = filename.replace("general_IG_", "").replace("_with_normalized_weight_comment_cleaned_big_graph.edgelist", "")
        file_path = os.path.join(input_folder, filename)

        with open(file_path, "r") as f:
            for line in f:
                parts = line.strip().split()
                if len(parts) != 4:
                    continue

                u1, u2, weight_str, acc_from_file = parts
                weight = float(weight_str)
                key = tuple(sorted((u1, u2)))

                pair_to_accounts[key].add(account)
                pair_weight_per_account[(key[0], key[1], account)] = weight
                weights_by_account[account][key] = weight
                absolute_counts[key] += 1  # number of per-account edge lists that contain the pair

# Number of accounts per pair
account_count = {pair: len(accounts) for pair, accounts in pair_to_accounts.items()}
total_accounts = len(weights_by_account)

# === Part 2: write the file with global weights ===
with open(output_file, "w") as out:
    for (u1, u2), accounts in pair_to_accounts.items():
        key = (u1, u2)
        peso_globale = compute_multiplicative_weight(key, weights_by_account, account_count, absolute_counts, total_accounts)
        out.write(f"{u1} {u2} {peso_globale:.4f}\n")

# === Part 3: write the filtered file (only pairs present in at least 2 accounts) ===
output_filtered_file = os.path.join(output_path, "general_IG_global_weight_big_graph.edgelist")

with open(output_filtered_file, "w") as out:
    for (u1, u2), accounts in pair_to_accounts.items():
        if len(accounts) < 2:  # skip pairs that appear in a single account
            continue
        key = (u1, u2)
        peso_globale = compute_multiplicative_weight(key, weights_by_account, account_count, absolute_counts, total_accounts)
        out.write(f"{u1} {u2} {peso_globale:.4f}\n")


### Global network with accounts

In [3]:
g = nx.Graph()
with open(output_path+'general_IG_global_weight_big_graph.edgelist') as f:
    for l in f:
        l = l.rsplit()
        if len(l)==3:
            g.add_edge(l[0], l[1], weight=float(l[2]))


with open(output_path + 'nodes_attributes_general_IG_nuovo2.json', 'r', encoding='utf-8') as f:
    node_data = json.load(f)

# Add the attributes to the nodes that are present in the network
for node, attributes in node_data.items():
    if node in g.nodes:
        g.nodes[node].update(attributes)

print("Nodes in the network:", g.number_of_nodes())
print("Edges in the network:", g.number_of_edges())
print("Density:", nx.density(g))
print("Assortativity:", nx.degree_assortativity_coefficient(g))

Nodes in the network: 9923
Edges in the network: 88149
Density: 0.0017906271761623156
Assortativity: -0.28375263025762804
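As a sanity check on the printed density: for a simple undirected graph it is the share of the \( n(n-1)/2 \) possible edges that are present, \( 2m / (n(n-1)) \). The sketch recomputes it from the node and edge counts above, together with the average degree \( 2m/n \); the function name is illustrative.

```python
def undirected_density(n_nodes, n_edges):
    """Share of the n(n-1)/2 possible edges that are present."""
    if n_nodes <= 1:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

n, m = 9923, 88149  # values printed above
density = undirected_density(n, m)
avg_degree = 2 * m / n
print(density)               # about 0.00179063, matching nx.density above
print(round(avg_degree, 2))  # 17.77
```

So each user is linked to about 18 others on average, in a network where fewer than 0.2% of all possible pairs are connected.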


### Largest connected component

In [4]:
largest_ccG = max(nx.connected_components(g), key=len)

G_connected = g.subgraph(largest_ccG).copy()
len(G_connected)

9825
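`max(nx.connected_components(g), key=len)` keeps the component with the most nodes, here 9,825 of the 9,923 nodes, leaving 98 outside it. The plain-Python breadth-first search below is a sketch of what that selection computes on an edge list (so, as with `g`, there are no isolated nodes); it is not networkx's own implementation.

```python
from collections import defaultdict, deque

def connected_components(adj):
    """Yield each connected component of an undirected adjacency dict as a set (BFS)."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        yield component

def largest_component(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return max(connected_components(adj), key=len)

print(sorted(largest_component([("a", "b"), ("b", "c"), ("d", "e")])))  # ['a', 'b', 'c']
```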

#### Degree

In [53]:
degree_sequence = sorted(G_connected.degree(), key=lambda item: item[1], reverse=True)
degree_sequence[0:10]

[('general_30', 1795),
 ('user_339221', 1615),
 ('general_14', 1286),
 ('general_33', 1247),
 ('general_39', 1231),
 ('user_393808', 1182),
 ('user_472248', 980),
 ('user_370283', 919),
 ('user_625432', 889),
 ('user_229255', 853)]

#### Closeness

In [14]:
closeness = nx.closeness_centrality(G_connected)  # closeness centrality of every node

In [15]:
ranks = sorted(closeness.items(), key=lambda item: item[1], reverse=True)
ranks[0:30]

[('general_30', 0.6002299731697969),
 ('user_339221', 0.56759695541863),
 ('user_393808', 0.558288770053476),
 ('general_14', 0.5357509408142319),
 ('general_33', 0.5244474212993971),
 ('general_39', 0.5216522318454364),
 ('user_572416', 0.5129380936783492),
 ('user_472248', 0.5089372765680859),
 ('user_370283', 0.5063045586808923),
 ('user_333699', 0.5058139534883721),
 ('user_625432', 0.5051612903225806),
 ('healthy_16', 0.5016015374759769),
 ('user_474081', 0.5),
 ('user_229789', 0.4998404085541015),
 ('user_291841', 0.49793322734499207),
 ('user_158318', 0.4954128440366973),
 ('user_426370', 0.4941621962764279),
 ('user_482802', 0.4922980194907262),
 ('user_229255', 0.4922980194907262),
 ('user_366960', 0.49214330609679446),
 ('user_237813', 0.49198868991517436),
 ('general_27', 0.49106302916274697),
 ('user_458439', 0.4909090909090909),
 ('user_255658', 0.48952797749296656),
 ('user_351537', 0.48922211808809746),
 ('general_35', 0.48845913911416095),
 ('user_327150', 0.48800249298
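Since `nx.closeness_centrality` is called without a `distance` argument, edge weights are ignored and distances are hop counts. By default networkx also applies the Wasserman-Faust scaling (`wf_improved=True`): the value \( r / \sum d \) over the \( r \) reachable nodes is multiplied by \( r/(N-1) \), which lowers the scores of nodes in a disconnected graph such as `g`. A stdlib sketch of that definition on a toy graph (function names are illustrative):

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, node, wf_improved=True):
    """r / (sum of distances), optionally scaled by r / (N - 1); r = reachable nodes."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    reachable = len(dist) - 1  # excludes the node itself
    if total == 0:
        return 0.0
    c = reachable / total
    if wf_improved:
        c *= reachable / (len(adj) - 1)
    return c

# Path a-b-c plus an isolated node d
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
print(closeness(adj, "b", wf_improved=False))  # 1.0
print(round(closeness(adj, "b"), 4))            # 0.6667
```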

In [5]:
def evaluation_algo(graph, algo, algo_name):
    scd = evaluation.avg_transitivity(graph, algo)
    scd_hub = evaluation.hub_dominance(graph, algo)
    ave = evaluation.avg_embeddedness(graph, algo)
    cond = evaluation.conductance(graph, algo)
    mod = evaluation.newman_girvan_modularity(graph, algo)
    int_dens = evaluation.internal_edge_density(graph, algo)
    
    print(f"Results with {algo_name} algorithm")
    print("Transitivity:", scd.score)
    print("Hub Dominance:", scd_hub.score)
    print("Embeddedness:", ave.score)
    print("Conductance:", cond.score)
    print("Modularity:", mod.score)
    print("Internal Edge Density:", int_dens.score)
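`evaluation_algo` relies on cdlib's quality functions. To help read the numbers, here is a plain-Python sketch of three of them on a toy graph: the standard unweighted Newman-Girvan modularity \( Q = \sum_c \left[ L_c/m - (d_c/2m)^2 \right] \), internal edge density \( m_S / \binom{n_S}{2} \), and conductance \( c_S / (2m_S + c_S) \). The conductance definition follows my reading of the cdlib documentation and should be checked against the installed version; the function names are illustrative. For per-community measures, the `score` field of cdlib's `FitnessResult` appears to be the mean over communities, so dropping many small (often dense) communities can shift averages such as Internal Edge Density sharply.

```python
def newman_girvan_modularity(edges, communities):
    """Q = sum_c [ L_c / m - (d_c / (2m))^2 ] for an unweighted, undirected edge list."""
    m = len(edges)
    node_to_com = {n: i for i, com in enumerate(communities) for n in com}
    internal = [0] * len(communities)    # L_c: edges with both ends in c
    degree_sum = [0] * len(communities)  # d_c: total degree of the nodes in c
    for u, v in edges:
        cu, cv = node_to_com[u], node_to_com[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(l_c / m - (d_c / (2 * m)) ** 2 for l_c, d_c in zip(internal, degree_sum))

def internal_edge_density(edges, community):
    """Internal edges over the n(n-1)/2 possible ones (0 for single nodes in this sketch)."""
    s = set(community)
    n = len(s)
    if n < 2:
        return 0.0
    m_s = sum(1 for u, v in edges if u in s and v in s)
    return m_s / (n * (n - 1) / 2)

def conductance(edges, community):
    """c_S / (2 m_S + c_S): boundary edges over the community's total edge volume."""
    s = set(community)
    m_s = sum(1 for u, v in edges if u in s and v in s)
    c_s = sum(1 for u, v in edges if (u in s) != (v in s))
    volume = 2 * m_s + c_s
    return c_s / volume if volume else 0.0

# Two triangles joined by a single bridge edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities = [[0, 1, 2], [3, 4, 5]]
q = newman_girvan_modularity(edges, communities)
densities = [internal_edge_density(edges, c) for c in communities]
conductances = [conductance(edges, c) for c in communities]
print(round(q, 4), densities, [round(c, 4) for c in conductances])
# 0.3571 [1.0, 1.0] [0.1429, 0.1429]
```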

### Louvain

In [6]:
louvain_coms2 = algorithms.louvain(G_connected, weight='weight')

In [7]:
louvain_coms2.overlap 

False

In [8]:
louvain_coms2.node_coverage

1.0

In [9]:
louvain_communities2 = louvain_coms2.communities
len(louvain_communities2)

22

In [10]:
evaluation_algo(G_connected, louvain_coms2, 'Louvain')

Results with Louvain algorithm
Transitivity: 0.3086605570331933
Hub Dominance: 0.7616555846804567
Embeddedness: 0.6903677818529265
Conductance: 0.3920756803182934
Modularity: 0.44188390437310865
Internal Edge Density: 0.42452548003777296


In [11]:
reduct_communities_louvain2 = [c for c in louvain_communities2 if len(c) >= 5]
len(reduct_communities_louvain2)

13

In [12]:
# 1. Collect the nodes of all retained communities (size >= 5)
nodes_in_communities_louvain = set(node for community in reduct_communities_louvain2 for node in community)

# 2. Build the induced subgraph from the original graph
subgraph = louvain_coms2.graph.subgraph(nodes_in_communities_louvain).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_louvain2 = NodeClustering(
    communities=reduct_communities_louvain2,
    graph=subgraph,  # <--- use the subgraph, not the whole graph
    method_name=louvain_coms2.method_name + "_reduct"
)

In [13]:
evaluation_algo(G_connected, reduct_nodeclustering_louvain2, 'Louvain')

Results with Louvain algorithm
Transitivity: 0.5223486349792501
Hub Dominance: 0.5966479125361577
Embeddedness: 0.809340348776747
Conductance: 0.3032562795130093
Modularity: 0.43828413818466955
Internal Edge Density: 0.0261200431408466


### Infomap

In [14]:
infomap_coms2 = algorithms.infomap(G_connected)

In [15]:
infomap_coms2.method_parameters

{'igraph': True}

In [16]:
infomap_coms2.overlap 

False

In [17]:
infomap_coms2.average_internal_degree()

FitnessResult(min=1.0, max=37.3984962406015, score=4.842436923703746, std=7.524900045787024)

In [18]:
infomap_communities2 = infomap_coms2.communities
len(infomap_communities2)

53

In [19]:
evaluation_algo(G_connected, infomap_coms2, 'Infomap')

Results with Infomap algorithm
Transitivity: 0.2371875599065822
Hub Dominance: 0.9274527016693861
Embeddedness: 0.7256416110047085
Conductance: 0.4374888907433958
Modularity: 0.3980117237262148
Internal Edge Density: 0.4018056003295724


In [20]:
reduct_communities_infomap2 = [c for c in infomap_communities2 if len(c) >= 5]
len(reduct_communities_infomap2)

33

In [21]:
# 1. Collect the nodes of all retained communities (size >= 5)
nodes_in_communities_infomap2 = set(node for community in reduct_communities_infomap2 for node in community)

# 2. Build the induced subgraph from the original graph
subgraph = infomap_coms2.graph.subgraph(nodes_in_communities_infomap2).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_infomap2 = NodeClustering(
    communities=reduct_communities_infomap2,
    graph=subgraph,  # <--- use the subgraph, not the whole graph
    method_name=infomap_coms2.method_name + "_reduct"
)

In [22]:
evaluation_algo(G_connected, reduct_nodeclustering_infomap2, 'Infomap')

Results with Infomap algorithm
Transitivity: 0.3809375962136017
Hub Dominance: 0.8834846420750745
Embeddedness: 0.7833116109402223
Conductance: 0.41792948830793586
Modularity: 0.3921227522917243
Internal Edge Density: 0.08976859042830321


### Label propagation

In [23]:
lp_coms2 = algorithms.label_propagation(G_connected)

In [24]:
lp_coms2.overlap 

False

In [25]:
lp_coms2.average_internal_degree()

FitnessResult(min=1.0, max=26.328827037773358, score=3.080440677246378, std=4.606357497008617)

In [26]:
lp_coms_communities2 = lp_coms2.communities
len(lp_coms_communities2)

44

In [27]:
evaluation_algo(G_connected, lp_coms2, 'Label propagation')

Results with Label propagation algorithm
Transitivity: 0.12165098562689422
Hub Dominance: 0.9200157747377216
Embeddedness: 0.6787243378122182
Conductance: 0.39845183927484856
Modularity: 0.205312794249788
Internal Edge Density: 0.5728571516606595


In [28]:
reduct_communities_lp_coms2 = [c for c in lp_coms_communities2 if len(c) >= 5]
len(reduct_communities_lp_coms2)

16

In [29]:
# 1. Collect the nodes of all retained communities (size >= 5)
nodes_in_communities_lp_coms2 = set(node for community in reduct_communities_lp_coms2 for node in community)

# 2. Build the induced subgraph from the original graph
subgraph = lp_coms2.graph.subgraph(nodes_in_communities_lp_coms2).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_lp_coms2 = NodeClustering(
    communities=reduct_communities_lp_coms2,
    graph=subgraph,  # <--- use the subgraph, not the whole graph
    method_name=lp_coms2.method_name + "_reduct"
)

In [30]:
evaluation_algo(G_connected, reduct_nodeclustering_lp_coms2, 'Label propagation')

Results with Label propagation algorithm
Transitivity: 0.3345402104739591
Hub Dominance: 0.8217100471954011
Embeddedness: 0.887833694856616
Conductance: 0.2467346214978971
Modularity: 0.1969495200449647
Internal Edge Density: 0.03369050040014718


### Leiden

In [31]:
leiden_coms2 = algorithms.leiden(G_connected, weights='weight')

In [32]:
leiden_coms2.method_parameters

{'initial_membership': None, 'weights': 'weight'}

In [33]:
leiden_coms2.overlap 

False

In [34]:
leiden_coms2.average_internal_degree()

FitnessResult(min=1.0, max=28.29230769230769, score=7.1624329908773445, std=8.25787962022157)

In [35]:
leiden_communities2 = leiden_coms2.communities
len(leiden_communities2)

24

In [36]:
evaluation_algo(G_connected, leiden_coms2, 'Leiden')

Results with Leiden algorithm
Transitivity: 0.2898999261195326
Hub Dominance: 0.7870843808923911
Embeddedness: 0.6622263106782622
Conductance: 0.42104603442607863
Modularity: 0.4420023672299359
Internal Edge Density: 0.47352326707170533


In [37]:
reduct_communities_leiden_coms2 = [c for c in leiden_communities2 if len(c) >= 5]
len(reduct_communities_leiden_coms2)

13

In [38]:
# 1. Collect the nodes of all retained communities (size >= 5)
nodes_in_communities_leiden_coms2 = set(node for community in reduct_communities_leiden_coms2 for node in community)

# 2. Build the induced subgraph from the original graph
subgraph = leiden_coms2.graph.subgraph(nodes_in_communities_leiden_coms2).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_leiden_coms2 = NodeClustering(
    communities=reduct_communities_leiden_coms2,
    graph=subgraph,  # <--- use the subgraph, not the whole graph
    method_name=leiden_coms2.method_name + "_reduct"
)

In [39]:
evaluation_algo(G_connected, reduct_nodeclustering_leiden_coms2, 'Leiden')

Results with Leiden algorithm
Transitivity: 0.535199863605291
Hub Dominance: 0.6069250108782603
Embeddedness: 0.8091101120214071
Conductance: 0.30497143351920736
Modularity: 0.42557763654671443
Internal Edge Density: 0.028042954593917566


In [65]:
communities_dict = {str(i): community for i, community in enumerate(reduct_communities_leiden_coms2)}

# Save to a JSON file
with open(output_path+'communities_leiden_general_IG_connected_component_GLOBAL_big_graph.json', 'w', encoding='utf-8') as f:
    json.dump(communities_dict, f, ensure_ascii=False, indent=4)
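When the saved file is read back, for instance to join communities with the node-attribute JSON, it is often handier as a node-to-community lookup. Because the Leiden partition is non-overlapping (`overlap` is `False`), the inversion is well defined. A minimal sketch with made-up node names and a hypothetical helper `node_to_community`; nodes from the dropped communities (fewer than 5 members) simply do not appear in the mapping.

```python
import json

def node_to_community(communities_dict):
    """Invert {community_id: [nodes]} into {node: community_id}."""
    mapping = {}
    for com_id, members in communities_dict.items():
        for node in members:
            mapping[node] = com_id
    return mapping

# JSON round trip standing in for the saved file (node names are made up)
saved = json.dumps({"0": ["user_1", "user_2"], "1": ["user_3"]})
mapping = node_to_community(json.loads(saved))
print(mapping)  # {'user_1': '0', 'user_2': '0', 'user_3': '1'}
```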

## Full-graph analysis

#### Degree2

In [52]:
degree_sequence = sorted(g.degree(), key=lambda item: item[1], reverse=True)
degree_sequence[0:10]

[('general_30', 647),
 ('user_339221', 511),
 ('user_393808', 454),
 ('general_14', 379),
 ('general_33', 334),
 ('general_39', 311),
 ('user_370283', 247),
 ('user_472248', 241),
 ('user_572416', 233),
 ('user_158318', 231)]

#### Closeness2

In [53]:
closeness = nx.closeness_centrality(g)  # closeness centrality of every node

In [54]:
ranks = sorted(closeness.items(), key=lambda item: item[1], reverse=True)
ranks[0:30]

[('general_30', 0.5979390190737289),
 ('user_339221', 0.5654305548254291),
 ('user_393808', 0.556157896885333),
 ('general_14', 0.5337060898950936),
 ('general_33', 0.522445713584514),
 ('general_39', 0.5196611927925913),
 ('user_572416', 0.5109803146948441),
 ('user_472248', 0.5069947678788947),
 ('user_370283', 0.5043720985332554),
 ('user_333699', 0.5038833658796379),
 ('user_625432', 0.5032331937946318),
 ('healthy_16', 0.49968702779095414),
 ('user_474081', 0.49809160305343514),
 ('user_229789', 0.49793262073519273),
 ('user_291841', 0.49603271884367534),
 ('user_158318', 0.493521955319),
 ('user_426370', 0.49227608102346443),
 ('user_482802', 0.49041901941633415),
 ('user_229255', 0.49041901941633415),
 ('user_366960', 0.49026489653153954),
 ('user_237813', 0.4901108704880172),
 ('general_27', 0.48918874279189684),
 ('user_458439', 0.48903539208882724),
 ('user_255658', 0.48765955009795525),
 ('user_351537', 0.4873548580953948),
 ('general_35', 0.4865947912549466),
 ('user_327150

### Louvain2

In [40]:
louvain_coms = algorithms.louvain(g, weight='weight')

In [41]:
louvain_coms.overlap 

False

In [42]:
louvain_coms.node_coverage

1.0

In [43]:
louvain_communities = louvain_coms.communities
len(louvain_communities)

27

In [44]:
evaluation_algo(g, louvain_coms, 'Louvain')

Results with Louvain algorithm
Transitivity: 0.2596907636051376
Hub Dominance: 0.8100240589889667
Embeddedness: 0.7734458675805869
Conductance: 0.28164977923522416
Modularity: 0.4835240361830465
Internal Edge Density: 0.5411661004742853

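Modularity compares the fraction of edges inside communities with what a degree-preserving random graph would give: Q = Σ_c [L_c/m - (d_c/(2m))²], where m is the number of edges, L_c the number of edges inside community c and d_c the sum of the degrees of its nodes. The sketch below computes it for an unweighted toy graph (made up for illustration). Louvain was run here with `weight='weight'`; a weighted version would replace edge counts with sums of edge weights. Whether `evaluation_algo` uses weights is not shown in this section.

```python
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]], membership: Dict[Hashable, int]) -> float:
    """Newman-Girvan modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal: Dict[int, int] = {}    # L_c: edges with both ends in c
    degree_sum: Dict[int, int] = {}  # d_c: sum of degrees of nodes in c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = modularity(edges, membership)
print(q)  # approximately 0.357, i.e. 5/14
```
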

In [45]:
reduct_communities_louvain = [c for c in louvain_communities if len(c) >= 5]
len(reduct_communities_louvain)

13

In [46]:
# 1. Get the set of nodes that make up all the retained communities
nodes_in_communities_louvain = set(node for community in reduct_communities_louvain for node in community)

# 2. Build the induced subgraph of the original graph on those nodes
subgraph = louvain_coms.graph.subgraph(nodes_in_communities_louvain).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_louvain = NodeClustering(
    communities=reduct_communities_louvain,
    graph=subgraph,  # <--- use the subgraph, not the whole graph
    method_name=louvain_coms.method_name + "_reduct"
)

In [47]:
evaluation_algo(g, reduct_nodeclustering_louvain, 'Louvain')

Results with Louvain algorithm
Transitivity: 0.5393577397952858
Hub Dominance: 0.6054345840540077
Embeddedness: 0.8371568018981421
Conductance: 0.25547774661674755
Modularity: 0.4765829154130583
Internal Edge Density: 0.047037285600438565

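Conductance and internal edge density are per-community scores. The sketch below uses the definitions documented for cdlib's `conductance` and `internal_edge_density`, c_S/(2m_S + c_S) and m_S/(n_S(n_S - 1)/2), where m_S counts internal edges, c_S boundary edges and n_S nodes. Assuming `evaluation_algo` (defined earlier in the notebook) reports averages of these per-community scores, the drop in internal edge density after filtering (0.54 to 0.047 for Louvain) is consistent with removing many tiny communities: a two-node community joined by one edge has density 1.0, while larger communities are usually much sparser. The toy graphs are made up for illustration.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def community_scores(edges: Iterable[Edge], community: Set[Hashable]) -> Tuple[float, float]:
    """Return (conductance, internal edge density) of one community."""
    m_s = 0  # edges with both endpoints inside the community
    c_s = 0  # boundary edges with exactly one endpoint inside
    for u, v in edges:
        inside = (u in community) + (v in community)
        if inside == 2:
            m_s += 1
        elif inside == 1:
            c_s += 1
    n_s = len(community)
    volume = 2 * m_s + c_s
    conductance = c_s / volume if volume > 0 else 0.0
    possible = n_s * (n_s - 1) / 2
    density = m_s / possible if possible > 0 else 0.0
    return conductance, density


# A triangle with one edge leaving it: conductance 1/7, density 1.0.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(community_scores(edges, {0, 1, 2}))

# A 5-node path with no outside edges: conductance 0.0, density 4/10.
path = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5")]
print(community_scores(path, {"p1", "p2", "p3", "p4", "p5"}))
```
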

### Infomap2

In [48]:
infomap_coms = algorithms.infomap(g)

In [49]:
infomap_coms.overlap 

False

In [50]:
infomap_coms.average_internal_degree()

FitnessResult(min=1.0, max=37.55988023952096, score=4.798135549275543, std=7.65661133606267)

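`average_internal_degree()` returns one summary over all communities. Per community the score is 2m_S/n_S, the mean number of neighbors a node has inside its own community; a two-node community with one edge scores exactly 1.0, which matches the `min=1.0` above. The sketch below computes the same kind of summary in plain Python. The `Summary` tuple only mirrors the printed `FitnessResult` fields, and treating `score` as the mean and `std` as the population standard deviation is an assumption about cdlib, not something shown in this notebook.

```python
import statistics
from typing import Hashable, List, NamedTuple, Set, Tuple

Edge = Tuple[Hashable, Hashable]


class Summary(NamedTuple):
    # Mirrors the fields printed by cdlib's FitnessResult (assumed semantics).
    min: float
    max: float
    score: float
    std: float


def average_internal_degree(edges: List[Edge], community: Set[Hashable]) -> float:
    """2 * m_S / n_S: mean number of neighbors inside the community."""
    m_s = sum(1 for u, v in edges if u in community and v in community)
    return 2 * m_s / len(community)


def summarize(edges: List[Edge], communities: List[Set[Hashable]]) -> Summary:
    values = [average_internal_degree(edges, c) for c in communities]
    return Summary(min(values), max(values), statistics.mean(values), statistics.pstdev(values))


# A triangle (score 2.0) and a connected pair (score 1.0).
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
s = summarize(edges, [{0, 1, 2}, {3, 4}])
print(s)
```
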
In [51]:
infomap_communities = infomap_coms.communities
len(infomap_communities)

60

In [52]:
evaluation_algo(g, infomap_coms, 'Infomap')

Results with Infomap algorithm
Transitivity: 0.2190525118919456
Hub Dominance: 0.9353333040561416
Embeddedness: 0.7569022406969304
Conductance: 0.38616265977676095
Modularity: 0.4441653586311656
Internal Edge Density: 0.46183446856745125


In [53]:
reduct_communities_infomap = [c for c in infomap_communities if len(c) >= 5]
len(reduct_communities_infomap)

34

In [54]:
# 1. Get the set of nodes that make up all the retained communities
nodes_in_communities_infomap = set(node for community in reduct_communities_infomap for node in community)

# 2. Build the induced subgraph of the original graph on those nodes
subgraph = infomap_coms.graph.subgraph(nodes_in_communities_infomap).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_infomap = NodeClustering(
    communities=reduct_communities_infomap,
    graph=subgraph,  # <--- use the subgraph, not the whole graph
    method_name=infomap_coms.method_name + "_reduct"
)

In [55]:
evaluation_algo(g, reduct_nodeclustering_infomap, 'Infomap')

Results with Infomap algorithm
Transitivity: 0.38656325627990396
Hub Dominance: 0.8858823012755439
Embeddedness: 0.7883650652821645
Conductance: 0.4051329850402223
Modularity: 0.4386868978236488
Internal Edge Density: 0.09931572884452182


### Label propagation2

In [56]:
lp_coms = algorithms.label_propagation(g)

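Label propagation starts with every node in its own community and repeatedly lets each node adopt the label that is most frequent among its neighbors, until no label changes. The sketch below is the classic asynchronous variant with a fixed update order and a deterministic tie rule (smallest label wins), both chosen here only to make the result reproducible. cdlib's `label_propagation` may use a different update scheme and tie-breaking, so this is not its exact procedure.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]


def label_propagation(adj: Graph, max_iter: int = 100) -> List[Set[Hashable]]:
    """Asynchronous label propagation with smallest-label tie-breaking."""
    labels = {node: i for i, node in enumerate(adj)}  # one label per node
    for _ in range(max_iter):
        changed = False
        for node in adj:  # fixed update order (assumption for reproducibility)
            if not adj[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[v] for v in adj[node])
            best = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    groups: Dict[int, Set[Hashable]] = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    return list(groups.values())


# Two disjoint triangles.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
coms = label_propagation(adj)
print(coms)
```

Ties and update order strongly affect the outcome: if the two triangles are joined by a single edge (2, 3), this deterministic variant lets label 1 spread across the bridge and returns one community.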
In [57]:
lp_coms.overlap 

False

In [58]:
lp_coms.average_internal_degree()

FitnessResult(min=1.0, max=26.328827037773358, score=3.2755101510944584, std=5.3519323926207765)

In [59]:
lp_coms_communities = lp_coms.communities
len(lp_coms_communities)

51

In [60]:
evaluation_algo(g, lp_coms, 'Label propagation')

Results with Label propagation algorithm
Transitivity: 0.12066129627007573
Hub Dominance: 0.9261497110770653
Embeddedness: 0.7228209973281883
Conductance: 0.34376237113908503
Modularity: 0.297516460360739
Internal Edge Density: 0.61776179356813


In [61]:
reduct_communities_lp_coms = [c for c in lp_coms_communities if len(c) >= 5]
len(reduct_communities_lp_coms)

17

In [62]:
# 1. Get the set of nodes that make up all the retained communities
nodes_in_communities_lp_coms = set(node for community in reduct_communities_lp_coms for node in community)

# 2. Build the induced subgraph of the original graph on those nodes
subgraph = lp_coms.graph.subgraph(nodes_in_communities_lp_coms).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_lp_coms = NodeClustering(
    communities=reduct_communities_lp_coms,
    graph=subgraph,  # <--- use the subgraph, not the whole graph
    method_name=lp_coms.method_name + "_reduct"
)

In [63]:
evaluation_algo(g, reduct_nodeclustering_lp_coms, 'Label propagation')

Results with Label propagation algorithm
Transitivity: 0.36198388881022714
Hub Dominance: 0.8176648195057061
Embeddedness: 0.8944317128062268
Conductance: 0.2322208202333149
Modularity: 0.28731672194824476
Internal Edge Density: 0.04936381207693903


### Leiden2

In [64]:
leiden_coms = algorithms.leiden(g, weights='weight')

In [65]:
leiden_coms.method_parameters

{'initial_membership': None, 'weights': 'weight'}

In [66]:
leiden_coms.overlap 

False

In [67]:
leiden_coms.average_internal_degree()

FitnessResult(min=1.0, max=28.191570881226053, score=6.472159904946781, std=8.224267781138154)

In [68]:
leiden_communities = leiden_coms.communities
len(leiden_communities)

31

In [69]:
evaluation_algo(g, leiden_coms, 'Leiden')

Results with Leiden algorithm
Transitivity: 0.2470670654817392
Hub Dominance: 0.8254879558521387
Embeddedness: 0.7381159147095994
Conductance: 0.32458718513289775
Modularity: 0.4855365081536149
Internal Edge Density: 0.5693813444761927


In [70]:
reduct_communities_leiden_coms = [c for c in leiden_communities if len(c) >= 5]
len(reduct_communities_leiden_coms)

14

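Leiden finds 31 communities on the full graph, but only 14 of them have at least five nodes. Tabulating community sizes shows how the discarded ones are distributed and how many nodes the `>= 5` filter removes. The sketch below does this with `collections.Counter` on toy communities made up for illustration.

```python
from collections import Counter

# Toy communities standing in for leiden_communities (hypothetical data).
communities = [
    ["a", "b", "c", "d", "e", "f"],
    ["g", "h", "i", "j", "k"],
    ["l", "m"],
    ["n", "o"],
    ["p"],
]

min_size = 5
size_counts = Counter(len(c) for c in communities)  # size -> number of communities
kept = [c for c in communities if len(c) >= min_size]
dropped_nodes = sum(len(c) for c in communities if len(c) < min_size)

print(sorted(size_counts.items()))  # [(1, 1), (2, 2), (5, 1), (6, 1)]
print(len(kept), "communities kept;", dropped_nodes, "nodes dropped")
```
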
In [71]:
# 1. Get the set of nodes that make up all the retained communities
nodes_in_communities_leiden_coms = set(node for community in reduct_communities_leiden_coms for node in community)

# 2. Build the induced subgraph of the original graph on those nodes
subgraph = leiden_coms.graph.subgraph(nodes_in_communities_leiden_coms).copy()

# 3. Build the NodeClustering on the subgraph
reduct_nodeclustering_leiden_coms = NodeClustering(
    communities=reduct_communities_leiden_coms,
    graph=subgraph,  # <--- use the subgraph, not the whole graph
    method_name=leiden_coms.method_name + "_reduct"
)

In [72]:
evaluation_algo(g, reduct_nodeclustering_leiden_coms, 'Leiden')

Results with Leiden algorithm
Transitivity: 0.5470770735667083
Hub Dominance: 0.6135804736725928
Embeddedness: 0.8218995254283984
Conductance: 0.2801233249031169
Modularity: 0.4665814203862123
Internal Edge Density: 0.04648726276871241


In [78]:
communities_dict = {str(i): community for i, community in enumerate(reduct_communities_leiden_coms)}

# Save to a JSON file
with open(output_path+'communities_leiden_general_IG_all_graph_GLOBAL_big_graph.json', 'w', encoding='utf-8') as f:
    json.dump(communities_dict, f, ensure_ascii=False, indent=4)