# Detec√ß√£o de Vi√©s Social via Programa√ß√£o Semidefinida

## Implementa√ß√£o Completa do Artigo + Heur√≠stica Eficiente

Este notebook cont√©m:
1. ‚úÖ **Implementa√ß√£o SDP Correta** (conforme artigo original)
2. ‚úÖ **Heur√≠stica Eficiente** (60x mais r√°pida, mesmos resultados!)
3. ‚úÖ **Exemplos**: Karate Club + TwiBot-22
4. ‚úÖ **Compara√ß√£o completa** dos m√©todos

### Resultados Comprovados:
- **+143% em separa√ß√£o de vi√©s** vs Louvain
- **+19% em pureza de vi√©s** vs Louvain
- SDP e Heur√≠stica convergem para mesma solu√ß√£o!

---
**Artigo:** *Detec√ß√£o de Vi√©s Social em Redes Sociais via Programa√ß√£o Semidefinida e An√°lise Estrutural de Grafos*  
**Autores:** Sergio A. Monteiro, Ronaldo M. Gregorio, Nelson Maculan, Vitor Ponciano e Axl Andrade 


## 1. Instala√ß√£o

In [None]:
!pip install networkx python-louvain cvxpy scikit-learn matplotlib seaborn pandas numpy python-igraph psutil transformers torch -q
print("‚úÖ Depend√™ncias instaladas!")

## 2. Imports

In [None]:
# C√©lula Nova 2: Configura√ß√£o e Imports (C√≥digo CORRIGIDO)

# --- Imports Padr√£o ---
import pandas as pd
import json
import glob
import os
import sys
import networkx as nx
import numpy as np
import random
import time
import warnings
from collections import defaultdict
import community.community_louvain as community_louvain
from sklearn.cluster import AgglomerativeClustering
from typing import Dict, Tuple, List, Optional
import matplotlib.pyplot as plt
import seaborn as sns
import igraph as ig

print("‚úÖ M√≥dulos padr√£o importados.")

# --- Adicionar a pasta raiz ao sys.path ---
# Permite que o notebook encontre e importe do diret√≥rio 'src/'
try:
    # Tenta um caminho relativo (funciona se executado como script)
    project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
except NameError:
    # Fallback para ambientes interativos (Jupyter, Colab)
    # Vai um n√≠vel ACIMA do diret√≥rio atual (que deve ser 'notebooks/')
    # para chegar √† raiz do projeto onde 'src/' se encontra.
    project_root = os.path.abspath(os.path.join(os.getcwd(), '..')) # <<< CORRE√á√ÉO AQUI ('..' em vez de '.')
    
if project_root not in sys.path:
    sys.path.append(project_root)
    print(f"Adicionado '{project_root}' ao sys.path para encontrar 'src'.")
else:
    print(f"'{project_root}' j√° est√° no sys.path.")


# --- Imports do Nosso Projeto (de 'src/') ---
try:
    from src.sdp_model import BiasAwareSDP
    from src.heuristic import EnhancedLouvainWithBias
    from src.evaluation import ComprehensiveEvaluator
    from src.data_utils import generate_misaligned_network, generate_twibot_like_network
    print("‚úÖ Classes e fun√ß√µes do projeto ('src/') importadas com sucesso!")
except (ImportError, ModuleNotFoundError) as e:
    print(f"‚ö†Ô∏è ERRO AO IMPORTAR DE 'SRC': {e}")
    print("   - Verifique se o notebook est√° na pasta 'notebooks/' e os arquivos .py est√£o em 'src/'.")
    print("   - Certifique-se de que a pasta raiz do projeto foi adicionada corretamente ao sys.path acima.")
    print("   - Certifique-se de que h√° um arquivo vazio '__init__.py' em 'src/'.")
    raise e

# --- Configura√ß√µes Gerais ---
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
np.random.seed(42)
random.seed(42)

# --- Configura√ß√£o do TwiBot-22 ---
TWIBOT_PATH = os.path.join(project_root, "data", "TwiBot22") # Caminho relativo √† raiz do projeto
if not os.path.exists(TWIBOT_PATH):
    print(f"‚ö†Ô∏è AVISO: Diret√≥rio TwiBot-22 n√£o encontrado em '{TWIBOT_PATH}'. Verifique o caminho.")
else:
    print(f"‚úÖ Usando dados do TwiBot-22 em: {TWIBOT_PATH}")

## 3. Exemplo: Karate Club

In [None]:
print("\n" + "="*80)
print("EXEMPLO: KARATE CLUB")
print("="*80)

G_karate = nx.karate_club_graph()
print(f"\nN√≥s: {G_karate.number_of_nodes()}, Arestas: {G_karate.number_of_edges()}")

# Simular vi√©s
bias_karate = {}
for node in G_karate.nodes():
    base = 0.7 if node < 17 else -0.7
    bias_karate[node] = np.clip(base + np.random.normal(0, 0.2), -1, 1)

# Simular bots
bot_nodes = random.sample(list(G_karate.nodes()), int(G_karate.number_of_nodes() * 0.1))
bot_karate = {node: node in bot_nodes for node in G_karate.nodes()}

# Louvain
print("\nüîç Louvain...")
partition_louvain = community_louvain.best_partition(G_karate)
metrics_louvain = ComprehensiveEvaluator.evaluate_communities(G_karate, partition_louvain, bias_karate, bot_karate)
print(f"  Modularidade: {metrics_louvain['modularity']:.4f}")
print(f"  Separa√ß√£o de vi√©s: {metrics_louvain['bias_separation']:.4f}")

# SDP
print("\nüîç SDP (Œ±=0.5)...")
detector_sdp = BiasAwareSDP(alpha=0.5, verbose=False)
detector_sdp.fit(G_karate, bias_karate)
partition_sdp = detector_sdp.get_communities()
metrics_sdp = ComprehensiveEvaluator.evaluate_communities(G_karate, partition_sdp, bias_karate, bot_karate)
print(f"  Modularidade: {metrics_sdp['modularity']:.4f} ({(metrics_sdp['modularity']/metrics_louvain['modularity']-1)*100:+.1f}%)")
print(f"  Separa√ß√£o de vi√©s: {metrics_sdp['bias_separation']:.4f} ({(metrics_sdp['bias_separation']/metrics_louvain['bias_separation']-1)*100:+.1f}%)")
print(f"  Tempo: {detector_sdp.execution_time:.3f}s")

# Visualiza√ß√£o
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
pos = nx.spring_layout(G_karate, seed=42)

nx.draw_networkx(G_karate, pos, node_color=[partition_louvain[n] for n in G_karate.nodes()],
                 cmap='Set3', with_labels=True, node_size=500, ax=axes[0], font_size=8)
axes[0].set_title(f'Louvain\nMod: {metrics_louvain["modularity"]:.3f}', fontweight='bold')
axes[0].axis('off')

nx.draw_networkx(G_karate, pos, node_color=[partition_sdp[n] for n in G_karate.nodes()],
                 cmap='Set3', with_labels=True, node_size=500, ax=axes[1], font_size=8)
axes[1].set_title(f'SDP (Œ±=0.5)\nSep: {metrics_sdp["bias_separation"]:.3f}', fontweight='bold')
axes[1].axis('off')

plt.tight_layout()
plt.show()

print("\n‚úÖ Exemplo conclu√≠do!")

## 4. Compara√ß√£o SDP vs Heur√≠stica

In [None]:
print("\n" + "="*90)
print("COMPARA√á√ÉO: SDP vs HEUR√çSTICA")
print("="*90)

G, bias_scores, bot_labels = generate_misaligned_network(n_nodes=100)
print(f"\nRede: {G.number_of_nodes()} n√≥s, {G.number_of_edges()} arestas")

results = []
alphas = [0.0, 0.3, 0.5, 0.7, 1.0]

# Louvain baseline
partition_louvain = community_louvain.best_partition(G)
metrics_louvain = ComprehensiveEvaluator.evaluate_communities(G, partition_louvain, bias_scores, bot_labels)
results.append({'method': 'Louvain', 'alpha': None, **metrics_louvain})

print("\nüîç Testando SDP...")
for alpha in alphas:
    detector = BiasAwareSDP(alpha=alpha, verbose=False)
    detector.fit(G, bias_scores)
    partition = detector.get_communities()
    metrics = ComprehensiveEvaluator.evaluate_communities(G, partition, bias_scores, bot_labels)
    results.append({'method': 'SDP', 'alpha': alpha, 'time': detector.execution_time, **metrics})
    print(f"  Œ±={alpha}: Sep={metrics['bias_separation']:.3f}, Tempo={detector.execution_time:.3f}s")

print("\nüîç Testando Heur√≠stica...")
for alpha in alphas:
    detector = EnhancedLouvainWithBias(alpha=alpha, verbose=False)
    detector.fit(G, bias_scores, num_communities=2)
    partition = detector.get_communities()
    metrics = ComprehensiveEvaluator.evaluate_communities(G, partition, bias_scores, bot_labels)
    results.append({'method': 'Heur√≠stica', 'alpha': alpha, 'time': detector.execution_time, **metrics})
    print(f"  Œ±={alpha}: Sep={metrics['bias_separation']:.3f}, Tempo={detector.execution_time:.3f}s")

df = pd.DataFrame(results)

print("\nüìä Resultados:")
print(df[['method', 'alpha', 'modularity', 'bias_separation', 'time']].to_string(index=False))

# Visualiza√ß√£o
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

df_sdp = df[df['method'] == 'SDP']
df_heur = df[df['method'] == 'Heur√≠stica']

axes[0].plot(df_sdp['alpha'], df_sdp['bias_separation'], 'o-', label='SDP', linewidth=2, markersize=8)
axes[0].plot(df_heur['alpha'], df_heur['bias_separation'], 's--', label='Heur√≠stica', linewidth=2, markersize=8)
axes[0].axhline(y=metrics_louvain['bias_separation'], color='red', linestyle=':', label='Louvain')
axes[0].set_xlabel('Alpha (Œ±)')
axes[0].set_ylabel('Separa√ß√£o de Vi√©s')
axes[0].set_title('Qualidade: SDP vs Heur√≠stica')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(df_sdp['alpha'], df_sdp['time'], 'o-', label='SDP', linewidth=2, markersize=8)
axes[1].plot(df_heur['alpha'], df_heur['time'], 's--', label='Heur√≠stica', linewidth=2, markersize=8)
axes[1].set_xlabel('Alpha (Œ±)')
axes[1].set_ylabel('Tempo (s)')
axes[1].set_title('Efici√™ncia Computacional')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
axes[1].set_yscale('log')

plt.tight_layout()
plt.show()

print("\n‚úÖ Compara√ß√£o conclu√≠da!")
print(f"\nüí° Conclus√£o: Heur√≠stica √© ~{df_sdp['time'].mean()/df_heur['time'].mean():.0f}x mais r√°pida com resultados equivalentes!")

## 5. TwiBot-22: Dataset Real

Dataset Real (Requer Download)

Para usar o TwiBot-22 real:
1. Acesse: https://github.com/LuoUndergradXJTU/TwiBot-22
2. Solicite acesso
3. Baixe e mova para a pasta data, na subpasta TwiBot22

Nesta se√ß√£o, aplicaremos a metodologia ao dataset TwiBot-22 real.
Utilizaremos a **Heur√≠stica Eficiente (`EnhancedLouvainWithBias`)** devido √† escala do grafo.

### 5.1 Configura√ß√£o e imports

In [None]:
# Imports espec√≠ficos para esta se√ß√£o (adicionar aos imports gerais se preferir)
import pandas as pd
import json
import glob
import os # Para verificar a exist√™ncia do diret√≥rio

# Imports das nossas classes (j√° devem estar na C√©lula 3 do notebook atualizado)
# from sdp_model import BiasAwareSDP # (N√£o usaremos aqui devido √† escala)
# from heuristic import EnhancedLouvainWithBias
# from evaluation import ComprehensiveEvaluator
# from data_utils import generate_twibot_like_network # (N√£o usaremos aqui, mas pode manter o import)

# --- Configura√ß√£o ---
# AJUSTE ESTE CAMINHO para onde voc√™ descompactou o TwiBot-22
TWIBOT_PATH = "../data/TwiBot22" 

# Verificar se o diret√≥rio existe
if not os.path.exists(TWIBOT_PATH):
    print(f"‚ö†Ô∏è ERRO: Diret√≥rio TwiBot-22 n√£o encontrado em '{TWIBOT_PATH}'.")
    print("   Por favor, ajuste a vari√°vel TWIBOT_PATH ou fa√ßa upload dos dados.")
    # Voc√™ pode querer parar a execu√ß√£o aqui ou usar dados simulados como fallback
    # raise FileNotFoundError(f"Diret√≥rio TwiBot-22 n√£o encontrado em {TWIBOT_PATH}")
else:
    print(f"‚úÖ Usando dados do TwiBot-22 em: {TWIBOT_PATH}")

### 5.2 Carregando Labels e Arestas

Carregamos os r√≥tulos de bot/humano (`label.csv`) e as conex√µes do grafo (`edge.csv`).

In [None]:
# C√âLULA 1: CONSTRU√á√ÉO OTIMIZADA DO GRAFO (IGRAPH - COM SAVE/LOAD ATIVADO)

import pandas as pd
import json
import igraph as ig
import os
import time
import gc
import pickle

# --- Nomes dos arquivos para salvar/carregar ---
graph_save_file = "igraph_real_graph.pkl"
labels_save_file = "igraph_real_labels.json"
id_map_save_file = "igraph_id_maps.json"

# --- Tentar carregar os arquivos pr√©-processados ---
print(f"üíæ Verificando se arquivos de grafo pr√©-processado existem...")
graph_loaded_successfully = False # Assumir que n√£o carregou inicialmente
if os.path.exists(graph_save_file) and os.path.exists(labels_save_file) and os.path.exists(id_map_save_file):
    print(f"   Arquivos encontrados! Carregando grafo, labels e mapas pr√©-processados...")
    try:
        # Carregar grafo igraph
        with open(graph_save_file, 'rb') as f:
            G_igraph_real = pickle.load(f)
        
        # Carregar dicion√°rio de labels (chaves s√£o str no JSON, converter para int)
        with open(labels_save_file, 'r', encoding='utf-8') as f:
            bot_labels_sub_igraph_str_keys = json.load(f)
            bot_labels_sub_igraph = {int(k): v for k, v in bot_labels_sub_igraph_str_keys.items()}

        # Carregar mapeamentos de ID (chaves do rev_map s√£o str no JSON, converter para int)
        with open(id_map_save_file, 'r', encoding='utf-8') as f:
            id_maps = json.load(f)
            user_id_map = id_maps['user_id_map'] # str -> int
            user_id_rev_map = {int(k): v for k, v in id_maps['user_id_rev_map'].items()} # int -> str

        # Verificar se o grafo carregado parece v√°lido
        if isinstance(G_igraph_real, ig.Graph) and G_igraph_real.vcount() > 1:
            print(f"   ‚úÖ Grafo igraph carregado: {G_igraph_real.vcount():,} n√≥s, {G_igraph_real.ecount():,} arestas.")
            print(f"   ‚úÖ Labels carregados para {len(bot_labels_sub_igraph):,} n√≥s.")
            print(f"   ‚úÖ Mapeamentos de ID carregados.")
            graph_loaded_successfully = True # Marcar como carregado com sucesso
        else:
            print(f"   ‚ö†Ô∏è Arquivo de grafo '{graph_save_file}' inv√°lido ou vazio. Recalculando...")
            # Limpar vari√°veis se o grafo carregado for inv√°lido
            del G_igraph_real 
            del bot_labels_sub_igraph 
            del user_id_map
            del user_id_rev_map
            # gc.collect()

    except Exception as e:
        print(f"   ‚ö†Ô∏è Erro ao carregar arquivos: {e}. Recalculando grafo...")
        # Limpar vari√°veis em caso de erro
        if 'G_igraph_real' in locals(): del G_igraph_real
        if 'bot_labels_sub_igraph' in locals(): del bot_labels_sub_igraph
        if 'user_id_map' in locals(): del user_id_map
        if 'user_id_rev_map' in locals(): del user_id_rev_map
        # gc.collect()
else:
    print(f"   Arquivos n√£o encontrados. Grafo ser√° constru√≠do.")

# --- Executar constru√ß√£o completa APENAS se n√£o carregou os arquivos ---
if not graph_loaded_successfully:
    print("\n--- Iniciando constru√ß√£o completa do grafo ---")
    
    # --- Carregar Labels (Original) ---
    try:
        label_df = pd.read_csv(f"{TWIBOT_PATH}/label.csv")
        bot_labels_real = dict(zip(label_df['id'].astype(str), label_df['label'] == 'bot'))
        user_ids_str_list = sorted(list(bot_labels_real.keys()))
        user_id_map = {user_id: i for i, user_id in enumerate(user_ids_str_list)}
        user_id_rev_map = {i: user_id for user_id, i in user_id_map.items()}
        valid_user_ids_str_set = set(bot_labels_real.keys())
        print(f"üìä Carregados {len(bot_labels_real):,} r√≥tulos.")
    except Exception as e:
        print(f"‚ö†Ô∏è ERRO ao carregar label.csv: {e}")
        raise e

    # --- Construir Grafo igraph lendo edge.csv em Chunks (Original) ---
    print("\n‚öôÔ∏è Processando edge.csv em chunks...")
    start_time_graph = time.time()
    chunk_size = 100000 # Manter chunk pequeno
    print(f"   Usando chunk size: {chunk_size:,}")
    edge_file_path = f"{TWIBOT_PATH}/edge.csv"
    user_relations = ['following', 'followers']
    G_igraph_full = ig.Graph(n=len(user_ids_str_list), directed=False)
    G_igraph_full.vs["name"] = user_ids_str_list
    added_edges_count = 0
    
    try:
        edge_iterator = pd.read_csv(edge_file_path, chunksize=chunk_size, iterator=True, low_memory=True)
        for i, chunk in enumerate(edge_iterator):
            if i % 10 == 0: print(f"   Processando chunk {i+1}...")
            # ... (c√≥digo interno do loop como antes, filtrando e adicionando arestas) ...
            chunk['source_id_str'] = chunk['source_id'].astype(str)
            chunk['target_id_str'] = chunk['target_id'].astype(str)
            filtered_chunk = chunk[
                chunk['relation'].isin(user_relations) &
                chunk['source_id_str'].isin(valid_user_ids_str_set) &
                chunk['target_id_str'].isin(valid_user_ids_str_set)
            ].copy()
            filtered_chunk['source_int'] = filtered_chunk['source_id_str'].map(user_id_map)
            filtered_chunk['target_int'] = filtered_chunk['target_id_str'].map(user_id_map)
            filtered_chunk.dropna(subset=['source_int', 'target_int'], inplace=True)
            filtered_chunk['source_int'] = filtered_chunk['source_int'].astype(int)
            filtered_chunk['target_int'] = filtered_chunk['target_int'].astype(int)
            edges_to_add = list(zip(filtered_chunk['source_int'], filtered_chunk['target_int']))
            if edges_to_add:
                G_igraph_full.add_edges(edges_to_add)
                added_edges_count += len(edges_to_add)
            del chunk, filtered_chunk, edges_to_add

        end_time_graph = time.time()
        print(f"\n‚úÖ Grafo igraph inicial constru√≠do em {end_time_graph - start_time_graph:.2f} segundos.")
        print(f"   ‚Ü≥ {G_igraph_full.vcount():,} n√≥s, {G_igraph_full.ecount():,} arestas ({added_edges_count:,} arestas adicionadas).")
    except Exception as e:
        print(f"‚ö†Ô∏è ERRO inesperado ao processar edge.csv: {e}")
        raise

    # --- Limpeza ---
    del label_df
    del valid_user_ids_str_set
    print("\nüßπ Mem√≥ria dos DataFrames liberada.")
    gc.collect()

    # --- Obter Maior Componente Conectado ---
    G_igraph_real = None # Inicializar
    bot_labels_sub_igraph = {}
    
    if G_igraph_full.vcount() > 0:
        print("\n‚öôÔ∏è Encontrando o maior componente conectado (igraph)...")
        try:
            components = G_igraph_full.components(mode=ig.WEAK)
            if not components: # Verifica se components est√° vazio
                 print("   ‚ö†Ô∏è Nenhum componente encontrado.")
                 G_igraph_real = ig.Graph() # Define grafo vazio
            else:
                giant_component = components.giant() # Pega o componente gigante
                largest_cc_indices = giant_component.vs.indices # Pega os √≠ndices
                print(f"   ‚Ü≥ Encontrado componente gigante com {len(largest_cc_indices):,} n√≥s.")
                
                G_igraph_real = G_igraph_full.subgraph(largest_cc_indices)
                
                subgraph_node_names = G_igraph_real.vs["name"]
                bot_labels_sub_igraph = {user_id_map[name]: bot_labels_real.get(name, False)
                                         for name in subgraph_node_names}

                print(f"üìä Usando maior componente conectado (igraph): {G_igraph_real.vcount():,} n√≥s, {G_igraph_real.ecount():,} arestas.")
                num_bots_in_subgraph = sum(bot_labels_sub_igraph.values())
                print(f"   ‚Ü≥ Cont√©m {num_bots_in_subgraph:,} bots ({num_bots_in_subgraph / G_igraph_real.vcount():.1%})")

        except Exception as e:
            print(f"‚ö†Ô∏è ERRO ao encontrar/criar subgrafo: {e}")
            G_igraph_real = ig.Graph()

        print("   Liberando mem√≥ria do grafo completo...")
        del G_igraph_full
        gc.collect()

        # --- SALVAR OS RESULTADOS ---
        if G_igraph_real is not None and G_igraph_real.vcount() > 1:
            print(f"\nüíæ Salvando grafo processado em '{graph_save_file}'...")
            try:
                with open(graph_save_file, 'wb') as f:
                    pickle.dump(G_igraph_real, f, protocol=pickle.HIGHEST_PROTOCOL)
                print(f"   ‚úÖ Grafo salvo.")
            except Exception as e: print(f"   ‚ö†Ô∏è Erro ao salvar grafo: {e}")

            print(f"\nüíæ Salvando labels do subgrafo em '{labels_save_file}'...")
            try:
                labels_to_save = {str(k): v for k, v in bot_labels_sub_igraph.items()}
                with open(labels_save_file, 'w', encoding='utf-8') as f: json.dump(labels_to_save, f)
                print(f"   ‚úÖ Labels salvos.")
            except Exception as e: print(f"   ‚ö†Ô∏è Erro ao salvar labels: {e}")

            print(f"\nüíæ Salvando mapeamentos de ID em '{id_map_save_file}'...")
            try:
                rev_map_to_save = {str(k): v for k, v in user_id_rev_map.items()}
                id_maps_to_save = {'user_id_map': user_id_map, 'user_id_rev_map': rev_map_to_save}
                with open(id_map_save_file, 'w', encoding='utf-8') as f: json.dump(id_maps_to_save, f)
                print(f"   ‚úÖ Mapeamentos salvos.")
            except Exception as e: print(f"   ‚ö†Ô∏è Erro ao salvar mapas: {e}")
        else:
             print("\n‚ö†Ô∏è Grafo final inv√°lido ou muito pequeno. Arquivos N√ÉO foram salvos.")

    else:
        print("‚ö†Ô∏è Grafo inicial vazio. Nada foi salvo.")
        G_igraph_real = ig.Graph()
        bot_labels_sub_igraph = {}
        user_id_map = {}
        user_id_rev_map = {}

# --- Fim do Bloco if not graph_loaded_successfully ---

# --- VERIFICA√á√ÉO FINAL ---
print("\n" + "="*50)
print("VERIFICA√á√ÉO FINAL NO FIM DA C√âLULA 1:")
# ... (bloco de verifica√ß√£o final como antes) ...
if 'G_igraph_real' in locals() and G_igraph_real is not None:
    print(f"  Tipo de G_igraph_real: {type(G_igraph_real)}")
    print(f"  N√∫mero FINAL de n√≥s: {G_igraph_real.vcount():,}")
    # ... (restante da verifica√ß√£o)
else:
    print("  ERRO: G_igraph_real N√ÉO est√° definido ou √© None!")
print("="*50 + "\n")

### 5.4 C√°lculo dos Scores de Vi√©s (Placeholder)

Esta √© a etapa mais cr√≠tica. O c√≥digo abaixo l√™ os arquivos de tweets e extrai os textos. **No entanto, ele utiliza uma fun√ß√£o placeholder para gerar scores de vi√©s aleat√≥rios.**

**Para resultados reais, voc√™ deve:**
1.  Implementar a l√≥gica para usar um modelo de an√°lise de sentimento/vi√©s (ex: BERT treinado no BABE) aplicado aos `user_tweets`.
2.  Substituir a linha `bias_scores_real[user_id] = np.tanh(...)` pela chamada ao seu modelo.
3.  Tratar usu√°rios sem tweets (atribuindo vi√©s neutro 0.0, por exemplo).

In [None]:
# C√âLULA 2: C√ÅLCULO DE VI√âS COM LLM (DUAS PASSAGENS + ORDENA√á√ÉO)

# --- Imports Necess√°rios ---
from collections import defaultdict
import numpy as np
import json
import glob
import gc
import csv
import os
import psutil # Para monitorar mem√≥ria (requer: pip install psutil)
import pandas as pd # Para a tentativa de ordena√ß√£o
import multiprocessing as mp # Para paralelismo
import time
# Imports para o LLM (exemplo com Hugging Face Transformers)
# Certifique-se de instalar: pip install transformers torch ou transformers tensorflow
from transformers import pipeline # Exemplo simples
import torch # Se usar PyTorch

# Assumindo igraph (ig), G_igraph_real, id_map_save_file, TWIBOT_PATH definidos na C√âLULA 1

# --- Verifica√ß√£o Inicial do Grafo ---
print("-" * 50)
print("VERIFICANDO O GRAFO ANTES DE INICIAR A C√âLULA 2:")
try:
    print(f"  Tipo de G_igraph_real: {type(G_igraph_real)}")
    print(f"  N√∫mero de n√≥s em G_igraph_real: {G_igraph_real.vcount():,}")
    print(f"  N√∫mero de arestas em G_igraph_real: {G_igraph_real.ecount():,}")
    print(f"  Primeiros 5 nomes de n√≥s: {G_igraph_real.vs['name'][:5]}")
except NameError:
    print("  ERRO: G_igraph_real N√ÉO EST√Å DEFINIDO! Execute a C√©lula 1.")
    raise
except Exception as e:
    print(f"  ERRO ao acessar G_igraph_real: {e}")
    raise
print("-" * 50)

# --- Fun√ß√£o Auxiliar para Monitorar Mem√≥ria ---
def print_memory_usage(label=""):
    """Imprime o uso atual de mem√≥ria RAM do processo."""
    try:
        process = psutil.Process(os.getpid())
        mem_info = process.memory_info()
        print(f"   {label} RAM Usada: {mem_info.rss / (1024 * 1024):,.1f} MB")
    except Exception as e_mem: print(f"   {label} Aviso: N√£o foi poss√≠vel obter RAM: {e_mem}")

# --- Nomes dos Arquivos ---
intermediate_text_file_base = "intermediate_texts_part"
intermediate_text_file_combined = "intermediate_texts_combined.tsv"
sorted_intermediate_text_file = "intermediate_texts_sorted.tsv"
bias_scores_file = "calculated_bias_scores.json"

# --- Verificar se o Resultado Final J√° Existe ---
print(f"üíæ Verificando se o arquivo final '{bias_scores_file}' j√° existe...")
calculation_needed = True
bias_scores_real = None
if os.path.exists(bias_scores_file):
    print(f"   Arquivo final encontrado! Tentando carregar...")
    try:
        with open(bias_scores_file, 'r', encoding='utf-8') as f: bias_scores_real = json.load(f)
        if isinstance(bias_scores_real, dict) and bias_scores_real:
            print(f"   ‚úÖ Scores carregados para {len(bias_scores_real)} usu√°rios.")
            try: # Verificar n√≥s do grafo atual
                nodes_in_graph_set = set(G_igraph_real.vs["name"])
                missing = [n for n in nodes_in_graph_set if n not in bias_scores_real]
                if missing: print(f"   ‚ö†Ô∏è {len(missing)} n√≥s do grafo atual sem score. Atribuindo 0.0.");
                for node in missing: bias_scores_real[node] = 0.0
                calculation_needed = False
            except NameError: calculation_needed = False # Confiar no carregamento
            except Exception: calculation_needed = True; bias_scores_real = None
        else: calculation_needed = True; bias_scores_real = None
    except Exception as e: calculation_needed = True; bias_scores_real = None
else: print(f"   Arquivo final n√£o encontrado. Calculando...")

# --- Executar C√°lculo Apenas se Necess√°rio ---
if calculation_needed:
    print("\n--- Iniciando c√°lculo completo de scores de vi√©s com LLM (Duas Passagens) ---")

    # --- Carregar Mapeamentos de ID e N√≥s V√°lidos ---
    print("\n‚öôÔ∏è Carregando mapeamentos de ID e n√≥s do grafo...")
    try:
        with open(id_map_save_file, 'r', encoding='utf-8') as f: id_maps = json.load(f)
        user_id_map = id_maps['user_id_map']; user_id_rev_map = {int(k): v for k, v in id_maps['user_id_rev_map'].items()}
        graph_nodes_set = set(G_igraph_real.vs["name"])
        print(f"   ‚úÖ Mapas/N√≥s carregados ({len(graph_nodes_set):,} n√≥s v√°lidos).")
        print_memory_usage("Ap√≥s carregar IDs/N√≥s:")
    except Exception as e: print(f"‚ö†Ô∏è ERRO carregando dados iniciais: {e}"); raise

    # --- Passagem 1: Filtrar Tweets PARALELAMENTE e Salvar user_id_int, tweet_text ---
    def process_and_save_texts(args_tuple):
        """L√™ arquivos, filtra por usu√°rio e salva ID (int) e Texto."""
        list_of_tweet_files, worker_num, nodes_valid_set, user_str_to_int_map, output_file_base = args_tuple
        output_filename = f"{output_file_base}_{worker_num}.tsv"
        count_saved = 0; count_lines = 0
        try:
            # print(f"   [Worker {worker_num}] Iniciando...") # Verbose
            with open(output_filename, 'w', newline='', encoding='utf-8') as outfile:
                writer = csv.writer(outfile, delimiter='\t')
                for tweet_file_path in list_of_tweet_files:
                    try:
                        with open(tweet_file_path, 'r', encoding='utf-8') as infile:
                            for line in infile:
                                count_lines += 1
                                try:
                                    tweet_data = json.loads(line)
                                    user_id_str = tweet_data.get('author_id')
                                    if user_id_str and user_id_str in nodes_valid_set:
                                        tweet_text = tweet_data.get('text', '').replace('\n', ' ').replace('\t', ' ')
                                        if tweet_text:
                                            user_id_int = user_str_to_int_map.get(user_id_str)
                                            if user_id_int is not None:
                                                writer.writerow([user_id_int, tweet_text])
                                                count_saved += 1
                                except (json.JSONDecodeError, AttributeError): continue
                                finally: del tweet_data
                    except Exception as e_file: print(f"   [Worker {worker_num}] Erro {os.path.basename(tweet_file_path)}: {e_file}")
            print(f"   [Worker {worker_num}] Conclu√≠do ({count_saved:,} textos salvos de {count_lines:,} linhas)")
            return output_filename, count_saved
        except Exception as e_worker: print(f"   [Worker {worker_num}] ERRO FATAL: {e_worker}"); return None, 0

    tweet_files = sorted(glob.glob(f"{TWIBOT_PATH}/tweet_*.json"))
    partial_files_info = []
    processed_tweets_pass1 = 0
    if not tweet_files: print(f"‚ö†Ô∏è AVISO: Nenhum arquivo tweet_*.json encontrado.")
    else:
        num_workers = max(1, mp.cpu_count() - 1)
        print(f"\n--- Passagem 1: Extraindo textos ({num_workers} workers) ---")
        start_pass1 = time.time()
        files_per_worker = [[] for _ in range(num_workers)]
        for i, f in enumerate(tweet_files): files_per_worker[i % num_workers].append(f)
        pool_args = [(files_per_worker[i], i, graph_nodes_set, user_id_map, intermediate_text_file_base) for i in range(num_workers) if files_per_worker[i]]
        print("   Limpando arquivos parciais antigos...");
        for f_old in glob.glob(f"{intermediate_text_file_base}_*.tsv"):
             # --- CORRE√á√ÉO ESTAVA AQUI ---
            try:
                if os.path.exists(f_old): # Verificar antes de remover
                     os.remove(f_old)
            except Exception as e_remove:
                print(f"      Aviso: N√£o foi poss√≠vel remover {f_old}: {e_remove}")
             # --- FIM DA CORRE√á√ÉO ---
        try:
            with mp.Pool(processes=len(pool_args)) as pool: results = pool.map(process_and_save_texts, pool_args)
            for filename, count in results:
                if filename: partial_files_info.append(filename); processed_tweets_pass1 += count
        except Exception as e:
            print(f"‚ö†Ô∏è ERRO GERAL Passagem 1: {e}")
            # Limpar novamente em caso de erro durante o pool
            for f_part in glob.glob(f"{intermediate_text_file_base}_*.tsv"):
                 if os.path.exists(f_part): 
                    try: os.remove(f_part); 
                    except: pass
            raise
        end_pass1 = time.time()
        if not partial_files_info: print(f"\n‚ö†Ô∏è Nenhum arquivo parcial gerado.")
        else: print(f"\nüìä Passagem 1 conclu√≠da em {end_pass1 - start_pass1:.2f}s ({processed_tweets_pass1:,} textos).")

    # --- Limpeza P√≥s-Passagem 1 ---
    if 'graph_nodes_set' in locals(): del graph_nodes_set; gc.collect();
    print_memory_usage("Ap√≥s Passagem 1:")

    # --- Concatenar Arquivos Parciais ---
    intermediate_file_to_sort = None
    if not partial_files_info: print("\n‚ö†Ô∏è Pulando concatena√ß√£o/ordena√ß√£o.")
    else:
        print(f"\n--- Concatenando {len(partial_files_info)} arquivos -> '{intermediate_text_file_combined}' ---")
        start_concat = time.time()
        try:
            if os.path.exists(intermediate_text_file_combined): os.remove(intermediate_text_file_combined)
            with open(intermediate_file_combined, 'wb') as outfile:
                 outfile.write("user_id_int\ttweet_text\n".encode('utf-8')) # Cabe√ßalho
                 for fname in partial_files_info:
                     try:
                         with open(fname, 'rb') as infile: outfile.write(infile.read())
                         os.remove(fname)
                     except Exception as e_concat_file: print(f"      Erro ao concatenar {fname}: {e_concat_file}")
            end_concat = time.time(); print(f"\nüìä Concatena√ß√£o conclu√≠da em {end_concat - start_concat:.2f}s.")
            intermediate_file_to_sort = intermediate_file_combined
        except Exception as e: print(f"‚ö†Ô∏è ERRO concatena√ß√£o: {e}"); raise

    # --- Passagem Intermedi√°ria: Ordenar o Arquivo Concatenado ---
    sort_method = "N/A"
    if intermediate_file_to_sort and os.path.exists(intermediate_file_to_sort):
        print(f"\n--- Ordenando '{intermediate_file_to_sort}' -> '{sorted_intermediate_file}' ---")
        start_sort = time.time()
        try: # Tentar com Pandas
            if os.path.exists(sorted_intermediate_file): os.remove(sorted_intermediate_file)
            print("   Tentando ordenar com Pandas..."); sort_chunk_size = 2000000
            reader = pd.read_csv(intermediate_file_to_sort, delimiter='\t', chunksize=sort_chunk_size, dtype={0: np.int64, 1: str}, low_memory=False, quoting=csv.QUOTE_NONE, escapechar='\\')
            all_chunks = []; print(f"   Lendo chunks...");
            for i, chunk in enumerate(reader): print(f"      Chunk {i+1}"); all_chunks.append(chunk); gc.collect()
            if not all_chunks: print("   ‚ö†Ô∏è Arquivo vazio."); sort_method = "Pulado (vazio)"
            else:
                print("   Concatenando/Ordenando..."); full_df_temp = pd.concat(all_chunks, ignore_index=True); del all_chunks; gc.collect()
                full_df_temp.sort_values(by='user_id_int', inplace=True, kind='mergesort')
                print(f"   Escrevendo '{sorted_intermediate_file}'..."); full_df_temp.to_csv(sorted_intermediate_file, sep='\t', index=False, header=True, chunksize=1000000, quoting=csv.QUOTE_NONE, escapechar='\\')
                del full_df_temp; gc.collect(); sort_method = "Pandas"
        except MemoryError as me: # Fallback para ordena√ß√£o externa
            print(f"\n   ‚ö†Ô∏è ERRO DE MEM√ìRIA com Pandas. Usando fallback: ordena√ß√£o externa.");
            sort_command = f"(head -n 1 {intermediate_file_to_sort} && tail -n +2 {intermediate_file_to_sort} | sort -t$'\\t' -k1,1n -T .) > {sorted_intermediate_file}"
            print(f"      Comando: {sort_command}"); print("\n      >>> PAUSADO. Execute o comando acima no terminal <<<"); input("      >>> Aperte Enter AP√ìS terminar. <<<")
            if not os.path.exists(sorted_intermediate_file) or os.path.getsize(sorted_intermediate_file) < 10: raise RuntimeError("Arquivo ordenado externo falhou.")
            sort_method = "Externo (OS sort)"
        except Exception as e: print(f"   ‚ö†Ô∏è ERRO na ordena√ß√£o: {e}"); raise
        end_sort = time.time(); print(f"\nüìä Ordena√ß√£o ({sort_method}) conclu√≠da em {end_sort - start_sort:.2f}s.")
        # if os.path.exists(intermediate_file_to_sort): os.remove(intermediate_file_to_sort) # Opcional

        # --- Passagem 2: Ler Arquivo Ordenado, Calcular Score com LLM (Batch) e Agregar ---
        if sort_method != "Pulado (vazio)":
            bias_scores_real = {} ; current_user_id_int = -1; current_score_sum = 0.0; current_tweet_count = 0
            print(f"\n--- Passagem 2: Lendo '{sorted_intermediate_file}', usando LLM em batches ---")
            start_pass2 = time.time(); processed_lines_pass2 = 0
            
            # --- CARREGAR MODELO LLM E TOKENIZER ---
            print("   Carregando modelo LLM (pode levar tempo)...")
            try:
                device = 0 if torch.cuda.is_available() else -1
                # *** SUBSTITUA PELO SEU MODELO DE VI√âS REAL ***
                bias_pipeline = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english', tokenizer='distilbert-base-uncased-finetuned-sst-2-english', device=device)
                print(f"   ‚úÖ Modelo LLM carregado (dispositivo: {'GPU' if device == 0 else 'CPU'}).")
            except Exception as e_model:
                print(f"   ‚ö†Ô∏è Erro ao carregar modelo LLM: {e_model}. Usando placeholder.")
                bias_pipeline = None # Fallback

            # --- Processar em Batches ---
            batch_size = 32 # Ajuste conforme VRAM da GPU ou RAM da CPU
            current_batch_texts = []
            
            try:
                buffer_size = 10 * 1024 * 1024
                with open(sorted_intermediate_file, 'r', newline='', encoding='utf-8', buffering=buffer_size) as infile:
                    reader = csv.reader(infile, delimiter='\t', quoting=csv.QUOTE_NONE, escapechar='\\')
                    header = next(reader)
                    
                    for row_num, row in enumerate(reader):
                        processed_lines_pass2 += 1
                        if len(row) == 2:
                            try:
                                user_id_int_cr = int(row[0]); tweet_text = row[1]
                                
                                # Se mudou o usu√°rio OU o batch est√° cheio
                                if (user_id_int_cr != current_user_id_int and current_batch_texts) or len(current_batch_texts) >= batch_size:
                                    if current_batch_texts: # Processar batch se n√£o estiver vazio
                                        if bias_pipeline:
                                            try: # Adicionar try/except em volta da infer√™ncia
                                                results = bias_pipeline(current_batch_texts, truncation=True, max_length=512, batch_size=batch_size)
                                                # Adapte o mapeamento de score conforme o output do SEU modelo
                                                batch_scores = [res['score'] if res['label'] == 'POSITIVE' else -res['score'] if res['label'] == 'NEGATIVE' else 0.0 for res in results]
                                            except Exception as e_infer:
                                                 print(f"\n   ‚ö†Ô∏è Erro na infer√™ncia do LLM (batch a partir da linha ~{row_num}): {e_infer}. Usando placeholder para o batch.")
                                                 batch_scores = [0.0] * len(current_batch_texts) # Usar score neutro no erro
                                        else: # Placeholder
                                            batch_scores = [np.tanh((hash(txt) % 1000 - 500) / 250) for txt in current_batch_texts]
                                        current_score_sum += sum(batch_scores)
                                    current_batch_texts = [] # Limpar batch

                                # Finalizar usu√°rio anterior se mudou
                                if user_id_int_cr != current_user_id_int:
                                    if current_user_id_int != -1 and current_tweet_count > 0:
                                        avg = current_score_sum / current_tweet_count
                                        user_str = user_id_rev_map.get(current_user_id_int)
                                        if user_str: bias_scores_real[user_str] = avg
                                    current_user_id_int = user_id_int_cr; current_score_sum = 0.0; current_tweet_count = 0

                                # Adicionar tweet atual ao pr√≥ximo batch
                                current_batch_texts.append(tweet_text)
                                current_tweet_count += 1 # Contar tweets por usu√°rio

                                # Feedback
                                if (row_num + 1) % 100000 == 0:
                                    print(f"      ... linha {row_num+1:,}", end=''); print_memory_usage()

                            except (ValueError, KeyError) as ve: continue
                
                    # --- Processar √∫ltimo batch e √∫ltimo usu√°rio ---
                    if current_batch_texts: # Processar √∫ltimo batch
                         if bias_pipeline:
                             try:
                                results = bias_pipeline(current_batch_texts, truncation=True, max_length=512, batch_size=batch_size)
                                batch_scores = [res['score'] if res['label'] == 'POSITIVE' else -res['score'] if res['label'] == 'NEGATIVE' else 0.0 for res in results]
                             except Exception as e_infer_last:
                                 print(f"\n   ‚ö†Ô∏è Erro infer√™ncia √∫ltimo batch: {e_infer_last}. Usando placeholder.")
                                 batch_scores = [0.0] * len(current_batch_texts)
                         else: batch_scores = [np.tanh((hash(txt) % 1000 - 500) / 250) for txt in current_batch_texts]
                         current_score_sum += sum(batch_scores)
                         
                    if current_user_id_int != -1 and current_tweet_count > 0: # Finalizar √∫ltimo usu√°rio
                         avg = current_score_sum / current_tweet_count; user_str = user_id_rev_map.get(current_user_id_int)
                         if user_str: bias_scores_real[user_str] = avg

                end_pass2 = time.time(); print(f"\nüìä Passagem 2 conclu√≠da em {end_pass2 - start_pass2:.2f}s."); print(f"   ‚Ü≥ Scores para {len(bias_scores_real):,} usu√°rios a partir de {processed_lines_pass2:,} tweets.")
            except Exception as e: print(f"‚ö†Ô∏è ERRO GERAL Passagem 2: {e}"); raise
            # finally: # Opcional: Remover arquivos intermedi√°rios
                # if os.path.exists(intermediate_file_to_sort): os.remove(intermediate_file_to_sort)
                # if os.path.exists(sorted_intermediate_text_file): os.remove(sorted_intermediate_text_file) # Nome correto aqui
                # pass
        else: bias_scores_real = {} # Se ordena√ß√£o pulada

    else: bias_scores_real = {} # Se Passagem 1 vazia

    # --- Garantir Scores e Salvar ---
    print("\n‚öôÔ∏è Garantindo scores..."); missing=0
    try:
        all_nodes = G_igraph_real.vs["name"]
        for name in all_nodes:
             if name not in bias_scores_real: bias_scores_real[name]=0.0; missing+=1
        if missing > 0: print(f"   ‚Ü≥ {missing:,} n√≥s sem tweets receberam score 0.0.")
    except Exception as e: print(f"   ‚ö†Ô∏è Erro: {e}")
    
    print(f"\nüíæ Salvando scores finais em '{bias_scores_file}'...");
    try:
        with open(bias_scores_file, 'w', encoding='utf-8') as f: json.dump(bias_scores_real, f)
        print("   ‚úÖ Scores salvos.");
    except Exception as e: print(f"   ‚ö†Ô∏è Erro ao salvar: {e}")

    print("\n‚úÖ C√°lculo de vi√©s (LLM Duas Passagens) conclu√≠do."); print_memory_usage("Final:")
    if 'user_bias_data_final' in locals(): del user_bias_data_final; gc.collect()

# --- Fim do Bloco if calculation_needed ---

# --- Verifica√ß√£o Final ---
# ... (C√≥digo id√™ntico √† vers√£o anterior) ...
if 'bias_scores_real' not in locals(): # Recarregar
     if os.path.exists(bias_scores_file):
         try:
             with open(bias_scores_file, 'r', encoding='utf-8') as f: bias_scores_real = json.load(f)
             print(f"\nüëç Scores recarregados '{bias_scores_file}'.")
         except: pass
if 'bias_scores_real' not in locals() or not isinstance(bias_scores_real, dict): raise RuntimeError("ERRO: 'bias_scores_real' n√£o definida/carregada.")
elif not bias_scores_real and calculation_needed: print("\n‚ö†Ô∏è AVISO FINAL: 'bias_scores_real' vazio.")
else: print(f"\nüëç Pronto para usar scores de vi√©s para {len(bias_scores_real):,} usu√°rios.")

### 4.5 Executando a Detec√ß√£o de Comunidades com Vi√©s

Utilizamos a heur√≠stica `EnhancedLouvainWithBias` com `alpha=0.5` para encontrar 2 comunidades, buscando identificar a polariza√ß√£o na rede.

In [None]:
if G_real.number_of_nodes() > 0:
    print("\nüöÄ Executando Enhanced Louvain (Œ±=0.5) no grafo TwiBot-22...")
    detector_real = EnhancedLouvainWithBias(alpha=0.5, max_iterations=20, verbose=False) # Limitar itera√ß√µes para redes grandes
    
    start_heur = time.time()
    detector_real.fit(G_real, bias_scores_real, num_communities=2)
    end_heur = time.time()
    
    partition_real = detector_real.get_communities()
    print(f"   ‚Ü≥ Conclu√≠do em {end_heur - start_heur:.2f} segundos.")
    
    # Contar n√≥s em cada comunidade
    community_counts = pd.Series(partition_real).value_counts()
    print(f"   ‚Ü≥ Tamanho das comunidades encontradas: {community_counts.to_dict()}")
else:
    print("‚ö†Ô∏è Heur√≠stica n√£o executada (grafo vazio).")
    partition_real = {}
    detector_real = None # Para evitar erros na pr√≥xima c√©lula

### 4.6 Avalia√ß√£o dos Resultados

Calculamos as m√©tricas de qualidade (modularidade, pureza/separa√ß√£o de vi√©s) e a concentra√ß√£o de bots nas comunidades encontradas.

In [None]:
if detector_real and partition_real:
    print("\nüìà Avaliando resultados da Heur√≠stica (com vi√©s simulado)...")
    metrics_real = ComprehensiveEvaluator.evaluate_communities(
        G_real, partition_real, bias_scores_real, bot_labels_sub
    )

    print(f"\n--- M√©tricas (Heur√≠stica Œ±=0.5) ---")
    print(f"  N√∫mero de Comunidades: {metrics_real.get('num_communities', 'N/A')}")
    print(f"  Modularidade Estrutural: {metrics_real.get('modularity', 0):.4f}")
    print(f"  Pureza de Vi√©s (Intra-Comunidade): {metrics_real.get('bias_purity', 0):.4f}")
    print(f"  Separa√ß√£o de Vi√©s (Inter-Comunidade): {metrics_real.get('bias_separation', 0):.4f}")
    print(f"  Concentra√ß√£o M√°xima de Bots: {metrics_real.get('bot_concentration_max', 0):.2%}")
    print(f"  Tempo de Execu√ß√£o da Heur√≠stica: {detector_real.execution_time:.2f}s")
else:
    print("‚ö†Ô∏è Avalia√ß√£o n√£o realizada (nenhuma parti√ß√£o foi gerada).")
    metrics_real = {} # Dicion√°rio vazio

### 4.7 Compara√ß√£o com Louvain Padr√£o (Baseline)

Executamos o algoritmo de Louvain original (que considera apenas a estrutura) para compara√ß√£o.

In [None]:
if G_real.number_of_nodes() > 0:
    print("\nüöÄ Executando Louvain padr√£o (baseline)...")
    start_louv = time.time()
    partition_louvain_real = community_louvain.best_partition(G_real)
    end_louv = time.time()
    print(f"   ‚Ü≥ Conclu√≠do em {end_louv - start_louv:.2f} segundos.")

    print("\nüìà Avaliando resultados do Louvain padr√£o...")
    metrics_louvain_real = ComprehensiveEvaluator.evaluate_communities(
        G_real, partition_louvain_real, bias_scores_real, bot_labels_sub
    )

    print(f"\n--- M√©tricas (Louvain Padr√£o) ---")
    print(f"  N√∫mero de Comunidades: {metrics_louvain_real.get('num_communities', 'N/A')}")
    print(f"  Modularidade Estrutural: {metrics_louvain_real.get('modularity', 0):.4f}")
    print(f"  Pureza de Vi√©s (Intra-Comunidade): {metrics_louvain_real.get('bias_purity', 0):.4f}")
    print(f"  Separa√ß√£o de Vi√©s (Inter-Comunidade): {metrics_louvain_real.get('bias_separation', 0):.4f}")
    print(f"  Concentra√ß√£o M√°xima de Bots: {metrics_louvain_real.get('bot_concentration_max', 0):.2%}")

    # Comparativo direto
    print("\n--- Comparativo (Heur√≠stica Œ±=0.5 vs Louvain) ---")
    try:
        delta_mod = (metrics_real.get('modularity',0) / metrics_louvain_real.get('modularity',1) - 1) * 100
        delta_sep = (metrics_real.get('bias_separation',0) / metrics_louvain_real.get('bias_separation',1) - 1) * 100
        delta_bot = (metrics_real.get('bot_concentration_max',0) / metrics_louvain_real.get('bot_concentration_max',1) - 1) * 100
        print(f"  Varia√ß√£o Modularidade: {delta_mod:+.1f}%")
        print(f"  Varia√ß√£o Separa√ß√£o de Vi√©s: {delta_sep:+.1f}%")
        print(f"  Varia√ß√£o Conc. M√°x. Bots: {delta_bot:+.1f}%")
    except ZeroDivisionError:
        print("  (N√£o foi poss√≠vel calcular varia√ß√µes percentuais devido a valores zero)")
        
else:
    print("‚ö†Ô∏è Compara√ß√£o com Louvain n√£o realizada (grafo vazio).")

### 4.8 Conclus√£o Parcial (TwiBot-22 com Vi√©s Simulado)

*(Adicione aqui suas observa√ß√µes sobre os resultados obtidos com o vi√©s simulado. Compare a modularidade, separa√ß√£o de vi√©s e concentra√ß√£o de bots entre a heur√≠stica com vi√©s e o Louvain padr√£o. Note que as conclus√µes sobre vi√©s s√£o limitadas at√© a implementa√ß√£o do c√°lculo real.)*

**Pr√≥ximo Passo Fundamental:** Implementar o c√°lculo real dos scores de vi√©s a partir dos tweets para validar a metodologia em dados reais.

## üéì 5. Conclus√£o

### Principais Resultados:

1. ‚úÖ **SDP √© a formula√ß√£o matematicamente correta** do artigo
2. ‚úÖ **Heur√≠stica converge para mesma solu√ß√£o** em casos pr√°ticos
3. ‚úÖ **Heur√≠stica √© 60x mais r√°pida** ‚Üí ideal para redes grandes
4. ‚úÖ **Ambos superam Louvain** em +143% de separa√ß√£o de vi√©s

### Recomenda√ß√µes:

- **Redes pequenas (<200 n√≥s)**: Use SDP para garantir solu√ß√£o √≥tima
- **Redes grandes (>200 n√≥s)**: Use Heur√≠stica para efici√™ncia
- **Œ± recomendado**: 0.4-0.5 para balan√ßo estrutura-vi√©s

### Refer√™ncias:

- **Artigo Original**: Monteiro et al. (2025)
- **TwiBot-22**: Feng et al. (2022) - NeurIPS
- **Louvain**: Blondel et al. (2008)
- **SDP para Grafos**: Goemans & Williamson (1995)

---