# Análise Comparativa de Grafos - Graph6

Este notebook realiza análise completa de grafos em formato Graph6 (.g6), calculando métricas de centralidade e conectividade, além de gerar visualizações e uma análise comparativa em DataFrame.

## Estrutura do Notebook
1. **Imports e configuração**
2. **Funções de carregamento**
3. **Funções de visualização**
4. **Funções de cálculo de métricas**
5. **Função principal de análise**
6. **Execução e resultados**

## 1. Imports e Configuração

Importação das bibliotecas necessárias para análise de grafos, visualização e manipulação de dados.

In [2]:
import os
import networkx as nx
import matplotlib.pyplot as plt
import statistics
import pandas as pd

## 2. Função de Carregamento de Grafos

Função para ler arquivos `.g6` que podem conter um ou mais grafos (um por linha).
Suporta formatos Graph6 e Sparse6.

In [3]:
def load_graphs_from_graph6_file(path):
    """
    Reads a .g6 file that can contain one or more graphs (one per line).
    Returns a list of NetworkX graphs.
    """
    graphs = []
    with open(path, "rb") as f:
        for raw in f:
            line = raw.strip()
            if not line:
                continue
            # Ignores optional ">>graph6<<" header if it appears
            if line == b">>graph6<<":
                continue
            # Graph6 typically starts without ":"; Sparse6 starts with ":".
            if line.startswith(b":"):
                # Sparse6 line
                G = nx.from_sparse6_bytes(line)
            else:
                # Graph6 line
                G = nx.from_graph6_bytes(line)
            graphs.append(G)
    return graphs

## 3. Funções de Visualização

Duas funções para visualizar grafos:
- **visualize_graph**: Desenha o grafo com nós e arestas
- **plot_adjacency_matrix**: Exibe a matriz de adjacência como heatmap

In [4]:
def visualize_graph(G, title="Graph Visualization", file_name="graph.png"):
    """Visualiza o grafo e salva como imagem."""
    plt.figure(figsize=(6, 6))
    pos = nx.spring_layout(G, seed=42)
    
    nx.draw(
        G, pos,
        with_labels=True,
        node_color="lightblue",
        node_size=800,
        font_size=10,
        font_weight="bold",
        edge_color="gray"
    )
    
    plt.title(title, fontsize=14)
    plt.savefig(file_name)   # save instead of show
    plt.close()              # close the figure so nothing opens

In [5]:
def plot_adjacency_matrix(G: nx.Graph, title: str = "Adjacency Matrix", file_name="adjacency_matrix.png"):
    """
    Exibe a matriz de adjacência do grafo como um heatmap.
    - Mostra rótulos de nós quando o grafo é pequeno (<= 20 nós).
    - Salva a imagem em arquivo.
    """
    # Define ordem estável de nós (tenta ordenar caso comparáveis)
    nodes = list(G.nodes())
    try:
        nodes = sorted(nodes)
    except Exception:
        pass

    A = nx.to_numpy_array(G, nodelist=nodes, dtype=float)

    plt.figure(figsize=(6, 6))
    im = plt.imshow(A, cmap="Blues", interpolation="nearest")
    plt.title(title, fontsize=14)
    plt.xlabel("Nodes")
    plt.ylabel("Nodes")

    n = len(nodes)
    if n <= 20:
        plt.xticks(range(n), nodes, rotation=90)
        plt.yticks(range(n), nodes)
    else:
        plt.xticks([])
        plt.yticks([])

    plt.colorbar(im, fraction=0.046, pad=0.04)
    plt.tight_layout()
    plt.savefig(file_name)   # save instead of show
    plt.close()              # close the figure so nothing opens

## 4. Funções de Cálculo de Métricas

### 4.1 Cálculo de Centralidades

Calcula diversas medidas de centralidade e conectividade, retornando estatísticas resumidas (média, mínimo, máximo, desvio padrão) para métricas por-nó.

In [6]:
def calculate_centralities(G: nx.Graph, measures: dict):
    """
    Apply a set of measures (centralities/connectivities) to a graph.
    measures: dict {label: function}
    Returns: dict with summary statistics
    """
    results = {}
    for label, func in measures.items():
        try:
            if label == "Algebraic Connectivity":
                result = func(G, method="lanczos")
            elif label == "Katz Centrality":
                result = func(G, alpha=0.005, beta=1.0, max_iter=2000)
            elif label == "PageRank":
                result = func(G, alpha=0.85)
            else:
                result = func(G)
            print(f"\n>>> {label}:")
            
            # If result is a dict (per-node values)
            if isinstance(result, dict):
                # Calculate summary statistics
                values = list(result.values())
                average_value = sum(values) / len(values)
                min_value = min(values)
                max_value = max(values)
                std_dev = statistics.pstdev(values)
                
                # Store results for DataFrame
                results[label] = {
                    "Average": average_value,
                    "Minimum": min_value,
                    "Maximum": max_value,
                    "Standard Deviation": std_dev
                }
                print(f"Average_{label}: {average_value:.4f}")
                print(f"Minimum_{label}: {min_value:.4f}")
                print(f"Maximum_{label}: {max_value:.4f}")
                print(f"Standard_Deviation_{label}: {std_dev:.4f}")
            else:
                # Single numeric value
                results[label] = {"Value": result}
                print(f"Value: {result}")
                
        except Exception as e:
            print(f"Error computing {label}: {e}")
            results[label] = {"Error": str(e)}
    return results

### 4.2 Avaliação de Conectividade Básica

Calcula métricas básicas de conectividade:
- Número de componentes conectados
- Tamanho do maior componente
- Diâmetro do grafo (do maior componente)

In [8]:
def evaluate_connectivity(G):
    """
    Evaluate basic connectivity measures of a graph.
    
    Returns:
    - dictionary with:
        * number of connected components
        * size of the largest connected component
        * diameter of the graph (largest component)
    """
    # Number of connected components
    num_components = nx.number_connected_components(G)
    
    # Largest connected component
    components = list(nx.connected_components(G))
    largest_component = max(components, key=len)
    largest_size = len(largest_component)
    
    # Subgraph induced by the largest component
    largest_subgraph = G.subgraph(largest_component)
    
    # Diameter of the largest component
    diameter = nx.diameter(largest_subgraph)
    
    result = {
        "Number of connected components": num_components,
        "Size of largest component": largest_size,
        "Graph diameter": diameter
    }
    print("\n>>> Connectivity Measures:")
    for key, value in result.items():
        print(f"{key}: {value}")
    return result

## 5. Função Principal de Análise

A função `main` processa todos os arquivos `.g6` de uma pasta e:
1. Carrega os grafos
2. Gera visualizações (grafo e matriz de adjacência)
3. Calcula centralidades
4. Avalia conectividade
5. **Cria um DataFrame comparativo** com todas as métricas
6. Exibe resumo final

In [9]:
def main(folder: str):
    """Função principal para processar todos os grafos e gerar análise comparativa."""
    # Dictionary to store the loaded graphs
    graphs = {}

    # Dicionários de medidas de centralidade
    dict_centralities = {
        # Standard centrality measures
        "Degree": nx.degree_centrality,
        "Closeness": nx.closeness_centrality,
        "Betweenness": nx.betweenness_centrality,
        "Eigenvector": nx.eigenvector_centrality,
        
        # Additional centrality measures
        "Katz Centrality": nx.katz_centrality,
        "PageRank": nx.pagerank,
        "Harmonic Centrality": nx.harmonic_centrality,
        "Current-flow Betweenness": nx.current_flow_betweenness_centrality
    }
    
    # Dicionários de medidas de conectividade
    dict_connectivity = {
        # Standard connectivity measures
        "Node Connectivity": nx.node_connectivity,
        "Edge Connectivity": nx.edge_connectivity,
        "Algebraic Connectivity": nx.algebraic_connectivity,

        # Additional connectivity measures
        "Average Node Connectivity": nx.average_node_connectivity,
        "Graph Density": nx.density,
        "Average Shortest Path Length": nx.average_shortest_path_length,
        "Global Clustering Coefficient": nx.transitivity,
        "Minimum Node Cut": nx.minimum_node_cut,
        "Minimum Edge Cut": nx.minimum_edge_cut
    }
    
    # DataFrame para armazenar resultados comparativos
    results_df = pd.DataFrame()

    # Iterates through all files in the folder
    for file in sorted(os.listdir(folder)):
        if file.endswith(".g6"):
            file_path = os.path.join(folder, file)
            print(f"\nProcessing file: {file}")
            try:
                graph_list = load_graphs_from_graph6_file(file_path)
                graphs[file] = graph_list
                
                for i, G in enumerate(graph_list):
                    print(f"\n{'='*80}")
                    print(f"GRAPH {i} from {file}")
                    print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
                    print(f"{'='*80}")
                    
                    #! A) Graph visualization
                    visualize_graph(G, title=f"{file} - graph {i}", 
                                  file_name=f"graph_{file}_graph_{i}.png")
                    
                    #! A2) Adjacency matrix
                    plot_adjacency_matrix(G, title=f"Adjacency Matrix - {file} - graph {i}", 
                                        file_name=f"adjacency_{file}_graph_{i}.png")
                    
                    #! B) Centrality calculations
                    results_centralities = calculate_centralities(G, dict_centralities)
                    
                    #! C) Connectivity evaluation
                    dict_evaluate = evaluate_connectivity(G)
                    
                    #! C2) Additional connectivity measures
                    results_connectivity = calculate_centralities(G, dict_connectivity)

                    # Construir linha do DataFrame
                    new_row = {
                        "File": file,
                        "Graph_Index": i,
                        "Num_Nodes": G.number_of_nodes(),
                        "Num_Edges": G.number_of_edges(),
                        **{f"Centrality_{k}_{stat}": v 
                           for k, stats in results_centralities.items() 
                           for stat, v in stats.items()},
                        **{f"Connectivity_{k}_{stat}": v 
                           for k, stats in results_connectivity.items() 
                           for stat, v in stats.items()},
                        **{k: v for k, v in dict_evaluate.items()}
                    }
                    results_df = pd.concat([results_df, pd.DataFrame([new_row])], ignore_index=True)

            except Exception as e:
                print(f"Failed to read {file}: {e}")

            print(f"\n>>> Finished processing {file}.")

    # Display the loaded graphs information
    print(f"\n{'='*80}")
    print("SUMMARY OF ALL GRAPHS")
    print(f"{'='*80}")
    for name, graph_list in graphs.items():
        total_nodes = sum(G.number_of_nodes() for G in graph_list)
        total_edges = sum(G.number_of_edges() for G in graph_list)
        print(f"{name}: {len(graph_list)} graph(s), {total_nodes} nodes, {total_edges} edges")
    
    return results_df

## 6. Execução da Análise

Execute a célula abaixo para processar todos os grafos e gerar o DataFrame comparativo.

In [10]:
# Executar análise
df_results = main(folder=os.path.join("final_work", "data_base"))


Processing file: graph_1098.g6

GRAPH 0 from graph_1098.g6
Nodes: 112, Edges: 560

>>> Degree:
Average_Degree: 0.0901
Minimum_Degree: 0.0901
Maximum_Degree: 0.0901
Standard_Deviation_Degree: 0.0000

>>> Closeness:
Average_Closeness: 0.3964
Minimum_Closeness: 0.3964
Maximum_Closeness: 0.3964
Standard_Deviation_Closeness: 0.0000

>>> Betweenness:
Average_Betweenness: 0.0138
Minimum_Betweenness: 0.0138
Maximum_Betweenness: 0.0138
Standard_Deviation_Betweenness: 0.0000

>>> Eigenvector:
Average_Eigenvector: 0.0945
Minimum_Eigenvector: 0.0945
Maximum_Eigenvector: 0.0945
Standard_Deviation_Eigenvector: 0.0000

>>> Katz Centrality:
Average_Katz Centrality: 0.0945
Minimum_Katz Centrality: 0.0945
Maximum_Katz Centrality: 0.0945
Standard_Deviation_Katz Centrality: 0.0000

>>> PageRank:
Average_PageRank: 0.0089
Minimum_PageRank: 0.0089
Maximum_PageRank: 0.0089
Standard_Deviation_PageRank: 0.0000

>>> Harmonic Centrality:
Average_Harmonic Centrality: 50.2000
Minimum_Harmonic Centrality: 50.2000
M

## 7. Visualização do DataFrame Comparativo

Visualize as primeiras linhas do DataFrame com todas as métricas calculadas.

In [11]:
# Exibir informações do DataFrame
print(f"Total de grafos analisados: {len(df_results)}")
print(f"Total de colunas (métricas): {len(df_results.columns)}")
print("\nPrimeiras linhas do DataFrame:")
df_results.head()

Total de grafos analisados: 4
Total de colunas (métricas): 48

Primeiras linhas do DataFrame:


Unnamed: 0,File,Graph_Index,Num_Nodes,Num_Edges,Centrality_Degree_Average,Centrality_Degree_Minimum,Centrality_Degree_Maximum,Centrality_Degree_Standard Deviation,Centrality_Closeness_Average,Centrality_Closeness_Minimum,...,Connectivity_Algebraic Connectivity_Value,Connectivity_Average Node Connectivity_Value,Connectivity_Graph Density_Value,Connectivity_Average Shortest Path Length_Value,Connectivity_Global Clustering Coefficient_Value,Connectivity_Minimum Node Cut_Value,Connectivity_Minimum Edge Cut_Value,Number of connected components,Size of largest component,Graph diameter
0,graph_1098.g6,0,112,560,0.09009,0.09009,0.09009,0.0,0.396429,0.396429,...,6.0,10.0,0.09009,2.522523,0.0,"{0, 100, 101, 70, 72, 106, 86, 91, 92, 95}","{(5, 111), (55, 111), (25, 111), (63, 111), (4...",1,112,5
1,graph_1210.g6,0,112,168,0.027027,0.027027,0.027027,0.0,0.214701,0.214286,...,0.438447,3.0,0.027027,4.657658,0.0,"{0, 107, 111}","{(104, 111), (3, 111), (101, 111)}",1,112,8
2,graph_1312.g6,0,120,720,0.10084,0.10084,0.10084,0.0,0.355224,0.355224,...,2.291796,12.0,0.10084,2.815126,0.454545,"{0, 3, 6, 7, 8, 107, 12, 110, 108, 114, 118, 93}","{(103, 99), (75, 99), (114, 99), (87, 99), (10...",1,120,5
3,graph_660_petersen_graph.g6,0,10,15,0.333333,0.333333,0.333333,0.0,0.6,0.6,...,2.0,3.0,0.333333,1.666667,0.0,"{0, 8, 7}","{(9, 5), (7, 5), (1, 5)}",1,10,2


## 8. Análise Exploratória das Métricas

### 8.1 Estatísticas Descritivas

In [12]:
# Estatísticas descritivas de métricas numéricas
df_results.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Graph_Index,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Num_Nodes,4.0,88.5,52.46904,10.0,86.5,112.0,114.0,120.0
Num_Edges,4.0,365.75,329.3209,15.0,129.75,364.0,600.0,720.0
Centrality_Degree_Average,4.0,0.1378227,0.1343456,0.02702703,0.07432432,0.09546521,0.1589636,0.3333333
Centrality_Degree_Minimum,4.0,0.1378227,0.1343456,0.02702703,0.07432432,0.09546521,0.1589636,0.3333333
Centrality_Degree_Maximum,4.0,0.1378227,0.1343456,0.02702703,0.07432432,0.09546521,0.1589636,0.3333333
Centrality_Degree_Standard Deviation,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Centrality_Closeness_Average,4.0,0.3915884,0.1592381,0.214701,0.3200932,0.3758262,0.4473214,0.6
Centrality_Closeness_Minimum,4.0,0.3914845,0.159392,0.2142857,0.3199893,0.3758262,0.4473214,0.6
Centrality_Closeness_Maximum,4.0,0.3916922,0.1590844,0.2151163,0.320197,0.3758262,0.4473214,0.6


### 8.2 Comparação de Densidade entre Grafos

In [13]:
# Comparar densidade dos grafos
if 'Connectivity_Graph Density_Value' in df_results.columns:
    density_comparison = df_results[['File', 'Graph_Index', 'Num_Nodes', 'Num_Edges', 
                                      'Connectivity_Graph Density_Value']].sort_values(
        'Connectivity_Graph Density_Value', ascending=False)
    print("Ranking de Densidade dos Grafos:")
    density_comparison
else:
    print("Coluna de densidade não encontrada.")

Ranking de Densidade dos Grafos:


### 8.3 Exportar Resultados para CSV

In [14]:
# Salvar DataFrame em arquivo CSV
output_path = os.path.join("final_work", "results", "comparative_metrics.csv")
os.makedirs(os.path.dirname(output_path), exist_ok=True)
df_results.to_csv(output_path, index=False)
print(f"Resultados salvos em: {output_path}")

Resultados salvos em: final_work\results\comparative_metrics.csv
