# MQE with GraphSAGE for Robust Graph Learning

- **Authors:** Gad Azriel, Lidor Kupershmid  
- **Supervisors:** Prof. Zeev Volkovich, Dr. Renata Avros  
---

## Notebook Overview:
This notebook implements and compares two approaches for robust graph representation learning:
1. **Original MQE** - Matrix-based multi-hop propagation
2. **MQE + GraphSAGE** - GraphSAGE-based propagation with quality estimation
---

---
#### ***Section 1: Setup & Environment***
---

**Install Dependencies**

In [None]:
# Install PyTorch with CUDA support
!pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121

# Install PyTorch Geometric dependencies
!pip install --no-index torch-scatter -f https://pytorch-geometric.com/whl/torch-2.1.2+cu121.html
!pip install --no-index torch-sparse -f https://pytorch-geometric.com/whl/torch-2.1.2+cu121.html
!pip install --no-index torch-cluster -f https://pytorch-geometric.com/whl/torch-2.1.2+cu121.html
!pip install --no-index torch-spline-conv -f https://pytorch-geometric.com/whl/torch-2.1.2+cu121.html
!pip install torch-geometric

# Install scikit-learn
!pip install scikit-learn

Looking in indexes: https://download.pytorch.org/whl/cu121
ERROR: Could not find a version that satisfies the requirement torch==2.1.2 (from versions: 2.2.0+cu121, 2.2.1+cu121, 2.2.2+cu121, 2.3.0+cu121, 2.3.1+cu121, 2.4.0+cu121, 2.4.1+cu121, 2.5.0+cu121, 2.5.1+cu121)
ERROR: No matching distribution found for torch==2.1.2
Looking in links: https://pytorch-geometric.com/whl/torch-2.1.2+cu121.html
ERROR: Could not find a version that satisfies the requirement torch-scatter (from versions: none)
ERROR: No matching distribution found for torch-scatter
Looking in links: https://pytorch-geometric.com/whl/torch-2.1.2+cu121.html
ERROR: Could not find a version that satisfies the requirement torch-sparse (from versions: none)
ERROR: No matching distribution found for torch-sparse
Looking in links: https://pytorch-geometric.com/whl/torch-2.1.2+cu121.html
ERROR: Could not find a version

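The log above shows that `torch==2.1.2` is not offered on the cu121 index, while the extension wheels are looked up under a URL that embeds the torch version. The following is a minimal sketch of keeping the two in sync. It assumes the URL pattern from the install cell also holds for other versions, which is not verified here. `pyg_wheel_index` is a hypothetical helper, and the version list is copied from the error message.

```python
# Versions pip reported as available on the cu121 index (copied from the log).
AVAILABLE = [
    "2.2.0+cu121", "2.2.1+cu121", "2.2.2+cu121",
    "2.3.0+cu121", "2.3.1+cu121",
    "2.4.0+cu121", "2.4.1+cu121",
    "2.5.0+cu121", "2.5.1+cu121",
]


def pyg_wheel_index(torch_version: str) -> str:
    """Build the find-links URL for the extension wheels (hypothetical helper).

    Assumes the same URL pattern as the install cell. Whether wheels are
    actually published for a given version has to be checked separately.
    """
    if torch_version not in AVAILABLE:
        raise ValueError(f"{torch_version} is not on the cu121 index")
    return f"https://pytorch-geometric.com/whl/torch-{torch_version}.html"


print(pyg_wheel_index("2.2.0+cu121"))
```

The point of the helper is only that the torch pin and the `-f` URL should change together; picking a concrete version is left to the reader.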
**Create Project Structure**

In [None]:
# Create directories
!mkdir -p mqe_graphsage/best_params
%cd mqe_graphsage

/content/mqe_graphsage


**Create utils.py** - Helper Functions

In [None]:
%%writefile utils.py
from scipy.sparse import csc_matrix
from sklearn.preprocessing import MinMaxScaler
import torch_geometric
import numpy as np
from scipy import sparse as sp
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import normalize
import torch
import os.path as osp
from torch_geometric.datasets import Planetoid, Amazon, Coauthor, WikiCS
from torch_geometric.utils import remove_self_loops, add_self_loops
import json


# Purpose: Adds noise to the dataset features to test model robustness
# x: Original feature matrix (torch tensor)
# alpha: Fraction of samples (0 to 1) that receive noise
# noise_type: Either 'uniform' or 'normal' noise distribution
# beta: Intensity (scale) of the noise
# Returns: Polluted data, indices of noisy samples, indices of clean samples
def polluted_feat(x, alpha, noise_type, beta):
    rnd_generator = np.random.RandomState(0)
    data_tmp = x.cpu().numpy()
    print('original', data_tmp.sum())
    num_sample, num_feat = data_tmp.shape[0], data_tmp.shape[1]
    num_noisy_samples = int(alpha * num_sample)
    noisy_indices = rnd_generator.choice(num_sample, num_noisy_samples, replace=False)
    clean_indices = np.setdiff1d(np.arange(num_sample), noisy_indices)
    print('noisy_indices', len(noisy_indices))
    print('clean_indices', len(clean_indices))
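    # Row selection uses the fixed RandomState(0) above; the noise itself is
    # drawn from the global NumPy RNG, which set_seeds() seeds
    if noise_type == 'uniform':
        print('uniform noise !', beta)
        noise = np.random.uniform(0, 1, data_tmp.shape)
    elif noise_type == 'normal':
        print('normal noise !', beta)
        noise = np.random.normal(0, 1, data_tmp.shape)
    else:
        # Fail early instead of hitting a NameError on `noise` below
        raise ValueError(f"Unknown noise_type: {noise_type!r}")
    data_tmp[noisy_indices] += beta * noise[noisy_indices]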
    print('pollted', data_tmp.sum())
    return data_tmp, noisy_indices, clean_indices


# Purpose: Loads previously saved best hyperparameters from experiments
def load_best_params(model, dataset, alpha, noise_type, beta, path='./best_params/'):
    save_file = f'best_results_{model}_{dataset}.json'
    file_path = path + save_file
    try:
        with open(file_path, 'r') as f:
            results = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        print(f"Error: Unable to load results from {file_path}")
        return None
    key_str = f"{alpha}_{beta}_{noise_type}"
    if model in results and key_str in results[model]:
        best_params = results[model][key_str]['best_params']
        return best_params
    else:
        print(f"No best params found for model '{model}', dataset '{dataset}', alpha '{alpha}', noise_type '{noise_type}', beta '{beta}'")
        return None


# Purpose: Loads standard graph datasets (Cora, CiteSeer, etc.)
def load_data(name):
    path = osp.join('./', 'data')
    if name in ['Cora', 'CiteSeer', 'PubMed']:
        dataset = Planetoid(path, name)
    elif name in ['computers', 'photo']:
        dataset = Amazon(path, name)
    elif name in ['cs', 'physics']:
        dataset = Coauthor(path, name)
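    elif name in ['wikics']:
        dataset = WikiCS(path)
    else:
        # Without this branch an unknown name fails later with a NameError
        raise ValueError(f"Unknown dataset name: {name!r}")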

    data = dataset[0]
    data.edge_index = remove_self_loops(data.edge_index)[0]
    data.edge_index = add_self_loops(data.edge_index)[0]
    data.num_classes = torch.max(data.y).item() + 1
    data.x = Norm(data.x)
    adj = edge_index_to_sparse_mx(data.edge_index, data.num_nodes)
    data.A = adj
    adj = process_adj(adj)
    data.adj = adj
    return data


# Purpose: Converts scipy sparse matrices to PyTorch sparse tensors
def sparse_mx_to_torch_sparse_tensor(sparse_mx):
    sparse_mx = sparse_mx.tocoo().astype(np.float32)
    indices = torch.from_numpy(
        np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64))
    values = torch.from_numpy(sparse_mx.data)
    shape = torch.Size(sparse_mx.shape)
    # torch.sparse.FloatTensor is deprecated; sparse_coo_tensor is the current constructor
    return torch.sparse_coo_tensor(indices, values, shape)


# Purpose: Sets random seeds for reproducibility
def set_seeds(seed):
    np.random.seed(seed)
    torch_geometric.seed_everything(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


# Purpose: Normalizes features to range [0,1] or [-1,1]
# Neural networks train better with normalized inputs
def Norm(x, min=0):
    x = x.detach().cpu().numpy()
    if min == 0:
        scaler = MinMaxScaler((0, 1))
    else:
        scaler = MinMaxScaler((-1, 1))
    norm_x = torch.tensor(scaler.fit_transform(x))
    if torch.cuda.is_available():
        norm_x = norm_x.cuda()
    return norm_x


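# Purpose: Performs K-hop feature propagation on the graph
# data: Node feature matrix
# norm_A: Normalized adjacency matrix
# K: Number of hops/iterations
# Returns: List [X, AX, A^2 X, ..., A^K X]
# This is the matrix-based propagation used by the original MQE;
# the GraphSAGE variant uses SAGEConv layers instead.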
def MessagePro(data, norm_A, K):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    data = data.to(device)
    norm_A = norm_A.to(device)
    X_list = [data]
    for _ in range(K):
        X_list.append(torch.spmm(norm_A, X_list[-1]))

    return X_list


# Purpose: Tests embedding quality using logistic regression
def label_classification(X, Y):
    X = normalize(X, norm='l2')
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.8)

    logreg = LogisticRegression(solver='liblinear')
    c = 2.0 ** np.arange(-10, 10)

    clf = GridSearchCV(estimator=OneVsRestClassifier(logreg),
                       param_grid=dict(estimator__C=c), n_jobs=8, cv=5,
                       verbose=0)
    clf.fit(X_train, y_train)
    y_pred_test = clf.predict(X_test)
    acc_test = accuracy_score(y_test, y_pred_test)

    return acc_test * 100


# Purpose: Converts edge list format to adjacency matrix
def edge_index_to_sparse_mx(edge_index, num_nodes):
    edge_weight = np.array([1] * len(edge_index[0]))
    adj = csc_matrix((edge_weight, (edge_index[0], edge_index[1])),
                     shape=(num_nodes, num_nodes)).tolil()
    return adj


# Purpose: Normalizes the adjacency matrix using symmetric normalization
def normalize_adj(adj):
    # Add self-loops
    adj = adj + sp.eye(adj.shape[0])
    # Compute degree matrix
    rowsum = np.array(adj.sum(1))
    # Compute D^{-1/2}
    d_inv_sqrt = np.power(rowsum, -0.5).flatten()
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
    # Compute D^{-1/2}AD^{-1/2}
    d_mat_inv_sqrt = sp.diags(d_inv_sqrt)
    return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt)


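# Purpose: Symmetrizes the adjacency matrix, normalizes it, and converts it to a torch sparse tensor
def process_adj(adj):
    adj.setdiag(1)
    # Symmetrize: keep the larger of adj[i, j] and adj[j, i]
    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)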
    adj = normalize_adj(adj)
    adj = sparse_mx_to_torch_sparse_tensor(adj)
    return adj

Writing utils.py
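To make the roles of `alpha` and `beta` in `polluted_feat` concrete, here is a small NumPy-only sketch of row-wise noise injection. It is a simplified variant, not the function above: `add_row_noise` is a hypothetical name, it only covers Gaussian noise, and it draws the noise from the same seeded generator as the row selection.

```python
import numpy as np


def add_row_noise(x, alpha, beta, seed=0):
    """Add Gaussian noise of scale beta to a random fraction alpha of the rows."""
    rng = np.random.RandomState(seed)
    n = x.shape[0]
    # Pick int(alpha * n) distinct rows to corrupt; the rest stay clean.
    noisy = rng.choice(n, int(alpha * n), replace=False)
    clean = np.setdiff1d(np.arange(n), noisy)
    out = x.astype(float).copy()
    out[noisy] += beta * rng.normal(0, 1, (len(noisy), x.shape[1]))
    return out, noisy, clean


x = np.zeros((6, 3))
x_noisy, noisy_idx, clean_idx = add_row_noise(x, alpha=0.5, beta=0.5)
print("noisy rows:", np.sort(noisy_idx))
print("clean rows unchanged:", np.array_equal(x_noisy[clean_idx], x[clean_idx]))
```

With `alpha=0.5` half of the rows are perturbed and the other half are left untouched, which is the split the robustness experiments rely on.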

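The symmetric normalization in `normalize_adj` and the K-hop propagation in `MessagePro` can be traced by hand on a tiny graph. The sketch below uses dense NumPy arrays instead of sparse tensors; `sym_normalize` and `k_hop_features` are illustrative names. It adds a single self-loop per node, whereas `process_adj` followed by `normalize_adj` ends up with a diagonal weight of 2, so the numbers differ from the notebook's pipeline.

```python
import numpy as np


def sym_normalize(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    adj = adj + np.eye(adj.shape[0])      # add self-loops
    deg = adj.sum(axis=1)                 # degrees, at least 1 after self-loops
    d_inv_sqrt = deg ** -0.5
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def k_hop_features(x, norm_adj, k):
    """Return [X, AX, A^2 X, ..., A^k X] for a normalized adjacency A."""
    feats = [x]
    for _ in range(k):
        feats.append(norm_adj @ feats[-1])
    return feats


# Path graph 0 - 1 - 2 with a one-dimensional feature set on node 0 only.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.0], [0.0]])

hops = k_hop_features(X, sym_normalize(A), k=2)
for k, h in enumerate(hops):
    print(f"hop {k}: {h.ravel().round(3)}")
```

Node 2 is two edges away from node 0, so its feature stays at zero after one hop and only becomes non-zero after the second, which is why each entry of the returned list captures a different neighborhood radius.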

**Create model_graphsage.py** - Neural Network Models

In [None]:
%%writefile model_graphsage.py
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv


# Purpose: The GraphSAGE component that aggregates information from neighbors
class GraphSAGEEncoder(nn.Module):
    def __init__(self, input_dim, hidden_dims, num_layers, aggregator='mean'):
        super(GraphSAGEEncoder, self).__init__()
        self.num_layers = num_layers
        self.convs = nn.ModuleList()

        # First layer
        self.convs.append(SAGEConv(input_dim, hidden_dims[0], aggr=aggregator))

        # Hidden layers
        for i in range(1, num_layers):
            self.convs.append(SAGEConv(hidden_dims[i-1], hidden_dims[i], aggr=aggregator))

    def forward(self, x, edge_index):
        """
        Returns embeddings at each layer (multi-hop features)
        """
        embeddings = [x]  # h^(0) = x

        for i, conv in enumerate(self.convs):
            x = conv(x, edge_index)
            x = F.relu(x)
            embeddings.append(x)  # h^(1), h^(2), ..., h^(L)

        return embeddings


class MQEQualityEstimator(nn.Module):
    """
    Estimates a mean (μ) and a positive scale (σ) per node for feature quality estimation; the loss uses σ² as the variance
    """
    def __init__(self, z_dim, feat_dim, hid_dim):
        super(MQEQualityEstimator, self).__init__()
        # Expectation Network (predicts mean μ)
        self.E_MLP = nn.Sequential(
            nn.Linear(z_dim, hid_dim[0]),
            nn.ReLU(),
            nn.Linear(hid_dim[0], feat_dim)
        )
        # Quality Network (predicts a positive scale σ via Softplus; σ² is the variance)
        self.Q_MLP = nn.Sequential(
            nn.Linear(z_dim, hid_dim[1]),
            nn.ReLU(),
            nn.Linear(hid_dim[1], 1),
            nn.Softplus()
        )

    def forward(self, Z):
        mean = self.E_MLP(Z)
        sigma = self.Q_MLP(Z)
        return mean, sigma


class MQE_GraphSAGE(nn.Module):
    """
    Combined model: GraphSAGE for propagation + MQE for quality estimation
    """
    def __init__(self, input_dim, z_dim, sage_hidden_dims, mqe_hid_dims, num_layers, aggregator='mean'):
        super(MQE_GraphSAGE, self).__init__()

        self.num_layers = num_layers
        self.z_dim = z_dim

        # GraphSAGE encoder
        self.sage_encoder = GraphSAGEEncoder(input_dim, sage_hidden_dims, num_layers, aggregator)

        # MQE quality estimators (one per layer)
        self.mqe_estimators = nn.ModuleList()
        for i in range(num_layers + 1):  # +1 for the initial features
            if i == 0:
                feat_dim = input_dim
            else:
                feat_dim = sage_hidden_dims[i-1]

            estimator = MQEQualityEstimator(z_dim, feat_dim, mqe_hid_dims)
            self.mqe_estimators.append(estimator)

    def forward(self, x, edge_index, Z):
        """
        x: node features
        edge_index: graph structure
        Z: meta-representation for each node

        Returns: embeddings, means, and sigmas for each layer
        """
        # Get multi-hop embeddings from GraphSAGE
        embeddings = self.sage_encoder(x, edge_index)

        # Estimate quality for each layer
        means = []
        sigmas = []

        for i, embedding in enumerate(embeddings):
            mean, sigma = self.mqe_estimators[i](Z)
            means.append(mean)
            sigmas.append(sigma)

        return embeddings, means, sigmas


# Simple MQE model (original, for comparison)
class MQENet(nn.Module):
    def __init__(self, z_dim, feat_dim, hid_dim):
        super(MQENet, self).__init__()
        # Expectation
        self.E_MLP = nn.Sequential(nn.Linear(z_dim, hid_dim[0]), nn.ReLU(),
                                   nn.Linear(hid_dim[0], feat_dim))
        # Quality (positive scale sigma; squared in the loss)
        self.Q_MLP = nn.Sequential(nn.Linear(z_dim, hid_dim[1]), nn.ReLU(),
                                   nn.Linear(hid_dim[1], 1), nn.Softplus())

    def forward(self, Z):
        mean = self.E_MLP(Z)
        sigma = self.Q_MLP(Z)
        return mean, sigma

Writing model_graphsage.py
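`GraphSAGEEncoder` stacks `SAGEConv` layers with a mean aggregator and keeps the output of every layer. As a rough illustration of what one such layer computes, the sketch below implements `relu(x @ w_self + mean_neighbors(x) @ w_neigh)` in plain NumPy. It omits bias terms, uses identity weights for readability, and is not the PyTorch Geometric implementation; `sage_mean_layer` is a hypothetical name.

```python
import numpy as np


def sage_mean_layer(x, edge_index, w_self, w_neigh):
    """One mean-aggregator layer: relu(x @ w_self + mean_neighbors(x) @ w_neigh)."""
    src, dst = edge_index
    n = x.shape[0]
    agg = np.zeros_like(x)
    np.add.at(agg, dst, x[src])                # sum messages arriving at each target
    deg = np.bincount(dst, minlength=n)        # number of incoming edges per node
    agg = agg / np.maximum(deg, 1)[:, None]    # mean; nodes without neighbors keep zeros
    return np.maximum(x @ w_self + agg @ w_neigh, 0.0)


# Undirected path 0 - 1 - 2, stored as directed edges in both directions.
edge_index = np.array([[0, 1, 1, 2],
                       [1, 0, 2, 1]])
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
w = np.eye(2)  # identity weights, chosen only to keep the arithmetic readable

# Keep every layer's output, mirroring the list the encoder returns.
embeddings = [x]
for _ in range(2):
    embeddings.append(sage_mean_layer(embeddings[-1], edge_index, w, w))

for layer, h in enumerate(embeddings):
    print(f"h^({layer}) =\n{h}")
```

Unlike the fixed matrix powers used by the original MQE, the weights here are trainable in the real model, so each hop's representation is learned rather than prescribed.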


**Create train_graphsage.py** - Training Functions

In [None]:
%%writefile train_graphsage.py
import torch
import torch.optim as optim
import torch.nn.functional as F
from model_graphsage import MQENet
from itertools import chain
from torch.nn import ModuleList


def train_mqe_graphsage(dataset, args):
    """
    Train MQE with GraphSAGE-based propagation
    Two-stage approach:
    1. Train GraphSAGE with supervision
    2. Use trained GraphSAGE embeddings for MQE quality estimation
    """
    device = torch.device(args.device if args.cuda else 'cpu')

    from model_graphsage import GraphSAGEEncoder

    # Stage 1: Train GraphSAGE for node classification
    print("Stage 1: Training GraphSAGE encoder...")
    sage_encoder = GraphSAGEEncoder(
        input_dim=args.input_dim,
        hidden_dims=args.sage_hidden_dims,
        num_layers=args.num_hops,
        aggregator=args.aggregator
    ).to(device)

    # Add a classifier on top
    classifier = torch.nn.Linear(args.sage_hidden_dims[-1], dataset.num_classes).to(device)

    # Create a stratified 20%/80% train/val split with a fixed seed
    # (label_classification draws its own split for the final evaluation)
    from sklearn.model_selection import train_test_split
    indices = torch.arange(dataset.num_nodes)
    train_idx, val_idx = train_test_split(
        indices.numpy(),
        test_size=0.8,
        random_state=42,
        stratify=dataset.y.cpu().numpy()
    )
    train_idx = torch.LongTensor(train_idx).to(device)
    val_idx = torch.LongTensor(val_idx).to(device)

    x = dataset.x.to(device)
    edge_index = dataset.edge_index.to(device)
    y = dataset.y.to(device)

    # Train GraphSAGE
    optimizer_sage = optim.Adam(
        list(sage_encoder.parameters()) + list(classifier.parameters()),
        lr=args.lr
    )

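    best_val_acc = 0
    # Patience is counted in validation checks (one every 10 epochs), so with
    # at most 100 epochs this threshold of 20 is never reached
    patience = 20
    patience_counter = 0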

    for epoch in range(1, min(args.epochs, 100) + 1):  # Max 100 epochs for GraphSAGE
        sage_encoder.train()
        classifier.train()
        optimizer_sage.zero_grad()

        # Forward
        embeddings = sage_encoder(x, edge_index)
        final_emb = embeddings[-1]  # Use last layer
        out = classifier(final_emb)

        # Loss only on training nodes
        loss = F.cross_entropy(out[train_idx], y[train_idx])
        loss.backward()
        optimizer_sage.step()

        # Validation
        if epoch % 10 == 0:
            sage_encoder.eval()
            classifier.eval()
            with torch.no_grad():
                embeddings = sage_encoder(x, edge_index)
                out = classifier(embeddings[-1])
                pred = out.argmax(dim=1)
                val_acc = (pred[val_idx] == y[val_idx]).float().mean().item()

                if val_acc > best_val_acc:
                    best_val_acc = val_acc
                    patience_counter = 0
                else:
                    patience_counter += 1

                if epoch % 20 == 0:
                    print(f'  Epoch {epoch}, Loss: {loss.item():.4f}, Val Acc: {val_acc:.4f}')

        if patience_counter >= patience:
            print(f"  Early stopping at epoch {epoch}")
            break

    print(f"Stage 1 complete. Best val acc: {best_val_acc:.4f}")

    # Stage 2: Get trained embeddings and apply MQE
    print("\nStage 2: Applying MQE quality estimation...")
    sage_encoder.eval()
    with torch.no_grad():
        embeddings = sage_encoder(x, edge_index)

    # Now apply MQE to these trained embeddings
    model_list = []
    for i, emb in enumerate(embeddings):
        feat_dim = emb.shape[1]
        model_temp = MQENet(args.z_dim, feat_dim, args.hid_dim).to(device)
        model_list.append(model_temp)

    # Initialize meta-representation Z
    Z = torch.normal(mean=torch.zeros([dataset.num_nodes, args.z_dim]), std=0.01)
    Z = Z.to(device)
    Z.requires_grad_(True)

    model_list = ModuleList(model_list)
    optimizer = optim.Adam(chain(model_list.parameters(), [Z]), lr=args.lr)

    # Training loop for MQE
    for epoch in range(1, args.epochs + 1):
        loss = 0
        model_list.train()

        for v in range(len(embeddings)):
            x_re, sigma = model_list[v](Z)
            re_loss = (x_re - embeddings[v]) ** 2
            re_loss = re_loss.div(2 * sigma ** 2) + torch.log(sigma)
            re_loss = re_loss.mean(1, keepdim=True)
            loss = loss + re_loss.mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if epoch % 50 == 0 or epoch == 1:
            print(f'  Epoch {epoch}/{args.epochs}, Loss: {loss.item():.4f}')

    return Z.detach().cpu().numpy()


def train_mqe_original(dataset, args):
    """
    Train original MQE (matrix-based propagation) for comparison
    """
    device = torch.device(args.device if args.cuda else 'cpu')

    # Create one MQE model per hop
    model_list = []
    X_list = dataset.X_list

    for v in range(len(X_list)):
        feat_dim = X_list[v].shape[1]
        model_temp = MQENet(args.z_dim, feat_dim, args.hid_dim).to(device)
        model_list.append(model_temp)

    # Initialize meta-representation Z
    Z = torch.normal(mean=torch.zeros([dataset.num_nodes, args.z_dim]), std=0.01)
    Z = Z.to(device)
    Z.requires_grad_(True)

    # Optimizer
    model_list = ModuleList(model_list)
    optimizer = optim.Adam(chain(model_list.parameters(), [Z]), lr=args.lr)

    # Training loop
    for epoch in range(1, args.epochs + 1):
        loss = 0
        model_list.train()

        for v in range(len(X_list)):
            x_re, sigma = model_list[v](Z)
            re_loss = (x_re - X_list[v]) ** 2
            re_loss = re_loss.div(2 * sigma ** 2) + torch.log(sigma)
            re_loss = re_loss.mean(1, keepdim=True)
            loss = loss + re_loss.mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if epoch % 50 == 0 or epoch == 1:
            print(f'Epoch {epoch}/{args.epochs}, Loss: {loss.item():.4f}')

    return Z.detach().cpu().numpy()

Writing train_graphsage.py
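Both training loops minimize the same per-node term, `(x_re - x)^2 / (2 * sigma^2) + log(sigma)`, which is a Gaussian negative log-likelihood with the constant `0.5 * log(2 * pi)` dropped. The sketch below evaluates that term for a single node in plain Python to show two consequences: the loss can be negative when `sigma < 1`, and a poorly reconstructed node is cheaper to explain with a larger `sigma`. The function name `node_loss` and the toy values are assumptions for illustration.

```python
import math


def node_loss(x, x_re, sigma):
    """Mean over features of (x_re - x)^2 / (2 sigma^2) + log(sigma)."""
    sq = [(r - t) ** 2 for r, t in zip(x_re, x)]
    return sum(s / (2 * sigma ** 2) + math.log(sigma) for s in sq) / len(sq)


target = [0.0, 0.0, 0.0, 0.0]
good_fit = [0.1, -0.1, 0.1, -0.1]   # small reconstruction error
bad_fit = [1.0, -1.0, 1.0, -1.0]    # large error, as for a heavily corrupted node

for name, recon in [("good fit", good_fit), ("bad fit", bad_fit)]:
    for sigma in (0.1, 1.0):
        print(f"{name}, sigma={sigma}: {node_loss(target, recon, sigma):.3f}")
```

A well-fit node is rewarded for a small `sigma` through the `log(sigma)` term, so negative loss values in the training output are expected rather than a sign of a bug. A badly fit node pays a large squared-error penalty unless `sigma` grows, which is how the learned `sigma` acts as a per-node quality score.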


**Create run_comparison.py** - Experimental Comparison Script

In [None]:
%%writefile run_comparison.py
import torch
import argparse
from utils import *
from train_graphsage import train_mqe_graphsage, train_mqe_original
from sklearn.neighbors import kneighbors_graph
import warnings
import numpy as np
import time


def parameter_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=1000, help='training epochs')
    parser.add_argument('--lr', type=float, default=0.002, help='learning rate')
    parser.add_argument('--z-dim', type=int, default=512, help='meta-representation dimension')
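    # nargs='+' with type=int parses e.g. "--hid-dim 256 64" into [256, 64];
    # type=list would split the raw string into single characters
    parser.add_argument('--hid-dim', type=int, nargs='+', default=[256, 64], help='MQE hidden dims')
    parser.add_argument('--sage-hidden-dims', type=int, nargs='+', default=[128, 128], help='GraphSAGE hidden dims')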
    parser.add_argument('--data-name', type=str, default='Cora', help='dataset name')
    parser.add_argument('--num-hops', type=int, default=2, help='number of hops/layers')
    parser.add_argument('--device', type=str, default='cuda')
    parser.add_argument('--num-neighbor', type=int, default=20)
    parser.add_argument('--alpha', type=float, default=0.5, help='noisy node fraction')
    parser.add_argument('--beta', type=float, default=0.5, help='noise level')
    parser.add_argument('--noise-type', type=str, default='normal', choices=['normal', 'uniform', 'Original'])
    parser.add_argument('--aggregator', type=str, default='mean', choices=['mean', 'max', 'lstm'])
    parser.add_argument('--ntrials', type=int, default=3, help='number of trials')
    args = parser.parse_known_args()[0]
    return args


def get_knn_graph(x, num_neighbor, knn_metric='cosine'):
    adj_knn = kneighbors_graph(x, num_neighbor, metric=knn_metric)
    return adj_knn.tolil()


def run_experiment(use_graphsage=True):
    """
    Run experiment with either MQE+GraphSAGE or Original MQE
    """
    warnings.filterwarnings("ignore")
    set_seeds(0)
    args = parameter_parser()

    args.cuda = torch.cuda.is_available()

    device = torch.device(args.device if args.cuda else 'cpu')

    # Load dataset
    dataset = load_data(args.data_name)
    dataset.ori_x = dataset.x.cpu().clone()

    # Add noise if requested
    if args.noise_type != 'Original':
        data_tmp, noisy_indices, clean_indices = polluted_feat(
            dataset.x, args.alpha, args.noise_type, args.beta
        )
        data_tmp = torch.from_numpy(data_tmp)
        dataset.x = data_tmp
        args.noise_idx = noisy_indices
        args.clean_idx = clean_indices

    dataset.x = dataset.x.to(device)
    args.input_dim = dataset.num_node_features
    args.num_class = dataset.num_classes

    if use_graphsage:
        print("\n" + "="*60)
        print("Running MQE + GraphSAGE")
        print("="*60)
    else:
        print("\n" + "="*60)
        print("Running Original MQE (Matrix-based)")
        print("="*60)
        # Prepare multi-hop features for original MQE
        feat_ = MessagePro(dataset.x, dataset.adj, args.num_hops)
        feat = torch.stack(feat_).sum(dim=0)
        adj_f = get_knn_graph(feat.cpu(), args.num_neighbor, knn_metric='cosine')
        adj_f = process_adj(adj_f)
        x_f = MessagePro(dataset.x, (dataset.adj + adj_f) / 2, args.num_hops)
        dataset.X_list = x_f

    # Run multiple trials
    accs = []
    times = []

    for trial in range(1, args.ntrials + 1):
        print(f"\n--- Trial {trial}/{args.ntrials} ---")
        set_seeds(trial)

        start_time = time.time()

        if use_graphsage:
            H = train_mqe_graphsage(dataset, args)
        else:
            H = train_mqe_original(dataset, args)

        elapsed_time = time.time() - start_time
        times.append(elapsed_time)

        Y = dataset.y.detach().cpu().numpy()
        acc_test = label_classification(H, Y)

        print(f'Accuracy: {acc_test:.2f}%, Time: {elapsed_time:.2f}s')
        accs.append(acc_test)

    avg_acc = round(np.mean(accs), 2)
    std_acc = round(np.std(accs), 2)
    avg_time = round(np.mean(times), 2)

    return avg_acc, std_acc, avg_time


if __name__ == "__main__":
    # Run both approaches
    print("\n" + "PHASE B: COMPARATIVE EXPERIMENTS ".center(60, "="))

    # Approach 1: Original MQE
    acc_orig, std_orig, time_orig = run_experiment(use_graphsage=False)

    # Approach 2: MQE + GraphSAGE
    acc_sage, std_sage, time_sage = run_experiment(use_graphsage=True)

    # Summary
    print("\n" + "="*60)
    print("FINAL RESULTS SUMMARY")
    print("="*60)
    print(f"\n{'Method':<30} {'Accuracy':<15} {'Time (s)':<10}")
    print("-" * 60)
    print(f"{'Original MQE':<30} {acc_orig}\u00b1{std_orig}%{'':<10} {time_orig}")
    print(f"{'MQE + GraphSAGE':<30} {acc_sage}\u00b1{std_sage}%{'':<10} {time_sage}")
    print("="*60)

Writing run_comparison.py
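Options such as `--hid-dim` and `--sage-hidden-dims` hold lists of integers, and how they are declared matters once a value is passed on the command line. The standalone sketch below contrasts `type=list` with `type=int, nargs='+'`; the flag names `--as-list` and `--as-ints` are made up for the comparison.

```python
import argparse

parser = argparse.ArgumentParser()
# type=list applies list() to the raw string, splitting it into characters.
parser.add_argument('--as-list', type=list, default=[256, 64])
# nargs='+' with type=int collects one or more integers into a list.
parser.add_argument('--as-ints', type=int, nargs='+', default=[256, 64])

args = parser.parse_args(['--as-list', '256', '--as-ints', '256', '64'])
print(args.as_list)   # ['2', '5', '6']
print(args.as_ints)   # [256, 64]
```

The defaults behave the same in both cases, so the difference only shows up when the option is overridden.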


---
#### ***Section 2: Output***
---

**Run the Comparison Experiment**

In [None]:
!python run_comparison.py


Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!
original 49216.0
noisy_indices 1354
clean_indices 1354
normal noise ! 0.5
pollted 49499.83

Running Original MQE (Matrix-based)

--- Trial 1/3 ---
Epoch 1/1000, Loss: -0.9347
Epoch 50/1000, Loss: -4.8748
Epoch 100/1000, Loss: -5.0431
Epoch 150/1000, Loss: -5.3075
Epoch 200/1000, Loss: -5.5219
Epoch 250/1000, Loss: -5.6539
Epoch 300/