## Implementing C&S and evaluating on the LastFM Asia dataset

In this Colab, we will load the LastFM Asia dataset, test the performance of a simple MLP, and then refine its predictions using [Correct&Smooth](https://arxiv.org/abs/2010.13993), a recently proposed post-processing algorithm that can improve the performance of baseline classifiers using graph structure.

While the paper includes a codebase, this implementation is redone from scratch following the paper. It is centered around building the two main functions, _correct_, and _smooth_, then measuring and visualizing the performance on our dataset.

In [None]:
# Install required packages, following https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.10.0+cu113.html
%matplotlib inline
import torch; torch.manual_seed(42);
import torch.nn.functional as F

from tqdm import tqdm

# Graph-tools for visualization package (GraphViz)
!echo "deb http://downloads.skewed.de/apt bionic main" >> /etc/apt/sources.list
!apt-key adv --keyserver keys.openpgp.org --recv-key 612DEFB798507F25
!apt-get update
!apt-get install python3-graph-tool python3-cairo python3-matplotlib
from graph_tool.all import *
import graph_tool.draw as draw
import matplotlib.cm as cm




#### Loading datasets and preparing our graph visualizations

We use the builtin LastFMAsia dataset in torch_geometric.datasets (although other datasets should work out of the box with this codebase as well). In a production setting, we would likely be pulling data out of a dataloader in PyTorch, but this data is sufficiently small that representing it as a dense matrix will suffice.

Graph visualization in NetworkX is slow, so I've found the [graph-tool](https://graph-tool.skewed.de/) library and included [graphviz](https://graphviz.org/) bindings to be good for drawing. You will see some helper code in each cell below where we just load the new predictions into our graphviz representation and generate some new viz.

`NUM_EDGE_KEEP` sets the fraction of edges we retain, which lets us test how C&S compares to other models when the graph is not as dense.
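One way to thin out the graph while keeping it undirected is to sample undirected edges and then restore both directions of each kept edge. The sketch below illustrates that idea with NumPy on a toy edge list; `drop_undirected_edges`, `toy_edges`, and the `seed` argument are hypothetical names introduced here, and the sketch assumes every undirected edge appears in both directions and that there are no self-loops.

```python
import numpy as np

def drop_undirected_edges(edge_index, keep_fraction, seed=0):
    """Keep a random fraction of undirected edges, in both directions.

    edge_index: integer array of shape (2, num_directed_edges) in which each
    undirected edge appears as both (u, v) and (v, u).
    """
    rng = np.random.default_rng(seed)
    # Canonicalise every directed edge to (min, max) and deduplicate.
    undirected = np.unique(np.sort(edge_index, axis=0), axis=1)
    n_keep = int(undirected.shape[1] * keep_fraction)
    # Sample without replacement so that no edge is kept twice.
    kept = undirected[:, rng.choice(undirected.shape[1], n_keep, replace=False)]
    # Add the reverse direction back so the adjacency matrix stays symmetric.
    return np.concatenate([kept, kept[::-1]], axis=1)

# Toy 4-cycle: 0-1, 1-2, 2-3, 3-0, stored in both directions.
toy_edges = np.array([[0, 1, 1, 2, 2, 3, 0, 3],
                      [1, 0, 2, 1, 3, 2, 3, 0]])
sparser = drop_undirected_edges(toy_edges, keep_fraction=0.5)
print(sparser)
```

Sampling directed edges independently would instead leave some edges present in only one direction, which makes the dense adjacency matrix asymmetric.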


In [None]:
"""
Standard utility functionality to load and generate dataset splits
"""
import numpy as np
from torch_geometric.datasets import Twitch, LastFMAsia
from torch_geometric.utils import to_dense_adj

graph = LastFMAsia("./data")  # Twitch("./data", "EN")
X = graph.data.x
edge_index = graph.data.edge_index
y = graph.data.y.squeeze()

NUM_EDGE_KEEP = 0.3

# Sample without replacement so that no edge is selected more than once
kept_edges = np.random.choice(edge_index.shape[1], int(edge_index.shape[1] * NUM_EDGE_KEEP), replace=False)
edge_index = edge_index[:, kept_edges]
kept_nodes = list(set(edge_index.flatten().tolist()))

# We use a 0.8 / 0.1 / 0.1 train/val/test split
# For the last section of the blog post, we attempt an experiment with much
# smaller data regime (0.5, 0.25, 0.25)
SPLIT_FRACTIONS = (0.8,  0.1, 0.1)
n_kept = len(kept_nodes)
n_train, n_val = int(SPLIT_FRACTIONS[0] * n_kept), int(SPLIT_FRACTIONS[1] * n_kept)
splits_sizes = (n_train, n_val, n_kept - n_train - n_val)
train_split, val_split, test_split = splits = torch.utils.data.random_split(X[kept_nodes], splits_sizes)
# random_split returns positions within kept_nodes; map them back to original node ids
# so that split.indices can index X, y, and the adjacency matrix directly
for split in splits:
    split.indices = [kept_nodes[i] for i in split.indices]
(X_train, y_train), (X_val, y_val), (X_test, y_test) = [(X[split.indices], y[split.indices]) for split in splits]

num_labels = int(max(y) + 1)
print(f"Dataset: { X_train.shape[0] } training, { X_val.shape[0] } val, { X_test.shape[0] } test samples with { X.shape[1] } dim embeddings")
print(f"{ edge_index.shape[1] } total followerships (edges)")
print(f"{ num_labels } total classes")

# Building graph-tool Graph for visualization
graph_tool_graph = Graph(directed=False)
graph_tool_nodes = []

# We add properties to each graph node that can then be visualized
v_country = graph_tool_graph.new_vertex_property("int")
v_splits = graph_tool_graph.new_vertex_property("int")
for i in range(len(X)):
    v = graph_tool_graph.add_vertex()
    v_country[v] = y[i]
    if i in train_split.indices:
        v_splits[v] = 0
    elif i in val_split.indices:
        v_splits[v] = 1
    elif i in test_split.indices:
        v_splits[v] = 2
    graph_tool_nodes.append(v)
graph_tool_graph.vertex_properties["country"] = v_country
graph_tool_graph.vertex_properties["split"] = v_splits

for e in edge_index.T:
    n1, n2 = [graph_tool_nodes[int(x)] for x in list(e)]
    graph_tool_graph.add_edge(n1, n2)

pos = draw.graphviz_draw(graph_tool_graph, 
                   output="home_country_gt.png", 
                   overlap=False, 
                   size=(30, 30), 
                   vsize=0.3, 
                   vcolor=v_country)

draw.graphviz_draw(graph_tool_graph, 
                   pos=pos,
                   pin=True,
                   output="splits.png", 
                   overlap=False, 
                   size=(30, 30), 
                   vsize=0.3, 
                   vcolor=v_splits)

Dataset: 5179 training, 647 val, 648 test samples with 128 dim embeddings
22244 total followerships (edges)
18 total classes


<VertexPropertyMap object with value type 'vector<double>', for Graph 0x7fa6a340dd50, at 0x7fa6a81dbcd0>

We form the normalized adjacency matrix here for later use.

In [None]:
# Form normalized adjacency matrix S = D^(-1/2)AD^(-1/2)
print("Form normalized adjacency matrix S...")

# Form the dense graph
A = to_dense_adj(edge_index).squeeze()
D = torch.diag(A.sum(-1))
D_inv_sqrt = D.pow(-0.5)

# 0 ** -0.5 evaluates to inf (off-diagonal zeros and isolated nodes), so we zero those entries.
# We follow the correction from the paper codebase at:
# https://github.com/CUAI/CorrectAndSmooth/blob/b910314a59270984f5e249462ee3faa815fc9a0c/outcome_correlation.py#L77
D_inv_sqrt[D_inv_sqrt == float('inf')] = 0
S = D_inv_sqrt @ A @ D_inv_sqrt

Form normalized adjacency matrix S...
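To make the normalization concrete, here is a small NumPy sketch of the same quantity on a toy graph with one isolated node. `normalized_adjacency` and `A_toy` are names made up for this illustration; the sketch works on the degree vector rather than a dense diagonal matrix, which is an implementation choice of the sketch and not of the cell above.

```python
import numpy as np

def normalized_adjacency(A):
    """Return S = D^(-1/2) A D^(-1/2), treating isolated nodes as all-zero rows."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    # Only invert the degrees that are non-zero, so no inf or NaN is produced.
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    # Scaling rows and columns is equivalent to multiplying by the diagonal matrices.
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Node 0 is connected to nodes 1 and 2; node 3 is isolated.
A_toy = np.array([[0, 1, 1, 0],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 0]], dtype=float)
S_toy = normalized_adjacency(A_toy)
print(S_toy)
```

Each non-zero entry is `1 / sqrt(deg(i) * deg(j))`, so `S` stays symmetric and its eigenvalues lie in [-1, 1], which is what lets the propagation steps later in the notebook converge for alpha below 1.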


#### Step 1: Training a base predictor
The first step is to acquire a base classifier model that can output a probability distribution over the classes. We train a shallow MLP in PyTorch:

In [None]:
class BasePredictor(torch.nn.Module):
    """
    A simple MLP class to serve as the base predictor
    """
    def __init__(self, n_hidden_layers=1, in_size=128, hidden_size=64, out_size=1):
        super(BasePredictor, self).__init__()
        if n_hidden_layers == 0:
            self.net = torch.nn.Linear(in_size, out_size)
        else:
            net  = [torch.nn.Linear(in_size, hidden_size), torch.nn.ReLU()]
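            for _ in range(n_hidden_layers - 1):
                # Create fresh modules per layer; multiplying the list would reuse
                # one Linear instance and silently share its weights across layers
                net += [torch.nn.Linear(hidden_size, hidden_size), torch.nn.ReLU()]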
            net += [torch.nn.Linear(hidden_size, out_size)]
            self.net = torch.nn.Sequential(*net)

    def forward(self, X):
        out = self.net(X)
        return out.squeeze()
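The network above returns raw logits; the probability distribution over classes is obtained afterwards with a softmax. As a minimal NumPy sketch of that conversion (the function name `softmax` and the toy logits are illustrative only):

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

toy_logits = np.array([[1000.0, 1000.0],      # large logits would overflow without the shift
                       [0.0, np.log(3.0)]])   # odds of 1:3 between the two classes
probs = softmax(toy_logits)
print(probs)
```

Each row sums to 1, which is the property the Correct and Smooth steps rely on when they compare predictions against one-hot labels.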

The MLP achieves ~70% accuracy on the validation and test sets. While nodes deep in each class cluster are classified consistently, the model makes errors on users with friends in different countries (cluster borders). You can see this in `home_country_mlp_pred.png`.

In [None]:
"""
Step 1: Training the base predictor using the per-node embeddings
"""
net = BasePredictor(in_size=X.shape[1], n_hidden_layers=1, out_size=num_labels)
if num_labels > 1:
    loss = torch.nn.CrossEntropyLoss()
else:
    loss = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(net.parameters())

def train(X, y):
    optimizer.zero_grad()
    yhat = net(X)
    l = loss(yhat, y)
    l.backward()
    optimizer.step()
    return l

NUM_EPOCHS = 500

pbar = tqdm(range(NUM_EPOCHS))
for ep in pbar:
    l = train(X_train, y_train)
    pred = torch.argmax(net(X_val), -1)
    pbar.set_postfix({'loss': float(l), "val_acc": float(torch.sum(pred == y_val) / len(pred))})

# Visualize the MLP predictions on validation and test sets
pred_labels = torch.argmax(net(X), -1)
v_country_mlp_pred = graph_tool_graph.new_vertex_property("int")
for i, v in enumerate(graph_tool_nodes):
    v_country_mlp_pred[v] = pred_labels[i]

# graph_tool_graph.vertex_properties["mlp_pred"] = v_country_mlp_pred

# draw.graphviz_draw(graph_tool_graph, 
#                    output="home_country_mlp_pred.png", 
#                    pos=pos,
#                    pin=True,
#                    overlap=False, 
#                    size=(30, 30), 
#                    vsize=0.3, 
#                    vcolor=v_country_mlp_pred)

100%|██████████| 500/500 [00:06<00:00, 76.99it/s, loss=0.411, val_acc=0.7]


#### Step 2: Correct

We first form a matrix with each row equal to the residual error between the (one-hot encoded) labels `Y` and the predicted class distributions `Z` for the training nodes only (by setting the error at val and test indices to 0).

In [None]:
"""
Step 2 (2.2): Correcting for error in base predictions with residual propagation
"""
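with torch.no_grad():
    Z = torch.softmax(net(X), -1)
Y = F.one_hot(y, num_labels)

def residual_error(Z):
    """
    Form E, the residual error matrix Y - Z on the training nodes (zero elsewhere).
    The sign is chosen so that adding the residual to Z moves it toward the labels.
    """
    E = torch.zeros_like(Z)
    E[train_split.indices] = Y[train_split.indices] - Z[train_split.indices]
    return E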

E = residual_error(Z)
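A tiny worked example may help here. The NumPy sketch below builds the residual matrix for three nodes and two classes, with only the first two nodes in the training set. It assumes the sign convention "label minus prediction", so that adding the residual to a prediction moves it toward the label; `residual_matrix` and the toy arrays are hypothetical names for this illustration.

```python
import numpy as np

def residual_matrix(Z, labels, train_idx, num_classes):
    """Residual Y - Z on training rows, zero on every other row."""
    Y = np.eye(num_classes)[labels]          # one-hot encode the integer labels
    E = np.zeros_like(Z)
    E[train_idx] = Y[train_idx] - Z[train_idx]
    return E

Z_toy = np.array([[0.7, 0.3],    # training node, true class 0
                  [0.4, 0.6],    # training node, true class 0 (misclassified)
                  [0.5, 0.5]])   # held-out node, its label must not leak
labels_toy = np.array([0, 0, 1])
E_toy = residual_matrix(Z_toy, labels_toy, train_idx=[0, 1], num_classes=2)
print(E_toy)
```

Because both `Y` and `Z` have rows that sum to 1, every row of the residual sums to 0, and the held-out row stays exactly zero.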

Next we "smooth" the error across the graph in `correct(...)`. Due to homophily, we expect errors to be positively correlated for neighboring nodes, so for validation/test nodes, errors on neighboring training nodes can be predictive of the real error.
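The propagation is a fixed-point iteration, and for alpha below 1 its limit can also be written in closed form as `(1 - alpha) * (I - alpha * S)^(-1) @ E`. The sketch below checks this on a three-node path graph using NumPy; `propagate`, the toy matrices, and the `max_iter` safeguard are assumptions of the sketch and not part of the notebook's code.

```python
import numpy as np

def propagate(S, E, alpha=0.8, eps=1e-10, max_iter=10_000):
    """Iterate E_hat <- (1 - alpha) * E + alpha * S @ E_hat until it stops changing."""
    E_hat = E.copy()
    for _ in range(max_iter):
        E_next = (1 - alpha) * E + alpha * (S @ E_hat)
        if np.linalg.norm(E_next - E_hat) < eps:
            return E_next
        E_hat = E_next
    return E_hat

# Path graph 0 - 1 - 2 and its normalized adjacency matrix.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
d_inv_sqrt = A.sum(axis=1) ** -0.5
S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Only node 0 is a training node with a known residual.
E = np.array([[0.3, -0.3],
              [0.0, 0.0],
              [0.0, 0.0]])

E_iter = propagate(S, E, alpha=0.8)
E_closed = (1 - 0.8) * np.linalg.solve(np.eye(3) - 0.8 * S, E)
print(E_iter)
```

Node 1 has no residual of its own, yet after propagation it inherits a residual with the same sign as its training neighbor, which is the homophily assumption at work.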

After smoothing the error, C&S rescales the propagated errors to the same scale as the original training errors in `autoscale(...)`. Adding the scaled residuals back to the original predictions gives us a new prediction matrix Zr.

In [None]:
def correct(E, alpha1 = 0.8, eps = 1e-5, verbose=True, viz=False):
    """

    Propagate the residual error E over the graph until the update is smaller than eps:
    E^(t+1) = (1 - alpha1) * E + alpha1 * S @ E^(t) -> Ehat
    """
    if verbose:
        pbar = tqdm(total=float('inf'))

    Ehat = E
    diff = eps
    itr = 0
    while diff >= eps:
        # This is the iterative update step
        Et = (1 - alpha1) * E + alpha1 * (S @ Ehat)
        diff = float(torch.norm(Ehat - Et))
        Ehat = Et

        if verbose:
            pbar.update(1)
            pbar.set_postfix({ 'diff': diff })

        if viz and itr % 10 == 0:
            v_Zr = graph_tool_graph.new_vertex_property("float")
            max_err = torch.max(Ehat, -1).values
            for i, v in enumerate(graph_tool_nodes):
                v_Zr[v] = max_err[i]

            graph_tool_graph.vertex_properties["Zr"] = v_Zr

            draw.graphviz_draw(graph_tool_graph, 
                                output=f"zr_e_{itr}.png", 
                                pos=pos,
                                pin=True,
                                overlap=False, 
                                size=(30, 30), 
                                vsize=0.3, 
                                vcolor=v_Zr,
                                vcmap=cm.get_cmap("inferno")
                               )
        itr += 1
    return Ehat

Ehat = correct(E, viz=False)


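def autoscale(E, Ehat, Z):
    """
    sigma = sum of absolute value of E for each training sample / num training samples
    Each row of Ehat is rescaled to have L1 norm sigma before it is added to Z.
    """
    sigma = float(E[train_split.indices].abs().sum(-1).mean())
    # Per-node L1 norm; the clamp avoids dividing by zero for nodes that received no error
    row_norm = Ehat.abs().sum(-1, keepdim=True).clamp_min(1e-12)
    Zr = Z + sigma * Ehat / row_norm
    Zr[train_split.indices] = Z[train_split.indices]
    return Zr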

Zr = autoscale(E, Ehat, Z)
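As I read the paper, the scaling is applied per node: each row of the propagated error is divided by its own L1 norm and multiplied by sigma, the average L1 norm of the training residuals. The following NumPy sketch shows that row-wise reading on toy numbers; `autoscale_rowwise` and the toy arrays are hypothetical, and the zero-norm guard is an assumption added so that nodes with no propagated error are left untouched.

```python
import numpy as np

def autoscale_rowwise(Z, E, E_hat, train_idx):
    """Add the propagated error to Z after rescaling every row to L1 norm sigma."""
    # sigma: average L1 norm of the residual over training nodes.
    sigma = np.abs(E[train_idx]).sum(axis=1).mean()
    row_norm = np.abs(E_hat).sum(axis=1, keepdims=True)
    # Rows with zero norm carry no correction; divide by 1 there to avoid NaN.
    safe_norm = np.where(row_norm > 0, row_norm, 1.0)
    Zr = Z + sigma * E_hat / safe_norm
    # Training rows keep their original predictions.
    Zr[train_idx] = Z[train_idx]
    return Zr

Z_toy = np.array([[0.70, 0.30],
                  [0.45, 0.55],
                  [0.50, 0.50]])
E_toy = np.array([[0.3, -0.3],      # residual of the single training node
                  [0.0, 0.0],
                  [0.0, 0.0]])
E_hat_toy = np.array([[0.20, -0.20],
                      [0.05, -0.05],  # small propagated error on a neighbor
                      [0.00, 0.00]])  # unreachable node, no correction
Zr_toy = autoscale_rowwise(Z_toy, E_toy, E_hat_toy, train_idx=[0])
print(Zr_toy)
```

Here sigma is 0.6, so the neighbor's small propagated error is scaled up to `[0.3, -0.3]` and its prediction flips from class 1 to class 0. Normalizing by a sum taken over all nodes would instead shrink each node's correction as the graph grows.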


0it [00:00, ?it/s][A
1it [00:00,  8.24it/s][A
1it [00:00,  8.24it/s, diff=30.6][A
2it [00:00,  7.31it/s, diff=30.6][A
2it [00:00,  7.31it/s, diff=11.7][A
3it [00:00,  6.62it/s, diff=11.7][A
3it [00:00,  6.62it/s, diff=6.38][A
4it [00:00,  7.08it/s, diff=6.38][A
4it [00:00,  7.08it/s, diff=4.03][A
5it [00:00,  6.20it/s, diff=4.03][A
5it [00:00,  6.20it/s, diff=2.84][A
6it [00:00,  5.63it/s, diff=2.84][A
6it [00:00,  5.63it/s, diff=2.13][A
7it [00:01,  4.32it/s, diff=2.13][A
7it [00:01,  4.32it/s, diff=1.64][A
8it [00:01,  4.49it/s, diff=1.64][A
8it [00:01,  4.49it/s, diff=1.28][A
9it [00:01,  5.26it/s, diff=1.28][A
9it [00:01,  5.26it/s, diff=1.01][A
10it [00:01,  5.26it/s, diff=0.796][A
11it [00:01,  7.22it/s, diff=0.796][A
11it [00:01,  7.22it/s, diff=0.632][A
12it [00:01,  7.22it/s, diff=0.502][A
13it [00:01,  8.57it/s, diff=0.502][A
13it [00:01,  8.57it/s, diff=0.4]  [A
14it [00:02,  8.57it/s, diff=0.319][A
15it [00:02,  9.73it/s, diff=0.319][A
15it [00:0

#### Step 3: Smooth
In the Correct step, we smoothed errors over adjacent nodes. In the Smooth step, we smooth the predictions themselves across adjacent nodes, following the same intuition. The smoothing operation is identical to the error propagation, this time iterating over our best-guess matrix G, initialized to the true labels on training nodes and the scaled predictions Zr everywhere else.

In [None]:
"""
Step 3: Smoothing final predictions with prediction correlation
"""
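# Best guesses G: true labels on training nodes,
# corrected predictions Zr on validation and test nodes
G = Zr.clone()  # clone so that Zr itself is not overwritten below
G[train_split.indices] = Y[train_split.indices].type(torch.float32)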

def smooth(G, alpha2=0.8, eps=1e-5, verbose=True, viz=False):
    # G^(t+1) = (1 - alpha2) * G + alpha2 * S @ G^(t) -> Yhat
    if verbose:
        pbar = tqdm(total=float('inf'))

    yhat = G
    diff = eps
    itr = 0
    while diff >= eps:
        Gt = (1 - alpha2) * G + alpha2 * (S @ yhat)
        diff = float(torch.norm(yhat - Gt))
        yhat = Gt
        if verbose:
            pbar.update(1)
            pbar.set_postfix({ 'diff': diff })
        if viz and itr % 10 == 0:
            preds = torch.argmax(yhat, -1)
            v_yhat = graph_tool_graph.new_vertex_property("int")
            for i, v in enumerate(graph_tool_nodes):
                v_yhat[v] = preds[i]

            graph_tool_graph.vertex_properties["smooth"] = v_yhat

            draw.graphviz_draw(graph_tool_graph, 
                                output=f"yhat_smooth_{itr}.png", 
                                pos=pos,
                                pin=True,
                                overlap=False, 
                                size=(30, 30), 
                                vsize=0.3, 
                                vcolor=v_yhat)
        itr += 1
    return yhat

yhat = smooth(G, alpha2=0.7666, viz=False)
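To see what smoothing does to uncertain nodes, consider a four-node path graph whose two end nodes are labeled and whose two middle nodes start with uniform guesses. The NumPy sketch below runs the same update on that toy graph; `smooth_labels`, the toy matrices, and the `max_iter` safeguard are assumptions made for the illustration.

```python
import numpy as np

def smooth_labels(S, G, alpha=0.8, eps=1e-10, max_iter=10_000):
    """Iterate Y_hat <- (1 - alpha) * G + alpha * S @ Y_hat until it stops changing."""
    Y_hat = G.copy()
    for _ in range(max_iter):
        Y_next = (1 - alpha) * G + alpha * (S @ Y_hat)
        if np.linalg.norm(Y_next - Y_hat) < eps:
            return Y_next
        Y_hat = Y_next
    return Y_hat

# Path graph 0 - 1 - 2 - 3 and its normalized adjacency matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
S = A / np.sqrt(np.outer(deg, deg))

# Best guesses: node 0 is known to be class 0, node 3 is known to be class 1,
# and the two middle nodes start out undecided.
G = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.5, 0.5],
              [0.0, 1.0]])

preds = smooth_labels(S, G).argmax(axis=1)
print(preds)
```

Each undecided node ends up with the class of the labeled node it is closest to. The smoothed rows no longer sum to 1, but only the argmax is used for prediction.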


0it [00:00, ?it/s][A
1it [00:00, 15.66it/s, diff=50.6][A
2it [00:00, 15.07it/s, diff=50.6][A
2it [00:00, 15.07it/s, diff=17.4][A
3it [00:00, 15.07it/s, diff=8.55][A
4it [00:00, 13.10it/s, diff=8.55][A
4it [00:00, 13.10it/s, diff=5.07][A
5it [00:00, 13.10it/s, diff=3.31][A
6it [00:00, 13.02it/s, diff=3.31][A
6it [00:00, 13.02it/s, diff=2.36][A
7it [00:00, 13.02it/s, diff=1.72][A
8it [00:00, 12.99it/s, diff=1.72][A
8it [00:00, 12.99it/s, diff=1.29][A
9it [00:00, 12.99it/s, diff=0.965][A
10it [00:00, 12.82it/s, diff=0.965][A
10it [00:00, 12.82it/s, diff=0.731][A
11it [00:00, 12.82it/s, diff=0.554][A
12it [00:00, 12.41it/s, diff=0.554][A
12it [00:00, 12.41it/s, diff=0.422][A
13it [00:01, 12.41it/s, diff=0.321][A
14it [00:01, 12.12it/s, diff=0.321][A
14it [00:01, 12.12it/s, diff=0.245][A
15it [00:01, 12.12it/s, diff=0.187][A
16it [00:01, 12.30it/s, diff=0.187][A
16it [00:01, 12.30it/s, diff=0.143][A
17it [00:01, 12.30it/s, diff=0.109][A
18it [00:01, 12.63it/s, dif

The paper makes it clear that hyperparameter tuning is vital to this method. We implement a simple sweep here that runs a grid search sequentially. Serious implementations should consider parallelizing this search over a wider space.

In [None]:
from numpy import linspace
net.train(False)

def correct_and_smooth(E, Z, y, alpha1=0.4, alpha2=0.4):
    """
    Full pipeline for C&S
    """
    Ehat = correct(E, alpha1=alpha1, verbose=False, eps=1e-4)
    G = autoscale(E, Ehat, Z)
    G[train_split.indices] = Y[train_split.indices].type(torch.float32)
    yhat = smooth(G, alpha2=alpha2, verbose=False, eps=1e-4)
    return yhat

def hyperparameter_sweep(model, X, y, alpha1s, alpha2s):
    """
    We test val accuracy over a grid search of alpha1 and alpha2 and return
    the results as a list of (val_acc, (alpha1, alpha2)) for each run.
    """
    results = []
    # Softmax (not sigmoid) so that Z matches the multi-class base predictions used above
    Z = torch.softmax(model(X), -1).detach()
    E = residual_error(Z)
    with tqdm(total=len(alpha1s) * len(alpha2s)) as pbar:
        for alpha1 in alpha1s:
            for alpha2 in alpha2s:
                yhat = correct_and_smooth(E, Z, y, alpha1, alpha2)
                pred = torch.argmax(yhat, -1)
                val_acc = torch.mean((pred[val_split.indices] == y[val_split.indices]).type(torch.float32))
                results.append([float(val_acc), (alpha1, alpha2), yhat])
                pbar.update(1)
    return results


alpha1s, alpha2s = linspace(0.1, 0.9, 5), linspace(0.1, 0.9, 5)
sweep = sorted(hyperparameter_sweep(net, X, y, alpha1s, alpha2s))
display(f"Max val acc: { sweep[-1][0] } with hparams: { sweep[-1][1] }")



'Max val acc: 0.7743431329727173 with hparams: (0.9, 0.9)'
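As a rough illustration of how the grid search could be parallelized, the sketch below evaluates every `(alpha1, alpha2)` pair in a thread pool using only the standard library. `sweep_grid` and `toy_score` are hypothetical; `toy_score` is a stand-in for "run C&S with these alphas and return validation accuracy", and whether threads actually speed things up depends on how much of the work releases the interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def sweep_grid(score_fn, alpha1s, alpha2s, max_workers=4):
    """Score every (alpha1, alpha2) pair and return (score, pair) tuples, best first."""
    grid = list(product(alpha1s, alpha2s))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of grid, so scores line up with pairs.
        scores = list(pool.map(lambda pair: score_fn(*pair), grid))
    return sorted(zip(scores, grid), reverse=True)

def toy_score(alpha1, alpha2):
    # Hypothetical stand-in for validation accuracy, peaked at (0.5, 0.7).
    return 1.0 - (alpha1 - 0.5) ** 2 - (alpha2 - 0.7) ** 2

alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
ranked = sweep_grid(toy_score, alphas, alphas)
best_score, best_alphas = ranked[0]
print(best_score, best_alphas)
```

Swapping `toy_score` for a function that calls the C&S pipeline and returns validation accuracy would give the same ranking interface as the sequential sweep.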

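Below we summarize the model performance. In the recorded run, the MLP alone reaches about 70% accuracy, C&S with the default alphas reaches about 75% on validation and 76% on test, and the best configuration from the sweep reaches 77% on validation and 80% on test. The Correct step on its own leaves accuracy unchanged from the MLP in this run. Clearly, there are sizable gains to be had from including the graph structure in this predictive task. The two alpha hyperparameters also matter a great deal for the smoothing steps, so run a hyperparameter sweep if you choose to implement this method!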

In [None]:
yhat_mlp = torch.argmax(net(X), -1)
print(f"Val  accuracy MLP: { torch.mean((yhat_mlp[val_split.indices] == y[val_split.indices]).type(torch.float32)) }")
print(f"Test accuracy MLP: { torch.mean((yhat_mlp[test_split.indices] == y[test_split.indices]).type(torch.float32)) }\n")

yhat_correct = torch.argmax(G, -1)
print(f"Val  accuracy Correct: { torch.mean((yhat_correct[val_split.indices] == y[val_split.indices]).type(torch.float32)) }")
print(f"Test accuracy Correct: { torch.mean((yhat_correct[test_split.indices] == y[test_split.indices]).type(torch.float32)) }\n")

yhat_cs = torch.argmax(correct_and_smooth(E, Z, y), -1)
print(f"Val  accuracy Correct&Smooth: { torch.mean((yhat_cs[val_split.indices] == y[val_split.indices]).type(torch.float32)) }")
print(f"Test accuracy Correct&Smooth: { torch.mean((yhat_cs[test_split.indices] == y[test_split.indices]).type(torch.float32)) }\n")

yhat_cs_sweep = torch.argmax(sorted(sweep)[-1][2], -1)
print(f"Val  accuracy Correct&Smooth Sweep: { torch.mean((yhat_cs_sweep[val_split.indices] == y[val_split.indices]).type(torch.float32)) }")
print(f"Test accuracy Correct&Smooth Sweep: { torch.mean((yhat_cs_sweep[test_split.indices] == y[test_split.indices]).type(torch.float32)) }\n")

Val  accuracy MLP: 0.7001545429229736
Test accuracy MLP: 0.6975308656692505

Val  accuracy Correct: 0.7001545429229736
Test accuracy Correct: 0.6975308656692505

Val  accuracy Correct&Smooth: 0.7465224266052246
Test accuracy Correct&Smooth: 0.7577160596847534

Val  accuracy Correct&Smooth Sweep: 0.7743431329727173
Test accuracy Correct&Smooth Sweep: 0.8040123581886292
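The accuracy expression is repeated for every model and split, so it could be factored into a small helper. A minimal NumPy sketch of such a helper follows; `masked_accuracy` and the toy arrays are hypothetical names, not part of the notebook.

```python
import numpy as np

def masked_accuracy(scores, labels, idx):
    """Fraction of nodes in idx whose highest-scoring class matches the label."""
    preds = np.argmax(scores, axis=-1)
    return float(np.mean(preds[idx] == labels[idx]))

toy_scores = np.array([[0.9, 0.1],
                       [0.2, 0.8],
                       [0.6, 0.4],
                       [0.3, 0.7]])
toy_labels = np.array([0, 1, 1, 1])
val_idx = [2, 3]   # evaluate on the last two nodes only
acc = masked_accuracy(toy_scores, toy_labels, val_idx)
print(acc)
```

Node 2 is predicted as class 0 but labeled 1, while node 3 is correct, so the accuracy on this toy split is one half.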



#### Explaining C&S
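A roughly 10-point increase in test accuracy (from about 70% to about 80%) deserves some scrutiny: why does C&S perform so well here? Dense graphs lend themselves well to classical smoothing approaches. To gauge how much the graph structure contributes by itself, the cell below propagates only the training labels over the graph, with no base predictor, using the same update rule as before. On its own this reaches about 60% validation and 63% test accuracy, below the MLP.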

In [None]:
Y_simple = Y.clone().type(torch.float)
Y_simple[val_split.indices + test_split.indices] = 0

def simple_smooth(Y_simple, alpha_simple = 0.8, eps = 1e-5, verbose=True):
    if verbose:
        pbar = tqdm(total=float('inf'))

    Yhat_simple = Y_simple
    diff = eps
    itr = 0
    while diff >= eps:
        # This is the iterative update step
        Yhat_t = (1 - alpha_simple) * Y_simple + alpha_simple * (S @ Yhat_simple)
        diff = float(torch.norm(Yhat_simple - Yhat_t))
        Yhat_simple = Yhat_t

        if verbose:
            pbar.update(1)
            pbar.set_postfix({ 'diff': diff })

    return Yhat_simple

Yhat_simple = simple_smooth(Y_simple, verbose=False)
yhat_simple_correct = torch.argmax(Yhat_simple, -1)
print(f"Val  accuracy Correct: { torch.mean((yhat_simple_correct[val_split.indices] == y[val_split.indices]).type(torch.float32)) }")
print(f"Test accuracy Correct: { torch.mean((yhat_simple_correct[test_split.indices] == y[test_split.indices]).type(torch.float32)) }\n")

Val  accuracy Correct: 0.5950540900230408
Test accuracy Correct: 0.6280864477157593

