# Final Project: Activation Functions in Graph Neural Networks

This project is a continuation of my Senior Capstone Project, "Exploring the Role of Activation Functions in Deep Learning". That research project examined activation functions only in the context of image classification; this notebook applies several activation functions to Graph Neural Networks (GNNs). Specifically, we try to classify molecules (represented as graphs) by their odors, so the notebook represents a pivot from image classification to molecular classification.

We use the Deep Graph Library (DGL) for our Graph Convolutional Neural Networks (GCNNs).

In [1]:
# pip install dgl

The Open Graph Benchmark (OGB) package contains helpful data loaders for pre-processing and splitting graph data.

In [2]:
# pip install ogb

RDKit is a cheminformatics library that lets us parse the input data (SMILES strings) into RDKit molecule objects, from which we then build DGL graphs.

In [3]:
# pip install rdkit-pypi

In [4]:
# pip install torchvision

In [5]:
# Imports
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# import pytorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import SGD,Adam,lr_scheduler
from torch.utils.data import random_split
import torchvision
from torchvision import transforms, datasets
from torch.utils.data import DataLoader
from torch.nn.parameter import Parameter

# Import graph library
import dgl
from dgl.data import DGLDataset
import os
from ogb.utils.features import (allowable_features, atom_to_feature_vector,
                                bond_to_feature_vector, atom_feature_vector_to_dict,
                                bond_feature_vector_to_dict)
from rdkit import Chem

Using backend: pytorch


## Dataset

Next, we load our training data. The dataset for this experiment was obtained from the AIcrowd Learning to Smell Challenge, which posed essentially the same task: identifying the odors of chemical molecules. This experiment differs in that it compares several activation functions on that task. The training data is formatted as a CSV file with the following two columns:


1.   SMILES (Simplified Molecular-Input Line-Entry System): A line notation that describes the chemical structure of a molecule as a short ASCII string
2.   SENTENCE: A comma-separated string of odor descriptors

The SENTENCE values are the targets. There are 4,316 molecules and over 100 different scents.

In [6]:
# Read in the training data and view the first 5 rows
df = pd.read_csv('./train.csv')
df.head()

Unnamed: 0,SMILES,SENTENCE
0,C/C=C/C(=O)C1CCC(C=C1C)(C)C,"fruity,rose"
1,COC(=O)OC,"fresh,ethereal,fruity"
2,Cc1cc2c([nH]1)cccc2,"resinous,animalic"
3,C1CCCCCCCC(=O)CCCCCCC1,"powdery,musk,animalic"
4,CC(CC(=O)OC1CC2C(C1(C)CC2)(C)C)C,"coniferous,camphor,fruity"


## Activation Functions

Here is the code for the Swish and Mish activation functions (Mish is a modified version of Swish).

In [7]:
# implement swish activation function
def f_swish(input):
    '''
    Applies the swish function element-wise:
    swish(x) = x * sigmoid(x)
    '''
    return input * torch.sigmoid(input)

# implement class wrapper for swish activation function
class swish(nn.Module):
    '''
    Applies the swish function element-wise:
    swish(x) = x * sigmoid(x)

    Shape:
        - Input: (N, *) where * means, any number of additional
          dimensions
        - Output: (N, *), same shape as the input

    Examples:
        >>> m = swish()
        >>> input = torch.randn(2)
        >>> output = m(input)

    '''
    def __init__(self):
        '''
        Init method.
        '''
        super().__init__()

    def forward(self, input):
        '''
        Forward pass of the function.
        '''
        return f_swish(input)

In [8]:
# implement mish activation function
def f_mish(input):
    '''
    Applies the mish function element-wise:
    mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + exp(x)))
    '''
    return input * torch.tanh(F.softplus(input))

# implement class wrapper for mish activation function
class mish(nn.Module):
    '''
    Applies the mish function element-wise:
    mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + exp(x)))

    Shape:
        - Input: (N, *) where * means, any number of additional
          dimensions
        - Output: (N, *), same shape as the input

    Examples:
        >>> m = mish()
        >>> input = torch.randn(2)
        >>> output = m(input)

    '''
    def __init__(self):
        '''
        Init method.
        '''
        super().__init__()

    def forward(self, input):
        '''
        Forward pass of the function.
        '''
        return f_mish(input)

Here is the code for the custom activation functions, TAct and mTAct. Both have two tunable parameters, `alpha` and `beta`, and are designed to interpolate between ReLU, Swish, and Tanh.

In [9]:
def f_mtact(input, alpha, beta, inplace = False):
    '''
    Applies the mtact function element-wise:
    mtact(x) = (A*x + B) * (tanh(C*x) + 1),
    where A = alpha**2 / 2, B = 1/2 - A, and C = (1 + beta**2) / 2
    '''
    A = 0.5*(alpha**2)
    B = 0.5 - A
    #B=(1-alpha**2)/2
    #C = (1+beta**2)/2
    C = 0.5*(1+beta**2)

    return (A*input + B)*(torch.tanh(C*input)+1)

In [10]:
def f_tact(input, alpha, beta, inplace = False):
    '''
    Applies the tact function element-wise:
    tact(x) = (A*x + B) * (tanh(C*x) + 1),
    where A = alpha / 2, B = 1/2 - A, and C = (1 + beta) / 2
    '''
    A = 0.5*alpha
    B = 0.5 - A
    #B=(1-alpha)/2
    C = 0.5*(1+beta)

    return (A*input + B)*(torch.tanh(C*input)+1)
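
As a sanity check on the interpolation claim, the scalar sketch below re-implements the `f_tact` formula with Python's `math` module. The helper names `tact_scalar`, `swish_scalar`, and `relu_scalar` are illustrative and not part of the notebook. With `alpha = 1, beta = 0` the formula reduces to Swish, because `sigmoid(x) = (tanh(x/2) + 1) / 2`; with `alpha = 1` and a large `beta` it approaches ReLU; and with `alpha = 0` it becomes a shifted, scaled tanh.

```python
import math

def tact_scalar(x, alpha, beta):
    """Scalar version of f_tact: (A*x + B) * (tanh(C*x) + 1)."""
    a = 0.5 * alpha
    b = 0.5 - a
    c = 0.5 * (1 + beta)
    return (a * x + b) * (math.tanh(c * x) + 1)

def swish_scalar(x):
    """swish(x) = x * sigmoid(x)."""
    return x / (1 + math.exp(-x))

def relu_scalar(x):
    """relu(x) = max(0, x)."""
    return max(0.0, x)

xs = [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]

# alpha = 1, beta = 0  ->  Swish
max_swish_gap = max(abs(tact_scalar(x, 1.0, 0.0) - swish_scalar(x)) for x in xs)

# alpha = 1, large beta  ->  approximately ReLU
max_relu_gap = max(abs(tact_scalar(x, 1.0, 200.0) - relu_scalar(x)) for x in xs)

# alpha = 0, beta = 1  ->  0.5 * (tanh(x) + 1), a shifted and scaled tanh
max_tanh_gap = max(abs(tact_scalar(x, 0.0, 1.0) - 0.5 * (math.tanh(x) + 1)) for x in xs)

print(max_swish_gap, max_relu_gap, max_tanh_gap)
```

All three gaps should be zero up to floating-point error. mTAct uses `alpha**2` and `beta**2` in place of `alpha` and `beta`, which keeps `A` non-negative and `C` at least 1/2.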

In [11]:
# implement class wrapper for mtact activation function
class mTACT(nn.Module):
    '''
    Applies the mTACT function element-wise:
    mtact(x) = (A*x + B) * (tanh(C*x) + 1),
    where A = alpha**2 / 2, B = 1/2 - A, and C = (1 + beta**2) / 2

    Shape:
        - Input: (N, *) where * means, any number of additional
          dimensions
        - Output: (N, *), same shape as the input

    Examples:
        >>> m = mTACT()
        >>> input = torch.randn(2)
        >>> output = m(input)

    '''
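    def __init__(self, alpha = None, beta = None, inplace = False):
        """
        An implementation of our M Tanh Activation Function,
        mTACT.
        :param alpha a tuneable parameter (sampled from U(0, 0.5) if None)
        :param beta a tuneable parameter (sampled from U(0, 0.5) if None)
        """
        super().__init__()
        self.inplace = inplace

        # Sample the defaults per instance. A call to np.random.uniform in the
        # signature would be evaluated only once, when the class is defined.
        if alpha is None:
            alpha = np.random.uniform(0, 0.5)
        if beta is None:
            beta = np.random.uniform(0, 0.5)

        # Register alpha and beta as learnable float32 parameters
        self.alpha = Parameter(torch.tensor(alpha, dtype=torch.float32))
        self.beta = Parameter(torch.tensor(beta, dtype=torch.float32))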

    def forward(self, input):
        '''
        Forward pass of the function.
        '''
        return f_mtact(input, alpha = self.alpha, beta = self.beta, inplace = self.inplace)

In [12]:
# implement class wrapper for tact activation function
class TACT(nn.Module):
    '''
    Applies the TACT function element-wise:
    tact(x) = (A*x + B) * (tanh(C*x) + 1),
    where A = alpha / 2, B = 1/2 - A, and C = (1 + beta) / 2

    Shape:
        - Input: (N, *) where * means, any number of additional
          dimensions
        - Output: (N, *), same shape as the input

    Examples:
        >>> t = TACT()
        >>> input = torch.randn(2)
        >>> output = t(input)

    '''
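    def __init__(self, alpha = None, beta = None, inplace = False):
        """
        An implementation of our Tanh Activation Function,
        TACT.
        :param alpha a tuneable parameter (sampled from U(0, 0.5) if None)
        :param beta a tuneable parameter (sampled from U(0, 0.5) if None)
        """
        super().__init__()
        self.inplace = inplace

        # Sample the defaults per instance. A call to np.random.uniform in the
        # signature would be evaluated only once, when the class is defined.
        if alpha is None:
            alpha = np.random.uniform(0, 0.5)
        if beta is None:
            beta = np.random.uniform(0, 0.5)

        # Register alpha and beta as learnable float32 parameters
        self.alpha = Parameter(torch.tensor(alpha, dtype=torch.float32))
        self.beta = Parameter(torch.tensor(beta, dtype=torch.float32))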

    def forward(self, input):
        '''
        Forward pass of the function.
        '''
        return f_tact(input, alpha = self.alpha, beta = self.beta, inplace = self.inplace)

## Data Pre-processing

The `smiles_to_graph` function takes in a SMILES string as input, converts it into an RDKit molecule object, and then returns the input as a DGL graph object. 



In [13]:
def smiles_to_graph(smiles_string):
    """
    Converts SMILES string to graph Data object
    INPUT: SMILES string (str)
    OUTPUT: graph object
    """

    # Convert input into an RDKit molecule object
    mol = Chem.MolFromSmiles(smiles_string)

    
    # Create an adjacency matrix
    adjacency_matrix = np.asmatrix(Chem.GetAdjacencyMatrix(mol))
    # Row and column indices of the nonzero entries, i.e. the bonded atom pairs
    nz_adj_matrix = np.nonzero(adjacency_matrix)
    src = []
    dst = []

    for i in range(nz_adj_matrix[0].shape[0]):
      src.append(nz_adj_matrix[0][i])
      dst.append(nz_adj_matrix[1][i])

    graph = dgl.graph((src, dst))
    bidirected_graph = dgl.to_bidirected(graph)

    return bidirected_graph

Here are two examples demonstrating the DGL graph representation of a given SMILES string.

In [14]:
print(smiles_to_graph("COC(=O)OC"))

Graph(num_nodes=6, num_edges=10,
      ndata_schemes={}
      edata_schemes={})


In [15]:
print(smiles_to_graph("CC(CC(=O)OC1CC2C(C1(C)CC2)(C)C)C"))

Graph(num_nodes=17, num_edges=36,
      ndata_schemes={}
      edata_schemes={})
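
To make the edge count in the first example concrete, here is a small sketch of the same adjacency-matrix-to-edge-list step in plain Python, without RDKit or DGL. The bond list is written out by hand for `COC(=O)OC`, following the atom order of the SMILES string; the variable names are illustrative.

```python
# Undirected bonds of COC(=O)OC, with atoms numbered in SMILES order:
# 0:C  1:O  2:C  3:O  4:O  5:C  (written out by hand for illustration)
bonds = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5)]
num_atoms = 6

# Build the symmetric adjacency matrix
adjacency = [[0] * num_atoms for _ in range(num_atoms)]
for i, j in bonds:
    adjacency[i][j] = 1
    adjacency[j][i] = 1

# Every nonzero entry (i, j) becomes one directed edge i -> j
src, dst = [], []
for i in range(num_atoms):
    for j in range(num_atoms):
        if adjacency[i][j]:
            src.append(i)
            dst.append(j)

print(len(src))  # 10 directed edges = 2 * 5 bonds
```

Each bond contributes two directed edges, which matches `num_edges=10` in the first output above. Since the node count is inferred from the edge lists, an atom with no bonds would not appear in a graph built this way unless `num_nodes` is passed to `dgl.graph` explicitly.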


The `smiles_to_feat_vec` function takes in a SMILES string as input and outputs an array of feature vectors, one per atom. Each feature vector is generated by the `atom_to_feature_vector` function from the OGB library. Each component encodes a physical or chemical property of the atom, such as its atomic number, as a categorical index.

In [16]:
def smiles_to_feat_vec(smiles_string):
    """
    Returns atom features for a molecule given a SMILES string
    INPUT: SMILES string (str)
    OUTPUT: NumPy array of atom feature vectors, one row per atom
    """
    # Convert input into an RDKit molecule object
    molecule = Chem.MolFromSmiles(smiles_string)
    # Collect each atom's feature vector in a list
    atom_features_list = []
    for atom in molecule.GetAtoms():
        atom_features_list.append(atom_to_feature_vector(atom))
    # Convert the list into a numpy array
    atoms_feature_vectors = np.array(atom_features_list, dtype = np.int64)
    return atoms_feature_vectors

Here are two examples demonstrating the feature vectorizations for a given SMILES string.

In [17]:
print(smiles_to_feat_vec("COC(=O)OC"))

[[5 0 4 5 3 0 2 0 0]
 [7 0 2 5 0 0 1 0 0]
 [5 0 3 5 0 0 1 0 0]
 [7 0 1 5 0 0 1 0 0]
 [7 0 2 5 0 0 1 0 0]
 [5 0 4 5 3 0 2 0 0]]


In [18]:
print(smiles_to_feat_vec("CC(CC(=O)OC1CC2C(C1(C)CC2)(C)C)C"))

[[5 0 4 5 3 0 2 0 0]
 [5 0 4 5 1 0 2 0 0]
 [5 0 4 5 2 0 2 0 0]
 [5 0 3 5 0 0 1 0 0]
 [7 0 1 5 0 0 1 0 0]
 [7 0 2 5 0 0 1 0 0]
 [5 0 4 5 1 0 2 0 1]
 [5 0 4 5 2 0 2 0 1]
 [5 0 4 5 1 0 2 0 1]
 [5 0 4 5 0 0 2 0 1]
 [5 0 4 5 0 0 2 0 1]
 [5 0 4 5 3 0 2 0 0]
 [5 0 4 5 2 0 2 0 1]
 [5 0 4 5 2 0 2 0 1]
 [5 0 4 5 3 0 2 0 0]
 [5 0 4 5 3 0 2 0 0]
 [5 0 4 5 3 0 2 0 0]]


For this experiment, we first simplify the task from multi-label classification to binary classification. Instead of attempting to identify all scents for a given molecule, we only try to determine whether a given molecule has a fruity scent. Here, we replace each SENTENCE with a binary label: "1" indicates the presence of a fruity scent, while "0" indicates its absence.

In [19]:
# Convert Sentences to a list of strings
sentences_list = df['SENTENCE'].to_list()
labels = []

# Iterate through list of sentences
for sentence in sentences_list:
  # Split each sentence into a list
  sentence = sentence.split(",")
  # Check for whether "fruity" exists in a sentence and add appropriate label
  if 'fruity' in sentence:
    labels.append(1)
  else:
    labels.append(0)
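
As a quick illustration of the labeling rule, the sketch below applies it to the five rows displayed from the training data earlier. The name `labels_demo` is used to avoid overwriting the notebook's `labels` list, and the descriptor `"nonfruity"` is hypothetical, included only to show why the sentence is split before testing membership.

```python
# Odor descriptors copied from the first rows of the training data shown earlier
sentences = [
    "fruity,rose",
    "fresh,ethereal,fruity",
    "resinous,animalic",
    "powdery,musk,animalic",
    "coniferous,camphor,fruity",
]

# 1 if "fruity" is one of the comma-separated descriptors, else 0
labels_demo = [int("fruity" in s.split(",")) for s in sentences]
print(labels_demo)  # [1, 1, 0, 0, 1]

# Splitting first gives exact-token matching: a hypothetical descriptor such as
# "nonfruity" would pass a substring test but not the token test
hypothetical = "nonfruity,rose"
print("fruity" in hypothetical)             # True  (substring match)
print("fruity" in hypothetical.split(","))  # False (exact token match)
```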

Next, we create a list of SMILES strings from our dataframe. We iterate through the list and convert each SMILES string to a graph object. If the OGB atom feature vectors can be attached to the graph as node features, the molecule remains in our dataset. Otherwise, its index is recorded as an exception and the molecule is removed later.

In [20]:
# Make a list of SMILES strings
molecules_list = df['SMILES'].to_list()

# Make lists of graph objects and of the indices that raised exceptions
graphs = []
exceptions = []

# Iterate through the SMILES strings and try to attach node features
for mol_index, mol in enumerate(molecules_list):
  g_mol = smiles_to_graph(mol)

  try:
    g_mol.ndata['feat'] = torch.tensor(smiles_to_feat_vec(mol))
  except Exception:
    # Record the index so the molecule can be dropped later
    exceptions.append(mol_index)

  graphs.append(g_mol)


Any molecules whose node features could not be attached to their DGL graph objects are dropped from this experiment, together with their labels.

In [21]:
index = 0
for exception_count in exceptions:
  graphs.pop(exception_count - index)
  labels.pop(exception_count - index)
  index += 1
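
The subtraction in the loop above compensates for the list getting shorter after each `pop`: once `k` items have been removed, every later position has moved down by `k`. The toy sketch below, with made-up items and indices, shows that this is equivalent to building a filtered copy in one pass.

```python
items = ["a", "b", "c", "d", "e"]
bad_indices = [1, 3]  # ascending positions to drop (illustrative values)

# Approach used above: pop one at a time, correcting for earlier removals
popped = list(items)
for shift, idx in enumerate(bad_indices):
    popped.pop(idx - shift)

# Equivalent alternative: build a filtered copy in one pass
bad = set(bad_indices)
filtered = [item for i, item in enumerate(items) if i not in bad]

print(popped, filtered)  # ['a', 'c', 'e'] ['a', 'c', 'e']
```

The shifting approach relies on the indices being in ascending order, which holds here because they are appended while iterating through the molecules in order.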

## Prepare Train and Test Datasets Randomly

In this step, we convert our dataset into a DGL dataset by adding labels to each graph from the labels list created earlier. 

In [22]:
class SyntheticDataset(DGLDataset):
    def __init__(self):
        super().__init__(name='synthetic')

    def process(self):
        #edges = pd.read_csv('./graph_edges.csv')
        #properties = pd.read_csv('./graph_properties.csv')
        self.graphs = graphs
        self.labels = torch.LongTensor(labels)

    def __getitem__(self, i):
        return self.graphs[i], self.labels[i]

    def __len__(self):
        return len(self.graphs)

dataset = SyntheticDataset()
# An example of a data point in the pre-processed dataset
graph, label = dataset[0]
print(graph, label)

Graph(num_nodes=14, num_edges=28,
      ndata_schemes={'feat': Scheme(shape=(9,), dtype=torch.int64)}
      edata_schemes={}) tensor(1)


In [49]:
# Another example of a data point in the pre-processed dataset
graph, label = dataset[1]
print(graph, label)

Graph(num_nodes=6, num_edges=10,
      ndata_schemes={'feat': Scheme(shape=(9,), dtype=torch.int64)}
      edata_schemes={}) tensor(1)


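For this experiment, we use an 80/20 split for our training and testing data. The split is positional: the first 80% of the dataset is used for training and the remaining 20% for testing. Each `SubsetRandomSampler` only shuffles the order of examples within its own subset.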

In [23]:
from dgl.dataloading import GraphDataLoader
from torch.utils.data.sampler import SubsetRandomSampler

num_examples = len(dataset)
num_train = int(num_examples * 0.8)

train_sampler = SubsetRandomSampler(torch.arange(num_train))
test_sampler = SubsetRandomSampler(torch.arange(num_train, num_examples))

train_dataloader = GraphDataLoader(
    dataset, sampler=train_sampler, batch_size=5, drop_last=False)
test_dataloader = GraphDataLoader(
    dataset, sampler=test_sampler, batch_size=5, drop_last=False)
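
For comparison, the sketch below contrasts the positional split used in the cell above with a shuffled split, using only Python's `random` module and a small stand-in dataset size. The seed and variable names are assumptions for illustration; this is one possible way to randomize the split, not the procedure used for the results in this notebook.

```python
import random

num_examples = 10                      # small stand-in for len(dataset)
num_train = int(num_examples * 0.8)

# Positional split, as in the cell above: first 80% train, last 20% test
positional_train = list(range(num_train))
positional_test = list(range(num_train, num_examples))

# Sketch of a shuffled split: permute the indices once, then cut
rng = random.Random(0)                 # fixed seed so the split is repeatable
indices = list(range(num_examples))
rng.shuffle(indices)
shuffled_train = indices[:num_train]
shuffled_test = indices[num_train:]

print(positional_test)        # [8, 9]
print(sorted(shuffled_test))  # two indices drawn from the whole range
```

Index lists like `shuffled_train` and `shuffled_test` could be passed to `SubsetRandomSampler` in place of the `torch.arange` ranges.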

## Define the GCNN then Train & Evaluate Accuracy

### ReLU

In [24]:
from dgl.nn import GraphConv

class GCNN_RELU(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCNN_RELU, self).__init__()
        # GraphConv applies no activation by default; ReLU is applied in forward()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        g.ndata['h'] = h
        return dgl.mean_nodes(g, 'h')

In [25]:
# Create the model with given dimensions
model = GCNN_RELU(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for ReLU (Trial 1):', num_correct / num_tests)



Test accuracy for ReLU (Trial 1): 0.7825581395348837


In [26]:
# Create the model with given dimensions
model = GCNN_RELU(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for ReLU (Trial 2):', num_correct / num_tests)

Test accuracy for ReLU (Trial 2): 0.7802325581395348


In [27]:
# Create the model with given dimensions
model = GCNN_RELU(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for ReLU (Trial 3):', num_correct / num_tests)

Test accuracy for ReLU (Trial 3): 0.7802325581395348


In [50]:
print("Average test accuracy for ReLU:", (0.7825581395348837 + 0.7802325581395348 + 0.7802325581395348) / 3)

Average test accuracy for ReLU: 0.7810077519379846


### Swish

In [29]:
from dgl.nn import GraphConv

class GCNN_SWISH(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCNN_SWISH, self).__init__()
        # GraphConv applies no activation by default; Swish is applied in forward()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)
        self.act = swish()

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = self.act(h)
        h = self.conv2(g, h)
        g.ndata['h'] = h
        return dgl.mean_nodes(g, 'h')

In [30]:
# Create the model with given dimensions
model = GCNN_SWISH(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for Swish (Trial 1):', num_correct / num_tests)

Test accuracy for Swish (Trial 1): 0.7790697674418605


In [31]:
# Create the model with given dimensions
model = GCNN_SWISH(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for Swish (Trial 2):', num_correct / num_tests)

Test accuracy for Swish (Trial 2): 0.7825581395348837


In [32]:
# Create the model with given dimensions
model = GCNN_SWISH(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for Swish (Trial 3):', num_correct / num_tests)

Test accuracy for Swish (Trial 3): 0.786046511627907


In [51]:
print("Average test accuracy for Swish:", (0.7790697674418605 + 0.7825581395348837 + 0.786046511627907) / 3)

Average test accuracy for Swish: 0.7825581395348836


### Mish

In [34]:
from dgl.nn import GraphConv

class GCNN_MISH(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCNN_MISH, self).__init__()
        # GraphConv applies no activation by default; Mish is applied in forward()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.mish(h)
        h = self.conv2(g, h)
        g.ndata['h'] = h
        return dgl.mean_nodes(g, 'h')

In [35]:
# Create the model with given dimensions
model = GCNN_MISH(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for Mish (Trial 1):', num_correct / num_tests)

Test accuracy for Mish (Trial 1): 0.7790697674418605


In [36]:
# Create the model with given dimensions
model = GCNN_MISH(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for Mish (Trial 2):', num_correct / num_tests)

Test accuracy for Mish (Trial 2): 0.7779069767441861


In [37]:
# Create the model with given dimensions
model = GCNN_MISH(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for Mish (Trial 3):', num_correct / num_tests)

Test accuracy for Mish (Trial 3): 0.7825581395348837


In [52]:
print("Average test accuracy for Mish:", (0.7790697674418605 + 0.7779069767441861 + 0.7825581395348837) / 3)

Average test accuracy for Mish: 0.7798449612403102


### TAct

In [39]:
from dgl.nn import GraphConv

class GCNN_TACT(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCNN_TACT, self).__init__()
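        # GraphConv applies no activation by default; TAct is applied in forward()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)
        # Create the activation once so that alpha and beta are registered in
        # model.parameters() and updated by the optimizer
        self.act = TACT()

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = self.act(h)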
        h = self.conv2(g, h)
        g.ndata['h'] = h
        return dgl.mean_nodes(g, 'h')

In [40]:
# Create the model with given dimensions
model = GCNN_TACT(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for TAct (Trial 1):', num_correct / num_tests)

Test accuracy for TAct (Trial 1): 0.7802325581395348


In [41]:
# Create the model with given dimensions
model = GCNN_TACT(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for TAct (Trial 2):', num_correct / num_tests)

Test accuracy for TAct (Trial 2): 0.7802325581395348


In [42]:
# Create the model with given dimensions
model = GCNN_TACT(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for TAct (Trial 3):', num_correct / num_tests)

Test accuracy for TAct (Trial 3): 0.7825581395348837


In [53]:
print("Average test accuracy for TAct:", (0.7802325581395348 + 0.7802325581395348 + 0.7825581395348837) / 3)

Average test accuracy for TAct: 0.7810077519379846


### mTAct

In [44]:
from dgl.nn import GraphConv

class GCNN_MTACT(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCNN_MTACT, self).__init__()
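        # GraphConv applies no activation by default; mTAct is applied in forward()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)
        # Create the activation once so that alpha and beta are registered in
        # model.parameters() and updated by the optimizer
        self.act = mTACT()

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = self.act(h)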
        h = self.conv2(g, h)
        g.ndata['h'] = h
        return dgl.mean_nodes(g, 'h')

In [45]:
# Create the model with given dimensions
model = GCNN_MTACT(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for mTAct (Trial 1):', num_correct / num_tests)

Test accuracy for mTAct (Trial 1): 0.7825581395348837


In [46]:
# Create the model with given dimensions
model = GCNN_MTACT(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for mTAct (Trial 2):', num_correct / num_tests)

Test accuracy for mTAct (Trial 2): 0.7790697674418605


In [47]:
# Create the model with given dimensions
model = GCNN_MTACT(9, 8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['feat'].float())
        #print(pred,labels)
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

num_correct = 0
num_tests = 0
for batched_graph, labels in test_dataloader:
    pred = model(batched_graph, batched_graph.ndata['feat'].float())
    num_correct += (pred.argmax(1) == labels).sum().item()
    num_tests += len(labels)

print('Test accuracy for mTAct (Trial 3):', num_correct / num_tests)

Test accuracy for mTAct (Trial 3): 0.7825581395348837


In [54]:
print("Average test accuracy for mTAct:", (0.7825581395348837 + 0.7790697674418605 + 0.7825581395348837) / 3)

Average test accuracy for mTAct: 0.7813953488372093


## Conclusion

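None of the activation functions clearly outperformed the others on this task: the average test accuracies over three trials range only from about 0.780 (Mish) to about 0.783 (Swish), a gap smaller than the trial-to-trial variation seen for a single activation function. However, there are many other avenues to explore in graph neural network tasks.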
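
To put numbers on this comparison, the sketch below summarizes the recorded accuracies with Python's `statistics` module. The dictionary simply restates the trial results printed earlier in this notebook; with only three trials per activation function, the standard deviations are rough estimates.

```python
from statistics import mean, stdev

# Test accuracies copied from the three trials recorded above
accuracies = {
    "ReLU":  [0.7825581395348837, 0.7802325581395348, 0.7802325581395348],
    "Swish": [0.7790697674418605, 0.7825581395348837, 0.786046511627907],
    "Mish":  [0.7790697674418605, 0.7779069767441861, 0.7825581395348837],
    "TAct":  [0.7802325581395348, 0.7802325581395348, 0.7825581395348837],
    "mTAct": [0.7825581395348837, 0.7790697674418605, 0.7825581395348837],
}

# Mean and sample standard deviation per activation function
summary = {name: (mean(vals), stdev(vals)) for name, vals in accuracies.items()}
for name, (m, s) in summary.items():
    print(f"{name:>5}: mean = {m:.4f}, sample std = {s:.4f}")

# Gap between the best and worst mean accuracy
means = [m for m, _ in summary.values()]
print(f"spread between best and worst mean: {max(means) - min(means):.4f}")
```

The gap between the best and worst mean is under 0.3 percentage points, so more trials would be needed before ranking the activation functions.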

Credit: Basis of experiment and code provided by
 https://towardsdatascience.com/learn-to-smell-molecules-with-graph-convolutional-neural-networks-62fa5a826af5