# Node Classification with the Cora Dataset

In this notebook, you will tackle node classification on the Cora dataset, a problem similar to the Karate Club problem.

## Load Dataset

The Cora dataset consists of Machine Learning papers. These papers are classified into one of the following seven classes:
- Case_Based
- Genetic_Algorithms
- Neural_Networks
- Probabilistic_Methods
- Reinforcement_Learning
- Rule_Learning
- Theory

Each node represents a paper, and each edge represents a citation between two papers. The nodes are split into training, validation, and test sets. Cora is also a common benchmark in graph neural network papers.

In [None]:
from dgl.data import CoraDataset
import torch as th
dataset = CoraDataset()
g = dataset[0]
print(g)

## Extract masks and print dataset statistics

In [None]:
num_labels = g.ndata['label'].max().item() + 1  # labels are zero-indexed
feature_dim = g.ndata['feat'].shape[1]
train_mask = g.ndata['train_mask'].to(th.bool)
val_mask = g.ndata['val_mask'].to(th.bool)
test_mask = g.ndata['test_mask'].to(th.bool)
print("Node feature dimension: {}".format(feature_dim))
print("Number of labels: {}".format(num_labels))
print("Number of nodes for training: {}".format(train_mask.long().sum()))
print("Number of nodes for validation: {}".format(val_mask.long().sum()))
print("Number of nodes for testing: {}".format(test_mask.long().sum()))
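
A boolean mask selects a subset of nodes when it is used as an index, which is how the splits above are applied to labels and predictions. The sketch below uses a hand-made five-node example to show the pattern; the tensors `labels`, `preds`, and `mask` are illustrative stand-ins and are not taken from Cora.

```python
import torch as th

# Toy stand-ins for g.ndata['label'], model predictions, and a split mask.
labels = th.tensor([0, 2, 1, 2, 0])
preds = th.tensor([0, 1, 1, 2, 2])
mask = th.tensor([True, True, False, True, False])

# Indexing with a boolean mask keeps only the entries where the mask is True
# (here nodes 0, 1, and 3).
masked_labels = labels[mask]
masked_preds = preds[mask]

# Accuracy restricted to the masked nodes.
acc = (masked_preds == masked_labels).float().mean().item()
print(masked_labels.tolist(), masked_preds.tolist(), acc)
```

The same indexing works with `train_mask`, `val_mask`, and `test_mask` on any per-node tensor.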

## Setup Model and Train

In this challenge, you will write the code below to achieve the best performance you can on the test set. You can change the model structure, use other DGL [nn modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv), tune the optimizer's hyperparameters, add early stopping, and so on. Complete the model and the training loop, then use the trained model for evaluation on the test set.

**However, remember to use only training data in the training loop below** (for example, `g.ndata['label'][train_mask]`).
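
One way to structure such a loop is sketched below, using plain PyTorch and synthetic data so that it runs without downloading Cora. It makes several assumptions: `StandInModel` is a hypothetical placeholder that ignores the graph argument (it is not a suggested solution), the `train` helper and its `patience` parameter are illustrative names, and validation nodes are assumed to be acceptable for early stopping while test nodes are never touched.

```python
import torch as th
import torch.nn as nn
import torch.nn.functional as F


class StandInModel(nn.Module):
    """Hypothetical placeholder: a linear classifier that ignores the graph."""

    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.linear = nn.Linear(in_dim, num_classes)

    def forward(self, g, inputs):
        return self.linear(inputs)


def train(model, g, feats, labels, train_mask, val_mask,
          epochs=200, lr=1e-2, patience=20):
    """Train on train_mask nodes only; stop when validation accuracy stalls."""
    optimizer = th.optim.Adam(model.parameters(), lr=lr)
    best_val, bad_epochs = 0.0, 0
    for epoch in range(epochs):
        model.train()
        logits = model(g, feats)
        # The loss is computed on the training nodes only.
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Validation accuracy drives early stopping, never the gradients.
        model.eval()
        with th.no_grad():
            pred = model(g, feats).argmax(1)
            val_acc = (pred[val_mask] == labels[val_mask]).float().mean().item()
        if val_acc > best_val:
            best_val, bad_epochs = val_acc, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_val


# Synthetic stand-in data: 60 "nodes", 8 features, 2 classes.
th.manual_seed(0)
feats = th.randn(60, 8)
labels = (feats[:, 0] > 0).long()
train_mask = th.zeros(60, dtype=th.bool)
train_mask[:40] = True
val_mask = ~train_mask

model = StandInModel(8, 2)
best_val = train(model, None, feats, labels, train_mask, val_mask)
print("Best validation accuracy: {:.4f}".format(best_val))
```

For the actual challenge, the graph `g`, `g.ndata['feat']`, `g.ndata['label']`, and the masks extracted earlier would take the place of the synthetic tensors. This sketch only tracks the best validation accuracy; saving and restoring the corresponding weights is left out for brevity.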

In [None]:
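import torch.nn as nn

class Model(nn.Module):
    def __init__(self, ...):
        # TODO: define the layers of your model here
        pass

    def forward(self, g, inputs):
        # TODO: return per-node logits of shape (num_nodes, num_labels)
        pass


model = Model(...)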

## Evaluate result on the test set

In [None]:
model.eval()
with th.no_grad():  # gradients are not needed for evaluation
    logits = model(g, g.ndata['feat'])
test_acc = (logits.argmax(1) == g.ndata['label'])[test_mask].float().mean()
print('Test accuracy: {:.4f}'.format(test_acc))