<a href="https://colab.research.google.com/github/alessiodevoto/NeuralNetworks_project/blob/main/data1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Libraries and framework

In [None]:
!python -c "import torch; print(torch.__version__)"
!pip install torch-scatter -f https://data.pyg.org/whl/torch-1.9.0+cu102.html
!pip install torch-sparse -f https://data.pyg.org/whl/torch-1.9.0+cu102.html
!pip install torch-geometric
!pip install wget
!pip install pickle5

In [2]:
cd /content/drive/MyDrive/gcn

/content/drive/MyDrive/gcn


# Dataset
The `PylonsDataset` class builds a dataset of graphs, one for each photo in the data. The download options are:
- complete_data: downloads the JSON files (raw) and the PyG dataset (processed)
- processed_data: downloads only the processed data in PyG format
- raw_data: downloads only the JSON files and processes them to create a PyG dataset

In [4]:
from PylonsDataset import PylonsDataset
mydata = PylonsDataset(root='data', password='matching', download_option='raw_data')

Downloading dataset to data ...
Downloading...


Processing...


Retrieving information about relations between assets...
One-hot encoding assets...
Processing dictionaries...
Processing datasets in json format...
Parsing dataset file: data/raw/datasets/D110-36742.json
Parsing dataset file: data/raw/datasets/D550-19031.json
Parsing dataset file: data/raw/datasets/D340-33954.json
Parsing dataset file: data/raw/datasets/D260-26837.json
Parsing dataset file: data/raw/datasets/D550-47654.json
Parsing dataset file: data/raw/datasets/D110-11881.json
Parsing dataset file: data/raw/datasets/D340-49418.json
Parsing dataset file: data/raw/datasets/D260-49027.json
Number of elements not included (unclassifiable photos): 789
Number of elements in dataset: 9444


Done!


Dataset ready
Find raw data in data/raw and processed data in data/processed


To get a single graph, we use `get()`; to get a pair of graphs generated randomly on the fly, we use `__getitem__()`.
A single graph, stored as an object of the class `Data`, has the following attributes:
- x: node feature matrix of shape N×F, where N is the number of nodes and F the number of features per node
- edge_index: graph connectivity in PyG (COO) format
- edge_attr: edge features
- y: target value, i.e. the id of the pylon captured in this graph
- photo_id: id of the photo this graph represents
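As a reminder of what the PyG connectivity format looks like: `edge_index` has shape `[2, E]`, with source node ids in the first row and target node ids in the second, and an undirected edge is stored as two directed edges. The sketch below illustrates this layout with plain Python lists on a toy triangle; `to_coo` is a hypothetical helper and the graph is not taken from the dataset. In PyG the result would be wrapped in a `torch.long` tensor.

```python
def to_coo(undirected_edges):
    """Convert undirected (u, v) pairs into a 2 x E COO edge list.

    Each undirected edge is stored twice: once as u->v and once as v->u.
    """
    sources, targets = [], []
    for u, v in undirected_edges:
        sources += [u, v]
        targets += [v, u]
    return [sources, targets]


# A triangle on nodes 0, 1, 2 (toy example, not dataset data).
edge_index = to_coo([(0, 1), (1, 2), (0, 2)])
print(edge_index)  # [[0, 1, 1, 2, 0, 2], [1, 0, 2, 1, 2, 0]]
```

Three undirected edges therefore give `E = 6` columns, which is why the second dimension of `edge_index` counts directed edges.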

In [5]:
g0 = mydata.get(0)
print('Graph element at index 0:')
print(g0)
print('\n\nFeatures matrix of graph element at index 0:')
print(g0.x)

Graph element at index 0:
Data(x=[3, 52], edge_index=[2, 12], edge_attr=[12, 1], y='504631_4353907_57', photo_id='F_2020_06_23@15.58.44(612)_Converted_CROP_1_73.jpg')


Features matrix of graph element at index 0:
tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.6752, 0.9000, 6.7219],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,

We can also get a pair of graphs, together with a label (1: similar, -1: not similar).
Pairs of graphs are stored in a `PairData` structure, derived from `Data` and optimized for batching.
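To see why a pair structure needs special care when batching, recall that PyG batches graphs by concatenating their nodes and shifting each graph's edge indices by the number of nodes that come before it. For pairs, each side has to be shifted by the node counts of its own side only. The sketch below shows that bookkeeping for one side with plain Python lists. It assumes `PairData` follows the usual PyG pattern for paired graphs; `batch_side` and the toy graphs are hypothetical and are not the class's actual code.

```python
def batch_side(graphs):
    """Batch one side of several pairs.

    `graphs` is a list of (num_nodes, edge_index) tuples, where edge_index
    is a [sources, targets] pair of lists with node ids local to the graph.
    Returns the merged edge_index and a batch vector that maps every node
    to the position of its graph in the batch.
    """
    sources, targets, batch = [], [], []
    offset = 0
    for graph_id, (num_nodes, (src, dst)) in enumerate(graphs):
        sources += [s + offset for s in src]
        targets += [t + offset for t in dst]
        batch += [graph_id] * num_nodes
        offset += num_nodes  # node ids of the next graph start here
    return [sources, targets], batch


# First side of two pairs (toy data).
first_side = [
    (2, [[0, 1], [1, 0]]),              # 2 nodes, one undirected edge
    (3, [[0, 1, 1, 2], [1, 0, 2, 1]]),  # 3 nodes, path 0-1-2
]
edge_index1, x1_batch = batch_side(first_side)
print(edge_index1)  # [[0, 1, 2, 3, 3, 4], [1, 0, 3, 2, 4, 3]]
print(x1_batch)     # [0, 0, 1, 1, 1]
```

The batch vector is what a later pooling step needs in order to aggregate node embeddings per graph.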



In [6]:
pair = mydata[10]  # equivalent to mydata.__getitem__(10)
print(pair)

PairData(x1=[9, 52], edge_index1=[2, 90], edge_attr1=[90, 1], y1='504708_4353863_31', x2=[10, 52], edge_index2=[2, 110], edge_attr2=[110, 1], y2='504708_4353863_31', target=[1], num_nodes=19)


We can explore some properties:

In [7]:
print(f'Number of node features: {mydata.num_node_features}')
print(f'Number of edge features: {mydata.num_edge_features}')
print(f'Average number of nodes per graph: {mydata.avg_nodes_per_graph}')
print(f'Average number of edges per graph: {mydata.avg_edges_per_graph}')
print(f'Dataset is undirected: {mydata.is_undirected}')
print(f'Number of classes (i.e. of captured pylons): {mydata.num_classes}')

Number of node features: 52
Number of edge features: 1
Average number of nodes per graph: 6.0630029648454045
Average number of edges per graph: 49.32570944515036
Dataset is undirected: True
Number of classes (i.e. of captured pylons): 1965


Both the label of the pair (similar or not) and the specific pair to extract are decided randomly on the fly when `__getitem__` is invoked. To get deterministic behavior, we can create the dataset with the option `deterministic = True`, or set the property to `True` on an existing dataset. In that case the similarity depends on the index (even->similar, odd->not similar).
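The index-based rule can be sketched as follows. This is a minimal illustration, not the code of `PylonsDataset`: `sample_pair` is a hypothetical function, the 50/50 split in random mode and the per-index seed in deterministic mode are assumptions, and it assumes every label has at least one partner of each kind.

```python
import random


def sample_pair(index, labels, deterministic=False):
    """Return (index, partner_index, target) for the graph at `index`.

    target is 1 for a similar pair (same label) and -1 otherwise.
    """
    if deterministic:
        rng = random.Random(index)    # assumption: seed derived from the index
        similar = index % 2 == 0      # even -> similar, odd -> not similar
    else:
        rng = random.Random()
        similar = rng.random() < 0.5  # assumption: balanced random labels
    # Partners are other graphs whose label matches (or differs), as required.
    candidates = [
        j for j, label in enumerate(labels)
        if j != index and (label == labels[index]) == similar
    ]
    partner = rng.choice(candidates)
    return index, partner, 1 if similar else -1


labels = ['a', 'a', 'b', 'b', 'a']  # hypothetical pylon ids
print(sample_pair(2, labels, deterministic=True))  # (2, 3, 1)
```

In deterministic mode, repeated calls with the same index return the same pair, which matches the repeated output shown for `mydata[10]`.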

In [8]:
print('Randomly extracted pairs:')
print(mydata[10])
print(mydata[10])
print(mydata[10])
print('Deterministically extracted pairs:')
mydata.deterministic = True
print(mydata[10])
print(mydata[10])
print(mydata[10])
mydata.deterministic = False

Randomly extracted pairs:
PairData(x1=[9, 52], edge_index1=[2, 90], edge_attr1=[90, 1], y1='504708_4353863_31', x2=[8, 52], edge_index2=[2, 72], edge_attr2=[72, 1], y2='504708_4353863_31', target=[1], num_nodes=17)
PairData(x1=[9, 52], edge_index1=[2, 90], edge_attr1=[90, 1], y1='504708_4353863_31', x2=[8, 52], edge_index2=[2, 72], edge_attr2=[72, 1], y2='504708_4353863_31', target=[1], num_nodes=17)
PairData(x1=[9, 52], edge_index1=[2, 90], edge_attr1=[90, 1], y1='504708_4353863_31', x2=[9, 52], edge_index2=[2, 90], edge_attr2=[90, 1], y2='504708_4353863_31', target=[1], num_nodes=18)
Deterministically extracted pairs:
PairData(x1=[9, 52], edge_index1=[2, 90], edge_attr1=[90, 1], y1='504708_4353863_31', x2=[10, 52], edge_index2=[2, 110], edge_attr2=[110, 1], y2='504708_4353863_31', target=[1], num_nodes=19)
PairData(x1=[9, 52], edge_index1=[2, 90], edge_attr1=[90, 1], y1='504708_4353863_31', x2=[10, 52], edge_index2=[2, 110], edge_attr2=[110, 1], y2='504708_4353863_31', target=[1], nu

# Model

In [10]:
from model import GraphEmbeddingNet
from loss import PairwiseLoss
from PylonsDataset import PylonsDataset
from torch_geometric.loader import DataLoader
import torch

dev = "cuda:0" if torch.cuda.is_available() else "cpu"
device = torch.device(dev)  

model = GraphEmbeddingNet(
    conv_hidden_channels=[64, 64, 64], 
    graph_aggr_dim=32,
    node_feature_dim=52,
    edge_feature_dim=1,
    node_hidden_sizes=[52, 52],
    edge_hidden_sizes=None
)
print(model)


optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
criterion = PairwiseLoss()
EPOCHS = 60

mydata.deterministic = False
train_loader = DataLoader(mydata, batch_size=64, shuffle=True, follow_batch=['x1', 'x2'], drop_last=True)




GraphEmbeddingNet(
  (graph_encoder): GraphEncoder(
    (MLP1): Sequential(
      (0): Linear(in_features=52, out_features=52, bias=True)
      (1): ReLU()
      (2): Linear(in_features=52, out_features=52, bias=True)
    )
  )
  (message_net): MessageNet(
    (conv1): GCNConv(52, 64)
    (conv2): GCNConv(64, 64)
    (conv3): GCNConv(64, 64)
  )
  (aggregator): GraphAggregator(
    (aggregator): Linear(in_features=64, out_features=32, bias=True)
  )
)


In [11]:
print('Starting training')
for epoch in range(1, EPOCHS + 1):
    model.train()
    losses = []
    for data in train_loader:  # Iterate in batches over the training dataset.
        emb1, emb2 = model(data)  # Embed both graphs of every pair in the batch.
        loss = criterion(emb1, emb2, data.target)  # Compute the loss.
        losses.append(loss.detach())  # Detach so the autograd graph is not kept alive.
        loss.backward(torch.ones_like(loss))  # Derive gradients (the loss is not a scalar).
        optimizer.step()  # Update parameters based on gradients.
        optimizer.zero_grad()  # Clear gradients.
    print(f'Epoch: {epoch:03d}, Loss: {torch.cat(losses, 1).mean()}')

Starting training
Epoch: 001, Loss: 1.190205454826355
Epoch: 002, Loss: 0.8206015825271606
Epoch: 003, Loss: 0.5824412703514099
Epoch: 004, Loss: 0.55259770154953
Epoch: 005, Loss: 0.5111098289489746
Epoch: 006, Loss: 0.48512953519821167
Epoch: 007, Loss: 0.4666995108127594
Epoch: 008, Loss: 0.47039949893951416
Epoch: 009, Loss: 0.45882120728492737
Epoch: 010, Loss: 0.45039913058280945
Epoch: 011, Loss: 0.4259071946144104
Epoch: 012, Loss: 0.4484967589378357
Epoch: 013, Loss: 0.43738240003585815
Epoch: 014, Loss: 0.4348319470882416
Epoch: 015, Loss: 0.4271049499511719
Epoch: 016, Loss: 0.44710078835487366
Epoch: 017, Loss: 0.4256753921508789
Epoch: 018, Loss: 0.4200536012649536
Epoch: 019, Loss: 0.42354831099510193
Epoch: 020, Loss: 0.4245895445346832
Epoch: 021, Loss: 0.4246613383293152
Epoch: 022, Loss: 0.43687304854393005
Epoch: 023, Loss: 0.4240849018096924
Epoch: 024, Loss: 0.42480629682540894
Epoch: 025, Loss: 0.42533212900161743
Epoch: 026, Loss: 0.42426371574401855
Epoch: 027, 