# Stage 1 - Code Study Allennlp + DataReader, SparseAdjacencyField (See DEV_STAGES.md for update)
- 8/?-8/?
- code study allennlp (across stages)
    - study tools and resources
        - allennlp guide
        - allennlp doc
        - allennlp github (source code)
        - google
    - data
        - fields
        - instances
        - batch
        - dataset
        - vocabulary
    - data operator
        - reader(to_instance)
        - vocab.from_instances
        - token_indexer(index_with => token_id)
        - dataloader(call batch_tensors in fields)
        - embedder(token_id2vec)
        - encoder(model part)
    - trainer
        - trainer
        - tensorboard_writer
        - config file composition and jsonnet
    - general work flow conclusion
        - rawdata =reader=> instance (with fields)
        - instance =Vocab=> vocab (with namespaces)
        - instance =IndexWith=> indexed_instances (TensorDict)
        - instances =dataloader(batch_tensors function)=> batch_tensor (TensorDict)
        - batch_tensor =model=> logits
    - allennlp conclusion
        - OOP + dependency injection
        - a good coding style
        - several robust off-the-shelf models
- implementation of datareader
    - use utils.doc2graph
    - graph2instance
- implementation of SparseAdjacencyField
    - sparse version of origin AdjacencyField
    - modify code (almost all) of allennlp AdjacencyField
    - implementation of PytorchGeoData Batching
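
The batching idea behind the sparse field can be sketched in plain Python. This is only an illustration of PyTorch Geometric style batching, not the project's `SparseAdjacencyField` code. Each graph's edge indices are shifted by the number of nodes in the preceding graphs, and a `batch_id` vector records which graph each node belongs to. The function name `batch_graphs` and the input dict layout are assumptions made for this example.

```python
from typing import Dict, List


def batch_graphs(graphs: List[Dict]) -> Dict[str, list]:
    """Merge several small graphs into one big disconnected graph.

    Each input graph is a dict with hypothetical keys:
      "num_nodes": int, "edges": list of (src, dst), "labels": list of edge labels.
    """
    src, dst, edge_attr, batch_id = [], [], [], []
    offset = 0  # number of nodes contributed by the graphs seen so far
    for graph_idx, g in enumerate(graphs):
        for (s, d), label in zip(g["edges"], g["labels"]):
            src.append(s + offset)  # shift node ids into the merged id space
            dst.append(d + offset)
            edge_attr.append(label)
        batch_id.extend([graph_idx] * g["num_nodes"])
        offset += g["num_nodes"]
    return {"edge_index": [src, dst], "edge_attr": edge_attr, "batch_id": batch_id}


g1 = {"num_nodes": 3, "edges": [(0, 1), (1, 2)], "labels": ["nsubj", "obj"]}
g2 = {"num_nodes": 2, "edges": [(0, 1)], "labels": ["det"]}
batched = batch_graphs([g1, g2])
```

Only the edges are stored, so memory grows with the number of edges rather than with the square of the padded node count that a dense adjacency matrix would need.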

# Stage 2 - Train a naive model (BagofWordPooling) with allennlp train
- 8/?-8/?
- (start this note when 2->3)
- mismatched BERT (use default mean)
    - use PretrainedTransformerMismatchedIndexer + PretrainedTransformerMismatchedEmbedder
    - note that BERT is used here without special tokens
    - also, "[ROOT]" in the dependency graph is not a special token to BERT, which is a potential issue
- sparse2dense, dense2sparse in tensorop.py
    - naive implementation works well without tensor
    - fix gradient issue
        - learn about leaf nodes in the computation graph
        - inplace operation
        - tensor properties
        - torch.sparse.Tensor.to_dense() as tf.scatter_nd
    - 2020/8/21, can actually use pytorch_scatter, pytorch_sparse...
- allennlp train can work with my modules
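
The "mismatched" setup with mean pooling can be sketched in plain Python. This assumes the indexer supplies, for each original token, an inclusive (start, end) span over the wordpiece sequence, and that the embedder averages the wordpiece vectors inside each span. The function name `mean_pool_wordpieces` and the toy vectors are made up for illustration. It is not the AllenNLP implementation.

```python
from typing import List, Tuple


def mean_pool_wordpieces(
    wordpiece_vectors: List[List[float]],
    offsets: List[Tuple[int, int]],
) -> List[List[float]]:
    """Average wordpiece vectors back to one vector per original token.

    offsets[i] = (start, end) is the inclusive wordpiece span of token i.
    """
    pooled = []
    for start, end in offsets:
        span = wordpiece_vectors[start : end + 1]
        dim = len(span[0])
        pooled.append([sum(vec[j] for vec in span) / len(span) for j in range(dim)])
    return pooled


# Toy case: token 0 was split into two wordpieces, token 1 into one.
wordpieces = [[1.0, 0.0], [3.0, 2.0], [4.0, 4.0]]
offsets = [(0, 1), (2, 2)]
token_vectors = mean_pool_wordpieces(wordpieces, offsets)
```

This keeps one vector per graph node, so a dependency parse built over whole words lines up with the embedder output even when a word is split into several wordpieces.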
    
    
# Stage 3 - Train A HGNN model (het graph embedding w/o interaction)
- due 8/22
- add Graph2VecEncoder Registrable
- implement HGEN

# (Now) Stage 4 - Train A HGMN model (het graph matching network (may be final))
- due 8/31
- add GraphPair2VecEncoder Registrable
- implement HGMN

# Stage 5 - Validation on ANLI/Q-Test/HAN, Experiments
- due 9/15
- parse ANLI/HAN
- Q-Test generator(This may be required earlier)

# Stage 6 - Paper Fixing (due 9/19, EACL due 9/20)

# Now Work, Todo
- reader to add bidirectional relation
    - add add_edge for simplicity
- GraphEmbeddingNet(GraphPair2VecEncoder)
    - todo
- Config is modified
    - remove or transformer embedder
    - can train on token embedding first (quicker and see effect)
    - also a must do exp
- add raw_text_datareader
- tensor_op
    - move sparse cross attention to tensor_op

In [1]:
%load_ext autoreload
%autoreload 2
import os, sys

In [2]:
%pwd

'/work/2020-IIS-NLU-internship/MNLI/tests'

In [3]:
sys.path.append(os.path.abspath(".."))

# External Dependencies

In [4]:
## util
import os
import logging
from argparse import ArgumentParser
from tqdm import tqdm_notebook as tqdmnb
from tqdm import tqdm as tqdm
import pickle
import json 
import jsonlines as jsonl
from collections import defaultdict
from typing import Iterable, List, Dict, Tuple, Union
from pathlib import Path
## graph
import networkx as nx
import matplotlib.pyplot as plt
# geometric
import torch_geometric
## nn
import numpy as np
import torch
from torch_geometric.utils.convert import to_networkx
from torch_geometric.data.data import Data
## Stanza
import stanza
from stanza.models.common.doc import Document
from stanza.pipeline.core import Pipeline
## allennlp model
from allennlp_models.structured_prediction.predictors.srl import SemanticRoleLabelerPredictor
from allennlp_models.structured_prediction.predictors.biaffine_dependency_parser import BiaffineDependencyParserPredictor
from allennlp.predictors.predictor import Predictor #
## allennlp
from allennlp.data import Token, Vocabulary, Instance
from allennlp.data.fields import ListField, TextField, Field
from allennlp.data.token_indexers import (
    SingleIdTokenIndexer,
    TokenCharactersIndexer,
    ELMoTokenCharactersIndexer,
    PretrainedTransformerIndexer,
    PretrainedTransformerMismatchedIndexer,
)
from allennlp.data import DatasetReader, DataLoader, Instance, Vocabulary, PyTorchDataLoader
from allennlp.data.tokenizers import (
    CharacterTokenizer,
    PretrainedTransformerTokenizer,
    SpacyTokenizer,
    WhitespaceTokenizer,
)
from allennlp.modules.seq2vec_encoders import CnnEncoder
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.modules.token_embedders import (
    Embedding,
    TokenCharactersEncoder,
    ElmoTokenEmbedder,
    PretrainedTransformerEmbedder,
    PretrainedTransformerMismatchedEmbedder,
)
from allennlp.nn import util as nn_util

# Internal Dependencies

In [5]:
import src.config as config

from src.data_git import utils as utils
from src.data_git import reader as reader

from src.models import SynNLIModel

In [6]:
# use relative paths by concatenating pwd
# or the cache file name will be ..SLASH........
bert_model = "bert-base-uncased"
train_data_path = "/work/2020-IIS-NLU-internship/MNLI/data/anli_v1.0/R1/train.jsonl"
validation_data_path = "/work/2020-IIS-NLU-internship/MNLI/data/anli_v1.0/R1/dev.jsonl"
test_data_path = "/work/2020-IIS-NLU-internship/MNLI/data/anli_v1.0/R1/test.jsonl"
cache_data_dir = "/work/2020-IIS-NLU-internship/MNLI/data/ANLI_instance_cache/R1"

# Read from ANLI preprocessed

In [7]:
rdr2 = reader.NLIGraphReader(input_fields=reader.config.default_fields, lazy=False, max_instances=100)

In [8]:
dev2 = rdr2.read(file_path="../data/anli_v1.0_preprocessed/R2/dev.jsonl")


# Batch Testing Model

In [9]:
vocab = Vocabulary.from_instances(dev2, min_count={"edge_labels":500}, max_vocab_size={"edge_labels":20}, non_padded_namespaces=["*tags", "labels"]) # need to use @@UNKNOWN@@ for edge labels
# min_count={"edge_labels":150} => 58
# min_count={"edge_labels":500} => 46
# min_count={"edge_labels":1000} => 42
# 1200 => 36


In [10]:
dev2.index_with(vocab)

In [11]:
loader2 = PyTorchDataLoader(dev2, batch_size=2)

In [12]:
batch = next(iter(loader2))

In [13]:
vocab

Vocabulary with namespaces:  edge_labels, Size: 10 || labels, Size: 3 || tags, Size: 30522 || Non Padded Namespaces: {'*tags', 'labels'}
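
The effect of `min_count` and `max_vocab_size` on a padded namespace such as `edge_labels` can be sketched as follows. This is a simplified illustration, not AllenNLP's implementation. The helper names `build_namespace` and `lookup` are hypothetical, and whether the size cap counts the padding and unknown entries should be checked against the installed version.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

PADDING, UNKNOWN = "@@PADDING@@", "@@UNKNOWN@@"


def build_namespace(
    tokens: Iterable[str],
    min_count: int = 1,
    max_size: Optional[int] = None,
    padded: bool = True,
) -> Dict[str, int]:
    """Keep the most frequent tokens that occur at least `min_count` times."""
    counts = Counter(tokens)
    kept = [tok for tok, n in counts.most_common(max_size) if n >= min_count]
    # Padded namespaces reserve ids for padding and out-of-vocabulary tokens.
    vocab = {PADDING: 0, UNKNOWN: 1} if padded else {}
    for tok in kept:
        vocab[tok] = len(vocab)
    return vocab


def lookup(vocab: Dict[str, int], token: str) -> int:
    """Map a token to its id, falling back to the unknown id if there is one."""
    if token in vocab:
        return vocab[token]
    return vocab[UNKNOWN]  # KeyError for non-padded namespaces (no OOV entry)


edge_labels = ["nsubj"] * 5 + ["obj"] * 3 + ["det"] * 1
edge_vocab = build_namespace(edge_labels, min_count=2, max_size=20)
rare_id = lookup(edge_vocab, "det")  # "det" is below min_count
```

Rare edge labels collapse onto the unknown id, which is why `edge_labels` must stay a padded namespace while `labels` can be non-padded.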

# TODO now, GraphMatchingNet

In [30]:
dim_encoder = 300
dim_embedder = 768
dim_matching = 44

In [37]:
from src.modules import *

In [38]:
Graph2GraphEncoder.list_available()

['gat', 'rgcn']

In [39]:
GraphPair2GraphPairEncoder.list_available()

['bimpm']

In [40]:
Graph2VecEncoder.list_available()

['global_attention']

In [41]:
GraphPair2VecEncoder.list_available()

['graph_embedding_net', 'graph_matching_net']
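
The `by_name` / `list_available` calls above rely on a name-to-class registry. A minimal stand-in for that pattern is sketched below. It is not AllenNLP's `Registrable`, and the class names `SimpleRegistrable`, `PairEncoder`, `EmbeddingNet` and `MatchingNet` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class SimpleRegistrable:
    """Tiny name -> class registry, kept separately for each base class."""

    _registry: Dict[type, Dict[str, type]] = defaultdict(dict)

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def add_to_registry(subclass: type) -> type:
            SimpleRegistrable._registry[cls][name] = subclass
            return subclass

        return add_to_registry

    @classmethod
    def by_name(cls, name: str) -> type:
        # Returns the class itself, so the caller still has to construct it.
        return SimpleRegistrable._registry[cls][name]

    @classmethod
    def list_available(cls) -> List[str]:
        return sorted(SimpleRegistrable._registry[cls])


class PairEncoder(SimpleRegistrable):
    """Hypothetical base class playing the role of GraphPair2VecEncoder."""


@PairEncoder.register("embedding_net")
class EmbeddingNet(PairEncoder):
    def __init__(self, num_layers: int):
        self.num_layers = num_layers


@PairEncoder.register("matching_net")
class MatchingNet(PairEncoder):
    def __init__(self, num_layers: int):
        self.num_layers = num_layers


encoder = PairEncoder.by_name("matching_net")(num_layers=3)
```

Because `by_name` hands back a constructor, a config file can select an implementation by its string name and pass the remaining keys as constructor arguments.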

In [42]:
batch["g_p"].keys()

dict_keys(['edge_index', 'edge_attr', 'batch_id'])

In [43]:
transformer_embedder = PretrainedTransformerMismatchedEmbedder(model_name=config.TRANSFORMER_NAME)

In [44]:
gate_nn = torch.nn.Linear(300, 1)
node_nn = torch.nn.Linear(300, 300)

In [45]:
pooler = Graph2VecEncoder.by_name("global_attention")(gate_nn=gate_nn, nn=node_nn)
pooler

GlobalAttention(gate_nn=Linear(in_features=300, out_features=1, bias=True), nn=Linear(in_features=300, out_features=300, bias=True))

In [46]:
rgcn = Graph2GraphEncoder.by_name("rgcn")(in_channels=300, out_channels=300, aggr="add", num_relations=20)
rgcn

RGCNConv(300, 300, num_relations=20)

In [47]:
#gen = GraphPair2VecEncoder.by_name("graph_matching_net")(convs=rgcn, num_layers=3, pooler=pooler) # this is a constructor
#gen

In [48]:
from allennlp.modules.bimpm_matching import BiMpmMatching
from allennlp.common import Params

match = BiMpmMatching.from_params(
    params = Params({
        "hidden_dim" : 300,
        "num_perspectives" : 10,
        "share_weights_between_directions" : False,
        "with_full_match" : False,
        "with_maxpool_match" :  True,
        "with_attentive_match" : True,
        "with_max_attentive_match" : True,
    })
)



In [49]:
from src.modules.graph_pair2graph_pair_encoders.graph_pair_mpm import GraphPairMPM
graph_bimpm = GraphPairMPM(bimpm=match)
graph_bimpm._dim_match

44
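
A sketch of where the 44 likely comes from, assuming `_dim_match` is the BiMPM output dimension and that each enabled matching strategy contributes features as counted below. This counting is an assumption and should be checked against `BiMpmMatching.get_output_dim()` in the installed version. The function name `bimpm_output_dim` is hypothetical.

```python
def bimpm_output_dim(
    num_perspectives: int,
    with_full_match: bool,
    with_maxpool_match: bool,
    with_attentive_match: bool,
    with_max_attentive_match: bool,
) -> int:
    """Count matching features per node under the assumed BiMPM layout."""
    dim = 2  # two cosine-similarity summary features that are always present
    if with_full_match:
        dim += num_perspectives + 1
    if with_maxpool_match:
        dim += 2 * num_perspectives
    if with_attentive_match:
        dim += num_perspectives + 1
    if with_max_attentive_match:
        dim += num_perspectives + 1
    return dim


# Same switches as the Params used for BiMpmMatching above.
dim_match = bimpm_output_dim(10, False, True, True, True)
# Input size a node updater needs when matching features are concatenated
# to 300-dimensional node states.
updater_input_size = 300 + dim_match
```

Under these assumptions the count is 2 + 20 + 11 + 11 = 44, which agrees with the `dim_matching = 44` set earlier.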

In [50]:
upd = NodeUpdater.by_name("gru")(input_size=dim_encoder+graph_bimpm._dim_match, hidden_size=dim_encoder)

In [51]:
from allennlp.modules import FeedForward
from allennlp.nn import Activation
projector = FeedForward(768, 1, 300, Activation.by_name("linear")(), 0.0)
classifier = FeedForward(300*4, 2, [300, 3], Activation.by_name("relu")(), 0.0)

In [52]:
gmn = GraphPair2VecEncoder.by_name("graph_matching_net")(
    num_layers = 3,
    convs = rgcn, 
    atts = graph_bimpm,
    updater = upd,  
    pooler =  pooler,
)

In [53]:
#print(model._modules) # no encoder of GraphMatchingNet
print(isinstance(gmn, torch.nn.Module))

True


In [54]:
model = SynNLIModel(
    vocab=vocab,
    embedder=transformer_embedder,
    projector=projector,
    encoder=gmn,
    classifier=classifier,
)
model

SynNLIModel(
  (embedder): PretrainedTransformerMismatchedEmbedder(
    (_matched_embedder): PretrainedTransformerEmbedder(
      (transformer_model): BertModel(
        (embeddings): BertEmbeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (token_type_embeddings): Embedding(2, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): BertEncoder(
          (layer): ModuleList(
            (0): BertLayer(
              (attention): BertAttention(
                (self): BertSelfAttention(
                  (query): Linear(in_features=768, out_features=768, bias=True)
                  (key): Linear(in_features=768, out_features=768, bias=True)
                  (value): Linear(in_features=768, out_features=768, bias=True)
                  (dropout): Dropout(p=0.1, inplace=False)
                )

In [55]:
model(**batch)

{'probs': tensor([[0.5000, 0.5078, 0.4982],
         [0.5000, 0.4922, 0.5018]], grad_fn=<SoftmaxBackward>),
 'loss': tensor(1.0704, grad_fn=<NllLossBackward>)}
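
In the output above each column of `probs` sums to 1 (0.5000 + 0.5000, 0.5078 + 0.4922, 0.4982 + 0.5018) while each row does not. That is the pattern a softmax over the batch dimension, rather than the class dimension, would produce. Whether this is what happens inside `SynNLIModel` is an assumption, since the model code is not shown here. A plain-Python sketch of the difference, with made-up logits:

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_over_classes(logits: List[List[float]]) -> List[List[float]]:
    """Normalize each row (one example), like softmax(dim=-1)."""
    return [softmax(row) for row in logits]


def softmax_over_batch(logits: List[List[float]]) -> List[List[float]]:
    """Normalize each column (one class across examples), like softmax(dim=0)."""
    columns = [softmax(list(col)) for col in zip(*logits)]
    return [list(row) for row in zip(*columns)]


# Hypothetical logits for a batch of 2 examples and 3 classes.
logits = [[0.2, 0.1, -0.3], [0.2, 0.05, -0.29]]
per_class = softmax_over_classes(logits)
per_batch = softmax_over_batch(logits)
row_sums_per_class = [sum(row) for row in per_class]
col_sums_per_batch = [sum(col) for col in zip(*per_batch)]
row_sums_per_batch = [sum(row) for row in per_batch]
```

With a batch of 2, normalizing over the batch pushes every entry toward 0.5, which matches the values printed above. A quick check on the real model is to sum `probs` over the last dimension and confirm the result is 1 for every example.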

In [56]:
loader2.batch_size

2

In [57]:
def recursive_to_device(data, device):
    """Move every tensor in a (possibly nested) dict to `device`, in place."""
    for k in data.keys():
        if isinstance(data[k], dict):
            recursive_to_device(data[k], device)
        else:
            data[k] = data[k].to(device)

In [44]:
recursive_to_device(batch, "cuda")

In [None]:
model.to("cuda")

In [None]:
model(**batch)

In [None]:
for name, _ in model.to("cpu").named_parameters():
    # named_parameters() yields (name, parameter) pairs, so filter on the name
    if name.startswith("encoder"):
        print(name)