# Model Combo Use

To use this notebook, you first need to train a NER classification model (go to our repo `Lab.HuggingFace-NER-Research` and follow the instructions). You then need a trained generation model (follow the `conditional_generation.ipynb` notebook).

In [1]:
import json
import os
import sys

from torch import cuda
from transformers import (
    AutoTokenizer,
    BertForTokenClassification,
    BertTokenizer,
    T5ForConditionalGeneration,
)

# Make the repository's `src` package importable from the notebook directory.
sys.path.append('..')
from src.utils.utils import *  # get_entities (used below) is expected to come from here

device = 'cuda' if cuda.is_available() else 'cpu'

ner_model_dir = os.path.join('..', 'model', 'Bert')
t5_model_dir = os.path.join('..', 'model', 't5_20', 't5-large')

# The NER model's config.json stores the id -> label mapping.
with open(os.path.join(ner_model_dir, 'config.json'), 'r', encoding='utf-8') as f:
    datastore = json.load(f)

label_list = {int(k): v for k, v in datastore['id2label'].items()}
# Wrap each label in a marker such as <|PERSON|> for the generation input.
special_tokens = {v: f'<|{v}|>' for v in datastore['id2label'].values()}

t5_tokenizer = AutoTokenizer.from_pretrained('t5-base')
t5_model = T5ForConditionalGeneration.from_pretrained(t5_model_dir)
t5_model.to(device)

ner_model = BertForTokenClassification.from_pretrained(ner_model_dir)
ner_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
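
The two mappings built in the cell above can be checked without loading any model. The following is a minimal sketch that assumes a hypothetical, much smaller `id2label` section than a real trained NER config would contain; the label names `PERSON` and `CLUB` appear in this notebook's output, while `O` and the specific ids are illustrative assumptions.

```python
import json

# Hypothetical excerpt of a config.json file; a real config from the
# trained NER model has more fields and its own label ids.
config_text = '{"id2label": {"0": "O", "1": "PERSON", "2": "CLUB"}}'
datastore = json.loads(config_text)

# JSON object keys are always strings, so convert the ids back to int.
label_list = {int(k): v for k, v in datastore["id2label"].items()}

# One marker per label, in the <|LABEL|> form used by the generation input.
special_tokens = {v: f"<|{v}|>" for v in datastore["id2label"].values()}

print(label_list)
print(special_tokens)
```

The `int(k)` conversion matters because JSON stores the ids as strings, whereas a token classifier's predicted class indices are integers.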



In [2]:
def linearize_entities(entities):
    """Wrap each entity's text in its label marker and concatenate the results."""
    string = ''
    for e in entities:
        label = special_tokens[e['label']]
        text = e['text']
        string += f'{label} {text} {label} '

    return string
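
To see what the linearization produces without running the NER model, here is a self-contained sketch. The entity list is hand-written and hypothetical; it only mirrors the `label`/`text` keys that `linearize_entities` reads, and the `special_tokens` dictionary is a small stand-in for the one derived from the model config.

```python
# Stand-in for the marker mapping derived from the NER config (assumed subset).
special_tokens = {"PERSON": "<|PERSON|>", "CLUB": "<|CLUB|>"}


def linearize_entities(entities):
    """Wrap each entity's text in its label marker and concatenate the results."""
    string = ""
    for e in entities:
        label = special_tokens[e["label"]]
        string += f"{label} {e['text']} {label} "
    return string


# Hypothetical entities, shaped like the dicts the function expects.
sample_entities = [
    {"label": "PERSON", "text": "rabiot"},
    {"label": "CLUB", "text": "juventus"},
]

linearized = linearize_entities(sample_entities)
print(repr(linearized))
```

Note that the result keeps a trailing space after the last marker, matching the behavior of the function in this notebook.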

In [5]:
text='Rabiot, Juventus, renewal the contract, Torino'

entities=get_entities(ner_model, ner_tokenizer, text, label_list)
input_text=linearize_entities(entities)
input_text

'<|PERSON|> rabiot <|PERSON|> <|CLUB|> juventus <|CLUB|> <|TRANSFER_MARKET|> renewal the contract <|TRANSFER_MARKET|> <|CLUB|> torino <|CLUB|> '

In [6]:
# Tokenize the linearized entities and move the tensors to the model's device.
encoding = t5_tokenizer(input_text, return_tensors="pt").to(device)

generated_ids = t5_model.generate(
    input_ids=encoding['input_ids'],
    attention_mask=encoding['attention_mask'],
    max_length=256,
)
preds = [t5_tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]

print(preds)

['are set to sign Juventus contract, here we go The agreement has been reached and signed, contracts now signed between clubs. Torino will receive a percentage on future sale. Juve are now waiting for the final green light from the cl']
