### TensorFlow Hub

In this notebook we explore another very useful repository for building NLP pipelines: TensorFlow Hub.
We again use Q&A as the main example, but we will see that it is not as straightforward as with Hugging Face.

In [1]:
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import tensorflow_text  # imported for its side effect: registers the text ops this model needs

questions = ["What is your age?"]
responses = ["I am 20 years old.", "good morning"]
response_contexts = ["I will be 21 next year.", "great day."]

module = hub.load('https://tfhub.dev/google/universal-sentence-encoder-multilingual-qa/3')

question_embeddings = module.signatures['question_encoder'](
            tf.constant(questions))

response_embeddings = module.signatures['response_encoder'](
        input=tf.constant(responses),
        context=tf.constant(response_contexts))

np.inner(question_embeddings['outputs'], response_embeddings['outputs'])

array([[0.40883988, 0.08877401]], dtype=float32)
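
The score above is the inner product between the question embedding and each response embedding, so a higher value means a closer match. As a minimal sketch of that ranking step, here is the same idea on made-up 3-dimensional vectors in place of the real encoder outputs. The vectors and the `rank_by_inner_product` helper are illustrative assumptions, not part of TensorFlow Hub.

```python
def inner(u, v):
    """Inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_inner_product(question_vec, response_vecs):
    """Return response indices sorted from highest to lowest score, plus the scores."""
    scores = [inner(question_vec, r) for r in response_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy vectors standing in for the encoder outputs (assumed values).
question_vec = [1.0, 0.0, 1.0]
response_vecs = [
    [0.5, 0.1, 0.5],  # points in a similar direction to the question
    [0.0, 1.0, 0.0],  # orthogonal to the question
]

order, scores = rank_by_inner_product(question_vec, response_vecs)
print(order, scores)
```

With the real model, `np.inner` plays the role of `inner` over the whole batch at once, which is why the result has one row per question and one column per response.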

In [6]:
import json

In [7]:
with open("./SQUAD/dev-v2.0.json") as f:
    dev_set = json.load(f)

In [10]:
dev_collection = []
for passage in dev_set["data"]:
    for paragraph in passage["paragraphs"]:
        text = paragraph["context"]
        for qa in paragraph["qas"]:
            question = qa["question"]
            answers_list = []
            for answer in qa["answers"]:
                ans = answer["text"]
                answers_list.append(ans)
            dev_collection.append({
                "text": text,
                "question": question,
                "answers": answers_list
            })

In [11]:
dev_collection[0]

{'text': 'The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse ("Norman" comes from "Norseman") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.',
 'question': 'In what country is Normandy located?',
 'answers': ['France', 'France', 'France', 'France']}
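
SQuAD v2.0 also contains unanswerable questions, which carry an empty `answers` list in the JSON, so some entries of `dev_collection` have no reference answer at all. A minimal sketch of how one might separate the two groups follows. It uses a tiny hand-written stand-in for `dev_collection`; the sample entries and the `split_by_answerability` helper are assumptions for illustration only.

```python
def split_by_answerability(collection):
    """Split QA entries into those with and without reference answers."""
    answerable = [qa for qa in collection if qa["answers"]]
    unanswerable = [qa for qa in collection if not qa["answers"]]
    return answerable, unanswerable


# Hand-written sample mimicking the structure built above (not real data).
sample_collection = [
    {
        "text": "Normandy is a region in France.",
        "question": "In what country is Normandy located?",
        "answers": ["France"],
    },
    {
        "text": "Normandy is a region in France.",
        "question": "Who painted the ceiling?",
        "answers": [],  # unanswerable: no reference answer
    },
]

answerable, unanswerable = split_by_answerability(sample_collection)
print(len(answerable), len(unanswerable))
```

Keeping the two groups apart matters for any retrieval-style evaluation, because a model that always returns a candidate can never be correct on the unanswerable entries.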

### spaCy

spaCy is a very useful, production-ready library that bundles many common NLP operations behind an easy, intuitive interface, which helps not only during prototyping but also when putting our models into production.

In [17]:
from spacy.lang.en import English

nlp = English()
nlp.add_pipe("sentencizer")

<spacy.pipeline.sentencizer.Sentencizer at 0x7f13873bb9c0>

In [40]:
doc = nlp(dev_collection[0]["text"])
candidate_answers = []
for sentence in doc.sents:
    for sentence_piece in sentence.text.split(","):
        candidate_answers.append(sentence_piece)

In [41]:
questions = [dev_collection[0]["question"]]
responses = candidate_answers[:]
response_contexts = [dev_collection[0]["text"] for _ in responses]

question_embeddings = module.signatures['question_encoder'](
            tf.constant(questions))
response_embeddings = module.signatures['response_encoder'](
        input=tf.constant(responses),
        context=tf.constant(response_contexts))

response = np.inner(question_embeddings['outputs'], response_embeddings['outputs'])
print(response.shape)
response

(1, 10)


array([[0.5253961 , 0.4790218 , 0.38074923, 0.43492612, 0.2225343 ,
        0.19773774, 0.2120413 , 0.20404881, 0.27391946, 0.16583326]],
      dtype=float32)

In [42]:
# Look at the top 3 responses
for arg_idx in np.argsort(-response[0])[:3]:
    print(arg_idx, responses[arg_idx])

0 The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy
1  a region in France.
3  Iceland and Norway who
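
The candidates printed above keep the leading space left over from the comma split (for example `" a region in France."`). A small sketch of a cleaner split is shown here, with plain strings standing in for `sentence.text` from spaCy's `doc.sents`. The `split_candidates` helper is a hypothetical name introduced for this example.

```python
def split_candidates(sentences):
    """Split each sentence on commas, strip whitespace, and drop empty pieces."""
    candidates = []
    for sentence in sentences:
        for piece in sentence.split(","):
            piece = piece.strip()
            if piece:  # skip empty strings produced by ",," or a trailing comma
                candidates.append(piece)
    return candidates


# Plain strings stand in for the sentences produced by the sentencizer.
sentences = ["They gave their name to Normandy, a region in France."]
print(split_candidates(sentences))
```

Stripping the pieces does not change which candidate ranks first by itself, but it makes later string comparisons against the reference answers less brittle.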


In [38]:
from tqdm import tqdm

exact_match = 0
soft_match = 0
for dev_qa in tqdm(dev_collection):
    candidate_answers = []
    doc = nlp(dev_qa["text"])
    
    for sentence in doc.sents:
        for sentence_piece in sentence.text.split(","):
            candidate_answers.append(sentence_piece)
            
    questions = [dev_qa["question"]]

    response_contexts = [dev_qa["text"] for _ in candidate_answers]
    

    question_embeddings = module.signatures['question_encoder'](
                tf.constant(questions))
    
    response_embeddings = module.signatures['response_encoder'](
            input=tf.constant(candidate_answers),
            context=tf.constant(response_contexts))
    
    response = np.inner(question_embeddings['outputs'], response_embeddings['outputs'])
    
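    # Take the top-scoring candidate
    top_answer = candidate_answers[np.argmax(response[0])]

    # Exact match: raw string equality with any reference answer
    if top_answer in dev_qa["answers"]:
        exact_match += 1
    # Soft match: substring overlap in either direction. Note that this
    # increments once per matching reference answer, not once per question.
    for ans in dev_qa["answers"]:
        if ans in top_answer or top_answer in ans:
            soft_match += 1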

print(f"Exact Match score: {exact_match / len(dev_collection)}")  
print(f"Soft Match score: {soft_match / len(dev_collection)}") 

100%|██████████| 11873/11873 [12:41<00:00, 15.59it/s]

Exact Match score: 0.0015160448075465342
Soft Match score: 0.8888233807799208
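
The loop above adds to `soft_match` once for every matching reference answer, so a question with four identical references can contribute up to four times, and the exact-match test compares raw strings, including any leading space left by the comma split. A sketch of a per-question variant follows. The helper names `normalize`, `exact_match_hit` and `soft_match_hit` are hypothetical, and the normalization is a simplification, not the official SQuAD evaluation script.

```python
def normalize(text):
    """Lowercase, trim whitespace, and drop trailing sentence punctuation."""
    return text.strip().lower().rstrip(".!?")


def exact_match_hit(prediction, answers):
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize(prediction)
    return any(pred == normalize(a) for a in answers)


def soft_match_hit(prediction, answers):
    """True if prediction and any reference contain one another (at most one hit per question)."""
    pred = normalize(prediction)
    for a in answers:
        ref = normalize(a)
        # Guard against empty strings, which are substrings of everything.
        if ref and pred and (ref in pred or pred in ref):
            return True
    return False


# Example taken from the first dev entry shown earlier.
prediction = " a region in France."
answers = ["France", "France", "France", "France"]
print(exact_match_hit(prediction, answers))
print(soft_match_hit(prediction, answers))
```

Summing these booleans over the questions and dividing by the number of questions gives scores that are bounded by 1. The figures reported above were produced by the original loop and would need a rerun to be compared with this variant.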



