##**Training a transformer for Q&A**
Team 5:
*   Héctor Manuel Cárdenas Yáñez
*   Alejandro Pizarro Chávez
*   Diego Rosas
*   Fausto Alejandro Palma Cervantes
*   Alan Ricardo Vilchis Arceo

##Corpus: The Metamorphosis
Novella by Franz Kafka (Project Gutenberg eBook #5200)


##Install required libraries

In [1]:
!pip install transformers



In [2]:
import requests

# Download the corpus
url = "https://www.gutenberg.org/cache/epub/5200/pg5200.txt"
response = requests.get(url)
response.raise_for_status()  # Stop early if the download failed
corpus_text = response.text

# Let's check the beginning of the corpus to ensure it's loaded correctly
print(corpus_text[:500])  # Printing the first 500 characters of the corpus

﻿The Project Gutenberg eBook of Metamorphosis
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook
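The excerpt printed above shows that the download opens with the Project Gutenberg license header rather than the novella itself, and the answer to the first question further down returns license text as the book's title. A minimal sketch of trimming that boilerplate before chunking follows. It assumes the file contains the usual `*** START OF THE PROJECT GUTENBERG EBOOK ...` and `*** END OF THE PROJECT GUTENBERG EBOOK ...` marker lines; `strip_gutenberg_boilerplate` is a hypothetical helper and not part of the original notebook.

```python
import re

# Marker lines that usually delimit the actual book text in Gutenberg files
# (assumption: older files may say "THIS" instead of "THE").
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return only the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    text = text.lstrip("\ufeff")  # drop a leading byte-order mark, if any
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    stop = end.start() if end else len(text)
    return text[begin:stop].strip()
```

In the notebook this would be applied once, right after the download, for example `corpus_text = strip_gutenberg_boilerplate(corpus_text)`.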


In [None]:
from transformers import BertForQuestionAnswering, BertTokenizer
import torch
import time

# Load pre-trained model and tokenizer
model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
model = BertForQuestionAnswering.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)

# Split the corpus into fixed-size character chunks (2000 characters each, not tokens)
chunk_size = 2000
chunks = [corpus_text[i:i + chunk_size] for i in range(0, len(corpus_text), chunk_size)]

# Ask 10 questions regarding the corpus
questions = [
    "What is the title of the book?",
    "Who is the main character in 'The Metamorphosis'?",
    "What does Gregor turn into at the beginning of the story?",
    "How do Gregor's family members react to his transformation?",
    "What job does Gregor have?",
    "What does Gregor enjoy doing in his free time before his transformation?",
    "Who is Grete in relation to Gregor?",
    "Where does most of the story take place?",
    "How does Gregor communicate after his transformation?",
    "What happens to Gregor at the end?"
]


# Time only the question-answering loop
# (time.time() on its own is an epoch timestamp, not a duration)
start_time = time.perf_counter()

for question in questions:
    answers = []
    for chunk in chunks:
        # Question and chunk are encoded as one pair; anything beyond 512 tokens is truncated
        inputs = tokenizer(question, chunk, return_tensors='pt', max_length=512, truncation=True)
        with torch.no_grad():
            outputs = model(**inputs)
        start_scores = outputs.start_logits
        end_scores = outputs.end_logits

        # Find the tokens with the highest start and end scores
        # (chosen independently, so the end can precede the start and give an empty answer)
        answer_start = torch.argmax(start_scores)
        answer_end = torch.argmax(end_scores)

        # Decode the tokens and remove extra spaces and special tokens
        answer = tokenizer.decode(inputs['input_ids'][0][answer_start:answer_end+1])
        answer = answer.replace('[CLS]', '').replace('[SEP]', '').strip()
        answers.append(answer)

    # Deduplicate the per-chunk answers and join up to 5 of them
    # (set() has no defined order, so these are not ranked by score)
    unique_answers = list(set(answers))
    final_answer = " ".join(unique_answers[:5])
    print(f"Question: {question}")
    print(f"Answer: {final_answer}\n")

print("Seconds running model =", time.perf_counter() - start_time)

Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Question: What is the title of the book?
Answer:  . project gutenberg trademark. if you do not charge anything for copies of this ebook, complying with the trademark license is very easy. you may use this ebook for nearly any purpose such as creation of derivative works, reports, performances and research. project gutenberg ebooks may be modified and printed and given away — you may do practically anything in the united states with ebooks not protected by u. s. copyright law. redistribution is subject to the trademark license, especially commercial redistribution. start : full license the full project gutenberg license please read this before you distribute or use this work to protect the project gutenberg™ mission of promoting the free distribution of electronic works, by using or distributing this work ( or any other work associated in any way with the phrase “ project gutenberg ” ), you agree to comply with all the terms of the full project gutenberg™ license available with this fil

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
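This warning indicates that some question and chunk pairs exceeded the 512-token limit, so the tail of those chunks was silently dropped. The fixed 2000-character slices can also cut an answer in half at a chunk boundary. One possible mitigation is to let consecutive chunks overlap, as in the sketch below. The helper `overlapping_chunks` and its `overlap` parameter are hypothetical and not part of the original notebook; the default sizes are illustrative only.

```python
def overlapping_chunks(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into character windows that share `overlap` characters.

    With overlap=0 this reproduces plain fixed-size slicing.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid redundant tail chunks
        if start + chunk_size >= len(text):
            break
    return chunks
```

A smaller `chunk_size` would additionally reduce how often truncation occurs. As an alternative, the Hugging Face tokenizer can produce overlapping windows at the token level through `truncation="only_second"` combined with `stride` and `return_overflowing_tokens=True`; that variant is not shown here.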


Question: Who is the main character in 'The Metamorphosis'?
Answer:  who is the main character in'the metamorphosis '? who is the main character in'the metamorphosis '?  you had to concede that it was possible. but as if in gruff reply to this question, the chief clerk ’ s firm footsteps in his highly polished boots could now be heard in the adjoining room. from the room on his right, gregor ’ s sister whispered to him to let him know : “ gregor, the chief clerk is here. ” “ yes, i know ”, said gregor to himself ; but without daring to raise his voice loud enough for his sister to hear him. “ gregor ”, said his father now from the room to his left, “ the chief clerk has come round and wants to know why you didn ’ t leave on the early train. we don ’ t know what to say to him. and anyway, he wants to speak to you personally. so please open up this door. i ’ m sure he ’ ll be good enough to forgive the untidiness of your room. ” then the chief clerk called “ good morning, mr. samsa ”. “ 


Question: What does Gregor turn into at the beginning of the story?
Answer:  . what does gregor turn into at the beginning of the story? a horrible vermin junior salesman to a travelling representative
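The answer above echoes the question and mixes in unrelated spans. A likely cause is that the start and end positions are picked by two independent `argmax` calls, so the span can fall inside the question tokens or end before it starts. A common post-processing step for extractive Q&A is to score every valid span jointly and keep the best one. The sketch below illustrates the idea on plain Python lists; `best_span`, `context_start`, `context_end`, and `max_answer_len` are hypothetical names and not part of the original notebook.

```python
def best_span(start_logits, end_logits, context_start, context_end, max_answer_len=30):
    """Return (score, start, end) for the highest-scoring valid span.

    A span is valid when it lies inside [context_start, context_end],
    start <= end, and it covers at most max_answer_len tokens.
    Returns None when no valid span exists.
    """
    best = None
    for s in range(context_start, context_end + 1):
        last_end = min(s + max_answer_len, context_end + 1)
        for e in range(s, last_end):
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[0]:
                best = (score, s, e)
    return best
```

With the BERT tokenizer used here, the context positions could be derived from `inputs["token_type_ids"]`, which marks tokens of the second segment (the chunk) with 1, and the logits converted with `outputs.start_logits[0].tolist()`. The returned score also gives a basis for comparing answers from different chunks.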




Question: How do Gregor's family members react to his transformation?
Answer:  how do gregor's family members react to his transformation? unhappy silence gregor ’ s mother and sister would now leave their work where it was and sit close together, cheek to cheek . mr. samsa merely opened his eyes wide and briefly nodded to him several times. at that, and without delay, the man actually did take long strides into the front hallway ; his two friends had stopped rubbing their hands some time before and had been listening to what was being said. now they jumped off after their friend as if taken with a sudden fear that mr. samsa might go into the hallway in front of them and break the connection with their leader. once there, all three took their hats from the stand, took their sticks from the holder, bowed without a word and left the premises. mr. samsa and the two women followed them out onto the landing ; but they had had no reason to mistrust the men ’ s intentions and as they leaned o


Question: What does Gregor enjoy doing in his free time before his transformation?
Answer:  . crawling about what does gregor enjoy doing in his free time before his transformation? working with his fretsaw




Question: Who is Grete in relation to Gregor?
Answer:  his sister . his sister had certainly left it there for him because of that, but he turned, almost against his own will, away from the dish and crawled back into the centre of the room. through the crack in the door, gregor could see that the gas had been lit in the living room. his father at this time would normally be sat with his evening paper, reading it out in a loud voice to gregor ’ s mother, and sometimes to his sister, but there was now not a sound to be heard. gregor ’ s sister her mother
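Here the correct answer ("his sister") is present, but it is concatenated with spans from other chunks because the loop joins up to five deduplicated answers in arbitrary order. A minimal sketch of keeping one answer per question instead, assuming each chunk contributes a `(score, text)` pair, for example the sum of the chosen start and end logits. `pick_best_answer` is a hypothetical helper, and logits from different chunks are only roughly comparable, so this is a heuristic rather than a calibrated ranking.

```python
def pick_best_answer(candidates):
    """Return the (score, text) pair with the highest score.

    Empty or whitespace-only answers are ignored.
    Returns None when no usable candidate remains.
    """
    valid = [(score, text.strip()) for score, text in candidates if text.strip()]
    if not valid:
        return None
    return max(valid, key=lambda pair: pair[0])
```

In the loop, `answers.append(answer)` would become an append of the pair, and `final_answer` would be the text of the pair returned by `pick_best_answer`.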


