<a href="https://colab.research.google.com/github/sergiomora03/AdvancedTopicsAnalytics/blob/main/exercises/E7-QuestionAnswer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Question & Answer

A question-answering (QA) Transformer takes a question and a context passage and returns an answer in natural language, which makes it useful across many Natural Language Processing (NLP) applications. Reasons to build one include:

1. **Question-Answering Systems:** QA Transformers are designed to provide accurate and contextually relevant answers to questions posed in natural language. These systems have a wide range of practical applications, including chatbots, virtual assistants, customer support, and information retrieval.

2. **Information Retrieval:** QA Transformers can be used to search through large corpora of text and extract precise answers to user queries. This can improve the efficiency and effectiveness of information retrieval systems.

3. **Document Summarization:** QA Transformers can be used to summarize long documents by answering questions about the document's content. This makes it easier for users to quickly understand the key points and relevant information in a text.

4. **Education and E-Learning:** QA Transformers can be integrated into educational platforms to provide instant answers and explanations to students' questions. They can also help with the automatic generation of quiz questions and answers.

5. **Content Generation:** QA Transformers can assist in content generation by automatically answering questions based on available knowledge. This can be useful for generating FAQs, product descriptions, and informative articles.

6. **Customer Support:** Many companies use QA systems to automate responses to frequently asked questions, freeing up human agents to handle more complex queries and providing customers with quick solutions.

7. **Medical Diagnosis:** QA Transformers can assist medical professionals by answering questions related to patient records, medical literature, and diagnostic information, potentially leading to faster and more accurate diagnoses.

8. **Legal and Compliance:** In the legal field, QA Transformers can be used to search and extract information from legal documents, assisting lawyers in their research and case preparation.

9. **Language Translation:** QA Transformers can be used to answer questions about language translation, helping users understand the meaning of words, phrases, or sentences in different languages.

10. **Scientific Research:** QA Transformers can support researchers by answering questions related to scientific literature, allowing them to quickly access relevant information for their studies.

11. **Decision Support:** QA Transformers can aid in decision-making processes by providing answers to questions related to data analysis, market research, and business intelligence.

12. **Accessibility:** QA Transformers can improve accessibility for individuals with disabilities by providing spoken or written answers to their questions, helping them access information more easily.

Overall, QA Transformers have the potential to enhance information retrieval, automation, and user interaction in various domains, making them a valuable tool in the development of intelligent systems and applications. The ability to provide accurate and context-aware answers to questions in natural language is a key advantage of these models.


---

Exercise:

As a data scientist specializing in NLP, you are asked to build a model that can answer questions in Spanish. Your stakeholders will give you an article and a question, and your model should answer it.


In [1]:
!pip install requests beautifulsoup4



In [1]:
import requests
from bs4 import BeautifulSoup

# URL of the article
url = "https://time.com/collection/time100-ai/6309026/geoffrey-hinton/"

# Send an HTTP request to fetch the page content
response = requests.get(url, timeout=30)

# Start from an empty string so the variable exists even if the download fails
article_text = ""

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the page HTML with BeautifulSoup
    soup = BeautifulSoup(response.text, "html.parser")

    # Find the article body (inspect the page HTML to confirm the right selector)
    article_content = soup.find("div", {"class": "article-content"})

    if article_content is None:
        # soup.find returns None when no element matches the selector
        print("Could not find the article container; check the selector.")
    else:
        # Extract the text of each paragraph, one paragraph per line
        article_text = "".join(
            paragraph.get_text() + "\n" for paragraph in article_content.find_all("p")
        )

        # Print the article text
        print(article_text)
else:
    print("Error fetching the page:", response.status_code)

Over the course of February, Geoffrey Hinton, one of the most influential AI researchers of the past 50 years, had a “slow eureka moment.”
Hinton, 76, has spent his career trying to build AI systems that model the human brain, mostly in academia before joining Google in 2013. He had always believed that the brain was better than the machines that he and others were building, and that by making them more like the brain, they would improve. But in February, he realized “the digital intelligence we’ve got now may be better than the brain already. It’s just not scaled up quite as big.” 
Developers around the world are currently racing to build the biggest AI systems that they can. Given the current rate at which AI companies are increasing the size of models, it could be less than five years until AI systems have 100 trillion connections—roughly as many as there are between neurons in the human brain.
Alarmed, Hinton left his post as VP and engineering fellow in May and gave a flurry of in
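
The paragraph-extraction step can also be tried offline, without a network request. The following is a minimal sketch that uses only the standard library's `html.parser` instead of BeautifulSoup; the `ParagraphExtractor` class, the `extract_paragraphs` helper, and the `SAMPLE_HTML` snippet are hypothetical names introduced for illustration. Unlike the cell above, it collects every `<p>` element in the document rather than only those inside the `article-content` container.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the visible text of every <p> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) also arrives here
        if self._in_paragraph:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            text = "".join(self._buffer).strip()
            if text:  # skip paragraphs that contain only whitespace
                self.paragraphs.append(text)
            self._in_paragraph = False


def extract_paragraphs(html):
    """Return the non-empty paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical sample page used only for this demo
SAMPLE_HTML = """
<div class="article-content">
  <p>First <b>paragraph</b>.</p>
  <p>Second paragraph.</p>
</div>
<p>   </p>
"""

print("\n".join(extract_paragraphs(SAMPLE_HTML)))
```

Testing the extraction on a small inline snippet like this makes it easier to tell a selector problem apart from a download problem.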

# **Text Translation**


In [None]:
# pip install transformers
# pip install sentencepiece
# pip install sacremoses

In [3]:
from transformers import pipeline
from nltk.tokenize import sent_tokenize

In [4]:
# Split the article into sentences so the whole text can be translated:
# the full article exceeds the translation model's token limit and would otherwise be truncated
article = sent_tokenize(article_text)
print(len(article))
article[:5]

40


['Over the course of February, Geoffrey Hinton, one of the most influential AI researchers of the past 50 years, had a “slow eureka moment.”\nHinton, 76, has spent his career trying to build AI systems that model the human brain, mostly in academia before joining Google in 2013.',
 'He had always believed that the brain was better than the machines that he and others were building, and that by making them more like the brain, they would improve.',
 'But in February, he realized “the digital intelligence we’ve got now may be better than the brain already.',
 'It’s just not scaled up quite as big.” \nDevelopers around the world are currently racing to build the biggest AI systems that they can.',
 'Given the current rate at which AI companies are increasing the size of models, it could be less than five years until AI systems have 100 trillion connections—roughly as many as there are between neurons in the human brain.']
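
Note that the first item in the output above spans two paragraphs: the sentence ending in `moment.”` is joined to the next paragraph through a `\n`. One possible way to avoid this is to split the text on line breaks before tokenizing each paragraph. The sketch below assumes this approach; `split_sentences_by_paragraph` and `naive_sentence_split` are hypothetical helpers, and the naive regex splitter is only a dependency-free stand-in for `sent_tokenize`, which could be passed in its place.

```python
import re


def naive_sentence_split(paragraph):
    """Rough stand-in for a real sentence tokenizer: split after ., ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]


def split_sentences_by_paragraph(text, sentence_splitter):
    """Split text on line breaks first, then split each paragraph into sentences."""
    sentences = []
    for paragraph in text.splitlines():
        paragraph = paragraph.strip()
        if paragraph:  # ignore blank lines between paragraphs
            sentences.extend(sentence_splitter(paragraph))
    return sentences


# Shortened sample text used only for this demo
sample = "Hinton had a “slow eureka moment.”\nHe joined Google in 2013. He left his post in May."

for sentence in split_sentences_by_paragraph(sample, naive_sentence_split):
    print(repr(sentence))
```

Because the paragraph boundary is handled before tokenization, a sentence that ends with a closing quotation mark no longer merges with the paragraph that follows it.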

In [5]:
# Load a pretrained English-to-Spanish translation model (no training happens here)
translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")
# Translate sentence by sentence so the article is truncated as little as possible
articulo_espanol = []
for oracion in article:
    articulo_es = translator(
        oracion, clean_up_tokenization_spaces=True, truncation=True
    )
    print(articulo_es[0]["translation_text"])
    articulo_espanol.append(articulo_es[0]["translation_text"])

All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-en-es.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.


A lo largo de febrero, Geoffrey Hinton, uno de los investigadores de IA más influyentes de los últimos 50 años, tuvo un “momento eureka lento.” Hinton, de 76 años, ha pasado su carrera tratando de construir sistemas de IA que modelen el cerebro humano, sobre todo en la academia antes de unirse a Google en 2013.
Siempre había creído que el cerebro era mejor que las máquinas que él y otros estaban construyendo, y que al hacerlos más parecidos al cerebro, mejorarían.
Pero en febrero, se dio cuenta de que “la inteligencia digital que tenemos ahora puede ser mejor que el cerebro ya.
Los desarrolladores de todo el mundo están compitiendo actualmente para construir los sistemas de IA más grandes que puedan.
Dado el ritmo actual al que las compañías de IA están aumentando el tamaño de los modelos, podría pasar menos de cinco años hasta que los sistemas de IA tengan 100 billones de conexiones, aproximadamente tantas como haya entre las neuronas en el cerebro humano.
Alarmado, Hinton dejó su pue
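
The sentence-by-sentence loop above can also be written as a small helper that sends sentences in batches. This is a minimal sketch, not the notebook's method: `translate_sentences` and `fake_translator` are hypothetical names, and the helper assumes that the translator is a callable that accepts a list of strings and returns a list of dicts with a `"translation_text"` key, which is the shape the loop above reads from. The stub translator only upper-cases its input so the example runs without downloading a model.

```python
def translate_sentences(sentences, translator, batch_size=8):
    """Translate sentences in batches and return the translated strings in order.

    `translator` is assumed to accept a list of strings and return a list of
    dicts that each contain a "translation_text" key.
    """
    translated = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        results = translator(batch)
        translated.extend(item["translation_text"] for item in results)
    return translated


def fake_translator(batch):
    """Stand-in used only for this demo: upper-cases instead of translating."""
    return [{"translation_text": text.upper()} for text in batch]


demo = translate_sentences(["one", "two", "three"], fake_translator, batch_size=2)
print(demo)
```

Passing the translator as an argument keeps the batching logic testable on its own; in the notebook, the `translator` pipeline object would take the place of the stub.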

In [6]:
articulo_espanol_texto_plano = " ".join(articulo_espanol)
articulo_espanol_texto_plano

'A lo largo de febrero, Geoffrey Hinton, uno de los investigadores de IA más influyentes de los últimos 50 años, tuvo un “momento eureka lento.” Hinton, de 76 años, ha pasado su carrera tratando de construir sistemas de IA que modelen el cerebro humano, sobre todo en la academia antes de unirse a Google en 2013. Siempre había creído que el cerebro era mejor que las máquinas que él y otros estaban construyendo, y que al hacerlos más parecidos al cerebro, mejorarían. Pero en febrero, se dio cuenta de que “la inteligencia digital que tenemos ahora puede ser mejor que el cerebro ya. Los desarrolladores de todo el mundo están compitiendo actualmente para construir los sistemas de IA más grandes que puedan. Dado el ritmo actual al que las compañías de IA están aumentando el tamaño de los modelos, podría pasar menos de cinco años hasta que los sistemas de IA tengan 100 billones de conexiones, aproximadamente tantas como haya entre las neuronas en el cerebro humano. Alarmado, Hinton dejó su pu

The question is translated as well so that it can be passed to the model.


In [7]:
question = "How is Geoffrey Hinton?"
pregunta_es = translator(question, clean_up_tokenization_spaces=True, truncation=True)
print(pregunta_es[0]["translation_text"])
pregunta_espanol = pregunta_es[0]["translation_text"]

¿Cómo está Geoffrey Hinton?


Because the machine translation does not fully capture the meaning of the English question, a human translation is also used in this case to obtain an answer.


In [10]:
pregunta_espanol_humano = "¿Cómo es Geoffrey Hinton?"

# **Question Answering**


Since there are now two versions of the question (one translated by a human and one by machine), both are passed to the model to see which answer each one produces.


In [8]:
import pandas as pd

Answer to the machine-translated question


In [9]:
reader = pipeline("question-answering")
respuesta = reader(question=pregunta_espanol, context=articulo_espanol_texto_plano)
pd.DataFrame([respuesta])

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForQuestionAnswering.

All the weights of TFDistilBertForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForQuestionAnswering for

|   | score | start | end | answer |
|---|-------|-------|-----|--------|
| 0 | 0.423527 | 0 | 21 | A lo largo de febrero |


In [11]:
respuesta_h = reader(
    question=pregunta_espanol_humano, context=articulo_espanol_texto_plano
)
pd.DataFrame([respuesta_h])

|   | score | start | end | answer |
|---|-------|-------|-----|--------|
| 0 | 0.761458 | 0 | 21 | A lo largo de febrero |


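Although the two questions differ, the model returns the same answer span (characters 0 to 21 of the context) in both cases; only the score changes, from 0.42 for the machine-translated question to 0.76 for the human-translated one.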
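
The `start` and `end` fields are character offsets into the context, so the answer can be checked by slicing the context directly. The sketch below reuses the values reported in the two result tables; `span_matches` and `compare_answers` are hypothetical helpers introduced for illustration, and the context is shortened to the opening of the translated article.

```python
def span_matches(context, result):
    """Return True when context[start:end] equals the reported answer."""
    return context[result["start"]:result["end"]] == result["answer"]


def compare_answers(result_a, result_b):
    """Report whether two QA results share a span and how far their scores differ."""
    same_span = (result_a["start"], result_a["end"]) == (
        result_b["start"],
        result_b["end"],
    )
    score_gap = round(abs(result_a["score"] - result_b["score"]), 6)
    return {"same_span": same_span, "score_gap": score_gap}


# Opening of the translated article, shortened for this demo
context = "A lo largo de febrero, Geoffrey Hinton, uno de los investigadores de IA"

# Values copied from the two result tables above
machine = {"score": 0.423527, "start": 0, "end": 21, "answer": "A lo largo de febrero"}
human = {"score": 0.761458, "start": 0, "end": 21, "answer": "A lo largo de febrero"}

print(span_matches(context, machine))
print(compare_answers(machine, human))
```

Comparing offsets rather than answer strings distinguishes two results that point at the same place in the context from two results that merely contain the same words.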


# Conclusions

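The pipeline translated both the article and the question and returned an answer for each version of the question, although the answer was not fully accurate. A likely contributing factor is that the question-answering pipeline defaulted to distilbert-base-cased-distilled-squad, a model fine-tuned on the English SQuAD dataset, while the context and questions were in Spanish.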