<a href="https://colab.research.google.com/github/alimoorreza/CS167-sp25-notes/blob/main/Day27c_LLM_question_answering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CS167: Day27
## Deep Learning and Large Language Models (LLMs)


#### CS167: Machine Learning, Spring 2025


📜 [Syllabus](https://analytics.drake.edu/~reza/teaching/cs167_sp25/cs167_syllabus_sp25.pdf)

In [None]:
!pip install transformers



# Question Answering Model using a Transformer-based large language model (LLM)
>**many-to-one** mapping machine learning task:
<div>
<img src="https://analytics.drake.edu/~reza/teaching/cs167_sp25/notes/images/many_to_one_ML_task.png?raw=1" width=200/>
</div>


In [None]:
from transformers import pipeline
qa_nlp_model = pipeline("question-answering")  # no model is named, so the default is used: distilbert/distilbert-base-cased-distilled-squad (DistilBERT, as of 05/06/24 on Reza's machine)


No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
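
The warning above recommends naming the model and revision explicitly rather than relying on the default. A minimal sketch of doing so, reusing the model name and revision printed in the log; the names `QA_PIPELINE_KWARGS` and `build_qa_pipeline` are illustrative and not part of the original notebook:

```python
# Model name and revision are copied from the log message above.
QA_PIPELINE_KWARGS = {
    "task": "question-answering",
    "model": "distilbert/distilbert-base-cased-distilled-squad",
    "revision": "626af31",
}


def build_qa_pipeline():
    """Build the question-answering pipeline with a pinned model and revision."""
    # Imported lazily so the settings can be inspected without loading transformers.
    from transformers import pipeline

    return pipeline(**QA_PIPELINE_KWARGS)


# qa_nlp_model = build_qa_pipeline()  # downloads the weights on first use
```

Pinning the revision should keep the notebook's answers reproducible even if the default model for the task changes later.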



In [None]:
context = r"""The Taj Mahal is an iconic ivory-white marble mausoleum located in Agra, India. ...
           It was commissioned in 1631 by the Mughal emperor Shah Jahan to serve as the tomb for his beloved wife Mumtaz Mahal. ...
           The mausoleum is the centerpiece of a large complex that includes a mosque, guest house, and formal gardens surrounded by walls. ...
           The Taj Mahal is considered a masterpiece of Mughal architecture and is one of the most famous buildings in the world. ..."""

question_sentences = ["What is Taj Mahal?",
                      "Who commissioned Taj Mahal?",
                      "What Taj Mahal is famous for?"]

verbose = False  # set to True to also print the score and the character offsets

for question in question_sentences:
  result = qa_nlp_model(question=question, context=context)

  if verbose:
    print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
  else:
    print(f"Answer: '{result['answer']}'")

Answer: 'an iconic ivory-white marble mausoleum'
Answer: 'Mughal emperor Shah Jahan'
Answer: 'Mughal architecture'
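
Besides `answer`, each pipeline result carries `score`, `start`, and `end`; `start` and `end` are character offsets into `context`. The sketch below uses them to show an answer together with its surrounding text. The helper `show_in_context` and the `demo_*` values are made up for illustration, and the result dictionary is built by hand so that no model has to run:

```python
def show_in_context(context, result, window=25):
    """Return the answer wrapped in [[ ]] with up to `window` characters on each side."""
    start, end = result["start"], result["end"]
    left = context[max(0, start - window):start]
    right = context[end:end + window]
    return f"{left}[[{context[start:end]}]]{right}"


# Hand-built stand-in for one pipeline result (no model is run here).
demo_context = "The Taj Mahal is an iconic ivory-white marble mausoleum located in Agra, India."
demo_answer = "an iconic ivory-white marble mausoleum"
demo_start = demo_context.find(demo_answer)
demo_result = {"answer": demo_answer, "start": demo_start, "end": demo_start + len(demo_answer)}

print(show_in_context(demo_context, demo_result))
```

With a real result, call `show_in_context(context, result)` inside the loop to check where in the passage the model found its answer.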


Let's pick a particular pre-trained model. To load different models with the same PyTorch code, we can use the [AutoModelForQuestionAnswering](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForQuestionAnswering) class. The cell below loads __Longformer__; to try __BERT (Bidirectional Encoder Representations from Transformers)__ instead, switch `model_name` to the commented-out BERT checkpoint.

In [None]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

#model_name           = "bert-large-uncased-whole-word-masking-finetuned-squad"  # BERT
model_name            = "mrm8488/longformer-base-4096-finetuned-squadv2"         # Longformer

tokenizer             = AutoTokenizer.from_pretrained(model_name)
model                 = AutoModelForQuestionAnswering.from_pretrained(model_name)

context = r"""The Taj Mahal is an iconic ivory-white marble mausoleum located in Agra, India. ...
           It was commissioned in 1631 by the Mughal emperor Shah Jahan to serve as the tomb for his beloved wife Mumtaz Mahal. ...
           The mausoleum is the centerpiece of a large complex that includes a mosque, guest house, and formal gardens surrounded by walls. ...
           The Taj Mahal is considered a masterpiece of Mughal architecture and is one of the most famous buildings in the world. ..."""

question_sentences = ["What is Taj Mahal?", \
                      "Who commissioned Taj Mahal?", \
                      "What Taj Mahal is famous for?"]



for question in question_sentences:
  inputs              = tokenizer(question, context, add_special_tokens=True, return_tensors="pt")
  input_ids           = inputs["input_ids"].tolist()[0]
  #text_tokens        = tokenizer.convert_ids_to_tokens(input_ids)

  with torch.no_grad():                                      # inference only, so no gradients are needed
    output            = model(**inputs)
  answer_start_scores = output['start_logits']
  answer_end_scores   = output['end_logits']

  answer_start        = torch.argmax(answer_start_scores)    # index of the most likely first token of the answer
  answer_end          = torch.argmax(answer_end_scores) + 1  # index of the most likely last token, +1 so the slice below includes it
  answer              = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
  print(f"Question: {question}")
  print(f"Answer: {answer}")


Some weights of the model checkpoint at mrm8488/longformer-base-4096-finetuned-squadv2 were not used when initializing LongformerForQuestionAnswering: ['longformer.pooler.dense.bias', 'longformer.pooler.dense.weight']
- This IS expected if you are initializing LongformerForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LongformerForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Question: What is Taj Mahal?
Answer:  an iconic ivory-white marble mausoleum
Question: Who commissioned Taj Mahal?
Answer:  Mughal emperor Shah Jahan
Question: What Taj Mahal is famous for?
Answer:  masterpiece of Mughal architecture
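
Taking the two argmaxes independently, as the loop above does, can produce an end position that comes before the start position. A common remedy is to score every span with start <= end, up to a maximum length, and keep the best one. The following is a minimal sketch in plain Python on made-up logits; `best_span` and `max_answer_len` are illustrative names, not part of `transformers`:

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) maximising start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len. `end` is inclusive."""
    best = None
    for s, s_score in enumerate(start_logits):
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


# Made-up logits for a 6-token input: the independent argmaxes disagree.
start_logits = [0.1, 0.2, 0.3, 0.1, 4.0, 0.2]
end_logits = [0.1, 3.5, 0.2, 0.1, 0.3, 3.0]

naive = (start_logits.index(max(start_logits)), end_logits.index(max(end_logits)))
print("independent argmax:", naive)
print("best valid span:   ", best_span(start_logits, end_logits)[:2])
```

With the model above, the inputs would be `answer_start_scores[0].tolist()` and `answer_end_scores[0].tolist()`, and the answer tokens would be `input_ids[start:end + 1]`. This sketch does not exclude the question tokens from the search, which a more careful version would do.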


BERT is just one of the many pretrained LLMs for question answering. [Hugging Face hosts a repository of these pretrained models](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForQuestionAnswering). There are several others:


*   [LongFormer](https://huggingface.co/mrm8488/longformer-base-4096-finetuned-squadv2)
*   [DistilBERT](https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad)
*   [RoBERTa](https://huggingface.co/deepset/roberta-base-squad2)
*   [ALBERT](https://huggingface.co/twmkn9/albert-base-v2-squad2)
*   [XLM RoBERTa](https://huggingface.co/TunahanGokcimen/Question-Answering-xlm-roberta-base)
*   [MobileBERT](https://huggingface.co/csarron/mobilebert-uncased-squad-v2)
*   [XLNet](https://huggingface.co/xlnet/xlnet-base-cased)
*   [T5](https://huggingface.co/sjrhuschlee/flan-t5-base-squad2)
*   [GPT2](https://huggingface.co/openai-community/gpt2)


There are also a few more recent, much larger models (more than 8-10 GB), such as BLOOM and Llama. You could try those later if you want.
*   [BLOOM](https://huggingface.co/bigscience/bloom)
*   [Llama2](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat)

# __Group Activity #3__
> 1. Try another pretrained model __RoBERTa, LongFormer, ALBERT, XLM RoBERTa, MobileBERT, T5, GPT2__ and redo the Question-Answering task.
> 2. Try out your pre-trained model by asking it some additional questions based on the provided information.
> 3. Now, expand the context by adding a few more sentences and check if the model can still answer your questions accurately.
> 4. Find the size (in megabytes) of each pretrained model to gain insight into its scale and magnitude.
> 5. Examine and document the variations in responses generated by distinct pre-trained models.
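
For step 4, one possible way to estimate a model's in-memory size is to add up the bytes held by its parameter tensors. This is a sketch only: `model_size_mb` is an illustrative helper, and `FakeTensor` is a hypothetical stand-in that lets the example run without downloading any weights. It relies only on the `numel()` and `element_size()` methods that PyTorch tensors provide.

```python
def model_size_mb(parameters):
    """Sum the bytes held by each tensor and convert to megabytes (1 MB = 1024**2 bytes)."""
    total_bytes = sum(p.numel() * p.element_size() for p in parameters)
    return total_bytes / 1024**2


class FakeTensor:
    """Hypothetical stand-in exposing the two tensor methods the helper needs."""

    def __init__(self, n_elements, bytes_per_element=4):  # 4 bytes = float32
        self._n = n_elements
        self._b = bytes_per_element

    def numel(self):
        return self._n

    def element_size(self):
        return self._b


# Two fake float32 "weight matrices" of 512 x 512 elements each.
print(model_size_mb([FakeTensor(512 * 512), FakeTensor(512 * 512)]))
```

For a loaded model, call `model_size_mb(model.parameters())`. The figure counts parameters only, so it may differ somewhat from the size of the downloaded files.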


In [None]:
# your code here
# ...





Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Question: What is Taj Mahal?
Answer: an iconic ivory - white marble mausoleum
Question: Who commissioned Taj Mahal?
Answer: shah jahan
Question: What Taj Mahal is famous for?
Answer: mughal architecture and is one of the most famous buildings in the world


So far, we have only used pre-trained models, which worked just fine. If you want a model tailored to your own question-answering dataset, you can fine-tune a transformer model on that data.

Curious students can explore fine-tuning a question answering model further in this notebook from Hugging Face:
> [Question Answering model fine-tuning](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb)
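
Fine-tuning needs question/context pairs in which each answer is marked by its character offset in the context. Below is a sketch of one record in the SQuAD-style layout (`id`, `title`, `context`, `question`, `answers`), which to my knowledge is the layout of the `squad` dataset on the Hugging Face Hub. The helper `make_record` and the record contents are made up for illustration.

```python
def make_record(record_id, title, context, question, answer_text):
    """Build one SQuAD-style record, locating the answer span inside the context."""
    answer_start = context.find(answer_text)
    if answer_start == -1:
        raise ValueError(f"answer {answer_text!r} does not occur in the context")
    return {
        "id": record_id,
        "title": title,
        "context": context,
        "question": question,
        "answers": {"text": [answer_text], "answer_start": [answer_start]},
    }


record = make_record(
    "taj-0001",  # hypothetical identifier
    "Taj Mahal",
    "It was commissioned in 1631 by the Mughal emperor Shah Jahan.",
    "Who commissioned Taj Mahal?",
    "Shah Jahan",
)
print(record["answers"])
```

Checking that the answer text really occurs in the context, as the helper does, catches mislabelled examples before they reach training.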