In this tutorial, we'll use a pretrained transformer model to perform question answering (QA). The model will be given a context (a passage of text) and a question, and it will try to find the most relevant answer within the context.

# Import the necessary libraries:

In [7]:
from transformers import pipeline


# Create a question-answering pipeline:
We'll initialize a pipeline for question-answering using a pre-trained model.

In [2]:
question_answerer = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


# Provide context and ask a question:
You’ll need a context paragraph (the passage in which the model should search for answers) and a specific question.

In [3]:
context = """
Transformers are a type of machine learning model introduced in 2017. They use self-attention mechanisms to
process input data. Since their introduction, they have achieved state-of-the-art performance in various natural
language processing tasks like machine translation, text summarization, and question answering. The transformer
architecture led to the creation of models like BERT, GPT, and others.
"""

question = "What tasks are transformers used for?"


# Get the answer:
Use the question_answerer() function to get the answer based on the context and question.

**Parameters:**

*   context: The passage where the model will search for the answer.
*   question: The question the model will try to answer based on the context.

In [4]:
answer = question_answerer(question=question, context=context)
print("Answer:", answer['answer'])

Answer: machine translation, text summarization, and question answering



# Fine-Tuning a Pretrained Model for Question Answering

In this section, we'll demonstrate how to fine-tune a pretrained model like `distilbert-base-cased-distilled-squad` for question-answering tasks on a custom dataset.

Fine-tuning the model on your own data helps it adapt to domain-specific language and questions.


In [2]:
! pip install datasets # if necessary

Collecting datasets
  Downloading datasets-3.0.0-py3-none-any.whl.metadata (19 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading datasets-3.0.0-py3-none-any.whl (474 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m474.3/474.3 kB[0m [31m16.5 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading dill-0.3.8-py3-none-any.whl (116 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m10.5 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (39.9 MB)
[2K

In [3]:
# Step 1: Load a custom question-answering dataset
# We'll use the SQuAD dataset here as an example. Replace this with your own dataset if needed.
from datasets import load_dataset

dataset = load_dataset('squad', split='train[:1%]')

# Step 2: Load the pretrained model and tokenizer
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased')

# Step 3: Preprocess the dataset for fine-tuning
def preprocess_data(examples):
    inputs = tokenizer(
        examples['context'],
        examples['question'],
        truncation=True,
        padding='max_length',
        max_length=384
    )

    # Tokenize the answer separately
    answers = examples['answers']
    start_positions = []
    end_positions = []

    for i, answer in enumerate(answers):
        answer_text = answer['text'][0]
        start_char = answer['answer_start'][0]

        # Tokenize the context
        context = examples['context'][i]
        tokenized_context = tokenizer(context, truncation=True, padding='max_length', max_length=384)

        # Tokenize the answer
        tokenized_answer = tokenizer(answer_text, truncation=True, padding='max_length', max_length=384)

        # Find the token indices corresponding to the start and end of the answer
        start_pos = None
        end_pos = None

        # Loop through the tokenized context and look for the answer tokens
        for idx in range(len(tokenized_context['input_ids']) - len(tokenized_answer['input_ids']) + 1):
            if tokenized_context['input_ids'][idx:idx + len(tokenized_answer['input_ids'])] == tokenized_answer['input_ids']:
                start_pos = idx
                end_pos = idx + len(tokenized_answer['input_ids']) - 1
                break

        if start_pos is None or end_pos is None:
            start_pos = 0
            end_pos = 0

        start_positions.append(start_pos)
        end_positions.append(end_pos)

    inputs.update({
        'start_positions': start_positions,
        'end_positions': end_positions
    })

    return inputs



train_data = dataset.map(preprocess_data, batched=True, remove_columns=['id', 'title', 'context', 'question', 'answers'])

# Step 4: Define training arguments and initialize Trainer
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
evaluation_strategy="no",  # Disable evaluation
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    num_train_epochs=1
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data
)

# Step 5: Fine-tune the model
trainer.train()


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/7.62k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/14.5M [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/1.82M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/87599 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/10570 [00:00<?, ? examples/s]

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/436k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/465 [00:00<?, ?B/s]



model.safetensors:   0%|          | 0.00/263M [00:00<?, ?B/s]

Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/876 [00:00<?, ? examples/s]

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pai

Step,Training Loss


TrainOutput(global_step=110, training_loss=0.590543642911044, metrics={'train_runtime': 35.3129, 'train_samples_per_second': 24.807, 'train_steps_per_second': 3.115, 'total_flos': 85839086721024.0, 'train_loss': 0.590543642911044, 'epoch': 1.0})

# Now check model's performance after fine-tuning

In [8]:
model_path = '/content/results/checkpoint-110'
model_fine_tuned = DistilBertForQuestionAnswering.from_pretrained(model_path)
tokenizer_fine_tuned = DistilBertTokenizer.from_pretrained('distilbert-base-cased')

question_answerer_fine_tuned = pipeline("question-answering", model=model_fine_tuned, tokenizer=tokenizer_fine_tuned)


new_context = "This is a new context for testing the fine-tuned model."
new_question = "What is the main topic of this context?"
answer_fine_tuned = question_answerer_fine_tuned(question=new_question, context=new_context)
print("Fine-tuned model answer:", answer_fine_tuned['answer'])


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
  self.pid = os.fork()


Fine-tuned model answer: model.


# Exercise Questions:

*   Change the context to a different passage (such as one from a news article or a technical document). Does the model still provide accurate answers?
*   Modify the script to allow the user to ask multiple questions about the same context without restarting the program. What changes did you make to achieve this?
*   Experiment with different pretrained QA models from Hugging Face (e.g., bert-large-uncased-whole-word-masking-finetuned-squad). How does the performance change? Which model gives the best results in your experiments?





In [10]:
context = """
In recent years, artificial intelligence (AI) has made significant advancements in various fields. Machine learning, a subset of AI, has been widely used in image recognition, natural language processing, and recommendation systems. Deep learning, a more advanced form of machine learning, has achieved remarkable results in tasks such as image classification, speech recognition, and natural language understanding.
"""

question = "What are some applications of deep learning in AI?"

answer = question_answerer_fine_tuned(question=question, context=context)
print("Answer:", answer['answer'])

Answer: understanding.


In [14]:
context = input("Please enter the context: ")

while True:
    question = input("Please enter your question (or type 'exit' to stop): ")
    if question.lower() == 'exit':
        break
    answer = question_answerer(context, question)
    print(f"\nAnswer: {answer}")

Please enter the context: In recent years, artificial intelligence (AI) has made significant advancements in various fields. Machine learning, a subset of AI, has been widely used in image recognition, natural language processing, and recommendation systems. Deep learning, a more advanced form of machine learning, has achieved remarkable results in tasks such as image classification, speech recognition, and natural language understanding.
Please enter your question (or type 'exit' to stop): exit
