# Part 1 - Question Answering

For the first part, use the Hugging Face question-answering pipeline and feed it the five 300-word sections from the book of your choice that you analyzed in Project 1.

Select the sections so that they cover: an introduction of the protagonist(s), the antagonist, the crime and crime scene, any significant evidence, and the resolution of the crime or a narrative that presents the case against the perpetrator.

Implement a simple prompt interface that takes in your question, runs it against the model, and returns the answer. You don't need to do anything special here: a simple console I/O interface without any complicated error handling is enough. It is up to you how you load the context into the model (pre-loaded into your program, on demand, etc.).

The questions you should ask are about the identity and characteristics of the protagonist, antagonist/perpetrator, the nature and the setting of the crime or crime scene, the evidence, and the case against the perpetrator.

Document the questions, ask the questions, and document the specificity and accuracy of the results.

Part 1.2 - Use two different Hugging Face QA models: start with the default question-answering pipeline, then try other models of your choice and discuss the differences in the results.

https://huggingface.co/docs/transformers/main_classes/pipelines

https://huggingface.co/docs/transformers/v4.35.0/en/main_classes/pipelines#transformers.QuestionAnsweringPipeline
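
To discuss differences between models, it can help to line up their answers to the same question side by side. The sketch below is one possible way to do that, assuming each model's result is a dict with `answer` and `score` keys, as the question-answering pipeline returns for a single answer. The helper names `rank_answers` and `format_table` are hypothetical, and the example values are the ones printed in the test run recorded in this notebook.

```python
from typing import Dict, List, Tuple


def rank_answers(results: Dict[str, dict]) -> List[Tuple[str, str, float]]:
    """Return (model, answer, score) rows sorted by descending score."""
    rows = [
        (model, res.get("answer", "").strip(), round(float(res.get("score", 0.0)), 3))
        for model, res in results.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def format_table(rows: List[Tuple[str, str, float]]) -> str:
    """Render the rows as a plain-text table for the notebook output."""
    width = max((len(model) for model, _, _ in rows), default=5)
    lines = [f"{'model':<{width}}  score  answer"]
    for model, answer, score in rows:
        lines.append(f"{model:<{width}}  {score:<5}  {answer}")
    return "\n".join(lines)


# Values as printed by the test run on the Sherlock Holmes passage.
example = {
    "distilbert-base-uncased-distilled-squad": {"answer": "Sherlock Holmes", "score": 0.99},
    "deepset/roberta-base-squad2": {"answer": "Sherlock Holmes", "score": 0.114},
}
print(format_table(rank_answers(example)))
```

Scores from different models are not necessarily calibrated against each other, so the ranking is best read as a prompt for discussion rather than as a verdict on which model is more accurate.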


In [23]:
from typing import Optional
from transformers import pipeline
import torch
from pprint import pprint


## Prompt Interface

Implement a simple prompt interface that takes in your question, runs it against the model, and returns the answer.

You don't need to do anything special about this, just a simple console I/O interface without any complicated error handling.

It is up to you how you want to upload the context to the model (pre-loaded into your program, on-demand, etc.).
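
A console interface of this kind could look like the minimal sketch below. It is only an illustration: `prompt_loop` is a hypothetical helper, and `answer_fn` stands in for whatever function queries the model with a pre-loaded context and returns a dict with `answer` and `score` keys. The `read` and `write` parameters default to `input` and `print` and exist only so the loop can be exercised without a terminal.

```python
from typing import Callable, Dict, List


def prompt_loop(
    answer_fn: Callable[[str], Dict],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> List[Dict]:
    """Read questions until an empty line, answering each one in turn."""
    history = []
    while True:
        question = read("Q: ").strip()
        if not question:  # an empty line ends the session
            break
        res = answer_fn(question)
        write(f"A: {res['answer']} (score: {round(res['score'], 3)})")
        # Keep a record so the questions and answers can be documented later.
        history.append({"question": question, **res})
    return history
```

In the notebook, `answer_fn` could be a small lambda that calls the question-answering pipeline with a fixed context. The returned history doubles as a log of the questions asked and the answers received.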


In [24]:
def run(
    question: str,
    context: str,
    model: Optional[str] = None,
    **kwargs,
):
    print("=" * 100)
    print(f"model: {model}")
    for k, v in kwargs.items():
        print(f"{k}: {v}")
    print("~" * 80)

    # Construct Pipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = pipeline(
        "question-answering",
        model=model,
        # model="deepset/roberta-base-squad2",
        device=device,
    )

    # Run Pipeline

    question = question.strip()
    context = context.strip()

    print(f"C: {context}")
    print(f"Q: {question}")
    # display(Markdown(f"**Q:** {question}"))
    # display(Markdown(f"**C:** {context}"))

    res = pipe(
        question=question,
        context=context,
        **kwargs,
    )
    # pprint(res)

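    # Fall back to a placeholder with zero confidence if nothing usable comes back.
    answer, score = "idk", 0.0

    # With top_k > 1 the pipeline returns a list of candidates; keep the best one.
    best = max(res, key=lambda r: r["score"]) if isinstance(res, list) and res else res
    if isinstance(best, dict):
        answer = best.get("answer", "idk")
        score = best.get("score", 0.0)

    answer = answer.strip()
    score = round(score, 3)

    print(f"A: {answer} (score: {score})")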
    # display(Markdown(f"**A:** {answer} (score: {score})"))

    return res


def run_models(question: str, context: str, models: list[str], **kwargs):
    for model in models:
        run(question, context, model=model, **kwargs)


### Testing Prompt Interface

Models: https://huggingface.co/models?pipeline_tag=question-answering&sort=trending
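
When the same models are queried many times, rebuilding the pipeline for every question reloads each model repeatedly. One possible way around that is to cache the loaded pipeline per model name. The sketch below uses `functools.lru_cache` for this; `cached_loader` is a hypothetical helper, and the factory argument is whatever callable actually builds the pipeline, so the caching logic itself does not depend on `transformers`.

```python
from functools import lru_cache
from typing import Any, Callable, Optional


def cached_loader(factory: Callable[[Optional[str]], Any], maxsize: int = 4):
    """Wrap a pipeline factory so each model name is loaded at most once."""

    @lru_cache(maxsize=maxsize)
    def load(model: Optional[str] = None):
        return factory(model)

    return load


# Hypothetical wiring inside the notebook (requires transformers):
# get_pipe = cached_loader(lambda name: pipeline("question-answering", model=name))
# pipe = get_pipe("deepset/roberta-base-squad2")
```

The `maxsize` bound is an assumption chosen to keep only a handful of models in memory at once; larger models may call for a smaller value.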


In [25]:
test_ctx = """
Sherlock Holmes took his bottle from the corner of the mantel-piece and
his hypodermic syringe from its neat morocco case. With his long,
white, nervous fingers he adjusted the delicate needle, and rolled back
his left shirt-cuff. For some little time his eyes rested thoughtfully
upon the sinewy forearm and wrist all dotted and scarred with
innumerable puncture-marks. Finally he thrust the sharp point home,
pressed down the tiny piston, and sank back into the velvet-lined
arm-chair with a long sigh of satisfaction.
"""

test_q = """
What's my name?
"""


In [26]:
# Default model: "distilbert-base-cased-distilled-squad" (see the pipeline notice in the output)
_ = run(test_q, test_ctx)


No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


model: None
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: Sherlock Holmes took his bottle from the corner of the mantel-piece and
his hypodermic syringe from its neat morocco case. With his long,
white, nervous fingers he adjusted the delicate needle, and rolled back
his left shirt-cuff. For some little time his eyes rested thoughtfully
upon the sinewy forearm and wrist all dotted and scarred with
innumerable puncture-marks. Finally he thrust the sharp point home,
pressed down the tiny piston, and sank back into the velvet-lined
arm-chair with a long sigh of satisfaction.
Q: What's my name?
A: Sherlock Holmes (score: 0.761)


In [27]:
_ = run(test_q, test_ctx, model="deepset/roberta-base-squad2")


model: deepset/roberta-base-squad2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: Sherlock Holmes took his bottle from the corner of the mantel-piece and
his hypodermic syringe from its neat morocco case. With his long,
white, nervous fingers he adjusted the delicate needle, and rolled back
his left shirt-cuff. For some little time his eyes rested thoughtfully
upon the sinewy forearm and wrist all dotted and scarred with
innumerable puncture-marks. Finally he thrust the sharp point home,
pressed down the tiny piston, and sank back into the velvet-lined
arm-chair with a long sigh of satisfaction.
Q: What's my name?
A: Sherlock Holmes (score: 0.114)


In [28]:
run_models(
    test_q,
    test_ctx,
    models=[
        "distilbert-base-uncased-distilled-squad",
        "deepset/roberta-base-squad2",
    ],
)


model: distilbert-base-uncased-distilled-squad
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: Sherlock Holmes took his bottle from the corner of the mantel-piece and
his hypodermic syringe from its neat morocco case. With his long,
white, nervous fingers he adjusted the delicate needle, and rolled back
his left shirt-cuff. For some little time his eyes rested thoughtfully
upon the sinewy forearm and wrist all dotted and scarred with
innumerable puncture-marks. Finally he thrust the sharp point home,
pressed down the tiny piston, and sank back into the velvet-lined
arm-chair with a long sigh of satisfaction.
Q: What's my name?
A: Sherlock Holmes (score: 0.99)
model: deepset/roberta-base-squad2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: Sherlock Holmes took his bottle from the corner of the mantel-piece and
his hypodermic syringe from its neat morocco case. With his long,
white, nervous fingers he adjusted the 

---

## Experiments & Results

For the first part, use the Hugging Face question-answering pipeline and feed it the five 300-word sections from the book of your choice that you analyzed in Project 1.

Select the sections so that they cover: **an introduction of the protagonist(s), the antagonist, the crime and crime scene, any significant evidence, and the resolution of the crime or a narrative that presents the case against the perpetrator.**

The questions you should ask are about the identity and characteristics of the protagonist, antagonist/perpetrator, the nature and the setting of the crime or crime scene, the evidence, and the case against the perpetrator.

Document the questions, ask the questions, and document the specificity and accuracy of the results.
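
One way to put a number on accuracy is to write down the answer you expect for each question and compare the model's answer against it. The sketch below borrows the exact-match and token-level F1 measures commonly used for SQuAD-style evaluation; the assignment does not prescribe them, and the reference answers are assumed to be written by hand. The function names are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when both answers are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred, ref = normalize(prediction).split(), normalize(reference).split()
    if not pred or not ref:
        # Two empty answers agree; one empty answer scores zero.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Exact match speaks to accuracy, while a high F1 with a failed exact match often points to a specificity problem: the model found the right span but returned more or fewer words than the reference.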


In [29]:
# TODO: Try out a good selection of models and keep some interesting ones
models = [
    "distilbert-base-uncased-distilled-squad",
    "deepset/roberta-base-squad2",
]


---

### Section 1


In [30]:
s1 = """
My name Jeff.
"""


In [31]:
s1q1 = """
What's my name?
"""

run_models(s1q1, s1, models=models)


model: distilbert-base-uncased-distilled-squad
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: My name Jeff.
Q: What's my name?
A: Jeff (score: 0.987)
model: deepset/roberta-base-squad2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: My name Jeff.
Q: What's my name?
A: Jeff (score: 0.641)


In [32]:
# TODO: Add more cells and ask more questions on Section 1.


#### TODO: document the specificity and accuracy of the results


---

### Section 2


In [33]:
s2 = """
My name Jeff.
"""


In [34]:
s2q1 = """
What's my name?
"""

run_models(s2q1, s2, models=models)


model: distilbert-base-uncased-distilled-squad
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: My name Jeff.
Q: What's my name?
A: Jeff (score: 0.987)
model: deepset/roberta-base-squad2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: My name Jeff.
Q: What's my name?
A: Jeff (score: 0.641)


In [35]:
# TODO: Add more cells and ask more questions on Section 2.


#### TODO: document the specificity and accuracy of the results


---

### Section 3


In [36]:
s3 = """
My name Jeff.
"""


In [37]:
s3q1 = """
What's my name?
"""


run_models(s3q1, s3, models=models)


model: distilbert-base-uncased-distilled-squad
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: My name Jeff.
Q: What's my name?
A: Jeff (score: 0.987)
model: deepset/roberta-base-squad2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: My name Jeff.
Q: What's my name?
A: Jeff (score: 0.641)


In [38]:
# TODO: Add more cells and ask more questions on Section 3.


#### TODO: document the specificity and accuracy of the results


---

### Section 4


In [39]:
s4 = """
My name Jeff.
"""


In [40]:
s4q1 = """
What's my name?
"""


run_models(s4q1, s4, models=models)


model: distilbert-base-uncased-distilled-squad
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: My name Jeff.
Q: What's my name?
A: Jeff (score: 0.987)
model: deepset/roberta-base-squad2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: My name Jeff.
Q: What's my name?
A: Jeff (score: 0.641)


In [41]:
# TODO: Add more cells and ask more questions on Section 4.


#### TODO: document the specificity and accuracy of the results


---

### Section 5


In [42]:
s5 = """
My name Jeff.
"""


In [43]:
s5q1 = """
What's my name?
"""


run_models(s5q1, s5, models=models)


model: distilbert-base-uncased-distilled-squad
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: My name Jeff.
Q: What's my name?
A: Jeff (score: 0.987)
model: deepset/roberta-base-squad2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C: My name Jeff.
Q: What's my name?
A: Jeff (score: 0.641)


In [44]:
# TODO: Add more cells and ask more questions on Section 5.


#### TODO: document the specificity and accuracy of the results
