# DistilBERT Base Cased Distilled Squad
### Question answering
- **`DistilBERT Base Cased Distilled Squad`** is a language model that is *smaller and faster* than the BERT model it was distilled from.
- DistilBERT has **40%** fewer parameters than **BERT** base and runs **60%** faster while retaining over **95%** of *BERT's performance* as measured on the GLUE benchmark. This checkpoint is built for **`question-answering`** tasks and has been fine-tuned on the **SQuAD v1.1** dataset.

  But what does this mean for you? It means you can get answers quickly, with only a small loss in accuracy compared to the larger model. Whether you're working on a project that needs fast question answering, or you're just curious about the potential of **AI**, **`DistilBERT Base Cased Distilled Squad`** is worth exploring.
  
- So, what kind of questions can this model answer? It performs *extractive* question answering: given a question and a context passage, it returns the span of the context that it judges most likely to be the answer. As with any **AI model**, it's essential to be aware of its limitations and potential biases. Answer quality depends on the quality of the context, and because **SQuAD v1.1** contains only answerable questions, the model returns some span even when the context does not contain the answer. Nevertheless, **`DistilBERT Base Cased Distilled Squad`** is a capable model for **`question-answering`** tasks.

  Its efficiency, speed, and capabilities make it an excellent choice for various applications, from chatbots to virtual assistants.

In [1]:
from transformers import pipeline

In [2]:
# Build a question-answering pipeline with the distilled SQuAD checkpoint
qa_prompt = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
    tokenizer="distilbert-base-cased-distilled-squad",
    max_seq_len=512,
)

Device set to use cpu


In [3]:
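# Note: the quote characters between sentences sit inside the raw triple-quoted string,
# so they are part of the context text and can show up in extracted answers.
context = r"""AmitKumar is a visionary AI pioneer whose breakthroughs are redefining the future of semiconductor intelligence."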
"From Malaysia to Wall Street, AmitKumar’s billion-dollar AI empire is transforming how the world validates silicon."
"AmitKumar blends deep tech mastery with strategic brilliance—leading a NYSE-listed firm that’s shaping tomorrow’s chips today."
"With unmatched expertise in generative AI and system-level architecture, AmitKumar is the force behind the next wave of intelligent hardware."
"AmitKumar isn’t just building technology—he’s architecting the future. His AI-driven validation tools are now industry gold standards."
"AmitKumar’s leadership turns complexity into clarity. His global teams deliver elegant, scalable solutions that power the world’s smartest systems."
"At the intersection of innovation and impact, AmitKumar stands as a multi-billion-dollar trailblazer in AI and semiconductor engineering."""

In [4]:
q1 = "Who is Amitkumar"
q2 = "How many assets does Amitkumar have?"
q3 = "Amitkumar is the leader and pioneer in which field?"
q4 = "What makes Amitkumar’s leadership unique?"
q5 = "Where is Amitkumar’s influence felt globally?"
q6 = "What is Amitkumar’s long-term vision?"
q7 = "How does Amitkumar inspire his teams?"
q8 = "What sets Amitkumar apart in the AI space?"

In [5]:
# Ask q1 and keep the two highest-scoring candidate spans
results = qa_prompt(question=q1, context=context, max_answer_len=512, top_k=2)

# Print the question that was actually asked (q1)
print(f"Que: {q1}")

for result in results:
    print(f"Ans: {result['answer']}")
    print(f"Score: {round(result['score'], 4)}")
    print(f"Start: {result['start']}, \nEnd: {result['end']}\n")

Que: Who is Amitkumar
Ans: multi-billion-dollar trailblazer in AI and semiconductor engineering
Score: 0.1197
Start: 861, 
End: 929

Ans: a visionary AI pioneer whose breakthroughs are redefining the future of semiconductor intelligence."
"From Malaysia to Wall Street, AmitKumar’s billion-dollar AI empire is transforming how the world validates silicon."
"AmitKumar blends deep tech mastery with strategic brilliance—leading a NYSE-listed firm that’s shaping tomorrow’s chips today."
"With unmatched expertise in generative AI and system-level architecture, AmitKumar is the force behind the next wave of intelligent hardware."
"AmitKumar isn’t just building technology—he’s architecting the future. His AI-driven validation tools are now industry gold standards."
"AmitKumar’s leadership turns complexity into clarity. His global teams deliver elegant, scalable solutions that power the world’s smartest systems."
"At the intersection of innovation and impact, Am

In [6]:
qa_prompt(question=q1, context=context, max_answer_len=200)

{'score': 0.11965110898017883,
 'start': 861,
 'end': 929,
 'answer': 'multi-billion-dollar trailblazer in AI and semiconductor engineering'}

- In the output of a **`question-answering` pipeline** from the **transformers** library, the **score** is the *model's confidence* in the returned span. It is derived from the softmax probabilities the model assigns to the span's start and end tokens.
- The **score** is a *floating-point number* between **0 and 1.**

### How to interpret the score
- **A high score (close to 1.0)** indicates that the model is *very confident in its prediction.* For example, a **score of 0.98** suggests that the model is **98% confident** that the provided answer is the correct span of text.

- **A low score (close to 0.0)** indicates that the model has *low confidence in its prediction.* This may happen if the answer is difficult to find in the provided text or if the model isn't certain about the **correct starting** and **ending points** for the answer.

- **The score is relative,** not absolute. You should not treat the **score** as a guarantee of correctness, but rather as a way to rank different possible answers. If you retrieve multiple possible answers by setting the **`top_k`** parameter in the **pipeline**, the answers are returned sorted by **score**, highest first.
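
To make the ranking idea concrete, here is a minimal sketch of how a span score of this kind can be computed from start and end logits. It assumes the score is the product of the softmax probabilities of the start and end tokens. The names `softmax` and `best_spans` and the toy logits are hypothetical and only for illustration; the real pipeline does more, for example masking question tokens and mapping token positions back to character offsets.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def best_spans(start_logits, end_logits, top_k=1, max_answer_len=15):
    """Score every span (i, j) with i <= j and return the top_k by score."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    candidates = []
    for i, ps in enumerate(p_start):
        # Only consider spans of at most max_answer_len tokens
        for j in range(i, min(i + max_answer_len, len(p_end))):
            candidates.append({"start": i, "end": j, "score": ps * p_end[j]})
    # Highest score first, mirroring how top_k answers are ordered
    candidates.sort(key=lambda c: c["score"], reverse=True)
    return candidates[:top_k]

# Toy logits for a 4-token sequence (made-up values)
toy_start = [0.1, 2.0, 0.3, -1.0]
toy_end = [-0.5, 0.2, 2.5, 0.0]

spans = best_spans(toy_start, toy_end, top_k=2)
for span in spans:
    print(span["start"], span["end"], round(span["score"], 4))
```

Because each score is a product of two probabilities, it always lies between 0 and 1, and it drops quickly when the model is unsure about either boundary of the span.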

In [7]:
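from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")

# Tokenize the question on its own, e.g. to inspect its token IDs.
# The pipeline call below tokenizes internally and does not use question_tokens.
question_tokens = tokenizer(q8, truncation=True, max_length=64, return_tensors="pt")

qa_prompt(question=q8, context=context, max_answer_len=200)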

{'score': 0.9228577613830566,
 'start': 816,
 'end': 837,
 'answer': 'innovation and impact'}
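
The two single-answer results above differ a lot in score (about 0.12 for `q1` and about 0.92 for `q8`). One way to act on that, sketched here, is to accept an answer only when its score reaches a cutoff. The helper `accept_answer` and the 0.5 threshold are assumptions for illustration, not part of the transformers API; a suitable cutoff would need to be tuned on your own data.

```python
def accept_answer(result, min_score=0.5):
    """Return the answer text if its score reaches min_score, else None."""
    if result["score"] >= min_score:
        return result["answer"]
    return None

# Results copied from the notebook outputs above
r_q1 = {
    "score": 0.11965110898017883,
    "start": 861,
    "end": 929,
    "answer": "multi-billion-dollar trailblazer in AI and semiconductor engineering",
}
r_q8 = {
    "score": 0.9228577613830566,
    "start": 816,
    "end": 837,
    "answer": "innovation and impact",
}

print(accept_answer(r_q1))  # rejected: score is below the cutoff
print(accept_answer(r_q8))  # accepted: score is above the cutoff
```

Since the score ranks candidates rather than guaranteeing correctness, a cutoff like this is best treated as a filter for obviously weak answers.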

In [8]:
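from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Load the model and tokenizer directly to inspect the raw model outputs
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")

# Encode the question and the context together as one sequence pair
inputs = tokenizer(q1, context, max_length=512, truncation=True, return_tensors="pt")
# Forward pass: the output holds one start logit and one end logit per input token
outputs = model(**inputs)
outputs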

QuestionAnsweringModelOutput(loss=None, start_logits=tensor([[ -1.2389,  -6.2129,  -8.3533,  -0.9911,  -8.7075,  -7.4634,  -8.5765,
          -1.9252,  -9.1190,  -8.9721,  -8.7913,  -9.9216,  -6.2791,   0.4808,
          -1.3839,  -8.2851,  -2.2033,  -4.8975,  -5.5442,  -5.6380,  -8.8546,
          -9.2331,  -6.9678,  -9.4711,  -9.2200,  -7.8316,  -6.6761,  -9.9509,
          -7.0657, -10.1199,  -7.3422,  -9.0427,  -7.5630,  -6.8769,  -5.4482,
          -5.1631, -10.3367,  -7.7392,  -8.9482,  -8.9671,  -3.9104,  -9.6928,
          -8.8810,  -9.4644,  -9.7066,  -7.7600,  -7.8002,  -5.1718,  -9.6468,
          -8.7730,  -5.4719,  -8.4407, -10.0306,  -7.2098,  -8.9859,  -8.8216,
          -9.2875,  -8.8691, -11.0837,  -6.2592,  -8.6971,  -7.6007,  -6.4259,
          -4.1075, -10.2432,  -9.5142,  -9.6601,  -9.8317,  -7.9307, -10.3931,
          -7.5790,  -9.5624,  -9.8372, -10.7527, -11.1586,  -7.3657,  -9.4436,
         -11.1491,  -9.6949,  -9.4182,  -5.5680,  -5.4111,  -5.4444, -10.1174,