
## 1. Encoder Processing
### - How the model encodes the question:
The input question is: **"What are the symptoms of diabetes?"**
1. **Tokenization**: The question is first broken into tokens (words/subwords).
2. **Embedding**: Each token is converted into a dense vector using learned embeddings.
3. **Positional Encoding**: Since Transformers have no recurrence, positional encodings are added to capture word order.
4. **Self-Attention Layers**: The embeddings are passed through self-attention and feedforward layers to capture contextual relationships.
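
Step 3 can be illustrated with a minimal sketch of the fixed sinusoidal encoding from the original Transformer. This is an assumption made for illustration: the steps above do not say which scheme is used, and T5 (used in the code further down) relies on learned relative position biases instead. The function name and the toy sizes are hypothetical.

```python
import math
from typing import List

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

# Toy sizes (assumed): 9 positions, 8-dimensional vectors.
pe = sinusoidal_positional_encoding(seq_len=9, d_model=8)

# In an encoder, each row would be added element-wise to the token embedding
# at the same position.
print([round(v, 3) for v in pe[1]])
```

Because every position gets a distinct vector, two occurrences of the same token at different positions end up with different inputs to the self-attention layers.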

### - Result of Attention Scores in encoder stage:
Attention scores determine how much each word attends to every other word. For example, "symptoms" may attend more to "diabetes" than to "what".

### - How Self-Attention Captures Word Relationships:
Self-attention calculates interactions between all words, allowing the model to understand dependencies regardless of distance. For instance, it learns that "symptoms" relates to "diabetes" even though they are not adjacent.
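
As a rough illustration, the sketch below computes scaled dot-product attention weights for the query "symptoms" over a few tokens. The 2-d vectors are hypothetical and hand-picked so that "symptoms" and "diabetes" point in similar directions; a trained model learns its own vectors and applies separate query and key projections, which are omitted here.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    d_k = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]
    return softmax(scores)

# Hypothetical toy vectors (not taken from any trained model).
vectors = {
    "what": [1.0, 0.0],
    "symptoms": [0.0, 1.0],
    "of": [0.5, 0.1],
    "diabetes": [0.2, 0.9],
}
tokens = list(vectors)
weights = attention_weights(vectors["symptoms"], [vectors[t] for t in tokens])

for token, w in zip(tokens, weights):
    print(f"symptoms -> {token}: {w:.2f}")
```

With these vectors, "diabetes" receives more weight than "what" or "of" even though "of" sits between "symptoms" and "diabetes" in the sentence, since the weights depend on vector similarity rather than distance.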

### - Significance:
Self-attention enables the model to:
- Understand the context of each word
- Disambiguate meanings based on surrounding words
- Improve representation for downstream tasks like question answering

---

## 2. Context Processing
### - Passage from Medical Paper:
> *"Diabetes is a chronic condition characterized by high blood sugar levels. Common symptoms include increased thirst, frequent urination, extreme fatigue, and blurred vision."*

### - Encoder-Decoder Attention:
In this stage:
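1. The encoder has already processed the input. With T5, the question and the passage are concatenated into one input string (`question: <question> context: <passage>`), so the encoder sees both.
2. The decoder attends to the encoder outputs, which is how it accesses the passage.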
3. Encoder-decoder attention lets the decoder focus on parts of the input relevant to generating the answer.
4. In this case, attention would be highest around phrases like *"increased thirst"*, *"frequent urination"*, etc., since they match the concept of "symptoms".
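
A minimal sketch of encoder-decoder attention for a single decoder step follows. The decoder state acts as the query, and the encoder outputs serve as both keys and values. All vectors are hypothetical 2-d values chosen so that the second dimension loosely stands for "symptom-like" content; learned projections are left out.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(decoder_state, encoder_outputs):
    """One decoder query attends over all encoder outputs (keys and values)."""
    d_k = len(decoder_state)
    scores = [
        sum(q * k for q, k in zip(decoder_state, enc)) / math.sqrt(d_k)
        for enc in encoder_outputs
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder outputs.
    context = [
        sum(w * enc[j] for w, enc in zip(weights, encoder_outputs))
        for j in range(d_k)
    ]
    return weights, context

# Hypothetical encoder outputs for a few passage tokens.
encoder_tokens = ["chronic", "condition", "thirst", "urination"]
encoder_outputs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
# Hypothetical decoder state while generating a symptom word.
decoder_state = [0.0, 2.0]

weights, context = cross_attention(decoder_state, encoder_outputs)
for token, w in zip(encoder_tokens, weights):
    print(f"{token}: {w:.2f}")
```

The resulting context vector is dominated by the symptom tokens, which is the mechanism that lets the decoder draw on the relevant part of the passage.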

---

## 3. Decoder Prediction
The decoder generates the answer step-by-step using softmax probabilities.

### Softmax Probabilities Table:
| Step | Candidate Tokens (Probability)                         |
|------|--------------------------------------------------------|
| 1    | "Diabetes" (0.05), "Increased" (0.75), "Common" (0.2)  |
| 2    | "hunger" (0.1), "thirst" (0.8), "sugar" (0.1)          |
| 3    | "frequent" (0.3), "urination" (0.6), "pain" (0.1)      |
| 4    | "extreme" (0.2), "fatigue" (0.7), "mild" (0.1)         |

### - Final Prediction:
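**Step-by-step decoding (greedy: take the highest-probability token at each step):**
1. Highest probability → **"Increased"** (0.75)
2. Highest probability → **"thirst"** (0.8) → "Increased thirst"
3. Highest probability → **"urination"** (0.6)
   - "frequent" scored only 0.3 at this step, so greedy decoding picks **"urination"** and drops "frequent", even though "frequent urination" is the phrase used in the passage. Decoding strategies that keep several candidate sequences, such as beam search (used in the code below with `num_beams=4`), may produce the full phrase.
4. Highest probability → **"fatigue"** (0.7)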

### **Final Answer:**
**"Increased thirst, urination, fatigue"**
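
The greedy selection can be reproduced with a few lines of Python. This is only a sketch over the illustrative probabilities from the table: a real decoder recomputes each step's distribution from the tokens generated so far, whereas here the four distributions are fixed. The name `greedy_decode` is hypothetical.

```python
# Candidate tokens and probabilities copied from the table above.
steps = [
    {"Diabetes": 0.05, "Increased": 0.75, "Common": 0.2},
    {"hunger": 0.1, "thirst": 0.8, "sugar": 0.1},
    {"frequent": 0.3, "urination": 0.6, "pain": 0.1},
    {"extreme": 0.2, "fatigue": 0.7, "mild": 0.1},
]

def greedy_decode(step_distributions):
    """Pick the single most probable token at every step."""
    return [max(dist, key=dist.get) for dist in step_distributions]

answer_tokens = greedy_decode(steps)
print(answer_tokens)  # ['Increased', 'thirst', 'urination', 'fatigue']
```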



# This code answers the question

In [9]:
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_name = "t5-base"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

question = "What are the symptoms of diabetes?"
context = "Diabetes is a chronic condition characterized by high blood sugar levels. Common symptoms include increased thirst, frequent urination, extreme fatigue, and blurred vision."

input_text = f"question: {question} context: {context}"
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=512)

with torch.no_grad():
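    # Pass the attention mask along with the input IDs so any padding is ignored.
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=50,
        num_beams=4,
        early_stopping=True,
    )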

generated_answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Answer: ", generated_answer)

Generated Answer:  increased thirst, frequent urination, extreme fatigue, and blurred vision


# This code computes self-attention scores in a Transformer encoder for this sentence

In [10]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertTokenizer, BertModel

class EnhancedSelfAttention(nn.Module):
    def __init__(self, d_model: int = 768, num_heads: int = 8):
        super().__init__()
        self.d_model = d_model
        self.num_heads = num_heads
        self.d_k = d_model // num_heads

        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None):
        batch_size, seq_len, _ = x.size()

        Q = self.w_q(x).view(batch_size, seq_len, self.num_heads, self.d_k).transpose(1, 2)
        K = self.w_k(x).view(batch_size, seq_len, self.num_heads, self.d_k).transpose(1, 2)
        V = self.w_v(x).view(batch_size, seq_len, self.num_heads, self.d_k).transpose(1, 2)

        scores = torch.matmul(Q, K.transpose(-2, -1)) / np.sqrt(self.d_k)
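        if mask is not None:
            # Expand a (batch, seq_len) padding mask to (batch, 1, 1, seq_len) so it
            # broadcasts over heads and query positions for any batch size.
            if mask.dim() == 2:
                mask = mask[:, None, None, :]
            scores = scores.masked_fill(mask == 0, -1e9)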
        attn_weights = F.softmax(scores, dim=-1)
        output = torch.matmul(attn_weights, V)

        output = output.transpose(1, 2).contiguous().view(batch_size, seq_len, self.d_model)
        return self.w_o(output), attn_weights

def compute_self_attention(sentence: str):
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')

    inputs = tokenizer(sentence, return_tensors='pt', padding=True, truncation=True)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
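    with torch.no_grad():
        # Contextual hidden states from BERT's final layer (not raw input embeddings).
        embeddings = model(**inputs).last_hidden_state

    # This module is freshly initialized with random weights, so the printed scores
    # are not BERT's own learned attention and tend to come out nearly uniform.
    attention_module = EnhancedSelfAttention(d_model=768, num_heads=8)
    output, attn_weights = attention_module(embeddings, mask=inputs['attention_mask'])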

    # First batch item, head index 0 (printed as "Head 1").
    head0_attn = attn_weights[0, 0].detach().numpy()
    print("\nAttention Scores (Head 1):")
    header = "\t" + "\t".join([f"{token[:6]}" for token in tokens])
    print(header)
    for i, token in enumerate(tokens):
        row = "\t".join([f"{score:.2f}" for score in head0_attn[i]])
        print(f"{token[:6]}\t{row}")

    return attn_weights

if __name__ == "__main__":
    sentence = "What are the symptoms of diabetes?"
    compute_self_attention(sentence)


Attention Scores (Head 1):
	[CLS]	what	are	the	sympto	of	diabet	?	[SEP]
[CLS]	0.12	0.10	0.11	0.13	0.11	0.11	0.10	0.10	0.11
what	0.11	0.10	0.12	0.13	0.11	0.12	0.10	0.11	0.10
are	0.13	0.13	0.11	0.12	0.10	0.11	0.10	0.11	0.10
the	0.11	0.11	0.11	0.11	0.11	0.12	0.10	0.11	0.11
sympto	0.10	0.10	0.11	0.11	0.11	0.12	0.12	0.11	0.11
of	0.11	0.12	0.11	0.11	0.11	0.11	0.12	0.11	0.11
diabet	0.11	0.12	0.11	0.12	0.10	0.12	0.10	0.11	0.11
?	0.13	0.11	0.11	0.11	0.10	0.11	0.10	0.11	0.12
[SEP]	0.11	0.11	0.11	0.11	0.11	0.12	0.11	0.12	0.10
