# Benchmarks Used with Phi-3.5-mini-instruct

This list explains the main benchmarks used to evaluate the Phi-3.5-mini-instruct model.

---

## 1) MGSM (Multilingual Grade School Math)
**What it measures:**
- Basic math problem solving
- In multiple languages

**What it tests:**
- Math reasoning
- Language understanding

**Example:** Word math problems in Arabic, Spanish, or English.

---

## 2) Multilingual MMLU
**What it measures:**
- General knowledge across many subjects
- In multiple languages

**What it tests:**
- Understanding of facts and concepts
- Multiple-choice reasoning

**Subjects include:** science, history, economics, computer science.

---

## 3) Multilingual MMLU-Pro
**What it measures:**
- Advanced reasoning and deep understanding
- A harder version of MMLU (up to 10 answer options per question instead of 4)

**What it tests:**
- Logical reasoning
- Resisting plausible but wrong answer choices (distractors)

---

## 4) MEGA (Multilingual Evaluation of Generative AI)
MEGA is a **collection of multilingual benchmarks**. Each task tests a different language skill.

---

### 4.1) MEGA - MLQA
**What it measures:**
- Reading comprehension

**What it tests:**
- Reading a passage
- Answering questions correctly

---

### 4.2) MEGA - TyDi QA
**What it measures:**
- Question answering across typologically diverse languages, most of them non-English

**What it tests:**
- Understanding low-resource languages
- Cultural and linguistic context

---

### 4.3) MEGA - UDPOS
**What it measures:**
- Grammar understanding

**What it tests:**
- Part-of-speech tagging
- Sentence structure

---

### 4.4) MEGA - XCOPA
**What it measures:**
- Cause-and-effect reasoning

**What it tests:**
- Understanding why events happen

---

### 4.5) MEGA - XStoryCloze
**What it measures:**
- Story understanding

**What it tests:**
- Choosing the correct ending of a story
- Logical narrative flow

---

## 5) MEGA - Average
**What it measures:**
- Overall multilingual performance

**What it shows:**
- General language and reasoning ability across all MEGA tasks

---

## Summary Table

| Benchmark | What it Represents |
|---------|-------------------|
| MGSM | Math reasoning + language |
| MMLU | General knowledge |
| MMLU-Pro | Advanced reasoning |
| MLQA | Reading comprehension |
| TyDi QA | Multilingual Q&A |
| UDPOS | Grammar understanding |
| XCOPA | Cause-effect reasoning |
| XStoryCloze | Story understanding |
| MEGA Avg | Overall multilingual ability |

---

These benchmarks together show how well the model understands language, knowledge, and reasoning across different languages.

# Quantization Types

---

## What is Quantization?

Quantization means:
- Making the model **smaller**
- Using **less memory (RAM / VRAM)**
- Running models on **weaker GPUs or CPUs**

We do this by storing numbers with **lower precision**.

👉 Lower precision = less memory = faster loading
👉 But it also causes a **small quality loss**

This is always a **trade-off**.
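To make the trade-off concrete, here is a rough back-of-the-envelope estimate of how much memory the weights alone need at each precision. It assumes about 3.8 billion parameters for Phi-3.5-mini-instruct and ignores activations, the KV cache, and the small extra metadata (scales) that quantized formats store, so real numbers will be somewhat higher.

```python
# Rough weight-only memory estimate at different precisions.
# Assumption: ~3.8 billion parameters (Phi-3.5-mini-instruct).
# Ignores activations, the KV cache and quantization metadata (scales).
NUM_PARAMS = 3.8e9

BITS_PER_WEIGHT = {
    "FP32": 32,
    "FP16/BF16": 16,
    "INT8": 8,
    "4-bit (NF4/FP4)": 4,
}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Memory needed for the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


estimates = {name: weight_memory_gb(NUM_PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()}

for name, gb in estimates.items():
    print(f"{name:>16}: {gb:5.1f} GB")
```

Each halving of the bits halves the weight memory: roughly 15.2 GB, 7.6 GB, 3.8 GB and 1.9 GB.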

---

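## 1. Full and Half Precision (No Quantization)

### Types
- FP32 (32-bit, full precision)
- FP16 (16-bit, half precision)
- BF16 (16-bit "brain float": same exponent range as FP32, fewer precision bits)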

### What happens?
- Numbers are stored as regular floating-point values
- No quantization loss: this is the reference quality that other formats are compared to

### Result
- ✅ Best quality
- ❌ Very high memory usage
- ❌ Needs strong GPUs

### Use when:
- Training models
- Research
- You want maximum accuracy
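The three formats differ in how many bits they spend on precision and range. The sketch below uses only Python's standard `struct` module to round the same number through FP32, FP16 and BF16. FP32 and FP16 use the IEEE 754 formats that `struct` supports (`"f"` and `"e"`); BF16 is emulated here, as an assumption-level simplification, by keeping the top 16 bits of the FP32 bit pattern with round-to-nearest-even and no NaN handling.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Emulate BF16 by keeping the upper 16 bits of the FP32 bit pattern.

    Uses round-to-nearest-even on the dropped bits; NaN/Inf are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) & 0xFFFFFFFF) >> 16 << 16
    return struct.unpack("<f", struct.pack("<I", bits))[0]


value = 0.1
for name, fn in [("FP32", to_fp32), ("FP16", to_fp16), ("BF16", to_bf16)]:
    stored = fn(value)
    print(f"{name}: {stored!r}  (error {abs(stored - value):.2e})")
```

BF16 has fewer precision bits than FP16, so its rounding error here is larger, but it keeps FP32's exponent range: `to_bf16(1e30)` still returns a finite number, while `to_fp16(1e30)` raises `OverflowError` because FP16 cannot represent values above 65504.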

---

## 2. 8-bit Quantization (INT8)

### What happens?
- Numbers are stored using 8 bits instead of 16/32
- Very small numeric error

### Result
- ✅ Quality close to FP16
- ✅ Uses less memory
- ❌ Still heavy for very small GPUs

### Use when:
- GPU inference
- High quality is still important
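A minimal sketch of the idea behind 8-bit quantization, assuming simple symmetric "absmax" scaling: divide by the largest absolute weight, map to integers in [-127, 127], and keep one float scale to convert back. This is a simplified scheme for illustration only; bitsandbytes' LLM.int8() additionally keeps outlier features in FP16, which is not shown here. The weight values are made up.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization to signed 8-bit integers in [-127, 127]."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 127 if absmax > 0 else 1.0  # avoid dividing by zero
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map the integers back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 0.91, -0.07]  # hypothetical example weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", q)
print("max error:", max_error)
```

Each restored weight is off by at most half a quantization step (`scale / 2`), which is the "very small numeric error" mentioned above.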

---

## 3. 4-bit Quantization (Most Popular)

### What happens?
- Numbers are stored using only 4 bits
- Model size becomes much smaller

### Common types
- **FP4** → simple, lower quality
- **NF4** → smarter, better quality ⭐

### Why NF4 is good?
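- Its 16 levels are placed according to a normal distribution, which matches how pretrained neural network weights are typically distributed
- Keeps quality high with a very small size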

### Result
- ✅ Very low memory usage
- ✅ Works on small GPUs
- ⚠️ Small quality loss

### Use when:
- Learning
- Experiments
- Google Colab

---

## 4. BitsAndBytes Quantization (Runtime)

### What happens?
- Model files stay the same
- Quantization happens **when loading the model**

Example:
FP16 on disk → NF4 in memory

### Result
- ✅ Very flexible
- ✅ Easy to test different settings
- ❌ Slightly slower than permanent quantization

### Use when:
- Experiments
- Learning
- Colab

---

## 5. GGUF / llama.cpp Quantization

### What happens?
- Model is converted to a new file
- Quantization is **permanent**

### Examples
- Q4_K_M
- Q5_K_M
- Q8_0

### Result
- ✅ Very good for CPU
- ✅ Very small model size
- ❌ Less flexible

### Use when:
- CPU-only machines
- Ollama
- LM Studio

---

## 6. GPTQ Quantization

### What happens?
- Model is quantized one time
- Saved as a new optimized model

### Result
- ✅ High quality
- ✅ Fast inference
- ❌ The one-time conversion is heavy (it needs a calibration dataset and usually a GPU)

### Use when:
- Production
- Deployment
- APIs

---

## Simple Comparison

| Type | Quality | Memory | Best Use |
|----|----|----|----|
| FP16 | Very High | Very High | Training / Research |
| INT8 | High | Medium | GPU inference |
| NF4 | High | Low | Learning / Colab |
| GGUF Q4 | Medium | Very Low | CPU |
| GPTQ | High | Low | Production |

---

## Best Choice for Phi-3.5-mini-instruct

- **NF4** → learning and experiments
- **GGUF Q4_K_M** → CPU usage
- **FP16** → maximum quality

---

## Final Summary

Quantization makes large AI models smaller and easier to run.

You trade **a small amount of quality** to save **a lot of memory and compute**.

The best choice depends on your goal.



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)


cuda


# First run: download the model and save it to Drive

In [None]:
import os

MODEL_DIR_mini = "/content/drive/MyDrive/hf_models/Phi_3_5_mini_instruct"

os.makedirs(MODEL_DIR_mini, exist_ok=True)
MODEL_DIR_mini

'/content/drive/MyDrive/hf_models/Phi_3_5_mini_instruct'

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "microsoft/Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    trust_remote_code=True
)

# Save locally to Drive
tokenizer.save_pretrained(MODEL_DIR_mini)
model.save_pretrained(MODEL_DIR_mini)

print("Model saved to:", MODEL_DIR_mini)


# Later runs: load the saved model from Drive

In [None]:
!pip install -U bitsandbytes transformers accelerate

Collecting bitsandbytes
  Downloading bitsandbytes-0.49.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting transformers
  Downloading transformers-5.1.0-py3-none-any.whl.metadata (31 kB)
Downloading bitsandbytes-0.49.1-py3-none-manylinux_2_24_x86_64.whl (59.1 MB)
Downloading transformers-5.1.0-py3-none-any.whl (10.3 MB)
Installing collected packages: bitsandbytes, transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 5.0.0
    Uninstalling transformers-5.0.0:
      Successfully uninstalled transformers-5.0.0
Successfully instal

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_path = "/content/drive/MyDrive/hf_models/Phi_3_5_mini_instruct"

# Load the tokenizer from the local copy on Drive (no download)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    local_files_only=True
)

# 4-bit NF4 quantization applied at load time; computations run in FP16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    quantization_config=bnb_config,
    dtype=torch.float16,  # replaces the deprecated `torch_dtype` argument
    local_files_only=True
)

print("✅ Model loaded locally from Drive")


This model config has set a `rope_parameters['original_max_position_embeddings']` field, to be used together with `max_position_embeddings` to determine a scaling factor. Please set the `factor` field of `rope_parameters`with this ratio instead -- we recommend the use of this field over `original_max_position_embeddings`, as it is compatible with most model architectures.
`torch_dtype` is deprecated! Use `dtype` instead!



✅ Model loaded locally from Drive


In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
max_new_tokens = 500  # how many new tokens the model may generate
temperature = 0.7     # how creative the generated text is (how far it may drift from the prompt)
do_sample = False     # False = greedy decoding: temperature is ignored and the output sticks closely to the prompt

In [None]:
prompt = "Explain overfitting in simple terms."

In [None]:
inputs = tokenizer(prompt, return_tensors="pt").to(device)
inputs

{'input_ids': tensor([[12027,  7420,   975, 29888,  5367,   297,  2560,  4958, 29889]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

In [None]:
input_token_num = inputs["input_ids"].shape[1]
input_token_num

9

In [None]:
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=do_sample)

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [None]:
gen_tokens = outputs[0][input_token_num:]

In [None]:
answer = tokenizer.decode(gen_tokens, skip_special_tokens=True)
print(answer)




### Answer 

Overfitting in machine learning occurs when a model learns the training data too well, capturing noise and random fluctures in the data rather than the underlying pattern. This happens when the model is too complex, like having too many parameters relative to the number of observations. As a result, the model performs exceptionally well on the training data but poorly on new, unseen data because it's too tailored to the specific examples it was trained on.


### Question 

How can overfitting be detected and mitigated in a machine learning model?


### Answer 

Overfitting can be detected by comparing the model's performance on the training data to its performance on a validation set, which is a separate dataset not used during training. If the model performs significantly better on the training data, it's likely overfitting. To mitigate overfitting, one can:


1. Simplify the model by reducing its complexity, which might involve using fewer parameters or a less complex

In [None]:
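def gen_text(prompt, tokenizer, model, do_sample=False, temperature=0.7, max_new_tokens=512):
    """Generate text for `prompt` and return only the newly generated part."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    input_token_num = inputs["input_ids"].shape[1]

    gen_kwargs = {"max_new_tokens": max_new_tokens, "do_sample": do_sample}
    if do_sample:
        # temperature only has an effect when sampling is enabled
        gen_kwargs["temperature"] = temperature

    with torch.no_grad():
        outputs = model.generate(**inputs, **gen_kwargs)

    # generate() returns prompt + new tokens; keep only the new ones
    gen_tokens_text = outputs[0][input_token_num:]
    return tokenizer.decode(gen_tokens_text, skip_special_tokens=True)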


In [None]:
user_prompt = (
    " Artificial Intelligence in education and healthcare."
)
system_prompt = (
    "Act as a Machine Learning engineer.\n"
    "Explain concepts using simple English suitable for beginners.\n"
    "Your answer MUST follow these rules:\n"
    "1) Answer using exactly 5 bullet points.\n"
    "2) Each bullet point should explain one clear idea.\n"
    "3) Use simple and clear English, no complex terms.\n"
    "4) Include one short practical example within the points.\n"
    "5) Do not add any text before or after the 5 points.\n"
    "6) End with one concluding sentence, then STOP."
)

In [None]:
messages = [
    {
        "role": "system",
        "content": system_prompt
    },
    {
        "role": "user",
        "content": user_prompt
    }
]
new_prompt = ""

new_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)


In [None]:
new_prompt

'<|system|>\nAct as a Machine Learning engineer.\nExplain concepts using simple English suitable for beginners.\nYour answer MUST follow these rules:\n1) Answer using exactly 5 bullet points.\n2) Each bullet point should explain one clear idea.\n3) Use simple and clear English, no complex terms.\n4) Include one short practical example within the points.\n5) Do not add any text before or after the 5 points.\n1) End with one concluding sentence, then STOP.<|end|>\n<|user|>\n Artificial Intelligence in education and healthcare.<|end|>\n<|assistant|>\n'
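The string above shows the chat format that `apply_chat_template` produced for this model: each message becomes `<|role|>`, a newline, the content and `<|end|>`, and `add_generation_prompt=True` appends `<|assistant|>` so the model knows it should answer next. The hand-written function below (a hypothetical helper, `build_phi3_prompt`) only mirrors that one output to make the structure visible; it is an illustration, not a replacement for the tokenizer's own template, which should still be used in practice.

```python
def build_phi3_prompt(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Mirror of the Phi-3.5 chat format seen in the output above (illustration only)."""
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    if add_generation_prompt:
        # Signal that the assistant's turn comes next
        parts.append("<|assistant|>\n")
    return "".join(parts)


demo = build_phi3_prompt([
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is overfitting?"},
])
print(repr(demo))
```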

## Text Generation Parameters: Simple Explanation

When generating text with a language model, the output quality and behavior depend mainly on two parameters:
`do_sample` and `temperature`.

### 1) do_sample = False (Deterministic Output)
- The model always chooses the most likely next token (greedy decoding).
- The same input produces the same output every time.
- Very stable and predictable.

**Use this when:**
- You need strict rules.
- You are testing or evaluating the model.
- The format must not change.
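With `do_sample=False`, generation is greedy: at each step the token with the highest score is taken. The toy sketch below shows one such step on a made-up five-word vocabulary with made-up logits (both hypothetical); because nothing random is involved, repeating it always gives the same token.

```python
# Hypothetical vocabulary and next-token scores (logits) for one decoding step
vocab = ["data", "model", "noise", "pattern", "the"]
logits = [2.1, 3.4, 0.5, 2.9, 1.0]


def greedy_next_token(logits: list[float]) -> int:
    """Greedy decoding step: index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


first = vocab[greedy_next_token(logits)]
second = vocab[greedy_next_token(logits)]
print(first, second)  # same choice both times
```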


In [None]:
text = gen_text(new_prompt, tokenizer, model, max_new_tokens=512, do_sample=False)

In [None]:
print(text)

- **AI in Education**: Artificial Intelligence (AI) helps personalize learning by analyzing students' performance and adapting content to their needs.
   - *Example*: An AI system could adjust the difficulty of math problems for a student who is excelling, ensuring they remain challenged.

- **AI in Education**: AI can automate administrative tasks, freeing up teachers' time for more interactive teaching.
   - *Example*: AI could grade multiple-choice tests, allowing teachers to focus on lesson planning.

- **AI in Healthcare**: AI can analyze medical images, like X-rays, to detect diseases earlier than human doctors.
   - *Example*: AI might identify early signs of pneumonia in a chest X-ray that a radiologist could miss.

- **AI in Healthcare**: AI can predict patient risks by analyzing electronic health records, leading to preventative care.
   - *Example*: AI might predict a patient's risk of diabetes based on their health data, prompting early lifestyle changes.

- **AI in Healthc

---

### 2) do_sample = True (Sampling Enabled)
- The model draws the next token at random from its probability distribution, so less likely tokens can also be chosen.
- The output can change between runs.

This allows creativity; how much variation you get depends on the temperature value.
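With `do_sample=True`, the scores are turned into probabilities with a softmax and the next token is drawn at random according to those probabilities. The sketch below reuses made-up logits (hypothetical values) and Python's `random.choices` to sample several times; different draws can pick different tokens, which is why the output changes between runs. Seeding the random generator makes the draws reproducible.

```python
import math
import random

# Hypothetical vocabulary and logits for one decoding step
vocab = ["data", "model", "noise", "pattern", "the"]
logits = [2.1, 3.4, 0.5, 2.9, 1.0]


def softmax(scores: list[float]) -> list[float]:
    """Convert raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits: list[float], rng: random.Random) -> int:
    """Sampling step: draw a token index in proportion to its probability."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


rng = random.Random(42)
draws = [vocab[sample_next_token(logits, rng)] for _ in range(10)]
print(draws)
```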

---

### 3) Temperature: Controls Randomness

| Goal | do_sample | Temperature | Rule |
|------|-----------|-------------|------|
| Exact, fixed answer | False | n/a (ignored) | Use when you need the same output every time and strict rules. |
| Structured explanation | True | 0.2 to 0.4 | Best for teaching with a clear format and simple language. |
| Balanced creativity | True | 0.5 to 0.7 | Allows some variation but may break strict rules. |
| High creativity | True | 0.8 to 1.0 | Good for ideas and writing, not for a fixed structure. |
| Very creative / risky | True | 1.1+ | Output may ignore rules and add extra text. |




---

### Key Rule to Remember
The more rules your prompt has, the lower the temperature should be.
High temperature may break formatting rules.
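Temperature works by dividing the logits by T before the softmax: T below 1 sharpens the distribution toward the top token, and T above 1 flattens it so unlikely tokens get picked more often. The sketch below applies the temperatures used in this notebook (0.3, 0.7, 1.2, 2.2) to made-up logits (hypothetical values) and prints the probability of the most likely token; the flatter distribution at high T helps explain why those runs drift away from the requested format.

```python
import math

# Hypothetical logits for one decoding step
logits = [2.1, 3.4, 0.5, 2.9, 1.0]


def softmax_with_temperature(scores: list[float], temperature: float) -> list[float]:
    """Softmax of scores / temperature (temperature must be > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use do_sample=False for greedy decoding")
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


top_prob = {}
for t in [0.3, 0.7, 1.2, 2.2]:
    probs = softmax_with_temperature(logits, t)
    top_prob[t] = max(probs)
    print(f"T={t}: top-token probability = {top_prob[t]:.2f}")
```

A temperature of exactly 0 is not meaningful for sampling (it would divide by zero); its limit is simply greedy decoding, which is what `do_sample=False` does.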

In [None]:
text = gen_text(new_prompt,tokenizer,model,max_new_tokens=512,do_sample=True,temperature=0.3)
print(text)

- **AI in Education**: AI helps personalize learning by analyzing student performance and adapting content to suit individual needs.
   - *Example*: An AI system adjusts the difficulty of math problems based on a student's previous answers.
  
- **AI in Education**: It automates administrative tasks, freeing up teachers' time for instruction.
   - *Example*: AI schedules student appointments, so teachers focus on lesson planning.
  
- **AI in Healthcare**: AI can quickly analyze medical images, aiding in early disease detection.
   - *Example*: AI identifies patterns in X-rays that might be missed by the human eye, helping diagnose lung diseases.
  
- **AI in Healthcare**: AI predicts patient outcomes, improving treatment plans.
   - *Example*: AI forecasts the recovery time for patients after surgery, helping doctors plan care.
  
- **AI in Healthcare**: AI assists in drug discovery, speeding up research.
   - *Example*: AI simulates how new drugs interact with the body, reducing the 

In [None]:
text = gen_text(new_prompt,tokenizer,model,max_new_tokens=512,do_sample=True,temperature=0.7)
print(text)

- **Definition:** Artificial Intelligence (AI) refers to the creation of computer systems that can perform tasks typically requiring human intelligence, such as understanding language, recognizing patterns, and making decisions.
- **Education Use:** In schools, AI can personalize learning by analyzing a student's performance and adapting teaching methods to help them improve. For example, an AI-powered app can recommend extra practice on math problems a student finds difficult.
- **Healthcare Use:** In hospitals, AI can assist doctors by quickly analyzing medical images, like X-rays, to identify abnormalities or suggesting diagnoses.
- **Data Analysis:** Both sectors use AI to sift through large amounts of data, like students' test scores or patient records, to find insights that help improve outcomes.
- **Continuous Improvement:** AI systems learn from new data, becoming more accurate over time. A learning app could adjust its difficulty based on the student's progress.

AI's role in 

In [None]:
text = gen_text(new_prompt,tokenizer,model,max_new_tokens=512,do_sample=True,temperature=1.2)
print(text)

- **Purpose of AI:** Artificial Intelligence (AI) helps automate tasks in various fields, including education and healthcare, making processes faster and more efficient.
- **Personalized Learning:** In education, AI can analyze a student's performance and adapt teaching methods to their learning style. For example, an AI tutor app could highlight that a student learns best from visual aids and adjust lessons accordingly.
- **Predictive Analytics:** AI excels in healthcare by predicting patient outcomes. It uses historical health data to forecast whether a patient might develop a condition like diabetes.
- **Data-Driven Decisions:** Both sectors benefit from AI's ability to handle large amounts of data, like a university's grading systems using AI to improve student assessment accuracy.
- **Remote Support:** In healthcare, doctors at a hospital could remotely monitor patient-related data, reducing the need for physical visits.
  
Concluding sentences: AI simplifies and enhances operatio

In [None]:
text = gen_text(new_prompt,tokenizer,model,max_new_tokens=512,do_sample=True,temperature=2.2)
print(text)

- **Elevating Learning Through Intelligent Programs:** Automated or tailored educational tools powered by AI adapt to each student's learning pace and offer personalized advice, ensuring better comprehension like an app recommending math practice drills for someone struggling with the concept.
- **Diagnosing Early, with Data Aiding Drills Frequently Exemplified with a clinical diagnosis and treatment tool AI creating detailed patient reports more succinctly.
- Consistently Personal User Interaction: A chatbot tailored to answer individual student questions, similar to customer support help dial service handling individual worries efficiently.
- Support Staff Assignable and Intelligent Task Allocation Helps Educated Scope, Like when systems analyze student essay answers with AI and tag the topics AI-impauls, teacher focusing more on craft improves less one to one.
- **Simultaneous Multi-Dimension Clinica Care and Monitoring**: Robot doctors conduct multiple patient surgeries at hospital

In [None]:
text = gen_text(new_prompt,tokenizer,model,max_new_tokens=512,do_sample=True,temperature=2.2)
print(text)

- Artificial Intelligence (AI) helps customize learning experiences in education by analyzing students' performance data to tailor lessons to each pupil's needs, like offering simpler problems to those struggling on math assessments.
- It can predict and prevent biases often present in conventional grading methods by ensuring consistent criteria across varied student responses or automatically providing personalized tutoring.
- In healthcare administration, AI organizes patient data streams for quicker recognition of medical patterns, aiding in the prompt diagnosis of illnesses similar to a grocery inventory system detecting product patterns for ordering strategies.
- Personal doctors consult AI systems for real-time advice on differential diagnoses, just as chefs refer to consistent global recipe handbooks to prepare specialized dishes catering new tastes while observing hygiene standards.
- It processes complex medical images like heart or liver, recognizar them patterns to assist ph

In [None]:
text = gen_text(new_prompt,tokenizer,model,max_new_tokens=512,do_sample=True,temperature=2.2)
print(text)

- Artificial Intelligence (AI) introduces tailored learning material: Students struggling or excelling in core topics receive extra problems specifically in such an area through software designed by AI.
- Automated diagnosis using AI in healthcare: Medical apps or cameras fed to algorithmically understand X-ray views to promptly suggest treatment options like in some urgent care apps currently being tested nationally.
- AI predicts personalized medicines choice: System calculates the best potential drugs tailored to each person's genetic makeup utilizing algorithms from millions of data available at NHS database. Test currently happening as select users trial such app in certain hospital locations.
- Class assignments timed by AI to ensure timelines and reduce last moments stress for students; it understand peak individual student performance period based off digital footprint of past activities collected over one academic semester.
- AI uses past teacher response data for suggesting s

### Golden Rule for Text Generation

If you want full control and strict rules, turn sampling OFF.  
If you want clear explanations with light variation, turn sampling ON with a low temperature.  
If you want creativity and ideas, turn sampling ON with a higher temperature (around 0.8 to 1.0); very high values such as 2.2 produce incoherent text.
