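### Quantization
- Conversion from a higher-precision (more memory) numeric format to a lower-precision (less memory) format.

- Consider an LLM with 70 billion parameters, each stored as FP32 (full-precision, 32-bit floating point), i.e. 4 bytes per parameter.
- Converting the weights to `int8` (8-bit integers, 1 byte each) shrinks them about 4x, which can make it possible to run the model on a lower-spec PC.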
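The arithmetic behind this can be sketched as follows. It assumes that the weights dominate memory use and that 1 GB = 10^9 bytes; activations, the KV cache, and framework overhead are ignored. `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params, dtype):
    """Weight storage in gigabytes (1 GB = 10**9 bytes); ignores activations and overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

num_params = 70e9  # 70 billion parameters
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(num_params, dtype):.0f} GB")
```

For 70 billion parameters this gives 280 GB in FP32 versus 70 GB in `int8`.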

## Quantization vs Inference in Large Language Models (LLMs)

### **1. Quantization**
Quantization is a technique used to reduce the precision of the numerical representation of a model's parameters. Its goal is to decrease computational requirements, memory usage, and power consumption while maintaining acceptable accuracy.

- **Key Points**:
  - Converts floating-point (e.g., `float32`) to lower precision (e.g., `int8`).
  - Reduces memory footprint and can speed up computation on hardware with efficient low-precision kernels.
  - Types:
    - **Post-training quantization**: Applied after training.
    - **Quantization-aware training**: Incorporated during training to minimize accuracy loss.
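A minimal post-training quantization sketch in plain Python, assuming a single per-tensor scale chosen from the largest absolute weight ("absmax") and the symmetric integer range [-127, 127]. The function names are hypothetical; real toolkits operate on tensors and often use per-channel scales.

```python
def quantize_int8(weights):
    """Per-tensor symmetric (absmax) quantization of floats to int8 values."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # float size of one integer step
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Map integers back to approximate floats."""
    return [v * scale for v in q]

weights = [0.42, -1.37, 0.05, 2.10, -0.88]  # toy "trained" weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

No retraining is involved, which is what makes this "post-training"; the rounding error per weight is at most half a step (`scale / 2`).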

---

### **2. Inference**
Inference is the process of using a trained model to generate predictions or outputs based on input data.

- **Key Points**:
  - Involves deploying the model for real-world use cases like text generation or classification.
  - Usually optimized for low latency and high throughput.
  - Techniques include batching, pruning, and hardware-specific optimizations.

---

### **Comparison Table**

| **Aspect**     | **Quantization**                                              | **Inference**                  |
|----------------|---------------------------------------------------------------|--------------------------------|
| **Focus**      | Model optimization                                            | Generating predictions         |
| **Stage**      | Before deployment (post-training) or during training (quantization-aware) | Deployment and execution phase |
| **Goal**       | Improve efficiency                                            | Deliver accurate predictions   |
| **Challenges** | Accuracy trade-offs                                           | Real-time response, scaling    |

### **Relationship**
Quantization is often used as an optimization technique to improve the efficiency of inference, especially for deploying LLMs in resource-constrained environments or scenarios requiring low latency.
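To illustrate why quantization helps inference, here is a sketch of a quantized dot product, the core operation of a linear layer. It assumes symmetric absmax scales for both inputs; the helper names are hypothetical. Python integers are unbounded, so the int32 accumulation that real hardware performs is only described in the comments.

```python
def quantize(xs, scale):
    """Symmetric int8 quantization with a given scale."""
    return [max(-127, min(127, round(x / scale))) for x in xs]

def int8_dot(qa, qb, scale_a, scale_b):
    # Integer multiply-accumulate; on real hardware this accumulates in int32.
    acc = sum(a * b for a, b in zip(qa, qb))
    # A single float multiply rescales the integer result back to real units.
    return acc * scale_a * scale_b

x = [0.5, -1.2, 0.3, 0.9]  # activations
w = [0.8, 0.1, -0.6, 0.4]  # weights
sx = max(abs(v) for v in x) / 127
sw = max(abs(v) for v in w) / 127
approx = int8_dot(quantize(x, sx), quantize(w, sw), sx, sw)
exact = sum(a * b for a, b in zip(x, w))
print(exact, approx)
```

All the expensive work happens in integer arithmetic; floating point is needed only for the final rescale.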


## Calibration in Machine Learning

### **What is Calibration?**
Calibration refers to the process of adjusting a system to ensure its outputs (predictions or measurements) accurately reflect the intended values. In machine learning, it ensures that the predicted probabilities correspond to the true likelihood of an event.

---

### **Why is Calibration Needed?**
- Many machine learning models output probabilities that may not correspond to real-world outcomes.
- Example:
  - A model assigns a 70% probability to a set of events, but only 50% of those events actually occur.
  - This discrepancy can mislead decision-making processes.
- Calibration aligns predicted probabilities with actual outcomes.

---

### **Techniques for Calibration**
1. **Platt Scaling**:
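   - Fits a logistic (sigmoid) function that maps the model's raw scores to probabilities.
   - Has only two parameters, so it works well even when calibration data is limited, but it assumes the miscalibration is sigmoid-shaped.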

2. **Isotonic Regression**:
   - Fits a piecewise constant, non-decreasing function that maps scores to probabilities.
   - More flexible than Platt scaling, but needs more calibration data; prone to overfitting on small datasets.

3. **Temperature Scaling**:
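   - Divides the logits by a single scalar T > 0 (the temperature), learned on a held-out validation set, before applying the softmax; T > 1 softens overconfident probabilities.
   - Commonly used with deep neural networks; it does not change the predicted class, since dividing by a positive constant preserves the order of the logits.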

4. **Histogram Binning**:
   - Groups predictions into bins and adjusts probabilities based on observed frequencies in each bin.
   - Simple and interpretable.
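Temperature scaling from the list above can be sketched in plain Python. To stay short, this version picks T by a grid search over validation negative log-likelihood; in practice T is usually fit with a gradient-based optimizer. The function names and the toy validation data are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by T (T > 1 softens, T < 1 sharpens)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels at a given temperature."""
    return -sum(
        math.log(softmax(row, temperature)[y]) for row, y in zip(logit_rows, labels)
    ) / len(labels)

def fit_temperature(logit_rows, labels):
    """Pick the T in [0.5, 5.0] (step 0.05) that minimizes validation NLL."""
    candidates = [0.5 + 0.05 * i for i in range(91)]
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))

# Toy validation set: large (overconfident) logits, but only 4 of 5 predictions are right.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
val_labels = [0, 0, 0, 1, 0]
T = fit_temperature(val_logits, val_labels)
print(T, softmax([4.0, 0.0]), softmax([4.0, 0.0], T))
```

At T = 1 the model reports about 98% confidence; the fitted T (close to 2.9) pulls it down to roughly 80%, matching the observed accuracy.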

---

### **Calibration Metrics**
- **Expected Calibration Error (ECE)**:
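  Groups predictions into bins by predicted probability and takes the average, weighted by bin size, of the absolute gap between the mean predicted probability and the observed frequency of the event in each bin. Lower is better.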

- **Brier Score**:
  Mean squared difference between predicted probabilities and actual 0/1 outcomes (lower is better); it combines calibration and sharpness in a single number.
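Both metrics can be sketched for the binary case in plain Python, assuming equal-width bins for ECE (the multiclass version used for deep networks bins the top-class confidence instead). The function names are hypothetical. The usage reuses the earlier example of a model that says 70% when the event happens only 50% of the time.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted P(event) and actual 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Binary ECE with equal-width bins over the predicted probability."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, o))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_pred = sum(p for p, _ in b) / len(b)  # average predicted probability
        freq = sum(o for _, o in b) / len(b)       # observed event frequency
        ece += len(b) / n * abs(mean_pred - freq)  # weight by bin size
    return ece

# Model predicts 70% ten times; the event happens 5 times.
probs = [0.7] * 10
outcomes = [1] * 5 + [0] * 5
print(expected_calibration_error(probs, outcomes), brier_score(probs, outcomes))
```

Here ECE is 0.2 (the 70% vs 50% gap) and the Brier score is 0.29.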

---

### **Calibration in Quantization**
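In quantization, "calibration" means something different: running a representative dataset through the model to estimate the range of values (especially activations), which is then used to choose the quantization scales and zero points so that reduced precision (e.g., `int8`) preserves prediction quality:
- **Static Quantization**: Scales and zero points for activations are computed once, offline, from the calibration dataset and then kept fixed.
- **Dynamic Quantization**: Weights are quantized ahead of time, while activation scales are computed at runtime from each input.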
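The difference between the two modes can be sketched with the simplest calibration rule, taking the largest absolute value seen (absmax) and using symmetric `int8`. The helper names and the toy calibration batches are hypothetical; real toolkits also offer other range estimators.

```python
def absmax_scale(values, qmax=127):
    """Symmetric int8 scale that maps the largest |value| to qmax."""
    m = max(abs(v) for v in values)
    return m / qmax if m > 0 else 1.0

def quantize(values, scale, qmax=127):
    # Values beyond the calibrated range are clipped to [-qmax, qmax].
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

# Static: run representative data through once, fix the scale, reuse it for every input.
calibration_batches = [[0.3, -1.2, 2.0], [1.1, -0.4, 2.5], [-2.2, 0.8, 1.9]]
static_scale = absmax_scale([v for batch in calibration_batches for v in batch])

# Dynamic: recompute the scale from each input at runtime.
def quantize_dynamic(x):
    s = absmax_scale(x)
    return quantize(x, s), s

x = [0.5, -3.0, 1.0]  # -3.0 lies outside the calibrated range [-2.5, 2.5]
q_static = quantize(x, static_scale)
q_dynamic, dyn_scale = quantize_dynamic(x)
print(q_static, q_dynamic)
```

Static quantization avoids per-input range computation but clips outliers it never saw during calibration (-3.0 comes back as -2.5); dynamic quantization adapts to each input at a small runtime cost.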

---

### **Importance of Calibration**
Calibration is critical in applications where confidence in predictions is essential, such as:
- Medical diagnostics
- Risk assessment
- Fraud detection

By aligning predicted probabilities with real-world outcomes, calibration improves trust and reliability in model predictions.


---
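- **Symmetric quantization**: the float range is symmetric around zero, [-a, a] with a = max|x|, and the zero point is fixed at 0.
   - Suits roughly zero-centered values, such as weights or activations after batch normalization.
- **Asymmetric quantization**: a non-zero zero point lets an arbitrary range [min, max] map onto the full integer range (e.g., 0 to 255 for `uint8`), which suits skewed data such as ReLU outputs.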
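A sketch comparing the two schemes on skewed, mostly positive data, assuming 8 bits, signed [-127, 127] for symmetric and unsigned [0, 255] for asymmetric. The helper names and the data are hypothetical; `fake_quantize` quantizes and immediately dequantizes so the reconstruction error can be measured.

```python
def symmetric_params(values, bits=8):
    """Zero point fixed at 0; range [-a, a] with a = max |x|."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    a = max(abs(v) for v in values) or 1.0
    return a / qmax, 0, -qmax, qmax

def asymmetric_params(values, bits=8):
    """A zero point shifts the grid so [min, max] covers the full unsigned range."""
    qmin, qmax = 0, 2 ** bits - 1  # 0..255 for 8 bits
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # keep 0.0 representable
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point, qmin, qmax

def fake_quantize(values, scale, zero_point, qmin, qmax):
    """Quantize then dequantize, returning the reconstructed floats."""
    out = []
    for v in values:
        q = max(qmin, min(qmax, round(v / scale) + zero_point))
        out.append((q - zero_point) * scale)
    return out

def max_error(values, params):
    restored = fake_quantize(values, *params)
    return max(abs(a - b) for a, b in zip(values, restored))

# Skewed data in [-0.5, 5.99]: symmetric wastes most of its negative range.
data = [-0.5 + 0.01 * i for i in range(650)]
print("symmetric :", max_error(data, symmetric_params(data)))
print("asymmetric:", max_error(data, asymmetric_params(data)), "zero point", asymmetric_params(data)[1])
```

Because the symmetric grid must span [-5.99, 5.99], its step is roughly twice as coarse as the asymmetric grid over [-0.5, 5.99], so its worst-case error is larger; the price of asymmetric quantization is the extra zero-point arithmetic.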
