# **Lab Report: Quantization and pruning**

### **Student information**
- Student name: Bart Peelman
- Student code: CODE

### **Assignment description**
The assignment was about optimizing a TensorFlow model for deployment. We trained a baseline CNN, then applied post-training quantization, quantization aware training, and weight pruning. The goal was to see how these techniques affect model size and accuracy, and to be able to convert the models to TensorFlow Lite for efficient use on mobile or embedded devices.

### **Proof of work done**
#### Lab 3: Quantization and pruning
##### 3.1 Open the notebook
- I opened, reviewed, and ran the notebook in Google Colab.
- Then I saved the notebook (with output visible) and uploaded it to my GitHub repository.

##### 3.2 Follow the instructions in the notebook

#### ❓Questions

- What is the role of model_builder(): how does it differ from building a model manually?
  - model_builder() is a helper function that returns a Keras model with a predefined architecture.
  - Purpose: makes the code reusable and ensures consistent architecture across multiple experiments (baseline, pruning, quantization).
  - Difference: Instead of manually redefining the model each time, model_builder() centralizes model creation in one place.

- What is the purpose of the TensorFlow Lite format? How does it differ from the TensorFlow format?
  - Purpose: TFLite is designed for efficient deployment on mobile, embedded, and IoT devices.
  - Differences:
    - Smaller file size
    - Optimized for low-latency inference
    - Supports quantization for reduced precision
    - Intended for inference: a converted model is normally not trained further


- What changes in the model's layers after making it quantization aware?
  - Quantization aware training wraps layers with fake quantization nodes.
  - These nodes simulate lower-precision arithmetic (e.g., int8) during training.
  - Visible changes in the summary: layers are wrapped with QuantizeWrapper, and the wrappers add a small number of extra parameters for the quantization ranges, so the total parameter count increases slightly.
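
To make the idea of a fake quantization node concrete, here is a minimal pure-Python sketch. The helper `fake_quantize` and the fixed range of -1.0 to 1.0 are my own illustrative assumptions, not code from the notebook or from TensorFlow, and the real nodes also learn their range during training and adjust it so that zero is represented exactly.

```python
def fake_quantize(x, x_min, x_max, num_bits=8):
    """Quantize a float and immediately dequantize it (hypothetical helper)."""
    levels = 2 ** num_bits - 1            # 255 steps for 8 bits
    scale = (x_max - x_min) / levels      # width of one quantization step
    clipped = min(max(x, x_min), x_max)   # values outside the range are clipped
    q = round((clipped - x_min) / scale)  # integer level between 0 and `levels`
    return x_min + q * scale              # back to float, snapped to the grid


activations = [-1.2, -0.3, 0.0, 0.4137, 2.5]
simulated = [fake_quantize(a, x_min=-1.0, x_max=1.0) for a in activations]
print(simulated)
```

The output is still a list of floats, but every value now sits on one of 256 grid points, so the network sees the rounding and clipping error during training and can adapt its weights to it.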

- What is quantization and pruning?
  - Quantization: Reduces the precision of model weights and activations (e.g., from float32 → int8) to decrease size and improve inference speed.
  - Pruning: Removes unnecessary weights (sets them to zero) to create sparsity in the network, which can reduce computation and model size.
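
Both techniques can be sketched on a plain list of weights. This is only an illustration under my own assumptions (symmetric int8 quantization with a single scale, and pruning by smallest magnitude); the helper names are hypothetical and the libraries used in the notebook do this per layer with more care.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: returns integer codes plus the scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]  # each code fits in [-127, 127]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def prune_by_magnitude(weights, sparsity):
    """Set the given fraction of smallest-magnitude weights to zero."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


weights = [0.82, -0.05, 0.31, -1.27, 0.002, 0.64, -0.11, 0.009]

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
pruned = prune_by_magnitude(weights, sparsity=0.5)

print("int8 codes:", codes)
print("restored:  ", restored)
print("pruned:    ", pruned)
```

Quantization keeps every weight but stores it with less precision, while pruning keeps full precision but zeroes out the weights that contribute least.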

- Why should you use quantization aware training instead of simply quantizing a model after training?
  - Post-training quantization can cause loss of accuracy, especially for small or sensitive models.
  - Quantization aware training allows the model to adapt to precision loss during training, resulting in higher accuracy after quantization.

- When do you see a difference in the model's size when using quantization: after conversion to TFLite or after model compression using gzip? Why is that?
  - After TFLite conversion: a significant size reduction occurs because the quantized TFLite file stores weights as int8 (1 byte) instead of float32 (4 bytes).
  - Gzip compression: can reduce file size for all formats, but the quantized TFLite file is already small, so gzip provides smaller additional gains.
  - Reason: quantization shrinks the stored data itself (fewer bytes per weight), whereas gzip only removes redundancy from the bytes that are already in the file.
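
The effect on raw size can be reproduced without TensorFlow. The sketch below is an assumption-laden stand-in for a real model file: it packs 10,000 random weights once as float32 and once as int8 and compares the byte counts before and after gzip.

```python
import gzip
import random
import struct

random.seed(0)
n = 10_000
weights = [random.gauss(0.0, 0.1) for _ in range(n)]

# float32 storage: 4 bytes per weight.
float_bytes = struct.pack(f"{n}f", *weights)

# int8 storage: 1 byte per weight (symmetric quantization with one scale,
# used here as a simplified stand-in for a quantized model file).
scale = max(abs(w) for w in weights) / 127
int8_bytes = struct.pack(f"{n}b", *(round(w / scale) for w in weights))

print("raw float32:", len(float_bytes), "bytes")
print("raw int8:   ", len(int8_bytes), "bytes")
print("gzip float32:", len(gzip.compress(float_bytes)), "bytes")
print("gzip int8:   ", len(gzip.compress(int8_bytes)), "bytes")
```

The int8 version is four times smaller before any compression is applied, which matches seeing the difference directly after conversion.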

- And when in the case of pruning: after conversion or after compression? Why is that?
  - After compression (gzip) you see the size reduction for pruned models.
  - Reason: pruning sets weights to zero, but the zeros are still stored as regular values, so the converted file stays about the same size; gzip can encode the many repeated zeros very compactly.
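
A small experiment with synthetic weights illustrates this. It is a sketch under my own assumptions (random float32 weights, 80% of them pruned by magnitude), not a measurement from the notebook.

```python
import gzip
import random
import struct

random.seed(0)
n = 10_000
dense = [random.gauss(0.0, 0.1) for _ in range(n)]

# Zero out the 80% of weights with the smallest magnitude.
threshold = sorted(abs(w) for w in dense)[int(n * 0.8)]
pruned = [w if abs(w) >= threshold else 0.0 for w in dense]

dense_bytes = struct.pack(f"{n}f", *dense)
pruned_bytes = struct.pack(f"{n}f", *pruned)

# A zero still takes 4 bytes, so the raw sizes are identical.
print("raw dense: ", len(dense_bytes), "bytes")
print("raw pruned:", len(pruned_bytes), "bytes")

# gzip collapses the repeated zero bytes, so only now the sizes differ.
print("gzip dense: ", len(gzip.compress(dense_bytes)), "bytes")
print("gzip pruned:", len(gzip.compress(pruned_bytes)), "bytes")
```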

- What is the role of the sparsity and step parameters in the PolynomialDecay function?
  - sparsity: defines the fraction of weights to prune.
    - initial_sparsity: starting fraction (e.g., 0.0 or 0.5)
    - final_sparsity: target fraction after pruning

  - step: determines how pruning progresses over training steps.
    - begin_step: step to start pruning
    - end_step: step to stop pruning

  - PolynomialDecay gradually increases sparsity from initial → final over the steps.
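
The shape of this schedule can be sketched in plain Python. The function below is a hypothetical re-implementation based on my understanding of the formula, assuming an exponent of 3 (which I believe is the library default); the real schedule also only updates the pruning masks every `frequency` steps, which is left out here.

```python
def polynomial_sparsity(step, initial_sparsity, final_sparsity,
                        begin_step, end_step, power=3):
    """Target sparsity at a given training step (illustrative sketch)."""
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    # Fraction of the pruning window that has passed, between 0 and 1.
    progress = (step - begin_step) / (end_step - begin_step)
    remaining = (1 - progress) ** power
    return final_sparsity + (initial_sparsity - final_sparsity) * remaining


for step in (0, 250, 500, 750, 1000):
    sparsity = polynomial_sparsity(step, initial_sparsity=0.5,
                                   final_sparsity=0.8,
                                   begin_step=0, end_step=1000)
    print(step, round(sparsity, 4))
```

With these example values the sparsity rises quickly at first and then levels off towards 0.8, so most weights are removed early while the network still has many training steps left to recover.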

- Why do we need to remove the pruning layer before saving the model?
  - Pruning layers (PruneLowMagnitude) are training-time wrappers.
  - If saved with wrappers:
    - Model cannot be deployed efficiently
    - File size increases, because each wrapper stores extra variables (such as the pruning mask) alongside the weights
  - strip_pruning() removes wrappers, leaving a standard Keras model with pruned weights ready for deployment.



### **Evaluation criteria**
- ✅ Show that you've executed the notebook and pushed it to the repository  
- ✅ Show that you can convert a TensorFlow model to a TensorFlow Lite model  
- ✅ Show that you can execute post-training quantization on a model  
- ✅ Show that you can train a quantization aware model  
- ✅ Show that you can perform weight pruning on a model  
- ✅ Show that you wrote an elaborate lab report in Markdown and pushed it to the repository  
- ✅ Provide an answer to all questions marked with ❓, use code to support your answers where applicable  
  - ✅ Discuss the answers during the demo session  



### **Issues**
- **TensorFlow version:** The notebook needed TF 2.14 to work with `tensorflow_model_optimization`. Colab didn’t have it, so I ran the notebook locally with TF 2.14.1 and TFMOT 0.8.0, which fixed the pruning and quantization errors.  
- **Quantization aware training:** `quantize_model()` didn’t work in Colab due to version issues. It worked fine after switching to TF 2.14.1 locally.  
- **VS Code virtual environment:** My venv wasn’t detected automatically, so I manually selected it as the kernel.
- **Pruning callbacks:** I needed to pass the `UpdatePruningStep()` callback to `model.fit()` to make pruning actually work.

### **Reflection**
The hardest part was fixing the version issues for pruning and quantization. Setting up the venv and kernel in VS Code was a bit tricky too. The easier part was actually training and converting the models once everything was set up. I learned a lot about quantization, pruning, and TFLite conversion. Next time, I’d make sure the environment is fully compatible before starting the notebook to save time.

### **Resources**
List all sources of useful information that you encountered while completing this assignment: books, manuals, HOWTO's, blog posts, etc. Note: AI is not considered a valid literary source. Do not cite, for example, https://chatgpt.com. If you use AI, let it guide you to real, reliable sources instead