
Exercise 1: Open Source Levels Reflection

“Open source” means the material is publicly available to inspect, use, and modify, rather than hidden behind closed doors or paywalls. For models, openness comes in three levels:
- Fully Open: This is the gold standard of openness. Everything is available:

  - The model’s architecture (how it’s built)
  - The weights (what it learned during training)
  - The training code (how it was trained)
  - The training data (what it learned from)

- Weights Released: This is a common middle ground. You get the model’s learned knowledge (its weights), but you don’t get the full picture of how it was trained or on what data.

This still lets you:

- Run the model in your own applications.
- Fine-tune it on your own dataset.
- Deploy it for specific use cases (within licensing limits).

However:

- You can’t easily trace back how it was trained.
- There may be hidden biases or issues you can’t fully investigate.

- Architecture Only: In some cases, only the blueprint of the model is released—its structure, layers, and logic—but no weights or training data.

This means:

- You can understand the model design.
- You could train it yourself—but you’d need enormous computing power and a large dataset.


| **Model Element**                | **Fully Open** ✅ | **Weights Only** ✅❌ | **Architecture Only** ❌✅        |
| -------------------------------- | ---------------- | ------------------- | ------------------------------- |
| **Can you use it directly?**     | ✅                | ✅                   | ❌                               |
| **Can you fine-tune it?**        | ✅                | ✅                   | ❌                               |
| **Can you retrain it?**          | ✅                | ❌                   | ✅ *(if you have the resources)* |
| **Is the training transparent?** | ✅                | ❌                   | ❌                               |


Comparative Paragraph:

Fully Open models provide complete transparency, giving access to the architecture, weights, and training data. This allows researchers and developers to inspect, fine-tune, or retrain the model freely. Models with only released weights allow for fine-tuning on new tasks but prevent full retraining due to the lack of training data. Architecture-only models expose the structure but lack pretrained weights, making them impractical without significant computing resources. Fully open models are the most flexible, while architecture-only models are the most limited in practical usability.

Healthcare Prompt Answer
To build a healthcare-specific assistant that must be retrained on clinical data, Fully Open access is essential. This level enables complete retraining and inspection, which is critical for ensuring the model aligns with sensitive and domain-specific medical standards.
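
The reasoning above can be sketched as a small lookup over the capability table. This is only an illustration: `CAPABILITIES` and `levels_supporting` are names invented here, and the capability sets simply mirror the table rows.

```python
# Capability matrix copied from the comparison table above.
# CAPABILITIES and levels_supporting are illustrative names, not a library API.
CAPABILITIES = {
    "Fully Open": {"use", "fine-tune", "retrain", "transparent"},
    "Weights Only": {"use", "fine-tune"},
    "Architecture Only": {"retrain"},  # only with enough compute and data
}


def levels_supporting(required):
    """Return the openness levels whose capabilities cover every requirement."""
    required = set(required)
    return [level for level, caps in CAPABILITIES.items() if required <= caps]


# Healthcare assistant: must be retrained, and training must be inspectable.
print(levels_supporting({"retrain", "transparent"}))
```

Only the Fully Open level satisfies both requirements, which matches the answer above.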

Exercise 2: License Check for SaaS Use


---

##  **Model Selection**

1. **mistralai/Mistral-7B-Instruct**
   🔗 [https://huggingface.co/mistralai/Mistral-7B-Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct)

2. **meta-llama/Llama-2-7b-chat-hf**
   🔗 [https://huggingface.co/meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

---

##  **Completed Checklist**

### - \[x] **mistralai/Mistral-7B-Instruct**

* Type of license:

  * [x] Apache 2.0
* Commercial use allowed:

  * [x] Yes
* Restrictions:

  * [x] Must include license and copyright notice
  * [x] No trademark use without permission

---

### - \[x] **meta-llama/Llama-2-7b-chat-hf**

* Type of license:

  * [x] Custom (Meta’s Llama 2 Community License)
* Commercial use allowed:

  * [x] Conditional (requires registration and compliance with Meta’s license terms)
* Restrictions:

  * [x] Must accept license and register with Meta
  * [x] Products or services with more than 700 million monthly active users must request a separate license from Meta
  * [x] Export controls and geographic use restrictions may apply
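
The checklist can also be kept as data, so that any model whose commercial use is not an unconditional "yes" gets flagged before a SaaS launch. This is a minimal sketch: the `LicenseCheck` class and `needs_review` helper are hypothetical names, and the entries are transcribed from the checklist above rather than read from the Hub.

```python
from dataclasses import dataclass, field


@dataclass
class LicenseCheck:
    model: str
    license_name: str
    commercial_use: str  # "yes", "conditional", or "no"
    restrictions: list = field(default_factory=list)


def needs_review(check):
    """Flag anything that is not an unconditional 'yes' for commercial use."""
    return check.commercial_use != "yes"


# Entries transcribed from the checklist; not fetched from Hugging Face.
CHECKS = [
    LicenseCheck(
        "mistralai/Mistral-7B-Instruct",
        "Apache 2.0",
        "yes",
        ["include license and copyright notice", "no trademark use without permission"],
    ),
    LicenseCheck(
        "meta-llama/Llama-2-7b-chat-hf",
        "Llama 2 Community License",
        "conditional",
        ["accept license and register with Meta", "700 million monthly active user clause"],
    ),
]

for check in CHECKS:
    status = "review terms before launch" if needs_review(check) else "ok for SaaS"
    print(f"{check.model}: {check.license_name} -> {status}")
```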

---


Exercise 3: LLM Matchmaker Challenge


---

## 1. **Search Filter Summary (by team)**

### 🔍 **LegalTech**

* Filters: `text-generation`, `logic`, `quantized`, `CPU`, size ≤ 7B
* URL: [Logic + Quantized models](https://huggingface.co/models?pipeline_tag=text-generation&tags=logic&search=quantized&sort=downloads)

### 🔍 **EdTech**

* Filters: `math`, quantized or 8-bit, ≤ 7B, low memory models
* URL: [Math models filtered](https://huggingface.co/models?pipeline_tag=text-generation&tags=math&sort=downloads)

### 🔍 **Global NGO**

* Filters: `multilingual`, ≤ 7B, architecture with strong FLORES-200 scores (e.g., M2M100, BLOOM, XGLM)
* URL: [Multilingual models ≤7B](https://huggingface.co/models?pipeline_tag=text-generation&tags=multilingual&sort=downloads)
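
The three search URLs above share one pattern, so they can be generated rather than typed by hand. The sketch below assumes only the query parameters that already appear in those URLs (`pipeline_tag`, `tags`, `search`, `sort`); `build_search_url` is a helper name made up for this example, and whether the Hub honours every parameter is not verified here.

```python
from urllib.parse import urlencode

BASE = "https://huggingface.co/models"


def build_search_url(pipeline_tag, tag, search=None, sort="downloads"):
    """Build a model-search URL using the parameters seen in the links above."""
    params = {"pipeline_tag": pipeline_tag, "tags": tag}
    if search:
        params["search"] = search
    params["sort"] = sort
    return f"{BASE}?{urlencode(params)}"


urls = {
    "LegalTech": build_search_url("text-generation", "logic", search="quantized"),
    "EdTech": build_search_url("text-generation", "math"),
    "Global NGO": build_search_url("text-generation", "multilingual"),
}

for team, url in urls.items():
    print(f"{team}: {url}")
```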

---

## 2. **Filled Table**

| **Team**       | **Needs**                                 | **Your Pick**                                                                        |
| -------------- | ----------------------------------------- | ------------------------------------------------------------------------------------ |
| **LegalTech**  | Fast model for logic-heavy chatbot on CPU | `Intel/neural-chat-7b-v3-1` *(optimized for CPU, logic-ready, quantized)*            |
| **EdTech**     | Logic/math-focused LLM on low-end laptops | `TheBloke/Mistral-7B-Instruct-v0.1-GGUF` *(4-bit quantized, good GSM8K performance)* |
| **Global NGO** | Model that speaks 5+ languages well       | `bigscience/bloomz-3b` *(multilingual, under 7B, strong on FLORES)*                  |

---

### ✅ Model Notes:

#### 🧠 `Intel/neural-chat-7b-v3-1`

* Architecture: Mistral (fine-tuned from `mistralai/Mistral-7B-v0.1`)
* Optimized for CPU (INT8)
* Strong logical QA (BoolQ)
* Can be accelerated with Intel tooling such as OpenVINO (good for inference speed)

#### 🧮 `TheBloke/Mistral-7B-Instruct-v0.1-GGUF`

* Architecture: Mistral
* Quantization: GGUF 4-bit (low RAM)
* Great balance between logic and math
* Can run on CPU with llama.cpp

#### 🌍 `bigscience/bloomz-3b`

* Architecture: BLOOM
* Language support: 46 natural languages plus 13 programming languages (FLORES-200 evaluated)
* Size: 3B → more lightweight
* Used in many multilingual applications
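
The "low RAM" notes above follow from simple arithmetic: the weights take roughly parameters × bits per weight ÷ 8 bytes. The sketch below covers the weights only; it ignores the KV cache and runtime overhead, and real quantized files are usually somewhat larger because some tensors are kept at higher precision. `approx_weight_gb` is an illustrative helper name.

```python
def approx_weight_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for label, params, bits in [
    ("7B model, 16-bit", 7, 16),
    ("7B model, 4-bit", 7, 4),
    ("3B model, 16-bit", 3, 16),
]:
    print(f"{label}: ~{approx_weight_gb(params, bits):.1f} GB")
```

By this estimate a 7B model drops from about 14 GB at 16-bit to about 3.5 GB at 4-bit, which is why the quantized picks fit on low-end machines.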


Exercise 4: Local Readiness Audit

| Requirement               | Your System Specs | Meets Requirement? |
|---------------------------|-------------------|---------------------|
| RAM (≥ 16 GB)             |                   | ✅                   |
| Free Disk Space (≥ 40 GB) |                   | ✅                   |
| OS (Linux/WSL2)           | macOS             | ❌                   |



* **CPU instruction sets (AVX, SSE):**
  ❌ My CPU does not support the AVX instruction set. This may impact performance or compatibility with llama.cpp.

* **C/C++ compiler (gcc/clang):**
  ✅ Apple clang version 17.0.0 is installed and ready.

* **Additional software requirements:**

  * ✅ `make` version 3.81 is installed.
  * ❌ `cmake` is not installed (optional, can be installed if needed via Homebrew).
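
Part of this audit can be scripted with the Python standard library. The sketch below is an assumption-laden helper, not an official llama.cpp check: `audit` and `cpu_flags` are names invented here, the AVX test only works on Linux x86 (it reads `/proc/cpuinfo`), and total RAM is left out because there is no portable standard-library call for it.

```python
import platform
import shutil


def cpu_flags():
    """Return CPU flags from /proc/cpuinfo (Linux x86 only), else an empty set."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


def audit(path=".", min_free_gb=40):
    """Collect a few readiness facts for running a local model."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "os": platform.system(),        # e.g. "Linux", "Darwin", "Windows"
        "machine": platform.machine(),  # e.g. "x86_64", "arm64"
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
        "avx": "avx" in cpu_flags(),    # False when flags are unavailable
        "compiler": shutil.which("gcc") or shutil.which("clang"),
        "make": shutil.which("make"),
        "cmake": shutil.which("cmake"),
    }


for key, value in audit().items():
    print(f"{key}: {value}")
```

A `None` for `compiler`, `make`, or `cmake` means the tool was not found on the `PATH`.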

---


| Component             | Status | Upgrade Needed?    | How to Upgrade |
| --------------------- | ------ | ------------------ | -------------- |
| **RAM**               | ✅      | No upgrade needed  | - |
| **Disk Space**        | ✅      | No upgrade needed  | - |
| **OS (Linux/WSL2)**   | ❌      | Optional           | The system runs macOS rather than Linux/WSL2, so it does not match the checklist as written. llama.cpp does build and run on macOS, so installing Linux (dual boot or virtual machine) is only needed if other course tooling requires it. |
| **CPU (AVX support)** | ❌      | Depends on the CPU | AVX is an x86 instruction set. On an x86 CPU without AVX, llama.cpp may still run but with a significant performance loss, and a newer CPU is advisable. On an Apple silicon (ARM) Mac, AVX is absent by design and llama.cpp uses ARM NEON and Metal instead, so no upgrade is needed. |


Summary:

To run a 7B quantized model locally with llama.cpp:

- RAM and disk space: The system meets the requirements; no upgrade is needed.

- OS: The checklist asks for Linux or WSL2, and this system runs macOS. llama.cpp also builds and runs on macOS, so a Linux environment is only needed if other course tooling depends on it.

- CPU: AVX matters on x86 CPUs, where its absence can limit performance or compatibility. On an Apple silicon Mac, AVX is absent by design and llama.cpp relies on ARM NEON and Metal instead.

Exercise 5: Benchmark-Based Model Explorer

1. Model List

* facebook/opt-6.7b: https://huggingface.co/facebook/opt-6.7b
* bigscience/bloom-7b1: https://huggingface.co/bigscience/bloom-7b1
* meta-llama/Llama-2-7b-chat-hf: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf



2. Completed Comparison Table

| Model Name                    | HellaSwag Score | MMLU Score | License Type                          | Ideal Use Case            |
| ----------------------------- | --------------- | ---------- | ------------------------------------- | ------------------------- |
| facebook/opt-6.7b             | 77.5            | 56.2       | Custom OPT license (research use)     | Commonsense Q\&A agent    |
| bigscience/bloom-7b1          | 70.3            | 61.5       | BigScience RAIL 1.0                   | Academic question tutor   |
| meta-llama/Llama-2-7b-chat-hf | 78.0            | 60.0       | Llama 2 Community License             | General-purpose assistant |
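
The use-case column follows from ranking the models on each benchmark. The sketch below does that ranking with the scores exactly as they appear in the comparison table; they are copied, not re-checked against a leaderboard, and `MODELS` and `rank_by` are names introduced only for this example.

```python
# Scores copied from the comparison table above (not independently verified).
MODELS = [
    {"name": "facebook/opt-6.7b", "hellaswag": 77.5, "mmlu": 56.2},
    {"name": "bigscience/bloom-7b1", "hellaswag": 70.3, "mmlu": 61.5},
    {"name": "meta-llama/Llama-2-7b-chat-hf", "hellaswag": 78.0, "mmlu": 60.0},
]


def rank_by(metric):
    """Return model names sorted from highest to lowest score on one metric."""
    ordered = sorted(MODELS, key=lambda m: m[metric], reverse=True)
    return [m["name"] for m in ordered]


for metric in ("hellaswag", "mmlu"):
    print(f"{metric}: {rank_by(metric)}")
```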


Exercise 6: Cloud vs. Local Deployment Plan



### 1. Review the Comparison


* **Cost:** upfront hardware investment (local) vs. pay-per-use fees (cloud)
* **Performance (latency):** fast local responses vs. network delay in the cloud
* **Control:** full control locally vs. dependence on the cloud provider
* **Ease of Setup:** complex local installation vs. quick cloud configuration
* **Scalability & Maintenance:** limited by local hardware vs. easy scaling in the cloud

---

### 2 & 3. Pros & Cons List (5 bullets total)

* ✔️ **Local:** No network round trip, so latency depends only on the local hardware
* ❌ **Local:** High upfront hardware cost and ongoing maintenance
* ✔️ **Cloud:** Easy access to powerful GPUs without hardware purchase
* ❌ **Cloud:** Potential higher latency due to network delays
* ✔️ **Cloud:** Scalability on demand to handle variable workloads
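
The cost trade-off in these bullets can be made concrete with a break-even estimate: how many hours of use it takes before buying hardware is cheaper than renting. Every figure below is hypothetical and chosen only to show the arithmetic; `break_even_hours` is an illustrative helper, not a pricing model.

```python
def break_even_hours(hardware_cost, cloud_rate_per_hour, local_cost_per_hour=0.0):
    """Hours of use after which local hardware becomes cheaper than cloud."""
    saving_per_hour = cloud_rate_per_hour - local_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("cloud is not more expensive per hour; no break-even point")
    return hardware_cost / saving_per_hour


# Hypothetical figures: hardware price, cloud GPU hourly rate, and local
# running cost per hour (power, maintenance).
hours = break_even_hours(hardware_cost=2000, cloud_rate_per_hour=1.0, local_cost_per_hour=0.2)
print(f"Break-even after about {hours:.0f} hours of use")
```

Below the break-even point the cloud is cheaper; above it, local hardware pays for itself.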

---

### 4. Optional Colab Benchmark

* **Model used:** meta-llama/Llama-2-7b-chat-hf
* **Response time:** \~15 seconds for generating 50 tokens on Google Colab (varies with server load)
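
The response time above converts directly to throughput, and the same measurement can be repeated with a small timing wrapper. This is a sketch: `fake_generate` is a stand-in for a real model call (none is loaded here), and `timed` and `tokens_per_second` are helper names invented for the example.

```python
import time


def tokens_per_second(n_tokens, seconds):
    """Throughput for a generation run."""
    return n_tokens / seconds


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Figure reported above: about 50 tokens in about 15 seconds.
print(f"{tokens_per_second(50, 15):.2f} tokens/s")


def fake_generate(n_tokens):
    """Placeholder for a real model.generate call."""
    return ["tok"] * n_tokens


tokens, elapsed = timed(fake_generate, 50)
print(f"generated {len(tokens)} placeholder tokens in {elapsed:.6f} s")
```

At roughly 3.3 tokens per second, a 200-token answer would take about a minute under the same conditions.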

---

### 5. Finalized Plan Summary

Local deployment offers superior latency and full data control but demands expensive hardware and maintenance. Cloud deployment provides flexible, scalable GPU access with minimal setup, at the cost of possible latency and dependency on internet connectivity. Running a 7B model on Colab confirmed that cloud can handle demanding models easily, though response times depend on shared resources.

---



In [4]:
import pandas as pd