
# Exercises XP: Open-Source LLM Strategy (Student)
Use this guided notebook and fill each TODO. Run it in Colab if you prefer; GPU not required unless you try optional model runs.

## What you'll learn
- Identify degrees of openness in LLMs and what each enables.
- Understand and compare open-source licenses.
- Assess strengths/trade-offs of open LLMs for specific constraints (CPU, math, multilingual).
- Evaluate which models fit given hardware and licensing needs.
- Use open tools/leaderboards to guide model selection.
- Plan deployment trade-offs (local vs. cloud).

## What you will create
- Comparative LLM openness analysis (table + paragraph + healthcare prompt answer).
- Licensing compatibility checklist for SaaS products.
- Quiz-style reflection on LLM selection.
- Local deployment readiness checklist + upgrade notes.
- Benchmark-based model match guide.
- Hardware upgrade plan / cost-benefit comparison (local vs. cloud).

## Exercise 1: Open Source Levels Reflection
**Use:** Fully Open, Weights Released, Architecture Only; model components; fine-tuning/domain adaptation; healthcare compliance.

**Step-by-step**
1) Gather definitions for the three openness levels.  
2) Identify key characteristics (what is open; what you can/cannot do).  
3) Compare side by side in a table (what's open vs. impact).  
4) Write a 3-5 sentence comparative paragraph.  
5) Answer the healthcare prompt (1-2 sentences) about retraining on clinical data.

**Deliverables:** paragraph + healthcare answer.

In [1]:
open_source_levels = {
    "Fully Open": {
        "definition": "A fully open-source model where code, architecture, weights, and training data (partially or fully) are released under a permissive license that allows use, modification, and redistribution.",
        "what_is_open": [
            "Model weights",
            "Model architecture",
            "Source code",
            "Training data (partial or full)"
        ],
        "what_you_can_do": [
            "Run the model locally without restrictions",
            "Retrain or fine-tune on domain-specific data",
            "Modify, redistribute, or commercialize the model"
        ],
        "what_you_cannot_do": [
            "Few limitations beyond respecting the license",
            "No major commercial restrictions in most cases"
        ],
    },

    "Weights Released": {
        "definition": "The model’s pre-trained weights are publicly released, usually along with the architecture, but training data and full source code may remain closed.",
        "what_is_open": [
            "Pre-trained model weights",
            "Model architecture"
        ],
        "what_you_can_do": [
            "Use the model directly",
            "Perform fine-tuning on custom data"
        ],
        "what_you_cannot_do": [
            "Access the original training data",
            "Freely redistribute or commercially exploit the model if the license restricts it"
        ],
    },

    "Architecture Only": {
        "definition": "Only the high-level model architecture is released; no weights or usable training artifacts are provided. Useful for research but not for immediate deployment.",
        "what_is_open": [
            "Theoretical model design",
            "Architecture description"
        ],
        "what_you_can_do": [
            "Rebuild the model from scratch",
            "Train it entirely on your own data (high cost)"
        ],
        "what_you_cannot_do": [
            "Use the model as-is (no weights available)",
            "Replicate original performance without extensive training resources"
        ],
    },
}

open_source_levels



In [2]:
comparison_table = """| Openness level | What's open? | Impact on retraining/modifying |
| --- | --- | --- |
| Fully Open | Weights, architecture, code, (sometimes) training data | Easiest: full flexibility for fine-tuning, domain adaptation, and safe on-prem deployment |
| Weights Released | Weights + architecture | Moderate: fine-tuning possible, but limited visibility into training data; some commercial restrictions may apply |
| Architecture Only | Architecture description only | Hardest: you must train from scratch; extremely costly and slow for domain adaptation |
"""
print(comparison_table)


| Openness level | What's open? | Impact on retraining/modifying |
| --- | --- | --- |
| Fully Open | Weights, architecture, code, (sometimes) training data | Easiest: full flexibility for fine-tuning, domain adaptation, and safe on-prem deployment |
| Weights Released | Weights + architecture | Moderate: fine-tuning possible, but limited visibility into training data; some commercial restrictions may apply |
| Architecture Only | Architecture description only | Hardest: you must train from scratch; extremely costly and slow for domain adaptation |
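The table above is typed by hand. As a minimal sketch, the same layout can be generated from a dictionary so the two cannot drift apart. The `impact` key and the shortened sample data are assumptions added for illustration; they are not part of the notebook's own data.

```python
def levels_to_markdown(levels):
    """Render {level: {"what_is_open": [...], "impact": str}} as a markdown table."""
    lines = [
        "| Openness level | What's open? | Impact on retraining/modifying |",
        "| --- | --- | --- |",
    ]
    for name, info in levels.items():
        opened = ", ".join(info["what_is_open"])
        lines.append(f"| {name} | {opened} | {info['impact']} |")
    return "\n".join(lines)


# Shortened sample data; "impact" is a hypothetical extra key.
sample_levels = {
    "Fully Open": {
        "what_is_open": ["Weights", "architecture", "code"],
        "impact": "Easiest: fine-tune freely",
    },
    "Architecture Only": {
        "what_is_open": ["Architecture description"],
        "impact": "Hardest: train from scratch",
    },
}

print(levels_to_markdown(sample_levels))
```

Keeping the impact text next to the list of open components means one edit updates both the data and the rendered table.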



In [3]:
comparative_paragraph = """Fully open-source models provide the highest level of transparency and control, giving users access to code, architecture, and weights, which makes fine-tuning and domain adaptation straightforward. Models with released weights offer a middle ground: users can run and adapt the model, but they lack insight into the original training data and may face licensing constraints. Architecture-only releases offer conceptual value but little practical utility, since the absence of weights requires costly training from scratch. As openness decreases, the effort, resources, and complexity required to adapt a model increase significantly. Therefore, the chosen openness level directly influences feasibility, cost, and compliance in specialized domains."""
print(comparative_paragraph)


Fully open-source models provide the highest level of transparency and control, giving users access to code, architecture, and weights, which makes fine-tuning and domain adaptation straightforward. Models with released weights offer a middle ground: users can run and adapt the model, but they lack insight into the original training data and may face licensing constraints. Architecture-only releases offer conceptual value but little practical utility, since the absence of weights requires costly training from scratch. As openness decreases, the effort, resources, and complexity required to adapt a model increase significantly. Therefore, the chosen openness level directly influences feasibility, cost, and compliance in specialized domains.


In [4]:
healthcare_prompt_answer = """In healthcare, only fully open or weights-released models are suitable for retraining on clinical data because they can be fine-tuned locally while maintaining strict privacy and compliance requirements. Architecture-only models are impractical since they require prohibitively expensive training from scratch."""
print(healthcare_prompt_answer)


In healthcare, only fully open or weights-released models are suitable for retraining on clinical data because they can be fine-tuned locally while maintaining strict privacy and compliance requirements. Architecture-only models are impractical since they require prohibitively expensive training from scratch.


## Exercise 2: License Check for SaaS Use
**Use:** HF model pages; permissive vs. copyleft; commercial clauses; restrictions (MAU caps, attribution, export controls).

**Step-by-step**
1) Select two HF models (e.g., Mistral-7B-Instruct, Llama-2-7b-chat-hf) and note URLs.  
2) Locate the License field and copy the name.  
3) Determine if commercial use is allowed/restricted/prohibited; note conditions.  
4) Identify extra restrictions (MAU caps, attribution, export/geography).  
5) Build the markdown checklist with your findings.

**Deliverables:** model names + URLs + completed checklist.
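Step 2 can be partly automated. The sketch below assumes the Hub exposes a declared license as a tag of the form `license:<id>`, and that the `huggingface_hub` package is installed for the online lookup. Gated repositories may require logging in first, and the tag only names the license; the conditions still have to be read on the model page.

```python
def license_from_tags(tags):
    """Return the license id from Hub-style tags such as 'license:apache-2.0'."""
    for tag in tags or []:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None


def fetch_license(repo_id):
    """Look up a model's license tag on the Hub (needs network access)."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import model_info

    return license_from_tags(model_info(repo_id).tags)


# Offline check with a hand-written tag list (hypothetical example data).
print(license_from_tags(["text-generation", "license:apache-2.0"]))
```

Calling `fetch_license("mistralai/Mistral-7B-Instruct-v0.3")` in Colab should return the license id if the assumption about tags holds.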

In [6]:
license_checklist = [
    {
        "model": "mistralai/Mistral-7B-Instruct-v0.3",
        "url": "https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3",
        "license": "Apache 2.0",
        "commercial_use": "Yes",
        "restrictions": [
            "Requires preservation of Apache 2.0 NOTICE file",
            "No major commercial restrictions (permissive license)"
        ],
    },
    {
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "url": "https://huggingface.co/meta-llama/Llama-2-7b-chat-hf",
        "license": "Llama 2 Community License",
        "commercial_use": "Conditional",
        "restrictions": [
            "Separate license from Meta needed above 700 million monthly active users",
            "Cannot use the model or its outputs to improve other LLMs (Llama 2 derivatives excepted)",
            "Must comply with the Llama 2 Acceptable Use Policy",
            "Attribution notice required when redistributing",
            "Must comply with applicable trade and export control laws"
        ],
    },
]

license_checklist


[{'model': 'mistralai/Mistral-7B-Instruct-v0.3',
  'url': 'https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3',
  'license': 'Apache 2.0',
  'commercial_use': 'Yes',
  'restrictions': ['Requires preservation of Apache 2.0 NOTICE file',
   'No major commercial restrictions (permissive license)']},
 {'model': 'meta-llama/Llama-2-7b-chat-hf',
  'url': 'https://huggingface.co/meta-llama/Llama-2-7b-chat-hf',
  'license': 'Llama 2 Community License',
  'commercial_use': 'Conditional',
  'restrictions': ['Separate license from Meta needed above 700 million monthly active users',
   'Cannot use the model or its outputs to improve other LLMs (Llama 2 derivatives excepted)',
   'Must comply with the Llama 2 Acceptable Use Policy',
   'Attribution notice required when redistributing',
   'Must comply with applicable trade and export control laws']}]

Model 1 – Mistral-7B-Instruct-v0.3

Type of license:
Apache 2.0 (permissive)

Commercial use allowed:
Yes

Restrictions:

* Must preserve the Apache 2.0 NOTICE file

* No major commercial restrictions (standard permissive license)

Model 2 – Llama-2-7b-chat-hf

Type of license:
Llama 2 Community License

Commercial use allowed:
Conditional

Restrictions:

* A separate license from Meta is needed above 700 million monthly active users

* Cannot use the model or its outputs to improve other LLMs (Llama 2 derivatives excepted)

* Must follow the Llama 2 Acceptable Use Policy

* Attribution notice required when redistributing

* Must comply with applicable trade and export control laws
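The deliverable asks for a markdown checklist. One possible way to produce it from records shaped like the `license_checklist` entries is sketched below. The checkbox wording and the sample entry are assumptions made for illustration.

```python
def render_checklist(entries):
    """Turn license findings into a markdown checklist, one section per model."""
    lines = []
    for entry in entries:
        lines.append(f"### {entry['model']}")
        lines.append(f"- [x] License identified: {entry['license']}")
        # Tick the box only for an unconditional "Yes".
        box = "x" if entry["commercial_use"] == "Yes" else " "
        lines.append(
            f"- [{box}] Commercial use allowed without conditions "
            f"({entry['commercial_use']})"
        )
        for item in entry["restrictions"]:
            lines.append(f"- [ ] Review: {item}")
        lines.append("")
    return "\n".join(lines).rstrip()


# Hypothetical entry with the same keys as the records above.
sample_entries = [
    {
        "model": "example-org/example-model",
        "license": "Apache 2.0",
        "commercial_use": "Yes",
        "restrictions": ["Keep the NOTICE file"],
    }
]
print(render_checklist(sample_entries))
```

Restrictions are left unticked on purpose, so each one stays visible as an open item until someone has reviewed it against the SaaS use case.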

## Exercise 3: LLM Matchmaker Challenge
**Use:** CPU inference, low-end laptops, multilingual needs; HF filters; size <=7B; benchmarks (BoolQ, GSM8K, FLORES-200).

**Step-by-step**
1) Analyze team needs (LegalTech CPU logic; EdTech math/logic low-end; Global NGO multilingual >=5 langs).  
2) Apply HF filters (cpu/quantized, logic/math/multilingual tags, size <=7B).  
3) List 3-5 candidates per team with params/arch/quantization/benchmarks.  
4) Compare top 2 per team and pick the best model with justification.  
5) Fill the table with your picks.

**Deliverables:** filter summary + filled picks table.

In [7]:
filters_by_team = {
    "LegalTech": [
        "text-generation",
        "cpu-inference",
        "GGUF or quantized",
        "logic/BoolQ benchmark",
        "<=7B parameters"
    ],
    "EdTech": [
        "math/GSM8K tag",
        "cpu or quantized",
        "<=7B (prefer <=4B)",
        "instruction-tuned"
    ],
    "Global NGO": [
        "multilingual/FLORES-200 tag",
        "cpu or GGUF quantized",
        "<=7B parameters",
        "supports ≥5 languages"
    ],
}

filters_by_team


{'LegalTech': ['text-generation',
  'cpu-inference',
  'GGUF or quantized',
  'logic/BoolQ benchmark',
  '<=7B parameters'],
 'EdTech': ['math/GSM8K tag',
  'cpu or quantized',
  '<=7B (prefer <=4B)',
  'instruction-tuned'],
 'Global NGO': ['multilingual/FLORES-200 tag',
  'cpu or GGUF quantized',
  '<=7B parameters',
  'supports ≥5 languages']}
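The size cap in these filters can be applied mechanically once parameter counts are written in a consistent form. The sketch below assumes each record carries a `params_b` string such as `"7B"` or `"300M"`. The model names are placeholders.

```python
def parse_params(text):
    """Convert strings like '7B', '1.5B' or '300M' to a parameter count."""
    units = {"M": 1e6, "B": 1e9}
    text = text.strip().upper()
    return float(text[:-1]) * units[text[-1]]


def within_size_cap(candidates, cap="7B"):
    """Keep candidates whose parameter count does not exceed the cap."""
    limit = parse_params(cap)
    return [c for c in candidates if parse_params(c["params_b"]) <= limit]


# Hypothetical mini list; names are placeholders, not real repositories.
sample = [
    {"model": "small-model", "params_b": "1.5B"},
    {"model": "mid-model", "params_b": "7B"},
    {"model": "big-model", "params_b": "8B"},
]
print([c["model"] for c in within_size_cap(sample)])
```

An 8B model is excluded here even if a quantized build would fit in memory, because the cap is stated in parameters rather than in gigabytes.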

In [8]:
candidates_by_team = {
    "LegalTech": [
        {
            "model": "mistralai/Mistral-7B-Instruct-v0.3",
            "params_b": "7B",
            "arch": "Transformer (Mistral architecture)",
            "optimization": "Available in GGUF for CPU; quantized Q4_K_M/Q5_K_M",
            "benchmarks": "Strong on BoolQ, general reasoning"
        },
        {
            "model": "google/gemma-2b-it",
            "params_b": "2B",
            "arch": "Gemma Transformer",
            "optimization": "Very CPU-friendly; GGUF/Q4 available",
            "benchmarks": "Good logical reasoning for size"
        },
        {
            "model": "meta-llama/Meta-Llama-3-8B-Instruct",
            "params_b": "8B (above the <=7B cap; quantization reduces memory use, not parameter count)",
            "arch": "Llama 3",
            "optimization": "Community GGUF conversions available (e.g. Q4_K_M)",
            "benchmarks": "High accuracy on BoolQ and legal-style QA"
        },
    ],

    "EdTech": [
        {
            "model": "Qwen/Qwen2-1.5B-Instruct",
            "params_b": "1.5B",
            "arch": "Qwen2 Transformer",
            "optimization": "Quantized GGUF available",
            "benchmarks": "Strong GSM8K for small size"
        },
        {
            "model": "google/gemma-2b-it",
            "params_b": "2B",
            "arch": "Gemma Transformer",
            "optimization": "Lightweight, CPU-ready",
            "benchmarks": "Good GSM8K math reasoning"
        },
        {
            "model": "deepseek-ai/deepseek-math-7b-instruct",
            "params_b": "7B",
            "arch": "DeepSeek Math",
            "optimization": "GGUF quantized versions exist",
            "benchmarks": "High GSM8K accuracy for math tutoring"
        },
        {
            "model": "mistralai/Mistral-7B-Instruct-v0.3",
            "params_b": "7B",
            "arch": "Mistral",
            "optimization": "GGUF optimized",
            "benchmarks": "Decent GSM8K for general reasoning"
        },
    ],

    "Global NGO": [
        {
            "model": "Qwen/Qwen2-7B-Instruct",
            "params_b": "7B",
            "arch": "Qwen2",
            "optimization": "GGUF, CPU-ready",
            "benchmarks": "Top-tier multilingual (FLORES-200)"
        },
        {
            "model": "google/mt5-small",
            "params_b": "300M",
            "arch": "mT5",
            "optimization": "CPU friendly; not quantized",
            "benchmarks": "Strong multilingual baseline"
        },
        {
            "model": "FacebookAI/xlm-roberta-base",
            "params_b": "270M",
            "arch": "XLM-R",
            "optimization": "Small enough for CPU; encoder-only, so it suits classification and retrieval rather than text generation",
            "benchmarks": "Excellent cross-lingual understanding"
        },
        {
            "model": "facebook/mbart-large-50",
            "params_b": "610M",
            "arch": "mBART",
            "optimization": "Runs on CPU; not GGUF but efficient",
            "benchmarks": "Very strong translation across 50 languages"
        },
    ],
}

candidates_by_team



In [9]:
matchmaker_table = """| Team | Needs | Your Pick |
| --- | --- | --- |
| LegalTech | Fast model for logic-heavy chatbot on CPU | Mistral-7B-Instruct-v0.3 (GGUF quantized) |
| EdTech | Logic/math-focused LLM on low-end laptops | Qwen2-1.5B-Instruct |
| Global NGO | Model that speaks 5+ languages well | Qwen2-7B-Instruct |
"""
print(matchmaker_table)


| Team | Needs | Your Pick |
| --- | --- | --- |
| LegalTech | Fast model for logic-heavy chatbot on CPU | Mistral-7B-Instruct-v0.3 (GGUF quantized) |
| EdTech | Logic/math-focused LLM on low-end laptops | Qwen2-1.5B-Instruct |
| Global NGO | Model that speaks 5+ languages well | Qwen2-7B-Instruct |
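A rough memory estimate helps check whether a pick will fit on a low-end laptop. The sketch below counts weights only (parameters times bits per weight, divided by 8) and ignores the KV cache and runtime overhead. The 4.5 bits per weight used for a 4-bit GGUF build is an assumed average; real files vary by quantization variant.

```python
def estimated_weight_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


for name, params in [("1.5B model", 1.5), ("7B model", 7.0), ("8B model", 8.0)]:
    fp16 = estimated_weight_gb(params, 16)
    q4 = estimated_weight_gb(params, 4.5)  # assumed average for a 4-bit build
    print(f"{name}: fp16 ~{fp16:.1f} GB, 4-bit ~{q4:.1f} GB")
```

Quantization lowers the bytes stored per parameter. The parameter count itself is unchanged, which is why a 7B model at 4 bits can fit in under 8 GB of RAM while the same model at fp16 cannot.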



## Exercise 4: Local Readiness Audit
**Use:** RAM/disk/OS; llama.cpp requirements (AVX/SSE, cmake/make/gcc/clang); quantized formats (GGUF).

**Step-by-step**
1) Gather system specs (RAM, free disk, OS type/version).  
2) Fill the audit table and mark ✅/❌.  
3) Check llama.cpp readiness (instruction sets, compilers, build tools).  
4) Identify upgrade needs for each ❌.

**Deliverables:** filled table + upgrade summary.

In [10]:
system_specs = {
    "ram_gb": "Unknown (fill after checking system info)",
    "free_disk_gb": "Unknown (check available storage)",
    "os": "Unknown (Windows / macOS / Linux + version)",
}

system_specs


{'ram_gb': 'Unknown (fill after checking system info)',
 'free_disk_gb': 'Unknown (check available storage)',
 'os': 'Unknown (Windows / macOS / Linux + version)'}
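The three values can be collected with the standard library instead of being looked up by hand. This is a minimal sketch: the RAM lookup relies on `os.sysconf`, which is assumed to be available on Linux and macOS but not on Windows, where the function falls back to `"Unknown"`.

```python
import os
import platform
import shutil


def total_ram_gb():
    """Total physical RAM in GB, or None where os.sysconf is unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None


def collect_specs(path="."):
    """Gather the three values the audit table asks for."""
    ram = total_ram_gb()
    return {
        "ram_gb": round(ram, 1) if ram is not None else "Unknown",
        "free_disk_gb": round(shutil.disk_usage(path).free / 1e9, 1),
        "os": f"{platform.system()} {platform.release()}",
    }


print(collect_specs())
```

Run inside Colab, this reports the Colab virtual machine rather than your laptop, so run it locally for the audit.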

In [11]:
readiness_table = """| Requirement | Your System Specs | Meets Requirement? |
| --- | --- | --- |
| RAM (>= 16 GB) | Unknown | ? |
| Free Disk Space (>= 40 GB) | Unknown | ? |
| OS (Linux/WSL2) | Unknown | ? |"""
print(readiness_table)


| Requirement | Your System Specs | Meets Requirement? |
| --- | --- | --- |
| RAM (>= 16 GB) | Unknown | ? |
| Free Disk Space (>= 40 GB) | Unknown | ? |
| OS (Linux/WSL2) | Unknown | ? |
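Once the specs are known, the last column can be filled by comparing them with the thresholds in the table (16 GB RAM, 40 GB free disk, Linux or WSL2). The sketch below uses a made-up machine, and its OS check is a simple substring match, which is an assumption about how the OS string is written.

```python
def mark(ok):
    """Return a check or cross mark for a boolean result."""
    return "✅" if ok else "❌"


def readiness_rows(specs, min_ram_gb=16, min_disk_gb=40):
    """Build markdown rows comparing specs with the audit thresholds."""
    os_name = specs["os"].lower()
    os_ok = "linux" in os_name or "wsl" in os_name
    ram_ok = specs["ram_gb"] >= min_ram_gb
    disk_ok = specs["free_disk_gb"] >= min_disk_gb
    return [
        f"| RAM (>= {min_ram_gb} GB) | {specs['ram_gb']} GB | {mark(ram_ok)} |",
        f"| Free Disk Space (>= {min_disk_gb} GB) | {specs['free_disk_gb']} GB | {mark(disk_ok)} |",
        f"| OS (Linux/WSL2) | {specs['os']} | {mark(os_ok)} |",
    ]


# Hypothetical machine used only to exercise the function.
sample_specs = {"ram_gb": 8, "free_disk_gb": 120, "os": "Ubuntu 22.04 (Linux)"}
print("\n".join(readiness_rows(sample_specs)))
```

Each ❌ row becomes an entry in the upgrade summary; for the sample machine that is the RAM row only.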


In [12]:
llama_cpp_readiness = {
    "cpu_instruction_support": "Unknown (check for AVX2 or higher)",
    "tooling": [
        "Unknown (requires CMake installed)",
        "Unknown (requires GCC or Clang compilers)"
    ],
    "other_requirements": "Need ability to compile llama.cpp locally; GGUF support required",
}

llama_cpp_readiness


{'cpu_instruction_support': 'Unknown (check for AVX2 or higher)',
 'tooling': ['Unknown (requires CMake installed)',
  'Unknown (requires GCC or Clang compilers)'],
 'other_requirements': 'Need ability to compile llama.cpp locally; GGUF support required'}
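The `Unknown` entries for tooling and CPU support can be resolved with a short probe. This sketch looks for build tools on `PATH` and reads CPU feature flags from `/proc/cpuinfo`. The file exists only on Linux, and on non-x86 Linux it may not contain a `flags` line. In both cases the flag set comes back empty, which means "not detected" rather than "unsupported".

```python
import shutil
from pathlib import Path


def find_build_tools(names=("cmake", "make", "gcc", "clang")):
    """Map each tool name to True if an executable is found on PATH."""
    return {name: shutil.which(name) is not None for name in names}


def cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU feature flags on Linux, or an empty set elsewhere."""
    path = Path(cpuinfo_path)
    if not path.exists():
        return set()
    for line in path.read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


tools = find_build_tools()
flags = cpu_flags()
print(tools)
print({"avx2": "avx2" in flags, "sse4_2": "sse4_2" in flags})
```

Either GCC or Clang is enough to build, so a single `False` among the compilers is not a blocker by itself.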

In [13]:
upgrade_actions = [
    "Install required build tools (CMake + GCC/Clang) and confirm the CPU supports AVX2",
    "Free additional disk space or add external storage for model quantization and caching"
]

upgrade_actions


['Install required build tools (CMake + GCC/Clang) and confirm the CPU supports AVX2',
 'Free additional disk space or add external storage for model quantization and caching']

## Exercise 5: Benchmark-Based Model Explorer
**Use:** Open LLM Leaderboard; HellaSwag and MMLU scores; license types; use-case mapping.

**Step-by-step**
1) Pick three models from the leaderboard with different strengths (high HellaSwag, high MMLU, balanced).  
2) Record HellaSwag, MMLU, license, and ideal use case.  
3) Fill the comparison table.  
4) Optional: 1-2 sentence reflection on why benchmarks matter.  
5) Quiz-style reflection: write 3 short Q&A items about choosing models for different constraints.

**Deliverables:** model list + filled table + quiz-style reflection (+ optional reflection paragraph).

In [14]:
leaderboard_models = [
    {
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "url": "https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct",
        "hellaswag": "≈87%",
        "mmlu": "≈68%",
        "license": "Meta Llama 3 License",
        "ideal_use_case": "Strong commonsense reasoning, legal/logic chatbots, multi-step QA"
    },
    {
        "model": "Qwen/Qwen2-7B-Instruct",
        "url": "https://huggingface.co/Qwen/Qwen2-7B-Instruct",
        "hellaswag": "≈85%",
        "mmlu": "≈78%",
        "license": "Apache 2.0",
        "ideal_use_case": "High-level academic reasoning, tutoring, analytic tasks"
    },
    {
        "model": "mistralai/Mistral-7B-Instruct-v0.3",
        "url": "https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3",
        "hellaswag": "≈84%",
        "mmlu": "≈70%",
        "license": "Apache 2.0",
        "ideal_use_case": "Generalist assistant, multilingual tasks, everyday instruction following"
    },
]

leaderboard_models


[{'model': 'meta-llama/Meta-Llama-3-8B-Instruct',
  'url': 'https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct',
  'hellaswag': '≈87%',
  'mmlu': '≈68%',
  'license': 'Meta Llama 3 License',
  'ideal_use_case': 'Strong commonsense reasoning, legal/logic chatbots, multi-step QA'},
 {'model': 'Qwen/Qwen2-7B-Instruct',
  'url': 'https://huggingface.co/Qwen/Qwen2-7B-Instruct',
  'hellaswag': '≈85%',
  'mmlu': '≈78%',
  'license': 'Apache 2.0',
  'ideal_use_case': 'High-level academic reasoning, tutoring, analytic tasks'},
 {'model': 'mistralai/Mistral-7B-Instruct-v0.3',
  'url': 'https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3',
  'hellaswag': '≈84%',
  'mmlu': '≈70%',
  'license': 'Apache 2.0',
  'ideal_use_case': 'Generalist assistant, multilingual tasks, everyday instruction following'}]

In [15]:
benchmark_table = """| Model Name | HellaSwag Score | MMLU Score | License Type | Ideal Use Case |
| --- | --- | --- | --- | --- |
| Meta-Llama-3-8B-Instruct | ≈87% | ≈68% | Meta Llama 3 License | Strong commonsense reasoning, legal/logic chatbots |
| Qwen2-7B-Instruct | ≈85% | ≈78% | Apache 2.0 | High-level academic reasoning, tutoring, analytic tasks |
| Mistral-7B-Instruct-v0.3 | ≈84% | ≈70% | Apache 2.0 | General-purpose assistant, multilingual instruction tasks |
"""
print(benchmark_table)


| Model Name | HellaSwag Score | MMLU Score | License Type | Ideal Use Case |
| --- | --- | --- | --- | --- |
| Meta-Llama-3-8B-Instruct | ≈87% | ≈68% | Meta Llama 3 License | Strong commonsense reasoning, legal/logic chatbots |
| Qwen2-7B-Instruct | ≈85% | ≈78% | Apache 2.0 | High-level academic reasoning, tutoring, analytic tasks |
| Mistral-7B-Instruct-v0.3 | ≈84% | ≈70% | Apache 2.0 | General-purpose assistant, multilingual instruction tasks |
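With scores stored as strings like `≈87%`, a small helper can turn them into numbers and pick the leader for each benchmark. The models and scores in this sketch are placeholders for illustration only, not leaderboard results.

```python
def score(text):
    """Convert a string like '≈87%' to the float 87.0."""
    return float(text.strip("≈~% "))


def best_by(models, metric):
    """Return the name of the model with the highest value for a metric."""
    return max(models, key=lambda m: score(m[metric]))["model"]


# Placeholder models and scores, not real leaderboard data.
sample_models = [
    {"model": "model-a", "hellaswag": "≈80%", "mmlu": "≈60%"},
    {"model": "model-b", "hellaswag": "≈75%", "mmlu": "≈70%"},
]
print(best_by(sample_models, "hellaswag"), best_by(sample_models, "mmlu"))
```

Different benchmarks can crown different models, which is why the use-case column should follow the benchmark that matches the task.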



### Optional reflection
1-2 sentences on why benchmarks, not hype, should guide model choice.

In [16]:
quiz_reflection = [
    {
        "question": "Which benchmark should you check when choosing a model for math tutoring?",
        "answer": "GSM8K, because it measures mathematical reasoning and step-by-step problem solving."
    },
    {
        "question": "What benchmark is most useful when selecting a model for logic-heavy legal chatbots?",
        "answer": "HellaSwag or BoolQ, since they evaluate commonsense reasoning and logical inference."
    },
    {
        "question": "What’s the first thing you check when selecting a model for a multilingual NGO project?",
        "answer": "Whether the model scores well on FLORES-200 or supports many languages natively."
    },
]

quiz_reflection


[{'question': 'Which benchmark should you check when choosing a model for math tutoring?',
  'answer': 'GSM8K, because it measures mathematical reasoning and step-by-step problem solving.'},
 {'question': 'What benchmark is most useful when selecting a model for logic-heavy legal chatbots?',
  'answer': 'HellaSwag or BoolQ, since they evaluate commonsense reasoning and logical inference.'},
 {'question': 'What’s the first thing you check when selecting a model for a multilingual NGO project?',
  'answer': 'Whether the model scores well on FLORES-200 or supports many languages natively.'}]

## Exercise 6: Cloud vs. Local Deployment Plan
**Use:** local vs. cloud cost/latency/scalability/security/maintenance; optional Colab timing.

**Step-by-step**
1) Draft 5 bullets pairing local vs. cloud pros/cons.  
2) Optional: run a 7B model on Colab; note model and response time.  
3) Summarize Colab observation if you ran it.

**Deliverables:** 5 bullets; optional Colab report (model + time).

In [17]:
pros_and_cons = [
    "Local deployment avoids network round trips and keeps data on your own machines, but requires capable hardware and manual maintenance.",
    "Cloud deployment scales easily for many users, but introduces ongoing compute costs.",
    "Local inference avoids vendor lock-in, while cloud solutions rely heavily on provider availability and pricing.",
    "Cloud GPUs can serve large models at full precision, whereas local machines often need heavy quantization to fit them in memory.",
    "Local setups give full offline control, while cloud deployments provide easier updates and monitoring."
]

pros_and_cons


['Local deployment avoids network round trips and keeps data on your own machines, but requires capable hardware and manual maintenance.',
 'Cloud deployment scales easily for many users, but introduces ongoing compute costs.',
 'Local inference avoids vendor lock-in, while cloud solutions rely heavily on provider availability and pricing.',
 'Cloud GPUs can serve large models at full precision, whereas local machines often need heavy quantization to fit them in memory.',
 'Local setups give full offline control, while cloud deployments provide easier updates and monitoring.']
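The cost side of the comparison can be reduced to a break-even estimate: the number of months after which buying hardware costs less than renting. All figures in the sketch are made-up placeholders, and the model ignores depreciation, staff time, and usage growth.

```python
def break_even_months(hardware_cost, local_monthly, cloud_monthly):
    """Months until buying hardware becomes cheaper than renting; None if never."""
    monthly_saving = cloud_monthly - local_monthly
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# Placeholder figures; substitute your own hardware quote and cloud bill.
print(break_even_months(hardware_cost=2400, local_monthly=40, cloud_monthly=240))
```

With these placeholder numbers the hardware pays for itself after 12 months. A result of `None` means the cloud option is never more expensive per month, so the purchase never breaks even.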

In [18]:
colab_run = {
    "model_tested": "Unknown (fill after running in Colab)",
    "response_time_seconds": "Unknown",
    "notes": "Run a 7B GGUF or HF model and record the first-token latency or full response time.",
}

colab_run


{'model_tested': 'Unknown (fill after running in Colab)',
 'response_time_seconds': 'Unknown',
 'notes': 'Run a 7B GGUF or HF model and record the first-token latency or full response time.'}
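To fill in `response_time_seconds`, both first-token latency and full response time can be measured around any streaming generator. The sketch below times a stand-in generator. `fake_model` is hypothetical and would be replaced by whatever token stream your chosen runtime provides.

```python
import time


def time_stream(token_iter):
    """Consume a token iterator; return (first_token_s, total_s, n_tokens)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_iter:
        if first is None:
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first, total, count


def fake_model(n_tokens=5, delay_s=0.01):
    """Stand-in generator that mimics a streaming model (hypothetical)."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"


first_s, total_s, n = time_stream(fake_model())
print(f"first token: {first_s:.3f}s, total: {total_s:.3f}s, tokens: {n}")
```

Dividing `n` by `total_s` gives tokens per second, which is easier to compare across models than raw response time.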