# Open-Source vs Closed-Source LLMs



**Learning goals**

- Understand open-source vs. closed-source LLMs
- Load and run:
    - open-source models via Hugging Face
    - open-source models via Ollama (OpenAI-compatible API)
    - closed-source models via OpenRouter
- Inspect model properties
- Send simple prompts and compare outputs
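"OpenAI-compatible" means a server accepts the same chat-completions request shape as OpenAI's API, so one client can talk to Ollama, OpenRouter, and others by changing only the base URL, key, and model ID. The sketch below uses only the standard library to build (but not send) such a request. The Ollama address `http://localhost:11434/v1` is Ollama's usual default, while the model names (`llama3.2`) and the placeholder keys are assumptions for illustration.

```python
import json
import urllib.request

# Assumed base URLs: a local Ollama server on its default port, and OpenRouter's hosted API.
OLLAMA_BASE_URL = "http://localhost:11434/v1"
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (without sending) an OpenAI-style chat-completions POST request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same payload shape for both servers; keys and model IDs here are placeholders.
local_req = build_chat_request(OLLAMA_BASE_URL, "ollama", "llama3.2", "Hello!")
remote_req = build_chat_request(OPENROUTER_BASE_URL, "sk-or-placeholder", "openai/gpt-oss-20b:free", "Hello!")
print(local_req.full_url)   # http://localhost:11434/v1/chat/completions
print(remote_req.full_url)  # https://openrouter.ai/api/v1/chat/completions
```

Passing the request to `urllib.request.urlopen` would actually send it; the `openai` client used later in this notebook does the equivalent work for you.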

In [None]:
!pip install -q transformers accelerate torch openai

## Step 1: Open-Source Models with Hugging Face


**Two ways to use a model:**

- Pipelines: https://huggingface.co/docs/transformers/en/main_classes/pipelines
- Auto Classes: https://huggingface.co/docs/transformers/model_doc/auto
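Both routes share the same underlying steps. A pipeline bundles them: it preprocesses raw input (for text, tokenization), runs the model's forward pass, and postprocesses the raw outputs into labels and scores. With Auto Classes you load the tokenizer and model yourself and run those steps in your own code. The toy sketch below only mirrors that three-stage structure; `ToySentimentPipeline` and its keyword-counting "model" are hypothetical stand-ins so it runs without downloading anything.

```python
class ToySentimentPipeline:
    """Hypothetical pipeline: preprocess -> forward -> postprocess."""

    POSITIVE = {"good", "great", "love"}
    NEGATIVE = {"bad", "awful", "hate"}

    def preprocess(self, text: str) -> list[str]:
        # "Tokenizer": lowercase and split on whitespace.
        return text.lower().split()

    def forward(self, tokens: list[str]) -> dict[str, int]:
        # "Model": count positive and negative keywords (stands in for a neural network).
        return {
            "POSITIVE": sum(t in self.POSITIVE for t in tokens),
            "NEGATIVE": sum(t in self.NEGATIVE for t in tokens),
        }

    def postprocess(self, counts: dict[str, int]) -> list[dict]:
        # Turn raw counts into scores (add-one smoothing) sorted from best to worst.
        total = sum(counts.values()) + len(counts)
        scored = [{"label": k, "score": (v + 1) / total} for k, v in counts.items()]
        return sorted(scored, key=lambda d: d["score"], reverse=True)

    def __call__(self, text: str) -> list[dict]:
        return self.postprocess(self.forward(self.preprocess(text)))


toy_pipe = ToySentimentPipeline()
print(toy_pipe("I love this great movie"))
# [{'label': 'POSITIVE', 'score': 0.75}, {'label': 'NEGATIVE', 'score': 0.25}]
```

The output format (a list of `label`/`score` dicts) imitates what the real pipelines in this notebook return.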

### pipeline

In [None]:
# pipeline
#             A high-level way to run models for inference
#             Tasks: Audio      (classification, speech recognition)
#                    Vision     (classification, object detection, segmentation, depth estimation)
#                    NLP        (classification, Q&A, summarization, translation, language modeling / word prediction)
#                    Multimodal (document Q&A)
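Each task above is selected by passing a task string to `pipeline(...)`. As a quick reference, the sketch below groups a hand-picked subset of real task identifiers by the modalities in the comment; the dictionary and the helper `modality_of` are hypothetical conveniences, not part of the `transformers` API, and the list is deliberately incomplete.

```python
# Hypothetical lookup table: a partial list of pipeline() task strings grouped by modality.
TASKS_BY_MODALITY = {
    "audio": ["audio-classification", "automatic-speech-recognition"],
    "vision": ["image-classification", "object-detection", "image-segmentation", "depth-estimation"],
    "nlp": [
        "text-classification", "question-answering", "summarization",
        "translation", "text-generation", "fill-mask",
    ],
    "multimodal": ["document-question-answering"],
}


def modality_of(task: str) -> str:
    """Return the modality a task string belongs to in this partial table."""
    for modality, tasks in TASKS_BY_MODALITY.items():
        if task in tasks:
            return modality
    raise KeyError(f"{task!r} is not in this (partial) task list")


print(modality_of("question-answering"))  # nlp
print(modality_of("depth-estimation"))    # vision
```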



In [None]:
from transformers import pipeline

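# Audio classification
# No model is specified, so the pipeline falls back to a default checkpoint;
# its labels are spoken keywords such as "up" and "left".
pipe   = pipeline("audio-classification")
result = pipe("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")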

In [None]:
result

[{'score': 0.4868629574775696, 'label': 'up'},
 {'score': 0.19627460837364197, 'label': '_unknown_'},
 {'score': 0.19116656482219696, 'label': 'left'},
 {'score': 0.04208880290389061, 'label': '_silence_'},
 {'score': 0.0303605105727911, 'label': 'on'},
 {'score': 0.01467297226190567, 'label': 'go'},
 {'score': 0.013018293306231499, 'label': 'right'},
 {'score': 0.011818612925708294, 'label': 'off'},
 {'score': 0.005862788762897253, 'label': 'stop'},
 {'score': 0.005178585182875395, 'label': 'no'},
 {'score': 0.0020859132055193186, 'label': 'down'},
 {'score': 0.0006093769334256649, 'label': 'yes'}]
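The result is a plain list of dictionaries sorted by score, so ordinary Python is enough to post-process it. The sketch below copies the scores from the run above and picks the top label, keeps labels above a threshold (the `0.1` cutoff is an arbitrary choice for illustration), and checks that the scores add up to about 1, which is what you would expect if they are softmax probabilities over all 12 labels that were returned.

```python
# Scores copied verbatim from the audio-classification output above.
audio_result = [
    {"score": 0.4868629574775696, "label": "up"},
    {"score": 0.19627460837364197, "label": "_unknown_"},
    {"score": 0.19116656482219696, "label": "left"},
    {"score": 0.04208880290389061, "label": "_silence_"},
    {"score": 0.0303605105727911, "label": "on"},
    {"score": 0.01467297226190567, "label": "go"},
    {"score": 0.013018293306231499, "label": "right"},
    {"score": 0.011818612925708294, "label": "off"},
    {"score": 0.005862788762897253, "label": "stop"},
    {"score": 0.005178585182875395, "label": "no"},
    {"score": 0.0020859132055193186, "label": "down"},
    {"score": 0.0006093769334256649, "label": "yes"},
]

# Highest-scoring label.
top = max(audio_result, key=lambda r: r["score"])
print(top["label"])  # up

# Labels above an (arbitrary) confidence threshold.
confident = [r["label"] for r in audio_result if r["score"] >= 0.1]
print(confident)  # ['up', '_unknown_', 'left']

# All labels were returned, so the probabilities sum to roughly 1.
total = sum(r["score"] for r in audio_result)
print(round(total, 3))  # 1.0
```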

In [None]:
# vision
pipe   = pipeline("image-classification")
result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
result


Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
Device set to use cpu


[{'label': 'lynx, catamount', 'score': 0.43349990248680115},
 {'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
  'score': 0.03479622304439545},
 {'label': 'snow leopard, ounce, Panthera uncia',
  'score': 0.032401926815509796},
 {'label': 'Egyptian cat', 'score': 0.023944783955812454},
 {'label': 'tiger cat', 'score': 0.02288925088942051}]
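Unlike the audio example, these five scores sum to well below 1. That is consistent with the image classifier scoring many more classes (the ViT model in the log above is an ImageNet classifier) while the pipeline returns only the top 5. The sketch below shows the usual recipe: softmax over all logits, then keep the top `k`. The six labels and logits are hypothetical toy values, not outputs of the real model.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def top_k(labels: list[str], logits: list[float], k: int = 5) -> list[dict]:
    """Softmax over ALL classes, then return only the k most probable."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return [{"label": label, "score": p} for label, p in ranked[:k]]


# Hypothetical logits for a toy 6-class model.
labels = ["lynx", "cougar", "snow leopard", "Egyptian cat", "tiger cat", "tabby"]
logits = [4.0, 1.5, 1.4, 1.0, 0.9, 0.5]

preds = top_k(labels, logits, k=5)
for p in preds:
    print(p["label"], round(p["score"], 3))

# Less than 1: the dropped class still holds some probability mass.
print(sum(p["score"] for p in preds) < 1)  # True
```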

In [None]:
# NLP and Q&A
pipe = pipeline("question-answering")
result = pipe(question="Where do I work?", context="My name is Sylvain and I work at Hugging Face in Brooklyn")
print(result)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'score': 0.6949766278266907, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
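In this question-answering output, `start` and `end` are character offsets into the `context` string (with `end` exclusive), so slicing the context recovers the answer. The sketch below reuses the values printed above; highlighting the span with brackets is just one illustrative use.

```python
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"
qa_result = {"score": 0.6949766278266907, "start": 33, "end": 45, "answer": "Hugging Face"}

# start/end index into the original context string.
start, end = qa_result["start"], qa_result["end"]
span = context[start:end]
print(span)  # Hugging Face

# Mark the extracted answer inside the context.
highlighted = context[:start] + "[" + span + "]" + context[end:]
print(highlighted)  # My name is Sylvain and I work at [Hugging Face] in Brooklyn
```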


### Auto Classes


In [None]:
# Auto Classes

#     Pass a model name (Hub ID) or local path >> load the model and its components
#     Examples: AutoConfig, AutoTokenizer, AutoModel, ...
#     AutoModel is the generic class (base model without a task-specific head)
#     For NLP:    AutoModelForCausalLM, AutoModelForMaskedLM, AutoModelForSequenceClassification, ...
#     For Vision: AutoModelForImageClassification, AutoModelForDepthEstimation, ...
#     Also available for audio, multimodal, time series, etc.
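The "Auto" part works by dispatch: a checkpoint's config declares a `model_type` string, and the Auto class maps that string to a concrete architecture class. The sketch below is a hypothetical, heavily simplified registry in that spirit; `ToyAutoModelForCausalLM`, the stub classes, and `CAUSAL_LM_MAPPING` are illustrative names, not the library's internals.

```python
class GemmaForCausalLMStub:
    """Stand-in for a concrete architecture class."""


class LlamaForCausalLMStub:
    """Stand-in for another concrete architecture class."""


# Hypothetical mapping from a config's model_type to an architecture class.
CAUSAL_LM_MAPPING = {
    "gemma": GemmaForCausalLMStub,
    "llama": LlamaForCausalLMStub,
}


class ToyAutoModelForCausalLM:
    @classmethod
    def from_config(cls, config: dict):
        model_type = config["model_type"]
        try:
            model_cls = CAUSAL_LM_MAPPING[model_type]
        except KeyError:
            raise ValueError(f"Unrecognized model_type {model_type!r}") from None
        return model_cls()


toy_model = ToyAutoModelForCausalLM.from_config({"model_type": "gemma"})
print(type(toy_model).__name__)  # GemmaForCausalLMStub
```

This is why the same `AutoModelForCausalLM.from_pretrained(...)` call can load very different architectures: the checkpoint's config decides which class is built.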

In [None]:
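import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# google/gemma-2b is a gated model: accept its license on the Hub and
# authenticate (e.g. `huggingface-cli login`) before downloading it.
# It is a base (not instruction-tuned) model, so it continues text rather than chatting.
model_name = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision: half the memory of float32, best on a GPU
    device_map="auto",          # use available GPU(s), else CPU (requires `accelerate`)
)

print(f"Model name: {model_name}")
print(f"Num. parameters: {model.num_parameters():,}")
print("Max context length:", model.config.max_position_embeddings)

# Run a prompt
prompt = "Hi there, what is the capital of France?"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,   # sample instead of greedy decoding
        temperature=0.1,  # low temperature: close to deterministic output
    )

# generate() returns prompt + completion; drop the prompt tokens before decoding
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
decoded = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(decoded)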
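`temperature` rescales the model's next-token logits before sampling: conceptually the logits are divided by the temperature and then passed through a softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The sketch below shows that effect on three hypothetical logits; it is a simplified model of temperature sampling and ignores the other filters (such as top-k/top-p) that `generate()` may also apply.

```python
import math
import random


def temperature_probs(logits: list[float], temperature: float) -> list[float]:
    """Softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits for a 3-token vocabulary
for t in (1.0, 0.1):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])
# 1.0 [0.629, 0.231, 0.14]
# 0.1 [1.0, 0.0, 0.0]

rng = random.Random(0)
picks = [sample_token(logits, 0.1, rng) for _ in range(20)]
print(picks)  # at T=0.1 almost every draw is token 0
```

This is why `temperature=0.1` with `do_sample=True` behaves almost like greedy decoding.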


## Step 2: Models via OpenRouter (OpenAI-compatible API)

In [None]:
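import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API, so the `openai` client works
# once base_url points at it. The key is read from an environment variable
# (OPENROUTER_API_KEY is an assumed name; use whatever you set).
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Note: gpt-oss-20b is an open-weight model; to query a closed-source model,
# swap in its OpenRouter model ID (e.g. "openai/gpt-4o").
response = client.chat.completions.create(
    model="openai/gpt-oss-20b:free",
    messages=[
        {
            "role": "user",
            "content": "How many r's are in the word 'strawberry'?",
        }
    ],
)

# Extract the text of the assistant's reply
result = response.choices[0].message.content
print(result)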
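To compare outputs, it helps to send the same prompt to several models and check each reply against a known answer. For this prompt the ground truth is easy to compute locally. The sketch below is a minimal, hypothetical helper: `ask(model, prompt)` is any function that returns a reply string (in this notebook it could wrap `client.chat.completions.create`), and `fake_ask` with its made-up replies only exists so the sketch runs offline. The regex check is deliberately crude.

```python
import re
from typing import Callable


def compare_models(ask: Callable[[str, str], str], models: list[str], prompt: str) -> dict[str, str]:
    """Send the same prompt to each model via ask(model, prompt) and collect the replies."""
    return {model: ask(model, prompt) for model in models}


def states_count(reply: str, expected: int) -> bool:
    """Crude check: does the reply contain the expected number as a standalone token?"""
    return re.search(rf"\b{expected}\b", reply) is not None


expected = "strawberry".count("r")
print(expected)  # 3


# Offline stub with made-up replies. With the real client, something like:
# ask = lambda m, p: client.chat.completions.create(
#     model=m, messages=[{"role": "user", "content": p}]).choices[0].message.content
def fake_ask(model: str, prompt: str) -> str:
    return {"model-a": "There are 3 r's.", "model-b": "There are 2 r's."}[model]


replies = compare_models(fake_ask, ["model-a", "model-b"], "How many r's are in the word 'strawberry'?")
for model, reply in replies.items():
    print(model, states_count(reply, expected))
# model-a True
# model-b False
```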