In [None]:
!jupyter nbextension disable --py widgetsnbextension

Disabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK


**This notebook shows how to load an instruction-tuned model (such as `Phi-3-mini` or `Falcon-7B-Instruct`) with the Hugging Face `transformers` library, build a text-generation pipeline, and experiment with prompts to generate responses.**


In [None]:
# Install the required libraries:
#   - `transformers`: Hugging Face's library of pretrained NLP models.
#   - `accelerate`: efficient model loading and device placement on CPU/GPU.
# The version specifiers are quoted so the shell does not treat `>=` as an output redirect.
!pip install "transformers>=4.40.1" "accelerate>=0.27.2"

In [None]:
# Import the required classes from transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer.
#   - AutoModelForCausalLM: loads a pretrained model for causal language modeling (text generation).
#   - AutoTokenizer: loads the matching tokenizer (converts text to tokens and back).
#   - "microsoft/Phi-3-mini-4k-instruct": a lightweight instruction-tuned model from Microsoft.
#   - device_map="cuda": places the model on the GPU; loading fails if no CUDA device is available.
#   - torch_dtype="auto": uses the precision recorded with the checkpoint (e.g., float16/bfloat16)
#     instead of defaulting to float32. Newer transformers releases rename this argument to `dtype`.
#   - trust_remote_code=False: refuses to run custom model code from the Hub, for security reasons.

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!
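
The load cell above hard-codes `device_map="cuda"`, which only works on a machine with a CUDA GPU. A minimal sketch of a fallback follows; the helper name `pick_device_map` is hypothetical and not part of `transformers`.

```python
def pick_device_map():
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device_map = pick_device_map()
print(f"Loading on: {device_map}")
# Assumed usage: pass device_map=device_map to AutoModelForCausalLM.from_pretrained
```

On CPU the model still loads, but generating up to 500 tokens with a model of this size will be slow.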



In [None]:
# Import Hugging Face's high-level pipeline
from transformers import pipeline

# Create a text-generation pipeline.
#   Why use a pipeline?
#     - It abstracts away low-level details (tokenization, the model forward pass, decoding).
#     - It is easy to use: you provide the model, tokenizer, and generation parameters.
#   Parameters:
#     - "text-generation" (the task): the model generates text.
#     - return_full_text=False: returns only the generated continuation, not the original prompt.
#     - max_new_tokens=500: caps the output at 500 newly generated tokens.
#     - do_sample=False: uses greedy decoding (always the most likely next token),
#       so output is reproducible for a given prompt.

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)
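
To illustrate what `do_sample=False` changes, here is a toy sketch that contrasts greedy selection with sampling over a single next-token distribution. The tokens and probabilities are made up for illustration, and the real pipeline works on model logits over the full vocabulary, not on a small dictionary like this.

```python
import random

# Toy next-token distribution; tokens and probabilities are invented.
next_token_probs = {"solar": 0.5, "wind": 0.3, "hydro": 0.2}


def greedy_pick(probs):
    """do_sample=False analogue: always take the most probable token."""
    return max(probs, key=probs.get)


def sampled_pick(probs, rng):
    """do_sample=True analogue: draw a token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so the sketch is repeatable
greedy_runs = {greedy_pick(next_token_probs) for _ in range(5)}
sampled_runs = [sampled_pick(next_token_probs, rng) for _ in range(5)]

print("greedy :", greedy_runs)   # a single distinct token across all runs
print("sampled:", sampled_runs)  # may vary from draw to draw
```

Greedy selection returns the same token every time, which is why the notebook's setting is convenient for reproducible demos. Sampling trades that for variety.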

In [None]:
# Define the prompt (user query)
messages = [
    {"role": "user", "content": "Summarize the importance of renewable energy in 3 bullet points."}
]
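
Before generation, a list of role/content messages like the one above is flattened into a single prompt string by the tokenizer's chat template. The sketch below approximates that step by hand. The special tokens (`<|user|>`, `<|end|>`, `<|assistant|>`) are an assumption about the Phi-3 format, and `render_phi3_style_prompt` is a hypothetical helper. The authoritative template ships with the tokenizer and can be applied with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.

```python
def render_phi3_style_prompt(messages, add_generation_prompt=True):
    """Approximate how a Phi-3-style chat template flattens chat messages."""
    parts = []
    for message in messages:
        # Assumed layout: role tag, newline, content, end-of-turn marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Cue the model to start writing the assistant's turn.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Summarize the importance of renewable energy in 3 bullet points."}
]
prompt = render_phi3_style_prompt(messages)
print(prompt)
```

Comparing this string with the output of `apply_chat_template` is a quick way to check the assumed format against the real one.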



In [None]:
# Generate output from the model.
#   - We pass the messages to the pipeline.
#   - The pipeline handles chat formatting, tokenization, model inference, and decoding internally.
#   - The response is a list of dictionaries; the "generated_text" key holds the model's reply.
response = generator(messages)
print(response[0]["generated_text"])
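
The shape of `generated_text` depends on the pipeline settings. With `return_full_text=False`, as in this notebook, it is a plain string. With chat-style input and `return_full_text=True`, it is, as far as I know, the full message list with the assistant's reply last; treat that second case as an assumption to verify against your installed version. A small sketch of a helper that handles both shapes, using mock responses in place of real model output:

```python
def extract_reply(response):
    """Return the assistant's text from a text-generation pipeline response."""
    generated = response[0]["generated_text"]
    if isinstance(generated, str):
        # return_full_text=False: the continuation is already a string.
        return generated
    # Assumed chat-history shape: a list of messages, assistant reply last.
    return generated[-1]["content"]


# Mock responses standing in for real pipeline output.
mock_continuation_only = [{"generated_text": "- Cuts emissions"}]
mock_full_chat = [{"generated_text": [
    {"role": "user", "content": "Summarize..."},
    {"role": "assistant", "content": "- Cuts emissions"},
]}]

print(extract_reply(mock_continuation_only))
print(extract_reply(mock_full_chat))
```

Both calls print the same reply text, so downstream code does not need to care which setting produced the response.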