In [13]:
!jupyter nbextension disable --py widgetsnbextension

Disabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK


**This notebook demonstrates how to load an instruction-tuned model (such as `Phi-3-mini` or `Falcon-7B-Instruct`) with the Hugging Face `transformers` library, create a text-generation pipeline, and experiment with prompts to generate human-like responses. The cells below use `Phi-3-mini`.**


In [8]:
"""Install required libraries
We install Hugging Face's `transformers` (for pretrained NLP models)
and `accelerate` (required for `device_map` and efficient model loading on CPU/GPU).
The version specifiers are quoted so the shell does not treat `>=` as output redirection."""
!pip install "transformers>=4.40.1" "accelerate>=0.27.2"

In [9]:
# Import the required classes from transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

"""Load the model and tokenizer
    - AutoModelForCausalLM: loads a pre-trained model for Causal Language Modeling (text generation).
    - AutoTokenizer: loads the matching tokenizer (converts text to tokens and back).
    - Here we are using Microsoft's "microsoft/Phi-3-mini-4k-instruct", a lightweight instruction-tuned model.
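    - device_map="cuda": places the model on the GPU; loading fails if no CUDA device is available.
    - torch_dtype="auto": loads the weights in the dtype recorded in the checkpoint's config (e.g., float16 or bfloat16) instead of defaulting to float32.
    - trust_remote_code=False: does not execute custom model code from the Hub repository (safer); the implementation built into transformers is used instead."""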

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
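
Because `device_map="cuda"` assumes a GPU is present, it can help to decide the value at runtime. The following is a minimal sketch, not part of the original notebook: `choose_device_map` is a hypothetical helper, and the fallback to `"cpu"` when PyTorch is missing is an assumption made only so the snippet runs anywhere.

```python
def choose_device_map(cuda_available: bool) -> str:
    """Return "cuda" when a GPU is usable, otherwise fall back to "cpu"."""
    return "cuda" if cuda_available else "cpu"


try:
    import torch  # only used here to probe for a CUDA device

    cuda_available = torch.cuda.is_available()
except ImportError:
    # Assumption: without PyTorch installed, treat the machine as CPU-only.
    cuda_available = False

device_map = choose_device_map(cuda_available)
print(f"device_map={device_map!r}")
```

The resulting string could then be passed as `device_map=device_map` to `AutoModelForCausalLM.from_pretrained`. Expect generation on CPU to be much slower for a model of this size.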

In [10]:
# Import Hugging Face's high-level pipeline
from transformers import pipeline

"""Create a text generation pipeline
  Why use pipeline?
    - It abstracts away low-level details (tokenization, model forwarding, decoding).
    - Easy to use: you just provide the model, tokenizer, and parameters.
  Parameters explained:
    - task="text-generation": we want the model to generate text.
    - return_full_text=False: ensures we only get the generated continuation, not the original prompt.
    - max_new_tokens=500: caps the response at 500 newly generated tokens (the prompt is not counted).
    - do_sample=False: uses greedy decoding (always picks the most likely next token), so the output is deterministic, which is good for reproducibility."""

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)

Device set to use cuda
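
To make the effect of `do_sample` concrete, here is a toy illustration in plain Python. It does not call the model: the probability table is invented for the example, and `greedy_pick` and `sampled_pick` are hypothetical helpers that only mimic the two decoding modes for a single step.

```python
import random

# Hypothetical next-token probabilities, made up for illustration only.
next_token_probs = {"energy": 0.6, "power": 0.3, "fuel": 0.1}


def greedy_pick(probs):
    """Mimic do_sample=False: always take the most likely token."""
    return max(probs, key=probs.get)


def sampled_pick(probs, rng):
    """Mimic do_sample=True: draw a token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the sampled run is repeatable
greedy_runs = {greedy_pick(next_token_probs) for _ in range(5)}
sampled_runs = [sampled_pick(next_token_probs, rng) for _ in range(5)]

print(greedy_runs)   # always the single most likely token
print(sampled_runs)  # may mix tokens, depending on the random draws
```

Greedy decoding gives the same choice on every run, which is why the pipeline above produces reproducible output. Sampling trades that reproducibility for variety.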


In [11]:
# Define the prompt (user query)
messages = [
    {"role": "user", "content": "Summarize the importance of renewable energy in 3 bullet points."}
]
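
Before generation, a list of role/content messages like this is flattened into a single prompt string using the tokenizer's chat template. The sketch below approximates the Phi-3 format by hand to show the idea. `format_phi3_style` is a hypothetical helper, and the exact special tokens are an assumption. The authoritative version comes from `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.

```python
def format_phi3_style(messages):
    """Approximate a Phi-3-style chat prompt (the tokenizer's template is authoritative)."""
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    parts.append("<|assistant|>\n")  # generation prompt: cues the model to answer
    return "".join(parts)


messages = [
    {"role": "user", "content": "Summarize the importance of renewable energy in 3 bullet points."}
]

prompt = format_phi3_style(messages)
print(prompt)
```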



In [12]:
"""Generate output from the model
  - We pass the messages to the pipeline.
  - The pipeline applies the tokenizer's chat template, then handles tokenization, model inference, and decoding.
  - The response is a list of dictionaries (one per generated sequence); the "generated_text" key holds the model's reply."""
response = generator(messages)
print(response[0]["generated_text"])

 - Renewable energy sources, such as solar, wind, and hydro, are essential for reducing greenhouse gas emissions and combating climate change.
- They provide a sustainable and inexhaustible supply of energy, reducing dependence on finite fossil fuels and enhancing energy security.
- Investing in renewable energy technologies creates jobs, stimulates economic growth, and promotes innovation in the energy sector.
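
The printed output starts with a stray leading space, so a small wrapper around the indexing can tidy it and guard against an empty result. This is a minimal sketch: `first_generated_text` is a hypothetical helper, and `mock_response` is a stand-in with the same shape as the pipeline's return value, so the snippet runs without loading the model.

```python
def first_generated_text(response):
    """Return the stripped text of the first candidate, or "" if there is none."""
    if not response:
        return ""
    return response[0].get("generated_text", "").strip()


# Stand-in for a pipeline result; a real call would be generator(messages).
mock_response = [{"generated_text": " - Renewable energy reduces emissions."}]

print(first_generated_text(mock_response))
```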
