In [1]:
pip install torch transformers peft accelerate bitsandbytes huggingface_hub


Collecting bitsandbytes
  Downloading bitsandbytes-0.49.2-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Downloading bitsandbytes-0.49.2-py3-none-manylinux_2_24_x86_64.whl (60.7 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.49.2


In [2]:
pip install unsloth

Collecting unsloth
  Downloading unsloth-2026.2.1-py3-none-any.whl.metadata (69 kB)
Collecting unsloth_zoo>=2026.2.1 (from unsloth)
  Downloading unsloth_zoo-2026.2.1-py3-none-any.whl.metadata (32 kB)
Collecting tyro (from unsloth)
  Downloading tyro-1.0.6-py3-none-any.whl.metadata (12 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.34-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.2 kB)
Collecting datasets!=4.0.*,!=4.1.0,<4.4.0,>=3.4.1 (from unsloth)
  Downloading datasets-4.3.0-py3-none-any.whl.metadata (18 kB)
Collecting hf_transfer (from unsloth)
  Downloading hf_transfer-0.1.9-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.7 kB)
Collecting transformers!=4.52.0,!=4.52.1,!=4.52.2,!=4.52.3,!=4.
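
The install log above is long and cut off, so it can be useful to confirm what actually ended up in the environment before importing anything heavy. A minimal sketch using only the standard library; the package list mirrors the two install cells, and `installed_versions` is a hypothetical helper name, not part of any of these libraries:

```python
from importlib import metadata

# Distributions named in the two install cells above.
PACKAGES = ["torch", "transformers", "peft", "accelerate",
            "bitsandbytes", "huggingface_hub", "unsloth"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:16s} {version or 'NOT INSTALLED'}")
```

Running this in a fresh cell after the installs gives a compact version table to compare against the versions reported in the Unsloth banner.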

In [4]:
import sys
import logging
from typing import Optional

import torch
from unsloth import FastLanguageModel
from transformers import TextStreamer


# ==============================
# CONFIGURATION
# ==============================

BASE_MODEL_ID = "unsloth/Llama-3.2-3B-Instruct"
ADAPTER_MODEL_ID = "Krishm/industry5-llama3-qlora"

MAX_NEW_TOKENS = 256
TEMPERATURE = 0.7
TOP_P = 0.9


# ==============================
# LOGGING
# ==============================

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)


def check_cuda() -> bool:
    """Return True if a CUDA GPU is available, logging a warning otherwise."""
    available = torch.cuda.is_available()
    if not available:
        logger.warning("CUDA not detected. Unsloth works best with GPU.")
    return available


def load_model():
    """Load the 4-bit base model and attach the fine-tuned LoRA adapter."""
    try:
        logger.info("Loading base model using Unsloth (4-bit optimized)...")

        model, tokenizer = FastLanguageModel.from_pretrained(
            model_name=BASE_MODEL_ID,
            max_seq_length=2048,
            dtype=None,
            load_in_4bit=True,
        )

        logger.info("Loading LoRA adapter...")
        model = FastLanguageModel.get_peft_model(
            model,
            r=16,
            target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                            "gate_proj", "up_proj", "down_proj"],
            lora_alpha=32,
            lora_dropout=0.05,
            bias="none",
            use_gradient_checkpointing=False,
        )

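        # PEFT's load_adapter() registers the weights but does not activate them,
        # so switch from the freshly initialised default adapter explicitly.
        adapter_name = "adapter_industry5-llama3-qlora"
        model.load_adapter(ADAPTER_MODEL_ID, adapter_name=adapter_name)
        model.set_adapter(adapter_name)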

        model.eval()

        return model, tokenizer

    except Exception as e:
        logger.critical(f"Model loading failed: {e}")
        sys.exit(1)


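def generate_response(model, tokenizer, prompt: str) -> str:
    """Stream a sampled completion for `prompt` and return the decoded text."""
    try:
        # Send inputs to the model's own device rather than hard-coding "cuda".
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)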

        streamer = TextStreamer(tokenizer)

        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=MAX_NEW_TOKENS,
                temperature=TEMPERATURE,
                top_p=TOP_P,
                do_sample=True,
                streamer=streamer,
            )

        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    except Exception as e:
        logger.error(f"Generation failed: {e}")
        raise


def main(question: Optional[str] = None):
    """Load the model and print the answer to `question` (or a default one)."""
    check_cuda()

    if not question:
        question = "What are the key principles of Industry 5.0?"

    model, tokenizer = load_model()

    logger.info("Generating response...")
    response = generate_response(model, tokenizer, question)

    print("\n===== QUESTION =====")
    print(question)
    print("\n===== RESPONSE =====")
    print(response)


if __name__ == "__main__":
    main()


==((====))==  Unsloth 2026.2.1: Fast Llama patching. Transformers: 4.57.6.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2026.2.1 patched 28 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


<|begin_of_text|>What are the key principles of Industry 5.0? Industry 5.0 is a concept that represents the next phase of industrial development, built on the principles of Industry 4.0, with a focus on sustainability, social responsibility, and human-centered design. The key principles of Industry 5.0 are:
1. Sustainability: Industry 5.0 prioritizes sustainability and environmental responsibility. It aims to reduce waste, minimize carbon footprint, and promote eco-friendly practices throughout the entire supply chain.
2. Social responsibility: Industry 5.0 emphasizes social responsibility and the well-being of employees, customers, and the broader community. It fosters a culture of inclusivity, diversity, and respect for human rights.
3. Human-centered design: Industry 5.0 is designed with human needs and well-being at its core. It focuses on creating products and services that are intuitive, user-friendly, and tailored to individual needs.
4. Circular economy: Industry 5.0 promotes t
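
The output above starts with `<|begin_of_text|>` and repeats the question, because the raw string is passed as a plain completion prompt and the streamer echoes the prompt tokens. Instruct checkpoints are normally prompted through the tokenizer's chat template instead. The sketch below shows one way this could look; `build_messages`, `encode_chat`, and `new_tokens_only` are hypothetical helper names, and it assumes the tokenizer returned by `load_model()` ships a chat template:

```python
def build_messages(question, system_prompt=None):
    """Build the role/content list that chat templates expect."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    return messages


def encode_chat(tokenizer, question, system_prompt=None):
    """Tokenize a question using the tokenizer's own chat template."""
    return tokenizer.apply_chat_template(
        build_messages(question, system_prompt),
        tokenize=True,
        add_generation_prompt=True,  # append the assistant header so the model answers
        return_dict=True,            # gives input_ids and attention_mask for generate(**inputs)
        return_tensors="pt",
    )


def new_tokens_only(output_ids, prompt_len):
    """Drop the echoed prompt: generate() returns prompt tokens followed by new ones."""
    return output_ids[prompt_len:]


# Toy demonstration with plain lists standing in for token tensors.
print(build_messages("What are the key principles of Industry 5.0?"))
print(new_tokens_only([10, 11, 12, 13], 2))  # [12, 13]
```

In `generate_response` this would mean slicing `outputs[0]` by `inputs["input_ids"].shape[1]` before decoding. For the streamed text, `TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)` should suppress the echoed prompt and the special tokens.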

In [5]:
question = "What is Industry 5.0 and what are its main goals?"
main(question)

==((====))==  Unsloth 2026.2.1: Fast Llama patching. Transformers: 4.57.6.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
<|begin_of_text|>What is Industry 5.0 and what are its main goals? Industry 5.0 is a concept that represents the next phase of industrial development, which is characterized by the integration of technology, artificial intelligence, and the Internet of Things (IoT). The main goals of Industry 5.0 are to increase efficiency, productivity, and sustainability, while also promoting innovation and entrepreneurship.
Some of the key features of Industry 5.0 include:
1. **Digitalization**: The use of digital technologies, such as artificial intelligen
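
The Unsloth banner is printed again for this second question because `main` calls `load_model()` on every invocation, so the base model and adapter are reloaded each time. A minimal sketch of loading once and reusing the result; `make_cached_loader` is a hypothetical helper, and `fake_load_model` is only a stand-in for the real `load_model()` so the example runs without a GPU:

```python
from functools import lru_cache


def make_cached_loader(loader):
    """Wrap a zero-argument loader so it runs once and the result is reused."""
    @lru_cache(maxsize=1)
    def cached():
        return loader()
    return cached


# Stand-in for load_model(); records how often it is actually invoked.
load_calls = []


def fake_load_model():
    load_calls.append(1)
    return "model", "tokenizer"


get_model = make_cached_loader(fake_load_model)

model, tokenizer = get_model()  # first call runs the loader
model, tokenizer = get_model()  # second call reuses the cached pair
print(len(load_calls))  # 1
```

In the notebook, `main` could call `make_cached_loader(load_model)` once at module level and use the returned function instead of `load_model()` directly, so follow-up questions skip the reload.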