## Install Dependencies

In [None]:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" -q
!pip install --no-deps "trl<0.9.0" peft accelerate bitsandbytes xformers datasets -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


## LLM Inference

In [None]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "wolf010/4TH_fine_tuned_Llama-3.2-3B-Instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit
)

FastLanguageModel.for_inference(model)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.10.7: Fast Llama patching. Transformers = 4.46.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu124. CUDA = 7.5. CUDA Toolkit = 12.4.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-

In [None]:
base_prompt = """
You are operating a virtual coffee kiosk that receives speech-to-text (STT) inputs from customers placing coffee orders. Your role is to understand and process these inputs, respond naturally in Korean, and generate a structured JSON file with the correct details for backend processing.

**Key Requirements**:
- **Menu Items**: The kiosk offers the following drinks:
- Hot Drinks: 허브티 (always served hot)
- Iced Only Drinks: 토마토주스, 키위주스, 망고스무디, 딸기스무디, 레몬에이드, 복숭아아이스티 (always served iced)
- Hot and Iced Coffee: 아메리카노, 라떼, 카푸치노, 카페모카, 바닐라라떼, 에스프레소, 카라멜마끼아또
- Specialty Drinks: 초콜릿라떼 (available in both hot and iced versions)
- **Default Values**:
    - Use default size "미디움" and temperature "핫" only if the customer does not specify these details.
- **Do Not Make Assumptions**:
    - If the customer specifies temperature or size, do not override it with defaults. For instance, if they say "아이스 라떼 두잔 주세요", the output should indicate "아이스" without changing it to "핫".
- **Current Conversation History** is a single-line cumulative log of all customer requests so far in this session, with inputs numbered starting from 1.
**Customer Input and Expected Output Format**:
- Each response should have:
  1. **Natural Language Confirmation**: Respond in Korean, starting with an action confirmation such as "[Drink] [quantity] 주문되었습니다." and follow with a full summary of all items ordered so far in the current conversation history up to the last entry, beginning with "지금까지 주문하신 내용은 다음과 같습니다:".
  2. **Structured JSON Output**: Each JSON output should only contain the items directly requested in the latest input, not a full history.

  - **JSON Output** should include only the latest customer input items (from the most recent entry in **Current Conversation History**), not the entire conversation history.
- In your natural language response:
  - Confirm the items in the latest order entry, followed by a summary of all items ordered so far.


**JSON Output Format**:
- The JSON should be structured as follows:
  ```json
  {{
      "action": "[action_type]",
      "order_items": [
          {{
              "drink": "[Drink Name]",
              "size": "[Size]",
              "temperature": "[Temperature]",
              "quantity": [Quantity],
              "add_ons": [List of add-ons if any],
              "extra_shots": [Number of extra shots if any]
          }}
      ]
  }}
  ```
  - **Example JSON Output**:
    ```json
    {{
        "action": "create_order",
        "order_items": [
            {{
                "drink": "아메리카노",
                "size": "미디움",
                "temperature": "핫",
                "quantity": 1,
                "add_ons": [],
                "extra_shots": 0
            }}
        ]
    }}
    ```

**Available Actions for JSON Output**:
- **create_order**: For new drink orders.
- **add_item**: For adding a new item to the current order.
- **modify_order**: For changing an existing item (e.g., modifying size or temperature).
- **cancel_order**: To remove an order item or reset the order.
- **recommend_closest_item**: If a requested item is unavailable, recommend the closest item.
- **show_order_summary**: Display a summary of all items ordered so far.
- **complete_order**: Finalize the order after confirmation.

**Specific Scenarios and Expected Outputs**:
- **Creating a New Order**:
- **Current Conversation History**:
"Customer's 1 Input:아메리카노 4잔 주세요."
**Response**:
- **Natural Language Response**: "아메리카노 4잔 주문되었습니다. 지금까지 주문하신 내용은 다음과 같습니다:
-핫 아메리카노 미디움 4잔"
- **JSON Output**:
  ```json
  {{
    "action": "create_order",
    "order_items": [
      {{
        "drink": "아메리카노",
        "size": "미디움",
        "temperature": "핫",
        "quantity": 4,
        "add_ons": [],
        "extra_shots": 0
      }}
    ]
  }}
  ```
**Example**:
- **Current Conversation History**:
"Customer's 1 Input: 아메리카노 4잔 주세요. Customer's 2 Input: 카페라떼 라지로 2잔 주세요"
  **Response**:
  - **Natural Language Response**: "카페라떼 라지로 2잔 주문되었습니다. 지금까지 주문하신 내용은 다음과 같습니다:
  - 핫 아메리카노 미디움 4잔,
  - 핫 카페라떼 라지 2잔."
  - **JSON Output**:
    ```json
    {{
      "action": "create_order",
      "order_items": [
        {{
          "drink": "카페라떼",
          "size": "라지",
          "temperature": "핫",
          "quantity": 2,
          "add_ons": [],
          "extra_shots": 0
        }}
      ]
    }}
    ```
- **Requesting Order Summary**:
  - **Current Conversation History**:
  "Customer's 1 Input: 내가 지금까지 뭘 주문했지?"
  **Response**:
  - **Natural Language Response**: "지금까지 주문하신 내용은 다음과 같습니다:
  -핫 아메리카노 미디움 4잔 1샷 추가
  -아이스 카페라떼 라지 2잔 휘핑크림 추가"
  - **JSON Output**: None (as it is just a summary request without any new action).

- **Modifying an Existing Order**:
  - **Current Conversation History**:
  "Customer's 1 Input: 주문한거 아이스 라떼로 바꿔줘."
  **Response**:
  - **Natural Language Response**: "주문이 아메리카노에서 아이스 라떼로 변경되었습니다. 지금까지 주문하신 내용은 다음과 같습니다:
  -아이스 라떼 미디움 1잔"
  - **JSON Output**:
    ```json
    {{
      "action": "modify_order",
      "old_drink": "아메리카노",
      "new_drink": "라떼",
      "size": "미디움",
      "temperature": "아이스",
      "quantity": 1,
      "add_ons": [],
      "extra_shots": 0
    }}
    ```

- **Short Names or Misspellings**:
  - Recognize common shorthand or misspellings. For example:
    - "아아" should be interpreted as "아이스 아메리카노".
    - "뜨아" should be interpreted as "핫 아메리카노".

- **Unavailable Items**:
  - If the customer requests an item not on the menu, respond politely and recommend a similar item if available.
  - **Example**:
  - **Current Conversation History**:
  "Customer's 1 Input: 초코라떼 주세요."
  **Response**:
    - **Natural Language Response**: "죄송합니다, 초코라떼는 메뉴에 없습니다. 대신 초콜릿라떼를 추천드립니다."
    - **JSON Output**:
      ```json
      {{
        "action": "recommend_closest_item",
        "requested_item": "초코라떼",
        "recommended_item": "초콜릿라떼"
      }}
      ```

- **Order Confirmation**:
  - **Customer Input**: "주문 완료할게요."
  - **Natural Language Response**: "주문이 완료되었습니다. 결제는 카드리더기를 사용해주세요. 감사합니다."
  - **JSON Output**: Should include a summary of all items ordered so far.

**Response Rules**:
- Treat each new input as part of the same order until "주문 완료할게" is received, which finalizes the order.
- Always confirm the latest action first in the natural language response, followed by a full order summary.
- Ensure each JSON output reflects only the customer's latest input, not the entire conversation history.
**Current Conversation History**:
{}

**Response**:
"""




In [None]:
# Example Customer Input
customer_input = "Customer's 1 Input: 아이스 카페라떼 라지 한잔으로, 아이스 아메리카노 3잔을 엑스라지 사이즈로 주세요. Customer's 2 Input: 아이스 카페라떼 라지 2잔으로 바꿔주세요"

# Fill the conversation history into the prompt's single placeholder and tokenize it
inputs = tokenizer([base_prompt.format(customer_input)], return_tensors='pt').to("cuda")

# Generate output from the model
outputs = model.generate(**inputs, max_new_tokens=200, use_cache=True)

# Decode and print the output
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)


You are operating a virtual coffee kiosk that receives speech-to-text (STT) inputs from customers placing coffee orders. Your role is to understand and process these inputs, respond naturally in Korean, and generate a structured JSON file with the correct details for backend processing.

**Key Requirements**:
- **Menu Items**: The kiosk offers the following drinks:
- Hot Drinks: 허브티 (always served hot)
- Iced Only Drinks: 토마토주스, 키위주스, 망고스무디, 딸기스무디, 레몬에이드, 복숭아아이스티 (always served iced)
- Hot and Iced Coffee: 아메리카노, 라떼, 카푸치노, 카페모카, 바닐라라떼, 에스프레소, 카라멜마끼아또
- Specialty Drinks: 초콜릿라떼 (available in both hot and iced versions)
- **Default Values**:
    - Use default size "미디움" and temperature "핫" only if the customer does not specify these details.
- **Do Not Make Assumptions**:
    - If the customer specifies temperature or size, do not override it with defaults. For instance, if they say "아이스 라떼 두잔 주세요", the output should indicate "아이스" without changing it to "핫".
- **Current Conversation His
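
The printed response above begins with the full prompt because a generated sequence contains the prompt tokens followed by the completion. Below is a minimal sketch of one way to keep only the completion. The helper name `completion_ids` is hypothetical, and plain Python lists stand in for the token ID tensors so the idea can be checked without a GPU.

```python
def completion_ids(output_ids, prompt_length):
    """Return only the token IDs generated after the prompt.

    `output_ids` is one generated sequence (prompt + completion) and
    `prompt_length` is the number of prompt tokens that were fed in.
    """
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Toy data: made-up token IDs, with lists standing in for tensors.
prompt = [101, 7592, 2088]
generated = prompt + [2023, 2003, 102]
new_tokens = completion_ids(generated, len(prompt))
print(new_tokens)
```

With the objects in this notebook, the same slicing would presumably look like `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.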

In [None]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

In [None]:
instruction = "You are a helpful assistant who can answer questions"
input_text = "Who developed GPT models"  # avoid shadowing the built-in input()

# Format the Alpaca prompt and tokenize it
inputs = tokenizer([alpaca_prompt.format(instruction, input_text, "")], return_tensors='pt').to('cuda')
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.batch_decode(outputs)[0]
print(response)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a helpful assistant who can answer questions

### Input:
Who developed GPT models

### Response:
OpenAI developed GPT models.<|end_of_text|>


In [None]:
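instruction = "You are a helpful assistant who can answer questions"
input_text = "Explain about Transformers in AI?"  # avoid shadowing the built-in input()

# Format the Alpaca prompt and tokenize it
inputs = tokenizer([alpaca_prompt.format(instruction, input_text, "")], return_tensors='pt').to('cuda')
# temperature only takes effect when sampling is enabled with do_sample=True
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.1)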
response = tokenizer.batch_decode(outputs)[0]
print(response)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a helpful assistant who can answer questions

### Input:
Explain about Transformers in AI?

### Response:
Transformers are a type of artificial intelligence (AI) that uses a neural network to learn patterns in data. They are used in a variety of applications, including natural language processing, computer vision, and speech recognition. Transformers are able to learn complex patterns in data by using a neural network to process the data in a way that is similar to how the human brain processes information. This allows them to learn patterns in data that are too complex for traditional machine learning algorithms to handle.<|end_of_text|>
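
The `temperature` argument rescales the model's logits before a token is sampled, and in `transformers` it only matters when sampling is enabled (`do_sample=True`). The sketch below uses made-up logits and a hypothetical helper, `softmax_with_temperature`, to show why a low value such as 0.1 behaves almost like greedy decoding.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
probs_default = softmax_with_temperature(logits, 1.0)
probs_low = softmax_with_temperature(logits, 0.1)

print([round(p, 4) for p in probs_default])
print([round(p, 4) for p in probs_low])
```

At temperature 1.0 the top token keeps a moderate share of the probability mass, while at 0.1 nearly all of it collapses onto the top token.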


## Fine Tuning

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 42,
    max_seq_length = max_seq_length
)

Unsloth 2024.7 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
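
As a sanity check on the LoRA configuration, the number of trainable parameters can be worked out by hand. Each adapted linear layer gains two low-rank matrices, one of shape `r x in_features` and one of shape `out_features x r`. The sketch below takes the layer shapes from the model summary printed earlier in this notebook; the helper name `lora_params` is hypothetical.

```python
RANK = 16
NUM_LAYERS = 32

# (in_features, out_features) for each target module, as printed in the model summary
PROJECTIONS = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}


def lora_params(in_features, out_features, rank):
    """Parameters added by one LoRA adapter: A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 41,943,040
```

This agrees with the "Number of trainable parameters" figure that the training banner reports later in the notebook.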


In [None]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

In [None]:
def format_input_prompt(examples):
    # Each field is a list because the function is applied with batched=True
    instructions = examples['instruction']
    inputs = examples['input']
    outputs = examples['output']

    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Fill the Alpaca template and append EOS so the model learns where to stop
        text = alpaca_prompt.format(instruction, input_text, output) + tokenizer.eos_token
        texts.append(text)

    return {"text": texts}

In [None]:
# import the dataset
from datasets import load_dataset

dataset = load_dataset("yahma/alpaca-cleaned", split='train')

dataset = dataset.map(format_input_prompt, batched=True)


In [None]:
dataset

Dataset({
    features: ['output', 'input', 'instruction', 'text'],
    num_rows: 51760
})
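
With `batched=True`, `Dataset.map` passes the formatting function a dictionary of column lists and expects a dictionary of lists back, which is how the new `text` column above is produced. The sketch below reproduces that contract on a made-up two-row batch without loading the dataset. The shortened `TEMPLATE`, the helper `format_batch`, and the `eos_token` default are illustrative assumptions, not the notebook's exact code.

```python
# Shortened stand-in for the Alpaca template (assumption for illustration)
TEMPLATE = """### Instruction:
{}

### Input:
{}

### Response:
{}"""


def format_batch(examples, eos_token=""):
    """Turn a dict of column lists into a dict holding one formatted text per row."""
    texts = [
        TEMPLATE.format(instruction, input_text, output) + eos_token
        for instruction, input_text, output in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}


# Made-up rows, shaped the way a batched map call would deliver them
toy_batch = {
    "instruction": ["Add the numbers.", "Name a color."],
    "input": ["2 and 3", ""],
    "output": ["5", "Blue"],
}
formatted = format_batch(toy_batch, eos_token="<|end_of_text|>")
print(formatted["text"][0])
```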

In [None]:
dataset[0]

{'output': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.',
 'input': '',
 'instruction': 'Give three tips for staying healthy.',
 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes th

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model, # peft model
    train_dataset = dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=30,
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=1234,
        output_dir="outputs"
    )
)


max_steps is given, it will override any value given in num_train_epochs
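
The settings `warmup_steps=10`, `max_steps=30`, and `lr_scheduler_type="linear"` describe a learning rate that rises linearly to `2e-4` and then decays linearly to zero. The following sketch shows that shape in plain Python. The function name `linear_schedule_lr` is hypothetical, and the exact per-step values the trainer applies may differ slightly depending on when it steps the scheduler.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=10, total_steps=30):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: scale from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


schedule = [linear_schedule_lr(s) for s in range(31)]
for step in (0, 5, 10, 20, 30):
    print(step, f"{schedule[step]:.1e}")
```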


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 51,760 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,2.424
2,2.6548
3,2.6507
4,2.4485
5,2.7541
6,2.9558
7,2.6116
8,2.5738
9,2.7065
10,2.7154


In [None]:
trainer_stats

TrainOutput(global_step=30, training_loss=2.679690368970235, metrics={'train_runtime': 226.5245, 'train_samples_per_second': 1.059, 'train_steps_per_second': 0.132, 'total_flos': 2750561593786368.0, 'train_loss': 2.679690368970235, 'epoch': 0.00463678516228748})
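
The `epoch` and throughput values in `TrainOutput` follow directly from the batch settings, so they can be reproduced with a few lines of arithmetic. The sketch below uses only numbers reported above; the variable names are illustrative.

```python
per_device_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 30
num_examples = 51_760
train_runtime_s = 226.5245

# Examples consumed per optimizer step
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus
# Examples consumed over the whole run
examples_seen = effective_batch * max_steps
# Fraction of the dataset covered
epoch_fraction = examples_seen / num_examples

samples_per_second = examples_seen / train_runtime_s
steps_per_second = max_steps / train_runtime_s

print(effective_batch, examples_seen)
print(epoch_fraction, round(samples_per_second, 3), round(steps_per_second, 3))
```

The run therefore covers 240 of the 51,760 examples, which is under half a percent of one epoch.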

In [None]:
# Save the LoRA adapter weights and the tokenizer
model.save_pretrained("./best_model")
tokenizer.save_pretrained('./best_model')

('./best_model/tokenizer_config.json',
 './best_model/special_tokens_map.json',
 './best_model/tokenizer.json')

In [None]:
# Save the model with Unsloth's helper
from unsloth import unsloth_save_model

unsloth_save_model(model, tokenizer, "unsloth_model")

Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... Done.


('unsloth_model', None)

In [None]:
FastLanguageModel.for_inference(model)

instruction = "You are a helpful assistant who can answer questions"
input_text = "Who developed GPT models"  # avoid shadowing the built-in input()

# Format the Alpaca prompt and tokenize it
inputs = tokenizer([alpaca_prompt.format(instruction, input_text, "")], return_tensors='pt').to('cuda')
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.batch_decode(outputs)[0]
print(response)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a helpful assistant who can answer questions

### Input:
Who developed GPT models

### Response:
OpenAI
<|end_of_text|>


In [None]:


# Set model for inference
FastLanguageModel.for_inference(model)

# Initialize conversation history, summary history, and order confirmation flag
conversation_history = []  # Stores each customer input as a string entry in the format: "Customer's X Input: [input]"
summary_history = []       # Stores each cumulative summary of orders
order_confirmed = False    # Tracks if the order is finalized

# Function to generate prompt based on conversation history and new input
def generate_prompt(conversation_history, user_input):
    # Format the conversation history and cumulative summary as a single string
    if conversation_history:
        formatted_history = " ".join(conversation_history)
    else:
        formatted_history = "none"
    base_prompt = f"""
    You are operating a virtual coffee kiosk that receives STT (speech-to-text) inputs from customers placing coffee orders. Your task is to process these inputs, respond in Korean, and generate a JSON output for backend processing.

    **Menu Items**:
    - Hot Drinks: 허브티 (always hot)
    - Iced Only Drinks: 토마토주스, 키위주스, 망고스무디, 딸기스무디, 레몬에이드, 복숭아아이스티 (always iced)
    - Hot/Iced Coffee: 아메리카노, 라떼, 카푸치노, 카페모카, 바닐라라떼, 에스프레소, 카라멜마끼아또
    - Specialty: 초콜릿라떼 (hot or iced)

    **Default Values**:
    - Use default size "미디움" and temperature "핫" if unspecified.
    - Do not override explicitly given size or temperature.

    **Response Requirements**:
    1. **Natural Language Response**: Confirm each item in Korean, e.g., "[Drink] [quantity] 주문되었습니다.", followed by a full summary of all ordered items, starting with "지금까지 주문하신 내용은 다음과 같습니다:".
    2. **JSON Output**: Only include items from the latest input in the structured JSON format below:
      ```json
      {{
          "action": "[action_type]",
          "order_items": [
              {{
                  "drink": "[Drink Name]",
                  "size": "[Size]",
                  "temperature": "[Temperature]",
                  "quantity": [Quantity],
                  "add_ons": [List of add-ons],
                  "extra_shots": [Number of extra shots]
              }}
          ]
      }}
    **Available Actions for JSON Output**:
    - **create_order**: For new drink orders.
    - **add_item**: For adding a new item to the current order.
    - **modify_order**: For changing an existing item (e.g., modifying size or temperature).
    - **cancel_order**: To remove an order item or reset the order.
    - **recommend_closest_item**: If a requested item is unavailable, recommend the closest item.
    - **show_order_summary**: Display a summary of all items ordered so far.
    - **complete_order**: Finalize the order after confirmation.

    **Key Scenarios**:

    - New Order: Confirm with a natural response and JSON output for each new drink.
    - Modification: Confirm changes and modify JSON.
    - Summary Request: Provide a summary without a JSON output.
    - Unavailable Items: Recommend a similar item.
    - Order Completion: Confirm completion and provide a summary.

    Current Conversation History: {formatted_history}

    Response: """
    return base_prompt.strip()

# Main interaction loop
print("Welcome to the virtual coffee kiosk! What would you like to order?")
input_counter = 1

while not order_confirmed:
    # Take user input
    user_input = input("Customer: ")

    # Check if customer confirms the order
    if "주문 완료할게" in user_input:
        order_confirmed = True
        print("Kiosk: 주문이 완료되었습니다. 결제는 카드리더기를 사용해주세요. 감사합니다.")
        conversation_history.clear()
        summary_history.clear()
        continue

    # Append the new input to conversation history with labeled format
    conversation_history.append(f"Customer's {input_counter} Input: {user_input}")

    # Generate the prompt based on the conversation history and new user input
    prompt = generate_prompt(conversation_history, user_input)

    # Process the input with the tokenizer and model
    inputs = tokenizer([prompt], return_tensors='pt').to("cuda")

    # Generate output from the model
    outputs = model.generate(**inputs, max_new_tokens=500, use_cache=True)

    # Decode only the newly generated tokens so the prompt is not echoed back
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
    print("Kiosk:", response.strip())

    input_counter += 1

    # Handle order cancellation requests
    if "주문 취소" in user_input:
        # Clear conversation history and summary history for a reset
        print("Kiosk: 주문이 취소되었습니다. 새 주문을 시작해 주세요.")
        conversation_history.clear()
        summary_history.clear()
        input_counter = 1

# End of the session
print("Thank you for using the coffee kiosk!")


Welcome to the virtual coffee kiosk! What would you like to order?
Customer: 아메리카노 한잔 줘
Kiosk: You are operating a virtual coffee kiosk that receives STT (speech-to-text) inputs from customers placing coffee orders. Your task is to process these inputs, respond in Korean, and generate a JSON output for backend processing.

    **Menu Items**:
    - Hot Drinks: 허브티 (always hot)
    - Iced Only Drinks: 토마토주스, 키위주스, 망고스무디, 딸기스무디, 레몬에이드, 복숭아아이스티 (always iced)
    - Hot/Iced Coffee: 아메리카노, 라떼, 카푸치노, 카페모카, 바닐라라떼, 에스프레소, 카라멜마끼아또
    - Specialty: 초콜릿라떼 (hot or iced)

    **Default Values**:
    - Use default size "미디움" and temperature "핫" if unspecified.
    - Do not override explicitly given size or temperature.

    **Response Requirements**:
    1. **Natural Language Response**: Confirm each item in Korean, e.g., "[Drink] [quantity] 주문되었습니다.", followed by a full summary of all ordered items, starting with "지금까지 주문하신 내용은 다음과 같습니다:".
    2. **JSON Output**: Only include items from the latest 

KeyboardInterrupt: Interrupted by user
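
The prompt asks the model to return its order details inside a `json` fenced block, so the backend needs a way to pull that block out of the free-text response. The sketch below shows one possible approach using only the standard library. The helper name `extract_json_block` and the sample response are made up for illustration, and real model output may need more defensive handling.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to avoid nesting fences in this example


def extract_json_block(response):
    """Return the first fenced json block in `response` as a dict, or None."""
    match = re.search(FENCE + r"json\s*(.*?)" + FENCE, response, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        # The model produced a block that is not valid JSON
        return None


# Made-up response in the format the prompt requests
sample = (
    "아메리카노 1잔 주문되었습니다.\n"
    + FENCE + "json\n"
    + '{"action": "create_order", "order_items": [{"drink": "아메리카노", "quantity": 1}]}\n'
    + FENCE
)
order = extract_json_block(sample)
print(order["action"])
```

Returning `None` for a missing or malformed block lets the calling loop decide whether to retry generation or ask the customer to repeat the order.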

In [None]:
# Install dependencies
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" -q
!pip install --no-deps "trl<0.9.0" peft accelerate bitsandbytes xformers datasets -q

# Import necessary libraries
from unsloth import FastLanguageModel
import torch
import json

# Set parameters for model loading
max_seq_length = 2048
dtype = None
load_in_4bit = True

# Load the model using Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="wolf010/4TH_fine_tuned_Llama-3.2-3B-Instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit
)

# Set model for inference
FastLanguageModel.for_inference(model)