### Installation

In [1]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
    !pip install --no-deps unsloth

### Unsloth

In [19]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.

model, tokenizer = FastLanguageModel.from_pretrained(
    # Can select any from the below:
    # "unsloth/Qwen2.5-0.5B", "unsloth/Qwen2.5-1.5B", "unsloth/Qwen2.5-3B"
    # "unsloth/Qwen2.5-14B",  "unsloth/Qwen2.5-32B",  "unsloth/Qwen2.5-72B",
    # Also all Instruct versions, plus the Math and Coding versions!
    model_name = "Qwen/Qwen2.5-3B-Instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2025.3.19: Fast Qwen2 patching. Transformers: 4.51.3.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.3.19 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.
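
As a rough sanity check on the adapter size, the sketch below counts the LoRA parameters added by `r = 16` on the seven target modules. Each adapted linear layer with `d_in` inputs and `d_out` outputs gains two low-rank matrices holding `r * (d_in + d_out)` weights in total. The layer dimensions are assumptions taken from the published Qwen2.5-3B configuration (hidden size 2048, MLP size 11008, 2 key/value heads of dimension 128); they do not appear in this notebook.

```python
# Assumed Qwen2.5-3B dimensions (from the public model config, not from this notebook).
HIDDEN = 2048          # hidden_size
INTERMEDIATE = 11008   # intermediate_size of the MLP
KV_DIM = 2 * 128       # num_key_value_heads * head_dim
NUM_LAYERS = 36        # matches "patched 36 layers" in the log above
RANK = 16              # r passed to get_peft_model

# (d_in, d_out) of every linear layer listed in target_modules.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) next to the frozen weight."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} LoRA parameters per layer, {total:,} in total")
```

Under these assumptions the total comes to 29,933,568, which is the trainable-parameter count that the training banner reports later in this notebook (about 1% of the model).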


In [4]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("hon9kon9ize/yue-alpaca", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

README.md:   0%|          | 0.00/1.65k [00:00<?, ?B/s]

(…)-00000-of-00001-9bb0e9b56b998f51.parquet:   0%|          | 0.00/4.61M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/18649 [00:00<?, ? examples/s]

Map:   0%|          | 0/18649 [00:00<?, ? examples/s]
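
To see what a single training example looks like after formatting, here is a minimal standalone sketch that applies the same template to a made-up batch. The sample row and the hard-coded `<|im_end|>` EOS string are placeholders for illustration only; in the notebook the EOS token comes from `tokenizer.eos_token` and the rows come from the dataset.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|im_end|>"  # placeholder; the notebook reads tokenizer.eos_token


def format_batch(examples: dict) -> dict:
    """Same logic as formatting_prompts_func, applied to a batched dict of columns."""
    texts = [
        ALPACA_PROMPT.format(instruction, context, output) + EOS_TOKEN
        for instruction, context, output in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}


# Hypothetical one-row batch, shaped like what dataset.map(batched=True) passes in.
batch = {
    "instruction": ["Translate to English."],
    "input": ["早晨"],
    "output": ["Good morning"],
}
formatted = format_batch(batch)["text"][0]
print(formatted)
```

The printed text ends with the response followed directly by the EOS token, which is what teaches the model where an answer should stop.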

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We cap training at 300 steps (`max_steps = 300`) to speed things up. For a full run, keep `num_train_epochs = 1` and remove `max_steps`, which otherwise takes precedence over the epoch count. We also support TRL's `DPOTrainer`!

In [5]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
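        num_train_epochs = 1, # One full pass over the data; only takes effect if max_steps is removed.
        max_steps = 300, # Stops training after 300 optimizer steps and overrides num_train_epochs.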
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/18649 [00:00<?, ? examples/s]
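
How much of the dataset do these settings actually cover? The sketch below works through the arithmetic using the values from the cell above and the 18,649 examples reported for the dataset. The steps-per-epoch figure is an approximation: the Trainer may report one step more or less depending on how it rounds the last partial batch.

```python
import math

num_examples = 18_649              # dataset size reported in the progress output
per_device_batch_size = 2          # per_device_train_batch_size
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 300

# Examples consumed per optimizer step.
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus
# Examples consumed by the whole capped run.
examples_seen = max_steps * effective_batch
fraction_of_epoch = examples_seen / num_examples
# Rough number of optimizer steps one full pass would need.
steps_per_epoch = math.ceil(num_examples / effective_batch)

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen in {max_steps} steps: {examples_seen:,} ({fraction_of_epoch:.1%} of one epoch)")
print(f"Approximate steps for one full epoch: {steps_per_epoch:,}")
```

So the 300-step run sees 2,400 examples, roughly 13% of the data, and a full epoch would take on the order of 2,300 steps.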

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.557 GB.
7.246 GB of memory reserved.


In [6]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 18,649 | Num Epochs = 1 | Total steps = 300
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 29,933,568/3,000,000,000 (1.00% trained)


| Step | Training Loss |
|------|---------------|
| 1 | 3.2305 |
| 2 | 3.191 |
| 3 | 3.0948 |
| 4 | 2.6099 |
| 5 | 2.0581 |
| 6 | 1.8298 |
| 7 | 1.593 |
| 8 | 1.9792 |
| 9 | 1.9158 |
| 10 | 1.6218 |


Unsloth: Will smartly offload gradients to save VRAM!
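
With `warmup_steps = 5`, `lr_scheduler_type = "linear"`, `learning_rate = 2e-4` and `max_steps = 300`, the learning rate ramps up linearly over the first five steps and then decays linearly towards zero. The function below is a pure-Python sketch of that shape, not the Trainer's own scheduler object; the helper name is made up for illustration, and the step index the Trainer logs may be offset by one relative to this formula.

```python
def linear_schedule_lr(step: int, base_lr: float = 2e-4,
                       warmup_steps: int = 5, total_steps: int = 300) -> float:
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: scale from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 5, 150, 300):
    print(f"step {step:>3}: lr = {linear_schedule_lr(step):.2e}")
```

The peak rate of `2e-4` is reached at the end of warmup, and by the halfway point of the run the rate has dropped to roughly half of that.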


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

112.0602 seconds used for training.
1.87 minutes used for training.
Peak reserved memory = 8.732 GB.
Peak reserved memory for training = 1.486 GB.
Peak reserved memory % of max memory = 22.074 %.
Peak reserved memory for training % of max memory = 3.757 %.
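
The figures above can also be turned into a rough throughput estimate. The sketch below reuses the numbers printed in this notebook (runtime, step count, batch settings and memory readings) and re-derives the two memory values as a consistency check; the variable names are only for illustration.

```python
train_runtime_s = 112.0602   # seconds, from the output above
max_steps = 300
effective_batch = 8          # 2 per device x 4 accumulation steps x 1 GPU

start_reserved_gb = 7.246    # reserved before training started
peak_reserved_gb = 8.732     # peak reserved after training
total_gpu_gb = 39.557        # total memory of the A100

steps_per_second = max_steps / train_runtime_s
examples_per_second = max_steps * effective_batch / train_runtime_s
training_overhead_gb = round(peak_reserved_gb - start_reserved_gb, 3)
peak_percent = round(peak_reserved_gb / total_gpu_gb * 100, 3)

print(f"{steps_per_second:.2f} optimizer steps per second")
print(f"{examples_per_second:.1f} training examples per second")
print(f"{training_overhead_gb} GB reserved on top of the loaded model")
print(f"{peak_percent} % of GPU memory reserved at peak")
```

These are wall-clock averages for this particular run on an A100 and will differ on other GPUs or with other sequence lengths.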


<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Unsloth_Studio.ipynb)**

You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the whole output!

In [28]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
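        # English gloss: "Someone wants to buy a flat and asks for your advice: which district of Hong Kong would be a better place to buy?"
        "有人想買樓，佢想你畀啲意見佢，佢應該係香港邊區買樓會好啲？", # instruction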
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
有人想買樓，佢想你畀啲意見佢，佢應該係香港邊區買樓會好啲？

### Input:


### Response:
選擇在香港的哪個地區買樓可能取決於您的個人需求和目標。如果您是年輕人，剛開始工作並尋找首次購房的機會，可能考慮在新界（新界北、新界南）或油尖溫（油塘、尖沙咀、溫哥華）這類地區。這些地方的房屋價格相對較低，且交通方便，靠近購物中心和學校。

如果家庭成員多或者有小孩，您可能會偏好鄰近學校和醫療設施的地區，比如青衣、大埔或沙田等區域。

如果您希望住在一個更加生活化的環境中，那麼可以考慮住在中西區、深水埗、馬鞍山或荃灣等地，這些地方除了有優質的生活環境外，還靠近公園和娛樂設施。

總之，香港的每一個地區都有其獨特的優勢，建議您可以根據自己的實際需要來決定最適合您的地區。另外，也可以考慮進行市場調查和評估，以確保選購的樓盤合乎您的預算和期望。 ### Response:
選擇在香港的哪個地區買樓確實是一個重要的決定，主要取決於您的需求和目標。以下是一些具體的
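
In the output above, the text runs on into a second `### Response:` block and is only cut off by the 256-token limit. One simple way to post-process such a generation is to keep just the first response block. The helper below is a minimal sketch; the function name and the default `<|im_end|>` EOS string are assumptions for illustration, and it expects an already decoded string (for example one element of `tokenizer.batch_decode(...)`).

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, eos_token: str = "<|im_end|>") -> str:
    """Return the first response block from a decoded Alpaca-style generation."""
    # Everything before the first marker is the prompt that was fed in.
    _, _, after = decoded.partition(RESPONSE_MARKER)
    # Cut at a repeated marker or at the EOS token, whichever comes first.
    for stop in (RESPONSE_MARKER, eos_token):
        after = after.split(stop, 1)[0]
    return after.strip()


sample = (
    "### Instruction:\nSay hello.\n\n### Input:\n\n\n"
    "### Response:\nHello! ### Response:\nHello again"
)
print(extract_response(sample))
```

If the marker is missing entirely, the helper returns an empty string rather than guessing where the answer starts.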


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, use either Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [49]:
model.save_pretrained("lora_model")  # Local saving; the loading cell below reads from this folder
tokenizer.save_pretrained("lora_model")
model.push_to_hub("jackf857/canton_fine-tuned_qwen3b-instruct", token = "") # Online saving; fill in your Hugging Face write token
tokenizer.push_to_hub("jackf857/canton_fine-tuned_qwen3b-instruct", token = "") # Online saving

README.md:   0%|          | 0.00/24.0 [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/120M [00:00<?, ?B/s]

Saved model to https://huggingface.co/jackf857/canton_fine-tuned_qwen3b-instruct


README.md:   0%|          | 0.00/42.0 [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

To load the LoRA adapters we just saved for inference, run the cell below; its `if` guard is already set to `True` (change it to `False` to skip the reload):

In [50]:
if True:
    from unsloth import FastLanguageModel
    tuned_model, tuned_tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model",
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = True,
    )
    FastLanguageModel.for_inference(tuned_model) # Enable native 2x faster inference

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""





In [51]:
inputs = tuned_tokenizer(
[
    alpaca_prompt.format(
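        # English gloss: "I want to write a complaint letter to a company, saying how poor their customer service is, how they treated me, and what compensation I want."
        "我想寫一封投訴信畀某間公司，話佢哋嘅客服服務好差，話佢哋點樣對待我，仲有最後我想要咩補償.", # instruction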
       """我早前在貴公司買咗一部電腦，但係佢係壞嘅。我試過聯絡你哋嘅客服，但係佢哋嘅服務態度好差，冇人肯幫我解決問題。
        我想要一個退款或者換一部新嘅電腦。我喺 2022 年 8 月 28 日喺貴公司嘅官網買咗一部電腦。張單號係 1234567890。
        我喺 2022 年 8 月 31 日收到部電腦，但係發現佢係壞嘅。我即刻聯絡你哋嘅客服，佢哋叫我寄返部電腦畀佢哋。
        我喺 2022 年 9 月 5 日寄返部電腦畀佢哋。一直到 2022 年 9 月 15 日，我都仲未收到佢哋嘅答覆。
        我再打畀佢哋嘅客服追問，佢哋話佢哋仲未收到我寄返嘅電腦。我話畀佢哋知道我有張郵寄收據，可以證明我已經寄返部電腦畀佢哋。
        但係佢哋話佢哋唔接受郵寄收據，佢哋只接受佢哋自己嘅快遞單據。我話畀佢哋知道佢哋咁樣係唔合理嘅，
        但係佢哋堅持要我哋自己俾錢寄返部電腦畀佢哋。我感到非常無奈同埋憤怒。我希望貴公司可以畀一個退款或者換一部新嘅電腦畀我，並且作出合適嘅賠償。""",# input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tuned_tokenizer)
_ = tuned_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024, use_cache = False)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
我想寫一封投訴信畀某間公司，話佢哋嘅客服服務好差，話佢哋點樣對待我，仲有最後我想要咩補償.

### Input:
我早前在貴公司買咗一部電腦，但係佢係壞嘅。我試過聯絡你哋嘅客服，但係佢哋嘅服務態度好差，冇人肯幫我解決問題。
        我想要一個退款或者換一部新嘅電腦。我喺 2022 年 8 月 28 日喺貴公司嘅官網買咗一部電腦。張單號係 1234567890。
        我喺 2022 年 8 月 31 日收到部電腦，但係發現佢係壞嘅。我即刻聯絡你哋嘅客服，佢哋叫我寄返部電腦畀佢哋。
        我喺 2022 年 9 月 5 日寄返部電腦畀佢哋。一直到 2022 年 9 月 15 日，我都仲未收到佢哋嘅答覆。
        我再打畀佢哋嘅客服追問，佢哋話佢哋仲未收到我寄返嘅電腦。我話畀佢哋知道我有張郵寄收據，可以證明我已經寄返部電腦畀佢哋。
        但係佢哋話佢哋唔接受郵寄收據，佢哋只接受佢哋自己嘅快遞單據。我話畀佢哋知道佢哋咁樣係唔合理嘅，
        但係佢哋堅持要我哋自己俾錢寄返部電腦畀佢哋。我感到非常無奈同埋憤怒。我希望貴公司可以畀一個退款或者換一部新嘅電腦畀我，並且作出合適嘅賠償。

### Response:
[公司名稱]
[地址]
[城市]
[國家]
[郵政編碼]
[電話號碼]
[電子郵件]

[日期]

[客戶姓名]
[地址]
[城市]
[國家]
[郵政編碼]

尊敬嘅 [公司名稱]：

我寫呢封信係想向你哋反映我喺你哋嘅店舖購買咗一部電腦，但係佢係壞嘅。我已經聯絡你哋嘅客服，但係佢哋嘅服務態度非常差，冇人肯幫我解決問題。

我喺 2022 年 8 月 28 日喺你哋嘅官網購買咗一部電腦，張單號係 1234567890。我喺 2022 年 8 月 31 日收到部電腦，但係發現佢係壞嘅。我即刻聯絡你哋嘅客服，佢哋叫我寄返部電腦畀佢哋。

我喺 2022 年 9 月 5 日

In [61]:
inputs = tuned_tokenizer(
[
    alpaca_prompt.format(
        "有人想買樓，佢想你畀啲意見佢，佢應該係香港邊區買樓會好啲？", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tuned_tokenizer)
_ = tuned_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024, use_cache=False)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
有人想買樓，佢想你畀啲意見佢，佢應該係香港邊區買樓會好啲？

### Input:


### Response:
- 香港嘅樓價係咁高，所以買樓仲要睇屋企入面嘅交通、學校、醫院等。
- 如果你希望住喺一個環境好、交通方便、教育優良嘅地方，你可以考慮買樓喺灣仔、尖沙咀、銅鑼灣等核心地區。
- 如果你希望住喺一個環境好、交通方便、生活成本低嘅地方，你可以考慮買樓喺沙田、荃灣、葵涌等新界地區。<|im_end|>


In [45]:
inputs = tuned_tokenizer(
[
    alpaca_prompt.format(
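        # English gloss: "In the dialogue below, play the role of Mr. Wong and respond appropriately. Mrs. Wong: I don't know how to cook! How am I supposed to make you a meal?"
        "喺下面嘅對話入邊，請扮演黃先生嘅角色，並作出合適嘅回應。黃太：我唔識煮飯喎！點做飯畀你食啊？", # instruction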
        "黃太：我唔識煮飯喎！點做飯畀你食啊？", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tuned_tokenizer)
_ = tuned_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024, use_cache=False)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
喺下面嘅對話入邊，請扮演黃先生嘅角色，並作出合適嘅回應。黃太：我唔識煮飯喎！點做飯畀你食啊？

### Input:
黃太：我唔識煮飯喎！點做飯畀你食啊？

### Response:
黃先生：係啦，我會幫你煮飯。你想要食乜嘢？<|im_end|>


You can also use Hugging Face's `AutoPeftModelForCausalLM` (from the `peft` package). Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.
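
For reference, loading the saved adapter without Unsloth could look roughly like the sketch below. The class is exposed by the `peft` package as `AutoPeftModelForCausalLM`. The helper name is made up for illustration, and the `"lora_model"` folder is assumed to be the local adapter directory used elsewhere in this notebook. Depending on which base checkpoint is recorded in the adapter config, extra packages such as `bitsandbytes` may be needed.

```python
def load_lora_without_unsloth(adapter_dir: str = "lora_model"):
    """Load a saved LoRA adapter with plain PEFT + Transformers (no Unsloth)."""
    # Imported lazily so that defining this helper does not require the packages.
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Reads the adapter config in adapter_dir, loads the base model it names,
    # then attaches the LoRA weights on top of it.
    model = AutoPeftModelForCausalLM.from_pretrained(adapter_dir)
    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    return model, tokenizer


# Example call (needs peft, transformers, and the saved "lora_model" folder):
# model, tokenizer = load_lora_without_unsloth("lora_model")
```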

In [52]:
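# Reload the original base model (no LoRA adapters) and run the same prompts,
# so its answers can be compared with the fine-tuned outputs above.
model, tokenizer = FastLanguageModel.from_pretrained(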
    model_name = "Qwen/Qwen2.5-3B-Instruct",
    max_seq_length = 2048,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)



In [53]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


In [54]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "我想寫一封投訴信畀某間公司，話佢哋嘅客服服務好差，話佢哋點樣對待我，仲有最後我想要咩補償.", # instruction
        """我早前喺貴公司買咗一部電腦，但係佢係壞嘅。我試過聯絡你哋嘅客服，但係佢哋嘅服務態度好差，冇人肯幫我解決問題。
        我想要一個退款或者換一部新嘅電腦。我喺 2022 年 8 月 28 日喺貴公司嘅官網買咗一部電腦。張單號係 1234567890。
        我喺 2022 年 8 月 31 日收到部電腦，但係發現佢係壞嘅。我即刻聯絡你哋嘅客服，佢哋叫我寄返部電腦畀佢哋。
        我喺 2022 年 9 月 5 日寄返部電腦畀佢哋。一直到 2022 年 9 月 15 日，我都仲未收到佢哋嘅答覆。
        我再打畀佢哋嘅客服追問，佢哋話佢哋仲未收到我寄返嘅電腦。我話畀佢哋知道我有張郵寄收據，可以證明我已經寄返部電腦畀佢哋。
        但係佢哋話佢哋唔接受郵寄收據，佢哋只接受佢哋自己嘅快遞單據。我話畀佢哋知道佢哋咁樣係唔合理嘅，但係佢哋堅持要我哋自己俾錢寄返部電腦畀佢哋。我感到非常無奈同埋憤怒。我希望貴公司可以畀一個退款或者換一部新嘅電腦畀我，並且作出合適嘅賠償。""",# input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256, use_cache=False)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
我想寫一封投訴信畀某間公司，話佢哋嘅客服服務好差，話佢哋點樣對待我，仲有最後我想要咩補償.

### Input:
我早前喺貴公司買咗一部電腦，但係佢係壞嘅。我試過聯絡你哋嘅客服，但係佢哋嘅服務態度好差，冇人肯幫我解決問題。
        我想要一個退款或者換一部新嘅電腦。我喺 2022 年 8 月 28 日喺貴公司嘅官網買咗一部電腦。張單號係 1234567890。
        我喺 2022 年 8 月 31 日收到部電腦，但係發現佢係壞嘅。我即刻聯絡你哋嘅客服，佢哋叫我寄返部電腦畀佢哋。
        我喺 2022 年 9 月 5 日寄返部電腦畀佢哋。一直到 2022 年 9 月 15 日，我都仲未收到佢哋嘅答覆。
        我再打畀佢哋嘅客服追問，佢哋話佢哋仲未收到我寄返嘅電腦。我話畀佢哋知道我有張郵寄收據，可以證明我已經寄返部電腦畀佢哋。
        但係佢哋話佢哋唔接受郵寄收據，佢哋只接受佢哋自己嘅快遞單據。我話畀佢哋知道佢哋咁樣係唔合理嘅，但係佢哋堅持要我哋自己俾錢寄返部電腦畀佢哋。我感到非常無奈同埋憤怒。我希望貴公司可以畀一個退款或者換一部新嘅電腦畀我，並且作出合適嘅賠償。

### Response:
尊敬的貴公司，

我衷心地寫這封信，希望能夠就貴公司最近提供的客服服務質量以及處理我購買產品過程中的不當行為表達我的嚴重失望和不滿意。

自從2022年8月28日在我們貴公司的網站上訂購了一台新電腦，但到現在仍然沒有得到解決問題的幫助。我多次聯繫了貴公司的客服，但每次都只能得到令人失望的服務態度。我認為貴公司的客服應該更專業地處理客戶的問題，並盡可能地為客戶提供更好的服務體驗。

在2022年8月31日，我收到了這台存在問題的電腦，並即時向貴公司的客服反映了這個問題。他們要求我將問題電腦寄回給他們，並提供了詳細的寄送地址。然而，我於2022年9月5日將問題電腦寄回了貴公司，但至今尚未收

In [46]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "喺下面嘅對話入邊，請扮演黃先生嘅角色，並作出合適嘅回應。黃太：我唔識煮飯喎！點做飯畀你食啊？", # instruction
        "黃太：我唔識煮飯喎！點做飯畀你食啊？", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128, use_cache=False)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
喺下面嘅對話入邊，請扮演黃先生嘅角色，並作出合適嘅回應。黃太：我唔識煮飯喎！點做飯畀你食啊？

### Input:
黃太：我唔識煮飯喎！點做飯畀你食啊？

### Response:
黃先生：好嘅，佢哋可以仲幫忙佢做飯喇。既緊，我會教佢幾多簡單嘅食譜，好容易就煮成一盤好嘅飯嚟喲！ 黃太：真的咩？真係可以呀！ 黃先生：當然可以啦，放心喎！阿太，可以仲幫幫我哋嘅阿嬤咪嗎？ 黃太：好呀，阿嬤咪好愛食你嘅飯啦！ 黃先生：好極喇，感謝你啦！


In [59]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "有人想買樓，佢想你畀啲意見佢，佢應該係香港邊區買樓會好啲", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256, use_cache=False)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
有人想買樓，佢想你畀啲意見佢，佢應該係香港邊區買樓會好啲

### Input:


### Response:
在決定在香港哪個區域購買樓盤時，可以考慮的因素有很多，包括交通便利性、生活配套設施、教育資源、工作機會以及未來發展前景等。以下是一些建議的區域：

1. **中環及大角咀**：這兩個地區是香港的金融中心，周圍有許多國際知名銀行和金融機構，對於投資者來說是一個不錯的選擇。此外，這裡的生活配套非常完善，交通方便。

2.格蘭地路及上環**：這裡的生活品質高，環境優美，鄰近多間學校，適合有小孩的家庭。另外，這兩區也靠近港鐵站，交通十分便利。

3. **沙田及荃灣**：這些地區相對較為安靜，適合家庭生活。這裡有不少新興發展項目，如沙田廣場和荃灣國際商業中心，未來發展潛力大。

4. **元朗及青衣**：元朗及青衣是香港人口較多的地區之一，生活成本相對較低。這兩個地方有大型的購物中心和公園，適合家庭生活。此外，元朗也有不少工業園區，提供多元化的就業機會。

5. **
