In [1]:
from unsloth import FastLanguageModel
import torch
import os

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [2]:
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth

In [3]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-v0.3", # "unsloth/mistral-7b" for 16bit loading
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2025.3.6: Fast Mistral patching. Transformers: 4.48.3.
   \\   /|    NVIDIA GeForce RTX 4070. Num GPUs = 1. Max memory: 11.719 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 128, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head",], # Add for continual pretraining
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth: Offloading input_embeddings to disk to save VRAM
Unsloth: Offloading output_embeddings to disk to save VRAM


Unsloth 2025.3.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


Unsloth: Training embed_tokens in mixed precision to save VRAM
Unsloth: Training lm_head in mixed precision to save VRAM
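
With `use_rslora = True`, rank-stabilized LoRA scales the adapter update by `lora_alpha / sqrt(r)` instead of the standard `lora_alpha / r`. The sketch below only works through that arithmetic for the values in the cell above. It is plain Python and does not call Unsloth or PEFT, and the helper name `lora_scaling` is hypothetical.

```python
import math

def lora_scaling(lora_alpha: float, r: int, use_rslora: bool) -> float:
    """Return the multiplier applied to the low-rank update B @ A."""
    if use_rslora:
        return lora_alpha / math.sqrt(r)  # rank-stabilized LoRA
    return lora_alpha / r                 # standard LoRA

standard = lora_scaling(32, 128, use_rslora=False)
rank_stabilized = lora_scaling(32, 128, use_rslora=True)
print(f"standard LoRA scaling: {standard:.4f}")
print(f"rsLoRA scaling:        {rank_stabilized:.4f}")
```

At `r = 128` the rank-stabilized factor is about 11 times the standard one, so `lora_alpha` and the learning rate may need different values than in a plain LoRA run.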


In [9]:
def read_markdown_files(directory):
    markdown_contents = []

    for root, _, files in os.walk(directory):
        for file in files:
            if file.endswith(".md"):
                file_path = os.path.join(root, file)
                try:
                    with open(file_path, "r", encoding="utf-8", errors='replace') as f:
                        content = f.read()
                        markdown_contents.append(content)
                except Exception as e:
                    print(f"Error reading file {file_path}: {e}")

    return markdown_contents
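
`os.walk` yields directory entries in an order that depends on the file system, so an index such as `docs_content[2]` may point at a different document on another machine. A minimal sketch of a reproducible variant, assuming sorted path order is acceptable; the name `read_markdown_files_sorted` and the demo file names are hypothetical:

```python
import tempfile
from pathlib import Path

def read_markdown_files_sorted(directory):
    """Return the text of every .md file under `directory`, in sorted path order."""
    contents = []
    for path in sorted(Path(directory).rglob("*.md")):
        try:
            contents.append(path.read_text(encoding="utf-8", errors="replace"))
        except OSError as e:
            print(f"Error reading file {path}: {e}")
    return contents

# Demo on a throwaway directory tree (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "b.md").write_text("# B", encoding="utf-8")
    (root / "sub" / "a.md").write_text("# A", encoding="utf-8")
    (root / "notes.txt").write_text("ignored", encoding="utf-8")
    demo_docs = read_markdown_files_sorted(root)

print(demo_docs)
```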

In [10]:
# with open("../documentation/basics/GeneralTaskKnowledge/Cutting/Cutting_Locations.md") as f:
#     data1 = f.read()

docs_content = read_markdown_files(os.path.join(os.curdir, "../documentation"))

In [16]:
# from datasets import load_dataset
# dataset = load_dataset("roneneldan/TinyStories", split = "train[:100]")

print(docs_content[2])

Can usability models help robots learn and improve interaction quality through task data and user feedback?Yes, usability models can indeed support learning in robots to enhance usability over time. In fact, considering usability during the design and development of robotic systems is of paramount importance.

## Use of Usability Models in Robots

Usability models can serve as a guide for how robots should interact with users, helping them to achieve their goals efficiently, effectively, and with a high degree of satisfaction. Over time, as the robot interacts more with its users, it can use these models to learn and adapt itself to better meet the user's needs.

## Task Data & User Feedback

### Task Data

Task data refers to the specific tasks that the robot is programmed to perform. This data can include information on the success or failure of tasks, the time taken to complete tasks, and other metrics that are relevant to the task's performance.

Robots can use this data to improve

In [17]:
from datasets import Dataset

# Wrap the raw markdown strings in a single-column Hugging Face Dataset.
data = {"text": docs_content}
datasets = Dataset.from_dict(data)

In [18]:
EOS_TOKEN = tokenizer.eos_token

In [19]:
def formatting_prompts_func(examples):
    return { "text" : [example + EOS_TOKEN for example in examples["text"]] }
datasets = datasets.map(formatting_prompts_func, batched = True,)
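
With `batched = True`, `map` passes the function a dict that maps each column name to a list of values, and expects a dict of lists back. A small sketch of the same EOS-appending step on a toy batch, without the tokenizer; the `"</s>"` string and the name `append_eos` are placeholders, since the notebook takes the real value from `tokenizer.eos_token`:

```python
EOS = "</s>"  # assumed placeholder; the notebook uses tokenizer.eos_token

def append_eos(examples, eos_token=EOS):
    """Batched map function: add the EOS marker to every text in the batch."""
    return {"text": [text + eos_token for text in examples["text"]]}

batch = {"text": ["# Doc one", "# Doc two"]}
formatted = append_eos(batch)
print(formatted["text"])
```

Without the EOS marker the model never sees where a document ends, so generation after fine-tuning may not stop on its own.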

Map:   0%|          | 0/551 [00:00<?, ? examples/s]

In [20]:
for row in datasets[:2]["text"]:
    print("=========================")
    print(row)

**How do usability models like Nielsen's Heuristics support heuristic evaluation, particularly through principles such as visibility of system status and error prevention?**Usability models provide a solid foundation for conducting heuristic evaluations in user-interface design. These models are instrumental in identifying possible usability problems in the design that might lead to user confusion or errors.

One widely-accepted usability model is Nielsen’s Usability Heuristics. This model consists of 10 simple, yet powerful rules of thumb, or "heuristics," for interactive design.

## Nielsen’s Usability Heuristics

Here are those principles with particular emphasis on "visibility of system status" and "error prevention":

1. **Visibility of system status**: This heuristic principle states that the design should always keep the users informed about what's going on, through appropriate feedback within reasonable time. By providing real-time status updates, users can understand the conte

In [22]:
from unsloth import is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = datasets,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 8,

    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,

        warmup_ratio = 0.1,
        num_train_epochs = 5, # Ignored here: max_steps > 0 takes precedence

        learning_rate = 5e-5,
        embedding_learning_rate = 5e-6,
        max_steps = 100, # Caps training at 100 optimizer steps
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.00,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Tokenizing to ["text"] (num_proc=8):   0%|          | 0/551 [00:00<?, ? examples/s]
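
The arguments above set both `num_train_epochs = 5` and `max_steps = 100`. In the Hugging Face `Trainer`, a positive `max_steps` takes precedence, so the run length follows from the step budget and the effective batch size. A rough sketch of that arithmetic, assuming the single GPU shown in the load banner and that every optimizer step consumes a full effective batch:

```python
import math

num_examples = 551               # from the tokenization progress bar
per_device_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 1                     # assumption: single-GPU run
max_steps = 100

effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = max_steps * effective_batch
passes_over_data = examples_seen / num_examples

print(f"effective batch size: {effective_batch}")
print(f"examples seen in {max_steps} steps: {examples_seen}")
print(f"passes over the data: {passes_over_data:.2f} "
      f"(rounded up: {math.ceil(passes_over_data)})")
```

The rounded-up value agrees with the `Num Epochs = 3` line in the training banner, not with the requested 5 epochs.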

In [23]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 551 | Num Epochs = 3 | Total steps = 100
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 8 x 1) = 16
 "-____-"     Trainable parameters = 603,979,776/4,362,342,400 (13.85% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,1.325
2,1.3735
3,1.2958
4,1.2262
5,1.2966
6,1.2613
7,1.252
8,1.249
9,1.2449
10,1.1438
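
The trainable-parameter count in the banner can be reproduced by hand. The sketch below assumes the published Mistral-7B-v0.3 dimensions (hidden size 4096, key/value projection width 1024, MLP width 14336, vocabulary 32768, 32 layers), none of which are stated in this notebook. It also assumes that `embed_tokens` and `lm_head` are trained as full matrices rather than through low-rank adapters. The helper `lora_params` is hypothetical.

```python
# Assumed Mistral-7B-v0.3 dimensions (not stated in the notebook).
hidden = 4096
kv_dim = 1024            # 8 key/value heads x 128 head dim
intermediate = 14336
vocab = 32768
num_layers = 32
r = 128                  # LoRA rank from get_peft_model

def lora_params(in_features, out_features, rank=r):
    """A (rank x in) plus B (out x rank) parameters for one adapted linear layer."""
    return rank * (in_features + out_features)

per_layer = (
    lora_params(hidden, hidden)          # q_proj
    + lora_params(hidden, kv_dim)        # k_proj
    + lora_params(hidden, kv_dim)        # v_proj
    + lora_params(hidden, hidden)        # o_proj
    + lora_params(hidden, intermediate)  # gate_proj
    + lora_params(hidden, intermediate)  # up_proj
    + lora_params(intermediate, hidden)  # down_proj
)
adapters = per_layer * num_layers
full_matrices = 2 * vocab * hidden       # embed_tokens and lm_head
trainable = adapters + full_matrices

print(f"{trainable:,}")
print(f"{100 * trainable / 4_362_342_400:.2f}%")
```

Under these assumptions the two full embedding matrices account for roughly 44% of the trainable parameters, which is why the trained share is far larger than in a typical adapter-only LoRA run.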


## Inference

In [29]:
import textwrap
from threading import Thread
from transformers import TextIteratorStreamer

text_streamer = TextIteratorStreamer(tokenizer)
max_print_width = 100

inputs = tokenizer(
    [
        "what are the flanagan motion phases involved in the task- cut the apple, give it in json format, no further explanation is needed"
    ],
    return_tensors = "pt",
).to("cuda")

generation_kwargs = dict(
    inputs,
    streamer = text_streamer,
    max_new_tokens = 1024,
    use_cache = True,
)
thread = Thread(target = model.generate, kwargs = generation_kwargs)
thread.start()

# Print the stream, starting a new line roughly every max_print_width characters.
length = 0
for j, new_text in enumerate(text_streamer):
    if j == 0:
        # The first chunk carries the echoed prompt; wrap it as a whole.
        wrapped_text = textwrap.wrap(new_text, width = max_print_width)
        length = len(wrapped_text[-1]) if wrapped_text else 0
        wrapped_text = "\n".join(wrapped_text)
        print(wrapped_text, end = "")
    else:
        length += len(new_text)
        if length >= max_print_width:
            length = 0
            print()
        print(new_text, end = "")
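
The cell above runs `model.generate` in a background thread and reads text from the streamer in the main thread. The same producer/consumer pattern can be tried without a GPU. In this sketch a `queue.Queue` stands in for the streamer, `fake_generate` stands in for the model, and the text chunks are made up. Only the line-breaking rule from the `else` branch is reproduced; the `textwrap` handling of the first chunk is left out.

```python
import queue
import threading

def fake_generate(out_queue, chunks):
    """Stand-in for model.generate: push text chunks, then a sentinel."""
    for chunk in chunks:
        out_queue.put(chunk)
    out_queue.put(None)  # sentinel marks the end of the stream

def consume(out_queue, width):
    """Collect chunks, inserting a newline once a line reaches `width` characters."""
    pieces, length = [], 0
    while True:
        chunk = out_queue.get()
        if chunk is None:
            break
        length += len(chunk)
        if length >= width:
            length = 0
            pieces.append("\n")
        pieces.append(chunk)
    return "".join(pieces)

chunks = ["The ", "quick ", "brown ", "fox ", "jumps"]  # made-up tokens
q = queue.Queue()
worker = threading.Thread(target=fake_generate, args=(q, chunks))
worker.start()
streamed = consume(q, width=12)
worker.join()
print(streamed)
```

As in the notebook loop, the chunk that triggers a line break is not counted toward the next line, so printed lines can run slightly past the nominal width.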

<s> what are the flanagan motion phases involved in the task- cut the apple, give it in json format,
no further explanation isneeded.The Flanagan Action Model is a conceptual framework used to 
describe and analyze the components of everyday manipulation actions. It breaks down actions into several 
phases, each with distinct characteristics and goals. When applied to the task "cut the apple," the 
Flanagan Action Model identifies the following phases:

1. **Preparation Phase**:
  - **Goal**: Prepare the 
tools and environment for the action.
  - **Components**:
    - Identifying the apple and ensuring it is 
within reach.
    - Retrieving the cutting tool (knife).
    - Ensuring a stable and safe cutting surface 
(e.g., cutting board).

2. **Approach Phase**:
  - **Goal**: Position oneself and the tools to initiate 
the cutting action.
  - **Components**:
    - Moving towards the apple and cutting board.
    - 
Aligning the knife with the intended cutting line on the apple.
    - Adju