<a href="https://colab.research.google.com/github/akhilvjose/AI-SE-Planning/blob/main/Llama_3_2__Unsloth_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install

In [2]:
%%capture
# Normally `pip install unsloth` is enough.

# As of Jan 31st 2025, Colab temporarily has some issues with PyTorch.
# `pip install unsloth` takes about 3 minutes there, whilst the commands below take <1 minute:
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29 peft trl triton
!pip install --no-deps cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
!pip install --no-deps unsloth

# Unsloth

In [3]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
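The `dtype` comment above says that `None` means auto detection, with float16 for Tesla T4 and V100 and bfloat16 for Ampere and newer GPUs. As a rough sketch of that rule (not Unsloth's actual detection code), the choice can be expressed in terms of the CUDA compute capability, assuming Ampere corresponds to major version 8. The function name `pick_dtype_name` is hypothetical.

```python
def pick_dtype_name(compute_capability):
    """Return the dtype name suggested by the comment above.

    compute_capability is a (major, minor) tuple, e.g. (7, 5) for a Tesla T4.
    Assumption: bfloat16 is used from major version 8 (Ampere) upwards.
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


# Tesla T4 (7.5) and V100 (7.0) fall back to float16; an 8.0 device gets bfloat16.
for name, capability in [("Tesla T4", (7, 5)), ("V100", (7, 0)), ("Ampere 8.0", (8, 0))]:
    print(name, "->", pick_dtype_name(capability))
```

On a machine with a GPU, `torch.cuda.get_device_capability()` returns a tuple of this form, so it could be passed straight to the helper.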


In [4]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2025.1.8: Fast Llama patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/2.35G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/54.7k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

In [5]:
FastLanguageModel.for_inference(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 3072, padding_idx=128004)
    (layers): ModuleList(
      (0): LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=3072, out_features=3072, bias=False)
          (k_proj): Linear4bit(in_features=3072, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=3072, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=3072, out_features=3072, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=3072, out_features=8192, bias=False)
          (up_proj): Linear4bit(in_features=3072, out_features=8192, bias=False)
          (down_proj): Linear4bit(in_features=8192, out_features=3072, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((3072,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((3072,)
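
The printed shapes are enough for a back-of-the-envelope size estimate. The sketch below counts the weights in the seven `Linear4bit` modules of one decoder layer and in the embedding table, using only the numbers shown in the printout. The byte figure assumes two 4-bit weights are packed per byte and ignores quantization constants, so it is a lower bound rather than the exact footprint. The helper names are hypothetical.

```python
# (in_features, out_features) of each Linear4bit in one decoder layer,
# copied from the model printout above. All have bias=False.
LAYER_LINEARS = {
    "q_proj": (3072, 3072),
    "k_proj": (3072, 1024),
    "v_proj": (3072, 1024),
    "o_proj": (3072, 3072),
    "gate_proj": (3072, 8192),
    "up_proj": (3072, 8192),
    "down_proj": (8192, 3072),
}


def linear_params(shapes):
    """Total number of weights across the given linear layers."""
    return sum(n_in * n_out for n_in, n_out in shapes.values())


def packed_4bit_bytes(n_params):
    """Bytes needed if two 4-bit weights share one byte (assumption)."""
    return n_params // 2


per_layer = linear_params(LAYER_LINEARS)
embedding = 128256 * 3072  # Embedding(128256, 3072) from the printout

print("weights per decoder layer:", per_layer)
print("4-bit bytes per decoder layer:", packed_4bit_bytes(per_layer))
print("embedding weights:", embedding)
```

The `k_proj` and `v_proj` outputs are a third the width of `q_proj`, which is why the attention block contributes fewer weights than four full 3072 x 3072 projections would.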

Define the prompt and set up inference
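
The next cell fills an instruction/input/response template and prints the decoded output, which includes the prompt itself. As a minimal, model-free sketch of that step, the helpers below build a prompt in the same three-part layout and show one way to drop the echoed prompt from the decoded text. `build_prompt` and `strip_prompt` are hypothetical names, not part of Unsloth or Transformers, and the `decoded` string is a stand-in for what `tokenizer.decode` would return.

```python
PROMPT_TEMPLATE = "### Instruction: {instruction}\n### Input: {input}\n### Response:"


def build_prompt(instruction, user_input):
    """Fill both placeholders of the template."""
    return PROMPT_TEMPLATE.format(instruction=instruction, input=user_input)


def strip_prompt(decoded, prompt):
    """Return only the text that follows the echoed prompt, if it is present."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    return decoded.strip()


prompt = build_prompt(
    "you are a system engineer who knows how to write requirements",
    "write a requirement for ABS",
)
decoded = prompt + " Here is a requirement for ABS"  # stand-in for the decoded model output
print(strip_prompt(decoded, prompt))
```

An alternative that avoids string matching is to slice the generated token IDs past the prompt length before decoding.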

In [14]:
alpaca_prompt = "### Instruction: {instruction}\n### Input: {input}\n### Response:"
instruction = "you are a system engineer who knows how to write requirements"
user_input = "write a requirement for ABS"  # named user_input to avoid shadowing the built-in input()

# Fill both placeholders in the template

formatted_prompt = alpaca_prompt.format(instruction=instruction, input=user_input)

# Tokenize the formatted prompt and move to the same device as the model
input_ids = tokenizer(formatted_prompt, return_tensors="pt").input_ids.to(model.device)

# Generate output
output = model.generate(input_ids, max_new_tokens=200)


# Decode the generated token IDs (the prompt is included) and print
print(tokenizer.decode(output[0], skip_special_tokens=True))

### Instruction: you are a system engineer who knows how to write requirements
### Input: write a requirement for ABS
### Response: Here is a requirement for ABS

**Functional Requirement: Automatic Braking System (ABS)**

The Automatic Braking System (ABS) shall be designed to prevent wheel lockup and maintain traction during hard braking maneuvers, thereby ensuring a safer and more controlled stopping distance.

**Functional Requirements:**

1. **Wheel Speed Monitoring**: The ABS system shall continuously monitor the speed of each wheel and detect any wheel lockup or loss of traction.
2. **Anti-locking Algorithm**: The ABS system shall employ an anti-locking algorithm to rapidly apply and release the brakes to each wheel to maintain traction and prevent wheel lockup.
3. **Braking Control**: The ABS system shall control the braking process to ensure that each wheel is slowed down gradually and evenly, without causing excessive wear on the brakes or wheels.
4. **Driver Feedback**: The 