## Describe your model -> fine-tuned LLaMA 2
By Matt Shumer (https://twitter.com/mattshumer_)

The goal of this notebook is to experiment with a new way to make it very easy to build a task-specific model for your use-case.

First, use the best GPU available (go to Runtime -> change runtime type)

To create your model, just go to the first code cell, and describe the model you want to build in the prompt. Be descriptive and clear.

Select a temperature (high=creative, low=precise), and the number of training examples to generate to train the model. From there, just run all the cells.

You can change the model you want to fine-tune by changing `model_name` in the `Define Hyperparameters` cell.

#Data generation step

Write your prompt here. Make it as descriptive as possible!

Then, choose the temperature (between 0 and 1) to use when generating data. Lower values are great for precise tasks, like writing code, whereas larger values are better for creative tasks, like writing stories.

Finally, choose how many examples you want to generate. The more you generate, a) the longer it takes and b) the more expensive data generation will be. But generally, more examples will lead to a higher-quality model. 100 is usually the minimum to start.

In [1]:
from huggingface_hub import login
# Your Hugging Face token
HF_TOKEN = "########################"

# Authenticate with Hugging Face
login(token=HF_TOKEN)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [2]:
prompt = "You are an expert software developer in Rust. You always want to improve your code to have higher quality. You have to generate an output that follows good coding standards and makes a working program."

Run this to generate the dataset.

In [3]:
!pip install openai



In [4]:
import os
import random
from openai import AzureOpenAI

# Azure OpenAI configuration
api_key = '#################'
model = 'gpt-3.5-turbo'
api_version = '#######'
azure_endpoint = '###########################'  

client = AzureOpenAI(
    api_key=api_key,
    api_version=api_version,
    azure_endpoint=azure_endpoint
)

def generate_example(prompt, prev_examples, temperature=0.5):
    messages = [
        {
            "role": "system",
            "content": f"You are generating data which will be used to train a machine learning model.\n\nYou will be given a high-level description of the model we want to train, and from that, you will generate data samples, each with a prompt/response pair.\n\nYou will do so in this format:\n```\nprompt\n-----------\n$prompt_goes_here\n-----------\n\nresponse\n-----------\n$response_goes_here\n-----------\n```\n\nOnly one prompt/response pair should be generated per turn.\n\nFor each turn, make the example slightly more complex than the last, while ensuring diversity.\n\nMake sure your samples are unique and diverse, yet high-quality and complex enough to train a well-performing model.\n\nHere is the type of model we want to train:\n`{prompt}`"
        }
    ]

    if len(prev_examples) > 0:
        if len(prev_examples) > 10:
            prev_examples = random.sample(prev_examples, 10)
        for example in prev_examples:
            messages.append({
                "role": "assistant",
                "content": example
            })

    try:
        response = client.chat.completions.create(
            model="#####################", 
            messages=messages,
            temperature=temperature,
            max_tokens=1354
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

number_of_examples = 5  # or whatever number you want
temperature = 0.7  
prompt = "You are an expert software developer in Rust. You always want to improve your code to have higher quality. You have to generate an output that follows good coding standards and makes a working program."  # replace with your actual prompt

# Generate examples
prev_examples = []
for i in range(number_of_examples):
    print(f'Generating example {i}')
    example = generate_example(prompt, prev_examples, temperature)
    if example:
        prev_examples.append(example)
    else:
        print(f"Failed to generate example {i}")

print(prev_examples)

Generating example 0
Generating example 1
Generating example 2
Generating example 3
Generating example 4
['prompt\n-----------\nHow can I implement a basic HTTP server in Rust using the Hyper crate?\n-----------\n\nresponse\n-----------\nHere\'s a simple example of an HTTP server implemented in Rust using the Hyper crate:\n\n```rust\nuse hyper::service::{make_service_fn, service_fn};\nuse hyper::{Body, Request, Response, Server};\nuse std::net::SocketAddr;\n\nasync fn handle_request(_req: Request<Body>) -> Result<Response<Body>, hyper::Error> {\n    Ok(Response::new(Body::from("Hello, World!")))\n}\n\n#[tokio::main]\nasync fn main() {\n    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));\n\n    let make_svc = make_service_fn(|_conn| {\n        async { Ok::<_, hyper::Error>(service_fn(handle_request)) }\n    });\n\n    let server = Server::bind(&addr).serve(make_svc);\n    \n    println!("Listening on http://{}", addr);\n\n    if let Err(e) = server.await {\n        eprintln!("serve

We also need to generate a system message.

In [5]:
def generate_system_message(prompt):
    try:
        response = client.chat.completions.create(
            model="aoi-pickme-4o-driverReg-dev-001",
            messages=[
                {
                    "role": "system",
                    "content": "You will be given a high-level description of the model we are training, and from that, you will generate a simple system prompt for that model to use. Remember, you are not generating the system message for data generation -- you are generating the system message to use for inference. A good format to follow is `Given $INPUT_DATA, you will $WHAT_THE_MODEL_SHOULD_DO`.\n\nMake it as concise as possible. Include nothing but the system prompt in your response.\n\nFor example, never write: `\"$SYSTEM_PROMPT_HERE\"`.\n\nIt should be like: `$SYSTEM_PROMPT_HERE`."
                },
                {
                    "role": "user",
                    "content": prompt.strip(),
                }
            ],
            temperature=temperature,
            max_tokens=500,
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"An error occurred: {e}")
        return None


temperature = 0.7  

system_message = generate_system_message(prompt)

if system_message:
    print(f'The system message is: `{system_message}`. Feel free to re-run this cell if you want a better result.')
else:
    print("Failed to generate system message.")

The system message is: `Given the provided code, you will improve it to follow good coding standards and ensure it is a working Rust program.`. Feel free to re-run this cell if you want a better result.


Now let's put our examples into a dataframe and turn them into a final pair of datasets.

In [6]:
import pandas as pd


prompts = []
responses = []

# Parse out prompts and responses from examples
for example in prev_examples:
  try:
    split_example = example.split('-----------')
    prompts.append(split_example[1].strip())
    responses.append(split_example[3].strip())
  except:
    pass

# Create a DataFrame
df = pd.DataFrame({
    'prompt': prompts,
    'response': responses
})

# Remove duplicates
df = df.drop_duplicates()

print('There are ' + str(len(df)) + ' successfully-generated examples. Here are the first few:')

df.head()

There are 5 successfully-generated examples. Here are the first few:


Unnamed: 0,prompt,response
0,How can I implement a basic HTTP server in Rus...,Here's a simple example of an HTTP server impl...
1,Can you show me how to create and use custom d...,Certainly! Here's an example of creating a cus...
2,How can I use the `tokio` crate to perform asy...,Certainly! Below is an example of using the `t...
3,Can you provide an example of how to use the S...,Certainly! Here's an example demonstrating how...
4,Can you show me how to use Rust's `std::thread...,Certainly! Here's an example demonstrating how...


Split into train and test sets.

In [7]:
# Split the data into train and test sets, with 90% in the train set
train_df = df.sample(frac=0.9, random_state=42)
test_df = df.drop(train_df.index)

# Save the dataframes to .jsonl files
train_df.to_json('train.jsonl', orient='records', lines=True)
test_df.to_json('test.jsonl', orient='records', lines=True)

# Install necessary libraries

In [8]:
!pip install pyarrow==16.1.0
!pip install --upgrade bigframes ibis-framework sqlglot
!pip install cudf-cu12=24.8.3
!pip install -U datasets

[31mERROR: Invalid requirement: 'pyarrow=16.1.0': Expected end or semicolon (after name and no valid version specifier)
    pyarrow=16.1.0
           ^
Hint: = is not a valid operator. Did you mean == ?[0m[31m
Collecting ibis-framework
  Using cached ibis_framework-9.5.0-py3-none-any.whl.metadata (17 kB)
Collecting sqlglot
  Using cached sqlglot-25.24.0-py3-none-any.whl.metadata (19 kB)
[31mERROR: Invalid requirement: 'cudf-cu12=24.8.3': Expected end or semicolon (after name and no valid version specifier)
    cudf-cu12=24.8.3
             ^
Hint: = is not a valid operator. Did you mean == ?[0m[31m


In [11]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

# Define Hyperparameters

In [13]:
model_name = "meta-llama/Llama-2-7b-chat-hf" # use this if you have access to the official LLaMA 2 model "meta-llama/Llama-2-7b-chat-hf", though keep in mind you'll need to pass a Hugging Face key argument
dataset_name = "/content/train.jsonl"
new_model = "llama-2-7b-custom"
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
output_dir = "./results"
num_train_epochs = 1
fp16 = False
bf16 = False
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 1
gradient_checkpointing = True
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "constant"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
save_steps = 25
logging_steps = 5
max_seq_length = None
packing = False
device_map = {"": 0}

#Load Datasets and Train

In [14]:
# Load datasets
train_dataset = load_dataset('json', data_files='/content/train.jsonl', split="train", cache_dir=None)

valid_dataset = load_dataset('json', data_files='/content/test.jsonl', split="train", cache_dir='/tmp/dataset_cache')

# Preprocess datasets
train_dataset_mapped = train_dataset.map(lambda examples: {'text': [f'[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n' + prompt + ' [/INST] ' + response for prompt, response in zip(examples['prompt'], examples['response'])]}, batched=True)
valid_dataset_mapped = valid_dataset.map(lambda examples: {'text': [f'[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n' + prompt + ' [/INST] ' + response for prompt, response in zip(examples['prompt'], examples['response'])]}, batched=True)

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)
model.config.use_cache = False
model.config.pretraining_tp = 1
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)
# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="all",
    evaluation_strategy="steps",
    eval_steps=5  # Evaluate every 20 steps
)
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset_mapped,
    eval_dataset=valid_dataset_mapped,  # Pass validation dataset here
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
trainer.model.save_pretrained(new_model)

# Cell 4: Test the model
logging.set_verbosity(logging.CRITICAL)
prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\nWrite a function that reverses a string. [/INST]" # replace the command here with something relevant to your task
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(prompt)
print(result[0]['generated_text'])

Generating train split: 0 examples [00:00, ? examples/s]

Generating train split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/4 [00:00<?, ? examples/s]

Map:   0%|          | 0/1 [00:00<?, ? examples/s]

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]



Map:   0%|          | 0/4 [00:00<?, ? examples/s]

Map:   0%|          | 0/1 [00:00<?, ? examples/s]

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


Step,Training Loss,Validation Loss




[INST] <<SYS>>
Given the provided code, you will improve it to follow good coding standards and ensure it is a working Rust program.
<</SYS>>

Write a function that reverses a string. [/INST]  Sure! Here's an improved version of the code that follows good coding standards and includes a function to reverse a string:
```
use std::fmt::Display;

fn reverse_string(s: &str) -> String {
    // Implement the reverse function here
    let mut rev = String::new();
    for c in s.char_at_a_time() {
        rev.push(s.char_at_a_time().rev().next().unwrap());
    }
    rev
}

fn main() {
    let s = "hello";
    println!("The reversed string is: {}", reverse_string(&


#Run Inference

In [15]:
from transformers import pipeline

prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\nWrite a function that reverses a string. [/INST]" # replace the command here with something relevant to your task
num_new_tokens = 100  # change to the number of new tokens you want to generate

# Count the number of tokens in the prompt
num_prompt_tokens = len(tokenizer(prompt)['input_ids'])

# Calculate the maximum length for the generation
max_length = num_prompt_tokens + num_new_tokens

gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
result = gen(prompt)
print(result[0]['generated_text'].replace(prompt, ''))

  Sure! Here's an improved version of the code that follows good coding standards and includes a function to reverse a string:
```
use std::fmt::Display;

fn reverse_string(s: &str) -> String {
    // Implement the reverse function here
    let mut rev = String::new();
    for c in s.char_at_a_time() {
        rev.push(s.char_at_a_time


In [16]:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"


In [17]:
import torch
torch.cuda.empty_cache()


#Merge the model and store it

In [18]:
# Merge and save the fine-tuned model

model_path = "/content/llama-2-7b-custom"  

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Save the merged model
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

  adapters_weights = torch.load(


('/content/llama-2-7b-custom/tokenizer_config.json',
 '/content/llama-2-7b-custom/special_tokens_map.json',
 '/content/llama-2-7b-custom/tokenizer.model',
 '/content/llama-2-7b-custom/added_tokens.json',
 '/content/llama-2-7b-custom/tokenizer.json')

# Load a fine-tuned model from Location and run inference

In [19]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/content/llama-2-7b-custom"  # change to the path where your model is saved

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # Use half-precision for memory efficiency
    device_map="auto"  # Automatically distribute model across available GPUs
)
# model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

  return torch.load(checkpoint_file, map_location="cpu")


In [24]:
from transformers import pipeline, set_seed

# Set a seed for reproducibility
set_seed(42)

# Initialize the generator
gen = pipeline('text-generation', model=model, tokenizer=tokenizer)

prompt = "Create a calculator App from Rust that i can run on the CLI"
params = {
    "max_new_tokens": 10000,  # Adjust this number as needed
    "num_return_sequences": 1,
    "no_repeat_ngram_size": 2,
    "do_sample": True,
    "temperature": 0.7
}

# Generate text
result = gen(prompt, **params)

# Print the generated text
print(result[0]['generated_text'])

Create a calculator App from Rust that i can run on the CLI.

I have created a simple calculators that can perform basic mathematical operations such as addition,subtraction, multiplication and division. 
My question is how can i create a CLI application that will run the calculater and allow the user to input numbers and perform calculations. I am very new to rust and i am not sure how to go about this. any guidance will be greatly appreciated. Below is my code for the simple calculator
```
fn main() {
    println!("Welcome to the Simple Calculator!");
    
   let mut num1 = String::new();
  letmut num2 =String:: new();  // create two variables for user input
let mut result =  String ::new (); //create a variable for result
     
 // prompt user for input 1 and 2
print!("Enter the first number: ");
std::io::stdin().read_line(&mutnum1).unwrap(); // read user's input into num  variable
println!(" Enter the second number : ");  std:: io:: stdin().readlined(& mutnum2).unck();//read user's