To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our GitHub page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to [prepare data](#Data), [train](#Train), [run the model](#Inference), and [save it](#Save) (e.g. for llama.cpp).

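This notebook uses the `Llama-3` format for conversation-style finetunes. The data follows the ShareGPT style used by [Open Assistant conversations](https://huggingface.co/datasets/philschmid/guanaco-sharegpt-style); the training rows themselves are loaded from a local CSV file in the Data Prep section.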

In [1]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
* [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)

In [2]:
import pandas as pd

In [3]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


config.json:   0%|          | 0.00/1.15k [00:00<?, ?B/s]

==((====))==  Unsloth: Fast Llama patching release 2024.6
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/131 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/51.1k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/459 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
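To see why 4-bit loading matters on a T4, here is a back-of-the-envelope estimate of weight memory. It assumes roughly 8.03 billion parameters for Llama-3-8B (a figure taken from the model card, not from this notebook) and compares against the 14.748 GB the T4 reports above. Activations, optimizer state and quantization overhead are ignored, so treat the numbers as lower bounds.

```python
# Rough weight-memory estimate; the parameter count is an assumption.
n_params = 8.03e9            # approximate size of Llama-3-8B (assumed)
gib = 1024 ** 3
t4_memory_gib = 14.748       # "Max memory" reported in the log above

fp16_gib = n_params * 2 / gib        # 2 bytes per weight
four_bit_gib = n_params * 0.5 / gib  # 0.5 bytes per weight, ignoring overhead

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f"4-bit weights:  {four_bit_gib:.2f} GiB")
print("16-bit weights alone fit on the T4:", fp16_gib < t4_memory_gib)
```

The downloaded checkpoint (5.70G) is larger than the pure 4-bit estimate, most likely because some tensors, such as the embeddings, stay in 16-bit.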


We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
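As a sanity check on the adapter size, the sketch below counts the LoRA parameters by hand. The layer dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers) are assumptions taken from the published Llama-3-8B configuration, not values read from the loaded model.

```python
# LoRA adds two matrices per target module: A (r x d_in) and B (d_out x r),
# so each module contributes r * (d_in + d_out) trainable parameters.
r = 16
hidden, intermediate = 4096, 14336   # assumed Llama-3-8B sizes
kv_dim = 8 * 128                     # 8 key/value heads of dimension 128 (assumed)
n_layers = 32

# (d_in, d_out) for each target module in one decoder layer
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * n_layers
print(f"{total:,}")  # 41,943,040
```

This agrees with the "Number of trainable parameters" that the trainer prints later in the notebook. At `r = 16` that is well under 1% of an 8-billion-parameter model; larger ranks raise the share proportionally.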


<a name="Data"></a>
### Data Prep
We now use the `Llama-3` format for conversation-style finetunes. The data follows the ShareGPT style used by [Open Assistant conversations](https://huggingface.co/datasets/philschmid/guanaco-sharegpt-style); the training rows themselves are loaded below from a local CSV file and converted to that structure. Llama-3 renders multi-turn conversations like this:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hey there! How are you?<|eot_id|><|start_header_id|>user<|end_header_id|>

I'm great thanks!<|eot_id|>
```
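The layout above can be reproduced with a few lines of string formatting. This is a hand-rolled sketch for illustration only; `render_llama3` is a hypothetical helper, and the tokenizer's real Jinja template may differ in details such as whitespace trimming.

```python
def render_llama3(messages, add_generation_prompt=False):
    """Render role/content messages in the Llama-3 chat layout (illustrative)."""
    text = "<|begin_of_text|>"
    for message in messages:
        # Each turn: header with the role, blank line, content, end-of-turn token
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

chat = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hey there! How are you?"},
    {"role": "user", "content": "I'm great thanks!"},
]
rendered = render_llama3(chat)
print(rendered)
```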

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old` and our own optimized `unsloth` template.

Note ShareGPT uses `{"from": "human", "value" : "Hi"}` and not `{"role": "user", "content" : "Hi"}`, so we use `mapping` to map it.
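Conceptually, the `mapping` argument describes a key and role rename. A minimal sketch of that rename in plain Python follows; `sharegpt_to_role_content` is a hypothetical helper written for this illustration, not Unsloth's implementation.

```python
ROLE_MAP = {"human": "user", "gpt": "assistant"}

def sharegpt_to_role_content(conversation):
    """Convert ShareGPT turns ("from"/"value") to role/content turns."""
    return [
        {
            # Unknown speakers are passed through unchanged
            "role": ROLE_MAP.get(turn["from"], turn["from"]),
            "content": turn["value"],
        }
        for turn in conversation
    ]

example = [{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello!"}]
converted = sharegpt_to_role_content(example)
print(converted)
```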

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

In [5]:
train_data = pd.read_csv("train_data_finetuning.csv", index_col=False).drop(columns=['Unnamed: 0'])

In [6]:
train_data.head()

Unnamed: 0,User Story,instruction,paraphrased version,prompt,response
0,"As an economist, I want to use hierarchical cl...",increase number of total characters,"As a data-driven economist, I desire to employ...",Based on the following instruction: increase ...,"As a data-driven economist, I desire to employ..."
1,"As an economist, I want to use hierarchical cl...",decrease number of total characters,"""As an economist, I'd like to apply hierarchic...",Based on the following instruction: decrease ...,"""As an economist, I'd like to apply hierarchic..."
2,"As an economist, I want to use hierarchical cl...",increase number of uppercase characters,"**AS AN ECONOMIST**, **I WISH TO EMPLOY HIERAR...",Based on the following instruction: increase ...,"**AS AN ECONOMIST**, **I WISH TO EMPLOY HIERAR..."
3,"As an economist, I want to use hierarchical cl...",decrease number of uppercase characters,"As an economist, i want to organize similar in...",Based on the following instruction: decrease ...,"As an economist, i want to organize similar in..."
4,"As an economist, I want to use hierarchical cl...",don't change number of uppercase characters,"As an expert in economic analysis, I wish to e...",Based on the following instruction: don't chan...,"As an expert in economic analysis, I wish to e..."


In [7]:
def convert_to_conversations(row):
    return [{'from': "human", 'value': row['prompt']}, {"from": "gpt", "value": row["response"]}]

# Apply the function to each row in the DataFrame and create a new column 'conversations'
train_data['conversations'] = train_data.apply(convert_to_conversations, axis=1)

In [8]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass

from datasets import load_dataset, Dataset
dataset = Dataset.from_pandas(train_data)
dataset = dataset.map(formatting_prompts_func, batched = True,)

Map:   0%|          | 0/270 [00:00<?, ? examples/s]

Let's see how the `Llama-3` format works by printing the element at index 5

In [9]:
dataset[5]["conversations"]

[{'from': 'human',
  'value': 'Based on the following instruction: decrease  number of lowercase characters. Paraphrase the following user story and output only paraphrased version: As an economist, I want to use hierarchical clustering to group similar economic sectors and industries based on the financial and economic indicators of business data to improve the accuracy and efficiency of economic analysis and prediction.'},
 {'from': 'gpt',
  'value': 'As econ expert, I aim to classify econ sectors using cluster algo based on biz stats to enhance analytic acuracy & speed.'}]

In [10]:
print(dataset[5]["text"])

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Based on the following instruction: decrease  number of lowercase characters. Paraphrase the following user story and output only paraphrased version: As an economist, I want to use hierarchical clustering to group similar economic sectors and industries based on the financial and economic indicators of business data to improve the accuracy and efficiency of economic analysis and prediction.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

As econ expert, I aim to classify econ sectors using cluster algo based on biz stats to enhance analytic acuracy & speed.<|eot_id|>


You can also define your own chat template, which must be written in Jinja. We provide a stripped-down `Unsloth template` that we find more efficient; it borrows from the ChatML, Zephyr and Alpaca styles.

More info on chat templates on [our wiki page!](https://github.com/unslothai/unsloth/wiki#chat-templates)

In [11]:
unsloth_template = \
    "{{ bos_token }}"\
    "{{ 'You are a helpful assistant to the user\n' }}"\
    "{% for message in messages %}"\
        "{% if message['role'] == 'user' %}"\
            "{{ '>>> User: ' + message['content'] + '\n' }}"\
        "{% elif message['role'] == 'assistant' %}"\
            "{{ '>>> Assistant: ' + message['content'] + eos_token + '\n' }}"\
        "{% endif %}"\
    "{% endfor %}"\
    "{% if add_generation_prompt %}"\
        "{{ '>>> Assistant: ' }}"\
    "{% endif %}"
unsloth_eos_token = "eos_token"

if False:
    tokenizer = get_chat_template(
        tokenizer,
        chat_template = (unsloth_template, unsloth_eos_token,), # You must provide a template and EOS token
        mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
        map_eos_token = True, # Maps <|im_end|> to </s> instead
    )
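To make the Jinja template above easier to read, here is a plain-Python sketch of the text it is meant to produce. `render_unsloth_style` is a hypothetical helper, and the `<s>` and `</s>` tokens in the demo are placeholders; the real tokens come from the tokenizer.

```python
def render_unsloth_style(messages, bos_token, eos_token, add_generation_prompt=False):
    """Mirror the template's branches with ordinary string concatenation."""
    text = bos_token + "You are a helpful assistant to the user\n"
    for message in messages:
        if message["role"] == "user":
            text += ">>> User: " + message["content"] + "\n"
        elif message["role"] == "assistant":
            # Only assistant turns are closed with the EOS token
            text += ">>> Assistant: " + message["content"] + eos_token + "\n"
    if add_generation_prompt:
        text += ">>> Assistant: "
    return text

demo = render_unsloth_style(
    [{"role": "user", "content": "Hi"}],
    bos_token="<s>",    # placeholder
    eos_token="</s>",   # placeholder
    add_generation_prompt=True,
)
print(demo)
```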

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We run only 60 steps to keep this quick; for a full run, set `num_train_epochs=1` and remove the `max_steps` cap, since `max_steps` overrides `num_train_epochs` when both are given. We also support TRL's `DPOTrainer`!
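To relate steps to epochs for this run, the arithmetic below uses the 270 examples mapped earlier and the batch settings from the next cell. It assumes a single GPU; the Trainer's own rounding may differ by a step.

```python
import math

num_examples = 270                    # rows in the mapped dataset
per_device_batch, grad_accum = 2, 4   # values used in TrainingArguments below
max_steps = 60

effective_batch = per_device_batch * grad_accum   # examples per optimizer step
examples_seen = max_steps * effective_batch
passes = examples_seen / num_examples             # passes over the data
steps_for_one_epoch = math.ceil(num_examples / effective_batch)

print(effective_batch, examples_seen, round(passes, 2), steps_for_one_epoch)
# 8 480 1.78 34
```

So 60 steps cover the data a little under twice, which the training banner rounds up to 2 epochs, while a single epoch needs about 34 steps.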

In [12]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

  self.pid = os.fork()


Map (num_proc=2):   0%|          | 0/270 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


In [13]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
5.594 GB of memory reserved.


In [14]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 270 | Num Epochs = 2
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,3.083
2,3.2355
3,3.1228
4,2.957
5,2.5875
6,2.2684
7,1.8863
8,1.5425
9,1.4836
10,1.3645


In [15]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

291.7352 seconds used for training.
4.86 minutes used for training.
Peak reserved memory = 6.252 GB.
Peak reserved memory for training = 0.658 GB.
Peak reserved memory % of max memory = 42.392 %.
Peak reserved memory for training % of max memory = 4.462 %.


<a name="Inference"></a>
### Inference
Let's run the model! Since we're using `Llama-3`, use `apply_chat_template` with `add_generation_prompt` set to `True` for inference.

In [16]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
)

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"from": "human", "value": "Based on the following instruction: increase  number of total characters.  Paraphrase the following user story and output only paraphrased version: As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  number of total characters.  Paraphrase the following user story and output only paraphrased version: As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs a leading expert in economic analysis, I aim to leverage advanced machine learning techniques, specifically reinforcement learning, to create highly precise and reliable models that effectively capture the intricate patterns and trends within market systems, ultimately enhancing our understanding of economic behavior and decision-making processes.<|eot_id|>']
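The decoded string contains the prompt as well as the reply. A small post-processing sketch, assuming the Llama-3 markers shown above, pulls out only the assistant's answer; `extract_reply` is a hypothetical helper and the sample string is shortened for illustration.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"
EOT = "<|eot_id|>"

def extract_reply(decoded):
    """Return the text of the last assistant turn in a decoded Llama-3 string."""
    # Everything after the final assistant header is the generated reply
    reply = decoded.rsplit(ASSISTANT_HEADER, 1)[-1]
    # Drop the end-of-turn token and anything after it
    return reply.split(EOT, 1)[0].strip()

decoded = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Paraphrase this.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "Here is a paraphrase.<|eot_id|>"
)
print(extract_reply(decoded))  # Here is a paraphrase.
```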

In [17]:
test_data = pd.read_csv("test_data_finetuning.csv", index_col=False).drop(columns=['Unnamed: 0'])[:25]

In [None]:
def inference(x):
  messages = [
    {"from": "human", "value": x},
  ]
  inputs = tokenizer.apply_chat_template(
      messages,
      tokenize = True,
      add_generation_prompt = True, # Must add for generation
      return_tensors = "pt",
  ).to("cuda")
  outputs = model.generate(input_ids = inputs, max_new_tokens = 150, use_cache = True)
  res = tokenizer.batch_decode(outputs)
  print(x, res)
  return res

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
par_columns = [col for col in test_data.columns if col.startswith('par')]
for index, row in test_data.iterrows():
  print(index)
  for col in par_columns:
    test_data.at[index, col.replace('par', 'llm')] = inference(test_data.loc[index, col])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


0


Based on the following instruction: increase  number of total characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  number of total characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs a financial analyst, I aim to leverage machine learning techniques to craft more precise simulations of economic systems and consumer patterns.<|eot_id|>']


Based on the following instruction: decrease  number of total characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  number of total characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"Economist seeks to improve market forecasting using machine learning techniques."<|eot_id|>']


Based on the following instruction: don't change  number of total characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: don't change  number of total characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to create more precise simulations of market trends and consumer actions.<|eot_id|>"]


Based on the following instruction: increase  number of uppercase characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  number of uppercase characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n**AS AN ECONOMIST**, I **WANT TO UTILIZE REINFORCEMENT LEARNING** TO **DEVELOP MORE ACCURATE MODELS** OF **MARKET DYNAMICS AND BEHAVIOR**.<|eot_id|>']


Based on the following instruction: decrease  number of uppercase characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  number of uppercase characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, i want to apply reinforcement learning techniques to improve our understanding of market trends and consumer behavior.<|eot_id|>']


Based on the following instruction: don't change  number of uppercase characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: don't change  number of uppercase characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to craft more precise simulations of market trends and consumer actions.<|eot_id|>"]


Based on the following instruction: increase  number of lowercase characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  number of lowercase characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nas an economist, i want to leverage reinforcement learning techniques to create more precise models of market trends and consumer behavior.<|eot_id|>']


Based on the following instruction: decrease  number of lowercase characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  number of lowercase characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs econ expert, I need to apply rl tech to create better market models.<|eot_id|>']


Based on the following instruction: don't change  number of lowercase characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: don't change  number of lowercase characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to create more precise simulations of market trends and consumer actions.<|eot_id|>"]


Based on the following instruction: increase  number of special characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  number of special characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs a data-driven analyst, I aim to leverage reinforcement learning techniques to craft more precise simulations of market trends and consumer habits, thereby enhancing predictive accuracy and informing strategic decision-making.<|eot_id|>']


Based on the following instruction: decrease  number of special characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  number of special characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I want to improve market forecasting using machine learning techniques.<|eot_id|>']


Based on the following instruction: don't change  number of special characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: don't change  number of special characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to craft more precise simulations of market trends and consumer actions.<|eot_id|>"]


Based on the following instruction: increase  number of numbers.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  number of numbers.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to enhance the precision of my market models, thereby better understanding and predicting market trends and behaviors.<|eot_id|>']


Based on the following instruction: decrease  number of numbers.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  number of numbers.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage machine learning techniques to improve predictive models of market trends and consumer actions.<|eot_id|>']


Based on the following instruction: don't change  number of numbers.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: don't change  number of numbers.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to create more precise simulations of market trends and consumer actions.<|eot_id|>"]


Based on the following instruction: increase  number of blanks.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  number of blanks.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n**As a _______________ (profession), I want to leverage _______________ (machine learning technique) to create more precise simulations of _______________ (economic phenomenon) and _______________ (human behavior), so that I can better understand and predict _______________ (market trends).**<|eot_id|>']


Based on the following instruction: decrease  number of blanks.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  number of blanks.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I want to apply reinforcement learning techniques to improve market modeling accuracy.<|eot_id|>']


Based on the following instruction: don't change  number of blanks.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: don't change  number of blanks.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I want to leverage reinforcement learning techniques to improve the accuracy of my market models and better understand the underlying dynamics of economic systems.<|eot_id|>"]


Based on the following instruction: increase  number of words.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  number of words.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs a professional economist, I aim to leverage the capabilities of reinforcement learning to create more precise and reliable models that effectively capture the intricacies of market trends and consumer behavior.<|eot_id|>']


Based on the following instruction: decrease  number of words.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  number of words.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I want to apply reinforcement learning to improve market modeling accuracy.<|eot_id|>']


Based on the following instruction: don't change  number of words.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: don't change  number of words.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to create more precise simulations of market trends and consumer actions.<|eot_id|>"]


Based on the following instruction: increase  average length of words.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  average length of words.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs a seasoned economist, I aim to leverage the capabilities of reinforcement learning to craft more precise and insightful models of market fluctuations and consumer behavior patterns.<|eot_id|>']


Based on the following instruction: decrease  average length of words.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  average length of words.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I need better market models.<|eot_id|>']


Based on the following instruction: don't change  average length of words.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: don't change  average length of words.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to create more precise simulations of market trends and consumer actions.<|eot_id|>"]


Based on the following instruction: increase  number of propositions.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  number of propositions.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to craft more precise simulations of market trends and consumer actions, ultimately enhancing our understanding of economic systems and informing better decision-making.<|eot_id|>']


Based on the following instruction: decrease  number of propositions.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  number of propositions.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I want to improve market forecasting using machine learning techniques.<|eot_id|>']


Based on the following instruction: don't change  number of propositions.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: don't change  number of propositions.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to create more precise simulations of market trends and consumer actions.<|eot_id|>"]


Based on the following instruction: increase  average length of propositions.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  average length of propositions.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist seeking to enhance predictive capabilities, I aim to leverage reinforcement learning techniques to craft more precise simulations of market trends and participant actions.<|eot_id|>']


Based on the following instruction: decrease  average length of propositions.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  average length of propositions.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"Economist seeks to improve market forecasting using machine learning techniques."<|eot_id|>']


Based on the following instruction: don't change  average length of propositions.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ["<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: don't change  average length of propositions.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I aim to leverage reinforcement learning techniques to create more precise simulations of market trends and consumer actions.<|eot_id|>"]


Based on the following instruction: increase  number of punctuation characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: increase  number of punctuation characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist, I desire to leverage reinforcement learning techniques to craft more precise models of market fluctuations and consumer behavior, thereby enhancing our understanding of economic systems and informing more effective policy decisions.<|eot_id|>']


Based on the following instruction: decrease  number of punctuation characters.  Paraphrase the following user story and output only paraphrased version: 
As an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior. ['<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nBased on the following instruction: decrease  number of punctuation characters.  Paraphrase the following user story and output only paraphrased version: \nAs an economist, I want to use reinforcement learning to develop more accurate models of market dynamics and behavior.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAs an economist I want to use reinforcement learning to improve market models accuracy<|eot_id|>']
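
The warning in the output above asks for an explicit `attention_mask` and `pad_token_id` at generation time. Below is a minimal sketch of one way to supply both, assuming `model` and `tokenizer` are Hugging Face-style objects such as the ones loaded earlier in this notebook. The helper name `generate_reply` and the default `max_new_tokens=128` are illustrative assumptions, not part of the original code.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=128):
    """Generate text while passing the attention mask and pad token explicitly.

    `generate_reply` is a hypothetical helper; `model` and `tokenizer` are
    assumed to follow the Hugging Face transformers interface.
    """
    # Calling the tokenizer returns both `input_ids` and `attention_mask`;
    # move them to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # avoids the mask warning
        pad_token_id=tokenizer.eos_token_id,      # avoids the pad token notice
        max_new_tokens=max_new_tokens,
    )
    # Decode every sequence in the batch back to text.
    return tokenizer.batch_decode(output_ids)
```

If the prompt string already contains `<|begin_of_text|>` (for example because a chat template was applied as text), tokenizing it again may add a second copy of that token, so it is worth inspecting the resulting ids.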


In [None]:
# Write test_data to a CSV file in the current working directory.
test_data.to_csv('test_data_output_llama3.csv')

In [None]:
from google.colab import files
# Download the CSV from the Colab runtime to the local machine.
files.download('test_data_output_llama3.csv')
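
The decoded strings printed in the outputs above still carry the Llama-3 chat markers (`<|start_header_id|>`, `<|eot_id|>`) and the echoed prompt. If only the paraphrase is wanted, for example before writing it to a CSV column, a small helper along the following lines may be enough. This is a sketch that assumes the marker layout seen in those outputs; `extract_assistant_reply` is a hypothetical name, not a function from Unsloth or transformers.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"
END_OF_TURN = "<|eot_id|>"


def extract_assistant_reply(decoded):
    """Return the text of the last assistant turn in a decoded Llama-3 string."""
    # Keep only what follows the last assistant header.
    _, found, tail = decoded.rpartition(ASSISTANT_HEADER)
    if not found:
        return ""  # no assistant turn present
    # Drop the end-of-turn marker and anything after it.
    return tail.split(END_OF_TURN, 1)[0].strip()


sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Paraphrase the following user story.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "As an economist, I need better market models.<|eot_id|>"
)
print(extract_assistant_reply(sample))
# prints: As an economist, I need better market models.
```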