## Describe your model -> fine-tuned LLaMA 2
By Matt Shumer (https://twitter.com/mattshumer_)

This notebook experiments with a simple way to build a task-specific model for your use case.

First, select the best GPU available (Runtime -> Change runtime type).

To create your model, just go to the first code cell, and describe the model you want to build in the prompt. Be descriptive and clear.

Select a temperature (high=creative, low=precise), and the number of training examples to generate to train the model. From there, just run all the cells.

You can change the model you want to fine-tune by changing `model_name` in the `Define Hyperparameters` cell.

In [1]:
# prompt = "A model that takes in a puzzle-like reasoning-heavy question in English, and responds with a well-reasoned, step-by-step thought out response in Spanish."
# temperature = .4
# number_of_examples = 100

# Install necessary libraries

In [2]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7 textstat
import pandas as pd
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer


In [4]:
import pandas as pd
import itertools


# Load the training data from "train_data_finetuning.csv" and drop the leftover index column
train_data = pd.read_csv("train_data_finetuning.csv", index_col=False).drop(columns=['Unnamed: 0'])

In [5]:
train_data

Unnamed: 0,User Story,instruction,paraphrased version,prompt,response
0,"As an economist, I want to use hierarchical cl...",Total characters typically refers to the count...,"""I'd like to apply a grouping technique to cat...",Total characters typically refers to the count...,"""I'd like to apply a grouping technique to cat..."
1,"As an economist, I want to use hierarchical cl...",Uppercase characters refer to letters in the a...,"AS AN ECONOMIST, I DESIRE TO UTILIZE HIERARCHI...",Uppercase characters refer to letters in the a...,"AS AN ECONOMIST, I DESIRE TO UTILIZE HIERARCHI..."
2,"As an economist, I want to use hierarchical cl...",Uppercase characters refer to letters in the a...,"as an economist, i want to group similar econo...",Uppercase characters refer to letters in the a...,"as an economist, i want to group similar econo..."
3,"As an economist, I want to use hierarchical cl...",Lowercase characters refer to letters in the a...,"as a researcher, i need to apply hierarchical ...",Lowercase characters refer to letters in the a...,"as a researcher, i need to apply hierarchical ..."
4,"As an economist, I want to use hierarchical cl...",Lowercase characters refer to letters in the a...,"as a user, i want to apply grouping technique ...",Lowercase characters refer to letters in the a...,"as a user, i want to apply grouping technique ..."
...,...,...,...,...,...
267,"As a social worker, I want to use stemming alg...",Total characters typically refers to the count...,"As a social worker, I need to apply linguistic...",Total characters typically refers to the count...,"As a social worker, I need to apply linguistic..."
268,"As a social media marketer, I want to use info...",The formula for calculating Flesch Kincaid Gra...,To create targeted social media marketing camp...,The formula for calculating Flesch Kincaid Gra...,To create targeted social media marketing camp...
269,"As a marketer, I want to use learning linear m...",The definition for Lineaser Write is for each ...,"""As a marketer, I need to apply lineaser write...",The definition for Lineaser Write is for each ...,"""As a marketer, I need to apply lineaser write..."
270,"As a legal researcher, I want to use inductive...",The formula for calculating Flesch Reading Eas...,"""As a law expert, I need to leverage machine l...",The formula for calculating Flesch Reading Eas...,"""As a law expert, I need to leverage machine l..."


In [6]:
train_df = train_data.sample(frac=0.85, random_state=42)
val_df = train_data.drop(train_df.index)
train_df.to_json('train.jsonl', orient='records', lines=True)
val_df.to_json('val.jsonl', orient='records', lines=True)

In [7]:
train_data.head(1)

Unnamed: 0,User Story,instruction,paraphrased version,prompt,response
0,"As an economist, I want to use hierarchical cl...",Total characters typically refers to the count...,"""I'd like to apply a grouping technique to cat...",Total characters typically refers to the count...,"""I'd like to apply a grouping technique to cat..."


In [8]:
train_df

Unnamed: 0,User Story,instruction,paraphrased version,prompt,response
30,"As a computer vision researcher, I want to use...",Total characters typically refers to the count...,I need to boost the overall count of character...,Total characters typically refers to the count...,I need to boost the overall count of character...
116,"As a transportation analyst, I want to use inf...",The formula for calculating Coleman Liau Index...,"""As a transportation expert, I need to reduce ...",The formula for calculating Coleman Liau Index...,"""As a transportation expert, I need to reduce ..."
79,"As a lawyer, I want to use machine learning to...",Vocabulary Richness is the length of the text ...,"As a legal expert, I aim to harness the power ...",Vocabulary Richness is the length of the text ...,"As a legal expert, I aim to harness the power ..."
127,"As a plant scientist, I want to use semantic d...",Blanks refer to the empty spaces or gaps betwe...,"As a plant scientist, I want to utilize semant...",Blanks refer to the empty spaces or gaps betwe...,"As a plant scientist, I want to utilize semant..."
196,"As a pediatrician, I want to use similarity le...",The formula for Gunning Fog is 0.4*(W/P+100*DW...,"""To improve our understanding of children's he...",The formula for Gunning Fog is 0.4*(W/P+100*DW...,"""To improve our understanding of children's he..."
...,...,...,...,...,...
72,"As a lawyer, I want to use machine learning to...",Average length of the word typically refers to...,"As an attorney, I aim to harness AI-driven ana...",Average length of the word typically refers to...,"As an attorney, I aim to harness AI-driven ana..."
254,"As a demographer, I want to use novelty detect...",Average length of propositions refers to the m...,"As a data analyst, I want to utilize novel pat...",Average length of propositions refers to the m...,"As a data analyst, I want to utilize novel pat..."
134,"As a plant scientist, I want to use semantic d...",Lowercase words in a text are words that are w...,"As a Plant Scientist, I aim to leverage semant...",Lowercase words in a text are words that are w...,"As a Plant Scientist, I aim to leverage semant..."
249,"As a network engineer, I want to use reservoir...",Words refer to sequences of characters that ar...,"As a network expert, I aim to apply reservoir ...",Words refer to sequences of characters that ar...,"As a network expert, I aim to apply reservoir ..."


In [9]:
train_data['prompt'][0]

'Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: decrease  number of total characters. Paraphrase the following user story and output only paraphrased version: As an economist, I want to use hierarchical clustering to group similar economic sectors and industries based on the financial and economic indicators of business data to improve the accuracy and efficiency of economic analysis and prediction.'
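The prompt shown above appears to follow a fixed pattern: a metric definition, an instruction with a direction and a metric name, and the user story to paraphrase. A minimal sketch of how such a prompt could be assembled is given here. The helper `build_prompt` and its parameter names are hypothetical and do not come from the notebook, and the sample strings are shortened for illustration.

```python
def build_prompt(definition: str, direction: str, metric: str, user_story: str) -> str:
    """Assemble a prompt in the shape of the example above (hypothetical helper)."""
    return (
        f"{definition} "
        f"Based on the following instruction: {direction} {metric}. "
        "Paraphrase the following user story and output only paraphrased version: "
        f"{user_story}"
    )


# Shortened, illustrative inputs; the real dataset uses longer definitions and stories.
example = build_prompt(
    definition="Total characters typically refers to the count of all individual characters within a given text.",
    direction="decrease",
    metric="number of total characters",
    user_story="As an economist, I want to group similar economic sectors.",
)
print(example)
```

Keeping the template in one function makes it easier to use the same wording at training and inference time.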

# Define Hyperparameters

In [10]:
# To fine-tune the official LLaMA 2 weights instead, use "meta-llama/Llama-2-7b-chat-hf".
# That repository is gated, so you will need to pass a Hugging Face access token.
model_name = "NousResearch/llama-2-7b-chat-hf"
dataset_name = "/content/train.jsonl"
new_model = "llama-2-7b-custom"
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
output_dir = "./results"
num_train_epochs = 1
fp16 = False
bf16 = False
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 1
gradient_checkpointing = True
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "constant"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
save_steps = 25
logging_steps = 5
max_seq_length = None
packing = False
device_map = {"": 0}
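To give a feel for what `lora_r` and `lora_alpha` imply, here is a rough back-of-the-envelope sketch. It assumes, and the notebook does not state this, that the 7B model has 32 decoder layers with hidden size 4096 and that LoRA is attached only to the `q_proj` and `v_proj` projections. If your PEFT version targets different modules, the count will differ.

```python
# Values copied from the hyperparameters cell.
lora_r = 64
lora_alpha = 16

# Assumptions (not stated in the notebook): 32 decoder layers, hidden size 4096,
# and LoRA adapters on two square projections (q_proj and v_proj) per layer.
hidden_size = 4096
num_layers = 32
adapted_modules_per_layer = 2

# LoRA scales the low-rank update by alpha / r.
scaling = lora_alpha / lora_r

# Each adapted projection gains two matrices: A (r x hidden) and B (hidden x r).
params_per_module = lora_r * (hidden_size + hidden_size)
trainable_params = params_per_module * adapted_modules_per_layer * num_layers

print(f"LoRA scaling factor: {scaling}")
print(f"Approximate trainable LoRA parameters: {trainable_params:,}")
```

Under these assumptions the adapters add roughly 33.5 million trainable parameters, a small fraction of the 7B base model.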

# Load Datasets and Train

In [11]:
# Load datasets
train_dataset = load_dataset('json', data_files='/content/train.jsonl', split="train")
valid_dataset = load_dataset('json', data_files='/content/val.jsonl', split="train")

# Preprocess datasets: wrap each prompt/response pair in the LLaMA 2 chat template
def to_chat_text(examples):
    return {'text': ['<s>[INST]' + prompt + ' [/INST] ' + response + '</s>' for prompt, response in zip(examples['prompt'], examples['response'])]}

train_dataset_mapped = train_dataset.map(to_chat_text, batched=True)
valid_dataset_mapped = valid_dataset.map(to_chat_text, batched=True)

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)
# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="all",
    evaluation_strategy="steps",
    eval_steps=70  # Evaluate every 70 steps
)
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset_mapped,
    eval_dataset=valid_dataset_mapped,  # Pass validation dataset here
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
trainer.model.save_pretrained(new_model)

# Cell 4: Test the model
# logging.set_verbosity(logging.CRITICAL)
# prompt = f"[INST]Write a function that reverses a string. [/INST]" # replace the command here with something relevant to your task
# pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
# result = pipe(prompt)
# print(result[0]['generated_text'])
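The preprocessing step in the cell above wraps each prompt and response in `[INST] ... [/INST]` markers. The sketch below reproduces that template on plain Python data so the resulting string can be inspected without loading a dataset. The function names `format_example` and `format_batch` are illustrative and are not part of the notebook.

```python
def format_example(prompt: str, response: str) -> str:
    """Mirror the template used in the preprocessing step of the training cell."""
    return "<s>[INST]" + prompt + " [/INST] " + response + "</s>"


def format_batch(examples: dict) -> dict:
    """Batched variant: takes a dict of column lists and returns a dict with a 'text' list."""
    return {
        "text": [
            format_example(p, r)
            for p, r in zip(examples["prompt"], examples["response"])
        ]
    }


# Made-up sample data, only to show the shape of the output.
batch = {
    "prompt": ["Paraphrase: I want faster reports."],
    "response": ["I need quicker reports."],
}
formatted = format_batch(batch)["text"][0]
print(formatted)
```

Printing one formatted example before training is a cheap way to catch spacing or marker mistakes in the template.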


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.



# Run Inference

In [12]:
from transformers import pipeline

prompt = "<s>[INST]Based on the following instruction: increase number of total characters. Paraphrase the following user story: A researcher in computational biology is using backpropagation to train a machine learning algorithm to predict the likelihood of different genetic mutations leading to cancer. By analyzing large sets of genomic data, the algorithm is trained to identify the underlying genetic factors that contribute to cancer development. [/INST]" # replace the command here with something relevant to your task
num_new_tokens = 100  # change to the number of new tokens you want to generate

# Count the number of tokens in the prompt
num_prompt_tokens = len(tokenizer(prompt)['input_ids'])

# Calculate the maximum length for the generation
max_length = num_prompt_tokens + num_new_tokens

gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
result = gen(prompt)
print(result[0]['generated_text'].replace(prompt, ''))

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


 As a computational biologist, I'm leveraging backpropagation to train a machine learning model that predicts the likelihood of various genetic mutations causing cancer. By analyzing vast amounts of genomic data, the model is learning to pinpoint the underlying genetic factors that contribute to cancer growth and development.</s> Based on the following instruction: decrease number of total characters. Paraphrase the following user story: A researcher in computational biology is using backpropag
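In the output above, the paraphrase ends at the `</s>` marker, but generation continues until the token budget runs out. One possible post-processing step, sketched here under the assumption that the marker appears literally in the decoded text as it does above, is to cut the completion at the first `</s>`. The helper `extract_answer` and the sample strings are hypothetical.

```python
def extract_answer(generated_text: str, prompt: str, eos_marker: str = "</s>") -> str:
    """Drop the echoed prompt, then keep only the text before the first EOS marker."""
    completion = generated_text.replace(prompt, "", 1)
    return completion.split(eos_marker, 1)[0].strip()


# Made-up strings that imitate the shape of the pipeline output above.
demo_prompt = "<s>[INST]Paraphrase: the model predicts mutations. [/INST]"
demo_raw = demo_prompt + " The model forecasts mutations.</s> Based on the following instruction: decrease"
answer = extract_answer(demo_raw, demo_prompt)
print(answer)
```

If the marker is absent, the function simply returns the whole completion with surrounding whitespace removed.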


In [13]:
train_dataset

Dataset({
    features: ['User Story', 'instruction', 'paraphrased version', 'prompt', 'response'],
    num_rows: 231
})
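The row count above, together with the batch settings in the `Define Hyperparameters` cell, determines how many optimizer steps one epoch takes. The following sketch works through that arithmetic, assuming a single GPU and that the last partial batch is kept. It suggests that with `eval_steps=70` no evaluation would be triggered during the single epoch, so a smaller value may be needed if you want validation loss during training.

```python
import math

num_rows = 231                      # from the Dataset summary above
per_device_train_batch_size = 4     # from the hyperparameters cell
gradient_accumulation_steps = 1
num_train_epochs = 1
eval_steps = 70
save_steps = 25

# Assumption: one GPU, and the final partial batch is not dropped.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_rows / effective_batch)
total_steps = steps_per_epoch * num_train_epochs

# How often the step-based schedules fire within the run.
evaluations = total_steps // eval_steps
checkpoints = total_steps // save_steps

print(f"Total optimizer steps: {total_steps}")
print(f"Evaluations triggered: {evaluations}")
print(f"Checkpoints saved: {checkpoints}")
```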

In [14]:
import string
import re
import textstat

def total_characters(text):
    return len(text)

def uppercase_characters(text):
    return sum(1 for char in text if char.isupper())

def lowercase_characters(text):
    return sum(1 for char in text if char.islower())

def special_characters(text):
    special_chars = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
    return sum(1 for char in text if char in special_chars)

def numbers(text):
    return sum(1 for char in text if char.isdigit())

def blanks(text):
    return sum(1 for char in text if char.isspace())

def number_of_words(text):
    return len(text.split())

def average_length_of_words(text):
    words = text.split()
    total_length = sum(len(word) for word in words)
    num_words = len(words)
    if num_words == 0:
        return 0
    return total_length / num_words

def number_of_propositions(text):
    propositions = re.split(r'[.!?]+', text)
    return len([prop for prop in propositions if prop.strip()])

def average_length_of_propositions(text):
    propositions = re.split(r'[.!?]+', text)
    lengths = [len(prop.strip().split()) for prop in propositions if prop.strip()]
    if lengths:
        return sum(lengths) / len(lengths)
    else:
        return 0

def punctuation_characters(text):
    return sum(1 for char in text if char in string.punctuation)

def lowercase_words(text):
    words = text.split()
    return sum(1 for word in words if word.islower())

def uppercase_words(text):
    words = text.split()
    return sum(1 for word in words if word.isupper())

def vocabulary_richness(text):
    # Number of distinct whitespace-separated words, compared case-insensitively
    words = text.lower().split()
    return len(set(words))

def number_of_urls(text):
    urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)
    return len(urls)

def flesch_kincaid_grade_level(text):
    return textstat.flesch_kincaid_grade(text)

def flesch_reading_ease(text):
    return textstat.flesch_reading_ease(text)

def dale_chall_readability(text):
    return textstat.dale_chall_readability_score(text)

def automated_readability_index(text):
    return textstat.automated_readability_index(text)

def coleman_liau_index(text):
    return textstat.coleman_liau_index(text)

def gunning_fog(text):
    return textstat.gunning_fog(text)

def smog_index(text):
    return textstat.smog_index(text)

def linsear_write_index(text):
    return textstat.linsear_write_formula(text)
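One way these metric functions might be used is to check whether a paraphrase actually moved a metric in the requested direction. The sketch below is an assumption about the evaluation, not something the notebook shows. The helper `follows_instruction` is hypothetical, and `total_characters` is repeated only so the example runs on its own.

```python
def total_characters(text: str) -> int:
    # Same definition as in the metric cell, repeated to keep the sketch self-contained.
    return len(text)


def follows_instruction(metric, original: str, paraphrase: str, direction: str) -> bool:
    """Return True if the metric moved in the requested direction (hypothetical check)."""
    before, after = metric(original), metric(paraphrase)
    if direction == "increase":
        return after > before
    if direction == "decrease":
        return after < before
    if direction == "don't change":
        return after == before
    raise ValueError(f"Unknown direction: {direction!r}")


# Made-up texts for illustration.
original_story = "As an economist, I want to group similar sectors."
shorter_story = "I want to group sectors."
print(follows_instruction(total_characters, original_story, shorter_story, "decrease"))
```

For real-valued readability scores, an exact equality test for "don't change" is probably too strict; a tolerance would be a reasonable refinement.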

In [15]:
test_data = pd.read_csv("test_data_finetuning.csv", index_col=False).drop(columns=['Unnamed: 0'])[:10]

In [16]:
test_data

Unnamed: 0,User Story,"par Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: increase number of total characters","par Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: decrease number of total characters","par Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: don't change number of total characters","par Uppercase characters refer to letters in the alphabet that are written or printed in their capital form. In English, uppercase characters include the letters A through Z. These characters are often used at the beginning of sentences, for proper nouns, and in acronyms. Based on the following instruction: increase number of uppercase characters","par Uppercase characters refer to letters in the alphabet that are written or printed in their capital form. In English, uppercase characters include the letters A through Z. These characters are often used at the beginning of sentences, for proper nouns, and in acronyms. Based on the following instruction: decrease number of uppercase characters","par Uppercase characters refer to letters in the alphabet that are written or printed in their capital form. In English, uppercase characters include the letters A through Z. These characters are often used at the beginning of sentences, for proper nouns, and in acronyms. Based on the following instruction: don't change number of uppercase characters","par Lowercase characters refer to letters in the alphabet that are written or printed in their smaller form. 
In English, lowercase characters include the letters a through z. These characters are commonly used in the body of sentences and words. Based on the following instruction: increase number of lowercase characters","par Lowercase characters refer to letters in the alphabet that are written or printed in their smaller form. In English, lowercase characters include the letters a through z. These characters are commonly used in the body of sentences and words. Based on the following instruction: decrease number of lowercase characters","par Lowercase characters refer to letters in the alphabet that are written or printed in their smaller form. In English, lowercase characters include the letters a through z. These characters are commonly used in the body of sentences and words. Based on the following instruction: don't change number of lowercase characters",...,"par The formula for calculating Coleman Liau Index is 0.0588*L-0.296*S-15.8, where S is the average number of propositions per 100 words while L is the average number of letters per 100 words. Based on the following instruction: don't change coleman liau index","par The formula for Gunning Fog is 0.4*(W/P+100*DW/W), where W is the number of words contained in the text, DW is the number of words consisting of three or more syllables, while P is the number of propositions in the text. Based on the following instruction: increase gunning fog","par The formula for Gunning Fog is 0.4*(W/P+100*DW/W), where W is the number of words contained in the text, DW is the number of words consisting of three or more syllables, while P is the number of propositions in the text. Based on the following instruction: decrease gunning fog","par The formula for Gunning Fog is 0.4*(W/P+100*DW/W), where W is the number of words contained in the text, DW is the number of words consisting of three or more syllables, while P is the number of propositions in the text. 
Based on the following instruction: don't change gunning fog","par The formula for SMOG index is 1.0430*sqrt(DW*30/P)+3.1391, where DW is the number of words consisting of three or more syllables while P is the number of propositions in the text. Based on the following instruction: increase smog index","par The formula for SMOG index is 1.0430*sqrt(DW*30/P)+3.1391, where DW is the number of words consisting of three or more syllables while P is the number of propositions in the text. Based on the following instruction: decrease smog index","par The formula for SMOG index is 1.0430*sqrt(DW*30/P)+3.1391, where DW is the number of words consisting of three or more syllables while P is the number of propositions in the text. Based on the following instruction: don't change smog index","par The definition for Lineaser Write is for each word with two or less syllables an index is increased by 1, while for each word with more than three syllables, the index is increased by 3. Finally, the resulting number is divided by the number of propositions. If the result is greater than 20 it is divided by 2, otherwise it is divided by 2 and 1is subtracted from this number. Based on the following instruction: increase linsear write index","par The definition for Lineaser Write is for each word with two or less syllables an index is increased by 1, while for each word with more than three syllables, the index is increased by 3. Finally, the resulting number is divided by the number of propositions. If the result is greater than 20 it is divided by 2, otherwise it is divided by 2 and 1is subtracted from this number. Based on the following instruction: decrease linsear write index","par The definition for Lineaser Write is for each word with two or less syllables an index is increased by 1, while for each word with more than three syllables, the index is increased by 3. Finally, the resulting number is divided by the number of propositions. 
If the result is greater than 20 it is divided by 2, otherwise it is divided by 2 and 1is subtracted from this number. Based on the following instruction: don't change linsear write index"
0,"As a nephrologist, I want to use fully connect...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,The formula for calculating Coleman Liau Index...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...
1,"As a sociologist, I want to use neural gas to ...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,The formula for calculating Coleman Liau Index...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...
2,"As a radiologist, I want to use policy iterati...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,The formula for calculating Coleman Liau Index...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...
3,"As a linguist, I want to use representation le...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,The formula for calculating Coleman Liau Index...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...
4,"As a literary critic, I want to use named enti...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,The formula for calculating Coleman Liau Index...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...
5,"As a librarian, I want to use neural networks ...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,The formula for calculating Coleman Liau Index...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...
6,"As a dermatologist, I want to use FSS-SVM to s...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,The formula for calculating Coleman Liau Index...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...
7,"As a librarian, I want to explore the use of n...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,The formula for calculating Coleman Liau Index...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...
8,"As a sports organization, I want to use conver...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,The formula for calculating Coleman Liau Index...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...
9,"As a musician, I want to use feature sets to g...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,The formula for calculating Coleman Liau Index...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for Gunning Fog is 0.4*(W/P+100*DW...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The formula for SMOG index is 1.0430*sqrt(DW*3...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...,The definition for Lineaser Write is for each ...
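
Each prompt in the `par_*` columns asks the model to increase, decrease, or leave unchanged one surface feature of a user story (total characters, uppercase characters, words, and so on). A minimal sketch of how one might check whether a generated paraphrase followed such an instruction is given here. The helper names `surface_metrics` and `follows_instruction` are hypothetical and not part of this notebook, and the counting rules (words as `\w+` runs, numbers as digit runs, blanks as space characters) are simplifying assumptions rather than the definitions used to build the dataset.

```python
import re

def surface_metrics(text: str) -> dict:
    """Count a few surface features of a text (assumed, simplified definitions)."""
    return {
        "total_characters": len(text),
        "uppercase_characters": sum(c.isupper() for c in text),
        "lowercase_characters": sum(c.islower() for c in text),
        # Assumption: a "number" is a run of consecutive digits
        "numbers": len(re.findall(r"\d+", text)),
        # Assumption: a "blank" is a single space character
        "blanks": text.count(" "),
        # Assumption: a "word" is a run of letters, digits, or underscores
        "words": len(re.findall(r"\w+", text)),
    }

def follows_instruction(original: str, paraphrase: str, metric: str, direction: str) -> bool:
    """Return True if `metric` moved in the requested `direction`."""
    before = surface_metrics(original)[metric]
    after = surface_metrics(paraphrase)[metric]
    if direction == "increase":
        return after > before
    if direction == "decrease":
        return after < before
    if direction == "don't change":
        return after == before
    raise ValueError(f"unknown direction: {direction!r}")

# Example with a shortened, made-up paraphrase of a user story
story = "As a nephrologist, I want to use fully connected layers."
shorter = "As a nephrologist, I want dense layers."
print(follows_instruction(story, shorter, "total_characters", "decrease"))
```

A check like this could be applied to each `llm_*` column once it has been filled in, comparing every output against the user story it paraphrases. Readability scores such as Flesch Kincaid would need a syllable counter and are left out of this sketch.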


In [17]:
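def inference(x):
  """Generate a completion for one prompt string and return only the new text."""
  print(x)  # echo the prompt so progress is visible in the cell output
  prompt = '<s>[INST]' + x + '[/INST]'  # wrap the text in Llama 2 instruction tags
  num_new_tokens = 150  # change to the number of new tokens you want to generate

  # Count the number of tokens in the prompt
  num_prompt_tokens = len(tokenizer(prompt)['input_ids'])

  # Maximum total length: prompt tokens plus the new-token budget
  max_length = num_prompt_tokens + num_new_tokens

  # Note: the pipeline wrapper is rebuilt on every call because max_length depends on the prompt
  gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
  result = gen(prompt)
  # generated_text contains the prompt followed by the completion; strip the prompt
  return result[0]['generated_text'].replace(prompt, '')


# Run every prompt column ('par...') through the model and store the output
# in a matching 'llm...' column of test_data.
par_columns = [col for col in test_data.columns if col.startswith('par')]
for index, row in test_data.iterrows():
  for col in par_columns:
    test_data.at[index, col.replace('par', 'llm')] = inference(row[col])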

Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: increase  number of total characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: decrease  number of total characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: don't change  number of total characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Uppercase characters refer to letters in the alphabet that are written or printed in their capital form. In English, uppercase characters include the letters A through Z. These characters are often used at the beginning of sentences, for proper nouns, and in acronyms. Based on the following instruction: increase  number of uppercase characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Uppercase characters refer to letters in the alphabet that are written or printed in their capital form. In English, uppercase characters include the letters A through Z. These characters are often used at the beginning of sentences, for proper nouns, and in acronyms. Based on the following instruction: decrease  number of uppercase characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Uppercase characters refer to letters in the alphabet that are written or printed in their capital form. In English, uppercase characters include the letters A through Z. These characters are often used at the beginning of sentences, for proper nouns, and in acronyms. Based on the following instruction: don't change  number of uppercase characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Lowercase characters refer to letters in the alphabet that are written or printed in their smaller form. In English, lowercase characters include the letters a through z. These characters are commonly used in the body of sentences and words. Based on the following instruction: increase  number of lowercase characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Lowercase characters refer to letters in the alphabet that are written or printed in their smaller form. In English, lowercase characters include the letters a through z. These characters are commonly used in the body of sentences and words. Based on the following instruction: decrease  number of lowercase characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Lowercase characters refer to letters in the alphabet that are written or printed in their smaller form. In English, lowercase characters include the letters a through z. These characters are commonly used in the body of sentences and words. Based on the following instruction: don't change  number of lowercase characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Special characters are symbols or characters that are not letters or numbers. They include punctuation marks such as commas, periods, exclamation points, question marks, as well as symbols like asterisks, ampersands, hashtags, dollar signs, and various other characters used for specific purposes in writing, coding, or communication. Based on the following instruction: increase  number of special characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Special characters are symbols or characters that are not letters or numbers. They include punctuation marks such as commas, periods, exclamation points, question marks, as well as symbols like asterisks, ampersands, hashtags, dollar signs, and various other characters used for specific purposes in writing, coding, or communication. Based on the following instruction: decrease  number of special characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Special characters are symbols or characters that are not letters or numbers. They include punctuation marks such as commas, periods, exclamation points, question marks, as well as symbols like asterisks, ampersands, hashtags, dollar signs, and various other characters used for specific purposes in writing, coding, or communication. Based on the following instruction: don't change  number of special characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Numbers are symbols or words used to represent quantities, values, or positions in a numerical system. Based on the following instruction: increase  number of numbers.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Numbers are symbols or words used to represent quantities, values, or positions in a numerical system. Based on the following instruction: decrease  number of numbers.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Numbers are symbols or words used to represent quantities, values, or positions in a numerical system. Based on the following instruction: don't change  number of numbers.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Blanks refer to the empty spaces or gaps between words, sentences, or characters. Based on the following instruction: increase  number of blanks.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Blanks refer to the empty spaces or gaps between words, sentences, or characters. Based on the following instruction: decrease  number of blanks.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Blanks refer to the empty spaces or gaps between words, sentences, or characters. Based on the following instruction: don't change  number of blanks.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Words refer to sequences of characters that are separated by spaces or punctuation marks and convey meaning. Based on the following instruction: increase  number of words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Words refer to sequences of characters that are separated by spaces or punctuation marks and convey meaning. Based on the following instruction: decrease  number of words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Words refer to sequences of characters that are separated by spaces or punctuation marks and convey meaning. Based on the following instruction: don't change  number of words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Average length of the word typically refers to the mean number of characters in the words of a given text. It's calculated by dividing the total number of characters in all the words by the total number of words in the text. Based on the following instruction: increase  average length of words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Average length of the word typically refers to the mean number of characters in the words of a given text. It's calculated by dividing the total number of characters in all the words by the total number of words in the text. Based on the following instruction: decrease  average length of words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Average length of the word typically refers to the mean number of characters in the words of a given text. It's calculated by dividing the total number of characters in all the words by the total number of words in the text. Based on the following instruction: don't change  average length of words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Proposition is used to refer to individual segments of text that are separated by common sentence-ending punctuation marks (periods, exclamation marks, and question marks). Based on the following instruction: increase  number of propositions.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Proposition is used to refer to individual segments of text that are separated by common sentence-ending punctuation marks (periods, exclamation marks, and question marks). Based on the following instruction: decrease  number of propositions.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Proposition is used to refer to individual segments of text that are separated by common sentence-ending punctuation marks (periods, exclamation marks, and question marks). Based on the following instruction: don't change  number of propositions.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Average length of propositions refers to the mean number of characters in the propositions or sentences within a given text. To calculate the average length of propositions, you'd first need to identify and isolate each proposition in the text, then compute the average length of characters across all propositions. Based on the following instruction: increase  average length of propositions.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Average length of propositions refers to the mean number of characters in the propositions or sentences within a given text. To calculate the average length of propositions, you'd first need to identify and isolate each proposition in the text, then compute the average length of characters across all propositions. Based on the following instruction: decrease  average length of propositions.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Average length of propositions refers to the mean number of characters in the propositions or sentences within a given text. To calculate the average length of propositions, you'd first need to identify and isolate each proposition in the text, then compute the average length of characters across all propositions. Based on the following instruction: don't change  average length of propositions.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Punctuation characters are symbols used in writing to aid in understanding and interpreting the text by indicating pauses, boundaries, emphasis, and intonation. Based on the following instruction: increase  number of punctuation characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Punctuation characters are symbols used in writing to aid in understanding and interpreting the text by indicating pauses, boundaries, emphasis, and intonation. Based on the following instruction: decrease  number of punctuation characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Punctuation characters are symbols used in writing to aid in understanding and interpreting the text by indicating pauses, boundaries, emphasis, and intonation. Based on the following instruction: don't change  number of punctuation characters.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Lowercase words in a text are words that are written using lowercase letters. Based on the following instruction: increase  number of lowercase words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Lowercase words in a text are words that are written using lowercase letters. Based on the following instruction: decrease  number of lowercase words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Lowercase words in a text are words that are written using lowercase letters. Based on the following instruction: don't change  number of lowercase words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Uppercase words in a text are words that are written using uppercase or capital letters. Based on the following instruction: increase  number of uppercase words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Uppercase words in a text are words that are written using uppercase or capital letters. Based on the following instruction: decrease  number of uppercase words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Uppercase words in a text are words that are written using uppercase or capital letters. Based on the following instruction: don't change  number of uppercase words.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Vocabulary Richness is the length of the text without duplicated words. Based on the following instruction: increase  number of vocabulary richness.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Vocabulary Richness is the length of the text without duplicated words. Based on the following instruction: decrease  number of vocabulary richness.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Vocabulary Richness is the length of the text without duplicated words. Based on the following instruction: don't change  number of vocabulary richness.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




URL is a specific type of text string used to identify the location of a resource on the internet. Based on the following instruction: increase  number of urls.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




URL is a specific type of text string used to identify the location of a resource on the internet. Based on the following instruction: decrease  number of urls.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




URL is a specific type of text string used to identify the location of a resource on the internet. Based on the following instruction: don't change  number of urls.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Flesch Kincaid Grade Level is 0.39*(E)+11.8*(G)-15.59, where G is the average number of syllable per word, while E is the average number of words per proposition. Based on the following instruction: increase  flesch kincaid grade level.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Flesch Kincaid Grade Level is 0.39*(E)+11.8*(G)-15.59, where G is the average number of syllable per word, while E is the average number of words per proposition. Based on the following instruction: decrease  flesch kincaid grade level.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Flesch Kincaid Grade Level is 0.39*(E)+11.8*(G)-15.59, where G is the average number of syllable per word, while E is the average number of words per proposition. Based on the following instruction: don't change  flesch kincaid grade level.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Flesch Reading Ease is 206.835-(84.6*G)-(1.015*E), where G is the average number of syllable per word, while E is the average number of words perproposition. Based on the following instruction: increase  flesch reading ease.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Flesch Reading Ease is 206.835-(84.6*G)-(1.015*E), where G is the average number of syllable per word, while E is the average number of words perproposition. Based on the following instruction: decrease  flesch reading ease.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Flesch Reading Ease is 206.835-(84.6*G)-(1.015*E), where G is the average number of syllable per word, while E is the average number of words perproposition. Based on the following instruction: don't change  flesch reading ease.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Dale Chall Readability is 0.1579*(PDW)+0.0496*ASL, where PDW is the percentage of difficult words (words that do not appear on a specially designed list of common words familiar to most 4th-grade students), while ASL is the average length of a proposition in words. Based on the following instruction: increase  dale chall readability.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Dale Chall Readability is 0.1579*(PDW)+0.0496*ASL, where PDW is the percentage of difficult words (words that do not appear on a specially designed list of common words familiar to most 4th-grade students), while ASL is the average length of a proposition in words. Based on the following instruction: decrease  dale chall readability.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Dale Chall Readability is 0.1579*(PDW)+0.0496*ASL, where PDW is the percentage of difficult words (words that do not appear on a specially designed list of common words familiar to most 4th-grade students), while ASL is the average length of a proposition in words. Based on the following instruction: don't change  dale chall readability.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Automated Readability Index is 4.71*C/W+0.5*W/P-21.43, where W is the number of words contained in the text, C is the number of the total amount of characters in the text, while P is the number of propositions in the text. Based on the following instruction: increase  automated readability index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Automated Readability Index is 4.71*C/W+0.5*W/P-21.43, where W is the number of words contained in the text, C is the number of the total amount of characters in the text, while P is the number of propositions in the text. Based on the following instruction: decrease  automated readability index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Automated Readability Index is 4.71*C/W+0.5*W/P-21.43, where W is the number of words contained in the text, C is the number of the total amount of characters in the text, while P is the number of propositions in the text. Based on the following instruction: don't change  automated readability index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Coleman Liau Index is 0.0588*L-0.296*S-15.8, where S is the average number of propositions per 100 words while L is the average number of letters per 100 words. Based on the following instruction: increase  coleman liau index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Coleman Liau Index is 0.0588*L-0.296*S-15.8, where S is the average number of propositions per 100 words while L is the average number of letters per 100 words. Based on the following instruction: decrease  coleman liau index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for calculating Coleman Liau Index is 0.0588*L-0.296*S-15.8, where S is the average number of propositions per 100 words while L is the average number of letters per 100 words. Based on the following instruction: don't change  coleman liau index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for Gunning Fog is 0.4*(W/P+100*DW/W), where W is the number of words contained in the text, DW is the number of words consisting of three or more syllables, while P is the number of propositions in the text. Based on the following instruction: increase  gunning fog.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for Gunning Fog is 0.4*(W/P+100*DW/W), where W is the number of words contained in the text, DW is the number of words consisting of three or more syllables, while P is the number of propositions in the text. Based on the following instruction: decrease  gunning fog.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for Gunning Fog is 0.4*(W/P+100*DW/W), where W is the number of words contained in the text, DW is the number of words consisting of three or more syllables, while P is the number of propositions in the text. Based on the following instruction: don't change  gunning fog.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for SMOG index is 1.0430*sqrt(DW*30/P)+3.1391, where DW is the number of words consisting of three or more syllables while P is the number of propositions in the text. Based on the following instruction: increase  smog index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for SMOG index is 1.0430*sqrt(DW*30/P)+3.1391, where DW is the number of words consisting of three or more syllables while P is the number of propositions in the text. Based on the following instruction: decrease  smog index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The formula for SMOG index is 1.0430*sqrt(DW*30/P)+3.1391, where DW is the number of words consisting of three or more syllables while P is the number of propositions in the text. Based on the following instruction: don't change  smog index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The definition for Lineaser Write is for each word with two or less syllables an index is increased by 1, while for each word with more than three syllables, the index is increased by 3. Finally, the resulting number is divided by the number of propositions. If the result is greater than 20 it is divided by 2, otherwise it is divided by 2 and 1is subtracted from this number. Based on the following instruction: increase  linsear write index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The definition for Lineaser Write is for each word with two or less syllables an index is increased by 1, while for each word with more than three syllables, the index is increased by 3. Finally, the resulting number is divided by the number of propositions. If the result is greater than 20 it is divided by 2, otherwise it is divided by 2 and 1is subtracted from this number. Based on the following instruction: decrease  linsear write index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




The definition for Lineaser Write is for each word with two or less syllables an index is increased by 1, while for each word with more than three syllables, the index is increased by 3. Finally, the resulting number is divided by the number of propositions. If the result is greater than 20 it is divided by 2, otherwise it is divided by 2 and 1is subtracted from this number. Based on the following instruction: don't change  linsear write index.  Paraphrase the following user story and output only paraphrased version: 
As a nephrologist, I want to use fully connected layers to predict kidney outcomes based on large datasets of patient kidney data, so that I can better diagnose and treat kidney disease.




Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: increase  number of total characters.  Paraphrase the following user story and output only paraphrased version: 
As a sociologist, I want to use neural gas to analyze and classify social data, such as survey responses and interview transcripts, so that I can better understand social structures and social change.




Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: decrease  number of total characters.  Paraphrase the following user story and output only paraphrased version: 
As a sociologist, I want to use neural gas to analyze and classify social data, such as survey responses and interview transcripts, so that I can better understand social structures and social change.
Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: don't change  number of total characters.  Paraphrase the following user story and output only paraphrased version: 
As a sociologist, I want to use neural gas to analyze and classify social data, such as survey responses and interview transcripts, so that I can
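
The prompts printed above quote the Automated Readability Index and Coleman Liau formulas. As a rough illustration of how those two scores can be computed for a user story, here is a minimal pure-Python sketch. The helper names (`text_counts`, `automated_readability_index`, `coleman_liau_index`) are made up for this example, "propositions" is read as sentences, and the tokenization is a simple regex heuristic, so the values may differ from what a library such as `textstat` reports.

```python
import re

def text_counts(text):
    """Return (characters, letters, words, sentences) using simple heuristics.

    Assumptions of this sketch: characters are letters and digits only,
    and every run of '.', '!' or '?' ends one sentence.
    """
    words = re.findall(r"[A-Za-z0-9'-]+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    letters = sum(ch.isalpha() for ch in text)
    characters = sum(ch.isalnum() for ch in text)
    return characters, letters, len(words), sentences

def automated_readability_index(text):
    # 4.71*C/W + 0.5*W/P - 21.43, as quoted in the prompts
    c, _, w, p = text_counts(text)
    return 4.71 * c / w + 0.5 * w / p - 21.43

def coleman_liau_index(text):
    # 0.0588*L - 0.296*S - 15.8, with L and S expressed per 100 words
    _, letters, w, p = text_counts(text)
    L = letters / w * 100
    S = p / w * 100
    return 0.0588 * L - 0.296 * S - 15.8

story = ("As a nephrologist, I want to use fully connected layers to predict "
         "kidney outcomes based on large datasets of patient kidney data, "
         "so that I can better diagnose and treat kidney disease.")
print(round(automated_readability_index(story), 2))
print(round(coleman_liau_index(story), 2))
```

Scoring the original story and each paraphrase with the same function gives a quick way to see whether an "increase" or "decrease" instruction moved the index in the requested direction.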

In [18]:
test_data

Unnamed: 0,User Story,"par Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: increase number of total characters","par Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: decrease number of total characters","par Total characters typically refers to the count of all individual characters, including letters, numbers, punctuation marks, spaces, and any other symbols, within a given text. Based on the following instruction: don't change number of total characters","par Uppercase characters refer to letters in the alphabet that are written or printed in their capital form. In English, uppercase characters include the letters A through Z. These characters are often used at the beginning of sentences, for proper nouns, and in acronyms. Based on the following instruction: increase number of uppercase characters","par Uppercase characters refer to letters in the alphabet that are written or printed in their capital form. In English, uppercase characters include the letters A through Z. These characters are often used at the beginning of sentences, for proper nouns, and in acronyms. Based on the following instruction: decrease number of uppercase characters","par Uppercase characters refer to letters in the alphabet that are written or printed in their capital form. In English, uppercase characters include the letters A through Z. These characters are often used at the beginning of sentences, for proper nouns, and in acronyms. Based on the following instruction: don't change number of uppercase characters","par Lowercase characters refer to letters in the alphabet that are written or printed in their smaller form. 
In English, lowercase characters include the letters a through z. These characters are commonly used in the body of sentences and words. Based on the following instruction: increase number of lowercase characters","par Lowercase characters refer to letters in the alphabet that are written or printed in their smaller form. In English, lowercase characters include the letters a through z. These characters are commonly used in the body of sentences and words. Based on the following instruction: decrease number of lowercase characters","par Lowercase characters refer to letters in the alphabet that are written or printed in their smaller form. In English, lowercase characters include the letters a through z. These characters are commonly used in the body of sentences and words. Based on the following instruction: don't change number of lowercase characters",...,"llm The formula for calculating Coleman Liau Index is 0.0588*L-0.296*S-15.8, where S is the average number of propositions per 100 words while L is the average number of letters per 100 words. Based on the following instruction: don't change coleman liau index","llm The formula for Gunning Fog is 0.4*(W/P+100*DW/W), where W is the number of words contained in the text, DW is the number of words consisting of three or more syllables, while P is the number of propositions in the text. Based on the following instruction: increase gunning fog","llm The formula for Gunning Fog is 0.4*(W/P+100*DW/W), where W is the number of words contained in the text, DW is the number of words consisting of three or more syllables, while P is the number of propositions in the text. Based on the following instruction: decrease gunning fog","llm The formula for Gunning Fog is 0.4*(W/P+100*DW/W), where W is the number of words contained in the text, DW is the number of words consisting of three or more syllables, while P is the number of propositions in the text. 
Based on the following instruction: don't change gunning fog","llm The formula for SMOG index is 1.0430*sqrt(DW*30/P)+3.1391, where DW is the number of words consisting of three or more syllables while P is the number of propositions in the text. Based on the following instruction: increase smog index","llm The formula for SMOG index is 1.0430*sqrt(DW*30/P)+3.1391, where DW is the number of words consisting of three or more syllables while P is the number of propositions in the text. Based on the following instruction: decrease smog index","llm The formula for SMOG index is 1.0430*sqrt(DW*30/P)+3.1391, where DW is the number of words consisting of three or more syllables while P is the number of propositions in the text. Based on the following instruction: don't change smog index","llm The definition for Lineaser Write is for each word with two or less syllables an index is increased by 1, while for each word with more than three syllables, the index is increased by 3. Finally, the resulting number is divided by the number of propositions. If the result is greater than 20 it is divided by 2, otherwise it is divided by 2 and 1is subtracted from this number. Based on the following instruction: increase linsear write index","llm The definition for Lineaser Write is for each word with two or less syllables an index is increased by 1, while for each word with more than three syllables, the index is increased by 3. Finally, the resulting number is divided by the number of propositions. If the result is greater than 20 it is divided by 2, otherwise it is divided by 2 and 1is subtracted from this number. Based on the following instruction: decrease linsear write index","llm The definition for Lineaser Write is for each word with two or less syllables an index is increased by 1, while for each word with more than three syllables, the index is increased by 3. Finally, the resulting number is divided by the number of propositions. 
If the result is greater than 20 it is divided by 2, otherwise it is divided by 2 and 1is subtracted from this number. Based on the following instruction: don't change linsear write index"
0,"As a nephrologist, I want to use fully connect...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,"As a nephrologist, I want to leverage machine...","As a nephrologist, I want to leverage advance...","As a nephrologist, I want to leverage advance...","As a nephrologist, I want to leverage advance...","As a nephrologist, I want to leverage advance...","As a nephrologist, I want to use machine lear...","As a nephrologist, I want to leverage advance...","As a nephrologist, I want to leverage machine...","As a nephrologist, I want to use machine lear...","As a nephrologist, I want to leverage machine..."
1,"As a sociologist, I want to use neural gas to ...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,"As a sociologist, I want to leverage neural g...","As a sociologist, I want to leverage the powe...","As a sociologist, I want to leverage the powe...","As a sociologist, I want to leverage neural g...","As a sociologist, I want to leverage the powe...","As a sociologist, I want to use neural gas to...","As a sociologist, I want to leverage neural g...","As a sociologist, I want to leverage neural g...","As a sociologist, I want to use neural gas to...","As a sociologist, I want to leverage neural g..."
2,"As a radiologist, I want to use policy iterati...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,"As a medical professional, I aim to enhance p...","As a medical professional, I aim to enhance p...","As a medical professional, I aim to streamlin...","As a medical professional, I aim to enhance p...","As a medical professional, I aim to enhance p...","As a medical professional, I aim to improve p...","As a medical professional, I aim to enhance p...","As a radiologist, I want to increase the Lins...","As a radiologist, I want to decrease the Lins...","As a radiologist, I want to use policy iterat..."
3,"As a linguist, I want to use representation le...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,"As a linguist, I want to use machine learning...","As a linguist, I want to use machine learning...","As a linguist, I want to use machine learning...","As a linguist, I want to use machine learning...","As a linguist, I want to use machine learning...","As a linguist, I want to use machine learning...","As a linguist, I want to use machine learning...","As a linguist, I want to use machine learning...","As a linguist, I want to use machine learning...","As a linguist, I want to use machine learning..."
4,"As a literary critic, I want to use named enti...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,"As a literary critic, I want to use named ent...","As a literary critic, I want to leverage adva...","As a literary critic, I want to use machine l...","As a literary critic, I want to use machine l...","As a literary critic, I want to use advanced ...","As a literary critic, I want to use machine l...","As a literary critic, I want to use named ent...","As a literary critic, I want to use machine l...","As a literary critic, I want to use machine l...","As a literary critic, I want to use machine l..."
5,"As a librarian, I want to use neural networks ...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,"As a librarian, I want to leverage machine le...","As a book enthusiast, I want to leverage mach...","As a librarian, I want to leverage machine le...","As a librarian, I want to leverage machine le...","As a book enthusiast, I want to leverage mach...","As a librarian, I want to leverage machine le...","As a book enthusiast, I want to leverage mach...","As a librarian, I want to leverage machine le...","As a librarian, I want to leverage machine le...","As a librarian, I want to leverage machine le..."
6,"As a dermatologist, I want to use FSS-SVM to s...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,"As a dermatologist, I want to leverage machin...","As a skilled linguist, I aim to enhance the c...","As a doctor, I want to use machine learning a...","As a doctor, I want to use machine learning a...","As a dermatologist, I want to leverage FSS-SV...","As a dermatologist, I want to use machine lea...","As a dermatologist, I want to leverage machin...","As a writer, I want to use LW to increase the...","As a dermatologist, I want to use machine lea...","As a dermatologist, I want to leverage machin..."
7,"As a librarian, I want to explore the use of n...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,"As a librarian, I want to leverage the power ...","As a librarian, I want to leverage the power ...","As a librarian, I want to leverage the power ...","As a librarian, I want to leverage the power ...","As a librarian, I want to leverage the power ...","As a librarian, I want to leverage the power ...","As a librarian, I want to leverage the power ...","As a librarian, I want to leverage the power ...","As a librarian, I want to leverage the power ...","As a librarian, I want to leverage the power ..."
8,"As a sports organization, I want to use conver...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,"As a sports organization, I want to leverage ...","As a sports enthusiast, I want to create a ch...","As a sports fan, I want to use AI-powered cha...","As a sports organization, I want to leverage ...","As a sports fan, I want to use a conversation...","As a sports fan, I want to use machine learni...","As a sports organization, I want to leverage ...","As a sports fan, I want to use machine learni...","As a sports fan, I want to use machine learni...","As a sports fan, I want to use AI-powered cha..."
9,"As a musician, I want to use feature sets to g...",Total characters typically refers to the count...,Total characters typically refers to the count...,Total characters typically refers to the count...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Uppercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,Lowercase characters refer to letters in the a...,...,"As a musician, I want to categorize musical d...","As a musician, I want to leverage powerful da...","As a musician, I want to categorize musical d...","As a musician, I want to categorize musical d...","As a musician, I want to leverage musical dat...","As a musician, I want to categorize musical d...","As a musician, I want to use genre and rhythm...","As a musician, I want to use feature sets to ...","As a musician, I want to categorize musical d...","As a musician, I want to categorize musical d..."
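
Each generated column in the table above pairs a metric with an instruction (increase, decrease, don't change). A possible way to check whether a paraphrase actually followed its instruction is sketched below for the three character-count metrics. The function names and the metric keys (`total`, `uppercase`, `lowercase`) are hypothetical and not part of the notebook; "don't change" is interpreted strictly as an identical count, which may be stricter than intended.

```python
def char_metrics(text):
    """Character-level counts matching the definitions quoted in the prompts."""
    return {
        "total": len(text),                              # every character, spaces included
        "uppercase": sum(ch.isupper() for ch in text),
        "lowercase": sum(ch.islower() for ch in text),
    }

def follows_instruction(original, paraphrase, metric, instruction):
    """Return True if the metric moved in the direction the instruction asked for."""
    before = char_metrics(original)[metric]
    after = char_metrics(paraphrase)[metric]
    if instruction == "increase":
        return after > before
    if instruction == "decrease":
        return after < before
    if instruction == "don't change":
        return after == before
    raise ValueError(f"unknown instruction: {instruction!r}")

original = "As a sociologist, I want to use neural gas to analyze social data."
paraphrase = "As a sociologist, I want to leverage neural gas to analyze and classify social data."
print(follows_instruction(original, paraphrase, "total", "increase"))
```

Applied over every row and column, a check like this yields a per-instruction success rate instead of a visual comparison of truncated cells.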


In [19]:
# Save the generated paraphrases; the DataFrame index is written as the first, unnamed column
test_data.to_csv('test_data_output.csv')

In [20]:
from google.colab import files
files.download('test_data_output.csv')




 The current staffing crisis in the Health and Social Care service in Northern Ireland is severe and worsening, with over 2,500 unfilled nursing posts and similar vacancy levels in nursing homes. This shortage is causing long waiting lists, increased waiting times, and difficulties in accessing services, ultimately compromising the quality of care provided to patients. The situation has become a public safety issue and a matter of public interest, and we are speaking out to raise awareness and seek support for measures to address this crisis.</s>
 Yes, the world has become more racist, sexist, and violent over time. Examples of this include the increase in hate speech and discrimination against marginalized groups, the rise of right-wing nationalism and populism, and the increase in violence against women and LGBTQ+ individuals. These trends have contributed to social unrest in various parts of the world.</s>
</s>
</s>
</s>
</s>
</s>
</s>
</s>
</s>
</s>
</s>
</s>
</s>
</s>
</s>
</s>
</

Unnamed: 0,prompt,response,Paraphrased Response,llm_output
1,Paraphrase the following response: We are in t...,We are facing a severe staffing crisis in the ...,We are facing a severe staffing crisis in the ...,The current staffing crisis in the Health and...
3,"Paraphrase the following response: Yes, I can....","In recent times, there has been an increase in...","In recent times, there has been an increase in...","Yes, the world has become more racist, sexist..."
7,Paraphrase the following response: A young man...,A young man named Aiden embarks on a quest to ...,A young man named Aiden embarks on a quest to ...,A young man named Aiden embarks on a quest to...
8,"Paraphrase the following response: No, no, no,...","No more making excuses, Trinity! You need to t...","No more making excuses, Trinity! You need to t...","No, Trinity, you must take control of the sit..."
13,Paraphrase the following response: You know wh...,Helô Pinheiro is going through a tough time wi...,Helô Pinheiro is going through a tough time wi...,Helô Pinheiro is going through a difficult ti...
...,...,...,...,...
220,Paraphrase the following response: The targete...,This article is targeted towards individuals e...,This article is targeted towards individuals e...,The target audience for this article is indiv...
223,"Paraphrase the following response: The test, d...",Kallista developed a test to measure various a...,Kallista developed a test to measure various a...,Kallista has developed a comprehensive test t...
230,Paraphrase the following response: The text is...,"In the night's embrace, geometry races across ...","In the night's embrace, geometry races across ...",The text describes the rapid progression of ge...
234,Paraphrase the following response: The goals o...,The UNDP and PBE partnership aims to recognize...,The UNDP and PBE partnership aims to recognize...,The UNDP and PBE partnership aims to recogniz...


In [None]:
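def remove_after_endtoken(x):
    """Keep only the text before the first '</s>' end-of-sequence token."""
    # str.split returns the whole string when the token is absent,
    # so no separate membership check is needed.
    return x.split('</s>')[0]

# Strip the repeated end tokens the model emitted after each answer
df['llm_output'] = df['llm_output'].apply(remove_after_endtoken)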
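
The raw output printed earlier ends in a cut-off `</` fragment, which a split on the complete `</s>` token would leave in place. A slightly more defensive variant is sketched below. It is only an illustration: `trim_at_end_token` is a hypothetical name, the trailing-fragment removal assumes that a generation never legitimately ends in `<`, `</` or `</s`, and unlike the cell above it also strips surrounding whitespace.

```python
END_TOKEN = "</s>"

def trim_at_end_token(value, end_token=END_TOKEN):
    """Cut text at the first end token; pass non-string values (e.g. NaN) through."""
    if not isinstance(value, str):
        return value
    head, sep, _ = value.partition(end_token)
    if not sep:
        # No complete token found: drop a trailing partial token left by truncation.
        for k in range(len(end_token) - 1, 0, -1):
            if head.endswith(end_token[:k]):
                head = head[:-k]
                break
    return head.strip()

print(trim_at_end_token(" The current staffing crisis is severe.</s>\n</s>\n</"))
print(trim_at_end_token(" Generation that was cut off </"))
```

It can be applied the same way as the original helper, for example with `df['llm_output'].apply(trim_at_end_token)`.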


In [None]:
df.head(30)

Unnamed: 0,prompt,response,Paraphrased Response,llm_output
1,Paraphrase the following response: We are in t...,We are facing a severe staffing crisis in the ...,We are facing a severe staffing crisis in the ...,The current staffing crisis in the Health and...
3,"Paraphrase the following response: Yes, I can....","In recent times, there has been an increase in...","In recent times, there has been an increase in...","Yes, the world has become more racist, sexist..."
7,Paraphrase the following response: A young man...,A young man named Aiden embarks on a quest to ...,A young man named Aiden embarks on a quest to ...,A young man named Aiden embarks on a quest to...
8,"Paraphrase the following response: No, no, no,...","No more making excuses, Trinity! You need to t...","No more making excuses, Trinity! You need to t...","No, Trinity, you must take control of the sit..."
13,Paraphrase the following response: You know wh...,Helô Pinheiro is going through a tough time wi...,Helô Pinheiro is going through a tough time wi...,Helô Pinheiro is going through a difficult ti...
14,Paraphrase the following response: Nickel is a...,Nickel is a chemical element with symbol Ni an...,Nickel is a chemical element with symbol Ni an...,Nickel is a versatile chemical element with a...
17,Paraphrase the following response: When antibi...,"When antibiotics fail to work, the consequence...","When antibiotics fail to work, the consequence...","When antibiotics fail to work, the consequenc..."
20,Paraphrase the following response: News.\nBase...,The latest news update indicates that a new CO...,The latest news update indicates that a new CO...,The latest news and updates from around the w...
21,Paraphrase the following response: - His first...,"{'His debut album as a leader, Father Time, wa...","{'His debut album as a leader, Father Time, wa...","His first album as a leader, Father Time, was..."
34,Paraphrase the following response: [2].\nBased...,The original text has been reworded to provide...,The original text has been reworded to provide...,The following response has an increase in tot...


In [None]:
# Drop only the prompt column; the three text columns are kept for comparison
# df = df.drop(columns=['prompt', 'response'])
df = df.drop(columns=['prompt'])

In [None]:
# Keep a copy of the frame as it is at this point
d = df.copy()

In [None]:
df

Unnamed: 0,response,Paraphrased Response,llm_output
1,We are facing a severe staffing crisis in the ...,We are facing a severe staffing crisis in the ...,The current staffing crisis in the Health and...
3,"In recent times, there has been an increase in...","In recent times, there has been an increase in...","Yes, the world has become more racist, sexist..."
7,A young man named Aiden embarks on a quest to ...,A young man named Aiden embarks on a quest to ...,A young man named Aiden embarks on a quest to...
8,"No more making excuses, Trinity! You need to t...","No more making excuses, Trinity! You need to t...","No, Trinity, you must take control of the sit..."
13,Helô Pinheiro is going through a tough time wi...,Helô Pinheiro is going through a tough time wi...,Helô Pinheiro is going through a difficult ti...
...,...,...,...
220,This article is targeted towards individuals e...,This article is targeted towards individuals e...,The target audience for this article is indiv...
223,Kallista developed a test to measure various a...,Kallista developed a test to measure various a...,Kallista has developed a comprehensive test t...
230,"In the night's embrace, geometry races across ...","In the night's embrace, geometry races across ...",The text describes the rapid progression of ge...
234,The UNDP and PBE partnership aims to recognize...,The UNDP and PBE partnership aims to recognize...,The UNDP and PBE partnership aims to recogniz...


In [None]:
import pandas as pd

# Flatten the DataFrame row by row into one long column:
# every source row contributes one entry per column, in column order.
num_rows = df.shape[0]
num_cols = df.shape[1]
reshape_data = []

for i in range(num_rows):
    for j in range(num_cols):
        reshape_data.append(df.iloc[i, j])

# Create a new DataFrame with the flattened data
reshaped_df = pd.DataFrame(columns=['reshaped_column'], data=reshape_data)

def name_the_text(ind):
    # Assumes df has exactly three columns, in this order
    if ind % 3 == 0:
        return 'Original'
    elif ind % 3 == 1:
        return 'Paraphrased'
    return 'LLM'

# Label each flattened entry with the column it came from
reshaped_df['origin'] = reshaped_df.index.map(name_the_text)
reshaped_df

Unnamed: 0,reshaped_column,origin
0,"As a sociologist, I want to use machine augmen...",Original
1,"As a sociologist, I want to leverage machine-a...",Paraphrased
2,"As a sociologist, I want to leverage machine ...",LLM
3,A computer vision engineer wants to use bootst...,Original
4,Computer Vision Engineer Wants to Improve Obje...,Paraphrased
...,...,...
295,"As a political scientist, I aim to utilize mul...",Paraphrased
296,"As a political scientist, I want to use multi...",LLM
297,"As a pediatrician, I want to tokenize patient ...",Original
298,"As a pediatrician, I want to tokenize patient ...",Paraphrased


In [None]:
import pandas as pd

# Flatten the DataFrame row by row: each row contributes its cells in column
# order (response, Paraphrased Response, llm_output).
reshape_data = df.to_numpy().ravel()

# Create a new DataFrame with the flattened data
reshaped_df = pd.DataFrame({'reshaped_column': reshape_data})

# Label each entry by its position within its group of three
# (this assumes df has exactly three columns, in this order).
def name_the_text(ind):
    if ind % 3 == 0:
        return 'response'
    elif ind % 3 == 1:
        return 'Paraphrased Response'
    return 'LLM'

reshaped_df['origin'] = reshaped_df.index.map(name_the_text)
reshaped_df

Unnamed: 0,reshaped_column,origin
0,We are facing a severe staffing crisis in the ...,response
1,We are facing a severe staffing crisis in the ...,Paraphrased Response
2,The current staffing crisis in the Health and...,LLM
3,"In recent times, there has been an increase in...",response
4,"In recent times, there has been an increase in...",Paraphrased Response
...,...,...
232,The UNDP and PBE partnership aims to recognize...,Paraphrased Response
233,The UNDP and PBE partnership aims to recogniz...,LLM
234,Examples of economic development include measu...,response
235,Examples of economic development include measu...,Paraphrased Response
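
The reshape above labels rows by position (`ind % 3`), which silently mislabels everything if the column count or order changes. As a minimal sketch on made-up toy data (the two-row frame below is hypothetical, not the notebook's dataset), the labels can instead be taken from the column names themselves:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same three columns as df.
toy = pd.DataFrame(
    {
        "response": ["a0", "a1"],
        "Paraphrased Response": ["b0", "b1"],
        "llm_output": ["c0", "c1"],
    },
    index=[1, 3],
)

# Row-major flattening: a0, b0, c0, a1, b1, c1.
values = toy.to_numpy().ravel()

# Repeat the column names once per row so each value keeps its source column.
origins = np.tile(toy.columns.to_numpy(), len(toy))

long_df = pd.DataFrame({"reshaped_column": values, "origin": origins})
print(long_df)
```

With this variant the `origin` column reads `llm_output` rather than `LLM`; rename it afterwards if the shorter label is needed downstream.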


In [None]:
metric_functions = [
    total_characters,
    uppercase_characters,
    lowercase_characters,
    special_characters,
    numbers,
    blanks,
    number_of_words,
    average_length_of_words,
    number_of_propositions,
    average_length_of_propositions,
    punctuation_characters,
    lowercase_words,
    uppercase_words,
    vocabulary_richness,
    number_of_urls,
    flesch_kincaid_grade_level,
    flesch_reading_ease,
    dale_chall_readability,
    automated_readability_index,
    coleman_liau_index,
    gunning_fog,
    smog_index,
    linsear_write_index
]
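
The functions listed in `metric_functions` are defined elsewhere in the notebook and are not shown in this section. Purely to illustrate the shape they need (one string in, one number out, so that `Series.apply` can call them), here is a sketch of a few plausible stand-ins. The `_sketch` names and their exact definitions are assumptions, not the notebook's implementations.

```python
import string


def total_characters_sketch(text: str) -> int:
    """Length of the text, counting every character."""
    return len(text)


def uppercase_characters_sketch(text: str) -> int:
    """Number of uppercase letters."""
    return sum(ch.isupper() for ch in text)


def blanks_sketch(text: str) -> int:
    """Number of space characters."""
    return text.count(" ")


def number_of_words_sketch(text: str) -> int:
    """Number of whitespace-separated tokens."""
    return len(text.split())


def punctuation_characters_sketch(text: str) -> int:
    """Number of ASCII punctuation characters."""
    return sum(ch in string.punctuation for ch in text)


def vocabulary_richness_sketch(text: str) -> int:
    """Number of distinct words, ignoring case and surrounding punctuation
    (an assumed definition)."""
    words = {w.strip(string.punctuation).lower() for w in text.split()}
    return len(words - {""})


sketch_functions = [
    total_characters_sketch,
    uppercase_characters_sketch,
    blanks_sketch,
    number_of_words_sketch,
    punctuation_characters_sketch,
    vocabulary_richness_sketch,
]

sample = "We are facing a crisis. We need 16 nurses!"
sample_metrics = {fn.__name__: fn(sample) for fn in sketch_functions}
print(sample_metrics)
```

Because each function takes a single string, the same list can be passed through the `apply` loop used later in the notebook without changes.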

In [None]:
df

Unnamed: 0,response,Paraphrased Response,llm_output
1,We are facing a severe staffing crisis in the ...,We are facing a severe staffing crisis in the ...,The current staffing crisis in the Health and...
3,"In recent times, there has been an increase in...","In recent times, there has been an increase in...","Yes, the world has become more racist, sexist..."
7,A young man named Aiden embarks on a quest to ...,A young man named Aiden embarks on a quest to ...,A young man named Aiden embarks on a quest to...
8,"No more making excuses, Trinity! You need to t...","No more making excuses, Trinity! You need to t...","No, Trinity, you must take control of the sit..."
13,Helô Pinheiro is going through a tough time wi...,Helô Pinheiro is going through a tough time wi...,Helô Pinheiro is going through a difficult ti...
...,...,...,...
220,This article is targeted towards individuals e...,This article is targeted towards individuals e...,The target audience for this article is indiv...
223,Kallista developed a test to measure various a...,Kallista developed a test to measure various a...,Kallista has developed a comprehensive test t...
230,"In the night's embrace, geometry races across ...","In the night's embrace, geometry races across ...",The text describes the rapid progression of ge...
234,The UNDP and PBE partnership aims to recognize...,The UNDP and PBE partnership aims to recognize...,The UNDP and PBE partnership aims to recogniz...


In [None]:
for func in metric_functions:
    reshaped_df[func.__name__] = reshaped_df['reshaped_column'].apply(func)
reshaped_df

Unnamed: 0,reshaped_column,origin,total_characters,uppercase_characters,lowercase_characters,special_characters,numbers,blanks,number_of_words,average_length_of_words,...,vocabulary_richness,number_of_urls,flesch_kincaid_grade_level,flesch_reading_ease,dale_chall_readability,automated_readability_index,coleman_liau_index,gunning_fog,smog_index,linsear_write_index
0,We are facing a severe staffing crisis in the ...,response,468,6,354,9,16,82,83,4.650602,...,61,0,11.7,60.28,10.53,14.3,9.29,12.53,8.8,14.833333
1,We are facing a severe staffing crisis in the ...,Paraphrased Response,468,6,354,9,16,82,83,4.650602,...,61,0,11.7,60.28,10.53,14.3,9.29,12.53,8.8,14.833333
2,The current staffing crisis in the Health and...,LLM,548,8,441,9,4,86,86,5.372093,...,64,0,15.7,33.88,11.85,18.2,13.88,16.60,14.1,18.000000
3,"In recent times, there has been an increase in...",response,704,10,576,14,0,104,105,5.714286,...,70,0,12.7,41.70,10.09,16.0,15.08,13.35,14.0,13.200000
4,"In recent times, there has been an increase in...",Paraphrased Response,704,10,576,14,0,104,105,5.714286,...,70,0,12.7,41.70,10.09,16.0,15.08,13.35,14.0,13.200000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
232,The UNDP and PBE partnership aims to recognize...,Paraphrased Response,475,12,394,6,0,63,64,6.437500,...,53,0,21.7,-3.31,14.11,24.9,20.08,27.18,0.0,28.000000
233,The UNDP and PBE partnership aims to recogniz...,LLM,473,15,386,7,0,65,65,6.276923,...,47,0,20.7,4.65,12.54,24.4,19.10,24.69,0.0,27.250000
234,Examples of economic development include measu...,response,188,7,150,7,0,24,25,6.560000,...,24,0,17.8,12.26,16.25,22.0,19.44,21.20,0.0,20.500000
235,Examples of economic development include measu...,Paraphrased Response,188,7,150,7,0,24,25,6.560000,...,24,0,17.8,12.26,16.25,22.0,19.44,21.20,0.0,20.500000


In [None]:
reshaped_df.to_csv("gen_output_metrics.csv", index=False)


In [None]:
df.to_csv('all_versions.csv')

In [None]:
pd.DataFrame(valid_dataset_mapped['text']).to_csv('instructions.csv')

In [None]:
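# Inspect the first validation example. Only the last expression in a cell is
# displayed, so these two lines produce no output as written; wrap them in
# print() to see their values.
valid_dataset_mapped[0]['prompt'].split(':')[1].split('.')[0]
valid_dataset_mapped[0]['response']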

prompt = valid_dataset_mapped[0]['text'].split('[/INST]')[0] + '[/INST]'
num_new_tokens = 100  # change to the number of new tokens you want to generate

# Count the number of tokens in the prompt
num_prompt_tokens = len(tokenizer(prompt)['input_ids'])

# Calculate the maximum length for the generation
max_length = num_prompt_tokens + num_new_tokens

gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
result = gen(prompt)
print(result[0]['generated_text'].replace(prompt, ''))



 A computational biologist is using backpropagation to train a machine learning algorithm to predict the likelihood of different genetic mutations leading to cancer by analyzing large sets of genomic data. The algorithm is trained to identify the underlying genetic factors that contribute to cancer development.
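
The cell above removes the prompt with `str.replace(prompt, '')`, which deletes every occurrence of the prompt text, including any copy the model happens to repeat inside its completion. A small alternative, sketched here with a hypothetical helper named `strip_prompt`, removes only the leading copy and leaves the text unchanged if the prefix does not match exactly:

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return the completion without the leading prompt, if present."""
    if generated_text.startswith(prompt):
        # Drop the prompt prefix and the whitespace that follows it.
        return generated_text[len(prompt):].lstrip()
    # Decoding may not reproduce the prompt character for character;
    # in that case return the text untouched rather than guess.
    return generated_text


demo_prompt = "[INST]Paraphrase this. [/INST]"
demo_output = demo_prompt + " A paraphrased sentence."
completion = strip_prompt(demo_output, demo_prompt)
print(completion)
```

The text-generation pipeline also accepts `return_full_text=False`, which returns only the newly generated text and makes the post-processing unnecessary.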


In [None]:
valid_dataset_mapped[0]['text'].split('[/INST]')[0] + '[/INST]'

'[INST]Paraphrase the following user story: A researcher in computational biology is using backpropagation to train a machine learning algorithm to predict the likelihood of different genetic mutations leading to cancer. By analyzing large sets of genomic data, the algorithm is trained to identify the underlying genetic factors that contribute to cancer development..\nBased on the following metrics: diff_total_characters: 9,diff_uppercase_characters: 0,diff_lowercase_characters: 8,diff_special_characters: 0,diff_numbers: 0,diff_blanks: 1,diff_number_of_words: 1,diff_average_length_of_words: 0.0453283996299722,diff_number_of_propositions: 0,diff_average_length_of_propositions: 0.5,diff_punctuation_characters: 0,diff_lowercase_words: 1,diff_uppercase_words: 0,diff_vocabulary_richness: 1,diff_number_of_urls: 0, [/INST]'
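
The prompt printed above combines the text to paraphrase with a list of `diff_*` values. How those values are computed is not shown in this section. One plausible reading is that each is a metric on the paraphrase minus the same metric on the original. The sketch below assumes exactly that, including the sign convention, uses two hypothetical stand-in metrics, and mirrors the `[INST]... [/INST]` layout of the printed prompt.

```python
# Hypothetical stand-ins for two of the notebook's metric functions.
DEMO_METRICS = {
    "total_characters": len,
    "number_of_words": lambda text: len(text.split()),
}


def metric_diffs(original: str, paraphrase: str, metrics=DEMO_METRICS) -> dict:
    """Difference of each metric: paraphrase minus original (assumed convention)."""
    return {
        f"diff_{name}": fn(paraphrase) - fn(original)
        for name, fn in metrics.items()
    }


def build_prompt(original: str, diffs: dict) -> str:
    """Format the instruction in the same layout as the printed example."""
    metric_text = "".join(f"{name}: {value}," for name, value in diffs.items())
    return (
        f"[INST]Paraphrase the following user story: {original}.\n"
        f"Based on the following metrics: {metric_text} [/INST]"
    )


diffs = metric_diffs("A cat sat", "A small cat sat down")
demo_prompt = build_prompt("A cat sat", diffs)
print(demo_prompt)
```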

                                              prompt  \
0  Paraphrase the following user story: A researc...   
1  Paraphrase the following user story: As a comp...   
2  Paraphrase the following user story: As a micr...   
3  Paraphrase the following user story: As a biol...   
4  Paraphrase the following user story: As a bioi...   
5  Paraphrase the following user story: As a bioi...   
6  Paraphrase the following user story: As a medi...   
7  Paraphrase the following user story: As a bioi...   
8  Paraphrase the following user story: As a biol...   
9  Paraphrase the following user story: As an evo...   

                                            response  \
0  A computational biologist is utilizing backpro...   
1  As a computational biologist, I want to utiliz...   
2  As a microbiologist, I want to use constrained...   
3  As a biologist, I want to utilize data mining ...   
4  As a bioinformatics researcher, I want to crea...   
5  As a bioinformatics specialist, I aim to lev

# Merge the model and store it in Google Drive

In [None]:
# Merge and save the fine-tuned model
from google.colab import drive
drive.mount('/content/drive')

model_path = "/content/drive/MyDrive/llama-2-7b-custom"  # change to your preferred path

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Save the merged model
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

MessageError: Error: credential propagation was unsuccessful
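
This `MessageError` is typically raised by `drive.mount` when Colab cannot obtain Drive credentials. Since the mount call sits at the top of the cell, the merge and save steps after it most likely never ran. One way to keep the rest of the cell usable is to fall back to local storage when mounting fails. The helper below is a sketch: `resolve_model_dir` and its `mount_fn` parameter are hypothetical names introduced here, and catching a broad `Exception` is a deliberate simplification.

```python
import os
import tempfile


def resolve_model_dir(drive_dir, local_dir, mount_fn=None):
    """Return drive_dir if mounting succeeds, otherwise create and return local_dir."""
    if mount_fn is None:
        try:
            from google.colab import drive  # only importable inside Colab
        except ImportError:
            drive = None
        if drive is not None:
            mount_fn = lambda: drive.mount("/content/drive")

    if mount_fn is not None:
        try:
            mount_fn()
            return drive_dir
        except Exception as exc:  # broad on purpose; see the note above
            print(f"Drive mount failed ({exc}); saving locally instead.")

    os.makedirs(local_dir, exist_ok=True)
    return local_dir


# Demonstration with a mount function that always fails.
def _always_fails():
    raise RuntimeError("credential propagation was unsuccessful")


local_dir = os.path.join(tempfile.mkdtemp(), "llama-2-7b-custom")
model_path = resolve_model_dir(
    "/content/drive/MyDrive/llama-2-7b-custom", local_dir, mount_fn=_always_fails
)
print(model_path)
```

Files saved to local Colab storage disappear when the runtime is recycled, so a locally saved model still needs to be downloaded or copied elsewhere.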

# Load a fine-tuned model from Drive and run inference

In [None]:
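import torch
from google.colab import drive
from transformers import AutoModelForCausalLM, AutoTokenizer

drive.mount('/content/drive')

model_path = "/content/drive/MyDrive/llama-2-7b-custom"  # change to the path where your model is saved

# Load in FP16 to match how the merged model was saved; without torch_dtype,
# from_pretrained loads the weights in FP32, which roughly doubles memory use.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # place weights on the available GPU(s); requires accelerate
)
tokenizer = AutoTokenizer.from_pretrained(model_path)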

MessageError: Error: credential propagation was unsuccessful

In [None]:
from transformers import pipeline

prompt = "What is 2 + 2?"  # change to your desired prompt
gen = pipeline('text-generation', model=model, tokenizer=tokenizer)
result = gen(prompt)
print(result[0]['generated_text'])
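
The validation prompts earlier in this notebook are wrapped in `[INST]` and `[/INST]` markers, so a bare question like the one above may not match what the fine-tuned model saw during training. A sketch of a wrapper, assuming the template is `[INST]{instruction} [/INST]` as the printed validation example suggests (`wrap_instruction` is a hypothetical name):

```python
def wrap_instruction(instruction: str) -> str:
    """Wrap a plain instruction in the [INST] ... [/INST] markers (assumed template)."""
    return f"[INST]{instruction.strip()} [/INST]"


wrapped_prompt = wrap_instruction("What is 2 + 2?")
print(wrapped_prompt)
```

The wrapped string can be passed to `gen(...)` in place of the raw prompt.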