## Model and Tokenizer

In [1]:
import copy
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

In [2]:
model_name = "microsoft/phi-1_5"

In [3]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
# Alternative: pad with token id 0 ('!') so that padding differs from the EOS token.
# tokenizer.pad_token_id = 0
# tokenizer.padding_side = "right"  # allow batched inference

In [4]:
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    # load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
)
# Keep base_model untouched for comparison; fine-tune the copy.
model = copy.deepcopy(base_model)

## Dataset Preparation

In [5]:
from datasets import load_dataset

In [6]:
cutoff_len = 256

def generate_prompt(data_point):
  return f"""{data_point["user"]}

Answer: {data_point["AI"]}
  """.strip()


def generate_and_tokenize_prompt(data_point):
  # Build the full training text, then pad or truncate it to exactly cutoff_len tokens.
  full_prompt = generate_prompt(data_point)
  result = tokenizer(full_prompt, padding='max_length', truncation=True, max_length=cutoff_len)
  result['data'] = full_prompt  # keep the raw text for inspection
  # No 'labels' key is set here: DataCollatorForLanguageModeling(mlm=False)
  # copies input_ids into labels when it builds each batch.
  return result
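
To see the text the model is trained on, the prompt template can be applied to a single record. The sketch below re-declares `generate_prompt` so that it runs standalone; the sample record is made up for illustration and is not taken from `qa_gpt4.json`.

```python
def generate_prompt(data_point):
    # Same template as above: question, blank line, then "Answer: <answer>".
    return f"""{data_point["user"]}

Answer: {data_point["AI"]}
  """.strip()


# Hypothetical record with the same two fields the dataset uses.
sample = {"user": "What is compound interest?", "AI": "Interest earned on interest."}
formatted = generate_prompt(sample)
print(formatted)
```

The trailing `.strip()` removes the final newline and indentation, so the training text ends directly after the answer.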

In [7]:
dataset = load_dataset('json', data_files='qa_gpt4.json', split="train")
dataset = dataset.shuffle().map(generate_and_tokenize_prompt)

Map: 100%|██████████| 131/131 [00:00<00:00, 2963.21 examples/s]


In [8]:
print(dataset[0].keys())
dataset[0]['data']
len(dataset[0]['input_ids'])
# print(len(dataset[1]['input_ids'][0]))
# print(len(dataset[0]['input_ids'][1]))

dict_keys(['AI', 'user', 'input_ids', 'attention_mask', 'data'])


256

## Training

In [9]:
OUTPUT_DIR = "/root/hongyu/JupyterNotebooksFinetuning/models/phi1.5"
training_args = transformers.TrainingArguments(
    per_device_train_batch_size=64,
    gradient_accumulation_steps=4,
    # warmup_steps=100,
    auto_find_batch_size=True,
    num_train_epochs=1,
    learning_rate=1e-6,  # 2e-5,
    weight_decay=0.1,
    fp16=False,
    # optim='adamw_torch',
    # bf16=True,
    save_total_limit=3,
    logging_steps=1,
    output_dir=OUTPUT_DIR,
    save_strategy='epoch',
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-6
)

In [10]:
trainer = transformers.Trainer(
    model=model,
    train_dataset=dataset,
    args=training_args,
    # mlm=False selects causal-LM collation: labels are a copy of input_ids
    # with pad-token positions set to -100 so the loss ignores them.
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
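
With `mlm=False`, the collator builds `labels` by copying `input_ids` and replacing every pad-token position with `-100`, which the loss ignores. Because `pad_token` is set to `eos_token` in this notebook, a genuine end-of-text token inside a sequence would be masked as well. The pure-Python sketch below imitates that masking on a toy sequence; `mask_pad_labels` is a hypothetical helper and the token ids are placeholders, not real vocabulary ids.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def mask_pad_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, replacing pad positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


# Placeholder ids: 9 stands in for the shared pad/EOS token.
PAD_ID = 9
input_ids = [3, 5, 7, 9, 9, 9]  # three real tokens followed by padding
labels = mask_pad_labels(input_ids, PAD_ID)
print(labels)
```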

In [11]:
model.config.use_cache = False
trainer.train()

You're using a CodeGenTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
1,1.9796
2,2.0067
3,1.8494
4,1.7368


TrainOutput(global_step=4, training_loss=1.8931188583374023, metrics={'train_runtime': 10.9611, 'train_samples_per_second': 11.951, 'train_steps_per_second': 0.365, 'total_flos': 258227526696960.0, 'train_loss': 1.8931188583374023, 'epoch': 0.94})
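
The logged `global_step=4` and `epoch=0.94` do not match what a per-device batch size of 64 would give for 131 examples. One reading that is consistent with the numbers is that `auto_find_batch_size=True` reduced the per-device batch size to 8 on a single device. That value is an assumption, since the effective batch size does not appear in the output. The simplified arithmetic is:

```python
import math

num_examples = 131        # from the Map progress bar
grad_accum_steps = 4      # gradient_accumulation_steps
assumed_batch_size = 8    # assumption: reduced from 64 by auto_find_batch_size

# Micro-batches per epoch; the last one is only partially filled.
batches_per_epoch = math.ceil(num_examples / assumed_batch_size)
# One optimizer step per full group of accumulated micro-batches.
optimizer_steps = batches_per_epoch // grad_accum_steps
# Fraction of the epoch covered when the last optimizer step runs.
epoch_fraction = optimizer_steps * grad_accum_steps / batches_per_epoch

print(batches_per_epoch, optimizer_steps, round(epoch_fraction, 2))
```

Under this assumption the leftover 17th micro-batch never triggers an update, which would explain why the reported epoch stops short of 1.0.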

## Inference

In [13]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [14]:
dataset[0]['AI']

'The phrase "compound interest" is used metaphorically. In the context, it signifies the exponential increase in shared love, joy, and satisfaction that comes from investing time, care, and attention into relationships with others. Just like with financial investment, the benefits multiply over time.'

In [15]:
data_point = dataset[2]
prompt = f"""
{data_point["user"]}

Answer: 
  """.strip()
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)
inputs = inputs.to(device)

In [16]:
prompt

"What ethos is suggested by pursuing one’s 'true purpose' and 'identity'?\n\nAnswer:"

In [17]:
base_outputs = base_model.generate(**inputs, max_length=500)

print("Question: " + data_point["user"])
print("\n\n\n#################################")
print("GPT-4: " + data_point["AI"])
print("\n\n\n#################################")
print("Base Model: " + tokenizer.batch_decode(base_outputs)[0])
ft_outputs = model.generate(**inputs, max_length=500)
print("\n\n\n#################################")
print("Fine-tuned Model: " + tokenizer.batch_decode(ft_outputs)[0])

Question: What ethos is suggested by pursuing one’s 'true purpose' and 'identity'?



#################################
GPT-4: The ethos suggested here is one of authenticity and self-fulfilment. It implies living in a way that aligns with one's core values, passions and identity, rather than being primarily driven by the need or desire for financial gain.



#################################
Base Model: What ethos is suggested by pursuing one’s 'true purpose' and 'identity'?

Answer: The ethos suggested by pursuing one's 'true purpose' and 'identity' is authenticity.

Exercise 2: What is the importance of having a clear sense of purpose in life?

Answer: Having a clear sense of purpose in life is important because it gives us direction and motivation to pursue our goals and dreams.

Exercise 3: How can we find our true purpose in life?

Answer: We can find our true purpose in life by exploring our interests, passions, and values, and by reflecting on our experiences and what brings us






#################################
Fine-tuned Model: What ethos is suggested by pursuing one’s 'true purpose' and 'identity'?

Answer: The ethos suggested by pursuing one's 'true purpose' and 'identity' is the importance of self-awareness and self-fulfillment. It emphasizes the significance of living a life that aligns with one's values and identity, rather than just focusing on financial gain. It encourages individuals to pursue their passions and interests, which they may not have considered before, and to live a life that is meaningful and fulfilling. It suggests that one should not just focus on financial gain, but should also strive to live a life that is aligned with their values and identity.
<|endoftext|>


Title: The Importance of Health and Physical Education in Preventing Illness and Promoting Wellness

Introduction:
In this report, we will explore the significance of health and physical education in preventing illness and promoting overall wellness. We will discuss the be
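
Both decoded outputs repeat the prompt, and generation keeps going past the first answer: the base model invents further exercises, and the fine-tuned model continues after `<|endoftext|>`. A small post-processing helper can isolate the first answer. This is a sketch and not part of the original notebook; `extract_answer` and its default stop markers are hypothetical, chosen to match the outputs shown above.

```python
def extract_answer(decoded, prompt, stop_markers=("<|endoftext|>", "\n\nExercise")):
    """Return the text generated after the prompt, cut at the first stop marker."""
    # Drop the echoed prompt if the decoded text starts with it.
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Cut at whichever stop marker appears earliest.
    cut = len(text)
    for marker in stop_markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()


# Toy strings standing in for a real prompt and decoded generation.
prompt = "What is 2 + 2?\n\nAnswer:"
decoded = prompt + " Four.\n<|endoftext|>\n\nTitle: Something unrelated"
answer = extract_answer(decoded, prompt)
print(answer)
```

In the notebook this would be applied to `tokenizer.batch_decode(ft_outputs)[0]` with the same `prompt` string that was tokenized.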