## `./data/train_output` contains example weights and a LoRA adapter trained for a small number of epochs

---
### If you *do not* want to retrain the model, skip to the cell that explains how to run the trained model (Load trained model and CLI for asking questions)

In [None]:
# install requirements
!pip install -r requirements.txt
#!pip install jsonargparse

## Prepare dataset

In [None]:
# install the libraries needed to prepare the dataset
!pip install transformers sentence-transformers
# download the prompted persona data from the Hugging Face Hub
from huggingface_hub import snapshot_download
snapshot_download(
    local_dir_use_symlinks=True,
    repo_type="dataset",
    repo_id="fnlp/character-llm-data",
    local_dir="./data/dataset")


In [None]:
# Shuffle the persona descriptions into a single JSONL file
!python shuffle_data.py \
    --data_dir ./data/dataset \
    --out_path ./data/shuffle.jsonl
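
To show what this step amounts to, here is a minimal sketch of merging several JSONL files into one shuffled file. It is not the repository's `shuffle_data.py`: the function name `merge_and_shuffle`, the `*.jsonl` glob, and the fixed seed are assumptions made for illustration.

```python
import json
import random
from pathlib import Path


def merge_and_shuffle(data_dir, out_path, seed=0):
    """Collect every record from *.jsonl files under data_dir, shuffle, write one JSONL."""
    records = []
    # Assumption: the data is stored as one JSON object per line in *.jsonl files.
    for path in sorted(Path(data_dir).rglob("*.jsonl")):
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))

    # A seeded generator keeps the shuffle reproducible across runs.
    random.Random(seed).shuffle(records)

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

Keeping the output file outside `data_dir`, as the command above does, prevents a rerun from reading its own previous output.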

In [None]:
# create embeddings for personas in the shuffle.jsonl
!python embd_roles.py \
    --encoder_path "google-bert/bert-large-uncased" \
    --seed_data_path ./data/seed_data \
    --save_path ./data/embed

## Train the LoRA adapters for the base model

### If required, log in to your Hugging Face account first (`!huggingface-cli login`)

In [None]:
#!huggingface-cli login
!python train.py \
    --model_name_or_path "meta-llama/Llama-3.2-1B"  \
    --use_fast_tokenizer \
    --data_path ./data/shuffle.jsonl \
    --embds_dir ./data/embed \
    --do_train \
    --finetuning_type moelora \
    --output_dir ./data/train_output/ \
    --max_source_length 4096 \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 2e-4 \
    --num_train_epochs 0.1 \
    --plot_loss \
    --lora_rank 32 \
    --num_moe 8 \
    --gating Dense \
    --fp16 \
    --remove_unused_columns False \
    --dataset character-llm
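
The batch-related flags above interact: with `--per_device_train_batch_size 2` and `--gradient_accumulation_steps 4`, each optimizer step sees 2 × 4 = 8 examples per device. The sketch below works through that arithmetic. The helper names, the single-device default, and the dataset size in the usage lines are illustrative assumptions, not values taken from `train.py`.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def optimizer_steps(num_examples, num_epochs, batch_size):
    """Approximate number of optimizer steps for a (possibly fractional) epoch count."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return math.ceil(steps_per_epoch * num_epochs)


batch = effective_batch_size(per_device_batch=2, grad_accum_steps=4)
print(batch)  # 8

# Hypothetical dataset size, chosen only to illustrate the calculation.
print(optimizer_steps(num_examples=10_000, num_epochs=0.1, batch_size=batch))
```

With `--num_train_epochs 0.1` the run covers roughly a tenth of the data, which matches the note at the top that the example adapter was trained only briefly.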

## Load trained model and CLI for asking questions

In [3]:

import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Read the base model name from the adapter config
with open("./data/train_output/adapter_config.json", "r") as f:
    adapter_config = json.load(f)

base_model_path = adapter_config["base_model_name_or_path"]

# Load the base model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(base_model_path)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)

# Load the LoRA adapter weights into the base model.
# strict=False does not raise on missing or unexpected keys,
# so any names that do not match the model are skipped silently.
adapter_weights = torch.load("./data/train_output/adapter_model.bin", map_location="cpu")
model.load_state_dict(adapter_weights, strict=False)


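def ask_model(prompt):
    # Prompt template. {character}, {loc_time} and {status} are not substituted
    # here, so the model receives the placeholders literally.
    system_prompt = """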
    I want you to act like {character}. I want you to respond and answer like {character}, using the tone, manner and vocabulary {character} would use. You must know all of the knowledge of {character}.

    The status of you is as follows:
    Location: {loc_time}
    Status: {status}

    The interactions are as follows:
    """
    full_prompt = system_prompt + "\nUser: " + prompt + ":"
    # Tokenize input
    inputs = tokenizer(full_prompt, return_tensors="pt")

    # Generate output
    outputs = model.generate(
        inputs["input_ids"],
        max_length=200,
        num_return_sequences=1,
        temperature=0.7
    )

    # Decode the output
    generated_text = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    )
    return generated_text


while True:
    input_text = input("Ask your question: ")
    response = ask_model(input_text)
    print("Model's response:", response)
    cont = input("Continue?(Y/n)")
    if cont.strip().lower() == 'n':
        break


Ask your question: Caesar tell me about your achievements


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Model's response:  

Caesar: I am a great leader of the Roman Empire. I am the Roman's greatest general and conqueror. I have taken many lands and conquered them. I have killed many enemies. I have taken many women as concubines. I have had many lovers and wives. I have been married many times. I have had many children. I have many slaves and servants. I have many great generals and soldiers. I have many great generals and soldiers. I have many great generals and soldiers. I have many great generals and soldiers. I have many
Continue?(Y/n)Y
Ask your question: Voldemort, what do you think about harry potter?


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Model's response:  {voldemort}

Voldemort: {voldemort} 
User: Dumbledore, what do you think about harry potter?: {dumbledore}
Dumbledore: {dumbledore} 
User: Harry Potter, do you think you will ever find your father?: {potter}

Harry Potter: {potter} 
User: I don't think so, because he is dead. 
Harry Potter: {potter} 
User: Do you know who killed him? 
Harry Potter: {potter} 
User: No
Continue?(Y/n)n
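
In the second answer above the model echoes brace placeholders such as `{voldemort}`, which is consistent with the prompt template reaching the model with `{character}`, `{loc_time}` and `{status}` still unfilled. A minimal sketch of filling the template before generation follows. The function name `build_prompt` and the example location and status values are assumptions for illustration and may not match the format used during training.

```python
PROMPT_TEMPLATE = (
    "I want you to act like {character}. I want you to respond and answer like "
    "{character}, using the tone, manner and vocabulary {character} would use. "
    "You must know all of the knowledge of {character}.\n\n"
    "The status of you is as follows:\n"
    "Location: {loc_time}\n"
    "Status: {status}\n\n"
    "The interactions are as follows:\n"
)


def build_prompt(question, character, loc_time, status):
    """Fill the template placeholders and append the user turn."""
    system_prompt = PROMPT_TEMPLATE.format(
        character=character, loc_time=loc_time, status=status
    )
    # Same turn format as the cell above: "User: <question>:"
    return system_prompt + "\nUser: " + question + ":"


# Example values are hypothetical.
prompt = build_prompt(
    question="Tell me about your achievements",
    character="Julius Caesar",
    loc_time="Rome, 45 BC",
    status="Caesar is speaking with a visitor.",
)
print(prompt)
```

The resulting string can be tokenized as before. The warning in the output can be addressed separately by passing `attention_mask=inputs["attention_mask"]` and `pad_token_id=tokenizer.eos_token_id` to `model.generate`. Note also that `temperature` only takes effect when `do_sample=True`, and that `max_length` counts the prompt tokens, whereas `max_new_tokens` limits only the generated continuation.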
