<a href="https://colab.research.google.com/github/JackWittmayer/PrincipleLearning/blob/main/principle_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part 0: Setup and Hello World

In [13]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from typing import List, Optional

In [21]:
DRIVE_FOLDER = "/content/drive/MyDrive/colab/"
MODEL_NAME = "Qwen/Qwen3-4B-Instruct-2507"
MODEL_PATH = DRIVE_FOLDER + MODEL_NAME
TOKENIZER_PATH = DRIVE_FOLDER + MODEL_NAME + "-Tokenizer"

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [22]:
def load_model_and_tokenizer(download_fresh: bool):
    """Load the model and tokenizer from the Hub or from the copy saved on Drive."""
    # Download from the Hugging Face Hub on the first run, otherwise reuse Drive.
    model_source = MODEL_NAME if download_fresh else MODEL_PATH
    tokenizer_source = MODEL_NAME if download_fresh else TOKENIZER_PATH
    model = AutoModelForCausalLM.from_pretrained(
        model_source,
        dtype="auto",
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_source)
    if download_fresh:
        # Save both to Drive so later sessions can pass download_fresh=False.
        model.save_pretrained(MODEL_PATH)
        tokenizer.save_pretrained(TOKENIZER_PATH)
    return model, tokenizer


model, tokenizer = load_model_and_tokenizer(download_fresh=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
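
Hard-coding `download_fresh=True` downloads the weights again in every session. Below is a minimal sketch of one way to pick the flag automatically, assuming that a non-empty directory at each saved path means an earlier run finished saving. `has_saved_copy` and `choose_download_fresh` are hypothetical helpers, not part of this notebook or of `transformers`.

```python
import os


def has_saved_copy(path: str) -> bool:
    """Return True if `path` is an existing directory with at least one entry."""
    return os.path.isdir(path) and len(os.listdir(path)) > 0


def choose_download_fresh(model_path: str, tokenizer_path: str) -> bool:
    """Download only when either saved copy is missing (assumed heuristic)."""
    return not (has_saved_copy(model_path) and has_saved_copy(tokenizer_path))
```

It could be wired in as `load_model_and_tokenizer(download_fresh=choose_download_fresh(MODEL_PATH, TOKENIZER_PATH))`. The check is deliberately shallow: a save that was interrupted partway would still look complete to it.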



In [25]:
def generate_response(user_prompt: str, system_prompt: Optional[str] = None) -> str:
    if system_prompt:
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    else:
        messages = [
            {"role": "user", "content": user_prompt}
        ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # Generate the completion, then drop the prompt tokens that generate() echoes back.
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=16384
    )
    output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
    content = tokenizer.decode(output_ids, skip_special_tokens=True)
    return content
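
For decoder-only models, `model.generate` returns the prompt token ids followed by the newly generated ones, which is why `generate_response` slices from `len(model_inputs.input_ids[0])` onward before decoding. Here is a toy sketch of that slicing, with plain Python lists standing in for tensors; the token ids are made up and `strip_prompt` is a hypothetical helper.

```python
from typing import List


def strip_prompt(generated_ids: List[int], prompt_ids: List[int]) -> List[int]:
    """Return only the tokens that follow the prompt in a generated sequence."""
    # Guard against misuse: the sequence must start with the prompt.
    if generated_ids[:len(prompt_ids)] != prompt_ids:
        raise ValueError("generated sequence does not start with the prompt")
    return generated_ids[len(prompt_ids):]


prompt_ids = [101, 7, 42]            # made-up ids for the chat-templated prompt
generated_ids = [101, 7, 42, 9, 15]  # prompt echoed back, then two new tokens
print(strip_prompt(generated_ids, prompt_ids))  # [9, 15]
```

Without the slice, the decoded text would begin with the chat-templated prompt rather than with the model's answer.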

In [27]:
prompt = "Give me a short introduction to large language models."
content = generate_response(prompt)
print("content:", content)

content: Large language models (LLMs) are artificial intelligence systems trained on vast amounts of text data from the internet, books, articles, and other sources. These models learn to understand and generate human-like language by identifying patterns in how words and sentences are used. Using deep learning techniques—particularly transformer architectures—LLMs can perform a wide range of tasks, such as answering questions, writing stories, coding, summarizing text, and translating languages. Their ability to generate coherent and contextually relevant responses makes them powerful tools in natural language processing. Examples include models like GPT-3, BERT, and Llama. While highly capable, they still require careful use to avoid inaccuracies, biases, or misinformation.


# Part 1: Adding a principle to the system prompt

In [28]:
def create_principle_prompt(principles: List[str]) -> str:
    prompt = """
    You are the personal AI assistant to a person who uses the following
    personal principles to guide their decision making.
    [
    """
    # Number the principles from 1 so the list reads naturally in the prompt.
    for n, principle in enumerate(principles, start=1):
        prompt += f"{n}. {principle}\n"
    prompt += """
    ]
    Carefully follow these principles when completing tasks and generating responses
    as if you were this person. Don't explicitly reference the principles.
    """
    return prompt


In [30]:
minimalist_principle = "Acquire minimal material goods."
collector_principle = "Surround yourself with things you love."
question_one = "Should I keep a box of books I never read?"
minimalist_response = generate_response(
    user_prompt=question_one,
    system_prompt=create_principle_prompt([minimalist_principle]),
)
collector_response = generate_response(
    user_prompt=question_one,
    system_prompt=create_principle_prompt([collector_principle]),
)
print("minimalist response:", minimalist_response)
print("collector response:", collector_response)

minimalist response: If the box of books is taking up space you could use for something more meaningful—like time, a simpler living environment, or a project you care about—then it might be worth letting go. Books you’ve never read don’t add value to your life in a way that’s active or useful. If you’re not reading them, they’re not serving you.  

You could consider donating them to a library or a community center, or simply putting them in a place where they won’t distract from your present. Less clutter means more space for what truly matters.
collector response: If the books bring you joy, comfort, or a sense of connection—whether through their stories, the way they feel in your hands, or the memories they hold—then yes, keep them. They’re not just objects; they’re part of your journey, and sometimes, the things we don’t read are still meaningful because of what they represent. Keep the ones that speak to you, even if you haven’t opened them yet.
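
The comparison above repeats the same call once per principle. A minimal sketch of how it might be generalized to any number of labeled principle sets follows; `compare_principles` is a hypothetical helper, and the prompt builder and responder are passed in as arguments so the function makes no assumptions about how they are implemented.

```python
from typing import Callable, Dict, List, Optional


def compare_principles(
    question: str,
    principles: Dict[str, List[str]],
    build_prompt: Callable[[List[str]], str],
    respond: Callable[[str, Optional[str]], str],
) -> Dict[str, str]:
    """Ask the same question once per principle set and collect the answers."""
    responses = {}
    for label, principle_list in principles.items():
        # Each label gets its own system prompt built from its principles.
        system_prompt = build_prompt(principle_list)
        responses[label] = respond(question, system_prompt)
    return responses
```

In this notebook the call would presumably look like `compare_principles(question_one, {"minimalist": [minimalist_principle], "collector": [collector_principle]}, create_principle_prompt, generate_response)`, which returns a dictionary keyed by label.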
