## Step 1 : Model choice and loading

In [2]:
from transformers import pipeline

  from .autonotebook import tqdm as notebook_tqdm


In [20]:
pipe = pipeline("text-generation", model="openai-community/gpt2", return_full_text=False)

Device set to use cpu


In [10]:
prompt = "Once upon a time in a small village"
results = pipe(prompt, max_length=50, num_return_sequences=2)
for i, r in enumerate(results):
    print(f"Output {i}:", r["generated_text"])

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Output 0: Once upon a time in a small village, a pair of knights came on a horse and cut off the eye of one of the nobles. The knight fell to his knees, his face red in pain. However, a knight's life had been spared. He was the one to save the life of the knight.

But it was not the knight's life that was spared. The knight's life would be completely lost.

The knight's life was in the hands of the people of the village. They had been sent to protect the village from the invading army. He would die there. His life would be forever lost.

The village did not know that he had been sent here. It was a city, but there was no one here to protect it. The people of the village were so scared of the people of the village that they did not dare to enter the village and do something about it.

However, the people of the village were so frightened that they refused to let them enter.

The village was a city. It was a place that was created by the people of the village. They had to live in fear of

#### the model Loading is working !!

## Step 2 : Selecting the downstream task and data

- the idea behind that is to use the model and give it a simple passage and tell it to generate questions and answers in order to run the **SEAL** method on it 

In [11]:

context = """A large language model (LLM) is a language model with a large number of parameters (generally more than a billion).

These are deep neural networks trained on large amounts of unlabeled text using self-supervised learning. LLMs appeared around 2017 and have been used to implement conversational agents.

Instead of being trained for a specific task such as sentiment analysis, named entity recognition, or mathematical reasoning, they can accomplish a wide range of tasks. They are first pre-trained to predict a likely continuation for a given input. The quality of generated content tends to increase with the number of parameters, the size and quality of training data, and the amount of compute used to train the model. Large language models are then most often fine-tuned to adopt the role of a conversational assistant and to be “helpful, honest, and harmless.”

Language models with a large number of parameters can capture much of the syntax and semantics of human language. This enables them to reproduce substantial general world knowledge, with memorization of many facts during training.

Before the success of large language models, NLP research mainly focused on supervised learning of specialized models for specific tasks."""

qa_prompt = f"""You are an expert teacher in natural language processing and your task is to generate **question-and-answer pairs** that test a reader’s understanding of a short technical passage.  

**Instructions:**  
1. Use the passage provided below.  
2. Generate **5 distinct question-and-answer pairs**.  
3. Each question should be clear, concise, and focus on a key concept from the passage.  
4. Each answer should be correct, complete, and directly based on the passage (no outside knowledge).  
5. Format your output exactly as follows:

1) Question: <question_1>  
   Answer: <answer_1>  
2) Question: <question_2>  
   Answer: <answer_2>  
…  
5) Question: <question_5>  
   Answer: <answer_5>

**Passage:**  
{context}
"""

results = pipe(qa_prompt, max_length=300, num_return_sequences=1)
print(results[0]["generated_text"])


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=300) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


You are an expert teacher in natural language processing and your task is to generate **question-and-answer pairs** that test a reader’s understanding of a short technical passage.  

**Instructions:**  
1. Use the passage provided below.  
2. Generate **5 distinct question-and-answer pairs**.  
3. Each question should be clear, concise, and focus on a key concept from the passage.  
4. Each answer should be correct, complete, and directly based on the passage (no outside knowledge).  
5. Format your output exactly as follows:

1) Question: <question_1>  
   Answer: <answer_1>  
2) Question: <question_2>  
   Answer: <answer_2>  
…  
5) Question: <question_5>  
   Answer: <answer_5>

**Passage:**  
A large language model (LLM) is a language model with a large number of parameters (generally more than a billion).

These are deep neural networks trained on large amounts of unlabeled text using self-supervised learning. LLMs appeared around 2017 and have been used to implement conversatio

- I tried to generate the initial context Q.A using the same model but based on it's capabilities it couldn't generate them successfuly . 

In [14]:
## baseline of the model before doing the SEAL method
import pandas as pd 

df = pd.read_json('questionAndanswers.json')
df.head()

Unnamed: 0,question,answer
0,What is a large language model (LLM)?,A language model with a large number of parame...
1,Around when did LLMs appear?,They appeared around 2017.
2,What kind of tasks can LLMs accomplish?,A wide range of tasks (not just sentiment anal...
3,How are LLMs pre-trained?,They are pretrained to predict a likely contin...
4,What factors improve the quality of generated ...,"Larger number of parameters, bigger and higher..."


In [15]:
for i in df['question']:
    results = pipe(i, max_length=50, num_return_sequences=1)
    print(f"Question: {i}")
    print(f"Answer: {results[0]['generated_text']}")
    print("-----\n")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Question: What is a large language model (LLM)?
Answer: What is a large language model (LLM)?

The LLM is an LLM that is designed to be used with large numbers and/or complex types. It is written to be written to handle large types. It is written to be written to be used with small types. It is written to be used with data structures such as streams and variables, or even with type system methods.

How do I use it?

Use the documentation for the language model of the language model.

Install the language model from the package manager. To do so, open the package manager (run "apt-get install" to install the language model) and run "make".

What is a type system?

Type system is a set of functions and types that are used to construct a type system. It implements the following functions:

type A = {x: int, y: int,...};

type B = {x: int, y: int,...};

type C = {x: int, y: int,...};

type D = {x: int, y: int,...};

type E = {x: int, y: int,...};

type F = {x: int, y:
-----



Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Question: Around when did LLMs appear?
Answer: Around when did LLMs appear?

What is the difference between the LLMs and the NARs?

How many LLMs does an NAR have?

In what cases are the NARs unique and when are the NARs unique?

How many NARs do a particular LLM have?

What is the difference between the LLMs and the NARs?

What is the difference between the LLMs and the NARs?

What is the difference between the LLMs and the NARs?

What is the difference between the LLMs and the NARs?

What is the difference between the LLMs and the NARs?

What is the difference between the LLMs and the NARs?

How many NARs do a particular LLM have?

In what cases are the NARs unique and when are the NARs unique?

What is the difference between the LLMs and the NARs?

What is the difference between the LLMs and the NARs?

What is the difference between the LLMs and the NARs?

What is the difference between the LLMs and the NARs?
-----



Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Question: What kind of tasks can LLMs accomplish?
Answer: What kind of tasks can LLMs accomplish? The following diagram shows the total number of tasks completed by a user in each of the four fields. The number of tasks completed is related to the number of data objects in the data set.

How many tasks can LLMs accomplish? The following diagram shows the total number of tasks completed by a user in each of the four fields. The number of tasks completed is related to the number of data objects in the data set.

How many tasks can MLMs accomplish? The following diagram shows the total number of tasks completed by a user in each of the four fields. The number of tasks completed is related to the number of data objects in the data set.

How many tasks can MLMs accomplish? The following diagram shows the total number of tasks completed by a user in each of the four fields. The number of tasks completed is related to the number of data objects in the data set.

What kind of tasks can LLMs ac

KeyboardInterrupt: 

In [21]:
Baseline_answers = []

for i in df['question']:
    results = pipe(i, max_length=20, num_return_sequences=1)
    Baseline_answers.append(results[0]['generated_text'])

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/te

In [23]:
import re

def normalize(text):
    text = text.lower().strip()
    text = re.sub(r"[^\w\s]", "", text)   # remove punctuation
    text = re.sub(r"\s+", " ", text)
    return text

def f1_score(pred, truth):
    pred_tokens = normalize(pred).split()
    truth_tokens = normalize(truth).split()
    if len(pred_tokens) == 0 or len(truth_tokens) == 0:
        return 0.0
    common = set(pred_tokens) & set(truth_tokens)
    if not common:
        return 0.0
    prec = len(common) / len(pred_tokens)
    rec  = len(common) / len(truth_tokens)
    return 2 * (prec * rec) / (prec + rec)

def exact_match(pred, truth):
    return 1 if normalize(pred) == normalize(truth) else 0



ems = []
f1s = []
for pred, item in zip(Baseline_answers, df.to_dict(orient="records")):
    ems.append(exact_match(pred, item["answer"]))
    f1s.append(f1_score(pred, item["answer"]))

print("Exact Match avg:", sum(ems) / len(ems))
print("F1 avg:", sum(f1s) / len(f1s))


Exact Match avg: 0.0
F1 avg: 0.040716723562072316


####  Based on the low performace of GPT 2 ,I will follow the Student-Teacher method , where i will use a better model eg GPT-5 to generate the self-edits and then implement the **algorithm** on the smaller model   

In [None]:
import os
import json
import time
import re
from openai import OpenAI  # OpenAI-compatible SDK for OpenRouter

# --- Configuration ---
OPENROUTER_API_KEY = 
client = OpenAI(api_key=OPENROUTER_API_KEY, base_url="https://openrouter.ai/api/v1")

MODEL_NAME = "meta-llama/llama-3.2-3b-instruct:free"  # free tier via OpenRouter

# Number of synthetic variants you want per QA pair
NUM_VARIANTS = 2

# Prompt template for self-edit generation
PROMPT_TEMPLATE = """
You are an expert AI researcher. Given the following question-and-answer pair:
Question: {question}
Answer: {answer}

1) Generate a *synthetic training example* based on this. You can paraphrase the answer, create a new related question + answer, or produce a concise statement summarizing the concept.
2) Propose a small *directive* for how to fine-tune a smaller student model using this synthetic example (e.g., learning rate, epochs, batch size).

Return the result as a JSON object with keys:
{{
  "synthetic_example": "<text>",
  "directive": {{
      "learning_rate": "<value>",
      "epochs": <integer>,
      "batch_size": <integer>
  }}
}}
"""

def safe_parse_json(text):
    # match {...} pattern (non-greedy)
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass
    # fallback
    return {
        "synthetic_example": text.strip(),
        "directive": {"learning_rate": "5e-5", "epochs": 2, "batch_size": 8}
    }

# --- Load your QA dataset ---
qa_dataset = df.to_dict(orient="records") 

# --- Function to generate self-edits ---
def generate_self_edit(question, answer):
    prompt = PROMPT_TEMPLATE.format(question=question, answer=answer)
    
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=300,
    )
    
    content = response.choices[0].message.content.strip()
    print("Raw API response:", content)
    
    self_edit = safe_parse_json(content)
    return self_edit


all_self_edits = []
for item in qa_dataset:
    for variant_index in range(NUM_VARIANTS):
        self_edit = generate_self_edit(item["question"], item["answer"])
        record = {
            "original_question": item["question"],
            "original_answer":   item["answer"],
            "synthetic_example": self_edit["synthetic_example"] ,
            "directive":         self_edit["directive"]
        }
        all_self_edits.append(record)
        # to be gentle with rate limits
        time.sleep(2)

# --- Save to file ---
with open("self_edits.json", "w", encoding="utf-8") as f:
    json.dump(all_self_edits, f, ensure_ascii=False, indent=2)

print(f"Generated {len(all_self_edits)} self-edits and wrote to self_edits.json")



RateLimitError: Error code: 429 - {'error': {'message': 'Provider returned error', 'code': 429, 'metadata': {'raw': 'meta-llama/llama-3.2-3b-instruct:free is temporarily rate-limited upstream. Please retry shortly, or add your own key to accumulate your rate limits: https://openrouter.ai/settings/integrations', 'provider_name': 'Venice'}}, 'user_id': 'user_33cH9jR7RukEp7B2XwL81foRQF8'}

In [None]:


## Step 3: Fine-tune with Self-Edits (LoRA)

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForLanguageModeling
from peft import LoraConfig, get_peft_model, TaskType
from datasets import Dataset

# --- Load self-edits ---
with open("self_edits.json", "r", encoding="utf-8") as f:
    self_edits = json.load(f)

# --- Prepare training data ---
# Format: "Question: ... Answer: ..." for each synthetic example
train_texts = []
for edit in self_edits:
    # Extract synthetic example (could be statement or Q/A pair)
    text = edit["synthetic_example"]
    # Ensure it ends with a period for proper tokenization
    if not text.endswith(('.', '!', '?')):
        text += "."
    train_texts.append(text)

# Create HF Dataset
train_dataset = Dataset.from_dict({"text": train_texts})

# --- Load base model and tokenizer ---
model_name = "openai-community/gpt2"  # Same as your baseline
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 needs this

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,  # CPU-friendly (avoid fp16 on CPU)
    device_map="cpu"
)

# --- Apply LoRA (parameter-efficient fine-tuning) ---
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # Low rank for CPU
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["c_attn"],  # GPT-2 attention layers
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # Should be ~0.3% of total params

# --- Tokenize dataset ---
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=128,  # Keep short for speed
        padding="max_length"
    )

tokenized_dataset = train_dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# --- Training arguments (CPU-optimized) ---
training_args = TrainingArguments(
    output_dir="./lora_finetuned_gpt2",
    overwrite_output_dir=True,
    num_train_epochs=2,           # Use directive's epochs if you want (2-5 typical)
    per_device_train_batch_size=2, # Very small for CPU
    gradient_accumulation_steps=4, # Simulate batch_size=8
    learning_rate=2e-5,            # Median directive value
    logging_steps=5,
    save_steps=50,
    save_total_limit=1,
    fp16=False,                    # CPU doesn't support fp16
    report_to="none",              # Disable wandb
    dataloader_num_workers=0,      # CPU-safe
)

# --- Data collator ---
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # Causal LM (not masked LM)
)

# --- Trainer ---
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

# --- Fine-tune ---
print("Starting LoRA fine-tuning...")
trainer.train()

# --- Save LoRA adapter ---
model.save_pretrained("./lora_adapter")
tokenizer.save_pretrained("./lora_adapter")
print("✅ LoRA adapter saved to ./lora_adapter")

# ...existing code...