# LLM Simplification Exploration

This notebook provides a structured environment to decide on the **best out-of-the-box baseline** for text simplification.

## Decision Criteria: Faithfulness vs. Simplicity

We are looking for a baseline that balances:
1.  **Meaning Preservation (Faithfulness)**: Does not hallucinate or drop critical safety warnings.
2.  **Simplicity**: Significantly lowers the reading level (lexical and syntactic).
3.  **Controllability**: Respects formatting constraints (e.g., bullet points vs. paragraphs).

**Goal**: Identify one Model + Prompt combination to serve as the default for the project.
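
One cheap proxy for the faithfulness criterion is to check that every number in the source text (temperatures, pan sizes, times) still appears in the simplified text. The sketch below is an illustration only: `extract_numbers` and `missing_numbers` are hypothetical helper names, not part of the notebook, and a missing number is a hint for manual review rather than proof of an error.

```python
import re

def extract_numbers(text: str) -> set:
    """Return the numeric tokens in text, e.g. '375', '190', '9x13'."""
    # Digits, optionally joined by 'x', '.', or ',' (as in 9x13 or 1.5).
    return set(re.findall(r"\d+(?:[x.,]\d+)*", text))

def missing_numbers(original: str, simplified: str) -> set:
    """Numbers present in the original but absent from the simplification."""
    return extract_numbers(original) - extract_numbers(simplified)

# Hypothetical example texts, not model output.
source = "Preheat the oven to 375°F (190°C). Grease a 9x13 inch baking pan."
rewrite = "Heat the oven to 375°F. Grease a 9x13 pan."
dropped = missing_numbers(source, rewrite)
print(dropped)
```

Here the rewrite drops the Celsius value, which a reader outside the US might need, so the check flags it.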

# 1. Setup & Configuration
Initialize the Groq client and list the available models.

In [38]:
import os
import time
from dotenv import load_dotenv, find_dotenv
from groq import Groq

# Locate the .env file with find_dotenv, then load it with override=True so its values replace any already-set variables
try:
    found_path = find_dotenv(usecwd=True)
    if found_path:
        load_dotenv(found_path, override=True)
        print(f"✅ Loaded .env from: {found_path}")
    else:
        print("⚠️ Could not find .env file. Please ensure it exists in the project root.")
except Exception as e:
    print(f"⚠️ Error loading .env: {e}")

# Initialize Groq
groq_api_key = os.getenv("GROQ_API_KEY")
groq_client = None

if groq_api_key:
    groq_client = Groq(api_key=groq_api_key)
    print("✅ Groq Client Initialized")

    print("\n🔍 Available Models on Groq:")
    try:
        models = groq_client.models.list()
        for m in models.data:
            print(f" - {m.id}")
    except Exception as e:
        print(f"⚠️ Could not list models: {e}")
else:
    print("⚠️ GROQ_API_KEY missing")

✅ Loaded .env from: /Users/alastair/Github/klartext/.env
✅ Groq Client Initialized

🔍 Available Models on Groq:
 - openai/gpt-oss-20b
 - llama-3.3-70b-versatile
 - whisper-large-v3
 - moonshotai/kimi-k2-instruct-0905
 - openai/gpt-oss-120b
 - meta-llama/llama-prompt-guard-2-86m
 - playai-tts
 - playai-tts-arabic
 - meta-llama/llama-4-maverick-17b-128e-instruct
 - moonshotai/kimi-k2-instruct
 - whisper-large-v3-turbo
 - groq/compound-mini
 - allam-2-7b
 - llama-3.1-8b-instant
 - meta-llama/llama-4-scout-17b-16e-instruct
 - openai/gpt-oss-safeguard-20b
 - meta-llama/llama-prompt-guard-2-22m
 - qwen/qwen3-32b
 - groq/compound
 - meta-llama/llama-guard-4-12b


In [39]:
# Configuration: Fixed Test Cases
# We use a fixed set of texts to ensure comparability across runs.

FIXED_TEST_CASES = {
    "Advice (Safety)": """
Make sure that the area is a safe place, especially if you plan on walking home at night. 
It’s always a good idea to practice the buddy system. Have a friend meet up and walk with you. 
Research the bus, train or streetcar route you plan to take. Check the schedule for both outgoing and return travel. 
Some public transportation ceases to run late at night. Make sure you don't get stuck without a way home.
    """,
    "Procedural (Cooking)": """
Preheat the oven to 375°F (190°C). Grease a 9x13 inch baking pan. 
In a medium bowl, stir together the flour, baking soda, and salt. 
In a large bowl, cream together the butter and sugar until smooth. 
Beat in the eggs one at a time, then stir in the vanilla. 
Gradually blend in the dry ingredients. Stir in the chocolate chips.
    """,
    "Dense (Legal/Formal)": """
The obligations contained herein shall remain in full force and effect indefinitely, 
notwithstanding the termination of this Agreement, until such time as the Confidential Information 
no longer qualifies as confidential under applicable law. 
The Receiving Party agrees to return all physical copies of the Confidential Information upon request.
    """
}

# Configuration: Prompt Strategies (Based on Klartext Levels)
# Defines strict simplification levels for accessibility/learning disabilities.

KLARTEXT_BASE_SYSTEM = "You are an expert in plain language writing. Your task: Rewrite the text in simple, easy-to-understand language. Do NOT invent new facts. If something is unclear, say so."

PROMPT_STRATEGIES = [
    {
        "name": "Level I (Easy)", 
        "intent": "Maximum simplicity, short sentences, bullet points.",
        "prompt": f"""{KLARTEXT_BASE_SYSTEM}
Rules for Level I (Very Easy):
- Very short sentences (maximum 8-10 words)
- Use only simple, everyday words
- Explain any uncommon word in parentheses
- Add blank lines between paragraphs
- Use bullet points when possible
        """
    },
    {
        "name": "Level II (Moderate)", 
        "intent": "Clear structure, active voice, minimal jargon.",
        "prompt": f"""{KLARTEXT_BASE_SYSTEM}
Rules for Level II (Easy-Moderate):
- Short sentences (maximum 12-15 words)
- Clear structure with headings
- Minimal jargon, explain when necessary
- Use active voice
        """
    },
    {
        "name": "Level III (Standard)", 
        "intent": "Normal length but clear, no complex structures.",
        "prompt": f"""{KLARTEXT_BASE_SYSTEM}
Rules for Level III (Standard Plain Language):
- Normal sentence length, but clearly written
- Avoid complex sentence structures
- Technical terms only when necessary
- Less repetition
        """
    }
]

In [40]:
# Helper Functions

def get_completion(text, model_name, system_prompt):
    """Send `text` to a Groq chat model and return the reply.

    Failures are returned as bracketed placeholder strings instead of being
    raised, so one failed call does not abort a comparison loop.
    """
    if not groq_client:
        return "[Groq Client Not Init]"

    try:
        completion = groq_client.chat.completions.create(
            model=model_name,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text},
            ],
            temperature=0.1,  # Low temperature for more repeatable output
        )
        return completion.choices[0].message.content
    except Exception as e:
        return f"[Error: {e}]"

def print_comparison(label, original, result, strategy_name, model_name):
    """Prints the original (first 200 characters) above the simplified text."""
    print(f"\n{'='*80}")
    print(f"📌 CASE: {label} | 🤖 MODEL: {model_name} | 🎯 STRATEGY: {strategy_name}")
    print(f"{'-'*80}")
    print(f"🔹 ORIGINAL:\n{original.strip()[:200]}... [len={len(original)}]\n")
    print(f"🔸 SIMPLIFIED:\n{result.strip()}\n")
    print(f"{'='*80}\n")
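
Because `get_completion` returns failures as bracketed strings (`[Error: ...]` or `[Groq Client Not Init]`), a failed call can be mistaken for model output when results are compared or scored. A minimal sketch of a guard, assuming those two prefixes are the only failure markers; `is_failed_completion` and `successful_only` are hypothetical names introduced here for illustration.

```python
# Prefixes of the placeholder strings returned by get_completion on failure.
ERROR_PREFIXES = ("[Error:", "[Groq Client Not Init]")

def is_failed_completion(result: str) -> bool:
    """True if the string is a failure placeholder rather than model output."""
    return result.strip().startswith(ERROR_PREFIXES)

def successful_only(results: list) -> list:
    """Drop failure placeholders before comparing or scoring outputs."""
    return [r for r in results if not is_failed_completion(r)]

# Hypothetical results list mixing one real output and one failure.
outputs = ["Make sure the area is safe.", "[Error: rate limit exceeded]"]
print(successful_only(outputs))
```

A model reply that genuinely begins with `[Error:` would be misclassified, so this is a convenience filter and not a substitute for raising exceptions.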

# 2. Exploration A: Prompt Strategy
**Control Variable**: Model (`llama-3.1-8b-instant`)
**Independent Variable**: Prompt Strategy (Level I, II, III)
**Question**: Which complexity level is appropriate for our target audience?

In [41]:
BASELINE_MODEL = "llama-3.1-8b-instant"

print(f"🧠 Running Prompt Exploration on {BASELINE_MODEL}...")

for label, text in FIXED_TEST_CASES.items():
    for strategy in PROMPT_STRATEGIES:
        output = get_completion(text, BASELINE_MODEL, strategy["prompt"])
        print_comparison(label, text, output, strategy["name"], BASELINE_MODEL)

🧠 Running Prompt Exploration on llama-3.1-8b-instant...

📌 CASE: Advice (Safety) | 🤖 MODEL: llama-3.1-8b-instant | 🎯 STRATEGY: Level I (Easy)
--------------------------------------------------------------------------------
🔹 ORIGINAL:
Make sure that the area is a safe place, especially if you plan on walking home at night. 
It’s always a good idea to practice the buddy system. Have a friend meet up and walk with you. 
Research the ... [len=416]

🔸 SIMPLIFIED:
Make sure the area is safe.

* This is especially important if you walk home at night.
* It's better to be safe than sorry.

Practice the buddy system.

* Ask a friend to meet you and walk with you.
* This way, you're not alone in the dark.

Plan your trip carefully.

* Research the bus, train, or streetcar route you'll take.
* Check the schedule for both going and coming back.
* Some public transportation stops running late at night.
* Make sure you don't get stuck without a way home.



📌 CASE: Advice (Saf

# 3. Exploration B: Model Comparison
**Control Variable**: Prompt (`Level I (Easy)`)
**Independent Variable**: Model choice (Groq Models)
**Question**: Which model adheres best to the strict 'Level I' constraints (e.g. sentence length, bullet points)?
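
Adherence to the Level I sentence limit ("maximum 8-10 words") can be checked mechanically rather than by eye. The following is a rough sketch, not a validated metric: `sentence_lengths` and `level1_violations` are hypothetical helpers, sentences are split naively on end punctuation and line breaks, and the default limit of 10 words is taken from the upper bound in the Level I prompt.

```python
import re

def sentence_lengths(text: str) -> list:
    """Word count of each sentence; line breaks (e.g. bullets) also end a sentence."""
    # Split after ., ! or ? followed by whitespace, or on any run of newlines.
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    lengths = []
    for part in parts:
        # Count word-like tokens; contractions such as "you'll" count as one word.
        words = re.findall(r"\w+(?:'\w+)*", part)
        if words:
            lengths.append(len(words))
    return lengths

def level1_violations(text: str, max_words: int = 10) -> int:
    """Number of sentences longer than the Level I limit."""
    return sum(1 for n in sentence_lengths(text) if n > max_words)

sample = "Make sure the area is safe.\n\n* Ask a friend to meet you and walk with you."
print(sentence_lengths(sample), level1_violations(sample))
```

Counting violations per model over the fixed test cases gives a simple number to compare alongside the printed outputs.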

In [42]:
CANDIDATE_MODELS = [
    "llama-3.1-8b-instant",       # Baseline (small, fast)
    "llama-3.3-70b-versatile",    # Larger 70B model
]

# Select 'Level I' as the stress test for simplification capabilities
BASELINE_PROMPT = PROMPT_STRATEGIES[0]["prompt"] 

print("🤖 Running Model Comparison using prompt: 'Level I (Easy)'...")

for label, text in FIXED_TEST_CASES.items():
    for model in CANDIDATE_MODELS:
        output = get_completion(text, model, BASELINE_PROMPT)
        print_comparison(label, text, output, "Level I", model)
        time.sleep(1) # Polite delay

🤖 Running Model Comparison using prompt: 'Level I (Easy)'...

📌 CASE: Advice (Safety) | 🤖 MODEL: llama-3.1-8b-instant | 🎯 STRATEGY: Level I
--------------------------------------------------------------------------------
🔹 ORIGINAL:
Make sure that the area is a safe place, especially if you plan on walking home at night. 
It’s always a good idea to practice the buddy system. Have a friend meet up and walk with you. 
Research the ... [len=416]

🔸 SIMPLIFIED:
Make sure the area is safe.

* This is especially important if you walk home at night.
* It's better to be safe than sorry.

Practice the buddy system.

* Ask a friend to meet you and walk with you.
* This way, you're not alone at night.

Plan your trip carefully.

* Research the bus, train, or streetcar route you'll take.
* Check the schedule for both going and coming back.
* Some public transportation stops running late at night.
* Make sure you don't get stuck without a way home.



📌 CASE: Advice (Safety) 

# 4. Optional: LLM-Based Evaluation
Set `RUN_EVALUATION = True` to score the outputs using an LLM judge. 
Useful for final verification but slow/costly for quick iteration.

In [43]:
RUN_EVALUATION = False # Toggle this to True if you want to run the judge

def evaluate_output(original, simplified, judge_model="llama-3.3-70b-versatile"):
    """Ask a judge model to rate a simplification (first 300 characters of each text only)."""
    prompt = f"""
    Rate the following simplification on a 1-5 scale for Simplicity and Meaning.
    Original: {original[:300]}...
    Simplified: {simplified[:300]}...
    Return ONLY: Simplicity: X/5, Meaning: Y/5
    """
    return get_completion(prompt, judge_model, "You are an evaluator.")

if RUN_EVALUATION:
    print("⚖️ Running Evaluation Sample...")
    # Example usage on one case
    sample_text = FIXED_TEST_CASES["Advice (Safety)"]
    sample_out = get_completion(sample_text, "llama-3.1-8b-instant", PROMPT_STRATEGIES[0]["prompt"])
    score = evaluate_output(sample_text, sample_out)
    print(f"Score: {score}")
else:
    print("ℹ️ Evaluation skipped. Set RUN_EVALUATION = True to run.")

ℹ️ Evaluation skipped. Set RUN_EVALUATION = True to run.
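
The judge is asked to reply in the form `Simplicity: X/5, Meaning: Y/5`, but `evaluate_output` returns that reply as a raw string. To aggregate scores across cases, the reply needs to be parsed. The sketch below assumes the judge follows the requested format; `parse_judge_scores` is a hypothetical helper that returns `None` when either score is missing, for example when the reply is an error placeholder.

```python
import re
from typing import Optional

# Matches "Simplicity: 4/5" or "Meaning: 5/5", tolerating extra spaces and case.
SCORE_PATTERN = re.compile(r"(Simplicity|Meaning)\s*:\s*([1-5])\s*/\s*5", re.IGNORECASE)

def parse_judge_scores(reply: str) -> Optional[dict]:
    """Extract both scores from a judge reply, or None if either is absent."""
    found = {name.lower(): int(value) for name, value in SCORE_PATTERN.findall(reply)}
    if set(found) >= {"simplicity", "meaning"}:
        return found
    return None

# Hypothetical judge replies, not real model output.
print(parse_judge_scores("Simplicity: 4/5, Meaning: 5/5"))
print(parse_judge_scores("[Error: rate limit exceeded]"))
```

Returning `None` instead of a default score keeps malformed replies out of any averages.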
