In [1]:
!pip uninstall -y tensorflow tensorflow-macos tensorflow-metal tensorflow-io pyarrow protobuf

Found existing installation: tensorflow 2.20.0
Uninstalling tensorflow-2.20.0:
  Successfully uninstalled tensorflow-2.20.0
[0mFound existing installation: pyarrow 21.0.0
Uninstalling pyarrow-21.0.0:
  Successfully uninstalled pyarrow-21.0.0
Found existing installation: protobuf 6.32.1
Uninstalling protobuf-6.32.1:
  Successfully uninstalled protobuf-6.32.1


In [4]:
!pip install sentencepiece transformers rouge-score tiktoken openai

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "dotslashderek/prompt-compression-v3"

device = torch.device("mps" if torch.backends.mps.is_available() else
                      "cuda" if torch.cuda.is_available() else "cpu")
print("Using:", device)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)

print("Model and tokenizer loaded.")


Using: mps


KeyboardInterrupt: 

In [5]:
import time, torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
print("torch:", torch.__version__, "mps built:", torch.backends.mps.is_built(), "avail:", torch.backends.mps.is_available())

t0 = time.time()
tok = AutoTokenizer.from_pretrained("dotslashderek/prompt-compression-v3")
print("tokenizer:", time.time()-t0, "s")

t1 = time.time()
mdl = AutoModelForSeq2SeqLM.from_pretrained("dotslashderek/prompt-compression-v3", low_cpu_mem_usage=False)
print("model load (cpu):", time.time()-t1, "s")

if torch.backends.mps.is_available():
    t2 = time.time()
    _ = torch.randn(1, device='mps')  # warmup
    print("mps warmup:", time.time()-t2, "s")

    t3 = time.time()
    mdl.to("mps")
    print("to(mps):", time.time()-t3, "s")

    t4 = time.time()
    _ = mdl.generate(**tok("test", return_tensors="pt").to("mps"), max_new_tokens=4)
    print("first generate (jit compile):", time.time()-t4, "s")


torch: 2.8.0 mps built: True avail: True


KeyboardInterrupt: 

In [6]:
import csv
import random

csv_path = "training_data/dolly-summarization-data-rouge.csv"
sample_size = 120

with open(csv_path, "r", encoding="utf-8") as infile:
    reader = csv.reader(infile)
    header = next(reader)
    rows = list(reader)

sampled_rows = random.sample(rows, sample_size)

print(f"Sampled {sample_size} rows for testing.")
for i, row in enumerate(sampled_rows):
    print(f"Sample {i+1}: Original: {row[0][:80]}... Compressed: {row[1][:80]}...")

Sampled 120 rows for testing.
Sample 1: Original: What is Foreverly?... Compressed: What is Foreverly?...
Sample 2: Original: How long was the Titanic?... Compressed: Titanic length?...
Sample 3: Original: Which African country was founded by Americans... Compressed: African country founded by Americans?...
Sample 4: Original: Given this paragraph about cycling, who holds the record for the most Tour de Fr... Compressed: Cycling paragraph: who has the most Tour de France stage wins?...
Sample 5: Original: Should a man get married when he is young, or wait until he is older?... Compressed: Should a man marry young or wait until he's older?...
Sample 6: Original: What is IFSC?... Compressed: What is IFSC?...
Sample 7: Original: Who is country singer Jordan Davis... Compressed: Who is country singer Jordan Davis...
Sample 8: Original: What did José María Arizmendiarrieta do?... Compressed: What did José María Arizmendiarrieta do?...
Sample 9: Original: Suggest some fantasy books I could r

In [9]:
import torch

generated_outputs = []

def count_tokens(text):
    return len(tokenizer.encode(text, add_special_tokens=False))

for i, row in enumerate(sampled_rows):
    original_prompt = row[0]
    orig_token_count = count_tokens(original_prompt)
    min_length = max(1, orig_token_count // 2)
    max_length = orig_token_count
    inputs = tokenizer(original_prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        output_ids = model.generate(
            inputs.input_ids,
            min_length=min_length,
            max_length=max_length,
            num_beams=4,
            early_stopping=True
        )
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    compressed_token_count = count_tokens(output_text)
    generated_outputs.append({
        "original": original_prompt,
        "generated": output_text,
        "min_length": min_length,
        "max_length": max_length,
        "original_token_count": orig_token_count,
        "compressed_token_count": compressed_token_count
    })
    print(f"Sample {i+1}:")
    print(f"Original Prompt ({orig_token_count} tokens):
{original_prompt}
")
    print(f"Compressed Prompt ({compressed_token_count} tokens):
{output_text}
---")



Sample 1:
Original Prompt (9 tokens):
Why do you want to keep customers happy?

Compressed Prompt (4 tokens):
Why do you want
---
Sample 2:
Original Prompt (17 tokens):
Who was chairman of the board of directors of Tesla as of March 2004?

Compressed Prompt (12 tokens):
Tesla chairman of the board as of March 2004?
---
Sample 3:
Original Prompt (10 tokens):
What does the Horse Statue in Ground Zero Represent?

Compressed Prompt (5 tokens):
What does the Horse Statue
---
Sample 4:
Original Prompt (6 tokens):
Difference between EST and EDT?

Compressed Prompt (2 tokens):
Difference between
---
Sample 5:
Original Prompt (8 tokens):
How is "crisp" pronounced?

Compressed Prompt (3 tokens):
How is '
---
Sample 6:
Original Prompt (13 tokens):
Using this passage, extract all the years associated with the area.

Compressed Prompt (10 tokens):
From this passage, list all years associated with the
---
Sample 7:
Original Prompt (14 tokens):
What are some interesting facts about Nelson Mandel Boul

In [None]:
import openai
from rouge_score import rouge_scorer
import time

openai.api_key = ""

scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

def call_chatgpt(prompt):
    try:
        response = openai.chat.completions.create(
            model="gpt-5-nano",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        print(f"Error calling chatgpt-5-nano: {e}")
        return ""

results = []
for i, row in enumerate(sampled_rows):
    original_prompt = row[0]
    compressed_prompt = row[1]
    print(f"\nSample {i+1}:")
    print("Calling chatgpt-5-nano with original prompt...")
    original_output = call_chatgpt(original_prompt)
    time.sleep(1)  # Avoid rate limits
    print("Original Output:")
    print(original_output)
    print("\nCalling chatgpt-5-nano with compressed prompt...")
    compressed_output = call_chatgpt(compressed_prompt)
    print("Compressed Output:")
    print(compressed_output)
    # Calculate ROUGE scores between outputs
    scores = scorer.score(original_output, compressed_output)
    rouge_1 = round(scores['rouge1'].fmeasure, 4)
    rouge_2 = round(scores['rouge2'].fmeasure, 4)
    rouge_l = round(scores['rougeL'].fmeasure, 4)
    results.append({
        "original_prompt": original_prompt,
        "compressed_prompt": compressed_prompt,
        "original_output": original_output,
        "compressed_output": compressed_output,
        "rouge_1": rouge_1,
        "rouge_2": rouge_2,
        "rouge_l": rouge_l
    })
    print(f"\nROUGE-1: {rouge_1}, ROUGE-2: {rouge_2}, ROUGE-L: {rouge_l}\n{'='*60}")


Sample 1:
Calling chatgpt-5-nano with original prompt...
Original Output:
Great question. In short: keeping customers happy helps the business (and you) long-term.

Why it matters
- Revenue and growth: happy customers are more likely to buy again, increasing customer lifetime value.
- Lower costs: retention is cheaper than acquisition; happy customers also spread positive word-of-mouth.
- Reputation and trust: good experiences build trust, making new customers more likely to choose you.
- Feedback loop: satisfied customers provide useful feedback that helps you improve your product or service.
- Competitive edge: reliable, pleasant experiences differentiate you from competitors.

How to keep customers happy (practical)
- Listen and respond quickly: understand needs, acknowledge issues, and provide timely help.
- Set clear expectations: be honest about what you can deliver and by when.
- Deliver quality consistently: avoid promises you can’t keep; aim for reliability.
- Personalize and

KeyboardInterrupt: 