# 🔍 Sustainable Prompt Optimization - NLP Scoring Module

This notebook reuses and adapts code from the **"NLP Pipeline + Probabilistic Language Models" lab** to evaluate the **linguistic fluency** of prompt alternatives. It implements:
- Tokenization, Normalization, Stopword Removal (Lab Part 1)
- Unigram and Bigram Probabilistic Language Models (Lab Part 2)

These models assign fluency scores to optimized prompts.


In [1]:
# 📦 Setup and Imports (from Lab Step 1)
import nltk
import re
import string
from collections import Counter, defaultdict
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

# Download needed NLTK components
nltk.download('punkt')
nltk.download('stopwords')


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\yogeshkumar\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\yogeshkumar\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

### 🧪 Tokenization & Normalization (Lab Step 3 & 4)

This step normalizes prompt text using:
- Lowercasing
- Regex-based tokenizer
- Stopword removal
- Porter stemming

⭕ **Lab Connection**: This pipeline was implemented as preprocessing in `Step 3–4` of the lab notebook.


In [2]:
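# Lab-style tokenizer and normalizer
# Build the stopword set and stemmer once instead of on every call.
STOP_WORDS = set(stopwords.words('english'))
STEMMER = PorterStemmer()

def simple_tokenizer(text):
    # Lowercase, then extract runs of word characters (letters, digits, underscore).
    return re.findall(r'\b\w+\b', text.lower())

def normalize(tokens):
    # Drop stopwords and stray punctuation tokens, then apply Porter stemming.
    return [STEMMER.stem(word) for word in tokens
            if word not in STOP_WORDS and word not in string.punctuation]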
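
To see how the regex tokenizer behaves at word boundaries, here is a minimal standalone sketch using only `re`. The input string is a made-up example, and stopword removal and stemming are left out so that it runs without any NLTK data.

```python
import re

def simple_tokenizer(text):
    # Same pattern as the cell above: lowercase, then grab runs of word characters.
    return re.findall(r'\b\w+\b', text.lower())

# Hypothetical input chosen to exercise apostrophes, hyphens, and underscores.
tokens = simple_tokenizer("Don't re-run setup_v2 on Server-01!")
print(tokens)
# ['don', 't', 're', 'run', 'setup_v2', 'on', 'server', '01']
```

Apostrophes and hyphens split a word into separate tokens, while digits and underscores stay attached because `\w` matches them.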


### 📊 Unigram Model (Lab Part 2 – Section: Unigram Model)

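Computes P(wᵢ) = count(wᵢ) / N for each word, where N is the number of tokens in the normalized corpus. Words that never occur in the corpus fall back to a fixed floor of 1e-7. Used here to score prompt fluency.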


In [3]:
# Example corpus to build unigram model - replace with domain corpus if available
sample_corpus = "How to configure server settings quickly and securely. Configure system with minimal steps for fast deployment."
tokens = normalize(simple_tokenizer(sample_corpus))
unigram_counts = Counter(tokens)
total_words = len(tokens)

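def unigram_prob(word):
    # MLE for seen words. Unseen words get a fixed floor of 1e-7 rather than true
    # smoothing, so the probabilities no longer sum exactly to 1.
    return unigram_counts[word] / total_words if word in unigram_counts else 1e-7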
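
As a worked example of the unigram estimate, the sketch below hard-codes the token list that the normalizer is assumed to produce for `sample_corpus`, so it runs without NLTK. The exact stems are an assumption; the counts for `configur`, `system`, and `quickli` are consistent with the scores printed later in this notebook.

```python
from collections import Counter

# Assumed output of normalize(simple_tokenizer(sample_corpus)); hard-coded on purpose.
tokens = ["configur", "server", "set", "quickli", "secur",
          "configur", "system", "minim", "step", "fast", "deploy"]

unigram_counts = Counter(tokens)
total_words = len(tokens)  # N = 11

def unigram_prob(word, floor=1e-7):
    # MLE for seen words; a fixed floor (not real smoothing) for unseen ones.
    count = unigram_counts.get(word, 0)
    return count / total_words if count else floor

print(unigram_prob("configur"))  # 2/11, the only word seen twice
print(unigram_prob("help"))      # unseen word, so the floor applies

# The seen words alone already carry all the probability mass.
seen_mass = sum(unigram_prob(w) for w in unigram_counts)
print(seen_mass)
```

Because the seen words already sum to 1, the floor adds extra mass on top, which is why it works as a ranking heuristic rather than as a proper probability distribution.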


### 🔗 Unigram Sentence Probability (Lab Part 2 – "Chain Rule with Unigrams")

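Treats words as independent and multiplies P(w₁) · P(w₂) · ... · P(wₙ) to give the fluency score of the full prompt. Every factor is below 1, so the score shrinks with each additional token.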


In [4]:
def sentence_prob_unigram(sentence):
    words = normalize(simple_tokenizer(sentence))
    prob = 1.0
    for word in words:
        prob *= unigram_prob(word)
    return prob
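
Products of many small probabilities can underflow to zero in floating point. A common workaround, sketched here as an option and not as part of the lab code, is to sum log-probabilities instead. The helper name `sentence_logprob` and the example probabilities are hypothetical.

```python
import math

def sentence_logprob(word_probs):
    # The sum of logs equals the log of the product, without the underflow.
    return sum(math.log(p) for p in word_probs)

# Hypothetical per-word probabilities for a three-token prompt.
probs = [2 / 11, 1 / 11, 1 / 11]
log_score = sentence_logprob(probs)
print(log_score, math.exp(log_score))

# Sixty unseen words at the 1e-7 floor: the plain product underflows to 0.0,
# while the log score stays finite and comparable.
many = [1e-7] * 60
print(math.prod(many), sentence_logprob(many))
```

Log scores preserve the ranking of the raw products, so they can replace them wherever only the order of candidates matters.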


### 🧩 Bigram Model (Lab Part 2 – Section: Bigram Model)

Captures adjacent-word dependencies using the MLE estimate P(wᵢ | wᵢ₋₁) = Count(wᵢ₋₁, wᵢ) / Count(wᵢ₋₁).

⭕ **Lab Connection**: Matching Section "Bigram Model with MLE"


In [5]:
# Build bigram counts on same corpus
bigram_counts = defaultdict(int)
for i in range(len(tokens) - 1):
    pair = (tokens[i], tokens[i + 1])
    bigram_counts[pair] += 1

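def bigram_prob(w1, w2):
    # MLE estimate with no smoothing: an unseen context or an unseen pair yields 0.
    # .get avoids inserting new keys into the defaultdict on lookup.
    if unigram_counts[w1] == 0:
        return 0.0
    return bigram_counts.get((w1, w2), 0) / unigram_counts[w1]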
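
Unsmoothed MLE gives probability 0 to any pair that is absent from the corpus. One standard alternative is add-one (Laplace) smoothing, sketched below. This swaps in a different estimator from the one used in this notebook; the function name `bigram_prob_laplace` is hypothetical and the token list is the assumed normalizer output for `sample_corpus`.

```python
from collections import Counter

# Assumed output of normalize(simple_tokenizer(sample_corpus)); hard-coded on purpose.
tokens = ["configur", "server", "set", "quickli", "secur",
          "configur", "system", "minim", "step", "fast", "deploy"]

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
vocab_size = len(unigram_counts)  # V = number of distinct tokens

def bigram_prob_laplace(w1, w2):
    # Add-one smoothing: (Count(w1, w2) + 1) / (Count(w1) + V)
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + vocab_size)

print(bigram_prob_laplace("configur", "system"))  # seen pair: (1 + 1) / (2 + 10)
print(bigram_prob_laplace("system", "quickli"))   # unseen pair: (0 + 1) / (1 + 10)
```

With a corpus this small, add-one smoothing moves a lot of mass to unseen pairs, so treat the values as illustrative only.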


### 🧪 Bigram Sentence Probability (Lab Part 2 – Sentence Probability Using Bigrams)

Uses the bigram chain rule:

P(w₁) · P(w₂|w₁) · P(w₃|w₂) · ...

The score is used to select fluent prompt variants. Without smoothing, a single unseen bigram makes the whole product zero.


In [6]:
def sentence_prob_bigram(sentence):
    words = normalize(simple_tokenizer(sentence))
    if not words:
        return 0.0
    prob = unigram_prob(words[0])  # start with P(w1)
    for i in range(len(words) - 1):
        prob *= bigram_prob(words[i], words[i + 1])
    return prob


### ✅ Example: Evaluate Prompt Alternatives

Select prompt candidates and compare their fluency scores using Unigram/Bigram models.

⭕ **Lab Connection**: This operationalizes the sentence scoring shown in `sentence_prob_unigram` and `sentence_prob_bigram` functions.


In [7]:
candidates = [
    "How to configure system quickly.",
    "Please provide configuration steps in a fast manner.",
    "Assist with server setup guidance to deploy with speed.",
    "Fast deployment via configuration help."
]

print("--- Fluency Scores (Unigram and Bigram) ---")
for prompt in candidates:
    u_score = sentence_prob_unigram(prompt)
    b_score = sentence_prob_bigram(prompt)
    print(f"Prompt: \"{prompt}\"\n ⤷ Unigram: {u_score:.2e}, Bigram: {b_score:.2e}\n")


--- Fluency Scores (Unigram and Bigram) ---
Prompt: "How to configure system quickly."
 ⤷ Unigram: 1.50e-03, Bigram: 0.00e+00

Prompt: "Please provide configuration steps in a fast manner."
 ⤷ Unigram: 1.50e-24, Bigram: 0.00e+00

Prompt: "Assist with server setup guidance to deploy with speed."
 ⤷ Unigram: 8.26e-31, Bigram: 0.00e+00

Prompt: "Fast deployment via configuration help."
 ⤷ Unigram: 1.50e-17, Bigram: 0.00e+00
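
The bigram column is zero for every candidate because each prompt contains at least one adjacent pair that never occurs in the tiny sample corpus. The trace below lists the individual factors for the first candidate. It hard-codes the token lists that the normalizer is assumed to produce, so it runs without NLTK.

```python
from collections import Counter

# Assumed output of normalize(simple_tokenizer(sample_corpus)); hard-coded on purpose.
tokens = ["configur", "server", "set", "quickli", "secur",
          "configur", "system", "minim", "step", "fast", "deploy"]

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def bigram_prob(w1, w2):
    # Unsmoothed MLE, as in the bigram cell.
    c1 = unigram_counts[w1]
    return bigram_counts[(w1, w2)] / c1 if c1 else 0.0

# Assumed normalized form of "How to configure system quickly."
words = ["configur", "system", "quickli"]

# P(w1), then P(w2 | w1), then P(w3 | w2)
factors = [unigram_counts[words[0]] / len(tokens)]
factors += [bigram_prob(a, b) for a, b in zip(words, words[1:])]
print(factors)
```

The pair (`configur`, `system`) occurs once after two occurrences of `configur`, giving 0.5, but (`system`, `quickli`) never occurs, so the last factor is 0 and the product collapses.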



### ✅ Implementing Combined Sentence Probability Function


In [8]:
def combined_sentence_prob(sentence, lambda_weight=0.7):
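    """
    Score a sentence by linear interpolation of bigram and unigram estimates:
    P(w_i | w_{i-1}) ~ lambda_weight * P_bigram + (1 - lambda_weight) * P_unigram.
    The first word is scored with its unigram probability alone.
    """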
    words = normalize(simple_tokenizer(sentence))
    if not words:
        return 0.0
    prob = unigram_prob(words[0])  # start with unigram prob for first word

    for i in range(1, len(words)):
        p_bigram = bigram_prob(words[i - 1], words[i])
        p_unigram = unigram_prob(words[i])
        interpolated = lambda_weight * p_bigram + (1.0 - lambda_weight) * p_unigram
        prob *= interpolated
    return prob
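
As a check on the interpolation arithmetic, the sketch below recomputes the combined score for "How to configure system quickly." from hard-coded counts. The token lists are the assumed normalizer output, and `interpolated_score` is a hypothetical standalone version of the function above that takes already-normalized words.

```python
from collections import Counter

# Assumed output of normalize(simple_tokenizer(sample_corpus)); hard-coded on purpose.
tokens = ["configur", "server", "set", "quickli", "secur",
          "configur", "system", "minim", "step", "fast", "deploy"]

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
total_words = len(tokens)

def unigram_prob(word, floor=1e-7):
    count = unigram_counts[word]
    return count / total_words if count else floor

def bigram_prob(w1, w2):
    c1 = unigram_counts[w1]
    return bigram_counts[(w1, w2)] / c1 if c1 else 0.0

def interpolated_score(words, lam=0.7):
    # First word: unigram only. Later words: lam * bigram + (1 - lam) * unigram.
    if not words:
        return 0.0
    prob = unigram_prob(words[0])
    for prev, cur in zip(words, words[1:]):
        prob *= lam * bigram_prob(prev, cur) + (1.0 - lam) * unigram_prob(cur)
    return prob

score = interpolated_score(["configur", "system", "quickli"])
print(f"{score:.2e}")  # 1.87e-03
```

The three factors are 2/11, then 0.7 · 0.5 + 0.3 · (1/11), then 0.7 · 0 + 0.3 · (1/11). The unigram term keeps the last factor above zero even though the bigram was never seen.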


### Ranking

In [9]:
def select_best_prompt_combined(prompts, lambda_weight=0.7):
    scores = []
    for prompt in prompts:
        score = combined_sentence_prob(prompt, lambda_weight)
        scores.append((prompt, score))
    best = max(scores, key=lambda x: x[1])
    return {
        'best_prompt': best[0],
        'scores': sorted(scores, key=lambda x: x[1], reverse=True)
    }


In [10]:
results = select_best_prompt_combined(candidates, lambda_weight=0.7)
print("✅ Best prompt:", results['best_prompt'])
for p, s in results['scores']:
    print(f'"{p}": combined score = {s:.2e}')


✅ Best prompt: How to configure system quickly.
"How to configure system quickly.": combined score = 1.87e-03
"Fast deployment via configuration help.": combined score = 3.25e-18
"Please provide configuration steps in a fast manner.": combined score = 9.74e-26
"Assist with server setup guidance to deploy with speed.": combined score = 2.01e-33
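
The raw scores shrink with every additional token, so the ranking partly reflects prompt length. One possible adjustment, sketched here as an option and not as part of the notebook, is a per-token geometric mean. The helper `per_token_score` is hypothetical, the scores are copied from the output above, and the token counts are assumed from the normalization step.

```python
import math

def per_token_score(prob, n_tokens):
    # Geometric mean of the per-token factors, computed via logs.
    if prob <= 0.0 or n_tokens <= 0:
        return 0.0
    return math.exp(math.log(prob) / n_tokens)

# (prompt, combined score as printed above, assumed token count after normalization)
reported = [
    ("How to configure system quickly.", 1.87e-03, 3),
    ("Fast deployment via configuration help.", 3.25e-18, 5),
    ("Please provide configuration steps in a fast manner.", 9.74e-26, 6),
    ("Assist with server setup guidance to deploy with speed.", 2.01e-33, 6),
]

normalized = sorted(
    ((prompt, per_token_score(score, n)) for prompt, score, n in reported),
    key=lambda item: item[1],
    reverse=True,
)
for prompt, score in normalized:
    print(f'"{prompt}": per-token score = {score:.2e}')
```

For these four candidates the ordering stays the same, but the gaps between scores become far smaller, which makes length effects easier to separate from wording effects.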


## 🚀 Next Steps

This probabilistic scoring component will be combined with:
- Cosine similarity from embedding models (to enforce semantic equivalence)
- Token length/FLOP analysis (to measure inference cost)
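
The semantic-equivalence check can be sketched with plain cosine similarity. The vectors below are hypothetical placeholders; in practice they would come from whichever embedding model is chosen, which this notebook does not specify.

```python
import math

def cosine_similarity(u, v):
    # dot(u, v) / (|u| * |v|); returns 0.0 if either vector has zero length.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional embeddings of an original prompt and a rewrite.
original = [0.2, 0.7, 0.1]
rewrite = [0.25, 0.65, 0.05]
print(round(cosine_similarity(original, rewrite), 3))
```

A rewrite would then be kept only if its similarity to the original exceeds a chosen threshold, with the fluency score used to rank the survivors.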

The n-gram language models supply the fluency signal: they help **optimize prompt clarity**, and because scoring needs only count lookups, they add little computational cost of their own.
