Q5: Evaluation Metrics from a Multi-Class Confusion Matrix
The system classified 90 animals into Cat, Dog, or Rabbit. The results are shown below:
System \ Gold	Cat	Dog	Rabbit
Cat	5	10	5
Dog	15	20	10
Rabbit	0	15	10
3.	Programming Implementation
Write Python code that:
1.	Accepts the confusion matrix above as input.
2.	Computes per-class precision and recall.
3.	Computes macro-averaged and micro-averaged precision and recall.
4.	Prints all results clearly.


In [12]:
import numpy as np
# Confusion matrix (rows = predicted, cols = gold)
#            Gold:   Cat  Dog  Rabbit
cm = np.array([[ 5,  10,   5],   # Pred Cat
               [15,  20,  10],   # Pred Dog
               [ 0,  15,  10]])  # Pred Rabbit
labels = ["Cat", "Dog", "Rabbit"]
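# True Positives, False Positives, False Negatives (one value per class)
tp = np.diag(cm)            # diagonal: predicted class == gold class
fp = cm.sum(axis=1) - tp    # row total (all predictions of the class) minus correct ones
fn = cm.sum(axis=0) - tp    # column total (all gold items of the class) minus correct ones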

precision = tp / (tp + fp)
recall    = tp / (tp + fn)

# Macro-average
macro_precision = precision.mean()
macro_recall    = recall.mean()

# Micro-average
TP_total = tp.sum()
FP_total = fp.sum()
FN_total = fn.sum()
micro_precision = TP_total / (TP_total + FP_total)
micro_recall    = TP_total / (TP_total + FN_total)

# print
print("Per-class metrics:")
for i, lab in enumerate(labels):
    print(f"  {lab:7s}  TP={tp[i]:2d}  FP={fp[i]:2d}  FN={fn[i]:2d}  "
          f"Precision={precision[i]:.4f}  Recall={recall[i]:.4f}")

print("\nMacro-averaged:")
print(f"  Precision={macro_precision:.4f}  Recall={macro_recall:.4f}")

print("\nMicro-averaged:")
print(f"  Precision={micro_precision:.4f}  Recall={micro_recall:.4f}")

# overall accuracy
accuracy = TP_total / cm.sum()
print(f"\nAccuracy={accuracy:.4f}")


Per-class metrics:
  Cat      TP= 5  FP=15  FN=15  Precision=0.2500  Recall=0.2500
  Dog      TP=20  FP=25  FN=25  Precision=0.4444  Recall=0.4444
  Rabbit   TP=10  FP=15  FN=15  Precision=0.4000  Recall=0.4000

Macro-averaged:
  Precision=0.3648  Recall=0.3648

Micro-averaged:
  Precision=0.3889  Recall=0.3889

Accuracy=0.3889
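
Micro-averaged precision, micro-averaged recall, and accuracy all come out to 0.3889 above. That is expected when every item has exactly one predicted label and one gold label: each off-diagonal cell counts once as a false positive (for its row) and once as a false negative (for its column), so the pooled FP and FN totals are equal. A minimal sketch that checks this on the same matrix, written in plain Python so it runs without NumPy (the helper name `micro_scores` is hypothetical and not part of the solution above):

```python
def micro_scores(cm):
    """Return (micro_precision, micro_recall, accuracy) for a square confusion matrix.

    Assumes rows = predicted class, columns = gold class, one label per item.
    """
    n = len(cm)
    tp_total = sum(cm[i][i] for i in range(n))
    # Row total minus diagonal = false positives for that class.
    fp_total = sum(sum(cm[i]) - cm[i][i] for i in range(n))
    # Column total minus diagonal = false negatives for that class.
    fn_total = sum(sum(cm[r][i] for r in range(n)) - cm[i][i] for i in range(n))
    total = sum(sum(row) for row in cm)

    micro_p = tp_total / (tp_total + fp_total)
    micro_r = tp_total / (tp_total + fn_total)
    accuracy = tp_total / total
    return micro_p, micro_r, accuracy


cm = [[5, 10, 5],     # Pred Cat
      [15, 20, 10],   # Pred Dog
      [0, 15, 10]]    # Pred Rabbit

micro_p, micro_r, accuracy = micro_scores(cm)
print(f"{micro_p:.4f} {micro_r:.4f} {accuracy:.4f}")  # 0.3889 0.3889 0.3889
```

The macro averages do not share this property in general, because they weight each class equally regardless of how many items it contains.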


Q8. Programming: Bigram Language Model Implementation (based on “Activity: I love NLP corpus” slide)
Tasks:
Write a Python program to:
1.	Read the training corpus:
o	<s> I love NLP </s>
o	<s> I love deep learning </s>
o	<s> deep learning is fun </s>
2.	Compute unigram and bigram counts.
3.	Estimate bigram probabilities using MLE.
4.	Implement a function that calculates the probability of any given sentence.
5.	Test your function on both sentences:
o	<s> I love NLP </s>
o	<s> I love deep learning </s>
6.	Print which sentence the model prefers and why.


In [8]:
from collections import Counter, defaultdict
from math import prod
corpus = [
    ["<s>", "I", "love", "NLP", "</s>"],
    ["<s>", "I", "love", "deep", "learning", "</s>"],
    ["<s>", "deep", "learning", "is", "fun", "</s>"],
]

def get_unigram_bigram_counts(corpus_tokens):
    unigrams = Counter()
    bigrams = Counter()
    for sent in corpus_tokens:
        unigrams.update(sent)
        for w1, w2 in zip(sent[:-1], sent[1:]):
            bigrams[(w1, w2)] += 1
    return unigrams, bigrams

def bigram_mle_prob(w2, w1, bigrams, unigrams):
    c12 = bigrams.get((w1, w2), 0)
    c1 = unigrams.get(w1, 0)
    return (c12 / c1) if c1 > 0 else 0.0

def sentence_prob(sent_tokens, bigrams, unigrams):
    probs = [bigram_mle_prob(w2, w1, bigrams, unigrams) for w1, w2 in zip(sent_tokens[:-1], sent_tokens[1:])]
    return prod(probs), list(zip(sent_tokens[:-1], sent_tokens[1:], probs))

def build_table(bigrams, unigrams):
    table = defaultdict(list)
    for (w1, w2), c in sorted(bigrams.items()):
        p = bigram_mle_prob(w2, w1, bigrams, unigrams)
        table[w1].append((w2, c, p))
    return table

def main():
    unigrams, bigrams = get_unigram_bigram_counts(corpus)
    table = build_table(bigrams, unigrams)
    def bigram_line(w1, entries):
        return f"{w1:>8} -> " + ", ".join([f"{w2} (count={c}, P={p:.3f})" for w2, c, p in entries])

    print("=== Unigram counts ===")
    for w, c in sorted(unigrams.items()):
        print(f"{w:>8}: {c}")
    print("\n=== Bigram counts & MLE probs ===")
    for w1 in sorted(table.keys()):
        print(bigram_line(w1, table[w1]))

    s1 = ["<s>", "I", "love", "NLP", "</s>"]
    s2 = ["<s>", "I", "love", "deep", "learning", "</s>"]
    label_map = {"S1: <s> I love NLP </s>": s1, "S2: <s> I love deep learning </s>": s2}

    results = {}
    for label, sent in label_map.items():
        p, steps = sentence_prob(sent, bigrams, unigrams)
        results[label] = {"prob": p, "steps": steps}

    for label, info in results.items():
        step_str = " × ".join([f"P({w2}|{w1})={p:.3f}" for w1, w2, p in info["steps"]])
        print(f"\n--- {label} ---")
        print("Step probs:", step_str)
        print(f"Total sentence probability = {info['prob']:.6f}")

    best_label = max(results.items(), key=lambda kv: kv[1]["prob"])[0]
    print(f"\nPreferred = {best_label}")
    if best_label.startswith("S1"):
        print("Why: After 'love', the transition to 'NLP' vs 'deep' differs. "
              "The product of bigram probabilities along S1 is larger.")
    else:
        print("Why: After 'love', the transition to 'deep' (and onward) yields a higher product.")

if __name__ == "__main__":
    main()



=== Unigram counts ===
    </s>: 3
     <s>: 3
       I: 2
     NLP: 1
    deep: 2
     fun: 1
      is: 1
learning: 2
    love: 2

=== Bigram counts & MLE probs ===
     <s> -> I (count=2, P=0.667), deep (count=1, P=0.333)
       I -> love (count=2, P=1.000)
     NLP -> </s> (count=1, P=1.000)
    deep -> learning (count=2, P=1.000)
     fun -> </s> (count=1, P=1.000)
      is -> fun (count=1, P=1.000)
learning -> </s> (count=1, P=0.500), is (count=1, P=0.500)
    love -> NLP (count=1, P=0.500), deep (count=1, P=0.500)

--- S1: <s> I love NLP </s> ---
Step probs: P(I|<s>)=0.667 × P(love|I)=1.000 × P(NLP|love)=0.500 × P(</s>|NLP)=1.000
Total sentence probability = 0.333333

--- S2: <s> I love deep learning </s> ---
Step probs: P(I|<s>)=0.667 × P(love|I)=1.000 × P(deep|love)=0.500 × P(learning|deep)=1.000 × P(</s>|learning)=0.500
Total sentence probability = 0.166667

Preferred = S1: <s> I love NLP </s>
Why: After 'love', the transition to 'NLP' vs 'deep' differs. The product of bigram probabilities along S1 is larger.

Which sentence wins & why
S1 is preferred. Both sentences share the same early transitions (<s>→I, I→love) and diverge after “love”, where each pays a factor of 0.5 (love→NLP or love→deep). S1 then ends with NLP→</s> at probability 1.0, while S2 needs two more transitions: deep→learning (1.0) and learning→</s> (0.5). That extra 0.5 factor makes S2’s total probability (1/6 ≈ 0.167) half of S1’s (1/3 ≈ 0.333).
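
The factor-of-two gap can be confirmed with exact arithmetic. The sketch below hard-codes the bigram probabilities from the printed table as `fractions.Fraction` values and multiplies them along each sentence; the names `P` and `exact_prob` are illustrative and not part of the program above.

```python
from fractions import Fraction
from math import prod

# MLE bigram probabilities copied from the printed table, keyed as (w1, w2).
P = {
    ("<s>", "I"): Fraction(2, 3),
    ("I", "love"): Fraction(1, 1),
    ("love", "NLP"): Fraction(1, 2),
    ("NLP", "</s>"): Fraction(1, 1),
    ("love", "deep"): Fraction(1, 2),
    ("deep", "learning"): Fraction(1, 1),
    ("learning", "</s>"): Fraction(1, 2),
}


def exact_prob(tokens):
    """Multiply bigram probabilities along the sentence; unseen bigrams count as 0."""
    return prod(P.get((w1, w2), Fraction(0)) for w1, w2 in zip(tokens, tokens[1:]))


p1 = exact_prob(["<s>", "I", "love", "NLP", "</s>"])
p2 = exact_prob(["<s>", "I", "love", "deep", "learning", "</s>"])
print(p1, p2, p1 / p2)  # 1/3 1/6 2
```

Using fractions avoids rounding, so the ratio of exactly 2 is visible directly instead of being inferred from 0.333333 and 0.166667.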
(This exercise fits the “Naive Bayes ↔ language modeling” section of the slides: probabilities are estimated from counts via MLE and multiplied along the path.)
