## Assignment 1
Use an LLM to generate the top-n sentences and rank them by grammar.

Language: English or Nepali


In [22]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np


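`textattack/roberta-base-CoLA` is a RoBERTa-base classifier fine-tuned on CoLA (the Corpus of Linguistic Acceptability, an English dataset). Here it serves as a grammatical-acceptability scorer.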

In [23]:
model_name = "textattack/roberta-base-CoLA"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()


Some weights of the model checkpoint at textattack/roberta-base-CoLA were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
         

**Grammar Scoring Function (LLM Evaluation)**

In [24]:
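def grammar_score(sentence):
    """Return the model's acceptability probability for `sentence`, scaled to 0–10."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

    # Inference only, so gradient tracking is disabled
    with torch.no_grad():
        outputs = model(**inputs)

    # Softmax over the two class logits; index 1 is read as "acceptable"
    probs = torch.softmax(outputs.logits, dim=1)
    score = probs[0][1].item()   # grammatical acceptability
    return round(score * 10, 2)  # scale to 0–10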
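
The step from logits to a 0–10 score can be illustrated without loading the model. The sketch below mirrors the softmax-and-scale logic of `grammar_score` in plain Python. The helper name `acceptability_score` and the logit values are hypothetical and chosen only for illustration; they are not outputs of the classifier.

```python
import math

def acceptability_score(logits):
    """Map two class logits to a 0-10 score.

    Assumes index 1 is the "acceptable" class, as in grammar_score.
    """
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    p_acceptable = exps[1] / sum(exps)
    return round(p_acceptable * 10, 2)

# Hypothetical logits, not taken from the model
print(acceptability_score([0.0, 0.0]))   # equal logits -> 5.0
print(acceptability_score([-2.0, 2.0]))  # strongly "acceptable" -> 9.82
```

Because the score is a scaled probability, a value near 5 means the classifier is undecided rather than that the sentence is "half correct".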


**Top-N Sentence Generation**

In [25]:
def generate_sentences_english():
    """Return a fixed list of candidate English sentences to rank."""
    return [
        "I will go to the market tomorrow.",
        "Tomorrow, I am going to the market.",
        "I am going market tomorrow.",
        "I will going to market tomorrow.",
        "Tomorrow I go to the market."
    ]


In [26]:
def generate_sentences_nepali():
    """Return a fixed list of candidate Nepali sentences to rank."""
    return [
        "म भोलि बजार जान्छु।",
        "भोलि म बजार जान्छु।",
        "म भोलि बजारमा जानेछु।",
        "म बजार भोलि जान्छु।",
        "म भोलि बजार जान।"
    ]


**Rank Sentences by Grammar**

In [27]:
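def rank_sentences(sentences):
    """Score every sentence and return (sentence, score) pairs, best first."""
    results = []

    for s in sentences:
        score = grammar_score(s)
        results.append((s, score))

    # Highest grammar score first
    results.sort(key=lambda x: x[1], reverse=True)
    return results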
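
`rank_sentences` returns every candidate, so the "top-n" cut still has to be applied by the caller. A minimal sketch of one way to do that is shown below. The function `rank_top_n`, its `scorer` parameter, and `toy_scorer` are hypothetical names introduced here; the toy scorer stands in for `grammar_score` so the example runs without the model.

```python
def rank_top_n(sentences, scorer, n=None):
    """Score each sentence with `scorer` and return the n best (sentence, score) pairs."""
    scored = [(s, scorer(s)) for s in sentences]
    # sorted() is stable, so tied scores keep their input order
    scored = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return scored if n is None else scored[:n]

# Toy scorer for illustration only: more words means a higher score
def toy_scorer(sentence):
    return len(sentence.split())

candidates = ["a b c", "a", "a b"]
print(rank_top_n(candidates, toy_scorer, n=2))  # [('a b c', 3), ('a b', 2)]
```

In the notebook this would be called as `rank_top_n(generate_sentences_english(), grammar_score, n=3)`.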


In [28]:
english_sentences = generate_sentences_english()
ranked_english = rank_sentences(english_sentences)

print("Top-N English Sentences Ranked by Grammar:\n")
for i, (sent, score) in enumerate(ranked_english, 1):
    print(f"{i}. {sent}  → Grammar Score: {score}")


Top-N English Sentences Ranked by Grammar:

1. I will go to the market tomorrow.  → Grammar Score: 9.79
2. Tomorrow I go to the market.  → Grammar Score: 9.75
3. Tomorrow, I am going to the market.  → Grammar Score: 9.74
4. I am going market tomorrow.  → Grammar Score: 7.1
5. I will going to market tomorrow.  → Grammar Score: 1.25


In [29]:
nepali_sentences = generate_sentences_nepali()
ranked_nepali = rank_sentences(nepali_sentences)

print("Top-N Nepali Sentences Ranked by Grammar:\n")
for i, (sent, score) in enumerate(ranked_nepali, 1):
    print(f"{i}. {sent}  → Grammar Score: {score}")


Top-N Nepali Sentences Ranked by Grammar:

1. म बजार भोलि जान्छु।  → Grammar Score: 8.55
2. म भोलि बजार जान।  → Grammar Score: 8.5
3. म भोलि बजार जान्छु।  → Grammar Score: 8.31
4. भोलि म बजार जान्छु।  → Grammar Score: 8.23
5. म भोलि बजारमा जानेछु।  → Grammar Score: 7.75


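For English, the model gives its highest scores (9.74 to 9.79) to the three acceptable sentences and its lowest (1.25) to "I will going to market tomorrow." The ungrammatical "I am going market tomorrow." still scores 7.1, so the score separates blatant errors more clearly than subtle ones.
The Nepali results are much weaker: all five scores fall between 7.75 and 8.55, the incomplete sentence "म भोलि बजार जान।" ranks second, and the well-formed "म भोलि बजारमा जानेछु।" ranks last.
This is consistent with the classifier having been fine-tuned on CoLA, an English corpus, so its Nepali scores should not be read as grammaticality judgments.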
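
One simple check on how well the scores separate the candidates is the spread between the best and worst score in each language. The sketch below uses the values copied from the printed output above. The helper `score_spread` is a hypothetical name, and a wide spread is only a rough indicator, not a validated metric.

```python
# Scores copied from the printed rankings above
english_scores = [9.79, 9.75, 9.74, 7.1, 1.25]
nepali_scores = [8.55, 8.5, 8.31, 8.23, 7.75]

def score_spread(scores):
    """Difference between the highest and lowest score, rounded to 2 places."""
    return round(max(scores) - min(scores), 2)

print("English spread:", score_spread(english_scores))  # 8.54
print("Nepali spread:", score_spread(nepali_scores))    # 0.8
```

The narrow Nepali spread suggests the classifier barely distinguishes between those five sentences.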


## Conclusion
This experiment shows that a classifier fine-tuned on CoLA can rank English sentences by grammatical acceptability without handcrafted grammar rules.
The same model does not rank the Nepali sentences reliably: the scores are tightly clustered and an incomplete sentence outranks well-formed ones.
Ranking Nepali sentences this way would require a model trained on Nepali acceptability data.
