<a href="https://colab.research.google.com/github/Kishara0/Spell_checker/blob/main/Spell_Grammer_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
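import os

import google.generativeai as genai
import pandas as pd

# Configure Google Gemini. The key is read from an environment variable
# (GOOGLE_API_KEY is an assumed name) rather than hard-coded in the notebook.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")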


In [None]:
# Load spell checker dataset
spell_checker_path = "/content/data-spell-checker.xlsx"
spell_checker_data = pd.read_excel(spell_checker_path)

# Convert dataset to a dictionary for quick lookup (word -> label)
spell_checker_dict = dict(zip(spell_checker_data["word"], spell_checker_data["label"]))


In [None]:
def sinhala_spell_checker(word):
    """
    Check if a word exists in the spell checker dictionary.
    """
    # True if the word is in the dictionary (correct), False otherwise (incorrect)
    return word in spell_checker_dict
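
The lookup above is an exact match on whitespace-split tokens, so a sentence-final token such as "කියවමු." keeps its period and can never match a dictionary entry. A minimal sketch of normalizing tokens before the lookup follows. The names `toy_dict`, `normalize_token`, and `find_unknown_words` are hypothetical and not part of the notebook, and the toy dictionary only stands in for `spell_checker_dict`.

```python
import string

# Hypothetical stand-in for spell_checker_dict; the real one is loaded from the dataset.
toy_dict = {"ඔහු": 1, "පොතක්": 1}

# ASCII punctuation plus the Sinhala kunddaliya (U+0DF4); extend as needed.
PUNCTUATION = string.punctuation + "\u0df4"


def normalize_token(token):
    """Strip leading and trailing punctuation from a whitespace-split token."""
    return token.strip(PUNCTUATION)


def find_unknown_words(sentence, dictionary):
    """Return the normalized tokens that are missing from the dictionary."""
    tokens = (normalize_token(t) for t in sentence.split())
    # Drop tokens that were pure punctuation and are now empty.
    return [t for t in tokens if t and t not in dictionary]


unknown = find_unknown_words("ඔහු පොතක් කියවමු.", toy_dict)
print(unknown)
```

With this kind of normalization, only the word itself is compared against the dictionary, so the period no longer causes a false "incorrect" result on its own.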


In [None]:
def sinhala_grammar_checker(sentence):
    """
    Check the grammar of a given Sinhala sentence using predefined rules.
    """
    # Split the sentence into words (assuming spaces separate words)
    words = sentence.split()

    # Example: Check for Subject-Verb Agreement (Rule 1)
    if len(words) > 1:
        # Strip sentence-final punctuation so that "කියවයි." still matches the "යි" ending
        subject, verb = words[0], words[-1].rstrip(".!?")

        # Add specific subject-verb agreement rules here
        if subject in ["ඔහු", "ඈ"] and not verb.endswith("යි"):
            return "Error: Subject-Verb Agreement issue detected."

    # Add checks for Word Forms, Tense, Gender, etc., based on rules
    # Example: Tense (Rule 4)
    if "ගියේය" in sentence and "යයි" in sentence:
        return "Error: Tense mismatch detected."

    # If no issues are found
    return "Grammar is correct."
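
As more rules are added (word forms, gender, and so on), the chain of `if` statements can become hard to maintain. One possible arrangement, sketched here as an assumption rather than the notebook's own design, keeps each rule as a small predicate in a table. The names `RULES`, `subject_verb_mismatch`, `mixed_tense`, and `check_rules` are hypothetical; the two rules mirror the examples above.

```python
# Subjects that, in the example rule above, require a verb ending in "යි".
THIRD_PERSON_SINGULAR = {"ඔහු", "ඈ"}


def subject_verb_mismatch(words):
    """Return True when the first word is a listed subject and the last word lacks the "යි" ending."""
    if len(words) < 2:
        return False
    subject, verb = words[0], words[-1].rstrip(".!?")
    return subject in THIRD_PERSON_SINGULAR and not verb.endswith("යි")


def mixed_tense(words):
    """Return True when the past and non-past markers from the example rule both occur."""
    sentence = " ".join(words)
    return "ගියේය" in sentence and "යයි" in sentence


# Each entry is (rule name, predicate); a predicate returns True on a violation.
RULES = [
    ("Subject-Verb Agreement", subject_verb_mismatch),
    ("Tense Consistency", mixed_tense),
]


def check_rules(sentence):
    """Return the names of all violated rules, in table order."""
    words = sentence.split()
    return [name for name, violated in RULES if violated(words)]


print(check_rules("ඔහු පොතක් කියවමු."))
print(check_rules("ඔහු පොතක් කියවයි."))
```

Unlike the early-return version, this sketch reports every violated rule, and adding a rule means appending one entry to `RULES`.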


In [None]:
def generate_suggestions(sentence):
    """
    Generate suggestions for a Sinhala sentence using Google Gemini.
    """
    response = model.generate_content(f"Check this Sinhala sentence: {sentence} and suggest corrections.")
    return response.text


In [None]:
def sinhala_checker_pipeline(sentence):
    """
    End-to-end pipeline for Sinhala spell and grammar checking.
    """
    # Check spelling for each word
    words = sentence.split()
    incorrect_words = [word for word in words if not sinhala_spell_checker(word)]

    # Perform grammar checking
    grammar_feedback = sinhala_grammar_checker(sentence)

    # Use Google Gemini for advanced suggestions
    gemini_suggestions = generate_suggestions(sentence)

    # Compile results
    results = {
        "incorrect_words": incorrect_words,
        "grammar_feedback": grammar_feedback,
        "gemini_suggestions": gemini_suggestions
    }
    return results


In [None]:
test_sentence = "ඔහු පොතක් කියවමු."
results = sinhala_checker_pipeline(test_sentence)

print("Incorrect Words:", results["incorrect_words"])
print("Grammar Feedback:", results["grammar_feedback"])
print("Gemini Suggestions:", results["gemini_suggestions"])


Incorrect Words: ['පොතක්', 'කියවමු.']
Grammar Feedback: Error: Subject-Verb Agreement issue detected.
Gemini Suggestions: The Sinhala sentence "ඔහු පොතක් කියවමු" is grammatically incorrect.  The issue is with the verb.

* **ඔහු (ohu):** He
* **පොතක් (potak):** a book
* **කියවමු (kiyavamu):** let's read (inclusive, including the speaker)

The sentence structure implies that the speaker and *ohu* (he) should *both* read a book together.  This is unusual phrasing.  To correct it, you need to change the verb to reflect the singular action of "he" reading.

Here are a few correct options, depending on the intended meaning:

* **ඔහු පොතක් කියවයි (ohu potak kiyawai):** He reads a book.  (Simple present tense)
* **ඔහු පොතක් කියවනවා (ohu potak kiyawanaa):** He is reading a book. (Present continuous tense)
* **ඔහු පොතක් කියෙව්වා (ohu potak kiyawwa):** He read a book. (Simple past tense)


The best correction depends on the context.  Choose the option that best fits the intended meaning of your s

In [None]:
def sinhala_spell_checker(word):
    """
    Check if a word exists in the spell checker dictionary.

    Returns a dict with the word and its status, replacing the boolean version above.
    """
    status = "correct" if word in spell_checker_dict else "incorrect"
    return {"word": word, "status": status}

def sinhala_grammar_checker(sentence):
    """
    Check the grammar of a given Sinhala sentence using predefined rules.
    """
    # Split the sentence into words
    words = sentence.split()

    # Output grammar feedback in structured form
    grammar_issues = []

    # Example Rule 1: Subject-Verb Agreement
    if len(words) > 1:
        # Strip sentence-final punctuation so that "කියවයි." still matches the "යි" ending
        subject, verb = words[0], words[-1].rstrip(".!?")
        if subject in ["ඔහු", "ඈ"] and not verb.endswith("යි"):
            grammar_issues.append({"rule": "Subject-Verb Agreement", "issue": "Subject and verb do not agree."})

    # Example Rule 4: Tense Consistency
    if "ගියේය" in sentence and "යයි" in sentence:
        grammar_issues.append({"rule": "Tense Consistency", "issue": "Past and non-past tenses are mixed."})

    return grammar_issues if grammar_issues else [{"status": "Grammar is correct."}]


In [None]:
def format_for_gemini(sentence):
    """
    Format spell and grammar check results for Google Gemini input.
    """
    # Spell check each word in the sentence
    words = sentence.split()
    spell_check_results = [sinhala_spell_checker(word) for word in words]

    # Grammar check the sentence
    grammar_check_results = sinhala_grammar_checker(sentence)

    # Structure the input for Gemini
    formatted_input = {
        "sentence": sentence,
        "spell_check_results": spell_check_results,
        "grammar_check_results": grammar_check_results,
        "rules": [
            "Subject-Verb Agreement: Subject and verb must agree in number, gender, and person.",
            "Word Forms: Singular and plural words influence verb conjugation.",
            "Person Variations: First, second, and third person dictate sentence structure.",
            "Tense: Sentences should not mix past and non-past tenses.",
            "Gender Variations: Verb forms vary for masculine, feminine, and neutral genders.",
            "Case Roles: Focus determines sentence emphasis (active/passive)."
        ]
    }
    return formatted_input
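
When a dictionary like this is interpolated into a prompt, Python renders it with its own `repr` syntax (single quotes, `True`/`None`). If a language-neutral form is preferred, one option is to serialize it as JSON. The sketch below uses a hypothetical `payload` shaped like the return value of `format_for_gemini`; it is an illustration, not the notebook's code.

```python
import json

# Hypothetical payload shaped like the dictionary returned by format_for_gemini.
payload = {
    "sentence": "ඔහු පොතක් කියවමු.",
    "spell_check_results": [{"word": "ඔහු", "status": "correct"}],
    "grammar_check_results": [
        {"rule": "Subject-Verb Agreement", "issue": "Subject and verb do not agree."}
    ],
}

# ensure_ascii=False keeps the Sinhala text readable instead of \uXXXX escapes;
# indent=2 makes the structure easy to inspect in notebook output.
payload_json = json.dumps(payload, ensure_ascii=False, indent=2)
print(payload_json)
```

The JSON string round-trips through `json.loads`, so the same text can be logged, stored, or embedded in a prompt.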


In [None]:
def query_gemini(formatted_input):
    """
    Send structured input to Google Gemini and get feedback or corrections.
    """
    # Convert input to a structured natural language format
    prompt = f"""
    Analyze the following Sinhala sentence based on spell and grammar checks:

    Sentence: {formatted_input['sentence']}

    Spell Check Results:
    {formatted_input['spell_check_results']}

    Grammar Check Results:
    {formatted_input['grammar_check_results']}

    Rules for Analysis:
    {formatted_input['rules']}

    Provide corrections or suggestions for improvement.
    """
    response = model.generate_content(prompt)
    return response.text


In [None]:
def sinhala_rag_pipeline(sentence):
    """
    Full pipeline for Sinhala spell and grammar checking with RAG (Gemini).
    """
    # Format the input
    formatted_input = format_for_gemini(sentence)

    # Query Google Gemini
    gemini_output = query_gemini(formatted_input)

    return gemini_output
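
The pipeline above passes the checker results and a fixed rule list to Gemini; it does not retrieve anything from the dataset at query time. If a retrieval step were wanted, one simple possibility is to look up dictionary words that are close to each flagged word and include them in the prompt as candidates. The sketch below uses `difflib.get_close_matches` from the standard library. The `vocabulary` list and `retrieve_candidates` are hypothetical; in the notebook the vocabulary would presumably be the keys of `spell_checker_dict`. This is an assumption about how retrieval could be added, not the author's method.

```python
import difflib

# Hypothetical vocabulary; in the notebook this would be the keys of spell_checker_dict.
vocabulary = ["කියවයි", "කියවමි", "පොතක්", "ඔහු"]


def retrieve_candidates(word, vocab, n=3, cutoff=0.6):
    """Return up to n vocabulary entries whose similarity to word is at least cutoff."""
    return difflib.get_close_matches(word, vocab, n=n, cutoff=cutoff)


candidates = retrieve_candidates("කියවමු", vocabulary)
print(candidates)
```

The similarity here is computed over Unicode code points, which is a rough measure for Sinhala because vowel signs are separate code points; a production system would likely need a script-aware distance.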


In [None]:
test_sentence = "ඔහු පොතක් කියවමු."
gemini_response = sinhala_rag_pipeline(test_sentence)

print("Gemini Output:")
print(gemini_response)


Gemini Output:
The Sinhala sentence "ඔහු පොතක් කියවමු" has several issues. Let's analyze them based on the provided feedback:


**Spell Check Analysis:**

* **පොතක් (potak):**  This is spelled correctly. The spell check result is incorrect.  It means "a book".

* **කියවමු (kiyawamu):** This is where the problem lies.  කියවමු is the first-person plural form of the verb "to read" (කියවන්න - kiyawanna). It means "let's read".  The spell check is incorrectly flagging this as incorrect.

**Grammar Check Analysis:**

The core issue highlighted by the grammar checker, "Subject-Verb Agreement," is accurate, even if the spell check is partially wrong.

* **Subject-Verb Disagreement:** The subject "ඔහු (ohu)" is third-person singular (he). The verb "කියවමු (kiyawamu)" is first-person plural (we).  This is a significant grammatical error.  You can't use a plural verb with a singular subject.

**Corrections and Suggestions:**

To correct the sentence, we need to change the verb to agree with the s