# üáÆüá≥ Marathi Dictionary RAG - Phase 1
## Embeddings and Vector Search

**What we'll do in this notebook:**
1. Load the MahaSBERT model (the "brain" that understands Marathi)
2. See how words become numbers (embeddings)
3. Load your dictionary data
4. Create embeddings for all entries
5. Store them in ChromaDB
6. Search and find words!

Let's go! üöÄ

---
## Step 1: Check Everything is Installed

Run this cell first. If you see errors, go back to the terminal and run:
```
pip install -r requirements.txt
```

In [28]:
# Let's check all our packages are installed
import sys
print(f"Python version: {sys.version}")

# These should all work without errors
import torch
print(f"‚úÖ PyTorch version: {torch.__version__}")

import sentence_transformers
print(f"‚úÖ Sentence Transformers version: {sentence_transformers.__version__}")

import chromadb
print(f"‚úÖ ChromaDB version: {chromadb.__version__}")

import json
print(f"‚úÖ JSON module ready")

from tqdm import tqdm
print(f"‚úÖ TQDM (progress bars) ready")

print("\nüéâ Everything is installed! Let's continue.")

Python version: 3.12.4 (main, Jun  6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]
‚úÖ PyTorch version: 2.9.1
‚úÖ Sentence Transformers version: 5.2.0
‚úÖ ChromaDB version: 1.4.0
‚úÖ JSON module ready
‚úÖ TQDM (progress bars) ready

üéâ Everything is installed! Let's continue.


---
## Step 2: Load MahaSBERT Model

### What's happening here?

MahaSBERT is like a translator that converts words into numbers. It was trained on millions of Marathi sentences, so it "understands" Marathi.

**First time running this?** It will download the model (~400MB). This only happens once - after that, it's saved on your computer.

‚òï This might take 1-2 minutes the first time.

In [29]:
from sentence_transformers import SentenceTransformer

# This is the magic line - loading the Marathi-understanding model
print("Loading MahaSBERT model... (this takes a minute the first time)")

model = SentenceTransformer('l3cube-pune/marathi-sentence-similarity-sbert')

print("‚úÖ Model loaded!")
print(f"   Model creates vectors with {model.get_sentence_embedding_dimension()} dimensions")

Loading MahaSBERT model... (this takes a minute the first time)
‚úÖ Model loaded!
   Model creates vectors with 768 dimensions


---
## Step 3: See How Embeddings Work

Let's turn some Marathi words into numbers and see what happens!

### The Big Idea:
- Similar words ‚Üí Similar numbers
- Different words ‚Üí Different numbers

In [30]:
# Let's embed a single word
word = "‡§™‡§æ‡§£‡•Ä"

# Turn it into numbers!
embedding = model.encode(word)

print(f"Word: {word}")
print(f"Embedding shape: {embedding.shape}")  # Should be (768,)
print(f"\nFirst 10 numbers: {embedding[:10]}")
print(f"\nThis word is now represented by {len(embedding)} numbers!")

Word: ‡§™‡§æ‡§£‡•Ä
Embedding shape: (768,)

First 10 numbers: [ 1.7880186e-02 -2.0560540e-02 -1.0075445e-03  1.7184960e-02
  1.2032884e-02  1.1198540e-02 -2.0179005e-02  7.5856689e-05
 -2.7883681e-02 -1.3770126e-02]

This word is now represented by 768 numbers!


In [31]:
# Now let's compare similar vs different words
# We'll use "cosine similarity" - a score from -1 to 1
# 1 = identical, 0 = unrelated, -1 = opposite

from sentence_transformers import util

# Water-related words (should be similar)
water_words = ["‡§™‡§æ‡§£‡•Ä", "‡§ú‡§≤", "‡§™‡§æ‡§ä‡§∏", "‡§®‡§¶‡•Ä"]

# Unrelated word
unrelated = "‡§Æ‡§æ‡§Ç‡§ú‡§∞"  # cat

# Get embeddings for all
water_embeddings = model.encode(water_words)
cat_embedding = model.encode(unrelated)

print("üåä Comparing water-related words to '‡§™‡§æ‡§£‡•Ä' (water):\n")

pani_embedding = water_embeddings[0]  # ‡§™‡§æ‡§£‡•Ä

for i, word in enumerate(water_words):
    similarity = util.cos_sim(pani_embedding, water_embeddings[i]).item()
    bar = "‚ñà" * int(similarity * 20)
    print(f"  ‡§™‡§æ‡§£‡•Ä ‚Üî {word:8} : {similarity:.3f} {bar}")

print("\nüê± Comparing to unrelated word '‡§Æ‡§æ‡§Ç‡§ú‡§∞' (cat):\n")
similarity = util.cos_sim(pani_embedding, cat_embedding).item()
bar = "‚ñà" * int(similarity * 20)
print(f"  ‡§™‡§æ‡§£‡•Ä ‚Üî ‡§Æ‡§æ‡§Ç‡§ú‡§∞   : {similarity:.3f} {bar}")

print("\nüëÜ See how water words have HIGH similarity (close to 1.0)?")
print("   But 'cat' has LOWER similarity? That's embeddings working!")

üåä Comparing water-related words to '‡§™‡§æ‡§£‡•Ä' (water):

  ‡§™‡§æ‡§£‡•Ä ‚Üî ‡§™‡§æ‡§£‡•Ä     : 1.000 ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
  ‡§™‡§æ‡§£‡•Ä ‚Üî ‡§ú‡§≤       : 0.850 ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
  ‡§™‡§æ‡§£‡•Ä ‚Üî ‡§™‡§æ‡§ä‡§∏     : 0.343 ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
  ‡§™‡§æ‡§£‡•Ä ‚Üî ‡§®‡§¶‡•Ä      : 0.521 ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà

üê± Comparing to unrelated word '‡§Æ‡§æ‡§Ç‡§ú‡§∞' (cat):

  ‡§™‡§æ‡§£‡•Ä ‚Üî ‡§Æ‡§æ‡§Ç‡§ú‡§∞   : 0.173 ‚ñà‚ñà‚ñà

üëÜ See how water words have HIGH similarity (close to 1.0)?
   But 'cat' has LOWER similarity? That's embeddings working!


---
## Step 4: Load Your Dictionary

Now let's load the Berntsen dictionary you processed.

**Make sure** you've copied `berntsen_dictionary_processed.json` to the `data/` folder!

In [32]:
import json
from pathlib import Path

# Load the dictionary
data_path = Path("../data/berntsen_dictionary_processed.json")

# Check if file exists
if not data_path.exists():
    print(f"‚ùå File not found at: {data_path.absolute()}")
    print("\nüìÅ Please copy your berntsen_dictionary_processed.json to the data/ folder")
else:
    with open(data_path, 'r', encoding='utf-8') as f:
        dictionary = json.load(f)
    
    print(f"‚úÖ Loaded dictionary with {len(dictionary):,} entries!")
    print(f"\nüìñ First entry looks like this:\n")
    print(json.dumps(dictionary[0], indent=2, ensure_ascii=False))

‚úÖ Loaded dictionary with 10,460 entries!

üìñ First entry looks like this:

{
  "entry_id": "berntsen_‡§Ö_1",
  "headword_devanagari": "‡§Ö",
  "headword_romanized": "a",
  "full_entry": "‡§Ö a pref. negative.",
  "source_page": 1,
  "entry_type": "headword",
  "base_word": null,
  "search_text": "‡§Ö a headword negative prefix",
  "definitions": [
    {
      "definition": "negative",
      "pos_display": "pref.",
      "number": null,
      "pos": "prefix",
      "gender": null,
      "declension_class": null,
      "referenced_entry": null
    }
  ]
}


In [33]:
# Let's see what kinds of entries we have
entry_types = {}
for entry in dictionary:
    t = entry.get('entry_type', 'unknown')
    entry_types[t] = entry_types.get(t, 0) + 1

print("üìä Entry types in your dictionary:\n")
for entry_type, count in entry_types.items():
    print(f"   {entry_type}: {count:,}")

üìä Entry types in your dictionary:

   headword: 9,836
   collocation: 624


---
## Step 5: Create Embeddings for ALL Entries

Now the real work! We'll:
1. Take each dictionary entry
2. Use the `search_text` field (which has Devanagari + romanized + definitions)
3. Turn it into an embedding

**This will take a few minutes** for 5,000 entries. You'll see a progress bar!

In [34]:
from tqdm import tqdm

# We'll embed the 'search_text' field - it contains the most useful info
# Let's first check a few examples

print("üìù Examples of 'search_text' we'll embed:\n")
for entry in dictionary[:3]:
    print(f"  ‚Ä¢ {entry['search_text'][:80]}...\n")

üìù Examples of 'search_text' we'll embed:

  ‚Ä¢ ‡§Ö a headword negative prefix...

  ‚Ä¢ ‡§Ö‡§Ç‡§ï a·πÖka headword number noun.masculine masculine issue (of a magazine, newspape...

  ‚Ä¢ ‡§Ö‡§Ç‡§ï‡§ó‡§£‡§ø‡§§ a·πÖkaga·πáita headword arithmetic noun.neuter neuter...



In [35]:
# Now let's create ALL embeddings
# We'll process in batches for efficiency

print("üîÑ Creating embeddings for all dictionary entries...")
print("   (This takes 2-5 minutes depending on your computer)\n")

# Extract all search texts
search_texts = [entry['search_text'] for entry in dictionary]

# Create embeddings in batches (faster than one at a time)
batch_size = 64  # Process 64 entries at a time

all_embeddings = []

for i in tqdm(range(0, len(search_texts), batch_size), desc="Embedding batches"):
    batch = search_texts[i:i + batch_size]
    batch_embeddings = model.encode(batch, show_progress_bar=False)
    all_embeddings.extend(batch_embeddings)

print(f"\n‚úÖ Created {len(all_embeddings):,} embeddings!")
print(f"   Each embedding has {len(all_embeddings[0])} dimensions")

üîÑ Creating embeddings for all dictionary entries...
   (This takes 2-5 minutes depending on your computer)



Embedding batches: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 164/164 [02:01<00:00,  1.35it/s]


‚úÖ Created 10,460 embeddings!
   Each embedding has 768 dimensions





---
## Step 6: Store in ChromaDB

Now we'll put everything in ChromaDB - our vector database.

Think of ChromaDB like a super-organized library where:
- Each book (dictionary entry) has a location based on its meaning
- We can instantly find books that are "nearby" (similar meaning)

In [36]:
import chromadb
from chromadb.config import Settings

# Create a ChromaDB client that saves to disk
# This means your database persists even after you close the notebook!

chroma_path = "../chroma_db"

client = chromadb.PersistentClient(path=chroma_path)

print(f"‚úÖ ChromaDB client created!")
print(f"   Data will be saved to: {chroma_path}")

‚úÖ ChromaDB client created!
   Data will be saved to: ../chroma_db


In [37]:
# Create (or get) a collection for our dictionary
# A "collection" is like a folder that holds related items

# Delete existing collection if it exists (so we can start fresh)
try:
    client.delete_collection(name="berntsen_dictionary")
    print("üóëÔ∏è  Deleted existing collection to start fresh")
except:
    pass

# Create new collection
collection = client.create_collection(
    name="berntsen_dictionary",
    metadata={"description": "Berntsen Marathi-English Dictionary"}
)

print(f"‚úÖ Created collection: 'berntsen_dictionary'")

üóëÔ∏è  Deleted existing collection to start fresh
‚úÖ Created collection: 'berntsen_dictionary'


In [38]:
# Now add all entries to the collection
# We'll include metadata so we can filter and display results nicely

print("üì• Adding entries to ChromaDB...\n")

# Prepare data for ChromaDB
ids = []
embeddings_list = []
documents = []
metadatas = []

for i, entry in enumerate(tqdm(dictionary, desc="Preparing entries")):
    ids.append(entry['entry_id'])
    embeddings_list.append(all_embeddings[i].tolist())  # Convert numpy to list
    documents.append(entry['search_text'])
    
    # Metadata - extra info we want to store and filter by
    # IMPORTANT: ChromaDB doesn't accept None values!
    metadata = {
        'headword': entry['headword_devanagari'],
        'entry_type': entry['entry_type'],
        'source': 'berntsen'  # Will be useful when we add more dictionaries!
    }
    
    # Add all optional fields, filtering out None values
    optional_fields = {
        'romanized': entry.get('headword_romanized'),
        'source_page': entry.get('source_page'),
        'full_entry': entry.get('full_entry'),
    }
    
    # Only add fields that have actual values (not None)
    for key, value in optional_fields.items():
        if value is not None:
            metadata[key] = value
    
    # Add part of speech if available
    if entry.get('definitions') and len(entry['definitions']) > 0:
        first_def = entry['definitions'][0]
        if first_def.get('pos'):
            metadata['pos'] = first_def['pos']
        if first_def.get('gender'):
            metadata['gender'] = first_def['gender']
    
    metadatas.append(metadata)

print(f"\n‚úÖ Prepared {len(ids):,} entries")

üì• Adding entries to ChromaDB...



Preparing entries: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 10460/10460 [00:00<00:00, 62753.51it/s]


‚úÖ Prepared 10,460 entries





In [39]:
# Add everything to ChromaDB
# We'll do this in batches because ChromaDB has limits

batch_size = 500  # ChromaDB works well with batches of 500

print("üì• Uploading to ChromaDB...\n")

for i in tqdm(range(0, len(ids), batch_size), desc="Uploading batches"):
    end_idx = min(i + batch_size, len(ids))
    
    collection.add(
        ids=ids[i:end_idx],
        embeddings=embeddings_list[i:end_idx],
        documents=documents[i:end_idx],
        metadatas=metadatas[i:end_idx]
    )

print(f"\n‚úÖ Successfully added {collection.count():,} entries to ChromaDB!")

üì• Uploading to ChromaDB...



Uploading batches: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 21/21 [00:06<00:00,  3.35it/s]


‚úÖ Successfully added 10,460 entries to ChromaDB!





---
## Step 7: Let's Search! üîç

The exciting part! Let's test our system.

We'll:
1. Take a Marathi word
2. Convert it to an embedding
3. Find similar entries in ChromaDB
4. Display the results!

In [40]:
def search_dictionary(query, n_results=5):
    """
    Search the dictionary for entries similar to the query.
    
    Args:
        query: A Marathi word or phrase to search for
        n_results: How many results to return (default 5)
    
    Returns:
        Results from ChromaDB with entries and similarity scores
    """
    # Step 1: Convert query to embedding
    query_embedding = model.encode(query).tolist()
    
    # Step 2: Search ChromaDB
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
        include=['documents', 'metadatas', 'distances']
    )
    
    return results


def display_results(query, results):
    """
    Display search results in a nice format.
    """
    print(f"\nüîç Search: '{query}'")
    print("=" * 60)
    
    if not results['ids'][0]:
        print("No results found.")
        return
    
    for i, (id, metadata, distance) in enumerate(zip(
        results['ids'][0],
        results['metadatas'][0],
        results['distances'][0]
    )):
        # Convert distance to similarity (lower distance = higher similarity)
        # ChromaDB uses L2 distance by default
        similarity = 1 / (1 + distance)  # Simple conversion to 0-1 range
        
        print(f"\n{i+1}. {metadata['headword']}")
        if metadata.get('romanized'):
            print(f"   ({metadata['romanized']})")
        print(f"   üìñ {metadata['full_entry']}")
        print(f"   üìä Match score: {similarity:.2%}")
        print(f"   üìÑ Source: {metadata['source']}, page {metadata['source_page']}")

print("‚úÖ Search functions ready!")

‚úÖ Search functions ready!


In [41]:
# TEST 1: Simple word lookup
query = "‡§™‡§æ‡§£‡•Ä"
results = search_dictionary(query)
display_results(query, results)


üîç Search: '‡§™‡§æ‡§£‡•Ä'

1. ‡§™‡§æ‡§£‡•Ä relinquish. ‡§™‡§æ‡§£‡•ç‡§Ø‡§æ‡§§ ‡§™‡§π‡§æ‡§£‡•á
   üìñ ‡§™‡§æ‡§£‡•Ä relinquish. ‡§™‡§æ‡§£‡•ç‡§Ø‡§æ‡§§ ‡§™‡§π‡§æ‡§£‡•á to hate
   üìä Match score: 90.81%
   üìÑ Source: berntsen, page 87

2. ‡§ú‡§≤
   (jala)
   üìñ ‡§ú‡§≤ jala n. water.
   üìä Match score: 90.34%
   üìÑ Source: berntsen, page 49

3. ‡§™‡§æ‡§£‡•Ä ‡§∏‡•ã‡§°‡§£‡•á to give up
   üìñ ‡§™‡§æ‡§£‡•Ä ‡§∏‡•ã‡§°‡§£‡•á to give up to hate
   üìä Match score: 89.97%
   üìÑ Source: berntsen, page 87

4. ‡§™‡§æ‡§£‡§ö‡§ï‡•ç‡§ï‡•Ä
   (pƒÅ·πáacakkƒ´)
   üìñ ‡§™‡§æ‡§£‡§ö‡§ï‡•ç‡§ï‡•Ä pƒÅ·πáacakkƒ´ f. water mill.
   üìä Match score: 89.92%
   üìÑ Source: berntsen, page 87

5. ‡§™‡§æ‡§£‡•Ä ‡§™‡§°‡§£‡•á
   üìñ ‡§™‡§æ‡§£‡•Ä ‡§™‡§°‡§£‡•á to be spoiled
   üìä Match score: 89.14%
   üìÑ Source: berntsen, page 87


In [42]:
# TEST 2: Semantic search - find related words!
query = "water"  # English query - will it find Marathi water words?
results = search_dictionary(query)
display_results(query, results)


üîç Search: 'water'

1. ‡§ú‡§≤
   (jala)
   üìñ ‡§ú‡§≤ jala n. water.
   üìä Match score: 91.26%
   üìÑ Source: berntsen, page 49

2. ‡§™‡§æ‡§£‡•Ä relinquish. ‡§™‡§æ‡§£‡•ç‡§Ø‡§æ‡§§ ‡§™‡§π‡§æ‡§£‡•á
   üìñ ‡§™‡§æ‡§£‡•Ä relinquish. ‡§™‡§æ‡§£‡•ç‡§Ø‡§æ‡§§ ‡§™‡§π‡§æ‡§£‡•á to hate
   üìä Match score: 90.61%
   üìÑ Source: berntsen, page 87

3. ‡§™‡§æ‡§£‡•Ä ‡§∏‡•ã‡§°‡§£‡•á to give up
   üìñ ‡§™‡§æ‡§£‡•Ä ‡§∏‡•ã‡§°‡§£‡•á to give up to hate
   üìä Match score: 89.75%
   üìÑ Source: berntsen, page 87

4. ‡§™‡§æ‡§£‡§ö‡§ï‡•ç‡§ï‡•Ä
   (pƒÅ·πáacakkƒ´)
   üìñ ‡§™‡§æ‡§£‡§ö‡§ï‡•ç‡§ï‡•Ä pƒÅ·πáacakkƒ´ f. water mill.
   üìä Match score: 89.56%
   üìÑ Source: berntsen, page 87

5. ‡§™‡§æ‡§£‡§µ‡§†‡§æ
   (pƒÅ·πáava·π≠hƒÅ)
   üìñ ‡§™‡§æ‡§£‡§µ‡§†‡§æ pƒÅ·πáava·π≠hƒÅ m. a place on the bank of a river or stream where people fill water, wash, etc.
   üìä Match score: 89.30%
   üìÑ Source: berntsen, page 87


In [43]:
# TEST 3: Try a concept
query = "‡§ñ‡§æ‡§£‡•á"  # eating
results = search_dictionary(query)
display_results(query, results)


üîç Search: '‡§ñ‡§æ‡§£‡•á'

1. ‡§ñ‡§æ‡§ä ‡§ò‡§æ‡§≤‡§£‡•á
   üìñ ‡§ñ‡§æ‡§ä ‡§ò‡§æ‡§≤‡§£‡•á to feed
   üìä Match score: 88.62%
   üìÑ Source: berntsen, page 31

2. ‡§≠‡§ï‡•ç‡§∑‡§£
   (bhak·π£a·πáa)
   üìñ ‡§≠‡§ï‡•ç‡§∑‡§£ bhak·π£a·πáa n. eating.
   üìä Match score: 88.09%
   üìÑ Source: berntsen, page 109

3. ‡§ñ‡§æ‡§ä
   (khƒÅ≈´)
   üìñ ‡§ñ‡§æ‡§ä khƒÅ≈´ m. snacks, `eats'. ‡•¶ ‡§ò‡§æ‡§≤‡§£‡•á to feed.
   üìä Match score: 87.73%
   üìÑ Source: berntsen, page 31

4. ‡§ú‡•á‡§µ‡§£
   (jƒìva·πáa)
   üìñ ‡§ú‡•á‡§µ‡§£ jƒìva·πáa n. meal, food.
   üìä Match score: 87.66%
   üìÑ Source: berntsen, page 51

5. ‡§Æ‡§ø‡§∑‡•ç‡§ü‡§æ‡§®‡•ç‡§®
   (mi·π£·π≠ƒÅnna)
   üìñ ‡§Æ‡§ø‡§∑‡•ç‡§ü‡§æ‡§®‡•ç‡§® mi·π£·π≠ƒÅnna n. good food.
   üìä Match score: 87.31%
   üìÑ Source: berntsen, page 119


In [46]:
# TEST 4: Your turn! Try any word
query = "‡§∞‡•ã‡§ü‡•Ä"  # mother - change this to anything!
results = search_dictionary(query)
display_results(query, results)


üîç Search: '‡§∞‡•ã‡§ü‡•Ä'

1. ‡§∞‡•ã‡§ü‡•Ä
   (r≈ç·π≠ƒ´)
   üìñ ‡§∞‡•ã‡§ü‡•Ä r≈ç·π≠ƒ´ f. bread.
   üìä Match score: 90.49%
   üìÑ Source: berntsen, page 128

2. ‡§∞‡•ã‡§ü
   (r≈ç·π≠a)
   üìñ ‡§∞‡•ã‡§ü r≈ç·π≠a m. thick bread.
   üìä Match score: 86.64%
   üìÑ Source: berntsen, page 128

3. ‡§ö‡§™‡§æ‡§§‡•Ä
   (capƒÅtƒ´)
   üìñ ‡§ö‡§™‡§æ‡§§‡•Ä capƒÅtƒ´ f. wheat pancake used as bread. See ‡§™‡•ã‡§≥‡•Ä .
   üìä Match score: 85.75%
   üìÑ Source: berntsen, page 42

4. ‡§®‡§ø‡§§‡§ï‡•ã‡§∞
   (nitak≈çra)
   üìñ ‡§®‡§ø‡§§‡§ï‡•ã‡§∞ nitak≈çra adj. inv. one eighth of a round object like a ‡§≠‡§æ‡§ï‡§∞‡•Ä .
   üìä Match score: 85.38%
   üìÑ Source: berntsen, page 78

5. ‡§™‡•ã‡§≥‡•Ä
   (p≈ç·∏∑ƒ´)
   üìñ ‡§™‡•ã‡§≥‡•Ä p≈ç·∏∑ƒ´ f. 1. flat bread of wheat flour. 2. dewlap.
   üìä Match score: 85.10%
   üìÑ Source: berntsen, page 94


---
## üéâ Phase 1 Complete!

### What you built:
1. ‚úÖ Loaded MahaSBERT - a model that understands Marathi
2. ‚úÖ Created embeddings for 5,000+ dictionary entries
3. ‚úÖ Stored everything in ChromaDB (saved to disk!)
4. ‚úÖ Built a working search function

### What's saved:
- Your ChromaDB database is saved in the `chroma_db/` folder
- You can close this notebook and the data persists!

### What's next (Phase 2):
- Add an LLM (Claude Haiku) to make responses smarter
- Handle morphology (‡§™‡§æ‡§£‡•ç‡§Ø‡§æ‡§≤‡§æ ‚Üí ‡§™‡§æ‡§£‡•Ä)
- Better formatting of results

---

## Bonus: Interactive Search Cell

Run this cell and type any word to search!

In [None]:
# Interactive search - run this and enter words!
while True:
    query = input("\nüîç Enter a Marathi word (or 'quit' to exit): ")
    if query.lower() == 'quit':
        print("üëã Goodbye!")
        break
    results = search_dictionary(query)
    display_results(query, results)


üîç Search: '‡§™‡§æ‡§®‡§∏‡•á'

1. ‡§™‡§ø‡§ï‡§≤‡•á
   (pikalƒì)
   üìñ ‡§™‡§ø‡§ï‡§≤‡•á pikalƒì , ‡§™‡§æ‡§® pƒÅna an extremely old person.
   üìä Match score: 86.51%
   üìÑ Source: berntsen, page 89

2. ‡§™‡§æ‡§®‡§∏‡•Å‡§™‡§æ‡§∞‡•Ä
   (pƒÅnasupƒÅrƒ´)
   üìñ ‡§™‡§æ‡§®‡§∏‡•Å‡§™‡§æ‡§∞‡•Ä pƒÅnasupƒÅrƒ´ f. 1. betel leaf and betel nut. 2. reception.
   üìä Match score: 86.09%
   üìÑ Source: berntsen, page 87

3. ‡§™‡§æ‡§®‡§æ
   (pƒÅnƒÅ)
   üìñ ‡§™‡§æ‡§®‡§æ pƒÅnƒÅ m. spanner, wrench.
   üìä Match score: 85.48%
   üìÑ Source: berntsen, page 87

4. ‡§™‡§æ‡§≤‡•Å‡§™‡§¶
   (pƒÅlupada)
   üìñ ‡§™‡§æ‡§≤‡•Å‡§™‡§¶ pƒÅlupada n. refrain.
   üìä Match score: 84.90%
   üìÑ Source: berntsen, page 89

5. ‡§™‡•Ä‡§™
   (pƒ´pa)
   üìñ ‡§™‡•Ä‡§™ pƒ´pa n. cask, barrel.
   üìä Match score: 84.85%
   üìÑ Source: berntsen, page 90

üîç Search: ''

1. ‡§ö‡§ï‡§æ‡§ü‡•ç‡§Ø
   (cakƒÅ·π≠ya)
   üìñ ‡§ö‡§ï‡§æ‡§ü‡•ç‡§Ø cakƒÅ·π≠ya f.pl. idle talk. ‡•¶ ‡§™‡•¢‡§ü‡§£‡•á to chat idly.
   üìä Match score: 79.66%
