## Sanskrit Machine Learning!

Using NLP to learn embeddings from Vedic texts and to explore conceptual similarity between verses, deities, and philosophical ideas (Dharma, Rta, Atman, Brahman).


### Loading the Dataset:
#### This is for the Kaggle Dataset!


Imports needed to load the dataset from Kaggle:

In [2]:
import kagglehub
from kagglehub import KaggleDatasetAdapter
import os


In [3]:
# This is the JSON file INSIDE the Kaggle dataset
file_path = "complete_rigveda_all_mandalas.json"

# Load the dataset as a pandas DataFrame
df = kagglehub.load_dataset(
    KaggleDatasetAdapter.PANDAS,
    "varunrajuvangar/rigved-all-sukta-verses-and-meaning-dataset",
    file_path,
)

# print("First 5 records:")
# print(df.head())

print("\nColumns:")
print(df.columns)



Columns:
Index(['Mandala 1', 'Mandala 2', 'Mandala 3', 'Mandala 4', 'Mandala 5',
       'Mandala 6', 'Mandala 7', 'Mandala 8', 'Mandala 9', 'Mandala 10'],
      dtype='object')
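
The column listing shows one column per Mandala. Judging from the cells in this notebook, each cell appears to hold a list of verse dictionaries for one Sukta, and Mandalas with fewer Suktas presumably leave the remaining cells empty. A minimal sketch of how that layout could be summarised before flattening is given here. It runs on a small hypothetical stand-in frame (`toy`), not on the real dataset, and the helper name `count_suktas_and_verses` is an assumption made for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the loaded frame: columns are Mandalas, rows are
# Suktas, and each cell is either a list of verse dicts or empty (None).
toy = pd.DataFrame(
    {
        "Mandala 1": [[{"rik_number": 1}, {"rik_number": 2}], [{"rik_number": 1}]],
        "Mandala 2": [[{"rik_number": 1}], None],
    },
    index=["Sukta 1", "Sukta 2"],
)


def count_suktas_and_verses(frame):
    """Count list-valued cells (Suktas) and their total verses per column."""
    rows = []
    for column in frame.columns:
        # Keep only cells that actually contain a list of verses.
        cells = [cell for cell in frame[column] if isinstance(cell, list)]
        rows.append(
            {
                "mandala": column,
                "suktas": len(cells),
                "verses": sum(len(cell) for cell in cells),
            }
        )
    return pd.DataFrame(rows).set_index("mandala")


summary = count_suktas_and_verses(toy)
print(summary)
```

Calling `count_suktas_and_verses(df)` on the real frame should give a quick per-Mandala sanity check, provided its cells follow the same list-or-empty layout assumed here.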


In [4]:
# Display the first Sukta data to verify content and structure for cleaning
import pprint
first_sukta_data = df.iloc[0,0]
print("\nFirst Sukta Data:")
pprint.pprint(first_sukta_data)



First Sukta Data:
[{'padapatha': {'devanagari': {'text': 'अग्निम् । ईळे । पुरःऽहितम् । यज्ञस्य । '
                                       'देवम् । ऋत्विजम् ।होतारम् । '
                                       'रत्नऽधातमम् ॥',
                               'type': 'Padapatha Devanagari Nonaccented',
                               'words': ['अग्निम्',
                                         'ईळे',
                                         'पुरःऽहितम्',
                                         'यज्ञस्य',
                                         'देवम्',
                                         'ऋत्विजम्',
                                         'होतारम्',
                                         'रत्नऽधातमम्']},
                'transliteration': {'text': 'agním ǀ īḷe ǀ puráḥ-hitam ǀ '
                                            'yajñásya ǀ devám ǀ ṛtvíjam '
                                            'ǀhótāram ǀ ratna-dhā́tamam ǁ',
                                    'type': 'Padap

In [7]:
import pandas as pd

every_verse = []

for mandala in df.columns:
    for sukta in df.index:
        curr_cell = df.at[sukta, mandala]

        if not isinstance(curr_cell, list):
            continue

        for verse in curr_cell:
            try:
                sanskrit_verse = verse['samhita']['devanagari']['text']
                display_sanskrit = verse['sanskrit_wisdomlib']
                eng_translation = verse['translation']
                verse_num = verse['rik_number']
                
                every_verse.append({
                    'mandala': mandala,
                    'sukta': sukta,
                    'verse_num': verse_num,
                    'sanskrit_verse': sanskrit_verse,
                    'display_sanskrit': display_sanskrit,
                    'english_translation': eng_translation
                })
            
            except KeyError as e:
                print(f"KeyError for mandala {mandala}, sukta {sukta}, verse {verse.get('rik_number', '?')}: {e}")

cleaned_df = pd.DataFrame(every_verse)
print("\nCleaned DataFrame head:")
print(cleaned_df.head())


Cleaned DataFrame head:
     mandala    sukta  verse_num  \
0  Mandala 1  Sukta 1          1   
1  Mandala 1  Sukta 1          2   
2  Mandala 1  Sukta 1          3   
3  Mandala 1  Sukta 1          4   
4  Mandala 1  Sukta 1          5   

                                      sanskrit_verse  \
0  अग्निमीळे पुरोहितं यज्ञस्य देवमृत्विजं ।होतारं...   
1  अग्निः पूर्वेभिर्ऋषिभिरीड्यो नूतनैरुत ।स देवाँ...   
2  अग्निना रयिमश्नवत्पोषमेव दिवेदिवे ।यशसं वीरवत्...   
3  अग्ने यं यज्ञमध्वरं विश्वतः परिभूरसि ।स इद्देव...   
4  अग्निर्होता कविक्रतुः सत्यश्चित्रश्रवस्तमः ।दे...   

                                    display_sanskrit  \
0  अ॒ग्निमी॑ळे पु॒रोहि॑तं य॒ज्ञस्य॑ दे॒वमृ॒त्विज॑...   
1  अ॒ग्निः पूर्वे॑भि॒ॠषि॑भि॒रीड्यो॒ नूत॑नैरु॒त । ...   
2  अ॒ग्निना॑ र॒यिम॑श्नव॒त्पोष॑मे॒व दि॒वेदि॑वे । य...   
3  अग्ने॒ यं य॒ज्ञम॑ध्व॒रं वि॒श्वत॑: परि॒भूरसि॑ ।...   
4  अ॒ग्निर्होता॑ क॒विक्र॑तुः स॒त्यश्चि॒त्रश्र॑वस्...   

                                 english_translation  
0  “I glorifyAgni, the high p