# Metrical Analysis of Sanskrit Ninth Class Verb Forms

## Getting Verbal Roots 

In [None]:
!mkdir -p data
!wget -O data/whitney_roots.pdf http://gretil.sub.uni-goettingen.de/gretil_elib/Whi885__Whitney_Roots-ACCENTED.pdf

In [4]:
# Requires pdftk; on Ubuntu, for example: sudo apt install pdftk
# Extract page 229 of the PDF, which holds the ninth-class (nā-class) root list
!pdftk data/whitney_roots.pdf cat 229 output data/whitney_roots_ninth_class.pdf

In [5]:
# produces data/whitney_roots_ninth_class.txt
!pdftotext data/whitney_roots_ninth_class.pdf

Clean up the text version manually, fixing formatting and diacritics.

Final results are in [data/whitney_roots_ninth_class_cleaned.txt](data/whitney_roots_ninth_class_cleaned.txt)
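
One detail of the diacritics cleanup matters for the parsing step, which slices stems by character count (`stem[:-2]`): a long vowel typed as a base letter plus a combining macron occupies two code points instead of one. The following is a minimal sketch, assuming the cleaned file is meant to be in Unicode NFC form; the helper name `to_nfc` is hypothetical and not part of the original notebook.

```python
import unicodedata

def to_nfc(text):
    """Fold base letter + combining mark into a precomposed character where one exists."""
    return unicodedata.normalize("NFC", text)

# "ā" written as "a" followed by a combining macron (U+0304)
decomposed_stem = "ubhna\u0304"
composed_stem = to_nfc(decomposed_stem)

# The decomposed form is one code point longer, so slicing gives a different root
print(len(decomposed_stem), decomposed_stem[:-2])
print(len(composed_stem), composed_stem[:-2])

# Quick check that a string is already normalized (Python 3.8+)
print(unicodedata.is_normalized("NFC", composed_stem))
```

Running `to_nfc` over the whole cleaned file before parsing is one way to make the slice behave consistently. Accented vowels that have no precomposed form would still span more than one code point.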

## Parsing Verbal Roots Info

In [45]:
CLASS_HEADER = "6. nā-class"
EARLIER_LANGUAGE_HEADER = "A. Earlier Language"
EARLIER_AND_LATER_LANGUAGE_HEADER = "B. Earlier and Later Language"
LATER_LANGUAGE_HEADER = "C. Later Language"

NINTH_CLASS_STRONG_MARKER = "ā"
NINTH_CLASS_WEAK_MARKER = "ī"

whitney_roots = []

language_period = None

with open("data/whitney_roots_ninth_class_cleaned.txt", "r", encoding="utf-8") as whitney_file:
    for line in whitney_file:
        variant_no = None
        attestation_texts = None
        weak_only = False
        
        line = line.rstrip()
        if not line or CLASS_HEADER in line:
            continue    
        elif EARLIER_LANGUAGE_HEADER in line:
            language_period = EARLIER_LANGUAGE_HEADER
            continue
        elif EARLIER_AND_LATER_LANGUAGE_HEADER in line:
            language_period = EARLIER_AND_LATER_LANGUAGE_HEADER
            continue
        elif LATER_LANGUAGE_HEADER in line:
            language_period = LATER_LANGUAGE_HEADER
            continue
                
        line_parts = line.split()
        if line_parts[0].isdigit():
            variant_no = line_parts.pop(0)
        stem = line_parts.pop(0)
        if line_parts:
            attestation_texts = " ".join(line_parts)
        
        if stem.endswith(NINTH_CLASS_WEAK_MARKER):
            weak_only = True
        
        whitney_roots.append({
            "root": stem[:-2],  # strip the two-character class suffix (e.g. nā/ṇā, nī/ṇī); assumes NFC text
            "variant_no": variant_no,
            "stem": stem,
            "weak_only": weak_only,
            "attestation_texts": attestation_texts,
            "language_period": language_period,
        })
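
To make the per-line logic of the cell above easier to check in isolation, here is a minimal sketch that applies the same steps to a single line. The function name `parse_stem_line` is hypothetical, and the sample line format (`"2 gṛṇā V.S."`) is an assumption inferred from the parser and its output, not a quotation from the cleaned file.

```python
WEAK_MARKER = "ī"  # same value as NINTH_CLASS_WEAK_MARKER in the notebook

def parse_stem_line(line, language_period=None):
    """Parse one root line: optional variant number, stem, optional attestation texts."""
    parts = line.split()
    # A leading integer is the variant number of a homonymous root
    variant_no = parts.pop(0) if parts[0].isdigit() else None
    stem = parts.pop(0)
    return {
        "root": stem[:-2],  # drop the two-character class suffix
        "variant_no": variant_no,
        "stem": stem,
        "weak_only": stem.endswith(WEAK_MARKER),
        "attestation_texts": " ".join(parts) or None,
        "language_period": language_period,
    }

# Assumed line format, for illustration only
row = parse_stem_line("2 gṛṇā V.S.", "A. Earlier Language")
print(row)
```

Factoring the logic into a function like this would also allow individual lines to be tested without rereading the whole file.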

In [17]:
import pandas

In [46]:
df_whitney_roots = pandas.DataFrame.from_dict(whitney_roots)
df_whitney_roots.to_csv("data/whitney_roots_ninth_class.csv", index=False)
df_whitney_roots.head()

| | root | variant_no | stem | weak_only | attestation_texts | language_period |
|---|---|---|---|---|---|---|
| 0 | i | | inī | True | V. | A. Earlier Language |
| 1 | is | | isṇā | False | | A. Earlier Language |
| 2 | ubh | | ubhnā | False | V. | A. Earlier Language |
| 3 | uṣ | | uṣṇā | False | V. | A. Earlier Language |
| 4 | kṣi | | kṣiṇā | False | V.B. | A. Earlier Language |


## Annotating Verbal Roots with Rig Veda Attestations

In [47]:
!cp data/whitney_roots_ninth_class.csv data/roots_ninth_class.csv

Attestation information from Lubotsky's concordance is then added manually to [data/roots_ninth_class.csv](data/roots_ninth_class.csv).
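
Because the annotated file is edited by hand, it can be worth confirming that no row of the generated CSV was dropped or altered along the way. The sketch below compares the identifying columns of two CSV texts using only the standard library. The helper names, the extra `rv_attested` column, and its `TODO` values are all hypothetical placeholders, since the notebook does not show which columns the manual annotation adds.

```python
import csv
import io

KEY_COLUMNS = ("root", "variant_no", "stem")

def key_rows(csv_text):
    """Return the identifying columns of each row as a list of tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [tuple(row[col] for col in KEY_COLUMNS) for row in reader]

def missing_rows(original_text, annotated_text):
    """Rows present in the original CSV but absent from the annotated one."""
    annotated = set(key_rows(annotated_text))
    return [row for row in key_rows(original_text) if row not in annotated]

original = """root,variant_no,stem,weak_only,attestation_texts,language_period
i,,inī,True,V.,A. Earlier Language
ubh,,ubhnā,False,V.,A. Earlier Language
"""

# Hypothetical annotated version: one extra column, one row accidentally deleted
annotated = """root,variant_no,stem,weak_only,attestation_texts,language_period,rv_attested
i,,inī,True,V.,A. Earlier Language,TODO
"""

print(missing_rows(original, annotated))
```

With the real files, the two texts would come from reading `data/whitney_roots_ninth_class.csv` and `data/roots_ninth_class.csv` with `encoding="utf-8"`.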

In [48]:
df_roots = pandas.read_csv("data/roots_ninth_class.csv")

In [49]:
df_roots.head(100)

| | root | variant_no | stem | weak_only | attestation_texts | language_period |
|---|---|---|---|---|---|---|
| 0 | i | | inī | True | V. | A. Earlier Language |
| 1 | is | | isṇā | False | | A. Earlier Language |
| 2 | ubh | | ubhnā | False | V. | A. Earlier Language |
| 3 | uṣ | | uṣṇā | False | V. | A. Earlier Language |
| 4 | kṣi | | kṣiṇā | False | V.B. | A. Earlier Language |
| 5 | gṛ | 2.0 | gṛṇā | False | V.S. | A. Earlier Language |
| 6 | gṛbh | | gṛbhṇā | False | V.B. | A. Earlier Language |
| 7 | ju | | junā | False | V. | A. Earlier Language |
| 8 | ji | | jinā | False | | A. Earlier Language |
| 9 | dṛ | | dṛṇī | True | B. | A. Earlier Language |
