# Roots and Stems for Ninth Class (and Fifth Class)

## Getting Verbal Roots 

We rely on the [list of stems](https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-whitroot/disp/index.php?page=214) given in the index of Whitney's Sanskrit Roots and Verb-forms.

In [1]:
!mkdir -p downloads
!mkdir -p data

In [None]:
!wget -O downloads/whitney_roots.pdf http://gretil.sub.uni-goettingen.de/gretil_elib/Whi885__Whitney_Roots-ACCENTED.pdf

In [2]:
# Install pdftk if it is not already available, e.g. on Ubuntu: sudo apt install pdftk
# Extract page 229 of the PDF (ninth-class stems).
!pdftk downloads/whitney_roots.pdf cat 229 output data/whitney_roots_ninth_class.pdf

# Extract page 228 (fifth-class stems), which serves as our control data.
!pdftk downloads/whitney_roots.pdf cat 228 output data/whitney_roots_fifth_class.pdf

In [30]:
# produces data/whitney_roots_ninth_class.txt
!pdftotext data/whitney_roots_ninth_class.pdf

# produces data/whitney_roots_fifth_class.txt
!pdftotext data/whitney_roots_fifth_class.pdf

The text versions are cleaned up manually, fixing formatting and diacritics.

We also rewrite a form like _mī̆nā_ as _minā/mīnā_, i.e. we spell out the variation in the length of the root vowel as two explicit stem forms. This makes the variants easier to visualize and process later. Whitney marks only three stems this way (_mī̆nā_, _vlī̆nā_ and _dhū̆nī_), so doing this by hand is easy; with many more such forms, the step could have been automated.
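If this rewrite ever had to be automated, it could look roughly like the sketch below. It is not part of the notebook's pipeline. The function name `expand_vowel_length` is hypothetical, and the sketch assumes that the "short or long" vowel is encoded as a base vowel carrying both a combining macron and a combining breve, and that a stem contains at most one such vowel.

```python
import re
import unicodedata

MACRON = "\u0304"  # combining macron (long vowel)
BREVE = "\u0306"   # combining breve (short vowel)

# A vowel carrying both marks, in either order, is read as "short or long".
BOTH_MARKS = re.compile(f"{MACRON}{BREVE}|{BREVE}{MACRON}")


def expand_vowel_length(stem: str) -> str:
    """Return 'short/long' for a stem with one doubly marked vowel.

    Stems without such a vowel are returned NFC-normalized but otherwise unchanged.
    """
    # Decompose so that precomposed letters such as U+012B expose their marks.
    decomposed = unicodedata.normalize("NFD", stem)
    if not BOTH_MARKS.search(decomposed):
        return unicodedata.normalize("NFC", decomposed)
    short = BOTH_MARKS.sub("", decomposed, count=1)      # drop both marks
    long_ = BOTH_MARKS.sub(MACRON, decomposed, count=1)  # keep only the macron
    return "/".join(unicodedata.normalize("NFC", s) for s in (short, long_))


# The stem is written with escapes so the combining marks are visible in the source.
print(expand_vowel_length("mi\u0304\u0306na\u0304"))  # prints minā/mīnā
```

Working on the NFD-decomposed string avoids having to list every precomposed vowel; the result is recomposed with NFC so that it compares equal to normally typed text.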

Final results are in [data/whitney_roots_ninth_class_cleaned.txt](data/whitney_roots_ninth_class_cleaned.txt) and [data/whitney_roots_fifth_class_cleaned.txt](data/whitney_roots_fifth_class_cleaned.txt).

## Parsing Verbal Roots Info

In [3]:
# in the same folder as this notebook
import src.lib.roots as roots

# useful during testing to pick up changes in the file
import importlib
importlib.reload(roots)

whitney_roots = roots.parse_whitney_roots([
    "data/whitney_roots_ninth_class_cleaned.txt",
    "data/whitney_roots_fifth_class_cleaned.txt",
])

## Saving the Results

In [4]:
import pandas

In [None]:
df_whitney_roots = pandas.DataFrame.from_dict(whitney_roots)
df_whitney_roots.to_csv("data/roots.csv", index=False)

## Summary of Results

In [12]:
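import pandas

# keep_default_na=False keeps empty cells (e.g. a missing variant_no) as empty strings instead of NaN
res_df_roots = pandas.read_csv("data/roots.csv", keep_default_na=False)
res_df_roots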

|     | root_guess | variant_no | strong_stem | weak_stem | weak_only | attestation_texts | language_period | present_class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | i 2 |  | inā | inī | True | V. | Earlier | ninth |
| 1 | iṣ |  | iṣṇā | iṣṇī | False |  | Earlier | ninth |
| 2 | ubh |  | ubhnā | ubhnī | False | V. | Earlier | ninth |
| 3 | uṣ |  | uṣṇā | uṣṇī | False | V. | Earlier | ninth |
| 4 | kṣi |  | kṣiṇā | kṣiṇī | False | V.B. | Earlier | ninth |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 97 | hi |  | hino | hinu | False |  | Earlier & Later | fifth |
| 98 | ci | 2 | cino | cinu | False |  | Later | fifth |
| 99 | jagh |  | jaghno | jaghnu | False | C. | Later | fifth |
| 100 | ti |  | tino | tinu | False | C. | Later | fifth |


In [13]:
res_df_roots.groupby(["present_class", "language_period"]).size().to_frame("count").reset_index()

|     | present_class | language_period | count |
| --- | --- | --- | --- |
| 0 | fifth | Earlier | 24 |
| 1 | fifth | Earlier & Later | 21 |
| 2 | fifth | Later | 4 |
| 3 | ninth | Earlier | 31 |
| 4 | ninth | Earlier & Later | 17 |
| 5 | ninth | Later | 5 |


In the Rigveda, we do not expect to find roots whose language period is marked "Later".
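As a rough illustration of that expectation, the roots worth looking for in the Rigveda can be selected by dropping the "Later" rows. The sketch below uses a small in-memory sample copied from the summary table rather than the full `data/roots.csv`, and the variable names are illustrative only. Keeping the "Earlier & Later" rows is an assumption: it follows from excluding only roots marked "Later".

```python
import pandas

# A few rows copied from the summary table; the full data lives in data/roots.csv.
sample = pandas.DataFrame(
    [
        {"root_guess": "iṣ", "language_period": "Earlier", "present_class": "ninth"},
        {"root_guess": "hi", "language_period": "Earlier & Later", "present_class": "fifth"},
        {"root_guess": "jagh", "language_period": "Later", "present_class": "fifth"},
        {"root_guess": "ti", "language_period": "Later", "present_class": "fifth"},
    ]
)

# Exclude roots attested only in the later language.
rigveda_candidates = sample[sample["language_period"] != "Later"]
candidate_roots = rigveda_candidates["root_guess"].tolist()
print(candidate_roots)
```

The same boolean mask can be applied directly to `res_df_roots` once it has been loaded.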