# Notebook 1: Cleaning Data

### **Introduction**

The Pali Canon is the foundational collection of texts of the Theravada Buddhist religion. It contains the oldest known records of the Buddha's teachings, which were maintained orally and then compiled in written form about 500 years after the Buddha's death.

The Canon is a diverse collection of works documenting teachings, stories, exclamations, quotes and poetry, grouped into 5 separate collections by later compilers. Although it is little known and little studied in the West, the Canon and its commentaries form the core of the religion for hundreds of millions of Theravada Buddhists, particularly in Southeast Asia.

Although translations from Pali (an ancient Indic language closely related to Sanskrit) have existed for over one hundred years, they were often made by scholars who were not steeped in the living Buddhist monastic culture and discipline, and often by those who did not practice. As such, it is unclear how often early translators had experiential insight into the complex phenomena and concepts represented in the Canon. These insights are undoubtedly important for accurately rendering a dead language (one that contains many words and concepts with no direct equivalent in English) and, in turn, for outlining a path of practice to an unconditioned happiness that is as alive today as it was in the time of the Buddha.

In recent years, however, as a result of an enormous effort by several English-speaking Buddhist monks, a large portion of the Pali Canon has been translated and made available online. The suttas that are the data for this project come from www.dhammatalks.org which hosts suttas translated by Ajahn Geoff, a monk of nearly 45 years in the Kammathana (Thai Forest Tradition) lineage. He has significant experience in translating both from Pali and Thai and is an inspiring monk in conduct and learnedness. 

### **Problem Statement**

The purpose of this project is two-fold:
1. To do significant, public-facing Natural Language Processing analysis on the Pali Canon. To my knowledge, an investigation like this, at this scale, has never been conducted before. Given how recently strong English translations of the Canon have become available, and how recent the Machine Learning algorithms employed here are, the absence of an existing analysis at this level is less surprising than it might initially appear. Furthermore, the overlap between lay Theravada Buddhists (non-monks) who are dedicated to reading the original texts (not 'Dhamma' books by other lay-Buddhist 'Dhamma teachers') and people who understand the tools needed for this analysis is probably quite small.

2. To develop a recommendation algorithm for suttas that could be used to support the development of particular mental qualities, themes and understandings within the religion. One could consider this to be a sort of 'Netflix' for information on how to develop along a path to an unconditioned happiness. As a rough picture of the functionality, imagine a user entering a particular theme, e.g. 'Generosity', and being recommended a number of suttas that deal with the subject, along with a number of closely related topics that might also be worth exploring.

### **Technical Introduction to Notebook One**

As mentioned above, the data for this project comes from roughly 30 scrapes of the website www.dhammatalks.org using the [Octoparse](https://www.octoparse.com) web-scraping tool. A tool was used rather than a hand-coded scraper because of the structure of the website and the variety of ways in which the text is presented on it.

This notebook creates thirteen dataframes in total: one for each of the four 'stand-alone' compilations; seven more, one for each of the seven sub-collections of the fifth compilation (Khuddaka Nikaya); one for the entirety of the Khuddaka Nikaya; and, lastly, one for all five compilations.
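
As a rough illustration of how per-collection dataframes could be stacked into a combined one, here is a minimal sketch. The tiny stand-in frames and the `collection` label column are assumptions made for this example and are not part of the scraped data.

```python
import pandas as pd

# Tiny stand-in frames; the real ones are built later in this notebook.
df_mn = pd.DataFrame({"title": ["MN 1"], "sutta_text": ["..."]})
df_dn = pd.DataFrame({"title": ["DN 1"], "sutta_text": ["..."]})

# Tag each frame with a (hypothetical) collection label, then stack them.
frames = {"MN": df_mn, "DN": df_dn}
df_all = pd.concat(
    [df.assign(collection=name) for name, df in frames.items()],
    ignore_index=True,
)
print(df_all[["collection", "title"]])
```

Keeping a label column like this makes it possible to filter the combined frame back down to a single collection later.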

# **Merging the first four collections: MN, DN, SN, AN**

These are fairly straightforward to work with as they are all in roughly the same format.

#### Imports

In [1]:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## 1. Majjhima Nikaya (MN)

In [2]:
df_mnf = pd.read_csv('./sutta_csv/mn_full.csv')
df_mnf.columns = df_mnf.columns.str.lower()
df_mnf.columns = df_mnf.columns.str.replace(" ", "_")

df_mnt = pd.read_csv('./sutta_csv/mn_text_2.csv')
df_mnt.columns = df_mnt.columns.str.lower()
df_mnt.columns = df_mnt.columns.str.replace(" ", "_")
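
The same load-and-normalise steps are repeated for every CSV in this notebook. One possible way to factor them out is sketched below; `read_sutta_csv` is a hypothetical helper name, and the in-memory CSV stands in for a file such as `./sutta_csv/mn_full.csv`.

```python
import io
import pandas as pd

def read_sutta_csv(path_or_buffer):
    """Read a scraped CSV and normalise its column names (hypothetical helper)."""
    df = pd.read_csv(path_or_buffer)
    # Lower-case the headers and replace spaces with underscores,
    # mirroring the two column-cleaning lines used in the cells above.
    df.columns = df.columns.str.lower().str.replace(" ", "_")
    return df

# In-memory stand-in for one of the scraped files.
sample = io.StringIO("Title,Title URL,Ref\nMN 1,https://example.org/mn1,MN 1\n")
df_demo = read_sutta_csv(sample)
print(list(df_demo.columns))
```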

In [3]:
df_mnf.head(0)

Unnamed: 0,title,title_url,ref,summary


In [4]:
df_mnt.head(0)

Unnamed: 0,sutta_text,title_url


In [5]:
df_mn = pd.merge(left = df_mnf, right = df_mnt, on='title_url')
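
`pd.merge` defaults to an inner join, so any row whose `title_url` appears in only one of the two files is silently dropped. A minimal sketch of how unmatched rows could be inspected before merging is shown below; the two small frames and their URLs are made up for illustration.

```python
import pandas as pd

# Made-up stand-ins for a "full" (metadata) frame and a "text" frame.
full = pd.DataFrame({
    "title": ["Introduction", "Sutta A"],
    "title_url": ["https://example.org/intro", "https://example.org/a"],
})
text = pd.DataFrame({
    "title_url": ["https://example.org/a"],
    "sutta_text": ["..."],
})

# An outer merge with indicator=True adds a `_merge` column that records
# whether each key was found in the left frame, the right frame, or both.
check = pd.merge(full, text, on="title_url", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["title_url", "_merge"]])
```

If `unmatched` is empty, the inner merge used in this notebook loses nothing; otherwise the listed URLs show which pages were scraped in only one of the two passes.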

In [6]:
df_mn.head(0)

Unnamed: 0,title,title_url,ref,summary,sutta_text


## 2. Digha Nikaya (DN)

In [7]:
df_dnf = pd.read_csv('./sutta_csv/dn_full.csv')
df_dnf.columns = df_dnf.columns.str.lower()
df_dnf.columns = df_dnf.columns.str.replace(" ", "_")

df_dnt = pd.read_csv('./sutta_csv/dn_text_2.csv')
df_dnt.columns = df_dnt.columns.str.lower()
df_dnt.columns = df_dnt.columns.str.replace(" ", "_")

In [8]:
df_dnf.head(0)

Unnamed: 0,title,title_url,ref,summary


In [9]:
df_dnt.head(0)

Unnamed: 0,sutta_text,title_url


In [10]:
df_dn = pd.merge(left = df_dnf, right = df_dnt, on='title_url')

In [11]:
df_dn.head(0)

Unnamed: 0,title,title_url,ref,summary,sutta_text


## 3. Samyutta Nikaya (SN)

In [12]:
df_snf = pd.read_csv('./sutta_csv/sn_full.csv')
df_snf.columns = df_snf.columns.str.lower()
df_snf.columns = df_snf.columns.str.replace(" ", "_")

df_snt = pd.read_csv('./sutta_csv/sn_text_2.csv')
df_snt.columns = df_snt.columns.str.lower()
df_snt.columns = df_snt.columns.str.replace(" ", "_")

In [13]:
df_snf.head(0)

Unnamed: 0,title,title_url,ref,summary


In [14]:
df_snt.head(0)

Unnamed: 0,sutta_text,title_url


In [15]:
df_sn = pd.merge(left = df_snf, right = df_snt, on='title_url')

In [16]:
df_sn.head(0)

Unnamed: 0,title,title_url,ref,summary,sutta_text


## 4. Anguttara Nikaya (AN)

In [17]:
df_anf = pd.read_csv('./sutta_csv/an_full.csv')
df_anf.columns = df_anf.columns.str.lower()
df_anf.columns = df_anf.columns.str.replace(" ", "_")

df_ant = pd.read_csv('./sutta_csv/an_text_2.csv')
df_ant.columns = df_ant.columns.str.lower()
df_ant.columns = df_ant.columns.str.replace(" ", "_")

In [18]:
df_anf.head(0)

Unnamed: 0,title,title_url,ref,summary


In [19]:
df_ant.head(0)

Unnamed: 0,sutta_text,title_url


In [20]:
df_an = pd.merge(left = df_anf, right = df_ant, on='title_url')

In [21]:
df_an.head(0)

Unnamed: 0,title,title_url,ref,summary,sutta_text


# **Merging the Khuddaka Nikaya (KN)**

The Khuddaka Nikaya was much harder to scrape from the website. It is composed of 7 smaller collections, each with its own formatting, and its content ranges from poetry to exclamations, quotations and stories.
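
Because each sub-collection arrives with different column names, the cells below rename and drop columns by hand. One alternative is to map each scrape onto a shared schema, sketched here. `to_common_schema` and `TARGET_COLUMNS` are hypothetical names, and the target column list is an assumption based on the columns of the first four collections.

```python
import pandas as pd

# Assumed shared schema, taken from the merged MN/DN/SN/AN frames above.
TARGET_COLUMNS = ["title", "title_url", "ref", "summary", "sutta_text"]

def to_common_schema(df, rename_map):
    """Rename scraped columns and return only the shared target columns."""
    out = df.rename(columns=rename_map)
    # reindex adds any missing target column as NaN and drops everything else.
    return out.reindex(columns=TARGET_COLUMNS)

# Made-up row using the generic field names some scrapes produced.
raw = pd.DataFrame({
    "field1": ["Ch. 1 Pairs"],
    "url": ["https://example.org/ch01"],
    "text": ["..."],
})
clean = to_common_schema(
    raw, {"field1": "title", "url": "title_url", "text": "sutta_text"}
)
print(list(clean.columns))
```

Columns a scrape did not capture (here `ref` and `summary`) come back as NaN, which makes the gaps explicit when the frames are later combined.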

## KN 01 - Khuddakapatha (Khp)

In [22]:
df_kkhpf = pd.read_csv('./sutta_csv/kn/01_kn_khp_full.csv')
df_kkhpf.columns = df_kkhpf.columns.str.lower()
df_kkhpf.columns = df_kkhpf.columns.str.replace(" ", "_")

df_kkhpt = pd.read_csv('./sutta_csv/kn/01_kn_khp_text.csv')
df_kkhpt.columns = df_kkhpt.columns.str.lower()
df_kkhpt.columns = df_kkhpt.columns.str.replace(" ", "_")

In [23]:
df_kkhpf.head(1)

Unnamed: 0,title,ref,title_url,summary
0,Introduction,Introduction,https://www.dhammatalks.org/suttas/KN/Khp/khpintroduction.html,Introduction


In [24]:
df_kkhpt.head(1)

Unnamed: 0,title_url,sutta_text
0,https://www.dhammatalks.org/suttas/KN/Khp/khp1.html,Khp 1. Saraṇagamana — Going for RefugeNavigationSuttas/KN/Khp/1\n I go to the Buddha for refuge.\n I go to the Dhamma for refuge.\n I go to the Saṅgha for refuge.\n A second time I go to the Buddha for refuge.\n A second time I go to the Dhamma for refuge.\n A second time I go to the Saṅgha for refuge.\n A third time I go to the Buddha for refuge.\n A third time I go to the Dhamma for refuge.\n A third time I go to the Saṅgha for refuge.


In [25]:
df_kkhp = pd.merge(left = df_kkhpf, right = df_kkhpt, on='title_url')

In [26]:
df_kkhp.head(1)

Unnamed: 0,title,ref,title_url,summary,sutta_text
0,Khp 1 Saraṇagamana | Going for Refuge,Khp 1,https://www.dhammatalks.org/suttas/KN/Khp/khp1.html,Khp 1 Saraṇagamana | Going for Refuge — The standard passage for taking refuge.,Khp 1. Saraṇagamana — Going for RefugeNavigationSuttas/KN/Khp/1\n I go to the Buddha for refuge.\n I go to the Dhamma for refuge.\n I go to the Saṅgha for refuge.\n A second time I go to the Buddha for refuge.\n A second time I go to the Dhamma for refuge.\n A second time I go to the Saṅgha for refuge.\n A third time I go to the Buddha for refuge.\n A third time I go to the Dhamma for refuge.\n A third time I go to the Saṅgha for refuge.


## KN 02 - Dhammapada (Dhp)

In [27]:
df_kdhpf = pd.read_csv('./sutta_csv/kn/02_kn_dhp_full.csv')
df_kdhpf.columns = df_kdhpf.columns.str.lower()
df_kdhpf.columns = df_kdhpf.columns.str.replace(" ", "_")

df_kdhpt = pd.read_csv('./sutta_csv/kn/02_kn_dhp_text.csv')
df_kdhpt.columns = df_kdhpt.columns.str.lower()
df_kdhpt.columns = df_kdhpt.columns.str.replace(" ", "_")

In [28]:
df_kdhpf.head(1)

Unnamed: 0,field1,field2_links
0,Preface,https://www.dhammatalks.org/suttas/KN/Dhp/preface.html


In [29]:
df_kdhpt.head(0)

Unnamed: 0,url,text


In [30]:
df_kdhp = pd.merge(left = df_kdhpf, right = df_kdhpt, right_on='url', left_on = 'field2_links')

In [31]:
df_kdhp.head(0)

Unnamed: 0,field1,field2_links,url,text


In [32]:
## Dropping
df_kdhp = df_kdhp.drop(columns = 'field2_links')

In [33]:
## Renaming
df_kdhp = df_kdhp.rename(columns = {
    'field1': 'title',
    'url': 'title_url',
    'text': 'sutta_text'
})


In [34]:
df_kdhp.head(1)

Unnamed: 0,title,title_url,sutta_text
0,Ch. 1 Pairs,https://www.dhammatalks.org/suttas/KN/Dhp/Ch01.html,"\n Dhp I : PairsNavigationSuttas/KN/Dhp/1\n \n Phenomena are\n preceded by the heart,\n ruled by the heart,\n made of the heart.\n If you speak or act\n with a corrupted heart,\n then suffering follows you –\n as the wheel of the cart,\n the track of the ox\n that pulls it.\n \n \n Phenomena are\n preceded by the heart,\n ruled by the heart,\n made of the heart.\n If you speak or act\n with a calm, bright heart,\n then happiness follows you,\n like a shadow\n that never leaves.\n \n 1-2*\n \n ‘He insulted me,\n hit me,\n beat me,\n robbed me’\n –for those who brood on this,\n hostility isn’t stilled.\n \n \n ‘He insulted me,\n hit me,\n beat me,\n robbed me’–\n for those who don’t brood on this,\n hostility is stilled.\n \n \n Hostilities aren’t stilled\n through hostility,\n regardless.\n Hostilities are stilled\n through non-hostility:\n this, an unending truth.\n \n \n Unlike those who don’t realize\n that we’re here on the verge\n of perishing,\n those who do:\n their quarrels are stilled.\n \n 3-6\n \n One who stays focused on the beautiful,\n is unrestrained with the senses,\n knowing no moderation in food,\n apathetic, unenergetic:\n Mara overcomes him\n as the wind, a weak tree.\n \n \n One who stays focused on the foul,\n is restrained with regard to the senses,\n knowing moderation in food,\n full of conviction & energy:\n Mara does not overcome him\n as the wind, a mountain of rock.\n \n 7-8*\n \n He who,\n depraved,\n devoid\n of truthfulness\n & self-control,\n puts on the ochre robe,\n doesn’t deserve the ochre robe.\n \n \n But he who is free\n of depravity\n endowed\n with truthfulness\n & self-control,\n well-established\n in the precepts,\n truly deserves the ochre robe.\n \n 9-10\n \n Those who regard\n non-essence as essence\n and see essence as non-,\n don’t get to the essence,\n ranging about in wrong resolves.\n \n \n But those who know\n essence as essence,\n and non-essence 
as non-,\n get to the essence,\n ranging about in right resolves.\n \n 11-12*\n \n As rain seeps into\n an ill-thatched hut,\n so passion,\n the undeveloped mind.\n \n \n As rain doesn’t seep into\n a well-thatched hut,\n so passion does not,\n the well-developed mind.\n \n 13-14\n \n Here\n he grieves\n he grieves\n hereafter.\n In both worlds\n the wrong-doer grieves.\n He grieves, he’s afflicted,\n seeing the corruption\n of his deeds.\n \n \n Here\n he rejoices\n he rejoices\n hereafter.\n In both worlds\n the merit-maker rejoices.\n He rejoices, is jubilant,\n seeing the purity\n of his deeds.\n \n \n Here\n he’s tormented\n he’s tormented\n hereafter.\n In both worlds\n the wrong-doer’s tormented.\n He’s tormented at the thought,\n ‘I’ve done wrong.’\n Having gone to a bad destination,\n he’s tormented\n all the more.\n \n \n Here\n he delights\n he delights\n hereafter.\n In both worlds\n the merit-maker delights.\n He delights at the thought,\n ‘I’ve made merit.’\n Having gone to a good destination,\n he delights\n all the more.\n \n 15-18*\n \n If he recites many teachings, but\n –heedless man–\n doesn’t do what they say,\n like a cowherd counting the cattle of\n others,\n he has no share in the contemplative life.\n \n \n If he recites next to nothing\n but follows the Dhamma\n in line with the Dhamma;\n abandoning passion,\n aversion, delusion;\n alert,\n his mind well released,\n not clinging\n either here or hereafter:\n he has his share in the contemplative life.\n \n 19-20\n"


In [35]:
## Create new column with Dhp + Everything before the colon in the title column just so that I have a ref value.
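
The cell above records an intention but contains no code. One possible way to derive a `ref` value is sketched below, under the assumption that titles follow the `Ch. 1 Pairs` pattern seen in the output above; the sample frame is a stand-in and the resulting `Dhp <chapter>` format is a guess at what was intended.

```python
import pandas as pd

# Stand-in frame with titles shaped like the one shown in the output above.
df_demo = pd.DataFrame({"title": ["Ch. 1 Pairs", "Ch. 2 Heedfulness"]})

# Pull the chapter number out of titles such as "Ch. 1 Pairs".
chapter = df_demo["title"].str.extract(r"^Ch\.\s*(\d+)", expand=False)

# Prefix with "Dhp"; titles that do not match the pattern yield NaN.
df_demo["ref"] = "Dhp " + chapter
print(df_demo)
```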

## KN 03 - Udana (Ud)

In [36]:
df_kudf = pd.read_csv('./sutta_csv/kn/03_kn_ud_full.csv')
df_kudf.columns = df_kudf.columns.str.lower()
df_kudf.columns = df_kudf.columns.str.replace(" ", "_")

df_kudt = pd.read_csv('./sutta_csv/kn/03_kn_ud_text.csv.csv')
df_kudt.columns = df_kudt.columns.str.lower()
df_kudt.columns = df_kudt.columns.str.replace(" ", "_")

In [37]:
df_kudf.head(1)

Unnamed: 0,title,title_url,field
0,Ud 1:1 Bodhi Sutta | Awakening (1),https://www.dhammatalks.org/suttas/KN/Ud/ud1_1.html,Ud 1:1


In [38]:
df_kudt.head(0)

Unnamed: 0,url,text


In [39]:
df_kud = pd.merge(left = df_kudf, right = df_kudt, right_on='url', left_on = 'title_url')

In [40]:
df_kud.head(0)

Unnamed: 0,title,title_url,field,url,text


In [41]:
## Dropping
df_kud = df_kud.drop(columns = 'url')

In [42]:
## Renaming
df_kud = df_kud.rename(columns = {
    'field': 'ref',
    'text': 'sutta_text'
})


In [43]:
df_kud.head(0)

Unnamed: 0,title,title_url,ref,sutta_text


## KN 04 - Itivuttaka (Iti)

In [44]:
df_kitif = pd.read_csv('./sutta_csv/kn/04_kn_iti_full.csv')
df_kitif.columns = df_kitif.columns.str.lower()
df_kitif.columns = df_kitif.columns.str.replace(" ", "_")

df_kitit = pd.read_csv('./sutta_csv/kn/04_kn_iti_text.csv.csv')
df_kitit.columns = df_kitit.columns.str.lower()
df_kitit.columns = df_kitit.columns.str.replace(" ", "_")

In [45]:
df_kitif.head(1)

Unnamed: 0,field1,field2_links
0,"Iti 1 — Abandon greed, and you’re guaranteed non-return.",https://www.dhammatalks.org/suttas/KN/Iti/iti1.html


In [46]:
df_kitit.head(0)

Unnamed: 0,url,text


In [47]:
df_kiti = pd.merge(left = df_kitif, right = df_kitit, right_on='url', left_on = 'field2_links')

In [48]:
df_kiti.head(0)

Unnamed: 0,field1,field2_links,url,text


In [49]:
## Dropping
df_kiti = df_kiti.drop(columns = 'field2_links')

In [50]:
## Renaming
df_kiti = df_kiti.rename(columns = {
    'url': 'title_url',
    'field1': 'title',
    'text': 'sutta_text'
})


In [51]:
df_kiti.head(0)

Unnamed: 0,title,title_url,sutta_text


## KN 05 - Sutta Nipata (Stnp)

In [52]:
df_kstnpf = pd.read_csv('./sutta_csv/kn/05_kn_stnp_full.csv')
df_kstnpf.columns = df_kstnpf.columns.str.lower()
df_kstnpf.columns = df_kstnpf.columns.str.replace(" ", "_")

df_kstnpt = pd.read_csv('./sutta_csv/kn/05_kn_stnp_text.csv.csv')
df_kstnpt.columns = df_kstnpt.columns.str.lower()
df_kstnpt.columns = df_kstnpt.columns.str.replace(" ", "_")

In [53]:
df_kstnpf.head(0)

Unnamed: 0,title,title_url,field,unnamed:_3


In [54]:
df_kstnpt.head(0)

Unnamed: 0,url,text


In [55]:
# Use a dedicated name for the Sutta Nipata frame so that the Itivuttaka
# frame (df_kiti) built in the previous section is not overwritten.
df_kstnp = pd.merge(left = df_kstnpf, right = df_kstnpt, right_on='url', left_on = 'title_url')

In [56]:
df_kstnp.head(1)

Unnamed: 0,title,title_url,field,unnamed:_3,url,text
0,Sn 1:1 The Snake,https://www.dhammatalks.org/suttas/KN/StNp/StNp1_1.html,Sn 1:1,"Sn 1:1 The Snake — One who advances far along the path sloughs off the near shore and far, like a snake who sloughs off its skin.",https://www.dhammatalks.org/suttas/KN/StNp/StNp1_1.html,"\n \n 1:1 The SnakeNavigationSuttas/KN/Sn/1:1\n Alternative versions of this poem—a Sanskrit version included in the Udānavarga, and a Gāndhārī version included in the Gāndhārī Dharmapada—have many of the same verses included here, but arranged in a different order. This suggests that the verses originally may have been separate poems, spoken on separate occasions, and that they were gathered together because they share the same refrain.\n \n The monk who subdues his arisen anger\n as, with herbs, snake-venom once it has spread,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n The monk who has cut off passion\n without leaving a trace,\n as he would, plunging into a lake, a lotus,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who has cut off craving\n without leaving a trace,\n drying up the swift-flowing flood,1\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who has uprooted conceit\n without leaving a trace,\n as a great flood, a very weak bridge made of reeds,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk seeing\n in states of becoming\n no essence,\n as he would,\n when examining fig trees,\n no flowers,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk with no inner anger,\n who has thus gone beyond\n becoming & not-,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk whose discursive thoughts are dispersed,\n well-dealt with inside\n without leaving a trace,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who hasn’t 
slipped past or held back,2\n transcending all\n this objectification,3\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who hasn’t slipped past or held back,\n knowing with regard to the world\n that “All this is unreal,”\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who hasn’t slipped past or held back,\n without greed, as “All this is unreal,”\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who hasn’t slipped past or held back,\n without aversion, as “All this is unreal,”\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who hasn’t slipped past or turned back,\n without delusion, as “All this is unreal,”\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk in whom\n there are no obsessions4\n —the roots of unskillfulness totally destroyed—\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk in whom\n there’s nothing born of disturbance5\n that would lead him back to this shore,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk in whom\n there’s nothing born of the underbrush6\n that would act as a cause\n for binding him to becoming,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who’s abandoned five hindrances,\n who, untroubled, de-arrowed,7\n has crossed over doubt,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n vv. 1–17\n \n Notes\n 1. On craving as a flooding river, see Dhp 251, 337, 339–340, and 347.\n 2. See Iti 49.\n 3. On objectification, see Sn 4:11, note 4, and the introduction to MN 18.\n 4. The seven obsessions, listed in AN 7:11, are: sensual passion, resistance, views, uncertainty, conceit, passion for becoming, and ignorance. 
The relationship of three of these obsessions—the first two and the last—to the three types of feeling is discussed in MN 44. \n 5. Daratha. For a detailed description of the subtleties of disturbance, see MN 121.\n 6. Underbrush stands for desire. See Dhp 344.\n 7. The arrow can stand for becoming, craving, or grief. See SN 36:6, Sn 3:8, Sn 4:15, Dhp 351, Thag 6:13, Thig 3:5, and Thig 6:1.\n \n"


In [57]:
## Dropping
df_kstnp = df_kstnp.drop(columns = 'url')

In [58]:
## Renaming
df_kstnp = df_kstnp.rename(columns = {
    'unnamed:_3': 'summary',
    'field': 'ref',
    'text': 'sutta_text'
})


In [59]:
df_kstnp.head(0)

Unnamed: 0,title,title_url,ref,summary,sutta_text


In [60]:
df_kstnp['ref'] = df_kstnp['ref'].str.replace('Sn', 'Stnp')

In [61]:
df_kstnp.head(1)

Unnamed: 0,title,title_url,ref,summary,sutta_text
0,Sn 1:1 The Snake,https://www.dhammatalks.org/suttas/KN/StNp/StNp1_1.html,Stnp 1:1,"Sn 1:1 The Snake — One who advances far along the path sloughs off the near shore and far, like a snake who sloughs off its skin.","\n \n 1:1 The SnakeNavigationSuttas/KN/Sn/1:1\n Alternative versions of this poem—a Sanskrit version included in the Udānavarga, and a Gāndhārī version included in the Gāndhārī Dharmapada—have many of the same verses included here, but arranged in a different order. This suggests that the verses originally may have been separate poems, spoken on separate occasions, and that they were gathered together because they share the same refrain.\n \n The monk who subdues his arisen anger\n as, with herbs, snake-venom once it has spread,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n The monk who has cut off passion\n without leaving a trace,\n as he would, plunging into a lake, a lotus,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who has cut off craving\n without leaving a trace,\n drying up the swift-flowing flood,1\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who has uprooted conceit\n without leaving a trace,\n as a great flood, a very weak bridge made of reeds,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk seeing\n in states of becoming\n no essence,\n as he would,\n when examining fig trees,\n no flowers,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk with no inner anger,\n who has thus gone beyond\n becoming & not-,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk whose discursive thoughts are dispersed,\n well-dealt with inside\n without leaving a trace,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who hasn’t slipped past or held back,2\n transcending all\n this 
objectification,3\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who hasn’t slipped past or held back,\n knowing with regard to the world\n that “All this is unreal,”\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who hasn’t slipped past or held back,\n without greed, as “All this is unreal,”\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who hasn’t slipped past or held back,\n without aversion, as “All this is unreal,”\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who hasn’t slipped past or turned back,\n without delusion, as “All this is unreal,”\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk in whom\n there are no obsessions4\n —the roots of unskillfulness totally destroyed—\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk in whom\n there’s nothing born of disturbance5\n that would lead him back to this shore,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk in whom\n there’s nothing born of the underbrush6\n that would act as a cause\n for binding him to becoming,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n The monk who’s abandoned five hindrances,\n who, untroubled, de-arrowed,7\n has crossed over doubt,\n sloughs off the near shore & far—\n as a snake, its decrepit old skin.\n \n \n vv. 1–17\n \n Notes\n 1. On craving as a flooding river, see Dhp 251, 337, 339–340, and 347.\n 2. See Iti 49.\n 3. On objectification, see Sn 4:11, note 4, and the introduction to MN 18.\n 4. The seven obsessions, listed in AN 7:11, are: sensual passion, resistance, views, uncertainty, conceit, passion for becoming, and ignorance. 
The relationship of three of these obsessions—the first two and the last—to the three types of feeling is discussed in MN 44. \n 5. Daratha. For a detailed description of the subtleties of disturbance, see MN 121.\n 6. Underbrush stands for desire. See Dhp 344.\n 7. The arrow can stand for becoming, craving, or grief. See SN 36:6, Sn 3:8, Sn 4:15, Dhp 351, Thag 6:13, Thig 3:5, and Thig 6:1.\n \n"


## KN 06 - Theragatha (Thag)

In [62]:
df_kthagf = pd.read_csv('./sutta_csv/kn/06_kn_thag_full.csv')
df_kthagf.columns = df_kthagf.columns.str.lower()
df_kthagf.columns = df_kthagf.columns.str.replace(" ", "_")

df_kthagt = pd.read_csv('./sutta_csv/kn/06_kn_thag_text.csv.csv')
df_kthagt.columns = df_kthagt.columns.str.lower()
df_kthagt.columns = df_kthagt.columns.str.replace(" ", "_")

In [63]:
df_kthagf.head(2)

Unnamed: 0,field1,field2,field3_links
0,Introduction,Introduction,https://www.dhammatalks.org/suttas/KN/Thag/introduction.html
1,Thag 1:1 Subhūti,"Thag 1:1 Subhūti — My hut is well-thatched, so go ahead and rain.",https://www.dhammatalks.org/suttas/KN/Thag/thag1_1.html


In [64]:
df_kthagt.head(0)

Unnamed: 0,url,text


In [65]:
df_kthag = pd.merge(left = df_kthagf, right = df_kthagt, right_on='url', left_on = 'field3_links')

In [66]:
df_kthag.head(1)

Unnamed: 0,field1,field2,field3_links,url,text
0,Thag 1:1 Subhūti,"Thag 1:1 Subhūti — My hut is well-thatched, so go ahead and rain.",https://www.dhammatalks.org/suttas/KN/Thag/thag1_1.html,https://www.dhammatalks.org/suttas/KN/Thag/thag1_1.html,"\n \n Thag 1:1 SubhūtiNavigationSuttas/KN/Thag/1:1\n \n My hut is roofed, comfortable,\n free of drafts;\n my mind, well-centered,\n released.\n I remain ardent.\n So, rain-deva.\n Go ahead & rain.\n \n See also: AN 3:110; Sn 1:1\n"


In [67]:
## Dropping
df_kthag = df_kthag.drop(columns = 'field3_links')

In [68]:
## Renaming
df_kthag = df_kthag.rename(columns = {
    'field1': 'title',
    'field2': 'summary',
    'text': 'sutta_text',
    'url': 'title_url'
})


In [69]:
df_kthag.head(0)

Unnamed: 0,title,summary,title_url,sutta_text


## KN 07 - Therigatha (Thig)

In [70]:
df_kthigf = pd.read_csv('./sutta_csv/kn/07_kn_thig_full.csv')
df_kthigf.columns = df_kthigf.columns.str.lower()
df_kthigf.columns = df_kthigf.columns.str.replace(" ", "_")

df_kthigt = pd.read_csv('./sutta_csv/kn/07_kn_thig_text.csv')
df_kthigt.columns = df_kthigt.columns.str.lower()
df_kthigt.columns = df_kthigt.columns.str.replace(" ", "_")

In [71]:
df_kthigf.head(2)

Unnamed: 0,field1,field2,field3_links
0,Introduction,Introduction,https://www.dhammatalks.org/suttas/KN/Thag/introduction.html
1,Thig 1:1 An Anonymous Nun,"Thig 1:1 An Anonymous Nun — Passion stilled, like a pot of pickled greens boiled dry.",https://www.dhammatalks.org/suttas/KN/Thig/thig1_1.html


In [72]:
df_kthigt.head(0)

Unnamed: 0,url,text


In [73]:
df_kthig = pd.merge(left = df_kthigf, right = df_kthigt, right_on='url', left_on = 'field3_links')

In [74]:
df_kthig.head(1)

Unnamed: 0,field1,field2,field3_links,url,text
0,Thig 1:1 An Anonymous Nun,"Thig 1:1 An Anonymous Nun — Passion stilled, like a pot of pickled greens boiled dry.",https://www.dhammatalks.org/suttas/KN/Thig/thig1_1.html,https://www.dhammatalks.org/suttas/KN/Thig/thig1_1.html,"\n \n Thig 1:1 An Anonymous NunNavigationSuttas/KN/Thig/1:1\n \n Sleep, little therī, sleep comfortably,\n wrapped in the robe you’ve made,\n for your passion is stilled—\n like a pot of pickled greens\n boiled dry.\n \n"


In [75]:
## Dropping
df_kthig = df_kthig.drop(columns = 'field3_links')

In [76]:
## Renaming
df_kthig = df_kthig.rename(columns = {
    'field1': 'title',
    'field2': 'summary',
    'text': 'sutta_text',
    'url': 'title_url'
})


In [78]:
df_kthig.head(1)

Unnamed: 0,title,summary,title_url,sutta_text
0,Thig 1:1 An Anonymous Nun,"Thig 1:1 An Anonymous Nun — Passion stilled, like a pot of pickled greens boiled dry.",https://www.dhammatalks.org/suttas/KN/Thig/thig1_1.html,"\n \n Thig 1:1 An Anonymous NunNavigationSuttas/KN/Thig/1:1\n \n Sleep, little therī, sleep comfortably,\n wrapped in the robe you’ve made,\n for your passion is stilled—\n like a pot of pickled greens\n boiled dry.\n \n"
