# Analysis of an Obsidian Vault
**Michael Gunn**
The purpose of this paper is to analyze an Obsidian vault of any size or kind. The analysis supports building a Retrieval-Augmented Generation (RAG) pipeline around a Large Language Model (LLM).
The contents of the vault are stored in a vector database, from which passages relevant to a query can be retrieved and supplied to the LLM. Grounding responses in retrieved text is meant to reduce hallucination.
The Obsidian vault is analyzed here like any other corpus of text data. The analysis ends by transforming the text into vectors, in preparation for storing the corpus in the database.
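
The paper does not say which vectorization method is intended, so the following is only a minimal sketch of the final "text to vectors" step. It assumes a plain TF-IDF weighting as a stand-in for whatever embedding model is eventually chosen. The function names `tokenize` and `tfidf_vectors` and the sample notes are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(doc_freq)
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty notes
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


# Placeholder notes (hypothetical); a real run would pass the vault's text.
notes = ["the vault stores notes", "notes link to other notes"]
vocabulary, vectors = tfidf_vectors(notes)
print(len(vocabulary), len(vectors))
```

Each vector has one entry per vocabulary term, so terms that appear in every document receive a weight of zero. A dense embedding model would likely retrieve better in a RAG setting, but the shape of the step is the same: one fixed-length vector per document.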

In [1]:
import os
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from textblob import TextBlob

In [2]:
# Find parent dir; open Bible from data dir.
cwd = os.getcwd()
p2cwd = os.path.dirname(cwd)
kjv_df = pd.read_csv(os.path.join(p2cwd, 'data','en_kjv.csv'))
print(kjv_df.head())

   Unnamed: 0 language translation book  chapter  verse  \
0           0       en         kjv  Gen        1      1   
1           1       en         kjv  Gen        1      2   
2           2       en         kjv  Gen        1      3   
3           3       en         kjv  Gen        1      4   
4           4       en         kjv  Gen        1      5   

                                                text  
0  In the beginning God created the heaven and th...  
1  And the earth was without form, and void; and ...  
2  And God said, Let there be light: and there wa...  
3  And God saw the light, that it was good: and G...  
4  And God called the light Day, and the darkness...  


In [3]:
# Score each verse with TextBlob; polarity ranges from -1.0 (negative) to 1.0 (positive).
sentiments = []
for index, row in kjv_df.iterrows():
	blob = TextBlob(row['text'])
	sentiments.append(blob.sentiment.polarity)
kjv_df.insert(0, 'sentiment', sentiments)
print(kjv_df.head())
print(kjv_df.tail())

## Part of Speech Tagging
- Breakdown of parts of speech for the entire Bible.
- Number of distinct verbs, with frequencies.
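
The two items above could be computed along the following lines. This is a sketch under stated assumptions, not the author's implementation. It relies on TextBlob's `tags` property, which returns `(word, tag)` pairs with Penn Treebank tags, and treats any tag beginning with `VB` as a verb. The helper names `textblob_tagger` and `pos_breakdown` are hypothetical. Verbs are counted as lowercased surface forms, not lemmas.

```python
from collections import Counter


def textblob_tagger(text):
    """Tag one string with TextBlob; returns a list of (word, tag) pairs."""
    # Imported lazily: TextBlob's tagger needs its NLTK corpora installed
    # (python -m textblob.download_corpora).
    from textblob import TextBlob
    return TextBlob(text).tags


def pos_breakdown(texts, tagger=textblob_tagger):
    """Count every POS tag, and every distinct verb form, across texts."""
    tag_counts = Counter()
    verb_counts = Counter()
    for text in texts:
        for word, tag in tagger(text):
            tag_counts[tag] += 1
            if tag.startswith("VB"):  # VB, VBD, VBG, VBN, VBP, VBZ
                verb_counts[word.lower()] += 1
    return tag_counts, verb_counts
```

With the DataFrame loaded earlier, `pos_breakdown(kjv_df['text'])` would return both counters. `len(verb_counts)` then gives the number of distinct verb forms, and `verb_counts.most_common()` gives their frequencies.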

## Sentiment Analysis
- What is the overall sentiment of The Book of Isaiah by chapter (averaging by verse)?
- What is the overall sentiment of the Bible by book type (Law, Prophet, Writing, Gospel, Epistle, Revelation)?
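
Both questions reduce to grouped means over a per-verse sentiment score. The sketch below assumes the DataFrame has a numeric `sentiment` column alongside `book` and `chapter`. The abbreviation `'Isa'` for Isaiah and the partial `BOOK_TYPES` mapping are assumptions for illustration only; the actual abbreviations and the full book-to-type assignment must come from the dataset and the author.

```python
import pandas as pd


def mean_sentiment_by_chapter(df, book):
    """Average the per-verse 'sentiment' column within each chapter of one book."""
    subset = df[df["book"] == book]
    return subset.groupby("chapter")["sentiment"].mean()


def mean_sentiment_by_book_type(df, book_types):
    """Average per-verse sentiment per book type; unmapped books are dropped."""
    labeled = df.assign(book_type=df["book"].map(book_types))
    return labeled.groupby("book_type")["sentiment"].mean()


# Hypothetical, partial mapping; extend it to cover every book in the data.
BOOK_TYPES = {"Gen": "Law", "Isa": "Prophet"}
```

For example, `mean_sentiment_by_chapter(kjv_df, 'Isa')` yields one mean per chapter, and `mean_sentiment_by_book_type(kjv_df, BOOK_TYPES)` yields one mean per type. Every verse is weighted equally, so long books dominate the average for their type.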