# LangChain with data loaded from Obsidian 

[Obsidian](https://obsidian.md/) is a powerful and extensible knowledge base that works on top of your local folder of plain text files.   
Obsidian stores notes as Markdown files in a folder on disk. For more details, see [how Obsidian stores data](https://help.obsidian.md/Files+and+folders/How+Obsidian+stores+data#:~:text=Obsidian%20stores%20your%20notes%20as,file%20system%2C%20including%20any%20subfolders).  
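
Because a vault is just a folder of Markdown files, it can be inspected with ordinary file-system tools. A minimal sketch, assuming notes use the `.md` extension and that hidden folders (such as the `.obsidian` settings folder) should be skipped; `list_notes` is a hypothetical helper, not part of Obsidian or LangChain:

```python
from pathlib import Path


def list_notes(vault_dir):
    """Return the relative paths of all Markdown notes under vault_dir, sorted."""
    root = Path(vault_dir)
    notes = []
    for path in root.rglob("*.md"):
        rel = path.relative_to(root)
        # Skip anything inside a hidden folder, e.g. .obsidian (assumption:
        # such folders hold settings rather than notes).
        if any(part.startswith(".") for part in rel.parts):
            continue
        notes.append(rel.as_posix())
    return sorted(notes)
```

Calling `list_notes(obsidian_path)` on a real vault should give a quick overview of what a loader is likely to pick up.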


LangChain lets you apply prompts to documents. It ships with its own document loaders, including one for Obsidian.   
For more details, see the [documentation](https://python.langchain.com/docs/integrations/document_loaders/obsidian). 




In [53]:
# load environment variables from the .env file
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) 
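
The notebook does not say which variables the `.env` file is expected to define. The sketch below assumes an `OPENAI_API_KEY` is needed, since the indexing step later on typically relies on OpenAI in LangChain versions of this period; both the variable name and the `missing_env_vars` helper are assumptions for illustration:

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# OPENAI_API_KEY is an assumed variable name; adjust to whatever your .env defines.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running a check like this right after `load_dotenv` surfaces a missing key early instead of at the first API call.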

In [54]:
import os

documents_path = os.path.join(os.getenv('USERPROFILE'), 'Documents')
obsidian_path = os.path.join(documents_path, 'Obsidian Vault')

obsidian_path

'C:\\Users\\otols\\Documents\\Obsidian Vault'
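
`USERPROFILE` is normally only defined on Windows, so the cell above fails elsewhere (`os.getenv` returns `None`). A possible portable variant, assuming the vault still lives under `Documents` in the user's home folder; `default_vault_path` is a hypothetical helper:

```python
from pathlib import Path


def default_vault_path(vault_name="Obsidian Vault"):
    """Build <home>/Documents/<vault_name> without relying on USERPROFILE."""
    return Path.home() / "Documents" / vault_name


# str() keeps the value interchangeable with the os.path-based string above.
portable_obsidian_path = str(default_vault_path())
```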

In [55]:
assert os.path.exists(obsidian_path), f"Obsidian path does not exist: {obsidian_path}"


In [60]:
from langchain.document_loaders import ObsidianLoader

loader = ObsidianLoader(obsidian_path, encoding='UTF-8', collect_metadata=False) 
docs = loader.load()

docs[:3]


[Document(page_content="\n\nI encourage you to specify why you're interested in the company (scaleway)", metadata={'source': 'Conseils RH.md', 'path': 'C:\\Users\\otols\\Documents\\Obsidian Vault\\Conseils RH.md', 'created': 1694683825.1194148, 'last_modified': 1694683862.9424913, 'last_accessed': 1696340970.6644685}),
 Document(page_content='\ndispensing with  - обходясь без', metadata={'source': 'English.md', 'path': 'C:\\Users\\otols\\Documents\\Obsidian Vault\\English.md', 'created': 1693818242.0307033, 'last_modified': 1693818259.6911194, 'last_accessed': 1696340970.6644685}),
 Document(page_content=' AWS Summit,\n Pycon \n HashiConf', metadata={'source': "Evenements d'ingénérie.md", 'path': "C:\\Users\\otols\\Documents\\Obsidian Vault\\Evenements d'ingénérie.md", 'created': 1694507620.1801715, 'last_modified': 1695743942.5252457, 'last_accessed': 1696340970.6650107})]
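
The `created`, `last_modified` and `last_accessed` values in the metadata appear to be file timestamps in seconds since the Unix epoch. A small sketch for making them readable, under that assumption; `readable_metadata` is a hypothetical helper and works on any plain metadata dict:

```python
from datetime import datetime, timezone


def readable_metadata(metadata):
    """Copy metadata, converting epoch timestamp fields to ISO 8601 strings (UTC)."""
    converted = dict(metadata)
    for key in ("created", "last_modified", "last_accessed"):
        if key in converted:
            moment = datetime.fromtimestamp(converted[key], tz=timezone.utc)
            converted[key] = moment.isoformat(timespec="seconds")
    return converted


# Values copied from the English.md entry in the output above.
sample = {"source": "English.md", "created": 1693818242.0307033}
print(readable_metadata(sample))
```

With loaded documents, this could be applied as `readable_metadata(docs[0].metadata)`.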

## Create your index

In [52]:
%pip install chromadb
%pip install tiktoken 

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Note: you may need to restart the kernel to use updated packages.


You should consider upgrading via the 'c:\Users\otols\Desktop\dev\PromptEngineering\venv\Scripts\python.exe -m pip install --upgrade pip' command.



In [None]:
from langchain.indexes import VectorstoreIndexCreator

# Create a vectorstore index from loaders
index = VectorstoreIndexCreator().from_loaders([loader])
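
Conceptually, a vector store index turns each piece of text into a vector and answers a query by finding the closest vectors. The toy sketch below illustrates only that idea, using word counts and cosine similarity in place of learned embeddings. It is not what `VectorstoreIndexCreator` does internally, and the sample notes and function names are made up for the example:

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real embeddings are dense, learned vectors)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(query, notes):
    """Return the name of the note whose text is closest to the query."""
    q = embed(query)
    return max(notes, key=lambda name: cosine(q, embed(notes[name])))


# Hypothetical sample notes, not taken from a real vault.
notes = {
    "vocabulary.md": "dispensing with means doing without",
    "events.md": "AWS Summit Pycon HashiConf",
}
print(most_similar("pycon summit dates", notes))
```

In the real pipeline the retrieved text is then passed to a language model, which writes the answer.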

In [111]:
query = "What are the limits I have listed on my note about LangChain?"

index.query_with_sources(query)

{'question': 'What are the limits I have listed on my note about LangChain?',
 'answer': ' The limits listed on the note about LangChain are: Large Language Model (LLM), two types of LLM (base and instructions tuned), Transformers, predicting tokens in a chain, multi-modal LLM trained on video and images, API with temperature parameter, and limiting the number of tokens.\n',
 'sources': 'Maximiser sa productivité avec les LLM et Chat GPT.md'}

In [113]:
# Note: the first assignment is overwritten immediately, so only the second query is run.
query = "Summarize my thoughts on my note?"
query = "Give me the main points of my note about Maximiser sa productivité avec les LLM et Chat GPT?"
index.query_with_sources(query)

{'question': 'Give me the main points of my note about Maximiser sa productivité avec les LLM et Chat GPT?',
 'answer': " The main points of the note about Maximiser sa productivité avec les LLM et Chat GPT are: Large Language Models (LLM) use a lot of data and scientific review to generate text using the same logic; LLM has two types, base and instructions tuned; Chat GPT is limited; it is important to have keywords; there are 6 alternatives to Chat GPT; a data scientist junior needs to generate a motivation letter using their competences and the elements mentioned; multi-modal LLM can be trained on video and images; the API can be used to manage the parameters; the note also mentions Kafka + RabbitMQ, a GCP project, Sopra Steria, Google's vector search technology, HuggingFace datasets, Langchain, Open AI, and hallucination models.\n",
 'sources': 'Maximiser sa productivité avec les LLM et Chat GPT.md'}
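
The result is a plain dictionary with `question`, `answer` and `sources` keys, which is awkward to read for long answers. A minimal formatting sketch, assuming the result always has the shape shown above; `format_answer` is a hypothetical helper:

```python
def format_answer(result):
    """Render a query_with_sources-style result dict as readable text."""
    sources = result.get("sources", "").strip()
    lines = [
        f"Q: {result['question']}",
        f"A: {result['answer'].strip()}",
        f"Sources: {sources if sources else 'none reported'}",
    ]
    return "\n".join(lines)
```

For example, `print(format_answer(index.query_with_sources(query)))` would print the question, the answer and the source note on separate lines.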