# Step 1 - Data Ingestion

In [2]:
from langchain_community.document_loaders import TextLoader
loader = TextLoader("speech.txt")

In [3]:
loader 

<langchain_community.document_loaders.text.TextLoader at 0x1070942d0>

In [4]:
text_documents = loader.load()

In [5]:
text_documents

[Document(metadata={'source': 'speech.txt'}, page_content='History, a journey through time, is more than just a collection of dates and events; it\'s the story of humanity, shaping our present and influencing our future. By studying history, we gain a deeper understanding of ourselves, our cultures, and the interconnectedness of the world. \nHere\'s why history is important and how it can inform our lives:\n1. Understanding Our Past: History provides context and allows us to trace the origins of our societies, cultures, and institutions. It helps us understand how we got to where we are today. \n2. Learning from Mistakes: By examining past mistakes, we can learn from them and avoid repeating them in the future. History teaches us about the consequences of our actions and decisions. \n3. Developing Critical Thinking: Studying history encourages critical thinking and analytical skills as we evaluate sources, interpret events, and form our own conclusions. \n4. Fostering Cross-Cultural Aw
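The output above shows the shape a loader returns: a list of documents, each with `page_content` and a `metadata` dict holding the `source`. As a rough illustration of that shape only, here is a minimal standard-library sketch. `SimpleDocument` and `load_text` are hypothetical names invented for this example, not LangChain classes, and the sample text is a stand-in for `speech.txt`.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path, encoding="utf-8"):
    """Read one text file and wrap it in a single-element document list."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]


# Write a small sample file so the sketch runs without speech.txt.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "speech.txt"
    sample.write_text("History, a journey through time.", encoding="utf-8")
    loaded = load_text(sample)

print(loaded[0].metadata["source"])
print(loaded[0].page_content)
```

A plain text file yields exactly one document, which is why `text_documents` above is a one-element list.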

In [6]:
# Web-based loader
from langchain_community.document_loaders import WebBaseLoader
import bs4

# Load the HTML page, keeping only the post title, header, and content elements
loader = WebBaseLoader(web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent",),
                      bs_kwargs=dict(parse_only=bs4.SoupStrainer(
                          class_=("post-title", "post-content", "post-header")
                      )))

text_documents = loader.load()

USER_AGENT environment variable not set, consider setting it to identify your requests.
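The warning above is printed when the `USER_AGENT` environment variable is unset. A minimal sketch of setting it, assuming the variable is read when the loader module is imported, so the assignment has to run before the `WebBaseLoader` import. The value `my-rag-notebook/0.1` is a hypothetical placeholder; use a string that identifies your own client.

```python
import os

# Hypothetical identifier for this notebook's HTTP requests.
# setdefault leaves any value already exported in the shell untouched.
os.environ.setdefault("USER_AGENT", "my-rag-notebook/0.1")

print(os.environ["USER_AGENT"])
```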


In [7]:
text_documents

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes

In [9]:
# PDF loader (requires the pypdf package)
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader('Python_Refresher_2.pdf')

In [11]:
docs = loader.load()
docs

[Document(metadata={'source': 'Python_Refresher_2.pdf', 'page': 0}, page_content='Python Refresher\nWhy choose Python\nIn this section, we will learn why Python is a popular and powerful choice for data\nscience.\nVariables, Data Types, and Typecasting\nIn this section, we will learn about variables, data types, and typecasting in Python\nto store and convert data effectively. ## Variables\nContainers for storing data values.\nNo need to declare data type explicitly.\nData Types\nType Example Description\nint 10, -5 Integer numbers\nfloat 3.14, -0.5 Decimal numbers\nstr "hello" Text (string)\nbool True, False Boolean values\nlist [1, 2, 3] Ordered, mutable collection\ntuple (1, 2, 3) Ordered, immutable collection\n• \n• \nname ="Alice"\nage =25\nis_student =True\n'),
 Document(metadata={'source': 'Python_Refresher_2.pdf', 'page': 1}, page_content='Type Example Description\ndict {"a": 1} Key-value pairs\nTypecasting (Type Conversion)\nConvert data from one type to another using built-in

# Step 2 - Transform 

In [13]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)

In [15]:
documents[:5]

[Document(metadata={'source': 'Python_Refresher_2.pdf', 'page': 0}, page_content='Python Refresher\nWhy choose Python\nIn this section, we will learn why Python is a popular and powerful choice for data\nscience.\nVariables, Data Types, and Typecasting\nIn this section, we will learn about variables, data types, and typecasting in Python\nto store and convert data effectively. ## Variables\nContainers for storing data values.\nNo need to declare data type explicitly.\nData Types\nType Example Description\nint 10, -5 Integer numbers\nfloat 3.14, -0.5 Decimal numbers\nstr "hello" Text (string)\nbool True, False Boolean values\nlist [1, 2, 3] Ordered, mutable collection\ntuple (1, 2, 3) Ordered, immutable collection\n• \n• \nname ="Alice"\nage =25\nis_student =True'),
 Document(metadata={'source': 'Python_Refresher_2.pdf', 'page': 1}, page_content='Type Example Description\ndict {"a": 1} Key-value pairs\nTypecasting (Type Conversion)\nConvert data from one type to another using built-in f
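To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping chunks. It is an assumption-laden stand-in: it slices at fixed character offsets, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks before falling back to smaller units. `split_with_overlap` is a hypothetical helper, not a LangChain function.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Slice text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one. With the notebook's settings (1000 and 200), the overlap plays the same role: text near a chunk boundary appears in both neighbouring chunks, so a sentence cut at the boundary is still retrievable.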

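# Step 3 - Vector Embedding and Vector Store
Vector embedding converts text into numeric vectors so that pieces of text can be compared by how close their vectors are. A vector store keeps those vectors alongside the original chunks and retrieves the chunks nearest to a query.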
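As a toy illustration of turning text into a vector, the sketch below counts occurrences of words from a tiny fixed vocabulary. This is an assumption made for readability: embedding models such as those served through Ollama produce dense learned vectors, not word counts. `VOCAB` and `toy_embed` are hypothetical names.

```python
from collections import Counter

# Hypothetical four-word vocabulary; each word owns one vector position.
VOCAB = ["python", "string", "list", "history"]


def toy_embed(text):
    """Map text to a fixed-length vector of word counts over VOCAB."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in VOCAB]


vector = toy_embed("Python string and Python list")
print(vector)  # [2, 1, 1, 0]
```

Every input, whatever its length, maps to a vector of the same size, which is what lets a vector store compare a short query against long chunks.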

In [34]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
db = Chroma.from_documents(documents[:10], OllamaEmbeddings())

# Vector Database

In [44]:
query = "String and String Methods"
result = db.similarity_search(query)

In [52]:
result[0].page_content

'Python Refresher\nWhy choose Python\nIn this section, we will learn why Python is a popular and powerful choice for data\nscience.\nVariables, Data Types, and Typecasting\nIn this section, we will learn about variables, data types, and typecasting in Python\nto store and convert data effectively. ## Variables\nContainers for storing data values.\nNo need to declare data type explicitly.\nData Types\nType Example Description\nint 10, -5 Integer numbers\nfloat 3.14, -0.5 Decimal numbers\nstr "hello" Text (string)\nbool True, False Boolean values\nlist [1, 2, 3] Ordered, mutable collection\ntuple (1, 2, 3) Ordered, immutable collection\n• \n• \nname ="Alice"\nage =25\nis_student =True'
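Conceptually, a similarity search embeds the query, scores it against every stored vector, and returns the best-scoring chunks. The sketch below shows that ranking step with hand-written three-dimensional vectors. It uses cosine similarity as one common choice; the metric a given vector store applies depends on its configuration. `cosine`, `top_k`, the chunk labels, and the vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the texts of the k stored entries most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical store: (chunk label, made-up embedding vector).
store = [
    ("strings chunk", [0.9, 0.1, 0.0]),
    ("lists chunk", [0.1, 0.9, 0.0]),
    ("history chunk", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]  # made-up embedding of the query

results = top_k(query_vec, store, k=2)
print(results)  # ['strings chunk', 'lists chunk']
```

The search can only return chunks that were indexed. In the notebook, `db` was built from `documents[:10]`, so a query about a topic covered later in the PDF falls back to the nearest of those ten chunks.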