In [2]:
# data ingestion
from langchain_community.document_loaders import TextLoader
loader = TextLoader("india.txt")
text_document = loader.load()
text_document

[Document(page_content="India, officially the Republic of India (ISO: Bhārat Gaṇarājya),[21] is a country in South Asia. It is the seventh-largest country by area; the most populous country as of June 2023;[22][23] and from the time of its independence in 1947, the world's most populous democracy.[24][25][26] Bounded by the Indian Ocean on the south, the Arabian Sea on the southwest, and the Bay of Bengal on the southeast, it shares land borders with Pakistan to the west;[j] China, Nepal, and Bhutan to the north; and Bangladesh and Myanmar to the east. In the Indian Ocean, India is in the vicinity of Sri Lanka and the Maldives; its Andaman and Nicobar Islands share a maritime border with Thailand, Myanmar, and Indonesia.\n\nModern humans arrived on the Indian subcontinent from Africa no later than 55,000 years ago.[27][28][29] Their long occupation, initially in varying forms of isolation as hunter-gatherers, has made the region highly diverse, second only to Africa in human genetic di

In [3]:
import os
from dotenv import load_dotenv
load_dotenv()

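# Copy the key loaded from .env into the variable name the OpenAI client reads.
# os.getenv returns None for a missing key, and assigning None to os.environ
# raises TypeError, so fail early with a clearer message.
api_key = os.getenv("openai_api_key")
if api_key is None:
    raise RuntimeError("openai_api_key is not set; add it to your .env file")
os.environ["OPENAI_API_KEY"] = api_key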

In [4]:
# web-based loader
from langchain_community.document_loaders import WebBaseLoader
import bs4

# load the HTML page, keeping only the post title, header, and content elements
# (chunking and indexing happen in later cells)
loader=WebBaseLoader(web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
                     bs_kwargs=dict(parse_only=bs4.SoupStrainer(
                         class_=("post-title","post-content","post-header"),
                     )))

text_documents=loader.load()

In [5]:
text_documents

[Document(page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final re

In [6]:
## PDF reader
from langchain_community.document_loaders import PyPDFLoader
loader=PyPDFLoader("PPP.pdf")
docs=loader.load()

In [7]:
docs

[Document(page_content='Melting and Pouring in \nCasting', metadata={'source': 'PPP.pdf', 'page': 0}),
 Document(page_content='Basic requirements of casting process\nMelting is a process of preparing the molten material for casting . The molten metal \nis transported to the pouring area by ladles wherein the molds are filled.', metadata={'source': 'PPP.pdf', 'page': 1}),
 Document(page_content='Melting and pouring\n•The quality of casting depends on the method of melting\n•The melting technique provides the molten metal required temperature , which \nensure flow of material in every corner of the mould cavity and maintain the quality \nof the product\n•Molten metal should prevented from oxidation by covering it with fluxes or by \nperforming melting and pouring in vacuum\n•Pouring temperature and pouring rate should be maintained properly\n•At higher temperature, fluidity of molten metal is more but dissolved more gases\n•The fluidity of the material able to fill the whole cavity\n•Hig

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter=RecursiveCharacterTextSplitter(chunk_size=200,chunk_overlap=40)
documents=text_splitter.split_documents(docs)
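
The effect of chunk_size and chunk_overlap can be illustrated with a much simpler splitter. The sketch below is not how RecursiveCharacterTextSplitter works internally (that class tries to break on separators such as paragraph breaks, newlines, and spaces before falling back to characters); it only slides a fixed-size window over the text so that consecutive chunks share chunk_overlap characters. The function name split_with_overlap is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    # With chunk_size=4 and chunk_overlap=1, each chunk repeats the last
    # character of the previous one.
    print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

A larger overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing and embedding more text.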

In [18]:
documents

[Document(page_content='Melting and Pouring in \nCasting', metadata={'source': 'PPP.pdf', 'page': 0}),
 Document(page_content='Basic requirements of casting process\nMelting is a process of preparing the molten material for casting . The molten metal \nis transported to the pouring area by ladles wherein the molds are filled.', metadata={'source': 'PPP.pdf', 'page': 1}),
 Document(page_content='Melting and pouring\n•The quality of casting depends on the method of melting\n•The melting technique provides the molten metal required temperature , which', metadata={'source': 'PPP.pdf', 'page': 2}),
 Document(page_content='ensure flow of material in every corner of the mould cavity and maintain the quality \nof the product\n•Molten metal should prevented from oxidation by covering it with fluxes or by', metadata={'source': 'PPP.pdf', 'page': 2}),
 Document(page_content='performing melting and pouring in vacuum\n•Pouring temperature and pouring rate should be maintained properly\n•At higher t

In [20]:
## Vector Embedding And Vector Store
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
db1 = Chroma.from_documents(documents[:20],OpenAIEmbeddings())

In [21]:
db1

<langchain_community.vectorstores.chroma.Chroma at 0x26be469db80>

In [25]:
query = "What is cupola furnace?"
retrieved_results=db1.similarity_search(query)
print(retrieved_results[0].page_content)

•The cupola consists of a vertical cylindrical steel 
sheet, lined inside with refractory bricks . The lining 
is thicker in the lower portion (temperatures are 
higher than upper portion)
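
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that idea with hand-written two-dimensional vectors and cosine similarity. The vectors, the labels, and the helper names cosine_similarity and top_k are made up for illustration; real stores use the embedding model's high-dimensional vectors, and the distance metric depends on the store and its configuration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 1) -> list[str]:
    """Return the texts of the k indexed entries most similar to query_vec."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,  # highest similarity first
    )
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy "embeddings": purely illustrative numbers, not model output.
    indexed = [
        ("cupola", [1.0, 0.0]),
        ("ladle", [0.0, 1.0]),
        ("flux", [0.7, 0.7]),
    ]
    print(top_k([0.9, 0.1], indexed, k=2))
```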


In [26]:
## FAISS Vector Database
from langchain_community.vectorstores import FAISS
db2 = FAISS.from_documents(documents[:20], OpenAIEmbeddings())

In [27]:
query = "What is cupola furnace?"
retrieved_results=db2.similarity_search(query)
print(retrieved_results[0].page_content)

•The cupola consists of a vertical cylindrical steel 
sheet, lined inside with refractory bricks . The lining 
is thicker in the lower portion (temperatures are 
higher than upper portion)


In [33]:
## LanceDB Vector Database
from langchain_community.vectorstores import LanceDB
db3 = LanceDB.from_documents(documents[:20], OpenAIEmbeddings())

In [34]:
query = "What is cupola furnace?"
retrieved_results=db3.similarity_search(query)
print(retrieved_results[0].page_content)

•The cupola consists of a vertical cylindrical steel 
sheet, lined inside with refractory bricks . The lining 
is thicker in the lower portion (temperatures are 
higher than upper portion)
