RAG Pipeline with Vector Database

Data Ingestion

In [1]:
# Text file loader
from langchain_community.document_loaders import TextLoader

loader = TextLoader('Advanced LMS.txt')

In [2]:
docs = loader.load()
docs

[Document(metadata={'source': 'Advanced LMS.txt'}, page_content="Project Idea: Advanced Learning Management System (LMS)\nDescription: The Advanced Learning Management System (LMS) is a comprehensive platform designed to facilitate modern educational environments by supporting multiple user roles and integrating a variety of collaboration, communication, and learning tools. It goes beyond traditional LMS features to incorporate real-time interaction, multimedia content management, and robust course management functionalities. The LMS ensures a scalable solution that can accommodate institutions of all sizes, offering a centralized system for managing learning activities, student progress, and administrative operations.\n\nKey Features:\nUser Role Management:\n\nAdmin Role:\nFull control over users, courses, and platform operations.\nManage user roles (admins, teachers, students).\nMonitor platform metrics, including student performance, video conference attendance, and collaboration us

In [3]:
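import os
from dotenv import load_dotenv

# load_dotenv() copies the variables from .env into os.environ, so they do not
# need to be reassigned one by one. Assigning os.environ[key] = os.getenv(key)
# raises a TypeError when the variable is missing, because the value is None.
load_dotenv()
os.environ["LANGCHAIN_TRACING_V2"] = "true"

# Fail early with a readable message if any required variable is absent
required = ["LANGCHAIN_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY",
            "OPENAI_API_VERSION", "OPENAI_API_TYPE"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")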

In [1]:
# Web-based loader
from langchain_community.document_loaders import WebBaseLoader
import bs4

# Load the content of an HTML page, keeping only the elements with the given classes
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        # CSS classes of the elements whose content should be retrieved
        parse_only=bs4.SoupStrainer(class_=("post-title", "post-content"))
    ),
)

docs = loader.load()
docs

USER_AGENT environment variable not set, consider setting it to identify your requests.


[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n      LLM Powered Autonomous Agents\n    Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\
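
The class filter used in the web loader cell can be illustrated without any third-party dependency. The sketch below is not how bs4.SoupStrainer is implemented; it is a simplified stand-in built on the standard library's html.parser, and the names ClassFilter and sample_html are hypothetical. It keeps only the text found inside elements that carry one of the wanted classes, which is the effect the parse_only argument is used for above.

```python
from html.parser import HTMLParser


class ClassFilter(HTMLParser):
    """Collect text only from elements (and their children) with a wanted class."""

    # Tags without a closing tag; they must not change the nesting depth.
    VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.depth = 0   # greater than 0 while inside a wanted element
        self.parts = []  # collected text fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.wanted.intersection(classes):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


# Hypothetical page: only the title and the post body should survive.
sample_html = (
    '<html><body><nav class="menu">Home</nav>'
    '<h1 class="post-title">Title</h1>'
    '<div class="post-content"><p>Body <b>text</b></p></div>'
    "<footer>Footer</footer></body></html>"
)

parser = ClassFilter(("post-title", "post-content"))
parser.feed(sample_html)
print(parser.parts)
```

Navigation and footer text are dropped, which is why filtering by class tends to give cleaner documents for chunking and embedding than loading the whole page.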

In [10]:
# PDF loader (extracts text page by page; embedded images are skipped unless extract_images=True is passed)
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('CCAG_End-to-End_Point_Cloud_Registration.pdf')
docs = loader.load()
docs

[Document(metadata={'source': 'CCAG_End-to-End_Point_Cloud_Registration.pdf', 'page': 0, 'page_label': '1'}, page_content='IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 9, NO. 1, JANUARY 2024 435\nCCAG: End-to-End Point Cloud Registration\nYong Wang, Pengbo Zhou, Guohua Geng,L iA n , and Yangyang Liu\nAbstract—Point cloud registration is a crucial task in com-\nputer vision and 3D reconstruction, aiming to align multiple point\nclouds to achieve globally consistent geometric structures. How-\never, traditional point cloud registration methods face challenges\nwhen dealing with low overlap and large-scale point cloud data.\nTo overcome these issues, we propose an end-to-end point cloud\nregistration method called CCAG. The CCAG algorithm lever-\nages the Cross-Convolution Attention module, which combines\ncross-attention mechanism and depth-wise separable convolution\nto capture relationships between point clouds and integrate fea-\ntures. Through cross-attention computation, this module e

Split the loaded data

In [11]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

chunk_docs = text_splitter.split_documents(docs)
chunk_docs

[Document(metadata={'source': 'CCAG_End-to-End_Point_Cloud_Registration.pdf', 'page': 0, 'page_label': '1'}, page_content='IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 9, NO. 1, JANUARY 2024 435\nCCAG: End-to-End Point Cloud Registration\nYong Wang, Pengbo Zhou, Guohua Geng,L iA n , and Yangyang Liu\nAbstract—Point cloud registration is a crucial task in com-\nputer vision and 3D reconstruction, aiming to align multiple point\nclouds to achieve globally consistent geometric structures. How-\never, traditional point cloud registration methods face challenges\nwhen dealing with low overlap and large-scale point cloud data.\nTo overcome these issues, we propose an end-to-end point cloud\nregistration method called CCAG. The CCAG algorithm lever-\nages the Cross-Convolution Attention module, which combines\ncross-attention mechanism and depth-wise separable convolution\nto capture relationships between point clouds and integrate fea-\ntures. Through cross-attention computation, this module e
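
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplified illustration, not the algorithm RecursiveCharacterTextSplitter uses (that splitter also tries to break on separators such as paragraph and line boundaries). The function chunk_text and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last four characters of the previous one. With chunk_size=1000 and chunk_overlap=200 as in the cell above, the overlap plays the same role at a larger scale: a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.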

Vector Embeddings and Vector Stores

In [12]:
from langchain_openai import AzureOpenAIEmbeddings
from langchain_ollama import OllamaEmbeddings
# from langchain.embeddings import AzureOpenAIEmbeddings  (deprecated import path)
# from langchain.embeddings import OllamaEmbeddings  (deprecated import path)

from langchain_community.vectorstores import FAISS

db = FAISS.from_documents(chunk_docs, AzureOpenAIEmbeddings(model='gpt-text-embedding-3-large'))
db

<langchain_community.vectorstores.faiss.FAISS at 0x160273d6ad0>

Similarity search

In [19]:
query = "This method first encodes the input point cloud through an encoder network, transforming the position and feature information of the point cloud into a set of high-dimensional representations."
retrieved_result = db.similarity_search(query)
print(retrieved_result[0].page_content)

WANG et al.: CCAG: END-TO-END POINT CLOUD REGISTRATION 437
Fig. 1. Network architecture of CCAG.
networks. This method ﬁrst encodes the input point cloud
through an encoder network, transforming the position and fea-
ture information of the point cloud into a set of high-dimensional
representations. Then, self-attention mechanisms are employed
to learn the relationships between points in the point cloud, cap-
turing both global and local feature associations in the encoded
point cloud representation. Finally, a fully connected layer maps
the point cloud representation to the prediction space of point
cloud correspondences. This prediction space can represent the
probability of correspondence between each point in the point
cloud and other points. Based on the predicted correspondences,
an iterative optimization algorithm is used to resolve the optimal
point cloud correspondences. Additionally, there are unsuper-
vised deep probabilistic methods, such as the method proposed
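
What similarity_search does conceptually can be sketched in plain Python: embed the query, measure its distance to every stored vector, and return the closest chunks. The sketch below assumes Euclidean (L2) distance, which to my knowledge is the default for LangChain's FAISS wrapper, along with k=4 results when k is not given. The three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions, and FAISS uses optimized index structures rather than a Python sort.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, store, k=4):
    """Return the texts of the k stored vectors closest to query_vec."""
    ranked = sorted(store, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs standing in for the indexed chunks.
toy_store = [
    ("encoder network for point clouds", [0.9, 0.1, 0.0]),
    ("learning management system roles", [0.0, 0.2, 0.9]),
    ("self-attention over points", [0.7, 0.3, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_store, k=2)
print(nearest)
# ['encoder network for point clouds', 'self-attention over points']
```

In the notebook itself, passing k to db.similarity_search changes how many chunks come back, and db.similarity_search_with_score also returns the distance for each chunk, which helps when judging whether the top result is actually relevant.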
