# RAG Pipeline
##### 1. Loading the Document
##### 2. Splitting into Chunks
##### 3. Embedding the Text into Vectors
##### 4. Storing the Vectors in a Vector Store
##### 5. Querying and Retrieving from the Vector Store

## 1. Document Loading Techniques

In [75]:
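# Plain-text (.txt) file: TextLoader reads the whole file into a single Document

from langchain_community.document_loaders import TextLoader
loader = TextLoader("example.txt", encoding="utf-8")  # explicit encoding avoids platform-dependent defaults

docs = loader.load()
docs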

[Document(metadata={'source': 'example.txt'}, page_content="The only reason any person does not have enough money is because they are blocking money from coming to them with their \nthoughts. Every negative thought, feeling, or emotion is blocking \nyour good from coming to you, and that includes money. It is not \nthat the money is being kept from you by the Universe, because all \nthe money you require exists right now in the invisible. If you do \nnot have enough, it is because you are stopping the flow of money \ncoming to you, and you are doing that with your thoughts. You \nmust tip the balance of your thoughts from lack-of-money to more than-enough-money. Think more thoughts of abundance than of \nlack, and you have tipped the balance. \nWhen you need money, it is a powerful feeling within you, and so \nof course through the law of attraction you will continue to attract \nneeding money. \nI can speak from experience about money, because just before I \ndiscovered The Secret my 
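As the output shows, `TextLoader` returns a list containing one `Document` whose `page_content` is the full file and whose `metadata` records the source path. The sketch below reproduces that shape in plain Python. The `Document` dataclass and the `load_text` helper are hypothetical stand-ins for illustration, not LangChain's own classes.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> list[Document]:
    """Read a whole .txt file into a single Document, recording its path as 'source'."""
    text = Path(path).read_text(encoding=encoding)
    return [Document(page_content=text, metadata={"source": path})]


# Usage: write a small temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "example.txt")
    Path(path).write_text("first line\nsecond line\n", encoding="utf-8")
    docs = load_text(path)

print(len(docs), repr(docs[0].page_content))
```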

In [76]:
# Web page (web scraping)

import bs4
from langchain_community.document_loaders import WebBaseLoader

# Load the HTML page, parsing only the elements whose class is post-title,
# post-content, or post-header (navigation, sidebars, and footers are skipped)
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=("post-title", "post-content", "post-header"))
    ),
)

docs = loader.load()
docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistake
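The `SoupStrainer` filter above means only elements with one of the listed classes (and everything nested inside them) are kept. To show the idea without network access or Beautiful Soup, here is a rough equivalent built on the standard library's `html.parser`. The `ClassFilterParser` class, the `extract` helper, and the sample HTML are all hypothetical, and the sketch assumes well-formed markup where every non-void tag is closed.

```python
from html.parser import HTMLParser

# Void elements never have an end tag, so they must not affect the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassFilterParser(HTMLParser):
    """Collect text only from elements whose class list intersects keep_classes."""

    def __init__(self, keep_classes):
        super().__init__()
        self.keep = set(keep_classes)
        self.depth = 0      # > 0 while inside a kept element (counts open tags)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1            # nested tag inside a kept element
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.keep.intersection(classes):
                self.depth = 1         # entering a kept element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract(html, keep_classes):
    """Return the kept text, one non-empty text run per line."""
    parser = ClassFilterParser(keep_classes)
    parser.feed(html)
    parser.close()
    return "\n".join(c.strip() for c in parser.chunks if c.strip())


page = """
<html><body>
  <nav class="menu">Home | About</nav>
  <h1 class="post-title">LLM Powered Autonomous Agents</h1>
  <div class="post-content"><p>Agent System Overview</p><br><p>Planning</p></div>
  <footer>© footer text</footer>
</body></html>
"""
result = extract(page, ("post-title", "post-content", "post-header"))
print(result)
```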

In [77]:
# PDF file: PyPDFLoader (requires the pypdf package) returns one Document per page

from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("the-secret.pdf")

docs = loader.load()
docs

[Document(metadata={'source': 'the-secret.pdf', 'page': 0}, page_content=''),
 Document(metadata={'source': 'the-secret.pdf', 'page': 1}, page_content=''),
 Document(metadata={'source': 'the-secret.pdf', 'page': 2}, page_content='ATRIA BOOKS \nNew York London Toronto Sydney '),
 Document(metadata={'source': 'the-secret.pdf', 'page': 3}, page_content='20827 N.W. Cornell Road, Suite 500 \nHillsboro, Oregon 97124-9808 \n503-531-8700 tel 503-531-8773 fax \nwww.beyondvvord.com \nATRIA BOOKS \n1230 Avenue of the Americas \nNew York, NY 10020 \nCopyright © 2006 by TS Production Limited Liability Company \nTHE SECRET and The Secret logo are trademarks or registered trademarks \nowned by or licensed to TS Production Limited Liability Company \nwww.thesecret.tv \nAll rights reserved. No part of this book may be reproduced, copied, stored, or \ntransmitted in any form or by any means—graphic, electronic, or mechanical, \nincluding photocopying, recording, or information storage and retrieval syst
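Each page becomes its own `Document` with a 0-based `page` number in its metadata, and pages 0 and 1 of this PDF came back with empty `page_content` (no extractable text). The splitter in the next step produces no chunks for such pages, which is why its output starts at page 2, but listing them explicitly can help spot pages that would need OCR. A small sketch, assuming a hypothetical `Document` stand-in and a hypothetical `drop_empty_pages` helper:

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def drop_empty_pages(pages):
    """Keep pages with extractable text; also return the 0-based indices of empty pages."""
    kept, empty = [], []
    for doc in pages:
        (kept if doc.page_content.strip() else empty).append(doc)
    return kept, [d.metadata["page"] for d in empty]


# Mimics the first pages shown above: pages 0 and 1 have no text.
pages = [
    Document("", {"source": "the-secret.pdf", "page": 0}),
    Document("", {"source": "the-secret.pdf", "page": 1}),
    Document("ATRIA BOOKS \nNew York London Toronto Sydney ",
             {"source": "the-secret.pdf", "page": 2}),
]
kept, empty_pages = drop_empty_pages(pages)
print(len(kept), empty_pages)  # → 1 [0, 1]
```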

## 2. Splitting into Chunks

In [78]:
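from langchain_text_splitters import RecursiveCharacterTextSplitter

# `docs` now holds the PDF pages (the last loader run above). Split them into chunks
# of at most 1000 characters, with up to 200 characters shared by neighbouring chunks;
# each chunk keeps its page's metadata.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunk_documents = text_splitter.split_documents(docs)
chunk_documents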

[Document(metadata={'source': 'the-secret.pdf', 'page': 2}, page_content='ATRIA BOOKS \nNew York London Toronto Sydney'),
 Document(metadata={'source': 'the-secret.pdf', 'page': 3}, page_content='20827 N.W. Cornell Road, Suite 500 \nHillsboro, Oregon 97124-9808 \n503-531-8700 tel 503-531-8773 fax \nwww.beyondvvord.com \nATRIA BOOKS \n1230 Avenue of the Americas \nNew York, NY 10020 \nCopyright © 2006 by TS Production Limited Liability Company \nTHE SECRET and The Secret logo are trademarks or registered trademarks \nowned by or licensed to TS Production Limited Liability Company \nwww.thesecret.tv \nAll rights reserved. No part of this book may be reproduced, copied, stored, or \ntransmitted in any form or by any means—graphic, electronic, or mechanical, \nincluding photocopying, recording, or information storage and retrieval systems— \nwithout the prior written permission of Atria Books/Beyond Words Publishing, \nInc., except where permitted by law. \nThe information contained in thi
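`RecursiveCharacterTextSplitter` tries the coarsest separator first (paragraph breaks, then line breaks, then spaces, then single characters), merges the resulting pieces into chunks up to `chunk_size`, carries a tail of up to `chunk_overlap` characters into the next chunk, and re-splits any piece that is still too long with the next, finer separator. The sketch below is a simplified re-implementation of that idea for illustration only; `recursive_split` and `_merge` are hypothetical names, and it does not reproduce LangChain's exact separator handling (for example, separators at chunk boundaries are dropped here).

```python
def _merge(pieces, sep, chunk_size, chunk_overlap):
    """Greedily join pieces with `sep` into chunks of at most chunk_size characters,
    carrying a tail of at most chunk_overlap characters into the next chunk."""
    chunks, current, total = [], [], 0
    for piece in pieces:
        joined_len = total + len(piece) + (len(sep) if current else 0)
        if joined_len > chunk_size and current:
            text = sep.join(current).strip()
            if text:
                chunks.append(text)
            # Drop pieces from the front until the tail fits the overlap budget
            # and leaves room for the incoming piece.
            while current and (total > chunk_overlap
                               or total + len(piece) + len(sep) > chunk_size):
                total -= len(current[0]) + (len(sep) if len(current) > 1 else 0)
                current.pop(0)
        current.append(piece)
        total += len(piece) + (len(sep) if len(current) > 1 else 0)
    text = sep.join(current).strip()
    if text:
        chunks.append(text)
    return chunks


def recursive_split(text, chunk_size=1000, chunk_overlap=200,
                    separators=("\n\n", "\n", " ", "")):
    """Split on the coarsest separator present; recurse into pieces still too long."""
    sep, rest = separators[-1], ()
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, rest = candidate, separators[i + 1:]
            break
    pieces = list(text) if sep == "" else text.split(sep)

    chunks, small = [], []
    for piece in pieces:
        if len(piece) < chunk_size:
            small.append(piece)
            continue
        if small:  # flush the short pieces collected so far
            chunks.extend(_merge(small, sep, chunk_size, chunk_overlap))
            small = []
        if rest:   # still too long: split it with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, chunk_overlap, rest))
        else:
            chunks.append(piece)
    if small:
        chunks.extend(_merge(small, sep, chunk_size, chunk_overlap))
    return chunks


words = [f"word{i}" for i in range(50)]
chunks = recursive_split(" ".join(words), chunk_size=40, chunk_overlap=10)
for c in chunks[:3]:
    print(repr(c))
```

With these settings each chunk ends with a word that also starts the next chunk, which is the overlap at work: a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk when it fits in the overlap.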

## 3. Embedding the Text and Storing the Vectors in a Vector Store

In [79]:
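from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Sentence-transformers model (requires the sentence-transformers package) that maps
# each chunk to a 384-dimensional vector
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Embed every chunk and build an in-memory FAISS index (requires faiss-cpu);
# `db` is the vector store holding one vector per chunk plus the chunk documents
db = FAISS.from_documents(chunk_documents, embeddings)
db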

<langchain_community.vectorstores.faiss.FAISS at 0x2b1f77cfbc0>
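Conceptually, the vector store keeps one vector per chunk alongside the chunk text, and a query is answered by embedding it with the same model and returning the stored chunks whose vectors are closest. LangChain's FAISS wrapper uses Euclidean (L2) distance by default. The dependency-free sketch below mimics that flow. The `embed` function is a toy hashed bag-of-words stand-in for `all-MiniLM-L6-v2` (it only captures word overlap, not meaning), and `ToyVectorStore` is a hypothetical class, not FAISS.

```python
import hashlib
import math
import re


def embed(text: str, dim: int = 384) -> list[float]:
    """Toy embedding: hashed bag of lowercase words, scaled to unit length.
    Hypothetical stand-in for a real sentence-embedding model."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z']+", text.lower()):
        digest = hashlib.blake2b(word.encode(), digest_size=8).digest()
        vec[int.from_bytes(digest, "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ToyVectorStore:
    """Exact (brute-force) L2 nearest-neighbour search over stored vectors."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.vectors: list[list[float]] = []
        self.texts: list[str] = []

    @classmethod
    def from_texts(cls, texts, embed_fn):
        store = cls(embed_fn)
        for t in texts:
            store.vectors.append(embed_fn(t))   # one vector per chunk
            store.texts.append(t)
        return store

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q = self.embed_fn(query)
        dists = [sum((a - b) ** 2 for a, b in zip(q, v)) for v in self.vectors]
        order = sorted(range(len(dists)), key=dists.__getitem__)  # nearest first
        return [self.texts[i] for i in order[:k]]


chunks = [
    "Make a list of some Secret Shifters to have up your sleeve.",
    "If you want abundance, if you want prosperity, then focus on abundance.",
    "ATRIA BOOKS New York London Toronto Sydney",
]
db = ToyVectorStore.from_texts(chunks, embed)
print(db.similarity_search("focus on prosperity and abundance", k=1)[0])
```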

## 4. Querying and Retrieving from the Vector Store

In [80]:
# Similarity search: embed the query and return the k closest chunks (k=4 by default), nearest first

query = """Different things will 
shift you at different times"""
retrieved_result = db.similarity_search(query)
print(retrieved_result[0].page_content)

Make a list of some Secret Shifters to have up your sleeve. By 
Secret Shifters, I mean things that can change your feelings in a 
snap. It might be beautiful memories, future events, funny mo-
ments, nature, a person you love, your favorite music. Then if you 
find yourself angry or frustrated or not feeling good, turn to your 
Secret Shifters list and focus on one of them. Different things will 
shift you at different times, so if one doesn't work, go to another. It 
only takes a minute or two of changing focus to shift yourself and 
shift your frequency.
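`similarity_search` returns the nearest chunks first, so `retrieved_result[0]` is the single best match. With the Euclidean distance LangChain's FAISS store uses by default, this ranking is the same as ranking by cosine similarity whenever the embedding vectors have unit length (for example, when `encode_kwargs={"normalize_embeddings": True}` is passed to `HuggingFaceEmbeddings`), because for a query vector $q$ and a chunk vector $d$ with $\lVert q \rVert = \lVert d \rVert = 1$:

```latex
\lVert q - d \rVert^2 = \lVert q \rVert^2 + \lVert d \rVert^2 - 2\, q \cdot d = 2 - 2\cos(q, d)
```

A smaller distance therefore means a larger cosine similarity.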


In [85]:
query = "As you think of yourself living in abundance"
retrieved_result = db.similarity_search(query)
print(retrieved_result[0].page_content)

that"—then you'll never be able to afford it, because you 
begin to attract more of what you don't have. If you want 
abundance, if you want prosperity, then focus on abundance. 
Focus on prosperity. 
"The spiritual substance from which comes all 
visible wealth is never depleted. It is right with 
you all the time and responds to your faith in it 
and your demands on it." 
Now that you know The Secret, when you see someone who is 
wealthy you will know that that person's predominant thoughts
