**Import libraries**

In [1]:
# FAISS vector database; the import itself is done in the next cell
# from langchain_community.vectorstores import FAISS

In [2]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter

**Loading text data**

In [3]:
# Load the text file as a list of LangChain Document objects
loader = TextLoader('TEXT_FILE.txt')
text = loader.load()

In [4]:
text

[Document(metadata={'source': 'TEXT_FILE.txt'}, page_content='William Shakespeare[a] (c. 23[b] April 1564 – 23 April 1616)[c] was an English playwright, poet and actor. He is widely regarded as the greatest writer in the English language and the world\'s pre-eminent dramatist. He is often called England\'s national poet and the "Bard of Avon" (or simply "the Bard"). His extant works, including collaborations, consist of some 39 plays, 154 sonnets, three long narrative poems and a few other verses, some of uncertain authorship. His plays have been translated into every major living language and are performed more often than those of any other playwright. Shakespeare remains arguably the most influential writer in the English language, and his works continue to be studied and reinterpreted.\n\nShakespeare was born and raised in Stratford-upon-Avon, Warwickshire. At the age of 18, he married Anne Hathaway, with whom he had three children: Susanna, and twins Hamnet and Judith. Sometime bet

**Splitting text**

In [7]:
# Split on newlines, targeting chunks of about 500 characters with a 20-character overlap
text_splitter = CharacterTextSplitter(separator="\n", chunk_size=500, chunk_overlap=20)
text = text_splitter.split_documents(text)

Created a chunk of size 735, which is longer than the specified 500
Created a chunk of size 838, which is longer than the specified 500
Created a chunk of size 503, which is longer than the specified 500
Created a chunk of size 525, which is longer than the specified 500
Created a chunk of size 608, which is longer than the specified 500
Created a chunk of size 530, which is longer than the specified 500
Created a chunk of size 731, which is longer than the specified 500
Created a chunk of size 1375, which is longer than the specified 500
Created a chunk of size 666, which is longer than the specified 500
Created a chunk of size 599, which is longer than the specified 500
Created a chunk of size 1120, which is longer than the specified 500
Created a chunk of size 607, which is longer than the specified 500
Created a chunk of size 1560, which is longer than the specified 500
Created a chunk of size 722, which is longer than the specified 500
Created a chunk of size 572, which is longer 
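
The warnings above appear because the splitter only cuts at the given separator: a paragraph with no newline inside it cannot be made smaller than it already is, so it is kept whole even when it exceeds `chunk_size`. The sketch below is a simplified pure-Python illustration of that behaviour, not LangChain's actual implementation; the helper name `split_on_separator` is hypothetical and chunk overlap is ignored for brevity.

```python
def split_on_separator(text: str, separator: str, chunk_size: int) -> list[str]:
    """Split on one separator, then greedily merge neighbouring pieces
    while the merged chunk still fits within chunk_size."""
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single piece longer than chunk_size is kept whole,
            # because there is no separator inside it to split on.
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "short line\n" + "x" * 30 + "\nanother short line"
chunks = split_on_separator(sample, separator="\n", chunk_size=20)
sizes = [len(c) for c in chunks]
print(sizes)  # the middle chunk is longer than the requested 20 characters
```

If oversized chunks are a problem, one option is `RecursiveCharacterTextSplitter` from `langchain_text_splitters`, which falls back to finer separators when a piece is still too long.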

**HuggingFace Embedding on text**

In [8]:
import os
from dotenv import load_dotenv
load_dotenv() # load environment variables from .env file

True

In [9]:
# Fall back to "" so that a missing key does not raise a TypeError on assignment
os.environ['Huggingface_API_KEY'] = os.getenv("Huggingface_API_KEY", "")

In [10]:
from langchain_huggingface import HuggingFaceEmbeddings
# model_name can be replaced with another embedding model from the Hugging Face Hub
embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

In [None]:
# Embed every chunk produced by the splitter above (one vector per chunk)
vector_embeddings = embedding.embed_documents([doc.page_content for doc in text])

**FAISS vector store**

In [12]:
db = FAISS.from_documents(text,embedding)

In [13]:
db

<langchain_community.vectorstores.faiss.FAISS at 0x20894fa04c0>

**Querying and searching the vector DB**

In [17]:
query = "who was william shakespeare?"
docs = db.similarity_search(query)
for doc in docs:
    print(doc.page_content)

William Shakespeare[a] (c. 23[b] April 1564 – 23 April 1616)[c] was an English playwright, poet and actor. He is widely regarded as the greatest writer in the English language and the world's pre-eminent dramatist. He is often called England's national poet and the "Bard of Avon" (or simply "the Bard"). His extant works, including collaborations, consist of some 39 plays, 154 sonnets, three long narrative poems and a few other verses, some of uncertain authorship. His plays have been translated into every major living language and are performed more often than those of any other playwright. Shakespeare remains arguably the most influential writer in the English language, and his works continue to be studied and reinterpreted.
After the birth of the twins, Shakespeare left few historical traces until he is mentioned as part of the London theatre scene in 1592. The exception is the appearance of his name in the "complaints bill" of a law case before the Queen's Bench court at Westminster

**Retriever (an interface between a query and the vector DB; mostly used later to feed an LLM)**

In [19]:
retriever = db.as_retriever()
docs = retriever.invoke(query)

In [20]:
docs

[Document(id='7020a825-70ef-4ee9-b27d-1e4aa98e0cbb', metadata={'source': 'TEXT_FILE.txt'}, page_content='William Shakespeare[a] (c. 23[b] April 1564 – 23 April 1616)[c] was an English playwright, poet and actor. He is widely regarded as the greatest writer in the English language and the world\'s pre-eminent dramatist. He is often called England\'s national poet and the "Bard of Avon" (or simply "the Bard"). His extant works, including collaborations, consist of some 39 plays, 154 sonnets, three long narrative poems and a few other verses, some of uncertain authorship. His plays have been translated into every major living language and are performed more often than those of any other playwright. Shakespeare remains arguably the most influential writer in the English language, and his works continue to be studied and reinterpreted.'),
 Document(id='4ad21798-0467-4546-b769-e694567c087d', metadata={'source': 'TEXT_FILE.txt'}, page_content='After the birth of the twins, Shakespeare left fe

**Similarity Search with Score**

This returns each matching document together with its distance score from the query (the lower the score, the better the match).

In [None]:
docs_and_scores = db.similarity_search_with_score(query)
docs_and_scores

[(Document(id='7020a825-70ef-4ee9-b27d-1e4aa98e0cbb', metadata={'source': 'TEXT_FILE.txt'}, page_content='William Shakespeare[a] (c. 23[b] April 1564 – 23 April 1616)[c] was an English playwright, poet and actor. He is widely regarded as the greatest writer in the English language and the world\'s pre-eminent dramatist. He is often called England\'s national poet and the "Bard of Avon" (or simply "the Bard"). His extant works, including collaborations, consist of some 39 plays, 154 sonnets, three long narrative poems and a few other verses, some of uncertain authorship. His plays have been translated into every major living language and are performed more often than those of any other playwright. Shakespeare remains arguably the most influential writer in the English language, and his works continue to be studied and reinterpreted.'),
  np.float32(0.69037426)),
 (Document(id='4ad21798-0467-4546-b769-e694567c087d', metadata={'source': 'TEXT_FILE.txt'}, page_content='After the birth of t
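
To make "lower is better" concrete, here is a minimal sketch of distance-based ranking on toy 2-D vectors. It assumes the store is using Euclidean (L2) distance, which to my understanding is the default for this FAISS wrapper and is reported as a squared distance; the vectors and names below are made up for illustration.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy vectors standing in for a query embedding and three chunk embeddings
query_vec = [1.0, 0.0]
doc_vecs = {
    "close": [0.9, 0.1],
    "far": [-1.0, 0.0],
    "medium": [0.0, 1.0],
}

# Sort ascending: the smallest distance is the best match
ranked = sorted(
    ((name, squared_l2(query_vec, vec)) for name, vec in doc_vecs.items()),
    key=lambda pair: pair[1],
)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

A score is a distance, not a similarity, so it should be compared against other scores from the same index rather than read as a percentage.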

**Searching with the query's embedding vector**

In [22]:
embedding_vector = embedding.embed_query(query)
embedding_vector

[0.025181980803608894,
 0.017569134011864662,
 0.06538469344377518,
 0.00991843082010746,
 -0.048630114644765854,
 0.052188046276569366,
 0.08272631466388702,
 -0.01785614900290966,
 -0.012043516151607037,
 0.07280828058719635,
 -0.02595778927206993,
 -0.027289986610412598,
 -0.04074171558022499,
 -0.009890916757285595,
 -0.008904434740543365,
 0.016340769827365875,
 -0.03639690577983856,
 0.02306036278605461,
 -0.04665011540055275,
 -0.044065240770578384,
 -0.009663651697337627,
 -0.00022220458777155727,
 -0.019777797162532806,
 -0.0321551151573658,
 -0.0035644706804305315,
 0.005115368403494358,
 0.001110594137571752,
 -0.0025122344959527254,
 -0.02752426080405712,
 0.009789826348423958,
 -0.04984106123447418,
 -0.11448453366756439,
 0.02452218532562256,
 -0.057941559702157974,
 -0.03269447386264801,
 -0.0032452032901346684,
 0.025084713473916054,
 0.06699496507644653,
 -0.020892325788736343,
 -0.021935930475592613,
 -0.004743832163512707,
 -0.024955766275525093,
 -0.0003954015846829

In [26]:
vec_docs = db.similarity_search_by_vector(embedding_vector)
vec_docs

[Document(id='7020a825-70ef-4ee9-b27d-1e4aa98e0cbb', metadata={'source': 'TEXT_FILE.txt'}, page_content='William Shakespeare[a] (c. 23[b] April 1564 – 23 April 1616)[c] was an English playwright, poet and actor. He is widely regarded as the greatest writer in the English language and the world\'s pre-eminent dramatist. He is often called England\'s national poet and the "Bard of Avon" (or simply "the Bard"). His extant works, including collaborations, consist of some 39 plays, 154 sonnets, three long narrative poems and a few other verses, some of uncertain authorship. His plays have been translated into every major living language and are performed more often than those of any other playwright. Shakespeare remains arguably the most influential writer in the English language, and his works continue to be studied and reinterpreted.'),
 Document(id='4ad21798-0467-4546-b769-e694567c087d', metadata={'source': 'TEXT_FILE.txt'}, page_content='After the birth of the twins, Shakespeare left fe
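
Searching by vector returns the same documents as searching by text because a text search is conceptually "embed the query, then look up the nearest vectors". The sketch below illustrates that relationship with a brute-force toy index; `toy_embed` and `ToyIndex` are hypothetical stand-ins, not the real embedding model or the FAISS API.

```python
import string


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: counts of the letters a-z."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in string.ascii_lowercase]


class ToyIndex:
    """Brute-force nearest-neighbour index over toy embeddings."""

    def __init__(self, texts: list[str]) -> None:
        self.texts = list(texts)
        self.vectors = [toy_embed(t) for t in self.texts]

    def search_by_vector(self, vector: list[float], k: int = 2) -> list[str]:
        # Rank stored texts by squared L2 distance to the given vector
        scored = sorted(
            zip(self.texts, self.vectors),
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(vector, pair[1])),
        )
        return [text for text, _ in scored[:k]]

    def search(self, query: str, k: int = 2) -> list[str]:
        # A text search is just: embed the query, then search by vector
        return self.search_by_vector(toy_embed(query), k=k)


index = ToyIndex(["shakespeare wrote plays", "faiss stores vectors", "poems and sonnets"])
query = "who was shakespeare"
by_text = index.search(query)
by_vector = index.search_by_vector(toy_embed(query))
print(by_text == by_vector)
```

Searching by vector is mainly useful when the embedding has already been computed elsewhere, so the query does not need to be embedded a second time.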

**Saving the vector DB locally (FAISS index plus pickle file)**

In [29]:
db.save_local("faiss_index")

**Loading the saved vector DB**

In [None]:
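# Load the saved index; allow_dangerous_deserialization=True is required because part of it is unpickled, so only load files you created or trust
vector_db = FAISS.load_local("faiss_index", embedding, allow_dangerous_deserialization=True)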

In [35]:
# Run the same query against the loaded index to confirm it returns the same results
query = "who was william shakespeare?"
docs = vector_db.similarity_search(query)
docs

[Document(id='7020a825-70ef-4ee9-b27d-1e4aa98e0cbb', metadata={'source': 'TEXT_FILE.txt'}, page_content='William Shakespeare[a] (c. 23[b] April 1564 – 23 April 1616)[c] was an English playwright, poet and actor. He is widely regarded as the greatest writer in the English language and the world\'s pre-eminent dramatist. He is often called England\'s national poet and the "Bard of Avon" (or simply "the Bard"). His extant works, including collaborations, consist of some 39 plays, 154 sonnets, three long narrative poems and a few other verses, some of uncertain authorship. His plays have been translated into every major living language and are performed more often than those of any other playwright. Shakespeare remains arguably the most influential writer in the English language, and his works continue to be studied and reinterpreted.'),
 Document(id='4ad21798-0467-4546-b769-e694567c087d', metadata={'source': 'TEXT_FILE.txt'}, page_content='After the birth of the twins, Shakespeare left fe