In [53]:
# Data ingestion: load documents from text files, web pages, or PDFs

from langchain_community.document_loaders import TextLoader
loader = TextLoader("speech.txt")

text_documents = loader.load()
text_documents

[Document(metadata={'source': 'speech.txt'}, page_content="Introduction to Large Language Models\nLarge language models (LLMs) are a groundbreaking development in the field of artificial intelligence (AI) that have transformed the way computers process and generate human language. These advanced AI systems are trained on vast amounts of text data, enabling them to understand and mimic natural language with unprecedented accuracy and sophistication. LLMs have the ability to generate human-like text, answer questions, translate languages, and even engage in creative writing tasks. As these models continue to evolve and improve, they are poised to revolutionize various industries and applications that rely on language processing and generation.\nHow Large Language Models Work\nAt the core of LLMs is deep learning, a type of machine learning that utilizes artificial neural networks to identify patterns and relationships in large datasets. These neural networks are composed of multiple laye

In [54]:

print(text_documents[0].page_content)

Introduction to Large Language Models
Large language models (LLMs) are a groundbreaking development in the field of artificial intelligence (AI) that have transformed the way computers process and generate human language. These advanced AI systems are trained on vast amounts of text data, enabling them to understand and mimic natural language with unprecedented accuracy and sophistication. LLMs have the ability to generate human-like text, answer questions, translate languages, and even engage in creative writing tasks. As these models continue to evolve and improve, they are poised to revolutionize various industries and applications that rely on language processing and generation.
How Large Language Models Work
At the core of LLMs is deep learning, a type of machine learning that utilizes artificial neural networks to identify patterns and relationships in large datasets. These neural networks are composed of multiple layers that process and transform the input data, gradually learni

In [55]:
## web based loader

from langchain_community.document_loaders import WebBaseLoader
import bs4

## Load only the title, header, and body of the HTML page
loader = WebBaseLoader(web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
                       bs_kwargs = dict(parse_only = bs4.SoupStrainer(
                           class_=("post-title","post-content","post-header")
                       )))

text_documents = loader.load()
text_documents

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistake

In [56]:
## PDF loader

from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("FuseCap.pdf")
docs = loader.load()
docs

[Document(metadata={'source': 'FuseCap.pdf', 'page': 0}, page_content='FUSECAP: Leveraging Large Language Models\nfor Enriched Fused Image Captions\nNoam Rotstein* David Bensa ¨ıd* Shaked Brody Roy Ganz Ron Kimmel\nTechnion - Israel Institute of Technology\n*Indicates equal contribution.\nAbstract\nThe advent of vision-language pre-training techniques en-\nhanced substantial progress in the development of models for\nimage captioning. However, these models frequently produce\ngeneric captions and may omit semantically important im-\nage details. This limitation can be traced back to the image-\ntext datasets; while their captions typically offer a general\ndescription of image content, they frequently omit salient\ndetails. Considering the magnitude of these datasets, man-\nual reannotation is impractical, emphasizing the need for an\nautomated approach. To address this challenge, we leverage\nexisting captions and explore augmenting them with visual\ndetails using “frozen” vision expe

In [57]:
## Making the chunks from the loaded data
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)
documents = text_splitter.split_documents(docs)
documents[:5]  # preview the first 5 chunks

[Document(metadata={'source': 'FuseCap.pdf', 'page': 0}, page_content='FUSECAP: Leveraging Large Language Models\nfor Enriched Fused Image Captions\nNoam Rotstein* David Bensa ¨ıd* Shaked Brody Roy Ganz Ron Kimmel\nTechnion - Israel Institute of Technology\n*Indicates equal contribution.\nAbstract\nThe advent of vision-language pre-training techniques en-\nhanced substantial progress in the development of models for\nimage captioning. However, these models frequently produce\ngeneric captions and may omit semantically important im-\nage details. This limitation can be traced back to the image-\ntext datasets; while their captions typically offer a general\ndescription of image content, they frequently omit salient\ndetails. Considering the magnitude of these datasets, man-\nual reannotation is impractical, emphasizing the need for an\nautomated approach. To address this challenge, we leverage\nexisting captions and explore augmenting them with visual\ndetails using “frozen” vision expe

In [58]:
len(documents) # total 93 chunks

93
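To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping chunking in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries separators such as paragraph and line breaks before falling back to smaller pieces); it only shows a fixed-size sliding window. The `chunk_text` helper and the sample string are made up for illustration.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: a plain sliding window, not LangChain's algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk repeats the last 4 characters of the previous one, which is the role the 200-character overlap plays for the 1000-character chunks above: text that straddles a boundary still appears whole in at least one chunk.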

In [93]:
## Embeddings
from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings(model="llama3")

In [94]:
## Embedding the chunks and Vector Store
from langchain_community.vectorstores import Chroma

# persist_directory = 'chroma-db'
# vectordb = Chroma.from_documents(documents, embeddings)
# Save the vector store
# vectordb.persist()

vector_store = Chroma(
    collection_name="fusecap_chunks",  # must be a string, not the list of documents; this name is a placeholder
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally; remove if not necessary
)

AttributeError: module 'chromadb' has no attribute 'config'

In [92]:
## Embedding the chunks and Vector Store
from langchain_community.vectorstores import Chroma

db = Chroma.from_documents(documents, embeddings)


AttributeError: module 'chromadb' has no attribute 'config'

In [None]:
## Query the vector database
query = "Who are the authors of the research paper FUSECAP: Leveraging Large Language Models for Enriched Fused Image Captions?"
result = db.similarity_search(query)
result

In [69]:
## FAISS VECTOR DATABASE
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="llama3")
db2 = FAISS.from_documents(documents, embeddings)

In [74]:
import faiss

# Access the underlying FAISS index held by the vector store
index = db2.index

# Define the path to the file where you want to save the index
index_file_path = 'faiss_index.bin'

# Save the index to a file
faiss.write_index(index, index_file_path)


In [75]:
import pickle

# Save the docstore to a file
with open('docstore.pkl', 'wb') as f:
    pickle.dump(db2.docstore, f)


In [76]:
import faiss

# Define the path to the file where the index is saved
index_file_path = 'faiss_index.bin'

# Load the index from the file
index = faiss.read_index(index_file_path)


In [77]:
import pickle

# Load the docstore from the file
with open('docstore.pkl', 'rb') as f:
    docstore = pickle.load(f)


In [78]:
from langchain_community.vectorstores import FAISS

# Create a new FAISS vectorstore with the loaded index and docstore
db3 = FAISS(
    embedding_function=embeddings,  # Make sure to initialize embeddings as needed
    index=index,
    docstore=docstore,
    index_to_docstore_id=db2.index_to_docstore_id,  # reuses the in-memory mapping from db2; it must also be saved to reload in a fresh session
)
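The constructor call above still depends on `db2` for `index_to_docstore_id`, so it would not work in a new session where only `faiss_index.bin` and `docstore.pkl` exist. A minimal sketch of persisting that mapping the same way as the docstore follows. The dictionary here is a stand-in for `db2.index_to_docstore_id`, and the helper names and file name are assumptions, not part of the notebook.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Write any picklable object to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Read an object back; only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for db2.index_to_docstore_id (FAISS row number -> docstore key).
index_to_docstore_id = {0: "doc-a", 1: "doc-b", 2: "doc-c"}

# Hypothetical file name, written to a temporary directory for the demo.
mapping_path = os.path.join(tempfile.mkdtemp(), "index_to_docstore_id.pkl")
save_pickle(index_to_docstore_id, mapping_path)

restored = load_pickle(mapping_path)
print(restored)
```

With the mapping restored from disk, the `FAISS(...)` constructor no longer needs `db2`. LangChain's FAISS wrapper also provides `save_local` and `load_local`, which bundle the index, docstore, and mapping together and may be simpler than managing three files by hand.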


In [80]:
query = "How many components are available in the architecture of the FUSE cap model?"
result = db3.similarity_search(query)
result[0].page_content

'passing it.\nA.2. Training Set Generation for LLM Fuser with\nChatGPT.\nAs outlined in Section 3.2, we harness the zero-shot ca-\npabilities of ChatGPT to generate a compact dataset encom-\npassing 20,000 examples. This data is subsequently used\nto fine-tune an open-source Large Language Model (LLM).\nThe prompt provided to ChatGPT is as follows:\n”A caption of an image is given: original caption .\nThe following objects are detected in the image from left to\nright:\nAa1\n1, ...,ak−1\n1andak\n1o1[with the following text: t1].\n...\nAa1\nn, ...,akn−1\nn andaknnon[with the following text: tn].\nWrite a comprehensive and concise caption of the scene\nusing the objects detected.”'
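As a rough picture of what `similarity_search` does, the sketch below runs a brute-force nearest-neighbour search in plain Python. It assumes Euclidean (L2) distance, which is believed to be the default for an index built with `FAISS.from_documents`, though that depends on the installed version. The two-dimensional vectors and chunk texts are invented; real embeddings have far more dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_search(query_vec, vectors, texts, k=2):
    """Return the k texts whose vectors lie closest to the query vector."""
    scored = sorted(
        zip(vectors, texts),
        key=lambda pair: l2_distance(query_vec, pair[0]),
    )
    return [text for _, text in scored[:k]]


# Toy data: made-up embeddings for three made-up chunks.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
toy_texts = [
    "chunk about captions",
    "chunk about training",
    "chunk about image captions",
]

top = similarity_search([1.0, 0.05], toy_vectors, toy_texts, k=2)
print(top)
```

The search returns whichever chunks are closest in embedding space, not necessarily the ones that answer the question, which is why the result above is an appendix passage rather than a description of the architecture.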

In [82]:
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
llm

Ollama(model='llama3')

In [83]:
## Design the chat prompt template
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""
Answer the following question based only on the provided context.
Think step by step before providing a detailed answer.
I will tip you $1000 if the user finds the answer helpful.
<context>
{context}
</context>
Question: {input}""")

In [84]:
## Create Stuff Document Chain

from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)
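The "stuff" strategy simply places all retrieved chunks into the prompt's `{context}` slot before calling the model. A minimal plain-Python sketch of that idea follows. The template is a shortened stand-in, the `stuff_documents` helper is hypothetical, and the blank-line separator is an assumption about the library default.

```python
# Shortened stand-in for the notebook's prompt template.
PROMPT_TEMPLATE = """Answer the question based only on the provided context.
<context>
{context}
</context>
Question: {input}"""


def stuff_documents(page_contents, question, separator="\n\n"):
    """Join all chunk texts into one context string and fill the template."""
    context = separator.join(page_contents)
    return PROMPT_TEMPLATE.format(context=context, input=question)


filled = stuff_documents(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What is LLM Fuser?",
)
print(filled)
```

Because every chunk is included verbatim, the combined prompt has to fit in the model's context window; that is one reason to keep `chunk_size` and the number of retrieved chunks modest.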

In [86]:
retriever = db3.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7f7720e37e00>)

In [88]:
## Retrieval chain
from langchain.chains import create_retrieval_chain
retrieval_chain = create_retrieval_chain(retriever, document_chain)

In [90]:
response = retrieval_chain.invoke({"input":"What is LLM Fuser?"})

In [91]:
response['answer']

'Based on the provided context, I can answer your question.\n\nLLM Fuser refers to a model that combines (fuses) enriched captions generated by ChatGPT with an open-source Large Language Model (LLM). The specific prompt used for generating these captions involves providing an image and detected objects in the image, along with their corresponding text descriptions.'
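Besides `answer`, the dictionary returned by the retrieval chain also carries the original `input` and the retrieved documents under `context`, which helps when checking what the answer was based on. The sketch below lists sources from such a dictionary. `FakeDocument`, `summarize_sources`, and the sample response (including the page number and text) are invented stand-ins so the example runs without a model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document with the same two attributes."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(response, preview_chars=40):
    """Build one line per retrieved chunk: source, page, and a short preview."""
    lines = []
    for doc in response["context"]:
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "?")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"{source} (page {page}): {preview}")
    return lines


# Made-up response shaped like the retrieval chain's output.
fake_response = {
    "input": "What is LLM Fuser?",
    "context": [
        FakeDocument(
            "A.2. Training Set Generation\nfor LLM Fuser",
            {"source": "FuseCap.pdf", "page": 12},
        )
    ],
    "answer": "placeholder answer",
}

sources = summarize_sources(fake_response)
print(sources)
```

Passing the real `response` to the same helper should work as long as each retrieved document exposes `page_content` and `metadata`.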