In [1]:
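import os
from dotenv import load_dotenv

# load_dotenv() copies the variables defined in a local .env file into os.environ
# (without overriding ones already set), so they only need to be checked, not re-assigned.
load_dotenv()

for key in ("LANGCHAIN_API_KEY", "GROQ_API_KEY"):
    if not os.getenv(key):
        print(f"Warning: {key} is not set; check your .env file")

os.environ["LANGCHAIN_TRACING_V2"] = "true"  # send traces to LangSmith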

# Document loaders


In [2]:
from langchain_community.document_loaders import TextLoader
loader = TextLoader("AeroLeads Work JD.txt", encoding='utf-8')
text_document = loader.load()
print(text_document)



[Document(metadata={'source': 'AeroLeads Work JD.txt'}, page_content='\ufeffThere are 1400+ people applied. I am sharing this JD with around 20 of the best profiles. Please share the assignment GitHub repository and YouTube video with me on WhatsApp at 9981513777. \n\n\nWhat we are looking for\n1. Fast learner\n2. Someone who enjoys building things and coding.\n3. Someone who can think for himself and make his own decisions \n4. Strong understanding of software and product development skills. \n\n\nRoadmap - what features do we want to build \n\n\n1. Add more AI features - see our site aeroleads.com. We need to add many AI features, automating sales for our current and future users. You need to create AI agents to save time for our users. We want a prompt on homepage which will allow user to do all actions. \n\n\n2. Create more AI content pages - we have create millions of pages using python and ruby. You need to keep working on them to make them better so they get more traffic.\n\n\n3
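The `page_content` above begins with `'\ufeff'`, a UTF-8 byte-order mark stored at the start of the file. The stdlib-only sketch below shows the difference between the two codecs on a hypothetical sample file. Assuming `TextLoader` passes its `encoding` argument through to `open()`, using `encoding='utf-8-sig'` there should give text without the mark.

```python
import os
import tempfile

def read_text(path: str, encoding: str) -> str:
    """Read a whole text file with the given codec."""
    with open(path, encoding=encoding) as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file standing in for the JD
    # Write the text the way some editors do: a BOM first, then the UTF-8 content
    with open(path, "wb") as f:
        f.write("\ufeffJob description".encode("utf-8"))

    with_bom = read_text(path, "utf-8")         # keeps the BOM as '\ufeff'
    without_bom = read_text(path, "utf-8-sig")  # strips a leading BOM if present

print(repr(with_bom))     # '\ufeffJob description'
print(repr(without_bom))  # 'Job description'
```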

In [3]:
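import os
import bs4

# WebBaseLoader reads USER_AGENT when its module is first imported and warns if it is unset,
# so set it before the import. The value is a placeholder; any descriptive string should do.
os.environ.setdefault("USER_AGENT", "langchain-rag-notebook")

from langchain_community.document_loaders import WebBaseLoader

# Parse only the post title, header and body; everything else on the page is skipped
loader = WebBaseLoader(
    web_path="https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
    ),
)

text_documents = loader.load()
text_documents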



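The `bs_kwargs` above hand a `SoupStrainer` to BeautifulSoup so that only elements carrying the `post-content`, `post-title` or `post-header` class are parsed. To illustrate the same filtering idea without bs4, here is a rough sketch built on the stdlib `html.parser`. It assumes well-formed HTML, meaning every non-void tag is closed. It is not how `WebBaseLoader` works internally, and the sample `page` is made up.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class ClassFilterParser(HTMLParser):
    """Keep the text inside elements whose class list contains any target class."""

    def __init__(self, classes):
        super().__init__()
        self.classes = set(classes)
        self.depth = 0    # > 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return        # void tags have no end tag, so they must not change depth
        if self.depth:
            self.depth += 1
        elif self.classes & set((dict(attrs).get("class") or "").split()):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())

def extract_by_class(html, classes):
    parser = ClassFilterParser(classes)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

page = ('<nav>Menu</nav><h1 class="post-title">Post title</h1>'
        '<div class="post-content"><p>Body <b>text</b></p><br></div><footer>Footer</footer>')
print(extract_by_class(page, ("post-content", "post-title", "post-header")))
```

The navigation and footer text are dropped because neither element carries one of the target classes.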

In [4]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('yolo.pdf')
text_document = loader.load()
text_document

[Document(metadata={'producer': 'pdfTeX-1.40.12', 'creator': 'LaTeX with hyperref package', 'creationdate': '2016-05-11T00:04:54+00:00', 'author': '', 'keywords': '', 'moddate': '2016-05-11T00:04:54+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.1415926-2.3-1.40.12 (TeX Live 2011) kpathsea version 6.0.1', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'yolo.pdf', 'total_pages': 10, 'page': 0, 'page_label': '1'}, page_content='You Only Look Once:\nUniﬁed, Real-Time Object Detection\nJoseph Redmon∗, Santosh Divvala∗†, Ross Girshick¶, Ali Farhadi∗†\nUniversity of Washington∗, Allen Institute for AI†, Facebook AI Research¶\nhttp://pjreddie.com/yolo/\nAbstract\nWe present YOLO, a new approach to object detection.\nPrior work on object detection repurposes classiﬁers to per-\nform detection. Instead, we frame object detection as a re-\ngression problem to spatially separated bounding boxes and\nassociated class probabilities. A single neural network pre-\ndicts bounding bo

# Text splitters

In [5]:
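from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter

# CharacterTextSplitter splits only on `separator`. "\n\n\n" does not seem to occur early in this
# PDF's extracted text: the first chunk printed below is far longer than chunk_size=100.
character_text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=20, separator="\n\n\n")
splitted = character_text_splitter.split_documents(text_document)
print(splitted[:1])

# RecursiveCharacterTextSplitter can only fall back to the separators it is given; with just
# ["\n\n\n"] it has nothing finer to try. Its default list is ["\n\n", "\n", " ", ""].
recursive_text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20, separators=["\n\n\n"])
splitted = recursive_text_splitter.split_documents(text_document)
print(splitted[:1])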

[Document(metadata={'producer': 'pdfTeX-1.40.12', 'creator': 'LaTeX with hyperref package', 'creationdate': '2016-05-11T00:04:54+00:00', 'author': '', 'keywords': '', 'moddate': '2016-05-11T00:04:54+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.1415926-2.3-1.40.12 (TeX Live 2011) kpathsea version 6.0.1', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'yolo.pdf', 'total_pages': 10, 'page': 0, 'page_label': '1'}, page_content='You Only Look Once:\nUniﬁed, Real-Time Object Detection\nJoseph Redmon∗, Santosh Divvala∗†, Ross Girshick¶, Ali Farhadi∗†\nUniversity of Washington∗, Allen Institute for AI†, Facebook AI Research¶\nhttp://pjreddie.com/yolo/\nAbstract\nWe present YOLO, a new approach to object detection.\nPrior work on object detection repurposes classiﬁers to per-\nform detection. Instead, we frame object detection as a re-\ngression problem to spatially separated bounding boxes and\nassociated class probabilities. A single neural network pre-\ndicts bounding bo
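The recursive strategy works like this: try the coarsest separator that occurs, recurse with finer separators on pieces that are still too long, then merge small pieces into chunks that overlap. The library-free sketch below is a simplification of that idea, not LangChain's code. `split_text` and `_merge` are hypothetical helpers. With `separators=("\n\n\n",)` the sketch reproduces what the cell above suggests: no separator matches and none is finer, so an oversized piece is kept whole.

```python
def _joined_len(parts, sep):
    # Length of sep.join(parts) without building the string
    return sum(len(p) for p in parts) + len(sep) * max(len(parts) - 1, 0)

def _merge(pieces, sep, chunk_size, chunk_overlap):
    """Greedily pack pieces into chunks, carrying a tail of up to chunk_overlap chars forward."""
    chunks, window = [], []
    for piece in pieces:
        if window and _joined_len(window + [piece], sep) > chunk_size:
            chunks.append(sep.join(window))
            # Drop pieces from the front until what remains can serve as overlap
            while window and (_joined_len(window, sep) > chunk_overlap
                              or _joined_len(window + [piece], sep) > chunk_size):
                window.pop(0)
        window.append(piece)
    if window:
        chunks.append(sep.join(window))
    return chunks

def split_text(text, chunk_size=100, chunk_overlap=20, separators=("\n\n", "\n", " ", "")):
    """Recursive character splitting; `separators` must be non-empty, coarsest first."""
    # Use the first separator that occurs ("" always does); otherwise fall back to the last one
    sep, finer = separators[-1], ()
    for i, s in enumerate(separators):
        if s == "" or s in text:
            sep, finer = s, separators[i + 1:]
            break
    pieces = list(text) if sep == "" else [p for p in text.split(sep) if p]

    chunks, small = [], []
    for piece in pieces:
        if len(piece) <= chunk_size:
            small.append(piece)
            continue
        if small:  # flush the pieces collected so far before handling the big one
            chunks.extend(_merge(small, sep, chunk_size, chunk_overlap))
            small = []
        if finer:
            chunks.extend(split_text(piece, chunk_size, chunk_overlap, finer))
        else:
            chunks.append(piece)  # nothing finer to try: keep the oversized piece
    if small:
        chunks.extend(_merge(small, sep, chunk_size, chunk_overlap))
    return chunks

print(split_text("aaaa bbbb cccc dddd", chunk_size=9, chunk_overlap=4))
# ['aaaa bbbb', 'bbbb cccc', 'cccc dddd']
```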

# Embeddings and Store

In [None]:
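!ollama pull embeddinggemma:latest
# The chat model used later (OllamaLLM(model='gemma3:1b')) must also be available locally
!ollama pull gemma3:1b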

In [8]:
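from langchain_community.vectorstores import Chroma, FAISS
from langchain_ollama.embeddings import OllamaEmbeddings

# Embed only the first 20 chunks and store them in an in-memory Chroma collection
db = Chroma.from_documents(splitted[:20], OllamaEmbeddings(model='embeddinggemma:latest'))

# Alternative: a FAISS index over all chunks (needs the faiss-cpu package)
# db = FAISS.from_documents(splitted, OllamaEmbeddings(model='embeddinggemma:latest'))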

In [9]:
query = 'Who are the authors of the YOLO research paper?'

result = db.similarity_search(query)

result[0].page_content

'You Only Look Once:\nUniﬁed, Real-Time Object Detection\nJoseph Redmon∗, Santosh Divvala∗†, Ross Girshick¶, Ali Farhadi∗†\nUniversity of Washington∗, Allen Institute for AI†, Facebook AI Research¶\nhttp://pjreddie.com/yolo/\nAbstract\nWe present YOLO, a new approach to object detection.\nPrior work on object detection repurposes classiﬁers to per-\nform detection. Instead, we frame object detection as a re-\ngression problem to spatially separated bounding boxes and\nassociated class probabilities. A single neural network pre-\ndicts bounding boxes and class probabilities directly from\nfull images in one evaluation. Since the whole detection\npipeline is a single network, it can be optimized end-to-end\ndirectly on detection performance.\nOur uniﬁed architecture is extremely fast. Our base\nYOLO model processes images in real-time at 45 frames\nper second. A smaller version of the network, Fast YOLO,\nprocesses an astounding 155 frames per second while\nstill achieving double the mAP
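`similarity_search` embeds the query with the same model and returns the stored chunks whose vectors lie closest to it. LangChain's default is `k=4`. The stdlib sketch below uses made-up 3-dimensional vectors and ranks by cosine similarity. The distance Chroma actually uses depends on how its collection is configured, so treat this as an illustration of nearest-neighbour ranking rather than Chroma's internals.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_search(query_vec, store, k=4):
    """Return the texts of the k stored (text, vector) pairs most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy vectors (made up); real embeddings are far longer
store = [
    ("chunk about the authors", [0.9, 0.1, 0.0]),
    ("chunk about the architecture", [0.1, 0.9, 0.2]),
    ("chunk about training", [0.0, 0.3, 0.9]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "Who are the authors ...?"
print(similarity_search(query_vec, store, k=2))
# ['chunk about the authors', 'chunk about the architecture']
```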

In [10]:
from langchain_ollama.llms import OllamaLLM
llm = OllamaLLM(model='gemma3:1b')
llm

OllamaLLM(model='gemma3:1b')

In [13]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    """Answer the following question based on the provided context. Think step by step before providing a detailed answer.
    If there's not enough information in the context just return \"Sorry i don't have knowledge of that\"
    <context>{context}</context>
    
    Question : {input}
    """
)
prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='Answer the following question based on the provided context. Think step by step before providing a detailed answer.\n    If there\'s not enough information in the context just return "Sorry i don\'t have knowledge of that"\n    <context>{context}</context>\n\n    Question : {input}\n    '), additional_kwargs={})])
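`from_template` treats `{context}` and `{input}` as f-string-style placeholders, which is why the repr lists `input_variables=['context', 'input']`. The stdlib sketch below finds the same names with `string.Formatter` and fills them with `str.format`. The template is a condensed copy of the one above, without its indentation.

```python
from string import Formatter

TEMPLATE = (
    "Answer the following question based on the provided context. "
    "Think step by step before providing a detailed answer.\n"
    "If there's not enough information in the context just return "
    "\"Sorry i don't have knowledge of that\"\n"
    "<context>{context}</context>\n\n"
    "Question : {input}\n"
)

# Collect the {placeholder} names; the last tuple from parse() has field name None
variables = sorted({name for _, name, _, _ in Formatter().parse(TEMPLATE) if name})
print(variables)  # ['context', 'input']

prompt_text = TEMPLATE.format(context="YOLO frames detection as regression.", input="What is YOLO?")
print(prompt_text)
```

Because the placeholders use `str.format` syntax, any literal braces in such a template would have to be doubled (`{{` and `}}`).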

# Chains

In [16]:
from langchain_classic.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='Answer the following question based on the provided context. Think step by step before providing a detailed answer.\n    If there\'s not enough information in the context just return "Sorry i don\'t have knowledge of that"\n    <context>{context}</context>\n\n    Question : {input}\n    '), additional_kwargs={})])
| OllamaLLM(model='gemma3:1b')
| StrOutputParser(), kwargs={}, config={'run_name': 'stuff_documents_chain'}, config_factories=[])
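The repr shows what `create_stuff_documents_chain` builds. A `format_docs` step fills `{context}`, and the result then flows through the prompt, `gemma3:1b` and a string output parser. The plain-Python sketch below reproduces the "stuff" step. It assumes the documents are joined with a blank line, which is believed to be LangChain's default separator. `make_stuff_chain` and `echo_llm` are hypothetical stand-ins.

```python
from typing import Callable

def format_docs(docs: list[dict], separator: str = "\n\n") -> str:
    # "Stuff" every document's text into a single string
    return separator.join(doc["page_content"] for doc in docs)

def make_stuff_chain(llm: Callable[[str], str], template: str) -> Callable[[dict], str]:
    def run(inputs: dict) -> str:
        prompt = template.format(context=format_docs(inputs["context"]), input=inputs["input"])
        return llm(prompt)
    return run

echo_llm = lambda prompt: prompt  # stand-in model that returns its prompt unchanged
chain = make_stuff_chain(echo_llm, "<context>{context}</context>\nQuestion : {input}")
result = chain({"context": [{"page_content": "chunk A"}, {"page_content": "chunk B"}],
                "input": "What is YOLO?"})
print(result)
```

Because every retrieved chunk goes into one prompt, the prompt grows with the number and size of the retrieved chunks. That matters for a small local model.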

# Retriever

In [15]:
# Expose the Chroma store through the retriever interface that chains expect
# (similarity search; search_kwargs={} keeps the store's default k)
retriever = db.as_retriever()
retriever

VectorStoreRetriever(tags=['Chroma', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001D15481F3E0>, search_kwargs={})
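`db.as_retriever()` is a dense retriever: it ranks chunks by embedding similarity. A common complement is lexical ranking with Okapi BM25, which LangChain offers as `BM25Retriever` in `langchain_community` (backed by the `rank_bm25` package). The sketch below is a from-scratch BM25 scorer, not that class's code. Its whitespace tokenisation, `k1=1.5`, `b=0.75` and the `log(1 + ...)` IDF variant are all assumptions.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every doc for the query (lowercased whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    if not tokenized:
        return []
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)  # length normalisation
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def bm25_top_k(query: str, docs: list[str], k: int = 4) -> list[str]:
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order[:k]]

docs = ["yolo authors redmon divvala", "yolo architecture convolutional layers", "training schedule"]
print(bm25_top_k("yolo authors", docs, k=1))  # ['yolo authors redmon divvala']
```

BM25 rewards exact term matches, such as author names, which embeddings can blur. Dense retrieval handles paraphrases better.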

# Retriever Chain

In [17]:
from langchain_classic.chains import create_retrieval_chain

retriever_chain = create_retrieval_chain(retriever, document_chain)
retriever_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001D15481F3E0>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='Answer the following question based on the provided context. Think step by step before providing a detailed answer.\n    If ther
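The repr shows two `RunnableAssign` steps. The first adds `context` by passing `x['input']` to the retriever, and the second adds `answer` from the document chain. The plain-Python sketch below follows that flow, using hypothetical `fake_retrieve` and `fake_answer` stand-ins. It produces a dict with the same keys as the response in the next cell (`input`, `context`, `answer`), but it is not LangChain's implementation.

```python
from typing import Callable

def make_retrieval_chain(retrieve: Callable[[str], list],
                         answer_chain: Callable[[dict], str]) -> Callable[[dict], dict]:
    def invoke(inputs: dict) -> dict:
        context = retrieve(inputs["input"])                     # step 1: fetch documents
        answer = answer_chain({**inputs, "context": context})  # step 2: prompt the model with them
        return {**inputs, "context": context, "answer": answer}
    return invoke

fake_retrieve = lambda q: [{"page_content": "YOLO frames detection as regression."}]
fake_answer = lambda d: f"{len(d['context'])} doc(s) used for: {d['input']}"

chain = make_retrieval_chain(fake_retrieve, fake_answer)
response = chain({"input": "What is the yolo research paper about?"})
print(sorted(response))  # ['answer', 'context', 'input']
```

Keeping `context` in the output lets you check which chunks an answer was based on.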

In [19]:
response = retriever_chain.invoke({"input":"What is the yolo research paper about?"})
response

{'input': 'What is the yolo research paper about?',
 'context': [Document(metadata={'moddate': '2016-05-11T00:04:54+00:00', 'total_pages': 10, 'creationdate': '2016-05-11T00:04:54+00:00', 'author': '', 'title': '', 'keywords': '', 'source': 'yolo.pdf', 'page_label': '1', 'subject': '', 'page': 0, 'producer': 'pdfTeX-1.40.12', 'ptex.fullbanner': 'This is pdfTeX, Version 3.1415926-2.3-1.40.12 (TeX Live 2011) kpathsea version 6.0.1', 'trapped': '/False', 'creator': 'LaTeX with hyperref package'}, page_content='You Only Look Once:\nUniﬁed, Real-Time Object Detection\nJoseph Redmon∗, Santosh Divvala∗†, Ross Girshick¶, Ali Farhadi∗†\nUniversity of Washington∗, Allen Institute for AI†, Facebook AI Research¶\nhttp://pjreddie.com/yolo/\nAbstract\nWe present YOLO, a new approach to object detection.\nPrior work on object detection repurposes classiﬁers to per-\nform detection. Instead, we frame object detection as a re-\ngression problem to spatially separated bounding boxes and\nassociated clas

In [22]:
print(response['answer'])

The YOLO research paper is about **the advancements and improvements made in the YOLO (You Only Look Once) object detection algorithm, particularly focusing on its speed, accuracy, and the techniques used to further enhance its performance.**

Here’s a breakdown of the key areas the paper discusses:

*   **Focus on Speed and Accuracy:** The paper highlights how YOLO has become increasingly faster and more accurate than previous object detection algorithms.
*   **YOLO’s Core Approach:** It explains the core concept of YOLO – it treats object detection as a regression problem, predicting bounding boxes and class probabilities directly from the input image, making it computationally efficient.
*   **Techniques for Improvement:** The paper details several strategies employed to improve YOLO’s performance, including:
    *   **Darknet Architecture:** Explains the improvements made to the neural network architecture.
    *   **Confidence Thresholds:** Utilizing a confidence threshold to filt

In [23]:
response = retriever_chain.invoke({"input":"What is the architecture of yolo"})
print(response['answer'])

Here's a breakdown of YOLO's architecture, based on the text provided:

**Core Components:**

1.  **Convolutional Layers:**
    *   **Multiple Layers:** YOLO uses a stack of convolutional layers, starting with a feature extractor, and going deeper.
    *   **Downsampling:** Each layer progressively reduces the spatial resolution of the image. This reduces the number of parameters and computation.

2.  **Anchor Boxes:**
    *   **Predicting Boxes:** YOLO predicts bounding boxes (rectangles) for each object in the image. These bounding boxes are "anchored" to grid cells.
    *   **Anchor Sizes:** YOLO uses a set of predefined anchor box sizes to match the typical shapes of objects in the dataset.

3.  **Confidence Scores:**
    *   **Layer Output:** Each convolutional layer outputs a "confidence score" – a measure of how sure the model is that a predicted box represents an actual object.

4.  **Non-Maximal Suppression (NMS):**
    *   **Handling Overlapping Boxes:** After predicting mult