In [54]:
import os
from dotenv import load_dotenv

load_dotenv()

PDF_FILE = "sample.pdf"

# MODEL = "gemma2:2b"
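
Later cells read `GEMINI_API_KEY` with `os.getenv`, which returns `None` when the variable is missing. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper name and is not part of python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable (None) and an empty string.
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

Calling `require_env("GEMINI_API_KEY")` right after `load_dotenv()` would surface a missing key before any API request is made.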

Loading the PDF document

In [55]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(PDF_FILE)
pages = loader.load()

print(f"Number of pages: {len(pages)}")
print(f"Length of a page: {len(pages[1].page_content)}")
print("Content of a page:", pages[1].page_content)

Ignoring wrong pointing object 12 0 (offset 0)
Ignoring wrong pointing object 20 0 (offset 0)
Ignoring wrong pointing object 36 0 (offset 0)
Ignoring wrong pointing object 42 0 (offset 0)
Ignoring wrong pointing object 46 0 (offset 0)
Ignoring wrong pointing object 85 0 (offset 0)
Ignoring wrong pointing object 88 0 (offset 0)


Number of pages: 3
Length of a page: 3898
Content of a page: 0.831.672.503,334.175.005.836.67−0.200.20.40.60.811.21.4
Relative time (s)Temperature (C)PIR data sample
  PIR1PIR2PIR3
Figure 2: Time shifts: The time shifts in the signalsfrom three PIR sensors.for classiﬁcation is reduced to three dimensions – one foreach PIR sensor. These vectors are used to train logisticregression andGaussian mixture model(GMM) classiﬁers.3.2 Speed estimationSince the sensor array has three PIR sensors in di↵erentorientations (Figure 1), the temperature changes associatedwith a passing vehicle are recorded at di↵erent times by eachsensor (Figure 2). Convolution is applied to the three signalsto obtain the shifts of the signal spikes.Three time shifts are obtained t32, t21, t31, where tijrepresents the time shift of spikes from PIRito PIRj. Two linear regression models are compared:v=c1+c2 t32+c3 t21+c4 t31,(t-model),v=c1+c2/ t32+c3/ t21+c4/ t31,(1/t-model).The ﬁrstt-modelsimply takes the time shifts as 

Splitting the pages into chunks

In [56]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1100, chunk_overlap=100)

chunks = splitter.split_documents(pages)

print(f"Number of chunks: {len(chunks)}")
print(f"Length of a chunk: {len(chunks[1].page_content)}")
print("Content of a chunk:", chunks[1].page_content)

Number of chunks: 18
Length of a chunk: 1099
Content of a chunk: this poster presents our pre-liminary results for detecting vehicles and estimating tra cspeed, by applying signal processing and machine learningtechniques using thePassive Infrared(PIR) sensor data.1. MOTIVATIONConstruction activities on sections of roads (known aswork zones) commonly introduce mobility and safety is-sues, such as the development of queues and hence rear-endcrashes. ITS is expected to mitigate such issues, by mea-suring the real-time tra c conditions, and taking actionsaccordingly. The performance of ITS relies on the qualityand quantity of tra c data (e.g., tra c ﬂow, speed). Par-ticularly, the work zones by nature is dynamic: the scale(miles) and duration (days to years) of construction projectsvary across work zones. To promote the adoption of ITS inwork zones, the ITS devices should be mobile, robust, andself-contained, without requiring much installation e↵ort orexternal power lines.With the above 
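
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of a fixed-window splitter. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraphs and spaces; `split_fixed` is a hypothetical name used only to show how consecutive chunks share characters.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


pieces = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1)
print(pieces)
```

Each chunk starts with the last character of the previous one, which is the role the 100-character overlap plays in the cell above.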

Storing the chunks in a vector store

In [57]:
from langchain_community.vectorstores import FAISS
# from langchain_community.embeddings import OllamaEmbeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# embeddings = OllamaEmbeddings(model=MODEL)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=os.getenv("GEMINI_API_KEY"))
vectorstore = FAISS.from_documents(chunks, embeddings)

Setting up a retriever

In [58]:
retriever = vectorstore.as_retriever()
retriever.invoke("A high-pass filter is used")

[Document(metadata={'source': 'sample.pdf', 'page': 0}, page_content='abrupt change in the tem-perature is recorded. This subsection presents a method todetect windows of time that contain such an abrupt temper-ature change.A high-pass ﬁlter is used to remove the e↵ect of grad-ual ambient temperature changes due to weather conditions.The window sizewis varied from 0.25sto 6.25swith a15wo↵set. The signal data in each window is averaged using aHamming function, which allows for higher weights for mea-surements in the center of the window. Thus, the data used'),
 Document(metadata={'source': 'sample.pdf', 'page': 1}, page_content='Figure 2: Time shifts: The time shifts in the signalsfrom three PIR sensors.for classiﬁcation is reduced to three dimensions – one foreach PIR sensor. These vectors are used to train logisticregression andGaussian mixture model(GMM) classiﬁers.3.2 Speed estimationSince the sensor array has three PIR sensors in di↵erentorientations (Figure 1), the temperature cha
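
The retriever ranks stored chunks by how close their embeddings are to the query embedding. The sketch below illustrates that idea with word-count vectors and cosine similarity in plain Python. It is a toy stand-in: the real pipeline uses Gemini embeddings and a FAISS index, and `embed`, `cosine`, and `top_k` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (not a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "a high-pass filter is used to remove gradual temperature changes",
    "three time shifts are obtained from the PIR sensors",
    "the work zones are dynamic by nature",
]
best = top_k("A high-pass filter is used", docs, k=1)
print(best)
```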

Configuring the model

In [59]:
# from langchain_ollama import ChatOllama
from langchain_google_genai import ChatGoogleGenerativeAI

# model = ChatOllama(model=MODEL, temperature=0)
model = ChatGoogleGenerativeAI(model="gemini-1.5-pro", google_api_key=os.getenv("GEMINI_API_KEY"), temperature=0)

Parsing the model's response

In [60]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser

print(chain.invoke("Who is the president of Nigeria?"))

The current president of Nigeria is **Bola Tinubu**. He assumed office on May 29, 2023, succeeding Muhammadu Buhari. 
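
The `|` operator in `model | parser` builds a pipeline in which the output of the left step becomes the input of the right step. A toy sketch of that composition pattern follows; `Step` is a hypothetical class written for illustration and is not LangChain's `Runnable` implementation.

```python
class Step:
    """Wrap a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # The combined step runs self first, then feeds the result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))


shout = Step(str.upper)
exclaim = Step(lambda s: s + "!")
pipeline = shout | exclaim
result = pipeline.invoke("hi")
print(result)
```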



Setting up a prompt

In [61]:
from langchain.prompts import PromptTemplate

template = """
You are an assistant that provides answers to questions based on 
a given context.

Answer the question based on the context. If you can't answer the
question, reply "I don't know".

Be as concise as possible and go straight to the point.

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="The president of Nigeria is Muhammadu Buhari.", question="Who is the president of Nigeria?"))


You are an assistant that provides answers to questions based on 
a given context.

Answer the question based on the context. If you can't answer the
question, reply "I don't know".

Be as concise as possible and go straight to the point.

Context: The president of Nigeria is Muhammadu Buhari.

Question: Who is the president of Nigeria?



Adding the prompt to the chain

In [62]:
chain = prompt | model | parser

chain.invoke({
    "context": "A person who is a citizen of Nigeria is called a Nigerian.",
    "question": "What do you call a person who is a citizen of Nigeria?"
})

'A Nigerian \n'

Adding the retriever to the chain

In [63]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)
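
In the chain above, the retriever returns a list of `Document` objects and that list is passed to the prompt as `context`, so the prompt would contain the list's string representation, metadata included. A common alternative is to join only the page text first. A minimal sketch, where `Doc` is a hypothetical stand-in for the LangChain `Document` class and `format_docs` is an assumed helper name:

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content only)."""
    page_content: str


def format_docs(docs) -> str:
    """Join the text of the retrieved documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    Doc("A high-pass filter is used."),
    Doc("Three time shifts are obtained."),
]
context = format_docs(retrieved)
print(context)
```

With LangChain installed, such a function could be placed after the retriever, as in `itemgetter("question") | retriever | format_docs`, since a plain function piped after a runnable is wrapped automatically.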

Using the chain to answer questions

In [64]:
questions = [
    "how is the recorded data divided?",
    "describe the results",
    "What is the research objective?"
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print("#"*20)

Question: how is the recorded data divided?
Answer: The recorded data is divided into windows of size w. 
####################
Question: describe the results
Answer: The logistic regression model achieved the best performance for vehicle detection, with zero false positives and only one false negative out of 288 vehicles. 

####################
Question: What is the research objective?
Answer: Use a combination of signal processing and machine learning techniques to detect vehicles and estimate their velocities from PIR sensor data. 

####################
