# Test Text Extraction

In [1]:
from pypdf import PdfReader

In [2]:
reader = PdfReader(r"C:\Users\Andres\Desktop\ScienceRAG\papers\Bartholet_et_al.pdf")

In [3]:
print(len(reader.pages))

11


In [4]:
page = reader.pages[1]

In [5]:
text = page.extract_text()
print(text)

Fuel 295 (2021) 120509
2properties of the oxygenated species are important when considering 
blending effects and engine performance. For example, ethanol is 
commonly added to gasoline, but its use as a diesel fuel additive is 
limited by low cetane number; low flashpoint temperature, which cre-
ates headspace flammability concerns in fuel tanks; and poor miscibility, 
which results in the need for engine and distribution infrastructure 
modifications [11,12,18] . 
It is also important to consider the environmental fate of a potential 
fuel additive molecule. Methyl tert-butyl ether (MTBE) was used widely 
as an oxygenated additive in gasoline to reduce emissions. However, due 
to its water solubility and environmental persistence, MTBE became a 
problematic groundwater and soil contaminant [19] and is no longer 
used as a fuel additive in the United States [20]. 
A molecule that has recently gained attention as a potential diesel 
additive is dimethoxymethane (DMM), also known as met
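The printed text shows typical extraction artifacts: a word hyphenated across a line break ("cre-" / "ates"), hard line breaks inside sentences, and a stray space before punctuation ("[11,12,18] ."). Here is a minimal cleanup sketch that assumes simple regex heuristics are good enough for this paper. `clean_extracted_text` is a hypothetical helper, not part of pypdf. It deliberately leaves other artifacts alone, such as the page number fused into "2properties" and the journal header line.

```python
import re


def clean_extracted_text(text: str) -> str:
    """Tidy common line-layout artifacts in pypdf output (heuristic sketch)."""
    # Rejoin words hyphenated across a line break: "cre-\nates" -> "creates".
    # Caveat: this also merges a genuine hyphenated compound that breaks at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn the remaining line breaks (and surrounding spaces) into single spaces.
    text = re.sub(r"\s*\n\s*", " ", text)
    # Remove whitespace left before punctuation: "[11,12,18] ." -> "[11,12,18]."
    text = re.sub(r"\s+([.,;:])", r"\1", text)
    # Collapse any runs of spaces and trim the ends.
    return re.sub(r" {2,}", " ", text).strip()


# usage on a short excerpt of the output above
sample = (
    "low flashpoint temperature, which cre-\n"
    "ates headspace flammability concerns\n"
    "modifications [11,12,18] . \n"
    "It is also important"
)
print(clean_extracted_text(sample))
```

Applied to the full page, this would be `clean_extracted_text(page.extract_text())`. Whether the cleanup measurably helps retrieval for this paper is untested here.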

# Test Ollama Embeddings

## Generate Embeddings

In [18]:
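import ollama
import chromadb

# in-memory Chroma client; get_or_create_collection also works when the cell is re-run
client = chromadb.Client()
collection = client.get_or_create_collection(name="docs")

# embed each page and store it in the vector database (one document per page)
for i, page in enumerate(reader.pages):
    page_text = page.extract_text()  # extract once, reuse for the embedding and the stored document
    if not page_text.strip():
        continue  # skip pages with no extractable text (e.g. scanned images)
    response = ollama.embeddings(model="mxbai-embed-large", prompt=page_text)
    # upsert rather than add, so re-running the cell overwrites ids "0", "1", ... instead of colliding
    collection.upsert(
        ids=[str(i)],
        embeddings=[response["embedding"]],
        documents=[page_text],
    )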
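The loop above embeds each page as a single string. mxbai-embed-large is reported to have a 512-token context window. A full page of a two-column paper may exceed that, in which case the tail of the page may not be represented in its embedding. One common alternative, sketched here under that assumption, is to split each page into overlapping word windows before embedding. `chunk_words`, its default sizes, and the word-based (rather than token-based) splitting are all illustrative choices, not values taken from Ollama or Chroma.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of roughly chunk_size words (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # each window starts this many words after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # stop once a window reaches the end, so the tail is not emitted twice
        if start + chunk_size >= len(words):
            break
    return chunks


# usage: a 450-word page becomes three overlapping chunks
print(len(chunk_words(" ".join(["word"] * 450))))  # 3
```

Each chunk would then be embedded and added individually, with an id that stays unique across pages (for example `f"{i}-{j}"` for chunk `j` of page `i`). The overlap reduces the chance that a sentence split at a window boundary is lost from both neighbouring chunks.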

## Retrieve

In [21]:
# an example prompt
prompt = "What are some drawbacks of the combustion of diesel fuels?"

# embed the prompt with the same model used for the pages,
# then retrieve the single most relevant page
response = ollama.embeddings(
    prompt=prompt,
    model="mxbai-embed-large"
)
results = collection.query(
    query_embeddings=[response["embedding"]],
    n_results=1  # only the top match is used below
)
data = results['documents'][0][0]  # first query, closest document
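Conceptually, `collection.query` ranks the stored page embeddings by their distance to the prompt embedding and returns the closest ones. Chroma's default distance is squared L2; cosine or inner product can be selected when the collection is created. Chroma also searches an approximate HNSW index rather than comparing every vector. The brute-force sketch below uses made-up 2-D vectors in place of real embeddings and only illustrates the ordering. `l2_squared` and `nearest` are hypothetical helpers.

```python
def l2_squared(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query: list[float], docs: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k stored vectors closest to the query (exhaustive search)."""
    return sorted(docs, key=lambda doc_id: l2_squared(query, docs[doc_id]))[:k]


# toy "embeddings" for three pages; smaller distance means more relevant
stored = {"page-0": [1.0, 0.0], "page-1": [0.0, 1.0], "page-2": [0.9, 0.1]}
print(nearest([1.0, 0.05], stored, k=2))  # ['page-0', 'page-2']
```

The actual distances for the query above can be inspected with `results['distances'][0]`, which holds one value per returned document.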

In [25]:
# generate a response that combines the prompt with the page retrieved above
output = ollama.generate(
    model="llama3",
    prompt=f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output['response'])

Based on the data provided, there are several drawbacks associated with the combustion of diesel fuels:

1. **Flash point concerns**: Shorter POMEs have extremely low flash points, which can present a safety hazard.
2. **Poor water solubility**: Increased molecular weight (backbone length or end-group size) leads to decreased water solubility, making it more difficult to mix with water-based additives or clean up spills.
3. **Solidification at room temperature**: Longer POMEs become solids at room temperature, which can affect their handling and storage.

These drawbacks highlight the importance of considering the properties of POMEs when designing fuel blends for combustion applications.
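The answer above draws on a single retrieved page. It mostly describes handling properties of POMEs (flash point, water solubility, solidification at room temperature) rather than combustion drawbacks as such. One possible refinement is to pass several retrieved pages and tell the model to stay within them. This is sketched here as an assumption, not a fix verified on this notebook, and `build_prompt` and its instruction wording are hypothetical.

```python
def build_prompt(question: str, documents: list[str], max_docs: int = 3) -> str:
    """Assemble a RAG prompt from several retrieved documents (hypothetical helper)."""
    # number each excerpt so the model can tell the sources apart
    context = "\n\n".join(
        f"[{i}] {doc.strip()}" for i, doc in enumerate(documents[:max_docs], start=1)
    )
    return (
        "Answer the question using only the excerpts below. "
        "If they do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )


# usage with placeholder page texts
print(build_prompt("What are some drawbacks of diesel combustion?", ["page one text", "page two text"]))
```

If the query is re-run with `n_results=3`, the prompt would be `build_prompt(prompt, results['documents'][0])`, passed to `ollama.generate(model="llama3", prompt=...)` as before.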
