# Photography lenses sales agent

## Part 1: Preprocessing

In the first part we preprocess the product data by splitting it into chunks, and then store the chunks in a vector database. We use OpenAI embeddings and the Chroma vector store.

In [13]:
import os
from authentication import get_open_ai_api_key

os.environ["OPENAI_API_KEY"] = get_open_ai_api_key()

The first thing we have to do is load the product data and split it into documents. The product data is a JSON file (lenses.json), and each object contains information about a camera lens: its name, price, shipping time, warranty, some technical specs, a description, and, for some lenses, a list of compatible cameras.

In [2]:
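import json

from langchain.text_splitter import CharacterTextSplitter

# Split on the end of each JSON object so that one chunk holds one product.
splitter = CharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separator="},",
)

with open("lenses.json", "r") as f:
    lenses_data = json.load(f)

# str() gives the Python repr of the parsed list; that text is what gets split.
lenses = splitter.create_documents([str(lenses_data)])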

Created a chunk of size 577, which is longer than the specified 500
Created a chunk of size 507, which is longer than the specified 500
Created a chunk of size 568, which is longer than the specified 500
Created a chunk of size 548, which is longer than the specified 500
Created a chunk of size 570, which is longer than the specified 500
Created a chunk of size 504, which is longer than the specified 500
Created a chunk of size 513, which is longer than the specified 500
Created a chunk of size 549, which is longer than the specified 500
Created a chunk of size 576, which is longer than the specified 500
Created a chunk of size 518, which is longer than the specified 500
Created a chunk of size 572, which is longer than the specified 500


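We can see that the data has been split into 20 documents, which is exactly the number of products in our JSON file. Choosing a chunk_size of 500 and a separator of "}," helped here: each product object is approximately 500 characters long, and the separator marks the end of a product object. The warnings above are expected. The splitter only cuts at the separator, so a product whose text is longer than 500 characters is kept whole as a single oversized chunk rather than being cut in the middle.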

In [3]:
print(len(lenses))
print(lenses[0])

20
page_content="[{'name': 'Orion 24mm f/1.4 Wide-Angle Lens', 'price': 749, 'shipping_time': '3-5 business days', 'warranty': '2 years', 'technical_details': '24mm focal length, f/1.4 maximum aperture, manual focus', 'info': 'The Orion 24mm f/1.4 Wide-Angle Lens is perfect for landscape and astrophotography. Its wide-angle view captures expansive scenes, and the fast f/1.4 aperture performs well in low light.'" metadata={}


In [3]:
lenses[5]

Document(page_content="{'name': 'Panasonic Lumix G Leica DG Nocticron 42.5mm f/1.2 Lens', 'price': 1597, 'shipping_time': '3-5 business days', 'warranty': '2 years', 'technical_details': '42.5mm focal length, f/1.2 maximum aperture, Power O.I.S. stabilization', 'info': 'The Panasonic Lumix G Leica DG Nocticron 42.5mm f/1.2 is a prime lens that offers a fast maximum aperture and is perfect for portraiture with its smooth out-of-focus quality when working with shallow depth of field techniques.', 'compatible_with': ['Panasonic Lumix DC-GH5 II', 'Panasonic Lumix DC-G9', 'Panasonic Lumix DC-GX9']", metadata={})
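To make the interaction between chunk_size and the separator concrete, here is a minimal pure-Python sketch of the split-then-merge idea. It is not LangChain's implementation: the function name `split_on_separator` is hypothetical, chunk_overlap is ignored, and the sample text is made up for illustration.

```python
def split_on_separator(text, separator, chunk_size):
    """Split at `separator`, then greedily merge neighbours up to `chunk_size`.

    A single piece longer than `chunk_size` is never cut; it becomes its own
    oversized chunk, which is the situation the warnings report.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would overflow, so close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Made-up text: one short piece, one oversized piece, two pieces that fit together.
sample = "aaaa},bbbbbbbbbbbb},cc},dd"
chunks = split_on_separator(sample, separator="},", chunk_size=8)
print(chunks)
```

In this sketch the 12-character piece exceeds the limit of 8 but stays in one chunk, while the two short pieces at the end are merged. With product objects of roughly 500 characters each, no two neighbours fit into one chunk, so each product ends up in its own document.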

Now we can create a vector database, and store our documents in it.

In [4]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
# The keyword is `embedding` (singular). Passing `embeddings=` instead leaves the
# embedding function unset, so Chroma falls back to its default
# SentenceTransformer embeddings.
lenses_data = Chroma.from_documents(lenses, embedding=embeddings, collection_name="lenses")

Using embedded DuckDB without persistence: data will be transient
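As a rough illustration of what the vector store does at query time, the sketch below ranks documents by cosine similarity to a query. Everything in it is an assumption made for illustration: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, and the three document strings are invented, not taken from lenses.json.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:k]


# Invented product blurbs, used only to exercise the functions.
docs = [
    "wide-angle lens for landscape and astrophotography",
    "fast prime lens for portraiture with shallow depth of field",
    "telephoto zoom lens for wildlife and sports",
]
best = top_k("lens for landscape photos", docs, k=1)
print(best)
```

A real embedding model captures meaning rather than exact word overlap, but the retrieval step follows the same pattern: embed the query, compare it with the stored vectors, and return the closest documents.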


## Part 2: Creating the retrieval model

In [5]:
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain

In [6]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

In [7]:
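qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),        # low-variance completions
    lenses_data.as_retriever(),   # fetch product chunks from the Chroma store
    memory=memory,                # carry the chat history between calls
)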

In [8]:
qa("What lenses do you recommend for landscapes?")

{'question': 'What lenses do you recommend for landscapes?',
 'chat_history': [HumanMessage(content='What lenses do you recommend for landscapes?', additional_kwargs={}, example=False),
  AIMessage(content=' I recommend the Panasonic Lumix G X Vario 12-35mm f/2.8 II ASPH. POWER O.I.S. Lens, the Olympus M.Zuiko Digital ED 40-150mm f/2.8 PRO Lens, the Canon EF 24-70mm f/2.8L II USM Lens, and the Fujifilm XF 16-55mm f/2.8 R LM WR Lens. All of these lenses cover a wide range of focal lengths, making them perfect for capturing landscapes.', additional_kwargs={}, example=False)],
 'answer': ' I recommend the Panasonic Lumix G X Vario 12-35mm f/2.8 II ASPH. POWER O.I.S. Lens, the Olympus M.Zuiko Digital ED 40-150mm f/2.8 PRO Lens, the Canon EF 24-70mm f/2.8L II USM Lens, and the Fujifilm XF 16-55mm f/2.8 R LM WR Lens. All of these lenses cover a wide range of focal lengths, making them perfect for capturing landscapes.'}

In [9]:
qa("And for portraits")['answer']

' I recommend the Olympus M.Zuiko Digital ED 40-150mm f/2.8 PRO Lens, the Panasonic Lumix G Leica DG Nocticron 42.5mm f/1.2 Lens, and the Canon EF 24-70mm f/2.8L II USM Lens. All of these lenses are perfect for portraits with their fast maximum apertures and versatile focal lengths.'
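A follow-up such as "And for portraits" only makes sense together with the earlier question, which is why the chain is given a memory. The sketch below outlines the general pattern of a conversational retrieval loop: rephrase the follow-up into a standalone question, retrieve with it, answer, and record the turn. It is not the library's code. `condense_question` is a hypothetical rule-based stand-in for the LLM rephrasing step, and `retrieve` and `respond` are stub callables supplied by the caller.

```python
def condense_question(chat_history, follow_up):
    """Hypothetical stand-in for the LLM step that rewrites a follow-up
    into a standalone question using the previous turn."""
    if not chat_history:
        return follow_up
    previous_question, _previous_answer = chat_history[-1]
    return f"{follow_up} (follow-up to: {previous_question})"


def answer(question, chat_history, retrieve, respond):
    """One conversational turn: condense, retrieve, respond, remember."""
    standalone = condense_question(chat_history, question)
    docs = retrieve(standalone)
    reply = respond(standalone, docs)
    chat_history.append((question, reply))
    return standalone, reply


# Stubs in place of a real retriever and LLM.
history = []
stub_retrieve = lambda q: []
stub_respond = lambda q, docs: "stub answer"

first, _ = answer("What lenses do you recommend for landscapes?",
                  history, stub_retrieve, stub_respond)
second, _ = answer("And for portraits", history, stub_retrieve, stub_respond)
print(second)
```

The point of the design is that the retriever never sees the bare follow-up: it receives a question that already carries the context from the chat history.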

In [12]:
qa("Now recommend me some lenses within the $1000-$2000 budget, compatible with Canon")["answer"]

' The Canon EF 24-70mm f/2.8L II USM Lens and the Canon EF 85mm f/1.2L II USM Lens are both compatible with Canon cameras and are within the $1000-$2000 budget.'