# Photography lenses sales agent

## Part 1: Preprocessing

In this first part we preprocess our product data by splitting it into chunks and storing the chunks in a vector database. We use OpenAI embeddings and the Chroma vector store.

In [1]:
import os

os.environ["OPENAI_API_KEY"] = ""

The first thing we have to do is preprocess our product data and store it in a vector database. The product data is a JSON file (lenses.json), and each object contains information about a camera lens: its name, price, shipping time, warranty, some technical specs, and a description.

In [None]:
import json

from langchain.text_splitter import CharacterTextSplitter

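# Split on the end of each product object, aiming for roughly one product per chunk.
splitter = CharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separator="},",
)

with open("lenses.json", "r") as f:
    lenses_data = json.load(f)

# str() turns the parsed list into its Python repr, which is the text that gets split.
lenses = splitter.create_documents([str(lenses_data)])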

We can see that the data has been split into 20 documents, which is exactly the number of products in our JSON file. Choosing a chunk_size of 500 and a separator of "}," helped here: each product object is approximately 500 characters long, and the separator marks the end of a product object.

In [4]:
print(len(lenses))
print(lenses[0])

20
page_content="[{'name': 'Orion 24mm f/1.4 Wide-Angle Lens', 'price': 749, 'shipping_time': '3-5 business days', 'warranty': '2 years', 'technical_details': '24mm focal length, f/1.4 maximum aperture, manual focus', 'info': 'The Orion 24mm f/1.4 Wide-Angle Lens is perfect for landscape and astrophotography. Its wide-angle view captures expansive scenes, and the fast f/1.4 aperture performs well in low light.'" metadata={}


In [63]:
lenses[5]

Document(page_content='{\'name\': \'ProFocus 135mm f/2 Portrait Lens\', \'price\': 949, \'shipping_time\': \'3-5 business days\', \'warranty\': \'2 years\', \'technical_details\': \'135mm focal length, f/2 maximum aperture, autofocus\', \'info\': "The ProFocus 135mm f/2 Portrait Lens provides a longer focal length for portrait photography, delivering a high level of clarity and detail while maintaining a pleasing background blur. It\'s ideal for outdoor portraits and studio work."', metadata={})
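
The character-based split works here because every product's text is close to 500 characters, but it also leaves pieces of the list syntax in the chunks: the first chunk above starts with `[{`, and the chunks lose their closing brace. A possible alternative, sketched here under the assumption that lenses.json is a list of flat objects, is to build one text per product directly from the parsed JSON. The two sample products are abbreviated from the outputs above, and `product_to_text` is a hypothetical helper, not part of LangChain.

```python
import json

# Two abbreviated sample products standing in for the contents of lenses.json.
sample_json = """
[
  {"name": "Orion 24mm f/1.4 Wide-Angle Lens", "price": 749, "warranty": "2 years"},
  {"name": "ProFocus 135mm f/2 Portrait Lens", "price": 949, "warranty": "2 years"}
]
"""


def product_to_text(product: dict) -> str:
    """Serialize one product as a standalone JSON string (hypothetical helper)."""
    return json.dumps(product, ensure_ascii=False)


products = json.loads(sample_json)
texts = [product_to_text(p) for p in products]

# One text per product, regardless of how long each description is.
print(len(texts))
print(texts[0])
```

Each string could then be wrapped in its own document before indexing, so the number of documents always matches the number of products without tuning chunk_size.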

Now we can create a vector database, and store our documents in it.

In [5]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
# Pass the OpenAI embeddings via the `embedding` keyword so that Chroma
# does not fall back to its default embedding function.
lenses_data = Chroma.from_documents(lenses, embedding=embeddings, collection_name="lenses")

Using embedded DuckDB without persistence: data will be transient
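
To make the role of the vector store more concrete, here is a toy sketch of the idea: each document is turned into a vector, and a query returns the `k` documents whose vectors are most similar to the query vector. This is only an illustration. The word-count `embed` function stands in for a real embedding model, the three short texts are made up, and none of this reflects how Chroma is implemented internally.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up product blurbs, indexed as (text, vector) pairs.
docs = [
    "wide-angle lens for landscape and astrophotography",
    "portrait lens with pleasing background blur",
    "macro lens for close-up detail",
]
index = [(doc, embed(doc)) for doc in docs]


def search(query: str, k: int = 2) -> list:
    """Return the k indexed texts most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


results = search("lens for landscape photos", k=2)
print(results)
```

Note that a search like this only ever returns the top `k` matches, never the whole collection, which matters for questions that need every product at once.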


## Part 2: Creating the retrieval model

In [70]:
AGENT_ROLE_PROMPT_TEMPLATE = """
You are a photography sales agent, helping customers choose the right camera lens for their needs.
Your name is {agent_name}, and you work for {company_name}.
You have high expertise and knowledge in photography and photography gear.
Keep your responses short to retain the user's attention.

You must respond according to the conversation history and the conversation stage that you are in, only generating one response at a time.

{context}

Question: {question}
"""

In [73]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

llm = ChatOpenAI(temperature=0.5)

SALES_PROMPT = PromptTemplate(
    template=AGENT_ROLE_PROMPT_TEMPLATE,
    input_variables=[
        "agent_name", "company_name", "context", "question"
    ],
)
sales_qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=lenses_data.as_retriever(),
    chain_type_kwargs={"prompt": SALES_PROMPT},
)

### TODO: the agent name and company name cannot be passed as input variables

In [None]:
sales_qa.run(query="What types of lenses do you recommend for landscape?", agent_name="John", company_name="LensCo")

In [56]:
sales_qa.run(query="What is your name and where do you work?", agent_name="John", company_name="LensCo")

'My name is [Your Name] and I work as a photography sales agent at [Company Name]. How can I help you today?'

'Hello! My name is [insert name] and I work as a photography sales agent at [insert company name]. How can I assist you today?'
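
The bracketed placeholders in the replies above suggest that the agent name and company name never reached the prompt. One possible way around this is to fill those two fields ahead of time and leave only `{context}` and `{question}` for the chain to fill. LangChain prompt templates offer a `partial` method meant for this kind of pre-filling, although whether it resolves the TODO in this particular chain has not been verified here. The plain-Python sketch below only illustrates the idea; `KeepMissing` and `partial_format` are hypothetical helpers, and the template is a shortened version of the one above.

```python
TEMPLATE = (
    "Your name is {agent_name}, and you work for {company_name}.\n"
    "{context}\n"
    "Question: {question}"
)


class KeepMissing(dict):
    """Leave unknown placeholders in place instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


def partial_format(template: str, **fixed) -> str:
    """Fill only the given fields; other placeholders stay untouched."""
    return template.format_map(KeepMissing(fixed))


# Fix the agent identity once, up front.
partial = partial_format(TEMPLATE, agent_name="John", company_name="LensCo")
print(partial)

# Later, the remaining fields can be filled per question.
final = partial.format(context="(retrieved product text)", question="What is your name?")
print(final)
```

With the identity fixed in advance, the prompt handed to the chain only needs the two variables the chain itself supplies.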

In [65]:
sales_qa.run(query="List me all the lenses you have")

'We have the EagleEye 100mm f/2.8 Macro Lens, Orion 24mm f/1.4 Wide-Angle Lens, ProFocus 105mm f/2 Portrait Lens, and Orion 40mm f/2 Pancake Lens available. Which one are you interested in?'

## Part 3: Creating a summarization chain

In [58]:
SUMMARIZATION_PROMPT = """
You are the chief of staff at a photography gear company. You are in charge of answering questions regarding our products. You have high expertise and knowledge in photography and photography gear, and you have access to all details about all our products.

The following is a text that contains information about all of our products:
{context}

Question: {question}
"""

In [None]:
from langchain.chains.summarize import load_summarize_chain

PROMPT = PromptTemplate(template=SUMMARIZATION_PROMPT, input_variables=["context"])
chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)

chain.run(lenses)

'This is a list of various lenses with different focal lengths and apertures designed for specific types of photography such as landscape, portrait, macro, and wildlife. They range in price from $399 to $1399 and come with a 2-year warranty and a shipping time of 3-5 business days. Some are manual focus while others have autofocus capabilities.'

ValidationError: 1 validation error for PromptTemplate
__root__
  Invalid prompt schema; check for mismatched or missing input parameters. {'context'} (type=value_error)
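
The ValidationError reports a mismatch between the placeholders in the template and the declared `input_variables`: SUMMARIZATION_PROMPT uses both `{context}` and `{question}`, while only `context` is declared. A small standard-library check like the one below can surface such mismatches before the prompt is built. The `placeholders` helper is hypothetical, and the template is a shortened stand-in for the one above.

```python
from string import Formatter


def placeholders(template: str) -> set:
    """Return the names of all {placeholders} in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


template = "Products:\n{context}\n\nQuestion: {question}\n"
declared = {"context"}

found = placeholders(template)
used_not_declared = sorted(found - declared)
declared_not_used = sorted(declared - found)

print(used_not_declared)   # ['question']
print(declared_not_used)   # []
```

Declaring both variables, or dropping the `Question:` line from the template, would make the two sets agree. The summarize chain may additionally expect its own name for the document variable, which is not verified here.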