# Structure answers with OpenAI functions

OpenAI function calling lets you structure the model's response output. This is often useful in question answering, when you want not only the final answer but also supporting evidence, citations, and so on.

In this notebook we show how to use an LLM chain that uses OpenAI functions as part of an overall retrieval pipeline.
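To make "structured output" concrete: with function calling, the model returns the function's arguments as a JSON string, and the caller validates that string against a schema. The sketch below shows only that validation half, using Pydantic v2 (an assumption about your environment). `AnswerWithSourcesSketch` and `parse_arguments` are hypothetical names; the fields mirror the `answer`/`sources` output shown later in this notebook, and the JSON strings are hand-written examples, not real model output.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class AnswerWithSourcesSketch(BaseModel):
    """An answer to the question, with sources."""

    answer: str = Field(..., description="Answer to the question that was asked")
    sources: List[str] = Field(..., description="List of sources used to answer the question")


def parse_arguments(raw: str) -> Optional[AnswerWithSourcesSketch]:
    """Validate a function-call arguments string; return None if it does not fit the schema."""
    try:
        return AnswerWithSourcesSketch.model_validate_json(raw)
    except ValidationError:
        return None


# Hand-written example of the arguments string a model might return.
parsed = parse_arguments('{"answer": "Sanctions were announced.", "sources": ["0-pl", "2-pl"]}')
print(parsed.answer)   # Sanctions were announced.
print(parsed.sources)  # ['0-pl', '2-pl']

# A reply that omits the required "sources" field is rejected.
print(parse_arguments('{"answer": "No citations given."}'))  # None
```

The point of the schema is that a reply without citations fails loudly instead of slipping through as free text.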

In [1]:
# !pip install langchain_openai

In [2]:
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

In [3]:
# !pip install chromadb

In [5]:
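import os
from dotenv import load_dotenv

load_dotenv()  # loads OPENAI_API_KEY (and other variables) from a local .env file
# Alternatively, set it directly: os.environ["OPENAI_API_KEY"] = "<your-api-key>"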

True

In [6]:
# Path to your local copy of the State of the Union transcript
loader = TextLoader("state_of_the_union.txt", encoding="utf-8")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
# Give every chunk a short id ("0-pl", "1-pl", ...) that the model can cite as a source
for i, text in enumerate(texts):
    text.metadata["source"] = f"{i}-pl"
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)
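The cell above splits the transcript into roughly 1000-character chunks and tags each one with an `"{i}-pl"` id, which is what the model later cites in its `sources` list. The function below is a simplified re-implementation for illustration only, not LangChain's code: it assumes a `"\n\n"` separator (which we believe is `CharacterTextSplitter`'s default), does no overlap handling or whitespace stripping, and `split_and_label`/`Chunk` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_and_label(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> List[Chunk]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size
    characters (a single oversized piece becomes its own chunk), then tag each
    chunk with an "{i}-pl" source id."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for piece in (p for p in text.split(separator) if p.strip()):
        # Cost of adding this piece, including the separator that joins it.
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    return [Chunk(c, {"source": f"{i}-pl"}) for i, c in enumerate(chunks)]


labeled = split_and_label("aaa\n\nbbb\n\nccc", chunk_size=8)
print([(c.page_content, c.metadata["source"]) for c in labeled])
```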

In [7]:
from langchain.chains import create_qa_with_sources_chain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

In [8]:
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")

In [9]:
qa_chain = create_qa_with_sources_chain(llm)

In [10]:
doc_prompt = PromptTemplate(
    template="Content: {page_content}\nSource: {source}",
    input_variables=["page_content", "source"],
)
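`doc_prompt` controls how each retrieved chunk is rendered before the chunks are "stuffed" into the `context` variable. Roughly, the stuff step formats every document with this template and joins the results. The sketch below assumes a `"\n\n"` separator (what we believe `StuffDocumentsChain` uses by default as `document_separator`); `stuff_documents` and the sample documents are hypothetical.

```python
from typing import Dict, List

DOC_TEMPLATE = "Content: {page_content}\nSource: {source}"


def stuff_documents(docs: List[Dict[str, str]], separator: str = "\n\n") -> str:
    """Format each document with DOC_TEMPLATE and join them into one context string."""
    return separator.join(
        DOC_TEMPLATE.format(page_content=d["page_content"], source=d["source"])
        for d in docs
    )


docs = [
    {"page_content": "First chunk text.", "source": "0-pl"},
    {"page_content": "Second chunk text.", "source": "2-pl"},
]
context = stuff_documents(docs)
print(context)
```

Because every chunk carries its `Source:` line, the model can copy those ids into the `sources` field of its answer.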

In [11]:
final_qa_chain = StuffDocumentsChain(
    llm_chain=qa_chain,
    document_variable_name="context",
    document_prompt=doc_prompt,
)

In [14]:
retrieval_qa = RetrievalQA(
    retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain
)
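`RetrievalQA` first asks the retriever for the chunks closest to the query, then hands them to the combine-documents chain. To show the shape of the retrieval step without an API key, the toy sketch below swaps OpenAI embeddings and Chroma for bag-of-words cosine similarity; it is a stand-in, not how `docsearch.as_retriever()` scores documents. The default `k=4` is an assumption that matches the four sources returned in this notebook's outputs.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: Dict[str, str], k: int = 4) -> List[str]:
    """Return the ids of the k chunks most similar to the query (highest score first)."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda sid: cosine(q, bag_of_words(chunks[sid])), reverse=True)
    return ranked[:k]


chunks = {
    "0-pl": "New sanctions on Russia were announced.",
    "1-pl": "Infrastructure money will rebuild roads and bridges.",
    "2-pl": "Russia invaded Ukraine without provocation.",
}
print(retrieve("What did the president say about Russia?", chunks, k=2))  # ['2-pl', '0-pl']
```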

In [15]:
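query = "How did the president talk about Russia?"

In [17]:
retrieval_qa.invoke(query)

{'query': 'How did the president talk about Russia?',
 'result': '{"answer":"In his speech, the president condemned the aggression of Russian President Putin and said that the United States will work with other countries to impose sanctions on Russia, release oil reserves to ease rising fuel prices, and close U.S. airspace to Russian aircraft. The president also stressed that the United States will provide military, economic, and humanitarian assistance to Ukraine, and pledged not to engage Russian forces in Ukraine, but rather to defend NATO allies.","sources":["0-pl","2-pl","6-pl","4-pl"]}'}

## Using Pydantic

If we want, we can have the chain return a Pydantic object instead of a JSON string. Note that if downstream chains consume the output of this chain (including memory), they will generally expect it to be a string, so you should only use this option when this is the final chain.

In [18]:
qa_chain_pydantic = create_qa_with_sources_chain(llm, output_parser="pydantic")

In [19]:
final_qa_chain_pydantic = StuffDocumentsChain(
    llm_chain=qa_chain_pydantic,
    document_variable_name="context",
    document_prompt=doc_prompt,
)

In [20]:
retrieval_qa_pydantic = RetrievalQA(
    retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain_pydantic
)

In [21]:
retrieval_qa_pydantic.invoke(query)

{'query': 'How did the president talk about Russia?',
 'result': AnswerWithSources(answer='In his speech, the president discussed Russian aggression, highlighting the miscalculation of Russian President Putin and his attack on Ukraine. He said the Russian invasion of Ukraine was premeditated and unprovoked, and noted that the West and NATO were prepared and had built a coalition with other free nations to confront Putin. The president also announced strong actions to sanction Russia, including releasing oil from global reserves to lower domestic fuel prices and closing U.S. airspace to Russian flights. In addition, he stressed that the United States and its allies will continue to support Ukraine with military, economic, and humanitarian assistance.', sources=['0-pl', '2-pl', '6-pl', '4-pl'])}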
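The `sources` field only holds ids such as `"0-pl"`. To show a reader what was actually cited, you can map those ids back to the chunks they came from. This is a sketch under assumptions: `resolve_sources` is a hypothetical helper, and the index would be built from the split documents, e.g. `{doc.metadata["source"]: doc.page_content for doc in texts}`; here a placeholder string stands in for real chunk text.

```python
from typing import Dict, List


def resolve_sources(source_ids: List[str], chunks_by_id: Dict[str, str], preview: int = 60) -> Dict[str, str]:
    """Map each cited source id to the first `preview` characters of its chunk.

    Ids missing from the index map to "<unknown source>", so an invented citation
    shows up in the result instead of raising KeyError.
    """
    return {
        sid: chunks_by_id[sid][:preview] if sid in chunks_by_id else "<unknown source>"
        for sid in source_ids
    }


# Placeholder index; in the notebook it would come from `texts`.
chunks_by_id = {"0-pl": "Placeholder text for chunk zero."}
print(resolve_sources(["0-pl", "9-pl"], chunks_by_id, preview=20))
# {'0-pl': 'Placeholder text for', '9-pl': '<unknown source>'}
```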

## Using in ConversationalRetrievalChain

We can also use this chain inside a `ConversationalRetrievalChain`. Because that chain involves memory, we will NOT use the Pydantic return type here.
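Before retrieving anything, a conversational retrieval chain rewrites the follow-up question into a standalone one using the chat history, which is what `condense_question_chain` below is for. The sketch shows roughly what that prompt looks like once filled in. The `Human:`/`Assistant:` prefixes are our assumption about how LangChain renders stored messages, and `format_chat_history` is a hypothetical helper. Note that the AI turn in memory is the raw JSON string, which is why memory needs string output rather than a Pydantic object.

```python
from typing import List, Tuple

CONDENSE_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language. Make sure to avoid using any unclear pronouns.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""


def format_chat_history(turns: List[Tuple[str, str]]) -> str:
    """Render (role, content) turns as 'Human: ...' / 'Assistant: ...' lines."""
    prefixes = {"human": "Human: ", "ai": "Assistant: "}
    return "\n".join(prefixes.get(role, f"{role}: ") + content for role, content in turns)


history = [
    ("human", "What did the president say about Ketanji Brown Jackson?"),
    ("ai", '{"answer": "(earlier answer)", "sources": ["31-pl"]}'),
]
prompt = CONDENSE_TEMPLATE.format(
    chat_history=format_chat_history(history),
    question="What did he say about her predecessor?",
)
print(prompt)
```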

In [22]:
from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language. \
Make sure to avoid using any unclear pronouns.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
condense_question_chain = LLMChain(
    llm=llm,
    prompt=CONDENSE_QUESTION_PROMPT,
)

In [23]:
qa = ConversationalRetrievalChain(
    question_generator=condense_question_chain,
    retriever=docsearch.as_retriever(),
    memory=memory,
    combine_docs_chain=final_qa_chain,
)

In [24]:
query = "What did the president say about Ketanji Brown Jackson?"
result = qa.invoke({"question": query})

In [25]:
result

{'question': 'What did the president say about Ketanji Brown Jackson?',
 'chat_history': [HumanMessage(content='What did the president say about Ketanji Brown Jackson?'),
  AIMessage(content='{"answer":"The president said of Ketanji Brown Jackson: One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.","sources":["31-pl"]}')],
 'answer': '{"answer":"The president said of Ketanji Brown Jackson: One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.","sources":["31-pl"]}'}

In [26]:
query = "What did he say about her predecessor?"
result = qa.invoke({"question": query})

In [27]:
result

{'question': 'What did he say about her predecessor?',
 'chat_history': [HumanMessage(content='What did the president say about Ketanji Brown Jackson?'),
  AIMessage(content='{"answer":"The president said of Ketanji Brown Jackson: One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.","sources":["31-pl"]}'),
  HumanMessage(content='What did he say about her predecessor?'),
  AIMessage(content='{"answer":"The president said to the predecessor of Ketanji Brown Jackson: Justice Breyer, thank you for your service.","sources":["31-pl"]}')],
 'answer': '{"answer":"The president said to the predecessor of Ketanji Brown Jackson: Justice Breyer, thank you for your service.","sources":["31-pl"]}'}
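Because this conversational chain uses the string output parser, `result["answer"]` is a JSON string rather than an object. A minimal way to pull out the answer text and the cited ids is `json.loads`. `split_answer` is a hypothetical helper, and the input below is a shortened stand-in for the answer string above.

```python
import json
from typing import List, Tuple


def split_answer(answer_json: str) -> Tuple[str, List[str]]:
    """Split the chain's JSON answer string into (answer text, list of source ids)."""
    payload = json.loads(answer_json)
    return payload["answer"], payload.get("sources", [])


# Shortened stand-in for result["answer"]; in the notebook: split_answer(result["answer"])
text, sources = split_answer('{"answer":"Justice Breyer, thank you for your service.","sources":["31-pl"]}')
print(text)     # Justice Breyer, thank you for your service.
print(sources)  # ['31-pl']
```

Keeping the parsing outside the chain leaves the memory string-based, which is what the conversation prompt expects.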

## Using your own output schema

We can change the outputs of our chain by passing in our own schema. The values and descriptions of this schema inform the function definition we pass to the OpenAI API, so the schema doesn't just affect how we parse outputs; it also changes the OpenAI output itself. For example, we can add a `countries_referenced` parameter to our schema and describe what we want it to contain, which will cause the OpenAI output to include a list of the countries referenced in the response.
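To see how a schema "informs the function", the sketch below turns a Pydantic model into an OpenAI-style function definition: the class name becomes the function name, the docstring becomes its description, and the field descriptions end up in the JSON-Schema `parameters`. We believe LangChain's structure chain builds its function in a similar way, but this is an illustration using Pydantic v2's `model_json_schema()`, not the library's code; `to_openai_function` is a hypothetical helper.

```python
from typing import List, Type

from pydantic import BaseModel, Field


class CustomResponseSchema(BaseModel):
    """An answer to the question being asked, with sources."""

    answer: str = Field(..., description="Answer to the question that was asked")
    countries_referenced: List[str] = Field(
        ..., description="All of the countries mentioned in the sources"
    )
    sources: List[str] = Field(..., description="List of sources used to answer the question")


def to_openai_function(model: Type[BaseModel]) -> dict:
    """Build a function definition (name, description, JSON-Schema parameters) from a model."""
    schema = model.model_json_schema()
    return {
        "name": schema["title"],
        "description": schema.get("description", ""),
        "parameters": schema,
    }


spec = to_openai_function(CustomResponseSchema)
print(spec["name"])  # CustomResponseSchema
print(spec["parameters"]["properties"]["countries_referenced"]["description"])
```

Since the model reads these descriptions, rewording a field's `description` is one way to steer what it puts in that field.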

In addition to a custom schema, we can also pass a custom prompt to the chain. This lets you add extra instructions and context to the request, which can be useful for question answering.

In [30]:
from typing import List

from langchain.chains.openai_functions import create_qa_with_structure_chain
from langchain.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_core.messages import HumanMessage, SystemMessage
# This LangChain version validates `schema` against Pydantic v1 models, so import the v1 API
from langchain_core.pydantic_v1 import BaseModel, Field

In [36]:
class CustomResponseSchema(BaseModel):
    """An answer to the question being asked, with sources."""

    answer: str = Field(..., description="Answer to the question that was asked")
    countries_referenced: List[str] = Field(
        ..., description="All of the countries mentioned in the sources"
    )
    sources: List[str] = Field(
        ..., description="List of sources used to answer the question"
    )

In [37]:
prompt_messages = [
    SystemMessage(
        content=(
            "You are a world-class algorithm "
            "that answers questions in a specific format."
        )
    ),
    HumanMessage(content="Use the given context to answer the question as well as possible."),
    HumanMessagePromptTemplate.from_template("{context}"),
    HumanMessagePromptTemplate.from_template("Question: {question}"),
    HumanMessage(
        content="Tips: Make sure to answer in the correct format. List all of the countries mentioned in the sources in bold."
    ),
]

chain_prompt = ChatPromptTemplate(messages=prompt_messages)

qa_chain_pydantic = create_qa_with_structure_chain(
    llm=llm, schema=CustomResponseSchema, output_parser="pydantic", prompt=chain_prompt
)
final_qa_chain_pydantic = StuffDocumentsChain(
    llm_chain=qa_chain_pydantic,
    document_variable_name="context",
    document_prompt=doc_prompt,
)
retrieval_qa_pydantic = RetrievalQA(
    retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain_pydantic
)
query = "What did he say about Russia?"
retrieval_qa_pydantic.invoke(query)
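Why the Pydantic import matters: if `BaseModel` comes from `pydantic` (v2), `create_qa_with_structure_chain(..., output_parser="pydantic")` raises `ValueError: Must provide a pydantic class for schema when output_parser is 'pydantic'.` As we understand it, LangChain releases of this vintage guard the schema with an `issubclass` check against the Pydantic v1 `BaseModel` (exposed as `langchain_core.pydantic_v1`), and a v2 model does not pass it; newer releases reportedly accept v2 models directly, so check your installed version. The sketch reproduces that kind of check without LangChain, assuming Pydantic 2.x with its bundled `pydantic.v1` namespace; `passes_v1_schema_check` is a hypothetical name.

```python
import pydantic
from pydantic import v1 as pydantic_v1  # the v1 API bundled with Pydantic 2.x


class V2Schema(pydantic.BaseModel):
    answer: str


class V1Schema(pydantic_v1.BaseModel):
    answer: str


def passes_v1_schema_check(schema: object) -> bool:
    """Re-creation of an `issubclass` guard written against the Pydantic v1 BaseModel."""
    return isinstance(schema, type) and issubclass(schema, pydantic_v1.BaseModel)


print(passes_v1_schema_check(V2Schema))  # False: the case that triggers the ValueError
print(passes_v1_schema_check(V1Schema))  # True
```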