## LangChain (chat) prompt template & Structured output with Pydantic 
- [structured chat](https://python.langchain.com/v0.1/docs/modules/agents/agent_types/structured_chat/)
- [TavilySearchResults](https://python.langchain.com/docs/integrations/tools/tavily/)
- [Search Agent (in Korean)](https://teddylee777.github.io/langchain/langchain-agent/)
    - Setting up LangSmith for step-by-step tracing
        - Applications built with LangChain make multiple LLM calls across multiple steps.
        - As the logic invoked at each step grows more complex, the ability to inspect exactly what is happening inside a chain or agent becomes very important. The best way to do this is to use LangSmith.
        - LangSmith is not required, but it is a very useful tool. To use it, sign up at the link above, then set environment variables to start logging traces.
- [Agent Types](https://python.langchain.com/docs/modules/agents/agent_types/)  
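
The LangSmith setup described in the Search Agent notes comes down to setting a few environment variables before any chain runs. The sketch below is a minimal illustration, not the linked article's exact code: the helper name `configure_langsmith` and the project name are hypothetical, and the variable names are the `LANGCHAIN_`-prefixed ones LangChain has documented for tracing (check the current LangSmith docs, since newer releases may prefer other names).

```python
import os


def configure_langsmith(api_key: str, project: str = "default") -> None:
    """Set the environment variables that enable LangSmith tracing.

    Hypothetical helper for illustration; the variable names are assumed
    from LangChain's tracing documentation.
    """
    os.environ["LANGCHAIN_TRACING_V2"] = "true"   # turn tracing on
    os.environ["LANGCHAIN_API_KEY"] = api_key     # key issued after sign-up
    os.environ["LANGCHAIN_PROJECT"] = project     # groups traces by project


# Placeholder values; in practice the key would come from a .env file.
configure_langsmith(api_key="<your-langsmith-api-key>",
                    project="structured-output-demo")
```

Calling this before building the chain keeps the tracing configuration in one place; the same values could equally be placed in the `.env` file that `load_dotenv()` reads.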

----- 
- [Generating Text Roles](https://platform.openai.com/docs/guides/text-generation)
- [Prompt Engineering Guide](https://platform.openai.com/docs/guides/prompt-engineering)
    - [LangChain Blog](https://blog.langchain.dev/generating-usable-text-with-ai/)
- [PromptTemplates - official website](https://python.langchain.com/docs/modules/prompts/prompt_templates/) / [prompt_templates - medium](https://medium.com/@ssmaameri/prompt-templates-in-langchain-efb4da260bd3#65e8)
- [Chain](https://python.langchain.com/docs/how_to/sequence/)
- [Prompt](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/quick_start/)
- [prompt template vs chat prompt template](https://rudaks.tistory.com/entry/langchain-PromptTemplate-ChatPromptTemplate-사용법)

- [Comparing LangSmith Experiment evaluations](https://wikidocs.net/259214)
- [Groundedness Evaluator: hallucination evaluation](https://wikidocs.net/259216)
- [Pairwise Evaluation: comparing outputs from two or more LLMs](https://wikidocs.net/259217)


In [1]:
import os
from dotenv import load_dotenv

# Loading API KEY from environment
load_dotenv()
OpenAI_key = os.getenv("OPENAI_API_KEY")  

In [2]:
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from typing import List
import json
import pandas as pd

In [3]:
prompt_path = "prompts/Structured_prompt_Experiments_prompts/"

def read_prompt(path):
    """Return the full text of the prompt file at `path`."""
    with open(path, 'r') as file:
        return file.read()

system_input = read_prompt(prompt_path + "system_input.txt")
human_input = read_prompt(prompt_path + "human_input.txt")


In [4]:
# Define Pydantic models
class SubCategory(BaseModel):
    name: str = Field(..., description="name of the insurance sub-category")
    description: str = Field(..., description="description of the insurance sub-category")

class InsuranceCategory(BaseModel):
    main_category_name: str = Field(..., description="name of the insurance category that encompasses its sub-categories")
    sub_categories: List[SubCategory] = Field(..., description="list of sub-categories under this main category")

class InsuranceCategories(BaseModel):
    categories: List[InsuranceCategory] = Field(..., description="list of insurance categories")

llm = ChatOpenAI(model_name="gpt-4o", max_tokens=10000, temperature=0)

parser = PydanticOutputParser(pydantic_object=InsuranceCategories)

format_instructions = parser.get_format_instructions()
print(format_instructions)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_input),
    ("human", human_input)
])

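# Pipe the filled prompt into the model
chain = prompt | llm

# The only template variable supplied here is format_instructions
response = chain.invoke({"format_instructions": format_instructions})

# Validate the model's JSON reply against the InsuranceCategories schema
structured_output = parser.parse(response.content)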


The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"InsuranceCategory": {"properties": {"main_category_name": {"description": "name of the insurance category that encompasses its sub-categories", "title": "Main Category Name", "type": "string"}, "sub_categories": {"description": "list of sub-categories under this main category", "items": {"$ref": "#/$defs/SubCategory"}, "title": "Sub Categories", "type": "array"}}, "required": ["main_category_name", "sub_categories"], "title": "InsuranceCategory", "type": "object"}, "SubCategory": {"properties": {"name": {"description": "name of the

In [5]:
insurance_data = []

for category in structured_output.categories:  # Access the categories list
    for sub_cat in category.sub_categories:
        insurance_data.append({
            'main_category': category.main_category_name,
            'sub_category': sub_cat.name,
            'description': sub_cat.description
        })

insurance_df = pd.DataFrame(insurance_data)
insurance_df.to_csv('insurance_categories.txt', index=False)
print("saved to 'insurance_categories.txt'.")
insurance_df

saved to 'insurance_categories.txt'.


|     | main_category | sub_category | description |
| --- | --- | --- | --- |
| 0 | Health Insurance | Individual Health Insurance | Covers medical expenses for an individual, inc... |
| 1 | Health Insurance | Family Health Insurance | Provides health coverage for all family member... |
| 2 | Health Insurance | Critical Illness Insurance | Offers a lump sum benefit upon diagnosis of a ... |
| 3 | Health Insurance | Dental Insurance | Covers dental care expenses, including routine... |
| 4 | Health Insurance | Vision Insurance | Provides coverage for eye care, including eye ... |
| ... | ... | ... | ... |
| 235 | Income Protection Insurance | Short-Term Income Protection Insurance | Provides income replacement for a limited peri... |
| 236 | Income Protection Insurance | Long-Term Income Protection Insurance | Offers income replacement for an extended peri... |
| 237 | Income Protection Insurance | Accident and Sickness Insurance | Covers income loss due to accidents or illness... |
| 238 | Income Protection Insurance | Redundancy Insurance | Provides income replacement if the policyholde... |
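
The parse step in cell 4 and the flattening loop in cell 5 can be exercised offline, without calling the model. The sketch below assumes Pydantic v2 and uses Pydantic's own `model_validate_json` as a stand-in for `PydanticOutputParser.parse` (the parser may do additional work, such as extracting JSON from surrounding text). The JSON payload is a hand-written sample, not real model output.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class SubCategory(BaseModel):
    name: str = Field(..., description="name of the insurance sub-category")
    description: str = Field(..., description="description of the insurance sub-category")


class InsuranceCategory(BaseModel):
    main_category_name: str = Field(..., description="name of the insurance category")
    sub_categories: List[SubCategory] = Field(..., description="sub-categories under this category")


class InsuranceCategories(BaseModel):
    categories: List[InsuranceCategory] = Field(..., description="list of insurance categories")


# Hand-written sample standing in for response.content (assumption).
sample_json = """
{"categories": [
  {"main_category_name": "Health Insurance",
   "sub_categories": [
     {"name": "Dental Insurance", "description": "Covers dental care expenses."},
     {"name": "Vision Insurance", "description": "Provides coverage for eye care."}
   ]}
]}
"""

# Validate the JSON text against the nested schema.
parsed = InsuranceCategories.model_validate_json(sample_json)

# Flatten to one row per (main category, sub-category), as in cell 5.
rows = [
    {
        "main_category": category.main_category_name,
        "sub_category": sub_cat.name,
        "description": sub_cat.description,
    }
    for category in parsed.categories
    for sub_cat in category.sub_categories
]

# A payload missing a required field is rejected instead of silently passing.
try:
    InsuranceCategories.model_validate_json(
        '{"categories": [{"main_category_name": "Health Insurance"}]}'
    )
    rejected = False
except ValidationError:
    rejected = True
```

Passing `rows` to `pd.DataFrame` would give the same three-column layout as the table above.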


## OpenAI chat completion with structured output


In [6]:
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables and initialize clients
load_dotenv()
OpenAI_key = os.getenv("OPENAI_API_KEY")  
client = OpenAI()

In [7]:
# Import necessary libraries
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser
import pandas as pd

# Define the Pydantic model for the structured response
class ResponseModel(BaseModel):
    answer: str = Field(..., description="answer to the question")

# Build format instructions from the same model for use in the system prompt
parser = PydanticOutputParser(pydantic_object=ResponseModel)
format_instructions = parser.get_format_instructions()

# Load data
df = pd.read_csv('insurance_company_db.csv')

prompt_path = "prompts/classification_prompts/"

def read_prompt(prompt_path):
    with open(prompt_path, 'r') as file:
        return file.read()
    
system_input_stage_1 = read_prompt(prompt_path + "classification_system_stage_1.txt")
human_input_stage_1 = read_prompt(prompt_path + "classification_human_stage_1.txt")

input_data = df.iloc[0]

# Chat completion format reference (passing variables to messages): https://docs.smith.langchain.com/prompt_engineering/tutorials/optimize_classifier
# Chat completion structured output reference: https://platform.openai.com/docs/guides/structured-outputs
# parse() validates the reply against response_format; .parsed is a ResponseModel instance
completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {
            "role": "system", 
            "content": system_input_stage_1.format(format_instructions=format_instructions)
        },
        {
            "role": "user", 
            "content": human_input_stage_1.format(company=input_data['company'], title=input_data['title'], content=input_data['content'])
        }
    ],
    response_format=ResponseModel
).choices[0].message.parsed   

print(completion.answer)


yes
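
One detail of cell 7 worth checking is that the prompt files are filled with `str.format`. The sketch below illustrates a general Python behavior, not anything specific to the prompt files used here (their contents are not shown): braces inside a substituted value are safe, but literal braces in the template itself must be doubled. The helper `fill_prompt` and all template strings are hypothetical.

```python
def fill_prompt(template: str, **values: str) -> str:
    """Fill {placeholders} in a prompt template using str.format."""
    return template.format(**values)


# Stand-in for parser.get_format_instructions(); it contains braces.
schema_text = '{"properties": {"answer": {"type": "string"}}}'

# Braces inside the substituted value are inserted verbatim.
ok_template = "Answer yes or no.\n{format_instructions}"
filled = fill_prompt(ok_template, format_instructions=schema_text)

# A literal JSON example in the template is read as a placeholder and fails.
bad_template = 'Reply as {"answer": "yes"}.\n{format_instructions}'
try:
    fill_prompt(bad_template, format_instructions=schema_text)
    failed = False
except (KeyError, ValueError, IndexError):
    failed = True

# Doubling the braces keeps the literal example intact.
escaped_template = 'Reply as {{"answer": "yes"}}.\n{format_instructions}'
escaped = fill_prompt(escaped_template, format_instructions=schema_text)
```

If a prompt file needs to show a JSON example, escaping its braces this way avoids a formatting error at the `.format(...)` call.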
