<a href="https://colab.research.google.com/github/budianto98/simple-llm-based-scrapper/blob/main/SimpleReAct_HSCode.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# ! pip install python-dotenv
# ! pip install langchain-community
# ! pip install langchain
# ! pip install openai
# ! pip install langchain-openai
# ! pip install faiss-cpu
# ! pip install serpapi
# ! pip install google-search-results

# 0. Loading the environment variables (Azure OpenAI, SerpAPI, etc.)

In [2]:
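import os
from dotenv import load_dotenv, find_dotenv, dotenv_values

# Load .env into os.environ so later cells can read the keys with os.environ / os.getenv
load_dotenv(find_dotenv())

# Optional: inspect the raw key/value pairs stored in .env
config = dotenv_values(".env")
# for p in config:
#     print(p, config[p])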
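Because the later cells read their settings with `os.environ[...]` or `os.getenv(...)`, a missing key only surfaces deep inside a LangChain call. A minimal sketch of a fail-fast check, assuming the variable names used in this notebook; `missing_env_vars` is a hypothetical helper, not part of `python-dotenv`:

```python
import os
from typing import Iterable, List, Mapping

# Variable names as they are read later in this notebook.
REQUIRED_ENV_VARS = [
    "AZURE_OPENAI_MODEL_3",
    "AZURE_OPENAI_API_VERSION_3",
    "AZURE_OPENAI_API_KEY_3",
    "AZURE_OPENAI_ENDPOINT_3",
    "OPENAI_API_KEY",
    "OPENAI_API_ENDPOINT",
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
    "SERPER_API_KEY",
]


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required names that are unset or empty in `env` (hypothetical helper)."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing an explicit mapping as `env` makes the check easy to test without touching the real environment.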




# 1. Agent with *ReAct* Logic

There are two tools used by this agent:
1. Customized conversational memory (an HS code reference backed by a FAISS vector store)
2. Google Search API (called through SerpAPI)

There is one main agent (1.2).



## 1.1. Tools

*Customized Conversation Memory Tool*

It has two inputs (as we can see from the prompt):
1. Chat history `{history}`, which the vector-store memory fills with the HS code entries most similar to the input
2. Human description `{input}`

The memory is backed by HS code data stored in a FAISS vector store.

The details are below:
- First we make the prompt template (1.1.1) and create the LLM (1.1.2).
- Then we create the conversation model with a memory of HS codes from FAISS embedding data (1.1.3).
- Finally we create the Google Search API tool and put both tools together in a list (1.1.4).

### 1.1.1. PROMPT template

In [3]:
from langchain.prompts.prompt import PromptTemplate

_DEFAULT_TEMPLATE = """
Here are 2 examples of the HS code reference:
(index: 3011
section: XVI
hscode: 854141
description: Electrical apparatus; photosensitive semiconductor devices, light emitting diodes (LED)
level: 6
parent: 85414
Answer: The HS Code for the above item is 854141. Description is "Electrical apparatus; photosensitive semiconductor devices, light emitting diodes (LED)". Index of the HS code 854141 is 3011.)

(index: 322
section: I
hscode: 030543
description: Fish; smoked, whether or not cooked before or during smoking, trout (Salmo trutta, Oncorhynchus mykiss/clarki/aguabonita/gilae/apache/chrysogaster), includes fillets, but excludes edible fish offal
level: 6
parent: 03054
Answer: The HS Code for the above item is 030543. Description is "Fish; smoked, whether or not cooked before or during smoking, trout (Salmo trutta, Oncorhynchus mykiss/clarki/aguabonita/gilae/apache/chrysogaster), includes fillets, but excludes edible fish offal". Index of the HS code 030543 is 322.)

If none of the HS code items relates to the human's description, treat the human's description as an unknown object and use the following format:
Answer: It seems that the object you described is unregistered, please use 999999 as your HS code instead.

Below is the chat history:
{history}

Human's current description: {input}
"""

PROMPT = PromptTemplate(
    input_variables=["history", "input"], template=_DEFAULT_TEMPLATE
)
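`PromptTemplate` uses Python `str.format`-style placeholders by default, so the two input variables are substituted as plain text. When the chain runs, `{history}` is filled by the vector-store memory (1.1.3) with the retrieved HS code entries rather than with a real chat log. A minimal sketch of that substitution with plain `str.format`, using a shortened stand-in template and assuming the stored documents look like the first example entry above:

```python
# Shortened stand-in for _DEFAULT_TEMPLATE; only the placeholders matter here.
TEMPLATE = """Below is the chat history:
{history}

Human's current description: {input}
"""

# Assumed shape of one retrieved document, copied from the first example in the prompt.
retrieved_history = "\n".join([
    "index: 3011",
    "section: XVI",
    "hscode: 854141",
    "description: Electrical apparatus; photosensitive semiconductor devices, light emitting diodes (LED)",
])

filled = TEMPLATE.format(history=retrieved_history, input="LED light bulb chip")
print(filled)
```

The model therefore sees the reference entries inline, right above the user's description, on every call.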

### 1.1.2. llm_conv (LLM Model based on Azure OpenAI)

In [4]:

from langchain_openai import AzureChatOpenAI
from langchain.schema import HumanMessage



llm_conv = AzureChatOpenAI(
    temperature=0,
     deployment_name=f"{os.getenv('AZURE_OPENAI_MODEL_3')}",
      openai_api_version=f"{os.getenv('AZURE_OPENAI_API_VERSION_3')}",
      openai_api_key=f"{os.getenv('AZURE_OPENAI_API_KEY_3')}",
      azure_endpoint=f"{os.getenv('AZURE_OPENAI_ENDPOINT_3')}",
    verbose=True
    # model_name='systems-mt-evaluation-gpt-4',
)


### 1.1.3. Conversation chat tool with HS code embedding data stored in FAISS format

In [5]:
from langchain_openai import AzureOpenAIEmbeddings
from langchain.chains import ConversationChain
from langchain_community.vectorstores import FAISS
from langchain.memory import VectorStoreRetrieverMemory

# Embedding model used to query the index; it must match the model that built
# the saved index (1536-dimensional OpenAI embeddings).
azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    azure_endpoint=os.environ["OPENAI_API_ENDPOINT"],
    openai_api_type=os.environ["OPENAI_API_TYPE"],
    openai_api_version=os.environ["OPENAI_API_VERSION"],
)

faiss_file_path = "data/faiss_db/"
faiss_index_name = "wco_hscode"


def get_retriever_tool(k: int):
    """Load the saved HS code FAISS index and return a retriever for the top-k matches."""
    # allow_dangerous_deserialization is required because part of the index is stored
    # with pickle; only enable it for files you created yourself.
    faiss_db = FAISS.load_local(
        faiss_file_path,
        azure_embeddings,
        index_name=faiss_index_name,
        allow_dangerous_deserialization=True,
    )
    return faiss_db.as_retriever(search_kwargs=dict(k=k))


def get_faiss_db_supported_memory(retriever):
    # VectorStoreRetrieverMemory exposes the retrieved documents under the memory key "history".
    # Note: it also adds each (input, response) exchange to the in-memory vector store.
    return VectorStoreRetrieverMemory(retriever=retriever)


vectorstore_chain = ConversationChain(
    llm=llm_conv,
    prompt=PROMPT,
    # Retrieve the 4 most similar HS code entries for every description.
    memory=get_faiss_db_supported_memory(get_retriever_tool(4)),
    verbose=True,
)
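Conceptually, `as_retriever(search_kwargs=dict(k=4))` returns the 4 stored entries whose embeddings are closest to the embedding of the query. FAISS flat indexes such as `IndexFlatL2` do this as an exact Euclidean (L2) search; which index type the saved `wco_hscode` index uses is not shown here. A stdlib-only sketch of the top-k step with toy 3-dimensional vectors (the vectors are invented for illustration; real embeddings have far more dimensions):

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: Sequence[float], store: List[Tuple[str, Sequence[float]]], k: int) -> List[str]:
    """Return the texts of the k stored vectors closest to `query` (exact, brute-force search)."""
    ranked = sorted(store, key=lambda item: l2_distance(query, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": invented numbers, only their relative positions matter.
store = [
    ("hscode: 854141 (LEDs)", [0.9, 0.1, 0.0]),
    ("hscode: 030543 (smoked trout)", [0.0, 0.2, 0.9]),
    ("hscode: 999999 (unregistered)", [0.5, 0.5, 0.5]),
]
query = [0.8, 0.5, 0.1]  # pretend embedding of an LED-related description

# The memory joins the retrieved texts with newlines to fill {history}.
history = "\n".join(top_k(query, store, k=2))
print(history)
```

Brute-force search is what makes a flat index exact; approximate FAISS index types trade some accuracy for speed on large tables.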

### 1.1.4. Create Tools (Google Search API and ConversationChain with HSCode information)

In [6]:


from langchain.agents import Tool
from langchain_community.utilities.serpapi import SerpAPIWrapper

# Online search tool.
# SerpAPIWrapper reads the search settings from `params`; a dict passed as `search_engine`
# is replaced by the SerpAPI client class, so the Hong Kong locale (gl="hk") would be ignored.
# The key must be a SerpAPI (serpapi.com) key, even though it is stored as SERPER_API_KEY here.
general_search = SerpAPIWrapper(
    params={"engine": "google", "google_domain": "google.com", "gl": "hk"},
    serpapi_api_key=os.environ["SERPER_API_KEY"],
)

tools = [
    Tool(
        name="Online Search",
        func=general_search.run,
        description="Useful for searching online when the user gives an unknown product description, for example only a brand name or a product number.",
    ),
    Tool(
        name="HS Code Reference",
        func=vectorstore_chain.run,
        description="The HS code reference: uses the information the human provided to find the most relevant HS code entries.",
    ),
]
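The agent never calls these Python functions directly: the LLM writes a tool *name* after `Action:` and the executor looks that name up among the registered tools, so the model must reproduce `name` exactly (including the space in `Online Search`). A simplified sketch of that lookup, with stub functions in place of SerpAPI and the HS code chain; `dispatch` and the stubs are hypothetical, and LangChain's `AgentExecutor` performs the real lookup with its own error text:

```python
from typing import Callable, Dict


def fake_online_search(query: str) -> str:
    # Stub standing in for general_search.run
    return f"search results for {query!r}"


def fake_hs_reference(query: str) -> str:
    # Stub standing in for vectorstore_chain.run
    return f"closest HS code entries for {query!r}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "Online Search": fake_online_search,
    "HS Code Reference": fake_hs_reference,
}


def dispatch(tool_name: str, tool_input: str) -> str:
    """Run the named tool, or return an observation telling the model the name was invalid."""
    func = TOOLS.get(tool_name)
    if func is None:
        return f"{tool_name} is not a valid tool, try one of [{', '.join(TOOLS)}]."
    return func(tool_input)


print(dispatch("HS Code Reference", "LED"))
print(dispatch("online search", "Samsung S8"))  # wrong capitalisation, treated as unknown
```

Returning the error as an observation (instead of raising) lets the model correct the tool name on its next step.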


## 1.2. Main agent

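### Custom template

See the sample code here: https://github.com/langchain-ai/langchain/discussions/12821

or here: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_build_a_tool-using_agent_with_Langchain.ipynb

Here the ReAct logic is implemented in the prompt template and the output parser: the prompt tells the model to reason in Thought/Action/Action Input/Observation steps, and the parser turns each model reply into either a tool call or a final answer.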
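Put together, the prompt, the parser and the executor form a loop: the LLM proposes an action, the executor runs the tool, the observation is appended to the scratchpad, and the loop ends when the model writes `Final Answer:`. A self-contained sketch of that loop with a scripted fake LLM in place of Azure OpenAI; `run_react` and the scripted replies are illustrative, not LangChain APIs:

```python
import re
from typing import Callable, Dict

# Same pattern as CustomOutputParser below.
ACTION_RE = re.compile(r"Action\s*\d*\s*:(.*?)\n.*?Action Input\s*\d*\s*:[\s]*(.*)", re.DOTALL)


def run_react(question: str, llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        output = llm(f"Question: {question}\nThought:{scratchpad}")
        if "Final Answer:" in output:
            return output.split("Final Answer:")[-1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            raise ValueError(f"Could not parse LLM output: {output!r}")
        tool, tool_input = match.group(1).strip(), match.group(2).strip().strip('"')
        observation = tools[tool](tool_input)
        # Same layout CustomPromptTemplate uses for intermediate steps.
        scratchpad += output + f"\nObservation: {observation}\nThought: "
    return "Agent stopped: too many steps."


# Scripted replies stand in for the real model.
replies = iter([
    " I should check the reference.\nAction: HS Code Reference\nAction Input: LED",
    " I now know the final answer\nFinal Answer: 854141",
])
answer = run_react("What is the HS code for an LED?", lambda prompt: next(replies),
                   {"HS Code Reference": lambda q: "hscode: 854141"})
print(answer)
```

The `max_steps` guard plays the role of the executor's iteration limit, so a model that never writes `Final Answer:` cannot loop forever.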

In [7]:
from langchain.chains import LLMChain
from langchain.prompts import StringPromptTemplate
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.memory import ConversationBufferWindowMemory
from langchain.schema import AgentAction, AgentFinish, OutputParserException
from typing import List, Union
import re


# Set up a prompt template
class CustomPromptTemplate(StringPromptTemplate):
    # The template to use
    template: str
    # The list of tools available
    tools: List[Tool]

    def format(self, **kwargs) -> str:
        # Get the intermediate steps (AgentAction, Observation tuples)
        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "
        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts
        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])
        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
        return self.template.format(**kwargs)

class CustomOutputParser(AgentOutputParser):

    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Parse out the action and action input
        regex = r"Action\s*\d*\s*:(.*?)\n.*?Action Input\s*\d*\s*:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise OutputParserException(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        # Return the action and action input
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)
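A stdlib mirror of `CustomOutputParser.parse`, returning plain tuples and strings instead of `AgentAction`/`AgentFinish`, shows what the regex actually extracts. Note that the `Action Input` group is greedy and uses `re.DOTALL`, so everything after `Action Input:` is kept; this is one reason the agent stops generation at `\nObservation:`. The function name `parse` is illustrative:

```python
import re
from typing import Tuple, Union

ACTION_RE = r"Action\s*\d*\s*:(.*?)\n.*?Action Input\s*\d*\s*:[\s]*(.*)"


def parse(llm_output: str) -> Union[Tuple[str, str], str]:
    """Return (tool, tool_input) for an action, or the final answer string."""
    if "Final Answer:" in llm_output:
        return llm_output.split("Final Answer:")[-1].strip()
    match = re.search(ACTION_RE, llm_output, re.DOTALL)
    if not match:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    return match.group(1).strip(), match.group(2).strip(" ").strip('"')


# A well-formed action: the quotes around the input are stripped.
print(parse('Thought: look it up\nAction: Online Search\nAction Input: "Samsung S8"'))
# Anything after "Action Input:" is captured, including a hallucinated observation.
print(parse("Action: Online Search\nAction Input: Samsung S8\nObservation: made up"))
```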

### Basic ReAct template from the LangChain Hub

```
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}

```
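`{agent_scratchpad}` is where earlier steps are replayed to the model. A minimal sketch of how `CustomPromptTemplate.format` builds it from `(action, observation)` pairs and fills the template; the `Step` dataclass is a hypothetical stand-in for LangChain's `AgentAction`, of which only the `log` field is used:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    log: str  # raw LLM text that produced the action (like AgentAction.log)


def build_scratchpad(intermediate_steps: List[Tuple[Step, str]]) -> str:
    """Concatenate each action log with its observation, ending on a fresh 'Thought: '."""
    thoughts = ""
    for action, observation in intermediate_steps:
        thoughts += action.log
        thoughts += f"\nObservation: {observation}\nThought: "
    return thoughts


template = "Question: {input}\nThought:{agent_scratchpad}"
steps = [(Step(" I should search.\nAction: Online Search\nAction Input: Samsung S8"),
          "Samsung Galaxy S8 is a smartphone")]
prompt = template.format(input="Samsung S8", agent_scratchpad=build_scratchpad(steps))
print(prompt)
```

Ending the scratchpad with `Thought: ` invites the model to continue reasoning from the latest observation.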

In [8]:
base_prompt = """
You are an HS code identifier. All your responses should be based only on the HS code reference or on facts.
You should never lie to the human.
Observe the potential HS code items and remember the HS code, description and index of the most suitable one for later use.
Use the HS code tool to determine which HS code best suits the user's input.
Before you answer, think again about whether the HS code is the most suitable one for the human's description.
You can always ask the user for more information about their product if some properties are still missing.

You have access to the following tools:
{tools}

Use the following format:
Question: A human input on product description or answer of the previous question of AI
Thought: online search for product detail / check the HS code reference book to gain references
Action: the action to take, most likely be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer (do not provide an HS code if I still have a question to ask the human) / I should ask the human a question to get a description that matches the most detailed HS code description
Final Answer: the HS code answer that best fits the human's description / a follow-up question for the human

Begin!

You should encourage the human user to describe their product in more detail by answering them with a question.
The HS code reference contains id, section, hscode, description, level, parent.
Level represents the number of digits of the HS code.
Double-check that the HS code in the final answer is level 6.
Double-check that the ID corresponds to the HS code in the final answer.
Double-check that the information provided by the user covers all items of the description, to get the most suitable answer.

Use the following format for your final answer:
'The HS code for your product description: 'Human description history'. Matches with the description in the reference book: 'Description of the HS code final answer'. So the most likely HS code is 'A 6-digit HS code', and the index I referenced is 'The index of HS code final answer'.

Here are wrong examples of an HS code final answer; you should not use the formats below:
The HS code for your product description: Meat and edible offal; of fowls of the species Gallus domesticus not cut in pieces, fresh, chilled, frozen, cuts and offal is 1.
(This is wrong because of the wrong HS code format and the missing index; the correct answer should be 02071.)
The HS code for your product description: Meat and edible offal; of fowls of the species Gallus domesticus not cut in pieces, fresh, chilled, frozen, cuts and offal is 02071, with index 96.
(This is wrong because of the wrong index number. AI should always check that the index number matches the HS code answer.)

Chat History:
{history}

current chat
Human: {input}
AI:
{agent_scratchpad}
"""

In [9]:
# from classes import *
# from faiss_to_memory import *
# print(tools)

memory_backed_prompt = CustomPromptTemplate(
    template=base_prompt,
    tools=tools,
    # `agent_scratchpad`, `tools` and `tool_names` are omitted because format() generates them dynamically
    # `intermediate_steps` is included because format() needs it to build the scratchpad
    # Feel free to add variables to the base template and to input_variables
    input_variables=["input", "intermediate_steps", "history"]
)
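`CustomPromptTemplate` fills `tools`, `tool_names` and `agent_scratchpad` itself and consumes `intermediate_steps`, so every other placeholder in `base_prompt` must come from the caller or from the executor's memory. A stdlib sketch that uses `string.Formatter().parse` to list the placeholders a template expects and checks that none is left unprovided (the sample template is a shortened stand-in for `base_prompt`):

```python
from string import Formatter
from typing import Set


def placeholders(template: str) -> Set[str]:
    """Names of the {fields} that str.format will need for this template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


template = "Tools:\n{tools}\nUse one of [{tool_names}]\n{history}\nHuman: {input}\nAI:\n{agent_scratchpad}"
generated = {"tools", "tool_names", "agent_scratchpad"}  # filled inside CustomPromptTemplate.format
declared = {"input", "history"}                          # input_variables minus intermediate_steps

missing = placeholders(template) - generated - declared
print(sorted(placeholders(template)), missing)
```

An empty `missing` set means `str.format` will not raise a `KeyError` when the prompt is rendered.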


## 1.3. Creating the main agent and chain

In [10]:
from langchain.chains import LLMChain
from langchain.agents import LLMSingleActionAgent

# Main agent LLM: same Azure deployment as llm_conv, but with a higher temperature.
main_llm_model = AzureChatOpenAI(
    temperature=0.4,
    deployment_name=os.environ["AZURE_OPENAI_MODEL_3"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION_3"],
    openai_api_key=os.environ["AZURE_OPENAI_API_KEY_3"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT_3"],
)

# LLM chain consisting of the LLM and the ReAct prompt
llm_chain = LLMChain(llm=main_llm_model, prompt=memory_backed_prompt)

# Output parser that turns each LLM reply into an AgentAction or an AgentFinish
output_parser = CustomOutputParser()

# Names of the tools the agent is allowed to call
tool_names = [tool.name for tool in tools]

main_agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    # Stop before the model writes its own "Observation:"; the executor inserts the real tool output.
    stop=["\nObservation:"],
    # stop=["\nThought:"],
    # stop=["\nQuestion:"],
    allowed_tools=tool_names,
)
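`stop=["\nObservation:"]` makes the model stop generating as soon as it would start writing an observation itself, so observations always come from a real tool call. The stop sequence is applied by the model API; a sketch of the equivalent client-side truncation, with a hypothetical `truncate_at_stop` helper:

```python
from typing import Sequence


def truncate_at_stop(text: str, stop: Sequence[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence (the stop text is dropped)."""
    cut = len(text)
    for s in stop:
        pos = text.find(s)
        if pos != -1:
            cut = min(cut, pos)
    return text[:cut]


raw = "Thought: search it\nAction: Online Search\nAction Input: Samsung S8\nObservation: invented by the model"
print(truncate_at_stop(raw, ["\nObservation:"]))
```

After truncation, the greedy `Action Input` group in the output parser only sees the real tool input.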




# 2. Execution

In [11]:
from langchain.agents import AgentExecutor
import time

previous_conversation_memory = ConversationBufferWindowMemory(k=10)  # keep the last 10 exchanges
gpt_response = "Hi, how may I help you today"
previous_conversation_memory.save_context({"input": "Hi"}, {"output": gpt_response})

# Build the executor once; the memory object carries the conversation between turns.
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=main_agent,
    tools=tools,
    verbose=True,
    memory=previous_conversation_memory,
    # Use handle_parsing_errors="Check your output and make sure it conforms!" to customize
    # the message sent back to the model when its output cannot be parsed.
    handle_parsing_errors=True,
)

while True:
    try:
        user_input = input(gpt_response + "\n")
        if user_input.strip().lower() in {"exit", "quit"}:  # type exit or quit to leave the loop
            break

        # invoke() returns a dict; the answer text is under "output".
        # The executor saves the exchange to its memory automatically.
        result = agent_executor.invoke({"input": user_input})
        gpt_response = result["output"]

    except Exception as e:
        # e.g. network errors from Azure OpenAI or SerpAPI ("Connection error.")
        print(e)

        # Wait a moment before restarting the loop
        time.sleep(1)
        continue

Hi, how may I help you today
Samsung S8


[1m> Entering new AgentExecutor chain...[0m
Connection error.


KeyboardInterrupt: Interrupted by user
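`ConversationBufferWindowMemory(k=10)` keeps only the last 10 human/AI exchanges and renders them into `{history}` of the main prompt. A stdlib sketch of that sliding window with `collections.deque`; the `WindowMemory` class is illustrative (LangChain stores message objects, with `Human` and `AI` as the default prefixes), and the AI replies below are invented:

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemory:
    """Keep the last k (human, ai) exchanges and render them as a transcript."""

    def __init__(self, k: int = 10) -> None:
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))  # the oldest turn is dropped once k is exceeded

    def history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


memory = WindowMemory(k=2)
memory.save_context("Hi", "Hi, how may I help you today")
memory.save_context("Samsung S8", "Is it a mobile phone?")
memory.save_context("Yes", "Which network does it support?")
print(memory.history())
```

With `k=2` the greeting has already been dropped; with the notebook's `k=10`, long product interviews eventually lose their first answers the same way.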