# Introduction 

This notebook walks through the logic behind the chatbot application.

# Sections

- Section 1: High-Level Overview of the Application
- Section 2: Data
- Section 3: Models
- Section 4: Prompts
- Section 5: Application Logic

In [1]:
import sys
sys.path.append("../")
import importlib
import os
import pathlib
import shutil
import re
import action
import prompt
import numpy as np
import pandas as pd
from dotenv import load_dotenv
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.foundation_models.extensions.langchain import (
    WatsonxLLM,
)
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker
from hf_hub import HuggingFaceHubEmbeddings

# Section 1: High-Level Overview

This generative AI-powered application is made up of the following 4 core components:

- Input Data
- Models:
  - LLMs: Models (LLAMA, FLAN, GRANITE) that take text as input and return text as output. This is the "generative AI" component of our application.
  - Search Model: Models that help us search documents for information relevant to the customer question. In the pilot code below, we used simple semantic search. In production, Elasticsearch is recommended.
- Prompts: Texts provided as input to the LLMs, usually in the form of instructions.
- Application Logic: The code written to chain the different LLM inputs and outputs together to build a full-fledged application.

Displayed below is a diagram which depicts how the models, prompts, and application logic are tied together to form the full chatbot application. We've split the diagram into 2 solutions: the first is the main chat application, and the second is for FAQ extraction. The main solution is solution 1. 

![](arch4.jpg)

For solution 1, reading from left to right, here is how a user's question is processed by the GenAI application. 

* First, the user asks a question in Watson Assistant.
* The first prompt is the "routing prompt", which determines whether the user question should be answered from the transactional database or from unstructured data.
* The second prompt is the "property name prompt", which determines whether a property name is mentioned in the user question.
* Depending on the result of the first prompt, the logic diverges into one of two pipelines.
  * RAG pipeline: In the RAG pipeline, we find relevant documents using our search mechanism, then pass the documents and the third prompt (RAG prompt) to the LLM. The LLM uses the retrieved documents to answer the user query. We then use the fourth prompt ("custom response prompt") to find images that are relevant to the response obtained from the RAG prompt.
  * SQL pipeline: In the SQL pipeline, we use the fifth prompt (SQL prompt) to convert the user question to an SQL query, then we query the SQL database. Finally, the output of the SQL query is converted back to natural language using the "direct answer prompt".
* Finally, the output is relayed back to the user in Watson Assistant.
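
The flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not the application's code: `route`, `rag_pipeline`, `sql_pipeline`, and `answer` are hypothetical stand-ins, and a keyword rule takes the place of the LLM-based routing prompt.

```python
# Hypothetical stand-ins for the two pipelines. The real pipelines call LLMs,
# a vector store, and a SQL database.
def rag_pipeline(question: str) -> str:
    return f"[RAG] answer from unstructured documents for: {question}"


def sql_pipeline(question: str) -> str:
    return f"[SQL] answer from the transactional database for: {question}"


def route(question: str) -> str:
    """Toy replacement for the routing prompt: a keyword rule, not an LLM."""
    transactional_words = ("price", "available", "units", "status")
    if any(word in question.lower() for word in transactional_words):
        return "sql"
    return "rag"


def answer(question: str) -> str:
    # Pick a pipeline based on the routing decision, then run it.
    pipelines = {"rag": rag_pipeline, "sql": sql_pipeline}
    return pipelines[route(question)](question)


print(answer("What is the minimum price of available units?"))
print(answer("What facilities are near the property?"))
```

The dictionary lookup keeps the routing decision separate from the pipelines themselves, so a new pipeline only needs a new key.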


# Section 2: Input Data

In the pilot, we took as input some structured table data exported from Salesforce, and some unstructured text data about several UEM properties. In the cell below, we read that data and do some preprocessing.

For the structured data, we add the Excel data into an SQL database.
For the unstructured data, we read it from the text file and split it up into chunks.

Important Note: In the production case, the data ingestion process will look quite different.

In [2]:


# Structured Data
engine = create_engine("sqlite:///database.db", echo=False)
data = pd.read_excel("../../backend/data/sql/zig-minh-connaught-sample.xlsx", sheet_name=None)
for k, v in data.items():
    v.columns = [x.replace(" ", "_") for x in v.columns]
    table = k.split(" ")[0]
    v.to_sql(table, con=engine, index=False, if_exists="replace")
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

db = SessionLocal()
with db.connection().engine.connect() as conn:
    connaught = pd.read_sql(text("SELECT * from connaught"), conn)
    minh = pd.read_sql(text("SELECT * from minh"), conn)
    zig = pd.read_sql(text("SELECT * from zig"), conn)


# Unstructured Data
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
text_files = [
    "../data/qa/the-connaught-one.txt",
    "../data/qa/the-minh.txt",
    "../data/qa/residensi-zig.txt",
]
docs = [
    Document(
        # read_text opens and closes the file, so no handle is left open
        page_content=pathlib.Path(x).read_text(encoding="utf-8"),
        metadata={"filename": pathlib.Path(x).stem},
    )
    for x in text_files
]
docs = text_splitter.split_documents(docs)
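
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sketch of overlapping character windows. It is an illustration only: `chunk_text` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences rather than at fixed offsets.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration; not how RecursiveCharacterTextSplitter works internally.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, which is the role the 100-character overlap plays in the cell above: text that straddles a boundary still appears whole in at least one chunk.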

# Section 3.1: LLM Models

In the cell below, we instantiate the LLMs that we will use to build the application.
This is where we authenticate to watsonx and gain access to different LLMs, such as FLAN, LLAMA and GRANITE.
These models are the building blocks of our AI application.

In [3]:
import sys
sys.path.append("../")

import importlib
import os
import pathlib
import shutil

import numpy as np
import pandas as pd
import prompt
from dotenv import load_dotenv
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.foundation_models.extensions.langchain import (
    WatsonxLLM,
)
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from langchain.docstore.document import Document
from langchain.embeddings import HuggingFaceHubEmbeddings
from langchain.schema.embeddings import Embeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from sentence_transformers import SentenceTransformer
from sqlalchemy import create_engine, text

load_dotenv()

engine = create_engine("sqlite://", echo=False)

MODELS = [
    ModelTypes.FLAN_T5_XXL,
    ModelTypes.LLAMA_2_70B_CHAT,
    "meta-llama/llama-3-70b-instruct",
]

MODELS = {
    x: WatsonxLLM(
        model=Model(
            model_id=x,
            credentials={
                "apikey": os.getenv("IBM_API_KEY"),
                "url": "https://us-south.ml.cloud.ibm.com",
            },
            params={
                GenParams.DECODING_METHOD: "greedy",
                GenParams.MAX_NEW_TOKENS: 300,
                GenParams.TEMPERATURE: 0,
                GenParams.RANDOM_SEED: 12345,
                GenParams.STOP_SEQUENCES: ["\n\n"],
            },
            project_id=os.getenv("PROJECT_ID"),
        )
    )
    for x in MODELS
}




In [4]:
# Example: call a model directly with a plain-text prompt
print(MODELS[ModelTypes.FLAN_T5_XXL]("hello how are you?"))



i am fine


# Section 3.2: Search Model

In the pilot, we used simple semantic search with a Hugging Face embedding model, as shown below. In production, Elasticsearch is recommended.

In [5]:
embeddings = HuggingFaceHubEmbeddings(
    model="sentence-transformers/all-mpnet-base-v2",
    task="feature-extraction",
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
)

vdb = FAISS.from_documents(docs, embeddings)
vdb.save_local("../../backend/vdb")
vdb = FAISS.load_local("../../backend/vdb", embeddings, allow_dangerous_deserialization=True)
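
The idea behind semantic search can be shown with toy vectors: embed the query and each document, then rank documents by similarity. The sketch below is an assumption-laden illustration, not the pilot's implementation. The three-dimensional vectors are invented, `cosine_similarity` and `top_k` are hypothetical helpers, and cosine similarity is chosen for clarity; the FAISS index above may use a different distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Invented toy "embeddings"; real sentence embeddings have hundreds of dimensions.
doc_vecs = {
    "facilities": [0.9, 0.1, 0.0],
    "location": [0.1, 0.9, 0.1],
    "pricing": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]
print(top_k(query_vec, doc_vecs, k=2))
```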

# Section 4: Defining the Prompts

The prompts are currently maintained in prompt.py, which you can find in notebook/prompt.py.

In the cell below, we import and display the RAG prompt.

You can explore the other prompts by importing them in the same way.

The prompts are described in Section 1, so the descriptions are not repeated here.

In [6]:
import prompt

print("RAG prompt: \n", prompt.QUESTION_TEMPLATE, "\n************************\n")

RAG prompt: 
 
Context information is below.
---------------------
{{context}}
---------------------

Given only the context information and no prior knowledge, answer the query in a brief and concise manner using only one sentence.  

Avoid statements like 'Based on the context, ...' or 'According to the provided context ...', or anything along those lines.

If you don't know the answer to a query, say "I do not know".

If the user did not ask a question, you should reply accordingly in a conversational manner. 

Query: {{question}} 
 
Response: 

 
************************
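
The RAG prompt above contains two placeholders, `{{context}}` and `{{question}}`. A minimal sketch of how such a template could be filled is shown here. `TEMPLATE` is a shortened, hypothetical copy of the real prompt, `fill_template` is a hypothetical helper, and joining the retrieved chunks with blank lines is an assumption about how the context is assembled.

```python
# Shortened, hypothetical template; the real one is prompt.QUESTION_TEMPLATE.
TEMPLATE = """Context information is below.
---------------------
{{context}}
---------------------
Query: {{question}}
Response: """


def fill_template(template: str, context_chunks: list[str], question: str) -> str:
    """Substitute the {{context}} and {{question}} placeholders."""
    context = "\n\n".join(context_chunks)
    return template.replace("{{context}}", context).replace("{{question}}", question)


filled = fill_template(
    TEMPLATE,
    ["Placeholder chunk one.", "Placeholder chunk two."],
    "What does chunk one say?",
)
print(filled)
```

Plain string replacement avoids clashes with `str.format`, which would treat the double braces as escaped literals rather than placeholders.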



# Section 5: Application Logic

The primary application logic uses several "action" functions, which are imported from action.py. You can think of the "action" functions as the types of questions that the chatbot can handle. Recall from the architecture diagram in Section 1 that, depending on the routing prompt output, we follow either the "RAG pipeline" or the "SQL pipeline". The 2 primary action functions, "property_specific_general_query" and "transactional_query", define the RAG pipeline and the SQL pipeline respectively.

Note that we do not yet have a function, or the input data, to handle non-property-specific RAG queries. This case is implemented as a placeholder function called "general_query", which you can find in action.py.

![](arch5.jpg)

In [7]:
import action
import inspect

print("SQL PIPELINE: transactional_query")
print("********************************************************************************************************************")
print(inspect.getsource(action.transactional_query))
print("********************************************************************************************************************")

print("RAG PIPELINE: property_specific_general_query")
print("********************************************************************************************************************")
print(inspect.getsource(action.property_specific_general_query))
print("********************************************************************************************************************")

SQL PIPELINE: transactional_query
********************************************************************************************************************
def transactional_query(params: ActionParams):
    question = params["question"]
    models = params["models"]
    db = params["db"]
    # if params["property"] is None:
    #     return {
    #         "generated_text": "Sounds like you're asking a question about a property. Kindly specify a valid property name so that I can answer this question correctly.",
    #         "custom_response": {},
    #     }
    if params["property"] is not None and params["detected_property_name"] == "NONE":
        prop_replace = params["property"]
        question = f"For {prop_replace}, {question}"
    prompt = build_prompt(
        SQL_TEMPLATE.replace("{{question}}", question), SQL_SYSTEM_PROMPT
    )
    sql = models[ModelTypes.LLAMA_2_70B_CHAT](prompt).strip()
    if "```" in sql:
        sql = re.search("```\n([\S\s]*);\n", sql).group(1)
    if "

In the cell below, you can see the "generate" function which ties the data, models, actions, and prompts into a GenAI Q&A bot.

In [8]:
from action import ACTIONS
from prompt import DEFAULT_SYSTEM_PROMPT, ROUTING_TEMPLATE, build_prompt, PROPERTY_TEMPLATE

    
def generate(generate_request):
    # Shared resources passed to every action function.
    params = {
        "models": MODELS,
        "db": db,
        "vdb": vdb,
        "property": None,
    }
    k_docs = generate_request["k_docs"]
    wa_property = generate_request["current_page"]
    question = generate_request["question"]

    # Routing prompt: the model returns the number of the action to run.
    prompt = ROUTING_TEMPLATE.replace("{{question}}", question)
    action_output = int(params["models"][ModelTypes.FLAN_T5_XXL](prompt).strip())

    if action_output in [1, 3]:
        # Property name prompt: returns the property named in the question, or "NONE".
        property = None
        property_name = params["models"][ModelTypes.FLAN_T5_XXL](
            PROPERTY_TEMPLATE.replace("{{question}}", question)
        ).strip()
        # Default to the page the user is viewing; a property named in the
        # question takes precedence.
        if wa_property is not None:
            property = wa_property
        if property_name != "NONE":
            property = property_name
        params.update(
            {
                "question": question,
                "property": property,
                "detected_property_name": property_name,
            }
        )
    else:
        params.update({"question": question, "property": None, "detected_property_name": "NONE"})

    print(f"Question: {question} - [Action: {action_output}]")
    # Dispatch to the selected action function and unpack its two outputs.
    generated_text, custom_response = ACTIONS[action_output](params).values()
    return {"generated_text": generated_text, "custom_response": custom_response}
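
The `generate` function converts the routing model's raw text to an integer with `int(...)`, which raises an error if the model returns anything other than a bare number. One possible way to make that step more tolerant is sketched below. `parse_action` is a hypothetical helper that is not part of the application, and the set of valid action ids is assumed for illustration.

```python
import re
from typing import Optional


def parse_action(raw_output: str, valid_actions: set[int]) -> Optional[int]:
    """Return the first integer in the model output if it is a known action id."""
    match = re.search(r"\d+", raw_output)
    if match is None:
        return None
    action_id = int(match.group())
    return action_id if action_id in valid_actions else None


VALID = {1, 2, 3}  # assumed action ids, for illustration only
print(parse_action(" 1 ", VALID))        # bare number with whitespace
print(parse_action("Action: 3", VALID))  # number embedded in text
print(parse_action("unsure", VALID))     # no number at all
print(parse_action("7", VALID))          # number outside the valid set
```

A caller could treat a `None` result as a signal to fall back to a default action or to ask the user to rephrase.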

In [13]:
# Test the pipeline on a sample question
wa_request = {
    "k_docs": 3,
    "current_page": "the-minh",
    "question": "Give me the minimum price of available units?"
}
generate(wa_request)

Question: Give me the minimum price of available units? - [Action: 1]
Generated SQL Query: SELECT min(Price)
FROM minh
WHERE Status = 'Available';


{'generated_text': 'The MINH has a minimum price of 1,404,800.',
 'custom_response': {}}
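
The generated query in the output above can be tried in isolation against an in-memory SQLite database. This is a minimal sketch under stated assumptions: the `Price` and `Status` columns come from the query itself, while the `Unit` column and the three rows are invented toy data. `run_read_only` is a hypothetical helper, not part of action.py, and its SELECT-only check is a simple illustrative guard rather than a complete defense against unsafe SQL.

```python
import sqlite3

# In-memory database with invented toy rows (not the pilot data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE minh (Unit TEXT, Price REAL, Status TEXT)")
conn.executemany(
    "INSERT INTO minh VALUES (?, ?, ?)",
    [
        ("A-01", 500.0, "Available"),
        ("A-02", 300.0, "Sold"),
        ("A-03", 450.0, "Available"),
    ],
)

generated_sql = "SELECT min(Price) FROM minh WHERE Status = 'Available';"


def run_read_only(connection: sqlite3.Connection, sql: str) -> list:
    """Run a single SELECT statement and return all rows; reject anything else."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return connection.execute(sql).fetchall()


rows = run_read_only(conn, generated_sql)
print(rows)
```

The sold unit has the lowest price in the toy table, but the `WHERE` clause excludes it, so the query returns the cheapest unit that is still available.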