# Document Summarization

## Introduction 
This demo showcases a chatbot system powered by Generative AI (OpenAI). Using technologies like <b>RAG, LangChain, and LLMs</b>, users can ask questions in plain language, retrieve relevant data, and receive concise answers. The approach integrates retrieval-based and generative techniques to deliver accurate, user-friendly insights from structured sources.

Additionally, we will use Teradata Vantage as the vector store.

The following diagram illustrates the overall architecture.

<center><img src="images/header_chat_td.png" alt="architecture" /></center>

# Steps in the analysis
1. Configure the environment  
2. Connect to Vantage  
3. Data Exploration  
4. Generate the embeddings  
5. Load the existing embeddings to DB  
6. Calculate the VectorDistance using Teradata Vantage in-DB function  
7. LLM  
8. Chat with documents  
9. Cleanup  
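
The steps above follow the usual retrieval-augmented generation flow: embed the document chunks, store the vectors, rank them by distance to the question, and pass the closest chunks to the LLM as context. The following is a minimal, self-contained sketch of that flow, not the implementation used in this notebook. It rests on several assumptions: the bag-of-words `embed` function stands in for a real embedding model, a plain Python list stands in for the Vantage table, the example chunks are made up, and the names `embed`, `cosine_distance`, `top_k`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lower-cased word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def top_k(question, chunks, k=2):
    """Rank chunks by distance to the question and keep the k closest."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine_distance(q, embed(chunk)))
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    context = "\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up example chunks for illustration, not text from the policy
chunks = [
    "medical expenses are covered up to the policy limit",
    "loss of luggage must be reported within 24 hours",
    "trip cancellation is covered before departure",
]
question = "is loss of luggage covered"
prompt = build_prompt(question, top_k(question, chunks, k=1))
print(prompt)
```

In the notebook itself, the embedding and distance steps are delegated to a real embedding model and to the in-database distance function, so only the overall shape of this sketch carries over.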

# Configure the environment

In [1]:
!pip install --upgrade -r requirements.txt --quiet

Import the required libraries.

In [6]:
import os
import timeit
import tqdm
from tqdm.notebook import *

tqdm_notebook.pandas()

# teradata lib
from teradataml import *

# helper functions
from utils.sql_helper_func import *
from utils.tdapiclient_helper_func import *

# LLM
from langchain.chat_models import ChatOpenAI
from langchain.schema import StrOutputParser
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

from dotenv import load_dotenv

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")
display.max_rows = 5

# Connect to Vantage

The connection details are read from a `.env` file through the environment variables `TD_USERNAME`, `TD_PW`, and `TD_HOST`, so no interactive password prompt is needed.

In [13]:
load_dotenv()

input_username = os.getenv('TD_USERNAME')
input_password = os.getenv('TD_PW')
input_host = os.getenv('TD_HOST')

print(input_username)
print(input_host)  # the password is intentionally not printed

demo_user
ruvendataiku2-bglgq0q0y78bcvsk.env.clearscape.teradata.com


In [11]:
eng = create_context(host=input_host, username=input_username, password=input_password)
print(eng)
execute_sql('''SET query_band='DEMO= Chat_with_docs_VantageDB_GenAI_Python.ipynb;' UPDATE FOR SESSION;''')

Engine(teradatasql://demo_user:***@ruvendataiku2-bglgq0q0y78bcvsk.env.clearscape.teradata.com)


TeradataCursor uRowsHandle=41 bClosed=False
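
Because the connection depends on values read from the environment, it can help to fail early when one is missing and to keep secrets out of the cell output. The sketch below shows one way to do that with the standard library only. The helper names `read_required_env` and `mask_secret` are hypothetical and not part of `teradataml` or `python-dotenv`, and the example values are placeholders.

```python
import os


def read_required_env(names, env=None):
    """Return {name: value} for every variable, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


def mask_secret(value, visible=2):
    """Keep only the first `visible` characters so logs never show the full secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


# Example with an explicit dict instead of the real environment (placeholder values)
example_env = {
    "TD_USERNAME": "demo_user",
    "TD_PW": "not-a-real-password",
    "TD_HOST": "example.host",
}
creds = read_required_env(["TD_USERNAME", "TD_PW", "TD_HOST"], env=example_env)
print(creds["TD_USERNAME"], mask_secret(creds["TD_PW"]), creds["TD_HOST"])
```

Calling `read_required_env` without the `env` argument reads from `os.environ`, which is where `load_dotenv()` places the values from the `.env` file.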

Load the OpenAI API key.

In [14]:
api_key = os.getenv('OPENAI_API_KEY')
# Confirm the key was loaded without printing the secret itself
print("OPENAI_API_KEY loaded:", bool(api_key))

OPENAI_API_KEY loaded: True


# Data Exploration

This notebook demonstrates how to interact with documentation, such as an insurance policy, using an LLM.

The SmartTraveller Easy Single Trip - International insurance policy is a comprehensive travel insurance plan that covers a wide range of risks, including medical expenses, trip cancellation, loss of luggage, and personal accident. The policy is designed to be affordable and flexible, and it can be purchased online or over the phone.

The source data from [AXA](https://axa-com-my.cdn.axa-contento-118412.eu/axa-com-my/3d2f84a5-42b9-459b-911a-710546df0633_Policy+wording+-+SmartTraveller+Easy+Single+Trip+-+International+%280820%29.pdf) is loaded into the vector store.
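
Before a document goes into a vector store, it is usually split into chunks small enough to embed, often with some overlap so that sentences cut at a boundary still appear whole in one chunk. The notebook imports `RecursiveCharacterTextSplitter` for this purpose. The sketch below is a simplified, character-based stand-in that only illustrates the idea: the function name `split_with_overlap` is hypothetical, and the default `chunk_size` and `overlap` values are arbitrary assumptions.

```python
def split_with_overlap(text, chunk_size=200, overlap=40):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Tiny example so the overlap is easy to see
for chunk in split_with_overlap("abcdefghij", chunk_size=4, overlap=2):
    print(chunk)
```

A real splitter would also prefer to cut at paragraph or sentence boundaries rather than at fixed character offsets.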

Now, let's use the `PyMuPDFLoader` class to read the PDF document and split it into pages.

In [16]:
from langchain_community.document_loaders import PyMuPDFLoader

pages = PyMuPDFLoader("data/SmartTraveller_International.pdf").load_and_split()
print(pages[2].page_content)

page 2 
 
Area 3 (Overseas Only): Worldwide EXCLUDING Iran, Syria, Belarus, Cuba, Democratic Republic of Congo, North 
Korea, Somalia, Sudan, South Sudan, Crimea (including Sevastopol), Russia, Ukraine, Zimbabwe and Malaysia. 
 
OPERATION OF INSURANCE  
Save for Benefit 6B (Loss of Deposit or Cancellation) of this policy, the Cover provided by this policy is for the Period of 
Insurance and commences when the Insured Person leaves his/her place of residence or business in Malaysia (whichever 
is later) to commence the trip until the time of the Insured Person’s return to his/her place of residence or business in 
Malaysia on completion of the trip. 
For Benefit 6B, the Cover is effective upon the issuance of the Policy Schedule and terminates on commencement of the 
Insured Person’s trip.  
 
AUTOMATIC EXTENSION OF COVERAGE 
In the event of delay beyond the Insured Person’s control as a ticket holding passenger on a scheduled Common Carrier 
as a result of: 
1. 
the Insured Person’s Se