# Enhancing Customer Support with Generative AI: Applying RAG using CrateDB and LangChain

Retrieval-Augmented Generation (RAG) combines a retrieval system, which fetches
relevant documents, with a generative model. This allows the model to incorporate
external knowledge and produce more accurate, better-informed responses.

It is particularly effective for tasks like question answering, customer support,
and any application where referencing external data can enhance the quality of the
output.
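The retrieve-then-generate flow described above can be sketched without any dependencies. This is not how LangChain or CrateDB implement retrieval. A bag-of-words cosine similarity stands in for a real embedding model, the corpus strings are hypothetical placeholders, and the LLM call is omitted, so the sketch stops at the augmented prompt that would be sent to the model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Retrieval step: return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_docs: list[str]) -> str:
    """Augmentation step: place the retrieved text in front of the question."""
    context = "\n---\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical placeholder corpus, not taken from the dataset.
corpus = [
    "placeholder answer about updating the shipping address of an order",
    "placeholder answer about redeeming points on a console",
    "placeholder answer about resetting an account password",
]
question = "How to update shipping address on existing order?"
prompt = build_prompt(question, retrieve(question, corpus, k=1))
print(prompt)
```

In the real pipeline, the embedding model, the vector store, and the LLM replace `embed`, `retrieve`, and the final model call, but the shape of the flow stays the same.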

This notebook illustrates a RAG implementation for a customer support scenario.
The dataset is a collection of customer support interactions on Twitter related
to Microsoft products and services.

It is derived from [Customer Support on Twitter], a modern corpus of tweets and
replies published on Kaggle.

[Customer Support on Twitter]: https://www.kaggle.com/datasets/thoughtvector/customer-support-on-twitter

## What is CrateDB?

CrateDB is an open-source, distributed, and scalable SQL analytics database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is wire-compatible with PostgreSQL, based on Lucene, and inherits the shared-nothing distribution layer of Elasticsearch.

Combining RAG with CrateDB's vector store support provides a powerful framework for building sophisticated AI applications. CrateDB can store and manage the vector representations of data, which the RAG retrieval system can then utilize to fetch relevant information. Using vector search, CrateDB can quickly identify the most similar items in a large dataset based on their vector representations.
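The core idea of vector search can be illustrated with a brute-force nearest-neighbor search. This sketch assumes Euclidean distance (the metric this notebook refers to later) and scans every vector. As far as we know, CrateDB instead relies on Lucene's approximate nearest-neighbor (HNSW) index, so treat this only as an illustration of the result, not of CrateDB's internals. The vectors are hypothetical.

```python
import math


def knn(query: list[float], vectors: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors closest to `query`."""
    # Compute the Euclidean distance from the query to every stored vector.
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    # Smaller distance means more similar, so sort ascending.
    scored.sort(key=lambda item: item[1])
    return scored[:k]


vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
print(knn([0.0, 0.1], vectors, k=2))
```

A full scan costs one distance computation per stored vector, which is why databases use approximate indexes once the collection grows large.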


This notebook shows how to use CrateDB's vector store functionality to build a retrieval-augmented generation (RAG) pipeline. To implement RAG, we use the CrateDB Python client driver and the CrateDB vector store support in LangChain.

## What is LangChain?

LangChain is an open-source Python library designed to facilitate the creation and deployment of language model chains, particularly in the context of Generative AI. It provides tools for integrating the components of language model applications, such as retrieval systems, transformers, and custom processing steps.



## Getting Started
CrateDB has supported storing vectors since version 5.5. You can use the fully managed CrateDB Cloud service, or run CrateDB yourself, for example using Docker:

```shell
docker run --publish 4200:4200 --publish 5432:5432 --pull=always crate:latest -Cdiscovery.type=single-node
```
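Once the container is running, port 4200 serves CrateDB's HTTP endpoint and port 5432 the PostgreSQL wire protocol. As a quick, dependency-free smoke test of the vector support, the sketch below posts SQL statements to the HTTP `_sql` endpoint using only the Python standard library. The table name `demo_vectors` is hypothetical, and the demo assumes a local instance that accepts requests without credentials, like the local setup used in this notebook. Network calls only happen when you call `demo()`.

```python
import json
import urllib.request

# Assumption: the local Docker container started above.
CRATEDB_SQL_URL = "http://localhost:4200/_sql"


def sql_request(stmt, args=None, url=CRATEDB_SQL_URL):
    """Build a POST request for CrateDB's HTTP `_sql` endpoint."""
    body = {"stmt": stmt}
    if args is not None:
        body["args"] = args
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(stmt, args=None):
    """Execute one statement and return the decoded JSON response."""
    with urllib.request.urlopen(sql_request(stmt, args)) as resp:
        return json.loads(resp.read())


def demo():
    """Create a hypothetical table, insert two vectors, and run a k-NN query."""
    # FLOAT_VECTOR requires CrateDB 5.5 or later.
    run_sql("CREATE TABLE IF NOT EXISTS demo_vectors (id INTEGER, v FLOAT_VECTOR(2))")
    run_sql("INSERT INTO demo_vectors (id, v) VALUES (1, [0.0, 0.0]), (2, [5.0, 5.0])")
    # Make the new rows visible to queries immediately.
    run_sql("REFRESH TABLE demo_vectors")
    # KNN_MATCH(column, query_vector, k) selects the k nearest rows; _score ranks them.
    result = run_sql(
        "SELECT id, _score FROM demo_vectors "
        "WHERE KNN_MATCH(v, [0.1, 0.0], 1) ORDER BY _score DESC"
    )
    return result["rows"]
```

Calling `demo()` against a running instance should return a single row for the vector closest to `[0.1, 0.0]`.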

## Setup

Install the required Python packages and import the Python modules.

In [None]:
#!pip install -r requirements.txt

# Note: If you are running in an environment like Google Colab, please use the absolute path of the requirements:
#!pip install -r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt

In [1]:
import os

import openai
import pandas as pd
import warnings
import requests

from pueblo.util.environ import getenvpass
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import CSVLoader
from langchain_community.vectorstores import CrateDBVectorSearch

warnings.filterwarnings('ignore')

### Configure database settings

This notebook will connect to a CrateDB server instance running on localhost. You can start a sandbox instance on your workstation by running [CrateDB using Docker]. Alternatively, you can also connect to a cluster running on [CrateDB Cloud].

[CrateDB Cloud]: https://console.cratedb.cloud/
[CrateDB using Docker]: https://crate.io/docs/crate/tutorials/en/latest/basic/index.html#docker

In [10]:
# Define the connection string to a running CrateDB instance.
CONNECTION_STRING = os.environ.get(
    "CRATEDB_CONNECTION_STRING",
    "crate://crate@localhost/?schema=openai",
)

# Connect to CrateDB Cloud.
# CONNECTION_STRING = os.environ.get(
#     "CRATEDB_CONNECTION_STRING",
#     "crate://username:password@hostname/?ssl=true&schema=notebook",
# )

# Define the store collection to use for this notebook session.
COLLECTION_NAME = "customer_data"
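The connection string is a URL that the CrateDB SQLAlchemy dialect interprets: the scheme selects the dialect, user credentials and host precede the path, and options such as `schema` and `ssl` are passed as query parameters. To see which part is which, the sketch below takes both URLs from the cell above apart with Python's `urllib.parse`. `describe_connection_string` is a hypothetical helper and does not check anything the driver itself would reject.

```python
from urllib.parse import parse_qs, urlsplit


def describe_connection_string(conn: str) -> dict:
    """Split a crate:// connection URL into its components."""
    parts = urlsplit(conn)
    query = parse_qs(parts.query)
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        # None means no explicit port; the driver falls back to its default.
        "port": parts.port,
        "schema": query.get("schema", [None])[0],
        "ssl": query.get("ssl", ["false"])[0] == "true",
    }


print(describe_connection_string("crate://crate@localhost/?schema=openai"))
```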

## Inspect the dataset

To illustrate the dataset, the next code snippets load it into a pandas DataFrame, display the first few rows,
and show basic information such as the number of entries, column names, and data types.

In [3]:
url = 'https://github.com/crate/cratedb-datasets/raw/main/machine-learning/fulltext/twitter_support_microsoft.csv'
dataset = 'twitter_support.csv'

# Download the dataset and fail early on HTTP errors.
r = requests.get(url)
r.raise_for_status()
with open(dataset, 'wb') as f:
    f.write(r.content)


pd.set_option('display.max_columns', 5)
df = pd.read_csv(dataset)

# Display the first few rows of the DataFrame
print(df.head(5))

   tweet_id       author_id  ...  response_tweet_id in_response_to_tweet_id
0      2301          116231  ...               2299                  2306.0
1     11879  MicrosoftHelps  ...                NaN                 11877.0
2     11881  MicrosoftHelps  ...              11878                 11882.0
3     11890          118332  ...              11889                     NaN
4     11912  MicrosoftHelps  ...                NaN                 11911.0

[5 rows x 7 columns]


In [4]:
# Display basic information about the DataFrame
print("\nDataFrame Info:")
df.info()


DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 142 entries, 0 to 141
Data columns (total 7 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   tweet_id                 142 non-null    int64  
 1   author_id                142 non-null    object 
 2   inbound                  142 non-null    bool   
 3   created_at               142 non-null    object 
 4   text                     142 non-null    object 
 5   response_tweet_id        92 non-null     object 
 6   in_response_to_tweet_id  125 non-null    float64
dtypes: bool(1), float64(1), int64(1), object(4)
memory usage: 6.9+ KB


## RAG implementation with OpenAI and CrateDB

### Configure OpenAI

In [5]:
getenvpass("OPENAI_API_KEY", prompt="OpenAI API key:")
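`getenvpass` comes from the `pueblo` package. The sketch below does not reproduce its implementation; it is a hypothetical standard-library function with the behavior this cell relies on, assuming `getenvpass` works this way: reuse `OPENAI_API_KEY` if it is already set, otherwise prompt for it without echoing and store it in `os.environ`, where the OpenAI client looks for it by default.

```python
import os
from getpass import getpass


def getenv_or_prompt(name: str, prompt: str) -> str:
    """Hypothetical stand-in for getenvpass: reuse or ask for a secret."""
    value = os.environ.get(name)
    if not value:
        # Prompt without echoing the secret to the notebook output.
        value = getpass(prompt)
        # Export it so libraries reading the environment can find it.
        os.environ[name] = value
    return value
```

Setting the variable before starting Jupyter avoids the prompt entirely.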

### Create embeddings from dataset

We use the `CSVLoader` class to load the support tickets from Twitter. The next step initializes a vector search store in CrateDB using embeddings generated by an OpenAI model. This creates the database tables that store the embeddings and associates them with the given collection name. Make sure the collection name is unique and that the database user has permission to create tables.

In [6]:
loader = CSVLoader(file_path=dataset, encoding="utf-8", csv_args={'delimiter': ','})
data = loader.load()
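It helps to know what the loader produces, because that text is what gets embedded. As far as we understand `CSVLoader`, it turns every CSV row into one document whose `page_content` contains one `column: value` line per column, with the file path and row number as metadata. The sketch below approximates that with the standard `csv` module and plain dictionaries; `rows_to_documents` and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_documents(csv_text: str, source: str) -> list[dict]:
    """Approximate CSVLoader output: one document per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=",")
    docs = []
    for i, row in enumerate(reader):
        # One "column: value" line per column, in column order.
        content = "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())
        docs.append({"page_content": content, "metadata": {"source": source, "row": i}})
    return docs


# Hypothetical sample rows, not taken from the dataset.
sample = "tweet_id,author_id,text\n1,alice,My order is late\n2,support,Sorry to hear that\n"
docs = rows_to_documents(sample, "twitter_support.csv")
print(docs[0]["page_content"])
```

One consequence is that identifiers such as `tweet_id` end up in the embedded text alongside the tweet itself.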

In [None]:
embeddings = OpenAIEmbeddings()

store = CrateDBVectorSearch.from_documents(
    embedding=embeddings,
    documents=data,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
)

### Ask question
Let's define our question:

In [None]:
my_question = "How to update shipping address on existing order in Microsoft Store?"

# Alternative question.
# my_question = "I can not make purchase on Xbox for fifa points, what to do?"

### Find relevant context using similarity search

The following step performs a similarity search over the stored documents based on the given question. The search uses Euclidean distance to find similar vectors and compute the score. It returns a list of documents (`docs_with_score`) along with their corresponding similarity scores.

The code then iterates over these results and adds the page content of each document (`doc`) to the list of relevant documents.
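Distances and scores point in opposite directions: a smaller distance means a more similar vector, while a larger score does. Lucene's Euclidean similarity, which CrateDB's vector search builds on, maps the squared distance d² to 1 / (1 + d²), which keeps scores in the range (0, 1] and preserves the ranking. The sketch below assumes the scores in `docs_with_score` follow this scheme; the candidate names and vectors are hypothetical.

```python
def euclidean_score(a: list[float], b: list[float]) -> float:
    """Lucene-style Euclidean similarity: 1 / (1 + squared distance)."""
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared)


query = [0.0, 0.0]
# Hypothetical document vectors.
candidates = {"doc-a": [0.0, 1.0], "doc-b": [3.0, 4.0], "doc-c": [0.0, 0.0]}

# Higher score means more similar, so sort descending.
ranked = sorted(candidates, key=lambda name: euclidean_score(query, candidates[name]), reverse=True)
print(ranked)
```

An identical vector scores exactly 1.0, and the score falls toward 0 as the distance grows.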

In [None]:
def return_documents(store, question):
    # Retrieve documents similar to the user question, along with their scores.
    docs_with_score = store.similarity_search_with_score(question)

    # Keep only the page content of each document; the score is not needed here.
    return [doc.page_content for doc, _score in docs_with_score]

documents = return_documents(store, my_question)

### Augment system prompt and query LLM

In the final step, we create an interactive chatbot scenario in which GPT-3.5-turbo serves as a customer support assistant, using the retrieved documents as its knowledge base to answer questions about Microsoft products and services. The documents are joined into a single context string, which represents the information the model has available to answer customer questions. A `system_prompt` is then constructed, instructing the model that it is a customer support expert specializing in Microsoft products and services. The prompt also specifies that if the answer to a question isn't in the provided context, the model should respond with "I don't know."


In [None]:
def create_prompt(documents):
    # Join the retrieved documents into one context string, separated by '---' lines.
    context = '\n---\n'.join(documents)

    system_prompt = f"""
    You are a customer support expert and get questions about Microsoft products and services.
    To answer the question, use the information from the context. Remove new line characters from the answer.
    If you don't find the relevant information in the context, say "I don't know".

    Context:
    {context}"""

    return system_prompt

In [None]:
system_prompt = create_prompt(documents)

To answer the question, we send the system prompt and the user question to GPT-3.5 (or another LLM) through the OpenAI Chat Completions API:

In [None]:
chat_completion = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": my_question},
    ],
)

Finally, to access the content of the response message generated by the OpenAI model, we read:

In [None]:
chat_completion.choices[0].message.content

## CrateDB 🧡 Open Source

In the second part of this notebook, we build a Retrieval-Augmented Generation (RAG) system using CrateDB with Jina AI embeddings and an open-source LLM served locally with llamafile.

### Configure Jina AI API 

Jina AI embeddings are a suite of open-source models, also available on Hugging Face, designed for encoding text into high-dimensional vectors. These vectors can be used in natural language processing tasks such as semantic search, text classification, and clustering. The models are based on the BERT architecture and are tuned for longer sequences of up to 8192 tokens, notably more than the 512 tokens the standard BERT model can handle. In this notebook, the embeddings are computed through the Jina AI API, using LangChain's `JinaEmbeddings` class.

To use Jina AI embeddings, you need a Jina AI API key, which you can obtain by signing up for an account on the Jina AI website.

In [None]:
from langchain_community.embeddings import JinaEmbeddings

getenvpass("JINA_API_KEY", prompt="Jina API key:")
embeddings = JinaEmbeddings()

### Initialize new embedding store

The next step initializes a vector search store in CrateDB using embeddings generated by Jina AI models and returns relevant documents using similarity search.

In [None]:
CONNECTION_STRING = os.environ.get(
    "CRATEDB_CONNECTION_STRING",
    "crate://crate@localhost/?schema=jina",
)

COLLECTION_NAME = "customer_data_jina"

store = CrateDBVectorSearch.from_documents(
    embedding=embeddings,
    documents=data,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
)
documents = return_documents(store, my_question)
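Embeddings from different models live in unrelated vector spaces and often have different dimensions, so vectors from the OpenAI model and the Jina AI model should not be mixed in one collection. This is presumably why this cell uses a separate schema and collection name. A small guard like the hypothetical `check_dimensions` below catches such a mix-up early.

```python
def check_dimensions(vectors: list[list[float]], expected_dim: int) -> None:
    """Raise ValueError if any vector does not have the expected dimension."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(f"vector {i} has {len(vec)} dimensions, expected {expected_dim}")
```

In the notebook, you could obtain the expected dimension with `len(embeddings.embed_query("probe"))`, which calls the embedding API once.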

### Connect to a local LLM 

One of the easiest ways to run a local LLM is to download one of the models from the [Llamafile repository](https://github.com/Mozilla-Ocho/llamafile#llamafile) and run it on your machine. Once started, a llamafile sets up not only a web-based chat interface at http://127.0.0.1:8080/, but also a chat completions endpoint that is compatible with the OpenAI API.

In this example, we use the Phi-2 model, which is about 2 GB in size.

In [None]:
from openai import OpenAI

# The local server does not check the API key, but the client requires a value.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required",
)
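Because the llamafile endpoint speaks the OpenAI chat completions protocol, the `openai` package is not strictly required. The sketch below sends the same kind of request using only the standard library. It assumes the llamafile server is listening on `localhost:8080`, and `chat_request`, `extract_answer`, and `ask` are hypothetical helpers. The answer is read from the same `choices[0].message.content` path used elsewhere in this notebook.

```python
import json
import urllib.request

# Assumption: a llamafile server running locally, as described above.
LLAMAFILE_URL = "http://localhost:8080/v1/chat/completions"


def chat_request(system_prompt: str, question: str, model: str = "LLaMA_CPP") -> urllib.request.Request:
    """Build an OpenAI-style chat completions request."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    }
    return urllib.request.Request(
        LLAMAFILE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer sk-no-key-required"},
        method="POST",
    )


def extract_answer(response: dict) -> str:
    """Same field path as completion.choices[0].message.content."""
    return response["choices"][0]["message"]["content"]


def ask(system_prompt: str, question: str) -> str:
    """Send the request and return the model's answer (needs the running server)."""
    with urllib.request.urlopen(chat_request(system_prompt, question)) as resp:
        return extract_answer(json.loads(resp.read()))
```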

Finally, we create a system prompt using the previously defined `create_prompt` function and send the question to the locally served model, addressed as "LLaMA_CPP".

In [None]:
system_prompt = create_prompt(documents)

completion = client.chat.completions.create(
    model="LLaMA_CPP",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": my_question}
    ]
)

Once the request completes, we extract the response message generated by the model.

In [None]:
completion.choices[0].message.content