# Power your test copilot with platform data

This walkthrough shows how to set up your copilot environment.

It is laid out in these sections:
- **Setup Variables:** 
    - First, we initialise the variables and constants used throughout the walkthrough
- **Data Ingestion:**
    - We will set up the vector database to accept vectors and data
    - We will load the dataset, chunk the data up for embedding and store it in the vector database
- **Search Engine:**
    - We will add a retrieval mechanism where users provide queries and we return the most relevant entries
    - We will use prompt engineering so that GPT turns the retrieved entries into refined answers
- **Building the copilot:**
    - Setting up an assistant class to manage context and interaction
    - We will set up semantic search context so the bot can answer questions

In [70]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Setup Variables

First, we'll set up our libraries and environment variables.

In [71]:
import openai
import os
import requests
import numpy as np
import pandas as pd
from typing import Iterator
import tiktoken
import textract
from numpy import array, average

from storageClient import get_redis_connection

# Set our default models and chunking size
from config import COMPLETIONS_MODEL, EMBEDDINGS_MODEL, CHAT_MODEL, TEXT_EMBEDDING_CHUNK_SIZE, VECTOR_FIELD_NAME

# Ignore unclosed SSL socket warnings (optional, in case you see them)
# Unclosed-socket warnings are raised as ResourceWarning, not ImportWarning
import warnings

warnings.filterwarnings(action="ignore", message="unclosed", category=ResourceWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)


In [72]:
pd.set_option('display.max_colwidth', 0)

In [73]:
data_dir = os.path.join(os.curdir,'rawFiles')
pdf_files = sorted([x for x in os.listdir(data_dir) if 'DS_Store' not in x])
pdf_files

['Bug Bounty Data - Data_of_22-23.pdf',
 'Bug Bounty Data - Data_of_23-24.pdf',
 'Bug Bounty Data - Data_of_24-25.pdf']

## Data Ingestion

### Storage Setup

We're going to use Redis as our database for both document contents and the vector embeddings. You will need the full Redis Stack to enable RediSearch, the module that provides semantic (vector) search.

To set this up locally, you will need to install Docker and then run the following command: ```docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest```.

After starting the Redis Stack container, run the cells below to open a Redis connection and create a Hierarchical Navigable Small World (HNSW) index for semantic search.

In [74]:
# Setup Redis
from redis import Redis
from redis.commands.search.query import Query
from redis.commands.search.field import (
    TextField,
    VectorField,
    NumericField
)
from redis.commands.search.indexDefinition import (
    IndexDefinition,
    IndexType
)

redis_client = get_redis_connection()

In [75]:
# Constants
VECTOR_DIM = 1536                          # length of the vectors; must match the output size of EMBEDDINGS_MODEL
PREFIX = "tiradoc"                            # prefix for the document keys
DISTANCE_METRIC = "COSINE"                # distance metric for the vectors (ex. COSINE, IP, L2)
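With `DISTANCE_METRIC = "COSINE"`, RediSearch ranks neighbours by cosine distance (1 minus cosine similarity), so a smaller score means a closer match. The sketch below computes that metric in plain Python; the three-dimensional toy vectors are made up and stand in for real 1536-dimension embeddings.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity: 0 for identical direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Toy "embeddings" (illustrative values only)
query = [0.1, 0.3, 0.5]
docs = {"doc_a": [0.1, 0.29, 0.52], "doc_b": [0.9, -0.2, 0.0]}

# Nearest first, as a KNN search with the COSINE metric would return them
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
print(ranked)
```

Because lower is better, the `certainty` column in the search results later in this notebook plausibly holds this distance (the top hit has the lower score). That reading is an assumption, since `get_redis_results` is not shown.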

In [76]:
# Create search index

# Index
INDEX_NAME = "cases-index"           # name of the search index
VECTOR_FIELD_NAME = 'content_vector'

# Define RediSearch fields for each of the columns in the dataset
# This is where you should add any additional metadata you want to capture
filename = TextField("filename")
text_chunk = TextField("text_chunk")
file_chunk_index = NumericField("file_chunk_index")

# define RediSearch vector fields to use HNSW index

text_embedding = VectorField(VECTOR_FIELD_NAME,
    "HNSW", {
        "TYPE": "FLOAT32",
        "DIM": VECTOR_DIM,
        "DISTANCE_METRIC": DISTANCE_METRIC
    }
)
# Add all our field objects to a list to be created as an index
fields = [filename,text_chunk,file_chunk_index,text_embedding]
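Because `text_embedding` is declared with `TYPE FLOAT32` and `DIM VECTOR_DIM`, every vector written to a hash has to be a raw byte string of exactly `VECTOR_DIM` 32-bit floats (1536 × 4 = 6144 bytes). That conversion happens inside `embeddingInteface.py`, which isn't shown. Below is a stdlib-only sketch of it. It assumes little-endian byte order, which is what `np.array(v, dtype=np.float32).tobytes()` produces on common x86 and ARM machines.

```python
import struct


def to_float32_blob(vector: list[float]) -> bytes:
    # '<' = little-endian, 'f' = 4-byte IEEE-754 float32 (assumed to match the Redis host)
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_blob(blob: bytes) -> list[float]:
    # Inverse of to_float32_blob: 4 bytes per float32 value
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


VECTOR_DIM = 1536
embedding = [0.0] * VECTOR_DIM
embedding[0] = 0.25

blob = to_float32_blob(embedding)
print(len(blob))  # VECTOR_DIM * 4 bytes
```

A blob of the wrong length (for example, an embedding model with a different output size) will not match `DIM`, which is a common reason documents fail to appear in KNN results.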

In [77]:
redis_client.ping()

True

In [79]:
# Optional step to drop the index if it already exists
#redis_client.ft(INDEX_NAME).dropindex()

# Check if index exists
try:
    redis_client.ft(INDEX_NAME).info()
    print("Index already exists")
except Exception as e:
    print(e)
    # Create RediSearch Index
    print('Not there yet. Creating')
    redis_client.ft(INDEX_NAME).create_index(
        fields = fields,
        definition = IndexDefinition(prefix=[PREFIX], index_type=IndexType.HASH)
    )

Index already exists
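The `create_index` call above is redis-py's wrapper around the RediSearch `FT.CREATE` command. If you want to inspect or recreate the index from `redis-cli`, this sketch assembles roughly the equivalent command as an argument list. It is an approximation, since redis-py may order some arguments differently. Note that `PREFIX 1 tiradoc` means only keys starting with `tiradoc` are indexed.

```python
def ft_create_args(index_name: str, prefix: str, vector_field: str,
                   dim: int, metric: str = "COSINE") -> list[str]:
    """Approximate FT.CREATE arguments for the schema defined in this notebook."""
    return [
        "FT.CREATE", index_name,
        "ON", "HASH",                     # IndexType.HASH
        "PREFIX", "1", prefix,            # IndexDefinition(prefix=[PREFIX])
        "SCHEMA",
        "filename", "TEXT",
        "text_chunk", "TEXT",
        "file_chunk_index", "NUMERIC",
        vector_field, "VECTOR", "HNSW",
        "6",                              # number of attribute tokens that follow
        "TYPE", "FLOAT32",
        "DIM", str(dim),
        "DISTANCE_METRIC", metric,
    ]


args = ft_create_args("cases-index", "tiradoc", "content_vector", 1536)
print(" ".join(args))
```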


### Loading the PDFs

We'll load our PDFs and do the following:
- Initiate our tokenizer
- Run a processing pipeline to:
    - Mine the text from each PDF
    - Split them into chunks and embed them
    - Store them in Redis
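The chunking step lives inside `handle_file_string`, which isn't shown. The sketch below shows fixed-size token chunking, assuming the text is first encoded to token IDs (for example with `tokenizer.encode(text)`) and split into windows of `TEXT_EMBEDDING_CHUNK_SIZE` tokens. The `overlap` parameter is an illustrative addition and may not match what the module actually does.

```python
def chunk_tokens(tokens: list[int], chunk_size: int, overlap: int = 0) -> list[list[int]]:
    """Split a token-ID list into windows of at most chunk_size tokens.

    Consecutive windows share `overlap` tokens so context isn't cut mid-thought.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already reaches the end of the document
    return chunks


token_ids = list(range(10))  # stand-in for tokenizer.encode(text)
print(chunk_tokens(token_ids, chunk_size=4))
print(chunk_tokens(token_ids, chunk_size=4, overlap=1))
```

Each chunk would then be turned back into text with `tokenizer.decode(chunk)` before being sent to the embeddings model.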

In [80]:
# The embeddingInteface.py file contains all of the transforming functions, including ones to chunk, embed and load data
# For more details, read through the file and work through each function individually
from embeddingInteface import handle_file_string

In [81]:
%%time
# This step can take several minutes for larger document sets

# Read the API key from the environment instead of hard-coding it in the notebook
openai.api_key = os.getenv("OPENAI_API_KEY")

# Initialise tokenizer
tokenizer = tiktoken.get_encoding("cl100k_base")

# Process each PDF file and prepare for embedding
for pdf_file in pdf_files:
    
    pdf_path = os.path.join(data_dir,pdf_file)
    print(pdf_path)
    
    # Extract the raw text from each PDF using textract
    text = textract.process(pdf_path, method='pdfminer')
    
    # Chunk each document, embed the contents and load to Redis
    handle_file_string((pdf_file,text.decode("utf-8")),tokenizer,redis_client,VECTOR_FIELD_NAME,INDEX_NAME)

./rawFiles/Bug Bounty Data - Data_of_22-23.pdf
./rawFiles/Bug Bounty Data - Data_of_23-24.pdf
./rawFiles/Bug Bounty Data - Data_of_24-25.pdf
CPU times: user 973 ms, sys: 414 ms, total: 1.39 s
Wall time: 17.3 s
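However `handle_file_string` stores its output, each chunk has to end up as a Redis hash whose key starts with `PREFIX` (otherwise the index defined with `IndexDefinition(prefix=[PREFIX])` ignores it) and whose field names match the schema. The following is a sketch of building one such record. The key layout `<prefix>:<filename>:<chunk_index>` and the helper name `build_chunk_record` are hypothetical.

```python
def build_chunk_record(prefix: str, filename: str, chunk_index: int,
                       text_chunk: str, vector_blob: bytes,
                       vector_field: str = "content_vector") -> tuple[str, dict]:
    # Hypothetical key layout; the real helper may use a different one,
    # but the key MUST start with the index prefix to be indexed.
    key = f"{prefix}:{filename}:{chunk_index}"
    mapping = {
        "filename": filename,              # TextField("filename")
        "text_chunk": text_chunk,          # TextField("text_chunk")
        "file_chunk_index": chunk_index,   # NumericField("file_chunk_index")
        vector_field: vector_blob,         # VectorField: raw FLOAT32 bytes
    }
    return key, mapping


key, mapping = build_chunk_record(
    "tiradoc", "Bug Bounty Data - Data_of_22-23.pdf", 0,
    "example chunk text", b"\x00" * 12,
)
# With the live connection from earlier, the hash would be written with:
# redis_client.hset(key, mapping=mapping)
print(key)
```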


In [82]:
# Check that our docs have been inserted
redis_client.ft(INDEX_NAME).info()['num_docs']

'250'

## Search Engine

Now we can test that our search works as intended by:
- Querying our data in Redis using semantic search and verifying the results
- Passing the results to GPT-3 to produce a refined answer

In [83]:
from storageClient import get_redis_results
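`get_redis_results` lives in `storageClient.py`, which isn't shown. A helper like this typically embeds the query with `EMBEDDINGS_MODEL` and then runs a RediSearch KNN query, though that is an assumption about its internals. The sketch builds the KNN query string. The commented redis-py call shows how such a string is usually executed, with the score alias `vector_score` and parameter name `vector` chosen for illustration.

```python
def knn_query_string(vector_field: str, k: int,
                     param_name: str = "vector", score_alias: str = "vector_score") -> str:
    # "*" = no pre-filter; the KNN clause returns the k nearest vectors,
    # binding the query embedding via a $parameter and exposing the distance as score_alias
    return f"*=>[KNN {k} @{vector_field} ${param_name} AS {score_alias}]"


q = knn_query_string("content_vector", 2)
print(q)

# With redis-py (KNN syntax requires query dialect 2):
# query = (Query(q).sort_by("vector_score")
#          .return_fields("filename", "text_chunk", "vector_score")
#          .dialect(2))
# redis_client.ft(INDEX_NAME).search(query, query_params={"vector": query_blob})
```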

In [86]:
%%time

f1_query='Bug related to coupon'

result_df = get_redis_results(redis_client,f1_query,index_name=INDEX_NAME)
result_df.head(2)

CPU times: user 2.54 ms, sys: 2.11 ms, total: 4.64 ms
Wall time: 680 ms


| Unnamed: 0 | id | result | certainty |
|---|---|---|---|
| 0 | 0 | Ticket has been auto disposed due to 3992:-Bug Bounty escalation. While I had a cart value above 500 but below 900 I was still getting TIRA20 as first and TIRA10 as second coupon. But Tira20 is not applicable due to cart size. Since it came on top i chose and I got disappointed. Suppose there are 10+ coupons live during TIRA launch regardless of applicability the order of coupon may be manual or sorted by Benefit %. then this will lead very poor customer experience and in fact leads to cart abandonment and leaves the customer frustrated. HUGE IMAPCT ON CUSTOMER JOURNEY AND CONVERISON METRICS..Ticket has been auto disposed due to 3992:- Bug Bounty escalation. If a coupon is available but cart is not meeting the criteria then the coupon should appear disabled (like Grey out) similar to how we see certain functions on excel which cannot be clicked but is shown as an available functions. Example: Pls check Swiggy cart page and then go to apply coupon they have an excellent display of coupon and payment offer including display mode and order of display..Ticket has been auto disposed due to 3992:-Bug Bounty escalation. | 0.152292132378 |
| 1 | 1 | 679671746535 Bug Bounty Major Flaw: Coupon Criteria Not met - View all eligible products aravind.paranthaman@ril.com 9176972186 https://kapture-email-attachments.s3.amazonaws.com/88871438056407904028/Screenshot_20230323_231238_Tira-6693260991342239999.jpg 679672218177 Bug Bounty Flaw in coupon order display aravind.paranthaman@ril.com 9176972186 https://kapture-email-attachments.s3.amazonaws.com/96921865347107939571/Screenshot_20230323_231238_Tira-9074585107670746275.jpg 679672738673 Bug Bounty Criteria Based Enabled or Disabled view of coupons aravind.paranthaman@ril.com 9176972186 https://kapture-email-attachments.s3.amazonaws.com/96738870561133428324/Screenshot_20230323_231238_Tira-1493443256531252752.jpg 679674883779 Bug Bounty On 5G network Images are not loading immediately dhiren.jain@ril.com 9226025768 https://kapture-email-attachments.s3.amazonaws.com/73575537139312411795/mediaFile 679675296024 Bug Bounty Cart page UI is not properly displayed 9226025768 https://kapture-email-attachments.s3.amazonaws.com/65476262995037100360/mediaFile Images are taking longer period of time to load on 5G network as well as on WI-FI..Ticket has been auto disposed due to 3992:-Bug Bounty escalation. When scrolling up and down the white line is getting displayed on the screen. Please find attached photo for the same which is highlighted in the photo.Ticket has been auto disposed due to 3992:-Bug Bounty escalation. | 0.160482048988 |


In [89]:
# Build a prompt to provide the original query, the result and ask to summarise for the user
summary_prompt = '''Summarise this result to answer the search query a user has sent.
Search query: SEARCH_QUERY_HERE
Search result: SEARCH_RESULT_HERE
Summary:
'''
summary_prepped = summary_prompt.replace('SEARCH_QUERY_HERE',f1_query).replace('SEARCH_RESULT_HERE',result_df['result'][0])
summary = openai.Completion.create(model=COMPLETIONS_MODEL, prompt=summary_prepped, max_tokens=500)
# Response provided by GPT-3
print(summary['choices'][0]['text'])

This ticket was auto disposed due to a bug bounty escalation. It was related to the coupon selection order when the cart value was above 500 but below 900, with Tira20 not applicable for the cart size. To improve the customer experience, coupons should appear as disabled if the cart does not meet the criteria. This would also prevent cart abandonment and customer frustration.


## Building the copilot

For the next step we'll make a bot using the Chat Completions endpoint, which will:
- Be given instructions on how it should act and what the goals of its users are
- Be supplied some required information that it needs to collect
- Go back and forth with the user until it has populated that information
- Say a trigger phrase that kicks off semantic search, then use GPT to refine the response

### Framework

This section outlines a basic framework for working with the API and storing context of previous conversation "turns". Once this is established, we'll extend it to use our retrieval endpoint.
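The key idea behind this framework is that the Chat Completions API is stateless: every call must carry the whole conversation so far. Below is a dependency-free sketch of that pattern. A hypothetical `fake_model` function stands in for `openai.ChatCompletion.create` so the flow can run offline.

```python
from typing import Callable

ChatMessage = dict[str, str]


class HistoryAssistant:
    """Toy version of the Assistant pattern: keep every turn and resend all of it."""

    def __init__(self, respond: Callable[[list[ChatMessage]], ChatMessage]):
        self.respond = respond  # stand-in for the real API call
        self.conversation_history: list[ChatMessage] = []

    def ask(self, new_messages: list[ChatMessage]) -> ChatMessage:
        self.conversation_history.extend(new_messages)
        reply = self.respond(list(self.conversation_history))  # send the full history
        self.conversation_history.append(reply)                # remember the answer too
        return reply


def fake_model(messages: list[ChatMessage]) -> ChatMessage:
    # Report how many messages were received, to show the context growing
    return {"role": "assistant", "content": f"saw {len(messages)} messages"}


bot = HistoryAssistant(fake_model)
bot.ask([{"role": "system", "content": "You are helpful"},
         {"role": "user", "content": "Hi"}])
second = bot.ask([{"role": "user", "content": "Tell me more"}])
print(second["content"])
```

On the second turn the model receives four messages (system, user, assistant, user), which is how a follow-up such as "Tell me more about option 2" can be understood.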

In [90]:
# A basic example of how to interact with our ChatCompletion endpoint
# It requires a list of "messages", consisting of a "role" (one of system, user or assistant) and "content"
question = 'How can you help me'


completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": question}
  ]
)
print(f"{completion['choices'][0]['message']['role']}: {completion['choices'][0]['message']['content']}")

assistant: As an AI language model, I can help you in various ways, including:

1. Providing answers to your questions
2. Generating ideas for your projects or assignments
3. Proofreading and editing your written content
4. Summarizing long texts for you
5. Offering suggestions on how to improve your writing skills
6. Helping you learn a new language or improve your fluency in it
7. Providing emotional support and offering helpful resources if you're going through a difficult time.

Let me know how I can be of assistance, and I'll do my best to help.


In [91]:
from termcolor import colored

# A basic class to create a message as a dict for chat
class Message:

    def __init__(self, role, content):
        self.role = role
        self.content = content

    def message(self):
        return {"role": self.role, "content": self.content}

# Our assistant class we'll use to converse with the bot
class Assistant:

    def __init__(self):
        self.conversation_history = []

    def _get_assistant_response(self, prompt):
        try:
            completion = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=prompt
            )
            reply = completion['choices'][0]['message']
            return Message(reply['role'], reply['content']).message()
        except Exception as e:
            # Return a message dict (not a bare string) so the history stays printable
            return Message('assistant', f'Request failed with exception {e}').message()

    def ask_assistant(self, next_user_prompt):
        # Add the new messages to the running history, then send the whole history
        self.conversation_history.extend(next_user_prompt)
        assistant_response = self._get_assistant_response(self.conversation_history)
        self.conversation_history.append(assistant_response)
        return assistant_response

    def pretty_print_conversation_history(self, colorize_assistant_replies=True):
        for entry in self.conversation_history:
            # System messages are instructions rather than conversation turns, so skip them
            if entry['role'] == 'system':
                continue
            output = entry['role'] + ':\n' + entry['content']
            if colorize_assistant_replies and entry['role'] == 'assistant':
                output = colored(output, 'green')
            print(output)

In [92]:
# Initiate our Assistant class
conversation = Assistant()

# Create a list to hold our messages and insert both a system message to guide behaviour and our first user question
messages = []
system_message = Message('system','You are a helpful business assistant who has innovative ideas')
user_message = Message('user','What can you do to help me')
messages.append(system_message.message())
messages.append(user_message.message())
messages

[{'role': 'system',
  'content': 'You are a helpful business assistant who has innovative ideas'},
 {'role': 'user', 'content': 'What can you do to help me'}]

In [93]:
# Get back a response from the Chatbot to our question
response_message = conversation.ask_assistant(messages)
print(response_message['content'])

As an AI business assistant, I can help you in many ways. Here are a few examples:

1. Market Research: I can perform in-depth research on your industry and competitors and provide you with recommendations for growth and competition strategies.

2. Administrative Tasks: I can take care of scheduling, email management, and data organization to help you stay focused on your business goals.

3. Social Media Management: I can help you create and implement a social media strategy to connect with your audience, grow your following, and increase engagement.

4. Customer Service: I can assist with customer inquiries, complaints, and support requests, helping you to build strong relationships with your customers.

5. Sales Funnel Optimization: I can analyze your sales funnel and suggest changes to improve conversions and increase revenue.

These are just a few examples of what I can do to help you. As we work together, we'll discover even more ways for me to support and assist your business.


In [95]:
next_question = 'Tell me more about option 2'

# Initiate a fresh messages list and insert our next question
messages = []
user_message = Message('user',next_question)
messages.append(user_message.message())
response_message = conversation.ask_assistant(messages)
print(response_message['content'])

Sure, administrative tasks are an important part of any business, but they can be time-consuming and take your focus away from your primary responsibilities. As your AI business assistant, I can help with tasks such as:

1. Calendar Management: I can schedule meetings, appointments, and reminders, ensuring that you never double-book or miss an important engagement.

2. Email Management: I can filter and organize your inbox, respond to routine messages, and flag important emails that require your immediate attention.

3. Data Entry: I can take care of inputting data into spreadsheets or other tools, freeing up your time to focus on high-value tasks.

4. Record-keeping: I can help you maintain an efficient and organized record-keeping system for expenses, receipts, and other important documents.

5. Travel Management: I can plan and book travel arrangements, ensuring that you have everything you need for a smooth trip.

By handling these tasks for you, I can help you streamline your day-

In [94]:
# Print out a log of our conversation so far

conversation.pretty_print_conversation_history()

user:
What can you do to help me
assistant:
As an AI business assistant, I can help you in many ways. Here are a few examples:

1. Market Research: I can perform in-depth research on your industry and competitors and provide you with recommendations for growth and competition strategies.

2. Administrative Tasks: I can take care of scheduling, email management, and data organization to help you stay focused on your business goals.

3. Social Media Management: I can help you create and implement a social media strategy to connect with your audience, grow your following, and increase engagement.

4. Customer Service: I can assist with customer inquiries, complaints, and support requests, helping you to build strong relationships with your customers.

5. Sales Funnel Optimization: I can analyze your sales funnel and suggest changes to improve conversions and increase revenue.

These are just a few examples of what I can do to help you. As we work together, we'll discover even more wa

### Knowledge retrieval

Now we'll extend the class to call a downstream service when the bot emits a stop sequence.

The main changes are:
- The system message is more comprehensive, giving criteria for the bot to advance the conversation
- Adding an explicit stop sequence for it to use when it has the info it needs
- Extending the class with a function ```_get_search_results``` which sources Redis results
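The heart of these changes is a check on each reply: if it contains the trigger phrase, fetch search content and resend the conversation with an extra system message placed just before the latest user turn. This is a dependency-free sketch of that control flow. The helper name `build_retrieval_turn` is hypothetical, and a lambda stands in for the Redis search. As a simplification, the sketch searches with the latest user message directly, whereas the class below first condenses the conversation with a Completions call.

```python
from typing import Callable, Optional

ChatMessage = dict[str, str]
TRIGGER = "searching for answers"  # must match the phrase in the system prompt


def build_retrieval_turn(history: list[ChatMessage], reply: ChatMessage,
                         search: Callable[[str], str]) -> Optional[list[ChatMessage]]:
    """Return a new message list with search context injected, or None if no trigger."""
    if TRIGGER not in reply["content"].lower():
        return None  # normal reply: show it to the user as-is
    latest_user = next(m["content"] for m in reversed(history) if m["role"] == "user")
    context = search(latest_user)
    grounding = {
        "role": "system",
        "content": f"Answer the user's question using this content: {context}. "
                   "If you cannot answer the question, say 'Sorry, I don't know the answer to this one'",
    }
    # Same position as list.insert(-1, ...): just before the latest user message
    return history[:-1] + [grounding] + history[-1:]


history = [
    {"role": "system", "content": "You are a helpful bug bounty assistant."},
    {"role": "user", "content": "Bugs for coupons please"},
]
reply = {"role": "assistant", "content": "Searching for answers."}
augmented = build_retrieval_turn(history, reply, lambda q: f"top hit for: {q}")
print([m["role"] for m in augmented])
```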

In [100]:
# Updated system prompt requiring a Question and a feature (module) to be extracted from the user
system_prompt = '''
You are a helpful bug bounty assistant. You need to capture a Question and a feature from each user.
The Question is their query about bugs in Tira, and the feature is the module of the Tira application that the query concerns.
Think about this step by step:
- The user will ask a Question
- You will ask them for the module or feature if it doesn't include a module or feature
- Once you have the module or feature, say "searching for answers".

Example:

User: I would like to know bugs raised for Tira

Assistant: Certainly, which feature are you looking for?

User: Payments please.

Assistant: Searching for answers.
'''

# New Assistant class to add a vector database call to its responses
class RetrievalAssistant:

    def __init__(self):
        self.conversation_history = []

    def _get_assistant_response(self, prompt):
        try:
            completion = openai.ChatCompletion.create(
                model=CHAT_MODEL,
                messages=prompt,
                temperature=0.1
            )
            reply = completion['choices'][0]['message']
            return Message(reply['role'], reply['content']).message()
        except Exception as e:
            # Return a message dict so callers can always read ['content']
            return Message('assistant', f'Request failed with exception {e}').message()

    # The function to retrieve the top Redis search result for a question
    def _get_search_results(self, prompt):
        return get_redis_results(redis_client, prompt, index_name=INDEX_NAME)['result'][0]

    def ask_assistant(self, next_user_prompt):
        self.conversation_history.extend(next_user_prompt)
        assistant_response = self._get_assistant_response(self.conversation_history)

        # Answer normally unless the trigger phrase "searching for answers" is used
        if 'searching for answers' in assistant_response['content'].lower():
            # Condense the conversation into one search query; max_tokens is set
            # explicitly because the Completions default (16) would truncate it
            question_extract = openai.Completion.create(
                model=COMPLETIONS_MODEL,
                prompt=f"Extract the user's latest question and the module for that question from this conversation: {self.conversation_history}. Extract it as a sentence stating the summary of the bug",
                max_tokens=100
            )
            search_result = self._get_search_results(question_extract['choices'][0]['text'])

            # Insert an extra system prompt just before the latest user message to give the bot fresh context on how to use the Redis results
            # In this instance we add it to the conversation history, but in production it may be better to hide it
            self.conversation_history.insert(-1, {"role": 'system', "content": f"Answer the user's question using this content: {search_result}. If you cannot answer the question, say 'Sorry, I don't know the answer to this one'"})

            assistant_response = self._get_assistant_response(self.conversation_history)

        self.conversation_history.append(assistant_response)
        return assistant_response

    def pretty_print_conversation_history(self, colorize_assistant_replies=True):
        for entry in self.conversation_history:
            # System messages are instructions rather than conversation turns, so skip them
            if entry['role'] == 'system':
                continue
            output = entry['role'] + ':\n' + entry['content']
            if colorize_assistant_replies and entry['role'] == 'assistant':
                output = colored(output, 'green')
            print(output)

In [105]:
conversation = RetrievalAssistant()
messages = []
system_message = Message('system',system_prompt)
user_message = Message('user','I would like to find out bugs')
messages.append(system_message.message())
messages.append(user_message.message())
response_message = conversation.ask_assistant(messages)
response_message

{'role': 'assistant',
 'content': 'Sure, can you provide me with more information? Are you looking for a summary of the types of bugs reported in the past month for a specific e-commerce website?'}

In [106]:
messages = []
user_message = Message('user','For coupons please.')
messages.append(user_message.message())
response_message = conversation.ask_assistant(messages)
#response_message

In [107]:
conversation.pretty_print_conversation_history()

user:
I would like to find out bugs
assistant:
Sure, can you provide me with more information? Are you looking for a summary of the types of bugs reported in the past month for a specific e-commerce website?
user:
For coupons please.
assistant:
I'm sorry, but I need to clarify. Are you looking for a summary of the types of bugs reported in the past month for a specific e-commerce website related to coupons, or are you looking for bugs related to the coupon module in general?


### Test Copilot

Now we'll put all this into action with a real Chatbot.

In the directory containing this app, execute ```streamlit run bot.py```. This will open up a Streamlit app in your browser where you can ask questions of your embedded data. 

__Example Questions__:
- What are some bugs found in the payments module?
- Help me summarise the bugs for performance issues