**This notebook connects to the PostgreSQL instance running in the Cohere Docker container and extracts the chat history.**

**To perform the evaluation, we extract the following data:**
* **questions**: the users' input to the system
* **conversation_id**: the IDs of the conversations. Each conversation has a set of questions and responses (answers + contexts)
* **timestamp**: when each question was asked and when each answer was generated
* **message_id**: the IDs of the responses from the system
* **answers**: the answers generated for each question
* **contexts**: the documents retrieved from the MongoDB vector database to generate the answers

# Connect to the PostgreSQL instance

In [1]:
import os

import pandas as pd
import psycopg2
import sqlalchemy

DATABASE_URL = 'postgresql+psycopg2://postgres:postgres@localhost:5432'

engine = sqlalchemy.create_engine(DATABASE_URL, echo=True)

# print("Connection is", engine)

conn = psycopg2.connect(
    host="localhost",
    database="postgres",
    user="postgres",
    password="postgres"
)

cur = conn.cursor()

# Create table views

In [2]:
# Create a table view for user input
create_view = """
                    CREATE OR REPLACE VIEW v_user_messages
                    AS
                    SELECT text, agent, conversation_id, created_at, id, tool_plan, position FROM messages 
                    WHERE agent = 'USER'
                    ORDER BY created_at desc;
"""
# Execute the SQL query
cur.execute(create_view)

# Commit the transaction
conn.commit()

In [3]:
# Create a table view for system's answers
create_view = """
                    CREATE OR REPLACE VIEW v_chatbot_messages
                    AS
                    SELECT text, agent, conversation_id, created_at, id, tool_plan, position FROM messages 
                    WHERE agent = 'CHATBOT'
                    ORDER BY created_at desc;
"""
# Execute the SQL query
cur.execute(create_view)

# Commit the transaction
conn.commit()

# Query Chat Data

## Query user input

In [4]:
# Get the questions from user
query = """SELECT 
            user_msgs.agent, 
            user_msgs.text,
            user_msgs.conversation_id,
            user_msgs.created_at,
            user_msgs.position
        FROM public.v_user_messages as user_msgs
        ORDER BY user_msgs.created_at desc;"""

# Execute the query
cur.execute(query)

In [5]:
# Fetch all results from the executed query
user_msg = cur.fetchall()
user_msg

[('USER',
  'What is the purpose of regular HR audits for labor law and policy compliance, and what happens if there is non-compliance?',
  '71189b05-eb1c-4c08-b4fc-7d90cc9f3b96',
  datetime.datetime(2024, 9, 18, 18, 22, 41, 616375),
  8),
 ('USER',
  'How is data processed and organized in the RAG application using LangChain?',
  '71189b05-eb1c-4c08-b4fc-7d90cc9f3b96',
  datetime.datetime(2024, 9, 18, 18, 22, 23, 648734),
  7),
 ('USER',
  'How do employee engagement and disengagement differ in terms of emotional commitment, motivation, and their relation to labor laws?',
  '71189b05-eb1c-4c08-b4fc-7d90cc9f3b96',
  datetime.datetime(2024, 9, 18, 18, 21, 57, 798201),
  6),
 ('USER',
  "What are employees' responsibilities in maintaining compliance with labor laws, company policies, and promoting a fair and legal workplace at Tech Innovators Inc.",
  '71189b05-eb1c-4c08-b4fc-7d90cc9f3b96',
  datetime.datetime(2024, 9, 18, 18, 21, 37, 346709),
  5),
 ('USER',
  'What role does leadership

**Convert the retrieved data to a dataframe.**

In [8]:
# Assign the appropriate column names
user_column_name = ['agent_user', 'question', 'conversation_id', 'question_timestamp', 'position']

# Create a DataFrame
df_question = pd.DataFrame(user_msg, columns=user_column_name)

In [9]:
df_question

Unnamed: 0,agent_user,question,conversation_id,question_timestamp,position
0,USER,What is the purpose of regular HR audits for l...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:22:41.616375,8
1,USER,How is data processed and organized in the RAG...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:22:23.648734,7
2,USER,How do employee engagement and disengagement d...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:21:57.798201,6
3,USER,What are employees' responsibilities in mainta...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:21:37.346709,5
4,USER,What role does leadership play in employee eng...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:21:23.941020,4
5,USER,What types of leaves are offered by Tech Innov...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:20:57.575491,3
6,USER,What are the potential consequences of non-com...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:20:37.001287,2
7,USER,What is the significance of emotional and aest...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:20:24.293333,1
8,USER,What information should be included in the sec...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:20:06.556136,0
9,USER,What is the purpose of regular HR audits for l...,80db8675-6d68-42e9-949e-cd6a4a14c1f4,2024-09-18 18:14:08.344431,8
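
The fetch-then-convert steps above can also be collapsed into a single call with `pandas.read_sql_query`, which runs the query and returns a dataframe directly. The sketch below is only an illustration: it uses an in-memory SQLite table as a stand-in for the Postgres `messages` table, and the sample rows and the `demo_` names are hypothetical, not part of the original notebook.

```python
import sqlite3

import pandas as pd

# Stand-in for the Postgres database: an in-memory SQLite table holding a
# subset of the columns used above (hypothetical sample rows).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE messages (text TEXT, agent TEXT, conversation_id TEXT, "
    "created_at TEXT, position INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        ("What is the leave policy?", "USER", "conv-1", "2024-09-18 18:20:06", 0),
        ("The leave policy is ...", "CHATBOT", "conv-1", "2024-09-18 18:20:09", 0),
        ("Who approves leave?", "USER", "conv-1", "2024-09-18 18:20:24", 1),
    ],
)

# Column aliases in the query replace the separate "assign column names" step.
demo_query = """
    SELECT agent AS agent_user, text AS question, conversation_id,
           created_at AS question_timestamp, position
    FROM messages
    WHERE agent = 'USER'
    ORDER BY created_at DESC
"""
df_question_demo = pd.read_sql_query(
    demo_query, demo_conn, parse_dates=["question_timestamp"]
)
demo_conn.close()

print(df_question_demo)
```

Against the real database, the same call should accept the SQLAlchemy `engine` created earlier in place of the SQLite connection, although that has not been verified here.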


## Query system response

### Answers

In [10]:
# Get the answers from the chatbot
query = """SELECT 
            chatbot_msgs.agent, 
            chatbot_msgs.text,
            chatbot_msgs.conversation_id,
            chatbot_msgs.created_at,
            chatbot_msgs.id,
            chatbot_msgs.position
        FROM public.v_chatbot_messages as chatbot_msgs
        ORDER BY chatbot_msgs.created_at desc;"""

# Execute the query
cur.execute(query)

In [11]:
# Fetch all results from the executed query
chatbot_msg = cur.fetchall()
chatbot_msg

[('CHATBOT',
  'Regular HR audits are conducted to ensure that employees and managers understand and comply with labor laws and company policies. Internal audits are performed by the HR department, focusing on payroll, benefits, and employment practices. Additionally, periodic external audits are carried out by third-party consultants to verify compliance and identify areas for improvement. If non-compliance is found, employees may face disciplinary actions, up to and including termination of employment.',
  '71189b05-eb1c-4c08-b4fc-7d90cc9f3b96',
  datetime.datetime(2024, 9, 18, 18, 22, 45, 332887),
  '99bcb13a-d816-4f11-a229-260411fc6ad6',
  8),
 ('CHATBOT',
  "I will search for 'purpose of regular HR audits' and 'non-compliance with labor laws' in the database and relay the relevant information to the user.",
  '71189b05-eb1c-4c08-b4fc-7d90cc9f3b96',
  datetime.datetime(2024, 9, 18, 18, 22, 45, 175570),
  '81aaa84c-ccb5-4bab-bc11-714ed07656f4',
  8),
 ('CHATBOT',
  'Sorry, I could n

**Convert the retrieved data to a dataframe.**

In [12]:
# Assign the appropriate column names
chatbot_column_names = ['agent_chatbot', 'answer', 'conversation_id', 'answer_timestamp', 'msg_id','position']

# Create a DataFrame
df_answer = pd.DataFrame(chatbot_msg, columns=chatbot_column_names)

In [13]:
df_answer

Unnamed: 0,agent_chatbot,answer,conversation_id,answer_timestamp,msg_id,position
0,CHATBOT,Regular HR audits are conducted to ensure that...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:22:45.332887,99bcb13a-d816-4f11-a229-260411fc6ad6,8
1,CHATBOT,I will search for 'purpose of regular HR audit...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:22:45.175570,81aaa84c-ccb5-4bab-bc11-714ed07656f4,8
2,CHATBOT,"Sorry, I could not find any information about ...",71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:22:25.903597,6deab777-e339-4b13-b261-3a233e60f298,7
3,CHATBOT,I will search for 'data processed and organize...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:22:25.721731,63adca29-e97b-425c-a0e2-819c26a3d5ac,7
4,CHATBOT,Employee engagement is characterized by motiva...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:22:00.752513,d929a278-9e27-42d9-b172-cd299fec2c04,6
5,CHATBOT,I will search for 'employee engagement and dis...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:22:00.448805,5b520114-561d-4aa8-8955-520d0b2a0d8b,6
6,CHATBOT,Employees at Tech Innovators Inc. are responsi...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:21:40.485097,3e45f825-7d55-48b9-b743-2f6e6ff735f2,5
7,CHATBOT,I will search for 'employees' responsibilities...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:21:40.336074,339ee6f2-c3f3-4b4d-8164-b633a11a1d96,5
8,CHATBOT,"Sorry, I could not find any information about ...",71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:21:26.098442,81b7e1dc-e5d0-4a78-932c-12ee91b1debb,4
9,CHATBOT,I will search for 'role of leadership in emplo...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:21:25.947933,823444ce-521a-451e-85a7-5daae7e481df,4


### Contexts

In [14]:
# Fetch the context for the response if present
query_documents = """SELECT text, conversation_id, message_id, document_id 
        FROM public.documents
            ;"""

# Execute the query
cur.execute(query_documents)

# Fetch all results from the executed query
doc_result = cur.fetchall()
doc_result

[('the Senior Director responsible for Analytics Delivery, your role is critical to the success of Tech Innovators Inc. By leveraging your strategic vision, technical expertise, and leadership skills,',
  '119128da-543a-4e36-b7f3-41fa2bebb6cb',
  '80276b01-5a07-4be7-bb6a-470fa629315a',
  '0'),
 ('the Senior Director responsible for Analytics Delivery, your role is critical to the success of Tech Innovators Inc. By leveraging your strategic vision, technical expertise, and leadership skills,',
  '119128da-543a-4e36-b7f3-41fa2bebb6cb',
  '80276b01-5a07-4be7-bb6a-470fa629315a',
  '1'),
 ('the Senior Director responsible for Analytics Delivery, your role is critical to the success of Tech Innovators Inc. By leveraging your strategic vision, technical expertise, and leadership skills,',
  '119128da-543a-4e36-b7f3-41fa2bebb6cb',
  '80276b01-5a07-4be7-bb6a-470fa629315a',
  '2'),
 ('the Senior Director responsible for Analytics Delivery, your role is critical to the success of Tech Innovators 

**Convert the retrieved data to a dataframe.**

In [15]:
# Assign the appropriate column names
docs_column_names = ['contexts', 'conversation_id', 'msg_id', 'doc_id']

# Create a DataFrame
df_docs = pd.DataFrame(doc_result, columns=docs_column_names)

In [16]:
# Group by 'conversation_id' and 'msg_id' and combine 'contexts' into a list
df_contexts = df_docs.groupby(['conversation_id','msg_id'])['contexts'].apply(list).reset_index()

In [17]:
df_contexts

Unnamed: 0,conversation_id,msg_id,contexts
0,119128da-543a-4e36-b7f3-41fa2bebb6cb,5337da38-0f84-4dd3-be38-3839ea9c16bf,[and identify areas for improvement.5.3 Report...
1,119128da-543a-4e36-b7f3-41fa2bebb6cb,5d697220-b0fb-4fe0-a6f4-93c618568b25,[IntroductionThis guide provides a step-by-ste...
2,119128da-543a-4e36-b7f3-41fa2bebb6cb,7f03ffa6-b3d7-4428-a2a1-5a0fe53cab95,[Inc. upholds the highest ethical standards in...
3,119128da-543a-4e36-b7f3-41fa2bebb6cb,80276b01-5a07-4be7-bb6a-470fa629315a,[the Senior Director responsible for Analytics...
4,119128da-543a-4e36-b7f3-41fa2bebb6cb,bed1ac5d-6164-4924-97f5-5111509a5f8e,"[IntroductionAt Tech Innovators Inc., we belie..."
5,119128da-543a-4e36-b7f3-41fa2bebb6cb,c98720a2-7457-4f8c-a9cf-857126cfbff0,[LabourEmotional and aesthetic labor involves ...
6,119128da-543a-4e36-b7f3-41fa2bebb6cb,cbf80b5e-76fd-4c8b-a8ac-146150452b65,[to identify strengths and areas for improveme...
7,119128da-543a-4e36-b7f3-41fa2bebb6cb,d0f3a220-1b1f-4de8-81b2-4cfc73bdaf44,[to help you get started. Company OverviewTech...
8,119128da-543a-4e36-b7f3-41fa2bebb6cb,d86faf67-d07c-497c-b760-39ff8eba131d,"[are motivated and committed, disengaged emplo..."
9,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,3e38e220-7904-4667-899b-a2495fd4d9a0,[IntroductionTech Innovators Inc. strives to p...
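
To make the grouping step concrete, here is a minimal sketch on toy data. The IDs and chunk texts are made up for illustration. Each `(conversation_id, msg_id)` pair ends up as one row whose `contexts` cell is a Python list of all document chunks retrieved for that message.

```python
import pandas as pd

# Toy stand-in for df_docs: one row per retrieved document chunk.
docs_demo = pd.DataFrame(
    {
        "contexts": ["chunk A", "chunk B", "chunk C"],
        "conversation_id": ["conv-1", "conv-1", "conv-1"],
        "msg_id": ["msg-1", "msg-1", "msg-2"],
        "doc_id": ["0", "1", "0"],
    }
)

# Collapse the chunks into one list per (conversation_id, msg_id) pair.
contexts_demo = (
    docs_demo.groupby(["conversation_id", "msg_id"])["contexts"]
    .apply(list)
    .reset_index()
)

print(contexts_demo)
```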


# Prepare data for evaluation

In [18]:
# Perform a merge between the answers dataframe and contexts dataframe using 'conversation_id' and 'msg_id'
df_answer_contexts = pd.merge(df_answer, df_contexts, on=['conversation_id', 'msg_id'], how='inner')

In [19]:
# Merge the resulting dataframe with question dataframe using conversation_id and position
df_question_answer_contexts = pd.merge(df_question, df_answer_contexts, on=['conversation_id', 'position'], how='inner')

In [20]:
# Final result dataframe
df_question_answer_contexts

Unnamed: 0,agent_user,question,conversation_id,question_timestamp,position,agent_chatbot,answer,answer_timestamp,msg_id,contexts
0,USER,What is the purpose of regular HR audits for l...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:22:41.616375,8,CHATBOT,Regular HR audits are conducted to ensure that...,2024-09-18 18:22:45.332887,99bcb13a-d816-4f11-a229-260411fc6ad6,[are conducted to ensure employees and manager...
1,USER,How is data processed and organized in the RAG...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:22:23.648734,7,CHATBOT,"Sorry, I could not find any information about ...",2024-09-18 18:22:25.903597,6deab777-e339-4b13-b261-3a233e60f298,[IntroductionThis guide provides a step-by-ste...
2,USER,How do employee engagement and disengagement d...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:21:57.798201,6,CHATBOT,Employee engagement is characterized by motiva...,2024-09-18 18:22:00.752513,d929a278-9e27-42d9-b172-cd299fec2c04,[and disengagement are two sides of the same c...
3,USER,What are employees' responsibilities in mainta...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:21:37.346709,5,CHATBOT,Employees at Tech Innovators Inc. are responsi...,2024-09-18 18:21:40.485097,3e45f825-7d55-48b9-b743-2f6e6ff735f2,[mitigate these risks through diligent complia...
4,USER,What role does leadership play in employee eng...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:21:23.941020,4,CHATBOT,"Sorry, I could not find any information about ...",2024-09-18 18:21:26.098442,81b7e1dc-e5d0-4a78-932c-12ee91b1debb,[motivated to contribute to its success. Criti...
5,USER,What types of leaves are offered by Tech Innov...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:20:57.575491,3,CHATBOT,Tech Innovators Inc. offers various types of l...,2024-09-18 18:21:00.323130,4b3f833e-c31d-4ca6-8488-6ff4ef46e076,[Inc. offers various types of leaves including...
6,USER,What are the potential consequences of non-com...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:20:37.001287,2,CHATBOT,Non-compliance with labor laws or company poli...,2024-09-18 18:20:39.290190,a601b2c2-0fd9-41b2-9500-857d6b590e14,[findings reported to senior management.Conseq...
7,USER,What is the significance of emotional and aest...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:20:24.293333,1,CHATBOT,Emotional and aesthetic labor involves managin...,2024-09-18 18:20:26.958788,ea955e42-5db6-459b-a390-920164db800c,[healthy work-life balance and providing adequ...
8,USER,What information should be included in the sec...,71189b05-eb1c-4c08-b4fc-7d90cc9f3b96,2024-09-18 18:20:06.556136,0,CHATBOT,Tech Innovators Inc. strives to provide compre...,2024-09-18 18:20:09.007778,3e38e220-7904-4667-899b-a2495fd4d9a0,[IntroductionTech Innovators Inc. strives to p...
9,USER,What is the purpose of regular HR audits for l...,80db8675-6d68-42e9-949e-cd6a4a14c1f4,2024-09-18 18:14:08.344431,8,CHATBOT,HR audits are conducted regularly by the HR de...,2024-09-18 18:14:10.047930,0aadd2ab-c0d5-40b5-ac98-7950f84152a0,[are conducted to ensure employees and manager...
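
The two inner merges above can be illustrated on toy data. In the sample output, the intermediate "I will search for ..." chatbot messages share a `position` with the final answer but do not appear in the final dataframe, which is consistent with those messages having no rows in `documents`. That reading is an inference from the output shown, not something the schema confirms. All names and values below are hypothetical.

```python
import pandas as pd

# Two chatbot messages at the same position: a tool plan and the final answer.
answers_demo = pd.DataFrame(
    {
        "conversation_id": ["conv-1", "conv-1"],
        "position": [0, 0],
        "msg_id": ["msg-plan", "msg-final"],
        "answer": ["I will search for ...", "The leave policy is ..."],
    }
)

# Only the final answer has retrieved contexts attached.
grouped_contexts_demo = pd.DataFrame(
    {
        "conversation_id": ["conv-1"],
        "msg_id": ["msg-final"],
        "contexts": [["chunk A", "chunk B"]],
    }
)

questions_demo = pd.DataFrame(
    {
        "conversation_id": ["conv-1"],
        "position": [0],
        "question": ["What is the leave policy?"],
    }
)

# Inner merge 1: keeps only answers that have contexts (drops the tool plan).
answer_contexts_demo = pd.merge(
    answers_demo, grouped_contexts_demo, on=["conversation_id", "msg_id"], how="inner"
)

# Inner merge 2: pairs each question with the answer at the same position.
qa_demo = pd.merge(
    questions_demo, answer_contexts_demo, on=["conversation_id", "position"], how="inner"
)

print(qa_demo)
```

If each question is expected to match exactly one answer, passing `validate="one_to_one"` to the second `pd.merge` call would raise an error on duplicate keys instead of silently multiplying rows.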


# Save data for evaluation

**The `contexts` column of the final dataframe holds Python lists. To save the dataframe to a CSV file, we serialize that column to JSON strings, and we deserialize it again when reading the CSV file back.**

In [21]:
import json
import pandas as pd

def serialize_list(value):
    """Serializes a list to a JSON string."""
    return json.dumps(value)

def deserialize_list(value):
    """Deserializes a JSON string back into a list."""
    return json.loads(value)

def save_dataframe_with_list_column(df, filename):
    """Saves a DataFrame with a list column to a CSV file, preserving the list structure.

    Args:
        df: The DataFrame to save.
        filename: The name of the output CSV file.
    """

    # Serialize the list column on a copy so the caller's DataFrame is left unchanged
    df_out = df.copy()
    df_out['contexts'] = df_out['contexts'].apply(serialize_list)

    # Save the DataFrame to CSV
    df_out.to_csv(filename, index=False)

def load_dataframe_with_list_column(filename):
    """Loads a DataFrame from a CSV file, restoring the list structure.

    Args:
        filename: The name of the input CSV file.

    Returns:
        The loaded DataFrame.
    """

    # Load the DataFrame
    df = pd.read_csv(filename)

    # Apply the deserialization function to the list column
    df['contexts'] = df['contexts'].apply(deserialize_list)

    return df
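
As a quick sanity check of this approach, the sketch below round-trips a toy dataframe through a temporary CSV file using the same JSON serialization idea. It is self-contained, so it repeats the `json.dumps` / `json.loads` steps inline rather than calling the functions above. The sample values are hypothetical and deliberately include a comma and quotes, which are the characters most likely to break a naive CSV export.

```python
import json
import os
import tempfile

import pandas as pd

# Toy dataframe with a list column containing CSV-unfriendly characters.
roundtrip_demo = pd.DataFrame(
    {
        "question": ["What is the leave policy?"],
        "contexts": [["chunk, with comma", 'chunk with "quotes"']],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "roundtrip_demo.csv")

    # Serialize on a copy so the original dataframe keeps its list column.
    to_save = roundtrip_demo.copy()
    to_save["contexts"] = to_save["contexts"].apply(json.dumps)
    to_save.to_csv(demo_path, index=False)

    # Read the file back and restore the lists.
    restored_demo = pd.read_csv(demo_path)
    restored_demo["contexts"] = restored_demo["contexts"].apply(json.loads)

print(restored_demo.loc[0, "contexts"])
```

Without the deserialization step, `pd.read_csv` would return the `contexts` column as plain strings rather than lists.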

**Save the result dataframe to a csv file for evaluation.**

In [None]:
# from from_root import from_root

# file_name = "<your_file_name>"

# Filter df_question_answer_contexts as needed before saving
# save_dataframe_with_list_column(df_question_answer_contexts, os.path.join("<your_path_to_save_this_file>", file_name))

In [24]:
# Example: 

from from_root import from_root

file_name = "test_dataset_it_openai_deployment_test.csv"

save_dataframe_with_list_column(df_question_answer_contexts, os.path.join(from_root(), "data-test/test-dataset/", file_name))