Repository for the [paper]:
CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs
Authors: Jingzhe Shi, Jialuo Li, Qinwei Ma, Zaiwen Yang, Huan Ma, Lei Li.
Abstract: Businesses and software platforms are increasingly utilizing Large Language Models (LLMs) like GPT-3.5, GPT-4, GLM-3, and LLaMa-2 as chat assistants with file access or as reasoning agents for customer service. Current LLM-based customer service models exhibit limited integration with customer profiles and lack operational capabilities, while existing API integrations prioritize diversity over precision and error avoidance, which are crucial in real-world Customer Service scenarios. We propose an LLM agent called CHOPS (CHat with custOmer Profile in existing System) that: (1) efficiently utilizes existing databases or systems to access user information or interact with these systems based on existing guidance; (2) provides accurate and reasonable responses or executes required operations in the system while avoiding harmful operations; and (3) leverages the combination of small and large LLMs together to provide satisfying performance at a decent inference cost. We introduce a practical dataset, the CPHOS-dataset, including a database, some guiding files, and QA pairs collected from CPHOS, which employs an online platform to facilitate the organization of simulated Physics Olympiads for high school teachers and students. We conduct extensive experiments to validate the performance of our proposed CHOPS architecture using the CPHOS-dataset, aiming to demonstrate how LLMs can enhance or serve as alternatives to human customer service.
This repository includes:
- Code for the CHOPS architecture.
- The desensitized CPHOS-dataset.
- Running logs.
- The database and corresponding APIs.
- Database:
  - Format: tables are stored in JSON format in `prepare_datas\database`.
  - Content:
    - `cmf_tp_admin`: the table recording the admins
    - `cmf_tp_area`: the table recording the areas of schools
    - `cmf_tp_correct`: the table recording the status of every single problem and question
    - `cmf_tp_exam`: the table recording exam status
    - `cmf_tp_member`: the table of all users (team leader, vice team leader, arbiter)
    - `cmf_tp_school`: the table of all schools
    - `cm_tp_student`: the table of all students
    - `cmf_tp_subject`: the table recording every single answer sheet, its grades, etc.
    - `cmf_tp_test_paper`: the table recording every single test paper
  - Provided scripts:
    - `deal_with_*.py`: our scripts for data desensitization, i.e. nicknames, names, and school names are desensitized by replacing them with part of their hash value.
    - `add_to_database.py`: adds these tables into a local MySQL database.
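The hashing-based desensitization can be sketched as follows; the actual hash function, prefix length, and function name used by `deal_with_*.py` are assumptions for illustration:

```python
import hashlib

def desensitize(value: str, n: int = 8) -> str:
    """Replace a sensitive string (name, nickname, school name) with the
    first n hex digits of its SHA-256 hash, so identical values stay
    consistent across tables while no longer being readable.

    Hypothetical sketch -- the repo's scripts may hash differently.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:n]

print(desensitize("Example High School"))
```

Because the replacement is deterministic, the same school name maps to the same token in every table, which preserves joins across the database.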
- APIs:
  - Format: Python-wrapped SQL commands, with exception checks, located in `db_api`.
  - Content:
    - `db_api/DataManagingApis`
    - `db_api/DataQueryApis`
  - Description of the APIs used: see `guidefiles/executable_operations.txt`.
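The pattern of a Python-wrapped SQL command with an exception check might look like the following minimal sketch. It uses `sqlite3` purely for illustration (the repo's `db_api` targets MySQL), and the function and table contents here are hypothetical:

```python
import sqlite3

def safe_query(conn, sql, params=()):
    """Run a query and return its rows, or None if the SQL fails,
    instead of letting the exception propagate to the calling agent."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.Error as exc:
        print(f"query failed: {exc}")
        return None

# Illustration with an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cmf_tp_school (id INTEGER, name TEXT)")
conn.execute("INSERT INTO cmf_tp_school VALUES (1, 'a1b2c3d4')")
print(safe_query(conn, "SELECT name FROM cmf_tp_school WHERE id = ?", (1,)))
```

Catching database errors inside the wrapper is what lets the agent receive a clean failure signal rather than crashing on a malformed query.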
- Guide Files:
  - Format: the user guide and common questions are in PDF format in `guidefiles`.
- Guide-file-related QAs and system-related queries and instructions:
  - Format: csv files in `QAs`.
    - System queries or instructions: `QAs\instructions_augmented.csv`.
    - Guide-file- or basic-information-based QAs: `QAs\newQAs.csv`.
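Either QA file can be loaded for inspection with the standard `csv` module; the column names are not fixed here, so `csv.DictReader` picks them up from the file's own header row (the function name below is hypothetical):

```python
import csv

def load_qa_rows(path):
    """Read a QA csv (e.g. QAs/newQAs.csv) into a list of dicts,
    keyed by whatever columns the file's header declares."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))

# e.g. rows = load_qa_rows("QAs/newQAs.csv")
```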
Install MySQL.
Python 3.8 is needed; the requirements can be found in `requirements.txt`. (The Windows platform is recommended because of encoding issues with `.json` and `.csv` files.)
Please log into MySQL, then run `create database CPHOS_dataset_2;`.
Then run the Python script to add the tables into the database (remember to change the PASSWORD in `add_to_database.py` to match your own MySQL database): `python prepare_datas\database\add_to_database.py`
You also need to change the 5th line of `db_api\dbinformation.txt` to your own MySQL password.
Set your own tokens for GPT, GLM, and LLaMa in `tokens.py`. You may leave the tokens of unused LLMs unmodified, since they will not be used; which models are used depends on the settings in `configs.py`, which we introduce below.
The config file is `configs.py`. It holds the configurations of the experiments for (1-level C)-E, (2-level C)-E, (1-level C)-E-V, and (2-level C)-E-V.
- Adjust `EXP`, `LOG_DIR`, and `TAG` to your own tag.
- Set `DATA` to either `instructions_augmented` or `newQAs` to switch between system-related queries & instructions and guide-file-based QAs.
- Modify the configuration for each agent (Classifier, Executor, Verifier). Please refer to the config file for more details; the config itself is easy to understand.
Run `python test.py` and the experiment will start.
The generated `.csv` files are saved in `result` by default; the save location can be changed in `configs.py`.
Please refine the auto-checked results by hand to correct accuracy errors introduced by automatic checking.
The input and output `.txt` files can be found in the `log_*` directory by default, which can be changed in `configs.py`. To count the characters, run `character_counting.py` with the parameters in it modified.
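The character counting over the logs can be sketched like this; `character_counting.py` itself may work differently, and the function below is hypothetical:

```python
import os

def count_log_characters(log_dir):
    """Sum the character counts of all .txt files in log_dir, e.g. to
    estimate how much text an experiment sent to and got from the LLMs."""
    total = 0
    for name in sorted(os.listdir(log_dir)):
        if name.endswith(".txt"):
            with open(os.path.join(log_dir, name), encoding="utf-8") as f:
                total += len(f.read())
    return total
```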
The configs and the experiment entry point for running the Executor only are written separately in `configs_executor_only.py` and `test_Eonly.py`. Modify the configs in `configs_executor_only.py` in the same way as described above, then run `python test_Eonly.py` to run this experiment.
All `.csv` result files and `.txt` log files can be found in the `result` and `log_*` directories.
Please refer to the [paper] for more details.