
Note: this repository was created solely for the purpose of organizing and submitting our finalized project. The commits made to this repository do not accurately represent the individual contributions from each member.

Search-Based Test Input Generation for Chatbot Testing

KAIST 2020 Fall CS454 Team 9

  • 20160811 Jeongeon Park
  • 20160830 Suro Lee
  • 20170798 Seungho Kim
  • 20170828 Chanhee Lee

Introduction

In this project, we propose a search-based approach that automatically generates high-quality test inputs for chatbots. Our approach uses the Metropolis-Hastings algorithm, building on the CGMH method of N. Miao et al. (2019) to generate input data in the form of questions. Comparing against human-generated test inputs, both on the generated questions themselves and on the chatbot's responses to those questions, we show that test inputs generated with our approach are more diverse and more relevant to the topic keyword than the human-generated ones.
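
To make the sampling idea concrete, the following is a minimal, self-contained sketch of a CGMH-style Metropolis-Hastings editing loop. It is not the repository's implementation: the scorer below is a toy stand-in for the trained forward/backward language models, and the vocabulary and seed sentence are invented for illustration.

import math
import random

VOCAB = ["do", "you", "like", "sports", "watching", "playing", "games"]

def sentence_score(words):
    # Toy fluency score standing in for the forward/backward LM probability:
    # shorter sentences without repeated words score higher.
    penalty = len(words) + (len(words) - len(set(words)))
    return math.exp(-penalty)

def propose_edit(words):
    # Randomly replace, insert, or delete one word (the CGMH edit set).
    words = list(words)
    op = random.choice(["replace", "insert", "delete"])
    i = random.randrange(len(words))
    if op == "replace":
        words[i] = random.choice(VOCAB)
    elif op == "insert":
        words.insert(i, random.choice(VOCAB))
    elif len(words) > 1:  # delete, keeping at least one word
        del words[i]
    return words

def mh_generate(seed, steps=200):
    current = seed
    for _ in range(steps):
        candidate = propose_edit(current)
        # Metropolis-Hastings acceptance, assuming a symmetric proposal:
        # accept with probability min(1, score(candidate) / score(current)).
        if random.random() < sentence_score(candidate) / sentence_score(current):
            current = candidate
    return current

print(" ".join(mh_generate(["do", "you", "like", "sports"])))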

Requirements

  • Python 3.8

  • Training and generation

    • TensorFlow == 2.3.1 (other versions have not been tested)
    • numpy
    • pickle
  • Evaluation

    • spaCy
      • after installing, run python -m spacy download en_core_web_lg to download the required model
    • gensim
    • pandas
    • nltk
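
Note that pickle ships with the Python standard library and needs no separate installation. Assuming pip is used, the remaining dependencies could be installed as follows (leaving everything but TensorFlow unpinned is our assumption):

$ pip install tensorflow==2.3.1 numpy spacy gensim pandas nltk

$ python -m spacy download en_core_web_lg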

Train Language Model

To use a pre-trained language model, download the forward and backward folders into the model/ directory.

Usage

$ python model/train.py [-h] [--backward] [-e EPOCH] [-b BATCH]

Optional Arguments

-h, --help

shows the help message and exits

--backward

include this argument to train the backward model (instead of the forward model)

-e EPOCH, --epoch EPOCH

sets the maximum number of epochs to run (type: int, default: 100)

-b BATCH, --batch BATCH

sets the batch size (type: int, default: 32)
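
For example, to train the backward model for 50 epochs with a batch size of 64 (the values here are illustrative):

$ python model/train.py --backward -e 50 -b 64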

Generate Questions

Optional: insert your own keywords (from which the questions are generated) into data/input/keywords.txt.
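
As an illustration only (the file format below, one topic keyword per line, is our assumption, and the keywords themselves are examples), keywords.txt might look like:

sports
food
music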

Usage

$ python model/questions_gen.py

Generated questions are written into data/output/output.txt.

Evaluation: Diversity

The file evaluate/diversity.py is used to evaluate both the generated questions and the chatbot's responses.

Evaluate Generated Questions

  1. Generate the questions file data/output/output.txt.

  2. Use this file's relative path as the file argument.

Evaluate Chatbot Responses

For evaluation, we used Pandorabots' Kuki as our test chatbot.

  1. Enter each question into the chat, and download the conversation as a .json file.

  2. Parse the conversation using evaluate/parseMessages.py (run it with --help for usage details).

  3. Use the parsed file's relative path as the file argument.

Usage

$ python evaluate/diversity.py [-h] [--output] [-a A] [-b B] file

Positional Arguments

file

relative path of the .txt file to be used for evaluation.

Optional Arguments

-h, --help

shows the help message and exits

--output

add this argument to evaluate generated questions (instead of a chatbot conversation)

-a A

only for chatbot responses; index of the first message to evaluate (type: int, default: 0)

-b B

only for chatbot responses; index of the last message to evaluate (type: int, default: last index)
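
For instance, to evaluate only messages 0 through 19 of a parsed conversation (the range is illustrative):

$ python evaluate/diversity.py -a 0 -b 19 data.txt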

Evaluation: Topic Relevance

The file evaluate/topic_relevance.py is used to evaluate the generated questions.

Evaluate Generated Questions

  1. Generate the questions file data/output/output.txt.

  2. Divide the sentences generated in output.txt by the keyword used, and place them in the arrays input_text1, input_text2, and input_text3 (see the sketch after this list).

    • Only three topics can be covered in one evaluation run.
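
A hypothetical illustration of this step inside evaluate/topic_relevance.py; the variable names come from the script, but the keywords and sentences below are invented:

input_text1 = [  # questions generated with the keyword "sports"
    "do you like watching sports",
    "what sports do you play",
]
input_text2 = [  # questions generated with a second keyword, e.g. "food"
    "what is your favorite food",
]
input_text3 = [  # questions generated with a third keyword, e.g. "music"
    "do you enjoy listening to music",
]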

Usage

$ python evaluate/topic_relevance.py [-h] keyword

Positional Arguments

keyword

questions generated with this keyword will be chosen from output.txt to evaluate relevance

Example

Train forward/backward language model

$ python model/train.py

$ python model/train.py --backward

Generate questions

$ python model/questions_gen.py

Evaluate diversity and topic relevance of the generated questions (example with the keyword sports)

$ python evaluate/diversity.py --output ../data/output/output.txt

$ python evaluate/topic_relevance.py sports

Parse and evaluate diversity of chatbot conversation

$ python evaluate/parseMessages.py message_1.json data.txt

$ python evaluate/diversity.py data.txt
