CantCounter

Large Language Models (LLMs) retrieve knowledge from their own structures and reasoning processes to generate responses to user queries, and many researchers have therefore begun to evaluate their reasoning capabilities. However, while these models have demonstrated strong reasoning and comprehension skills on generic language tasks, their proficiency on specific domain-related problems, such as those found in politics or drugs, still needs to be evaluated. In response to this challenge, this paper presents CantCounter, the first evaluation framework for assessing the reasoning abilities of language models, built on the first domain-specific cant dataset. In particular, to address cross-matching and complex data-calculation problems, we propose a four-stage strategy: Fine-Tuning, Co-Tuning, Data-Diffusion, and Data-Analysis. Through comprehensive real-world experiments, we show for the first time how recognition accuracy varies with question type, problem setup, and prompt clues. We also demonstrate that, among the LLMs we test, more recently updated models are less likely to refuse to answer cant questions. We further show that LLMs react differently across domains; for example, they are more likely to refuse questions related to racism than questions related to LGBT topics. Our findings not only reveal how well LLMs understand cant but also reflect the characteristics of the training data and the approaches different vendors adopt when handling topics from these sensitive domains.

After the review period, we will update our work, dataset, and code.

Content warning: the data can be uncomfortable. It contains content generated during our experiments, which may include examples of sensitive topics such as drugs and violence and may therefore cause discomfort. The complete dataset can be obtained by contacting the authors.

First, make sure you have Python 3.8 installed on your machine:

$ python --version
Python 3.8

Next, create and activate a virtual environment, then install the project's dependencies inside it:

# Clone the repository
$ git clone https://github.com/cistineup/CantCounter.git
# Enter the project directory
$ cd CantCounter
# Create and activate a virtual environment
$ python -m venv venv
$ source venv/bin/activate
# Install all dependencies
$ pip install -r requirements.txt
# Enter the fine-tuning directory
$ cd cantcounter_finetuned-gpt2-convai-main
# Run fine-tuning, using the drugs domain as an example
$ python generate_drug_news.py

Because fine-tuning is performed separately for each domain, you need to set the domain manually.
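
For orientation, here is a minimal sketch of what a per-domain GPT-2 fine-tuning step might look like; the model choice, file paths, and hyperparameters are illustrative assumptions, not the repository's actual configuration.

# Hypothetical sketch: fine-tune GPT-2 on one domain's corpus.
# Paths, model choice, and hyperparameters are illustrative assumptions.
from transformers import (GPT2LMHeadModel, GPT2Tokenizer, TextDataset,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# One plain-text corpus per domain, e.g. data/drugs_corpus.txt (assumed name)
train_dataset = TextDataset(tokenizer=tokenizer,
                            file_path="data/drugs_corpus.txt",
                            block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(output_dir="finetuned-gpt2-drugs",
                         num_train_epochs=3,
                         per_device_train_batch_size=4,
                         save_steps=500)

Trainer(model=model, args=args, data_collator=collator,
        train_dataset=train_dataset).train()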

# Enter the overall test directory
$ cd cantcounter_gpt_test
$ cd drugs

Taking drugs_zero_shot.py as an example: set the desired cant terms in the corresponding cant file, set the fine-tuned text in the scene, load both into drugs_zero_shot.py, and then run it.

# Run test
$ python drugs_zero_shot.py
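
As a rough illustration of what such a zero-shot test involves, here is a minimal sketch; the file names, prompt template, and OpenAI API usage are assumptions for this sketch rather than the repository's actual code.

# Hypothetical sketch of a zero-shot cant test; file names and the
# prompt template are assumptions, not CantCounter's actual code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Cant terms and the fine-tuned scene text (assumed file names)
with open("cant/drugs_cant.txt", encoding="utf-8") as f:
    cant_terms = [line.strip() for line in f if line.strip()]
with open("scene/drugs_scene.txt", encoding="utf-8") as f:
    scene = f.read()

for term in cant_terms:
    prompt = f"{scene}\nIn this context, what does '{term}' refer to?"
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(term, "->", reply.choices[0].message.content)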

The final test results are summarized and sorted in the test data.
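
As one possible way to summarize such results (the CSV path and column names below are assumptions):

# Hypothetical sketch: aggregate per-question results into accuracy
# by question type. The CSV layout and column names are assumptions.
import pandas as pd

results = pd.read_csv("test_data/drugs_results.csv")  # assumed path
summary = (results.groupby("question_type")["correct"]
                  .mean()
                  .sort_values(ascending=False))
print(summary)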
