ChainLM

This repository contains the code, dataset, and models from our paper: ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting. We release the CoTGenius data generation pipeline, the evolved CoT dataset, the fine-tuning and evaluation code, and the CoT debating code.

Overview

CoTGenius is a chain-of-thought (CoT) improvement framework for synthesizing more complicated, diverse, and detailed CoT rationales. In this framework, we introduce three evolution strategies for improving CoT: complicate, diversify, and specify. Following CoTGenius, we generate a large-scale CoT dataset of 44,335 samples covering commonsense reasoning, mathematical reasoning, scientific reasoning, and symbolic reasoning. We then fine-tune open-source LLMs (Llama 2-Chat 7B and 13B) on our evolved CoT data, producing models we call ChainLM, and compare ChainLM to existing popular LLMs on 9 complex reasoning datasets. Finally, based on our ChainLM model, we propose a CoT reasoning strategy, step-level debating.

Figure: The overall framework of CoTGenius

Data Release

The data directory contains the 44k CoT samples generated over 4 rounds with CoTGenius. A minimal loading sketch follows the file list.

  • train_data.json contains all the improved CoT data from the 4 rounds.
  • no_cs.json is the data with the commonsense reasoning category removed.
  • no_math.json is the data with the mathematical reasoning category removed.
  • no_sci.json is the data with the scientific reasoning category removed.
  • no_sym.json is the data with the symbolic reasoning category removed.
  • seed.json is the seed dataset used for generation.
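
To get a feel for the data, here is a minimal loading sketch in Python. The file path assumes you run from the repository root, and the record schema is whatever the JSON actually contains; inspect one sample to confirm its fields.

import json

with open("data/train_data.json", encoding="utf-8") as f:
    samples = json.load(f)

print(len(samples))   # expected: 44,335 samples in total
print(samples[0])     # inspect one record to see its schema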

Data Generation Process

Our data generation process is a combination of three pipelines.

  • Complicate: First, we apply the complication strategy to make the questions in the original data more complex. Second, we conduct an evolutionary success judgement based on the complexity of the new questions. Then we generate answers to the new questions. Finally, we verify the correctness of the new <question, CoT> samples (a schematic sketch of this pipeline follows the list).
  • Diversify: Similar to complication, but diversification methods guide the question generation.
  • Specify: We first rewrite the CoTs in the seed dataset and then conduct the evolutionary success judgement.
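
The four stages of the complication pipeline can be pictured as the schematic sketch below. The prompt texts and the call_llm helper are hypothetical placeholders, not the actual prompts used by the scripts in generate; they only illustrate the flow.

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder for a call to the generation model;
    # the real pipeline issues API requests with carefully designed prompts.
    raise NotImplementedError

def complicate(question: str) -> dict | None:
    # 1. Complicate the original question.
    new_q = call_llm(f"Rewrite the question below to be more complex.\n{question}")
    # 2. Evolutionary success judgement: is the new question actually harder?
    verdict = call_llm(
        f"Is question B more complex than question A? Answer yes or no.\n"
        f"A: {question}\nB: {new_q}"
    )
    if "yes" not in verdict.lower():
        return None  # evolution failed; discard this sample
    # 3. Generate a chain-of-thought answer for the new question.
    new_cot = call_llm(f"Answer the question step by step.\n{new_q}")
    # 4. Correctness verification of the new <question, CoT> pair.
    check = call_llm(
        f"Question: {new_q}\nReasoning: {new_cot}\n"
        f"Are the reasoning and final answer correct? Answer yes or no."
    )
    return {"question": new_q, "cot": new_cot} if "yes" in check.lower() else None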

To perform the generation process with CoTGenius, three scripts (complicate.sh, diversify.sh, and specify.sh) are provided in the generate directory.

cd generate
bash complicate.sh
bash diversify.sh
bash specify.sh

Fine-tune

We fine-tune the Llama 2-Chat 7B and 13B models on our dataset and call the resulting CoT fine-tuned models ChainLM. The fine-tuning code is adapted from Alpaca.

cd fine-tune
bash run.sh
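
Once run.sh finishes, the resulting checkpoint can be used for inference with Hugging Face transformers. The sketch below is a generic example; the checkpoint path and prompt format are illustrative assumptions, not something prescribed by this repository.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "output/chainlm-7b"  # hypothetical path to the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: A train covers 60 km in 45 minutes. What is its speed in km/h?\nA: Let's think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))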

Evaluation

We conduct evaluation on 9 datasets that are independent of the seed dataset and report the results below.

cd evaluate
bash test.sh
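
For intuition, scoring on such reasoning benchmarks usually amounts to extracting the final answer from the generated chain of thought and comparing it with the gold label. The regex-based extraction below is an illustrative assumption, not the repository's actual scorer.

import re

def extract_answer(cot: str) -> str | None:
    # Take the last number in the generated text as the final answer.
    nums = re.findall(r"-?\d+(?:\.\d+)?", cot)
    return nums[-1] if nums else None

def accuracy(predictions: list[str], golds: list[str]) -> float:
    hits = sum(extract_answer(p) == g for p, g in zip(predictions, golds))
    return hits / len(golds)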

Figure: Main experiment results

CoT Debating

Based on ChainLM, we propose the step-level CoT debating strategy. To evaluate with CoT debating:

cd debate
bash run.sh
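
Conceptually, step-level debating lets two ChainLM instances each propose the next reasoning step, debate when they disagree, and keep the winning step before moving on. The sketch below is a schematic outline with hypothetical helper functions; the actual protocol is implemented by the scripts in debate.

def propose_step(model, question: str, steps: list[str]) -> str:
    # Hypothetical placeholder: ask `model` for the next reasoning step
    # given the question and the steps agreed on so far.
    raise NotImplementedError

def judge(question: str, steps: list[str], a: str, b: str) -> str:
    # Hypothetical placeholder: a judge decides which candidate step wins.
    raise NotImplementedError

def step_level_debate(model_a, model_b, question: str, max_steps: int = 10) -> list[str]:
    steps: list[str] = []
    for _ in range(max_steps):
        cand_a = propose_step(model_a, question, steps)
        cand_b = propose_step(model_b, question, steps)
        # Keep an agreed step; otherwise the debate is settled by the judge.
        step = cand_a if cand_a == cand_b else judge(question, steps, cand_a, cand_b)
        steps.append(step)
        if "final answer" in step.lower():  # stop once an answer is produced
            break
    return steps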
