This is the official implementation of the paper "Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models".
1. Data
We provide the GSM8K (grade-school-math), MATH500 (math500), AIME24 (aime24), and AMC23 (amc23) datasets in the ./data folder.
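Assuming the names in parentheses above are the dataset folder names, the expected layout is:

./data/grade-school-math
./data/math500
./data/aime24
./data/amc23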
2. Conda environment
conda create -n jointthinking python=3.10
conda activate jointthinking
pip install -r requirements.txt
3. Run scripts
vLLM is used to serve the model.
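As a minimal sketch, an OpenAI-compatible vLLM server can be started as below. The model name, served name, and port here are illustrative placeholders, not values shipped with this repo:

python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-8B \
    --served-model-name qwen3-8b \
    --port 8000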
Then fill in your own paths/parameters in the shell script of each dataset (an illustrative filled-in example follows the variable list below).
NUM_THREADS= # Number of processes running simultaneously
API_KEY= # vLLM API key; normally "EMPTY"
API_BASE= # vLLM server address
MODEL_NAME= # Served model name on the vLLM server
DATA_PATH= # Dataset path
PROMPT_TYPE= # Prompt type (see the list below)
TEMPERATURE= # Sampling temperature
TOPP= # Top-p (nucleus sampling) value
MAX_TOKENS= # Maximum number of generated tokens
N_SHOT=3 # Currently supports <= 3
REFER= # Only used for "the second thinking"
SAVE_PATH= # Path to save the results
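For reference, a filled-in configuration might look like the following. All values are illustrative assumptions, not defaults shipped with this repo:

NUM_THREADS=8
API_KEY="EMPTY"
API_BASE="http://localhost:8000/v1"
MODEL_NAME="qwen3-8b"
DATA_PATH="./data/math500"
PROMPT_TYPE="nothinking"
TEMPERATURE=0.6
TOPP=0.95
MAX_TOKENS=8192
N_SHOT=3
REFER=""
SAVE_PATH="./outputs/math500-nothinking.jsonl"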
Currently we support 14 prompt formats via the --prompt_type argument.
For "qwen3-JointThinking-thinking-before", "qwen3-JointThinking-thinking-middle-open", "JointThinking-thinking-before", "JointThinking-thinking-middle-open", "thinking-twice", and "JointThinking-thinking-middle-open-always", they are the second thinking prompt formats and need --reference_ideas argument to provide questions and generated answers filtered from other prompt format (such as "direct" or "nothinking") results. Such filter can be processed by ./utils/thinking_compare.sh.
For others, you can ignore --reference_ideas argument and generate final answers directly.
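Putting it together, a two-stage run might look like the sketch below. The per-dataset script name run_math500.sh and all file paths are assumptions for illustration; only ./utils/thinking_compare.sh and the variable/flag names come from this README:

# Stage 1: in the dataset's shell script, set a first-pass format, e.g.
#   PROMPT_TYPE="nothinking"
#   SAVE_PATH="./outputs/first_pass.jsonl"
bash run_math500.sh            # hypothetical per-dataset script name

# Filter the first-pass questions/answers for the second pass
bash ./utils/thinking_compare.sh

# Stage 2: switch to a second-thinking format and point REFER at the filtered file, e.g.
#   PROMPT_TYPE="thinking-twice"
#   REFER="./outputs/filtered_ideas.jsonl"   # hypothetical path
bash run_math500.sh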
