This repository contains the evaluation scripts for miniCTX: Neural Theorem Proving with (Long-)Contexts.
-
Python 3
-
PyTorch
-
Required Python packages (specified in
requirements.txt)pip install -r requirements.txt
-
Lean 4
-
Mathlib 4, HTPI, PrimeNumberTheoremAnd, PFR, or any other Lean project to test
-
Install Lean 4
Follow the instructions on the Lean 4 installation page to set up Lean 4.
-
Set up and build your target Lean project
To setup the projects in this repository:
git submodule init git submodule update
Then build the project; for instance, for Mathlib:
cd mathlib4 lake exe cache get lake build -
Set up and build Lean REPL
After running
git submodule initandgit submodule update:cd repl lake build
Open script.sh and verify that the paths and parameters are correctly set according to your setup. The script contains the following variables:
TASK: The task name, selected fromtactic_prediction,tactic_prediction_context,full_proof,full_proof_context.MAX_ITERS: The maximum number of iterations (default:100).NUM_SAMPLES: The number of samples (default:32).TEMPERATURES: The temperatures for sampling (default:0.0).DATASET: The name of the dataset (default:mathlib).DATA: The path to the dataset file (default:data/mathlib.jsonl).MODEL: The model name (default:l3lab/ntp-mathlib-context-deepseek-coder-1.3b).NAME: The name for the output directory (default:deepseekCT).OUTPUT_DIR: The directory where the output will be saved.REPL_PATH: The path to the REPL project.LEAN_PATH: The path to the Lean project environment (e.g., Mathlib 4).
Make the script executable and run it:
chmod +x script.sh
./script.shThe output will be saved in the output/${NAME}_mathlib directory, and the log will be saved in mathlib_context.out.
The evaluation code interacts with the Lean compiler using the Lean REPL. repl_wrapper.py provides a Python interface to interact with the Lean REPL directly.
Create a new thread by calling InteractiveThread(thread_id, repl_path, lean_env_path), where:
thread_id: Any numberrepl_path: The path to the REPL directorylean_env_path: The path to the Lean project containing the environment you want to test
Example:
from repl_wrapper import InteractiveThread
thread = InteractiveThread(1, repl_path, lean_env_path)
thread.start()
cmd = {'cmd': 'import MiniF2F.Minif2fImport\n open BigOperators Real Nat Topology'}
output = thread.submit_and_receive(cmd)
thread.close()
thread.join()thread.submit_and_receive takes a dictionary as input and returns the output of the REPL in a dictionary.