# Goedel-Prover-V2 — Codespaces CPU Demo (JupyterLab)

> Note: running the 8B model in FP32 on CPU is heavy. Use a large Codespace (>=32GB RAM) and keep `--n` and `--max_length` small. For practical runs, generate on a Colab GPU, then compile and summarize here.
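The RAM advice above follows from simple arithmetic: the weights alone need roughly `parameters × bytes per parameter`. The sketch below assumes a nominal 8e9 parameters and counts only the weights. Activations, the KV cache, and Python/PyTorch overhead come on top, so a 32 GB machine leaves little headroom in FP32.

```python
# Rough estimate of the RAM needed just to hold the model weights.
# Assumption: "8B" is a nominal parameter count; the real model differs slightly.
# Activations, KV cache and framework overhead are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the weight footprint in GiB (2**30 bytes) for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("fp32", "bf16"):
        print(f"{dtype}: {weight_memory_gib(8e9, dtype):.1f} GiB")
    # fp32: 29.8 GiB
    # bf16: 14.9 GiB
```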

In [8]:
import sys, subprocess, json, os, pathlib
print(sys.version)
!python -V
!pip -V

3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ]
Python 3.12.2
pip 25.2 from /opt/miniconda3/lib/python3.12/site-packages/pip (python 3.12)


In [9]:
# Minimal Python deps for CPU inference + compile/summarize
# (%pip installs into the environment of the running kernel)
%pip install -U pip
%pip install jupyterlab ipywidgets
%pip install torch --index-url https://download.pytorch.org/whl/cpu
%pip install transformers accelerate tqdm pandas sentencepiece

Looking in indexes: https://download.pytorch.org/whl/cpu


## (Optional) Hugging Face login
Only needed if the model repo requires auth.

In [10]:
# from huggingface_hub import login
# login(token="hf_...")
pass

## Prepare a tiny input set
Keep the input extremely small for the CPU demo.

In [11]:
from pathlib import Path
src = Path('dataset/test.jsonl')
dst = Path('dataset/test_small.jsonl')
if src.exists():
    with src.open('r') as fin, dst.open('w') as fout:
        line = fin.readline()
        if line:
            fout.write(line)
    print('Wrote 1-line sample to', dst)
else:
    print('Warning: dataset/test.jsonl not found. Please add a JSONL input.')

Wrote 1-line sample to dataset/test_small.jsonl
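The cell above copies exactly one line. If you want a few more problems, or want to confirm each line is valid JSON before spending CPU time on it, the same idea can be generalized. This is a minimal sketch; `take_jsonl_head` is a hypothetical helper, not part of the repo, and it assumes one JSON object per line.

```python
import json
from itertools import islice
from pathlib import Path


def take_jsonl_head(src: Path, dst: Path, n: int = 1) -> int:
    """Copy the first n non-blank lines of a JSONL file, checking each parses.

    Returns the number of records written. Raises json.JSONDecodeError on a
    malformed line (dst may then be partially written).
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        non_blank = (line for line in fin if line.strip())  # skip empty lines
        for line in islice(non_blank, n):
            json.loads(line)  # validate before writing
            fout.write(line if line.endswith("\n") else line + "\n")
            written += 1
    return written
```

For example, `take_jsonl_head(Path('dataset/test.jsonl'), Path('dataset/test_small.jsonl'), n=3)` would write up to three problems.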


## CPU Inference (very slow; high RAM)
Runs `inference_cpu.py`. Keep `--n` and `--max_length` small to limit memory use and runtime.

In [12]:
!python inference_cpu.py \
        --model_path Goedel-LM/Goedel-Prover-V2-8B \
        --input_path dataset/test_small.jsonl \
        --output_dir results/codespaces_cpu \
        --n 1 \
        --max_length 256 \
        --temp 0.2 \
        --use_cpu

Fetching 4 files:   0%|                                   | 0/4 [00:00<?, ?it/s]
model-00001-of-00004.safetensors:   0%|    | 612k/4.90G [00:19<6:09:13, 221kB/s]
model-00002-of-00004.safetensors:   0%| | 591k/4.92G [04:39<616:06:31, 2.22kB/s]
model-00003-of-00004.safetensors:   0%|             | 0.00/4.98G [00:00<?, ?B/s]
model-00004-of-00004.safetensors:   0%|   | 211k/1.58G [00:19<5:45:42, 76.2kB/s]
(output truncated)

## Install Lean toolchain (elan)
Install elan, then build `mathlib4` so that generated proofs can be compiled through the Lean REPL.

In [13]:
# Install elan (Lean toolchain manager) — non-interactive
!curl -sSf https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh | sh -s -- -y
# Note: Lean binaries are installed under ~/.elan/bin/

[1minfo:[0m downloading installer
[1minfo: [mdefault toolchain set to 'stable'


In [14]:
# Build mathlib4 (can take a while on first run)
!bash -lc 'cd mathlib4 && ~/.elan/bin/lake build'

Build completed successfully.

## Compile generated codes via Lean REPL
Limit CPU parallelism (`--cpu`) and the per-proof timeout (`PROOF_TIMEOUT`) for stability.
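In this run the model download above appears to have stalled, so `inference_cpu.py` never wrote `to_inference_codes.json`, and the compile cell below fails with `FileNotFoundError`. A cheap pre-check avoids that. The sketch assumes the file holds a top-level JSON list or object; `check_json_input` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_json_input(path: str) -> str:
    """Return a short status string for a JSON file a later step depends on."""
    p = Path(path)
    if not p.is_file():
        return f"missing: {p} (run inference first, or copy it here from Colab)"
    try:
        with p.open("r", encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {p} ({exc.msg})"
    # Assumption: a list/dict length is a rough record count.
    count = len(data) if isinstance(data, (list, dict)) else 1
    return f"ok: {p} ({count} top-level entries)"


print(check_json_input("results/codespaces_cpu/to_inference_codes.json"))
```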

In [17]:
!PROOF_TIMEOUT=180 python src/compile.py \
        --input_path results/codespaces_cpu/to_inference_codes.json \
        --output_path results/codespaces_cpu/code_compilation_repl.json \
        --cpu 2

Traceback (most recent call last):
  File "/Users/ts21/dev/Goedel-Prover-V2/src/compile.py", line 48, in <module>
    with open(input_file_path, 'r') as json_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'results/codespaces_cpu/to_inference_codes.json'


## Summarize results

In [18]:
!python src/summarize.py \
        --input_path results/codespaces_cpu/code_compilation_repl.json \
        --full_record_path results/codespaces_cpu/full_records.json \
        --output_dir results/codespaces_cpu/summary

import json, os
meta_path = 'results/codespaces_cpu/summary/meta_summarize.json'
if os.path.exists(meta_path):
    # Use a context manager so the file handle is closed after reading
    with open(meta_path, encoding='utf-8') as f:
        print(json.dumps(json.load(f), indent=2, ensure_ascii=False))
else:
    print('meta_summarize.json not found')

Traceback (most recent call last):
  File "/Users/ts21/dev/Goedel-Prover-V2/src/summarize.py", line 19, in <module>
    df = pd.read_json(input_file)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/lib/python3.12/site-packages/pandas/io/json/_json.py", line 791, in read_json
    json_reader = JsonReader(
                  ^^^^^^^^^^^
  File "/opt/miniconda3/lib/python3.12/site-packages/pandas/io/json/_json.py", line 904, in __init__
    data = self._get_data_from_filepath(filepath_or_buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/lib/python3.12/site-packages/pandas/io/json/_json.py", line 960, in _get_data_from_filepath
    raise FileNotFoundError(f"File {filepath_or_buffer} does not exist")
FileNotFoundError: File results/codespaces_cpu/code_compilation_repl.json does not exist
meta_summarize.json not found


### Notes
- If inference runs out of memory on CPU, reduce `--max_length`, `--n`, and the input size.
- For practical usage, run inference on a Colab GPU (quantized) and copy `to_inference_codes.json` and `full_records.json` here, then run only the compile and summarize cells.
- Adjust `--cpu` to control compile parallelism and `PROOF_TIMEOUT` to control the per-proof timeout.
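To see how `--cpu` and `PROOF_TIMEOUT` trade off, a worst-case bound helps: if every proof hits the timeout, compilation takes about `ceil(num_codes / cpu) × PROOF_TIMEOUT`. This sketch assumes each of the `--cpu` workers checks one proof at a time and that `PROOF_TIMEOUT` is a per-proof limit in seconds; it ignores REPL startup and Mathlib import time.

```python
import math


def worst_case_compile_seconds(num_codes: int, cpu: int, proof_timeout: float) -> float:
    """Upper bound on compile wall-clock time if every proof times out.

    Assumptions: `cpu` workers each handle one proof at a time, and
    proof_timeout is the per-proof limit in seconds.
    """
    if cpu < 1:
        raise ValueError("cpu must be >= 1")
    rounds = math.ceil(num_codes / cpu)  # batches of proofs run in parallel
    return float(rounds * proof_timeout)


# Example: 10 generated proofs, --cpu 2, PROOF_TIMEOUT=180
print(worst_case_compile_seconds(10, 2, 180))  # 900.0
```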