# HOL4 to LEAN Translation using Gemini API

This notebook translates HOL4 theorem statements to LEAN using Google's Gemini API.

## Setup and Imports

In [29]:
# Install required packages
!pip install google-generativeai



In [30]:
import json
import google.generativeai as genai
import time
from typing import List, Dict
import os

## Configuration

The API key is read from the `GEMINI_API_KEY` environment variable, so set that variable before starting the notebook. You can get a key from: https://makersuite.google.com/app/apikey

In [31]:
# Read the API key from the GEMINI_API_KEY environment variable
API_KEY = os.getenv("GEMINI_API_KEY")

genai.configure(api_key=API_KEY)
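
`os.getenv` returns `None` when the variable is unset, and in that case the problem would likely only surface at the first API call. A minimal sketch of failing fast instead; the helper name `require_api_key` is hypothetical and not part of the original notebook:

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; `env` exists only so the
    function can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running this notebook."
        )
    return key
```

With this in place, the configuration line could read `genai.configure(api_key=require_api_key())`.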

## File Paths Configuration

In [32]:
# Input and output file paths
INPUT_FILE = "extracted/output.json"
OUTPUT_FILE = "extracted/output_lean.json"

## Initialize Gemini Model

In [33]:

model = genai.GenerativeModel('gemini-2.5-pro')


## Translation Functions

### Translation Strategy

This notebook uses a **batch translation approach**: all statements are sent to the LLM in a single request. This has several advantages:

1. **Dependency Awareness**: The LLM can see all Datatypes, Definitions, and Theorems together, understanding how they relate to each other.

2. **Type Consistency**: When translating theorems that reference datatypes or definitions, the LLM knows exactly how those types were translated.

3. **Efficiency**: Only one API call is needed instead of one per statement, although a very large file may exceed the model's token limit.

4. **Ordering**: Statements are sorted during extraction (Datatypes → Definitions → Theorems) to ensure dependencies are presented in the correct order.

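**Note**: For very large files (>100 items), you may need to split them into chunks to avoid token limits. Chunking gives up some dependency awareness, because each chunk is translated without seeing how items in earlier chunks were translated.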
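
The ordering in point 4 happens in the extraction step, which is not part of this notebook. As a rough sketch of how such an ordering could be produced, assuming each item carries the `kind` field used throughout (the names `KIND_ORDER` and `sort_by_dependency` are hypothetical):

```python
from typing import Dict, List

# Assumed priority of each 'kind'; unknown kinds sort last.
KIND_ORDER = {"Datatype": 0, "Definition": 1, "Theorem": 2}


def sort_by_dependency(items: List[Dict]) -> List[Dict]:
    """Order items as Datatypes, then Definitions, then Theorems.

    sorted() is stable, so items of the same kind keep their original
    relative order.
    """
    return sorted(
        items,
        key=lambda item: KIND_ORDER.get(item["kind"], len(KIND_ORDER)),
    )
```

Sorting by kind is only a coarse approximation of dependency order: it does not reorder two definitions that depend on each other, so it relies on the source order within each kind being sensible.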


In [34]:
def translate_all_statements(data: List[Dict]) -> List[Dict]:
    """
    Translate all HOL4 statements to LEAN in one API call, considering dependencies.
    
    Args:
        data: List of dictionaries with 'kind', 'name', and 'statement' fields
    
    Returns:
        List of translated items with LEAN statements
    """
    # Input data is already sorted (Datatypes → Definitions → Theorems)
    # No need to sort again
    
    # Build the prompt with all statements
    prompt = """You are an expert in formal theorem proving systems. Translate ALL of the following HOL4 statements to LEAN 4 syntax.

IMPORTANT: The statements are ordered by dependency - Datatypes first, then Definitions, then Theorems. Many theorems and definitions depend on the datatypes and earlier definitions. Please consider these dependencies when translating.

Instructions:
- Use LEAN 4 syntax (not LEAN 3)
- Preserve the logical structure and meaning
- Use appropriate LEAN type annotations
- Handle option types (SOME/NONE in HOL4 → some/none in LEAN)
- Convert HOL4 list notation to LEAN list notation
- Use LEAN's unicode symbols where appropriate (e.g., ∀, ∃, →, ∧, ∨)
- Pay attention to type definitions (Datatypes) as later statements may reference them
- Ensure definitions are properly typed based on earlier type definitions
- When a theorem references a datatype or definition above, reuse the exact translated identifier and signature from that earlier translation

Format your response as a JSON array where each element has:
{
  "name": "original_name",
  "statement": "translated LEAN 4 statement"
}

Here are the HOL4 statements to translate:

"""
    
    # Add all statements to the prompt (using original order)
    for i, item in enumerate(data, 1):
        prompt += f"\n{i}. {item['kind']}: {item['name']}\n"
        prompt += f"   HOL4 Statement:\n   {item['statement']}\n"
    
    prompt += "\n\nPlease provide the translations as a JSON array. Include ONLY the JSON array in your response, no additional text or markdown."
    
    try:
        print("Sending all statements to LLM for translation...")
        response = model.generate_content(prompt)
        response_text = response.text.strip()
        
        # Clean up markdown formatting if present: drop only the outer fence
        # lines so backticks inside a translated statement are left intact
        if response_text.startswith("```"):
            lines = response_text.split("\n")
            if lines[-1].strip() == "```":
                lines = lines[:-1]
            response_text = "\n".join(lines[1:]).strip()
        
        # Parse the JSON response
        translated_items = json.loads(response_text)
        
        # Match translations back to original items (preserving original order)
        name_to_translation = {item['name']: item['statement'] for item in translated_items}
        
        result = []
        for item in data:  # Use original order
            lean_statement = name_to_translation.get(item['name'], f"[Translation not found for {item['name']}]")
            translated_item = {
                "kind": item['kind'],
                "name": item['name'],
                "statement": lean_statement,
                "original_hol4": item['statement']
            }
            if 'source_file' in item:
                translated_item['source_file'] = item['source_file']
            result.append(translated_item)
        
        return result
        
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON response: {str(e)}")
        print(f"Response text: {response_text[:500]}...")
        raise
    except Exception as e:
        print(f"Error during translation: {str(e)}")
        raise
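
`translate_all_statements` matches translations back by `name` and silently substitutes a placeholder when a name is missing, so a model that renames an item (for example returning `Opn` for `opn`) goes unnoticed. A small sketch of a coverage check that could be run on the parsed response; `check_translation_coverage` is a hypothetical helper, not part of the original notebook:

```python
from typing import Dict, List, Tuple


def check_translation_coverage(
    originals: List[Dict], translated: List[Dict]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) names.

    missing:    input names the model did not return (kept in input order)
    unexpected: returned names that match no input item (sorted)
    """
    expected = [item["name"] for item in originals]
    returned = {
        item.get("name")
        for item in translated
        if isinstance(item, dict) and isinstance(item.get("name"), str)
    }
    missing = [name for name in expected if name not in returned]
    unexpected = sorted(returned - set(expected))
    return missing, unexpected
```

Calling it as `check_translation_coverage(data, translated_items)` right after `json.loads` would make it possible to log or retry the missing names instead of writing placeholders.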

In [35]:
def translate_json_file_chunked(input_path: str, output_path: str, chunk_size: int = 50) -> None:
    """
    Translate statements in chunks for large files.
    Useful when the file is too large to process in one API call.
    
    Args:
        input_path: Path to input JSON file
        output_path: Path to output JSON file
        chunk_size: Number of items to process per chunk
    """
    # Load the input JSON file
    print(f"Loading input file: {input_path}")
    with open(input_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    
    print(f"Found {len(data)} items to translate")
    
    # Input data is already sorted by dependency order (Datatypes → Definitions → Theorems)
    # No need to sort again
    
    # Split into chunks
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    print(f"Processing in {len(chunks)} chunks of up to {chunk_size} items each")
    
    all_translated = []
    for i, chunk in enumerate(chunks, 1):
        print(f"\nProcessing chunk {i}/{len(chunks)} ({len(chunk)} items)...")
        translated_chunk = translate_all_statements(chunk)
        all_translated.extend(translated_chunk)
        
        # Small delay between chunks
        if i < len(chunks):
            time.sleep(2)
    
    # No need to restore original order since we preserved it throughout
    
    # Save the translated data
    print(f"\nSaving translated data to: {output_path}")
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(all_translated, f, indent=2, ensure_ascii=False)
    
    print(f"\nTranslation complete! {len(all_translated)} items translated.")
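
The later cells call `translate_json_file(INPUT_FILE, OUTPUT_FILE)`, but its definition does not appear in this notebook as shown. The following is a reconstruction, not the original code: it assumes the function mirrors the chunked variant without the chunk loop and prints the messages seen in the recorded output. The `translate_fn` parameter is a hypothetical addition so the sketch can be exercised without an API call.

```python
import json
from typing import Callable, Dict, List, Optional


def translate_json_file(
    input_path: str,
    output_path: str,
    translate_fn: Optional[Callable[[List[Dict]], List[Dict]]] = None,
) -> None:
    """Translate every statement in a JSON file with a single call (sketch)."""
    if translate_fn is None:
        # Falls back to the notebook's own function, resolved at call time.
        translate_fn = translate_all_statements

    print(f"Loading input file: {input_path}")
    with open(input_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    print(f"Found {len(data)} items to translate")
    for kind in ("Datatype", "Definition", "Theorem"):
        count = sum(1 for item in data if item["kind"] == kind)
        print(f"  {kind}s: {count}")

    translated = translate_fn(data)

    print(f"\nSaving translated data to: {output_path}")
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(translated, f, indent=2, ensure_ascii=False)

    print(f"\nTranslation complete! {len(translated)} items translated.")
```

Called with two arguments inside the notebook, it uses `translate_all_statements`; passing a stub as `translate_fn` allows a dry run on a small file.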

## Test Translation on a Small Sample

Let's test the translation on a small subset (the first five items) first:

In [36]:
# Load a sample from the input file
with open(INPUT_FILE, 'r', encoding='utf-8') as f:
    sample_data = json.load(f)

print(f"Loaded {len(sample_data)} items")
print(f"  Datatypes: {sum(1 for x in sample_data if x['kind'] == 'Datatype')}")
print(f"  Definitions: {sum(1 for x in sample_data if x['kind'] == 'Definition')}")
print(f"  Theorems: {sum(1 for x in sample_data if x['kind'] == 'Theorem')}")

# Test translation on a small subset (first 5 items)
test_sample = sample_data[:5]
print(f"\nTesting translation on first {len(test_sample)} items...")

translated_sample = translate_all_statements(test_sample)

print(f"\nTranslation Results:")
print("="*80)
## Run Full Translation

# Choose one of the following approaches based on your file size:

# Option 1: Translate all at once (recommended for files with < 100 items)
print("Starting full translation...")
translate_json_file(INPUT_FILE, OUTPUT_FILE)

# Option 2: Use chunked translation for larger files (uncomment if needed)
# print("Starting chunked translation...")
# translate_json_file_chunked(INPUT_FILE, OUTPUT_FILE, chunk_size=50)

print("Full translation completed!")
for item in translated_sample:
    print(f"\n{item['kind']}: {item['name']}")
    print(f"HOL4: {item['original_hol4'][:100]}..." if len(item['original_hol4']) > 100 else f"HOL4: {item['original_hol4']}")
    print(f"LEAN: {item['statement'][:100]}..." if len(item['statement']) > 100 else f"LEAN: {item['statement']}")
    print("-"*80)


Loaded 101 items
  Datatypes: 21
  Definitions: 27
  Theorems: 53

Testing translation on first 5 items...
Sending all statements to LLM for translation...

Translation Results:
Starting full translation...
Loading input file: extracted/output.json
Found 101 items to translate
  Datatypes: 21
  Definitions: 27
  Theorems: 53
Sending all statements to LLM for translation...

Saving translated data to: extracted/output_lean.json

Translation complete! 101 items translated.
Full translation completed!

Datatype: lit
HOL4: lit =
    IntLit int
  | Char char
  | StrLit string
  | Word8 word8
  | Word64 word64
  | Float64 w...
LEAN: inductive Lit where
  | intLit : Int → Lit
  | charLit : Char → Lit
  | strLit : String → Lit
  | wo...
--------------------------------------------------------------------------------

Datatype: opn
HOL4: opn = Plus | Minus | Times | Divide | Modulo
LEAN: inductive Opn where
  | plus
  | minus
  | times
  | divide
  | modulo
-------------------------------------

## Run Full Translation

In [37]:

# Choose one of the following approaches based on your file size:

# Option 1: Translate all at once (recommended for files with < 100 items)
print("Starting full translation...")
translate_json_file(INPUT_FILE, OUTPUT_FILE)

# Option 2: Use chunked translation for larger files (uncomment if needed)
# print("Starting chunked translation...")
# translate_json_file_chunked(INPUT_FILE, OUTPUT_FILE, chunk_size=50)

print("Full translation completed!")

Starting full translation...
Loading input file: extracted/output.json
Found 101 items to translate
  Datatypes: 21
  Definitions: 27
  Theorems: 53
Sending all statements to LLM for translation...

Saving translated data to: extracted/output_lean.json

Translation complete! 101 items translated.
Full translation completed!


## Export Statistics

In [38]:
# Generate statistics about the translation
with open(OUTPUT_FILE, 'r', encoding='utf-8') as f:
    translated_data = json.load(f)

print("Translation Statistics:")
print("="*50)
print(f"Total items translated: {len(translated_data)}")

# Count by kind
kind_counts = {}
for item in translated_data:
    kind = item['kind']
    kind_counts[kind] = kind_counts.get(kind, 0) + 1

print("\nBreakdown by kind:")
for kind, count in kind_counts.items():
    print(f"  {kind}: {count}")

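# Check for translation errors: items the model did not return are marked
# "[Translation not found for ...]" by translate_all_statements
error_markers = ("[Translation Error", "[Translation not found")
errors = [item for item in translated_data if item['statement'].startswith(error_markers)]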
print(f"\nTranslation errors: {len(errors)}")

if errors:
    print("\nItems with errors:")
    for item in errors:
        print(f"  - {item['name']}")

Translation Statistics:
Total items translated: 101

Breakdown by kind:
  Datatype: 21
  Definition: 27
  Theorem: 53

Translation errors: 0


## Check Dependency Awareness

Let's verify that the translation properly handles dependencies between Datatypes, Definitions, and Theorems:


In [39]:
# Show examples of how datatypes, definitions, and theorems are related
with open(OUTPUT_FILE, 'r', encoding='utf-8') as f:
    translated_data = json.load(f)

# Find datatypes
datatypes = [item for item in translated_data if item['kind'] == 'Datatype']
definitions = [item for item in translated_data if item['kind'] == 'Definition']
theorems = [item for item in translated_data if item['kind'] == 'Theorem']

print("Dependency Chain Example:")
print("="*80)

if datatypes:
    print("\n1. DATATYPE (defines types used by definitions and theorems):")
    dt = datatypes[0]
    print(f"   Name: {dt['name']}")
    print(f"   HOL4: {dt['original_hol4'][:80]}...")
    print(f"   LEAN: {dt['statement'][:80]}...")

if definitions:
    print("\n2. DEFINITION (may use datatypes, used by theorems):")
    defn = definitions[0]
    print(f"   Name: {defn['name']}")
    print(f"   HOL4: {defn['original_hol4'][:80]}...")
    print(f"   LEAN: {defn['statement'][:80]}...")

if theorems:
    print("\n3. THEOREM (may use datatypes and definitions):")
    thm = theorems[0]
    print(f"   Name: {thm['name']}")
    print(f"   HOL4: {thm['original_hol4'][:80]}...")
    print(f"   LEAN: {thm['statement'][:80]}...")




Dependency Chain Example:

1. DATATYPE (defines types used by definitions and theorems):
   Name: lit
   HOL4: lit =
    IntLit int
  | Char char
  | StrLit string
  | Word8 word8
  | Word64 ...
   LEAN: inductive lit where
  | IntLit (i : Int) : lit
  | Char (c : Char) : lit
  | Str...

2. DEFINITION (may use datatypes, used by theorems):
   Name: isFpBool_def
   HOL4: isFpBool op = case op of FP_cmp _ => T | _ => F...
   LEAN: def isFpBool (o : op) : Bool :=
  match o with
  | op.FP_cmp _ => true
  | _ => ...

3. THEOREM (may use datatypes and definitions):
   Name: mk_id_surj
   HOL4: !id. ?p n. id = mk_id p n...
   LEAN: theorem mk_id_surj {m n} (i : id m n) : ∃ (p : List m) (val : n), i = mk_id p va...
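
The cell above prints one truncated example per kind, which is useful for eyeballing but does not check anything. The recorded outputs also show that naming can vary between calls: the five-item test run produced `inductive Lit where`, while the full run produced `inductive lit where`. A rough sketch of a mechanical check on declared names, assuming statements begin with a Lean 4 declaration keyword (the keyword list and both helper names are hypothetical and incomplete):

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Declaration keywords this sketch recognises (an assumed, incomplete list).
_DECL_RE = re.compile(
    r"^\s*(inductive|structure|abbrev|def|theorem)\s+([^\s:({\[]+)"
)


def declared_name(statement: str) -> Optional[str]:
    """Return the identifier a Lean statement declares, or None."""
    match = _DECL_RE.match(statement)
    return match.group(2) if match else None


def find_name_problems(items: List[Dict]) -> Dict[str, List[str]]:
    """Report items without a recognisable declaration and duplicate names."""
    names = [declared_name(item["statement"]) for item in items]
    counts = Counter(name for name in names if name is not None)
    return {
        "undeclared": [
            item["name"] for item, name in zip(items, names) if name is None
        ],
        "duplicates": sorted(name for name, n in counts.items() if n > 1),
    }
```

Theorems returned as a bare proposition will show up under `undeclared`, so that list is a prompt for manual review rather than a list of errors. Only compiling the generated file with Lean can confirm that references between items actually resolve.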


## Generate LEAN File

In [40]:

# Create a single .lean file with all translated content
def create_lean_file(input_json_path: str, output_lean_path: str) -> None:
    """
    Create a .lean file from the translated JSON data.
    
    Args:
        input_json_path: Path to the translated JSON file
        output_lean_path: Path to the output .lean file
    """
    with open(input_json_path, 'r', encoding='utf-8') as f:
        translated_data = json.load(f)
    
    with open(output_lean_path, 'w', encoding='utf-8') as f:
        # Write file header
        f.write("-- Auto-generated LEAN 4 file from HOL4 translation\n")
        f.write("-- Generated using Gemini API\n\n")
        
        # Process each item in order (already sorted: Datatypes -> Definitions -> Theorems)
        for item in translated_data:
            kind = item['kind']
            name = item['name']
            statement = item['statement']
            
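            # Add a comment with the original HOL4 statement, prefixing every
            # line so that multi-line statements stay fully commented out
            f.write(f"-- Original HOL4 {kind}: {name}\n")
            for hol4_line in item['original_hol4'].splitlines():
                f.write(f"-- {hol4_line}\n")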
            
            if kind == "Datatype":
                # Write datatype as-is
                f.write(f"{statement}\n\n")
                
            elif kind == "Definition":
                # Write definition as-is
                f.write(f"{statement}\n\n")
                
            elif kind == "Theorem":
                # Append ":= sorry" as a placeholder proof to be filled in later
                if statement.startswith("theorem "):
                    # The statement is already a full "theorem name ... : prop" declaration
                    f.write(f"{statement} := sorry\n\n")
                else:
                    # Fallback: assume the statement is just the proposition
                    f.write(f"theorem {name} : {statement} := sorry\n\n")
    
    print(f"LEAN file created: {output_lean_path}")
    
    # Print statistics
    datatypes = sum(1 for x in translated_data if x['kind'] == 'Datatype')
    definitions = sum(1 for x in translated_data if x['kind'] == 'Definition')
    theorems = sum(1 for x in translated_data if x['kind'] == 'Theorem')
    
    print(f"Content summary:")
    print(f"  - {datatypes} Datatypes")
    print(f"  - {definitions} Definitions")
    print(f"  - {theorems} Theorems (with := sorry)")

# Generate the .lean file
LEAN_OUTPUT_FILE = "extracted/translated_statements.lean"
create_lean_file(OUTPUT_FILE, LEAN_OUTPUT_FILE)

LEAN file created: extracted/translated_statements.lean
Content summary:
  - 21 Datatypes
  - 27 Definitions
  - 53 Theorems (with := sorry)
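
`create_lean_file` appends `:= sorry` to every statement that starts with `theorem `, which would produce invalid Lean if the model ever returned a statement that already carries a proof body. A hedged sketch of a guard for that case; `format_theorem` is a hypothetical helper and the body detection is a heuristic, not a parser:

```python
import re

# Heuristic: text containing ":= by" or ending in ":= sorry" already has a body.
_HAS_BODY_RE = re.compile(r":=\s*(sorry\s*$|by\b)")


def format_theorem(name: str, statement: str) -> str:
    """Return a Lean 4 theorem declaration, adding `:= sorry` if needed."""
    statement = statement.strip()
    if not statement.startswith("theorem "):
        # Same fallback as the notebook: treat the text as a bare proposition.
        return f"theorem {name} : {statement} := sorry"
    if _HAS_BODY_RE.search(statement):
        return statement  # a proof body is already present; leave it alone
    return f"{statement} := sorry"
```

Inside `create_lean_file`, the theorem branch could then write `format_theorem(name, statement)` followed by a blank line. A statement whose proposition itself contains `:= by` would be misdetected, so the result is still worth checking with the Lean compiler.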
