Skip to content

Official code for paper: INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

License

Notifications You must be signed in to change notification settings

albertqjiang/INT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

68 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

INT

This is the official code for the paper INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving.

To cite the paper, use:

@inproceedings{WJBG2021INT,
  author    = {Yuhuai Wu and
               Albert Jiang and
               Jimmy Ba and
               Roger Baker Grosse},
  title     = {{INT:} An Inequality Benchmark for Evaluating Generalization in Theorem
               Proving},
  booktitle = {9th International Conference on Learning Representations, {ICLR} 2021,
               Virtual Event, Austria, May 3-7, 2021},
  publisher = {OpenReview.net},
  year      = {2021},
  url       = {https://openreview.net/forum?id=O6LPudowNQm},
}

This directory contains

  • a light-weight theorem prover with fast interactions
  • the generator of theorems and proofs in the INT dataset
  • the scripts for training agents to prove such theorems

Installation

The software packages required are:

PyTorch

PyTorch geometric

DGL

pytorch-a2c-ppo-acktr

OpenAI Gym

OpenAI Baselines

Setup

git clone https://github.com/albertqjiang/INT
cd INT

Data

To generate the theorems and proofs, we need to first generate the axiom combinations and orders to use. The following command generates the axiom combinations and orders for fewer than 5 unique axioms and 5 axiom applications in a proof, with 10000 trials for each (k, l) pair.

python -m int_environment.data_generation.combos_and_orders --combo_path data/benchmark/field --max_k 5 --max_l 5 --trial 10000

After getting the axiom combinations and orders, we can use them to generate synthetic theorems and proofs. The command below generates 100 separate theorems and corresponding proofs, for graph processing, with the parameters k=5, l=5.

python -m int_environment.data_generation.generate_problems --orders_path data/benchmark/field -k 5 -l 5 --num_probs 100 -dp $dump_path

For seq2seq training, use the following command instead:

python -m int_environment.data_generation.gen_seq2seq --orders_path data/benchmark/field/  -k 5 -l 5 --num_probs 100 -dp $dump_path

To visualise 5 proofs just generated using the seq2seq mode, use the following command:

python -m int_environment.visualization.display_proofs --proof-file $dump_path/problems.pkl --how-many 5

Training

We prepare the training script for a graph-neural-network based agent. The command below trains the agent for 1 million updates in an online setting. We generate 1000 theorems and proofs with the setting k=3, l=3 and train 10 epochs on them before generating a new set of problems. The training results and models are dumped in directory data/pt_models.

mkdir data/pt_models
python -m int_environment.algos.main -cp data/benchmark/field -du data/pt_models/ --online -trs k\=5_l\=5 -tes k\=5_l\=5 -epod 10 -np 1000 --lr 1e-4 -u 1000000 -tg --degree 0 --seed 0

Executing tests

Simply run:

pytest test

Documentation for important functions and classes

INT/data_generation/generate_problems.py:268

def generate_multiple_problems(num_axioms, length, num_probs, **kwargs):
    """
    Generate multiple theorems and proofs and return the tuple (datasets, problems)

    :param num_axioms: the number of unique axioms in the proofs generated

    :param length: the length of the proofs generated

    :param num_probs: how many theorems and proofs to generate

    :param kwargs: keyword arguments to provide additional specifications for the problems generated

        :keyword train_test: str, optional
        Could be "train" or "test", specifies which partition of the axiom combinations or orders to use, default: "train"

        :keyword combos: dict, optional 
        The dictionary containing all the available combinations to use in generation, cannot appear at the same time as orders
        
        :keyword orders: dict, optional
        The dictionary containing all the available orders to use in generation, cannot appear at the same time as combos
        
        :keyword avoid_objective_names: list, optional
        The list containing all the objective names we wish to avoid during generation. Use this to prevent having test problems, default: []
        
        :keyword ivs: list, optional
        Individual variables to initialize the prover, default: [a, b, c]
        
        :keyword ed: dict, optional
        Entity dictionary of the initial prover, default: [0:a, 1:b, 2:c]
        
        :keyword ent_per_degree: int, optional
        The number of entities to sample from each degree, default: 10
        
        :keyword degree: int, optional
        The maximum degree of entities sampled, default: 0
        
        :keyword no_atom_ent_max: int, optional
        Maximum number of atomic entities, default: 20
        
        :keyword no_node_max: int, optional
        Maximum number of nodes in the graphs, default: 2000
        
        :keyword entity_length_limit: int, optional
        Maximum length of entities in characters, default: 20
        
        :keyword transform_gt: bool, optional
        Whether to enable transform_gt, default: True
        
        :keyword probability_no_transform: float, optional
        Probability of not doing transform_gt in each step, default 0.0

    :return: tuple (Datasets, Problems)

        Datasets is a list containing:
            One dataset with the keyword "all" contains all the proof steps generated, randomly shuffled
            One dataset with the keyword "all_first" contains the first proof steps of each problem, randomly shuffled
        
        Problems is a list, each element of which is all the proof steps for an individual theorem
    """

INT/proof_system/prover.py:4

class Prover:
    def __init__(self, axioms: dict, conditions: list, objectives: list, prove_direction: str):
        """
        This prover is capable of doing forward generation and backward proving
        :param axioms: the axioms that can be used inside the prover
        :param conditions: the conditions to start the proof with
        :param objectives: the objectives to prove, usually only one
        :param prove_direction: either "forward" or "backward",
                                use forward for generation and backward for writing proofs
        """

    @staticmethod
    def _trivial(logic_statement):
        """
        Checks if a logic statement is trivially true or not
        """

    def update_conditions(self):
        """
        Register the initial conditions as ground truth
        """

    def add_logic_statement(self, logic_statement):
        """
        Index and register a logic statement
        """

    def add_logic_statements(self, logic_statement_list):
        """
        Index and register multiple logic statements
        """

    def interpret_result(self, result):
        """
        Interpret the result of a theorem application
        """

    def _parse_entity_ids_from_entity(self, entity):
        """
        Get the ids of all entities that are subtrees of a given entity(including itself)
        """

    def _parse_entity_ids_from_ls(self, logic_statement):
        """
        Get the ids of all entities in a logic statement
        """

    def _add_entity(self, entity):
        """
        Add one entity to the register, return the id.
        """

    def _add_entities(self, entities):
        """
        Add multiple entities to the register, return a list of the ids of the entities
        """

    def get_entities(self):
        """
        Get all the entities in the prover ever registered
        """

    def get_ground_truth(self):
        """
        Get all the ground truth in the prover ever registered,
        including the initial conditions and statements proven later
        """

    def get_objectives(self):
        """
        Get the current objectives in the prover
        """

    def get_observation(self):
        """
        Get the observation for the prover environment, this should contain everything needed to prove the theorem
        """

    def logic_statement_connected(self, ls_id):
        """
        Determine whether the logic statement with the id is connected to premises
        """

    def _logic_statements_exist_and_are_proven(self, ls_list):
        """
        Determine if all logic statements in ls_list exist in the prover and are proven
        """

    def apply_theorem(self, theorem, operands):
        """
        Apply a theorem with operands
        """

    def is_proved(self):
        """
        Determine whether the proof is complete
        """

About

Official code for paper: INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published