
Neural_PaMpeR: build the database again with each line tagged with proof obligations. #178

Open
yutakang opened this issue Sep 10, 2020 · 16 comments

@yutakang
Collaborator

No description provided.

@yutakang yutakang self-assigned this Sep 10, 2020
@brando90

brando90 commented Oct 5, 2020

hi @yutakang, is this the data set we spoke about? Can I help you make it?

@yutakang
Collaborator Author

yutakang commented Oct 7, 2020

Hi @brando90,

> hi @yutakang, is this the data set we spoke about?

Yes.
I created a new directory Neural_PaMpeR at https://github.com/data61/PSL/tree/master/Neural_PaMpeR

Bits and pieces of the PaMpeR data-extraction code base are outdated for Isabelle2020, so I have to update them.

Unfortunately, I did not record the date on which I downloaded the AFP articles to build the database used for the CICM paper.

So I have to build the database again, this time matching each line of the database with the string representing each proof goal.

I can use a big server machine in Innsbruck, so computational resources should not be a problem.

@yutakang yutakang changed the title PaMpeR2: build the database again with each line tagged with proof obligations. Neural_PaMpeR: build the database again with each line tagged with proof obligations. Oct 7, 2020
@brando90

brando90 commented Oct 7, 2020

> Hi @brando90,
>
> > hi @yutakang, is this the data set we spoke about?
>
> Yes.
> I created a new directory Neural_PaMpeR at https://github.com/data61/PSL/tree/master/Neural_PaMpeR
>
> Bits and pieces of the PaMpeR data-extraction code base are outdated for Isabelle2020, so I have to update them.
>
> Unfortunately, I did not record the date on which I downloaded the AFP articles to build the database used for the CICM paper.
>
> So I have to build the database again, this time matching each line of the database with the string representing each proof goal.
>
> I can use a big server machine in Innsbruck, so computational resources should not be a problem.

ok cool let me know if there is something I can do to help to get the proof obligations.

Thanks!

@yutakang yutakang pinned this issue Oct 12, 2020
@yutakang
Collaborator Author

Hi @brando90,

I produced a sample datapoint for this lemma.

This sample datapoint is expressed in 4 lines:

  • 1 line for the results of the assertions,
  • 3 lines for the strings of the proof obligations (1 for the first subgoal and 2 for the local assumptions, a.k.a. chained facts).

Each line representing this datapoint starts with the location information (file name and line number).
The lines representing first subgoals are tagged with "First_Subgoal", and the lines representing chained facts are tagged with "Chained_Fact".

What do you think?
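The 4-line layout described above could be consumed roughly as follows. This is a minimal sketch assuming a tab-separated layout of `file<TAB>line<TAB>tag<TAB>payload` per line; the field order, the `Assertions` tag, and the sample strings are all hypothetical. Only the tags "First_Subgoal" and "Chained_Fact" come from the comment above:

```python
# Minimal sketch of grouping one 4-line datapoint into a record.
# Assumed (hypothetical) layout per line: file<TAB>line<TAB>tag<TAB>payload.
# Only the tags "First_Subgoal" and "Chained_Fact" come from the discussion;
# the "Assertions" tag and the sample strings below are made up.

def parse_datapoint(lines):
    point = {"assertions": None, "first_subgoal": None, "chained_facts": []}
    for raw in lines:
        file_name, line_no, tag, payload = raw.rstrip("\n").split("\t", 3)
        loc = (file_name, int(line_no))  # location info prefixing each line
        if tag == "First_Subgoal":
            point["first_subgoal"] = (loc, payload)
        elif tag == "Chained_Fact":
            point["chained_facts"].append((loc, payload))
        else:  # the assertion-results line
            point["assertions"] = (loc, payload)
    return point

sample = [
    "Foo.thy\t42\tAssertions\t0,1,0,1",
    "Foo.thy\t42\tFirst_Subgoal\tx + y = y + x",
    "Foo.thy\t42\tChained_Fact\tx > 0",
    "Foo.thy\t42\tChained_Fact\ty > 0",
]
parsed = parse_datapoint(sample)
```

Since every line of a datapoint carries the same (file name, line number) prefix, that pair could also serve as the grouping key when reading a whole database file.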

@brando90

brando90 commented Nov 9, 2020

> Hi @brando90,
>
> I produced a sample datapoint for this lemma.
>
> This sample datapoint is expressed in 4 lines:
>
> • 1 line for the results of the assertions,
> • 3 lines for the strings of the proof obligations (1 for the first subgoal and 2 for the local assumptions, a.k.a. chained facts).
>
> Each line representing this datapoint starts with the location information (file name and line number).
> The lines representing first subgoals are tagged with "First_Subgoal", and the lines representing chained facts are tagged with "Chained_Fact".
>
> What do you think?

Hi yutaka,

that looks good for that goal. I do want to keep the hardcoded vector representation that you came up with. The most important thing is that for every node in the proof tree we have its raw formula. So if we go from assm1 g1 to assm2 g2 using tac, then we want:

(assm1 g1, Tac)

The next set of goals would be a data point for the next tactic (not included in my example). I think your example might be too simple to tell whether the format is robust, but I think it's ok. What we want to know is which tactic was applied to prove a specific goal with a specific local context.

If the tactic took an argument, we'd also like that to be part of the target, e.g.

(assm1 g1, Tac arg)

Does that make sense?

@brando90

@yutakang actually perhaps this is the best thing:

  1. I think what Tactician suggests is better: "tactic_a1 is saved into the database accompanied by both the proof states before and after tactic execution, in this case ⟨Γ_a1 |- σ1, tactic_a1, Γ_a2 |- σ2⟩" is good.

I still think the arguments to the tactic should be saved.

For future work, I think even just saving the entire proof tree from a proof script would be best. That way ML researchers can extract whatever they want (e.g. parents, steps, anything from the proof tree). Perhaps that's for a different discussion, but I'd be curious how easy this is to do for Isabelle from an engineering perspective.

@brando90

by proof tree I mean the tree of tactic applications (not a proof tree in the logician's sense)

@brando90

this is ultimately the data format I am suggesting:

(current_goal, assumptions, tactic arguments, resulting obligations)

@brando90

brando90 commented Nov 17, 2020

@yutakang

Summary of discussion:

We want to create a data set that is as complete as possible (mainly because it's hard to recover missing information once the data set has been built offline). I therefore suggest something like this:

data = (A1==>G1, assms, Tac args args_names, A2==>G2)

Does this sound good? Is there anything missing?
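One record in the proposed schema could be sketched in Python like this. The field names, types, and the example values are illustrative assumptions; only the overall shape (A1==>G1, assms, Tac args args_names, A2==>G2) comes from the summary above:

```python
# Hedged sketch of one datapoint in the proposed schema
# data = (A1==>G1, assms, Tac args args_names, A2==>G2).
# Field names and example values are assumptions; proof states are kept
# as raw strings, as discussed earlier in the thread.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TacticStep:
    state_before: str                  # A1 ==> G1, the goal before the tactic
    chained_facts: List[str]           # assms, the local assumptions
    tactic: str                        # tactic name
    args: List[str] = field(default_factory=list)       # tactic arguments
    arg_names: List[str] = field(default_factory=list)  # names of those arguments
    state_after: str = ""              # A2 ==> G2, the resulting obligations

# Hypothetical example for a list-append lemma:
step = TacticStep(
    state_before="xs @ [] = xs",
    chained_facts=[],
    tactic="induct",
    args=["xs"],
    arg_names=["xs"],
    state_after="[] @ [] = []",  # first resulting subgoal, abbreviated
)
```

Keeping both `state_before` and `state_after` in each record is what makes the data set "complete" in the sense above: a consumer can reconstruct the tactic-application tree by chaining records whose states match.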

@brando90

@yutakang what is the difference between A1 and assms?

@brando90

hi @yutakang! How are you? How is the data set we discussed coming along?

Regards, Brando

@yutakang
Collaborator Author

Hi @brando90,

Sorry, my only brother was dying, which caused a lot of problems.
To handle the situation, I limited my time for research to the lowest possible level, just enough not to be fired... at least so far.

I am back in Singapore and resuming Isabelle-related work in my spare time.
Sorry, it might already be too late for you and your folks by now.

By the way, GitHub notifications sometimes don't work for me.
In case you don't receive a reply from me, please send an email to yutaka@yale-nus.edu.sg or tweet at YutakangE.

Best regards,
Yutaka

@brando90

> Hi @brando90,
>
> Sorry, my only brother was dying, which caused a lot of problems.
> To handle the situation, I limited my time for research to the lowest possible level, just enough not to be fired... at least so far.
>
> I am back in Singapore and resuming Isabelle-related work in my spare time.
> Sorry, it might already be too late for you and your folks by now.
>
> By the way, GitHub notifications sometimes don't work for me.
> In case you don't receive a reply from me, please send an email to yutaka@yale-nus.edu.sg or tweet at YutakangE.
>
> Best regards,
> Yutaka

No worries Yutaka, family is really important. Recently my father also had a terrible complication, likely as severe as yours.

It's not too late to create the data set we were chatting about, but it might take me longer to implement agents to benchmark it or to get a team to try that. Let me know if you do have time for that.

Regards, Brando

@brando90

btw, I recently made my repo public in case someone wanted to help us out: brando90/isabelle-gym#30

@yutakang
Collaborator Author

Hi @brando90 ,

I am sorry about your father's situation.

I have recovered a little and have started working on this repository in my spare time.
Josef at CTU in Prague told me that I can use their machines to extract data.

So I can probably extract the dataset again.

I check this repository every once in a while.
But if you don't receive a reply from me, please send an email to yutaka@yale-nus.edu.sg or tweet at YutakangE.

@brando90

brando90 commented Jul 19, 2021 via email

@yutakang yutakang unpinned this issue Nov 10, 2023