mm0 for kotlin #78
The proof checker doesn't know anything about mmu files. Mmu files are compiled to mm0 files containing all theorems and mmb files containing all proofs. The proof checker then checks that all theorems in the mm0 file are satisfied using the proofs in the corresponding mmb file. Mmu files only exist to make everything easier to write, as far as I know. This allows for a simpler, and thus more likely correct, proof checker.
Thanks for the answer. The way I see it, mmu files would be a really good interchange format between software and platforms. Once mmu (and mm0) is fully understood, it is easier to implement a performance-oriented custom binary format for working with proofs. I am planning to use a custom binary format (one that avoids allocation, if possible, by proof-checking in place). Also, just before merging mm0/mmu files, I check that the mm0 statement and the mmu directive point to the same thing (otherwise the public human API publicized in the mm0 file would be lying about what goes on, which is unproductive), and then I check theorems asynchronously before yielding a merged statement/directive (with a Future check answer). This lets me build more complete error reports instead of stopping at the first error.
My bad. I was using old versions of peano.mm0 and peano.mmu. With the latest versions of peano.mmu and peano.mm0, I'm still encountering issues with names, though, like
MMU files are the "text mode" representation of MMB files, like the WASM text/binary formats. They are intended primarily for debugging. MMB files are not platform dependent, and while I think the possibility of alternate proof formats still exists, I'm trying to get all the supporting tools to converge on MMB. Currently you can use the
If you are planning a custom binary format, then I would like to know what you intend to support so that MMB can be made inter-convertible with it. I recently added an extensible table structure for the "index" (the non-verified data), which should now be able to handle arbitrary metadata coming from different provers: theorem names, attributes on theorems, lisp metaprograms (in the case of MM1), and so on. Regarding the error in peano.mmu, you might consider checking up to alpha equivalence (i.e. including renamings of local variables), but the real reason for the error is that peano.mmu is ancient and almost certainly does not reflect recent changes to peano.mm1. But it's not so hard to regenerate the file using
I updated the MMB and MMU files in commit 13e7bfd.
There is now an MMB spec at https://github.com/digama0/mm0/blob/master/mm0-c/mmb.md . Hopefully that addresses your documentation needs, and if not feel free to make additional suggestions on what should go in there if something needs clarification.
This helps a lot (now I can look at the mmu documentation and the mmb documentation without having to learn a new language). This makes things a lot better on my side. I have just looked quickly at the mmb spec. I am sure that it is full of really smart stuff.
And Java uses (or used to use) modified UTF-16 strings (not UTF-8). So, in my experience, something that works really great for C++ isn't always appropriate for Kotlin/JVM.
So that probably means using ByteBuffer (to be able to mmap things) and working on bytes like you do. So, well, maybe there is a way, but it is too soon for me to know, so I cannot promise anything. :) I'll look really hard at it and try to implement the smart stuff I understand. Time will tell.
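As a concrete sketch of the ByteBuffer idea (written in Java for a self-contained example; Kotlin calls the same NIO API): in production the buffer would come from `FileChannel.map(MapMode.READ_ONLY, ...)`, but here a byte array is wrapped so the example runs on its own. The 4-byte "MM0B" magic is taken from the MMB spec; the class and method names are mine, not part of any mm0 tool.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MmbHeader {
    // Check the 4-byte magic at the start of an MMB buffer.
    // Absolute get(int) reads never move the buffer position, which keeps
    // the rest of the parsing code free to seek wherever it likes.
    public static boolean hasMagic(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN); // MMB is a little-endian format
        return buf.limit() >= 4
            && buf.get(0) == 'M' && buf.get(1) == 'M'
            && buf.get(2) == '0' && buf.get(3) == 'B';
    }

    public static void main(String[] args) {
        // Real code would use FileChannel.open(path).map(MapMode.READ_ONLY, 0, size)
        ByteBuffer demo = ByteBuffer.wrap(new byte[]{'M', 'M', '0', 'B', 1, 0});
        System.out.println(hasMagic(demo));
    }
}
```

A mapped buffer also avoids copying the file into JVM heap strings, which sidesteps the UTF-16 concern for the binary payload.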
I pondered your answer. So you say that
This is not what I have believed for the past year and a half (but maybe I was wrong): I thought that mmu and mmb files were interchangeable and could both be used with an mm0 file to proof-check stuff. Also, until now only mmu was documented, which led me to think that software developers were expected to implement their own binary format and to use the documented mmu format for interoperability with others. My bad if I was wrong. Anyway, it is already hard enough to implement a (correct) proof checker for mm0 + mmu (it is even harder with alpha renaming to do); I cannot fathom how hard it would be for someone new to mm0 to implement a correct proof checker for mm0 + mmb. On a side note, the documentation of mmb is a real treasure. Thank you Mario for this effort. I USED to have some minor gripes with mm0/mmu. At the moment, I do not understand why the names of variables aren't the same on the mm0 and the mmu side. Is there something that is supposed to help compute the alpha-renaming stuff? Am I missing something?
Kotlin should presumably have a way of decoding UTF-8 into whatever the native format is. I am using UTF-8 for specification purposes, in case other proof translators want to use unicode characters (like Lean, for example), but all strings that appear in all MM0 and MM1 files in this repo (including the translated ones) are pure ASCII (and more than that, restricted to the very small character set
Using
I think @bjorn3 is confusing mmu with mm1 files. The picture is something like this:
You can use
I haven't really decided how strict to be about allowing names in the mmu file to be different from the corresponding mm0 file. It's certainly easier to require that they are the same, but I also want to minimize the number of instances where the mm0 file has to change because of convenience for the proof author. For example, the scenario might be that the mm0 is written by someone (the "client") who wants a particular theorem verified, and the proof author should still be able to have some flexibility in writing the proof as they would like, without having to bother the client to make trivial changes to the mm0 file (which will require review, i.e. "have we accidentally trivialized the theorem?"). This is weighed against the bother of having to alpha rename things. In mmb this isn't really an issue because everything is named by indices, so the textual names of things matter very little, except when parsing the mm0 file. In fact the mm0-c verifier doesn't even care if you give different names to all the term constructors and theorems, as long as they have the same types and come in the same order. But in mmu, since it's a textual format, it's logical to index things by string names, and then you have to keep track also of the name used in the mm0 file (if applicable) for each of these entities so that you can perform the necessary translation. In any case, for the present, if it makes things easier for you, you can just assume that alpha renaming isn't necessary.
Ah, okay. Alpha renaming is really simple here, no fancy stuff is needed. Suppose you are trying to match expressions like:
You initialize a name map, let's say MM0 -> MMU, although I think either direction will work. It starts out empty. You read the binders on both sides:
The reason your complicated alpha-renaming possibilities don't come up is that the order of arguments is not allowed to change. It's also possible to do alpha renaming at the level of statements; that is, we would also keep a mapping of sorts, terms and such. In this case we would have
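The name-map approach just described can be sketched in a few lines of Java (Java so the example is self-contained; the names `buildMap` and `sameVar` are my own, not from any mm0 tool). Because binder order may not change, the map is built by one parallel walk over the two binder lists.

```java
import java.util.HashMap;
import java.util.Map;

public class AlphaMap {
    // Walk the two binder lists in parallel (order is fixed), recording
    // mm0 name -> mmu name. Returns null if the lists cannot correspond.
    public static Map<String, String> buildMap(String[] mm0, String[] mmu) {
        if (mm0.length != mmu.length) return null;
        Map<String, String> rename = new HashMap<>();
        for (int i = 0; i < mm0.length; i++) {
            String prev = rename.put(mm0[i], mmu[i]);
            if (prev != null && !prev.equals(mmu[i])) return null; // inconsistent renaming
        }
        return rename;
    }

    // With the map in hand, comparing expressions is a recursive walk that
    // translates each mm0 variable occurrence before testing equality.
    public static boolean sameVar(Map<String, String> rename, String mm0Var, String mmuVar) {
        return mmuVar.equals(rename.getOrDefault(mm0Var, mm0Var));
    }
}
```

So for `term al {x: nat} (p: wff x)` vs. `(term al ((x nat) (ph wff (x))))`, the walk produces `{x -> x, p -> ph}` and the bodies compare equal after translation.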
My bad.
This is the important bit that I was missing. I guess that this is also true for definitions, theorems and axioms. Without that bit, the mm0/mmu model I had in my head was: instead of seeing S-trees (termId arg1 arg2) that require binder order,
I think that there are still alpha-renaming issues in peano.mm0 and peano.mmu (you did not refresh peano.mm0). IMO, it is weird that the proof author would not provide a proof with the variable names used by the client (he can change the variable names in his proof assistant software and change them back once he is finished). But oh well, it will be okay. :) Thanks for taking the time to explain things to me.
Fixed in 16fa2d8
Sure. But I'm not really "finished" with peano.mm1, and proof assistants should be useful even before the proof is "finished". For a bare bones verifier it makes sense not to include things like alpha renaming if they are too much work to implement, since you would generally only use such a verifier with a "finished" proof, but during development a more full-featured verifier (ideally with good error messages) can help move things along.
It is a bit weird. But in mmb names are second class citizens, shuffled off in the debugging data, so it wouldn't be able to check even if it wanted to. Plus, the primary purpose of the mmb format is to provide evidence that the mm0 file is provable, not that the mmb file is correct, so it's not necessary to make sure the mmb file is well formed beyond the requirements of delivering a proper proof. Even if the names are wrong, as long as it still proves the theorem who cares what it's called. In practice if the theorem name is different it's probably proving a different statement, so it will be caught at that point. I think the same argument applies to mmu to a lesser extent. Even though mmu files use text to express the proof, they are still only a means to justify the provability of the mm0 file, so things like alpha renaming are only a performance concern.
What do you mean by this? If you shadow names in mm0 binders, you will only be able to refer to the later binding. Internally the other binding is still there, and in MMB everything is numbered so you can refer to whatever you want. Not having to worry about name clashes just makes a lot of things simpler.
Yeah, I get that in mmb strings are second-class citizens, as indices are used instead, which makes string equality trivial. In mm0, say that you have
you cannot share the wff this way. So users of mm0 files might wrongly think that they can share variable declarations (like you sometimes do at the start of a human proof), but they can't, and the name-sharing capacities of the mm0 binders are then of very limited usefulness (they just save some typing...). When they use math stuff, I wouldn't be surprised if humans somewhat used (in their heads) NAMED arguments. mm0 files encode BOTH things (the concept AND an S-tree representation), and they enforce a single way to write trees (which is a good thing). You could have used the same syntax to encode binders that programmers do: instead of
you could have used
It is quite surprising that you did not (no complaint here, though). Of course, it is important to be able to use a partially written mm0/mmu file. But a proof checker should just always reject unfinished files (ideally with a great report that says which theorems were proved and why others were rejected).
Well you can reorder the binders, but it is a different term and all subsequent uses of the term will have to use that order of arguments. It's still basically equivalent, so you might just be able to use it that way, unless the mm0 file has it the other way for some reason.
Note that notations can reorder arguments. You can define
even though the arguments come in a different order in the notation than in the term itself.
MM0's syntax is based on functional programming languages, which is why it uses space for application and a binder list syntax similar to that used in Lean, Coq, or Agda. If you are coming from a C-derivative language like Kotlin this will be slightly unfamiliar.
This is very interesting and will help me in the future. Thanks to your explanations, the proof checker I intend to contribute passes string, hello and set.mm.
This is my fault; I suspect my dynamic parser is slightly incorrect. I'll delve into the mmu documentation once more to make things right and update stuff with the latest changes (my old code only used both-delimiters), and once everything looks fine, I'll PR stuff.
Is any of your code public? You should just create a new repo, like https://github.com/ammkrn/second_opinion , and I can make suggestions if I see anything I can help with. If you wait until you are done and PR your whole project, that will be way too much to properly review.
Ok, I'll do that. :)
I did it there. This is the stuff I'm going to contribute (the patcher isn't there yet, as I want to at least bug-fix the proof checker before that). Also, I usually code in IntelliJ IDEA and I am the lone consumer of my code (except for my Android apps).
I fixed my dynamic parser for notations (it wasn't doing what the spec said it should... it is quite hard sometimes to understand what should be going on, and sometimes I grow impatient and try to guess... with my limited understanding :/). In peano.mm0
Or is it okay if the additional n : nat dummy is not declared in the mm0 definition?
The dummy is not declared in the mm0 definition because the definition itself is not provided. For "abstract definitions" like this one, you only need to check that the type signatures match. In other words, ignore the dummies and value in the mmu definition and pretend you are matching two
I forget whether dummies are permitted in an abstract def, but there isn't any reason to have them unless you are trying to stress-test the verifier.
Ok, then check dummies for a non-abstract def and do not check them for an abstract def (which makes sense).
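That rule can be sketched as follows, assuming a small `Binder` record of my own invention as a stand-in for a real AST type: for an abstract def, strip the mmu-side dummies and compare only the remaining binder sorts and the return sort.

```java
import java.util.Arrays;

public class DefMatch {
    // Minimal stand-in for an AST binder: name, sort, and whether it is a dummy.
    public record Binder(String name, String sort, boolean dummy) {}

    // Abstract def (no value on the mm0 side): ignore mmu dummies and the
    // definition value, and require only that the visible binder sorts and
    // the return sort line up.
    public static boolean abstractDefMatches(Binder[] mm0, Binder[] mmu,
                                             String mm0Ret, String mmuRet) {
        Binder[] mmuReal = Arrays.stream(mmu)
                .filter(b -> !b.dummy()).toArray(Binder[]::new);
        if (mm0.length != mmuReal.length || !mm0Ret.equals(mmuRet)) return false;
        for (int i = 0; i < mm0.length; i++)
            if (!mm0[i].sort().equals(mmuReal[i].sort())) return false;
        return true;
    }
}
```

A non-abstract def would additionally compare the dummies and the definition value; that branch is omitted here.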
The additional dummies do not always have the same order in mm0 and mmu:
def all2 (R: set) {.l1 .l2 .x .y .n: nat}: set =
(def all2 ((R set ())) (set ())
Heh. You won't want to hear this, but dummies are unordered; the dummy declarations in mmu serve only to tell you what the types of the dummy variables are. MM1 will automatically put them in alphabetical order, so it's a bit tricky to change this behavior. Probably you should just collect them and then sort them before comparing. Actually, the ordering of dummies in MMU files does matter a bit, because the ordering of the dummy list argument to
I just double checked the haskell verifier, which does mm0 + mmu verification, and it sorts the binders at the MM0 parsing stage: when it parses an MM0 declaration, it adds dummy binders into an ASCII-ordered
Fixed in 3218744
The haskell MMU verifier (surprisingly) still works, but it had a bug in it regarding the treatment of dummy binder order, brought to my attention by @Lakedaemon in #78. Defs are supposed to be matched up to reordering of dummy binders, which requires sorting both the MM0 dummies (which was already being done) and the MMU dummies (which was not). However, MMU dummy order is significant for `:unfold` applications because the dummy list is in the same order as the MMU dummy declaration. So we only do the sorting for the equality check.
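One way to implement the fix described in the commit message above, assuming a small `Dummy` record of my own: sort copies of both dummy lists by name for the equality check, while the originally declared mmu order is kept separately because `:unfold` consumes dummies in declaration order.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DummyCheck {
    public record Dummy(String name, String sort) {}

    // Dummies are matched as an unordered collection: sort copies of both
    // lists by name, then compare componentwise. The original mmu ordering
    // must be preserved elsewhere, since :unfold consumes dummies in
    // declaration order.
    public static boolean sameDummies(Dummy[] mm0, Dummy[] mmu) {
        Dummy[] a = mm0.clone(), b = mmu.clone();
        Comparator<Dummy> byName = Comparator.comparing(Dummy::name);
        Arrays.sort(a, byName);
        Arrays.sort(b, byName);
        return Arrays.equals(a, b); // records compare componentwise via equals()
    }
}
```

Sorting copies (rather than the stored lists) is the design point: the equality check sees a canonical order, while unfolding still sees the declared one.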
Checking that the unordered additional dummies match lets peano.mm0/mmu pass for me. So I have a proof checker (with missing pieces) that is able to pass 4 (mm0/mmu) pairs. As I have A LOT of experience with my badly designed proof checkers, I might even do a good job at setting traps for faulty or sloppy proof checkers :) (Btw, I hope to get your proof checkers too! Game is on! :) ) Please tell me how/when you want PRs done, what the requirements are, and how to proceed. I'm a patient man and I have a lot of work to do, so I can afford to patiently polish things further. For example, error messages need to be polished and returned (instead of a boolean, sometimes...). Also, please tell me your needs so that a Kotlin proof checker can be added to your pipeline, making everything just a slight bit safer for us all. :)
Writing test cases has never been my strong suit. If you write a bunch of positive and negative test cases, I would very much like it as a PR.
Then, let's complement each other! I could PR a
Any suggestion? For positive tests, we could just use the many valid mm0/mmu/mmb pairs that will be produced in the future :) Those files will be exported from the Kotlin library, from a single .kt file, that will also rely on classes I'll PR later. This reminds me that my mm0 parser doesn't save line comments, which is not nice. I'll have to do something about it.
I would just call it
Of course real world mm0/mm1 files are also a good test, and that's what I've been using in the
Ok, this sounds sensible. The source for the test-creating process will be released, so you'll be able to port it to mm1 if you need to. I'll do my best.
It will also complement your mm0/mmu documentation. I think that I can come up with 50-100+ "interesting" tests in a few days.
Is the bizarre formatting in things like https://github.com/Lakedaemon/mm0kt/blob/main/tests/pass/matching/same%20order%20for%20binders.mm0 deliberate? I need to double check, but I don't think tabs are allowed, and the
EDIT: From mm0.md, the only valid whitespace characters are
Except for the tests that are testing weird formatting or special characters, I think all of the tests should use proper formatting, similar to that used in
Here's a fail test for the
It means that dummies can never have sort
Thanks, I'll update the test ASAP and also add your comments for future readers, so that they aren't lost.
The mmu spec says that terms have
This prevents them from having a return type that is a dummy type like (sort-name), right?
Oh I see now. Yes, that's a parse error then.
I'll format things like peano.mm0 and sanitize spaces. I want to make an API change for test declaration on my side (I made a bad decision at some point), so this will take some time to happen, though. I'll use the time to slowly mature the test declaration API, grow the tests and make them better.
Could you explain what this means, beyond just having a list of mm0/mmu files? I see some Kotlin code that specifies e.g. the parsed forms of some of the tests, but is there more to it? Are you producing the test files automatically?
I am producing tests semi-automatically because it would be a burden to maintain tests in hundreds of files. I'm writing both mm0 and mmu files and categorizing them in fail/pass/parsingMMU/parsingBoth with stuff like
The
I'm not sure that I understand all you say (with the AST thingie), but yes. And the test-producing Kotlin code should probably remain outside of your repository.
Okay, how about opening a PR with just the test files? Check "allow edits from maintainers" and I will be able to make tweaks on the PR.
Not yet, but soon.
I managed to change my test creation API (thanks to the already written proof checker code). This will speed things up quite a bit. Those 50 "tests" are far from enough: there is not even anything about axioms, theorems and proofs in there. And as time goes by, I'll pester you with an ever-growing collection of tests as I write the set.mm patcher. As long as I have not PR-ed tests that caught my code, I cannot fix the buggy code (or I take the risk of writing faulty tests and not catching it), so at some point I'll grow restless and PR some tests just to be able to fix my code! :)
Should it fail because of the unnecessary garbage in the formula?
If you wanted to allow people to write 1-pass and 2-pass proof checkers, shouldn't you have forbidden usage before definitions?
Here's the relevant text:
So it says that verifiers are allowed to do two-pass notation parsing, in which case this example is legal, but they are also allowed to do one-pass notation parsing, in which case it is not. It is implementation-defined, so it should not be used as a pass or fail test.
Yes, this is precisely my point. If the example you cite is allowed to exist, I'll cry!
"A consequence of the two pass approach is that notations may be used before they are defined in the file"
This means that when mm0/mmu/mmb becomes popular (because if you and/or I succeed, it will),
And they will say that we suck because our proof checkers do not even work though theirs do :D
Maybe I do not understand what you say (my brain doesn't behave sometimes).
Maybe I do not understand what you wrote.
Also, I do not remember reading in the spec that "garbage in a formula makes the proof-checking process fail even if the dynamic parser returned a tree". If I'm not presuming too much, I would love for that to be written in the spec.
That's true. Conversely, if they want to write mm0 files that are broadly checkable, they need to follow the strictest guidelines, which means they can't use conditionally supported features. Anything that is accepted by some checkers and not others should be viewed with suspicion. But it's not always practical to require verifiers to precisely reject everything, and in this case it doesn't matter much. You should just reject such files and move on. By the way, the reason you might want the 2 pass style is if you have used a parser generator to construct the math parser. There are a lot of parser generators that expect the grammar up front and don't allow the parser to be dynamically extended. So if you first get all the notations and turn them into a BNF description and hand them to yacc or something, you will get a parser that parses all math in the file equally. Given a parsed expression it is then difficult to tell whether you have used a notation from "the future", although if you use a term from the future this is more obvious.
The expression does not parse. The dynamic parser is required to parse the entire string. There is nothing in the spec that says you can add additional text after the end; the
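The "parse the entire string" requirement boils down to a trailing-garbage check after the parser returns; here is a minimal Java sketch, where `fullyConsumed` and its position argument are my own stand-ins for wherever a real dynamic parser stops.

```java
public class FormulaCheck {
    // After the dynamic parser has produced an expression, the verifier must
    // confirm that nothing but whitespace remains in the formula text; any
    // trailing tokens make the whole formula a parse error, even though a
    // prefix of the string parsed to a valid tree.
    public static boolean fullyConsumed(String math, int posAfterExpr) {
        for (int i = posAfterExpr; i < math.length(); i++)
            if (!Character.isWhitespace(math.charAt(i))) return false;
        return true;
    }
}
```

So a formula whose expression ends before some leftover tokens is rejected outright rather than silently truncated.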
I see. It bugs me less now, thanks to your explanation.
Having generated a parser for Metamath, I see what you mean. They might as well use the slow, cumbersome and unmaintainable parser I generated for Metamath like 2 years ago. Because, when I had the Metamath parser, I had like 100 grammar rules :/ Also, having people write strict mm0 because they hope to have universally useful files does not achieve the same results that
I just do not see the benefit in having this tolerance, and lots of pain later. But I will abide by any decision you take. But please consider history and why there is a strict mode for HTML. :)
YES! I remember reading that. Good!
"There is nothing in the spec that says you can add additional text after the end, the"
Well you pretty much have to write the dynamic parser by hand, because it doesn't exactly fit most parser architectures. It's not huge but it's sometimes nice to pull a parser off the shelf instead of writing your own. Actually, metamath is broadly the same as regards dynamic parsing. You can preprocess the file to construct a CFG if you like, or you can build the parser dynamically. In metamath you definitely can't use a syntax before the syntax axiom is introduced, so you have the same issues (although the term constructor and the syntax are the same so there isn't any danger of accidentally using notations from the future without also using term constructors from the future).
Yes, but if the definition of "strict mm0" is itself difficult to check then that can mean a meaningless performance penalty. For example, there are a few things in the MMB files that are not checked for validity, because you can still verify the proof even if the data in the file is slightly off. The verifier is not supposed to be a validator, odd as that may sound. It's supposed to verify the well formedness and the truth of the theorems in the file. If the theorems are stated in some unusual way that the verifier is nevertheless able to make sense of, that should still be okay. The most important thing that should be in fail tests are false proofs. If the proof doesn't follow the rules, then it needs to be rejected. That would be the negation of "loose mm0". But there is a gap between loose mm0 and strict mm0 in order to give some flexibility for verifiers to implement things as conveniently and efficiently as possible. There will be "strict mode mm0" validators, but I do not want to mandate that all conforming mm0 verifiers are such. By the way, another example of a gap I'm considering is alternative whitespace and unicode identifiers. Sometimes, for implementation simplicity, you want to use an
Wow. It is possible to build a dynamic parser with Metamath? (You say it, so it must be true, wow.) Yet a Metamath dynamic parser with set.mm (yuck, it gives me goose bumps...). Also, thanks for the insights (you are quite the theorist... I'm amazed at the depth of thought you put into your creation). I was considering unicode identifiers also (for operators, sums, arrows...). It would make reading textual mm0 slightly cuter, but it would open a whole new can of worms and degrade performance. Also, graphical rendering is going to be done by TeX/MathJax on my side, so unicode wouldn't bring anything to the table (and unicode is hard to input). Unicode makes sense for human languages, though. Yet there is still the possibility of mapping unicode ids to ASCII ones, so it may be done by another layer on top of an ASCII-mm0 powered engine. So, for now, I'm sticking with good old ASCII :/ False proofs... I'm not sure if I'll be good at building those.
The "tests I'm writing" right now look more like unit tests, making sure developers do their job implementing things on top of your specs. But if those tests ensure that verifiers are implemented correctly, maybe the strength of your formalism will ensure that it is not possible to write a false proof (I'm naive and an optimist, maybe).
In theory, "code coverage tests" should help here. That is, every time the program has to check something and fail otherwise, it should be possible to construct an input that hits that check. There are 88 uses of
I think that I got the "false proof" stuff. False proof test = design a test so that, if a proof checker doesn't respect an aspect of the mm0/mmu spec, then it is possible to prove something false. Basically, such a test would prove that your requirements are necessary (on the theoretical side). Writing tests makes me look again, and harder, at the different specs, which is a very good thing for my software (taking it from the "somehow working" to the "mostly working" state). I discovered that mmu can have line comments! I'll be implementing line comment support in my mmu parser and next PR-ing my first tests. I also sanitized the names of the tests to allow easy navigation, like
The folders can be used to test different parts of the mm0/mmu toolchain (mm0 parser, mmu parser, or proof checker at different stages: matching, registering, dynamic parsing, proof checking).
Yes, that's right. You don't have to create an actual "exploit"; exploit tests are generally best written against a specific verifier with a bug in it, to demonstrate that the bug is in fact a soundness hole. For general testing it's simpler to just check all the primitives that can potentially be used in an exploit for general robustness. It's not perfect, but that's just a limitation of testing. By the way, it's also okay to have multiple tests in a single file. Basic parsing tests should be short and focused, but for higher level tests like secondary parsing or binder order stuff I think it's fine to have 4 to 10 individual tests in one file. The whole file fails as one, so you probably don't want to put too much in the file, but I think it's organizationally easier not to have thousands of (pairs of) two-line files.
Hello,
In the peano examples, is it normal that term al
has type wff () in mmu :
(term al ((x nat) (ph wff (x))) (wff ()))
and type wff (x) in mm0 :
term al {x: nat} (p: wff x): wff;
Is a proof checker supposed to let that go or not?
And the variable names are also different (p != ph); is it supposed to let that go?
In which case, how does the proof checker decide that an mmu directive corresponds to an mm0 statement?
I get that the mmu file is the source of truth, that binders are actually used in the order they have in the mmu file, and that the mm0 file is just supposed to reflect how variables are used in the formula while presenting it to the human reader in an eye-pleasing manner (for example (x y z : nat) (ph : wff) instead of (x:nat) (ph : wff) (z:nat) (y:nat)),
and that order is not that important in the mm0 file.
All that because the computer will do the proof-building dirty work behind the scenes, and it knows what it needs to do it.
But shouldn't the names be the same?
(For terms, that might not be that important; maybe that is a tolerance for them.)