Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for an update of the recommended style in programming Coq. #267

Closed
wants to merge 1 commit into from

Conversation

herbelin
Copy link
Member

In the late 90s, Jean-Christophe Filliâtre wrote a small document with writing guidelines. This is a proposal for an update of this document covering:

  • syntactic layout of ML code (spacing, parenthesizing, ...)
  • methodology (e.g. no "try ... with _ -> ..."
  • naming policy (e.g. use "env" for environments)
  • common pitfalls in writing OCaml code

@maximedenes
Copy link
Member

This is an excellent idea, for instance for engineers joining the development.

One thing that comes to my mind: the document talks a bit about match on one line line. Maybe that should be discouraged, because it is not robust to insertion of new constructors. That's a minor point anyway.

I believe one of the motivation for avoiding OCaml's polymorphic equality is performance. Sometimes, fixing the types of arguments makes the compiler use more efficient equality functions. Typically, let f x y = Int.equal x y will be much faster than let f x y = x = y (of course, it's an over-simplified example). Another argument is that polymorphic equality breaks parametricity.

@herbelin
Copy link
Member Author

I don't buy your argument about discouraging match on one line. If ever one eventually needs an extra constructor, one can just reindent the code, the same way as e.g. "let in" line which becomes too long would be split over several lines. But maybe you have a specific example in mind which would motivate your view.

Equality: I did not know. If I understand correctly, this means that OCaml does not implement some form of "type class" selection of the appropriate equality when the type is known? [I thought it did it for float, already?] In any case, constraining oneself to choose the equality by hand is somehow a burden, and you have to know that you have to do it, but I guess the issue is well-known from OCaml developers.

Btw, I added a file api.txt to start listing some recommendations on using correctly the API.

welcome if it can make comments more readable (then
"toggle-enable-multibyte-characters" can help when using the
debugger in emacs)
- all of initial "open File", or of small-scope File.(...), or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression is that the typical litany of "open" at the beginning of the file is bad in OCaml, since the order is relevant. So one shoult limit that to what replaces/augments the standard prelude (cUtill).

What I found to be reasonable in Matita is to start with

module R = WhateverReduction
module TC = TypeSomethingChecker

... R.whd ... TC.typeof ...

i.e. shorten names. The eventually open these modules locally. Such practice serves a documentation purpose (in the "header" you see the dependencies) and you can still get compact code (repeating long module names).

Such practice can only be applied if the number of modules one has to use is limited. Coq has too many little modules, so an API refactoring is somewhat needed to make my proposal applicable.

@gares
Copy link
Member

gares commented Aug 28, 2016

I think it is a very good initiative. I've a simple remark that I attached to the patch. I have a more general one on the "multiple choices" on naming. Pick "evd" or "sigma" but then always use that everywhere. A newcomer seeing two different names thinks they are different things!
Another little point I could not find in this document, but I must have seen somewhere in doc/ or devel/. Ending a function with ;; was discouraged. I find that it helps quite a lot the OCaml parser to signal, say, a missing ")", close the opening one. There are many cases in which an "unbalanced" let can just eat up the rest of your file and you get the error at the last line. ;; sort of prevent this.

To me such document has to go in tandem with an equivalent document on files (file names, files size, file place).
I don't have many ideas on the subject, but a few remarks on the status quo.

Well, file names are non uniform. Given That OCaml forces you to start with a capital, we should probably go toward CamlCaseModuleNames.

We have too many little files and this complicates navigating the sources. What I found to work reasonably well is to use modules inside files to separate related but different functions. E.g. what Matej did for the context/Rel/Named.

Also "proxy files", ie mapping almost 1:1 two APIs are quite annoying. The can be a valuable transition strategy, in the spirit of a CompatV8X module, but they should not be part of the current sources.

@herbelin
Copy link
Member Author

herbelin commented Aug 28, 2016

About ";;": I fully agree, this is the best way to find the origin of a syntax error of the "unbalanced" style. In term of style, Coq is uniform here and I don't feel a change of trend coming, so I would recommend keeping things as they are, reminding however that the ";;" trick is indeed useful for debugging syntax errors.

About file names, the trend is apparently changing (ls */*.ml | egrep '[A-Z]' | wc returns 0 in 8.4 and 30 in 8.6). As far as I'm concerned, I don't care about the exact rule but I care about the uniformity (and about the maintenance issues for external users). I'm curious about the motivation of those who added filenames with a capital in the same, and in particular what eventually led them to prefer using a capital over preferring uniformity of style.

About the Matita convention, I know it from Claudio who used it extensively in the xml plugin, but I must confess that it was for me more a source of noise in reading his code than an help. To me, explicit qualification is useful when the name (or 1 or 2-letter code of the name) of the file gives information about what the function does (List.map style), but otherwise heavy. Anyway, there is no accounting for tastes, so this is why I would recommend to rather move the focus on finding a good balance between uniformity of style, some reasonable stability over time, as well as developers feeling globally comfortable with the conventions adopted.

@ppedrot
Copy link
Member

ppedrot commented Aug 28, 2016

Equality: I did not know. If I understand correctly, this means that OCaml does not implement some form of "type class" selection of the appropriate equality when the type is known? [I thought it did it for float, already?] In any case, constraining oneself to choose the equality by hand is somehow a burden, and you have to know that you have to do it, but I guess the issue is well-known from OCaml developers.

It does indeed for base types such as integers or constant constructors, but it is fragile in the sense that it is easy to loose monomorphisation when reordering code (e.g. by inadvertently creating a let-binding that is polymorphic). Furthermore, as Maxime said, it breaks parametricity. People writing code such as

if e = Map.empty then foo

instead of

if Map.is_empty e then foo

probably deserve to die in a horrible well-deserved suffering (and I did see such instances in the Coq codebase). Structural equality should never be used on an abstract type because it may not be the relevant one, and nothing in the type-checker makes this robust to code change. So until we get modular implicits, I think we should discourage the use of Pervasives.(=).

@herbelin
Copy link
Member Author

Thanks for the explanation, I made an update.

@gares
Copy link
Member

gares commented Aug 29, 2016

Would

if Map.(e = empty) then

be supported?

I perfectly know the reasons why = is bad in OCaml, but I find the infix syntax way better.
The Map.() scope could encompass a infix = that does the right thing...

@gares
Copy link
Member

gares commented Aug 29, 2016

The real question is, to me, what would be easier to port to whatever form of correct = OCaml will provide. To me, Map.(x = y) would require just the erasure of Map, and is both readable today and will stay readable later.

@ppedrot
Copy link
Member

ppedrot commented Aug 29, 2016

We could do this indeed. Note that I believe it is still better to use Map.is_empty than equality on the underlying empty set, because it is semantically more precise, and (in general) algorithmically more efficient.

As of today, there is still a problem though, which is that it is easy to shoot yourself in the foot because Pervasives.(=) is in scope by default and has the right type, so the compiler won't help whenever you're doing it wrong...

@herbelin
Copy link
Member Author

@gares: I don't know the opinion of Matthieu and others, but if you are ready to implement a uniform naming scheme of ML files, together with a sed script to allow an easy migration, this would probably be a good way not only to rely on a good basis for further extensions, but also an easier path for explaining the code.

@herbelin herbelin force-pushed the v8.6+style-guidelines branch 2 times, most recently from 8087682 to 8c3272c Compare September 1, 2016 08:31
@ejgallego
Copy link
Member

ejgallego commented Sep 8, 2016

A minor comment I have is how does the recommended coding style interact with popular Ocaml development tools such as ocp-indent or tuareg-mode. IMHO Coq should either match the default style of such tools or provide instructions how to configure them.

Another issue on style in the current codebase is the presence of tabs. IMO tab use should be deprecated and even cleaned up.

Starting listing some recommendations in using the API.
@herbelin
Copy link
Member Author

herbelin commented Sep 9, 2016

@ejgallego: when Jean-Christophe started Coq V7, he was using the default configuration of tuareg with the only modification "(setq tuareg-in-indent 2)". If my memories are correct, the default configuration of tuareg evolved, and, as a consequences, new indentation styles arrived in the code! The document lists a few settings of tuareg which were relevant in 2008. I don't know if it they are still relevant for current tuareg. If you could provide an up-to-date list of settings to apply to tuareg or ocp, I guess that that would be useful.

About TAB, I guess they were inserted automatically by tuareg. Your point is that they may cause trouble with diff tools, or with early versions of merlin, etc. is that correct? Personally, I don't care if someone wants to replace all TABS with sequences of spaces, as we removed all spaces preceding newlines at some time a few years ago. Additionally, we should also configurate our tools to avoid they come back inadvertently.

@maximedenes
Copy link
Member

maximedenes commented Sep 20, 2016

I think a major point would be to write self documenting code. For instance, by using records instead of tuples. No function should take an argument of type bool * bool * bool. Likewise, arguments of type int or bool should carry a label. Avoid e.g. a function of type int -> int -> constr -> constr.

In general, the type of functions should give a rough idea of the flow of data manipulated by the function. The other day, while looking for a bug, I saw for the first time:

set_var_scope : Loc.t -> Names.Id.Map.key -> bool -> intern_env -> (bool ref *
(Notation_term.tmp_scope_name option * Notation_term.tmp_scope_name list) option ref * Notation_term.notation_var_internalization_type) Names.Id.Map.t -> unit

A map that contains refs that contain a ref on a boolean, a ref on an option of an option of something and a list of the same thing. I can't get even a vague idea of what this is here for. In practice, I resigned and moved on to the next bug.

@maximedenes
Copy link
Member

Maybe I should mention that I'm currently preparing a PR that turns some tuples into records, so that we don't do the work twice.

@ejgallego
Copy link
Member

@herbelin In my limited experience the default tuareg+merlin+ocp setup seems fairly close to the current style, and won't insert tabs anymore. Given history, it could make sense indeed to recommend tuareg+merlin+ocp defaults then.

@maximedenes
Copy link
Member

Also, generic union types (Interface.union, CSig.union and Util.union) should generally be avoided because they don't document what the alternatives mean.

@herbelin
Copy link
Member Author

@maximedenes, @ejgallego: all this looks very reasonable methodology to me!

instantiated on a particular type (e.g. use List.mem_f,
List.assoc_f, rather than List.mem, List.assoc, etc, unless it is
absolutely clear that "=" will implement the intended equality, and
with the right complexity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, using Pervasives.(=) is a good way to express the intent to use polymorphic = after careful consideration. (Another option would be for Coq to export it under another name.) It is also robust to the technique of using let (=) = () at the beginning of a file to make sure to find all potentially-dangerous uses of polymorphic equality, turned into compilation failures -- I think that @ppedrot used it in the past.

- Common OCaml pitfalls
- in "match ... with Case1 -> try ... with ... -> ... | Case2 -> ...", or in
"match ... with Case1 -> match ... with SubCase -> ... | Case2 -> ...", or in
parentheses are needed around the "try" and the inner "match"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: using begin .. end instead of parentheses for nested matches is a good way to have a style close to Coq's grammar (which is superior to OCaml's in this case), especially if begin match are kept together.

begin match ...
| ...
end

@gasche
Copy link
Contributor

gasche commented Sep 26, 2016

ocp-indent is a well-designed tool for OCaml indentation, that I would trust more than Tuareg in the long run. It also has the advantage of being independent from a specific editor, which would make it much better for eg. Vim users (or Kate or whatever). ocp-indent configuration can be distributed as file for people to use, and many Emacs users start to rely on it instead of Tuareg for indentation, so the change could be made without loss of convenience.

(Whether or not to update the existing codebase to harmonize the indentation style is another choice.)

@Zimmi48
Copy link
Member

Zimmi48 commented Sep 26, 2016

Just my 2 cents: In the Elm community, people are going crazy about a tool called elm-format, which enforces very strict formatting guidelines. It does not leave much choice to the developer and is applied at file saving so what people really like about it is: 1) not having to even think about formatting; 2) clean diffs. I don't know much the various tools that exist for OCaml so I have no idea how much coding style could be automatically enforced this way but that could be something to aim at.

@mattam82
Copy link
Member

@gashe that sounds great! I guess if we set some rules we should start by implementing them on the existing codebase ourselves no?

@gasche
Copy link
Contributor

gasche commented Sep 28, 2016

So I just tried to see whether I could come up with a set of ocp-indent parameter that would match some of Coq's source files, but I have not been able to find a file that is consistently indented. So I think you (the devs) should just decide on whatever style you prefer, and you could apply it going forward -- or do a Great Reindent after the release.

For reference, the various knobs provided by ocp-indent are explained in its own configuration file:

https://github.com/OCamlPro/ocp-indent/blob/master/.ocp-indent

(We could easily implement more if required.)

@ejgallego
Copy link
Member

Thats very useful, thanks @gasche . Indeed there seems to be many different indentation styles on the codebase. IMVHO, I would agree on a common ocp-indent set of rules and suggest a soft policy on indentation. By the way, this will be discussed in today's Coq WG (sorry for the late notice) see coq-dev for details.

@maximedenes
Copy link
Member

This PR seems to have unrelated commits. @herbelin can you filter them out? Could we merge it? We can still add more info on the recommended style later.

@maximedenes maximedenes added the needs: fixing The proposed code change is broken. label Mar 14, 2017
@herbelin herbelin changed the base branch from trunk to v8.6 March 14, 2017 21:10
@herbelin
Copy link
Member Author

Rebase done wrt v8.6, but cherry-picking or rebasing to trunk should be immediate.

Note that there were comments on using ocp, but I don't feel able to write on that so I would prefer someone else to do something about this. So, as far as I'm concerned, I consider I'm done about what I could do.

@maximedenes
Copy link
Member

Merged in trunk, thanks a lot!

@psteckler
Copy link
Contributor

Post-merge comment, maybe it can be added in another PR. This came up in some code awhile ago, IIRC. Would a good style rule be, don't rebind the same identifier, that is, don't write code like:

let x = something in
let x = something_else in
...

because if you're looking at x further down, it may not be clear which binding was used, especially if there's a lot of code between the two bindings.

@maximedenes
Copy link
Member

maximedenes commented Mar 15, 2017

@psteckler doesn't your IDE tell you that?

In fact I think it is often a very good practice to do it, when the two x are related and you want to make sure only the last copy is used afterwards. Typically:

let evd = .... in
let evd = .... in
...

I would even strongly suggest to never use the other style in these cases:

let evd = .... in
let evd' = .... in
...
let evd'''''' = .... in
...
`` 

@ejgallego
Copy link
Member

Indeed it is sometimes useful to alias bindings, however, I am not sure if this is a good idea at the module toplevel, which the codebase does quite often. I find it confusing when reading code to have to versions of function f being called (in fact I usually need to use the IDE to know which one is used).

@psteckler
Copy link
Contributor

Sure, Merlin tells me via C-c C-l where a binding is, but I think it's better to use distinct names to reflect the change of "state", to make it obvious. You can use numbers instead of 9 primes. :-)

Also, if you have several rebindings, and you remove one in the middle, using distinct names will generate an error, which you might miss otherwise.

@Zimmi48 Zimmi48 added this to Done in User documentation Jun 13, 2017
@Zimmi48 Zimmi48 added kind: documentation Additions or improvement to documentation. and removed needs: fixing The proposed code change is broken. labels Jul 17, 2017
@Zimmi48 Zimmi48 removed this from Done in User documentation Oct 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: documentation Additions or improvement to documentation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants