Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chapel standard module style guide (collecting input) #6698

Open
mppf opened this issue Jul 14, 2017 · 38 comments
Open

Chapel standard module style guide (collecting input) #6698

mppf opened this issue Jul 14, 2017 · 38 comments

Comments

@mppf
Copy link
Member

mppf commented Jul 14, 2017

This issue is meant to collect:

  • suggestions for style to follow in the Chapel standard and internal modules, especially in user-facing APIs
  • known API names that "stick out" for some reason or other
  • other known API issues

Having a style guide and adjusting the user-facing Chapel APIs to adhere to it addresses a language stability concern. There are two reasons for that:

  1. The style guide will change rarely, once it exists. Or at least rarely go back on a decision in it.
  2. APIs that do not match the style guide will need to change before we could claim language stability.

The style guide would simply represent some best practices for API design in the context of Chapel modules. Of course there will be situations in which there is a good reason to make an exception to it.

Please add any requests for the style guide or API issues that "stick out" in the comments on this issue.

@mppf
Copy link
Member Author

mppf commented Jul 14, 2017

Michael's input for the style guide

Single Namespace

Style guide should note many of these decisions are there to aid developers in working with the fact that Chapel is single-namespace language.

CamelCase

Generally, prefer CamelCase or camelCase to separating_with_underscores.

Style of Identifier Names

  • Modules should be CamelCase, starting with an uppercase letter
  • Types should be CamelCase or lowercase. User-defined types should generally start with an uppercase letter and be CamelCase (e.g. RandomStream,Timer). Internal types can be lowercase (e.g. int, domain,bigint)
    ** ? Alternative option? Value types start with lower case, class types start with uppercase?
  • Types should be CamelCase for by-reference types (classes) and camelCase or camelcase for by-value types (records).
  • Methods should be camelCase and start with a lowercase letter
  • Variables should be camelCase or CamelCase.
    ** Q: Should constant variables or params have a different rule?

Method Names and Part-of-Speech

One of the nice things about a well-designed method names is that it's easy to mentally translate from a method call into a sentence. For example, myNode.accept(someThing) can be translated into "myNode, please accept someThing". In order for this to work, method names that represent an action should generally be a verb.

Field names can be nouns. Some methods are sortof accessors or similar to them. In that case I don't have a problem with using a noun. I think that student.height() fits into that category.

In the case of replicand, I don't have a problem with A.replicand because it's arguably accessing a property of A. What is some evidence that it's a "property" rather than an "action"?

  • it's accessing something fundamentally stored in the data structure (A replicated array might be viewed as consisting of replicands...)
  • It returns something that makes sense as the LHS of an assignment statement

In other words, if I was writing complete English sentences in my programming, A.replicand would be a noun phrase that I would presumably then want to do something else with using a verb. A.replicand.split() for an arbitrary example, has reasonable code-to-English properties.

Module Naming vs Type Naming

When creating a module that mainly exists to define a single type, how should one choose the type name and the module name?

I think we should use the convention of making the module name a more specific/spelled out version. Examples:

  • class Block is in module BlockDist
  • class BigInt can be in module BigInteger
  • class Barrier can be in module TaskBarrier

I think that this is a useful and reasonable convention because the module name seems to me to be more likely to be visible as introductory material:

  • at the top of a file, in a 'use' statement
  • in the module documentation pages

And, at the same time, I would expect the module name to appear fewer times in user code than the class name. Thus it is less troublesome for the module name to be longer.

(It is true that some users might opt to write something like BlockDist.Block everywhere; but such users are opting in to verbosity and there is an obvious way to reduce it).

@mppf
Copy link
Member Author

mppf commented Jul 14, 2017

Michael's list of things that "stick out"

  • domain.member should be named something verb-y like domain.contains
  • array vector operations like push_back contain an underscore but should be camelCase
  • push_back is available on List but is not camelCase

@ben-albrecht
Copy link
Member

Some patterns that might benefit from a stylistic recommendation

  • Should methods on an object vs. function taking an object follow a general pattern?
sort(A);
// vs.
A.sorted();
  • Should parentheses vs parentheses-less functions & methods follow a general pattern?
A.dims()
// vs.
A.shape

@lydia-duncan
Copy link
Member

I think the method versus function question falls under general programming practices (regardless of language with objects).

My recommendation for parentheses vs. parentheses-less methods is:

  • is it simple?
  • is it practically a field, but you don't want to store the value?

@nspark
Copy link
Contributor

nspark commented Jul 20, 2017

I have sometimes wanted consistency between using primary or secondary methods on classes (or records, or unions). For example,

// bar is a primary method
record Quux {
  ...
  proc foo(...) {...}
}

vs

// bar is a secondary method
record Quux { ... }
proc Quux.bar(...) {...}

Stylistically, my preference is for:

  • Primary methods for methods on a class defined in the same module; e.g., ZMQ and Socket.close
  • Secondary methods for methods on a class defined in another module; e.g., Regexp and string.search

However, secondary methods on classes defined in the same module are common; e.g., IO and file.check.

@buddha314
Copy link

In Java and Python, there is disambiguation on the package level, so scipy.spatial.distance.cosine differentiates from sklearn.metrics.pairwise.cosine_similarity at the distance vs pairwise level. How would that play in Chapel?

@LouisJenkinsCS
Copy link
Member

LouisJenkinsCS commented Jun 14, 2018

@mppf

For "Single Namespace", does that mean no nested modules under a parent module? Whether or not this is a 'single' namespace or not depends on how you define it. For example...

use parentModule.childModule;

Edit:

The intention is to create something like this

@mppf
Copy link
Member Author

mppf commented Jun 14, 2018

@LouisJenkinsCS - single namespace really means that attention should be paid to the names of top-level modules as they need to be reasonably unique. So in your example, literally parentModule is probably not a reasonable top-level module name, and neither is Test or Project but something more project-name-y like Hypergraphs or SuperIO is good. Nested modules don't have this issue and are reasonable to use and can have less unique names.

@ben-albrecht
Copy link
Member

known API names that "stick out" for some reason or other

I'd expect channel.isclosed() to be channel.isClosed()

@ben-albrecht
Copy link
Member

ben-albrecht commented Mar 11, 2019

known API names that "stick out" for some reason or other

A few others:

Many modules that exist as a thin wrapper around some C library will use naming schemes from the original C library, e.g. FFTW or Curl. It seems reasonable to leave these as is.

@bradcray
Copy link
Member

single namespace really means that attention should be paid to the names of top-level modules as they need to be reasonably unique.

By this definition, what's an example of a popular non-single-namespace language? I worry (as I think Louis did above) whenever someone uses this term out loud that what everyone hears (myself included) is that all Chapel identifiers are thrown into a single namespace with no hierarchy, scoping, etc. What I think the term has traditionally been intended to be used for w.r.t. Chapel is to say that we don't treat identifiers of one kind differently than identifiers of another kind (i.e., there isn't a collection of function names distinct from the collection of variable names).

@mppf
Copy link
Member Author

mppf commented May 17, 2019

By this definition, what's an example of a popular non-single-namespace language?

Certain interpreted languages manage "imports" by file path. In these languages, you can have multiple "top level modules" with the same name as long as you point to one or the other with the import. I'm not sure I have a great example at the ready but one example is that in Python you can manipulate the Python path just before (and also maybe after) doing an import.

Other examples (for interpreted languages) of reading code from a file:

  • source in bash
  • load in Ruby

What I think the term has traditionally been intended to be used for w.r.t. Chapel is to say that we don't treat identifiers of one kind differently than identifiers of another kind (i.e., there isn't a collection of function names distinct from the collection of variable names).

I agree that this is the proper definition of "single namespace". Originally, I used the term because we need a different identifier for a module name, a class, and a function. Anyway, this is a good clarification in this discussion.

@mppf
Copy link
Member Author

mppf commented May 23, 2019

Regarding the names of types:

Earlier I said this:

Types should be CamelCase or lowercase. User-defined types should generally start with an uppercase letter and be CamelCase (e.g. RandomStream,Timer). Internal types can be lowercase (e.g. int, domain,bigint)
** ? Alternative option? Value types start with lower case, class types start with uppercase?

@bradcray responded in #13030 (comment) :

In #6698, I think Michael suggested that a potential deciding point might be that value-based data types could be camelCase and reference-/class-based data types could be PascalCase, and I think that is an intriguing notion that has resonated pretty well with me over the past 12 hours or so.

I find the "User-defined" vs "Internal" types to be rather confusing (RandomStream and Timer come with the compiler, so aren't they thus "internal"?). I would prefer a rule that would allow Chapel programmers to emulate the style they see in the standard library, since that's probably what they will do in any case. So, if we choose "internal types are lower case" then I'd prefer that to apply only to a small number language-specified types that the compiler knows about (e.g. int, domain). No matter what, if we choose the "internal types are lower case" naming convention, we will need to be clearer about what counts as an "internal type".

I'd also consider having a rule that all types are UpperCamelCase, including Int, but this would involve changing a lot of code. But at least there wouldn't be a confusing distinction there.

Today we have domain, bigint, codepointIndex, string, Error, Timer, RandomStream, LinkedList, Barrier. The UpperCase ones include a mix of classes and records. And it's hard to argue that Error is not an "internal type". So it seems that breaking changes are ahead of us if we seek a consistent style.

@BryantLam
Copy link

BryantLam commented May 24, 2019

@bradc: By this definition, what's an example of a popular non-single-namespace language? ... What I think the term has traditionally been intended to be used for w.r.t. Chapel is to say that we don't treat identifiers of one kind differently than identifiers of another kind (i.e., there isn't a collection of function names distinct from the collection of variable names).

Rust actually makes a distinction between identifiers on modules and structs, etc. and identifiers on variables and functions because the former identifiers are unambiguously referenced with :: separators.

mod thing {
    pub fn doit() {
        println!("fn thing::doit()");
    }
}

fn thing() {
    println!("fn thing()");
}

// This global variable, however, does not work; `thing` is already used by `fn thing()`.
// static thing: &str = "variable";

fn main() {
    thing();
    thing::doit();
}

This code behaves as expected.

@BryantLam
Copy link

One source of (minor) confusion that I have coming from a C/C++ background is the specification of a function's arguments versus the cast operator. proc is used to denote a function definition, but C/C++ doesn't have a keyword for function.

Just a very minor confusion, but it does affect my ability to quickly scan through a piece of code.

proc doSomeIoOperation(x: string) { ... }
//                      ^------- type specification

proc main() {
    var x: string = readFromFile();
//       ^------- type specification

    doSomeIoOperation(x: string);
//                     ^------- cast operator

    proc doOtherIoOperation(x: string) { ... }
//  ^--- denotes procedure   ^------- type specification
}

Whenever I do a cast, I've taken to the habit of binding to the RHS side instead.

var a = 42;
var b = a :real;
writeln(b.type :string);

@bradcray
Copy link
Member

Building on what Michael said above and (I think?) agreeing with him:

Regarding the names of types:

  • I definitely don't like the idea of making all types InitialCaps and don't see much value in it other than "it's a simple rule". I'd be more likely to support making all types initialLower, but wouldn't consider myself a champion of that approach.
  • I agree that there's a bit of a fuzzy line between internal vs. non-internal types that makes it difficult to distinguish between them. I also do't think that it's that useful a distinction... if I create a really kick-ass type in a library that ends up becoming almost as if it's part of the language, what about it suggests that a user would want it to be syntactically distinguished?
  • After a few days thinking about it now, I continue to really like the "value types use initialLower and class types use InitialUpper" convention. It jibes with my instincts for what I'd want to call things like list and timer vs BaseArr and C for example, and provides a useful cue as to whether the type can be subclassed.

So if I ran the world, I'd say "records use initialLower and classes use InitialUpper," and then fall back to "all types use initialLower" if that wasn't palatable. I'd be curious if anyone strongly disagreed with this stance.

@gbtitus
Copy link
Member

gbtitus commented May 29, 2019

So Greg, I think you're saying you're not in favor of having a style convention for the standard modules at all, is that right?

Definitely not. My objection is specific to the proposal that value types be lowerCamelCase and class types UpperCamelCase. I am fine with, for example, a convention that type names are lowerCamelCase and module names are UpperCamelCase, because there the minor lexical difference is augmented by the syntactic cues due to usage. I would even be fine with a convention that type names were lowerCamelCase but class type names had a _class suffix, because such a large lexical difference seems reasonably proportionate to the semantic difference. Note that this latter is merely an example which I hope helps others understand my position, definitely not a proposal for consideration.

Follow-up: What if the convention for standard libraries was checked by the compiler?

👍 Yes, for sure if we have a convention we should have a checker for it.

@bradcray
Copy link
Member

because there the minor lexical difference is augmented by the syntactic cues due to usage.

This seems like splitting hairs to me. In some cases (particularly use M, there are syntactic cues, but in others, not. Is M.foo() a fully-qualified reference to a module-scope symbol (e.g., function or array) within M, a type method being called on a type named M, or a normal method being called on a variable named M?

if we have a convention we should have a checker for it.

If we have a checker for the convention, then does that eliminate the chances of willful or inadvertent surprise and put us back onto stable ground if we were to adopt a class vs. record naming convention?

@gbtitus
Copy link
Member

gbtitus commented May 29, 2019

If we [check the convention], then does that eliminate the chances of willful or inadvertent surprise ...

Sure, that sounds reasonable.

@dmk42
Copy link
Contributor

dmk42 commented May 30, 2019

Michael reminded me to kick in my 2¢.

To me, type inference makes a language easy to write but difficult to read. It takes me a long time to trace back through the code to find out what something is. Visual cues help to reduce that problem. For that reason, I prefer that classes begin with an upper case letter unless they are internal-enough types like string, and most other things begin with a lower case letter.
[Edit] Yup, I know string isn't a class.

It is probably well known that to me, camel case is an abomination.

@mppf
Copy link
Member Author

mppf commented Jun 4, 2019

It seems the idea of records/by value types use lowerCamelCase and classes/by reference types use UpperCamelCase has enough support provided that there is compiler enforcement of it within the standard/internal modules.

@BryantLam
Copy link

BryantLam commented Jun 5, 2019

It seems the idea of records/by value types use lowerCamelCase and classes/by reference types use UpperCamelCase has enough support provided that there is compiler enforcement of it within the standard/internal modules.

I'm for differentiating cases that should be differentiated, but I imagine this might be a (mild) concern when users begin to refactor codes.

  1. What's a generic type t? Is it t or T? There is no by-valueness if it can accept both classes and record.
  2. What if you wrote your code originally using by-value semantics and then later decided to turn it into classes? My own opinion is that a production library API would not rely on the record/class semantics for copies and would just implement its own clone() routine and possibly go so far as to turn the default copy operators into compiler errors. I would prefer if there was a way for library developers to always get the behavior they intend, but am concerned that it would be too inflexible to major code refactors.

As such, I think I'd advocate for types to always be UpperCamelCase. ... (Which conflicts with module naming conventions, but that's a different issue.) But this is just my opinion which could be wrong for large-scale production codes.

Some of these naming-convention distinctions would be resolved by a decent IDE.

@e-kayrakli
Copy link
Contributor

There were some discussions regarding factory functions that should be part of the style guide.

Main discussion is under #14291
There is a related discussion for the Random module in general: #14589

@ben-albrecht
Copy link
Member

ben-albrecht commented Sep 22, 2020

Should there be naming conventions for in-place vs. out-of-place operations?

The proposal to deprecate Random.permutation() (noun) for Random.permute() (verb) in #16400 made me wonder about this.

@mppf
Copy link
Member Author

mppf commented Mar 23, 2021

mppf added a commit that referenced this issue May 13, 2021
Add a draft for a standard module style guide

This PR adds a draft standard module style guide as an aid for the effort of stabilizing the standard modules.
Based in large part upon #6698.

Discussed with lots of people but final review is by @bradcray - thanks!
@mppf
Copy link
Member Author

mppf commented May 22, 2021

See also Property queries: paren-less or paren-ful? #17128

@lydia-duncan
Copy link
Member

Over on the review of #17936, Andy asked if we had a specific policy on the capitalization of initialisms and acronyms:

Should all letters in an initialism\acronym be capitalized or just the first? So if you had a snakecase function: render_html_dom(), should it be camelized as:

renderHtmlDom() or renderHTMLDOM().

Even when you don't have to initialisms back-to-back I still prefer the "only capitalize the first letter style". So renderHtmlPage() over renderHTMLPage()

@bradcray
Copy link
Member

I always wrestle with this question, though the variant I wrestle with is if I choose the renderHTML... style, I'm sometimes tempted to make the letter that follows lowercase (so, renderHTMLpage()). I realize that's not particularly defensible. But it generally makes me prefer the renderHtmlPage() style because it avoids the question.

That said, I feel like I'd want to see more cases in practice to decide whether I'd feel different in different situations (i.e., whether I think this should be a hard and fast rule for standard modules, or whether we'd want to let authors choose based on taste).

@mppf
Copy link
Member Author

mppf commented Jun 23, 2021

One way would be for the acronym to be uppercase unless another word follows it; so renderHTML but renderHtmlDom. In any case, I wouldn't want to see renderHTMLDOM.

@lydia-duncan
Copy link
Member

Over in #19012 (comment) Damian brought up a potential rule for operator-like names - that they be treated as a single word for the purpose of camelCasing. So for instance, divceil or addmul as opposed to divCeil or addMul. Thoughts?

@lydia-duncan
Copy link
Member

Over on #14291, we solidified our discussion on factory functions to the point where it can be included:

  • Factory functions should use create as their prefix
  • Generally, they have been standalone functions. We are leaving the possibility of type methods open, but don't have many instances of that being the right choice in practice
  • There are some factory functions that are allowed to differ. open and spawn are the main examples. We do not view this as contradicting the general policy and may accept new exceptions on a case-by-case basis.

@bradcray
Copy link
Member

of that being the right choice in practice

I might phrase this instead as "being sufficiently better/clearer in practice to garner a lot of support." That said, both David's seem to be more on the "prefer type method" side of the equation, so we may want to make sure we're not squashing voices by having louder ones assert something that a silent majority doesn't agree with.

There are some factory functions that are allowed to differ. open and spawn are the main examples

I wouldn't even call these factory functions. I'd describe this caset as "Some routines exist to do things and the result of doing that thing happens to create a new object." For example, I wouldn't say that numBits(type t) is a factory function for ints or that gethostname() is a factory function for strings, they're just routines that happens to generate an int or string as their result. So similarly, I don't think of open as being a factory function for files so much as a way to request that a file on the FS be opened, which gives you back a file object representing the result (where I'm stealing that idea from Michael who expressed it the last time we discussed this).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests