Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-3583. Loosen some rule restrictions of checkstyle #921

Merged
merged 3 commits into from Dec 10, 2023

Conversation

maobaolong
Copy link
Member

What changes were proposed in this pull request?

Change config file of checkstyle only, loosen some rule restrictions of checkstyle.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3583

How was this patch tested?

No need

@sodonnel
Copy link
Contributor

I am in favour of this change, but for a change like this, I feel we need a few more +1's and also wait to see if there are any objections. We will try to discuss it on the next community sync call, which is on Monday.

@maobaolong
Copy link
Member Author

@sodonnel Yeah it is necessary

@maobaolong
Copy link
Member Author

@sodonnel How about the result of the +1's int the community sync call?

@elek
Copy link
Member

elek commented May 19, 2020

@sodonnel How about the result of the +1's int the community sync call?

My fault, but it was not discussed during the call. We were out of time. Let me ping the mailing list.

@elek
Copy link
Member

elek commented May 19, 2020

BTW, I am +1, but let's wait a little for more feedback...

@adoroszlai
Copy link
Contributor

+1

@timmylicheng
Copy link
Contributor

+1. Thanks @maobaolong for taking action.

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maobaolong Thanks for working on this! The changes look good to me. +1.

Copy link
Member

@jiwq jiwq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@vivekratnavel
Copy link
Contributor

@maobaolong Thanks for working on this! +1 LGTM

@xiaoyuyao
Copy link
Contributor

Change Line length limit from 80 -> 120 is overall positive change for me, only once in a while I hit it with modern IDEs. A few things I can think of are multiple-window side-by-side will need to be adjusted
e.g.,

  • Three-way merging on IntelliJ with longer lines,
  • .md file editing with rendering window

@vivekratnavel
Copy link
Contributor

Should we wait for more approvals or is it good to be merged?

@elek
Copy link
Member

elek commented Jun 3, 2020

There is a related discussion on the mailing lists.

@fapifta had some concerns, but he said +0 and not -1. (He has different preference, but ready to accept other approach).

As any code change can be vetoed by any committers, we need clarification from @anuengineer.

@anuengineer can you accept this patch (+0) or it should be cancelled? (-1)

@anuengineer
Copy link
Contributor

I think it should be cancelled. Thanks for checking.

@nandakumar131 nandakumar131 added the build Pull request that modifies the build process label Jun 4, 2020
@maobaolong
Copy link
Member Author

There are still lots of contributor around me complain with the 80 character per line limitation.

So a month past, a vehemently discussion has taken place in the mail-list, i think everyone have many arguments, but everyone try to persuade other, but nobody can persuade other.

But we still need a decision, and a reason why, both do nothing and do this change. can be a choice. I think we don't make decision means choose keep 80, it isn't a reasonable result for people who want to change. Now, i found there are 3 choices from the discussion.

  1. keep 80
  2. change to 120
  3. change to 100

It seems we have no better idea than vote, and everyone is equal, i don't think it's a good idea of giving more weight of people who did special contribution for Ozone. For example, founder, member, committer, contributor. In fact, it is too complex, the simple way is everyone is equal, at least voter should be a contributor, majority can be accepted.

Finally, my choice is change to 120, or at least 100, so, i want to vote to 2, and 3. I don't know why if we have to keep consistent with the first great people who write the first line of one repository, in fact, i saw a repository was changed from one language into another language, i also saw a repository was changed from gradle to maven, that is what i want to say, keep change, change for great.

I write this reply here because i'm not sure all contributors in the ozone-dev mail list, but i believe all the contributor can see this page, we can send this link to mail-list, slack channel or any others to let people vote for this decision. If someone have a better idea than vote, i'm very glad to see any other idea about how to finish this proposal other than vote.

@sodonnel
Copy link
Contributor

There are still lots of contributor around me complain with the 80 character per line limitation.

So a month past, a vehemently discussion has taken place in the mail-list, i think everyone have many arguments, but everyone try to persuade other, but nobody can persuade other.

This is the problem with a change like this. Its a debate where there is no "correct" answer. Some like it one way and some like it another. However that is not a reason to not change - as you said, software is always changing and in my opinion the style can change too. We can see that as the Java language has evolved over the versions, software gets refactored etc.

It seems we have no better idea than vote, and everyone is equal, i don't think it's a good idea of giving more weight of people who did special contribution for Ozone.

I believe a vote is the best way to get an idea of what the community wants. Let the people who are actively involved in the project give their opinion.

Its probably best to do this like a normal release or merge vote. Post it to the mailing list, and Slack and give it 7 days for responses. I'd be inclined to with just two options 80 or 120, but add in the 100 option too if you like.

I'd be happy with 100 or 120. Perhaps going for the middle ground (100) would be more sympathetic to the existing code style as the lines would not be too much longer.

@vivekratnavel
Copy link
Contributor

@maobaolong @sodonnel I agree with you both. I like the approach of starting a vote thread similar to a normal release or merge vote and give it 7 days for responses. It will be easy to keep track of votes if we start a vote thread in the mailing list and share the link here and in slack channel.

@xiaoyuyao
Copy link
Contributor

Thanks all for the discussion, this change can be arguable from both sides. If I understand correctly, @anuengineer indicated a valid binding veto here. According to the project bylaw, a vote will not be helpful. The best shot is to convince him to withdraw, otherwise let us cancel the PR and move on with other important issues.

Bylaw of the project https://hadoop.apache.org/bylaws.html
"A valid, binding veto cannot be overruled. If a veto is cast, it must be accompanied by a valid reason explaining the reasons for the veto. The validity of a veto, if challenged, can be confirmed by anyone who has a binding vote. This does not necessarily signify agreement with the veto - merely that the veto is valid.

If you disagree with a valid veto, you must lobby the person casting the veto to withdraw their veto. If a veto is not withdrawn, any action that has been vetoed must be reversed in a timely manner."

@anuengineer
Copy link
Contributor

Veto is cast, it must be accompanied by a valid reason explaining the reasons for the veto.
I am casting a veto on this change, it is not because I am one the first authors of the project, Far from it.

My reason is simple, https://accessibility.digital.gov/visual-design/typography/ argues very clearly too keep them under 75 chars per line. I want to make sure that code base usable for people with visual disability. If you have research or authority which proves that this is not the case, I am willing to lift the veto.

In the absence of information -- that an argument like this is probably for text and not for code -- is absence of information, I don't have any data to modify current stance. So currently this veto stands. I am more than willing to listen to your arguments and respectfully think about it. In fact, my veto is based on reading and researching on this topic. Not based on the fact that I am the first person to write code for Ozone.

@elek
Copy link
Member

elek commented Jun 15, 2020

My reason is simple, https://accessibility.digital.gov/visual-design/typography/ argues very clearly too keep them under 75 chars per line.

According to this study 95 character is better (based on the text sis more convincing as the mentioned article, as it's a reproducible study).

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.503.2885&rep=rep1&type=pdf

@elek
Copy link
Member

elek commented Jun 15, 2020

Actually I think the question is slightly different (based on the conversation from the email list)

Let's say that we have an ideal line length (75 or 95 or 120, I don't know what). Do we need to enforce it in the source representation? Or we can enable for anybody to choose it based on own preference.

Using higher (or unlimited) number makes it possible to use both 120 or 80 or any other number based on the preference as the soft-wrap function of IDEs and an editors are very powerful today (tested IntelliJ)

Originally I had no preference but I convinced that using a high number is more inclusive.

@adoroszlai
Copy link
Contributor

adoroszlai commented Jun 15, 2020

  1. https://accessibility.digital.gov/ is about accessible websites (layout, navigation, etc.). It does not talk about making the source of these websites accessible.

  2. Even if we accept the specific line length recommendation as applicable to our sources, why should it apply only to Java code? Why not, most importantly, to our Markdown docs, which have much longer lines than 80 chars? What about Javascript, CSS, bash scripts, POM files, etc?

  3. What about log files and configuration? Those affect a (hopefully) much wider audience.

@sodonnel
Copy link
Contributor

https://accessibility.digital.gov/ is about accessible websites (layout, navigation, etc.). It does not talk about making the source of these websites accessible.

Exactly. These recommendations are not about reading source code, which is read in a very different way to text. The above link provides no evidence about the ease of reading source code.

With what @elek has tried related to "soft-wrap function of IDEs" I feel code with longer line lengths can still be inclusive. I am even open to removing the line length restriction entirely and handling it at review time, but I worry about the reviews getting bogged down in comments about reducing line lengths on excessively long lines.

The reason I suggested having a vote on the preference, is so we can fairly access what the community desires. If the majority of people want to extend the line length, then for the sake of the community, I feel we would be better to accept the consensuses rather than veto the decision, and I would hope Anu would take onboard the direction the community desires over his own preference.

@fapifta
Copy link
Contributor

fapifta commented Jun 16, 2020

We are currently the part of the Hadoop project, which has the 80 characters as the limit, we I think should change the rule - if we change it - after we become TLP as that is in the making. Also this seems to be a good time to write our guidelines (yes, I loved to read the Flink guidelines Marton posted on the mailing list), that talks about coding style, wrapping, line length, possible exceptions and all the things that we think is important regarding code. First it should not be large, but evolve over time as we have discussions and learning points on the go, but it would be useful to contain all the prior learning points we already have on the project.

Even though I don't want to block the community from the change, I still think we should keep 80-ish as a limit. I wrote about consistency, I wrote about typography, and I wrote about a couple other things in the mail-thread. Also I questioned why not go unlimited then based on the first arguments about inclusiveness and naming problems cited by 120 limit supporters.

Now I just would like to add some colour from a few guys seem way smarter then me...
Kevlin Henney is dealing with this topic in a talk "Seven ineffective habits of many programmers" in several conferences.
In this talk: https://vimeo.com/97329157
At 14:38 he starts talking about spacing punctuation and indentation, and also gives some reason for 80 characters.
After this section there comes the naming, and code structuring piece, and before this section he talks about the problems of verbosity and jokes around comments :)
It is on one hand really convincing for me, also gives some interesting points we can use in the conversation about coding guidelines. If you have time to view the whole, the latter part talks about encapsulation, get/set and the problems with the wide use of'em, and tests at the end.

Dan North at 20:42 in this video (https://youtu.be/4Y0tOi7QWqM?t=1242) talks about "code that fits into one's head", and mentions why contextual consistency matters, and even though he talks about the way larger picture, it is true for source code as well, also he talks about teams have to be opinionated about how they do things, and how they organize things, and that makes me think that, we need general acceptable guidelines on many visual/architectural/algorithmical things that we might not agree on now, but we would like to have consistency on.

@anuengineer
Copy link
Contributor

anuengineer commented Jun 23, 2020

I feel we would be better to accept the consensuses rather than veto the decision, and I would hope Anu would take onboard the direction the community desires over his own preference.

I have stated a logical reason. None of you seem to discuss that. So let me ask this again. Do you know how a visually challenged person operates? Have you used the tools? I have not, and I have rely on someone like Accessibility.gov. Your argument is that it is about text, which part of the source code does not seem to be text? why is that recommendation not applicable? None of you are willing to produce information about why your preference is better, other than it is "my preference".

I have repeatedly said in this conversation that if you are willing to produce data about accessible software use, I am willing to listen. If anything, I think this is a group of people trying to force the new preference without anything other than "ad hominem" attacks and innuendo.

Let us get the fact straight. The https://accessibility.digital.gov/ talks about recommendations to accommodate certain people who have handicaps; you have a wide monitor, you find that it is better to put the burden on them; since you would love to enjoy your wide monitor. I get that; but to accuse me to forcing my preference, when I have pointed out a valid logical objection and point out that we must be inclusive, where do you get the idea that I am forcing the community to be inline with my own preference. So yes, my VETO absolutely stands.

The above link provides no evidence about the ease of reading source code.

I agree, but let me ask you this; how did you come to the conclusion that this is NOT applicable to the source code at all? What is the information used by you to come to that conclusion? That text does not have the word source code, and therefore source is some sort of magical thing? or are you under the impression that visually challenged people does not read source code and we don't need to care about them?

@anuengineer anuengineer closed this Dec 8, 2023
@anuengineer
Copy link
Contributor

The question is whether we as a community listen to the preferences of a plurality of our contributors or not?

I am far more than willing to listen. I am basically saying that it is NOT a preference of mine at all. I studied this issue; and I am asking all people part of this issue to research this. I am NOT saying this is the way it should be. I am asking for a technical study which justifies any of the other positions. Yes, "You" as Individual must have debated that 15 years ago, I am saying that there are significant studies on this topic, so that we can leverage those leanings.

Honestly, It feels like I am not able to debate this on the merits of the case -- NO ONE seems to be interested in researching this -- but the refrain is this is what the majority wants. My veto is NOT against the Majority -- I love democracy, was born in the largest one in the world, and now I live in the oldest democracy. This is NOT a democratic question -- This is simply me asking -- what is the rationale for this decision ? Have we considered all this information that is available. I would love to have a deep discussion on each of these points, and I have a set of questions, which I had promised I will share with @fapifta (it is taking time, but I promise I will get to that). So once more, let us have a discussion about this issue -- and as I pointed out, it is NOT based on my preferences or yours -- but based on extensive amount of studies that are already in place about this. Just saying something is Nonsense without producing data to support your claim -- is not an effective way to have a discussion.

**Let me repeat: ** Let us have a sincere discussion about this issue. My veto was cast based on my research on this topic. I am more than willing to lift my veto, if I am able to see information that justifies other positions. This is NOT a dogmatic issue for me. I am just doing the right thing for all of us.

@anuengineer
Copy link
Contributor

... For technical decisions, only the votes of active committers are binding. ...

According to the bylaws quoted above, only the votes of active committers are binding. @anuengineer , sorry that you are not an active committer as pointed out in this comment.

Moreover, the validity of the veto was challenged and confirmed.

I am an active committer. I am not sure where you go the idea that I am not.

@anuengineer
Copy link
Contributor

anuengineer commented Dec 8, 2023

... For technical decisions, only the votes of active committers are binding. ...

According to the bylaws quoted above, only the votes of active committers are binding. @anuengineer , sorry that you are not an active committer as pointed out in this comment.

Moreover, the validity of the veto was challenged and confirmed.

@arp7 This is a classical example of an attack against the person. We just agreed to discuss the issue. What is @szetszwo point here. Let us shoot the messenger, instead of debating the issue. I think this is a bad slippery slope that will kill this community. I really think we should all be vigilant against NOT having healthy debates, but use arguments like You are not a part of US, because you have a different opinion. If I count the contributions to ozone, I know I have made far more contributions to ozone than @szetszwo. I am humbly requesting us to maintain a discourse that is respectful to each other.

@anuengineer
Copy link
Contributor

Please also note the history of discourse so far in this round:

  1. This is Nonsense.
  2. Anu is NOT a committer.
  3. I wrote extensive document with reasoning on where I am getting data from.
  4. Let us go to the Dev vote route to overrule this Veto.
  5. Again, Anu is not an Active committer
  6. Anu asked @szetszwo to stop attacking the person, but the idea, Anu clearly indicating that he is NOT offended by the attack of people than ideas, since Anu is familiar with how Nicholas argues in Apache Communities.
  7. Anu and Arpit find that the comments have crossed a point of cordiality and both of them agree to back off.
  8. Nicholas again makes the new argument -- that Anu is NOT a committer.

What is missing ?

NO DISCUSSION about the merits of the technical argument, and the long document that I wrote based on my research. I am flagging this pattern with the hope that it will allow us to debate this issue with research and references instead of just trying to destroy free speech and expression.

@steveloughran
Copy link
Contributor

steveloughran commented Dec 8, 2023

Hadoop stayed on 80 for a long time not so much for punched-card development (that's where it came from!) but because it was easier to do side-by-side reviews with the .diff viewer plugin for chrome.

we moved to 100 chars a few years back and I have no regrets. It helps us stay with readable code when using builder APIs:

public interface FutureDataInputStreamBuilder
    extends FSBuilder<CompletableFuture<FSDataInputStream>, FutureDataInputStreamBuilder> {

Here the types of arguments/subclasses are over 80 chars wide, splitting this stuff makes it harder to parse.

And, just like the limits on method length and #of arguments to a method or constructor, it is only a hint.
Which is good when you are writing methods which take builders from the AWS SDK as arguments:

  private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> void configureEndpointAndRegion(
      BuilderT builder, S3ClientCreationParameters parameters, Configuration conf) {

Although wider lines may appear to go back on the "side by side" reviewing problem, it is considered moot because

  • displays have got wider
  • github is a better review tool which does word wrap
  • vs code and intellij are great for reviews too.

Adopting a wider column count should not be an excuse to overboard with functional programming code to show off how you learned Haskell, Standard ML or OCaml

getOptionalKeys().stream().filter(!key -> key.startsWith(headerPrefix) && key.length() > prefixLen).forEach(key -> headers.put(key.substring(prefixLen), options.get(key)));

why so? makes debugging and tracking down where an NPE came from wrong. A single clause per line is much better as you can set breakpoints, isolate stack traces etc.

    getOptionalKeys().stream()
        .filter(key -> key.startsWith(headerPrefix) && key.length() > prefixLen)
        .forEach(key -> headers.put(key.substring(prefixLen), options.get(key)));

Accordingly, I'd worry less about line length and more about modern java code issues

  • best practices for lambda expressions
  • effective use of optional<>, java17 type inference, ...
  • how to design for shading, e.g. making sure you never return any type from someone else's library in your APIs in case you want to move to shaded libraries, change the implementation etc.
  • how to design for maintenance. We've been maintaining hadoop code for 15+ years. What could we have done better?
  • design for testability.
  • should we embrace runtime exceptions so we can use java streaming APIs, or do we have to reimplement our own functional libraries. It's too late to change old code, hence org.apache.hadoop.util.functional, but new APIs?
    For the last 2 I've really embraced interface/implementation rather than things which require mockito abuse for changing behaviour. Hadoop avoided interfaces in the past because its impossible to add new methods without breaking external code. but with java8 and the ability to define default implementations, that changed.

So here's my proposal

  1. stop worrying about line length. Change the rule, but even if not, +1 patches which break it if it makes the code clearer
  2. come and discuss on hadoop-common how we could do a style guide for the new world, based on others' work
  3. look at the style rules of apache Scala projects, so we can learn from them especially on type inference (all args/return values must declare types)
  4. embrace AssertJ if you haven't already.

if you can't get consensus, move this out of github issues into mailing lists. And focus on what is best for the readability and maintainability of modern code, not stuff we wrote for java 5.

(BTW: all those code snippets are from hadoop trunk, not all my work)

@steveloughran
Copy link
Contributor

oh, and as hadoop went to 100 chars in 2021, if you copy and paste in anything from beyond that time, you either have to reformat or expect complaints.

apache/hadoop#3224

@anuengineer
Copy link
Contributor

anuengineer commented Dec 8, 2023

Thank you @steveloughran. You have always brought a fresh breath to arguments, and you are taking this argument in the right direction. This is what I have been asking..

Hadoop stayed on 80 for a long time not so much for punched-card development (that's where it came from!) but because it was easier to do side-by-side reviews with the .diff viewer plugin for chrome.

I have heard the similar story for a long time, and I have always believed this to be true; until I started reading the
papers behind the cognition. In the same spirit that you have offered to share your learning and understanding, I want to share what I have been learning.

  1. Why do we keep reading most material in the world, even in the digital world -- ePub, PDF etc. in the 80 column mark ? The human minds have been trained for a long long time to keep it around that number. Moving it by 20 chars -- I personally believe should NOT impact us by much. However, each time I read the general literature, I will share those links in a separate post, so it is easier for you to read, keeps on arguing that best comprehension for human minds are under 75 chars.
  2. Look at the new generation of Text based tools we have: ChatGPT, Bard etc. etc. They all seem to follow the same method of comprehension.
  3. The new Java: If you go by the comprehension arguments. The use of elements like var, allows us to write more efficiently in less verbose way. So lambda, vars, all the features of new family of Java would allow all of us express far more ideas in lesser space. This leads to the question, is it a good idea to extend the length of the line then ? Given, we might move to Java 21 (Current LTS) at some point in time. There are some classic papers on Python's Comprehensions, and how the concise expression effects understand and reading of code. Again, I will share each of these papers with you so you can clearly see that I am not just proposing my personal view; but I am saying this is the current state of the research on this topic/

we moved to 100 chars a few years back and I have no regrets. It helps us stay with readable code when using builder APIs:

public interface FutureDataInputStreamBuilder
    extends FSBuilder<CompletableFuture<FSDataInputStream>, FutureDataInputStreamBuilder> {

This is actually a great input. The question I would ask is that is there more density of information now per line, and what is the way that people read and understand.

Here the types of arguments/subclasses are over 80 chars wide, splitting this stuff makes it harder to parse.

And, just like the limits on method length and #of arguments to a method or constructor, it is only a hint. Which is good when you are writing methods which take builders from the AWS SDK as arguments:

  private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> void configureEndpointAndRegion(
      BuilderT builder, S3ClientCreationParameters parameters, Configuration conf) {

Although wider lines may appear to go back on the "side by side" reviewing problem, it is considered moot because

* displays have got wider

* github is a better review tool which does word wrap

* vs code and intellij are great for reviews too.

This is a major concern. When I am on a laptop, I find hard to review. The language of choice for me these days is Rust, and they do use 100 chars. I am familiar with this. What I do on a regular basis is that is I use 2 monitors when I do code reviews. To avoid these kind of wrapping.

Adopting a wider column count should not be an excuse to overboard with functional programming code to show off how you learned Haskell, Standard ML or OCaml

This. 100 times this. Java is rapidly getting these features, and I love that. Don't get me wrong. The information
density of each character of Java is exploding in the last 5 / 6 releases. Given that, would it not be far better
to actually move to better versions of java to address this issue -- Which we have to at some point in time anyway ?

Why is that -- in my mind -- really valuable and a far better solution NOT being discussed here ?

Another Logical Corollary to this point.

This argument always seems to start from a point, ** I have larger monitor**, followed by the ** I don't believe human beings reading studies are relevant in understanding code**. So let us make this 100 chars as the limit.

What surprises me is that they are NOT able to see the inconsistency of the ask, let me illustrate that here.

Let us assume that every one in the world has a 1080p screen and 21 inches is the minimum size of a monitor (I am fully aware that many of you might be coding on Macs with 4K), but humor me here.

The first question we want to ask is: How many chars can we effectively put on a given line ?

Pixels Per Inch (or PPI) is defined as number of pixels per inch on your monitor.
There are [220 PPI] (https://www.calculatorsoup.com/calculators/technology/ppi-calculator.php) and if you assume a character size 10 pixels wide

Characters per Line= Screen Width in Inches × (PPI​ / Character Width in Pixels)
Characters per Line = 21× (220 / 10) = 462 characters per line.

So the first question in my mind is always, why are you folks NOT asking for 462 characters per line.

The argument that I have a large screen implies I need more characters seems to fail, due to this inconsistency in the ask.
I realize that the people who are making this ask are exceptionally gifted programmers (I am looking at you @steveloughran) and are in no way incapable of understanding this logical issue.

So what they are really asking me is this --

Hi, I am getting irritated with Java 8 kind of code, and then checkstyle ( which I never run locally on my machine, fails in the Github run), please make it fail less.

But my humble point is that you are just responding to your anger, and not realizing the greater good this rule was designed for. That is to prevent information overload.

We have studied extensively the **"Saccades" : One of the important metric used in understanding how people read code and understand code -- is measured by a notion called "Saccades" or "Saccades Per Line" or how we read to understand. The paper indicates that experts make larger jumps (saccades) when reading code, focusing on important elements, while Novices tend to read code more linearly. This implies that fewer, but more targeted, saccades per line might be characteristic of expert reading behavior, and Novices read very differently, and as the information density increases, only people deeply familiar with the domain and code will be able to read and understand it.

As a language gains more capability to express ideas in far fewer characters, we have to be cognizant of that. I know that @arp7 used to love Perl. He used to be able to write it well. Often, I struggled to read Perl due to the information density shenanigans involved by the masters of Perl (for all the new kids, this is a comment that applies to few older people in this discourse, it is a waste of your time to look up the Magic of Perl.)

This leads me to conclude, that real argument that my friends are trying to make is NOT about the monitor size -- but we are irritated that checkstyle fails when we are about to commit.

Perhaps, that is also the reason that Nicholas wants to argue that I have not been checking in and I don't suffer the same pain (which is NOT true, I have been more irritated by these tools that anything. Ask @awk -- I have had many fights with him. )

So let us bring this conversation back into a normal -- let us ask ourselves which pain path is better ? Moving the checkstyle to 100 chars -- which I believe is a path to reduce the pain for a little while, but suffer the consequences of the fear that you are voicing "go overboard with functional programming code to show off how you learned Haskell, Standard ML or OCaml". I guarantee you that any Java programmer who starts coding from Java 17 has far more information density that many of us who started far earlier.

In that case, is the solution truly to take the pain for a little while and get Ozone working with Java 21 LTS or Java 17 LTS ?

getOptionalKeys().stream().filter(!key -> key.startsWith(headerPrefix) && key.length() > prefixLen).forEach(key -> headers.put(key.substring(prefixLen), options.get(key)));

I strongly feel that this complicated. Especially due to the reactive crowd of Java, where they taught everyone it is so cool to return this in each function call, and then chain the calls. Most people I know, would have broken this line into multiple lines, and perhaps (I think, a personal opinion) that it is better to read this as different lines.

why so? makes debugging and tracking down where an NPE came from wrong. A single clause per line is much better as you can set breakpoints, isolate stack traces etc.

    getOptionalKeys().stream()
        .filter(key -> key.startsWith(headerPrefix) && key.length() > prefixLen)
        .forEach(key -> headers.put(key.substring(prefixLen), options.get(key)));

Completely agree. For debugging a single long line is better. But we both know that the real issue
is NOT about the Monitors capabilities or the line length. It is that checkstyle is a Tax we pay for;
It is a Tax we inherited from Hadoop. But I sincerely believe as an open source which is
designed to attract new contributors, it is a tax worth paying because it limits the information
density you can pack into a line. I know a tool cannot be a replacement to diligent review by
the programmers, however, this tool would flag and catch issues, and help the reviewer.

Accordingly, I'd worry less about line length and more about modern java code issues

* best practices for lambda expressions

* effective use of optional<>, java17 type inference, ...

* how to design for shading, e.g. making sure you never return _any_ type from someone else's library in your APIs in case you want to move to shaded libraries, change the implementation etc.

* how to design for maintenance. We've been maintaining hadoop code for 15+ years. What could we have done better?

* design for testability.

* should we embrace runtime exceptions so we can use java streaming APIs, or do we have to reimplement our own functional libraries. It's too late to change old code, hence org.apache.hadoop.util.functional, but new APIs?
  For the last 2 I've really embraced interface/implementation rather than things which require mockito abuse for changing behaviour. Hadoop avoided interfaces in the past because its impossible to add new methods without breaking external code. but with java8 and the ability to define default implementations, that changed.

100% -- This is my opinion too. However the discussion for the last 3 years have always been on "My monitor is big", I don't believe that human being read code. If you address these comments about -- The var, optional etc. you have effectively solved this issue.

But here is the issue. Are all over contributors willing to move up this chain; perhaps tweak the way they write code ?

So here's my proposal

1. stop worrying about line length. Change the rule, but even if not, +1 patches which break it if it makes the code clearer

+1. I am fine with it. I believe that committers should do the right thing. The tools are to help them, NOT to dictate.

2. come and discuss on hadoop-common how we could do a style guide for the new world, based on others' work

I would, but as you see, I am very engaged in defending MY ONLY veto in Apache. {Just kidding, I am not listening to Hadoop-common, I have always been an Ozone man.}

3. look at the style rules of apache Scala projects, so we can learn from them especially on type inference (all args/return values must declare types)

Yes. This is what I am saying too.

4. embrace AssertJ if you haven't already.

if you can't get consensus, move this out of github issues into mailing lists. And focus on what is best for the readability and maintainability of modern code, not stuff we wrote for java 5.

(BTW: all those code snippets are from hadoop trunk, not all my work)

@steveloughran It has been a pleasure to talk to you after a long time. I am thankful for your thoughtful comments, and more over for taking time to write a thoughtful response.

@anuengineer
Copy link
Contributor

anuengineer commented Dec 8, 2023

@steveloughran Unless you are really passionate about this topic, You might not get anything from reading these papers.
I find it fascinating to learn the methods about my own brain when we read things. Hence these intellectual meanderings. I promised I will share some of the reading material in a way that is easy to click and read.

These two are general studies on WHY in-spite of having large monitors, we as human beings have not
moved to 462 character lengths.

  1. The Effect of Line Length and Passage Type on Reading Speed, Comprehension, and User Satisfaction
  2. The Effects of Line Length on Reading Online News
  3. Java Specific --An Empirical Study Assessing Source Code Readability in Comprehension worth reading if you are interested in this topic. One critical argument that this paper makes is that if you can keep the number of nestings -- that is curly braces -- low, it is easier to understand. The 80 character and 100 characters, if you assume a measure provides 2 tabs means 10 more nesting, or 6 nesting more if you assume 4 tabs. This is an important consideration when moving to 100 characters from 80.
  4. If there is only ONE paper that you read, this is it. Most relevant to our discussion. https://optimal-codestyle.github.io/

Once more, thank you for improving the noise to signal ratio of this discussion; appreciate your kindness and time.

@royfielding
Copy link

Code style issues like this are subject to majority opinion of the project. There is no veto.

Literally any opinion can be stretched into a research discussion ad nauseam. It is still just an opinion. A technical reason for which a veto is justified would be if the configuration failed to match a group decision, or if the change causes a failure in deployment that would impact existing users.

Saying the group has to extensively research the subject matter of your opinion just to override a veto that is based on opinion is a nonsensical abuse of the Apache process, even if that opinion is correct. A project must be free to make its own conclusions even if the entire world disagrees.

You have an opinion and you are welcome to share that and have one vote on the process. No veto is allowed.

@szetszwo
Copy link
Contributor

@royfielding , thanks a lot for commenting on this!

@maobaolong , could you update this change? It currently has some conflicts.

@szetszwo szetszwo reopened this Dec 10, 2023
@anuengineer
Copy link
Contributor

@royfielding Thank you for comment. Appreciate it. I will comply with the Apache process.

@szetszwo
Copy link
Contributor

... I am going to presume that you fat fingered and accidentally reopened this issue. My VETO clearly stands ...

I intentionally reopened this last time. Are you open for discussion? Discussion on a closed pull request does not sound right.

@maobaolong
Copy link
Member Author

@szetszwo sure! Will be pleasure and glad to continue to finish this PR!

@maobaolong
Copy link
Member Author

@szetszwo Thanks for reopen this PR for me, I've merge the master branch and resolved conflicts, please take a look.

Thank you!

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @maobaolong for updating the patch.

hadoop-hdds/dev-support/checkstyle/checkstyle.xml Outdated Show resolved Hide resolved
@szetszwo szetszwo merged commit cd1380e into apache:master Dec 10, 2023
14 checks passed
@szetszwo
Copy link
Contributor

The pull request is now merged. Thanks, everyone!

@maobaolong
Copy link
Member Author

Thank you everyone for the discussion and review!

@Montura
Copy link
Contributor

Montura commented Dec 11, 2023

@maobaolong, do we need also change the value of RIGHT_MARGIN for IntelliJ hadoop-ozone/dev-support/intellij/ozone-style.xml to 120 instead of 80 as you've done for checkstyle.xml?

@maobaolong
Copy link
Member Author

@Montura Thank you for remind me, the ozone-style.xml should be update together, I submit a new PR here.

#5767

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Pull request that modifies the build process
Projects
None yet