Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate formatter from indent to clang-format #4592

Closed
mc-butler opened this issue Oct 4, 2024 · 23 comments
Closed

Migrate formatter from indent to clang-format #4592

mc-butler opened this issue Oct 4, 2024 · 23 comments
Assignees
Labels
area: build Build system and (cross-)compilation prio: medium Has the potential to affect progress
Milestone

Comments

@mc-butler
Copy link

Important

This issue was migrated from Trac:

Origin https://midnight-commander.org/ticket/4592
Reporter zaytsev (@zyv)

Currently, we use GNU Indent as our formatter. Unfortunately, it has several problems that make it almost unusable:

  1. It requires tons of /* *INDENT-OFF* */ directives to avoid breaking complex formatting.
  2. It randomly adds /* at the beginning and NUL at the end of source files.
  3. It fails to parse complex source files (see Prepare for release mc-4.8.32 #4524):
indent: lib/util.c:522: Error:Stmt nesting error.
...

As I started trying to fix these errors, it turned out that half of our source tree is unformatted, because the formatting loop breaks on a non-zero exit code, and all files after that are simply not processed.

This makes it impossible to add formatting checks to the CI and automatically maintain proper formatting.

I have investigated possible formatting options, and the only viable candidate in my opinion is clang-format:

  1. It uses the clang compiler frontend for parsing, so it is able to process any valid code.
  2. It is also actively developed and mostly stable.
  3. It comes with a preset for GNU style.

As with any formatter change, it is not possible to make the style match 100%, and there will be one big diff, but hopefully life will get better after that.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Oct 4, 2024 at 13:04 UTC (comment 1)

  • Owner set to zaytsev
  • Status changed from new to accepted

Branch: 4592_clang_format
Initial [d4c919faaf26daf46985f649cb3e2233003b91c5]

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Oct 4, 2024 at 13:21 UTC (comment 2)

I added a style based on the GNU style, but customized to produce a minimal diff with our code base. You can see what the result looks like in the second commit.

Before I spend more time adding trailing commas and stuff like that, can we agree in principle that we can switch? Any suggestions for other options we can change?

If we agree, it would be great if we could merge all open branches first, and then I can work on manual tweaks in this branch before it can be merged.

My observations are as follows:

  1. Many of our arrays are reformatted
  2. Include comment alignment is broken
  3. Spaces are being inserted before gettext shortcuts, causing a lot of diff

Arrays

This happens because many of our arrays are missing trailing commas, which causes them to be reformatted instead of keeping the items one per line. This is easy to fix.

For creative manual formatting, you can add empty comments at the end of the line (//) to force the formatter to leave them as they are.

Comments

One could convert C-style comments to line comments and play with the following options:

SpacesBeforeTrailingComments: 1
AlignTrailingComments:
  Kind: Always
  OverEmptyLines: 42

I don't think it's worth it and I can live with the new formatting.

Gettext

This happens, because gettext shortcuts are considered function calls (which they are). I can live with that.

P.S. Many tools like CLion have a built-in integration with clang-format, so formatting will happen mostly automatically if this is merged.

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 4, 2024 at 15:43 UTC (comment 3)

For me, in general. it worse than GNU indent output.

Is there a way to avoid following:

-static void
-G_GNUC_PRINTF (1, 0)
-mc_va_log (const char *fmt, va_list args)
+static void G_GNUC_PRINTF (1, 0) mc_va_log (const char *fmt, va_list args)

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Oct 4, 2024 at 16:06 UTC (comment 4)

Is there a way to avoid following

Like in the last commit, or am I missing something? With this change, it's like all the other functions - first types on one line, then the rest.

For me, in general. it worse than GNU indent output.

If there are other things you don't like, please tell me, I can try to find solutions.

To be honest, I don't really care about the formatting rules, as long as they're fixed and it's done automatically for me in a consistent way. My biggest problem with indent is that, as I've discovered, it doesn't work with our code base, and when I start to fix it, more and more bugs like /*, NUL, etc. appear.

The things I like about clang-format are that it works at all, and that the output is surprisingly tolerable, given that this is fully automatic without any manual changes so far. The * stuff is finally done consistently and correctly, whatever the rules are.

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 5, 2024 at 7:57 UTC (comment 5)

Can't find an option to avoid following:

-    mc_filter->search_condition->is_case_sensitive =
-        mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE);
+    mc_filter->search_condition->is_case_sensitive
+        = mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE);

I want have = at the same line as variable.

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 5, 2024 at 8:04 UTC (comment 6)

static const mc_search_type_str_t mc_search__list_types[]
    = { { N_ ("No&rmal"), MC_SEARCH_T_NORMAL },
        { N_ ("Re&gular expression"), MC_SEARCH_T_REGEX },

I'like to have something like following:

static const mc_search_type_str_t mc_search__list_types[] =
{
    { N_ ("No&rmal"), MC_SEARCH_T_NORMAL },
    { N_ ("Re&gular expression"), MC_SEARCH_T_REGEX },

@mc-butler
Copy link
Author

Changed by mike.dld (mikedld@….com) on Oct 5, 2024 at 8:15 UTC (comment 5.7)

Replying to andrew_b:

Can't find an option to avoid following:

-    mc_filter->search_condition->is_case_sensitive =
-        mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE);
+    mc_filter->search_condition->is_case_sensitive
+        = mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE);

I want have = at the same line as variable.

I believe it's BreakBeforeBinaryOperators.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Oct 5, 2024 at 8:22 UTC (comment 8)

I want have = at the same line as variable.

I think this can be solved with PenaltyBreakAssignment, see my commits. Interestingly, it seems to solve many of the array breaking problems.

Please keep your bad examples coming. There may be more "simple" solutions to make things better. Thank you!

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 5, 2024 at 8:38 UTC (comment 9)

-const char *STR_E_RPL_NOT_EQ_TO_FOUND
-    = N_ ("Num of replace tokens not equal to num of found tokens");
+const char *STR_E_RPL_NOT_EQ_TO_FOUND = N_ (
+    "Num of replace tokens not equal to num of found tokens");

    mc_skin_color->fg = (items_count > 0 && values[0][0])
                            ? mc_skin_color_look_up_alias (mc_skin,
                                                           g_strstrip (g_strdup (values[0])))
                        : (tmp != NULL) ? g_strdup (tmp->fg)
                                        : NULL;

? and ; are not aligned.


-static tty_color_pair_t tty_color_defaults
-    = { .fg = NULL, .bg = NULL, .attrs = NULL, .pair_index = 0 };
+static tty_color_pair_t tty_color_defaults = {
+    .fg = NULL, .bg = NULL, .attrs = NULL, .pair_index = 0
+};

Structure initialization is more readable if members are one-per-line:

static tty_color_pair_t tty_color_defaults = {
    .fg = NULL,
    .bg = NULL,
    .attrs = NULL,
    .pair_index = 0

INDENT-OFF/ON are still there.

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 5, 2024 at 8:42 UTC (comment 10)

A new huge dependency is an argument against clang-format: tiny GNU indent vs. huge clang/llvm.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Oct 5, 2024 at 8:58 UTC (comment 11)

A new huge dependency is an argument against clang-format: tiny GNU indent vs. huge clang/llvm.

I don't think that's a fair argument. It's just for development. The tool itself is only <1 MB. But of course it pulls libclang and libllvm if clang is not installed, and only gcc is used. But what is 100 MB these days?

The problem with the tiny GNU indent is that it hasn't seen any development since 2018, and it doesn't work properly, but it's also unlikely to ever be fixed, because to do proper formatting for C, you basically have to implement a compiler front end.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Oct 5, 2024 at 9:01 UTC (comment 12)

INDENT-OFF/ON are still there.

Yes, I left them in on purpose for now. They are ignored by clang-format. It has it's own ignores:

// clang-format off
// clang-format on

If we switch to clang-format, they can all be removed by regex at the same time.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Oct 5, 2024 at 9:19 UTC (comment 13)

If you put a trailing comma, clang-format will format one per line. There was no way to make indent behave. This is what I meant by "manual tweaks". Maybe when removing indent-comments one can check if trailing commas are needed.

 /* *INDENT-OFF* */
 static tty_color_pair_t tty_color_defaults = {
-    .fg = NULL, .bg = NULL, .attrs = NULL, .pair_index = 0
+    .fg = NULL,
+    .bg = NULL,
+    .attrs = NULL,
+    .pair_index = 0,
 };
 /* *INDENT-ON* */

There is also an option AlignArrayOfStructures: Left, but I think the results are questionable. What do you think?

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Oct 5, 2024 at 9:33 UTC (comment 14)

+const char *STR_E_RPL_NOT_EQ_TO_FOUND = N_ (
+    "Num of replace tokens not equal to num of found tokens");

Here indent was turned off and clang-format breaks, because it's over 100 chars. How do you want to handle this?

  • If this line is important, ignores can be left in place (see manual tweaks)
  • We can increase line limit to e.g. 120 characters
  • Ignore this problem

I don't think there is a way to do it nicer.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Oct 5, 2024 at 9:39 UTC (comment 15)

mc_skin_color->fg =

This is a problem with nested ternaries :( By adding parenthesis the problem is solved. Another manual tweak...

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Oct 5, 2024 at 10:20 UTC (comment 16)

mc_skin_color->fg =

With AlignOperands: false and BreakBeforeTernaryOperators: false it behaves more like indent.

In this particular case the important part is AlignOperands, it saves space and code looks better. BreakBeforeTernaryOperators can be also left on. See last commits for different possibilities.

What you like more?

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 5, 2024 at 11:35 UTC (comment 14.17)

Replying to zaytsev:

  • We can increase line limit to e.g. 120 characters

Yes.


                ? 1
                : 0;

:( There is a lot of space to do

                ? 1 : 0;

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 5, 2024 at 13:23 UTC (comment 18)

Let's take a pause here and merge a 4572_cleanup branch.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Jan 9, 2025 at 11:00 UTC (comment 19)

  • Milestone changed from 4.8.33 to 4.8.34

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Jan 28, 2025 at 12:31 UTC (comment 20)

  • Branch state changed from no branch to on review

Branch: 4592_clang_format
Initial changeset: [d038175698fba89f3883ad5aec93fac3398baee7]

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Feb 2, 2025 at 18:09 UTC (comment 21)

  • Branch state changed from on review to approved
  • Votes set to andrew_b

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Feb 2, 2025 at 18:42 UTC (comment 22)

  • Branch state changed from approved to merged
  • Votes changed from andrew_b to committed-master
  • Resolution set to fixed
  • Status changed from accepted to testing

Committed to master as [fbef24a] .

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Feb 2, 2025 at 18:57 UTC (comment 23)

  • Status changed from testing to closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: build Build system and (cross-)compilation prio: medium Has the potential to affect progress
Development

No branches or pull requests

2 participants