
Run cleanup on guide/C #128

Closed
wants to merge 4 commits into from

Member

@fellen fellen commented Nov 9, 2019

While replacing hard-coded links in the documents with entities (for both parts and all languages), I saw the following issues:

  • Because each author and each translator used a different formatting style, it is unnecessarily hard to patch translations with diff tools. Examples:
    - Some files use tabs with different assumed widths.
    - One puts all elements of a table row on one line, another separates each table cell with an empty line.
    - Some break lines Fortran-style at column 72, others put several thousand characters on one line.
  • There are many elements of the form <somelongtagname attribs></somelongtagname>, which can be shortened to <somelongtagname attribs />.
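
The shortening of empty elements described in the last bullet can be illustrated without an editor. This is only a minimal Python sketch, not what Eclipse does internally: the function name `collapse_empty_elements` and the regular expression are illustrative assumptions, and the pattern presumes that attribute values contain no `<` or `>` and that nothing at all (not even whitespace) sits between the opening and closing tag.

```python
import re

# Matches <name attribs></name> where the element has no content at all.
# Assumption: attribute values never contain '<' or '>'.
_EMPTY_PAIR = re.compile(r"<([A-Za-z_][\w.\-:]*)((?:\s+[^<>]*?)?)\s*></\1\s*>")


def collapse_empty_elements(xml_text: str) -> str:
    """Rewrite <tag attribs></tag> as <tag attribs /> and leave everything else alone."""
    return _EMPTY_PAIR.sub(lambda m: f"<{m.group(1)}{m.group(2)} />", xml_text)


if __name__ == "__main__":
    print(collapse_empty_elements('<xref linkend="chapter_oview"></xref>'))
    print(collapse_empty_elements("<para>Text</para>"))
```

Elements with any content, such as `<para>Text</para>`, are left untouched, so the change stays a pure notation change.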

Eclipse's XML editor has a nice tool that does this, along with some other checks, in one run.
Its formatter's line breaks at parentheses and commas are not everyone's cup of tea, but they have already helped me find missing closing brackets.

  • If nobody knows a better tool, I would like to use it.
  • After a few experiments, I would prefer to format with 2-space indentation and a line length of 100.

Compress empty element tags, ...
Reformat with 2 spaces indentation and line length 100.
@fellen
Member Author

fellen commented Nov 9, 2019

This is only a demo. It should be applied to all languages of both parts.
Other documenters like @sunfish62, @DaveC49, ... and the translators should also comment.

&app;
<acronym>XML</acronym>
data file can be transformed to almost any other data format (e.g.,
<acronym>QIF</acronym>
Member

If you can figure out how, it would be good to distinguish style tags like <acronym> from formatting tags like <para> and block only on formatting tags.

Member Author

Yes, distinguish block elements from inline elements.
I will try to find it in the config or test several tools.
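
The block/inline distinction can be sketched in a few lines of Python. This is an illustration of the idea only, not the behaviour of any of the tools discussed here: the function name `format_blocks` and the short `INLINE_TAGS` set are assumptions, and the sketch simplifies by expecting no text directly between block-level children and no undefined entities such as `&app;` in the input.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Assumption: a hand-picked set of inline DocBook tags; a real list would be longer.
INLINE_TAGS = {"acronym", "emphasis", "ulink"}


def format_blocks(elem, depth=0, indent="  "):
    """Return formatted lines; only block elements start a new line."""
    pad = indent * depth
    children = list(elem)
    if all(child.tag in INLINE_TAGS for child in children):
        # Leaf block: keep its text and inline children together on one line.
        # ET.tostring also emits the element's tail, which is expected to be
        # whitespace here and is removed by strip().
        flat = ET.tostring(elem, encoding="unicode")
        return [pad + re.sub(r"\s+", " ", flat).strip()]
    # Container block: one line per tag, children indented one level deeper.
    # Simplification: text placed directly inside a container is dropped.
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in elem.attrib.items())
    lines = [f"{pad}<{elem.tag}{attrs}>"]
    for child in children:
        lines.extend(format_blocks(child, depth + 1, indent))
    lines.append(f"{pad}</{elem.tag}>")
    return lines


if __name__ == "__main__":
    sample = '<sect1 id="s1"><para>The\n<acronym>XML</acronym>\ndata file</para></sect1>'
    print("\n".join(format_blocks(ET.fromstring(sample))))
```

With this split, `<acronym>` stays inside the sentence it belongs to, while `<para>` and `<sect1>` get their own lines and indentation.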

@jralls
Member

jralls commented Nov 9, 2019

@gjanssens has a project in hand to convert to msgid/msgstr for translations, which would render the formatting of the translations a non-issue.

An editor or IDE might not be the best tool for setting this up, because in general it's difficult to integrate them into git so that commits are reformatted automagically. We've used astyle in the past for reformatting C. Something similar for XML would, I think, be preferable to reformatting by hand in an editor.

@fellen
Member Author

fellen commented Nov 9, 2019

> @gjanssens has a project in hand to convert to msgid/msgstr for translations, which would render the formatting of the translations a non-issue.

Yes, the draft is PR #120, and there is https://wiki.gnucash.org/wiki/Po_Based_Documentation_Translations for discussion of the details.

PR #128 is meant as a preparatory step for the to-do item of unifying URLs etc.

The first CLI XML formatter is already in use by the project: xmllint.
But either it is very rudimentary or its man page is not well written.

See the result of 'for i in *.xml; do xmllint --format --c14n --path ../../docbook/ "$i" --output "output/xmllint/$i"; done'.

The files still contain [TAB]s and the line length is not limited.
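
Whether a formatter run leaves tabs or over-long lines behind can be verified mechanically. The following is a minimal Python sketch under stated assumptions: the function name `style_violations` is hypothetical, and the limit of 100 characters is taken from the preference stated at the top of this PR.

```python
def style_violations(text: str, max_len: int = 100) -> list:
    """Return (line_number, problem) pairs for tabs and over-long lines."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append((number, "tab"))
        if len(line) > max_len:
            problems.append((number, "line too long"))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python style_violations.py file1.xml file2.xml ...
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8") as handle:
            for number, problem in style_violations(handle.read()):
                print(f"{name}:{number}: {problem}")
```

Running it over the formatter output gives a quick, tool-independent answer to whether the two goals (no tabs, bounded line length) were met.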
@fellen
Member Author

fellen commented Nov 9, 2019

In commit 820fb5a, xmllint was applied to the files in maint.
Applying it to my Eclipse-cleaned files might give the desired result this time, but not for future changes.

@fellen
Member Author

fellen commented Nov 9, 2019

The next suggestion from https://stackoverflow.com/questions/16090869/how-to-pretty-print-xml-from-the-command-line , tidy, seems to have its strength in converting between HTML and XML. So it might be of more interest if we want to convert or clean up ht-docs.

@cstim
Member

cstim commented Nov 9, 2019

Disclaimer: I haven't worked on the docs for a long time. That being said, I'd vote for tidying up the XML any time. If it is possible to give the files a more unified indentation so that patching becomes somewhat easier, just go ahead and do it.

@sunfish62
Contributor

Although I have only sporadically contributed (and have no plans at this time to contribute more), I can pretty much tell you that technicalities--such as whether a tab is two spaces, or a line ends with a space or not--are never going to be a priority for me. Should I ever contribute again, someone will most assuredly complain about my failure to follow these obscure issues (all the while completely skipping any discussion of the actual editorial changes I might have made).

My point here is to say I actually don't give a whit about this, and I guarantee (at least for my own part) that any future contribution I make will need someone else to go through it after the fact to make it "correct" for some arbitrarily designated set of standards. If that's what you want to spend time on, great.

@fellen
Member Author

fellen commented Nov 11, 2019

> My point here is to say I actually don't give a whit about this, and I guarantee (at least for my own part) that any future contribution I make will need someone else to go through it after the fact to make it "correct" for some arbitrarily designated set of standards. If that's what you want to spend time on, great.

After @jralls's comments, the current idea is that between make check and git commit you run one more command, i.e. make format, which will do it for you.

So the question for authors is:
Which style would be the easiest to read?

Indentation only or an empty line before each section,

<para>Text</para>
or

<para>
  Text
</para>

@sunfish62
Contributor

For me, I have no preference--especially if the software will do it for me. I tend to follow the example I see.

@DaveC49
Contributor

DaveC49 commented Nov 12, 2019 via email

@gjanssens
Member

>> @gjanssens has a project in hand to convert to msgid/msgstr for translations, which would render the formatting of the translations a non-issue.

> Yes, the draft is PR #120, and there is https://wiki.gnucash.org/wiki/Po_Based_Documentation_Translations for discussion of the details.

> PR #128 is meant as a preparatory step for the to-do item of unifying URLs etc.

The whitespace normalization is not really needed for the conversion to a po-based workflow. The script I wrote to help with the conversion is tag-based. Having said that, it is probably still helpful in situations where human intervention is required.

@gjanssens
Member

>> My point here is to say I actually don't give a whit about this, and I guarantee (at least for my own part) that any future contribution I make will need someone else to go through it after the fact to make it "correct" for some arbitrarily designated set of standards. If that's what you want to spend time on, great.

> After @jralls's comments, the current idea is that between make check and git commit you run one more command, i.e. make format, which will do it for you.

While a make format is certainly interesting, I don't think we should force authors to use it. We can encourage it for sure. If formatting slowly degrades over time, we can always do a reformatting from time to time ourselves. Right before or after a release would be a good moment.

@gjanssens
Member

> Frank, David, My preference as a programmer is for
> Text
> The make check; make format seems like a viable way to get all the XML code into a standard format, which might relieve casual DocBook editors from having to conform too much to a specific standard.
> Would it not be possible to include the format checking and fixing within the make check command?

It's technically fairly easy to have make check also fix the formatting. However, that would be mixing two goals that don't go together very well. make check is about validating what is there. A user wouldn't expect a check to change source code. And I wouldn't want that either, because if the check fails and meanwhile it has changed your source code, that could cause additional confusion.

So it's better to keep them separate.
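
The separation argued for here can be sketched as two entry points that share one formatter. This is a minimal Python illustration, not the project's build integration: the names `normalize`, `check`, and `format_in_place` are hypothetical, and `normalize` is only a stand-in (tab expansion and trailing-whitespace removal) for whatever XML formatter ends up being chosen.

```python
from pathlib import Path


def normalize(text: str) -> str:
    """Stand-in formatter (assumption): expand tabs to 2-column tab stops, strip trailing blanks."""
    lines = [line.expandtabs(2).rstrip() for line in text.splitlines()]
    return "".join(line + "\n" for line in lines)


def check(paths) -> list:
    """Read-only validation: return the files that are not normalized; never writes."""
    failing = []
    for p in paths:
        text = Path(p).read_text(encoding="utf-8")
        if text != normalize(text):
            failing.append(p)
    return failing


def format_in_place(paths) -> list:
    """Explicit opt-in rewrite: normalize files on disk and return those that changed."""
    changed = []
    for p in paths:
        path = Path(p)
        original = path.read_text(encoding="utf-8")
        formatted = normalize(original)
        if formatted != original:
            path.write_text(formatted, encoding="utf-8")
            changed.append(p)
    return changed
```

A check target would call only `check` and fail when the list is non-empty, while a format target would call `format_in_place`; a failing check then never leaves modified sources behind.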

@DaveC49
Contributor

DaveC49 commented Nov 13, 2019 via email

New dependencies: xmlformat, perl (or alternative ruby)

xmlformat.conf still can be improved

CMake integration still missing
@fellen
Member Author

fellen commented Apr 23, 2020

> Come out of the ark and throw this crap away in favour of a modern platform like MkDocs which would be much more efficient and effective.

That should be discussed in https://bugs.gnucash.org/show_bug.cgi?id=722016 instead of in PRs.

@gjanssens
Member

I just checked on repology.org - xmlformat is not widely packaged in distributions. It's available on Debian and Ubuntu, but not on Fedora or Mint to name two other popular distros.

@fellen
Member Author

fellen commented Apr 23, 2020

> I just checked on repology.org - xmlformat is not widely packaged in distributions. It's available on Debian and Ubuntu, but not on Fedora or Mint to name two other popular distros.

Isn't it part of docs-common on fedora?

@gjanssens
Member

> Isn't it part of docs-common on fedora?

I can't find a package with that name.

Config Error->Warning; add source URL
@fellen
Member Author

fellen commented Apr 23, 2021

In commit 4b746b7 I have put the Perl version and my current configuration in util/xmlformat.

But I did not add it to the different make processes, so you have to run one of the commands in the README file there.

Please test it and report your opinions and ideas to improve the config …

@fellen fellen closed this Sep 3, 2021
6 participants