Run cleanup on guide/C #128
Compress empty element tags, ... Reformat with 2 spaces indentation and line length 100.
This is only a demo. It should be applied to all languages of both parts.
&app;
<acronym>XML</acronym>
data file can be transformed to almost any other data format (e.g.,
<acronym>QIF</acronym>
If you can figure out how, it would be good to distinguish style tags like <acronym>
from formatting tags like <para>,
and apply block formatting only to formatting tags.
Yes, distinguish block elements from inline elements.
I will try to find it in the config or test several tools.
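A minimal sketch of the block/inline distinction discussed above, in Python. The two tag sets are assumptions for illustration only (a real formatter config would need the full DocBook vocabulary), and classify and tags_by_kind are hypothetical helper names, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Assumed classification for illustration; not a complete DocBook list.
BLOCK_TAGS = {"chapter", "sect1", "sect2", "sect3", "para", "title"}
INLINE_TAGS = {"acronym", "emphasis", "guilabel", "guimenu", "keycombo"}


def classify(tag):
    """Return 'block', 'inline', or 'unknown' for an element name."""
    if tag in BLOCK_TAGS:
        return "block"
    if tag in INLINE_TAGS:
        return "inline"
    return "unknown"


def tags_by_kind(xml_text):
    """Map each kind to the sorted element names found in xml_text."""
    found = {"block": set(), "inline": set(), "unknown": set()}
    for element in ET.fromstring(xml_text).iter():
        found[classify(element.tag)].add(element.tag)
    return {kind: sorted(names) for kind, names in found.items()}


sample = "<sect1><title>Intro</title><para>An <acronym>XML</acronym> file.</para></sect1>"
print(tags_by_kind(sample))
```

A formatter would then insert line breaks and indentation only around elements classified as block. Note that ElementTree rejects undeclared entities such as &app;, so the real guide files would need their entity declarations available before this could run on them.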
@gjanssens has a project in hand to convert to msgid/msgstr for translations, which would render the formatting of the translations a non-issue. An editor or IDE might not be the best tool to use for setting this up because in general it's difficult to integrate them into git so that commits are reformatted automagically. We've used astyle in the past for reformatting C. Something similar for XML would I think be preferable to reformatting by hand in an editor. |
Yes, the draft is PR #120, and https://wiki.gnucash.org/wiki/Po_Based_Documentation_Translations is there for discussion of the details. PR #128 is meant as a preparatory step for the to-do item of unifying URLs etc.
The first CLI XML formatter is already in use by the project: xmllint. But either it is very rudimentary or its man page is not well written. See the result of 'for i in *.xml; do xmllint --format --c14n --path ../../docbook/ $i --output output/xmllint/$i; done': the files still contain [TAB]s and the line length is not limited.
In commit 820fb5a xmllint was applied on the files in maint. |
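To make the two complaints above checkable, here is a rough Python sketch that reports lines containing tabs or exceeding the line length. The limit of 100 comes from the PR description; find_layout_problems is a hypothetical name and the sample text is made up.

```python
MAX_LINE_LENGTH = 100  # limit taken from the PR description


def find_layout_problems(text, max_length=MAX_LINE_LENGTH):
    """Return (line_number, problem) tuples for tabs and overlong lines."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append((number, "tab"))
        if len(line) > max_length:
            problems.append((number, "too long"))
    return problems


# Made-up sample: line 2 contains a tab, line 4 is longer than 100 characters.
sample = "<para>\n\tText\n</para>\n" + "<para>" + "x" * 120 + "</para>\n"
for number, problem in find_layout_problems(sample):
    print(f"line {number}: {problem}")
```

Run over the xmllint output directory, something like this would show which files still need attention after formatting.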
The next suggestion from https://stackoverflow.com/questions/16090869/how-to-pretty-print-xml-from-the-command-line, tidy, seems to have its strength in conversion between HTML and XML. So it might be of more interest if we want to convert or clean up ht-docs.
Disclaimer: I haven't worked in docs for a long time. That being said, I'd vote for tidying up the XML any time. If it is possible to make the files into a more unified indentation so that patching becomes somewhat easier, just go ahead and do it. |
Although I have only sporadically contributed (and have no plans at this time to contribute more), I can pretty much tell you that technicalities--such as whether a tab is two spaces, or a line ends with a space or not--are never going to be a priority for me. Should I ever contribute again, someone will most assuredly complain about my failure to follow these obscure issues (all the while completely skipping any discussion of the actual editorial changes I might have made). My point here is to say I actually don't give a whit about this, and I guarantee (at least for my own part) that any future contribution I make will need someone else to go through it after the fact to make it "correct" for some arbitrarily designated set of standards. If that's what you want to spend time on, great.
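After @jralls' comments, the idea is currently that you run, between make check and git commit, make format, which will do it for you.
So the question for authors is: Which style would be the easiest to read?
Indentation only or an empty line before each section,
<para>Text</para>
or
<para> Text</para>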
For me, I have no preference--especially if the software will do it for me. I tend to follow the example I see. |
Frank, David,
My preference as a programmer is for
<para> Text</para>
because it makes finding missing or incorrectly placed tags a lot easier. There are tags which are associated with the
document structure (like <sect1>, <sect2>, <sect3>, <para>), where indenting each tag makes the structure clearer, and
those tags more associated with the text formatting of a phrase or group of words (<keycombo>, <guilabel>, <guimenu>,
<emphasis> and similar) within that structure, where it makes sense to keep the tags in-line with the text. I don't
think it is too critical though, and as a DocBook editor like XMLMind produces XML with the other format, I am happy to go
with that.
The indenting possibly comes more naturally to those of us with a programming background and far less naturally to those
of us with more of a literary background. I ended up with a foot in both camps having some background in programming and
late in my life becoming a publisher of poetry.
Another factor is that we don't all use the same editor. I have tried emacs, XMLMind, xed and a few others. Emacs
obviously has a following but as I spent a lot of my life not using it, I find it a huge investment of my time to learn
to suck eggs when I can already do that effectively with alternatives.
XMLMind is a DocBook editor which lets you swap between a DocBook View, a sort of WYSIWYG mode using default XSL
processing, and an XML View with standard code highlighting. It produces XML with the paragraphs formatted with an
indent of spaces:
<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.1" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xila="http://www.w3.org/2001/XInclude/local-attributes" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:trans="http://docbook.org/ns/transclusion" xmlns:svg="http://www.w3.org/2000/svg" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:db="http://docbook.org/ns/docbook">
<title>Testing it out</title>
<sect1> <title>First section</title>
<para>First paragraph in the first section.</para>
<para>Second para first section.</para> </sect1>
<sect1> <title>Second section</title>
<para>First para second section.</para> </sect1></chapter>
It also has a validator for DocBook. It currently targets DocBook 5.1, and AFAIK there is no option to select
conformance with a different DocBook version. There is a provision for setting different XSL and CSS stylesheets for
processing, so presumably one could locate and load the stylesheets appropriate for a given DocBook project. Indenting
is controllable as a preference, but there is no option to remove trailing spaces on save. I will try to experiment
with setting them to the 4.5 default XSL and CSS stylesheets.
With xed I have to continually reset the indent conversion to spaces, as I also use it in a variety of system editing and
coding functions which have different indent requirements, and it has no ability to set specific options from the
file extension. It does however have an option to save without trailing spaces. I think I remember finding a similar
option in emacs as well at one stage.
The make check / make format approach seems like a viable way to get all the XML code into a standard format, which
might relieve casual DocBook editors from having to conform too much to a specific standard. Would it not be possible
to include the format checking and fixing within the make check command?
David Cousens
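To illustrate the difference between checking and fixing raised here, a small Python sketch follows. It only covers the whitespace points mentioned in this thread (tabs and trailing spaces); the function names and the indent width default are assumptions, and it is not how make check or make format are actually implemented.

```python
def normalize_whitespace(text, indent_width=2):
    """Expand tabs and drop trailing whitespace on every line.

    Caveat: expandtabs also touches tabs inside element content,
    which may not be wanted for every element.
    """
    if not text:
        return text
    lines = [line.expandtabs(indent_width).rstrip() for line in text.splitlines()]
    return "\n".join(lines) + "\n"


def is_normalized(text):
    """Check-only mode: True if formatting would leave the text unchanged."""
    return normalize_whitespace(text) == text


messy = "<para>  \n\tText\t\n</para>\n"
clean = normalize_whitespace(messy)
print(repr(clean))
print(is_normalized(messy), is_normalized(clean))
```

The check-only function never modifies anything, so it could report problems without rewriting an author's files, while the normalizing function is the part that changes them.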
On Mon, 2019-11-11 at 13:22 -0800, Frank H. Ellenberger wrote:
> My point here is to say I actually don't give a whit about this, and I guarantee (at least for my own part) that any
> future contribution I make will need someone else to go through it after the fact to make it "correct" for some
> arbitrarily designated set of standards. If that's what you want to spend time on, great.
After @jralls comments, the idea is currently that you run between make check and git commit... i.e. make format,
which will do it for you.
So the question for authors is:
Which style would be the easiest to read?
Indentation only or an empty line before each section,
<para>Text</para>
or
<para> Text</para>
--
Dr David R Cousens
B.Sc, M.Prof. Acc., Ph.D., G.C.Ed
|
The whitespace normalization is not really needed for the conversion to a po based workflow. The script I wrote to help in the conversion is tag based. However having said that, it is probably still helpful in situations where human intervention is required. |
While a make format is certainly interesting, I don't think we should force authors to use it. We can encourage it for sure. If formatting slowly degrades over time we can always do a reformatting from time to time ourselves. Right before or after a release would be a good moment.
It's technically fairly easy to have So it's better to keep them separate. |
Thanks Geert.
I take your point about keeping a check and a reformat as separate processes. If we need to we can always make check &&
make format at least on Linux, not sure about Windows and Mac worlds. Despite David T's concerns I think it is still
useful to have some sort of standards to work to or at least towards and even some agreed formatting of the XML/DocBooks
code that authors can access. My own transgressions in this regard were mainly while I was on the learning curve (still
am) with DocBooks and was experimenting with various editors while trying to establish a workflow.
David Cousens
|
New dependencies: xmlformat, perl (or alternatively ruby)
xmlformat.conf can still be improved
CMake integration still missing
That should be discussed in https://bugs.gnucash.org/show_bug.cgi?id=722016 instead of in PRs.
I just checked on repology.org - xmlformat is not widely packaged in distributions. It's available on Debian and Ubuntu, but not on Fedora or Mint to name two other popular distros. |
Isn't it part of docs-common on fedora? |
I can't find a package with that name ? |
Config Error->Warning; add source URL
In commit 4b746b7 I have put the Perl version and my current configuration in util/xmlformat. But I did not add it to the different make processes, so you have to run one of the commands in the README file there. Please test it and report your opinions and ideas to improve the config …
While replacing hard-coded links in the documents with entities, for both parts and all languages, I saw the following issues:
** Some files use tabs with different length assumptions.
** One puts all elements of a table row on one line, another separates each table cell by an empty line.
** Some break lines Fortran-like at 72, others put several thousand characters on one line.
** <somelongtagname attribs></somelongtagname>, which can be shortened to <somelongtagname attribs />
Eclipse's XML editor has a nice tool to do this, alongside some other checks, in one run.
Its formatter's line breaks for parentheses and commas are not everyone's cup of tea, but have already helped me to find missing closing brackets.
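For the empty-element shortening in the last list item, a regex-based Python sketch could look like the following. It is only an illustration under simplifying assumptions: it does not parse XML, so it would also rewrite matches inside comments or CDATA sections, and it does not handle attribute values that contain angle brackets. The name compress_empty_elements is hypothetical.

```python
import re

# Matches <name attrs></name> with nothing, not even whitespace, in between.
EMPTY_PAIR = re.compile(r"<([A-Za-z_][\w:.\-]*)((?:\s[^<>]*?)?)\s*></\1\s*>")


def compress_empty_elements(text):
    """Rewrite <tag attribs></tag> as <tag attribs />."""
    return EMPTY_PAIR.sub(lambda m: f"<{m.group(1)}{m.group(2)} />", text)


print(compress_empty_elements('<somelongtagname attribs="x"></somelongtagname>'))
print(compress_empty_elements("<para>Text</para>"))
```

Elements with any content, including whitespace only, are left alone, so the rewrite does not change what an XML parser sees.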