-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Markdown: Heading auto id #9858
Markdown: Heading auto id #9858
Conversation
I don't think this will work as there are other places in doxygen that rely on the |
The tests have already specified Id's. Since the id now contains the title (instead of see:
The tests fail because the id differs and that's the whole point of this PR. @albert-github Your opinion on this? |
My opinion is don't change, the procedure has been well thought about and one should as user (nearly) never link to automatically generated anchors as they might change, so they should only be used internally by / in the program. |
Did you try e.g. the example in the attached file: example.tar.gz
|
Yeah, this is due to this line: https://github.com/doxygen/doxygen/blob/master/src/docnode.cpp#L5687. |
Does anybody know why we check for |
The repository gives:
Still I don't understand what the use case would be for this change, like as said before in #9858 (comment) the anchors should only be used internally. |
For better compatibility with GitHub. In GitHub READMEs I often see that another section has been linked to. GitHib itself also generates ids for the headings. Doxygen also generates ids for headings, butt the name is different. You can assign an id by adding
I understand that but the standard ids of Doxygen limit the compatibility of READMEs between Doxygen and GitHub. |
I thought so, there are already a number of issues regarding the links in markdown files (see e.g. #8378 and the comment #8378 (comment)). |
I completely agree with you. We could implement the heading auto-identifier as GitHub has, and add an option to switch between the default ( GitHub's and Pandoc's mechanisms are described here: mity/md4c#175 |
Thanks for the link to the (in-official) description, it is the first time I see something like it. It should also be taken into account that in doxygen link ids are global. Note that with the tag reader it should bet taken into account that the read file might be from an older version of doxygen and thus can have |
GitHub removes the
I would add an option like |
@albert-github I added a option. Would you adjust the text or leave it like it is now. |
If anyone wants to test the GitHub implementation: https://godbolt.org/z/qcvrx1cff |
@tim-gromeyer when you are ready with your changes, can you squash your commits to make it easier to review. And please do not do unnecessary changes like sorting #include's |
src/markdown.cpp
Outdated
{ | ||
// Replace space with '-' | ||
std::replace(str.begin(), str.end(), ' ', '-'); | ||
static std::string_view germanSharpS = "ß"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are thousands of non-ascii characters, why is only this one getting a special treatment? You should assume the string has UTF-8 encoding, see utf8.h for helper functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the ß, I know GitHub leaves it in the id, see https://github.com/tim-gromeyer/stackoverflow/blob/tests/README.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add in comment a list of rules to show how characters are replaced.
I'm sorry for sorting your includes. My IDE does it automatically. |
0e3f8bb
to
e16ce96
Compare
@doxygen I finally squashed it! You can review it if you want to. Just ignore the |
Before anybody reviews, I'm currently working on an better approach using some helper functions from utf8.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick remarks / comments / things to change / add
@@ -457,7 +457,6 @@ class TagFileParser | |||
case InMember: | |||
case InPackage: | |||
case InDir: | |||
if (m_curString.endsWith("autotoc_md")) return; | |||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this line will give problems when a tag file is used from the 1.8.16 version of doxygen is used as here in the p1.tag
file of the given example there is the line:
<docanchor file="md_intro" title="Description">autotoc_md0</docanchor>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont really understand this. As far as I can see it starts with autotoc_md, not ends with. It ends with 0 but thats not what we checked for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right, probably a bug no one noticed, the tag file indeed gives quite clearly autotoc_md0
so it should be startsWith
instead of endsWith
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah and because it went unnoticed for so long I thought it's unnecessary and removed it. If you want I can add some code and check if we are in the doxygen Mode and than check if it starts with autotoc_md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to add the correct test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do it today tonight(about when I commited yesterday)
src/definition.cpp
Outdated
if (!m_impl->sectionRefs.empty()) { | ||
//printf("%s: writeDocAnchorsToTagFile(%d)\n",qPrint(name()),m_impl->sectionRef.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep also consistent layout, in this case the {
bracket not at the end of the line but on the next line.
(multiple places)
src/definition.cpp
Outdated
@@ -395,13 +395,13 @@ void DefinitionImpl::addSectionsToIndex() | |||
|
|||
void DefinitionImpl::writeDocAnchorsToTagFile(TextStream &tagFile) const | |||
{ | |||
if (!m_impl->sectionRefs.empty()) | |||
{ | |||
#define VECTOR_CONTAINS(x, y) (std::find(x.begin(), x.end(), y) != x.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this define is used multiple times wouldn't it be better to have it as a global lambda or (template) function or in a global include file?
src/docnode.h
Outdated
@@ -1072,9 +1072,9 @@ class DocPara : public DocCompoundNode | |||
const HtmlAttribList &attribs() const { return m_attribs; } | |||
void setAttribs(const HtmlAttribList &attribs) { m_attribs = attribs; } | |||
|
|||
private: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the private don't remove the spaces (consistency).
src/markdown.cpp
Outdated
{ | ||
// Replace space with '-' | ||
std::replace(str.begin(), str.end(), ' ', '-'); | ||
static std::string_view germanSharpS = "ß"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add in comment a list of rules to show how characters are replaced.
src/markdown.cpp
Outdated
static QCString behaviour = Config_getEnumAsString(HEADING_AUTO_IDENTIFIER); | ||
QCString id; | ||
id.sprintf("autotoc_md%d",autoId++); | ||
if (behaviour == "DOXYGEN") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't test on the string value but on the enum value like is done e.g. for WARN_AS_ERROR
in message.cpp
.
Makes it also possible to use a switch
statement with additional benefit that some compilers will warn when a value is not handled.
Thank you for the review. I will change it next time when I have time again! |
d6cb80f
to
1b45cae
Compare
I fixed 1 last bug I found, fixed the code formatting ans squashed it. |
4b932a2
to
ee6f48c
Compare
I have now tested it too and it works! |
I see in the HTML directory some strange filenames appear when in the The used source file aa.md:
the used Doxyfile:
I don't think this is nice and that it should be corrected (even though it is an internal name). Example: example.tar.gz |
Unfortunately I have no idea how to fix this, maybe I'll figure it out. |
@albert-github , @tim-gromeyer Thanks for the quick feedback. Commit e1ba6ae should address both issues. |
The error I found seems fixed, I'm glad I could help! |
Looks like I found 2 more problems. With the code:
I get the warnings:
Example: example.tar.gz |
Another one, regarding numbers:
give a lot of warnings like:
and one:
This is due to the fact that the translation of
is:
where the label is:
so starting with a number which is not (yet) supported. Example: example_2.tar.gz (Note: found in groupoffice package) |
I already know this one. That's why I didnt implement pandoc. There was just no way to test if it works. The bug with the |
@albert-github with this commit I've tried to address the 2 "bun" related issues. The last issue with headers starting with a digit, I'm not sure yet if we can or want to support that (not allowed in HTML4 and XHTML). |
There are a number sides on the number issue:
As a solution I can think of:
|
I just found an example on GitHub in respect to numbers and they violate HTML4 / XHTML as in the example I found at |
It includes all the numbers, for example: 2.3.0 / 2015-04-26
================== will be |
I don't think it is a good idea to break HTML4 / XHTML compatibility I think it is better that we translate the (as example
|
Sounds like a good idea, but would be harder to implement. |
I agree that the
file_2.md:
Here we don't get any warnings but we don't jump to the right place but in the original example we get:
so again warnings (there are more warnings here but they are not really relevant for the case here). As far as I can tell the problem arises from the fact that in GitHub the labels just have to be unique inside a file (and subsequent numbering is done just inside the file) whilst for doxygen the labels have to be globally unique. Also in doxygen it is possible that a label is referred outside the file where it is defined but with GitHub this is not the case. This is a hard problem, but probably an approach where the file name + path is incorporated into the label has to be chosen so that the labels remain unique over files and can be numbered inside the file. Note that the internal used labels have to be unique inside doxygen and that the translation from GitHub mangled labels to doxygen labels has to be done but that the translation the other way around is not necessary. Small example: example_small.tar.gz |
If I have understood you correctly, this is not quite right, see https://github.com/tim-gromeyer/stackoverflow/blob/tests/README.md
This is indeed a difficult problem. Too difficult for me to solve it, because of the complexity of doxygen. But I think you are right. With the combination of the relative path and the filename it should be possible.
|
Regarding the number problem I now also found a problem regarding a minus sign at the beginning, in fact the line:
translates into:
which is a correct translation, but doxygen does not like the
(Found by Fossies in the cppcheck 2.10.2 package) |
Why does doxygen not like the |
The
But doxygen does care about XHTML and HTML4 as doxygen documentation is normally not read on GitHub but in a browser. |
This might be true but the GitHub id mangling is HTML5 only. This should definitely be noted in the documentation.
Yes, that's true, but can you name me a browser that doesn't support HTML5. That was an issue 8 years or so ago, but not anymore. |
I think it is a matter of defining the purpose of
Then doxygen should not alter the anchors and links, support leading |
I think both, at least partially.
That sounds like a solution, a complex one.
That's exactly what I was trying to say! |
I think the purpose would be to: allow hand written GitHub pages to be (also) processed by doxygen. I explicitly left out the words "as-is".
I think in this way the labels will work in all of HTML4, HTML5 and XHTML. Not doing this will limit the usability quite a bit. |
This sounds like a good idea, and if I can help implement it, I'll give it a try. |
Based on the comment doxygen#9858 (comment): ``` take the # page / section / ... lines and translate (as done now under MARKDOWN_ID_STYLE=GITHUB) to a label. In case the resulting label starts with a digit or a - sign the generated label should be prepended by a reserved word (like autotoc_md). take the lines like [...](...) (as this is how GitHub refers to sections etc. where the part between ( and )is taken and when it starts with a # it is stripped away (might be more than one, I saw it once so far but I think this is actually a typo in the code but it works) and when the remaining string (i.e. the label) starts with a digit or a - sign the label should be prepended by the reserved word ``` this feature has been implemented.
Based on the ideas as given in #9858 (comment) github labels staring with a digit of |
As I mentioned in #9855, I don't like the default title ID. I also didn't want to use commands or give the title an ID as described here. That's why I created this PR to change the default title id. I have already tested it with HTML and PDF. It worked.