-
Notifications
You must be signed in to change notification settings - Fork 12
[XERCESJ-1781] Javadoc fixes in org.apache.xerces.impl #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| * <li>Character | ||
| * <dl> | ||
| * <dt class="REGEX"><kbd>.</kbd> (A period) | ||
| * <dt><code>.</code> (A period) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why class="REGEX" is removed here. That seems OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Custom CSS isn't ideal within javadoc. The most common way that javadoc is displayed is directly within an IDE where any referenced CSS class will not be used. Also, kbd was not the right choice for the context of what's being documented and changing to a code tag (which is the best option here) is likely to have affected the style that relates to the REGEX class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in fact, looking at site.css there isn't even a .REGEX class defined. I recommend ditching site.css entirely anyway for the reasons above. Perhaps reading published javadocs was useful up until about twenty years ago but any editor worth using will just render the javadoc into a tooltip these days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class attributes aren't just for CSS, nor is site.css the only css that can be applied to this.
I also don't think it's reasonable to assume that people only use certain IDEs to browse and render this. Search is a thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to know if the class is actually used or a hangover from the past. Even for generated html, a sensible dom structure would be better than custom css rules. I'll take another look at the generated docs with & without the classname to see if there's any css rules being applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked into this using browser dev tools on the generated html docs and can confirm that there's no css rules being brought in for a REGEX class. I checked on the dt element as well as child elements. I also double checked the build/docs/javadocs/xerces2/stylesheet.css file, there's no such class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the class attribute is not just for CSS
- We do not and cannot know all CSS stylesheets that might be applied to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll put class name back on dt and as per other comment plan to ditch improper use of kbd
| + * <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub></var><var>R<sub>2</sub></var><var>...</var><var>R<sub>n</sub></var><kbd>]</kbd> (without <a href="#COMMA_OPTION">"," option</a>) | ||
| + * <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub></var><kbd>,</kbd><var>R<sub>2</sub></var><kbd>,</kbd><var>...</var><kbd>,</kbd><var>R<sub>n</sub></var><kbd>]</kbd> (with <a href="#COMMA_OPTION">"," option</a>) | ||
| + * <dt><code>[R1R2...Rn]</code> (without a {@link #SPECIAL_COMMA} option)</dt> | ||
| + * <dt><code>[R1,R2,...,Rn]</code> (with a {@link #SPECIAL_COMMA} option)</dt> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sub element should be OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can put it back but it doesn't look right. When making these changes I was viewing the output a lot. It's worth viewing the rendered output when reviewing these html changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing doc is not super-well-written but I think it does need to be clear that n is not literal and the subscript does that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a go at doing the markup with sub and see how it looks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick look at this. The existing approach is ok for html shown in a browser:
(new at top, old at bottom in this image)
...but doesn't work out so well within an IDE (old at top - (different line)):

I am happy to put some of this stuff back if the html results are the main concern but perhaps it's worth finding a compromise here so that the markup is more readable from an IDE.
For example, by ditching the kbd tag (which is supposed to be for keyboard input anyway), the result is much more readable and retains the var and sub:

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a start on the approach suggested in prev comment and pushed work in progress. There's some more to sort out which hopefully will get done over the weekend
| * <li>Character | ||
| * <dl> | ||
| * <dt class="REGEX"><kbd>.</kbd> (A period) | ||
| * <dt><code>.</code> (A period) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class attributes aren't just for CSS, nor is site.css the only css that can be applied to this.
I also don't think it's reasonable to assume that people only use certain IDEs to browse and render this. Search is a thing.
| * <p>This range matches the character.</p> | ||
| * </li> | ||
| * <li><code>C1-C2</code> | ||
| * <p>This range matches a character which has a code point that is >= <var>C1</var>'s code point and <= <var>C2</var>'s code point.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're still using var here, which you took out most other places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it here as it rendered ok when viewing it. A lot of the places where var was used it was within a section that really needed a code block but as kbd was used the structure of the markup was getting messed up pretty bad. I am happy to change to code though.
| + * <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub></var><var>R<sub>2</sub></var><var>...</var><var>R<sub>n</sub></var><kbd>]</kbd> (without <a href="#COMMA_OPTION">"," option</a>) | ||
| + * <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub></var><kbd>,</kbd><var>R<sub>2</sub></var><kbd>,</kbd><var>...</var><kbd>,</kbd><var>R<sub>n</sub></var><kbd>]</kbd> (with <a href="#COMMA_OPTION">"," option</a>) | ||
| + * <dt><code>[R1R2...Rn]</code> (without a {@link #SPECIAL_COMMA} option)</dt> | ||
| + * <dt><code>[R1,R2,...,Rn]</code> (with a {@link #SPECIAL_COMMA} option)</dt> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing doc is not super-well-written but I think it does need to be clear that n is not literal and the subscript does that
| * <li>Character | ||
| * <dl> | ||
| * <dt class="REGEX"><kbd>.</kbd> (A period) | ||
| * <dt><code>.</code> (A period) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the class attribute is not just for CSS
- We do not and cannot know all CSS stylesheets that might be applied to this
676bf71 to
ca157c8
Compare
|
it's worth taking a look at this now. For the comments that have a |
| * @param useNrage Ignored. | ||
| * @return This returns no NrageToken. | ||
| * @param useNrange ignored | ||
| * @return a {@link RangeToken}, returns no NRANGE token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete "a {@link RangeToken},"
is "no NRANGE token" supposed to be one enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on the previous text being NrageToken I just rewrote it using the naming for the int representation of token in Token.NRANGE.
deleting "a {@link RangeToken}," now
| * <ul> | ||
| * <li><code>\ooo</code> (Octal character representations)</li> | ||
| * <li><code>\G</code>, <code>\C</code>, <code>\lc</code></li> | ||
| * <li><code>\ uc</code>, <code>\L</code>, <code>\U</code></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's an extra space between \ and uc that shouldn't be there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentially, I thought the same when changing from <kbd>\u005c u</kbd>. I don't think that \uc is a thing unless the c is meaning char which should be represented as hexadecimal value. I'll push a commit with it being \uc which is better than the current situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so this was an originally an encoded backslash. Probably the backslash didn't need to be encoded here since this was not in a string literal, where it would need to be encoded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems that removing the space has broken the build:
[xjavac] Compiling 712 source files to /home/runner/work/xerces-j/xerces-j/build/classes
[xjavac] /home/runner/work/xerces-j/xerces-j/build/src/org/apache/xerces/impl/xpath/regex/RegularExpression.java:92: error: illegal unicode escape
[xjavac] * <li><code>\uc</code>, <code>\L</code>, <code>\U</code></li>
[xjavac] ^
[xjavac] /home/runner/work/xerces-j/xerces-j/build/src/org/apache/xerces/impl/xpath/regex/RegularExpression.java:73: error: illegal unicode escape
[xjavac] * <li><code>,</code> : The parser treats a comma in a character class as a range separator.
[xjavac] ^
[xjavac] 2 errors
can do <code>\u005cuc</code> to fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the original was correct. \u is recognized as the start of a Unicode escape at a very early stage by the Java lexical analyzer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I should put it back to <code>\ uc</code>?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, \u005cuc is correct. The tokenizer will read that as \uc
Unicode escapes are processed before anything else happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, \u005cuc is in the current changes so running CI should be fine
fix a few javadoc comments for functions but most of the work is the html markup on the RegularExpression class.