[XERCESJ-1781] Javadoc fixes in org.apache.xerces.impl #41

SingingBush · 2025-10-27T09:01:45Z

Fixes a few Javadoc comments on methods, but most of the work is on the HTML markup in the RegularExpression class.

src/org/apache/xerces/impl/xpath/regex/ParserForXMLSchema.java

src/org/apache/xerces/impl/xpath/regex/RegexParser.java

elharo · 2025-10-27T11:20:24Z

src/org/apache/xerces/impl/xpath/regex/RegularExpression.java

 *   <li>Character
 *     <dl>
- *       <dt class="REGEX"><kbd>.</kbd> (A period)
+ *       <dt><code>.</code> (A period)


not sure why class="REGEX" is removed here. That seems OK.

Custom CSS isn't ideal within Javadoc. The most common way Javadoc is viewed is directly within an IDE, where any referenced CSS class is ignored. Also, kbd was not the right choice for what's being documented, and changing to a code tag (the best option here) is likely to have affected any style related to the REGEX class.

In fact, looking at site.css, there isn't even a .REGEX class defined. I recommend ditching site.css entirely anyway, for the reasons above. Reading published Javadocs may have been useful until about twenty years ago, but any editor worth using will just render the Javadoc into a tooltip these days.

class attributes aren't just for CSS, nor is site.css the only css that can be applied to this.

I also don't think it's reasonable to assume that people only use certain IDEs to browse and render this. Search is a thing.

It would be good to know whether the class is actually used or is a hangover from the past. Even for generated HTML, a sensible DOM structure would be better than custom CSS rules. I'll take another look at the generated docs with and without the class name to see whether any CSS rules are being applied.

I've looked into this using browser dev tools on the generated HTML docs and can confirm that no CSS rules are applied for a REGEX class. I checked the dt element as well as its child elements. I also double-checked the build/docs/javadocs/xerces2/stylesheet.css file; there's no such class.

the class attribute is not just for CSS

We do not and cannot know all CSS stylesheets that might be applied to this

OK, I'll put the class name back on dt and, as per my other comment, plan to ditch the improper use of kbd.

elharo · 2025-10-27T11:21:30Z

src/org/apache/xerces/impl/xpath/regex/RegularExpression.java

-+ *       <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub></var><var>R<sub>2</sub></var><var>...</var><var>R<sub>n</sub></var><kbd>]</kbd> (without <a href="#COMMA_OPTION">"," option</a>)
-+ *       <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub></var><kbd>,</kbd><var>R<sub>2</sub></var><kbd>,</kbd><var>...</var><kbd>,</kbd><var>R<sub>n</sub></var><kbd>]</kbd> (with <a href="#COMMA_OPTION">"," option</a>)
+ *      <dt><code>[R1R2...Rn]</code> (without a {@link #SPECIAL_COMMA} option)</dt>
+ *      <dt><code>[R1,R2,...,Rn]</code> (with a {@link #SPECIAL_COMMA} option)</dt>


sub element should be OK

I can put it back but it doesn't look right. When making these changes I was viewing the output a lot. It's worth viewing the rendered output when reviewing these html changes.

The existing doc is not super-well-written but I think it does need to be clear that n is not literal and the subscript does that

I'll have a go at doing the markup with sub and see how it looks.

I took a quick look at this. The existing approach is ok for html shown in a browser:

(new at top, old at bottom in this image)

...but it doesn't work out so well within an IDE (old at top; this one shows a different line):

I'm happy to put some of this back if the HTML results are the main concern, but perhaps it's worth finding a compromise so that the markup is more readable from an IDE.

For example, by ditching the kbd tag (which is meant for keyboard input anyway), the result is much more readable and keeps the var and sub tags:

I've made a start on the approach suggested in the previous comment and pushed the work in progress. There's more to sort out, which hopefully will get done over the weekend.
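As a rough illustration of the compromise being discussed (drop kbd, use code for the literal regex syntax, keep var and sub for the placeholders, and keep class="REGEX" on dt), here is a hypothetical Javadoc sketch. It is not the markup that was eventually merged; the class name RegexDocMarkupSketch and the value given to SPECIAL_COMMA are placeholders chosen only for this example.

```java
/**
 * Hypothetical sketch of character-class documentation that stays readable
 * both in generated HTML and in IDE tooltips.
 *
 * <dl>
 *   <dt class="REGEX"><code>[</code><var>R<sub>1</sub></var><var>R<sub>2</sub></var>...<var>R<sub>n</sub></var><code>]</code>
 *       (without a {@link #SPECIAL_COMMA} option)</dt>
 *   <dd>Matches a character that is in any of the ranges
 *       <var>R<sub>1</sub></var> to <var>R<sub>n</sub></var>.</dd>
 *   <dt class="REGEX"><code>[</code><var>R<sub>1</sub></var><code>,</code><var>R<sub>2</sub></var><code>,</code>...<code>,</code><var>R<sub>n</sub></var><code>]</code>
 *       (with a {@link #SPECIAL_COMMA} option)</dt>
 *   <dd>The same class, with a comma separating the ranges.</dd>
 * </dl>
 */
public final class RegexDocMarkupSketch {

    /** Placeholder for the comma option flag; the value is an assumption. */
    public static final int SPECIAL_COMMA = 1 << 10;

    // Static holder only; no instances are needed.
    private RegexDocMarkupSketch() {
    }
}
```

Literal characters stay inside code, so they read as code in a tooltip, while the var and sub pair still marks n as a placeholder rather than a literal character.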

src/org/apache/xerces/impl/xpath/regex/RegularExpression.java

src/org/apache/xerces/impl/xpath/regex/REUtil.java

elharo · 2025-10-27T16:34:21Z

src/org/apache/xerces/impl/xpath/regex/RegularExpression.java

 *   <li>Character
 *     <dl>
- *       <dt class="REGEX"><kbd>.</kbd> (A period)
+ *       <dt><code>.</code> (A period)


class attributes aren't just for CSS, nor is site.css the only css that can be applied to this.

I also don't think it's reasonable to assume that people only use certain IDEs to browse and render this. Search is a thing.

elharo · 2025-10-27T16:40:05Z

src/org/apache/xerces/impl/xpath/regex/RegularExpression.java

+ *             <p>This range matches the character.</p>
+ *         </li>
+ *         <li><code>C1-C2</code>
+ *             <p>This range matches a character which has a code point that is >= <var>C1</var>'s code point and &lt;= <var>C2</var>'s code point.</p>


You're still using var here, which you took out in most other places.

I left it here because it rendered OK when I viewed it. In a lot of the places where var was used, it was within a section that really needed a code block, but because kbd was used the structure of the markup was getting pretty badly messed up. I'm happy to change it to code, though.

elharo · 2025-10-27T16:40:39Z

src/org/apache/xerces/impl/xpath/regex/RegularExpression.java

-+ *       <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub></var><var>R<sub>2</sub></var><var>...</var><var>R<sub>n</sub></var><kbd>]</kbd> (without <a href="#COMMA_OPTION">"," option</a>)
-+ *       <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub></var><kbd>,</kbd><var>R<sub>2</sub></var><kbd>,</kbd><var>...</var><kbd>,</kbd><var>R<sub>n</sub></var><kbd>]</kbd> (with <a href="#COMMA_OPTION">"," option</a>)
+ *      <dt><code>[R1R2...Rn]</code> (without a {@link #SPECIAL_COMMA} option)</dt>
+ *      <dt><code>[R1,R2,...,Rn]</code> (with a {@link #SPECIAL_COMMA} option)</dt>


The existing doc is not super-well-written but I think it does need to be clear that n is not literal and the subscript does that

src/org/apache/xerces/impl/xpath/regex/RegexParser.java

src/org/apache/xerces/impl/xpath/regex/RegularExpression.java

elharo · 2025-11-01T12:44:18Z

src/org/apache/xerces/impl/xpath/regex/RegularExpression.java

 *   <li>Character
 *     <dl>
- *       <dt class="REGEX"><kbd>.</kbd> (A period)
+ *       <dt><code>.</code> (A period)


the class attribute is not just for CSS

We do not and cannot know all CSS stylesheets that might be applied to this

SingingBush · 2025-11-02T11:19:33Z

It's worth taking a look at this now. For the comments that have a + at the start, I'm not sure whether whoever added them expected them to be effectively removed from the Javadoc, but they do get rendered. With that in mind, should they be removed altogether?
If they are not to be removed, then the + should be, as it messes up the generated doc (see the related lines under Character class).

elharo · 2025-11-02T14:04:08Z

src/org/apache/xerces/impl/xpath/regex/ParserForXMLSchema.java

-     * @param useNrage Ignored.
-     * @return This returns no NrageToken.
+     * @param useNrange ignored
+     * @return a {@link RangeToken}, returns no NRANGE token


delete "a {@link RangeToken},"

is "no NRANGE token" supposed to be one enum?

Based on the previous text being NrageToken, I rewrote it using the name of the int constant for that token type, Token.NRANGE.

deleting "a {@link RangeToken}," now

elharo · 2025-11-03T12:28:04Z

src/org/apache/xerces/impl/xpath/regex/RegularExpression.java

+ *  <ul>
+ *    <li><code>\ooo</code> (Octal character representations)</li>
+ *    <li><code>\G</code>, <code>\C</code>, <code>\lc</code></li>
+ *    <li><code>\ uc</code>, <code>\L</code>, <code>\U</code></li>


I think that's an extra space between \ and uc that shouldn't be there

Potentially; I thought the same when changing it from <kbd>\u005c u</kbd>. I don't think \uc is a thing, unless the c stands for a char, which should be represented as a hexadecimal value. I'll push a commit with it as \uc, which is better than the current situation.

Ah, so this was originally an encoded backslash. Probably the backslash didn't need to be encoded here, since this was not in a string literal, where it would need to be encoded.

seems that removing the space has broken the build:

[xjavac] Compiling 712 source files to /home/runner/work/xerces-j/xerces-j/build/classes
[xjavac] /home/runner/work/xerces-j/xerces-j/build/src/org/apache/xerces/impl/xpath/regex/RegularExpression.java:92: error: illegal unicode escape
[xjavac] * <li><code>\uc</code>, <code>\L</code>, <code>\U</code></li>
[xjavac] ^
[xjavac] /home/runner/work/xerces-j/xerces-j/build/src/org/apache/xerces/impl/xpath/regex/RegularExpression.java:73: error: illegal unicode escape
[xjavac] * <li><code>,</code> : The parser treats a comma in a character class as a range separator.
[xjavac] ^
[xjavac] 2 errors

can do <code>\u005cuc</code> to fix it

Yes, the original was correct. \u is recognized as the start of a Unicode escape at a very early stage by the Java lexical analyzer.

so I should put it back to <code>\ uc</code>?

No, \u005cuc is correct. The tokenizer will read that as \uc

Unicode escapes are processed before anything else happens.

cool, \u005cuc is in the current changes so running CI should be fine
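The point that Unicode escapes are translated before anything else, comments included, can be checked with a small self-contained Java sketch. The class and method names (UnicodeEscapeDemo, escapedLiteral) are hypothetical and not part of Xerces; the Javadoc list item mirrors the fixed line. It compiles because \u005c becomes a backslash before the comment or literal is tokenized, and a backslash produced by an escape cannot begin another escape.

```java
/**
 * Hypothetical demo of how Java translates Unicode escapes before it
 * tokenizes anything else, including comments like this one.
 *
 * <p>In the list item below, the six characters backslash-u005c become a
 * single backslash, and a backslash produced by an escape cannot begin
 * another escape, so the generated documentation shows the text
 * backslash, u, c.</p>
 *
 * <ul>
 *   <li><code>\u005cuc</code>, <code>\L</code>, <code>\U</code></li>
 * </ul>
 */
public class UnicodeEscapeDemo {

    /**
     * Returns a three-character string: a backslash followed by "uc".
     *
     * @return the decoded literal
     */
    public static String escapedLiteral() {
        // Each escaped backslash is translated to a raw backslash first; the
        // resulting pair is then read as the string escape for one backslash.
        // A single escaped backslash inside quotes would not compile, because
        // it would turn into a backslash-quote escape and leave the string open.
        return "\u005c\u005cuc";
    }

    public static void main(String[] args) {
        String s = escapedLiteral();
        System.out.println(s);          // a backslash followed by "uc"
        System.out.println(s.length()); // 3
    }
}
```

Writing a raw backslash followed by uc in the Javadoc instead of the escaped form reproduces the "illegal unicode escape" error from the CI log, even though the text sits inside a comment.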

elharo reviewed Oct 27, 2025

elharo requested changes Oct 27, 2025

elharo requested changes Oct 31, 2025

src/org/apache/xerces/impl/xpath/regex/RegexParser.java (outdated, resolved)

src/org/apache/xerces/impl/xpath/regex/RegexParser.java (outdated, resolved)

src/org/apache/xerces/impl/xpath/regex/RegularExpression.java (outdated, resolved)

SingingBush added 4 commits November 1, 2025 12:31

[XERCESJ-1781] Javadoc fixes in org.apache.xerces.impl

0a21cb3

[XERCESJ-1781] Javadoc fixes in org.apache.xerces.impl

472407f

[XERCESJ-1781] Javadoc fixes in org.apache.xerces.impl

97c23d5

[XERCESJ-1781] Javadoc fixes based on code review

08f1479

elharo requested changes Nov 1, 2025

[XERCESJ-1781] adding REGEX class back to dt and dd that had it

ca157c8

SingingBush force-pushed the javadoc/XERCES-1781-part-6 branch from 676bf71 to ca157c8 November 1, 2025 13:21

[XERCESJ-1781] adding some of the var and sub tags back

e058c16

SingingBush requested a review from elharo November 2, 2025 11:19

elharo reviewed Nov 2, 2025

[XERCESJ-1781] changes for code review

8720584

SingingBush requested a review from elharo November 2, 2025 15:39

elharo reviewed Nov 3, 2025

[XERCESJ-1781] changes for code review

e3010fb

SingingBush requested a review from elharo November 3, 2025 16:29

[XERCESJ-1781] changes for code review

00e8b40

elharo approved these changes Nov 5, 2025

elharo merged commit dcacb20 into apache:main Nov 5, 2025
4 checks passed

SingingBush deleted the javadoc/XERCES-1781-part-6 branch November 5, 2025 19:58
