Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Allow Most chars for variable Values #27

Merged

Conversation

tkruse
Copy link
Contributor

@tkruse tkruse commented Jun 27, 2018

Fix #25: allow any string except null as var value. XML-Unescaping is already done by the SAXParser, no code change needed. But allowing blanks for values required relaxing SystemTestDocReader. Identifiers could also be relaxed in the future that way (I see no reason beyond western-thinking esthetics to restrict to ascii Java word chars), but that is not so urgent for me, programmers should use ASCII for actual identifiers.

  • TODO Fix failing tests, add tests for newly valid non-word characters (blank, japanese, special)

@@ -77,5 +100,6 @@ public static void assertPropertyIdentifiers( Collection<String> properties) thr
}

private static final Pattern identifierRegex_ = Pattern.compile( "[\\w\\-]+");
private static final Pattern varValueRegex_ = Pattern.compile( "[^\\p{Cntrl}]*");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoiding chars like newline, escape, bell. Those could be encoded in XML1.1, but the SAX parser fails to recognize them.

@@ -194,8 +202,7 @@ public String toIdPath( String attributeName, String attributeValue) throws SAXE
*/
public String getAttribute( Attributes attributes, String attributeName)
{
String value = attributes.getValue( attributeName);
return StringUtils.isBlank( value)? null : value;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a single whitespace seems like an important boundary test case value to me, for testing various functions.

@@ -125,7 +128,7 @@ public void writeAttribute( String name, String value)
print( " ");
print( name);
print( "=\"");
print( value);
print( StringEscapeUtils.escapeXml10(value));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about all the differences between XML 1.0 and 1.1. This one suggests to go with 1.0 when in doubt: https://stackoverflow.com/questions/6260975/should-i-learn-xml-1-0-or-xml-1-1

@tkruse tkruse force-pushed the feature/values-not-identifiers branch from 8790e78 to c8a1cb9 Compare June 27, 2018 05:26
@tkruse tkruse force-pushed the feature/values-not-identifiers branch from c8a1cb9 to 296203c Compare June 29, 2018 06:52
@kerrykimbrough
Copy link
Contributor

These changes seem basically OK. But I don't see any new unit tests to verify them.

@tkruse
Copy link
Contributor Author

tkruse commented Jun 30, 2018

Yes, new tests need to be added, my free time is almost over, but I can work on adding tests next week.

@tkruse tkruse force-pushed the feature/values-not-identifiers branch from fd3aa83 to f041e0a Compare June 30, 2018 06:43
@tkruse
Copy link
Contributor Author

tkruse commented Jun 30, 2018

I have added more tests, and an equivalent change to SystemInputDefReader. Also I allowed whitespace and some control characters additionally, as they can be represented in XML1.1.

Some control characters cannot be parsed with the JDK parser however (such as newline I think), so users would need to add a more XML1.1 compliant parser on the classpath, such as in maven adding simply:

<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.12.0</version>
  <scope>runtime</scope>
</dependency>

This replaces the SAXParser implementation automatically. I manually added to tests and saw that additional control character sequences can be parsed that way. But I did not add tests for that since tcases probably should not depend on xerces.

For other identifiers such as System and function name, I now think it would be preferable to allow international character sets, but that's more of a nuisance than a blocker, so up to you.

For properties I wish I could use some more characters like {[]}=%#, to auto-generate property names that do not clash with user-properties easily, and look decent. Not sure if you are willing to negotiate about that.

@tkruse tkruse force-pushed the feature/values-not-identifiers branch from f041e0a to 2d685f8 Compare June 30, 2018 06:52
@kerrykimbrough
Copy link
Contributor

For other identifiers such as System and function name, I now think it would be preferable to allow international character sets

I agree -- see new issue #30. This has to be done carefully, but I will investigate soon.

@kerrykimbrough
Copy link
Contributor

For properties I wish I could use some more characters like {[]}=%#, to auto-generate property names

For now, perhaps prefixing with something like "__" might be a reasonable convention.

@@ -125,7 +128,7 @@ public void writeAttribute( String name, String value)
print( " ");
print( name);
print( "=\"");
print( value);
print( NumericEntityEscaper.below(0x20).translate(StringEscapeUtils.escapeXml11(value)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a comment to explain this translation. For example, why are both NumericEntityEscaper and StringEscapeUtils required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment

@tkruse tkruse force-pushed the feature/values-not-identifiers branch from 033e8bc to 4aee214 Compare July 1, 2018 14:04
@kerrykimbrough kerrykimbrough merged commit 0871938 into Cornutum:master Jul 1, 2018
@tkruse tkruse deleted the feature/values-not-identifiers branch July 8, 2018 00:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants