fix for TIKA-3400 contributed by kamaci #441
  }
  //== is actually substantially faster than .equals(String)
- if (typeAtt.type() == UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL]) {
+ if (typeAtt.type().equals(UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL])) {
This was done out of a notional sense of efficiency. I'm not sure we need to change it.
So the parameter of TypeAttribute#setType can be exactly that String (UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL])?
This relies on Lucene not changing the underlying static strings: https://github.com/apache/lucene/blob/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/email/UAX29URLEmailTokenizer.java#L61
OK. I think they should have used an enum instead of a string array for such a thing 😊 I'll roll back those lines in my PR.
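To illustrate why the `==` check in this thread only works while both sides share the same static String instance, here is a minimal, self-contained sketch. `TokenTypes` and `TypeCheckDemo` are hypothetical stand-ins written for this example, not Lucene or Tika classes, and the string values are illustrative.

```java
// Hypothetical stand-in for a tokenizer that exposes its token types
// as a static String array (not the real Lucene class).
final class TokenTypes {
    static final int URL = 0;
    static final String[] TOKEN_TYPES = {"<URL>", "<EMAIL>"};

    private TokenTypes() {}
}

final class TypeCheckDemo {
    // Identity comparison: true only when `type` is the very same String object
    // that is stored in the static array.
    static boolean isUrlByIdentity(String type) {
        return type == TokenTypes.TOKEN_TYPES[TokenTypes.URL];
    }

    // Value comparison: true for any String with the same characters.
    static boolean isUrlByEquals(String type) {
        return TokenTypes.TOKEN_TYPES[TokenTypes.URL].equals(type);
    }

    public static void main(String[] args) {
        String shared = TokenTypes.TOKEN_TYPES[TokenTypes.URL];
        String copy = new String("<URL>"); // same content, distinct object

        System.out.println(isUrlByIdentity(shared)); // true
        System.out.println(isUrlByEquals(shared));   // true
        System.out.println(isUrlByIdentity(copy));   // false
        System.out.println(isUrlByEquals(copy));     // true
    }
}
```

The identity check is safe only as long as the producer keeps handing out the shared instance, which is the dependency on Lucene's static strings raised above.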
  }
- if ((gctxid != ExtendedGUID.nil() ||
+ if ((!gctxid.equals(ExtendedGUID.nil()) ||
Good catch! We should probably make a static constant ExtendedGUID.NIL to avoid unnecessary object creation.
To be clear, I'm not asking you to do the static thing on this issue. Your catch is important. Thank you!
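For reference, the static-constant idea could look roughly like the sketch below. `SketchGuid` is a hypothetical, simplified class invented for this example; its fields and constructor are assumptions and do not reflect the actual `ExtendedGUID` implementation in Tika.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical, simplified value class (not Tika's ExtendedGUID).
final class SketchGuid {
    // Shared nil instance, created once when the class is initialized.
    static final SketchGuid NIL = new SketchGuid(new UUID(0L, 0L), 0L);

    private final UUID guid;
    private final long n;

    SketchGuid(UUID guid, long n) {
        this.guid = Objects.requireNonNull(guid, "guid");
        this.n = n;
    }

    // Returns the shared constant instead of allocating a new object per call.
    static SketchGuid nil() {
        return NIL;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SketchGuid)) {
            return false; // also covers null
        }
        SketchGuid other = (SketchGuid) o;
        return n == other.n && guid.equals(other.guid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(guid, n);
    }
}

final class NilCheckDemo {
    // Value comparison stays correct even if the caller built its own nil-valued object.
    static boolean isNil(SketchGuid id) {
        return SketchGuid.NIL.equals(id);
    }

    public static void main(String[] args) {
        SketchGuid fresh = new SketchGuid(new UUID(0L, 0L), 0L);
        System.out.println(fresh != SketchGuid.NIL); // true: distinct objects
        System.out.println(isNil(fresh));            // true: equal by value
    }
}
```

Even with a shared constant, comparing with `equals` remains the safer choice, because nothing prevents other code from constructing a separate object with the same value.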
a8e0a52 to b294d8f
@tballison I've updated the PR. Checks fail due to
I was just able to replicate that in Java 11 on a Mac. Ubuntu with Java 8 passes... Ugh... I pushed a simple fix for now.
@tballison is there anything left to do for this PR?
No description provided.