Proposed fix for WW-4971 (broken includes for non-UTF8 content). #257
try {
    if (utf8_String == null) {
        utf8_String = new String(utf16_String.getBytes("UTF-8"), "UTF-8");
You could use public String(byte bytes[], Charset charset) with java.nio.charset.StandardCharsets.
According to the Javadoc, "These charsets are guaranteed to be available on every implementation of the Java platform."
Benefit: no need to handle UnsupportedEncodingException.
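A minimal sketch of the difference between the two String constructor overloads, assuming nothing beyond the JDK. The class and method names here are hypothetical and are not taken from the PR.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {

    // Charset-name overload: callers must handle a checked exception,
    // even though "UTF-8" is always available.
    static String viaCharsetName(String s) throws UnsupportedEncodingException {
        return new String(s.getBytes("UTF-8"), "UTF-8");
    }

    // Charset-object overload: no checked exception to declare or catch.
    static String viaStandardCharsets(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String sample = "h\u00e9llo";
        // Both overloads produce the same round-tripped text.
        System.out.println(viaCharsetName(sample).equals(viaStandardCharsets(sample)));
    }
}
```

Because the second variant throws no checked exception, its result can be assigned directly in a field initializer instead of inside a try block.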
Fair point, as using the static StandardCharsets would allow for immediate assignment in the initializer block. That would permit elimination of the setUp() method.
@@ -41,12 +41,15 @@
/** The URL extension to use to determine if the request is meant for a Struts action */
public static final String STRUTS_ACTION_EXTENSION = "struts.action.extension";

/** Comma separated list of patterns (java.util.regex.Pattern) to be excluded from Struts2-processing */
Would it be possible to separate those formatting and "fix typo" changes into one separate commit?
This would simplify the review.
The "fix typo" changes were minimal enough that I figured it would be OK. It might be possible to separate those changes out, but I'm pretty new to Git. Presumably that would involve a new commit/push/pull request sequence (after trying to undo the formatting fixes)?
The following steps will do it.
# reset last commit
git reset HEAD~
# see your changes unstaged
git status
# partially add formatting and typo changes
# see for example https://johnkary.net/blog/git-add-p-the-most-powerful-git-feature-youre-not-using-yet/
git add -p .
git commit -m"Fix formatting and some typos"
# partially add main changes
git add -p .
git commit -m"WW-4971 Fix broken includes for non-UTF8 content"
# check that all wanted changes are included
git status
# log should contain two new commits
git log
# http://weiqingtoh.github.io/force-with-lease/
git push --force-with-lease
Hint: If anything goes wrong or is unclear, don't push; re-pull from GitHub instead.
I wrote instructions on how to use GitHub; let me know if anything is unclear and I will be happy to improve them :)
https://struts.apache.org/submitting-patches.html#contributing-with-github
Thank you for the helpful Git instructions, @sepe81 and @lukaszlenart. The instructions on the Struts website seem to be pretty clear for "normal" contribution usage, and @sepe81's instructions here worked well for breaking up the changes into two commits (something well beyond my Git skills without such help).
I ended up doing them in the reverse of the order he suggested (the 1st commit is for the proposed fix only, the 2nd commit overlays the typo fixes on top of that one), so hopefully that will be OK. :) :)
Let me know if it worked out; if not, I'll try again. This time around the 1st commit also includes some of the changes suggested in the comments, so hopefully it's a bit cleaner.
    }
}

protected void tearDown() throws Exception {
No need for an empty override of tearDown.
The tearDown() could be eliminated along with setUp(), in relation to your earlier comments.
protected void setUp() throws Exception {
    super.setUp();
No need for the super call because the implementation in the abstract superclass is empty.
  protected String value;
  private HttpServletRequest req;
  private HttpServletResponse res;
- private static String defaultEncoding;
+ private String defaultEncoding; // Made non-static (during WW-4971 fix)
IMO the comments shouldn't contain too many of those ticket references. That is what the git history is for.
I think there are a few spots in the source that refer to tickets. After seeing that, it seemed like a good idea to identify the ticket for the change via a comment. If the Git history is considered sufficient or preferred, those comment references could be eliminated.
Yes, this is a good practice and if you feel it is needed that's ok - but we can also depend on git history - your choice :)
public Include(ValueStack stack, HttpServletRequest req, HttpServletResponse res) {
    super(stack);
    this.req = req;
    this.res = res;
    defaultEncoding = "UTF-8"; // Set UTF-8 for defaultEncoding, when not set in configuration
Why override defaultEncoding in the constructor? It will be set in setDefaultEncoding.
Just a case of being extra defensive (in case the setDefaultEncoding() injection parameter was ever changed to optional). It's probably unlikely to occur, in which case that assignment in the constructor could be eliminated.
@JCgH4164838Gh792C124B5 I've run your tests against an old version of Struts and all of them succeed. Can you include a test that fails with the current version?
Hello @aleksandr-m. Unfortunately the only way I could find to demonstrate the failure was interactively (with a parent page in ISO-8859-1 using s:include on a child page with ISO-8859-1 high-ASCII accented characters). The unit tests that were added were primarily to demonstrate the proposed changes to The Unfortunately I couldn't figure out a mock test that would pass with the proposed fix but fail in previous versions. However, I was able to demonstrate it interactively (truncation on older 2.5.x, but success using the appropriate encoding override with the fixed version). Do you think there's another angle we could use in this scenario?
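The underlying mismatch can be sketched outside of Struts. The example below is only an illustration, with a hypothetical class name: it shows what happens when ISO-8859-1 bytes containing an accented character are decoded as UTF-8. It does not reproduce the s:include truncation itself, which depends on how Struts decodes its buffer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {

    // Decode raw bytes with the given charset; malformed input is replaced
    // with U+FFFD by the String constructor rather than raising an error.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café" as an ISO-8859-1 page would emit it: the é is the single byte 0xE9.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        String wrong = decode(latin1, StandardCharsets.UTF_8);      // 0xE9 alone is not valid UTF-8
        String right = decode(latin1, StandardCharsets.ISO_8859_1); // matches the source encoding

        System.out.println(wrong.equals("caf\uFFFD")); // true: accented character lost
        System.out.println(right.equals("caf\u00e9")); // true: text preserved
    }
}
```

Decoding with the same charset that produced the bytes preserves the text, which is the property the encoding override is meant to restore.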
@JCgH4164838Gh792C124B5 Cannot reproduce it. Parent and included pages have
Can you create a simple app demonstrating this issue?
Hello @aleksandr-m.
6d16cb9 to c7a1ea9
@JCgH4164838Gh792C124B5 Thanks. It helped. In my previous setup there was an include inside an include inside an include, and the topmost page was still UTF-8. :) So it is the parent JSP's encoding that matters. About the fix: why can't we use the response encoding each time to determine the correct encoding? I.e.
Hi @aleksandr-m, I'm glad the reproducer helped. :) The proposed fix attempts to preserve the current 2.5.x behaviour (using defaultEncoding all the time), to avoid accidentally impacting existing applications. Some apps might be inadvertently impacted if the current default behaviour were to change, so a flag to turn on "use response encoding" seemed reasonable. Even with the new option and the old default behaviour, there might be situations where the encoding needs to change for only a single s:include tag (or a small number of them), for example when multiple encodings are involved in source and display, so a tag-level override provides a facility to do that. The checks needed to support the 2 new options while preserving the old behaviour by default shouldn't be too expensive (and there should be a minor efficiency gain from the reduction in loop condition checks). Anyway, that was pretty much the thought process involved and where the idea for the two different options came from. Thanks for taking the time to look into and consider the proposed fix, and let me know if the above explanation seems reasonable.
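The selection logic described above could be sketched roughly as follows. This is not the code from the PR: the class, method, and parameter names are hypothetical, and the precedence (tag-level override first, then response encoding when the flag is on, then defaultEncoding) is an assumption based on the description.

```java
final class IncludeEncodingResolver {

    // Hypothetical helper: picks the encoding used to decode included content.
    // Assumed precedence: tag override > response encoding (if enabled) > default.
    static String resolve(String tagOverride,
                          boolean useResponseEncoding,
                          String responseEncoding,
                          String defaultEncoding) {
        if (tagOverride != null && !tagOverride.isEmpty()) {
            return tagOverride;        // per-tag override wins
        }
        if (useResponseEncoding && responseEncoding != null && !responseEncoding.isEmpty()) {
            return responseEncoding;   // opt-in: follow the page's response encoding
        }
        return defaultEncoding;        // legacy behaviour, unchanged by default
    }

    public static void main(String[] args) {
        // Flag off, no override: existing applications keep the default encoding.
        System.out.println(resolve(null, false, "ISO-8859-1", "UTF-8"));
        // Flag on: the response encoding of the parent page is used.
        System.out.println(resolve(null, true, "ISO-8859-1", "UTF-8"));
        // Tag-level override takes priority over both.
        System.out.println(resolve("windows-1252", true, "ISO-8859-1", "UTF-8"));
    }
}
```

With the flag off and no override, the result is always the default encoding, which is how the existing 2.5.x behaviour would be preserved in this sketch.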
It is the encoding of the response that matters. I don't see the point in having a different encoding in the
It is 2.6, so we can break some things. :)
Yeah... it would be better to target 2.6 (the
Hello again.
OK, looks good to me. We can release one more version of 2.5.x :)
@JCgH4164838Gh792C124B5 Would it be possible to include some parts of your reasoning in the commit message? IMHO this would be the best place for later reference. @lukaszlenart Not sure how you handle this for the project.
@sepe81 Not sure what you mean by that. Right now we are supporting the 2.3 and 2.5 branches and working on 2.6. Porting those changes into 2.6 is a matter of cherry-picking those two commits.
@lukaszlenart I don't mean the cherry-picking. By "include some parts of your reasoning in the commit message" I mean enriching the implicit documentation of each commit, as described under "7. Use the body to explain what and why vs. how".
Ach... you meant JCgH4164838Gh792C124B5's comment. It would be better to put it in the issue on JIRA, but it can stay here as well. There is a direct reference in the issue back to this PR.
Well, we can merge this into the 2.5.x branch, but that means additional work in 2.6 if we want to do it properly: deprecation of the constant and the tag attribute, etc. As I see it, it is just a bug that needs to be fixed. Using a different encoding than the one in the HTTP response seems meaningless and leads to errors.
@JCgH4164838Gh792C124B5 Can you think of any real-world scenarios where it is applicable?
Hi @aleksandr-m and @lukaszlenart.
The above strategy might be sufficient to mitigate the concerns @aleksandr-m outlined, while still providing a relatively safe and flexible solution for both 2.5.x and 2.6 users. If you think it's reasonable then I will go ahead and attempt the steps outlined in 1) above. Please let me know what you think.
@JCgH4164838Gh792C124B5 I think that is better. Maybe we should rename
…retains existing behaviour by default, but provides a configuration flag users can set (to true) in order to enable usage of response (page) encoding for s:include tags. A WARN level log output is also generated for failed FastByteArrayOutputStream decoding (can be suppressed by log configuration).
… non-UTF8 content).
c7a1ea9 to 7bca982
Hello yet again. :)
I think we are good to merge this, right?
Fix WW-4971 (broken includes for non-UTF8 content).
osm! Do you plan to port this branch into
@lukaszlenart Already done. :) Cherry-picked it to master and changed
@aleksandr-m osm!!!
Adds a config flag to make all s:include tags use response page encoding.
Adds an encoding override property to the s:include tag (flexibility).
Adds a warning log output to FastByteArrayBuffer when decoding has errors.
Fixes WW-4971
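
The warning-on-decode-failure idea in the summary above can be sketched with the JDK's CharsetDecoder. This is a hedged illustration, not the Struts implementation: the class and method names are hypothetical, and java.util.logging stands in for whatever logging framework the project actually uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.logging.Logger;

final class StrictDecodeSketch {

    private static final Logger LOG = Logger.getLogger(StrictDecodeSketch.class.getName());

    // Decodes strictly: malformed or unmappable input is reported as an
    // exception instead of being silently replaced, so it can be logged.
    static Optional<String> decodeOrWarn(byte[] bytes, Charset charset) {
        try {
            String text = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
            return Optional.of(text);
        } catch (CharacterCodingException e) {
            LOG.warning("Failed to decode " + bytes.length + " bytes as " + charset + ": " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeOrWarn(latin1, StandardCharsets.UTF_8).isPresent());      // false, warning logged
        System.out.println(decodeOrWarn(latin1, StandardCharsets.ISO_8859_1).isPresent()); // true
    }
}
```

As with any logger output, the warning can be suppressed through log configuration if it proves too noisy.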