Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed fix for WW-4971 (broken includes for non-UTF8 content). #257

Merged

Conversation

JCgH4164838Gh792C124B5
Copy link
Contributor

@JCgH4164838Gh792C124B5 JCgH4164838Gh792C124B5 commented Oct 19, 2018

Adds a config flag to make all s:include tags use response page encoding.
Adds an encoding override property to the s:include tag (flexibility).
Adds a warning log output to FastByteArrayBuffer when decoding has errors.

Fixes WW-4971


try {
if (utf8_String == null) {
utf8_String = new String(utf16_String.getBytes("UTF-8"), "UTF-8");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use public String(byte bytes[], Charset charset) with java.nio.charset.StandardCharsets.

According to the Javadoc "These charsets are guaranteed to be available on every implementation of the Java platform."

Benefit: No need to handle UnsupportedEncodingException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, as using the static StandardCharsets would allow for immediate assignment in the initializer block. That would permit elimination of the setup() method.

@@ -41,12 +41,15 @@
/** The URL extension to use to determine if the request is meant for a Struts action */
public static final String STRUTS_ACTION_EXTENSION = "struts.action.extension";

/** Comma separated list of patterns (java.util.regex.Pattern) to be excluded from Struts2-processing */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to separate those formatting and "fix typo" changes into one separate commit?

This would simplify the review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "fix typo" changes were minimal enough that I figured it would be OK. It might be possible to separate those changes out, but I'm pretty new to Git. Presumably that would involve a new commit/push/pull request sequence (after trying to undo the formatting fixes) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following steps will do it.

# reset last commit
git reset HEAD~

# see your changes unstaged
git status

# partially add formatting and typo changes
# see for example https://johnkary.net/blog/git-add-p-the-most-powerful-git-feature-youre-not-using-yet/
git add -p .
git commit -m"Fix formatting and some typos"

# partially add main changes
git add -p .
git commit -m"WW-4971 Fix broken includes for non-UTF8 content"

# check that all wanted changes are included
git status

# log should contain two new commits
git log

# http://weiqingtoh.github.io/force-with-lease/
git push --force-with-lease

Copy link
Contributor

@sepe81 sepe81 Oct 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hint: If anything goes wrong or is unclear, don't do the push and instead repull from github.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote an instruction how to use GitHub, let me know if anything is unclear and I will be happy to improve it :)
https://struts.apache.org/submitting-patches.html#contributing-with-github

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the helpful Git instructions, @sepe81 and @lukaszlenart. The instructions on the Struts website seem to be pretty clear for "normal" contribution usage, and @sepe81 's instructions here seemed to work well for breaking up the changes into two sections (something well beyond my Git skills without such help).
I ended up doing them in the reverse order he suggested (1st commit was for the proposed fix only, 2nd commit overlays typo fixes on top of that one), so hopefully that will be OK. :) :)
Let me know if it worked out, if not I'll try again (this time around the 1st commit also includes some of the suggested changes from comments, hopefully it's a bit cleaner).

}
}

protected void tearDown() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for empty override of tearDown

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tearDown() could be eliminated along with setup() in relation to your earlier comments.



protected void setUp() throws Exception {
super.setUp();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for super call b/c impl in abstract super class is emtpy


protected String value;
private HttpServletRequest req;
private HttpServletResponse res;
private static String defaultEncoding;
private String defaultEncoding; // Made non-static (during WW-4971 fix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the comments shouldn't contain to many of those ticket references. Therefore we have the git history.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a few spots in the source that refer to ticket references, After seeing that, it seemed like a good idea to identify the ticket reference for the change via comment. If the Git history is considered sufficient/preferred those comment references could be eliminated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good practice and if you feel it is needed that's ok - but we can also depend on git history - your choice :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good practice and if you feel it is needed that's ok - but we can also depend on git history - your choice :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good practice and if you feel it is needed that's ok - but we can also depend on git history - your choice :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good practice and if you feel it is needed that's ok - but we can also depend on git history - your choice :)

@coveralls
Copy link

coveralls commented Oct 19, 2018

Coverage Status

Coverage increased (+0.02%) to 46.841% when pulling 7bca982 on JCgH4164838Gh792C124B5:localS2_2_5_x_Branch into f4d7dbe on apache:struts-2-5-x.


public Include(ValueStack stack, HttpServletRequest req, HttpServletResponse res) {
super(stack);
this.req = req;
this.res = res;
defaultEncoding = "UTF-8"; // Set UTF-8 for defaultEncoding, when not set in configuration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to override defaultEncoding in constructor? It will be set in setDefaultEncoding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a case of being extra defensive (in case setDefaultEncoding() injection parameter was ever changed to optional). It's probably unlikely to occur, in which case that assignment in the constructor could be eliminated.

@aleksandr-m
Copy link
Contributor

@JCgH4164838Gh792C124B5 I've run your tests in old version of Struts and all of them succeed. Can you include some test that fails with current version?

@JCgH4164838Gh792C124B5
Copy link
Contributor Author

@JCgH4164838Gh792C124B5 I've run your tests in old version of Struts and all of them succeed. Can you include some test that fails with current version?

Hello @aleksandr-m. Unfortunately the only way I could find to demonstrate the failure was interactively (with a parent page in ISO-8859-1 using s:include on a child page with ISO-8859-1 high-ASCII accented characters). The unit tests that were added were primarily to demonstrate the proposed changes to Include and FastByteArrayBuffer don't break any existing functionality.

The FastByteArrayBufferTest.testISO8859_1WriteToUTF8 unit test demonstrates the truncation failure that can occur when input is considered "isMalformed" by the decoder, and causes the log warning to be produced now in that circumstance. Although the test will pass in older Struts 2.5.x or 2.3.x versions the failure is "silent" in the older versions (can only be noticed by noting the result is truncated visually in the case of an s:include for instance).

Unfortunately I couldn't figure out a mock-test that would pass with the proposed fix, but fail in previous versions. However I was able to demonstrate it interactively (truncation on older 2.5.x, but success using the appropriate encoding override with the fixed version). Do you think there's another angle we could use in this scenario ?

@aleksandr-m
Copy link
Contributor

aleksandr-m commented Oct 20, 2018

@JCgH4164838Gh792C124B5 Cannot reproduce it. Parent and included pages have <%@ page contentType="text/html; charset=ISO-8859-1"%> and content of included page which is below is fully rendered in resulting page.

aaa
ççç
zzz

Can you create a some simple app demonstrating this issue.

@JCgH4164838Gh792C124B5
Copy link
Contributor Author

Hello @aleksandr-m.
I've attempted to create a simple reproducer app (WAR) demonstrating this issue using Struts 2.5.18. For some reason Github doesn't like it as an attachment (even with a .zip extension), so the reproducer was added to the WW-4791 Jira issue instead.
In my case the reproducer demonstrates the issue in this environment: Windows 10 (64-bit), Java 1.7.0_79 (32-bit), Tomcat 7.0.91. Hopefully it will demonstrate the issue on other environments as well, please let me know how it behaves for you.

@aleksandr-m
Copy link
Contributor

@JCgH4164838Gh792C124B5 Thanks. It helped. In my previous setup there was include inside include inside include and the top most was still with utf-8. :) So it is the parent jsp's encoding which matters.

About the fix: Why can't we use response encoding each time to determine correct encoding? I.e. res.getCharacterEncoding() and fall back to defaultEncoding / systemEncoding.

@JCgH4164838Gh792C124B5
Copy link
Contributor Author

Hi @aleksandr-m, I'm glad the reproducer helped. :)

The proposed fix attempts to preserve the current 2.5.x behaviour (using defaultEncoding all the time), to avoid accidentally impacting existing applications. Some apps might be inadvertently impacted if the current default behaviour was to change, so a flag to turn on "use response encoding" seemed reasonable. Even with the new option and the old default behaviour, there might be situations where there's a need to only change the behaviour of a single s:include (or a small number) of s:include tag encodings (with multiple encodings involved in source and display), so a tag-level override provides a facility to do that. The logic checks to allow the 2 new options, as well as preserve the old behaviour by default shouldn't be too expensive (and there should be a minor efficiency gain from the reduction in loop condition checks).

Anyway, that was pretty much the thought process involved and where the idea for the two different options came from. Thanks for taking the time to look into and consider the proposed fix, and let me know if the above explanation seems reasonable.

@aleksandr-m
Copy link
Contributor

It is the encoding of response that matters. I don't see point in having different encoding in the s:include. Is there any?

The proposed fix attempts to preserve the current 2.5.x behaviour

It is 2.6 we can break some things. :)

@lukaszlenart
Copy link
Member

Yeah... it would be better to target 2.6 (the master branch) instead of 2.5.x

@JCgH4164838Gh792C124B5
Copy link
Contributor Author

Hello again.
Having an override available in the s:include tag provides targeted flexibility (sometimes you can encounter weird encoding/decoding scenarios with source/destination and a localized override can get things to "just work" for such outlier cases).
As far as 2.5.x vs. 2.6 consideration, applications using 2.5.x won't necessarily be able to move to 2.6.x right away due to various factors. Since the current (non-patched) behaviour of 2.5.18 (and I believe 2.3.36) for s:include isn't working in the identified scenario I think there's value in fixing it in 2.5.x at least. :)
The proposed s:include fix (or variant thereof) could be applied to 2.5.x (2.5.19) and either forward-ported to 2.6 master as-is or as a "stripped-down" version (apply response encoding if available/valid, then fallback to default encoding, then fallback to system encoding).
Thanks for your consideration.

@lukaszlenart
Copy link
Member

Ok, looks good to me. We can release one more version of 2.5.x :)

@sepe81
Copy link
Contributor

sepe81 commented Oct 23, 2018

@JCgH4164838Gh792C124B5 Would it be possible to include some parts of your reasoning into the commit comment? IMHO this would be the best place for later reference.

@lukaszlenart Not sure how you handle this for the project.

@lukaszlenart
Copy link
Member

@sepe81 not sure what you mean by that? Right now we are supporting 2.3 and 2.5 branch and working on 2.6. Porting those changes into 2.6 is matter of cherry-picking those two commits.

@sepe81
Copy link
Contributor

sepe81 commented Oct 23, 2018

@lukaszlenart I don't mean the cherry-picking. With "include some parts of your reasoning into the commit comment" I intend to enrich the implicit documentation of each commit as to be found under 7. Use the body to explain what and why vs. how.

@lukaszlenart
Copy link
Member

Ach... you meant JCgH4164838Gh792C124B5's comment - it would be better to put it in the issue on JIRA but it can stay here as well. There is a direct reference in the issue back to this PR.

https://issues.apache.org/jira/browse/WW-4971

@aleksandr-m
Copy link
Contributor

Well, we can merge this into 2.5.x branch but that means additional work in 2.6, if we want to make it properly. Deprecation of constant and attribute in tag, etc.

As I see it, it is just a bug that needs to be fixed. Using different encoding than in http response seems to be meaningless and leads to errors.

sometimes you can encounter weird encoding/decoding scenarios with source/destination and a localized override can get things to "just work" for such outlier cases

@JCgH4164838Gh792C124B5 Can you think of any real world scenarios where it is applicable?

@JCgH4164838Gh792C124B5
Copy link
Contributor Author

Hi @aleksandr-m and @lukaszlenart.
The only definite "real-world" scenario I can provide would be isolating change (only applying an override to single problematic s:include minimizes change risk for existing apps to a single JSP). However since there are concerns about the value/utility of that feature, it could certainly be omitted.
Taking into consideration your comments about how having the tag-based override making things more complicated for inclusion in 2.6, I have a suggestion for a possible way forward:

  1. I could attempt another 2-part update (logic change, then typo cleanup) that drops the tag override and only implements the configuration-setting option (usePageEncoding) fix for 2.5.x. As before, it would be false by default (opt-in) to avoid inadvertently impacting/breaking existing 2.5.x apps.
  2. The logic in 1), if accepted, could be pulled into 2.6 and then updated to invert the usePageEncoding initial state (i.e. have usePageEncoding set true by default, but allow a user to revert to defaultEncoding by setting the flag to false in struts.xml configuration).

The above strategy might be sufficient to mitigate the concerns @aleksandr-m outlined, while still providing a relatively safe and flexible solution for both 2.5.x and 2.6 users. If you think it's reasonable then I will go ahead and attempt the steps outlined in 1) above. Please let me know what you think.

@aleksandr-m
Copy link
Contributor

@JCgH4164838Gh792C124B5 I think that is better. Maybe we should rename usePageEncoding to useResponseEncoding. Wdyt?

…retains existing behaviour by default, but provides a configuration flag users can set (to true) in order to enable usage of response (page) encoding for s:include tags. A WARN level log output is also generated for failed FastByteArrayOutputStream decoding (can be suppressed by log configuration).
@JCgH4164838Gh792C124B5
Copy link
Contributor Author

Hello yet again. :)
I just attempted another 2-part commit to this branch/pull request (as per the most recent comments above). The 1st part is just the logic/fix changes, the 2nd part overlays typo and whitespace fixes (same pattern as the previous 2-part commit).
The latest revision incorporates the changes @aleksandr-m recommended, including renaming usePageEncoding to useResponseEncoding. Crossing-fingers that this latest one is clean enough for consideration - let me know if it looks OK. :) :)

@lukaszlenart
Copy link
Member

I think we are good to merge this, right?

@aleksandr-m aleksandr-m merged commit 8293add into apache:struts-2-5-x Nov 1, 2018
aleksandr-m added a commit that referenced this pull request Nov 1, 2018
@lukaszlenart
Copy link
Member

osm! do you plan to port this branch into master to support the same in 2.6?

@aleksandr-m
Copy link
Contributor

@lukaszlenart Already done. :) Cherry picked it to master and changed useResponseEncoding in Include to true.

@lukaszlenart
Copy link
Member

@aleksandr-m osm!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants