Proposed fix for WW-4971 (broken includes for non-UTF8 content). #257

JCgH4164838Gh792C124B5 · 2018-10-19T06:47:55Z

Adds a config flag to make all s:include tags use response page encoding.
Adds an encoding override property to the s:include tag (flexibility).
Adds a warning log output to FastByteArrayBuffer when decoding has errors.

Fixes WW-4971

sepe81 · 2018-10-19T07:07:45Z

core/src/test/java/org/apache/struts2/util/FastByteArrayBufferTest.java

+
+        try {
+            if (utf8_String == null) {
+                utf8_String = new String(utf16_String.getBytes("UTF-8"), "UTF-8");


You could use public String(byte bytes[], Charset charset) with java.nio.charset.StandardCharsets.

According to the Javadoc "These charsets are guaranteed to be available on every implementation of the Java platform."

Benefit: No need to handle UnsupportedEncodingException.

Fair point, as using the static StandardCharsets would allow for immediate assignment in the initializer block. That would permit elimination of the setup() method.

sepe81 · 2018-10-19T07:09:52Z

core/src/main/java/org/apache/struts2/StrutsConstants.java

@@ -41,12 +41,15 @@
    /** The URL extension to use to determine if the request is meant for a Struts action */
    public static final String STRUTS_ACTION_EXTENSION = "struts.action.extension";

-	/** Comma separated list of patterns (java.util.regex.Pattern) to be excluded from Struts2-processing */


Would it be possible to separate those formatting and "fix typo" changes into one separate commit?

This would simplify the review.

The "fix typo" changes were minimal enough that I figured it would be OK. It might be possible to separate those changes out, but I'm pretty new to Git. Presumably that would involve a new commit/push/pull request sequence (after trying to undo the formatting fixes) ?

The following steps will do it.

# reset last commit git reset HEAD~ # see your changes unstaged git status # partially add formatting and typo changes # see for example https://johnkary.net/blog/git-add-p-the-most-powerful-git-feature-youre-not-using-yet/ git add -p . git commit -m"Fix formatting and some typos" # partially add main changes git add -p . git commit -m"WW-4971 Fix broken includes for non-UTF8 content" # check that all wanted changes are included git status # log should contain two new commits git log # http://weiqingtoh.github.io/force-with-lease/ git push --force-with-lease

Hint: If anything goes wrong or is unclear, don't do the push and instead repull from github.

I wrote an instruction how to use GitHub, let me know if anything is unclear and I will be happy to improve it :)
https://struts.apache.org/submitting-patches.html#contributing-with-github

Thank you for the helpful Git instructions, @sepe81 and @lukaszlenart. The instructions on the Struts website seem to be pretty clear for "normal" contribution usage, and @sepe81 's instructions here seemed to work well for breaking up the changes into two sections (something well beyond my Git skills without such help).
I ended up doing them in the reverse order he suggested (1st commit was for the proposed fix only, 2nd commit overlays typo fixes on top of that one), so hopefully that will be OK. :) :)
Let me know if it worked out, if not I'll try again (this time around the 1st commit also includes some of the suggested changes from comments, hopefully it's a bit cleaner).

sepe81 · 2018-10-19T07:12:02Z

core/src/test/java/org/apache/struts2/util/FastByteArrayBufferTest.java

+        }
+    }
+
+    protected void tearDown() throws Exception {


no need for empty override of tearDown

The tearDown() could be eliminated along with setup() in relation to your earlier comments.

sepe81 · 2018-10-19T07:13:31Z

core/src/test/java/org/apache/struts2/util/FastByteArrayBufferTest.java

+
+
+    protected void setUp() throws Exception {
+        super.setUp();


no need for super call b/c impl in abstract super class is emtpy

sepe81 · 2018-10-19T07:15:56Z

core/src/main/java/org/apache/struts2/components/Include.java


    protected String value;
    private HttpServletRequest req;
    private HttpServletResponse res;
-    private static String defaultEncoding;
+    private String defaultEncoding;   // Made non-static (during WW-4971 fix)


IMO the comments shouldn't contain to many of those ticket references. Therefore we have the git history.

I think there are a few spots in the source that refer to ticket references, After seeing that, it seemed like a good idea to identify the ticket reference for the change via comment. If the Git history is considered sufficient/preferred those comment references could be eliminated.

Yes, this is a good practice and if you feel it is needed that's ok - but we can also depend on git history - your choice :)

coveralls · 2018-10-19T07:43:17Z

Coverage increased (+0.02%) to 46.841% when pulling 7bca982 on JCgH4164838Gh792C124B5:localS2_2_5_x_Branch into f4d7dbe on apache:struts-2-5-x.

aleksandr-m · 2018-10-19T19:48:58Z

core/src/main/java/org/apache/struts2/components/Include.java


    public Include(ValueStack stack, HttpServletRequest req, HttpServletResponse res) {
        super(stack);
        this.req = req;
        this.res = res;
+        defaultEncoding = "UTF-8";  // Set UTF-8 for defaultEncoding, when not set in configuration


Why to override defaultEncoding in constructor? It will be set in setDefaultEncoding.

Just a case of being extra defensive (in case setDefaultEncoding() injection parameter was ever changed to optional). It's probably unlikely to occur, in which case that assignment in the constructor could be eliminated.

aleksandr-m · 2018-10-19T19:59:35Z

@JCgH4164838Gh792C124B5 I've run your tests in old version of Struts and all of them succeed. Can you include some test that fails with current version?

JCgH4164838Gh792C124B5 · 2018-10-20T04:39:41Z

@JCgH4164838Gh792C124B5 I've run your tests in old version of Struts and all of them succeed. Can you include some test that fails with current version?

Hello @aleksandr-m. Unfortunately the only way I could find to demonstrate the failure was interactively (with a parent page in ISO-8859-1 using s:include on a child page with ISO-8859-1 high-ASCII accented characters). The unit tests that were added were primarily to demonstrate the proposed changes to Include and FastByteArrayBuffer don't break any existing functionality.

The FastByteArrayBufferTest.testISO8859_1WriteToUTF8 unit test demonstrates the truncation failure that can occur when input is considered "isMalformed" by the decoder, and causes the log warning to be produced now in that circumstance. Although the test will pass in older Struts 2.5.x or 2.3.x versions the failure is "silent" in the older versions (can only be noticed by noting the result is truncated visually in the case of an s:include for instance).

Unfortunately I couldn't figure out a mock-test that would pass with the proposed fix, but fail in previous versions. However I was able to demonstrate it interactively (truncation on older 2.5.x, but success using the appropriate encoding override with the fixed version). Do you think there's another angle we could use in this scenario ?

aleksandr-m · 2018-10-20T10:24:25Z

@JCgH4164838Gh792C124B5 Cannot reproduce it. Parent and included pages have <%@ page contentType="text/html; charset=ISO-8859-1"%> and content of included page which is below is fully rendered in resulting page.

aaa
ççç
zzz

Can you create a some simple app demonstrating this issue.

JCgH4164838Gh792C124B5 · 2018-10-20T16:36:37Z

Hello @aleksandr-m.
I've attempted to create a simple reproducer app (WAR) demonstrating this issue using Struts 2.5.18. For some reason Github doesn't like it as an attachment (even with a .zip extension), so the reproducer was added to the WW-4791 Jira issue instead.
In my case the reproducer demonstrates the issue in this environment: Windows 10 (64-bit), Java 1.7.0_79 (32-bit), Tomcat 7.0.91. Hopefully it will demonstrate the issue on other environments as well, please let me know how it behaves for you.

aleksandr-m · 2018-10-21T17:21:09Z

@JCgH4164838Gh792C124B5 Thanks. It helped. In my previous setup there was include inside include inside include and the top most was still with utf-8. :) So it is the parent jsp's encoding which matters.

About the fix: Why can't we use response encoding each time to determine correct encoding? I.e. res.getCharacterEncoding() and fall back to defaultEncoding / systemEncoding.

JCgH4164838Gh792C124B5 · 2018-10-21T22:27:20Z

Hi @aleksandr-m, I'm glad the reproducer helped. :)

The proposed fix attempts to preserve the current 2.5.x behaviour (using defaultEncoding all the time), to avoid accidentally impacting existing applications. Some apps might be inadvertently impacted if the current default behaviour was to change, so a flag to turn on "use response encoding" seemed reasonable. Even with the new option and the old default behaviour, there might be situations where there's a need to only change the behaviour of a single s:include (or a small number) of s:include tag encodings (with multiple encodings involved in source and display), so a tag-level override provides a facility to do that. The logic checks to allow the 2 new options, as well as preserve the old behaviour by default shouldn't be too expensive (and there should be a minor efficiency gain from the reduction in loop condition checks).

Anyway, that was pretty much the thought process involved and where the idea for the two different options came from. Thanks for taking the time to look into and consider the proposed fix, and let me know if the above explanation seems reasonable.

aleksandr-m · 2018-10-22T18:46:58Z

It is the encoding of response that matters. I don't see point in having different encoding in the s:include. Is there any?

The proposed fix attempts to preserve the current 2.5.x behaviour

It is 2.6 we can break some things. :)

lukaszlenart · 2018-10-22T18:53:29Z

Yeah... it would be better to target 2.6 (the master branch) instead of 2.5.x

JCgH4164838Gh792C124B5 · 2018-10-23T03:16:42Z

Hello again.
Having an override available in the s:include tag provides targeted flexibility (sometimes you can encounter weird encoding/decoding scenarios with source/destination and a localized override can get things to "just work" for such outlier cases).
As far as 2.5.x vs. 2.6 consideration, applications using 2.5.x won't necessarily be able to move to 2.6.x right away due to various factors. Since the current (non-patched) behaviour of 2.5.18 (and I believe 2.3.36) for s:include isn't working in the identified scenario I think there's value in fixing it in 2.5.x at least. :)
The proposed s:include fix (or variant thereof) could be applied to 2.5.x (2.5.19) and either forward-ported to 2.6 master as-is or as a "stripped-down" version (apply response encoding if available/valid, then fallback to default encoding, then fallback to system encoding).
Thanks for your consideration.

lukaszlenart · 2018-10-23T06:48:02Z

Ok, looks good to me. We can release one more version of 2.5.x :)

sepe81 · 2018-10-23T07:16:04Z

@JCgH4164838Gh792C124B5 Would it be possible to include some parts of your reasoning into the commit comment? IMHO this would be the best place for later reference.

@lukaszlenart Not sure how you handle this for the project.

lukaszlenart · 2018-10-23T07:22:31Z

@sepe81 not sure what you mean by that? Right now we are supporting 2.3 and 2.5 branch and working on 2.6. Porting those changes into 2.6 is matter of cherry-picking those two commits.

sepe81 · 2018-10-23T07:28:37Z

@lukaszlenart I don't mean the cherry-picking. With "include some parts of your reasoning into the commit comment" I intend to enrich the implicit documentation of each commit as to be found under 7. Use the body to explain what and why vs. how.

lukaszlenart · 2018-10-23T07:34:19Z

Ach... you meant JCgH4164838Gh792C124B5's comment - it would be better to put it in the issue on JIRA but it can stay here as well. There is a direct reference in the issue back to this PR.

https://issues.apache.org/jira/browse/WW-4971

aleksandr-m · 2018-10-23T19:09:31Z

Well, we can merge this into 2.5.x branch but that means additional work in 2.6, if we want to make it properly. Deprecation of constant and attribute in tag, etc.

As I see it, it is just a bug that needs to be fixed. Using different encoding than in http response seems to be meaningless and leads to errors.

sometimes you can encounter weird encoding/decoding scenarios with source/destination and a localized override can get things to "just work" for such outlier cases

@JCgH4164838Gh792C124B5 Can you think of any real world scenarios where it is applicable?

JCgH4164838Gh792C124B5 · 2018-10-24T01:04:03Z

Hi @aleksandr-m and @lukaszlenart.
The only definite "real-world" scenario I can provide would be isolating change (only applying an override to single problematic s:include minimizes change risk for existing apps to a single JSP). However since there are concerns about the value/utility of that feature, it could certainly be omitted.
Taking into consideration your comments about how having the tag-based override making things more complicated for inclusion in 2.6, I have a suggestion for a possible way forward:

I could attempt another 2-part update (logic change, then typo cleanup) that drops the tag override and only implements the configuration-setting option (usePageEncoding) fix for 2.5.x. As before, it would be false by default (opt-in) to avoid inadvertently impacting/breaking existing 2.5.x apps.
The logic in 1), if accepted, could be pulled into 2.6 and then updated to invert the usePageEncoding initial state (i.e. have usePageEncoding set true by default, but allow a user to revert to defaultEncoding by setting the flag to false in struts.xml configuration).

The above strategy might be sufficient to mitigate the concerns @aleksandr-m outlined, while still providing a relatively safe and flexible solution for both 2.5.x and 2.6 users. If you think it's reasonable then I will go ahead and attempt the steps outlined in 1) above. Please let me know what you think.

aleksandr-m · 2018-10-24T19:16:26Z

@JCgH4164838Gh792C124B5 I think that is better. Maybe we should rename usePageEncoding to useResponseEncoding. Wdyt?

…retains existing behaviour by default, but provides a configuration flag users can set (to true) in order to enable usage of response (page) encoding for s:include tags. A WARN level log output is also generated for failed FastByteArrayOutputStream decoding (can be suppressed by log configuration).

… non-UTF8 content).

JCgH4164838Gh792C124B5 · 2018-10-25T04:25:47Z

Hello yet again. :)
I just attempted another 2-part commit to this branch/pull request (as per the most recent comments above). The 1st part is just the logic/fix changes, the 2nd part overlays typo and whitespace fixes (same pattern as the previous 2-part commit).
The latest revision incorporates the changes @aleksandr-m recommended, including renaming usePageEncoding to useResponseEncoding. Crossing-fingers that this latest one is clean enough for consideration - let me know if it looks OK. :) :)

lukaszlenart · 2018-10-31T12:09:34Z

I think we are good to merge this, right?

Fix WW-4971 (broken includes for non-UTF8 content).

lukaszlenart · 2018-11-02T07:22:26Z

osm! do you plan to port this branch into master to support the same in 2.6?

aleksandr-m · 2018-11-02T20:16:06Z

@lukaszlenart Already done. :) Cherry picked it to master and changed useResponseEncoding in Include to true.

lukaszlenart · 2018-11-03T09:40:49Z

@aleksandr-m osm!!!

sepe81 suggested changes Oct 19, 2018

View reviewed changes

aleksandr-m reviewed Oct 19, 2018

View reviewed changes

JCgH4164838Gh792C124B5 force-pushed the localS2_2_5_x_Branch branch from 6d16cb9 to c7a1ea9 Compare October 20, 2018 19:19

sepe81 approved these changes Oct 23, 2018

View reviewed changes

JCgH4164838Gh792C124B5 added 2 commits October 24, 2018 23:41

Fix typos and whitespace (on top of WW-4971 Fix broken s:includes for…

7bca982

… non-UTF8 content).

JCgH4164838Gh792C124B5 force-pushed the localS2_2_5_x_Branch branch from c7a1ea9 to 7bca982 Compare October 25, 2018 04:15

aleksandr-m merged commit 8293add into apache:struts-2-5-x Nov 1, 2018

aleksandr-m added a commit that referenced this pull request Nov 1, 2018

Merge pull request #257 from JCgH4164838Gh792C124B5/localS2_2_5_x_Branch

9fa53dd

Fix WW-4971 (broken includes for non-UTF8 content).

Proposed fix for WW-4971 (broken includes for non-UTF8 content). #257

Proposed fix for WW-4971 (broken includes for non-UTF8 content). #257

Conversation

JCgH4164838Gh792C124B5 commented Oct 19, 2018 • edited by lukaszlenart Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sepe81 Oct 20, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coveralls commented Oct 19, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aleksandr-m commented Oct 19, 2018

JCgH4164838Gh792C124B5 commented Oct 20, 2018

aleksandr-m commented Oct 20, 2018 • edited Loading

JCgH4164838Gh792C124B5 commented Oct 20, 2018

aleksandr-m commented Oct 21, 2018

JCgH4164838Gh792C124B5 commented Oct 21, 2018

aleksandr-m commented Oct 22, 2018

lukaszlenart commented Oct 22, 2018

JCgH4164838Gh792C124B5 commented Oct 23, 2018

lukaszlenart commented Oct 23, 2018

sepe81 commented Oct 23, 2018

lukaszlenart commented Oct 23, 2018

sepe81 commented Oct 23, 2018

lukaszlenart commented Oct 23, 2018

aleksandr-m commented Oct 23, 2018

JCgH4164838Gh792C124B5 commented Oct 24, 2018

aleksandr-m commented Oct 24, 2018

JCgH4164838Gh792C124B5 commented Oct 25, 2018

lukaszlenart commented Oct 31, 2018

lukaszlenart commented Nov 2, 2018

aleksandr-m commented Nov 2, 2018

lukaszlenart commented Nov 3, 2018

JCgH4164838Gh792C124B5 commented Oct 19, 2018 •

edited by lukaszlenart

Loading

sepe81 Oct 20, 2018 •

edited

Loading

coveralls commented Oct 19, 2018 •

edited

Loading

aleksandr-m commented Oct 20, 2018 •

edited

Loading