Skip to content

Conversation

@jwagenleitner
Copy link
Contributor

URLConnect.getContentEncoding returns the Content-Encoding
HTTP Header [1] which is not a charset. Since this method would
have either returned null or an invalid charset, the code path
specifying the encoding would normally not have been executed.
The charset may be contained in the Content-Type header, but
rather than attempt to parse that string which would require
closing the connection, this fix avoids opening the connection
and relies on the default charset.

[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11

@jwagenleitner
Copy link
Contributor Author

The rationale behind not attempting to actually get the charset (if available) from the URLConnection is because it hasn't been an issue to date. So rather than introduce the complexity of parsing the Content-Type header it seems reasonable to defer until someone actually reports an issue. The original issue that introduced the code GROOVY-3934 was not an issue about charset.

@jwagenleitner jwagenleitner force-pushed the groovy8056-content-encoding branch from 29a641c to 3a9e082 Compare February 19, 2017 01:02
@jwagenleitner
Copy link
Contributor Author

Travis is failing on the JDK8 build which I think is unrelated to this PR and seems to have started recently when the gradle version was bumped to 3.3. Tests pass locally for me.

@jwagenleitner jwagenleitner force-pushed the groovy8056-content-encoding branch from ae3320e to f25f4c1 Compare February 19, 2017 07:01
@daniellansun
Copy link
Contributor

@jwagenleitner Java 8 requires more memory, we can request sudo to gain more system resource of travis-ci instance: c79ef79

URLConnect.getContentEncoding returns the Content-Encoding
HTTP Header [1] which is not a charset.  Since this method would
have either returned null or an invalid charset, the code path
specifying the encoding would normally not have been executed.
The charset may be contained in the Content-Type header, but
rather than attempt to parse that string which would require
closing the connection, this fix avoids opening the connection.

[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11
@jwagenleitner jwagenleitner force-pushed the groovy8056-content-encoding branch from f25f4c1 to a0e3335 Compare February 20, 2017 15:53
@jwagenleitner
Copy link
Contributor Author

I rebased to pick up the latest travisci change.

@jwagenleitner
Copy link
Contributor Author

An example to show contentEncoding vs contentType

url = 'https://raw.githubusercontent.com/jwagenleitner/groovy/a0e3335ab1d46fbe7ca7a0391fdf864c12d15f01/src/test/groovy/util/GroovyScriptEngineReloadingTest.groovy'

uc = new URL(url).openConnection()
uc.requestMethod = 'GET'

assert uc.contentEncoding == null
assert uc.contentType.toLowerCase().contains('utf-8')

uc = new URL(url).openConnection()
uc.requestMethod = 'GET'
uc.setRequestProperty('accept-encoding', 'gzip')

assert uc.contentEncoding == 'gzip'
assert uc.contentType.toLowerCase().contains('utf-8')

@jwagenleitner
Copy link
Contributor Author

@blackdrag @paulk-asert if either of you have a chance I would be curious to know your opinion about this proposed change/fix.

@jwagenleitner
Copy link
Contributor Author

Closing this in favor of PR #557 just in case someone is relying on the old behavior.

@jwagenleitner jwagenleitner deleted the groovy8056-content-encoding branch June 2, 2018 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants