Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When :body is string it is encoded using platform default encoding #18

Closed
lread opened this issue May 23, 2022 · 2 comments · Fixed by #25
Closed

When :body is string it is encoded using platform default encoding #18

lread opened this issue May 23, 2022 · 2 comments · Fixed by #25

Comments

@lread
Copy link
Collaborator

lread commented May 23, 2022

Symptom

Clj-http-lite was passing for me on all platforms except Windows.

Diagnosis

I captured HTTP traffic with RawCap and looked at it with WireShark.
I learned that clj-http-lite is not converting my request body to UTF-8 on Windows.

I took a peek a the code and see

(client (-> req (assoc :body (.getBytes ^String body)

This will encode the body to the platform's default character set.
I think the JDK is moving to UTF-8 for all platforms?
But I'm still seeing a default character set of windows-1252 under JDK11.

C:\Users\lee>java --version
openjdk 11.0.15 2022-04-19
OpenJDK Runtime Environment Temurin-11.0.15+10 (build 11.0.15+10)
OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (build 11.0.15+10, mixed mode)

C:\Users\lee>clojure
Clojure 1.10.3
user=> (.displayName (java.nio.charset.Charset/defaultCharset))
"windows-1252"

Workaround

Clj-http-lite will pass through the body if it is not a String.

So you can do the conversion to UTF-8 yourself, I am using something similar to this:

(assoc params :body (.getBytes (json/generate-string (or payload {}))
                               "UTF-8"))
lread added a commit to lread/etaoin that referenced this issue May 23, 2022
When running babashka, we now use the compatible http-client-lite.
(Thanks borkdude your etaoin pod work was a very useful reference!)
(I worked around an issue in http-client-lite when running on Windows.
See clj-commons/clj-http-lite#18)

Reader conditionals select the existing http-client when running on from
the JVM.

Reader conditionals mean .cljc, which often means Clojure and
ClojureScript, but in our case it means JVM Clojure and Babashka
Clojure. Update configs for cljdoc and clj-kondo to let them know
that this is not a ClojureScript project.

Facility to test our bb implementation added via a new --platform option
on our `bb test` task. I'm not sure of the best way to run babashka
tests, but my first stab generates a runner.clj with appropriate
namespaces and classpath then runs that.

Etaoin does have some debug logging, so Babashka's log level is adjusted
from the default of debug to info in the generated test runner.

Finally adjusted our CI matrix to include the bb platform.

Contributes to clj-commons#380
lread added a commit to lread/etaoin that referenced this issue May 23, 2022
When running babashka, we now use the compatible http-client-lite.
- Thanks borkdude, your etaoin pod work was a very useful reference!
- I worked around an issue in http-client-lite when running on Windows.
See clj-commons/clj-http-lite#18.

Reader conditionals select the existing http-client when running on from
the JVM.

Reader conditionals mean .cljc, which often means Clojure and
ClojureScript, but in our case it means JVM Clojure and Babashka
Clojure. Update configs for cljdoc and clj-kondo to let them know
that this is not a ClojureScript project.

Facility to test our bb implementation added via a new --platform option
on our `bb test` task. I'm not sure of the best way to run babashka
tests, but my first stab generates a runner.clj with appropriate
namespaces and classpath then runs that.

Etaoin does have some debug logging, so Babashka's log level is adjusted
from the default of debug to info in the generated test runner.

Finally adjusted our CI matrix to include the bb platform.

Contributes to clj-commons#380
@borkdude
Copy link
Collaborator

borkdude commented Jun 1, 2022

PR welcome

@lread
Copy link
Collaborator Author

lread commented Aug 11, 2022

Started to look at this one.

Took me a while to figure out that lein seems to be overriding the default character encoding to UTF-8 on Windows:

>lein repl
nREPL server started on port 51498 on host 127.0.0.1 - nrepl://127.0.0.1:51498
REPL-y 0.5.1, nREPL 0.9.0
Clojure 1.11.1
OpenJDK 64-Bit Client VM 11.0.15+10
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e

user=> (.displayName (java.nio.charset.Charset/defaultCharset))
"UTF-8"

Clojure cli (also tested with deps.exe) does not do this on Windows:

>clojure
Clojure 1.11.0
user=> (.displayName (java.nio.charset.Charset/defaultCharset))
"windows-1252"

For this reason I'll likely flesh out the existing deps.edn and switch to running our tests from Clojure cli.

lread added a commit to lread/clj-http-lite that referenced this issue Aug 13, 2022
Was previously always encoding to system default character encoding,
which is UTF-8 unless on Windows then it is windows-1252.

Now:
- encoding body to requested body encoding
- and defaulting to UTF-8 if no body encoding requested

Also:
- Added extended characters to http bodies in tests to exercise
body encoding/decoding
- Verified :auto body decoding with an extra test

Note that :auto body decoding might not be entirely correct.
It will only look at content-type charset if content-type starts with
"text/". Diagnosing/fixing this is out of scope for this PR.

Fixes clj-commons#18
@lread lread closed this as completed in #25 Aug 14, 2022
lread added a commit that referenced this issue Aug 14, 2022
* Body encoding now always defaults to UTF-8

Was previously always encoding to system default character encoding,
which is UTF-8 unless on Windows then it is windows-1252.

Now:
- encoding body to requested body encoding
- and defaulting to UTF-8 if no body encoding requested

Also:
- Added extended characters to http bodies in tests to exercise
body encoding/decoding
- Verified :auto body decoding with an extra test

Note that :auto body decoding might not be entirely correct.
It will only look at content-type charset if content-type starts with
"text/". Diagnosing/fixing this is out of scope for this PR.

Fixes #18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants