When :body is string it is encoded using platform default encoding #18

lread · 2022-05-23T20:35:59Z

Symptom

Clj-http-lite was passing for me on all platforms except Windows.

Diagnosis

I captured HTTP traffic with RawCap and looked at it with WireShark.
I learned that clj-http-lite is not converting my request body to UTF-8 on Windows.

I took a peek a the code and see

clj-http-lite/src/clj_http/lite/client.clj

Line 110 in 6b53000

(client (-> req (assoc :body (.getBytes ^String body)

This will encode the body to the platform's default character set.
I think the JDK is moving to UTF-8 for all platforms?
But I'm still seeing a default character set of windows-1252 under JDK11.

C:\Users\lee>java --version
openjdk 11.0.15 2022-04-19
OpenJDK Runtime Environment Temurin-11.0.15+10 (build 11.0.15+10)
OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (build 11.0.15+10, mixed mode)

C:\Users\lee>clojure
Clojure 1.10.3
user=> (.displayName (java.nio.charset.Charset/defaultCharset))
"windows-1252"

Workaround

Clj-http-lite will pass through the body if it is not a String.

So you can do the conversion to UTF-8 yourself, I am using something similar to this:

(assoc params :body (.getBytes (json/generate-string (or payload {}))
                               "UTF-8"))

The text was updated successfully, but these errors were encountered:

When running babashka, we now use the compatible http-client-lite. (Thanks borkdude your etaoin pod work was a very useful reference!) (I worked around an issue in http-client-lite when running on Windows. See clj-commons/clj-http-lite#18) Reader conditionals select the existing http-client when running on from the JVM. Reader conditionals mean .cljc, which often means Clojure and ClojureScript, but in our case it means JVM Clojure and Babashka Clojure. Update configs for cljdoc and clj-kondo to let them know that this is not a ClojureScript project. Facility to test our bb implementation added via a new --platform option on our `bb test` task. I'm not sure of the best way to run babashka tests, but my first stab generates a runner.clj with appropriate namespaces and classpath then runs that. Etaoin does have some debug logging, so Babashka's log level is adjusted from the default of debug to info in the generated test runner. Finally adjusted our CI matrix to include the bb platform. Contributes to clj-commons#380

When running babashka, we now use the compatible http-client-lite. - Thanks borkdude, your etaoin pod work was a very useful reference! - I worked around an issue in http-client-lite when running on Windows. See clj-commons/clj-http-lite#18. Reader conditionals select the existing http-client when running on from the JVM. Reader conditionals mean .cljc, which often means Clojure and ClojureScript, but in our case it means JVM Clojure and Babashka Clojure. Update configs for cljdoc and clj-kondo to let them know that this is not a ClojureScript project. Facility to test our bb implementation added via a new --platform option on our `bb test` task. I'm not sure of the best way to run babashka tests, but my first stab generates a runner.clj with appropriate namespaces and classpath then runs that. Etaoin does have some debug logging, so Babashka's log level is adjusted from the default of debug to info in the generated test runner. Finally adjusted our CI matrix to include the bb platform. Contributes to clj-commons#380

borkdude · 2022-06-01T08:03:40Z

PR welcome

lread · 2022-08-11T22:37:05Z

Started to look at this one.

Took me a while to figure out that lein seems to be overriding the default character encoding to UTF-8 on Windows:

>lein repl
nREPL server started on port 51498 on host 127.0.0.1 - nrepl://127.0.0.1:51498
REPL-y 0.5.1, nREPL 0.9.0
Clojure 1.11.1
OpenJDK 64-Bit Client VM 11.0.15+10
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e

user=> (.displayName (java.nio.charset.Charset/defaultCharset))
"UTF-8"

Clojure cli (also tested with deps.exe) does not do this on Windows:

>clojure
Clojure 1.11.0
user=> (.displayName (java.nio.charset.Charset/defaultCharset))
"windows-1252"

For this reason I'll likely flesh out the existing deps.edn and switch to running our tests from Clojure cli.

Was previously always encoding to system default character encoding, which is UTF-8 unless on Windows then it is windows-1252. Now: - encoding body to requested body encoding - and defaulting to UTF-8 if no body encoding requested Also: - Added extended characters to http bodies in tests to exercise body encoding/decoding - Verified :auto body decoding with an extra test Note that :auto body decoding might not be entirely correct. It will only look at content-type charset if content-type starts with "text/". Diagnosing/fixing this is out of scope for this PR. Fixes clj-commons#18

* Body encoding now always defaults to UTF-8 Was previously always encoding to system default character encoding, which is UTF-8 unless on Windows then it is windows-1252. Now: - encoding body to requested body encoding - and defaulting to UTF-8 if no body encoding requested Also: - Added extended characters to http bodies in tests to exercise body encoding/decoding - Verified :auto body decoding with an extra test Note that :auto body decoding might not be entirely correct. It will only look at content-type charset if content-type starts with "text/". Diagnosing/fixing this is out of scope for this PR. Fixes #18

lread mentioned this issue May 23, 2022

road to bb: http client lite & bb tests clj-commons/etaoin#438

Merged

This was referenced Aug 11, 2022

tests: fully switch to deps.edn #20

Closed

ci: add Windows to test matrix #21

Closed

lread mentioned this issue Aug 13, 2022

Body encoding now applied and always defaults to UTF-8 #25

Merged

lread closed this as completed in #25 Aug 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When :body is string it is encoded using platform default encoding #18

When :body is string it is encoded using platform default encoding #18

lread commented May 23, 2022

borkdude commented Jun 1, 2022

lread commented Aug 11, 2022

When :body is string it is encoded using platform default encoding #18

When :body is string it is encoded using platform default encoding #18

Comments

lread commented May 23, 2022

Symptom

Diagnosis

Workaround

borkdude commented Jun 1, 2022

lread commented Aug 11, 2022