Some UTF-8 chars seem to break articles #113

Closed
martinklepsch opened this issue Sep 9, 2018 · 6 comments

@martinklepsch martinklepsch commented Sep 9, 2018

This article contains some UTF-8 characters that don't seem to go well with our code.

Need to look into it some more but likely the issue is somewhere here:

(let [doc-slug-path (:doc-slug-path route-params)
      doc-tree      (doctree/add-slug-path (-> cache-contents :version :doc))
      doc-p         (->> doc-tree
                         doctree/flatten*
                         (filter #(= doc-slug-path (:slug-path (:attrs %))))
                         first)
      doc-html      (or (some-> doc-p :attrs :cljdoc/markdown rich-text/markdown-to-html)
                        (some-> doc-p :attrs :cljdoc/asciidoc rich-text/asciidoc-to-html))]

(presumably when trying to find the matching article inside the doctree)
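
A minimal sketch of how such a lookup could come up empty, assuming the slug from the route is still percent-encoded while the doctree stores the decoded form. The `docs` vector, the `find-doc` helper, and the vector shape of the slug path are hypothetical stand-ins, not the project's actual data:

```clojure
;; Hypothetical stand-in for the flattened doctree; the real entries come
;; from doctree/flatten* and carry more attributes.
(def docs
  [{:title "Readme" :attrs {:slug-path ["readme"]}}
   {:title "Kaocha" :attrs {:slug-path ["01-kaocha-考察"]}}])

(defn find-doc
  "Returns the first entry whose slug path equals `slug-path`, or nil."
  [docs slug-path]
  (->> docs
       (filter #(= slug-path (:slug-path (:attrs %))))
       first))

;; Encoded slug: plain string equality fails, so nothing is found.
(find-doc docs ["01-kaocha-%E8%80%83%E5%AF%9F"])

;; Decoded slug: matches the stored entry.
(find-doc docs ["01-kaocha-考察"])
```

Since the comparison is plain `=`, any difference in encoding between the two sides is enough to make `first` return nil and leave `doc-html` empty.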

@martinklepsch martinklepsch commented Sep 9, 2018

Seems that this is something with Pedestal. Pedestal reports the following URL for the GET request:

io.pedestal.http {:msg "GET /d/lambdaisland/kaocha/0.0-118/doc/01-kaocha-%E8%80%83%E5%AF%9F", :line 80}

but to correctly resolve the article we'd need something like this:

/d/lambdaisland/kaocha/0.0-118/doc/01-kaocha-考察

There is a method, ServletRequest#setCharacterEncoding(), that sets the character encoding used when processing requests. I haven't yet confirmed this is the issue, but Pedestal at least doesn't seem to call setCharacterEncoding.
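
For reference, the two URLs above differ only in encoding: the `%E8%80%83%E5%AF%9F` suffix is the percent-encoded UTF-8 byte sequence of 考察. A small sketch to check this at the REPL; `utf8-percent-encode` is a hypothetical helper written for illustration, and it encodes every byte, including ASCII ones that a real URL encoder would leave alone:

```clojure
(defn utf8-percent-encode
  "Percent-encodes every UTF-8 byte of `s`. Illustration only."
  [^String s]
  (->> (.getBytes s "UTF-8")
       ;; bytes are signed on the JVM, so mask to 0-255 before formatting
       (map #(format "%%%02X" (bit-and % 0xFF)))
       (apply str)))

(utf8-percent-encode "考察")
;;=> "%E8%80%83%E5%AF%9F"
```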

@plexus plexus commented Sep 9, 2018

Seems like a regular case of URL percent encoding/decoding

(require '[lambdaisland.uri.normalize :as n])

(n/percent-decode "/d/lambdaisland/kaocha/0.0-118/doc/01-kaocha-%E8%80%83%E5%AF%9F")
;;=> "/d/lambdaisland/kaocha/0.0-118/doc/01-kaocha-考察"

lambdaisland.uri.normalize is unreleased, though; the same functionality should be available in Ring/Pedestal and, I think, even in the Java standard library.

Update:

java.net.URLDecoder.decode(url, "UTF-8");
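
The same call through Clojure interop, as a rough sketch. `decode-slug` is a hypothetical name, not something from the codebase:

```clojure
(import 'java.net.URLDecoder)

(defn decode-slug
  "Percent-decodes `s`, interpreting the bytes as UTF-8."
  [^String s]
  (URLDecoder/decode s "UTF-8"))

(decode-slug "01-kaocha-%E8%80%83%E5%AF%9F")
;;=> "01-kaocha-考察"

;; URLDecoder implements form decoding (application/x-www-form-urlencoded),
;; so a literal "+" is turned into a space:
(decode-slug "a+b")
;;=> "a b"
```

The `+` handling is worth keeping in mind if slugs can contain a literal plus sign, since path segments do not normally treat `+` as a space.
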
martinklepsch added a commit that referenced this issue Sep 9, 2018
This should probably be handled by Pedestal itself

#113

@martinklepsch martinklepsch commented Sep 9, 2018

Thanks, I was kind of thinking this should be done by Pedestal, but regardless of that we probably don't want to wait for an upstream release to get it fixed.
42c01c3 introduces a change that manually URL-decodes the article slugs.

@martinklepsch martinklepsch commented Sep 9, 2018

Closing this issue, might bug Pedestal people about this but it's working for now 🚀

@martinklepsch martinklepsch commented Sep 11, 2018

Seems it is a known issue in Pedestal, introduced in a recent version: pedestal/pedestal#588
