Some UTF-8 chars seem to break articles #113

Closed
martinklepsch opened this issue Sep 9, 2018 · 6 comments
Labels
Good First Issue: Want to help ship cljdoc? These are good issues to start with.
Guides: Issues related to rendering guides or articles originating from Markdown or similar formats.
Help Wanted

Comments

@martinklepsch (Member)

This article contains some UTF-8 characters that don't seem to go well with our code.

I need to look into it some more, but the issue is likely somewhere here:

(let [doc-slug-path (:doc-slug-path route-params)
      doc-tree      (doctree/add-slug-path (-> cache-contents :version :doc))
      doc-p         (->> doc-tree
                         doctree/flatten*
                         (filter #(= doc-slug-path (:slug-path (:attrs %))))
                         first)
      doc-html      (or (some-> doc-p :attrs :cljdoc/markdown rich-text/markdown-to-html)
                        (some-> doc-p :attrs :cljdoc/asciidoc rich-text/asciidoc-to-html))]

(presumably when trying to find the matching article inside the doctree)

martinklepsch added the Help Wanted, Good First Issue, and Guides labels Sep 9, 2018

@martinklepsch (Member, Author)

This seems to be related to Pedestal, which reports the following URL for the GET request:

io.pedestal.http {:msg "GET /d/lambdaisland/kaocha/0.0-118/doc/01-kaocha-%E8%80%83%E5%AF%9F", :line 80}

but to correctly resolve the article we'd need something like this:

/d/lambdaisland/kaocha/0.0-118/doc/01-kaocha-考察

There is a method, ServletRequest#setCharacterEncoding(), that sets the character encoding used when processing requests. I haven't confirmed that this is the issue, but Pedestal doesn't seem to call setCharacterEncoding.
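
To illustrate how the logged path relates to the expected one, here is a minimal sketch using only the Java standard library. Encoding the slug as UTF-8 produces the percent-escaped form from the log line, so comparing the two strings directly fails. The var names `slug` and `encoded` are illustrative and do not come from cljdoc.

```clojure
(import '(java.net URLEncoder))

;; The slug as it would be stored in the doctree (illustrative value from the log above).
(def slug "01-kaocha-考察")

;; Each non-ASCII character is encoded as UTF-8 bytes, each byte written as %XX.
(def encoded (URLEncoder/encode slug "UTF-8"))
;; encoded => "01-kaocha-%E8%80%83%E5%AF%9F"

;; A plain equality check between the raw request value and the slug fails.
(= slug encoded)
;;=> false
```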

@plexus (Contributor) commented Sep 9, 2018

Seems like a regular case of URL percent encoding/decoding

(require '[lambdaisland.uri.normalize :as n])

(n/percent-decode "/d/lambdaisland/kaocha/0.0-118/doc/01-kaocha-%E8%80%83%E5%AF%9F")
;;=> "/d/lambdaisland/kaocha/0.0-118/doc/01-kaocha-考察"

lambdaisland.uri.normalize is unreleased, though. The same functionality should be available in Ring/Pedestal and, I think, even in the Java standard library.

Update:

java.net.URLDecoder.decode(url, "UTF-8");
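
From Clojure, the same standard-library call can be reached through Java interop. The sketch below is only an illustration: `decode-slug` is a hypothetical helper name, not a function from cljdoc or Pedestal. One caveat: `URLDecoder` implements form decoding, so it also turns a literal `+` into a space, which may or may not be what you want for path segments.

```clojure
(import '(java.net URLDecoder))

(defn decode-slug
  "Percent-decodes `s` as UTF-8. Hypothetical helper for illustration only."
  [^String s]
  (URLDecoder/decode s "UTF-8"))

(decode-slug "/d/lambdaisland/kaocha/0.0-118/doc/01-kaocha-%E8%80%83%E5%AF%9F")
;;=> "/d/lambdaisland/kaocha/0.0-118/doc/01-kaocha-考察"

;; Form-decoding caveat: `+` is treated as an encoded space.
(decode-slug "a+b")
;;=> "a b"
```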

martinklepsch added a commit that referenced this issue Sep 9, 2018
This should probably be handled by Pedestal itself

#113
@martinklepsch (Member, Author)

Thanks. I was thinking this should be done by Pedestal, but regardless of that, we probably don't want to wait for an upstream release to get it fixed.
42c01c3 introduces a change that manually URL-decodes the article slugs.
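
As a rough sketch of what decoding slugs before the lookup could look like (this is not the content of the commit above), the example below decodes each path segment and then runs the same kind of `filter`/`first` match as the snippet at the top of the issue. The `docs` vector, the shape of `:slug-path` as a vector of strings, and the helper names `decode-slug-path` and `find-doc` are all assumptions made for illustration.

```clojure
(require '[clojure.string :as string])
(import '(java.net URLDecoder))

;; Hypothetical stand-in for the flattened doctree; the real structure may differ.
(def docs
  [{:attrs {:slug-path ["00-intro"]}}
   {:attrs {:slug-path ["01-kaocha-考察"]}}])

(defn decode-slug-path
  "Splits a raw route param on `/`, then percent-decodes each segment as UTF-8."
  [raw]
  (mapv #(URLDecoder/decode ^String % "UTF-8")
        (string/split raw #"/")))

(defn find-doc
  "Returns the first doc whose :slug-path equals `slug-path`, or nil."
  [docs slug-path]
  (->> docs
       (filter #(= slug-path (:slug-path (:attrs %))))
       first))

(def raw "01-kaocha-%E8%80%83%E5%AF%9F")

;; Without decoding, the still-encoded segment matches nothing.
(find-doc docs (string/split raw #"/"))
;;=> nil

;; With decoding, the second entry is found.
(find-doc docs (decode-slug-path raw))
;;=> {:attrs {:slug-path ["01-kaocha-考察"]}}
```

Splitting before decoding means an encoded slash (`%2F`) inside a segment is not mistaken for a path separator.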

@martinklepsch (Member, Author) commented Sep 9, 2018

Closing this issue. I might bug the Pedestal people about this, but it's working for now 🚀

@martinklepsch (Member, Author)

It seems this is a known issue in Pedestal, introduced in a recent version: pedestal/pedestal#588
