
Encoding issue with unicode #144

Closed
erebor opened this issue Jan 29, 2013 · 4 comments

@erebor
Member

erebor commented Jan 29, 2013

Reported (and fix suggested) by @brianmario

For this content:

https://github.com/foo-users/foo
へと vicmd キーマップを足してみている試み、
アニメーションgifです。

この辺りでやっています。
https://github.com/foo/bar/compare/master...tb;keymap

In erebor/asciidoctor@master:lib/asciidoctor/substituters.rb#L358:

m[2] is "https://github.com/foo-users/foo\nへと"

Notice that it matched the newline and the first couple of Unicode characters from the next line.
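The first part can be reproduced in isolation. The regex at substituters.rb#L358 is not quoted in this thread, so both patterns below are hypothetical stand-ins: the first excludes spaces but not line breaks and yields the same value reported for m[2], while the second excludes all whitespace via `\s` and stops at the end of the line.

```ruby
# Input from the report: a URL, a line break, then Japanese text
text = "https://github.com/foo-users/foo\nへと vicmd キーマップを足してみている試み、"

# Hypothetical buggy pattern: the negated class excludes spaces and '['
# but not "\n", so the match runs onto the next line up to the first space
buggy = %r{https?://[^ \[]+}

# Hypothetical fix: \s also covers "\n", so the URL ends at the line break
fixed = %r{https?://[^\s\[]+}

buggy_url = buggy.match(text)[0] # => "https://github.com/foo-users/foo\nへと"
fixed_url = fixed.match(text)[0] # => "https://github.com/foo-users/foo"
```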

That's the first part.

Second, in erebor/asciidoctor@master:lib/asciidoctor/substituters.rb#L378, the resulting string needs to be in the same encoding as the original (the result variable in this case).

A quick fix might be something like this:

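r = "…"
# Tag the string built inside the gsub block with result's encoding
# (guarded because Ruby 1.8 strings have no force_encoding)
r.force_encoding(result.encoding) if r.respond_to?(:force_encoding)
r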

Brian says that fixed it for him locally.

What's happening there is that the string built up inside the block ends up tagged as US-ASCII, but result is UTF-8, so when gsub joins the strings back together it blows up.
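The mismatch can be reproduced with plain String#gsub. In this sketch, `render_link` is a hypothetical stand-in for an inline template that returns UTF-8 bytes tagged as US-ASCII; the second gsub applies the quick fix from the report.

```ruby
# Hypothetical stand-in for an inline template: returns UTF-8 bytes
# mislabelled as US-ASCII (not Asciidoctor's actual API)
def render_link(url, text)
  %(<a href="#{url}">#{text}</a>).force_encoding(Encoding::US_ASCII)
end

result = "この辺りでやっています: LINK"

# Without a fix, gsub cannot join the US-ASCII fragment to the UTF-8 result
error = begin
  result.gsub("LINK") { render_link("https://example.com", "へと") }
  nil
rescue Encoding::CompatibilityError => e
  e
end

# The quick fix from the report: retag the fragment with result's encoding
fixed = result.gsub("LINK") do
  r = render_link("https://example.com", "へと")
  r.force_encoding(result.encoding) if r.respond_to?(:force_encoding)
  r
end
```

Note that force_encoding only relabels the bytes without validating them, so this workaround is safe only when the fragment really contains valid UTF-8.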

@mojavelinux
Member

Just what I was hoping for, a use case for fleshing out encoding support!

Is this critical for 0.1.0, or can it be addressed after?

@erebor
Member Author

erebor commented Jan 29, 2013

It would be a nice wedge for me to push on getting an update pushed
internally sooner. But that's not your problem. ;)

On Tue, Jan 29, 2013 at 11:46 AM, Dan Allen notifications@github.com wrote:

> Just what I was hoping for, a use case for fleshing out encoding support.

> Is this critical for 0.1.0, or can it be addressed after?



@mojavelinux
Member

On Tue, Jan 29, 2013 at 10:49 AM, Ryan Waldron notifications@github.com wrote:

> It would be a nice wedge for me to push on getting an update pushed
> internally sooner. But that's not your problem. ;)

Gotcha. Perhaps we can schedule it for a 0.1.x point release, which is more likely to be a candidate for deployment.

@mojavelinux
Member

Aha! I've got a fix that will work across Ruby versions.

Turns out, this is ERB biting us again. Previously, I had added the magic encoding directive to all the block-level templates, thinking those were the only ones that would be invoked directly. However, the substitutions load the inline templates directly and concatenate their output to the string. That's where the encodings are getting mixed up.

All I needed to do was add the magic encoding directive to all the ERB templates, and it all works.
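A minimal sketch of why the directive helps, assuming template bytes that arrive without encoding information (simulated here with String#b) and a hypothetical `render` helper. ERB treats a `<%#-*- coding: ... -*-%>` comment at the top of a template as a magic comment and compiles the template, including its output buffer, in that encoding. The exact directive used in Asciidoctor's templates is not shown in this thread.

```ruby
require "erb"

# Hypothetical helper: compile a template and render it with a local `text`
def render(template, text)
  ERB.new(template).result(binding)
end

# Template bytes with no encoding information (ASCII-8BIT, as String#b returns)
without_directive = "<a>リンク <%= text %></a>".b

# Same template, prefixed with ERB's magic encoding comment
with_directive = "<%#-*- coding: UTF-8 -*-%>\n<a>リンク <%= text %></a>".b

# Binary literal text plus a UTF-8 `text` value cannot be concatenated
error = begin
  render(without_directive, "へと")
  nil
rescue Encoding::CompatibilityError => e
  e
end

# With the directive, the literal text and the output buffer are UTF-8
html = render(with_directive, "へと")
```

The newline after the comment tag is emitted as output (no trim mode is set), so `html` starts with "\n".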

I also noticed that the link macro is catching the endline as part of the URL, which it shouldn't, so I'm fixing that too.

I can have a patch ready w/ a test shortly.

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Jan 30, 2013
- needed to add magic encoding line to all erb templates
- add example from issue to encodings test case
erebor added a commit that referenced this issue Jan 30, 2013
resolves issue #144 - encoding issue w/ utf-8
erebor closed this as completed Jan 30, 2013
ghost assigned mojavelinux Aug 27, 2013