Ruby 1.9 character encoding changes #188

Closed
blakesmith opened this issue Jul 1, 2010 · 15 comments

@blakesmith

With Ruby 1.8, incorrectly encoded UTF-8 characters are silently tolerated. If a post's content body contains invalid UTF-8 bytes, they show up in the rendered page as question marks (unknown characters).

A user upgrading from Ruby 1.8 to Ruby 1.9 whose site seemed to work fine would get a confusing error when trying to render it (assuming the site contained incorrectly encoded UTF-8 characters):

/Users/blake/projects/jekyll/lib/jekyll/convertible.rb:26:in `read_yaml': invalid byte sequence in UTF-8 (ArgumentError)
        from /Users/blake/projects/jekyll/lib/jekyll/post.rb:39:in `initialize'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:110:in `new'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:110:in `block in read_posts'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:108:in `each'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:108:in `read_posts'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:169:in `read_directories'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:79:in `read'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:71:in `process'
        from ../jekyll/bin/jekyll:150:in `'

This doesn't help the user fix anything: nothing in the trace says which post contains the bad characters. This commit at least displays the problem post, so the user knows what needs to be fixed for the site to render successfully.

This is mainly an issue of how Ruby decides to handle String encodings by default. You can read more about it here: http://blog.grayproductions.net/articles/ruby_19s_string
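A minimal Ruby sketch of the idea behind the commit, not Jekyll's actual `read_yaml`: the helper name `front_matter` and the front-matter regex are assumptions for illustration. Matching a regex against a string that holds invalid bytes raises `ArgumentError`, so the helper catches it and re-raises with the file name attached.

```ruby
# Hypothetical helper, NOT Jekyll's actual read_yaml: extract YAML front
# matter and, if the text has bytes that are invalid in its encoding,
# re-raise the error with the offending file's name attached.
FRONT_MATTER = /\A---\s*\n(.*?)\n?^---\s*$\n?/m

def front_matter(content, name)
  # Matching a Regexp against a string with an invalid byte sequence
  # raises ArgumentError, e.g. "invalid byte sequence in UTF-8".
  match = FRONT_MATTER.match(content)
  match && match[1]
rescue ArgumentError => e
  raise ArgumentError, "#{name}: #{e.message}"
end

good = "---\ntitle: Hello\n---\nBody\n"
puts front_matter(good, "good.md")   # prints: title: Hello

# "\xE9" is Latin-1 "é"; on its own it is not valid UTF-8.
bad = ("---\ntitle: Caf" + "\xE9" + "\n---\n").force_encoding("UTF-8")
begin
  front_matter(bad, "_posts/bad.md")
rescue ArgumentError => e
  puts e.message   # prints: _posts/bad.md: invalid byte sequence in UTF-8
end
```

With the file name in the message, the user can go straight to the post that needs re-encoding instead of guessing from the stack trace.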

@lmmendes

lmmendes commented Sep 9, 2010

In my case I was getting the following error:

/usr/local/rvm/gems/ruby-1.9.1-p378/gems/jekyll-0.7.0/lib/jekyll/convertible.rb:26:in `read_yaml': invalid byte sequence in US-ASCII (ArgumentError)
    from /usr/local/rvm/gems/ruby-1.9.1-p378/gems/jekyll-0.7.0/lib/jekyll/page.rb:24:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.1-p378/gems/jekyll-0.7.0/lib/jekyll/site.rb:185:in `new'
    from /usr/local/rvm/gems/ruby-1.9.1-p378/gems/jekyll-0.7.0/lib/jekyll/site.rb:185:in `block in read_directories'
    from /usr/local/rvm/gems/ruby-1.9.1-p378/gems/jekyll-0.7.0/lib/jekyll/site.rb:175:in `each'

I solved the problem by setting the following locale variables in my shell:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
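A sketch of why the locale matters, under stated assumptions: Ruby tags text it reads from files with `Encoding.default_external`, which it derives from the locale (`LANG`/`LC_ALL`), and under a plain `C` locale that is typically US-ASCII. The hypothetical `try_match` helper below tags the same UTF-8 bytes with each encoding and runs a regex match, the kind of operation that fails in the `read_yaml` frames of the traces above.

```ruby
# Sketch: the same bytes pass or fail depending on the encoding Ruby
# attaches to them. With LANG=C, Encoding.default_external is typically
# US-ASCII; with LANG=en_US.UTF-8 it is UTF-8.

# Raw bytes of "café" as written by a UTF-8 editor.
CAFE_BYTES = "caf\u00E9".b

# Hypothetical helper: tag the bytes with an encoding, then run a regexp
# match the way a front-matter parser would.
def try_match(bytes, encoding)
  text = bytes.dup.force_encoding(encoding)
  text =~ /caf/ ? :ok : :no_match
rescue ArgumentError => e
  e.message
end

puts "locale charmap:   #{Encoding.locale_charmap}"
puts "default_external: #{Encoding.default_external}"
puts try_match(CAFE_BYTES, "US-ASCII")  # invalid byte sequence in US-ASCII
puts try_match(CAFE_BYTES, "UTF-8")     # ok
```

Exporting a UTF-8 locale makes Ruby pick UTF-8 as the default, which is why the two `export` lines make the US-ASCII error go away.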

@tatey

tatey commented Nov 9, 2010

Just got bitten by this after recently switching to 1.9 as my default Ruby. Thanks for the patch.

@lloydh

lloydh commented May 3, 2011

I think I'm running into this problem, but only when running the jekyll command via SSH, not when I run jekyll directly on the host machine. Jekyll also runs without errors on the client machine; it's only over SSH that I hit this problem:

/usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/convertible.rb:26:in `read_yaml': invalid byte sequence in US-ASCII (ArgumentError)
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/post.rb:39:in `initialize'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:119:in `new'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:119:in `block in read_posts'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:117:in `each'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:117:in `read_posts'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:211:in `read_directories'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:88:in `read'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:79:in `process'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/bin/jekyll:164:in `<top (required)>'
    from /usr/local/bin/jekyll:19:in `load'
    from /usr/local/bin/jekyll:19:in `<main>'

I haven't tried lmmendes' fix yet (sorry, how and where do I declare those locale variables? Just on the host machine, or on both?), but does anybody have an idea why SSH causes this problem?

Thanks.

@Kwpolska

Kwpolska commented May 3, 2011

Put these two lines in your .bashrc:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

@lloydh

lloydh commented May 3, 2011

Thanks Kwpolska.

I ended up having to put those lines in my .profile, but they did the trick.
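When the same command behaves differently over SSH, one way to narrow it down is to print the settings that decide how Ruby tags file contents and compare the output from a local terminal with the output over SSH. A minimal sketch; `encoding_report` is a hypothetical helper, and it assumes the difference lies in the locale variables the session receives.

```ruby
# Sketch: report the locale settings that decide how Ruby tags file
# contents, so a local terminal session and an SSH session can be compared.
LOCALE_VARS = %w[LC_ALL LC_CTYPE LANG].freeze

def encoding_report(env = ENV)
  report = LOCALE_VARS.map { |name| [name, env[name] || "(unset)"] }.to_h
  # These two come from the running Ruby process itself, not from `env`.
  report["locale_charmap"] = Encoding.locale_charmap
  report["default_external"] = Encoding.default_external.name
  report
end

encoding_report.each { |key, value| puts format("%-17s %s", key, value) }
```

If the SSH run shows the locale variables unset and `default_external` as US-ASCII, the `.profile` or `.bashrc` fix above is the one that applies.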

@dengwh

dengwh commented Aug 20, 2011

I just got a similar error.
My environment is Windows XP with Ruby 1.9.2.
Any recommendations for Windows?

Thanks.

@fhemberger
Contributor

@dengwh, for Windows set the same environment variables. In your cmd.exe, type

set LC_ALL=en_US.UTF-8
set LANG=en_US.UTF-8

@stereobooster
Contributor

@dengwh for windows you can use

chcp 65001  

This seems connected to #117.

@sdsalyer

I'm trying to get a post-receive hook to work on Arch Linux with Ruby 1.9, and I'm getting this US-ASCII error. I've tried adding the UTF-8 settings to my .profile, but I'm still getting the error. I assume the git hook doesn't use my .profile, though. Any further suggestions?

EDIT: I just applied the patch to this file and it works fine now. Duh... and thank you!
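For scripts that run outside a login shell, such as a git hook, a locale-independent alternative is to set the encoding inside Ruby itself. A minimal sketch, not part of Jekyll; `read_post` is a hypothetical helper. `Encoding.default_external=` changes the default for files read afterwards, while passing `encoding:` to `File.read` avoids relying on global state at all.

```ruby
# Sketch: don't depend on the caller's locale (e.g. a git post-receive
# hook that never reads ~/.profile).

# Option 1: change the process-wide default before any files are read.
Encoding.default_external = Encoding::UTF_8

# Option 2 (hypothetical helper): state the encoding explicitly per read.
def read_post(path)
  File.read(path, encoding: "UTF-8")
end
```

Option 2 is the more contained choice, since changing `default_external` affects every file the process opens afterwards.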

@stereobooster
Contributor

Connected to #226 and #201.

@ehtb

ehtb commented Jul 12, 2012

This fix worked for me, whereas the others didn't: http://stackoverflow.com/a/8274677/1303499

@ghost

ghost commented Jul 16, 2012

I had a text file with a ü, but had accidentally saved it with ANSI encoding. Changing the encoding to UTF-8 fixed it for me. @stereobooster's patch would be very helpful, though.
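Re-saving the file as UTF-8 in an editor is the simplest fix. The same conversion can be sketched in Ruby, assuming the "ANSI" code page is Windows-1252 (common on Western European Windows installs; other systems use other code pages). `ansi_to_utf8` is a hypothetical helper, and its "already valid UTF-8" check is only a heuristic.

```ruby
# Sketch: convert text saved in a legacy "ANSI" code page to UTF-8.
# Assumption: the code page is Windows-1252; pass another source
# encoding if the file came from a different system.
def ansi_to_utf8(bytes, source: "Windows-1252")
  as_utf8 = bytes.dup.force_encoding("UTF-8")
  # Heuristic: bytes that already form valid UTF-8 are left alone.
  return as_utf8 if as_utf8.valid_encoding?

  # Raises Encoding::UndefinedConversionError if a byte has no mapping
  # in the source encoding.
  bytes.dup.force_encoding(source).encode("UTF-8")
end

# "ü" is the single byte 0xFC in Windows-1252 but 0xC3 0xBC in UTF-8.
puts ansi_to_utf8("Gr\xFC\xDFe".b)   # Grüße
```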

@kevinSuttle

I'm still getting errors, and they just started out of nowhere:

/Users/kevinsuttle/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/jekyll-0.11.2/lib/jekyll/convertible.rb:29:in `read_yaml': invalid byte sequence in UTF-8 (ArgumentError)

This isn't new by the way. See issues 117, 188, 493, 135.

@parkr
Member

parkr commented Jan 2, 2013

Merged in #718.

@heidsoft

Liquid Exception: invalid byte sequence in UTF-8 in index.html

jekyll locked and limited conversation to collaborators Feb 27, 2017