[Sass] Determine encoding from @charset if specified.
Closes sassgh-183
Closes sassgh-184
nex3 committed May 29, 2010
1 parent 67b1b19 commit 9e92944
Showing 6 changed files with 262 additions and 8 deletions.
5 changes: 5 additions & 0 deletions doc-src/SASS_CHANGELOG.md
@@ -5,6 +5,11 @@

## 3.0.7 (Unreleased)

### Encoding Support

Add support for `@charset` for declaring the encoding of a stylesheet.
For details see {file:SASS_REFERENCE.md#encodings the reference}.

### Bug Fixes

* When compiling a file named `.sass` but with SCSS syntax specified,
26 changes: 26 additions & 0 deletions doc-src/SASS_REFERENCE.md
@@ -303,6 +303,32 @@ Available options are:
{#quiet-option} `:quiet`
: When set to true, causes warnings to be disabled.

### Encodings

When running on Ruby 1.9 and later, Sass is aware of the character encoding of documents
and will handle them the same way that CSS would.
By default, Sass assumes that all stylesheets are encoded
using your operating system's default encoding.
For many users this will be `UTF-8`, the de facto standard for the web.
For some users, though, it may be a locale-specific encoding.

If you want to use a different encoding for your stylesheet
than your operating system default,
you can use the `@charset` declaration just like in CSS.
Add `@charset "encoding-name";` at the very beginning of the stylesheet
(before any whitespace or comments)
and Sass will interpret the stylesheet using the given encoding.
Note that whatever encoding you use, it must be convertible to Unicode.
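
As a rough illustration of this rule, here is a minimal plain-Ruby sketch
that decodes raw stylesheet bytes according to a leading `@charset`.
The helper `decode_with_charset` is hypothetical and written only for this example;
it is not part of Sass, and unlike `Haml::Util.check_sass_encoding`
it ignores BOMs and non-ASCII-compatible encodings.

```ruby
# Hypothetical helper for illustration only; it is not part of Sass.
# Decodes raw stylesheet bytes to UTF-8, honoring a leading @charset.
def decode_with_charset(bytes, default = Encoding.default_external)
  binary = bytes.dup.force_encoding("BINARY")
  # The declaration only counts if it is the very first thing in the file.
  name = binary[/\A@charset "([^"]*)"/, 1]
  # Encoding.find raises ArgumentError if the name is unknown.
  source = name ? Encoding.find(name) : default
  # Reinterpret the bytes in the source encoding, then transcode.
  binary.force_encoding(source).encode("UTF-8")
end

# In IBM866 the bytes E2 80 9C are three Cyrillic letters.
raw = "@charset \"IBM866\";\n\xE2\x80\x9C { a: b }\n"
css = decode_with_charset(raw)
puts css.encoding
puts css
```

The string is first treated as raw bytes so that the declaration can be found
without trusting whatever encoding Ruby attached to it;
only then is it tagged with the declared encoding and converted.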

Sass will also respect any Unicode BOMs and non-ASCII-compatible Unicode encodings
[as specified by the CSS spec](http://www.w3.org/TR/CSS2/syndata.html#charset),
although this is *not* the recommended way
to specify the character set for a document.
Note that Sass does not support the obscure `UTF-32-2143`,
`UTF-32-3412`, `EBCDIC`, `IBM1026`, and `GSM 03.38` encodings,
since Ruby does not have support for them
and they're highly unlikely to ever be used in practice.
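
For illustration, BOM detection can be sketched in plain Ruby as a prefix check on the raw bytes.
The helper `sniff_bom` below is hypothetical and not part of Sass;
Sass itself matches BOMs with per-encoding regular expressions,
so treat this as a simplified sketch of the idea rather than the actual implementation.

```ruby
# Hypothetical helper for illustration only; it is not part of Sass.
# Returns the encoding implied by a leading byte-order mark, or nil.
def sniff_bom(bytes)
  # The UTF-32LE mark is listed before the UTF-16LE mark
  # because it starts with the same two bytes.
  marks = [
    ["\x00\x00\xFE\xFF", Encoding::UTF_32BE],
    ["\xFF\xFE\x00\x00", Encoding::UTF_32LE],
    ["\xEF\xBB\xBF",     Encoding::UTF_8],
    ["\xFE\xFF",         Encoding::UTF_16BE],
    ["\xFF\xFE",         Encoding::UTF_16LE]
  ]
  binary = bytes.dup.force_encoding("BINARY")
  marks.each do |mark, encoding|
    # Compare bytes with bytes to avoid encoding-compatibility errors.
    return encoding if binary.start_with?(mark.dup.force_encoding("BINARY"))
  end
  nil
end

utf16 = "\uFEFFa { b: c }".encode("UTF-16LE")
p sniff_bom(utf16)
```

A stylesheet without a BOM yields `nil`, in which case the `@charset` declaration
or the default encoding would decide.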

## CSS Extensions

### Nested Rules
72 changes: 72 additions & 0 deletions lib/haml/util.rb
@@ -434,6 +434,78 @@ def check_encoding(str)
      return str
    end

    # Like {\#check\_encoding}, but also checks for a `@charset` declaration
    # at the beginning of the file and uses that encoding if it exists.
    #
    # The Sass encoding rules are simple.
    # If a `@charset` declaration exists,
    # we assume that that's the original encoding of the document.
    # Otherwise, we use whatever encoding Ruby has.
    # Then we convert that to UTF-8 to process internally.
    # The UTF-8 end result is what's returned by this method.
    #
    # @param str [String] The string of which to check the encoding
    # @yield [msg] A block in which an encoding error can be raised.
    #   Only yields if there is an encoding error
    # @yieldparam msg [String] The error message to be raised
    # @return [(String, Encoding)] The original string encoded as UTF-8,
    #   and the source encoding of the string (or `nil` under Ruby 1.8)
    # @raise [Encoding::UndefinedConversionError] if the source encoding
    #   cannot be converted to UTF-8
    # @raise [ArgumentError] if the document uses an unknown encoding with `@charset`
    def check_sass_encoding(str, &block)
      return check_encoding(str, &block), nil if ruby1_8?
      # We allow any printable ASCII characters but double quotes in the charset decl
      bin = str.dup.force_encoding("BINARY")
      encoding = Haml::Util::ENCODINGS_TO_CHECK.find do |enc|
        bin =~ Haml::Util::CHARSET_REGEXPS[enc]
      end
      charset, bom = $1, $2
      if charset
        charset = charset.force_encoding(encoding).encode("UTF-8")
        if endianness = encoding[/[BL]E$/]
          begin
            Encoding.find(charset + endianness)
            charset << endianness
          rescue ArgumentError # Encoding charset + endianness doesn't exist
          end
        end
        str.force_encoding(charset)
      elsif bom
        str.force_encoding(encoding)
      end

      str = check_encoding(str, &block)
      return str.encode("UTF-8"), str.encoding
    end

    unless ruby1_8?
      # @private
      def _enc(string, encoding)
        string.encode(encoding).force_encoding("BINARY")
      end

      # We could automatically add in any non-ASCII-compatible encodings here,
      # but there's not really a good way to do that
      # without manually checking that each encoding
      # encodes all ASCII characters properly,
      # which takes long enough to affect the startup time of the CLI.
      ENCODINGS_TO_CHECK = %w[UTF-8 UTF-16BE UTF-16LE UTF-32BE UTF-32LE]

      CHARSET_REGEXPS = Hash.new do |h, e|
        h[e] =
          begin
            # /\A(?:\uFEFF)?@charset "(.*?)"|\A(\uFEFF)/
            Regexp.new(/\A(?:#{_enc("\uFEFF", e)})?#{
              _enc('@charset "', e)}(.*?)#{_enc('"', e)}|\A(#{
              _enc("\uFEFF", e)})/)
          rescue
            # /\A@charset "(.*?)"/
            Regexp.new(/\A#{_enc('@charset "', e)}(.*?)#{_enc('"', e)}/)
          end
      end
    end

    # Checks to see if a class has a given method.
    # For example:
    #
32 changes: 27 additions & 5 deletions lib/sass/css.rb
@@ -14,7 +14,12 @@ module Sass
  #     Sass::CSS.new("p { color: blue }").render(:sass) #=> "p\n  color: blue"
  #     Sass::CSS.new("p { color: blue }").render(:scss) #=> "p {\n  color: blue; }"
  class CSS
    # @param template [String] The CSS code
    # @param template [String] The CSS stylesheet.
    #   This stylesheet can be encoded using any encoding
    #   that can be converted to Unicode.
    #   If the stylesheet contains an `@charset` declaration,
    #   that overrides the Ruby encoding
    #   (see {file:SASS_REFERENCE.md#encodings the encoding documentation})
    # @option options :old [Boolean] (false)
    #   Whether or not to output old property syntax
    #   (`:color blue` as opposed to `color: blue`).
@@ -37,18 +42,35 @@ def initialize(template, options = {})
    # @return [String] The resulting Sass or SCSS code
    # @raise [Sass::SyntaxError] if there's an error parsing the CSS template
    def render(fmt = :sass)
      Haml::Util.check_encoding(@template) do |msg, line|
        raise Sass::SyntaxError.new(msg, :line => line)
      end

      check_encoding!
      build_tree.send("to_#{fmt}", @options).strip + "\n"
    rescue Sass::SyntaxError => err
      err.modify_backtrace(:filename => @options[:filename] || '(css)')
      raise err
    end

    # Returns the original encoding of the document,
    # or `nil` under Ruby 1.8.
    #
    # @return [Encoding, nil]
    # @raise [Encoding::UndefinedConversionError] if the source encoding
    #   cannot be converted to UTF-8
    # @raise [ArgumentError] if the document uses an unknown encoding with `@charset`
    def source_encoding
      check_encoding!
      @original_encoding
    end

    private

    def check_encoding!
      return if @checked_encoding
      @checked_encoding = true
      @template, @original_encoding = Haml::Util.check_sass_encoding(@template) do |msg, line|
        raise Sass::SyntaxError.new(msg, :line => line)
      end
    end

    # Parses the CSS template and applies various transformations
    #
    # @return [Tree::Node] The root node of the parsed tree
40 changes: 37 additions & 3 deletions lib/sass/engine.rb
@@ -133,6 +133,11 @@ def comment?
    }.freeze

    # @param template [String] The Sass template.
    #   This template can be encoded using any encoding
    #   that can be converted to Unicode.
    #   If the template contains an `@charset` declaration,
    #   that overrides the Ruby encoding
    #   (see {file:SASS_REFERENCE.md#encodings the encoding documentation})
    # @param options [{Symbol => Object}] An options hash;
    #   see {file:SASS_REFERENCE.md#sass_options the Sass options documentation}
    def initialize(template, options={})
@@ -155,9 +160,12 @@ def initialize(template, options={})
    #
    # @return [String] The CSS
    # @raise [Sass::SyntaxError] if there's an error in the document
    # @raise [Encoding::UndefinedConversionError] if the source encoding
    #   cannot be converted to UTF-8
    # @raise [ArgumentError] if the document uses an unknown encoding with `@charset`
    def render
      return _to_tree.render unless @options[:quiet]
      Haml::Util.silence_haml_warnings {_to_tree.render}
      return _render unless @options[:quiet]
      Haml::Util.silence_haml_warnings {_render}
    end
    alias_method :to_css, :render

@@ -170,10 +178,28 @@ def to_tree
      Haml::Util.silence_haml_warnings {_to_tree}
    end

    # Returns the original encoding of the document,
    # or `nil` under Ruby 1.8.
    #
    # @return [Encoding, nil]
    # @raise [Encoding::UndefinedConversionError] if the source encoding
    #   cannot be converted to UTF-8
    # @raise [ArgumentError] if the document uses an unknown encoding with `@charset`
    def source_encoding
      check_encoding!
      @original_encoding
    end

    private

    def _render
      rendered = _to_tree.render
      return rendered if ruby1_8?
      return rendered.encode(source_encoding)
    end

    def _to_tree
      @template = check_encoding(@template) {|msg, line| raise Sass::SyntaxError.new(msg, :line => line)}
      check_encoding!

      if @options[:syntax] == :scss
        root = Sass::SCSS::Parser.new(@template).parse
@@ -190,6 +216,14 @@ def _to_tree
      raise e
    end

    def check_encoding!
      return if @checked_encoding
      @checked_encoding = true
      @template, @original_encoding = check_sass_encoding(@template) do |msg, line|
        raise Sass::SyntaxError.new(msg, :line => line)
      end
    end

    def tabulate(string)
      tab_str = nil
      comment_tab_str = nil
95 changes: 95 additions & 0 deletions test/sass/engine_test.rb
@@ -1,4 +1,5 @@
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
require File.dirname(__FILE__) + '/../test_helper'
require 'sass/engine'
require 'stringio'
@@ -2038,6 +2039,94 @@ def test_ascii_incompatible_encoding_error
      assert_equal(3, e.sass_line)
      assert_equal('Invalid UTF-16LE character "\xFE"', e.message)
    end

    def test_same_charset_as_encoding
      assert_renders_encoded(<<CSS, <<SASS)
@charset "utf-8";
fóó {
  a: b; }
CSS
@charset "utf-8"
fóó
  a: b
SASS
    end

    def test_different_charset_than_encoding
      assert_renders_encoded(<<CSS.force_encoding("IBM866"), <<SASS)
@charset "ibm866";
fóó {
  a: b; }
CSS
@charset "ibm866"
fóó
  a: b
SASS
    end

    def test_different_encoding_than_system
      assert_renders_encoded(<<CSS.encode("IBM866"), <<SASS.encode("IBM866"))
тАЬ {
  a: b; }
CSS
тАЬ
  a: b
SASS
    end

    def test_multibyte_charset
      assert_renders_encoded(<<CSS.encode("UTF-16BE"), <<SASS.encode("UTF-16BE").force_encoding("UTF-8"))
@charset "utf-16be";
fóó {
  a: b; }
CSS
@charset "utf-16be"
fóó
  a: b
SASS
    end

    def test_multibyte_charset_without_endian_specifier
      assert_renders_encoded(<<CSS.encode("UTF-32LE"), <<SASS.encode("UTF-32LE").force_encoding("UTF-8"))
@charset "utf-32";
fóó {
  a: b; }
CSS
@charset "utf-32"
fóó
  a: b
SASS
    end

    def test_utf8_bom
      assert_renders_encoded(<<CSS, <<SASS.force_encoding("BINARY"))
fóó {
  a: b; }
CSS
\uFEFFfóó
  a: b
SASS
    end

    def test_utf16le_bom
      assert_renders_encoded(<<CSS.encode("UTF-16LE"), <<SASS.encode("UTF-16LE").force_encoding("BINARY"))
fóó {
  a: b; }
CSS
\uFEFFfóó
  a: b
SASS
    end

    def test_utf32be_bom
      assert_renders_encoded(<<CSS.encode("UTF-32BE"), <<SASS.encode("UTF-32BE").force_encoding("BINARY"))
fóó {
  a: b; }
CSS
\uFEFFfóó
  a: b
SASS
    end
  end

  private
@@ -2046,6 +2135,12 @@ def assert_hash_has(hash, expected)
    expected.each {|k, v| assert_equal(v, hash[k])}
  end

  def assert_renders_encoded(css, sass)
    result = render(sass)
    assert_equal css.encoding, result.encoding
    assert_equal css, result
  end

  def render(sass, options = {})
    munge_filename options
    Sass::Engine.new(sass, options).render