:< separator in LDIF not being parsed correctly #16

Open
twiz718 opened this Issue February 04, 2013 · 1 comment

Alex Khanin

dn: MYDNHERE
sn: Khanin
givenName: Alex
whenCreated: 20080910232037.0Z
displayName: Khanin, Alex
department: MYDEPTHERE
sAMAccountName: myloginhere
mail: MYEMAILHERE
manager: MYMGRDNHERE
thumbnailPhoto:< file:///var/tmp/ldapsearch-thumbnailPhoto-S8oDGY

The file at file:///var/tmp/ldapsearch-thumbnailPhoto-S8oDGY exists and is readable (it contains JPEG data).

If you run LDAP::LDIF.parse_file() on this LDIF, you get the following error:

from script/rails:6:in `(root)'
irb(main):004:0> LDAP::LDIF.parse_file("/var/tmp/akhanin.ldif")

ArgumentError: invalid byte sequence in UTF-8
from org/jruby/RubyRegexp.java:1487:in `=~'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:105:in `unsafe_char?'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:323:in `parse_entry'
from org/jruby/RubyArray.java:1613:in `each'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:184:in `parse_entry'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:481:in `parse_file'
from org/jruby/RubyIO.java:1183:in `open'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:439:in `parse_file'
from (irb):4:in `evaluate'
from org/jruby/RubyKernel.java:1066:in `eval'
from org/jruby/RubyKernel.java:1392:in `loop'
from org/jruby/RubyKernel.java:1174:in `catch'
from org/jruby/RubyKernel.java:1174:in `catch'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands/console.rb:47:in `start'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands/console.rb:8:in `start'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands.rb:41:in `(root)'
from org/jruby/RubyKernel.java:1027:in `require'
from script/rails:6:in `(root)'
irb(main):005:0>

When I run "file" on that thumbnailPhoto I get the following:
ldapsearch-thumbnailPhoto-S8oDGY: JPEG image data, JFIF standard 1.01

Now if I remove the last line in the ldif (with the thumbnail ":<" reference), it parses just fine.

Alexey Chebotar
Collaborator

The problem is that ruby-ldap was not written to work with UTF-8, and the method unsafe_char? fails when parsing the file:

# return *true* if +str+ contains a character with an ASCII value > 127 or
# a NUL, LF or CR. Otherwise, *false* is returned.
#
def LDIF.unsafe_char?( str )
  # This could be written as a single regex, but this is faster.
  str =~ /^[ :]/ || str =~ /[\x00-\x1f\x7f-\xff]/
end

Wikipedia:

ASCII was incorporated into the Unicode character set as the first 128 symbols, so the ASCII characters have the same numeric codes in both sets. This allows UTF-8 to be backward compatible with ASCII, a significant advantage.

So the range \x00-\x1f is valid and passes, but \x7f-\xff is invalid in UTF-8 and should be replaced with one or more other ranges. I do not know exactly which ones.
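One possible direction, as a minimal sketch rather than a tested patch: compare raw bytes instead of UTF-8 characters. It assumes the error comes from matching a string that is tagged as UTF-8 but holds bytes that are not valid UTF-8 (such as JPEG data). The module name LdifSketch is hypothetical and not part of ruby-ldap or jruby-ldap.

```ruby
# Hypothetical byte-oriented variant of LDIF.unsafe_char?; not the library's code.
module LdifSketch
  # Control characters (including NUL, LF and CR), DEL, and every byte above 127.
  # The /n flag makes this a binary (ASCII-8BIT) regex.
  UNSAFE_BYTES = /[\x00-\x1f\x7f-\xff]/n

  # Returns true if +str+ starts with a space or colon, or contains an unsafe byte.
  def self.unsafe_char?(str)
    # Work on a binary-tagged copy so the regex engine never validates the
    # bytes as UTF-8; the caller's string is left untouched.
    bytes = str.dup.force_encoding(Encoding::BINARY)
    !!(bytes =~ /\A[ :]/n || bytes =~ UNSAFE_BYTES)
  end
end
```

With this approach the first bytes of a JPEG file ("\xFF\xD8...") are reported as unsafe instead of raising, while a plain ASCII value such as "Khanin, Alex" is still reported as safe.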

Patches are welcome.
