tenderlove / nokogiri
-
Unable to output the HTML::Document in the original encoding
5 comments Created 13 days ago by romanbsd
Labels: 1.4.2, tenderlove
Consider this scenario:
require 'open-uri'
require 'nokogiri'
puts Nokogiri::VERSION_INFO
puts Nokogiri::LIBXML_ICONV_ENABLED
f = open('http://www.hometheater.co.il/').read; true
puts f.encoding.to_s
p = Nokogiri::HTML(f); true
puts p.encoding
puts p.meta_encoding
p.to_html
Result:
{"warnings"=>[], "nokogiri"=>"1.4.1", "libxml"=>{"binding"=>"extension", "compiled"=>"2.7.6", "loaded"=>"2.7.6"}}
true
ASCII-8BIT
windows-1255
windows-1255
encoding error : output conversion failed due to conv error, bytes 0xEE 0xD7 0x92 0xD7
I/O error : encoder error
So it effectively prohibits me from outputting this page in the original "windows-1255" encoding.
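For anyone checking the Ruby side of this separately from libxml2, here is a minimal sketch using only the standard String encoding API. The helper name `transcode` and the sample bytes are made up for illustration, and it assumes (as the ASCII-8BIT line above suggests) that the string arrives untagged and really holds windows-1255 bytes.

```ruby
# Hypothetical helper: treat +bytes+ as text in the +source+ encoding
# and transcode it to +target+. The input string is not modified.
def transcode(bytes, source, target)
  bytes.dup.force_encoding(source).encode(target)
end

# Four bytes spelling a Hebrew word in windows-1255 (sample data only).
raw = "\xF9\xEC\xE5\xED".b

utf8 = transcode(raw, 'Windows-1255', 'UTF-8')
back = transcode(utf8, 'UTF-8', 'Windows-1255')

puts utf8.bytes.length        # each Hebrew letter takes two bytes in UTF-8
puts back.bytes == raw.bytes  # the round trip preserves the original bytes
```

If the round trip works in plain Ruby, the conversion failure is more likely in the output path than in the data itself.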
Comments
-
I've been running into some weirdness in switching from Nokogiri 1.3.3 to 1.4.1.
I've written a small script to demo the namespace weirdness.
If you run it using 1.4.1, you should see what's wrong: oai_dc:dc element namespaces beginning with xmlns aren't being properly added.
Now, here is the weirdness: It's exactly the same code, but in one the namespaces are strings, the other they are symbols!
Huh?
Furthermore, things start working fine if I
- remove any element from the builder, like "xml.identifier"
- move the strings xml generation before the symbols xml generation
- keep namespaces as symbols, but change the attribute :verb on xml.request to a string
These all seem very random. Is this desired behaviour, or some very weird quirk I was unlucky enough to stumble upon?
Comments
-
DocumentFragment#xpath fails to find specific attribute for elements at the root of the fragment
3 comments Created 15 days ago by Phrogz
Labels: 1.4.2
require 'nokogiri'
html = DATA.read
doc1 = Nokogiri::HTML(html)
doc2 = Nokogiri::HTML::DocumentFragment.parse(html)
ELEMENT_ONLY = ".//h2"
WITH_ID = ".//h2[@id='foo']"
p doc1.xpath(ELEMENT_ONLY).first['id'],
  doc1.xpath(WITH_ID),
  doc2.xpath(ELEMENT_ONLY).first['id'],
  doc2.xpath(WITH_ID)
#=> "foo"
#=> [#<Nokogiri::XML::Element:0x80a3c168 name="h2" attributes=[#<Nokogiri::XML::Attr:0x80a3bbb8 name="id" value="foo">] children=[#<Nokogiri::XML::Text:0x80a3b288 "Heading 1">]>]
#=> "foo"
#=> []
__END__
<h2 id="foo">Heading 1</h2>
Same problem applies to at_xpath.
Comments
Workaround is to use css/at_css on the DocumentFragment, which only works if your id attributes do not have colons or periods in the name.
The plot thickens. Apparently it fails to find elements at the root of the fragment, but succeeds if they're nested:
require 'nokogiri'
s1 = "<a href='foo'>hi</a>"
s2 = "<a href='foo'>hi</a>\n"
s3 = "<a href='foo'>hi</a><a href='bar'>bye</a>"
s4 = "<a href='foo'>hi</a>\n<a href='bar'>bye</a>"
s5 = "<p><a href='foo'>hi</a></p>"
s6 = "<a href='foo'>hi</a><p><a href='bar'>bye</a></p>"
[s1,s2,s3,s4,s5,s6].each do |s|
  fragment = Nokogiri::HTML::DocumentFragment.parse(s)
  p s, fragment.xpath('.//a[@href]').length
  puts ""
end
#=> "<a href='foo'>hi</a>"
#=> 0
#=>
#=> "<a href='foo'>hi</a>\n"
#=> 0
#=>
#=> "<a href='foo'>hi</a><a href='bar'>bye</a>"
#=> 0
#=>
#=> "<a href='foo'>hi</a>\n<a href='bar'>bye</a>"
#=> 0
#=>
#=> "<p><a href='foo'>hi</a></p>"
#=> 1
#=>
#=> "<a href='foo'>hi</a><p><a href='bar'>bye</a></p>"
#=> 1
Similarly, an xpath like .//a/@href will only select the attribute in elements not at the root of the fragment.
tenderlove
Wed Jan 27 21:39:06 -0800 2010
| link
I believe this is related to the fact that we just need to redo the partial implementation. I suggest that if you can, grab a prerelease version of nokogiri and use the Node#parse method.
We're going to try backing the fragment code with Node#parse for the next release.
-
CSS3 not selector
3 comments Created about 1 month ago by mattheworiordan
Labels: tenderlove, 1.4.2
I am trying to use the :not() selector of CSS3 to filter out the first list item from an unordered list.
When I run this code, however, I run into an error "RuntimeError: xmlXPathCompOpEval: function first-child not found"
d = Nokogiri::HTML.parse('<html><body><ul><li id="1">1</li><li id="2">2</li><li id="3">3</li></ul></body></html>')
d.css('ul li:not(:first-child)')
I have tried changing the CSS selector as follows, and got some strange results:
d.css('ul li:not(#2)') # works
d.css('ul li:not(li)') # fails with error Nokogiri::CSS::SyntaxError: unexpected 'li' after ':not('
d.css('ul li:not(:last-child)') # fails with the same original error of function last-child not found
I believe that according to the spec at http://www.w3.org/TR/css3-selectors/#pseudo-classes, these CSS3 selectors should work?
Any help would be appreciated.
Matt
Comments
tenderlove
Tue Jan 05 09:01:53 -0800 2010
| link
Yes, this looks like a bug. :-(
I believe the CSS parser is accepting this string, but translating it to the wrong XPath. I will investigate
tenderlove
Wed Jan 20 13:38:34 -0800 2010
| link
Updating with the CSS spec reference:
thumblemonks
Mon Feb 08 15:01:09 -0800 2010
| link
I'm having this problem as well. Would be swell if it were fixed :) Doesn't seem like there is a function to support "not".
-
Can't force encoding of SAX parser
1 comment Created about 1 month ago by libc
Labels: 1.4.2, tenderlove
One-liner (must be in utf-8):
Nokogiri::HTML::SAX::Parser.new(Nokogiri::XML::SAX::Document.new).parse_memory('<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">И', 'utf-8')
causes
encoding error : input conversion failed due to input error, bytes 0x98 0x00 0x98 0x00
encoding error : input conversion failed due to input error, bytes 0x98 0x00 0x98 0x00
encoding error : input conversion failed due to input error, bytes 0x98 0x00 0x98 0x00
I/O error : encoder error
It tries to convert the document to windows-1251 (the encoding specified in meta), not utf-8 (the encoding I'm forcing it to).
After staring at the libxml code for some time, I came up with this diff: http://gist.github.com/266416
The diff changes xmlSwitchEncoding to xmlSwitchToEncoding. These functions have the same description ( http://xmlsoft.org/html/libxml-parserInternals.html#xmlSwitchEncoding ).
The test could look like http://gist.github.com/266432
Thanks
Comments
Version of the patch with ext and ffi: http://gist.github.com/288568
-
XML::EntityReference migration
3 comments Created 2 months ago by sunshineco
Labels: 1.4.2, tenderlove
Given the following input:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<p>Foo <a href="/" title="Fran&ccedil;ois">François</a> bar.</p>
</body>
</html>
The output of Nokogiri::XML(example_input).to_s is:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<p>Foo &ccedil;<a href="/" title="Franois">François</a> bar.</p>
</body>
</html>
Notice that the &ccedil; entity reference has migrated from the title="..." attribute to a position just before the <a> element. The entity reference has become a sibling of Foo and <a>.
Is this a misunderstanding on my part and an abuse of XML::Document, or a Nokogiri bug?
Comments
tenderlove
Tue Dec 29 10:59:36 -0800 2009
| link
Could be a bug in libxml2. I need to investigate
sunshineco
Tue Dec 29 11:07:50 -0800 2009
| link
I should note that this problem manifests when processed via Nokogiri::XML(example_input).to_s but not with Nokogiri::HTML(example_input).to_s.
tenderlove
Tue Dec 29 11:27:04 -0800 2009
| link
Ah, that is interesting. I suspect a problem in libxml2. Thanks for the update!
-
When Nokogiri finds a syntax error in an xpath query, it raises an exception. Unfortunately the query isn't included in the error message. It would be great if the query were also displayed, so you don't have to print it from your own code.
1) Error:
test_spec {Definition::Profile} 004 [should return a reminder message](Definition::Profile):
Nokogiri::XML::XPath::SyntaxError: Invalid expression
    nokogiri (1.4.1) lib/nokogiri/xml/node.rb:142:in `evaluate'
    nokogiri (1.4.1) lib/nokogiri/xml/node.rb:142:in `xpath'
    nokogiri (1.4.1) lib/nokogiri/xml/node.rb:139:in `map'
    nokogiri (1.4.1) lib/nokogiri/xml/node.rb:139:in `xpath'
    lib/definition.rb:121:in `reminder'
    /test/lib/definition_test.rb:102:in `test_spec {Definition::Profile} 004 [should return a reminder message]'
Comments
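Until something like this lands in the library, one possible stopgap is to wrap the call on the caller's side. This is only a sketch of the general idea, not Nokogiri code: `with_query_in_error`, `FakeXPathSyntaxError`, and `broken_evaluate` are all hypothetical names, and the fake error class stands in for Nokogiri::XML::XPath::SyntaxError so the snippet runs on its own.

```ruby
# Stand-in for the real exception class so the sketch runs without Nokogiri.
class FakeXPathSyntaxError < StandardError; end

# Hypothetical helper: runs the block with the query and, if it raises,
# re-raises the same exception class with the query appended to the message.
def with_query_in_error(query)
  yield query
rescue StandardError => e
  raise e.class, "#{e.message}: #{query}", e.backtrace
end

# Hypothetical evaluator that rejects every query, to exercise the error path.
def broken_evaluate(_query)
  raise FakeXPathSyntaxError, 'Invalid expression'
end

begin
  with_query_in_error('//profile[@id=') { |q| broken_evaluate(q) }
rescue FakeXPathSyntaxError => e
  puts e.message
end
```

In real code the block would call doc.xpath(q) instead of the stand-in evaluator.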
-
XML::Builder should support setting the DTD tag for a document
0 comments Created 7 days ago by davemanriv
I ended up asking about this on IRC because it was non-obvious from the documentation that it wasn't supported, and someone else just asked on the mailing list, so I'm adding it here.
Comments
-
document.rb:104: [BUG] object allocation during garbage collection phase
21 comments Created 6 months ago by darryl
Labels: flavorjones, REE
require 'nokogiri'
GC_HACK = false
GC.disable if GC_HACK # will delay "error : Name is not from the document dictionnary"
gc_count = 0
cycles = 0
loop do
  cycles = cycles + 1
  if GC_HACK
    if gc_count > 10000
      GC.enable
      GC.start
      p "gc start cycles: #{cycles}"
      sleep 10
      gc_count = 0
      GC.disable
    end
    gc_count = gc_count + 1
  end
  p "cycles: #{cycles}" if cycles%1000 == 0
  doc = Nokogiri::XML::Document.parse("<bad>blinky</bad>")
  doc.xpath('/bad').each{ |t|
    new_node = Nokogiri::XML::Node.new('bad', doc)
    new_node.content = 'clyde'
    t.replace(new_node)
  }
end
# spits out:
# "element bad: error : Name is not from the document dictionnary 'bad'"
# between 2 and 20 thousand times then dies with:
#/opt/ruby-enterprise-1.8.6-20090610/lib/ruby/gems/1.8/gems/nokogiri-1.3.2/lib/nokogiri/xml/document.rb:104: [BUG] object allocation during garbage collection phase
#ruby 1.8.6 (2008-08-11) [i686-linux]
#
#Aborted
# setting GC_HACK = true will still give the warning but not
# crash (or at least hasn't crashed yet at 250000 cycles :) )
# system:
# ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-linux]
# Ruby Enterprise Edition 20090610
# libxml 2.7.3
# nokogiri (1.3.2)
# gentoo (linux 2.6.29 SMP PREEMPT)
# does not seem to happen on macos
# might be a REE bug
Comments
flavorjones
Tue Aug 18 23:27:27 -0700 2009
| link
Darryl,
I can't reproduce this on Ubuntu using ruby 1.8.7-p72, 1.8.6-p369, 1.9.1-p243, or 1.8.7-p174. I'm building REE now to test with that.
-mike
flavorjones
Tue Aug 18 23:33:33 -0700 2009
| link
Cannot reproduce with ruby-enterprise-1.8.6-20090610
My next best guess is that this is libxml2-version-dependent, since I ran the above tests with 2.6.32. I'll try 2.7.3.
flavorjones
Wed Aug 19 04:51:48 -0700 2009
| link
OK, unable to reproduce with libxml2 2.7.3.
Can you provide more information about your configuration? Please include the output from "nokogiri -v".
tenderlove
Sun Oct 04 21:16:59 -0700 2009
| link
I can't repro this with libxml2 2.7.5 and REE ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-darwin10.0.0] Ruby Enterprise Edition 20090610.
No updates on this ticket for 2 months, so I'll assume it's fixed in master. Please reopen and update if the problem still exists.
sunshineco
Mon Dec 07 02:32:32 -0800 2009
| link
I am running into this problem repeatedly on Windows Vista while developing a website with nanoc3 (http://nanoc.stoneship.org/). It is very difficult to reproduce outside of the larger project generation session, but I have managed to arrive at a scenario within irb which results in this crash 100% of the time (for me). Nokogiri was installed via gem install nokogiri.
C:\>ruby -v
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-mingw32]
C:\>nokogiri -v
---
warnings: []
nokogiri: 1.4.0
libxml:
  binding: extension
  compiled: 2.7.3
  loaded: 2.7.3
To reproduce, run irb and paste the following text into the window:
require 'nokogiri'
d = Nokogiri::HTML('
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
</body>
</html>
')
d.to_html
Note that when d.to_html emits the document, the charset has been magically and suspiciously mutated from "charset=utf-8" to "charset=IBM437". Perhaps this is related to the libxml2 bug mentioned in this thread: http://groups.google.com/group/nokogiri-talk/msg/607fefd4f43d7acc
After pasting the above content into irb, the actual "object allocation during garbage collection phase" crasher is triggered by pressing the up arrow (readline-recall) three times. (Though triggered by readline interaction in this reproduction recipe, the same crash has been triggered in other ways. During website development, a simple puts invocation can trigger it.) Upon the third up arrow press, the following diagnostics are produced:
C:\>irb
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416: [BUG] Segmentation fault
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-mingw32]
-- control frame ----------
c:0039 p:---- s:0210 b:0210 l:000209 d:000209 CFUNC :chars
c:0038 p:---- s:0208 b:0208 l:000207 d:000207 CFUNC :each
c:0037 p:---- s:0206 b:0206 l:000205 d:000205 CFUNC :inject
c:0036 p:0220 s:0202 b:0202 l:000201 d:000201 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416
c:0035 p:0066 s:0195 b:0194 l:000193 d:000193 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8433
c:0034 p:1268 s:0186 b:0186 l:000185 d:000185 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:2790
c:0033 p:4100 s:0155 b:0155 l:000154 d:000154 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3653
c:0032 p:0101 s:0117 b:0117 l:000116 d:000116 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4580
c:0031 p:0284 s:0114 b:0114 l:000113 d:000113 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4641
c:0030 p:0021 s:0107 b:0107 l:000106 d:000106 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4705
c:0029 p:0104 s:0103 b:0103 l:000102 d:000102 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727
c:0028 p:0097 s:0098 b:0098 l:000097 d:000097 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/readline.rb:40
c:0027 p:0051 s:0090 b:0090 l:000089 d:000089 METHOD C:/ruby/lib/ruby/1.9.1/irb/input-method.rb:115
c:0026 p:0016 s:0086 b:0086 l:002604 d:000085 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:131
c:0025 p:0037 s:0083 b:0083 l:000082 d:000082 METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:263
c:0024 p:0011 s:0078 b:0078 l:002604 d:000077 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:130
c:0023 p:---- s:0076 b:0076 l:000075 d:000075 FINISH
c:0022 p:---- s:0074 b:0074 l:000073 d:000073 CFUNC :call
c:0021 p:0022 s:0071 b:0071 l:000070 d:000070 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:189
c:0020 p:0019 s:0067 b:0067 l:000066 d:000066 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:103
c:0019 p:0026 s:0063 b:0063 l:000062 d:000062 METHOD C:/ruby/lib/ruby/1.9.1/irb/slex.rb:205
c:0018 p:0055 s:0055 b:0055 l:000054 d:000054 METHOD C:/ruby/lib/ruby/1.9.1/irb/slex.rb:75
c:0017 p:0041 s:0050 b:0050 l:000049 d:000049 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:287
c:0016 p:0017 s:0046 b:0046 l:000045 d:000045 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:263
c:0015 p:0027 s:0041 b:0041 l:000024 d:000040 BLOCK C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:234
c:0014 p:---- s:0038 b:0038 l:000037 d:000037 FINISH
c:0013 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :loop
c:0012 p:0009 s:0033 b:0033 l:000024 d:000032 BLOCK C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:230
c:0011 p:---- s:0031 b:0031 l:000030 d:000030 FINISH
c:0010 p:---- s:0029 b:0029 l:000028 d:000028 CFUNC :catch
c:0009 p:0023 s:0025 b:0025 l:000024 d:000024 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:229
c:0008 p:0042 s:0022 b:0022 l:002604 d:002604 METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:145
c:0007 p:0011 s:0019 b:0019 l:001a7c d:000018 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:69
c:0006 p:---- s:0017 b:0017 l:000016 d:000016 FINISH
c:0005 p:---- s:0015 b:0015 l:000014 d:000014 CFUNC :catch
c:0004 p:0172 s:0011 b:0011 l:001a7c d:001a7c METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:68
c:0003 p:0039 s:0006 b:0006 l:002604 d:00122c EVAL C:/ruby/bin/irb:12
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:002604 d:002604 TOP
---------------------------
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416: [BUG] object allocation during garbage collection phase
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-mingw32]
-- control frame ----------
c:0039 p:---- s:0210 b:0210 l:000209 d:000209 CFUNC :chars
c:0038 p:---- s:0208 b:0208 l:000207 d:000207 CFUNC :each
c:0037 p:---- s:0206 b:0206 l:000205 d:000205 CFUNC :inject
c:0036 p:0220 s:0202 b:0202 l:000201 d:000201 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416
c:0035 p:0066 s:0195 b:0194 l:000193 d:000193 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8433
c:0034 p:1268 s:0186 b:0186 l:000185 d:000185 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:2790
c:0033 p:4100 s:0155 b:0155 l:000154 d:000154 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3653
c:0032 p:0101 s:0117 b:0117 l:000116 d:000116 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4580
c:0031 p:0284 s:0114 b:0114 l:000113 d:000113 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4641
c:0030 p:0021 s:0107 b:0107 l:000106 d:000106 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4705
c:0029 p:0104 s:0103 b:0103 l:000102 d:000102 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727
c:0028 p:0097 s:0098 b:0098 l:000097 d:000097 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/readline.rb:40
c:0027 p:0051 s:0090 b:0090 l:000089 d:000089 METHOD C:/ruby/lib/ruby/1.9.1/irb/input-method.rb:115
c:0026 p:0016 s:0086 b:0086 l:002604 d:000085 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:131
c:0025 p:0037 s:0083 b:0083 l:000082 d:000082 METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:263
c:0024 p:0011 s:0078 b:0078 l:002604 d:000077 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:130
c:0023 p:---- s:0076 b:0076 l:000075 d:000075 FINISH
c:0022 p:---- s:0074 b:0074 l:000073 d:000073 CFUNC :call
c:0021 p:0022 s:0071 b:0071 l:000070 d:000070 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:189
c:0020 p:0019 s:0067 b:0067 l:000066 d:000066 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:103
c:0019 p:0026 s:0063 b:0063 l:000062 d:000062 METHOD C:/ruby/lib/ruby/1.9.1/irb/slex.rb:205
c:0018 p:0055 s:0055 b:0055 l:000054 d:000054 METHOD C:/ruby/lib/ruby/1.9.1/irb/slex.rb:75
c:0017 p:0041 s:0050 b:0050 l:000049 d:000049 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:287
c:0016 p:0017 s:0046 b:0046 l:000045 d:000045 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:263
c:0015 p:0027 s:0041 b:0041 l:000024 d:000040 BLOCK C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:234
c:0014 p:---- s:0038 b:0038 l:000037 d:000037 FINISH
c:0013 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :loop
c:0012 p:0009 s:0033 b:0033 l:000024 d:000032 BLOCK C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:230
c:0011 p:---- s:0031 b:0031 l:000030 d:000030 FINISH
c:0010 p:---- s:0029 b:0029 l:000028 d:000028 CFUNC :catch
c:0009 p:0023 s:0025 b:0025 l:000024 d:000024 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:229
c:0008 p:0042 s:0022 b:0022 l:002604 d:002604 METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:145
c:0007 p:0011 s:0019 b:0019 l:001a7c d:000018 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:69
c:0006 p:---- s:0017 b:0017 l:000016 d:000016 FINISH
c:0005 p:---- s:0015 b:0015 l:000014 d:000014 CFUNC :catch
c:0004 p:0172 s:0011 b:0011 l:001a7c d:001a7c METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:68
c:0003 p:0039 s:0006 b:0006 l:002604 d:00122c EVAL C:/ruby/bin/irb:12
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:002604 d:002604 TOP
---------------------------
-- Ruby level backtrace information-----------------------------------------
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416:in `chars'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416:in `each'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416:in `inject'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416:in `_rl_adjust_point'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8433:in `_rl_find_next_mbchar'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:2790:in `update_line'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3653:in `rl_redisplay'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4580:in `_rl_internal_char_cleanup'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4641:in `readline_internal_charloop'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4705:in `readline_internal'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727:in `readline'
C:/ruby/lib/ruby/site_ruby/1.9.1/readline.rb:40:in `readline'
C:/ruby/lib/ruby/1.9.1/irb/input-method.rb:115:in `gets'
C:/ruby/lib/ruby/1.9.1/irb.rb:131:in `block (2 levels) in eval_input'
C:/ruby/lib/ruby/1.9.1/irb.rb:263:in `signal_status'
C:/ruby/lib/ruby/1.9.1/irb.rb:130:in `block in eval_input'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `call'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `buf_input'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:103:in `getc'
C:/ruby/lib/ruby/1.9.1/irb/slex.rb:205:in `match_io'
C:/ruby/lib/ruby/1.9.1/irb/slex.rb:75:in `match'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:287:in `token'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:263:in `lex'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:234:in `block (2 levels) in each_top_level_statement'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `loop'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `block in each_top_level_statement'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `catch'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `each_top_level_statement'
C:/ruby/lib/ruby/1.9.1/irb.rb:145:in `eval_input'
C:/ruby/lib/ruby/1.9.1/irb.rb:69:in `block in start'
C:/ruby/lib/ruby/1.9.1/irb.rb:68:in `catch'
C:/ruby/lib/ruby/1.9.1/irb.rb:68:in `start'
C:/ruby/bin/irb:12:in `<main>'
[NOTE] You may encounter a bug of Ruby interpreter. Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Buried in the middle of the diagnostic is the "object allocation during garbage collection phase" diagnostic.
Note that the crash does not occur if the d.to_html is removed from the input.
flavorjones
Mon Dec 07 06:42:41 -0800 2009
| link
Thanks for reporting. Give me a day or so to reproduce and investigate.
flavorjones
Mon Dec 07 19:45:17 -0800 2009
| link
Note that I can't repro this on Linux with the same ruby, nokogiri and libxml2 versions. Going to try on windows.
sunshineco
Mon Dec 07 20:01:55 -0800 2009
| link
Not unexpected with this sort of problem, it does seem to be a moving target. Today, after restarting the Windows machine, I can still reproduce it, but the circumstances have changed slightly. It now crashes upon the second up-arrow press rather than the third.
I wonder also if the magically and randomly changing charset at output is related: perhaps some trashed or uninitialized memory. I have seen several values show up at output time for charset, including "utf-8", "IBM437", and "US-ASCII" (if I recall correctly), depending upon the input to Nokogiri::HTML() even though all inputs explicitly specify "utf-8" via "meta http-equiv".
sunshineco
Mon Dec 07 20:17:53 -0800 2009
| link
I should note that, in my tests at least, using the Nokogiri::XML() constructor rather than Nokogiri::HTML() (and emitting via to_s()) sidesteps the problem. Whether this is because the corruption is not occurring in this case or because it is less severe is unknown. Given that the original reporter of this bug was testing with Nokogiri::XML::Document.parse(), one might suspect that the problem is still present with XML though manifesting externally less frequently.
flavorjones
Mon Dec 07 20:45:32 -0800 2009
| link
Can you do me a huge favor, and check if this is occurring with Nokogiri 1.3.3?
I ask because the auxiliary DLLs (zlib, iconv, libxml, libxslt) we released with 1.4.0 were from a different source than in previous versions.
sunshineco
Mon Dec 07 23:40:40 -0800 2009
| link
I have not been able to reproduce this crash with Nokogiri 1.3.3 (nokogiri-1.3.3-x86-mingw32.gem).
C:\>nokogiri -v
---
warnings: []
nokogiri: 1.3.3
libxml:
  binding: extension
  compiled: 2.7.3
  loaded: 2.7.3
sunshineco
Mon Dec 07 23:44:48 -0800 2009
| link
Regarding 1.4.0, I also was able to reproduce the crash reliably on Windows Vista with the command:
nokogiri --type html dummy.html
where dummy.html contains the minimal HTML document indicated earlier.
Once nokogiri starts irb, I enter the following two expressions and then press up-arrow a few times, resulting in a crash.
@doc
@doc.to_html
sunshineco
Tue Dec 08 00:01:06 -0800 2009
| link
This problem is very much a moving target. Having finished testing 1.3.3, I removed all versions of Nokogiri and re-installed 1.4.0. Following re-installation, I can no longer get it to crash via nokogiri --type html dummy.html. I also can no longer trigger the crash during website generation, which is how I originally discovered the issue since I could hardly keep it from crashing at that time.
The earlier mentioned technique of pasting the sample code into a DOS window running irb, however, still crashes 1.4.0 reliably when up arrow is pressed a couple times.
flavorjones
Tue Dec 08 20:36:10 -0800 2009
| link
whoop, just reproduced on windows. will update when I have more info.
sunshineco
Tue Dec 08 22:49:04 -0800 2009
| link
Perhaps this ticket should be re-opened? It is still marked as closed.
tenderlove
Wed Dec 09 09:41:19 -0800 2009
| link
Ugh. Apparently even I can't reopen issues. Can we open a new ticket and reference this one?
flavorjones
Wed Dec 09 10:39:16 -0800 2009
| link
Done. Moving the conversation to #188.
sunshineco
Wed Dec 09 12:53:23 -0800 2009
| link
tenderlove
Wed Dec 09 12:58:22 -0800 2009
| link
Wow. Apparently I can't reopen it while viewing the closed ticket. I have to go to the index. :-(
Well, it's reopened now.
sunshineco
Thu Dec 10 01:47:35 -0800 2009
| link
My earlier note where I said that I could side-step the crash by using Nokogiri::XML rather than HTML was indeed apparently just an accidental workaround. I just ran into this same crash in another situation using HTML::DocumentFragment, but replacing it with XML::DocumentFragment made no difference. The crash still occurs.
-
FFI ruby object caching should be rewritten to not use id2ref
3 comments Created 9 months ago by flavorjones
Labels: ffi
id2ref is slow and may be turned off by default in JRuby 1.4.
Discussed with wmeissner, and the probable path is to build an API into FFI that is a hash table containing address => weakref(ruby_object).
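To make the address => weakref(ruby_object) idea concrete, here is a rough pure-Ruby sketch using the standard library's WeakRef. It is not the FFI API discussed above (which did not exist yet); the class name `WeakObjectCache` and the integer addresses are invented for illustration.

```ruby
require 'weakref'

# Hypothetical cache: maps a native address (an Integer here) to a weak
# reference, so caching a wrapper object does not keep it alive.
class WeakObjectCache
  def initialize
    @refs = {}
  end

  # Remember +object+ under +address+ and return the object.
  def store(address, object)
    @refs[address] = WeakRef.new(object)
    object
  end

  # Return the live object for +address+, or nil if it was never stored
  # or has been garbage collected (stale entries are dropped).
  def fetch(address)
    ref = @refs[address]
    return nil unless ref
    return ref.__getobj__ if ref.weakref_alive?
    @refs.delete(address)
    nil
  rescue WeakRef::RefError
    # The object was collected between the liveness check and the lookup.
    @refs.delete(address)
    nil
  end
end

cache = WeakObjectCache.new
node = Object.new
cache.store(0x1000, node)
puts cache.fetch(0x1000).equal?(node)
puts cache.fetch(0x2000).inspect
```

Unlike id2ref, a lookup here never walks the object space; the cost is one hash entry per wrapped pointer.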
Comments
nicksieger
Mon Nov 16 13:32:31 -0800 2009
| link
FYI, id2ref (and objectspace) is turned off by default in JRuby. We made a conscious decision to do this because it's expensive and not feasible to manage all live objects with JRuby.
nicksieger
Mon Nov 16 13:59:28 -0800 2009
| link
Also: tools to help implement the caching in Java/JRuby:
http://java.sun.com/javase/6/docs/api/java/lang/ref/WeakReference.html
http://java.sun.com/javase/6/docs/api/java/util/WeakHashMap.html
Note the last item may not be exactly what is needed, it's a map w/ weak keys, not a map that weakly references its values.
flavorjones
Mon Nov 16 14:07:16 -0800 2009
| link
Nick, thanks for the pointers (no pun intended).
-
FFI: support varargs in error/exception callbacks
1 comment Created 9 months ago by flavorjones
Labels: ffi
we should open JIRA tickets for vararg support in FFI callbacks
then we should format the libxml error messages properly in the error/exception callbacks
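As a rough picture of what "format the messages properly" would mean once the callback receives the variadic arguments: libxml2 hands over a printf-style template plus values, which Ruby's Kernel#format can expand. The helper name and the sample template below are illustrative assumptions, not Nokogiri or FFI code.

```ruby
# Hypothetical callback body: +template+ is a printf-style message and
# +args+ are the variadic values that came with it.
def format_libxml_message(template, *args)
  format(template, *args).chomp
end

# Sample template in the style of a libxml2 parser error (illustrative only).
puts format_libxml_message("Opening and ending tag mismatch: %s line %d and %s\n", 'b', 1, 'a')
```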
Comments
flavorjones
Sun Jun 21 19:38:17 -0700 2009
| link
@tmm1 poked me about this. I'll open a ticket for it tonight.
-
FFI needs unlinkedNodes to be optimized
0 comments Created 6 months ago by flavorjones
Labels: ffi
-
This is a continuation of bug #212. Some code that demonstrates the problem.
Stick in a file nokogiri-bug.rb and run ruby nokogiri-bug.rb.
require "nokogiri"
xml = <<-XML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Stupid minimal XHTML page.</title>
</head>
<body>
<h1>Stupid minimal XHTML page.</h1>
</body>
</html>
XML
# The following code is expected to print twice
# the above <html> tag + all content.
# The code does not operate with gem nokogiri 1.4.1, demonstrating a bug.
xmldoc = Nokogiri::XML(xml)
# Getting the html node by finding the first element child
# of the entire document:
html_node_from_parent = xmldoc.children.find {|e| e.element?}
puts(html_node_from_parent.to_s)
# Getting the same html node through a different path:
# By starting at the first node in the document,
# which corresponds to "<!DOCTYPE html ..."
# and is a Nokogiri::XML::DTD object.
doctype_node = xmldoc.child
puts(doctype_node.to_s)
puts(doctype_node.class.to_s)
# According to the Nokogiri 1.4.1 documentation at
# nokogiri-1.4.1/rdoc/classes/Nokogiri/XML/DTD.html ,
# Nokogiri::XML::Node is the parent of Nokogiri::XML::DTD.
# According to the documentation at
# nokogiri-1.4.1/rdoc/classes/Nokogiri/XML/Node.html#M000193 ,
# Nokogiri::XML::Node features a next_element method.
# This method does not work on that doctype node.
# Trying to call it produces the error
# "undefined method `next_element' for #<Nokogiri::XML::DTD:0xb78a0324> (NoMethodError)"
# The same problem has been seen with other subclasses of Nokogiri::XML::Node.
html_node_from_sybling = doctype_node.next_element
puts(html_node_from_sybling)
Comments
tenderlove
Sat Feb 06 23:42:57 -0800 2010
| link
This code works fine for me. Can you give me the output of this command:
nokogiri -v
flavorjones
Sun Feb 07 19:40:25 -0800 2010
| link
This also works for me with Nokogiri 1.4.1.
AndreasKrueger
Mon Feb 08 02:11:34 -0800 2010
| link
For what it's worth: I keep getting this WARNING, which I have ignored thus far:
WARNING: Nokogiri was built against LibXML version 2.6.32, but has dynamically loaded 2.7.5
Here is what you requested:
$ nokogiri -v
---
warnings: []
libxml:
  loaded: 2.7.5
  binding: extension
  compiled: 2.7.5
nokogiri: 1.4.1
flavorjones
Mon Feb 08 06:09:36 -0800 2010
| link
Not to be overly pedantic here, but that warning is warning you about something. It's warning you that you built against a different version than you're loading at runtime. It's warning you for lots of worthy reasons. The problem you're experiencing is probably one of them.
So, let's work on correcting the condition causing the warning first. Can you re-install the Nokogiri gem (simply 'sudo gem install nokogiri') and let us know if a) the warning no longer appears, and b) if this problem no longer plagues you.
flavorjones
Mon Feb 08 06:12:10 -0800 2010
| link
It's also interesting to see that when you run the command line script, 'nokogiri -v', you don't get any warnings. But when you run your code, you do. I don't have an explanation for why this is, since I don't know your environment, but that could complicate things.
tenderlove
Mon Feb 08 08:29:56 -0800 2010
| link
If you're loading something else (like image magick) that is compiled against libxml2, that can cause this problem.
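One way to check for this condition from inside the failing script, rather than from the nokogiri command line, is to compare the compiled and loaded versions yourself. The sketch below assumes a hash shaped like the Nokogiri::VERSION_INFO output quoted in the first issue on this page; the helper name `libxml_version_mismatch` is made up, and the hashes are literals so the snippet runs without Nokogiri.

```ruby
# Hypothetical helper: returns a description of the mismatch, or nil when
# the compiled and loaded libxml2 versions agree.
def libxml_version_mismatch(version_info)
  libxml = version_info.fetch('libxml')
  compiled = libxml['compiled']
  loaded = libxml['loaded']
  return nil if compiled == loaded
  "built against LibXML #{compiled}, but loaded #{loaded}"
end

# Literal stand-ins for Nokogiri::VERSION_INFO in the two situations.
consistent = { 'libxml' => { 'binding' => 'extension', 'compiled' => '2.7.5', 'loaded' => '2.7.5' } }
mismatched = { 'libxml' => { 'binding' => 'extension', 'compiled' => '2.6.32', 'loaded' => '2.7.5' } }

puts libxml_version_mismatch(consistent).inspect
puts libxml_version_mismatch(mismatched)
```

Running this check after all other requires in the script may show whether another library pulled in a different libxml2 first.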
-
I'm on OS X 10.6 and Ruby 1.9.1. When installing the nokogiri gem I get:
make install
/opt/local/bin/ginstall -c -m 0755 nokogiri.bundle /Users/erdah/.rvm/gems/ruby-1.9.1-p378/gems/nokogiri-1.4.1/lib/nokogiri
make: /opt/local/bin/ginstall: No such file or directory
make: *** [/Users/erdah/.rvm/gems/ruby-1.9.1-p378/gems/nokogiri-1.4.1/lib/nokogiri/nokogiri.bundle] Error 1
Resolution:
mkdir -p /opt/local/bin/
sudo ln -s /usr/bin/install /opt/local/bin/ginstall
Comments
-
Extend nokogiri API to support libxml API call xmlSchemaValidateStream
0 comments Created 4 days ago by gjeudy
This would allow validating against the XML schema as the stream is parsed, for example with the SAX parsing method. Currently one has to validate/parse the whole XML document upfront and then re-parse it to do the actual processing.
Comments
-
I had some crashes with buggy stylesheets. The stylesheets had endless loops or similar problems (stack overflow?). Nokogiri should throw an exception instead of crashing. This might also be a problem in libxslt.
Comments
nokogiri 1.4.1, ruby 1.9.1p243, libxslt 1.1.24, libxml 2.7.3-r2 on x86_64
tenderlove
Sun Feb 07 10:33:05 -0800 2010
| link
I'm going to need a stylesheet and some xml in order to deal with this. Unless I can reproduce this bug, there is nothing I can do.
Sure. I will provide an XML file and an XSLT file to reproduce this.
flavorjones
Tue Feb 09 19:18:03 -0800 2010
| link
Ping. Any chance we can get XML and XSLT to reproduce?
Huh. It works for me. What version of iconv are you using?
iconv --version
Also, iconv -l might help.
Happens both on FC12 Linux:
iconv (GNU libc) 2.11.1
and on my Mac:
iconv (GNU libiconv 1.13)
I verified by otool -L that libxml2 which nokogiri uses is in fact linked to this dylib.
I'm using
ruby 1.9.1p378 (2010-01-10 revision 26272) [i686-linux]
and
ruby 1.9.1p378 (2010-01-10 revision 26272) [i386-darwin10.2.0]
respectively.
iconv -l is rather long, how would you like me to provide it?
Just put the iconv output in a gist if you don't mind.
I tried this with 1.8.7, let me try with 1.9.1p378. I'm starting to suspect it's the encoding stuff in Ruby rather than libxml2.
Yes, I suspect that too.
http://gist.github.com/289660