Description: Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser with XPath and CSS selector support.
-
I have this quick demo code to show the issue.
Comments
-
XPath is not namespace aware
2 comments Created 5 months ago by tommorris
Labels: namespace-confusion
Nokogiri currently does not do namespaced XPath queries properly.
Take the following XML:
sampledoc = <<-EOF
<?xml version="1.0" ?>
<rdf:RDF>
<rdf:Description rdf:about="http://example.org/one">
<ex:name>Foo</ex:name>
</rdf:Description>
</rdf:RDF>
<rdf:RDF>
<rdf:Description rdf:about="http://example.org/two">
<ex:name>Bar</ex:name>
</rdf:Description>
</rdf:RDF>
EOF
(I wrote it to test Reddy, an RDF library I've written which currently works on top of libxml-ruby, but which I'd like to port to Nokogiri so that it can run on JRuby, using Nokogiri's FFI support in JRuby 1.3.0.)
Now, according to the RDF/XML Syntax specification, an RDF/XML document can be parsed from multiple root rdf:RDF nodes. This document gives this example: it has two rdf:RDF nodes, correctly namespaced in the http://www.w3.org/1999/02/22-rdf-syntax-ns# namespace. But because Nokogiri treats namespaces as basically nothing more than mildly clever attributes, the following XPath query fails:
Nokogiri::XML(sampledoc).xpath("rdf:RDF", 'rdf' => "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
Thanks for fixing the .namespace issue so it returns a namespace object rather than a prefix. Prefixes have no semantic value - the namespace URIs are what matters.
Comments
tenderlove
Sat Jul 04 14:41:33 -0700 2009
Nokogiri's xpath function is quite aware of namespaces. In fact, nokogiri's xpath function is a thin wrapper around libxml2, so I'm curious why you think it treats namespaces as "clever attributes"?
Your sample document does not declare your rdf tags inside a namespace. Please try a sample RDF document that declares the tags using namespaces. Here is a good example:
http://www.w3schools.com/rdf/rdf_example.asp
Notice how the sample rdf document provided by w3schools declares its namespaces, where your document does not.
Also, xml documents may only have one root node. If there are multiple nodes, it is not a legal XML document:
http://www.w3.org/TR/REC-xml/#dt-root
According to the RDF/XML spec, it may have multiple rdf:RDF, but they must be inside a root tag:
http://www.w3.org/TR/rdf-syntax-grammar/#section-grammar-summary
Also see section 7.2.8
-
tcmalloc error parsing "<a><b></a>" fragment with REE
2 comments Created 4 months ago by henrik
Labels: REE
Works fine with MRI, but not with REE:
henrik@Nyx ~/Code$ which nokogiri
/opt/ruby-enterprise-1.8.6-20090610/bin/nokogiri
henrik@Nyx ~/Code$ nokogiri -v
---
nokogiri: 1.3.3
warnings: []
libxml:
  compiled: 2.7.3
  loaded: 2.7.3
  binding: extension
henrik@Nyx ~/Code$ which ruby
/opt/ruby-enterprise-1.8.6-20090610/bin/ruby
henrik@Nyx ~/Code$ ruby -rubygems -e 'require "nokogiri"; puts Nokogiri::HTML::DocumentFragment.parse("<a><b></a>")'
src/tcmalloc.cc:186] Attempt to free invalid pointer: 0x201a90
Abort trap
henrik@Nyx ~/Code$
Expected output is
<a><b></b></a>
without the error.
Comments
tenderlove
Mon Sep 14 21:41:54 -0700 2009
I'm not sure what to do about this. It's working for me:
tenderlove
Sun Oct 04 21:19:21 -0700 2009
I can't repro this and the ticket hasn't been updated for almost a month. I will assume this is fixed on master.
Please reopen and update the ticket if it's still breaking against master. Thanks.
-
Requiring Nokogiri 1.2.3 under JRuby 1.2 does not load
1 comment Created 7 months ago by francois
$ cat test.rb
require "rubygems"
require "nokogiri"
doc = Nokogiri::HTML(File.read(ARGV[0]))
p doc
$ jruby -w test.rb data.html
/Users/francois/Library/Java/JRuby/jruby-1.2.0/lib/ruby/gems/1.8/gems/nokogiri-1.2.3-java/lib/nokogiri/xml/node.rb:180: undefined method `next_sibling' for class `Nokogiri::XML::Node' (NameError)
from /Users/francois/Library/Java/JRuby/jruby-1.2.0/lib/ruby/gems/1.8/gems/nokogiri-1.2.3-java/lib/nokogiri/xml/node.rb:31:in `require'
from /Users/francois/Library/Java/JRuby/current/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /Users/francois/Library/Java/JRuby/jruby-1.2.0/lib/ruby/gems/1.8/gems/nokogiri-1.2.3-java/lib/nokogiri/xml.rb:3
from /Users/francois/Library/Java/JRuby/jruby-1.2.0/lib/ruby/gems/1.8/gems/nokogiri-1.2.3-java/lib/nokogiri/xml.rb:31:in `require'
from /Users/francois/Library/Java/JRuby/current/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /Users/francois/Library/Java/JRuby/jruby-1.2.0/lib/ruby/gems/1.8/gems/nokogiri-1.2.3-java/lib/nokogiri.rb:10
from /Users/francois/Library/Java/JRuby/jruby-1.2.0/lib/ruby/gems/1.8/gems/nokogiri-1.2.3-java/lib/nokogiri.rb:36:in `require'
from /Users/francois/Library/Java/JRuby/current/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from test.rb:2
$ jruby -S gem list -l nokogiri
LOCAL GEMS
nokogiri (1.2.3)
$ jruby --version jruby 1.2.0 (ruby 1.8.6 patchlevel 287) (2009-03-16 rev 9419) [i386-java]
Comments
flavorjones
Fri May 01 21:04:15 -0700 2009
Francois,
Thanks for using Nokogiri!
Nokogiri is not supported under JRuby 1.2. You can take a look at issue #8 for the current status on JRuby support.
We are targeting JRuby 1.3 for an alpha release of Nokogiri-FFI. Currently all tests pass on 1.3.0RC1, which was just released today.
-mike
-
Subclasses of Nokogiri::XML::Node are not able to have #initialize overridden
2 comments Created 7 months ago by halorgium
Labels: 1.3.0, tenderlove
I have a failing test at:
http://github.com/halorgium/nokogiri/commit/0c32aa9b753aa2aa6f331524ec807fdd5c37468c
Comments
tenderlove
Sat May 16 14:13:35 -0700 2009
Ya, thought of this last night. I'll have a fix soon.
tenderlove
Sat May 16 19:20:58 -0700 2009
adding initialize to all nodes. closed by 9f904ba
Squashed commit of the following:
commit 14514e7
Author: Aaron Patterson aaron.patterson@gmail.com
Date: Sat May 16 19:18:40 2009 -0700
fixing up attribute initialize
commit 2969620
Author: Aaron Patterson aaron.patterson@gmail.com
Date: Sat May 16 19:16:45 2009 -0700
fixing xml comment initialize
commit ed4affc
Author: Aaron Patterson aaron.patterson@gmail.com
Date: Sat May 16 19:13:12 2009 -0700
fixing entity reference initialize
commit 87706dd
Author: Aaron Patterson aaron.patterson@gmail.com
Date: Sat May 16 19:11:07 2009 -0700
adding initialize for processing instruction
commit e02d099
Author: Aaron Patterson aaron.patterson@gmail.com
Date: Sat May 16 19:05:42 2009 -0700
getting document fragment working
commit eed074e
Author: Aaron Patterson aaron.patterson@gmail.com
Date: Sat May 16 19:02:03 2009 -0700
fixing text node initialize
commit 84cfbed
Author: Aaron Patterson aaron.patterson@gmail.com
Date: Sat May 16 19:00:28 2009 -0700
starting work on initialize method
-
SAX Push Parser should yield more specific information
6 comments Created 7 months ago by halorgium
Labels: 1.3.0, tenderlove
If possible, separating the XML namespaces from the attributes in the #start_element hook would be a big win.
And sending back the values as already-formed hashes would be nice too.
Comments
I wrote a patch for this. It uses #start_element_ns and #end_element_ns. It maintains backward compatibility with #start_element and #end_element.
http://github.com/sprsquish/nokogiri/commit/775c298b49b0b085b024aa098aaf412b90269d4a
tenderlove
Mon May 18 17:55:02 -0700 2009
I'm mostly OK with this patch. It shouldn't use rb_str_new2 as that will produce strings of the incorrect encoding in 1.9.
I'll apply it and fix it up.
If you haven't already, I can clean it up and re-submit. Those rb_str_new2 calls were oversights on my part.
tenderlove
Tue May 19 09:46:39 -0700 2009
If you don't mind, that would be great! If you clean up the rb_str_new2 calls, I'll apply it.
Okay, I removed the direct calls to rb_str_new*, fixed the little space mistake in xml.rb and gave proper credit to the libxml guys.
I force updated the remote branch figuring no one else was paying any attention to my fork.
http://github.com/sprsquish/nokogiri/commit/660362805ff06cd36465d08bf6dda67af6867c4f
tenderlove
Tue May 19 10:12:16 -0700 2009
applied. thanks!
-
Is there a version of Nokogiri that works with libxml2 shipped with Ubuntu 8.10
11 comments Created 8 months ago by gnufied
Well, I have been trying to install nokogiri on my Ubuntu 8.10 which has libxml2 version 2.6.32. Is there a version of nokogiri that works with that libxml2 version?
Comments
Specifically the error that I get is:
checking for xmlParseDoc() in -lxml2... no
libxml2 is missing. try 'port install libxml2' or 'yum install libxml2'
tenderlove
Thu Apr 23 09:04:23 -0700 2009
yes, but you need the libxml2 development packages installed.
do this: "aptitude install libxml2-dev"
You may need to do a similar thing for libxslt
Yup, I do have libxml2-dev installed:
root@shire:~# apt-get install libxml2-dev libxslt-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libxml2-dev is already the newest version.
Note, selecting libxslt1-dev instead of libxslt-dev
libxslt1-dev is already the newest version.
And yet I still get the error above while installing nokogiri 1.2.3.
tenderlove
Thu Apr 23 10:08:37 -0700 2009
You wouldn't happen to have libxml-dev installed, would you?
Well, let's see:
root@shire:~# apt-get remove --purge libxml-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package libxml-dev is not installed, so not removed
tenderlove
Thu Apr 23 10:23:15 -0700 2009
Alright. Run 'gem env', then go to the "INSTALLATION DIRECTORY". There should be a file "gems/nokogiri-1.2.3/ext/nokogiri/mkmf.log". Get that file and put it on gist or something. That will give me more insight into your system config, and should tell us what the problem is.
Here is the gist: http://gist.github.com/100625
tenderlove
Thu Apr 23 11:26:41 -0700 2009
Do you have ruby-dev installed as well?
I have ruby compiled from source as a shared library, so that could be the reason behind the "unable to find ruby-static" errors.
On a related note:
5070 lib % nm libxml2.a|grep xmlParseDoc hemant pts/7
00017200 T xmlParseDoc
000092b0 T xmlParseDocTypeDecl
00016560 T xmlParseDocument
U xmlParseDocument
U xmlParseDoc
U xmlParseDocument
U xmlParseDocument
U xmlParseDocument
/usr/lib
However, I have the shared library version of libxml2 as well (but the nm command won't work on shared libraries AFAIK). This is getting nasty, no doubt. I will see if I can configure nokogiri with a custom-compiled libxml2.
tenderlove
Thu Apr 23 22:53:25 -0700 2009
Okay. I'm going to close this because it sounds like environment issues
-
Adding a node with a default namespace stores it as 'no-namespace' in the parent
2 comments Created 3 months ago by david
Labels: 1.4.0, tenderlove
This works:
doc = Nokogiri::XML('<element><child xmlns="woop:de:doo" /></element>')
doc.at("//xmlns:child", 'xmlns' => 'woop:de:doo') #=> <child xmlns="woop:de:doo" />
This doesn't:
doc = Nokogiri::XML::Document.new
e = Nokogiri::XML::Node.new('element', doc)
c = Nokogiri::XML::Node.new('child', doc)
c.add_namespace(nil, 'woop:de:doo')
e.add_child(c)
doc.add_child(c)
doc.at("//xmlns:child", 'xmlns' => 'woop:de:doo') #=> nil
Comments
I'd also like to add that if you have a document like this:
<element>
<c1 xmlns="one" />
<c2 xmlns="two" />
</element>
then
doc.root.collect_namespaces.inspect #=> {'xmlns' => 'two'}
tenderlove
Fri Sep 11 21:48:15 -0700 2009
Yup. That is the danger of collect_namespaces. I think that method should be removed.
The first problem is fixed here: c6e5fa0
-
Adding a Document to a Node causes segfault on program exit
1 comment Created 2 months ago by david
$ ruby -rubygems -rnokogiri -e 'Nokogiri::XML("").root << Nokogiri::XML::Document.new'
: [BUG] Segmentation fault
ruby 1.9.1p243 (2009-07-16 revision 24175) [i486-linux]
-- control frame ----------
c:0001 p:0000 s:0002 b:0002 l:0011a4 d:0011a4 TOP
-- Ruby level backtrace information-----------------------------------------
-- C level backtrace information ------------------------------------------- 0xb76cd6e9 /usr/lib/libruby-1.9.1.so.1.9(rb_vm_bugreport+0x69) [0xb76cd6e9]
0xb75e907f /usr/lib/libruby-1.9.1.so.1.9 [0xb75e907f]
0xb75e911a /usr/lib/libruby-1.9.1.so.1.9(rb_bug+0x3a) [0xb75e911a]
0xb7674fa4 /usr/lib/libruby-1.9.1.so.1.9 [0xb7674fa4]
0xb7768410 [0xb7768410]
0xb767b817 /usr/lib/libruby-1.9.1.so.1.9(st_foreach+0x17) [0xb767b817]
0xb72d0aa9 /var/lib/gems/1.9.1/gems/nokogiri-1.3.3/lib/nokogiri/nokogiri.so [0xb72d0aa9]
0xb75f954d /usr/lib/libruby-1.9.1.so.1.9 [0xb75f954d]
0xb75f96e4 /usr/lib/libruby-1.9.1.so.1.9 [0xb75f96e4]
0xb75f98dc /usr/lib/libruby-1.9.1.so.1.9(rb_gc_call_finalizer_at_exit+0x17c) [0xb75f98dc]
0xb75eb0ee /usr/lib/libruby-1.9.1.so.1.9 [0xb75eb0ee]
0xb75ec436 /usr/lib/libruby-1.9.1.so.1.9(ruby_cleanup+0x116) [0xb75ec436]
0xb75ec5ee /usr/lib/libruby-1.9.1.so.1.9(ruby_run_node+0x5e) [0xb75ec5ee]
0x80487e8 ruby(main+0x68) [0x80487e8]
0xb73dfb56 /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb73dfb56]
0x80486e1 ruby [0x80486e1]
[NOTE] You may encounter a bug of Ruby interpreter. Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
^CAborted (core dumped)
-- SNIP --
This is ruby 1.9.1, but the same thing happened to me on 1.8.7.
The reason I think this is significant is that I was mistakenly adding a Document to a Node in my code, and it kept failing without me understanding why, so the same thing may happen to other people.
Thanks.
Comments
tenderlove
Tue Oct 13 11:17:11 -0700 2009
raising an exception if someone tries to reparent a *::Document. closed by c557764
-
I don't know if this is really a bug or if I'm being really stupid, but here goes:
When you have a double quote character inside your attribute value, nokogiri does this:
attribute='a string with a double " quote'
Unfortunately, this is making the xerces java parser (or whatever parser Openfire uses) throw a hissy fit. While I understand this may be a xerces bug (OWPOU), not a nokogiri one, it would still be nice if we could have the option of using
attribute="a string with a double &quot; quote"
(I tried using send(:native_content=, "...") but that has the same result.)
Comments
tenderlove
Mon Nov 09 15:41:15 -0800 2009
What version of libxml2 are you using? Paste the output of this command, if you can:
$ nokogiri -v
Also, if you could post the code to reproduce this, that would be great. So far, libxml2 is behaving how you want it to:
d = Nokogiri::XML('<root />')
d.root['foo'] = 'hello " world'
puts d.to_xml # => '<root foo="hello &quot; world"/>'
tenderlove
Tue Nov 24 20:44:29 -0800 2009
Closing as there has been no response.
-
[PATCH] adding Builder#<< for appending raw strings
4 comments Created 2 months ago by dudleyf
Here's a tiny patch implementing the functionality talked about here[0].
[0] http://rubyforge.org/pipermail/nokogiri-talk/2009-March/000224.html
diff --git a/lib/nokogiri/xml/builder.rb b/lib/nokogiri/xml/builder.rb
index 89cd63a..5cdcafd 100644
--- a/lib/nokogiri/xml/builder.rb
+++ b/lib/nokogiri/xml/builder.rb
@@ -277,6 +277,12 @@ module Nokogiri
         @doc.to_xml
       end
 
+      ###
+      # Append the given raw XML +string+ to the document
+      def << string
+        @doc.fragment(string).children.each { |x| insert(x) }
+      end
+
       def method_missing method, *args, &block # :nodoc:
         if @context && @context.respond_to?(method)
           @context.send(method, *args, &block)
diff --git a/test/xml/test_builder.rb b/test/xml/test_builder.rb
index d4a6e26..12b1f86 100644
--- a/test/xml/test_builder.rb
+++ b/test/xml/test_builder.rb
@@ -117,6 +117,26 @@ module Nokogiri
         assert_equal 'hello', builder.doc.at('baz').content
       end
 
+      def test_raw_append
+        builder = Nokogiri::XML::Builder.new do |xml|
+          xml.root do
+            xml << 'hello'
+          end
+        end
+
+        assert_equal 'hello', builder.doc.at('//root/foo').content
+      end
+
+      def test_raw_append_with_instance_eval
+        builder = Nokogiri::XML::Builder.new do
+          root do
+            self << 'hello'
+          end
+        end
+
+        assert_equal 'hello', builder.doc.at('//root/foo').content
+      end
+
       def test_cdata
         builder = Nokogiri::XML::Builder.new do
           root {
--
1.6.4.3
Comments
tenderlove
Sun Oct 04 20:40:05 -0700 2009
I've applied the patch, but next time please make sure the tests pass. After applying the patch, I got these errors:
1) Error:
test_raw_append(Nokogiri::XML::TestBuilder):
NoMethodError: undefined method `content' for nil:NilClass
test/xml/test_builder.rb:127:in `test_raw_append'
2) Error:
test_raw_append_with_instance_eval(Nokogiri::XML::TestBuilder):
NoMethodError: undefined method `content' for nil:NilClass
test/xml/test_builder.rb:137:in `test_raw_append_with_instance_eval'
tenderlove
Sun Oct 04 20:40:29 -0700 2009
XML Builder can append raw strings. closed by 98b10d2
tenderlove
Mon Oct 05 08:38:30 -0700 2009
No problem! :-)
-
1.4.0-java doesn't work on jruby-1.4.0
1 comment Created about 1 month ago by manalang
Labels: 1.4.1
Z:\rich\dev\blather\examples>jruby -rrubygems echo.rb [user]@[domain].com/ruby [pwd] [server]
C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/nokogiri-1.4.0-java/lib/nokogiri/ffi/libxml.rb:6: Nokogiri requires JRuby 1.4.0RC1 or later on Windows (RuntimeError)
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/nokogiri-1.4.0-java/lib/nokogiri/ffi/libxml.rb:31:in `require'
from C:/Program Files/jruby-1.4.0/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/nokogiri-1.4.0-java/lib/nokogiri.rb:11
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/nokogiri-1.4.0-java/lib/nokogiri.rb:31:in `require'
from C:/Program Files/jruby-1.4.0/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/blather-0.4.7/lib/blather.rb:58
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/blather-0.4.7/lib/blather.rb:3:in `each'
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/blather-0.4.7/lib/blather.rb:3
... 9 levels...
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/blather-0.4.7/lib/blather/client.rb:36:in `require'
from C:/Program Files/jruby-1.4.0/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from echo.rb:3
Comments
tenderlove
Tue Nov 24 20:45:42 -0800 2009
-
segfault when adding a blank NodeSet as next sibling
3 comments Created 8 months ago by jmhodges
Labels: 1.3.0, tenderlove
This code segfaults ruby and I think that's pretty charming. Haven't dug into it, yet.
h = Nokogiri::HTML.parse("<p></p>")
node = h.at('p')
node.add_next_sibling(node.children)
Comments
This might be related to lighthouse ticket 53 but that ticket was resolved before the patch was made. Specific comment is:
We need to do the same thing for add_previous_sibling, and add_next_sibling I suspect.
tenderlove
Wed Apr 29 08:25:33 -0700 2009
Nope, I think this is a different problem. I'll take a look
tenderlove
Wed Apr 29 23:06:41 -0700 2009
-
adding a non-empty NodeSet raises an obscure error
2 comments Created 8 months ago by jmhodges
Labels: 1.3.0, tenderlove
This code raises a RuntimeError with the message "Could not reparent node (xmlDocCopyNode)":
h = Nokogiri::HTML.parse("<p>sometext</p>")
n = h.at('p')
n.add_next_sibling(n.children)
We should probably pick one of these two things to do instead:
raise an error that says "Hey, dummy, that's a NodeSet, not a Node. We need you to add just n.children[0] through n.children.last one by one with add_next_sibling"
flatten the NodeSet out for the user
Comments
tenderlove
Wed Apr 29 08:26:50 -0700 2009
I think this is the same issue as #29. I'm going to raise an ArgumentError exception in those methods if you're not passing in a Node.
tenderlove
Wed Apr 29 22:59:15 -0700 2009
Oops. This got fixed with 762c832
-
memory corruption from meta tags claiming a charset of ISO-8859-1
13 comments Created 7 months ago by jmhodges
Labels: flavorjones, libxml2
Here's an example. google-try contains nothing but the meta tag as seen in the read.
j = Nokogiri::HTML.parse('<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">')
=> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
irb(main):002:0> l=nil; j.traverse{|e| l = e.to_s if e.name == "meta" }
=> nil
irb(main):003:0> l
=> "<\003 ction_view/template_handler.rb=\"\" ction_view/template_handler.rb=\"\"></\003>"
Sometimes this segfaults. Sometimes it works fine. Hunting.
Oh, and if you remove the 'charset=ISO-8859-1', it does not happen ever.
And this is OS X, libxml2 2.7.3 (2.7.3_0 in MacPorts).
Comments
flavorjones
Sat May 16 13:27:47 -0700 2009
jmhodges - what version of Nokogiri are you using?
With 1.2.3 I cannot reproduce. On 1.2.4 I get an encoding error:
encoding error : output conversion failed due to conv error, bytes 0xE7 0xB7 0x10 0x3D
I/O error : encoder error
Yeah, 1.2.4 and 1.2.3 for me. Here's a gist of an irb session that didn't segfault immediately with nokogiri 1.2.3 and here's one that did with nokogiri 1.2.4.
As you can see, I only get the error you did when the code does not segfault in 1.2.3. What is the output of meta_tag.to_s (l.to_s in those posts) on your machine?
I did see similar errors in a larger file that had some entities seemingly translated to utf-8 while the file was being parsed (I believe! Not sure!) as ISO-8859-1.
Sorry, added a simpler bit of code that expresses it. Using #at instead of #traverse.
flavorjones
Sat May 16 18:07:58 -0700 2009
whoop, I've managed to get valgrind to complain. consider it reproduced, and I'm on the case.
Cool, I wasn't able to get the OS X beta valgrind to talk to me much about it but I haven't used it before. If you want, I can run it again and post something up (assuming, you're not already on OS X).
flavorjones
Sat May 16 19:39:02 -0700 2009
This is a libxml2 bug. I just wrote a C program that reproduces it, and will be submitting it to libxml2's tracker tonight.
flavorjones
Sat May 16 19:46:48 -0700 2009
C program to reproduce is at http://gist.github.com/112897
flavorjones
Sat May 16 19:59:16 -0700 2009
Bug has been filed at http://bugzilla.gnome.org/show_bug.cgi?id=582913
Also, sorry for not hunting this down myself. I got lost in a thicket of build problems last night.
flavorjones
Sun May 17 16:35:03 -0700 2009
awwww, now you made me blush.
flavorjones
Sun Jun 07 15:09:57 -0700 2009
closing this ticket, since it's now in the good hands of the libxml2 team.
flavorjones
Wed Aug 12 16:51:03 -0700 2009
F yer I: Daniel V (maintainer of libxml2) has updated the libxml2 bugzilla ticket. Here's his comment
Okay, I have fixed htmlSetMetaEncoding() to be nicer, not destroy existing meta encoding elements just update the property and only if needed, which is never the case if you just output part of the current document without asking for encoding changes. Fixed in git (8d7c1b7ab296ea2e8c8d18d7b8f3d24e0963f8ff) thanks for the report, Daniel
You might want to check out that version and verify that it addresses your particular pain. Have a nice day.
-
Is jRuby/FFI leaking memory?
6 comments Created 5 months ago by latompa
Labels: ffi, flavorjones
Run the following under jruby 1.3.1 and nokogiri 1.3.2
On my machine, I get up to "ran 800 times", then my machine starts to run really slow.
The test eats up ~500Mb right off the bat, which leads me to believe memory is leaking.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
REPORT_EVERY=100
NUM_THREADS=1
def test_nokogiri()
  threads=[] ; a =-1
  NUM_THREADS.times do
    threads << Thread.new('some_thread') do |t|
      xml = open('http://railstips.org/assets/2008/8/9/timeline.xml').read
      while true do
        doc = Nokogiri::HTML(xml)
        (p "ran #{a} times") if ((a+=1) % REPORT_EVERY == 0)
      end
    end
  end
  threads.each { |aThread| aThread.join }
end
test_nokogiri
Comments
flavorjones
Fri Jul 24 23:10:01 -0700 2009
I concur that this appears to be a memory leak. I would like to point out, though, that when running the same code on ruby-ffi (MRI), there is no leak.
This leads me to quietly, gently, and without proof, suggest that perhaps this is a JRuby/FFI problem.
I will put together some tests to try to reproduce these results in a vanilla (non-Nokogiri) case, and perhaps narrow the search for causes.
flavorjones
Sun Jul 26 23:55:30 -0700 2009
I have created a self-contained test case demonstrating the problem, which is available at http://gist.github.com/156081
I've opened a JIRA ticket for the JRuby team: http://jira.codehaus.org/browse/JRUBY-3832
I'll leave this ticket open for a few days, and will provide updates when I hear from the JRuby team.
I believe it's a ffi/jruby problem too.
I get the same kind of problem if I replace
doc = Nokogiri::HTML(xml)
with
doc=Nokogiri::LibXML.xmlReadMemory(xml, xml.length, nil,nil, nil)
which goes straight to the "attached" libXML function.
flavorjones
Mon Jul 27 19:43:13 -0700 2009
See an update from Wayne (JFFI team) here:
This fix is available now if you want to build JFFI yourself, or else you can wait until JRuby 1.3.2 (or 1.4.0). Details at the above URL.
Closing this ticket.
flavorjones
Sun Sep 06 18:24:41 -0700 2009
Just an FYI, the latest JFFI does not appear to fix this issue. I've updated the JFFI ticket.
flavorjones
Sat Sep 12 14:57:35 -0700 2009
Another update: current JFFI addresses this issue. Though it is still possible, with aggressive-enough memory usage, to hit an OOM condition, it is vanishingly unlikely that using Nokogiri will cause a full-blown OOM.
The original test code no longer suffers from the monstrous memory leak. I'm considering this issue 'closed', for realsies. Let me know if you're still experiencing issues after upgrading to the latest JFFI.
-
strings returned by xpath expression "/text()" are badly formatted
1 comment Created 3 months ago by jney
Strings returned by Nokogiri::HTML(open(url)).xpath("//xpath_expression/text()").to_ary are formatted in such a way that comparison returns false on identical strings.
To avoid the problem I have to do this: Nokogiri::HTML(open(url)).xpath("//xpath_expression").collect(&:text)
Comments
tenderlove
Sun Aug 30 10:45:37 -0700 2009
Right, because it returns a Nokogiri::XML::Text node. That is different than a string.
-
DocumentFragment lacks detailed search
1 comment Created 4 months ago by samsm
Labels: 1.4.0, tenderlove
fragment = '<p id="content">hi</p>'
Nokogiri::HTML.fragment(fragment).search('#content').length # this returns zero
Nokogiri::HTML(fragment).search('#content').length # this returns 1
Searching for an element ('p') does work, but using any CSS selector or XPath seems to always produce zero results. Non-fragment search works as I'd expect.
Comments
tenderlove
Fri Aug 28 22:23:02 -0700 2009
delegating DocumentFragment#css to the fragment children. closed by ed10f01
-
Creating a new document with a root node cloned from another causes segfault
6 comments Created 5 months ago by pdlug
Labels: 1.3.3, tenderlove
When trying to create a new document from part of an existing document via #dup a segfault results:
doc = Nokogiri::XML('test')
doc2 = Nokogiri::XML::Document.new
doc2.root = doc.root.dup(1)
Comments
Should have noted that this occurs both on Mac OS X (libxml2 2.7.3) and Gentoo Linux (libxml2 2.7.2).
flavorjones
Sun Jul 12 22:04:44 -0700 2009
This is more of the node dictionary allocation issue, which occurs when nodes have resources owned by another document. In this case, when the node is dup()ed, it still references dictionary strings owned by the original document. At GC time, things blow up.
flavorjones
Sun Jul 12 22:05:26 -0700 2009
Basically, libxml2 does not support moving nodes from one document to another, nor does it support moving dupes of a node to another document. Sigh.
tenderlove
Sun Jul 12 23:04:06 -0700 2009
But this shouldn't be that problem, right? I thought we copied the tree on dups?
flavorjones
Mon Jul 13 04:07:36 -0700 2009
This is that problem! Really. I think if we copy the document, everything works correctly. But copying a node and jamming it into another doc is definitely going to break libxml.
tenderlove
Wed Jul 15 18:07:50 -0700 2009
moving roots around will copy them and gc old roots. closed by 10f5710
-
<br /> preceded by a newline is lost when parsing an HTML fragment
5 comments Created 24 days ago by rgrove
Labels: 1.4.1, flavorjones
When a <br /> element is preceded by a newline in an HTML fragment, Nokogiri seems to remove it when the fragment is parsed. Here's an irb session demonstrating the issue (using Nokogiri 1.4.0 with libxml2 2.7.6):
>> require 'rubygems'
=> false
>> require 'nokogiri'
=> true
>> html = "First line\nSecond line<br />Broken line"
=> "First line\nSecond line<br />Broken line"
>> fragment = Nokogiri::HTML::DocumentFragment.parse(html)
=> #<Nokogiri::HTML::DocumentFragment:0x80c94f5c name="#document-fragment" children=[#<Nokogiri::XML::Text:0x80c94c64 "First line\nSecond lineBroken line">]>
>> fragment.to_xhtml
=> "First line\nSecond lineBroken line"
>> fragment.to_html
=> "First line\nSecond lineBroken line"
If I remove the newline, the fragment is parsed just fine:
>> html = "First line<br />Broken line"
=> "First line<br />Broken line"
>> fragment = Nokogiri::HTML::DocumentFragment.parse(html)
=> #<Nokogiri::HTML::DocumentFragment:0x80c8c118 name="#document-fragment" children=[#<Nokogiri::XML::Text:0x80c8be20 "First line">, #<Nokogiri::XML::Element:0x80c8bd80 name="br">, #<Nokogiri::XML::Text:0x80c8bc7c "Broken line">]>
>> fragment.to_xhtml
=> "First line<br />Broken line"
Comments
This also applies to other HTML -- I have observed it with anchors, i.e., "One line\nTwo line\n\n<a href="http://brokenlink.com">This won't be a link after parsing</a>"
However, if I wrap the text block in a <div> and </div>, it works. I'm thinking that the newline must somehow interfere with Nokogiri's ability to discern the insides as HTML.
flavorjones
Thu Dec 03 12:09:37 -0800 2009
OK, will investigate.
flavorjones
Thu Dec 03 19:27:01 -0800 2009
fixing leading text node with newline in fragment parsing. closed by b659302.
Great turnaround time, thanks a lot! Between this and the fix to the document.root.namespace exception you recently did, I'd love to see a gem bump soon so I can strip out my collection of kluge-fixes. :)
flavorjones
Fri Dec 04 10:37:18 -0800 2009
Should be bumped this weekend. Cross your fingers.
-
This segfaults, if you look at it funny out of the corner of your eye:
require "nokogiri"
class TextHandler < Nokogiri::XML::SAX::Document
  def initialize
    @chunks = []
  end
  attr_reader :chunks
  def cdata_block(string)
    characters(string)
  end
  def characters(string)
    @chunks << string.strip if string.strip != ""
  end
end
th = TextHandler.new
parser = Nokogiri::XML::SAX::Parser.new(th)
parser.parse(<<-XML)
<?xml version="1.0" encoding="utf-8"?>
<root>
  <stuff> one </stuff>
  <stuff> two </stuff>
</root>
XML
I was able to duplicate consistently for awhile, but I uninstalled and reinstalled nokogiri a few times, and now it works. It would reach the end of the document before segfaulting. The end_document event would fire, and then it would segfault shortly thereafter.
Comments
sporkmonger
Sat Oct 10 21:10:51 -0700 2009
Looks like 2.7.3.
tenderlove
Mon Oct 12 17:54:21 -0700 2009
If you were on libxml2 2.6.16, then I wouldn't be surprised. That version was very old and unstable.
I can't repro this (even with thousands of iterations), so I will assume it's a bug with 2.6.16. I am going to close this, but if you are able to repro with 2.7.3, please reopen this ticket. Thanks!
sporkmonger
Mon Oct 12 17:59:47 -0700 2009
Shouldn't require thousands of iterations, it's a happens-every-time kind of bug. However, I may have been wrong about the version of nokogiri I was using. It might have been edge.
-
Ancestor search doesn't work with a css query.
1 comment Created about 1 month ago by chriseppstein
Labels: 1.4.1, tenderlove
See this script for an example: http://gist.github.com/227429
Comments
tenderlove
Thu Nov 05 19:46:04 -0800 2009
| link
Node#matches? works in nodes contained by a DocumentFragment. closed by d41db1a
-
This simple script generates invalid output
Comments
flavorjones
Tue Nov 17 22:38:37 -0800 2009
| link
Yo. This is fixed in master. Commit 1fefd59
-
inner_html= dropping some elements
10 comments Created 3 months ago by bfolkens
Labels: 1.4.0, tenderlove
I'm having some trouble on the following environment. The code below fails on a linux install but not on a macports install. Both environments are:
$ nokogiri -v
---
warnings: []
libxml:
  loaded: 2.7.3
  binding: extension
  compiled: 2.7.3
nokogiri: 1.3.3
But on the linux environment, the following code:
require 'test/unit'
require 'rubygems'
require 'nokogiri'

class BugTest < Test::Unit::TestCase
  def test_should_parse_inner_text
    text = '<base><one>1</one><two>2</two></base>'
    doc = Nokogiri::XML(text)
    doc.search('base').each do |base_tag|
      base_tag.name = 'span'
      base_tag.inner_html = "<sup>#{base_tag.at('one').inner_text}</sup>/<sub>#{base_tag.at('two').inner_text}</sub>"
    end
    assert_equal '<span><sup>1</sup>/<sub>2</sub></span>', doc.to_html.strip
  end
end
Fails with:
test_should_parse_inner_text(BugTest) [oo.rb:15]:
<"<span><sup>1</sup>/<sub>2</sub></span>"> expected but was
<"<span><sup>1</sup></span>">.
Am I missing something obvious, or is this a bug? The above code is an abstraction from a larger project I'm working on, so I've tried to reduce it to the base of the issue. It passes the test on the MacPorts install (same version of libxml2 and nokogiri as on the Linux install).
The Ruby -v on linux is:
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
And the MacPorts install is:
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9]
Comments
tenderlove
Sat Sep 12 11:35:53 -0700 2009
| link
Strange. Are you sure nokogiri -v returns the same thing on the linux box? That seems crazy.
Yeah - I thought I was going crazy so I even did a diff. Weirdest thing I've seen... AFAIK libxml2 doesn't really depend on much does it? Or is there something else that Nokogiri depends on that might be causing this? I tried libxml2 2.7.2 just in case, but still had the same problem.
tenderlove
Sat Sep 12 11:53:41 -0700 2009
| link
libxml2 only depends on iconv and zlib. Neither of those should cause this problem.
What linux are you running?
Gentoo (default/linux/x86/2008.0 profile) over the 2.6.18-xenU-ec2-v1.0 kernel
libxml2: 2.7.3-r2
libc: 2.8_p20080602-r1
zlib: 1.2.3-r1
tenderlove
Sat Sep 12 12:01:46 -0700 2009
| link
Okay. I'll get a gentoo box up and running. Might be a little while before I get this one to repro. :-(
Thanks a ton! In the meantime I'm trying to upgrade glibc and anything else that might be out of date, just to try some different versions of things.
FWIW - The new glibc (2.9_p20081201-r2) didn't affect anything.
Here's another take on it, if it helps at all:
require 'test/unit'
require 'rubygems'
require 'nokogiri'

module Nokogiri::XML
  class Node
    include Test::Unit::Assertions

    def inner_html=(tags)
      children.each { |x| x.remove }
      assert_equal ['sup', 'sub'], document.fragment(tags).children.map {|n| n.name }
      document.fragment(tags).children.to_a.each do |node|
        add_child node
      end
      self
    end
  end
end

class BugTest < Test::Unit::TestCase
  def test_should_parse_inner_html
    text = '<base><one>1</one><two>2</two></base>'
    doc = Nokogiri::XML(text)
    base_tag = doc.at('base')
    base_tag.inner_html = "<sup>#{base_tag.at('one').inner_text}</sup><sub>#{base_tag.at('two').inner_text}</sub>"
    assert_equal ['sup', 'sub'], base_tag.children.map {|n| n.name }
  end
end
Successful return on the MacPorts install, and on the Linux install:
1) Failure:
test_should_parse_inner_html(BugTest)
    [oo2.rb:12:in `inner_html='
     oo2.rb:27:in `test_should_parse_inner_html']:
<["sup", "sub"]> expected but was
<["sup"]>.
In fact, even just this code returns only the first element and not the other:
Nokogiri::XML::DocumentFragment.parse("<one>1</one><two>2</two>")
Unless it's wrapped in another outer element like:
<x><one>1</one><two>2</two></x>
...then it returns the whole thing. And then obviously on the MacPorts install it returns an accurate copy regardless of the surrounding element.
I think I narrowed this down finally. For whatever reason, my local copy of Nokogiri (even though the gem was labeled 1.3.3) showed this diff from the copy on the linux machine (which was recently installed):
8,9c8,13
< @html_eh = node.kind_of? Nokogiri::HTML::DocumentFragment
<
---
> @klass = if node.kind_of?(Nokogiri::HTML::DocumentFragment)
>   Nokogiri::HTML::DocumentFragment
> else
>   Nokogiri::XML::DocumentFragment
> end
> #
23,25c27,28
< regex = @html_eh ? %r{^\s*<#{Regexp.escape(name)}}i :
<   %r{^\s*<#{Regexp.escape(name)}}
<
---
> regex = (@klass == Nokogiri::HTML::DocumentFragment) ? %r{^\s*<#{Regexp.escape(name)}}i \
>   : %r{^\s*<#{Regexp.escape(name)}}
So a fresh install on my MacPorts version now fails as well - lol - not quite the expected result. However, installing the gem from the master works great - so looks like this was already fixed ;)
tenderlove
Mon Sep 14 21:07:45 -0700 2009
| link
Ugh. You're right. It fails against 1.3.3. I was checking against master. :-(
I guess I can stop fighting with VirtualBox now. Thanks for letting me know!
-
Node/NodeSet.before and after methods scrub script and style tags
3 comments Created 6 months ago by vamsee
I already asked this question on nokogiri-talk, but haven't got any replies. So I'm assuming this is a bug. I'm trying to add a chunk of HTML from a rails partial, which includes some script and style tags. Unfortunately, if I try to add this raw html with Node or NodeSet.before/after, all the contents of style and script tags are lost. Also, the parsing is not done properly unless I remove html comments from the raw html I'm trying to feed. Here's an example:
doc.xpath("//head/*[1]").before("<script>var xb=25;</script>")
If I try to retrieve the added node, here's what happens:
doc.xpath("//head/*[1]")
=> <script></script>
Comments
flavorjones
Mon Jun 15 06:19:48 -0700 2009
| link
Working on it.
flavorjones
Mon Jun 15 06:58:12 -0700 2009
| link
making sure HTML fragments include comments and cdata blocks. closed by 8019e01.
-
I'm reading in an HTML, inserting a node and then serializing it. Here's before and after for one particular SCRIPT block:
<script type="text/javascript">
/* <![CDATA[ */
var jsexec = dj.util.JSExec(dj.context.jsexec);
if (window.AT_VARS.articleType.indexOf('The+Mossberg+Solution') > -1) { globalPerfTesting = true;djPerf.init( { type: 'gomez', frequency: '100', acctId: '72D329', pgId:'ArticleType: The+Mossberg+Solution', grpId: 'Article Pages' } );}
djPerf.mark('JSEXEC: top-to-9');
/* ]]> */
</script>
After:
<script type="text/javascript">
<![CDATA[
/* <![CDATA[ */
var jsexec = dj.util.JSExec(dj.context.jsexec);
if (window.AT_VARS.articleType.indexOf('The+Mossberg+Solution') > -1) { globalPerfTesting = true;djPerf.init( { type: 'gomez', frequency: '100', acctId: '72D329', pgId:'ArticleType: The+Mossberg+Solution', grpId: 'Article Pages' } );}
djPerf.mark('JSEXEC: top-to-9');
/* ]]]]><![CDATA[> */
]]>
</script>
Looks like the commented CDATA confuses it? I believe you can reproduce this by parsing and serializing this URL:
http://online.wsj.com/article/SB124514075458818255.html
The end result is a load of JS parsing errors from the browser.
Comments
tenderlove
Wed Jul 01 11:30:17 -0700 2009
| link
What version of libxml2 are you using? It seems to be serializing OK (no strange CDATA stuff) for me, and I'm on 2.7.3
tenderlove
Wed Jul 01 13:10:32 -0700 2009
| link
If you're running nokogiri 1.3.x, just do this:
$ nokogiri -v
Otherwise, in irb, require nokogiri, then print out Nokogiri::LIBXML_VERSION
tenderlove
Wed Jul 01 14:48:37 -0700 2009
| link
I can't seem to reproduce this at all. Would you mind writing a failing test case? I've tried the following test case using libxml2 version 2.7.3, 2.6.30, and 2.6.26. All of them pass. I'm afraid I'm not writing the test correctly.
require 'nokogiri'
require 'test/unit'
require 'open-uri'

HTML = open('http://online.wsj.com/article/SB124514075458818255.html').read

class TestCDATA < Test::Unit::TestCase
  def test_cdata
    before = HTML.scan(/CDATA/).length
    doc = Nokogiri::HTML HTML
    assert_equal before, doc.to_s.scan(/CDATA/).length
  end
end
This fails for me:
require 'nokogiri'
require 'test/unit'
require 'open-uri'

HTML = open('http://online.wsj.com/article/SB124514075458818255.html').read

class TestCDATA < Test::Unit::TestCase
  def test_cdata
    before = HTML.scan(/\[/).length
    doc = Nokogiri::HTML HTML
    assert_equal before, doc.serialize.scan(/\[/).length
  end
end
tenderlove
Wed Jul 01 15:28:14 -0700 2009
| link
Okay. This is definitely a problem in libxml2. 2.6.26, 2.6.30, and 2.6.32 all fail. The good news is that libxml2 2.7.3 passes the test.
Updating macports to 2.7.3 is easy (just port install libxml2 and libxslt). I'm not sure about CentOS though. I'm going to close this since it's a bug in libxml2.
-
Xpath search works with Hpricot, fails with Nokogiri
2 comments Created 5 months ago by mperham
Labels: namespace-confusion
I'll attach the test case.
Correct result is 10
[Hpricot] size = 10
[Nokogiri] size = 0
Environment:
nokogiri: 1.3.2
warnings: []
libxml:
  compiled: 2.7.3
  loaded: 2.7.3
  binding: extension
Comments
Test case: http://gist.github.com/149234
flavorjones
Fri Jul 17 17:58:14 -0700 2009
| link
you need to properly use namespaces. Nokogiri supports standard XPath, and Hpricot does not.
puts "[Nokogiri] size = #{xml.xpath('//xmlns:Video').size}" # => 10
the above gives you the right answer of 10.
you can read more about this on tenderlove's blog, at http://tenderlovemaking.com/2009/04/23/namespaces-in-xml/
-
Adding namespaces to child nodes should work inside default namespaces.
Comments
tenderlove
Fri Mar 06 16:23:00 -0800 2009
| link
Totally fixed this thing
-
This is a test.
Comments
-
We're duping a node, but not assigning a GC function.
Comments
tenderlove
Fri Mar 13 10:49:01 -0700 2009
| link
Done
-
Add DTD validation support
1 comment Created 9 months ago by tenderlove
Labels: 1.3.0, tenderlove
Nokogiri should have DTD validation support
Comments
tenderlove
Fri Mar 27 13:12:41 -0700 2009
| link
Added relaxng and xml schema support 469d528
-
Add meta encoding getters and setters for HTML::Document
1 comment Created 9 months ago by tenderlove
Labels: 1.3.0, tenderlove
Add meta encoding getters and setters for HTML::Document
http://xmlsoft.org/html/libxml-HTMLtree.html#htmlSetMetaEncoding
Comments
tenderlove
Mon Mar 23 17:18:55 -0700 2009
| link
fixed in 507c953
-
Add html tag information lookup. We should be able to look up html tag information.
http://xmlsoft.org/html/libxml-HTMLparser.html#htmlTagLookup
Comments
tenderlove
Fri Mar 27 21:59:42 -0700 2009
| link
Added tag lookup code
-
testing
Comments
-
Get nokogiri working with JRuby via FFI
Comments
flavorjones
Wed Mar 18 15:44:25 -0700 2009
| link
Watch me pull this rabbit out of my hat ...
flavorjones
Fri Apr 17 10:17:13 -0700 2009
| link
ruby-ffi (MRI) now passes all tests except 4, which involve how whitespace is being escaped.
jruby FFI is another story. hit another blocker last night where a regression was introduced. will open a JRUBY ticket on kenai once I have a failing spec.
flavorjones
Sun Apr 19 17:28:37 -0700 2009
| link
opened http://jira.codehaus.org/browse/JRUBY-3584, which was a regression
flavorjones
Sun Apr 19 21:17:03 -0700 2009
| link
FFI branch now has only 15 errors against jruby master.
flavorjones
Fri May 01 05:33:07 -0700 2009
| link
FFI branch has been merged into master. Waiting for jruby 1.3 to be released.
flavorjones
Fri May 01 15:48:33 -0700 2009
| link
All tests pass on jruby 1.3.0RC1.
flavorjones
Mon May 11 06:41:28 -0700 2009
| link
Closing! Nokogiri 1.3.0 will contain this code. Currently our RC (which looks like 1.2.4) is totally green on JRuby 1.3.0 and later.
tenderlove
Mon May 11 08:24:24 -0700 2009
| link
Woot! Internet High Five
It does not work with JRuby 1.3.1.
I get the following error:
jirb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'nokogiri'
LoadError: Could not open any of [xml2, xslt, exslt]
from /home/dan/installed/jruby/lib/ruby/1.8/ffi/library.rb:18:in `ffi_lib'
from /home/dan/installed/jruby-1.3.1-rails/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-java/lib/nokogiri/ffi/libxml.rb:5
from /home/dan/installed/jruby-1.3.1-rails/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-java/lib/nokogiri/ffi/libxml.rb:31:in `require'
from /home/dan/installed/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /home/dan/installed/jruby-1.3.1-rails/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-java/lib/nokogiri.rb:10
from /home/dan/installed/jruby-1.3.1-rails/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-java/lib/nokogiri.rb:36:in `require'
from /home/dan/installed/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from (irb):3
flavorjones
Mon Aug 17 04:43:08 -0700 2009
| link
This issue is closed. Please open a new one.
Also, please make sure you actually have libxml2 and libxslt installed on your machine.
I created an issue here:
http://github.com/tenderlove/nokogiri/issues/#issue/120
But the format looks terrible.
-
Implement Nokogiri::XML::Node#<=>
1 comment Created 9 months ago by tenderlove
Labels: 1.3.0, tenderlove
Nodes can be compared as to where they are in the document tree. Implementing spaceship would let us sort a node set by appearance in the document.
Comments
tenderlove
Sun Mar 22 00:16:13 -0700 2009
| link
Fixed here: df9a6f2
-
NodeSet#dup is broken. The duplicate set should contain the same nodes as the duplicated set.
Comments
tenderlove
Thu Mar 19 09:50:46 -0700 2009
| link
fixed in d5b5a08
-
Nokogiri::HTML('') should return an empty document
2 comments Created 9 months ago by tenderlove
Labels: 1.2.3, tenderlove
Nokogiri::HTML('') should return an empty document
Comments
tenderlove
Thu Mar 19 09:55:15 -0700 2009
| link
fixed here. c1459a3
-
The Reader API kind of sucks. Make it better.
Comments
tenderlove
Tue Apr 21 21:58:43 -0700 2009
| link
The reader API actually was fine. It just needed more documentation. closed by 1153cc3
-
Add [] and []= to NodeBuilder
1 comment Created 8 months ago by tenderlove
Labels: 1.3.0, tenderlove
Those methods would be handy for adding attributes to a node.
Comments
tenderlove
Tue Apr 21 21:35:16 -0700 2009
| link
adding NodeBuilder#[] and NodeBuilder#[]= closed by 9dd0a39
-
Make sure the SAX parsers have RDoc
Comments
tenderlove
Thu Apr 23 23:51:55 -0700 2009
| link
adding lots of documentation. closed by 10d8617
-
Make the HTML SAX interface match the xml interface
1 comment Created 8 months ago by tenderlove
Labels: 1.3.0, tenderlove
Make the HTML SAX interface match the xml interface
Comments
tenderlove
Thu Apr 23 23:33:29 -0700 2009
| link
gah, I suck. It does match the xml interface because it inherits from the xml interface. ugh.
-
Write an rdoc test to make sure that everything has RDoc.
Comments
tenderlove
Sat Apr 25 23:08:54 -0700 2009
| link
adding an rdoc test and adding lots of rdoc. closed by acddc4a
-
Add documentation to Nokogiri::XML::SAX::Parser
1 comment Created 8 months ago by tenderlove
Labels: 1.3.0, tenderlove
Add documentation to Nokogiri::XML::SAX::Parser
Comments
tenderlove
Sat Apr 25 21:36:19 -0700 2009
| link
adding documentation for Nokogiri::XML::SAX::Parser closed by 6a45aef
-
css attribute conditionals can break
1 comment Created 7 months ago by tenderlove
Labels: 1.3.0
require 'nokogiri'
require 'open-uri'

url = 'http://www.google.com/advanced_search?hl=en'
doc = Nokogiri.parse( open(url).read )

list = [
  'input[@name^="as_"]',   # okay
  'input[@name^= "as_"]',  # error
  'input[@name ^="as_"]',  # error
  'input[@name ^= "as_"]'  # error
]

list.each do | css |
  begin
    doc.search( css )
    puts "#{css} - okay"
  rescue StandardError => e
    puts "#{css} - #{e.message}"
  end
end
Comments
tenderlove
Thu May 07 14:58:04 -0700 2009
| link
making the parser return normalized strings. closed by 21c3478
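The commit message suggests the fix was to normalize the selector text before it is turned into XPath. The following is only a rough pure-Ruby sketch of that kind of normalization, with a hypothetical helper name; it is not the code from 21c3478 and deliberately ignores edge cases such as operators inside quoted values:

```ruby
# Hypothetical helper, not Nokogiri's actual implementation: strip whitespace
# around attribute comparison operators (=, ~=, |=, ^=, $=, *=).
# Naive on purpose: it does not protect operators inside quoted values.
def normalize_attribute_operators(selector)
  selector.gsub(/\s*([~|^$*]?=)\s*/, '\1')
end

selectors = [
  'input[@name^="as_"]',
  'input[@name^= "as_"]',
  'input[@name ^="as_"]',
  'input[@name ^= "as_"]'
]

# All four spellings collapse to the same string.
normalized = selectors.map { |css| normalize_attribute_operators(css) }
puts normalized.uniq
```

Once the four spellings are the same string, the parser only has to understand one of them.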
-
We should implement NodeSet intersection:
http://xmlsoft.org/html/libxml-xpathInternals.html#xmlXPathIntersection
Comments
flavorjones
Sat May 09 10:27:40 -0700 2009
| link
implemented NodeSet#& (intersection). closed by f63a739.
-
We should probably rewrite this in C for a speed boost:
http://xmlsoft.org/html/libxml-xpathInternals.html#xmlXPathNodeSetMerge
Comments
flavorjones
Sat May 09 07:43:18 -0700 2009
| link
converted NodeSet#+ to C. closed by be5fbf8.
-
This might be nice:
http://xmlsoft.org/html/libxml-xpathInternals.html#xmlXPathHasSameNodes
Comments
flavorjones
Sat May 09 10:34:05 -0700 2009
| link
xmlXPathHasSameNodes returns true if any node is present in both. it should be named xmlXPathHasAnyNodeInCommon.
tenderlove
Sat May 09 12:38:37 -0700 2009
| link
What a terrible name for a function!
tenderlove
Thu May 14 22:33:21 -0700 2009
| link
implementing a bunch of NodeSet elements. closed by 8188b10
-
This might be pretty nice as well:
http://xmlsoft.org/html/libxml-xpathInternals.html#xmlXPathNodeSetContains
Comments
flavorjones
Sun May 10 22:09:43 -0700 2009
| link
implement NodeSet#include? using libxml function (instead of relying on Enumerable). closed by 5229c4f.
-
Nokogiri.XML() should take a block
2 comments Created 7 months ago by tenderlove
Labels: 1.3.0, tenderlove
I think we need a configuration object similar to the configuration objects for the save methods. I hate using these freaking constants.
I want to implement something like this:
Nokogiri::XML(File.read(ARGV[0])) do |cfg|
  cfg.strict.no_network
end
Comments
flavorjones
Fri May 15 13:42:06 -0700 2009
| link
Ooh, I like. +1.
tenderlove
Tue May 19 14:40:08 -0700 2009
| link
Block configuration for parsing HTML and XML. Closed by c73ac78
Squashed commit of the following:
commit a428cae
Author: Aaron Patterson aaron.patterson@gmail.com
Date: Tue May 19 14:27:33 2009 -0700
parse now takes a configuration block
commit 1780cc7
Author: Aaron Patterson aaron.patterson@gmail.com
Date: Tue May 19 14:16:53 2009 -0700
parse takes block for options
commit ef1a2a7
Author: Aaron Patterson aaron.patterson@gmail.com
Date: Tue May 19 14:01:06 2009 -0700
moving constants around
-
Finding attributes that have a namespace
2 comments Created 7 months ago by tenderlove
Labels: 1.3.0, tenderlove
Right now, it is difficult to locate an attribute that has a namespace. We need to make that easy.
For example:
<root xmlns:aaron='http://tlm.com/'>
  <foo one='two' aaron:one='three' />
</root>
The foo node has two attributes, but using Node#[] can only get you the first one declared. Ideally, it should support syntax like this:
# First look for an attribute with no namespace, if it exists, return it, otherwise look for an
# attribute with the same name, but ignore namespace.
node['one'] # => 'two'
node['aaron:one'] # => 'three'
node[['aaron', 'one']] # => 'three'
Comments
tenderlove
Fri May 15 16:45:44 -0700 2009
| link
unfortunately my dream syntax will not work for the same reason that you must register urls in xpath queries. ugh. So I added a couple methods for looking up nodes with a fully qualified ns.
tenderlove
Fri May 15 16:46:32 -0700 2009
| link
adding a couple methods for finding attributes with namespaces. closed by e46e1a9
-
Subclassing a node does not work well.
class MyXMLNode < Nokogiri::XML::Node
end
MyXMLNode.new("foo", @doc).class
=> Nokogiri::XML::Element
Comments
tenderlove
Fri May 15 18:04:26 -0700 2009
| link
nodes may be subclassed. closed by 182394d
-
add Node#namespace_nodes
1 comment Created 7 months ago by tenderlove
Labels: 1.3.0, tenderlove
Add a method to node to get back a list of namespaces.
Comments
tenderlove
Sat May 16 23:23:57 -0700 2009
| link
Fixed with e7a4588
-
SAX parser start_element_ns is broken wrt to attributes
2 comments Created 6 months ago by tenderlove
SAX parser start_element_ns is broken when it comes to attributes. The start_element_ns method should take a list of Attribute objects, not a hash. Namespace names are not unique, so we must pass an href along with the attribute.
Here is an example of two nodes with attributes of the same name, but the attributes DO NOT BELONG TO THE SAME HREF:
require 'nokogiri'

doc = Nokogiri::XML(<<-eoxml)
<root xmlns:foo='http://foo.example.com/'>
  <a foo:bar='hello' />
  <b xmlns:foo='http://bar.example.com/'>
    <a foo:bar='hello' />
  </b>
</root>
eoxml

doc.css('a').each do |b|
  b.attribute_nodes.each do |attr_node|
    puts "#{attr_node.name} => #{attr_node.namespace.href}"
  end
end
This will be an API breaking change. I am annoyed.
Comments
tenderlove
Fri Jun 19 16:28:18 -0700 2009
| link
The current sax parser completely ignores attribute namespace hrefs.
tenderlove
Sat Jun 20 19:34:31 -0700 2009
| link
Fixed with 507b912
-
Figure out how to attach a DTD to a document
1 comment Created 5 months ago by tenderlove
Labels: 1.4.1, tenderlove
I would like to be able to attach a DTD to an HTML document so that the id() xpath function works.
Comments
tenderlove
Tue Dec 01 21:00:01 -0800 2009
| link
Blech.
-
Nokogiri.parse always assumes XML when the document is an IO
1 comment Created 5 months ago by tenderlove
Labels: tenderlove
I think we should switch this to assume it is an HTML document when an IO is provided
Comments
tenderlove
Sun Jul 26 18:54:51 -0700 2009
| link
Nokogiri.parse will assume HTML if parameter is an IO object. closed by 42d3548
-
Document encoding should be yielded on SAX parsing
1 comment Created 5 months ago by tenderlove
Labels: 1.4.0, tenderlove
Just what the title says. :-)
Comments
tenderlove
Sun Aug 09 19:39:48 -0700 2009
| link
adding an xml declaration SAX callback handler. closed by b1d7523
-
Package libxml2 dll's with jruby gem
2 comments Created 4 months ago by tenderlove
Labels: 1.4.0, ffi, tenderlove
Package libxml2 dll's with jruby gem so that the jruby gem can work on windows / jruby
Comments
tenderlove
Thu Jul 30 09:44:59 -0700 2009
| link
I forgot, people seem to get this error:
C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/1.8/ffi/ffi.rb:114:in `create_invoker': Function 'calloc' not found in [exslt] (FFI::NotFoundError)
from C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/1.8/ffi/library.rb:50:in `attach_function'
from C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/1.8/ffi/library.rb:48:in `each'
from C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/1.8/ffi/library.rb:48:in `attach_function'
from C:/Program Files/JRuby/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-x86-mswin32/lib/nokogiri/ffi/libxml.rb:54
from C:/Program Files/JRuby/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-x86-mswin32/lib/nokogiri/ffi/libxml.rb:31:in `require'
from C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from C:/Program Files/JRuby/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-x86-mswin32/lib/nokogiri.rb:10
from C:/Program Files/JRuby/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-x86-mswin32/lib/nokogiri.rb:36:in `require'
from C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from TestNokogiri.rb:3
tenderlove
Wed Aug 12 11:45:21 -0700 2009
| link
This was fixed here: 70ad006
-
Fix platform detection code
1 comment Created 4 months ago by tenderlove
Labels: 1.4.0, ffi, tenderlove
We need to be able to tell when a user is running on windows, not by just the platform. Right now, the code looks at the platform when it needs to look at the OS. Switch to this (from Luis):
RUBY_PLATFORM + RbConfig::CONFIG['host_os']
Similar approach is being used by mspec and the RubySpec to determine which API behavior should be checked for Java on Windows.
Comments
tenderlove
Thu Aug 06 21:24:55 -0700 2009
| link
using host OS to figure out ENV["PATH"]. closed by 544b431
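To illustrate why the host OS is the better signal: under JRuby, RUBY_PLATFORM reports "java" regardless of the operating system, while `RbConfig::CONFIG['host_os']` still describes the machine. A minimal sketch with a hypothetical helper; the pattern is my assumption, not the check from 544b431:

```ruby
require 'rbconfig'

# Hypothetical helper; the pattern is an assumption, not Nokogiri's code.
# Matching on "win" alone would also match "darwin", so be specific.
def windows_host?(host_os = RbConfig::CONFIG['host_os'])
  !!(host_os =~ /mswin|mingw/i)
end

puts "RUBY_PLATFORM: #{RUBY_PLATFORM}"
puts "host_os:       #{RbConfig::CONFIG['host_os']}"
puts "windows host?  #{windows_host?}"
```

The default argument makes the helper easy to exercise with sample strings such as 'mswin32' or 'darwin9' without needing each platform at hand.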
-
Convert meta_encoding and meta_encoding= to ruby
1 comment Created 4 months ago by tenderlove
Labels: 1.4.0, tenderlove
These methods need to be converted to Ruby. The current meta_encoding= method will call xmlFreeNode() on the old meta tag which will cause a segv:
require 'nokogiri'

doc = Nokogiri::HTML DATA.read
node = doc.at('meta')
puts node.name
doc.meta_encoding = 'EUC-JP'
p node

__END__
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Hello World</title>
  </head>
  <body>
    <h1>Hello Again</h1>
  </body>
</html>
Comments
tenderlove
Tue Oct 13 21:38:38 -0700 2009
| link
meta_encoding and meta_encoding= are implemented in ruby. closed by ceffd26
-
Implement Nokogiri::XML::ElementDecl#content
1 comment Created 3 months ago by tenderlove
Labels: 1.4.0, tenderlove
Implement Nokogiri::XML::ElementDecl#content
look at tree.h
struct _xmlElement, content member
Comments
tenderlove
Sat Sep 12 17:14:18 -0700 2009
| link
updating changelog. closed by 1f658f0
-
implement Nokogiri::XML::DTD#external_id and system_id
1 comment Created 3 months ago by tenderlove
Labels: tenderlove
implement Nokogiri::XML::DTD#external_id and system_id
look at xmlDtdPtr
Comments
tenderlove
Sat Sep 12 14:06:51 -0700 2009
| link
adding DTD external id and system id. closed by 303b2b2
-
I think we should remove Node#collect_namespaces. Since namespace names are not unique, I don't know that this method is very useful.
Comments
flavorjones
Mon Sep 14 15:36:38 -0700 2009
| link
+1
tenderlove
Mon Sep 14 22:08:31 -0700 2009
| link
You're supposed to use the upvote button! ;-)
Although, I like the +1 better because it's not anonymous.
tenderlove
Sun Oct 04 20:34:09 -0700 2009
| link
This was removed in c7eb4b2
-
Add the ":self" pseudo selector
1 comment Created about 1 month ago by tenderlove
Labels: 1.4.1, tenderlove
Add the ":self" pseudo selector so that people can have CSS expressions like this:
":self > foo"
which would be equivalent to this:
"./foo"
Comments
tenderlove
Mon Nov 09 15:37:11 -0800 2009
| link
Fixed in 55fbf25
-
Comments
Commit bfa172d allows to set the RECOVER option for PushParser. This resolves the issue.
tenderlove
Wed Dec 02 23:40:33 -0800 2009
| link
Ok, cool. I'll close this ticket then.
-
Comments
tenderlove
Wed Dec 02 23:46:50 -0800 2009
| link
adding the filter method on node set. closed by 90d4de3
-
XML::Namespace#inspect may be broken
1 comment Created about 1 month ago by tenderlove
Labels: 1.4.1
I think the inspect method is broken. Broken or not, we need to figure out this crash:
http://groups.google.com/group/nokogiri-talk/msg/6f8e4ac93fbebf39
Comments
tenderlove
Tue Nov 24 20:43:29 -0800 2009
| link
This was fixed in 8111e7b
-
I added the xmldecl method to the SAX callbacks, and that broke the SOAP adapter. That broke this dude's code:
http://groups.google.com/group/nokogiri-talk/msg/dca72612114cfcc5
We need to figure out a way to get the adapter under test without loading in soap4r
Comments
flavorjones
Tue Nov 17 23:38:16 -0800 2009
| link
I've pushed a branch named 'soap4r-bug' that reproduces this problem in the Nokogiri test suite.
tenderlove
Tue Nov 24 20:32:36 -0800 2009
| link
adding tests for soap4r adapter and bugfixes. closed by ecdcb0a
-
The Tutorials link is broken on Firefox (including when JavaScript is enabled)
1 comment Created about 1 month ago by shlomif
Hi all!
The tutorials link on the Nokogiri site is broken on Firefox 3.5.5 on Mandriva Linux Cooker (standard Mandriva package), even when JavaScript is enabled (and it should work even when it isn't.).
I don't see anything in the error console.
Please fix it.
Regards,
-- Shlomi Fish
Comments
tenderlove
Tue Nov 24 20:43:48 -0800 2009
| link
This was fixed.
-
Nokogiri::XML::NodeSet << operator refuses to add Nokogiri::XML::Elements
3 comments Created 7 months ago by purp
Labels: 1.3.0, flavorjones
I've got two nodesets. When I try:
nodeset1 << nodeset2
... Nokogiri laughs at me with:
ArgumentError: node must be a Nokogiri::XML::Node
from (irb):29:in `<<'
from (irb):29
from :0
Ironically, the elements of a NodeSet are Nokogiri::XML::Elements, which is just an empty subclass of Nokogiri::XML::Node.
It looks like the offending bit is line 52 of xml_node_set.c:
static VALUE push(VALUE self, VALUE rb_node)
{
  xmlNodeSetPtr node_set;
  xmlNodePtr node;

  if(! rb_funcall(rb_node, rb_intern("is_a?"), 1, cNokogiriXmlNode)) // <-- THIS ONE
    rb_raise(rb_eArgError, "node must be a Nokogiri::XML::Node");

  Data_Get_Struct(self, xmlNodeSet, node_set);
  Data_Get_Struct(rb_node, xmlNode, node);
  xmlXPathNodeSetAdd(node_set, node);
  return self;
}
Comments
flavorjones
Wed May 06 06:22:09 -0700 2009
| link
Well, if we're going with array semantics:
[1,2,3] << [7,8,9] => [1,2,3,[7,8,9]]
which isn't what you want, and isn't supported by NodeSet.
Now, if you wanted to do
nodeset1 += nodeset2
[1,2,3] += [7,8,9] => [1,2,3,7,8,9]
that's a syntax I could get behind.
tenderlove
Wed May 06 09:23:49 -0700 2009
| link
I agree. A NodeSet may only contain Nokogiri::XML::Node. I like adding the + method to NodeSet though.
flavorjones
Wed May 06 11:49:03 -0700 2009
| link
implemented NodeSet#delete, NodeSet#- (difference operator) and NodeSet#+ (concatenation operator). Closed by 2a7bd9d.
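The Array analogy from the comments can be checked in plain Ruby. This sketch uses variables throughout, since `+=` needs something assignable on the left-hand side (a bare array literal, as written in the comment, is shorthand rather than runnable code):

```ruby
a = [1, 2, 3]
b = [7, 8, 9]

# << appends its argument as a single element, so the array is nested.
nested = a.dup << b

# + builds a new array holding the elements of both.
concatenated = a + b

# += is a = a + b, so it rebinds the variable to the concatenation.
a += b

p nested        # [1, 2, 3, [7, 8, 9]]
p concatenated  # [1, 2, 3, 7, 8, 9]
p a             # [1, 2, 3, 7, 8, 9]
```

A NodeSet can only hold nodes, so it has no equivalent of the nested result, which is why `<<` with a whole set is rejected and `+` is the operation to reach for.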
-
Nokogiri won't find iconv.h in /usr/include under Cygwin.
$ cat mkmf.log
find_header: checking for iconv.h in /usr/include,/opt/local/include,/usr/local/include,/usr/include... -------------------- no

"gcc -E -I. -I/usr/lib/ruby/1.8/i386-cygwin -I. -g -O2 -g -DXP_UNIX -O3 -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline conftest.c -o conftest.i"
checked program was:
/* begin */
1: #include <iconv.h>
/* end */

"gcc -E -I. -I/usr/lib/ruby/1.8/i386-cygwin -I. -g -O2 -g -DXP_UNIX -O3 -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline -I/usr/include conftest.c -o conftest.i"
checked program was:
/* begin */
1: #include <iconv.h>
/* end */

"gcc -E -I. -I/usr/lib/ruby/1.8/i386-cygwin -I. -g -O2 -g -DXP_UNIX -O3 -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline -I/opt/local/include conftest.c -o conftest.i"
checked program was:
/* begin */
1: #include <iconv.h>
/* end */

"gcc -E -I. -I/usr/lib/ruby/1.8/i386-cygwin -I. -g -O2 -g -DXP_UNIX -O3 -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline -I/usr/local/include conftest.c -o conftest.i"
checked program was:
/* begin */
1: #include <iconv.h>
/* end */

"gcc -E -I. -I/usr/lib/ruby/1.8/i386-cygwin -I. -g -O2 -g -DXP_UNIX -O3 -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline -I/usr/include conftest.c -o conftest.i"
checked program was:
/* begin */
1: #include <iconv.h>
/* end */

--------------------

$ ls /usr/include/i*
/usr/include/icmp.h      /usr/include/io.h         /usr/include/itclIntDecls.h
/usr/include/iconv.h     /usr/include/itcl.h       /usr/include/itk.h
/usr/include/ieeefp.h    /usr/include/itclDecls.h  /usr/include/itkDecls.h
/usr/include/inttypes.h  /usr/include/itclInt.h
Comments
tenderlove
Thu May 21 10:36:35 -0700 2009
| link
Can you try compiling that C file? I'm not sure why mkmf is complaining.
tenderlove
Wed Jul 01 15:36:49 -0700 2009
| link
I think this is working now:
-
JRuby 1.4.0, Nokogiri 1.4.0 and Windows
5 comments Created about 1 month ago by bhauff
Labels: 1.4.1, flavorjones
When requiring nokogiri from irb or cucumber there is an FFI error:
FFI::NotFoundError: Function '__xmlParserVersion' not found in [msvcrt]
I am including links to show the entire issue:
http://pastie.org/690973 - when running cucumber
http://pastie.org/691987 - when running irb (through JRuby)
Comments
The main issue here is that FFI can't find the DLLs it needs. I see that nokogiri does some tricks with PATH, but that won't work for DLL loading, since the LoadLibrary() call won't see PATH changes. So either the libs need to be specified with their full path, or they should be placed somewhere that is already on PATH before JRuby starts.
It is also worth noting that on JRuby's master branch we changed how ffi_lib works a bit, so that it won't silently skip a DLL that is not found; with the JRuby master version the failure is immediate and clearer. I also verified that if I put the DLLs somewhere on PATH and then start JRuby, nokogiri loads fine and works.
Here's the patch that solves the problem on Windows (under JRuby):
http://gist.github.com/233100
Essentially, since ffi_lib doesn't do any magic at all about the paths, we should provide the fully qualified path names for every DLL, and in Windows format, since those paths will be passed directly to the LoadLibrary() call.
I have tested this patch with Windows, JRuby 1.4.0 and Nokogiri 1.4.0 and it works for me.
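A rough sketch of the idea described in this comment, not the contents of the linked gist: build a fully qualified, Windows-format path for each DLL before handing it to the loader. The helper name `windows_dll_paths`, the directory, and the DLL names are assumptions made for illustration.

```ruby
# Hypothetical helper (not from the patch): turn a directory and a list of
# library names into fully qualified, backslash-separated DLL paths.
def windows_dll_paths(dir, names)
  names.map do |name|
    # Join with forward slashes first, then convert to Windows separators.
    File.join(dir, "#{name}.dll").tr("/", "\\")
  end
end

# Illustrative directory and library names only.
paths = windows_dll_paths("C:/nokogiri/ext", %w[libxml2 libxslt libexslt])
puts paths
```

Paths built this way could then be passed to the library loader in place of bare library names, so that nothing depends on PATH being changed after JRuby has started.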
flavorjones
Thu Nov 19 15:29:31 -0800 2009
| link
ffi + windows + jruby dll loading fix (thanks, Vladimir Sizikov!). Closed by e4976fd
-
Segmentation Fault on modified re-raise
1 comment Created 2 months ago by Empact
Labels: 1.4.0
xml = Nokogiri::XML('<xml />')
begin
  xml.xpath('http://')
rescue Nokogiri::XML::XPath::SyntaxError => e
  raise e, "howdy"
end
results in:
[BUG] Segmentation fault
ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-darwin10]
Abort trap
for
---
warnings: []
libxml:
  loaded: 2.7.5
  binding: extension
  compiled: 2.7.5
nokogiri: 1.3.3
Comments
tenderlove
Tue Oct 13 20:49:06 -0700 2009
| link
duplicating errors works. yay! closed by 33922d7
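For context on why "duplicating" matters here: `raise e, "howdy"` does not re-raise `e` itself. Ruby calls `e.exception("howdy")`, which returns a copy of the exception carrying the new message. The sketch below shows that behaviour with a plain Ruby class; `DemoSyntaxError` and its `code` value are stand-ins for illustration, not Nokogiri's own class.

```ruby
# Stand-in exception class with some extra state, for illustration only.
class DemoSyntaxError < StandardError
  attr_reader :code

  def initialize(message, code)
    super(message)
    @code = code
  end
end

# Raises an exception, rescues it, and re-raises it with a new message.
# Returns both the original exception and the one that actually propagated.
def reraise_with_new_message
  original = DemoSyntaxError.new("Invalid expression", 1207)
  begin
    begin
      raise original
    rescue DemoSyntaxError => e
      raise e, "howdy" # calls e.exception("howdy") under the hood
    end
  rescue DemoSyntaxError => copy
    [original, copy]
  end
end

original, copy = reraise_with_new_message
puts copy.message          # the replacement message
puts copy.equal?(original) # false: a different object was raised
puts copy.code             # instance state is carried over to the copy
```

An exception class backed by native data therefore has to survive being copied, which is presumably what the fix above addresses.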
-
Anchor tags tightly wrapping another element generate unwanted whitespace on #to_xhtml
3 comments Created 2 months ago by dasil003
This is just weird. Observe the reduced test cases:
s = '<a><b>see</b></a>'
n = Nokogiri::HTML::DocumentFragment.parse(s)
n.to_xhtml
=> "<a>\n <b>see</b>\n</a>"
to_html works:
s = '<a><b>see</b></a>'
n = Nokogiri::HTML::DocumentFragment.parse(s)
n.to_html
=> "<a><b>see</b></a>"
as does adding a text node in the source:
s = '<a> <b>see</b></a>'
n = Nokogiri::HTML::DocumentFragment.parse(s)
n.to_xhtml
=> "<a> <b>see</b></a>"
Comments
BTW, I just discovered it also affects the OBJECT tag.
tenderlove
Wed Oct 14 15:06:55 -0700 2009
| link
I don't think this is a bug. The default save options for to_xhtml say to format or "pretty print" the document. If your document contains space nodes, it will preserve them. If there are no blank nodes in the document, it will add them to make the output formatted.
If you don't want formatting, you can change the to_xhtml save options:
s = '<a><b>see</b></a>'
n = Nokogiri::HTML::DocumentFragment.parse(s)
puts n.to_xhtml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XHTML)
-
Symbol not found: _xmlRelaxNGSetParserStructuredErrors
4 comments Created 6 months ago by FotoVerite
Labels: 1.3.1, tenderlove
Old problem with a new twist. I have the latest libxml2 installed. This happens in 1.3 but not in 1.2, which is why I can continue developing for the moment.
Comments
tenderlove
Mon Jun 01 14:27:58 -0700 2009
| link
What version of libxml2 do you have installed?
FotoVerite
Mon Jun 01 15:33:04 -0700 2009
| link
2.6.30
tenderlove
Mon Jun 01 15:34:09 -0700 2009
| link
Hmmm.. That is not the latest. 2.7.3 is the latest. I will fix it to work with 2.6.30 though
tenderlove
Mon Jun 01 16:56:06 -0700 2009
| link
we should use macros rather than runtime tests. closed by a7a19dd
-
Comments
-
Right now, the JRuby gem does not ship with the DLLs.
To work around this, I tried copying the DLLs from the Windows MRI gem, but then Nokogiri gives the following error:
FFI::NotFoundError: Function 'calloc' not found in [exslt]
Comments
Same problems here which is unfortunate since the community really needs a cross-platform XML library. The developers of our project work on Windows, Linux as well as Solaris and deploy on Solaris. Nokogiri would be perfect in this regard if this bug was just fixed.
Please, this problem has existed for about a year and is a show-stopper for many projects. I know the developers' time is limited and they do this in their spare time, but if wide usage of the library is of any interest, addressing this bug is a quick win to expand the use of Nokogiri.
tenderlove
Sun Oct 04 20:59:03 -0700 2009
| link
Hey everyone, this is a bug in JRuby. I've filed a ticket with them here:
http://jira.codehaus.org/browse/JRUBY-4052
Once they get it sorted out, I will close this ticket. :-)
I am confused about what it is we're not doing. If calloc is a (fairly) standard POSIX function, what is it we should be doing differently?
tenderlove
Tue Oct 06 11:49:50 -0700 2009
| link
@headius I think the libc functions should be loaded by default. Wayne gives a workaround in the JIRA ticket, but I think it's unreasonable to require me to change my FFI code depending on the platform. The current code works on everything but windows.
Fix added to git://github.com/jojje/nokogiri.git for people with access to a Windows environment to try. Requires JRuby 1.4.0RC1 or higher due to some needed fixes regarding FFI.
tenderlove
Tue Oct 13 13:52:01 -0700 2009
| link
I applied this:
I'm closing this ticket since it should be fixed for JRuby / Windows people with 1.4.0
-
Hi guys,
I get the following error message while downloading an XML file and opening it using nokogiri:
res = Net::HTTP.post_form(URI.parse(....), {...})
doc = Nokogiri::XML(Nokogiri::XML(res.body).xpath("//text()").to_s.gsub("&lt;", "<").gsub("&gt;", ">"))
I have installed the latest nightly on OS X 10.5.6.
/Library/Ruby/Gems/1.8/gems/nokogiri-1.3.3.20091004000018/lib/nokogiri/xml/document.rb:33: [BUG] Segmentation fault
ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]
Abort trap
I have also tried to split the constructor calls:
doc = Nokogiri::XML(res.body).xpath("//text()").to_s.gsub("&lt;", "<").gsub("&gt;", ">")
I have about 10 different XML files, and it crashes randomly on the different files, so I can't say that it's one specific file.
The XML files vary in size from 3 MB to 150 MB.
The files are very basic XML:
<?xml version="1.0" encoding="utf-8"?>
<string>....</string>
where the string element contains escaped XML. Unfortunately the XML files are data we receive from an external vendor, so not really anything we can do about that.
I can try to normalize the data using gsub and then use nokogiri on the xmlified data.
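A minimal sketch of that normalization step in plain Ruby, assuming the payload only uses the five predefined XML entities. The helper name `unescape_xml` is hypothetical, and numeric character references are not handled. Doing it in a single `gsub` pass avoids unescaping the same text twice, which chained `gsub` calls can do when `&amp;` is involved.

```ruby
# The five predefined XML entities and their replacement characters.
ENTITIES = {
  "lt"   => "<",
  "gt"   => ">",
  "amp"  => "&",
  "quot" => '"',
  "apos" => "'"
}.freeze

# Hypothetical helper: unescape predefined entities in one pass, so that
# text produced by one replacement is never scanned again.
def unescape_xml(text)
  text.gsub(/&(lt|gt|amp|quot|apos);/) { ENTITIES[$1] }
end

escaped = "&lt;item name=&quot;a &amp;amp; b&quot;&gt;1&lt;/item&gt;"
puts unescape_xml(escaped)
```

The result can then be handed to the parser as an ordinary XML string.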
Regards
William
Comments
tenderlove
Tue Oct 06 08:49:35 -0700 2009
| link
Can you run "nokogiri -v" and add the output to this ticket?
It seems that that did the trick. I have not had any segfaults since.
libxml:
  loaded: 2.7.3
  binding: extension
  compiled: 2.7.3
nokogiri: 1.3.3
I am still busy testing it, but so far so good. I will keep you posted and close this ticket when I am sure that it is not an issue anymore.
tenderlove
Tue Oct 06 09:05:55 -0700 2009
| link
Okay, sounds good. You might also want to try upgrading libxml2. The latest libxml2 is 2.7.5 and I know they've packed in a bunch of bug fixes. If that doesn't do the trick, would you mind sending us a sample of the XML you're using to make it crash? It shouldn't SEGV under any circumstances. :-)
tenderlove
Tue Oct 13 13:49:22 -0700 2009
| link
Any updates on this?
tenderlove
Thu Oct 15 09:16:59 -0700 2009
| link
I'm closing this since there have been no updates. Please reopen if you're still having problems! Thanks!
-
Code to reproduce
>> Nokogiri::HTML(%(<a>tag</a> <a href="test\0test"> you don't see me!)).to_html
=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body>\n<a>tag</a> <a href=\"test\"></a>\n</body></html>\n"
\0 outside works though.
My ruby version is 1.8.7, nokogiri version is 1.4.0, libxml version is 2.7.5.
Comments
flavorjones
Thu Dec 03 20:33:50 -0800 2009
| link
this is an issue with all versions of libxml2, at least back to 2.6.16
flavorjones
Thu Dec 03 20:59:11 -0800 2009
| link
this is because a null byte is treated as a string terminator in C. you should work around this by removing null bytes from your document.
tenderlove
Fri Dec 04 08:41:11 -0800 2009
| link
This is a bug in libxml2. We'll work with them to fix it, but there is nothing we can do about it in the nokogiri code base.
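The workaround suggested in this thread (removing null bytes before parsing) can be sketched in plain Ruby as follows. The helper name `strip_null_bytes` is made up for illustration, and whether silently dropping the bytes is acceptable depends on the input.

```ruby
# Hypothetical helper: remove every NUL byte from the markup before it is
# handed to the parser, since C string handling stops at the first one.
def strip_null_bytes(markup)
  markup.delete("\0")
end

html = %(<a>tag</a> <a href="test\0test"> you don't see me!)
clean = strip_null_bytes(html)
puts clean.include?("\0") # false
puts clean
```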
-
Provide DOM/SAX examples of loading a DTD so named entities can be resolved
4 comments Created about 1 month ago by yob
Currently if I attempt to parse a document that has named entities in it I get an exception.
The files I'm trying to parse conform to a DTD that defines the valid entities. Word on the street is that by loading the DTD I may be able to avoid the exception. See http://groups.google.com/group/nokogiri-talk/browse_thread/thread/8225dfb0ffbe0098
Comments
tenderlove
Sun Dec 06 14:45:34 -0800 2009
| link
Figured it out!
Awesome, thanks. Now I just need to grok the XML catalog tomfoolery so I can stop libxml fetching the DTD over the net every time I parse a file
tenderlove
Sun Dec 06 15:47:08 -0800 2009
| link
What I've been doing, and I admit this probably isn't the "right way", is doing a "sub" on my XML that contains a dtd, and point it at the filesystem:
http://github.com/tenderlove/markup_validity/blob/master/lib/markup_validity/validator.rb#L11-32
I'm hesitant to use sub, only because we sometimes deal with very large files (in the hundreds of MB range). I think I've nutted out the system XML catalog stuff, see http://github.com/yob/onix-dtd.
Thanks again for your help.
-
don't blow up on HTML nodes with undeclared namespaces
1 comment Created 8 months ago by flavorjones
Microsoft Word notably generates this kind of HTML
Comments
flavorjones
Fri Apr 17 09:57:27 -0700 2009
| link
closed by 395d79
-
reparenting an unlinked node results in double-free
1 comment Created 8 months ago by flavorjones
Labels: 1.3.0, flavorjones
this code, run under valgrind, demonstrates:
doc = Nokogiri::XML <<-EOHTML
<root>
  <a>
    <b/>
  </a>
</root>
EOHTML
root = doc.at("root")
a = root.at("a")
b = a.at("b")
a.add_next_sibling(b.unlink)
puts doc.to_s
Comments
flavorjones
Tue Apr 21 22:27:59 -0700 2009
| link
Fixing edge case where a node can be unlinked then reparented. Closed by 763ee20.
-
Comments
flavorjones
Fri May 08 15:06:56 -0700 2009
| link
implemented NodeSet#index. closed by bf611db.
-
Comments
flavorjones
Sat May 09 09:49:14 -0700 2009
| link
implemented NodeSet#slice (aliased to []) which takes a start and a length, or a Range. closed by bb323ac.
-
replace memcpy with AbstractMemory#put_bytes() in FFI implementation
1 comment Created 7 months ago by flavorjones
Labels: 1.3.0, flavorjones
From Wayne Meissner:
I noticed in nokogiri, in Document.read_io that you use LibXML.memcpy to transfer data from a ruby string to a native memory buffer.
It'll be much faster to use AbstractMemory#put_bytes(), as the memcpy mapping will triple copy the string data on JRuby.
basically, what happens behind the scene is:
- The string data is copied to a temporary native memory buffer
- The temp memory is passed as the parameter
- memcpy copies the data from the temp memory to the destination memory you specified
- The temporary native memory is copied back to the ruby string.
compared to put_bytes:
- The string data is copied once from the ruby byte array backing the ruby string into the destination memory address.
- There is no step 2.
For large files, this might make a difference.
Comments
flavorjones
Mon May 11 15:53:25 -0700 2009
| link
FFI: IO reader callbacks now use AbstractMemory#put_bytes instead of memcpy, per Wayne's advice. along the way, refactored callbacks. closed by 0b87111.
-
frequently getting the following error when running on MRI FFI:
/home/mike/code/nokogiri/test/helper.rb:11: libxml version info: {"nokogiri"=>"1.2.4", "warnings"=>[], "libxml"=>{"platform"=>"ruby", "binding"=>"ffi", "loaded"=>"2.6.31"}}
Loaded suite -e
Started
...
Finished in 3.285882 seconds.

1) Error:
test_attribute_roundtrip(TestReader):
RangeError: 0xdb7eb02a is recycled object
/home/mike/code/nokogiri/lib/nokogiri/ffi/structs/common_node.rb:11:in `_id2ref'
/home/mike/code/nokogiri/lib/nokogiri/ffi/structs/common_node.rb:11:in `ruby_node'
/home/mike/code/nokogiri/lib/nokogiri/ffi/xml/node.rb:252:in `wrap'
/home/mike/code/nokogiri/lib/nokogiri/ffi/xml/node.rb:107:in `attribute_nodes'
/home/mike/code/nokogiri/lib/nokogiri/ffi/xml/reader.rb:42:in `attribute_nodes'
/home/mike/code/nokogiri/lib/nokogiri/xml/reader.rb:52:in `attributes'
./test/test_reader.rb:142:in `test_attribute_roundtrip'
/home/mike/code/nokogiri/lib/nokogiri/xml/reader.rb:61:in `call'
/home/mike/code/nokogiri/lib/nokogiri/xml/reader.rb:61:in `each'
./test/test_reader.rb:141:in `test_attribute_roundtrip'

544 tests, 1367 assertions, 0 failures, 1 errors
rake aborted!
Command failed with status (1): [/usr/bin/ruby1.8 -w -Ilib:ext:bin:test -e ...]
Comments
flavorjones
Mon May 11 20:04:06 -0700 2009
| link
note that i can repro this with 2.6.31 and 2.6.32, but not 2.7.2 or 2.7.3
flavorjones
Mon May 11 21:43:37 -0700 2009
| link
fixing problem with recycled xml reader attribute nodes. closed by e9c79b4.
-
We should probably alias #clone to #dup.
Comments
flavorjones
Thu May 14 15:46:59 -0700 2009
| link
Node#clone is now an alias for #dup. closed by 708d62f.
-
We might want to write up some docs that present functionality broken down the same way that jQuery's docs do it (which I find totally useful and obvious):
- core (parsing)
- selectors
- attributes
- traversing
- manipulation
Comments
tenderlove
Fri May 15 09:39:55 -0700 2009
| link
I like this. Should we do it in the wiki? Or as RDoc? Or both?
I'd say both of them... with the hpricot api seemingly unavailable following why's disappearance, better / more complete documentation for nokogiri would be a great help.
tenderlove
Sat Nov 07 12:23:39 -0800 2009
| link
We're doing our best. All methods are currently documented. Unfortunately "better" is a very subjective word. That being said, patches are greatly appreciated!
We've launched a new website: http://nokogiri.org
And we're working on tutorials. :-(
flavorjones
Tue Nov 24 20:50:01 -0800 2009
| link
Tutorials are up on http://nokogiri.org
tenderlove
Tue Nov 24 20:50:35 -0800 2009
| link
I think we can close this.
-
FFI ruby object caching should be rewritten to not use id2ref
3 comments Created 7 months ago by flavorjones
Labels: ffi
id2ref is slow and may be turned off by default in JRuby 1.4.
discussed with wmeissner, and the probable path is to build an API into FFI that is a hash table containing address => weakref(ruby_object).
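A minimal sketch of the address => weakref(ruby_object) idea, using Ruby's standard `weakref` library rather than anything built into FFI. The class name `WeakNodeCache` and its methods are hypothetical, and this is not the API that was eventually implemented.

```ruby
require 'weakref'

# Hypothetical cache: maps a native address to a weak reference to the Ruby
# wrapper object, so the cache alone never keeps a wrapper alive.
class WeakNodeCache
  def initialize
    @refs = {}
  end

  # Remember the wrapper for a native address and return the wrapper.
  def store(address, object)
    @refs[address] = WeakRef.new(object)
    object
  end

  # Return the wrapper if it is still alive, otherwise nil.
  def fetch(address)
    ref = @refs[address]
    return nil unless ref
    ref.__getobj__
  rescue WeakRef::RefError
    # The wrapper was garbage collected; drop the stale entry.
    @refs.delete(address)
    nil
  end
end

cache = WeakNodeCache.new
wrapper = Object.new
cache.store(0xdb7eb02a, wrapper)
puts cache.fetch(0xdb7eb02a).equal?(wrapper)
puts cache.fetch(0x1).inspect
```

Unlike an id2ref lookup, a stale entry here is reported as missing instead of resolving to whatever object now occupies a recycled id.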
Comments
nicksieger
Mon Nov 16 13:32:31 -0800 2009
| link
FYI, id2ref (and objectspace) is turned off by default in JRuby. We made a conscious decision to do this because it's expensive and not feasible to manage all live objects with JRuby.
nicksieger
Mon Nov 16 13:59:28 -0800 2009
| link
Also: tools to help implement the caching in Java/JRuby:
http://java.sun.com/javase/6/docs/api/java/lang/ref/WeakReference.html
http://java.sun.com/javase/6/docs/api/java/util/WeakHashMap.html
Note the last item may not be exactly what is needed, it's a map w/ weak keys, not a map that weakly references its values.
flavorjones
Mon Nov 16 14:07:16 -0800 2009
| link
Nick, thanks for the pointers (no pun intended).
-
i wish i could delete this ticket
Comments
flavorjones
Thu May 14 19:32:39 -0700 2009
| link
duplicate of #50. whoops.
-
use FFI::IO.native_read in IoCallbacks once it's released
2 comments Created 7 months ago by flavorjones
Labels: ffi, flavorjones
see http://jira.codehaus.org/browse/JRUBY-3636 for details
Comments
flavorjones
Sun May 17 20:01:21 -0700 2009
| link
From Wayne:
Hi Mike,
I implemented FFI::IO.native_read properly in JRuby now, so FFI::IO.native_read into a MemoryPointer (or any form of AbstractMemory) will be basically zero copy.
To use it, you need to
1. Check out the 'ffi-1.4' branch from JRuby
2. If you're on anything other than i386-linux, you need to recompile jffi
- check it out from jffi.kenai.com
- 'ant clean jar test'
- If you're on MacOS, copy dist/Darwin.jar to jruby's build_lib/jffi-Darwin.jar
- other arches are ${cpu}-${OS}.jar
3. Rebuild jruby
That should enable the new native_read. If you didn't get all the steps of jffi building and copying right, it'll asplode.
It also has a tiny string micro-optimization that may make a difference in nokogiri, since it is mostly string reads from native memory.
What do you use to benchmark nokogiri? There are a few other places that could do with improvements.
flavorjones
Mon Jun 15 22:01:36 -0700 2009
| link
done. see 232ed10
-
FFI: support varargs in error/exception callbacks
1 comment Created 7 months ago by flavorjones
Labels: ffi
we should open JIRA tickets for vararg support in FFI callbacks
then we should format the libxml error messages properly in the error/exception callbacks
Comments
flavorjones
Sun Jun 21 19:38:17 -0700 2009
| link
@tmm1 poked me about this. I'll open a ticket for it tonight.
-
fragment serialization is inconsistent
2 comments Created 7 months ago by flavorjones
Labels: 1.3.0, flavorjones
frag = Nokogiri::HTML.fragment("<p>foo</p>")
frag.to_xml # => "<p>foo</p>\n"
frag.serialize # => "<p>foo</p>\n"
but
frag.to_s # => "<><p>foo</p></>"
frag.to_html # => "<><p>foo</p></>"
Comments
flavorjones
Wed May 20 20:18:21 -0700 2009
| link
aaron suggests aliasing to_s and to_html to inner_html for DocumentFragments. I concur.
flavorjones
Wed May 20 20:23:56 -0700 2009
| link
HTML fragments now serialize consistently. closed by 57c2ba2.
-
use get_array_of_pointer for performance improvement
1 comment Created 7 months ago by flavorjones
Labels: ffi, flavorjones
should be in JRuby 1.3 final, not sure about MRI-FFI right now. in particular, this can be used in SAX parser implementation.
Comments
flavorjones
Tue Nov 24 20:42:52 -0800 2009
| link
FFI: cleanup. closed by c0fbd68.
-
CSS queries on NodeSet should return matching toplevel nodes
2 comments Created 6 months ago by flavorjones
Labels: 1.3.2
Because a CSS search on a Node only returns children of that node, a search on NodeSet will search across the collective children of the nodes in the set.
We should return matching toplevel nodes.
tenderlove suggested that CSS queries on a NodeSet should get translated as follows:
"div" => "self::div | .//div"
Comments
flavorjones
Mon Jun 08 00:07:56 -0700 2009
| link
def test_node_set_css_searches_match_self
  html = Nokogiri::HTML("<html><body><div class='a'></div></body></html>")
  set = html.xpath("/html/body/div")
  assert_equal set.first, set.css(".a").first
end
tenderlove
Tue Jun 09 09:36:06 -0700 2009
| link
node sets now search the top level list of nodes in addition to children. closed by e65d71c
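The difference between the two behaviours can be illustrated without Nokogiri. The toy tree below is a stand-in (the `ToyNode` struct and both search helpers are hypothetical): one search looks only at descendants, like ".//div", while the other also considers the nodes in the set themselves, like "self::div | .//div". Unlike a real XPath union, this sketch does not remove duplicates when the set contains nested nodes.

```ruby
# Toy stand-in for a document node: a name plus child nodes.
ToyNode = Struct.new(:name, :children) do
  # All nodes below this one, depth first.
  def descendants
    children.flat_map { |child| [child] + child.descendants }
  end
end

# Behaviour before the change: only descendants of the set's nodes match.
def search_descendants(set, name)
  set.flat_map(&:descendants).select { |node| node.name == name }
end

# Behaviour after the change: the set's own nodes can match as well.
def search_self_and_descendants(set, name)
  set.flat_map { |node| [node] + node.descendants }
     .select { |node| node.name == name }
end

inner = ToyNode.new("div", [])
outer = ToyNode.new("div", [ToyNode.new("p", []), inner])
set = [outer]

puts search_descendants(set, "div").length          # inner only
puts search_self_and_descendants(set, "div").length # outer and inner
```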
-
jruby breakage - 1.3.0 and 1.3.1
3 comments Created 6 months ago by flavorjones
Labels: 1.3.2, ffi
Tests fail on JRuby 1.3.0 and 1.3.1. Notably, 1.3.0RC2 does not break. WTF?
Comments
flavorjones
Mon Jun 15 20:22:46 -0700 2009
| link
git bisect indicates this is the commit causing nokogiri issues:
http://github.com/jruby/jruby/commit/80dcf0897e7b1abc28487e5e9fe02b04887d95cf
flavorjones
Mon Jun 15 20:46:01 -0700 2009
| link
asked Wayne for help.
flavorjones
Mon Jun 15 21:48:01 -0700 2009
| link
closed by 80b3b2f
-
FFI: Invalid callback parameter type: STRING on FreeBSD/amd64
20 comments Created 5 months ago by flavorjones
Labels: ffi, flavorjones
Originally at http://jira.codehaus.org/browse/JRUBY-3781
FreeBSD 7.2/amd64, jruby 1.3.1, nokogiri 1.3.2-java, libxml2-2.6.32
Loading nokogiri fails with:
irb(main):002:0> require 'nokogiri'
ArgumentError: Invalid callback parameter type: STRING
from /home/lovec/bin/jruby/lib/ruby/1.8/ffi/ffi.rb:120:in `create_invoker'
from /home/lovec/bin/jruby/lib/ruby/1.8/ffi/library.rb:50:in `attach_function'
from /home/lovec/bin/jruby/lib/ruby/1.8/ffi/library.rb:48:in `each'
from /home/lovec/bin/jruby/lib/ruby/1.8/ffi/library.rb:48:in `attach_function'
from /home/lovec/bin/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-java/lib/nokogiri/ffi/libxml.rb:138
from /home/lovec/bin/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-java/lib/nokogiri/ffi/libxml.rb:31:in `require'
from /home/lovec/bin/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /home/lovec/bin/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-java/lib/nokogiri.rb:10
from /home/lovec/bin/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-java/lib/nokogiri.rb:36:in `require'
from /home/lovec/bin/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from (irb):3
I .inspected the parameters passed to the function that fails if that helps
function => #<Library Symbol library=xml2 symbol=xmlSaveToIO address=0x82b702da0>
args => [#<FFI::CallbackInfo [ pointer, string, int32 ], int32>, #<FFI::CallbackInfo [ pointer ], int32>, #<FFI::Type::Builtin:POINTER size=8 alignment=8>, #<FFI::Type::Builtin:STRING size=8 alignment=8>, #<FFI::Type::Builtin:INT32 size=4 alignment=4>]
ret => #<FFI::Type::Builtin:POINTER size=8 alignment=8>
options => {:convention=>:default, :type_map=>nil, :enums=>nil}
FFI::Invoker.new(function, args, find_type(ret), options)
Comments
flavorjones
Tue Jun 30 04:34:45 -0700 2009
| link
We need to repro on a 64-bit machine. Sigh.
having the same problem. would love to use nokogiri but have to use jruby and deploy apps using warbler. and i get the same error every time i tried. hope someone can help.
we would need that fix too. We have 64-bit machines, but no idea how to fix it. Maybe if you provide a test framework or a patched version we could try to reproduce and give more input on this issue.
Irg, showstopper for me! I would love to see that one fixed.
flavorjones
Tue Jul 21 20:06:59 -0700 2009
| link
Hey kids, we hear you loud and clear. I've just got to find a 64-bit machine to repro on. Will try a little harder.
flavorjones
Fri Jul 24 22:40:16 -0700 2009
| link
I am unable to reproduce. Here's what I'm running on:
- ubuntu hardy (8.04) [Linux 2.6.24-19-xen #1 SMP Sat Jul 12 00:15:59 UTC 2008 x86_64 GNU/Linux]
- jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_06) [amd64-java]
- nokogiri (1.3.2)
- libxml2 2.6.31.dfsg-2ubuntu1.3
I'd love it if some people would post their specific configs here, so we can try to isolate what's going on.
hey mike,
i just created a fresh rails app which simply parses some XML and returns the result using render :text, created a war-file and deployed it in a tomcat on one of our testservers. and this is the complete exception from when the app should be deployed: http://gist.github.com/154787
plus some information about the environment:
Linux version 2.6.26-2-xen-686 (Debian 2.6.26-17)
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Server VM (build 11.0-b16, mixed mode)
Apache Tomcat Version 6.0.14
Nokogiri 1.3.2-java
JRuby 1.3.1
JRuby-Rack 0.9.4
if you need anything more, please let me know.
flavorjones
Sun Jul 26 19:42:41 -0700 2009
| link
Smacks (Daniel),
Can you provide a short script that reproduces this? Neither Aaron nor I have been able to reproduce, although it's clear that bad things are happening.
Specifically, I'm wondering if this is due to an interaction with another library you have loaded in your environment.
-mike
hi mike,
i uploaded a very basic rails app to:
http://rapidshare.de/files/47958747/nokotest.tar.gz.html
all it's supposed to do, is to parse some xml (home controller) and return the result.
you can run it using ruby/jruby on a mongrel or whatever and it works fine. for the problem to unfold you need to take the nokotest.war, which you can find in the project's war-folder, copy it to your tomcat/webapps-folder and deploy the server. watch the logs. hope that helps.
I also experience this on JRuby on Windows (after bypassing the issue where 'calloc' is not found):
Configuration:
Windows 2008 Server Enterprise SP1 (32-bit) on Intel Core 2 6300
jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM) Client VM 1.6.0_14) [x86-java]
nokogiri-1.3.3-x86-mswin32 or -1.3.3-java
Stack trace:
irb(main):004:0> require 'nokogiri'
ArgumentError: Invalid callback parameter type: STRING
from c:/jruby/lib/ruby/1.8/ffi/ffi.rb:120:in `create_invoker'
from c:/jruby/lib/ruby/1.8/ffi/library.rb:50:in `attach_function'
from c:/jruby/lib/ruby/1.8/ffi/library.rb:48:in `each'
from c:/jruby/lib/ruby/1.8/ffi/library.rb:48:in `attach_function'
from C:/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-x86-mswin32/lib/nokogiri/ffi/libxml.rb:138
from C:/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-x86-mswin32/lib/nokogiri/ffi/libxml.rb:31:in `require'
from c:/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from C:/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-x86-mswin32/lib/nokogiri.rb:15
from C:/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-x86-mswin32/lib/nokogiri.rb:31:in `require'
from c:/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from (irb):8
I created an issue here: http://github.com/tenderlove/nokogiri/issues#issue/120
Today I found that the issue I followed in JRuby's JIRA has been created here: http://github.com/tenderlove/nokogiri/issues#issue/90 . Sorry for creating another issue.
Actually, I solved the problem on my machine by using exslt. Hope you can solve the problem like me.
flavorjones
Tue Aug 18 23:20:08 -0700 2009
| link
Daniel (smacks),
Can you confirm that you have libxslt installed in your environment?
Thanks,
-mike
going to verify that it's installed on all our testservers and test again. will get back to you soon!
ok it's still weird. it's running in a local tomcat, i got it running on one of our testservers as well ... but trying another one gives me the same problem. also checked for installed libs. see the output below. guess that's all there is to install?!
/sbin/ldconfig -p | grep libxml
libxml2.so.2 (libc6) => /usr/lib/libxml2.so.2
libxml2.so (libc6) => /usr/lib/libxml2.so
libxml.so.1 (libc6) => /usr/lib/libxml.so.1
libxml.so (libc6) => /usr/lib/libxml.so
/sbin/ldconfig -p | grep libxslt
libxslt.so.1 (libc6) => /usr/lib/libxslt.so.1
libxslt.so (libc6) => /usr/lib/libxslt.so
some more infos:
dpkg-query -l | grep libxml
ii libxml-dev (1.8.17-14+etch1)
ii libxml-ruby (0.3.8-1)
ii libxml-ruby1.8 (0.3.8-1)
ii libxml1 (1.8.17-14+etch1)
ii libxml2 (2.6.27.dfsg-6+etch1)
ii libxml2-dev (2.6.27.dfsg-6+etch1)
dpkg-query -l | grep libxslt
ii libxslt-ruby (0.9.2-1)
ii libxslt-ruby1.8 (0.9.2-1)
ii libxslt1-dev (1.1.19-3)
ii libxslt1.1 (1.1.19-3)
I solved the problem and recorded the steps here:
http://maodan520.spaces.live.com/blog/cns!E0C8D36B1650926A!237.entry
Make sure you do not use libxml2 or libxslt; you should use libexslt. Good luck.
flavorjones
Mon Sep 21 19:19:24 -0700 2009
| link
Is anyone still having this issue? It's unclear to me from the above threads whether this was solved by installing the proper libraries ... Let me know, please!
i'll make sure to report back within a week. quite busy, so i need to find some time for testing. sorry.
flavorjones
Mon Sep 28 23:38:31 -0700 2009
| link
Ping?
flavorjones
Tue Sep 29 14:13:17 -0700 2009
| link
Closing for now. Please reopen if anyone can repro, and help me repro.
-
improve performance building large documents with XML::Builder
3 comments Created 5 months ago by flavorjones
Labels: 1.3.3
some benchmark numbers:
                                                   user     system      total        real
nokogiri: 1000 docs, 10 stories, to string     5.180000   0.470000   5.650000 (  5.663716)
nokogiri: 100 docs, 100 stories, to string     5.500000   0.480000   5.980000 (  5.984575)
nokogiri: 10 docs, 1000 stories, to string     7.900000   0.490000   8.390000 (  8.409311)
nokogiri: 1 docs, 10000 stories, to string    23.530000   0.500000  24.030000 ( 24.247925)
Comments
flavorjones
Fri Jul 17 17:25:05 -0700 2009
| link
removing O(n) penalty in node new/unlink/reparent by replacing xmlXPathNodeSet with a hash. closed by f34f3bd.
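The effect of that change can be sketched in plain Ruby; this is an analogy, not the C code from the commit. Tracking nodes in a list means every removal scans the list, while a hash keyed by identity removes an entry in roughly constant time. Both registry classes below are hypothetical.

```ruby
# List-backed registry: removal scans the array, so it is O(n) per call.
class ArrayRegistry
  def initialize
    @nodes = []
  end

  def add(node)
    @nodes << node
  end

  def remove(node)
    @nodes.delete(node)
  end

  def size
    @nodes.size
  end
end

# Hash-backed registry: keyed by object identity, removal is roughly O(1).
class HashRegistry
  def initialize
    @nodes = {}.compare_by_identity
  end

  def add(node)
    @nodes[node] = true
  end

  def remove(node)
    @nodes.delete(node)
  end

  def size
    @nodes.size
  end
end

nodes = Array.new(3) { Object.new }
[ArrayRegistry.new, HashRegistry.new].each do |registry|
  nodes.each { |node| registry.add(node) }
  registry.remove(nodes[1])
  puts "#{registry.class}: #{registry.size} nodes left"
end
```

With many nodes in one document, the per-removal scan adds up to quadratic work overall, which matches the shape of the slowdown in the 10000-story case above.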
flavorjones
Fri Jul 17 17:27:17 -0700 2009
| link
new benchmarks:
                                                   user     system      total        real
nokogiri: 1000 docs, 10 stories, to string     4.980000   0.530000   5.510000 (  5.546052)
nokogiri: 100 docs, 100 stories, to string     5.360000   0.520000   5.880000 (  5.900826)
nokogiri: 10 docs, 1000 stories, to string     6.360000   0.560000   6.920000 (  6.973381)
nokogiri: 1 docs, 10000 stories, to string     6.270000   0.500000   6.770000 (  6.783409)
flavorjones
Fri Jul 17 17:29:14 -0700 2009
| link
and just for posterity, here are the same documents generated by Builder::XmlMarkup (the Rails default builder):
builder: 1000 docs, 10 stories, to string     13.570000   1.300000  14.870000 ( 14.887572)
builder: 100 docs, 100 stories, to string     13.470000   1.270000  14.740000 ( 14.748797)
builder: 10 docs, 1000 stories, to string     13.210000   1.310000  14.520000 ( 14.544887)
builder: 1 docs, 10000 stories, to string     13.370000   1.280000  14.650000 ( 14.889584)
-
FFI needs unlinkedNodes to be optimized
0 comments Created 5 months ago by flavorjones
Labels: ffi
Extension commit f34f3bd needs to be ported.
Comments
-
extconf.rb have_func() always fails under Ruby Enterprise build system
7 comments Created 2 months ago by flavorjones
Labels: REE
ruby-enterprise-1.8.6-20090610:
checking for xmlRelaxNGSetParserStructuredErrors()... no
checking for xmlRelaxNGSetParserStructuredErrors()... no
checking for xmlRelaxNGSetValidStructuredErrors()... no
checking for xmlSchemaSetValidStructuredErrors()... no
checking for xmlSchemaSetParserStructuredErrors()... no
Comments
flavorjones
Wed Sep 30 22:58:10 -0700 2009
| link
root cause:
"gcc -o conftest -I/usr/include/libxml2 -I/usr/include -I. -I/home/mike/builds/ruby-enterprise-1.8.6-20090610-install/lib/ruby/1.8/i686-linux -I/home/mike/code/nokogiri/ext/nokogiri -I/usr/include/libxml2 -I/usr/include -I. -I/home/mike/builds/ruby-enterprise-1.8.6-20090610-install/lib/ruby/1.8/i686-linux -I/home/mike/code/nokogiri/ext/nokogiri -g -O2 -g -DXP_UNIX -O3 -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline conftest.c -L/opt/local/lib -Wl,-R/opt/local/lib -L. -rdynamic -Wl,-export-dynamic -lexslt -lxslt -lxml2 -lruby-static -lexslt -lxslt -lxml2 -ldl -lcrypt -lm -lc"
conftest.c: In function ‘t’:
conftest.c:3: warning: implicit declaration of function ‘xmlRelaxNGSetParserStructuredErrors’
/usr/bin/ld: cannot find -lruby-static
collect2: ld returned 1 exit status
checked program was:
/* begin */
1: /*top*/
2: int main() { return 0; }
3: int t() { xmlRelaxNGSetParserStructuredErrors(); return 0; }
/* end */
suggested fix:
diff --git a/ext/nokogiri/extconf.rb b/ext/nokogiri/extconf.rb
index 7c21e7d..b77552d 100644
--- a/ext/nokogiri/extconf.rb
+++ b/ext/nokogiri/extconf.rb
@@ -129,7 +129,7 @@ unless find_library('exslt', 'exsltFuncRegister', *LIB_DIRS)
   abort "libxslt is missing. try 'port install libxslt' or 'yum install libxslt-devel'"
 end
-def nokogiri_link_command ldflags, opt='', libpath=$LIBPATH
+def nokogiri_link_command ldflags, opt='', libpath=$DEFLIBPATH|$LIBPATH
   old_link_command ldflags, opt, libpath
 end
flavorjones
Wed Sep 30 23:03:20 -0700 2009
| link
proposed fix from Michael Reinsch:
- def nokogiri_link_command ldflags, opt='', libpath=$LIBPATH
+ def nokogiri_link_command ldflags, opt='', libpath=$DEFLIBPATH|$LIBPATH
which appears to work for me. Aaron, thoughts?
(which is what is used in mkmf.rb, link_command for ruby enterprise)
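For readers unfamiliar with the `|` in that default argument: on arrays it is a set union that keeps the order of first appearance and drops duplicates, so the default library directories stay on the link path alongside the custom ones. The sketch below uses made-up directory lists as stand-ins for `$DEFLIBPATH` and `$LIBPATH`, and the helper name `link_search_path` is hypothetical.

```ruby
# Hypothetical helper mirroring the `$DEFLIBPATH|$LIBPATH` expression:
# union of both lists, first-seen order, no duplicates.
def link_search_path(default_paths, extra_paths)
  default_paths | extra_paths
end

default_lib_path = ["/usr/lib"]                   # stand-in for $DEFLIBPATH
lib_path         = ["/opt/local/lib", "/usr/lib"] # stand-in for $LIBPATH

puts link_search_path(default_lib_path, lib_path).inspect
# => ["/usr/lib", "/opt/local/lib"]
```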
tenderlove
Sun Oct 04 20:53:33 -0700 2009
| link
Ugh. This is a PITA, but basically we can't make everyone happy. If I add this patch, I may as well remove that "nokogiri_link_command" stuff altogether. Let me try to explain why:
Someone has ruby installed in /usr/lib, they also have libxml2 installed in /usr/lib. They've installed a newer version of libxml2 in /usr/local. We try to be nice and search /opt/local/lib in addition to /usr/local/lib before falling back to /usr/lib. Unfortunately, if the custom directory (/opt/local or /usr/lib) isn't supplied to dir_config(), it won't search that path. We can only supply one directory. If mkmf doesn't find it in that directory, then it falls back to /usr/lib.
That means we either get /opt/local/lib or /usr/lib, unless the user intervenes with a --with-xml-lib=/whatever --with-xml-include=/whatever, or we use my Super Hack® code. Unfortunately my Super Hack® screws over people with custom ruby installs because it will never find the ruby-static library.
I'm going to apply this fix (and by apply, I mean remove my custom code). I'd rather it "just work" for people with custom ruby installs. People with custom libxml2 installs can use the command line arguments.
tenderlove
Sun Oct 04 20:54:09 -0700 2009
| link
removing my Super Hack® closed by fbe7217
flavorjones
Mon Oct 05 05:33:42 -0700 2009
| link
Aaron, the --with-xml-lib and --with-xml-include options do not appear to affect have_func(), since it consistently uses the wrong header files.
-
NodeSet.wrap does not preserve document structure
2 comments Created 2 months ago by flavorjones
Failing spec:
def test_wrap_preserves_document_structure
  assert_equal "employeeId", @xml.at_xpath("//employee").children.detect{|j| ! j.text? }.name
  @xml.xpath("//employeeId[text()='EMP0001']").wrap("<wrapper/>")
  assert_equal "wrapper", @xml.at_xpath("//employee").children.detect{|j| ! j.text? }.name
end
Comments
flavorjones
Mon Oct 19 20:10:58 -0700 2009
| link
NodeSet.wrap now preserves document structure. closed by f7388be.
flavorjones
Mon Oct 19 20:12:12 -0700 2009
| link
and 2d3db36
-
Fragment nodes with namespaces should work properly
1 comment Created about 1 month ago by flavorjones
Reported by Iñaki Baz Castillo on the mailing list.
Creating a fragment with a namespace makes the prefix part of the tag name, and (arbitrarily?) uses the namespace of the document root's first child.
Comments
flavorjones
Tue Oct 27 07:58:44 -0700 2009
| link
Closed by 597195f
-
alias next= and prev= for add_next_sibling and add_previous_sibling
1 comment Created 20 days ago by flavorjones
Label: 1.4.1
also maybe alias next -> next_sibling and previous -> previous_sibling
Comments
flavorjones
Fri Dec 04 12:10:58 -0800 2009
| link
aliasing Node#next= and Node#previous= to Node#add_*_sibling(). closed by 6b6bf52.
-
see #109 for more details.
Comments
flavorjones
Mon Dec 14 20:03:07 -0800 2009
| link
never mind. #109 is reopened.
-
called out by Nick Sieger: http://www.engineyard.com/blog/2009/xml-parsing-in-ruby/
Comments
flavorjones
Fri Dec 11 06:16:58 -0800 2009
| link
Ya, never mind, he should use the Node constants.
-
DocumentFragments should support decorators like a Document
3 comments Created 10 days ago by flavorjones
I'd like this for Loofah so I don't have as much special code for DocumentFragments, and so the nodes in a fragment have scrub! methods just like document nodes.
Comments
flavorjones
Tue Dec 15 05:26:13 -0800 2009
| link
Added test coverage, and decorators are working fine on fragments, their nodes and nodesets.
flavorjones
Tue Dec 15 05:39:32 -0800 2009
| link
Ah! Looks like children() is not properly decorated. Looks like we could do some cleaning up of node set decorating in general.
flavorjones
Tue Dec 15 06:38:04 -0800 2009
| link
I'll open a new ticket. Issues collaborator fail. Again. Sigh.
-
Make FFI pass tests again (1.4.1)
1 comment Created 10 days ago by flavorjones
Label: ffi
Sigh. C extension is out of sync with FFI again.
Comments
flavorjones
Tue Dec 15 05:14:18 -0800 2009
| link
never mind. this is the in context parsing that just needs to be ported.
-
Node#children returns an undecorated NodeSet
1 comment Created 10 days ago by flavorjones
In general, our decoration of NodeSets could be cleaned up
Comments
flavorjones
Tue Dec 15 20:01:12 -0800 2009
| link
NodeSets are now always decorated. Added lots of test coverage to node set decoration and document, and cleaned up the implementation. Closed by 56e8c96.
-
XML::NodeSet#include? doesn't use XML::Node#==
3 comments Created 6 months ago by paddor
I defined Nokogiri::XML::Node#== for myself in 1.2.3 to compare only by attributes/content/number of children. This worked well.
Now in 1.3.1 (maybe 1.3 too) it doesn't work anymore. Nokogiri::NodeSet#include? doesn't use Nokogiri::XML::Node#== anymore.
I think this is a bug.
Thank you.
Comments
tenderlove
Tue Jun 16 13:46:20 -0700 2009
| link
You made a monkey patch that I don't want to support. Sorry. A work around for you could be to call to_a on NodeSet.
OK, I understand this.
But how did you do that? I can't find the method definition for NodeSet#include?. So it has to be mixed in by Enumerable and that one should use Node#==.
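The general Ruby mechanics behind that question can be shown without Nokogiri. This is only a sketch: `Item`, `Bag` and `IdentityBag` are made-up classes, and I have not checked how NodeSet#include? is actually implemented in 1.3.x. It shows that `Enumerable#include?` (and `Array#include?`, which is what you get after `to_a`) go through `==`, while a collection that defines its own `include?` can bypass a custom `==`.

```ruby
# A value object whose == compares content rather than identity.
class Item
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def ==(other)
    other.is_a?(Item) && name == other.name
  end
end

# A collection that gets include? from Enumerable.
class Bag
  include Enumerable

  def initialize(items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
  end
end

# A collection that overrides include? with an identity check,
# which is one way a custom == can end up being ignored.
class IdentityBag < Bag
  def include?(obj)
    any? { |item| item.equal?(obj) }
  end
end

items = [Item.new("a"), Item.new("b")]
probe = Item.new("a")

puts Bag.new(items).include?(probe)              # uses Item#==
puts IdentityBag.new(items).include?(probe)      # ignores Item#==
puts IdentityBag.new(items).to_a.include?(probe) # Array#include? uses == again
```

The last line corresponds to the `to_a` workaround suggested above.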
-
In things such as JSTL and ESI, we have this pattern where people intersperse namespaced XML which form a valid document among other bits of text, which are implicitly treated as CDATA. It would be so awesome if, on creating an XML document, we could tell it which namespaces to parse, and everything else would be handled as CDATA implicitly. That'd be hot.
Comments
tenderlove
Mon Nov 30 11:24:22 -0800 2009
| link
Sorry, we can't tell libxml2 to do that. :-(
-
Nokogiri::HTML(data)
src/tcmalloc.cc:186] Attempt to free invalid pointer: 0x20e030
Nokogiri::VERSION => "1.3.2"
Nokogiri::LIBXML_VERSION => "2.6.32"
ruby -v => ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-darwin9.7.0] Ruby Enterprise Edition 20090610
Where 'data' is... http://gist.github.com/199496
Thanks for the help!
Comments
Still crashes for me using libxml2 2.7.5 and nokogiri 1.3.3 and ree-1.8.6-20090610. It does however work with normal MRI: ruby 1.8.6 (2009-08-04 patchlevel 383) [i686-darwin9.8.0]
tenderlove
Sun Oct 04 21:03:28 -0700 2009
| link
This isn't crashing for me. I'm using:
[apatterson@higgins nokogiri (master)]$ ruby -v
ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-darwin10.0.0]
Ruby Enterprise Edition 20090610
[apatterson@higgins nokogiri (master)]$ ruby -I lib bin/nokogiri -v
---
nokogiri: 1.3.3
warnings: []
libxml:
  compiled: 2.7.5
  loaded: 2.7.5
  binding: extension
[apatterson@higgins nokogiri (master)]$
Can you try with the nightly? To install the nightly build, do this:
$ sudo gem install nokogiri -s http://tenderlovemaking.com
tenderlove
Tue Oct 13 13:50:41 -0700 2009
| link
I can't reproduce this. Please reopen if the problem persists. I need more details to fix this if there is a problem. Thanks!
-
Wondering about the possibility of things like to_xml/to_xhtml/etc... being able to write/stream to an IO or possibly even a proc (chunks at a time) instead of waiting for the entire document to be appended to a string and finally returned to the caller.
Would offer the same benefits as parsing from an IO or using a push-parser model, but for encoding as well. This would be especially useful in EventMachine apps, but would also be nice to be able to stream-encode directly to a socket or any other IO.
Comments
tenderlove
Mon Jul 27 10:25:51 -0700 2009
| link
It already does what you describe:
http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Node.html#M000316
brianmario
Mon Jul 27 10:35:33 -0700 2009
| link
Awesome!
Any chance of supporting a callback or something for chunked streaming to the caller?
This would be especially useful in an EventMachine app.
tenderlove
Mon Jul 27 10:42:43 -0700 2009
| link
No. You can write a custom IO object that responds to "write" and "close":
class MyIO
  def initialize &write
    @write = write
  end

  def write data
    @write.call data
  end

  def close; end
end

doc = Nokogiri::XML(File.open(ARGV[0]))
doc.write_to(MyIO.new { |data| puts data })
Then you can do whatever you'd like.
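If fixed-size chunks are wanted rather than whatever sizes the serializer happens to write, the same "responds to write and close" idea can be extended with a small buffer. This is a sketch only: `ChunkedWriter` is a made-up name, it is exercised here with plain strings, and I have not tested it against `write_to`.

```ruby
# Buffers incoming data and hands it to the block in chunks of chunk_size
# characters; the remainder is flushed on close.
class ChunkedWriter
  def initialize(chunk_size, &on_chunk)
    raise ArgumentError, "chunk_size must be positive" unless chunk_size > 0
    @chunk_size = chunk_size
    @on_chunk = on_chunk
    @buffer = +""
  end

  # Called by the producer for each piece of output.
  def write(data)
    text = data.to_s
    @buffer << text
    while @buffer.length >= @chunk_size
      @on_chunk.call(@buffer.slice!(0, @chunk_size))
    end
    text.bytesize # same return convention as IO#write
  end

  # Flush whatever is left when the producer is done.
  def close
    @on_chunk.call(@buffer.slice!(0, @buffer.length)) unless @buffer.empty?
  end
end

chunks = []
writer = ChunkedWriter.new(4) { |chunk| chunks << chunk }
writer.write("<a>")
writer.write("<b/></a>")
writer.close
puts chunks.inspect
```

In an EventMachine app the block would be where each chunk is handed to the connection; that part is left out here because it depends on the application.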
brianmario
Mon Jul 27 10:56:45 -0700 2009
| link
that'll work, thanks
-
Included files referenced by relative paths in xml schemas can't be found.
0 comments Created 14 days ago by jmcnevin
I'm trying to validate a document against the ONIX 2.1 reference schema, which includes the following lines...
<xs:include schemaLocation="ONIX_BookProduct_CodeLists.xsd"/>
<xs:include schemaLocation="ONIX_XHTML_Subset.xsd"/>
When running the validation, I receive this error:
Nokogiri::XML::SyntaxError: Element '{http://www.w3.org/2001/XMLSchema}include': Failed to load the document 'ONIX_BookProduct_CodeLists.xsd' for inclusion.
All of the schema files reside in the same directory. I see a closed ticket that says this problem has been fixed, but I'm still not having any luck on my end.
Comments
-
nokogiri-1.3.3 introduces dependency on st.h -- error: st.h: No such file or directory
1 comment Created 4 months ago by TylerRick
1.3.2 installs fine but I can't seem to build/install 1.3.3. I'm running Ubuntu 9.04.
What is st.h and how do I get it to find it?
Thanks!
> sudo gem1.9 install nokogiri -v 1.3.2
Building native extensions. This could take a while...
Successfully installed nokogiri-1.3.2
1 gem installed
Installing ri documentation for nokogiri-1.3.2...
Installing RDoc documentation for nokogiri-1.3.2...
> sudo gem1.9 install nokogiri -v 1.3.3
Building native extensions. This could take a while...
ERROR: Error installing nokogiri:
ERROR: Failed to build gem native extension.
/usr/bin/ruby1.9 extconf.rb install nokogiri -v 1.3.3
checking for iconv.h in /opt/local/include/,/opt/local/include/libxml2,/opt/local/include,/opt/local/include,/opt/local/include/libxml2,/usr/local/include,/usr/local/include/libxml2,/usr/include,/usr/include/libxml2,/usr/include,/usr/include/libxml2... yes
checking for libxml/parser.h in /opt/local/include/,/opt/local/include/libxml2,/opt/local/include,/opt/local/include,/opt/local/include/libxml2,/usr/local/include,/usr/local/include/libxml2,/usr/include,/usr/include/libxml2,/usr/include,/usr/include/libxml2... yes
checking for libxslt/xslt.h in /opt/local/include/,/opt/local/include/libxml2,/opt/local/include,/opt/local/include,/opt/local/include/libxml2,/usr/local/include,/usr/local/include/libxml2,/usr/include,/usr/include/libxml2,/usr/include,/usr/include/libxml2... yes
checking for libexslt/exslt.h in /opt/local/include/,/opt/local/include/libxml2,/opt/local/include,/opt/local/include,/opt/local/include/libxml2,/usr/local/include,/usr/local/include/libxml2,/usr/include,/usr/include/libxml2,/usr/include,/usr/include/libxml2... yes
checking for xmlParseDoc() in -lxml2... yes
checking for xsltParseStylesheetDoc() in -lxslt... yes
checking for exsltFuncRegister() in -lexslt... yes
checking for xmlRelaxNGSetParserStructuredErrors()... yes
checking for xmlRelaxNGSetParserStructuredErrors()... yes
checking for xmlRelaxNGSetValidStructuredErrors()... yes
checking for xmlSchemaSetValidStructuredErrors()... yes
checking for xmlSchemaSetParserStructuredErrors()... yes
creating Makefile
make
cc -I. -I/usr/include/libxml2 -I/usr/include -I/usr/include/ruby-1.9.0/x86_64-linux -I/usr/include/ruby-1.9.0 -I. -DHAVE_XMLRELAXNGSETPARSERSTRUCTUREDERRORS -DHAVE_XMLRELAXNGSETPARSERSTRUCTUREDERRORS -DHAVE_XMLRELAXNGSETVALIDSTRUCTUREDERRORS -DHAVE_XMLSCHEMASETVALIDSTRUCTUREDERRORS -DHAVE_XMLSCHEMASETPARSERSTRUCTUREDERRORS -I/usr/include/libxml2 -I/usr/include -I/usr/include/ruby-1.9.0/x86_64-linux -I/usr/include/ruby-1.9.0 -I. -fPIC -fno-strict-aliasing -g -g -O2 -O2 -g -Wall -Wno-parentheses -fPIC -g -DXP_UNIX -O3 -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline -o xml_reader.o -c xml_reader.c
In file included from /usr/include/ruby-1.9.0/ruby.h:15,
                 from ./nokogiri.h:6,
                 from ./xml_reader.h:4,
                 from xml_reader.c:1:
/usr/include/ruby-1.9.0/ruby/ruby.h: In function ‘rb_type’:
/usr/include/ruby-1.9.0/ruby/ruby.h:973: warning: conversion to ‘int’ from ‘VALUE’ may alter its value
In file included from ./nokogiri.h:81,
                 from ./xml_reader.h:4,
                 from xml_reader.c:1:
./xml_document.h:5:16: error: st.h: No such file or directory
xml_reader.c: In function ‘attribute_nodes’:
xml_reader.c:171: warning: cast discards qualifiers from pointer target type
xml_reader.c: In function ‘attribute_at’:
xml_reader.c:199: warning: conversion to ‘int’ from ‘long int’ may alter its value
xml_reader.c: In function ‘from_memory’:
xml_reader.c:466: warning: conversion to ‘int’ from ‘long int’ may alter its value
xml_reader.c:474: warning: conversion to ‘int’ from ‘long int’ may alter its value
xml_reader.c: In function ‘from_io’:
xml_reader.c:506: warning: conversion to ‘int’ from ‘long int’ may alter its value
make: *** [xml_reader.o] Error 1
Comments
tenderlove
Thu Aug 06 13:35:47 -0700 2009
| link
Ruby 1.9.0 is not supported. You should upgrade to 1.9.1-p129 or even 1.9.2. 1.9.0 is too broken to be supported. :-(
-
Segmentation fault ruby 1.9.1p0 (2009-01-30 revision 21907) [x86_64-linux]
2 comments Created 5 months ago by ibc
I've created a syntactically wrong XSD file and run:
Nokogiri::XML::Schema(File.read(XSD))
In Ruby 1.9.1 compiled from source on 64-bit Ubuntu Linux, I get a segmentation fault:
ruby1.9 nokogiri_02.rb
/usr/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.3.2/lib/nokogiri/xml/schema.rb:37:in `from_document': Element '{http://www.w3.org/2001/XMLSchema}element': The content is not valid. Expected is (annotation?, ((simpleType | complexType)?, (unique | key | keyref)*)). (Nokogiri::XML::SyntaxError)
from /usr/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.3.2/lib/nokogiri/xml/schema.rb:37:in `new'
from /usr/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.3.2/lib/nokogiri/xml/schema.rb:8:in `Schema'
from /home/ibc/Proyectos/Ruby-XCAP-Client/lib/pres_rules.rb:11:in `<class:PresRules>'
from /home/ibc/Proyectos/Ruby-XCAP-Client/lib/pres_rules.rb:5:in `<module:XCAPClient>'
from /home/ibc/Proyectos/Ruby-XCAP-Client/lib/pres_rules.rb:1:in `<top (required)>'
from /home/ibc/Proyectos/Ruby-XCAP-Client/xcap-client.rb:11:in `require'
from /home/ibc/Proyectos/Ruby-XCAP-Client/xcap-client.rb:11:in `<top (required)>'
from nokogiri_02.rb:3:in `require'
from nokogiri_02.rb:3:in `<main>'
:399: [BUG] Segmentation fault
ruby 1.9.1p0 (2009-01-30 revision 21907) [x86_64-linux]
-- control frame ----------
c:0001 p:0000 s:0002 b:0002 l:000648 d:000648 TOP :399
-- Ruby level backtrace information-----------------------------------------
-- C level backtrace information -------------------------------------------
0x4e882b ruby1.9(rb_vm_bugreport+0x3b) [0x4e882b]
0x5168b0 ruby1.9 [0x5168b0]
0x516a21 ruby1.9(rb_bug+0xb1) [0x516a21]
0x4940df ruby1.9 [0x4940df]
0x7f4f8188f080 /lib/libpthread.so.0 [0x7f4f8188f080]
0x4431de ruby1.9(rb_obj_is_kind_of+0x12e) [0x4431de]
0x41aa77 ruby1.9(ruby_cleanup+0x1d7) [0x41aa77]
0x41ab5a ruby1.9(ruby_run_node+0x3a) [0x41ab5a]
0x417f3d ruby1.9(main+0x4d) [0x417f3d]
0x7f4f80c635a6 /lib/libc.so.6(__libc_start_main+0xe6) [0x7f4f80c635a6]
0x417e29 ruby1.9 [0x417e29]
This doesn't occur with Ruby 1.8.
Comments
tenderlove
Wed Jul 15 18:10:22 -0700 2009
| link
We don't support 1.9.1-p0. p0 is way too buggy for us to support.
Please upgrade to 1.9.1-p129 and let us know if it still breaks! Also, please include the XSD file in the bug report.
-
Add a method Node#add_at_pos to insert a node into a specific position
1 comment Created 21 days ago by ibc
Let's imagine an XML document like:
<?xml version='1.0' encoding='UTF-8'?>
<cp:ruleset xmlns:cp="urn:ietf:params:xml:ns:common-policy">
  <cp:identity>
    <cp:one id="sip:alice@example.org"/>
    <cp:one id="sip:bob@example.org"/>
    <cp:one id="sip:carol@example.org"/>
  </cp:identity>
</cp:ruleset>
I want to insert a new node
<cp:one id="sip:new@example.org"/>
into the second position (between "alice" and "bob").
For now you have to use XPath to search for the nodes inside <cp:identity>, take the first node and run "first_node.add_next_sibling(new_node)".
It would be great to have a new method Node#add_at_pos(node, position, force=false) so that:
- 'node' is the new node to add.
- 'position' is the position the new node will take.
- If 'force' is true then the method would insert the node into the last position if 'position' is greater than the number of nodes + 1. When false it would raise an exception (i.e. "WrongIndex").
So the above operation would be:
fragment = doc.fragment('<cp:one id="sip:new@example.org"/>')
parent_node = doc.xpath("cp:ruleset/cp:identity", @ns).first
parent_node.add_at_pos(fragment, 2)
Comments
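The proposed semantics can be sketched on a plain Ruby array standing in for a node's child list. Everything here is hypothetical: `insert_at_pos` and `WrongIndex` are illustration names taken from the proposal, not Nokogiri API, and positions are assumed to be 1-based because the report calls the slot between "alice" and "bob" the second position.

```ruby
# Hypothetical error class, using the name suggested in the proposal.
class WrongIndex < IndexError; end

# children: Array standing in for a node's child list (modified in place).
# position: 1-based slot the new node should occupy.
# force:    when true, an out-of-range position appends instead of raising.
def insert_at_pos(children, node, position, force = false)
  raise WrongIndex, "position must be >= 1" if position < 1
  if position > children.size + 1
    raise WrongIndex, "position #{position} is past the end" unless force
    position = children.size + 1
  end
  children.insert(position - 1, node)
end

children = ["sip:alice@example.org", "sip:bob@example.org", "sip:carol@example.org"]
insert_at_pos(children, "sip:new@example.org", 2)
puts children.inspect
```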
flavorjones
Thu Dec 03 21:54:30 -0800 2009
| link
XML::Node#add_child now accepts an optional +position+ argument. Closed by fec1b08.
-
Don't modify namespace prefixes for new fragments added into the document
29 comments Created 20 days ago by ibc
Label: unclear
Recently a commit fixed a bug when inserting a fragment with namespace prefixes into a node:
http://github.com/tenderlove/nokogiri/commit/597195ff8fe471e5350581c2d5cce704fcf87439
However it's coded under wrong assumptions (IMHO). Imagine this case:
<?xml version='1.0' encoding='UTF-8'?>
<cp:ruleset xmlns="default.ns" xmlns:cp="urn:common-policy">
  <cp:rule id="empty">
  </cp:rule>
</cp:ruleset>
I want to insert a node:
<many id="all"/>
into the <cp:rule id="empty"> node.
Note that the new node has no NS prefix, but none is required since the XML document has a default namespace (xmlns="default.ns").
However when inserting this new node Nokogiri converts it to:
<cp:many id="all"/>
This is clearly wrong. I want the new node to belong to the NS "default.ns", so I must not use an NS prefix.
Note that this has nothing to do with XPath. That is, to get the newly inserted node (if Nokogiri inserted it without adding a wrong "cp" prefix) I must use namespaces in the XPath query:
new_node = doc.xpath("ccpp:ruleset/ccpp:rule/yuhu:many", {"ccpp"=> "urn:common-policy", "yuhu"=>"default.ns"})
So the namespaces used in the XPath string are only there so Nokogiri/libxml2 can search the XML document, nothing more. When inserting a new node, its namespace prefixes are not required to match those used in the XPath.
In the thread about the above commit, Mike Dalessio stated 3 assumptions:
http://groups.google.com/group/nokogiri-talk/browse_thread/thread/f7f6509ad14ce340
I would like to correct them:
1) "document fragments should not have a namespace, by default"
This is wrong. They may or may not have one; that depends only on the existing XML document and the NS it uses. If the document has a default namespace that applies to the new node, then this node must contain no NS prefix (as shown in the example above).
2) "if a namespace is specified in the node fragment (like your <cp:one> fragment above), Nokogiri should check if the prefix matches any of the namespace definitions on the document root node. If it finds a match, the node should have that namespace. So in your above example, the node name would be "one" under the namespace with the prefix "cp"."
This assumption fails. Namespace definitions on the document root node don't matter. Instead the namespace definitions on the parent node are the only important ones.
Imagine this XML:
<?xml version="1.0"?>
<foo xmlns="urn:test:default-namespace">
  <ns1:bar xmlns:ns1="urn:test:namespace1-uri" xmlns="urn:test:namespace1-uri">
    <baz/>
    <ns2:baz xmlns:ns2="urn:test:namespace2-uri"/>
  </ns1:bar>
  <ns3:hi xmlns:ns3="urn:test:namespace3-uri">
    <there/>
  </ns3:hi>
</foo>
The only NS definition on the root node is xmlns="urn:test:default-namespace". However, NS definitions on node are the following:
"xmlns"=>"urn:test:namespace1-uri", "xmlns:ns1"=>"urn:test:namespace1-uri"
This already works with Nokogiri when using the following XPath query:
doc.xpath("df:foo/df2:bar/df2:baz/namespace::*", {"df"=>"urn:test:default-namespace", "df2"=>"urn:test:namespace1-uri"})
3) "if a namespace is specified in the node fragment but does NOT match any of the namespace definitions on the document root, then the prefix will be silently ignored (which is libxml2's default behavior when parsing documents)."
If this occurs it means that the insert operation is wrong and the only way to know it is by validating the XML against its schema. Leaving it with no NS prefix doesn't mean that the resulting XML is correct.
So IMHO what Nokogiri should do is very easy: just nothing.
When inserting a node Nokogiri shouldn't check the node's namespace prefix, try to guess the appropriate one, or replace it. It's the client's responsibility to use the appropriate namespace prefixes.
For this, the client can get the parent node's namespace definitions used in the XML document by using XPath with "namespace::*" (as explained above).
Finally, this wrong behavior of Nokogiri is breaking my application, since I need to insert a node with no NS prefix into a parent node which has an NS prefix (as in my first example), but Nokogiri corrupts the resulting XML by adding the parent's NS prefix to the new node.
Comments
Let me correct myself when I said:
"So IMHO what Nokogiri should do is very easy: just nothing. When inserting a node Nokogiri shouldn't check the node namespace prefix, neither try to guess the appropriate one, neither replace it."
I understand that this is not possible as Nokogiri requires all the nodes having correct NS prefixes (this is, prefixes matching the node namespace bindings). So it's not possible for Nokogiri to insert a node with unknown NS prefix into the document.
So IMHO what Nokogiri should do is:
If the new node has NS prefixes, check that the namespace definitions of the parent node (and not the doc root node) contain these prefixes:
a) If so, insert the new node verbatim and that's all.
b) If the new node's prefixes (or absence of them) don't match the parent node's NS definitions (including a possible default NS for this parent node), then Nokogiri could do the following:
b1) Remove the new node's prefixes (as it does now). However, in an XML document in which an NS is required for all nodes, this would only work if the parent node (and not the root node) has a default namespace (and even then this doesn't ensure that the XML would be valid when checked against its XML schema).
b2) Raise an exception. This should only occur if the document doesn't allow nodes without an NS (there could be a default NS in the parent node, so no prefix is required).
I really like (and need) option "b2", as in my project I work with strict XML documents requiring a namespace for all the nodes. However, for this to work a new parsing/document option like "REQUIRE_NS_FOR_ALL_NODES" would be required.
So when "REQUIRE_NS_FOR_ALL_NODES" is enabled:
If the new node has NS prefixes, check that the namespace definitions of the parent node (and not the doc root node) contain these prefixes:
a) If so, insert the new node verbatim and that's all.
b) If not, raise an exception (like "WrongNamespacePrefix").
If the new node has no NS prefixes, check that the namespace definitions of the parent node (and not the doc root node) contain a default namespace.
a) If so, insert the new node verbatim and that's all.
b) If not, raise an exception (like "WrongNamespacePrefix").
When "REQUIRE_NS_FOR_ALL_NODES" is disabled (the current Nokogiri behavior):
If the new node has NS prefixes, check that the namespace definitions of the parent node (and not the doc root node) contain these prefixes:
a) If so, insert the new node verbatim and that's all.
b) If not, remove the prefixes. But never attempt to replace the prefix with another one.
If the new node has no NS prefixes, check that the namespace definitions of the parent node (and not the doc root node) contain a default namespace.
a) If so, insert the new node verbatim and that's all (Nokogiri FAILS at this point, as I described in the report!!!).
Again, I strongly suggest never replacing or adding an NS prefix on the new node. Inserting a valid node is the responsibility of the user. Also, Nokogiri FAILS when adding an NS prefix, as reported above.
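The rules above can be written down as a small decision function. This is a sketch of the proposal only, not of anything Nokogiri does: `insertion_action`, the `strict:` flag (standing in for the proposed "REQUIRE_NS_FOR_ALL_NODES") and `WrongNamespacePrefix` are hypothetical names. The case the proposal leaves open (no prefix, no default namespace, option disabled) is assumed here to mean "insert as-is with no namespace".

```ruby
# Hypothetical error class, using the name suggested in the proposal.
class WrongNamespacePrefix < StandardError; end

# prefix:   prefix of the new node, or nil when it has none
# in_scope: Hash of prefix => URI visible at the PARENT node;
#           the nil key stands for the default namespace
# strict:   stand-in for the proposed REQUIRE_NS_FOR_ALL_NODES option
def insertion_action(prefix, in_scope, strict: false)
  # A matching prefix, or no prefix with a default namespace in scope.
  return :insert_verbatim if in_scope.key?(prefix)
  raise WrongNamespacePrefix, "no binding for #{prefix.inspect}" if strict
  # Non-strict: an unknown prefix is dropped, never replaced by another one.
  # Assumption: an unprefixed node with no default namespace goes in as-is.
  prefix.nil? ? :insert_verbatim : :strip_prefix
end

parent_scope = { nil => "default.ns", "cp" => "urn:common-policy" }
puts insertion_action(nil, parent_scope)   # <many id="all"/> from the report
puts insertion_action("zz", parent_scope)  # unknown prefix, option disabled
```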
flavorjones
Mon Dec 07 07:20:58 -0800 2009
| link
Hi,
The Nokogiri core developers practice "test driven development". This means that, when we want to change Nokogiri's behavior, either by adding a feature or fixing a bug, we first write a test (or tests) that completely specifies the new behavior. You can see all of these tests under the /test directory in Nokogiri's source repository.
We like very much to get bug reports and feature requests that include one or both of the following:
- An indication of which existing test(s) should be changed
- New (failing) test(s), specifying the desired behavior
Or, less desirably, we would like to see clear, runnable sample code that is easily convertible into a failing test.
The reason we like to see these tests in a bug report is because the core developers (who maintain Nokogiri in their spare time, for free), do not have to spend time trying to translate imprecise English into Ruby code.
Secondarily, a test case indicates to us that the reporter has thought through the issue completely, and cares enough about the issue to have taken the time to present the argument clearly and concisely in the lingua franca of the Ruby community (i.e., Ruby).
Why am I bringing this up now? This issue, so far, contains over 1100 words and no test cases. Although there are a few snippets of XML and Ruby code, I do not consider any of them to be a complete runnable example or test.
I do not doubt there is clear logic in your argument; I just don't have the time to read it all and translate it into requirements. Explaining your issue in Ruby tests instead of English will be clearer, more dense and will better communicate your issue to the core developers.
Thank you for reporting this issue. I am looking forward to seeing your failing tests, and better understanding what you are asking the core developers to build for you.
Cheers.
Ok, I've sent a mail to the Nokogiri mailing list containing a unit test describing the issue I mean.
However the issue is not easy to describe with unit tests alone, so I strongly suggest reading my report above after checking the unit test.
To simplify the whole report: Nokogiri should NEVER add/guess/replace the NS prefixes of a newly inserted node. It's the responsibility of the user/client to use the appropriate NS prefixes.
Unit test created: http://gist.github.com/250927
flavorjones
Mon Dec 07 09:04:48 -0800 2009
| link
Thank you for the two test cases you submitted. I think I understand the first one, which states that a new fragment, without a namespace declared in it, should be given the default document namespace (if there is one).
I do not understand the second test case. It is nearly identical to the first, and specifies behavior in which a non-default namespace is applied to the fragment. Can you please explain why this behavior is expected?
I think I understand the first one, which states that a new fragment, without a namespace declared in it, should be given the default document namespace (if there is one).
And if the document has no default namespace then the new node has no namespace binding, so adding the parent node's namespace is wrong (or at least unexpected, since Nokogiri has no way to determine the appropriate prefix, nor whether the node must have a prefix at all).
I do not understand the second test case. It is nearly identical to the first, and specifies behavior in which a non-default namespace is applied to the fragment. Can you please explain why this behavior is expected?
The second case shows an error by the user, who is trying to insert a node belonging to the "urn:strint-rules" namespace using the wrong prefix (no prefix in this case).
So from Nokogiri's point of view the new node belongs to the parent node's default namespace (I will comment on this later as it's buggy IMHO), but Nokogiri adds the "cp" prefix.
That is, the XML will end up wrong since the user is inserting a wrong node (wrong prefix). Even so, the XML should end up as follows:
<?xml version="1.0" encoding="UTF-8"?>
<cp:ruleset xmlns="default.ns" xmlns:cp="urn:common-policy" xmlns:sr="urn:strint-rules">
  <cp:rule id="1"/>
  <cp:rule id="2"/>
  <sr:strict_rule name="sr1"/>
  <condition id="1" name="I belong to default namespace"/>
  <cp:rule id="3"/>
  <sr:strict_rule name="sr2"/>
  <strict_rule id="3" name="I'm wrong as I should have 'sr' prefix"/>
</cp:ruleset>
But Nokogiri adds the parent node's prefix "cp" which, as in case 1, is wrong.
flavorjones
Mon Dec 07 09:22:34 -0800 2009
| link
You have not clarified the second test case at all. It appears to me that you are conflating DTD validation with the construction of a document.
Am I correct in assuming that you think Nokogiri should "know" that <strict_rule> belongs in the "sr" namespace because of some sort of DTD declaration?
No, I just say that Nokogiri should insert the new "strict_rule" node without prefix (as always) and its namespace binding should belong to "default.ns".
Yes, the resulting document would be incorrect according to the DTD if the "default.ns" namespace has no element called "strict_rule", but Nokogiri shouldn't care about that at all.
I would like to add something important: after inspecting how Nokogiri inserts a new fragment (by doing 'doc.fragment' and so on) I must say that it's 100% wrong.
Nokogiri tries to detect the new node's namespace bindings by running "doc.fragment(new_node)", and it only inspects the root node's namespace bindings. This is wrong. It's not possible to know the appropriate bindings for a new node until we know the exact parent node into which it will be inserted.
So IMHO the Document#fragment method should disappear entirely and Node#fragment should exist instead. Then the way to insert a node would be:
1) parent_node = xml.xpath("/ns1:root/ns1:child/ns2:list", {"ns1"=>....
2) frag = parent_node.fragment(new_node)
3) parent_node.add_child(frag)
In point 1 we get the parent node into which we want to insert the new node.
In point 2 Nokogiri determines the appropriate namespace bindings for the new node according to the parent node. The best way I know to do that is to fetch the parent node's namespaces with XPath:
parent_node_ns = xml.xpath("/ns1:root/ns1:child/ns2:list/namespace::*", {"ns1"=>....
This returns an array of Namespace objects, so Nokogiri can then inspect the new_node prefixes and match them against the list of Namespace objects.
Please note that taking the root node's namespaces is completely wrong. For example, take a look at this XML:
<?xml version='1.0' encoding='UTF-8'?>
<root xmlns="urn:ns1">
  <ns2:child xmlns:ns2="urn:ns2" xmlns="urn:ns3">
    <elem name="I belong to 'urn:ns3'"/>
  </ns2:child>
</root>
- "root" node has no prefix but default ns "urn:ns1".
- "ns2:child" declares a new default ns for its children ("urn:ns3") which replaces the root node default namespace.
- So "elem" belongs to "urn:ns3" rather than "urn:ns1".
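The scoping rule in that example (the nearest enclosing declaration wins, and a child can redeclare the default namespace) can be sketched without any XML library. `Element` and `resolve_namespace` are made-up names for illustration; the `nil` key stands for the default namespace.

```ruby
# A toy element: its name, its own xmlns declarations, and a parent link.
# The nil key in declarations stands for the default namespace (xmlns="...").
Element = Struct.new(:name, :declarations, :parent)

# Walk from the element up to the root; the nearest declaration wins.
def resolve_namespace(element, prefix)
  node = element
  while node
    return node.declarations[prefix] if node.declarations.key?(prefix)
    node = node.parent
  end
  nil # prefix is not bound anywhere in scope
end

# The document from the comment above.
root  = Element.new("root",  { nil => "urn:ns1" }, nil)
child = Element.new("child", { "ns2" => "urn:ns2", nil => "urn:ns3" }, root)
elem  = Element.new("elem",  {}, child)

puts resolve_namespace(elem, nil)    # default namespace as seen from <elem>
puts resolve_namespace(root, nil)    # default namespace as seen from <root>
```

Looking only at the root's declarations would give "urn:ns1" for "elem", which is the mistake the comment is describing.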
I've updated my test_unit and now it shows more cases, including two more to explain and prove what I said in my last comment:
flavorjones
Mon Dec 07 13:50:23 -0800 2009
| link
Please rewrite these tests in the style you see in test/xml/test_document_fragment.rb.
I am more confused now than when we started. I am pretty sure that is not what you intended, so please try to explain your issue again, in a simpler / smaller / more concise example.
Ok, I must recognize that this issue is very complex as it gets deep into XML's most exotic cases when handling namespaces.
Let's start from the beginning. The following test_unit is simpler (I hope). Let me know your opinion. Thanks.
flavorjones
Mon Dec 07 20:42:25 -0800 2009
| link
I have rewritten your tests, removing unnecessary markup and code (there was quite a bit that was unnecessary). I removed the last test, since it was redundant, and added a new test case for clarity (and to demonstrate how fragment namespaces currently work).
Please review https://gist.github.com/c149f6b74f811b6b93ae and let me know if these test cases accurately reflect your desired implementation.
Thanks, it's ok. However I've added one more test to your file:
The new test (last one) tries to prove that we should always speak about "parent node namespaces binding" rather than "root node namespa

Oh god... its because its not XHTML, its HTML... which has no requirement for closing tags. Dorko!