tenderlove / nokogiri
4 comments. Created about 1 month ago by jsjuni. Labels: 1.4.2, tenderlove.
allow creation and deletion of Entity Declaration elements
I'm writing a utility to rewrite URIs in some RDF/XML files. This behavior is surprising:
$ irb -r nokogiri
irb(main):001:0> doc = Nokogiri::XML::Document.new
=> #<Nokogiri::XML::Document:0x..fdbbf990a name="document">
irb(main):002:0> ed = Nokogiri::XML::EntityDecl.new('ed', doc)
=> #<Nokogiri::XML::EntityDecl:0x..fdbbf1dea "<ed/>">
irb(main):003:0> ed.content
=> nil
irb(main):004:0> ed.content = 'test'
=> "test"
irb(main):005:0> ed.content
=> nil
That's not what's intended, is it?
Comments
31 comments. Created 25 days ago by ibc. Labels: unclear.
Don't modify namespace prefixes for new fragments added into the document
Recently a commit fixed a bug when inserting a fragment with namespace prefixes into a node:
http://github.com/tenderlove/nokogiri/commit/597195ff8fe471e5350581c2d5cce704fcf87439
However, it's based on wrong assumptions (IMHO). Imagine this case:
<?xml version='1.0' encoding='UTF-8'?>
<cp:ruleset xmlns="default.ns" xmlns:cp="urn:common-policy">
  <cp:rule id="empty">
  </cp:rule>
</cp:ruleset>
I want to insert the node <many id="all"/> into the <cp:rule id="empty"> node.
Note that the new node has no NS prefix, and none is required, since the XML document has a default namespace (xmlns="default.ns").
However, when this new node is inserted, Nokogiri converts it to:
<cp:many id="all"/>
This is clearly wrong. I want the new node to belong to the "default.ns" namespace, so I must not use an NS prefix.
Note that this has nothing to do with XPath. That is, to retrieve the newly inserted node (if Nokogiri inserted it without adding the wrong "cp" prefix), I must use namespaces in the XPath query:
new_node = doc.xpath("ccpp:ruleset/ccpp:rule/yuhu:many", {"ccpp"=>"urn:common-policy", "yuhu"=>"default.ns"})
So the prefixes used in the XPath string only serve Nokogiri/libxml2 for searching the XML document, nothing more. When inserting a new node, its namespace prefixes do not need to match those used in the XPath query.
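Why the prefixes can differ: matching compares expanded names (namespace URI plus local name), not prefixes. The following minimal sketch models this in plain Ruby rather than calling Nokogiri; `ExpandedName`, `expand`, `DOC_NS` and `XPATH_NS` are hypothetical names chosen for illustration, and the bindings are taken from the example above.

```ruby
# Hypothetical model (not Nokogiri's API): a name is identified by its
# expanded name, i.e. the pair (namespace URI, local name). Prefixes are
# only shorthand that each context maps to a URI.
ExpandedName = Struct.new(:uri, :local)

# Resolve a qualified name such as "cp:rule" or "many" against a
# prefix => URI hash. The key nil holds the default namespace; when it is
# absent, an unprefixed name has no namespace (as in XPath 1.0).
def expand(qname, bindings)
  prefix, local = qname.include?(":") ? qname.split(":", 2) : [nil, qname]
  if prefix && !bindings.key?(prefix)
    raise KeyError, "unbound prefix #{prefix}"
  end
  ExpandedName.new(bindings[prefix], local)
end

# Bindings in scope in the example document.
DOC_NS = { nil => "default.ns", "cp" => "urn:common-policy" }
# Bindings passed to doc.xpath in the query.
XPATH_NS = { "ccpp" => "urn:common-policy", "yuhu" => "default.ns" }

# Different prefixes, same expanded names:
p expand("many", DOC_NS) == expand("yuhu:many", XPATH_NS)    # → true
p expand("cp:rule", DOC_NS) == expand("ccpp:rule", XPATH_NS) # → true
# Adding a "cp" prefix moves <many> into a different namespace:
p expand("cp:many", DOC_NS) == expand("yuhu:many", XPATH_NS) # → false
```

This is why rewriting `<many>` as `<cp:many>` changes the element's identity, while the XPath prefixes are irrelevant to it.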
In the thread about the above commit, Mike Dalessio stated three assumptions:
http://groups.google.com/group/nokogiri-talk/browse_thread/thread/f7f6509ad14ce340
I would like to correct them:
1) "document fragments should not have a namespace, by default"
This is wrong. They may or may not have one; that depends only on the existing XML document and the namespaces it uses. If the document provides a default namespace for the new node, then the node must have no NS prefix (as shown in the example above).
2) "if a namespace is specified in the node fragment (like your <cp:one> fragment above), Nokogiri should check if the prefix matches any of the namespace definitions on the document root node. If it finds a match, the node should have that namespace. So in your above example, the node name would be "one" under the namespace with the prefix "cp"."
This assumption fails. Namespace definitions on the document root node don't matter; only the namespace definitions on the parent node do.
Imagine this XML:
<?xml version="1.0"?>
<foo xmlns="urn:test:default-namespace">
  <ns1:bar xmlns:ns1="urn:test:namespace1-uri" xmlns="urn:test:namespace1-uri">
    <baz/>
    <ns2:baz xmlns:ns2="urn:test:namespace2-uri"/>
  </ns1:bar>
  <ns3:hi xmlns:ns3="urn:test:namespace3-uri">
    <there/>
  </ns3:hi>
</foo>
The only NS definition on the root node is xmlns="urn:test:default-namespace". However, the NS definitions in scope for the <baz/> node are the following:
"xmlns"=>"urn:test:namespace1-uri", "xmlns:ns1"=>"urn:test:namespace1-uri"
This already works with Nokogiri when using the following XPath query:
doc.xpath("df:foo/df2:bar/df2:baz/namespace::*", {"df"=>"urn:test:default-namespace", "df2"=>"urn:test:namespace1-uri"})
3) "if a namespace is specified in the node fragment but does NOT match any of the namespace definitions on the document root, then the prefix will be silently ignored (which is libxml2's default behavior when parsing documents)."
If this occurs, the insert operation is wrong, and the only way to detect that is by validating the XML against its schema. Leaving the node without an NS prefix does not make the resulting XML correct.
So, IMHO, what Nokogiri should do is very simple: nothing.
When inserting a node, Nokogiri shouldn't check the node's namespace prefix, try to guess the appropriate one, or replace it. It's the client's responsibility to use the appropriate namespace prefixes.
For this, the client can get the parent node's namespace definitions in the XML document by using XPath with "namespace::*" (as explained above).
Finally, this wrong behavior of Nokogiri is breaking my application, since I need to insert a node with no NS prefix into a parent node that has an NS prefix (as in my first example), but Nokogiri corrupts the resulting XML by adding the parent's NS prefix to the new node.
Comments
Let me correct myself when I said:
"So IMHO what Nokogiri should do is very easy: just nothing. When inserting a node Nokogiri shouldn't check the node namespace prefix, neither try to guess the appropriate one, neither replace it."
I understand that this is not possible, as Nokogiri requires every node to have a correct NS prefix (that is, a prefix matching the node's namespace bindings). So Nokogiri cannot insert a node with an unknown NS prefix into the document.
So, IMHO, what Nokogiri should do is:
If the new node has NS prefixes, check that the namespace definitions of the parent node (not the document root node) contain these prefixes:
a) If so, insert the new node verbatim, and that's all.
b) If the new node's prefixes (or their absence) don't match the parent node's NS definitions (including a possible default NS for this parent node), Nokogiri could do one of the following:
b1) Remove the new node's prefixes (as it does now). However, in an XML document in which every node requires a namespace, this only works if the parent node (not the root node) has a default namespace (and even then it doesn't ensure that the XML would be valid against its XML schema).
b2) Raise an exception. This should only happen if the document doesn't allow nodes without a namespace (the parent node could have a default NS, in which case no prefix is required).
I really like (and need) option "b2", as in my project I work with strict XML documents requiring a namespace for every node. However, for this to work a new parse/document option such as "REQUIRE_NS_FOR_ALL_NODES" would be required.
So when "REQUIRE_NS_FOR_ALL_NODES" is enabled:
If the new node has NS prefixes, check that the namespace definitions of the parent node (not the document root node) contain these prefixes:
a) If so, insert the new node verbatim, and that's all.
b) If not, raise an exception (such as "WrongNamespacePrefix").
If the new node has no NS prefixes, check that the namespace definitions of the parent node (not the document root node) contain a default namespace:
a) If so, insert the new node verbatim, and that's all.
b) If not, raise an exception (such as "WrongNamespacePrefix").
When "REQUIRE_NS_FOR_ALL_NODES" is disabled (Nokogiri's current behavior):
If the new node has NS prefixes, check that the namespace definitions of the parent node (not the document root node) contain these prefixes:
a) If so, insert the new node verbatim, and that's all.
b) If not, remove the prefixes, but never attempt to replace them with others.
If the new node has no NS prefixes, check that the namespace definitions of the parent node (not the document root node) contain a default namespace:
a) If so, insert the new node verbatim, and that's all (Nokogiri FAILS at this point, as I described in the report!!!).
Again, I strongly suggest never replacing or adding an NS prefix on the new node. Inserting a valid node is the user's responsibility. Also, Nokogiri FAILS when adding an NS prefix, as reported above.
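The rules above can be condensed into a single decision function. This is a hedged sketch in plain Ruby, not Nokogiri code: `resolve_prefix` and the `require_ns` flag are hypothetical stand-ins for the proposed "REQUIRE_NS_FOR_ALL_NODES" option, and `WrongNamespacePrefix` is the exception name suggested in the proposal, which Nokogiri does not define. The proposal leaves one case unspecified (no prefix and no default namespace, lenient mode); the sketch assumes the node is inserted as-is.

```ruby
# Hypothetical sketch of the proposed insertion rules. Neither the
# require_ns option nor WrongNamespacePrefix exists in Nokogiri.
class WrongNamespacePrefix < StandardError; end

# prefix   - the new node's prefix, or nil when it has none
# in_scope - prefix => URI bindings in scope at the PARENT node
#            (the key nil holds the parent's default namespace)
# Returns the prefix the inserted node should keep (nil = no prefix).
def resolve_prefix(prefix, in_scope, require_ns: false)
  if prefix
    # a) the prefix is bound at the parent: insert verbatim
    return prefix if in_scope.key?(prefix)
    # b) unbound prefix: raise in strict mode...
    raise WrongNamespacePrefix, "prefix #{prefix} is not bound" if require_ns
    # ...otherwise drop it, but never substitute another prefix
    nil
  else
    # a) no prefix and a default namespace in scope: insert verbatim
    return nil if in_scope.key?(nil)
    # b) strict mode requires a namespace for every node
    raise WrongNamespacePrefix, "no default namespace in scope" if require_ns
    # Not specified in the proposal; assumed: insert as-is, in no namespace
    nil
  end
end

parent_scope = { nil => "default.ns", "cp" => "urn:common-policy" }
p resolve_prefix("cp", parent_scope) # → "cp"
p resolve_prefix(nil, parent_scope)  # → nil
p resolve_prefix("sr", parent_scope) # → nil
```

Note that no branch ever returns a prefix other than the one the node came with, which is the core of the request.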
flavorjones
Mon Dec 07 07:20:58 -0800 2009
Hi,
The Nokogiri core developers practice "test driven development". This means that, when we want to change Nokogiri's behavior, either by adding a feature or fixing a bug, we first write a test (or tests) that completely specifies the new behavior. You can see all of these tests under the /test directory in Nokogiri's source repository.
We very much like to get bug reports and feature requests that include one or both of the following:
- An indication of which existing test(s) should be changed
- New (failing) test(s), specifying the desired behavior
Or, less desirably, we would like to see clear, runnable sample code that is easily convertible into a failing test.
The reason we like to see these tests in a bug report is that the core developers (who maintain Nokogiri in their spare time, for free) do not have to spend time trying to translate imprecise English into Ruby code.
Secondarily, a test case indicates to us that the reporter has thought through the issue completely, and cares enough about the issue to have taken the time to present the argument clearly and concisely in the lingua franca of the Ruby community (i.e., Ruby).
Why am I bringing this up now? This issue, so far, contains over 1100 words and no test cases. Although there are a few snippets of XML and Ruby code, I do not consider any of them to be a complete runnable example or test.
I do not doubt there is clear logic in your argument; I just don't have the time to read it all and translate it into requirements. Explaining your issue in Ruby tests instead of English will be clearer, more dense and will better communicate your issue to the core developers.
Thank you for reporting this issue. I am looking forward to seeing your failing tests, and better understanding what you are asking the core developers to build for you.
Cheers.
OK, I've sent a mail to the Nokogiri mailing list containing a unit test describing the issue I mean.
However, the issue is not easy to describe with unit tests alone, so I strongly suggest reading my report above after checking the test.
To summarize the whole report: Nokogiri should NEVER add, guess, or replace the NS prefixes of a newly inserted node. It's the user's/client's responsibility to use the appropriate NS prefixes.
Test case: http://gist.github.com/250927
flavorjones
Mon Dec 07 09:04:48 -0800 2009
Thank you for the two test cases you submitted. I think I understand the first one, which states that a new fragment, without a namespace declared in it, should be given the default document namespace (if there is one).
I do not understand the second test case. It is nearly identical to the first, and specifies behavior in which a non-default namespace is applied to the fragment. Can you please explain why this behavior is expected?
"I think I understand the first one, which states that a new fragment, without a namespace declared in it, should be given the default document namespace (if there is one)."
And if the document has no default namespace, then the new node has no namespace binding, so adding the parent node's namespace is wrong (or at least unexpected, since Nokogiri has no way to determine the appropriate prefix, nor whether the node should have a prefix at all).
"I do not understand the second test case. It is nearly identical to the first, and specifies behavior in which a non-default namespace is applied to the fragment. Can you please explain why this behavior is expected?"
The second case shows a user error: the user is trying to insert a node belonging to the "urn:strint-rules" namespace using the wrong prefix (no prefix in this case).
So, from Nokogiri's point of view, the new node belongs to the parent node's default namespace (I will comment on this later, as it's buggy IMHO), but Nokogiri adds the "cp" prefix.
That is, the XML will be wrong because the user is inserting a wrong node (wrong prefix). Even so, the XML should look as follows:
<?xml version="1.0" encoding="UTF-8"?>
<cp:ruleset xmlns="default.ns" xmlns:cp="urn:common-policy" xmlns:sr="urn:strint-rules">
  <cp:rule id="1"/>
  <cp:rule id="2"/>
  <sr:strict_rule name="sr1"/>
  <condition id="1" name="I belong to default namespace"/>
  <cp:rule id="3"/>
  <sr:strict_rule name="sr2"/>
  <strict_rule id="3" name="I'm wrong as I should have 'sr' prefix"/>
</cp:ruleset>
But Nokogiri adds the parent node's prefix "cp", which, as in case 1, is wrong.
flavorjones
Mon Dec 07 09:22:34 -0800 2009
You have not clarified the second test case at all. It appears to me that you are conflating DTD validation with the construction of a document.
Am I correct in assuming that you think Nokogiri should "know" that <strict_rule> belongs in the "sr" namespace because of some sort of DTD declaration?
No, I'm just saying that Nokogiri should insert the new "strict_rule" node without a prefix (as given), and its namespace binding should be "default.ns".
Yes, the resulting document would be invalid according to the DTD if the "default.ns" namespace has no element called "strict_rule", but Nokogiri shouldn't care about that at all.
I would like to add something important: after inspecting how Nokogiri inserts a new fragment (via 'doc.fragment' and so on), I must say that it's 100% wrong.
Nokogiri tries to detect the new node's namespace bindings by running "doc.fragment(new_node)", and it only inspects the root node's namespace bindings. This is wrong. It's not possible to know the appropriate bindings for a new node until we know the exact parent node into which it will be inserted.
So, IMHO, the Document#fragment method should disappear entirely, and Node#fragment should exist instead. Then the way to insert a node would be:
1) parent_node = xml.xpath("/ns1:root/ns1:child/ns2:list", {"ns1"=>....
2) frag = parent_node.fragment(new_node)
3) parent_node.add_child(frag)
In step 1 we get the parent node into which we want to insert the new node.
In step 2 Nokogiri determines the appropriate namespace bindings for the new node from the parent node. The best way I know to do this is by fetching the parent node's namespaces with XPath:
parent_node_ns = xml.xpath("/ns1:root/ns1:child/ns2:list/namespace::*", {"ns1"=>....
This returns an array of Namespace objects, so Nokogiri can then inspect the new_node prefixes and match them against that list.
Please note that taking the root node's namespaces is completely wrong. For example, take a look at this XML:
<?xml version='1.0' encoding='UTF-8'?>
<root xmlns="urn:ns1">
  <ns2:child xmlns:ns2="urn:ns2" xmlns="urn:ns3">
    <elem name="I belong to 'urn:ns3'"/>
  </ns2:child>
</root>
- The "root" node has no prefix but has the default namespace "urn:ns1".
- "ns2:child" declares a new default ns for its children ("urn:ns3") which replaces the root node default namespace.
- So "elem" belongs to "urn:ns3" rather than "urn:ns1".
I've updated my unit test; it now covers more cases, including two more that explain and prove what I said in my last comment:
flavorjones
Mon Dec 07 13:50:23 -0800 2009
Please rewrite these tests in the style you see in test/xml/test_document_fragment.rb.
I am more confused now than when we started. I am pretty sure that is not what you intended, so please try to explain your issue again, in a simpler / smaller / more concise example.
OK, I must admit that this issue is very complex, as it gets deep into XML's most exotic namespace-handling cases.
Let's start from the beginning. The following test_unit is simpler (I hope). Let me know your opinion. Thanks.
flavorjones
Mon Dec 07 20:42:25 -0800 2009
I have rewritten your tests, removing unnecessary markup and code (there was quite a bit that was unnecessary). I removed the last test, since it was redundant, and added a new test case for clarity (and to demonstrate how fragment namespaces currently work).
Please review https://gist.github.com/c149f6b74f811b6b93ae and let me know if these test cases accurately reflect your desired implementation.
Thanks, it's ok. However I've added one more test to your file:
The new test (the last one) tries to prove that we should always talk about "parent node namespace bindings" rather than "root node namespaces" or "parent node namespaces".
A node's namespace bindings are obtained as follows:
- Take the namespaces of the root node (level1).
- Take the namespaces of level2 and add them to the previous list (replacing those with a matching prefix).
- Take the namespaces of level3 and add them to the previous list (replacing those with a matching prefix).
- This gives the namespace bindings for node level3, which differ from the namespaces declared on node level3.
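The level-by-level procedure above amounts to folding each ancestor's declarations, root first, into a running hash, where "replacing those with a matching prefix" is exactly `Hash#merge`. A minimal plain-Ruby sketch, not Nokogiri code: the method name `in_scope_bindings` and the representation (a prefix => URI hash per level, with `nil` standing for the default namespace) are assumptions for illustration. The data is the `<root>`/`<ns2:child>`/`<elem>` document shown earlier in this thread.

```ruby
# Hypothetical sketch (plain Ruby, no Nokogiri): compute a node's in-scope
# namespace bindings from the declarations on each level, root first.
# Each element of `declarations` is a prefix => URI hash of the namespaces
# declared on one level; the key nil stands for the default namespace.
def in_scope_bindings(declarations)
  declarations.reduce({}) do |bindings, declared|
    # Deeper declarations replace earlier ones with the same prefix.
    bindings.merge(declared)
  end
end

#   <root xmlns="urn:ns1">
#     <ns2:child xmlns:ns2="urn:ns2" xmlns="urn:ns3">
#       <elem .../>
root_decls  = { nil => "urn:ns1" }
child_decls = { "ns2" => "urn:ns2", nil => "urn:ns3" }
elem_decls  = {} # <elem> declares nothing itself

scope = in_scope_bindings([root_decls, child_decls, elem_decls])
p scope[nil]   # → "urn:ns3" (not the root's "urn:ns1")
p scope["ns2"] # → "urn:ns2"
p elem_decls   # → {} (declared namespaces differ from bindings)
```

The sketch ignores the implicit `xml` prefix binding and `xmlns=""` undeclarations, both of which a real implementation has to handle.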
XPath and libxml2 (and Nokogiri) allow getting a node's namespace bindings by running:
doc.xpath("ns1:level1/ns2:level2/ns3:level3/namespace::*")
This returns an array of namespace definitions corresponding to level3's namespace bindings, which is different from just the namespace definitions present on level3 (as explained before).
flavorjones
Tue Dec 08 05:04:07 -0800 2009
Hello. This last test is redundant, but I will include it if it makes you happy.
You are explaining things in English again. I promise you, I am not reading any of your explanations of functionality. I am only reading what you code. Show me, don't tell me.
I am trying (and have been trying) to explain to you that the spec code you are writing is your only means of communicating changes in Nokogiri behavior to the core team. Test-driven development: development is driven by tests.
So, is https://gist.github.com/c149f6b74f811b6b93ae a complete specification of the behavior that you desire? If so, then we can start to discuss:
- whether this is desired behavior
- whether this is feasible to implement given current design and libxml limitations
- whether it conflicts with the current behavior, thereby possibly breaking existing applications
I hope you are getting a sense of how the Nokogiri core developers prefer to go about their work. I think you'll find that most of the Ruby community does things this same way.
Please let me know your thoughts.
Well, I don't entirely agree with the names you gave the tests:
- test_adding_a_fragment_should_use_the_document_default_namespace_when_root_is_nondefault
- test_adding_a_fragment_should_use_the_document_default_namespace_when_root_is_default
- test_adding_a_fragment_should_have_no_namespace_when_root_has_no_namespace
I just wanted to clarify that the "document default namespace" (or "root node namespace", which means the same here) should NEVER be considered when inserting a new node; only the "parent node namespace bindings" should be inspected (which differ from the "parent node declared namespaces"). That's all. I'm sorry, but I don't know how to express this with a unit test (this is what I tried to explain with my latest test, though).
But if we agree on that, your tests look good to me. So, to start the discussion:
- Yes, this is the desired behavior.
- I expect this is feasible to implement, as libxml2 (and Nokogiri) allows fetching a node's namespace bindings (using "namespace::"), so Nokogiri can use this information to set the appropriate namespaces on the newly inserted node. But for this, 'Document#fragment' should be removed and 'Node#fragment' should exist instead, since the new node's namespace bindings depend on its parent node rather than on the root node (see "test_adding_a_fragment_should_use_parent_node_namespaces_rather_than_root_node_namespaces").
- I hope this doesn't break any existing application. It just fixes some corner cases (well, not so "corner") and IMHO nobody should expect a wrong result from Nokogiri.
I hope this is a good point to start. Regards.
I've created a new unit test just to prove that "node namespace bindings" differ from "node declared namespaces":
http://gist.github.com/251636
flavorjones
Tue Dec 08 06:03:13 -0800 2009
"Yes, this is the desired behavior."
"I expect this is feasible to implement."
Sounds like you've got everything covered, and you don't need my advice. Please submit a patch when your implementation is complete and all tests pass.
Thanks for using Nokogiri.
Please, I just meant that I expect it to be feasible, since libxml2 and Nokogiri already implement fetching node namespaces correctly. I didn't mean to say that it's easy to implement. I'm really sorry if my words weren't the most appropriate.
I can try to help with testing and specifications, but I don't have enough experience to code such a feature by myself. Please excuse my limited English.
OK, I'm already working on it. I've found where the issue is: in "xml/fragment_handler.rb", as it assumes it must inspect the namespaces on the document root. This is not valid, since it's not possible to determine the namespaces for a new node without knowing which node will be its parent.
So a new method, Node#fragment, is required:
def fragment tags
  DocumentFragment.new(self, tags)
end
So I'm also modifying DocumentFragment and FragmentHandler so that Nokogiri can get the namespace bindings for the parent node into which the element will be inserted.
There is just one thing I don't know how to achieve:
Let's assume I have a Node called "parent_node" and I want to get its namespace bindings, that is, the same output as if I ran:
doc.xpath(path_to_parent_node + "/namespace::*", ns)
This gives me an Array of XML::Namespace (please check http://gist.github.com/251636).
How could I get the same Array having just the Node "parent_node" rather than the XPath expression?
Please help me with this. I think I could fix the problem.
Thanks.
flavorjones
Tue Dec 08 11:14:14 -0800 2009
I will write a new method (which will require C code) to give you back the namespaces that are in scope for a node. In the meantime, though, you can use node.xpath("./namespace::*") as a substitute.
flavorjones
Tue Dec 08 15:12:19 -0800 2009
See branch 'namespaces' in tenderlove's repo. The method Node#namespaces returns a hash of all namespaces in scope for a node. Previously it returned only namespaces declared on the node.
That change will not be in 1.4.1, but will probably be in 1.4.2.
Thanks, I'm inspecting it right now.
I'm modifying DocumentFragment so that it creates a FragmentHandler with 3 parameters (instead of 2):
FragmentHandler.new(parent_node, self, tags)
Then in FragmentHandler#initialize I store 'parent_node' in the @parent_node attribute.
Then I would use your new code to get the namespace bindings for the current parent node and use them to set the namespaces for each element in the new node.
I'll work on it tomorrow. Thanks a lot.
I think I have a working fix for this issue. Please check the "namespaces" branch in my Nokogiri fork:
http://github.com/ibc/nokogiri/commit/1fd50936f1d4d21172f0f8e1ea7f07c888691766
With this code, the following tests pass:
http://gist.github.com/252440
Note that in these tests I use "Node#fragment" rather than "Document#fragment".
Previously, "Node#fragment" was an alias of "Document#fragment", but in my new code this method uses the current node (the parent node) to get the namespace scopes and uses them for the new fragment.
As for the official tests, with the new code just one fails ("test_fragment_namespace_resolves_against_document_root"), but it passes if we replace:
frag = doc.fragment
with:
frag = doc.root.fragment
The fact is that, IMHO, "doc.fragment" should be deprecated, as fragments belong to nodes rather than documents.
A workaround would be a "Document#fragment" method that just calls "self.root.fragment".
Please let me know your opinion about the commit. Thanks a lot.
There is still a corner case in which inserting a node would fail: when the new fragment node also contains namespace declarations.
For this, the fragment's top-level node must be inserted first, so that its namespace scopes can be used when inserting its children (and so on).
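The approach just described (resolve the fragment's top-level node, then reuse its scope for its children, recursively) can be sketched in plain Ruby. This is an illustration under stated assumptions, not Nokogiri internals: `FragNode` and `resolve_tree` are hypothetical names, and scopes are prefix => URI hashes with `nil` standing for the default namespace.

```ruby
# Hypothetical sketch (plain Ruby, not Nokogiri internals): resolve every
# node of a fragment against the bindings in scope at the insertion point,
# extending the scope with each node's own declarations before descending.
FragNode = Struct.new(:prefix, :name, :declarations, :children)

# Returns [[qualified_name, namespace_uri], ...] in document order.
# A nil prefix means "no prefix"; the nil key in a scope hash is the
# default namespace.
def resolve_tree(node, parent_scope, out = [])
  scope = parent_scope.merge(node.declarations) # a node's own xmlns* apply to itself
  unless node.prefix.nil? || scope.key?(node.prefix)
    raise KeyError, "unbound prefix #{node.prefix}"
  end
  qname = node.prefix ? "#{node.prefix}:#{node.name}" : node.name
  out << [qname, scope[node.prefix]]
  node.children.each { |child| resolve_tree(child, scope, out) }
  out
end

# Fragment: <x:item xmlns:x="urn:x"><x:sub/><plain/></x:item>
fragment = FragNode.new("x", "item", { "x" => "urn:x" }, [
  FragNode.new("x", "sub", {}, []),
  FragNode.new(nil, "plain", {}, [])
])
# Bindings in scope at the parent node receiving the fragment.
parent_scope = { nil => "default.ns", "cp" => "urn:common-policy" }

resolve_tree(fragment, parent_scope).each { |qname, uri| puts "#{qname} -> #{uri}" }
# x:item -> urn:x
# x:sub -> urn:x
# plain -> default.ns
```

The unprefixed `<plain/>` picks up the parent's default namespace, and `x:sub` resolves through a declaration that exists only inside the fragment.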
I'll work on it.
I've opened a new report, #189, suggesting a different approach to inserting a fragment.
Could this report be reopened, please? It's now "closed" but has activity and committed code :)
I've made a lot of improvements, and now Nokogiri allows fragments containing namespace declarations and subnodes using them, as well as subnodes containing namespace declarations and their own subnodes using them. It also handles prefixed attributes with the namespaces declared in the fragment:
http://github.com/ibc/nokogiri/tree/namespaces
I've created a "total" unit test for this issue:
http://gist.github.com/254109
There are 10 tests, some of them very exotic.
7 of them fail under current Nokogiri HEAD.
Just one fails under my fork, and it fails due to a reported bug, #192.
So if you can help me with bug #192, I expect that inserting a complex fragment would work really well :)
Thanks.
Hi, have you developers had a chance to check my commit and test unit?
Is there any update or comment on this report?
If you need me to provide more data (beyond the test cases and patch already provided), please ask.
Thanks a lot.
PS: Could this report be reopened, please?
flavorjones
Tue Dec 29 21:31:16 -0800 2009
I've reopened this ticket.
The namespaces feature I mentioned on Dec 8th is now in master and scheduled for release in 1.4.2.
You have opened 3 (actually 4, but we closed one) simultaneous tickets for what is apparently the same issue, and have commented verbosely on each.
I'm afraid that I still do not understand what you are trying to do, or have done. Your patience while I look at your code will be appreciated.
21 comments. Created 5 months ago by darryl. Labels: flavorjones, REE.
document.rb:104: [BUG] object allocation during garbage collection phase
require 'nokogiri'

GC_HACK = false
GC.disable if GC_HACK # will delay "error : Name is not from the document dictionnary"

gc_count = 0
cycles = 0
loop do
  cycles = cycles + 1
  if GC_HACK
    if gc_count > 10000
      GC.enable
      GC.start
      p "gc start cycles: #{cycles}"
      sleep 10
      gc_count = 0
      GC.disable
    end
    gc_count = gc_count + 1
  end
  p "cycles: #{cycles}" if cycles%1000 == 0
  doc = Nokogiri::XML::Document.parse("<bad>blinky</bad>")
  doc.xpath('/bad').each{ |t|
    new_node = Nokogiri::XML::Node.new('bad', doc)
    new_node.content = 'clyde'
    t.replace(new_node)
  }
end

# spits out:
# "element bad: error : Name is not from the document dictionnary 'bad'"
# between 2 and 20 thousand times then dies with:
#/opt/ruby-enterprise-1.8.6-20090610/lib/ruby/gems/1.8/gems/nokogiri-1.3.2/lib/nokogiri/xml/document.rb:104: [BUG] object allocation during garbage collection phase
#ruby 1.8.6 (2008-08-11) [i686-linux]
#
#Aborted

# setting GC_HACK = true will still give the warning but not
# crash (or at least hasn't crashed yet at 250000 cycles :) )

# system:
# ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-linux]
# Ruby Enterprise Edition 20090610
# libxml 2.7.3
# nokogiri (1.3.2)
# gentoo (linux 2.6.29 SMP PREEMPT)
# does not seem to happen on macos
# might be a REE bug
Comments
flavorjones
Tue Aug 18 23:27:27 -0700 2009
Darryl,
I can't reproduce this on Ubuntu using ruby 1.8.7-p72, 1.8.6-p369, 1.9.1-p243, or 1.8.7-p174. I'm building REE now to test with that.
-mike
flavorjones
Tue Aug 18 23:33:33 -0700 2009
Cannot reproduce with ruby-enterprise-1.8.6-20090610
My next best guess is that this is libxml2-version-dependent, since I ran the above tests with 2.6.32. I'll try 2.7.3.
flavorjones
Wed Aug 19 04:51:48 -0700 2009
OK, unable to reproduce with libxml2 2.7.3.
Can you provide more information about your configuration? Please include the output from "nokogiri -v".
tenderlove
Sun Oct 04 21:16:59 -0700 2009
I can't repro this with libxml2 2.7.5 and REE ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-darwin10.0.0] Ruby Enterprise Edition 20090610.
No updates on this ticket for 2 months, so I'll assume it's fixed in master. Please reopen and update if the problem still exists.
sunshineco
Mon Dec 07 02:32:32 -0800 2009
I am running into this problem repeatedly on Windows Vista while developing a website with nanoc3 (http://nanoc.stoneship.org/). It is very difficult to reproduce outside of the larger project generation session, but I have managed to arrive at a scenario within
irbwhich results in this crash 100% of the time (for me). Nokogiri was installed viagem install nokogiri.C:\>ruby -v ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-mingw32] C:\>nokogiri -v --- warnings: [] nokogiri: 1.4.0 libxml: binding: extension compiled: 2.7.3 loaded: 2.7.3To reproduce, run
irband paste the following text into the window:require 'nokogiri' d = Nokogiri::HTML(' <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body> </body> </html> ') d.to_htmlNote that when
d.to_htmlemits the document, the charset has been magically and suspiciously mutated from"charset=utf-8"to"charset=IBM437". Perhaps this is related to thelibxml2bug mentioned in this thread: http://groups.google.com/group/nokogiri-talk/msg/607fefd4f43d7accAfter pasting the above content into
irb, the actual "object allocation during garbage collection phase" crasher is triggered by pressing the up arrow (readline-recall) three times. (Though triggered by readline interaction in this reproduction recipe, the same crash has been triggered in other ways. During website development, a simpleputsinvocation can trigger it.) Upon the third up arrow press, the following diagnostics are produced:C:\>irb C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416: [BUG] Segmentation fault ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-mingw32] -- control frame ---------- c:0039 p:---- s:0210 b:0210 l:000209 d:000209 CFUNC :chars c:0038 p:---- s:0208 b:0208 l:000207 d:000207 CFUNC :each c:0037 p:---- s:0206 b:0206 l:000205 d:000205 CFUNC :inject c:0036 p:0220 s:0202 b:0202 l:000201 d:000201 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416 c:0035 p:0066 s:0195 b:0194 l:000193 d:000193 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8433 c:0034 p:1268 s:0186 b:0186 l:000185 d:000185 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:2790 c:0033 p:4100 s:0155 b:0155 l:000154 d:000154 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3653 c:0032 p:0101 s:0117 b:0117 l:000116 d:000116 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4580 c:0031 p:0284 s:0114 b:0114 l:000113 d:000113 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4641 c:0030 p:0021 s:0107 b:0107 l:000106 d:000106 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4705 c:0029 p:0104 s:0103 b:0103 l:000102 d:000102 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727 c:0028 p:0097 s:0098 b:0098 l:000097 d:000097 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/readline.rb:40 c:0027 p:0051 s:0090 b:0090 l:000089 d:000089 METHOD C:/ruby/lib/ruby/1.9.1/irb/input-method.rb:115 c:0026 p:0016 s:0086 b:0086 l:002604 d:000085 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:131 c:0025 p:0037 s:0083 b:0083 l:000082 d:000082 METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:263 c:0024 p:0011 s:0078 b:0078 
l:002604 d:000077 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:130 c:0023 p:---- s:0076 b:0076 l:000075 d:000075 FINISH c:0022 p:---- s:0074 b:0074 l:000073 d:000073 CFUNC :call c:0021 p:0022 s:0071 b:0071 l:000070 d:000070 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:189 c:0020 p:0019 s:0067 b:0067 l:000066 d:000066 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:103 c:0019 p:0026 s:0063 b:0063 l:000062 d:000062 METHOD C:/ruby/lib/ruby/1.9.1/irb/slex.rb:205 c:0018 p:0055 s:0055 b:0055 l:000054 d:000054 METHOD C:/ruby/lib/ruby/1.9.1/irb/slex.rb:75 c:0017 p:0041 s:0050 b:0050 l:000049 d:000049 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:287 c:0016 p:0017 s:0046 b:0046 l:000045 d:000045 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:263 c:0015 p:0027 s:0041 b:0041 l:000024 d:000040 BLOCK C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:234 c:0014 p:---- s:0038 b:0038 l:000037 d:000037 FINISH c:0013 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :loop c:0012 p:0009 s:0033 b:0033 l:000024 d:000032 BLOCK C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:230 c:0011 p:---- s:0031 b:0031 l:000030 d:000030 FINISH c:0010 p:---- s:0029 b:0029 l:000028 d:000028 CFUNC :catch c:0009 p:0023 s:0025 b:0025 l:000024 d:000024 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:229 c:0008 p:0042 s:0022 b:0022 l:002604 d:002604 METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:145 c:0007 p:0011 s:0019 b:0019 l:001a7c d:000018 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:69 c:0006 p:---- s:0017 b:0017 l:000016 d:000016 FINISH c:0005 p:---- s:0015 b:0015 l:000014 d:000014 CFUNC :catch c:0004 p:0172 s:0011 b:0011 l:001a7c d:001a7c METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:68 c:0003 p:0039 s:0006 b:0006 l:002604 d:00122c EVAL C:/ruby/bin/irb:12 c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH c:0001 p:0000 s:0002 b:0002 l:002604 d:002604 TOP --------------------------- C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416: [BUG] object allocation during garbage collection phase ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-mingw32] -- control frame 
---------- c:0039 p:---- s:0210 b:0210 l:000209 d:000209 CFUNC :chars c:0038 p:---- s:0208 b:0208 l:000207 d:000207 CFUNC :each c:0037 p:---- s:0206 b:0206 l:000205 d:000205 CFUNC :inject c:0036 p:0220 s:0202 b:0202 l:000201 d:000201 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416 c:0035 p:0066 s:0195 b:0194 l:000193 d:000193 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8433 c:0034 p:1268 s:0186 b:0186 l:000185 d:000185 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:2790 c:0033 p:4100 s:0155 b:0155 l:000154 d:000154 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3653 c:0032 p:0101 s:0117 b:0117 l:000116 d:000116 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4580 c:0031 p:0284 s:0114 b:0114 l:000113 d:000113 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4641 c:0030 p:0021 s:0107 b:0107 l:000106 d:000106 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4705 c:0029 p:0104 s:0103 b:0103 l:000102 d:000102 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727 c:0028 p:0097 s:0098 b:0098 l:000097 d:000097 METHOD C:/ruby/lib/ruby/site_ruby/1.9.1/readline.rb:40 c:0027 p:0051 s:0090 b:0090 l:000089 d:000089 METHOD C:/ruby/lib/ruby/1.9.1/irb/input-method.rb:115 c:0026 p:0016 s:0086 b:0086 l:002604 d:000085 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:131 c:0025 p:0037 s:0083 b:0083 l:000082 d:000082 METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:263 c:0024 p:0011 s:0078 b:0078 l:002604 d:000077 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:130 c:0023 p:---- s:0076 b:0076 l:000075 d:000075 FINISH c:0022 p:---- s:0074 b:0074 l:000073 d:000073 CFUNC :call c:0021 p:0022 s:0071 b:0071 l:000070 d:000070 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:189 c:0020 p:0019 s:0067 b:0067 l:000066 d:000066 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:103 c:0019 p:0026 s:0063 b:0063 l:000062 d:000062 METHOD C:/ruby/lib/ruby/1.9.1/irb/slex.rb:205 c:0018 p:0055 s:0055 b:0055 l:000054 d:000054 METHOD C:/ruby/lib/ruby/1.9.1/irb/slex.rb:75 c:0017 p:0041 s:0050 
b:0050 l:000049 d:000049 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:287 c:0016 p:0017 s:0046 b:0046 l:000045 d:000045 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:263 c:0015 p:0027 s:0041 b:0041 l:000024 d:000040 BLOCK C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:234 c:0014 p:---- s:0038 b:0038 l:000037 d:000037 FINISH c:0013 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :loop c:0012 p:0009 s:0033 b:0033 l:000024 d:000032 BLOCK C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:230 c:0011 p:---- s:0031 b:0031 l:000030 d:000030 FINISH c:0010 p:---- s:0029 b:0029 l:000028 d:000028 CFUNC :catch c:0009 p:0023 s:0025 b:0025 l:000024 d:000024 METHOD C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:229 c:0008 p:0042 s:0022 b:0022 l:002604 d:002604 METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:145 c:0007 p:0011 s:0019 b:0019 l:001a7c d:000018 BLOCK C:/ruby/lib/ruby/1.9.1/irb.rb:69 c:0006 p:---- s:0017 b:0017 l:000016 d:000016 FINISH c:0005 p:---- s:0015 b:0015 l:000014 d:000014 CFUNC :catch c:0004 p:0172 s:0011 b:0011 l:001a7c d:001a7c METHOD C:/ruby/lib/ruby/1.9.1/irb.rb:68 c:0003 p:0039 s:0006 b:0006 l:002604 d:00122c EVAL C:/ruby/bin/irb:12 c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH c:0001 p:0000 s:0002 b:0002 l:002604 d:002604 TOP ---------------------------
-- Ruby level backtrace information-----------------------------------------
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416:in `chars'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416:in `each'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416:in `inject'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8416:in `_rl_adjust_point'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:8433:in `_rl_find_next_mbchar'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:2790:in `update_line'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3653:in `rl_redisplay'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4580:in `_rl_internal_char_cleanup'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4641:in `readline_internal_charloop'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4705:in `readline_internal'
C:/ruby/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727:in `readline'
C:/ruby/lib/ruby/site_ruby/1.9.1/readline.rb:40:in `readline'
C:/ruby/lib/ruby/1.9.1/irb/input-method.rb:115:in `gets'
C:/ruby/lib/ruby/1.9.1/irb.rb:131:in `block (2 levels) in eval_input'
C:/ruby/lib/ruby/1.9.1/irb.rb:263:in `signal_status'
C:/ruby/lib/ruby/1.9.1/irb.rb:130:in `block in eval_input'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `call'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `buf_input'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:103:in `getc'
C:/ruby/lib/ruby/1.9.1/irb/slex.rb:205:in `match_io'
C:/ruby/lib/ruby/1.9.1/irb/slex.rb:75:in `match'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:287:in `token'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:263:in `lex'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:234:in `block (2 levels) in each_top_level_statement'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `loop'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `block in each_top_level_statement'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `catch'
C:/ruby/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `each_top_level_statement'
C:/ruby/lib/ruby/1.9.1/irb.rb:145:in `eval_input'
C:/ruby/lib/ruby/1.9.1/irb.rb:69:in `block in start'
C:/ruby/lib/ruby/1.9.1/irb.rb:68:in `catch'
C:/ruby/lib/ruby/1.9.1/irb.rb:68:in `start'
C:/ruby/bin/irb:12:in `<main>'
[NOTE]
You may encounter a bug of Ruby interpreter. Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Buried in the middle of the dump is the "object allocation during garbage collection phase" diagnostic.
Note that the crash does not occur if d.to_html is removed from the input.
flavorjones
Mon Dec 07 06:42:41 -0800 2009
Thanks for reporting. Give me a day or so to reproduce and investigate.
flavorjones
Mon Dec 07 19:45:17 -0800 2009
Note that I can't repro this on Linux with the same ruby, nokogiri and libxml2 versions. Going to try on windows.
sunshineco
Mon Dec 07 20:01:55 -0800 2009
As is common with this sort of problem, it does seem to be a moving target. Today, after restarting the Windows machine, I can still reproduce it, but the circumstances have changed slightly: it now crashes on the second up-arrow press rather than the third.
I also wonder whether the seemingly random changes to the charset in the output are related: perhaps some memory is being trashed or left uninitialized. I have seen several values show up for charset in the output, including "utf-8", "IBM437", and "US-ASCII" (if I recall correctly), depending on the input to Nokogiri::HTML(), even though all inputs explicitly specify "utf-8" via "meta http-equiv".
sunshineco
Mon Dec 07 20:17:53 -0800 2009
I should note that, in my tests at least, using the Nokogiri::XML() constructor rather than Nokogiri::HTML() (and emitting via to_s()) sidesteps the problem. Whether this is because the corruption does not occur in that case or because it is less severe is unknown. Given that the original reporter of this bug was testing with Nokogiri::XML::Document.parse(), one might suspect that the problem is still present with XML, though it manifests externally less often.
flavorjones
Mon Dec 07 20:45:32 -0800 2009
Can you do me a huge favor, and check if this is occurring with Nokogiri 1.3.3?
I ask because the auxiliary DLLs (zlib, iconv, libxml, libxslt) we released with 1.4.0 were from a different source than in previous versions.
sunshineco
Mon Dec 07 23:40:40 -0800 2009
I have not been able to reproduce this crash with Nokogiri 1.3.3 (nokogiri-1.3.3-x86-mingw32.gem).
C:\>nokogiri -v
---
warnings: []
nokogiri: 1.3.3
libxml:
  binding: extension
  compiled: 2.7.3
  loaded: 2.7.3
sunshineco
Mon Dec 07 23:44:48 -0800 2009
Regarding 1.4.0, I was also able to reproduce the crash reliably on Windows Vista with the command:
nokogiri --type html dummy.html
where dummy.html contains the minimal HTML document indicated earlier.
Once nokogiri starts irb, I enter the following two expressions and then press up-arrow a few times, which results in a crash:
@doc
@doc.to_html
sunshineco
Tue Dec 08 00:01:06 -0800 2009
This problem is very much a moving target. Having finished testing 1.3.3, I removed all versions of Nokogiri and re-installed 1.4.0. After re-installation, I can no longer get it to crash via nokogiri --type html dummy.html. I also can no longer trigger the crash during website generation, which is how I originally discovered the issue, since at that time I could hardly keep it from crashing.
The technique mentioned earlier, pasting the sample code into a DOS window running irb, still crashes 1.4.0 reliably, however, when up-arrow is pressed a couple of times.
flavorjones
Tue Dec 08 20:36:10 -0800 2009
Whoop, just reproduced on Windows. Will update when I have more info.
sunshineco
Tue Dec 08 22:49:04 -0800 2009
Perhaps this ticket should be re-opened? It is still marked as closed.
tenderlove
Wed Dec 09 09:41:19 -0800 2009
Ugh. Apparently even I can't reopen issues. Can we open a new ticket and reference this one?
flavorjones
Wed Dec 09 10:39:16 -0800 2009
Done. Moving the conversation to #188.
sunshineco
Wed Dec 09 12:53:23 -0800 2009
tenderlove
Wed Dec 09 12:58:22 -0800 2009
Wow. Apparently I can't reopen it while viewing the closed ticket. I have to go to the index. :-(
Well, it's reopened now.
sunshineco
Thu Dec 10 01:47:35 -0800 2009
My earlier note, where I said that I could sidestep the crash by using Nokogiri::XML rather than HTML, apparently described only an accidental workaround. I just ran into the same crash in another situation using HTML::DocumentFragment, and replacing it with XML::DocumentFragment made no difference. The crash still occurs.
-
5 comments. Created 19 days ago by ibc.
Error when adding a XML fragment containing "xmlns" declaration or prefixed attribute (label: 1.4.2)
Test unit showing the issues: http://gist.github.com/253538
This test unit has 5 tests. 3 of them are fixed in my modified Nokogiri:
http://github.com/ibc/nokogiri/commit/1fd50936f1d4d21172f0f8e1ea7f07c888691766
The remaining two tests fail both in the official trunk version and in my version:
- test_adding_a_fragment_containing_namespaces_declaration_and_childs_using_them_1
- test_adding_a_fragment_with_prefixed_attribute_shold_generate_attribute_with_namespace
Nokogiri raises an error when the fragment contains a namespace declaration, such as:
<frag id="frag_root" xmlns:ns1="urn:ns1" xmlns:ns2="urn:ns2">
The error is:
TypeError: can't convert Array into String /..../lib/nokogiri/xml/fragment_handler.rb:37:in `[]='
I've inspected it, and it occurs because such an attribute (which is in fact not an attribute but a namespace declaration) is converted to an Array ["xmlns", "ns1"] rather than a String.
test_adding_a_fragment_with_prefixed_attribute_shold_generate_attribute_with_namespace:
When the new fragment contains attributes of the form prefix:name, Nokogiri fails to parse them: it takes the whole "prefix:name" as the attribute name and sets namespace=nil for the attribute.
Comments
OK, the issue with fragments containing prefixed attributes occurs because "fragment_handler.rb" doesn't split the attribute name into its prefix and local name.
I'm coding it right now.
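The cause described in the two reports above (a "prefix:name" attribute name kept whole, and xmlns declarations treated as ordinary attributes) can be illustrated with a small, hypothetical helper. This is a sketch under our own assumptions, not the code in fragment_handler.rb or in ibc's fork: it sorts raw attribute name/value pairs into namespace declarations and (prefix, local name, value) attributes, which is the information a fragment handler needs before it can attach namespaces.

```ruby
# Hypothetical types and helper (not Nokogiri API): split raw attribute
# name/value pairs into namespace declarations and qualified attributes.
QualifiedAttr = Struct.new(:prefix, :local_name, :value)

def split_attributes(pairs)
  namespaces = {} # prefix (nil for the default namespace) => namespace URI
  attributes = []

  pairs.each do |qname, value|
    prefix, local = qname.include?(":") ? qname.split(":", 2) : [nil, qname]

    if qname == "xmlns"
      namespaces[nil] = value # default namespace declaration: xmlns="..."
    elsif prefix == "xmlns"
      namespaces[local] = value # prefixed declaration: xmlns:foo="..."
    else
      attributes << QualifiedAttr.new(prefix, local, value)
    end
  end

  [namespaces, attributes]
end

ns, attrs = split_attributes([
  ["id", "frag_root"],
  ["xmlns:ns1", "urn:ns1"],
  ["xmlns:ns2", "urn:ns2"],
  ["ns1:lang", "en"]
])
```

Resolving each attribute prefix to a URI (from the declarations found here, or from the namespaces in scope at the insertion point) would be the next step; that part depends on the target document and is not shown.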
flavorjones
Fri Dec 11 06:20:32 -0800 2009
commenting so I am subscribed to updates.
This problem is solved in my fork (branch "namespaces"):
http://github.com/ibc/nokogiri/tree/namespaces
Please take a look at my last comment in #185, in which I describe the full status of my fork and provide a complete test unit with very exotic cases, including fragments with subnodes, namespaces, attributes with namespaces, and so on:
http://github.com/tenderlove/nokogiri/issues/closed/#issue/185/comment/89302
To summarize:
- #185 : In my fork it's ~90% fixed.
- #192 : Still open and it's the 10% remaining in #185 :)
- #190 : Fixed in my fork.
Regards.
flavorjones
Tue Dec 15 05:16:16 -0800 2009
A good place for these comments would be in the tickets to which they relate.
Hi. #192 is an independent bug (IMHO), and #185 is the other report, where I post updates on every change I make in my Nokogiri fork.
In fact, my last comment in #185 already explains the full status in detail (it also includes a test unit). The problem is that report #185 remains marked as "Closed" (although IMHO it is still active), and I don't know whether Nokogiri developers receive notifications for new comments on it.
Regards.
-
1 comment. Created 19 days ago by jmcnevin.
Included files referenced by relative paths in xml schemas can't be found. (label: 1.4.2)
I'm trying to validate a document against the ONIX 2.1 reference schema, which includes the following lines:
<xs:include schemaLocation="ONIX_BookProduct_CodeLists.xsd"/>
<xs:include schemaLocation="ONIX_XHTML_Subset.xsd"/>
When running the validation, I receive this error:
Nokogiri::XML::SyntaxError: Element '{http://www.w3.org/2001/XMLSchema}include': Failed to load the document 'ONIX_BookProduct_CodeLists.xsd' for inclusion.
All of the schema files reside in the same directory. I see a closed ticket that says this problem has been fixed, but I'm still not having any luck on my end.
Comments
tenderlove
Tue Dec 29 10:58:31 -0800 2009
Can you provide some sample Ruby and XML to reproduce this problem? It should be fixed, but I need to see how to reproduce the error you're getting.
-
0 comments. Created 18 days ago by ibc.
XML fragment is wrongly parsed when contains nodes with namespace prefix (label: 1.4.2)
When inserting a fragment containing nodes with a namespace prefix, Nokogiri parses it incorrectly. A real use case:
require "nokogiri"
doc = Nokogiri::XML <<-EOXML
  <root xmlns:ns1="urn:ns1">
  </root>
EOXML
new_node = <<-EOXML
  <ns1:child_a />
  <ns2:child_b />
EOXML
frag = doc.root.fragment(new_node)
doc.root.add_child(frag)
puts doc.to_xml
This code generates this output:
<?xml version="1.0"?>
<root xmlns:ns1="urn:ns1">
<ns1:child_a>
<ns2:child_b/></ns1:child_a></root>
As you can see, the "child_b" node is a child of "child_a", while it should be a sibling.
I've done more tests with the same result (removing spaces, line breaks...).
My conclusion is that it fails only when a node has a namespace prefix: all of its sibling nodes are then parsed as its children.
Comments
-
3 comments. Created 18 days ago by sunshineco.
XML::EntityReference migration (labels: 1.4.2, tenderlove)
Given the following input:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<p>Foo <a href="/" title="Fran&ccedil;ois">François</a> bar.</p>
</body>
</html>
The output of Nokogiri::XML(example_input).to_s is:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<p>Foo &ccedil;<a href="/" title="Franois">François</a> bar.</p>
</body>
</html>
Notice that the &ccedil; entity reference has migrated from the title="..." attribute to a position just before the <a> element. The entity reference has become a sibling of Foo and <a>.
Is this a misunderstanding on my part and an abuse of XML::Document, or a Nokogiri bug?
Comments
tenderlove
Tue Dec 29 10:59:36 -0800 2009
Could be a bug in libxml2. I need to investigate.
sunshineco
Tue Dec 29 11:07:50 -0800 2009
I should note that this problem manifests when the input is processed via Nokogiri::XML(example_input).to_s but not with Nokogiri::HTML(example_input).to_s.
tenderlove
Tue Dec 29 11:27:04 -0800 2009
Ah, that is interesting. I suspect a problem in libxml2. Thanks for the update!
-
I've been running into some weirdness in switching from Nokogiri 1.3.3 to 1.4.1.
I've written a small script to demo the namespace weirdness.
If you run it using 1.4.1, you should hopefully see what's wrong: the namespace declarations (attributes beginning with xmlns) on the oai_dc:dc element aren't being added properly.
Now here is the weirdness: it's exactly the same code, except that in one version the namespaces are strings and in the other they are symbols!
Huh?
Furthermore, things start working fine if I
- remove any element from the builder, like "xml.identifier"
- move the strings xml generation before the symbols xml generation
- keep namespaces as symbols, but change the attribute :verb on xml.request to a string
These all seem very random. Is this desired behaviour, or some very weird quirk I was unlucky enough to stumble upon?
Comments
-
3 comments. Created 7 months ago by flavorjones.
FFI ruby object caching should be rewritten to not use id2ref (label: ffi)
id2ref is slow and may be turned off by default in JRuby 1.4.
Discussed with wmeissner; the probable path is to build an API into FFI that is a hash table mapping address => weakref(ruby_object).
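The proposed structure, a table from native address to a weak reference to the Ruby wrapper object, can be sketched in plain Ruby with the standard weakref library. This only illustrates the idea; WeakObjectCache, store, and fetch are hypothetical names, and the actual proposal was an API inside FFI itself.

```ruby
require "weakref"

# Hypothetical cache: native address (Integer) => weak reference to the Ruby
# wrapper object, so the cache itself never keeps a wrapper alive.
class WeakObjectCache
  def initialize
    @table = {}
  end

  # Remember +object+ as the wrapper for +address+ and return it.
  def store(address, object)
    @table[address] = WeakRef.new(object)
    object
  end

  # Return the live wrapper for +address+, or nil if there is none or it has
  # been garbage collected (in which case the stale entry is dropped).
  def fetch(address)
    ref = @table[address]
    return nil if ref.nil?
    return ref.__getobj__ if ref.weakref_alive?

    @table.delete(address)
    nil
  rescue WeakRef::RefError
    # The wrapper was collected between the liveness check and the dereference.
    @table.delete(address)
    nil
  end
end

cache = WeakObjectCache.new
node = Object.new
cache.store(0x7f00_1000, node)
```

Lookups never go through ObjectSpace._id2ref; a collected wrapper simply yields nil. Entries for addresses that are never looked up again are not pruned in this sketch.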
Comments
nicksieger
Mon Nov 16 13:32:31 -0800 2009
FYI, id2ref (and objectspace) is turned off by default in JRuby. We made a conscious decision to do this because it's expensive and not feasible to manage all live objects with JRuby.
nicksieger
Mon Nov 16 13:59:28 -0800 2009
Also: tools to help implement the caching in Java/JRuby:
http://java.sun.com/javase/6/docs/api/java/lang/ref/WeakReference.html
http://java.sun.com/javase/6/docs/api/java/util/WeakHashMap.html
Note that the last item may not be exactly what is needed: it's a map with weak keys, not a map that weakly references its values.
flavorjones
Mon Nov 16 14:07:16 -0800 2009
Nick, thanks for the pointers (no pun intended).
-
1 comment. Created 7 months ago by flavorjones.
FFI: support varargs in error/exception callbacks (label: ffi)
We should open JIRA tickets for vararg support in FFI callbacks.
Then we should format the libxml error messages properly in the error/exception callbacks.
Comments
flavorjones
Sun Jun 21 19:38:17 -0700 2009
@tmm1 poked me about this. I'll open a ticket for it tonight.
-
0 comments. Created 5 months ago by flavorjones.
FFI needs unlinkedNodes to be optimized (label: ffi)
Extension commit f34f3bd needs to be ported.
Comments
-
ERROR In Opening and ending tag mismatch: link line 107 and head
1 comment. Created about 17 hours ago by ganeshkumar.
I am getting this error when I try to use it from the console. These are the steps I followed:
ruby script/console
user = User.post(:register, params)
I get the following error:
Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: link line 107 and head
Comments
tenderlove
Tue Dec 29 08:54:33 -0800 2009
Do you think you can narrow down this error? The code example you've given has nothing to do with Nokogiri.
The markdown doesn't preserve my line breaks; I hope the problem is clear.
I'd like to undefine "content=" for the EntityDecl class. GDOME2 and libxml2 treat entity declarations as read-only elements, and so it's non-trivial for Nokogiri to support modification.
However, it seems like the non-implementation of "content=" isn't the root cause for your request. Correct me if I'm wrong, but what you're trying to do is construct new entity declarations and add them to a new document. Is this correct?
Nokogiri currently has no support for the creation of new entity declarations. If this is what you're asking for, please respond in this ticket.
I need to rewrite some namespaces in OWL ontologies serialized as RDF/XML. They appear in entity declarations, namespace definitions, attributes, and comments. If modifying entity declarations and namespace definitions in place isn't kosher, I'd be happy with the ability to create new ones as long as I can replace (not merely add to) existing ones.
OK, I've changed the name of this issue to "allow creation and deletion of Entity Declaration elements". We'll try to get it into 1.4.1.