tenderlove / nokogiri
Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser with XPath and CSS selector support.
-
Node#children returns an undecorated NodeSet
1 comment. Created 14 days ago by flavorjones
In general, our decoration of NodeSets could be cleaned up.
Comments
-
Make FFI pass tests again (1.4.1)
1 comment. Created 14 days ago by flavorjones. Labels: ffi, flavorjones
Sigh. C extension is out of sync with FFI again.
Comments
flavorjones
Tue Dec 15 05:14:18 -0800 2009
| link
never mind. this is the in context parsing that just needs to be ported.
-
DocumentFragments should support decorators like a Document
3 comments. Created 14 days ago by flavorjones. Labels: flavorjones
I'd like this for Loofah so I don't have as much special code for DocumentFragments, and so the nodes in a fragment have scrub! methods just like document nodes.
Comments
flavorjones
Tue Dec 15 05:26:13 -0800 2009
| link
Added test coverage, and decorators are working fine on fragments, their nodes and nodesets.
flavorjones
Tue Dec 15 05:39:32 -0800 2009
| link
Ah! Looks like children() is not properly decorated. Looks like we could do some cleaning up of node set decorating in general.
flavorjones
Tue Dec 15 06:38:04 -0800 2009
| link
I'll open a new ticket. Issues collaborator fail. Again. Sigh.
-
Is there a way to estimate memory usage of a Nokogiri::XML::Document at runtime?
Comments
tenderlove
Sat Dec 12 17:54:25 -0800 2009
| link
Not at this point.
In the future, if you have questions (such as this) rather than bugs, please send an email to the mailing list. Thanks.
Below is the mailing list address:
-
called out by Nick Sieger: http://www.engineyard.com/blog/2009/xml-parsing-in-ruby/
Comments
flavorjones
Fri Dec 11 06:16:58 -0800 2009
| link
Ya, never mind, he should use the Node constants.
-
After working on bug #185 I've realized that inserting a fragment can never be really robust and exact.
In my branch http://github.com/ibc/nokogiri/commit/1fd50936f1d4d21172f0f8e1ea7f07c888691766 I've improved it a bit, so that the parent node's namespace scopes are now inspected instead of the root node's namespace declarations. However, this doesn't handle the case in which the new fragment itself contains namespace declarations.
In fact, when the new fragment contains attributes named prefix:name, Nokogiri fails to parse them, since it takes the whole "prefix:name" as the attribute name and sets namespace=nil for the attribute.
Also, when the new fragment contains a namespace declaration ("xmlns:ns='urn:ns'"), Nokogiri fails and raises an exception, because that attribute (which in fact is not an attribute) is converted to an Array rather than a String:
TypeError: can't convert Array into String
/PATH_TO_NOKOGIRI/lib/nokogiri/xml/fragment_handler.rb:55:in `[]='
(Note: line 55 is in my branch, after rewriting "fragment_handler.rb".)
So there are too many issues that are very difficult to solve. I therefore suggest a new, different approach:
- We have an XML document parsed into 'doc'.
- We get the parent node ('parent') into which we'll insert the fragment WITHOUT converting it to a DocumentFragment (just as a raw String containing XML).
- Then Nokogiri inserts the fragment (XML string) verbatim, with no inspection at all, and generates the XML string of the whole document (doc.to_xml). That is, we get a string which is the original 'doc' with the new fragment inserted raw into the parent node.
- Then Nokogiri parses the output and we get the document with the new fragment correctly parsed.
Does it sound feasible? IMHO this is the only safe way to insert a node.
Comments
tenderlove
Wed Dec 09 14:27:58 -0800 2009
| link
This sounds like a great feature to add to your client application. I think this can be accomplished without being in mainline nokogiri. Please implement it, and open a new ticket when you feel it's ready to be merged into the main line.
I don't know how a feature that improves Nokogiri could be useful just for me, especially when it tries to fix real issues in fragment management.
For sure I will try to code it. I would just like to ask for some help or tips, especially since I don't know how feasible it is to insert a raw XML string verbatim into the string generated by Document#to_xml.
I would really appreciate any help with it.
PS: I forgot to add a link to a test unit which shows the errors I described in this bug report:
http://gist.github.com/252738
Should I open a different report for it?
Thanks.
-
see #109 for more details.
Comments
flavorjones
Mon Dec 14 20:03:07 -0800 2009
| link
never mind. #109 is reopened.
-
Imagine this XML:
<root xmlns="urn:default" xmlns:att1="urn:att1" xmlns:att2="urn:att2" att:name="'value1" att2:name="value2" />
If I get 'attribute_nodes' for root I get:
irb> doc.xpath("/*").first.attribute_nodes
[#<Nokogiri::XML::Attr:0x9ae188 name="name" value="'att' prefixed attribute">, #<Nokogiri::XML::Attr:0x9ae16c name="name" namespace=#<Nokogiri::XML::Namespace:0x9ad4f4 prefix="att2" href="urn:att2"> value="'att2' prefixed attribute">]
but if I get the attributes with the 'attributes' method it fails:
irb> doc.xpath("/*").first.attributes
{"name"=>#<Nokogiri::XML::Attr:0x9ae16c name="name" namespace=#<Nokogiri::XML::Namespace:0x9ad4f4 prefix="att2" href="urn:att2"> value="'att2' prefixed attribute">}
It seems that 'attributes' returns a Hash having each attribute name as key. This fails when two attributes have the same name (even if the prefix is different), since the last one replaces the first one in the hash (same key).
Comments
Added test unit: http://gist.github.com/252531
tenderlove
Wed Dec 09 08:38:40 -0800 2009
| link
This is a known issue. attributes is a convenience method that works well for most people. If you're dealing with namespaced attributes, you must use the attribute_nodes method.
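The collision described in this report can be reproduced without Nokogiri at all. The following is only a minimal pure-Ruby sketch: the `Attr` struct, `ATTRIBUTE_NODES`, and both helper methods are hypothetical stand-ins that mimic the shapes of `attribute_nodes` (a list) and `attributes` (a Hash keyed by local name), not Nokogiri's actual implementation.

```ruby
# Hypothetical stand-in for an attribute node; not Nokogiri's real class.
Attr = Struct.new(:prefix, :name, :value)

# Two attributes sharing the local name "name" but differing in prefix.
ATTRIBUTE_NODES = [
  Attr.new(nil,    "name", "first"),
  Attr.new("att2", "name", "second")
].freeze

# Keying by local name alone: the later attribute overwrites the earlier one.
def by_local_name(nodes)
  nodes.each_with_object({}) { |attr, hash| hash[attr.name] = attr }
end

# Keying by [prefix, name] keeps both entries.
def by_qualified_name(nodes)
  nodes.each_with_object({}) { |attr, hash| hash[[attr.prefix, attr.name]] = attr }
end

puts by_local_name(ATTRIBUTE_NODES).size      # 1
puts by_qualified_name(ATTRIBUTE_NODES).size  # 2
```

Any Hash keyed by local name alone has this limitation, which is why iterating over the list form is the safer choice when prefixes matter.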
-
An option to force namespace binding for all nodes
2 comments. Created 21 days ago by ibc
I work on a project which only handles XML documents with namespaces for all the nodes. If the document (or a node in it) doesn't have a default namespace and contains a node with no prefix, then it's an invalid XML document for me.
However, Nokogiri (libxml2, in fact) allows XML nodes without a namespace binding, which is certainly useful for other, less strict environments.
So I would like to ask for an option in XML::ParseOptions to require namespace bindings for all the nodes in the document.
Comments
tenderlove
Mon Dec 07 08:49:32 -0800 2009
| link
No. This is what DTD validation is for. Nodes with no namespace are perfectly valid. You need to validate your document against a DTD before processing if you're having that problem.
-
Don't modify namespace prefixes for new fragments added into the document
30 comments. Created 24 days ago by ibc. Labels: unclear
Recently a commit fixed a bug when inserting a fragment with namespace prefixes into a node:
http://github.com/tenderlove/nokogiri/commit/597195ff8fe471e5350581c2d5cce704fcf87439
However it's coded under wrong assumptions (IMHO). Imagine this case:
<?xml version='1.0' encoding='UTF-8'?>
<cp:ruleset xmlns="default.ns" xmlns:cp="urn:common-policy">
  <cp:rule id="empty">
  </cp:rule>
</cp:ruleset>
I want to insert a node:
<many id="all"/>
into the
<cp:rule id="empty">
node.
Note that the new node has no NS prefix, but none is required since the XML document has a default namespace (xmlns="default.ns").
However, when inserting this new node Nokogiri converts it to:
<cp:many id="all"/>
This is obviously wrong. I want the new node to belong to the NS "default.ns", so I have to avoid using an NS prefix.
Note that this has nothing to do with XPath. That is, to get the newly inserted node (if Nokogiri inserted it without adding a wrong "cp" prefix) I must use namespaces in the XPath query:
new_node = doc.xpath("ccpp:ruleset/ccpp:rule/yuhu:many", {"ccpp"=> "urn:common-policy", "yuhu"=>"default.ns"})
So the namespaces used in the XPath string are only useful for Nokogiri/libxml2 to search the XML document, nothing more. When inserting a new node, its namespace prefixes don't need to match those used in the XPath.
In the thread about the above commit, Mike Dalessio stated 3 assumptions:
http://groups.google.com/group/nokogiri-talk/browse_thread/thread/f7f6509ad14ce340
I would like to correct them:
1) "document fragments should not have a namespace, by default"
This is wrong. They could have or not, and that just depends on the existing XML document and the NS it uses. In case the document has a default namespace for the new node then this node must contain no NS prefix (as shown above in the example).
2) "if a namespace is specified in the node fragment (like your <cp:one> fragment above), Nokogiri should check if the prefix matches any of the namespace definitions on the document root node. If it finds a match, the node should have that namespace. So in your above example, the node name would be "one" under the namespace with the prefix "cp"."
This assumption fails. Namespace definitions on the document root node don't matter. Instead the namespace definitions on the parent node are the only important ones.
Imagine this XML:
<?xml version="1.0"?>
<foo xmlns="urn:test:default-namespace">
  <ns1:bar xmlns:ns1="urn:test:namespace1-uri" xmlns="urn:test:namespace1-uri">
    <baz/>
    <ns2:baz xmlns:ns2="urn:test:namespace2-uri"/>
  </ns1:bar>
  <ns3:hi xmlns:ns3="urn:test:namespace3-uri">
    <there/>
  </ns3:hi>
</foo>
The only NS definition on the root node is xmlns="urn:test:default-namespace". However, NS definitions on node are the following:
"xmlns"=>"urn:test:namespace1-uri", "xmlns:ns1"=>"urn:test:namespace1-uri"
This already works with Nokogiri when using the following XPath query:
doc.xpath("df:foo/df2:bar/df2:baz/namespace::*", {"df"=>"urn:test:default-namespace", "df2"=>"urn:test:namespace1-uri"})
3) "if a namespace is specified in the node fragment but does NOT match any of the namespace definitions on the document root, then the prefix will be silently ignored (which is libxml2's default behavior when parsing documents)."
If this occurs it means that the insert operation is wrong and the only way to know it is by validating the XML against its schema. Leaving it with no NS prefix doesn't mean that the resulting XML is correct.
So IMHO what Nokogiri should do is very easy: just nothing.
When inserting a node Nokogiri shouldn't check the node's namespace prefix, nor try to guess the appropriate one, nor replace it. It's the client's responsibility to use the appropriate namespace prefixes.
For this, the client can get the parent node's namespace definitions used in the XML document by using XPath with "namespace::*" (as explained above).
Finally, this wrong behavior of Nokogiri is breaking my application, since I need to insert a node with no NS prefix into a parent node which has an NS prefix (as in my first example), but Nokogiri corrupts the resulting XML by adding the parent's NS prefix to the new node.
Comments
Let me correct myself when I said:
"So IMHO what Nokogiri should do is very easy: just nothing. When inserting a node Nokogiri shouldn't check the node namespace prefix, neither try to guess the appropriate one, neither replace it."
I understand that this is not possible as Nokogiri requires all the nodes having correct NS prefixes (this is, prefixes matching the node namespace bindings). So it's not possible for Nokogiri to insert a node with unknown NS prefix into the document.
So IMHO what Nokogiri should do is:
If the new node has NS prefixes then check that the parent node (and not the doc root node) namespaces definitions contains these prefixes:
a) If so, insert the new node verbatim and that's all.
b) If the new node's prefixes (or absence of them) don't match the parent node's NS definitions (including a possible default NS for this parent node) then Nokogiri could do one of the following:
b1) Remove the new node's prefixes (as it does now). However, in an XML document in which an NS is required for all nodes, this would only work if the parent node (and not the root node) has a default namespace (and even then it doesn't ensure that the XML would be valid when checked against its XML schema).
b2) Raise an exception. This should only occur if the document doesn't allow nodes without an NS (there could be a default NS in the parent node, so no prefix is required).
I really like (and need) option "b2", as in my project I work with strict XML documents requiring a namespace for all the nodes. However, for this to work a new parsing/document option like "REQUIRE_NS_FOR_ALL_NODES" would be required.
So when "REQUIRE_NS_FOR_ALL_NODES" is enabled:
If the new node has NS prefixes then check that the parent node's (and not the doc root node's) namespace definitions contain these prefixes:
a) If so, insert the new node verbatim and that's all.
b) If not, raise an exception (like "WrongNamespacePrefix").
If the new node has no NS prefixes then check that the parent node's (and not the doc root node's) namespace definitions contain a default namespace.
a) If so, insert the new node verbatim and that's all.
b) If not, raise an exception (like "WrongNamespacePrefix").
When "REQUIRE_NS_FOR_ALL_NODES" is disabled (the current Nokogiri behavior) then:
If the new node has NS prefixes then check that the parent node's (and not the doc root node's) namespace definitions contain these prefixes:
a) If so, insert the new node verbatim and that's all.
b) If not, remove the prefixes. But never attempt to replace the prefix with another.
If the new node has no NS prefixes then check that the parent node's (and not the doc root node's) namespace definitions contain a default namespace.
a) If so, insert the new node verbatim and that's all (Nokogiri FAILS at this point, as I described in the report!!!).
Again, I strongly suggest never replacing or adding an NS prefix in the new node. Inserting a valid node is the responsibility of the user. Also, Nokogiri FAILS when adding an NS prefix, as reported above.
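The rules proposed in the comment above can be written down as a small decision function. This is only a sketch of the proposal, not Nokogiri code: `insertion_action`, the `strict:` keyword (standing in for the suggested "REQUIRE_NS_FOR_ALL_NODES" option) and the returned symbols are hypothetical names. The comment does not say what should happen in non-strict mode when the node has no prefix and the parent has no default namespace; the sketch assumes the node is inserted as-is.

```ruby
# Hypothetical error class named after the one suggested in the comment above.
class WrongNamespacePrefix < StandardError; end

# prefix:   namespace prefix of the new node, or nil when it has none
# in_scope: Hash of prefix => URI bindings visible at the parent node
#           (the default namespace is stored under the nil key)
# strict:   models the proposed REQUIRE_NS_FOR_ALL_NODES option
def insertion_action(prefix, in_scope, strict: false)
  # The prefix (or the default namespace, for nil) is bound at the parent.
  return :insert_verbatim if in_scope.key?(prefix)

  # No binding found: strict mode refuses the insert.
  raise WrongNamespacePrefix, "no binding for #{prefix.inspect}" if strict

  # Non-strict mode: drop an unknown prefix, never substitute another one.
  # Assumption: an unprefixed node with no default namespace goes in as-is.
  prefix.nil? ? :insert_verbatim : :strip_prefix
end

bindings = { nil => "default.ns", "cp" => "urn:common-policy" }
p insertion_action(nil, bindings)    # :insert_verbatim
p insertion_action("sr", bindings)   # :strip_prefix
```

Note that no branch ever returns a replacement prefix, which is the central point of the request.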
flavorjones
Mon Dec 07 07:20:58 -0800 2009
| link
Hi,
The Nokogiri core developers practice "test driven development". This
means that, when we want to change Nokogiri's behavior, either by
adding a feature or fixing a bug, we first write a test (or tests)
that completely specifies the new behavior. You can see all of these
tests under the /test directory in Nokogiri's source repository.
We like very much to get bug reports and feature requests that include
one or both of the following:
- An indication of which existing test(s) should be changed
- New (failing) test(s), specifying the desired behavior
Or, less desirably, we would like to see clear, runnable sample code
that is easily convertible into a failing test.
The reason we like to see these tests in a bug report is because the
core developers (who maintain Nokogiri in their spare time, for free),
do not have to spend time trying to translate imprecise English into
Ruby code.
Secondarily, a test case indicates to us that the reporter has thought
through the issue completely, and cares enough about the issue to have
taken the time to present the argument clearly and concisely in the
lingua franca of the Ruby community (i.e., Ruby).
Why am I bringing this up now? This issue, so far, contains over 1100
words and no test cases. Although there are a few snippets of XML and
Ruby code, I do not consider any of them to be a complete runnable
example or test.
I do not doubt there is clear logic in your argument; I just don't
have the time to read it all and translate it into
requirements. Explaining your issue in Ruby tests instead of English
will be clearer, more dense and will better communicate your issue to
the core developers.
Thank you for reporting this issue. I am looking forward to seeing
your failing tests, and better understanding what you are asking the
core developers to build for you.
Cheers.
Ok, I've sent a mail to Nokogiri maillist containing a test_unit describing the issue I mean.
However, the issue is not easy to describe with unit tests alone, so I strongly suggest, please, reading my report above after checking the test unit.
To simplify the whole report a lot: Nokogiri should NEVER add/guess/replace the NS prefixes of a newly inserted node. It's the responsibility of the user/client to use the appropriate NS prefixes.
test unit created: http://gist.github.com/250927
flavorjones
Mon Dec 07 09:04:48 -0800 2009
| link
Thank you for the two test cases you submitted. I think I understand the first one, which states that a new fragment, without a namespace declared in it, should be given the default document namespace (if there is one).
I do not understand the second test case. It is nearly identical to the first, and specifies behavior in which a non-default namespace is applied to the fragment. Can you please explain why this behavior is expected?
I think I understand the first one, which states that a new fragment, without a namespace declared in it, should be given the default document namespace (if there is one).
And in case the document has no default namespace, the new node has no namespace binding, so adding the parent node's namespace is wrong (or unexpected, since Nokogiri has no way to determine the appropriate prefix, nor whether the node should have a prefix at all).
I do not understand the second test case. It is nearly identical to the first, and specifies behavior in which a non-default namespace is applied to the fragment. Can you please explain why this behavior is expected?
The second case shows an error by the user, who is trying to insert a node belonging to the "urn:strint-rules" namespace using the wrong prefix (no prefix in this case).
So from Nokogiri's point of view the new node belongs to the parent node's default namespace (I will comment on this later, as it's buggy IMHO), but Nokogiri adds the "cp" prefix.
That is, the XML will end up wrong since the user is inserting a wrong node (wrong prefix). Anyhow, the XML should end up as follows:
<?xml version="1.0" encoding="UTF-8"?>
<cp:ruleset xmlns="default.ns" xmlns:cp="urn:common-policy" xmlns:sr="urn:strint-rules">
  <cp:rule id="1"/>
  <cp:rule id="2"/>
  <sr:strict_rule name="sr1"/>
  <condition id="1" name="I belong to default namespace"/>
  <cp:rule id="3"/>
  <sr:strict_rule name="sr2"/>
  <strict_rule id="3" name="I'm wrong as I should have 'sr' prefix"/>
</cp:ruleset>
But Nokogiri adds the parent node's prefix "cp" which, as in case 1, is wrong.
flavorjones
Mon Dec 07 09:22:34 -0800 2009
| link
You have not clarified the second test case at all. It appears to me that you are conflating DTD validation with the construction of a document.
Am I correct in assuming that you think Nokogiri should "know" that <strict_rule> belongs in the "sr" namespace because of some sort of DTD declaration?
No, I just say that Nokogiri should insert the new "strict_rule" node without prefix (as always) and its namespace binding should belong to "default.ns".
Yes, the resulting document would be incorrect according to the DTD in case the "default.ns" namespace has no element called "strict_rule", but Nokogiri shouldn't care about that at all.
I would like to add something important: After inspecting how Nokogiri inserts a new fragment (by doing 'doc.fragment' and so) I must say that it's 100% wrong.
Nokogiri tries to detect the new node namespace bindings by running "doc.fragment(new_node)" and it just inspects root node namespace bindings. This is wrong. It's not possible to know the appropriate bindings for a new node until we already know the exact parent node in which it will be inserted.
So IMHO the Document#fragment method should disappear entirely and Node#fragment should exist instead. Then the way to insert a node would be:
1) parent_node = xml.xpath("/ns1:root/ns1:child/ns2:list", {"ns1"=>....
2) frag = parent_node.fragment(new_node)
3) parent_node.add_child(frag)
In point 1 we get the parent node into which we want to insert the new node.
In point 2 Nokogiri determines the appropriate namespace bindings for the new node according to the parent node. To do it, the best way I know is by fetching the parent node's namespaces with XPath:
parent_node_ns = xml.xpath("/ns1:root/ns1:child/ns2:list/namespace::*", {"ns1"=>....
This returns an array of Namespace objects, so Nokogiri can then inspect the new_node prefixes and match them against the list of Namespace objects.
Please note that taking the root node namespaces is completely wrong. For example, take a look at this XML:
<?xml version='1.0' encoding='UTF-8'?>
<root xmlns="urn:ns1">
  <ns2:child xmlns:ns2="urn:ns2" xmlns="urn:ns3">
    <elem name="I belong to 'urn:ns3'"/>
  </ns2:child>
</root>
- "root" node has no prefix but default ns "urn:ns1".
- "ns2:child" declares a new default ns for its children ("urn:ns3") which replaces the root node default namespace.
- So "elem" belongs to "urn:ns3" rather than "urn:ns1".
I've updated my test_unit and now it shows more cases, including two more to explain and prove what I said in my last comment:
flavorjones
Mon Dec 07 13:50:23 -0800 2009
| link
Please rewrite these tests in the style you see in test/xml/test_document_fragment.rb.
I am more confused now than when we started. I am pretty sure that is not what you intended, so please try to explain your issue again, in a simpler / smaller / more concise example.
Ok, I must recognize that this issue is very complex as it gets deep into XML's most exotic cases when handling namespaces.
Let's start from the beginning. The following test_unit is simpler (I hope). Let me know your opinion. Thanks.
flavorjones
Mon Dec 07 20:42:25 -0800 2009
| link
I have rewritten your tests, removing unnecessary markup and code (there was quite a bit that was unnecessary). I removed the last test, since it was redundant, and added a new test case for clarity (and to demonstrate how fragment namespaces currently work).
Please review https://gist.github.com/c149f6b74f811b6b93ae and let me know if these test cases accurately reflect your desired implementation.
Thanks, it's ok. However I've added one more test to your file:
The new test (the last one) tries to prove that we should always speak about "parent node namespace bindings" rather than "root node namespaces" or "parent node namespaces".
Node namespaces bindings are got as follows:
- Take the namespaces of root node (level1).
- Take the namespaces of level2 and add them to the previous list (replacing those matching the prefix).
- Take the namespaces of level3 and add them to the previous list (replacing those matching the prefix).
- So we get the namespace bindings for node level3, which are different from the namespaces declared on node level3.
XPath and libxml2 (and Nokogiri) allow getting a node's namespace bindings by running:
doc.xpath("ns1:level1/ns2:level2/ns3:level3/namespace::*")
This gets an array of namespace definitions corresponding to level3's namespace bindings, which is different from just the namespace definitions existing on level3 (as explained before).
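The difference between "declared" and "in-scope" namespaces can be sketched in a few lines of plain Ruby, following the merge steps listed above. This is an illustration only: `in_scope_bindings` and `LEVELS` are hypothetical names, the data is taken from the root / ns2:child / elem example earlier in this thread, and the sketch ignores un-declaration via xmlns="".

```ruby
# Each element of `declarations` is the Hash of prefix => URI declared on one
# node, ordered from the root down to the node of interest.
# The nil key stands for the default namespace (xmlns="...").
def in_scope_bindings(declarations)
  # Later (deeper) declarations replace earlier ones with the same prefix.
  declarations.reduce({}) { |scope, declared| scope.merge(declared) }
end

LEVELS = [
  { nil => "urn:ns1" },                      # declared on <root>
  { "ns2" => "urn:ns2", nil => "urn:ns3" },  # declared on <ns2:child>
  {}                                         # <elem> declares nothing itself
].freeze

bindings = in_scope_bindings(LEVELS)
puts bindings[nil]    # default namespace that applies to <elem>
puts bindings["ns2"]
```

Here `<elem>` declares nothing, yet its in-scope default namespace comes from `<ns2:child>` rather than from the root, which is the case the last test is meant to cover.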
flavorjones
Tue Dec 08 05:04:07 -0800 2009
| link
Hello. This last test is redundant, but I will include it if it
makes you happy.
You are explaining things in English again. I promise you, I am
not reading any of your explanations of functionality. I am only
reading what you code. Show me, don't tell me.
I am trying (and have been trying) to explain to you that the
spec code you are writing is your only means of communicating
changes in Nokogiri behavior to the core team. Test-driven
development: Development is driven by tests.
So, is https://gist.github.com/c149f6b74f811b6b93ae a complete
specification of the behavior that you desire? If so, then we can
start to discuss:
- whether this is desired behavior
- whether this is feasible to implement given current design and libxml limitations
- whether it conflicts with the current behavior, thereby possibly breaking existing applications
I hope, you are getting a sense of how the Nokogiri core
developers prefer to go about their work. I think you'll find
that most of the Ruby community does things this same way.
Please let me know your thoughts.
Well, I don't agree too much with the names you set to the tests:
- test_adding_a_fragment_should_use_the_document_default_namespace_when_root_is_nondefault
- test_adding_a_fragment_should_use_the_document_default_namespace_when_root_is_default
- test_adding_a_fragment_should_have_no_namespace_when_root_has_no_namespace
I just wanted to clarify that the "document default namespace" (or "root node namespace", which has the same meaning here) should NEVER be considered when inserting a new node; instead only the "parent node namespace bindings" should be inspected (which is different from the "parent node declared namespaces"). That's all. I'm sorry, but I don't know how to clarify this with a test unit (this is what I tried to explain with my latest test, though).
But if we agree on it then your test sounds good to me. So starting to discuss:
- Yes, this is the desired behavior.
- I expect this is feasible to implement, as libxml2 (and Nokogiri) allows fetching a node's namespace bindings (using "namespace::*"), so Nokogiri can use this info to set the appropriate namespaces in the newly inserted node. But for this, 'Document#fragment' should be removed and 'Node#fragment' should exist instead, since the new node's namespace bindings depend on its parent node rather than on the root node (see "test_adding_a_fragment_should_use_parent_node_namespaces_rather_than_root_node_namespaces").
- I hope this doesn't break any existing application. It just fixes some corner cases (well, not so "corner") and IMHO nobody should expect a wrong result from Nokogiri.
I hope this is a good point to start. Regards.
I've created a new test unit just to prove that "node namespace bindings" are different from "node declared namespaces":
http://gist.github.com/251636
flavorjones
Tue Dec 08 06:03:13 -0800 2009
| link
Yes, this is the desired behavior I expect this is feasible to implement
Sounds like you've got everything covered, and you don't need my advice. Please submit a patch when your implementation is complete and all tests pass.
Thanks for using Nokogiri.
Please, I just meant that I expect it to be feasible since libxml2 and Nokogiri already implement fetching node namespaces correctly. I didn't intend to say that it's easy to implement. I'm really sorry if my words weren't the most appropriate.
I can try to help with testing and specifications, but I don't have enough skill to code such a feature by myself. Please excuse my limited English.
OK, I'm already working on it. I've found where the issue is: in "xml/fragment_handler.rb", as it assumes it must inspect the namespaces on the document root. This is not valid, since it's not possible to determine the namespaces of a new node without knowing which node will be its parent.
So a new method Node#fragment is required:
def fragment tags
  DocumentFragment.new(self, tags)
end
So I'm also modifying DocumentFragment and FragmentHandler so that Nokogiri can get the namespace bindings for the parent node into which the element will be inserted.
I just have one question about something I don't know how to achieve:
Let's assume I have a Node element called "parent_node" and I want to get its namespace bindings, that is, the same output as if I run:
doc.xpath(path_to_parent_node + "/namespace::*", ns)
So I would get an Array of XML::Namespace (please check http://gist.github.com/251636).
How could I get the same Array just by having the Node "parent_node" rather than the XPath expression?
Please help me with this. I think I could fix the problem.
Thanks.
flavorjones
Tue Dec 08 11:14:14 -0800 2009
| link
I will write a new method (which will require C code) to give you back the namespaces that are in scope for a node. In the meantime, though, you can use node.xpath("./namespace::*") as a substitute.
flavorjones
Tue Dec 08 15:12:19 -0800 2009
| link
See branch 'namespaces' in tenderlove's repo. the method Node#namespaces returns a hash of all namespaces in scope for a node. Previously it returned only namespaces declared on the node.
That change will not be in 1.4.1, but will probably be in 1.4.2.
Thanks, I'm inspecting it right now.
I'm modifying DocumentFragment so it creates a FragmentHandler with 3 parameters (instead of 2):
FragmentHandler.new(parent_node, self, tags)
Then in FragmentHandler#initialize I store 'parent_node' in the @parent_node attribute.
Then I would use your new code to get the namespace bindings for the current parent node and use them to set the namespaces for each element in the new node.
I'll work on it tomorrow. Thanks a lot.
I think I've a working fix for this issue. Please check the branch "namespaces" in my forked Nokogiri:
http://github.com/ibc/nokogiri/commit/1fd50936f1d4d21172f0f8e1ea7f07c888691766
With this code, the following test units pass:
http://gist.github.com/252440
Note that in these tests I use "Node#fragment" rather than "Document#fragment".
Previously, "Node#fragment" was an "alias" of "Document#fragment", but in my new code this method uses the current node (the parent node) to get the namespace scopes and uses them for the new fragment.
Of the official test units, just one fails with the new code ("test_fragment_namespace_resolves_against_document_root"), but it works if we replace:
frag = doc.fragment
with:
frag = doc.root.fragment
The fact is that IMHO "doc.fragment" should be deprecated, as fragments belong to nodes rather than documents.
A workaround would be creating a "Document#fragment" method which just calls "self.root.fragment".
Please let me know your opinion about the commit. Thanks a lot.
There is still a corner case in which inserting a node would fail: when the new fragment node also contains namespace declarations.
For this, it's required to insert the fragment top level node, get its namespace scopes and use them when inserting its children (and so on).
I'll work on it.
I've opened a new report #189 with the suggestion of a different approach to insert a fragment.
Could this report be opened again please? It's now "closed" but has activity and committed code :)
I've done a lot of improvements and now Nokogiri allows fragments containing namespace declarations and subnodes using them, and also subnodes containing namespace declaration and their subnodes using them. And also handling prefixed attributes with the namespaces declared into the fragment:
http://github.com/ibc/nokogiri/tree/namespaces
I've created a "total" test unit for this issue:
http://gist.github.com/254109
There are 10 tests, some of them very exotic.
7 of them fail under current Nokogiri HEAD.
Just one fails under my fork, and it fails due to reported bug #192.
So if you can help me with bug #192 then I expect that inserting a complex fragment would work really well :)
Thanks.
Hi, have you developers had a chance to check my commit and test unit?
Is there any update or comment for this report?
If you need me to provide more data (along with the already provided test cases and patch) please ask me.
Thanks a lot.
PS: Could this report be open again please?
-
http://github.com/bcardarella/nokogiri-issue
Migrate the test database and run the Cucumber feature in JRuby:
jruby -S cucumber features/book.feature
This works in MRI
Change the test database adapter to 'sqlite3'
Then:
cucumber features/book.feature
Comments
bcardarella
Fri Dec 04 15:37:58 -0800 2009
| link
Invalid, pass -X+O
jruby -X+O -S cucumber features/book.feature
bcardarella
Fri Dec 04 15:38:12 -0800 2009
| link
closing
tenderlove
Sat Dec 05 10:41:00 -0800 2009
| link
Apparently Mike knows about this, so I'm closing it.
-
alias next= and prev= for add_next_sibling and add_previous_sibling
1 comment. Created 24 days ago by flavorjones. Labels: 1.4.1, flavorjones
also maybe alias next -> next_sibling and previous -> previous_sibling
Comments
flavorjones
Fri Dec 04 12:10:58 -0800 2009
| link
aliasing Node#next= and Node#previous= to Node#add_*_sibling(). closed by 6b6bf52.
-
nokogiri.bundle: mach-o, but wrong architecture
2 comments Created 25 days ago by varandasi
When installing on Snow Leopard 10.6.2:
LoadError: dlopen(/Library/Ruby/Gems/1.8/gems/nokogiri-1.4.0/lib/nokogiri/nokogiri.bundle, 9): no suitable image found. Did find:
/Library/Ruby/Gems/1.8/gems/nokogiri-1.4.0/lib/nokogiri/nokogiri.bundle: mach-o, but wrong architecture - /Library/Ruby/Gems/1.8/gems/nokogiri-1.4.0/lib/nokogiri/nokogiri.bundle
from /Library/Ruby/Gems/1.8/gems/nokogiri-1.4.0/lib/nokogiri/nokogiri.bundle
from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `require'
from /Library/Ruby/Gems/1.8/gems/nokogiri-1.4.0/lib/nokogiri.rb:13
Comments
tenderlove
Fri Dec 04 11:26:09 -0800 2009
| link
closing.
-
Add a method Node#add_at_pos to insert a node into a specific position
1 comment Created 25 days ago by ibc
Let's imagine an XML document like:
<?xml version='1.0' encoding='UTF-8'?>
<cp:ruleset xmlns:cp="urn:ietf:params:xml:ns:common-policy">
  <cp:identity>
    <cp:one id="sip:alice@example.org"/>
    <cp:one id="sip:bob@example.org"/>
    <cp:one id="sip:carol@example.org"/>
  </cp:identity>
</cp:ruleset>
I want to insert a new node
<cp:one id="sip:new@example.org"/>
into the second position (between "alice" and "bob").
For now you have to use XPath to search for the nodes inside <cp:identity>, take the first node, and call add_next_sibling on it.
It would be great to have a new method Node#add_at_pos(node, position, force=false) where:
- 'node' is the new node to add.
- 'position' is the position the new node will take.
- If 'force' is true then the method would insert the node into the last position if 'position' is greater than the number of nodes + 1. When false it would raise an exception (i.e. "WrongIndex").
So the above operation would be:
fragment = doc.fragment('<cp:one id="sip:new@example.org"/>')
parent_node = doc.xpath("cp:ruleset/cp:identity", @ns).first
parent_node.add_at_pos(fragment, 2)
Comments
flavorjones
Thu Dec 03 21:54:30 -0800 2009
| link
XML::Node#add_child now accepts an optional +position+ argument. Closed by fec1b08.
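The semantics proposed in the request (a 1-based position, with force deciding what happens when the position is out of range) can be sketched on a plain Array standing in for a node's children. add_at_pos here is the requester's hypothetical method, not an existing Nokogiri API, and IndexError stands in for the suggested "WrongIndex":

```ruby
# Sketch of the requested semantics on a plain Array (not a Nokogiri API).
# position is 1-based, as in the request.
def add_at_pos(children, node, position, force = false)
  raise IndexError, "position must be 1 or greater" if position < 1

  if position > children.size + 1
    # Out of range: either fail or fall back to the last position.
    raise IndexError, "position #{position} is out of range" unless force
    position = children.size + 1
  end

  children.insert(position - 1, node)
end

ids = %w[alice bob carol]
add_at_pos(ids, "new", 2)
puts ids.inspect
```

With force set to true, an out-of-range position appends the node instead of raising.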
-
<br /> surrounded by &quot; is lost when parsing an HTML
2 comments Created 26 days ago by mironov
I think it's something similar to issue #178
When <br /> is inside &quot;, Nokogiri seems to remove it when the fragment is parsed.
Here's an irb session demonstrating the issue:
>> worked = "test<br/>test"
=> "test<br/>test"
>> Nokogiri::HTML::DocumentFragment.parse(worked).to_xhtml
=> "test<br />test"
>> failed = "&quot;test<br/>test&quot;"
=> "&quot;test<br/>test&quot;"
>> Nokogiri::HTML::DocumentFragment.parse(failed).to_xhtml
=> "\"testtest\""
libxml 2.7.3
nokogiri 1.4.0
Comments
flavorjones
Thu Dec 03 21:04:44 -0800 2009
| link
this issue was addressed by the fix for #178.
flavorjones
Thu Dec 03 21:04:50 -0800 2009
| link
added test coverage for fragments with leading entity. closed by fe0570b.
-
Code to reproduce
>> Nokogiri::HTML(%(<a>tag</a> <a href="test\0test"> you don't see me!)).to_html
=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body>\n<a>tag</a> <a href=\"test\"></a>\n</body></html>\n"
\0 outside works though.
My ruby version is 1.8.7, nokogiri version is 1.4.0, libxml version is 2.7.5.
Comments
flavorjones
Thu Dec 03 20:33:50 -0800 2009
| link
this is an issue with all versions of libxml2, at least back to 2.6.16
flavorjones
Thu Dec 03 20:59:11 -0800 2009
| link
this is because a null byte is treated as a string terminator in C. you should work around this by removing null bytes from your document.
tenderlove
Fri Dec 04 08:41:11 -0800 2009
| link
This is a bug in libxml2. We'll work with them to fix it, but there is nothing we can do about it in the nokogiri code base.
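The workaround suggested above (removing null bytes before parsing) could be sketched as follows. The variable names are illustrative, and String#delete is plain Ruby rather than anything Nokogiri-specific:

```ruby
# Input containing a NUL byte inside an attribute value, as in the report.
html = %(<a>tag</a> <a href="test\0test"> you don't see me!)

# Drop every NUL byte before handing the string to the parser.
clean_html = html.delete("\0")

puts clean_html.include?("\0")
puts clean_html
```

The cleaned string can then be passed to Nokogiri::HTML as usual.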
-
<br /> preceded by a newline is lost when parsing an HTML fragment
5 comments Created 28 days ago by rgrove
Labels: 1.4.1, flavorjones
When a <br /> element is preceded by a newline in an HTML fragment, Nokogiri seems to remove it when the fragment is parsed. Here's an irb session demonstrating the issue (using Nokogiri 1.4.0 with libxml2 2.7.6):
>> require 'rubygems'
=> false
>> require 'nokogiri'
=> true
>> html = "First line\nSecond line<br />Broken line"
=> "First line\nSecond line<br />Broken line"
>> fragment = Nokogiri::HTML::DocumentFragment.parse(html)
=> #<Nokogiri::HTML::DocumentFragment:0x80c94f5c name="#document-fragment" children=[#<Nokogiri::XML::Text:0x80c94c64 "First line\nSecond lineBroken line">]>
>> fragment.to_xhtml
=> "First line\nSecond lineBroken line"
>> fragment.to_html
=> "First line\nSecond lineBroken line"
If I remove the newline, the fragment is parsed just fine:
>> html = "First line<br />Broken line"
=> "First line<br />Broken line"
>> fragment = Nokogiri::HTML::DocumentFragment.parse(html)
=> #<Nokogiri::HTML::DocumentFragment:0x80c8c118 name="#document-fragment" children=[#<Nokogiri::XML::Text:0x80c8be20 "First line">, #<Nokogiri::XML::Element:0x80c8bd80 name="br">, #<Nokogiri::XML::Text:0x80c8bc7c "Broken line">]>
>> fragment.to_xhtml
=> "First line<br />Broken line"
Comments
This also applies to other HTML -- I have observed it with anchors, i.e., "One line\nTwo line\n\n<a href="http://brokenlink.com">This won't be a link after parsing</a>"
However, if I wrap the text block in a <div> and </div>, it works. I'm thinking that the newline must somehow interfere with Nokogiri's ability to discern the insides as HTML.
flavorjones
Thu Dec 03 12:09:37 -0800 2009
| link
OK, will investigate.
flavorjones
Thu Dec 03 19:27:01 -0800 2009
| link
fixing leading text node with newline in fragment parsing. closed by b659302.
Great turnaround time, thanks a lot! Between this and the fix to the document.root.namespace exception you recently did, I'd love to see a gem bump soon so I can strip out my collection of kluge-fixes. :)
flavorjones
Fri Dec 04 10:37:18 -0800 2009
| link
Should be bumped this weekend. Cross your fingers.
-
Hi,
the following Code crashes MRI on line 2:
Nokogiri::XML("") / '//*xml[@name="]' rescue nil
Nokogiri::XML("") / '//*xml[@name="]'
(Independent of the XML document)
I don't know whether this is a problem with the interpreter, but as I have never seen this bug before, I thought I'd post it here first.
Additional info:
$ruby19 --version
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10.0.0]
$ruby19 -S nokogiri -v
---
warnings: []
nokogiri: 1.4.0
libxml:
  binding: extension
  compiled: 2.7.6
  loaded: 2.7.6
Trace:
/usr/local/lib/ruby19/gems/1.9.1/gems/nokogiri-1.4.0/lib/nokogiri/xml/node.rb:142:in `evaluate': Invalid expression (Nokogiri::XML::XPath::SyntaxError)
from /usr/local/lib/ruby19/gems/1.9.1/gems/nokogiri-1.4.0/lib/nokogiri/xml/node.rb:142:in `block in xpath'
from /usr/local/lib/ruby19/gems/1.9.1/gems/nokogiri-1.4.0/lib/nokogiri/xml/node.rb:139:in `map'
from /usr/local/lib/ruby19/gems/1.9.1/gems/nokogiri-1.4.0/lib/nokogiri/xml/node.rb:139:in `xpath'
from /usr/local/lib/ruby19/gems/1.9.1/gems/nokogiri-1.4.0/lib/nokogiri/xml/node.rb:106:in `search'
from segfault.rb:2:in `<main>'
: [BUG] Segmentation fault
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10.0.0]
-- control frame ----------
c:0001 p:0000 s:0002 b:0002 l:001978 d:001978 TOP
---------------------------
-- Ruby level backtrace information-----------------------------------------
-- C level backtrace information -------------------------------------------
[NOTE] You may encounter a bug of Ruby interpreter. Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
Abort trap
Comments
tenderlove
Mon Nov 30 11:23:33 -0800 2009
| link
I think this is a bug in Ruby 1.9, but my commit to convert the exception to pure ruby seems to fix it:
-
In things such as JSTL and ESI, we have this pattern where people intersperse namespaced XML which form a valid document among other bits of text, which are implicitly treated as CDATA. It would be so awesome if, on creating an XML document, we could tell it which namespaces to parse, and everything else would be handled as CDATA implicitly. That'd be hot.
Comments
tenderlove
Mon Nov 30 11:24:22 -0800 2009
| link
Sorry, we can't tell libxml2 to do that. :-(
-
Feature request: support 'output method="text"' for XSLT transform
4 comments Created about 1 month ago by richardlehane
Because XSLT transform methods only return serialized documents, 'output method="text"' is ignored and using "disable-output-escaping" has odd effects (new lines get inserted on all the text nodes).
Would it be possible to support text output as well as XML/HTML output from transforms? This isn't a big issue because it is fairly easy to work around it by removing the XML declaration and manually unescaping escaped characters, but I think it would be a useful feature.
thanks!
Richard
Comments
tenderlove
Mon Nov 30 12:13:04 -0800 2009
| link
It looks like Stylesheet#serialize will honor the output method (it gives you back a string). If I made serialize deal with XSLT parameters, would that be acceptable?
richardlehane
Tue Dec 01 17:39:48 -0800 2009
| link
Stylesheet#serialize doesn't actually apply a transform though does it? - it just serializes an XML document according to the output method declared by a stylesheet?
If that is the case, you could probably leave Stylesheet#serialize as is and simply redefine Stylesheet#apply_to to:
def apply_to document, params = []
  serialize(transform(document, params))
end
This kills two birds with one stone, since apply_to already accepts params, and applying the serialization suggested by the stylesheet rather than simply using "to_s" is probably better anyway?
tenderlove
Tue Dec 01 19:37:54 -0800 2009
| link
Yes, you are correct! I like this solution a lot. I'll change it to do this.
tenderlove
Tue Dec 01 19:46:03 -0800 2009
| link
using serialize in XSLT#apply_to so that xslt output methods are honored. closed by 84a2986
-
1.4.0-java doesn't work on jruby-1.4.0
1 comment Created about 1 month ago by manalang
Labels: 1.4.1
Z:\rich\dev\blather\examples>jruby -rrubygems echo.rb [user]@[domain].com/ruby [pwd] [server]
C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/nokogiri-1.4.0-java/lib/nokogiri/ffi/libxml.rb:6: Nokogiri requires JRuby 1.4.0RC1 or later on Windows (RuntimeError)
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/nokogiri-1.4.0-java/lib/nokogiri/ffi/libxml.rb:31:in `require'
from C:/Program Files/jruby-1.4.0/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/nokogiri-1.4.0-java/lib/nokogiri.rb:11
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/nokogiri-1.4.0-java/lib/nokogiri.rb:31:in `require'
from C:/Program Files/jruby-1.4.0/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/blather-0.4.7/lib/blather.rb:58
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/blather-0.4.7/lib/blather.rb:3:in `each'
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/blather-0.4.7/lib/blather.rb:3
... 9 levels...
from C:/Program Files/jruby-1.4.0/lib/ruby/gems/1.8/gems/blather-0.4.7/lib/blather/client.rb:36:in `require'
from C:/Program Files/jruby-1.4.0/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from echo.rb:3
Comments
tenderlove
Tue Nov 24 20:45:42 -0800 2009
| link
-
Extending nokogiri with jQuery selectors?
1 comment Created about 1 month ago by jeroenvandijk
Labels: 1.4.1, tenderlove
I'm using nokogiri through webrat to assert whether elements exist. The set of CSS3 selectors is limited compared with the selectors jQuery offers (e.g. :has, :contains). So I have the following questions:
- is there a plan to make nokogiri compatible with jquery?
- if not, would it be hard to fork nokogiri and add the selectors? Any hints where to start?
Here is an example that illustrates the problem I have http://groups.google.nl/group/formtastic/msg/2601ac08d8c48a96
Comments
tenderlove
Mon Nov 30 14:39:56 -0800 2009
| link
supporting :has() selectors. closed by af2856a
-
XML::Node#attributes doesn't do what the doc says
1 comment Created about 1 month ago by DaxxXML::Node#attributes returns 'name => node' instead of 'name => node.value' ...
lib/nokogiri/xml/node.rb ~line 292
####
# Returns a hash containing the node's attributes. The key is the
# attribute name, the value is the string value of the attribute.
def attributes
  Hash[*(attribute_nodes.map { |node|
    [node.node_name, node] # <- [node.node_name, node.value]
  }.flatten)]
end
Folks must have been using #each, instead? ;)
Cheers,
daz
Comments
flavorjones
Thu Nov 19 15:37:23 -0800 2009
| link
fixed docs for Node#attributes. Closed by 96a5436.
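To illustrate the difference the report points out, here is a sketch that uses a hypothetical Struct in place of real attribute nodes (FakeAttr and the sample values are assumptions, not Nokogiri classes):

```ruby
# Stand-in for an attribute node; not a Nokogiri class.
FakeAttr = Struct.new(:node_name, :value)

attribute_nodes = [FakeAttr.new("id", "42"), FakeAttr.new("class", "book")]

# What the code in the report builds: name => attribute node.
name_to_node = Hash[*attribute_nodes.map { |node| [node.node_name, node] }.flatten]

# What the old comment described: name => string value.
name_to_value = Hash[*attribute_nodes.map { |node| [node.node_name, node.value] }.flatten]

puts name_to_node["id"].class
puts name_to_value["id"]
```

Since the ticket was closed by fixing the docs, the first form (name => node) is the behaviour callers should expect.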
-
warning: parenthesize argument(s) for future version
1 comment Created about 1 month ago by ghazel
C:/Ruby/lib/ruby/gems/1.8/gems/nokogiri-1.4.0-x86-mswin32/lib/nokogiri/xml/builder.rb:272: warning: parenthesize argument(s) for future version
Comments
flavorjones
Thu Nov 19 15:33:24 -0800 2009
| link
This has already been fixed in master, in commit b7eeeda.
-
Segmentation fault when calling #add_namespace
1 comment Created about 1 month ago by hrnt
Nokogiri 1.4.0 segfaults when calling add_namespace
irb(main):001:0> require 'rubygems';require 'nokogiri'; puts Nokogiri::VERSION; Nokogiri::XML.parse('').add_namespace('foo', 'bar')
1.4.0
(irb):1: [BUG] Segmentation fault
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
Comments
tenderlove
Mon Nov 16 08:55:38 -0800 2009
| link
Nokogiri::XML::Document should not define add_namespace().
Closed by e16be12
- lib/nokogiri/xml/document.rb removed add_namespace
- test/xml/test_document.rb
-
The Tutorials link is broken on Firefox (including when JavaScript is enabled)
1 comment Created about 1 month ago by shlomif
Hi all!
The tutorials link on the Nokogiri site is broken on Firefox 3.5.5 on Mandriva Linux Cooker (standard Mandriva package), even when JavaScript is enabled (and it should work even when it isn't).
I don't see anything in the error console.
Please fix it.
Regards,
-- Shlomi Fish
Comments
tenderlove
Tue Nov 24 20:43:48 -0800 2009
| link
This was fixed.
-
I added the xmldecl method to the SAX callbacks, and that broke the SOAP adapter. That broke this dude's code:
http://groups.google.com/group/nokogiri-talk/msg/dca72612114cfcc5
We need to figure out a way to get the adapter under test without loading in soap4r
Comments
flavorjones
Tue Nov 17 23:38:16 -0800 2009
| link
I've pushed a branch named 'soap4r-bug' that reproduces this problem in the Nokogiri test suite.
tenderlove
Tue Nov 24 20:32:36 -0800 2009
| link
adding tests for soap4r adapter and bugfixes. closed by ecdcb0a
-
XML::Namespace#inspect may be broken
1 comment Created about 1 month ago by tenderlove
Labels: 1.4.1
I think the inspect method is broken. Broken or not, we need to figure out this crash:
http://groups.google.com/group/nokogiri-talk/msg/6f8e4ac93fbebf39
Comments
tenderlove
Tue Nov 24 20:43:29 -0800 2009
| link
This was fixed in 8111e7b
-
Provide DOM/SAX examples of loading a DTD so named entities can be resolved
4 comments Created about 1 month ago by yob
Currently, if I attempt to parse a document that has named entities in it, I get an exception.
The files I'm trying to parse conform to a DTD that defines the valid entities. Word on the street is that by loading the DTD I may be able to avoid the exception. See http://groups.google.com/group/nokogiri-talk/browse_thread/thread/8225dfb0ffbe0098
Comments
tenderlove
Sun Dec 06 14:45:34 -0800 2009
| link
Figured it out!
Awesome, thanks. Now I just need to grok the XML catalog tomfoolery so I can stop libxml fetching the DTD over the net every time I parse a file.
tenderlove
Sun Dec 06 15:47:08 -0800 2009
| link
What I've been doing, and I admit this probably isn't the "right way", is doing a "sub" on my XML that contains a DTD and pointing it at the filesystem:
http://github.com/tenderlove/markup_validity/blob/master/lib/markup_validity/validator.rb#L11-32
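A rough sketch of the "sub" approach described above, assuming a hypothetical DTD URL and local path (neither comes from this thread):

```ruby
# Hypothetical values, for illustration only.
remote_dtd = "http://example.com/dtd/book.dtd"
local_dtd  = "file:///usr/local/share/dtd/book.dtd"

xml = <<-XML
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "http://example.com/dtd/book.dtd">
<book>&nbsp;</book>
XML

# String#sub with a String pattern replaces only the first literal match,
# which is the DOCTYPE declaration in this sample.
local_xml = xml.sub(remote_dtd, local_dtd)

puts local_xml
```

The rewritten string still has to be parsed with DTD loading enabled for the entities to resolve; that part is left out here.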
I'm hesitant to use sub, only because we sometimes deal with very large files (in the hundreds of MB range). I think I've nutted out the system XML catalog stuff, see http://github.com/yob/onix-dtd.
Thanks again for your help.
-
Comments
tenderlove
Wed Dec 02 23:46:50 -0800 2009
| link
adding the filter method on node set. closed by 90d4de3
-
This simple script generates invalid output
Comments
flavorjones
Tue Nov 17 22:38:37 -0800 2009
| link
Yo. This is fixed in master. Commit 1fefd59
-
Replacing a Node from another document will crash on exit
5 comments Created about 1 month ago by ration
Labels: 1.4.1
If you replace a node with a node from another document, there is a crash on exit. The node is probably referenced from both documents and gets deleted twice.
Example:
!/usr/bin/env ruby
require 'nokogiri'
xml1 = "
Original caption "
xml2 = "
Replacement caption "
doc1 = Nokogiri::XML(xml1)
doc2 = Nokogiri::XML(xml2)
caption1 = doc1.xpath("//caption")[0]
caption2 = doc2.xpath("//caption")[0]
caption1.replace(caption2)
Comments
flavorjones
Mon Nov 16 23:41:03 -0800 2009
| link
I believe what ration meant to say was:
#!/usr/bin/env ruby
require 'nokogiri'
xml1 = "<test> <caption>Original caption</caption> </test>"
xml2 = "<test> <caption>Replacement caption</caption> </test>"
doc1 = Nokogiri::XML(xml1)
doc2 = Nokogiri::XML(xml2)
caption1 = doc1.xpath("//caption")[0]
caption2 = doc2.xpath("//caption")[0]
caption1.replace(caption2)
flavorjones
Tue Nov 17 00:29:49 -0800 2009
| link
Node#replace reimplemented using reparent_node_with. Fixed by ebd4483
Ah yes, I didn't notice that the markdown broke the tags..
With this fix, doc2 will not contain the caption element. Is this the preferred behavior?
Personally I would assume the node is copied there, and something like "move" would actually move it.
flavorjones
Tue Nov 17 08:44:56 -0800 2009
| link
yes, this is the preferred behavior. if you want to copy the node, copy it:
caption1.replace(caption2.dup)
Ok, thanks for the fix!
Moving was actually what I wanted, so this is perfect. I might not be the only one who doesn't expect this behavior, so maybe the documentation for replace deserves a comment about this side effect on the originating document.
-
Comments
Commit bfa172d allows setting the RECOVER option for PushParser. This resolves the issue.
tenderlove
Wed Dec 02 23:40:33 -0800 2009
| link
Ok, cool. I'll close this ticket then.
-
JRuby 1.4.0, Nokogiri 1.4.0 and Windows
5 comments Created about 1 month ago by bhauff
Labels: 1.4.1, flavorjones
When requiring nokogiri from irb or cucumber there is an FFI error:
FFI::NotFoundError: Function '__xmlParserVersion' not found in [msvcrt]
I am including links to show the entire issue:
http://pastie.org/690973 - when running cucumber
http://pastie.org/691987 - when running irb (through JRuby)
Comments
The main issue here is that FFI can't find the DLLs it needs. I see that nokogiri does some tricks with PATH, but that won't work for DLL loading, since the LoadLibrary() call won't see PATH changes. So either the libs need to be specified with their full path, or they should be placed somewhere that is already on PATH before JRuby starts.
It's also worth noting that on JRuby's master branch we changed how ffi_lib works a bit, so that it no longer silently skips a DLL that isn't found. With JRuby master the failure is immediate and clearer. I also verified that if I put the DLLs somewhere on PATH and then start JRuby, nokogiri loads fine and works.
Here's the patch that solves the problem on Windows (under JRuby):
http://gist.github.com/233100
Essentially, since ffi_lib doesn't do any magic at all with the paths, we should provide the fully qualified path name for every DLL, in Windows format, since those paths are passed directly to the LoadLibrary() call.
I have tested this patch with Windows, JRuby 1.4.0 and Nokogiri 1.4.0, and it works for me.
flavorjones
Thu Nov 19 15:29:31 -0800 2009
| link
ffi + windows + jruby dll loading fix (thanks, Vladimir Sizikov!). Closed by e4976fd
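The idea in the patch description (hand fully qualified, Windows-style paths to ffi_lib) might look roughly like this. The directory and DLL names are assumptions, and windows_path is a hypothetical helper rather than code from the patch:

```ruby
# Hypothetical helper: build an absolute path and switch to backslashes,
# since the string is passed straight through to LoadLibrary().
def windows_path(dir, dll)
  File.join(dir, dll).tr('/', '\\')
end

# Assumed location and file names, for illustration only.
dll_dir   = "C:/ruby/gems/nokogiri/ext/nokogiri"
dll_names = %w[libxml2.dll libxslt.dll libexslt.dll]

full_paths = dll_names.map { |dll| windows_path(dll_dir, dll) }

# In the real code these strings would be given to ffi_lib.
puts full_paths
```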
-
JRuby version test on Windows is broken
5 comments Created about 1 month ago by casebook
raise(RuntimeError, "Nokogiri requires JRuby 1.4.0RC1 or later on Windows") if JRUBY_VERSION < "1.4.0RC1"
fails if my version is 1.4.0
reported by bhauff
Comments
flavorjones
Mon Nov 09 14:35:02 -0800 2009
| link
I concur. that was me.
flavorjones
Mon Nov 09 14:37:57 -0800 2009
| link
closing. this will be in 1.4.1 sometime in the next week or so.
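The failing check compares version strings lexically, so "1.4.0" sorts before "1.4.0RC1". A minimal sketch of the problem and one possible alternative, assuming RubyGems' Gem::Version is available (the variable name is illustrative, and this is not necessarily how the fix in 1.4.1 was written):

```ruby
require 'rubygems'

# Stand-in for JRUBY_VERSION so the sketch also runs on MRI.
jruby_version = "1.4.0"

# Plain String comparison: "1.4.0" is a prefix of "1.4.0RC1", so it sorts first
# and the release looks older than its own release candidate.
string_check = jruby_version < "1.4.0RC1"

# Gem::Version treats "1.4.0RC1" as a prerelease of 1.4.0.
version_check = Gem::Version.new(jruby_version) < Gem::Version.new("1.4.0RC1")

puts "String comparison says too old: #{string_check}"
puts "Gem::Version says too old:      #{version_check}"
```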
-
Ancestor search doesn't work with a css query.
1 comment Created about 1 month ago by chriseppstein
Labels: 1.4.1, tenderlove
See this script for an example: http://gist.github.com/227429
Comments
tenderlove
Thu Nov 05 19:46:04 -0800 2009
| link
Node#matches? works in nodes contained by a DocumentFragment. closed by d41db1a
-
Add the ":self" pseudo selector
1 comment Created about 1 month ago by tenderlove
Labels: 1.4.1, tenderlove
Add the ":self" pseudo selector so that people can have CSS expressions like this:
":self > foo"
which would be equivalent to this:
"./foo"
Comments
tenderlove
Mon Nov 09 15:37:11 -0800 2009
| link
Fixed in 55fbf25
-
I don't know if this is really a bug or if I'm being really stupid, but here goes:
So, when you have a double quote character inside your attribute value, nokogiri does this
attribute='a string with a double " quote'
Unfortunately, this is making the xerces java parser (or whatever parser Openfire uses) throw a hissy fit. While I understand this may be a xerces bug (OWPOU), not a nokogiri one, it would still be nice if we could have the option of using
attribute="a string with a double &quot; quote"
(I tried using send(:native_content=, "...") but that has the same result.)
Comments
tenderlove
Mon Nov 09 15:41:15 -0800 2009
| link
What version of libxml2 are you using? Paste the output of this command, if you can:
$ nokogiri -v
Also, if you could post the code to reproduce this, that would be great. So far, libxml2 is behaving how you want it to:
d = Nokogiri::XML('<root />')
d.root['foo'] = 'hello " world'
puts d.to_xml # => '<root foo="hello &quot; world"/>'
tenderlove
Tue Nov 24 20:44:29 -0800 2009
| link
Closing as there has been no response.
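For reference, the serialization the reporter asked for amounts to escaping the quote as an entity inside a double-quoted attribute. A hand-rolled sketch (escape_attr is a hypothetical helper, not a Nokogiri method):

```ruby
# Hypothetical helper: escape the characters that cannot appear literally
# inside a double-quoted XML attribute value. '&' must be handled first.
def escape_attr(value)
  value.gsub('&', '&amp;').gsub('<', '&lt;').gsub('"', '&quot;')
end

value = 'a string with a double " quote'
puts %(attribute="#{escape_attr(value)}")
```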
-
Nokogiri::HTML::DocumentFragment#parse regression in how UTF8 is handled
2 comments Created about 1 month ago by naofumi
I'm comparing 1.3.3 on rubyforge with the Oct. 29 09:05:43 2009 -0700 commit 7fbf262.
I'm on Ruby 1.8.7: ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-darwin10]
LibXML2 is 2.7.3.
With 1.3.3, DocumentFragment works OK with UTF8 strings.
puts Nokogiri::HTML::DocumentFragment.parse(%Q(<body>こんにちは</body>)).to_html(:encoding => 'UTF-8')
=> <body>こんにちは</body>
However, with the latest GitHub version, the result is
=><body>ããã«ã¡ã¯</body>
Interestingly, if I add a <meta charset> tag, the latest GitHub version recognizes it and processes UTF8 strings correctly. The following is an example.
puts Nokogiri::HTML::DocumentFragment.parse(%Q(<meta content="text/html; charset=UTF8" http-equiv="content-type">\n<body>こんにちは</body>)).to_html(:encoding => 'UTF-8')
=> <meta content="text/html; charset=UTF8" http-equiv="content-type"><body>こんにちは</body>
I looked into the source, but I couldn't get any closer to the cause.
Comments
tenderlove
Fri Oct 30 17:45:51 -0700 2009
| link
Thank you for sending the bug report.
It's fixed!
Thanks for using Nokogiri.
tenderlove
Fri Oct 30 17:46:44 -0700 2009
| link
Fixed in 7a77846
-
Segmentation fault when creating new node with 'Self' reference
1 comment Created 2 months ago by jhingo
I get a [BUG] Segmentation Fault with the following code:
class XMLDocument < Nokogiri::XML::Document
  def initialize()
    super()
    body_node = Nokogiri::XML::Node.new("body",self)
    body_node.content = "stuff"
    self.root = body_node
  end
end
If I execute the following 'xmldoc=XMLDocument.new()' the seg fault occurs on the line:
body_node = Nokogiri::XML::Node.new("body",self)
If I create a separate 'tempdoc=Nokogiri::XML::Document.new' in the initialize() and change the references from 'self' to 'tempdoc' in the rest of the code then there's no fault. So the fault is being triggered by the 'self' reference.
Running
Windows XP
ruby 1.8.6
nokogiri 1.3.3
libxml 2.7.3
I'm a newbie so apologies if I'm doing anything moronic.
Cheers!
Comments
tenderlove
Thu Oct 29 09:05:48 -0700 2009
| link
- ext/nokogiri/xml_document.c (Nokogiri_wrap_xml_document) init called after the tuple is set up. closed by 7fbf262
-
Fragment nodes with namespaces should work properly
1 comment Created 2 months ago by flavorjones
Reported by Iñaki Baz Castillo on the mailing list.
Creating a fragment with a namespace makes the prefix part of the tag name, and (arbitrarily?) uses the namespace of the document root's first child.
Comments
flavorjones
Tue Oct 27 07:58:44 -0700 2009
| link
Closed by 597195f
-
NodeSet.wrap does not preserve document structure
2 comments Created 2 months ago by flavorjones
Failing spec:
def test_wrap_preserves_document_structure
  assert_equal "employeeId", @xml.at_xpath("//employee").children.detect{|j| ! j.text? }.name
  @xml.xpath("//employeeId[text()='EMP0001']").wrap("<wrapper/>")
  assert_equal "wrapper", @xml.at_xpath("//employee").children.detect{|j| ! j.text? }.name
end
Comments
flavorjones
Mon Oct 19 20:10:58 -0700 2009
| link
NodeSet.wrap now preserves document structure. closed by f7388be.
flavorjones
Mon Oct 19 20:12:12 -0700 2009
| link
and 2d3db36
-
Would also be nice to see a :new_lines => false option at some point, so you get back to back elements, no extra spacing. Making the builder object searchable would be awesome too. e.g.
builder.at('//root').to_xml(:indent => 0, :new_lines => false)
or better yet:
builder.root.to_xml(:spacing => false)
Comments
tenderlove
Thu Oct 29 09:06:37 -0700 2009
| link
This was merged in with fd6fc58
Right, but the :new_lines and :spacing options don't exist yet. I meant that someone could add code for these options so we can make the resulting XML completely compact (no indents, no new lines, etc.).
tenderlove
Thu Oct 29 13:47:11 -0700 2009
| link
ah, oops. I thought this was the same thing.
The document on the builder is searchable. You can do:
builder.doc.at('//root')
In fact, Builder#doc just returns a Nokogiri::XML::Document that you can manipulate as you normally would.
Awesome. So all that's left is a way to remove new lines. At the moment we do
.gsub(/\n/, '')
But any content with new lines gets compacted too.
>> builder = Nokogiri::XML::Builder.new
>> builder.root { |xml| xml.test('hey') }
>> builder.doc.root.to_xml(:indent => 0)
=> "<root>\n<test>hey</test>\n</root>"
Is there a way to do this?
>> builder = Nokogiri::XML::Builder.new
>> builder.root { |xml| xml.test('hey') }
>> builder.doc.root.to_xml(:indent => 0, :new_lines => false)
=> "<root><test>hey</test></root>"
-
Nokogiri::XML ignores my set encoding
2 comments Created 2 months ago by paranormal
Labels: 1.4.1, tenderlove
Reproduce
Nokogiri::XML(open('http://www.cite-sciences.fr/rss/ressources/fr/faq_fr_20.xml').read, nil, 'UTF-8', 18543)
Nokogiri::XML::SyntaxError: Unsupported encoding UTF-85
This XML is not valid because of <?xml version="1.0" encoding="UTF-85">, but the encoding is set globally in the call.
Thanks for nokogiri ^-^.
Comments
paranormal
Mon Nov 02 00:50:22 -0800 2009
| link
This is important for me, because my program is a web robot that detects the encoding and converts to UTF-8 before Nokogiri does its work.
I think Nokogiri should ignore the encoding in the tag if one is set at initialization.
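One possible workaround for a robot that has already converted the body to UTF-8 is to rewrite the encoding named in the XML declaration before parsing. This is only a sketch under that assumption; it is not something proposed in this thread, and the sample document is made up:

```ruby
# Made-up document with the bogus encoding name from the report.
xml = %(<?xml version="1.0" encoding="UTF-85"?>\n<rss version="2.0"/>)

# Rewrite only the encoding pseudo-attribute of a leading XML declaration.
# \2 re-uses whichever quote character the declaration used.
fixed = xml.sub(/\A(<\?xml[^>]*encoding=)(["'])[^"']*\2/, '\1\2UTF-8\2')

puts fixed
```

Documents without a leading XML declaration are left unchanged, because the pattern is anchored at the start of the string.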
tenderlove
Mon Nov 09 17:47:05 -0800 2009
| link
-
Anchor tags tightly wrapping another element generate unwanted whitespace on #to_xhtml
3 comments Created 2 months ago by dasil003
This is just weird. Observe the reduced test cases:
s = '<a><b>see</b></a>'
n = Nokogiri::HTML::DocumentFragment.parse(s)
n.to_xhtml
=> "<a>\n <b>see</b>\n</a>"
to_html works:
s = '<a><b>see</b></a>'
n = Nokogiri::HTML::DocumentFragment.parse(s)
n.to_html
=> "<a><b>see</b></a>"
as does adding a text node in the source:
s = '<a> <b>see</b></a>'
n = Nokogiri::HTML::DocumentFragment.parse(s)
n.to_xhtml
=> "<a> <b>see</b></a>"
Comments
BTW, I just discovered it also affects the OBJECT tag.
tenderlove
Wed Oct 14 15:06:55 -0700 2009
| link
I don't think this is a bug. The default save options for to_xhtml say to format or "pretty print" the document. If your document contains space nodes, it will preserve them. If there are no blank nodes in the document, it will add them to make the output formatted.
If you don't want formatting, you can change the to_xhtml save options:
s = '<a><b>see</b></a>'
n = Nokogiri::HTML::DocumentFragment.parse(s)
puts n.to_xhtml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XHTML)
-
Segmentation Fault on modified re-raise
1 comment Created 2 months ago by Empact
Labels: 1.4.0
xml = Nokogiri::XML('<xml />')
begin
  xml.xpath('http://')
rescue Nokogiri::XML::XPath::SyntaxError => e
  raise e, "howdy"
end
results in:
[BUG] Segmentation fault
ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-darwin10]
Abort trap
for
---
warnings: []
libxml:
  loaded: 2.7.5
  binding: extension
  compiled: 2.7.5
nokogiri: 1.3.3
Comments
tenderlove
Tue Oct 13 20:49:06 -0700 2009
| link
duplicating errors works. yay! closed by 33922d7
-
Comments
-
This segfaults, if you look at it funny out of the corner of your eye:
require "nokogiri"
class TextHandler < Nokogiri::XML::SAX::Document
  def initialize
    @chunks = []
  end
  attr_reader :chunks
  def cdata_block(string)
    characters(string)
  end
  def characters(string)
    @chunks << string.strip if string.strip != ""
  end
end
th = TextHandler.new
parser = Nokogiri::XML::SAX::Parser.new(th)
parser.parse(<<-XML)
<?xml version="1.0" encoding="utf-8"?>
<root>
  <stuff> one </stuff>
  <stuff> two </stuff>
</root>
XML
I was able to duplicate it consistently for a while, but I uninstalled and reinstalled nokogiri a few times, and now it works. It would reach the end of the document before segfaulting. The end_document event would fire, and then it would segfault shortly thereafter.
Comments
sporkmonger
Sat Oct 10 21:10:51 -0700 2009
| link
Looks like 2.7.3.
tenderlove
Mon Oct 12 17:54:21 -0700 2009
| link
If you were on libxml2 2.6.16, then I wouldn't be surprised. That version was very old and unstable.
I can't repro this (even with thousands of iterations), so I will assume it's a bug with 2.6.16. I am going to close this, but if you are able to repro with 2.7.3, please reopen this ticket. Thanks!
sporkmonger
Mon Oct 12 17:59:47 -0700 2009
| link
Shouldn't require thousands of iterations, it's a happens-every-time kind of bug. However, I may have been wrong about the version of nokogiri I was using. It might have been edge.
-
Adding a Document to a Node causes segfault on program exit
1 comment Created 2 months ago by david
$ ruby -rubygems -rnokogiri -e 'Nokogiri::XML("").root << Nokogiri::XML::Document.new'
: [BUG] Segmentation fault
ruby 1.9.1p243 (2009-07-16 revision 24175) [i486-linux]
-- control frame ----------
c:0001 p:0000 s:0002 b:0002 l:0011a4 d:0011a4 TOP
-- Ruby level backtrace information-----------------------------------------
-- C level backtrace information ------------------------------------------- 0xb76cd6e9 /usr/lib/libruby-1.9.1.so.1.9(rb_vm_bugreport+0x69) [0xb76cd6e9]
0xb75e907f /usr/lib/libruby-1.9.1.so.1.9 [0xb75e907f]
0xb75e911a /usr/lib/libruby-1.9.1.so.1.9(rb_bug+0x3a) [0xb75e911a]
0xb7674fa4 /usr/lib/libruby-1.9.1.so.1.9 [0xb7674fa4]
0xb7768410 [0xb7768410]
0xb767b817 /usr/lib/libruby-1.9.1.so.1.9(st_foreach+0x17) [0xb767b817]
0xb72d0aa9 /var/lib/gems/1.9.1/gems/nokogiri-1.3.3/lib/nokogiri/nokogiri.so [0xb72d0aa9]
0xb75f954d /usr/lib/libruby-1.9.1.so.1.9 [0xb75f954d]
0xb75f96e4 /usr/lib/libruby-1.9.1.so.1.9 [0xb75f96e4]
0xb75f98dc /usr/lib/libruby-1.9.1.so.1.9(rb_gc_call_finalizer_at_exit+0x17c) [0xb75f98dc]
0xb75eb0ee /usr/lib/libruby-1.9.1.so.1.9 [0xb75eb0ee]
0xb75ec436 /usr/lib/libruby-1.9.1.so.1.9(ruby_cleanup+0x116) [0xb75ec436]
0xb75ec5ee /usr/lib/libruby-1.9.1.so.1.9(ruby_run_node+0x5e) [0xb75ec5ee]
0x80487e8 ruby(main+0x68) [0x80487e8]
0xb73dfb56 /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb73dfb56]
0x80486e1 ruby [0x80486e1]
[NOTE] You may encounter a bug of Ruby interpreter. Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
^CAborted (core dumped)
-- SNIP --
This is ruby 1.9.1, but the same thing happened to me on 1.8.7.
I think this is significant because I was mistakenly adding a Document to a Node in my code, and it kept failing without me understanding why, so the same thing may happen to other people.
Thanks.
Comments
tenderlove
Tue Oct 13 11:17:11 -0700 2009
| link
raising an exception if someone tries to reparent a *::Document. closed by c557764
-
Hi guys,
I get the following error message while downloading an XML file and opening it using nokogiri:
res = Net::HTTP.post_form(URI.parse(....), {...})
doc = Nokogiri::XML(Nokogiri::XML(res.body).xpath("//text()").to_s.gsub("&lt;", "<").gsub("&gt;", ">"))
I have installed the latest nightly on OS X 10.5.6.
/Library/Ruby/Gems/1.8/gems/nokogiri-1.3.3.20091004000018/lib/nokogiri/xml/document.rb:33: [BUG] Segmentation fault
ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]
Abort trap
I have also tried to split the constructor calls:
doc = Nokogiri::XML(res.body).xpath("//text()").to_s.gsub("&lt;", "<").gsub("&gt;", ">")
I have about 10 different XML files, and it crashes randomly on the different files, so I can't say that it's one specific file.
The XML files vary in size from 3mb to 150mb.
The files are very basic XML:
<?xml version="1.0" encoding="utf-8"?>
<string>....</string>
where the string element contains escaped XML. Unfortunately the XML files are data we receive from an external vendor, so not really anything we can do about that.
I can try to normalize the data using gsub and then use nokogiri on the xmlified data.
Regards
William
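As an alternative to chaining gsub calls by hand, Ruby's standard library offers CGI.unescapeHTML, which decodes &lt;, &gt;, &amp;, &quot; and numeric references in a single pass. The following is only a minimal sketch: the `escaped` string is a hypothetical stand-in, since the vendor's real data is not shown in this ticket.

```ruby
require 'cgi'

# Hypothetical payload: an XML fragment that arrived entity-escaped
# inside the outer <string> element.
escaped = '&lt;order id=&quot;7&quot;&gt;&lt;item&gt;tea &amp;amp; milk&lt;/item&gt;&lt;/order&gt;'

# One pass over the input, so a doubly escaped "&amp;amp;" becomes "&amp;"
# and the result is still well-formed XML.
unescaped = CGI.unescapeHTML(escaped)

puts unescaped
```

Because the decoding happens in one pass, the order-of-replacement problems that chained gsub calls can run into (for example, decoding &amp; before &lt;) do not arise. The resulting string could then be handed to the XML parser as a second step.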
Comments
tenderlove
Tue Oct 06 08:49:35 -0700 2009
| link
Can you run "nokogiri -v" and add the output to this ticket?
It seems that that did the trick. I have not had any segfaults since.
libxml:
  loaded: 2.7.3
  binding: extension
  compiled: 2.7.3
nokogiri: 1.3.3
I am still busy testing it, but so far so good. I will keep you posted and close this ticket when I am sure that it is not an issue anymore.
tenderlove
Tue Oct 06 09:05:55 -0700 2009
| link
Okay, sounds good. You might also want to try upgrading libxml2. The latest libxml2 is 2.7.5 and I know they've packed in a bunch of bug fixes. If that doesn't do the trick, would you mind sending us a sample of the XML you're using to make it crash? It shouldn't SEGV under any circumstances. :-)
tenderlove
Tue Oct 13 13:49:22 -0700 2009
| link
Any updates on this?
tenderlove
Thu Oct 15 09:16:59 -0700 2009
| link
I'm closing this since there have been no updates. Please reopen if you're still having problems! Thanks!
-
There is no test in the code below...
# test_document.rb
def test_empty_string_returns_empty_doc
  doc = Nokogiri::HTML('')
end
Comments
tenderlove
Sun Oct 04 20:32:52 -0700 2009
| link
Thanks for the heads up! http://ihighfive.com/
You are very whalecome: http://ihighfive.com/whale-high-five.php
-
Any idea how I can make it load the correct version? I'm on Ubuntu. I've followed the directions here: http://wiki.github.com/tenderlove/nokogiri/use-libxml-from-source but it's just not flying.
Thanks in advance.
Comments
tenderlove
Sun Oct 04 20:41:02 -0700 2009
| link
have you added '/usr/local/lib' to your ld.so.conf file?
tenderlove
Tue Oct 13 13:45:05 -0700 2009
| link
hello?
tenderlove
Wed Oct 14 18:49:33 -0700 2009
| link
Closing. If you're still having troubles with this, please send an email to the mailing list:
-
[PATCH] adding Builder#<< for appending raw strings
4 comments Created 2 months ago by dudleyf
Here's a tiny patch implementing the functionality talked about here[0].
[0] http://rubyforge.org/pipermail/nokogiri-talk/2009-March/000224.html
diff --git a/lib/nokogiri/xml/builder.rb b/lib/nokogiri/xml/builder.rb
index 89cd63a..5cdcafd 100644
--- a/lib/nokogiri/xml/builder.rb
+++ b/lib/nokogiri/xml/builder.rb
@@ -277,6 +277,12 @@ module Nokogiri
         @doc.to_xml
       end
 
+      ###
+      # Append the given raw XML +string+ to the document
+      def << string
+        @doc.fragment(string).children.each { |x| insert(x) }
+      end
+
       def method_missing method, *args, &block # :nodoc:
         if @context && @context.respond_to?(method)
           @context.send(method, *args, &block)
diff --git a/test/xml/test_builder.rb b/test/xml/test_builder.rb
index d4a6e26..12b1f86 100644
--- a/test/xml/test_builder.rb
+++ b/test/xml/test_builder.rb
@@ -117,6 +117,26 @@ module Nokogiri
         assert_equal 'hello', builder.doc.at('baz').content
       end
 
+      def test_raw_append
+        builder = Nokogiri::XML::Builder.new do |xml|
+          xml.root do
+            xml << 'hello'
+          end
+        end
+
+        assert_equal 'hello', builder.doc.at('//root/foo').content
+      end
+
+      def test_raw_append_with_instance_eval
+        builder = Nokogiri::XML::Builder.new do
+          root do
+            self << 'hello'
+          end
+        end
+
+        assert_equal 'hello', builder.doc.at('//root/foo').content
+      end
+
       def test_cdata
         builder = Nokogiri::XML::Builder.new do
           root {
--
1.6.4.3
Comments
tenderlove
Sun Oct 04 20:40:05 -0700 2009
| link
I've applied the patch, but next time please make sure the tests pass. After applying the patch, I got these errors:
1) Error:
test_raw_append(Nokogiri::XML::TestBuilder):
NoMethodError: undefined method `content' for nil:NilClass
    test/xml/test_builder.rb:127:in `test_raw_append'
2) Error:
test_raw_append_with_instance_eval(Nokogiri::XML::TestBuilder):
NoMethodError: undefined method `content' for nil:NilClass
    test/xml/test_builder.rb:137:in `test_raw_append_with_instance_eval'
tenderlove
Sun Oct 04 20:40:29 -0700 2009
| link
XML Builder can append raw strings. closed by 98b10d2
tenderlove
Mon Oct 05 08:38:30 -0700 2009
| link
No problem! :-)
-
Nokogiri::HTML(data)
src/tcmalloc.cc:186] Attempt to free invalid pointer: 0x20e030
Nokogiri::VERSION => "1.3.2"
Nokogiri::LIBXML_VERSION => "2.6.32"
ruby -v => ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-darwin9.7.0] Ruby Enterprise Edition 20090610
Where 'data' is... http://gist.github.com/199496
Thanks for the help!
Comments
Still crashes for me using libxml2 2.7.5 and nokogiri 1.3.3 and ree-1.8.6-20090610. It does however work with normal MRI: ruby 1.8.6 (2009-08-04 patchlevel 383) [i686-darwin9.8.0]
tenderlove
Sun Oct 04 21:03:28 -0700 2009
| link
This isn't crashing for me. I'm using:
[apatterson@higgins nokogiri (master)]$ ruby -v
ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-darwin10.0.0] Ruby Enterprise Edition 20090610
[apatterson@higgins nokogiri (master)]$ ruby -I lib bin/nokogiri -v
---
nokogiri: 1.3.3
warnings: []
libxml:
  compiled: 2.7.5
  loaded: 2.7.5
  binding: extension
[apatterson@higgins nokogiri (master)]$
Can you try with the nightly? To install the nightly build, do this:
$ sudo gem install nokogiri -s http://tenderlovemaking.com
tenderlove
Tue Oct 13 13:50:41 -0700 2009
| link
I can't reproduce this. Please reopen if the problem persists. I need more details to fix this if there is a problem. Thanks!
-
extconf.rb have_func() always fails under Ruby Enterprise build system
7 comments Created 2 months ago by flavorjones
Labels: REE
ruby-enterprise-1.8.6-20090610:
checking for xmlRelaxNGSetParserStructuredErrors()... no
checking for xmlRelaxNGSetParserStructuredErrors()... no
checking for xmlRelaxNGSetValidStructuredErrors()... no
checking for xmlSchemaSetValidStructuredErrors()... no
checking for xmlSchemaSetParserStructuredErrors()... no
Comments
flavorjones
Wed Sep 30 22:58:10 -0700 2009
| link
root cause:
"gcc -o conftest -I/usr/include/libxml2 -I/usr/include -I. -I/home/mike/builds/ruby-enterprise-1.8.6-20090610-install/lib/ruby/1.8/i686-linux -I/home/mike/code/nokogiri/ext/nokogiri -I/usr/include/libxml2 -I/usr/include -I. -I/home/mike/builds/ruby-enterprise-1.8.6-20090610-install/lib/ruby/1.8/i686-linux -I/home/mike/code/nokogiri/ext/nokogiri -g -O2 -g -DXP_UNIX -O3 -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline conftest.c -L/opt/local/lib -Wl,-R/opt/local/lib -L. -rdynamic -Wl,-export-dynamic -lexslt -lxslt -lxml2 -lruby-static -lexslt -lxslt -lxml2 -ldl -lcrypt -lm -lc"
conftest.c: In function ‘t’:
conftest.c:3: warning: implicit declaration of function ‘xmlRelaxNGSetParserStructuredErrors’
/usr/bin/ld: cannot find -lruby-static
collect2: ld returned 1 exit status
checked program was:
/* begin */
1: /*top*/
2: int main() { return 0; }
3: int t() { xmlRelaxNGSetParserStructuredErrors(); return 0; }
/* end */
suggested fix:
diff --git a/ext/nokogiri/extconf.rb b/ext/nokogiri/extconf.rb
index 7c21e7d..b77552d 100644
--- a/ext/nokogiri/extconf.rb
+++ b/ext/nokogiri/extconf.rb
@@ -129,7 +129,7 @@ unless find_library('exslt', 'exsltFuncRegister', *LIB_DIRS)
   abort "libxslt is missing. try 'port install libxslt' or 'yum install libxslt-devel'"
 end
 
-def nokogiri_link_command ldflags, opt='', libpath=$LIBPATH
+def nokogiri_link_command ldflags, opt='', libpath=$DEFLIBPATH|$LIBPATH
   old_link_command ldflags, opt, libpath
 end
flavorjones
Wed Sep 30 23:03:20 -0700 2009
| link
proposed fix from Michael Reinsch:
- def nokogiri_link_command ldflags, opt='', libpath=$LIBPATH
+ def nokogiri_link_command ldflags, opt='', libpath=$DEFLIBPATH|$LIBPATH
which appears to work for me. Aaron, thoughts?
(which is what is used in mkmf.rb, link_command for ruby enterprise)
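The `$DEFLIBPATH|$LIBPATH` expression in the proposed fix is a plain Array union. A minimal sketch of what that operator does, assuming `$DEFLIBPATH` holds the Ruby install's own library directories; the two arrays below are hypothetical stand-ins, not the real mkmf globals:

```ruby
# Hypothetical stand-ins for mkmf's $DEFLIBPATH and $LIBPATH.
def_lib_path = ['.', '/home/me/ruby-install/lib']
lib_path     = ['/opt/local/lib', '/usr/local/lib', '.']

# Array#| keeps the left operand's entries first, then appends entries
# from the right operand that are not already present.
search_path = def_lib_path | lib_path

search_path.each { |dir| puts dir }
```

With the union in place, the directory that contains the Ruby library stays on the link path, which is what the failing conftest link step (cannot find -lruby-static) was missing.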
tenderlove
Sun Oct 04 20:53:33 -0700 2009
| link
Ugh. This is a PITA, but basically we can't make everyone happy. If I add this patch, I may as well remove that "nokogiri_link_command" stuff altogether. Let me try to explain why:
Someone has ruby installed in /usr/lib, they also have libxml2 installed in /usr/lib. They've installed a newer version of libxml2 in /usr/local. We try to be nice and search /opt/local/lib in addition to /usr/local/lib before falling back to /usr/lib. Unfortunately, if the custom directory (/opt/local or /usr/lib) isn't supplied to dir_config(), it won't search that path. We can only supply one directory. If mkmf doesn't find it in that directory, then it falls back to /usr/lib.
That means we either get /opt/local/lib or /usr/lib, unless the user intervenes with a --with-xml-lib=/whatever --with-xml-include=/whatever, or we use my Super Hack® code. Unfortunately my Super Hack® screws over people with custom ruby installs because it will never find the ruby-static library.
I'm going to apply this fix (and by apply, I mean remove my custom code). I'd rather it "just work" for people with custom ruby installs. People with custom libxml2 installs can use the command line arguments.
tenderlove
Sun Oct 04 20:54:09 -0700 2009
| link
removing my Super Hack® closed by fbe7217
flavorjones
Mon Oct 05 05:33:42 -0700 2009
| link
Aaron, the --with-xml-lib and --with-xml-include options do not appear to affect have_func(), since it consistently uses the wrong header files.
-
Hi
I've got some code here which replaces some parts of an HTML document with new content. This works pretty well 99 times out of 100, but in certain situations nokogiri segfaults. Yesterday we were able to capture a 'crashing' situation, that is: a) the original document, b) a start and end dom-id and c) the new content. Based on this I was able to create a simple example[0] which shows this bug.
I suspected that some char / element causes the crash, but during my first inspection it turned out that it must be a combination of some weird circumstances. For example, when I reduce the size of c) (new content), the crashes happen less often. The same happens when I reduce the range between b) start- and end-id. It also appears to crash more often when rails is loaded as well (although I was now able to get it 'reliably' crashing without rails). I was only able to reproduce it on linux; it doesn't crash with the same ruby/libxml/nokogiri version on OS X.
I'm pretty out of ideas, I hope somebody can checkout my example[0] and try to find the root cause. My current theory is that nokogiri/libxml somewhere corrupts memory, either on the stack or heap and that later somebody chokes on that corruption. I also think it is somehow related to the use of the french é, written as é, when I remove these chars the crash can't be reproduced.
You can see the output plus stacktrace of a run on my machine on:
https://gist.github.com/f37eda8131e39fac9dd4
Thanks for the help!
Cheers
Reto
Comments
flavorjones
Mon Sep 28 23:39:13 -0700 2009
| link
will look at this ASAP.
flavorjones
Wed Sep 30 23:04:24 -0700 2009
| link
i've got a repro case and a possible fix.
tenderlove
Wed Sep 30 23:05:10 -0700 2009
| link
Can you try out the nightly build? I believe this may already be fixed.
To get the nightly, do this:
$ sudo gem install nokogiri -s http://tenderlovemaking.com
I tested it against nokogiri/master and the nightly, the crash still happens.
But I increased the 'input' length in the test script, the crash should now happen more often (almost every time now on three of my machines).
git://github.com/retoo/nokogiri-bug.git
Thanks!
flavorjones
Thu Oct 01 07:48:29 -0700 2009
| link
retoo - the fix tenderlove is referring to does not apply to your/my situation, sorry about that. We have a repro case and a potential fix, and we are working on it. Thanks!
tenderlove
Sat Oct 03 22:05:16 -0700 2009
| link
@retoo I think we've got a fix in. Can you pull the latest nightly and see if that fixes your crash?
It doesn't crash anymore here. Bisect says that it has been fixed in c753c8d.
Thank you guys!
Reto
Hi guys,
I might have the same issue. I get the following error message while downloading an XML file and opening it using nokogiri:
res = Net::HTTP.post_form(URI.parse(....), {...})
doc = Nokogiri::XML(Nokogiri::XML(res.body).xpath("//text()").to_s.gsub("&lt;", "<").gsub("&gt;", ">"))
I have installed the latest nightly on OS X 10.5.6.
/Library/Ruby/Gems/1.8/gems/nokogiri-1.3.3.20091004000018/lib/nokogiri/xml/document.rb:33: [BUG] Segmentation fault ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]
Abort trap
Regards
William
Hmm, probably not: 1. it didn't crash on OS X in my case, and 2. it crashed when I tried to replace() some parts of the document, not during any xpath/constructor calls.
Nonetheless, I would report this (perhaps in a fresh, unused ticket :D). Does this crash every time? If yes, separate the two constructor calls like:
step1 = Nokogiri::XML(res.body).xpath("//text()").to_s.gsub("&lt;", "<").gsub("&gt;", ">")
puts "I survived so far"
step2 = Nokogiri::XML(step1)
puts "should never reach this point"
It would be very helpful if you could give the output of res.body. Or even better, the minimum of res.body which still triggers the crash.
Cheers.
Hey Reto, it errors out on
doc = Nokogiri::XML(res.body).xpath("//text()").to_s.gsub("&lt;", "<").gsub("&gt;", ">")
I have about 10 different XML files, and it crashes randomly on the different files, so I can't say that it's one specific file.
The XML files vary in size from 3mb to 150mb.
The files are very basic XML:
<?xml version="1.0" encoding="utf-8"?>
<string>....</string>
where the string element contains escaped XML. Unfortunately the XML files are data we receive from an external vendor, so not really anything we can do about that.
I can try to normalize the data using gsub and then use nokogiri on the xmlified data.
flavorjones
Tue Oct 06 05:58:36 -0700 2009
| link
Hi, can you please open a new issue for this? It appears to be unrelated to this actual issue, which has been solved in the nightlies.
Created a new ticket: http://github.com/tenderlove/nokogiri/issues/#issue/144
tenderlove
Tue Oct 13 13:49:01 -0700 2009
| link
Closing this because we fixed it. :-)
-
Right now, the JRuby gem does not ship with the DLLs.
To work around this, I tried copying the DLLs from the Windows MRI gem, but then Nokogiri gives the following error:
FFI::NotFoundError: Function 'calloc' not found in [exslt]
Comments
Same problems here which is unfortunate since the community really needs a cross-platform XML library. The developers of our project work on Windows, Linux as well as Solaris and deploy on Solaris. Nokogiri would be perfect in this regard if this bug was just fixed.
Please, this problem has existed for about a year and is a show-stopper for many projects. I know the developers' time is limited and they do this in their spare time, but if wide usage of the library is of any interest, addressing this bug is a quick win to expand the use of Nokogiri.
tenderlove
Sun Oct 04 20:59:03 -0700 2009
| link
Hey everyone, this is a bug in JRuby. I've filed a ticket with them here:
http://jira.codehaus.org/browse/JRUBY-4052
Once they get it sorted out, I will close this ticket. :-)
I am confused about what it is we're not doing. If calloc is a (fairly) standard POSIX function, what is it we should be doing differently?
tenderlove
Tue Oct 06 11:49:50 -0700 2009
| link
@headius I think the libc functions should be loaded by default. Wayne gives a workaround in the JIRA ticket, but I think it's unreasonable to require me to change my FFI code depending on the platform. The current code works on everything but windows.
Fix added to git://github.com/jojje/nokogiri.git for people with access to a Windows environment to try. Requires JRuby 1.4.0RC1 or higher due to some needed fixes regarding FFI.
tenderlove
Tue Oct 13 13:52:01 -0700 2009
| link
I applied this:
I'm closing this ticket since it should be fixed for JRuby / Windows people with 1.4.0
-
inner_html= dropping some elements
10 comments Created 3 months ago by bfolkens
Labels: 1.4.0, tenderlove
I'm having some trouble on the following environment. The code below fails on a linux install but not on a macports install. Both environments are:
$ nokogiri -v
---
warnings: []
libxml:
  loaded: 2.7.3
  binding: extension
  compiled: 2.7.3
nokogiri: 1.3.3
But on the linux environment, the following code:
require 'test/unit'
require 'rubygems'
require 'nokogiri'
class BugTest < Test::Unit::TestCase
  def test_should_parse_inner_text
    text = '<base><one>1</one><two>2</two></base>'
    doc = Nokogiri::XML(text)
    doc.search('base').each do |base_tag|
      base_tag.name = 'span'
      base_tag.inner_html = "<sup>#{base_tag.at('one').inner_text}</sup>/<sub>#{base_tag.at('two').inner_text}</sub>"
    end
    assert_equal '<span><sup>1</sup>/<sub>2</sub></span>', doc.to_html.strip
  end
end
Fails with:
test_should_parse_inner_text(BugTest) [oo.rb:15]:
<"<span><sup>1</sup>/<sub>2</sub></span>"> expected but was
<"<span><sup>1</sup></span>">.
Am I missing something obvious, or is this a bug? The above code is an abstraction from a larger project I'm working on, so I've tried to reduce it to the base of the issue. It passes the test on the MacPorts install (same version of libxml2 and nokogiri as on the Linux install).
The Ruby -v on linux is:
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
And the MacPorts install is:
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9]
Comments
tenderlove
Sat Sep 12 11:35:53 -0700 2009
| link
Strange. Are you sure nokogiri -v returns the same thing on the linux box? That seems crazy.
Yeah - I thought I was going crazy so I even did a diff. Weirdest thing I've seen... AFAIK libxml2 doesn't really depend on much does it? Or is there something else that Nokogiri depends on that might be causing this? I tried libxml2 2.7.2 just in case, but still had the same problem.
tenderlove
Sat Sep 12 11:53:41 -0700 2009
| link
libxml2 only depends on iconv and zlib. Neither of those should cause this problem.
What linux are you running?
Gentoo (default/linux/x86/2008.0 profile) over the 2.6.18-xenU-ec2-v1.0 kernel
libxml2: 2.7.3-r2
libc: 2.8_p20080602-r1
zlib: 1.2.3-r1
tenderlove
Sat Sep 12 12:01:46 -0700 2009
| link
Okay. I'll get a gentoo box up and running. Might be a little while before I get this one to repro. :-(
Thanks a ton! In the meantime I'm trying to upgrade glibc and anything else that might be out of date, just to try some different versions of things.
FWIW - The new glibc (2.9_p20081201-r2) didn't affect anything.
Here's another take on it, if it helps at all:
require 'test/unit'
require 'rubygems'
require 'nokogiri'
module Nokogiri::XML
  class Node
    include Test::Unit::Assertions
    def inner_html=(tags)
      children.each { |x| x.remove }
      assert_equal ['sup', 'sub'], document.fragment(tags).children.map {|n| n.name }
      document.fragment(tags).children.to_a.each do |node|
        add_child node
      end
      self
    end
  end
end
class BugTest < Test::Unit::TestCase
  def test_should_parse_inner_html
    text = '<base><one>1</one><two>2</two></base>'
    doc = Nokogiri::XML(text)
    base_tag = doc.at('base')
    base_tag.inner_html = "<sup>#{base_tag.at('one').inner_text}</sup><sub>#{base_tag.at('two').inner_text}</sub>"
    assert_equal ['sup', 'sub'], base_tag.children.map {|n| n.name }
  end
end
Successful return on the MacPorts install, and on the Linux install:
1) Failure:
test_should_parse_inner_html(BugTest)
[oo2.rb:12:in `inner_html='
oo2.rb:27:in `test_should_parse_inner_html']:
<["sup", "sub"]> expected but was
<["sup"]>.
In fact, even just this code returns only the first element and not the other:
Nokogiri::XML::DocumentFragment.parse("<one>1</one><two>2</two>")
Unless it's wrapped in another outer element like:
<x><one>1</one><two>2</two></x>
...then it returns the whole thing. And then obviously on the MacPorts install it returns an accurate copy regardless of the surrounding element.
I think I narrowed this down finally. For whatever reason, my local copy of Nokogiri (even though the gem was labeled 1.3.3) showed this diff from the copy on the linux machine (which was recently installed):
8,9c8,13
< @html_eh = node.kind_of? Nokogiri::HTML::DocumentFragment
<
---
> @klass = if node.kind_of?(Nokogiri::HTML::DocumentFragment)
> Nokogiri::HTML::DocumentFragment
> else
> Nokogiri::XML::DocumentFragment
> end
> #
23,25c27,28
< regex = @html_eh ? %r{^\s*<#{Regexp.escape(name)}}i :
< %r{^\s*<#{Regexp.escape(name)}}
<
---
> regex = (@klass == Nokogiri::HTML::DocumentFragment) ? %r{^\s*<#{Regexp.escape(name)}}i \
> : %r{^\s*<#{Regexp.escape(name)}}
So a fresh install on my MacPorts version now fails as well - lol - not quite the expected result. However, installing the gem from the master works great - so looks like this was already fixed ;)
tenderlove
Mon Sep 14 21:07:45 -0700 2009
| link
Ugh. You're right. It fails against 1.3.3. I was checking against master. :-(
I guess I can stop fighting with VirtualBox now. Thanks for letting me know!
-
I think we should remove Node#collect_namespaces. Since namespace names are not unique, I don't know that this method is very useful.
Comments
flavorjones
Mon Sep 14 15:36:38 -0700 2009
| link
+1
tenderlove
Mon Sep 14 22:08:31 -0700 2009
| link
You're supposed to use the upvote button! ;-)
Although, I like the +1 better because it's not anonymous.
tenderlove
Sun Oct 04 20:34:09 -0700 2009
| link
This was removed in c7eb4b2
-
Adding a node with a default namespace stores it as 'no-namespace' in the parent
2 comments Created 3 months ago by david
Labels: 1.4.0, tenderlove
This works:
doc = Nokogiri::XML('<element><child xmlns="woop:de:doo" /></element>')
doc.at("//xmlns:child", 'xmlns' => 'woop:de:doo') #=> <child xmlns="woop:de:doo" />
This doesn't:
doc = Nokogiri::XML::Document.new
e = Nokogiri::XML::Node.new('element', doc)
c = Nokogiri::XML::Node.new('child', doc)
c.add_namespace(nil, 'woop:de:doo')
e.add_child(c)
doc.add_child(c)
doc.at("//xmlns:child", 'xmlns' => 'woop:de:doo') #=> nil
Comments
I'd also like to add that if you have a document like this:
<element>
<c1 xmlns="one" />
<c2 xmlns="two" />
</element>
then
doc.root.collect_namespaces.inspect #=> {'xmlns' => 'two'}
tenderlove
Fri Sep 11 21:48:15 -0700 2009
| link
Yup. That is the danger of collect_namespaces. I think that method should be removed.
The first problem is fixed here: c6e5fa0
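The collect_namespaces result reported above is what one would expect from any Hash keyed by prefix: two declarations that share the prefix 'xmlns' cannot both be stored. The following is a plain-Ruby sketch of that effect, not Nokogiri's actual implementation; the `declarations` array is a hypothetical stand-in for the namespaces found while walking the tree.

```ruby
# Hypothetical (prefix, href) pairs, in document order.
declarations = [['xmlns', 'one'], ['xmlns', 'two']]

# Keyed by prefix alone: the later declaration overwrites the earlier one.
collected = {}
declarations.each { |prefix, href| collected[prefix] = href }

# Grouping the hrefs per prefix keeps every declaration instead.
grouped = declarations.group_by(&:first)
                      .transform_values { |pairs| pairs.map(&:last) }

puts collected['xmlns']
puts grouped['xmlns'].join(', ')
```

Only 'two' survives in `collected`, while `grouped` retains both hrefs. This illustrates why a prefix-to-href Hash is lossy whenever prefixes repeat within a document.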
-

I need original text like 'Пупкин'
ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]
Linux lenny5/stable
nokogiri -1.3.3.
Comments
tenderlove
Tue Sep 08 08:53:53 -0700 2009
| link
What encoding is the XML using?
romanvbabenko
Tue Sep 08 09:02:40 -0700 2009
| link
$ enca staff.xml Universal transformation format 8 bits; UTF-8
Mixed line terminators
tenderlove
Tue Sep 08 09:19:52 -0700 2009
| link
If you specify the encoding like this:
Nokogiri::XML('....', nil, 'UTF-8')
What does it return?
romanvbabenko
Tue Sep 08 09:38:47 -0700 2009
| link
text first post picture
<?xml version="1.0"?>
without encoding
tenderlove
Tue Sep 08 10:01:03 -0700 2009
| link
This works well for me:
require 'nokogiri'
doc = Nokogiri::XML('<person last_name="Пупкин"></person>', nil, 'UTF-8')
puts doc.at('person')['last_name']
How about you?
romanvbabenko
Tue Sep 08 10:16:36 -0700 2009
| link
I see that.
I need to save readable text to a file.
tenderlove
Tue Sep 08 10:25:08 -0700 2009
| link
Try this:
require 'nokogiri'
doc = Nokogiri::XML('<person last_name="Пупкин"></person>', nil, 'UTF-8')
doc.encoding = 'UTF-8'
puts doc.to_xml
romanvbabenko
Tue Sep 08 10:48:48 -0700 2009
| link
Oh. That works fine. But I see the text in my file only after require 'multibyte'
tenderlove
Tue Sep 08 10:55:52 -0700 2009
| link
multibyte should not make a difference. Can you paste a short script somewhere that shows the problem?
tenderlove
Fri Sep 11 21:09:12 -0700 2009
| link
I believe this is working as expected. Please reopen if I am incorrect.
-
Segfault when searching with #at (1.3.3)
4 comments Created 3 months ago by wtn
Labels: 1.4.0, tenderlove
/Library/Ruby/Gems/1.8/gems/nokogiri-1.3.3/lib/nokogiri/xml/node.rb:591: [BUG] Segmentation fault
ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]
I think it happens whether I'm using OS X 10.5, 10.6, ruby 1.9.1, or 1.8.7
Also, it doesn't happen every time, but about a third of the time (I'm using Mechanize, perhaps the input is changing somewhat)
Comments
tenderlove
Sun Sep 06 10:35:33 -0700 2009
| link
Could you possibly give me the HTML you're parsing? Or some sort of script so that I can reproduce it?
Thanks. I discovered that this crashes about half the time when I paste it in irb, but when I run it as a script on the command line it doesn't crash.
I emailed you the script.
tenderlove
Fri Sep 11 21:08:43 -0700 2009
| link
I can't seem to reproduce this with Nokogiri 1.3.3. Could you possibly get me the document that makes this crash?
I suspect that the webpage is dynamic, and only one of the dynamic pages causes it to crash.
tenderlove
Mon Sep 14 22:15:21 -0700 2009
| link
Figured this out. There was a bug in Node#inspect. This commit fixed it:
-
NodeSet#slice doesn't handle ranges beyond the end of the array
1 comment Created 3 months ago by zoozed
Labels: 1.4.0, tenderlove
Looks like NodeSet#slice isn't handling range beyond the end of the array...
#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
xml = Nokogiri::XML('<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<item><title>t1</title></item>
<item><title>t2</title></item>
</channel>
</rss>
')
items = (xml/:item)
not_blowed_up = items[0, items.size]
all_blowed_up = items[0, 100]
Comments
tenderlove
Fri Sep 11 21:07:34 -0700 2009
| link
fixing node slices where the slice is larger than the node set length. closed by 9f04464
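For comparison, this is how Ruby's built-in Array treats slices that run past the end. It is only a sketch of the semantics the report appears to expect from NodeSet#slice, on the assumption that NodeSet is meant to mirror Array here; the `items` array is a hypothetical stand-in for the two-item node set in the report.

```ruby
# Hypothetical stand-in for the two <item> nodes.
items = %w[t1 t2]

exact      = items[0, items.size] # length matches the array exactly
oversized  = items[0, 100]        # length is clamped to what is available
at_end     = items[2, 1]          # start == size: empty array, not nil
past_end   = items[3, 1]          # start > size: nil
open_range = items[1..100]        # ranges are clamped the same way

puts oversized.inspect
puts at_end.inspect
puts past_end.inspect
```

Note the asymmetry between `at_end` and `past_end`: starting exactly at the end yields an empty array, while starting beyond it yields nil.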
-
I think the reference to the racc tarball (v 1.4.5) should be removed from the gem compilation script because that version doesn't work on ruby 1.9. Instead of being directed to that tarball, the user should be told to install the racc gem (now 1.4.6 and compatible with 1.9).
Comments
tenderlove
Tue Sep 01 13:16:10 -0700 2009
| link
I can't seem to find the code you are talking about. Can you send me a link? Or be more specific?
Thanks
Oh sorry, here's what I got on the console before I installed the racc gem:
localhost:home user$ sudo gem install tenderlove-nokogiri
Building native extensions. This could take a while...
ERROR: Error installing tenderlove-nokogiri:
ERROR: Failed to build gem native extension.
/usr/local/bin/ruby19 -rubygems /usr/local/lib/ruby19/gems/1.9.1/gems/rake-0.8.7/bin/rake RUBYARCHDIR=/usr/local/lib/ruby19/gems/1.9.1/gems/tenderlove-nokogiri-0.0.0.20081021110113/lib RUBYLIBDIR=/usr/local/lib/ruby19/gems/1.9.1/gems/tenderlove-nokogiri-0.0.0.20081021110113/lib
Hoe.new {...} deprecated. Switch to Hoe.spec.
(in /usr/local/lib/ruby19/gems/1.9.1/gems/tenderlove-nokogiri-0.0.0.20081021110113)
WARNING: HOE DEPRECATION: Add '>= 0' to the 'rake' dependency.
/usr/local/bin/ruby19 extconf.rb
checking for xmlParseDoc() in -lxml2... yes
checking for xsltParseStylesheetDoc() in -lxslt... yes
checking for libxml/xmlversion.h in /usr/include/libxml2... yes
checking for libxslt/xslt.h in /usr/include... yes
checking for racc... no
need racc, get the tarball from http://i.loveruby.net/archive/racc/racc-1.4.5-all.tar.gz
extconf.rb failed Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers. Check the mkmf.log file for more
details. You may need configuration options.
Provided configuration options:
--with-opt-dir
--without-opt-dir
--with-opt-include
--without-opt-include=${opt-dir}/include
--with-opt-lib
--without-opt-lib=${opt-dir}/lib
--with-make-prog
--without-make-prog
--srcdir=.
--curdir
--ruby=/usr/local/bin/ruby19
--with-xml2lib
--without-xml2lib
--with-xsltlib
--without-xsltlib
rake aborted!
Command failed with status (1): [/usr/local/bin/ruby19 extconf.rb...]
/usr/local/lib/ruby19/gems/1.9.1/gems/tenderlove-nokogiri-0.0.0.20081021110113/Rakefile:58:in `block (2 levels) in '
(See full trace by running task with --trace)
Gem files will remain installed in /usr/local/lib/ruby19/gems/1.9.1/gems/tenderlove-nokogiri-0.0.0.20081021110113 for inspection.
Results logged to /usr/local/lib/ruby19/gems/1.9.1/gems/tenderlove-nokogiri-0.0.0.20081021110113/gem_make.out
tenderlove
Tue Sep 01 14:58:55 -0700 2009
| link
Never install nokogiri from github. Always install it from rubyforge.
$ sudo gem install nokogiri
I will see about getting the gem removed from github. If it is hosted on github, that is a mistake.
-
1.3.3 breaks RELAXNG.new(rng).validate(xml)
10 comments Created 3 months ago by DrusTheAxe
Labels: 1.4.0, tenderlove
RELAXNG's validate() method changed in 1.3.3:
- In 1.3.2, it returns an array of errors (empty if successful).
- In 1.3.3, it always returns an empty array AND writes to stderr.
This is probably due to 1.3.3's registering a libxml global error-handler.
This is a critical breaking change on 2 counts:
- validate() no longer returns errors; it is impossible to programmatically branch on whether the XML is valid or not
- stderr may not exist; run this under IIS (where stderr is not set) and you fault
NOTE: This was tested against Ruby 1.8.6 on Windows 7, but none of that should matter.
At the very least a documented and clean workaround to alter the default error handler back to 1.3.2 behavior is critical; the docs are...thin...on the subject, and the only thing close to on point on the web are some dated forum messages in Feb'09 (the included code doesn't even parse correctly against 1.3.2 or 1.3.3).
Below is a trivial test to repro the problem
abort 'Usage: ruby relaxng_validate.rb <version>' if ARGV.empty?
nokogiri_version = ARGV[0]
require 'rubygems'
gem 'nokogiri', nokogiri_version
require 'nokogiri'
puts "Nokogiri version #{Nokogiri::VERSION}"
xml = <<EOXML
<A/>
EOXML
schema = <<EOSCHEMA
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="A"/>
  </start>
  <define name="A">
    <element name="A">
      <interleave>
        <attribute name="B"/>
        <element name="C">
          <text/>
        </element>
        <element name="D">
          <element name="E">
            <text/>
          </element>
        </element>
      </interleave>
    </element>
  </define>
</grammar>
EOSCHEMA
puts 'Loading xml...'
doc = Nokogiri::XML(xml)
puts 'Loading schema...'
relaxng = Nokogiri::XML::RelaxNG(schema)
puts 'Validating xml against schema...'
errors = relaxng.validate(doc)
puts "Errors.size = #{errors.size}"
errors.each { |error| puts " Error: #{error}" }
puts 'Done.'
Comments
DrusTheAxe
Mon Aug 31 22:16:07 -0700 2009
| link
Oops. Run the command
ruby test.rb 1.3.2
and the output is
Nokogiri version 1.3.2
Loading xml...
Loading schema...
Validating xml against schema...
Errors.size = 2
 Error: Invalid sequence in interleave
 Error: Element A failed to validate content
Done.
Run the command
ruby test.rb 1.3.3
and the output is
Nokogiri version 1.3.3
Loading xml...
Loading schema...
Validating xml against schema...
element A: Relax-NG validity error : Invalid sequence in interleave
element A: Relax-NG validity error : Element A failed to validate content
Errors.size = 0
Done.
flavorjones
Mon Aug 31 23:12:11 -0700 2009
| link
I cannot reproduce this on Ubuntu Linux 9.04.
flavorjones
Mon Aug 31 23:17:45 -0700 2009
| link
Howard,
Can you please include some information about your platform, as well as the value of Nokogiri::VERSION_INFO (which is a hash) after loading Nokogiri 1.3.3 ?
Thanks much!
DrusTheAxe
Mon Aug 31 23:34:54 -0700 2009
| link
Sure. Changing the line
puts "Nokogiri version #{Nokogiri::VERSION}"
to
puts "Ruby #{RUBY_PLATFORM} #{RUBY_VERSION}"
puts "Nokogiri version #{Nokogiri::VERSION}"
puts "Nokogiri VERSION_INFO: #{Nokogiri::VERSION_INFO.inspect}"
and running with
ruby test.rb 1.3.2
now shows at the top of the output
Ruby i386-mswin32 1.8.6
Nokogiri version 1.3.2
Nokogiri VERSION_INFO: {"nokogiri"=>"1.3.2", "warnings"=>[], "libxml"=>{"compiled"=>"2.7.3", "loaded"=>"2.7.3", "binding"=>"extension"}}
and when run with
ruby test.rb 1.3.3
outputs
Ruby i386-mswin32 1.8.6
Nokogiri version 1.3.3
Nokogiri VERSION_INFO: {"nokogiri"=>"1.3.3", "warnings"=>[], "libxml"=>{"compiled"=>"2.7.3", "loaded"=>"2.7.3", "binding"=>"extension"}}
As I said, I'm running 1.8.6 on Windows (XP and 7, same results).
The only difference in Nokogiri::VERSION_INFO appears to be the "nokogiri"=>"1.3.2" vs. "...1.3.3".
Nokogiri on Windows with --platform mswin32 has a prebuilt libxml, right?
Then I'd suspect the problem is in 1.3.3's registered libxml global error-handler
--and/or-- in the prebuilt libxml binary included in the Nokogiri 1.3.3 mswin32 gem.
P.S. I got Nokogiri via
gem install Nokogiri --platform mswin32 --version 1.3.2
gem install Nokogiri --platform mswin32 --version 1.3.3
to explicitly pull both versions down to my machine to repro the issue.
Anything else I can do to help?
flavorjones
Tue Sep 01 00:45:30 -0700 2009
| link
Confirmed this issue on Windows XP with 1.3.3.
DrusTheAxe
Tue Sep 01 18:17:24 -0700 2009
| link
A few questions:
- Do you understand the source of the problem?
- Do you have an ETA on a fix?
- Do you have a workaround? (besides punting 1.3.3 and locking into 1.3.2)
flavorjones
Wed Sep 02 14:00:31 -0700 2009
| link
Howard -
Neither Aaron nor I will be able to spend the necessary time to debug this, since Windows is not a native platform for either of us. If you are under a time crunch, I'd advise backing down to 1.3.2.
And, if you have Windows expertise you'd like to lend to help us track the problem down, we'd love to have the help. Please remember, we're doing this in our spare time, for the benefit of humanity. That's you!
-m
DrusTheAxe
Thu Sep 03 22:53:07 -0700 2009
| link
Windows is native for me and I'm familiar with several XML libraries (Xerces, ElementTree, others), but new to Nokogiri and LibXML. I tried to hunt down how things were wired up, but didn't quite follow.
My suspicion is 1.3.3's wiring up of libxml's global error-handler, but I couldn't find such a thing. I did find what looks like 6-7 callbacks, but I'm not sure which are which, and they look pretty cryptic anyway. Looks like there's some VB-Declare / .NET-P/Invoke / etc like native-ish-from-Ruby wiring ('pointer' and such), but I've only been doing Ruby for 4 months and haven't seen that (in Ruby).
My suspicion is the newly registered global error-handler is dumping to stderr and not propagating the error messages up to Ruby like it used to. That said, I'm not quite sure where to look. Not even sure it's Ruby code, could be native code, though the changelog comment said libxml-ruby (fwiw).
I'd be glad to help lend a hand, but at this point I've taken it as far as I can on my own. Looking for some pointers.
tenderlove
Sun Sep 06 14:47:31 -0700 2009
| link
Ugh. So, here is what I have learned so far. If I cross compile from the tag (REL_1.3.3), I cannot reproduce the problem. If I cross compile from HEAD, I cannot reproduce the problem. I can reproduce the problem with the released gem, and if I re-cross compile the released gem, I can reproduce the problem.
Also, I've noticed that the test suite does pick this up. Running the tests inside the released gem picks this up.
I am going to re-compile from the 1.3.3 release tag and replace the gems on rubyforge with the recompiled versions.
tenderlove
Sun Sep 06 15:02:32 -0700 2009
| link
I've uploaded the new gems, so if you uninstall, then reinstall, everything should work.
I haven't found the root cause of this problem, but I'm closing this ticket because our tests exercise this behavior, and it is not reproducible from HEAD.
I need to find a less painful way of getting the tests running on windows. :-(
-
After installing nokogiri on Snow Leopard (using ARCHFLAG or not), it blows up with:
dlopen(/usr/local/lib/ruby/gems/1.8/gems/nokogiri-1.3.3/lib/nokogiri/nokogiri.bundle, 9): no suitable image found. Did find: /usr/local/lib/ruby/gems/1.8/gems/nokogiri-1.3.3/lib/nokogiri/nokogiri.bundle: mach-o, but wrong architecture - /usr/local/lib/ruby/gems/1.8/gems/nokogiri-1.3.3/lib/nokogiri/nokogiri.bundle
Comments
tenderlove
Mon Aug 31 10:18:26 -0700 2009
| link
Try uninstalling nokogiri and re-installing it.
tenderlove
Mon Aug 31 10:43:51 -0700 2009
| link
Turns out that ruby needed to be recompiled.
Closing. :-D
-
implement Nokogiri::XML::DTD#external_id and system_id
1 comment Created 4 months ago by tenderlove (labels: tenderlove)
implement Nokogiri::XML::DTD#external_id and system_id
look at xmlDtdPtr
Comments
tenderlove
Sat Sep 12 14:06:51 -0700 2009
| link
adding DTD external id and system id. closed by 303b2b2
-
Implement Nokogiri::XML::ElementDecl#content
1 comment Created 4 months ago by tenderlove (labels: 1.4.0, tenderlove)
Implement Nokogiri::XML::ElementDecl#content
look at tree.h
struct _xmlElement, content member
Comments
tenderlove
Sat Sep 12 17:14:18 -0700 2009
| link
updating changelog. closed by 1f658f0
-
strings returned by xpath expression "/text()" are badly formatted
1 comment Created 4 months ago by jney
Strings returned by Nokogiri::HTML(open(url)).xpath("//xpath_expression/text()").to_ary are formatted in such a way that comparison returns false on identical strings.
To avoid the problem I have to do this: Nokogiri::HTML(open(url)).xpath("//xpath_expression").collect(&:text)
Comments
tenderlove
Sun Aug 30 10:45:37 -0700 2009
| link
Right, because it returns a Nokogiri::XML::Text node. That is different than a string.
-
Nokogiri produces error "output error : unknown encoding" on certain pages
1 comment Created 4 months ago by mogman1
Every now and again Nokogiri will fail to process an HTML document, producing the error "output error : unknown encoding". Reference issue 122 as I strongly suspect that this is also related to the version of libxml2. This issue only comes up in my Windows XP environment and it works fine in my Linux environment (again, please see #122 as my environments have remained exactly the same). To see this, try the following:
doc = open('http://businesslogos.com/resources_services.php')
noko = Nokogiri::HTML(doc)
That second line will generate the error. Most URLs are just fine, but the one above is an example of a URL that produces the error. I skimmed through the document looking for any bizarre characters but did not find anything obvious. If you upgrade the version of libxml2 to fix issue 122, that may very well fix this problem as well, but I wanted to alert you to it. Also, I want to reiterate that this is a problem that only shows up in the Windows environment and does not happen in the Linux environment, the exact opposite of the problem in issue 122 :-/
Sorry to lob two complaints in so short a time. I absolutely love the gem and think it's the best out there for this sort of thing, I am just trying to help make it even better!
Comments
tenderlove
Thu Aug 27 18:33:51 -0700 2009
| link
I believe this is a problem with the version of ICONV that nokogiri on windows is using. I will make sure it is upgraded in the next release, but beyond that, this is a problem I can't fix.
-
DocumentFragment lacks detailed search
1 comment Created 4 months ago by samsm (labels: 1.4.0, tenderlove)
fragment = '<p id="content">hi</p>'
Nokogiri::HTML.fragment(fragment).search('#content').length # this returns zero
Nokogiri::HTML(fragment).search('#content').length # this returns 1
Searching for an element ('p') does work, but using any CSS selector or XPath seems to always produce zero results. Non-fragment search works as I'd expect.
Comments
tenderlove
Fri Aug 28 22:23:02 -0700 2009
| link
delegating DocumentFragment#css to the fragment children. closed by ed10f01
-
Linux environment:
CentOS 5.2
ruby 1.8.6 (2008-08-11 patchlevel 287) [x86_64-linux]
rails 2.3.3
Nokogiri 1.3.3
Windows environment:
Windows XP
ruby 1.8.6 (2008-08-11 patchlevel 287) [i386-mswin32]
rails 2.3.3
Nokogiri 1.3.3
This is a peculiar issue I discovered when I moved my application from my Windows XP development box to a Linux box for production. When I run Nokogiri::HTML(open('http://www.some-url.com')) on my Windows environment, the entire HTML document is returned. However, when I do that same thing in the Linux environment, curiously only the HTML comment tags are returned.
When I checked the temp file that open() generates, the entire HTML document was there, but for whatever reason Nokogiri only grabbed the comment tags. I checked and the same comment tags show inside the original document and in that order. Just for whatever reason, everything but comment tags are filtered out.
My guess is that this is some weird bug having to do with the Linux version of Ruby working with Nokogiri, but otherwise I am thoroughly perplexed. I also tried using a previous version of Nokogiri (1.3.1) to see if that would work, but I got the same result.
Comments
It should be noted that I can pass in a string and Nokogiri parses things just fine. So while Nokogiri::HTML(open('http://www.some-url.com')) produces the strange behaviour, Nokogiri::HTML(open('http://www.some-url.com').read) works exactly as expected. So at least there is a work-around if someone else comes across this.
tenderlove
Wed Aug 26 16:58:49 -0700 2009
| link
Can you run 'nokogiri -v' for me and add the output to the comments? I think this may be a bug in libxml2
nokogiri: 1.3.3
warnings: []
libxml:
  compiled: 2.7.3
  loaded: 2.7.3
  binding: extension
tenderlove
Thu Aug 27 10:11:35 -0700 2009
| link
Is that from the linux environment, or the windows environment?
Sorry, completely spaced there.
Linux:
nokogiri: 1.3.3
warnings: []
libxml:
  compiled: 2.6.26
  loaded: 2.6.26
  binding: extension
Windows:
nokogiri: 1.3.3
warnings: []
libxml:
  compiled: 2.7.3
  loaded: 2.7.3
  binding: extension
tenderlove
Thu Aug 27 17:49:53 -0700 2009
| link
Okay. This is a bug in libxml2. If you upgrade your server to at least 2.6.32, the problem will go away.
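The version threshold in the comment above can be checked from a script. This is a minimal sketch, assuming the version string is taken from the "loaded" entry that `nokogiri -v` prints (also available in Nokogiri::VERSION_INFO); the helper name libxml_new_enough? is made up for illustration, and Gem::Version is simply RubyGems' version comparison class.

```ruby
require 'rubygems'

# Hypothetical helper: true when the given libxml2 version string is at
# least the minimum (2.6.32 is the version named in the comment above).
def libxml_new_enough?(loaded_version, minimum = '2.6.32')
  Gem::Version.new(loaded_version) >= Gem::Version.new(minimum)
end

# In a real script the string would come from
# Nokogiri::VERSION_INFO['libxml']['loaded'].
puts libxml_new_enough?('2.6.26') # the Linux box in this ticket
puts libxml_new_enough?('2.7.3')  # the Windows box in this ticket
```

Gem::Version compares segment by segment, so 2.6.26 sorts before 2.6.32, which a plain string comparison would not guarantee in general.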
-
tcmalloc error parsing "<a><b></a>" fragment with REE
2 comments Created 4 months ago by henrik (labels: REE)
Works fine with MRI, but not with REE:
henrik@Nyx ~/Code$ which nokogiri
/opt/ruby-enterprise-1.8.6-20090610/bin/nokogiri
henrik@Nyx ~/Code$ nokogiri -v
---
nokogiri: 1.3.3
warnings: []
libxml:
  compiled: 2.7.3
  loaded: 2.7.3
  binding: extension
henrik@Nyx ~/Code$ which ruby
/opt/ruby-enterprise-1.8.6-20090610/bin/ruby
henrik@Nyx ~/Code$ ruby -rubygems -e 'require "nokogiri"; puts Nokogiri::HTML::DocumentFragment.parse("<a><b></a>")'
src/tcmalloc.cc:186] Attempt to free invalid pointer: 0x201a90
Abort trap
henrik@Nyx ~/Code$
Expected output is
<a><b></b></a>
without the error.
Comments
tenderlove
Mon Sep 14 21:41:54 -0700 2009
| link
I'm not sure what to do about this. It's working for me:
tenderlove
Sun Oct 04 21:19:21 -0700 2009
| link
I can't repro this and the ticket hasn't been updated for almost a month. I will assume this is fixed on master.
Please reopen and update the ticket if it's still breaking against master. Thanks.
-
nokogiri-1.3.3 is not working with libxml2, libxslt in 1.3.1
1 comment Created 4 months ago by mao
I am using Ubuntu 9.04 amd64; the xml2, xslt and exslt libraries are installed. When I require nokogiri, I get the following error:
irb(main):003:0> require 'nokogiri'
LoadError: Could not open any of [xml2, xslt, exslt]
from /home/dan/installed/jruby/lib/ruby/1.8/ffi/library.rb:18:in `ffi_lib'
from /home/dan/installed/jruby-1.3.1-rails/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-java/lib/nokogiri/ffi/libxml.rb:5
from /home/dan/installed/jruby-1.3.1-rails/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-java/lib/nokogiri/ffi/libxml.rb:31:in `require'
from /home/dan/installed/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /home/dan/installed/jruby-1.3.1-rails/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-java/lib/nokogiri.rb:10
from /home/dan/installed/jruby-1.3.1-rails/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-java/lib/nokogiri.rb:36:in `require'
from /home/dan/installed/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from (irb):4
I solved the problem and recorded the fix here:
http://maodan520.spaces.live.com/blog/cns!E0C8D36B1650926A!237.entry
But I think nokogiri is not working correctly with either libxml2.so or libxslt.so, so maybe it's an issue.
Comments
flavorjones
Tue Aug 18 23:17:33 -0700 2009
| link
this problem was solved, i think. see #90 for followup.
-
NodeSet#search problems and NodeSet#+ is not different from NodeSet#&
1 comment Created 4 months ago by Serabe
Hi there:
I've been tracking down some problems with my implementation for XPath and I think it is not my problem, but yours. Let me explain. In Nokogiri::XML::NodeSet#search you can see:
each do |node|
  paths.each do |path|
    sub_set += send(path =~ /^(\.\/|\/)/ ? :xpath : :css, *(paths + [ns]))
  end
end
Ok, for each node you're iterating over paths and, for each path, you're calling either the xpath or the css method of NodeSet. And guess what, both of them iterate over each node... again. You can take a look at the search, xpath and css methods in Nokogiri::XML::NodeSet.
The easiest way to fix this is deleting lines 75 and 80.
Just to explain why this is happening, my implementation is based on RubyArray and I used the op_plus function in it. By the way, if your implementation of the plus operator removes duplicates I see no difference between + and &, so I would like to know what I should do in the Java impl.
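The double iteration described above can be reproduced with plain arrays. In this sketch, search_all is a stand-in for NodeSet#xpath / NodeSet#css (it already visits every node), and the outer loop plays the role of the `each do |node|` wrapper; all names are illustrative and none of this is Nokogiri's code.

```ruby
nodes = %w[a b c]

# Stand-in for NodeSet#xpath / #css: already iterates over every node.
search_all = lambda { |set, path| set.map { |node| "#{node}#{path}" } }

# Buggy shape: an outer loop over nodes around a call that loops again,
# so every node is searched once per node in the set.
buggy = []
nodes.each { |_node| buggy += search_all.call(nodes, '/x') }

# Fixed shape: call the set-level search once.
fixed = search_all.call(nodes, '/x')

puts buggy.size
puts fixed.size
```

If the + operator removes duplicates, the two shapes return the same set, which hides the extra work; an Array-backed + keeps the repeats and exposes it.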
Comments
tenderlove
Wed Aug 12 11:44:31 -0700 2009
| link
making NodeSet more consistent with Set, adding NodeSet#| closed by 541cbeb
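For comparison, Ruby's standard Set class shows the semantics a Set-like collection follows. This is an analogy using the stdlib, not Nokogiri's NodeSet: in Set, + and | are both union (duplicates removed), while & is intersection, so + and & do differ.

```ruby
require 'set'

a = Set[1, 2, 3]
b = Set[3, 4]

union_plus   = a + b # Set#+ is an alias for Set#|
union_pipe   = a | b # union: every element from either set, no duplicates
intersection = a & b # intersection: only elements present in both sets

puts union_plus.to_a.inspect
puts union_pipe.to_a.inspect
puts intersection.to_a.inspect
```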
-
Convert meta_encoding and meta_encoding= to ruby
1 comment Created 4 months ago by tenderlove (labels: 1.4.0, tenderlove)
These methods need to be converted to Ruby. The current meta_encoding= method will call xmlFreeNode() on the old meta tag which will cause a segv:
require 'nokogiri'

doc = Nokogiri::HTML DATA.read
node = doc.at('meta')
puts node.name
doc.meta_encoding = 'EUC-JP'
p node

__END__
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Hello World</title>
  </head>
  <body>
    <h1>Hello Again</h1>
  </body>
</html>
Comments
tenderlove
Tue Oct 13 21:38:38 -0700 2009
| link
meta_encoding and meta_encoding= are implemented in ruby. closed by ceffd26
-
Shouldn't inner_html convert to UTF8 the same way as inner_text?
4 comments Created 4 months ago by naofumi (labels: 1.4.0, tenderlove)
There seems to be an inconsistency between how encoding conversion is applied with the inner_text, inner_html and to_html methods.
With #inner_text, I think all output is automatically converted to UTF8.
With #inner_html, encoding conversions are not applied.
With #to_html, you can specify the desired encoding for the result with the :encoding option.
I would prefer that the output for both #inner_html and #to_html is converted to UTF8 by default, but that you can override this with the :encoding option.
At least, it would be nice to be able to pass the :encoding option to #inner_html.
Comments
In order to provide an :encoding option for #inner_html, maybe something like the following example:
in nokogiri/xml/node.rb
def inner_html(*args)
  children.map { |x| x.to_html(*args) }.join
end
in nokogiri/xml/node_set
def inner_html(*args)
  collect { |j| j.inner_html(*args) }.join('')
end
tenderlove
Thu Aug 27 18:16:15 -0700 2009
| link
I've added the ability to pass encoding to #inner_html. I don't want to automatically convert all documents to UTF-8 when calling #to_html. I think that would be bad for people processing documents in something besides UTF-8, and they want the final output to remain the specified encoding.
If you always want the output to be UTF-8, just tell the document that it should be encoded with UTF-8 like so:
doc = Nokogiri::HTML open('http://example.com/')
doc.encoding = 'UTF-8' # Set the document encoding to UTF-8
After doing that inner_html and to_html will return UTF-8 documents.
tenderlove
Thu Aug 27 18:17:40 -0700 2009
| link
inner_html takes the same arguments as to_html. closed by ab9a8a0
-
nokogiri-1.3.3 introduces dependency on st.h -- error: st.h: No such file or directory
1 comment Created 4 months ago by TylerRick
1.3.2 installs fine but I can't seem to build/install 1.3.3. I'm running Ubuntu 9.04.
What is st.h and how do I get it to find it?
Thanks!
> sudo gem1.9 install nokogiri -v 1.3.2
Building native extensions. This could take a while...
Successfully installed nokogiri-1.3.2
1 gem installed
Installing ri documentation for nokogiri-1.3.2...
Installing RDoc documentation for nokogiri-1.3.2...
> sudo gem1.9 install nokogiri -v 1.3.3
Building native extensions. This could take a while...
ERROR: Error installing nokogiri:
ERROR: Failed to build gem native extension.
/usr/bin/ruby1.9 extconf.rb install nokogiri -v 1.3.3
checking for iconv.h in /opt/local/include/,/opt/local/include/libxml2,/opt/local/include,/opt/local/include,/opt/local/include/libxml2,/usr/local/include,/usr/local/include/libxml2,/usr/include,/usr/include/libxml2,/usr/include,/usr/include/libxml2... yes
checking for libxml/parser.h in /opt/local/include/,/opt/local/include/libxml2,/opt/local/include,/opt/local/include,/opt/local/include/libxml2,/usr/local/include,/usr/local/include/libxml2,/usr/include,/usr/include/libxml2,/usr/include,/usr/include/libxml2... yes
checking for libxslt/xslt.h in /opt/local/include/,/opt/local/include/libxml2,/opt/local/include,/opt/local/include,/opt/local/include/libxml2,/usr/local/include,/usr/local/include/libxml2,/usr/include,/usr/include/libxml2,/usr/include,/usr/include/libxml2... yes
checking for libexslt/exslt.h in /opt/local/include/,/opt/local/include/libxml2,/opt/local/include,/opt/local/include,/opt/local/include/libxml2,/usr/local/include,/usr/local/include/libxml2,/usr/include,/usr/include/libxml2,/usr/include,/usr/include/libxml2... yes
checking for xmlParseDoc() in -lxml2... yes
checking for xsltParseStylesheetDoc() in -lxslt... yes
checking for exsltFuncRegister() in -lexslt... yes
checking for xmlRelaxNGSetParserStructuredErrors()... yes
checking for xmlRelaxNGSetParserStructuredErrors()... yes
checking for xmlRelaxNGSetValidStructuredErrors()... yes
checking for xmlSchemaSetValidStructuredErrors()... yes
checking for xmlSchemaSetParserStructuredErrors()... yes
creating Makefile
make
cc -I. -I/usr/include/libxml2 -I/usr/include -I/usr/include/ruby-1.9.0/x86_64-linux -I/usr/include/ruby-1.9.0 -I. -DHAVE_XMLRELAXNGSETPARSERSTRUCTUREDERRORS -DHAVE_XMLRELAXNGSETPARSERSTRUCTUREDERRORS -DHAVE_XMLRELAXNGSETVALIDSTRUCTUREDERRORS -DHAVE_XMLSCHEMASETVALIDSTRUCTUREDERRORS -DHAVE_XMLSCHEMASETPARSERSTRUCTUREDERRORS -I/usr/include/libxml2 -I/usr/include -I/usr/include/ruby-1.9.0/x86_64-linux -I/usr/include/ruby-1.9.0 -I. -fPIC -fno-strict-aliasing -g -g -O2 -O2 -g -Wall -Wno-parentheses -fPIC -g -DXP_UNIX -O3 -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline -o xml_reader.o -c xml_reader.c
In file included from /usr/include/ruby-1.9.0/ruby.h:15,
from ./nokogiri.h:6,
from ./xml_reader.h:4,
from xml_reader.c:1:
/usr/include/ruby-1.9.0/ruby/ruby.h: In function ‘rb_type’:
/usr/include/ruby-1.9.0/ruby/ruby.h:973: warning: conversion to ‘int’ from ‘VALUE’ may alter its value
In file included from ./nokogiri.h:81,
from ./xml_reader.h:4,
from xml_reader.c:1:
./xml_document.h:5:16: error: st.h: No such file or directory
xml_reader.c: In function ‘attribute_nodes’:
xml_reader.c:171: warning: cast discards qualifiers from pointer target type
xml_reader.c: In function ‘attribute_at’:
xml_reader.c:199: warning: conversion to ‘int’ from ‘long int’ may alter its value
xml_reader.c: In function ‘from_memory’:
xml_reader.c:466: warning: conversion to ‘int’ from ‘long int’ may alter its value
xml_reader.c:474: warning: conversion to ‘int’ from ‘long int’ may alter its value
xml_reader.c: In function ‘from_io’:
xml_reader.c:506: warning: conversion to ‘int’ from ‘long int’ may alter its value
make: *** [xml_reader.o] Error 1
Comments
tenderlove
Thu Aug 06 13:35:47 -0700 2009
| link
Ruby 1.9.0 is not supported. You should upgrade to 1.9.1-p129 or even the 1.9.2. 1.9.0 is too broken to be supported. :-(
-
"ArgumentError: NULL pointer given" on calling doc.meta_encoding
1 comment Created 4 months ago by arndtjenssen (labels: 1.4.0, tenderlove)
Is thrown on (malformed) documents with missing encoding info on v1.3.2
Steps to reproduce:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(open('http://www.europapress.es/valencia/noticia-moby-dick-john-huston-obrira-cap-setmana-filmoteca-destiu-valencia-20090731183544.html'))
doc.meta_encoding
results in:
ArgumentError: NULL pointer given
from (irb):55:in `meta_encoding'
from (irb):55
from :0
Comments
tenderlove
Thu Aug 06 21:24:55 -0700 2009
| link
returns nil when an HTML document does not declare a meta encoding tag. closed by d0e9312
-
Error parsing provided css_path()
5 comments Created 5 months ago by alkanshel (labels: 1.4.0, tenderlove)
Nokogiri occasionally has issues parsing the css_path it provides. A sample case is the following path (retrieved by running Node.css_path()):
html > body > div > div:nth-of-type(2) > div > text():nth-of-type(2)
Reasonably certain the cause is the 'text():nth-of-type(2)', which is generated from the corresponding .path of /html/body/div/form/table[3]/tr[11]/td/div/div/div/div/table/tr[2]/td[2]/span[2]/text().
The error message is 'Unexpected ':' in #<Nokigiri::CSS::...'
Comments
tenderlove
Thu Aug 06 20:32:08 -0700 2009
| link
That XPath doesn't result in the CSS path you're showing. Can you give me a code example with reference HTML for me?
Hmm. Okay, I'll check my code/data set and see if I can replicate the issue with better documentation.
tenderlove
Fri Aug 07 09:13:55 -0700 2009
| link
Thank you, I'd really appreciate it!
Weird. I ran Nokogiri over the same data set and looked over the CSS paths generated, and I can't seem to replicate the issue. I'll have to chalk it up to solar radiation or PEBKAC, then. Sorry about that.
tenderlove
Fri Aug 07 19:19:11 -0700 2009
| link
Haha! No problem. If you run across it again, please let me know!
-
Fix platform detection code
1 comment Created 5 months ago by tenderlove (labels: 1.4.0, ffi, tenderlove)
We need to be able to tell when a user is running on windows, not by just the platform. Right now, the code looks at the platform when it needs to look at the OS. Switch to this (from Luis):
RUBY_PLATFORM + RbConfig::CONFIG['host_os']
Similar approach is being used by mspec and the RubySpec to determine which API behavior should be checked for Java on Windows.
Comments
tenderlove
Thu Aug 06 21:24:55 -0700 2009
| link
using host OS to figure out ENV["PATH"]. closed by 544b431
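A minimal sketch of the idea: decide on the operating system from RbConfig::CONFIG['host_os'] rather than from RUBY_PLATFORM alone (under JRuby, RUBY_PLATFORM is "java" regardless of the OS). The helper name and the regular expression are illustrative assumptions, not the code from commit 544b431.

```ruby
require 'rbconfig'

# Hypothetical helper: true when the host OS string looks like Windows.
# The pattern is an assumption covering common host_os values; note that
# it deliberately does not match a bare "win", which would catch "darwin".
def windows_host?(host_os = RbConfig::CONFIG['host_os'])
  !!(host_os =~ /mswin|mingw|cygwin|windows/i)
end

puts "RUBY_PLATFORM: #{RUBY_PLATFORM}"
puts "host_os:       #{RbConfig::CONFIG['host_os']}"
puts "windows?       #{windows_host?}"
```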
-
Package libxml2 dll's with jruby gem
2 comments Created 5 months ago by tenderlove (labels: 1.4.0, ffi, tenderlove)
Package libxml2 dll's with jruby gem so that the jruby gem can work on windows / jruby
Comments
tenderlove
Thu Jul 30 09:44:59 -0700 2009
| link
I forgot, people seem to get this error:
C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/1.8/ffi/ffi.rb:114:in `create_invoker': Function 'calloc' not found in [exslt] (FFI::NotFoundError)
from C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/1.8/ffi/library.rb:50:in `attach_function'
from C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/1.8/ffi/library.rb:48:in `each'
from C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/1.8/ffi/library.rb:48:in `attach_function'
from C:/Program Files/JRuby/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-x86-mswin32/lib/nokogiri/ffi/libxml.rb:54
from C:/Program Files/JRuby/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-x86-mswin32/lib/nokogiri/ffi/libxml.rb:31:in `require'
from C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from C:/Program Files/JRuby/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-x86-mswin32/lib/nokogiri.rb:10
from C:/Program Files/JRuby/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-x86-mswin32/lib/nokogiri.rb:36:in `require'
from C:/Program Files/JRuby/jruby-1.3.1/bin/../lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from TestNokogiri.rb:3
tenderlove
Wed Aug 12 11:45:21 -0700 2009
| link
This was fixed here: 70ad006
-
Windows Nokogiri 1.3.3 - ALWAYS LoadError: no such file to load -- nokogiri/1.9/nokogiri
1 comment Created 5 months ago by ocoolio
No matter how you set up, on every Windows system the newest Nokogiri (1.3.3) fails to load with the following error (both 1.9 and 1.8 Ruby):
irb(main):001:0> require 'rubygems'
=> false
irb(main):002:0> require 'nokogiri'
LoadError: no such file to load -- nokogiri/1.9/nokogiri
from c:/Ruby/lib/ruby/gems/1.9.1/gems/nokogiri-1.3.3-x86-mingw32/lib/nokogiri/nokogiri.rb:1:in `require'
from c:/Ruby/lib/ruby/gems/1.9.1/gems/nokogiri-1.3.3-x86-mingw32/lib/nokogiri/nokogiri.rb:1:in `'
from c:/Ruby/lib/ruby/gems/1.9.1/gems/nokogiri-1.3.3/lib/nokogiri.rb:12:in `require'
from c:/Ruby/lib/ruby/gems/1.9.1/gems/nokogiri-1.3.3/lib/nokogiri.rb:12:
1.3.2 works great
Comments
tenderlove
Tue Jul 28 13:20:50 -0700 2009
| link
Sorry about that. I uploaded the wrong gems. Please uninstall that version and try again.
-
Wondering about the possibility of things like to_xml/to_xhtml/etc... being able to write/stream to an IO or possibly even a proc (chunks at a time) instead of waiting for the entire document to be appended to a string and finally returned to the caller.
Would offer the same benefits as parsing from an IO or using a push-parser model, but for encoding as well. This would be especially useful in EventMachine apps, but would also be nice to be able to stream-encode directly to a socket or any other IO.
Comments
tenderlove
Mon Jul 27 10:25:51 -0700 2009
| link
It already does what you describe:
http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Node.html#M000316
brianmario
Mon Jul 27 10:35:33 -0700 2009
| link
Awesome!
Any chance of supporting a callback or something for chunked streaming to the caller?
This would be especially useful in an EventMachine app.
tenderlove
Mon Jul 27 10:42:43 -0700 2009
| link
No. You can write a custom IO object that responds to "write" and "close":
class MyIO
  def initialize &write
    @write = write
  end

  def write data
    @write.call data
  end

  def close; end
end

doc = Nokogiri::XML(File.open(ARGV[0]))
doc.write_to(MyIO.new { |data| puts data })
Then you can do whatever you'd like.
brianmario
Mon Jul 27 10:56:45 -0700 2009
| link
that'll work, thanks
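A self-contained sketch of the duck-typed IO idea from this thread, runnable without Nokogiri. ChunkSink is a hypothetical name, and fake_serializer is a stand-in for a serializer such as write_to: anything that only needs an object responding to write and close.

```ruby
# Hypothetical sink: forwards every chunk it receives to a block.
class ChunkSink
  def initialize(&on_chunk)
    @on_chunk = on_chunk
  end

  # Called by the producer for each piece of output.
  def write(data)
    @on_chunk.call(data)
    data.bytesize # mimic IO#write, which returns the number of bytes written
  end

  def close; end
end

# Stand-in for a serializer: it only relies on #write and #close.
def fake_serializer(io)
  ['<root>', '<child/>', '</root>'].each { |chunk| io.write(chunk) }
  io.close
end

received = []
fake_serializer(ChunkSink.new { |chunk| received << chunk })
puts received.join
```

In an event-driven app the block could hand each chunk to a socket or connection object instead of appending it to an array.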
-
Document encoding should be yielded on SAX parsing
1 comment Created 5 months ago by tenderlove (labels: 1.4.0, tenderlove)
Just what the title says. :-)
Comments
tenderlove
Sun Aug 09 19:39:48 -0700 2009
| link
adding an xml declaration SAX callback handler. closed by b1d7523
-
Nokogiri.parse always assumes XML when the document is an IO
1 comment Created 5 months ago by tenderlove (labels: tenderlove)
I think we should switch this to assume it is an HTML document when an IO is provided
Comments
tenderlove
Sun Jul 26 18:54:51 -0700 2009
| link
Nokogiri.parse will assume HTML if parameter is an IO object. closed by 42d3548
-
Figure out how to attach a DTD to a document
1 comment Created 5 months ago by tenderlove (labels: 1.4.1, tenderlove)
I would like to be able to attach a DTD to an HTML document so that the id() xpath function works.
Comments
tenderlove
Tue Dec 01 21:00:01 -0800 2009
| link
Blech.
-
Is jRuby/FFI leaking memory?
6 comments Created 5 months ago by latompa (labels: ffi, flavorjones)
Run the following under jruby 1.3.1 and nokogiri 1.3.2
On my machine, I get up to "ran 800 times", then my machine starts to run really slow.
The test eats up ~500Mb right off the bat, which leads me to believe memory is leaking.
require 'rubygems'
require 'nokogiri'
require 'open-uri'

REPORT_EVERY=100
NUM_THREADS=1

def test_nokogiri()
  threads=[] ; a =-1
  NUM_THREADS.times do
    threads << Thread.new('some_thread') do |t|
      xml = open('http://railstips.org/assets/2008/8/9/timeline.xml').read
      while true do
        doc = Nokogiri::HTML(xml)
        (p "ran #{a} times") if ((a+=1) % REPORT_EVERY == 0)
      end
    end
  end
  threads.each { |aThread| aThread.join }
end

test_nokogiri
Comments
flavorjones
Fri Jul 24 23:10:01 -0700 2009
| link
I concur that this appears to be a memory leak. I would like to point out, though, that when running the same code on ruby-ffi (MRI), there is no leak.
This leads me to quietly, gently, and without proof, suggest that perhaps this is a JRuby/FFI problem.
I will put together some tests to try to reproduce these results in a vanilla (non-Nokogiri) case, and perhaps narrow the search for causes.
flavorjones
Sun Jul 26 23:55:30 -0700 2009
| link
I have created a self-contained test case demonstrating the problem, which is available at http://gist.github.com/156081
I've opened a JIRA ticket for the JRuby team: http://jira.codehaus.org/browse/JRUBY-3832
I'll leave this ticket open for a few days, and will provide updates when I hear from the JRuby team.
I believe it's a ffi/jruby problem too.
I get the same kind of problem if I replace
doc = Nokogiri::HTML(xml)
with
doc=Nokogiri::LibXML.xmlReadMemory(xml, xml.length, nil,nil, nil)
which goes straight to the "attached" libXML function.
flavorjones
Mon Jul 27 19:43:13 -0700 2009
| link
See an update from Wayne (JFFI team) here:
This fix is available now if you want to build JFFI yourself, or else you can wait until JRuby 1.3.2 (or 1.4.0). Details at the above URL.
Closing this ticket.
flavorjones
Sun Sep 06 18:24:41 -0700 2009
| link
Just an FYI, the latest JFFI does not appear to fix this issue. I've updated the JFFI ticket.
flavorjones
Sat Sep 12 14:57:35 -0700 2009
| link
Another update: current JFFI addresses this issue. Though it is still possible, with aggressive-enough memory usage, to get an OOM condition, it is vanishingly unlikely that using Nokogiri will cause a full-blown OOM.
The original test code no longer suffers from the monstrous memory leak. I'm considering this issue 'closed', for realsies. Let me know if you're still experiencing issues after upgrading to the latest JFFI.
-
I have this quick demo code to show the issue.
Comments
-
improve performance building large documents with XML::Builder
3 comments Created 5 months ago by flavorjones (labels: 1.3.3)
some benchmark numbers:
                                                   user     system      total        real
nokogiri: 1000 docs, 10 stories, to string     5.180000   0.470000   5.650000 (  5.663716)
nokogiri: 100 docs, 100 stories, to string     5.500000   0.480000   5.980000 (  5.984575)
nokogiri: 10 docs, 1000 stories, to string     7.900000   0.490000   8.390000 (  8.409311)
nokogiri: 1 docs, 10000 stories, to string    23.530000   0.500000  24.030000 ( 24.247925)
Comments
flavorjones
Fri Jul 17 17:25:05 -0700 2009
| link
removing O(n) penalty in node new/unlink/reparent by replacing xmlXPathNodeSet with a hash. closed by f34f3bd.
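To illustrate the kind of change that commit message describes: membership checks and removals on a list cost O(n) each, while a hash does them in roughly constant time. The sketch below uses plain Ruby objects rather than Nokogiri's C code, and the class names are invented for the example.

```ruby
require 'benchmark'

# Tracks nodes in an Array: include? and delete scan the list, O(n) each.
class ListTracker
  def initialize
    @nodes = []
  end

  def add(node)
    @nodes << node unless @nodes.include?(node)
  end

  def remove(node)
    @nodes.delete(node)
  end

  def size
    @nodes.size
  end
end

# Tracks nodes in a Hash keyed by node: add and remove are O(1) on average.
class HashTracker
  def initialize
    @nodes = {}
  end

  def add(node)
    @nodes[node] = true
  end

  def remove(node)
    @nodes.delete(node)
  end

  def size
    @nodes.size
  end
end

nodes = (1..2_000).map { |i| "node-#{i}" }
[ListTracker, HashTracker].each do |klass|
  tracker = klass.new
  seconds = Benchmark.realtime do
    nodes.each { |n| tracker.add(n) }
    nodes.each { |n| tracker.remove(n) }
  end
  puts format('%-12s %.4fs', klass.name, seconds)
end
```

The list version's total cost grows quadratically with the number of nodes, which matches the shape of the original benchmark, where the single 10000-story document was far slower than many small ones.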
flavorjones
Fri Jul 17 17:27:17 -0700 2009
| link
new benchmarks:
                                                   user     system      total        real
nokogiri: 1000 docs, 10 stories, to string     4.980000   0.530000   5.510000 (  5.546052)
nokogiri: 100 docs, 100 stories, to string     5.360000   0.520000   5.880000 (  5.900826)
nokogiri: 10 docs, 1000 stories, to string     6.360000   0.560000   6.920000 (  6.973381)
nokogiri: 1 docs, 10000 stories, to string     6.270000   0.500000   6.770000 (  6.783409)
flavorjones
Fri Jul 17 17:29:14 -0700 2009
| link
and just for posterity, here is the same documents generated by Builder::XmlMarkup (the Rails default builder):
builder: 1000 docs, 10 stories, to string     13.570000   1.300000  14.870000 ( 14.887572)
builder: 100 docs, 100 stories, to string     13.470000   1.270000  14.740000 ( 14.748797)
builder: 10 docs, 1000 stories, to string     13.210000   1.310000  14.520000 ( 14.544887)
builder: 1 docs, 10000 stories, to string     13.370000   1.280000  14.650000 ( 14.889584)
-
2 comments Created 5 months ago by theball
namespaces stripped out of tag names
Labels: namespace-confusion
>> a = Nokogiri::HTML.parse(%(<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml"><body><fb:login-button></fb:login-button></body></html>))
=> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml"><body><login-button></login-button></body></html>
Notice that the "fb:" has been stripped out of the <fb:login-button> tag.
I get this behavior on v1.3.2.
Comments
flavorjones
Fri Jul 17 18:07:35 -0700 2009
| link
This is the proper behavior. In this XML, the node name is login-button, and the node is namespaced under http://www.facebook.com/2008/fbml
you can see this by running the following (after your setup above):
a.xpath('//body').first.children.first # => <login-button></login-button>
a.xpath('//body').first.children.first.namespaces.inspect # => "{\"xmlns:fb\"=>nil}"
a.xpath('//login-button', 'fb' => 'http://www.facebook.com/2008/fbml') # => <login-button></login-button>
you can read more about XML namespaces, XPath and Nokogiri at http://tenderlovemaking.com/2009/04/23/namespaces-in-xml/
tenderlove
Fri Jul 17 18:08:32 -0700 2009
| link
You need to use the XML parser if you want to retain namespaces. The html parser does HTML 4.0, which doesn't have namespaces so libxml2 strips them.
-
2 comments Created 5 months ago by mperham
Xpath search works with Hpricot, fails with Nokogiri
Labels: namespace-confusion
I'll attach the test case.
Correct result is 10
[Hpricot] size = 10
[Nokogiri] size = 0
Environment:
nokogiri: 1.3.2
warnings: []
libxml:
  compiled: 2.7.3
  loaded: 2.7.3
  binding: extension
Comments
Test case: http://gist.github.com/149234
flavorjones
Fri Jul 17 17:58:14 -0700 2009
| link
you need to properly use namespaces. Nokogiri supports standard XPath, and Hpricot does not.
puts "[Nokogiri] size = #{xml.xpath('//xmlns:Video').size}" # => 10
the above gives you the right answer of 10.
you can read more about this on tenderlove's blog, at http://tenderlovemaking.com/2009/04/23/namespaces-in-xml/
-
Nokogiri 1.3.2 fails to build, yet 1.3.1 works
2 comments Created 5 months ago by maxendpoint
'gem install nokogiri' currently fails. Yet 'gem install nokogiri -v 1.3.1' is working just fine. I was unable to determine the cause, there was no detailed log or output.
It appears to be some interaction/confusion with rake-compiler.
This is on CentOS 5 with ruby 1.8.5.
Comments
I'm finding the same thing. CentOS 5.1 with Ruby 1.8.5.
- 1.3.1 installs fine with both 1.8.5 and REE 1.8.6
- 1.3.2 installs fine with REE but breaks with 1.8.5
I have no idea what is wrong because there is no output in the ext/nokogiri directory aside from an unhelpful gem_make.out.
tenderlove
Wed Jul 22 22:21:37 -0700 2009
| link
Yes, the problem is that the mkmf api changed between 1.8.5 and 1.8.6. Unfortunately nokogiri is using functionality that is available in 1.8.6 and not 1.8.5. I just learned at Ruby Kaigi that 1.8.5 is completely unsupported now, so I recommend you upgrade. But I have fixed this problem here: 1f8d082
Everything should work with 1.3.3. I'll try to make a release this weekend.
-
Segmentation fault ruby 1.9.1p0 (2009-01-30 revision 21907) [x86_64-linux]
2 comments Created 5 months ago by ibc
I've created a syntactically wrong XSD file and run:
Nokogiri::XML::Schema(File.read(XSD))
In Ruby 1.9.1 compiled from sources in Linux Ubuntu 64 bits, I get a segmentation fault:
ruby1.9 nokogiri_02.rb
/usr/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.3.2/lib/nokogiri/xml/schema.rb:37:in `from_document': Element '{http://www.w3.org/2001/XMLSchema}element': The content is not valid. Expected is (annotation?, ((simpleType | complexType)?, (unique | key | keyref)*)). (Nokogiri::XML::SyntaxError)
    from /usr/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.3.2/lib/nokogiri/xml/schema.rb:37:in `new'
    from /usr/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.3.2/lib/nokogiri/xml/schema.rb:8:in `Schema'
    from /home/ibc/Proyectos/Ruby-XCAP-Client/lib/pres_rules.rb:11:in `<class:PresRules>'
    from /home/ibc/Proyectos/Ruby-XCAP-Client/lib/pres_rules.rb:5:in `<module:XCAPClient>'
    from /home/ibc/Proyectos/Ruby-XCAP-Client/lib/pres_rules.rb:1:in `<top (required)>'
    from /home/ibc/Proyectos/Ruby-XCAP-Client/xcap-client.rb:11:in `require'
    from /home/ibc/Proyectos/Ruby-XCAP-Client/xcap-client.rb:11:in `<top (required)>'
    from nokogiri_02.rb:3:in `require'
    from nokogiri_02.rb:3:in `<main>'
:399: [BUG] Segmentation fault
ruby 1.9.1p0 (2009-01-30 revision 21907) [x86_64-linux]
-- control frame ----------
c:0001 p:0000 s:0002 b:0002 l:000648 d:000648 TOP :399
-- Ruby level backtrace information-----------------------------------------
-- C level backtrace information -------------------------------------------
0x4e882b ruby1.9(rb_vm_bugreport+0x3b) [0x4e882b]
0x5168b0 ruby1.9 [0x5168b0]
0x516a21 ruby1.9(rb_bug+0xb1) [0x516a21]
0x4940df ruby1.9 [0x4940df]
0x7f4f8188f080 /lib/libpthread.so.0 [0x7f4f8188f080]
0x4431de ruby1.9(rb_obj_is_kind_of+0x12e) [0x4431de]
0x41aa77 ruby1.9(ruby_cleanup+0x1d7) [0x41aa77]
0x41ab5a ruby1.9(ruby_run_node+0x3a) [0x41ab5a]
0x417f3d ruby1.9(main+0x4d) [0x417f3d]
0x7f4f80c635a6 /lib/libc.so.6(__libc_start_main+0xe6) [0x7f4f80c635a6]
0x417e29 ruby1.9 [0x417e29]
This doesn't occur with Ruby 1.8.
Comments
tenderlove
Wed Jul 15 18:10:22 -0700 2009
| link
We don't support 1.9.1-p0. p0 is way too buggy for us to support.
Please upgrade to 1.9.1-p129 and let us know if it still breaks! Also, please include the XSD file in the bug report.
-
3 comments Created 5 months ago by darryl
Reader segmentation fault on each with attributes
Labels: 1.3.3, flavorjones
each results in segmentation fault
nokogiri 1.3.2
ruby 1.8.7
linux and macos
steps to reproduce
require 'nokogiri'
open('testfile.xml', 'w'){|f|
f.write("\n");
20000.times{|i| f.write("
f.write("")}
Nokogiri::XML::Reader.from_io(open('testfile.xml')).each{|e| puts "#{e.name} #{e.attributes}"}
For me it usually segs within the first 3000 ids.
Comments
flavorjones
Sun Jul 12 21:19:07 -0700 2009
| link
reproduced:
==7301== Invalid write of size 4
==7301==    at 0x4055AC2: rb_ary_store (array.c:409)
==7301==    by 0x4055B6F: rb_ary_push (array.c:474)
==7301==    by 0x6750EB6: Nokogiri_wrap_xml_node (xml_node.c:864)
==7301==    by 0x6751547: Nokogiri_xml_node_properties (xml_node.c:876)
==7301==    by 0x6754F57: attribute_nodes (xml_reader.c:184)
==7301==    by 0x406D031: call_cfunc (eval.c:5752)
==7301==  Address 0x6c89bc4 is 4,484 bytes inside a block of size 4,912 free'd
==7301==    at 0x4024B4A: free (vg_replace_malloc.c:323)
==7301==    by 0x408DA7C: garbage_collect (gc.c:1242)
==7301==    by 0x408DDA4: rb_newobj (gc.c:436)
==7301==    by 0x4053116: ary_alloc (array.c:104)
==7301==    by 0x4053193: ary_new (array.c:119)
==7301==    by 0x4058453: flatten (array.c:3138)
==7301==    by 0x4058814: rb_ary_flatten (array.c:3252)
==7301==    by 0x406D04F: call_cfunc (eval.c:5749)
flavorjones
Sun Jul 12 21:20:24 -0700 2009
| link
awesome:
/usr/lib/ruby/gems/1.8/gems/nokogiri-1.3.2/lib/nokogiri/ffi/structs/xml_document.rb:42:in `_id2ref': 0xdbd4a5ac is recycled object (RangeError)
    from /usr/lib/ruby/gems/1.8/gems/nokogiri-1.3.2/lib/nokogiri/ffi/structs/xml_document.rb:42:in `ruby_doc'
    from /usr/lib/ruby/gems/1.8/gems/nokogiri-1.3.2/lib/nokogiri/ffi/xml/reader.rb:46:in `attribute_nodes'
    from /usr/lib/ruby/gems/1.8/gems/nokogiri-1.3.2/lib/nokogiri/xml/reader.rb:52:in `attributes'
    from /home/mike/foo.rb:11
flavorjones
Sun Jul 12 21:48:19 -0700 2009
| link
keeping a reference to the document in Reader, to prevent GC. closed by 159094a.
-
10 comments Created 5 months ago by Serabe
Nokogiri::XML::Document#add_child creates a two-nodes-rooted xml.
Labels: 1.3.3, tenderlove
Take a look at:
Comments
flavorjones
Sun Jul 12 22:11:38 -0700 2009
| link
Patient: "It hurts when I do this." (raises arm)
Doctor: "Then don't do that."
In all seriousness, what do you expect the behavior to be?
In my opinion, if you ask libxml2 to add a node somewhere, and it adds that node, then that's not a bug in the library. That's a bug in your application.
tenderlove
Sun Jul 12 23:03:16 -0700 2009
| link
Hah! I think we either need to remove that method or delegate to the root node.
Mike, the code is extracted from one test in Nokogiri's test/xml/test_node. Furthermore, I would expect an XML::Document class not to let me create something that is not an XML document at all (see http://www.w3.org/TR/2008/REC-xml-20081126/#NT-document ). On the other hand, if I wanted a two-rooted element, I could use a DocumentFragment, which doesn't have this restriction.
What do I expect the behavior to be? As tenderlove wrote: either remove that method or delegate it to the root node. I would add another option, the one suggested by DOM: raising an exception ( http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-1950641247 ). Anyway, I would rather see one of the first two.
tenderlove
Sat Jul 25 18:57:00 -0700 2009
| link
I will fix this before my release tomorrow. I'm torn between raising an exception and delegating to the root node. libxml2 actually allows the document to have "children". The DTD is a child of the document for example. That means that adding a child to a document is possible, just not when that child is a Node type.
I think for now, I'll remove the method. Either way, I'll close this ticket tomorrow.
tenderlove
Sat Jul 25 22:24:43 -0700 2009
| link
Okay. I can't remove the add_child method. I'm still not sure what the behavior should be. The XML spec says that you can have only one root node. But that doesn't mean the document can't have many children. For example, here is a perfectly valid document that has multiple children (3 specifically) and only one root node:
<?xml version="1.0"?>
<!-- hello world -->
<root/>
<!-- I love comments! -->
What should the right behavior for add_child be? Should it raise an exception if someone tries to pass an Element node (not a comment or a DTD)? What if there is no root node on the document yet? Should it let you add one element node as the root, then raise an exception if a second is added?
I understand the feeling of not wanting XML::Document to allow you to create invalid XML documents, but just constructing an XML::Document creates an invalid document. It has no root! Should you not be able to to_xml a newly constructed document?
I'm starting to convince myself that we should leave the behavior as is. If you construct an invalid XML document, well that's your fault. I'm definitely willing to change the current behavior, but all roads I've traveled down while looking at this problem indicate to leave it alone.
As I said, I would suggest raising an exception, as suggested here: http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-184E7107
Anyway, it would be impossible for me to implement the current behaviour in Java (it would be easier to reimplement all of libxml2 in Java than to add that change), because JAXP implements DOM and, as a consequence, raises an exception. If the behaviour of this feature depends on the implementation, that is fine with me. I really cannot get this implemented in Java.
flavorjones
Sun Jul 26 08:46:14 -0700 2009
| link
I'm not trying to be difficult, but I again have to point out that nobody seems to care that libxml allows this behavior. I see no reason why nokogiri (or nokogiri-java) needs to be the gatekeeper of sanity here.
If someone's application depends on adding multiple root nodes to an XML doc, and that doesn't port properly to nokogiri-java, who cares? The code should be fixed.
I am opposed to adding this sort of sanity-checking logic to Nokogiri.
Let's simply call this "undefined behavior", omit (or remove) all tests for this "feature", and move on with our lives. I think we all have much more important stuff we could be working on.
tenderlove
Sun Jul 26 12:37:13 -0700 2009
| link
I agree with Mike. DOM implementation is not nokogiri's responsibility.
Serabe: if there are tests that depend on this "feature", then we should remove them or change them to not depend on this behavior. I agree that it's a strange behavior, but I think it is implementation specific.
I will track down all tests that are doing this and fix them.
tenderlove
Sun Jul 26 17:56:24 -0700 2009
| link
raising an exception when multiple roots are added, closed by a3eb811
-
1 comment Created 5 months ago by adsmart
inner_html with parens gives incorrect results
Labels: 1.3.3, flavorjones
The following test fails
require 'rubygems'
require 'nokogiri'
out = Nokogiri::HTML.parse("")
frag = Nokogiri::XML::Node.new("p", out)
foo = "this is a test (<em>Let's see</em>)"
frag.inner_html = foo
puts frag.inner_html == foo #false
puts frag.inner_html #)<em>Let's see</em>this is a test (
frag2 = Nokogiri::XML::Node.new("p", out)
bar = "this is a test (<em>Let's see</em>)"
frag2.inner_html = bar
puts frag2.inner_html == bar #false
puts frag2.inner_html #)<em>Let's see</em>this is a test (
frag3 = Nokogiri::XML::Node.new("p", out)
foobar = "this is a test \(<em>Let's see</em>\)"
frag3.inner_html = foobar
puts frag3.inner_html == foobar #false
puts frag3.inner_html #)<em>Let's see</em>this is a test (
# THIS IS STRANGE
frag4 = Nokogiri::XML::Node.new("p", out)
foo4 = "\ethis is a test (<em>Let's see</em>)"
frag4.inner_html = foo4
puts frag4.inner_html == foobar #false
puts frag4.inner_html #this is a test (Let's see)
The presence of the parentheses "()" causes things to fall apart. The inner HTML comes out of order.
Comments
flavorjones
Fri Jul 10 15:23:43 -0700 2009
| link
Node#inner_html= no longer reverses the elements passed to it. closed by b973c55.
-
2 comments Created 5 months ago by tommorris
XPath is not namespace aware
Labels: namespace-confusion
Nokogiri currently does not do namespaced XPath queries properly.
Take the following XML:
sampledoc = <<-EOF;
<?xml version="1.0" ?>
;
<rdf:RDF>
<rdf:Description rdf:about="http://example.org/one">;
<ex:name>Foo</ex:name>
</rdf:Description>
</rdf:RDF>
<rdf:RDF>
<rdf:Description rdf:about="http://example.org/two">;
<ex:name>Bar</ex:name>
</rdf:Description>
</rdf:RDF>
EOF
(I wrote it to test Reddy, an RDF library I've written which currently works on top of libxml-ruby, but which I'd like to port to Nokogiri so that I can run it on JRuby, now that Nokogiri works there thanks to FFI in JRuby 1.3.0.)
Now, according to the RDF/XML Syntax specification, an RDF/XML document can be parsed from multiple root rdf:RDF nodes. This document gives this example: it has two rdf:RDF nodes, correctly namespaced in the http://www.w3.org/1999/02/22-rdf-syntax-ns# namespace. But because Nokogiri treats namespaces as basically nothing more than mildly clever attributes, the following XPath query fails:
Nokogiri::XML(sampledoc).xpath("rdf:RDF", 'rdf' => "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
Thanks for fixing the .namespace issue so it returns a namespace object rather than a prefix. Prefixes have no semantic value - the namespace URIs are what matters.
Comments
tenderlove
Sat Jul 04 14:41:33 -0700 2009
| link
Nokogiri's xpath function is quite aware of namespaces. In fact, nokogiri's xpath function is a thin wrapper around libxml2, so I'm curious why you think it treats namespaces as "clever attributes"?
Your sample document does not declare your rdf tags inside a namespace. Please try a sample RDF document that declares the tags using namespaces. Here is a good example:
http://www.w3schools.com/rdf/rdf_example.asp
Notice how the sample RDF document provided by w3schools declares its namespaces, whereas your document does not.
Also, xml documents may only have one root node. If there are multiple nodes, it is not a legal XML document:
http://www.w3.org/TR/REC-xml/#dt-root
According to the RDF/XML spec, it may have multiple rdf:RDF, but they must be inside a root tag:
http://www.w3.org/TR/rdf-syntax-grammar/#section-grammar-summary
Also see section 7.2.8
-
6 comments Created 5 months ago by pdlug
Creating a new document with a root node cloned from another causes segfault
Labels: 1.3.3, tenderlove
When trying to create a new document from part of an existing document via #dup a segfault results:
doc = Nokogiri::XML('test')
doc2 = Nokogiri::XML::Document.new
doc2.root = doc.root.dup(1)
Comments
Should have noted that this occurs both on Mac OS X (libxml2 2.7.3) and Gentoo Linux (libxml2 2.7.2).
flavorjones
Sun Jul 12 22:04:44 -0700 2009
| link
This is more of the node dictionary allocation issue, which occurs when nodes have resources owned by another document. In this case, when the node is dup()ed, it still references dictionary strings owned by the original document. At GC time, things blow up.
flavorjones
Sun Jul 12 22:05:26 -0700 2009
| link
Basically, libxml2 does not support moving nodes from one document to another, nor does it support moving dupes of a node to another document. Sigh.
tenderlove
Sun Jul 12 23:04:06 -0700 2009
| link
But this shouldn't be that problem, right? I thought we copied the tree on dups?
flavorjones
Mon Jul 13 04:07:36 -0700 2009
| link
This is that problem! Really. I think if we copy the document, everything works correctly. But copying a node and jamming it into another doc is definitely going to break libxml.
tenderlove
Wed Jul 15 18:07:50 -0700 2009
| link
moving roots around will copy them and gc old roots. closed by 10f5710
-
20 comments Created 6 months ago by flavorjones
FFI: Invalid callback parameter type: STRING on FreeBSD/amd64
Labels: ffi, flavorjones
Originally at http://jira.codehaus.org/browse/JRUBY-3781
FreeBSD 7.2/amd64, jruby 1.3.1, nokogiri 1.3.2-java, libxml2-2.6.32
Loading nokogiri fails with:
irb(main):002:0> require 'nokogiri'
ArgumentError: Invalid callback parameter type: STRING
    from /home/lovec/bin/jruby/lib/ruby/1.8/ffi/ffi.rb:120:in `create_invoker'
    from /home/lovec/bin/jruby/lib/ruby/1.8/ffi/library.rb:50:in `attach_function'
    from /home/lovec/bin/jruby/lib/ruby/1.8/ffi/library.rb:48:in `each'
    from /home/lovec/bin/jruby/lib/ruby/1.8/ffi/library.rb:48:in `attach_function'
    from /home/lovec/bin/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-java/lib/nokogiri/ffi/libxml.rb:138
    from /home/lovec/bin/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-java/lib/nokogiri/ffi/libxml.rb:31:in `require'
    from /home/lovec/bin/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
    from /home/lovec/bin/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-java/lib/nokogiri.rb:10
    from /home/lovec/bin/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.2-java/lib/nokogiri.rb:36:in `require'
    from /home/lovec/bin/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
    from (irb):3
I .inspected the parameters passed to the function that fails if that helps
function => #<Library Symbol library=xml2 symbol=xmlSaveToIO address=0x82b702da0>
args => [#<FFI::CallbackInfo [ pointer, string, int32 ], int32>, #<FFI::CallbackInfo [ pointer ], int32>, #<FFI::Type::Builtin:POINTER size=8 alignment=8>, #<FFI::Type::Builtin:STRING size=8 alignment=8>, #<FFI::Type::Builtin:INT32 size=4 alignment=4>]
ret => #<FFI::Type::Builtin:POINTER size=8 alignment=8>
options => {:convention=>:default, :type_map=>nil, :enums=>nil}
FFI::Invoker.new(function, args, find_type(ret), options)
Comments
flavorjones
Tue Jun 30 04:34:45 -0700 2009
| link
We need to repro on a 64-bit machine. Sigh.
having the same problem. would love to use nokogiri but have to use jruby and deploy apps using warbler. and i get the same error every time i tried. hope someone can help.
we would need that fix too. We have 64-bit machines, but no idea how to fix it. Maybe if you provide a test framework or a patched version we could try to reproduce and give more input on this issue.
Irg, showstopper for me! I would love to see that one fixed.
flavorjones
Tue Jul 21 20:06:59 -0700 2009
| link
Hey kids, we hear you loud and clear. I've just got to find a 64-bit machine to repro on. Will try a little harder.
flavorjones
Fri Jul 24 22:40:16 -0700 2009
| link





NodeSets are now always decorated. Added lots of test coverage to node set decoration and document, and cleaned up the implementation. Closed by 56e8c96.