Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.8.1 breaks simple XML parsing #13444

Closed
GaetanLepage opened this issue May 7, 2023 · 6 comments
Closed

1.8.1 breaks simple XML parsing #13444

GaetanLepage opened this issue May 7, 2023 · 6 comments
Labels
kind:bug kind:regression Something that used to correctly work but no longer works status:invalid topic:stdlib:serialization

Comments

@GaetanLepage
Copy link

Bug Report

Hello,
The following code works fine witch crystal 1.8.0 but breaks with 1.8.1:

str = <<-XML
<foo>ThisIsFoo</foo>
<bug:bar>ThisIsBar</bug:bar>
XML

require "xml"

rss = XML.parse_html(str)

foo = rss.xpath_node("//foo").not_nil!
pp foo.content

bar = rss.xpath_node("//bar").not_nil!
pp bar.content

Crystal 1.8.0:

"ThisIsFoo"
"ThisIsBar"

Crystal 1.8.1:

"ThisIsFoo"
Unhandled exception: Nil assertion failed (NilAssertionError)
  from /usr/lib/crystal/nil.cr:113:7 in 'not_nil!'
  from /usr/lib/crystal/nil.cr:109:3 in 'not_nil!'
  from eval:13:7 in '__crystal_main'
  from /usr/lib/crystal/crystal/main.cr:115:5 in 'main_user_code'
  from /usr/lib/crystal/crystal/main.cr:101:7 in 'main'
  from /usr/lib/crystal/crystal/main.cr:127:3 in 'main'
  from /usr/lib/libc.so.6 in '??'
  from /usr/lib/libc.so.6 in '__libc_start_main'
  from /home/crystal/.cache/crystal/crystal-run-eval.tmp in '_start'
  from ???
@Blacksmoke16 Blacksmoke16 added topic:stdlib:serialization kind:regression Something that used to correctly work but no longer works labels May 7, 2023
@Blacksmoke16
Copy link
Member

Blacksmoke16 commented May 7, 2023

What version of libxml2 do you have installed? I'm thinking this is more so due to changes in the underlying linked/installed lib versus Crystal itself. Given it reproduces for me on 1.7.3 as well.

@SamantazFox
Copy link

I tried with different containers, it breaks with containers tagged 1.8.x-alpine but not with 1.8.x

As discussed over Matrix, might be caused by the underlying libxml version:

$ echo 'FROM docker.io/crystallang/crystal:1.8.0-alpine \nCMD ["apk", "list", "-i"]' | docker build -f - -t crystal-test && docker run crystal-test | grep xml
[snip]
libxml2-2.10.4-r0 x86_64 {libxml2} (MIT) [installed]
libxml2-static-2.10.4-r0 x86_64 {libxml2} (MIT) [installed]
libxml2-utils-2.10.4-r0 x86_64 {libxml2} (MIT) [installed]
libxml2-dev-2.10.4-r0 x86_64 {libxml2} (MIT) [installed]
$ echo 'FROM docker.io/crystallang/crystal:1.8.0 \nCMD ["apt", "list", "--installed"]' | docker build -f - -t crystal-test && docker run crystal-test | grep xml
[snip]
libxml2-dev/now 2.9.13+dfsg-1ubuntu0.2 amd64 [installed,local]
libxml2/now 2.9.13+dfsg-1ubuntu0.2 amd64 [installed,local]

@BlobCodes
Copy link
Contributor

https://gitlab.gnome.org/GNOME/libxml2/-/compare/v2.10.4...v2.9.14?from_project_id=1665&straight=false

That link is wrong.

The 2.11 branch has been created during the release of v2.9.13.
Also, you set v2.9.14 as source and v2.10.4 as target, but it should be the other way around.

TL;DR: Your link only shows the commits which were added to v2.9.14 since v2.9.13.

This link shows the actual changes between the two versions:
https://gitlab.gnome.org/GNOME/libxml2/-/compare/v2.9.13...v2.10.4?from_project_id=1665&straight=false

@BlobCodes
Copy link
Contributor

I tested this with v2.10.3 (because that's what I had installed before I updated), and it works.
Also breaks for me with v2.10.4

This seems to be a documented (and purposeful) regression:
https://gitlab.gnome.org/GNOME/libxml2/-/blob/3463063001f36c16e5f6ce9ad33cd12a376fc874/NEWS#L360

v2.10.4: Apr 11 2023

[...]

### Regressions

- SAX2: Ignore namespaces in HTML documents

[...]

When using parse_html, namespaces are now completely ignored.
You can especially see this when printing the XML::Node to console:

<!-- Output from v2.10.3 -->
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <body><foo>ThisIsFoo</foo>
<bar>ThisIsBar</bar></body>
</html>

<!-- Output from v2.10.4 -->
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <body><foo>ThisIsFoo</foo>
<bug:bar>ThisIsBar</bug:bar></body>
</html>

I'd say this issue can be closed, not really related to crystal.

@Blacksmoke16 Blacksmoke16 closed this as not planned Won't fix, can't repro, duplicate, stale May 8, 2023
@Blacksmoke16
Copy link
Member

On second thought, we could possibly consider re-building/distributing new images/binaries with a patched version of libXML if we think that would be worthwhile. Granted we'll probably be limited on the available versions within the base images, and/or depending on how Crystal was installed might be up to the user to upgrade the version on their own machine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug kind:regression Something that used to correctly work but no longer works status:invalid topic:stdlib:serialization
Projects
None yet
Development

No branches or pull requests

4 participants