Error parsing a URL #5
That's super weird: Emacs retrieves the page over the network correctly, but that element is stripped when elquery calls:

    (cl-search "\"content" ; changing to "\"menu" reveals the <div id="menu">
               (prin1-to-string
                (with-temp-buffer
                  ;; cl-search and insert replace the obsolete search and insert-string.
                  (insert (with-current-buffer
                              (url-retrieve-synchronously "https://nginx.org/en/docs/dirindex.html")
                            (buffer-string)))
                  (let ((tree (libxml-parse-html-region (point-min) (point-max))))
                    tree))))

Could this be an issue with …

EDIT: Yeah, removing the two billion nodes within …
I tested this too and can confirm the element is stripped. Retrieving the same page with Python gives the correct result.
No problem; thanks for pointing it out!
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31427#8
It sounds like simply setting the temp buffer to unibyte would fix this. Commit 59f93b8 appears to work for both multibyte and unibyte strings.
Yes, confirmed.
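Here is a minimal sketch of the unibyte workaround. It assumes the input is the raw byte string (a unibyte string) that the url-retrieve-synchronously buffer holds. The name my/parse-html-bytes is hypothetical and is not elquery's actual function. The point is only to show where set-buffer-multibyte goes.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my/parse-html-bytes (bytes)
  "Parse BYTES, a unibyte string of raw HTML, with libxml.
Hypothetical helper for illustration; not elquery's actual API."
  (with-temp-buffer
    ;; A unibyte buffer stores every byte exactly as received.  In the
    ;; default multibyte buffer, bytes >= 128 from a unibyte string would
    ;; become raw-byte characters with a different internal encoding,
    ;; so libxml would no longer see the page's original bytes.
    (set-buffer-multibyte nil)
    (insert bytes)
    (libxml-parse-html-region (point-min) (point-max))))

;; Usage, with UTF-8 bytes standing in for a downloaded page:
;; (my/parse-html-bytes (encode-coding-string "<html><body>α</body></html>" 'utf-8))
```

For a fetched page, you would pass the (buffer-string) of the url-retrieve-synchronously buffer, as in the reproduction above.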
This could fail for web pages encoded in something other than UTF-8, although UTF-8 is probably the most common encoding. You might also have trouble passing multibyte strings now (e.g., if the string does not come from a web page):

    (let ((string "<html><body>α</body></html>"))
      (with-temp-buffer
        (set-buffer-multibyte nil)
        (insert string)
        (libxml-parse-html-region (point-min) (point-max))))
    ;=> (html nil (body nil "\261"))

    (let ((string "<html><body>α</body></html>"))
      (with-temp-buffer
        (insert string)
        (libxml-parse-html-region (point-min) (point-max))))
    ;=> (html nil (body nil "α"))
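One way to handle the multibyte case while keeping the unibyte fix is to encode multibyte input as UTF-8 before inserting it. This sketch assumes that multibyte strings from callers are ordinary decoded text. The names my/html-bytes and my/parse-html-string are hypothetical. The sketch does nothing for pages served in other encodings.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my/html-bytes (string)
  "Return STRING as a unibyte string of bytes for libxml.
Multibyte text is encoded as UTF-8.  Unibyte input is assumed to
hold raw bytes already and is returned unchanged.
Hypothetical helper for illustration."
  (if (multibyte-string-p string)
      (encode-coding-string string 'utf-8)
    string))

(defun my/parse-html-string (string)
  "Parse STRING, multibyte or unibyte, with libxml.
Hypothetical helper for illustration; not elquery's actual API."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    ;; Inserting multibyte text straight into a unibyte buffer reduces
    ;; each non-ASCII character to one byte (α became "\261" above),
    ;; so convert it to UTF-8 bytes first.
    (insert (my/html-bytes string))
    (libxml-parse-html-region (point-min) (point-max))))

;; (my/parse-html-string "<html><body>α</body></html>")
```

Unibyte input, such as a downloaded page body, passes through unchanged, so the web-page case behaves as before.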
You could try detecting the buffer string with …
BTW @AdamNiederer, in function …
Oh, it looks like it's deprecated in 25.x. Thanks for the heads-up.
I tried converting the unibyte string with …
Right, that's why I suggested signaling an error or warning. You could possibly decode it with …
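Here is a rough sketch of the error-or-warning route. It decodes the bytes as UTF-8 and looks for characters Emacs could not decode, which Emacs keeps as raw bytes in the eight-bit charset. The names my/utf-8-bytes-p and my/check-html-bytes are hypothetical, and the warning text is only illustrative. A real fix might instead decode with the charset the page declares.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'seq)

(defun my/utf-8-bytes-p (bytes)
  "Return non-nil if the unibyte string BYTES is valid UTF-8.
Hypothetical helper for illustration."
  ;; When decoding, Emacs keeps bytes that are not valid UTF-8 as
  ;; raw-byte characters, whose charset is `eight-bit'.
  (not (seq-some (lambda (c) (eq (char-charset c) 'eight-bit))
                 (decode-coding-string bytes 'utf-8))))

(defun my/check-html-bytes (bytes)
  "Warn if BYTES is not valid UTF-8, then return BYTES unchanged.
Hypothetical helper for illustration."
  (unless (my/utf-8-bytes-p bytes)
    (display-warning 'my-html
                     "HTML is not valid UTF-8; decode it with the page's charset first"
                     :warning))
  bytes)
```

For example, a Latin-1 page yields bytes like "caf\351", which fail the check because \351 is not followed by UTF-8 continuation bytes.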
It got nil, but <div id="content"> is not empty.