module 'lxml.etree' has no attribute '_ElementStringResult' error since v0.45.18 #2312

Closed
searchjaunt opened this issue Apr 17, 2024 · 38 comments · Fixed by #2313
Assignees
Labels
bug Something isn't working

Comments

@searchjaunt

Describe the bug
A large number of checks return module 'lxml.etree' has no attribute '_ElementStringResult'. Not all of them do; the common factor seems to be that the failing watches all have an xpath filter, though I'm not 100% sure.

Version
v0.45.18

To Reproduce

Steps to reproduce the behavior:
Run a check on any website that has an xpath filter

Expected behavior
No errors, and the difference from the last check is shown

Screenshots
image

Additional context
Seems to be reported in https://forum.cloudron.io/topic/11456/module-lxml-etree-has-no-attribute-_elementstringresult too

@dgtlmoon
Owner

dgtlmoon commented Apr 17, 2024 via email

@searchjaunt
Author

Thanks for the quick response.
Sorry for not mentioning it, but it does indeed run in a Docker container.
Running docker exec -it XXX pip3 list shows:
lxml 5.2.1
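
The same information can also be read from inside Python, which helps confirm which lxml the interpreter actually sees. A minimal sketch; the helper name `installed_version` is hypothetical and not part of changedetection.io:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the lxml version when lxml is installed, otherwise None.
    print("lxml:", installed_version("lxml"))
```

Inside a container this could be run through docker exec with the container's own python3, so the answer reflects the environment the application uses.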

@dgtlmoon
Owner

Ok I can reproduce it, it is limited to xpath1 queries only

xpath1:/html/head/title

@dgtlmoon
Owner

if type(element) == etree._ElementStringResult:

In 5.1.1 lxml removed _ElementStringResult(), this was used to get the ->text() of a result #778 #751
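
One way to avoid depending on lxml's private result classes at all is to test against the built-in `str` and `bytes` types, since lxml's `_ElementUnicodeResult` is a `str` subclass. This is a rough sketch, not the actual fix from #2313: `xpath_item_to_text` is a hypothetical helper, and the standard library's ElementTree stands in for lxml so the snippet runs without extra packages.

```python
import xml.etree.ElementTree as ET  # stand-in; with lxml, pass lxml.etree.tostring


def xpath_item_to_text(item, tostring=ET.tostring):
    """Convert one XPath result item to text without private lxml classes."""
    if isinstance(item, bytes):
        # Byte-string results are decoded rather than type-checked by name.
        return item.decode("utf-8", errors="replace")
    if isinstance(item, str):
        # Covers plain str and any str subclass returned for text() results.
        return str(item)
    # Anything else is assumed to be an element node and is serialized.
    return tostring(item, encoding="unicode")


root = ET.fromstring("<html><head><title>Hi</title></head></html>")
title = root.find("head/title")
print(xpath_item_to_text("Hi"))
print(xpath_item_to_text(title))
```

Checking against public base types keeps the code working whether or not a given lxml release still ships a particular underscore-prefixed class.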

@searchjaunt
Author

Thanks for the investigation. Do you still need any information from my side? What is the next step?

@dgtlmoon
Owner

@searchjaunt please paste me the exact selector you are using, visual-selector never generates text() type selectors afaik

dgtlmoon added a commit that referenced this issue Apr 17, 2024
…'_ElementStringResult' - reimplement _ElementStringResult (#2313 #2312)
@searchjaunt
Author

some random examples:
xpath1://article[@class='page sticky grid gt-large'][1]
image

xpath1://table[@id='wegenwerkendata'][1]
xpath1://div[1]/div[2]/div[2]
xpath1://div[3]/div[2]/div[1]/div[1]/div[1]
xpath1://div[1]/section[1]/div[1]

I've never used the visual selector

I have > 300 sites failing now.

@dgtlmoon
Owner

dgtlmoon commented Apr 17, 2024 via email

@dgtlmoon
Owner

xpath1://table[@id='wegenwerkendata'][1]
xpath1://div[1]/div[2]/div[2]
xpath1://div[3]/div[2]/div[1]/div[1]/div[1]
xpath1://div[1]/section[1]/div[1]

note, this will only trigger if those elements are present on the page; the error won't show otherwise

@searchjaunt
Author

I tried a couple of them and now I'm getting the error
can only concatenate str (not "bytes") to str
image

PS: I'm not sure I understand your latest note
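
The thread does not say what triggered this message. One common way to get it with ElementTree-style APIs, offered here only as an assumption, is that `tostring()` returns `bytes` unless `encoding="unicode"` is passed, so appending its result to a `str` fails. A small sketch using the standard library's `xml.etree.ElementTree`, whose `tostring()` default matches lxml's in this respect:

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<td>Roadworks</td>")

raw = ET.tostring(element)                       # bytes by default
text = ET.tostring(element, encoding="unicode")  # str when asked explicitly

try:
    "" + raw  # mixing str and bytes raises TypeError
except TypeError as exc:
    message = str(exc)

html_block = "" + text  # str + str concatenates normally
print(message)
print(html_block)
```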

@dgtlmoon
Owner

if i had your exact selectors when you reported the bug, then i would not have released a new version without testing your selectors :( none-the-less, thanks... i'll keep working at it

@dgtlmoon
Owner

@searchjaunt any chance you can grace me with what URL you are watching that causes that? really need the most info possible

@searchjaunt
Author

Sure, here are two of them (the ones I tried that returned the latest error)
https://stratenplan.gistel.be/gipod/wegeniswerken
xpath1://table[@id='wegenwerkendata'][1]
image

https://www.kortrijk.be/nieuws?f%5B0%5D=%3A&f%5B1%5D=categorie%3Amobiliteit
xpath1://div[3]/div[2]/div[1]/div[1]/div[1]
image

No other options or filtering, and the fetch method is Basic fast Plaintext/HTTP Client (for the record, it does occur with WebDriver Chrome/Javascript; no Playwright/Chrome is installed).

@dgtlmoon
Owner

does changing it to //table[@id='wegenwerkendata'][1] work?

@xconverge

here are 3 where I see these issues

https://www.amd.com/en/support/chipsets/amd-socket-am4/x570
xpath1:/html/body/div[1]/main/div/div/div/div/div[1]/div[1]/div/div[2]/details[1]/div/div[1]/div/span/div/div[2]
https://www.boss.info/global/support/by_product/katana-50_mk2/updates_drivers/4d633c80-f506-440e-94ce-055aaba48df3/
xpath1:/html/body/form/div[4]/div[1]/article/div[2]/div[2]
https://www.arturia.com/products/audio/minifuse/resources
xpath1:/html/body/div/div[1]/main/section[9]/div/div[4]/div[2]/div[2]/div[1]/table/tbody

@xconverge

removing xpath1: from each has them working again I think

@searchjaunt
Author

@dgtlmoon that works indeed. I hope the conclusion won't be that I need to remove xpath1 for > 300 sites.
I didn't add the prefixes myself; they started appearing from a certain version of changedetection (can't recall which version).
Why did it work before 0.45.18?

@dgtlmoon dgtlmoon reopened this Apr 18, 2024
@dgtlmoon
Owner

@searchjaunt "Why did it work before 0.45.18?" because, as you said, it's a container, and the container was built differently; that's how containers work

@dgtlmoon dgtlmoon added the bug Something isn't working label Apr 18, 2024
@dgtlmoon
Owner

@dgtlmoon that works indeed. I hope the conclusion won't be that I need to remove xpath1 for > 300 sites.
I didn't add the prefixes myself; they started appearing from a certain version of changedetection (can't recall which version).
Why did it work before 0.45.18?

if you had given me better examples to test with from the very start, this wouldn't have happened; it was only because i was missing exact information. usually i never start working on a bug until i have the exact data someone is using, but this time i did and it bit me

@dgtlmoon
Owner

I re-tested all situations mentioned above (all URLs and filters) and in the newest 0.45.20 they all pass

please try that version (0.45.20)

Constantin1489 added a commit to Constantin1489/changedetection.io that referenced this issue Apr 18, 2024
@searchjaunt
Author

Just installed 0.45.20 and I still got an
'str' object has no attribute 'name'
for
https://www.depinte.be/werken
//div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1]

I explicitly removed xpath1:

other settings
image

image

nothing else

Some other things:
got some more false positives like
image
image
Apart from the spacing (I don't know where it comes from, since the site wasn't changed) there is no difference.

Despite being up to date, I get the message that there is a new version available
image

@Constantin1489
Contributor

Constantin1489 commented Apr 18, 2024

Hi, I made a mistake when I implemented XPath 3.1. When I made the "xpath:" prefix route to the elementpath lib (XPath 3.1), I forgot to duplicate the original xpath1 tests for the new "xpath1:" prefix.
I'm currently investigating this xpath1 problem. I'm sorry.

EDIT: remove '//' in prefix
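
To make the prefix behaviour discussed in this thread concrete, here is a rough sketch of how a filter string might be routed to an engine. It is an illustration under assumptions, not changedetection.io's actual code: the function name, the engine labels, and the handling of unprefixed rules are all hypothetical.

```python
def split_filter_prefix(rule):
    """Return a hypothetical (engine, expression) pair for a filter rule."""
    prefixes = (
        ("xpath1:", "lxml-xpath1"),          # legacy XPath 1.0 via lxml
        ("xpath:", "elementpath-xpath3.1"),  # XPath 3.1 via elementpath
    )
    for prefix, engine in prefixes:
        if rule.startswith(prefix):
            return engine, rule[len(prefix):]
    if rule.startswith("/"):
        # Assumption: a bare path is treated as the default xpath engine.
        return "elementpath-xpath3.1", rule
    # Assumption: anything else is handled by some other filter type.
    return "other", rule


print(split_filter_prefix("xpath1://table[@id='wegenwerkendata'][1]"))
print(split_filter_prefix("//table[@id='wegenwerkendata'][1]"))
```

Under this model, dropping the "xpath1:" prefix sends the same expression to a different engine, which would be consistent with the reports above that removing the prefix made the watches work again.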

@Constantin1489
Contributor

Constantin1489 commented Apr 18, 2024

@searchjaunt I can't reproduce the 'str' object has no attribute 'name' with v0.45.20
image

Adding another test result.
image

@searchjaunt
Author

Still getting it though:
image

@dgtlmoon
Owner

dgtlmoon commented Apr 18, 2024 via email

@searchjaunt
Author

@dgtlmoon see #2312 (comment)
Just tried deleting and creating it again, but with the same result

@Constantin1489
Contributor

Constantin1489 commented Apr 18, 2024

@searchjaunt Could you run this command?
docker run -it -e LOGGER_LEVEL=CRITICAL --rm YOURCONTAINER_IMAGE bash -c 'pip3 list'

You can find YOURCONTAINER_IMAGE for your running container with sudo docker ps (in the example screenshot below, the image is mikebrady/shairport-sync:latest), like this:
image

@Constantin1489
Contributor

@searchjaunt Hi, I tried to reproduce the same thing with versions 18, 19 and 20. I couldn't reproduce 'str' object has no attribute 'name'

Screenshot 2024-04-18 at 22 32 44
Screenshot 2024-04-18 at 22 34 26
Screenshot 2024-04-18 at 22 37 17

@searchjaunt
Author

@Constantin1489 did you try the URL
https://www.depinte.be/werken
with the xpath
//div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1]
and the other settings as mentioned in
#2312 (comment)

@Constantin1489
Contributor

Yes!
Screenshot 2024-04-18 at 23 33 15
Screenshot 2024-04-18 at 23 36 23
Screenshot 2024-04-18 at 23 38 52

@Constantin1489
Contributor

Constantin1489 commented Apr 18, 2024

@xconverge Also for the default xpath (XPath 3.1). That is why I didn't remove xpath1 and instead preserved the previous xpath syntax under the 'xpath1:' prefix.

XPath 3.1 support is important because when a user takes an expression written in XPath 2 to 3.1 syntax from Stack Overflow, it will fail in most cases, since lxml only implements XPath 1. Also, Python's native xml module doesn't support all of the XPath 1 syntax, and it differs a little from the W3C XPath 1 spec (especially in namespace notation).

I will soon publish a report repo on this subject (within two weeks? I'm cleaning up the code now). Spoiler alert: the number of tests is huge, and it shows why XPath 3.1 works without problems in Python when the configuration is correct.

EDIT: So, basically, each XML or XPath parser in Python has pros and cons. But the experience provided by the elementpath lib is great because you can use XPath as written in the spec without problems.
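
As a small illustration of the point about Python's native XML module, the standard library's `xml.etree.ElementTree` rejects an absolute path on an element, even though absolute location paths are valid XPath 1.0. A sketch only; the sample document is made up:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<html><head><title>Hi</title></head></html>")

# Relative paths work within ElementTree's limited XPath subset.
relative_hits = [el.text for el in root.findall("./head/title")]

# An absolute path on an element is refused with SyntaxError.
try:
    root.findall("/html/head/title")
    absolute_ok = True
except SyntaxError:
    absolute_ok = False

print(relative_hits, absolute_ok)
```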

@searchjaunt
Author

@Constantin1489 strange. So what can I do to debug it or make it work? I also find it odd that the header says a new version is available while 20 is installed (see earlier screenshot).

@Constantin1489
Contributor

Constantin1489 commented Apr 18, 2024

@searchjaunt could you provide the command, script, Dockerfile or docker-compose.yml you use to run changedetectionio? Before posting it here, please check that the command you provide actually works.

Also, does the problem happen in all the watches?

@navels

navels commented Apr 18, 2024

FYI I am also on 20 and am getting the "new version is available" banner. Installation is via this proxmox script: https://github.com/tteck/Proxmox/blob/main/ct/changedetection.sh

@Constantin1489
Contributor

Constantin1489 commented Apr 18, 2024

Ah, sorry. I thought you were saying the syntax is not working. As for the new version banner, that will disappear. @navels does your xpath1 syntax work?

@dgtlmoon
Owner

dgtlmoon commented Apr 18, 2024

I'm able to reproduce it on .20 with this shared watch: https://changedetection.io/share/QtZ-94DW41sa. The error is actually a different one now: 'str' object has no attribute '__name__'

When I use an earlier lxml version the error still exists, so @searchjaunt this issue is unrelated; I will open a new one

@dgtlmoon
Owner

Ok, this unrelated issue is now over at #2318 thanks @Constantin1489

@dgtlmoon
Owner

tldr - fixed :)

@dgtlmoon dgtlmoon removed the triage label May 2, 2024
5 participants