Zalando extractor fails to extract sustainability labels #74
Comments
With the latest changes from #79, the extractor can't find any sustainability labels, so no products are created.
I just updated the Zalando extractor. It was a minor change, needed because a class name in the HTML changed. See: 53601e8
There are two commits from @en-GB that might be more robust and improve the extraction of the Zalando sustainability labels, see: We (@en-GB) should check whether these extract the same sustainability labels as the original approach or whether there are any implications. So far, for our Zalando tests, they achieve the same results.
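One way to run that check is to feed the same saved test pages to both extractors and diff the resulting label sets. The sketch below is only an illustration: `compare_extractors` and the `Extractor` type are hypothetical names, not part of green-db, and both extractors are assumed to take the page HTML and return an iterable of label strings.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical signature: an extractor maps page HTML to label names.
Extractor = Callable[[str], Iterable[str]]


def compare_extractors(
    pages: Dict[str, str], old: Extractor, new: Extractor
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return, per page name, (labels only in old, labels only in new).

    Pages where both extractors agree are left out of the result.
    """
    differences: Dict[str, Tuple[Set[str], Set[str]]] = {}
    for name, html in pages.items():
        old_labels = set(old(html))
        new_labels = set(new(html))
        if old_labels != new_labels:
            differences[name] = (old_labels - new_labels, new_labels - old_labels)
    return differences
```

An empty result means both approaches agree on every test page; any entry shows exactly which labels one approach finds and the other misses.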
We used to extract labels directly from the rendered HTML.
Since Splash is no longer able to render Zalando product pages, we now extract them from this JSON file
green-db/extract/extract/extractors/zalando.py
Line 152 in bf77115
but occasionally some labels are missing.
This only affects ~10 products in any given run, and I've only seen it happen on zalando.co.uk.
Switching the Zalando scraper to Playwright would probably fix it, though.
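For context, here is a minimal sketch of the JSON-based approach. It makes assumptions: the page is taken to embed its data in `<script type="application/json">` tags, and the key name `sustainability_labels` is made up for illustration. The real structure in `zalando.py` may differ. Returning an empty list rather than raising makes it easy to log the affected products in a run.

```python
import json
import re
from typing import Any, Iterator, List

# Assumption: product data is embedded in <script type="application/json"> tags.
SCRIPT_RE = re.compile(
    r'<script[^>]*type="application/json"[^>]*>(.*?)</script>', re.DOTALL
)


def iter_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from iter_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_values(item, key)


def extract_label_names(html: str, key: str = "sustainability_labels") -> List[str]:
    """Collect label names from all embedded JSON blobs (hypothetical key name)."""
    labels: List[str] = []
    for match in SCRIPT_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip blobs that are not valid JSON
        for value in iter_values(data, key):
            if isinstance(value, list):
                labels.extend(str(v) for v in value)
    return sorted(set(labels))
```

A product for which this returns `[]` is a candidate for the missing-label case described above and could be logged for a manual check.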