Zalando extractor fails to extract sustainability labels #74

Closed
en-GB opened this issue Jun 13, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@en-GB
Contributor

en-GB commented Jun 13, 2022

We used to extract labels directly from the rendered HTML.
Since Splash is no longer able to render Zalando product pages, we now extract them from the JSON embedded in this script tag:

beautiful_soup.find("script", {"type": "application/json", "class": "re-1-13"}).get_text()

but occasionally some labels will be missing.
This only affects ~10 products in any given run, and I've only seen it happen on zalando.co.uk.

Switching the Zalando scraper to Playwright would probably fix it, though.
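
For reference, the JSON-based extraction mentioned above could be sketched roughly like this. It is only an illustration, not the project's actual extractor: it swaps BeautifulSoup for the standard-library `html.parser` so it runs without extra dependencies, the names `JsonScriptParser` and `load_embedded_json` are made up for this example, and the structure of the JSON payload is not assumed. The point is that it returns `None` when the tag is missing or unparsable, so a caller can log the affected product instead of silently dropping labels.

```python
import json
from html.parser import HTMLParser


class JsonScriptParser(HTMLParser):
    """Collect the text of <script type="application/json"> tags with a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.payloads = []       # raw text of every matching script tag
        self._capturing = False  # True while inside a matching script tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if (tag == "script"
                and attrs.get("type") == "application/json"
                and self.css_class in classes):
            self._capturing = True
            self.payloads.append("")

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._capturing:
            self.payloads[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._capturing = False


def load_embedded_json(html, css_class="re-1-13"):
    """Return the first parsable JSON payload, or None if none is found."""
    parser = JsonScriptParser(css_class)
    parser.feed(html)
    parser.close()
    for raw in parser.payloads:
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed payloads and try the next match
    return None
```

A caller would check for `None` before looking up labels, which makes the "missing labels" cases visible in the logs of a run.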

@en-GB en-GB changed the title Zalando scraper sometimes misses sustainability labels on certain products Zalando scraper sometimes misses sustainability labels Jun 13, 2022
@en-GB en-GB changed the title Zalando scraper sometimes misses sustainability labels Zalando extractor sometimes misses sustainability labels Jun 13, 2022
@se-jaeger se-jaeger added bug Something isn't working high priority High priority and removed low priority labels Jul 7, 2022
@se-jaeger se-jaeger changed the title Zalando extractor sometimes misses sustainability labels Zalando extractor fails to extract sustainability labels Jul 7, 2022
@se-jaeger
Contributor

With the latest changes from #79, the extractor can't find any sustainability labels, so no products are created.

@BigDatalex
Collaborator

I just updated the Zalando extractor. It was only a minor change, needed because a class name in the HTML changed. See: 53601e8

@BigDatalex
Collaborator

There are two commits from @en-GB that might be more robust and improve the extraction of the Zalando sustainability labels, see:

We (@en-GB) should check whether these behave the same as the original approach (i.e. extract the same sustainability labels) or whether there are any side effects. So far, they achieve the same results in our Zalando tests.
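
One way to run that check is a small comparison harness, sketched here under assumptions: `compare_extractors`, `old_extract` and `new_extract` are hypothetical names, and each extractor is assumed to take a page's HTML and return an iterable of label strings. It is not code from the repository.

```python
def compare_extractors(pages, old_extract, new_extract):
    """Return {page_id: (old_labels, new_labels)} for pages where results differ.

    `pages` maps a page identifier to its HTML. Both extractors are assumed
    (hypothetically) to return an iterable of label strings.
    """
    mismatches = {}
    for page_id, html in pages.items():
        old_labels = set(old_extract(html))
        new_labels = set(new_extract(html))
        if old_labels != new_labels:
            mismatches[page_id] = (old_labels, new_labels)
    return mismatches
```

An empty result over a saved set of product pages would suggest the two approaches agree on that sample; any entries show exactly which labels differ per page.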

@se-jaeger se-jaeger added enhancement New feature or request and removed bug Something isn't working high priority High priority labels Jul 7, 2022
@se-jaeger
Contributor

@en-GB what's the status of this one? Is this still an issue, or can we just close it? Especially after the latest changes in #83.

@en-GB en-GB closed this as completed Sep 5, 2022