Fix import with recipe-scrapers #1788

gloriousDan · 2022-05-09T22:29:48Z

This PR fixes several import problems which stem from using some internal recipe-scrapers classes.

Overview of changes:

If a URL is provided, scrape_me is called in wild mode:
- If a scraper for the URL exists, recipe-scrapers calls it.
- If no scraper exists, wild_mode takes effect and searches the page for schema.org fields.
If this is unsuccessful, or no URL was provided (which happens when data is entered in the source field), the old approach is used: the input is wrapped in <script type='application/ld+json'> ... </script> and passed to recipe-scrapers. (This is now the only way text_scraper in cookbook/helper/scrapers/scrapers.py gets called.)
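
The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `import_recipe`, `wrap_as_ld_json`, and the two injected callables are hypothetical names, and a "failed" URL scrape is assumed to surface as an exception.

```python
from typing import Callable, Optional


def wrap_as_ld_json(text: str) -> str:
    """Wrap raw input in a JSON-LD script tag so an HTML scraper can read it."""
    return "<script type='application/ld+json'>" + text + "</script>"


def import_recipe(
    url: Optional[str],
    data: str,
    scrape_url: Callable[[str], object],
    scrape_text: Callable[[str], object],
):
    """Try the URL scraper first; fall back to the wrapped-text scraper.

    scrape_url is assumed to behave like scrape_me(url, wild_mode=True);
    scrape_text is a hypothetical stand-in for text_scraper.
    """
    if url:
        try:
            return scrape_url(url)
        except Exception:
            # Assumption: any failure of the URL scrape triggers the fallback.
            pass
    return scrape_text(wrap_as_ld_json(data))
```

Passing the scrapers in as callables keeps the sketch independent of any particular recipe-scrapers version; in practice `scrape_url` could be something like `lambda u: scrape_me(u, wild_mode=True)`.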
Additionally, we always try to get data from the property classes first. If that fails we try to get it from the schema attributes.
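
As a rough sketch of that "property first, schema second" lookup: the helper below is hypothetical (`get_field` is not a name from this PR), and it assumes the scraper exposes accessor methods such as `title()` plus the raw schema.org data as a dict on `scraper.schema.data`.

```python
from typing import Any, Optional


def get_field(scraper: Any, method_name: str, schema_key: str) -> Optional[Any]:
    """Return scraper.<method_name>() if it yields a value, else the schema entry."""
    try:
        value = getattr(scraper, method_name)()
    except Exception:
        # Missing or failing accessor: fall through to the schema data.
        value = None
    if value:
        return value
    # Assumption: raw schema.org data is available as a dict on scraper.schema.data.
    schema = getattr(scraper, "schema", None)
    data = getattr(schema, "data", None) or {}
    return data.get(schema_key)
```

For example, `get_field(scrape, "title", "name")` would return the scraper's title when the accessor works and otherwise the schema's `name` entry, or `None` if neither is present.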

This doesn't change a lot of the old behaviour but uses the scraper classes of recipe-scrapers better. This way it shouldn't break anything for any sites.

Fixes:

fixes Fallback to wildmode if parser is broken #1764
https://www.reddit.com/r/selfhosted/comments/ui6cny/tandoor_recipe_v14_released_shopping_importing/i7kkz9a/?context=3 doesn't throw an exception anymore. The issue is within upstream though, so it should be raised there.

I'm not sure what to do about the CooksIllustrated custom scrapers and if they still work.
@smilerz Why was this implemented within Tandoor and not within recipe-scrapers, and is it still working?

smilerz · 2022-05-09T23:07:42Z

@gloriousDan the site is behind a paywall so can’t be implemented/tested as part of the main library.

A test is implemented for that site, so as long as the PR passes all of the url import tests it should be fine.

gloriousDan · 2022-05-09T23:17:26Z

@smilerz Ah okay, I already suspected that something like this is the reason. I didn't run the tests locally (yet), are they run by CI?

gloriousDan · 2022-05-09T23:18:45Z

The broken scraper for https://www.reddit.com/r/selfhosted/comments/ui6cny/tandoor_recipe_v14_released_shopping_importing/i7kkz9a/?context=3 has now also a fix in a PR upstream (hhursev/recipe-scrapers#538)

smilerz · 2022-05-09T23:36:24Z

> @smilerz Ah okay, I already suspected that something like this is the reason. I didn't run the tests locally (yet), are they run by CI?

Only on the primary branches.

gloriousDan · 2022-05-10T08:22:51Z

I just ran the test suite locally.
The only issues seem to be because of the same error:

=========================== short test summary info ===========================
FAILED cookbook/tests/api/test_api_recipe.py::test_share_permission - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/api/test_api_userpreference.py::test_add - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/edits/test_edits_recipe.py::test_switch_recipe - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/edits/test_edits_recipe.py::test_external_recipe_update - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/edits/test_edits_storage.py::test_edit_storage - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/edits/test_edits_storage.py::test_view_permission[arg3] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/edits/test_edits_storage.py::test_view_permission[arg6] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/other/test_export.py::test_export_file_cache[arg2] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/other/test_export.py::test_export_file_cache[arg3] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/other/test_export.py::test_export_file_cache[arg4] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/other/test_export.py::test_export_file_cache[arg5] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_api.py::test_external_link_permission[arg4] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_api.py::test_external_link_permission[arg5] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_api.py::test_external_link_permission[arg6] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_books[arg2] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_books[arg3] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_plan[arg2] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_plan[arg3] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_shopping[arg2] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_shopping[arg3] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_settings[arg1] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_settings[arg2] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_settings[arg3] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_history[arg1] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_history[arg2] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_history[arg3] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_markdown_doc[arg0] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_markdown_doc[arg1] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_markdown_doc[arg2] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_markdown_doc[arg3] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_api_info[arg1] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_api_info[arg2] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_general.py::test_api_info[arg3] - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
FAILED cookbook/tests/views/test_views_recipe_share.py::test_share - ValueError: Missing staticfiles manifest entry for 'assets/favicon.svg'
===== 34 failed, 559 passed, 24 skipped, 1327 warnings in 237.80s (0:03:57) =====

Do I have to run yarn build before running the tests?

gloriousDan · 2022-05-10T10:39:40Z

> The broken scraper for https://www.reddit.com/r/selfhosted/comments/ui6cny/tandoor_recipe_v14_released_shopping_importing/i7kkz9a/?context=3 has now also a fix in a PR upstream (hhursev/recipe-scrapers#538)

The fix was merged and published in https://github.com/hhursev/recipe-scrapers/releases/tag/13.32.1 just now.

smilerz · 2022-05-10T11:59:19Z

> Do I have to run yarn build before running the tests?

Yes, and python manage.py collectstatic

gloriousDan · 2022-05-10T12:17:58Z

> Yes, and python manage.py collectstatic

After running yarn build and python manage.py collectstatic all tests pass.

vabene1111 · 2022-05-11T14:12:31Z

@smilerz I had to disable the tests for your importer as they were failing for me, but I suspect it's an easy fix; I just did not have the time to understand what's going wrong.

@gloriousDan Thank you very much for this, I was just about to sit down and implement basically exactly what you did here. I will review and get this merged ASAP.

gloriousDan · 2022-05-11T14:16:36Z

That's great to hear as this problem is holding me back from upgrading to 1.2.x

smilerz · 2022-05-11T18:32:18Z

> @smilerz I had to disable the tests for your importer as they were failing for me, but I suspect it's an easy fix; I just did not have the time to understand what's going wrong.

I just tried them and they pass for me. (SpruceEats is still broken)

vabene1111 · 2022-05-11T18:41:35Z

OK, awesome, thanks for the info.

Call scrape_me first when scraping from url

2a7475c

gloriousDan changed the title ~~Call scrape_me first when scraping from url~~ Fix import with recipe-scrapers May 9, 2022

vabene1111 merged commit cb59a63 into TandoorRecipes:develop May 11, 2022

gloriousDan deleted the fix-import branch May 12, 2022 12:54
