Add DuckDuckGo search engine, add extra_deps #79

Closed
wants to merge 1 commit into from

theresnotime

⚠️ This PR adds a dependency on https://github.com/deedy5/duckduckgo_search

The DuckDuckGo search engine interface returns a list of URLs (max. 200), so it should be compatible with the copyvios tool. It would be very interesting to compare this to the output of the Google search API.
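As a rough illustration of the "list of URLs, max. 200" contract described above, here is a minimal sketch. It does not call the `duckduckgo_search` library or any copyvios code: `collect_urls`, `fetch`, and `MAX_RESULTS` are hypothetical names, and `fetch` stands in for whatever backend actually produces the results.

```python
from typing import Callable, Iterable, List

# Assumption: the cap of 200 is taken from the PR description above.
MAX_RESULTS = 200


def collect_urls(
    fetch: Callable[[str], Iterable[str]],
    query: str,
    limit: int = MAX_RESULTS,
) -> List[str]:
    """Return up to `limit` unique http(s) URLs for `query`, in result order.

    `fetch` is a hypothetical stand-in for a search backend; it takes a
    query string and yields candidate URLs.
    """
    seen = set()
    urls: List[str] = []
    for url in fetch(query):
        # Skip anything that is not a plain web URL.
        if not url.startswith(("http://", "https://")):
            continue
        # Skip duplicates while preserving the original ordering.
        if url in seen:
            continue
        seen.add(url)
        urls.append(url)
        if len(urls) >= limit:
            break
    return urls
```

Keeping the backend behind a plain callable like this would also make it straightforward to run the same query through two engines and compare the resulting lists.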

Future work

  • If used, web queries should be proxied (see docs). This PR does not add any proxying of requests.

Notes

@A09090091 A09090091 left a comment

Looks good

@earwig
Owner

earwig commented Apr 10, 2023

If I'm understanding this right, that library is scraping DDG's site using an internal endpoint for their JS frontend instead of any sort of public API (see code). DDG used to have a documented API but it no longer functions, as far as I can tell.

There are a few concerns with this:

  • It's probably against their TOS. From that now-inaccessible API docs page: "That is, it is not a full search results API or a way to get DuckDuckGo results into your applications beyond our instant answers. Because of the way we generate our search results, we unfortunately do not have the rights to fully syndicate our results, free or paid. For the same reason, we cannot allow framing our results without our branding."
    • If we want to establish a relationship with them, as I think you're suggesting with those links, starting out by violating their terms isn't the best move.
  • The internal endpoint could break at any time without warning; i.e., there is no guarantee of future compatibility.
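To make the compatibility concern concrete: any caller relying on an undocumented endpoint would need to treat failures as routine. The sketch below is one hedged way to do that, trying backends in order and falling through on errors. The names `search_with_fallback` and `Backend` are hypothetical and are not part of this PR or of any library mentioned here.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical type: a (name, fetch) pair, where fetch maps a query to URLs.
Backend = Tuple[str, Callable[[str], Iterable[str]]]


def search_with_fallback(
    backends: Sequence[Backend], query: str
) -> Tuple[str, List[str]]:
    """Try each backend in order; return (backend name, URLs) from the first
    one that yields results. Raise RuntimeError if every backend fails."""
    errors = []
    for name, fetch in backends:
        try:
            urls = list(fetch(query))
        except Exception as exc:
            # Broad on purpose: a scraped endpoint can fail in arbitrary ways
            # (changed response format, HTTP errors, rate limiting).
            errors.append(f"{name}: {exc}")
            continue
        if urls:
            return name, urls
        errors.append(f"{name}: no results")
    raise RuntimeError("all search backends failed: " + "; ".join(errors))
```

This only softens the breakage risk; it does nothing about the terms-of-service concern raised in the first bullet.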

I always understood DDG to be a wrapper around Bing that provides anonymity and additional higher-level search features that aren't applicable to the copyvio detector, so we really should be looking into Bing instead if we want a Google alternative. Their free tier limits mean we'd most likely need to work something out with the WMF. See my comment here.

@earwig earwig deleted the branch earwig:develop April 7, 2024 23:56
@earwig earwig closed this Apr 7, 2024
@theresnotime theresnotime deleted the add-duckduckgo branch April 8, 2024 10:10
3 participants