[FEATURE] Search articles #41

BrightDV · 2022-12-18T10:30:14Z

Is your feature request related to a problem? Please describe.
/

Describe the solution you'd like
A search functionality, at least for articles.

Describe alternatives you've considered
/

Additional context
#26

BrightDV · 2022-12-18T10:42:22Z

For now, the app searches for articles using SearXNG instances. The selected instances can return results in JSON format, which avoids scraping. However, the requests are often blocked because of rate limits.
To filter the results, the app uses search parameters: it searches for "formula1.com/en/latest/article" $query, so the URL of each result must contain the string between the double quotes. Thus, it only returns articles.
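A minimal sketch of the JSON approach, assuming an instance whose operator has enabled JSON output (SearXNG serves it through the `format=json` parameter on `/search`, which is why only some instances are usable this way). The instance URL and helper names are hypothetical. The query mirrors the quoted site path described above, and a second check keeps only result URLs that actually contain that path, assuming the usual JSON shape with a top-level `results` list whose items carry a `url` key.

```python
import json
from urllib.parse import urlencode

# Path that every article URL must contain (taken from the query described above).
ARTICLE_PATH = "formula1.com/en/latest/article"


def build_search_url(instance: str, query: str) -> str:
    """Build a SearXNG search URL that asks for JSON output.

    `instance` is a hypothetical base URL such as "https://searx.example.org".
    The quoted ARTICLE_PATH is prepended so the engines favour article pages.
    """
    params = {"q": f'"{ARTICLE_PATH}" {query}', "format": "json"}
    return f"{instance.rstrip('/')}/search?{urlencode(params)}"


def article_urls(payload: str) -> list[str]:
    """Keep only result URLs that contain ARTICLE_PATH.

    Assumes a body like {"results": [{"url": "..."}, ...]}.
    """
    data = json.loads(payload)
    return [
        r["url"]
        for r in data.get("results", [])
        if ARTICLE_PATH in r.get("url", "")
    ]
```

For example, `build_search_url("https://searx.example.org/", "Verstappen")` yields `https://searx.example.org/search?q=%22formula1.com%2Fen%2Flatest%2Farticle%22+Verstappen&format=json`.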

One workaround is to use the Formula 1 RSS feed and search within it. So far, I haven't found any way to get more than 22 articles from it. Furthermore, I don't think that downloading 1,000 articles and then searching among them is a good solution, as it would use a lot of bandwidth and be very slow.
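A sketch of the RSS workaround, assuming the feed follows standard RSS 2.0 (a `<channel>` containing `<item>` elements with `<title>`, `<link>` and `<description>`). The feed URL is not given in the thread, so the hypothetical `search_rss` helper works on an already-downloaded XML string. It also shows the limitation mentioned above: only the items present in the feed can ever match.

```python
import xml.etree.ElementTree as ET


def search_rss(feed_xml: str, query: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for RSS items whose title or description
    contains every word of `query` (case-insensitive substring match)."""
    words = query.lower().split()
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        description = item.findtext("description", default="")
        text = f"{title} {description}".lower()
        if all(word in text for word in words):
            matches.append((title, link))
    return matches
```

Because the whole feed is held in memory and scanned item by item, the cost grows with the number of articles fetched, which is why downloading hundreds of articles just to search them scales poorly.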

sinfullad · 2022-12-18T11:58:41Z

> For now, the app searches for articles using SearXNG instances. The selected instances can return results in JSON format, which avoids scraping.

Sorry for the dumb question, but what do you mean by "avoid scraping" in this context?

Also, in the worst-case scenario where all of the selected instances go down, do you plan to fall back to other search engines, or will you use other instances? I have found MetaGer (a metasearch engine similar to SearX), Mojeek (UK-based, uses its own crawler) and Swisscows (data center in Switzerland, uses Bing Search and Bing Ads, but its own indexes for Germany) to be viable options as well.

BrightDV · 2022-12-18T13:13:28Z

> Sorry for the dumb question, but what do you mean by "avoid scraping" in this context?

I mean fetching a page and then extracting the content from its HTML, which I would rather avoid. I will check whether the rate limits still apply to HTML pages; if they don't, I will add scraping as a fallback for when the first method returns no results.
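To make "scraping" concrete here: instead of asking the instance for JSON, the app downloads the normal HTML results page and pulls the links out of it. A minimal sketch using Python's standard `html.parser`, assuming only that each result is rendered as an `<a href=…>` link. The exact SearXNG markup is not described in this thread and may change between versions, so the sketch deliberately ignores class names; `ArticleLinkParser` and `scrape_article_links` are hypothetical names.

```python
from html.parser import HTMLParser

# Only links containing this path are treated as article results.
ARTICLE_PATH = "formula1.com/en/latest/article"


class ArticleLinkParser(HTMLParser):
    """Collect unique href values that point to Formula 1 articles."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        href = dict(attrs).get("href") or ""
        if ARTICLE_PATH in href and href not in self.links:
            self.links.append(href)


def scrape_article_links(html: str) -> list[str]:
    """Return article links found in an HTML results page, in page order."""
    parser = ArticleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Matching on the URL rather than on the page layout keeps the scraper working even if an instance uses a different theme, at the cost of possibly picking up unrelated links that happen to contain the article path.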

> Also, in the worst-case scenario where all of the selected instances go down, do you plan to fall back to other search engines, or will you use other instances? I have found MetaGer (a metasearch engine similar to SearX), Mojeek (UK-based, uses its own crawler) and Swisscows (data center in Switzerland, uses Bing Search and Bing Ads, but its own indexes for Germany) to be viable options as well.

Thanks for these suggestions! However, I chose SearXNG because its backend is open source, even though the engines you suggest are also designed with privacy in mind.
With scraping, up to 106 instances are available, so I am going to try this approach.

BrightDV · 2022-12-18T13:40:10Z

The good news is that requesting the page in HTML format is not rate limited, so it will work better.
I have implemented basic scraping as a fallback for when the first five requests fail, but I will improve it later.
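The fallback described above can be sketched as follows. This is a hypothetical outline, not the app's code: the fetch functions are passed in so the logic can be shown without network access, the limit of five JSON attempts comes from the comment above, and shuffling the instance list mirrors the later "shuffle instances" commit (presumably to spread requests across instances and reduce rate limiting).

```python
import random
from typing import Callable, Optional, Sequence

# A fetcher takes (instance URL, query) and returns article URLs,
# or raises an exception (e.g. when the instance rate-limits the request).
Fetcher = Callable[[str, str], list[str]]


def search_with_fallback(
    instances: Sequence[str],
    query: str,
    fetch_json: Fetcher,
    fetch_html: Fetcher,
    json_attempts: int = 5,
    rng: Optional[random.Random] = None,
) -> list[str]:
    """Try JSON on up to `json_attempts` instances, then scrape HTML."""
    order = list(instances)
    (rng or random).shuffle(order)  # spread requests over instances

    # 1) JSON requests: easy to parse, but often rate limited.
    for instance in order[:json_attempts]:
        try:
            results = fetch_json(instance, query)
        except Exception:
            continue  # blocked or down: try the next instance
        if results:
            return results

    # 2) Fallback: scrape HTML result pages, same shuffled order.
    for instance in order:
        try:
            results = fetch_html(instance, query)
        except Exception:
            continue
        if results:
            return results
    return []
```

Injecting the fetchers keeps the retry policy separate from the HTTP and parsing code, so the JSON and HTML strategies can be changed or tested independently.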

BrightDV · 2023-01-17T20:29:35Z

Added in latest release (v0.4.0).

BrightDV added the enhancement New feature or request label Dec 18, 2022

BrightDV mentioned this issue Dec 18, 2022

[FEATURE] Add side menu peeking when using gesture navigation #26

Closed

BrightDV added a commit that referenced this issue Dec 18, 2022

add search with scraping and add user-agent #41

b61c1c4

BrightDV added a commit that referenced this issue Jan 3, 2023

fix search again and shuffle instances #41

60dd77b

BrightDV closed this as completed Jan 17, 2023
