Documentation: Document how search works #956

Closed
bbkane opened this issue Mar 27, 2022 · 7 comments

Comments

@bbkane

bbkane commented Mar 27, 2022

Wiki Page URL

https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#explanation-of-buttons-in-the-web-ui---admin-snapshots-list

Suggested Edit

Document how search works in ArchiveBox. It looks like #721 and #543 add full-text search using Sonic, but I can't find any advanced usage details. Can it do boolean queries, for example ("rabbits" AND NOT "raccoons")? Could someone more familiar with the project add a few sentences describing the capabilities of search, and maybe some examples of how to use it?

@bbkane
Author

bbkane commented Mar 27, 2022

Does it search the body as well as the title? Can I order by fields in the query?

@rcarmo

rcarmo commented Sep 13, 2022

It would be great to have this documented. Right now it feels like I can't do full-text search on page contents, mostly because search is inscrutable.

@pirate
Member

pirate commented Sep 28, 2022

I don't believe it can do Boolean queries, but if you have Sonic you can do full-text search of the article bodies with fuzzy matching.

@rcarmo

rcarmo commented Sep 28, 2022

I’m using the container image (which I assume includes it). Yet the lack of documentation still applies.

@pirate
Member

pirate commented Nov 18, 2022

I've added a line to the Usage docs and a screenshot explaining search slightly more here: https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#explanation-of-buttons-in-the-web-ui---admin-snapshots-list

I still have to add instructions for setting up and configuring Sonic/ripgrep.

  • add short description of search functionality to Wiki usage page + screenshot of search in use
  • add wiki page explaining ripgrep vs sonic, their tradeoffs, and how to set them up
  • add entries to the wiki configuration page for SEARCH_BACKEND_ENGINE, SEARCH_BACKEND_HOST, and SEARCH_BACKEND_PASSWORD
  • add README quick summary explaining that Sonic is available for full-text search, but ripgrep is used by default + link to wiki pages for more info
  • add docs on how to use ripgrep-all instead of ripgrep https://github.com/phiresky/ripgrep-all
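
To make the ripgrep-vs-Sonic distinction a little more concrete: the default ripgrep backend searches the files saved on disk for each snapshot, not just titles. The Python sketch below only approximates that idea with a plain case-insensitive substring scan. It is not ArchiveBox's implementation (the real backend runs the rg binary), the grep_snapshots name is hypothetical, and it assumes the usual layout where each snapshot lives in its own folder under data/archive/ (for example data/archive/1648400000.0/).

```python
import sys
from pathlib import Path
from typing import Set


def grep_snapshots(archive_dir: str, text: str) -> Set[str]:
    """Return the snapshot folder names under archive_dir whose files contain text.

    Case-insensitive plain substring match: a rough stand-in for a
    grep-style search backend, not ArchiveBox's actual ripgrep integration.
    """
    root = Path(archive_dir)
    needle = text.lower()
    hits: Set[str] = set()
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Only consider files inside a snapshot folder (archive/<timestamp>/...).
        if not path.is_file() or len(rel.parts) < 2:
            continue
        try:
            body = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the whole search
        if needle in body.lower():
            hits.add(rel.parts[0])  # the first path component is the snapshot folder
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python grep_snapshots.py data/archive rabbits
    for name in sorted(grep_snapshots(sys.argv[1], sys.argv[2])):
        print(name)
```

Since pirate notes above that boolean queries probably aren't supported, one client-side workaround for "rabbits" AND NOT "raccoons" is set arithmetic on two plain searches, e.g. grep_snapshots("data/archive", "rabbits") - grep_snapshots("data/archive", "raccoons"). This happens outside ArchiveBox and is not a feature of its search.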

In summary, for people arriving here via Google, the setup instructions for Sonic are as follows:

🔍 Sonic Search Setup Instructions

  1. Download the sonic.cfg file into your data/ folder: curl -O https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/etc/sonic.cfg
  2. Uncomment the sonic: container config in docker-compose.yml: https://github.com/ArchiveBox/ArchiveBox/blob/dev/docker-compose.yml#:~:text=sonic.cfg
  3. Set the SEARCH_BACKEND_ENGINE, SEARCH_BACKEND_HOST, and SEARCH_BACKEND_PASSWORD config vars in ArchiveBox to point to the new container: https://github.com/ArchiveBox/ArchiveBox/blob/dev/docker-compose.yml#:~:text=SEARCH_BACKEND_ENGINE
  4. Restart the Docker Compose project with docker-compose down; docker-compose up
  5. Add any previously archived data into the Sonic index by running docker compose run archivebox update --index-only
  6. Verify Search works from the Snapshot admin page by searching for some text only present in an archived article's body text
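
For step 6, it can also help to query Sonic directly, bypassing the web UI. The Python sketch below speaks Sonic's plain-text Channel protocol: it reads the CONNECTED banner, sends START search <password>, sends one QUERY, then reads the PENDING and EVENT replies. Assumptions to check against your install: Sonic listens on its default port 1491; the host and password come from SEARCH_BACKEND_HOST and SEARCH_BACKEND_PASSWORD as named in step 3; and ArchiveBox indexes snapshots into a Sonic collection named archivebox with a bucket named snapshots (an assumption about ArchiveBox's Sonic backend, not something stated in this thread). All function names are hypothetical.

```python
import os
import socket
import sys
from typing import Dict, List, Mapping

SONIC_DEFAULT_PORT = 1491  # Sonic's default Channel (TCP) port


def search_config(env: Mapping[str, str]) -> Dict[str, str]:
    """Pull the Sonic settings from step 3 out of an environment mapping."""
    engine = env.get("SEARCH_BACKEND_ENGINE", "ripgrep")  # ripgrep is the default
    if engine != "sonic":
        raise ValueError(f"SEARCH_BACKEND_ENGINE is {engine!r}, expected 'sonic'")
    return {"host": env["SEARCH_BACKEND_HOST"], "password": env["SEARCH_BACKEND_PASSWORD"]}


def build_query(terms: str, collection: str, bucket: str, limit: int = 10) -> str:
    """Build a Sonic QUERY command; quotes and line breaks become plain spaces."""
    cleaned = " ".join(terms.replace('"', " ").split())
    return f'QUERY {collection} {bucket} "{cleaned}" LIMIT({limit})'


def parse_event(line: str, marker: str) -> List[str]:
    """Turn 'EVENT QUERY <marker> id1 id2 ...' into ['id1', 'id2', ...]."""
    parts = line.split()
    if parts[:3] != ["EVENT", "QUERY", marker]:
        raise RuntimeError(f"unexpected Sonic reply: {line!r}")
    return parts[3:]


def sonic_query(host: str, password: str, terms: str,
                collection: str = "archivebox",  # assumed ArchiveBox collection name
                bucket: str = "snapshots",       # assumed ArchiveBox bucket name
                port: int = SONIC_DEFAULT_PORT, timeout: float = 5.0) -> List[str]:
    """Run one search-mode query against Sonic and return the matching object IDs."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rw", encoding="utf-8", newline="") as stream:

            def send(line: str) -> None:
                stream.write(line + "\n")
                stream.flush()

            def recv(expected: str) -> str:
                # An empty read (closed connection) or an ERR/ENDED reply fails here.
                line = stream.readline().rstrip("\r\n")
                if not line.startswith(expected):
                    raise RuntimeError(f"expected {expected}, got {line!r}")
                return line

            recv("CONNECTED")                    # banner sent on connect
            send(f"START search {password}")
            recv("STARTED")                      # a wrong password ends the session instead
            send(build_query(terms, collection, bucket))
            marker = recv("PENDING").split()[1]  # reply to QUERY is 'PENDING <marker>'
            results = parse_event(recv("EVENT"), marker)
            send("QUIT")
            return results


if __name__ == "__main__":
    cfg = search_config(os.environ)
    for object_id in sonic_query(cfg["host"], cfg["password"], " ".join(sys.argv[1:])):
        print(object_id)
```

Run it from somewhere that can reach the Sonic container, e.g. SEARCH_BACKEND_ENGINE=sonic SEARCH_BACKEND_HOST=localhost SEARCH_BACKEND_PASSWORD=<your password> python sonic_check.py rabbits (the file name is hypothetical, and localhost only works if the Sonic port is published to the host). An empty result for text you know appears in an archived article suggests the index was not populated, so re-check step 5.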

If anyone wants to contribute a wiki page with these instructions, screenshots, and links to the README, I'm happy to review a documentation improvement PR.

@diego898

diego898 commented Jan 19, 2023

Split out my comment into #1087

@pirate
Member

pirate commented May 7, 2024

This is mostly done now! Check out our new documentation page here: https://github.com/ArchiveBox/ArchiveBox/wiki/Setting-up-Search

I still have to document the config options on the Configuration page but it's a start.

For improvements / suggestions you can comment back here or open a PR with changes for this file:
https://github.com/ArchiveBox/docs/blob/master/Setting-up-Search.md

pirate closed this as completed May 7, 2024
pirate removed the good first ticket, status: wip, and help wanted labels May 7, 2024