recoll is a local search engine based on Xapian:
http://www.lesbonscomptes.com/recoll/

By itself recoll does not offer web or API access, this can be achieved using recoll-webui:
https://github.com/koniu/recoll-webui

As recoll-webui by default does not support paged JSON results it is advisable to use a patched version which does:
https://github.com/Yetangitu/recoll-webui/tree/jsonpage

A pull request was sent upstream, if this is merged the patched version is no longer needed

This engine uses a custom 'files' result template

set base_url to the location where recoll-webui can be reached
set dl_prefix to a location where the file hierarchy as indexed by recoll can be reached
set search_dir to the part of the indexed file hierarchy to be searched, use an empty string to search the entire search domain
patched recoll-webui supports paged JSON on that endpoint
FYI, the
URL handling sanitised and engine disabled by default
BTW, if this PR is to be merged I - or someone else - should add template/style support for the 'files' template to the other styles as it currently only works as intended in oscar/logicdev. I'll only spend time doing so if it will be merged as I use searx with a custom theme and style.
The commits above adds It looks like recoll-webui is unmaintained (no commits since Sept. 2016) so I won't hold my breath for the PR to be merged.
PS: hold off on merging for a bit. I'm adding some features at the moment (embedded preview), and one part of the code, the download logic, turns out to be specific to my network and needs to be generalised.
* add mount_prefix parameter, set this to location where _local_ filesystem covered by index is mounted, used to create download path, see explanation in settings.yml
* add preview support for audio, video and image types

- settings.yml:
  * add mount_prefix plus explanation on how to use it
- templates/.../files.html
  * add generic media preview support
The commit above adds preview support for audio, video and image types. It also adds a mandatory
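As a rough illustration of how a mount prefix and a download prefix could work together, here is a minimal sketch. It is not the code from this PR: the function name `to_download_url`, the example paths and the assumption that the index reports `file://` URLs are all hypothetical.

```python
from urllib.parse import quote


def to_download_url(file_url, mount_prefix, dl_prefix):
    """Map an indexed file location to a URL below dl_prefix.

    Hypothetical helper: assumes the index reports 'file://' URLs and that
    mount_prefix is where the indexed hierarchy is mounted locally.
    """
    scheme = "file://"
    path = file_url[len(scheme):] if file_url.startswith(scheme) else file_url

    # Normalise the prefix so '/export/docs' does not match '/export/docs2'.
    prefix = mount_prefix.rstrip("/") + "/"
    if not path.startswith(prefix):
        return None  # the file lives outside the mounted hierarchy

    relative = path[len(prefix):]
    # quote() keeps '/' intact and escapes spaces and other unsafe characters.
    return dl_prefix.rstrip("/") + "/" + quote(relative)
```

Returning `None` for files outside the mount point leaves it to the caller to decide whether to show such a result without a download link.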
…engine remote commit
@Yetangitu What are the dependencies of the webui? I am trying to start it, but it fails with
Try to run the webui by itself first using I.e. install Recoll, have it index something, point the webui at this config and it should work. You don't need to start the Recoll GUI (I never touch it) to get it going. Here's a quick example Recoll config to get started: https://gist.github.com/Yetangitu/1bb4c5cd4b35e2911123d71b6ca3cc1c Dump it in a directory (preferably hosted on SSD) which has enough capacity to hold the expected index file size, these can become quite big when indexing a large set. To actually have Recoll index the contents of the files you'll want to run the https://gist.github.com/Yetangitu/dbb624db032fc217cf97898008a27a71
Any progress on deciding whether to merge this PR? I've been using it for a long time and I assume this functionality can be of benefit to others who maintain a large document collection.
@kvch do you have time to continue with your review?
FYI, I submitted a PR to the Python3-version of the Recoll web interface to make it work better in combination with this change - it allows the use of more than one Recoll instance in a query. (edit: the PR was merged) For those using this engine it may be interesting to review a bug report I submitted to the Recoll project related to degraded indexing and searching performance in later versions using a specific database format. The bug report contains a suggestion on how to work around the issue.
# helper functions
def get_time_range(time_range):
    sw = {
This could be a global variable.
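One way to read this suggestion is sketched below: the lookup table moves to module level so it is built once at import time rather than on every call. The quoted snippet is cut off after `sw = {`, so the keys, the day counts, the name `TIME_RANGE_DAYS` and the optional `today` parameter are assumptions made for illustration only.

```python
from datetime import date, timedelta

# Module-level constant instead of a dict rebuilt inside the function.
# Keys and day counts are assumed; the original dict is not shown in full.
TIME_RANGE_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}


def get_time_range(time_range, today=None):
    """Return the ISO date that lies `time_range` before today, or ''."""
    offset = TIME_RANGE_DAYS.get(time_range, 0)
    if not offset:
        return ""  # unknown or empty range: no date restriction
    today = today or date.today()  # `today` is injectable to ease testing
    return (today - timedelta(days=offset)).isoformat()
```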
@Yetangitu Could you please rebase your PR? After that, I will approve it and merge it to master.
This change adds the possibility to run queries over more than one database by pointing the program at extra recoll configuration directories using the RECOLL_EXTRACONFDIR environment variable. This variable can contain space-separated recoll configuration directories (i.e. directories which contain `recoll.conf`) which are parsed to find out the indexed _topdirs_ and the location of the database directory. The _topdirs_ are added to the directory tree, the databases are added to the `extradbs` list.

When running a query over the entire tree (using `<all>`, the default) all databases are searched. When the query is limited to a subdirectory the searched set is limited to only those databases which cover the related _topdir_, thus reducing search time and overhead.

The raison d'être for this change is to allow the web interface to be used to search a large index split over several databases, e.g. _fiction_, _nonfiction_ and _audio_. This in turn is used in the _recoll engine_ for the _Searx_ meta-search engine, see searx/searx#1257 .

This is a further development of an earlier change I submitted to GitHub, most of which was merged but for the extra databases.
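The selection logic described above can be sketched roughly as follows. This is not the code from the change itself: all function names are hypothetical, the `key = value` parser is deliberately simplistic, and the fallback of `dbdir` to `xapiandb` inside the configuration directory is an assumption about Recoll's defaults.

```python
import os
import shlex


def extra_confdirs(environ=os.environ):
    """Split RECOLL_EXTRACONFDIR into a list of configuration directories."""
    return environ.get("RECOLL_EXTRACONFDIR", "").split()


def parse_conf(text):
    """Parse plain 'key = value' lines; comments and blank lines are skipped."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def describe_confdir(confdir, conf_text):
    """Return (dbdir, topdirs) for one configuration directory."""
    conf = parse_conf(conf_text)
    topdirs = [os.path.expanduser(d) for d in shlex.split(conf.get("topdirs", ""))]
    # Assumed default: a relative dbdir lives inside the configuration directory.
    dbdir = os.path.join(confdir, conf.get("dbdir", "xapiandb"))
    return dbdir, topdirs


def covers(topdir, search_dir):
    """True if search_dir is topdir itself or lies somewhere below it."""
    return (search_dir.rstrip("/") + "/").startswith(topdir.rstrip("/") + "/")


def select_databases(search_dir, databases):
    """Pick the databases to query; `databases` maps dbdir -> list of topdirs."""
    if search_dir == "<all>":
        return list(databases)
    return [db for db, tops in databases.items()
            if any(covers(top, search_dir) for top in tops)]
```

With one database per collection, a query limited to a directory below the _audio_ topdir would then touch only the audio database, while `<all>` still searches every one of them.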
Recoll is a local search engine based on Xapian:
http://www.lesbonscomptes.com/recoll/
Although Recoll seems to be mostly aimed at desktop users, the engine can be run without any graphical interface or interaction. I never use the GUI tools, and I have been running it for many years over a large (currently ~2TB) document collection.
By itself recoll does not offer web or API access, this can be achieved using recoll-webui:
https://github.com/koniu/recoll-webui
As recoll-webui by default does not support paged JSON results it is advisable to use a patched version which does:
https://github.com/Yetangitu/recoll-webui/tree/jsonpage
(A pull request was sent upstream, if this is merged the patched version is no longer needed)
This engine uses a custom `files` result template included in this PR (only for the oscar theme using the logicdev style, I can make versions for other themes/styles if this PR goes through).

Use:

- set `base_url` to the location where recoll-webui can be reached
- set `dl_prefix` to a location where the file hierarchy as indexed by recoll can be reached
- set `search_dir` to the part of the indexed file hierarchy to be searched, use an empty string to search the entire search domain

Example from settings.yml:
Example output:
BTW, I'm using Searx with a custom theme so I adapted the oscar theme for this PR. I did not test the adaptations, so if something doesn't work this might be the cause. The result should more or less look like the image above.
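To make the role of `base_url` and `search_dir` more concrete, here is a minimal sketch of how a paged JSON request to recoll-webui might be assembled. The `json` endpoint and the `query`, `page`, `highlight` and `dir` parameter names are assumptions, as are the function name and the example host; check them against the webui version in use.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, search_dir="", page=1):
    """Assemble a paged JSON query URL (endpoint and parameters are assumed)."""
    args = {"query": query, "page": page, "highlight": 0}
    if search_dir:
        # An empty search_dir means: search the entire search domain.
        args["dir"] = search_dir
    return base_url.rstrip("/") + "/json?" + urlencode(args)
```

For example, `build_search_url("https://recoll.example.org/", "xapian", "fiction")` restricts the query to the hypothetical `fiction` part of the indexed hierarchy.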