New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial Whoosh search work #61
Conversation
@Linbreux I have created this as a draft since I'm not done yet but this is my initial work on moving search over to Whoosh. I thought I'd create a draft PR so you could have a look to see if you're OK with the approach. Currently it only indexes documents when the wiki is started and the search index is empty. TODO:
I am really not sure how to force reindexing files that are created/updated/deleted directly on the filesystem. It seems to me like we'd need a thread running that is monitoring filesystem changes and triggering indexing based on that, and also trigger a reindex every time wikmd is started. |
Perfect, I'll look into it when I have some free time and maybe add some commits myself. Appreciate the work you put into it! |
No problem, I was bored and wanted something to do. :D I'll make some more changes this evening or tomorrow to tie up loose ends. |
c90524e
to
a195d83
Compare
@Linbreux I am pretty happy with how this works now. When the Flask app is started 2 things will happen;
When creating, modifying or deleting a file with extension
The search index directory lives in the wiki directory and is hidden from the results of page listing. I have also enabled scoring and highlighting, all search results are sorted by score with the highest score first. The results returned also contain a sentence of two with the keyword highlighted. The highlighting itself comes directly from Whoosh and by default makes the highlight bold, this is probably configurable but I kind of like it as it is. I still need to write some tests for this. |
For some reason search is also broken...
Removed all index operations from Flask HTTP methods and do them via watchdog instead. This was done because when modifying a file via the web UI an indexing operation would be triggered by the web UI and then also very soon after by watchdog because it monitors for file changes.
…t create files ending in .MD
Oh alright perfect. I thought it kept some cache in the folder or something :) |
…d itself creates a new index
There is one bug I still need to figure out a solution to, it involves the git_manager and _searchindex. Git tries to commit the search index and it breaks things, maybe the _searchindex should live outside of the Markdown folder like caching does? |
I see passing tests! 🚀 |
Yes I guess we could put it in the same location as the cache folder! |
@Linbreux I just submitted a rather large set of changes. When the path was changed to just be a relative path it broke edits and deletes, edit causing duplication in results and delete deleting everything. I have fixed this now by adding an additional ID on the records: filename. This will have the added benefit of making files with the same name in different directories unique. I have switched the writer used for indexing and deleting to an asynchronous writer because I was running in to lock contention when edit a file (delete + index). I have entirely rewritten the watchdog so that it is easier to test, it now has a start and stop function that can be called from within Python to trigger the background indexer. |
@kura Prefect, I noticed that the index does not update when removing a file. This was the only bug I could find for know. |
@Linbreux you are right, I broke it when rewriting the watchdog code. I'll fix it now. |
@kura perfect! Do you have some things you'd like to test or are you ready to merge? |
@Linbreux I would like to write some tests around the search class and watchdog. Give me 24 hours or so to get a few tests in if you wouldn't mind. |
@kura No perfect, the more tests the better! Just let me know when you'd like to merge |
@Linbreux I have added a set of tests for I have also renamed the cache_dir to I'm pretty happy with this now, if you want to give it a once over and re-test. |
…d it was causing problems on my Docker container so I moved search to /dev/shm. Changes: * Move cachedir to `wikmd/cache` * Move searchdir to `wikmd/searchindex` * Exclude all non-.md files from setup_search
I'd suggest when you're ready to merge that you also use the squash + merge function of GitHub rather than straight up merging. This is a ton of commits and it'll pollute the history so I'd suggest just squash to 1 commit and merging. Alternatively when you're happy you can prod me and I'll squash locally and force push a squashed commit you can merge instead. |
Alright, this looks very good! I'll squash them ;) Thanks again for the work you put into it. |
@Linbreux commented on 6 Sept 2022, 21:06 UTC:
Thanks. And no worries, I've been doing other non-programming stuff at work recently and the lack of coding was driving me a little crazy so I needed to find an outlet. :D |
@kura Haha I get it. Always welcome to make further adjustments on wikmd! |
Summary
Provide a short overview ...
Details
Context to describe changes if necessary
Checks
docs/<corresponding file>