Using org-roam-db with org-ql #303
You can work around some of those performance issues by preloading your many Org files into Emacs when it's idle, so org-ql can search them quickly (the main performance penalty comes from initializing org-mode in each buffer). My long-term plans for Org QL include using a SQLite database as an additional backend, similar to org-roam. I experimented with such a system several years ago, before I wrote org-ql, using org-rifle as a frontend, inspired by John Kitchin's work. It seemed promising, but there were some rough corners that needed to be handled before it could be considered polished enough to be generally usable. Maybe org-roam has solved some of those problems already. Anyway, I don't have a timeframe for the work, but I hope to do it someday. What you're doing here, using org-ql-view as a frontend for data provided by org-roam, is interesting, but not quite the same thing, from what I can tell. I don't know much about how org-roam works, so I can't offer much input on that. |
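As a rough illustration of the idle-preloading workaround mentioned above, the sketch below visits Org files in the background once Emacs has been idle for a while. The names my/org-preload-directory, my/org-preload-files and my/org-preload-when-idle are hypothetical, and the 30-second default is an arbitrary assumption; only built-in functions are used.

```elisp
(defvar my/org-preload-directory "~/org/"
  "Hypothetical directory whose Org files should be preloaded.")

(defun my/org-preload-files (files)
  "Visit each of FILES that has no buffer yet; return the number opened."
  (let ((opened 0))
    (dolist (file files opened)
      (unless (find-buffer-visiting file)
        ;; Visiting the file is what pays the org-mode setup cost up front.
        (find-file-noselect file)
        (setq opened (1+ opened))))))

(defun my/org-preload-when-idle (&optional seconds)
  "After SECONDS of idle time (default 30), preload the Org files once."
  (run-with-idle-timer
   (or seconds 30) nil
   (lambda ()
     (my/org-preload-files
      (directory-files-recursively my/org-preload-directory "\\.org\\'")))))
```

Calling (my/org-preload-when-idle) from an init file would schedule one pass. Opening thousands of files in a single idle callback can still block Emacs for a moment, so a real setup might open them in smaller batches.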
Adam Porter ***@***.***> writes:
You can work around some of those performance issues by preloading your many Org files into Emacs when it's idle, so org-ql can search them quickly (the main performance penalty comes from initializing org-mode in each buffer).
FYI, the performance on a large number of small files has been improved
recently in the latest development version of Org. If you are willing
to, you can help improve it further by writing to the Org mailing list (see
https://orgmode.org/manual/Feedback.html). We can then try to identify
the bottlenecks in your setup.
…--
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92
|
@yantar92 That's great news! Thanks! |
That'd be cool to have. If you have a roadmap or a list of items you want implemented, I would love to contribute. The function I had posted above has evolved a bit since:
(defun org-roam-ql-view--get-nodes-from-query (source-or-query)
"Convert SOURCE-OR-QUERY to org-roam-nodes.
SOURCE-OR-QUERY can be one of the following:
- A list of params that can be passed to `org-roam-db-query'. Expected
to have the form (QUERY ARG1 ARG2 ARG3...). `org-roam-db-query' will
be called with the list of parameters as:
(org-roam-db-query QUERY ARG1 ARG2 ARG3...). The first element of each
row in the query result is expected to be the ID of the
corresponding node, which will be converted to an org-roam-node. QUERY
can be a complete query. If the query is of the form
[:select [id] :from nodes :where (= todo \"TODO\")], you can omit
everything up to and including :where, i.e., pass only
[(= todo \"TODO\")], and the rest will be prepended.
- A list of org-roam-nodes
- A function that returns a list of org-roam-nodes"
(cond
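;; Guard with `listp' so that a function symbol never reaches `-all-p'.
((and (listp source-or-query) (-all-p #'org-roam-node-p source-or-query)) source-or-query)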
((and (listp source-or-query) (vectorp (car source-or-query)))
(let ((query (car source-or-query))
(args (cdr source-or-query)))
(--map (org-roam-node-from-id (car it))
(apply #'org-roam-db-query
(if (eq :select (aref query 0))
query
(vconcat [:select id :from nodes :where] query))
args))))
((functionp source-or-query) (funcall source-or-query))))
(defun org-roam-ql-view (source-or-query title &optional super-groups)
"Basically what `org-ql-search' does, but for org-roam-nodes.
See `org-roam-ql-view--get-nodes-from-query' for what
SOURCE-OR-QUERY can be. TITLE is a title to associate with the view.
See `org-roam-search' for details on SUPER-GROUPS."
(let* ((nodes (org-roam-ql-view--get-nodes-from-query source-or-query))
(strings '())
(title (format "org-roam - %s" title))
(buffer (format "%s %s*" org-ql-view-buffer-name-prefix title))
(header (org-ql-view--header-line-format
:title title))
(org-ql-view-buffers-files (mapcar #'org-roam-node-file nodes))
(org-ql-view-query '(property "ID"))
(org-ql-view-sort nil)
(org-ql-view-narrow nil)
(org-ql-view-super-groups super-groups)
(org-ql-view-title title))
(dolist-with-progress-reporter (node nodes)
(format "Processing %s nodes" (length nodes))
(push (org-roam-ql-view--format-node node) strings))
(when super-groups
(let ((org-super-agenda-groups (cl-etypecase super-groups
(symbol (symbol-value super-groups))
(list super-groups))))
(setf strings (org-super-agenda--group-items strings))))
(org-ql-view--display :buffer buffer :header header
:string (s-join "\n" strings))))
;; modified org-ql-view--format-element to work with org-roam nodes
(defun org-roam-ql-view--format-node (node)
;; This essentially needs to do what `org-agenda-format-item' does,
;; which is a lot. We are a long way from that, but it's a start.
"Return NODE as a string with text-properties set by its property list.
If NODE is nil, return an empty string."
(if (not node)
""
(let* ((marker
(org-roam-with-file (org-roam-node-file node) t
(goto-char (org-roam-node-point node))
(point-marker)))
(properties (list
'org-marker marker
'org-hd-marker marker))
;; (properties '())
(string (org-roam-node-title node))) ;;(org-roam-node--format-entry (org-roam-node--process-display-format org-roam-node-display-template) node)))
(remove-list-of-text-properties 0 (length string) '(line-prefix) string)
;; Add all the necessary properties and faces to the whole string
(--> string
;; FIXME: Use proper prefix
(concat " " it)
(org-add-props it properties
'org-agenda-type 'search
'todo-state (org-roam-node-todo node)
'tags (org-roam-node-tags node)
;;'org-habit-p (org)
)))))
org-roam-db has all the information needed to build the agenda buffer except for the markers, which need the corresponding buffer to be open. That in turn is also the bottleneck in my implementation so far. Since org-agenda seems to heavily rely on Another interesting issue I ran into was with the I'll switch this to open the files in the background when idle; hopefully having >1500 org files open doesn't cause other issues 😅
@yantar92 that sounds awesome. I was profiling a few alternatives to see what might work best; I'll try them with the dev branch of org and see how it goes. |
Alright, so here's what I did: I used emacs with
emacs -Q -l ~/.emacs.d/straight/repos/straight.el/bootstrap.el
Used the following script on two versions of org, one I had previously installed at the current head (I think):
(straight-use-package 'org)
(straight-use-package 'org-roam)
(defun run-elp (func sources)
"Instrument org and FUNC and iterate on SOURCES with FUNC.
FUNC is a symbol representing a function that takes one parameter.
SOURCES is a list of elements that will be processed by FUNC."
(elp-instrument-package "org")
(elp-instrument-function func)
(elp-reset-all)
(mapcar func sources)
(elp-results))
(defmacro with-plain-file (file keep-buf-p &rest body)
"Same as `org-roam-with-file', but doesn't start `org-roam'."
(declare (indent 2) (debug t))
`(let* (new-buf
(auto-mode-alist nil)
(find-file-hook nil)
(buf (or
(and (not ,file)
(current-buffer)) ;If FILE is nil, use current buffer
(find-buffer-visiting ,file) ; If FILE is already visited, find buffer
(progn
(setq new-buf t)
(find-file-noselect ,file)))) ; Else, visit FILE and return buffer
res)
(with-current-buffer buf
(setq res (progn ,@body))
(unless (and new-buf (not ,keep-buf-p))
(save-buffer)))
(if (and new-buf (not ,keep-buf-p))
(when (find-buffer-visiting ,file)
(kill-buffer (find-buffer-visiting ,file))))
res))
(defun test-org-load-files (func &optional restart)
(let ((test-dir "~/temp/org-mode-test/")
files)
(message "Tests running")
(when (and (file-exists-p test-dir) restart)
(dolist (f (directory-files (file-truename test-dir))) (unless (member f '("." "..")) (delete-file f)))
(delete-directory (file-truename test-dir) t))
(if (or restart (not (file-exists-p test-dir)))
(progn
(make-directory (file-truename test-dir))
;; generating a bunch of file for testing
(dolist (num (number-sequence 1 25 1))
(let ((auto-mode-alist nil)
(find-file-hook nil)
(id (org-id-new))
;; Create the files inside TEST-DIR, the directory that is globbed and profiled.
(f (expand-file-name (format "test_%s.org" num) (file-truename test-dir))))
(push f files)
(with-current-buffer (find-file-noselect f)
(erase-buffer)
(insert (format "* This is the heading in file number %s
:PROPERTIES:
:ID: %s
:TEST_PROP_1: %s
:TEST_PROP_2: id:%s
:END:" num id num id))
(save-buffer)
(kill-buffer (find-buffer-visiting f))))))
(progn
(mapcar (lambda (f) (let ((f (find-buffer-visiting f)))
(em f)
(when f
(kill-buffer f))))
(setq files (f-glob "*.org" test-dir)))))
(run-elp func files)
(with-current-buffer "*ELP Profiling Results*"
(write-file (format "~/elp_results_%s_%s" func (format-time-string "%Y-%m-%dT%H-%M-%S%-z"))))))
(defun --test-org-roam-with-file (f)
(org-roam-with-file f t
(goto-char 3)
(point-marker)))
(defun --test-with-current-buffer (f)
(with-current-buffer (find-file-noselect f)
(goto-char 3)
(point-marker)))
(defun --test-with-plain-file (f)
(with-plain-file f t
(goto-char 3)
(point-marker)))
(setq org-roam-directory (file-truename "~/temp/org-mode-test/"))
(setq org-roam-node-display-template (concat "${title:*} " (propertize "${tags:10}" 'face 'org-tag)))
(org-roam-db-autosync-mode)
(with-eval-after-load 'org-roam
;; Run twice so that module loading during the first pass doesn't affect the timings.
(dolist (func '(--test-org-roam-with-file
--test-with-current-buffer
--test-with-plain-file))
(test-org-load-files func t))
(dolist (func '(--test-org-roam-with-file
--test-with-current-buffer
--test-with-plain-file))
(test-org-load-files func t)))
The results summary was as follows:
These numbers are pretty good; ~100 files a second would be more than satisfactory. But when I try something similar with my init loaded, the numbers jump to much larger values:
I'll try running this with the profiler and see what I get from it. |
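To check how much of the difference comes from major modes and hooks rather than from reading the files, a timing helper along these lines may help. It is only a sketch: my/time-opening is a hypothetical name, and binding auto-mode-alist and find-file-hook to nil mirrors what the with-plain-file macro above does.

```elisp
(require 'benchmark)

(defun my/time-opening (files &optional bare)
  "Return the seconds taken to visit and then kill each of FILES.
With BARE non-nil, bind `auto-mode-alist' and `find-file-hook' to
nil so that no major mode or hook runs.  Buffers that were
already visiting one of FILES are killed as well."
  (let ((auto-mode-alist (if bare nil auto-mode-alist))
        (find-file-hook (if bare nil find-file-hook)))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
    (car (benchmark-run 1
           (dolist (file files)
             (kill-buffer (find-file-noselect file)))))))
```

Comparing (my/time-opening files) with (my/time-opening files t) in the fully configured session gives a first hint of whether the slowdown sits in mode setup and hooks or elsewhere.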
fyi, I posted the detailed breakdown: https://ahmed-shariff.github.io/post/2022-09-16-profiling_loading_org_files |
Your testing does not contain any information about why the time increased. The Org-related stuff is certainly not the culprit there. I recommend using M-x profiler-start ... M-x profiler-report to identify the actual "heavy" functions that cause the slowdown. |
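For anyone who would rather script this than run the commands by hand, a small wrapper around the built-in profiler could look like the sketch below. my/profile-call is a hypothetical helper, not part of Org or org-roam, and it assumes the CPU profiler is not already running.

```elisp
(require 'profiler)

(defun my/profile-call (fn &rest args)
  "Call FN with ARGS under the CPU profiler and return FN's value.
The profiler report is displayed before the profiler is stopped."
  (profiler-start 'cpu)
  (unwind-protect
      (apply fn args)
    ;; Report first, then stop: older Emacs versions can only build
    ;; the report while the profiler is still running.
    (profiler-report)
    (profiler-stop)))
```

For example, (my/profile-call #'test-org-load-files '--test-with-current-buffer t) would profile one of the test runs from the script above inside the fully configured session.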
Yep, when I get some time, I'll run it with the profile functions and update here. Any better way to share those results without cluttering this thread? |
You can write to Org mailing list directly. See https://orgmode.org/manual/Feedback.html |
@ahmed-shariff You might find this interesting. I use it to avoid reading all org roam nodes to find TODO headings: https://d12frosted.io/posts/2021-01-16-task-management-with-roam-vol5.html |
I've ended up running ripgrep over my org-roam directory and only allowing files that contain todos and tags. I have about 600 files, but only a couple contain tagged todos. Then I set Here's the code:
One thing about org-ql that bothers me is that it keeps a bunch of buffers open. If there is ever a SQLite solution with no need for open buffers, that'd be awesome! |
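The ripgrep filtering described in this comment might look roughly like the sketch below. It is not the commenter's code: my/org-files-with-todos is a hypothetical name, the TODO and NEXT keywords are assumptions, and a plain Emacs Lisp scan is used as a fallback when rg is not installed.

```elisp
(require 'seq)

(defun my/org-files-with-todos (dir)
  "Return the Org files under DIR that contain a TODO or NEXT heading.
Use ripgrep when it is on `exec-path'; otherwise scan the raw file
text in a temporary buffer, without turning on `org-mode'."
  (let ((dir (expand-file-name dir)))
    (if (executable-find "rg")
        ;; `process-lines-ignore-status' (Emacs 28+) tolerates rg's
        ;; exit code 1, which only means that nothing matched.
        (process-lines-ignore-status
         "rg" "--files-with-matches" "--glob" "*.org"
         "-e" "^\\*+ (TODO|NEXT) " dir)
      (seq-filter
       (lambda (file)
         (with-temp-buffer
           (insert-file-contents file)
           ;; Match keywords case-sensitively, like rg does by default.
           (let ((case-fold-search nil))
             (re-search-forward "^\\*+ \\(?:TODO\\|NEXT\\) " nil t))))
       (directory-files-recursively dir "\\.org\\'")))))
```

The result could then be assigned to the agenda, for instance with (setq org-agenda-files (my/org-files-with-todos org-roam-directory)), so that only the few matching files are ever opened.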
another option for this is to query directly from the database with |
A quick update here, for some reason, after I upgraded my packages recently, the performance seems to be "ok". Started putting it into a package: https://github.com/ahmed-shariff/org-roam-ql |
That's exactly what I do, for example: (defun get-project-nodes ()
(seq-uniq
(seq-map
#'car
(org-roam-db-query
[:select [nodes:file]
:from tags
:left-join nodes
:on ( = tags:node-id nodes:id)
:where (= tag "project")
])))) |
I'm starting to delve into both packages, and I have a few questions @ahmed-shariff @alphapapa ;)
So basically the difference org-roam makes is that it uses a polished and robust SQLite database as a drastically faster cache: if configured properly, you should never have to parse any file, just query the SQLite database. (In my experience, building the agenda view from files is noticeably slower, even though I haven't tried the org-ql agenda properly.) The rest of the org-roam mindset can be set aside; users can do whatever they want with it. Just see this task management method for instance:
Why not consider reaching out to org-roam, proposing that they move the org-roam-db.el code into an org-sql-db package, and treating it as a common utility that could be one of the options for the cache in org-ql? And then consider allowing the Then an improvement to https://d12frosted.io/posts/2021-01-16-task-management-with-roam-vol5.html could be to use this kind of function in a hook to record wanted headlines "on write", as dfrosted suggests (that's code written by me, GPLv3+):
This would register (or delete) each wanted headline directly in the SQL database when the org file is written, so we don't have to read it again until we have to open it. The SQL database could be set with a defvar, so that we don't have to worry about whether it's actually the org-roam database or another "custom" database. As for all of @ahmed-shariff's useful code about roam-ql buffers, it shouldn't be too hard to rewrite if we have proper access to all org-ql commands and set the And finally, if we manage to get it to work, we could set up org-agenda-commands to have an instant editing experience. We wouldn't have to care about agenda-files (only database entries). |
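The "record on write" idea can be sketched without org-roam at all. The example below is not the code referred to in the comment; it swaps in the SQLite support built into Emacs 29 and later (sqlite-open, sqlite-execute, sqlite-select) for org-roam's database layer, and every my/... name, the table layout and the database path are assumptions made for illustration.

```elisp
(require 'org)

(defvar my/org-headline-db-file
  (expand-file-name "org-headlines.db" user-emacs-directory)
  "Hypothetical SQLite file used as the headline cache.")

(defun my/org-headline-db ()
  "Open the cache database, creating the table if needed."
  (let ((db (sqlite-open my/org-headline-db-file)))
    (sqlite-execute
     db (concat "CREATE TABLE IF NOT EXISTS headlines "
                "(file TEXT, pos INTEGER, todo TEXT, title TEXT)"))
    db))

(defun my/org-record-todo-headlines ()
  "Replace the cached rows for the current Org file.
Every headline that carries a TODO keyword (in any state) is stored."
  (when (and buffer-file-name (derived-mode-p 'org-mode))
    (let ((db (my/org-headline-db))
          (file (expand-file-name buffer-file-name)))
      (unwind-protect
          (progn
            ;; Drop stale rows first, so removed headlines disappear too.
            (sqlite-execute db "DELETE FROM headlines WHERE file = ?"
                            (list file))
            (org-map-entries
             (lambda ()
               (let ((todo (org-get-todo-state)))
                 (when todo
                   (sqlite-execute
                    db "INSERT INTO headlines VALUES (?, ?, ?, ?)"
                    (list file (point) todo
                          (substring-no-properties
                           (org-get-heading t t t t)))))))))
        (sqlite-close db)))))

(defun my/org-cached-todo-headlines ()
  "Return cached rows as (FILE POS TODO TITLE) lists, visiting no file."
  (let ((db (my/org-headline-db)))
    (unwind-protect
        (sqlite-select
         db "SELECT file, pos, todo, title FROM headlines ORDER BY file, pos")
      (sqlite-close db))))

;; Only install the hook when this Emacs actually has SQLite support.
(when (and (fboundp 'sqlite-available-p) (sqlite-available-p))
  (add-hook 'after-save-hook #'my/org-record-todo-headlines))
```

With the hook installed, every save refreshes the rows for that file, and a view can be built from (my/org-cached-todo-headlines) alone. Markers would still require opening the file, which is the same limitation discussed earlier in the thread.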
Ok, why don't you do that? |
I'm still trying to figure out if that's actually feasible, that's why I contribute to this discussion (I'm not telling you what to do, just thinking out loud). While sending the message, I noticed that there's still a difference between the caches' roles and management. Currently updating my previous comment to account for that. |
Here's a flaw in the reasoning: What I take from https://d12frosted.io/posts/2021-01-16-task-management-with-roam-vol5.html is the idea that with a database and a proper hook setup, you can basically instantly get access to the elements you want. So if you know what you want to record, you don't have to parse anything. But that isn't exactly how the cache in org-ql works, since what you parse can be arbitrary and that's what you want to cache. However, that is what I want from a custom-org-agenda-commands configuration (parse all possible agenda files once, then update on write). |
There are a few other threads where I had discussed some of these with @alphapapa that might be of interest to you @nicolas-graves: #354 #334 It's also worth considering that with recent org releases, you would be able to comfortably use org-ql even with 1000s of files in a bare emacs setup. |
I released a tiny package based on the dynamic-agenda concept, which should integrate pretty well with So that's without org-roam's database, but I think once |
I love org-ql and I love org-roam. Unfortunately, with 2000+ org files (following the org-roam convention), org-ql seems to be suffering performance-wise. On the other hand, I love what org-ql does with the agenda buffer; while org-roam-buffer does a few interesting things, it doesn't allow querying and displaying content as flexibly as org-ql does (see org-roam/org-roam#1043). Expanding on the conversation in org-roam/org-roam#1043, I was wondering how one might go about extending org-ql to also interface with org-roam's db. For now I have the following snippet, which is modified from org-ql-search to display content from a list of org-roam-nodes:
Any suggestions or comments on how one could go about this?
A crazy idea I am playing with: I'd love to use the same org-ql-search interface, where 'org-roam would be another option under the buffers-files (maybe call this sources?) and queries with related options get processed through a different process.