Using org-roam-db with org-ql #303

Open
ahmed-shariff opened this issue Sep 13, 2022 · 21 comments

Comments

@ahmed-shariff

I love org-ql and I love org-roam. Unfortunately, with 2000+ org files (following the org-roam convention), org-ql seems to suffer performance-wise. On the other hand, I love what org-ql does with the agenda buffer; while org-roam-buffer does a few interesting things, it doesn't allow querying and displaying content as flexibly as org-ql does (see org-roam/org-roam#1043). Expanding on the conversation in org-roam/org-roam#1043, I was wondering how one might go about extending org-ql to also interface with org-roam's db. For now I have the following snippet, which is modified from org-ql-search to display content from a list of org-roam-nodes:

(defun org-ql-roam-view (nodes title &optional super-groups)
  "Basically what `org-ql-search' does, but for org-roam-nodes.
NODES is a list of org-roam-nodes. TITLE is a title to associate with the view.
See `org-ql-search' for details on SUPER-GROUPS."
  (let* ((strings (--map (save-excursion
                           ;; Using `org-roam-with-file' avoids Org mode
                           ;; throwing "too many files" errors.
                           (org-roam-with-file (org-roam-node-file it) nil
                             (org-with-point-at (marker-position (org-roam-node-marker it))
                               (org-ql-view--format-element (org-ql--add-markers (org-element-context))))))
                         nodes))
         (title (format "org-roam - %s" title))
         (buffer (format "%s %s*" org-ql-view-buffer-name-prefix title))
         (header (org-ql-view--header-line-format
                  :title title))
         ;; Bind variables for `org-ql-view--display' to set.
         (org-ql-view-buffers-files nil)
         (org-ql-view-query nil)
         (org-ql-view-sort nil)
         (org-ql-view-super-groups super-groups)
         (org-ql-view-title title))
    (when super-groups
      (let ((org-super-agenda-groups (cl-etypecase super-groups
                                       (symbol (symbol-value super-groups))
                                       (list super-groups))))
        (setf strings (org-super-agenda--group-items strings))))
    (org-ql-view--display :buffer buffer :header header
      :string (s-join "\n" strings))))

Any suggestions or comments on how one could go about this?

A crazy idea I am playing with: I'd love to use the same org-ql-search interface, where 'org-roam would be another option under buffers-files (maybe call this sources?), and the query with its related options would be processed through a different code path.

@alphapapa
Owner

You can work around some of those performance issues by preloading your many Org files into Emacs when it's idle, so org-ql can search them quickly (the main performance penalty comes from initializing org-mode in each buffer).
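
A minimal sketch of that idle-time preloading, using only built-in timer and file functions. The `my/...` names and the directory are hypothetical, not part of org-ql; adjust them to your setup:

```elisp
;; Sketch only: every `my/...' name below is hypothetical.
(defvar my/org-preload-directory "~/org-roam/"
  "Directory whose Org files should be preloaded (an assumption).")

(defvar my/org-preload-queue nil
  "Files that still have to be visited.")

(defvar my/org-preload-timer nil
  "Idle timer that drives the preloading, or nil when inactive.")

(defun my/org-preload-batch ()
  "Visit queued files until the queue is empty or input arrives."
  (while (and my/org-preload-queue (not (input-pending-p)))
    (let ((file (pop my/org-preload-queue)))
      ;; Skip files that already have a buffer.
      (unless (find-buffer-visiting file)
        (find-file-noselect file))))
  ;; Once everything is loaded, the timer is no longer needed.
  (when (and (null my/org-preload-queue) (timerp my/org-preload-timer))
    (cancel-timer my/org-preload-timer)
    (setq my/org-preload-timer nil)))

(defun my/org-preload-start ()
  "Queue all Org files and load them whenever Emacs has been idle for 10 s."
  (interactive)
  (setq my/org-preload-queue
        (directory-files-recursively my/org-preload-directory "\\.org\\'"))
  (unless my/org-preload-timer
    (setq my/org-preload-timer
          (run-with-idle-timer 10 t #'my/org-preload-batch))))
```

Calling `(my/org-preload-start)` from your init would queue the files; the loop stops as soon as you start typing and resumes the next time Emacs goes idle.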

My long-term plans for Org QL include using a SQLite database as an additional backend, similar to org-roam. I experimented with such a system several years ago, before I wrote org-ql, using org-rifle as a frontend, inspired by John Kitchin's work. It seemed promising, but there were some rough corners that needed to be handled before it could be considered polished enough to be generally usable. Maybe org-roam has solved some of those problems already. Anyway, I don't have a timeframe for the work, but I hope to do it someday.

What you're doing here, using org-ql-view as a frontend for data provided by org-roam, is interesting, but not quite the same thing, from what I can tell. I don't know much about how org-roam works, so I can't offer much input on that.

@yantar92
Contributor

yantar92 commented Sep 16, 2022 via email

@alphapapa
Owner

@yantar92 That's great news! Thanks!

@ahmed-shariff
Author

ahmed-shariff commented Sep 17, 2022

> My long-term plans for Org QL include using a SQLite database as an additional backend, similar to org-roam

That'd be cool to have. If you have a roadmap or a list of items you want implemented, I would love to contribute.

The function I had posted above has evolved a bit since:

(defun org-roam-ql-view--get-nodes-from-query (source-or-query)
  "Convert SOURCE-OR-QUERY to org-roam-nodes.
SOURCE-OR-QUERY can be one of the following:
- A list of params that can be passed to `org-roam-db-query'. Expected
  to have the form (QUERY ARG1 ARG2 ARG3...). `org-roam-db-query' will
  be called with the list of parameters as:
  (org-roam-db-query QUERY ARG1 ARG2 ARG3...). The first element in each
  row in the result from the query is expected to have the ID of a
  corresponding node, which will be converted to an org-roam-node. QUERY
  can be a complete query. If the query is going to be of the form
  [:select [id] :from nodes :where (= todo \"TODO\")], you can omit the
  part till after :where. i.e., pass only [(= todo \"TODO\")] and the
  rest will get appended in the front.
- A list of org-roam-nodes
- A function that returns a list of org-roam-nodes"
  (cond
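   ;; Guard with `listp': SOURCE-OR-QUERY may also be a function object.
   ((and (listp source-or-query) (-all-p #'org-roam-node-p source-or-query)) source-or-query)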
   ((and (listp source-or-query) (vectorp (car source-or-query)))
    (let ((query (car source-or-query))
          (args (cdr source-or-query)))
      (--map (org-roam-node-from-id (car it))
       (apply #'org-roam-db-query
             (if (eq :select (aref query 0))
                 query
               (vconcat [:select id :from nodes :where] query))
             args))))
   ((functionp source-or-query) (funcall source-or-query))))
    

(defun org-roam-ql-view (source-or-query title &optional super-groups)
  "Basically what `org-ql-search' does, but for org-roam-nodes.
See `org-roam-ql-view--get-nodes-from-query' for what
SOURCE-OR-QUERY can be. TITLE is a title to associate with the view.
See `org-ql-search' for details on SUPER-GROUPS."
  (let* ((nodes (org-roam-ql-view--get-nodes-from-query source-or-query))
         (strings '())
         (title (format "org-roam - %s" title))
         (buffer (format "%s %s*" org-ql-view-buffer-name-prefix title))
         (header (org-ql-view--header-line-format
                  :title title))
         (org-ql-view-buffers-files (mapcar #'org-roam-node-file nodes))
         (org-ql-view-query '(property "ID"))
         (org-ql-view-sort nil)
         (org-ql-view-narrow nil)
         (org-ql-view-super-groups super-groups)
         (org-ql-view-title title))
    (dolist-with-progress-reporter (node nodes)
        (format "Processing %s nodes" (length nodes))
      (push (org-roam-ql-view--format-node node) strings))
    (when super-groups
      (let ((org-super-agenda-groups (cl-etypecase super-groups
                                       (symbol (symbol-value super-groups))
                                       (list super-groups))))
        (setf strings (org-super-agenda--group-items strings))))
    (org-ql-view--display :buffer buffer :header header
      :string (s-join "\n" strings))))

;; modified org-ql-view--format-element to work with org-roam nodes
(defun org-roam-ql-view--format-node (node)
  ;; This essentially needs to do what `org-agenda-format-item' does,
  ;; which is a lot.  We are a long way from that, but it's a start.
  "Return NODE as a string with text-properties set by its property list.
If NODE is nil, return an empty string."
  (if (not node)
      ""
    (let* ((marker
            (org-roam-with-file (org-roam-node-file node) t
              (goto-char (org-roam-node-point node))
              (point-marker)))
           (properties (list
                        'org-marker marker
                        'org-hd-marker marker))
           ;; (properties '())
           (string (org-roam-node-title node))) ;;(org-roam-node--format-entry (org-roam-node--process-display-format org-roam-node-display-template) node)))
      (remove-list-of-text-properties 0 (length string) '(line-prefix) string)
      ;; Add all the necessary properties and faces to the whole string
      (--> string
        ;; FIXME: Use proper prefix
        (concat "  " it)
        (org-add-props it properties
          'org-agenda-type 'search
          'todo-state (org-roam-node-todo node)
          'tags (org-roam-node-tags node)
          ;;'org-habit-p (org)
          )))))

org-roam-db has all the information needed to build the agenda buffer except for the markers, which need the corresponding buffers to be open. That in turn is also the bottleneck in my implementation so far. Since org-agenda seems to rely heavily on get-text-property to get the markers, I haven't been able to think of a workaround for this.

Another interesting issue I ran into was the "File mode specification error: (file-error Creating pipe Too many open files)" error on my Windows PC; I still haven't been able to figure that out. It mostly happens because of some other packages I have that spawn processes in the background (git-gutter, for example), which is why I am using org-roam-with-file, which suppresses some of these hooks.

I'll switch this to open the files in the background when idle, hopefully having >1500 org files open doesn't cause other issues 😅

@yantar92 that sounds awesome, I was profiling a few alternatives to see what might work best, I'll try them with the dev branch of org and see how it goes.

@ahmed-shariff
Author

Alright, so here's what I did:

I ran Emacs with -Q and bootstrapped straight.el:

emacs -Q -l ~/.emacs.d/straight/repos/straight.el/bootstrap.el

I used the following script on two versions of Org: the one I had previously installed and one at the current head (I think):

(straight-use-package 'org)
(straight-use-package 'org-roam)

(defun run-elp (func sources)
  "Instrument org and FUNC and iterate on SOURCES with FUNC.
FUNC is a symbol representing a function that takes one parameter.
SOURCES is a list of elements that will be processed by FUNC."
  (elp-instrument-package "org")
  (elp-instrument-function func)
  (elp-reset-all)
  (mapcar func sources)
  (elp-results))

(defmacro with-plain-file (file keep-buf-p &rest body)
  "Same as `org-roam-with-file', but doesn't start `org-mode'."
  (declare (indent 2) (debug t))
  `(let* (new-buf
          (auto-mode-alist nil)
          (find-file-hook nil)
          (buf (or
                (and (not ,file)
                     (current-buffer)) ;If FILE is nil, use current buffer
                (find-buffer-visiting ,file) ; If FILE is already visited, find buffer
                (progn
                  (setq new-buf t)
                  (find-file-noselect ,file)))) ; Else, visit FILE and return buffer
          res)
     (with-current-buffer buf
       (setq res (progn ,@body))
       (unless (and new-buf (not ,keep-buf-p))
         (save-buffer)))
     (if (and new-buf (not ,keep-buf-p))
         (when (find-buffer-visiting ,file)
           (kill-buffer (find-buffer-visiting ,file))))
     res))

(defun test-org-load-files (func &optional restart)
  (let ((test-dir "~/temp/org-mode-test/")
        files)
    (message "Tests running")
    (when (and (file-exists-p test-dir) restart)
      ;; A recursive `delete-directory' also removes the files inside it.
      (delete-directory (file-truename test-dir) t))

    (if (or restart (not (file-exists-p test-dir)))
        (progn
          (make-directory (file-truename test-dir))
          ;; Generate a bunch of files for testing.
          (dolist (num (number-sequence 1 25 1))
            (let ((auto-mode-alist nil)
                  (find-file-hook nil)
                  (id (org-id-new))
                  (f (expand-file-name (format "test_%s.org" num) (file-truename test-dir))))
              (push f files)
              (with-current-buffer (find-file-noselect f)
                (erase-buffer)
                (insert (format "* This is the heading in file number %s
  :PROPERTIES:
  :ID:       %s
  :TEST_PROP_1: %s
  :TEST_PROP_2: id:%s
  :END:" num id num id))
                (save-buffer)
                (kill-buffer (find-buffer-visiting f))))))
      (progn
        (mapcar (lambda (f) (let ((f (find-buffer-visiting f)))
                              (when f
                                (kill-buffer f))))
                (setq files (f-glob "*.org" test-dir)))))

    (run-elp func files)
    (with-current-buffer "*ELP Profiling Results*"
      (write-file (format "~/elp_results_%s_%s" func (format-time-string "%Y-%m-%dT%H-%M-%S%-z"))))))

(defun --test-org-roam-with-file (f)
  (org-roam-with-file f t
    (goto-char 3)
    (point-marker)))

(defun --test-with-current-buffer (f)
  (with-current-buffer (find-file-noselect f)
    (goto-char 3)
    (point-marker)))

(defun --test-with-plain-file (f)
  (with-plain-file f t
    (goto-char 3)
    (point-marker)))

(setq org-roam-directory (file-truename "~/temp/org-mode-test/"))
(setq org-roam-node-display-template (concat "${title:*} " (propertize "${tags:10}" 'face 'org-tag)))
(org-roam-db-autosync-mode)

(with-eval-after-load 'org-roam
  ;; Run twice so that module loading during the first pass won't affect the timings.
  (dolist (func '(--test-org-roam-with-file
                  --test-with-current-buffer
                  --test-with-plain-file))
    (test-org-load-files func t))

  (dolist (func '(--test-org-roam-with-file
                  --test-with-current-buffer
                  --test-with-plain-file))
    (test-org-load-files func t)))

The results summary was as follows (times in seconds):

| Function | Org 9.5.5-g8cc821 Run1 | Run2 | Run3 | Avg | Org 9.5.4-g5a6442 Run1 | Run2 | Run3 | Avg |
|---|---|---|---|---|---|---|---|---|
| test-with-current-buffer | 0.0141548 | 0.01415752 | 0.0150422 | 0.014451507 | 0.01387448 | 0.0147026 | 0.01433376 | 0.014303613 |
| test-org-roam-with-file | 0.01293492 | 0.01199168 | 0.01381696 | 0.01291452 | 0.01209968 | 0.01191384 | 0.01204764 | 0.012020387 |
| test-with-plain-file | 0.00915172 | 0.00927128 | 0.00839904 | 0.00894068 | 0.00762304 | 0.00856996 | 0.00862808 | 0.008273693 |

These numbers are pretty good; ~100 files a second would be more than satisfactory. But when I try something similar with my init loaded, the numbers jump to much larger values:

  • test-with-current-buffer equivalent: 0.8930580909 s
  • test-org-roam-with-file equivalent: 0.7955048727 s
  • test-with-plain-file equivalent: 0.5225694909 s

I'll try running this with the profiler and see what I get from it.

@ahmed-shariff
Author

fyi, I posted the detailed breakdown: https://ahmed-shariff.github.io/post/2022-09-16-profiling_loading_org_files

@yantar92
Contributor

Your testing does not contain any information about why the time increased. The Org-related stuff is certainly not the culprit there. I recommend using M-x profiler-start ... M-x profiler-report to identify the actual "heavy" functions that cause the slowdown.
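
The same thing can be done from Lisp. A minimal sketch using the built-in profiler library; the macro name `my/with-cpu-profile` is hypothetical:

```elisp
(require 'profiler)

;; Sketch only: `my/with-cpu-profile' is a hypothetical helper name.
(defmacro my/with-cpu-profile (&rest body)
  "Run BODY under the CPU profiler, then show the report.
Return the value of BODY."
  (declare (indent 0) (debug t))
  `(progn
     (profiler-start 'cpu)
     (unwind-protect
         (progn ,@body)
       ;; Take the report while the profiler is still running, then stop it.
       (profiler-report)
       (profiler-stop))))
```

Wrapping the slow call, for example `(my/with-cpu-profile (mapc #'--test-with-current-buffer files))` with `files` bound to the test files, should open the report buffer, where the heaviest call trees can be expanded.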

@yantar92
Contributor

yantar92 commented Sep 18, 2022

P.S. Your website is unreadable using my browser.
[Screenshot: 2022-09-18_14-14]

@ahmed-shariff
Author

Yep, when I get some time, I'll run it with the profile functions and update here. Any better way to share those results without cluttering this thread?

@yantar92
Contributor

You can write to the Org mailing list directly. See https://orgmode.org/manual/Feedback.html

@ParetoOptimalDev

@ahmed-shariff You might find this interesting. I use it to avoid reading all org roam nodes to find TODO headings:

https://d12frosted.io/posts/2021-01-16-task-management-with-roam-vol5.html

@viocost

viocost commented Jan 8, 2023

I've ended up running ripgrep over my org-roam directory and only allowing files that contain todos and tags. I have about 600 files, but only a couple of them contain tagged todos.

Then I set org-agenda-files to be the list of filtered org-roam files.

Here's the code:

(defun update-agenda-files ()
  "Set `org-agenda-files' to the org-roam files containing tagged todos."
  (interactive)
  ;; rg runs in `default-directory', so bind it to the org-roam directory.
  (let ((default-directory "/home/kostia/org-roam"))
    (setq org-agenda-files
          (split-string
           (shell-command-to-string
            "rg -l \"\\*+ (TODO|TICKET|BLOCKED|PROGRESS|REVIEW|QA|DONE|CANCELLED|IDEA|PROJ).*:(work|chore|spike|idea|ticket):\"")))))

(update-agenda-files)

One thing about org-ql that bothers me is that it keeps a bunch of buffers open. If there is ever an SQLite solution with no need for open buffers, that'd be awesome!

@ahmed-shariff
Author

> I've ended up running ripgrep over my org-roam directory and only allowing files that contain todos

another option for this is to query directly from the database with org-roam-db-query or filter the nodes returned by org-roam-node-list
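
A minimal sketch of both variants, assuming org-roam v2 is installed and its database is up to date. The `my/...` function names are hypothetical; the `todo` column and the `(= todo "TODO")` clause follow the query form shown earlier in this thread:

```elisp
;; Sketch only: the `my/...' names are hypothetical helpers.
;; org-roam is required lazily so that this file loads without it.

(defun my/org-roam-todo-files-from-db ()
  "Return the files that contain a node whose keyword is \"TODO\".
This queries the org-roam database directly."
  (require 'org-roam)
  (seq-uniq
   (mapcar #'car
           (org-roam-db-query
            [:select [file] :from nodes :where (= todo "TODO")]))))

(defun my/org-roam-todo-files-from-nodes ()
  "Return the files that contain a node with any todo keyword.
This filters the structs returned by `org-roam-node-list'."
  (require 'org-roam)
  (seq-uniq
   (mapcar #'org-roam-node-file
           (seq-filter #'org-roam-node-todo (org-roam-node-list)))))
```

Either result could then be assigned to `org-agenda-files`, e.g. `(setq org-agenda-files (my/org-roam-todo-files-from-db))`. The database variant avoids building a struct for every node, so it is probably the cheaper of the two on large collections.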

@ahmed-shariff
Author

A quick update here, for some reason, after I upgraded my packages recently, the performance seems to be "ok". Started putting it into a package: https://github.com/ahmed-shariff/org-roam-ql

@ParetoOptimalDev

> > I've ended up running ripgrep over my org-roam directory and only allowing files that contain todos
>
> another option for this is to query directly from the database with org-roam-db-query or filter the nodes returned by org-roam-node-list

That's exactly what I do, for example:

(defun get-project-nodes ()
  (seq-uniq
   (seq-map
    #'car
    (org-roam-db-query
     [:select [nodes:file]
              :from tags
              :left-join nodes
              :on ( = tags:node-id nodes:id)
              :where (= tag "project")
              ]))))

@nicolas-graves

nicolas-graves commented Sep 2, 2023

I'm starting to delve into both packages, and I have a few questions @ahmed-shariff @alphapapa ;)

So basically, what org-roam brings is a polished and robust SQLite database used as a drastically faster cache: if configured properly, you should never have to parse any file, just query the SQLite database. In my experience, building the agenda view from files is noticeably slower, even though I haven't tried the org-ql agenda properly. The rest of the org-roam mindset can be set aside; users can do whatever they want with it. See this task management method, for instance:
https://d12frosted.io/posts/2021-01-16-task-management-with-roam-vol5.html

Why not consider reaching out to org-roam, proposing that they move the org-roam-db.el code into an org-sql-db package, and treating this as a common utility that could be one of the options for the cache in org-ql?

And then consider allowing the buffers-or-files argument to take an emacsql-sqlite-connection (in which case we would know the cache is SQL). Then there could be some work to adapt the normalizers to convert the query to an org-sql-db query (this could overcomplicate everything; I'm not sure how feasible it is) and to ignore preambles when org-sql-db is the cache.

Then an improvement to https://d12frosted.io/posts/2021-01-16-task-management-with-roam-vol5.html could be to use this kind of function in a hook to record wanted headlines "on write", as d12frosted suggests (this is code written by me, GPLv3+):

(defun rde-org-id-pred (h)
  "Predicate that matches which org headline should be saved in
`org-roam-db'."
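  ;; Normalize to a strict boolean: `rde-org-update-ids' matches the
  ;; result against a literal t, which a timestamp would not satisfy.
  (and (or (eq (org-element-property :todo-type h) 'todo)
           (org-element-property :scheduled h)
           (org-element-property :deadline h))
       t))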

(defun rde-org-update-ids (forgetp)
  "Set id properties on elements defined by `rde-org-id-pred' to
record them in `org-roam-db'."
  (let ((points (org-element-map
                    (org-element-parse-buffer 'headline)
                    'headline
                  (lambda (h)
                    (cons (org-element-property :begin h)
                          (or (funcall 'rde-org-id-pred h) (not forgetp)))))))
    ;; Update points in the reverse order to avoid moving upper headlines.
    (dolist (p (reverse points))
      (pcase p
        (`(,pt . t) (org-id-get pt 'create))
        (`(,pt)     (org-entry-delete pt "ID"))))))

This would register (or delete) each wanted headline directly in the SQL database when the org file is written, so we don't have to read the file again until we actually need to open it.

The SQL database could be set with a defvar, so that we don't have to worry about whether it's actually the org-roam database or another "custom" database. As for all of @ahmed-shariff's useful code about roam-ql buffers, it shouldn't be too hard to rewrite if we have proper access to all org-ql commands and set the buffers-or-files option to (org-roam-db).

And finally, if we manage to get it to work, we could set up org-agenda-commands to have an instant editing experience, without having to care about agenda files (only database entries).

@alphapapa
Owner

> Why not consider reaching out to org-roam, proposing that they move the org-roam-db.el code into an org-sql-db package, and treating this as a common utility that could be one of the options for the cache in org-ql?

Ok, why don't you do that?

@nicolas-graves

nicolas-graves commented Sep 2, 2023

I'm still trying to figure out whether that's actually feasible, which is why I'm contributing to this discussion (I'm not telling you what to do, just thinking out loud). While sending the message, I noticed that there's still a difference between the two caches' roles and management. I'm currently updating my previous comment to account for that.

@nicolas-graves

nicolas-graves commented Sep 2, 2023

Here's a flaw in the reasoning:

What I take from https://d12frosted.io/posts/2021-01-16-task-management-with-roam-vol5.html is the idea that with a database and a proper hook setup, you can basically get instant access to the elements you want. So if you know what you want to record, you don't have to parse anything. But that isn't exactly how the cache in org-ql works, since what you parse can be arbitrary and that's what you want to cache.

However, that is what I want from a custom-org-agenda-commands configuration. (parse all possible agenda files once, then update on write).

@ahmed-shariff
Author

> And then consider allowing the buffers-or-files argument to take an emacsql-sqlite-connection (in which case we would know the cache is SQL). Then there could be some work to adapt the normalizers to convert the query to an org-sql-db query (this could overcomplicate everything; I'm not sure how feasible it is) and to ignore preambles when org-sql-db is the cache.

There are a few other threads where I discussed some of this with @alphapapa that might be of interest to you, @nicolas-graves: #354 #334

It's also worth considering that with recent org releases, you would be able to comfortably use org-ql even with 1000s of files in a bare emacs setup.

@nicolas-graves

I released a tiny package based on the dynamic-agenda concept, which should integrate pretty well with org-ql: https://github.com/nicolas-graves/org-dynamic-agenda

So that's without org-roam's database, but I think once org-roam-ql-block is implemented, the same thing can be done with an agenda built with direct calls to the org-roam database.
