Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global history tree #1007

Closed
wants to merge 80 commits into from
Closed

Global history tree #1007

wants to merge 80 commits into from

Conversation

aartaka
Copy link
Contributor

@aartaka aartaka commented Oct 20, 2020

This changes the history to a more coherent and buffer-oriented model, turning browser history into one huge tree by default.

Motivation

This change is to make history handling coherent with recent changes (#910) and to open a way for new features (#247 (when we store session, we won't capture the history of the incognito buffers anymore, because these will have buffer-isolated history), #854, #958).

Adding to that, there is a change in user experience -- buffer-local histories are now connected to each other and do branch out from each other. This means that if you open a link in a new buffer, you'll be able to return to original page even in this new buffer. It's not conventional, but it's productive, and there are examples of extensions to other browsers (Firefox and Chrome) that organize tabs/histories into a tree for the sake of productive navigation.

What's New

  • History data (i.e. the thing accessed by (get-data (history-path (current-buffer)))) is a htree.
  • buffer-description is deleted in favor of history-entry.
  • web-mode relies on the global history, instead of having its own.
  • Some utilities (e.g. map-tree, do-tree, remove-node, find-node, find-data, delete-data, add-children) are added to the htree code.
  • Old history storage format is supported with close-to-intuitive importers (session is not imported because lots of deleted code it relies on).
  • (re)store methods can have arbitrary keyword arguments.
  • Finally, history-tree command to see the full tree in it's grotesque beauty xD
  • history-tree and buffer-history-tree are now colored/glyphed (with display-buffer-id-glyphs-p set to non-nil) to highlight the current buffer.
  • history-list to see the history as a list.
  • web-mode configuration slots for history navigation:
    • history-forwards-prompting-p -- whether history-forwards prompts the user on which branch to take in case there are several. Defaults to t.
    • history-forwards-to-dead-history-p -- whether history-forwars considers the dead history nodes worth visiting. Defaults to nil.
    • conservative-history-movement-p -- whether history nodes that don't belong to the current buffer are considerend visitable, i.e. conservative buffer-bounded navigation that we had before the patch. Defaults to nil.

Caveats

  • Our format for history serialization becomes unreadable with this patch. Maybe write a custom (de)serialization code for htree?

How Have This Been Tested.

  • Nyxt runs well on this branch.

How does that look to you?

EDIT: Add a note about how this helps #247.
EDIT: Update it with the recent changes as of 11/21/20 (map-tree, no more history modification on buffer switching).
EDIT: Mention history and session importers.
EDIT: Mention runnability and bugs.
EDIT: Update with the latest changes (internal buffer URLs, add-children, (re)store keywords).
EDIT: No more internal buffer URLs and hangs on history-backwards.
EDIT: history-tree.
EDIT: Colors, history-list, prompting history-forwards.
EDIT: history-forwards-prompting-p, history-forwards-to-dead-history-p, moving things to web-mode.
EDIT: Session is not restorable.
EDIT: find-node, remove-node.

@Ambrevar

This comment has been minimized.

libraries/history-tree/history-tree.lisp Show resolved Hide resolved
libraries/history-tree/history-tree.lisp Show resolved Hide resolved
libraries/history-tree/history-tree.lisp Outdated Show resolved Hide resolved
libraries/history-tree/history-tree.lisp Outdated Show resolved Hide resolved
source/browser.lisp Outdated Show resolved Hide resolved
source/buffer.lisp Outdated Show resolved Hide resolved
source/buffer.lisp Outdated Show resolved Hide resolved
source/buffer.lisp Outdated Show resolved Hide resolved
source/history.lisp Outdated Show resolved Hide resolved
source/session.lisp Outdated Show resolved Hide resolved
@Ambrevar
Copy link
Member

Wow! Yet another impressive patchset by @aartaka! :)
It's a lot to take in so I'll test it for a little while and report later.

@Ambrevar
Copy link
Member

Ambrevar commented Oct 20, 2020 via email

@aartaka
Copy link
Contributor Author

aartaka commented Oct 20, 2020

Can you explain how this change would help with incognito mode?

This will make session less coupled to global browser state. When we store session (global history, actually), we won't capture the history of the incognito buffers anymore, because these will have only buffer-isolated history.

Would web-mode have a separate history, it should be kept in sync with the global history, right? Wouldn't that be trickier? What are the benefits of a separate web-mode history? What's the link with auto-mode "get-data interface"?

web-mode used to have a separate history to enable per-buffer tree-like history, while global one was stored as hash-table keyed by URL-strings. Now that we have a tree-like history everywhere, we get buffer-local history tree for free. We don't need separate history-handling in web-mode then.

get-data and auto-mode example is there because, like in #910, we turn a browser-global session into a (possibly) buffer-local.

If the history tree is global, isn't it stateful by definition? Can you provide an example?

Sure. make-buffer sets htree:current of history to htree:root for non-child buffers. If one creates a non-child buffer (call it A) in the background and then navigates to some page on current buffer (call it B), then the new history node for B will be inserted to the root of the tree instead of it's dedicated branch. No matter what happens with A, B's history is now in two places: in the root and its dedicated branch. This doesn't seem right.

Another question: With this new global history, nodes without buffer are deleted. This means the global history gets automatically cleaned up on buffer deletion, which was not the case before when our global history used to persist everything until the end of time (or until the user called delete-history-entry). So what about making the automatic clean up an option disabled by default? Also maybe make it conditional to a duration, e.g. only delete the entry if it's older than 1 week.

Yes, this sounds fair.

@Ambrevar
Copy link
Member

Ambrevar commented Oct 21, 2020 via email

@Ambrevar
Copy link
Member

Ambrevar commented Oct 21, 2020 via email

@aartaka
Copy link
Contributor Author

aartaka commented Oct 21, 2020

OK, could you document this in your patch set? Thanks!

Done :)

So there is no benefits in having a separate history in web-mode, or is there?

There's no benefit anymore, now that we have tree-like history everywhere :)

How is can the session be buffer-local? Do you mean that a buffer can individually decide whether it belongs to the global history or not?

In some sense. But what I meant by buffer-local session is that now we can isolate history in buffers (like we can isolate auto-mode-rules).

Potentially, we can have several buffers that store their common history-tree to some buffer and then persist it to the session file. So, yes, session (in the sense of the composite history of several buffers) can become buffer-local too :)

Indeed. Off the top of my head, here is what I suggest: Make the history as immutable as possible. This might require some (interesting!) changes to the htree library.

I'm not sure how this htree:current modification can be solved by immutability of a history. Can you expand on that, please?

  • Instead of htree:current, store the current node buffer-locally.

This is the way to solve the problem with these additional branches, I believe. Having buffers reference the parent buffers is an option too.

  • Only delete nodes in the the delete-history-entry command.
  • Maybe leverage an immutable data structure like fset:seqs. This means that when you delete a node it creates a new tree which you assign to the global history tree variable. Parallel operations are thus not impacted by node deletion.

Again, I don't quite get how this could help the problem with unnecessary history modification after new buffer creation.

  • If you don't go for an immutable data structure, you'll need to protect the htree accesses with read/write locks.

We lock data on its modification using with-data-access already, don't we?

@aartaka
Copy link
Contributor Author

aartaka commented Oct 21, 2020

Another use-case question: What if the user wants to decouple the global history from the list of buffers to restore (the session)?

  1. I want to restore no buffers or other buffers but keep the same global history.
  2. Vice-versa: I want to restore the buffers but discard the global history.

Yes, that's a good use-case. What I see as options here are:

  • emptying the history-entry id-s of the buffers one doesn't want to save to session. This way one can restore the buffers they have previously chosen to save, and nothing else.
  • Adding restore-buffer-p slot to history-entry (it's not optmal, because we'll need to set it for all the entries of the same buffer).
  • Restrain from modifying id-s in history-entry-es, so that one could always pick the suitable set of buffers on session restoration. This require us to traverse the history for the maximum id to make sure new buffers don't collide with the ones already in history. This is sub-optimal.

Adding to that, we can prompt the user for the list of buffers to restore. We have synchronous minibuffer now, so it shouldn't be a problem anymore. (By the way, we should use it in auto-mode now!)

I think this question is the same as my previous remark about automatic node deletion. If we don't delete history nodes when we delete buffer, we get this decoupling between the buffer list and the global history, which seems ideal. Thoughts?

Yes, that's fair. Deleting nodes seems to be a premature optimization that hurts one's experience more that it enhances it :)

@aartaka
Copy link
Contributor Author

aartaka commented Oct 21, 2020

OK, I believe I've covered most of review points, but there's nothing else except that. I'll get to experiments and fixing CI/tests once I can run Nyxt again :)

@Ambrevar
Copy link
Member

Ambrevar commented Oct 21, 2020 via email

@jmercouris
Copy link
Member

jmercouris commented Oct 21, 2020

I suggest that we do not worry about the question of performance just yet. Let us first hit the wall, and then do some optimization at that time. There are many potential solutions, we can for example chunk the history to a running 30 day window.

@Ambrevar
Copy link
Member

Ambrevar commented Oct 21, 2020 via email

@Ambrevar
Copy link
Member

Ambrevar commented Oct 21, 2020 via email

@aartaka
Copy link
Contributor Author

aartaka commented Oct 22, 2020

Use what? Synchronous minibuffer? Sorry, I forgot what for!

To prompt the user with the list of modes to enable for a matching page. Because rules can have some modes user doesn't actually want to enable this time.

@aartaka
Copy link
Contributor Author

aartaka commented Oct 22, 2020

Having buffers reference the parent buffers is an option too.

Can you elaborate?

Sure. If we store the parent buffer id/reference as a slot of child buffer, then we won't need to change the htree:current to start an independent buffer history -- we can just look up the parent buffer in its slot:

  • If there is a parent buffer, then branch history out from the parent buffer's current history node,
  • If there's no parent buffer, branch out from tree root.

Completely forgot about that! :) So if we don't have any risk of race condition, how can the problem you described occur?

It's not a race condition, it's improper data modification. Again, when new buffer is created, the htree:current is set to htree:root so that this new buffer can start its history from root as an independent branch. The problem is that if we don't switch to the new buffer and continue to use the old one, the old buffer history will continue from the root too. This is not always the behavior we'd expect.

Everything is properly locked and no race conditions occur, it's just that the data is modified in a way that can shoot one in the foot.

@Ambrevar
Copy link
Member

Ambrevar commented Oct 23, 2020 via email

@Ambrevar
Copy link
Member

The last commit mixed up changes to libraries/htree and source/.
Can you keep them separate?

  • htree

    • find-data and find-child are too similar, it's confusing.
      find-child was poorly named. I suggest we rename it go-to-child.

    • found is undefined in find-data (should be match).

    • find-data is not optimized: htree:all-nodes first turns the tree into a
      list, then you call find over this list. It'd be more efficient to walk the
      tree and return on first match. See traverse for inspiration.

    • delete-node seems to be almost the same as delete-child except that it
      works over the whole tree.
      If I understand correctly:

      (delete-node data history) == (delete-child data (root history))

      If so, then I suggest we remove delete-node.

  • session

    buffer-local-histories-alist is undefined.

  • buffers-set

    About your comment "TODO: use setf-method?" you can't, because the hash-table
    is not set when a hash table entry is set.

  • history-add and equals

    If buffer A visits FOO and buffer B goes to FOO, then maybe-entry won't match
    the existing FOO entry because the buffer ID is different.

    This means that there will be as many FOO entries as there are buffers with
    different IDs that visit it. Is it what we want?

  • buffer-local-history-clean:

    I suggest we only clean up the history from the delete-history-entry instead.

  • set-current-buffer fails

    It fails to start because the (with-data-access history ...) in
    set-current-buffer binds history to NIL if there is no history file, or if
    started with the private buffer.

    Shouldn't we do a NIL check in with-data-access?

  • Backward compatibility:

    This change alters the content of both the session and the history. We need
    to include importers to avoid disrupting our users too much.

  • htree:all-nodes errors out on empty history.

    Code should be

    (defmethod all-nodes-data ((history history-tree))
      "Return a list of all nodes data, in depth-first order."
      (mapcar #'data (all-nodes history)))
  • history-add

    Seems to fail when no history exists.

@Ambrevar
Copy link
Member

I wasn't able to load any URL, there are more issues with history-add (last point).
I'll give it another shot later.

@aartaka aartaka marked this pull request as draft October 29, 2020 09:15
@aartaka aartaka force-pushed the global-history-tree branch 2 times, most recently from 9c8a515 to 3c8ee45 Compare November 19, 2020 10:17
@aartaka
Copy link
Contributor Author

aartaka commented Nov 19, 2020

  • delete-node seems to be almost the same as delete-child except that it
    works over the whole tree.
    If I understand correctly:
(delete-node data history) == (delete-child data (root history))

No, that's not the same. delete-child looks at the children of the
(current history-tree) only, while delete-node looks at all the
nodes of the tree.

The purposes of these functions are quite different, in fact:

  • delete-child is a low-level function, in line with add-child and
    go-to-child (formerly find-child). Analogy: rplacd.
  • delete-node is a bit more higher-level utility. Analogy:
    delete-if. In an ideal world delete-node should rely on
    delete-child, like delete-if may rely on rplacd.

No need to remove delete-node then.

  • history-add and equals
    If buffer A visits FOO and buffer B goes to FOO, then maybe-entry won't match
    the existing FOO entry because the buffer ID is different.
    This means that there will be as many FOO entries as there are buffers with
    different IDs that visit it. Is it what we want?

I believe it is, unless we turn history-tree into history-graph :)

The notion of history-tree is dependent on node ordering. To get back
in the history, one needs to look at the parent of the current
node. If we don't create new nodes as children of existing ones (by
refreshing some other node, for example), then this structure breaks
down and the only way to get back in history we are left with is a
sub-optimal whole tree traversal.

Creating "duplicate" nodes for buffers may consume more computer
memory, but it's worth the price:

  • "Duplicate" nodes represent how the browsing happens and how
    sessions need to be stored.
  • These nodes are easier to traverse back and forth, given that it's
    simple htree:back/htree:forward call.
  • This preserves the behavior of web-mode buffer-local
    history-tree. It had some duplication too, and it was a useful
    duplication.
  • buffer-local-history-clean:
    I suggest we only clean up the history from the delete-history-entry instead.

Another option I lean to is to have a *browser*-based switch for
whether to clean history.

  • set-current-buffer fails
    It fails to start because the (with-data-access history ...) in
    set-current-buffer binds history to NIL if there is no history file, or if
    started with the private buffer.
    Shouldn't we do a NIL check in with-data-access?

I'm not sure. If we don't make this check, we can modify a possibly
empty data. This can come in handy (bookmark-add uses it already),
but then all the data becomes non-obviously typed as (or null type).
This is not okay either. All in all, we do NIL checks sometimes, as a
price for flexibility, so why not stay on the flexible side there :)

A question to ask there is: how do we modify a truly empty history
(e.g. on the first Nyxt start)? If we do a NIL check, we won't be able
to modify it.

  • Backward compatibility:
    This change alters the content of both the session and the history. We need
    to include importers to avoid disrupting our users too much.

That sounds about right, but currently we store only a flat hash-table
in history and a list of history-trees in session. Importing history
won't provide enough information to construct a meaningful
history-tree, same for session -- flat structures (list/hash-table)
just cannot store the node-interrelation information that tree needs.

It's radical to just discard all the history, but aren't we moving
towards 2.0? :)

(mapcar #'traverse (next-traversal node)))))))
(if include-root
(traverse root)
(apply #'append (mapcar #'traverse (next-traversal root))))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serapeum has map-tree and walk-tree, can't we use them instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serapeum functions walk the cons cells (because lists are trees too). To use them, we need to transform our trees (represented by history-tree class) into lists. That will probably cost more than using a dedicated function, so there's no benefit in using these.

But sera:map-tree hooked me too, so I might need to see it's source to get some inspiration :)

@Ambrevar
Copy link
Member

Ambrevar commented Nov 20, 2020 via email

@aartaka
Copy link
Contributor Author

aartaka commented Nov 20, 2020

OK, but then I have a question: while we have an existing buffer A with some history, when a new buffer B is created from scratch, how is its history going to start from the root? It seems to me that history-add will add the new history entry of B to the first child of the current history node, which is what A is visiting. Am I missing something?

The implementation is still in writing, so you might've found something I haven't noticed. What it should work like is that when current buffer is switched, current history node is changed too, so that the history of new buffer will start from the proper node. But then, what node should we use if new buffer has no history yet? There are several options:

  • Use the node that this buffer will start it's history from. In case of B, it's root node. This can be useful, but it can also lead to undesirable side-effects on such nodes (e.g. access time of root will be refreshed without user actually visiting it).
  • Create surrogate node. This is bad, because it will make us juggle the surrogate nodes and distinguish them from normal nodes somehow.
  • Refer to the initial history node (e.g. root in in case of buffer B) in the buffer itself. For example, I have a following slot definition that I use for that:
(initial-history-node (htree:current (get-data (history-path (current-buffer))))
                      :type (or null htree:node)
                      :documentation "The history node that this buffer history starts from.
Used to make sure children buffers properly branch from parent buffers in global history.")

As you can see, I've already picked a solution (and you actually proposed it a month ago), but I'm still searching for something more elegant :)

Another option I lean to is to have a *browser*-based switch for whether to clean history.

Can you describe an example use?

nohistory-mode. But then this switch should be a buffer-based one.

The *browser*-based switch was just a customization option for the ones that care about performance of this history tree, but buffer-based option sounds better now when I've thought about it.

I don't understand how bookmark-add is making use of a possibly NIL bookmarks database. Can you explain? Since this is a common use case, what about extending with-data-access to set the bound variable to a default if NIL? A bit like gethash does.

The way that bookmark-add uses NIL value is that it push-es a new bookmark to a list of bookmarks. If it pushes it to NIL, it just creates a list of one element. This behavior is totally intuitive there for bookmarks stored in list. However, NIL-handling can be clunky on non-list values (e.g. hash-tables, history-trees), so your idea on initial value resonates with me!

Not a problem: we can make a "flat tree": a tree of depth 1 where all previous history entries are direct child of the root.

Yes, I thought about it too. One possible drawback is that going forward in history from the root will overwhelm the user with branches that they didn't explicitly create and/or restore from somewhere. Is that enough of an inconvenience? I'd say it is, because it's not mimicking the browsing that user have done.

Losing history of thousand entries is not an option either...

By the way, did you know that the first element of the persisted history / session is the version number? This is convenient to write importers since it allows you to do version checks.

Yes, that's smart :)

@Ambrevar
Copy link
Member

Ambrevar commented Jan 6, 2021

* `history-forwards-query` still triggers a condition (at least when
  running from the REPL).

It errored even before the patch, so I just though it's deliberate :) I can fix
it, though.

OK, I remember now. Off the top of my head, I think I decided to raise a
condition because back then the minibuffer was asynchronous.
I'll look into it.

There is a structural issue with the tree.

1. `make-buffer`'s argument `child-p` is not used consistently.  It seems
   to be only used in `make-buffer-focus` and `set-url-from-bookmark`,
   which is odd since in my opinion it should not be.  However it is not
   used in `%follow-hint-new-buffer` where I would have expected it.
   Misunderstanding, @aartaka?

No, I just forgot about %follow-hint-new-buffer 😅

But what about make-buffer-focus' and set-url-from-bookmark', why to they pass
:child-p t?

We can re-set the ID of the current-history-node of buffer, when we switch to
it. This way, buffer history will always have at least one entry.

Sorry, I don't understand. Can you provide an example?

Yes, that's reasonable. However, what about going more than one node backwards?
E.g. what if there was another page you visited before https://ambrevar.xyz and
you wanted to get there?

I think it's not a problem.

Consider this:

- (ID 2) (URL https://duckduckgo.com)
  - (ID 2) (URL https://ambrevar.xyz/)
    - (ID 3) (URL https://ambrevar.xyz/projects/index.html)

Now in buffer three, if you go back twice, you'd get

- (ID 2) (URL https://duckduckgo.com)
  - (ID 2) (URL https://ambrevar.xyz/)
    - (ID 3) (URL https://duckduckgo.com)
      - (ID 3) (URL https://ambrevar.xyz/)
        - (ID 3) (URL https://ambrevar.xyz/projects/index.html)

But now that I think about it, it's not what we want to do because then we would
lose the information "from which buffer and which URL did buffer 3 spawn".

Maybe our htree data structure is not shaped well for what we are trying to
achieve.

The problem looks to me like we need a nested history-tree:

  • The parent tree (not necessarily a htree) holds the inter-buffer relationships.

  • The child trees hold the buffer-local history.

htree would need to be updated to support adding parents above the "original node". This
means the history-tree class would have one more slot:

  • root
  • current
  • original

Thoughts?

@aartaka
Copy link
Contributor Author

aartaka commented Jan 6, 2021

OK, I remember now. Off the top of my head, I think I decided to raise a
condition because back then the minibuffer was asynchronous.
I'll look into it.

I gave it a shot in 2fa8152. Is that enough of fix?

But what about make-buffer-focus' and set-url-from-bookmark', why to they pass
:child-p t?

Oh, you mean that! make-buffer-focus is usually used in situations when the foucsed buffer is dependent on current one:

  • paste-or-set-url
  • set-url
  • open-file-function
  • %follow-hint-new-buffer-focus
  • set-url-from-bookmark-new-buffer
    All of these need to be connected to the curren history node to faithfully represent one's browsing. That's why make-buffer-focus passes :child-p by default. This, however, doesn't mean that it will do so in any case, it's just an observation of current behavior :)

Regarding set-url-from-bookmark -- the argument is the same -- we need to create child buffer to preserve the browsing order and user's mental picture of it.

However, I can be terribly mistaken about what actions should branch out from the current node. It's too subjective. Maybe we can make it configurable?

Sorry, I don't understand. Can you provide an example?

Based on your example, if user goes back and history looks like that,

  - (ID 3) (URL https://ambrevar.xyz/)
    - (ID 3) (URL https://ambrevar.xyz/projects/index.html)

They can always switch buffer and current-history-node of buffer 2 (set to the node with "https://ambrevar.xyz") will be rewritten to have ID equal to 2, i.e.

  - (ID 2) (URL https://ambrevar.xyz/)
    - (ID 3) (URL https://ambrevar.xyz/projects/index.html)

This way we won't occupy any more space in the tree while still preserving buffers' history, or at least current-history-node that history can be expanded from. Done in f49a60f.

Maybe our htree data structure is not shaped well for what we are trying to
achieve.

The problem looks to me like we need a nested history-tree:

* The parent tree (not necessarily a htree) holds the inter-buffer relationships.

* The child trees hold the buffer-local history.

htree would need to be updated to support adding parents above the "original node". This
means the history-tree class would have one more slot:

* root

* current

* original

We had this idea of buffer-tree holding history-trees on the initial discussion stage xD

This also sounds like one day history-tree will become history-graph 🤔

But my main impression is that it's not that dramatic and we can handle these things with the current implementation of htree. It can always be rewritten later. Let's just try to debug this implementation and rewrite it in the future if we hit some htree grave limitations :)

@Ambrevar
Copy link
Member

Ambrevar commented Jan 6, 2021 via email

@aartaka
Copy link
Contributor Author

aartaka commented Jan 7, 2021

In my opinion, we should branch out only when - Cloning a buffer (for which we don't have a command yet). - Opening a link in a new tab (mouse click of link-hint). All other commands would start from the root.

Yes, that makes sense.

I'm confused: are you saying that the state of the tree depends on which buffer is currently showing? If so, it seems problematic to me. Also maybe you misunderstood the problem I was referring to: if you restore the first history, you'll only have 1 buffer, while you had 2 initially.

Yes, that's problematic.

It seems to me that it's fixable if we don't alter the history of other buffers. However, we need a way to expand a history from it's root, and we need to have a slot that refers to the node from which the history was branched out. In particular, a buffer can jump to anywhere in the global history, including to an unrelated branch. So if we expand this history, we must be able to reconstruct the whole path from the root to the distant node. Another approach: instead of storing a single ID, store the a list of IDs in the history entries. This involves a serious design change though. I'll sleep over it.

The idea of storing all IDs is good, but restoring sessions will become somewhat random, because one last-access stored in history-entry won't be enough to understand when a praticular buffer visited it and to what buffer it should be restored in particular. We need to store all buffer ID and the respective last-accesses these buffers have made to the entry. This way session will be restored solidly, but it'll probably incur a big performance slowdown, even it we store it effectively, e.g. in hash-tables (id -> last-access, for example).

EDIT: senSe.

@Ambrevar
Copy link
Member

Ambrevar commented Jan 7, 2021

I've slept over it and came up with some (hopefully) useful insights!

But first let me answer your remaining points.

Hmmmmmm, I've already made a workaround for that in make-buffer-from-history, I believe:

(setf id (case id
                     (root-id new-id)
                     ;; This is to escape collisions of the new buffer ID with old buffer IDs.
                     (new-id root-id)
                     (t id)))

Doesn't it work for you?

Sadly not, but in any case I don't think this is the right approach: we should
not have collisions at all, because we should remember the whole state (which
buffers browsed which history node).

For now, we just delete existing buffers on session restoration. It's somewhat
harsh, but that's the only way I see to make it coherent. Session restoration
replaces the tree and de-syncs it with the opened buffers, so the best we can do
is delete them. df2f46c is the commit it starts from.

Maybe we don't have to go down this (fragile) road if we keep the whole state.
More on that in my next message.

* `delete-history-entry` over multiple entries fails (at least sometimes).

Maybe it's because we use #'equals in delete-data? Should we use #'eq to
ensure that it's the same object that's deleted?

I am not sure, I haven't studied this part well enough.
My intuition is that we should

  • never delete any entry, only pop it back up towards the root;
  • not alter any node that has at list one child bound to a buffer.

Yes, that's reasonable. However, what about going more than one node
backwards? E.g. what if there was another page you visited before
https://ambrevar.xyz and you wanted to get there?

Sorry, you were right, you can't go back up multiple times if we insert nodes.
Forget it, this is a bad approach.

I gave it a shot in 2fa8152. Is that enough of fix?

Should work, I'll test later.


When I said:

In my opinion, we should branch out only when

  • Cloning a buffer (for which we don't have a command yet).
  • Opening a link in a new tab (mouse click of link-hint).

All other commands would start from the root.

maybe I was too opinionated and you are right, the decision of branching is
subjective. Make it configurable then? That shouldn't be a problem, and we can
probably worry about this later.

The idea of storing all IDs is good, but restoring sessions will become somewhat
random, because one last-access stored in history-entry won't be enough to
understand when a praticular buffer visited it and to what buffer it should be
restored in particular. We need to store all buffer ID and the respective
last-accesses these buffers have made to the entry.

Thank you for mentioning that, I had missed this point!
There is another point of contention that bugged me for a while and that
stopped me from exploring this path further: currently htree's "forward child"
is the first of the children, so there is contention there too. If 2 buffers
share the history, and one buffer changes its forward child, the change will be
made on the other buffer as well, which is not what we want. So we need to
store the association (ID forward-child).

Draft coming up just now.

@Ambrevar
Copy link
Member

Ambrevar commented Jan 7, 2021

Let's recap.

What we want to keep in memory:

  • The inter-buffer relationships, and more precisely from which history node a
    buffer was spawned.

    (This is also one of the reasons we cannot easily insert new nodes above a
    buffer-local history root.)

  • Which history nodes have been visited by a given buffer.

  • The forward-children of each buffer-local history.

  • The last-access of each buffer-local history.

Potential issues we need to deal with:

  1. Try to keep the node count small.
    Having separated history nodes for each buffer is probably too much.

  2. List the buffer history nodes without browsing the whole tree.
    This would make many operations faster.

  3. Last all unique URLs without browsing the whole tree.

For this last issue (3) I guess maintaining a hash-table like we have on master,
side-by-side with the GHT is good enough. But let's talk about it later.

Let me know if you can think of anything else.


I think the core of the problem we are facing is due to us trying to use htree
for something it was not meant to do, i.e. htree was designed for local,
contention-free histories, not for global histories.

So I suggest we overhaul the library to turn it into an actual GHT library. To
that end, I think we need to introduce a new concept, that of owners.

An owner is a unique object that "owns" (or visited) a node. In our case,
buffer IDs are good contenders because they are easy to serialize.

The history-tree current node makes no sense, instead we need to store the
current node for all individual owners, and list all owners in history-tree.

Finally, as mentioned in the previous post, the node children order should not
matter to know which is the "forward child". Instead, we need to keep a
map between the node owner and the next child.

This would give us the following data structures:

(defclass node-info ()
  ((parent-owner :accessor parent-owner
                 :initarg :parent-owner
                 :initform nil
                 :type t)
   (forward-child :accessor forward-child
                  :initarg :forward-child
                  :initform nil
                  :type (or null node)
                  :documentation "Which of the `children' (in a `node') is the
child to go forward to for this owner.")
   (last-access :accessor last-access
                :initarg :last-access
                :initform (local-time:now)
                :type local-time:timestamp
                :documentation "Timestamp of the last access to this node by the
owner."))
  (:documentation "Internal node of the history tree."))
  
 (defclass node ()
  ((parent :accessor parent
           :initarg :parent
           :initform nil
           :type (or null node))
   (children :accessor children
             :initarg :children
             :initform nil
             :type list
             :documentation "List of nodes.")
   (owner-node-info-map :accessor owner-child-map
                    :initarg :owner-child-map
                    :initform (make-hash-table)
                    :type hash-table
                    :documentation "The key is an owner, the value is a
`node-info'.  This slot also allows us to know to which ownder a node belongs.")
   (data :accessor data
         :initarg :data
         :initform nil
         :documentation "Arbitrary data."))
  (:documentation "Internal node of the history tree.")) 
  
 (defclass owner-data ()
  ((origin :accessor origin             ; TODO: Rename to `root'?
           :initarg :origin
           :type (or null node)
           :initform nil
           :documentation "The first node.")
   (current :accessor current
            :type (or null node)
            :initform nil
            :documentation "The current node.
It changes every time a node is added or deleted.")
   (nodes :accessor nodes
          :initarg :nodes
          :initform (make-hash-table)
          :type hash-table
          :documentation "The set of all unique nodes visited by an owner."))
  (:documentation "Owner information.")) 
  
 (defclass history-tree ()
  ((root :accessor root
         :initarg :root
         :type (or null node)
         :initform nil
         :documentation "The root node.
It only changes when deleted.")
   (owners :accessor owners
           :initarg :owners
           :initform (make-hash-table)
           :type hash-table
           :documentation "The key is an owner, the value is an `owner-data'."))
  (:documentation "History tree data structure.")) 

This effectively remembers the whole state (all the points in the first list of
this message).

Let me know if you see something missing.

The node count should be as small as possible since the nodes are shared
every time they can.

We can list the buffer history nodes without browsing the whole tree by
inspecting owner-data nodes.

2 other challenges:

  • Owner removal: remove the owner (buffer ID) from all the
    owner-node-info-map, then remove it from history-tree owners.

    How do we deal with owner-less nodes? I suggest that we reparent nodes to the
    root only when they have no owner and none of their children has.

    Indeed, nodes can be owner-less but still some child node may have an owner.
    In this case, we don't want to reparent the node otherwise the owner would
    lose access to the global "parent" history.

  • Session restoration: Restore the tree, loop over the owners and for each of
    them create a buffer, then loop over the corresponding nodes and replace the
    ID with the one of the new buffer.

The benefits of moving the whole logic to this library is that the code will be
much easier on the Nyxt side. Plus we can test the full logic in just the
library.

Let me know what you think!

@aartaka
Copy link
Contributor Author

aartaka commented Jan 7, 2021

Wow! That ticks all the boxes for me :)

It also resembles UNIX filesystem access model. Coincidence or pattern 🤔

@Ambrevar
Copy link
Member

Ambrevar commented Jan 7, 2021 via email

@aartaka
Copy link
Contributor Author

aartaka commented Jan 7, 2021

Tinkering with htree was fun, but I believe that you're in a better position to make it right. I can integrate it Nyxt afterwards :)

@Ambrevar
Copy link
Member

Ambrevar commented Jan 7, 2021 via email

@Ambrevar
Copy link
Member

Ambrevar commented Jan 8, 2021

I've pushed a draft of the overhauled history-tree library on the
history-tree->ght branch.

I've updated the naming and the slots a little bit compared to the above draft.
In particular, the parent-owner was moved from node-info to owner-data
(now called owner). Indeed, the parent-owner is set once and for all for an
owner at creation time.

Now I'm facing the main challenge: how do we handle owner and node removal?

I took these notes:

Garbage-collect nodes where no recursive children has any owner.
What should we do with them? Store them in a stash in `history-tree'.
If all nodes are always stored in this stash, then it's enough to just
unbind the first top node without owned child.

Or maybe we don't garbage-collect anything on owner deletion.

The stash can NOT give us a convenient way to access all the unique node
data, because the bindings allow 2 nodes with the same data to coexist in a set.
Solution: node's data' should be a class with 2 slots: payload' and nodes'. Then the data' can be held in the stash only.

We need to add a function to cleanup the stash.

  • If it has only disowned nodes not in the tree, easy.
  • If it has disowned nodes in the tree, we need to unbind all these nodes.
    This problem does not exist with the alternative mentioned below.
  • It it has owned nodes, we need to disown them first.

Function to cleanup the nodes owned by an owner? Add 1 function to disown all
nodes but the current nodes. This can also be done node-by-node. If
done on current node, go to parent or child, and if there is none, remove
owner.

The drawback of the above is that it gets hard for the user to delete
disowned nodes.

Alternative: Browse the nodes of the deleted owner, unbind each disowned
node, and for every owned node that has a disowned parent, rebind it to
the root. We lose more information this way, but it's easier.

So for now I'm leaning towards deleting the nodes when they are disowned.
All the owned children are reparented to the root. This way there can never be
"holes" in the middle of tree.

This loses some information, but I think it's saner, because practically how
would the user delete disowned nodes otherwise? There is no buffer to access
them anymore. We could add a command that displays the disowned modes, but this
seems tedious and I doubt any user will ever bother with this. As a result, the
history tree would keep growing constantly, which could quickly become
overwhelming.

Bonus: Adding a stash of all known data (for us: URLs + titles) is a convenient
way to import the old history format.

So here are my questions:

  • Do you like the idea of storing the data/payload not in the nodes but in a
    set instead?

  • Do you like the idea of immediately removing the disowned node, reparenting
    all the (owned) children to the root?

Thoughts? @aartaka @jmercouris

I need to sleep on it, I'll work on a draft tomorrow.

Let me know if anything is not clear and I'll explain (maybe some example would
help).

@Ambrevar
Copy link
Member

Ambrevar commented Jan 9, 2021

Another question: how do we delete individual history entries?
For disowned nodes, it's easy if they are stashed: just remove the nodes from
the stash

But for owned nodes (or disowned nodes that would still be in the tree if we
choose this strategy), it's more complex because we would need to "mark" the
node as deleted and delete it when it gets disowned. However, we would need to
remove the mark if the node is visited again.

Does that make sense?

Otherwise, we could forbid owned node deletion, and only offer to delete
disowned nodes. This could be a bit more limiting, e.g. if a user wants to
delete all Wikipedia entries but one buffer (in the background) still owns a
wikipedia entry, then this one will persist after the deletion, which could be
confusing to the user.

Thoughts?

@aartaka
Copy link
Contributor Author

aartaka commented Jan 9, 2021

So here are my questions:

* Do you like the idea of storing the data/payload _not_ in the nodes but in a
  set instead?

Yes, I do like that! Reminds of smart Linux Kernel linked lists hack :) We will save some memory this way, and it will probably be faster.

* Do you like the idea of immediately removing the disowned node, reparenting
  all the (owned) children to the root?

It breaks the mental picture of history, but it seems to be the best solution given the way history is built. I'm still trying to wrap my head around what's on the branch, though.

@Ambrevar
Copy link
Member

Ambrevar commented Jan 9, 2021 via email

@jmercouris
Copy link
Member

My first thoughts:

  • I do not like the idea of storing the data outside of the nodes and in another structure, I find it confusing. If you want to have fast access to nodes and their attributes, you can maintain a set of pointers in parallel.
  • Instead of reparenting to the root, I would suggest reparenting to the parent of the deleted node

Given tree A->B->C

If we delete B, then the new parent of C should be A.

  • how do we delete individual history entries? (options)
    I think we should forbid it.

@jmercouris
Copy link
Member

I hope I have understood the problem/questions correclty :-D

@Ambrevar
Copy link
Member

Ambrevar commented Jan 9, 2021 via email

@aartaka
Copy link
Contributor Author

aartaka commented Jan 9, 2021

What about adding a `detach-history' command which resets buffer histories by creating a new branch of all the owned nodes? This would effectively detach the history for this buffer, thus possibly freeing the bigger branch it was relying on, along with its dead nodes. We could add multiple-selection to this command, thus allowing to detach the histories of all buffer, which would effectively free all the dead nodes. Then these nodes can be deleted from the stash.

I thought about the same thing, but more in terms of mode for history isolation. Mode will make it visual that the history of the current buffer is not connected to the main one, and it will also hide all the detachment details behind something simpler, like isolate-history-mode or local-history-mode activation command. This will also free the user from choosing the buffer they want to detach, because it is most probably the current one. Extended command version can have minibuffer with multiple selection, though.

@Ambrevar
Copy link
Member

Ambrevar commented Jan 9, 2021

I am not sure about modes, because what happens when we deactivate the mode
then? We can't reattach to the original branch, can we?
Indeed, the branch could have been garbage-collected (i.e. no buffer was owning
any node in the original branch).

I think a command would address all your concerns:

  • The detachment details can be hidden.
  • We can make specialized commands that act on the current buffer.

But maybe I misunderstood what you meant.
Do I make sense?

@aartaka
Copy link
Contributor Author

aartaka commented Jan 9, 2021

Do I make sense?

Yes you do! I agree with all the points, especially the one about mode deactivation.

@Ambrevar
Copy link
Member

Ambrevar commented Jan 9, 2021 via email

@Ambrevar
Copy link
Member

I think I've gotten around all the design issues (maybe minus the thread-safe
aspect).

I've started testing.
It still needs some polishing, shouldn't take too long now.

@Ambrevar
Copy link
Member

I'm almost done with the history-tree overhaul!

What remains to be done:

  • Test owner deletion.
  • Implement owner reparenting.

This is not urgent though and Nyxt can already make use of the new API (it's
just that the tree will never stop growing :p).

@aartaka: Would you like to try to integrate it in Nyxt?

To go forward, we can either keep working on the global-history-tree branch, or
maybe start from a new branch since the former pull request is starting to get
very long. In this case, maybe we don't have to reuse all the commits in
global-history-tree since many of them are obsolete now. As you prefer, @aartaka.

A few questions, for @jmercouris and @aataka:

  • Shall we name the library global-history-tree? Then we would lose
    convenient access to the Git history.

    Or maybe shared-history-tree would be more accurate. Thoughts?

  • Shall we rename the history-tree class to history?

  • Should we have different functions for finding nodes vs. "owned nodes",
    or pass an option as key argument?

  • Same question for visiting "unowned" nodes: shall I add an extra argument to
    forward and back, or shall I make forward-unowned and back-unowned?
    If the latter, suggestions for better names? Maybe rename forward to
    forward-owned instead, and call the "unowned" version forward?

@aartaka
Copy link
Contributor Author

aartaka commented Jan 12, 2021

@aartaka: Would you like to try to integrate it in Nyxt?

Sorry, it seems I've underestimated my time and workload with life and QtWebEngine :) Wouldn't you mind integrating it yourself? It's also going to be faster, I guess.

To go forward, we can either keep working on the global-history-tree branch, or
maybe start from a new branch since the former pull request is starting to get
very long. In this case, maybe we don't have to reuse all the commits in
global-history-tree since many of them are obsolete now. As you prefer, @aartaka.

The choice is yours, but I agree that the current pull request is uncomfortably long.

* Shall we name the library `global-history-tree`?  Then we would lose
  convenient access to the Git history.
  Or maybe `shared-history-tree` would be more accurate.  Thoughts?
* Shall we rename the `history-tree` class to `history`?

Regarding history-tree vs. history: I'd leave -tree there, because it's more of a data-structure library/class than just history-management one. Regarding (shared)|(global)-?history-tree: Not sure about that, but staying with history-tree sounds fine.

* Should we have different functions for finding nodes vs. "owned nodes",
  or pass an option as key argument?

Maybe we can make it a method dispatching on either tree, node, or owner? In the latter case it can search for owned nodes only.

* Same question for visiting "unowned" nodes: shall I add an extra argument to
  `forward` and `back`, or shall I make `forward-unowned` and `back-unowned`?
  If the latter, suggestions for better names?  Maybe rename `forward` to
  `forward-owned` instead, and call the "unowned" version `forward`?

Maybe utilizing methods here too:

  • if it's a tree, then don't consider ownership
  • if it's an owner, then consider ownership

@Ambrevar
Copy link
Member

Ambrevar commented Jan 12, 2021 via email

@jmercouris
Copy link
Member

I have no preferences, the names are OK with me in either case :-)

@Ambrevar Ambrevar mentioned this pull request Jan 27, 2021
10 tasks
@aartaka
Copy link
Contributor Author

aartaka commented Feb 3, 2021

Closing this then?

@jmercouris jmercouris closed this Feb 3, 2021
@jmercouris jmercouris deleted the global-history-tree branch July 28, 2021 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants