Recursive Links & Recursive Search #1246

aadcg · 2021-03-26T09:47:27Z

Say you'd like to search for a keyword on the current web buffer and its
links. This is a recursive search of depth 1.

To achieve the above, one needs a procedure that we'll call
recursive-links. A set of initial URLs and a depth go in; a set of
final URLs that are at distance depth from each of the initial URLs comes
out. Informally, depth is the minimum distance between two URLs, i.e. how
many URLs need to be visited to get from one to the other via their
links.

First observation. From any given URL, only the URLs at depth 1 (i.e. its
links) can be fetched directly. This implies that computing URLs at depth N
requires computing the links of all URLs up to depth N-1 (in the worst
case). The links can be fetched using either:

- a web renderer (which often implies making temporary buffers);
- an HTTP client (think of the dexador library).

Second observation. Let A and B be URLs such that each has the other
as its only link. What happens when starting from either of them
and fetching links of depth N>1? (Why did we define depth as the
MINIMUM distance above?)

Fetching URLs at depth N is a recursive procedure coupled with the
guarantee that fetching the links of any given URL is done once and only
once. Notice that this guarantee can't be embedded in the recursion
itself, for it ONLY acts on contiguous depth levels.

This is the core of recursive-links.
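
A minimal, language-neutral sketch of this idea, written in Python rather than in Nyxt's Lisp. The names `recursive_links`, `get_links`, `MOCK_LINKS` and `mock_get_links` are hypothetical, and the sketch assumes depth is measured from the initial set as a whole. The visited set lives outside the per-level step, which is what gives the "once and only once" guarantee:

```python
from collections import Counter


def recursive_links(initial_urls, depth, get_links):
    """Return the URLs whose minimum distance from the initial set is `depth`."""
    visited = set(initial_urls)   # every URL that already has a depth assigned
    frontier = set(initial_urls)  # URLs at the current depth level
    for _ in range(depth):
        next_frontier = set()
        # Each URL lands in exactly one frontier, so its links are fetched once.
        for url in frontier:
            for link in get_links(url):
                if link not in visited:
                    visited.add(link)
                    next_frontier.add(link)
        frontier = next_frontier
    return frontier


# Hypothetical link graph standing in for real pages.
MOCK_LINKS = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
fetch_count = Counter()


def mock_get_links(url):
    fetch_count[url] += 1
    return MOCK_LINKS.get(url, [])


print(recursive_links({"A"}, 2, mock_get_links))  # {'C'}
```

With two pages that link only to each other, the same function returns an empty set for any depth above 1, since both pages already sit at depths 0 and 1.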

When fetching the links of a URL, one can think of multiple
filtrations. A web page can have links to itself (fragments), links to
another scheme (e.g. mailto) and links to other hosts/domains. These
are examples of things one might want to leave out. There are also
interactive filtrations, where the user selects a subset of the computed
URLs at each depth level. More "advanced" filtrations would entail
defining similarity metrics between web pages so that the search happens
on the ones that score higher. Text rank is already in place. This is
food for thought at a later stage.
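
The non-interactive filtrations above could look roughly like this. It is a Python sketch using only the standard `urllib.parse` module; `filter_links` and its parameters are hypothetical names, not part of the pull request:

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def filter_links(page_url, links, same_host_only=True, schemes=("http", "https")):
    """Drop self-links (fragments), unwanted schemes and, optionally, other hosts."""
    page, _ = urldefrag(page_url)
    page_host = urlsplit(page).hostname
    kept, seen = [], set()
    for link in links:
        # Resolve relative links and ignore the fragment part.
        absolute, _ = urldefrag(urljoin(page_url, link))
        parts = urlsplit(absolute)
        if parts.scheme not in schemes:
            continue  # e.g. mailto:
        if absolute == page:
            continue  # link to the page itself
        if same_host_only and parts.hostname != page_host:
            continue  # link to another host
        if absolute not in seen:
            seen.add(absolute)
            kept.append(absolute)
    return kept


example_links = [
    "#top",
    "mailto:someone@example.org",
    "https://other.example.com/",
    "/b",
    "https://example.org/b#section",
]
print(filter_links("https://example.org/a", example_links))
# ['https://example.org/b']
```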

As hinted above, recursive-search is a procedure that relies on
recursive-links. While the latter focuses on computing URLs, the former
makes buffers corresponding to those URLs, only to perform something
akin to search-buffers afterwards. In reality, recursive-search
relies on interactive-recursive-links rather than recursive-links,
since the former is the realisation of the latter using the idea of a
user filtration on each depth level (as mentioned above).

I think a non-incremental search would be better suited for the case at
hand. In fact, the incremental search in place should build upon a
(future, not yet existing) non-incremental search. While at it, the
incremental search could use a thorough analysis.

aadcg · 2021-04-01T06:51:07Z

Just in case anyone wants to see what I've been up to and wants to give advice.

Ambrevar · 2021-04-01T07:13:52Z

@jmercouris Do you have time to review this?

jmercouris · 2021-04-01T13:11:22Z

I can try to get to this tomorrow.

jmercouris · 2021-04-02T09:48:24Z

Thanks for sharing Andre. I can see that you have been very hard at work! It seems to me that you have thought of all possible angles!

I think I understand your approach. Seems logical. I would still encourage you if possible to submit a single depth example so that we can try out the UI and see what it feels like. We could figure out what the performance is like, and any possible issues so that we can remedy them before implementing arbitrary depth search.

aadcg · 2021-04-02T10:15:52Z

> It seems to me that you have thought of all possible angles!

That's me!

> I think I understand your approach. Seems logical. I would still encourage you if possible to submit a single depth example so that we can try out the UI and see what it feels like. We could figure out what the performance is like, and any possible issues so that we can remedy them before implementing arbitrary depth search.

That was indeed my plan, but it was held up by a big issue I'm currently working on. Basically, I can only fetch the links of URLs when they're fully loaded (since I need JS). I didn't have this issue before because I was working on depth 1. Notice the hack on 1th-neighbours-from-list using sleep. I know that on-signal-load-finished exists, but that means turning recursive search into a mode, which I couldn't make much sense of. I'd like to be able to run something when a web buffer is fully loaded. What are your thoughts?

I'm currently cleaning this up.

jmercouris · 2021-04-02T10:26:48Z

I would keep polling the web buffer on a separate thread until it is ready as an interim hack. The correct thing would be to implement some sort of listener model as you've suggested. I also don't have ideas beyond the usage of a hook :-/.
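
A rough sketch of that interim polling hack, in Python for illustration only. `wait_until`, `run_when_ready` and `fake_buffer_loaded` are hypothetical names; in Nyxt the predicate would be whatever tells you the buffer has finished loading:

```python
import threading
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll `predicate` until it returns true or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def run_when_ready(predicate, callback, timeout=10.0, interval=0.1):
    """Poll on a separate thread and call `callback` once the predicate holds."""
    def worker():
        if wait_until(predicate, timeout, interval):
            callback()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Fake buffer that reports "loaded" on the third poll.
calls = {"n": 0}


def fake_buffer_loaded():
    calls["n"] += 1
    return calls["n"] >= 3


results = []
thread = run_when_ready(fake_buffer_loaded, lambda: results.append("loaded"),
                        timeout=2.0, interval=0.01)
thread.join()
print(results)  # ['loaded']
```

The timeout matters: without it, a page that never finishes loading would keep the polling thread alive forever.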

aadcg · 2021-04-02T18:09:37Z

My first entry on this pull request was updated and might be worth reading. It's crucial for me, at least, to understand where I am and where I'm going. Plenty of things to do, in progress. Thanks for the feedback @jmercouris, I'll address it.

aadcg · 2021-04-08T12:23:53Z

@Ambrevar, I'd like to ask for your advice. I've implemented the hook mechanism to run code when buffers are done loading. For my particular case, I need to run some JS when a buffer is loaded, but I also need the values returned by that function. At the moment I'm not able to do that. @jmercouris hinted that this could perhaps be achieved with calispel. I'm wondering if you could give me a hand.

Take a look at commit 4761adc, at the function get-links-from-url. I'm able to run nyxt::get-links, but I need the values returned by that procedure. How would you approach this? Thanks.

Ambrevar · 2021-04-08T18:03:30Z

OK, will look at it tomorrow!

aadcg · 2021-04-08T19:20:32Z

Thank you @Ambrevar.

If you want to see it perform you can already do it. Call recursive-search-from-current-buffer from the prompt-buffer with depth 1 here.

All of the RELEVANT issues are yet to be addressed, but it will be really close once the issue of running JS as soon as a buffer is loaded is sorted out.

aadcg · 2021-04-08T19:55:29Z

I cleaned my commits up but now the compiler is mad at me, and I'm not understanding him.

Edit: Ah, but that's ccl, so I guess there must be a reason.

aadcg · 2021-04-09T14:32:36Z

My first entry was again edited.

Ambrevar · 2021-04-09T17:17:01Z

Hey André, sorry I was unexpectedly out all day, I didn't have time to look at this. I'll come back to it as soon as possible.

aadcg · 2021-04-09T17:35:06Z

> Hey André, sorry I was unexpectedly out all day, I didn't have time to look at this. I'll come back to it as soon as possible.

No worries Pierre! I always have plenty of things to do anyway, so this can be addressed whenever you find some time. À tout à l'heure!

Ambrevar · 2021-04-12T15:20:59Z

> The prompt buffer has all the functionality needed. Is it easy to put its contents into a window/buffer/pane where it would use all of the screen height available? Probably Pierre could help.

This is actually our oldest issue: #55.
I want to work on it soon, but probably not before 2.0.
Once we have window management, we can display the prompt buffer in arbitrary ways, including vertically left or right.

Ambrevar · 2021-04-12T18:16:21Z

André Alexandre Gomes writes:

> Take a look at commit 4761adc, at the function `get-links-from-url`. I'm able to run `nyxt::get-links`, but I need the values returned by that procedure. How would you approach this? Thanks.

Commit 4761adc is gone, but I had a look at get-links-from-url nonetheless.

`ffi-buffer-evaluate-javascript`, `pflet` and thus `get-links` all run synchronously, and thus return a value as expected.

So is your problem to get the return value of get-links _when run from a hook_?

If so, I think what you need to do after sera:add-hook is wait on a channel with `calispel:?` and in the handler write to this channel with `calispel:!`. You can create a channel with `make-channel`. Does that make sense?

I've just tested: I tried with a depth of 1, it seems to work already! Of course there are a few things to iron out, but we can discuss this progressively. Good job so far!
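
The shape of that suggestion, sketched in Python as an analogy only: `queue.Queue` stands in for the calispel channel (`put` for `calispel:!`, `get` for `calispel:?`), and `LoadFinishedHook` and `links_when_loaded` are hypothetical names, not Nyxt or Serapeum APIs:

```python
import queue
import threading


class LoadFinishedHook:
    """Hypothetical stand-in for a 'buffer finished loading' hook."""

    def __init__(self):
        self.handlers = []

    def add(self, handler):
        self.handlers.append(handler)

    def run(self, buffer):
        for handler in list(self.handlers):
            handler(buffer)


def links_when_loaded(hook, get_links, timeout=5.0):
    """Block until the hook fires, then return what `get_links` produced."""
    channel = queue.Queue()

    def handler(buffer):
        hook.handlers.remove(handler)   # fire once only
        channel.put(get_links(buffer))  # write the value to the channel

    hook.add(handler)
    return channel.get(timeout=timeout)  # wait on the channel


hook = LoadFinishedHook()
fake_buffer = {"links": ["https://example.org/b"]}
# Simulate the load-finished event arriving later from another thread.
threading.Timer(0.2, hook.run, args=(fake_buffer,)).start()
print(links_when_loaded(hook, lambda buffer: buffer["links"]))
# ['https://example.org/b']
```

The caller gets an ordinary return value even though the work happens inside the handler, which is the point of going through a channel.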

aadcg · 2021-04-13T06:29:31Z

> Commit 4761adc is gone

Indeed. I didn't update it, sorry about that!

> So is your problem to get the return value of get-links when run from a
> hook?

> If so, I think what you need to do after sera:add-hook is wait on a
> channel with calispel:? and in the handler write to this channel with
> calispel:!.

Ok, this helps me.

There's still a question: is it possible to run a handler once only?

Ambrevar · 2021-04-13T06:37:36Z

> There's still a question: is it possible to run a handler once only?

Have you tried `sera:once`?

Recursive-search-from-current-buffer is a special case of the above.

See #1246 for a detailed description of the rationale.

This method allows the user to filter the computed URLs at each depth level.

Follows property-based and mock testing approaches.

The modes are returned as mode symbols. This differs from the modes slot of the buffer class, since that returns the instances of the enabled modes.

See issue #1299.

Ambrevar · 2021-04-29T09:22:17Z

Sorry for answering a bit late here. One thing I didn't understand:

> I hardly understand the comments given by @Ambrevar, so feel free to take the lead.

Which comments and take the lead on what? Please let me know how I can help.

aadcg · 2021-04-29T10:34:23Z

> Which comments and take the lead on what? Please let me know how I can help.

No worries! You've helped already. My doubts were related to the implementation of #1332.

jmercouris · 2021-07-27T15:30:59Z

I'm closing this for now; there are many conflicts and it would be unreasonably difficult to merge this. This work is saved on a branch titled recursive-search.