Solid-search compatibility #275
Comments
Excellent news, congrats! If you're interested in exchanging ideas, my team has quite some experience with this. Happy to discuss.
Excellent!
Given that you want this to be reusable across multiple servers, I'll write how I envision such types of agents to function for the Solid ecosystem in general. Conceptually, an agent providing a service in a network (such as, in your case, index and search) should subscribe via Linked Data Notifications to the source it wants to index. Whenever a change occurs in that source, the source will generate a Linked Data Notification and send it to the service, upon which the service performs its task. (Possibly, subscriptions need to be periodically renewed.) This is the general mechanism. The case where the service and the pod run on the same server (or even in the same process) is an optimization. That is: the LDN mechanism would still work, it's just that there are more efficient channels available for local communication. Architecturally, you could use the decorator pattern to implement a I'd strongly recommend considering implementing it the notification way. I can offer help from @Dexagod, who is working on notifications. Also CC'ing @csarven.
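To make the notification-driven flow described above concrete, here is a minimal sketch of an indexing agent that reacts to change notifications. Everything in it is an assumption for illustration: the `ChangeNotification` shape, the `InMemoryIndex` class, and the injected `FetchBody` function are hypothetical and are not taken from the LDN specification or from any Solid server.

```typescript
// Hypothetical notification payload; the field names are illustrative only.
interface ChangeNotification {
  object: string; // URL of the resource that changed
  type: 'Create' | 'Update' | 'Delete';
}

// Function that retrieves the current text of a resource (e.g. via HTTP GET).
type FetchBody = (url: string) => Promise<string>;

// Toy full-text index: maps a resource URL to its last fetched text.
class InMemoryIndex {
  private readonly documents = new Map<string, string>();

  upsert(url: string, body: string): void {
    this.documents.set(url, body);
  }

  remove(url: string): void {
    this.documents.delete(url);
  }

  // Returns the URLs of all documents whose text contains the term.
  search(term: string): string[] {
    const hits: string[] = [];
    this.documents.forEach((body, url) => {
      if (body.indexOf(term) !== -1) {
        hits.push(url);
      }
    });
    return hits;
  }
}

// The agent reacts to notifications; it never polls the source itself.
class IndexingAgent {
  private readonly index: InMemoryIndex;
  private readonly fetchBody: FetchBody;

  constructor(index: InMemoryIndex, fetchBody: FetchBody) {
    this.index = index;
    this.fetchBody = fetchBody;
  }

  async handle(notification: ChangeNotification): Promise<void> {
    if (notification.type === 'Delete') {
      this.index.remove(notification.object);
      return;
    }
    // Create and Update are treated alike: refetch and reindex the resource.
    const body = await this.fetchBody(notification.object);
    this.index.upsert(notification.object, body);
  }
}
```

When the service runs in the same process as the pod, the same `handle` method could be called directly instead of being reached over HTTP, which is the local optimization mentioned above.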
Mmm, I think several things are unnecessarily conflated here.
In general, I see no need for delta processing (as explained above), and even if that were the case, no need for the pod to be responsible for delta storage. But if there is, we got you covered.
See my above suggestion of designing it as an independent agent, for instance, as a Docker container.
There is no problem with an external search or indexing service. Clients just follow links, and they don't care whether they go to the same domain or a different one. We might want an explicit trust statement "this service can be an indexer for my data", but that's about it. That said, the server could perfectly proxy. If you are implementing this in a JavaScript-ish language, I'd suggest implementing the
Additional thought: integrate early. I haven't seen the details of your planning, but I wonder why the integration would come this late. In a fail-fast mode, this is what I would test first. All the rest, I can pretty much imagine how it will work, given that good indexers etc. are available. Really excited about this project, and always reachable for a chat! |
Thanks for the comments and thoughts, @RubenVerborgh! Very helpful. You suggest replacing the deltas with linked data notifications. These notifications do not contain descriptions of how the resource has changed, which means that the service listening to notifications (in this case the search index) needs to fetch each individual document again and rebuild its index. This would lead to horrible performance, and this will happen with every system that has a dependency on changing resources. I think this is why we really need to have some bus / event log / delta store where the changes are persisted and made accessible to modules, such as a search module.
For the search itself, it's very unlikely to be JS - performance is crucial, and most performant search engines are powered by system-level languages or maybe Java (Lucene). I personally think the Rust-based Sonic project is really interesting for solid-search, as it is incredibly fast, lightweight and returns just URLs (which can be resolved in a pod anyway). Lightweight is important, as I want to let people run their own Solid pods on lightweight (ARM-based) devices. But anyway - it will probably also be made available as an independent Docker image.
Gotta agree with that, I'll change the planning!
Thanks, we'll be in touch soon enough! |
LDNs could perfectly be designed to include such a description, and there are indeed good reasons to do so.
So no 🙂
In principle, you could be right, but let's not make such assumptions before we adequately calculate and/or measure.
It is crucial that we agree that the interface and storage are different, orthogonal concerns. Whether or not a notification (= interface) contains a delta is completely independent of whether or not the back-end (= storage) contains deltas. It's really important to keep these separate in order to have a clear design discussion.
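The separation between interface and storage can be sketched as follows. This is only an illustration under stated assumptions: the `Delta` and `Notification` shapes, the `Index` type, and the synchronous `Refetch` callback are hypothetical names, not part of LDN. The point is that the consumer uses a delta when the notification carries one and falls back to refetching otherwise, without knowing anything about how the back-end stores its data.

```typescript
// Hypothetical shapes for illustration; neither comes from the LDN specification.
interface Delta {
  added: string[];   // statements added to the resource
  removed: string[]; // statements removed from the resource
}

interface Notification {
  resource: string; // URL of the changed resource
  delta?: Delta;    // optional: an interface choice, unrelated to back-end storage
}

// The consumer's view of each resource: a set of statements per URL.
type Index = Map<string, Set<string>>;

// Synchronous stand-in for refetching the full resource.
type Refetch = (url: string) => string[];

// Applies one notification and reports which strategy was used.
function applyNotification(index: Index, notification: Notification, refetch: Refetch): 'delta' | 'refetch' {
  const { resource, delta } = notification;
  const known = index.get(resource);
  if (delta && known) {
    // Cheap path: patch the statements we already hold.
    delta.removed.forEach((statement) => known.delete(statement));
    delta.added.forEach((statement) => known.add(statement));
    return 'delta';
  }
  // Fallback: no delta (or nothing to patch), so rebuild from the source.
  index.set(resource, new Set(refetch(resource)));
  return 'refetch';
}
```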
That's okay for the back-end.
WebAssembly could be interesting. |
Hi @RubenVerborgh! I have an update: the search server is working. I've got some documentation right here, although the code itself still has to be merged. Please read the docs before reading further. So I'm looking for how to integrate this into CSS. IIRC, there's a bunch of middlewares in CSS which could execute something, like POSTing a Turtle representation of the resource to some endpoint. If we can get that working, we're mostly there. Next step is some strategy for running the search app. It's a Rust binary or a dockerized image, whichever you prefer. I can imagine it makes sense to simply Now, we need to add the route to the search instance. We can link to the built-in front-end, but that would expose all data to the public - probably not the best idea! So I think the other approach is to make the endpoint available behind some authorization check. What are your thoughts? |
Hi @joepio!
Yes indeed. You could either:
I suggest the second; you then basically create a component that listens to a source, which will tell you when something changes.
Docker seems good. We already have Dockers (in the integration tests) for a SPARQL endpoint and Redis.
Is there any authentication going on with the search server? An approach that I have used previously relied on there being a low number of unique users per pod. As such, I created indexes for every user: https://github.com/solid/solid-tpf Alternatively, or as a start, we can only expose that interface to the owner of the pod.
In any case, you should be able to reuse existing authentication components such as
|
Hi there! I'd love to give this another go, but I feel like I need a bit of help getting started when I want to run everything locally. Is there a contributor who might have a moment to help me out in a video call for an hour sometime this week? @joachimvh maybe? I'd be grateful! |
@joepio sure, send me a mail or a message through Slack/Gitter to arrange. There is not that much "this week" left though 😄 |
Thanks to @joachimvh, we've been able to set up a repo that almost works. It contains very simple logic: post Turtle documents to some endpoint (atomic-server) whenever a resource is updated. There's still at least one major thing that's holding us back: the exported class ( Some thoughts from Joachim:
Seems like I should make an Initializer from |
That is actually exactly what I meant. It's just that it also requires a few lines of config to work. The I'll first translate and explain the snippet above. When you run Components.js you have to tell it the URI of the component it needs to generate. It will then instantiate that component and its constructor parameters recursively. This means that your component will only be instantiated if there is a path from our entry component to your component. That URI is hardcoded here:
Currently there is no path from our Somewhere in our config we have a list of CommunitySolidServer/config/app/init/default.json Lines 8 to 11 in 3c32466
Once your class extends {
"@type": "SearchListener",
"@id": "urn:solid-server:search:SearchListener",
"source": { "@id": "urn:solid-server:default:ResourceStore" },
"store": { "@id": "urn:solid-server:default:ResourceStore" },
"searchEndpoint": "http://example.org/my-search-endpoint"
},
{
"@id": "urn:solid-server:default:ParallelInitializer",
"@type": "ParallelHandler",
"handlers": [
{ "@id": "urn:solid-server:search:SearchListener" }
]
} Note that an |
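For readers following along, a class matching the three parameters in the config above (`source`, `store`, `searchEndpoint`) might look roughly like the sketch below. It is not the actual implementation from the repo: the interfaces are local stand-ins rather than the Community Solid Server's own types, the `'changed'` event name and the `getTurtle` helper are assumptions, and the `post` function is injected so the sketch stays independent of any HTTP client.

```typescript
import { EventEmitter } from 'events';

// Local stand-ins; these are NOT the Community Solid Server's own types.
interface ResourceIdentifier {
  path: string;
}

interface TurtleSource {
  // Hypothetical helper that resolves to the Turtle serialization of a resource.
  getTurtle(identifier: ResourceIdentifier): Promise<string>;
}

// Sends a Turtle document to the search endpoint (e.g. via HTTP POST).
type PostTurtle = (endpoint: string, turtle: string) => Promise<void>;

class SearchListener {
  private readonly store: TurtleSource;
  private readonly searchEndpoint: string;
  private readonly post: PostTurtle;

  constructor(source: EventEmitter, store: TurtleSource, searchEndpoint: string, post: PostTurtle) {
    this.store = store;
    this.searchEndpoint = searchEndpoint;
    this.post = post;
    // Subscribe once; every 'changed' event (assumed name) triggers a forward.
    source.on('changed', (identifier: ResourceIdentifier) => {
      void this.forward(identifier);
    });
  }

  // Fetches the changed resource and forwards it; failures are logged, not thrown.
  async forward(identifier: ResourceIdentifier): Promise<void> {
    try {
      const turtle = await this.store.getTurtle(identifier);
      await this.post(this.searchEndpoint, turtle);
    } catch (error) {
      console.error(`Could not index ${identifier.path}:`, error);
    }
  }
}
```

Catching errors inside `forward` is a deliberate choice in this sketch: a resource that cannot be serialized should not take down the listener for all later changes.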
@joachimvh Thanks for the help again! I've modified the config file, and
I'm also not entirely sure if my config files are correct now, they seem to have some duplicate info 1, 2. |
https://github.com/RubenVerborgh/solid-hue might provide help; do you use the |
Thanks, adding |
Feels like I'm getting real close! I'm now getting errors when trying to get the Resource: await this.store.getRepresentation(changed, { type: { 'text/turtle': 1 } })
Also tried without the option ( |
You should indeed put that information in only one of the two. I would suggest putting everything in the The error message you're getting is quite weird. For some reason the JSON files that are generated to keep track of resource locks are invalid. Does the server process have write rights on the disk? If you still have a |
Removing my local files solved the issue! Thanks again @joachimvh 👏 |
Two problems:
Can't get turtle representations
I'm trying to get the turtle representation of the resource with This function returns a I need the turtle string, but for some reason I'm getting an error when I try to turn the Readable into a string:
Not sure if the If I remove the
and
I could get around this for now by simply ignoring all resources that aren't valid turtle, but that's not a real solution I think.
|
This is going to happen if the target resource is not an RDF resource.
It's because the target resource is JSON and the converter is trying to interpret it as JSON-LD.
That actually is the solution I think. Or you could first check the content-type in the metadata to check if the data is RDF. All this data you're seeing is internal data.
Can you double check that |
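The two suggestions in this exchange (check the content type first, then turn the stream into a string) can be sketched with Node.js built-ins only. This is an illustration under assumptions, not code from the server: the list of RDF media types is a guess that may need extending, and `isRdf`, `streamToString`, and `toIndexableText` are hypothetical helper names. The sketch assumes byte or string streams, not arbitrary object-mode streams.

```typescript
import { Readable } from 'stream';

// Media types treated as RDF here; this list is an assumption, extend as needed.
const RDF_MEDIA_TYPES = [
  'text/turtle',
  'application/ld+json',
  'application/n-triples',
  'application/n-quads',
  'application/trig',
];

// True if the Content-Type value names one of the RDF media types above.
function isRdf(contentType: string | undefined): boolean {
  if (!contentType) {
    return false;
  }
  // Drop parameters such as "; charset=utf-8" before comparing.
  const mediaType = (contentType.split(';')[0] || '').trim().toLowerCase();
  return RDF_MEDIA_TYPES.indexOf(mediaType) !== -1;
}

// Collects a Node.js Readable into a single UTF-8 string.
async function streamToString(stream: Readable): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Buffers are concatenated before decoding so multi-byte characters survive.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(String(chunk)));
  }
  return Buffer.concat(chunks).toString('utf8');
}

// Returns the resource text to index, or undefined when it should be skipped.
async function toIndexableText(contentType: string | undefined, data: Readable): Promise<string | undefined> {
  if (!isRdf(contentType)) {
    data.destroy(); // release the unread stream
    return undefined;
  }
  return streamToString(data);
}
```

Skipping non-RDF resources up front means internal JSON files never reach a converter, which avoids the JSON-LD parsing errors discussed above.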
Thanks, I'll use the
Yeah, I can verify this. When I do
I don't see any response, but I also can't read the file after posting.
Gives the same response (empty) as
I'm trying to work with a clean setup (removing my EDIT: It was |
It's working! Thanks so much @RubenVerborgh and @joachimvh :) Closing for now. Any questions / issues can be posted here: https://github.com/ontola/solid-search-community-server/issues |
I'll admit I've only skimmed this, but does But I think the JSON case should be a 406 for us, no, @joachimvh? And also the
|
You're right, it works, but it throws errors for non-turtle / internal JSON resources:
I'll revert back to using |
The returned status code is a 302 though.
Indeed, that's the reason for the |
Ah, I guess I don't use |
We (Ontola) have been awarded a grant to implement a full-text search module for Solid pods, called Solid-Search. We're planning on linking this newly created module to this community-server. The goal is to have a re-usable search module that can be used by various RDF servers (most notably solid pods).
Here are some thoughts on how we'll be doing this:
We'll start development of Solid-Search December this year, and plan to link it to community-server in August 2021. Any thoughts are very welcome!