Storing & querying nested resources #16

Closed
joepio opened this issue Sep 19, 2020 · 4 comments

Comments

@joepio
Member

joepio commented Sep 19, 2020

All Atomic Data Resources that we've discussed so far have a URL as a subject.
Unfortunately, creating unique and resolvable URLs can be a bother, and is sometimes unnecessary.
If you've worked with RDF, this is what Blank Nodes are used for.
In Atomic Data, we have something similar: Nested Resources.

Let's use a Nested Resource in the example from the previous section:

["https://example.com/john", "https://example.com/lastName", "McLovin"]
["https://example.com/john https://example.com/employer", "https://example.com/description", "The greatest company!"]

By combining two Subject URLs into a single string, we've created a nested resource.
The Subject of the nested resource is https://example.com/john https://example.com/employer, including the space.
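To make the combined Subject concrete, here is a minimal sketch of how such a string could be split into its root URL and the path that follows it. The names `NestedSubject`, `parse_subject` and `is_nested` are hypothetical and not part of atomic_lib; the sketch assumes that a single space is the only separator.

```rust
/// A Subject split into its root URL and the path that follows it.
/// Hypothetical type, not part of atomic_lib.
#[derive(Debug, PartialEq)]
pub struct NestedSubject<'a> {
    /// The resolvable URL of the root (parent) resource.
    pub root: &'a str,
    /// Zero or more path segments (Property URLs) after the root.
    pub path: Vec<&'a str>,
}

/// Splits a Subject string on single spaces.
/// Returns `None` when the subject is empty.
pub fn parse_subject(subject: &str) -> Option<NestedSubject<'_>> {
    let mut parts = subject.split(' ');
    let root = parts.next().filter(|s| !s.is_empty())?;
    Some(NestedSubject {
        root,
        path: parts.collect(),
    })
}

/// True if the subject points at a nested resource rather than a plain URL.
pub fn is_nested(subject: &str) -> bool {
    parse_subject(subject).map_or(false, |s| !s.path.is_empty())
}
```

Calling `parse_subject` on the nested Subject above would give `https://example.com/john` as the root and `https://example.com/employer` as the only path segment.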

So how should we deal with these in atomic_lib?

Approaches

Store nested in their parent as Enum

In both Db and Store, this would mean that we make a fundamental change to the internal model for storing data. In both, the entire store is a HashMap<String, HashMap<String, String>>.

We could change this to:

HashMap<String, HashMap<String, StringOrHashMap>>, where StringOrHashMap is some Enum that is either a String or a HashMap. This will have a huge impact on the codebase, since the most used method (get_resource_string) changes. I don't think this is the way to go.
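A rough sketch of what this model could look like, to show why every read path would have to change. The variant names and the `get_path` helper are assumptions made for illustration; only the name `StringOrHashMap` comes from the description above.

```rust
use std::collections::HashMap;

/// Either a plain string value or a nested set of property-value pairs.
/// The variant names are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum StringOrHashMap {
    String(String),
    Nested(HashMap<String, StringOrHashMap>),
}

/// Subject -> (Property -> value).
pub type Store = HashMap<String, HashMap<String, StringOrHashMap>>;

/// Follows a subject and a non-empty property path into nested values.
/// Returns `None` if any step is missing or passes through a plain string.
pub fn get_path<'a>(
    store: &'a Store,
    subject: &str,
    path: &[&str],
) -> Option<&'a StringOrHashMap> {
    let (first, rest) = path.split_first()?;
    let mut current = store.get(subject)?.get(*first)?;
    for prop in rest {
        match current {
            StringOrHashMap::Nested(map) => current = map.get(*prop)?,
            StringOrHashMap::String(_) => return None,
        }
    }
    Some(current)
}
```

Every caller that previously received a `String` now has to match on the enum, which is the codebase-wide impact mentioned above.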

Store nested in parent as Value

An alternative is to not store the string representations, but store the Values in the store. Currently, all Atom values in the store are strings. We could change this to store Values. Some performance implications:

  • Serializing the string representations (AD3) would be slower.
  • Serializing to non-string representations would be faster (e.g. JSON)
  • Using the data in some structured way would be faster.
  • Adding data to the store from highly-optimized serialized formats (AD3) would be slower.

This would also

Store as new entities, with path as subject

In this approach, the nested resources are stored like all other resources, except that the subject consists of two URLs separated by a space. This has a couple of implications:

  • When deleting the original resource, its nested resources are not deleted automatically (but should be), so this requires some extra logic.
  • When iterating over all resources, we can no longer assume that every single Key (subject) is a valid URL.
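Both implications can be sketched against the current HashMap<String, HashMap<String, String>> model. The function names `remove_with_nested` and `root_subjects` are hypothetical, and the sketch assumes that URLs never contain a literal space.

```rust
use std::collections::HashMap;

/// Subject -> (Property -> string value), as in the current store model.
pub type Store = HashMap<String, HashMap<String, String>>;

/// Removes a resource and every nested resource whose subject starts with
/// the parent subject followed by a space. Returns the number of removed entries.
pub fn remove_with_nested(store: &mut Store, subject: &str) -> usize {
    let prefix = format!("{} ", subject);
    let before = store.len();
    store.retain(|key, _| key.as_str() != subject && !key.starts_with(prefix.as_str()));
    before - store.len()
}

/// Iterates over the subjects that are plain URLs, skipping nested subjects.
/// Assumes a URL never contains a literal space.
pub fn root_subjects(store: &Store) -> impl Iterator<Item = &String> {
    store.keys().filter(|key| !key.contains(' '))
}
```

Note that `retain` scans the whole map, so the extra delete logic costs a full pass over the store in this model.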

Store inside parent resource, with path in Property URL

Similar to the approach above, but in this approach we use the Property URL to store nested paths. Implications:

  • Iterating over the Properties will not yield valid Properties - these must be split up.
  • Finding some nested value needs a range query: select all properties that start with some string

Store all Atoms as BTreeMap<Path, Value>

Perhaps it makes sense to store all Atoms in something such as BTreeMap<Path, Value>, where the path is the subject followed by a property path (one or more property URLs). This should work by using BTreeMap's (and Sled's) range function to select all the right properties.
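A minimal sketch of the range lookup, using only the standard library. It assumes that both Path and Value are plain strings and that path segments are joined with a single space; `AtomStore` and `atoms_of` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Path -> value. The path is the subject followed by space-separated
/// property URLs (an encoding assumed for this sketch).
pub type AtomStore = BTreeMap<String, String>;

/// Returns all atoms (path, value) that belong to `subject`, including nested ones.
/// Starts a range scan at "<subject> " and stops at the first key without that prefix.
pub fn atoms_of<'a>(store: &'a AtomStore, subject: &str) -> Vec<(&'a str, &'a str)> {
    let prefix = format!("{} ", subject);
    let bounds: (Bound<&str>, Bound<&str>) =
        (Bound::Included(prefix.as_str()), Bound::Unbounded);
    store
        .range::<str, _>(bounds)
        .take_while(|(path, _)| path.starts_with(prefix.as_str()))
        .map(|(path, value)| (path.as_str(), value.as_str()))
        .collect()
}
```

Because the keys are sorted, all paths that share a prefix sit next to each other, so the scan touches only the matching atoms plus one extra key.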

API design

And what should the API for the library user look like? Should a nested resource be a special Value? This seems sensible. However, in reality it is just a regular AtomicURL.

Serialization

Let's go over serialization by example. Let's assume a Resource of a person with some nested friends.

JSON

This is the easiest. Simply nest an object!

{
  "@id": "https://example.com/arthur",
  "name": "Arthur",
  "friends": [{
     "name": "John"
  }, {
    "name": "Suzan"
  }]
}

Note that these nested resources don't need to have an @id field, contrary to the root resource. Their identity is implicit.

AD3

JSON has nice nesting, but AD3 was originally designed to be very flat. If we use the Subject field to store paths, we get quite long subjects. This gets a bit awkward:

["https://example.com/arthur", "https://example.com/friends", ["https://example.com/arthur https://example.com/friends 0", "https://example.com/arthur https://example.com/friends 1"] ]
["https://example.com/arthur https://example.com/friends 0", "https://example.com/name", "John"]
["https://example.com/arthur https://example.com/friends 1", "https://example.com/name", "Suzy"]

The first Atom seems entirely redundant: it provides no more information than the other two. However, leaving it out might cause issues down the line. Imagine I GET https://example.com/arthur, but the first atom doesn't exist: the response would contain no atoms. To prevent this, we could tweak the store a bit, so that a GET searches for all subjects that either equal the URL, or start with the URL followed by a space.
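The tweaked GET could look roughly like this. The sketch keeps atoms as plain string triples, which is a simplification; `Atom` and `get_atoms` are hypothetical names.

```rust
/// One AD3 atom: (subject, property, value). Values are plain strings here.
pub type Atom = (String, String, String);

/// Returns the atoms whose subject is exactly `url`,
/// or starts with `url` followed by a space.
pub fn get_atoms<'a>(atoms: &'a [Atom], url: &str) -> Vec<&'a Atom> {
    atoms
        .iter()
        .filter(|(subject, _, _)| {
            subject.as_str() == url
                || subject
                    .strip_prefix(url)
                    .map_or(false, |rest| rest.starts_with(' '))
        })
        .collect()
}
```

Requiring the space after the URL keeps a subject such as https://example.com/arthurian from matching a GET for https://example.com/arthur.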

Another approach might be to nest triples in AD3, too:

["https://example.com/arthur", "https://example.com/friends", [[["https://example.com/name", "John"]],[["https://example.com/name", "Suzy"]]]]

But this, too, is ugly and not human readable. JSON might be the way to go.

@joepio
Member Author

joepio commented Sep 26, 2020

Perhaps it makes sense to start with the atomic_lib::Resource API - which does not support nested resources yet.

I think something like this makes sense:

let resource = Resource::new();
let nested_resource  = resource.new_nested("https://example.com/someProp");
nested_resource.set_prop_shortname("description", "me is nested");

With this API, the nested_resource needs some reference to the store, just like the Resource does (otherwise, changing props would not do anything).

A different approach, where we set a nested resource as a value:

let resource = Resource::new();
let nested_resource  = resource.new_nested();
nested_resource.set_prop_shortname("description", "me is nested");
resource.set("https://example.com/someProp", nested_resource);

Now, let's consider a case where there's a 1-N relationship, instead of a 1-1. Perhaps we've got a Blog with some posts.
How should we append a blogpost to our blog, assuming it's a nested resource?

let blog = Resource::new();
let post  = blog.new_nested();
post.set_prop_shortname("text", "I'm a blogpost!");
blog.set_prop_shortname("posts", Vec::from(blog.get("posts")).push(post));

Yuck... Perhaps we need some functions to deal with array (of nested resources):

let blog = Resource::new();
let post  = blog.new_nested();
let postslist = blog.get_as_vec("posts");
postslist.push(post);
// Maybe we won't need the following line, if get_as_vec is aware of its parent
blog.set_prop_shortname("posts", postslist);
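One way to avoid the get-then-set dance is a dedicated append method on the resource. The following is only a sketch under several assumptions: the `Resource` here is a simplified in-memory type with no reference to a store, properties are full URLs instead of shortnames, and `push_nested` is a hypothetical method name.

```rust
use std::collections::HashMap;

/// Simplified value type for this sketch; the real Value enum has more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    NestedResource(Resource),
    ResourceArray(Vec<Resource>),
}

/// Simplified in-memory resource without a reference to a store.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Resource {
    propvals: HashMap<String, Value>,
}

impl Resource {
    pub fn new() -> Self {
        Self::default()
    }

    /// Sets (or overwrites) the value of a property.
    pub fn set(&mut self, property: &str, value: Value) {
        self.propvals.insert(property.to_string(), value);
    }

    /// Returns the value of a property, if present.
    pub fn get(&self, property: &str) -> Option<&Value> {
        self.propvals.get(property)
    }

    /// Appends a nested resource to the array stored under `property`,
    /// creating the array if the property is not set yet.
    /// Fails if the property already holds a non-array value.
    pub fn push_nested(&mut self, property: &str, nested: Resource) -> Result<(), String> {
        let entry = self
            .propvals
            .entry(property.to_string())
            .or_insert_with(|| Value::ResourceArray(Vec::new()));
        match entry {
            Value::ResourceArray(items) => {
                items.push(nested);
                Ok(())
            }
            _ => Err(format!("{} does not hold an array", property)),
        }
    }
}
```

With this shape, appending a post is a single call such as `blog.push_nested("https://example.com/posts", post)`, and the caller never handles the intermediate Vec.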

@joepio
Member Author

joepio commented Oct 17, 2020

For now, I think I'm going to try storing Values instead of Strings in the store. This means refactoring quite a bit of code.

@joepio joepio mentioned this issue Oct 17, 2020
5 tasks
joepio added a commit that referenced this issue Oct 17, 2020
joepio added a commit that referenced this issue Oct 17, 2020
joepio added a commit that referenced this issue Oct 25, 2020
joepio added a commit that referenced this issue Oct 25, 2020
@joepio
Member Author

joepio commented Nov 22, 2020

Currently, nested resources (at least in the Commit.rs impl) are serialized using HashMap's auto conversion... Not nice.

This needs more thought. I think the AtomicURL datatype should be removed, and should be replaced by a Resource datatype, which can be either a URL or a Nested Resource.

The Nested Resource should be serialized as a JSON object in JSON, and perhaps in some other way in AD3.

@joepio joepio mentioned this issue Jan 22, 2021
6 tasks
@joepio
Member Author

joepio commented Jan 24, 2021

Should I allow both Nested and URL resources in an array?

/// An individual Value in an Atom, represented as a native Rust enum.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub enum Value {
    AtomicUrl(String),
    Date(String),
    Integer(isize),
    Markdown(String),
    ResourceArray(Vec<ResourceValue>),
    Slug(String),
    String(String),
    /// Unix Epoch datetime in milliseconds
    Timestamp(i64),
    NestedResource(PropVals),
    Boolean(bool),
    Unsupported(UnsupportedValue),
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub enum ResourceValue {
    NestedResource(PropVals),
    AtomicUrl(String),
}
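To get a feel for what a mixed array means for consumers, here is a small sketch that splits such an array into inline nested resources and URLs that still need to be resolved. `PropVals` is assumed to be a map from Property URL to a string value, which is a simplification, and `partition` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Assumed, simplified shape of PropVals: Property URL -> string value.
pub type PropVals = HashMap<String, String>;

/// Same variants as the ResourceValue enum above, without the serde derives.
#[derive(Clone, Debug, PartialEq)]
pub enum ResourceValue {
    NestedResource(PropVals),
    AtomicUrl(String),
}

/// Splits a mixed array into nested resources (usable right away)
/// and URLs (which still have to be fetched from a store or server).
pub fn partition(items: &[ResourceValue]) -> (Vec<&PropVals>, Vec<&str>) {
    let mut nested = Vec::new();
    let mut urls = Vec::new();
    for item in items {
        match item {
            ResourceValue::NestedResource(propvals) => nested.push(propvals),
            ResourceValue::AtomicUrl(url) => urls.push(url.as_str()),
        }
    }
    (nested, urls)
}
```

Allowing both variants means every consumer of a ResourceArray needs a match like this one, which is the main cost of saying yes to the question above.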

@joepio joepio closed this as completed Feb 1, 2022