New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support 1 to 1 associations at the schema level #2511

Closed
camway opened this Issue Jul 31, 2018 · 22 comments

Comments

Projects
None yet
9 participants
@camway
Copy link

camway commented Jul 31, 2018

Me and another member of the dgraph forum have been going back and forth on how exactly this should be done. Feel free to read

https://discuss.dgraph.io/t/return-single-object-instead-array-for-exactly-one-relation-between-node/2927/7

The solution proposed by bennyrio (the person I've been talking to in the forum) was elaborated on after creating this. I do not want to misrepresent his view on this so below is a direct link to his explanation:

https://discuss.dgraph.io/t/return-single-object-instead-array-for-exactly-one-relation-between-node/2927/8

The essence of my request is that I think we should change the schema so that uid associations are capable of being defined as 1 to 1. Unfortunately, my proposal will be a breaking change, but I think it will make uid predicates consistent with the other predicate types.

Currently a schema of this:

a: uid .
b: [string] .
c: int .

Exhibits these behaviors:

a is capable of returning 0 or more nodes
b is capable of returning 0 or more strings
c is capable of returning 0 or 1 integers

My proposal would look like this:

a: uid .
b: [string] .
c: int .
d: [uid] .

Exhibiting these behaviors:

a is capable of returning 0 or 1 nodes
b is capable of returning 0 or more strings
c is capable of returning 0 or 1 integers
d is capable of returning 0 or more nodes

This would also require that any mutation which would violate this 1 to 1 association should abort with an error.

This would also mean that existing schemas would have to be updated, but I believe this will make for consistency across all predicates types when defining a list vs a single item.

@ahopkins

This comment has been minimized.

Copy link

ahopkins commented Aug 6, 2018

Honestly ... as a newcomer to dgraph ... this is how I initially expected it to work.

If I want to map multiple "things", I wrap it in []. For example: [str]. That makes sense.

EXCEPT if I am mapping to multiple uid. Then I just leave it as uid. That does not make sense.

@manishrjain

This comment has been minimized.

Copy link
Member

manishrjain commented Aug 7, 2018

Hmm... This looks like a very convincing argument. In essence, if you can have multiple edges from one node to another, the schema must be defined as [uid], and the result would be a list of JSON objects. Otherwise, if defined as uid, then there can only be at most one edge and the result would be a JSON map.

I haven't thought through all the implications of this, but as an initial thought, it makes a lot of sense to be this way.

@manishrjain manishrjain changed the title Feature - support 1 to 1 associations at the schema level Support 1 to 1 associations at the schema level Aug 7, 2018

@slawo

This comment has been minimized.

Copy link

slawo commented Aug 9, 2018

There should be a consistency tag put on this.

@insanitybit

This comment has been minimized.

Copy link

insanitybit commented Aug 11, 2018

The proposed behavior is absolutely what I had expected when going into dgraph, and I even just today had to remind myself that [uid] wasn't an option.

@camway

This comment has been minimized.

Copy link
Author

camway commented Aug 13, 2018

@ahopkins and @insanitybit, that seems to be the general consensus. I don't know what it will take to get this implemented, but I think I can safely speak for us (and others) in saying we want this functionality. It seems like this is something many people have found workarounds for, but it seems like it's something the database should do for it's users.

@bandirsen

This comment has been minimized.

Copy link

bandirsen commented Aug 16, 2018

@camway

Thing to consider, predicate with scalar value and predicate with uid relation is different thing.

p: [string] mean, a node can have zero or one of this predicate, and the predicate hold list of string value, its predicate/edge multiplicity is still zero or one, same as p: string, but the value type is different, now its a list of string.

while p:uid mean, a node can have zero or one of this predicate, and p:[uid] mean a node can have zero, one or more of this predicate, which mean predicate/edge multiplicity is changed.

The difference is clear when we do set mutation, in current implementation, when we do set mutation on a already existed scalar value predicate with a new value, it will automatically update the old value with new value (or if it was a list type, it will add new item to the list). On the other hand, set mutating on already existed uid predicate with new uid will add new uid-to-uid relation, which mean predicate/edge multiplicity is changed

Since this feature request is about predicate/edge multiplicity constraints not value type, I think it would be better if we implement this feature as '@' tag, just like @reverse tag, for example:
p: uid @one @index @reverse which mean a node can have zero or one of this predicate.
p: uid @Many @index @reverse** which mean a node can have zero, one or more of this predicate

@ahopkins

This comment has been minimized.

Copy link

ahopkins commented Aug 16, 2018

@bandirsen I cannot speak to the technical implementation reasons for one versus the other. But if it is something that can be accomplished (liminiting cardinality at the dB level), then it seems like a syntactic issue. And, it seems counterintuitive to handle cardinality with brackets in one place and a directive in another.

@camway

This comment has been minimized.

Copy link
Author

camway commented Aug 16, 2018

@bandirsen I'm going to try to proceed through this as methodically as possible. Please tell me where I've gone wrong because I truly do not understand why you are still fighting this change.

A few notes.

p: [string] mean, a node can have zero or one of this predicate, and the predicate hold list of string value, its predicate/edge multiplicity is still zero or one, same as p: string, but the value type is different, now its a list of string.

p: [string] means a node can have zero or more of this predicate. Writing it in your form makes it sound worse than it is. It would be like saying I have (2mill/1mill) options, instead of saying I have 2 options.

while p:uid mean, a node can have zero or one of this predicate, and p:[uid] mean a node can have zero, one or more of this predicate, which mean predicate/edge multiplicity is changed.

In the proposed change, the simplified form is:
p:uid means zero or one
p:[uid] means zero or more

I know you've been pushing for this to be a directive since before I jumped on this bandwagon, and I'm not opposed to that being an option. I don't see any problem with adding a directive such as @one that causes the relationship to drop the array and retrieve just a single item. This would be useful for problems where I want a key in my response such as "most_recent_post" in my response shape. Of course the posts in this case need to hold zero or more, but in this use case it would be useful to not have it as an array. It would also be useful in certain cases to combine this with first, last, and order statements. This was why I didn't ask you to close your github issue, I thought both changes were equally valid.

What you have not addressed is the major inconsistency in the syntax when defining the schema. That is the primary problem this issue is designed to fix, and a fix for it appears no where in your argument. This issue is also designed to allow the database - at its schema level - to address the problem of one to one associations which your argument also fails to address.

The inconsistency issue is the primary problem here. It seems everyone who starts using Dgraph (including me) instinctively tries to use uid and [uid] because after seeing how all other predicates work, this would just make sense. It's counter-intuitive. Unless your solution includes a way to solve this inconsistency, then I cannot discount the value of this proposed change.

Please tell me if I've gone wrong somewhere. I do not understand your stance against this change, and I'm trying my best.

@bandirsen

This comment has been minimized.

Copy link

bandirsen commented Aug 16, 2018

@camway @ahopkins

p: [string] means a node can have zero or more of this predicate

I think I was wrong all this time, I always thought in DGraph treat list type is someway like this.

<aNode> <aPredicate> ["string1", "string2", "string3"]

if it's treat like this:

<aNode> <aPredicate> "string1"
<aNode> <aPredicate> "string2"
<aNode> <aPredicate> "string3"

Then it p:[string] will have same behaviour like p:[uid] for all mutation process and my arguments to recommends predicate multiplicity as @ directive are wrong.

@camway

This comment has been minimized.

Copy link
Author

camway commented Aug 16, 2018

@bandirsen I'd have to defer to the Dgraph team for that. Honestly, the internals of this database are something I find very interesting, but I struggle to understand how much of the system works. Part of this is I just don't have enough available time to invest in order to learn it. The other part - simply put - is it's complicated.

@manishrjain could you or another member of the team give us your thoughts on @bandirsen's post above?

@manishrjain

This comment has been minimized.

Copy link
Member

manishrjain commented Aug 16, 2018

Dgraph stores values and nodes in posting list format. Multiple values for the same (sub, pred) are stored in a single posting list. Similarly, multiple nodes for the same (sub, pred) are stored in a single posting list as well.

We currently enforce that there's only one value when a user specifies a non-list data type (like string, or int). When a user specifies a list type (like [string]), then the posting list holds all the elements of the list in a single posting list object.

Currently, we don't enforce the singularity of a uid type in a posting list, and so for a node connection, a posting list can hold as many uids as provided. With this proposal, uid type would also start to enforce a single uid per posting list for the predicate. Hope that sheds some light on the internals.

@camway

This comment has been minimized.

Copy link
Author

camway commented Aug 16, 2018

@manishrjain Thank you for the fast response. I think that clears up everything for me.

@bandirsen

This comment has been minimized.

Copy link

bandirsen commented Aug 16, 2018

@camway @manishrjain

Thank you for explanation, It's clear now, in dgraph scalar value and node are treated equally, then p:[uid] is better choice.

I come from Neo4J, I always thought, predicate with scalar value is like Node Property and predicate with uid is like Edge, and Neo4J treat Node Property and Edge differently.

@camway

This comment has been minimized.

Copy link
Author

camway commented Aug 16, 2018

@bandirsen No problem here. I've looked into Neo4J, but I've never used it. It's on my list to learn.

@pmualaba

This comment has been minimized.

Copy link

pmualaba commented Aug 27, 2018

@manishrjain
quote: ...Currently, we don't enforce the singularity of a uid type in a posting list, and so for a node connection, a posting list can hold as many uids as provided. With this proposal, uid type would also start to enforce a single uid per posting list for the predicate....

The desired outcome of this proposal is that the GraphQl+- queryresult JSON object, should always return an Object { } for uid predicates, and an Array [ ] for [uid] predicates. I hope this change can be pushed higher up in the priority list, because it is part of the foundation on which the upper layer application code is built.

@camway

This comment has been minimized.

Copy link
Author

camway commented Aug 27, 2018

@pmualaba

The desired outcome of this proposal is that the GraphQl+- queryresult JSON object, should always return an Object { } for uid predicates, and an Array [ ] for [uid] predicates.

I think what @manishrjain was describing is the behavior change "under the hood". What you described is ultimately what the low level behavior change would result in. @manishrjain please correct me if I'm wrong here.

I hope this change can be pushed higher up in the priority list, because it is part of the foundation on which the upper layer application code is built.

I believe the "p1" tag applied stands for Priority 1 (again @manishrjain correct me if I'm wrong). Which would mean it's essentially at the highest priority a feature request can be at.

@ahopkins

This comment has been minimized.

Copy link

ahopkins commented Aug 27, 2018

For what it is worth.... I am building this into my library pydiggy. Not ideal because it is application logic that should probably be handled at the data layer. But it is a basic enough concept that I think it is important for the end user of my library to have this "feature".

The idea is to use type annotations:

class A(Node):
    pass

class B(Node):
    single: A

class C(Node):
    multi: List[A]

To achieve this:

a1 = A()
a2 = A()

b.single = a1

c.multi = [a1, a2]

Then, after querying the DB, and hydrating them back from JSON to Python objects, we still get:

>>> b.single
<A:123>

>>> c.multi
[<A:123>, <A:456>]
@Vliro

This comment has been minimized.

Copy link

Vliro commented Nov 29, 2018

How is this coming along? It is the single most needed feature for our backend, because as of right now objects that are singular are json-parsed as arrays, which makes frontend a nightmare.

@manishrjain

This comment has been minimized.

Copy link
Member

manishrjain commented Nov 29, 2018

Given this is a breaking change, we will make it part of v1.1 release. Eta Jan 2019.

@Vliro

This comment has been minimized.

Copy link

Vliro commented Nov 29, 2018

Thank you for the update @manishrjain . Looking forward to it! Will it be available earlier as part of a nightly release, or merged at the date of arrival?

@martinmr

This comment has been minimized.

Copy link
Contributor

martinmr commented Jan 14, 2019

PR referenced above will address this issue. See PR for more details.

@martinmr

This comment has been minimized.

Copy link
Contributor

martinmr commented Jan 18, 2019

Closing as the fix has been submitted. Feature should be available starting with the 1.1 release.

@martinmr martinmr closed this Jan 18, 2019

@martinmr martinmr self-assigned this Jan 18, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment