
Relational sets #14

Open
jimfulton opened this issue Mar 13, 2017 · 6 comments

Comments

@jimfulton (Contributor)

jimfulton commented Mar 13, 2017

Lean on PostgreSQL to implement sets.

Objects grow _newt__parents__, which is an array of containers. When a container fetches its contents, it does so by querying for objects that have it as one of their parents. Of course, the GC has to be aware of this: an object with no references to it is still non-garbage if it has any non-garbage parents.
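A rough sketch of the parent-pointer scheme, with plain dicts standing in for the Postgres table (the table/column names in the comment are illustrative, patterned after Newt's JSONB storage, not its actual schema):

```python
# Each object's state carries a ``_newt__parents__`` array; a container
# lists its contents by querying for objects whose array includes the
# container's oid.  Simulated here in memory.
objects = {
    "obj1": {"_newt__parents__": ["folderA"]},
    "obj2": {"_newt__parents__": ["folderA", "folderB"]},
    "obj3": {"_newt__parents__": ["folderB"]},
}

def container_contents(container_oid):
    """Return oids of objects that have ``container_oid`` as a parent.

    In Postgres this might be a GIN-indexable containment test, e.g.:
      SELECT zoid FROM object_json
      WHERE state -> '_newt__parents__' ? %s
    """
    return sorted(
        oid for oid, state in objects.items()
        if container_oid in state["_newt__parents__"]
    )
```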

Containers "store" their contents through these references. Containers have a length. Container conflicts are always resolvable, because their own state behaves like a length: concurrent changes can be merged by applying both deltas.

A variation on this is approximately ordered collections. A collection has min and max positions that behave like zope.minmax values. When an object is added at the front, the min position is decremented and used as the object's position; similarly, at the back, the max position is incremented. The position becomes part of the parent pointer. Positions are not unique.
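A minimal in-memory sketch of the approximately ordered collection described above (class and method names are hypothetical):

```python
class ApproxOrderedCollection:
    """Sketch: min/max position counters plus per-object positions.

    The counters would be stored as conflict-resolvable values (in the
    style of zope.minmax); positions would live on the parent pointers.
    """

    def __init__(self):
        self.min = 0          # decremented on front inserts
        self.max = 0          # incremented on back inserts
        self.positions = {}   # oid -> position

    def push_front(self, oid):
        self.min -= 1
        self.positions[oid] = self.min

    def push_back(self, oid):
        self.max += 1
        self.positions[oid] = self.max

    def iter_ordered(self):
        # Approximate order: sort by position; ties (non-unique
        # positions) have no guaranteed order.
        return sorted(self.positions, key=self.positions.get)
```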

@jimfulton (Contributor, Author)

I guess I should mention what itches this is scratching.

  • This potentially provides conflict-free containers (assuming that we don't somehow end up with some sort of conflict-like behavior on Postgres indexes).

  • It provides a maybe cleaner way to deal with PSets, which are sets of persistent objects that need not be orderable. This gets complicated because PSets based on BTrees require that items have oids, which can be awkward for new objects.

  • We often want to maintain forward and backward references between parents and children, which is a DRY violation, on some level. This allows us to push the refs from parents to children down to Postgres indexes, which are derived, and presumably faster.

  • The last part, which is almost unrelated, is an idea for something like a scalable persistent list, which is interesting for implementing queues, or perhaps for dealing with things like "news".

@jamadden (Contributor)

Half (or less) baked idea: I wonder if this might just be a fundamentally differently managed type of object. Instead of treating it as a Python object with a pickle and this weird ( 😄 ) attribute and indexing behaviour, what if it was treated more like a SQLAlchemy-style object backed by a table?

| Container OID | Contained Object OID |
|---------------|----------------------|

A Connection subclass would recognize a Persistent subclass as this type of object and just go through a different code path (direct SQL) when it came time to load and save it. A sufficiently smart implementation could treat iteration as a series of SQL queries over a very large table if needed. Ordering and uniqueness would be determined by the class of the object and hence the table layout and indexes it was stored in. Interaction with the persistent object cache would be the trickiest part, but that could be seen as a benefit.
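The dispatch idea above can be sketched roughly as follows. All names here (`RelationalSet`, `SketchConnection`) are hypothetical, not real newt.db or ZODB APIs; the point is just a connection that recognizes a marker base class and takes a direct-SQL path instead of the pickle path:

```python
class RelationalSet:
    """Marker base class: instances live in a relation, not a pickle."""


class SketchConnection:
    """Sketch of a Connection that dispatches on object type.

    ``sql_loader`` would run direct SQL against the containment table
    (e.g. iterate it in batches); ``pickle_loader`` is the ordinary
    pickle-based path.  Both are injected here for illustration.
    """

    def __init__(self, sql_loader, pickle_loader):
        self.sql_loader = sql_loader
        self.pickle_loader = pickle_loader

    def load(self, obj):
        if isinstance(obj, RelationalSet):
            # Direct-SQL path for the table-backed object.
            return self.sql_loader(obj)
        return self.pickle_loader(obj)
```

Usage would look like `conn.load(some_relational_set)`, with iteration over a very large container implemented as a series of SQL queries inside the `sql_loader`.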

@jimfulton (Contributor, Author)

jimfulton commented Mar 16, 2017 via email

@pauleveritt

That point about returning live objects instead of ghosts would be useful for our project. I don't know if you necessarily mean from the JSONB or having the pickles come back in the result. Either way, it would be nice to avoid potentially 20 more SQL requests to get the data needed to fill a query "batch".

Apologies if this is thread-jacking, but one complaint about SQL-backed traversal is a different SQL query for each hop in the URL. Various schemes (e.g. the Kotti project in Pyramid) have custom traversers which do magical SQLAlchemy stuff to generate only one query. Is this problem a subclass of anything this ticket is scratching? I doubt it... I suspect it's a different topic.

@jimfulton (Contributor, Author)

jimfulton commented Mar 16, 2017 via email

@jamadden (Contributor)

> I'm not sure I follow this, but I think this can all be managed by the container object as easily.
>
> Perhaps I'm missing some benefit of handling this at the connection level.

In practice there may not be much of a difference. It just seemed like it might be less...invasive? or a better separation of concerns?...to have the connection handle this more directly, since it already knows about the pickle cache and has the database connection.
