Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

meta: flatten meta types #1

Closed
erichanson opened this issue Apr 13, 2019 · 6 comments
Closed

meta: flatten meta types #1

erichanson opened this issue Apr 13, 2019 · 6 comments

Comments

@erichanson
Copy link
Collaborator

The meta identifier type system has been a source of great joy and also much tears. They very nicely encapsulate the identifier of a database object (which can be in some cases four or five separate values) in a single value. This encapsulation makes life easier in many, many scenarios. Overall the concept of meta identifiers is a huge win. HOWEVER. The implementation has a number of problems:

1. They are highly composite

schema_id has just a name value.
relation_id has a relation name value and a schema_id.
column_id has a column name value and a relation_id.
row_id has a primary key column_id, and a primary key value pk_value.
field_id has a row_id and a column_id

The nesting here is hopefully obvious.

While correct in terms of information architecture, five-level-deep composite types are patently ridiculous to work with.

Creating one (without using the handy constructors) looks like:

row(row(row(value->'relation_id'->'schema_id'->>'name'), value->'relation_id'->>'name'), row(value->'role_id'->>'name'), value->>'type')::meta.table_privilege_id

Accessing some variable (without using a handy cast function) looks like:

schema_name := (((column_id).relation_id).schema_id).name

2. Their internal representation is awful

aquameta=# select meta.field_id('bundle','bundle','id','1b5bf8c5-c93c-48f5-81a8-bd39277ecde6','name');
                                                               field_id                                                               
--------------------------------------------------------------------------------------------------------------------------------------
 ("(""(""""(""""""""(bundle)"""""""",bundle)"""",id)"",1b5bf8c5-c93c-48f5-81a8-bd39277ecde6)","(""(""""(bundle)"""",bundle)"",name)")
(1 row)

Unfortunately we cannot override this behavior, because in postgres, composite types do not have input or output functions, but all share a single global (and dumb) function.

3. They cause havoc in datum.js

So, when endpoint's REST API selects rows, it checks to see if that row's type is JSON or a composite type, and if it is, sends the row over the wire as a JSON object. When pushing the object back to the database, ... really not sure what happens, but I doubt it works. Basically the REST API has never worked properly with composite types, which has been a massive bottleneck in terms of UI design of the schema admin, bundle admin, user admin, and more. They don't work, and the obtuseness of their structure makes designing a sane REST API to handle them...formidable.

We need to redesign them entirely, to be flat, non-composite types.

Functionality to retain

These patterns of usage should continue to work:

1. Constructors

select meta.field_id('a','b','c','d');

2. Casts from one type to another

select field_id::meta.schema_id;

3. Casts to/from JSON

select meta.field_id::json

Improvements

1. Easily accessible (flat) value namespace

select field_id.schema_name;

2. Casts to/from text

We already have casts to text for some types, however not full coverage. Text representation should be readable, unique and unambiguous. We need to make use of quote_ident and quote_literal in our casting, and test against identifiers that contain quotes, slashes, etc.

aquameta=# select meta.field_id('bundle','bundle','id','1b5bf8c5-c93c-48f5-81a8-bd39277ecde6','name')::text;
                        field_id                         
---------------------------------------------------------
 bundle/bundle/1b5bf8c5-c93c-48f5-81a8-bd39277ecde6/name
(1 row)

Desired Outcomes

  1. Support for CRUD on meta.* from datum.js
  2. Version control on meta.* from bundle

These are the two biggest objectives still ahead, and this is the source of all the problems.

Implementation Ramifications

This is a central pillar in the architecture, a high-flying buttress. Pulling it out will be highly impactful throughout the system. Test coverage will preserve functionality in bundle and meta, presuming tests don't rely too heavily on internals and instead use constructors and casts. Need a endpoint test suite.

Implementation Approach

The constructors for the types should be a pretty good indicator of how the flattened types should be structured. Do that.

We need to figure out what bundle is going to store in blob.value for, say, a meta.column_id, be it the text representation or ... whatever postgresql outputs for non-composite types...? Since these types won't be composite anymore, we have hope of writing input/output functions that do the right thing without the incessant explicit casting.

@erichanson erichanson changed the title epic: refactor meta types meta: flatten meta types Apr 13, 2019
@micburks
Copy link

👍

Definitely agree. I was always defensive of meta types just because they were so correct, but you have a much better argument. Composite types have always made development difficult, especially when prototyping in psql. I think there is a way to make them work from datum right now, but it's not simple.

What do you recommend as the approach to migration? I'm sure you don't want to lose the actual data export. I don't think we store meta types often, but bundle is chock full of them.

I'm not sure about performance or validation, but it would be nice to always use the text representation of meta types when storing them. That way you can decouple their usage from their internal structure. Say you want to rename a field in the type, you don't have to do a migration.

Along those lines, I can't remember if there is any sort of validation you can apply to types. It'll be necessary to make sure a text representation actually fits the structure of the type, but it would be even more impressive to actually validate that the id represents an actual database object (i.e. a row_id points to a real row).

This would also be a good time to make sure you're happy with the field names in the type, e.g. relation_id.schema_name could maybe even just be relation_id.schema, and so forth.

@erichanson
Copy link
Collaborator Author

Cool.

Retaining data: Yes, migrating that will be challenging. I started a branch. For bundle internals values like all the row_ids and field_ids I think it'll be possible to just manually cast and update the values by hand, and rexport the bundle. We are doing version control on just a few meta entities, I think it's mostly in semantics only.

Re: using text rep in bundle export: Yes! I think performance ramifications will be minimal. We'll probably get a speedup overall, because composite type comparisons and casting is extremely slow. I remember actually running into an instance where comparing one field_id to another was like 100x slower than casting both to text and comparing the strings.

Re: validation, certainly we should be validating the strings as containing valid values for all the idents and literals. There are quire a few places where say bundle contains a row_id that isn't live in the working copy. Having that level of exists() validation could be really handy in some scenarios but other times it doesn't apply, like say when a bundle is fetched but not checked out.

schema_name vs schema.... hmm. I like _name because it distinguishes from _id and implies that it is text. I could imagine with realtion_id.schema people might expect a schema_id. Pretty sure I'm happy with it.

@erichanson erichanson transferred this issue from aquametalabs/aquameta Oct 2, 2019
@erichanson
Copy link
Collaborator Author

After flatting the types, I have to import and rexport all the bundles since their internals now will use flat meta-identifiers. This is what progress looks l ike:

Screenshot from 2023-02-04 20-52-52

@erichanson
Copy link
Collaborator Author

erichanson commented Feb 6, 2023

Quick summary of the refactor so far:

  1. Resurrect the meta-meta-coverage.sql tests, which is meant to inspect the database and check to see that for each "meta-entity" (schema, table, column, field, row, role, policy, foreign data wrapper, etc.) there exists all the appropriate meta-identifier types, views, insert update and delete triggers, casts to/from json, casts to/from less-specific types, etc. It's a big list. The coverage map revealed that we are not even close to having full coverage. I made a brief video about it.
  2. Rather than try to refactor all this code, I made a code generator that has a table for "meta-entities" and "entity-components", and for each component, a statement generator. That way, if something needs to be changed or added, I can change it upstream and regenerate, instead of changing it in 20 different places. It works really well. The output is here, some ~240 statements, and growing.
  3. Refactor the meta views to use the new identifiers. This was a big job and probably still isn't perfect, but tests are all passing. Tests had to be refactored as well.
  4. Refactor bundle to use the new identifiers. SO satisfying, erasing really ugly code and replacing it with clean code.
  5. Refactor endpoint to use the new identifiers. In doing so, found a ton of ancient cruft in the endpoint code and re-organized it a bit. I removed everything except the data model and request handlers, and then ran it and saw what was broken, and then added whatever was missing back in. There was a lot of unused code that hasn't caused any problems by being removed (yet). Plus, all the non-request-handler stuff is isolated and most of it probably needs to be audited and removed.
  6. I used the code refactor feature of JetBrains IDEA to rename the new flat meta and bundle to meta2 and bundle2, so that they can co-exist with an old v0.3 version of Aquameta, aka run in parallel. Then, copied all the core bundles from bundle to bundle2, then exported them from bundle2 into CSVs in the new format. This all worked surprisingly well.
  7. Renamed bundle2 and meta2 back to meta and bundle, and fired up the server. Found six or seven bugs or missing things, fixed them, and the IDE mostly comes up.
  8. The IDE falls down because the widgets haven't been refactored. The base ones still work, but when it starts getting into the more complicated stuff that leans on meta and the meta-identifiers, it still contain a lot of old-style non-flat identifiers. So, the IDE is bricked. There's no good way to edit widgets without the IDE, so we have neither chicken nor egg. Much tears.
  9. Decided to try to resurrect pgfs.py, but it was written in Python 2, and the Python3/fuse ecosystem seems to be in total disarray. Pip couldn't install fuse, nothing seemed to be working. More tears.
  10. Decided to explore resurrecting pgfs in Go. Work is ongoing. I could just figure out how to unbrick the damn widgets, it's probably not that hard, but this situation is exactly why something like pgfs is so necessary. So pull over and build it. Won't be too hard. The bots will help.

Overall, a very challenging but successful refactor. The end is in sight, just fix the widgets, and just build the tool necessary to do so. Mickey says yak shaving, ha. This issue was the last piece of low-level architecture that I know of that was absolutely wrong and touched every aspect of the system. Much lessons learned, it was a "correct" decision at the time, but has plagued the architecture from the beginning. Much thanks but goodbye.

@erichanson
Copy link
Collaborator Author

Meta identifiers have been unnested and are now flat, aka no meta types contain any other meta types. In the refactor, the entire process of how identifiers are created and maintained has changed:

Meta identifiers are now automatically generated progratically, instead of maintaining a massive static SQL file. For each of the 26 meta-identifiers, the script creates 17 statements that flesh out the identifier such as type, constructor function, casts to and from json and jsonb, etc. As additional meta-identifiers are added to the system, the entire file can be easily regenerated using the generator system.

@erichanson
Copy link
Collaborator Author

erichanson commented Jul 12, 2023

To convert a bundle from nested-types format to flat-types format, here's some regexes:

rowset_row.csv:

:%s/("(""(""""//g
:%s/)"""",/,/g
:%s/)"",/,/g
:%s/)",/,/g

rowset_row_field.csv

:%s/("(""(""""(""""""""//g
:%s/""""(.*)"""",.*)"",/,/g
:%s/)"""""""",/,/g
:%s/)"""",/,/g
:%s/)"",/,/g
:%s/)","(""(,/,/g
:%s/)")/)/g

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants