-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add primary and foreign key support #274
Conversation
@@ -183,13 +184,25 @@ def discover_typeengine(typ): | |||
|
|||
@discover.register(sa.Column) | |||
def discover_sqlalchemy_column(col): | |||
optionify = Option if col.nullable else identity | |||
return Record([[col.name, optionify(discover(col.type))]]) | |||
optionify = Option if col.nullable else PrimaryKey if col.primary_key else identity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe metafy
is a better name for this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
Can we still convert tables that use fkeys into tabular structures like dataframes and ndarrays? It would be cool if we could pass a |
Yes.
I agree, but I'd like to do that in a separate PR. |
* Compound primary key * Multiple foreign keys from different tables form a compound primary key * Single foreign key reference to one column of a compound primary key * Foreign key references to all columns of a compound primary key
@llllllllll this is now passing, any more thoughts here before merging? |
Nope, merge when ready |
>>> dshape = 'var * {id: !int64, name: string}' | ||
>>> products = resource('sqlite:///db.db::products', dshape=dshape) | ||
>>> products.c.id.primary_key | ||
True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primary-ness might be part of the name rather than the dtype? E.g.
{!id: int64, ...}
Sorry, I started reviewing this but I have enough other things pinging me at about 0.1 hertz. I'll try to take a deeper look at this this weekend. In the absence of actually looking at things I'll just provide a high level concern (tongue in cheek). I'm curious if the nature of relationships between data should be kept separate from the shape and dtypes of that data. If it is not necessary to tie these things together then that might be ideal. Can we say that field X in table T is a foreign key to field Y in table S without knowing what the dtype of those tables are? What does tying this information into the datashape give us? What does it cost us? What would a separate |
The ability to inspect the columns of the parent table to allow automatic generation of joins, which was the main use case. |
I'm completely on board with the idea of encoding this information. My main question is if it should be integrated into datashape or live in some complementary structure. |
Here are the things that separating these things out would have to satisfy that are currently satisfied by my implementation here:
The last one IMO is hard to satisfy without repeating a ton of information or tying blaze symbols to specific names. |
hm i might be able to cheat a bit and use |
We wouldn't repeat information, we'd use the datashape for this I think? The two would be used in concert. My question isn't "is there something other than datashape that we should use". It's, "is there some way that we can pull out the data-relationships to a separate structure?" So lets say that we have two tables
And we intend that Given the names that we have above (unfortunate that we don't have them easily in real life) we might want to encode the following: A -- primary: id
foreign: []
B -- primary: transaction_id
foreign: [user_id->A.id] So in the case where datashapes have identities I think that this problem is relatively easy. We have the datashapes up above, which relate only to the data and the shape. We have this other data-relationship thing below, which describes how they all interact. I suspect that we can use the two of these things in tandem to accomplish any goal that Blaze and Odo would want to accomplish while also keeping datashape somewhat isolated (which is good for projects, like dynd, that only care about data and shapes.) I also find this separation easier to write and reason about although I have a strong bias because this is what came out of my head. |
I'm having trouble seeing on what structure this would sit. I see that for example you could write down what you're saying as {'A': {'primary': ['id'], 'foreign': {}},
'B': {'primary': ['transaction_id'], 'foreign': {'user_id': {'A': 'id'}}}} but where would this thing live? on the datashape? on a blaze symbol? |
In my fantasy world this is a peer to datashape. It would be passed around just as datashapes are passed around now. What's missing for this is that datashapes don't have identities (we're missing the terms |
@mrocklin thanks for the input ... merging on pass |
Add primary and foreign key support
related: blaze/datashape#164