Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add primary and foreign key support #274

Merged
merged 32 commits into from Sep 4, 2015
Merged

Add primary and foreign key support #274

merged 32 commits into from Sep 4, 2015

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Aug 2, 2015

@cpcloud cpcloud self-assigned this Aug 2, 2015
@cpcloud cpcloud added this to the 0.3.4 milestone Aug 2, 2015
@@ -183,13 +184,25 @@ def discover_typeengine(typ):

@discover.register(sa.Column)
def discover_sqlalchemy_column(col):
optionify = Option if col.nullable else identity
return Record([[col.name, optionify(discover(col.type))]])
optionify = Option if col.nullable else PrimaryKey if col.primary_key else identity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe metafy is a better name for this now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@llllllllll
Copy link
Member

Can we still convert tables that use fkeys into tabular structures like dataframes and ndarrays? It would be cool if we could pass a follow_fkey flag to odo that says to generate a flat structure or the nested structure based on the foreign keys.

@cpcloud
Copy link
Member Author

cpcloud commented Aug 5, 2015

Can we still convert tables that use fkeys into tabular structures like dataframes and ndarrays?

Yes.

It would be cool if we could pass a follow_fkey flag to odo that says to generate a flat structure or the nested structure based on the foreign keys.

I agree, but I'd like to do that in a separate PR.

@cpcloud cpcloud changed the title WIP: Add foreign key support in odo Add foreign key support in odo Sep 4, 2015
@cpcloud cpcloud changed the title Add foreign key support in odo Add primary and foreign key support Sep 4, 2015
@cpcloud
Copy link
Member Author

cpcloud commented Sep 4, 2015

@llllllllll this is now passing, any more thoughts here before merging?

@llllllllll
Copy link
Member

Nope, merge when ready

>>> dshape = 'var * {id: !int64, name: string}'
>>> products = resource('sqlite:///db.db::products', dshape=dshape)
>>> products.c.id.primary_key
True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primary-ness might be part of the name rather than the dtype? E.g.

{!id: int64, ...}

@mrocklin
Copy link
Member

mrocklin commented Sep 4, 2015

Sorry, I started reviewing this but I have enough other things pinging me at about 0.1 hertz. I'll try to take a deeper look at this this weekend.

In the absence of actually looking at things I'll just provide a high level concern (tongue in cheek).

I'm curious if the nature of relationships between data should be kept separate from the shape and dtypes of that data. If it is not necessary to tie these things together then that might be ideal. Can we say that field X in table T is a foreign key to field Y in table S without knowing what the dtype of those tables are? What does tying this information into the datashape give us? What does it cost us? What would a separate data-relationship thing look like in isolation from datashape?

@cpcloud
Copy link
Member Author

cpcloud commented Sep 4, 2015

What does tying this information into the datashape give us?

The ability to inspect the columns of the parent table to allow automatic generation of joins, which was the main use case.

@mrocklin
Copy link
Member

mrocklin commented Sep 4, 2015

I'm completely on board with the idea of encoding this information. My main question is if it should be integrated into datashape or live in some complementary structure.

@cpcloud
Copy link
Member Author

cpcloud commented Sep 4, 2015

Here are the things that separating these things out would have to satisfy that are currently satisfied by my implementation here:

  • Convenient to write down
  • Odo can use it to create and infer relationships
  • A user of blaze doesn't have to tell blaze what the relationships are and t.column.fields must return the fields of the parent table. Where else is this information encoded besides the dshape?

The last one IMO is hard to satisfy without repeating a ton of information or tying blaze symbols to specific names.

@cpcloud
Copy link
Member Author

cpcloud commented Sep 4, 2015

hm i might be able to cheat a bit and use sa.orm.relationship

@mrocklin
Copy link
Member

mrocklin commented Sep 4, 2015

The last one IMO is hard to satisfy without repeating a ton of information or tying blaze symbols to specific names.

We wouldn't repeat information, we'd use the datashape for this I think? The two would be used in concert. My question isn't "is there something other than datashape that we should use". It's, "is there some way that we can pull out the data-relationships to a separate structure?"

So lets say that we have two tables

A = var * {id: int, name: string}
B = var * {transaction_id: int, user_id: int, amount: float}

And we intend that B.user_id -> A.id.

Given the names that we have above (unfortunate that we don't have them easily in real life) we might want to encode the following:

A -- primary: id
     foreign: []
B -- primary: transaction_id
     foreign: [user_id->A.id]

So in the case where datashapes have identities I think that this problem is relatively easy. We have the datashapes up above, which relate only to the data and the shape. We have this other data-relationship thing below, which describes how they all interact. I suspect that we can use the two of these things in tandem to accomplish any goal that Blaze and Odo would want to accomplish while also keeping datashape somewhat isolated (which is good for projects, like dynd, that only care about data and shapes.) I also find this separation easier to write and reason about although I have a strong bias because this is what came out of my head.

@cpcloud
Copy link
Member Author

cpcloud commented Sep 4, 2015

I'm having trouble seeing on what structure this would sit. I see that for example you could write down what you're saying as

{'A': {'primary': ['id'], 'foreign': {}},
 'B': {'primary': ['transaction_id'], 'foreign': {'user_id': {'A': 'id'}}}}

but where would this thing live? on the datashape? on a blaze symbol?

@mrocklin
Copy link
Member

mrocklin commented Sep 4, 2015

In my fantasy world this is a peer to datashape. It would be passed around just as datashapes are passed around now. What's missing for this is that datashapes don't have identities (we're missing the terms A and B.)

@cpcloud
Copy link
Member Author

cpcloud commented Sep 4, 2015

@mrocklin thanks for the input ... merging on pass

cpcloud added a commit that referenced this pull request Sep 4, 2015
Add primary and foreign key support
@cpcloud cpcloud merged commit 11d7521 into blaze:master Sep 4, 2015
@cpcloud cpcloud deleted the foreign-keys branch September 4, 2015 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants