New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add primary and foreign key support #274

Merged
merged 32 commits into from Sep 4, 2015

Conversation

Projects
None yet
3 participants
@cpcloud
Member

cpcloud commented Aug 2, 2015

@cpcloud cpcloud referenced this pull request Aug 2, 2015

Merged

Foreign key support #1192

@cpcloud cpcloud self-assigned this Aug 2, 2015

@cpcloud cpcloud added this to the 0.3.4 milestone Aug 2, 2015

@cpcloud cpcloud added the enhancement label Aug 2, 2015

@@ -183,13 +184,25 @@ def discover_typeengine(typ):
@discover.register(sa.Column)
def discover_sqlalchemy_column(col):
optionify = Option if col.nullable else identity
return Record([[col.name, optionify(discover(col.type))]])
optionify = Option if col.nullable else PrimaryKey if col.primary_key else identity

This comment has been minimized.

@llllllllll

llllllllll Aug 5, 2015

Member

maybe metafy is a better name for this now.

This comment has been minimized.

@cpcloud
@discover.register(sa.sql.FromClause)
def discover_sqlalchemy_selectable(t):
records = list(sum([discover(c).parameters[0] for c in t.columns], ()))
records = list(sum([discover(c).parameters[0]
for c in t.columns if not getattr(c, 'system', False)],

This comment has been minimized.

@llllllllll

llllllllll Aug 5, 2015

Member

what is this system attribute?

This comment has been minimized.

@cpcloud

cpcloud Aug 5, 2015

Member

From the sqlalchemy docs:

:param system: When ``True``, indicates this is a "system" column,
     that is a column which is automatically made available by the
     database, and should not be included in the columns list for a
     ``CREATE TABLE`` statement.

     For more elaborate scenarios where columns should be
     conditionally rendered differently on different backends,
     consider custom compilation rules for :class:`.CreateColumn`.

This comment has been minimized.

@cpcloud

cpcloud Aug 5, 2015

Member

I'll probably remove this

autoload_with=engine, schema=schema),
schema)
return attach_schema(
sa.Table(table_name, metadata, schema=schema,

This comment has been minimized.

@llllllllll

llllllllll Aug 5, 2015

Member

is it safe to drop the autoload arg?

This comment has been minimized.

@cpcloud
@llllllllll

This comment has been minimized.

Member

llllllllll commented Aug 5, 2015

Can we still convert tables that use fkeys into tabular structures like dataframes and ndarrays? It would be cool if we could pass a follow_fkey flag to odo that says to generate a flat structure or the nested structure based on the foreign keys.

@cpcloud

This comment has been minimized.

Member

cpcloud commented Aug 5, 2015

Can we still convert tables that use fkeys into tabular structures like dataframes and ndarrays?

Yes.

It would be cool if we could pass a follow_fkey flag to odo that says to generate a flat structure or the nested structure based on the foreign keys.

I agree, but I'd like to do that in a separate PR.

@cpcloud cpcloud force-pushed the cpcloud:foreign-keys branch from efab20f to 9c7d8d2 Aug 14, 2015

@cpcloud cpcloud changed the title from WIP: Add foreign key support in odo to Add foreign key support in odo Sep 4, 2015

@cpcloud cpcloud changed the title from Add foreign key support in odo to Add primary and foreign key support Sep 4, 2015

@cpcloud

This comment has been minimized.

Member

cpcloud commented Sep 4, 2015

@llllllllll this is now passing, any more thoughts here before merging?

@llllllllll

This comment has been minimized.

Member

llllllllll commented Sep 4, 2015

Nope, merge when ready

cpcloud added some commits Sep 4, 2015

>>> dshape = 'var * {id: !int64, name: string}'
>>> products = resource('sqlite:///db.db::products', dshape=dshape)
>>> products.c.id.primary_key
True

This comment has been minimized.

@mrocklin

mrocklin Sep 4, 2015

Member

Primary-ness might be part of the name rather than the dtype? E.g.

{!id: int64, ...}
@mrocklin

This comment has been minimized.

Member

mrocklin commented Sep 4, 2015

Sorry, I started reviewing this but I have enough other things pinging me at about 0.1 hertz. I'll try to take a deeper look at this this weekend.

In the absence of actually looking at things I'll just provide a high level concern (tongue in cheek).

I'm curious if the nature of relationships between data should be kept separate from the shape and dtypes of that data. If it is not necessary to tie these things together then that might be ideal. Can we say that field X in table T is a foreign key to field Y in table S without knowing what the dtype of those tables are? What does tying this information into the datashape give us? What does it cost us? What would a separate data-relationship thing look like in isolation from datashape?

@cpcloud

This comment has been minimized.

Member

cpcloud commented Sep 4, 2015

What does tying this information into the datashape give us?

The ability to inspect the columns of the parent table to allow automatic generation of joins, which was the main use case.

@mrocklin

This comment has been minimized.

Member

mrocklin commented Sep 4, 2015

I'm completely on board with the idea of encoding this information. My main question is if it should be integrated into datashape or live in some complementary structure.

@cpcloud

This comment has been minimized.

Member

cpcloud commented Sep 4, 2015

Here are the things that separating these things out would have to satisfy that are currently satisfied by my implementation here:

  • Convenient to write down
  • Odo can use it to create and infer relationships
  • A user of blaze doesn't have to tell blaze what the relationships are and t.column.fields must return the fields of the parent table. Where else is this information encoded besides the dshape?

The last one IMO is hard to satisfy without repeating a ton of information or tying blaze symbols to specific names.

@cpcloud

This comment has been minimized.

Member

cpcloud commented Sep 4, 2015

hm i might be able to cheat a bit and use sa.orm.relationship

@mrocklin

This comment has been minimized.

Member

mrocklin commented Sep 4, 2015

The last one IMO is hard to satisfy without repeating a ton of information or tying blaze symbols to specific names.

We wouldn't repeat information, we'd use the datashape for this I think? The two would be used in concert. My question isn't "is there something other than datashape that we should use". It's, "is there some way that we can pull out the data-relationships to a separate structure?"

So lets say that we have two tables

A = var * {id: int, name: string}
B = var * {transaction_id: int, user_id: int, amount: float}

And we intend that B.user_id -> A.id.

Given the names that we have above (unfortunate that we don't have them easily in real life) we might want to encode the following:

A -- primary: id
     foreign: []
B -- primary: transaction_id
     foreign: [user_id->A.id]

So in the case where datashapes have identities I think that this problem is relatively easy. We have the datashapes up above, which relate only to the data and the shape. We have this other data-relationship thing below, which describes how they all interact. I suspect that we can use the two of these things in tandem to accomplish any goal that Blaze and Odo would want to accomplish while also keeping datashape somewhat isolated (which is good for projects, like dynd, that only care about data and shapes.) I also find this separation easier to write and reason about although I have a strong bias because this is what came out of my head.

@cpcloud

This comment has been minimized.

Member

cpcloud commented Sep 4, 2015

I'm having trouble seeing on what structure this would sit. I see that for example you could write down what you're saying as

{'A': {'primary': ['id'], 'foreign': {}},
 'B': {'primary': ['transaction_id'], 'foreign': {'user_id': {'A': 'id'}}}}

but where would this thing live? on the datashape? on a blaze symbol?

@mrocklin

This comment has been minimized.

Member

mrocklin commented Sep 4, 2015

In my fantasy world this is a peer to datashape. It would be passed around just as datashapes are passed around now. What's missing for this is that datashapes don't have identities (we're missing the terms A and B.)

@cpcloud

This comment has been minimized.

Member

cpcloud commented Sep 4, 2015

@mrocklin thanks for the input ... merging on pass

cpcloud added a commit that referenced this pull request Sep 4, 2015

Merge pull request #274 from cpcloud/foreign-keys
Add primary and foreign key support

@cpcloud cpcloud merged commit 11d7521 into blaze:master Sep 4, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@cpcloud cpcloud deleted the cpcloud:foreign-keys branch Sep 4, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment