Add primary and foreign key support #274

cpcloud · 2015-08-02T04:13:25Z

llllllllll · 2015-08-05T07:35:35Z

odo/backends/sql.py

@@ -183,13 +184,25 @@ def discover_typeengine(typ):

 @discover.register(sa.Column)
 def discover_sqlalchemy_column(col):
-    optionify = Option if col.nullable else identity
-    return Record([[col.name, optionify(discover(col.type))]])
+    optionify = Option if col.nullable else PrimaryKey if col.primary_key else identity


maybe metafy is a better name for this now.
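For readers following the diff, here is a minimal sketch of how the chained conditional in the new line is evaluated. `Option`, `PrimaryKey`, `Column`, `metafy_for`, and `discover_column` below are stand-ins written for illustration, not the real datashape, SQLAlchemy, or odo objects. Python groups the expression as `Option if nullable else (PrimaryKey if primary_key else identity)`, so in this sketch a column that is both nullable and a primary key would be wrapped in `Option` only.

```python
from collections import namedtuple

# Stand-ins for illustration only; not datashape's or odo's real types.
Option = namedtuple("Option", "ty")
PrimaryKey = namedtuple("PrimaryKey", "ty")
Column = namedtuple("Column", "name type nullable primary_key")


def identity(x):
    return x


def metafy_for(col):
    # Same shape as the expression in the diff. Conditional expressions are
    # right-associative, so nullable is checked before primary_key.
    return Option if col.nullable else PrimaryKey if col.primary_key else identity


def discover_column(col):
    # Wrap the column's type with whichever wrapper was selected.
    return (col.name, metafy_for(col)(col.type))


plain = discover_column(Column("amount", "float64", False, False))
key = discover_column(Column("id", "int64", False, True))
nullable_key = discover_column(Column("id", "int64", True, True))
```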

llllllllll · 2015-08-05T07:47:21Z

Can we still convert tables that use fkeys into tabular structures like dataframes and ndarrays? It would be cool if we could pass a follow_fkey flag to odo that says to generate a flat structure or the nested structure based on the foreign keys.
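To make the flat versus nested distinction concrete, here is a rough sketch using plain dictionaries. The `materialize` helper, its `follow_fkey` argument, and the sample data are all hypothetical; nothing here is an existing odo parameter or API.

```python
# Hypothetical data and helper; `follow_fkey` here is not an odo parameter.
users = {1: {"id": 1, "name": "Alice"}, 2: {"id": 2, "name": "Bob"}}
orders = [
    {"order_id": 10, "user_id": 1, "amount": 9.5},
    {"order_id": 11, "user_id": 2, "amount": 3.0},
]
# Column name -> parent rows keyed by the parent's primary key.
foreign_keys = {"user_id": users}


def materialize(rows, foreign_keys, follow_fkey=False):
    if not follow_fkey:
        # Flat: keep the raw key values, suitable for a dataframe or ndarray.
        return [dict(r) for r in rows]
    nested_rows = []
    for r in rows:
        nested = dict(r)
        for col, parent in foreign_keys.items():
            # Nested: replace the key value with a copy of the referenced row.
            nested[col] = dict(parent[r[col]])
        nested_rows.append(nested)
    return nested_rows


flat = materialize(orders, foreign_keys)
nested = materialize(orders, foreign_keys, follow_fkey=True)
```

The flat form keeps every column scalar, which is what tabular containers expect; the nested form trades that for direct access to the parent's fields.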

cpcloud · 2015-08-05T12:28:23Z

> Can we still convert tables that use fkeys into tabular structures like dataframes and ndarrays?

Yes.

> It would be cool if we could pass a follow_fkey flag to odo that says to generate a flat structure or the nested structure based on the foreign keys.

I agree, but I'd like to do that in a separate PR.

* Compound primary key
* Multiple foreign keys from different tables form a compound primary key
* Single foreign key reference to one column of a compound primary key
* Foreign key references to all columns of a compound primary key
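As an illustration of the first and last cases in the list, here is a small sketch that uses the standard-library `sqlite3` module rather than odo or SQLAlchemy. The table names and the two helper functions are made up for the example; the `PRAGMA table_info` and `PRAGMA foreign_key_list` statements are SQLite's own introspection commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (a INTEGER, b INTEGER, val TEXT, PRIMARY KEY (a, b));
CREATE TABLE child (
    id INTEGER PRIMARY KEY,
    pa INTEGER,
    pb INTEGER,
    FOREIGN KEY (pa, pb) REFERENCES parent (a, b)
);
""")


def primary_key_columns(conn, table):
    # table_info rows: (cid, name, type, notnull, dflt_value, pk), where pk
    # is 0 for non-key columns and the 1-based key position otherwise.
    info = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    keyed = [(pk, name) for _cid, name, _type, _notnull, _default, pk in info if pk]
    return [name for _pk, name in sorted(keyed)]


def foreign_key_columns(conn, table):
    # foreign_key_list rows start with (id, seq, table, from, to).
    fks = conn.execute("PRAGMA foreign_key_list(%s)" % table).fetchall()
    ordered = sorted(fks, key=lambda r: (r[0], r[1]))
    return [(r[3], r[2], r[4]) for r in ordered]


parent_pk = primary_key_columns(conn, "parent")
child_fks = foreign_key_columns(conn, "child")
```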

cpcloud · 2015-09-04T16:38:22Z

@llllllllll this is now passing, any more thoughts here before merging?

llllllllll · 2015-09-04T16:44:38Z

Nope, merge when ready

mrocklin · 2015-09-04T19:02:13Z

docs/source/sql.rst

+      >>> dshape = 'var * {id: !int64, name: string}'
+      >>> products = resource('sqlite:///db.db::products', dshape=dshape)
+      >>> products.c.id.primary_key
+      True


Primary-ness might be part of the name rather than the dtype? E.g.

{!id: int64, ...}

mrocklin · 2015-09-04T19:18:25Z

Sorry, I started reviewing this but I have enough other things pinging me at about 0.1 hertz. I'll try to take a deeper look at this this weekend.

In the absence of actually looking at things I'll just provide a high level concern (tongue in cheek).

I'm curious if the nature of relationships between data should be kept separate from the shape and dtypes of that data. If it is not necessary to tie these things together then that might be ideal. Can we say that field X in table T is a foreign key to field Y in table S without knowing what the dtypes of those tables are? What does tying this information into the datashape give us? What does it cost us? What would a separate data-relationship thing look like in isolation from datashape?

cpcloud · 2015-09-04T19:30:38Z

> What does tying this information into the datashape give us?

The ability to inspect the columns of the parent table to allow automatic generation of joins, which was the main use case.
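A rough sketch of what generating a join from foreign key metadata could look like, again using the standard-library `sqlite3` module instead of blaze or odo. The tables, the data, and the `join_sql` helper are hypothetical, and the helper assumes the child table has exactly one single-column foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchases (
    purchase_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users (id),
    amount REAL
);
INSERT INTO users VALUES (1, 'Alice');
INSERT INTO users VALUES (2, 'Bob');
INSERT INTO purchases VALUES (100, 1, 9.5);
INSERT INTO purchases VALUES (101, 2, 3.0);
INSERT INTO purchases VALUES (102, 1, 1.25);
""")


def join_sql(conn, child):
    # Read the parent table and the column pair from the schema itself,
    # so the caller never has to spell out the relationship.
    fks = conn.execute("PRAGMA foreign_key_list(%s)" % child).fetchall()
    _id, _seq, parent, child_col, parent_col = fks[0][:5]
    return "SELECT * FROM {c} JOIN {p} ON {c}.{cc} = {p}.{pc}".format(
        c=child, p=parent, cc=child_col, pc=parent_col)


sql = join_sql(conn, "purchases")
joined = conn.execute(sql).fetchall()
```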

mrocklin · 2015-09-04T19:32:36Z

I'm completely on board with the idea of encoding this information. My main question is if it should be integrated into datashape or live in some complementary structure.

cpcloud · 2015-09-04T20:01:38Z

Here are the things that separating these things out would have to satisfy that are currently satisfied by my implementation here:

* Convenient to write down
* Odo can use it to create and infer relationships
* A user of blaze doesn't have to tell blaze what the relationships are and t.column.fields must return the fields of the parent table. Where else is this information encoded besides the dshape?

The last one IMO is hard to satisfy without repeating a ton of information or tying blaze symbols to specific names.

cpcloud · 2015-09-04T20:06:07Z

hm i might be able to cheat a bit and use sa.orm.relationship

mrocklin · 2015-09-04T20:17:28Z

> The last one IMO is hard to satisfy without repeating a ton of information or tying blaze symbols to specific names.

We wouldn't repeat information, we'd use the datashape for this I think? The two would be used in concert. My question isn't "is there something other than datashape that we should use". It's, "is there some way that we can pull out the data-relationships to a separate structure?"

So let's say that we have two tables

A = var * {id: int, name: string}
B = var * {transaction_id: int, user_id: int, amount: float}

And we intend that B.user_id -> A.id.

Given the names that we have above (unfortunate that we don't have them easily in real life) we might want to encode the following:

A -- primary: id
     foreign: []
B -- primary: transaction_id
     foreign: [user_id->A.id]

So in the case where datashapes have identities I think that this problem is relatively easy. We have the datashapes up above, which relate only to the data and the shape. We have this other data-relationship thing below, which describes how they all interact. I suspect that we can use the two of these things in tandem to accomplish any goal that Blaze and Odo would want to accomplish while also keeping datashape somewhat isolated (which is good for projects, like dynd, that only care about data and shapes.) I also find this separation easier to write and reason about although I have a strong bias because this is what came out of my head.

cpcloud · 2015-09-04T20:28:14Z

I'm having trouble seeing on what structure this would sit. I see that for example you could write down what you're saying as

{'A': {'primary': ['id'], 'foreign': {}},
 'B': {'primary': ['transaction_id'], 'foreign': {'user_id': {'A': 'id'}}}}

but where would this thing live? on the datashape? on a blaze symbol?

mrocklin · 2015-09-04T20:39:11Z

In my fantasy world this is a peer to datashape. It would be passed around just as datashapes are passed around now. What's missing for this is that datashapes don't have identities (we're missing the terms A and B.)
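One way to picture the "peer to datashape" idea is sketched below, purely as an assumption about how it might be arranged. A plain dictionary supplies the missing identities (A and B), the relationship structure from earlier in the thread sits beside it, and a small check uses the two together. The `field_names` and `check` helpers are hypothetical, and the string parsing is a naive stand-in for a real datashape parser.

```python
# Named datashapes: the keys supply the identities that datashapes lack.
shapes = {
    "A": "var * {id: int, name: string}",
    "B": "var * {transaction_id: int, user_id: int, amount: float}",
}

# The relationship structure from the thread, kept separate from the shapes.
relationships = {
    "A": {"primary": ["id"], "foreign": {}},
    "B": {"primary": ["transaction_id"], "foreign": {"user_id": {"A": "id"}}},
}


def field_names(dshape):
    # Naive parse of a flat record string such as "var * {x: t, y: u}".
    body = dshape[dshape.index("{") + 1:dshape.rindex("}")]
    return [part.split(":")[0].strip() for part in body.split(",")]


def check(shapes, relationships):
    # Report every key column that the shapes do not actually contain.
    problems = []
    for table, rel in relationships.items():
        fields = field_names(shapes[table])
        for col in rel["primary"] + list(rel["foreign"]):
            if col not in fields:
                problems.append("%s.%s is not a field" % (table, col))
        for col, target in rel["foreign"].items():
            for parent, parent_col in target.items():
                if parent not in shapes or parent_col not in field_names(shapes[parent]):
                    problems.append("%s.%s -> %s.%s is unresolved"
                                    % (table, col, parent, parent_col))
    return problems


problems = check(shapes, relationships)
```

Here the shapes never mention keys and the relationships never mention dtypes; only the check needs both.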

cpcloud · 2015-09-04T21:41:23Z

@mrocklin thanks for the input ... merging on pass

Add primary and foreign key support

Add foreign key support in odo

6db15cb

cpcloud mentioned this pull request Aug 2, 2015

Foreign key support blaze/blaze#1192

Merged

cpcloud self-assigned this Aug 2, 2015

cpcloud added this to the 0.3.4 milestone Aug 2, 2015

cpcloud added the enhancement label Aug 2, 2015

Merge branch 'master' of github.com:ContinuumIO/odo into foreign-keys

d611ae9

llllllllll reviewed Aug 5, 2015

cpcloud added 16 commits August 7, 2015 11:45

Remove the system check

9a44f2f

optionify -> metafy

f37985b

Merge branch 'master' of github.com:ContinuumIO/odo into foreign-keys

a6b0657

Syntax change

9c7d8d2

Different foreign key syntax

b5af8a8

Add test for recursive relationships

2f6d0bf

Test that we can create rescursive relationships

6cae55d

Clean up recursive discovery of foreign keys

67706fd

Merge branch 'master' of github.com:blaze/odo into foreign-keys

e5a5da8

Make sure our table exists and raise an error if it does not

04c14f6

Merge branch 'master' of github.com:blaze/odo into foreign-keys

b6771f3

Merge branch 'master' of github.com:blaze/odo into foreign-keys

53d4f36

Merge branch 'master' of github.com:blaze/odo into foreign-keys

59f89c3

Better assertion for foreign key discovery test

d17a716

Deal with foreign keys that may themselves be primary keys

aed0392

cpcloud changed the title ~~WIP: Add foreign key support in odo~~ Add foreign key support in odo Sep 4, 2015

cpcloud changed the title ~~Add foreign key support in odo~~ Add primary and foreign key support Sep 4, 2015

Experimental feature doc

a4e6649

cpcloud added 5 commits September 4, 2015 11:20

Add documentation

c269448

Small doc bug

5fcc621

Clarify docs a bit

b8ac4a0

Indent a bit

a79eca3

Python 2.6 has no OrderedDict collection

84ba63f

Clarify doc

ad60333

cpcloud added 5 commits September 4, 2015 12:55

Add docs on choice not to enforce specific database rules

57a1984

Grammar-o

0432b34

Clarify

abd7ac2

Cleanup

a938037

Clarify

d0bbdd0

mrocklin reviewed Sep 4, 2015

cpcloud added 2 commits September 4, 2015 17:23

Remove primary key syntax

8dfed57

Change to primary key because there can be only one

3fc4321

cpcloud added a commit that referenced this pull request Sep 4, 2015

Merge pull request #274 from cpcloud/foreign-keys

11d7521

Add primary and foreign key support

cpcloud merged commit 11d7521 into blaze:master Sep 4, 2015

cpcloud deleted the foreign-keys branch September 4, 2015 21:47
