Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for table copies #22

Closed
jberkus opened this issue May 4, 2014 · 11 comments
Closed

Support for table copies #22

jberkus opened this issue May 4, 2014 · 11 comments

Comments

@jberkus
Copy link

jberkus commented May 4, 2014

Supporting incremental insert/update/deletes will take quite a bit of work. However, another less difficult enhancement will make cstore_fdw 300% more useful right now, which would be support for bulk inserts, specifically:

 INSERT INTO cstore_table SELECT columns FROM regular_table;

As a shortcut, you could internally route this through the existing COPY code on the receiving end. It would be deadly slow for single-row inserts, but then people doing single-row inserts into a cstore are mistaken anyway.

@ozgune
Copy link

ozgune commented May 6, 2014

One thing we need to pay attention to on our end is that each call to COPY creates a separate stripe. We'd need to implement this in a way that integrates nicely with the Insert APIs for fdws. I see two alternatives here:

1/ We implement BeginForeignModify / ExecForeignInsert / EndForeignModify proper. One thing we haven't yet discussed is the transactional semantics. If the user is inserting multiple entries and the operation fails, how do we roll it back? This requires some thought.

2/ We use foreign table inheritance (coming in 9.4?) and Create Table cstore_table_child1 As (Select columns From regular_table);. We already intercept the utility hook for the Copy command and the Create Table As uses the same machinery. So this should be relatively easy to integrate.

The question is, does option 2/ provide any benefit to the user? @jberkus, what do you think?

@jberkus
Copy link
Author

jberkus commented May 6, 2014

I don't understand option (2).

The reason I filed this issue is that the primary use of cstore_fdw for me today involves transferring large blocks of data from a Postgres table to a cstore table. Currently, I have to dump that data to csv and re-import it using COPY to get it into the cstore table. If Postgres had a COPY table TO other_table, then this wouldn't be a limitation.

BTW, if COPY fails in the middle, it's supposed to roll back too.

@jberkus
Copy link
Author

jberkus commented May 6, 2014

BTW, if we need to add a new stripe every time anyone does a bulk load, we're going to need a utility to rebuild cstore files eventually.

@ozgune
Copy link

ozgune commented May 6, 2014

COPY rolls back if it fails in the middle. If you explicitly start a transaction however, issue COPY and a second COPY, and the second COPY fails in the middle, today, the first COPY still goes through. That's something we need to fix.

The Begin / Insert / End APIs for foreign data wrappers make transactional semantics more explicit, that's why we figured we'd think about them when we got to it. Once we define the semantics for Insert here, we'd then use them for Update / Delete too.

The second option I mentioned involves the following:

a/ The user creates a parent cstore table.
b/ The user creates a child cstore table using the upcoming (?) foreign table inheritance feature. When doing this, the user further uses Create Table cstore_table_child1 As (Select columns From regular table);

The difference here is that cstore_fdw can see the Create Table As command as part of the utility hook that it intercepted (so it's similar to Copy in that sense). We then need to direct the read tuples in a manner that's similar to what we do with Copy.

This naturally isn't as nice as implementing fdw's Begin / Insert / End. I'm just trying to explore our option space here.

@pykello
Copy link
Contributor

pykello commented Jun 19, 2014

Just wanted to note that as an interim solution until we implement "INSERT INTO ... SELECT ...", users can use unix pipes to achieve a similar thing:

psql -d postgres -c "COPY (SELECT * FROM lineitem) TO STDOUT" | psql -d postgres -c "COPY lineitem_cstore FROM STDIN"

@ozgune
Copy link

ozgune commented Aug 15, 2014

As we had a lot of questions about this feature in the past two months, I'm posting a quick update here.

It looks like the best way to implement this feature is through PostgreSQL FDW Insert APIs. One decision we deferred making early on was whether or not to provide transactional semantics. The Insert APIs provide transaction related info, but getting transactions right with foreign tables is significantly harder.

What we heard from the users that we talked to was that they needed the "INSERT INTO .. SELECT" feature much more than they needed transactions.

If you're a user who is interested in this thread, and have comments about transactions, please feel free to share them with us. We are compiling feedback and we'll make a decision on this next month.

@jberkus
Copy link
Author

jberkus commented Aug 15, 2014

I need this feature, like, yesterday.

While I would like it to be transaction-safe, that's pretty far down my list of priorities. And the issue of transaction-safety for cstore_fdw is a fairly thorny one anyway; there doesn't seem to be any good way to implement it. You'd need to make a full file copy every time you wanted to start a transaction.

@srobertson
Copy link

+1 need table copy is more important than transactional symantics

@pykello
Copy link
Contributor

pykello commented Dec 10, 2014

Support for "INSERT INTO cstore_table SELECT ..." was added to the develop branch. We will merge it into master branch in 2 months.

@jberkus
Copy link
Author

jberkus commented Dec 10, 2014

Wow, awesome! For me this is what makes cstore useful in production.

@pykello
Copy link
Contributor

pykello commented Mar 12, 2015

Merged into master. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants