Add support for \copy command via script/trigger #82
Gets the COPY function and script in for initial review.
options="${options}, HEADER true"
;;
h)
usage
Just realized I didn't write a usage function yet. Fixing in the afternoon.
High-level description of the approach: So the only type of statement that could work is: Once the function has been generated for the given table, it is installed as a trigger on a temporary table exactly like the target table. Then any |
Is |
It's part of the |
SELECT relname INTO STRICT table_name FROM pg_class WHERE oid = relation;
temp_table_name = format('%s_copy_facade', table_name);
SELECT array_agg(attname) INTO STRICT attr_names FROM pg_attribute WHERE attrelid = relation AND |
Does this mean we only support COPY of all attributes?
This means the INSERT trigger will generate a statement with all user-visible attributes fully specified. If the overarching COPY does not specify all columns, one of two things could happen:
- If NEW.missing_col is provided as NULL, we'd explicitly include a NULL in the INSERT. This could be wrong if there is a default value specified.
- If NEW.missing_col already replaces unspecified columns with their default values, the trigger would just explicitly specify the default value, which would be correct.
The copy shell script presently doesn't allow omitting columns: it's expected the CSV or whatnot has the same dimensions as the table. So this is kind of a moot point… I'll check about the NULL vs default value distinction.
We don't frequently write this heavily in PL/pgSQL, so here is a brief primer of features used to help in review:
Based on review discussions.
END;
$cti$ LANGUAGE plpgsql VOLATILE;
$$;
table_tmpl CONSTANT text := 'CREATE TEMPORARY TABLE %I (LIKE %s)';
Why does this statement use quotes instead of $$? Could we be consistent?
I guess I had a style that I used normal quotes if it fit on one line and $$ for multi-line things. Kind of like how you wouldn’t use a here doc for every string declaration in a shell script…
—Jason
On Mar 10, 2015, at 3:29 PM, sumedhpathak notifications@github.com wrote:
In pg_shard--1.1.sql #82 (comment):
temp_table_name text;
attr_names text[];
attr_list text;
param_list text;
using_list text;
insert_command text;
func_tmpl CONSTANT text := $$ CREATE FUNCTION pg_temp.copy_to_insert() RETURNS trigger
AS $cti$
BEGIN
EXECUTE %L USING %s;
PERFORM nextval(%L);
RETURN NULL;
END;
$cti$ LANGUAGE plpgsql VOLATILE;
$$;
table_tmpl CONSTANT text := 'CREATE TEMPORARY TABLE %I (LIKE %s)';
Why does this statement use quotes instead of $$? Could we be consistent?
Changing type of quotes, expanding 'trg_tmpl'.
5e2498c to 39467e5
Makes it more standalone for direct use by users.
Now the shell script can handle counting successfully-INSERTed rows itself. It just installs a trigger _after_ the proxy one and enables write through. This allows it to see the successful rows, which it counts and discards to prevent any writes to the temp table.
Without this, we insert NULL in missing columns. With this, unspecified columns will have the correct default values.
Looks nicer on the command line without suffix, and other PostgreSQL extensions use a bin folder for scripts, so we'll do the same.
PGXS installs scripts into the PostgreSQL install's bin directory.
Tests INSERT/COPY functionality on proxy table, once with writethrough off (default) and once with it on.
Oops, meant to do this long ago.
Changed suffix used by proxy generation. Should we use COPY instead of \copy to permit interpolation of colon variables?
Gives better error messages and statuses.
Using -f prohibits the use of pipes, devices, or sockets. While the latter two might be of questionable use, prohibiting the first prevents users from using named pipes or process substitutions, which could be very useful in certain instances.
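To make the difference concrete, here is a minimal sketch comparing a `-f` check with a more permissive one. The helper name `is_usable_input` and the file names are hypothetical and not taken from the script in this PR; the sketch only assumes that `mkfifo` and `mktemp` are available.

```shell
# Hypothetical helper (not from the PR): accept any readable path that is
# not a directory, so regular files, named pipes, and devices all pass.
is_usable_input() {
    [ -r "$1" ] && [ ! -d "$1" ]
}

tmpdir=$(mktemp -d)
: > "${tmpdir}/rows.csv"          # a regular file
mkfifo "${tmpdir}/rows.fifo"      # a named pipe

for path in "${tmpdir}/rows.csv" "${tmpdir}/rows.fifo"; do
    # -f is true only for regular files, so it rejects the named pipe.
    if [ -f "${path}" ]; then regular=yes; else regular=no; fi
    if is_usable_input "${path}"; then usable=yes; else usable=no; fi
    echo "$(basename "${path}"): regular=${regular} usable=${usable}"
done

rm -rf "${tmpdir}"
```

The named pipe fails the `-f` test but passes the readable-and-not-a-directory test, which is the case the commit message is concerned with.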
Default: 'public'
E_O_USAGE

exit $1;
Minor. Should the order of arguments be inverted? (Output device first, followed by exit status?)
I initially just had one argument (status) but realized that commands I consider "well-behaved" output to stdout (and exit 0) when you ask them for help (i.e. pass -h) but output to stderr (with non-zero status) when you make some usage mistake. So that necessitated adding a second argument.
I agree that reversing the order makes more sense.
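A rough sketch of what the reordered function could look like, with the output descriptor first and the exit status second. The constant names, the placeholder script name `copy_script`, and the status value 64 are assumptions for illustration; they are not taken from the actual diff.

```shell
# Hypothetical constants to make call sites self-explanatory.
STDOUT_FD=1
STDERR_FD=2

# usage FD STATUS: print the help text to file descriptor FD, then exit
# with STATUS. The help text here is a placeholder, not the real one.
usage() {
    local fd=$1 status=$2
    cat >&"${fd}" <<'E_O_USAGE'
usage: copy_script [-h] filename tablename
E_O_USAGE
    exit "${status}"
}

# Run in subshells so the demonstration itself does not exit.
( usage "${STDOUT_FD}" 0 )                              # help requested
( usage "${STDERR_FD}" 64 ) 2>/dev/null || echo "usage error: status $?"
```

With this ordering, `-h` would call `usage "${STDOUT_FD}" 0`, while a malformed invocation would call it with `${STDERR_FD}` and a non-zero status.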
I had three higher-level things:
On each of your high-level points:
Since you didn't mention the naming, I'm assuming you're fine with the names of:
I'll make a checklist at the top of this PR to keep track of the things I need to do and when they're done it sounds like this is a . I had one concern: because the UDF returns the proxy table name, I should be able to store it in a variable and interpolate it, or at least I would were I using
This is why I'm directly interpolating the table name myself in the shell script, which I don't like. Though
So |
@jasonmp85 I am OK with the names of both the script and the UDF.
Decided it was easiest to understand. Added test for sequence functionality.
Lets us check partial failure modes of copy workflow.
Kills off trigger complication.
Casting to regclass requires that the input string already be properly quoted as an identifier. If there is a space or quote in the input to the cast, it fails. Explicitly quoting the identifier fixes this, and I added a test to verify the fix.
Safer to let the PostgreSQL-aware environment do our quoting and do as little quoting ourselves as possible.
Reordered arguments and defined some constants to clarify calls.
Add support for \copy command via script/trigger cr: @sumedhpathak
The script provides an easy way for users to COPY to a distributed table. It accepts a filename and table name, prepares the table for COPY, performs the COPY, then outputs the number of rows copied. Flags are supported to enable various COPY OPTIONS in the underlying SQL statement.
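As an illustration of how flags might map onto COPY options, here is a small `getopts` sketch. The flag letters `-C` and `-H` and the function name `build_copy_options` are hypothetical; only the `HEADER true` option text appears in the diff under review.

```shell
# Hypothetical sketch: translate command-line flags into a COPY options
# list. Prints the list on stdout; returns 64 on an unknown flag.
build_copy_options() {
    local format header opt
    format='text'
    header=''
    OPTIND=1                      # reset so the function can be called again
    while getopts ':CH' opt; do
        case "${opt}" in
            C) format='csv' ;;
            H) header=', HEADER true' ;;
            *) echo "unknown flag: -${OPTARG}" >&2; return 64 ;;
        esac
    done
    echo "FORMAT ${format}${header}"
}

build_copy_options -C -H          # prints: FORMAT csv, HEADER true
```

The resulting string could then be placed inside the `WITH (...)` clause of the COPY statement the script issues.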
This branch still needs Makefile changes, better documentation, and a unit test, but I wanted to get the code out there to kick off a review. I'll be pushing the remaining changes here as they come up.
Fixes #61
Code Review Tasks
- COPY test with partial failure
- psql variables)
- usage shell function