Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INSERT INTO SELECT does not throw an error / warning when some inserts fail #12218

Open
proddata opened this issue Mar 15, 2022 · 6 comments
Open

Comments

@proddata
Copy link
Member

proddata commented Mar 15, 2022

CrateDB version

4.6.8 / 4.7.0

Steps to Reproduce

CREATE TABLE insert_into_source ( ts TIMESTAMP, ts_g TIMESTAMP);
CREATE TABLE insert_into_sink ( ts TIMESTAMP, ts_g TIMESTAMP GENERATED ALWAYS AS date_trunc('minute',ts));

INSERT INTO insert_into_source VALUES (now(),date_trunc('minute',now())+1), (now(),date_trunc('minute',now()));
--INSERT OK, 2 records affected (0.016 seconds)

INSERT INTO insert_into_sink (ts,ts_g) SELECT ts, ts_g FROM insert_into_source;
--INSERT OK, 1 record affected (0.011 seconds)

Expected Result

Statement fails with

Failed to execute upsert shardId=[insert_into_sink][0] id=XOARjn8B6RWc_1mx5EhS
error=java.lang.IllegalArgumentException:
Given value null for generated column ts_g does not match calculation date_trunc('minute', ts) = 1647281100000

Actual Result

no error, just
INSERT OK, 1 record affected (0.011 seconds)
a lower number than expected


Update

This might not be a bug after all, however the situation could be maybe improved to give the user some way to check if an INSERT INTO SELECT partially failed.

INSERT INTO insert_into_sink VALUES (now(),date_trunc('minute',now())+1), (now(),date_trunc('minute',now()));

returns

Error!

SQLParseException[Given value 1647358980001 for generated column ts_g does not match calculation date_trunc('minute', ts) = 1647358980000]

COPY FROM has the optionRETURN SUMMARY to see how many rows failed to be copied and now also the fail_fast option to stop copy operations in best effort manner when insert error occur.

Possible solution

  • Add setting to allow partial inserts/failures
@mfussenegger
Copy link
Member

Discussion outcome: Could add a session setting that allows partial failures and causes a "fail fast" behavior (first error encountered gets propagated)

@BaurzhanSakhariev
Copy link
Contributor

BaurzhanSakhariev commented Sep 29, 2023

Starting from version 5.5, UPDATE statements will continue on row failures instead of always showing an error.
The affected rows are displayed by the resulting row count. Related PR

Session setting that allows partial failures and causes a "fail fast" behavior (first error encountered gets propagated)

The new session setting should behave in a similar way for UPDATE statements.

@hlcianfagna
Copy link
Contributor

One possible option to give back a warning could be by means of a NoticeResponse with severity WARNING see https://www.postgresql.org/docs/current/protocol-flow.html and https://www.postgresql.org/docs/current/protocol-error-fields.html

@mfussenegger
Copy link
Member

mfussenegger commented Mar 13, 2024

We had another discussion about this and concluded that the NoticeResponse would be the nicer option.

Partly because of bulk request handling where individual "batch" error reporting would work via HTTP, but is problematic for PostgreSQL. PG/DB-API clients typically only expose a row-count array or failure, not one failure per batch.

The other main advantage of a NoticeResponse is that it can be active by default, so users can always get the additional info.

It's more effort to implement, as it also requires a HTTP response format addition and updates in crash and the admin-ui to show the warnings. It won't make it for 5.7. We can target this for 5.8 instead but that's TBD

We could (later) also additionally add a fail_fast flag, similar to how it exists for COPY FROM (where error reporting currently works via the RETURN SUMMARY). It wouldn't change the error reporting which would still happen via the notice response, but instead cause it to short-circuit on errors.

@BaurzhanSakhariev
Copy link
Contributor

Relates to #8312 and #8949

@BaurzhanSakhariev BaurzhanSakhariev added the needs planning Needs a planning session label Mar 19, 2024
@BaurzhanSakhariev
Copy link
Contributor

Main user facing problem:

Confusion if some rows are not inserted (for example because of constraint violation).
Current behaviour: Errors are swallowed, number of successfully inserted rows is returned.

Listing all possible options to change behaviour from the user perspective to get additional feedback:

  1. Number of inserted rows is returned, errors aren’t swallowed but redirected to sys.operations

    To lookup the errors, users need some identifier.

  2. Number of inserted rows is returned and errors are emitted as warning (see comment above for details)

  3. Number of inserted rows is returned. In case of error, show first error and short-circuit the operation (number of inserts is discarded/unknown to a user). Short circuit would be a breaking change unless it’s behind a flag/setting.

  4. Number of inserted rows is returned. In case of error, show first error AND number of inserted rows and short-circuit the operation. Short circuit would be a breaking change unless it’s behind a flag/setting.

  5. Introduce optional RETURN SUMMARY clause for UPDATE/DELETE/INSERT in addition to RETURNING.
    Can’t use both RETURN SUMMARY and RETURNING at the same time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Candidates
Development

Successfully merging a pull request may close this issue.

5 participants