Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to copy data between tables using the syntax: INSERT INTO test2 (SELECT * FROM test) #15837

Closed
jphjsoares opened this issue Apr 12, 2024 · 5 comments

Comments

@jphjsoares
Copy link

CrateDB version

4.6

CrateDB setup information

Number of nodes: 1

Problem description

No response

Steps to Reproduce

  1. Have 2 tables with the following schemas:
CREATE TABLE IF NOT EXISTS "my_schema"."test" (
   "test_int" INTEGER,
   "test_string" TEXT,
   "timestamp" TIMESTAMP WITH TIME ZONE,
   "edate" TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('week', "timestamp")
)
CLUSTERED INTO 4 SHARDS
PARTITIONED BY ("timestamp")
WITH (
   "allocation.max_retries" = 5,
   "blocks.metadata" = false,
   "blocks.read" = false,
   "blocks.read_only" = false,
   "blocks.read_only_allow_delete" = false,
   "blocks.write" = false,
   codec = 'default',
   column_policy = 'strict',
   "mapping.total_fields.limit" = 1000,
   max_ngram_diff = 1,
   max_shingle_diff = 3,
   number_of_replicas = '0-1',
   "routing.allocation.enable" = 'all',
   "routing.allocation.total_shards_per_node" = -1,
   "store.type" = 'fs',
   "translog.durability" = 'REQUEST',
   "translog.flush_threshold_size" = 536870912,
   "translog.sync_interval" = 5000,
   "unassigned.node_left.delayed_timeout" = 60000,
   "write.wait_for_active_shards" = '1'
)

CREATE TABLE IF NOT EXISTS "my_schema"."test2" (
   "test_int" INTEGER,
   "test_string" TEXT,
   "timestamp" TIMESTAMP WITH TIME ZONE,
   "edate" TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('week', "timestamp")
)
CLUSTERED INTO 1 SHARDS
PARTITIONED BY ("timestamp")
WITH (
   "allocation.max_retries" = 5,
   "blocks.metadata" = false,
   "blocks.read" = false,
   "blocks.read_only" = false,
   "blocks.read_only_allow_delete" = false,
   "blocks.write" = false,
   codec = 'default',
   column_policy = 'strict',
   "mapping.total_fields.limit" = 1000,
   max_ngram_diff = 1,
   max_shingle_diff = 3,
   number_of_replicas = '0-1',
   "routing.allocation.enable" = 'all',
   "routing.allocation.total_shards_per_node" = -1,
   "store.type" = 'fs',
   "translog.durability" = 'REQUEST',
   "translog.flush_threshold_size" = 536870912,
   "translog.sync_interval" = 5000,
   "unassigned.node_left.delayed_timeout" = 60000,
   "write.wait_for_active_shards" = '1'
)
  1. Populate the test table with some data
  2. Try to copy data from the test table to test2 using: INSERT INTO my_schema.table2 (SELECT * FROM my_schema.table1)

Actual Result

No error will be returned, the only returned message will be something like: INSERT OK, 0 records affected (5.012 seconds). While the query took 5 seconds, no actual data is copied.

When checking the tables view in CrateDB web platform, the number of shards present in test2 is the same as the ones present in test. But the total records is 0.

Here is the test table:
Screenshot 2024-04-12 at 17 34 01

Here is the test2 table, showing shards but 0 total records:
Screenshot 2024-04-12 at 17 34 14

Expected Result

Since both tables have the same schema, only differing in the number of shards, data should be copied without a problem.

@jphjsoares jphjsoares added the triage An issue that needs to be triaged by a maintainer label Apr 12, 2024
@hlcianfagna
Copy link
Contributor

Hi,
4.6 is a bit dated, a lot of bugs have been fixed since, but I would suggest trying without the generated column:

INSERT INTO my_schema.table2 ("test_int","test_string","timestamp")
SELECT "test_int","test_string","timestamp"
FROM my_schema.table1

@mfussenegger mfussenegger added needs info or feedback and removed triage An issue that needs to be triaged by a maintainer labels Apr 15, 2024
@jphjsoares
Copy link
Author

Hey! I had already tried your suggestion and it also doesn't work.

@proddata
Copy link
Member

How did you get data into "my_schema"."test" ?
With old CrateDB versions CrateDB didn't fully validate COPY FROM imports in some cases.
However without any example data to replicate this, I rather tend to close this, as there is nothing to do here.


There is an issue tracking feedback with failing INSERT INTO SELECT already #12218

@jphjsoares
Copy link
Author

jphjsoares commented Apr 15, 2024

The data inserted into the test table was generated by a script, just for testing purposes. This is it:

$startTimestamp = Get-Date "2024-03-31T13:59:45"
$timestamps = @()

for ($i = 0; $i -lt 50; $i++) {
    $timestamp = $startTimestamp.AddDays(-$i)
    $timestamps += $timestamp.ToString("yyyy-MM-ddTHH:mm:ss")
}

$query = "INSERT INTO my_schema.test (timestamp) VALUES "

foreach ($timestamp in $timestamps) {
    $query += "('$timestamp'),"
}

# Remove the trailing comma
$query = $query.TrimEnd(",")
$query

Which results in a query similar to: INSERT INTO my_schema.test (timestamp) VALUES ('2024-03-31T13:59:45'),('2024-03-30T13:59:45'), ... which I ran directly in the web console

@proddata
Copy link
Member

I just saw that you PARTITIONED BY ("timestamp") which isn't a generated field and would lead to CrateDB initialising 50 partitions with a single query.


I quickly played around with older versions and could validate that form 4.6 - 5.2 the following query fails:

CREATE TABLE IF NOT EXISTS "my_schema"."test" (
       "test_int" INTEGER,
       "test_string" TEXT,
       "timestamp" TIMESTAMP WITH TIME ZONE,
       "edate" TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('week', "timestamp")
    )
    CLUSTERED INTO 4 SHARDS
    PARTITIONED BY ("timestamp");


INSERT INTO my_schema.test (test_int,test_string,"timestamp") SELECT 1,'a',x FROM generate_series(now(),now()+'50 days'::INTERVAL,'1 day'::INTERVAL) g(x);

Partitions are created, but not records are inserted.
This is fixed with CrateDB 5.3.4 and newer. I think this is related to:

Fixed an issue which prevented INSERT INTO ... SELECT ... from inserting any records if the target table had a partitioned column of a non-string type, used in any expressions of GENERATED or CHECK definitions.

Changing your table definition to

CREATE TABLE IF NOT EXISTS "my_schema"."test" (
       "test_int" INTEGER,
       "test_string" TEXT,
       "timestamp" TIMESTAMP WITH TIME ZONE,
       "edate" TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('week', "timestamp")
    )
    CLUSTERED INTO 4 SHARDS
    PARTITIONED BY ("edate");

and partitioning by "edate" also works with older versions as this avoids a "partitioned column of a non-string type ("timestamp"), used in any expressions of GENERATED ("edate")"


Considering this issue as identified and fixed I will close this issue.

@proddata proddata closed this as not planned Won't fix, can't repro, duplicate, stale Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants