/
dml.rst
640 lines (466 loc) · 19.4 KB
/
dml.rst
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
.. highlight:: psql
.. _dml:
=================
Data manipulation
=================
This section provides an overview of how to manipulate data (e.g., inserting
rows) with CrateDB.
.. SEEALSO::
:ref:`General use: Data definition <ddl>`
:ref:`General use: Querying <dql>`
.. rubric:: Table of contents
.. contents::
:local:
.. _dml-inserting-data:
Inserting data
==============
Inserting data to CrateDB is done by using the SQL ``INSERT`` statement.
.. NOTE::
The column list is always ordered based on the column position in the
:ref:`sql-create-table` statement of the table. If the insert columns are
omitted, the values in the ``VALUES`` clauses must correspond to the table
columns in that order.
Inserting a row::
cr> insert into locations (id, date, description, kind, name, position)
... values (
... '14',
... '2013-09-12T21:43:59.000Z',
... 'Blagulon Kappa is the planet to which the police are native.',
... 'Planet',
... 'Blagulon Kappa',
... 7
... );
INSERT OK, 1 row affected (... sec)
When inserting a single row, if an error occurs an error is returned as a
response.
Inserting multiple rows at once (aka. bulk insert) can be done by defining
multiple values for the ``INSERT`` statement::
cr> insert into locations (id, date, description, kind, name, position) values
... (
... '16',
... '2013-09-14T21:43:59.000Z',
... 'Blagulon Kappa II is the planet to which the police are native.',
... 'Planet',
... 'Blagulon Kappa II',
... 19
... ),
... (
... '17',
... '2013-09-13T16:43:59.000Z',
... 'Brontitall is a planet with a warm, rich atmosphere and no mountains.',
... 'Planet',
... 'Brontitall',
... 10
... );
INSERT OK, 2 rows affected (... sec)
When inserting multiple rows, if an error occurs for some of these rows there
is no error returned but instead the number of rows affected would be decreased
by the number of rows that failed to be inserted.
When inserting into tables containing :ref:`sql-create-table-generated-columns`
or :ref:`sql-create-table-base-columns` having the
:ref:`sql-create-table-default-clause` specified, their values can be safely
omitted. They are *generated* upon insert:
::
cr> CREATE TABLE debit_card (
... owner text,
... num_part1 integer,
... num_part2 integer,
... check_sum integer GENERATED ALWAYS AS ((num_part1 + num_part2) * 42),
... "user" text DEFAULT 'crate'
... );
CREATE OK, 1 row affected (... sec)
::
cr> insert into debit_card (owner, num_part1, num_part2) values
... ('Zaphod Beeblebrox', 1234, 5678);
INSERT OK, 1 row affected (... sec)
.. Hidden: refresh debit_card
cr> refresh table debit_card
REFRESH OK, 1 row affected (... sec)
::
cr> select * from debit_card;
+-------------------+-----------+-----------+-----------+-------+
| owner | num_part1 | num_part2 | check_sum | user |
+-------------------+-----------+-----------+-----------+-------+
| Zaphod Beeblebrox | 1234 | 5678 | 290304 | crate |
+-------------------+-----------+-----------+-----------+-------+
SELECT 1 row in set (... sec)
For :ref:`sql-create-table-generated-columns`, if the value is given, it is
validated against the *generation clause* of the column and the currently
inserted row::
cr> insert into debit_card (owner, num_part1, num_part2, check_sum) values
... ('Arthur Dent', 9876, 5432, 642935);
SQLParseException[Given value 642935 for generated column check_sum does not match calculation ((num_part1 + num_part2) * 42) = 642936]
.. _dml-inserting-by-query:
Inserting data by query
-----------------------
.. Hidden: refresh locations
cr> refresh table locations
REFRESH OK, 1 row affected (... sec)
It is possible to insert data using a query instead of values. Column data
types of source and target table can differ as long as the values are castable.
This gives the opportunity to restructure the tables data, renaming a field,
changing a field's data type or convert a normal table into a partitioned one.
Example of changing a field's data type, in this case, changing the
``position`` data type from ``integer`` to ``smallint``::
cr> create table locations2 (
... id text primary key,
... name text,
... date timestamp with time zone,
... kind text,
... position smallint,
... description text
... ) clustered by (id) into 2 shards with (number_of_replicas = 0);
CREATE OK, 1 row affected (... sec)
::
cr> insert into locations2 (id, name, date, kind, position, description)
... (
... select id, name, date, kind, position, description
... from locations
... where position < 10
... );
INSERT OK, 14 rows affected (... sec)
.. Hidden: drop previously created table
cr> drop table locations2
DROP OK, 1 row affected (... sec)
Example of creating a new partitioned table out of the ``locations`` table with
data partitioned by year::
cr> create table locations_parted (
... id text primary key,
... name text,
... year text primary key,
... date timestamp with time zone,
... kind text,
... position integer
... ) clustered by (id) into 2 shards
... partitioned by (year) with (number_of_replicas = 0);
CREATE OK, 1 row affected (... sec)
::
cr> insert into locations_parted (id, name, year, date, kind, position)
... (
... select
... id,
... name,
... date_format('%Y', date),
... date,
... kind,
... position
... from locations
... );
INSERT OK, 16 rows affected (... sec)
Resulting partitions of the last insert by query::
cr> select table_name, partition_ident, values, number_of_shards, number_of_replicas
... from information_schema.table_partitions
... where table_name = 'locations_parted'
... order by partition_ident;
+------------------+-----------------+------------------+------------------+--------------------+
| table_name | partition_ident | values | number_of_shards | number_of_replicas |
+------------------+-----------------+------------------+------------------+--------------------+
| locations_parted | 042j2e9n74 | {"year": "1979"} | 2 | 0 |
| locations_parted | 042j4c1h6c | {"year": "2013"} | 2 | 0 |
+------------------+-----------------+------------------+------------------+--------------------+
SELECT 2 rows in set (... sec)
.. Hidden: drop previously created table
cr> drop table locations_parted;
DROP OK, 1 row affected (... sec)
.. NOTE::
``limit``, ``offset`` and ``order by`` are not supported inside the query
statement.
.. _dml-inserting-upserts:
Upserts (``ON CONFLICT DO UPDATE SET``)
---------------------------------------
The ``ON CONFLICT DO UPDATE SET`` clause is used to update the existing row if
inserting is not possible because of a duplicate-key conflict if a document
with the same ``PRIMARY KEY`` already exists. This is type of operation is
commonly referred to as an *upsert*, being a combination of "update" and
"insert".
::
cr> SELECT
... name,
... visits,
... extract(year from last_visit) AS last_visit
... FROM uservisits ORDER BY NAME;
+----------+--------+------------+
| name | visits | last_visit |
+----------+--------+------------+
| Ford | 1 | 2013 |
| Trillian | 3 | 2013 |
+----------+--------+------------+
SELECT 2 rows in set (... sec)
::
cr> INSERT INTO uservisits (id, name, visits, last_visit) VALUES
... (
... 0,
... 'Ford',
... 1,
... '2015-01-12'
... ) ON CONFLICT (id) DO UPDATE SET
... visits = visits + 1;
INSERT OK, 1 row affected (... sec)
.. Hidden: refresh uservisits
cr> REFRESH TABLE uservisits
REFRESH OK, 1 row affected (... sec)
::
cr> SELECT
... name,
... visits,
... extract(year from last_visit) AS last_visit
... FROM uservisits WHERE id = 0;
+------+--------+------------+
| name | visits | last_visit |
+------+--------+------------+
| Ford | 2 | 2013 |
+------+--------+------------+
SELECT 1 row in set (... sec)
It's possible to refer to values which would be inserted if no duplicate-key
conflict occurred, by using the special ``excluded`` table. This table is
especially useful in multiple-row inserts, to refer to the current rows
values::
cr> INSERT INTO uservisits (id, name, visits, last_visit) VALUES
... (
... 0,
... 'Ford',
... 2,
... '2016-01-13'
... ),
... (
... 1,
... 'Trillian',
... 5,
... '2016-01-15'
... ) ON CONFLICT (id) DO UPDATE SET
... visits = visits + excluded.visits,
... last_visit = excluded.last_visit;
INSERT OK, 2 rows affected (... sec)
.. Hidden: refresh uservisits
cr> REFRESH TABLE uservisits
REFRESH OK, 1 row affected (... sec)
::
cr> SELECT
... name,
... visits,
... extract(year from last_visit) AS last_visit
... FROM uservisits ORDER BY name;
+----------+--------+------------+
| name | visits | last_visit |
+----------+--------+------------+
| Ford | 4 | 2016 |
| Trillian | 8 | 2016 |
+----------+--------+------------+
SELECT 2 rows in set (... sec)
This can also be done when using a query instead of values::
cr> CREATE TABLE uservisits2 (
... id integer primary key,
... name text,
... visits integer,
... last_visit timestamp with time zone
... ) CLUSTERED BY (id) INTO 2 SHARDS WITH (number_of_replicas = 0);
CREATE OK, 1 row affected (... sec)
::
cr> INSERT INTO uservisits2 (id, name, visits, last_visit)
... (
... SELECT id, name, visits, last_visit
... FROM uservisits
... );
INSERT OK, 2 rows affected (... sec)
.. Hidden: refresh uservisits2
cr> REFRESH TABLE uservisits2
REFRESH OK, 1 row affected (... sec)
::
cr> INSERT INTO uservisits2 (id, name, visits, last_visit)
... (
... SELECT id, name, visits, last_visit
... FROM uservisits
... ) ON CONFLICT (id) DO UPDATE SET
... visits = visits + excluded.visits,
... last_visit = excluded.last_visit;
INSERT OK, 2 rows affected (... sec)
.. Hidden: refresh uservisits2
cr> REFRESH TABLE uservisits2
REFRESH OK, 1 row affected (... sec)
::
cr> SELECT
... name,
... visits,
... extract(year from last_visit) AS last_visit
... FROM uservisits ORDER BY name;
+----------+--------+------------+
| name | visits | last_visit |
+----------+--------+------------+
| Ford | 4 | 2016 |
| Trillian | 8 | 2016 |
+----------+--------+------------+
SELECT 2 rows in set (... sec)
.. Hidden: drop previously created table
cr> DROP TABLE uservisits2
DROP OK, 1 row affected (... sec)
.. SEEALSO::
:ref:`SQL syntax: ON CONFLICT DO UPDATE SET
<sql-insert-on-conflict-do-update>`
.. _dml-updating-data:
Updating data
=============
In order to update documents in CrateDB the SQL ``UPDATE`` statement can be
used::
cr> update locations set description = 'Updated description'
... where name = 'Bartledan';
UPDATE OK, 1 row affected (... sec)
Updating nested objects is also supported::
cr> update locations set inhabitants['name'] = 'Human' where name = 'Bartledan';
UPDATE OK, 1 row affected (... sec)
It's also possible to reference a column within the :ref:`expression
<gloss-expression>`, for example to increment a number like this::
cr> update locations set position = position + 1 where position < 3;
UPDATE OK, 6 rows affected (... sec)
.. NOTE::
If the same documents are updated concurrently an VersionConflictException
might occur. CrateDB contains a retry logic that tries to resolve the
conflict automatically.
.. _dml-deleting-data:
Deleting data
=============
Deleting rows in CrateDB is done using the SQL ``DELETE`` statement::
cr> delete from locations where position > 3;
DELETE OK, ... rows affected (... sec)
.. _dml-import-export:
Import and export
=================
.. _dml-importing-data:
Importing data
--------------
Using the ``COPY FROM`` statement, CrateDB nodes can import data from local
files or files that are available over the network.
The supported data formats are JSON and CSV. The format is inferred from the
file extension, if possible. Alternatively the format can also be provided as
an option (see :ref:`sql-copy-from-with`). If the format is not provided and
cannot be inferred from the file extension, it will be processed as JSON.
JSON files must contain a single JSON object per line.
Example JSON data::
{"id": 1, "quote": "Don't panic"}
{"id": 2, "quote": "Ford, you're turning into a penguin. Stop it."}
CSV files must contain a header with comma-separated values, which will
be added as columns.
Example CSV data::
id,quote
1,"Don't panic"
2,"Ford, you're turning into a penguin. Stop it."
.. NOTE::
* The ``COPY FROM`` statement will convert and validate your data.
* Values for generated columns will be computed if the data does not contain
them, otherwise they will be imported and validated
* Furthermore, column names in your data are considered case sensitive (as if
they were quoted in a SQL statement).
For further information, including how to import data to
:ref:`partitioned-tables`, take a look at the :ref:`sql-copy-from` reference.
.. _dml-importing-data-example:
Example
.......
.. highlight:: psql
Here's an example statement::
cr> COPY quotes FROM 'file:///tmp/import_data/quotes.json';
COPY OK, 3 rows affected (... sec)
This statement imports data from the ``/tmp/import_data/quotes.json`` file into
a table named ``quotes``.
.. NOTE::
The file you specify must be available on one of the CrateDB nodes. *This
statement will not work with files that are local to your client.*
For the above statement, every node in the cluster will attempt to import
data from a file located at ``/tmp/import_data/quotes.json`` relative to
the ``crate`` process (i.e., if you are running CrateDB inside a container,
the file must also be inside the container).
If you want to import data from a file that on your local computer using
``COPY FROM``, you must first transfer the file to one of the CrateDB
nodes.
Consult the :ref:`sql-copy-from` reference for additional information.
.. Hidden: delete imported data
cr> refresh table quotes;
REFRESH OK, 1 row affected (... sec)
cr> delete from quotes;
DELETE OK, 3 rows affected (... sec)
If you want to import all files inside the ``/tmp/import_data`` directory on
every CrateDB node, you can use a wildcard, like so::
cr> COPY quotes FROM '/tmp/import_data/*' WITH (bulk_size = 4);
COPY OK, 3 rows affected (... sec)
.. Hidden: delete imported data
cr> refresh table quotes;
REFRESH OK, 1 row affected (... sec)
cr> delete from quotes;
DELETE OK, 3 rows affected (... sec)
cr> refresh table quotes;
REFRESH OK, 1 row affected (... sec)
This wildcard can also be used to only match certain files in a directory::
cr> COPY quotes FROM '/tmp/import_data/qu*.json';
COPY OK, 3 rows affected (... sec)
.. Hidden: delete imported data
cr> refresh table quotes;
REFRESH OK, 1 row affected (... sec)
cr> delete from quotes;
DELETE OK, 3 rows affected (... sec)
cr> refresh table quotes;
REFRESH OK, 1 row affected (... sec)
.. _dml-importing-data-summary:
Detailed error reporting
........................
If the ``RETURN_SUMMARY`` clause is specified, a result set containing information
about failures and successfully imported records is returned.
.. Hidden: delete existing data
cr> refresh table locations;
REFRESH OK, 1 row affected (... sec)
cr> delete from locations;
DELETE OK, 8 rows affected (... sec)
cr> refresh table locations;
REFRESH OK, 1 row affected (... sec)
::
cr> COPY locations FROM '/tmp/import_data/locations_with_failure/locations*.json' RETURN SUMMARY;
+--...--+----------...--------+---------------+-------------+--------------------...-------------------------------------+
| node | uri | success_count | error_count | errors |
+--...--+----------...--------+---------------+-------------+--------------------...-------------------------------------+
| {...} | .../locations1.json | 6 | 0 | {} |
| {...} | .../locations2.json | 5 | 2 | {"Cannot cast value...{"count": ..., "line_numbers": ...}} |
+--...--+----------...--------+---------------+-------------+--------------------...-------------------------------------+
COPY 2 rows in set (... sec)
.. Hidden: delete imported data
cr> refresh table locations;
REFRESH OK, 1 row affected (... sec)
cr> delete from locations;
DELETE OK, ...
cr> refresh table locations;
REFRESH OK, 1 row affected (... sec)
If an error happens while processing the URI in general, the ``error_count`` and
``success_count`` columns will contains `NULL` values to indicate that no records were processed.
::
cr> COPY locations FROM '/tmp/import_data/not-existing.json' RETURN SUMMARY;
+--...--+-----------...---------+---------------+-------------+------------------------...------------------------+
| node | uri | success_count | error_count | errors |
+--...--+-----------...---------+---------------+-------------+------------------------...------------------------+
| {...} | .../not-existing.json | NULL | NULL | {"...not-existing.json (...)": {"count": 1, ...}} |
+--...--+-----------...---------+---------------+-------------+------------------------...------------------------+
COPY 1 row in set (... sec)
See :ref:`sql-copy-from` for more information.
.. _dml-exporting-data:
Exporting data
--------------
Data can be exported using the ``COPY TO`` statement. Data is exported in a
distributed way, meaning each node will export its own data.
Replicated data is not exported. So every row of an exported table is stored
only once.
This example shows how to export a given table into files named after the table
and shard ID with gzip compression:
.. Hidden: import data
cr> REFRESH TABLE quotes;
REFRESH OK...
cr> COPY quotes FROM '/tmp/import_data/*';
COPY OK, 3 rows affected (... sec)
::
cr> REFRESH TABLE quotes;
REFRESH OK...
::
cr> COPY quotes TO DIRECTORY '/tmp/' with (compression='gzip');
COPY OK, 3 rows affected ...
Instead of exporting a whole table, rows can be filtered by an optional WHERE
clause condition. This is useful if only a subset of the data needs to be
exported::
cr> COPY quotes WHERE match(quote_ft, 'time') TO DIRECTORY '/tmp/' WITH (compression='gzip');
COPY OK, 2 rows affected ...
For further details see :ref:`sql-copy-to`.
.. _crate-python: https://pypi.python.org/pypi/crate/
.. _PCRE: https://www.pcre.org/