.. _version_3.0.0:
=============
Version 3.0.0
=============
Released on 2018/05/16.
.. NOTE::
If you are upgrading a cluster, you must be running CrateDB 2.0.4 or higher
before you upgrade to 3.0.0.
We recommend that you upgrade to the latest 2.3 release before moving to
3.0.0.
You cannot perform a `rolling upgrade`_ to this version. Any upgrade to
this version will require a `full restart upgrade`_.
When restarting, CrateDB will migrate indexes to a newer format. Depending
on the amount of data, this may delay node start-up time.
Please consult the `Upgrade Notes`_ before upgrading.
.. WARNING::
Tables that were created prior to upgrading to CrateDB 2.x will not
function with 3.0 and must be recreated before moving to 3.0.x.
You can recreate tables using ``COPY TO`` and ``COPY FROM`` while running a
2.x release into a new table, or by `inserting the data into a new table`_.
Before upgrading, you should `back up your data`_.
.. _rolling upgrade: https://crate.io/docs/crate/howtos/en/latest/admin/rolling-upgrade.html
.. _full restart upgrade: https://crate.io/docs/crate/howtos/en/latest/admin/full-restart-upgrade.html
.. _back up your data: https://crate.io/docs/crate/reference/en/latest/admin/snapshots.html
.. rubric:: Table of contents
.. contents::
:local:
Changelog
=========
Breaking Changes
----------------
- Dropped support for tables that were created with CrateDB versions prior to
2.0. If you are running the latest 2.2 or 2.3 release, tables that require
upgrading are flagged by the cluster checks, which are also shown in the
Admin UI. These tables must be recreated before updating CrateDB to this
version, either by exporting the data with ``COPY TO`` and importing it into
a new table with ``COPY FROM``, or by using ``INSERT`` with a query.
- Data paths as defined in ``path.data`` must not contain the cluster name as a
folder. If you are running the latest 2.2 or 2.3 release, data paths that
are not compatible with this version are flagged by the node checks, which
are also shown in the Admin UI.
- The ``region`` setting for ``CREATE REPOSITORY`` has been removed. It is
automatically inferred but can still be manually specified by using the
``endpoint`` setting.
- Store level throttling settings ``indices.store.throttle.*`` have been
removed.
- The gateway recovery table setting ``recovery.initial_shards`` has been
removed. Nodes will recover their unassigned local primary shards
immediately after restart.
- The discovery setting ``discovery.type`` has been removed. To enable EC2
discovery, the ``discovery.zen.hosts_provider`` setting must be set to
``ec2``.
- Dropped support for reading AWS credentials used for S3 and EC2 discovery
from environment variables ``AWS_ACCESS_KEY_ID`` and
``AWS_SECRET_ACCESS_KEY`` as well as Java system properties
``aws.accessKeyId`` and ``aws.secretKey``.
- EC2 ``cloud.aws.*`` settings have been renamed to ``discovery.ec2.*``.
- The setting that controls system call filters, ``bootstrap.seccomp``, has
been renamed to ``bootstrap.system_call_filter``.
- The columns ``number_of_shards``, ``number_of_replicas``, and
``self_referencing_column_name`` in ``information_schema.tables`` changed to
return ``NULL`` for non-sharded tables.
- Adapted queries in the Admin UI to be compatible with CrateDB 3.0 and
greater.
- For HTTP authentication, support was dropped for the ``X-User`` header, used
to provide a username, which was deprecated in ``2.3.0`` in favour of the
standard HTTP ``Authorization`` header.
- The ``error_trace`` GET parameter of the HTTP endpoint only allows ``true``
and ``false`` in lower case. Other values are not allowed any more and will
result in a parsing exception.
- The ``_node`` column on ``sys.shards`` and ``sys.operations`` has been
renamed to ``node``, is now visible by default, and has been trimmed to only
include ``node['id']`` and ``node['name']``. To get the full node
information, join with ``sys.nodes`` on ``node['id']``.
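
The table recreation required by the first item above could look roughly like
the following sketch, run on the 2.x cluster before upgrading. The table name
``doc.metrics``, its columns, the new table name ``doc.metrics_new``, and the
export directory ``/data/export`` are hypothetical placeholders, not names
taken from these release notes. Use either the ``INSERT`` variant or the
``COPY`` variant, not both.

.. code-block:: sql

    -- Hypothetical names throughout: doc.metrics, doc.metrics_new, /data/export.

    -- 1. Inspect the existing definition so the new table can mirror it.
    SHOW CREATE TABLE doc.metrics;

    -- 2. Create the replacement table (placeholder columns).
    CREATE TABLE doc.metrics_new (
        id STRING PRIMARY KEY,
        ts TIMESTAMP,
        reading DOUBLE
    );

    -- 3a. Variant A: copy the rows directly using INSERT with a query.
    INSERT INTO doc.metrics_new (id, ts, reading)
        (SELECT id, ts, reading FROM doc.metrics);

    -- 3b. Variant B (alternative to 3a): export to files, then import them.
    -- COPY doc.metrics TO DIRECTORY '/data/export';
    -- COPY doc.metrics_new FROM '/data/export/*';

    -- 4. Make the new rows visible and compare row counts before dropping
    --    the old table.
    REFRESH TABLE doc.metrics_new;
    SELECT count(*) FROM doc.metrics_new;

With the ``COPY`` variant, each node writes the export files for its own
shards, so the export directory needs to be readable by the nodes that run the
import.
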
Changes
-------
- CrateDB is now based on Elasticsearch 6.1.4 and Lucene 7.1.0.
- Multiple Admin UI improvements.
- Added a new tab for views in the Admin UI which lists available views and
their properties.
- Updated the bundled CrateDB Shell (``crash``) to version ``0.24.0`` which
adds support for default schema for connections.
- Added support in the PostgreSQL Wire Protocol's SimpleQuery mode to process a
query string which contains multiple queries delimited by semicolons.
- Added support for the ``DEALLOCATE`` statement, which is used by certain
PostgreSQL Wire Protocol clients (e.g. libpq) to deallocate a prepared
statement and release its resources.
- Added support for ordering on analysed columns and :ref:`partition columns
<gloss-partition-column>`.
- Added support for views which can be created using the new ``CREATE VIEW``
statement and dropped using the ``DROP VIEW`` statement. Views are listed in
``information_schema.views`` and they show up in
``information_schema.tables`` as well as ``information_schema.columns``.
- Enterprise: Added the VIEW privilege class which can be used to grant/deny
access to views.
- Added support for ``INSERT INTO ... ON CONFLICT DO NOTHING``. The statement
ignores insert values which would cause duplicate keys.
- Added support for the ``ON CONFLICT`` clause in insert statements. ``INSERT
INTO ... ON CONFLICT (pk_col) DO UPDATE SET col = val`` is identical to
``INSERT INTO ... ON DUPLICATE KEY UPDATE col = val``. The special
``EXCLUDED`` table can be used to refer to the insert values: ``INSERT INTO
... ON CONFLICT (pk_col) DO UPDATE SET col = EXCLUDED.col``.
- DEPRECATED: The ``ON DUPLICATE KEY UPDATE`` clause has been deprecated in
favor of the ``ON CONFLICT DO UPDATE SET`` clause.
- Implemented the Block Hash Join algorithm which is now used for Equi-Joins.
- Added new ``sys.health`` system information table to expose the health of all
tables and table partitions.
- Added a new ``cluster.routing.allocation.disk.watermark.flood_stage``
setting, which controls the disk usage level at which indices become
read-only to prevent running out of disk space. There is also a new node
check that indicates whether the threshold is exceeded.
- Added a new ``bengali`` language analyzer and a ``bengali_normalization``
token filter.
- Added the ``max_token_length`` parameter to the whitespace tokenizer.
- Added new tokenizers ``simple_pattern`` and ``simple_pattern_split``, which
allow text to be tokenized for the fulltext index by a :ref:`regular
expression <gloss-regular-expression>` pattern.
- Added support for CSV file inputs in ``COPY FROM`` statements. Input type is
inferred using the file's extension or can be set using the optional ``WITH``
clause and specifying the ``format``.
- Fully qualified column names including a schema name will no longer match on
table aliases.
- The default user if enterprise is disabled changed from ``null`` to
``crate``. This causes entries in ``sys.jobs`` to show up with ``crate`` as
username. Functions like ``user`` will also return ``crate`` if enterprise is
enabled but the user module is not available.
- Display the node information (name and id) of jobs in the ``sys.jobs`` table.
- Changed the primary key constraints of the information schema tables
``table_constraints``, ``referential_constraints``, ``table_partitions``,
``key_column_usage``, ``columns``, and ``tables`` to be SQL compliant.
- Arrays can now contain mixed types if they're safely convertible. JSON
libraries tend to encode values like ``[0.0, 1.2]`` as ``[0, 1.2]``, which
previously caused an error because of the strict type match that was
enforced.
- Implemented ``constraint_schema`` and ``table_schema`` in
``information_schema.key_column_usage`` correctly and documented the full
table schema.
- Statistics for jobs and operations are enabled by default. If you don't need
any statistics, please set ``stats.enabled`` to ``false``.
- Changed ``BEGIN`` and ``SET SESSION`` to no longer require ``DQL``
permissions on the ``CLUSTER`` level.
- Added ``epoch`` argument to the ``EXTRACT`` function which returns the number
of seconds since Jan 1, 1970. For example: ``extract(epoch from
'1970-01-01T00:00:01')`` returns ``1.0`` seconds.
- Enabled logging of JVM garbage collection times to help debug memory
pressure and garbage collection issues. GC log files are stored separately
from the standard CrateDB logs and are log-rotated.
- CrateDB will now by default create a heap dump in case of a crash caused by
an out of memory error. This makes it necessary to account for the additional
disk space requirements.
- Implemented a ``Ready`` node status JMX metric expressing if the node is
ready for processing SQL statements.
- Implemented a ``NodeInfo`` JMX MBean to expose useful information (id, name)
about the node.
- Fixed path of log file name in rotation pattern in ``log4j2.properties``. It
now writes into the correct logging directory instead of the parent
directory.
- ``ALTER TABLE <name> OPEN`` will now wait for all shards to become active
before returning to be consistent with the behaviour of other statements.
- Added note about the newly available ``JMX HTTP Exporter`` to the monitoring
documentation section.
- The first argument (``field``) of the ``EXTRACT`` function has been limited
to string literals and identifiers, as it was documented.
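
The ``ON CONFLICT`` items in the list above can be illustrated with a short
sketch. The table ``doc.page_visits`` and its columns are hypothetical and only
serve to show the deprecated clause next to its replacements.

.. code-block:: sql

    -- Hypothetical table used only for illustration.
    CREATE TABLE doc.page_visits (
        page STRING PRIMARY KEY,
        visits INTEGER
    );

    -- Deprecated form: still accepted in 3.0, but should be migrated.
    INSERT INTO doc.page_visits (page, visits) VALUES ('/home', 1)
        ON DUPLICATE KEY UPDATE visits = visits + 1;

    -- Equivalent statement using the new clause.
    INSERT INTO doc.page_visits (page, visits) VALUES ('/home', 1)
        ON CONFLICT (page) DO UPDATE SET visits = visits + 1;

    -- EXCLUDED refers to the values that the insert attempted to write.
    INSERT INTO doc.page_visits (page, visits) VALUES ('/home', 5)
        ON CONFLICT (page) DO UPDATE SET visits = EXCLUDED.visits;

    -- Skip rows that would cause a duplicate key instead of updating them.
    INSERT INTO doc.page_visits (page, visits) VALUES ('/home', 1)
        ON CONFLICT DO NOTHING;
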
.. _version_3.0.0_upgrade_notes:
Upgrade Notes
=============
Configuration Changes
---------------------
There are a few configuration changes that you should be aware of before
restarting the nodes.
Removed Settings
................
- All store level throttle settings (under ``indices.store.throttle.*``) have
been removed, and should be removed from your node configuration.
- Similarly, the ``recovery.initial_shards`` configuration option has been
removed, and should also be removed from your configuration.
Renamed Settings
................
- The ``discovery.type`` setting which was previously used to specify whether a
cluster should use DNS discovery or the EC2 API, has been removed.
Configuring the use of the EC2 API has now been moved to the
``discovery.zen.hosts_provider`` setting.
- The ``bootstrap.seccomp`` setting, which controls system call filters, has
been renamed to ``bootstrap.system_call_filter``.
Altered Settings
................
- The ``path.data`` setting specifies the path or paths where the CrateDB node
should store its table data and cluster metadata.
In CrateDB 3.0.0 and later, this path must *not* contain the cluster name as
a directory. For example, if you have set ``cluster.name: abcdef``, the
setting ``path.data: /mnt/abcdef/data`` would be incompatible. Moving or
renaming the directory, such as to ``/mnt/data``, and altering your
``path.data`` setting accordingly will allow you to continue using the node's
data.
Data paths that are incompatible with 3.0.0 will be indicated visually in the
`Admin UI`_ if you are running the latest 2.2.x or 2.3.x release.
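
Besides the Admin UI, failing checks can also be listed with SQL on the 2.2.x
or 2.3.x cluster before upgrading. The following is a minimal sketch that
assumes the ``sys.node_checks`` and ``sys.checks`` tables expose ``passed``,
``severity``, and ``description`` columns as described in the system
information documentation; verify the column names against the version you are
running.

.. code-block:: sql

    -- Node checks that currently fail (for example an incompatible path.data).
    SELECT node_id, id, severity, description
    FROM sys.node_checks
    WHERE passed = FALSE
    ORDER BY severity DESC;

    -- Cluster checks that currently fail (for example tables that need to be
    -- recreated).
    SELECT id, severity, description
    FROM sys.checks
    WHERE passed = FALSE
    ORDER BY severity DESC;
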
Other Changes
-------------
- The ``CREATE REPOSITORY`` statement for creating backup repositories has been
changed.
Previously, when using Amazon S3 for backup storage, bucket regions had to be
configured explicitly. Bucket regions are now inferred automatically.
If you want to override this, you can use the :ref:`endpoint parameter
<sql-create-repo-s3-endpoint>`.
- Previously, the ``X-User`` HTTP header could be used to provide a username.
Support for this header has been dropped in favour of the standard `HTTP
Authorization header`_.
- The ``_node`` column in the ``sys.shards`` and ``sys.operations`` tables has
been renamed to ``node``.
Additionally, the ``node`` object now only includes the ``id`` and ``name``
of the node, i.e. ``node['id']`` and ``node['name']``.
To get the full node information, use ``node['id']`` to join the
``sys.nodes`` table.
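
The join mentioned in the last item could be sketched as follows. The selected
``sys.nodes`` columns (``hostname`` and ``heap['used']``) are only examples of
node details that are no longer part of the trimmed ``node`` object.

.. code-block:: sql

    -- Look up full node details for each shard via the trimmed node object.
    SELECT s.schema_name,
           s.table_name,
           s.id AS shard_id,
           s.node['name'] AS node_name,
           n.hostname,
           n.heap['used'] AS heap_used
    FROM sys.shards s
    JOIN sys.nodes n ON s.node['id'] = n.id
    ORDER BY s.schema_name, s.table_name, s.id;
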
.. _Admin UI: https://crate.io/docs/clients/admin-ui/en/latest/
.. _backup: https://crate.io/docs/crate/reference/en/latest/admin/snapshots.html
.. _full cluster restart: https://crate.io/docs/crate/howtos/en/latest/admin/full-restart-upgrade.html
.. _HTTP Authorization header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization
.. _inserting the data into a new table: https://crate.io/docs/crate/reference/en/latest/admin/system-information.html#tables-need-to-be-recreated