docs/appendices/release-notes/3.0.0.rst

.. _version_3.0.0:

=============
Version 3.0.0
=============

Released on 2018/05/16.

.. NOTE::

    If you are upgrading a cluster, you must be running CrateDB 2.0.4 or higher
    before you upgrade to 3.0.0.

    We recommend that you upgrade to the latest 2.3 release before moving to
    3.0.0.

    You cannot perform a `rolling upgrade`_ to this version. Any upgrade to
    this version will require a `full restart upgrade`_.

    When restarting, CrateDB will migrate indexes to a newer format. Depending
    on the amount of data, this may delay node start-up time.

    Please consult the `Upgrade Notes`_ before upgrading.

.. WARNING::

    Tables that were created prior to upgrading to CrateDB 2.x will not
    function with 3.0 and must be recreated before moving to 3.0.x.

    You can recreate tables using ``COPY TO`` and ``COPY FROM`` while running a
    2.x release into a new table, or by `inserting the data into a new table`_.

    Before upgrading, you should `back up your data`_.

.. _rolling upgrade: https://crate.io/docs/crate/howtos/en/latest/admin/rolling-upgrade.html
.. _full restart upgrade: https://crate.io/docs/crate/howtos/en/latest/admin/full-restart-upgrade.html
.. _back up your data: https://crate.io/docs/crate/reference/en/latest/admin/snapshots.html

.. rubric:: Table of contents

.. contents::
   :local:


Changelog
=========


Breaking Changes
----------------

- Dropped support for tables that have been created with CrateDB prior to
  version 2.0. Tables which require upgrading are indicated in the cluster
  checks, including visually shown in the Admin UI, if running the latest 2.2
  or 2.3 release. The upgrade of tables needs to happen before updating CrateDB
  to this version. This can be done by exporting the data with ``COPY TO`` and
  importing it into a new table with ``COPY FROM``.  Alternatively you can use
  ``INSERT`` with query.

- Data paths as defined in ``path.data`` must not contain the cluster name as a
  folder. Data paths which are not compatible with this version are indicated
  in the node checks, including visually shown in the Admin UI, if running the
  latest 2.2 or 2.3 release.

- The ``region`` setting for ``CREATE REPOSITORY`` has been removed. It is
  automatically inferred but can still be manually specified by using the
  ``endpoint`` setting.

- Store level throttling settings ``indices.store.throttle.*`` have been
  removed.

- The gateway recovery table setting ``recovery.initial_shards`` has been
  removed. Nodes will recover their unassigned local primary shards
  immediately after restart.

- The discovery setting ``discovery.type`` has been removed. To enable EC2
  discovery, the ``discovery.zen.hosts_provider`` setting must be set to
  ``ec2``.

- Dropped support for reading AWS credentials used for S3 and EC2 discovery
  from environment variables ``AWS_ACCESS_KEY_ID`` and
  ``AWS_SECRET_ACCESS_KEY`` as well as Java system properties
  ``aws.accessKeyId`` and ``aws.secretKey``.

- EC2 ``cloud.aws.*`` settings have been renamed to ``discovery.ec2.*``.

- The setting that controls system call filters ``bootstrap.seccomp`` has been
  has been renamed to ``bootstrap.system_call_filter``.

- The columns ``number_of_shards``, ``number_of_replicas``, and
  ``self_referencing_column_name`` in ``information_schema.tables`` changed to
  return ``NULL`` for non-sharded tables.

- Adapted queries in the Admin UI to be compatible with CrateDB 3.0 and
  greater.

- For HTTP authentication, support was dropped for the ``X-User`` header, used
  to provide a username, which has been deprecated in ``2.3.0.`` in favour of
  the standard HTTP ``Authorization`` header.

- The ``error_trace`` GET parameter of the HTTP endpoint only allows ``true``
  and ``false`` in lower case. Other values are not allowed any more and will
  result in a parsing exception.

- The ``_node`` column on ``sys.shards`` and ``sys.operations`` has been
  renamed to ``node``, is now visible by default and has been trimmed to only
  include ``node['id']`` and ``node['name']``. In order to get all information
  a join query can be used with ``sys.nodes``.


Changes
-------

- CrateDB is now based on Elasticsearch 6.1.4 and Lucene 7.1.0.

- Multiple Admin UI improvements.

- Added a new tab for views in the Admin UI which lists available views and
  their properties.

- Updated the bundled CrateDB Shell (``crash``) to version ``0.24.0`` which
  adds support for default schema for connections.

- Added support in the PostgreSQL Wire Protocol's SimpleQuery mode to process a
  query string which contains multiple queries delimited by semicolons.

- Added support for ``DEALLOCATE`` statement which is used by certain
  PostgreSQL Wire Protocol clients (e.g. libpq) to deallocate a prepared
  statement and release its resources.

- Added support for ordering on analysed columns and :ref:`partition columns
  <gloss-partition-column>`.

- Added support for views which can be created using the new ``CREATE VIEW``
  statement and dropped using the ``DROP VIEW`` statement. Views are listed in
  ``information_schema.views`` and they show up in
  ``information_schema.tables`` as well as ``information_schema.columns``.

- Enterprise: Added the VIEW privilege class which can be used to grant/deny
  access to views.

- Added support for ``INSERT INTO ... ON CONFLICT DO NOTHING``. The statement
  ignores insert values which would cause duplicate keys.

- Added support for ``ON CONFLICT`` clause in insert statements.  ``INSERT INTO
  ...  ON CONFLICT (pk_col) DO UPDATE SET col = val`` is identical to ``INSERT
  INTO ... ON DUPLICATE KEY UPDATE col = val``.  The special ``EXCLUDED`` table
  can be used to refer to the insert values: ``INSERT INTO ... ON CONFLICT
  (pk_col) DO UPDATE SET col = EXCLUDED.col``

- DEPRECATED: The ``ON DUPLICATE KEY UPDATE`` clause has been deprecated in
  favor of the ``ON CONFLICT DO UPDATE SET`` clause.

- Implemented the Block Hash Join algorithm which is now used for Equi-Joins.

- Added new ``sys.health`` system information table to expose the health of all
  tables and table partitions.

- Added new ``cluster.routing.allocation.disk.watermark.flood_stage`` setting,
  that controls at which disk usage indices should become read-only to prevent
  running out of disk space. There is also a new node check that indicates
  whether the threshold is exceeded.

- Added a new ``bengali`` language analyzer and a ``bengali_normalization``
  token filter.

- Add ``max_token_length`` parameter to whitespace tokenizer.

- Added new tokenizers ``simple_pattern`` and ``simple_pattern_split`` which
  allow to tokenize text for the fulltext index by a :ref:`regular expression
  <gloss-regular-expression>` pattern.

- Added support for CSV file inputs in ``COPY FROM`` statements. Input type is
  inferred using the file's extension or can be set using the optional ``WITH``
  clause and specifying the ``format``.

- Fully qualified column names including a schema name will no longer match on
  table aliases.

- The default user if enterprise is disabled changed from ``null`` to
  ``crate``. This causes entries in ``sys.jobs`` to show up with ``crate`` as
  username. Functions like ``user`` will also return ``crate`` if enterprise is
  enabled but the user module is not available.

- Display the node information (name and id) of jobs in the ``sys.jobs`` table.

- Changed the primary key constraints of the information schema tables
  ``table_constraints``, ``referential_constraints``, ``table_partitions``,
  ``key_column_usage``, ``columns``, and ``tables`` to be SQL compliant.

- Arrays can now contain mixed types if they're safely convertible. JSON
  libraries tend to encode values like ``[0.0, 1.2]`` as ``[0, 1.2]``, this
  caused an error because of the strict type match we enforced before.

- Implemented ``constraint_schema`` and ``table_schema`` in
  ``information_schema.key_column_usage`` correctly and documented the full
  table schema.

- Statistics for jobs and operations are enabled by default. If you don't need
  any statistics, please set ``stats.enabled`` to ``false``.

- Changed ``BEGIN`` and ``SET SESSION`` to no longer require ``DQL``
  permissions on the ``CLUSTER`` level.

- Added ``epoch`` argument to the ``EXTRACT`` function which returns the number
  of seconds since Jan 1, 1970. For example: ``extract(epoch from
  '1970-01-01T00:00:01')`` returns ``1.0`` seconds.

- Enable logging of JVM garbage collection times that help to debug memory
  pressure and garbage collection issues. GC log files are stored separately to
  the standard CrateDB logs and the files are log-rotated.

- CrateDB will now by default create a heap dump in case of a crash caused by
  an out of memory error. This makes it necessary to account for the additional
  disk space requirements.

- Implemented a ``Ready`` node status JMX metric expressing if the node is
  ready for processing SQL statements.

- Implemented a ``NodeInfo`` JMX MBean to expose useful information (id, name)
  about the node.

- Fixed path of log file name in rotation pattern in ``log4j2.properties``. It
  now writes into the correct logging directory instead of the parent
  directory.

- ``ALTER TABLE <name> OPEN`` will now wait for all shards to become active
  before returning to be consistent with the behaviour of other statements.

- Added note about the newly available ``JMX HTTP Exporter`` to the monitoring
  documentation section.

- The first argument (``field``) of the ``EXTRACT`` function has been limited
  to string literals and identifiers, as it was documented.


.. _version_3.0.0_upgrade_notes:

Upgrade Notes
=============


Configuration Changes
---------------------

There are a few configuration changes that you should be aware of before
restarting the nodes.


Removed Settings
................

- All store level throttle settings (under ``indices.store.throttle.*``) have
  been removed, and should be removed from your node configuration.

- Similarly, the ``recovery.initial_shards`` configuration option has been
  removed, and should also be removed from your configuration.


Renamed Settings
................

- The ``discovery.type`` setting which was previously used to specify whether a
  cluster should use DNS discovery or the EC2 API, has been removed.
  Configuring the use of the EC2 API has now been moved to the
  ``discovery.zen.hosts_provider`` setting.

- The ``bootstrap.seccomp`` setting, which controls system call filters, has
  been renamed to ``bootstrap.system_call_filter``.


Altered Settings
................

- The ``path.data`` setting specifies the path or paths where the CrateDB node
  should store its table data and cluster metadata.

  In CrateDB 3.0.0 and later, this path must *not* contain the cluster name as
  a directory. For example, if you have set ``cluster.name: abcdef``, the
  setting ``path.data: /mnt/abcdef/data`` would be incompatible. Moving or
  renaming the directory, such as to ``/mnt/data``, and altering your
  ``path.data`` setting accordingly will allow you to continue using the node's
  data.

  Data paths that are incompatible with 3.0.0 will be indicated visually in the
  `Admin UI`_ if you are running the latest 2.2.x or 2.3.x release.


Other Changes
-------------

- The ``CREATE REPOSITORY`` statement for creating backup repositories has been
  changed.

  Previously, when using Amazon S3 for backup storage, bucket regions had to be
  configured explicitly. Bucket regions are now inferred automatically.

  If you want to override this, you can use the :ref:`endpoint parameter
  <sql-create-repo-s3-endpoint>`.

- Previously, the ``X-User`` HTTP header could be used to provide a username.
  This head is now deprecated in favour of the standard `HTTP Authorization
  header`_.

- The ``_node`` column in the ``sys.shards`` and ``sys.operations`` tables has
  been renamed to ``node``.

  Additionally, ``node`` object now only includes ``id`` and ``name`` of the
  node, i.e. ``node['id']`` and ``node['name']``.

  To get the full node information, use ``node['id']`` to join the
  ``sys.nodes`` table.


.. _Admin UI: https://crate.io/docs/clients/admin-ui/en/latest/
.. _backup: https://crate.io/docs/crate/reference/en/latest/admin/snapshots.html
.. _full cluster restart: https://crate.io/docs/crate/howtos/en/latest/admin/full-restart-upgrade.html
.. _HTTP Authorization header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization
.. _inserting the data into a new table: https://crate.io/docs/crate/reference/en/latest/admin/system-information.html#tables-need-to-be-recreated