This repository was archived by the owner on May 12, 2021. It is now read-only.
Closed
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
********************************
Add data to Partition Table
********************************

Tajo supports dynamic partitioning. You don't need any special syntax in ``INSERT INTO ... SELECT`` or ``CREATE TABLE AS SELECT (CTAS)`` statements to use it. Tajo automatically filters the data, creates the directories, moves the filtered data to the appropriate directories, and creates the partitions over them.

For example, assume that the ``student_source`` and ``student`` tables have the following schemas.

.. code-block:: sql

  CREATE TABLE student_source (
    id INT,
    name TEXT,
    gender CHAR(1),
    grade TEXT,
    country TEXT,
    city TEXT,
    phone TEXT
  );

  CREATE TABLE student (
    id INT,
    name TEXT,
    grade TEXT
  ) PARTITION BY COLUMN (country TEXT, city TEXT);


================================================
How to INSERT dynamically to partition table
================================================

If you want to load an entire country or an entire city in one fell swoop:

.. code-block:: sql

  INSERT OVERWRITE INTO student
  SELECT id, name, grade, country, city
  FROM student_source;
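
To check which partitions the statement above populated, a query along the following lines could be used. This is only a sketch: it assumes the ``student`` table defined earlier and that partition columns can be read like ordinary columns, and the alias ``num_students`` is illustrative.

.. code-block:: sql

  -- Count the loaded rows per (country, city) partition.
  SELECT country, city, count(*) AS num_students
  FROM student
  GROUP BY country, city;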


================================================
How to CTAS dynamically to partition table
================================================

If you want to load an entire country or an entire city in one fell swoop:

Review comment (Contributor):
Explanation is same precisely as above one.
I think it's better to add some words like 'including CREATE TABLE' or 'when a partition table is created'


.. code-block:: sql

  DROP TABLE IF EXISTS student;

  CREATE TABLE student (
    id INT,
    name TEXT,
    grade TEXT
  ) PARTITION BY COLUMN (country TEXT, city TEXT)
  AS SELECT id, name, grade, country, city
  FROM student_source;


.. note::

  When loading data into a partition, it’s necessary to include the partition columns as the last columns in the query. The column names in the source query don’t need to match the partition column names.
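
As a sketch of the note above, the following statement renames the two trailing columns in the source query. The aliases ``nation`` and ``town`` are hypothetical and only illustrate that, per the note, the values are matched to the partition columns by position rather than by name.

.. code-block:: sql

  -- The last two select items feed the partition columns (country, city),
  -- in that order, regardless of their names in the query.
  INSERT OVERWRITE INTO student
  SELECT id, name, grade, country AS nation, city AS town
  FROM student_source;
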
66 changes: 66 additions & 0 deletions tajo-docs/src/main/sphinx/partitioning/alter_partition.rst
@@ -0,0 +1,66 @@
********************************
Alter partition
********************************

You can use ``ALTER TABLE`` to add or drop partitions of a partition table.

For example, assume there is a ``student`` table with the following schema.

.. code-block:: sql

  CREATE TABLE student (
    id INT,
    name TEXT,
    grade TEXT
  ) PARTITION BY COLUMN (country TEXT, city TEXT);


========================
ADD PARTITION
========================

*Synopsis*

.. code-block:: sql

  ALTER TABLE <table_name> [IF NOT EXISTS] ADD PARTITION (<partition column> = <partition value>, ...) [LOCATION = <partition's path>]

*Description*

You can use ``ALTER TABLE ADD PARTITION`` to add partitions to a table. The location must be a directory in which the data files reside. If the location doesn't exist on the file system, Tajo creates it. ``ADD PARTITION`` changes the table metadata but does not load data, so if there is no data in the partition's location, queries on that partition return no results. An error is thrown if the partition already exists in the table; you can use ``IF NOT EXISTS`` to skip the error.
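
For instance, a statement that is meant to be safe to re-run might look like the sketch below. It assumes the ``student`` table defined earlier and that ``IF NOT EXISTS`` is placed exactly as the synopsis shows.

.. code-block:: sql

  -- Registers the partition only if it is not in the table metadata yet;
  -- no data is loaded by this statement.
  ALTER TABLE student IF NOT EXISTS ADD PARTITION (country='KOREA', city='SEOUL');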

*Examples*

.. code-block:: sql

-- Each ADD PARTITION clause creates a subdirectory in HDFS.
ALTER TABLE student ADD PARTITION (country='KOREA', city='SEOUL');
ALTER TABLE student ADD PARTITION (country='KOREA', city='PUSAN');
ALTER TABLE student ADD PARTITION (country='USA', city='NEWYORK');
ALTER TABLE student ADD PARTITION (country='USA', city='BOSTON');
-- Redirect queries, INSERT, and LOAD DATA for one partition to a specific different directory.
ALTER TABLE student ADD PARTITION (country='USA', city='BOSTON') LOCATION '/usr/external_data/new_years_day';


========================
DROP PARTITION
========================

*Synopsis*

.. code-block:: sql

  ALTER TABLE <table_name> [IF EXISTS] DROP PARTITION (<partition column> = <partition value>, ...) [PURGE]

*Description*

You can use ``ALTER TABLE DROP PARTITION`` to drop a partition of a table. By default, this doesn't remove the partition's data; the data is removed only if ``PURGE`` is specified. The partition's metadata is removed in both cases. An error is thrown if the partition doesn't exist in the table; you can use ``IF EXISTS`` to skip the error.
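
As a sketch of the two variants described above, assuming the ``student`` table defined earlier and the ``IF EXISTS`` placement shown in the synopsis:

.. code-block:: sql

  -- Drops the partition metadata if the partition exists; the data files are kept.
  ALTER TABLE student IF EXISTS DROP PARTITION (country='KOREA', city='SEOUL');
  -- Drops the partition metadata and also removes the partition data.
  ALTER TABLE student IF EXISTS DROP PARTITION (country='USA', city='BOSTON') PURGE;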

*Examples*

.. code-block:: sql

  -- Delete just metadata
  ALTER TABLE table1 DROP PARTITION (country = 'KOREA' , city = 'SEOUL');
  -- Delete table data and metadata
  ALTER TABLE table1 DROP PARTITION (country = 'USA', city = 'NEWYORK' ) PURGE

Review comment (Contributor):
';' is missed

52 changes: 0 additions & 52 deletions tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst

This file was deleted.

73 changes: 73 additions & 0 deletions tajo-docs/src/main/sphinx/partitioning/define_partition_table.rst
@@ -0,0 +1,73 @@
*********************************
Define Partition Table
*********************************

Tajo makes it easy to specify an automatic partition scheme when the table is created.

================================================
How to Create Partitione Table
================================================

Review comment (Contributor):
'Partitione' -> 'Partition'


You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
the ``PARTITION BY COLUMN`` clause with partition keys.

For example, assume there is a table ``student`` composed of the following schema.

.. code-block:: sql

  id INT,
  name TEXT,
  grade TEXT

Now you want to partition on country. Your Tajo definition would be this:

.. code-block:: sql

  CREATE TABLE student (
    id INT,
    name TEXT,
    grade TEXT
  ) PARTITION BY COLUMN (country TEXT);

Users can still filter with ``WHERE country = '...'``, as if ``country`` were a regular column of the table.
Here's an example statement that creates a table partitioned by two columns:

.. code-block:: sql

  CREATE TABLE student (
    id INT,
    name TEXT,
    grade TEXT
  ) USING PARQUET
  PARTITION BY COLUMN (country TEXT, city TEXT);

The statement above creates the ``student`` table with the ``id``, ``name``, and ``grade`` columns. The table is partitioned by ``country`` and ``city``, and its data is stored in Parquet files.

You might have noticed that while the partitioning key columns are part of the table DDL, they’re only listed in the ``PARTITION BY`` clause. In Tajo, as data is written to disk, each partition of data is automatically split out into a separate folder, e.g. ``country=USA/city=NEWYORK``. During a read operation, Tajo uses the folder structure to quickly locate the right partitions and also returns the partitioning columns as columns in the result set.
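
As an illustration of the read path described above, the sketch below selects partition columns next to regular ones. It assumes the two-column partitioned ``student`` table from the previous statement; the directory named in the comment simply follows the folder layout quoted above.

.. code-block:: sql

  -- country and city appear only in the PARTITION BY clause, yet they can be
  -- selected and filtered; here Tajo would only need the files under
  -- country=USA/city=NEWYORK.
  SELECT id, name, country, city
  FROM student
  WHERE country = 'USA' AND city = 'NEWYORK';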


==================================================
Partition Pruning on Partition Table
==================================================

The following predicates in the ``WHERE`` clause can be used to prune partitions that do not qualify during the query planning phase,
so that those partitions are never processed.

* ``=``
* ``<>``
* ``>``
* ``<``
* ``>=``
* ``<=``
* LIKE predicates with a leading wild-card character
* IN list predicates

Now above example table data is partitioned by country and city, so when the query is applied on table it can easily access the required row by the help partitions

Review comment (Contributor):
'the help of partitions' looks better



.. code-block:: sql

  SELECT * FROM student WHERE country = 'KOREA' AND city = 'SEOUL';
  SELECT * FROM student WHERE country = 'USA' AND (city = 'NEWYORK' OR city = 'BOSTON');
  SELECT * FROM student WHERE country = 'USA' AND city <> 'NEWYORK';
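
The other predicate kinds from the list above can be sketched the same way. The literal values are illustrative, and whether a particular predicate actually prunes partitions depends on the planner.

.. code-block:: sql

  -- IN list predicate
  SELECT * FROM student WHERE country IN ('KOREA', 'USA');
  -- Range predicates
  SELECT * FROM student WHERE country >= 'K' AND country < 'L';
  -- LIKE predicate with a leading wild-card character
  SELECT * FROM student WHERE city LIKE '%YORK';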

5 changes: 0 additions & 5 deletions tajo-docs/src/main/sphinx/partitioning/hash_partitioning.rst

This file was deleted.

90 changes: 90 additions & 0 deletions tajo-docs/src/main/sphinx/partitioning/hive_compatibility.rst
@@ -0,0 +1,90 @@
********************************
Hive Compatibility
********************************

Tajo provides HiveCatalogStore to process Hive partitioned tables directly. To use HiveCatalogStore, you should specify the Hive configuration in both the ``tajo-env.sh`` and ``catalog-site.xml`` files. Please see the following page.

.. toctree::
  :maxdepth: 1

  /hive_integration

================================================
How to create partition table
================================================

Suppose you create a partition table in Tajo as follows:

.. code-block:: sql

  default> CREATE TABLE student (
    id INT,
    name TEXT,
    grade TEXT
  ) PARTITION BY COLUMN (country TEXT, city TEXT);


You can then get the table information in Hive:

.. code-block:: sql

  hive> desc student;
  OK
  id                      int
  name                    string
  grade                   string
  country                 string
  city                    string

  # Partition Information
  # col_name              data_type               comment

  country                 string
  city                    string


Conversely, you can create the table in Hive:

.. code-block:: sql

  hive> CREATE TABLE student (
    id int,
    name string,
    grade string
  ) PARTITIONED BY (country string, city string)
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|' ;

You will then see the table information in Tajo:

.. code-block:: sql

  default> \d student;
  table name: default.student
  table uri: hdfs://your_hdfs_namespace/user/hive/warehouse/student
  store type: TEXT
  number of rows: 0
  volume: 0 B
  Options:
    'text.null'='\\N'
    'transient_lastDdlTime'='1438756422'
    'text.delimiter'='|'

  schema:
  id INT4
  name TEXT
  grade TEXT

  Partitions:
  type:COLUMN
  columns::default.student.country (TEXT), default.student.city (TEXT)



================================================
How to add data to partition table
================================================

In Tajo, you can add data dynamically to a Hive partition table with both ``INSERT INTO ... SELECT`` and ``CREATE TABLE AS SELECT (CTAS)`` statements. Tajo will automatically filter the data to HiveMetastore, create directories and move the filtered data to the appropriate directory on the distributed file system

Review comment (Contributor):
'.' is missed at the end of last statement.
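
A minimal sketch of such a load, assuming the ``student`` table created above and a ``student_source`` table with ``id``, ``name``, ``grade``, ``country`` and ``city`` columns that is visible to Tajo (this file does not define it):

.. code-block:: sql

  default> INSERT OVERWRITE INTO student
    SELECT id, name, grade, country, city
    FROM student_source;

Afterwards, Hive's ``SHOW PARTITIONS`` statement could be used to check that the partitions were registered in the metastore:

.. code-block:: sql

  hive> SHOW PARTITIONS student;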



15 changes: 0 additions & 15 deletions tajo-docs/src/main/sphinx/partitioning/intro_to_partitioning.rst

This file was deleted.

5 changes: 0 additions & 5 deletions tajo-docs/src/main/sphinx/partitioning/range_partitioning.rst

This file was deleted.
