TAJO-1740: Update Partition Table document #677
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| ******************************** | ||
| Add data to Partition Table | ||
| ******************************** | ||
|
|
||
| Tajo supports dynamic partitioning. You don't need any special syntax for dynamic partitioning with either ``INSERT INTO ... SELECT`` or ``CREATE TABLE AS SELECT (CTAS)`` statements. Tajo will automatically filter the data, create directories, move the filtered data to the appropriate directories, and create partitions over them. | ||
|
|
||
| For example, assume that the ``student_source`` and ``student`` tables have the following schemas. | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| CREATE TABLE student_source ( | ||
| id INT, | ||
| name TEXT, | ||
| gender CHAR(1), | ||
| grade TEXT, | ||
| country TEXT, | ||
| city TEXT, | ||
| phone TEXT | ||
| ); | ||
|
|
||
| CREATE TABLE student ( | ||
| id INT, | ||
| name TEXT, | ||
| grade TEXT | ||
| ) PARTITION BY COLUMN (country TEXT, city TEXT); | ||
|
|
||
|
|
||
| ================================================ | ||
| How to INSERT dynamically to partition table | ||
| ================================================ | ||
|
|
||
| If you want to load an entire country or an entire city in one fell swoop: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| INSERT OVERWRITE INTO student | ||
| SELECT id, name, grade, country, city | ||
| FROM student_source; | ||
|
|
||
|
|
||
| ================================================ | ||
| How to CTAS dynamically to partition table | ||
| ================================================ | ||
|
|
||
| If you want to load an entire country or an entire city in one fell swoop: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| DROP TABLE IF EXISTS student; | ||
|
|
||
| CREATE TABLE student ( | ||
| id INT, | ||
| name TEXT, | ||
| grade TEXT | ||
| ) PARTITION BY COLUMN (country TEXT, city TEXT) | ||
| AS SELECT id, name, grade, country, city | ||
| FROM student_source; | ||
|
|
||
|
|
||
| .. note:: | ||
|
|
||
| When loading data into a partition, it’s necessary to include the partition columns as the last columns in the query. The column names in the source query don’t need to match the partition column names. | ||
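|
|
||
| As a minimal sketch of the note above (the aliases ``nation`` and ``town`` are hypothetical names chosen only for illustration), the last two columns of the query below would supply the ``country`` and ``city`` partition values even though their names differ: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| -- The partition values come from the last two selected columns, regardless of their names. | ||
| INSERT OVERWRITE INTO student | ||
| SELECT id, name, grade, country AS nation, city AS town | ||
| FROM student_source; | ||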
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| ******************************** | ||
| Alter partition | ||
| ******************************** | ||
|
|
||
| You can use ``ALTER TABLE`` to add or drop partitions of a partition table. | ||
|
|
||
| For example, assume there is a ``student`` table composed of the following schema. | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| CREATE TABLE student ( | ||
| id INT, | ||
| name TEXT, | ||
| grade TEXT | ||
| ) PARTITION BY COLUMN (country TEXT, city TEXT); | ||
|
|
||
|
|
||
| ======================== | ||
| ADD PARTITION | ||
| ======================== | ||
|
|
||
| *Synopsis* | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| ALTER TABLE <table_name> [IF NOT EXISTS] ADD PARTITION (<partition column> = <partition value>, ...) [LOCATION '<partition path>'] | ||
|
|
||
| *Description* | ||
|
|
||
| You can use ``ALTER TABLE ADD PARTITION`` to add partitions to a table. The location must be a directory in which the data files reside. If the location doesn't exist on the file system, Tajo will create it. ``ADD PARTITION`` changes the table metadata, but does not load data. If no data exists in the partition's location, queries on that partition will not return any results. An error is thrown if the partition already exists for the table. You can use ``IF NOT EXISTS`` to skip the error. | ||
|
|
||
| *Examples* | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| -- Each ADD PARTITION clause creates a subdirectory in HDFS. | ||
| ALTER TABLE student ADD PARTITION (country='KOREA', city='SEOUL'); | ||
| ALTER TABLE student ADD PARTITION (country='KOREA', city='PUSAN'); | ||
| ALTER TABLE student ADD PARTITION (country='USA', city='NEWYORK'); | ||
| ALTER TABLE student ADD PARTITION (country='USA', city='BOSTON'); | ||
| -- Alternatively, store one partition's data in a specific directory. | ||
| -- This statement fails if the partition was already added, as (country='USA', city='BOSTON') was above. | ||
| ALTER TABLE student ADD PARTITION (country='USA', city='BOSTON') LOCATION '/usr/external_data/new_years_day'; | ||
|
|
||
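| Following the synopsis above, ``IF NOT EXISTS`` should let a script re-run an ``ADD PARTITION`` statement without failing. This is only a sketch that assumes the clause position shown in the synopsis; it has not been verified against a particular Tajo release: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| -- Expected to be skipped without an error if the partition was already added. | ||
| ALTER TABLE student IF NOT EXISTS ADD PARTITION (country='KOREA', city='SEOUL'); | ||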
|
|
||
| ======================== | ||
| DROP PARTITION | ||
| ======================== | ||
|
|
||
| *Synopsis* | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| ALTER TABLE <table_name> [IF EXISTS] DROP PARTITION (<partition column> = <partition value>, ...) [PURGE] | ||
|
|
||
| *Description* | ||
|
|
||
| You can use ``ALTER TABLE DROP PARTITION`` to drop a partition from a table. By default, this does not remove the partition's data; if ``PURGE`` is specified, the partition data is removed as well. The partition metadata is removed in all cases. An error is thrown if the partition doesn't exist for the table. You can use ``IF EXISTS`` to skip the error. | ||
|
|
||
| *Examples* | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| -- Delete just metadata | ||
| ALTER TABLE table1 DROP PARTITION (country = 'KOREA' , city = 'SEOUL'); | ||
| -- Delete table data and metadata | ||
| ALTER TABLE table1 DROP PARTITION (country = 'USA', city = 'NEWYORK' ) PURGE | ||
|
Contributor
';' is missed
||
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| ********************************* | ||
| Define Partition Table | ||
| ********************************* | ||
|
|
||
| Tajo makes it easy to specify an automatic partition scheme when the table is created. | ||
|
|
||
| ================================================ | ||
| How to Create Partitione Table | ||
| ================================================ | ||
|
Contributor
'Partitione' -> 'Partition'
||
|
|
||
| You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use | ||
| the ``PARTITION BY COLUMN`` clause with partition keys. | ||
|
|
||
| For example, assume there is a table ``student`` composed of the following schema. | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| id INT, | ||
| name TEXT, | ||
| grade TEXT | ||
|
|
||
| Now you want to partition on country. Your Tajo definition would be this: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| CREATE TABLE student ( | ||
| id INT, | ||
| name TEXT, | ||
| grade TEXT | ||
| ) PARTITION BY COLUMN (country TEXT); | ||
|
|
||
| Users can still filter with ``WHERE country = '...'``, just as they would on an ordinary column. | ||
| Here's an example statement that creates a table partitioned by two columns and stored as Parquet: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| CREATE TABLE student ( | ||
| id INT, | ||
| name TEXT, | ||
| grade TEXT | ||
| ) USING PARQUET | ||
| PARTITION BY COLUMN (country TEXT, city TEXT); | ||
|
|
||
| The statement above creates the ``student`` table with the ``id``, ``name``, and ``grade`` columns. The table is also partitioned by ``country`` and ``city``, and its data is stored in Parquet files. | ||
|
|
||
| You might have noticed that while the partitioning key columns are a part of the table DDL, they’re only listed in the ``PARTITION BY`` clause. In Tajo, as data is written to disk, each partition of data will be automatically split out into different folders, e.g. country=USA/city=NEWYORK. During a read operation, Tajo will use the folder structure to quickly locate the right partitions and also return the partitioning columns as columns in the result set. | ||
|
|
||
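| As a sketch of the read path described above, the query below selects the partition columns like ordinary columns. It assumes the two-column partitioned ``student`` table defined earlier: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| -- country and city are not stored in the data files; they are derived from the folder names. | ||
| SELECT id, name, country, city FROM student WHERE country = 'USA'; | ||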
|
|
||
| ================================================== | ||
| Partition Pruning on Partition Table | ||
| ================================================== | ||
|
|
||
| The following predicates in the ``WHERE`` clause can be used to prune unqualified column partitions | ||
| during the query planning phase, so that those partitions are never processed. | ||
|
|
||
| * ``=`` | ||
| * ``<>`` | ||
| * ``>`` | ||
| * ``<`` | ||
| * ``>=`` | ||
| * ``<=`` | ||
| * LIKE predicates with a leading wild-card character | ||
| * IN list predicates | ||
|
|
||
| Now above example table data is partitioned by country and city, so when the query is applied on table it can easily access the required row by the help partitions | ||
|
Contributor
'the help of partitions' looks better
||
|
|
||
|
|
||
| .. code-block:: sql | ||
|
|
||
| SELECT * FROM student WHERE country = 'KOREA' AND city = 'SEOUL'; | ||
| SELECT * FROM student WHERE country = 'USA' AND (city = 'NEWYORK' OR city = 'BOSTON'); | ||
| SELECT * FROM student WHERE country = 'USA' AND city <> 'NEWYORK'; | ||
|
|
||
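| The predicate list above also includes ``IN`` lists. As a sketch under the same ``student`` schema, the following query should only need the ``KOREA`` and ``USA`` partitions: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| -- Only partitions whose country value appears in the list should be scanned. | ||
| SELECT * FROM student WHERE country IN ('KOREA', 'USA'); | ||
|
|
||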
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| ******************************** | ||
| Hive Compatibility | ||
| ******************************** | ||
|
|
||
| Tajo provides HiveCatalogStore to process Hive partitioned tables directly. If you wish to use HiveCatalogStore, you should specify the Hive configuration in both the ``tajo-env.sh`` file and the ``catalog-site.xml`` file. Please see the following page. | ||
|
|
||
| .. toctree:: | ||
| :maxdepth: 1 | ||
|
|
||
| /hive_integration | ||
|
|
||
| ================================================ | ||
| How to create partition table | ||
| ================================================ | ||
|
|
||
| Suppose you create a partition table in Tajo as follows: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| default> CREATE TABLE student ( | ||
| id INT, | ||
| name TEXT, | ||
| grade TEXT | ||
| ) PARTITION BY COLUMN (country TEXT, city TEXT); | ||
|
|
||
|
|
||
| You can then get the table information in Hive: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| hive> desc student; | ||
| OK | ||
| id int | ||
| name string | ||
| grade string | ||
| country string | ||
| city string | ||
|
|
||
| # Partition Information | ||
| # col_name data_type comment | ||
|
|
||
| country string | ||
| city string | ||
|
|
||
|
|
||
| Conversely, if you create the table in Hive: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| hive> CREATE TABLE student ( | ||
| id int, | ||
| name string, | ||
| grade string | ||
| ) PARTITIONED BY (country string, city string) | ||
| ROW FORMAT DELIMITED | ||
| FIELDS TERMINATED BY '|' ; | ||
|
|
||
| You will see table information in Tajo: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| default> \d student; | ||
| table name: default.student | ||
| table uri: hdfs://your_hdfs_namespace/user/hive/warehouse/student | ||
| store type: TEXT | ||
| number of rows: 0 | ||
| volume: 0 B | ||
| Options: | ||
| 'text.null'='\\N' | ||
| 'transient_lastDdlTime'='1438756422' | ||
| 'text.delimiter'='|' | ||
|
|
||
| schema: | ||
| id INT4 | ||
| name TEXT | ||
| grade TEXT | ||
|
|
||
| Partitions: | ||
| type:COLUMN | ||
| columns::default.student.country (TEXT), default.student.city (TEXT) | ||
|
|
||
|
|
||
|
|
||
| ================================================ | ||
| How to add data to partition table | ||
| ================================================ | ||
|
|
||
| In Tajo, you can add data dynamically to partition table of Hive with both ``INSERT INTO ... SELECT`` and ``Create Table As Select (CTAS)`` statments. Tajo will automatically filter the data to HiveMetastore, create directories and move filtered data to appropriate directory on the distributed file system | ||
|
Contributor
'.' is missed at the end of last statement.
||
|
|
||
|
|
||
This file was deleted.
This file was deleted.
Explanation is same precisely as above one.
I think it's better to add some words like 'including CREATE TABLE' or 'when a partition table is created'