apache · blrunner · Aug 5, 2015 · Aug 25, 2015 · Aug 31, 2015 · eminency
diff --git a/tajo-docs/src/main/sphinx/partitioning/add_data_to_partition_table.rst b/tajo-docs/src/main/sphinx/partitioning/add_data_to_partition_table.rst
@@ -0,0 +1,62 @@
+********************************
+Add data to Partition Table
+********************************
+
+Tajo provides a very useful feature of dynamic partitioning. You don't need to use any syntax with both ``INSERT INTO ... SELECT`` and ``Create Table As Select(CTAS)`` statments for dynamic partitioning. Tajo will automatically filter the data, create directories, move filtered data to appropriate directory and create partition over it.
+
+For example, assume there are both ``student_source`` and ``student`` tables composed of the following schema.
+
+.. code-block:: sql
+
+  CREATE TABLE student_source (
+    id        INT,
+    name      TEXT,
+    gender    char(1),
+    grade     TEXT,
+    country   TEXT,
+    city      TEXT,
+    phone     TEXT
+  );
+
+  CREATE TABLE student (
+    id     INT,
+    name   TEXT,
+    grade  TEXT
+  ) PARTITION BY COLUMN (country TEXT, city TEXT);
+
+
+================================================
+How to INSERT dynamically to partition table
+================================================
+
+If you want to load an entire country or an entire city in one fell swoop:
+
+.. code-block:: sql
+
+  INSERT OVERWRITE INTO student
+  SELECT id, name, grade, country, city
+  FROM   student_source;
+
+
+================================================
+How to CTAS dynamically to partition table
+================================================
+
+If you want to load an entire country or an entire city in one fell swoop:
+
+.. code-block:: sql
+
+  DROP TABLE if exists student;
+
+  CREATE TABLE student (
+    id     INT,
+    name   TEXT,
+    grade  TEXT
+  ) PARTITION BY COLUMN (country TEXT, city TEXT)
+  AS SELECT id, name, grade, country, city
+  FROM   student_source;
+
+
+.. note::
+
+  When loading data into a partition, it’s necessary to include the partition columns as the last columns in the query. The column names in the source query don’t need to match the partition column names.
diff --git a/tajo-docs/src/main/sphinx/partitioning/alter_partition.rst b/tajo-docs/src/main/sphinx/partitioning/alter_partition.rst
@@ -0,0 +1,66 @@
+********************************
+Alter partition
+********************************
+
+You can ALTER TABLE to add or drop partitions for partition table.
+
+For example, assume there is a ``student`` table composed of the following schema.
+
+.. code-block:: sql
+
+  CREATE TABLE student (
+    id     INT,
+    name   TEXT,
+    grade  TEXT
+  ) PARTITION BY COLUMN (country TEXT, city TEXT);
+
+
+========================
+ADD PARTITION
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  ALTER TABLE <table_name> [IF NOT EXISTS] ADD PARTITION (<partition column> = <partition value>, ...) [LOCATION = <partition's path>]
+
+*Description*
+
+You can use ``ALTER TABLE ADD PARTITION`` to ADD PARTITIONs to a table. The location must be a directory inside of which data files reside. If the location doesn't exist on the file system, Tajo will make the location by force. ``ADD PARTITION`` changes the table metadata, but does not load data. If the data does not exist in the partition's location, queries will not return any results. An error is thrown if the partition for the table already exists. You can use ``IF NOT EXISTS`` to skip the error.
+
+*Examples*
+
+.. code-block:: sql
+
+  -- Each ADD PARTITION clause creates a subdirectory in HDFS.
+  ALTER TABLE student ADD PARTITION (country='KOREA', city='SEOUL');
+  ALTER TABLE student ADD PARTITION (country='KOREA', city='PUSAN');
+  ALTER TABLE student ADD PARTITION (country='USA', city='NEWYORK');
+  ALTER TABLE student ADD PARTITION (country='USA', city='BOSTON');
+  -- Redirect queries, INSERT, and LOAD DATA for one partition to a specific different directory.
+  ALTER TABLE student ADD PARTITION (country='USA', city='BOSTON') LOCATION '/usr/external_data/new_years_day';
+
+
+========================
+ DROP PARTITION
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  ALTER TABLE <table_name> [IF EXISTS] DROP PARTITION (<partition column> = <partition value>, ...) [PURGE]
+
+*Description*
+
+You can use ``ALTER TABLE DROP PARTITION`` to drop a partition for a table. This doesn't remove the data for partition table. But if ``PURGE`` is specified, the partition data will be removed. The metadata is completely lost in all cases. An error is thrown if the partition for the table doesn't exists. You can use ``IF EXISTS`` to skip the error.
+
+*Examples*
+
+.. code-block:: sql
+
+  -- Delete just metadata
+  ALTER TABLE table1 DROP PARTITION (country = 'KOREA' , city = 'SEOUL');
+  -- Delete table data and metadata
+  ALTER TABLE table1 DROP PARTITION (country = 'USA', city = 'NEWYORK' ) PURGE
diff --git a/tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst b/tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst
diff --git a/tajo-docs/src/main/sphinx/partitioning/define_partition_table.rst b/tajo-docs/src/main/sphinx/partitioning/define_partition_table.rst
@@ -0,0 +1,73 @@
+*********************************
+Define Partition Table
+*********************************
+
+Tajo makes it easy to specify an automatic partition scheme when the table is created.
+
+================================================
+How to Create Partitione Table
+================================================
+
+You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
+the ``PARTITION BY COLUMN`` clause with partition keys.
+
+For example, assume there is a table ``student`` composed of the following schema.
+
+.. code-block:: sql
+
+  id     INT,
+  name   TEXT,
+  grade  TEXT
+
+Now you want to partition on country. Your Tajo definition would be this:
+
+.. code-block:: sql
+
+  CREATE TABLE student (
+    id     INT,
+    name   TEXT,
+    grade  TEXT
+  ) PARTITION BY COLUMN (country TEXT);
+
+Now your users will still query on ``WHERE country = '...'`` but the 2nd column will be the original values.
+Here's an example statement to create a table:
+
+.. code-block:: sql
+
+  CREATE TABLE student (
+    id     INT,
+    name   TEXT,
+    grade  TEXT
+  ) USING PARQUET
+  PARTITION BY COLUMN (country TEXT, city TEXT);
+
+The statement above creates the student table with id, name, grade. The table is also partitioned and data is stored in parquet files.
+
+You might have noticed that while the partitioning key columns are a part of the table DDL, they’re only listed in the ``PARTITION BY`` clause. In Tajo, as data is written to disk, each partition of data will be automatically split out into different folders, e.g. country=USA/city=NEWYORK. During a read operation, Tajo will use the folder structure to quickly locate the right partitions and also return the partitioning columns as columns in the result set.
+
+
+==================================================
+Partition Pruning on Partition Table
+==================================================
+
+The following predicates in the ``WHERE`` clause can be used to prune unqualified column partitions without processing
+during query planning phase.
+
+* ``=``
+* ``<>``
+* ``>``
+* ``<``
+* ``>=``
+* ``<=``
+* LIKE predicates with a leading wild-card character
+* IN list predicates
+
+Now above example table data is partitioned by country and city, so when the query is applied on table it can easily access the required row by the help partitions
+
+
+.. code-block:: sql
+
+  SELECT * FROM student WHERE country = 'KOREA' AND city = 'SEOUL';
+  SELECT * FROM student WHERE country = 'USA' AND (city = 'NEWYORK' OR city = 'BOSTON');
+  SELECT * FROM student WHERE country = 'USA' AND city <> 'NEWYORK';
+
diff --git a/tajo-docs/src/main/sphinx/partitioning/hash_partitioning.rst b/tajo-docs/src/main/sphinx/partitioning/hash_partitioning.rst
diff --git a/tajo-docs/src/main/sphinx/partitioning/hive_compatibility.rst b/tajo-docs/src/main/sphinx/partitioning/hive_compatibility.rst
@@ -0,0 +1,90 @@
+********************************
+Hive Compatibility
+********************************
+
+Tajo provides HiveCatalogStore to process the Hive partitioned tables directly. If you wish to use HiveCatalogStore, you should specify hive configurations to both tajo-env.sh file and catalog-site.xml file. Please see the following page.
+
+.. toctree::
+    :maxdepth: 1
+
+   /hive_integration
+
+================================================
+How to create partition table
+================================================
+
+If you want to create a partition table as follows in Tajo:
+
+.. code-block:: sql
+
+  default> CREATE TABLE student (
+    id     INT,
+    name   TEXT,
+    grade  TEXT
+  ) PARTITION BY COLUMN (country TEXT, city TEXT);
+
+
+And then you can get table information in Hive:
+
+.. code-block:: sql
+
+  hive> desc student;
+  OK
+  id                  	int
+  name                	string
+  grade               	string
+  country             	string
+  city                	string
+
+  # Partition Information
+  # col_name            	data_type           	comment
+
+  country             	string
+  city                	string
+
+
+Or as you create the table in Hive:
+
+.. code-block:: sql
+
+  hive > CREATE TABLE student (
+    id int,
+    name string,
+    grade string
+  ) PARTITIONED BY (country string, city string)
+  ROW FORMAT DELIMITED
+    FIELDS TERMINATED BY '|' ;
+
+You will see table information in Tajo:
+
+.. code-block:: sql
+
+  default> \d student;
+  table name: default.student
+  table uri: hdfs://your_hdfs_namespace/user/hive/warehouse/student
+  store type: TEXT
+  number of rows: 0
+  volume: 0 B
+  Options:
+    'text.null'='\\N'
+    'transient_lastDdlTime'='1438756422'
+    'text.delimiter'='|'
+
+  schema:
+  id	INT4
+  name	TEXT
+  grade	TEXT
+
+  Partitions:
+  type:COLUMN
+  columns::default.student.country (TEXT), default.student.city (TEXT)
+
+
+
+================================================
+How to add data to partition table
+================================================
+
+In Tajo, you can add data dynamically to partition table of Hive with both ``INSERT INTO ... SELECT`` and ``Create Table As Select (CTAS)`` statments. Tajo will automatically filter the data to HiveMetastore, create directories and move filtered data to appropriate directory on the distributed file system
+
+
diff --git a/tajo-docs/src/main/sphinx/partitioning/intro_to_partitioning.rst b/tajo-docs/src/main/sphinx/partitioning/intro_to_partitioning.rst
diff --git a/tajo-docs/src/main/sphinx/partitioning/range_partitioning.rst b/tajo-docs/src/main/sphinx/partitioning/range_partitioning.rst