[Carbondata-3173] Add the hive/presto documents index to the root of file ReadMe.md

Add the documents, merge the Presto documents into the docs folder, and modify the related links.
This helps users find the relevant integration documents.

This closes #3015
BeyondYourself authored and xubo245 committed Dec 28, 2018
1 parent e8cf14a commit d85d543
Showing 5 changed files with 157 additions and 140 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -67,6 +67,10 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com
* [Carbon as Spark's Datasource](https://github.com/apache/carbondata/blob/master/docs/carbon-as-spark-datasource-guide.md)
* [FAQs](https://github.com/apache/carbondata/blob/master/docs/faq.md)

## Integration
* [Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md)
* [Presto](https://github.com/apache/carbondata/blob/master/docs/presto-guide.md)

## Other Technical Material
* [Apache CarbonData meetup material](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609)
* [Use Case Articles](https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Articles)
2 changes: 1 addition & 1 deletion docs/documentation.md
@@ -37,7 +37,7 @@ Apache CarbonData is a new big data file format for faster interactive query usi

## Integration

CarbonData can be integrated with popular Execution engines like [Spark](./quick-start-guide.md#spark) and [Presto](./quick-start-guide.md#presto).Refer to the [Installation and Configuration](./quick-start-guide.md#integration) section to understand all modes of Integrating CarbonData.
CarbonData can be integrated with popular execution engines like [Spark](./quick-start-guide.md#spark), [Presto](./quick-start-guide.md#presto) and [Hive](./quick-start-guide.md#hive). Refer to the [Installation and Configuration](./quick-start-guide.md#integration) section to understand all modes of integrating CarbonData.



150 changes: 147 additions & 3 deletions integration/presto/README.md → docs/presto-guide.md
@@ -15,7 +15,153 @@
limitations under the License.
-->

Please follow the below steps to query carbondata in presto

# Presto guide
This tutorial provides a quick introduction to using the current integration/presto module.


[Presto Multinode Cluster Setup for Carbondata](#presto-multinode-cluster-setup-for-carbondata)

[Presto Single Node Setup for Carbondata](#presto-single-node-setup-for-carbondata)

## Presto Multinode Cluster Setup for Carbondata
### Installing Presto

1. Download the 0.210 version of Presto using:
`wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.210/presto-server-0.210.tar.gz`

2. Extract the Presto tar file: `tar zxvf presto-server-0.210.tar.gz`.

3. Download the Presto CLI for the coordinator and rename it to `presto`:

```
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.210/presto-cli-0.210-executable.jar
mv presto-cli-0.210-executable.jar presto
chmod +x presto
```

### Create Configuration Files

1. Create an `etc` folder in the `presto-server-0.210` directory.
2. Create `config.properties`, `jvm.config`, `log.properties`, and `node.properties` files.
3. Install `uuid` to generate a `node.id`:

```
sudo apt-get install uuid
uuid
```


##### Contents of your node.properties file

```
node.environment=production
node.id=<generated uuid>
node.data-dir=/home/ubuntu/data
```

##### Contents of your jvm.config file

```
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
```

##### Contents of your log.properties file
```
com.facebook.presto=INFO
```

The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`.

### Coordinator Configurations

##### Contents of your config.properties
```
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8086
query.max-memory=5GB
query.max-total-memory-per-node=5GB
query.max-memory-per-node=3GB
memory.heap-headroom-per-node=1GB
discovery-server.enabled=true
discovery.uri=<coordinator_ip>:8086
```
The option `coordinator=true` indicates that the node is the coordinator, and `node-scheduler.include-coordinator=false` tells the coordinator not to do any of the computation work itself and to use the workers instead.

**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`.

The two memory properties should also be related as follows:
if `query.max-memory-per-node=30GB`,
then `query.max-memory=<30GB * number of nodes>`.
For example, on a four-node cluster this would give `query.max-memory=120GB`.

### Worker Configurations

##### Contents of your config.properties

```
coordinator=false
http-server.http.port=8086
query.max-memory=5GB
query.max-memory-per-node=2GB
discovery.uri=<coordinator_ip>:8086
```

**Note**: The `jvm.config` and `node.properties` files are the same for all the nodes (workers and coordinator), except that each node must have a different `node.id`.

### Catalog Configurations

1. Create a folder named `catalog` in the `etc` directory of Presto on all the nodes of the cluster, including the coordinator.

##### Configuring Carbondata in Presto
1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes, as sketched below.
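
For illustration only, a minimal `carbondata.properties` might look like the following; the exact property set depends on the CarbonData and Presto versions in use, and any host, port, or path shown here is a placeholder for your environment.

```
# Illustrative only -- the properties required by the CarbonData
# connector depend on the CarbonData/Presto versions in use.
connector.name=carbondata
# Depending on the version, a CarbonData store location or a Hive
# metastore URI is typically also required, for example:
# hive.metastore.uri=thrift://<metastore_host>:<metastore_port>
```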

### Add Plugins

1. Create a directory named `carbondata` in the `plugin` directory of Presto.
2. Copy the CarbonData jars to the `plugin/carbondata` directory on all nodes; a sketch follows.
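
A possible sequence on each node is sketched below; the source path assumes the CarbonData Presto integration jars were built locally under `integration/presto/target`, which may differ in your setup.

```
# Illustrative paths only -- adjust to your Presto installation
# and to wherever your CarbonData build places the Presto jars.
mkdir -p presto-server-0.210/plugin/carbondata
cp integration/presto/target/carbondata-presto-*/*.jar \
   presto-server-0.210/plugin/carbondata/
```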

### Start Presto Server on all nodes

To run the server as a background process:

```
./presto-server-0.210/bin/launcher start
```

To run it in the foreground:

```
./presto-server-0.210/bin/launcher run
```

### Start Presto CLI
```
./presto
```
To connect to the `carbondata` catalog, use the following command:

```
./presto --server <coordinator_ip>:8086 --catalog carbondata --schema <schema_name>
```
Execute the following command to ensure the workers are connected.

```
select * from system.runtime.nodes;
```
Now you can use the Presto CLI on the coordinator to query data sources in the catalog using the Presto workers.
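
For example, assuming tables have already been created and loaded through Spark (as described later in this guide), you could run queries such as the following; `carbon_table` is only an illustrative table name.

```
show tables;
select * from carbon_table limit 10;
```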



## Presto Single Node Setup for Carbondata

### Configure the Presto server
* Download the Presto server (0.210 is suggested and supported): https://repo1.maven.org/maven2/com/facebook/presto/presto-server/
@@ -144,5 +290,3 @@ carbondata files.
```
Replace the hostname, port and schema name with your own.



6 changes: 5 additions & 1 deletion docs/quick-start-guide.md
@@ -35,7 +35,7 @@ This tutorial provides a quick introduction to using CarbonData. To follow along

## Integration

CarbonData can be integrated with Spark and Presto Execution Engines. The below documentation guides on Installing and Configuring with these execution engines.
CarbonData can be integrated with the Spark, Presto and Hive execution engines. The documentation below guides you through installing and configuring CarbonData with these execution engines.

### Spark

@@ -51,6 +51,9 @@ CarbonData can be integrated with Spark and Presto Execution Engines. The below
### Presto
[Installing and Configuring CarbonData on Presto](#installing-and-configuring-carbondata-on-presto)

### Hive
[Installing and Configuring CarbonData on Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md)



## Installing and Configuring CarbonData to run locally with Spark Shell
@@ -473,3 +476,4 @@ select * from carbon_table;

**Note:** Tables should be created and data loaded before executing queries, as carbon tables cannot be created from this interface.

```
135 changes: 0 additions & 135 deletions integration/presto/Presto_Cluster_Setup_For_Carbondata.md

This file was deleted.
