[Carbondata-3173] Add the hive/presto documents index to the root of file ReadMe.md

Add the documents, merge the Presto documents into the docs folder, and modify the related links.
This helps users find the relevant integration documents.

This closes #3015
BeyondYourself authored and xubo245 committed Dec 28, 2018
1 parent e8cf14a commit d85d543
Showing 5 changed files with 157 additions and 140 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -67,6 +67,10 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com
* [Carbon as Spark's Datasource](https://github.com/apache/carbondata/blob/master/docs/carbon-as-spark-datasource-guide.md)
* [FAQs](https://github.com/apache/carbondata/blob/master/docs/faq.md)

## Integration
* [Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md)
* [Presto](https://github.com/apache/carbondata/blob/master/docs/presto-guide.md)

## Other Technical Material
* [Apache CarbonData meetup material](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609)
* [Use Case Articles](https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Articles)
2 changes: 1 addition & 1 deletion docs/documentation.md
@@ -37,7 +37,7 @@ Apache CarbonData is a new big data file format for faster interactive query usi

## Integration

CarbonData can be integrated with popular Execution engines like [Spark](./quick-start-guide.md#spark) and [Presto](./quick-start-guide.md#presto).Refer to the [Installation and Configuration](./quick-start-guide.md#integration) section to understand all modes of Integrating CarbonData.
CarbonData can be integrated with popular execution engines like [Spark](./quick-start-guide.md#spark), [Presto](./quick-start-guide.md#presto) and [Hive](./quick-start-guide.md#hive). Refer to the [Installation and Configuration](./quick-start-guide.md#integration) section to understand all modes of integrating CarbonData.



150 changes: 147 additions & 3 deletions integration/presto/README.md → docs/presto-guide.md
@@ -15,7 +15,153 @@
limitations under the License.
-->

Please follow the below steps to query carbondata in presto

# Presto guide
This tutorial provides a quick introduction to using the current integration/presto module.


[Presto Multinode Cluster Setup for Carbondata](#presto-multinode-cluster-setup-for-carbondata)

[Presto Single Node Setup for Carbondata](#presto-single-node-setup-for-carbondata)

## Presto Multinode Cluster Setup for Carbondata
### Installing Presto

1. Download the 0.210 version of Presto using:
`wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.210/presto-server-0.210.tar.gz`

2. Extract the Presto tar file: `tar zxvf presto-server-0.210.tar.gz`.

3. Download the Presto CLI for the coordinator and rename it to `presto`:

```
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.210/presto-cli-0.210-executable.jar
mv presto-cli-0.210-executable.jar presto
chmod +x presto
```

### Create Configuration Files

1. Create an `etc` folder in the `presto-server-0.210` directory.
2. Create `config.properties`, `jvm.config`, `log.properties`, and `node.properties` files.
3. Install `uuid` to generate a `node.id`:

```
sudo apt-get install uuid
uuid
```


##### Contents of your node.properties file

```
node.environment=production
node.id=<generated uuid>
node.data-dir=/home/ubuntu/data
```

##### Contents of your jvm.config file

```
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
```

##### Contents of your log.properties file
```
com.facebook.presto=INFO
```

The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`.

### Coordinator Configurations

##### Contents of your config.properties
```
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8086
query.max-memory=5GB
query.max-total-memory-per-node=5GB
query.max-memory-per-node=3GB
memory.heap-headroom-per-node=1GB
discovery-server.enabled=true
discovery.uri=<coordinator_ip>:8086
```
The option `coordinator=true` indicates that the node is the coordinator, and `node-scheduler.include-coordinator=false` tells the coordinator not to do any of the computation work itself and to use the workers instead.

**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`.

The two memory properties should also be related as follows:
if `query.max-memory-per-node=30GB`,
then `query.max-memory=<30GB * number of nodes>`.
For example, on a four-node cluster this would give `query.max-memory=120GB`.

### Worker Configurations

##### Contents of your config.properties

```
coordinator=false
http-server.http.port=8086
query.max-memory=5GB
query.max-memory-per-node=2GB
discovery.uri=<coordinator_ip>:8086
```

**Note**: The `jvm.config` and `node.properties` files are the same for all the nodes (workers and coordinator), except that each node must have a different `node.id`.

### Catalog Configurations

1. Create a folder named `catalog` in the `etc` directory of Presto on all the nodes of the cluster, including the coordinator.

##### Configuring Carbondata in Presto
1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes, as sketched below.
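
For illustration only, a minimal `carbondata.properties` might look like the following; the exact property set depends on the CarbonData and Presto versions in use, and any host, port, or path shown here is a placeholder for your environment.

```
# Illustrative only -- the properties required by the CarbonData
# connector depend on the CarbonData/Presto versions in use.
connector.name=carbondata
# Depending on the version, a CarbonData store location or a Hive
# metastore URI is typically also required, for example:
# hive.metastore.uri=thrift://<metastore_host>:<metastore_port>
```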

### Add Plugins

1. Create a directory named `carbondata` in the `plugin` directory of Presto.
2. Copy the CarbonData jars to the `plugin/carbondata` directory on all nodes; a sketch follows.
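
A possible sequence on each node is sketched below; the source path assumes the CarbonData Presto integration jars were built locally under `integration/presto/target`, which may differ in your setup.

```
# Illustrative paths only -- adjust to your Presto installation
# and to wherever your CarbonData build places the Presto jars.
mkdir -p presto-server-0.210/plugin/carbondata
cp integration/presto/target/carbondata-presto-*/*.jar \
   presto-server-0.210/plugin/carbondata/
```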

### Start Presto Server on all nodes

To run the server as a background process:

```
./presto-server-0.210/bin/launcher start
```

To run it in the foreground:

```
./presto-server-0.210/bin/launcher run
```

### Start Presto CLI
```
./presto
```
To connect to the `carbondata` catalog, use the following command:

```
./presto --server <coordinator_ip>:8086 --catalog carbondata --schema <schema_name>
```
Execute the following command to ensure the workers are connected.

```
select * from system.runtime.nodes;
```
Now you can use the Presto CLI on the coordinator to query data sources in the catalog using the Presto workers.
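
For example, assuming tables have already been created and loaded through Spark (as described later in this guide), you could run queries such as the following; `carbon_table` is only an illustrative table name.

```
show tables;
select * from carbon_table limit 10;
```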



## Presto Single Node Setup for Carbondata

### Configure the Presto server
* Download the Presto server (0.210 is suggested and supported): https://repo1.maven.org/maven2/com/facebook/presto/presto-server/
@@ -144,5 +290,3 @@ carbondata files.
```
Replace the hostname, port and schema name with your own.



6 changes: 5 additions & 1 deletion docs/quick-start-guide.md
@@ -35,7 +35,7 @@ This tutorial provides a quick introduction to using CarbonData. To follow along

## Integration

CarbonData can be integrated with Spark and Presto Execution Engines. The below documentation guides on Installing and Configuring with these execution engines.
CarbonData can be integrated with the Spark, Presto and Hive execution engines. The documentation below guides you through installing and configuring CarbonData with these execution engines.

### Spark

@@ -51,6 +51,9 @@ CarbonData can be integrated with Spark and Presto Execution Engines. The below
### Presto
[Installing and Configuring CarbonData on Presto](#installing-and-configuring-carbondata-on-presto)

### Hive
[Installing and Configuring CarbonData on Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md)



## Installing and Configuring CarbonData to run locally with Spark Shell
@@ -473,3 +476,4 @@ select * from carbon_table;

**Note:** Tables should be created and data loaded before executing queries, as carbon tables cannot be created from this interface.

```
135 changes: 0 additions & 135 deletions integration/presto/Presto_Cluster_Setup_For_Carbondata.md

This file was deleted.
