This initialization action installs Starburst Presto 312-e (https://www.starburstdata.com) on a Google Cloud Dataproc cluster. Additionally, this script will configure Presto to work with Hive on the cluster. The master Cloud Dataproc node will be the coordinator and all Cloud Dataproc workers will be Presto workers.
Using this initialization action
You can use this initialization action to create a new Dataproc cluster with Presto installed:
Use the gcloud command to create a new cluster with this initialization action. The following command will create a new cluster named <CLUSTER_NAME>:
    gcloud dataproc clusters create <CLUSTER_NAME> \
        --initialization-actions gs://$MY_BUCKET/starburst-presto/presto.sh
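The command above expects the script to already be in a bucket you control. A minimal sketch of staging it first, assuming a local copy named presto.sh in the current directory and a MY_BUCKET variable holding your bucket name; the function name create_presto_cluster is hypothetical and not part of this repository:

```shell
#!/usr/bin/env bash
# Sketch only: create_presto_cluster is a hypothetical helper name.
# Assumes presto.sh is in the current directory and MY_BUCKET is set.
create_presto_cluster() {
  local cluster="$1"
  local script_uri="gs://${MY_BUCKET:?set MY_BUCKET to your bucket name}/starburst-presto/presto.sh"

  # Copy the initialization action to Cloud Storage.
  gsutil cp presto.sh "${script_uri}" || return 1

  # Create the cluster; Dataproc runs the script on each node at startup.
  gcloud dataproc clusters create "${cluster}" \
    --initialization-actions "${script_uri}"
}
```

Usage: `MY_BUCKET=my-bucket create_presto_cluster my-cluster`. Depending on your gcloud version and configuration, you may also need to pass `--region`.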
Once the cluster has been created, Presto is configured to run on port 8080 (though you can change this in the script) on the master node of the Cloud Dataproc cluster. To connect to the Presto web interface, you will need to create an SSH tunnel and use a SOCKS5 proxy as described in the Dataproc web interfaces documentation. You can also use the Presto command line interface by running the presto command on the master node.
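Both access paths can be sketched as small shell helpers. This assumes a standard (non-HA) cluster, whose master node is named <CLUSTER_NAME>-m, and a local SOCKS port of 1080; the function names, the zone argument, and the hive/default catalog and schema are illustrative assumptions, not something the script defines:

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical helpers.

# Open a SOCKS5 proxy on localhost (default port 1080) through the master node.
# -D enables dynamic port forwarding; -N skips opening a remote shell.
open_presto_tunnel() {
  local cluster="$1" zone="$2" port="${3:-1080}"
  gcloud compute ssh "${cluster}-m" --zone="${zone}" -- -D "${port}" -N
}

# Run a single query with the Presto CLI on the master node.
# printf %q escapes the query so it survives the remote shell.
run_presto_query() {
  local cluster="$1" zone="$2" query="$3"
  gcloud compute ssh "${cluster}-m" --zone="${zone}" \
    --command="presto --catalog hive --schema default --execute $(printf '%q' "${query}")"
}
```

With the tunnel open, point a browser configured to use the proxy at socks5://localhost:1080 and browse to http://<CLUSTER_NAME>-m:8080. A query can be run with, for example, `run_presto_query my-cluster us-central1-a 'show tables;'`.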
You can find more information about using initialization actions with Dataproc in the Dataproc documentation.
- This script must be updated based on which Presto version you wish to install
- You may need to adjust the memory settings in jvm.config based on your needs
- Presto is set to use HTTP port 8080
- Only the Hive connector is configured by default
- High-Availability configuration is discouraged because the coordinator is started only on m-0 and the other master nodes sit idle
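To confirm which HTTP port the coordinator is listening on (for example, after changing it in the script), a quick check from the master node might look like the sketch below. The function name presto_info is hypothetical; /v1/info is Presto's node information REST endpoint:

```shell
#!/usr/bin/env bash
# Sketch only: presto_info is a hypothetical helper name.
# Run on the master node; the port defaults to 8080.
presto_info() {
  local port="${1:-8080}"
  # --fail makes curl exit non-zero on HTTP errors, so a closed or
  # misconfigured port shows up as a failed command.
  curl --silent --fail "http://localhost:${port}/v1/info"
}
```

A successful call prints a small JSON document describing the node; a non-zero exit status suggests Presto is not reachable on that port.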