Switch branches/tags
Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


This component of Prestoclient is deprecated and no longer maintained!

Please use the DBI client: https://github.com/prestodb/RPresto

PrestoClient for R

PrestoClient implements a method to communicate with a Presto server. Presto (http://prestodb.io/) is a fast query engine developed by Facebook that runs distributed queries against a (cluster of) Hadoop HDFS servers (http://hadoop.apache.org/).\cr
Presto uses SQL as its query language. Presto is an alternative for Hadoop-Hive.\cr
PrestoClient was developed and tested on Presto 0.54. R version used: 3.0.2, RCurl: 1.95-4.1, jsonlite: 0.9.1

You can use this class with this sample code:

	sql <- "SHOW TABLES"

	# Replace localhost with ip address or dns name of the Presto server running the discovery service
	pc <- PrestoClient("localhost")

	if (!pc$startquery(sql) ) {
	  print(pc$getlasterrormessage() )
	} else {
	  pc$waituntilfinished(TRUE) # Remove True parameter to skip printing status messages
	  if (pc$getstatus() == "FAILED") {
	    print(pc$getlasterrormessage() )
	  # We're done now, so let's show the results

Presto client protocol
The communication protocol used between Presto clients and servers is not documented yet. It seems to
be as follows:

Client sends http POST request to the Presto server, page: "/v1/statement". Header information should
include: X-Presto-Catalog, X-Presto-Source, X-Presto-Schema, User-Agent, X-Presto-User. The body of the
request should contain the sql statement. The server responds by returning JSON data (http status-code 200).
This reply may contain up to 3 uri's. One giving the link to get more information about the query execution
('infoUri'), another giving the link to fetch the next packet of data ('nextUri') and one with the uri to
cancel the query ('partialCancelUri').

The client should send GET requests to the server (Header: X-Presto-Source, User-Agent, X-Presto-User.
Body: empty) following the 'nextUri' link from the previous response until the servers response does not
contain an 'nextUri' link anymore. When there is no 'nextUri' the query is finished. If the last response
from the server included an error section ('error') the query failed, otherwise the query succeeded. If
the http status of the server response is anything other than 200 with Content-Type application/json, the
query should also be considered failed. A 503 http response means that the server is (too) busy. Retry the
request after waiting at least 50ms.
The server response may contain a 'state' variable. This is for informational purposes only (may be subject
to change in future implementations).
Each response by the server to a 'nextUri' may contain information about the columns returned by the query
and all- or part of the querydata. If the response contains a data section the columns section will always
be available.

The server reponse may contain a variable with the uri to cancel the query ('partialCancelUri'). The client
may issue a DELETE request to the server using this link. Response http status-code is 204.

The Presto server will retain information about finished queries for 15 minutes. When a client does not
respond to the server (by following the 'nextUri' links) the server will cancel these 'dead' queries after
5 minutes. These timeouts are hardcoded in the Presto server source code.

- The R version of PrestoClient is quite slow, due to the cbind operations. May be useful to switch to the
  C version in the future.
- Enable PrestoClient to handle multiple running queries simultaneously. Currently you can only run one
  query per instance of this class.
- Add support for https connections
- Add support for insert/update queries (if and when Presto server supports this).

Source code is available through: https://github.com/easydatawarehousing/prestoclient

Additional information may be found here: http://www.easydatawarehousing.com/tag/presto/

Copyright 2013 Ivo Herweijer | easydatawarehousing.com

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at:


Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
See the License for the specific language governing permissions and
limitations under the License.