## Connecting to Presto

The three mandatory arguments to create a connection are host, port, and user. Optional arguments such as source identify the origin of a query. A common use is to record which service, tool, or script sent the query.

Let's create a connection:

In [1]:
import prestodb.dbapi as presto

conn = presto.Connection(host="presto", port=8080, user="demo")
cur = conn.cursor()
cur

<prestodb.dbapi.Cursor at 0x7f1944401460>
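The `source` argument mentioned above can be passed alongside the mandatory ones. The following is a minimal sketch, not part of the original notebook: the helper names `presto_connection_kwargs` and `connect_tagged` and the source value `"salary-report"` are hypothetical, and the host, port, and user defaults simply mirror the cell above.

```python
def presto_connection_kwargs(service_name, host="presto", port=8080, user="demo"):
    """Build keyword arguments for a connection whose queries are tagged with `source`.

    `service_name` is a free-form label chosen by the caller (hypothetical here).
    """
    return {"host": host, "port": port, "user": user, "source": service_name}


def connect_tagged(service_name):
    """Open a connection tagged with `service_name` (requires the prestodb client)."""
    # Imported lazily so the kwargs helper works even without the client installed.
    import prestodb.dbapi as presto

    return presto.Connection(**presto_connection_kwargs(service_name))


# Inspect the arguments that would be sent for a hypothetical reporting job.
print(presto_connection_kwargs("salary-report"))
```

Calling `connect_tagged("salary-report").cursor()` would then give a cursor whose queries carry that label, which can make them easier to attribute when reviewing query history.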

## Create a View/Virtual Dataset

Presto supports the creation of views. View definitions are persisted in the Hive metastore.

The employees database stores each employee's salaries and the dates for which they held each salary.

Below we create a view that shows each employee's current salary. Note that we assume an employee's maximum salary is their current salary.


In [4]:
cur.execute("CREATE OR REPLACE VIEW hive.default.current_salaries AS SELECT emp_no, MAX(salary) AS salary FROM mysql.employees.salaries GROUP BY emp_no")
cur.fetchall()

[[True]]
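The assumption that the maximum salary is the current one fails if a salary ever decreases. The sketch below illustrates the difference on a tiny in-memory SQLite table standing in for `mysql.employees.salaries`. The `from_date` and `to_date` column names and the sample rows are assumptions made for illustration, not data from the notebook.

```python
import sqlite3

# In-memory stand-in for the salaries table; date columns and rows are assumed.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE salaries (emp_no INTEGER, salary INTEGER, from_date TEXT, to_date TEXT)"
)
db.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?, ?)",
    [
        (1, 60000, "2000-01-01", "2001-01-01"),
        (1, 65000, "2001-01-01", "2002-01-01"),
        (1, 62000, "2002-01-01", "9999-01-01"),  # pay cut: latest row is not the maximum
    ],
)

# Same logic as the view: take the highest salary per employee.
max_salary = db.execute(
    "SELECT emp_no, MAX(salary) FROM salaries GROUP BY emp_no"
).fetchall()

# Alternative: take the salary from the row with the most recent start date.
latest_salary = db.execute(
    """
    SELECT s.emp_no, s.salary
    FROM salaries AS s
    JOIN (SELECT emp_no, MAX(from_date) AS from_date
          FROM salaries GROUP BY emp_no) AS latest
      ON s.emp_no = latest.emp_no AND s.from_date = latest.from_date
    """
).fetchall()

print(max_salary)     # [(1, 65000)]
print(latest_salary)  # [(1, 62000)]
```

If salaries in the real data only ever increase, both queries agree and the simpler `MAX(salary)` view is sufficient.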

## Querying a View

We can now use Presto to query the view we have created. Note that the view does not exist in the underlying data store (in this case MySQL); only the view's definition exists, stored in Hive. Presto does the rest.


In [6]:
cur.execute("SELECT * FROM hive.default.current_salaries LIMIT 5")
rows = cur.fetchall()

import pandas as pd
from IPython.display import display

df = pd.DataFrame(rows)
display(df)

|   | 0     | 1     |
|---|-------|-------|
| 0 | 10055 | 90843 |
| 1 | 10082 | 48935 |
| 2 | 10085 | 60910 |
| 3 | 10088 | 98003 |
| 4 | 10128 | 67619 |
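The result above has numeric column labels because the DataFrame is built from bare rows. A minimal sketch of attaching column names follows, assuming the cursor exposes the standard DB-API `description` attribute once rows have been fetched. The helper `fetch_records` is hypothetical, and an in-memory SQLite cursor stands in for the Presto cursor so the example runs without a cluster.

```python
import sqlite3


def fetch_records(cursor):
    """Fetch all rows from an executed DB-API cursor as a list of dicts."""
    rows = cursor.fetchall()
    # Each description entry is a sequence whose first item is the column name.
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in rows]


# Stand-in cursor; with Presto this would be `cur` after `cur.execute(...)`.
conn_demo = sqlite3.connect(":memory:")
demo = conn_demo.cursor()
demo.execute("SELECT 10055 AS emp_no, 90843 AS salary")
records = fetch_records(demo)
print(records)  # [{'emp_no': 10055, 'salary': 90843}]
```

Passing such a list of dicts to `pd.DataFrame(records)` yields columns labelled `emp_no` and `salary` instead of `0` and `1`.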
