# dask-sql Introduction

`dask-sql` lets you query your Dask data using standard SQL.
You can find more information on its usage in the [documentation](https://dask-sql.readthedocs.io/).

In [None]:
from dask_sql import Context
from dask.datasets import timeseries
from dask.distributed import Client

As a first step, we create a Dask client that connects to a local Dask cluster (which is started implicitly).
You can open the dashboard by clicking the link shown in the output (in Binder, it is already open on the left).

In [None]:
client = Client()
client

Next, we create a context to hold the registered tables.
You typically only do this once in your application.

In [None]:
c = Context()

Load the data and register it in the context under a table name.
In this example, we generate random data.
It is also possible to load data from files, S3, HDFS, etc.
See [Data Loading](https://dask-sql.readthedocs.io/en/latest/pages/data_input.html) for more information.

In [None]:
df = timeseries()
c.create_table("timeseries", df)

Now execute a SQL query.
The result is a Dask dataframe, so nothing is computed until you call `compute()` on it.

The query finds, for each name, the id of the row with the highest x (this is just random test data, but you could think of it as looking for outliers in sensor data).

In [None]:
result = c.sql("""
    SELECT
        lhs.name,
        lhs.id,
        lhs.x
    FROM
        timeseries AS lhs
    JOIN
        (
            SELECT
                name AS max_name,
                MAX(x) AS max_x
            FROM timeseries
            GROUP BY name
        ) AS rhs
    ON
        lhs.name = rhs.max_name AND
        lhs.x = rhs.max_x
""")

Now we can show the result:

In [None]:
result.compute()

... or use it in any other Dask calculation:

In [None]:
result.x.mean().compute()