# Custom Functions

Apart from the SQL functions that are already implemented in `dask-sql`, it is possible to add custom functions and aggregations.
Have a look at [the documentation](https://dask-sql.readthedocs.io/en/latest/pages/custom.html) for more information.

In [None]:
import numpy as np
import dask.dataframe as dd
import dask.datasets
from dask_sql.context import Context

We use some generated test data for the notebook:

In [None]:
c = Context()
# Allows us to use the %%sql magic function
c.ipython_magic()

df = dask.datasets.timeseries().reset_index().persist()
c.create_table("timeseries", df)

As a first step, we create a scalar function that calculates the absolute value of a column.
(The same result can also be obtained with the built-in SQL function `ABS`.)

In [None]:
# The input to the function will be a dask series
def my_abs(x):
    return x.abs()

# SQL is a typed language, so we need to specify the types of all parameters and of the return value
c.register_function(my_abs, "MY_ABS", parameters=[("x", np.float64)], return_type=np.float64)
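Because the function receives a whole series rather than a single value, its body has to be vectorized. A quick sanity check before registering it is to call it on a small pandas Series, which supports the same `.abs()` method as a dask series. This is a minimal sketch with made-up sample values; `my_abs` is repeated so the cell runs on its own.

```python
import numpy as np
import pandas as pd

def my_abs(x):
    # Same function as above: operates on the whole series at once
    return x.abs()

# Hypothetical sample values, only for checking the function locally
sample = pd.Series([-1.5, 0.0, 2.25], dtype=np.float64)
result = my_abs(sample)
print(result.tolist())  # [1.5, 0.0, 2.25]
```

The result keeps the `float64` dtype, which matches the `return_type` declared at registration.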

We can now use the new function in any query:

In [None]:
%%sql
    SELECT
        x, y, MY_ABS(x) AS "abs_x", MY_ABS(y) AS "abs_y"
    FROM
        "timeseries"
    WHERE
        MY_ABS(x * y) > 0.5
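For comparison, here is a sketch of what the query computes, written with the pandas DataFrame API on a tiny hand-made frame. The values in `sample_df` are hypothetical stand-ins for the `timeseries` table, and the sketch assumes only that `MY_ABS` behaves like `my_abs` above.

```python
import pandas as pd

def my_abs(x):
    # Vectorized absolute value, as registered under the name MY_ABS
    return x.abs()

# Hypothetical rows standing in for the "timeseries" table
sample_df = pd.DataFrame({"x": [0.9, -0.2, -0.8], "y": [0.9, 0.5, -0.9]})

# WHERE MY_ABS(x * y) > 0.5
filtered = sample_df[my_abs(sample_df.x * sample_df.y) > 0.5]

# SELECT x, y, MY_ABS(x) AS "abs_x", MY_ABS(y) AS "abs_y"
result = filtered.assign(abs_x=my_abs(filtered.x), abs_y=my_abs(filtered.y))[
    ["x", "y", "abs_x", "abs_y"]
]
print(result)
```

Only the first and last rows survive the filter, since `|x * y|` is 0.81 and 0.72 there but 0.1 for the middle row.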

Next, we register an aggregation, which takes a column as input and returns a single value (one per group when used with `GROUP BY`).
An aggregation needs to be an instance of `dask.dataframe.Aggregation` (see the [dask documentation](https://docs.dask.org/en/latest/dataframe-groupby.html#aggregate)).

In [None]:
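# dd.Aggregation(name, chunk, agg): `chunk` reduces the groups within each partition,
# `agg` combines the per-partition results. For a sum, both steps are a plain sum.
my_sum = dd.Aggregation("MY_SUM", lambda x: x.sum(), lambda x: x.sum())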

c.register_aggregation(my_sum, "MY_SUM", [("x", np.float64)], np.float64)
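To make the roles of the two lambdas concrete, the following plain-Python sketch models the two-stage reduction on two hypothetical partitions: a `chunk` step reduces each partition per group, and an `agg` step combines the partial results. It is a simplified model, not how dask implements it internally, and all function names are hypothetical. The second half illustrates why `dd.Aggregation` also accepts an optional `finalize` step: a mean cannot be combined from partial means, so each partition emits `(count, sum)` and the division happens once at the end.

```python
from collections import defaultdict

# Hypothetical (name, x) rows, already split into two partitions
partitions = [
    [("Alice", 1.0), ("Bob", 2.0), ("Alice", 3.0)],
    [("Bob", 6.0), ("Alice", 5.0)],
]

def sum_chunk(rows):
    # "chunk": per-group partial sum inside one partition
    partial = defaultdict(float)
    for name, x in rows:
        partial[name] += x
    return partial

def sum_agg(partials):
    # "agg": add up the partial sums of all partitions
    total = defaultdict(float)
    for partial in partials:
        for name, value in partial.items():
            total[name] += value
    return dict(total)

sum_result = sum_agg([sum_chunk(p) for p in partitions])
print(sum_result)  # {'Alice': 9.0, 'Bob': 8.0}

def mean_chunk(rows):
    # "chunk": per-group [count, sum], since partial means cannot be averaged
    stats = defaultdict(lambda: [0, 0.0])
    for name, x in rows:
        stats[name][0] += 1
        stats[name][1] += x
    return stats

def mean_agg(partials):
    # "agg": add up counts and sums across partitions
    total = defaultdict(lambda: [0, 0.0])
    for partial in partials:
        for name, (count, s) in partial.items():
            total[name][0] += count
            total[name][1] += s
    return total

def mean_finalize(total):
    # "finalize": divide once, after all partitions are combined
    return {name: s / count for name, (count, s) in total.items()}

mean_result = mean_finalize(mean_agg([mean_chunk(p) for p in partitions]))
print(mean_result)  # {'Alice': 3.0, 'Bob': 4.0}
```

Averaging the two per-partition means for `Alice` instead, (2.0 + 5.0) / 2 = 3.5, would give the wrong answer.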

In [None]:
%%sql
    SELECT
        name, MY_SUM(x) AS "my_sum"
    FROM
        "timeseries"
    GROUP BY
        name
    LIMIT 10
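As a cross-check, the same grouping can be expressed with the pandas DataFrame API. The rows in `sample_df` are hypothetical. Note that the SQL query has no `ORDER BY`, so which groups `LIMIT 10` returns is not guaranteed, whereas pandas sorts the group keys by default.

```python
import pandas as pd

# Hypothetical rows standing in for the "timeseries" table
sample_df = pd.DataFrame(
    {"name": ["Bob", "Alice", "Bob", "Alice"], "x": [0.5, -1.0, 0.25, 2.0]}
)

# Equivalent of: SELECT name, MY_SUM(x) AS "my_sum" FROM "timeseries" GROUP BY name LIMIT 10
result = sample_df.groupby("name")["x"].sum().rename("my_sum").reset_index().head(10)
print(result)
```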