
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

<i18n value="931bf77d-810b-4930-b45c-b00c184029a0"/>


# SQL UDFs and Control Flow

Databricks added support for User Defined Functions (UDFs) registered natively in SQL starting in DBR 9.1.

This feature lets users register custom combinations of SQL logic as functions in a database, making that logic reusable anywhere SQL can run on Databricks. Because these functions are expressed directly in Spark SQL, they retain all of Spark's optimizations when applying your custom logic to large datasets.

In this notebook, we'll first introduce SQL UDFs with a simple example, and then explore how they can be combined with **`CASE`** / **`WHEN`** clauses to provide reusable custom control flow logic.

## Learning Objectives
By the end of this lesson, you should be able to:
* Define and register SQL UDFs
* Describe the security model used for sharing SQL UDFs
* Use **`CASE`** / **`WHEN`** statements in SQL code
* Leverage **`CASE`** / **`WHEN`** statements in SQL UDFs for custom control flow

<i18n value="df80ac46-fb12-44ed-bb37-dcc5a4d73d4a"/>


## Setup
Run the following cell to set up your environment.

%run ../Includes/Classroom-Setup-04.8

Python interpreter will be restarted.



Skipping install of existing datasets to "dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02"

Validating the locally installed datasets:
| listing local files...(7 seconds)
| completed (7 seconds total)

Creating & using the schema "munirsheikhcloudseekho_0lj9_da_dewd"...(0 seconds)
Predefined tables in "munirsheikhcloudseekho_0lj9_da_dewd":
| -none-

Predefined paths variables:
| DA.paths.working_dir: dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks
| DA.paths.user_db:     dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/database.db
| DA.paths.datasets:    dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02
| DA.paths.checkpoints: dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks/_checkpoints

Setup completed (9 seconds)


<i18n value="f4fec594-3cd7-43c9-b88e-3ccd3a99c6be"/>


## Create a Simple Dataset

For this notebook, we'll consider the following dataset, registered here as a temporary view.

%sql
CREATE OR REPLACE TEMPORARY VIEW foods(food) AS VALUES
("beef"),
("beans"),
("potatoes"),
("bread");

SELECT * FROM foods

food
beef
beans
potatoes
bread


<i18n value="65577a77-c917-441c-895b-8ba146c837ff"/>


## SQL UDFs
At minimum, a SQL UDF requires a function name, the type to be returned, and the custom logic that computes the result; parameters are optional.

Below, a simple function named **`yelling`** takes one parameter named **`text`**. It returns that string converted to all uppercase letters, with three exclamation points appended to the end.

%sql
CREATE OR REPLACE FUNCTION yelling(text STRING)
RETURNS STRING
RETURN concat(upper(text), "!!!")

<i18n value="4cffc92d-3133-45ba-97c8-b0bc4c9e419b"/>


When this function is called in a query, it is applied to every value of the column in parallel within the Spark processing engine. SQL UDFs are therefore an efficient way to define custom logic that is optimized for execution on Databricks.

%sql
SELECT yelling(food) FROM foods

spark_catalog.munirsheikhcloudseekho_0lj9_da_dewd.yelling(food)
BEEF!!!
BEANS!!!
POTATOES!!!
BREAD!!!
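To experiment with the same define-then-apply pattern outside Databricks, the sketch below uses Python's built-in `sqlite3` module as a local stand-in. This is an assumption for illustration only: `sqlite3`'s `create_function` registers a Python callable rather than a SQL body, and SQLite does not parallelize the call the way Spark does, so the sketch mirrors only how a registered scalar function is invoked once per row. The `yelling` name and the `foods` data are reused from above; `run_yelling_demo` is a hypothetical helper.

```python
import sqlite3


def yelling(text):
    # Mirrors the SQL body concat(upper(text), "!!!").
    # A NULL input yields NULL, matching Spark's upper/concat behavior.
    if text is None:
        return None
    return text.upper() + "!!!"


def run_yelling_demo():
    conn = sqlite3.connect(":memory:")
    # Local stand-in for the `foods` temporary view
    conn.execute("CREATE TABLE foods (food TEXT)")
    conn.executemany(
        "INSERT INTO foods (food) VALUES (?)",
        [("beef",), ("beans",), ("potatoes",), ("bread",)],
    )
    # Register a scalar function that takes exactly one argument
    conn.create_function("yelling", 1, yelling)
    rows = conn.execute("SELECT yelling(food) FROM foods ORDER BY rowid").fetchall()
    conn.close()
    return [row[0] for row in rows]


print(run_yelling_demo())  # ['BEEF!!!', 'BEANS!!!', 'POTATOES!!!', 'BREAD!!!']
```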


<i18n value="e1749d08-2186-4e1c-9214-18c8199388af"/>


## Scoping and Permissions of SQL UDFs

Note that SQL UDFs persist across execution environments, which can include notebooks, DBSQL queries, and jobs.

We can describe the function to see where it was registered, along with basic information about its expected inputs and return type.

%sql
DESCRIBE FUNCTION yelling

function_desc
Function: spark_catalog.munirsheikhcloudseekho_0lj9_da_dewd.yelling
Type: SCALAR
Input: text STRING
Returns: STRING


<i18n value="6a6eb6c6-ffc8-49d9-a39a-a5e1f6c230af"/>


Using **`DESCRIBE FUNCTION EXTENDED`**, we can get even more information.

Note that the **`Body`** field at the bottom of the function description shows the SQL logic used in the function itself.

%sql
DESCRIBE FUNCTION EXTENDED yelling

function_desc
Function: spark_catalog.munirsheikhcloudseekho_0lj9_da_dewd.yelling
Type: SCALAR
Input: text STRING
Returns: STRING
Deterministic: true
Data Access: CONTAINS SQL
Configs: spark.sql.hive.convertCTAS=true
spark.sql.legacy.createHiveTableByDefault=false
spark.sql.parquet.compression.codec=snappy
spark.sql.sources.commitProtocolClass=com.databricks.sql.transaction.directory.DirectoryAtomicCommitProtocol


<i18n value="a31a4ad1-5608-4bfb-aae4-a411fe460385"/>


SQL UDFs exist as objects in the metastore and are governed by the same Table ACLs as databases, tables, or views.

To use a SQL UDF, a user must have **`SELECT`** permission on the function and **`USAGE`** permission on the database that contains it.

<i18n value="155c70b7-ed5e-47d2-9832-963aa18f3869"/>


## CASE/WHEN

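The standard SQL **`CASE`** / **`WHEN`** construct evaluates a series of conditions in order and returns the outcome of the first condition that is true, falling back to the **`ELSE`** value if none match. This makes it possible to produce alternative outcomes based on table contents.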

Again, everything is evaluated natively in Spark, and so is optimized for parallel execution.

%sql
SELECT *,
  CASE 
    WHEN food = "beans" THEN "I love beans"
    WHEN food = "potatoes" THEN "My favorite vegetable is potatoes"
    WHEN food <> "beef" THEN concat("Do you have any good recipes for ", food, "?")
    ELSE concat("I don't eat ", food)
  END
FROM foods

food,"CASE WHEN (food = beans) THEN I love beans WHEN (food = potatoes) THEN My favorite vegetable is potatoes WHEN (NOT (food = beef)) THEN concat(Do you have any good recipes for , food, ?) ELSE concat(I don't eat , food) END"
beef,I don't eat beef
beans,I love beans
potatoes,My favorite vegetable is potatoes
bread,Do you have any good recipes for bread?
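Because the first matching **`WHEN`** wins, "beans" and "potatoes" never reach the `food <> "beef"` branch even though that condition is also true for them. The minimal sketch below reproduces the query with Python's `sqlite3` module as a local stand-in (an assumption; SQLite is not Spark SQL). The translation to SQLite syntax uses single-quoted string literals, escapes the apostrophe in "don't" by doubling it, and uses `||` in place of `concat()`. `CASE_QUERY` and `run_case_demo` are hypothetical names.

```python
import sqlite3

# Same CASE / WHEN logic as the Spark SQL cell, translated to SQLite syntax:
# single-quoted literals, '' to escape an apostrophe, || for concatenation.
CASE_QUERY = """
SELECT food,
  CASE
    WHEN food = 'beans' THEN 'I love beans'
    WHEN food = 'potatoes' THEN 'My favorite vegetable is potatoes'
    WHEN food <> 'beef' THEN 'Do you have any good recipes for ' || food || '?'
    ELSE 'I don''t eat ' || food
  END AS message
FROM foods
ORDER BY rowid
"""


def run_case_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foods (food TEXT)")
    conn.executemany(
        "INSERT INTO foods (food) VALUES (?)",
        [("beef",), ("beans",), ("potatoes",), ("bread",)],
    )
    rows = conn.execute(CASE_QUERY).fetchall()
    conn.close()
    return rows


for food, message in run_case_demo():
    print(food, "->", message)
```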


<i18n value="50bc0847-94d2-4167-befe-66e42b287ad0"/>


## Simple Control Flow Functions

Combining SQL UDFs with control flow in the form of **`CASE`** / **`WHEN`** clauses lets you package that logic as a function while it still executes natively, and therefore optimized, within SQL workloads.

Here, we demonstrate wrapping the previous logic in a function that will be reusable anywhere we can execute SQL.

%sql
CREATE OR REPLACE FUNCTION foods_i_like(food STRING)
RETURNS STRING
RETURN CASE
  WHEN food = "beans" THEN "I love beans"
  WHEN food = "potatoes" THEN "My favorite vegetable is potatoes"
  WHEN food <> "beef" THEN concat("Do you have any good recipes for ", food, "?")
  ELSE concat("I don't eat ", food)
END;

<i18n value="05cb00cc-097c-4607-8738-ab4353536dda"/>


Applying this function to our data produces the same results as the **`CASE`** / **`WHEN`** query above.

%sql
SELECT foods_i_like(food) FROM foods

spark_catalog.munirsheikhcloudseekho_0lj9_da_dewd.foods_i_like(food)
I don't eat beef
I love beans
My favorite vegetable is potatoes
Do you have any good recipes for bread?
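When the same logic needs to be checked outside Databricks, for example to pin down expected outputs in unit tests, a plain Python mirror of the `foods_i_like` **`CASE`** expression can serve as a reference. This is a hypothetical sketch, not part of any Databricks API: the `if` chain is evaluated top to bottom and returns on the first match, just as **`CASE`** does. A `None` (NULL) input returns `None` because comparisons with NULL are never true in SQL, so **`CASE`** falls through to **`ELSE`**, where `concat` with a NULL argument yields NULL.

```python
from typing import Optional


def foods_i_like(food: Optional[str]) -> Optional[str]:
    """Python mirror of the foods_i_like SQL UDF (first matching branch wins)."""
    if food is None:
        # SQL: every WHEN is unknown for NULL, ELSE concat("I don't eat ", NULL) is NULL
        return None
    if food == "beans":
        return "I love beans"
    if food == "potatoes":
        return "My favorite vegetable is potatoes"
    if food != "beef":
        return "Do you have any good recipes for " + food + "?"
    return "I don't eat " + food


# Expected values copied from the notebook output above
expected = [
    "I don't eat beef",
    "I love beans",
    "My favorite vegetable is potatoes",
    "Do you have any good recipes for bread?",
]
assert [foods_i_like(f) for f in ["beef", "beans", "potatoes", "bread"]] == expected
```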


<i18n value="24ee3267-9ddb-4cf5-9081-273502f5252a"/>


While the examples provided here are simple string functions, the same basic principles can be used to add custom computations and logic for native execution in Spark SQL.

This is especially useful for enterprises migrating users from systems with many predefined procedures or custom-defined formulas: SQL UDFs let a handful of users define the complex logic needed for common reporting and analytic queries, which other users can then reuse.

<i18n value="9405ddea-5fb0-4168-9fd2-2b462d5809d9"/>

 
Run the following cell to delete the tables and files associated with this lesson.

%python
DA.cleanup()

Resetting the learning environment:
| dropping the schema "munirsheikhcloudseekho_0lj9_da_dewd"...(0 seconds)
| removing the working directory "dbfs:/mnt/dbacademy-users/munirsheikhcloudseekho@gmail.com/data-engineering-with-databricks"...(0 seconds)

Validating the locally installed datasets:
| listing local files...(8 seconds)
| completed (8 seconds total)



&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>