
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

<i18n value="54f96b76-aa49-4769-b067-4f5232dc4b58"/>


# Just Enough Python for Databricks SQL

While Databricks SQL provides an ANSI-compliant flavor of SQL with many additional custom methods (including the entire Delta Lake SQL syntax), users migrating from some systems may run into missing features, especially around control flow and error handling.

Databricks notebooks allow users to write SQL and Python and execute logic cell-by-cell. PySpark has extensive support for executing SQL queries, and can easily exchange data with tables and temporary views.

Mastering just a handful of Python concepts will unlock powerful new design practices for engineers and analysts proficient in SQL. Rather than trying to teach the entire language, this lesson focuses on those features that can immediately be leveraged to write more extensible SQL programs on Databricks.

## Learning Objectives
By the end of this lesson, you should be able to:
* Print and manipulate multi-line Python strings
* Define variables and functions
* Use f-strings for variable substitution

<i18n value="ae07e616-51d4-4ccb-ae28-6c22b7a203e6"/>


## Strings
Characters enclosed in single (**`'`**) or double (**`"`**) quotes are considered strings.

In [0]:
"This is a string"

Out[1]: 'This is a string'
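Because both quote styles produce the same value, one style can wrap text that contains the other. The short sketch below illustrates this; the variable names are chosen only for illustration.

```python
# Single and double quotes produce the same string value.
single_quoted = 'This is a string'
double_quoted = "This is a string"
print(single_quoted == double_quoted)

# Wrapping the text in double quotes lets it contain single quotes,
# which is convenient for SQL string literals.
sql_filter = "WHERE state = 'CA'"
print(sql_filter)
```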

<i18n value="f6b947f5-3dbd-4def-ab31-fa7f3692145f"/>


To preview how a string will render, we can call **`print()`**.

In [0]:
print("This is a string")

This is a string


<i18n value="5d9d751f-e0c3-434a-9bf4-5b8741131a2b"/>


By wrapping a string in triple quotes (**`"""`**), it's possible to use multiple lines.

In [0]:
print("""
This 
is 
a 
multi-line 
string
""")


This 
is 
a 
multi-line 
string



<i18n value="4f3b441e-39a7-42b8-982c-3d27e1a4f5d5"/>


This makes it easy to turn SQL queries into Python strings.

In [0]:
print("""
SELECT *
FROM test_table
""")


SELECT *
FROM test_table
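The blank lines around the printed output above come from the string itself: the line breaks right after the opening quotes and right before the closing quotes are part of the text. A small sketch, using an illustrative variable named `query`, makes them visible with `repr()` and removes them with `strip()`.

```python
query = """
SELECT *
FROM test_table
"""

# repr() shows the raw string, including the newline characters (\n)
# at the very start and very end.
print(repr(query))

# strip() removes leading and trailing whitespace but keeps the
# line break between SELECT and FROM.
print(query.strip())
```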



<i18n value="825c71d8-26ee-4c90-910c-64b0a3d6600a"/>


When we execute SQL from a Python cell, we will pass a string as an argument to **`spark.sql()`**.

In [0]:
spark.sql("SELECT 1 AS test")

Out[5]: DataFrame[test: int]

<i18n value="86cfbbf7-5e50-4d93-99bd-794b790be9d3"/>


To render the results of a query the way they would appear in a normal SQL notebook, we call **`display()`** on the DataFrame returned by **`spark.sql()`**.

In [0]:
display(spark.sql("SELECT 1 AS test"))

test
1


<i18n value="6af577e2-3621-4871-b12b-25716defbaba"/>


**NOTE**: Executing a cell that contains only a Python string just echoes the string back as the cell's output. Calling **`print()`** on a string just renders it in the notebook. Neither one runs the SQL that the string contains.

To execute a string that contains SQL using Python, it must be passed within a call to **`spark.sql()`**.
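The difference can be sketched as follows. This assumes the Databricks notebook environment, where **`spark`** and **`display`** are predefined. The guard is an addition for this sketch so that the cell also runs, without executing any SQL, in a plain Python session.

```python
query = """
SELECT 1 AS test
"""

# Printing only shows the text of the query; nothing is executed.
print(query)

# Assumption: `spark` and `display` exist only inside a Databricks notebook.
# Outside of one, this block is skipped.
if "spark" in globals() and "display" in globals():
    display(spark.sql(query))
```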

<i18n value="6182d5b6-59e8-401f-acb1-d4d57c5b6018"/>


## Variables
Python variables are assigned using the **`=`** operator.

Python variable names need to start with a letter or an underscore, and can only contain letters, numbers, and underscores. (Variable names starting with underscores are valid but typically reserved for special use cases.)

Many Python programmers favor snake case, which uses only lowercase letters and underscores for variable names.

The cell below creates the variable **`my_string`**.

In [0]:
my_string = "This is a string"
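The naming rules above can be checked with the built-in string method `isidentifier()` and the standard-library `keyword` module. This is a rough check rather than a full statement of the rules: Python also accepts many non-ASCII letters in names, and reserved words such as `class` pass the character rules but still cannot be used as variable names.

```python
import keyword

print("my_string".isidentifier())   # letters and underscores
print("_internal".isidentifier())   # leading underscore is allowed
print("2nd_table".isidentifier())   # cannot start with a number
print("my-string".isidentifier())   # hyphens are not allowed

# Reserved words follow the character rules but are not usable as names.
print(keyword.iskeyword("class"))
```

The five lines print `True`, `True`, `False`, `False`, and `True`.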

<i18n value="a0d528ec-e75e-4978-ad2e-c8bd2f8d2454"/>


Executing a cell with this variable will return its value.

In [0]:
my_string

Out[8]: 'This is a string'

<i18n value="e579ffd9-9053-4c28-bd69-e0ccb66f2225"/>


The output here is the same as if we typed **`"This is a string"`** into the cell and ran it.

Note that the quotation marks aren't part of the string, as shown when we print it.

In [0]:
print(my_string)

This is a string


<i18n value="cc95571a-ebbd-4d64-8d6a-8ef34db4161b"/>


This variable can be used the same way a string would be.

String concatenation (joining two strings together) can be performed with a **`+`**.

In [0]:
print("This is a new string and " + my_string)

This is a new string and This is a string


<i18n value="8c371bcd-a3a2-466e-aef8-b76dbb61c3cd"/>


We can join string variables with other string variables.

In [0]:
new_string = "This is a new string and "
print(new_string + my_string)

This is a new string and This is a string
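Concatenation with **`+`** only works when both sides are strings. Joining a string and a number raises a `TypeError`, so the number must first be converted with `str()`. A minimal sketch, with `row_count` as an illustrative variable:

```python
row_count = 42

# "Row count: " + row_count would raise a TypeError, because + does not
# convert a number to a string automatically.
message = "Row count: " + str(row_count)
print(message)
```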


<i18n value="0a43629e-d3a8-4b47-97ca-f1ac4087ca4e"/>


## Functions
Functions allow you to specify local variables as arguments and then apply custom logic. We define a function using the keyword **`def`** followed by the function name and, enclosed in parentheses, any variable arguments we wish to pass into the function. Finally, the function header has a **`:`** at the end.

Note: In Python, indentation matters. You can see in the cell below that the logic of the function is indented in from the left margin. Any code that is indented to this level is part of the function.

The function below takes one argument (**`arg`**) and then prints it.

In [0]:
def print_string(arg):
    print(arg)

<i18n value="7cc9b4bf-3bd9-4bca-aed9-f2edd1a8dbcf"/>


When we pass a string as the argument, it will be printed.

In [0]:
print_string("foo")

foo


<i18n value="8be00c15-f836-440d-9c78-df661f63f5db"/>


We can also pass a variable as an argument.

In [0]:
print_string(my_string)

This is a string


<i18n value="e32021af-f1ad-4372-a25e-0e0bcad7c058"/>


Oftentimes we want to return the results of our function for use elsewhere. For this we use the **`return`** keyword.

The function below constructs a new string by concatenating a fixed prefix with our argument. Note that both functions and arguments can have arbitrary names, just like variables (and follow the same rules).

In [0]:
def return_new_string(string_arg):
    return "The string passed to this function was " + string_arg

<i18n value="91f4a773-4f34-4775-b279-1b34cc0ed5d2"/>


Calling this function returns the new string.

In [0]:
return_new_string("foobar")

Out[16]: 'The string passed to this function was foobar'

<i18n value="75eb96a6-81eb-4eee-b02f-f2f5b75a52f9"/>


Assigning it to a variable captures the output for reuse elsewhere.

In [0]:
function_output = return_new_string("foobar")

<i18n value="dee0b420-4c7d-4144-8d76-144be3600c08"/>


This variable doesn't contain our function, just the results of our function (a string).

In [0]:
function_output

Out[18]: 'The string passed to this function was foobar'
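The difference between printing and returning shows up when the result is captured. The sketch below redefines the two functions from this section so that it runs on its own. A function with no **`return`** statement hands back the special value `None`.

```python
def print_string(arg):
    print(arg)

def return_new_string(string_arg):
    return "The string passed to this function was " + string_arg

# print_string displays its argument but returns nothing to capture.
printed = print_string("foo")

# return_new_string hands its result back to the caller.
returned = return_new_string("foo")

print(printed is None)
print(returned)
```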

<i18n value="e74d9e23-a516-405a-94bb-b6c92bfdfc37"/>


## F-strings
By adding the letter **`f`** before a Python string, you can inject variables or evaluated Python code by inserting them inside curly braces (**`{}`**).

Evaluate the cell below to see string variable substitution.

In [0]:
f"I can substitute {my_string} here"

Out[19]: 'I can substitute This is a string here'

<i18n value="da4b6564-4016-4c2a-99a7-117d3a8a5876"/>


The following cell inserts the string returned by a function.

In [0]:
f"I can substitute functions like {return_new_string('foobar')} here"

Out[20]: 'I can substitute functions like The string passed to this function was foobar here'
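The curly braces accept other Python expressions too, not just variables and function calls. The sketch below uses two illustrative variables, `table_name` and `row_limit`, to show arithmetic, a string method call, and a number being converted to text automatically.

```python
table_name = "users"
row_limit = 5

# Arithmetic inside the braces is evaluated before substitution.
doubled = f"Twice the limit is {row_limit * 2}"

# Method calls work the same way.
upper_name = f"Upper-cased table name: {table_name.upper()}"

# Unlike + concatenation, numbers are converted to text automatically.
limit_query = f"SELECT * FROM {table_name} LIMIT {row_limit}"

print(doubled)
print(upper_name)
print(limit_query)
```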

<i18n value="1d51c2c3-518c-4c51-a48d-09af1113cbe9"/>


Combine this with triple quotes and you can format a paragraph or list, like below.

In [0]:
multi_line_string = f"""
I can have many lines of text with variable substitution:
  - A variable: {my_string}
  - A function output: {return_new_string('foobar')}
"""

print(multi_line_string)


I can have many lines of text with variable substitution:
  - A variable: This is a string
  - A function output: The string passed to this function was foobar



<i18n value="5c613732-30bc-4ca8-918d-fed64f39bb5c"/>


Or you could format a SQL query.

In [0]:
table_name = "users"
filter_clause = "WHERE state = 'CA'"

query = f"""
SELECT *
FROM {table_name}
{filter_clause}
"""

print(query)


SELECT *
FROM users
WHERE state = 'CA'
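Functions and f-strings combine naturally: wrapping the template in a function lets the same query be reused with different values. The helper `build_query` below is hypothetical and not part of any Databricks API. Because f-strings insert text verbatim, a pattern like this is best kept to values you control rather than raw user input.

```python
def build_query(table_name, filter_clause):
    # Hypothetical helper: fills in the template and trims the
    # leading and trailing line breaks added by the triple quotes.
    query = f"""
SELECT *
FROM {table_name}
{filter_clause}
"""
    return query.strip()

ca_query = build_query("users", "WHERE state = 'CA'")
ny_query = build_query("users", "WHERE state = 'NY'")

print(ca_query)
print(ny_query)

# In a Databricks notebook, the result could then be passed to
# spark.sql(), for example: display(spark.sql(ca_query))
```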



&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>