# Databricks Platform

Demonstrate basic functionality and identify terms related to working in the Databricks workspace.


##### Objectives
1. Execute code in multiple languages
1. Create documentation cells
1. Access DBFS (Databricks File System)
1. Create database and table
1. Query table and plot results
1. Add notebook parameters with widgets


##### Databricks Notebook Utilities
- <a href="https://docs.databricks.com/notebooks/notebooks-use.html#language-magic" target="_blank">Magic commands</a>: `%python`, `%scala`, `%sql`, `%r`, `%sh`, `%md`
- <a href="https://docs.databricks.com/dev-tools/databricks-utils.html" target="_blank">DBUtils</a>: `dbutils.fs` (`%fs`), `dbutils.notebook` (`%run`), `dbutils.widgets`
- <a href="https://docs.databricks.com/notebooks/visualizations/index.html" target="_blank">Visualization</a>: `display`, `displayHTML`

### Setup
Run classroom setup to mount Databricks training datasets and create your own database for BedBricks.

Use the `%run` magic command to run another notebook within a notebook.

In [0]:
%run ./Includes/Classroom-Setup

### Execute code in multiple languages
Run code in the notebook's default language.

In [0]:
print("Run default language")

Run language specified by language magic commands: `%python`, `%scala`, `%sql`, `%r`

In [0]:
%python
print("Run python")

In [0]:
%scala
println("Run scala")

In [0]:
%sql
select "Run SQL"

In [0]:
%r
print("Run R", quote=FALSE)

Run shell commands on the driver using the magic command: `%sh`

In [0]:
%sh ps | grep 'java'

Render HTML using the function: `displayHTML` (available in Python, Scala, and R)

In [0]:
html = """<h1 style="color:orange;text-align:center;font-family:Courier">Render HTML</h1>"""
displayHTML(html)

## Create documentation cells
Render cell as <a href="https://www.markdownguide.org/cheat-sheet/" target="_blank">Markdown</a> using the magic command: `%md`  

Below are some examples of how you can use Markdown to format documentation. Click this cell and press `Enter` to view the underlying Markdown syntax.


# Heading 1
### Heading 3
> block quote

1. **bold**
2. *italicized*
3. ~~strikethrough~~

---

- [link](https://www.markdownguide.org/cheat-sheet/)
- `code`

```
{
  "message": "This is a code block",
  "method": "https://www.markdownguide.org/extended-syntax/#fenced-code-blocks",
  "alternative": "https://www.markdownguide.org/basic-syntax/#code-blocks"
}
```

![Spark Logo](https://files.training.databricks.com/images/Apache-Spark-Logo_TM_200px.png)

| Element         | Markdown Syntax |
|-----------------|-----------------|
| Heading         | `# H1` `## H2` `### H3` `#### H4` `##### H5` `###### H6` |
| Block quote     | `> blockquote` |
| Bold            | `**bold**` |
| Italic          | `*italicized*` |
| Strikethrough   | `~~strikethrough~~` |
| Horizontal Rule | `---` |
| Code            | ``` `code` ``` |
| Link            | `[text](https://www.example.com)` |
| Image           | `![alt text](image.jpg)`|
| Ordered List    | `1. First item` <br> `2. Second item` <br> `3. Third item` |
| Unordered List  | `- First item` <br> `- Second item` <br> `- Third item` |
| Code Block      | ```` ``` ```` <br> `code block` <br> ```` ``` ````|
| Table           |<code> &#124; col &#124; col &#124; col &#124; </code> <br> <code> &#124;---&#124;---&#124;---&#124; </code> <br> <code> &#124; val &#124; val &#124; val &#124; </code> <br> <code> &#124; val &#124; val &#124; val &#124; </code> <br>|

## Access DBFS (Databricks File System)
The <a href="https://docs.databricks.com/data/databricks-file-system.html" target="_blank">Databricks File System</a> (DBFS) is a virtual file system that allows you to treat cloud object storage as though it were local files and directories on the cluster.

Run file system commands on DBFS using the magic command: `%fs`

In [0]:
%fs ls

In [0]:
%fs ls /databricks-datasets

In [0]:
%fs head /databricks-datasets/README.md

In [0]:
%fs mounts

`%fs` is shorthand for the <a href="https://docs.databricks.com/dev-tools/databricks-utils.html" target="_blank">DBUtils</a> module: `dbutils.fs`

In [0]:
%fs help

Run file system commands on DBFS using DBUtils directly

In [0]:
dbutils.fs.ls("/databricks-datasets/learning-spark-v2")

Visualize results in a table using the Databricks <a href="https://docs.databricks.com/notebooks/visualizations/index.html#display-function-1" target="_blank">display</a> function

In [0]:
files = dbutils.fs.ls("/databricks-datasets")
display(files)
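The listing returned by `dbutils.fs.ls` can also be processed with ordinary Python before it is displayed. The sketch below ranks entries by size. So that it runs outside Databricks, it uses a hypothetical `FileInfo` namedtuple that models only the `path`, `name`, and `size` fields, and made-up sample data. The helper name `largest_files` is illustrative, and the sketch assumes directory entries are reported with a trailing `/` in `name`.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects returned by dbutils.fs.ls,
# so this sketch runs outside Databricks. Only the fields used are modeled.
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])


def largest_files(files, top_n=3):
    """Return (name, size) pairs for the top_n largest non-directory entries."""
    # Assumption: directory entries have names ending in "/".
    regular = [f for f in files if not f.name.endswith("/")]
    ranked = sorted(regular, key=lambda f: f.size, reverse=True)
    return [(f.name, f.size) for f in ranked[:top_n]]


# Made-up sample listing. In a notebook, replace it with:
#   files = dbutils.fs.ls("/databricks-datasets")
sample = [
    FileInfo("dbfs:/data/a.csv", "a.csv", 120),
    FileInfo("dbfs:/data/logs/", "logs/", 0),
    FileInfo("dbfs:/data/b.parquet", "b.parquet", 4096),
]

print(largest_files(sample))
```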

## Create table
Run <a href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/index.html#sql-reference" target="_blank">Databricks SQL Commands</a> to create a table named `events` using BedBricks event files on DBFS.

In [0]:
%sql
CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "/mnt/training/ecommerce/events/events.parquet");

In [0]:
%sql
-- Re-create the events table (a no-op if the table already exists)
CREATE TABLE IF NOT EXISTS events USING PARQUET OPTIONS (PATH "/mnt/training/ecommerce/events/events.parquet")

In [0]:
%sql
-- Drop a database (fails if the database is not empty unless CASCADE is added)
DROP DATABASE dbacademy_admin_databricks_novigosolutions_com_spark_programming_asp_1_2_databricks_platform

This table was saved in the database created for you in the classroom setup. See the database name printed below.

In [0]:
print(databaseName)

View your database and table in the Data tab of the UI.

## Query table and plot results
Use SQL to query the `events` table

In [0]:
%sql
SELECT * FROM events

Run the query below and then <a href="https://docs.databricks.com/notebooks/visualizations/index.html#plot-types" target="_blank">plot</a> results by selecting the bar chart icon.

In [0]:
%sql
SELECT traffic_source, SUM(ecommerce.purchase_revenue_in_usd) AS total_revenue
FROM events
GROUP BY traffic_source

In [0]:
query = """select traffic_source, SUM(ecommerce.purchase_revenue_in_usd) as total_revenue
           from events
           group by traffic_source"""

df = spark.sql(query)
display(df)
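To make the aggregation concrete, here is a plain-Python model of what the `GROUP BY` query computes. It is a sketch over made-up rows, not the Spark implementation. It assumes rows shaped like the `events` table, where `purchase_revenue_in_usd` is null for non-purchase events. As in SQL, `SUM` skips nulls and yields null for a group that has no non-null values. The function name `total_revenue_by_source` is illustrative.

```python
# Made-up rows shaped like the events table. purchase_revenue_in_usd is
# None (SQL NULL) for events that are not purchases.
rows = [
    {"traffic_source": "google", "ecommerce": {"purchase_revenue_in_usd": 100.0}},
    {"traffic_source": "google", "ecommerce": {"purchase_revenue_in_usd": None}},
    {"traffic_source": "email", "ecommerce": {"purchase_revenue_in_usd": 50.5}},
    {"traffic_source": "direct", "ecommerce": {"purchase_revenue_in_usd": None}},
]


def total_revenue_by_source(rows):
    """Model SUM(...) GROUP BY traffic_source, ignoring nulls like SQL SUM."""
    totals = {}
    for row in rows:
        source = row["traffic_source"]
        # Every source gets a group, even if all of its values are null.
        totals.setdefault(source, None)
        revenue = row["ecommerce"]["purchase_revenue_in_usd"]
        if revenue is not None:
            totals[source] = (totals[source] or 0.0) + revenue
    return totals


print(total_revenue_by_source(rows))
```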

## Add notebook parameters with widgets
Use <a href="https://docs.databricks.com/notebooks/widgets.html" target="_blank">widgets</a> to add input parameters to your notebook.

Create a text input widget using SQL.

In [0]:
%sql
CREATE WIDGET TEXT state DEFAULT "CA"

Access the current value of the widget using the function `getArgument`

In [0]:
%sql
SELECT *
FROM events
WHERE geo.state = getArgument("state")
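The same filter can be driven from Python by reading the widget value and building the query string. The sketch below shows only the string-building step, so it runs outside a notebook. The helper name `events_query_for_state` and the two-capital-letter validation rule are assumptions made for illustration; the check is there because the value is interpolated directly into SQL text.

```python
import re


def events_query_for_state(state):
    """Build the state-filtered query, rejecting anything but two capital letters."""
    # Assumption: state codes look like "CA". Validating first keeps
    # arbitrary widget input out of the SQL text.
    if not re.fullmatch(r"[A-Z]{2}", state):
        raise ValueError(f"expected a two-letter state code, got {state!r}")
    return f"SELECT * FROM events WHERE geo.state = '{state}'"


# In a notebook, this might be used as:
#   display(spark.sql(events_query_for_state(dbutils.widgets.get("state"))))
print(events_query_for_state("CA"))
```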

Remove the text widget

In [0]:
%sql
REMOVE WIDGET state

To create widgets in Python, Scala, and R, use the DBUtils module: `dbutils.widgets`

In [0]:
dbutils.widgets.text("name", "Brickster", "Name")
dbutils.widgets.multiselect("colors", "orange", ["red", "orange", "black", "blue"], "Colors")

Access the current value of the widget using the `dbutils.widgets` function `get`

In [0]:
name = dbutils.widgets.get("name")
colors = dbutils.widgets.get("colors").split(",")

html = "<div>Hi {}! Select your color preference.</div>".format(name)
for c in colors:
    html += """<label for="{}" style="color:{}"><input type="radio"> {}</label><br>""".format(c, c, c)

displayHTML(html)
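Two details of the cell above are worth handling when widget values are free-form: splitting an empty string yields `[""]` rather than an empty list, and text inserted into HTML should be escaped. The following is a minimal sketch of both, assuming a multiselect with nothing selected comes back as an empty string. The helper names `parse_multiselect` and `greeting_html` are illustrative and not part of `dbutils`.

```python
from html import escape


def parse_multiselect(raw):
    """Split a comma-separated widget value, returning [] when nothing is selected."""
    return [item for item in raw.split(",") if item]


def greeting_html(name, colors):
    """Build the greeting markup with user-supplied text escaped."""
    parts = ["<div>Hi {}! Select your color preference.</div>".format(escape(name))]
    for color in colors:
        safe = escape(color, quote=True)
        parts.append(
            '<label style="color:{0}"><input type="radio"> {0}</label><br>'.format(safe)
        )
    return "".join(parts)


# In a notebook, the inputs would come from dbutils.widgets.get(...)
# and the result would be passed to displayHTML(...).
print(greeting_html("Brickster", parse_multiselect("orange,blue")))
```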

Remove all widgets

In [0]:
dbutils.widgets.removeAll()

# Explore Datasets Lab

We will use tools introduced in this lesson to explore the datasets used in this course.

### BedBricks Case Study
This course uses a case study that explores clickstream data for the online mattress retailer, BedBricks.  
You are an analyst at BedBricks working with the following datasets: `events`, `sales`, `users`, and `products`.

##### Tasks
1. View data files in DBFS using magic commands
1. View data files in DBFS using dbutils
1. Create tables from files in DBFS
1. Execute SQL to answer questions on BedBricks datasets

### 1. List data files in DBFS using magic commands
Use a magic command to display files located in the DBFS directory: **`/mnt/training/ecommerce`**

<img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> You should see four items: `events`, `products`, `sales`, `users`

In [0]:
%fs ls /mnt/training/ecommerce

### 2. List data files in DBFS using dbutils
- Use **`dbutils`** to list the files in the directory above and save the result to the variable **`files`**
- Use the Databricks `display()` function to display the contents of **`files`**

<img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> You should see four items: `events`, `products`, `sales`, `users`

In [0]:

files = dbutils.FILL_IN
display(files)

### 3. Create tables below from files in DBFS
- Create `users` table using files at location `"/mnt/training/ecommerce/users/users.parquet"`
- Create `sales` table using files at location `"/mnt/training/ecommerce/sales/sales.parquet"`
- Create `products` table using files at location `"/mnt/training/ecommerce/products/products.parquet"`
- Create `events` table using files at location `"/mnt/training/ecommerce/events/events.parquet"`

The solution cells below register the first three tables as `users_training`, `sales_training`, and `products_training`, and the later queries use those names.

In [0]:
%sql

CREATE TABLE IF NOT EXISTS users_training USING parquet OPTIONS(path '/mnt/training/ecommerce/users/users.parquet') 

In [0]:
%sql
CREATE TABLE IF NOT EXISTS sales_training USING parquet OPTIONS(path "/mnt/training/ecommerce/sales/sales.parquet")

In [0]:
%sql
CREATE TABLE IF NOT EXISTS products_training USING parquet OPTIONS(path "/mnt/training/ecommerce/products/products.parquet")

In [0]:
%sql
CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS(path "/mnt/training/ecommerce/events/events.parquet")
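The four statements above follow one pattern, so they can also be generated from Python. This is a sketch of the string-building step only, using the table names and paths from the cells above. The `TABLES` mapping and the `create_table_sql` helper are illustrative names, and the `spark.sql` call is left as a comment so the block runs outside a notebook.

```python
# Table name -> dataset directory, mirroring the cells above.
TABLES = {
    "users_training": "users",
    "sales_training": "sales",
    "products_training": "products",
    "events": "events",
}


def create_table_sql(table, dataset, root="/mnt/training/ecommerce"):
    """Return a CREATE TABLE statement for one parquet dataset."""
    path = f"{root}/{dataset}/{dataset}.parquet"
    return f'CREATE TABLE IF NOT EXISTS {table} USING parquet OPTIONS (path "{path}")'


statements = [create_table_sql(table, dataset) for table, dataset in TABLES.items()]
for statement in statements:
    print(statement)  # in a notebook: spark.sql(statement)
```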

Use the Data tab of the workspace UI to confirm your tables were created.

### 4. Execute SQL to explore BedBricks datasets
Run SQL queries on the `products`, `sales`, and `events` tables to answer the following questions. 
- What products are available for purchase at BedBricks?
- What is the average purchase revenue for a transaction at BedBricks?
- What types of events are recorded on the BedBricks website?

The schema of the relevant dataset is provided for each question in the cells below.

#### Q1: What products are available for purchase at BedBricks?

The **`products`** dataset contains the ID, name, and price of products on the BedBricks retail site.

| field | type | description |
| --- | --- | --- |
| item_id | string | unique item identifier |
| name | string | item name in plain text |
| price | double | price of item |

Execute a SQL query that selects all rows from the **`products`** table.

<img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> You should see 12 products.

In [0]:
%sql

select distinct item_id, name
from products_training

#### Q2: What is the average purchase revenue for a transaction at BedBricks?

The **`sales`** dataset contains order information representing successfully processed sales.  
Most fields correspond directly with fields from the clickstream data associated with a sale finalization event.

| field | type | description|
| --- | --- | --- |
| order_id | long | unique identifier |
| email | string | the email address to which sales confirmation was sent |
| transaction_timestamp | long | timestamp at which the order was processed, recorded in milliseconds since epoch |
| total_item_quantity | long | number of individual items in the order |
| purchase_revenue_in_usd | double | total revenue from order |
| unique_items | long | number of unique products in the order |
| items | array | provided as a list of JSON data, which is interpreted by Spark as an array of structs |

Execute a SQL query that computes the average **`purchase_revenue_in_usd`** from the **`sales`** table.

<img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> The result should be `1042.79`.

In [0]:
%sql
select round(avg(purchase_revenue_in_usd),2) as Average_purchase_revenue
from sales_training

#### Q3: What types of events are recorded on the BedBricks website?

The **`events`** dataset contains two weeks' worth of parsed JSON records, created by consuming updates to an operational database.  
Records are received whenever (1) a new user visits the site or (2) a user provides their email for the first time.

| field | type | description|
| --- | --- | --- |
| device | string | operating system of the user device |
| user_id | string | unique identifier for user/session |
| user_first_touch_timestamp | long | first time the user was seen in microseconds since epoch |
| traffic_source | string | referral source |
| geo (city, state) | struct | city and state information derived from IP address |
| event_timestamp | long | event time recorded as microseconds since epoch |
| event_previous_timestamp | long | time of previous event in microseconds since epoch |
| event_name | string | name of events as registered in clickstream tracker |
| items (item_id, item_name, price_in_usd, quantity, item_revenue_in_usd, coupon)| array | an array of structs for each unique item in the user’s cart |
| ecommerce (total_item_quantity, unique_items, purchase_revenue_in_usd)  |  struct  | purchase data (this field is only non-null in those events that correspond to a sales finalization) |

Execute a SQL query that selects distinct values in **`event_name`** from the **`events`** table.

<img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> You should see 23 distinct **`event_name`** values.
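The timestamp fields in the schema above are stored as microseconds since epoch, which are hard to read in raw query results. A minimal sketch of converting one such value to a UTC datetime in plain Python follows. The helper name `micros_to_datetime` and the sample value are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(micros):
    """Convert microseconds since the Unix epoch to a UTC datetime."""
    # timedelta keeps full microsecond precision, unlike float division.
    return EPOCH + timedelta(microseconds=micros)


# Made-up sample value in the same unit as event_timestamp.
example = micros_to_datetime(1_593_561_600_000_000)
print(example.isoformat())  # 2020-07-01T00:00:00+00:00
```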

In [0]:
%sql

select distinct(event_name)
from events

event_name
mattresses
down
press
shipping_info
main
warranty
finalize
login
faq
careers


In [0]:
%sql

DROP DATABASE dbacademy_admin_databricks_novigosolutions_com_spark_programming_asp_1_2___databricks_platform CASCADE

### Clean up classroom

In [0]:
%run ./Includes/Classroom-Cleanup

&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>