
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

<i18n value="5f2cfc0b-1998-4182-966d-8efed6020eb2"/>



# Getting Started with the Databricks Platform

This notebook provides a hands-on review of some of the basic functionality of the Databricks Data Science and Engineering Workspace.

## Learning Objectives
By the end of this lab, you should be able to:
- Rename a notebook and change the default language
- Attach a cluster
- Use the **`%run`** magic command
- Run Python and SQL cells
- Create a Markdown cell

<i18n value="05dca5e4-6c50-4b39-a497-a35cd6d99434"/>



# Renaming a Notebook

Changing the name of a notebook is easy. Click the name at the top of this page and edit it. To make this notebook easier to find later, append a short test string to the end of the existing name.

<i18n value="f07b8dd7-436d-4719-9c17-18cd47f493fe"/>



# Attaching a cluster

Executing cells in a notebook requires computing resources, which are provided by clusters. The first time you execute a cell in a notebook, you will be prompted to attach a cluster if one is not already attached.

Attach a cluster to this notebook now by clicking the dropdown near the top-left corner of this page. Select the cluster you created previously. This will clear the execution state of the notebook and connect the notebook to the selected cluster.

Note that the dropdown menu provides the option of starting or restarting the cluster as needed. You can also detach and re-attach to a cluster in a single action. This is useful for clearing the execution state when needed.

<i18n value="68805a5e-3b2c-4f79-819f-273d4ca95137"/>



# Using %run

Complex projects of any type can benefit from the ability to break them down into simpler, reusable components.

In the context of Databricks notebooks, this facility is provided through the **`%run`** magic command.

When one notebook is run from another this way, the variables, functions, and code blocks it defines become part of the calling notebook's programming context.

Consider this example:

**`Notebook_A`** has four commands:
  1. **`name = "John"`**
  2. **`print(f"Hello {name}")`**
  3. **`%run ./Notebook_B`**
  4. **`print(f"Welcome back {full_name}")`**

**`Notebook_B`** has only one command:
  1. **`full_name = f"{name} Doe"`**

If we run **`Notebook_B`** on its own, it will fail to execute because the variable **`name`** is not defined in **`Notebook_B`**.

Likewise, one might think that **`Notebook_A`** would fail because it uses the variable **`full_name`**, which is not defined in **`Notebook_A`**, but it doesn't!

What actually happens is that the two notebooks are merged together as we see below and **then** executed:
1. **`name = "John"`**
2. **`print(f"Hello {name}")`**
3. **`full_name = f"{name} Doe"`**
4. **`print(f"Welcome back {full_name}")`**

This produces the expected output:
* **`Hello John`**
* **`Welcome back John Doe`**
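
The merge described above can be imitated in plain Python. This is only a rough sketch, not how Databricks implements **`%run`**: each notebook is modeled as a list of source strings, a nested list stands in for the **`%run ./Notebook_B`** command, and the helper **`run_notebook`** is a hypothetical name introduced for illustration. The key point is that every command executes in the same shared namespace.

```python
# Each "notebook" is a list of source strings (an assumption of this sketch).
notebook_b = ['full_name = f"{name} Doe"']

notebook_a = [
    'name = "John"',
    'print(f"Hello {name}")',
    notebook_b,  # stands in for: %run ./Notebook_B
    'print(f"Welcome back {full_name}")',
]


def run_notebook(commands, namespace):
    """Execute commands in order; a nested list plays the role of %run."""
    for command in commands:
        if isinstance(command, list):
            # The nested notebook runs in the SAME namespace, not a copy.
            run_notebook(command, namespace)
        else:
            exec(command, namespace)


shared = {}
run_notebook(notebook_a, shared)
# Hello John
# Welcome back John Doe

# Running Notebook_B alone fails because `name` was never defined.
try:
    run_notebook(notebook_b, {})
    b_alone_error = None
except NameError as err:
    b_alone_error = err
print(type(b_alone_error).__name__)  # NameError
```

Because both notebooks write to one namespace, **`full_name`** is available to the last command of **`Notebook_A`** even though **`Notebook_A`** never defines it.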

<i18n value="260e99b3-4126-41b7-8210-b6ff01b98790"/>



The folder that contains this notebook contains a subfolder named **`ExampleSetupFolder`**, which in turn contains a notebook called **`example-setup`**. 

This simple notebook declares the variable **`my_name`**, sets it to **`None`** and then creates a DataFrame called **`example_df`**. 

Open the **`example-setup`** notebook and modify it so that **`my_name`** is not **`None`** but rather your name (or anyone's name) enclosed in quotes, and so that the following two cells execute without throwing an **`AssertionError`**.

<img src="https://files.training.databricks.com/images/icon_note_24.png"> You will see additional references to **`_utility-methods`** and **`DBAcademyHelper`**, which are used to configure this courseware and should be ignored for this exercise.

In [0]:
%run ./ExampleSetupFolder/example-setup

Python interpreter will be restarted.
Python interpreter will be restarted.


Resetting the learning environment:
| No action taken

Skipping install of existing datasets to "dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02"

Validating the locally installed datasets:
| listing local files...(9 seconds)
| validation completed...(9 seconds total)

Using the "default" schema.

Predefined paths variables:
| DA.paths.working_dir: dbfs:/mnt/dbacademy-users/odl_user_917217@databrickslabs.com/data-engineering-with-databricks
| DA.paths.user_db:     dbfs:/mnt/dbacademy-users/odl_user_917217@databrickslabs.com/data-engineering-with-databricks/database.db
| DA.paths.datasets:    dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02
| DA.paths.checkpoints: dbfs:/mnt/dbacademy-users/odl_user_917217@databrickslabs.com/data-engineering-with-databricks/_checkpoints

Setup completed (17 seconds)


In [0]:
assert my_name is not None, "Name is still None"
print(my_name)

Ajinkya
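
The behavior of the check in the cell above can be sketched outside the notebook. The function **`check_name`** and the sample value **`"Ada"`** are hypothetical, introduced only for illustration. The sketch shows that the assertion fails with the message **`Name is still None`** until **`my_name`** holds a value.

```python
def check_name(my_name):
    """Mirror the notebook's check: fail while my_name is still None."""
    assert my_name is not None, "Name is still None"
    return my_name


# Before editing example-setup: my_name is None, so the assert fails.
try:
    check_name(None)
    message = None
except AssertionError as err:
    message = str(err)
print(message)  # Name is still None

# After editing: any non-None value passes the check.
print(check_name("Ada"))  # Ada
```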


<i18n value="ece094f7-d013-4b24-aa54-e934f4ab7dbd"/>



## Run a Python cell

Run the following cell to verify that the **`example-setup`** notebook was executed by displaying the **`example_df`** DataFrame. This table consists of 16 rows of increasing values.

In [0]:
display(example_df)

id
0
1
2
3
4
5
6
7
8
9


<i18n value="ce392afd-2e73-4a51-adc4-7d654dad6215"/>



# Change Language

Notice that the default language for this notebook is set to Python. Change this by clicking the **Python** button to the right of the notebook name. Change the default language to SQL.

Notice that the Python cells are automatically prepended with a <strong><code>&#37;python</code></strong> magic command to maintain validity of those cells. Notice that this operation also clears the execution state.

<i18n value="dfce7fd1-08e8-4cc3-92ac-a2eb74f804ef"/>



# Create a Markdown Cell

Add a new cell below this one. Populate it with some Markdown that includes at least the following elements:
* A header
* Bullet points
* A link (using your choice of HTML or Markdown conventions)

<i18n value="a54470bc-2a69-4a34-acbb-fe28c4dee284"/>



## Run a SQL cell

Run the following cell to query a Delta table using SQL. This executes a simple query against a table that is backed by a Databricks-provided example dataset included in all DBFS installations.

In [0]:
%sql
SELECT * FROM delta.`${DA.paths.datasets}/nyctaxi-with-zipcodes/data`

tpep_pickup_datetime,tpep_dropoff_datetime,trip_distance,fare_amount,pickup_zip,dropoff_zip
2016-02-16T22:40:45.000+0000,2016-02-16T22:59:25.000+0000,5.35,18.5,10003,11238
2016-02-05T16:06:44.000+0000,2016-02-05T16:26:03.000+0000,6.5,21.5,10282,10001
2016-02-08T07:39:25.000+0000,2016-02-08T07:44:14.000+0000,0.9,5.5,10119,10003
2016-02-29T22:25:33.000+0000,2016-02-29T22:38:09.000+0000,3.5,13.5,10001,11222
2016-02-03T17:21:02.000+0000,2016-02-03T17:23:24.000+0000,0.3,3.5,10028,10028
2016-02-10T00:47:44.000+0000,2016-02-10T00:53:04.000+0000,0.0,5.0,10038,10005
2016-02-19T03:24:25.000+0000,2016-02-19T03:44:56.000+0000,6.57,21.5,10001,11377
2016-02-02T14:05:23.000+0000,2016-02-02T14:23:07.000+0000,1.08,11.5,10103,10167
2016-02-20T15:42:20.000+0000,2016-02-20T15:50:40.000+0000,0.8,7.0,10003,10011
2016-02-14T16:19:53.000+0000,2016-02-14T16:32:10.000+0000,1.3,9.0,10199,10020
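
The **`${DA.paths.datasets}`** placeholder in the query stands for the dataset path printed by the setup cell earlier. As a rough illustration only (plain string replacement, not the mechanism Databricks uses), the sketch below shows the query text after the placeholder is resolved. The variable names are hypothetical, and the path is copied from the setup output above.

```python
# Value of DA.paths.datasets as printed by the setup cell.
datasets_path = "dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02"

# The query exactly as written in the SQL cell.
query_template = (
    "SELECT * FROM delta.`${DA.paths.datasets}/nyctaxi-with-zipcodes/data`"
)

# Illustrative substitution: swap the placeholder for the concrete path.
resolved_query = query_template.replace("${DA.paths.datasets}", datasets_path)
print(resolved_query)
```

The resolved path matches the directory listed by **`dbutils.fs.ls`** in the next cell.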


<i18n value="7499c6b6-b3f3-4641-88d9-5a260d3c11f8"/>



Execute the following cell to view the underlying files backing this table.

In [0]:
files = dbutils.fs.ls(f"{DA.paths.datasets}/nyctaxi-with-zipcodes/data")
display(files)

path,name,size,modificationTime
dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/nyctaxi-with-zipcodes/data/_delta_log/,_delta_log/,0,1681742427354
dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/nyctaxi-with-zipcodes/data/part-00000-80b68cae-ce6a-41cf-87cd-2573d91b4c07-c000.snappy.parquet,part-00000-80b68cae-ce6a-41cf-87cd-2573d91b4c07-c000.snappy.parquet,90261,1681701711000
dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/nyctaxi-with-zipcodes/data/part-00001-c883942d-366f-478a-be3b-f13fd4bee0ab-c000.snappy.parquet,part-00001-c883942d-366f-478a-be3b-f13fd4bee0ab-c000.snappy.parquet,90986,1681701712000
dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/nyctaxi-with-zipcodes/data/part-00002-bbf9fd81-4b3a-46f3-943e-841b48ae743e-c000.snappy.parquet,part-00002-bbf9fd81-4b3a-46f3-943e-841b48ae743e-c000.snappy.parquet,90740,1681701712000
dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/nyctaxi-with-zipcodes/data/part-00003-3d80435e-15f8-4154-92c7-515307e41c1b-c000.snappy.parquet,part-00003-3d80435e-15f8-4154-92c7-515307e41c1b-c000.snappy.parquet,90840,1681701712000
dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/nyctaxi-with-zipcodes/data/part-00004-0b996b45-a3ff-4339-afeb-8fc691770056-c000.snappy.parquet,part-00004-0b996b45-a3ff-4339-afeb-8fc691770056-c000.snappy.parquet,90818,1681701712000
dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/nyctaxi-with-zipcodes/data/part-00005-ec9ab51b-23a3-4333-8d42-1730df56bfb6-c000.snappy.parquet,part-00005-ec9ab51b-23a3-4333-8d42-1730df56bfb6-c000.snappy.parquet,90707,1681701712000
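
The listing above shows the typical layout of a Delta table directory: Parquet data files plus a **`_delta_log/`** directory holding the transaction log. The sketch below summarizes that listing in plain Python. **`FileEntry`** is a hypothetical stand-in for the entries returned by **`dbutils.fs.ls`**, keeping only the **`name`** and **`size`** columns, with values copied from the output above.

```python
from collections import namedtuple

# Hypothetical stand-in for a dbutils.fs.ls entry (name and size only).
FileEntry = namedtuple("FileEntry", ["name", "size"])

files = [
    FileEntry("_delta_log/", 0),
    FileEntry("part-00000-80b68cae-ce6a-41cf-87cd-2573d91b4c07-c000.snappy.parquet", 90261),
    FileEntry("part-00001-c883942d-366f-478a-be3b-f13fd4bee0ab-c000.snappy.parquet", 90986),
    FileEntry("part-00002-bbf9fd81-4b3a-46f3-943e-841b48ae743e-c000.snappy.parquet", 90740),
    FileEntry("part-00003-3d80435e-15f8-4154-92c7-515307e41c1b-c000.snappy.parquet", 90840),
    FileEntry("part-00004-0b996b45-a3ff-4339-afeb-8fc691770056-c000.snappy.parquet", 90818),
    FileEntry("part-00005-ec9ab51b-23a3-4333-8d42-1730df56bfb6-c000.snappy.parquet", 90707),
]

# Separate the data files from the directories (names ending in "/").
data_files = [f for f in files if f.name.endswith(".parquet")]
directories = [f.name for f in files if f.name.endswith("/")]
total_bytes = sum(f.size for f in data_files)

print(directories)                   # ['_delta_log/']
print(len(data_files), total_bytes)  # 6 544352
```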


<i18n value="a17b5667-53bc-4f8a-8601-5599f4ebb819"/>


# Clearing notebook state

Sometimes it is useful to clear all variables defined in the notebook and start from the beginning. This can be useful when you want to test cells in isolation, or you simply want to reset the execution state.

Visit the **Clear** menu and select **Clear State & Cell Outputs**.

Now try running the cell below. Notice that the variables defined earlier are no longer defined until you rerun the earlier cells.

In [0]:
print(my_name)
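
The effect of clearing state can be sketched in plain Python. This is a simplified model, not the notebook's actual mechanism: the dictionary **`state`** is a hypothetical stand-in for the notebook's execution state, and **`"Ada"`** is a sample value.

```python
# Hypothetical model of notebook state: a dict holding every defined variable.
state = {}
exec('my_name = "Ada"', state)
print(eval("my_name", state))  # Ada

# Stands in for Clear State & Cell Outputs: every variable is discarded.
state.clear()

# Referencing the variable now fails until the defining cell is rerun.
try:
    eval("my_name", state)
    error_name = None
except NameError as err:
    error_name = type(err).__name__
print(error_name)  # NameError
```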

<i18n value="8bff18c2-3ecf-484a-9a8c-dadab7eaf0a1"/>



# Review Changes

Assuming you have imported this material into your workspace using a Databricks Repo, open the Repo dialog by clicking the **`published`** branch button at the top-left corner of this page. You should see three changes:
1. **Removed** with the old notebook name
1. **Added** with the new notebook name
1. **Modified**, for the Markdown cell you created above

Use the dialog to revert the changes and restore this notebook to its original state.

<i18n value="cb3c335a-dd4c-4620-9f10-6946250f2e02"/>



## Wrapping Up

By completing this lab, you should now feel comfortable manipulating notebooks, creating new cells, and running notebooks within notebooks.

&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>