# Lesson 04 - Databricks Notebooks

## Introduction to Notebooks

This is a **Databricks notebook**. It consists of cells, some of which contain formatted text, and some that contain executable Python code. Cells containing text, such as the one you are reading now, are referred to as **markdown cells**. Cells that contain executable code are known as **code cells**.

## The Menu Toolbar

Let's take a minute to familiarize ourselves with the menu toolbar at the top of this notebook. We will explain some of these tools in detail, but will leave the remaining items for you to explore. 

* **File**. This menu contains tools for saving, copying, renaming, and exporting your notebook as well as other file management tools. Please take a moment to explore the options available to you in this menu.
* **Edit**. The options in this menu are used to perform a variety of editing tasks such as copying and pasting cells and rearranging cells within a notebook. Please familiarize yourself with these options over time. 
* **View**. This menu provides you with options for customizing the appearance of your notebook. Feel free to explore this at your leisure. 
* **Run All**. This button will execute all of the code in your notebook from start to finish. It does not restart your Python session, so any variables that had been created previously will still exist, even if the cells in which they were defined have since been deleted. 
* **Clear**. This menu item provides tools for resetting your notebook environment. Let's look at each of these.
  * **Clear Results**. This command will clear the output generated by any code cells that have been executed during this Python session, or during a previous time that the notebook was open. It will not clear the state of your Python session. ***I recommend that you run this command any time you re-open a lesson notebook.***
  * **Clear State**. This command will reset your Python session, removing any variables that have been previously defined. It does not clear the output from the notebook.
  * **Clear State & Results**. As the name suggests, this tool will perform the functions of both of the previous two commands.
  * **Clear State & Run All**. This command resets the Python session and then executes every cell in the notebook from top to bottom. ***You should run this command just before submitting any homework assignment or project.***
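
The difference between a notebook's cells and its Python session can be illustrated with a small sketch. It is only a simulation written for this lesson, not how Databricks is implemented: a plain dictionary named `session` (a hypothetical stand-in) plays the role of the Python session, and each call to `exec` plays the role of running a cell.

```python
# A plain dict stands in for the notebook's Python session (its global namespace).
# This is an illustrative simulation, not Databricks internals.
session = {}

# "Cell 1" defines a variable. Imagine this cell is deleted after it runs.
exec("total = 10 + 5", session)

# "Cell 2" still works, because `total` lives in the session, not in the cell.
exec("result = total * 2", session)
print(session["result"])

# Clearing the state empties the namespace, so "Cell 2" now fails.
session.clear()
try:
    exec("result = total * 2", session)
    cleared_ok = False
except NameError as err:
    cleared_ok = True
    print("After clearing state:", err)
```

This is why running **Clear State & Run All** before submitting work is a useful check: it confirms that the notebook does not depend on variables left over from cells that no longer exist.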


**Note:** Before proceeding any further in this notebook, I recommend clicking **Clear > Clear Results** so that you will be viewing a fresh notebook with no output displayed.

## Attaching a Cluster

Before we can run any of the code cells in this notebook, we must first attach the notebook to a running cluster. If you don't currently have a cluster running, please start one. Once the cluster is running, return to this notebook. 

You will see a dropdown list that says "Detached" at the upper left of this notebook. Clicking on that dropdown list will display a list of available clusters. Select a running cluster to attach it to this notebook. Once attached, you will be able to execute the code in this notebook.

## Code Cells

The cell beneath this one is a code cell. Assuming you are attached to a cluster, you can run this cell in two different ways:

1. Select the cell by clicking on it. Then press `SHIFT + ENTER`. 
2. When you hover your mouse over the code cell, a small menu will appear in the upper-right corner of the cell. One of the items in this menu is a "play" button with a triangular icon. You can click this button to run the contents of the code cell. 

The code cell below contains code that, when executed, will display the version of Python currently running on the driver node of our cluster. Run this cell now.

```python
import sys
print(sys.version)
```
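
If you ever need to check the Python version in code rather than read it from the output, the standard library also provides `sys.version_info`. The short sketch below shows one way to use it. The 3.8 threshold is an arbitrary value chosen for illustration and is not a requirement of this course.

```python
import sys

# sys.version_info is a named tuple of integers, which is easier to
# compare than the free-form sys.version string.
major, minor = sys.version_info.major, sys.version_info.minor
print(f"Running Python {major}.{minor}")

# Tuples compare element by element, so this checks for Python 3.8 or newer.
# The (3, 8) threshold is an arbitrary example value.
is_recent = sys.version_info >= (3, 8)
print("At least Python 3.8:", is_recent)
```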

The cell below contains code that will display the version of Spark running on our cluster. Run this cell.

```python
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
print(spark.version)
```

We will provide one last example of a code cell in this notebook. The code in the cell below uses the NumPy package to randomly generate two arrays `x` and `y`, and then uses Matplotlib to create a scatter plot based on these arrays.

```python
import numpy as np
import matplotlib.pyplot as plt

n = 40
x = np.random.uniform(0, 1, n)
e = np.random.normal(0, 0.1, n)
y = x**3 + e

plt.scatter(x, y, c='salmon', edgecolor='k', s=80)
plt.show()
```

## Creating New Cells

If you place your mouse pointer slightly above (or below) an existing cell, you will see a small plus button appear above (or below) that cell near the middle of the screen. Clicking this button adds a new code cell in that location. 

Practice creating code cells by adding two or three new cells beneath this one and then writing a few lines of Python code in each of the new cells. Make sure to run these cells.

## Creating Markdown Cells

A markdown cell can be created in a Databricks notebook by adding the command `%md` at the start of a code cell. This tells the notebook to interpret the contents of the cell as text rather than as executable Python code. To see an example of this, you can double-click on any markdown cell (such as this one) to see its raw, un-rendered contents. 
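
The rule described above can be sketched in a few lines of Python. This is only a toy model for illustration: the function `cell_kind` and the two sample strings are hypothetical names invented here, and the sketch makes no claim about how Databricks actually parses cells.

```python
def cell_kind(source: str) -> str:
    """Classify raw cell text as 'markdown' or 'code' (hypothetical helper)."""
    stripped = source.lstrip()
    first_line = stripped.splitlines()[0] if stripped else ""
    # Treat the cell as markdown only when its first token is exactly `%md`.
    return "markdown" if first_line.split()[:1] == ["%md"] else "code"


# Raw contents of two example cells, written as Python strings.
markdown_cell = "%md\n## Section title\nSome **bold** text."
code_cell = "x = 4\nprint(x)"

print(cell_kind(markdown_cell))
print(cell_kind(code_cell))
```

The first string resembles what you see when you double-click a markdown cell: the `%md` command on the first line, followed by the raw markdown text.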

Formatting text in a markdown cell is accomplished by using the **markdown** formatting language. Markdown syntax will not be thoroughly covered here, but the next cell contains many useful examples of markdown syntax, which you can view by double-clicking on that cell.

### This is a Level Three Header

**This text is bold.**

_This text is italicized._

We can format text as inline code: `print('Hello World!')`

We can also format several lines as a code block, as shown below. 
```
x = 4
y = 7
print(x + y)
```

Markdown can be used to create bullet-pointed lists.
* Item 1
* Item 2
* Item 3

It can also be used to create numbered lists. 
1. Item 1
2. Item 2
3. Item 3

Additional examples of Markdown syntax can be found in this [Markdown Cheat Sheet](https://www.markdownguide.org/cheat-sheet/). For a more thorough review, you can complete this [Markdown Tutorial](https://commonmark.org/help/tutorial/index.html).

## Deleting Cells

A cell can be deleted from a Databricks notebook by hovering your mouse over the cell and then clicking the `X` icon that appears in the upper-right corner of the cell. Practice this by creating and then deleting a cell now. If you accidentally delete a cell, you can recover it by selecting the **Undo Delete Cells** option from the **Edit** menu.

## Creating a New Notebook

You will be asked to create a notebook for each homework assignment and each project in this course. Let's walk through the notebook creation process now. Please create a new Databricks notebook by following the steps below:

1. Open Databricks in a new tab. This step is to ensure that you maintain access to the instructions in this notebook as you perform the steps below. 
2. In the new tab, click on the **Workspace** icon and navigate to **Users > YourEmailAddress > Sandbox**. 
3. Right-click inside your **Sandbox** folder and select **Create > Notebook**. 
4. In the window that pops up, enter a name for your notebook. Feel free to select any name that you like. 
5. Leave all other fields with their default values and click **Create**.

Your new notebook will automatically be opened. I encourage you to practice working with notebooks by attaching this notebook to a cluster and then creating a few code cells as well as a few markdown cells.