mukulmurthy [DELTA-OSS-EXTERNAL] Update Spark Summit tutorial description and add…
… video link

Now that the videos from Spark Summit have been published, we can link to the video of the tutorial. Also updated the tutorial's description.

Closes #234

Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Mukul Murthy <38224594+mukulmurthy@users.noreply.github.com>

#6887 is resolved by tdas/lvr4tjzx.

GitOrigin-RevId: be593b21236f8b9fb0a81e81e33454b0df79135e
Latest commit 32af18c Nov 6, 2019

readme.md

Delta Lake Tutorial: Spark + AI Summit 2019 EU

Sign Up for Databricks Community Edition

This tutorial goes through many features of Delta Lake, including schema enforcement and schema evolution, interoperability between batch and streaming workloads, time travel, and DML commands like Delete and Merge. It was originally given at Spark Summit 2019 Europe and is available in both Scala and Python. If you'd prefer to watch the tutorial, along with some brief background information about the problems Delta Lake tries to solve, a video recording from one of the Spark Summit sessions is also available.

The instructions on this page explain how to run the examples on Databricks Community Edition, but all the pieces (except some of the Databricks filesystem bits) should work in any Spark 2.4.2 or higher with Delta Lake 0.4.0 or higher.
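If you plan to run the notebooks outside Databricks, the version requirement stated above (Spark 2.4.2 or higher, Delta Lake 0.4.0 or higher) can be checked with a few lines of Python. The following is only a minimal sketch: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical and not part of Spark or Delta Lake, and the version strings are assumed to be supplied by you.

```python
import re

# Minimum versions stated in the tutorial text.
MIN_SPARK = "2.4.2"
MIN_DELTA = "0.4.0"


def parse_version(version):
    """Turn '2.4.4' or '3.0.0-SNAPSHOT' into a tuple of ints, e.g. (2, 4, 4)."""
    # Keep only the leading dotted-number part; any suffix is ignored.
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (numeric comparison)."""
    have, need = parse_version(version), parse_version(minimum)
    # Pad the shorter tuple with zeros so '2.4' compares equal to '2.4.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_environment(spark_version, delta_version):
    """Hypothetical helper: list the unmet requirements (empty list if none)."""
    problems = []
    if not meets_minimum(spark_version, MIN_SPARK):
        problems.append(f"Spark {spark_version} is older than {MIN_SPARK}")
    if not meets_minimum(delta_version, MIN_DELTA):
        problems.append(f"Delta Lake {delta_version} is older than {MIN_DELTA}")
    return problems
```

In a notebook or PySpark shell, the running Spark version is available as `spark.version`, so a call might look like `check_environment(spark.version, "0.4.0")`, where the Delta Lake version is whichever one you installed. Versions are compared numerically rather than as strings so that, for example, 2.4.10 is correctly treated as newer than 2.4.2.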

Start by signing up for Databricks Community Edition: go to databricks.com/try and choose Community Edition.

Note that the Community Edition link is on the right side, with the white Get Started button (not the green button). This is a free edition of Databricks and does not require a credit card.

Next, sign up for Databricks Community Edition (DBCE) by filling out the form. Once you have signed up, verify your account using the email address you entered in the form. Once your account is validated, go to DBCE, which should look similar to the screenshot below.

Once you log in, you will see the Databricks workspace, similar to the screenshot below.

Create a Cluster with Databricks Runtime 6.1+

Start by clicking Create Cluster in the left pane.

This will bring up the Create Cluster dialog, as shown in the following screenshot.

Fill in the name of your cluster and select the Databricks Runtime Version: choose the 6.1 Beta runtime.

Click Create Cluster, and your cluster will start up.

Note that within DBCE you can only have one cluster at a time. If one already exists, you will need to either use it or delete it before creating a new one.

Importing Notebooks

For these next steps, we will import one of the following notebooks, so keep the following links handy:

Start by opening one of the notebooks from the preceding links in a new window and copying its URL.

Then go back to your Databricks workspace, right-click, and choose Import.

This will open up the Import Notebooks dialog in the Databricks workspace.

Paste the notebook URL you copied earlier into the Import Notebooks dialog.

Once you have imported the notebook, your screen should look similar to the view below.

Attaching Notebooks

Near the top left, click the cluster dropdown and choose the cluster you want to attach the notebook to.
