1. Download the relevant dataset from the internet as a CSV file.
2. Convert the CSV file to Delta format.

#### 1. Get dataset from the web
We use the wget command to download our dataset to the driver. After unzipping the file, we create a folder on DBFS and copy the unzipped CSV file there. Note that dbfs:/ is available as /dbfs/ on the driver (because it is a FUSE mount).

In [3]:
%sh wget https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip -O /tmp/bank.zip --no-check-certificate

Unzip the file to /tmp/bank

In [5]:
%sh unzip -o /tmp/bank.zip -d /tmp/bank
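
The two shell steps above can also be sketched in plain Python on the driver. This is a minimal sketch, not part of the original notebook: the helper names `download` and `extract` are hypothetical, and unlike the `--no-check-certificate` flag used with wget, `urllib` verifies the server certificate by default.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List


def download(url: str, target: str) -> Path:
    """Fetch `url` into the local file `target` (roughly `wget -O target`)."""
    target_path = Path(target)
    target_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(target_path))
    return target_path


def extract(zip_path: str, dest_dir: str) -> List[str]:
    """Unpack all archive members into `dest_dir`, overwriting existing files
    (roughly `unzip -o`), and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Hypothetical usage on the driver, mirroring the cells above:
# download("https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip",
#          "/tmp/bank.zip")
# extract("/tmp/bank.zip", "/tmp/bank")
```

Keeping the two steps as separate functions means the extraction can be re-run without downloading the archive again.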

Create a new directory on dbfs:/, called BankMarketing

In [7]:
%fs mkdirs /BankMarketing

Due to the FUSE mount, dbfs:/BankMarketing is now also available as /dbfs/BankMarketing on the driver. We can copy our files there to make them available on DBFS.

In [9]:
%sh cp -rv /tmp/bank /dbfs/BankMarketing
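
The relationship between the two path styles is a simple prefix swap, which the following sketch makes explicit. The helper names `dbfs_to_local` and `local_to_dbfs` are hypothetical and only illustrate the convention described above; they are not a Databricks API.

```python
DBFS_SCHEME = "dbfs:/"   # how Spark, %fs and dbutils address DBFS
FUSE_ROOT = "/dbfs/"     # where the same files appear on the driver


def dbfs_to_local(uri: str) -> str:
    """Translate a dbfs:/ URI into the driver-local FUSE path."""
    if not uri.startswith(DBFS_SCHEME):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    return FUSE_ROOT + uri[len(DBFS_SCHEME):].lstrip("/")


def local_to_dbfs(path: str) -> str:
    """Translate a driver-local /dbfs/ path back into a dbfs:/ URI."""
    if not path.startswith(FUSE_ROOT):
        raise ValueError(f"not under {FUSE_ROOT}: {path!r}")
    return DBFS_SCHEME + path[len(FUSE_ROOT):]


print(dbfs_to_local("dbfs:/BankMarketing/bank/bank-full.csv"))
print(local_to_dbfs("/dbfs/BankMarketing/bank/bank-full.csv"))
```

Shell commands (`%sh`) and ordinary Python file APIs need the `/dbfs/...` form, while `spark.read`, `%fs` and `dbutils.fs` take the `dbfs:/...` form.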

#### 2. Read data and save as Delta table
Now that we have copied our CSV file to dbfs:/, the next step is to convert it to Delta format.

Use spark.read to load the CSV data into a Spark DataFrame.

In [12]:
bdf = spark.read.format("csv")\
.option("path", "dbfs:/BankMarketing/bank/bank-full.csv")\
.option("inferSchema", "true")\
.option("header", "true")\
.option("delimiter", ";")\
.option("quote", '"')\
.load()
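
To sanity-check the `delimiter`, `quote` and `header` settings before handing the file to Spark, the same parsing rules can be tried with Python's standard `csv` module. This is only a sketch: the `preview` helper is hypothetical, and the sample text is made up for illustration rather than taken from bank-full.csv. Unlike `inferSchema`, the `csv` module returns every value as a string.

```python
import csv
import io
from itertools import islice

# Illustrative sample only; these are NOT actual rows from bank-full.csv.
SAMPLE = '"age";"job";"balance"\n42;"technician";1500\n35;"services";-20\n'


def preview(text_stream, n=5):
    """Return up to `n` rows parsed with the same settings as the Spark reader:
    first line as header, ';' as delimiter, '"' as quote character."""
    reader = csv.DictReader(text_stream, delimiter=";", quotechar='"')
    return list(islice(reader, n))


rows = preview(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

On the driver, the same helper could be pointed at the real file through the FUSE mount, for example `preview(open("/dbfs/BankMarketing/bank/bank-full.csv"))`.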

In [13]:
%sql
CREATE DATABASE IF NOT EXISTS max_db
LOCATION 'dbfs:/max/db'

We now write our Spark DataFrame to a Delta table called bank_marketing in the max_db database. Note that, under the hood, the Delta table is stored on DBFS under the database location (dbfs:/max/db/bank_marketing), i.e. on Azure Storage.

In [15]:
bdf.write.mode("overwrite").format("delta").saveAsTable("max_db.bank_marketing")

In [16]:
display(dbutils.fs.ls("dbfs:/max/db/bank_marketing/"))
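
A Delta table directory holds Parquet data files plus a `_delta_log` directory containing the transaction log, so the listing above should include `_delta_log`. A minimal sketch of the same check from plain Python, assuming the table directory is reachable through the /dbfs/ FUSE mount; the helper name `looks_like_delta_table` is hypothetical.

```python
import os


def looks_like_delta_table(local_dir: str) -> bool:
    """Return True if `local_dir` contains a `_delta_log` subdirectory,
    which is where Delta Lake keeps its transaction log."""
    return os.path.isdir(os.path.join(local_dir, "_delta_log"))


# Hypothetical usage on the driver, via the FUSE mount:
# looks_like_delta_table("/dbfs/max/db/bank_marketing")
```

This only checks for the log directory; it does not validate the log contents or the data files.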