# Create Table - SQL

__Content__
1. Setup
1. CSV examples
1. JSONL examples

These examples demonstrate the following SQL commands:
- `drop table` - remove a table so it can be (re)created 
- `create temporary table` - create an SQL table from a CSV or JSON file 
- `select` - retrieve values from specified columns of a table

__References__
- [SQL Guide - Databricks](https://docs.databricks.com/spark/latest/spark-sql/index.html)
- [Create Table - Databricks](https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html)

## 1. Setup

No setup is required beyond knowing the full paths of the files that will be read into tables. The sections below list those files.

## 2. CSV examples

List some sample CSV data files.

In [8]:
%sh ls /dbfs/mnt/datalab-datasets/file-samples/*.csv

Note that the filepaths above start with `/dbfs` because `%sh` commands see DBFS through the local `/dbfs` mount. 
The corresponding paths used in the `create table` commands __do not__ start with `/dbfs`.
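The path mapping can be sketched as a small shell helper that could be run in a `%sh` cell. The function name `to_table_path` is hypothetical (it is not a Databricks command); it only strips a leading `/dbfs` so that the result has the form used in the `path` option.

```shell
# Hypothetical helper (not a Databricks command): convert a path as seen by
# %sh, which starts with /dbfs, into the form used in `create table`.
to_table_path() {
  case "$1" in
    /dbfs/*) printf '%s\n' "${1#/dbfs}" ;;  # strip the leading /dbfs
    *)       printf '%s\n' "$1" ;;          # leave other paths unchanged
  esac
}

to_table_path /dbfs/mnt/datalab-datasets/file-samples/iris.csv
# prints: /mnt/datalab-datasets/file-samples/iris.csv
```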

The subsections below read in different CSV files and list their records.

### 2.1 `iris.csv` example

This example reads the `iris.csv` file into an SQL table.

Check that the file contains column names in the first line. (It does.)

In [14]:
%sh head -n 3 /dbfs/mnt/datalab-datasets/file-samples/iris.csv
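To see the column names one per line, the header line can be split on commas. This is a minimal sketch that assumes no column name contains a quoted comma; the helper name `list_columns` and the sample file are made up for illustration.

```shell
# Hypothetical helper: print the comma-separated names on the first line of
# a CSV file, one per line. Assumes no field contains a quoted comma.
list_columns() {
  head -n 1 "$1" | tr ',' '\n'
}

# Demonstrate on a small made-up file rather than on the real data.
sample=$(mktemp)
printf 'id,name\n1,alpha\n2,beta\n' > "$sample"
list_columns "$sample"
rm -f "$sample"
```

On the notebook's data, the argument would be `/dbfs/mnt/datalab-datasets/file-samples/iris.csv`.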

Check the number of lines in the file.

In [16]:
%sh wc -l /dbfs/mnt/datalab-datasets/file-samples/iris.csv

There are `150` lines in the file.
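Because the first line holds column names, the line count and the number of data rows can differ. The sketch below counts only the lines after the header; `count_data_rows` and the sample file are made up for illustration. Keep in mind that `wc -l` counts newline characters, so a file whose last line has no trailing newline reports one line fewer.

```shell
# Hypothetical helper: count the lines that follow the header line.
count_data_rows() {
  # tail -n +2 starts output at line 2; tr removes padding some wc versions add.
  tail -n +2 "$1" | wc -l | tr -d ' '
}

# Demonstrate on a small made-up file: one header line and three data rows.
sample=$(mktemp)
printf 'id,name\n1,alpha\n2,beta\n3,gamma\n' > "$sample"
wc -l < "$sample"          # all lines, including the header
count_data_rows "$sample"  # data rows only
rm -f "$sample"
```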

The two sections below provide examples of reading the `iris.csv` file:
- Inferring the schema (and so letting SQL determine the data types of the columns)
- Setting the schema (and so setting the data types of the columns in the `create` command)

#### 2.1.1 `iris` example - infer schema

Drop the `iris` table if it exists. We cannot create a table that already exists.

In [21]:
%sql
drop table if exists iris

Create the `iris` SQL table from the `iris.csv` file.

In [23]:
%sql
create temporary table iris 
using CSV 
options(path="/mnt/datalab-datasets/file-samples/iris.csv", 
        header=TRUE)

Note that the filepath used in the `create table` commands __does not__ start with `/dbfs`.

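Notice: 
- the `using` clause specifies that the file is in CSV format
- the `path` parameter specifies the location of the file
- the `header` parameter specifies that the first line of the file contains column names
- no `inferSchema` option is set, so Spark reads every column as a string unless the data types are given in the `create` command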

List all of the rows and all of the columns from the `iris` table.

In [27]:
%sql
select *
from iris

#### 2.1.2 `iris` example - set schema

Drop the table.

In [30]:
%sql
drop table if exists iris

Create the table, but set the column names and data types.

In [32]:
%sql
create temporary table iris 
  (sepal_length double, sepal_width double, 
   petal_length double, petal_width double, species string)
using CSV 
options(path="/mnt/datalab-datasets/file-samples/iris.csv", 
        header=TRUE)

Display the table.

In [34]:
%sql
select * 
from iris

Notice the column names are as specified in the `create` command.

### 2.2 `diamonds.csv` example

Drop the `diamonds` table.

In [38]:
%sql
drop table if exists diamonds

Create the `diamonds` SQL table from the `diamonds.csv` file.

In [40]:
%sql
create temporary table diamonds
using CSV 
options(path="/mnt/datalab-datasets/file-samples/diamonds.csv", 
        header=TRUE)

In [41]:
%sql
select * 
  from diamonds

## 3. JSONL examples

List the JSON files in the `file-samples` directory. 
Not all of these files are in JSONL format (one complete JSON object per line), but these are:
- `each_line.json`
- `stocks.json`
- `zips.json`
- `world_bank.json`

In [44]:
%sh ls -hot /dbfs/mnt/datalab-datasets/file-samples/*.json
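One rough way to tell whether a file is in JSONL format is to check that every non-empty line starts with `{` and ends with `}`. The sketch below does only that. It is a heuristic, not a JSON parser, and the helper name `looks_like_jsonl` and the sample contents are made up for illustration.

```shell
# Hypothetical helper: succeed when every non-empty line starts with "{" and
# ends with "}". This is a heuristic; it does not validate the JSON itself.
looks_like_jsonl() {
  # grep -v selects lines that do NOT match the pattern; -q makes grep exit
  # with status 0 as soon as one such line is found, and "!" inverts that.
  ! grep -Evq '^[[:space:]]*([{].*[}])?[[:space:]]*$' "$1"
}

sample=$(mktemp)
printf '{"id": 1}\n{"id": 2}\n' > "$sample"              # one object per line
looks_like_jsonl "$sample" && echo "looks like JSONL"

printf '[\n  {"id": 1},\n  {"id": 2}\n]\n' > "$sample"   # pretty-printed array
looks_like_jsonl "$sample" || echo "does not look like JSONL"
rm -f "$sample"
```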

### 3.1 `each_line.json` example

This example reads the `each_line.json` file into an SQL table. 
First though, display the first `10` lines of the file.

In [47]:
%sh head -n 10 /dbfs/mnt/datalab-datasets/file-samples/each_line.json

Now drop and create the table. Notice that the filepath __does not__ start with `/dbfs`.

In [49]:
%sql
drop table if exists each_line;
create temporary table each_line 
  using JSON
  options(path="/mnt/datalab-datasets/file-samples/each_line.json")

Now display the table.

In [51]:
%sql
select * 
  from each_line

### 3.2 `zips.json` example

This example reads the `zips.json` file into an SQL table. 
First though, display the first five lines of the file.

In [54]:
%sh head -n 5 /dbfs/mnt/datalab-datasets/file-samples/zips.json

Notice that the `loc` field is a list containing two values. 
We will retrieve this data in the `Select` notebook.

In [56]:
%sql
drop table if exists zips;
create temporary table zips 
using JSON
options(path="/mnt/datalab-datasets/file-samples/zips.json");
select * from zips

### 3.3 `companies.json` example

This example reads the `companies.json` file into an SQL table. 
First though, display the first five lines of the file. 

Check that the file appears to be in JSONL format by looking at where each line ends.

In [59]:
%sh head -n 5 /dbfs/mnt/datalab-datasets/file-samples/companies.json

__Exercise:__ 
1. Change the above `head` command to display the first line of the file.
1. Change the above `head` command to display the first two lines of the file.

Read the `companies.json` file into a table and display it.

In [62]:
%sql
drop table if exists companies;
create temporary table companies
using JSON
options(path="/mnt/datalab-datasets/file-samples/companies.json");
select * from companies

__The End__