# Read diamonds Dataset with SQL

## Introduction
This notebook imports the `diamonds` dataset. This entails:
2. Reading the datafile (into a dataframe/table)
3. Checking the datatypes (of each column in the dataframe)
4. Setting these datatypes (if they were not initially read correctly)

The sections of this notebook (listed below, except the Setup section) correspond to these steps, one section per step.
Note that the columns of the diamonds dataset are all initially read correctly. 
Other notebooks require more work to set the column datatypes correctly.

## Contents
1. Setup
2. Read datafile
3. Check column types
4. Set column types

## 1. Setup

The `Include` notebook:
- contains some references
- loads libraries
- defines the function `get_filepaths` in R and Python, which helps locate the datafile

Display the notebook results to see these references and the libraries.

In [6]:
%r
diamonds_filepath = '/dbfs/mnt/datalab-datasets/file-samples/diamonds.csv'

In [7]:
%python
diamonds_filepath = '/dbfs/mnt/datalab-datasets/file-samples/diamonds.csv'
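The `/dbfs` prefix in this path is how Databricks exposes DBFS to local file APIs in R and Python; Spark readers and Spark SQL take the same location without that prefix. The following is a minimal sketch of converting one form to the other. The helper name `to_spark_path` is hypothetical and is not something the `Include` notebook is known to define.

```python
def to_spark_path(local_path: str) -> str:
    """Return the path without a leading /dbfs, as Spark expects it.

    Hypothetical helper for illustration only. Paths that do not start
    with the /dbfs mount prefix are returned unchanged.
    """
    prefix = "/dbfs"
    if local_path == prefix or local_path.startswith(prefix + "/"):
        # Strip the mount prefix; a bare "/dbfs" maps to the DBFS root.
        return local_path[len(prefix):] or "/"
    return local_path


diamonds_filepath = '/dbfs/mnt/datalab-datasets/file-samples/diamonds.csv'
spark_path = to_spark_path(diamonds_filepath)
print(spark_path)  # /mnt/datalab-datasets/file-samples/diamonds.csv
```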

## 2. Read datafile using SQL

Drop the `diamonds` table if it exists. The `create` command fails if a table named `diamonds` already exists, so dropping it first makes this notebook safe to rerun.

In [10]:
%sql
drop table if exists diamonds

The `OK` output indicates that the command was successful.
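The drop-then-create pattern can be tried outside Databricks. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Spark SQL, which is an assumption made only so the example runs anywhere; the two-column schema is a placeholder, not the real `diamonds` schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table diamonds (carat real, cut text)")

# A second plain `create` fails because the table already exists.
try:
    conn.execute("create table diamonds (carat real, cut text)")
    create_failed = False
except sqlite3.OperationalError:
    create_failed = True

# `drop table if exists` succeeds whether or not the table is present,
# so running it before `create` makes the sequence repeatable.
conn.execute("drop table if exists diamonds")
conn.execute("drop table if exists diamonds")  # no error when already gone
conn.execute("create table diamonds (carat real, cut text)")

tables = [row[0] for row in conn.execute(
    "select name from sqlite_master where type = 'table'")]
print(create_failed, tables)  # True ['diamonds']
```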

Create the `diamonds` table from the datafile.

In [13]:
%sql
create temporary table diamonds 
using CSV 
options(path="/mnt/datalab-datasets/file-samples/diamonds.csv", 
        header=TRUE)

Note that the command succeeded.
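To see what the `header` option controls, here is a small sketch that uses Python's standard `csv` module rather than Spark. The three-column sample is made up for illustration and only imitates the shape of the datafile. With a header, the first line supplies the column names; without one, it is treated as an ordinary data row.

```python
import csv
import io

# Illustrative sample only; not the contents of diamonds.csv.
sample = "carat,cut,price\n0.23,Ideal,326\n0.21,Premium,326\n"

# First line used as column names (analogous to header=TRUE).
with_header = list(csv.DictReader(io.StringIO(sample)))

# First line kept as data (analogous to reading without a header).
without_header = list(csv.reader(io.StringIO(sample)))

print(len(with_header), with_header[0]["cut"])   # 2 Ideal
print(len(without_header), without_header[0])    # 3 ['carat', 'cut', 'price']
```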

Display all columns from the `diamonds` table.

In [16]:
%sql
select *
from diamonds

Note that the notebook displays only the first 1000 rows of a `select` result. The table itself still contains every row of the datafile.
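Because only the first 1000 rows are displayed, the total row count is better checked with `count(*)` than read off the output. The sketch below shows the difference using `sqlite3` and 1500 synthetic placeholder rows; both the engine and the data are assumptions for illustration, and `limit 1000` stands in for the display cap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table diamonds (id integer, carat real)")

# Synthetic placeholder rows, not real diamonds data.
conn.executemany("insert into diamonds values (?, ?)",
                 [(i, 0.25) for i in range(1500)])

# What a capped display would show versus what the table holds.
displayed = conn.execute("select * from diamonds limit 1000").fetchall()
total_rows = conn.execute("select count(*) from diamonds").fetchone()[0]
print(len(displayed), total_rows)  # 1000 1500
```

In the notebook itself, the equivalent check would be a `%sql` cell running `select count(*) from diamonds`.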

__The End__