# <img src ='https://airsblobstorage.blob.core.windows.net/airstream/Asset 275.png' width="50px"> Tables, Data Frames and Datasets

This notebook shows how to create DataFrames and how to query a table or DataFrame that you uploaded to DBFS.

## The next command creates a table from a Databricks dataset

In [0]:
%sql 
DROP TABLE IF EXISTS diamonds;

-- Create a table named diamonds in the default database, backed by the CSV file below
CREATE TABLE diamonds
USING csv
OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true")

In [0]:
%sql

SELECT * FROM diamonds

In [0]:
# Create a Delta table.
# This writes the diamonds data in Delta format to the /delta/diamonds folder in DBFS.
diamonds = spark.read.csv("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header="true", inferSchema="true")
diamonds.write.format("delta").mode("overwrite").save("/delta/diamonds")

In [0]:
%sql
DROP TABLE IF EXISTS diamonds;

-- The statement above drops any existing diamonds table.
-- Create a new table named diamonds in the default database from the Delta files stored in the /delta/diamonds folder in DBFS
CREATE TABLE diamonds USING DELTA LOCATION '/delta/diamonds/'

In [0]:
%sql
SELECT * FROM diamonds

## The next command manipulates the data and displays the results 

Specifically, the command:
1. Selects color and price columns, averages the price, and groups and orders by color.
1. Displays a table of the results.

In [0]:
%sql
SELECT color, avg(price) AS price FROM diamonds GROUP BY color ORDER BY color

## Convert the table to a chart

Under the table, click the bar chart <img src="http://docs.databricks.com/_static/images/notebooks/chart-button.png"/> icon.

## Repeat the same operations using the Python DataFrame API
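The Python cells in this notebook carry no magic command, which means Python is the notebook's default language. The SQL cells above are passed to the SQL interpreter with the `%sql` magic command; in a notebook whose default language is SQL, the Python cells would need the `%python` magic command instead.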

## The next command creates a DataFrame from a Databricks dataset

In [0]:
diamonds = spark.read.csv("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header="true", inferSchema="true")
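
The `header` and `inferSchema` options in the read above control how the CSV text becomes columns: with `header` enabled the first line supplies the column names, and with `inferSchema` enabled Spark examines the data to pick column types instead of reading every column as a string. The sketch below imitates that behavior with Python's standard `csv` module so it can run without a cluster. It is only an approximation: the sample rows are made up, the `infer` helper is hypothetical, and it converts value by value, whereas Spark decides one type per column.

```python
import csv
import io

# Made-up sample in the shape of a CSV file with a header line
# (not copied from the diamonds dataset).
SAMPLE = """carat,color,price
0.23,E,326
0.21,E,326
0.29,I,334
"""


def infer(value):
    """Hypothetical helper: try int, then float, otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


# Like header="true": the first line becomes the column names.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Without type inference, every value stays a string.
print(rows[0])

# Rough analogue of inferSchema="true": numeric-looking values are converted.
typed = [{name: infer(value) for name, value in row.items()} for row in rows]
print(typed[0])
```

In the notebook itself, `diamonds.printSchema()` shows the types Spark actually chose for each column.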

## The next command manipulates the data and displays the results

In [0]:
from pyspark.sql.functions import avg

# Alias the aggregate so the column is named price, matching the SQL query above
display(diamonds.select("color","price").groupBy("color").agg(avg("price").alias("price")).sort("color"))
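
Both the SQL query and the DataFrame chain express the same three steps: group rows by color, average the price within each group, and order the result by color. As a rough illustration of those semantics (not of how Spark executes them), here is a plain-Python sketch. The `(color, price)` pairs are invented for the example and the function name `average_price_by_color` is hypothetical.

```python
from collections import defaultdict

# Invented (color, price) pairs; not taken from the diamonds dataset.
rows = [("E", 300), ("D", 500), ("E", 500), ("D", 700), ("J", 250)]


def average_price_by_color(pairs):
    """Group prices by color, average each group, and sort by color."""
    prices = defaultdict(list)
    for color, price in pairs:
        prices[color].append(price)  # GROUP BY color
    # avg(price) per group, then ORDER BY color
    return [(color, sum(p) / len(p)) for color, p in sorted(prices.items())]


result = average_price_by_color(rows)
print(result)
```

Spark performs the same computation in a distributed fashion across partitions, which is why the notebook uses the DataFrame API rather than a local loop like this one.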