# Spark Tables

This notebook shows how to use the Spark Catalog interface to query metadata about databases, tables, and columns.

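A full list of documented methods is available [here](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Catalog). The linked page documents the PySpark `Catalog` class; this notebook uses the corresponding Scala API.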

In [1]:
val us_flights_file = "../../databricks-datasets/learning-spark-v2/flights/departuredelays.csv"

Intitializing Scala interpreter ...

Spark Web UI available at http://192.168.0.11:4050
SparkContext available as 'sc' (version = 3.1.2, master = local[*], app id = local-1633075528522)
SparkSession available as 'spark'


us_flights_file: String = ../../databricks-datasets/learning-spark-v2/flights/departuredelays.csv


## Create Managed Tables

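The `CREATE TABLE` statement in the next cell has no `USING` clause, so Spark treats it as a Hive table. In a session built without Hive support it fails with the `AnalysisException` shown under the cell. For background on this error, see https://stackoverflow.com/questions/50914102/why-do-i-get-a-hive-support-is-required-to-create-hive-table-as-select-error/54552891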

In [3]:
// Create database and managed tables
spark.sql("DROP DATABASE IF EXISTS learn_spark_db CASCADE") 
spark.sql("CREATE DATABASE learn_spark_db")
spark.sql("USE learn_spark_db")
spark.sql("CREATE TABLE us_delay_flights_tbl(date STRING, delay INT, distance INT, origin STRING, destination STRING)")

org.apache.spark.sql.AnalysisException:  Hive support is required to CREATE Hive TABLE (AS SELECT);
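
One way around this error, sketched here on the assumption that Hive support is not actually needed, is to name a built-in data source in a `USING` clause so that Spark creates a data source table instead of a Hive table. The `parquet` format and the `IF NOT EXISTS` guards are illustrative choices, not part of the original notebook; building the session with `enableHiveSupport()` is the other common route.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's session when one is already running.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS learn_spark_db")
spark.sql("USE learn_spark_db")

// USING parquet (an assumed format choice) makes this a data source table,
// which does not require Hive support.
spark.sql("""
  CREATE TABLE IF NOT EXISTS us_delay_flights_tbl (
    date STRING, delay INT, distance INT, origin STRING, destination STRING
  ) USING parquet
""")
```

Because no path is given, the table's data is stored under the Spark warehouse directory and the table is managed by Spark.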

## Display the databases

In [5]:
spark.catalog.listDatabases()

res2: org.apache.spark.sql.Dataset[org.apache.spark.sql.catalog.Database] = [name: string, description: string ... 1 more field]
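
The cell above only returns a `Dataset[Database]`, so the REPL prints its type rather than its rows. A minimal sketch of how the contents could be printed, assuming the notebook's `spark` session:

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's session when one is already running.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Print each database's name, description, and location without truncating columns.
spark.catalog.listDatabases().show(false)

// The database that unqualified table names currently resolve against.
println(spark.catalog.currentDatabase)
```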


## Read our US Flights table

In [8]:
val df = (spark.read.format("csv")
      .schema("date STRING, delay INT, distance INT, origin STRING, destination STRING")
      .option("header", "true")
      .option("path", "departuredelays.csv")
      .load())

df: org.apache.spark.sql.DataFrame = [date: string, delay: int ... 3 more fields]
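
The schema in the cell above is given as a DDL string. As a hedged illustration of what that string stands for, the sketch below parses it with `StructType.fromDDL` and compares it with the same schema written out field by field; the names `ddl`, `fromDdl`, and `explicit` are introduced here only for the example.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// The DDL string passed to .schema(...) in the cell above.
val ddl = "date STRING, delay INT, distance INT, origin STRING, destination STRING"
val fromDdl = StructType.fromDDL(ddl)

// The same schema spelled out explicitly (fields are nullable by default).
val explicit = StructType(Seq(
  StructField("date", StringType),
  StructField("delay", IntegerType),
  StructField("distance", IntegerType),
  StructField("origin", StringType),
  StructField("destination", StringType)
))

// Print the parsed schema as a tree and check whether both forms agree.
println(fromDdl.treeString)
println(s"Schemas match: ${fromDdl == explicit}")
```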


## Save into our table

In [10]:
df.write.mode("overwrite").saveAsTable("us_delay_flights_tbl_scala")
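
`saveAsTable` produces no output of its own. A quick way to confirm the write, sketched under the assumption that the notebook's `spark` session is still pointing at the same database, is to ask the catalog whether the table exists and read it back:

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's session when one is already running.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

val tableName = "us_delay_flights_tbl_scala"

if (spark.catalog.tableExists(tableName)) {
  // Read the saved table back through the catalog and inspect it.
  val saved = spark.table(tableName)
  saved.printSchema()
  println(s"$tableName holds ${saved.count()} rows")
} else {
  println(s"$tableName was not found in ${spark.catalog.currentDatabase}")
}
```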

## Display tables within a Database

Note that the table is MANAGED by Spark.

In [12]:
spark.catalog.listTables()

res7: org.apache.spark.sql.Dataset[org.apache.spark.sql.catalog.Table] = [name: string, database: string ... 3 more fields]
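
To see the managed status mentioned above, the table metadata has to be printed rather than just returned. A minimal sketch, assuming the notebook's `spark` session, that lists the tables in the current database together with their `tableType`:

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's session when one is already running.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// tableType distinguishes managed tables from external (unmanaged) ones.
spark.catalog.listTables()
  .select("name", "database", "tableType", "isTemporary")
  .show(false)
```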


## Display Columns for a table

In [13]:
spark.catalog.listColumns("us_delay_flights_tbl_scala")

res8: org.apache.spark.sql.Dataset[org.apache.spark.sql.catalog.Column] = [name: string, description: string ... 4 more fields]
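
As with the other catalog calls, `listColumns` returns a Dataset whose rows still need to be displayed. The sketch below, which assumes the notebook's `spark` session and guards against the table being absent, prints a few of the column attributes:

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's session when one is already running.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

val tableName = "us_delay_flights_tbl_scala"

// listColumns raises an AnalysisException for a missing table, so check first.
if (spark.catalog.tableExists(tableName)) {
  spark.catalog.listColumns(tableName)
    .select("name", "dataType", "nullable", "isPartition")
    .show(false)
} else {
  println(s"$tableName was not found in ${spark.catalog.currentDatabase}")
}
```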


## Create Unmanaged Tables

In [14]:
// Drop the database and create unmanaged tables
spark.sql("DROP DATABASE IF EXISTS learn_spark_db CASCADE")
spark.sql("CREATE DATABASE learn_spark_db")
spark.sql("USE learn_spark_db")
spark.sql("CREATE TABLE us_delay_flights_tbl (date STRING, delay INT, distance INT, origin STRING, destination STRING) USING csv OPTIONS (path '/databricks-datasets/learning-spark-v2/flights/departuredelays.csv')")

res9: org.apache.spark.sql.DataFrame = []
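
Because this table is defined over an explicit `path`, Spark tracks only its metadata, and dropping the table leaves the CSV file in place. A small sketch, assuming the notebook's `spark` session, of how the catalog could be used to confirm the table type, which should be reported as `EXTERNAL` for an unmanaged table:

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's session when one is already running.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

val tableName = "us_delay_flights_tbl"

if (spark.catalog.tableExists(tableName)) {
  // getTable returns the catalog entry, including the table type.
  val table = spark.catalog.getTable(tableName)
  println(s"${table.name}: ${table.tableType}")
} else {
  println(s"$tableName was not found in ${spark.catalog.currentDatabase}")
}
```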
