# Lecture 15. Databases and Tables on Databricks (Hands On)


In this notebook, we will work with databases and tables on Databricks.


## Take a Look at the Catalog Explorer

<div style="text-align: center;">
<img src="../../assets/images/Screen-Captures/Databricks Catalog explorer.jpg" style="width:1280px" >
</div> 

## Managed Tables

Let us create a table called `managed_default` and insert some data.

In [0]:
%sql
USE CATALOG hive_metastore;

CREATE TABLE managed_default
  (width INT, length INT, height INT);

INSERT INTO managed_default
VALUES (3, 2, 1)

num_affected_rows,num_inserted_rows
1,1


It is a managed table because we did not specify the `LOCATION` keyword.

If we go back to the Catalog Explorer,
we see that the table `managed_default` has been created under the default database.

<div style="text-align: center;">
<img src="../../assets/images/Screen-Captures/Catalog Explorer - hive_metastore - default - managed_default.jpg" style="width:1280px" >
</div>

Let us run the `DESCRIBE EXTENDED` command on our table to see some metadata information.

In [0]:
%sql
DESCRIBE EXTENDED managed_default

col_name,data_type,comment
width,int,
length,int,
height,int,
,,
# Delta Statistics Columns,,
Column Names,"width, length, height",
Column Selection Method,first-32,
,,
# Detailed Table Information,,
Catalog,hive_metastore,


The full output contains 21 rows of metadata.

Scrolling through it reveals two useful pieces of information about our table.

The first is the location: the table is stored under the default Hive warehouse directory, `dbfs:/user/hive/warehouse`.
The second is the table type, which is `Managed`. It is a managed table
because we did not specify the `LOCATION` keyword when creating the table.
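
Picking the type and location out of a long `DESCRIBE EXTENDED` result can also be done programmatically. The following is a minimal sketch, assuming the rows arrive as `(col_name, data_type, comment)` tuples as in the output above. The helper name `table_details` is hypothetical, and the sample rows are illustrative values based on the type and location described in this section, not a captured output.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[str]]


def table_details(rows: Iterable[Row]) -> Dict[str, str]:
    """Collect the key/value rows listed after '# Detailed Table Information'."""
    details: Dict[str, str] = {}
    in_details = False
    for col_name, data_type, _comment in rows:
        key = (col_name or "").strip()
        if key == "# Detailed Table Information":
            in_details = True  # everything after this header is table metadata
            continue
        if in_details and key:
            details[key] = data_type or ""
    return details


# Illustrative rows (assumption): shaped like the DESCRIBE EXTENDED output above.
sample_rows = [
    ("width", "int", ""),
    ("length", "int", ""),
    ("height", "int", ""),
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Catalog", "hive_metastore", ""),
    ("Type", "MANAGED", ""),
    ("Location", "dbfs:/user/hive/warehouse/managed_default", ""),
]

details = table_details(sample_rows)
print(details["Type"], details["Location"])
```

In a Databricks notebook, where the `spark` session is predefined, the rows could come from `spark.sql("DESCRIBE EXTENDED managed_default").collect()` instead of the sample list.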

## External Tables

Let us now create an external table under the default database.

To create an external table,
simply add the `LOCATION` keyword to the `CREATE TABLE` statement,
followed by the path where the table should be stored.
In our case, we store this table under the `/mnt/demo` directory.

Let us create this table and insert some data in it.

In [0]:
%sql
CREATE TABLE external_default
  (width INT, length INT, height INT)
LOCATION 'dbfs:/mnt/demo/external_default';
  
INSERT INTO external_default
VALUES (3, 2, 1)

num_affected_rows,num_inserted_rows
1,1


Let us take a look at the Catalog Explorer.

<div style="text-align: center;">
<img src="../../assets/images/Screen-Captures/Catalog Explorer - hive_metastore - default - external_default.jpg" style="width:1280px" >
</div> 

Here we can see that the table has been created in the Hive metastore.

Let us now run `DESCRIBE EXTENDED` on our external table.

In [0]:
%sql
DESCRIBE EXTENDED external_default

col_name,data_type,comment
width,int,
length,int,
height,int,
,,
# Delta Statistics Columns,,
Column Names,"width, length, height",
Column Selection Method,first-32,
,,
# Detailed Table Information,,
Catalog,hive_metastore,


Here we can see that this table is indeed an external table, and that it is stored in the location we specified under `/mnt/demo`.
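
The only syntactic difference between the two kinds of table is the optional `LOCATION` clause. As a minimal sketch of that rule, the hypothetical helper `create_table_sql` below builds both statements from the same column list. It does no quoting or escaping, so it is only suitable for trusted, hard-coded names like the ones in this notebook.

```python
from typing import Optional, Sequence, Tuple


def create_table_sql(
    name: str,
    columns: Sequence[Tuple[str, str]],
    location: Optional[str] = None,
) -> str:
    """Build a CREATE TABLE statement; adding a location makes the table external."""
    cols = ", ".join(f"{col} {dtype}" for col, dtype in columns)
    stmt = f"CREATE TABLE {name} ({cols})"
    if location is not None:
        # With LOCATION the table is external; without it, the table is managed.
        stmt += f" LOCATION '{location}'"
    return stmt


columns = [("width", "INT"), ("length", "INT"), ("height", "INT")]

managed_stmt = create_table_sql("managed_default", columns)
external_stmt = create_table_sql(
    "external_default", columns, location="dbfs:/mnt/demo/external_default"
)

print(managed_stmt)
print(external_stmt)
```

In a Databricks Python cell, either string could then be passed to `spark.sql(...)`.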

## Dropping Tables

Let us now see what will happen if we drop the managed table.

In [0]:
%sql
DROP TABLE managed_default

The table has been successfully deleted.

Let us confirm this by checking the table directory.

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/managed_default'

And indeed, the table directory and its data files have all been removed.

Let us now drop the external table and see what will happen.

In [0]:
%sql
DROP TABLE external_default

In the Hive metastore, we see that both tables no longer exist.

<div style="text-align: center;">
<img src="../../assets/images/Screen-Captures/Catalog Explorer - hive_metastore - default - none.jpg" style="width:1280px" >
</div> 

However, let us check the directory of the external table.

In [0]:
%fs ls 'dbfs:/mnt/demo/external_default'

path,name,size,modificationTime
dbfs:/mnt/demo/external_default/_delta_log/,_delta_log/,0,1728742787000
dbfs:/mnt/demo/external_default/part-00000-ca93c717-d1cf-4d50-a682-807e577d2441-c000.snappy.parquet,part-00000-ca93c717-d1cf-4d50-a682-807e577d2441-c000.snappy.parquet,1045,1728742789000



We see that the table directory and the data files are still there.

Since this table is created outside the database directory, the underlying data is not managed by Hive.

So, dropping the table will not delete the underlying data files as we see here.
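
The difference in drop behavior can be summarized with a toy model. The sketch below is a pure-Python simulation, not a Databricks or Hive API: the class `ToyMetastore` and its methods are hypothetical, and storage is modeled as a plain set of directory paths.

```python
from typing import Dict, Set, Tuple


class ToyMetastore:
    """Toy in-memory model of table metadata plus storage (illustration only)."""

    def __init__(self) -> None:
        self.tables: Dict[str, Tuple[str, bool]] = {}  # name -> (location, is_external)
        self.storage: Set[str] = set()  # directories that currently "exist"

    def create_table(self, name: str, location: str, external: bool) -> None:
        self.tables[name] = (location, external)
        self.storage.add(location)

    def drop_table(self, name: str) -> None:
        location, external = self.tables.pop(name)  # metadata is always removed
        if not external:
            self.storage.discard(location)  # only managed data is deleted


store = ToyMetastore()
store.create_table(
    "managed_default", "dbfs:/user/hive/warehouse/managed_default", external=False
)
store.create_table(
    "external_default", "dbfs:/mnt/demo/external_default", external=True
)

store.drop_table("managed_default")
store.drop_table("external_default")

print(store.tables)           # {}
print(sorted(store.storage))  # ['dbfs:/mnt/demo/external_default']
```

Both tables disappear from the metadata, but only the external table's directory survives, which mirrors the `%fs ls` results above.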

## Creating Schemas

In addition to the default database, we can also create extra databases.

To do so, we can use either the `CREATE SCHEMA` syntax or the `CREATE DATABASE` syntax; the two are equivalent.

In [0]:
%sql
CREATE SCHEMA new_default

Here we can see that the new database has been created.

<div style="text-align: center;">
<img src="../../assets/images/Screen-Captures/Catalog Explorer - hive_metastore - new_default.jpg" style="width:1280px" >
</div> 

Let us run `DESCRIBE DATABASE EXTENDED` on our database to see some metadata information.

In [0]:
%sql
DESCRIBE DATABASE EXTENDED new_default

database_description_item,database_description_value
Catalog Name,hive_metastore
Namespace Name,new_default
Comment,
Location,dbfs:/user/hive/warehouse/new_default.db
Owner,root
Properties,


Here we can see that the database itself is created under the default Hive warehouse directory.

Notice that the database folder has a `.db` extension to differentiate it from the table folders in the same directory.
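
The naming convention observed so far can be written down as a small rule. This is a minimal sketch based only on the paths shown in this notebook: the warehouse root is the one reported above, and the helper names `schema_location` and `managed_table_path` are hypothetical.

```python
WAREHOUSE_ROOT = "dbfs:/user/hive/warehouse"  # default location seen in this notebook


def schema_location(schema: str) -> str:
    """Return the expected folder of a schema created without LOCATION."""
    if schema == "default":
        # Tables of the default database sit directly under the warehouse root.
        return WAREHOUSE_ROOT
    # Any other database gets its own folder with a .db extension.
    return f"{WAREHOUSE_ROOT}/{schema}.db"


def managed_table_path(schema: str, table: str) -> str:
    """Return the expected folder of a managed table inside that schema."""
    return f"{schema_location(schema)}/{table}"


print(managed_table_path("default", "managed_default"))
print(schema_location("new_default"))
print(managed_table_path("new_default", "managed_new_default"))
```

The second printed path matches the `Location` row returned by `DESCRIBE DATABASE EXTENDED new_default`.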


Let us create some tables in this new database.
Here, we will again create a managed table and an external table.

To create a table in a database other than the default one, you need to select that database with the `USE` keyword.

Let us run this command.

In [0]:
%sql
USE new_default;

CREATE TABLE managed_new_default
  (width INT, length INT, height INT);
  
INSERT INTO managed_new_default
VALUES (3, 2, 1);

-----------------------------------

CREATE TABLE external_new_default
  (width INT, length INT, height INT)
LOCATION 'dbfs:/mnt/demo/external_new_default';
  
INSERT INTO external_new_default
VALUES (3, 2, 1);

num_affected_rows,num_inserted_rows
1,1


In the Catalog Explorer, we see that the two tables have been created.

<div style="text-align: center;">
<img src="../../assets/images/Screen-Captures/Catalog Explorer - hive_metastore - new_default - created tables.jpg" style="width:1280px" >
</div> 

Here, we can see that the first table is indeed a managed table, created in its database folder under the default Hive warehouse directory.

In [0]:
%sql
DESCRIBE EXTENDED managed_new_default

col_name,data_type,comment
width,int,
length,int,
height,int,
,,
# Delta Statistics Columns,,
Column Names,"width, length, height",
Column Selection Method,first-32,
,,
# Detailed Table Information,,
Catalog,hive_metastore,



And the second table, where we used the `LOCATION` keyword, has been defined as an external table under the `/mnt/demo` location.

In [0]:
%sql
DESCRIBE EXTENDED external_new_default

col_name,data_type,comment
width,int,
length,int,
height,int,
,,
# Delta Statistics Columns,,
Column Names,"width, length, height",
Column Selection Method,first-32,
,,
# Detailed Table Information,,
Catalog,hive_metastore,


We can now drop these two tables and confirm once more that
the table directory and the data files of the managed table have all been removed.

In [0]:
%sql
DROP TABLE managed_new_default;
DROP TABLE external_new_default;


In the Catalog Explorer, we see that both tables have been dropped from the new database.

<div style="text-align: center;">
<img src="../../assets/images/Screen-Captures/Catalog Explorer - hive_metastore - new_default.jpg" style="width:1280px" >
</div> 

However, as expected, the table directory and the data files of the external table are still there.

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/new_default.db/managed_new_default'

In [0]:
%fs ls 'dbfs:/mnt/demo/external_new_default'

path,name,size,modificationTime
dbfs:/mnt/demo/external_new_default/_delta_log/,_delta_log/,0,1728743689000
dbfs:/mnt/demo/external_new_default/part-00000-d629ace9-52bc-4a67-ba37-c70eba8be21b-c000.snappy.parquet,part-00000-d629ace9-52bc-4a67-ba37-c70eba8be21b-c000.snappy.parquet,1045,1728743690000


## Creating Schemas in Custom Location

Finally, let us create a database in a custom location outside of the Hive warehouse directory.

In [0]:
%sql
CREATE SCHEMA custom
LOCATION 'dbfs:/Shared/schemas/custom.db'

As we can see in the Catalog Explorer, the database has indeed been created in the Hive metastore.

<div style="text-align: center;">
<img src="../../assets/images/Screen-Captures/Catalog Explorer - hive_metastore - custom.jpg" style="width:1280px" >
</div> 

However, if we run `DESCRIBE DATABASE EXTENDED`,
we see that the database is stored in the custom location we specified when creating it,
which differs from the default Hive warehouse directory.

In [0]:
%sql
DESCRIBE DATABASE EXTENDED custom

database_description_item,database_description_value
Catalog Name,hive_metastore
Namespace Name,custom
Comment,
Location,dbfs:/Shared/schemas/custom.db
Owner,root
Properties,


There is nothing special about this database.

You can create managed and external tables in it as usual.

In [0]:
%sql
USE custom;

CREATE TABLE managed_custom
  (width INT, length INT, height INT);
  
INSERT INTO managed_custom
VALUES (3, 2, 1);

-----------------------------------

CREATE TABLE external_custom
  (width INT, length INT, height INT)
LOCATION 'dbfs:/mnt/demo/external_custom';
  
INSERT INTO external_custom
VALUES (3, 2, 1);

num_affected_rows,num_inserted_rows
1,1


In the Hive metastore, we can see the tables of our `custom` database.

<div style="text-align: center;">
<img src="../../assets/images/Screen-Captures/Catalog Explorer - hive_metastore - custom - created tables.jpg" style="width:1280px" >
</div> 

The `managed_custom` table is indeed a managed table:
it is stored in the database folder, which sits in the custom location.

In [0]:
%sql
DESCRIBE EXTENDED managed_custom

col_name,data_type,comment
width,int,
length,int,
height,int,
,,
# Delta Statistics Columns,,
Column Names,"width, length, height",
Column Selection Method,first-32,
,,
# Detailed Table Information,,
Catalog,hive_metastore,


And the second table is an external table since it is created outside the database directory.

In [0]:
%sql
DESCRIBE EXTENDED external_custom

col_name,data_type,comment
width,int,
length,int,
height,int,
,,
# Delta Statistics Columns,,
Column Names,"width, length, height",
Column Selection Method,first-32,
,,
# Detailed Table Information,,
Catalog,hive_metastore,


By dropping the two tables,

In [0]:
%sql
DROP TABLE managed_custom;
DROP TABLE external_custom;

we can see that they have been successfully deleted from the Hive metastore.

<div style="text-align: center;">
<img src="../../assets/images/Screen-Captures/Catalog Explorer - hive_metastore - custom.jpg" style="width:1280px" >
</div> 

The managed table's directory and data files no longer exist in the database directory at the custom location.

In [0]:
%fs ls 'dbfs:/Shared/schemas/custom.db/managed_custom'

The external table's directory and data files, by contrast, are not deleted and remain in the external location outside the database directory.

In [0]:
%fs ls 'dbfs:/mnt/demo/external_custom'

path,name,size,modificationTime
dbfs:/mnt/demo/external_custom/_delta_log/,_delta_log/,0,1728744272000
dbfs:/mnt/demo/external_custom/part-00000-629ebb7f-0c27-4868-b7b8-f6b2c512f10e-c000.snappy.parquet,part-00000-629ebb7f-0c27-4868-b7b8-f6b2c512f10e-c000.snappy.parquet,1045,1728744274000
