In [4]:
%run "./Includes/Classroom-Setup"

### Optimization of Data Storage with Managed and Unmanaged Tables

A **managed table** is a table for which Spark manages both the metadata and the data itself. As a result, a `DROP TABLE` command removes both the table's metadata and its data.

For an **unmanaged table** (reported by Spark as `EXTERNAL`), Spark manages only the metadata, such as the schema and the data location. The data itself lives at a path you specify, often in an object store such as Azure Blob Storage or Amazon S3. Dropping an unmanaged table removes only its metadata and leaves the data in place.
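The difference in drop behavior can be sketched with a toy, pure-Python model of a metastore plus a file system. Everything here (`ToyCatalog`, its methods, the dictionary standing in for storage, the `/warehouse` root) is hypothetical and only illustrates the semantics described above; it is not Spark's API.

```python
from dataclasses import dataclass, field

@dataclass
class ToyCatalog:
    """Hypothetical model: a metastore (table metadata) plus storage (path -> rows)."""
    warehouse: str = "/warehouse"
    metastore: dict = field(default_factory=dict)  # table name -> metadata
    files: dict = field(default_factory=dict)      # path -> stored rows

    def save_as_table(self, name, rows, path=None):
        # No explicit path: the table is managed and stored under the warehouse root.
        managed = path is None
        location = f"{self.warehouse}/{name.lower()}" if managed else path
        self.files[location] = list(rows)
        self.metastore[name.lower()] = {
            "type": "MANAGED" if managed else "EXTERNAL",
            "location": location,
        }
        return location

    def drop_table(self, name):
        meta = self.metastore.pop(name.lower())  # metadata is always removed
        if meta["type"] == "MANAGED":
            self.files.pop(meta["location"], None)  # data is removed only for managed tables

catalog = ToyCatalog()
m = catalog.save_as_table("myTableManaged", range(1, 100))
u = catalog.save_as_table("myTableUnmanaged", range(1, 100), path="/data/myTableUnmanaged")
catalog.drop_table("myTableManaged")
catalog.drop_table("myTableUnmanaged")
print(m in catalog.files, u in catalog.files)  # False True
```

After both drops the metastore is empty, but the unmanaged table's files are still present in storage.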


### Writing to a Managed Table

Saving a DataFrame as a table registers it in the metastore, so its data can be queried by name through Spark SQL.

Create a DataFrame containing the integers 1 through 99 (`spark.range` excludes its upper bound).

In [9]:
df = spark.range(1, 100)

display(df)

id
1
2
3
4
5
6
7
8
9
10


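Save the DataFrame as a managed table. Because no path is given, Spark stores the data in the default warehouse location.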

In [11]:
df.write.mode("OVERWRITE").saveAsTable("myTableManaged")

Use `DESCRIBE EXTENDED` to view the table's metadata. Scroll down to the `Type` row, which reads `MANAGED`.

Notice that the location is `dbfs:/user/hive/warehouse/<your database>.db/mytablemanaged`, inside the default Hive warehouse directory.

In [13]:
%sql
DESCRIBE EXTENDED myTableManaged

col_name,data_type,comment
id,bigint,
,,
# Detailed Table Information,,
Database,jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp,
Table,mytablemanaged,
Owner,root,
Created Time,Tue Apr 14 08:23:18 UTC 2020,
Last Access,Thu Jan 01 00:00:00 UTC 1970,
Created By,Spark 2.4.4,
Type,MANAGED,
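To read fields such as `Type` programmatically instead of scrolling, one option is to turn the rows of `DESCRIBE EXTENDED` into a dictionary. The sketch below uses a hypothetical helper, `table_details`, and runs on plain `(col_name, data_type, comment)` tuples copied from the output above; in a notebook you could pass `spark.sql("DESCRIBE EXTENDED myTableManaged").collect()` instead, since PySpark `Row` objects are tuples.

```python
def table_details(rows):
    """Collect the key/value pairs listed after '# Detailed Table Information'."""
    details, in_section = {}, False
    for col_name, data_type, _comment in rows:
        if col_name == "# Detailed Table Information":
            in_section = True  # everything after this marker is table-level metadata
        elif in_section and col_name:
            details[col_name] = data_type
    return details

# Rows copied from the DESCRIBE EXTENDED output above (subset).
rows = [
    ("id", "bigint", ""),
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Table", "mytablemanaged", ""),
    ("Owner", "root", ""),
    ("Created By", "Spark 2.4.4", ""),
    ("Type", "MANAGED", ""),
]
print(table_details(rows)["Type"])  # MANAGED
```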


### Writing to an Unmanaged Table

To write an unmanaged table instead, pass a `path` option to the writer. `saveAsTable` then stores the data at that path rather than in the warehouse directory.

In [15]:
unmanagedPath = f"{workingDir}/myTableUnmanaged"

df.write.mode("OVERWRITE").option('path', unmanagedPath).saveAsTable("myTableUnmanaged")
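The rule that decides where the data lands fits in a few lines. `resolve_location` below is a hypothetical helper that mirrors the behavior observed in this notebook, assuming the warehouse root `dbfs:/user/hive/warehouse` seen in the managed table's path: with a `path` option the table is external and uses that path; without one it is managed and stored under `<warehouse>/<database>.db/<table name in lowercase>`. For the `default` database, Spark places managed tables directly in the warehouse root, without a `.db` subdirectory.

```python
WAREHOUSE = "dbfs:/user/hive/warehouse"  # assumed warehouse root, as seen in this notebook

def resolve_location(database, table, path=None):
    """Hypothetical: return (table_type, location) for a saveAsTable call."""
    if path is not None:
        return "EXTERNAL", path  # explicit path -> unmanaged table
    # No path -> managed table under the database directory; names are lowercased.
    db_dir = WAREHOUSE if database == "default" else f"{WAREHOUSE}/{database}.db"
    return "MANAGED", f"{db_dir}/{table.lower()}"
```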

Now examine the table type and the location of the data. This time the `Type` row reads `EXTERNAL`, the label Spark uses for unmanaged tables.

In [17]:
%sql
DESCRIBE EXTENDED myTableUnmanaged

col_name,data_type,comment
id,bigint,
,,
# Detailed Table Information,,
Database,jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp,
Table,mytableunmanaged,
Owner,root,
Created Time,Tue Apr 14 08:24:13 UTC 2020,
Last Access,Thu Jan 01 00:00:00 UTC 1970,
Created By,Spark 2.4.4,
Type,EXTERNAL,


### Dropping Tables

Dropping a table behaves differently in the two cases, as the following steps show.

First, list the files that back the managed table.

In [20]:
hivePath = f"dbfs:/user/hive/warehouse/{databaseName}.db/mytablemanaged"

display(dbutils.fs.ls(hivePath))

path,name,size
dbfs:/user/hive/warehouse/jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp.db/mytablemanaged/_SUCCESS,_SUCCESS,0
dbfs:/user/hive/warehouse/jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp.db/mytablemanaged/_committed_8123921731873644026,_committed_8123921731873644026,824
dbfs:/user/hive/warehouse/jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp.db/mytablemanaged/_started_8123921731873644026,_started_8123921731873644026,0
dbfs:/user/hive/warehouse/jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp.db/mytablemanaged/part-00000-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-474-1-c000.snappy.parquet,part-00000-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-474-1-c000.snappy.parquet,494
dbfs:/user/hive/warehouse/jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp.db/mytablemanaged/part-00001-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-475-1-c000.snappy.parquet,part-00001-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-475-1-c000.snappy.parquet,494
dbfs:/user/hive/warehouse/jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp.db/mytablemanaged/part-00002-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-476-1-c000.snappy.parquet,part-00002-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-476-1-c000.snappy.parquet,499
dbfs:/user/hive/warehouse/jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp.db/mytablemanaged/part-00003-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-477-1-c000.snappy.parquet,part-00003-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-477-1-c000.snappy.parquet,494
dbfs:/user/hive/warehouse/jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp.db/mytablemanaged/part-00004-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-478-1-c000.snappy.parquet,part-00004-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-478-1-c000.snappy.parquet,493
dbfs:/user/hive/warehouse/jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp.db/mytablemanaged/part-00005-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-479-1-c000.snappy.parquet,part-00005-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-479-1-c000.snappy.parquet,499
dbfs:/user/hive/warehouse/jose_manuel_bustos_munoz_everis_com_etl_part_2_etl2_07_table_management_psp.db/mytablemanaged/part-00006-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-480-1-c000.snappy.parquet,part-00006-tid-8123921731873644026-ac800257-735a-4778-b8ad-9936dca9fa98-480-1-c000.snappy.parquet,494
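Only the `part-…` files hold table data; `_SUCCESS`, `_started_…`, and `_committed_…` are bookkeeping markers left by the write. A hypothetical helper, `data_files`, can separate them and total the data size. It takes `(name, size)` pairs, here copied from the listing above with the part-file names shortened; against live storage you could build them with `[(f.name, f.size) for f in dbutils.fs.ls(hivePath)]`.

```python
def data_files(entries):
    """Keep only Parquet part files, skipping commit/success markers."""
    return [(name, size) for name, size in entries
            if name.startswith("part-") and name.endswith(".parquet")]

entries = [  # (name, size) from the listing above; part-file names shortened
    ("_SUCCESS", 0),
    ("_committed_8123921731873644026", 824),
    ("_started_8123921731873644026", 0),
    ("part-00000.snappy.parquet", 494),
    ("part-00001.snappy.parquet", 494),
    ("part-00002.snappy.parquet", 499),
    ("part-00003.snappy.parquet", 494),
    ("part-00004.snappy.parquet", 493),
    ("part-00005.snappy.parquet", 499),
    ("part-00006.snappy.parquet", 494),
]
parts = data_files(entries)
print(len(parts), sum(size for _, size in parts))  # 7 3467
```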


Drop the table.

In [22]:
%sql
DROP TABLE myTableManaged

Next, try to list the underlying data again.

In [24]:
try:
  display(dbutils.fs.ls(hivePath))
  
except Exception as e:
  print(e)

Because dropping a managed table also deletes its data, the listing fails and the exception message is printed instead. Now perform the same steps with the unmanaged table.
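The `try`/`except` check above can be wrapped in a reusable function. `path_exists` is a hypothetical helper that takes the listing function as an argument (for example `dbutils.fs.ls`) and treats any exception as a missing path. That is a simplification: other failures, such as permission errors, would also return `False`.

```python
def path_exists(ls, path):
    """Hypothetical helper: True if ls(path) succeeds, False if it raises."""
    try:
        ls(path)
        return True
    except Exception:  # dbutils.fs.ls raises when the path does not exist
        return False

# In a Databricks notebook (not runnable outside one):
# path_exists(dbutils.fs.ls, hivePath)       # False once the managed table is dropped
# path_exists(dbutils.fs.ls, unmanagedPath)  # True: external data is kept
```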

In [26]:
display(dbutils.fs.ls(unmanagedPath))

path,name,size
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/_SUCCESS,_SUCCESS,0
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/_committed_1253384154319975513,_committed_1253384154319975513,824
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/_started_1253384154319975513,_started_1253384154319975513,0
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00000-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-482-1-c000.snappy.parquet,part-00000-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-482-1-c000.snappy.parquet,494
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00001-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-483-1-c000.snappy.parquet,part-00001-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-483-1-c000.snappy.parquet,494
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00002-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-484-1-c000.snappy.parquet,part-00002-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-484-1-c000.snappy.parquet,499
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00003-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-485-1-c000.snappy.parquet,part-00003-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-485-1-c000.snappy.parquet,494
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00004-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-486-1-c000.snappy.parquet,part-00004-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-486-1-c000.snappy.parquet,493
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00005-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-487-1-c000.snappy.parquet,part-00005-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-487-1-c000.snappy.parquet,499
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00006-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-488-1-c000.snappy.parquet,part-00006-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-488-1-c000.snappy.parquet,494


Drop the unmanaged table.

In [28]:
%sql
DROP TABLE myTableUnmanaged

See if the data is still there.

In [30]:
display(dbutils.fs.ls(unmanagedPath))

path,name,size
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/_SUCCESS,_SUCCESS,0
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/_committed_1253384154319975513,_committed_1253384154319975513,824
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/_started_1253384154319975513,_started_1253384154319975513,0
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00000-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-482-1-c000.snappy.parquet,part-00000-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-482-1-c000.snappy.parquet,494
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00001-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-483-1-c000.snappy.parquet,part-00001-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-483-1-c000.snappy.parquet,494
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00002-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-484-1-c000.snappy.parquet,part-00002-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-484-1-c000.snappy.parquet,499
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00003-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-485-1-c000.snappy.parquet,part-00003-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-485-1-c000.snappy.parquet,494
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00004-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-486-1-c000.snappy.parquet,part-00004-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-486-1-c000.snappy.parquet,493
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00005-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-487-1-c000.snappy.parquet,part-00005-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-487-1-c000.snappy.parquet,499
dbfs:/user/jose.manuel.bustos.munoz@everis.com/etl_part_2/etl2_07_table_management_psp/myTableUnmanaged/part-00006-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-488-1-c000.snappy.parquet,part-00006-tid-1253384154319975513-cb34ed98-bbe3-4a2f-a1d0-584da37b525b-488-1-c000.snappy.parquet,494
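Because the files survive, the unmanaged table can be registered again over the same location without rewriting any data. Spark SQL supports `CREATE TABLE <name> USING <format> LOCATION '<path>'` for this. The hypothetical helper below only builds that statement, assuming the data is Parquet (as the `.snappy.parquet` file names suggest) and that the path contains no single quotes.

```python
def create_external_table_sql(table, path, fmt="parquet"):
    """Hypothetical: build DDL that registers an existing directory as an unmanaged table."""
    if "'" in path:
        raise ValueError("path must not contain single quotes")
    return f"CREATE TABLE {table} USING {fmt} LOCATION '{path}'"

# In the notebook (assumes `spark` and `unmanagedPath` from the cells above):
# spark.sql(create_external_table_sql("myTableUnmanaged", unmanagedPath))
```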


## Review
**Question:** What happens to the original data when I delete a managed table?  What about an unmanaged table?  
**Answer:** Deleting a managed table deletes both the metadata and the data itself. Deleting an unmanaged table does not delete the original data.

**Question:** What is a metastore?  
**Answer:** A metastore is a repository of metadata, such as where a table's data is stored and its schema. It does not contain the data itself.

In [33]:
%run "./Includes/Classroom-Cleanup"