# Lesson 14. Relational entities


## Learning Objectives

- Databases
- Tables
- The impact of the LOCATION keyword

## Hive metastore

  Every Databricks workspace has a central Hive metastore accessible by all clusters to persist table metadata.

  A *Hive metastore* is a repository of metadata that stores information about data structures, such as databases, tables, and partitions.
  ```
  hive_metastore
  |- Databases
  |- Tables
  |- ...
  ```

  In other words, it holds metadata about your tables and data: the table definition, the format of the data, and where the data is actually stored in the underlying storage.
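
  As a minimal sketch of how this metadata can be inspected from a notebook, the statements below list what the metastore has registered and show one table's details. The table name `my_table` is a hypothetical placeholder, not something defined in this lesson.

  ```SQL
  -- List the databases (schemas) registered in the metastore
  SHOW DATABASES;

  -- List the tables registered in the default database
  SHOW TABLES IN default;

  -- Show the table definition, data format and storage location
  -- (my_table is a hypothetical table name; replace it with your own)
  DESCRIBE EXTENDED my_table;
  ```

  In the `DESCRIBE EXTENDED` output, the rows that follow the column list are the ones that describe the format and the storage location of the data.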



## Databases

- A **database** is actually a schema in the Hive metastore.

  This is why, to create a database, you can use
  * either the `CREATE DATABASE` syntax
  * or the `CREATE SCHEMA` syntax,
  
  which is exactly equivalent.
  ```SQL
  -- The two statements are equivalent; run one or the other
  CREATE DATABASE db_name;
  CREATE SCHEMA db_name;
  ```

- By default you have a database called "default". 

  To create some tables in this default database,
  we use the `CREATE TABLE` statement without specifying any database name.

  <div style="text-align: center;">
  <img src="../../assets/images/hive_matastore default database create table.jpg" style="width:640px" >
  </div> 

  * In this case, the table definition will be under the default database in the Hive metastore.

  * The table data will be located under the *default Hive directory*, which is `/user/hive/warehouse`.

- In addition to the default database, we can create other databases.

  To do so, we use the `CREATE DATABASE` or `CREATE SCHEMA` syntax.

  <div style="text-align: center;">
  <img src="../../assets/images/hive_matastore custom database create table.jpg" style="width:640px" >
  </div> 

  The database will be created in the Hive metastore, and
  the database folder will be under the default Hive directory (`/user/hive/warehouse`).

  Notice that the database folder has the `.db` extension
  to distinguish it from table directories.

  Now we can use this database to create some tables.

  Again, the table definitions will be in the Hive metastore,
  and their data files will be under the database folder in the default Hive directory.

- It is also possible to create databases outside of the default Hive directory (`/user/hive/warehouse`).

  To do so, we again use the `CREATE SCHEMA` syntax,
  but this time with the `LOCATION` keyword,
  where we specify the path in which the database will be stored.

  <div style="text-align: center;">
  <img src="../../assets/images/hive_matastore custom path database create table.jpg" style="width:640px" >
  </div> 

  As usual, the database definition will be in the Hive metastore,
  while the database folder will be in the specified custom location outside the default Hive directory.

  We can then use this database to create some tables.

  The table definitions will be in the Hive metastore, and
  the actual data files for these tables will be stored
  in the database folder in that custom location.
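
The three cases described in this section can be sketched in SQL as follows. This is an illustration under stated assumptions rather than the exact code from the figures: the names `table_1`, `table_2`, `table_3`, `db_x`, `db_y`, the column list, and the path `dbfs:/custom/path/db_y.db` are all hypothetical.

```SQL
-- Case 1: table in the default database
-- Data files are expected under /user/hive/warehouse
USE default;
CREATE TABLE table_1 (id INT, name STRING);

-- Case 2: custom database under the default Hive directory
-- The database folder is expected at /user/hive/warehouse/db_x.db
CREATE SCHEMA db_x;
USE db_x;
CREATE TABLE table_2 (id INT, name STRING);

-- Case 3: custom database outside the default Hive directory
-- (the path below is a hypothetical example)
CREATE SCHEMA db_y
LOCATION 'dbfs:/custom/path/db_y.db';
USE db_y;
CREATE TABLE table_3 (id INT, name STRING);

-- Check where each database folder actually lives
DESCRIBE DATABASE EXTENDED db_x;
DESCRIBE DATABASE EXTENDED db_y;
```

In all three cases the tables are created without a `LOCATION` clause of their own, so their data files follow the folder of the database they belong to.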



## Tables

### Table types

  In Databricks, there are two types of tables: managed tables and external tables.

  <div style="text-align: center;">
  <img src="../../assets/images/Databricks 2 types of tables.jpg" style="width:640px" >
  </div>   

  * A *managed table* is a table created in the storage
    under the database directory, which is the default case.

    An *external table*, by contrast, is a table created in the storage outside the database directory, in a path specified by the `LOCATION` keyword.

  * For a managed table, Hive owns both the metadata and table data, 
    which means that it manages the lifecycle of the table.

    So when you drop a managed table, the underlying data files will be deleted.

    On the other hand, for external tables, Hive owns only the table metadata, but not the underlying data files.

    So when you drop an external table, the underlying data files will not be deleted.
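
  The difference in drop behavior can be sketched as below. The table names, the column list, and the path `dbfs:/mnt/demo/external_demo` are hypothetical, and the last query assumes the table was stored in Delta format (the Databricks default in recent runtimes).

  ```SQL
  -- Managed table: no LOCATION, so data goes under the database directory
  CREATE TABLE managed_demo (id INT, name STRING);

  -- External table: data goes to the path given by LOCATION
  -- (hypothetical path)
  CREATE TABLE external_demo (id INT, name STRING)
  LOCATION 'dbfs:/mnt/demo/external_demo';

  INSERT INTO managed_demo VALUES (1, 'a');
  INSERT INTO external_demo VALUES (1, 'a');

  -- The detailed table information reports the table type and location
  DESCRIBE EXTENDED managed_demo;
  DESCRIBE EXTENDED external_demo;

  -- Dropping the managed table removes the metadata and the data files
  DROP TABLE managed_demo;

  -- Dropping the external table removes only the metadata
  DROP TABLE external_demo;

  -- The external data files should still be readable by path
  -- (assumes Delta format)
  SELECT * FROM delta.`dbfs:/mnt/demo/external_demo`;
  ```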

### External table

  * We can create an external table in the default database simply 
    by using the `CREATE TABLE` statement with the `LOCATION` keyword.

    <div style="text-align: center;">
    <img src="../../assets/images/create an external table in the default database.jpg" style="width:640px" >
    </div> 
    
    The definition for this external table will be in the Hive metastore under the default database,
    while the data files will be stored in the specified external location.

  * In the same way, we can create an external table in any database.

    <div style="text-align: center;">
    <img src="../../assets/images/create an external table in any database.jpg" style="width:640px" >
    </div> 

    We simply select the database with the `USE` keyword,
    and create the table with the `LOCATION` keyword,
    followed by the path where this external table needs to be stored.

    We could choose the same path as in the default database example,
    or simply use another location, as in this case.

    Again, the table definition will be in the Hive metastore,
    while the data files will be in the specified external location.

  * Even if the database was created in a custom location outside of the default Hive directory, 
    we can still create an external table in this database.

    <div style="text-align: center;">
    <img src="../../assets/images/create an external table in outside database.jpg" style="width:640px" >
    </div> 

    Again, we select the database with the `USE` keyword,
    and we create the external table with the `LOCATION` keyword.

    Here, for example, we choose the same path as in the previous example.

    As before, the table definition will be in the Hive metastore,
    while the data files will be in that external location.
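
  The three external-table cases above can be sketched as follows. All names and paths (`db_x`, `db_y`, `ext_table_1` to `ext_table_3`, `dbfs:/some/path_1`, `dbfs:/some/path_2`, `dbfs:/custom/path/db_y.db`) are hypothetical and only illustrate the pattern.

  ```SQL
  -- External table in the default database
  USE default;
  CREATE TABLE ext_table_1 (id INT, name STRING)
  LOCATION 'dbfs:/some/path_1/ext_table_1';

  -- External table in a custom database stored under the default Hive directory
  CREATE SCHEMA IF NOT EXISTS db_x;
  USE db_x;
  CREATE TABLE ext_table_2 (id INT, name STRING)
  LOCATION 'dbfs:/some/path_2/ext_table_2';

  -- External table in a database that itself lives in a custom location
  CREATE SCHEMA IF NOT EXISTS db_y
  LOCATION 'dbfs:/custom/path/db_y.db';
  USE db_y;
  CREATE TABLE ext_table_3 (id INT, name STRING)
  LOCATION 'dbfs:/some/path_2/ext_table_3';
  ```

  In each case the `LOCATION` clause on the table, not the folder of the database, decides where the data files are written.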

