# Databases and Tables in Databricks (Hive Metastore)

In Databricks (and Hive), the terms **database** and **schema** are interchangeable.  
The Hive Metastore (HMS) stores metadata about databases, tables, and partitions.  

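By default, managed tables are stored in the Hive warehouse directory. Tables in the `default` schema go directly under:
`/user/hive/warehouse/`

Tables in any other schema go under that schema's own `.db` folder:
`/user/hive/warehouse/schema_name.db/`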
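
A minimal Python sketch of this layout convention follows. The helper `managed_table_path` is hypothetical (not a Databricks or Spark API), and the `dbfs:/user/hive/warehouse` root is an assumption: the real root typically comes from the warehouse configuration (Spark's `spark.sql.warehouse.dir`).

```python
# Hypothetical helper modelling the default Hive warehouse layout.
# Not a Databricks or Spark API; the warehouse root below is an assumption.
WAREHOUSE = "dbfs:/user/hive/warehouse"


def managed_table_path(schema: str, table: str, warehouse: str = WAREHOUSE) -> str:
    """Return the directory a managed table uses when no LOCATION is given."""
    root = warehouse.rstrip("/")
    if schema == "default":
        # Tables in the `default` schema sit directly under the warehouse root.
        return f"{root}/{table}"
    # Every other schema gets its own `<schema>.db` subdirectory.
    return f"{root}/{schema}.db/{table}"


print(managed_table_path("default", "managed_default"))
# dbfs:/user/hive/warehouse/managed_default
print(managed_table_path("new_schema", "managed_default"))
# dbfs:/user/hive/warehouse/new_schema.db/managed_default
```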

## Hive Metastore

The Hive Metastore (HMS) is a central metadata store for tables, schemas (databases), and partitions.

It stores table definitions, schema information, partitioning, and file locations, but not the data itself.

It is backed by a relational database (e.g., MySQL, PostgreSQL) and is shared by engines such as Hive, Spark, and Databricks.

In Databricks, `hive_metastore` is the default catalog in workspaces that are not enabled for Unity Catalog; you can also select it explicitly:

In [0]:
USE CATALOG hive_metastore;
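
Once a catalog is selected, unqualified table names are completed with the session's current catalog and schema. The sketch below models that lookup in plain Python; `qualify` is a hypothetical helper (not a Databricks API), and it deliberately ignores backtick-quoted identifiers that contain dots.

```python
# Hypothetical model of three-level name resolution (catalog.schema.table).
# Not a Databricks API; backtick-quoted names containing dots are not handled.
def qualify(name: str,
            current_catalog: str = "hive_metastore",
            current_schema: str = "default") -> str:
    """Expand a table identifier to its catalog.schema.table form."""
    parts = name.split(".")
    if len(parts) == 1:  # table -> fill in catalog and schema
        return f"{current_catalog}.{current_schema}.{parts[0]}"
    if len(parts) == 2:  # schema.table -> fill in catalog only
        return f"{current_catalog}.{parts[0]}.{parts[1]}"
    if len(parts) == 3:  # already fully qualified
        return name
    raise ValueError(f"not a valid table identifier: {name!r}")


# After `USE CATALOG hive_metastore;` and `USE new_schema;`:
print(qualify("managed_default", current_schema="new_schema"))
# hive_metastore.new_schema.managed_default
```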

## Tables

Tables can be managed or external:

**Managed Table**: Hive (or Databricks) manages both the metadata and the data. Dropping the table deletes the data.

- Default when you don’t specify a LOCATION.

- Data is stored under the schema’s `.db` folder.

*Effect of DROP TABLE*:
- Both metadata (catalog entry) and data files are deleted.
- No recovery unless you have a backup. Delta time travel does not help here, because the table directory, including its `_delta_log`, is deleted as well.

**External Table**: The metastore manages only the metadata; the data stays at the LOCATION you specify.

*Effect of DROP TABLE*:
- Only metadata (catalog entry) is removed.
- The data files remain intact at the table's LOCATION (e.g., `/mnt/external/orders`).
- You can recreate the table pointing to the same location later.
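
The difference between the two DROP behaviors can be modelled with a toy, in-memory metastore. Everything here (`ToyMetastore`, `TableEntry`, and a set of paths standing in for the filesystem) is a hypothetical simplification for illustration, not how Databricks implements DROP TABLE.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TableEntry:
    location: str   # where the data files live
    external: bool  # True if created with an explicit LOCATION


@dataclass
class ToyMetastore:
    """Hypothetical model of DROP TABLE semantics; not a real API."""
    tables: Dict[str, TableEntry] = field(default_factory=dict)
    files: Set[str] = field(default_factory=set)  # stands in for the filesystem

    def create(self, name: str, location: str, external: bool) -> None:
        self.tables[name] = TableEntry(location, external)
        self.files.add(location)  # pretend data files were written here

    def drop(self, name: str) -> None:
        entry = self.tables.pop(name)  # the catalog entry is always removed
        if not entry.external:
            self.files.discard(entry.location)  # managed: data is deleted too


ms = ToyMetastore()
ms.create("managed_default", "dbfs:/user/hive/warehouse/managed_default", external=False)
ms.create("external_default", "dbfs:/mnt/demo/external/default", external=True)
ms.drop("managed_default")
ms.drop("external_default")
print(sorted(ms.files))  # ['dbfs:/mnt/demo/external/default']

# Re-registering the surviving external data is just a new catalog entry.
ms.create("external_default", "dbfs:/mnt/demo/external/default", external=True)
```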

### Impact of LOCATION Keyword

LOCATION controls where the data files are stored in the filesystem (DBFS, S3, ADLS, etc.).

#### Without LOCATION:

The table is managed, and its data lives inside the schema folder (e.g., `/user/hive/warehouse/db_name.db/table_name`).

#### With LOCATION:

The table is external, and its data lives wherever LOCATION points, outside the default warehouse.

Useful for pointing to existing data (e.g., Parquet files, Delta tables).
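
The LOCATION rule can be summarized as a small decision function. `resolve_table` is a hypothetical helper; its `MANAGED`/`EXTERNAL` labels mirror the `Type` field that `DESCRIBE EXTENDED` reports, and `schema_location` is an assumed input standing for the current schema's directory.

```python
from typing import Optional, Tuple


def resolve_table(schema_location: str, table: str,
                  location: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical rule: return (table_type, data_path) for a CREATE TABLE."""
    if location is None:
        # No LOCATION clause: a managed table inside the schema's directory.
        return "MANAGED", f"{schema_location.rstrip('/')}/{table}"
    # Explicit LOCATION clause: an external table at the given path.
    return "EXTERNAL", location


db_dir = "dbfs:/user/hive/warehouse/db_name.db"
print(resolve_table(db_dir, "table_name"))
# ('MANAGED', 'dbfs:/user/hive/warehouse/db_name.db/table_name')
print(resolve_table(db_dir, "orders", "dbfs:/mnt/external/orders"))
# ('EXTERNAL', 'dbfs:/mnt/external/orders')
```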

## Managed vs. External Tables

Tables in Databricks can be either **managed** or **external**.

| Feature              | Managed Table | External Table |
|-----------------------|---------------|----------------|
| **Data location**     | Under the schema directory (default: `db_name.db/table`) | Custom `LOCATION` |
| **Who manages data?** | Hive / Databricks | You (user provides location) |
| **Drop table effect** | Deletes metadata + data files | Deletes metadata only |
| **Use case**          | Full lifecycle inside Hive | Point to existing data / keep data after drop |


## Managed Table Example (Default Schema)


In [0]:
-- Create a managed table (default location in hive warehouse)
CREATE TABLE managed_default (
  width INT, length INT, height INT
);

INSERT INTO managed_default VALUES (3, 2, 1);

In [0]:
-- Inspect table details
DESCRIBE EXTENDED managed_default

In [0]:
-- Drop the table
DROP TABLE managed_default;

Listing the old table directory now fails with a file-not-found error, because the data files were deleted along with the metadata.

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/managed_default'

## External Table Example (Default Schema)

In [0]:
-- Create an external table (custom LOCATION)
CREATE TABLE external_default
 (width int, length int, height int)
USING DELTA
LOCATION 'dbfs:/mnt/demo/external/default';

INSERT INTO external_default 
VALUES (3, 2, 1);

In [0]:
-- Inspect table details
DESCRIBE EXTENDED external_default;

In [0]:
-- Drop the table
DROP TABLE external_default;

The external data files, by contrast, are still in place after the drop:

In [0]:
%fs ls 'dbfs:/mnt/demo/external/default'

## Databases (Schemas)

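A database is a **logical container** for tables.  
By default, every schema other than `default` maps to a `<schema_name>.db` directory under the warehouse, while `default` uses the warehouse root itself; a LOCATION clause on CREATE SCHEMA overrides this.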
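
A hypothetical `schema_location` helper captures this rule, including the special case of the `default` schema. It is a plain-Python illustration, not a Databricks API, and the warehouse root is an assumed value.

```python
from typing import Optional

WAREHOUSE = "dbfs:/user/hive/warehouse"  # assumed warehouse root


def schema_location(name: str, location: Optional[str] = None,
                    warehouse: str = WAREHOUSE) -> str:
    """Hypothetical helper: the directory that backs a schema."""
    if location is not None:
        # CREATE SCHEMA ... LOCATION overrides the default directory.
        return location.rstrip("/")
    root = warehouse.rstrip("/")
    if name == "default":
        # The built-in `default` schema uses the warehouse root itself.
        return root
    return f"{root}/{name}.db"


print(schema_location("new_schema"))
# dbfs:/user/hive/warehouse/new_schema.db
custom_dir = schema_location("custom", "dbfs:/Shared/schemas/custom.db")
print(f"{custom_dir}/managed_custom")
# dbfs:/Shared/schemas/custom.db/managed_custom
```

A managed table simply lands in a subfolder of whatever this function returns, which is why a schema-level LOCATION also moves that schema's managed tables.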


In [0]:
-- Drop the schema and all of its tables, if it exists
DROP SCHEMA IF EXISTS new_schema CASCADE;

-- Create a database with default location (inside hive warehouse)
CREATE DATABASE IF NOT EXISTS new_schema;

-- Equivalent syntax (a no-op here, because the schema now exists)
CREATE SCHEMA IF NOT EXISTS new_schema;

-- Show all databases
SHOW DATABASES;
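
The guard clauses of DROP SCHEMA can be sketched as follows: without IF EXISTS, a missing schema is an error, and without CASCADE, a non-empty schema is refused (RESTRICT is the default). The dictionary of schema names to table names and the two exception classes are hypothetical stand-ins, not Databricks objects.

```python
from typing import Dict, Set


class SchemaNotFound(Exception):
    """Hypothetical stand-in for the error on dropping a missing schema."""


class SchemaNotEmpty(Exception):
    """Hypothetical stand-in for the error on dropping a non-empty schema."""


def drop_schema(schemas: Dict[str, Set[str]], name: str, *,
                if_exists: bool = False, cascade: bool = False) -> None:
    """Model DROP SCHEMA [IF EXISTS] name [CASCADE] on a schema -> tables dict."""
    if name not in schemas:
        if if_exists:
            return  # IF EXISTS turns a missing schema into a no-op
        raise SchemaNotFound(name)
    if schemas[name] and not cascade:
        raise SchemaNotEmpty(f"{name} still contains {sorted(schemas[name])}")
    # CASCADE (or an already empty schema): the schema and its tables go away.
    del schemas[name]


catalog = {"new_schema": {"managed_default", "external_default"}}
drop_schema(catalog, "missing_schema", if_exists=True)  # silently ignored
try:
    drop_schema(catalog, "new_schema")  # RESTRICT is the default
except SchemaNotEmpty as err:
    print("refused:", err)
drop_schema(catalog, "new_schema", cascade=True)
print(catalog)  # {}
```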

In [0]:
-- Inspect schema metadata
DESCRIBE DATABASE EXTENDED new_schema

In [0]:
-- Switch to the new schema
USE new_schema;

### Managed vs External Tables inside a Schema

In [0]:
-- Managed table inside new schema
CREATE TABLE managed_default (
  width INT, length INT, height INT
);
INSERT INTO managed_default VALUES (3, 2, 1);

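-- External table inside new schema
-- Note: this reuses the path from the previous example; its Delta files survived
-- that DROP, so this table starts out containing the row inserted earlier.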
CREATE TABLE external_default (
  width INT, length INT, height INT
)
LOCATION 'dbfs:/mnt/demo/external/default';
INSERT INTO external_default VALUES (3, 2, 1);

In [0]:
-- Inspect metadata
DESCRIBE EXTENDED new_schema.managed_default;

In [0]:
-- Inspect metadata
DESCRIBE EXTENDED new_schema.external_default;

In [0]:
-- Drop tables
DROP TABLE new_schema.managed_default;
DROP TABLE new_schema.external_default;

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/new_schema.db/managed_default'

In [0]:
%fs ls 'dbfs:/mnt/demo/external/default/'

## Database with Custom Location

In [0]:
-- Create schema with a custom location
CREATE SCHEMA custom
LOCATION 'dbfs:/Shared/schemas/custom.db'

In [0]:
DESCRIBE DATABASE EXTENDED custom

In [0]:
USE custom;

-- Managed table (stored inside custom.db path)
CREATE TABLE managed_custom
 (width int, length int, height int);

INSERT INTO managed_custom VALUES (3, 2, 1);

-- External table (independent LOCATION)
CREATE TABLE external_custom
 (width int, length int, height int)
 LOCATION 'dbfs:/mnt/demo/external_custom';

INSERT INTO external_custom VALUES (3, 2, 1);

In [0]:
DESCRIBE EXTENDED managed_custom

In [0]:
DESCRIBE EXTENDED external_custom

In [0]:
DROP TABLE managed_custom;
DROP TABLE external_custom;

In [0]:
%fs ls 'dbfs:/Shared/schemas/custom.db/managed_custom'

In [0]:
%fs ls 'dbfs:/mnt/demo/external_custom'

## Summary

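**Hive Metastore** = metadata store; `hive_metastore` is the default catalog in workspaces without Unity Catalog.

**Databases/Schemas** = logical containers → `.db` directories (unless created with LOCATION).

**Tables** = *managed* (Databricks manages metadata and data; DROP deletes both) vs. *external* (custom LOCATION; DROP removes only the metadata).

**LOCATION** on a table makes it external; on a schema, it changes where that schema's managed tables are stored.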