# Overview of Databricks Databases, Tables, & Hive Metastore

In Databricks, data is organized using relational concepts:

- **Databases (Schemas)**
- **Tables (Managed & External)**

Metadata is managed through the **Hive Metastore**, which tracks databases, tables, and partitions across the workspace.

Knowing where Databricks keeps metadata and where it keeps data files determines what happens when you create, relocate, or drop a database or table.

## 🗂️ Databases in Databricks

### 🔹 Definition:

- A **database in Databricks** is equivalent to a **schema** in the Hive metastore.
- It acts as a namespace for organizing tables.

---

### 🔹 Creating a Database:

You can create a database using:

```sql
CREATE DATABASE database_name;
```

Or, equivalently:

```sql
CREATE SCHEMA database_name;
```
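
As a minimal sketch of how the namespace is typically used once it exists, the statements below create a database only if it is missing and then switch to it. The name `analytics_db` and the comment text are hypothetical and chosen only for illustration.

```sql
-- `analytics_db` is a hypothetical name, not one from this document.
-- IF NOT EXISTS makes the statement safe to re-run.
CREATE DATABASE IF NOT EXISTS analytics_db
COMMENT 'Example database for illustration';

-- Make it the current database so unqualified table names resolve to it.
USE analytics_db;

-- Confirm which database is currently active.
SELECT current_database();
```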

## 🐝 Hive Metastore in Databricks

The **Hive Metastore** is a central repository that stores metadata about:

- Databases (schemas)
- Tables (managed and external)
- Partitions
- Columns and data types
- Table locations and formats

---

### 🔹 Key Points:

- Every **Databricks workspace** is connected to a **shared Hive metastore**.
- This metastore is accessible by all clusters within the workspace.
- It allows users to query and manage metadata in a consistent and centralized way.
- The metastore simplifies data governance and ensures that table definitions and data locations are tracked and organized.

---

### 🔹 Example: Viewing Metastore Contents

```sql
SHOW DATABASES;
SHOW TABLES;
DESCRIBE DATABASE database_name;
```
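
The commands above list databases and tables. To inspect the column, partition, location, and format metadata that the metastore also tracks, something like the following could be used. This is a sketch: `sales_db.orders` is a hypothetical table name, and `SHOW PARTITIONS` only applies to partitioned tables.

```sql
-- `sales_db.orders` is a hypothetical table used only for illustration.

-- List the tables registered in a specific database.
SHOW TABLES IN sales_db;

-- Columns and data types, followed by detailed metadata
-- such as table type, location, and provider (format).
DESCRIBE TABLE EXTENDED sales_db.orders;

-- Partition values; raises an error if the table is not partitioned.
SHOW PARTITIONS sales_db.orders;

-- The full DDL the metastore would use to recreate the table.
SHOW CREATE TABLE sales_db.orders;
```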


## 🗂️ Databases (Schemas) in Databricks

### 🔹 What is a Database?

- In Databricks, a **Database** = **Schema** in the Hive Metastore.
- Used to logically group tables.

---

### 🔹 Default Database:

- Every workspace comes with a **`default` database**.
- Tables created without specifying a database go here.
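- Data files for tables in `default` are stored under `/user/hive/warehouse`.

Additional databases can be created alongside `default`:

```sql
CREATE DATABASE sales_db;
CREATE SCHEMA marketing_db;
```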
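
To see the `default` database in action, the sketch below creates a table without a database qualifier and then inspects where it landed. The table name `demo_orders` and its columns are hypothetical, and the example assumes the workspace's Hive metastore is the current catalog.

```sql
-- Ensure unqualified names resolve to the `default` database.
USE default;

-- `demo_orders` is a hypothetical table; no database is specified,
-- so it is created in the current database (`default`).
CREATE TABLE IF NOT EXISTS demo_orders (
  order_id INT,
  amount   DOUBLE
);

-- The Location row in the output shows where the data files live.
DESCRIBE TABLE EXTENDED default.demo_orders;
```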



## 📂 Custom Storage Paths for Databases

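By default, Databricks stores database files under the Hive warehouse directory:

`/user/hive/warehouse`

Each database other than `default` gets its own `<database_name>.db` subdirectory there.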

However, you can customize the storage location of a database by using the **`LOCATION`** keyword when creating it.

---

### 🔹 Why Use Custom Storage Paths?

- Organize data in specific cloud storage mounts (e.g., S3, ADLS, GCS).
- Separate environments (development, production, staging).
- Store data outside of the default Hive directory for better control.

---

### 🔹 Example:

```sql
CREATE DATABASE sales_db
LOCATION '/mnt/sales-data/sales_db';
```
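
One way to confirm that the custom path took effect is to inspect the database and then create a managed table inside it. This is a sketch only: the table `daily_totals` and its columns are hypothetical, and `sales_db` is assumed to have been created with the `LOCATION` shown above.

```sql
-- Shows the database's properties, including its storage location.
DESCRIBE DATABASE EXTENDED sales_db;

-- `daily_totals` is a hypothetical managed table; because no LOCATION
-- is given, its files are placed under the database's own location.
CREATE TABLE IF NOT EXISTS sales_db.daily_totals (
  sale_date DATE,
  total     DOUBLE
);

-- The Location row should point inside the database's custom path.
DESCRIBE TABLE EXTENDED sales_db.daily_totals;
```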


## 📄 Tables in Databricks

### 🔹 Types of Tables:

---

### 1️⃣ Managed Tables:

- Data is stored **inside** the database directory.
- Hive Metastore manages both **metadata and data files**.
- Dropping the table deletes both **metadata and underlying data files**.
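
A minimal sketch of the managed-table lifecycle described above. The table name `managed_sales`, its columns, and the inserted rows are hypothetical.

```sql
-- No LOCATION clause, so this is a managed table:
-- the metastore decides where the data files go.
CREATE TABLE IF NOT EXISTS default.managed_sales (
  sale_id INT,
  amount  DOUBLE
) USING DELTA;

INSERT INTO default.managed_sales VALUES (1, 19.99), (2, 5.00);

-- The Type row in the output reports MANAGED for this table.
DESCRIBE TABLE EXTENDED default.managed_sales;

-- Removes the metastore entry and the underlying data files.
DROP TABLE default.managed_sales;
```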

---

### 2️⃣ External Tables:

- Data is stored **outside** the database directory, at a path you specify with `LOCATION`.
- Hive Metastore manages only **metadata**.
- Dropping the table removes **only metadata**, leaving data files untouched.

---

### 🔹 Example: Creating an External Table

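```sql
-- No column list is given, so Delta data must already exist at this path;
-- the table's schema is read from those files.
CREATE TABLE default.external_sales
USING DELTA
LOCATION '/mnt/sales-data/external_sales';
```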
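
The drop behavior of external tables can be illustrated as follows. This sketch assumes that `default.external_sales` was registered as shown above and that Delta files exist at that path.

```sql
-- Removes only the metastore entry; the files under
-- /mnt/sales-data/external_sales are left in place.
DROP TABLE IF EXISTS default.external_sales;

-- Because the data files survived, the table can be registered again
-- by pointing at the same location.
CREATE TABLE default.external_sales
USING DELTA
LOCATION '/mnt/sales-data/external_sales';

-- The previously written rows are queryable again.
SELECT COUNT(*) FROM default.external_sales;
```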
