# Environment Demo

This notebook demonstrates key features of the `hds-sparkhive-lab` environment:
- Creating and using databases
- Creating Delta tables registered in the Hive metastore
- Querying tables using `database.table` syntax
- Updating Delta tables
- Time travel queries
- Hive metastore metadata commands

Run the cells in order, one at a time, to explore the environment.

**Notes:**
- Reference tables as `database.table` so they resolve through the Hive metastore regardless of the current database.
- This environment mimics Databricks behavior with Delta Lake and Hive Metastore.
- Data and metadata are persisted via Docker volumes.
- You can connect to this kernel from VS Code or use the JupyterLab UI.

In [1]:
# Import required PySpark modules
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Create Spark session
spark = SparkSession.builder.appName("hds_sparkhive_demo").getOrCreate()
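
The cell above relies on the environment having Delta Lake and the Hive metastore already configured. Outside such an environment, a session typically needs the Delta SQL extension and catalog set explicitly, and the Delta jars on the classpath. A minimal sketch, assuming those jars are available; `configure_builder` and `DELTA_SESSION_CONF` are hypothetical names, and `hds-sparkhive-lab` may already apply these settings for you:

```python
# Session settings commonly used to enable Delta Lake in Spark.
DELTA_SESSION_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def configure_builder(builder, conf=DELTA_SESSION_CONF):
    """Apply each setting to a SparkSession builder and enable Hive support.

    Hypothetical helper: `builder` is expected to be `SparkSession.builder`
    (or a builder derived from it, e.g. after `.appName(...)`).
    """
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.enableHiveSupport()


# Usage (requires pyspark and delta-spark, so it is left commented out):
# from pyspark.sql import SparkSession
# spark = configure_builder(
#     SparkSession.builder.appName("hds_sparkhive_demo")
# ).getOrCreate()
```

Keeping the settings in a dictionary makes it easy to compare them against whatever the environment already sets.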

In [14]:
# Create a new database
spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")

DataFrame[]

In [15]:
# Set current database
spark.catalog.setCurrentDatabase("demo_db")

In [16]:
# Create a sample DataFrame
data = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
columns = ["id", "name"]
df = spark.createDataFrame(data, columns)

In [17]:
# Write DataFrame as Delta table
df.write.format("delta").mode("overwrite").saveAsTable("demo_db.people")

In [18]:
# Query the Delta table using database.table syntax
spark.sql("SELECT * FROM demo_db.people").show()

+---+-------+
| id|   name|
+---+-------+
|  1|  Alice|
|  2|    Bob|
|  3|Charlie|
+---+-------+



In [19]:
# Show all databases
spark.sql("SHOW DATABASES").show()

+-----------+
|  namespace|
+-----------+
|    default|
|    demo_db|
|my_database|
+-----------+



In [20]:
# Show tables in the current database
spark.sql("SHOW TABLES").show()

+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|  demo_db|   people|      false|
+---------+---------+-----------+



In [21]:
# Describe the table schema
spark.sql("DESCRIBE demo_db.people").show()

+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|      id|   bigint|   NULL|
|    name|   string|   NULL|
+--------+---------+-------+



In [25]:
# Update a record in the Delta table
delta_table = DeltaTable.forName(spark, "demo_db.people")
delta_table.update("id = 2", {"name": "'Bobby'"})
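
The values in the dictionary passed to `update` are SQL expressions rather than Python values, which is why `'Bobby'` carries its own single quotes inside the Python string. A minimal sketch of building both arguments from plain Python values; `build_name_update` is a hypothetical helper, and it deliberately rejects names that would need escaping instead of guessing at the rules:

```python
def build_name_update(person_id, new_name):
    """Return (condition, set) arguments for DeltaTable.update.

    Hypothetical helper. The new name is wrapped in single quotes so that
    it is read as a SQL string literal rather than a column reference.
    """
    if not isinstance(person_id, int) or isinstance(person_id, bool):
        raise TypeError("person_id must be an int")
    if "'" in new_name or "\\" in new_name:
        # Escaping is intentionally out of scope for this sketch.
        raise ValueError("names containing quotes or backslashes are not handled")
    condition = f"id = {person_id}"
    assignments = {"name": f"'{new_name}'"}
    return condition, assignments


# Usage (requires the `delta_table` object from the cell above):
# delta_table.update(*build_name_update(2, "Bobby"))
```

Delta's Python API also accepts `Column` values in the dictionary, so `lit("Bobby")` from `pyspark.sql.functions` is an alternative that avoids manual quoting.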

In [26]:
# Query again to see the update
spark.sql("SELECT * FROM demo_db.people").show()

+---+-------+
| id|   name|
+---+-------+
|  1|  Alice|
|  2|  Bobby|
|  3|Charlie|
+---+-------+



In [27]:
# Inspect the Delta table history
spark.sql("DESCRIBE HISTORY demo_db.people").show()

+-------+--------------------+------+--------+--------------------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|version|           timestamp|userId|userName|           operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|    operationMetrics|userMetadata|          engineInfo|
+-------+--------------------+------+--------+--------------------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|      2|2025-07-28 23:00:...|  NULL|    NULL|              UPDATE|{predicate -> ["(...|NULL|    NULL|     NULL|          1|  Serializable|        false|{numRemovedFiles ...|        NULL|Apache-Spark/3.5....|
|      1|2025-07-28 22:58:...|  NULL|    NULL|CREATE OR REPLACE...|{partitionBy -> [...|NULL|    NULL|     NULL|          0|  Serializable|        false|{numFiles -
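
The default `show()` truncates each cell to 20 characters, which hides most of the history details above. A minimal sketch that selects a few of the columns listed in the output and prints them at full width; `show_history` and `HISTORY_COLUMNS` are hypothetical names:

```python
# A subset of the columns returned by DESCRIBE HISTORY.
HISTORY_COLUMNS = ["version", "timestamp", "operation", "operationParameters"]


def show_history(spark, table, columns=HISTORY_COLUMNS):
    """Print selected Delta history columns without truncating cell values."""
    history = spark.sql(f"DESCRIBE HISTORY {table}")
    history.select(*columns).show(truncate=False)


# Usage (requires the `spark` session from the first cell):
# show_history(spark, "demo_db.people")
```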

In [28]:
# Time travel: query the table as of version 0 (the first write, before the update)
spark.sql("SELECT * FROM demo_db.people VERSION AS OF 0").show()

+---+-------+
| id|   name|
+---+-------+
|  1|  Alice|
|  2|    Bob|
|  3|Charlie|
+---+-------+
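
Delta SQL can also address a snapshot by commit time with `TIMESTAMP AS OF`, using a timestamp like those in the `DESCRIBE HISTORY` output. A minimal sketch of a query builder covering both forms; `time_travel_query` is a hypothetical helper, and the timestamp you pass is assumed to fall within the table's retained history:

```python
def time_travel_query(table, version=None, timestamp=None):
    """Build a SELECT that reads `table` at a given version or timestamp.

    Exactly one of `version` or `timestamp` must be provided.
    """
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(version)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


# Usage (requires the `spark` session from the first cell):
# spark.sql(time_travel_query("demo_db.people", version=0)).show()
```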

