
<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://raw.githubusercontent.com/derar-alhussein/Databricks-Certified-Data-Engineer-Associate/main/Includes/images/bookstore_schema.png" alt="Databricks Learning" style="width: 600px">
</div>

In [0]:
%run ../Includes/Copy-Datasets


## 📖 Reading Stream

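Begin by reading the `books` Delta table as a streaming source with `spark.readStream`.  
Spark treats the table as unbounded: rows appended to it later are processed incrementally as they arrive. Registering the stream as a temporary view makes it queryable from SQL cells.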


In [0]:
# Read the "books" Delta table as a stream and expose it to SQL as a temporary view
(spark.readStream
      .table("books")
      .createOrReplaceTempView("books_streaming_tmp_vw")
)
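
The "unbounded table" idea can be pictured with a toy model in plain Python (not Spark's implementation): the source remembers an offset, and each read returns only the rows appended since the previous read. The class `ToyStreamingSource` and the sample rows are hypothetical.

```python
from typing import List, Tuple

# Hypothetical row shape mirroring the books table: (book_id, title, author, category, price)
Row = Tuple[str, str, str, str, float]


class ToyStreamingSource:
    """Toy stand-in for a streaming read of an append-only table."""

    def __init__(self, table: List[Row]) -> None:
        self.table = table  # append-only list standing in for the Delta table
        self.offset = 0     # position of the first row not yet delivered

    def next_batch(self) -> List[Row]:
        """Return only the rows appended since the previous call."""
        batch = self.table[self.offset:]
        self.offset += len(batch)
        return batch


books: List[Row] = [("B01", "Book One", "Author A", "Computer Science", 20.0)]
source = ToyStreamingSource(books)

first = source.next_batch()   # the row that already existed
books.append(("B02", "Book Two", "Author B", "Computer Science", 25.0))
second = source.next_batch()  # only the newly appended row
third = source.next_batch()   # nothing new yet
print(len(first), len(second), len(third))  # 1 1 0
```

Each call behaves like one micro-batch: earlier rows are never re-read because the offset has already moved past them.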


## 👁️ Displaying Streaming Data

Streaming DataFrames can be registered as temporary views, so you can query live data with SQL.  
In a Databricks notebook, querying a streaming view starts a streaming query whose displayed results refresh as new records arrive; the query keeps running until you cancel it.

In [0]:
%sql
SELECT * FROM books_streaming_tmp_vw;

## 🔄 Applying Transformations

Most SQL transformations that work on static views also work on streaming views.  
Filtering, grouping, and aggregating are supported, and aggregate results are updated as new data arrives.

For time-based analysis, windowed aggregations can be expressed with the built-in `window` function over a timestamp column.


In [0]:
%sql
SELECT author, count(book_id) AS total_books
FROM books_streaming_tmp_vw
GROUP BY author;
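
Behind a streaming `GROUP BY`, the query keeps aggregation state between micro-batches: each batch only updates the counts for the rows it contains, while the result reflects every row seen so far. Here is a toy model of that idea in plain Python; `update_counts` and the batch contents are illustrative assumptions, not Spark's state store.

```python
from collections import Counter
from typing import Dict, Iterable, List


def update_counts(state: Counter, batch_authors: Iterable[str]) -> Dict[str, int]:
    """Fold one micro-batch into the running state and return the full result table."""
    state.update(batch_authors)  # only the new rows are processed
    return dict(state)


state: Counter = Counter()  # stands in for the query's aggregation state

micro_batches: List[List[str]] = [
    ["Mark W. Spong", "Chris Bernhardt"],  # first micro-batch
    ["Mark W. Spong"],                      # a row that arrives later
]

for batch in micro_batches:
    result = update_counts(state, batch)
    print(sorted(result.items()))
# [('Chris Bernhardt', 1), ('Mark W. Spong', 1)]
# [('Chris Bernhardt', 1), ('Mark W. Spong', 2)]
```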


## 🚫 Unsupported Operations

Not all operations are supported on streaming DataFrames.  
Sorting, for example, is only allowed after an aggregation and in Complete output mode, so an `ORDER BY` over the raw stream fails with an `AnalysisException`.

Plan streaming queries around these limits, and check the "Unsupported Operations" section of the Spark Structured Streaming Programming Guide for the full list.

In [0]:
%sql
-- Expected to fail: sorting a non-aggregated stream is not supported
SELECT *
FROM books_streaming_tmp_vw
ORDER BY author;
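
The restriction on sorting is easier to see with a small plain-Python illustration (made-up batches): rows sorted and emitted after one micro-batch stop being a valid prefix of the global order as soon as a later row sorts before them, so an append-only output could only stay sorted by retracting rows. An aggregated result in Complete mode avoids this because the whole table is rewritten on every trigger.

```python
# Rows already emitted after the first micro-batch, in sorted order
batch_1 = ["Mark W. Spong", "Yoav Goldberg"]
emitted = sorted(batch_1)

# A later micro-batch brings a row that sorts before everything emitted so far
batch_2 = ["Chris Bernhardt"]
full_order = sorted(batch_1 + batch_2)

# The previously emitted rows are no longer a prefix of the correct global order,
# so an append-only sink could not stay sorted without retracting output.
is_still_prefix = full_order[: len(emitted)] == emitted
print(emitted)
print(full_order)
print(is_still_prefix)  # False
```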


## 💾 Persisting Streaming Data

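To persist results, write the stream to a target such as a Delta table with `writeStream`.  
A checkpoint location stores the query's progress (processed source offsets) and aggregation state, so a restarted query resumes where it stopped instead of reprocessing data.

Because `author_counts_tmp_vw` is an aggregation, the write below uses `outputMode("complete")`, which rewrites the full result table on each trigger, and `processingTime='4 seconds'` schedules micro-batches at 4-second intervals. Together these form an incremental pipeline that keeps the target table updated in near real time.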


In [0]:
%sql
CREATE OR REPLACE TEMP VIEW author_counts_tmp_vw AS (
  SELECT author, count(book_id) AS total_books
  FROM books_streaming_tmp_vw
  GROUP BY author
);

In [0]:
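# The view is defined over a streaming source, so spark.table returns a streaming DataFrame
(spark.table("author_counts_tmp_vw")
      .writeStream
      .trigger(processingTime='4 seconds')   # schedule micro-batches every 4 seconds
      .outputMode("complete")                # rewrite the full result table on each trigger
      .option("checkpointLocation", "dbfs:/mnt/demo/author_counts_checkpoint")  # progress and state
      .table("author_counts")
)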

In [0]:
%sql
SELECT *
FROM author_counts

## ➕ Adding New Data

New rows appended to the source table are picked up automatically by the running streaming query at its next trigger.  
The target table is updated without a manual refresh, and the query does not need to be restarted; re-run the `SELECT` on `author_counts` to see the new counts.

In [0]:
%sql
INSERT INTO books
values ("B19", "Introduction to Modeling and Simulation", "Mark W. Spong", "Computer Science", 25),
        ("B20", "Robot Modeling and Control", "Mark W. Spong", "Computer Science", 30),
        ("B21", "Turing's Vision: The Birth of Computer Science", "Chris Bernhardt", "Computer Science", 35)
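
With `processingTime='4 seconds'`, rows inserted between two triggers are picked up together by the next micro-batch. The plain-Python sketch below groups inserts by the first trigger after their arrival; the arrival times and the later `B16` insert are made up, and the model ignores how long each micro-batch takes to run.

```python
from typing import Dict, List, Tuple


def group_into_micro_batches(inserts: List[Tuple[float, str]], interval_s: float) -> Dict[float, List[str]]:
    """Toy model: each row is processed by the first trigger strictly after it arrives."""
    batches: Dict[float, List[str]] = {}
    for arrival_s, book_id in sorted(inserts):
        trigger_at = (arrival_s // interval_s + 1) * interval_s
        batches.setdefault(trigger_at, []).append(book_id)
    return batches


# (arrival time in seconds, book_id); times are illustrative only
inserts = [(1.0, "B19"), (1.5, "B20"), (2.0, "B21"), (9.0, "B16")]
batches = group_into_micro_batches(inserts, 4.0)
print(batches)  # {4.0: ['B19', 'B20', 'B21'], 12.0: ['B16']}
```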

## 🛑 Streaming in Batch Mode

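When continuous processing isn't necessary, use `trigger(availableNow=True)`: the query processes all data available when it starts, in one or more micro-batches, and then stops on its own.  
This is useful for on-demand updates, scheduled jobs, or catch-up processing without a continuously running stream. Because the query below reuses the earlier checkpoint, it only processes rows added since the last processed offset. Stop the earlier streaming query first: two queries cannot be active on the same checkpoint location at the same time.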
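
As a rough mental model (plain Python, hypothetical helper and row cap), an `availableNow` run snapshots how much data exists when it starts, works through it from the checkpointed offset in bounded micro-batches, and then returns instead of waiting for more. In Spark the batch sizes come from source options such as rate limits; here a fixed `max_rows_per_batch` stands in for them.

```python
from typing import List


def run_available_now(source: List[str], start_offset: int, max_rows_per_batch: int) -> List[List[str]]:
    """Toy model: process every row available at start, in bounded micro-batches, then stop."""
    if max_rows_per_batch < 1:
        raise ValueError("max_rows_per_batch must be at least 1")
    end_offset = len(source)  # snapshot of what is available when the run starts
    batches: List[List[str]] = []
    offset = start_offset
    while offset < end_offset:
        batch = source[offset:min(offset + max_rows_per_batch, end_offset)]
        batches.append(batch)
        offset += len(batch)
    return batches  # the loop ends: the run terminates instead of waiting for new data


# The first three rows were already processed by the earlier query (offset 3 in the checkpoint)
source = ["B19", "B20", "B21", "B16", "B17", "B18"]
result = run_available_now(source, start_offset=3, max_rows_per_batch=2)
print(result)  # [['B16', 'B17'], ['B18']]
```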

In [0]:
%sql
INSERT INTO books
values ("B16", "Hands-On Deep Learning Algorithms with Python", "Sudharsan Ravichandiran", "Computer Science", 25),
        ("B17", "Neural Network Methods in Natural Language Processing", "Yoav Goldberg", "Computer Science", 30),
        ("B18", "Understanding digital signal processing", "Richard Lyons", "Computer Science", 35)

In [0]:
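# Process all rows available now in one or more micro-batches, then stop;
# awaitTermination() blocks this cell until the query finishes
(spark.table("author_counts_tmp_vw")
      .writeStream
      .trigger(availableNow=True)
      .outputMode("complete")
      .option("checkpointLocation", "dbfs:/mnt/demo/author_counts_checkpoint")  # resume from the last processed offset
      .table("author_counts")
      .awaitTermination()
)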

In [0]:
%sql
SELECT *
FROM author_counts