# Spark Streaming Homework
## Description
* Organize an incremental copy of hotel/weather data from the source Azure ADLS Gen2 storage into an Azure ADLS Gen2 storage provisioned with Terraform (with a delay, one day per cycle).
* Create Databricks notebooks (Azure libraries such as hadoop-azure and azure-storage are already part of the Databricks environment, details are described here). Use the ABFS driver and OAuth credentials, as set in the "Configure Spark" section of this notebook.
* Create a Spark Structured Streaming application with Auto Loader to incrementally and efficiently process hotel/weather data as it arrives in the provisioned Azure ADLS Gen2 storage. Using Spark in Databricks notebooks, calculate for each city and each day:
    * Number of distinct hotels in the city.
    * Average/max/min temperature in the city.
* Visualize incoming data in a Databricks notebook for the 10 biggest cities (those with the largest number of hotels, one chart per city):
    * X-axis: date (date of observation).
    * Y-axis: number of distinct hotels, average/max/min temperature.
* Deploy the Databricks notebook on a cluster. To set up the infrastructure, use the Terraform scripts from the module. The default resource parameters (specifically memory) will not work because of free-tier limitations, so you need to set memory and cores properly.
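
The incremental copy from the first bullet can be sketched in plain Python. This is only a minimal illustration: it assumes the source is laid out as `year=*/month=*/day=*` partition directories and uses local directories as a stand-in for ADLS. The helper names `list_day_partitions` and `copy_next_day` are hypothetical. In a Databricks notebook the copy itself would more likely go through `dbutils.fs.cp(..., recurse=True)` than `shutil`.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def list_day_partitions(root: Path) -> List[Path]:
    """Return day-level partition directories (year=*/month=*/day=*) in date order."""

    def part_value(part: Path) -> int:
        # "day=10" -> 10, so that day=10 sorts after day=2
        return int(part.name.split("=", 1)[1])

    days = [d for d in root.glob("year=*/month=*/day=*") if d.is_dir()]
    return sorted(
        days,
        key=lambda d: (part_value(d.parent.parent), part_value(d.parent), part_value(d)),
    )


def copy_next_day(src_root: Path, dst_root: Path) -> Optional[Path]:
    """Copy the earliest day partition that is not yet present under dst_root.

    Returns the relative path that was copied, or None when nothing is left.
    """
    for day_dir in list_day_partitions(src_root):
        rel = day_dir.relative_to(src_root)
        target = dst_root / rel
        if not target.exists():
            shutil.copytree(day_dir, target)  # creates missing parent directories
            return rel
    return None
```

Calling `copy_next_day` once per cycle moves exactly one day of data, which gives the streaming job something new to pick up each time.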


### Expected results
* Repository with the notebook (with output results), configuration scripts, application sources, execution plan dumps, analysis, etc.
* Upload to the task a README.md file with a link to the repository. The homework must be fully documented with screenshots and comments.

## Import

In [0]:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

## Configuration

### Define application data paths

In [0]:
# set config path
HOTEL_WEATHER_PATH = "abfss://m13sparkstreaming@bd201stacc.dfs.core.windows.net/hotel-weather"
ROOT_PATH = '/m13sparkstreaming_python_azure'

ACC_NAME = "stdromakinwesteurope"
SA_CONTAINER = "data"
APP_PATH = f"abfss://{SA_CONTAINER}@{ACC_NAME}.dfs.core.windows.net/hotel-weather/"

# bronze
BRONZE_PATH = f'{ROOT_PATH}/bronze'
BRONZE_DATA = f'{BRONZE_PATH}/data'
BRONZE_CHECKPOINT = f'{BRONZE_PATH}/checkpoint'

# silver
SILVER_PATH = f'{ROOT_PATH}/silver'
SILVER_DATA = f'{SILVER_PATH}/data'
SILVER_CHECKPOINT = f'{SILVER_PATH}/checkpoint'

# gold
GOLD_PATH = f'{ROOT_PATH}/gold'
GOLD_DATA = f'{GOLD_PATH}/data'
GOLD_CHECKPOINT = f'{GOLD_PATH}/checkpoint'

### Configure Spark

In [0]:
# Secrets are read from a Databricks secret scope instead of being hard-coded in the notebook.
# The scope name "m13sparkstreaming" and the key names below are hypothetical; use the ones configured in your workspace.
spark.conf.set("fs.azure.account.auth.type.bd201stacc.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.bd201stacc.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.bd201stacc.dfs.core.windows.net", "f3905ff9-16d4-43ac-9011-842b661d556d")
spark.conf.set("fs.azure.account.oauth2.client.secret.bd201stacc.dfs.core.windows.net", dbutils.secrets.get(scope="m13sparkstreaming", key="source-client-secret"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.bd201stacc.dfs.core.windows.net", "https://login.microsoftonline.com/b41b72d0-4e9f-4c26-8a69-f949f367c91d/oauth2/token")

spark.conf.set(f"fs.azure.account.key.{ACC_NAME}.dfs.core.windows.net", dbutils.secrets.get(scope="m13sparkstreaming", key="storage-account-key"))

## Configure Database

In [0]:
%sql
CREATE DATABASE IF NOT EXISTS m13sparkstreaming;
USE m13sparkstreaming;

## Main steps

In [0]:
display(dbutils.fs.ls(HOTEL_WEATHER_PATH))

path,name,size
abfss://m13sparkstreaming@bd201stacc.dfs.core.windows.net/hotel-weather/year=2016/,year=2016/,0
abfss://m13sparkstreaming@bd201stacc.dfs.core.windows.net/hotel-weather/year=2017/,year=2017/,0


In [0]:
# Remove data and checkpoints left over from previous runs.
# Each checkpoint directory lives under its layer path, so removing the layer path is enough.
for layer_path in (BRONZE_PATH, SILVER_PATH, GOLD_PATH):
    dbutils.fs.rm(layer_path, recurse=True)

### Define schema for source data

In [0]:
hotel_weather_schema="""
  address STRING,
  avg_tmpr_c DOUBLE,
  avg_tmpr_f DOUBLE,
  city STRING,
  country STRING,
  geoHash STRING,
  id STRING,
  latitude DOUBLE,
  longitude DOUBLE,
  name STRING,
  wthr_date STRING,
  year STRING,
  month STRING,
  day STRING
"""

### Reading input data and running bronze stream

In [0]:
hotel_weather_df = (
  spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "parquet")
  .option("cloudFiles.maxFilesPerTrigger", 100)
  .option("cloudFiles.maxBytesPerTrigger", "10k")
  .option("cloudFiles.partitionColumns", "year, month, day")
  .schema(hotel_weather_schema)
  .load(HOTEL_WEATHER_PATH)
)

display(hotel_weather_df)

address,avg_tmpr_c,avg_tmpr_f,city,country,geoHash,id,latitude,longitude,name,wthr_date,year,month,day
Hyatt Place Utica,15.0,59.0,Utica,US,dpsf,463856467968,42.628861,-83.010776,45400 Park Ave,2016-10-01,2016,10,1
Berney Fly Bed and Breakfast,20.5,68.9,Mobile,US,dj3q,481036337152,30.684177,-88.062045,1118 Government St,2016-10-01,2016,10,1
Fort Conde Inn,20.5,68.9,Mobile,US,dj3q,1709396983810,30.68811,-88.04079,165 Saint Emanuel St,2016-10-01,2016,10,1
Abbeville Inn,18.8,65.9,Abbeville,US,dje7,790273982465,31.55176,-85.2837,1237 Us Highway 431 S,2016-10-01,2016,10,1
Narcis,19.4,67.0,Guernsey,US,u240,231928233986,45.079082,14.150233,Maslinica Bay,2016-10-01,2016,10,1
Economy Inn,22.1,71.7,Dillon,US,dnpe,592705486849,34.43614,-79.36981,1223 Radford Blvd,2016-10-01,2016,10,1
Econo Lodge,11.9,53.5,Bellingham,US,c28v,515396075520,48.781123,-122.485944,3750 Meridian St,2016-10-01,2016,10,1
Homeplace Inn and Suites,19.7,67.5,Jacksonville,US,9vsm,352187318275,31.964827,-95.254599,1407 E Rusk St,2016-10-01,2016,10,1
Red Roof Inn St Louis - Troy,17.6,63.6,Troy,US,dnbh,1202590842884,38.734557,-89.913938,2030 Formosa Rd,2016-10-01,2016,10,1
Americas Best Value Inn,16.7,62.1,North Platte,US,9z26,747324309509,41.13524203,-100.7566567,602 E 4th St,2016-10-01,2016,10,1
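
Auto Loader remembers which input files it has already ingested and, on each trigger, picks up only new ones, bounded by options such as `cloudFiles.maxFilesPerTrigger`. The plain-Python sketch below illustrates only that bookkeeping idea. It is not how Auto Loader is implemented, and `IncrementalFileSource` is a hypothetical name.

```python
from typing import Iterable, List, Set


class IncrementalFileSource:
    """Toy model of incremental file discovery with a per-trigger file limit."""

    def __init__(self, max_files_per_trigger: int) -> None:
        self.max_files_per_trigger = max_files_per_trigger
        # Stands in for the state a real stream keeps in its checkpoint.
        self._seen: Set[str] = set()

    def next_batch(self, listing: Iterable[str]) -> List[str]:
        """Return up to max_files_per_trigger files that were not processed before."""
        new_files = sorted(f for f in listing if f not in self._seen)
        batch = new_files[: self.max_files_per_trigger]
        self._seen.update(batch)
        return batch
```

Repeated calls with the same listing drain the backlog in small batches and then return nothing until a new file appears, which is the behavior the bronze stream relies on.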


### Save stream of hotel_weather_df as bronze delta files

In [0]:
(
    hotel_weather_df.writeStream
    .format("delta")
    .option("checkpointLocation", BRONZE_CHECKPOINT)
    .queryName("bronze_stream")
    .outputMode("append")
    .start(BRONZE_DATA)
)

### Read bronze data

In [0]:
# Read the bronze Delta files back as a stream
bronze_df = spark.readStream.format('delta').load(BRONZE_DATA)

### Transform bronze data

In [0]:
transformed_bronze_df = (
    bronze_df.select(
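        'id',
        # In the source data the `address` column holds the hotel name and
        # the `name` column holds the street address, so swap them here.
        F.col('address').alias('name'),
        F.col('name').alias('address'),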
        'avg_tmpr_c',
        'avg_tmpr_f',
        'city',
        'country',
        'geoHash',
        'latitude',
        'longitude',
        'wthr_date',
        'year',
        'month',
        'day'
    )
    .dropna()
)

In [0]:
display(transformed_bronze_df)

id,name,address,avg_tmpr_c,avg_tmpr_f,city,country,geoHash,latitude,longitude,wthr_date,year,month,day
25769803777,Travelodge,309 W Seneca St,9.2,48.6,Oswego,US,dr9x,43.45161,-76.53235,2016-10-10,2016,10,10
369367187459,Quality Inn & Suites - Riverfront,70 E 1st St,9.2,48.6,Oswego,US,dr9x,43.45818526,-76.50861875,2016-10-10,2016,10,10
1047972020226,Country Inn & Suites By Carlson,710 Park Pl,15.8,60.5,Clinton,US,dnkj,36.16315,-84.08504,2016-10-10,2016,10,10
343597383685,Acorn Motor Inn,31530 State Route 20,9.9,49.8,Oak Harbor,US,c28f,48.288952,-122.657842,2016-10-10,2016,10,10
876173328384,Econo Lodge St Robert,309 Highway Z,15.9,60.7,St. Robert,US,9ywr,37.8228,-92.14079,2016-10-10,2016,10,10
3,Hampton Inn-st Robert,103 Saint Robert Plaza Dr,15.9,60.7,Saint Robert,US,9ywr,37.82382,-92.149721,2016-10-10,2016,10,10
3143916060674,The Montcalm Marble Arch,2 Wallenberg Place Westminster Borough London W1H 7TN United Kingdom,8.4,47.2,London,GB,gcpv,51.5150522,-0.159239,2016-10-10,2016,10,10
3221225472004,DoubleTree by Hilton London West End,92 Southampton Row Camden London WC1B 4BH United Kingdom,8.4,47.2,London,GB,gcpv,51.5201065,-0.1221393,2016-10-10,2016,10,10
3332894621696,Montcalm Royal London House City of London,22 25 Finsbury Square City Islington London EC2A 1DX United Kingdom,8.4,47.2,London,GB,gcpv,51.5218066,-0.0856081,2016-10-10,2016,10,10
2508260900866,Radisson Blu Edwardian Berkshire,350 Oxford Street Westminster Borough London W1C 1BY United Kingdom,8.4,47.2,London,GB,gcpv,51.5146025,-0.1481978,2016-10-10,2016,10,10
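
The silver transformation does two things: it swaps the mislabelled `name`/`address` columns and drops rows with missing values. A minimal plain-Python sketch of the same logic on dictionaries is given below, useful for reasoning about the expected output on a few sample rows. The function name `clean_rows` is hypothetical and the code is not a substitute for the streaming query.

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[object]]


def clean_rows(rows: Iterable[Row]) -> List[Row]:
    """Swap the name/address fields and drop rows that contain any missing value."""
    cleaned: List[Row] = []
    for row in rows:
        fixed = dict(row)  # copy so the input rows are left untouched
        fixed["name"], fixed["address"] = row["address"], row["name"]
        if all(value is not None for value in fixed.values()):
            cleaned.append(fixed)
    return cleaned
```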


### Running Silver stream

In [0]:
(
    transformed_bronze_df.writeStream
    .format("delta")
    .option("checkpointLocation", SILVER_CHECKPOINT)
    .queryName("silver_stream")
    .outputMode("append")
    .start(SILVER_DATA)
)

### Read silver data

In [0]:
silver_df = (
    spark.readStream
      .format('delta')
      .load(SILVER_DATA)
)

### Running Gold stream

In [0]:
gold_df = (
    silver_df
    .groupBy('city', 'wthr_date')
    .agg(
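        # Exact distinct aggregations are not supported on streaming DataFrames,
        # so the number of hotels is an approximate distinct count.
        F.approx_count_distinct('id').alias('distinct_hotels'),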
        F.round(F.avg('avg_tmpr_c'), 1).alias('avg_t'),
        F.round(F.max('avg_tmpr_c'), 1).alias('max_t'),
        F.round(F.min('avg_tmpr_c'), 1).alias('min_t')
    )
    .select('city', 'wthr_date', 'distinct_hotels', 'avg_t', 'max_t', 'min_t')
)

In [0]:
display(gold_df)

city,wthr_date,distinct_hotels,avg_t,max_t,min_t
Laurel,2016-10-27,1,20.1,20.1,20.1
Rockford,2017-08-12,2,17.7,17.7,17.7
Indianapolis,2016-10-06,1,22.2,22.2,22.2
Pleasanton,2017-08-27,1,24.6,24.6,24.6
Providence,2017-08-22,1,24.5,24.5,24.5
Los Olivos,2016-10-28,1,16.3,16.3,16.3
Southport,2016-10-25,1,5.3,5.3,5.3
Prattville,2016-10-25,1,19.2,19.2,19.2
North Platte,2016-10-08,1,12.2,12.2,12.2
Dublin,2016-10-10,3,13.0,15.9,11.6
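
Because the streaming query relies on an approximate distinct count, it can help to cross-check a small batch sample with an exact computation. The sketch below does the same grouping in plain Python, using sets for the distinct hotel IDs. `aggregate_city_day` is a hypothetical helper, and the row layout (dictionaries with `city`, `wthr_date`, `id`, `avg_tmpr_c`) is assumed from the silver schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_city_day(rows: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Group rows by (city, wthr_date) and compute the gold-layer metrics exactly."""
    hotels = defaultdict(set)   # (city, date) -> distinct hotel ids
    temps = defaultdict(list)   # (city, date) -> observed temperatures
    for row in rows:
        key = (row["city"], row["wthr_date"])
        hotels[key].add(row["id"])
        temps[key].append(row["avg_tmpr_c"])
    return {
        key: {
            "distinct_hotels": len(hotels[key]),
            "avg_t": round(sum(temps[key]) / len(temps[key]), 1),
            "max_t": round(max(temps[key]), 1),
            "min_t": round(min(temps[key]), 1),
        }
        for key in hotels
    }
```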


### Save gold data as delta files

In [0]:
(
    gold_df.writeStream
    .format("delta")
    .option("checkpointLocation", GOLD_CHECKPOINT)
    .queryName("gold_stream")
    .outputMode("complete")
    .start(GOLD_DATA)
)

### Register Gold table in the Metastore

In [0]:
spark.sql(
"""
    DROP TABLE IF EXISTS hotel_weather_gold
"""
)
 
spark.sql(
f"""
    CREATE TABLE hotel_weather_gold
    USING DELTA
    LOCATION "{GOLD_DATA}"
"""
)


### Create a Spark Structured Streaming application with Auto Loader to incrementally and efficiently process hotel/weather data as it arrives in the provisioned Azure ADLS Gen2 storage

### Using Spark in Databricks notebooks, calculate for each city and each day:
* Number of distinct hotels in the city.
* Average/max/min temperature in the city.

In [0]:
%sql
SELECT * FROM hotel_weather_gold LIMIT 10;

city,wthr_date,distinct_hotels,avg_t,max_t,min_t
Arvada,2016-10-09,1,27.6,27.6,27.6
Brandywine,2016-10-31,1,11.1,11.1,11.1
London,2017-08-06,123,14.9,14.9,14.9
Paris 12,2016-10-03,1,10.7,10.7,10.7
Roanoke,2016-10-23,1,10.0,10.0,10.0
Mableton,2016-10-11,1,12.1,12.1,12.1
Woodburn,2017-08-04,1,24.2,24.2,24.2
Deadwood,2016-10-19,1,1.4,1.4,1.4
East Syracuse,2016-10-27,1,2.7,2.7,2.7
Key West,2016-10-31,2,25.4,25.4,25.4


## Create gold table for top 10 cities by number of hotels

In [0]:
spark.sql("""
DROP TABLE IF EXISTS top_10_cities
""")

spark.sql(f"""
CREATE TABLE top_10_cities
USING DELTA LOCATION "{GOLD_DATA}/top-10-cities" AS
WITH cte AS
(
  SELECT
    city,
    max(distinct_hotels) AS num_of_hotels,
    row_number() OVER (ORDER BY max(distinct_hotels) DESC) AS rank
  FROM hotel_weather_gold
  GROUP BY city
)
-- Filter on rank: LIMIT without ORDER BY does not guarantee which rows are kept
SELECT * FROM cte WHERE rank <= 10
""")

### Top 10 cities by number of hotels

In [0]:
%sql
SELECT * FROM top_10_cities

city,num_of_hotels,rank
London,250,1
Paris,233,2
Barcelona,211,3
Milan,157,4
Amsterdam,85,5
Paddington,19,6
New York,6,7
Memphis,5,8
Houston,5,9
Vienna,5,10
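
Several cities share the same hotel count (for example 6 or 5), so their relative order depends on how ties are resolved. The two top-10 listings in this notebook differ in places 7 to 10, which is consistent with arbitrary tie ordering, although they may also have been computed at different points of the stream. A minimal plain-Python sketch of a deterministic ranking, breaking ties by city name, is shown below. `top_cities` is a hypothetical helper and not part of the notebook.

```python
from typing import Dict, Iterable, List, Tuple


def top_cities(gold_rows: Iterable[dict], n: int = 10) -> List[Tuple[int, str, int]]:
    """Rank cities by their maximum daily distinct-hotel count.

    Ties are broken by city name so that the result is deterministic.
    Returns (rank, city, num_of_hotels) tuples for the first n cities.
    """
    best: Dict[str, int] = {}
    for row in gold_rows:
        city = row["city"]
        best[city] = max(best.get(city, 0), row["distinct_hotels"])
    ordered = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, city, count) for rank, (city, count) in enumerate(ordered[:n], start=1)]
```

In SQL the same effect could be obtained by adding `city` as a second sort key in the window's `ORDER BY`.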


### Visualize incoming data for 10 biggest cities (the biggest number of hotels in the city, one chart for one city):
* X-axis: date (date of observation).
* Y-axis: number of distinct hotels, average/max/min temperature.

In [0]:
hotel_weather_df = spark.read.format("delta").load(GOLD_DATA)
hotel_by_city = (
    hotel_weather_df
    .groupBy("city")
    .agg(F.max("distinct_hotels").alias("distinct_hotels_num"))
    .withColumn("rank", F.row_number().over(Window.orderBy(F.col("distinct_hotels_num").desc())))
    .filter(F.col("rank") <= 10)  # keep ranks 1-10; limit(10) alone does not guarantee which rows are kept
)

display(hotel_by_city)

city,distinct_hotels_num,rank
London,250,1
Paris,233,2
Barcelona,211,3
Milan,157,4
Amsterdam,85,5
Paddington,19,6
San Diego,6,7
New York,6,8
Memphis,5,9
Houston,5,10


In [0]:
top_10_biggest_cities_by_hotel = hotel_weather_df.join(hotel_by_city, on="city")

display(top_10_biggest_cities_by_hotel)

city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
Houston,2017-08-04,1,28.1,28.1,28.1,5,10
Paris,2016-10-31,233,10.7,10.7,10.7,233,2
Milan,2016-10-24,157,10.7,10.7,10.7,157,4
Barcelona,2016-10-24,1,16.6,16.6,16.6,211,3
Amsterdam,2017-08-17,6,17.1,17.1,17.1,85,5
Milan,2017-08-04,1,27.1,27.1,27.1,157,4
Amsterdam,2017-08-03,85,18.8,18.8,18.8,85,5
London,2016-10-09,1,13.6,13.6,13.6,250,1
Memphis,2016-10-12,3,21.9,21.9,21.9,5,9
Paris,2016-10-23,228,7.1,7.1,7.1,233,2


In [0]:
# Ranks start at 1, so iterate over 1..10 (one table/chart per city)
for rank in range(1, 11):
    display(top_10_biggest_cities_by_hotel.filter(F.col("rank") == rank).orderBy("wthr_date"))

city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
Paris,2016-10-03,228,10.7,10.7,10.7,444,1
Paris,2016-10-09,228,8.7,8.7,8.7,444,1
Paris,2016-10-13,233,8.4,8.4,8.4,444,1
Paris,2016-10-19,233,9.4,9.4,9.4,444,1
Paris,2016-10-22,233,7.2,7.2,7.2,444,1
Paris,2016-10-23,228,7.1,7.1,7.1,444,1
Paris,2016-10-27,228,11.4,11.4,11.4,444,1
Paris,2016-10-31,233,10.7,10.7,10.7,444,1
Paris,2017-08-14,228,17.6,17.6,17.6,444,1
Paris,2017-08-15,228,19.4,19.4,19.4,444,1


city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
London,2016-10-09,1,13.6,13.6,13.6,250,2
London,2016-10-10,250,8.4,8.4,8.4,250,2
London,2016-10-12,123,10.4,10.4,10.4,250,2
London,2016-10-16,250,12.1,12.1,12.1,250,2
London,2016-10-23,1,9.2,9.2,9.2,250,2
London,2016-10-29,1,12.9,12.9,12.9,250,2
London,2017-08-01,123,16.1,16.1,16.1,250,2
London,2017-08-02,1,15.9,15.9,15.9,250,2
London,2017-08-03,7,16.6,16.7,16.2,250,2
London,2017-08-06,123,14.9,14.9,14.9,250,2


city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
Barcelona,2016-10-03,1,16.7,16.7,16.7,211,3
Barcelona,2016-10-13,1,13.6,13.6,13.6,211,3
Barcelona,2016-10-24,1,16.6,16.6,16.6,211,3
Barcelona,2016-10-28,211,17.1,17.1,16.7,211,3
Barcelona,2017-08-10,1,18.0,18.0,18.0,211,3


city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
Milan,2016-10-06,157,12.1,12.1,12.1,157,4
Milan,2016-10-16,157,12.3,12.3,12.3,157,4
Milan,2016-10-18,1,11.8,11.8,11.8,157,4
Milan,2016-10-24,157,10.7,10.7,10.7,157,4
Milan,2016-10-31,8,10.0,10.0,10.0,157,4
Milan,2017-08-04,1,27.1,27.1,27.1,157,4
Milan,2017-08-06,157,23.9,23.9,23.9,157,4
Milan,2017-08-11,8,17.2,17.2,17.2,157,4
Milan,2017-08-12,8,19.5,19.5,19.5,157,4
Milan,2017-08-14,157,22.4,22.4,22.4,157,4


city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
Amsterdam,2016-10-02,85,13.7,13.7,13.7,85,5
Amsterdam,2017-08-03,85,18.8,18.8,18.8,85,5
Amsterdam,2017-08-04,85,18.3,18.3,18.3,85,5
Amsterdam,2017-08-05,8,16.3,16.3,16.3,85,5
Amsterdam,2017-08-08,85,16.2,16.2,16.2,85,5
Amsterdam,2017-08-10,6,16.4,16.4,16.4,85,5
Amsterdam,2017-08-11,85,16.9,16.9,16.9,85,5
Amsterdam,2017-08-17,6,17.1,17.1,17.1,85,5


city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
Paddington,2016-10-10,19,8.4,8.4,8.4,19,6
Paddington,2016-10-16,19,12.1,12.1,12.1,19,6
Paddington,2017-08-07,19,15.6,15.6,15.6,19,6


city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
New York,2016-10-05,1,26.5,26.5,26.5,6,7
New York,2016-10-07,1,10.7,10.7,10.7,6,7
New York,2016-10-21,6,19.1,19.1,19.1,6,7
New York,2016-10-23,1,26.6,26.6,26.6,6,7
New York,2016-10-26,1,12.9,12.9,12.9,6,7
New York,2017-08-02,1,27.6,27.6,27.6,6,7
New York,2017-08-10,1,21.2,21.2,21.2,6,7
New York,2017-08-11,6,23.3,23.3,23.3,6,7
New York,2017-08-13,1,21.1,21.1,21.1,6,7
New York,2017-08-21,1,27.1,27.1,27.1,6,7


city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
San Diego,2017-08-16,6,19.9,19.9,19.9,6,8
San Diego,2017-08-17,6,19.6,19.6,19.6,6,8


city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
Memphis,2016-10-12,3,21.9,21.9,21.9,5,9
Memphis,2016-10-21,5,12.6,12.7,12.6,5,9
Memphis,2016-10-26,3,19.8,19.8,19.8,5,9
Memphis,2016-10-27,2,21.1,21.1,21.1,5,9
Memphis,2017-08-14,2,24.6,24.6,24.6,5,9
Memphis,2017-08-15,2,26.6,26.6,26.6,5,9
Memphis,2017-08-22,2,28.6,28.6,28.6,5,9


city,wthr_date,distinct_hotels,avg_t,max_t,min_t,distinct_hotels_num,rank
Memphis,2016-10-12,3,21.9,21.9,21.9,5,10
Memphis,2016-10-21,5,12.6,12.7,12.6,5,10
Memphis,2016-10-26,3,19.8,19.8,19.8,5,10
Memphis,2016-10-27,2,21.1,21.1,21.1,5,10
Memphis,2017-08-14,2,24.6,24.6,24.6,5,10
Memphis,2017-08-15,2,26.6,26.6,26.6,5,10
Memphis,2017-08-22,2,28.6,28.6,28.6,5,10
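
Each per-city table above maps to one chart: the observation date on the X-axis and the four metrics as Y-series. In Databricks the chart itself is configured in the `display` output. The sketch below only shows, in plain Python, how one city's rows could be arranged into date-ordered series for any plotting tool. `chart_series` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def chart_series(rows: Iterable[dict], city: str) -> Dict[str, List]:
    """Collect one city's rows into date-ordered series, one list per plotted metric."""
    city_rows = sorted(
        (row for row in rows if row["city"] == city),
        key=lambda row: row["wthr_date"],  # ISO dates sort correctly as strings
    )
    series: Dict[str, List] = {"wthr_date": [row["wthr_date"] for row in city_rows]}
    for metric in ("distinct_hotels", "avg_t", "max_t", "min_t"):
        series[metric] = [row[metric] for row in city_rows]
    return series
```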


### Visualize incoming data for the 10 biggest cities (SQL)

In [0]:
%sql
SELECT
  hwg.wthr_date,
  hwg.distinct_hotels,
  hwg.avg_t,
  hwg.max_t,
  hwg.min_t
FROM hotel_weather_gold AS hwg
JOIN top_10_cities AS tc ON hwg.city = tc.city
WHERE tc.rank = 1
ORDER BY hwg.wthr_date

wthr_date,distinct_hotels,avg_t,max_t,min_t
2016-10-09,1,13.6,13.6,13.6
2016-10-10,250,8.4,8.4,8.4
2016-10-12,123,10.4,10.4,10.4
2016-10-16,250,12.1,12.1,12.1
2016-10-23,1,9.2,9.2,9.2
2016-10-29,1,12.9,12.9,12.9
2017-08-01,123,16.1,16.1,16.1
2017-08-02,1,15.9,15.9,15.9
2017-08-03,7,16.6,16.7,16.2
2017-08-06,123,14.9,14.9,14.9


### Execution plan

In [0]:
%sql
EXPLAIN EXTENDED
SELECT
  hwg.wthr_date,
  hwg.distinct_hotels,
  hwg.avg_t,
  hwg.max_t,
  hwg.min_t
FROM hotel_weather_gold AS hwg
JOIN top_10_cities AS tc ON hwg.city = tc.city
WHERE tc.rank = 3
ORDER BY hwg.wthr_date

plan
"== Parsed Logical Plan ==
'Sort ['hwg.wthr_date ASC NULLS FIRST], true
+- 'Project ['hwg.wthr_date, 'hwg.distinct_hotels, 'hwg.avg_t, 'hwg.max_t, 'hwg.min_t]
   +- 'Filter ('tc.rank = 3)
      +- 'Join Inner, ('hwg.city = 'tc.city)
         :- 'SubqueryAlias hwg
         :  +- 'UnresolvedRelation [hotel_weather_gold], [], false
         +- 'SubqueryAlias tc
            +- 'UnresolvedRelation [top_10_cities], [], false

== Analyzed Logical Plan ==
wthr_date: string, distinct_hotels: bigint, avg_t: double, max_t: double, min_t: double
Sort [wthr_date#673723 ASC NULLS FIRST], true
+- Project [wthr_date#673723, distinct_hotels#673724L, avg_t#673725, max_t#673726, min_t#673727]
   +- Filter (rank#673730 = 3)
      +- Join Inner, (city#673722 = city#673728)
         :- SubqueryAlias hwg
         :  +- SubqueryAlias spark_catalog.m13sparkstreaming.hotel_weather_gold
         :     +- Relation[city#673722,wthr_date#673723,distinct_hotels#673724L,avg_t#673725,max_t#673726,min_t#673727] parquet
         +- SubqueryAlias tc
            +- SubqueryAlias spark_catalog.m13sparkstreaming.top_10_cities
               +- Relation[city#673728,num_of_hotels#673729L,rank#673730] parquet

== Optimized Logical Plan ==
Sort [wthr_date#673723 ASC NULLS FIRST], true
+- Project [wthr_date#673723, distinct_hotels#673724L, avg_t#673725, max_t#673726, min_t#673727]
   +- Join Inner, (city#673722 = city#673728)
      :- Filter isnotnull(city#673722)
      :  +- Relation[city#673722,wthr_date#673723,distinct_hotels#673724L,avg_t#673725,max_t#673726,min_t#673727] parquet
      +- Project [city#673728]
         +- Filter ((isnotnull(rank#673730) AND (rank#673730 = 3)) AND isnotnull(city#673728))
            +- Relation[city#673728,num_of_hotels#673729L,rank#673730] parquet

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Sort [wthr_date#673723 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(wthr_date#673723 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#221543]
      +- Project [wthr_date#673723, distinct_hotels#673724L, avg_t#673725, max_t#673726, min_t#673727]
         +- BroadcastHashJoin [city#673722], [city#673728], Inner, BuildRight, false
            :- Filter isnotnull(city#673722)
            :  +- FileScan parquet m13sparkstreaming.hotel_weather_gold[city#673722,wthr_date#673723,distinct_hotels#673724L,avg_t#673725,max_t#673726,min_t#673727] Batched: true, DataFilters: [isnotnull(city#673722)], Format: Parquet, Location: PreparedDeltaFileIndex[dbfs:/m13sparkstreaming_python_azure/gold/data], PartitionFilters: [], PushedFilters: [IsNotNull(city)], ReadSchema: struct
            +- BroadcastExchange HashedRelationBroadcastMode(ArrayBuffer(input[0, string, true]),false), [id=#221539]
               +- Project [city#673728]
                  +- Filter ((isnotnull(rank#673730) AND (rank#673730 = 3)) AND isnotnull(city#673728))
                     +- FileScan parquet m13sparkstreaming.top_10_cities[city#673728,rank#673730] Batched: true, DataFilters: [isnotnull(rank#673730), (rank#673730 = 3), isnotnull(city#673728)], Format: Parquet, Location: PreparedDeltaFileIndex[dbfs:/m13sparkstreaming_python_azure/gold/data/top-10-cities], PartitionFilters: [], PushedFilters: [IsNotNull(rank), EqualTo(rank,3), IsNotNull(city)], ReadSchema: struct"


## Stop all streams

In [0]:
# Stop every streaming query that is still active in this Spark session
for stream in spark.streams.active:
    stream.stop()