### Create a Sample DataFrame

In [0]:

transactions = [
    (101, "apple", 5.50),
    (101, "banana", 2.75),
    (102, "milk", 3.20),
    (103, "bread", 2.00),
    (102, "eggs", 4.10),
    (104, "cheese", 6.30),
    (105, "juice", 3.80),
    (101, "butter", 2.90),
    (103, "yogurt", 1.50),
    (104, "cereal", 4.25)
]
columns = ["store_id", "item", "amount"]

transactionsDF = spark.createDataFrame(transactions, columns)
transactionsDF.show()

+--------+------+------+
|store_id|  item|amount|
+--------+------+------+
|     101| apple|   5.5|
|     101|banana|  2.75|
|     102|  milk|   3.2|
|     103| bread|   2.0|
|     102|  eggs|   4.1|
|     104|cheese|   6.3|
|     105| juice|   3.8|
|     101|butter|   2.9|
|     103|yogurt|   1.5|
|     104|cereal|  4.25|
+--------+------+------+



### Create a Sample Dimension Table

In [0]:
store = [
    (101, "Downtown Market"),
    (102, "Uptown Grocers"),
    (103, "Central Mart"),
    (104, "Fresh Foods"),
    (105, "Neighborhood Store")
]
store_columns = ["store_id", "store_name"]

storeDF = spark.createDataFrame(store, store_columns)
storeDF.show()

+--------+------------------+
|store_id|        store_name|
+--------+------------------+
|     101|   Downtown Market|
|     102|    Uptown Grocers|
|     103|      Central Mart|
|     104|       Fresh Foods|
|     105|Neighborhood Store|
+--------+------------------+



### When AQE is Enabled (Default Since Spark 3.2.0)

In [0]:
from pyspark.sql.functions import broadcast

joinDF = transactionsDF.join(broadcast(storeDF), transactionsDF['store_id'] == storeDF['store_id'])
joinDF.show()


+--------+------+------+--------+------------------+
|store_id|  item|amount|store_id|        store_name|
+--------+------+------+--------+------------------+
|     101| apple|   5.5|     101|   Downtown Market|
|     101|banana|  2.75|     101|   Downtown Market|
|     102|  milk|   3.2|     102|    Uptown Grocers|
|     103| bread|   2.0|     103|      Central Mart|
|     102|  eggs|   4.1|     102|    Uptown Grocers|
|     104|cheese|   6.3|     104|       Fresh Foods|
|     105| juice|   3.8|     105|Neighborhood Store|
|     101|butter|   2.9|     101|   Downtown Market|
|     103|yogurt|   1.5|     103|      Central Mart|
|     104|cereal|  4.25|     104|       Fresh Foods|
+--------+------+------+--------+------------------+



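- In a broadcast join, the large table (`transactionsDF`) is not shuffled. Instead, the small table (`storeDF`) is copied to every executor; `BuildRight` in the plan marks it as the side used to build the hash table, and the only `Exchange` in the plan is the broadcast of that small table.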
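Conceptually, a broadcast hash join hashes the small (build) side once, ships that hash table to every executor, and streams each partition of the large side through it where the partition already lives. The pure-Python sketch below models this with plain lists standing in for partitions. It is an illustration of the technique, not Spark's implementation; the function name `broadcast_hash_join` and the two-partition split are hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, large_key=0, small_key=0):
    """Toy inner broadcast hash join (illustrative only, not Spark's code).

    large_partitions: list of partitions (lists of tuples) for the streamed side.
    small_rows: list of tuples for the build side, which gets "broadcast".
    large_key / small_key: index of the join key in each tuple.
    """
    # Build phase: hash the small side once. In Spark the broadcast side is
    # likewise turned into a hash table before being shipped to executors.
    hash_table = defaultdict(list)
    for row in small_rows:
        if row[small_key] is not None:  # NULL keys never match an inner equi-join
            hash_table[row[small_key]].append(row)

    # Probe phase: each partition of the large side is joined in place.
    # No row moves to another partition, i.e. the large side is not shuffled.
    joined_partitions = []
    for partition in large_partitions:
        joined = []
        for row in partition:
            for match in hash_table.get(row[large_key], []):
                joined.append(row + match)
        joined_partitions.append(joined)
    return joined_partitions


transactions = [
    (101, "apple", 5.50), (101, "banana", 2.75), (102, "milk", 3.20),
    (103, "bread", 2.00), (102, "eggs", 4.10), (104, "cheese", 6.30),
    (105, "juice", 3.80), (101, "butter", 2.90), (103, "yogurt", 1.50),
    (104, "cereal", 4.25),
]
store = [
    (101, "Downtown Market"), (102, "Uptown Grocers"), (103, "Central Mart"),
    (104, "Fresh Foods"), (105, "Neighborhood Store"),
]

# Hypothetical: pretend the large table is split across two partitions.
partitions = [transactions[:5], transactions[5:]]
result = broadcast_hash_join(partitions, store)
for partition in result:
    for row in partition:
        print(row)
```

Each output partition contains exactly the rows of the corresponding input partition, extended with the matching store columns, which is the same shape as the `joinDF.show()` output above.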

In [0]:
joinDF.explain()

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- BroadcastHashJoin [store_id#296L], [store_id#315L], Inner, BuildRight, false, true
   :- Filter isnotnull(store_id#296L)
   :  +- Scan ExistingRDD[store_id#296L,item#297,amount#298]
   +- Exchange SinglePartition, EXECUTOR_BROADCAST, [plan_id=2513]
      +- Filter isnotnull(store_id#315L)
         +- Scan ExistingRDD[store_id#315L,store_name#316]




In [0]:
joinDF.explain(True)

== Parsed Logical Plan ==
Join Inner, (store_id#296L = store_id#315L)
:- LogicalRDD [store_id#296L, item#297, amount#298], false
+- ResolvedHint (strategy=broadcast)
   +- LogicalRDD [store_id#315L, store_name#316], false

== Analyzed Logical Plan ==
store_id: bigint, item: string, amount: double, store_id: bigint, store_name: string
Join Inner, (store_id#296L = store_id#315L)
:- LogicalRDD [store_id#296L, item#297, amount#298], false
+- ResolvedHint (strategy=broadcast)
   +- LogicalRDD [store_id#315L, store_name#316], false

== Optimized Logical Plan ==
Join Inner, (store_id#296L = store_id#315L), rightHint=(strategy=broadcast)
:- Filter isnotnull(store_id#296L)
:  +- LogicalRDD [store_id#296L, item#297, amount#298], false
+- Filter isnotnull(store_id#315L)
   +- LogicalRDD [store_id#315L, store_name#316], false

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- BroadcastHashJoin [store_id#296L], [store_id#315L], Inner, BuildRight, false, true
   :- Filter isnotnull(store_id
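Comparing the analyzed and optimized logical plans, the optimizer has added `Filter isnotnull(store_id)` on both join inputs. In an inner equi-join, a row with a NULL key can never satisfy `store_id = store_id` (comparing with NULL yields NULL, not true), so removing such rows before the join cannot change the result and gives the join less to process. The sketch below checks that equivalence in plain Python, with `None` standing in for SQL NULL; the helper names and the extra NULL-key rows are hypothetical toy data, not part of the notebook's tables.

```python
def sql_equals(a, b):
    """SQL three-valued equality: any comparison with NULL (None) yields NULL."""
    if a is None or b is None:
        return None
    return a == b


def inner_join(left, right):
    # Keep a pair only when the join predicate is exactly True; NULL is dropped.
    return [l + r for l in left for r in right if sql_equals(l[0], r[0]) is True]


def not_null_key(rows):
    # Equivalent of Filter isnotnull(store_id) on the first column.
    return [row for row in rows if row[0] is not None]


# Hypothetical toy rows that include NULL keys on both sides.
left = [(101, "apple", 5.5), (None, "gum", 1.0), (102, "milk", 3.2)]
right = [(101, "Downtown Market"), (102, "Uptown Grocers"), (None, "Unknown")]

unfiltered = inner_join(left, right)
prefiltered = inner_join(not_null_key(left), not_null_key(right))
print(unfiltered == prefiltered)  # True: the inferred filter is safe
```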

### When AQE is Disabled (`spark.sql.adaptive.enabled = false`)

In [0]:
spark.conf.set("spark.sql.adaptive.enabled", False)
spark.conf.get("spark.sql.adaptive.enabled")

Out[40]: 'false'

In [0]:
from pyspark.sql.functions import broadcast

joinDF1 = transactionsDF.join(broadcast(storeDF), transactionsDF['store_id'] == storeDF['store_id'])
joinDF1.show()

+--------+------+------+--------+------------------+
|store_id|  item|amount|store_id|        store_name|
+--------+------+------+--------+------------------+
|     101| apple|   5.5|     101|   Downtown Market|
|     101|banana|  2.75|     101|   Downtown Market|
|     102|  milk|   3.2|     102|    Uptown Grocers|
|     103| bread|   2.0|     103|      Central Mart|
|     102|  eggs|   4.1|     102|    Uptown Grocers|
|     104|cheese|   6.3|     104|       Fresh Foods|
|     105| juice|   3.8|     105|Neighborhood Store|
|     101|butter|   2.9|     101|   Downtown Market|
|     103|yogurt|   1.5|     103|      Central Mart|
|     104|cereal|  4.25|     104|       Fresh Foods|
+--------+------+------+--------+------------------+



In [0]:
joinDF1.explain()

== Physical Plan ==
*(2) BroadcastHashJoin [store_id#296L], [store_id#315L], Inner, BuildRight, false, false
:- *(2) Filter isnotnull(store_id#296L)
:  +- *(2) Scan ExistingRDD[store_id#296L,item#297,amount#298]
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [plan_id=2666]
   +- *(1) Filter isnotnull(store_id#315L)
      +- *(1) Scan ExistingRDD[store_id#315L,store_name#316]
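With or without AQE, the plan uses a broadcast because of the explicit `broadcast()` hint. Without a hint, Spark decides from statistics: a join side whose estimated size is at most `spark.sql.autoBroadcastJoinThreshold` (10 MB by default; `-1` disables size-based broadcasting) can be broadcast, and large equi-joins otherwise typically become sort-merge joins. The function below is a deliberately simplified, hypothetical model of that choice. It ignores shuffled hash joins, join-type restrictions on which side may be built, and other hints.

```python
# Spark's documented default for spark.sql.autoBroadcastJoinThreshold: 10 MB.
DEFAULT_AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def choose_equi_join_strategy(has_broadcast_hint, estimated_bytes,
                              threshold=DEFAULT_AUTO_BROADCAST_THRESHOLD):
    """Simplified, hypothetical model of static strategy choice for an inner equi-join."""
    if has_broadcast_hint:
        return "BroadcastHashJoin"  # an explicit hint is honored regardless of size
    if 0 <= estimated_bytes <= threshold:
        return "BroadcastHashJoin"  # small enough by estimated statistics
    return "SortMergeJoin"          # fallback for large equi-joins (threshold -1 always lands here)


print(choose_equi_join_strategy(True, 50 * 1024 * 1024))   # hint wins
print(choose_equi_join_strategy(False, 1024))              # tiny table, broadcast by size
print(choose_equi_join_strategy(False, 50 * 1024 * 1024))  # too large, sort-merge
```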




**Summary:**

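- With AQE enabled (`spark.sql.adaptive.enabled=true`), the physical plan is wrapped in an `AdaptiveSparkPlan` node. `isFinalPlan=false` means `explain()` is showing the initial plan; Spark may re-optimize it at runtime from actual shuffle statistics, so the plan that finally runs can differ from what is printed.
- With AQE disabled, there is no `AdaptiveSparkPlan` wrapper. The plan is fixed at planning time and shows the exact operators Spark will run, including whole-stage codegen stage IDs such as `*(1)` and `*(2)` and a `BroadcastExchange` node for the broadcast side.
- In both cases the `broadcast()` hint produces a `BroadcastHashJoin` with `BuildRight`, and the join results are identical.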
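A common example of AQE changing a plan during execution is converting a sort-merge join into a broadcast hash join after a shuffle stage has finished and the measured size of one join side turns out to be below the broadcast threshold (`spark.sql.adaptive.autoBroadcastJoinThreshold` in Spark 3.2+, which falls back to `spark.sql.autoBroadcastJoinThreshold` when unset). The toy model below mirrors the `isFinalPlan` flag from the AQE-enabled output: the plan starts as non-final, chosen from estimates, and is re-planned from runtime sizes. The class and function names are hypothetical and this is not Spark's `AdaptiveSparkPlanExec`.

```python
from dataclasses import dataclass

# Assumed threshold: the documented 10 MB default of spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class ToyAdaptivePlan:
    """Hypothetical stand-in for an adaptive plan with a join strategy and a final flag."""
    join_strategy: str
    is_final_plan: bool = False


def plan_from_estimate(estimated_bytes, threshold=BROADCAST_THRESHOLD_BYTES):
    # Planning time: only size estimates are available, and they may be inaccurate.
    if 0 <= estimated_bytes <= threshold:
        return ToyAdaptivePlan("BroadcastHashJoin")
    return ToyAdaptivePlan("SortMergeJoin")


def reoptimize_at_runtime(plan, measured_bytes, threshold=BROADCAST_THRESHOLD_BYTES):
    # After the shuffle stage runs, the real size of the join side is known.
    if plan.join_strategy == "SortMergeJoin" and 0 <= measured_bytes <= threshold:
        plan.join_strategy = "BroadcastHashJoin"
    plan.is_final_plan = True  # mirrors isFinalPlan=true once runtime stats are applied
    return plan


# Hypothetical sizes: estimated at 50 MB, but only 2 MB actually reach the join
# (e.g. because a filter was more selective than the estimate assumed).
plan = plan_from_estimate(50 * 1024 * 1024)
print(plan)  # ToyAdaptivePlan(join_strategy='SortMergeJoin', is_final_plan=False)
plan = reoptimize_at_runtime(plan, 2 * 1024 * 1024)
print(plan)  # ToyAdaptivePlan(join_strategy='BroadcastHashJoin', is_final_plan=True)
```

A runtime conversion like this is still less efficient than a broadcast planned up front, because the shuffle has already happened; the saving comes from skipping the sort on both sides and, when the local shuffle reader is enabled, reading shuffle files locally.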