- Talk on tuning Spark for large jobs by Facebook engineers: Tuning Apache Spark for Large Scale Workloads - Sital Kedia & Gaoxiang Liu - 2017
- Talk on Catalyst & Tungsten framework by Sameer Agarwal: SparkSQL: A Compiler from Queries to RDDs: Spark Summit East talk by Sameer Agarwal
  - Key takeaways:
    - The Catalyst framework's role in converting SQL queries to a logical plan and then a physical plan, applying transformations to produce an optimized query plan.
    - Introduction to the Tungsten framework, covering the Volcano iterator model and a brief look at whole-stage code generation.
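
To make the Volcano iterator model and whole-stage codegen takeaway concrete, here is a minimal sketch in plain Python. It is not Spark code: the `Scan`, `Filter`, and `Project` classes and both `run_*` functions are hypothetical names invented for illustration. The first version pulls one row at a time through a chain of operators, each exposing a `next()` call. The second is the kind of single fused loop that whole-stage code generation aims to emit for the same query.

```python
# Illustrative sketch only: plain Python, not Spark internals.
# All class and function names here are hypothetical.


class Scan:
    """Leaf operator: hands out rows from an in-memory sequence."""

    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        # None signals end of input.
        return next(self._it, None)


class Filter:
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row


class Project:
    """Applies a function to every row pulled from its child."""

    def __init__(self, child, fn):
        self.child = child
        self.fn = fn

    def next(self):
        row = self.child.next()
        return None if row is None else self.fn(row)


def run_volcano(rows):
    """Volcano style: every row costs one next() call per operator."""
    plan = Project(Filter(Scan(rows), lambda x: x % 2 == 0), lambda x: x * 10)
    total = 0
    while (row := plan.next()) is not None:
        total += row
    return total


def run_fused(rows):
    """Same query collapsed into one loop, as generated code would be."""
    total = 0
    for x in rows:
        if x % 2 == 0:
            total += x * 10
    return total


if __name__ == "__main__":
    print(run_volcano(range(10)), run_fused(range(10)))
```

Both functions compute the same result; the difference is that the fused version avoids the per-row operator calls, which is the overhead the talk attributes to the iterator model.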
- Talk on SQL table bucketing support: Hive Bucketing in Apache Spark - Tejas Patil
  - Key takeaways:
    - Introduction to bucketing and how it helps reduce the sorting and shuffling operations planned by the Spark SQL planner.
    - Comparison of bucketing support in Hive and Spark, and the various JIRA tickets around Spark SQL optimization for bucketing.
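
The reason bucketing can cut down on shuffling and sorting can be sketched in plain Python. This is a simplified illustration, not Spark or Hive code: `NUM_BUCKETS`, `bucket_id`, `write_bucketed`, and `bucketed_join` are hypothetical names, and the modulo hash is a stand-in for whatever hash function the engine really uses. The idea is that if both tables were written with the same bucket count on the join key, and each bucket is kept sorted, then bucket `i` of one table only ever needs to meet bucket `i` of the other, and a merge join can run without a fresh sort.

```python
# Illustrative sketch only: plain Python, not Spark or Hive internals.
# All names and the modulo hash are assumptions made for this example.

NUM_BUCKETS = 4  # hypothetical bucket count shared by both tables


def bucket_id(key, num_buckets=NUM_BUCKETS):
    """Stand-in hash: real engines use their own hash functions."""
    return key % num_buckets


def write_bucketed(rows, num_buckets=NUM_BUCKETS):
    """Split (key, value) rows into buckets and sort each bucket by key."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    for bucket in buckets:
        bucket.sort(key=lambda kv: kv[0])
    return buckets


def bucketed_join(left_buckets, right_buckets):
    """Merge-join bucket i with bucket i only; no cross-bucket movement."""
    out = []
    for lb, rb in zip(left_buckets, right_buckets):
        i = j = 0
        while i < len(lb) and j < len(rb):
            lk, rk = lb[i][0], rb[j][0]
            if lk < rk:
                i += 1
            elif lk > rk:
                j += 1
            else:
                # Find the run of equal keys on each side, emit all pairs.
                i_end = i
                while i_end < len(lb) and lb[i_end][0] == lk:
                    i_end += 1
                j_end = j
                while j_end < len(rb) and rb[j_end][0] == lk:
                    j_end += 1
                for _, lv in lb[i:i_end]:
                    for _, rv in rb[j:j_end]:
                        out.append((lk, lv, rv))
                i, j = i_end, j_end
    return out


if __name__ == "__main__":
    users = [(1, "ann"), (2, "bob"), (5, "eve"), (6, "dan")]
    orders = [(2, "book"), (5, "pen"), (5, "ink"), (7, "cup")]
    print(sorted(bucketed_join(write_bucketed(users), write_bucketed(orders))))
```

The bucketing and sorting cost is paid once at write time; every later join or aggregation on the same key can reuse that layout, which is the saving the talk describes.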
- Talk on Spark's memory model & data aware cache: A Developer’s View into Spark's Memory Model - Wenchen Fan
- Talk on Spark's cost-based optimizer: Cost Based Optimizer in Apache Spark 2.2 - Ron Hu & Sameer Agarwal
- Spark Scheduler: Apache Spark Scheduler