Notes/blogs/tutorials/talks around basics & better optimized usage of Apache Spark

abhioncbr/spark-notes

spark-notes [Notes/blogs/tutorials/talks around basics & better optimized usage of Apache Spark]

  1. Talk on tuning Spark for large jobs by Facebook engineers: Tuning Apache Spark for Large Scale Workloads - Sital Kedia & Gaoxiang Liu (2017)

  2. Talk on the Catalyst & Tungsten frameworks by Sameer Agarwal: SparkSQL: A Compiler from Queries to RDDs: Spark Summit East talk by Sameer Agarwal

  • Key takeaways:
    • How the Catalyst framework converts SQL queries into a logical plan and then a physical plan, applying transformations along the way to produce an optimized query plan.
    • Introduction to the Tungsten framework, covering the Volcano Iterator Model and a brief introduction to Whole-Stage Codegen.
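To make the contrast between the Volcano Iterator Model and Whole-Stage Codegen concrete, here is a minimal toy sketch in plain Python. It is not Spark code: the `Scan`, `Filter`, and `Project` classes and the `volcano_sum` / `fused_sum` functions are hypothetical names invented for illustration. The first version chains one iterator per operator, so every row passes through several layers of calls; the second shows the kind of single fused loop that whole-stage code generation aims to produce for the same query.

```python
# Toy model only: all names here are hypothetical and not part of Spark.

class Scan:
    """Leaf operator: yields rows from an in-memory list."""
    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        return iter(self.rows)


class Filter:
    """Pulls rows from its child one at a time and keeps those that match."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def __iter__(self):
        for row in self.child:
            if self.predicate(row):
                yield row


class Project:
    """Pulls rows from its child one at a time and maps each to a new value."""
    def __init__(self, child, fn):
        self.child = child
        self.fn = fn

    def __iter__(self):
        for row in self.child:
            yield self.fn(row)


def volcano_sum(rows):
    # Volcano style: a tree of operators, each asking its child for the next row.
    plan = Project(
        Filter(Scan(rows), lambda r: r["price"] > 10),
        lambda r: r["price"] * r["qty"],
    )
    return sum(plan)


def fused_sum(rows):
    # Fused style: the same query collapsed into one loop with no
    # per-operator calls, similar in spirit to whole-stage codegen output.
    total = 0
    for r in rows:
        if r["price"] > 10:
            total += r["price"] * r["qty"]
    return total


rows = [
    {"price": 5, "qty": 2},
    {"price": 20, "qty": 3},
    {"price": 15, "qty": 1},
]
print(volcano_sum(rows), fused_sum(rows))
```

Both functions compute the same result; the difference is only in how much per-row call overhead sits between the data and the arithmetic. To see what Spark actually plans for a real query, `DataFrame.explain(True)` prints the logical and physical plans.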
  3. Talk on SQL table bucketing support: Hive Bucketing in Apache Spark - Tejas Patil
  • Key takeaways:
    • Introduction to bucketing and how it helps reduce the sorting and shuffling operations planned by the Spark SQL planner.
    • Comparison of bucketing support in Hive and Spark, and the various JIRA tickets around Spark SQL optimizations for bucketing.
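The reason bucketing can save a shuffle and a sort at join time can be sketched with a small toy model in plain Python. This is not Spark or Hive code: `bucket_id`, `write_bucketed`, and `bucketed_join` are hypothetical helpers, and Python's built-in `hash` stands in for whatever hash function the engine uses. The assumption is that both tables were written with the same bucket count on the same key and sorted within each bucket, so matching keys are guaranteed to sit in the same bucket index on both sides.

```python
# Toy model only: all names here are hypothetical and not part of Spark or Hive.

def bucket_id(key, num_buckets):
    """Assign a key to a bucket; built-in hash is a stand-in for the engine's."""
    return hash(key) % num_buckets


def write_bucketed(rows, key, num_buckets):
    """Pay the partitioning and sorting cost once, at write time."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_id(row[key], num_buckets)].append(row)
    for bucket in buckets:
        bucket.sort(key=lambda r: r[key])
    return buckets


def bucketed_join(left_buckets, right_buckets, key):
    """Join bucket i with bucket i only: no repartitioning, no re-sorting."""
    assert len(left_buckets) == len(right_buckets), "bucket counts must match"
    out = []
    for lb, rb in zip(left_buckets, right_buckets):
        i = j = 0
        # Sort-merge join over two lists already sorted on the join key.
        while i < len(lb) and j < len(rb):
            lk, rk = lb[i][key], rb[j][key]
            if lk < rk:
                i += 1
            elif lk > rk:
                j += 1
            else:
                # Pair this left row with every right row sharing the key.
                k = j
                while k < len(rb) and rb[k][key] == lk:
                    out.append({**lb[i], **rb[k]})
                    k += 1
                i += 1
    return out


orders = [
    {"user_id": 1, "order": "a"},
    {"user_id": 2, "order": "b"},
    {"user_id": 1, "order": "c"},
    {"user_id": 7, "order": "d"},
]
users = [
    {"user_id": 1, "name": "ann"},
    {"user_id": 2, "name": "bob"},
    {"user_id": 5, "name": "eve"},
]

joined = bucketed_join(
    write_bucketed(orders, "user_id", 4),
    write_bucketed(users, "user_id", 4),
    "user_id",
)
print(joined)
```

In Spark itself, a bucketed table is written with `DataFrameWriter.bucketBy(...)`, optionally followed by `sortBy(...)`, and saved with `saveAsTable(...)`; whether the planner then skips the shuffle or sort depends on the Spark version and on the bucketing specs of both tables matching.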
  4. Talk on Spark's memory model & data-aware cache: A Developer’s View into Spark's Memory Model - Wenchen Fan

  5. Talk on Spark's cost-based optimizer: Cost Based Optimizer in Apache Spark 2.2 - Ron Hu & Sameer Agarwal

  6. Spark Scheduler: Apache Spark Scheduler
