
Meeting notes 10/18 #2

@codingjaguar
  1. Generate a synthetic workload.
  2. Initial implementation:
    1. How do we index the cache? Serialize the list of tuples [(column, predicate), …] into a canonical key (see the first sketch after this list).
    2. How do we store the cache items? Use the cache() provided by SparkSQL (second sketch).
    3. When executing a query, the cache planner asks the logical planner for all the tables, columns, and the predicates applied to them, then passes a list of key-value pairs to the cache manager. The cache manager is responsible for inserting callbacks into Spark so that the intermediate results get materialized (third sketch).
  3. Next implementation:
    1. Cache the joined tables.
  4. Workload analysis:
    1. The cache planner first analyzes the whole workload.
    2. The cache planner then runs the SparkSQL plans sequentially (fourth sketch below).
  5. RDD collector:
    1. The cache manager needs to collect data, possibly from Spark.
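First sketch (item 2.1): a minimal Scala illustration of indexing the cache by a serialized list of (column, predicate) tuples. `ColumnPredicate` and `CacheKey` are hypothetical names; the notes only specify that the key is the serialized tuple list.

```scala
// Hypothetical sketch of item 2.1: index a cache entry by a canonical
// serialization of the (column, predicate) pairs it covers.
case class ColumnPredicate(column: String, predicate: String)

object CacheKey {
  // Sort the pairs first so that logically identical predicate sets map to
  // the same key regardless of the order the logical planner reports them in.
  def of(pairs: Seq[ColumnPredicate]): String =
    pairs
      .sortBy(p => (p.column, p.predicate))
      .map(p => s"(${p.column},${p.predicate})")
      .mkString("[", ", ", "]")
}
```

With this scheme, two queries whose planners report the same pairs in different orders still hit the same cache entry.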
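Second sketch (item 2.2): storing a cache item with the `cache()` call that SparkSQL's DataFrame API provides. The table name and predicate below are made up for illustration; note that `cache()` is lazy, so an action is needed to actually materialize the item.

```scala
import org.apache.spark.sql.SparkSession

object CacheItemSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-item-sketch")
      .master("local[*]")
      .getOrCreate()

    // Illustrative table and predicate; in our system these come from the
    // (column, predicate) pairs extracted by the cache planner.
    val item = spark.table("employees").filter("age > 30")

    // cache() only marks the plan for caching; the first action (count()
    // here) materializes it in Spark's block manager for later reuse.
    item.cache()
    item.count()

    spark.stop()
  }
}
```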
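Third sketch (item 2.3): a simplified stand-in for the planner-to-manager handoff. The real design inserts callbacks into Spark to materialize intermediate results; this sketch cheats and calls `cache()` directly. All names here are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import scala.collection.mutable

// Hypothetical CacheManager: receives the key-value pairs the cache planner
// extracted from the logical plan and ensures each plan gets materialized.
class CacheManager {
  private val entries = mutable.Map.empty[String, DataFrame]

  // Register a (key, plan) pair; cache() stands in for the callback that
  // would materialize the intermediate result inside Spark.
  def register(key: String, plan: DataFrame): DataFrame =
    entries.getOrElseUpdate(key, plan.cache())

  // Later queries ask whether a matching intermediate result already exists.
  def lookup(key: String): Option[DataFrame] = entries.get(key)
}
```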
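Fourth sketch (item 4): the two-phase flow, analyzing the whole workload up front and then running the SparkSQL plans one at a time. `CachePlanner.analyze` is a hypothetical interface; only the two-phase ordering comes from the notes.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical interface: the cache planner sees every logical plan before
// any query runs, so it can decide which intermediate results to cache.
trait CachePlanner {
  def analyze(plan: LogicalPlan): Unit
}

object WorkloadRunner {
  def run(spark: SparkSession, queries: Seq[String], planner: CachePlanner): Unit = {
    // Phase 1: hand every query's logical plan to the cache planner.
    queries.foreach(q => planner.analyze(spark.sql(q).queryExecution.logical))

    // Phase 2: execute the plans sequentially; later queries can reuse the
    // intermediate results cached during earlier ones.
    queries.foreach(q => spark.sql(q).collect())
  }
}
```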
