
Meeting notes 10/18 #2

@codingjaguar
  1. Generate a synthetic workload.
  2. Initial implementation:
    1. How do we index the cache? Serialize the list of tuples [(column, predicate), …] into a canonical key (see the first sketch after this list).
    2. How do we store the cache items? Use the cache() provided by SparkSQL (second sketch).
    3. When executing a query, the cache planner asks the logical planner for all the tables, columns, and the predicates applied to them, then passes a list of key-value pairs to the cache manager. The cache manager is responsible for inserting callbacks into Spark so that the intermediate results get materialized (third sketch).
  3. Next implementation:
    1. Cache the joined tables.
  4. Workload analysis:
    1. The cache planner first analyzes the whole workload.
    2. The cache planner then runs the SparkSQL plans sequentially (fourth sketch below).
  5. RDD collector:
    1. The cache manager needs to collect data, possibly from Spark.
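First sketch (item 2.1): a minimal Scala illustration of indexing the cache by a serialized list of (column, predicate) tuples. `ColumnPredicate` and `CacheKey` are hypothetical names; the notes only specify that the key is the serialized tuple list.

```scala
// Hypothetical sketch of item 2.1: index a cache entry by a canonical
// serialization of the (column, predicate) pairs it covers.
case class ColumnPredicate(column: String, predicate: String)

object CacheKey {
  // Sort the pairs first so that logically identical predicate sets map to
  // the same key regardless of the order the logical planner reports them in.
  def of(pairs: Seq[ColumnPredicate]): String =
    pairs
      .sortBy(p => (p.column, p.predicate))
      .map(p => s"(${p.column},${p.predicate})")
      .mkString("[", ", ", "]")
}
```

With this scheme, two queries whose planners report the same pairs in different orders still hit the same cache entry.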
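Second sketch (item 2.2): storing a cache item with the `cache()` call that SparkSQL's DataFrame API provides. The table name and predicate below are made up for illustration; note that `cache()` is lazy, so an action is needed to actually materialize the item.

```scala
import org.apache.spark.sql.SparkSession

object CacheItemSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-item-sketch")
      .master("local[*]")
      .getOrCreate()

    // Illustrative table and predicate; in our system these come from the
    // (column, predicate) pairs extracted by the cache planner.
    val item = spark.table("employees").filter("age > 30")

    // cache() only marks the plan for caching; the first action (count()
    // here) materializes it in Spark's block manager for later reuse.
    item.cache()
    item.count()

    spark.stop()
  }
}
```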
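Third sketch (item 2.3): a simplified stand-in for the planner-to-manager handoff. The real design inserts callbacks into Spark to materialize intermediate results; this sketch cheats and calls `cache()` directly. All names here are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import scala.collection.mutable

// Hypothetical CacheManager: receives the key-value pairs the cache planner
// extracted from the logical plan and ensures each plan gets materialized.
class CacheManager {
  private val entries = mutable.Map.empty[String, DataFrame]

  // Register a (key, plan) pair; cache() stands in for the callback that
  // would materialize the intermediate result inside Spark.
  def register(key: String, plan: DataFrame): DataFrame =
    entries.getOrElseUpdate(key, plan.cache())

  // Later queries ask whether a matching intermediate result already exists.
  def lookup(key: String): Option[DataFrame] = entries.get(key)
}
```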
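Fourth sketch (item 4): the two-phase flow, analyzing the whole workload up front and then running the SparkSQL plans one at a time. `CachePlanner.analyze` is a hypothetical interface; only the two-phase ordering comes from the notes.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical interface: the cache planner sees every logical plan before
// any query runs, so it can decide which intermediate results to cache.
trait CachePlanner {
  def analyze(plan: LogicalPlan): Unit
}

object WorkloadRunner {
  def run(spark: SparkSession, queries: Seq[String], planner: CachePlanner): Unit = {
    // Phase 1: hand every query's logical plan to the cache planner.
    queries.foreach(q => planner.analyze(spark.sql(q).queryExecution.logical))

    // Phase 2: execute the plans sequentially; later queries can reuse the
    // intermediate results cached during earlier ones.
    queries.foreach(q => spark.sql(q).collect())
  }
}
```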
