- generate a synthetic workload (a possible generator is sketched below)
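
A minimal sketch of what the workload generator could look like, assuming the workload is just a list of SQL strings with randomized single-column predicates; the table name, columns, and value ranges are made-up placeholders:

```scala
import scala.util.Random

object SyntheticWorkload {
  // Produce n single-table filter queries with randomized predicates.
  def generate(n: Int, seed: Long = 42L): Seq[String] = {
    val rnd     = new Random(seed)
    val columns = Seq("id", "price", "qty") // hypothetical columns
    val ops     = Seq("<", ">", "=")
    (1 to n).map { _ =>
      val col = columns(rnd.nextInt(columns.length))
      val op  = ops(rnd.nextInt(ops.length))
      s"SELECT * FROM items WHERE $col $op ${rnd.nextInt(100)}"
    }
  }

  def main(args: Array[String]): Unit =
    generate(5).foreach(println)
}
```
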
- initial implementation:
  - how to index the cache? serialize the list of tuples [(column, predicate), …]
  - how to store the cache items? use the cache() provided by Spark SQL
  - when executing a query, the cache planner asks the logical planner for all the tables, columns, and the predicates applied on them, then passes a list of key-value pairs to the cache manager; the cache manager is responsible for inserting callbacks into Spark so that the intermediate results are materialized (see the sketch after this list)
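
To make the indexing and storage ideas concrete, here is a minimal sketch, assuming the key is the serialized list of (column, predicate) pairs. `CacheKey` and `SimpleCacheManager` are hypothetical names rather than Spark APIs; only `DataFrame.cache()` comes from Spark SQL:

```scala
import scala.collection.mutable
import org.apache.spark.sql.DataFrame

// Hypothetical cache key: one (column, predicate) pair per filtered column.
case class CacheKey(pairs: Seq[(String, String)]) {
  // Sort before serializing so logically equal keys produce equal strings.
  def serialized: String =
    pairs.sorted.map { case (col, pred) => s"$col=$pred" }.mkString("|")
}

class SimpleCacheManager {
  private val entries = mutable.Map.empty[String, DataFrame]

  // Storage relies on the cache() provided by Spark SQL, as the notes suggest;
  // cache() is lazy, so the first action on the DataFrame materializes it.
  def materialize(key: CacheKey, df: DataFrame): DataFrame =
    entries.getOrElseUpdate(key.serialized, df.cache())

  def lookup(key: CacheKey): Option[DataFrame] =
    entries.get(key.serialized)
}
```
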
- next implementation:
  - cache the joined tables (sketched below)
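
A minimal sketch of caching a joined table through Spark SQL's built-in `cache()`; the tables, columns, and rows below are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object JoinCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("join-cache").master("local[*]").getOrCreate()
    import spark.implicits._

    Seq((1, "a"), (2, "b")).toDF("id", "v").createOrReplaceTempView("t1")
    Seq((1, 10), (2, 20)).toDF("id", "w").createOrReplaceTempView("t2")

    // Cache the join result once so later queries over the same join read the
    // materialized data instead of recomputing the join.
    val joined = spark.sql("SELECT t1.id, t1.v, t2.w FROM t1 JOIN t2 ON t1.id = t2.id")
    joined.cache()
    joined.count() // cache() is lazy; the first action materializes it

    joined.where($"w" > 10).show()
    spark.stop()
  }
}
```
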
- Workload analysis
  - Use the cache planner to analyze the whole workload first
  - The cache planner then runs the Spark SQL plans sequentially (see the sketch below)
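
A minimal sketch of the analyze-then-execute flow: pass one walks every query's analyzed logical plan to collect filter predicates without running anything, pass two executes the plans in order. Extracting predicates from Catalyst `Filter` nodes is an assumption about how the cache planner might gather its (column, predicate) pairs:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.Filter

object WorkloadAnalysisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("workload-analysis").master("local[*]").getOrCreate()
    import spark.implicits._

    Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "v").createOrReplaceTempView("t")

    val workload = Seq(
      "SELECT * FROM t WHERE id > 1",
      "SELECT v FROM t WHERE id = 2")

    // Pass 1: analyze every query without executing it.
    val predicates = workload.flatMap { q =>
      spark.sql(q).queryExecution.analyzed.collect {
        case f: Filter => f.condition.sql
      }
    }
    println(s"Predicates seen across the workload: $predicates")

    // Pass 2: execute the plans sequentially.
    workload.foreach(q => spark.sql(q).collect())
    spark.stop()
  }
}
```
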
- RDD collector:
  - The cache manager needs to collect data, maybe from Spark (one possible listener-based hook is sketched below)
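
One possible way to collect such data, sketched under the assumption that Spark's listener bus is the hook: a `SparkListener` that records the RDDs touched by each completed stage. Whether this is the right source is left open above ("maybe from spark"):

```scala
import scala.collection.mutable
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}
import org.apache.spark.sql.SparkSession

class RddCollector extends SparkListener {
  val seen = mutable.ArrayBuffer.empty[(Int, String)]

  // Record (rddId, rddName) for every RDD the completed stage touched.
  override def onStageCompleted(stage: SparkListenerStageCompleted): Unit =
    stage.stageInfo.rddInfos.foreach(info => seen += ((info.id, info.name)))
}

object RddCollectorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-collector").master("local[*]").getOrCreate()
    val collector = new RddCollector
    spark.sparkContext.addSparkListener(collector)

    spark.range(1000).selectExpr("sum(id)").collect()

    Thread.sleep(1000) // listener events are delivered asynchronously
    collector.seen.foreach { case (id, name) => println(s"stage used RDD $id ($name)") }
    spark.stop()
  }
}
```
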