Elasecutor is a novel executor scheduler for data analytics systems. It dynamically allocates and explicitly sizes resources to executors over time according to the predicted time-varying resource demands. Rather than placing executors using their peak demand, Elasecutor strategically assigns them to machines based on a concept called dominant remaining resource to minimize resource fragmentation. Elasecutor further adaptively reprovisions resources in order to tolerate inaccurate demand prediction.
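The placement idea can be illustrated with a minimal sketch. It is not Elasecutor's actual scheduler code, and it rests on an assumption this README does not spell out: that the dominant remaining resource of a machine is the largest leftover fraction across all resource types and time slots once the executor's predicted demand is subtracted, and that the machine with the smallest such value is preferred. The names `Profile`, `dominant_remaining`, and `pick_machine` are hypothetical, and profiles are assumed to be normalized to machine capacity and to cover the same time slots.

```python
from typing import Dict, List, Optional

# Hypothetical layout: resource name -> normalized amount per time slot.
Profile = Dict[str, List[float]]


def dominant_remaining(available: Profile, demand: Profile) -> Optional[float]:
    """Largest leftover fraction over all resources and time slots,
    or None if the demand does not fit in some slot (assumed definition)."""
    worst = 0.0
    for resource, need in demand.items():
        free = available[resource]
        for t, d in enumerate(need):
            left = free[t] - d
            if left < 0:
                return None  # executor would overcommit this machine
            worst = max(worst, left)
    return worst


def pick_machine(machines: Dict[str, Profile], demand: Profile) -> Optional[str]:
    """Pick the feasible machine with the smallest dominant remaining resource."""
    best_name: Optional[str] = None
    best_value: Optional[float] = None
    for name, available in machines.items():
        value = dominant_remaining(available, demand)
        if value is None:
            continue
        if best_value is None or value < best_value:
            best_name, best_value = name, value
    return best_name
```

Under this assumed definition, a machine whose free capacity closely matches the executor's time-varying demand is chosen over an idle machine, which is how peak-based placement and its fragmentation are avoided.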
Spark 2.1.0, Hadoop 2.6.0, Ubuntu 16.04.2 LTS (Kernel 4.4.0), OpenJDK 7u85, cgroups management tools, psutil, Scala 2.10.4, Python 3
$ build/mvn -DskipTests clean package
Besides the scheduler module, Elasecutor consists of several components: Monitor Surrogate, Reprovisioning Module, Prediction Module, and Resource Usage Depository (RUD). Each of them must be started manually.
Start Monitor Surrogate at each slave server using the following command:
$ python resMon.py
To start RUD at master server, run:
$ python collect.py
The RUD connects to the Monitor Surrogates to collect resource profiles.
Next, start the Prediction Module at the master server:
$ python predict.py
Finally, compile the subsystem at each slave server and start the Allocation Module there.
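The start order of the three Python components can be written down as a small helper. This is only a sketch: the function name `launch_plan`, the host names, and the use of `ssh` in the usage example are assumptions, not part of Elasecutor. It does not cover compiling the subsystem or starting the Allocation Module.

```python
from typing import List, Tuple


def launch_plan(master: str, slaves: List[str]) -> List[Tuple[str, str]]:
    """Return (host, command) pairs in the order described in this README."""
    # Monitor Surrogate runs on every slave server.
    plan = [(host, "python resMon.py") for host in slaves]
    # RUD runs on the master and connects to the Monitor Surrogates.
    plan.append((master, "python collect.py"))
    # Prediction Module runs on the master.
    plan.append((master, "python predict.py"))
    return plan


if __name__ == "__main__":
    # Hypothetical host names; prints the commands instead of running them.
    for host, cmd in launch_plan("master", ["slave1", "slave2"]):
        print(f"ssh {host} '{cmd}'")
```

Printing the commands rather than executing them keeps the sketch safe to run on any machine; the output can be reviewed and pasted into a shell.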
We use the workloads from HiBench.
Please read Architecture.md for more system design details.