LCS

LCS (short for Least Cost Strategy) is an efficient data eviction strategy for Spark.

Introduction

As an in-memory distributed computing system, Spark is often used to speed up iterative applications. It caches intermediate data generated by earlier iterations in memory, so the data need not be regenerated when they are reused later. This in-memory sharing mechanism makes Spark much faster than other systems. When the memory used for caching reaches its capacity limit, data are evicted to make room for new data, and the evicted data must be recovered when they are used again. However, classical eviction strategies are not aware of recovery cost, which can degrade system performance. This paper shows that recovery costs differ significantly in Spark, so a cost-aware eviction strategy can substantially reduce the total recovery cost. To this end, a strategy named LCS is proposed, which obtains dependency information between cached data by analyzing the application and calculates the recovery cost at run time. By predicting how many times cached data will be reused and using that count to weight the recovery cost, LCS always evicts the data that lead to the minimum future recovery cost. Experimental results show that this approach achieves better performance when memory is insufficient, reducing total execution time by 30% to 50%.
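
The selection rule described above can be illustrated with a small Python sketch. This is not the LCS implementation from this repository. It is a minimal model under stated assumptions: the class CachedBlock, its fields, the function choose_victims, and the sample numbers are all hypothetical, and the recovery cost and predicted reuse count are taken as given inputs rather than derived from the application's dependency information.

```python
from dataclasses import dataclass


@dataclass
class CachedBlock:
    # All field names are hypothetical and chosen only for this illustration.
    block_id: str
    size_mb: float         # memory the block occupies
    recovery_cost: float   # estimated time to rebuild the block if evicted
    predicted_reuses: int  # how many more times the block is expected to be read

    @property
    def weighted_cost(self) -> float:
        # Assumed weighting: every predicted reuse pays the recovery cost once.
        return self.recovery_cost * self.predicted_reuses


def choose_victims(blocks, needed_mb):
    """Pick blocks to evict, lowest weighted recovery cost first, until at
    least needed_mb is freed. Returns [] if not enough space can be freed."""
    victims, freed = [], 0.0
    for block in sorted(blocks, key=lambda b: b.weighted_cost):
        if freed >= needed_mb:
            break
        victims.append(block)
        freed += block.size_mb
    return victims if freed >= needed_mb else []


# Made-up sample cache: three blocks of equal size with different costs.
cache = [
    CachedBlock("rdd_1_0", size_mb=64, recovery_cost=12.0, predicted_reuses=3),
    CachedBlock("rdd_2_0", size_mb=64, recovery_cost=2.0, predicted_reuses=5),
    CachedBlock("rdd_3_0", size_mb=64, recovery_cost=30.0, predicted_reuses=0),
]
victim_ids = [b.block_id for b in choose_victims(cache, needed_mb=100)]
print(victim_ids)
```

In this sample, rdd_3_0 is the most expensive block to rebuild but is never expected to be reused, so it is chosen first. A strategy that ignores recovery cost and reuse, such as plain LRU, has no way to make that distinction.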

Building LCS

Like Spark, LCS is built using Apache Maven. To build LCS, run:

build/mvn -DskipTests clean package

More detailed documentation about building Spark is available from the project site, at "Building Spark".

Refer to the Configuration Guide in the online documentation for an overview on how to configure Spark.

Usage

To experiment with LCS, run the sample programs in the examples directory. For example:

./bin/run-example SparkPageRank

will run the PageRank example locally.

You can set the MASTER environment variable when running examples to submit them to a cluster. This can be a mesos:// or spark:// URL, "yarn-cluster" or "yarn-client" to run on YARN, "local" to run locally with one thread, or "local[N]" to run locally with N threads. You can also use an abbreviated class name if the class is in the examples package. For instance:

MASTER=spark://host:7077 ./bin/run-example SparkPageRank

Many of the example programs print usage help if no parameters are given.

Issues about LCS in the forum of Spark

"SPARK-14289".

Publications about LCS

[1] Yuanzhen Geng, Xuanhua Shi, Cheng Pei, Hai Jin, and Wenbin Jiang, "LCS: an efficient data eviction strategy for spark", International Journal of Parallel Programming, 45(6), 1285-1297, 2017.

