GiraphWrapper

Resources

directory structure of results/

The measured runtimes for analyzing a given setup with Giraph are written to the following directory structure:

results/
	<dataset>/
		<batches>/
			<partitioning>/
				<metric>/
					<workers>/
						<run>.dat
						aggr

The results from multiple runs (= repetitions) of the same setup are aggregated in aggr.
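As a minimal sketch of how this layout maps to file paths, the following Python helpers build the location of a single run file and of the `aggr` file for one setup. The function names and the example values (dataset, partitioning, and metric names) are hypothetical and only illustrate the structure shown above; they are not part of GiraphWrapper.

```python
from pathlib import Path


def setup_dir(root, dataset, batches, partitioning, metric, workers):
    """Directory holding all runs of one setup (hypothetical helper)."""
    return Path(root) / dataset / str(batches) / partitioning / metric / str(workers)


def run_file(root, dataset, batches, partitioning, metric, workers, run):
    """Path of the <run>.dat file for a single repetition."""
    return setup_dir(root, dataset, batches, partitioning, metric, workers) / f"{run}.dat"


def aggr_file(root, dataset, batches, partitioning, metric, workers):
    """Path of the aggr entry that aggregates all repetitions of the setup."""
    return setup_dir(root, dataset, batches, partitioning, metric, workers) / "aggr"


# Example values are placeholders, not names used by the project.
print(run_file("results", "example-dataset", 10, "example-partitioning",
               "EXAMPLE_METRIC", 4, 0).as_posix())
```

The sketch only constructs paths; it makes no assumption about the content of the `.dat` files or about how `aggr` is computed.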

metrics from Giraph

This section lists the metrics taken from the Giraph examples. Each metric is specified as:

  • KEY: name/link

sources

we use:

tested but not working

others:

  • random walk
  • page rank

metrics from Okapi

This section lists the metrics taken from Okapi. W marks algorithms that work on weighted graphs, D on directed graphs, and U on undirected graphs. U & D means that the algorithm can be applied to both directed and undirected graphs; if only one of the two is given, the other is assumed to be unsupported. W alone means that weighted graphs are supported, but no statement is made about directed or undirected graphs.

Each algorithm is specified as:

  • KEY: name/link (graph type) (optional description)

sources

we use:

tested but not working

  • OKAPI_MSSP_FRACTION_${fraction of vertices to use as source} (always fails)

others:

Arguments for the use of GiraphRunner

complete help output (GiraphRunner -h)

usage: org.apache.giraph.utils.ConfigurationUtils [-aw <arg>] [-c <arg>]
       [-ca <arg>] [-cf <arg>] [-eif <arg>] [-eip <arg>] [-eof <arg>]
       [-esd <arg>] [-h] [-jyc <arg>] [-la] [-mc <arg>] [-op <arg>] [-pc
       <arg>] [-q] [-th <arg>] [-ve <arg>] [-vif <arg>] [-vip <arg>] [-vof
       <arg>] [-vsd <arg>] [-vvf <arg>] [-w <arg>] [-wc <arg>] [-yh <arg>]
       [-yj <arg>]
 -aw,--aggregatorWriter <arg>           AggregatorWriter class
 -c,--combiner <arg>                    MessageCombiner class
 -ca,--customArguments <arg>            provide custom arguments for the
                                        job configuration in the form: -ca
                                        <param1>=<value1>,<param2>=<value2
                                        > -ca <param3>=<value3> etc. It
                                        can appear multiple times, and the
                                        last one has effect for the same
                                        param.
 -cf,--cacheFile <arg>                  Files for distributed cache
 -eif,--edgeInputFormat <arg>           Edge input format
 -eip,--edgeInputPath <arg>             Edge input path
 -eof,--edgeOutputFormat <arg>          Edge output format
 -esd,--edgeSubDir <arg>                subdirectory to be used for the
                                        edge output
 -h,--help                              Help
 -jyc,--jythonClass <arg>               Jython class name, used if
                                        computation passed in is a python
                                        script
 -la,--listAlgorithms                   List supported algorithms
 -mc,--masterCompute <arg>              MasterCompute class
 -op,--outputPath <arg>                 Output path
 -pc,--partitionClass <arg>             Partition class
 -q,--quiet                             Quiet output
 -th,--typesHolder <arg>                Class that holds types. Needed
                                        only if Computation is not set
 -ve,--outEdges <arg>                   Vertex edges class
 -vif,--vertexInputFormat <arg>         Vertex input format
 -vip,--vertexInputPath <arg>           Vertex input path
 -vof,--vertexOutputFormat <arg>        Vertex output format
 -vsd,--vertexSubDir <arg>              subdirectory to be used for the
                                        vertex output
 -vvf,--vertexValueFactoryClass <arg>   Vertex value factory class
 -w,--workers <arg>                     Number of workers
 -wc,--workerContext <arg>              WorkerContext class
 -yh,--yarnheap <arg>                   Heap size, in MB, for each Giraph
                                        task (YARN only.) Defaults to
                                        giraph.yarn.task.heap.mb => 1024
                                        (integer)
                                        MB.
 -yj,--yarnjars <arg>                   comma-separated list of JAR
                                        filenames to distribute to Giraph
                                        tasks and ApplicationMaster. YARN
                                        only. Search order: CLASSPATH,
                                        HADOOP_HOME, user current dir.
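To illustrate how these options combine, here is a small Python sketch that assembles an option list from a subset of the flags above (`-vif`, `-vip`, `-vof`, `-op`, `-w`, `-ca`). The function name `giraph_options` is hypothetical, and the class names, paths, and the custom parameter in the example call are illustrative placeholders; the rest of the command line (launcher, JAR, computation class) is deliberately left out.

```python
def giraph_options(vertex_input_format, vertex_input_path,
                   vertex_output_format, output_path, workers, custom=None):
    """Build a list of GiraphRunner options (hypothetical helper).

    Only flags that appear in the help output above are used.
    """
    args = [
        "-vif", vertex_input_format,
        "-vip", vertex_input_path,
        "-vof", vertex_output_format,
        "-op", output_path,
        "-w", str(workers),
    ]
    if custom:
        # -ca takes comma-separated <param>=<value> pairs.
        args += ["-ca", ",".join(f"{k}={v}" for k, v in custom.items())]
    return args


# All values below are placeholders chosen for illustration.
options = giraph_options(
    "org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat",
    "/input/graph.json",
    "org.apache.giraph.io.formats.IdWithValueTextOutputFormat",
    "/output/result",
    4,
    custom={"giraph.SplitMasterWorker": "false"},
)
print(" ".join(options))
```

Keeping the options in a list rather than a single string makes it straightforward to pass them to a process launcher without shell quoting issues.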

components

edges

 -eif,--edgeInputFormat <arg>           Edge input format
 -eip,--edgeInputPath <arg>             Edge input path
 -eof,--edgeOutputFormat <arg>          Edge output format
 -esd,--edgeSubDir <arg>                subdirectory to be used for the
                                        edge output

vertices

 -ve,--outEdges <arg>                   Vertex edges class
 -vif,--vertexInputFormat <arg>         Vertex input format
 -vip,--vertexInputPath <arg>           Vertex input path
 -vof,--vertexOutputFormat <arg>        Vertex output format
 -vsd,--vertexSubDir <arg>              subdirectory to be used for the
                                        vertex output
 -vvf,--vertexValueFactoryClass <arg>   Vertex value factory class

log

 -h,--help                              Help
 -q,--quiet                             Quiet output

results

 -op,--outputPath <arg>                 Output path

computation

 -mc,--masterCompute <arg>              MasterCompute class
 -pc,--partitionClass <arg>             Partition class

workers

 -w,--workers <arg>                     Number of workers
 -wc,--workerContext <arg>              WorkerContext class

misc

 -jyc,--jythonClass <arg>               Jython class name, used if
                                        computation passed in is a python
                                        script
 -yj,--yarnjars <arg>                   comma-separated list of JAR
                                        filenames to distribute to Giraph
                                        tasks and ApplicationMaster. YARN
                                        only. Search order: CLASSPATH,
                                        HADOOP_HOME, user current dir.
 -yh,--yarnheap <arg>                   Heap size, in MB, for each Giraph
                                        task (YARN only.) Defaults to
                                        giraph.yarn.task.heap.mb => 1024
                                        (integer)
                                        MB.
 -ca,--customArguments <arg>            provide custom arguments for the
                                        job configuration in the form: -ca
                                        <param1>=<value1>,<param2>=<value2
                                        > -ca <param3>=<value3> etc. It
                                        can appear multiple times, and the
                                        last one has effect for the same
                                        param.
 -cf,--cacheFile <arg>                  Files for distributed cache
 -la,--listAlgorithms                   List supported algorithms
 -th,--typesHolder <arg>                Class that holds types. Needed
                                        only if Computation is not set
 -aw,--aggregatorWriter <arg>           AggregatorWriter class
 -c,--combiner <arg>                    MessageCombiner class
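
The help text for `-ca` states that the option can appear multiple times and that the last value wins for a repeated parameter. The following Python sketch models that rule for a list of `-ca` values; it is an illustration of the described behavior, not Giraph's actual implementation, and the parameter names `a` and `b` are made up.

```python
def merge_custom_arguments(ca_values):
    """Merge the values of several -ca options; later values override earlier ones."""
    merged = {}
    for value in ca_values:
        # Each -ca value is a comma-separated list of <param>=<value> pairs.
        for pair in value.split(","):
            key, _, val = pair.partition("=")
            merged[key.strip()] = val.strip()
    return merged


# Corresponds to: -ca a=1,b=2 -ca a=3
print(merge_custom_arguments(["a=1,b=2", "a=3"]))
```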