- Giraph Quick Start Guide http://giraph.apache.org/quick_start.html
- GiraphRunner class (source cross-reference) https://giraph.apache.org/xref/org/apache/giraph/GiraphRunner.html
- GiraphRunner class (API documentation) https://giraph.apache.org/apidocs/org/apache/giraph/GiraphRunner.html
- Example to run Giraph (IO!): https://marsty5.com/2013/04/29/run-example-in-giraph-shortest-paths/
- Example for GiraphRunner (Args!): https://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/
The measured runtimes for analyzing a given setup with Giraph are written to the following directory structure:
results/
  <dataset>/
    <batches>/
      <partitioning>/
        <metric>/
          <workers>/
            <run>.dat
            aggr

The results from multiple runs (= repetitions) of the same setup are aggregated in `aggr`.
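The layout above can be navigated programmatically. The following is a minimal sketch, not the tooling that produces these results: the function names `run_file` and `aggregate` are hypothetical, and it assumes that each `<run>.dat` file starts with a single runtime value as a plain number. The actual file format and the way `aggr` is computed are not specified here.

```python
from pathlib import Path
from statistics import mean


def run_file(root, dataset, batches, partitioning, metric, workers, run):
    """Build results/<dataset>/<batches>/<partitioning>/<metric>/<workers>/<run>.dat."""
    return Path(root, dataset, str(batches), partitioning, metric,
                str(workers), f"{run}.dat")


def aggregate(setup_dir):
    """Summarise all <run>.dat files found in one <workers>/ directory.

    ASSUMPTION: the first whitespace-separated token of each file is the
    measured runtime. The `aggr` entry has no .dat suffix and is skipped.
    """
    values = [float(p.read_text().split()[0])
              for p in sorted(Path(setup_dir).glob("*.dat"))]
    return {"runs": len(values), "min": min(values),
            "mean": mean(values), "max": max(values)}
```

For instance, `run_file("results", "ds", 10, "hash", "GIRAPH_WEAK_CONNECTIVITY", 4, 0)` points at run `0` of a setup with 4 workers; the dataset, batch, and partitioning names in that call are placeholders.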
The metrics taken from the Giraph examples are listed here. Each metric is specified as:
- KEY: name/link
- https://www.quora.com/What-are-the-algorithms-built-of-top-of-Giraph-by-far
- http://giraph.apache.org
- https://github.com/apache/giraph/tree/release-1.0/giraph-examples/src/main/java/org/apache/giraph/examples
- GIRAPH_WEAK_CONNECTIVITY
- ConnectedComponentsComputation.java
- directed / undirected unweighted => EdgeList
- GIRAPH_TRIANGLE_CLOSING (does not terminate)
- GIRAPH_IN_DEGREE (fails, possibly because of an incorrect input format)
- GIRAPH_OUT_DEGREE (fails, possibly because of an incorrect input format)
- GIRAPH_APSP_SINGLE (fails, possibly because of an incorrect input format)
- Random walk
- PageRank
The metrics taken from Okapi are listed here. W marks algorithms that work on weighted graphs, D marks directed graphs, and U marks undirected graphs. U & D means that the algorithm can be applied to both directed and undirected graphs; if only one of the two is given, the other is assumed to be unsupported. W alone means that weighted graphs are supported, but no statement is made about directed or undirected graphs.
Each algorithm is specified as:
- KEY: name/link (graph type) (optional description)
- http://grafos.ml/okapi.html#analytics
- https://github.com/grafos-ml/okapi/tree/master/src/main/java/ml/grafos/okapi/graphs
- OKAPI_CLUSTERING_COEFFICIENT
- Readme
- ClusteringCoefficient.java
- directed / undirected, unweighted => EdgeList
- OKAPI_TRIANGLES_COUNT
- Readme
- Triangles.java
- undirected, unweighted => EdgeList
- OKAPI_TRIANGLES_LIST
- Readme
- Triangles.java
- undirected, unweighted => EdgeList
- OKAPI_JACCARD_EXACT
- Readme
- Jaccard.java
- directed / undirected, unweighted => EdgeList
- OKAPI_JACCARD_APPROX
- Readme
- Jaccard.java
- directed / undirected, unweighted => EdgeList
- OKAPI_MSSP_LIST_${list of vertex indexes}
- Readme
- MultipleSourceShortestPaths.java
- directed / undirected, weighted => WeightedEdgeList
- indexes must be separated by `:`, e.g., `0:35:2:5`
- OKAPI_MSSP_FRACTION_${fraction of vertices to use as source} (always fails)
- Readme
- MultipleSourceShortestPaths.java
- directed / undirected, weighted => WeightedEdgeList
- fraction is specified as a decimal number, e.g., `0.12` for 12%
- K-core
- Semi-clustering (W)
- Semi-metric
- Semi-metric triangles
- PageRank
- Sybil rank (W)
- Adamic-Adar similarity (U)
- Maximum B-matching
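
The two parameterised keys above (`OKAPI_MSSP_LIST_...` and `OKAPI_MSSP_FRACTION_...`) carry their argument inside the key itself. As an illustration only, the sketch below splits such a key into a base key and its parameter and looks up the input format named in the list. The function names `parse_metric` and `input_format` are hypothetical, and the range check on the fraction is an assumption that is not stated in this document.

```python
LIST_PREFIX = "OKAPI_MSSP_LIST_"
FRACTION_PREFIX = "OKAPI_MSSP_FRACTION_"


def parse_metric(key):
    """Split a metric key into (base key, parameter or None)."""
    if key.startswith(LIST_PREFIX):
        # Source vertex indexes are separated by ':', e.g. 0:35:2:5
        sources = [int(i) for i in key[len(LIST_PREFIX):].split(":")]
        return "OKAPI_MSSP_LIST", sources
    if key.startswith(FRACTION_PREFIX):
        # The fraction is a decimal number, e.g. 0.12 for 12%
        fraction = float(key[len(FRACTION_PREFIX):])
        if not 0.0 < fraction <= 1.0:  # ASSUMPTION: valid range is (0, 1]
            raise ValueError(f"fraction out of range: {fraction}")
        return "OKAPI_MSSP_FRACTION", fraction
    return key, None  # keys without a parameter are returned unchanged


def input_format(base_key):
    """MSSP needs edge weights; the other listed keys read a plain edge list."""
    return "WeightedEdgeList" if base_key.startswith("OKAPI_MSSP") else "EdgeList"
```

Calling `parse_metric("OKAPI_MSSP_LIST_0:35:2:5")` yields the base key together with the four source indexes as integers, and `input_format` then selects `WeightedEdgeList` for it.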
usage: org.apache.giraph.utils.ConfigurationUtils [-aw <arg>] [-c <arg>]
       [-ca <arg>] [-cf <arg>] [-eif <arg>] [-eip <arg>] [-eof <arg>]
       [-esd <arg>] [-h] [-jyc <arg>] [-la] [-mc <arg>] [-op <arg>]
       [-pc <arg>] [-q] [-th <arg>] [-ve <arg>] [-vif <arg>] [-vip <arg>]
       [-vof <arg>] [-vsd <arg>] [-vvf <arg>] [-w <arg>] [-wc <arg>]
       [-yh <arg>] [-yj <arg>]
 -aw,--aggregatorWriter <arg>           AggregatorWriter class
 -c,--combiner <arg>                    MessageCombiner class
 -ca,--customArguments <arg>            provide custom arguments for the
                                        job configuration in the form:
                                        -ca <param1>=<value1>,<param2>=<value2>
                                        -ca <param3>=<value3> etc. It can
                                        appear multiple times, and the last
                                        one has effect for the same param.
 -cf,--cacheFile <arg>                  Files for distributed cache
 -eif,--edgeInputFormat <arg>           Edge input format
 -eip,--edgeInputPath <arg>             Edge input path
 -eof,--edgeOutputFormat <arg>          Edge output format
 -esd,--edgeSubDir <arg>                subdirectory to be used for the
                                        edge output
 -h,--help                              Help
 -jyc,--jythonClass <arg>               Jython class name, used if
                                        computation passed in is a python
                                        script
 -la,--listAlgorithms                   List supported algorithms
 -mc,--masterCompute <arg>              MasterCompute class
 -op,--outputPath <arg>                 Output path
 -pc,--partitionClass <arg>             Partition class
 -q,--quiet                             Quiet output
 -th,--typesHolder <arg>                Class that holds types. Needed
                                        only if Computation is not set
 -ve,--outEdges <arg>                   Vertex edges class
 -vif,--vertexInputFormat <arg>         Vertex input format
 -vip,--vertexInputPath <arg>           Vertex input path
 -vof,--vertexOutputFormat <arg>        Vertex output format
 -vsd,--vertexSubDir <arg>              subdirectory to be used for the
                                        vertex output
 -vvf,--vertexValueFactoryClass <arg>   Vertex value factory class
 -w,--workers <arg>                     Number of workers
 -wc,--workerContext <arg>              WorkerContext class
 -yh,--yarnheap <arg>                   Heap size, in MB, for each Giraph
                                        task (YARN only.) Defaults to
                                        giraph.yarn.task.heap.mb => 1024
                                        (integer) MB.
 -yj,--yarnjars <arg>                   comma-separated list of JAR
                                        filenames to distribute to Giraph
                                        tasks and ApplicationMaster. YARN
                                        only. Search order: CLASSPATH,
                                        HADOOP_HOME, user current dir.
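
To show how the options above combine into one invocation, here is a hedged sketch that assembles a `hadoop jar ... org.apache.giraph.GiraphRunner` argument list. The helper `giraph_command` is hypothetical, the jar name and paths in the usage note are placeholders, and only flags that appear in the help text above are used. Whether a given input or output format class fits a particular computation must be checked against the Giraph version in use.

```python
import shlex


def giraph_command(jar, computation, workers, output_path, *,
                   edge_input_format=None, edge_input_path=None,
                   vertex_output_format=None, custom=None):
    """Assemble the argument list for running a computation via GiraphRunner."""
    cmd = ["hadoop", "jar", jar, "org.apache.giraph.GiraphRunner", computation]
    if edge_input_format:
        cmd += ["-eif", edge_input_format]      # Edge input format
    if edge_input_path:
        cmd += ["-eip", edge_input_path]        # Edge input path
    if vertex_output_format:
        cmd += ["-vof", vertex_output_format]   # Vertex output format
    cmd += ["-op", output_path, "-w", str(workers)]
    if custom:
        # -ca takes <param1>=<value1>,<param2>=<value2>
        cmd += ["-ca", ",".join(f"{k}={v}" for k, v in custom.items())]
    return cmd


if __name__ == "__main__":
    # Placeholder jar and HDFS paths; adjust to the local installation.
    print(shlex.join(giraph_command(
        "giraph-examples.jar",
        "org.apache.giraph.examples.ConnectedComponentsComputation",
        workers=4,
        output_path="/user/me/output",
        edge_input_format="org.apache.giraph.io.formats.IntNullTextEdgeInputFormat",
        edge_input_path="/user/me/input/edges.txt",
        vertex_output_format="org.apache.giraph.io.formats.IdWithValueTextOutputFormat",
    )))
```

Returning a list instead of a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting; `shlex.join` is only used to print a copy-pasteable command line.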