Apache Nemo (Incubating) - Data Processing System for Flexible Employment With Different Deployment Characteristics

Nemo

A Data Processing System for Flexible Employment With Different Deployment Characteristics.

Online Documentation

Details about Nemo and its development can be found in:

Please refer to the Contribution guideline to contribute to our project.

Nemo prerequisites and setup

Prerequisites

  • Java 8
  • Maven
  • YARN settings
  • Protobuf 2.5.0
    • On Ubuntu 14.04 LTS and its point releases:

      sudo apt-get install protobuf-compiler
    • On Ubuntu 16.04 LTS and its point releases:

      sudo add-apt-repository ppa:snuspl/protobuf-250
      sudo apt update
      sudo apt install protobuf-compiler=2.5.0-9xenial1
    • On macOS:

      brew tap homebrew/versions
      brew install protobuf@2.5
    • Or build from source:

    • To check for a successful installation of version 2.5.0, run protoc --version
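The version check above can be scripted so that a build fails early on a mismatched compiler. This is a minimal sketch: the function names `is_protobuf_250` and `check_protoc` are hypothetical helpers and not part of Nemo, and the comparison relies on `protoc --version` printing a line of the form `libprotoc <version>`.

```shell
#!/bin/sh
# Returns 0 only when the given `protoc --version` output reports exactly 2.5.0.
is_protobuf_250() {
  [ "$1" = "libprotoc 2.5.0" ]
}

# Hypothetical helper: looks up protoc on PATH and validates its version.
check_protoc() {
  if ! command -v protoc >/dev/null 2>&1; then
    echo "protoc not found on PATH" >&2
    return 1
  fi
  version_output=$(protoc --version)
  if is_protobuf_250 "$version_output"; then
    echo "OK: $version_output"
  else
    echo "Expected libprotoc 2.5.0 but found: $version_output" >&2
    return 1
  fi
}
```

Calling `check_protoc` before `mvn clean install` is one way to surface a wrong Protobuf version before the build starts.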

Installing Nemo

  • Run all tests and install: mvn clean install -T 2C
  • Run only unit tests and install: mvn clean install -DskipITs -T 2C

Running Beam applications

Apache Nemo is an official runner of Apache Beam, and it can be executed from Beam, using NemoRunner, as well as directly from the Nemo project. Details on using NemoRunner from Beam are given on the NemoRunner page of the Apache Beam website. The following describes how to run Beam applications directly on Nemo.

Configurable options

  • -job_id: ID of the Beam job
  • -user_main: Canonical name of the Beam application
  • -user_args: Arguments that the Beam application accepts
  • -optimization_policy: Canonical name of the optimization policy to apply to a job DAG in Nemo Compiler
  • -deploy_mode: Deployment mode. yarn is supported (default value is local)

Examples

## MapReduce example
./bin/run_beam.sh \
  -job_id mr_default \
  -executor_json `pwd`/examples/resources/executors/beam_test_executor_resources.json \
  -optimization_policy org.apache.nemo.compiler.optimizer.policy.DefaultPolicy \
  -user_main org.apache.nemo.examples.beam.WordCount \
  -user_args "`pwd`/examples/resources/inputs/test_input_wordcount `pwd`/outputs/wordcount"

## YARN cluster example
./bin/run_beam.sh \
  -deploy_mode yarn \
  -job_id mr_transient \
  -executor_json `pwd`/examples/resources/executors/beam_test_executor_resources.json \
  -user_main org.apache.nemo.examples.beam.WordCount \
  -optimization_policy org.apache.nemo.compiler.optimizer.policy.TransientResourcePolicy \
  -user_args "hdfs://v-m:9000/test_input_wordcount hdfs://v-m:9000/test_output_wordcount"

Resource Configuration

The -executor_json command line option provides the path to a JSON file that describes the resource configuration for executors. Its default value is config/default.json, which initializes one executor of each type (Transient, Reserved, and Compute), each with one core and 1024 MB of memory.

Configurable options

  • num (optional): Number of containers. Default value is 1
  • type: Three container types are supported:
    • Transient : Containers that store eviction-prone resources. Batch jobs that use idle resources in Transient containers can be evicted at any time when latency-critical jobs attempt to use those resources.
    • Reserved : Containers that store eviction-free resources. Reserved containers are used to reliably store intermediate data that has a high eviction cost.
    • Compute : Containers that are mainly used for computation.
  • memory_mb: Memory size in MB
  • capacity: Number of Tasks that can be run in an executor. Set this value to be the same as the number of CPU cores of the container.

Examples

[
  {
    "num": 12,
    "type": "Transient",
    "memory_mb": 1024,
    "capacity": 4
  },
  {
    "type": "Reserved",
    "memory_mb": 1024,
    "capacity": 2
  }
]

This example configuration specifies

  • 12 transient containers with 4 cores and 1024MB memory each
  • 1 reserved container with 2 cores and 1024MB memory
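A configuration like this can also be generated from a script, which helps when container counts vary between runs. The sketch below is illustrative only: `write_executor_json` is a hypothetical helper and not part of Nemo, and it uses only the field names described above (`num`, `type`, `memory_mb`, `capacity`). The Reserved entry omits `num`, so it falls back to the documented default of 1.

```shell
#!/bin/sh
# Hypothetical helper: writes a two-entry executor resource file.
# Usage: write_executor_json <path> <num_transient> <memory_mb> <capacity>
write_executor_json() {
  cat > "$1" <<EOF
[
  {
    "num": $2,
    "type": "Transient",
    "memory_mb": $3,
    "capacity": $4
  },
  {
    "type": "Reserved",
    "memory_mb": $3,
    "capacity": $4
  }
]
EOF
}
```

For example, `write_executor_json /tmp/executors.json 12 1024 4` writes a file that can then be passed to `./bin/run_beam.sh` through `-executor_json /tmp/executors.json`.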

Monitoring your job using web UI

Nemo Compiler and Engine can store JSON representations of intermediate DAGs.

  • The -dag_dir command line option specifies the directory where the JSON files are stored. The default directory is ./dag. To visualize a DAG, drop its JSON file into our online visualizer.

Examples

./bin/run_beam.sh \
  -job_id als \
  -executor_json `pwd`/examples/resources/executors/beam_test_executor_resources.json \
  -user_main org.apache.nemo.examples.beam.AlternatingLeastSquare \
  -optimization_policy org.apache.nemo.compiler.optimizer.policy.TransientResourcePolicy \
  -dag_dir "./dag/als" \
  -user_args "`pwd`/examples/resources/inputs/test_input_als 10 3"
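After a run, it can be handy to list the stored DAG files before picking one to visualize. The following is a rough sketch: `list_dag_files` is a hypothetical helper and not part of Nemo, and it assumes the stored files carry a `.json` extension, which the text above does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: lists DAG JSON files under a directory (default ./dag).
# Assumes the files end in .json; adjust the pattern if they do not.
list_dag_files() {
  dag_dir=${1:-./dag}
  if [ ! -d "$dag_dir" ]; then
    echo "No DAG directory at $dag_dir" >&2
    return 1
  fi
  find "$dag_dir" -type f -name '*.json' | sort
}
```

With the example above, `list_dag_files ./dag/als` would print the paths of any JSON files written for the `als` job.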

Speeding up builds

  • To exclude Spark related packages: mvn clean install -T 2C -DskipTests -pl \!compiler/frontend/spark,\!examples/spark
  • To exclude Beam related packages: mvn clean install -T 2C -DskipTests -pl \!compiler/frontend/beam,\!examples/beam