Euphoria

Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine independent programming model that can express both batch and stream transformations.

The main goal of the API is to ease the creation of programs with business logic independent of a specific runtime framework/engine and independent of the source or destination of the processed data. Such programs are then transferable with little effort to new environments and new data sources or destinations - idealy just by configuration.

Key features

Unified API that supports both batch and stream processing using the same code
Avoids vendor lock-in - migrating between different engines is matter of configuration
Declarative Java API using Java 8 Lambda expressions
Support for different notions of time (event time, ingestion time)
Flexible windowing (Time, TimeSliding, Session, Count)

Download

The best way to use Euphoria is by adding the following Maven dependency to your pom.xml:

<dependency>
  <groupId>cz.seznam.euphoria</groupId>
  <artifactId>euphoria-core</artifactId>
  <version>0.7.0</version>
</dependency>

You may want to add additional modules, such as support of various engines or I/O data sources/sinks. For more details read the Maven Dependencies wiki page.

WordCount example

// Define data source and data sinks
DataSource<String> dataSource = new SimpleHadoopTextFileSource(inputPath);
DataSink<String> dataSink = new SimpleHadoopTextFileSink<>(outputPath);

// Define a flow, i.e. a chain of transformations
Flow flow = Flow.create("WordCount");

Dataset<String> lines = flow.createInput(dataSource);

Dataset<String> words = FlatMap.named("TOKENIZER")
    .of(lines)
    .using((String line, Collector<String> context) -> {
      for (String word : line.split("\\s+")) {
        context.collect(word);
      }
    })
    .output();

Dataset<Pair<String, Long>> counted = ReduceByKey.named("COUNT")
    .of(words)
    .keyBy(w -> w)
    .valueBy(w -> 1L)
    .combineBy(Sums.ofLongs())
    .output();

MapElements.named("FORMAT")
    .of(counted)
    .using(p -> p.getFirst() + "\n" + p.getSecond())
    .output()
    .persist(dataSink);

// Initialize an executor and run the flow (using Apache Flink)
try {
  Executor executor = new FlinkExecutor();
  executor.submit(flow).get();
} catch (InterruptedException ex) {
  LOG.warn("Interrupted while waiting for the flow to finish.", ex);
} catch (IOException | ExecutionException ex) {
  throw new RuntimeException(ex);
}

Supported Engines

Euphoria translates flows, also known as data transformation pipelines, into the specific API of a chosen, supported big-data processing engine. Currently, the following are supported:

Apache Flink
Apache Spark
An independent, standalone, in-memory engine which is part of the Euphoria project suitable for running flows in unit tests.

In the WordCount example from above, to switch the execution engine from Apache Flink to Apache Spark, we'd merely need to replace FlinkExecutor with SparkExecutor.

Bugs / Features / Contributing

There's still a lot of room for improvements and extensions. Have a look into the issue tracker and feel free to contribute by reporting new problems, contributing to existing ones, or even open issues in case of questions. Any constructive feedback is warmly welcome!

As usually with open source, don't hesitate to fork the repo and submit a pull requests if you see something to be changed. We'll be happy see euphoria improving over time.

Building

To build the Euphoria artifacts, the following is required:

Git
Java 8

Building the project itself is a matter of:

git clone https://github.com/seznam/euphoria
cd euphoria
./gradlew publishToMavenLocal -xtest

Documentation

An incipient documentation is currently maintained in the form of a Wiki on Github, including a brief FAQ page.
Another source of documentation are deliberately simple examples maintained in the euphoria-examples module.

Contact us

Feel free to open an issue in the issue tracker

License

Euphoria is licensed under the terms of the Apache License 2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 969 Commits
benchmarks		benchmarks
cd		cd
euphoria-avro		euphoria-avro
euphoria-beam-io		euphoria-beam-io
euphoria-core		euphoria-core
euphoria-examples		euphoria-examples
euphoria-flink		euphoria-flink
euphoria-fluent		euphoria-fluent
euphoria-hadoop		euphoria-hadoop
euphoria-hbase		euphoria-hbase
euphoria-kafka		euphoria-kafka
euphoria-local		euphoria-local
euphoria-operator-testkit		euphoria-operator-testkit
euphoria-spark		euphoria-spark
euphoria-testing		euphoria-testing
gradle		gradle
thirdparty-guava		thirdparty-guava
.gitignore		.gitignore
.travis.yml		.travis.yml
HEADER		HEADER
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
build.gradle		build.gradle
gradlew		gradlew
gradlew.bat		gradlew.bat
license-header.txt		license-header.txt
pom.xml		pom.xml
settings.gradle		settings.gradle

License

Licenses found

seznam/euphoria

Folders and files

Latest commit

History

Repository files navigation

Euphoria

Key features

Download

WordCount example

Supported Engines

Bugs / Features / Contributing

Building

Documentation

Contact us

License

About

Topics

Resources

License

Licenses found

Stars

Watchers

Forks

Languages