Vinyl: Relational Streams for Java

Vinyl is now in beta. Feedback is welcome!

Inspired by SQL, built for Java. Vinyl extends Java Streams with relational operations, based around its central Record type. It aims to integrate smoothly and minimally on top of existing Streams, while staying efficient, safe, and easy to use.

Vinyl requires Java 9 or later.

Adding Vinyl to your build

Vinyl's Maven group ID is io.avery, and its artifact ID is vinyl.

To add a dependency on Vinyl using Maven, use the following:

<dependency>
  <groupId>io.avery</groupId>
  <artifactId>vinyl</artifactId>
  <version>0.1</version>
</dependency>

Quick Overview

The package documentation gives a couple of examples and introduces core concepts.

A Guided Example

To make use of Vinyl, we first need to declare some "fields" that we will use later in relational operations.

Field<Integer> number = new Field<>("number");
Field<Integer> times2 = new Field<>("times2");
Field<Integer> square = new Field<>("square");

Notice that we give each field a name and a type argument. The name is the field's toString() representation. The type argument says what type of values a "record" can associate with this field.
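To make the role of the type argument concrete, here is a minimal plain-Java sketch of a typed key and a row that stores values against such keys. It does not use Vinyl and is not Vinyl's implementation: `Key`, `Row`, `put`, and `get` are hypothetical names invented for this illustration. It only shows how a type parameter on the key lets the compiler check which values can be stored and read.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedKeySketch {
    /** Hypothetical stand-in for a typed field: a name plus a compile-time value type. */
    static final class Key<T> {
        private final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return name; // the name doubles as the string representation
        }
    }

    /** Hypothetical row that associates each key with a value of that key's type. */
    static final class Row {
        private final Map<Key<?>, Object> values = new HashMap<>();

        <T> Row put(Key<T> key, T value) {
            values.put(key, value);
            return this;
        }

        @SuppressWarnings("unchecked")
        <T> T get(Key<T> key) {
            // Safe because put() only accepts a value matching the key's type.
            return (T) values.get(key);
        }
    }

    public static void main(String[] args) {
        Key<Integer> number = new Key<>("number");
        Key<String> label = new Key<>("label");

        Row row = new Row().put(number, 7).put(label, "seven");

        int n = row.get(number);   // no cast needed: the key carries the type
        String s = row.get(label);
        System.out.println(number + "=" + n + ", " + label + "=" + s);
        // row.put(number, "seven") would not compile: String is not an Integer.
    }
}
```

The point of the design is that a mismatched value is rejected at compile time rather than discovered when a record is read.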

Now we can start writing streams. For a simple example, we'll enrich a sequence of numbers:

RecordStream numbers = RecordStream.aux(IntStream.range(0, 10).boxed())
    .mapToRecord(into -> into
        .field(number, i -> i)
        .field(times2, i -> i + i)
        .field(square, i -> i * i)
    );

We first wrap a normal stream in a RecordStream.Aux, an "auxiliary" stream that extends Stream with the mapToRecord() method. With mapToRecord(), we describe how to create each field on an outgoing record, from an incoming element. The API ensures that all outgoing records share the same set of fields, or "header". This is a key aspect of a RecordStream.

At this point, the data conceptually looks like:

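| number | times2 | square |
|--------|--------|--------|
| 0      | 0      | 0      |
| 1      | 2      | 1      |
| 2      | 4      | 4      |
| 3      | 6      | 9      |
| 4      | 8      | 16     |
| 5      | 10     | 25     |
| 6      | 12     | 36     |
| 7      | 14     | 49     |
| 8      | 16     | 64     |
| 9      | 18     | 81     |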

Let's try this one more time. This time, instead of generating numbers, we'll convert some data we already have as a list of Child objects into a record stream.

Field<String> firstName = new Field<>("firstName");
Field<String> lastName = new Field<>("lastName");
Field<Integer> favoriteNumber = new Field<>("favoriteNumber");

RecordStream favoriteNumbers = RecordStream.aux(children.stream())
    .mapToRecord(into -> into
        .field(firstName, Child::getFirstName)
        .field(lastName, Child::getLastName)
        .field(favoriteNumber, Child::getFavoriteNumber)
    );

This data conceptually looks like:

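| firstName | lastName | favoriteNumber |
|-----------|----------|----------------|
| Amelia    | Rose     | 7              |
| James     | Johnson  | 7              |
| Maria     | Cabrero  | 4              |
| Lisa      | Woods    | 100            |
| Marc      | Vincent  | 3              |
| Tyler     | Laine    | 9              |
| Olivia    | Pineau   | 2              |
| Sunder    | Suresh   | 22             |
| Megan     | Alis     | 4              |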

If we wanted, we could relate children's favorite numbers with more info about those numbers, by joining our two data sets together:

RecordStream joinedNumbers = favoriteNumbers
    .leftJoin(numbers,
              on -> on.match((left, right) -> Objects.equals(left.get(favoriteNumber), right.get(number))),
              select -> select
                  .leftAllFields()
                  .rightAllFieldsExcept(number)
    );

We left-join the favoriteNumbers to the numbers, providing a join condition that matches when the left-side record's favoriteNumber is equal to the right-side record's number. For our outgoing records, we select all fields from the left-side record and all fields from the right-side record (excluding number, since it will be redundant with favoriteNumber). The resulting data conceptually looks like:

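| firstName | lastName | favoriteNumber | times2 | square |
|-----------|----------|----------------|--------|--------|
| Amelia    | Rose     | 7              | 14     | 49     |
| James     | Johnson  | 7              | 14     | 49     |
| Maria     | Cabrero  | 4              | 8      | 16     |
| Lisa      | Woods    | 100            | null   | null   |
| Marc      | Vincent  | 3              | 6      | 9      |
| Tyler     | Laine    | 9              | 18     | 81     |
| Olivia    | Pineau   | 2              | 4      | 4      |
| Sunder    | Suresh   | 22             | null   | null   |
| Megan     | Alis     | 4              | 8      | 16     |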

While this join yields the results we expect, it does not scale to larger inputs. The problem is that the match() lambda is opaque to Vinyl, so there is not enough information for Vinyl to optimize the join. When the join is evaluated, for each left-side record, we will loop over the whole right side searching for records that match. This is a nested loop, so the number of comparisons grows with the product of the two input sizes. We could provide more information by writing the join condition differently:

RecordStream joinedNumbers = favoriteNumbers
    .leftJoin(numbers,
              on -> on.eq(on.left(favoriteNumber), on.right(number)),
              select -> select
                  .leftAllFields()
                  .rightAllFieldsExcept(number)
    );

Now, Vinyl knows we are doing an equality test between the left and right sides. When the join is evaluated, we will first index the right side, grouping records by their number value. Then, for each left-side record, we will look up its favoriteNumber value in the index, quickly finding all right-side records that match.
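The difference between the two evaluation strategies can be sketched in plain Java, without Vinyl. This is an illustration of the general nested-loop and hash-index techniques, not Vinyl's actual code. The class and method names are hypothetical, and rows are simplified to arrays of `{number, times2, square}`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinStrategySketch {
    /** Nested loop: every left value is compared against every right row. */
    static List<Integer[]> nestedLoopLeftJoin(List<Integer> favorites, List<int[]> numbers) {
        List<Integer[]> out = new ArrayList<>();
        for (Integer fav : favorites) {
            boolean matched = false;
            for (int[] row : numbers) { // row = {number, times2, square}
                if (fav.intValue() == row[0]) {
                    out.add(new Integer[] {fav, row[1], row[2]});
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Integer[] {fav, null, null}); // left join keeps unmatched rows
            }
        }
        return out;
    }

    /** Indexed: group the right side by key once, then look up each left value. */
    static List<Integer[]> indexedLeftJoin(List<Integer> favorites, List<int[]> numbers) {
        Map<Integer, List<int[]>> index = new HashMap<>();
        for (int[] row : numbers) {
            index.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        List<Integer[]> out = new ArrayList<>();
        for (Integer fav : favorites) {
            List<int[]> matches = index.get(fav);
            if (matches == null) {
                out.add(new Integer[] {fav, null, null});
                continue;
            }
            for (int[] row : matches) {
                out.add(new Integer[] {fav, row[1], row[2]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> numbers = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            numbers.add(new int[] {i, i + i, i * i});
        }
        List<Integer> favorites = Arrays.asList(7, 7, 4, 100, 3, 9, 2, 22, 4);

        String nested = Arrays.deepToString(nestedLoopLeftJoin(favorites, numbers).toArray());
        String indexed = Arrays.deepToString(indexedLeftJoin(favorites, numbers).toArray());
        System.out.println(indexed);
        System.out.println("same result: " + nested.equals(indexed));
    }
}
```

Both methods return the same rows. The nested loop performs one comparison per pair of left and right rows, while the indexed version makes one pass over the right side to build the map and then one lookup per left row.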

Since a RecordStream is a Stream, we can still use any of the usual stream operations:

int sumOfMSquares = joinedNumbers
    .filter(record -> record.get(firstName).startsWith("M"))
    .mapToInt(record -> record.get(square))
    .sum();
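One thing to watch with left-joined data: in the joined table above, unmatched rows show null for square. The "M" filter happens to keep only matched rows, but a filter that let an unmatched row through would fail when mapToInt() unboxes the null, assuming a missing value really is returned as null as the table suggests. The plain-Java sketch below shows one way to guard against that. `Row` and `sumOfSquaresFor` are hypothetical names for this illustration and are not part of Vinyl.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NullSafeSumSketch {
    /** Hypothetical simplified row: only the two columns the sum needs. */
    static final class Row {
        final String firstName;
        final Integer square; // null when the left join found no match

        Row(String firstName, Integer square) {
            this.firstName = firstName;
            this.square = square;
        }
    }

    /** Sums squares for names with the given prefix, skipping unmatched (null) rows. */
    static int sumOfSquaresFor(List<Row> rows, String prefix) {
        return rows.stream()
            .filter(r -> r.firstName.startsWith(prefix))
            .map(r -> r.square)
            .filter(Objects::nonNull) // drop nulls before unboxing
            .mapToInt(Integer::intValue)
            .sum();
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("Amelia", 49), new Row("James", 49), new Row("Maria", 16),
            new Row("Lisa", null), new Row("Marc", 9), new Row("Tyler", 81),
            new Row("Olivia", 4), new Row("Sunder", null), new Row("Megan", 16));

        System.out.println(sumOfSquaresFor(rows, "M")); // prints 41 (16 + 9 + 16)
        System.out.println(sumOfSquaresFor(rows, "L")); // prints 0 (Lisa has no match)
    }
}
```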

Or, if we need to use the same records again, we may store them in a RecordSet:

RecordSet data = joinedNumbers.toRecordSet();

Like a RecordStream, a RecordSet has a single header shared by all its records. This means we can easily get back to a RecordStream from a RecordSet:

RecordStream reStream = data.stream();
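The reason to materialize records before reusing them is a general property of Java streams: a stream can be consumed only once. The sketch below demonstrates this with the standard library alone, using a List as a stand-in for a RecordSet. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamReuseSketch {
    /** Returns true if a second terminal operation on the same stream is rejected. */
    static boolean secondUseFails() {
        Stream<Integer> stream = Stream.of(1, 2, 3);
        stream.collect(Collectors.toList()); // first terminal operation consumes the stream
        try {
            stream.collect(Collectors.toList());
            return false;
        } catch (IllegalStateException e) {
            return true; // "stream has already been operated upon or closed"
        }
    }

    /** Materializes once, then derives two independent results from fresh streams. */
    static int[] sumAndMax(Stream<Integer> source) {
        List<Integer> stored = source.collect(Collectors.toList());
        int sum = stored.stream().mapToInt(Integer::intValue).sum();
        int max = stored.stream().mapToInt(Integer::intValue).max().orElse(0);
        return new int[] {sum, max};
    }

    public static void main(String[] args) {
        System.out.println("second use fails: " + secondUseFails());
        int[] result = sumAndMax(Stream.of(1, 2, 3));
        System.out.println("sum=" + result[0] + ", max=" + result[1]);
    }
}
```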

Here is our full example again, with streams inlined:

Field<Integer> number = new Field<>("number");
Field<Integer> times2 = new Field<>("times2");
Field<Integer> square = new Field<>("square");
Field<String> firstName = new Field<>("firstName");
Field<String> lastName = new Field<>("lastName");
Field<Integer> favoriteNumber = new Field<>("favoriteNumber");

RecordSet data = RecordStream.aux(children.stream())
    .mapToRecord(into -> into
        .field(firstName, Child::getFirstName)
        .field(lastName, Child::getLastName)
        .field(favoriteNumber, Child::getFavoriteNumber)
    )
    .leftJoin(RecordStream.aux(IntStream.range(0, 10).boxed())
                  .mapToRecord(into -> into
                      .field(number, i -> i)
                      .field(times2, i -> i + i)
                      .field(square, i -> i * i)
                  ),
              on -> on.eq(on.left(favoriteNumber), on.right(number)),
              select -> select
                  .leftAllFields()
                  .rightAllFieldsExcept(number)
    )
    .toRecordSet();