Jolan Rensen edited this page May 9, 2022 · 1 revision

Getting started

Inspired by ScalaTuplesInKotlin, the API introduces many helper extension functions that make working with Scala Tuples a breeze in your Kotlin Spark projects. While working with data classes is encouraged, Scala Tuples are recommended for pair-like Datasets / RDDs / DStreams, both for the useful helper functions and for Spark performance. To enable these features, simply add

import org.jetbrains.kotlinx.spark.api.tuples.*

to the start of your file.

Creation

Tuples can be created in the following ways:

val a: Tuple2<Int, Long> = tupleOf(1, 2L)
val b: Tuple3<String, Double, Int> = t("test", 1.0, 2)
val c: Tuple3<Float, String, Int> = 5f X "aaa" X 1

NOTE: While the X method is the quickest way to create a tuple, some caution is necessary, as

tupleOf(1) X 2 != tupleOf(tupleOf(1), 2)

but due to the way the infix method works:

tupleOf(1) X 2 == tupleOf(1, 2)
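If a nested tuple is what you actually want, one option is to build it explicitly with tupleOf. The following is a minimal sketch, assuming the tuples import from the introduction is available; the function names flat and nested are made up for illustration.

```kotlin
import org.jetbrains.kotlinx.spark.api.tuples.*
import scala.Tuple1
import scala.Tuple2

// X applied to an existing tuple extends that tuple, giving a flat Tuple2.
fun flat(): Tuple2<Int, Int> = tupleOf(1) X 2

// Nesting explicitly with tupleOf keeps the inner Tuple1 as a single element.
fun nested(): Tuple2<Tuple1<Int>, Int> = tupleOf(tupleOf(1), 2)

fun main() {
    // _1() and _2() are the plain Scala accessors, used here to inspect the shapes.
    println(flat() == tupleOf(1, 2))
    println(nested()._1() == tupleOf(1))
    println(nested()._2() == 2)
}
```

Spelling out the return types makes the compiler confirm which of the two shapes was built.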

Expanding / merging Tuples

Tuples can be expanded and merged like this:

// expand
tupleOf(1, 2).appendedBy(3) == tupleOf(1, 2, 3)
tupleOf(1, 2) + 3 == tupleOf(1, 2, 3)
tupleOf(2, 3).prependedBy(1) == tupleOf(1, 2, 3)
1 + tupleOf(2, 3) == tupleOf(1, 2, 3)

// merge
tupleOf(1, 2) concat tupleOf(3, 4) == tupleOf(1, 2, 3, 4)
tupleOf(1, 2) + tupleOf(3, 4) == tupleOf(1, 2, 3, 4)

// extend tuple instead of merging with it
tupleOf(1, 2).appendedBy(tupleOf(3, 4)) == tupleOf(1, 2, tupleOf(3, 4))
tupleOf(1, 2) + tupleOf(tupleOf(3, 4)) == tupleOf(1, 2, tupleOf(3, 4))

NOTE: Prepending a String to a tuple with + might give an unexpected result like this, since String already has the operator fun plus(other: Any?):

"some string" + tupleOf(1, 2) == "some string(1,2)"

In these cases you can turn to

tupleOf(1, 2).prependedBy("some string") == tupleOf("some string", 1, 2)

Empty Tuple

The concept of EmptyTuple from Scala 3 is also present:

tupleOf(1).dropLast() == tupleOf() == emptyTuple() == EmptyTuple
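The line above is shorthand: in Kotlin a chained a == b == c compares the Boolean result of a == b with c, not all three values. A compilable version might look like the sketch below, which assumes only the functions named above; emptyTupleChecks is an illustrative name, not part of the API.

```kotlin
import org.jetbrains.kotlinx.spark.api.tuples.*

// Each equality is checked on its own instead of being chained.
fun emptyTupleChecks(): List<Boolean> = listOf(
    tupleOf(1).dropLast() == tupleOf(),   // dropping the only element
    tupleOf() == emptyTuple(),            // creating with no arguments
    emptyTuple() == EmptyTuple,           // the EmptyTuple value itself
)

fun main() {
    println(emptyTupleChecks())
}
```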

Helper functions

Finally, all these tuple helper functions are also baked in:

  • componentX()
    • for destructuring: val (a, b) = tuple
  • contains(x)
    • for if (x in tuple) { ... }
  • iterator()
    • for for (x in tuple) { ... }
    • generalizes types to smallest common ancestor
  • asIterable()
    • generalizes types to smallest common ancestor
  • size
  • get(n) / get(i..j)
    • for tuple[1] / tuple[i..j]
    • returns single item or list of items
    • generalizes types to smallest common ancestor
    • can throw IndexOutOfBoundsException
  • getOrNull(n) / getOrNull(i..j)
    • same as get(n), but returns null instead of throwing an exception
  • getAs<T>(n) / getAs<T>(i..j)
    • returns a single item or list of items cast to T
    • can throw ClassCastException and IndexOutOfBoundsException
  • getAsOrNull<T>(n) / getAsOrNull<T>(i..j)
    • same as getAs<T>(n) but returns null instead of throwing an exception
  • copy(_1 = ..., _5 = ...)
    • similar to data classes, this returns a copy of the Tuple with only the provided arguments replaced
  • first() / last()
  • _1, _6 etc. (instead of _1(), _6())
  • zip
    • zips two tuples as one large Tuple of Tuple2s
    • is infix
    • on different sizes, the smallest size is kept
  • dropLast() / dropFirst()
    • returns a new tuple without the last or first element
    • same as dropLast1() / dropFirst1()
  • dropN() / dropLastN()
    • returns a new tuple with the first or last N elements dropped
    • used like drop11()
    • drop0() simply copies the tuple
    • returns EmptyTuple if all elements are dropped
  • takeN() / takeLastN()
    • returns a new tuple containing only the first or last N elements
    • used like take11()
    • take0() simply returns EmptyTuple
  • splitAtN()
    • returns a Tuple2 with the original split at position N
    • for example:
      val a: Tuple3<Int, Double, String> = tupleOf(1, 2.0, "3.0")
      val (c: Tuple2<Int, Double>, d: Tuple1<String>) = a.splitAt2()
    • can also return EmptyTuple as one of the parts, as with splitAt0()
  • map
    • generalizes types to smallest common ancestor
    • can be used to convert all values in a tuple at once
  • cast
    • used to cast contents of a tuple
    • used like tuple.cast<Int, String, Int>()
    • can throw ClassCastException
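
Several of the helpers listed above can be combined as in the following sketch. It assumes the functions behave as described in the list and uses only names that appear there; sample and helperChecks are hypothetical names introduced for this example.

```kotlin
import org.jetbrains.kotlinx.spark.api.tuples.*
import scala.Tuple3

// Sample value, the same one used in the splitAtN() bullet.
fun sample(): Tuple3<Int, Double, String> = tupleOf(1, 2.0, "3.0")

// Collects the result of each check so they can be inspected together.
fun helperChecks(): List<Boolean> {
    val tuple = sample()

    // componentX() enables destructuring.
    val (a, b, c) = tuple

    return listOf(
        a == 1 && b == 2.0 && c == "3.0",
        // contains() backs the `in` operator; size is the number of elements.
        "3.0" in tuple,
        tuple.size == 3,
        // first() / last() read the outermost elements.
        tuple.first() == 1,
        tuple.last() == "3.0",
        // take / drop / splitAt build new tuples from the original.
        tuple.take2() == tupleOf(1, 2.0),
        tuple.takeLast2() == tupleOf(2.0, "3.0"),
        tuple.dropFirst() == tupleOf(2.0, "3.0"),
        tuple.dropLast() == tupleOf(1, 2.0),
        tuple.splitAt2() == tupleOf(tupleOf(1, 2.0), tupleOf("3.0")),
        // zip pairs elements position by position.
        (tupleOf(1, 2) zip tupleOf("a", "b")) == tupleOf(tupleOf(1, "a"), tupleOf(2, "b")),
    )
}

fun main() {
    println(helperChecks())
}
```

Comparing against tuples built with tupleOf keeps each expectation readable and relies only on the structural equality of Scala tuples.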