Commit

Merge pull request #3 from audienceproject/scala3
Upgrade to Scala 3
jacobfi committed Mar 22, 2024
2 parents df0fe83 + ccaa740 commit 111841d
Showing 41 changed files with 751 additions and 1,223 deletions.
56 changes: 36 additions & 20 deletions README.md
@@ -7,77 +7,93 @@ Single node, in-memory DataFrame analytics library.
* Fluent expression DSL
* Immutable public API; `Array` under the hood

## News

* 2024-03-22: Upgraded to Scala 3 with version `0.2.0`. Some parts of the source code have been completely rewritten,
but the feature set remains the same. The intention is for Crossbow to be a Scala 3 library going forward.

## Installing

The library is available through Maven Central.

SBT style dependency: `"com.audienceproject" %% "crossbow" % "latest"`

# API
```scala
import com.audienceproject.crossbow.DataFrame
import com.audienceproject.crossbow.Implicits._

```scala 3
import com.audienceproject.crossbow.{*, given}

val data = Seq(("a", 1), ("b", 2), ("c", 3))
val df = DataFrame.fromSeq(data)

df.printSchema()

/**
* _0: String
* _0: string
* _1: int
*/

df.as[(String, Int)].foreach(println)

/**
* ("a", 1)
* ("b", 2)
* ("c", 3)
* (a, 1)
* (b, 2)
* (c, 3)
*/
```
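
The move from `import com.audienceproject.crossbow.Implicits._` to `import com.audienceproject.crossbow.{*, given}` reflects a Scala 3 import rule: a plain `*` wildcard does not bring `given` instances into scope, so they need their own `given` selector. The sketch below illustrates that rule with a hypothetical `lib` object and a hypothetical `descending` instance; it does not use Crossbow itself and makes no claim about which givens the library exports.

```scala
// Hypothetical stand-in for a library package that defines a given instance.
object lib:
  given descending: Ordering[Int] = Ordering.Int.reverse
  def label: String = "lib"

object WildcardOnly:
  import lib.* // imports `label`, but not the given `descending`
  // Falls back to the default Ordering[Int] found in the implicit scope.
  val sorted: List[Int] = List(2, 3, 1).sorted

object WildcardAndGiven:
  import lib.{*, given} // imports `label` and the given `descending`
  // The imported given is found first, so the list is sorted in reverse.
  val sorted: List[Int] = List(2, 3, 1).sorted

@main def importDemo(): Unit =
  println(WildcardOnly.sorted)     // List(1, 2, 3)
  println(WildcardAndGiven.sorted) // List(3, 2, 1)
```

Keeping the `given` selector explicit makes it visible at the import site that context instances are being pulled in, which is presumably why the README now spells it out.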

## Transforming
```scala

```scala 3
val df = Seq((1, 2), (3, 4)).toDataFrame("x", "y")
df.select($"x" + $"y" / 2d as "avg", ($"x", $"y") as "tuple")

// Lambda functions
val toUpper = lambda[String, String](_.toUpperCase)
df.select(toUpper($"a") as "upperCaseA")
val pythagoras = lambda[(Int, Int), Double]:
  (a, b) => math.sqrt(a * a + b * b)
df.select(pythagoras($"x", $"y"))
```
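
The new `pythagoras` example relies on two Scala 3 features: a trailing colon that opens an indented argument block (available as standard syntax from Scala 3.3), and parameter untupling, which lets a two-parameter lambda be passed where a function over a pair is expected. A minimal sketch of both, using a hypothetical `applyTo` helper that is not part of Crossbow:

```scala
// Hypothetical helper: applies a function to a value. Not part of Crossbow.
def applyTo[A, B](input: A)(f: A => B): B = f(input)

// The trailing colon opens an indented block that is passed as the argument.
// The lambda takes two parameters, yet `f` expects a single (Int, Int) tuple;
// the compiler adapts it (parameter untupling).
val hypotenuse: Double = applyTo((3, 4)):
  (a, b) => math.sqrt(a * a + b * b)

@main def fewerBracesDemo(): Unit =
  println(hypotenuse) // 5.0
```

The lambda body must be indented relative to the line that ends in the colon; without the indentation the block is not parsed as an argument.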

## Filtering
```scala

```scala 3
df.filter($"x" >= 2 && $"y" % 2 =!= 0)
```

## Grouping
```scala
df.groupBy($"someKey").agg(sum($"x") / count($"x") as "avg", collect($"x") as "xs")

```scala 3
val df = Seq(("foo", 1), ("foo", 2), ("bar", 3)).toDataFrame("someKey", "x")
df.groupBy($"someKey").agg(sum($"x") / count() as "avg", collect($"x") as "xs")

// Custom aggregators
val product = reducer[Int, Int](1)(_ * _)
df.groupBy($"someKey").agg(product($"x") as "product")
```

## Sorting
```scala

```scala 3
df.sortBy($"x")

// Sorting on 'x' first, then 'y'
df.sortBy(($"x", $"y"))
df.sortBy(($"someKey", $"x"))

// Sorting with explicit ordering (e.g. integer descending)
import com.audienceproject.crossbow.expr.Order
df.sortBy($"x", Order.by(Ordering.Int.reverse))
df.sortBy($"x")(using Order.by(Ordering.Int.reverse))
```
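
The changed call, from `df.sortBy($"x", Order.by(...))` to `df.sortBy($"x")(using Order.by(...))`, suggests that the ordering is now a Scala 3 context parameter that can either be resolved implicitly or passed explicitly with `using`. The sketch below shows that mechanism with a hypothetical `sortValues` function and the standard library's `Ordering`; Crossbow's actual `sortBy` signature is not shown in this diff and is not assumed here.

```scala
// Hypothetical function with a context parameter; stands in for an API
// shaped like sortBy. It is not Crossbow's implementation.
def sortValues[A](xs: Seq[A])(using ord: Ordering[A]): Seq[A] = xs.sorted

// No explicit argument: the compiler supplies the default Ordering[Int].
val ascending: Seq[Int] = sortValues(Seq(1, 3, 2))

// Explicit argument: `using` is required when passing a context parameter by hand.
val descending: Seq[Int] = sortValues(Seq(1, 3, 2))(using Ordering.Int.reverse)

@main def usingDemo(): Unit =
  println(ascending)
  println(descending)
```

Here `ascending` holds the values 1, 2, 3 and `descending` holds 3, 2, 1.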

## Joining
```scala

```scala 3
val otherDf = Seq(("foo", 1, 10d), ("foo", 2, 20d), ("bar", 3, 30d)).toDataFrame("someKey", "x", "y")

// Inner join
df.join(otherDf, $"someKey")

// Inner join on multiple columns
df.join(otherDf, ($"key1", $"key2"))
df.join(otherDf, ($"someKey", $"x"))

// Other join types
import com.audienceproject.crossbow.JoinType
df.join(otherDf, $"someKey", JoinType.LeftOuter)
```
10 changes: 4 additions & 6 deletions build.sbt
@@ -2,15 +2,13 @@ organization := "com.audienceproject"

name := "crossbow"

version := "0.1.6"
version := "0.2.0"

scalaVersion := "2.13.6"
crossScalaVersions := Seq(scalaVersion.value, "2.12.12", "2.11.12")
scalaVersion := "3.3.3"

scalacOptions ++= Seq("-deprecation", "-feature", "-language:existentials")
scalacOptions ++= Seq("-deprecation", "-feature", "-language:implicitConversions")

libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value
libraryDependencies += "org.scalatest" %% "scalatest-funsuite" % "3.2.0" % "test"
libraryDependencies += "org.scalatest" %% "scalatest-funsuite" % "3.2.17" % "test"
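
Because the added and removed lines of this hunk appear without `+`/`-` markers, it may help to see the settings as they would read after the commit. The fragment below is assembled from the lines above on the assumption that the higher version numbers and the Scala 3 settings are the additions, which matches the "4 additions & 6 deletions" count in the file header.

```scala
// build.sbt settings after this commit, assembled from the hunk above.
// Assumption: the 0.2.0 / 3.3.3 / 3.2.17 lines are the added ones.
version := "0.2.0"

scalaVersion := "3.3.3"

scalacOptions ++= Seq("-deprecation", "-feature", "-language:implicitConversions")

libraryDependencies += "org.scalatest" %% "scalatest-funsuite" % "3.2.17" % "test"
```

Under that reading, the `crossScalaVersions` setting and the `scala-reflect` dependency are dropped, so the build targets Scala 3 only.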

/**
* Maven specific settings for publishing to Maven central.
2 changes: 1 addition & 1 deletion project/build.properties
@@ -1 +1 @@
sbt.version = 1.6.2
sbt.version = 1.9.8
