Releases: HyukjinKwon/spark-connect-ruby

v0.2.0

10 Jun 10:38

Adds Structured Streaming and Declarative Pipelines (Spark 4.1+), plus
temporary views, the catalog create_table family, session-level APIs
(new_session, interrupts, operation tags), and assorted DataFrame additions
(with_watermark, repartition_by_range, checkpoint, col_regex, to_json, ...).
Regenerated against the Spark Connect 4.1.0 protocol (v0.1.0 targeted 4.0).

Published to RubyGems: https://rubygems.org/gems/spark-connect/versions/0.2.0
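To pin this release in a Bundler-managed project, a Gemfile entry along these lines should work. This is a minimal sketch: the gem name and version come from the RubyGems link above, and the pessimistic `~> 0.2.0` constraint is just one reasonable choice.

```ruby
# Gemfile
source "https://rubygems.org"

# Accepts 0.2.x patch releases but not 0.3.0 or later
gem "spark-connect", "~> 0.2.0"
```

Outside Bundler, `gem install spark-connect -v 0.2.0` installs the same version.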

See CHANGELOG.md for the full list. Not yet supported: UDFs, foreach/foreachBatch,
and MLlib-over-Connect.

Full Changelog: v0.1.0...v0.2.0

v0.1.0

10 Jun 09:35

First release of spark-connect, a pure-Ruby client for Apache Spark Connect.

Highlights:

  • PySpark-style DataFrame API (select/filter/join/group_by/agg/window/SQL/...)
  • Column expressions and a broad function library (SparkConnect::F)
  • DataFrameReader/Writer (CSV, JSON, Parquet, ORC, JDBC, tables) + v2 writer
  • Catalog, runtime config, observations, full Spark SQL type system
  • Apache Arrow result decoding over a resilient gRPC client
  • Targets the Spark Connect 4.0 protocol (works with 3.5+ servers)

Documentation: https://hyukjinkwon.github.io/spark-connect-ruby/

See CHANGELOG.md for the full list.

Full Changelog: https://github.com/HyukjinKwon/spark-connect-ruby/commits/v0.1.0