Releases: HyukjinKwon/spark-connect-ruby
v0.2.0
Adds Structured Streaming and Declarative Pipelines (Spark 4.1+), plus
temporary views, the catalog create_table family, new_session / interrupts /
operation tags, and assorted DataFrame additions (with_watermark,
repartition_by_range, checkpoint, col_regex, to_json, ...). Regenerated
against the Spark Connect 4.1.0 protocol.
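The new `with_watermark` follows Spark's event-time watermark semantics. After each micro-batch, the watermark becomes the maximum event time seen so far minus the delay threshold, and it never moves backwards. In the following batches, rows older than the watermark count as late. The snippet below is a simplified pure-Ruby model of that rule, not the gem's API. `WatermarkModel` and `process_batch` are hypothetical names, and times are plain integer seconds. In this model late rows are always dropped. Real Spark only guarantees that it keeps data within the threshold: older rows may or may not be dropped.

```ruby
# Hypothetical, simplified model of event-time watermarking (not the gem's API).
# Event times are integers (seconds) so the example stays self-contained.
class WatermarkModel
  attr_reader :watermark

  def initialize(delay_seconds)
    @delay = delay_seconds
    @max_event_time = nil
    @watermark = nil # no watermark until a first batch has been seen
  end

  # Keeps events that are not older than the current watermark, then advances
  # the watermark for the next micro-batch. Because @max_event_time only grows,
  # the watermark never moves backwards.
  def process_batch(event_times)
    kept = @watermark.nil? ? event_times : event_times.select { |t| t >= @watermark }
    unless event_times.empty?
      @max_event_time = [@max_event_time, event_times.max].compact.max
      @watermark = @max_event_time - @delay
    end
    kept
  end
end

model = WatermarkModel.new(10)
p model.process_batch([100, 105, 112]) # => [100, 105, 112]
p model.watermark                      # => 102
p model.process_batch([101, 103, 120]) # => [103, 120]  (101 is late)
p model.watermark                      # => 110
```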
Published to RubyGems: https://rubygems.org/gems/spark-connect/versions/0.2.0
See CHANGELOG.md for the full list. Not yet supported: UDFs, foreach/foreachBatch,
and MLlib-over-Connect.
Full Changelog: v0.1.0...v0.2.0
v0.1.0
First release of spark-connect, a pure-Ruby client for Apache Spark Connect.
Highlights:
- PySpark-style DataFrame API (select/filter/join/group_by/agg/window/SQL/...)
- Column expressions and a broad function library (SparkConnect::F)
- DataFrameReader/Writer (CSV, JSON, Parquet, ORC, JDBC, tables) + v2 writer
- Catalog, runtime config, observations, full Spark SQL type system
- Apache Arrow result decoding over a resilient gRPC client
- Targets the Spark Connect 4.0 protocol (works with 3.5+ servers)
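These notes do not say what makes the gRPC client "resilient". A common technique is to retry transient failures with capped exponential backoff, and the snippet below sketches it under that assumption. It is not the gem's actual retry policy. `with_retries` and `TransientError` are hypothetical stand-ins. A real client would rescue transient gRPC statuses instead, for example `GRPC::Unavailable` from the grpc gem. The `sleeper` parameter lets the backoff schedule run without real sleeping.

```ruby
# Hypothetical stand-in for a transient transport error (e.g. gRPC UNAVAILABLE).
class TransientError < StandardError; end

# Calls the block, retrying on TransientError up to max_attempts times in total.
# The delay doubles after each failure (base, 2*base, 4*base, ...) and is capped
# at max_delay. The last failure is re-raised once attempts are exhausted.
def with_retries(max_attempts: 4, base_delay: 0.1, max_delay: 2.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientError
    raise if attempt >= max_attempts
    sleeper.call([base_delay * (2**(attempt - 1)), max_delay].min)
    retry # re-runs the begin block; attempt keeps its value
  end
end

delays = []
result = with_retries(sleeper: ->(s) { delays << s }) do |attempt|
  raise TransientError, "UNAVAILABLE" if attempt < 3
  "ok on attempt #{attempt}"
end
p result # => "ok on attempt 3"
p delays # => [0.1, 0.2]
```

Injecting the sleeper keeps the retry logic deterministic to test. Capping the delay stops long outages from producing unbounded waits.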
Documentation: https://hyukjinkwon.github.io/spark-connect-ruby/
See CHANGELOG.md for the full list.
Full Changelog: https://github.com/HyukjinKwon/spark-connect-ruby/commits/v0.1.0