Interoperability between Polars and Clickhouse

License

Apache-2.0, MIT licenses found

cpg314/polarhouse

Polarhouse connects Polars and Clickhouse.

More specifically, it allows:

  • inserting Polars DataFrames into Clickhouse tables (creating the tables if necessary);
  • conversely, retrieving Clickhouse query results as Polars DataFrames.

Polarhouse uses the native Clickhouse TCP protocol via the klickhouse crate. It maps between Polars and Clickhouse types and builds Polars Series (or, in the other direction, Clickhouse columns), transforming the data where necessary.
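
To make the idea of a type mapping concrete, here is a minimal sketch of how a handful of Polars data types could be translated into Clickhouse type names, with nullability expressed by wrapping the type in Nullable(...). The `PolarsType` enum and the `clickhouse_type` function are hypothetical names introduced for illustration only; they are not part of the polarhouse API, and the actual mapping used by the crate may differ.

```rust
/// Illustrative subset of Polars data types (hypothetical; not polarhouse's own type model).
#[derive(Debug, Clone, PartialEq)]
enum PolarsType {
    Int32,
    UInt8,
    String,
    Boolean,
}

/// Returns the Clickhouse type name assumed for a Polars type in this sketch.
/// Nullable columns are wrapped in Clickhouse's `Nullable(...)`.
fn clickhouse_type(dtype: &PolarsType, nullable: bool) -> String {
    let base = match dtype {
        PolarsType::Int32 => "Int32",
        PolarsType::UInt8 => "UInt8",
        PolarsType::String => "String",
        PolarsType::Boolean => "Bool",
    };
    if nullable {
        format!("Nullable({base})")
    } else {
        base.to_string()
    }
}
```

Under these assumptions, `clickhouse_type(&PolarsType::Int32, true)` yields the type name of a nullable 32-bit integer column such as `age` in the example below.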

Polars
┌──────────┬─────────┬──────┬───────────────────────────┐
│ name     ┆ is_rich ┆ age  ┆ address                   │
│ ---      ┆ ---     ┆ ---  ┆ ---                       │
│ str      ┆ u8      ┆ i32  ┆ struct[2]                 │
╞══════════╪═════════╪══════╪═══════════════════════════╡
│ Batman   ┆ 1       ┆ 30   ┆ {{"Chicago","IL"},"USA"}  │
│ Superman ┆ null    ┆ null ┆ {{"New York","NY"},"USA"} │
└──────────┴─────────┴──────┴───────────────────────────┘
Clickhouse
┌─name─────┬─is_rich─┬──age─┬─address.city.city─┬─address.city.state─┬─address.country─┐
│ Batman   │ true    │   30 │ Chicago           │ IL                 │ USA             │
│ Superman │ null    │ null │ New York          │ NY                 │ USA             │
└──────────┴─────────┴──────┴───────────────────┴────────────────────┴─────────────────┘

Polars to Clickhouse

Rust

let ch = klickhouse::Client::connect("localhost:9000", Default::default()).await?;

let df: DataFrame = ...

// Deduce table schema from the dataframe
let table = polarhouse::ClickhouseTable::from_polars_schema(table_name, df.schema(), [])?;

// Create the Clickhouse table corresponding to the DataFrame (optional)
table.create(&ch, TableCreateOptions { primary_keys: &["name"], ..Default::default() }).await?;

// Insert dataframe contents into table
table.insert_df(df, &ch).await?;

Clickhouse to Polars

Rust

let ch = klickhouse::Client::connect("localhost:9000", Default::default()).await?;

// Retrieve Clickhouse query results as a DataFrame.
let df: DataFrame = polarhouse::get_df_query(
    klickhouse::SelectBuilder::new(table_name).select("*"),
    Default::default(),
    &ch,
).await?;

Python

from polarhouse import Client
client = await Client.connect("localhost:9000", caching=True)
df = await client.get_df_query("SELECT * from superheros")

Status

This is currently only a proof of concept.

Alternative solutions

Tests

$ docker run --network host --rm --name clickhouse clickhouse/clickhouse-server:latest
$ cargo nextest run -r --nocapture

Supported types

  • Integers
  • Floating points
  • Strings
  • Booleans
  • Categorical (Polars) / LowCardinality (Clickhouse)
  • Structs (Polars), which are flattened in Clickhouse, with field names separated by a dot (.)
  • Nullables
  • Lists (Polars) / Arrays (Clickhouse)
  • UUIDs (mapped to Strings in Polars)
  • Arrays (Polars)
  • Tuples
  • DateTime
  • Time
  • Duration
  • ...
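
The struct flattening mentioned in the list above can be sketched as a small recursive walk that joins nested field names with a dot. This is only an illustration of the naming scheme seen in the example table (address.city.city, address.city.state, address.country). The `Field` enum and the `flatten` and `example_address` functions are hypothetical and do not reflect how polarhouse implements this internally.

```rust
/// Hypothetical, simplified field tree: a leaf column or a struct with named fields.
enum Field {
    Leaf,
    Struct(Vec<(String, Field)>),
}

/// Collects flattened column names, joining nested field names with '.'.
fn flatten(prefix: &str, field: &Field, out: &mut Vec<String>) {
    match field {
        // A leaf becomes a single column named by the accumulated prefix.
        Field::Leaf => out.push(prefix.to_string()),
        // A struct contributes one column per (recursively flattened) child.
        Field::Struct(children) => {
            for (name, child) in children {
                flatten(&format!("{prefix}.{name}"), child, out);
            }
        }
    }
}

/// Builds the shape of the `address` struct from the example table.
fn example_address() -> Field {
    Field::Struct(vec![
        (
            "city".to_string(),
            Field::Struct(vec![
                ("city".to_string(), Field::Leaf),
                ("state".to_string(), Field::Leaf),
            ]),
        ),
        ("country".to_string(), Field::Leaf),
    ])
}
```

Calling `flatten("address", &example_address(), &mut cols)` on an empty `Vec<String>` fills it with the three dotted column names shown in the Clickhouse table of the example.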