Description
After upgrading to pgbulkinsert:9.0.0, saveAll no longer accepts a Stream.
This gets in the way of true streaming scenarios (e.g. reading large files row by row from Excel/CSV without collecting everything into memory).
In previous versions, a lazy stream could be piped directly into saveAll. In 9.0.0 only Iterable is accepted, so the straightforward fix is to collect the stream first, which materializes the whole dataset.
This breaks existing streaming ingestion pipelines and increases memory usage for large datasets.
Problem/Example
implementation("org.dhatim:fastexcel-reader:0.20.0")
implementation("de.bytefish:pgbulkinsert:9.0.0")
fun example() {
    val mapper = PgMapper.forClass(Entity::class.java) // field-to-column mapping omitted for brevity
    val writer = PgBulkInsert.PgBulkWriter(mapper)
    ReadableWorkbook(Files.newInputStream(Path.of("entities.xlsx"))).use { wb ->
        wb.firstSheet.openStream().use { rows -> // <- Stream<Row>
            dataSource.connection.use { conn ->
                // Argument type mismatch: actual type is 'Stream<T?>!', but '(Mutable)Iterable<Entity!>!' was expected.
                writer.saveAll(conn, "schema.table", rows.map { mapRowToEntity(it) })
            }
        }
    }
}
Impact
- Breaks streaming ingestion workflows
- Forces collect() / materialization into memory
- Makes it unsafe for large file imports (Excel/CSV, ETL pipelines)
- Regression compared to previous API flexibility
Expected behavior
saveAll should support streaming input, e.g.:
- Stream<T>
- or at least provide an overload:
saveAll(Connection conn, String table, Stream<T> stream)
Workaround
Currently the only option is to wrap the stream's iterator in a single-use Iterable:
writer.saveAll(conn, "schema.table", Iterable { stream.iterator() })
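A minimal sketch of that workaround as a reusable helper, for anyone hitting the same problem. The names asSingleUseIterable and consumeOnce are hypothetical and not part of pgbulkinsert; consumeOnce only stands in for a consumer that walks the Iterable a single time, which I am assuming (not confirming) is what saveAll does. The wrapper does not collect anything, but it can only be iterated once, because a Stream hands out its iterator once.

```kotlin
import java.util.stream.Stream

// Hypothetical helper (not part of pgbulkinsert): exposes a Stream as an
// Iterable without collecting it. The result is single-use: a second call
// to iterator() throws IllegalStateException, because the underlying
// Stream has already been operated upon.
fun <T> Stream<T>.asSingleUseIterable(): Iterable<T> =
    Iterable { this.iterator() }

// Hypothetical stand-in for a consumer such as saveAll: it walks the
// Iterable exactly once and returns how many items it saw.
fun <T> consumeOnce(items: Iterable<T>): Int {
    var count = 0
    for (ignored in items) count++
    return count
}
```

With this in place the call from the example would read writer.saveAll(conn, "schema.table", rows.map { mapRowToEntity(it) }.asSingleUseIterable()). The surrounding use { } on the stream is still needed so the workbook sheet is closed. If saveAll ever iterates its input more than once, this wrapper will fail on the second pass rather than silently re-reading.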