Improve connection reuse on HBaseIO.ReadAll #20119

@damccorm

Description

The recent refactor of HBaseIO.ReadAll in BEAM-9279 creates a new connection in the @ProcessElement method, that is, once per element. For a pipeline running in streaming mode this can be costly, so we should find a way to cache and reuse connections, both to avoid slow read start-up and to avoid saturating the cluster.
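One possible shape for such a cache is sketched below. It is not the HBaseIO implementation and not an existing Beam or HBase API: `SharedConnections`, `acquire`, `release` and `openCount` are hypothetical names chosen for illustration, and the sketch uses only the JDK so that it stays self-contained. The idea is a reference-counted map that opens one connection per key on first use and closes it when the last holder releases it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Beam or HBase): a reference-counted cache that
 * hands out one shared connection per key and closes it when the last user releases it.
 */
final class SharedConnections<K, C> {

  /** A cached connection plus the number of callers currently holding it. */
  private static final class Entry<C> {
    final C connection;
    int refCount;

    Entry(C connection) {
      this.connection = connection;
    }
  }

  private final Map<K, Entry<C>> entries = new HashMap<>();
  private final Function<K, C> opener; // how to open a connection for a key
  private final Consumer<C> closer;    // how to close a connection

  SharedConnections(Function<K, C> opener, Consumer<C> closer) {
    this.opener = opener;
    this.closer = closer;
  }

  /** Returns the connection for {@code key}, opening it only on first use. */
  synchronized C acquire(K key) {
    Entry<C> entry = entries.get(key);
    if (entry == null) {
      entry = new Entry<>(opener.apply(key));
      entries.put(key, entry);
    }
    entry.refCount++;
    return entry.connection;
  }

  /** Drops one reference; closes and evicts the connection when none remain. */
  synchronized void release(K key) {
    Entry<C> entry = entries.get(key);
    if (entry == null) {
      return; // Unknown key or already closed: nothing to do.
    }
    if (--entry.refCount == 0) {
      entries.remove(key);
      closer.accept(entry.connection);
    }
  }

  /** Number of connections currently open. */
  synchronized int openCount() {
    return entries.size();
  }
}
```

In a DoFn, one way to use this would be to hold the cache in a static field, call `acquire` from a `@Setup` method and `release` from `@Teardown`, so that `@ProcessElement` only reads from an already open connection. The opener would presumably wrap HBase's `ConnectionFactory.createConnection(conf)` and the closer `Connection.close()`, with the key derived from the HBase configuration. Whether that fits HBaseIO's serialization and configuration handling is an assumption that would need checking.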

Note that this is a recurring issue for DoFn-based IOs. It first showed up on writes in JdbcIO (BEAM-7230) and was recently discussed again in the context of the CassandraIO refactor: #10546 (comment)

Imported from Jira BEAM-9554. Original Jira may contain additional context.
Reported by: iemejia.
