Expose some private methods as public #19
The-Alchemist wants to merge 1 commit into bytefish:master from
Conversation
Many APIs only give you access to one entity at a time, even if there is some kind of stream backing it (e.g., JDBC, Kafka).
So, I was hoping to inline and modify the `PgBulkInsert::saveAll()` method and call the following directly:
```java
CopyManager cpManager = connection.getCopyAPI();
CopyIn copyIn = cpManager.copyIn(getCopyCommand());
try (PgBinaryWriter bw = new PgBinaryWriter()) {
// Wrap the CopyOutputStream in our own Writer:
bw.open(new PGCopyOutputStream(copyIn));
// Insert Each Column:
entities.forEach(entity -> this.saveEntity(bw, entity));
}
```
However, two methods are private and can't be called directly.
Exposing them would let users control the opening and closing of a `PgBinaryWriter`, enabling use with other APIs that deliver messages one-by-one, as sketched below.
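To make the one-by-one use case concrete, here is a sketch building on the snippet above. It assumes `getCopyCommand()` and `saveEntity(...)` were made public; `records` is a hypothetical stand-in for any source that yields one message at a time, such as a Kafka consumer loop:

```java
// Sketch only: assumes getCopyCommand() and saveEntity(...) were made public.
CopyManager cpManager = connection.getCopyAPI();
CopyIn copyIn = cpManager.copyIn(getCopyCommand());   // currently private
try (PgBinaryWriter bw = new PgBinaryWriter()) {
    // Wrap the CopyOutputStream in our own Writer:
    bw.open(new PGCopyOutputStream(copyIn));
    // Write each entity as it arrives, instead of requiring a Stream up front:
    while (records.hasNext()) {
        saveEntity(bw, records.next());               // currently private
    }
}
```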
I don't think it would be the right abstraction. Did you take a look at the BulkProcessor? https://bytefish.de/blog/pgbulkinsert_bulkprocessor/
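For reference, usage of the BulkProcessor looks roughly like this. This is a sketch from memory of the linked post; the exact constructor shapes and helper names (`BulkWriteHandler`, `connectionFactory`, the hypothetical `PersonBulkInsert` mapping class) may differ from the current API:

```java
// Sketch: the BulkProcessor batches entities and flushes every bulkSize writes.
final int bulkSize = 1000;
try (BulkProcessor<Person> bulkProcessor = new BulkProcessor<>(
        new BulkWriteHandler<>(new PersonBulkInsert(), connectionFactory), bulkSize)) {
    for (Person p : people) {
        bulkProcessor.add(p);   // flushed automatically once bulkSize is reached
    }
}
```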
@bytefish: Good call, I didn't know about the BulkProcessor.
@The-Alchemist Hey, I didn't say it is useful in your use case. 😁 If it doesn't fit your needs, just let me know and maybe we can come up with something. 😎
Actually, yes, @bytefish, now that I looked at it, it seems neither fits my use case.
You are right. I think you want to open a connection and stream the data into the Postgres database, right? That's basically it. I will add a method that suits your needs, but I won't be able to before the weekend.
Thank you, @bytefish! For now, I've resorted to using reflection to call the private methods. I'm not sure if it's as simple as wrapping and exposing a `PgBinaryWriter`.
@The-Alchemist You are totally right. I did some quick research: it's possible to turn it into a Stream, but it is complicated. I will adjust the API and come up with something. 🤔 And if nothing works, I will just make the methods public. 😎
@bytefish: thank you, you're awesome!
Initial Refactoring to decouple Column Mapping from actual Postgres Connection handling. Increasing Version Number to 2.0 due to breaking API changes. Refactoring Tests.
@The-Alchemist I first decoupled the actual column mapping and the insertion methods. That leaves us with just two tiny methods in the class in question. I am now thinking about how to go further.

The only nitpick with this kind of refactoring would be: I wrote this library for Apache Flink (https://flink.apache.org/) experiments I did. In there you have a Stream of incoming events that I want to bulk write to the database, and I don't want to create a PgBulkInsert class for every Batch I am going to write. But actually... you could cache the most expensive part (the Column Mapping creation), and the PgBulkInsert class would be a tiny wrapper. I don't know if it is expensive to instantiate it for every Batch (I don't think so). What do you think?
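A minimal sketch of that caching idea, assuming the refactored PgBulkInsert takes the mapping in its constructor; the `EventMapping` class and `writeBatch` sink method are hypothetical:

```java
// The expensive part, the Column Mapping, is built once and cached:
private final AbstractMapping<Event> mapping = new EventMapping();

public void writeBatch(PGConnection pgConnection, Stream<Event> batch) throws SQLException {
    // The PgBulkInsert itself is then a tiny wrapper, cheap to create per Batch:
    new PgBulkInsert<>(mapping).saveAll(pgConnection, batch);
}
```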
I think it could look something like this:

```java
// Copyright (c) Philipp Wagner. All rights reserved.
// Licensed under the MIT license. See LICENSE file in the project root for full license information.
package de.bytefish.pgbulkinsert;

import de.bytefish.pgbulkinsert.exceptions.SaveEntityFailedException;
import de.bytefish.pgbulkinsert.mapping.AbstractMapping;
import de.bytefish.pgbulkinsert.pgsql.PgBinaryWriter;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyIn;
import org.postgresql.copy.CopyManager;
import org.postgresql.copy.PGCopyOutputStream;

import java.sql.SQLException;
import java.util.stream.Stream;

public class BulkWriter<TEntity> implements IBulkWriter<TEntity> {

    private final CopyIn copyIn;
    private final PgBinaryWriter writer;
    private final AbstractMapping<TEntity> mapping;

    public BulkWriter(CopyIn copyIn, PgBinaryWriter writer, AbstractMapping<TEntity> mapping) {
        this.copyIn = copyIn;
        this.writer = writer;
        this.mapping = mapping;

        open();
    }

    private void open() {
        writer.open(new PGCopyOutputStream(copyIn));
    }

    @Override
    public void save(TEntity entity) throws SQLException {
        saveEntity(writer, entity);
    }

    @Override
    public void saveAll(Stream<TEntity> entities) throws SQLException {
        entities.forEach(entity -> this.saveEntity(writer, entity));
    }

    private void saveEntity(PgBinaryWriter bw, TEntity entity) throws SaveEntityFailedException {
        synchronized (bw) {
            // Start a New Row:
            bw.startRow(mapping.getColumns().size());
            // Iterate over each column mapping:
            mapping.getColumns().forEach(column -> {
                try {
                    column.getWrite().invoke(bw, entity);
                } catch (Exception e) {
                    throw new SaveEntityFailedException(e);
                }
            });
        }
    }

    public static <TEntity> BulkWriter<TEntity> create(PGConnection connection, AbstractMapping<TEntity> mapping) throws SQLException {
        // Get the Copy API from the PgConnection:
        CopyManager copyManager = connection.getCopyAPI();
        // Create the CopyIn:
        CopyIn copyIn = copyManager.copyIn(mapping.getCopyCommand());
        // Create the Binary Writer:
        PgBinaryWriter writer = new PgBinaryWriter();
        // And return the BulkWriter:
        return new BulkWriter<>(copyIn, writer, mapping);
    }

    @Override
    public void close() throws Exception {
        if (writer != null) {
            writer.close();
        }
    }
}
```
@The-Alchemist I am not happy with the above, because I particularly don't like the cast to the underlying `PGConnection`:

```java
// Cast to the underlying PGConnection:
PGConnection pgConnection = PostgreSqlUtils.getPGConnection(connection);

// The Mapping to use:
AbstractMapping<GeometricEntity> mapping = new GeometricEntityMapping();

// Construct the Bulk Writer:
try (BulkWriter<GeometricEntity> bulkInsert = BulkWriter.create(pgConnection, mapping)) {
    // Save the entities:
    bulkInsert.saveAll(entities.stream());
} catch (Exception e) {
    // Pokemon Exception Handling!
}
```

Of course I can wrap the cast. What do you think?
@The-Alchemist But when designing the API this way, you have to know that nothing gets written until you call close() on the BulkWriter. I don't like this kind of API design, because it leads to people writing issues like: "I am calling save(...), but nothing gets written to Postgres." Or, even worse, it crashes on the Postgres side. And I have been bitten by such designs before. So the existing API design is easier to use.
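To make that pitfall concrete, a sketch against the BulkWriter proposed above (`person` and `mapping` are placeholders):

```java
// Without a close(), the COPY is never finished and nothing lands in Postgres:
BulkWriter<Person> writer = BulkWriter.create(pgConnection, mapping);
writer.save(person);   // buffered, but not yet visible in the database

// Safe variant: close() completes the COPY when the block exits.
try (BulkWriter<Person> safeWriter = BulkWriter.create(pgConnection, mapping)) {
    safeWriter.save(person);
} catch (Exception e) {
    // handle the failed COPY
}
```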
@The-Alchemist What I would suggest for your use case is the following: you create an implementation that fits your use case best, so you have full control over the API design. I am thinking of something like the following. That would make both of us happy. 👍

```java
class TheAlchemistBulkInsert<TEntity> implements AutoCloseable {

    private final CopyIn copyIn;
    private final PgBinaryWriter writer;
    private final AbstractMapping<TEntity> mapping;

    public TheAlchemistBulkInsert(CopyIn copyIn, PgBinaryWriter writer, AbstractMapping<TEntity> mapping) {
        this.copyIn = copyIn;
        this.writer = writer;
        this.mapping = mapping;

        open();
    }

    private void open() {
        writer.open(new PGCopyOutputStream(copyIn));
    }

    public void save(TEntity entity) throws SQLException {
        saveEntity(writer, entity);
    }

    public void saveAll(Stream<TEntity> entities) throws SQLException {
        entities.forEach(entity -> this.saveEntity(writer, entity));
    }

    private void saveEntity(PgBinaryWriter bw, TEntity entity) throws SaveEntityFailedException {
        synchronized (bw) {
            // Start a New Row:
            bw.startRow(mapping.getColumns().size());
            // Iterate over each column mapping:
            mapping.getColumns().forEach(column -> {
                try {
                    column.getWrite().invoke(bw, entity);
                } catch (Exception e) {
                    throw new SaveEntityFailedException(e);
                }
            });
        }
    }

    public static <TEntity> TheAlchemistBulkInsert<TEntity> create(PGConnection connection, AbstractMapping<TEntity> mapping) throws SQLException {
        // Get the Copy API from the PgConnection:
        CopyManager copyManager = connection.getCopyAPI();
        // Create the CopyIn:
        CopyIn copyIn = copyManager.copyIn(mapping.getCopyCommand());
        // Create the Binary Writer:
        PgBinaryWriter writer = new PgBinaryWriter();
        // And return the TheAlchemistBulkInsert:
        return new TheAlchemistBulkInsert<>(copyIn, writer, mapping);
    }

    @Override
    public void close() throws Exception {
        if (writer != null) {
            writer.close();
        }
    }
}
```
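Usage could then look like this (a sketch; `records` again stands in for any one-message-at-a-time source, and the entity/mapping names reuse the example above):

```java
// Cast to the underlying PGConnection:
PGConnection pgConnection = PostgreSqlUtils.getPGConnection(connection);

// The Mapping to use:
AbstractMapping<GeometricEntity> mapping = new GeometricEntityMapping();

// Write each entity as it arrives; closing completes the COPY:
try (TheAlchemistBulkInsert<GeometricEntity> bulkInsert = TheAlchemistBulkInsert.create(pgConnection, mapping)) {
    while (records.hasNext()) {
        bulkInsert.save(records.next());
    }
} catch (Exception e) {
    // handle errors
}
```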
All Unit Tests passed. I have released Version 2.0 with the major breaking change. You should now be able to utilize the mappings as shown in the sample above.