Flink: fix Flink streaming query problem [Cannot get a client from a closed pool] #6614
Conversation
@stevenzwu Can you have a look?
 * use pool.
 */
private void checkIsClosedOtherwiseReuse() {
  if (connections != null && connections.isClosed()) {
I think we can check whether it is closed before each use. If it is closed, we need to open it again before reusing it.
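The check-before-use idea could look something like this minimal, self-contained sketch. All names here are illustrative stand-ins, not Iceberg's actual `ClientPoolImpl` API: a pool that transparently re-creates its internal state if a previous owner closed it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: before handing out a client, check whether the
// underlying pool was closed by its previous owner and lazily reopen it.
class ReusablePool {
  private Deque<String> clients; // stand-in for real connections
  private boolean closed = true;

  synchronized String get() {
    reopenIfClosed(); // the pattern under discussion
    return clients.isEmpty() ? newClient() : clients.pop();
  }

  synchronized void release(String client) {
    if (!closed) {
      clients.push(client);
    }
  }

  synchronized void close() {
    closed = true;
    clients = null;
  }

  synchronized boolean isClosed() {
    return closed;
  }

  private void reopenIfClosed() {
    if (closed) {
      clients = new ArrayDeque<>();
      closed = false;
    }
  }

  private String newClient() {
    return "client";
  }
}
```

The extra complexity Steven mentions is visible even in this toy version: every accessor has to synchronize and re-check state, and ownership of the pool becomes ambiguous.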
@xuzhiwen1255 thanks a lot for the root cause analysis. I agree with your conclusion. Let's discuss what the right fix is, though. I am not sure we should add the extra complexity for … I was wondering why this doesn't happen for …
Both …
I guess issue #6455 is probably the first report of using a closeable … This issue/PR pointed out the problem of reusing the … This is a more fundamental change. Hence, I would like to get more feedback from @hililiwei @pvary @rdblue.
BTW, this usage pattern of …
I need to think about this some more, but it feels strange to me that something which does not own an object "closes" it. If I understand correctly, the …
We can actually try to solve this problem from inside the TableLoader; cloning a table makes sense to me. But I also wonder: if a catalog is closed, do tables loaded with its internal objects need to remain available?
The … Please correct me if I have made a mistake. I need to continue thinking about this as well.
@hililiwei I think you have a good question here: if a catalog is closed, should …
@@ -56,7 +56,6 @@ public <R> R run(Action<R, C, E> action, boolean retry) throws E, InterruptedExc
  C client = get();
  try {
    return action.run(client);
Please remove unnecessary whitespace changes.
protected abstract C reconnect(C client);

protected abstract void close(C client);
Please move these back.
I think that if a catalog is closed, it's reasonable for tables to stop operating as well. The catalog manages its shared resources, and if it chooses to share a connection pool with tables, then it makes sense for the tables to no longer be able to connect after the pool is closed. Tables should not own their own connection pools, so some resource needs to manage them, and the catalog is a good place to do that. I think the problem is that the …
So from that point of view, cloning tables doesn't seem like a good idea, since they cannot share resources. We should keep the catalog available in the loader, or create a statically shared catalog.
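As a rough illustration of "keep the catalog available in the loader", here is a self-contained toy model; `Catalog` and the loader here are stand-ins, not Iceberg's real API. The loader owns the catalog and closes it only when the loader itself is closed, so tables loaded in between can rely on the catalog's resources.

```java
import java.io.Closeable;

// Illustrative sketch: the loader owns its catalog's lifecycle, so tables it
// hands out can use the catalog's shared resources until the loader closes.
class CatalogBackedLoader implements Closeable {

  static class Catalog implements Closeable {
    private boolean open = true;

    String loadTable(String name) {
      if (!open) {
        throw new IllegalStateException("catalog closed");
      }
      return name; // stand-in for returning a real Table
    }

    @Override
    public void close() {
      open = false;
    }
  }

  private final Catalog catalog = new Catalog();

  String loadTable(String name) {
    return catalog.loadTable(name); // catalog stays open between loads
  }

  @Override
  public void close() {
    catalog.close(); // closed only when the loader's owner is done with it
  }
}
```

The key property is that nothing other than the loader ever calls `catalog.close()`, which addresses pvary's concern about objects being closed by code that does not own them.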
Yes, that will be a bigger discussion.
We can deprecate the … we can just get rid of … I think this Flink usage pattern of …
The problem for … @xuzhiwen1255 I definitely think we should close this PR, as this is not the right way to fix the problem.
This means that the table needs access to an open connection pool until the table is closed. I think that closing the JDBC pool before closing the table is the mistake here.

I think it is not by chance that we do not have a close method on the general Catalog interface. As a general rule, we expect the Catalog to be a simple static wrapper around the resources needed to access the table's snapshot pointer. The JDBC and maybe some other Catalog implementations did not follow this, and we are now in a situation where the different Catalog implementations behave differently. We should standardize the behavior (who is responsible for closing the connection pools). Hive has its own pool cache to close unused pools; JDBC doesn't have this (if I understand correctly).
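For reference, the kind of reference-counted cache of client pools that @pvary alludes to for Hive can be sketched as follows. Everything here is an illustrative stand-in, not Iceberg's actual implementation: pools are keyed, shared across users, and a pool is only closed when its last user releases it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a keyed, reference-counted pool cache: the pool
// for a key is shared, and closed only when its last user releases it.
class PoolCache {

  static final class Entry {
    final Object pool = new Object(); // stand-in for a real connection pool
    int refCount = 0;
  }

  private final Map<String, Entry> cache = new HashMap<>();

  synchronized Object acquire(String key) {
    Entry e = cache.computeIfAbsent(key, k -> new Entry());
    e.refCount++;
    return e.pool;
  }

  synchronized void release(String key) {
    Entry e = cache.get(key);
    if (e != null && --e.refCount == 0) {
      // a real implementation would close the underlying pool here
      cache.remove(key);
    }
  }

  synchronized boolean contains(String key) {
    return cache.containsKey(key);
  }
}
```

With this shape, "who closes the pool" has a single answer (the cache, at refcount zero), which is the standardization being asked for.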
A static wrapper can make object lifecycle management difficult. E.g., Flink needs to unload dynamically loaded classes in user code, which often includes connectors.
What is the best way to use connection pools in Flink tasks? Like a pool for …
That is a good question. It is a bit challenging. The easier model is to share nothing across tasks (e.g. no global static connection pools). Let's say each TM has 8 subtasks, and each task needs to open a client (and connection pool) talking to some external system. Each task is responsible for the ownership and lifecycle of the client. This ties the client/connection pool lifecycle to the Flink subtask lifecycle. Yeah, if it is a global static, I think your suggestion of loading classes in the Flink parent classloader is probably the way to go. But that would require some meddling with the image packaging to include the connector jars in the …
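The share-nothing model described above, where each subtask owns its client and ties its lifecycle to the task's open/close hooks, can be sketched like this (illustrative names only; this deliberately avoids any real Flink API):

```java
import java.io.Closeable;

// Sketch of a client whose lifecycle is scoped to a single subtask:
// opened when the subtask starts, closed when the subtask ends, and
// never shared through static state.
class SubtaskScopedClient implements Closeable {
  private boolean open = false;

  void open() {
    // a real client would connect / build its pool here
    this.open = true;
  }

  String query(String q) {
    if (!open) {
      throw new IllegalStateException("client not open");
    }
    return "result:" + q;
  }

  @Override
  public void close() {
    // a real client would tear down its pool here
    this.open = false;
  }
}
```

Since no other task can reach this client, there is no "closed by someone else" failure mode; the trade-off Steven notes is that each subtask pays for its own client instead of amortizing one pool.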
@rdblue I agree that in this case tables weren't able to connect/refresh after the pool is closed by the catalog. But I feel it should be OK to use the table as read-only (like a …
It is not necessarily wrong for … I am not saying it is the most efficient way, as it implies each …
I'm sorry, I've been spending time with my family recently, so I haven't joined this discussion. I would like to share my opinion.
+1, I think that after the catalog is closed, the table should be closed as well. I think this is a code bug. Suppose we use another catalog that is not JdbcCatalog, and it passes in some closeable objects; the same problem still exists when Catalog#close is called. Therefore, we need to avoid using a table loaded by a catalog after that catalog is closed, because it will cause unexpected situations. In fact, some problems have already been exposed.
// Abstract out a table accessor; all access to the table goes through it,
// and it manages the lifecycle of the table and the catalog.
public class TableAccessor implements Closeable {
  private final TableLoader tableLoader;
  private Table table;

  public TableAccessor(TableLoader tableLoader) {
    this.tableLoader = tableLoader;
  }

  private Table lazyTable() {
    if (table == null) {
      tableLoader.open();
      try (TableLoader loader = tableLoader) {
        this.table = loader.loadTable();
      } catch (IOException e) {
        throw new UncheckedIOException("Failed to close table loader", e);
      }
    }
    return table;
  }

  @Override
  public void close() throws IOException {
    tableLoader.close();
  }
}

// ----- IcebergSource ---------
private TableAccessor tableAccessor;

IcebergSource(TableLoader tableLoader) {
  this.tableLoader = tableLoader;
  // An accessor is created when an IcebergSource is built; subsequent
  // operations on the table (or retrieval of its reference) go through it.
  this.tableAccessor = new TableAccessor(tableLoader);
}

private List<IcebergSourceSplit> planSplitsForBatch(String threadName) {
  ExecutorService workerPool =
      ThreadPools.newWorkerPool(threadName, scanContext.planParallelism());
  try {
    List<IcebergSourceSplit> splits =
        FlinkSplitPlanner.planIcebergSourceSplits(lazyTable(), scanContext, workerPool);
    LOG.info(
        "Discovered {} splits from table {} during job initialization",
        splits.size(),
        lazyTable().name());
    return splits;
  } finally {
    workerPool.shutdown();
    // close the accessor
    tableAccessor.close();
  }
}

For streaming mode, we pass the accessor directly to the ContinuousSplitPlannerImpl, which closes the accessor:

private TableAccessor tableAccessor;

public ContinuousSplitPlannerImpl(TableAccessor tableAccessor, ...) {
  this.tableAccessor = tableAccessor;
  ...
}

@Override
public void close() throws IOException {
  if (!isSharedPool) {
    workerPool.shutdown();
  }
  tableAccessor.close();
}

In this way, we access the table through its accessor, and the tableLoader is maintained by the accessor. When the accessor is closed, we close the tableLoader, ensuring it is closed correctly and that references inside the table do not become unavailable because the catalog was closed first. @stevenzwu @pvary @hililiwei What do you think of this plan?
+1 for this.
We should standardize the behavior first. The current approach, in my opinion, is not the best way to solve the problem. From the above discussion, it seems that we have the following two solutions: …
Did I miss anything?
@hililiwei, this won't work, as a table may need to access the client pools that are closed by the catalog.
This is a feasible solution for fixing the current problem with TableLoader. And we have to fix the problem for …
This could be the proper long-term solution. We need to think carefully here and make sure the new API makes sense.
@xuzhiwen1255 regarding your …
@stevenzwu @hililiwei I have tried the change; please review it for me. Thank you.
...ink/src/main/java/org/apache/iceberg/flink/source/enumerator/ContinuousSplitPlannerImpl.java
flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/TableLoader.java
@pvary Can you take a look? Thank you.
This is really close, except for one nit comment on the unnecessary whitespace/empty line change.
@hililiwei @pvary do you have other comments for this PR?
@xuzhiwen1255 can you rebase this PR? We will need to update the REST catalog from PR #7044.
@stevenzwu No problem, the modification has been completed.
Thanks @xuzhiwen1255 for the contribution. Can you create a backport PR for 1.14 and 1.15?
@stevenzwu Of course I will.
Solves the problem reported in #6455.
Cause: …