HDDS-6021: EC: Client side exclude nodes list should expire after certain time period or based on the list size.#2973
public Set<DatanodeDetails> getDatanodes() {
-  return datanodes;
+  return datanodes.keySet();
From the Map docs:
Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined.
I wonder if it is safe to just return the keySet here, which could get modified concurrently while it is being used elsewhere. It might be safer to construct a new Set from the keySet, and return that, so it stands alone.
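To illustrate the concern, here is a minimal sketch of the difference between the live `keySet()` view and a standalone copy. It uses `String` keys as a stand-in for `DatanodeDetails`, and the class and method names are hypothetical, not part of the patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class KeySetViewDemo {
  // Returns {size of the live view, size of the copy} after the map is mutated.
  static int[] sizesAfterMutation() {
    Map<String, Long> datanodes = new HashMap<>();
    datanodes.put("dn1", 1L);
    datanodes.put("dn2", 2L);

    Set<String> view = datanodes.keySet();                // backed by the map
    Set<String> copy = new HashSet<>(datanodes.keySet()); // stands alone

    // Removing from the map is visible through the view, not through the copy.
    datanodes.remove("dn1");
    return new int[] {view.size(), copy.size()};
  }

  public static void main(String[] args) {
    int[] sizes = sizesAfterMutation();
    System.out.println("view=" + sizes[0] + ", copy=" + sizes[1]);
  }
}
```

The copy costs one extra allocation per call, but callers can then iterate it without being affected by later changes to the map.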
If we do that, then I wonder if we could simplify this entire change and just remove the expired nodes from the list when it is requested, e.g.:
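// Assumes the map value is the time (in ms) at which the node was excluded.
Set<DatanodeDetails> nodes = new HashSet<>();
long now = System.currentTimeMillis();
Iterator<Map.Entry<DatanodeDetails, Long>> it = datanodes.entrySet().iterator();
while (it.hasNext()) {
  Map.Entry<DatanodeDetails, Long> e = it.next();
  if (now - e.getValue() > timeout) {
    // Expired: drop it from the map via the iterator.
    it.remove();
  } else {
    nodes.add(e.getKey());
  }
}
return nodes;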
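A self-contained sketch of this "expire on read" idea follows. It is only an illustration under stated assumptions: `String` ids stand in for `DatanodeDetails`, the class name `ExpiringExcludeListSketch` is hypothetical, and the clock is injected as a `LongSupplier` so the behaviour can be checked without sleeping.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

class ExpiringExcludeListSketch {
  // Node id -> time (ms) at which the node was excluded.
  private final Map<String, Long> datanodes = new HashMap<>();
  private final long expiryMillis;
  private final LongSupplier clock; // hypothetical injection point for tests

  ExpiringExcludeListSketch(long expiryMillis, LongSupplier clock) {
    this.expiryMillis = expiryMillis;
    this.clock = clock;
  }

  void addDatanode(String id) {
    datanodes.put(id, clock.getAsLong());
  }

  // Purges expired entries and returns a standalone copy of the remaining ids.
  Set<String> getDatanodes() {
    long now = clock.getAsLong();
    Set<String> live = new HashSet<>();
    Iterator<Map.Entry<String, Long>> it = datanodes.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (now - e.getValue() > expiryMillis) {
        it.remove();
      } else {
        live.add(e.getKey());
      }
    }
    return live;
  }
}
```

Because expiry happens lazily inside `getDatanodes()`, no background thread is needed; the trade-off is that stale entries stay in memory until the next read.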
Good point. I have removed the background thread now.
    isMultipart, info, unsafeByteBufferConversion, xceiverClientFactory,
    openID);
assert replicationConfig instanceof ECReplicationConfig;
getExcludeList()
See my earlier comment, which may allow us to get rid of the background thread.
However, if we need to keep the thread, I think the responsibility for starting it should be internal to the ExcludeList class, rather than requiring the user of the class to know about it and start it. Should we not make ExcludeList start its own thread via its constructor or on first use?
Same as above reply.
...ommon/src/test/java/org/apache/hadoop/hdds/scm/container/common/helpers/TestExcludeList.java
...ds/common/src/main/java/org/apache/hadoop/hdds/scm/container/common/helpers/ExcludeList.java
...ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/BlockOutputStreamEntryPool.java
@Config(key = "exclude.nodes.expiry.time",
    defaultValue = "600000",
    description = "Ozone EC client to remove the node from the exclude" +
Should the expiry apply to both EC and non-EC clients?
The description doesn't read nicely for me - can we reword to something like:
Time after which an excluded node is reconsidered for writes. If the value is zero, the node is excluded for the life of the client
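The semantics in the proposed description could be captured by a small predicate like the one below. This is a sketch only: the class and method names are hypothetical, and treating negative values the same as zero is an assumption that the discussion does not cover.

```java
class ExcludeExpiry {
  // Returns true when a node excluded at addedAtMillis should be reconsidered.
  // A zero expiry means "never expire"; negative values are treated the same
  // way here, which is an assumption rather than something the PR states.
  static boolean isExpired(long addedAtMillis, long nowMillis, long expiryMillis) {
    if (expiryMillis <= 0) {
      return false;
    }
    return nowMillis - addedAtMillis > expiryMillis;
  }
}
```

With the default of 600000 ms shown in the diff, a node excluded at time 0 would be reconsidered once more than ten minutes have elapsed.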
...ds/common/src/main/java/org/apache/hadoop/hdds/scm/container/common/helpers/ExcludeList.java
public ExcludeList() {
-  datanodes = new HashSet<>();
+  datanodes = new HashMap<>();
I think this needs to be a ConcurrentHashMap (or we need to synchronize all public methods in the class), as we could run into trouble if we add a datanode while iterating the list to remove entries.
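A minimal, single-threaded sketch of the failure mode: adding a new key to a `HashMap` while iterating it trips the fail-fast iterator, whereas a `ConcurrentHashMap` iterator is weakly consistent and tolerates the insert. The class name and keys are hypothetical, and the real risk discussed here would involve two threads rather than one loop.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IterateWhileAddingDemo {
  // Returns true if adding a key during iteration raised
  // ConcurrentModificationException for the given map implementation.
  static boolean failsOnAddDuringIteration(Map<String, Long> map) {
    map.put("dn1", 1L);
    map.put("dn2", 2L);
    try {
      for (String key : map.keySet()) {
        // The first put inserts a new key (a structural modification);
        // later puts only overwrite the same key.
        map.put("dn-added", 0L);
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("HashMap fails: "
        + failsOnAddDuringIteration(new HashMap<>()));
    System.out.println("ConcurrentHashMap fails: "
        + failsOnAddDuringIteration(new ConcurrentHashMap<>()));
  }
}
```

Note that the fail-fast check is documented as best effort, so under real multi-threaded access a plain `HashMap` may misbehave without throwing at all.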
Thanks for the changes on this - the code looks good now, but I think we need to switch the map to a concurrent hash map - could you have a think about that please?
Thanks @sodonnel. Currently, adding nodes does not happen concurrently (and ExcludeList is not shared between KeyOutputStreams; it is per KeyOutputStream). We do invoke the addDns methods in the handleStripeFailure checks, but that happens only after a stripe write. In the non-EC flow, we are not using this currently anyway.
What changes were proposed in this pull request?
Introduced a daemon in the ExcludeList class to expire nodes after a certain time period. That period could generally be the node expiry time.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-6021
How was this patch tested?
Added a test; I will be adding a few more tests.