Skip to content

HDDS-6021: EC: Client side exclude nodes list should expire after certain time period or based on the list size.#2973

Merged
umamaheswararao merged 8 commits intoapache:HDDS-3816-ecfrom
umamaheswararao:HDDS-6021
Jan 18, 2022
Merged

HDDS-6021: EC: Client side exclude nodes list should expire after certain time period or based on the list size.#2973
umamaheswararao merged 8 commits intoapache:HDDS-3816-ecfrom
umamaheswararao:HDDS-6021

Conversation

@umamaheswararao
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Introduced a daemon in exclude list class to expire the nodes after certain time period. That time could be generally node expiry time.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6021

How was this patch tested?

Added test and I will be adding few more tests.


public Set<DatanodeDetails> getDatanodes() {
return datanodes;
return datanodes.keySet();
Copy link
Copy Markdown
Contributor

@sodonnel sodonnel Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the Map docs:

Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined.

I wonder if it is safe to just return the keySet here, which could get modified concurrently while it is being used elsewhere. It might be safer to construct a new Set from the keySet, and return that, so it stands alone.

If we do do that, then I wonder if we could simplify this entire change, and just remove the exipred nodes from the list when it is requested, eg:

Set<DatanodeDetails> nodes = new HashSet<>();
Iterator it = datanodes.entrySet().iterator();
while (it.hasNext()) {
  Entry<> e = it.next()
  if (e.getValue() > timeout) {
    it.remove();
  } else {
    nodes.add(e.getKey());
  }
}
return nodes;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I have removed background thread now.

isMultipart, info, unsafeByteBufferConversion, xceiverClientFactory,
openID);
assert replicationConfig instanceof ECReplicationConfig;
getExcludeList()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my earlier comment, which may allow us to get rid of the background thread.

However if we need to keep the thread, I think the responsibility for starting the thread should be internal to the ExcludeList class, rather than having to ensure the user of the class needs to know about it, and start it. Should we not make ExcludeList start its own thread via its constructor or on first use?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above reply.


@Config(key = "exclude.nodes.expiry.time",
defaultValue = "600000",
description = "Ozone EC client to remove the node from the exclude" +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the expiry apply to both EC and non-EC clients?

The description doesn't read nicely for me - can we reword to something like:

Time after which an excluded node is reconsidered for writes. If the value is zero, the node is excluded for the life of the client


public ExcludeList() {
datanodes = new HashSet<>();
datanodes = new HashMap<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be a ConcurrentHashMap (or synchronize all public methods in the class), as we could run into trouble it we add a datanode while iterating the list to remove entries.

@sodonnel
Copy link
Copy Markdown
Contributor

Thanks for the changes on this - the code looks good now, I but I think we need to switch the map to a concurrent hash map - could you have a think about that please?

@umamaheswararao
Copy link
Copy Markdown
Contributor Author

Thanks @sodonnel , currently adding nodes would not happen concurrently (And also ExcludeList is not shared between KeyOutPutStreams, its per KeyOutPutStream). We do invoke addDns methods in handleStripeFailure checks, that will happen only after stripe write. In non EC flow, anyway we are not using this currently.
But for safety and defensive programming, it may be a good idea to change that to ConcurrentHMap.

Copy link
Copy Markdown
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@umamaheswararao umamaheswararao merged commit aaf6f17 into apache:HDDS-3816-ec Jan 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants