[AMBARI-23941] Updating serviceLevelParams and clusterLevelParams in server-side metadata holder both when adding and removing a service/property #1366

Merged
merged 4 commits into from May 30, 2018

Conversation

smolnar82
Contributor

@smolnar82 smolnar82 commented May 24, 2018

What changes were proposed in this pull request?

When a service was removed from the cluster, its serviceLevelParams stayed in the server-side metadata holder but were removed from the agent-side cache. The next time someone tried to re-install the same service, no metadata change event was triggered, so the agent-side code failed due to missing serviceLevelParams for that service (the same is true for clusterLevelParams metadata).

How was this patch tested?

Added unit test code to cover these cases; latest JUnit test results in ambari-server:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:12 h
[INFO] Finished at: 2018-05-24T03:33:29+02:00
[INFO] Final Memory: 107M/1592M
[INFO] ------------------------------------------------------------------------

In addition to unit testing, the following integration test steps were executed:

  1. installed Ambari 2.7.0.0-562
  2. replaced ambari-server.jar with mine after modifying and building the code
  3. deployed a cluster: HDFS via BP and then ZK and Kafka on UI
  4. removed ZK and Kafka
  5. added ZK and Kafka again without any issue (when reproducing the issue I was not able to re-install either of them)

…erver-side metadata holder both when adding and removing a service/property
@smolnar82 smolnar82 self-assigned this May 24, 2018
@asfgit

asfgit commented May 24, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/2437/
Test PASSed.

Contributor

@adoroszlai adoroszlai left a comment

Looks good except a typo.

@@ -87,7 +87,16 @@ public boolean updateServiceLevelParams(SortedMap<String, MetadataServiceInfo> u
break;
}
}

for (String key : serviceLevelParams.keySet()) {
if (!update.containsKey(key) || !update.get(key).equals(update.get(key))) {
Contributor

There is a typo in this line: update.get(key) is compared to itself (update.get(key)) instead of serviceLevelParams.get(key).

Further, I think this comparison can be removed: values for keys present in both maps are already compared in the first loop.
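
A minimal sketch of the difference between the two conditions, with plain String values standing in for MetadataServiceInfo. The class and method names are hypothetical and only illustrate the point made in this comment:

```java
import java.util.Objects;
import java.util.SortedMap;
import java.util.TreeMap;

class StaleKeyCheck {
  // Condition as it appears in the diff: the second operand compares a value
  // with itself, so it can only be true for keys missing from the update.
  static boolean asWritten(SortedMap<String, String> update, String key) {
    return !update.containsKey(key) || !update.get(key).equals(update.get(key));
  }

  // Presumably intended condition: null-safe comparison of stored vs. incoming value.
  static boolean intended(SortedMap<String, String> current,
                          SortedMap<String, String> update, String key) {
    return !update.containsKey(key) || !Objects.equals(current.get(key), update.get(key));
  }

  public static void main(String[] args) {
    SortedMap<String, String> current = new TreeMap<>();
    SortedMap<String, String> update = new TreeMap<>();
    current.put("ZOOKEEPER", "v1");
    update.put("ZOOKEEPER", "v2");
    System.out.println(asWritten(update, "ZOOKEEPER"));          // false
    System.out.println(intended(current, update, "ZOOKEEPER")); // true
  }
}
```

Because the self-comparison never reports a difference, the condition as written only detects keys that are absent from the update.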

Contributor Author

Hm... this is weird. I fixed this issue long before I created the PR (it also broke unit tests). Let me check my stash.

Contributor Author

Done

break;
}
}

Contributor

This method is almost identical to updateServiceLevelParams. (The duplication is not introduced in this change, but is highlighted by the fact that the same problem needs to be fixed in two places.) Can you please extract a method that takes 2 maps and updates one of them from the other? Then please delegate to that method from these two. Please make sure to keep the version that uses null-safe comparison.
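
The requested extraction could look roughly like the sketch below. It is not the code from this PR: `MetadataParamsSketch`, `ServiceInfoStub` (a stand-in for MetadataServiceInfo) and `copyChanges` are hypothetical names, and removal of stale keys is left out to keep the example short.

```java
import java.util.Map;
import java.util.Objects;
import java.util.SortedMap;
import java.util.TreeMap;

class MetadataParamsSketch {
  // Hypothetical stand-in for MetadataServiceInfo; only equality matters here.
  static final class ServiceInfoStub {
    final String version;
    ServiceInfoStub(String version) { this.version = version; }
    @Override public boolean equals(Object o) {
      return o instanceof ServiceInfoStub
          && Objects.equals(version, ((ServiceInfoStub) o).version);
    }
    @Override public int hashCode() { return Objects.hashCode(version); }
  }

  private final SortedMap<String, ServiceInfoStub> serviceLevelParams = new TreeMap<>();
  private final SortedMap<String, String> clusterLevelParams = new TreeMap<>();

  // Both public methods delegate to the same generic helper.
  boolean updateServiceLevelParams(SortedMap<String, ServiceInfoStub> update) {
    return copyChanges(serviceLevelParams, update);
  }

  boolean updateClusterLevelParams(SortedMap<String, String> update) {
    return copyChanges(clusterLevelParams, update);
  }

  // Copies new or changed entries from update into target and reports whether
  // target was modified. Objects.equals keeps the comparison null-safe.
  static <K, V> boolean copyChanges(Map<K, V> target, Map<K, V> update) {
    if (update == null) {
      return false;
    }
    boolean changed = false;
    for (Map.Entry<K, V> entry : update.entrySet()) {
      K key = entry.getKey();
      if (!target.containsKey(key) || !Objects.equals(target.get(key), entry.getValue())) {
        target.put(key, entry.getValue());
        changed = true;
      }
    }
    return changed;
  }
}
```

With a generic helper, a fix to the comparison logic only has to be made once for both parameter maps.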

Contributor Author

Good idea; I'll extract it to a util method (maybe an existing/new class; will check this out)

Contributor Author

Done

@rlevas
Contributor

rlevas commented May 24, 2018

@jonathan-hurley , this seems like it may be familiar to you. If so, then maybe there is a common solution to your issues.

@jonathan-hurley
Member

@rlevas No, my problem is separate from this one. This is simply keeping the agent and server caches in sync. The issue I'm having is that when adding or removing services/components, a massive event filled with data needs to be fired - and it doesn't happen automatically.

@rlevas
Contributor

rlevas commented May 24, 2018

@jonathan-hurley , Thanks for commenting so quickly.

@zeroflag
Contributor

Looks good, but can you check if we need to guarantee thread safety somehow in the update method or do we always call this on a single thread?

@smolnar82
Contributor Author

@zeroflag I'll make sure proper locking is applied (even if the map is only modified on one thread, it does no harm).

@smolnar82
Contributor Author

smolnar82 commented May 24, 2018

@adoroszlai @zeroflag @rlevas @mpapirkovskyy please review again; thx!

@asfgit

asfgit commented May 24, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/2447/
Test PASSed.

@adoroszlai
Contributor

@smolnar82 Thanks for the changes, they look good with respect to what was requested.

However, on second thought, I have some concern about how this entire change would work with "partial" updates. For example, the following method passes service-level params only for a single service. With your change, wouldn't this kind of update get rid of all other existing services from the metadata?

MetadataCluster metadataCluster = new MetadataCluster(null,
    getMetadataServiceLevelParams(cl.getService(serviceName)),
    new TreeMap<>(),
    null);

Similarly, there are cases where empty map in update means "no changes" in that part of the metadata:

MetadataCluster metadataCluster = new MetadataCluster(null,
    new TreeMap<>(),
    getMetadataClusterLevelConfigsParams(cl, stackId),
    null);

MetadataCluster metadataCluster = new MetadataCluster(null,
    getMetadataServiceLevelParams(cl),
    new TreeMap<>(),
    null);

Wouldn't these updates clear service- or cluster-level params?
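
The concern can be reproduced with a small sketch. It assumes the update replaces the stored map wholesale whenever the two maps differ; `ReplaceOnDiffSketch` and `replaceIfDifferent` are hypothetical names, and String values stand in for the real parameter objects.

```java
import java.util.Objects;
import java.util.SortedMap;
import java.util.TreeMap;

class ReplaceOnDiffSketch {
  // Assumed behaviour under review: if the maps differ, the stored map is
  // cleared and refilled from the incoming one.
  static <V> boolean replaceIfDifferent(SortedMap<String, V> current,
                                        SortedMap<String, V> update) {
    boolean changed = !Objects.equals(current, update);
    if (changed) {
      current.clear();
      current.putAll(update);
    }
    return changed;
  }

  public static void main(String[] args) {
    SortedMap<String, String> serviceLevelParams = new TreeMap<>();
    serviceLevelParams.put("HDFS", "hdfs-params");
    serviceLevelParams.put("ZOOKEEPER", "zk-params");

    // Partial update that carries a single service only.
    SortedMap<String, String> update = new TreeMap<>();
    update.put("KAFKA", "kafka-params");

    replaceIfDifferent(serviceLevelParams, update);
    System.out.println(serviceLevelParams.keySet()); // [KAFKA]
  }
}
```

Under that assumption, a single-service update drops HDFS and ZOOKEEPER, and an empty map passed as "no changes" would clear the stored parameters entirely.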

@smolnar82
Contributor Author

smolnar82 commented May 24, 2018

@adoroszlai
Thanks for your comment. Let me think it over again and give a meaningful answer to make your concerns go away or change the code if needed.

In the meantime I'd really appreciate it if @mpapirkovskyy would review this change and let me know whether this is the approach we should follow. Thanks in advance!

Contributor

@mpapirkovskyy mpapirkovskyy left a comment

I also tend to think that changing the agent to not delete params may be better/easier for now, so that it stays in sync with the server.

I don't think any logic in agent scripts relies on the presence/absence of a param rather than its value.

@aonishuk any thoughts?

private Set<String> statusCommandsToRun = new HashSet<>();
private SortedMap<String, MetadataServiceInfo> serviceLevelParams = new TreeMap<>();
private SortedMap<String, String> clusterLevelParams = new TreeMap<>();
private final static Lock LOCK = new ReentrantLock();
Contributor

Any reason for static lock?
There are multiple instances of this object, and all of them are going to use a single lock instance.
It looks like there's a better place for the lock, and again not a static one:
org.apache.ambari.server.agent.stomp.MetadataHolder#handleUpdate

Another possibility is to use ConcurrentSkipListMap
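
A small sketch of the difference between the two lock scopes, plus the ConcurrentSkipListMap alternative. All class and method names here are hypothetical and not part of the PR:

```java
import java.util.SortedMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockScopeSketch {
  // Static lock: every instance of this class shares the same lock object,
  // so updates to unrelated instances serialize on each other.
  static class SharedLockHolder {
    private static final Lock LOCK = new ReentrantLock();
    Lock lock() { return LOCK; }
  }

  // Instance lock: each holder owns its lock, so unrelated holders never contend.
  static class OwnLockHolder {
    private final Lock lock = new ReentrantLock();
    Lock lock() { return lock; }
  }

  // Alternative without an explicit lock: a sorted map whose individual
  // operations are thread-safe. It rejects null keys and null values.
  static SortedMap<String, String> newConcurrentParams() {
    return new ConcurrentSkipListMap<>();
  }
}
```

Note that ConcurrentSkipListMap only makes single operations safe; a compound compare-then-update sequence would still need external coordination.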

Contributor Author

It makes sense to have the lock a simple one (i.e. not static). However I'd keep it here to make sure we lock only the update piece and do not spend more time on locking where it is not necessary.

final boolean changed = !Objects.equals(mapToBeUpdated, updatedMap);

if (changed) {
mapToBeUpdated.clear();
Contributor

I agree with @adoroszlai that this is going to break partial metadata updates.
So we should either always send full metadata, which will require additional modifications,
or change the logic here.
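
One possible shape for changing the logic is a flag that distinguishes full from partial updates. This is a sketch under assumptions, not the final code of this PR: `PartialAwareMerge`, `merge` and `fullUpdate` are hypothetical names, and an empty update is treated as "no changes", following the convention raised earlier in this review.

```java
import java.util.Map;
import java.util.Objects;
import java.util.SortedMap;
import java.util.TreeMap;

class PartialAwareMerge {
  static <V> boolean merge(SortedMap<String, V> current,
                           SortedMap<String, V> update,
                           boolean fullUpdate) {
    // Assumption: a null or empty update means "leave this part untouched".
    if (update == null || update.isEmpty()) {
      return false;
    }
    boolean changed = false;
    // Add new entries and overwrite changed ones (null-safe comparison).
    for (Map.Entry<String, V> entry : update.entrySet()) {
      String key = entry.getKey();
      if (!current.containsKey(key) || !Objects.equals(current.get(key), entry.getValue())) {
        current.put(key, entry.getValue());
        changed = true;
      }
    }
    // Only a full update may drop keys that are missing from the incoming map.
    if (fullUpdate) {
      changed |= current.keySet().retainAll(update.keySet());
    }
    return changed;
  }

  public static void main(String[] args) {
    SortedMap<String, String> params = new TreeMap<>();
    params.put("HDFS", "hdfs-params");
    params.put("ZOOKEEPER", "zk-params");

    SortedMap<String, String> single = new TreeMap<>();
    single.put("KAFKA", "kafka-params");

    merge(params, single, false);
    System.out.println(params.keySet()); // [HDFS, KAFKA, ZOOKEEPER]
  }
}
```

A trade-off of the empty-means-untouched convention is that removing the last remaining entry cannot be expressed through an empty full update and would need separate handling.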

Contributor Author

I'll fix it soon

….e. service or cluster level params should not be touched) and for the case when service level params are updated for a single service only (in case of cluster level params we always use full metadata)
@smolnar82
Contributor Author

@adoroszlai @mpapirkovskyy @zeroflag @rlevas please review my changes; thanks in advance!

I re-executed the integration test I described above and re-ran the unit tests in ambari-server:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 44:11 min
[INFO] Finished at: 2018-05-29T08:01:32+02:00
[INFO] Final Memory: 106M/1505M
[INFO] ------------------------------------------------------------------------

@asfgit

asfgit commented May 29, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/2475/
Test PASSed.

@smolnar82
Contributor Author

@adoroszlai @mpapirkovskyy @zeroflag @rlevas please review my changes; thanks in advance!

Contributor

@adoroszlai adoroszlai left a comment

LGTM, except minor nit about locking.

public boolean updateServiceLevelParams(SortedMap<String, MetadataServiceInfo> update, boolean fullMetadataInUpdatedMap) {
  if (update != null) {
    if (this.serviceLevelParams == null) {
      this.serviceLevelParams = new TreeMap<>();
Contributor

I think these map creations should also be guarded by lock.
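
A minimal sketch of what guarding the lazy map creation could look like. `LazyParamsSketch` is a hypothetical class, String values stand in for MetadataServiceInfo, and values are assumed to be non-null:

```java
import java.util.Map;
import java.util.Objects;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LazyParamsSketch {
  private final Lock lock = new ReentrantLock();
  private SortedMap<String, String> serviceLevelParams; // created on first update

  boolean update(SortedMap<String, String> update) {
    if (update == null) {
      return false;
    }
    lock.lock();
    try {
      // Creating the map inside the critical section prevents two threads from
      // both seeing null and installing separate maps.
      if (serviceLevelParams == null) {
        serviceLevelParams = new TreeMap<>();
      }
      boolean changed = false;
      for (Map.Entry<String, String> entry : update.entrySet()) {
        String previous = serviceLevelParams.put(entry.getKey(), entry.getValue());
        changed |= !Objects.equals(previous, entry.getValue());
      }
      return changed;
    } finally {
      lock.unlock();
    }
  }

  int size() {
    lock.lock();
    try {
      return serviceLevelParams == null ? 0 : serviceLevelParams.size();
    } finally {
      lock.unlock();
    }
  }
}
```

If the null check stayed outside the lock, one thread could replace a map that another thread had just populated.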

Contributor Author

You are right (this was a last-minute move and certainly not thought through); it's fixed now.

@asfgit

asfgit commented May 30, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/2502/
Test FAILed.
Test FAILured.

@smolnar82
Contributor Author

retest this please

@asfgit

asfgit commented May 30, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/2505/
Test FAILed.
Test FAILured.

@smolnar82 smolnar82 changed the title from "AMBARI-23941. Updating serviceLevelParams and clusterLevelParams in server-side metadata holder both when adding and removing a service/property" to "[AMBARI-23941] Updating serviceLevelParams and clusterLevelParams in server-side metadata holder both when adding and removing a service/property" May 30, 2018
Contributor

@aonishuk aonishuk left a comment

@mpapirkovskyy doing this on the agent sounds hacky to me; by all logic it should be done by the server.

It is possible, but I think it would be better to keep the server and agent data in sync to avoid other issues caused by this hack.

@smolnar82
Contributor Author

retest this please

@adoroszlai
Contributor

@smolnar82 I think the fix for broken unit test needs to be merged from origin/trunk and pushed here to successfully "retest".

@asfgit

asfgit commented May 30, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/2512/
Test PASSed.

@smolnar82 smolnar82 merged commit 02da908 into apache:trunk May 30, 2018
@smolnar82
Contributor Author

@adoroszlai if that was the case I should not have even seen the error since I did not merge anything from origin/trunk during my patches to avoid force pushing

@smolnar82 smolnar82 deleted the AMBARI-23941 branch May 30, 2018 15:05
8 participants