
2.x Eureka server should provide Status overrides #441

Closed
qiangdavidliu opened this issue Feb 26, 2015 · 2 comments

@qiangdavidliu
Contributor

Concept

It is possible to override Eureka InstanceInfo information by issuing a REST API call to any of the Eureka write cluster nodes. The override data will be kept in a separate registry, and the override information will be applied on the client interest stream.

public interface EurekaOverrideRegistry {
    Observable<Void> registerOverride(InstanceInfoOverride override);
    Observable<Void> removeOverride(String instanceId);

    // Stream of override change notifications matching the given interest,
    // applied by the server on the client interest stream.
    Observable<ChangeNotification<InstanceInfoOverride>> forInterest(Interest interest);
}

public class InstanceInfoOverride {
    private Source source;               // node that accepted the override operation
    private final String instanceId;     // id of the InstanceInfo being overridden
    private final Set<Delta> overrides;  // attribute-level deltas to apply
    ...
}
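
As a usage illustration, registering and later lifting a status override through this interface could look roughly as follows. This is a minimal sketch only: the statusOverride(...) factory, the overrideRegistry instance and the instance id are hypothetical, since the InstanceInfoOverride body above is elided.

// Hypothetical usage of the proposed registry; statusOverride(...) is an assumed
// convenience factory building a single-delta (status) override.
InstanceInfoOverride override = InstanceInfoOverride.statusOverride(
        "some-instance-id", InstanceInfo.Status.OUT_OF_SERVICE);

overrideRegistry.registerOverride(override).subscribe(
        next -> { /* Observable<Void> emits nothing */ },
        error -> System.err.println("override registration failed: " + error)
);

// Later, lift the override again; the client interest stream reverts to the
// original InstanceInfo status.
overrideRegistry.removeOverride("some-instance-id").subscribe();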

It is possible to override any InstanceInfo attribute; however, this feature is provided primarily for status overrides.

The override information is replicated within the cluster in the same way regular InstanceInfo data are replicated. As a client will usually resolve a DNS entry when making a REST call, it is very likely that each override update (create/remove) will end up on a different write cluster node. This leaves a lot of room for race conditions.

Conflict resolution

Because there is no direct, permanent association between a client and a server for the duration of an override, the sourced/multi-copy instance info holder approach cannot be applied here. Instead, a single copy of the override data is maintained in the registry, with the following update semantics (a code sketch of these rules follows the lists below):

Local EurekaOverrideRegistry.registerOverride:

  • if there is no entry, add a new one and replicate it to the other nodes in the write cluster
  • if there is an entry, and it is different, swap it with the current one and replicate it
  • if there is an entry identical to the new one, ignore it

Local EurekaOverrideRegistry.removeOverride:

  • if there is no entry, replicate the delete operation to remote nodes
  • if there is an entry, remove it, and replicate the delete operation to remote nodes

Replicated EurekaOverrideRegistry.registerOverride:

  • if there is no entry, add a new one
  • if there is an entry, and it is different, swap it
  • if there is an entry identical to the new one, ignore it

Replicated EurekaOverrideRegistry.removeOverride:

  • if there is no entry, ignore it
  • if there is an entry, remove it, and if entry source is this server, replicate the delete operation to remote nodes
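
A minimal sketch of these rules, assuming a plain in-memory map keyed by instanceId and a hypothetical ReplicationPeers channel for forwarding operations to the other write nodes (all names below are illustrative, not part of the proposal):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: locally accepted overrides are assumed to be stamped with
// this server as their Source before being stored, so that the "entry source is
// this server" check on replicated removes can work.
public class OverrideConflictResolver {

    private final ConcurrentMap<String, InstanceInfoOverride> overrides = new ConcurrentHashMap<>();
    private final Source self;              // identity of this write server
    private final ReplicationPeers peers;   // hypothetical replication channel

    public OverrideConflictResolver(Source self, ReplicationPeers peers) {
        this.self = self;
        this.peers = peers;
    }

    // Local registerOverride: add or swap the entry and replicate it; an identical entry is ignored.
    public void registerLocal(InstanceInfoOverride override) {
        InstanceInfoOverride previous = overrides.put(override.getInstanceId(), override);
        if (!override.equals(previous)) {
            peers.replicateRegister(override);
        }
    }

    // Local removeOverride: remove the entry if present and always replicate the delete.
    public void removeLocal(String instanceId) {
        overrides.remove(instanceId);
        peers.replicateRemove(instanceId);
    }

    // Replicated registerOverride: add or swap the entry, never replicate further.
    public void registerReplicated(InstanceInfoOverride override) {
        overrides.put(override.getInstanceId(), override);
    }

    // Replicated removeOverride: remove the entry if present; replicate the delete further
    // only if the removed entry was sourced by this server (it handled the original registration).
    public void removeReplicated(String instanceId) {
        InstanceInfoOverride removed = overrides.remove(instanceId);
        if (removed != null && self.equals(removed.getSource())) {
            peers.replicateRemove(instanceId);
        }
    }
}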

Example 1: Override register/remove on the same server

  • local registration is followed by data replication
  • local removal removes the local entry and forwards the delete operation to the other nodes in the cluster
  • remote endpoints remove their entries, and since the source is different from their own, no further replication is done

Example 2: Override register/remove on two different nodes, with full replication between the two events

  • local registration is followed by data replication
  • local removal removes the local entry and forwards the delete operation to the other nodes in the cluster
  • remote endpoints remove their entries; the one that handled the original registration replicates the remove operation further
  • all nodes receive the same remove request again, but as there is no associated entry anymore, it is ignored

Example 3: Override register/remove on two different nodes, before replication happens between the two events

  • local registration is followed by data replication
  • when the other node receives removeOverride, there is no associated entry yet (the registration was not replicated yet); the remove operation is replicated further
  • next, the registerOverride replication reaches the remote nodes
  • remote endpoints that received the register before the remove will drop the entry
  • remote endpoints that received the register after the remove will end up with an entry in the registry
  • the remote endpoint that received the original removeOverride will remove its entry and send another removeOverride to its peers
  • remote endpoints that still hold a copy of the data will remove it

This approach solves simple cases of race conditions, but not all of them. For example, if multiple override requests are issued one after another with different content, each server may be left with a different view than its peers. To solve this problem, we would have to introduce operation versioning, require a client to generate unique, monotonically increasing version values for subsequent registerOverride operations, and use the proper version during removeOverride (see the sketch below). If no version is provided by the client, we would fall back to the previously described behavior.
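
A sketch of how such client-supplied versions might be applied during conflict resolution. This is illustrative only: InstanceInfoOverride is assumed to carry an optional client-supplied getVersion() accessor, which is not part of the class shown earlier, and a null version means the client supplied none.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch of version-aware conflict resolution: higher versions win,
// equal or lower versions are ignored, unversioned operations keep the behavior above.
public class VersionedOverrideResolver {

    private final ConcurrentMap<String, InstanceInfoOverride> overrides = new ConcurrentHashMap<>();

    // Returns true if the incoming override was accepted (and should be replicated).
    public boolean applyRegister(InstanceInfoOverride incoming) {
        InstanceInfoOverride current = overrides.get(incoming.getInstanceId());
        if (current == null) {
            overrides.put(incoming.getInstanceId(), incoming);
            return true;
        }
        Long newVersion = incoming.getVersion();   // hypothetical accessor
        Long oldVersion = current.getVersion();
        if (newVersion == null || oldVersion == null) {
            // Unversioned operation: fall back to the content-based rules above.
            if (incoming.equals(current)) {
                return false;
            }
            overrides.put(incoming.getInstanceId(), incoming);
            return true;
        }
        if (newVersion > oldVersion) {
            overrides.put(incoming.getInstanceId(), incoming);
            return true;
        }
        return false;   // stale or duplicate registration, ignore it
    }

    // Returns true if the delete should be applied and replicated.
    public boolean applyRemove(String instanceId, Long version) {
        InstanceInfoOverride current = overrides.get(instanceId);
        if (current == null) {
            return true;   // nothing to remove locally, but still forward the delete
        }
        Long oldVersion = current.getVersion();
        if (version == null || oldVersion == null || version >= oldVersion) {
            overrides.remove(instanceId);
            return true;
        }
        return false;      // the remove refers to an older registration, ignore it
    }
}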

REST API

URI encoded form:

PUT /eureka2/v1/overrides/{id}?${field_name}=${field_value}

field_name is the name of any InstanceInfo field.
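
For example, a status override in this form might look like the following (the instance id is illustrative):

PUT /eureka2/v1/overrides/some-instance-id?status=OUT_OF_SERVICE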

Request with a body:

PUT /eureka2/v1/overrides/{id}
{
    "status": "OUT_OF_SERVICE"
}
@NiteshKant
Contributor

It is possible to override any InstanceInfo attribute; however, this feature is provided primarily for status overrides.

Since we know that it is useful only for status overrides, is there any value in making it generic for any override? Do we see this applying to any other attribute in the future?

The override data will be kept in a separate registry

Does this mean we will always be merging notifications from multiple Observable sources? Is the override an optional feature, or is it always available with the server?
If it is always available, why would we want to put it in a different registry and incur the merge overhead?

If no version is provided by the client, we would fall back to the previously described behavior.

It looks like we know that an operation with no version is bound to produce inconsistent data. So, should we disallow operations without versions?

It seems we are missing the disaster recovery case, where all Eureka nodes go down and all override statuses are wiped out. Contrary to the usual case, where the data is regenerated, this data will never be regenerated. Eureka 1.x handles this case by querying the AWS API to see if the owning ASG (or maybe the instance) is enabled or not. I think we should do that too, WDYT?

Having said the above, do you think the source of truth (AWS status) can be used for conflict resolution?

qiangdavidliu added 2.x and removed 2.x labels Mar 20, 2015
@qiangdavidliu
Contributor Author

Archiving as eureka2 work is going through some larger internal changes.
