treat some deserialization errors as transient in remoting #20641

Closed
patriknw opened this Issue May 27, 2016 · 6 comments

Projects

None yet

2 participants

@patriknw
Member

EndpointWriter is only treating NotSerializableException as transient error (message dropped but association remains). toBinary of the serializer may throw any kind of exception. NotSerializableException is only used for the lookup of the serializer.

For deserialization I can't see that any exception is caught for deserialize. That might be related to #13495 but I don't understand that comment

This is doable only if the lower layers are "reliable" in that sense that framing synch can be assumed to be preserved if a message is passed to the upper layers.

Perhaps that is for system messages, but as far as I can see we can treat them differently by checking msg.reliableDeliveryEnabled.

@patriknw patriknw added this to the 2.4.x milestone May 27, 2016
@drewhk
Member
drewhk commented May 27, 2016

That comment means that in case of deserialization errors it is safer to drop the association, since if the framing layer is corrupted then every subsequent message will fail deserialization, so it is safer to resynch (by opening a new connection).

@drewhk
Member
drewhk commented May 27, 2016

I don't think we should tread serialization errors as transient, these are serious errors, why should we tolerate them?

@drewhk
Member
drewhk commented May 27, 2016

at least I want to understand the use case

@patriknw
Member

For rolling upgrade scenarios we might have to be more tolerant to serialization errors and just drop/log them.

Dropping the association will have the same end user result, message not delivered, but it's much more costly and can result in all kind of side effects such as unreachability and such.
Why would the association be corrupt if user serialization of some messages fail?

By framing layer, do you mean that TCP in old remoting can become corrupted, or what layer do you mean?

@patriknw
Member

@drewhk convinced me that we should not change this for the old remoting. Artery will be like this, though. Thanks for clarifying

@patriknw patriknw closed this May 27, 2016
@patriknw patriknw modified the milestone: will not fix, 2.4.x May 27, 2016
@patriknw patriknw changed the title from treat all serialization errors as transient in remoting to treat some deserialization errors as transient in remoting Dec 16, 2016
@patriknw
Member

New take on this is to catch only a specific exception from deserialization and not others that could be because of byte corruptions.

@patriknw patriknw reopened this Dec 16, 2016
@patriknw patriknw modified the milestone: 2.4.15, will not fix Dec 16, 2016
@patriknw patriknw removed the bug label Dec 16, 2016
@patriknw patriknw self-assigned this Dec 16, 2016
@patriknw patriknw added a commit that referenced this issue Dec 16, 2016
@patriknw patriknw catch NotSerializableException from deserialization, #20641
* to be able to introduce new messages and still support rolling upgrades,
  i.e. a cluster of mixed versions
* note that it's only catching NotSerializableException, which we already
  use for unknown serializer ids and class manifests
* note that it is not catching for system messages, since that could result
  in infinite resending
06f5c53
@patriknw patriknw added a commit that referenced this issue Dec 16, 2016
@patriknw patriknw catch NotSerializableException from deserialization, #20641
* to be able to introduce new messages and still support rolling upgrades,
  i.e. a cluster of mixed versions
* note that it's only catching NotSerializableException, which we already
  use for unknown serializer ids and class manifests
* note that it is not catching for system messages, since that could result
  in infinite resending

(cherry picked from commit f486b1d)
41c3109
@patriknw patriknw added a commit that referenced this issue Dec 16, 2016
@patriknw patriknw catch NotSerializableException from deserialization, #20641
* to be able to introduce new messages and still support rolling upgrades,
  i.e. a cluster of mixed versions
* note that it's only catching NotSerializableException, which we already
  use for unknown serializer ids and class manifests
* note that it is not catching for system messages, since that could result
  in infinite resending
e494ec2
@patriknw patriknw added a commit that referenced this issue Dec 16, 2016
@patriknw patriknw catch NotSerializableException from deserialization, #20641
* to be able to introduce new messages and still support rolling upgrades,
  i.e. a cluster of mixed versions
* note that it's only catching NotSerializableException, which we already
  use for unknown serializer ids and class manifests
* note that it is not catching for system messages, since that could result
  in infinite resending

(cherry picked from commit e494ec2)
78eec6d
@patriknw patriknw closed this Dec 16, 2016
@jrudolph jrudolph added a commit to jrudolph/akka that referenced this issue Jan 2, 2017
@patriknw @jrudolph patriknw + jrudolph catch NotSerializableException from deserialization, #20641
* to be able to introduce new messages and still support rolling upgrades,
  i.e. a cluster of mixed versions
* note that it's only catching NotSerializableException, which we already
  use for unknown serializer ids and class manifests
* note that it is not catching for system messages, since that could result
  in infinite resending
168092d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment