Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CompatibleFieldSerializer fails with IndexOutOfBoundsException on field removal. #286

Closed
solf opened this issue Feb 17, 2015 · 13 comments
Closed

Comments

@solf
Copy link

solf commented Feb 17, 2015

Many kryo versions are affected, I've tried with 3.0.0, 2.24.0, 2.23.1 and more.

Using some rather complex data structures this can happen during deserialization with CompatibleFieldSerializer if I remove (comment out) field in Java class (in particular case it was a simple String field):

Caused by: java.lang.IndexOutOfBoundsException: Index: 8139, Size: 8139
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:43)
    at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:677)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    ... 23 more

(this is using 2.24.0 but line numbers may be off by one or two because of debug output I had to add to track it down)

Unfortunately data model is proprietary so I can share serialized data + java classes.

However I did a bunch of digging around and here's what I found -- it seems that CompatibleFieldSerializer skips de-serialization for missing fields like this: inputChunked.nextChunks();

This works fine EXCEPT when the actual reading code (which is employed if field is present) creates an additional object in MapReferenceResolver. I've added debug code that dumps what is added to ReferenceResolver (format is [id] -> [object]) and here's what happens:
when field is present (deserialisation works fine):

7126 -> []
00:00 DEBUG: [kryo] Read object reference [com.esotericsoftware.kryo.util.MapReferenceResolver] 7065: MyClass[null:null]
cached field: com.esotericsoftware.kryo.serializers.UnsafeCacheFields.UnsafeObjectField
7127 -> 0001-01-01
cached field: com.esotericsoftware.kryo.serializers.UnsafeCacheFields.UnsafeObjectField
7128 -> 1
cached field: com.esotericsoftware.kryo.serializers.UnsafeCacheFields.UnsafeObjectField
7129 -> {}
7130 -> 2015-01-27

when field is removed (deserialization crashes later on):

7126 -> []
00:00 DEBUG: [kryo] Read object reference [com.esotericsoftware.kryo.util.MapReferenceResolver] 7065: MyClass[null:null]
cached field: com.esotericsoftware.kryo.serializers.UnsafeCacheFields.UnsafeObjectField
7127 -> 0001-01-01
00:00 DEBUG: [kryo] Skip obsolete field.
cached field: com.esotericsoftware.kryo.serializers.UnsafeCacheFields.UnsafeObjectField
7128 -> {}
7129 -> 2015-01-27
7130 -> DayAmount@1a70b8

Because of the field skip, object ID 7128 (which should be '1' according to 'normal' deserialization process) is skipped and causes all subsequent IDs to shift -- which ultimately causes deserialization to fail.

These conditions don't happen every time -- not every skipped field is going to cause this failure. But it does happen in my case (not on the first skipped field instance, but eventually).

And finally I've added stack trace dump -- in case of 7128 this is the stack trace that leads to reference resolver state being modified:

java.lang.Exception
    at com.esotericsoftware.kryo.util.MapReferenceResolver.setReadObject(MapReferenceResolver.java:39)
    at com.esotericsoftware.kryo.Kryo.reference(Kryo.java:823)
    at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:731)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
    at com.esotericsoftware.kryo.serializers.CompatibleFieldSerializer.read(CompatibleFieldSerializer.java:91)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at com.esotericsoftware.kryo.serializers.CompatibleFieldSerializer.read(CompatibleFieldSerializer.java:91)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:1)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at com.esotericsoftware.kryo.serializers.CompatibleFieldSerializer.read(CompatibleFieldSerializer.java:91)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:1)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at com.esotericsoftware.kryo.serializers.CompatibleFieldSerializer.read(CompatibleFieldSerializer.java:91)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:1)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at com.esotericsoftware.kryo.serializers.CompatibleFieldSerializer.read(CompatibleFieldSerializer.java:91)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at com.esotericsoftware.kryo.serializers.CompatibleFieldSerializer.read(CompatibleFieldSerializer.java:91)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657)
<proprietary code redacted>

This problem is affecting my production code (i.e. I can't remove obsolete fields) so if there's anything else I can do to help tracking this down (without divulging proprietary info) -- do let me know please, I'll do the best I can to help.

@solf
Copy link
Author

solf commented Feb 17, 2015

Having examined the code further, I've managed to come up with a simple code to reproduce the problem.

Eclipse/Maven project can be downloaded here: https://copy.com/4Ez9nMNBujRP4ECb

Run TestKryoMain, observe that result doesn't match expectation: ddd value should be 'bbb', got: ccc

Uncomment bbb field in TestKryoData and run again for comparison.

This one is actually much worse -- it doesn't fail, but deserialized data is actually WRONG.

The serialized data for this was created with the following field configuration in TestKryoData:

    public String aaa = "aaa";
    public String bbb = "bbb";
    public String ccc = "ccc";
    public String ddd = bbb;

@NathanSweet
Copy link
Member

Please don't post a link to a zip file.

I don't see a problem with this test. Please give step by step instructions
on how to reproduce the problem using this code.
http://pastebin.com/dGpmXpQQ

On Tue, Feb 17, 2015 at 4:24 PM, solf notifications@github.com wrote:

Having examined the code further, I've managed to come up with a simple
code to reproduce the problem.

Eclipse/Maven project can be downloaded here:
https://copy.com/4Ez9nMNBujRP4ECb

Run TestKryoMain, observe that result doesn't match expectation: ddd value
should be 'bbb', got: ccc

Uncomment bbb field in TestKryoData and run again for comparison.

This one is actually much worse -- it doesn't fail, but deserialized data
is actually WRONG.

The serialized data for this was created with the following field
configuration in TestKryoData:

public String aaa = "aaa";
public String bbb = "bbb";
public String ccc = "ccc";
public String ddd = bbb;


Reply to this email directly or view it on GitHub
#286 (comment)
.

@solf
Copy link
Author

solf commented Feb 17, 2015

To easily see the problem you need a kryo-serialized file that is inside that archive.

If you want to reproduce this using the code -- uncomment ddd declaration:
public String ddd = bbb; // and obviously comment the other one
uncomment bbb declaration:
public String bbb = "bbb";
change if-statement to if (true) (so that write code executes rather than the read one).

Run it to obtain the serialized file.

Revert all the changes and run it again -- observe that ddd value isn't 'bbb' but is, in fact 'ccc'.

As an aside, code in pastebin is wrong in the sense that it says that value should be 'ddd' -- it should be 'bbb'.

P.S. I don't see an easy way to reproduce the problem with a single java run if we don't have serialized data already (as in the archive that I linked) -- because it requires different Java files structure for write & read.

P.P.S. What is the problem with zip files?

@NathanSweet
Copy link
Member

A zip file is unnecessary when it can be done in a single .java file. It's
a waste of time to download, unzip, and import a project when I could just
copy/paste.

I tested the code I posted. Eg:

  1. Set if to true, run to output binary file.
  2. Set if to false: ddd value should be 'ddd', got: ddd (correct)
  3. Uncommand bbb = "bbb": ddd value should be 'ddd', got: ddd (correct)
  4. Comment ddd = "ddd", uncomment ddd = bbb: ddd value should be 'ddd', got: ddd (correct)
  5. Set if to true, run to output binary file.
  6. Set if to false: ddd value should be 'ddd', got: bbb (correct)

I don't see an issue. Please provide repro steps similar to the above that
show a problem using the code I posted.

On Wed, Feb 18, 2015 at 12:17 AM, solf notifications@github.com wrote:

What is the problem with zip files?

To easily see the problem you need a kryo-serialized file that is inside
that archive.

If you want to reproduce this using the code -- uncomment ddd declaration:
public String ddd = bbb; // and obviously comment the other one
uncomment bbb declaration:
public String bbb = "bbb";
change if-statement to if (true) (so that write code executes rather than
the read one).

Run it to obtain the serialized file.

Revert all the changes and run it again -- observe that ddd value isn't
'bbb' but is, in fact 'ccc'.

As an aside, code in pastebin is wrong in the sense that it says that
value should be 'ddd' -- it should be 'bbb'.


Reply to this email directly or view it on GitHub
#286 (comment)
.

@solf
Copy link
Author

solf commented Feb 18, 2015

Start with this:
http://pastebin.com/e8RpFkfm

Then run this:
http://pastebin.com/bN5R0kMu

Observe that deserialization result is wrong (under other circumstances the same problem would cause deserialization to crash).

Uncomment bbb field and run again -- observe that deserialization is fine now.

This is caused by the fact that skipping field during deserialization leads to wrong indices in reference resolver (and additionally skipped object is not deserialized which will lead to other problems if it is referenced later on in the stream).

You can skip the first file if you take serialization data from the project I've posted.

@NathanSweet
Copy link
Member

Thanks, I see the issue now. Here is the problem:

Field ddd is serialized with a reference to the value stored in bbb.
The reference ordinal for bbb is serialized as 1. bbb is then removed
from the class. When deserialization happens, the bytes for the bbb field
are skipped and no reference ordinal is stored. When ccc is read, it's
value is stored as reference 1. When ddd is read, it uses reference 1
which is ccc when it should have been bbb.

If we deserialized the bytes for bbb instead of skipping them, we could
store bbb's value under the correct reference ordinal, then ddd would
find the correct value. However, this assumes that the bytes for bbb can
be deserialized. In the example above, it could because 1) it's just a
string, and 2) ddd needs the value. However, when the bytes for bbb are
encountered, how would Kryo what type to read? The bbb field has been
removed, so we don't know what the bytes represent. Even if we did, in some
scenarios you might remove an entire class and a field of that type. This
would be impossible to deserialize.

You'd think it might be possible to store null in the references so the
reference ordinal doesn't get off by one. The problem is again Kryo doesn't
know what type of data was stored for bbb.
ReferenceResolver#useReferences determines if a value uses up a reference
ordinal, but since we don't know the type, we can't know if was should use
up a reference ordinal with a null.

CompatibleFieldSerializer could store field type information or whether a
reference should be used for each field, but this would be bloating an
already bloated serialization approach. If you wanted to do this, I suggest
writing your own serializer.

I don't see a solution to the problem. I've added javadocs to
CompatibleFieldSerializer to make it clear there can be issues when
references are enabled. I suggest using TaggedFieldSerializer or
VersionFieldSerializer, both of which have better performance and avoid the
problem by disallowing removing fields. TaggedFieldSerializer allows fields
to be deprecated and renamed, so while they cannot be removed, they won't
clutter your code.

On Wed, Feb 18, 2015 at 2:18 PM, solf notifications@github.com wrote:

Start with this:
http://pastebin.com/e8RpFkfm

Then run this:
http://pastebin.com/bN5R0kMu

Observe that deserialization result is wrong (under other circumstances
the same problem would cause deserialization to crash).

Uncomment bbb field and run again -- observe that deserialization is fine
now.

This is caused by the fact that skipping field during deserialization
leads to wrong indices in reference resolver (and additionally skipped
object is not deserialized which will lead to other problems if it is
referenced later on in the stream).

You can skip the first file if you take serialization data from the
project I've posted.


Reply to this email directly or view it on GitHub
#286 (comment)
.

@solf
Copy link
Author

solf commented Feb 18, 2015

I don't think this is impossible to solve.

The first thing that comes to mind -- if there's a case of an object that needs deserialization, but there's no corresponding field (e.g. as in the case of bbb above), then store serialized object data somewhere (it should be possible since obviously it is possible to skip it). If there's later reference to it, it will also have class/type information and thus it should be possible to de-serialize stored object data at that point. Basically something like lazy de-serialization.

As to the question of reference index shift -- I can't say without digging more into kryo code, but I would imagine it should be possible to determine whether we have stored reference or stored object data at the given point? My assumption is that references are created when there's object data at that point.

How does it sound?

P.S. Field addition/removal is the original reason I've chosen kryo for my needs. If this is unsupported it removes (for me) the principal reason for using kryo.

@NathanSweet
Copy link
Member

On Wed, Feb 18, 2015 at 3:53 PM, solf notifications@github.com wrote:

I don't think this is impossible to solve.

The first thing that comes to mind -- if there's a case of an object that
needs deserialization, but there's no corresponding field (e.g. as in the
case of bbb above), then store serialized object data somewhere (it should
be possible since obviously it is possible to skip it). If there's later
reference to it, it will also have class/type information and thus it
should be possible to de-serialize stored object data at that point.
Basically something like lazy de-serialization.

​This is an interesting idea. CompatibleFieldSerializer is already pretty
inefficient though, this would make it even more so. The size of the buffer
that keeps serialized bytes would need to be configurable. Would you like
to create a PR that adds this feature?

As to the question of reference index shift -- I can't say without digging
more into kryo code, but I would imagine it should be possible to determine
whether we have stored reference or stored object data at the given point?
My assumption is that references are created when there's object data at
that point.

​It's not possible, the ReferenceResolver determines if a reference is
needed for a type. If you don't know the type, you can't know if a
reference is needed. Eg, generally primitive type wrappers don't use
references, but application code can avoid references for any types they
like.

P.S. Field addition/removal is the original reason I've chosen kryo for my

needs. If this is unsupported it removes (for me) the principal reason for
using kryo.

​Adding fields is relatively easy and has many options.​ Removing them is
what is complicated. I prefer TaggedFieldSerializer, then if I must
"remove" fields I have something like this:

@Tag(15) @Deprecated public boolean ignored1;
...
@Tag(18) @Deprecated public SomeType ignored2;
...
@Tag(23) @Deprecated public float ignored3;

CompatibleFieldSerializer has to use chunked encoding to skip bytes. This
is what I don't like because it requires a lot of allocation.

@solf
Copy link
Author

solf commented Feb 23, 2015

I'm a bit busy at the moment and couldn't get back to this yet.

Could you please elaborate though? If, as you say, it is impossible to determine whether at a given point we have a stored reference or a stored object (so e.g. we can't figure out whether reference resolver should increment an index), what would be the point of adding 'lazy deserialization' I proposed? It still wouldn't work, would it?

@NathanSweet
Copy link
Member

On Mon, Feb 23, 2015 at 3:33 PM, solf notifications@github.com wrote:

I'm a bit busy at the moment and couldn't get back to this yet.

Could you please elaborate though? If, as you say, it is impossible to
determine whether at a given point we have a stored reference or a stored
object (so e.g. we can't figure out whether reference resolver should
increment an index), what would be the point of adding 'lazy
deserialization' I proposed? It still wouldn't work, would it?

Ah you're right. We could store bytes that we skip, but we won't know if
the bytes start with a reference ordinal because we don't know they type of
the field because the field is gone. This means later, when we encounter a
field that is a reference to the skipped bytes, we won't know we should use
those bytes.

@solf
Copy link
Author

solf commented Mar 6, 2015

So I've spent more time on this.
I also took a look at other serialization systems (e.g. XStream) and they seem to often have very similar problems -- because it seems everyone's reluctant to include full class information in the serialized data.

So I came with an idea to stuff obsolete fields declaration into a static inner class. That way they don't eat any memory in original class instances, don't clutter original class (aside from the inner class itself), and are not serialized when doing serialization. And since field information is still available, deserialization can actually work.

The patch for this is very simple and is localized almost entirely to CompatibleFieldSerializer -- I only had to add a single field to CachedField class.

Patch for 2.4.0 can be found here: https://copy.com/GKX9bFipYQytSTbM

Please consider adding something similar to the kryo distribution.

The performance impact of this is basically nil when feature is not used. When obsolete data is actually present during de-serialization -- then there's additional object instantiation for each instance with obsolete fields.

There are a couple of obvious improvements to be considered:

  • Pass reference to instance of obsolete fields class to the original class somehow. So that e.g. original class can migrate obsoleted fields into new structure(s).
  • If not doing the above, probably there's no actual need to create new instances of 'obsolete data holder' classes for each instance of actual class.

For reference, here's how data class in test code should look like with this support:

    public static class TestKryoData
    {
        public String aaa = "aaa";
//      public String bbb = "bbb";
        public String ccc = "ccc";
//      public String ddd = bbb;
        public String ddd = "ddd";

        private static class ObsoleteKryoFields
        {
            public String bbb;
        }
    }

@diaimm
Copy link

diaimm commented Jun 24, 2016

Hi, all.

'Cause I have same issue when we are using Kryo CompatibleFieldSerializer to handle the serialization between memcached data and java bean, I tried to make some patch for it.

I've tried many things and it seems I've made one working code.
Is there any one try?


This class is extending CompatibleFieldSerializer and overriding read().
Basically the logic is following CompatibleFieldSerializer#read(), but a little bit refactored and fixing the bug implicit bug.(raing IndexOutOfBoundsException).

Even thought I've fixed the bug about IndexOutOfBoundsException in CompatibleFieldSerializer#read(), I've found that mapping the value from Input to Field fails because of wrong chunk reading calculation. And DummyCachedField is the trick to cover such a failure.

Simple case it was but, different from the javadoc on CompatibleFieldSerializer, this code seems to be working with Kryo.setReference(true) option.

I hope this class can be helpful.


public static class FieldMappingCompatibleSerializer extends CompatibleFieldSerializer {
public FieldMappingCompatibleSerializer(Kryo kryo, Class<?> type) {
super(kryo, type);
}

@AllArgsConstructor
private static class DummyCachedField extends CachedField {
    private Kryo kryo;
    @Override
    public void write(Output output, Object object) {
    }

    @Override
    public void read(Input input, Object object) {
        kryo.readClassAndObject(input);
    }

    @Override
    public void copy(Object original, Object copy) {
    }
};

public T read(Kryo kryo, Input input, Class<T> type) {
    T object = create(kryo, input, type);
    kryo.reference(object);
    ObjectMap context = kryo.getGraphContext();
    CachedField[] fieldsOfDTOToMap = getCachedFields(kryo, input, context);
    InputChunked inputChunked = new InputChunked(input, 1024);
    boolean hasGenerics = getGenerics() != null;
    for (CachedField cachedField : fieldsOfDTOToMap) {
        if (hasGenerics && cachedField.getField() != null) {
            cachedField = getField(cachedField.getField().getName());
        }

        cachedField.read(inputChunked, object);
        inputChunked.nextChunks();
    }
    return object;
}

private CachedField[] getCachedFields(Kryo kryo, Input input, ObjectMap context) {
    CachedField[] fields = (CachedField[]) context.get(this);
    if (fields != null) {
        return fields;
    }

    int cachedDTOFieldsLength = input.readVarInt(true);
    if (TRACE) {
        trace("kryo", "Read " + cachedDTOFieldsLength + " field names.");
    }

    String[] fieldNamesFromInput = getFieldNamesFromInput(input, cachedDTOFieldsLength);
    List<CachedField> newFieldsToCacheBuffer = new ArrayList<CachedField>();
    Map<String, CachedField> allFieldsOfEntity = getAllFieldsOfEntityMap();
    for (int i = 0, n = fieldNamesFromInput.length; i < n; i++) {
        newFieldsToCacheBuffer.add(findMatchedField(kryo, fieldNamesFromInput[i], allFieldsOfEntity));
    }

    CachedField[] result = newFieldsToCacheBuffer.toArray(new CachedField[0]);
    context.put(this, result);

    return result;
}

private Map<String, CachedField> getAllFieldsOfEntityMap() {
    Map<String, CachedField> allFieldsOfEntity = new HashMap<String, CachedField>();
    for (CachedField field : getFields()) {
        allFieldsOfEntity.put(field.getField().getName(), field);
    }
    return allFieldsOfEntity;
}

private CachedField findMatchedField(Kryo kryo, String findingField, Map<String, CachedField> allFieldsOfEntity) {
    if (!allFieldsOfEntity.containsKey(findingField)) {
        if (TRACE) {
            trace("kryo", "Ignore obsolete field: " + findingField);
        }

        return new DummyCachedField(kryo);
    }

    return allFieldsOfEntity.get(findingField);
}

private String[] getFieldNamesFromInput(Input input, int length) {
    List<String> fieldNamesBuffer = Lists.newArrayList();
    for (int i = 0; i < length; i++) {
        fieldNamesBuffer.add(input.readString());
    }

    return fieldNamesBuffer.toArray(new String[0]);
}

}


Kryo setting


Kryo kryo = new Kryo();
kryo.setReferenceResolver(new MapReferenceResolver());
kryo.setDefaultSerializer(FieldMappingCompatibleSerializer.class);
kryo.addDefaultSerializer(Enum.class, EnumNameSerializer.class);
kryo.setInstantiatorStrategy(new Kryo.DefaultInstantiatorStrategy(new StdInstantiatorStrategy()));

Thanks.

@magro
Copy link
Collaborator

magro commented Jun 28, 2016

Further discussion in #291

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants