Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve field removal in CompatibleFieldSerializer #291

Closed
solf opened this issue Mar 10, 2015 · 17 comments
Closed

Improve field removal in CompatibleFieldSerializer #291

solf opened this issue Mar 10, 2015 · 17 comments

Comments

@solf
Copy link

solf commented Mar 10, 2015

This is an extension of #286

Please improve removed field handling in CompatibleFieldSerializer so that it is actually possible to remove fields (rather than having to keep them indefinitely for backward compatibility).

Consider the patch I've come up with: https://copy.com/akbTt9twsSr2CNqU
And second patch to force exception on missing fields (avoids risk on incorrect de-serialization; must be applied after patch above): https://copy.com/hk1YcLDhxqfMaylJ

This does two things:

  • Obsolete fields can be removed into inner(nested) class (ObsoleteKryoFields) that ought to be static + private. This way these obsolete fields don't consume any memory when instantiating original class and don't clutter its code (other than the inner class itself).
  • If there are obsolete fields during de-serialization, then if class declares 'private transient ObsoleteKryoFields obsoleteKryoFields' field, then its value will be set to the instance holding obsolete fields (thus an easy support for migrating obsolete values into their new 'homes'). This also works across superclasses (hence the requirement for the field to be 'private').

If there are no actual obsolete fields in the data being de-serialized, then the performance hit is minimal.

For testing/demonstration:

  • Start with http://pastebin.com/e8RpFkfm -- create serialized data on disk.
  • Test with something like this: http://pastebin.com/9eWD0vha You can also remove/comment out ObsoleteKryoFields inner class(es) and watch how de-serialization fails when fields are removed without this additional support.
@NathanSweet
Copy link
Member

Can you submit the patch as a PR so it is more easily considered and merged?

On Tue, Mar 10, 2015 at 3:45 PM, solf notifications@github.com wrote:

This is an extension of #286
#286

Please improve removed field handling in CompatibleFieldSerializer so that
it is actually possible to remove fields (rather than having to keep them
indefinitely for backward compatibility).

Consider the patch I've come up with: https://copy.com/akbTt9twsSr2CNqU

This does two things:

  • Obsolete fields can be removed into inner(nested) class
    (ObsoleteKryoFields) that ought to be static + private. This way these
    obsolete fields don't consume any memory when instantiating original class
    and don't clutter its code (other than the inner class itself).
  • If there are obsolete fields during de-serialization, then if class
    declares 'private transient ObsoleteKryoFields obsoleteKryoFields' field,
    then its value will be set to the instance holding obsolete fields (thus an
    easy support for migrating obsolete values into their new 'homes'). This
    also works across superclasses (hence the requirement for the field to be
    'private').

If there are no actual obsolete fields in the data being de-serialized,
then the performance hit is minimal.

For testing/demonstration:

  • Start with http://pastebin.com/e8RpFkfm -- create serialized data on
    disk.
  • Test with something like this: http://pastebin.com/9eWD0vha You can
    also remove/comment out ObsoleteKryoFields inner class(es) and watch how
    de-serialization fails when fields are removed without this additional
    support.


Reply to this email directly or view it on GitHub
#291.

@solf
Copy link
Author

solf commented Mar 11, 2015

Not easily, no. I'm not familiar with the term & what it entails.

The diff I've provided is pretty trivial (basically CompatibleFieldSerializer is changed + one field in CachedField). If this is truly a problem for some reason, I'll see if I can do PR (whatever that means).

Oh, and I think I forgot to mention -- this is patch for 2.4.0 release.

@magro
Copy link
Collaborator

magro commented Mar 12, 2015

PR stands for pull request, you can read about this here:
https://help.github.com/articles/creating-a-pull-request/

Cheers,
Martin
Am 11.03.2015 13:33 schrieb "solf" notifications@github.com:

Not easily, no. I'm not familiar with the term & what it entails.

The diff I've provided is pretty trivial (basically
CompatibleFieldSerializer is changed + one field in CachedField). If this
is truly a problem for some reason, I'll see if I can do PR (whatever that
means).

Oh, and I think I forgot to mention -- this is patch for 2.4.0 release.


Reply to this email directly or view it on GitHub
#291 (comment)
.

@solf
Copy link
Author

solf commented Mar 13, 2015

Well, I tried to create PR today, but I only have very limited GIT knowledge and trying to guess what I'm supposed to do, this is what I ended up with trying to push the branch I've created locally:
Can't connect to any URI: https://github.com/EsotericSoftware/kryo (https://github.com/EsotericSoftware/kryo: git-receive-pack not permitted)

Am I supposed to first create the branch using the web interface or what?

@magro
Copy link
Collaborator

magro commented Mar 13, 2015

You have to fork kryo and add your fork as remote to your git repo. This
should help: https://help.github.com/articles/fork-a-repo/
Am 13.03.2015 16:27 schrieb "solf" notifications@github.com:

Well, I tried to create PR today, but I only have very limited GIT
knowledge and trying to guess what I'm supposed to do, this is what I ended
up with trying to push the branch I've created locally:
Can't connect to any URI: https://github.com/EsotericSoftware/kryo (
https://github.com/EsotericSoftware/kryo: git-receive-pack not permitted)

Am I supposed to first create the branch using the web interface or what?


Reply to this email directly or view it on GitHub
#291 (comment)
.

@solf
Copy link
Author

solf commented Mar 16, 2015

Thanks for the pointers.

Btw, I've added another patch (on top of the first patch) -- it is intended to throw exception on missing field definitions -- this is because of otherwise there's real risk of incorrect de-serialization: https://copy.com/hk1YcLDhxqfMaylJ

On the subject of PR -- I could spent a couple more hours and hopefully figure it out... but would there be any point to it? Applying both patches linked here takes about 30 seconds (I checked :P) so I'm not sure what would I achieve by spending time on PR.

@jeffrey-aguilera
Copy link

Is there a timeline for resolving this issue of removed fields with CompatibleFieldSerializer when references are enabled (the default configuration)?

@solf
Copy link
Author

solf commented Dec 9, 2015

Based on discussion in #286 it doesn't like the devs see at as an actual problem for some reason.

If you are interested, you can try the patches I've created above and perhaps make a PR out of them -- perhaps that'll get the things moving. For myself, I've just patched the version I'm using and I don't much care about arguing with devs whether this needs to be integrated :)

P.S. In case it is relevant -- I grant the rights to anyone to use the patches I've proposed above for any and all purpose with no strings attached. No any kind of warranty though.

@jeffrey-aguilera
Copy link

I tried to get around this issue by disabling references, only to be met with StackOverflowException. Basically, CompatibleFieldSerializer is of little value. It is much slower than the other serializers, so its only use-case is to maintain some form of compatibility across schema changes. Yet, it does not work across schema changes due to the references issue. I'm thinking Avro is a better approach for me.

@diaimm
Copy link

diaimm commented Jun 27, 2016

Hi, all.

'Cause I have same issue when we are using Kryo CompatibleFieldSerializer to handle the serialization between memcached data and java bean, I tried to make some patch for it.

I've tried many things and it seems I've made one working code.
Is there any one try?


This class is extending CompatibleFieldSerializer and overriding read().
Basically the logic is following CompatibleFieldSerializer#read(), but a little bit refactored and fixing the bug implicit bug.(raing IndexOutOfBoundsException).

Even thought I've fixed the bug about IndexOutOfBoundsException in CompatibleFieldSerializer#read(), I've found that mapping the value from Input to Field fails because of wrong chunk reading calculation. And DummyCachedField is the trick to cover such a failure.

Simple case it was but, different from the javadoc on CompatibleFieldSerializer, this code seems to be working with Kryo.setReference(true) option.

I hope this class can be helpful.


public static class FieldMappingCompatibleSerializer extends CompatibleFieldSerializer {
public FieldMappingCompatibleSerializer(Kryo kryo, Class<?> type) {
super(kryo, type);
}

@AllArgsConstructor
private static class DummyCachedField extends CachedField {
    private Kryo kryo;
    @Override
    public void write(Output output, Object object) {
    }

    @Override
    public void read(Input input, Object object) {
        kryo.readClassAndObject(input);
    }

    @Override
    public void copy(Object original, Object copy) {
    }
};

public T read(Kryo kryo, Input input, Class<T> type) {
    T object = create(kryo, input, type);
    kryo.reference(object);
    ObjectMap context = kryo.getGraphContext();
    CachedField[] fieldsOfDTOToMap = getCachedFields(kryo, input, context);
    InputChunked inputChunked = new InputChunked(input, 1024);
    boolean hasGenerics = getGenerics() != null;
    for (CachedField cachedField : fieldsOfDTOToMap) {
        if (hasGenerics && cachedField.getField() != null) {
            cachedField = getField(cachedField.getField().getName());
        }

        cachedField.read(inputChunked, object);
        inputChunked.nextChunks();
    }
    return object;
}

private CachedField[] getCachedFields(Kryo kryo, Input input, ObjectMap context) {
    CachedField[] fields = (CachedField[]) context.get(this);
    if (fields != null) {
        return fields;
    }

    int cachedDTOFieldsLength = input.readVarInt(true);
    if (TRACE) {
        trace("kryo", "Read " + cachedDTOFieldsLength + " field names.");
    }

    String[] fieldNamesFromInput = getFieldNamesFromInput(input, cachedDTOFieldsLength);
    List<CachedField> newFieldsToCacheBuffer = new ArrayList<CachedField>();
    Map<String, CachedField> allFieldsOfEntity = getAllFieldsOfEntityMap();
    for (int i = 0, n = fieldNamesFromInput.length; i < n; i++) {
        newFieldsToCacheBuffer.add(findMatchedField(kryo, fieldNamesFromInput[i], allFieldsOfEntity));
    }

    CachedField[] result = newFieldsToCacheBuffer.toArray(new CachedField[0]);
    context.put(this, result);

    return result;
}

private Map<String, CachedField> getAllFieldsOfEntityMap() {
    Map<String, CachedField> allFieldsOfEntity = new HashMap<String, CachedField>();
    for (CachedField field : getFields()) {
        allFieldsOfEntity.put(field.getField().getName(), field);
    }
    return allFieldsOfEntity;
}

private CachedField findMatchedField(Kryo kryo, String findingField, Map<String, CachedField> allFieldsOfEntity) {
    if (!allFieldsOfEntity.containsKey(findingField)) {
        if (TRACE) {
            trace("kryo", "Ignore obsolete field: " + findingField);
        }

        return new DummyCachedField(kryo);
    }

    return allFieldsOfEntity.get(findingField);
}

private String[] getFieldNamesFromInput(Input input, int length) {
    List<String> fieldNamesBuffer = Lists.newArrayList();
    for (int i = 0; i < length; i++) {
        fieldNamesBuffer.add(input.readString());
    }

    return fieldNamesBuffer.toArray(new String[0]);
}

}


Kryo setting


Kryo kryo = new Kryo();
kryo.setReferenceResolver(new MapReferenceResolver());
kryo.setDefaultSerializer(FieldMappingCompatibleSerializer.class);
kryo.addDefaultSerializer(Enum.class, EnumNameSerializer.class);
kryo.setInstantiatorStrategy(new Kryo.DefaultInstantiatorStrategy(new StdInstantiatorStrategy()));

Thanks.

@magro
Copy link
Collaborator

magro commented Jun 28, 2016

@diaimm Please submit it as a pull request so that we can use the usual tools to discuss it. A test case would also be great.

@diaimm
Copy link

diaimm commented Jun 28, 2016

OK, I will. Thanks for your response.

  • And, I've sent a pull-request for my patch (including unit test).

https://github.com/EsotericSoftware/kryo/pull/437/files

Thanks.

@cuihz
Copy link

cuihz commented Jun 29, 2016

@diaimm I tried your FieldMappingCompatibleSerializer. I create a class with two fields x and y. They are all int array(int[]). Then, I create an instance of the class called p, set x a value and setY(p.getX()). I write p to a file by kryo. Then I removed field x and read the file by kryo , I get an Exception.

Exception in thread "main" com.esotericsoftware.kryo.KryoException: Buffer underflow.
at com.esotericsoftware.kryo.io.Input.require(Input.java:199)
at com.esotericsoftware.kryo.io.Input.readAscii_slow(Input.java:616)
at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:594)
at com.esotericsoftware.kryo.io.Input.readString(Input.java:472)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:150)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)

@diaimm
Copy link

diaimm commented Jun 29, 2016

OK, cuihz, I'll try to figure out the cause and fix it. Thanks for your report.


Hi, cuih. Actually I'm a very newbie about Kryo who has met this problem and tried to solve it from just few days ago.

Anyway, I've looked into the codes of Kryo, and there were some classes I really cannot sure of what they are really doing, and they are Input and Output.

But in my guess, they are controlling InputStream/OutputStream to serialize/deserialize the classes(instances).
And, Input is writing the bytes and the bytes include indices of the fields and sometimes the types of fields. Especially, when Input handles primitive related fields, it seems that it is not storing the type of the field. And when Kryo deserializes the data, it gets the type of the value from the CachedField.

This means, too awful to me, that during deserialization, if old primitive related fields are missed, there is no way to calculate the bytes.

Is there anyone who can make me sure of my assumption?

@NathanSweet
Copy link
Member

NathanSweet commented Jun 9, 2018

To summarize: CompatibleFieldSerializer fails to deserialize if field B is a reference to field A, and then field A is removed. The reason why: field A bytes are skipped but not interpreted because Kryo doesn't know what type the bytes represent.

@diaimm's PR doesn't seem to work and he disappeared.

@solf's copy.com links unfortunately no longer exist. It's too bad we didn't have time to work on this sooner, and the code wasn't submitted via Git. The approach is interesting, though a little odd and unfortunately manual.

An automatic approach would be best. Since CompatibleFieldSerializer is already sacrificing size for compatibility, we could write the concrete type before every field. This allows us to deserialize a field's data even if that field is removed. We can't write the types along with the names the first time the type is encountered because we need to support polymorphism. Writing the field value types could also allow chunked encoding to be optional. When disabled, performance is obviously much better and fields can be added/removed, but classes cannot be removed.

@solf
Copy link
Author

solf commented Jun 9, 2018

Not sure if this is still relevant after your commit -- the commit is big, can't say off-hand what it does -- but here's the latest version of my patch: https://yadi.sk/d/-2dGdQYF3XZwX8

It is still against kryo-2.24.0 source code: https://github.com/EsotericSoftware/kryo/archive/kryo-2.24.0.zip

Can be applied by: patch -p0 < kryo.diff
(from the directory where pom.xml is)

In addition to original fixes it also contains CompatibleFieldSerializerWithSameNameFieldSupport which has 'limited support for same-name fields (specifically only top-most in class hierarchy field is used)'

I can't really speak to the quality/performance of this code, I was mostly focused on 'get it to work now so I can deploy/fix code in the prod'.

@magro
Copy link
Collaborator

magro commented Jun 9, 2018

@solf support for same names has been added with 5a7b7c5 already btw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

6 participants