Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default value is not supported for schema evolution #93

Closed
oleksmir opened this issue Jun 12, 2019 · 9 comments
Closed

Default value is not supported for schema evolution #93

oleksmir opened this issue Jun 12, 2019 · 9 comments
Labels

Comments

@oleksmir
Copy link

oleksmir commented Jun 12, 2019

I am trying deserializing old messages using new schema, but default values seems not working.
Old schema:

@namespace("com.github.oleksmir.goavro")
protocol User {
  record User {
    string id = "test_id";
    int    age = 100;
  }
}
{
  "type" : "record",
  "name" : "User",
  "namespace" : "com.github.oleksmir.goavro",
  "fields" : [ {
    "name" : "id",
    "type" : "string",
    "default" : "test_id"
  }, {
    "name" : "age",
    "type" : "int",
    "default" : 100
  } ]
}

New schema:

@namespace("com.github.oleksmir.goavro")
protocol User {
  record User {
    string id = "test_id";
    int    age = 100;
    string firstName = "test_first_name";
    string lastName = "test_last_name";
    string info = "test_info";
  }
}
{
  "type" : "record",
  "name" : "User",
  "namespace" : "com.github.oleksmir.goavro",
  "fields" : [ {
    "name" : "id",
    "type" : "string",
    "default" : "test_id"
  }, {
    "name" : "age",
    "type" : "int",
    "default" : 100
  }, {
    "name" : "firstName",
    "type" : "string",
    "default" : "test_first_name"
  }, {
    "name" : "lastName",
    "type" : "string",
    "default" : "test_last_name"
  }, {
    "name" : "info",
    "type" : "string",
    "default" : "test_info"
  } ]
}

Test:

func TestDeserializeUser(t *testing.T) {
	oldUser := userold.NewUser()

	var buf bytes.Buffer
	err := oldUser.Serialize(&buf)
	assert.Nil(t, err)

	newPort, err := usernew.DeserializeUser(bytes.NewReader(buf.Bytes()))
	if err != nil && err != io.EOF {
		t.Fatalf("failed to deserialize port with new schema: %v", err)
	}

	assert.Equal(t, "test_first_name", newPort.FirstName)
	assert.Equal(t, "test_last_name", newPort.LastName)
	assert.Equal(t, "test_info", newPort.Info)
}

Test result:

Not equal: 
expected: "test_first_name"
actual  : ""

NewUser (with new schema) is created inside DeserializeUser function, that is why missed field receive default golang values instead of default schema values:

func DeserializeUser(r io.Reader) (*User, error) {
	t := NewUser()

	deser, err := compiler.CompileSchemaBytes([]byte(t.Schema()), []byte(t.Schema()))
	if err != nil {
		return nil, err
	}

	err = vm.Eval(r, deser, t)
	return t, err
}
@oleksmir
Copy link
Author

Related issue in goavro package linkedin/goavro#166 with thoughts why schema evolution is broken in Avro 1.x.

@actgardner
Copy link
Owner

This isn't related to karrick's discussion of Aveo being an untagged format, it seems like a VM bug. I'll take a look in the next few days, thanks for reporting it.

@actgardner
Copy link
Owner

actgardner commented Jun 18, 2019

Hey oleksmir, so it turns out this kind of related to Karrick's post after all. Because you're dealing with raw records and not container files, you need to provide the writer schema when deserializing data. This code works as intended:

  package avro
                                                                                                                                                                                                                                                                                                                                                                                                                      
  import (                                                                                                                                                                                                                    
    "bytes"                                                                                                                                                                                                             
    "testing"                                                                                                                                                                                                                                                                                                                                                                                                                               
    "github.com/actgardner/gogen-avro/compiler"                                                                                                                                                                         
    evolution "github.com/actgardner/gogen-avro/test/defaults/evolution"                                                                                                                                                  
    "github.com/actgardner/gogen-avro/vm"                                                                                                                                                                                                                                                                                                                                                                                                   
    "github.com/stretchr/testify/assert"                                                                                                                                                                        
  )                           
                                                                                                                                                                                                                                                                                                                                                                                                            
  func TestDeserializeUser(t *testing.T) {
    // Make an old User struct and serialize it                                                                                                                                                                                   
    oldUser := NewUser()                                                                                                                                                                                                                                                                                                                                                                                                                    
    var buf bytes.Buffer                                                                                                                                                                                             
    err := oldUser.Serialize(&buf)                                                                                                                                                                                      
    assert.Nil(t, err) 
     
    // Make a new User struct                                                                                                                                                                                                                                                                                                                                                                 
    newUser := evolution.NewUser()           
  
    // Compile the deserializer program that reads an old User and puts the contents in a new User.
    //  This program knows how to set the default fields and won't throw an io.EOF because it runs out of bytes.                                                                                                                                                                          
    deser, err := compiler.CompileSchemaBytes([]byte(oldUser.Schema()), []byte(newUser.Schema()))                                                                                                                       
    assert.Nil(t, err)
  
    // Evaluate the program over the input data (buf.Bytes()) and put the result in newUser
    // You can reuse deser, the bytecode for deserializing your structs, any time you're taking in the old schema and putting the data in the new structs. This will improve performance                                                                                                                                                                                                                                                                                                                                                                                                                      
    err = vm.Eval(bytes.NewReader(buf.Bytes()), deser, newUser)                                                                                                                                                         
    assert.Nil(t, err)
  
    // Because the VM knows about the old and new schemas it can correctly set the default values                                                                                                                                                                                                                                                                                                                                                                                                                      
    assert.Equal(t, "test_first_name", newUser.FirstName)                                                                                                                                                               
    assert.Equal(t, "test_last_name", newUser.LastName)                                                                                                                                                                 
    assert.Equal(t, "test_info", newUser.Info)                                                                                                                                                                  
  }

@oleksmir
Copy link
Author

oleksmir commented Jun 18, 2019

Cool, thank you. It works and default values are set properly. Now I found that unions are not handled.
Old schema:

@namespace("com.github.oleksmir.goavro")
protocol User {
  record User {
    string id = "test_id";
    int    age = 100;
  }
}

New schema:

@namespace("com.github.oleksmir.goavro")
protocol User {
  record User {
    string id = "test_id";
    int    age = 100;
    string firstName = "test_first_name";
    string lastName = "test_last_name";
    string info = "test_info";
    union {null, string} unionField = null;
  }
}

After deserializing (using method from previous comment) the newUser structure will have nil on unionField place instead of having pointer to UnionNullString. It would be nice if somewhere default union structure would be created. Currently it throws panic dereferencing nil.

@actgardner
Copy link
Owner

@oleksmir this time it was a bug in the generated code which wasn't setting the defaults for union fields correctly. I've corrected it in the head of master and added a test, let me know if it's resolved for you.

@actgardner actgardner added the bug label Jun 20, 2019
@oleksmir
Copy link
Author

oleksmir commented Jun 24, 2019

@actgardner thank you for fast fix. Addition of new union field works as expected no matter in which part of record it is added. I found another case where it does not work.
Old schema:

@namespace("io.test.kafka.go")
protocol UnionType {
  @namespace("io.test.kafka.go.union")
  record UnionType {
    string id;
    int imo;
    union {null, string} unionField1 = null;
  }
}

I removed unionField1 and added unionField2. Evolved schema:

@namespace("io.test.kafka.go")
protocol UnionType {
  @namespace("io.test.kafka.go.union")
  record UnionType {
    string id;
    int imo;
    union {null, int} unionField2 = null;
  }
}

The test I run:

func TestUnionTypeEvolution(t *testing.T) {
	oldRecord := unionold.NewUnionType()
	oldRecord.UnionField1 = &unionold.UnionNullString{
		String: "test",
		UnionType: unionold.UnionNullStringTypeEnumString,
	}

	var buf bytes.Buffer
	err := oldRecord.Serialize(&buf)
	assert.Nil(t, err)

	newRecord := unionnew.NewUnionType()

	deserializer, err := compiler.CompileSchemaBytes([]byte(oldRecord.Schema()), []byte(newRecord.Schema()))
	assert.Nil(t, err)

	err = vm.Eval(bytes.NewReader(buf.Bytes()), deserializer, newRecord)

	assert.Nil(t, err)
	assert.NotNil(t, newRecord)
	assert.NotNil(t, newRecord.UnionField2)
}

The error I get:

&errors.errorString{s:"Incompatible types: <nil> &{ [0xc000134280 0xc0001342c0] [null string]}"}

Even If I use {null, string} for unionField2 error will be the same. Moreover it occurs when I remove union type which are present in old schema but not in new. You can use this schema as evolved to test this case:

@namespace("io.test.kafka.go")
protocol UnionType {
  @namespace("io.test.kafka.go.union")
  record UnionType {
    string id;
    int imo;
  }
}

@actgardner
Copy link
Owner

Hey @oleksmir, I finally found some time to work on this. The following cases now work correctly and have regression tests:

  • when the writer is a non-union type and the reader is a union type
  • when the writer is a union type and the reader is a non-union type
  • when the writer is a union type and the reader does not have a corresponding field

Can you try the current master out and let me know if it works for you? Then I can cut a new release. Sorry about the delay!

@actgardner
Copy link
Owner

I've tagged release 6.1 with these fixes, feel free to open a new issue if you have more problems with schema evolution!

@oleksmir
Copy link
Author

@actgardner Sorry for long delay. Seem to be working now. Thank you a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants