Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Avro to 1.9.1 #5938

Merged
merged 11 commits into from
Jan 6, 2020
Merged

Upgrade Avro to 1.9.1 #5938

merged 11 commits into from
Jan 6, 2020

Conversation

massakam
Copy link
Contributor

Motivation

Currently, Pulsar uses Avro 1.8.2, a version released two years ago. The latest version of Avro is 1.9.1, which uses FasterXML's Jackson 2.x instead of Codehaus's Jackson 1.x. Jackson is prone to security issues, so we should not keep using older versions.
https://blog.godatadriven.com/apache-avro-1-9-release

Modifications

Avro 1.9 has some major changes:

{
  "name": "fieldName",
  "type": [
    "null",
    "string"
  ],
  "default": "defaultValue"
}

The default value of a nullable field must be null (cf. https://issues.apache.org/jira/browse/AVRO-1803), and the default value of the field as above is actually null. However, this PR disables the validation in order to maintain the traditional behavior.


try {
// Disable validation of default values for compatibility
validateDefaults.set(false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not a very good way. However, there seems to be no public method to disable validation. Another option is to enable validation of default values.

@massakam
Copy link
Contributor Author

retest this please

@massakam massakam added this to the 2.6.0 milestone Dec 26, 2019
@massakam
Copy link
Contributor Author

rerun cpp tests

@sijie sijie merged commit d6f240e into apache:master Jan 6, 2020
@massakam massakam deleted the upgrade-avro branch January 6, 2020 04:13
@tuteng
Copy link
Member

tuteng commented Mar 21, 2020

Add label release-2.5.1, due to #6406 dependency

tuteng pushed a commit to AmateurEvents/pulsar that referenced this pull request Mar 21, 2020
### Motivation

Currently, Pulsar uses Avro 1.8.2, a version released two years ago. The latest version of Avro is 1.9.1, which uses FasterXML's Jackson 2.x instead of Codehaus's Jackson 1.x. Jackson is prone to security issues, so we should not keep using older versions.
https://blog.godatadriven.com/apache-avro-1-9-release

### Modifications

Avro 1.9 has some major changes:

- The library used to handle logical datetime values has changed from Joda-Time to JSR-310 (apache/avro#631)
- Namespaces no longer include "$" when generating schemas containing inner classes using ReflectData (apache/avro#283)
- Validation of default values has been enabled (apache/avro#288). This results in a validation error when parsing the following schema:
```json
{
  "name": "fieldName",
  "type": [
    "null",
    "string"
  ],
  "default": "defaultValue"
}
```
The default value of a nullable field must be null (cf. https://issues.apache.org/jira/browse/AVRO-1803), and the default value of the field as above is actually null. However, this PR disables the validation in order to maintain the traditional behavior.

(cherry picked from commit d6f240e)
@psilos
Copy link

psilos commented Apr 7, 2020

@massakam Thanks for this fix and looking forward for the newer Avro library to be integrated with Pulsar.
Question on the fix, wouldn't the replacement of Joda-Time with JSR-310 Instant break backwards compatibility of Avro Usage within Pulsar ?

@psilos
Copy link

psilos commented Apr 9, 2020

@codelipenghui @sijie thoughts on the above comment? ^ Wouldn't that fix break backwards compatibility with existing code base?

If it does, should Pulsar also take an action of being more compliant with default Avro behaviour such as :

  1. Validation of default values
  2. Don't allow null values by default

@sijie
Copy link
Member

sijie commented Apr 9, 2020

@psilos good point.

@codelipenghui @tuteng I think we need to verify the two questions that @psilos raised. If they are breaking behaviors, then it is a problem for 2.5.1. Can you guys verify that?

@codelipenghui
Copy link
Contributor

codelipenghui commented Apr 10, 2020

@psilos I have pushed a PR #6704 to adds Joda time logical type conversion so that we can keep forwarding compatibility, And it should be included in 2.5.1.

About the allow null, this is a legacy. We'd better keep forwarding compatibility. I agree with you that Pulsar schema should compliant with default Avro behavior. We'd better draft a PIP to change the default behavior and provide options to fall back to the old behavior. I think he can plan the default value validation and allow null in 2.6.0 since 2.5.1 is on the release process.

sijie pushed a commit that referenced this pull request Apr 11, 2020
### Motivation

After upgrade to Apache Avro 1.9.x, the default time conversion changed to JSR-310. For forwarding compatibility, we'd better add the Joda time conversion.

related to #5938 

### Modifications

Add joda time conversions

### Verifying this change

New integration test added
tuteng pushed a commit that referenced this pull request Apr 13, 2020
### Motivation

Currently, Pulsar uses Avro 1.8.2, a version released two years ago. The latest version of Avro is 1.9.1, which uses FasterXML's Jackson 2.x instead of Codehaus's Jackson 1.x. Jackson is prone to security issues, so we should not keep using older versions.
https://blog.godatadriven.com/apache-avro-1-9-release

### Modifications

Avro 1.9 has some major changes:

- The library used to handle logical datetime values has changed from Joda-Time to JSR-310 (apache/avro#631)
- Namespaces no longer include "$" when generating schemas containing inner classes using ReflectData (apache/avro#283)
- Validation of default values has been enabled (apache/avro#288). This results in a validation error when parsing the following schema:
```json
{
  "name": "fieldName",
  "type": [
    "null",
    "string"
  ],
  "default": "defaultValue"
}
```
The default value of a nullable field must be null (cf. https://issues.apache.org/jira/browse/AVRO-1803), and the default value of the field as above is actually null. However, this PR disables the validation in order to maintain the traditional behavior.

(cherry picked from commit d6f240e)
tuteng pushed a commit that referenced this pull request Apr 13, 2020
After upgrade to Apache Avro 1.9.x, the default time conversion changed to JSR-310. For forwarding compatibility, we'd better add the Joda time conversion.

related to #5938

Add joda time conversions

New integration test added

(cherry picked from commit 854716f)

Handle conflict
@psilos
Copy link

psilos commented Apr 17, 2020

@codelipenghui Thanks very much for the quick fix, and looking forward for the new release.

Unfortunately one of the areas where we could have used the Avro upgrade was inside Pulsar Functions, are you planning to add the extra flag(s) into Pulsar Functions as well?

Currently there is no way to override the default allow null behaviour, neither the use of JSR-310 Instant. CC @sijie

@codelipenghui
Copy link
Contributor

@psilos could you please help create an issue? so that we can keep it be tracked. Since 2.5.1 RC is out, we can plan it in 2.5.2

BTW, if you are using the Avro compiler, you need to upgrade the compiler. Otherwise, you may meet some class not found exception. Because some classes are deleted in the new Avro lib.

jiazhai pushed a commit to jiazhai/pulsar that referenced this pull request May 18, 2020
### Motivation

Currently, Pulsar uses Avro 1.8.2, a version released two years ago. The latest version of Avro is 1.9.1, which uses FasterXML's Jackson 2.x instead of Codehaus's Jackson 1.x. Jackson is prone to security issues, so we should not keep using older versions.
https://blog.godatadriven.com/apache-avro-1-9-release

### Modifications

Avro 1.9 has some major changes:

- The library used to handle logical datetime values has changed from Joda-Time to JSR-310 (apache/avro#631)
- Namespaces no longer include "$" when generating schemas containing inner classes using ReflectData (apache/avro#283)
- Validation of default values has been enabled (apache/avro#288). This results in a validation error when parsing the following schema:
```json
{
  "name": "fieldName",
  "type": [
    "null",
    "string"
  ],
  "default": "defaultValue"
}
```
The default value of a nullable field must be null (cf. https://issues.apache.org/jira/browse/AVRO-1803), and the default value of the field as above is actually null. However, this PR disables the validation in order to maintain the traditional behavior.
(cherry picked from commit d6f240e)
jiazhai pushed a commit to jiazhai/pulsar that referenced this pull request May 18, 2020
After upgrade to Apache Avro 1.9.x, the default time conversion changed to JSR-310. For forwarding compatibility, we'd better add the Joda time conversion.

related to apache#5938

Add joda time conversions

New integration test added
(cherry picked from commit 854716f)
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
### Motivation

Currently, Pulsar uses Avro 1.8.2, a version released two years ago. The latest version of Avro is 1.9.1, which uses FasterXML's Jackson 2.x instead of Codehaus's Jackson 1.x. Jackson is prone to security issues, so we should not keep using older versions.
https://blog.godatadriven.com/apache-avro-1-9-release

### Modifications

Avro 1.9 has some major changes:

- The library used to handle logical datetime values has changed from Joda-Time to JSR-310 (apache/avro#631)
- Namespaces no longer include "$" when generating schemas containing inner classes using ReflectData (apache/avro#283)
- Validation of default values has been enabled (apache/avro#288). This results in a validation error when parsing the following schema:
```json
{
  "name": "fieldName",
  "type": [
    "null",
    "string"
  ],
  "default": "defaultValue"
}
```
The default value of a nullable field must be null (cf. https://issues.apache.org/jira/browse/AVRO-1803), and the default value of the field as above is actually null. However, this PR disables the validation in order to maintain the traditional behavior.
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
### Motivation

After upgrade to Apache Avro 1.9.x, the default time conversion changed to JSR-310. For forwarding compatibility, we'd better add the Joda time conversion.

related to apache#5938 

### Modifications

Add joda time conversions

### Verifying this change

New integration test added
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants