Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-2876]AVRO datatype support through SDK #2671

Closed
wants to merge 1 commit into from

Conversation

Indhumathi27
Copy link
Contributor

This PR supports following Avro DataTypes to carbon format through SDK. Avro datatypes include,

  1. Avro Union
  2. Avro Enum
  3. Avro Logical type Decimal

Please refer JIRA CARBONDATA-2876 for further detail.

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Test file has been added

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@ravipesala
Copy link
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6462/

@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8149/

@CarbonDataQA
Copy link

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/78/

@ravipesala
Copy link
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6466/

@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8153/

@CarbonDataQA
Copy link

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/82/

@Indhumathi27 Indhumathi27 changed the title [WIP]AVRO datatype support through SDK [CARBONDATA-2876]AVRO datatype support through SDK Aug 30, 2018
if (logicalType instanceof LogicalTypes.Decimal) {
out = new BigDecimal(new String(((ByteBuffer) fieldValue).array(),
CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this semi-colon

CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
;
} else {
out = fieldValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else case handling is not required as Binary data type is not supported...so write a proper comment to say that only decimal logical type is support for avro Byte data type. Once binary data type is supported we can add the else block

}
}
Object[] values;
values = new Object[countIfNotNull];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Rename countIfNotNull to notNullUnionFieldsCount
  2. merge above 2 lines 'Object[] values = new Object[countIfNotNull];'
  3. Check union behavior for only NULL type and handle if any special handling is required

int j = 0;
for (Schema unionField : fieldsUnion) {
if (!unionField.getType().equals(Schema.Type.NULL)) {
values[j] = avroFieldToObjectForUnionType(unionField, fieldValue, avroField);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Try to reuse the code as much as possible. You can extract the primitives types computation in a separate method and override only complex types as different methods for union and rest of the data types
  2. Write a function for union type to identify the schema for which the computation need to be done. Do not call the computation for all the union types as at one time only one type of value will exists

@manishgupta88
Copy link
Contributor

@Indhumathi27 ..please modify the code as per the comments then we can continue with further review of code

@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8182/

@CarbonDataQA
Copy link

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/111/

values[j] = avroFieldToObjectForUnionType(unionField, fieldValue, avroField);
} else {
values[j] = null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Remove else block
  2. Combine above 2 if conditions into 1 using && operator
  3. break the loop once if check is success

CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
} else {
// As binary type is not supported yet, fill value as null
out = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this else block

// Get union types and store as Struct<type>
ArrayList<StructField> unionFields = new ArrayList<>();
for (Schema avroSubField : avroField.schema().getTypes()) {
StructField unionField = prepareSubFields(avroField.name() + i++, avroSubField);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check for NULL schema here

unionFields.add(unionField);
}
}
if (unionFields.size() != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace 'unionFields.size()' with 'unionFields.isEmpty()'

@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8188/

@CarbonDataQA
Copy link

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/117/

if (notNullUnionFieldsCount != 1) {
j++;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modify the code as below
int j = 0;
for (Schema unionField : unionFields) {
if (unionField.getType().equals(Schema.Type.NULL) ) {
continue;
}
if (checkFieldValueType(unionField.getType(), fieldValue)) {
values[j] = avroFieldToObjectForUnionType(unionField, fieldValue, avroField);
break;
}
j++;
}

@manishgupta88
Copy link
Contributor

LGTM...can be merged once build passes

@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8197/

@CarbonDataQA
Copy link

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/126/

@asfgit asfgit closed this in b588cb6 Aug 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants