Skip to content

Conversation

@ihuzenko
Copy link
Member

@ihuzenko ihuzenko commented Jan 8, 2019

  1. Added persistence of MAP key and value types in Drill views (affects .view.drill file) for avoiding cast problems in future.
  2. Preserved backward compatibility of older view files by treating untyped maps as ANY.

@arina-ielchiieva
Copy link
Member

@vdiravka could you please review?

break;
}
keyType = dataType.getKeyType() == null ? null : new FieldType(null, dataType.getKeyType());
valueType = dataType.getValueType() == null ? null : new FieldType(null, dataType.getValueType());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't use dataType.getSqlTypeName().getName() instead of null?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confused Field Type Name with Filed Name. Filed Name should be used for FiledType. The name of FiledType is confusing. Consider renaming FiledType to Filed.
Regarding null in constructor, I agree there is no need to store the key, value names for every internal MAP (it will be not a view, but dataset then).
But I think it will be good to introduce the new constructor without Field Name.
This constructor will serve for representing internal Fields of complex Fields (MAP now, maybe List in future) in the View.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good idea. Done the changes.

@Test // DRILL-6944
public void testMapColumnOfNewlyCreatedView() throws Exception {
try {
test("CREATE VIEW dfs.tmp.`mapf_view` AS SELECT mapf FROM dfs.`avro/map_string_to_long.avro`");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to create view for Parquet and Json with complex data types and the view involves ANY instead of MAP datatype in the result. Do you know the reason of that?
What is the DESCRIBE output for this view for Avro? Does it involve MAP, not ANY?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did some experiments with map column in avro and figured out that Drill detected MAP type exactly same as for binary table. And bug DRILL-6944 was reproduced for view created over avro file before I made the fix (this fact mentioned in Jira also). So I think we can rely on added tests to ensure that issue is gone.

test("CREATE VIEW dfs.tmp.`mapf_view` AS SELECT mapf FROM dfs.`avro/map_string_to_long.avro`");
test("SELECT * FROM dfs.tmp.`mapf_view`");
testBuilder()
.sqlQuery("SELECT mapf['ki'] as ki FROM dfs.tmp.`mapf_view`")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please escape mapf with back ticks or rename to map_f.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escaped

return isNullable;
}

public FieldType getKeyType() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add JavaDocs. All above similar methods have them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added javadocs and @JsonIgnore for methods which won't take part in serialization.

default:
break;
}
keyType = dataType.getKeyType() == null ? null : new FieldType(null, dataType.getKeyType());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should keyType and valueType be used with other data types than MAP?
Looks like you can add MAP case for swtich statement and place initializing of the new fields there.
Note: looks like precision, sacle and intervalQualifier also can be used only for specific data types, but if you are not sure, just leave them as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's for maps only, good catch. Added new case for MAP above.

} else if (typeName == SqlTypeName.MAP) {
type = field.isMapTypesPresent()
? factory.createMapType(getType(field.getKeyType(), factory), getType(field.getValueType(), factory))
: factory.createSqlType(SqlTypeName.ANY);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please put a comment here why ANY is used in case of untyped MAP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added explanation comment.

"isNullable" : false
} ],
"workspaceSchemaPath" : [ ]
} No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether it's required to add new lines for non source files
generated by Drill and used only for testing purposes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think the proper way to generate it by Drill in the process of creating view.
It is a good style to always put the newline as a last character if it is allowed by the file format. For some tools it can be the marker of EOF.
Anyway it is pretty minor, since Drill views can be consumed only by Drill. You can leave it as is.

…B binary table

1. Added persistence of MAP key and value types in Drill views (affects .view.drill file) for avoiding cast problems in future.
2. Preserved backward compatibility of older view files by treating untyped maps as ANY.
@ihuzenko
Copy link
Member Author

@vdiravka I've tried to address your comments, could you please review again ?

Copy link
Member

@vdiravka vdiravka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

"isNullable" : false
} ],
"workspaceSchemaPath" : [ ]
} No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think the proper way to generate it by Drill in the process of creating view.
It is a good style to always put the newline as a last character if it is allowed by the file format. For some tools it can be the marker of EOF.
Anyway it is pretty minor, since Drill views can be consumed only by Drill. You can leave it as is.

@asfgit asfgit closed this in de863af Jan 18, 2019
lushuifeng pushed a commit to lushuifeng/drill that referenced this pull request Jun 21, 2019
…B binary table

1. Added persistence of MAP key and value types in Drill views (affects .view.drill file) for avoiding cast problems in future.
2. Preserved backward compatibility of older view files by treating untyped maps as ANY.

closes apache#1602
xiangt920 pushed a commit to xiangt920/drill that referenced this pull request Dec 26, 2019
…B binary table

1. Added persistence of MAP key and value types in Drill views (affects .view.drill file) for avoiding cast problems in future.
2. Preserved backward compatibility of older view files by treating untyped maps as ANY.

closes apache#1602
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants