Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HAWQ-1228. Use profile based on file format in HCatalog integration(HiveRC, HiveText profiles) for homogeneous tables. #1076

Closed
wants to merge 24 commits into from

Conversation

avocader
Copy link
Contributor

@avocader avocader commented Jan 7, 2017

Work still in progress, but the main part is done. I want to have earlier feedback on it, it's quite a big diff.

@shivzone
Copy link
Contributor

shivzone commented Jan 9, 2017

Each property in the profile was meant to be a particular class which allowed any user to customize/extend it. In the same theme, instead of definiging outputFormat in the pxf-profile, lets go with and get the classname.


@Override
void parseDelimiterChar(InputData input) {
delimiter = 44; //,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a temporary workaround ?
Why not use a static variable to store the delimiter value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's temporary, work is in progress to actually use appropriate parameters from Hive's SerDe.

initPartitionFields(toks[3]);
filterInFragmenter = new Boolean(toks[4]);
HiveUserData hiveUserData = HiveUtilities.parseHiveUserData(input);
initPartitionFields(hiveUserData.getPartitionKeys());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we always have value for partitionKeys ? If not, initParitionFields will throw NPE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, even if a table has no partitions, we have "!HNPT!" value.

@@ -57,11 +59,11 @@
*/
public class HiveColumnarSerdeResolver extends HiveResolver {
private static final Log LOG = LogFactory.getLog(HiveColumnarSerdeResolver.class);
private ColumnarSerDeBase deserializer;
//private ColumnarSerDeBase deserializer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove line

initPartitionFields(toks[HiveInputFormatFragmenter.TOK_KEYS]);
filterInFragmenter = new Boolean(toks[HiveInputFormatFragmenter.TOK_FILTER_DONE]);
HiveUserData hiveUserData = HiveUtilities.parseHiveUserData(input, PXF_HIVE_SERDES.LAZY_SIMPLE_SERDE);
initPartitionFields(hiveUserData.getPartitionKeys());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible NPE in this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partitionKeys always have not-null value.

@@ -49,6 +49,7 @@ under the License.
<accessor>org.apache.hawq.pxf.plugins.hive.HiveAccessor</accessor>
<resolver>org.apache.hawq.pxf.plugins.hive.HiveResolver</resolver>
<metadata>org.apache.hawq.pxf.plugins.hive.HiveMetadataFetcher</metadata>
<outputFormat>BINARY</outputFormat>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defaultOutputFormat

void initSerde(InputData input) {
/* nothing to do here */
void initSerde(InputData input) throws Exception {
if (((ProtocolData) inputData).outputFormat() == OutputFormat.TEXT) {
Copy link
Contributor Author

@avocader avocader Jan 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invert check.

@shivzone
Copy link
Contributor

Please make sure you resolve the conflicts when you merge with master

@avocader avocader changed the title HAWQ-1228. Initial commit. HAWQ-1228. Use profile based on file format in HCatalog integration(HiveRC, HiveText profiles) for homogeneous tables. Jan 26, 2017
@@ -68,7 +68,6 @@ void free_token_resources(PxfInputData *inputData);
Datum gpbridge_import(PG_FUNCTION_ARGS)
{
gpbridge_check_inside_extproto(fcinfo, "gpbridge_import");
// ExternalSelectDesc desc = EXTPROTOCOL_GET_SELECTDESC(fcinfo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove instead of commenting out


/* location - should be an array of text with one element:
* pxf://<ip:port/namaservice>/<db>.<path>?Profile=profileName&delimiter=delimiterCode */
static Datum GetLocationForFormat(char *profile, List *outputFormats, char *pxf_service_address, char *path, char *name, int delimiter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename path to db and name to path, the comment and the actual variable names are confusing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

this(name, type, sourceType);
this.modifiers = modifiers;
}

public Field(String name, EnumHawqType type, boolean isComplexType, String sourceType, String[] modifiers) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add isComplexType as the final argument to keep the method overloading structure clean

@@ -31,6 +31,8 @@
*/
public class InputData {

public static final String DELIMITER_KEY = "DELIMITER";

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra line


HiveUserData hiveUserData = new HiveUserData(toks[0], toks[1], toks[2], toks[3], Boolean.valueOf(toks[4]), toks[5]);

if (supportedSerdes.length > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation

@avocader avocader closed this Feb 2, 2017
@avocader avocader deleted the HAWQ-1228 branch February 2, 2017 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants