Skip to content
This repository has been archived by the owner on Jul 9, 2021. It is now read-only.

Sqoop-3224: Binary ftp transfer mode #41

Closed
wants to merge 27 commits into from

Conversation

christeoh
Copy link

No description provided.

Chris Teoh added 27 commits December 13, 2017 00:13
Copy link
Contributor

@szvasas szvasas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Chris,
Thank you for submitting this pull request, I have left some comments and also submitted a couple of code suggestions in my own Sqoop fork: https://github.com/szvasas/sqoop/tree/szvasas-SQOOP-3224

You will find the description of my changes in the commit messages. For some reason I was not able to create a pull request against your Sqoop fork, is it possible that it is private somehow?


// Indicates if binary or ascii FTP transfer mode should be used
@StoredAsProperty("mainframe.ftp.transfermode")
private String mainframeFtpTransferMode;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my idea is to add a new field to com.cloudera.sqoop.SqoopOptions.FileLayout, let's say BinaryFile which would only be supported for mainframe imports (we should add validator to check this) and I would use this new field instead of introducing the new mainframeFtpTransferMode Sqoop option. This would not make the Sqoop interface more complex and would let us reuse already existing concepts. What do you think?

@@ -61,6 +62,9 @@ private void writeObject(Object o) throws IOException {
if (o instanceof Text) {
Text to = (Text) o;
out.write(to.getBytes(), 0, to.getLength());
} else if (o instanceof BytesWritable) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a good idea to keep the BinaryKeyOutputFormat because we change the behavior of RawKeyTextOutputFormat now which may lead to issues in other connectors and its name suggests that it handles text and not binary. You can use a technique to elimiate the code duplication similar I used in MainframeDatasetBinaryImportMapper.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @szvasas . I need a bit of help with applying the technique on the OutputFormats as they contain inner classes.

@@ -0,0 +1,128 @@
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not sure we will need this new class.
I understand that when a SqoopRecord is generated runtime it will always have a text field and not binary but that might be solved by changing org.apache.sqoop.manager.MainframeManager#getColumnTypes

My guess is that if we changed that to return column type based on the transfer mode or file type in the SqoopOptions object we could just use the generated class.

@szvasas
Copy link
Contributor

szvasas commented Dec 17, 2018

Closing this PR since the corresponding JIRA seems to be closed already.

@szvasas szvasas closed this Dec 17, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants