[Feature] Sqoop component optimization #2917

Closed
Eights-Li opened this issue Jun 6, 2020 · 3 comments
Labels
feature new feature

Comments

@Eights-Li
Contributor

Is your feature request related to a problem? Please describe.
The Sqoop task on the dev branch needs enhancement.
Optimization points:
• Sqoop data import and export do not support Hadoop-level custom parameters, that is, -D level parameters:
– MR task name
– MR map and reduce memory, number of tasks, etc.
• The split-by field is not supported. If -m is greater than 1 and the primary key of the relational database table is not auto-incrementing, Sqoop may import duplicate data into Hadoop. The general solution is to specify a split-by field, so split-by needs to be supported.
• Parameters cannot be customized. For example, when importing from MySQL, some tables can add --direct to speed up the import.
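
To make the three gaps above concrete, here is a minimal Python sketch of how an import command might be assembled once -D properties, split-by, and pass-through flags are supported. The helper name `build_sqoop_import`, the JDBC URL, the table, the column, and the property values are all hypothetical; `mapreduce.job.name`, `mapreduce.map.memory.mb`, `--split-by`, `-m`, `--direct`, and `--fetch-size` are existing Hadoop and Sqoop options. Sqoop expects generic Hadoop arguments such as -D directly after the tool name and before any tool-specific argument, which the sketch respects. This is not DolphinScheduler code.

```python
import shlex


def build_sqoop_import(connect, table, target_dir, hadoop_props=None,
                       num_mappers=1, split_by=None, extra_args=None):
    """Assemble a `sqoop import` argument list (illustrative helper only)."""
    args = ["sqoop", "import"]
    # Generic Hadoop options (-D key=value) must precede tool-specific arguments.
    for key, value in (hadoop_props or {}).items():
        args += ["-D", f"{key}={value}"]
    args += ["--connect", connect, "--table", table, "--target-dir", target_dir]
    args += ["-m", str(num_mappers)]
    # With more than one mapper, name the split column explicitly
    # instead of relying on the table's primary key.
    if split_by:
        args += ["--split-by", split_by]
    # Task-level pass-through flags such as --direct or --fetch-size.
    args += list(extra_args or [])
    return args


cmd = build_sqoop_import(
    connect="jdbc:mysql://db.example.com:3306/shop",  # hypothetical database
    table="orders",                                     # hypothetical table
    target_dir="/warehouse/orders",                     # hypothetical HDFS path
    hadoop_props={
        "mapreduce.job.name": "orders_import",
        "mapreduce.map.memory.mb": "2048",
    },
    num_mappers=4,
    split_by="order_id",                                # hypothetical column
    extra_args=["--direct", "--fetch-size", "1000"],
)
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps each argument separate, so values containing spaces do not need manual quoting until the command is rendered for display.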

Describe the solution you'd like
Ideas:
• The Sqoop task name is currently generic; it should become a required parameter on the Sqoop page.
• Add a Hadoop custom parameter input box for setting MR parameters such as memory.
• Add Sqoop task-level custom parameters, such as --direct, --fetch-size, and other parameters used in specific situations.
• Add an option button to choose between a custom script and the template script, following the design of the DataX node.
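
The ideas above could map onto a task form model roughly like the following Python sketch. The class `SqoopTaskParams`, its field names, and the validation rules are assumptions made for illustration; they are not taken from the DolphinScheduler code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SqoopTaskParams:
    """Hypothetical form model for the proposed Sqoop page."""
    job_name: str                       # proposed to become a required field
    use_custom_script: bool = False     # option button: custom vs. template script
    custom_script: str = ""             # only used when use_custom_script is True
    hadoop_params: Dict[str, str] = field(default_factory=dict)  # -D key=value pairs
    sqoop_params: List[str] = field(default_factory=list)        # e.g. --direct

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the form is acceptable."""
        errors = []
        if not self.job_name.strip():
            errors.append("job name is required")
        if self.use_custom_script and not self.custom_script.strip():
            errors.append("custom script must not be empty")
        if any(not key.strip() for key in self.hadoop_params):
            errors.append("Hadoop parameter names must not be empty")
        return errors


ok = SqoopTaskParams(
    job_name="orders_import",
    hadoop_params={"mapreduce.map.memory.mb": "2048"},
    sqoop_params=["--direct"],
)
bad = SqoopTaskParams(job_name=" ", use_custom_script=True)
print(ok.validate())
print(bad.validate())
```

Keeping the Hadoop-level and Sqoop-level parameters in separate fields mirrors the two input boxes proposed above and lets each be checked on its own.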

@Eights-Li Eights-Li added the feature new feature label Jun 6, 2020
@743294668
Contributor

The suggestion is very good. At present, the Sqoop node type supports only data import and export; other Sqoop commands are not supported. As I understand it, though, the person in charge of the Sqoop node may not want to open up custom parameters, because they are harder to validate.
This is just my understanding. I hope @zixi0825 can share an opinion. Thank you.

@zixi0825
Member

zixi0825 commented Jun 8, 2020

The solution is good. When I designed the node, my thinking was that the graphical interface should cover simple, quick use. Custom parameters are better supported through custom scripts, and I think custom scripts can be implemented with shell scripts, so they were not added to the Sqoop task type.

@Eights-Li
Contributor Author

Merged into dev; closing.
