DataX transformer: how to replace newlines and emoji #96

AronChung opened this issue Jan 18, 2021 · 0 comments
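Approach 1: fetch the column and filter it directly inside the code parameter. This is hard to test, and I could not work out how to escape \n through the nested string layers, so the replacement never took effect. After JSON decoding, the snippet below hands Groovy the call replaceAll("\n","\n"), which replaces each newline with itself.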

"transformer":[{
"name": "dx_groovy",
"parameter":
{"code": "Column column = record.getColumn(16);\ndef str = column.asString();\nif(str == null){return null;}\ndef newStr=null;\nnewStr = str.replaceAll(\"\\n\",\"\\n\");\nrecord.setColumn(16, new StringColumn(newStr));\nreturn record;",
"extraPackage":[]}
}]
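
One way to see why the snippet above changes nothing is to peel off the escaping layers one at a time. The sketch below uses plain Java and only the standard library; it is an illustration, not DataX code. The class name EscapeLayers and the helper toJsonString are hypothetical names introduced here. It shows that a literal replace needs one level of backslash escaping, that replaceAll needs one more because the replacement string is parsed again, and what a Groovy line looks like once it is encoded as a JSON string value.

```java
public class EscapeLayers {

    // Literal replacement: each newline becomes the two characters '\' and 'n'.
    public static String escapeNewline(String s) {
        return s.replace("\n", "\\n");
    }

    // Regex replacement: the replacement string is parsed a second time,
    // so the backslash has to be escaped once more.
    public static String escapeNewlineRegex(String s) {
        return s.replaceAll("\n", "\\\\n");
    }

    // Hypothetical helper: encode a source snippet as the body of a JSON string.
    // It handles only backslash, double quote and newline.
    public static String toJsonString(String source) {
        return source.replace("\\", "\\\\")
                     .replace("\"", "\\\"")
                     .replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println(escapeNewline("a\nb"));      // prints a\nb
        System.out.println(escapeNewlineRegex("a\nb")); // prints a\nb

        // The Groovy line we would like the transformer to run.
        String groovy = "newStr = str.replace(\"\\n\", \"\\\\n\");";
        System.out.println(groovy);               // the Groovy source text
        System.out.println(toJsonString(groovy)); // the same line as a JSON string body
    }
}
```

The last line printed is newStr = str.replace(\"\\n\", \"\\\\n\"); which is the form that would have to appear inside the "code" value. Whether DataX passes it through unchanged is an assumption that should be verified against a real job.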

Approach 2: write static methods in the GroovyTransformerStaticUtil.java class

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
// Assumed to be commons-lang3; switch to org.apache.commons.lang if your DataX build uses that.
import org.apache.commons.lang3.StringUtils;

public class GroovyTransformerStaticUtil  {

    // Escape characters that would break a Hive text row:
    // newline and the control characters U+0001, U+0002, U+0003.
    public static String hiveColumnEscaped(String str) {
        if (str == null) return null;

        String result = StringUtils.replace(str, "\n", "\\n");
        result = StringUtils.replace(result, "\1", "\\1");
        result = StringUtils.replace(result, "\2", "\\2");
        result = StringUtils.replace(result, "\3", "\\3");
        return result;
    }

    // Escape every column of the record.
    // Note: each non-null column is rewritten as a StringColumn.
    public static Record hiveRowEscaped(Record record) {
        int length = record.getColumnNumber();

        for (int i = 0; i < length; i++) {
            Column column = record.getColumn(i);
            String str = column.asString();
            // Skip null values rather than returning early,
            // so the columns after a null are still escaped.
            if (str == null) continue;

            record.setColumn(i, new StringColumn(hiveColumnEscaped(str)));
        }
        return record;
    }
}
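
Unlike the inline Groovy string, a static method can be checked outside DataX. The sketch below copies the column-escaping logic into a standalone class and swaps StringUtils.replace for String.replace from the standard library, so it runs without DataX or commons-lang on the classpath. The class name HiveEscapeCheck is hypothetical. In a Java string literal, \1, \2 and \3 are octal escapes for U+0001 to U+0003, which are Hive's default field, collection-item and map-key delimiters.

```java
public class HiveEscapeCheck {

    // Same replacements as hiveColumnEscaped, using only java.lang.String.
    public static String escape(String str) {
        if (str == null) return null;
        return str.replace("\n", "\\n")
                  .replace("\1", "\\1")   // "\1" is the octal escape for U+0001
                  .replace("\2", "\\2")   // U+0002
                  .replace("\3", "\\3");  // U+0003
    }

    public static void main(String[] args) {
        String raw = "line1\nline2\1tail";
        String escaped = escape(raw);
        System.out.println(escaped);                    // prints line1\nline2\1tail
        System.out.println(escaped.indexOf('\n') < 0);  // prints true
    }
}
```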

Then call it from the code parameter:

"transformer":[{
"name": "dx_groovy",
"parameter":
{"code": "Column column = record.getColumn(16);\ndef str = column.asString();\ndef newStr=null;\nnewStr = hiveColumnEscaped(str);\nrecord.setColumn(16, new StringColumn(newStr));\nreturn record;",
"extraPackage":[]}
}]

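Because replacing a single field this way requires hard-coding the column index, the third form is preferable: call hiveRowEscaped() from the class above, which escapes every column in the record: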

"transformer":[{
"name": "dx_groovy",
"parameter":
{"code": "Record finalRecord = hiveRowEscaped(record);\nreturn finalRecord;",
"extraPackage":[]}
}]