New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-2186] Add readCsvAsRow methods to CsvReader and scala ExecutionEnv #3012
Conversation
Hi @StephanEwen, @thvasilo could you look at this PR? |
Thanks for work @tonycox. What do you mean about test negative case? If typeMap does not match with fields type in file for example |
this(configureTypes(mainType, size, Collections.<Integer, TypeInformation<?>>emptyMap())); | ||
} | ||
|
||
private static TypeInformation<?>[] configureTypes(TypeInformation<?> mainType, int size, Map<Integer, TypeInformation<?>> additionalTypes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you format argumets like as for RowTypeInfo#createComparator#219
for example:
private static TypeInformation<?>[] configureTypes(
TypeInformation<?> mainType,
int size,
Map<Integer, TypeInformation<?>> additionalTypes) {
@@ -351,6 +356,32 @@ public CsvReader ignoreInvalidLines(){ | |||
return new DataSource<T>(executionContext, inputFormat, typeInfo, Utils.getCallLocationName()); | |||
} | |||
|
|||
public DataSource<Row> rowType(Class<?> mainTargetType, int size, Map<Integer, Class<?>> additionalTypes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc
return new DataSource<Row>(executionContext, inputFormat, rowTypeInfo, Utils.getCallLocationName()); | ||
} | ||
|
||
public DataSource<Row> rowType(Class<?> mainTargetType, int size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc
@@ -348,6 +349,47 @@ class ExecutionEnvironment(javaEnv: JavaEnv) { | |||
wrap(new DataSource[T](javaEnv, inputFormat, typeInfo, getCallLocationName())) | |||
} | |||
|
|||
def readCsvFileAsRow[T : ClassTag : TypeInformation]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add scaladoc for method? may do not understand, what is additionalTypes
ignoreFirstLine: Boolean = false, | ||
ignoreComments: String = null, | ||
lenient: Boolean = false, | ||
includedFields: Array[Int] = null): DataSet[Row] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add more two spase for parameters like as for other methods in class, space chars should is 6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or 3 tabs :)
"111,222,333,444,555,666,777,888,999,000\n" + | ||
"a,b,c," + 40 + "," + 50.0d + "," + false + | ||
",g,h,i,aa,bb,cc,dd,ee,ff,gg,hh,ii,mm," + | ||
"aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,mmm\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of concatenations, the fileContent equals
"1,2,3,4,5.0,true" +
",7,8,9,11,22,33,44,55,66,77,88,99,00," +
"111,222,333,444,555,666,777,888,999,000\n" +
"a,b,c,40,50.0,false," +
"g,h,i,aa,bb,cc,dd,ee,ff,gg,hh,ii,mm," +
"aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,mmm\n"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it's simplier to see values, I'll make it static to creating it once.
@ex00 Thank you for review, Do you have any negative cases that breake this down? |
Hi @tonycox, thanks for your reply.
Yes, i have. which is expected result, if any field in source file is empty, is null or default values or something else? |
@ex00 default value of any empty field is null |
I'm closing this as "Abandoned", since there is no more activity and the code base has moved on quite a bit. Please re-open this if you feel otherwise and work should continue. |
Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.
General
Documentation
Tests & Build
mvn clean verify
has been executed successfully locally or a Travis build has passedRework CSV import to support very wide files