Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-2186] Add readCsvAsRow methods to CsvReader and scala ExecutionEnv #3012

Closed
wants to merge 2 commits into from

Conversation

tonycox
Copy link
Contributor

@tonycox tonycox commented Dec 15, 2016

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.

  • General

    • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
    • The pull request addresses only one issue
    • Each commit in the PR has a meaningful commit message (including the JIRA id)
  • Documentation

    • Documentation has been added for new functionality
    • Old documentation affected by the pull request has been updated
    • JavaDoc for public methods has been added
  • Tests & Build

    • Functionality added by the pull request is covered by tests
    • mvn clean verify has been executed successfully locally or a Travis build has passed

Rework CSV import to support very wide files

@tonycox
Copy link
Contributor Author

tonycox commented Dec 21, 2016

Hi @StephanEwen, @thvasilo could you look at this PR?

@ex00
Copy link

ex00 commented Feb 9, 2017

Thanks for work @tonycox.
The PR looks good to me. I left a few minor comments about PR.

What do you mean about test negative case? If typeMap does not match with fields type in file for example

this(configureTypes(mainType, size, Collections.<Integer, TypeInformation<?>>emptyMap()));
}

private static TypeInformation<?>[] configureTypes(TypeInformation<?> mainType, int size, Map<Integer, TypeInformation<?>> additionalTypes) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you format argumets like as for RowTypeInfo#createComparator#219
for example:

private static TypeInformation<?>[] configureTypes(
		TypeInformation<?> mainType,
		int size,
		Map<Integer, TypeInformation<?>> additionalTypes) {

@@ -351,6 +356,32 @@ public CsvReader ignoreInvalidLines(){
return new DataSource<T>(executionContext, inputFormat, typeInfo, Utils.getCallLocationName());
}

public DataSource<Row> rowType(Class<?> mainTargetType, int size, Map<Integer, Class<?>> additionalTypes) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add javadoc

return new DataSource<Row>(executionContext, inputFormat, rowTypeInfo, Utils.getCallLocationName());
}

public DataSource<Row> rowType(Class<?> mainTargetType, int size) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add javadoc

@@ -348,6 +349,47 @@ class ExecutionEnvironment(javaEnv: JavaEnv) {
wrap(new DataSource[T](javaEnv, inputFormat, typeInfo, getCallLocationName()))
}

def readCsvFileAsRow[T : ClassTag : TypeInformation](
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add scaladoc for method? may do not understand, what is additionalTypes

ignoreFirstLine: Boolean = false,
ignoreComments: String = null,
lenient: Boolean = false,
includedFields: Array[Int] = null): DataSet[Row] = {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more two spase for parameters like as for other methods in class, space chars should is 6.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or 3 tabs :)

"111,222,333,444,555,666,777,888,999,000\n" +
"a,b,c," + 40 + "," + 50.0d + "," + false +
",g,h,i,aa,bb,cc,dd,ee,ff,gg,hh,ii,mm," +
"aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,mmm\n";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of concatenations, the fileContent equals

"1,2,3,4,5.0,true" +
",7,8,9,11,22,33,44,55,66,77,88,99,00," +
"111,222,333,444,555,666,777,888,999,000\n" +
"a,b,c,40,50.0,false," +
"g,h,i,aa,bb,cc,dd,ee,ff,gg,hh,ii,mm," +
"aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,mmm\n"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's simplier to see values, I'll make it static to creating it once.

@tonycox
Copy link
Contributor Author

tonycox commented Feb 9, 2017

@ex00 Thank you for review,
If typeMap does not match with fields type in file it will fall with ParseException.
But if Double parser try to parse an integer it will be okay.

Do you have any negative cases that breake this down?

@ex00
Copy link

ex00 commented Feb 13, 2017

Hi @tonycox, thanks for your reply.

Do you have any negative cases that breake this down?

Yes, i have. which is expected result, if any field in source file is empty, is null or default values or something else?

@tonycox
Copy link
Contributor Author

tonycox commented Feb 13, 2017

@ex00 default value of any empty field is null

@aljoscha
Copy link
Contributor

I'm closing this as "Abandoned", since there is no more activity and the code base has moved on quite a bit. Please re-open this if you feel otherwise and work should continue.

@aljoscha aljoscha closed this Oct 15, 2019
@aljoscha aljoscha self-assigned this Oct 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants