
[FLINK-4002] [py] Improve testing infrastructure #2063

Closed
wants to merge 6 commits

Conversation

@omaralvarez (Contributor) commented Jun 2, 2016

The Verify() test function no longer errors out when array elements are missing:

env.generate_sequence(1, 5)\
         .map(Id()).map_partition(Verify([1,2,3,4], "Sequence")).output()

I have also documented the test functions.

While documenting, two questions arose:

First, the Verify2 function has no use as is. Performing an if value in self.expected: check before the following:

try:
    self.expected.remove(value)
except Exception:
    raise Exception()

makes this function useless, since it will never raise an exception, if I am not mistaken.
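
To illustrate (a minimal hypothetical snippet, not the actual code from the test files):

expected = [1, 2, 3]

for value in [1, 2, 42]:        # 42 is not expected, yet nothing is reported
    if value in expected:       # the guard means remove() below can never fail...
        try:
            expected.remove(value)
        except Exception:       # ...so this branch is dead and no exception is ever raised
            raise Exception()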

Second, I am not sure why there are two test scripts, main_test.py and main_test2.py.

@zentol (Contributor) commented Jun 2, 2016

You are correct, Verify2 should be modified.

There are 2 test scripts since most operations run at the same time, eating up a lot of memory due to the memory-mapped files. Putting them all in one script can lead to OOM errors.

@omaralvarez (Contributor Author)

Ok, I will modify it and commit the corrected version.

@omaralvarez (Contributor Author) commented Jun 3, 2016

I have corrected Verify2(), but there is another case that is not checked: when the resulting dataset has more elements than expected, the error currently goes unnoticed.

I also wanted to ask: is there a way to execute only the Python tests? I want to unify the utilities in a single file, but without knowing the execution path, I cannot make sure the module will be imported correctly.

@zentol (Contributor) commented Jun 3, 2016

When we have more data than expected, remove() will be called on an empty list and should throw an exception, no?

If you want to execute the Python tests, you only have to call mvn verify on the flink-python package:

cd flink-libraries/flink-python
mvn verify

@omaralvarez (Contributor Author)

Sorry, I said it wrong; it's the opposite. The case that fails in Verify() and Verify2() is when we have more values in expected than in the resulting DataSet.

@zentol (Contributor) commented Jun 3, 2016

You're right, that falls through. We should add additional checks (roughly sketched below):

  • Verify: check that index equals the number of expected values
  • Verify2: check that expected is empty
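
Something along these lines, as a rough plain-Python sketch (not the actual MapPartition test classes; the function names and messages are just illustrative):

def verify(actual, expected, name):
    # Compare element by element, failing on any mismatch or extra value.
    index = 0
    for value in actual:
        if index >= len(expected) or value != expected[index]:
            raise Exception("%s failed: unexpected value %s at index %d" % (name, value, index))
        index += 1
    # New check: fail if fewer values arrived than expected.
    if index != len(expected):
        raise Exception("%s failed: only received %d of %d expected values" % (name, index, len(expected)))

def verify2(actual, expected, name):
    # Order-insensitive variant: remove every received value from the expected list.
    remaining = list(expected)
    for value in actual:
        try:
            remaining.remove(value)
        except ValueError:
            raise Exception("%s failed: unexpected value %s" % (name, value))
    # New check: fail if some expected values were never received.
    if remaining:
        raise Exception("%s failed: values %s were never received" % (name, remaining))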

@omaralvarez (Contributor Author) commented Jun 5, 2016

I have fixed the last errors in the test functions. But while trying to refactor the utility code, which is now repeated in both test files, I think I found another bug.

The thing is that, in order to have a separate utils.py file, we need to execute the tests as:

pyflink2.sh test_main.py utils.py

Right now, if HDFS is not used (our case, with env.execute(local=True)), the packages are not copied to the temp folder along with the script file, and the runner fails because it cannot locate the imported module. If we add the module to the PYTHONPATH, everything works fine, but I believe this should not be necessary. This is probably a matter for another JIRA issue altogether.
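
For reference, a minimal sketch of what I am running (the import path is written from memory and utils.py just holds the test helpers, so treat this as an approximation):

# test_main.py -- executed as: pyflink2.sh test_main.py utils.py
from utils import Id, Verify                        # fails at runtime when utils.py is not shipped with the plan

from flink.plan.Environment import get_environment  # Python API import path, written from memory

env = get_environment()
env.generate_sequence(1, 5)\
    .map(Id()).map_partition(Verify([1, 2, 3, 4, 5], "Sequence")).output()
env.execute(local=True)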

@zentol (Contributor) commented Jun 5, 2016

Test files are never copied to the tmp folder, regardless of whether HDFS is used or not. This is not a bug, but intended behaviour.

You can easily add utils.py to the test invocation by modifying the call to PythonPlanBinder.main in PythonPlanBinderTest. It should be as simple as appending "/utils.py" to the call.

@omaralvarez (Contributor Author) commented Jun 6, 2016

Yes, I know that the way Python scripts are executed for the tests is different. Let me elaborate:

Since running the tests is quite costly on my laptop, I normally test my changes by executing them on a local instance of Flink 1.0.3, which is less taxing. Once I complete the changes, I run mvn verify. The problem is that when I call pyflink2.sh test_main.py utils.py, the module that I pass to the test script is ignored unless I use HDFS, in which case everything works fine.

@omaralvarez (Contributor Author) commented Jun 6, 2016

I can confirm that there is a bug in PythonPlanBinder.java. When you pass packages to the runPlan() function, they are not copied to the temp directory in which the plan.py file is generated. I manually changed the prepareFiles() call to copy the package files, and everything worked fine.

Should I open another issue, or just fix it here? This is not only a bug in the testing infrastructure, but in the Python API in general. I can provide a small example where the code fails.

@zentol (Contributor) commented Jun 6, 2016

You can add it to this PR as a separate commit.

@zentol (Contributor) commented Jun 6, 2016

Looking at the code right now; I may have figured out why the files aren't copied, but I find it odd that it supposedly works with HDFS. It actually should never copy additional files if no parameters are given.

@omaralvarez (Contributor Author) commented Jun 6, 2016

Yes, I had the same thought looking at the code; I could not figure out why it worked with HDFS... I will try to fix it tomorrow.

@omaralvarez (Contributor Author)

I think everything should be ready now. I was not able to pinpoint why HDFS worked; I assume distributeFiles() copied the complete directory and that is why it worked, but I'm not 100% sure.

@omaralvarez (Contributor Author)

Is everything OK? Should I look into anything else?

@zentol (Contributor) commented Jun 9, 2016

Looks good, I'll merge it later on.

@omaralvarez (Contributor Author)

Thanks!
