
[FLINK-4002] [py] Improve testing infrastructure #2063

Closed
wants to merge 6 commits

Conversation

@omaralvarez (Contributor) commented Jun 2, 2016

The Verify() test function no longer errors out when array elements are missing:

env.generate_sequence(1, 5)\
         .map(Id()).map_partition(Verify([1,2,3,4], "Sequence")).output()

I have also documented the test functions.

While documenting, two questions arose:

First, the Verify2 function has no use as is. Performing an if value in self.expected: check before the following:

try:
    self.expected.remove(value)
except Exception:
    raise Exception()

makes this function useless, since it will never raise an exception, if I am not mistaken.
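
To illustrate (a minimal hypothetical snippet, not the actual code from the test files):

expected = [1, 2, 3]

for value in [1, 2, 42]:        # 42 is not expected, yet nothing is reported
    if value in expected:       # the guard means remove() below can never fail...
        try:
            expected.remove(value)
        except Exception:       # ...so this branch is dead and no exception is ever raised
            raise Exception()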

Second, I am not sure why there are two test scripts, main_test.py and main_test2.py.

@zentol (Contributor) commented Jun 2, 2016

You are correct, Verify2 should be modified.

There are 2 test scripts since most operations run at the same time, eating up a lot of memory due to the memory-mapped files. Putting them all in one script can lead to OOM errors.

@omaralvarez (Contributor Author)

Ok, I will modify it and commit the corrected version.

@omaralvarez (Contributor Author) commented Jun 3, 2016

I have corrected Verify2(), but there is another case that is not checked: when the resulting dataset has more elements than expected, the error currently goes unnoticed.

I also wanted to ask: is there a way to execute only the Python tests? I want to unify the utilities in a single file, but without knowing the execution path, I cannot make sure the module will be imported correctly.

@zentol (Contributor) commented Jun 3, 2016

When we have more data than expected, remove() will be called on an empty list and should throw an exception, no?

If you want to execute the Python tests, you only have to call mvn verify on the flink-python package:

cd flink-libraries/flink-python
mvn verify

@omaralvarez (Contributor Author)

Sorry, I said it wrong; it's the opposite. The case that fails in Verify() and Verify2() is when we have more values in expected than in the resulting DataSet.

@zentol (Contributor) commented Jun 3, 2016

You're right, that falls through. We should add additional checks (roughly sketched below):

  • Verify: check that index equals the number of expected values
  • Verify2: check that expected is empty
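
Something along these lines, as a rough plain-Python sketch (not the actual MapPartition test classes; the function names and messages are just illustrative):

def verify(actual, expected, name):
    # Compare element by element, failing on any mismatch or extra value.
    index = 0
    for value in actual:
        if index >= len(expected) or value != expected[index]:
            raise Exception("%s failed: unexpected value %s at index %d" % (name, value, index))
        index += 1
    # New check: fail if fewer values arrived than expected.
    if index != len(expected):
        raise Exception("%s failed: only received %d of %d expected values" % (name, index, len(expected)))

def verify2(actual, expected, name):
    # Order-insensitive variant: remove every received value from the expected list.
    remaining = list(expected)
    for value in actual:
        try:
            remaining.remove(value)
        except ValueError:
            raise Exception("%s failed: unexpected value %s" % (name, value))
    # New check: fail if some expected values were never received.
    if remaining:
        raise Exception("%s failed: values %s were never received" % (name, remaining))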

@omaralvarez (Contributor Author) commented Jun 5, 2016

I have fixed the last errors in the test functions. But while trying to refactor the utility code, which is now repeated in both test files, I think I found another bug.

The thing is that, in order to have a separate utils.py file, we need to execute the tests as:

pyflink2.sh test_main.py utils.py

Right now, if HDFS is not used (our case, with env.execute(local=True)), the packages are not copied to the temp folder along with the script file, and the runner fails because it cannot locate the imported module. If we add the module to the PYTHONPATH, everything works fine, but I believe this should not be necessary. This is probably a matter for another JIRA issue altogether.
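
For reference, a minimal sketch of what I am running (the import path is written from memory and utils.py just holds the test helpers, so treat this as an approximation):

# test_main.py -- executed as: pyflink2.sh test_main.py utils.py
from utils import Id, Verify                        # fails at runtime when utils.py is not shipped with the plan

from flink.plan.Environment import get_environment  # Python API import path, written from memory

env = get_environment()
env.generate_sequence(1, 5)\
    .map(Id()).map_partition(Verify([1, 2, 3, 4, 5], "Sequence")).output()
env.execute(local=True)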

@zentol (Contributor) commented Jun 5, 2016

Test files are never copied to the tmp folder, regardless of whether HDFS is used or not. This is not a bug, but intended behaviour.

You can easily add utils.py to the test invocation by modifying the call to PythonPlanBinder.main in PythonPlanBinderTest. It should be as simple as appending "/utils.py" to the call.

@omaralvarez (Contributor Author) commented Jun 6, 2016

Yes, I know that the way Python scripts are executed for the tests is different. Let me elaborate:

Since running the tests is quite costly on my laptop, I normally test my changes by executing them on a local instance of Flink 1.0.3, which is less taxing. Once I complete the changes, I run mvn verify. The problem is that when I call pyflink2.sh test_main.py utils.py, the module that I pass to the test script is ignored unless I use HDFS, in which case everything works fine.

@omaralvarez (Contributor Author) commented Jun 6, 2016

I can confirm that there is a bug in PythonPlanBinder.java. When you pass packages to the runPlan() function, they are not copied to the temp directory in which the plan.py file is generated. I manually changed the prepareFiles() call to copy the package files, and everything worked fine.

Should I open another issue, or just fix it here? This is not only a bug in the testing infrastructure, but in the Python API in general. I can provide a small example where the code fails.

@zentol (Contributor) commented Jun 6, 2016

You can add it to this PR as a separate commit.

@zentol (Contributor) commented Jun 6, 2016

Looking at the code right now; I may have figured out why the files aren't copied, but I find it odd that it supposedly works with HDFS. It actually should never copy additional files if no parameters are given.

@omaralvarez (Contributor Author) commented Jun 6, 2016

Yes, I had the same thought looking at the code; I could not figure out why it worked with HDFS... I will try to fix it tomorrow.

@omaralvarez (Contributor Author)

I think everything should be ready now. I was not able to pinpoint why HDFS worked; I assume distributeFiles() copied the complete directory and that is why it worked, but I'm not 100% sure.

@omaralvarez (Contributor Author)

Is everything OK? Should I look into anything else?

@zentol (Contributor) commented Jun 9, 2016

Looks good, I'll merge it later on.

@omaralvarez (Contributor Author)

Thanks!
