Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-8979] reintroduce mypy-protobuf stub generation #10734

Merged
merged 6 commits into from
Feb 26, 2020

Conversation

chadrik
Copy link
Contributor

@chadrik chadrik commented Jan 31, 2020

I think this should be resolved now by virtue of the dep being included in build-requirements.txt.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@chadrik
Copy link
Contributor Author

chadrik commented Jan 31, 2020

R: @robertwb
R: @udim

@chadrik chadrik changed the title [BEAM-8979] re-introduce mypy-protobuf stub generation [BEAM-8979] reintroduce mypy-protobuf stub generation Jan 31, 2020
@udim
Copy link
Member

udim commented Jan 31, 2020

retest this please

@udim
Copy link
Member

udim commented Jan 31, 2020

Run Python 3.7 Load Tests ParDo Dataflow Batch

@udim
Copy link
Member

udim commented Jan 31, 2020

Run BigQueryIO Write Performance Test Python Batch

@udim
Copy link
Member

udim commented Jan 31, 2020

run python 3.7 postcommit

@udim
Copy link
Member

udim commented Jan 31, 2020

Still seeing errors:

17:17:38 protoc-gen-mypy: program not found or is not executable
17:17:38 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
17:17:38 Traceback (most recent call last):
17:17:38   File "setup.py", line 308, in <module>
17:17:38     'mypy': generate_protos_first(mypy),
17:17:38   File "/home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch_PR/src/build/gradleenv/1922375555/local/lib/python2.7/site-packages/setuptools/__init__.py", line 145, in setup
17:17:38     return distutils.core.setup(**attrs)
17:17:38   File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
17:17:38     dist.run_commands()
17:17:38   File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
17:17:38     self.run_command(cmd)
17:17:38   File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
17:17:38     cmd_obj.run()
17:17:38   File "/home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch_PR/src/build/gradleenv/1922375555/local/lib/python2.7/site-packages/setuptools/command/sdist.py", line 44, in run
17:17:38     self.run_command('egg_info')
17:17:38   File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
17:17:38     self.distribution.run_command(command)
17:17:38   File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
17:17:38     cmd_obj.run()
17:17:38   File "setup.py", line 232, in run
17:17:38     gen_protos.generate_proto_files(log=log)
17:17:38   File "/home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch_PR/src/sdks/python/gen_protos.py", line 308, in generate_proto_files
17:17:38     '%s' % ret_code)
17:17:38 RuntimeError: Protoc returned non-zero status (see logs for details): 1

https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch_PR/11/console

@chadrik
Copy link
Contributor Author

chadrik commented Jan 31, 2020

@udim thanks for the heads up. My guess is that the mypy-protoc executable is not getting on PATH.

The tests did not show up in the PRs until you requested them, but I can see them here now. Can I run the test phrases myself now?

@chadrik
Copy link
Contributor Author

chadrik commented Jan 31, 2020

run python 3.7 postcommit

@chadrik
Copy link
Contributor Author

chadrik commented Jan 31, 2020

@udim I can't trigger the jenkins jobs. Also, it's unclear to me how to run this job locally via gradle. Can you give me some advice?

Also, do you have any idea what's different about the way these two jobs are run that would cause their virtualenv's bin dir to not be on the search PATH?

@udim
Copy link
Member

udim commented Jan 31, 2020

@udim thanks for the heads up. My guess is that the mypy-protoc executable is not getting on PATH.

The tests did not show up in the PRs until you requested them, but I can see them here now. Can I run the test phrases myself now?

I think the phrases are limited to Apache committers sadly. Not sure if this requirement will ever go away.

@udim
Copy link
Member

udim commented Jan 31, 2020

@udim I can't trigger the jenkins jobs. Also, it's unclear to me how to run this job locally via gradle. Can you give me some advice?

Also, do you have any idea what's different about the way these two jobs are run that would cause their virtualenv's bin dir to not be on the search PATH?

I ran:

$ git fetch origin pull/10734/head:pr10734
$ git checkout pr10734
$ ./gradlew :sdks:python:sdist
...
> Task :sdks:python:sdist
setup.py:244: UserWarning: You are using Apache Beam with Python 2. New releases of Apache Beam will soon support Python 3 only.
  'You are using Apache Beam with Python 2. '
/usr/local/google/home/ehudm/src/beam/build/gradleenv/1922375555/local/lib/python2.7/site-packages/setuptools/dist.py:476: UserWarning: Normalizing '2.20.0.dev' to '2.20.0.dev0'
  normalized_version,
No handlers could be found for logger "gen_protos"
Traceback (most recent call last):
  File "setup.py", line 308, in <module>
    'mypy': generate_protos_first(mypy),
  File "/usr/local/google/home/ehudm/src/beam/build/gradleenv/1922375555/local/lib/python2.7/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/usr/local/google/home/ehudm/src/beam/build/gradleenv/1922375555/local/lib/python2.7/site-packages/setuptools/command/sdist.py", line 44, in run
    self.run_command('egg_info')
  File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "setup.py", line 232, in run
    gen_protos.generate_proto_files()
  File "/usr/local/google/home/ehudm/src/beam/sdks/python/gen_protos.py", line 314, in generate_proto_files
    protoc_gen_mypy = _find_protoc_gen_mypy()
  File "/usr/local/google/home/ehudm/src/beam/sdks/python/gen_protos.py", line 233, in _find_protoc_gen_mypy
    (fname, ', '.join(search_paths)))
RuntimeError: Could not find protoc-gen-mypy in /usr/local/google/home/ehudm/src/beam/build/gradleenv/1922375555/bin, /usr/local/google/home/ehudm/src/beam/build/gradleenv/1922375555/bin, /usr/local/google/home/ehudm/.pyenv/plugins/pyenv-virtualenv/shims, /usr/local/google/home/ehudm/.pyenv/shims, /usr/local/google/home/ehudm/.pyenv/bin, /usr/lib/google-golang/bin, /usr/local/buildtools/java/jdk/bin, /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin, /google/data/ro/teams/iblaze, /google/data/ro/projects/devtools/rebaser, /usr/local/google/home/ehudm/bin

Scan of subsequent run:
https://gradle.com/s/nobazytmko36c

It fails at the sdist task.
Perhaps your shell already has this binary installed, which is why it works for you.

@chadrik
Copy link
Contributor Author

chadrik commented Feb 1, 2020

I think the phrases are limited to Apache committers sadly. Not sure if this requirement will ever go away.

Understood. I'm going to try to apply the necessary pressure at work to get CLA approved this week.

Scan of subsequent run: https://gradle.com/s/nobazytmko36c
It fails at the sdist task. Perhaps your shell already has this binary installed, which is why it works for you.

I tested this locally. I removed mypy_protobuf from the venv that I'm running gradle from, deleted the gradleenv directory, and ran ./gradlew :sdks:python:sdist and it worked fine for me locally. The gradleenv gets recreated and mypy_protobuf gets pip installed into it.

Here's the output that I get (after fixing the logging in the latest commit):

> Task :sdks:python:setupVirtualenv
Already using interpreter /Users/chad/dev/beam-tests/.venv-dev-typing/bin/python2.7
Using real prefix '/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7'
New python executable in /Users/chad/dev/beam-tests/beam-typing/build/gradleenv/1922375555/bin/python2.7
Also creating executable in /Users/chad/dev/beam-tests/beam-typing/build/gradleenv/1922375555/bin/python
Installing setuptools, pip, wheel...
done.
Looking in links: file:///Users/chad/.cache/pip/wheelhouse
Collecting tox==3.11.1
  Using cached tox-3.11.1-py2.py3-none-any.whl (76 kB)
Collecting grpcio-tools<=1.14.2,>=1.3.5
  Using cached grpcio_tools-1.14.2-cp27-cp27m-macosx_10_12_x86_64.whl (1.8 MB)
Processing /Users/chad/Library/Caches/pip/wheels/bf/c9/a3/c538d90ef17cf7823fa51fc701a7a7a910a80f6a405bf15b1a/future-0.16.0-cp27-none-any.whl
Processing /Users/chad/Library/Caches/pip/wheels/6e/95/9a/c57fed4950861038248dd0507909adc5661ffdd1887cb74ae2/mypy_protobuf-1.12-cp27-none-any.whl
Collecting six<2,>=1.0.0
  Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
Processing /Users/chad/Library/Caches/pip/wheels/66/13/60/ef107438d90e4aad6320e3424e50cfce5e16d1e9aad6d38294/filelock-3.0.12-cp27-none-any.whl
Collecting virtualenv>=14.0.0
  Using cached virtualenv-16.7.9-py2.py3-none-any.whl (3.4 MB)
Collecting pluggy<1,>=0.3.0
  Using cached pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
Collecting py<2,>=1.4.17
  Using cached py-1.8.1-py2.py3-none-any.whl (83 kB)
Collecting toml>=0.9.4
  Using cached toml-0.10.0-py2.py3-none-any.whl (25 kB)
Requirement already satisfied, skipping upgrade: setuptools>=30.0.0 in /Users/chad/dev/beam-tests/beam-typing/build/gradleenv/1922375555/lib/python2.7/site-packages (from tox==3.11.1) (44.0.0)
Collecting grpcio>=1.14.2
  Using cached grpcio-1.26.0-cp27-cp27m-macosx_10_9_x86_64.whl (2.3 MB)
Collecting protobuf>=3.5.0.post1
  Using cached protobuf-3.11.2-cp27-cp27m-macosx_10_9_x86_64.whl (1.3 MB)
Collecting importlib-metadata>=0.12; python_version < "3.8"
  Using cached importlib_metadata-1.5.0-py2.py3-none-any.whl (30 kB)
Collecting enum34>=1.0.4; python_version < "3.4"
  Using cached enum34-1.1.6-py2-none-any.whl (12 kB)
Collecting futures>=2.2.0; python_version < "3.2"
  Using cached futures-3.3.0-py2-none-any.whl (16 kB)
Collecting contextlib2; python_version < "3"
  Using cached contextlib2-0.6.0.post1-py2.py3-none-any.whl (9.8 kB)
Collecting pathlib2; python_version < "3"
  Using cached pathlib2-2.3.5-py2.py3-none-any.whl (18 kB)
Collecting zipp>=0.5
  Using cached zipp-1.1.0-py2.py3-none-any.whl (4.6 kB)
Collecting configparser>=3.5; python_version < "3"
  Using cached configparser-4.0.2-py2.py3-none-any.whl (22 kB)
Processing /Users/chad/Library/Caches/pip/wheels/91/95/75/19c98a91239878abbc7c59970abd3b4e0438a7dd5b61778335/scandir-1.10.0-cp27-cp27m-macosx_10_13_x86_64.whl
Installing collected packages: six, filelock, virtualenv, contextlib2, scandir, pathlib2, zipp, configparser, importlib-metadata, pluggy, py, toml, tox, enum34, futures, grpcio, protobuf, grpcio-tools, future, mypy-protobuf
Successfully installed configparser-4.0.2 contextlib2-0.6.0.post1 enum34-1.1.6 filelock-3.0.12 future-0.16.0 futures-3.3.0 grpcio-1.26.0 grpcio-tools-1.14.2 importlib-metadata-1.5.0 mypy-protobuf-1.12 pathlib2-2.3.5 pluggy-0.13.1 protobuf-3.11.2 py-1.8.1 scandir-1.10.0 six-1.14.0 toml-0.10.0 tox-3.11.1 virtualenv-16.7.9 zipp-1.1.0

> Task :sdks:python:sdist
INFO:gen_protos:Found protoc_gen_mypy at /Users/chad/dev/beam-tests/beam-typing/build/gradleenv/1922375555/bin/protoc-gen-mypy
...

What I noticed when looking at the full log from the gradle scan is this:

:sdks:python:setupVirtualenv UP-TO-DATE
:sdks:python:sdist FAILED

That's setupVirtualenv task is supposed to install build-requirements.txt which now contains the new mypy_protobuf requirement. If that task is cached and doesn't run then that would explain why it's not being installed. Do you think that could be the case?

@udim
Copy link
Member

udim commented Feb 4, 2020

I did a ./gradlew clean and then ./gradlew :sdks:python:sdist and it worked.

setupVirtualenv isn't cached in the Gradle sense (the output is not saved in Gradle's cache), but it is skipped if the output directory is already present. However, on Jenkins nodes the workspace should be cleaned up between runs and the gradleenv/ directories live in said workspace.

@chadrik
Copy link
Contributor Author

chadrik commented Feb 4, 2020

However, on Jenkins nodes the workspace should be cleaned up between runs and the gradleenv/ directories live in said workspace.

Then what does :sdks:python:setupVirtualenv UP-TO-DATE indicate? Your ./gradlew clean seems to confirm that this is a caching issue, so what's the next step to troubleshooting this?

@udim
Copy link
Member

udim commented Feb 4, 2020

The Jenkins logs also show that setupVirtualenv successful installed mypy-protobuf-1.12. I'll try again with the added debug output.

@udim
Copy link
Member

udim commented Feb 4, 2020

Run BigQueryIO Write Performance Test Python Batch

@udim
Copy link
Member

udim commented Feb 4, 2020

retest this please

@udim
Copy link
Member

udim commented Feb 4, 2020

Run Python 3.7 Load Tests ParDo Dataflow Batch

@chadrik
Copy link
Contributor Author

chadrik commented Feb 4, 2020

hmmm... so my new code found the executable, but protoc still complained. I wonder if it's not executable for some reason. I can't think of any explanation for why that would happen.

INFO:gen_protos:Found protoc_gen_mypy at /home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch_PR/src/build/gradleenv/1922375555/bin/protoc-gen-mypy
/home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch_PR/src/build/gradleenv/1922375555/bin/protoc-gen-mypy: program not found or is not executable

Are the normal python tox tests passing?

@udim
Copy link
Member

udim commented Feb 4, 2020

Okay so I ssh-ed into the Jenkins node and tried running the command:

-su: /home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch_PR/src/build/gradleenv/1922375555/bin/protoc-gen-mypy: /home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_: bad interpreter: No such file or directory

Basically, the file has a shebang line that's getting truncated.
I found this: pypa/virtualenv#596 (comment)

FYI the first line is this:

#!/home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch_PR/src/build/gradleenv/1922375555/bin/python2.7

@udim
Copy link
Member

udim commented Feb 4, 2020

Unless you have another solution @chadrik, I believe we'll need to shorten Jenkins job names. I think a limit of 20 characters should work.

@chadrik
Copy link
Contributor Author

chadrik commented Feb 4, 2020

Nice find. Well now we know what the two failing jobs have in common: really long names!

Shorter names would be a good change regardless. Those two job names feel a bit like a word salad right now.

How about these:

  • Python_BigQueryIO_Write_Perf
  • Python37_Dataflow_ParDo

I don't actually know what these do so I just picked out the most important sounding bits, and tried to harmonize them with our prevailing conventions. Feel free to have a go at it.

It would also be good to add a comment about this issue to the jenkins groovy code that generates these, or even add an assertion. Do you want to take this issue or would you like me to?

@udim
Copy link
Member

udim commented Feb 4, 2020

I did a lot of digging into distutils, and it seems that there is already a solution for this problem. Some packages have it, for example:

$ head /home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch_PR/src/build/gradleenv/1922375555/bin/easy_install
#!/bin/sh
'''exec' /home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch_PR/src/build/gradleenv/1922375555/bin/python2.7 "$0" "$@"
' '''

I'm not sure what's up with mypy-protobuf, but perhaps it's using another installation method?

@udim
Copy link
Member

udim commented Feb 4, 2020

As for shortened test names, please go ahead unless you think fixing mypy-protobuf would be faster.

The names should all start with beam_.
It probably makes sense to only touch Python related Jenkins jobs.
List of jobs:
https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md

@udim
Copy link
Member

udim commented Feb 4, 2020

Perhaps if we put in a wrapper script for mypy-protobuf...

@udim
Copy link
Member

udim commented Feb 5, 2020

retest this please

@udim
Copy link
Member

udim commented Feb 5, 2020

Run BigQueryIO Write Performance Test Python Batch

@chadrik
Copy link
Contributor Author

chadrik commented Feb 5, 2020

ok, almost there. there was a mistake in my setup.py. fixed it. (sorry to drag you along on this. I'm going to get the committers thing sorted out this week, hopefully today).

@udim
Copy link
Member

udim commented Feb 5, 2020

Run Python 3.7 Load Tests ParDo Dataflow Batch

@udim
Copy link
Member

udim commented Feb 5, 2020

BTW, did you mean to push changes here?

@chadrik
Copy link
Contributor Author

chadrik commented Feb 5, 2020

No, I just needed to push changes to mypy-protobuf, which I did.

@chadrik
Copy link
Contributor Author

chadrik commented Feb 5, 2020

@udim, this works but it's pip-installing mypy_protobuf from my github repo. I've opened a PR on that project to fix this, but there's no telling how long it will take them to merge it and make a new release to PyPI. Should we merge this as-is and make a new jira for fixing it when a new version of mypy_protobuf, or should we wait for the official release?

@udim
Copy link
Member

udim commented Feb 8, 2020

I'd rather wait for an official release if you don't mind

@chadrik
Copy link
Contributor Author

chadrik commented Feb 11, 2020

I'd rather wait for an official release if you don't mind

Done! Ready to test and hopefully merge.

@nipunn1313
Copy link

🎉

@udim
Copy link
Member

udim commented Feb 12, 2020

retest this please

@udim
Copy link
Member

udim commented Feb 12, 2020

Run Python 3.7 Load Tests ParDo Dataflow Batch

@udim
Copy link
Member

udim commented Feb 12, 2020

Run BigQueryIO Write Performance Test Python Batch

@chadrik
Copy link
Contributor Author

chadrik commented Feb 13, 2020

ValueError: unsupported pickle protocol: 3

@udim any idea what's going on with the python failures?

@udim
Copy link
Member

udim commented Feb 14, 2020

Not sure, looking

@udim
Copy link
Member

udim commented Feb 14, 2020

Run Python PreCommit

@chadrik
Copy link
Contributor Author

chadrik commented Feb 19, 2020

The Python test has been stuck for days.

Btw, I've submitted my CLAs to the secretary, so hopefully I'll be able to solve these problems on my own soon.

@chadrik
Copy link
Contributor Author

chadrik commented Feb 22, 2020

Run Python PreCommit

@chadrik
Copy link
Contributor Author

chadrik commented Feb 22, 2020

retest this

@chadrik
Copy link
Contributor Author

chadrik commented Feb 23, 2020

@udim I used my new committer powers to push the tests along. All checks have passed!

@udim udim merged commit 631cb81 into apache:master Feb 26, 2020
@udim
Copy link
Member

udim commented Mar 2, 2020

I just pulled the latest changes locally and I can no longer run tox or setup.py sdist.
Reopened https://issues.apache.org/jira/browse/BEAM-8979

@chadrik
Copy link
Contributor Author

chadrik commented Mar 2, 2020

Arrg! This was working! Right? Do you have any idea if something changed?

@udim
Copy link
Member

udim commented Mar 2, 2020

It works through the Gradle tasks' virtualenvs, so Jenkins is fine. See bug for workaround/solution. Perhaps using pyproject.toml is the way to go.

@chadrik
Copy link
Contributor Author

chadrik commented Mar 2, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants