Better catch the error when pip install #233
liyanhui1228 merged 12 commits into GoogleCloudPlatform:master
Conversation
| r'not install packages due to an EnvironmentError: (?P<error>.*)') | ||
| # Pattern for pip installation error related to the package being | ||
| # installed. | ||
| PIP_INSTALL_ERROR_PATTERN = re.compile( |
If I understand correctly, the idea is that the only time that pip will return a non-zero error code that is package-related is when the package doesn't exist. And you think that is cleaner than the previous approach. Could you document that and maybe point to some source that shows that is true?
Yeah, I was thinking that if a Docker timeout occurs during the pip install process, the result is also treated as an INSTALL_ERROR, which it actually isn't. Other kinds of errors may also happen during installation, so we could probably restrict INSTALL_ERROR to mean only that the package or version doesn't exist.
A docker timeout should lead to a ReadTimeout, which results in
raise PipCheckerError(...)
Are you sure that a non-zero pip install return can only be one of:
- the package/version not existing
- some sort of internal error
Oh, I didn't mean that a non-zero pip install return code can only be one of those two cases. I was seeing results in our BigQuery data marked INSTALL_ERROR that apparently weren't real install errors but were caused by Docker timeouts. I changed raise_on_failure to True, which can also fix this.
LGTM - please address Brian's comment
brianquinlan
left a comment
I don't understand how a docker timeout can end up returning a non-500 result. Could you explain the scenario?
| stdout=True, | ||
| stderr=True, | ||
| raise_on_failure=False) | ||
| raise_on_failure=True) |
I don't think that this will work (see the comment 7 lines up).
Ahhh...interesting. 137 means that the process ended due to SIGKILL (http://tldp.org/LDP/abs/html/exitcodes.html). We could probably check for that error code specifically. If we really wanted to be paranoid, we could then check to confirm that the container has already exited, but that might be a bad idea.
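The 128 + signal-number convention mentioned here can be decoded in a few lines. This is only a sketch: `describe_returncode` is a hypothetical helper, not something in pip_checker, and it assumes a POSIX platform where the command's exit code follows the shell convention.

```python
import signal


def describe_returncode(returncode):
    """Return the signal name for a shell-style exit code, or None.

    Hypothetical helper: assumes the 128 + <signal number> convention.
    """
    if returncode <= 128:
        return None
    try:
        return signal.Signals(returncode - 128).name
    except ValueError:
        # Not a signal number known on this platform.
        return None


# 137 - 128 = 9, which is SIGKILL on Linux.
print(describe_returncode(137))
```

An ordinary failure such as exit code 1 yields None here, so it would not be mistaken for a signal.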
Yep, and we could probably just raise an error for that case; then it will return a 500 and will not be treated as INSTALL_ERROR.
And I have tested that I can reproduce this by setting the Docker timeout to a very short time, like 1 second.
brianquinlan
left a comment
Maybe this belongs in _run_command (since this could occur for any command). So something like:
def _run_command(
    ...
    if returncode > 128 and returncode <= 137:
        ...
    elif returncode and raise_on_failure:
        raise PipError(...)
| command=command, | ||
| returncode=returncode) | ||
|
|
||
| # Checking for error caused by sigkill (128+9) |
| # Checking for error caused by sigkill (128+9) | |
| # Checking for cases where the command was killed by a signal. | |
| # If a process was killed by a signal, then its exit code will be | |
| # 128 + <signal number>. | |
| # If a docker container exits with a running command then it will be | |
| # killed with SIGKILL => 128 + 9 = 137 |
| returncode=returncode) | ||
|
|
||
| # Checking for error caused by sigkill (128+9) | ||
| if returncode >= 128 and returncode <= 137: |
| if returncode >= 128 and returncode <= 137: | |
| if returncode > 128 and returncode <= 137: |
| # Checking for error caused by sigkill (128+9) | ||
| if returncode >= 128 and returncode <= 137: | ||
| raise PipCheckerError( | ||
| error_msg="The docker container timed out before executing" |
| error_msg="The docker container timed out before executing" | |
| error_msg="The command {0} was killed by the signal {1}. " |
| if returncode >= 128 and returncode <= 137: | ||
| raise PipCheckerError( | ||
| error_msg="The docker container timed out before executing" | ||
| " pip command. Error msg: {}".format(output)) |
| " pip command. Error msg: {}".format(output)) | |
| "This likely means that the Docker container timed out. " | |
| "Error msg: {2}".format(command, returncode - 128, output)) |
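Taken together, the suggestions in this thread amount to roughly the following check. This is a sketch under assumptions: `PipCheckerError` is a local stand-in for the project's exception (only its `error_msg` argument is taken from the diff), and `check_returncode` is a hypothetical helper, since the real `_run_command` signature is elided in the thread.

```python
class PipCheckerError(Exception):
    """Stand-in for the project's exception; only error_msg is assumed."""

    def __init__(self, error_msg):
        super().__init__(error_msg)
        self.error_msg = error_msg


def check_returncode(command, returncode, output):
    """Hypothetical helper mirroring the check proposed for _run_command."""
    # An exit code of 128 + <signal number> means the command was killed by
    # a signal; a container that exits mid-command uses SIGKILL (128 + 9).
    if 128 < returncode <= 137:
        raise PipCheckerError(
            error_msg="The command {} was killed by signal {}. "
                      "This likely means that the Docker container timed "
                      "out. Error msg: {}".format(
                          command, returncode - 128, output))
```

The lower bound is exclusive because 128 itself does not correspond to any signal number.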
Looks good but shouldn't there be a test for this?
| if duration > pip_checker.TIME_OUT: | ||
| raise docker.errors.APIError(message="time out", | ||
| explanation="Request time out.") |
Previously this didn't mock the Docker container timeout behavior correctly: a timeout should return a 137 exit code rather than raise a docker.errors.APIError. I changed the mock to match that behavior, and the test at line 123 covers the timeout error.
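A test for this behavior might look something like the sketch below. Everything in it is a simplified stand-in rather than the project's actual test code: `run_command`, `fake_timed_out_exec`, and the `pip install six` command are hypothetical, and the fake simply returns 137 the way a killed command would.

```python
import unittest


class PipCheckerError(Exception):
    """Stand-in for pip_checker.PipCheckerError."""


def run_command(exec_fn, command):
    """Hypothetical wrapper: exec_fn returns (returncode, output)."""
    returncode, output = exec_fn(command)
    if 128 < returncode <= 137:
        raise PipCheckerError(
            "The command {} was killed by signal {}.".format(
                command, returncode - 128))
    return returncode, output


def fake_timed_out_exec(command):
    """Simulates a container that exited while the command was running."""
    return 137, ""


class TimeoutTest(unittest.TestCase):

    def test_timeout_raises(self):
        with self.assertRaisesRegex(PipCheckerError, "killed by signal 9"):
            run_command(fake_timed_out_exec, "pip install six")
```

Returning the exit code from the fake, instead of raising from it, keeps the test exercising the same return-code path that a real timeout would hit.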
brianquinlan
left a comment
test_pip_checker.py line #130:
with patch_timeout, self.assertRaisesRegex(pip_checker.PipCheckerError, 'killed by signal 9')
| # killed with SIGKILL => 128 + 9 = 137 | ||
| if returncode > 128 and returncode <= 137: | ||
| raise PipCheckerError( | ||
| error_msg="The command {} was killed by the signal {}." |
| error_msg="The command {} was killed by the signal {}." | |
| error_msg="The command {} was killed by the signal {}. " |
| # killed with SIGKILL => 128 + 9 = 137 | ||
| if returncode > 128 and returncode <= 137: | ||
| raise PipCheckerError( | ||
| error_msg="The command {} was killed by the signal {}." |
| error_msg="The command {} was killed by the signal {}." | |
| error_msg="The command {} was killed by signal {}." |
| if returncode > 128 and returncode <= 137: | ||
| raise PipCheckerError( | ||
| error_msg="The command {} was killed by signal {}. " | ||
| "This likely means that the Docker container timed" |
| "This likely means that the Docker container timed" | |
| "This likely means that the Docker container timed " |
| 'self': False, | ||
| } | ||
| ) | ||
| if status_type is 'self-success': |

No description provided.