Skip to content
This repository was archived by the owner on Oct 21, 2022. It is now read-only.

Better catch the error when pip install#233

Merged
liyanhui1228 merged 12 commits intoGoogleCloudPlatform:masterfrom
liyanhui1228:install_error
Jan 30, 2019
Merged

Better catch the error when pip install#233
liyanhui1228 merged 12 commits intoGoogleCloudPlatform:masterfrom
liyanhui1228:install_error

Conversation

@liyanhui1228
Copy link
Copy Markdown
Member

No description provided.

Comment thread compatibility_server/pip_checker.py Outdated
Comment thread compatibility_server/pip_checker.py Outdated
r'not install packages due to an EnvironmentError: (?P<error>.*)')
# Pattern for pip installation error related to the package being
# installed.
PIP_INSTALL_ERROR_PATTERN = re.compile(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the idea is that the only time that pip will return a non-zero error code that is package-related is when the package doesn't exist. And you think that is cleaner than the previous approach. Could you document that and maybe point to some source that shows that is true?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking that if there is a docker timeout error during the pip install process, it will also be treated as an INSTALL_ERROR in the result, which actually isn't. And there might still be some other type of error happened during the installation, probably we could restrict that install error only means the package and version doesn't exist.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A docker timeout should lead to a ReadTimeout, which results in
raise PipCheckerError(...)

Are you sure that a non-zero pip install return can only be one of:

  1. the package/version not existing
  2. some sort of internal error

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't mean that a non-zero pip install returncode can only be the two cases. I was just seeing in our BigQuery data there were some results showing INSTALL_ERROR but apparently they weren't real install errors but because of docker timeout. I changed to turn the raise_on_failure which can also fix this.

@ylil93
Copy link
Copy Markdown
Contributor

ylil93 commented Jan 28, 2019

LGTM - please address Brian's comment

Copy link
Copy Markdown
Contributor

@brianquinlan brianquinlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how a docker timeout can end up returning a non-500 result. Could you explain the scenario?

Comment thread compatibility_server/pip_checker.py Outdated
stdout=True,
stderr=True,
raise_on_failure=False)
raise_on_failure=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this will work (see the comment 7 lines up).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In here if the docker was already killed, the container.exec_run will return a code 137 with no error message. And if raise_on_failure is False, the _run_command func will return 137, b''. Then in here, it will be treated as a INSTALL_ERROR, which leads to...
screenshot from 2019-01-28 17-27-35

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh...interesting. 137 means that the process ended due to sigkill (http://tldp.org/LDP/abs/html/exitcodes.html). We could probably check for that error code specifically. If we really wanted to be paranoid, when could then check to confirm that the container has already exited, but that might be a bad idea.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep and probably we could just raise an error for that case, then it will return a 500 and will not be treated as INSTALL_ERROR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I have tested that if I set the docker timeout to a very short time like 1 second, I could reproduce this.

Copy link
Copy Markdown
Contributor

@brianquinlan brianquinlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this belongs in _run_command (since this could occur for any command). So something like:

def _run_command(
   ...

    if returncode > 128 and returncode < 137:
        ...
    elif returncode and raise_on_failure:
      raise PipError(...)

Comment thread compatibility_server/pip_checker.py Outdated
command=command,
returncode=returncode)

# Checking for error caused by sigkill (128+9)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Checking for error caused by sigkill (128+9)
# Checking for cases where the command was killed by a signal.
# If a process was killed by a signal, then it's exit code will be
# 128 + <signal number>.
# If a docker container exits with a running command then it will be
# killed with SIGKILL => 128 + 9 = 137

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread compatibility_server/pip_checker.py Outdated
returncode=returncode)

# Checking for error caused by sigkill (128+9)
if returncode >= 128 and returncode <= 137:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if returncode >= 128 and returncode <= 137:
if returncode > 128 and returncode <= 137:

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread compatibility_server/pip_checker.py Outdated
# Checking for error caused by sigkill (128+9)
if returncode >= 128 and returncode <= 137:
raise PipCheckerError(
error_msg="The docker container timed out before executing"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
error_msg="The docker container timed out before executing"
error_msg="The command {0} was killed by the signal {1}. "

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread compatibility_server/pip_checker.py Outdated
if returncode >= 128 and returncode <= 137:
raise PipCheckerError(
error_msg="The docker container timed out before executing"
" pip command. Error msg: {}".format(output))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
" pip command. Error msg: {}".format(output))
"This likely means that the Docker container timed out. '
'Error msg: {}".format(command, returncode - 128, output))

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@brianquinlan
Copy link
Copy Markdown
Contributor

Looks good but shouldn't there be a test for this?


if duration > pip_checker.TIME_OUT:
raise docker.errors.APIError(message="time out",
explanation="Request time out.")
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't mocking the docker container timeout behavior correctly before, which should actually return a 137 code instead of raising a docker.errors.APIError. Changing this to match the behavior and the test in line 123 is for testing the timeout error.

Copy link
Copy Markdown
Contributor

@brianquinlan brianquinlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_pip_checker.py line #130:

with patch_timeout, self.assertRaisesRegex(pip_checker.PipCheckerError, 'killed by signal 9')

Comment thread compatibility_server/pip_checker.py Outdated
# killed with SIGKILL => 128 + 9 = 137
if returncode > 128 and returncode <= 137:
raise PipCheckerError(
error_msg="The command {} was killed by the signal {}."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
error_msg="The command {} was killed by the signal {}."
error_msg="The command {} was killed by the signal {}. "

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread compatibility_server/pip_checker.py Outdated
# killed with SIGKILL => 128 + 9 = 137
if returncode > 128 and returncode <= 137:
raise PipCheckerError(
error_msg="The command {} was killed by the signal {}."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
error_msg="The command {} was killed by the signal {}."
error_msg="The command {} was killed by signal {}."

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread compatibility_server/pip_checker.py Outdated
if returncode > 128 and returncode <= 137:
raise PipCheckerError(
error_msg="The command {} was killed by signal {}. "
"This likely means that the Docker container timed"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"This likely means that the Docker container timed"
"This likely means that the Docker container timed "

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

'self': False,
}
)
if status_type is 'self-success':
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@liyanhui1228 liyanhui1228 merged commit cb817cc into GoogleCloudPlatform:master Jan 30, 2019
@liyanhui1228 liyanhui1228 deleted the install_error branch January 30, 2019 19:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants