ddel: AttributeError: type object 'HttpError' has no attribute 'resp' #133

Closed
de-code opened this issue Oct 22, 2018 · 8 comments

@de-code

de-code commented Oct 22, 2018

I've created too many jobs "by accident" (or rather I hoped it would re-use compute engines). When I tried to delete all of them using: ddel --provider google-v2 --project my-project-name --jobs '*'

I am getting the following exception at some point:

Traceback (most recent call last):
  File "/path/to/venv/bin/ddel", line 11, in <module>
    sys.exit(main())
  File "/path/to/venv/local/lib/python2.7/site-packages/dsub/commands/ddel.py", line 137, in main
    create_time_min=create_time)
  File "/path/to/venv/local/lib/python2.7/site-packages/dsub/commands/ddel.py", line 184, in ddel_tasks
    user_ids, job_ids, task_ids, labels, create_time_min, create_time_max)
  File "/path/to/venv/local/lib/python2.7/site-packages/dsub/providers/google_v2.py", line 1069, in delete_jobs
    tasks)
  File "/path/to/venv/local/lib/python2.7/site-packages/dsub/providers/google_base.py", line 445, in cancel
    batch_fn, cancel_fn, ops[first_op:first_op + max_batch])
  File "/path/to/venv/local/lib/python2.7/site-packages/dsub/providers/google_base.py", line 409, in _cancel_batch
    batch.execute()
  File "/path/to/venv/local/lib/python2.7/site-packages/dsub/providers/google_v2.py", line 400, in execute
    self._response_handler(request_id, response, exception)
  File "/path/to/venv/local/lib/python2.7/site-packages/dsub/providers/google_base.py", line 383, in handle_cancel_response
    msg = 'error %s: %s' % (exception.resp.status, exception.resp.reason)
AttributeError: type object 'HttpError' has no attribute 'resp'

It could be that there are so many tasks to delete. HttpError should have the resp set in the constructor, not sure why it hasn't in that case. Maybe it's a different object (although the full classname was googleapiclient.errors.HttpError).

I got around it by putting a try/except in google_base, something like the following (obviously a workaround, not a proper solution, but enough to get everything deleted, which took a while):

      try:
        msg = 'error %s: %s' % (exception.resp.status, exception.resp.reason)
        if exception.resp.status == FAILED_PRECONDITION_CODE:
          detail = json.loads(exception.content)
          status = detail.get('error', {}).get('status')
          if status == FAILED_PRECONDITION_STATUS:
            msg = 'Not running'
      except AttributeError:
        msg = 'error %s' % exception
@mbookman
Contributor

Thanks for reporting this @de-code!

We can certainly work around this, although I'd like to be able to reproduce it or otherwise reason about why you are seeing this. We have not seen this particular error in our end-to-end tests that call ddel, and looking at the HttpError constructor, the "resp" member has been there for the last 8 years.

Are you able to reproduce this error consistently? Can you dump out anything further on this object using repr or inspect?
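
For anyone who hits this and wants to collect the requested details, here is a minimal sketch of that kind of inspection. `FakeHttpError` and `describe` are hypothetical names used only for illustration; the stand-in class assumes, as noted above, that the real constructor sets `resp` on the instance.

```python
import inspect


class FakeHttpError(Exception):
    """Hypothetical stand-in for googleapiclient.errors.HttpError."""

    def __init__(self, resp, content):
        super(FakeHttpError, self).__init__(content)
        # Assumption: like the real class, resp is set per instance.
        self.resp = resp
        self.content = content


def describe(obj):
    """Report whether obj is a class or an instance and whether it has resp."""
    return {
        'repr': repr(obj),
        'is_class': inspect.isclass(obj),
        'has_resp': hasattr(obj, 'resp'),
    }


# An instance carries resp; the bare class does not.
instance_info = describe(FakeHttpError(resp={'status': 400}, content='bad request'))
class_info = describe(FakeHttpError)
print(instance_info)
print(class_info)
```

If `is_class` comes back true for the object passed to the response handler, the handler received the exception class rather than a raised instance.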

@de-code
Author

de-code commented Oct 23, 2018

I have since been able to cancel all tasks and I am not sure I feel brave enough to get into the same situation again (I am a bit worried about the cost - unless I can limit the compute engines to a very small number).

I can give you more information about what led to it:

I created a tasks.tsv with around 5700 entries and fired that off. What I haven't reported is another issue: it failed and exited (with the --wait) but left the tasks running / pending. I can raise it but don't have more information than that.

All of the tasks are very short-lived (probably not the intended use case), so the overhead of creating a compute engine is much greater than the work itself. If it re-used the VM and just ran docker exec or docker run many times, the overhead would be much lower. The cloud console UI showed around 4 or 5 pages of compute engines running and being started. First I deleted them there, which probably wasn't a good move. Then I went to the Pipelines page, which only showed 32 pending tasks, and cancelled those as well. But it was still creating new compute engines. That is when I tried to cancel all of the tasks via ddel (eventually successfully). It's possible that the error was caused by deleting the VMs or by the tasks being short-lived.

@slagelwa

slagelwa commented Nov 7, 2018

I'm also working with a dsub user who just showed me the same error. He's using the latest version of dsub and has a tasks file with thousands of entries. The only difference is that they aren't deleting using a '*'; they are using a job id. The error is very similar:

xxxxxx@bxxxxx:~$ ddel --provider google-v2 --project phs-207015 --jobs 'bg-noise--xxxxx--181106-032254-56'
Delete running jobs:
user:
set(['xxxxxxx'])

job-id:
['bg-noise--xxxxxxxx--181106-032254-56']

Found 2544 tasks to delete.
Traceback (most recent call last):
  File "/usr/local/bin/ddel", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/dsub/commands/ddel.py", line 137, in main
    create_time_min=create_time)
  File "/usr/local/lib/python2.7/dist-packages/dsub/commands/ddel.py", line 184, in ddel_tasks
    user_ids, job_ids, task_ids, labels, create_time_min, create_time_max)
  File "/usr/local/lib/python2.7/dist-packages/dsub/providers/google_v2.py", line 1093, in delete_jobs
    tasks)
  File "/usr/local/lib/python2.7/dist-packages/dsub/providers/google_base.py", line 445, in cancel
    batch_fn, cancel_fn, ops[first_op:first_op + max_batch])
  File "/usr/local/lib/python2.7/dist-packages/dsub/providers/google_base.py", line 409, in _cancel_batch
    batch.execute()
  File "/usr/local/lib/python2.7/dist-packages/dsub/providers/google_v2.py", line 412, in execute
    self._response_handler(request_id, response, exception)
  File "/usr/local/lib/python2.7/dist-packages/dsub/providers/google_base.py", line 383, in handle_cancel_response
    msg = 'error %s: %s' % (exception.resp.status, exception.resp.reason)
AttributeError: type object 'HttpError' has no attribute 'resp'

@mbookman
Contributor

mbookman commented Nov 7, 2018

Thanks for reporting this Joe.
Would you be able to send me the output for "pip list" from their machine? I'm interested to see if there is a package versioning issue we should understand better.

In the meantime, you can drop in the code change suggested above to handle the original exception more cleanly so that the user can delete their tasks.

@slagelwa

slagelwa commented Nov 7, 2018

Package                  Version
------------------------ ------------------
asn1crypto               0.24.0
avro                     1.8.2
awscli                   1.15.62
boto                     2.49.0
botocore                 1.10.61
CacheControl             0.11.7
cachetools               2.1.0
certifi                  2018.1.18
chardet                  3.0.4
colorama                 0.3.9
crcmod                   1.7
cryptography             2.1.4
cwltool                  1.0.20180302231433
docutils                 0.14
dsub                     0.2.2
enum34                   1.1.6
future                   0.16.0
futures                  3.2.0
gax-google-logging-v2    0.8.3
gax-google-pubsub-v1     0.8.3
gcloud                   0.18.3
google-api-python-client 1.7.4
google-auth              1.5.0
google-auth-httplib2     0.0.3
google-compute-engine    2.8.3
google-gax               0.12.5
googleapis-common-protos 1.5.3
grpc-google-logging-v2   0.8.1
grpc-google-pubsub-v1    0.8.1
grpcio                   1.15.0
html5lib                 0.999999999
httplib2                 0.11.3
idna                     2.6
ipaddress                1.0.17
isodate                  0.6.0
jmespath                 0.9.3
keyring                  10.6.0
keyrings.alt             3.0
lockfile                 0.12.2
mistune                  0.8.3
oauth2client             4.1.2
parameterized            0.6.1
pip                      10.0.1
ply                      3.8
protobuf                 3.6.1
pyasn1                   0.4.3
pyasn1-modules           0.2.2
pycrypto                 2.6.1
pygobject                3.26.1
pyOpenSSL                17.5.0
pyparsing                2.2.0
python-dateutil          2.7.3
pytz                     2018.5
pyxdg                    0.25
PyYAML                   3.12
rdflib                   4.2.1
rdflib-jsonld            0.4.0
requests                 2.18.4
retrying                 1.3.3
rsa                      3.4.2
ruamel.yaml              0.15.34
s3transfer               0.1.13
schema-salad             2.6.20171201034858
SecretStorage            2.3.1
setuptools               39.0.1
shellescape              3.4.1
six                      1.11.0
SPARQLWrapper            1.7.6
tabulate                 0.8.2
typing                   3.6.2
uritemplate              3.0.0
urllib3                  1.22
webencodings             0.5
wheel                    0.30.0
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

@mbookman
Contributor

mbookman commented Nov 9, 2018

Thanks Joe.

One of our engineers was able to reproduce the problem.
It looks like the bug has always been there for the google_v2 provider, and the fix is a 1-line (1-character) change:

Line 415 change:

exception = sys.exc_info()[0]

to:

exception = sys.exc_info()[1]

We'll have the fix included in the next release.
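
To illustrate why that one character matters: `sys.exc_info()` returns a `(type, value, traceback)` tuple, so index 0 is the exception class and index 1 is the raised instance. The sketch below uses hypothetical stand-ins (`FakeHttpError`, `FakeResp`, `capture`) rather than the real googleapiclient classes or the dsub code, and assumes only that `resp` is set in the constructor.

```python
import collections
import sys

# Hypothetical stand-in for the HTTP response object carried by the error.
FakeResp = collections.namedtuple('FakeResp', ['status', 'reason'])


class FakeHttpError(Exception):
    """Hypothetical stand-in for googleapiclient.errors.HttpError."""

    def __init__(self, resp):
        super(FakeHttpError, self).__init__('%s %s' % (resp.status, resp.reason))
        self.resp = resp  # set on the instance only, never on the class


def capture(index):
    """Raise the error and return the chosen element of sys.exc_info()."""
    try:
        raise FakeHttpError(FakeResp(400, 'Bad Request'))
    except FakeHttpError:
        return sys.exc_info()[index]


exc_type = capture(0)   # the class itself: what the buggy line passed along
exc_value = capture(1)  # the raised instance: what the fixed line passes along

# The instance has resp, so formatting the message works.
msg = 'error %s: %s' % (exc_value.resp.status, exc_value.resp.reason)

# The class does not, which reproduces the shape of the reported error.
try:
    exc_type.resp
    class_error = None
except AttributeError as e:
    class_error = str(e)

print(msg)
print(class_error)
```

The "type object ... has no attribute" wording in the original traceback is the hint that a class, not an instance, reached the handler.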

@slagelwa

slagelwa commented Nov 9, 2018 via email

@mbookman
Contributor

This was fixed in release 0.2.3 (#136)
