Error checkpointing when no cache enabled #92

Closed
kylechard opened this Issue Feb 10, 2018 · 5 comments

@kylechard
Collaborator

kylechard commented Feb 10, 2018

When running without any app caching, we should catch the following error and raise a helpful one instead:

KeyError                                  Traceback (most recent call last)
in ()
     12 [d[i].result() for i in range(5)]
     13
---> 14 checkpoint_file = dfk.checkpoint()
     15 #print(checkpoint_file) # Prints the checkpoint dir
     16 #To load the checkpoint from a previous run specify the runinfo/RUNID directory:

/usr/local/lib/python3.5/dist-packages/parsl/dataflow/dflow.py in checkpoint(self)
    504 if self.tasks[task_id]['app_fu'].done() and
    505 not self.tasks[task_id]['checkpoint']:
--> 506 hashsum = self.tasks[task_id]['hashsum']
    507 if not hashsum:
    508 continue

KeyError: 'hashsum'

@kylechard kylechard changed the title from Error checkpointing without config to Error checkpointing when no cache enabled Feb 10, 2018

@yadudoc yadudoc added the bug label Feb 12, 2018

@yadudoc

Contributor

yadudoc commented Feb 12, 2018

@kylechard Were you able to work out what was happening here? I suspected that you might have messed up your ipp kernel by running different configs in the same notebook.

@kylechard

Collaborator

kylechard commented Feb 12, 2018

I think this was the error when no apps had cache=True set. In any case, I think we should catch this error and show a useful message.
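
A minimal sketch of the failure mode, assuming each task record is a plain dict that only gains a 'hashsum' key when the app was declared with cache=True. The `tasks` table and both helper functions are hypothetical stand-ins for illustration, not Parsl's actual data structures:

```python
# Hypothetical stand-in for the task table; field names other than
# 'checkpoint' and 'hashsum' are invented for this sketch.
tasks = {
    0: {"done": True, "checkpoint": False},                       # app without cache=True
    1: {"done": True, "checkpoint": False, "hashsum": "abc123"},  # app with cache=True
}


def collect_hashsums_unsafe(tasks):
    """Index 'hashsum' directly, as in the traceback; raises KeyError if it is missing."""
    return [
        record["hashsum"]
        for record in tasks.values()
        if record["done"] and not record["checkpoint"]
    ]


def collect_hashsums_safe(tasks):
    """Use dict.get so that tasks without a hashsum are skipped instead of crashing."""
    found = []
    for record in tasks.values():
        if record["done"] and not record["checkpoint"]:
            hashsum = record.get("hashsum")
            if not hashsum:
                continue
            found.append(hashsum)
    return found
```

With this table, the unsafe variant fails on task 0 with KeyError: 'hashsum', while the safe variant simply skips it.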

@yadudoc

Contributor

yadudoc commented Feb 14, 2018

I'm able to reproduce this.

Should we raise a warning when checkpoint is called and no apps were checkpointed? That would notify the user of the two possible cases:

  1. There are no apps that are cache-able
  2. There are no new apps since the last checkpoint

@kylechard

Collaborator

kylechard commented Feb 14, 2018

Yes, I think that would be useful.

yadudoc added a commit that referenced this issue Feb 14, 2018

@yadudoc

Contributor

yadudoc commented Feb 14, 2018

Now the user gets a more reasonable warning from the logger:
parsl.dataflow.dflow [WARNING] No tasks checkpointed, please ensure caching is enabled
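
The behaviour described here could be sketched roughly as follows. This is not the code from the referenced commit: `checkpoint_tasks` and the record layout are hypothetical, and only the logger name and the warning text are taken from the output above.

```python
import logging

# Logger name taken from the warning shown above.
logger = logging.getLogger("parsl.dataflow.dflow")


def checkpoint_tasks(tasks):
    """Hypothetical helper: mark eligible tasks as checkpointed and return the count.

    `tasks` is assumed to map task ids to plain dicts with optional
    'done', 'checkpoint' and 'hashsum' keys.
    """
    count = 0
    for record in tasks.values():
        if record.get("done") and not record.get("checkpoint"):
            # Apps without caching carry no hashsum, so skip them.
            if not record.get("hashsum"):
                continue
            record["checkpoint"] = True
            count += 1
    if count == 0:
        # Covers both cases: no cache-able apps, or nothing new since the last checkpoint.
        logger.warning("No tasks checkpointed, please ensure caching is enabled")
    return count
```

Calling the helper twice on the same table triggers the warning on the second call, which corresponds to the "no new apps since the last checkpoint" case.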

If the user has not enabled logging, none of this reaches the screen, so they will miss the warning. This should probably be addressed in issue #85.
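
Until then, one way a user might surface the warning is to attach a stream handler with Python's standard logging module. This is only a sketch: `enable_parsl_stream_logging` is a hypothetical helper, and it assumes the library logs under the "parsl" logger hierarchy, as the logger name above suggests.

```python
import logging


def enable_parsl_stream_logging(level=logging.WARNING, stream=None):
    """Hypothetical helper: send 'parsl.*' log records to a stream (stderr by default)."""
    logger = logging.getLogger("parsl")
    logger.setLevel(level)
    # StreamHandler falls back to sys.stderr when stream is None.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s [%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return handler
```

Calling `enable_parsl_stream_logging()` once before `dfk.checkpoint()` would then print warnings in the same "name [LEVEL] message" shape as the line quoted above.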

@yadudoc yadudoc closed this Feb 14, 2018

yadudoc added a commit that referenced this issue Feb 23, 2018

Bumping version from 0.4.0 to 0.4.1
Several fixes included as part of this point release:
* Cobalt provider issues with job state #101
* Parsl updates config inadvertently #98
* No blocks provisioned if parallelism/blocks = 0 #97
* Checkpoint restart assumes rundir bug #95
* Logger continues after cleanup called enhancement #93
* Error checkpointing when no cache enabled #92
* Several fixes to libsubmit.

New Providers:
* GoogleCloud
* GridEngine