APScheduler main loop throws exception IOError 514 #45

Closed
agronholm opened this issue Dec 26, 2013 · 16 comments

Comments

@agronholm
Owner

Originally reported by: maoz_guttman (Bitbucket: maoz_guttman, GitHub: Unknown)


Hi,

I hit the "APScheduler main loop throws exception IOError 514" error described in issue #18.

It is not easy to reproduce, since it rarely happens and you have to run APScheduler for a long time with a lot of jobs, but it is easy to fix.

  1. The time.sleep API raises an "IOError: [Errno 514] Unknown error 514" exception on rare occasions. You might have to call time.sleep thousands of times in order to reproduce it. You can read about it here.
  2. The threading._Event.wait API calls time.sleep when the timeout argument is not None.
  3. The _main_loop function in scheduler.py calls the threading._Event.wait API.
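
To illustrate why a single timed wait can translate into many time.sleep calls, here is a simplified sketch of the polling loop that Python 2's threading module used for timed waits. It is not the actual standard library code: the name polling_wait and the injectable sleep parameter are hypothetical and exist only for illustration, and the sketch is written so that it also runs on Python 3.

```python
import threading
import time


def polling_wait(lock, timeout, sleep=time.sleep):
    """Try to acquire `lock` within `timeout` seconds by polling.

    Simplified illustration of a timed wait built on short sleeps.
    `polling_wait` and the `sleep` parameter are hypothetical names.
    Returns True if the lock was acquired, False on timeout.
    """
    endtime = time.monotonic() + timeout
    delay = 0.0005  # start with a very short sleep
    while True:
        if lock.acquire(False):  # non-blocking attempt
            return True
        remaining = endtime - time.monotonic()
        if remaining <= 0:
            return False
        # Back off exponentially, capped at 50 ms and at the time remaining
        delay = min(delay * 2, remaining, 0.05)
        sleep(delay)  # every iteration is one more chance for sleep to fail
```

With a wait of several seconds, the loop settles at roughly twenty sleep calls per second, so a scheduler that runs for weeks makes a very large number of them.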

A simple way to work around it is to wrap "self._wakeup.wait(wait_seconds)" in the _main_loop function in scheduler.py in a try-except block. Something like:

#!python
def _main_loop(self):
  ...
  try:
    self._wakeup.wait(wait_seconds)
  except IOError:
    # Ignore the rare "IOError: [Errno 514]" raised via time.sleep
    pass
  ...
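
A narrower variant of the same idea is sketched below, assuming that only errno 514 should be swallowed and that any other IOError should still propagate. The helper name wait_ignoring_errno_514 is hypothetical and is not part of APScheduler. The constant is defined by hand because Python's errno module does not list 514; in the Linux kernel sources that value is ERESTARTNOHAND, an internal restart code that is not supposed to reach user space.

```python
import threading

# Kernel-internal Linux errno; defined here because Python's errno module
# has no name for it. The constant name mirrors the kernel header.
ERESTARTNOHAND = 514


def wait_ignoring_errno_514(event, timeout):
    """Wait on `event`, swallowing only IOError with errno 514.

    Hypothetical helper for illustration; any other IOError is re-raised.
    Returns True if the event is set, False otherwise.
    """
    try:
        return event.wait(timeout)
    except IOError as exc:
        if exc.errno != ERESTARTNOHAND:
            raise
        # Treat the spurious error like an early wakeup
        return event.is_set()
```

Inside a loop such as _main_loop, a call like wait_ignoring_errno_514(self._wakeup, wait_seconds) would take the place of the bare wait call. Whether the narrower check is preferable to catching every IOError is a judgment call.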

Are you willing to do that fix?

Thanks,

Maoz


@agronholm
Owner Author

Original comment by Alex Grönholm (Bitbucket: agronholm, GitHub: agronholm):


What Linux kernel version are you running?

@agronholm
Owner Author

Original comment by maoz_guttman (Bitbucket: maoz_guttman, GitHub: Unknown):


#!tcsh
% uname --kernel-release
2.6.16.60-0.58.1.3835.0.PTF.638363-smp

@agronholm
Owner Author

Original comment by Alex Grönholm (Bitbucket: agronholm, GitHub: agronholm):


Wow, that is ancient. Can you verify that this is still a problem on present day kernels?

@agronholm
Owner Author

Original comment by Alex Grönholm (Bitbucket: agronholm, GitHub: agronholm):


From what I found out, this seems to be a problem when the select() is interrupted by a signal. That might be a way to reproduce the issue more quickly.

@agronholm
Owner Author

Original comment by maoz_guttman (Bitbucket: maoz_guttman, GitHub: Unknown):


In my Python script I call the time.sleep API in a few places, and I had to wrap each call in a try-except block because I hit that IOError 514 exception.

@agronholm
Owner Author

Original comment by Alex Grönholm (Bitbucket: agronholm, GitHub: agronholm):


Yes, but can you reproduce this problem by interrupting time.sleep() with a signal?

@agronholm
Owner Author

Original comment by maoz_guttman (Bitbucket: maoz_guttman, GitHub: Unknown):


I didn't try, since the try-except block "workaround" is good enough.

@agronholm
Owner Author

Original comment by maoz_guttman (Bitbucket: maoz_guttman, GitHub: Unknown):


Next week I will try sending a Unix signal during time.sleep() and will update you afterwards.

@agronholm
Owner Author

Original comment by Alex Grönholm (Bitbucket: agronholm, GitHub: agronholm):


Good, because if this is only a bug in old Linux kernels, I don't want to add such a workaround.

@agronholm
Owner Author

Original comment by maoz_guttman (Bitbucket: maoz_guttman, GitHub: Unknown):


Update: I am trying to reproduce it by calling the time.sleep API, but without success (yet).

@agronholm
Owner Author

Original comment by maoz_guttman (Bitbucket: maoz_guttman, GitHub: Unknown):


Hi,

I ran time.sleep in a dummy script (using threads) for more than a week and have not yet hit the "IOError 514 exception".
I know that the "IOError 514 exception" can happen, since it happened again in the production tool/environment inside the APScheduler package.

I would be grateful if you would patch APScheduler as I described at the beginning of this thread.

Thanks,
Maoz

@agronholm
Owner Author

Original comment by Alex Grönholm (Bitbucket: agronholm, GitHub: agronholm):


But did you try interrupting the sleep with signals?

@agronholm
Owner Author

Original comment by maoz_guttman (Bitbucket: maoz_guttman, GitHub: Unknown):


Yes. I did it in a naïve way.
I ran a Python program that just calls time.sleep(100).
Then I ran kill on it from the Unix prompt with various signals (e.g. SIGTERM, etc.), but the "IOError 514 exception" did not happen.

Is that what you meant by interrupting the sleep with signals?
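
The experiment described above can also be scripted so that it is repeatable. The following is a minimal sketch, assuming a Unix system and that it runs in the main thread (Python only allows signal handlers to be installed there). The function name sleep_with_signal and its parameters are hypothetical. Note that on Python 3.5 and later time.sleep resumes after the handler returns, so a modern interpreter is expected to sleep for the full duration without raising.

```python
import os
import signal
import threading
import time


def sleep_with_signal(sleep_seconds=0.2, signal_after=0.05):
    """Sleep in the main thread while a timer thread sends SIGUSR1.

    Hypothetical helper for illustration. Returns a tuple
    (handler_ran, caught_errno), where caught_errno is None when
    time.sleep did not raise IOError.
    """
    received = []

    def handler(signum, frame):
        received.append(signum)  # just record that the signal arrived

    previous = signal.signal(signal.SIGUSR1, handler)
    timer = threading.Timer(
        signal_after, os.kill, args=(os.getpid(), signal.SIGUSR1)
    )
    caught_errno = None
    timer.start()
    try:
        time.sleep(sleep_seconds)
    except IOError as exc:
        caught_errno = exc.errno  # 514 would show up here if reproduced
    finally:
        timer.join()
        signal.signal(signal.SIGUSR1, previous)  # restore the old handler
    return bool(received), caught_errno
```

Calling this in a loop and logging any non-None caught_errno would be one way to hunt for the error on a given kernel, although the thread gives no evidence that a signal is what triggers it.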

@agronholm
Owner Author

Original comment by Alex Grönholm (Bitbucket: agronholm, GitHub: agronholm):


Yes. I wish there was a way to reproduce the issue.
Is it possible for you to try running your software on a more recent Linux just to see if the problem occurs there too? I'd like to get to the bottom of this before I make a decision. I'm very reluctant to ignore exceptions if I don't understand when and why they are happening. I'm not rejecting your proposal, but I want to be clear about why it's needed.

@agronholm
Owner Author

Original comment by Alex Grönholm (Bitbucket: agronholm, GitHub: agronholm):


I decided to include this fix in 2.1.2. Thank you.

@agronholm
Owner Author

Original comment by maoz_guttman (Bitbucket: maoz_guttman, GitHub: Unknown):


Thanks.
