
Exiting with unfinished jobs #30

Closed
kldavis4 opened this issue Oct 7, 2017 · 8 comments

Comments

@kldavis4

kldavis4 commented Oct 7, 2017

I wrote an app that parses a large XML file containing around 17M items of interest. These are added to a queue and written to a web service using batch processing. When parsing is complete, the app exits (status code 0) with about half of the items left unwritten. Any idea what might be going on?

@leanderlee
Member

Node will exit if it does not have any callbacks waiting. Try attaching a listener to the drain event, or add a callback that waits for the last task to finish, to prevent the process from exiting early.
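A minimal sketch of the "wait for drain" advice, assuming a hypothetical MiniQueue class built on Node's EventEmitter (it is not better-queue, so the snippet runs on its own). Batches are processed asynchronously, 'drain' fires once the backlog is empty, and main() awaits that event with events.once before reporting. An awaited promise does not by itself keep Node alive; the pending timer or I/O inside the worker does that. Waiting on drain simply makes the program's notion of "done" match the queue's.

```javascript
const { EventEmitter, once } = require("node:events");

// Hypothetical stand-in for a batching queue; NOT the better-queue API.
// Error handling is omitted to keep the sketch short.
class MiniQueue extends EventEmitter {
  constructor(worker, batchSize = 100) {
    super();
    this.worker = worker; // async (batch) => void
    this.batchSize = batchSize;
    this.pending = [];
    this.running = false;
  }

  push(task) {
    this.pending.push(task);
    if (!this.running) {
      this.running = true;
      // Defer so that synchronous pushes accumulate into batches.
      setImmediate(() => this._run());
    }
  }

  async _run() {
    while (this.pending.length > 0) {
      const batch = this.pending.splice(0, this.batchSize);
      await this.worker(batch);
    }
    this.running = false;
    this.emit("drain"); // backlog empty and nothing in flight
  }
}

async function main() {
  let written = 0;
  const q = new MiniQueue(async (batch) => {
    // Simulate an async write, e.g. one bulk HTTP request per batch.
    await new Promise((resolve) => setTimeout(resolve, 1));
    written += batch.length;
  }, 250);

  for (let i = 0; i < 1000; i++) q.push({ n: i });

  // Treat the job as finished only once the queue has drained.
  await once(q, "drain");
  return written;
}

main().then((written) => console.log(`wrote ${written} tasks`));
```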

@kldavis4
Author

kldavis4 commented Oct 8, 2017

Awesome, thanks for your help.

@kldavis4
Author

kldavis4 commented Oct 9, 2017

I attached a listener for the drain event, and it fires only once, just before exiting. There are still 11.5M tasks remaining at that point. What is the best way to see how many tasks remain queued? My intuition is that tasks are getting lost somehow: I am not using a persistent store, and I would expect memory issues with that many queued tasks.

@leanderlee
Member

You can try calling queue.getStats() to see the total. It would be a bug if drain fired before all of the tasks had completed, unless tasks were being completed faster than they were being pushed. Do you have some code to reproduce this?

leanderlee reopened this Oct 9, 2017
@kldavis4
Author

kldavis4 commented Oct 9, 2017

Yeah, I am calling getStats in that drain handler, but that tells me how many tasks have been completed, not how many remain, right? Any tips on getting better visibility into the queue state would help a lot.

I do have code I can share, but it currently involves parsing the English Wikipedia article dump and posting it to Elasticsearch. I can try to reduce it to something more manageable, but that will take some time.
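Since getStats() is described in this thread as reporting completed work rather than the backlog, one way to see what is left is to count both sides yourself. ProgressTracker below is a hypothetical helper, not part of better-queue: call onPush() wherever a task is pushed and onFinish()/onFail() from the queue's per-task completion and failure callbacks; remaining is then the difference.

```javascript
// Hypothetical progress tracker: counts what went in versus what came out,
// so the backlog can be computed at any moment.
class ProgressTracker {
  constructor() {
    this.pushed = 0;
    this.finished = 0;
    this.failed = 0;
  }
  onPush() { this.pushed++; }
  onFinish() { this.finished++; }
  onFail() { this.failed++; }
  get remaining() {
    return this.pushed - this.finished - this.failed;
  }
  report() {
    return `pushed=${this.pushed} finished=${this.finished} ` +
      `failed=${this.failed} remaining=${this.remaining}`;
  }
}

const t = new ProgressTracker();
for (let i = 0; i < 10; i++) t.onPush();
for (let i = 0; i < 6; i++) t.onFinish();
t.onFail();
console.log(t.report()); // pushed=10 finished=6 failed=1 remaining=3
```

If remaining is still large when drain fires, the missing tasks never reached a worker, which suggests they were dropped or merged inside the queue rather than still waiting.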

@leanderlee
Copy link
Member

One way to get more visibility is to pass in your own store (or modify the memory store) and add a counter. You really only need to implement 5 functions, or stub them out, to see whether it's working properly.

https://github.com/diamondio/better-queue#custom-store
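A minimal sketch of the counting-store idea. CountingStore and its put/take/stats methods are hypothetical and deliberately do not follow better-queue's actual store interface (the README section linked above lists the real functions); the point is where the counters go. Every task that is put is either taken, still stored, or overwritten, so puts - overwrites should always equal taken + stored, and a nonzero overwrites count exposes tasks that were silently replaced.

```javascript
// Hypothetical in-memory task store with counters. Illustrative only;
// NOT better-queue's store interface.
class CountingStore {
  constructor() {
    this.tasks = new Map(); // taskId -> task, kept in insertion order
    this.puts = 0;          // every put() call
    this.overwrites = 0;    // put() calls that replaced an existing task
    this.taken = 0;         // tasks handed out to workers
  }

  put(taskId, task) {
    this.puts++;
    if (this.tasks.has(taskId)) this.overwrites++;
    this.tasks.set(taskId, task);
  }

  // Remove and return up to n tasks in insertion order.
  // Deleting the current entry while iterating a Map is safe in JS.
  take(n) {
    const batch = [];
    for (const [id, task] of this.tasks) {
      if (batch.length >= n) break;
      batch.push(task);
      this.tasks.delete(id);
    }
    this.taken += batch.length;
    return batch;
  }

  stats() {
    return {
      puts: this.puts,
      overwrites: this.overwrites,
      taken: this.taken,
      stored: this.tasks.size,
    };
  }
}

const store = new CountingStore();
store.put("a", { n: 1 });
store.put("b", { n: 2 });
store.put("a", { n: 3 }); // same ID: replaces the task with n = 1
store.take(10);
console.log(store.stats()); // { puts: 3, overwrites: 1, taken: 2, stored: 0 }
```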

@kldavis4
Copy link
Author

kldavis4 commented Oct 9, 2017

Ok, great. Let me give that a go and I'll report what I find.

@kldavis4
Copy link
Author

kldavis4 commented Oct 9, 2017

OK, mystery solved. My task objects had a non-unique id field derived from my input, and I hadn't explicitly configured a unique ID, so a large number of tasks were getting merged. Thanks for your help in troubleshooting.
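A small illustration of the failure mode described above, using hypothetical data and a plain Map in place of the queue's store. When tasks are keyed by a field that repeats in the input, later tasks replace earlier ones; keying by a generated ID (here crypto.randomUUID(), available in Node 14.17 and later) keeps every task. In this sketch the later task simply wins; better-queue's own merge behavior may differ, so check its README for how to configure the task ID.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical parsed items: `id` comes from the input data and repeats.
const items = [
  { id: "Paris", body: "first" },
  { id: "Paris", body: "second" },
  { id: "Rome", body: "third" },
];

// Stand-in for a store keyed by task ID: a later task with the same ID
// replaces the earlier one.
function enqueueById(tasks, idOf) {
  const byId = new Map();
  for (const task of tasks) byId.set(idOf(task), task);
  return byId;
}

const merged = enqueueById(items, (task) => task.id);
const unique = enqueueById(items, () => randomUUID());

console.log(merged.size); // 2: the "first" task was replaced by "second"
console.log(unique.size); // 3: every task survives
```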


2 participants