
Update Phase has an issue where not all jobs come through and an infinite wait loop is started #18

Closed
6 of 7 tasks
andygello555 opened this issue Feb 23, 2023 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@andygello555
Owner

andygello555 commented Feb 23, 2023

  • Add a timeout to the Scout procedure that will forcibly stop it after a certain amount of time
    • ScoutTimeout constant + update README
    • Start a timer at the start of the Scout procedure that will throw a panic if the time is reached
    • Update the deferred error email sender to also recover from any panics, as well as stop the timer if it hasn't already been stopped (see the sketch after this list)
  • Add a timeout to the producers in the Update Phase that will drop the current batch of jobs if there have been no finished jobs for a while
  • Find out why the bug is occurring
    • Race test the Update Phase

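A minimal sketch of what the first two sub-tasks could look like, assuming a hypothetical Scout entry point that runs a list of phases; the ScoutTimeout value and the commented-out sendErrorEmail helper are illustrative, not the project's actual definitions:

```go
package scout

import (
	"fmt"
	"time"
)

// ScoutTimeout is an illustrative default; the real value lives in Scrape.Constants.
const ScoutTimeout = 2 * time.Hour

// Scout is a stand-in for the real procedure; phases represents its sub-steps.
func Scout(phases []func() error) (err error) {
	// Start a timer at the beginning of the procedure.
	timeout := time.NewTimer(ScoutTimeout)

	// The deferred error-email sender also recovers from any panic (most
	// likely the timeout panic below) and stops the timer if it hasn't fired.
	defer func() {
		timeout.Stop()
		if p := recover(); p != nil {
			err = fmt.Errorf("scout panicked: %v", p)
			// sendErrorEmail(err) // hypothetical email sender
		}
	}()

	for _, phase := range phases {
		select {
		case <-timeout.C:
			// Forcibly stop the procedure once the timeout is reached.
			panic(fmt.Sprintf("Scout procedure exceeded its timeout of %s", ScoutTimeout))
		default:
			if err = phase(); err != nil {
				return err
			}
		}
	}
	return nil
}
```

Note that a panic raised from a time.AfterFunc callback would run on its own goroutine and escape Scout's deferred recover, which is why this sketch checks the timer between phases and panics from the Scout goroutine itself.
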
Monday.com Item ID: #4034318492

@andygello555 andygello555 self-assigned this Feb 23, 2023
@andygello555 andygello555 added bug Something isn't working and removed bug Something isn't working labels Feb 23, 2023
andygello555 added a commit that referenced this issue Feb 23, 2023
- Added a timeout timer to the Scout procedure that will cause a panic if the timeout is reached (23/02/2023 - 11:30:28)
- Added the ScoutTimeout constant to Scrape.Constants as well as adding documentation for this to the README (23/02/2023 - 11:31:32)
- The deferred function that is used to send Error emails in the Scout procedure is also now checking for panics that occur (most likely from the timeout timer) (23/02/2023 - 11:33:53)
@andygello555
Owner Author

andygello555 commented Feb 23, 2023

Can't find the bug at the moment and don't have enough time to keep looking. If this happens again, the timeout should hopefully stop the Scout procedure, and machinery will then restart/resume the procedure for that day.

andygello555 added a commit that referenced this issue Feb 23, 2023
- Added a uuid.UUID field to both updateDeveloperJob and updateDeveloperResult as a sanity check for bug #18 (23/02/2023 - 13:09:39)
- This now means that the finishedJobs variable in UpdatePhase is a channel of uuid.UUID instead of ints (23/02/2023 - 13:10:23)
- queueDeveloperRange now returns a set of uuid.UUIDs of the queued updateDeveloperJobs, this is then used by the producer to tick off each finished job that comes in from the consumer rather than checking the cardinality against high-low (23/02/2023 - 13:13:40)
- Hopefully, this will mean that any jobs that aren't queued up by queueDeveloperRange will not be taken into account (can't happen anyway, but sanity checks) (23/02/2023 - 13:19:23)
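A rough sketch of the UUID bookkeeping these commits describe, using github.com/google/uuid for illustration; the field names and the queueDeveloperRange signature here are simplified stand-ins rather than the project's exact definitions:

```go
package update

import "github.com/google/uuid"

type updateDeveloperJob struct {
	ID uuid.UUID
	// ... developer to update ...
}

type updateDeveloperResult struct {
	ID uuid.UUID
	// ... scrape results ...
}

// queueDeveloperRange queues a batch of jobs and returns the set of their
// UUIDs, so the producer can tick each finished job off against the set
// rather than comparing a count of finished jobs against high-low.
func queueDeveloperRange(jobs chan<- updateDeveloperJob, low, high int) map[uuid.UUID]struct{} {
	queued := make(map[uuid.UUID]struct{})
	for i := low; i < high; i++ {
		job := updateDeveloperJob{ID: uuid.New()}
		jobs <- job
		queued[job.ID] = struct{}{}
	}
	return queued
}
```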
@andygello555 andygello555 reopened this Apr 5, 2023
andygello555 added a commit that referenced this issue Apr 17, 2023
- Updated twitter.ClientWrapper struct and methods to better synchronise ClientWrapper.RateLimits and ClientWrapper.TweetCap (17/04/2023 - 12:05:17)
- Added the twitter.ClientWrapper.RateLimit method which returns the latest rate limit for the given BindingType. This was in order to make accessing the sync.Map that holds the rate limits a bit easier (17/04/2023 - 12:06:05)
- Access token that is held in reddit.Client can now only be accessed and set using methods. This is because there is now a RWMutex that manages synchronisation for it (17/04/2023 - 12:06:58)
- Changed all accesses to ClientWrapper.RateLimits in update.go to instead use the ClientWrapper.RateLimit method (17/04/2023 - 12:07:56)
- Updated gapi to v1.0.1 to consolidate synchronisation fixes (17/04/2023 - 12:10:15)
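This commit describes two synchronisation patterns; below is a rough, standalone sketch of both, with simplified stand-ins for twitter.ClientWrapper and reddit.Client rather than the real definitions:

```go
package clients

import "sync"

type BindingType string

type RateLimit struct {
	Remaining int
	// ... reset time, cap, etc. ...
}

type ClientWrapper struct {
	// RateLimits maps BindingType -> *RateLimit and is safe for concurrent use.
	RateLimits sync.Map
}

// RateLimit returns the latest rate limit for the given BindingType, hiding
// the type assertion needed to read from the sync.Map.
func (w *ClientWrapper) RateLimit(binding BindingType) (*RateLimit, bool) {
	value, ok := w.RateLimits.Load(binding)
	if !ok {
		return nil, false
	}
	return value.(*RateLimit), true
}

type Client struct {
	accessTokenMutex sync.RWMutex
	accessToken      string
}

// AccessToken takes a read lock so concurrent readers don't race with SetAccessToken.
func (c *Client) AccessToken() string {
	c.accessTokenMutex.RLock()
	defer c.accessTokenMutex.RUnlock()
	return c.accessToken
}

// SetAccessToken takes the write lock so token refreshes are serialised.
func (c *Client) SetAccessToken(token string) {
	c.accessTokenMutex.Lock()
	defer c.accessTokenMutex.Unlock()
	c.accessToken = token
}
```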
andygello555 added a commit that referenced this issue Apr 17, 2023
- Producers in the Update phase will now drop the current batch of jobs if they have not received a finished job in a while (17/04/2023 - 13:30:21)
- The above behaviour is controlled by 2 new scrape constants: UpdateProducerFinishedJobTimeout and UpdateProducerFinishedJobMaxTimeouts (17/04/2023 - 13:30:50)
- These constants + explanations have been added to the README (17/04/2023 - 13:31:17)
- Some scrape constants in the README were missing a default value, this has now been resolved (17/04/2023 - 13:31:43)
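Illustrative declarations for the two new constants; the values shown here are assumptions, the real defaults live in Scrape.Constants and the README:

```go
package update

import "time"

const (
	// How long a producer waits for a finished job before counting a timeout.
	UpdateProducerFinishedJobTimeout = 2 * time.Minute
	// How many consecutive timeouts are tolerated before the batch of jobs is dropped.
	UpdateProducerFinishedJobMaxTimeouts = 3
)
```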
@andygello555
Owner Author

I think the bug was caused by the creation and closure of the finished jobs queue within the producer. I've made it so that the finished jobs queue has a fixed length (len(unscrapedDevelopers)) in #2, and the producer will keep dequeuing (or waiting) until all jobs have been seen.

I also added a maximum number of consecutive waits; the batch of jobs will be dropped if this is exceeded.
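A sketch of what the producer's wait loop could look like after this change, reusing the constants and the queued UUID set from the sketches above (the snippets can be read as files of the same hypothetical update package); the channel itself would be created once with a fixed capacity, e.g. make(chan uuid.UUID, len(unscrapedDevelopers)), rather than being created and closed inside the producer:

```go
package update

import (
	"time"

	"github.com/google/uuid"
)

// waitForFinishedJobs ticks finished jobs off against the set returned by
// queueDeveloperRange. It keeps dequeuing (or waiting) until every queued job
// has been seen, and reports that the batch should be dropped once
// UpdateProducerFinishedJobMaxTimeouts timeouts occur in a row.
func waitForFinishedJobs(queued map[uuid.UUID]struct{}, finishedJobs <-chan uuid.UUID) (dropped bool) {
	timeouts := 0
	for len(queued) > 0 {
		select {
		case id := <-finishedJobs:
			delete(queued, id) // a UUID that was never queued is simply ignored
			timeouts = 0       // any progress resets the consecutive-timeout count
		case <-time.After(UpdateProducerFinishedJobTimeout):
			timeouts++
			if timeouts >= UpdateProducerFinishedJobMaxTimeouts {
				return true // drop the current batch rather than wait forever
			}
		}
	}
	return false
}
```

Resetting the timeout count whenever a finished job arrives means only a genuinely stalled batch (no progress at all for MaxTimeouts consecutive windows) gets dropped.
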

andygello555 added a commit that referenced this issue Apr 19, 2023
- Reduced the amount of logging that the Reddit API client does (19/04/2023 - 12:25:05)
- Fixed a synchronisation bug within SetTweetCap where TweetCapMutex was being acquired twice (19/04/2023 - 12:41:23)
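The commit message doesn't include the diff, but here is a minimal, hypothetical illustration of the kind of double-acquire bug it describes (sync.Mutex is not re-entrant, so locking TweetCapMutex twice on the same call path deadlocks); the types and helper names are stand-ins, not the real twitter.ClientWrapper code:

```go
package twitterwrap

import "sync"

// ClientWrapper is a simplified stand-in for twitter.ClientWrapper.
type ClientWrapper struct {
	TweetCapMutex sync.Mutex
	tweetCap      int
}

// Buggy shape: SetTweetCap locks TweetCapMutex and then calls a helper that
// locks it again, which deadlocks because sync.Mutex is not re-entrant:
//
//	func (w *ClientWrapper) SetTweetCap(newCap int) {
//		w.TweetCapMutex.Lock()
//		defer w.TweetCapMutex.Unlock()
//		w.storeTweetCap(newCap) // also acquires TweetCapMutex -> deadlock
//	}

// Fixed shape: the mutex is acquired exactly once per call path.
func (w *ClientWrapper) SetTweetCap(newCap int) {
	w.TweetCapMutex.Lock()
	defer w.TweetCapMutex.Unlock()
	w.tweetCap = newCap
}
```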