
Problem: MCP Server must process all transfer packages sent to it at once #911

Closed
peterVG opened this issue Sep 17, 2019 · 3 comments
Labels: Piql NHA, Type: enhancement (An improvement to existing functionality.)
Milestone: 1.11

peterVG (Collaborator) commented Sep 17, 2019

Please describe the problem you'd like to be solved.
Automation Tools runs a cron job to poll MCP Server for the status of the transfer packages it has sent. On every poll, MCP Server submits a job to Gearman to check the status of each package, even when it is already backed up and working on packages sent earlier. This is a significant performance bottleneck.

Describe the solution you'd like to see implemented.
Move responsibility for throttling (i.e. controlling the rate of transfers) to MCP Server. Its API would return a UUID for the transfer to Automation Tools, and MCPServer would then decide when to actually start processing the transfer based on processing availability. When MCPServer is too busy, it could return a ‘busy’ / retry response to Automation Tools.
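To make the proposed flow concrete, a client such as Automation Tools would hold on to the returned UUID and back off whenever MCPServer signals that it is busy. A minimal sketch of that loop is below; the endpoint, status codes and response shape are assumptions for illustration, not an existing API.

```python
import time

import requests

PIPELINE_URL = "http://archivematica.example.org"  # placeholder pipeline address
HEADERS = {"Authorization": "ApiKey demo:1234"}    # placeholder API credentials


def submit_transfer(payload, max_attempts=5, backoff=30):
    """Submit a transfer, backing off while MCPServer reports it is busy."""
    for attempt in range(1, max_attempts + 1):
        response = requests.post(
            f"{PIPELINE_URL}/api/v2beta/package",  # assumed submission endpoint
            json=payload,
            headers=HEADERS,
        )
        if response.status_code == 202:
            # Transfer accepted: MCPServer hands back the UUID now and starts
            # processing later, when it has capacity.
            return response.json()["id"]
        if response.status_code == 503:
            # Hypothetical 'busy' response: wait and retry instead of piling
            # more status-check jobs onto Gearman.
            time.sleep(backoff * attempt)
            continue
        response.raise_for_status()
    raise RuntimeError(f"MCPServer still busy after {max_attempts} attempts")
```

Whether the busy signal is a status code, a queue position, or simply a deferred internal start is an open design question; the point is that the throttling decision lives in MCPServer rather than in each client.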

Describe alternatives you've considered.

Additional context
related to PR artefactual/archivematica#1472


For Artefactual use:
Please make sure these steps are taken before moving this issue from Review to Done:

  • All PRs related to this issue are properly linked 👍
  • All PRs related to this issue have been merged 👍
  • Test plan for this issue has been implemented and passed 👍
  • Documentation regarding this issue has been written and it has been added to the release notes, if needed 👍

helrond commented Sep 26, 2019

Would this same throttling logic apply for transfers started via the Archivematica API but not using Automation Tools code? I'm guessing yes, since my understanding is that the Automation Tools use the API, but I wanted to check before I make an ass out of ume.

We use the Archivematica (pipeline) API to start and approve transfers, but have written our own code to do that because it gives us more visibility/control over the process. Having this logic baked into Archivematica would be really helpful, as it would let us remove some (not entirely effective) complexity from that code.
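For reference, starting and approving a transfer through the pipeline API looks roughly like the sketch below (placeholder host and credentials; the endpoints and parameters should be checked against the API documentation for your Archivematica version).

```python
import base64

import requests

AM_URL = "http://archivematica.example.org"      # placeholder pipeline address
HEADERS = {"Authorization": "ApiKey demo:1234"}  # placeholder API credentials

# Each paths[] entry is "<transfer-source-location-uuid>:<path>", base64-encoded.
path = base64.b64encode(b"<location-uuid>:/path/to/transfer").decode()

# Start a transfer from a transfer-source location the pipeline can see.
start = requests.post(
    f"{AM_URL}/api/transfer/start_transfer/",
    headers=HEADERS,
    data={
        "name": "example-transfer",
        "type": "standard",
        "paths[]": [path],
        "row_ids[]": [""],
    },
)
start.raise_for_status()

# Approve it once it shows up in the unapproved transfers list.
approve = requests.post(
    f"{AM_URL}/api/transfer/approve/",
    headers=HEADERS,
    data={"type": "standard", "directory": "example-transfer"},
)
approve.raise_for_status()
```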


cole commented Sep 26, 2019

@helrond that's the idea here — anything started via the API, either by automation tools or otherwise.

@sromkey sromkey added this to the 1.11 milestone Sep 26, 2019
@cole cole added Status: review The issue's code has been merged and is ready for testing/review. and removed Status: in progress Issue that is currently being worked on. labels Oct 9, 2019
@cole cole removed their assignment Oct 9, 2019
@sallain sallain added the Type: enhancement An improvement to existing functionality. label Dec 3, 2019
@sevein sevein removed the Status: review The issue's code has been merged and is ready for testing/review. label Mar 25, 2020

sevein (Contributor) commented Mar 25, 2020

This has been tested in a few different environments, particularly at NHA.

MCPServer resource usage is no longer unbounded. Users can adjust ARCHIVEMATICA_MCPSERVER_CONCURRENT_PACKAGES to control the number of packages processed concurrently. See the full list of options for more details (RPC threads, worker threads, concurrent packages...). A good way to measure resource usage is through our integration with Prometheus/Grafana.
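For anyone tuning this, a minimal sketch of the setting, assuming an environment-file style deployment (only the variable name comes from the comment above; the file location and value are illustrative):

```
# Environment for the archivematica-mcp-server service; the exact file
# (systemd drop-in, /etc/default/..., or docker-compose "environment:")
# depends on how the pipeline is deployed.
ARCHIVEMATICA_MCPSERVER_CONCURRENT_PACKAGES=2
```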

@sevein sevein closed this as completed Mar 25, 2020