New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem: "cleanup names" produces too much output #1118
Comments
@jrwdunham I have seen this too. In case it's useful - For a single object nested inside two folders, we have 15 lines of logging in the Transfer and Ingest records, respectively. Working with another client at the moment, we are considering using the |
A related issue was causing "characterize and extract metadata" job to hang indefinitely. The FITS metadata extracted would be 2,1 mb for a particular transfer and the MCPServer log would record a OperationalError: (2006, 'MySQL server has gone away') without failing. When setting max_allowed_packet = 32M and restarting services the transfer went through without a hitch. |
Related to archivematica/Issues#956 |
I recently analysed the output changes made for 1.9 amount of data for 10,116 files (all files renamed): mysql> select LENGTH(stdOut) from Tasks
where taskUUID
like "c3fd658f-0ae8-469a-8e40-c20c1ad5af1f";
Total: 1.7MB And stderr: mysql> select LENGTH(stdError) from Tasks
where taskUUID
like "c3fd658f-0ae8-469a-8e40-c20c1ad5af1f";
Total: 11.3MB 1.10 same data-set: mysql> select LENGTH(stdOut) from Tasks where taskUUID like "d1195897-2230-4242-a0e1-35584eb7b607";
+----------------+
| LENGTH(stdOut) |
+----------------+
| 3779475 |
+----------------+
1 row in set (0.00 sec) Total: 3.7MB Stderr: NB. Stderr was removed from this microservice in 1.10. @sromkey @sevein given the reduction in verbosity with 1.10 noted above I'd like to propose we close this ticket in favor of related tickets (e.g. increasing SQL limits, or disabling FITS by default) and then creating more specific issues around high-volume output microservices as those issues arise. This ticket could sit open for a while unless there's a specific target we'd like to reach with it. Does that sound okay? |
That sounds like a good plan! |
@sevein you? |
Sounds good. |
Problem
The "cleanup names" micro-service runs once per package and will print at least one line to stdout and stderr for each file in the package. For very large transfers/SIPs, this can mean many MBs of output that must be stored in the Tasks table. This leads to a bloated Tasks table and, worse, can lead to dropped connections to the MySQL server, which Archivematica's MCP Server does not handle gracefully: the entire daemon process exits. This causes all processing to halt and the user must restart the Archivematica services and restart the transfer.
Workaround
The
max_allowed_packet
setting in MySQL is 1MB by default. Setting a higher value for themax_allowed_packet
variable in the mysql config file will provide an immediate work around to this issue. Based on tests with a transfer containing 3174 files, the maximum packet created was just under 6MB. Assuming a desired transfer size of 21,000, it is conceivable that the current version of Archivematica could attempt to construct a packet that is up to 7 times larger (35MB). Tests were run successfully with the value set to 32MB. It is probably safe to set it as high as 128MB, but it may be better to also look at some of the longer term solutions listed below, and stick with a value like 32MB or 64MB.The steps we followed to implement this workaround were:
Stop services
open (as root) the mysql config file at /etc/my.cnf.d/server.conf for editing (we used vi).
add the following entries to the section labelled [mariadb]:
Note that only the last variable (
max_allowed_packet
) has to be added. All of the others were mysql tuning parameters recommended by mysqltuner.pl (https://github.com/major/MySQLTuner-perl/). These additional changes did not significantly affect our test results.restart services
Longer term solutions
The text was updated successfully, but these errors were encountered: