TextGrid export step is very slow #807
Comments
For additional context, all utterances in the examples above should have a duration between 8 and 30 seconds.
I have run into the same problem with very slow TextGrid export when using the latest MFA version. Here is my solution to this problem (@mmcauliffe). I am not an expert in SQL, but in my opinion the main problem is that two heavy SQL queries are made for each aligned file (see
With my changes, it took about 1 minute to process my data instead of the estimated 50 hours (~1 file/s, plus several hours for multiprocessing initialization). Changes made:
To make it more convenient for me, I added the
I put my implementation below, in case anyone needs it. My MFA call is now:
construct_all_output_tiers (modified construct_output_tiers)
export_textgrids (modified)
align_corpus_cli (slightly modified)
export_files (slightly modified)
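The per-file query pattern described above can be illustrated with a minimal sketch. This is not MFA's code: the `file` and `word_interval` tables, their columns, and the function names `tiers_per_file` and `tiers_bulk` are hypothetical stand-ins, and an in-memory SQLite database is assumed purely for demonstration. It only contrasts issuing one query per file with fetching all rows in a single query and grouping them in Python.

```python
import sqlite3
from collections import defaultdict


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE word_interval (
            id INTEGER PRIMARY KEY,
            file_id INTEGER REFERENCES file(id),
            start_time REAL,
            end_time REAL,
            label TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO file (id, name) VALUES (?, ?)",
        [(1, "utt_a"), (2, "utt_b")],
    )
    conn.executemany(
        "INSERT INTO word_interval (file_id, start_time, end_time, label) "
        "VALUES (?, ?, ?, ?)",
        [(1, 0.0, 0.4, "hello"), (1, 0.4, 0.9, "world"), (2, 0.0, 0.5, "test")],
    )
    return conn


def tiers_per_file(conn):
    """Slow pattern: one interval query per file (N files -> N queries)."""
    result = {}
    file_ids = conn.execute("SELECT id FROM file ORDER BY id").fetchall()
    for (file_id,) in file_ids:
        result[file_id] = conn.execute(
            "SELECT start_time, end_time, label FROM word_interval "
            "WHERE file_id = ? ORDER BY start_time",
            (file_id,),
        ).fetchall()
    return result


def tiers_bulk(conn):
    """Bulk pattern: one query for all files, grouped by file in Python.

    Note: files without any intervals do not appear in the result.
    """
    result = defaultdict(list)
    query = (
        "SELECT file_id, start_time, end_time, label FROM word_interval "
        "ORDER BY file_id, start_time"
    )
    for file_id, start, end, label in conn.execute(query):
        result[file_id].append((start, end, label))
    return dict(result)
```

With hundreds of thousands of files, the first pattern pays the per-query overhead once per file, while the second pays it once overall at the cost of holding the grouped rows in memory.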
@dan-ya Thank you very much for your detailed reply! I wonder if there are similar problems in other parts of the code that also seem to get sudden performance drops for me, like generating MFCCs or generating alignments. Yet these are not as slow as exporting textgrids.
Thanks for this, @dan-ya! I've added a multiprocessing version of this to 3.1.0, along with making sure it shouldn't overwhelm memory for lower speed systems (using the
Thank you very much, @mmcauliffe, for taking care of it!
Debugging checklist
[X] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[X] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version?
[X] Have you tried rerunning the command with the --clean flag?
Describe the issue
I am running mfa align on a series of datasets containing about 500k utterances using 64 cores in parallel. While doing so, some steps, such as alignment passes, become so slow that the it/s rate cannot be estimated, with the progress count suddenly jumping by 5000 in one go every 20~30 minutes.
Apart from that, TextGrid exporting is also surprisingly slow, reaching 0~1 it/s. For example, in one of my datasets this step alone has been running for 2 days and 8 hours, and it is estimated to take 9~10 days more! In another dataset of 1.7M utterances, the approximate it/s rounds to zero.
Also, I'm not sure if this helps, but I have noticed similar issues with mfa g2p. In this case, some runs became completely bottlenecked, to the point where the it/s rate could not be estimated and multiple days of work were expected; then sometimes the it/s would suddenly spike to multiple thousands and the run would finish immediately.
Sounds like some kind of bottleneck, possibly related to multiprocessing or database access.
For Reproducing your issue
Please fill out the following:
Desktop (please complete the following information):
Additional context
Running in a DGX A100, 128 CPUs (256 cores), 2 TB RAM.