non-coding with covariates message: re-running unfinished contigs... indefinitely? #5

Open
hbeale opened this issue Jun 22, 2022 · 5 comments

hbeale commented Jun 22, 2022

Hi - awesome program. I ran MutEnricher with docker without covariates successfully. Now I've added covariates, and it's been running for over a week without completing on 10 processors with 120GB of RAM. I'm scanning 18,000 regions of ~300 bases each in 75 samples. Can you suggest how I can tell whether it's stuck in a loop or making progress? Thanks!

Messages from the last week:

re-running unfinished contigs: ['chr1', 'chr2']
  chr2 done.
  chr1 done.
  re-running unfinished contigs: ['chr1', 'chr2']
Owner

asoltis commented Jun 23, 2022

Hello,

Thank you for using the tool. Either the code is getting stuck in a loop during the affinity propagation step because of non-convergence (perhaps trying self similarity parameters that don't converge and getting stuck) or an error is being thrown in the multiprocessing that is "locking" the process up instead of exiting. Does it keep printing out these same procedural commands, or is it just stuck on the current iteration? Are there any oddities in the covariates files for these chromosomes that may cause errors, i.e. do you see real valued similarities being produced in the temp files? The amount of data you are using should be able to finish in ~10-15 minutes.

If possible, you could share your covariates input files to help with debugging. You could also do a test run considering only these chromosomes and pay attention to the self similarity parameters being selected for the re-runs - if it keeps testing the same values, then there may be a bug in the restart settings (I have not run into such issues across many tests, but it is possible).

Author

hbeale commented Jun 23, 2022

Thanks for the reply. It kept printing the same commands. The data in the similarities file looks real to me, but I'm not sure I'd know what to look for. Here's the top of the chr1 similarities file:

1 2 -1.02
1 3 -0.0475
1 4 -0.387
1 5 -0.677
1 6 -1.15
1 7 -0.0399
1 8 -0.0746
1 9 -0.799
1 10 -0.618
1 11 -0.511

and the top of the chr20 similarities file:

1 2 -0.00185
1 3 -0.00517
1 4 -0.019
1 5 -0.00188
1 6 -0.00879
1 7 -0.0977
1 8 -0.039
1 9 -0.121
1 10 -0.029
1 11 -0.372
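One way to check for "real valued similarities" is a small script like the sketch below. It assumes every row has the three whitespace-separated columns seen in the excerpts above (two integer indices and a similarity); the function name `check_similarity_lines` is made up for illustration and is not part of MutEnricher.

```python
import math


def check_similarity_lines(lines):
    """Return (line_number, reason) pairs for rows that look malformed.

    Assumption: each row is 'i j similarity', as in the excerpts above.
    """
    problems = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            problems.append((n, "expected 3 columns"))
            continue
        try:
            int(fields[0])
            int(fields[1])
            similarity = float(fields[2])
        except ValueError:
            problems.append((n, "non-numeric field"))
            continue
        # float() accepts 'nan' and 'inf', so test for those explicitly.
        if not math.isfinite(similarity):
            problems.append((n, "similarity is NaN or infinite"))
    return problems


sample = ["1 2 -1.02", "1 3 -0.0475", "1 4 nan", "1 5"]
print(check_similarity_lines(sample))
```

To scan a whole file, pass the open file object in place of `sample`; an empty result means every row parsed as finite numbers.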

I'm not sure how I'd pay attention to the self similarity parameters being selected.

I re-ran the process skipping chr 1 and 2, and it ran without error.

Attached is the covariates file, which I created with get_region_covariates.py.
hg19_gb_V40lift37_200bases_up_covariates.txt

Thanks again for your help!

Owner

asoltis commented Jun 23, 2022

Thank you for the file - I can take a look at it and see if I can reproduce the errors and spot issues.

In the meantime, you can check the self similarity parameter used by looking at the summary.txt files in the apcluster_regions folder for the relevant chromosomes. The value is indicated by the "Preferences: " field, e.g.:

maxits=1000
convits=50
dampfact=0.9
number of data points: 2137
Preferences: -0.193000

The self-similarity parameter used here is -0.193. When a chromosome run does not converge, the code selects a new value close to, but different from, the prior value; it may be getting stuck picking the same value over and over. Note that you would have to monitor the progress and record the values manually each time the code says it is re-running, because the file contents are overwritten when a new iteration starts.
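Recording the value by hand can be partly scripted. The sketch below assumes summary.txt has a line starting with "Preferences: " as in the excerpt above; the helper names `parse_preference` and `record_preference` are hypothetical and not part of MutEnricher.

```python
import re

# Matches a line such as "Preferences: -0.193000".
PREF_RE = re.compile(
    r"^Preferences:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)", re.MULTILINE
)


def parse_preference(summary_text):
    """Return the Preferences value as a float, or None if it is absent."""
    match = PREF_RE.search(summary_text)
    return float(match.group(1)) if match else None


def record_preference(history, value):
    """Append value to history; return True if it had been tried before."""
    repeated = value in history
    history.append(value)
    return repeated


example = """maxits=1000
convits=50
dampfact=0.9
number of data points: 2137
Preferences: -0.193000
"""

history = []
value = parse_preference(example)
print(value, record_preference(history, value))
```

Reading the summary.txt for a stalled chromosome each time the re-run message appears and passing its text through these two functions would show whether the same value keeps coming back.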

Something else you can try for the time being, and which may actually fix the issue and be overall simpler, is adjusting the affinity propagation iteration parameters (--ap-iters and/or --ap-convits). If you increase the total iterations and/or decrease the required convergence iterations it may help the algorithm converge as-is and complete.
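To illustrate why those two settings interact, here is a toy sketch of the usual affinity propagation stopping rule: stop once the result has stayed unchanged for `convits` consecutive iterations, or give up after `maxits`. It assumes --ap-iters and --ap-convits correspond to the maxits and convits values shown in summary.txt; the step function is an invented stand-in, not the real clustering.

```python
def run_until_stable(step, maxits, convits):
    """Call step() up to maxits times.

    Stops early once the returned value has been unchanged for convits
    consecutive iterations. Returns (converged, iterations_used).
    """
    previous = None
    unchanged = 0
    for iteration in range(1, maxits + 1):
        current = step()
        unchanged = unchanged + 1 if current == previous else 0
        previous = current
        if unchanged >= convits:
            return True, iteration
    return False, maxits


def make_toy_step(settle_after):
    """Toy stand-in: the result flips every call, then settles on 0."""
    state = {"calls": 0}

    def step():
        state["calls"] += 1
        if state["calls"] <= settle_after:
            return state["calls"] % 2
        return 0

    return step


# Too few total iterations for the required stable stretch.
print(run_until_stable(make_toy_step(80), maxits=100, convits=50))
# More total iterations, or a shorter required stretch, lets it finish.
print(run_until_stable(make_toy_step(80), maxits=1000, convits=50))
print(run_until_stable(make_toy_step(80), maxits=100, convits=15))
```

In this toy the result only settles after 80 iterations, so a budget of 100 cannot fit 50 stable iterations, while either raising the budget or lowering the requirement succeeds.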

Owner

asoltis commented Jun 24, 2022

Update:

I was able to find a solution to the stalled chromosomes. It seems affinity propagation is getting stuck in oscillating cycles across multiple values of the self similarity parameter for chr1 and chr2. To combat this, AP has a damping factor that can be adjusted - in the code, this is fixed at 0.9, but bumping this up to 0.95 enabled convergence for all chromosomes (the overall run finished in ~15 minutes). Below I'm attaching the AP clustering outputs from this that you can use with your samples (i.e. with the precomputed cluster capabilities in the code):

hg19_gb_V40lift37_200bases_up_covariates_apcluster_regions.zip

Unfortunately, adjusting the damping factor is not a current code option, so you would have to adjust the actual python code to do so. I can add this as an option in a subsequent release. This testing also pointed me to a minor bug in the re-selection of the self similarity parameter that I can update as well. Hopefully the above results can address your immediate needs in the meantime.
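For intuition on the damping factor, the sketch below uses the standard damped update form, new = damping * old + (1 - damping) * proposed, on a toy scalar iteration. It is not affinity propagation and not the MutEnricher code; the overshoot `gain` of 25 is an arbitrary value chosen so that 0.9 oscillates with growing amplitude while 0.95 settles.

```python
def damped_update(old, proposed, damping):
    """Blend the previous value with the newly proposed one."""
    return damping * old + (1.0 - damping) * proposed


def iterate(damping, gain=25.0, start=1.0, steps=200):
    """Toy iteration whose undamped proposal overshoots and flips sign.

    Each step proposes -gain * x, so the effective per-step multiplier
    is damping - (1 - damping) * gain.
    """
    x = start
    for _ in range(steps):
        x = damped_update(x, -gain * x, damping)
    return x


for damping in (0.9, 0.95):
    print(damping, abs(iterate(damping)))
```

With these toy numbers the per-step multiplier is -1.6 at damping 0.9 (the oscillation grows) and -0.3 at damping 0.95 (it dies out), which mirrors, in a very simplified way, how a slightly higher damping value can stop message values from cycling.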

Author

hbeale commented Jun 24, 2022

That's brilliant, thank you so much for the quick and thorough help!
