Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential memory leak of conda-forge slim 4.0.1 #404

Closed
bguo068 opened this issue Nov 7, 2023 · 5 comments
Closed

Potential memory leak of conda-forge slim 4.0.1 #404

bguo068 opened this issue Nov 7, 2023 · 5 comments

Comments

@bguo068
Copy link

bguo068 commented Nov 7, 2023

I am trying to run forward simulation with SLiM 4.0.1. When I am testing on desktop (windows11), the following runs very well. But when I run it on Linux server via slim command line (installed from conda-forge), many of the simulations fail due to lack of memory even I allocate 50G RAM.

Typical error message:

WARNING (main()): memory usage of 40987 MB is dangerously close to the limit of 40960 MB 
reported by the operating system.  This SLiM process may soon be killed by the operating system
 for exceeding the memory limit.  You might raise the per-process memory limit, or modify your 
model to decrease memory usage.  You can turn off this memory check with the '-x' 
command-line option.  (Limit exceeded at end of tick 152.)

_InstantiateSLiMObjectsFromTables tsk_treeseq_init(): Out of memory. (TSK_ERR_NO_MEMORY)

Error on script line 116, character 7:

         sim.readFromPopulationFile(paste("tmp_slim_state_single_pop_", outid, ".trees", sep=''));
             ^^^^^^^^^^^^^^^^^^^^^^

My code below involves with many restarts because I want to condition on the establishment of selection. But I do not expect restart will cause a lot of amounts of memory increase. Could it be a bug causing memory leak?

Sorry for not providing of a minimal code snippet to reproduce the potential bug as I want to provide the complete script I am using.

The error should be reproducible by running slim xxx_script.slim given the conda-forge slim version

 $ slim --version
SLiM version 4.0.1, built Nov  7 2022 13:33:54
Git commit SHA-1: unknown (built from a non-Git source archive)
function (NULL)set_default(string k, lifs v)
{
	if (!exists(k))
		defineConstant(k, v);
	catn(c("Parameter", k, v), sep='\t');
	return NULL;
}

initialize()
{
	set_default("L", 750000); // chromosome length in bp
	set_default("nchrs", 14); // chromosome length in bp	
	set_default("selpos", asInteger(L / 3)); // selection position in bp
	set_default("num_origins", 2); //how many genomes contains the selected mutation when selection starts.
	set_default("N", 10000); // ancient effective population size
	set_default("h", 0.5); // dominant coefficient
	set_default("s", 0.3); // selection coefficient
	set_default("g_sel_start", 80); // time of selected mutation being introduced (generations ago --BACKWARD)
	set_default("r", 6.67e-7); // recombinantion rate
	set_default("outid", 1); // idx
	set_default("max_restart", 10000); // max number of restart
	set_default("sim_relatedness", F); // whether simulate high relatedness
	set_default("sim_relatedness_power", 0.0); // parameter1 for mapping relatedness to probability
	set_default("sim_relatedness_delta", 0.01); // parameter2 for mapping relatedness to probability
	set_default("sim_relatedness_g", 40); // generations ago to start simulate inbreeding via mating to relatives
	set_default("N0", 1000); // the effective population size at sampling time
	set_default("g_ne_change_start", 200); // Ne change time (generations ago -- BACKWARD)
	set_default("slim_total_generations", // time of simulation ended -- forward
		max(g_sel_start, g_ne_change_start + 1, sim_relatedness_g));
	initializeSLiMOptions(keepPedigrees=T);
	initializeTreeSeq();
	initializeMutationRate(0.0);
	initializeMutationType("m1", 0.5, "f", 0.0); // neutral
	initializeMutationType("m2", h, "f", s); // selected
	initializeGenomicElementType("g1", m1, 1);
	
	// whole genome length
	initializeGenomicElement(g1, 0, L * nchrs - 1);
	
	// rate maps
	ends = rep(0, 2 * nchrs - 1);
	rates = rep(r, 2 * nchrs - 1);
	for (i in 1:nchrs)
	{
		ends[i * 2 - 2] = L * i - 1; // Note: Eidos indexing is from 0 not from zeros
	}
	for (i in 1:(nchrs - 1))
	{
		rates[i * 2 - 1] = 0.5;
		ends[i * 2 - 1] = L * i;
	}
	initializeRecombinationRate(rates, ends);
	
	// define global
	defineGlobal("restart_counter", 1);
}

s0 9999:10000 mateChoice()
{
	rel = individual.relatedness(sourceSubpop.individuals);
	return weights * (rel^sim_relatedness_power + sim_relatedness_delta);
}

1 early()
{
	sim.addSubpop("p1", N);
	if (sim_relatedness)
	{
		community.rescheduleScriptBlock(s0, slim_total_generations - sim_relatedness_g + 1);
	}
	community.rescheduleScriptBlock(s1, slim_total_generations - g_ne_change_start + 1);
	community.rescheduleScriptBlock(s2, slim_total_generations - g_sel_start - 1, slim_total_generations); // minus 1 so that it allows the s2 code block the save the state
	community.rescheduleScriptBlock(s3, slim_total_generations + 1, slim_total_generations + 1);
	print(slim_total_generations);
}

// control populatio size
s1 300: early()
{
	t = slim_total_generations - sim.cycle; // generation ago
	Nt = (N / N0)^(t / g_ne_change_start) * N0; // calculate Nt
	p1.setSubpopulationSize(asInteger(Nt)); // set new population size
}

// condition on selection establishment (not lost)
s2 450: late()
{
	if (sim.cycle == slim_total_generations - g_sel_start - 1 & s != 0.0)
	{
		sim.treeSeqOutput(paste("tmp_slim_state_single_pop_", outid, ".trees", sep=''));
		print(c('saved state:', paste("tmp_slim_state_single_pop_", outid, ".trees", sep='')));
		for (i in 1:nchrs){
			sample(p1.genomes, num_origins).addNewDrawnMutation(m2, selpos + (i-1) * L);
		}
	}
	else if (sim.cycle >= slim_total_generations - g_sel_start & s != 0)
	{
		mut = sim.mutationsOfType(m2);
		nfixed = sum(sim.substitutions.mutationType == m2);
		nmut = mut.size();
		
		
		nlost = nchrs - nmut - nfixed;
		catn(c("current", nmut, "lost", nlost, "fixed", nfixed));
		
		need_restart = 0;
		if (nfixed >= nchrs - 3)
		{
			// print("selected mutation fixed");
			// catn(c("DAF", slim_total_generations - sim.cycle, 1.0), sep='\t');
			community.deregisterScriptBlock(self);
		}
		else if ((nlost>3) & (restart_counter < max_restart))
		{
			catn(c("selected mutation lost; restarting...", restart_counter));
			sim.readFromPopulationFile(paste("tmp_slim_state_single_pop_", outid, ".trees", sep=''));
			setSeed(rdunif(1, 0, asInteger(2^62 - 1)));
			for (i in 1:nchrs){
				sample(p1.genomes, num_origins).addNewDrawnMutation(m2, selpos + (i-1) * L);
			}
			restart_counter = restart_counter + 1;
		}
		else
		{
			catn(c("DAF", slim_total_generations - sim.cycle, "_FRQ_", sim.mutationFrequencies(p1, mut), "_POS_", mut.position,"_FIXPOS_", sim.substitutions.position ), sep='\t');
		
		}
	}
}

s3 500 late()
{
	
	mut = sim.mutationsOfType(m2);
	nfixed = sum(sim.substitutions.mutationType == m2);
	nmut = mut.size();
	
	nlowafreq = sum(p1.species.mutationFrequencies(p1, mut) < 0.2);
	
	if ((nlowafreq > 0)  &(sim_relatedness)){
		catn(c("selected mutation with LOW ALLEL FREQUENCIES; restarting...", restart_counter));
		sim.readFromPopulationFile(paste("tmp_slim_state_single_pop_", outid, ".trees", sep=''));
		setSeed(rdunif(1, 0, asInteger(2^62 - 1)));
		for (i in 1:nchrs){
			sample(p1.genomes, num_origins).addNewDrawnMutation(m2, selpos + (i-1) * L);
		}
		restart_counter = restart_counter + 1;
	}
	else{
		// finish simulation only when there is no low allele frequency of selected mutations
		sim.simulationFinished();
		catn(c("restart_counter", restart_counter), sep='\t');
		sim.treeSeqOutput(paste("tmp_slim_out_single_pop_", outid, ".trees", sep=''));
	}
}

late()
{
	if (sim.cycle < slim_total_generations)
		catn(c('True_Ne', slim_total_generations - sim.cycle - 1, p1.individualCount), sep='\t');
}
@bhaller
Copy link
Contributor

bhaller commented Nov 7, 2023

There does appear to be a leak. Finding it is proving challenging. Stay tuned.

@bhaller
Copy link
Contributor

bhaller commented Nov 7, 2023

Reaching out to the tskit folks for help with tskit-dev/tskit#2868.

@bguo068
Copy link
Author

bguo068 commented Nov 7, 2023

Thanks a lot for such as a deep-dive into the issue. I will follow up the discussion on the other side as well. 🙏

@bhaller
Copy link
Contributor

bhaller commented Nov 8, 2023

Thanks a lot for such as a deep-dive into the issue. I will follow up the discussion on the other side as well. 🙏

No worries. It's a bad bug, and might have gone unnoticed for quite a while if someone like you hadn't reported it. Thanks for the very useful bug report!

@bhaller
Copy link
Contributor

bhaller commented Nov 8, 2023

OK, this has been fixed, mostly by updating the version of tskit inside SLiM, and also by fixing one 400-byte leak per reload inside SLiM's own code. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants