CPU usage and GPU usage too little #244
Comments
The VCF files are stored on a hard disk. After searching for this problem online, I found that Python has the GIL (global interpreter lock). Does this prevent full use of the CPU? I therefore used multiprocessing myself, with one task per cancer type:

from multiprocessing import Pool
from SigProfilerExtractor import sigpro as sig

def extract_signature_for_folder(folder):
    # Input VCF folder and output folder for one cancer type
    input_dir = f"/harddisk/sxt/VCFinput/{folder}"
    output_path = f"/harddisk/sxt/output/{folder}"
    print(f"Signature extraction for {folder} started.")
    sig.sigProfilerExtractor("vcf", output_path, input_dir, "GRCh38")
    print(f"Signature for {folder} extracted.")

if __name__ == "__main__":
    # cancer_types is the list of folder names (its definition is not shown in the post)
    with Pool(128) as p:
        p.map(extract_signature_for_folder, cancer_types)
I look forward to your reply.
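On the GIL question: the GIL limits threads inside one Python process, not separate processes, so it is unlikely to be the bottleneck here. Two other points may matter more. A pool of 128 workers only helps if there are at least that many folders, and workers of `multiprocessing.Pool` are daemonic, so they cannot start child processes of their own, which is a problem if the library creates its own process pool internally. The following is only a sketch under those assumptions: `plan_workers`, `extract_one`, and `run_all` are hypothetical helper names, the paths are the ones from the post, and it relies on `ProcessPoolExecutor` workers being non-daemonic (believed to hold in recent Python versions) and on the `cpu` argument of `sigProfilerExtractor` to cap the processes used per task.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def plan_workers(n_tasks, total_cpus=None):
    """Split the available CPUs across tasks.

    Returns (n_workers, cpus_per_worker). Hypothetical helper, not part of
    SigProfilerExtractor.
    """
    total_cpus = total_cpus or os.cpu_count() or 1
    # Never start more outer workers than there are tasks or CPUs.
    n_workers = max(1, min(n_tasks, total_cpus))
    # Give each outer worker an equal share of CPUs for its inner work.
    cpus_per_worker = max(1, total_cpus // n_workers)
    return n_workers, cpus_per_worker


def extract_one(args):
    """Run the extraction for one cancer-type folder."""
    folder, cpus = args
    # Imported here so the planning helper works without the package installed.
    from SigProfilerExtractor import sigpro as sig

    sig.sigProfilerExtractor(
        "vcf",
        f"/harddisk/sxt/output/{folder}",    # output folder (path from the post)
        f"/harddisk/sxt/VCFinput/{folder}",  # input VCF folder (path from the post)
        "GRCh38",
        cpu=cpus,  # cap the library's own worker processes for this task
    )
    return folder


def run_all(cancer_types):
    """Process every folder, sharing the CPUs between them."""
    n_workers, cpus = plan_workers(len(cancer_types))
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(extract_one, [(f, cpus) for f in cancer_types]))
```

Calling `run_all(cancer_types)` from under an `if __name__ == "__main__":` guard would then process the folders in parallel. With 4 folders on a 128-core machine, `plan_workers` gives 4 outer workers with 32 CPUs each instead of 128 mostly idle workers.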
I ran a small example as a test.
Your input matrix has 96 rows and 2 columns, but your extraction is from signatures 1 to 25. This does not work and you need a larger input matrix (the max rank is 2 for a 96x2 input). Please review the README and run the example using the matrix file as input (code below):

from SigProfilerExtractor import sigpro as sig

def main_function():
    # to get input from table format (mutation catalog matrix)
    path_to_example_table = sig.importdata("matrix")
    data = path_to_example_table  # you can put the path to your tab delimited file containing the mutational catalog matrix/table
    sig.sigProfilerExtractor("matrix", "example_output", data, opportunity_genome="GRCh38", minimum_signatures=1, maximum_signatures=3)

if __name__ == "__main__":
    main_function()
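The rank limit mentioned above can be checked before starting a long run. The sketch below uses only the standard library; `matrix_shape` and `check_signature_range` are hypothetical helper names, and it assumes the catalog is tab-delimited with one header row and the mutation types in the first column. It tests a necessary condition only (the requested range must not exceed the smaller matrix dimension), not everything the library may require.

```python
import csv
import io


def matrix_shape(tsv_text):
    """Return (n_mutation_types, n_samples) for a tab-delimited catalog.

    Assumes a header row and mutation-type labels in the first column.
    """
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    return len(body), len(header) - 1


def check_signature_range(n_types, n_samples, minimum_signatures, maximum_signatures):
    """Return the rank bound min(rows, columns); raise if the range exceeds it."""
    max_rank = min(n_types, n_samples)
    if not 1 <= minimum_signatures <= maximum_signatures:
        raise ValueError("need 1 <= minimum_signatures <= maximum_signatures")
    if maximum_signatures > max_rank:
        raise ValueError(
            f"maximum_signatures={maximum_signatures} exceeds the rank bound "
            f"{max_rank} of a {n_types}x{n_samples} matrix"
        )
    return max_rank
```

For the 96x2 matrix from the report, `check_signature_range(96, 2, 1, 25)` raises a `ValueError`, while a range of 1 to 2 passes this particular check.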
Hi Professor @mdbarnesUCSD,
Also, can I extract signatures only for SBS and DINUC, excluding INDELs? When I change the context_type parameter, nothing changes: it still generates matrices for INDELs as usual. How can I do this?
How should I choose the maximum_signatures parameter when using VCF files as input?
The maximum_signatures needs to be a value less than the number of samples that you have. I would suggest starting with matrix inputs rather than VCFs. You can run SigProfilerMatrixGenerator to generate the matrices and this may help you identify if there are any issues with your VCFs. You can then use the INDEL matrix you created from SigProfilerMatrixGenerator as the input for SigProfilerExtractor.
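The two-step workflow suggested above might look roughly like the sketch below. `matrix_path` and `generate_then_extract` are hypothetical helper names. The output layout (`<vcf_dir>/output/ID/<project>.ID83.all` and similar) is an assumption about where SigProfilerMatrixGenerator writes its files and should be checked against the folder it actually creates.

```python
import os


def matrix_path(vcf_dir, project, context="ID83"):
    """Build the expected path of a generated matrix (assumed layout)."""
    subfolder = {"SBS96": "SBS", "DBS78": "DBS", "ID83": "ID"}[context]
    return os.path.join(vcf_dir, "output", subfolder, f"{project}.{context}.all")


def generate_then_extract(vcf_dir, project, output_dir, context="ID83",
                          genome="GRCh38", minimum_signatures=1,
                          maximum_signatures=3):
    """Generate matrices from VCFs once, then extract from one matrix file."""
    # Imported here so matrix_path can be used without the packages installed.
    from SigProfilerMatrixGenerator.scripts import (
        SigProfilerMatrixGeneratorFunc as matGen,
    )
    from SigProfilerExtractor import sigpro as sig

    # Step 1: matrix generation; problems with the VCFs should surface here.
    matGen.SigProfilerMatrixGeneratorFunc(project, genome, vcf_dir, plot=False)

    path = matrix_path(vcf_dir, project, context)
    if not os.path.exists(path):
        raise FileNotFoundError(f"expected matrix not found: {path}")

    # Step 2: extraction from the matrix, skipping VCF parsing entirely.
    sig.sigProfilerExtractor(
        "matrix", output_dir, path,
        reference_genome=genome,
        minimum_signatures=minimum_signatures,
        maximum_signatures=maximum_signatures,
    )
```

Separating the steps makes it easier to see whether the time is spent in matrix generation or in the extraction itself, and the generated matrix can be reused across extraction runs.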
Hello @mdbarnesUCSD, I ran:

sig.sigProfilerExtractor("vcf", "/home/sxt/HDD/output/rectum", "/home/sxt/HDD/VCFinput/rectum", "GRCh38", minimum_signatures=1, maximum_signatures=3)

The terminal prints:

(base) sxt@C233-Primary-Server:~$ /home/sxt/miniconda3/bin/conda run -p /home/sxt/miniconda3 --no-capture-output python /tmp/pycharm_project_825/rectum.py
************** Reported Current Memory Use: 0.5 GB *****************
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 337.43 seconds.
Starting matrix generation for INDELs...

Matrix generation for SNVs and DINUCs took about 5 minutes, but the INDEL step has been running for 10 days and still has not finished.
Please generate your matrices separately and provide those as inputs to SigProfilerExtractor. The matrix generation step should not take anywhere near 10 days. How many mutations are you working with? Are you running out of memory?
There's a lot of memory left. All VCF files are about 5 GB. I am going to try to generate matrices separately.
Hi Professor @mdbarnesUCSD,
I changed my code to use the given parameters.
CPU usage is still very low.
When it reaches the "making matrices for INDELs" step, CPU usage is very low.
It takes too long to finish.
I looked through sigpro.py.
I found that you use the multiprocessing package, but it does not seem to take effect.
I am running the code on Ubuntu Linux 22.04 with the latest version of the SigProfilerExtractor package.