Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local pipeline install fails during convert_matrix_from_mtx_to_loom.py #52

Open
t-carroll opened this issue Jun 30, 2020 · 3 comments
Open

Comments

@t-carroll
Copy link

Hi BIMSB team,

I've been working on getting your pipeline up and running as a non-root user on our group's compute cluster. Unfortunately, installing guix isn't an option in my case, but after some trial and error I was able to put together a conda environment with the listed dependencies that manages to get through the configure/make process.

However, when I go to run the pipeline on the test dataset, the pipeline fails with the following message:

./pigx-scrnaseq tests/sample_sheet.csv -s tests/settings.yaml

Error in job convert_matrix_from_mtx_to_loom while creating output file /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Mapped/WT_HEK_4h_br1/hg19/WT_HEK_4h_br1_hg19_UMI.matrix.loom.
RuleException:
CalledProcessError in line 37 of /users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/scripts/Accessory_Functions.py:
Command '/users/tcarroll/sharedscratch/anaconda3/envs/pigx_scrnaseq/bin/python /users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/scripts/convert_matrix_from_mtx_to_loom.py --sample_id WT_HEK_4h_br1 --input_dir /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Mapped/WT_HEK_4h_br1/hg19/WT_HEK_4h_br1_Solo.out --gtf_file /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Annotation/hg19/hg19.gtf --star_output_types_keys Gene GeneFull Velocyto Velocyto --star_output_types_vals Counts GeneFull Spliced Unspliced --output_file /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Mapped/WT_HEK_4h_br1/hg19/WT_HEK_4h_br1_hg19_UMI.matrix.loom --sample_sheet_file /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/sample_sheet.csv --path_script /users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/scripts &> /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Log/WT_HEK_4h_br1.hg19.convert_matrix_from_mtx_to_loom.log' returned non-zero exit status 1.
File "/users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/Snake_Dropseq.py", line 769, in __rule_convert_matrix_from_mtx_to_loom
File "/users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/scripts/Accessory_Functions.py", line 37, in print_shell
File "/users/tcarroll/sharedscratch/anaconda3/envs/pigx_scrnaseq/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message

I then checked the log referenced in the error message:

cat /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Log/WT_HEK_0h_br1.hg19.convert_matrix_from_mtx_to_loom.log

Parsing gene ids from gtf file /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Annotation/hg19/hg19.gtf
Reading input files ...
Counts
Features ...
Traceback (most recent call last):
File "/users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/scripts/convert_matrix_from_mtx_to_loom.py", line 102, in
genes.columns = ['gene_id','gene_id2']
File "/users/tcarroll/sharedscratch/anaconda3/envs/pigx_scrnaseq/lib/python3.6/site-packages/pandas/core/generic.py", line 5287, in setattr
return object.setattr(self, name, value)
File "pandas/_libs/properties.pyx", line 67, in pandas._libs.properties.AxisProperty.set
File "/users/tcarroll/sharedscratch/anaconda3/envs/pigx_scrnaseq/lib/python3.6/site-packages/pandas/core/generic.py", line 661, in _set_axis
self._data.set_axis(axis, labels)
File "/users/tcarroll/sharedscratch/anaconda3/envs/pigx_scrnaseq/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 178, in set_axis
f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements

Looks like some issue with pandas, where there's 2 elements instead of the expected 3? I'm not as experienced with Python so having trouble figuring this out. Perhaps it's some dependency I missed during my manual local install? If you could provide any advice on what might be going on here or where to look for more troubleshooting info it would be most appreciated. I'm running in a CentOS Linux 7 (Core) environment, with tools and packages installed by Conda where possible and manually otherwise.

Thanks for the help!
-Tom

@t-carroll
Copy link
Author

Was able to troubleshoot this further. Looks like in convert_matrix_from_mtx_to_loom.py, there is a block of code:

    path_genes    = os.path.join(path_input,'features.tsv')
    genes         = pd.read_csv(path_genes, sep='\t', header=None)
    genes.columns = ['gene_id','gene_id2']
    genes         = genes.drop(columns = ['gene_id2'])

Where it is expecting 2 columns in the "features.tsv" file. However, all of the relevant features.tsv files generated by the test pigx_scrnaseq command in my environment generate 3 column outputs. Here's an example, but it's the same for all the various features.tsv generated for each sample:
head -3 tests/out/Mapped/WT_HEK_4h_br1/hg19/WT_HEK_4h_br1_Solo.out/Gene/filtered/features.tsv

ENSG00000080573 ENSG00000080573 Gene Expression
ENSG00000267650 ENSG00000267650 Gene Expression
ENSG00000080511 ENSG00000080511 Gene Expression

I therefore modified the relevant code on my fork's devel branch to the following:

    path_genes    = os.path.join(path_input,'features.tsv')
    genes         = pd.read_csv(path_genes, sep='\t', header=None)
    genes.columns = ['gene_id','gene_id2','gene_expression']
    genes         = genes.drop(columns = ['gene_id2','gene_expression'])

And the test command now completes without error. Do the features.tsv files generated by your guix install on the test command only have two files? If so I imagine I have the wrong version of some dependency (maybe STAR? I have 2.7.5a).

In any case, the HTML report generated by my test command (attached) seemingly indicates I am having further issues with the pipeline, and may be generating incorrect results. For instance, the number of uniquely mapped UMIs is a tiny fraction of the total UMIs, and plots 3.0.2 and 3.0.3 are completely blank. Would it be possible to generate an example report for your test pipeline command (e.g. hg19.scRNA-Seq.report.html generated by your own installation) for users to compare against to insure proper installation?

Thanks for the help!
hg19.scRNA-Seq.report_files.zip

@frenkiboy
Copy link
Contributor

Dear Tom,

Thank you so much for reporting the message. I'll look into the error and sort it out.

Could you please just tell me which pandas version you are using?

Best,

Vedran

@t-carroll
Copy link
Author

Hi Vedran,

Thanks for looking into it! It looks like I'm using pandas version 1.0.3:

pip freeze | grep "pandas"

pandas==1.0.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants