Can't find 'rmatspipeline' to import 'run_pipe' #384

Open
tanya-lasagne opened this issue Mar 22, 2024 · 25 comments

Comments

@tanya-lasagne

Hello,
I was having trouble building rMATS on my system, so I installed it into a conda environment from Bioconda with Python 3.10. I seem to be having an issue I can't quite figure out. I downloaded the 'rmats_turbo_v4_2_0' folder from GitHub, but I don't see anything in it that pertains to 'run_pipe'.

I tried using the files under the 'py310_rmats_env' directory but that didn't seem to work either.
Here is my output, any ideas on what I should do?
Thank you!

(/Users/tanyapelayo/Desktop/rmats_turbo_v4_2_0/py310_rmats_env) tanyapelayo@Tanyas-MacBook-Pro-2 rMATS % python rmats.py --b1 /Users/tanyapelayo/desktop/rMATS/input/bam/LD_16_II1.txt --gtf /Users/tanyapelayo/desktop/rMATS/input/reference/Gmax_508_Wm82.a4.v1.gene.gtf -t paired --readLength 50 --nthread 4 --od /Users/tanyapelayo/desktop/rMATS/output/rMATS_results --tmp /Users/tanyapelayo/desktop/rMATS/tmp

Traceback (most recent call last):
File "/Users/tanyapelayo/Desktop/rMATS/rmats.py", line 19, in <module>
from rmatspipeline import run_pipe
ModuleNotFoundError: No module named 'rmatspipeline'

@EricKutschera
Contributor

With the conda environment activated you should be able to run with rmats.py --b1 ... instead of python rmats.py --b1 .... Running with just rmats.py should let the conda environment find the correct rmats.py file and the correct version of python. See this post: #237

@RaghdaKailany

I installed rMATS by cloning the repo and I am having the same issue even when I used rmats.py without python:

(base) raghdakailany@raghdas-air rmats-turbo % ./rmats.py --b1 /Users/raghdakailany/Documents/RMATS/rmats-turbo/testData/Data/b1.txt --b2 /Users/raghdakailany/Documents/RMATS/rmats-turbo/testData/Data/b2.txt --gtf /Users/raghdakailany/Documents/RMATS/rmats-turbo/testData/gtf/Homo_sapiens.Ensembl.GRCh37.72.gtf -t paired --readLength 50 --nthread 4 --od /Users/raghdakailany/Documents/RMATS/rmats-turbo/output/testData_results --tmp /Users/raghdakailany/Documents/RMATS/tmp

Traceback (most recent call last):
File "/Users/raghdakailany/Documents/RMATS/rmats-turbo/./rmats.py", line 19, in <module>
from rmatspipeline import run_pipe
ModuleNotFoundError: No module named 'rmatspipeline'

Any help, please?

@EricKutschera
Contributor

If you clone the repo then you need to run the build to create the rmatspipeline.so file:
https://github.com/Xinglab/rmats-turbo/tree/v4.3.0?tab=readme-ov-file#build

#36 (comment)
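To see whether the compiled module is actually present, one option is to look for a file named like rmatspipeline*.so next to the rmats.py that the shell would run. The following is a minimal sketch using only the Python standard library. The helper name find_compiled_module is hypothetical, and the assumption that the compiled file sits in the same directory as rmats.py (after resolving symlinks) is based on the directory listings in this thread, not on the rMATS documentation.

```python
import shutil
from pathlib import Path


def find_compiled_module(script_path, stem="rmatspipeline"):
    """List compiled extension files named like `stem` next to the given script.

    Assumption: the compiled module lives in the same directory as rmats.py.
    """
    script_dir = Path(script_path).resolve().parent
    return sorted(p.name for p in script_dir.glob(stem + "*.so"))


if __name__ == "__main__":
    # Which rmats.py would the shell run? None means it is not on PATH.
    script = shutil.which("rmats.py")
    print("rmats.py on PATH:", script)
    if script is not None:
        print("compiled module files:", find_compiled_module(script))
```

An empty list suggests that the build has not been run for that copy of rmats.py, or that the shell is picking up a different copy than expected.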

@tanya-lasagne
Author

tanya-lasagne commented Mar 28, 2024

Thanks Eric! I seem to have figured it out.
Would you mind taking a look at my terminal output? I'm not sure whether it's working correctly:

(/Users/tanyapelayo/Desktop/rmats_turbo_v4_2_0/py310_rmats_env) tanyapelayo@Tanyas-MacBook-Pro-2 rMATS % rmats.py --b1 /Users/tanyapelayo/desktop/rMATS/input/bam/bam_paths.txt --gtf /Users/tanyapelayo/desktop/rMATS/input/reference/Gmax_508_Wm82.a4.v1.gene.gtf -t single --readLength 50 --nthread 4 --od /Users/tanyapelayo/desktop/rMATS/output/rMATS_results --tmp /Users/tanyapelayo/desktop/rMATS/tmp

gtf: 2.9293458461761475
There are 47095 distinct gene ID in the gtf file
There are 79550 distinct transcript ID in the gtf file
There are 32121 one-transcript genes in the gtf file
There are 544454 exons in the gtf file
There are 9083 one-exon transcripts in the gtf file
There are 7844 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 1.689139
Average number of exons per transcript is 6.844173
Average number of exons per transcript excluding one-exon tx is 7.597471
Average number of gene per geneGroup is 4.559375
statistic: 0.006670951843261719

read outcome totals across all BAMs
USED: 2182
NOT_PAIRED: 0
NOT_NH_1: 14386214
NOT_EXPECTED_CIGAR: 769794
NOT_EXPECTED_READ_LENGTH: 106672034
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 79
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 19
CLIPPED: 16370585
total: 138200907
outcomes by BAM written to: /Users/tanyapelayo/desktop/rMATS/tmp/2024-03-27-18_02_30_233323_read_outcomes_by_bam.txt

novel: 91.05075883865356
The splicing graph and candidate read have been saved into /Users/tanyapelayo/desktop/rMATS/tmp/2024-03-27-18_02_30_233323_*.rmats
save: 0.005758047103881836
loadsg: 0.001359701156616211

==========
Done processing each gene from dictionary to compile AS events
Found 3975 exon skipping events
Found 59 exon MX events
Found 8341 alt SS events
There are 5436 alt 3 SS events and 2905 alt 5 SS events.
Found 4239 RI events

ase: 0.619361162185669
count: 0.16503429412841797
Processing count files.
WARNING: Statistical step is skipped for SE JC because only one group is involved
WARNING: Statistical step is skipped for SE JCEC because only one group is involved
WARNING: Statistical step is skipped for MXE JC because only one group is involved
WARNING: Statistical step is skipped for MXE JCEC because only one group is involved
WARNING: Statistical step is skipped for A3SS JC because only one group is involved
WARNING: Statistical step is skipped for A3SS JCEC because only one group is involved
WARNING: Statistical step is skipped for A5SS JC because only one group is involved
WARNING: Statistical step is skipped for A5SS JCEC because only one group is involved
WARNING: Statistical step is skipped for RI JC because only one group is involved
WARNING: Statistical step is skipped for RI JCEC because only one group is involved
Done processing count files.

@EricKutschera
Contributor

USED: 2182
NOT_PAIRED: 0
NOT_NH_1: 14386214
NOT_EXPECTED_CIGAR: 769794
NOT_EXPECTED_READ_LENGTH: 106672034
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 79
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 19
CLIPPED: 16370585
total: 138200907

About 77% of the alignments were filtered as NOT_EXPECTED_READ_LENGTH. If your reads have a fixed length, change --readLength 50 to the actual read length. If the reads have different lengths, add --variable-read-length.

Also, since you have just one group (--b1 without --b2), you can add --statoff to get rid of the warnings.
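The 77% figure comes straight from the read outcome totals: NOT_EXPECTED_READ_LENGTH divided by total. Here is a small sketch of that arithmetic using the counts posted above. The function name outcome_fractions is made up for illustration and is not part of rMATS.

```python
def outcome_fractions(counts):
    """Return (outcome, share of total) pairs, largest share first."""
    total = sum(counts.values())
    return sorted(
        ((name, n / total) for name, n in counts.items() if n > 0),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    # Counts copied from the "read outcome totals across all BAMs" block above.
    counts = {
        "USED": 2182,
        "NOT_PAIRED": 0,
        "NOT_NH_1": 14386214,
        "NOT_EXPECTED_CIGAR": 769794,
        "NOT_EXPECTED_READ_LENGTH": 106672034,
        "NOT_EXPECTED_STRAND": 0,
        "EXON_NOT_MATCHED_TO_ANNOTATION": 79,
        "JUNCTION_NOT_MATCHED_TO_ANNOTATION": 19,
        "CLIPPED": 16370585,
    }
    for name, share in outcome_fractions(counts):
        print(f"{name}: {share:.1%}")
```

Sorting by share puts the dominant filter reason first, which is usually the one to address before rerunning.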

@tanya-lasagne
Author

Thank you, Eric!
You've been so helpful.
Last question: are there any programs you recommend for visualizing the data?

@EricKutschera
Contributor

You can use https://github.com/Xinglab/rmats2sashimiplot or just load your .bam files and .gtf in https://igv.org/doc/desktop/ and go to the coordinates for an event

@tanya-lasagne
Author

Thank you!!!!

@alquamalok22

Hi Eric, I am facing the same problem with my rMATS installation (the same error). Can you please guide me on this?

@EricKutschera
Contributor

How did you install rmats? Did you try the suggestions in #67 ?

@alquamalok22

Hi, I installed it with conda install bioconda::rmats from Anaconda. I am getting this error:
(mapping) [all454@login3 rmats_input]$ echo "/ix1/ukammula/Alquama/star_output/526MEL_LNGFR_HIGH_KAM1656A17_S17_R1_001_Aligned.sortedByCoord.out.bam

/ix1/ukammula/Alquama/star_output/526MEL_LNGFR_HIGH_KAM1656A17_S17_R2_001_Aligned.sortedByCoord.out.bam" > high_group.txt
(mapping) [all454@login3 rmats_input]$ ll
total 10
-rw-r--r-- 1 all454 ukammula 209 Aug 21 14:31 high_group.txt
(mapping) [all454@login3 rmats_input]$ cat high_group.txt
/ix1/ukammula/Alquama/star_output/526MEL_LNGFR_HIGH_KAM1656A17_S17_R1_001_Aligned.sortedByCoord.out.bam
/ix1/ukammula/Alquama/star_output/526MEL_LNGFR_HIGH_KAM1656A17_S17_R2_001_Aligned.sortedByCoord.out.bam
(mapping) [all454@login3 rmats_input]$ echo "/ix1/ukammula/Alquama/star_output/526MEL_LNGFR_LOW_KAM1656A16_S21_R1_001_Aligned.sortedByCoord.out.bam

/ix1/ukammula/Alquama/star_output/526MEL_LNGFR_LOW_KAM1656A16_S21_R2_001_Aligned.sortedByCoord.out.bam" > low_group.txt
(mapping) [all454@login3 rmats_input]$ ll
total 19
-rw-r--r-- 1 all454 ukammula 209 Aug 21 14:31 high_group.txt
-rw-r--r-- 1 all454 ukammula 207 Aug 21 14:32 low_group.txt
(mapping) [all454@login3 rmats_input]$ cd ../
(mapping) [all454@login3 Alquama]$ rmats.py --b1 /ix1/ukammula/Alquama/rmats_input/high_group.txt --b2 /ix1/ukammula/Alquama/rmats_input/low_group.txt --gtf /ix1/ukammula/Alquama/refs/gencode.v38.annotation.gtf -t paired --readLength 100 --nthread 4
--od /ix1/ukammula/Alquama/rmats_output --tmp /ix1/ukammula/Alquama/rmats_tmp
gtf: 19.985592126846313
There are 60649 distinct gene ID in the gtf file
There are 237012 distinct transcript ID in the gtf file
There are 36828 one-transcript genes in the gtf file
There are 1499012 exons in the gtf file
There are 25030 one-exon transcripts in the gtf file
There are 22509 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.907929
Average number of exons per transcript is 6.324625
Average number of exons per transcript excluding one-exon tx is 6.953336
Average number of gene per geneGroup is 8.508360
statistic: 0.028011560440063477
Fail to open /ix1/ukammula/Alquama/star_output/526MEL_LNGFR_HIGH_KAM1656A17_S17_R1_001_Aligned.sortedByCoord.out.bam
/ix1/ukammula/Alquama/star_output/526MEL_LNGFR_HIGH_KAM1656A17_S17_R2_001_Aligned.sortedByCoord.out.bam
Fail to open /ix1/ukammula/Alquama/star_output/526MEL_LNGFR_LOW_KAM1656A16_S21_R1_001_Aligned.sortedByCoord.out.bam
/ix1/ukammula/Alquama/star_output/526MEL_LNGFR_LOW_KAM1656A16_S21_R2_001_Aligned.sortedByCoord.out.bam
read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 0
NOT_NH_1: 0
NOT_EXPECTED_CIGAR: 0
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0
CLIPPED: 0
total: 0
outcomes by BAM written to: /ix1/ukammula/Alquama/rmats_tmp/2024-08-21-14_32_55_155365_read_outcomes_by_bam.txt
novel: 0.012536048889160156
The splicing graph and candidate read have been saved into /ix1/ukammula/Alquama/rmats_tmp/2024-08-21-14_32_55_155365_*.rmats
save: 0.01399087905883789
Traceback (most recent call last):
File "/ihome/ukammula/all454/miniconda3/envs/mapping/bin/rmats.py", line 595, in <module ...

@EricKutschera
Contributor

The Fail to open errors happen because the --b1 and --b2 files have each BAM path on a separate line. Each of those files should instead contain a single line of comma-separated BAM paths: https://github.com/Xinglab/rmats-turbo/tree/v4.3.0?tab=readme-ov-file#starting-with-bam-files
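As an illustration of that layout, here is a small sketch that writes a list of BAM paths as one comma-separated line. The function name write_bam_list and the example paths are hypothetical. Writing no trailing newline is a conservative choice on my part, not a documented rMATS requirement.

```python
def write_bam_list(bam_paths, out_path):
    """Write BAM paths to out_path as a single comma-separated line."""
    # Drop empty entries and surrounding whitespace so no blank fields sneak in.
    cleaned = [p.strip() for p in bam_paths if p.strip()]
    with open(out_path, "w") as handle:
        # Single line, no blank lines or stray whitespace.
        handle.write(",".join(cleaned))
    return cleaned


if __name__ == "__main__":
    # Hypothetical paths, one BAM per sample.
    write_bam_list(
        ["/data/bam/sample1.bam", "/data/bam/sample2.bam"],
        "b1.txt",
    )
```

The resulting b1.txt would then be passed to --b1, with a second file built the same way for --b2.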

@alquamalok22

OK, will try.

@alquamalok22

(mapping) [all454@login3 Alquama]$ rmats.py --b1 /ix1/ukammula/Alquama/rmats_input/high_group.txt --b2 /ix1/ukammula/Alquama/rmats_input/low_group.txt --gtf /ix1/ukammula/Alquama/refs/gencode.v38.annotation.gtf -t paired --readLength 100 --nthread 4
--od /ix1/ukammula/Alquama/rmats_output --tmp /ix1/ukammula/Alquama/rmats_tmp
gtf: 20.477860927581787
There are 60649 distinct gene ID in the gtf file
There are 237012 distinct transcript ID in the gtf file
There are 36828 one-transcript genes in the gtf file
There are 1499012 exons in the gtf file
There are 25030 one-exon transcripts in the gtf file
There are 22509 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.907929
Average number of exons per transcript is 6.324625
Average number of exons per transcript excluding one-exon tx is 6.953336
Average number of gene per geneGroup is 8.508360
statistic: 0.031183481216430664
read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 166926434
NOT_NH_1: 0
NOT_EXPECTED_CIGAR: 0
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0
CLIPPED: 17851011
total: 184777445
outcomes by BAM written to: /ix1/ukammula/Alquama/rmats_tmp/2024-08-21-17:25:57_756052_read_outcomes_by_bam.txt
novel: 149.1213676929474
The splicing graph and candidate read have been saved into /ix1/ukammula/Alquama/rmats_tmp/2024-08-21-17:25:57_756052_*.rmats
save: 0.0021011829376220703
Traceback (most recent call last):
File "/ihome/ukammula/all454/miniconda3/envs/mapping/bin/rmats.py", line 536, in <module>
main()
File "/ihome/ukammula/all454/miniconda3/envs/mapping/bin/rmats.py", line 507, in main
run_pipe(args)
File "rmatspipeline/rmatspipeline.pyx", line 3803, in rmats.rmatspipeline.run_pipe
File "rmatspipeline/rmatspipeline.pyx", line 3666, in rmats.rmatspipeline.split_sg_files_by_bam
File "rmatspipeline/rmatspipeline.pyx", line 3674, in rmats.rmatspipeline.split_sg_files_by_bam
ValueError: invalid literal for int() with base 10: ''

I am getting this error even though I have also corrected the format of the BAM list files.

@EricKutschera
Contributor

ValueError: invalid literal for int() with base 10: '' could happen if there are empty .rmats files in the --tmp directory: #59 (comment)

Those files could be from previous runs that had the wrong format for --b1. After a run with an error you should either clear out the --tmp directory or use a new one. Another possibility is that your --b1 files have blank lines or extra whitespace at the end
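Both possibilities can be checked before rerunning. The sketch below lists zero-byte .rmats files in a --tmp directory and reports blank lines or extra lines in a --b1/--b2 file. The helper names are hypothetical, and the checks only mirror the conditions described in this reply; they are not taken from the rMATS source.

```python
from pathlib import Path


def empty_rmats_files(tmp_dir):
    """Return the names of zero-byte *.rmats files directly inside tmp_dir."""
    return sorted(
        p.name for p in Path(tmp_dir).glob("*.rmats") if p.stat().st_size == 0
    )


def list_file_issues(list_file):
    """Describe layout problems in a --b1/--b2 style file (empty list if none found)."""
    text = Path(list_file).read_text()
    lines = text.splitlines()
    issues = []
    if not text.strip():
        issues.append("file is empty")
    if sum(1 for line in lines if line.strip()) > 1:
        issues.append("more than one non-empty line")
    if any(not line.strip() for line in lines):
        issues.append("blank line present")
    if any(line != line.strip() for line in lines if line.strip()):
        issues.append("leading or trailing whitespace on a path line")
    return issues
```

If empty_rmats_files returns anything for the directory you plan to pass as --tmp, clearing that directory or choosing a new one is the safer route.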

@alquamalok22

gtf: 21.28791308403015
There are 60649 distinct gene ID in the gtf file
There are 237012 distinct transcript ID in the gtf file
There are 36828 one-transcript genes in the gtf file
There are 1499012 exons in the gtf file
There are 25030 one-exon transcripts in the gtf file
There are 22509 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.907929
Average number of exons per transcript is 6.324625
Average number of exons per transcript excluding one-exon tx is 6.953336
Average number of gene per geneGroup is 8.508360
statistic: 0.06770205497741699
read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 175310705
NOT_NH_1: 0
NOT_EXPECTED_CIGAR: 0
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0
CLIPPED: 19069227
total: 194379932
outcomes by BAM written to: /ix1/ukammula/Alquama/rmats_tmp_526mel_high_comparison/2024-08-22-15:36:02_562746_read_outcomes_by_bam.txt
novel: 157.22008872032166
The splicing graph and candidate read have been saved into /ix1/ukammula/Alquama/rmats_tmp_526mel_high_comparison/2024-08-22-15:36:02_562746_*.rmats
save: 0.0019304752349853516

Hi, I am getting this output. I checked the txt file and it is empty. Is it because the alignment didn't work properly?

@alquamalok22

I do have paired RNA reads (R1 and R2). Is it because I aligned against the whole genome? Is that a problem?

@EricKutschera
Contributor

Aligning against the whole genome is fine. Most of your alignments are filtered as NOT_PAIRED. From a previous post it looks like you have separate R1 and R2 bam files: 526MEL_LNGFR_HIGH_KAM1656A17_S17_R1_001_Aligned.sortedByCoord.out.bam and also a similarly named R2 file

From https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

--readFilesIn /path/to/read1 [/path/to/read2]

--readFilesIn name(s) (with path) of the files containing the sequences to be mapped (e.g.
RNA-seq FASTQ files). If using Illumina paired-end reads, the read1 and read2 files have to
be supplied

It looks like you may have run STAR separately for R1 and R2 instead of using them both at the same time with --readFilesIn. rMATS marks alignments as NOT_PAIRED based on a "proper pair" flag in the alignment. With R1 and R2 aligned separately that flag won't be set: #51 (comment)
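For reference, the "proper pair" flag is bit 0x2 of the FLAG field in each SAM/BAM record, and bit 0x1 marks a read as paired at all. The sketch below decodes those two bits from a FLAG value. The function name is hypothetical, and whether rMATS checks exactly these bits is taken from the description above rather than verified against its source.

```python
PAIRED = 0x1       # read is part of a pair (template has multiple segments)
PROPER_PAIR = 0x2  # aligner considers both mates properly aligned


def is_proper_pair(flag):
    """Return True if a SAM FLAG value has both the paired and proper-pair bits set."""
    return bool(flag & PAIRED) and bool(flag & PROPER_PAIR)


if __name__ == "__main__":
    # 99 is a typical FLAG for the first mate of a properly paired alignment.
    # 0 and 16 are typical for reads aligned as single-end (forward and reverse).
    for flag in (99, 0, 16, 65):
        print(flag, is_proper_pair(flag))
```

The FLAG is the second column of `samtools view` output, so a BAM produced from R1 alone would be expected to show values such as 0 or 16 there.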

@alanlleung

Hello,

I'm using macOS Big Sur (11.7.0) and got the same rmatspipeline error when trying to run rmats.py with the latest version of rMATS-turbo.
#36 (comment)
#384 (comment)

I tried to build rMATS with ./build_rmats, but it was not successful: there was an issue installing gsl and Xcode. I wonder whether anyone has a good way around that.

For example, can I build a rmatspipeline.so file that doesn't require me to install gsl and xcode?
#36 (comment)

I am a new user, so I hope I am adding this question to the right thread.
Thank you.

@EricKutschera
Contributor

The build requires gsl, and on mac I think you'll also need xcode

Instead of building rmats yourself you could use the bioconda package or docker image (https://hub.docker.com/r/xinglab/rmats/tags). If you install conda (https://docs.anaconda.com/miniconda/) then you can run the commands in this post to install rmats: #361 (comment)

@alanlleung

Thank you. I was able to install a few other dependencies, but I think I'm still missing gcc. This is the error message I get right now when I run ./build_rmats. If I still can't figure it out, I may try Bioconda later.

cd bamtools; mkdir -p build; cd build; cmake ..; make;
CMake Warning (dev) at CMakeLists.txt:9 (project):
cmake_minimum_required() should be called prior to this top-level project()
call. Please see the cmake-commands(7) manual for usage documentation of
both commands.
This warning is for project developers. Use -Wno-dev to suppress it.

CMake Deprecation Warning at CMakeLists.txt:12 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.

Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.

-- Configuring done (0.1s)
CMake Warning (dev):
Policy CMP0042 is not set: MACOSX_RPATH is enabled by default. Run "cmake
--help-policy CMP0042" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.

MACOSX_RPATH is not specified for the following targets:

BamTools

This warning is for project developers. Use -Wno-dev to suppress it.

-- Generating done (0.1s)
-- Build files have been written to: /Users/alanleung/rmats-turbo/bamtools/build
[ 0%] Built target SharedHeaders
[ 1%] Linking CXX shared library /Users/alanleung/rmats-turbo/bamtools/lib/libbamtools.dylib
[ 38%] Built target BamTools
[ 76%] Built target BamTools-static
[ 76%] Built target APIHeaders
[ 76%] Built target AlgorithmsHeaders
[ 80%] Built target jsoncpp
[ 85%] Built target BamTools-utils
[ 86%] Linking CXX executable /Users/alanleung/rmats-turbo/bamtools/bin/bamtools
[100%] Built target bamtools_cmd

# rm -f to ignore nonexistent files since *.dylib will only exist for mac

cd bamtools/lib; rm -f *.so *.so.* *.dylib
cd rMATS_C; make;
make[1]: Nothing to be done for `all'.
cd rMATS_pipeline; python setup.py build_ext;
error: command 'gcc-12' failed: No such file or directory
make: *** [build] Error 1

@alquamalok22

(mapping) [all454@login3 Alquama]$ rmats.py --b1/Alquama/rmats_input/high.txt --b2 /Alquama/nput/mut_high.txt --gtf /Alquama/refs/gencode.v38.annotation.gtf -t paired --readLength 100 --nthread 4
--od /Alquama/rmats_output_lngfr_vs_mut_high --tmp /Alquama/rmats_tmp_lngfr_vs_mut_high
gtf: 18.970665454864502
There are 60649 distinct gene ID in the gtf file
There are 237012 distinct transcript ID in the gtf file
There are 36828 one-transcript genes in the gtf file
There are 1499012 exons in the gtf file
There are 25030 one-exon transcripts in the gtf file
There are 22509 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.907929
Average number of exons per transcript is 6.324625
Average number of exons per transcript excluding one-exon tx is 6.953336
Average number of gene per geneGroup is 8.508360
statistic: 0.03118610382080078
read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 9284078
NOT_NH_1: 34407484
NOT_EXPECTED_CIGAR: 1769888
NOT_EXPECTED_READ_LENGTH: 125314937
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0
CLIPPED: 16024451
total: 186800838
outcomes by BAM written to: /ix1/ukammula/Alquama/rmats_tmp_lngfr_vs_mut_high/2024-08-28-12:19:54_900340_read_outcomes_by_bam.txt

novel: 306.3669626712799
The splicing graph and candidate read have been saved into /Alquama/rmats_tmp_high/2024-08-28-12:19:54_900340_*.rmats
save: 0.03438973426818848
loadsg: 0.07026386260986328

Done processing each gene from dictionary to compile AS events
Found 52939 exon skipping events
Found 3916 exon MX events
Found 18998 alt SS events
There are 11503 alt 3 SS events and 7495 alt 5 SS events.
Found 8337 RI events

ase: 1.9979050159454346
count: 0.4916682243347168
Processing count files.
Done processing count files.

When I look at the txt file, it is empty, but the output here shows results. Can you please guide me?

@EricKutschera
Contributor

Looks like most of the reads were filtered for not matching --readLength 100

NOT_EXPECTED_READ_LENGTH: 125314937
total: 186800838

See this post: #95
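To decide between a different --readLength value and --variable-read-length, it can help to tally the read lengths actually present in a BAM file. The sketch below counts the length of the SEQ column (the 10th tab-separated field) in SAM text lines. The function name is hypothetical, and SEQ length is only an approximation of whatever rMATS compares against --readLength.

```python
from collections import Counter


def read_length_counts(sam_lines):
    """Count SEQ lengths in an iterable of SAM text lines, skipping headers."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[9] == "*":
            continue  # malformed record or sequence not stored
        counts[len(fields[9])] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory example; real input would be SAM text from a BAM file.
    example = [
        "@HD\tVN:1.6\n",
        "r1\t0\tchr1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t0\tchr1\t9\t255\t3M\t*\t0\t0\tACG\tIII\n",
    ]
    for length, n in read_length_counts(example).most_common():
        print(length, n)
```

For a real file, the same function could be called on sys.stdin with the output of `samtools view` piped into the script. One dominant length suggests a fixed --readLength; a wide spread points toward --variable-read-length.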

@alanlleung

Sorry, this may be a naive question, but does the error message below indicate that I don't have enough memory to run rMATS on my computer?
I was able to install rMATS in a conda environment following Eric's instructions, but I ran into issues opening my BAM files. I checked them with samtools and everything seemed fine. Initially I ran two control and two treatment BAMs, then switched to one control vs. one treatment because I suspected I did not have enough memory. The files seem to open in the latter case, but I still got no output files. Does anyone have a hint?

#########################################################
(base) allng@Als-MacBook-Pro ~ % conda activate ./rmats_conda_env
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro ~ % cd rmats_conda_env
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rmats_conda_env % cd rMAT_C
cd: no such file or directory: rMAT_C
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rmats_conda_env % ls
bin include man share
conda-meta lib rMATS ssl
etc libexec sbin
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rmats_conda_env % cd rMATS
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rMATS % ls
__pycache__ rMATS_R
cp_with_prefix.py rmats.py
output_dir rmatspipeline.cpython-312-darwin.so
rMATS_C temp_dir
rMATS_P
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rMATS % python rmats.py --b1 /Users/allng/Downloads/path_to_BAM_control.txt --b2 /Users/allng/Downloads/path_to_BAM_treatment.txt --gtf /Users/allng/Downloads/Homo_sapiens.GRCh37.87.gtf --od output_dir --tmp temp_dir --readLength 50 --nthread 1 --cstat 0.0001
gtf: 24.437462091445923
There are 57905 distinct gene ID in the gtf file
There are 196501 distinct transcript ID in the gtf file
There are 35717 one-transcript genes in the gtf file
There are 1195764 exons in the gtf file
There are 24943 one-exon transcripts in the gtf file
There are 21969 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.393507
Average number of exons per transcript is 6.085282
Average number of exons per transcript excluding one-exon tx is 6.824637
Average number of gene per geneGroup is 7.481461
statistic: 0.02162313461303711
Fail to open /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1101_accepted_hits.bam: BamReader::Open: could not open file: /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1101_accepted_hits.bam
BgzfStream::Open: could not open BGZF stream:
BamFile::Open: could not open file handle for /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1101_accepted_hits.bam
Fail to open /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1115_accepted_hits.bam: BamReader::Open: could not open file: /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1115_accepted_hits.bam
BgzfStream::Open: could not open BGZF stream:
BamFile::Open: could not open file handle for /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1115_accepted_hits.bam

read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 46746630
NOT_NH_1: 0
NOT_EXPECTED_CIGAR: 0
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0
CLIPPED: 0
total: 46746630
outcomes by BAM written to: temp_dir/2024-08-28-19_36_46_532542_read_outcomes_by_bam.txt

novel: 83.77730202674866
The splicing graph and candidate read have been saved into temp_dir/2024-08-28-19_36_46_532542_*.rmats
save: 0.0008111000061035156
Traceback (most recent call last):
File "/Users/allng/rmats_conda_env/rMATS/rmats.py", line 979, in <module>
main()
File "/Users/allng/rmats_conda_env/rMATS/rmats.py", line 945, in main
run_pipe(args)
File "rmatspipeline/rmatspipeline.pyx", line 4006, in rmats.rmatspipeline.run_pipe
File "rmatspipeline/rmatspipeline.pyx", line 3869, in rmats.rmatspipeline.split_sg_files_by_bam
File "rmatspipeline/rmatspipeline.pyx", line 3877, in rmats.rmatspipeline.split_sg_files_by_bam
ValueError: invalid literal for int() with base 10: '/Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1101_accepted_hits.bam'
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rMATS % ls
__pycache__ rMATS_R
cp_with_prefix.py rmats.py
output_dir rmatspipeline.cpython-312-darwin.so
rMATS_C temp_dir
rMATS_P
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rMATS % cd temp-dir
cd: no such file or directory: temp-dir
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rMATS % ls
__pycache__ rMATS_R
cp_with_prefix.py rmats.py
output_dir rmatspipeline.cpython-312-darwin.so
rMATS_C temp_dir
rMATS_P
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rMATS % cd temp_dir
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro temp_dir % ls
2024-08-27-22_44_20_211424_0.rmats
2024-08-27-22_44_20_211424_1.rmats
2024-08-27-22_44_20_211424_read_outcomes_by_bam.txt
2024-08-27-22_48_23_901443_0.rmats
2024-08-27-22_48_23_901443_1.rmats
2024-08-27-22_48_23_901443_read_outcomes_by_bam.txt
2024-08-27-23_00_26_144133_0.rmats
2024-08-27-23_00_26_144133_1.rmats
2024-08-27-23_00_26_144133_2.rmats
2024-08-27-23_00_26_144133_3.rmats
2024-08-27-23_00_26_144133_read_outcomes_by_bam.txt
2024-08-27-23_04_24_032976_0.rmats
2024-08-27-23_04_24_032976_1.rmats
2024-08-27-23_04_24_032976_2.rmats
2024-08-27-23_04_24_032976_3.rmats
2024-08-27-23_04_24_032976_read_outcomes_by_bam.txt
2024-08-27-23_25_40_988503_0.rmats
2024-08-27-23_25_40_988503_1.rmats
2024-08-27-23_25_40_988503_2.rmats
2024-08-27-23_25_40_988503_3.rmats
2024-08-27-23_25_40_988503_read_outcomes_by_bam.txt
2024-08-27-23_36_54_599569_0.rmats
2024-08-27-23_36_54_599569_1.rmats
2024-08-27-23_36_54_599569_2.rmats
2024-08-27-23_36_54_599569_3.rmats
2024-08-27-23_36_54_599569_read_outcomes_by_bam.txt
2024-08-27-23_46_14_641307_0.rmats
2024-08-27-23_46_14_641307_1.rmats
2024-08-27-23_46_14_641307_read_outcomes_by_bam.txt
2024-08-28-19_36_46_532542_0.rmats
2024-08-28-19_36_46_532542_1.rmats
2024-08-28-19_36_46_532542_2.rmats
2024-08-28-19_36_46_532542_3.rmats
2024-08-28-19_36_46_532542_read_outcomes_by_bam.txt
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro temp_dir % less 2024-08-28-19_36_46_532542_read_outcomes_by_bam.txt
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro temp_dir % less 2024-08-28-19_36_46_532542_
.rmats

(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rmats_conda_env % cd rMATS
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rMATS % python rmats.py --b1 /Users/allng/Downloads/path_to_BAM_control_b.txt --b2 /Users/allng/Downloads/path_to_BAM_treatment_b.txt --gtf /Users/allng/Downloads/Homo_sapiens.GRCh37.87.gtf --od output_dir --tmp temp_dir --readLength 50 --nthread 1 --cstat 0.0001
gtf: 24.960295915603638
There are 57905 distinct gene ID in the gtf file
There are 196501 distinct transcript ID in the gtf file
There are 35717 one-transcript genes in the gtf file
There are 1195764 exons in the gtf file
There are 24943 one-exon transcripts in the gtf file
There are 21969 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.393507
Average number of exons per transcript is 6.085282
Average number of exons per transcript excluding one-exon tx is 6.824637
Average number of gene per geneGroup is 7.481461
statistic: 0.020067930221557617

read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 46746630
NOT_NH_1: 0
NOT_EXPECTED_CIGAR: 0
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0
CLIPPED: 0
total: 46746630
outcomes by BAM written to: temp_dir/2024-08-28-19_43_27_347046_read_outcomes_by_bam.txt

novel: 82.15347099304199
The splicing graph and candidate read have been saved into temp_dir/2024-08-28-19_43_27_347046_*.rmats
save: 0.0004296302795410156
Traceback (most recent call last):
File "/Users/allng/rmats_conda_env/rMATS/rmats.py", line 979, in <module>
main()
File "/Users/allng/rmats_conda_env/rMATS/rmats.py", line 945, in main
run_pipe(args)
File "rmatspipeline/rmatspipeline.pyx", line 4006, in rmats.rmatspipeline.run_pipe
File "rmatspipeline/rmatspipeline.pyx", line 3869, in rmats.rmatspipeline.split_sg_files_by_bam
File "rmatspipeline/rmatspipeline.pyx", line 3877, in rmats.rmatspipeline.split_sg_files_by_bam
ValueError: invalid literal for int() with base 10: '/Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1101_accepted_hits.bam'
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rMATS % ls
__pycache__ rMATS_C rmats.py
cp_with_prefix.py rMATS_P rmatspipeline.cpython-312-darwin.so
output_dir rMATS_R temp_dir
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro rMATS % cd output_dir
(/Users/allng/rmats_conda_env) allng@Als-MacBook-Pro output_dir % ls
split_dot_rmats tmp

@EricKutschera
Contributor

The errors don't seem related to memory limits. Here are the main errors:

Fail to open /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1101_accepted_hits.bam: BamReader::Open: could not open file: /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1101_accepted_hits.bam
BgzfStream::Open: could not open BGZF stream:
BamFile::Open: could not open file handle for /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1101_accepted_hits.bam
Fail to open /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1115_accepted_hits.bam: BamReader::Open: could not open file: /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1115_accepted_hits.bam
BgzfStream::Open: could not open BGZF stream:
BamFile::Open: could not open file handle for /Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1115_accepted_hits.bam

https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L792
It looks like maybe those are not valid paths to bam files or you don't have read permissions on those files. Can you open them with samtools view?
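Besides samtools view, the two conditions mentioned here (path does not exist, or no read permission) can be checked directly. This is a minimal sketch with a hypothetical helper name. It only tests the filesystem and does not validate that a file is really in BAM format.

```python
import os


def check_bam_paths(paths):
    """Map each path to 'missing', 'not readable', or 'ok'."""
    report = {}
    for path in paths:
        if not os.path.isfile(path):
            report[path] = "missing"
        elif not os.access(path, os.R_OK):
            report[path] = "not readable"
        else:
            report[path] = "ok"
    return report


if __name__ == "__main__":
    # Paths copied from the Fail to open messages above.
    paths = [
        "/Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1101_accepted_hits.bam",
        "/Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1115_accepted_hits.bam",
    ]
    for path, status in check_bam_paths(paths).items():
        print(status, path)
```

Running the check from the same shell and environment that launches rmats.py keeps the result comparable to what rMATS itself would see.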

File "rmatspipeline/rmatspipeline.pyx", line 3877, in rmats.rmatspipeline.split_sg_files_by_bam
ValueError: invalid literal for int() with base 10: '/Users/allng/Documents/Results/B1_mutant_RNA_seq/KX_analysis/1101_accepted_hits.bam'

https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L3877

That error could happen if you used a --b1 or --b2 file with extra newline characters. It looks like you have output from multiple runs in your --tmp and the run that had the incorrectly formatted file might be an old run. You can use a new --tmp for new runs or remove the files before reusing --tmp in a new run: #424 (comment)
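One way to avoid mixing files from different runs is to create a fresh, empty directory for every run and pass it as --tmp. The sketch below does that with the standard library. The function name, parent directory name, and prefix are hypothetical choices for illustration.

```python
import tempfile
from pathlib import Path


def fresh_tmp_dir(parent, prefix="rmats_tmp_"):
    """Create and return the path of a new, empty directory under parent."""
    parent_path = Path(parent)
    parent_path.mkdir(parents=True, exist_ok=True)
    # mkdtemp picks a unique name, so earlier runs are never reused.
    return tempfile.mkdtemp(prefix=prefix, dir=str(parent_path))


if __name__ == "__main__":
    tmp_dir = fresh_tmp_dir("rmats_tmp_runs")
    print("pass this directory to --tmp:", tmp_dir)
```

Old run directories can then be deleted whenever convenient, without any risk of a new run reading their .rmats files.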
