SoloTE indexer error #31

Closed
frentzeperis opened this issue Sep 15, 2023 · 9 comments

Comments

@frentzeperis

Good afternoon, I hope you are well!

I am using SoloTE 1.09 to analyze TE expression from a murine BAM file that was aligned to mm10. I am trying to get the code running for the first subject before moving on to the others. It runs for a while, but at the end I got a few errors and the temp files were all still there. I would greatly appreciate any help!

Code:
python SoloTE/SoloTE_pipeline.py --threads 1 --bam possorted_genome_bam.bam --teannotation mm10_rmsk.bed --outputprefix sub1-test --outputdir /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/

Output:
SoloTE started at 12:28:50
[OK] samtools found!
[OK] bedtools found!
SoloTE v1.09 started!
SoloTE Home directory /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/SoloTE
SoloTE executed from /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te
Results will be stored in /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te
Input BAM file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam
Input TE BED file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed
Currently working in temporary directory: /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/sub1-test_SoloTE_temp
samtools view -@ 1 -O BAM -o sub1-test_nogenes_overlappingtes.bam -L /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed -e '(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")' /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam
samtools index sub1-test_nogenes_overlappingtes.bam
sub1-test_nogenes_overlappingtes.bed exists in output folder. Skipping this step
sub1-test_selectedtes.bed exists in output folder. Skipping this step
python /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/SoloTE/annotateBAM.py sub1-test_nogenes_overlappingtes.bam sub1-test_selectedtes.bed temp_annotated_te.bam 1
samtools sort -@ 1 -O BAM -o sub1-test_teannotated.bam temp_annotated_te.bam
samtools merge --threads 1 -o - /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam sub1-test_teannotated.bam|samtools view -@ 1 -O BAM -o sub1-test_final.bam -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-" && [GN]!="-"' --keep-tag GN,CB,UB
samtools index sub1-test_final.bam
Counts for chromosome chr1 are being generated in process: 19911
Counts for chromosome chr10 are being generated in process: 19911
Counts for chromosome chr11 are being generated in process: 19911
Counts for chromosome chr12 are being generated in process: 19911
Counts for chromosome chr13 are being generated in process: 19911
Counts for chromosome chr14 are being generated in process: 19911
Counts for chromosome chr15 are being generated in process: 19911
Counts for chromosome chr16 are being generated in process: 19911
Counts for chromosome chr17 are being generated in process: 19911
Counts for chromosome chr18 are being generated in process: 19911
Counts for chromosome chr19 are being generated in process: 19911
Counts for chromosome chr2 are being generated in process: 19911
Counts for chromosome chr3 are being generated in process: 19911
Counts for chromosome chr4 are being generated in process: 19911
Counts for chromosome chr5 are being generated in process: 19911
Counts for chromosome chr6 are being generated in process: 19911
Counts for chromosome chr7 are being generated in process: 19911
Counts for chromosome chr8 are being generated in process: 19911
Counts for chromosome chr9 are being generated in process: 19911
Counts for chromosome chrM are being generated in process: 19911
Counts for chromosome chrX are being generated in process: 19911
Counts for chromosome chrY are being generated in process: 19911
Counts for chromosome GL456233.1 are being generated in process: 19911
Counts for chromosome GL456211.1 are being generated in process: 19911
Counts for chromosome GL456350.1 are being generated in process: 19911
Counts for chromosome JH584293.1 are being generated in process: 19911
Counts for chromosome GL456221.1 are being generated in process: 19911
Counts for chromosome JH584297.1 are being generated in process: 19911
Counts for chromosome JH584296.1 are being generated in process: 19911
Counts for chromosome JH584294.1 are being generated in process: 19911
Counts for chromosome JH584298.1 are being generated in process: 19911
Counts for chromosome GL456210.1 are being generated in process: 19911
Counts for chromosome GL456212.1 are being generated in process: 19911
Counts for chromosome JH584304.1 are being generated in process: 19911
Counts for chromosome GL456216.1 are being generated in process: 19911
Counts for chromosome JH584295.1 are being generated in process: 19911
Traceback (most recent call last):
File "/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3800, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 4

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/SoloTE/SoloTE_pipeline.py", line 217, in <module>
tecounts2.loc[tecounts2[4].isnull(),4] = tecounts2.loc[tecounts2[4].isnull(),1]
File "/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/python3.9/site-packages/pandas/core/frame.py", line 3805, in __getitem__
indexer = self.columns.get_loc(key)
File "/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
raise KeyError(key) from err
KeyError: 4
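
A KeyError: 4 from pandas means the table has no column labelled 4. One plausible reading of the traceback is that the counts table ended up with fewer columns than line 217 expects. The sketch below reproduces the same error on a toy table and shows a guard that reports the problem more clearly. The toy data and the helper name fill_missing_label are illustrative assumptions, not SoloTE's own code.

```python
import pandas as pd


def fill_missing_label(table: pd.DataFrame, target: int = 4, fallback: int = 1) -> pd.DataFrame:
    """Copy `fallback` into `target` where `target` is null, if the column exists."""
    if target not in table.columns:
        raise ValueError(
            f"column {target} is missing; table only has columns {list(table.columns)}"
        )
    out = table.copy()
    mask = out[target].isnull()
    out.loc[mask, target] = out.loc[mask, fallback]
    return out


# Toy table with integer column labels 0..2 (hypothetical data, three columns only).
narrow = pd.DataFrame([["geneA", "AAAC-1", 1], ["geneB", "AAAG-1", 2]])

try:
    narrow[4]  # same access pattern as tecounts2[4] in the traceback
except KeyError as err:
    print("KeyError:", err)
```

With the guard in place, the failure names the columns that are actually present, which points more directly at an upstream step that produced too little data.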

@bvaldebenitom
Owner

Hi @frentzeperis!

Do you have a sub1-test_allcounts.txt file in /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/sub1-test_SoloTE_temp? If so, please share the output of head sub1-test_allcounts.txt.

Additionally, can you share the output of the following commands?

ls -lht /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/sub1-test_SoloTE_temp
head /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed
samtools view /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam|head

The output of those commands will help further diagnose these issues. Thanks!

@frentzeperis
Author

Thanks so much!

The output of head sub1-test_allcounts.txt is below:
1600012P17Rik;Pappa2 AGACCCGCACGCCACA-1 1
1700007P06Rik AGTCATGCAACGTATC-1 1
1700007P06Rik CCAAGCGGTCGAAACG-1 1
1700016C15Rik AAACCCACATCGCTAA-1 2
1700016C15Rik AAACGAACACTCCACT-1 1
1700016C15Rik AAACGCTTCTAACGGT-1 1
1700016C15Rik AAAGGTAGTGATAGTA-1 1
1700016C15Rik AAATGGATCCGATGCG-1 2
1700016C15Rik AAATGGATCCTACCAC-1 1
1700016C15Rik AACAGGGCAAAGGCGT-1 2

ls -lht /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/sub1-test_SoloTE_temp
total 16342632
-rw-r--r--@ 1 frederikarentzeperis staff 591M Sep 15 13:57 sub1-test_allcounts.txt
-rw-r--r-- 1 frederikarentzeperis staff 14K Sep 15 13:57 sub1-test_countpercell_JH584295.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 57K Sep 15 13:57 sub1-test_countpercell_GL456216.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 115K Sep 15 13:57 sub1-test_countpercell_JH584304.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 7.3K Sep 15 13:57 sub1-test_countpercell_GL456212.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 2.4K Sep 15 13:57 sub1-test_countpercell_GL456210.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 32B Sep 15 13:57 sub1-test_countpercell_JH584298.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 83B Sep 15 13:57 sub1-test_countpercell_JH584294.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 64B Sep 15 13:57 sub1-test_countpercell_JH584296.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 32B Sep 15 13:57 sub1-test_countpercell_JH584297.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 3.0K Sep 15 13:57 sub1-test_countpercell_GL456221.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 289B Sep 15 13:57 sub1-test_countpercell_JH584293.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 568B Sep 15 13:57 sub1-test_countpercell_GL456350.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 3.6K Sep 15 13:57 sub1-test_countpercell_GL456211.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 57K Sep 15 13:57 sub1-test_countpercell_GL456233.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 183K Sep 15 13:57 sub1-test_countpercell_chrY.counts
-rw-r--r-- 1 frederikarentzeperis staff 16M Sep 15 13:57 sub1-test_countpercell_chrX.counts
-rw-r--r-- 1 frederikarentzeperis staff 6.7M Sep 15 13:56 sub1-test_countpercell_chrM.counts
-rw-r--r-- 1 frederikarentzeperis staff 33M Sep 15 13:54 sub1-test_countpercell_chr9.counts
-rw-r--r-- 1 frederikarentzeperis staff 33M Sep 15 13:51 sub1-test_countpercell_chr8.counts
-rw-r--r-- 1 frederikarentzeperis staff 45M Sep 15 13:49 sub1-test_countpercell_chr7.counts
-rw-r--r-- 1 frederikarentzeperis staff 30M Sep 15 13:44 sub1-test_countpercell_chr6.counts
-rw-r--r-- 1 frederikarentzeperis staff 37M Sep 15 13:42 sub1-test_countpercell_chr5.counts
-rw-r--r-- 1 frederikarentzeperis staff 33M Sep 15 13:39 sub1-test_countpercell_chr4.counts
-rw-r--r-- 1 frederikarentzeperis staff 38M Sep 15 13:36 sub1-test_countpercell_chr3.counts
-rw-r--r-- 1 frederikarentzeperis staff 39M Sep 15 13:33 sub1-test_countpercell_chr2.counts
-rw-r--r-- 1 frederikarentzeperis staff 27M Sep 15 13:30 sub1-test_countpercell_chr19.counts
-rw-r--r-- 1 frederikarentzeperis staff 16M Sep 15 13:27 sub1-test_countpercell_chr18.counts
-rw-r--r-- 1 frederikarentzeperis staff 33M Sep 15 13:26 sub1-test_countpercell_chr17.counts
-rw-r--r-- 1 frederikarentzeperis staff 15M Sep 15 13:23 sub1-test_countpercell_chr16.counts
-rw-r--r-- 1 frederikarentzeperis staff 24M Sep 15 13:22 sub1-test_countpercell_chr15.counts
-rw-r--r-- 1 frederikarentzeperis staff 20M Sep 15 13:20 sub1-test_countpercell_chr14.counts
-rw-r--r-- 1 frederikarentzeperis staff 20M Sep 15 13:18 sub1-test_countpercell_chr13.counts
-rw-r--r-- 1 frederikarentzeperis staff 19M Sep 15 13:17 sub1-test_countpercell_chr12.counts
-rw-r--r-- 1 frederikarentzeperis staff 46M Sep 15 13:15 sub1-test_countpercell_chr11.counts
-rw-r--r-- 1 frederikarentzeperis staff 29M Sep 15 13:11 sub1-test_countpercell_chr10.counts
-rw-r--r-- 1 frederikarentzeperis staff 30M Sep 15 13:09 sub1-test_countpercell_chr1.counts
-rw-r--r-- 1 frederikarentzeperis staff 1.9M Sep 15 13:07 sub1-test_final.bam.bai
-rw-r--r-- 1 frederikarentzeperis staff 5.7G Sep 15 13:06 sub1-test_final.bam
-rw-r--r-- 1 frederikarentzeperis staff 1.7K Sep 15 12:42 sub1-test_teannotated.bam
-rw-r--r-- 1 frederikarentzeperis staff 1.7K Sep 15 12:42 temp_annotated_te.bam
-rw-r--r-- 1 frederikarentzeperis staff 2.9M Sep 15 12:42 sub1-test_nogenes_overlappingtes.bam.bai
-rw-r--r-- 1 frederikarentzeperis staff 972M Sep 15 12:42 sub1-test_nogenes_overlappingtes.bam
-rw-r--r-- 1 frederikarentzeperis staff 0B Sep 15 10:26 sub1-test_selectedtes.bed
-rw-r--r-- 1 frederikarentzeperis staff 0B Sep 15 10:26 sub1-test_nogenes_overlappingtes.bed

head /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed
chr1 3000001 3002128 chr1|3000001|3002128|L1_Mus3:L1:LINE|10.5|- 10.5 -
chr1 3003153 3003994 chr1|3003153|3003994|L1Md_F:L1:LINE|26.8|- 26.8 -
chr1 3003994 3004054 chr1|3003994|3004054|L1_Mus3:L1:LINE|27.9|- 27.9 -
chr1 3004041 3004206 chr1|3004041|3004206|L1_Rod:L1:LINE|19.9|+ 19.9 +
chr1 3004271 3005001 chr1|3004271|3005001|L1_Rod:L1:LINE|19.9|+ 19.9 +
chr1 3005002 3005439 chr1|3005002|3005439|L1_Rod:L1:LINE|22.1|+ 22.1 +
chr1 3005461 3005548 chr1|3005461|3005548|Lx9:L1:LINE|22.6|+ 22.6 +
chr1 3005571 3006764 chr1|3005571|3006764|Lx9:L1:LINE|22.6|+ 22.6 +
chr1 3007015 3007268 chr1|3007015|3007268|L1M4:L1:LINE|28.9|- 28.9 -
chr1 3008117 3008483 chr1|3008117|3008483|L1_Mur2:L1:LINE|14.8|- 14.8 -

samtools view /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam|head
A01185:247:HY53WDRXY:2:1101:6723:12383 16 chr1 3016338 0 92M * 0 0 GGAGTTCCTTAATCCACTTAGATTTGACCTTAGTACAAGGAGATAGGAATGGATCAATTCGCATTCTTCTACATGATAACAGCCAGTTGTGC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:5 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:ATGATCGCACCGGAAA CY:Z:FFFFFFFFFFFFFFFF CB:Z:ATGATCGCACCGGAAA-1 UR:Z:TTGCATTATCTC UY:Z:FFFFFFFFFFFF UB:Z:TTGCATTATCTC
A01185:247:HY53WDRXY:2:2236:9173:35728 16 chr1 3016344 0 92M * 0 0 CCTTAATCCACTTAGATTTGACCTTAGTACAAGGAGATAGGAATGGATCAATTCGCATTCTTCTACATGATAACAGCCAGTTGTGCCAGCAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:6 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:CCTGTTGTCCTTCTAA CY:Z:FFFFFFFF:FFFFFFF CB:Z:CCTGTTGTCCTTCTAA-1 UR:Z:CTCTCATTAACT UY:Z:FFFFFFFFFFFF UB:Z:CTCTCATTAACT
A01185:247:HY53WDRXY:1:1169:14163:18349 16 chr1 3018673 1 92M * 0 0 TTTGTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTG FFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:1 RE:A:I xf:i:0 CR:Z:AAAGAACGTTGACTAC CY:Z:FFFFFFFFFFFF:FFF CB:Z:AAAGAACGTTGACTAC-1 UR:Z:TATTACTTAGCT UY:Z:FFFFFFFFF,:F UB:Z:TATTACTTAGCT
A01185:247:HY53WDRXY:1:1256:20636:20541 16 chr1 3018673 1 92M * 0 0 TTTGTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:1 RE:A:I xf:i:0 CR:Z:TTAGGGTGTTCGAACT CY:Z:FFFFFFFFFFFFFFFF CB:Z:TTAGGGTGTTCGAACT-1 UR:Z:CTCTTTTTGGGT UY:Z:FFFFFFFFFFFF UB:Z:CTCTTTTTGGGT
A01185:247:HY53WDRXY:2:2102:26793:1799 16 chr1 3018673 1 92M * 0 0 TTTGTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTG FFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:GACTTCCTCGCAATGT CY:Z:FFFFFF:FFFFFF::F CB:Z:GACTTCCTCGCAATGT-1 UR:Z:TGGGCCTATCCT UY:Z:FFFFFFFFFFFF UB:Z:TGGGCCTATCCT
A01185:247:HY53WDRXY:1:1254:26069:24283 16 chr1 3018673 1 92M * 0 0 TTTGTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTG FFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:1 RE:A:I xf:i:0 CR:Z:TGATGCAAGACAGCTG CY:Z:FFFFFFFFFFFFFFFF CB:Z:TGATGCAAGACAGCTG-1 UR:Z:TAAAGTACTCAC UY:Z::FFFFFFFFFFF UB:Z:TAAAGTACTCAC
A01185:247:HY53WDRXY:2:1233:19750:10238 16 chr1 3018676 1 92M * 0 0 GTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTGTTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:CGTAATGAGAGTCACG CY:Z:FFFFFFFFFFFFFFFF CB:Z:CGTAATGAGAGTCACG-1 UR:Z:AGCGTTTTTGCA UY:Z:FFF:FFFFFFFF UB:Z:AGCGTTTTTGCA
A01185:247:HY53WDRXY:2:1266:5141:17315 16 chr1 3018678 1 92M * 0 0 TTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTGTTTCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:GGACGTCCACACGCCA CY:Z:FFFFFFFFFFFFFFFF CB:Z:GGACGTCCACACGCCA-1 UR:Z:CACTCTTCGAAC UY:Z:FFFFFF:FFFFF UB:Z:CACTCTTCGAAC
A01185:247:HY53WDRXY:2:1267:3125:36558 16 chr1 3018678 1 92M * 0 0 TTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTGTTTCC FFFFFFF:FFFFFFFF:FFFFF:FFFFFFFFFFFF,FFFFFFFFFFFFFFFFF:FFFFFF:FFFFFFFFF,FFFFFFFFFFFFFFFF:FFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:ACGTACACATACTGTG CY:Z:FFFFFFFFFFFFFFFF CB:Z:ACGTACACATACTGTG-1 UR:Z:CTCTCCCCTGCT UY:Z:FFFFFFFFFFFF UB:Z:CTCTCCCCTGCT
A01185:247:HY53WDRXY:1:2106:10294:13996 16 chr1 3018678 1 92M * 0 0 TTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTGTTTCC FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:1 RE:A:I xf:i:0 CR:Z:GTTGCTCTCAAGAAAC CY:Z:FFFFFFFFFFFFFFFF CB:Z:GTTGCTCTCAAGAAAC-1 UR:Z:GACACATACACA UY:Z:FFFFFFFFFFFF UB:Z:GACACATACACA

@bvaldebenitom
Owner

Looks like input files are in order, and most of the results are being generated. However, the files

sub1-test_selectedtes.bed
sub1-test_nogenes_overlappingtes.bed

appear to have been created (at 10:26) before the file
sub1-test_nogenes_overlappingtes.bam
(at 12:42). This is not the standard behaviour: the BAM file is created first, and the BED files above are then derived from it.

This results in:

sub1-test_nogenes_overlappingtes.bed exists in output folder. Skipping this step
sub1-test_selectedtes.bed exists in output folder. Skipping this step

Since these files are empty, no TEs are annotated. Can you check the output of grep -c sub1-test_allcounts.txt? If the result is 0, it will confirm this issue.
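
Note that grep expects a search pattern before the file name. Given a single argument, it treats that argument as the pattern and waits for input on standard input, so the command as written would not return on its own. A minimal Python sketch of the same check follows. The helper name count_rows_with_marker is hypothetical, and the marker string that identifies TE rows (for example "SoloTE") is an assumption to be replaced with whatever prefix the TE entries actually carry.

```python
def count_rows_with_marker(path: str, marker: str) -> int:
    """Count lines whose first whitespace-separated field contains `marker`."""
    hits = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            fields = line.split(None, 1)
            if fields and marker in fields[0]:
                hits += 1
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python count_rows.py sub1-test_allcounts.txt MARKER
    if len(sys.argv) == 3:
        print(count_rows_with_marker(sys.argv[1], sys.argv[2]))
```

Unlike grep -c, this only looks at the first field of each line, so barcodes that happen to contain the marker are not counted. The file is streamed line by line, so it should cope with a counts file of several hundred megabytes.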

Did you experience any interruption during the pipeline execution? Could you try deleting the temp directory, and running the pipeline again?
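
Before rerunning, leftover intermediates of this kind can be spotted with a small check. The sketch below flags BED files in the temp directory that are empty or older than the filtered BAM they should be derived from. The file name suffixes are taken from the log above; the function name find_stale_beds is a hypothetical helper, not part of SoloTE.

```python
from pathlib import Path


def find_stale_beds(temp_dir: str, prefix: str) -> list[str]:
    """List BED intermediates that are empty or older than the filtered BAM."""
    tmp = Path(temp_dir)
    bam = tmp / f"{prefix}_nogenes_overlappingtes.bam"
    suspects = []
    for suffix in ("_nogenes_overlappingtes.bed", "_selectedtes.bed"):
        bed = tmp / f"{prefix}{suffix}"
        if not bed.exists():
            continue  # nothing to skip over; the pipeline would create it
        info = bed.stat()
        older_than_bam = bam.exists() and info.st_mtime < bam.stat().st_mtime
        if info.st_size == 0 or older_than_bam:
            suspects.append(bed.name)
    return suspects


if __name__ == "__main__":
    import sys

    # Usage: python find_stale.py /path/to/sub1-test_SoloTE_temp sub1-test
    if len(sys.argv) == 3:
        for name in find_stale_beds(sys.argv[1], sys.argv[2]):
            print("stale or empty:", name)
```

Any file it reports is a candidate for deletion before the next run, since the pipeline skips steps whose output files already exist.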

@frentzeperis
Author

I had no interruptions. I tried running everything again and regenerated the initial BED file because I was wondering the same thing; it seemed weird. I am still getting errors. I tried to run grep, but it just runs forever (it has been running for close to an hour and is still going). Here is the output of my second run.

Code:
python SoloTE/SoloTE_pipeline.py --threads 1 --bam MIME26_AL1-3_0_v1/possorted_genome_bam.bam --teannotation rmsk.bed --outputprefix TE --outputdir MIME26_AL1-3_0_v1

Output:
SoloTE started at 15:51:48
[OK] samtools found!
[OK] bedtools found!
SoloTE v1.09 started!
SoloTE Home directory /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE
SoloTE executed from /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te
Results will be stored in /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1
Input BAM file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam
Input TE BED file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed
Currently working in temporary directory: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/TE_SoloTE_temp
samtools view -@ 1 -O BAM -o TE_nogenes_overlappingtes.bam -L /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed -e '(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")' /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam
samtools index TE_nogenes_overlappingtes.bam
bedtools bamtobed -i TE_nogenes_overlappingtes.bam -split > TE_nogenes_overlappingtes.bed
bedtools intersect -a /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed -b TE_nogenes_overlappingtes.bed -u > TE_selectedtes.bed
python /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/annotateBAM.py TE_nogenes_overlappingtes.bam TE_selectedtes.bed temp_annotated_te.bam 1
samtools sort -@ 1 -O BAM -o TE_teannotated.bam temp_annotated_te.bam
[bam_sort_core] merging from 17 files and 1 in-memory blocks...
samtools merge --threads 1 -o - /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam TE_teannotated.bam|samtools view -@ 1 -O BAM -o TE_final.bam -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-" && [GN]!="-"' --keep-tag GN,CB,UB
samtools index TE_final.bam
Counts for chromosome chr1 are being generated in process: 22550
Counts for chromosome chr10 are being generated in process: 22550
Counts for chromosome chr11 are being generated in process: 22550
Counts for chromosome chr12 are being generated in process: 22550
Counts for chromosome chr13 are being generated in process: 22550
Counts for chromosome chr14 are being generated in process: 22550
Counts for chromosome chr15 are being generated in process: 22550
Counts for chromosome chr16 are being generated in process: 22550
Counts for chromosome chr17 are being generated in process: 22550
Counts for chromosome chr18 are being generated in process: 22550
Counts for chromosome chr19 are being generated in process: 22550
Counts for chromosome chr2 are being generated in process: 22550
Counts for chromosome chr3 are being generated in process: 22550
Counts for chromosome chr4 are being generated in process: 22550
Counts for chromosome chr5 are being generated in process: 22550
Counts for chromosome chr6 are being generated in process: 22550
Counts for chromosome chr7 are being generated in process: 22550
Counts for chromosome chr8 are being generated in process: 22550
Counts for chromosome chr9 are being generated in process: 22550
Counts for chromosome chrM are being generated in process: 22550
Counts for chromosome chrX are being generated in process: 22550
Counts for chromosome chrY are being generated in process: 22550
Counts for chromosome GL456233.1 are being generated in process: 22550
Counts for chromosome GL456211.1 are being generated in process: 22550
Counts for chromosome GL456350.1 are being generated in process: 22550
Counts for chromosome JH584293.1 are being generated in process: 22550
Counts for chromosome GL456221.1 are being generated in process: 22550
Counts for chromosome JH584297.1 are being generated in process: 22550
Counts for chromosome JH584296.1 are being generated in process: 22550
Counts for chromosome JH584294.1 are being generated in process: 22550
Counts for chromosome JH584298.1 are being generated in process: 22550
Counts for chromosome GL456210.1 are being generated in process: 22550
Counts for chromosome GL456212.1 are being generated in process: 22550
Counts for chromosome JH584304.1 are being generated in process: 22550
Counts for chromosome GL456216.1 are being generated in process: 22550
Counts for chromosome JH584295.1 are being generated in process: 22550
Creating final results directory
/Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/TE_SoloTE_output was created
Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_legacytes.txt TE_legacytes_MATRIX
dyld[23347]: Library not loaded: @rpath/libreadline.6.2.dylib
Referenced from: <185433D7-8B40-31AA-8BD9-465D23C57257> /Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/libR.dylib
Reason: tried: '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/bin/exec/../../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/bin/exec/../../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/libreadline.6.2.dylib' (no such file)
Traceback (most recent call last):
File "/Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/SoloTE_pipeline.py", line 272, in <module>
os.rename(mtx_outname,finaldir+"/"+mtx_outname)
FileNotFoundError: [Errno 2] No such file or directory: 'TE_legacytes_MATRIX' -> '/Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/TE_SoloTE_output/TE_legacytes_MATRIX'
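
In this log the FileNotFoundError is a downstream symptom: Rscript aborted while loading libreadline, so TE_legacytes_MATRIX was never written and the later rename had nothing to move. A minimal sketch of how an external step can be checked before its output is used is shown below. The wrapper run_step is a hypothetical helper and is not how SoloTE_pipeline.py is actually written.

```python
import os
import subprocess


def run_step(cmd: list[str], expected_output: str) -> None:
    """Run one external command and fail early if it errors or leaves no output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # A process killed by a signal (such as a dyld abort) has a negative code.
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}:\n{result.stderr.strip()}"
        )
    if not os.path.exists(expected_output):
        raise RuntimeError(f"{cmd[0]} finished but {expected_output} was not created")
```

Called as run_step(["Rscript", "generate_mtx.R", "TE_legacytes.txt", "TE_legacytes_MATRIX"], "TE_legacytes_MATRIX"), a check like this would surface the R loading failure itself rather than the missing directory.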

@bvaldebenitom
Owner

It seems there is an error with R. Can you run the following commands?

R --version
conda env export > fk_issue31_environment.yml

The second command should create the file fk_issue31_environment.yml. Please send it to me so I can inspect it further. Additionally, if you are able to share the _allcounts.txt file generated now, that would also be helpful.

@frentzeperis
Author

frentzeperis commented Sep 16, 2023

When I type R --version I am getting the following:
Library not loaded: @rpath/libreadline.6.2.dylib
Referenced from: <185433D7-8B40-31AA-8BD9-465D23C57257> /Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/libR.dylib
Reason: tried: '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/bin/exec/../../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/bin/exec/../../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/libreadline.6.2.dylib' (no such file)
zsh: abort R --version

zsh: no matches found: dyld[27033]:

Here is the fk_issue31_environment.yml
fk_issue31_environment.yml.zip

The _allcounts file is too big to upload even after compression (1.52 GB before and 235 MB after compression).

@frentzeperis
Author

I made another conda environment and reinstalled everything; I am not sure what broke R in the last environment. I think it ran this time. Thanks for helping me.

code:
python SoloTE/SoloTE_pipeline.py --threads 5 --bam MIME26_AL1-3_0_v1/possorted_genome_bam.bam --teannotation rmsk.bed --outputprefix TE --outputdir MIME26_AL1-3_0_v1

output:
SoloTE started at 20:37:43
[OK] samtools found!
[OK] bedtools found!
SoloTE v1.09 started!
SoloTE Home directory /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE
SoloTE executed from /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te
Results will be stored in /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1
Input BAM file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam
Input TE BED file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed
Currently working in temporary directory: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/TE_SoloTE_temp
samtools view -@ 5 -O BAM -o TE_nogenes_overlappingtes.bam -L /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed -e '(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")' /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam
samtools index TE_nogenes_overlappingtes.bam
bedtools bamtobed -i TE_nogenes_overlappingtes.bam -split > TE_nogenes_overlappingtes.bed
bedtools intersect -a /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed -b TE_nogenes_overlappingtes.bed -u > TE_selectedtes.bed
python /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/annotateBAM.py TE_nogenes_overlappingtes.bam TE_selectedtes.bed temp_annotated_te.bam 1
samtools sort -@ 5 -O BAM -o TE_teannotated.bam temp_annotated_te.bam
[bam_sort_core] merging from 15 files and 5 in-memory blocks...
samtools merge --threads 5 -o - /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam TE_teannotated.bam|samtools view -@ 5 -O BAM -o TE_final.bam -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-" && [GN]!="-"' --keep-tag GN,CB,UB
samtools index TE_final.bam
Counts for chromosome chr1 are being generated in process: 30695
Counts for chromosome chr10 are being generated in process: 30693
Counts for chromosome chr11 are being generated in process: 30694
Counts for chromosome chr12 are being generated in process: 30692
Counts for chromosome chr13 are being generated in process: 30696
Counts for chromosome chr14 are being generated in process: 30696
Counts for chromosome chr15 are being generated in process: 30692
Counts for chromosome chr16 are being generated in process: 30695
Counts for chromosome chr17 are being generated in process: 30693
Counts for chromosome chr18 are being generated in process: 30696
Counts for chromosome chr19 are being generated in process: 30695
Counts for chromosome chr2 are being generated in process: 30692
Counts for chromosome chr3 are being generated in process: 30694
Counts for chromosome chr4 are being generated in process: 30696
Counts for chromosome chr5 are being generated in process: 30693
Counts for chromosome chr6 are being generated in process: 30695
Counts for chromosome chr7 are being generated in process: 30692
Counts for chromosome chr8 are being generated in process: 30696
Counts for chromosome chr9 are being generated in process: 30694
Counts for chromosome chrM are being generated in process: 30693
Counts for chromosome chrX are being generated in process: 30695
Counts for chromosome chrY are being generated in process: 30696
Counts for chromosome GL456233.1 are being generated in process: 30696
Counts for chromosome GL456211.1 are being generated in process: 30696
Counts for chromosome GL456350.1 are being generated in process: 30696
Counts for chromosome JH584293.1 are being generated in process: 30696
Counts for chromosome GL456221.1 are being generated in process: 30696
Counts for chromosome JH584297.1 are being generated in process: 30696
Counts for chromosome JH584296.1 are being generated in process: 30696
Counts for chromosome JH584294.1 are being generated in process: 30696
Counts for chromosome JH584298.1 are being generated in process: 30696
Counts for chromosome GL456210.1 are being generated in process: 30696
Counts for chromosome GL456212.1 are being generated in process: 30696
Counts for chromosome JH584304.1 are being generated in process: 30696
Counts for chromosome GL456216.1 are being generated in process: 30696
Counts for chromosome JH584295.1 are being generated in process: 30696
Creating final results directory
/Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/TE_SoloTE_output was created
Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_legacytes.txt TE_legacytes_MATRIX
Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_locustes.txt TE_locustes_MATRIX
Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_classtes.txt TE_classtes_MATRIX
Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_familytes.txt TE_familytes_MATRIX
Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_subfamilytes.txt TE_subfamilytes_MATRIX
A total of 110474508 UMIs are in the final matrix.
Of these,
91659123 (82.969%) correspond to genes.
and 18815385 (17.031%) correspond to TEs.
TE detected UMIs are distributed as follows:
Locus-specific TEs: 16194148 UMIs (86.069%).
Subfamily TEs: 2621237 (13.931%).
Creating TE_SoloTE.stats TE statistics file
Finished creating TE_SoloTE.stats
SoloTE finished with MIME26_AL1-3_0_v1/possorted_genome_bam.bam
SoloTE finished at 21:59:56
SoloTE total running time: 1:22:12.853309
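
The figures in this summary are internally consistent, which can be confirmed with a few lines of arithmetic. The numbers below are copied from the log above; the variable names and the pct helper are illustrative only.

```python
total_umis = 110474508
gene_umis = 91659123
te_umis = 18815385
locus_umis = 16194148
subfamily_umis = 2621237


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to 3 decimals."""
    return round(100 * part / whole, 3)


# Genes and TEs should add up to the total; locus and subfamily to the TE count.
assert gene_umis + te_umis == total_umis
assert locus_umis + subfamily_umis == te_umis

print("genes:", pct(gene_umis, total_umis), "TEs:", pct(te_umis, total_umis))
print("locus:", pct(locus_umis, te_umis), "subfamily:", pct(subfamily_umis, te_umis))
```

Note that the locus-specific and subfamily percentages are fractions of the TE UMIs, not of the full matrix.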

@frentzeperis
Author

Hm, actually, before closing this: there are 5 output folders, each with the barcodes, features, and matrix files. They are called:
TE_classtes_MATRIX, TE_familytes_MATRIX, TE_legacytes_MATRIX, TE_locustes_MATRIX, and TE_subfamilytes_MATRIX.

Is the intended output in one of these? I thought we were just meant to get one output with the 3 file types.

@bvaldebenitom
Owner

@frentzeperis :

Thanks for sharing the update. Sometimes setting up R within one conda environment breaks another installation within a different environment.

It looks like it now finished successfully. And yes, this is the new intended output as of version 1.09. This was done to seamlessly generate the matrices corresponding to different ways of analyzing TE data. The description of each one is as follows:

  • classtes_MATRIX: Matrix containing TE expression summarized at the class level (LTR, LINE, SINE, DNA).
  • familytes_MATRIX: Matrix with TE expression summarized at the family level.
  • subfamilytes_MATRIX: Matrix with TE expression summarized at the subfamily level.
  • locustes_MATRIX: Matrix containing TE expression at the locus level (using only uniquely mapped reads).
  • legacytes_MATRIX: Matrix containing TE expression at the locus level (again, using only uniquely mapped reads), and at the subfamily level for reads not mapping uniquely.

Overall, the class, family, and subfamily matrices could be used to get an idea of global changes in TE expression. For example, the tool scTE reports results only at the subfamily level, and here we provide users with a similar output. On the other hand, the locustes and legacytes matrices contain locus-specific expression, which might be helpful for correlation analyses. A more detailed explanation of the difference between locus and legacy can be found here. Previously, the default (and only) output was the legacy TEs.
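
To make the difference between the summary levels concrete, the sketch below aggregates a few made-up per-locus counts at the subfamily, family, and class level. It assumes TE names follow the layout visible in the mm10_rmsk.bed excerpt earlier in this thread (chr|start|end|subfamily:family:class|score|strand). The counts and function names are hypothetical, and the sketch ignores read-mapping uniqueness, which SoloTE's own matrices depend on.

```python
from collections import Counter


def parse_te_name(name: str) -> dict:
    """Split 'chr|start|end|subfamily:family:class|score|strand' into its parts."""
    chrom, start, end, annotation, _score, strand = name.split("|")
    subfamily, family, te_class = annotation.split(":")
    return {
        "locus": f"{chrom}:{start}-{end}",
        "subfamily": subfamily,
        "family": family,
        "class": te_class,
        "strand": strand,
    }


def summarize(counts: dict, level: str) -> Counter:
    """Sum per-locus counts at the 'locus', 'subfamily', 'family', or 'class' level."""
    totals = Counter()
    for name, value in counts.items():
        totals[parse_te_name(name)[level]] += value
    return totals


# Made-up counts keyed by names copied from the BED excerpt.
toy_counts = {
    "chr1|3000001|3002128|L1_Mus3:L1:LINE|10.5|-": 3,
    "chr1|3003994|3004054|L1_Mus3:L1:LINE|27.9|-": 2,
    "chr1|3005461|3005548|Lx9:L1:LINE|22.6|+": 4,
}

for level in ("locus", "subfamily", "family", "class"):
    print(level, dict(summarize(toy_counts, level)))
```

The same three loci collapse into two subfamilies, one family, and one class, which is why the coarser matrices suit global comparisons while the locus-level ones keep positional detail.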
