SoloTE indexer error #31
Comments
Hi @frentzeperis! Do you have a Additionally, can you share the output of the following commands?
The output of those commands will help further diagnose these issues. Thanks!
Thanks so much! The output of head sub1-test_allcounts.txt is below:
ls -lht /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/sub1-test_SoloTE_temp
head /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed
samtools view /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam | head
Looks like input files are in order, and most of the results are being generated. However, the files
appear to be created before the file This results in:
Since these files are empty, no TEs are annotated. Can you check the output of Did you experience any interruption during the pipeline execution? Could you try deleting the temp directory and running the pipeline again?
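A quick way to see which intermediate files in the temp directory are empty is a short script like the one below. This is only a sketch: the function name `find_empty_files` and the default `*.bed` pattern are illustrative choices, not part of SoloTE.

```python
from pathlib import Path


def find_empty_files(directory, pattern="*.bed"):
    """Return the sorted names of zero-byte files in `directory` matching `pattern`.

    `find_empty_files` is a hypothetical helper for this thread, not a SoloTE function.
    """
    return sorted(
        path.name
        for path in Path(directory).glob(pattern)
        if path.is_file() and path.stat().st_size == 0
    )
```

For example, `find_empty_files("sub1-test_SoloTE_temp")` lists any empty BED files in the temp directory from the log, and passing `pattern="*"` checks every file there.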
I had no interruptions. I regenerated the initial BED file and ran everything again because I was wondering the same thing; it seemed weird. I am still getting errors. I tried to run grep, but it just runs forever (it has been running for close to an hour and is still going). Here is the output of my second run. Code: Output:
It seems there is an error with R. Can you run the following commands?
The second command should create the file
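If it helps, the check on whether R (or samtools, or bedtools) actually starts in the active environment can also be scripted. The sketch below is an assumption-laden illustration, not something SoloTE ships: `tool_runs` is a hypothetical helper, and it only tests that the executable is on PATH and exits cleanly when called with `--version`.

```python
import shutil
import subprocess


def tool_runs(executable, version_flag="--version"):
    """Return True if `executable` is found on PATH and exits with status 0.

    `tool_runs` is a hypothetical helper; a broken installation (for example
    one that fails at load time) should show up here as False.
    """
    path = shutil.which(executable)
    if path is None:
        # Not on PATH in the current environment.
        return False
    try:
        result = subprocess.run(
            [path, version_flag],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `tool_runs("R")` inside the conda environment used for the pipeline gives a yes/no answer without having to read the shell error output.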
When I type R --version I am getting the following:
zsh: no matches found:
dyld[27033]:
Here is the fk_issue31_environment.yml
The _allcounts file is too big to upload even after compression (1.52 GB before and 235 MB after compression).
I made another conda environment and reinstalled everything; I am not sure what broke R in the last environment. I think it ran this time, thanks for helping me. Code: Output:
Hm, actually, before closing this: there are 5 output folders, all with the barcodes, features, and matrix files. They are called: Is the intended output in one of these? I thought we were just meant to get one output with the 3 file types.
Thanks for sharing the update. Sometimes setting up R within one conda environment breaks another installation within a different environment. It looks like it now finished successfully. And yes, this is the new intended output as of version 1.09. This was done in order to provide a seamless generation of the matrices corresponding to different ways of analyzing TE data. The description for each one is as follows:
Overall, the class, family, and subfamily matrices could be used to get an idea of global changes in TE expression. For example, the tool scTE reports results only at the subfamily level, and here we provide users with a similar output. On the other hand,
Good afternoon, I hope you are well!
I am using SoloTE 1.09 to analyze TE expression from a murine BAM file that was aligned to mm10. I am trying to get the code running for the first subject before moving on to the others. It runs for a while, but at the end I get a few errors and the temp files are all still there. I would greatly appreciate any help!
Code:
python SoloTE/SoloTE_pipeline.py --threads 1 --bam possorted_genome_bam.bam --teannotation mm10_rmsk.bed --outputprefix sub1-test --outputdir /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/
Output:
SoloTE started at 12:28:50
[OK] samtools found!
[OK] bedtools found!
SoloTE v1.09 started!
SoloTE Home directory /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/SoloTE
SoloTE executed from /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te
Results will be stored in /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te
Input BAM file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam
Input TE BED file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed
Currently working in temporary directory: /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/sub1-test_SoloTE_temp
samtools view -@ 1 -O BAM -o sub1-test_nogenes_overlappingtes.bam -L /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed -e '(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")' /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam
samtools index sub1-test_nogenes_overlappingtes.bam
sub1-test_nogenes_overlappingtes.bed exists in output folder. Skipping this step
sub1-test_selectedtes.bed exists in output folder. Skipping this step
python /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/SoloTE/annotateBAM.py sub1-test_nogenes_overlappingtes.bam sub1-test_selectedtes.bed temp_annotated_te.bam 1
samtools sort -@ 1 -O BAM -o sub1-test_teannotated.bam temp_annotated_te.bam
samtools merge --threads 1 -o - /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam sub1-test_teannotated.bam|samtools view -@ 1 -O BAM -o sub1-test_final.bam -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-" && [GN]!="-"' --keep-tag GN,CB,UB
samtools index sub1-test_final.bam
Counts for chromosome chr1 are being generated in process: 19911
Counts for chromosome chr10 are being generated in process: 19911
Counts for chromosome chr11 are being generated in process: 19911
Counts for chromosome chr12 are being generated in process: 19911
Counts for chromosome chr13 are being generated in process: 19911
Counts for chromosome chr14 are being generated in process: 19911
Counts for chromosome chr15 are being generated in process: 19911
Counts for chromosome chr16 are being generated in process: 19911
Counts for chromosome chr17 are being generated in process: 19911
Counts for chromosome chr18 are being generated in process: 19911
Counts for chromosome chr19 are being generated in process: 19911
Counts for chromosome chr2 are being generated in process: 19911
Counts for chromosome chr3 are being generated in process: 19911
Counts for chromosome chr4 are being generated in process: 19911
Counts for chromosome chr5 are being generated in process: 19911
Counts for chromosome chr6 are being generated in process: 19911
Counts for chromosome chr7 are being generated in process: 19911
Counts for chromosome chr8 are being generated in process: 19911
Counts for chromosome chr9 are being generated in process: 19911
Counts for chromosome chrM are being generated in process: 19911
Counts for chromosome chrX are being generated in process: 19911
Counts for chromosome chrY are being generated in process: 19911
Counts for chromosome GL456233.1 are being generated in process: 19911
Counts for chromosome GL456211.1 are being generated in process: 19911
Counts for chromosome GL456350.1 are being generated in process: 19911
Counts for chromosome JH584293.1 are being generated in process: 19911
Counts for chromosome GL456221.1 are being generated in process: 19911
Counts for chromosome JH584297.1 are being generated in process: 19911
Counts for chromosome JH584296.1 are being generated in process: 19911
Counts for chromosome JH584294.1 are being generated in process: 19911
Counts for chromosome JH584298.1 are being generated in process: 19911
Counts for chromosome GL456210.1 are being generated in process: 19911
Counts for chromosome GL456212.1 are being generated in process: 19911
Counts for chromosome JH584304.1 are being generated in process: 19911
Counts for chromosome GL456216.1 are being generated in process: 19911
Counts for chromosome JH584295.1 are being generated in process: 19911
Traceback (most recent call last):
  File "/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3800, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 4
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/SoloTE/SoloTE_pipeline.py", line 217, in <module>
    tecounts2.loc[tecounts2[4].isnull(),4] = tecounts2.loc[tecounts2[4].isnull(),1]
  File "/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/python3.9/site-packages/pandas/core/frame.py", line 3805, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    raise KeyError(key) from err
KeyError: 4
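The KeyError: 4 above means that tecounts2 has no column labeled 4 when line 217 runs. Assuming that table is built from a headerless, tab-delimited intermediate file so that columns are labeled by position (an assumption; the traceback does not show how tecounts2 is created), counting the fields per line in that file would show whether a fifth column ever appears. A minimal sketch, where `field_count_histogram` is a hypothetical helper and the delimiter is a guess:

```python
from collections import Counter


def field_count_histogram(path, delimiter="\t"):
    """Count how many non-empty lines have each number of delimited fields.

    Reads the file line by line, so it also works on very large count tables.
    The tab delimiter is an assumption about the file format.
    """
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                counts[len(line.split(delimiter))] += 1
    return counts
```

If no key in the returned histogram is 5 or larger (or the histogram is empty), a positionally labeled table read from that file would not contain a column 4.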