Custom reference for non-human, non-mouse genome #3
Comments
We have updated scTE to include genomes for more species, and the -g option is now optional if the bed/gtf files are given.
Thanks for your reply @jphe
For non-model species, you need to make sure the genome has well-annotated files for both TEs and genes. GitHub has a strict file size limit of 100MB and genome indices are usually much bigger than that, so we cannot upload the genome indices to GitHub. If you have a web-accessible FTP server or any other web-accessible tool, you can share the annotation files with us, and we will build the indices and send them to you.
here are the genome and the annotation files. thank you so much for your help!
P.S. you should be able to use the same link to upload the indices. please let us know if there is any issue!
There is only the gtf file for genes on the ftp, while scTE also needs an annotation file for TEs. The gene annotation gtf file seems to be derived from a transcript assembly. We do not recommend such files, because they contain many TE-derived transcripts: since scTE assigns reads to genes/transcripts first and only then to TEs, this leads to underestimation of TE expression when you use scTE for quantification. Besides, transcript assembly usually depends heavily on bulk RNA-seq data, while development and disease processes are highly heterogeneous. Transcripts from rare cell types are often masked in bulk RNA-seq, which means a transcript assembly built from bulk RNA-seq data may be unreliable for analyzing rare cell types in single-cell data. If you want to use the assembled transcripts, maybe you can try the strategy of this paper, which quantifies the expression of TE-derived transcripts: https://genome.cshlp.org/content/early/2020/12/21/gr.265173.120.abstract
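To make the "genes first, then TEs" point concrete, here is a toy sketch of priority assignment. It is not scTE's actual code: the function names, interval coordinates, and feature names (`GeneA`, `TE1`, `TE_derived_tx`) are all made up for illustration, and reads are reduced to single positions.

```python
def overlaps(pos, interval):
    """True if a read position falls inside a half-open [start, end) interval."""
    start, end = interval
    return start <= pos < end


def assign_reads(read_positions, genes, tes):
    """Count reads per feature, trying gene intervals before TE intervals."""
    counts = {}
    for pos in read_positions:
        # First pass: any gene/transcript overlapping the read wins.
        hit = next((name for name, iv in genes.items() if overlaps(pos, iv)), None)
        # Second pass: TEs only see reads that no gene claimed.
        if hit is None:
            hit = next((name for name, iv in tes.items() if overlaps(pos, iv)), None)
        if hit is not None:
            counts[hit] = counts.get(hit, 0) + 1
    return counts


# Hypothetical annotations: one TE, and two alternative gene sets.
tes = {"TE1": (100, 200)}
reads = [110, 150, 190, 500]
curated = {"GeneA": (400, 600)}
assembled = {"GeneA": (400, 600), "TE_derived_tx": (90, 210)}

print(assign_reads(reads, curated, tes))
print(assign_reads(reads, assembled, tes))
```

With the curated gene set, the three reads inside `TE1` are counted for the TE. With the assembled set, the hypothetical TE-derived transcript covers the same region, so those reads are claimed in the gene pass and `TE1` receives none.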
sorry, i accidentally copied a link to one of the files instead of the link to the entire drive folder. here is the correct link. the gene annotations file is not derived from a transcriptome assembly, but i wonder what made you think that? the gene annotations were generated by the NCBI annotation pipeline and further updated by incorporating additional RNA-seq data. the TE annotations are simply the output of RepeatMasker (edited to remove some classes of short features).
hi @jphe, do you think you have everything you need? thank you for offering help!
Sorry for the late reply. We cannot interpret the file properly: we don't know what each column means, as it does not seem to be a classical gtf file. Basically, you need to convert it into canonical gtf format. Or you can check whether Ensembl has a gtf file for the genome; it should be in canonical gtf format in Ensembl.
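For anyone unsure whether their file is "canonical", a quick structural check can help before building an index. The sketch below only tests the generic GTF layout (nine tab-separated columns, integer coordinates, a valid strand, and `key "value";` attributes). The `gene_id` check is an assumption about what a downstream tool is likely to need; this thread does not say which attributes scTE actually reads. The function names are hypothetical.

```python
GTF_COLUMNS = 9


def check_gtf_line(line):
    """Return a list of problems found in one GTF data line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != GTF_COLUMNS:
        return [f"expected {GTF_COLUMNS} tab-separated columns, found {len(fields)}"]
    problems = []
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end are not integers")
    elif int(start) > int(end):
        problems.append("start is greater than end")
    if strand not in {"+", "-", "."}:
        problems.append(f"unexpected strand {strand!r}")
    # Assumption: a gene_id attribute in GTF style (gene_id "X";) is expected.
    if 'gene_id "' not in attributes:
        problems.append('attributes lack a gene_id "..." entry')
    return problems


def check_gtf(lines, max_report=10):
    """Scan lines, skipping comments and blanks; return (line_number, problem) pairs."""
    report = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        for problem in check_gtf_line(line):
            report.append((number, problem))
            if len(report) >= max_report:
                return report
    return report
```

Passing an open file handle to `check_gtf` returns the first few problems with their line numbers. GFF3-style attributes such as `ID=G1;Parent=T1` are flagged by the `gene_id` check, which is a common reason a file "looks like" a gtf but is not one.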
@jphe Then, I processed the RepeatMasker file into a six-column bed file with the command awk 'BEGIN{FS=OFS="\t"}{print $6,$7,$8,$11,$3,$10}' rmsk.txt > mmul10rmsk.bed and made sure the chromosome names are consistent with the gene annotation file. Any tips are appreciated!
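The same conversion, together with the chromosome-name check, can be sketched in Python. This assumes the input follows the UCSC rmsk table layout, where (as far as I know) columns 6, 7, 8, 11, 3 and 10 are genoName, genoStart, genoEnd, repName, milliDiv and strand; if your RepeatMasker output has a different layout, the indices need adjusting. The function names are hypothetical.

```python
def rmsk_to_bed6(lines):
    """Yield six-column BED lines, mirroring awk's print $6,$7,$8,$11,$3,$10."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip short or malformed rows
        # awk columns are 1-based; Python indices are 0-based.
        yield "\t".join(
            (fields[5], fields[6], fields[7], fields[10], fields[2], fields[9])
        )


def chromosomes(lines):
    """Collect first-column names from tab-separated lines, skipping comments and blanks."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def chromosomes_missing_from_genes(bed_lines, gtf_lines):
    """Return chromosome names used in the TE bed but absent from the gene gtf."""
    return sorted(chromosomes(bed_lines) - chromosomes(gtf_lines))
```

A non-empty result from `chromosomes_missing_from_genes` usually points to a naming mismatch such as `chr1` versus `1`, which is worth resolving before building the index.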
Hello to the team of authors, and thank you for your beautiful work! Could you please write a guide to the process so that others can create their own custom references for non-model species? That way we could get the results files in a timely manner while reducing your workload. Thanks again!
If the research is on a non-model species, there is no canonical gtf in Ensembl. If convenient, please provide a non-Ensembl gtf, or explain how to supplement the missing column content to obtain a custom reference. BR
Hi, the ReadMe file says "If you want to use your customs reference, you can use the -gene -te options:". We understood this as being able to use your code on other genomes than the mouse and the human. We tried this command to build the index:
scTE_build -te /path/to/hsal_v8.5_filtered_unique_ids.bed -gene /path/to/hsal_v8.5_genes_update16.gtf -o /path/to/scTE_build_1.idx
and we got the following error message:
scTE_build: error: the following arguments are required: -g/--genome
In the ReadMe file example the -g argument is not supplied for building a custom index. Why is it required? Any tips are appreciated. Thank you.