Custom reference for non-human, non-mouse genome #3

hswhitbeck · 2021-03-19T20:12:46Z

Hi, the ReadMe file says "If you want to use your customs reference, you can use the -gene -te options:". We understood this as being able to use your code on other genomes than the mouse and the human. We tried this command to build the index:
scTE_build -te /path/to/hsal_v8.5_filtered_unique_ids.bed -gene /path/to/hsal_v8.5_genes_update16.gtf -o /path/to/scTE_build_1.idx
and we got the following error message:
scTE_build: error: the following arguments are required: -g/--genome
In the ReadMe file example the -g argument is not supplied for building a custom index. Why is it required? Any tips are appreciated. Thank you.

jphe · 2021-03-20T13:08:14Z

We have updated scTE to include more species' genomes, and -g is now optional if the bed/gtf files are given.

bsierieb1 · 2021-03-22T21:54:42Z

Thanks for your reply @jphe
We have downloaded the updated version of scTE and now get another error:
ERROR : Counting genome other not supported
We work with an exotic non-model species of insects. Could you please help us generate a custom index for our genome? Would it be possible to share the genome with you so that you could include it in the next update? If this is too much work for you, maybe you could guide us through the process and let us do it ourselves?
Thanks a lot!

jphe · 2021-03-23T01:46:37Z

For non-model species, you need to make sure there are well-annotated files for both TEs and genes.

GitHub has a strict file size limit of 100 MB, and genome indices are usually much larger than that, so we cannot upload them to GitHub.

If you have a web-accessible FTP server or any other web-accessible tool, you can share the annotation files with us; we will then build the indices and send them to you.

bsierieb1 · 2021-03-29T17:38:15Z

here are the genome and the annotation files.

thank you so much for your help!

bsierieb1 · 2021-03-29T23:10:25Z

P.S. you should be able to use the same link to upload the indices. please let us know if there is any issue!

jphe · 2021-03-30T05:07:52Z

The FTP location contains only the gtf file for genes, but scTE also needs an annotation file for TEs.

The gene annotation gtf file seems to be derived from a transcript assembly. We do not recommend using such a file, because it contains many TE-derived transcripts. This leads to underestimation of TE expression when you quantify with scTE, since scTE assigns reads to genes/transcripts first, and only then to TEs.
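
To make the gene-first point concrete, here is a toy sketch of that priority rule. It is not scTE's actual code: the function names, the interval layout, and all coordinates are hypothetical, chosen only to show how an assembled transcript that spans a TE copy can absorb reads that would otherwise be counted for the TE.

```python
# Toy illustration only: NOT scTE's implementation.
# Intervals are (start, end, name) tuples with a half-open end.

def first_overlap(pos, intervals):
    """Return the name of the first interval containing pos, or None."""
    for start, end, name in intervals:
        if start <= pos < end:
            return name
    return None


def assign_read(pos, genes, tes):
    """Gene-first assignment: TEs are only consulted if no gene claims the read."""
    gene = first_overlap(pos, genes)
    if gene is not None:
        return ("gene", gene)
    te = first_overlap(pos, tes)
    if te is not None:
        return ("TE", te)
    return (None, None)


# Hypothetical annotation: one TE copy, one curated gene elsewhere,
# and one assembled transcript that happens to span the TE copy.
tes = [(1000, 1300, "L1_copy")]
curated_genes = [(5000, 8000, "geneA")]
assembled_genes = curated_genes + [(900, 1400, "assembled_tx1")]

read_pos = 1100
print(assign_read(read_pos, curated_genes, tes))    # ('TE', 'L1_copy')
print(assign_read(read_pos, assembled_genes, tes))  # ('gene', 'assembled_tx1')
```

With the curated annotation the read is counted for the TE; with the assembly-derived annotation the same read is counted for the transcript, so the TE count drops.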

Besides, transcript assembly usually depends heavily on bulk RNA-seq data, whereas developmental and disease processes are highly heterogeneous. Transcripts from rare cell types are often masked in bulk RNA-seq, so a transcript assembly built from bulk RNA-seq data may be unreliable for analyzing rare cell types in single-cell data.

If you want to use the assembled transcripts, you could try the strategy of this paper, which quantifies the expression of TE-derived transcripts: https://genome.cshlp.org/content/early/2020/12/21/gr.265173.120.abstract

bsierieb1 · 2021-03-30T15:54:07Z

sorry, i accidentally copied a link to one of the files instead of the link to the entire drive folder. here is the correct link.

the gene annotations file is not derived from a transcriptome assembly, but i wonder what made you think that? the gene annotations were generated by the NCBI annotation pipeline and further updated by incorporating additional RNA-seq data. the TE annotations are simply the output of RepeatMasker (edited to remove some classes of short features).

bsierieb1 · 2021-04-06T14:48:51Z

hi @jphe, do you think you have everything you need? thank you for offering help!

jphe · 2021-04-23T08:40:06Z

Sorry for the late reply. We could not interpret the file properly: we don't know what each column means, as it does not seem to be a standard gtf file. Basically, you need to convert it into canonical gtf format. Alternatively, check whether Ensembl has a gtf file for the genome; Ensembl files should be in canonical gtf format.
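
For anyone unsure whether their file counts as "canonical", a quick sanity check might look like the sketch below. It assumes the usual gtf layout (nine tab-separated columns, integer start and end in columns 4 and 5, strand in column 7, attributes such as gene_id in column 9). The function name check_gtf is hypothetical, and the thread does not say which attributes scTE itself reads, so passing this check is not a guarantee that scTE_build will accept the file.

```python
import io


def check_gtf(handle, max_problems=10):
    """Collect basic format problems from a gtf-like text stream."""
    problems = []
    for lineno, line in enumerate(handle, start=1):
        if len(problems) >= max_problems:
            break
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 9:
            problems.append(
                (lineno, f"expected 9 tab-separated columns, found {len(fields)}")
            )
            continue
        start, end, strand, attrs = fields[3], fields[4], fields[6], fields[8]
        if not (start.isdigit() and end.isdigit() and int(start) <= int(end)):
            problems.append((lineno, f"bad coordinates: start={start!r}, end={end!r}"))
        if strand not in {"+", "-", "."}:
            problems.append((lineno, f"bad strand: {strand!r}"))
        if "gene_id" not in attrs:
            problems.append((lineno, "no gene_id in the attributes column"))
    return problems


# Made-up two-line example: the second line uses spaces instead of tabs.
example = io.StringIO(
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
    '1 ensembl exon 300 400 . + . gene_id "g2";\n'
)
for lineno, message in check_gtf(example):
    print(f"line {lineno}: {message}")
```

To run it on a real file, pass an open text handle instead of the in-memory example, for instance `check_gtf(open("annotation.gtf"))`, where the file name is a placeholder.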

akui113 · 2021-07-16T01:42:26Z

@jphe
I also encountered the same problem, and the species is Macaca mulatta.
The gene annotation file was downloaded from http://ftp.ensembl.org/pub/release-104/gtf/macaca_mulatta/Macaca_mulatta.Mmul_10.104.gtf.gz,
and the RepeatMasker file was downloaded from http://hgdownload.soe.ucsc.edu/goldenPath/rheMac10/database/rmsk.txt.gz .

Then I converted the RepeatMasker file into a six-column bed file with awk 'BEGIN{FS=OFS="\t"}{print $6,$7,$8,$11,$3,$10}' rmsk.txt > mmul10rmsk.bed and made sure the chromosome names are consistent with the gene annotation file.
Lastly, I built the index with scTE_build -te mmul10rmsk.bed -gene Macaca_mulatta.Mmul_10.104.gtf -o Mmul_10scTE.idx.
However, I got this error: ERROR : Counting genome other not supported.

Any tips are appreciated!
Thank you for your generous help!
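
Since UCSC files typically name chromosomes like chr1 while Ensembl files typically use 1, the consistency check mentioned above can be scripted. The following is a minimal sketch, with hypothetical helper names, that compares the first column of the bed and gtf files. It only rules out a naming mismatch; the thread does not establish what actually triggers the "Counting genome other not supported" error.

```python
def chrom_names(lines):
    """Return the set of first-column values, skipping comments and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0].strip())
    return names


def compare_chroms(bed_lines, gtf_lines):
    """Report which chromosome names are shared and which appear in one file only."""
    bed = chrom_names(bed_lines)
    gtf = chrom_names(gtf_lines)
    return {
        "shared": sorted(bed & gtf),
        "bed_only": sorted(bed - gtf),
        "gtf_only": sorted(gtf - bed),
    }


# Made-up lines: a UCSC-style bed record and an Ensembl-style gtf record.
bed_example = ["chr1\t100\t400\tL1_example\t150\t+\n"]
gtf_example = ['1\tensembl\tgene\t50\t900\t.\t+\t.\tgene_id "g1";\n']
print(compare_chroms(bed_example, gtf_example))
# {'shared': [], 'bed_only': ['chr1'], 'gtf_only': ['1']}
```

On real data, the two arguments can be open file handles, for example `compare_chroms(open("mmul10rmsk.bed"), open("Macaca_mulatta.Mmul_10.104.gtf"))`. An empty "shared" list means the two annotations cannot be matched up by name as they stand.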

antecede · 2023-08-23T13:05:30Z

Hello to the team of authors, and thank you for your beautiful work! Could you please write a step-by-step guide so that others can create their own custom references for non-model species? That way we could get our results in a timely manner while reducing your workload. Thanks again!
Best wishes!

antecede · 2023-08-25T02:55:35Z

If the research is on a non-model species, there is no canonical gtf in Ensembl. If convenient, please provide an example of a non-Ensembl gtf, or explain how to fill in the missing column content, so that we can build a custom reference.

BR

jphe mentioned this issue Apr 7, 2021

reference hg19 #7

Closed

akui113 mentioned this issue Jul 17, 2021

scTE building customs reference index ERROR ! #20

Closed

jphe closed this as completed Jul 19, 2021
