Understand the meaning of Espresso SJ output #37

Closed
junjiemama opened this issue Oct 3, 2023 · 2 comments

Comments

@junjiemama

Among the output files, there are a few types of splice junction files. Could you please explain the columns of the files listed below? If possible, could you also explain how these files are generated and what they could be used for? I am sorry for asking such basic questions, but I am trying to make full use of the ESPRESSO output. Thank you!

e.g. chr1_SJ_simplified_list
SJ_cluster 11475 0 0 chr1 3492124 3740774
11475 chr1:3492124:3740774:1 3492124 3740774 1 0 0 TBD TBD 2 no yes 1 0
SJ_cluster 11476 0 0 chr1 3492124 3740774
11476 chr1:3492124:3740774:1 3492124 3740774 1 0 0 TBD TBD 2 no yes 1 0
SJ_cluster 11477 0 0 chr1 3492124 3740774
11477 chr1:3492124:3740774:1 3492124 3740774 1 0 0 TBD TBD 2 no yes 1 0
SJ_cluster 11478 0 0 chr1 4562891 4563322
11478 chr1:4562891:4563322:1 4562891 4563322 1 1 1 CT AC 2 yes yes 1 0
SJ_cluster 11478 1 1 chr1 4562891 4563994
11478 chr1:4562891:4563994:1 4562891 4563994 1 0 0 TBD TBD 2 no yes 1 1

SJ_group_all.fa

>chrUn_JH584304v1:7010:14083:0 SJclst:0: group:0:
AGGTTCCGAATAGCTGAGCATCATGATACGAAGCAGAAGATGTGCCAAGC
>chrUn_JH584304v1:19433:20156:1 SJclst:1: group:1:
GGGAGTGCAGCCCGGGGGTCTGGGATGTGTGGCTTTGAATGATGTTGATG
>chrUn_JH584304v1:19345:20219:0 SJclst:0: group:1:
CAGGGCCCTGAGCCTCCAGCTGCAGGGTTGGCTGCGATGGCAAGAACAGC
>chrUn_JH584304v1:20376:24796:0 SJclst:2: group:1:
TGCAGGGTGAAGAGATGGCAGAATGAGATGGCTGTACAATTCCACCATGG
>chrUn_JH584304v1:24958:26983:0 SJclst:3: group:1:
AGGGCCTTTACACACTGGAAGCACTACATGTTGCTACAGGCAGAAGAGGC

Then in each sample folder (if I have multiple samples), there is
sj.list
1 chr12:72831310:72833445 chr12 72831310 72833445 1 1 m64060_200922_102352/3/ccs, m64060_200922_102352/3/ccs,
1 chr12:72837515:72839551 chr12 72837515 72839551 1 1 m64060_200922_102352/3/ccs, m64060_200922_102352/3/ccs,
1 chr12:72808405:72830456 chr12 72808405 72830456 1 1 m64060_200922_102352/3/ccs, m64060_200922_102352/3/ccs,
1 chr12:72839609:72840485 chr12 72839609 72840485 1 1 m64060_200922_102352/3/ccs, m64060_200922_102352/3/ccs,
1 chr12:72833563:72837406 chr12 72833563 72837406 1 1 m64060_200922_102352/3/ccs, m64060_200922_102352/3/ccs,

@EricKutschera
Contributor

Those files are intended only as intermediate files for ESPRESSO's own use, but if you find them useful, that's great.

{chr}_SJ_simplified_list is written here: https://github.com/Xinglab/espresso/blob/v1.3.2/src/ESPRESSO_S.pl#L547
The format is one SJ_cluster line:
SJ_cluster {group_number} {sort_index} {other_sort_index} {chr} {cluster_start_coord} {cluster_end_coord}
And then 1 line per SJ in that cluster:
{group_number} {chr}:{SJ_start_coord}:{SJ_end_coord}:{strand} {SJ_start_coord} {SJ_end_coord} {strand} {number_of_perfect_reads} {number_of_reads} {1st_2_nt_in_intron} {last_2_nt_in_intron} {enum} {is_putative} {is_annotated} {is_high_confidence} {sort_index}
A perfect read for a splice junction has no mismatches, insertions, or deletions around the SJ. The {enum} values are: 2 -> annotated, 1 -> strand determined from the first and last 2 nt of the intron, 0 -> strand not determined. is_putative is 1 if the SJ was seen in the input alignments.
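
As a rough illustration of the layout described above, here is a minimal Python sketch that groups the SJ lines under the SJ_cluster line that precedes them. It assumes the fields are whitespace-separated and appear in exactly the order listed. The names `parse_sj_simplified_list`, `SJCluster` and `SpliceJunction` are made up for this example and are not part of ESPRESSO. The three flag columns are kept as raw strings, since the sample in the question shows yes/no in some of them.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class SpliceJunction:
    group_number: int
    sj_id: str               # {chr}:{SJ_start_coord}:{SJ_end_coord}:{strand}
    start: int
    end: int
    strand: str
    perfect_reads: int
    total_reads: int
    intron_first_2nt: str    # "TBD" in the sample when not determined
    intron_last_2nt: str
    enum: int                # 2 annotated, 1 strand from intron ends, 0 undetermined
    is_putative: str         # flags kept as raw text (assumption: no conversion)
    is_annotated: str
    is_high_confidence: str
    sort_index: int


@dataclass
class SJCluster:
    group_number: int
    sort_index: int
    other_sort_index: int
    chrom: str
    start: int
    end: int
    junctions: List[SpliceJunction] = field(default_factory=list)


def parse_sj_simplified_list(lines: Iterable[str]) -> List[SJCluster]:
    """Group SJ lines under the SJ_cluster line that precedes them."""
    clusters: List[SJCluster] = []
    for line in lines:
        f = line.split()
        if not f:
            continue  # skip blank lines
        if f[0] == "SJ_cluster":
            clusters.append(
                SJCluster(int(f[1]), int(f[2]), int(f[3]), f[4], int(f[5]), int(f[6]))
            )
            continue
        if not clusters:
            raise ValueError("SJ line found before any SJ_cluster line")
        if len(f) != 14:
            raise ValueError(f"expected 14 fields in an SJ line, got {len(f)}")
        clusters[-1].junctions.append(
            SpliceJunction(
                int(f[0]), f[1], int(f[2]), int(f[3]), f[4], int(f[5]), int(f[6]),
                f[7], f[8], int(f[9]), f[10], f[11], f[12], int(f[13]),
            )
        )
    return clusters


# Two clusters copied from the sample in the question.
sample_lines = [
    "SJ_cluster 11478 0 0 chr1 4562891 4563322",
    "11478 chr1:4562891:4563322:1 4562891 4563322 1 1 1 CT AC 2 yes yes 1 0",
    "SJ_cluster 11478 1 1 chr1 4562891 4563994",
    "11478 chr1:4562891:4563994:1 4562891 4563994 1 0 0 TBD TBD 2 no yes 1 1",
]
clusters = parse_sj_simplified_list(sample_lines)
```

For a real file, something like `parse_sj_simplified_list(open("chr1_SJ_simplified_list"))` should work under the same assumptions. Filtering on `total_reads > 0` would then keep only junctions with read support.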

SJ_group_all.fa is written here: https://github.com/Xinglab/espresso/blob/v1.3.2/src/ESPRESSO_S.pl#L554
The format is 1 line to describe the SJ: >{chr}:{SJ_start_coord}:{SJ_end_coord}:{strand} SJclst:{sort_index}: group:{group_number}:
The next line is the genomic sequence: the 25 nt leading up to the SJ followed by the 25 nt after the SJ.
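
A small Python sketch of reading this two-line-per-record layout is below. It assumes each header starts with `>`, each sequence sits on a single line, and the two 25 nt flanks are simply concatenated in the order described. The function name `parse_sj_fasta` and the dictionary keys are hypothetical, chosen only for this example.

```python
from typing import Dict, Iterable, Iterator

FLANK = 25  # nt on each side of the SJ, per the description above


def parse_sj_fasta(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one record per header/sequence pair."""
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line
            continue
        if header is None:
            raise ValueError("sequence line without a preceding header")
        # Header: >{chr}:{start}:{end}:{strand} SJclst:{sort_index}: group:{group_number}:
        sj_id, clst, group = header[1:].split()
        # rsplit keeps any ':' inside the chromosome name intact
        chrom, start, end, strand = sj_id.rsplit(":", 3)
        yield {
            "chrom": chrom,
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "sort_index": int(clst.split(":")[1]),
            "group_number": int(group.split(":")[1]),
            # Assumption: first 25 nt precede the SJ, the rest follow it
            "flank_before": line[:FLANK],
            "flank_after": line[FLANK:],
        }
        header = None


# First record from the sample in the question, with the '>' prefix.
sample_fasta = [
    ">chrUn_JH584304v1:7010:14083:0 SJclst:0: group:0:",
    "AGGTTCCGAATAGCTGAGCATCATGATACGAAGCAGAAGATGTGCCAAGC",
]
records = list(parse_sj_fasta(sample_fasta))
```

Splitting the sequence at position 25 is only a guess at how the two flanks are laid out; checking it against the code at the link above would be safer before relying on it.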

sj.list is written here: https://github.com/Xinglab/espresso/blob/v1.3.2/src/ESPRESSO_S.pl#L880
The format is {group_number} {chr}:{SJ_start_coord}:{SJ_end_coord} {chr} {SJ_start_coord} {SJ_end_coord} {number_of_perfect_reads} {number_of_total_reads} {comma_separated_list_of_perfect_read_IDs_for_this_SJ} {comma_separated_list_of_all_read_IDs_for_this_SJ}
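
For the per-sample sj.list, a line could be read along these lines in Python. This is a sketch under some assumptions: the fields are tab-separated in the real file (the sample in the question shows only whitespace, so the code falls back to whitespace splitting when no tab is present), read IDs contain no whitespace, and the trailing comma after each list is ignored. The name `parse_sj_list_line` and the dictionary keys are hypothetical.

```python
from typing import Dict, List


def _split_ids(text: str) -> List[str]:
    # "a,b," -> ["a", "b"]; the trailing comma seen in the sample is dropped
    return [read_id.strip() for read_id in text.split(",") if read_id.strip()]


def parse_sj_list_line(line: str) -> Dict[str, object]:
    """Parse one sj.list line into a dictionary."""
    line = line.rstrip("\r\n")
    # Assumption: tabs separate fields; fall back to any whitespace otherwise.
    fields = line.split("\t") if "\t" in line else line.split()
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    fields += [""] * (9 - len(fields))  # the read ID lists may be empty
    return {
        "group_number": int(fields[0]),
        "sj_id": fields[1],              # {chr}:{SJ_start_coord}:{SJ_end_coord}
        "chrom": fields[2],
        "start": int(fields[3]),
        "end": int(fields[4]),
        "perfect_reads": int(fields[5]),
        "total_reads": int(fields[6]),
        "perfect_read_ids": _split_ids(fields[7]),
        "all_read_ids": _split_ids(fields[8]),
    }


# First line from the sample in the question.
sample_line = (
    "1 chr12:72831310:72833445 chr12 72831310 72833445 1 1 "
    "m64060_200922_102352/3/ccs, m64060_200922_102352/3/ccs,"
)
sj = parse_sj_list_line(sample_line)
```

With whitespace splitting, an empty list of perfect read IDs cannot be told apart from a missing column, so the tab-separated path is the more reliable one if the file does use tabs.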

@junjiemama
Author

junjiemama commented Oct 25, 2023 via email
