labeling of exons (junctions?) #3

Closed
gpertea opened this issue Feb 11, 2022 · 4 comments

@gpertea

gpertea commented Feb 11, 2022

For transcripts with many exons it would be useful to have the option to display the exon order numbers inside the exon (or above/below it when the exon height is variable or too small?).

Perhaps a dedicated boolean option to just enable/disable the automatic drawing of exon order numbers for each transcript, with another option for its placement?

A more generic solution would be mapping such exon labels to some GTF exon attribute, like cov or exon_number as found in StringTie output -- maybe a label option can be added to geom_range() or its aesthetics. However, in many cases the exon_number attribute is missing, so a helper function could be added to generate it automatically in that case.

As for labeling junctions, I suppose a labeling option could be added to geom_junction() to enable showing the numeric coverage values (supporting reads) for each junction, above the junction curve for top curves, or below for bottom ones.

@dzhang32
Owner

I really like the idea of labelling geom_ranges. Not only would this be useful for exons, I can imagine it coming in handy for other use cases e.g. labelling to_diff() (or one day, to_jdiff()) outputs with their width to see if transcript changes remain in frame. With this in mind, I find the generic solution more appealing, with a helper function to facilitate the exon number use case.

Thinking this through, I think such a label can already be achieved by adding something like ggplot2::geom_label(aes(x = (start + end) / 2, label = exon_number)). I wonder if it is worth adding a label parameter to geom_range() or whether we should instead show the combined use of geom_label and geom_range in the vignette/examples? Personally, I favour the latter, as it gives users more flexibility and is maybe more ggplot2-esque - would appreciate your thoughts.
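
A minimal sketch of the geom_label plus geom_range combination, in case it helps the discussion. The exon table is made up for illustration, and the plotting step assumes both ggplot2 and ggtranscript are installed (it is skipped otherwise). Note the parentheses in the midpoint: without them only the end coordinate would be halved.

```r
# Toy exon table; coordinates and names are made up for illustration.
exons <- data.frame(
  transcript_name = "tx1",
  start = c(100, 300, 600),
  end = c(200, 500, 700),
  exon_number = 1:3
)

# Midpoint of each exon, used as the x position of its label.
# Writing start + end / 2 instead would halve only `end`.
exons$mid <- (exons$start + exons$end) / 2

# Only attempt the plot when both packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggtranscript", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(exons, aes(y = transcript_name)) +
    # Exons drawn as ranges from start to end.
    ggtranscript::geom_range(aes(xmin = start, xmax = end)) +
    # Exon numbers drawn at the centre of each range.
    geom_label(aes(x = mid, label = exon_number))
}
```

Because the label is an ordinary ggplot2 layer, its size, fill and nudging can be adjusted independently of the ranges.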

For the junction case, I do think a label parameter makes sense. The difference from geom_range being that users currently cannot easily add a label to the centre of the junction line as this requires knowing the points of the curve (in particular the y values). Implementation-wise, I think I would need to rewrite GeomJunction to inherit from GeomPath rather than GeomCurve to allow manipulation of the curve prior to creation of the grob - I will give this a go.

Thank you for your feedback, super helpful @gpertea!

@gpertea
Author

gpertea commented Feb 12, 2022

Thank you -- using geom_label (with geom_range) sounds like a good solution for labeling exons, I did not realize it could be that simple. I guess in that case the only thing left for exons would be to make sure the exon_number column can be generated with a helper function if the attribute is missing.
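
A rough base-R sketch of what such a helper might look like. The function name number_exons, its group_var argument, and the assumption that the table has start and strand columns are all hypothetical here, not an existing API. It numbers exons in transcription order, so exons on the minus strand are counted from the highest start coordinate.

```r
# Hypothetical helper (illustration only): number exons within each
# transcript in transcription order, i.e. ascending start on the "+" strand
# and descending start on the "-" strand.
number_exons <- function(exons, group_var = "transcript_name") {
  exons$exon_number <- NA_integer_
  # Iterate over the row indices belonging to each transcript.
  for (idx in split(seq_len(nrow(exons)), exons[[group_var]])) {
    minus <- exons$strand[idx[1]] == "-"
    ord <- order(exons$start[idx], decreasing = minus)
    exons$exon_number[idx[ord]] <- seq_along(idx)
  }
  exons
}

# Toy input with unsorted exons on both strands (made-up coordinates).
exons <- data.frame(
  transcript_name = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  strand = c("+", "+", "+", "-", "-"),
  start = c(600, 100, 300, 100, 400),
  end = c(700, 200, 500, 200, 500)
)
numbered <- number_exons(exons)
```

The input row order is preserved, so the new column can be used directly as a label aesthetic.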

dzhang32 added a commit that referenced this issue Feb 12, 2022
dzhang32 added a commit that referenced this issue Feb 15, 2022
- Create curves for junctions ourselves using grid:::calcControlPoints. The biggest advantage of this is to plot a geom_label on the middle point of curve.
dzhang32 added a commit that referenced this issue Feb 15, 2022
- make internal for use in geom_junction_label_repel
@dzhang32
Owner

I've added both the exon number helper function and method for labelling junctions.

For the junction label, I went for a separate label geom that inherits from ggrepel::geom_label_repel. I chose this option over a label parameter inside geom_junction because I think it gives users more flexibility in deciding the label aesthetics. The downside of this approach is that it costs more computationally (as we have to generate junction curves twice - once for junctions, once for junction labels), but there is scope to optimise my implementation if speed does become a bottleneck.

One thing I was considering is whether to provide a helper function (pretty much what is used internally by geom_junction_label_repel) to obtain the midpoints of junction curves. This would enable users to e.g. use ggplot2::geom_text or ggplot2::geom_label instead of ggrepel::geom_label_repel if they desired. For simplicity, I've held back on this for now, with the idea of returning to it if any users request it - would you find this helper useful?

Let me know if you have any additional thoughts regarding the above - thank you!

@gpertea
Author

gpertea commented Feb 20, 2022

Thank you for the detailed work on this and the documentation, the examples are great! A lot of work, really appreciated.

@gpertea gpertea closed this as completed Feb 20, 2022
dzhang32 added a commit that referenced this issue Feb 25, 2022
- in prep for allowing shorten_gaps() to work with utrs/CDS, we need to make sure shorten_gaps() will check type column and also take user inputted type when available