Skip to content

Spectrum plot optimization#45

Merged
singjc merged 15 commits intoOpenMS:mainfrom
axelwalter:spectrum-plot-optimization
Oct 18, 2024
Merged

Spectrum plot optimization#45
singjc merged 15 commits intoOpenMS:mainfrom
axelwalter:spectrum-plot-optimization

Conversation

@axelwalter
Copy link
Collaborator

@axelwalter axelwalter commented Oct 9, 2024

User description

Plotting each peak as a single vertical line turned out to be slow with interactive plots (Ploty and Bokeh) since each peak is represented by a separate trace.

Here, I modified the spectrum plotting to group peaks by color (according to priorities 1. custom peak color, 2. group by and 3. ion annotation) and plot them together as a single line (connected on the x-axis). This means each peak is described by three data points in the line (y=0, y=intensity, y=0). Connections on the x-axis are hidden by a h-line.

This reduced the number of traces down to a few, resulting in much better performance (tested on real spectra with thousands of peaks) and instant plot generation.

Main changes:

  • switch from VLinePlot to LinePlot
  • data conversion to three data points per peak (using np.repeat)
  • adaptations to creating ColorGenerator (one color per peak group)
  • adaptations to tooltips (boolean, a fixed tooltip per trace (e.g. for chromatograms) or individual data points within the trace (required for peak plotted together in a line)
  • minor update to the spectrum example data to be more realistic (peaks belonging to one isotope pattern should have the same annotation)
  • adaptations to x axis line method: control line color, width and opacity

Due to these changes I set binning to False by default, since plotting large number of peaks is not an issue any more.


PR Type

enhancement


Description

  • Optimized spectrum plotting by switching from vertical line plots to line plots, significantly improving performance for large spectra.
  • Grouped peaks by color and plotted them as connected lines, reducing the number of traces and enhancing plot generation speed.
  • Enhanced tooltip handling by adding support for fixed tooltips and custom hover data across different plotting backends.
  • Improved color generation logic for peaks and annotations, prioritizing custom colors and ion annotations.
  • Added annotation support for Plotly, Bokeh, and Matplotlib plots, allowing for more informative visualizations.
  • Enhanced x-axis line plotting with customizable color, width, and opacity parameters.

Changes walkthrough 📝

Relevant files
Enhancement
_core.py
Optimize spectrum plotting with line plots and enhanced tooltips

pyopenms_viz/_core.py

  • Optimized spectrum plotting by grouping peaks and using line plots.
  • Added methods for converting data for line plots and generating
    tooltips.
  • Enhanced color generation logic for peaks and annotations.
  • Improved tooltip handling with fixed and custom hover data.
  • +115/-48
    core.py
    Enhance Plotly plotting with annotations and tooltips       

    pyopenms_viz/_plotly/core.py

  • Added support for fixed tooltips and custom hover data.
  • Implemented annotation addition for Plotly plots.
  • Enhanced x-axis line plotting with customizable parameters.
  • +53/-43 
    core.py
    Improve Bokeh plotting with annotations and tooltips         

    pyopenms_viz/_bokeh/core.py

  • Added annotation support for Bokeh plots.
  • Improved tooltip handling with fixed tooltips.
  • Enhanced x-axis line plotting with customizable parameters.
  • +30/-31 
    core.py
    Enhance Matplotlib plotting with annotations and x-axis customization

    pyopenms_viz/_matplotlib/core.py

  • Added annotation support for Matplotlib plots.
  • Enhanced x-axis line plotting with customizable parameters.
  • +21/-24 
    Additional files (token-limit)
    Spectrum.ipynb
    ...                                                                                                           

    nbs/Spectrum.ipynb

    ...

    +70/-445

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    - one vline plot per peak results in bad performance with larger spectra
    - here, peaks with the same color are grouped and plotted together as one line
    - adaptations to _get_colors
    - highest: custom peak_color
    - second: ion_annotation
    - lowest: by column
    - add line width and color as parameters
    @qodo-code-review qodo-code-review bot added the enhancement New feature or request label Oct 9, 2024
    @qodo-code-review
    Copy link
    Contributor

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Performance Optimization
    The change from VLinePlot to LinePlot for spectrum plotting needs to be thoroughly tested to ensure it actually improves performance for large spectra as claimed.

    Data Manipulation
    The new convert_for_line_plots method significantly changes how spectrum data is processed. This needs careful review to ensure it doesn't introduce any errors or unexpected behavior.

    Tooltip Handling
    The _add_tooltips method has been significantly modified to handle fixed tooltips for traces. This change needs to be tested across different scenarios to ensure correct tooltip behavior.

    @qodo-code-review
    Copy link
    Contributor

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Performance
    Optimize color mapping creation for better performance

    Consider using a more efficient data structure for color mapping, such as a
    dictionary comprehension, instead of the current list comprehension approach.

    pyopenms_viz/_core.py [948]

    -color_map = {uniques[i]: colors[i] for i in range(len(colors))}
    +color_map = dict(zip(uniques, colors))
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: The suggestion to use dict(zip(uniques, colors)) is a more efficient and Pythonic way to create a dictionary from two lists. This change improves readability and performance, especially for large datasets.

    8
    Utilize numpy's vectorized operations for improved performance in data preparation

    Consider using numpy's vectorized operations for creating line plot data instead of
    using Python's repeat function, which could be slower for large datasets.

    pyopenms_viz/_core.py [1035-1037]

    -x = repeat(x, 3)
    -y = repeat(y, 3)
    +x = np.repeat(x, 3)
    +y = np.repeat(y, 3)
     y[::3] = y[2::3] = 0
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: The suggestion to use numpy's np.repeat instead of Python's repeat function can lead to performance improvements due to numpy's optimized operations for large datasets. However, the impact may vary depending on the dataset size.

    7
    Optimize annotation addition process for improved performance

    Consider using a more efficient approach for adding annotations, such as creating
    all Label objects at once and adding them to the figure in a single operation.

    pyopenms_viz/_bokeh/core.py [226-243]

    -for text, x, y, color in zip(ann_texts, ann_xs, ann_ys, ann_colors):
    -    if text is not nan and text != "" and text != "nan":
    -        if is_latex_formatted(text):
    -            text = text.replace("\n", r" \\\ ")
    -            text = r'$$\displaylines{{{}}}$$'.format(text)
    -        label = Label(
    -            x=x,
    -            y=y,
    -            text=text,
    -            text_font_size="13pt",
    -            text_color=color,
    -            x_offset=1,
    -            y_offset=0,
    -        )
    -        fig.add_layout(label)
    +labels = [
    +    Label(
    +        x=x, y=y,
    +        text=r'$$\displaylines{{{}}}$$'.format(text.replace("\n", r" \\\ ")) if is_latex_formatted(text) else text,
    +        text_font_size="13pt", text_color=color,
    +        x_offset=1, y_offset=0
    +    )
    +    for text, x, y, color in zip(ann_texts, ann_xs, ann_ys, ann_colors)
    +    if text is not nan and text != "" and text != "nan"
    +]
    +fig.add_layout(*labels)
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: The suggestion to create all Label objects at once and add them in a single operation can improve performance by reducing the number of calls to fig.add_layout. This is beneficial for plots with a large number of annotations.

    7
    Optimize trace updating for improved performance in tooltip handling

    Consider using a more efficient approach to update traces, such as list
    comprehension or map function, instead of the current for loop when
    fixed_tooltip_for_trace is False.

    pyopenms_viz/_plotly/core.py [134-141]

    -counter = 0
    -for i in range(len(fig.data)):
    -    l = len(fig.data[i].x)
    +lengths = [len(trace.x) for trace in fig.data]
    +cumulative_lengths = [0] + list(accumulate(lengths))
    +for i, (start, end) in enumerate(zip(cumulative_lengths, cumulative_lengths[1:])):
         fig.data[i].update(
             hovertemplate=tooltips,
    -        customdata = custom_hover_data[counter:counter+l, :]
    +        customdata=custom_hover_data[start:end, :]
         )
    -    counter += l
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: The suggestion provides a more efficient way to update traces using list comprehensions and accumulate, which can improve performance. However, the performance gain might be marginal depending on the number of traces.

    6
    Optimize grouping operation for potential performance improvement, especially with large datasets

    Consider using itertools.groupby() instead of pandas.DataFrame.groupby() for
    potentially better performance, especially for large datasets.

    pyopenms_viz/_matplotlib/core.py [280-283]

    -for group, df in data.groupby(by, sort=False):
    +from itertools import groupby
    +from operator import itemgetter
    +
    +sorted_data = sorted(data.itertuples(index=False), key=itemgetter(data.columns.get_loc(by)))
    +for group, group_data in groupby(sorted_data, key=itemgetter(data.columns.get_loc(by))):
    +    group_df = pd.DataFrame(group_data)
         (line,) = ax.plot(
    -        df[x],
    -        df[y],
    +        group_df[x],
    +        group_df[y],
    • Apply this suggestion
    Suggestion importance[1-10]: 3

    Why: While using itertools.groupby() could offer performance benefits, it requires converting data to tuples and sorting, which may not be practical or efficient compared to pandas' built-in groupby for DataFrame operations.

    3
    Possible issue
    Add input validation to prevent potential errors from mismatched list lengths

    Consider adding error handling for cases where the input lists (ann_texts, ann_xs,
    ann_ys, ann_colors) have different lengths to prevent potential IndexError.

    pyopenms_viz/_matplotlib/core.py [233-249]

     def _add_annotations(
         self,
         fig,
         ann_texts: list[list[str]],
         ann_xs: list[float],
         ann_ys: list[float],
         ann_colors: list[str],
     ):
    +    if not all(len(l) == len(ann_texts) for l in [ann_xs, ann_ys, ann_colors]):
    +        raise ValueError("All input lists must have the same length")
         for text, x, y, color in zip(ann_texts, ann_xs, ann_ys, ann_colors):
             fig.annotate(
                 text,
                 xy=(x, y),
                 xytext=(3, 0),
                 textcoords="offset points",
                 fontsize=8,
                 color=color,
             )
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: This suggestion addresses a potential bug by ensuring that input lists have the same length, preventing runtime errors. It significantly improves the robustness of the code.

    8
    Enhancement
    Improve code readability and potentially performance by using a dictionary comprehension instead of multiple zip() calls

    Consider using a dictionary comprehension instead of zip() for better readability
    and potential performance improvement when creating the annotation loop.

    pyopenms_viz/_matplotlib/core.py [241-249]

    -for text, x, y, color in zip(ann_texts, ann_xs, ann_ys, ann_colors):
    +for annotation in [dict(text=t, x=x, y=y, color=c) for t, x, y, c in zip(ann_texts, ann_xs, ann_ys, ann_colors)]:
         fig.annotate(
    -        text,
    -        xy=(x, y),
    +        annotation['text'],
    +        xy=(annotation['x'], annotation['y']),
             xytext=(3, 0),
             textcoords="offset points",
             fontsize=8,
    -        color=color,
    +        color=annotation['color'],
         )
    • Apply this suggestion
    Suggestion importance[1-10]: 4

    Why: The suggestion to use a dictionary comprehension for readability is valid, but it does not significantly enhance performance or readability compared to the existing zip() usage. The improvement is marginal.

    4

    💡 Need additional feedback ? start a PR chat

    Copy link
    Collaborator

    @singjc singjc left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Thanks for the update to spectrum plotting! It's super fast now! 🚀
    Looks good to merge to me.

    @singjc singjc merged commit e440806 into OpenMS:main Oct 18, 2024
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    2 participants