Spectrum plot optimization by axelwalter · Pull Request #45 · OpenMS/pyopenms_viz

axelwalter · 2024-10-09T07:54:21Z

User description

Plotting each peak as a single vertical line turned out to be slow with interactive plots (Ploty and Bokeh) since each peak is represented by a separate trace.

Here, I modified the spectrum plotting to group peaks by color (according to priorities 1. custom peak color, 2. group by and 3. ion annotation) and plot them together as a single line (connected on the x-axis). This means each peak is described by three data points in the line (y=0, y=intensity, y=0). Connections on the x-axis are hidden by a h-line.

This reduced the number of traces down to a few, resulting in much better performance (tested on real spectra with thousands of peaks) and instant plot generation.

Main changes:

switch from VLinePlot to LinePlot
data conversion to three data points per peak (using np.repeat)
adaptations to creating ColorGenerator (one color per peak group)
adaptations to tooltips (boolean, a fixed tooltip per trace (e.g. for chromatograms) or individual data points within the trace (required for peak plotted together in a line)
minor update to the spectrum example data to be more realistic (peaks belonging to one isotope pattern should have the same annotation)
adaptations to x axis line method: control line color, width and opacity

Due to these changes I set binning to False by default, since plotting large number of peaks is not an issue any more.

PR Type

enhancement

Description

Optimized spectrum plotting by switching from vertical line plots to line plots, significantly improving performance for large spectra.
Grouped peaks by color and plotted them as connected lines, reducing the number of traces and enhancing plot generation speed.
Enhanced tooltip handling by adding support for fixed tooltips and custom hover data across different plotting backends.
Improved color generation logic for peaks and annotations, prioritizing custom colors and ion annotations.
Added annotation support for Plotly, Bokeh, and Matplotlib plots, allowing for more informative visualizations.
Enhanced x-axis line plotting with customizable color, width, and opacity parameters.

Changes walkthrough 📝

Relevant files

Enhancement

_core.py `Optimize spectrum plotting with line plots and enhanced tooltips` pyopenms_viz/_core.py Optimized spectrum plotting by grouping peaks and using line plots. Added methods for converting data for line plots and generating tooltips. Enhanced color generation logic for peaks and annotations. Improved tooltip handling with fixed and custom hover data.	+115/-48
core.py `Enhance Plotly plotting with annotations and tooltips` pyopenms_viz/_plotly/core.py Added support for fixed tooltips and custom hover data. Implemented annotation addition for Plotly plots. Enhanced x-axis line plotting with customizable parameters.	+53/-43
core.py `Improve Bokeh plotting with annotations and tooltips` pyopenms_viz/_bokeh/core.py Added annotation support for Bokeh plots. Improved tooltip handling with fixed tooltips. Enhanced x-axis line plotting with customizable parameters.	+30/-31
core.py `Enhance Matplotlib plotting with annotations and x-axis customization` pyopenms_viz/_matplotlib/core.py Added annotation support for Matplotlib plots. Enhanced x-axis line plotting with customizable parameters.	+21/-24

Additional files (token-limit)

Spectrum.ipynb `...` nbs/Spectrum.ipynb ...	+70/-445

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

- one vline plot per peak results in bad performance with larger spectra - here, peaks with the same color are grouped and plotted together as one line - adaptations to _get_colors

- highest: custom peak_color - second: ion_annotation - lowest: by column

- add line width and color as parameters

qodo-code-review · 2024-10-09T07:54:53Z

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Performance Optimization The change from VLinePlot to LinePlot for spectrum plotting needs to be thoroughly tested to ensure it actually improves performance for large spectra as claimed. Data Manipulation The new convert_for_line_plots method significantly changes how spectrum data is processed. This needs careful review to ensure it doesn't introduce any errors or unexpected behavior. Tooltip Handling The _add_tooltips method has been significantly modified to handle fixed tooltips for traces. This change needs to be tested across different scenarios to ensure correct tooltip behavior.

qodo-code-review · 2024-10-09T07:55:27Z

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Score
Performance	Optimize color mapping creation for better performance Consider using a more efficient data structure for color mapping, such as a dictionary comprehension, instead of the current list comprehension approach. pyopenms_viz/_core.py [948] -color_map = {uniques[i]: colors[i] for i in range(len(colors))} +color_map = dict(zip(uniques, colors)) Apply this suggestion Suggestion importance[1-10]: 8 Why: The suggestion to use `dict(zip(uniques, colors))` is a more efficient and Pythonic way to create a dictionary from two lists. This change improves readability and performance, especially for large datasets.	8
	Utilize numpy's vectorized operations for improved performance in data preparation Consider using numpy's vectorized operations for creating line plot data instead of using Python's repeat function, which could be slower for large datasets. pyopenms_viz/_core.py [1035-1037] -x = repeat(x, 3) -y = repeat(y, 3) +x = np.repeat(x, 3) +y = np.repeat(y, 3) y[::3] = y[2::3] = 0 Apply this suggestion Suggestion importance[1-10]: 7 Why: The suggestion to use `numpy`'s `np.repeat` instead of Python's `repeat` function can lead to performance improvements due to `numpy`'s optimized operations for large datasets. However, the impact may vary depending on the dataset size.	7
	Optimize annotation addition process for improved performance Consider using a more efficient approach for adding annotations, such as creating all Label objects at once and adding them to the figure in a single operation. pyopenms_viz/_bokeh/core.py [226-243] -for text, x, y, color in zip(ann_texts, ann_xs, ann_ys, ann_colors): - if text is not nan and text != "" and text != "nan": - if is_latex_formatted(text): - text = text.replace("\n", r" \\\ ") - text = r'$$\displaylines{{{}}}$$'.format(text) - label = Label( - x=x, - y=y, - text=text, - text_font_size="13pt", - text_color=color, - x_offset=1, - y_offset=0, - ) - fig.add_layout(label) +labels = [ + Label( + x=x, y=y, + text=r'$$\displaylines{{{}}}$$'.format(text.replace("\n", r" \\\ ")) if is_latex_formatted(text) else text, + text_font_size="13pt", text_color=color, + x_offset=1, y_offset=0 + ) + for text, x, y, color in zip(ann_texts, ann_xs, ann_ys, ann_colors) + if text is not nan and text != "" and text != "nan" +] +fig.add_layout(labels) Apply this suggestion* Suggestion importance[1-10]: 7 Why: The suggestion to create all `Label` objects at once and add them in a single operation can improve performance by reducing the number of calls to `fig.add_layout`. This is beneficial for plots with a large number of annotations.	7
	Optimize trace updating for improved performance in tooltip handling Consider using a more efficient approach to update traces, such as list comprehension or map function, instead of the current for loop when fixed_tooltip_for_trace is False. pyopenms_viz/_plotly/core.py [134-141] -counter = 0 -for i in range(len(fig.data)): - l = len(fig.data[i].x) +lengths = [len(trace.x) for trace in fig.data] +cumulative_lengths = [0] + list(accumulate(lengths)) +for i, (start, end) in enumerate(zip(cumulative_lengths, cumulative_lengths[1:])): fig.data[i].update( hovertemplate=tooltips, - customdata = custom_hover_data[counter:counter+l, :] + customdata=custom_hover_data[start:end, :] ) - counter += l Apply this suggestion Suggestion importance[1-10]: 6 Why: The suggestion provides a more efficient way to update traces using list comprehensions and `accumulate`, which can improve performance. However, the performance gain might be marginal depending on the number of traces.	6
	Optimize grouping operation for potential performance improvement, especially with large datasets Consider using `itertools.groupby()` instead of `pandas.DataFrame.groupby()` for potentially better performance, especially for large datasets. pyopenms_viz/_matplotlib/core.py [280-283] -for group, df in data.groupby(by, sort=False): +from itertools import groupby +from operator import itemgetter + +sorted_data = sorted(data.itertuples(index=False), key=itemgetter(data.columns.get_loc(by))) +for group, group_data in groupby(sorted_data, key=itemgetter(data.columns.get_loc(by))): + group_df = pd.DataFrame(group_data) (line,) = ax.plot( - df[x], - df[y], + group_df[x], + group_df[y], Apply this suggestion Suggestion importance[1-10]: 3 Why: While using itertools.groupby() could offer performance benefits, it requires converting data to tuples and sorting, which may not be practical or efficient compared to pandas' built-in groupby for DataFrame operations.	3
Possible issue	Add input validation to prevent potential errors from mismatched list lengths Consider adding error handling for cases where the input lists (ann_texts, ann_xs, ann_ys, ann_colors) have different lengths to prevent potential IndexError. pyopenms_viz/_matplotlib/core.py [233-249] def _add_annotations( self, fig, ann_texts: list[list[str]], ann_xs: list[float], ann_ys: list[float], ann_colors: list[str], ): + if not all(len(l) == len(ann_texts) for l in [ann_xs, ann_ys, ann_colors]): + raise ValueError("All input lists must have the same length") for text, x, y, color in zip(ann_texts, ann_xs, ann_ys, ann_colors): fig.annotate( text, xy=(x, y), xytext=(3, 0), textcoords="offset points", fontsize=8, color=color, ) Apply this suggestion Suggestion importance[1-10]: 8 Why: This suggestion addresses a potential bug by ensuring that input lists have the same length, preventing runtime errors. It significantly improves the robustness of the code.	8
Enhancement	Improve code readability and potentially performance by using a dictionary comprehension instead of multiple zip() calls Consider using a dictionary comprehension instead of zip() for better readability and potential performance improvement when creating the annotation loop. pyopenms_viz/_matplotlib/core.py [241-249] -for text, x, y, color in zip(ann_texts, ann_xs, ann_ys, ann_colors): +for annotation in [dict(text=t, x=x, y=y, color=c) for t, x, y, c in zip(ann_texts, ann_xs, ann_ys, ann_colors)]: fig.annotate( - text, - xy=(x, y), + annotation['text'], + xy=(annotation['x'], annotation['y']), xytext=(3, 0), textcoords="offset points", fontsize=8, - color=color, + color=annotation['color'], ) Apply this suggestion Suggestion importance[1-10]: 4 Why: The suggestion to use a dictionary comprehension for readability is valid, but it does not significantly enhance performance or readability compared to the existing zip() usage. The improvement is marginal.	4

💡 Need additional feedback ? start a PR chat

- use figure returned by plot method - set show_plot to False

…lot-optimization

singjc

Thanks for the update to spectrum plotting! It's super fast now! 🚀
Looks good to merge to me.

axelwalter added 11 commits October 7, 2024 16:02

spectrum plot with lineplots instead of vline

db2770d

- one vline plot per peak results in bad performance with larger spectra - here, peaks with the same color are grouped and plotted together as one line - adaptations to _get_colors

wrap data conversion for line plot in method

06bbdf0

get spectrum annotations prior to line conversion

6753983

peak color priorities

5d735bb

- highest: custom peak_color - second: ion_annotation - lowest: by column

update plot_x_axis_line

5622af3

- add line width and color as parameters

correct Plotly tooltips with spectrum line plots

10efa72

hide legend if not grouped by

9db5398

update example data

509a059

correctly passing argument for Plotly tooltip method

4fdc0dd

update notebook

c372bda

add x axis line opacity parameter

38afadf

qodo-code-review bot added the enhancement New feature or request label Oct 9, 2024

qodo-code-review bot added the Review effort [1-5]: 3 label Oct 9, 2024

axelwalter and others added 4 commits October 9, 2024 10:54

update spectrum doc parameter binning default

b5c40b4

update Spectrum notebook binning example

c5514dd

- use figure returned by plot method - set show_plot to False

with correct notebook output

3135fd0

Merge branch 'main' of github.com:singjc/pyopenms_viz into spectrum-p…

4f6819d

…lot-optimization

singjc approved these changes Oct 18, 2024

View reviewed changes

singjc merged commit e440806 into OpenMS:main Oct 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spectrum plot optimization#45

Spectrum plot optimization#45
singjc merged 15 commits intoOpenMS:mainfrom
axelwalter:spectrum-plot-optimization

axelwalter commented Oct 9, 2024 •

edited

Loading

Uh oh!

qodo-code-review bot commented Oct 9, 2024

Uh oh!

qodo-code-review bot commented Oct 9, 2024

Uh oh!

singjc left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

axelwalter commented Oct 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

PR Type

Description

Changes walkthrough 📝

Uh oh!

qodo-code-review bot commented Oct 9, 2024

PR Reviewer Guide 🔍

Uh oh!

qodo-code-review bot commented Oct 9, 2024

PR Code Suggestions ✨

Uh oh!

singjc left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

axelwalter commented Oct 9, 2024 •

edited

Loading