Walkthrough

This pull request introduces two new entries in the Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant U as User
    participant M as mzML Processor
    participant L as Logger
    participant O as Output Writer
    U->>M: Initiate mzML statistics generation
    M->>M: Read and process spectra (transform_mzml_spectrum)
    M->>L: Log detailed precursor, error, and progress info
    M->>O: Write batch data to Parquet file
    O-->>M: Confirm successful write
    M->>L: Log success message
```
Actionable comments posted: 1
🔭 Outside diff range comments (1)
quantmsutils/sdrf/check_samplesheet.py (1)
184-184: 💡 Verification agent 🧩 Analysis chain

Consider addressing the TODO comment.
There's a TODO comment about validating the expdesign file, but the function `check_expdesign` already exists and is called below. It appears the TODO might be stale.
🏁 Script executed:

```shell
#!/bin/bash
# Check if there are issues or tasks related to the validation of expdesign files
grep -r "TODO.*expdesign" --include="*.py" .
```

Length of output: 122
Stale TODO Comment: Remove or Update Expdesign Validation Note
The TODO in `quantmsutils/sdrf/check_samplesheet.py` (line 184) appears to be redundant since the `check_expdesign` function is already implemented and used below. Please remove (or update) this comment to avoid confusion.
🧹 Nitpick comments (4)
quantmsutils/diann/dianncfg.py (1)
86-89: Consider replacing remaining print statements with logging.
For consistency, consider replacing the remaining print statements with logging calls, similar to what was done at lines 65-68.

```diff
-            print(
-                "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
-                + mod
-            )
+            logging.info(
+                "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
+                + mod
+            )
```

And similarly for the print statement at lines 101-104.

Also applies to: 101-104
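As a rough sketch of the suggested pattern (the wrapper function name is hypothetical; the message mirrors the original print call, and the format string matches the one used elsewhere in this review):

```python
import logging

# Module-level logger, configured once at import time.
logging.basicConfig(
    format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.INFO
)
logger = logging.getLogger(__name__)

def warn_skipped_modification(mod: str) -> str:
    """Log that a non-unimod modification was skipped; returns the message."""
    message = (
        "Warning: Currently only supported unimod modifications "
        "for DIA pipeline. Skipped: " + mod
    )
    logger.info(message)
    return message
```

Since the message is a warning, `logger.warning` is arguably a better level than `logger.info`; the sketch keeps `info` to match the reviewer's suggestion.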
README.md (1)
35-48: Great addition of documentation for mzML statistics.
The detailed explanation of the mzML statistics functionality and output format is very helpful for users.
Small formatting suggestion: ensure consistent spacing in bullet points for better readability.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - `scan`: The scan accession for each MS and MS/M... (UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - `ms_level`: The MS level of the signal, 1 for MS an... (UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - `num_peaks`: The number of peaks in the MS. Compute ... (UNLIKELY_OPENING_PUNCTUATION)
quantmsutils/mzml/mzml_statistics.py (2)
13-13: Remove unused import.
The import `import pyopenms as oms` appears to be unused, since `MzMLFile` is already imported from pyopenms.

```diff
-import pyopenms as oms
```

🧰 Tools
🪛 Ruff (0.8.2)
13-13: `pyopenms` imported but unused
Remove unused import: `pyopenms` (F401)
109-133: Good handling of multiple precursors.
The code now properly handles spectra with multiple precursors, which is important for accurate data processing. However, the manual index incrementing could be simplified using `enumerate`.

```diff
-        index = 0
-        for precursor in spectrum.getPrecursors():
+        for index, precursor in enumerate(spectrum.getPrecursors()):
             logging.info(
                 "Precursor charge: %s, precursor mz: %s",
                 precursor.getCharge(),
                 precursor.getMZ(),
             )
             charge_state = precursor.getCharge()
             exp_mz = precursor.getMZ()
             intensity = precursor.getIntensity()
             precursors.append(
                 {
                     "charge": charge_state,
                     "mz": exp_mz,
                     "intensity": intensity,
                     "index": index,
                 }
             )
-            index += 1
```

🧰 Tools
🪛 Ruff (0.8.2)
132-132: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- `.gitignore` (1 hunks)
- `README.md` (1 hunks)
- `pyproject.toml` (1 hunks)
- `quantmsutils/diann/diann2mztab.py` (23 hunks)
- `quantmsutils/diann/dianncfg.py` (2 hunks)
- `quantmsutils/mzml/mzml_statistics.py` (11 hunks)
- `quantmsutils/psm/psm_conversion.py` (2 hunks)
- `quantmsutils/sdrf/check_samplesheet.py` (2 hunks)
- `quantmsutils/sdrf/extract_sample.py` (1 hunks)
- `quantmsutils/utils/constants.py` (1 hunks)
- `recipe/meta.yaml` (1 hunks)
- `tests/test_commands.py` (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 154-154: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 154-154: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
13-13: pyopenms imported but unused
Remove unused import: pyopenms
(F401)
132-132: Use enumerate() for index variable index in for loop
(SIM113)
🪛 LanguageTool
README.md
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...
(UNLIKELY_OPENING_PUNCTUATION)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
🔇 Additional comments (33)
.gitignore (1)
169-170: New Ignore Rule for `.qodo` Files
The addition of `.qodo` to the `.gitignore` file is clear and appropriate to prevent tracking of these files. Please verify that `.qodo` files are indeed generated by a build/tool process and that no other related file extensions need to be ignored.

pyproject.toml (1)
6-6: Version Bump Updated to 0.0.20
The version upgrade from `"0.0.19"` to `"0.0.20"` is correctly applied. This is consistent with the version bump in related files. Make sure that any CI/CD pipeline or documentation referencing the version is updated accordingly.

recipe/meta.yaml (1)
4-4: Meta File Version Update
The package version in `recipe/meta.yaml` has been updated to `"0.0.20"`, which aligns with the update in `pyproject.toml`. Confirm that downstream processes (e.g., packaging and deployment jobs) are aware of this change.

quantmsutils/sdrf/extract_sample.py (1)
1-1: Good addition of logging facilities.
Adding logging configuration improves observability and will help with debugging.
Also applies to: 7-8
quantmsutils/utils/constants.py (1)
14-15: New constants added appropriately.
The new `PRECURSORS` and `INTENSITY` constants follow the consistent naming pattern of the existing constants.

quantmsutils/psm/psm_conversion.py (1)
1-1: Good addition of logging facilities.
Adding logging configuration is consistent with other files and enhances debugging capabilities.

Also applies to: 37-38

quantmsutils/sdrf/check_samplesheet.py (1)
5-5: Good addition of logging facilities.
The logging setup is consistent with the implementation in other files and will improve debugging capabilities.

Also applies to: 14-15

quantmsutils/diann/dianncfg.py (2)
6-6: Good addition of logging functionality.
Adding logging capabilities is a good practice for better error handling and traceability.

Also applies to: 13-14

65-68: Consistent use of logging for warnings.
Good job replacing the print statement with logging.info for warnings, which improves observability.

tests/test_commands.py (1)
1-2: Good use of Path for better file handling.
Using pathlib.Path for file paths is a good practice and more maintainable than string concatenation.
Also applies to: 7-7
quantmsutils/mzml/mzml_statistics.py (7)
1-1: Good addition of logging functionality.
Adding logging capabilities improves error handling and observability throughout the codebase.
Also applies to: 32-33
73-86: Improved docstring for better code documentation.
The enhanced docstring for the `consumeSpectrum` method adds clarity about its functionality and parameters.
159-159: Added precursors field to data structure.
Good addition of the `PRECURSORS` field to both MS level 2 and level 1 spectra data structures.
Also applies to: 176-176
232-232: Improved error handling with better context.
Replacing print statements with `logging.error` and including the file path in error messages improves debugging capabilities.
Also applies to: 330-330
380-395: Well-structured schema for precursor data.
The new schema for `PRECURSORS` with a structured list is well-designed and provides good type safety.
420-421: Cleaner path construction.
Using `Path.with_name()` for creating output paths is more maintainable than string concatenation.
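As an illustration of the `Path.with_name()` pattern (the file names here are examples, not the PR's actual paths):

```python
from pathlib import Path

# with_name() replaces only the last path component, keeping the parent
# directory intact, which avoids fragile string slicing on the full path.
mzml_path = Path("tests/test_data/BSA1_F1.mzML")
ms_info_path = mzml_path.with_name(mzml_path.stem + "_ms_info.parquet")
```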
439-439: Added success logging message.
Good practice to log successful completion of processing, which helps with monitoring and troubleshooting.
quantmsutils/diann/diann2mztab.py (16)
51-59: Method Signature Update in `diann2mztab`:
The function signature now includes a trailing comma after the last parameter. This change helps keep diffs clean and maintains consistency in parameter formatting.
115-121: Lambda Function Formatting for Peptide Conversion:
The lambda used for converting peptide sequences in the MSstats conversion block has been reformatted into a multi‐line expression. The logic remains unchanged while readability has been significantly improved.
236-236: List Slicing Formatting in Experimental Design Parsing:
The slicing expression in the creation of `s_table` now uses an extra space for clarity (i.e. `data[empty_row + 1 :][1:]`). There is no functional change, only improved readability.
276-301: Conditional Logic Reformat in Mass Calculation:
The conditional block in `compute_mass_modified_peptide` (checking for allowed amino acids) has been reflowed over multiple lines. This reformat enhances clarity without affecting functionality.
381-387: Method Signature Update in `convert_to_mztab`:
The signature for `convert_to_mztab` now includes a trailing comma after the last parameter, ensuring consistency with other function declarations.
467-471: Update to Remain Columns List:
The list of columns (`remain_cols`) now explicitly includes `"Global.PG.Q.Value"` after `"Precursor.Quantity"`. This addition is clearly formatted and ensures that the report is filtered using all expected columns.
496-499: Precursor m/z Calculation Formatting:
The calculation for `"Calculate.Precursor.Mz"` has been re-indented: wrapping the addition within parentheses improves readability while leaving the underlying logic untouched.
599-609: Concatenation Formatting for mzTab Settings:
The string concatenation for software setting entries (fragment and precursor mass tolerances) in the MTD sub‐table is now split over multiple lines. This change increases clarity with no alteration in functionality.
787-794: Protein Abundance Aggregation Refinement:
Within the loop aggregating protein abundance data by study variable, the computation and subsequent drop of columns is now formatted in a more streamlined manner. This refactor improves readability and confirms that the mean is calculated correctly.
762-764: Improved Concatenation in PRH Construction:
The call to `pd.concat` that merges protein detail rows with the existing PRH dataframe has been re-indented and reformatted for better clarity and maintainability.
883-889: Method Signature Update in `mztab_peh`:
The updated definition for `mztab_peh` now shows each parameter on its own line with a trailing comma. This formatting change enhances consistency across the codebase.
1059-1067: Refactored Merge for Decoy Incorporation in PEH:
The merge block that adds the `"Decoy"` column into `out_mztab_peh` has been reformatted over multiple lines. The updated formatting enhances readability while preserving the intended functionality.
1137-1145: DataFrame Column Formatting in PSH Construction:
Within the PSH sub-table, the conversion of the `"opt_global_spectrum_reference"` column (prepending `"scan="` and casting the data to string) is now clearly separated. Also, the conversion of the RT values (dividing by 60 and ensuring the group's `"RT"` is of type float64) is now explicit, promoting consistency across DIA-NN versions.
1291-1295: Spectra Reference String Reformatting in PSH:
The lambda that constructs the spectra reference string (concatenating `"ms_run[{}]:"` with the original reference) has been broken into two lines. This reformat makes the string assembly clearer without modifying its behavior.
1592-1594: Method Signature Update in `calculate_protein_coverages`:
The function header now explicitly formats its parameters (with a trailing comma) across multiple lines. This small stylistic update improves consistency with other parts of the code.
1580-1582: Enhanced Readability in Coverage Calculation:
The merging of overlapping intervals during the protein coverage calculation (in `calculate_coverage`) has been reformatted for clarity, making the computation easier to follow.
Actionable comments posted: 1
🧹 Nitpick comments (3)
quantmsutils/mzml/mzml_statistics.py (3)
32-33: Consider adjusting logging level for production
Setting the logging level to DEBUG might generate excessive output in a production environment. Consider using INFO as the default level and allowing it to be configurable based on the environment.
```diff
-logging.basicConfig(format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.DEBUG)
+logging.basicConfig(format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.INFO)
```
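One way to make the level configurable, as suggested above, is to read it from an environment variable. A sketch (the variable name `QUANTMSUTILS_LOG_LEVEL` is hypothetical, not part of the PR):

```python
import logging
import os

# Default to INFO; allow an environment variable to override it,
# e.g. "DEBUG" during local troubleshooting.
level_name = os.environ.get("QUANTMSUTILS_LOG_LEVEL", "INFO").upper()
logging.basicConfig(
    format="%(asctime)s [%(funcName)s] - %(message)s",
    level=getattr(logging, level_name, logging.INFO),
)
logger = logging.getLogger(__name__)
```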
110-132: Use enumerate() instead of manual index incrementing
When iterating through a collection and tracking an index, it's more Pythonic to use `enumerate()` rather than manually incrementing an index variable.

```diff
-        index = 0
-        for precursor in spectrum.getPrecursors():
+        for index, precursor in enumerate(spectrum.getPrecursors()):
             logging.info(
                 "Precursor charge: %s, precursor mz: %s",
                 precursor.getCharge(),
                 precursor.getMZ(),
             )
             charge_state = precursor.getCharge()  # TODO: Review by @timo and @julianus
             exp_mz = precursor.getMZ()  # TODO: Review by @timo and @julianus
             intensity = precursor.getIntensity()  # TODO: Review by @timo and @julianus
             rt = spectrum.getRT()  # TODO: Review by @timo and @julianus
             precursors.append(
                 {
                     "charge": charge_state,
                     "mz": exp_mz,
                     "intensity": intensity,
                     "rt": rt,
                     "index": index,
                 }
             )
-            index += 1
```

🧰 Tools
🪛 Ruff (0.8.2)
131-131: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
113-117: Use logger instance instead of logging module directly
You've created a logger instance but are using the logging module directly here. Use the logger instance for consistency.

```diff
-    logging.info(
+    logger.info(
         "Precursor charge: %s, precursor mz: %s",
         precursor.getCharge(),
         precursor.getMZ(),
     )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- `quantmsutils/__init__.py` (1 hunks)
- `quantmsutils/diann/dianncfg.py` (2 hunks)
- `quantmsutils/mzml/mzml_statistics.py` (11 hunks)
- `quantmsutils/psm/psm_conversion.py` (2 hunks)
- `quantmsutils/sdrf/check_samplesheet.py` (2 hunks)
- `quantmsutils/sdrf/extract_sample.py` (1 hunks)
- `quantmsutils/utils/constants.py` (1 hunks)
- `tests/test_commands.py` (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- `quantmsutils/__init__.py`
🚧 Files skipped from review as they are similar to previous changes (6)
- quantmsutils/sdrf/extract_sample.py
- quantmsutils/psm/psm_conversion.py
- quantmsutils/utils/constants.py
- quantmsutils/diann/dianncfg.py
- tests/test_commands.py
- quantmsutils/sdrf/check_samplesheet.py
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
131-131: Use enumerate() for index variable index in for loop
(SIM113)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (10)
quantmsutils/mzml/mzml_statistics.py (10)
28-29: Added new constants for schema consistency
Adding `PRECURSORS` and `INTENSITY` constants is a good improvement for maintaining schema definition consistency throughout the codebase.

74-87: Improved documentation with clear docstring format
The enhanced docstring with a parameters section provides much better context for the method's functionality.

157-157: Added new fields to row_data for improved MS2 spectrum information
The addition of intensity and precursors data to `row_data` provides more comprehensive information about MS2 spectra.

Also applies to: 159-159

176-176: Consistent handling for ms_level 1 spectra
Adding the `PRECURSORS` field with a None value for MS level 1 spectra ensures schema consistency across all spectra records.

232-232: Improved error handling with logging
Using the logger for error messages with file path context is a significant improvement for debugging and traceability.

292-308: Well-structured schema definition for precursors data
The nested schema structure for precursors is well-defined and properly includes all necessary fields with appropriate data types.

346-346: Consistent error handling with proper context
Using logging consistently for error handling with file path information improves the diagnostic capabilities of the code.

397-413: Schema consistency between mzML and Bruker .d file handling
Maintaining consistent schema structures between different file formats ensures uniform data representation across the system.

437-438: Improved path handling with Path.with_name()
Using `Path.with_name()` is cleaner and more robust than string manipulation for constructing output paths.

456-457: Added success message for better operational visibility
Adding a success log message after processing provides valuable operational feedback and makes monitoring easier.
Actionable comments posted: 0
♻️ Duplicate comments (1)
quantmsutils/mzml/mzml_statistics.py (1)
105-128: 🛠️ Refactor suggestion

Remove TODO comments and use enumerate for iteration.
The code contains several TODO comments that should be addressed, and the manual index management can be simplified.
```diff
-        precursors = []
-        # capture if more than one precursor
-        index = 0
-        for precursor in spectrum.getPrecursors():
+        precursors = []
+        # capture if more than one precursor
+        for index, precursor in enumerate(spectrum.getPrecursors()):
             logging.info(
                 "Precursor charge: %s, precursor mz: %s",
                 precursor.getCharge(),
                 precursor.getMZ(),
             )
-            charge_state = precursor.getCharge()  # TODO: Review by @timo and @julianus
-            exp_mz = precursor.getMZ()  # TODO: Review by @timo and @julianus
-            intensity = precursor.getIntensity()  # TODO: Review by @timo and @julianus
-            rt = spectrum.getRT()  # TODO: Review by @timo and @julianus
+            charge_state = precursor.getCharge()
+            exp_mz = precursor.getMZ()
+            intensity = precursor.getIntensity()
+            # rt is already calculated outside the loop and doesn't change per precursor
             precursors.append(
                 {
                     "charge": charge_state,
                     "mz": exp_mz,
                     "intensity": intensity,
                     "rt": rt,
                     "index": index,
                 }
             )
-            index += 1
```
🪛 Ruff (0.8.2)
127-127: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
🧹 Nitpick comments (1)
quantmsutils/mzml/mzml_statistics.py (1)
175-175: Fix truncated log message.
The log message is incomplete and might cause confusion.

```diff
-    logger.info("Skipping spectrum with MS level %s, MS not supported, or precursors not ", ms_level)
+    logger.info("Skipping spectrum with MS level %s, MS not supported, or precursors not found", ms_level)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `quantmsutils/mzml/mzml_statistics.py` (11 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
127-127: Use enumerate() for index variable index in for loop
(SIM113)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (11)
quantmsutils/mzml/mzml_statistics.py (11)
1-1: Good addition of logging configuration.
Adding structured logging with appropriate levels helps with debugging and monitoring application behavior.

Also applies to: 32-34

28-29: Good addition of new constants.
Adding `PRECURSORS` and `INTENSITY` constants maintains consistency with other field naming in the codebase.

73-87: Well-documented function with improved docstring.
The enhanced docstring clearly explains the function's purpose, behavior, and parameters.

89-96: Improved peak extraction and intensity calculations.
The refactored code now directly uses `get_peaks()` to extract mz and intensity arrays, improving readability and potentially performance.

153-155: Good addition of precursor and intensity information.
Adding precursor details and intensity values to the row data enriches the output and provides more comprehensive information for analysis.

Also applies to: 172-172

229-229: Improved error handling with context information.
Adding file path information to the error message provides better context for troubleshooting.

289-304: Well-structured schema for precursor data.
The schema definition correctly handles the complex nested structure of precursor data, properly defining field types and nullability.

343-343: Enhanced error logging with file path context.
Adding file path information to error logs significantly improves troubleshooting capabilities.

394-409: Consistent schema definition across functions.
The precursor schema definition is consistently applied in the `batch_write_bruker_d` function, maintaining a uniform data structure throughout the application.

434-435: Improved file path handling using Path methods.
Using `Path.with_name()` instead of string manipulation improves readability and reliability of path construction.

453-453: Added success logging.
Adding success logging helps track the application flow and confirms successful file processing.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/test_commands.py (1)
151-153: ⚠️ Potential issue: Missing test file causes pipeline failure.
Referencing `BSA1_F1_test_ms_info.parquet` leads to a FileNotFoundError. You must either include this file in `test_data` or remove/change this reference to a valid existing file.

```diff
-table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
+# table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
```
🪛 GitHub Actions: Python application
[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🪛 GitHub Actions: Python package
[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🧹 Nitpick comments (4)
tests/test_commands.py (1)
157-157: Consider verifying table contents rather than just length.
A simple length comparison might mask data discrepancies. For more robust testing, consider comparing column-level data or using a DataFrame method like `pd.testing.assert_frame_equal(table1, table2)`.

quantmsutils/mzml/mzml_statistics.py (3)
5-5: Remove unused import.
`typing.Optional` is imported but never used in this file.

```diff
-from typing import Optional, Set, Union
+from typing import Set, Union
```

🧰 Tools
🪛 Ruff (0.8.2)
5-5: `typing.Optional` imported but unused
Remove unused import: `typing.Optional` (F401)
207-211: Unify logging usage.
Lines in this method use both `logging.info` and `logger.info`. To maintain consistency and configurability, consider replacing any direct `logging.info` calls with `logger.info`.

```diff
-logging.info(
+logger.info(
```
239-239: Use `enumerate()` to handle index.
Refactor the loop to leverage Pythonic iteration and avoid manual index increments.

```diff
-index = 0
-for precursor in spectrum.getPrecursors():
+for index, precursor in enumerate(spectrum.getPrecursors()):
```

🧰 Tools
🪛 Ruff (0.8.2)
239-239: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `quantmsutils/mzml/mzml_statistics.py` (10 hunks)
- `tests/test_commands.py` (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
5-5: typing.Optional imported but unused
Remove unused import: typing.Optional
(F401)
239-239: Use enumerate() for index variable index in for loop
(SIM113)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (1)
tests/test_commands.py (1)
7-7: Good practice using a project-level constant for tests directory.Defining
TESTS_DIR = Path(__file__).parentpromotes clarity and maintainability when referencing relative testing paths.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/test_commands.py (1)
158-158: ⚠️ Potential issue: Missing file breaks the test execution.
The Parquet file `BSA1_F1_test_ms_info.parquet` does not appear to exist, causing a `FileNotFoundError`. To fix, either add the file to `tests/test_data` or compare with `ms_info_path` if the data sets are intended to be the same.

```diff
-    table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
+    table1 = pd.read_parquet(ms_info_path)
```
🪛 GitHub Actions: Python application
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🪛 GitHub Actions: Python package
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🧹 Nitpick comments (7)
tests/test_commands.py (1)
149-153: Clarify the comment for file removal.
The comment, "check if the file exist, delete it," is a bit ambiguous since it's happening for `ms_info_path` and not `mzml_path`. Consider updating the comment to reflect which file is truly being checked and removed.

```diff
-    # check if the file exist, delete it
+    # check if the ms_info_path file exists and delete it
```

quantmsutils/mzml/mzml_statistics.py (6)
5-5: Remove unused imports `Tuple` and `Union`.
These imports are not used, as flagged by static analysis.

```diff
-from typing import Dict, List, Optional, Set, Tuple, Union
+from typing import Dict, List, Optional, Set
```

🧰 Tools
🪛 Ruff (0.8.2)
5-5: `typing.Tuple` imported but unused
Remove unused import (F401)
5-5: `typing.Union` imported but unused
Remove unused import (F401)
13-13: Remove unused import `Schema`.
The `Schema` import from `pyarrow` is not directly referenced and can be removed.

```diff
-from pyarrow import Schema
```

🧰 Tools
🪛 Ruff (0.8.2)
13-13: `pyarrow.Schema` imported but unused
Remove unused import: `pyarrow.Schema` (F401)
22-22: Remove unused import `MONOISOTOPIC_MZ`.
This constant from `quantmsutils.utils.constants` is not being referenced in the code.

```diff
-    MONOISOTOPIC_MZ,
```

🧰 Tools
🪛 Ruff (0.8.2)
22-22: `quantmsutils.utils.constants.MONOISOTOPIC_MZ` imported but unused
Remove unused import (F401)
30-30: Remove unused import `MAX_INTENSITY`.
It is never used in this file.

```diff
-    MAX_INTENSITY,
```

🧰 Tools
🪛 Ruff (0.8.2)
30-30: `quantmsutils.utils.constants.MAX_INTENSITY` imported but unused
Remove unused import (F401)
351-351: Remove or utilize the `intensity` variable.
Currently, `intensity` is assigned but never used, which is flagged by static analysis. Either remove it or incorporate it into the logic if needed.

```diff
-    intensity = precursor.getIntensity()
```

🧰 Tools
🪛 Ruff (0.8.2)
351-351: Local variable `intensity` is assigned to but never used
Remove assignment to unused variable `intensity` (F841)
609-609: Preserve original exception context.
When re-raising exceptions, consider using `raise click.Abort() from e` to attach the original context.

```diff
-    raise click.Abort()
+    raise click.Abort() from e
```

🧰 Tools
🪛 Ruff (0.8.2)
609-609: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `quantmsutils/mzml/mzml_statistics.py` (2 hunks)
- `tests/test_commands.py` (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
5-5: typing.Tuple imported but unused
Remove unused import
(F401)
5-5: typing.Union imported but unused
Remove unused import
(F401)
13-13: pyarrow.Schema imported but unused
Remove unused import: pyarrow.Schema
(F401)
22-22: quantmsutils.utils.constants.MONOISOTOPIC_MZ imported but unused
Remove unused import
(F401)
30-30: quantmsutils.utils.constants.MAX_INTENSITY imported but unused
Remove unused import
(F401)
351-351: Local variable intensity is assigned to but never used
Remove assignment to unused variable intensity
(F841)
609-609: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (3)
tests/test_commands.py (3)
1-1: Good import usage.
Using `pathlib.Path` for building file paths is more reliable and portable than using raw strings.
7-7: Initialization of `TESTS_DIR` looks good.
Defining a base directory for tests promotes maintainability and consistency across all test assets.
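The pattern the comment endorses can be sketched as follows; `TESTS_DIR` mirrors the constant in the test module, the file name is illustrative, and in the real module the base would be built from `__file__`:

```python
from pathlib import Path

# Stand-in for Path(__file__).parent in the real test module
TESTS_DIR = Path("tests")

# Paths compose with "/" and stay portable across operating systems
ms_info_path = TESTS_DIR / "test_data" / "BSA1_F1_ms_info.parquet"
```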
162-162: Verify the correctness of the final assertion.
If you're comparing two different Parquet outputs, confirm that both files indeed exist and represent the same or comparable data. Otherwise, the test will fail due to the missing second file or mismatching data.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/test_commands.py (1)
158-159: ⚠️ Potential issue: Fix the missing test file causing pipeline failures.
The pipeline is failing because the file `BSA1_F1_test_ms_info.parquet` doesn't exist. You need to either create this file or update the test to use an existing file. Apply this fix:
```diff
- table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
+ table1 = pd.read_parquet(ms_info_path)
```

🧰 Tools
🪛 GitHub Actions: Python application
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🪛 GitHub Actions: Python package
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🧹 Nitpick comments (3)
quantmsutils/mzml/mzml_statistics.py (3)
1-32: Improved logging configuration and imports. Good addition of proper logging configuration. This will help with debugging and monitoring. However, there are some unused imports from the typing module that should be cleaned up.

```diff
-from typing import Dict, List, Optional, Set, Tuple, Union
+from typing import Dict, Optional, Set
```
🪛 Ruff (0.8.2)
5-5: `typing.List` imported but unused. Remove unused import (F401).
5-5: `typing.Tuple` imported but unused. Remove unused import (F401).
5-5: `typing.Union` imported but unused. Remove unused import (F401).
27-27: `quantmsutils.utils.constants.MAX_INTENSITY` imported but unused. Remove unused import: `quantmsutils.utils.constants.MAX_INTENSITY` (F401).
27-27: Remove unused constant import. The MAX_INTENSITY constant is imported but never used in this file.

```diff
- MAX_INTENSITY, PRECURSOR_RT, PRECURSOR_TOTAL_INTENSITY, PRECURSOR_INTENSITY_WINDOW,
+ PRECURSOR_RT, PRECURSOR_TOTAL_INTENSITY, PRECURSOR_INTENSITY_WINDOW,
```

🧰 Tools
🪛 Ruff (0.8.2)
27-27: `quantmsutils.utils.constants.MAX_INTENSITY` imported but unused. Remove unused import: `quantmsutils.utils.constants.MAX_INTENSITY` (F401).
599-601: Improve exception handling with exception chaining. When re-raising exceptions, use explicit exception chaining to preserve the original traceback.

```diff
- logger.error(f"Error processing file {ms_path}: {e}")
- raise click.Abort()
+ logger.error(f"Error processing file {ms_path}: {e}")
+ raise click.Abort() from e
```

🧰 Tools
🪛 Ruff (0.8.2)
601-601: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
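Because the B904 finding recurs across these reviews, here is a minimal self-contained sketch of the chaining pattern; the `ConfigError`/`load_config` names are hypothetical, not from this PR:

```python
class ConfigError(Exception):
    pass

def load_config(text: str) -> int:
    try:
        return int(text)
    except ValueError as err:
        # `from err` stores the original error as __cause__, so the
        # traceback shows both failures instead of hiding the first one
        raise ConfigError(f"bad config value: {text!r}") from err
```

Raising with `from None` instead would deliberately suppress the original context.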
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `quantmsutils/mzml/mzml_statistics.py` (2 hunks)
- `quantmsutils/utils/constants.py` (1 hunks)
- `tests/test_commands.py` (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- quantmsutils/utils/constants.py
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
5-5: typing.List imported but unused
Remove unused import
(F401)
5-5: typing.Tuple imported but unused
Remove unused import
(F401)
5-5: typing.Union imported but unused
Remove unused import
(F401)
27-27: quantmsutils.utils.constants.MAX_INTENSITY imported but unused
Remove unused import: quantmsutils.utils.constants.MAX_INTENSITY
(F401)
601-601: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (8)
tests/test_commands.py (3)
1-8: Improved test file organization with Path and constants. Good refactoring to use pathlib.Path and the TESTS_DIR constant. This makes the test file paths more maintainable and platform-independent.
148-156: Clear test file path management using TESTS_DIR. The updates to use TESTS_DIR for file paths improve readability and maintainability. The file existence check and deletion before test is a good practice to ensure clean test runs.
162-162: Details
❓ Verification inconclusive
Updated assertion to compare table lengths instead of using compare().
This change from checking `table1.compare(table2).empty` to comparing lengths is more lenient. Make sure this doesn't hide potential differences in the table contents that should be verified.
🏁 Script executed:
```shell
#!/bin/bash
# Check the previous version of the test to see what was being verified
git show HEAD^ -- tests/test_commands.py | grep -A 5 "assert.*table"
```

Length of output: 100
Action: Manual Verification Required on Table Assertion Change
The recent update in `tests/test_commands.py` (line 162) replaces the stricter content comparison (`table1.compare(table2).empty`) with a simpler length check (`assert len(table2) == len(table1)`). This update is more lenient and might miss discrepancies in the actual table data. Additionally, the earlier version of the test could not be retrieved due to the absence of a valid previous commit (error with `HEAD^`), so it's unclear how comprehensive the original check was.
- Review Area: Confirm whether comparing only the table lengths is sufficient or if a more detailed comparison is needed to capture potential data differences.
- Additional Action: Manually inspect the commit history for `tests/test_commands.py` (or use another valid reference) to understand the intended behavior of the previous assertion.

quantmsutils/mzml/mzml_statistics.py (5)
34-64: Well-structured schema definitions with clear documentation. The separation of schemas into create_ms_schema() and create_id_schema() functions with proper documentation is a good practice. This makes the code more maintainable and easier to understand.
207-373: Comprehensive refactoring of spectrum transformation logic. The transform_mzml_spectrum method is a significant improvement over the previous implementation. It:
- Correctly handles MS level filtering
- Properly extracts peak data using get_peaks()
- Includes detailed precursor data extraction
- Handles edge cases and logs errors appropriately
The code is well-structured with helpful comments explaining the processing logic.
374-420: Enhanced batch writing with improved error handling. The _write_batch method improvements include better error logging and a more efficient approach to writing batches. The use of RecordBatch.from_pylist is more memory-efficient than converting through pandas first.
481-487: Include batch size in BatchWritingConsumer instantiation. Good improvement to pass the batch_size parameter to the BatchWritingConsumer. This allows for more flexibility in controlling memory usage during processing.
489-496: Added acquisition datetime retrieval with fallback logic. The improved acquisition datetime handling with fallback options is a good enhancement that ensures this metadata is captured when available.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/test_commands.py (1)
137-142: ⚠️ Potential issue: Fix the reference to the missing test file.
The code is still trying to read `BSA1_F1_test_ms_info.parquet`, which was previously identified as missing in the repository. This will cause test failures.

```diff
- table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
- table2 = pd.read_parquet(ms_info_path)
- table2 = table2.set_index("scan")
- table1 = table1.set_index("scan")
- assert len(table2) == len(table1)
+ table1 = pd.read_parquet(ms_info_path)
+ table1 = table1.set_index("scan")
+ # Additional assertions can be added here to validate the content of the file
+ assert len(table1) > 0
```

Or, if two different files are truly needed, ensure that
`BSA1_F1_test_ms_info.parquet` exists in the test_data directory.
🧹 Nitpick comments (5)
README.md (1)
39-48: Well-structured documentation with clear examples. The collapsible details section with information about the MS info output format is helpful. Consider adding a space after the bullet points for consistent formatting.

```diff
-... the following postfix `{file_name}_ms_info.parquet`. Here, the definition of each column and how they are estimated and used:
+... the following postfix `{file_name}_ms_info.parquet`. Here, the definition of each column and how they are estimated and used:
-... The scan accession for each MS and MS/MS signal in the mzML, depending on the manufacturer, the scan will have different formats. Example, for thermo (e.g `controllerType=0 controllerNumber=1 scan=43920`). We tried to find the definition of [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan).
+... The scan accession for each MS and MS/MS signal in the mzML, depending on the manufacturer, the scan will have different formats. Example, for thermo (e.g `controllerType=0 controllerNumber=1 scan=43920`). We tried to find the definition of [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan).
-... The MS level of the signal, 1 for MS and 2 for MS/MS.
+... The MS level of the signal, 1 for MS and 2 for MS/MS.
-... The number of peaks in the MS. Compute with pyopenms with `spectrum.get_peaks()`.
+... The number of peaks in the MS. Compute with pyopenms with `spectrum.get_peaks()`.
```
-... the following postfix `{file_name}_ms_info.parquet`. Here, the definition of each column and how they are estimated and used: +... the following postfix `{file_name}_ms_info.parquet`. Here, the definition of each column and how they are estimated and used: -... The scan accession for each MS and MS/MS signal in the mzML, depending on the manufacturer, the scan will have different formats. Example, for thermo (e.g `controllerType=0 controllerNumber=1 scan=43920`). We tried to find the definition of [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan). +... The scan accession for each MS and MS/MS signal in the mzML, depending on the manufacturer, the scan will have different formats. Example, for thermo (e.g `controllerType=0 controllerNumber=1 scan=43920`). We tried to find the definition of [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan). -... The MS level of the signal, 1 for MS and 2 for MS/MS. +... The MS level of the signal, 1 for MS and 2 for MS/MS. -... The number of peaks in the MS. Compute with pyopenms with `spectrum.get_peaks()`. +... The number of peaks in the MS. Compute with pyopenms with `spectrum.get_peaks()`.🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: -scan: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). -ms_level: The MS level of the signal, 1 for MS an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. -num_peaks: The number of peaks in the MS. Compute ...(UNLIKELY_OPENING_PUNCTUATION)
quantmsutils/mzml/mzml_statistics.py (4)
1-1: Good addition of logging configuration. Adding proper logging setup with appropriate formatting is a significant improvement for debugging and monitoring. However, consider removing the unused `List` import from typing.

```diff
-from typing import Dict, List, Optional, Set
+from typing import Dict, Optional, Set
```

Also applies to: 5-5, 32-34
387-424: Enhanced batch writing with better error handling. The batch writing method now includes better error handling and logging of exceptions, which will make debugging issues easier.
When raising exceptions in an except block, consider using `raise ... from e` to preserve the original exception context:

```diff
- logger.error(f"Error during batch writing: {e}, file path: {self.output_path}")
- raise
+ logger.error(f"Error during batch writing: {e}, file path: {self.output_path}")
+ raise Exception(f"Batch writing failed") from e
```
511-513: Improve exception handling in streaming function. Similar to the previous comment, consider preserving the exception context when raising exceptions inside exception handlers.

```diff
- except Exception as e:
-     logger.error(f"Error during mzML streaming: {e}, file path: {file_name}")
-     return None
+ except Exception as e:
+     logger.error(f"Error during mzML streaming: {e}, file path: {file_name}")
+     return None
```
- except Exception as e: - logger.error(f"Error during mzML streaming: {e}, file path: {file_name}") - return None + except Exception as e: + logger.error(f"Error during mzML streaming: {e}, file path: {file_name}") + return None
602-604: Improve CLI exception handling.When raising in an except block, use
raise ... from eto preserve the exception context:- except Exception as e: - logger.error(f"Error processing file {ms_path}: {e}") - raise click.Abort() + except Exception as e: + logger.error(f"Error processing file {ms_path}: {e}") + raise click.Abort() from e🧰 Tools
🪛 Ruff (0.8.2)
604-604: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `README.md` (1 hunks)
- `quantmsutils/mzml/mzml_statistics.py` (2 hunks)
- `tests/test_commands.py` (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
5-5: typing.List imported but unused
Remove unused import: typing.List
(F401)
604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🪛 LanguageTool
README.md
[uncategorized] ~37-~37: You might be missing the article “the” here.
Context: ...d by multiple tools and packages within quantms ecosystem for quality control, mzTab ge...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...
(UNLIKELY_OPENING_PUNCTUATION)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (8)
tests/test_commands.py (2)
1-2: Good use of the `pathlib` module and constant definition. Using Path from the pathlib module and defining a TESTS_DIR constant are good practices to make the test code more maintainable and portable.
Also applies to: 7-8
129-135: Check file path construction and existence. The file path construction using TESTS_DIR is a good improvement, but there appears to be a check for deleting the main ms_info file before running the test. Make sure this is intentional and that the file is correctly regenerated during the test.
README.md (1)
35-38: Good documentation addition for mzML statistics. The new section provides valuable context about the mzML statistics functionality. Consider adding "the" before "quantms ecosystem" for improved grammar.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~37-~37: You might be missing the article “the” here.
Context: ...d by multiple tools and packages within quantms ecosystem for quality control, mzTab ge...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
quantmsutils/mzml/mzml_statistics.py (5)
36-66: Well-structured schema definitions with clear type annotations. Creating dedicated functions for schema definitions improves code organization and reusability. The schemas are well-defined with appropriate field types and nullability.
68-141: Enhanced Bruker file processing with better error handling. The improvements to the Bruker file processing include proper error handling, acquisition datetime retrieval, and better logging. This is a significant enhancement to the code quality.
209-269: Improved spectrum transformation with proper peak extraction. The spectrum transformation method has been significantly improved with better error handling and extraction of peak data. The TODOs have been properly addressed as suggested in previous reviews.
270-274: Good handling of missing precursor intensity. Using the calculated precursor intensity when the regular intensity is missing or zero is a good fallback strategy.
330-386: Well-implemented precursor data extraction. The precursor data extraction is now more robust with proper error handling and logging of missing precursors.
Actionable comments posted: 0
🧹 Nitpick comments (6)
quantmsutils/mzml/mzml_statistics.py (6)
68-141: Consider renaming `column_exists` for clarity
Because this function returns a set of columns instead of a boolean, a name like `get_table_columns` could be more intuitive. Otherwise, the chunk-based reading logic and writing to Parquet is well-implemented.
265-296: Ensure consistent usage of `PRECURSOR_INTENSITY_WINDOW`
It's set to `None` in the MS1 branch. For consistency, consider handling it similarly in MS2 or removing it entirely if not needed.
298-316: MS1 branch appears clean
If there's overlap with the MS2 branch, consider a shared helper to reduce duplicated logic.
323-329: Regex-based scan ID extraction
If multiple matches occur, consider logging a warning or handling them specifically.
516-553: Path resolution logic
Consider refining behavior when multiple valid candidates are found. Currently, an error is raised, but you might provide a fallback or clearer warning message.
555-605: Preserve exception context
When re-raising via `raise click.Abort()`, consider using `raise click.Abort() from e` to maintain the original traceback context.

🧰 Tools
🪛 Ruff (0.8.2)
604-604: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `quantmsutils/mzml/mzml_statistics.py` (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (13)
quantmsutils/mzml/mzml_statistics.py (13)
1-5: Imports look good
No issues found with these import additions.
12-12: No issues with this import
Importing `pyopenms` is appropriate here.
32-53: Nice introduction of logging and well-structured schema
Defining the logging configuration and creating a clear PyArrow schema is a solid foundation for handling spectral data.
56-65: Schema definitions are consistent and well-defined
The ID-only schema is clearly specified and matches expected data structures.
143-161: Same rename suggestion
The function name can be misleading since it returns a set of columns rather than a boolean.
165-196: Constructor docstring is comprehensive
All parameters are thoroughly explained, making the initialization straightforward to understand.
209-239: Implementation of spectrum transformation looks solid
Skipping unsupported MS levels and proceeding with MS1 or MS2 is clear and well-structured.
240-247: Correct intensity calculations
Base peak and total intensities are properly handled, including the case with zero peaks.
249-264: Conditional logic for MS2 detection is well-handled
Calling `_extract_first_precursor_data` for MS2 spectra aligns with the design.
330-386: Precursor purity logic
Catching and logging exceptions at debug level is a good choice for diagnosing purity calculation issues.
387-425: Incremental batch writing logic
Lazy initialization of parquet writers is an effective way to manage resources.
426-442: Finalize method
Properly flushes any remaining data and closes writers. No issues found.
444-514: Efficient streaming for mzML files
The chunk-based approach is likely performant for large files. Consider adding tests for varied file sizes and missing metadata scenarios.
Actionable comments posted: 1
🧹 Nitpick comments (1)
quantmsutils/mzml/mzml_statistics.py (1)
555-604: CLI command with improved error handling, but missing exception chaining. The CLI command has been enhanced with improved documentation and error handling. However, there's a static analysis warning about exception chaining being missing.
At line 604, implement proper exception chaining to preserve the original traceback:
```diff
 except Exception as e:
     logger.error(f"Error processing file {ms_path}: {e}")
-    raise click.Abort()
+    raise click.Abort() from e
```

🧰 Tools
🪛 Ruff (0.8.2)
604-604: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `.markdownlint.json` (1 hunks)
- `quantmsutils/mzml/mzml_statistics.py` (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- .markdownlint.json
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (9)
quantmsutils/mzml/mzml_statistics.py (9)
1-33: Logging configuration added - good improvement! The addition of proper logging configuration with levels and formatting enhances the overall code quality. This is consistent with similar enhancements made in other files mentioned in the summary.
36-66: Well-structured schema definitions with proper type annotations. The new `create_ms_schema()` and `create_id_schema()` functions provide clear, reusable schema definitions with proper PyArrow types and nullability settings. This modular approach improves code organization.
68-141: Improved error handling and batch processing for Bruker files. The enhanced `batch_write_bruker_d` function now includes proper exception handling, logging, and batch processing. The code handles acquisition datetime retrieval and safely constructs SQL queries.
143-161: Well-documented helper function for column checking. The `column_exists` function provides a clean way to check for column existence in SQLite tables with proper typing and documentation.
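A plausible sketch of such a helper, using SQLite's `PRAGMA table_info`; the real quantmsutils implementation may differ:

```python
import sqlite3
from typing import Set

def get_table_columns(conn: sqlite3.Connection, table: str) -> Set[str]:
    # `table` must come from trusted code: PRAGMA arguments cannot be
    # parameterized, so this is not safe for untrusted input
    cursor = conn.execute(f"PRAGMA table_info({table})")
    return {row[1] for row in cursor.fetchall()}  # row[1] is the column name

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (Id INTEGER, Time REAL)")
columns = get_table_columns(conn, "frames")
```

Checking the returned set before interpolating column names into a SELECT is what keeps the dynamic query construction safe.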
323-329: Good extraction of scan ID from native ID. The `_extract_scan_id` helper method uses a regex pattern to extract scan IDs, improving code readability and maintainability.
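The review does not show the actual pattern, so this sketch assumes a Thermo-style native ID like the one in the README example:

```python
import re

SCAN_PATTERN = re.compile(r"scan=(\d+)")

def extract_scan_id(native_id: str) -> str:
    # Fall back to the full native ID when no scan=N token is present
    match = SCAN_PATTERN.search(native_id)
    return match.group(1) if match else native_id

scan = extract_scan_id("controllerType=0 controllerNumber=1 scan=43920")
```

Other vendors encode scans differently (e.g. `sample=... period=...`), which is why a fallback to the raw native ID is useful.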
330-386: Comprehensive precursor data extraction with error handling. The `_extract_first_precursor_data` method thoroughly extracts precursor information with proper error handling using try-except blocks and debug logging.
387-425: Improved batch writing with error handling. The `_write_batch` method now includes proper error logging with file path information, which will be valuable for troubleshooting.
444-514: Enhanced mzML streaming with proper error handling and logging. The `batch_write_mzml_streaming` function now includes comprehensive error handling, schema creation, and acquisition datetime extraction.
516-553: Improved path resolution with informative logging. The `resolve_ms_path` function now includes better logging for path resolution, which helps with debugging path-related issues.
# Conflicts:
#   quantmsutils/mzml/mzml_statistics.py
Actionable comments posted: 0
🧹 Nitpick comments (4)
quantmsutils/mzml/mzml_statistics.py (4)
29-29: Remove unused import to clean up code. The constant `PRECURSOR_INTENSITY_WINDOW` is imported here but never referenced in the file. Consider removing this import to maintain a clean codebase.

```diff
-    PRECURSOR_INTENSITY_WINDOW,
```

🧰 Tools
🪛 Ruff (0.8.2)
29-29: `quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW` imported but unused. Remove unused import: `quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW` (F401).
68-137: Secure SQL column handling and robust batch writing.
- The dynamic column selection using `column_exists` mitigates SQL injection risk.
- Streaming the data in chunks helps with memory usage and is aligned with best practices.
- Consider re-raising the original error context for improved debuggability.
```diff
 except Exception as e:
     logger.error(f"Error processing Bruker .d file: {e}")
-    raise
+    raise e from None  # or directly use `raise e` from the original context
```
271-271: Clarify logical condition. The condition `if intensity is None or intensity == 0.0 and first_precursor_calculated:` might be ambiguous. If you intended `(intensity is None or intensity == 0.0) and first_precursor_calculated`, consider adding parentheses to clarify the precedence.

```diff
- if intensity is None or intensity == 0.0 and first_precursor_calculated:
+ if (intensity is None or intensity == 0.0) and first_precursor_calculated:
```
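A quick demonstration of why the parentheses matter: `and` binds tighter than `or` in Python, so the two spellings differ whenever the first operand is truthy:

```python
intensity = None
first_precursor_calculated = False

# Parsed as: intensity is None or (intensity == 0.0 and first_precursor_calculated)
without_parens = intensity is None or intensity == 0.0 and first_precursor_calculated

# The version the review suggests
with_parens = (intensity is None or intensity == 0.0) and first_precursor_calculated
```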
603-603: Preserve original exception context. To distinguish exceptions due to downstream errors from those in the exception handler, consider chaining the exception explicitly when re-raising or aborting.

```diff
  logger.error(f"Error processing file {ms_path}: {e}")
- raise click.Abort()
+ raise click.Abort() from e
```

🧰 Tools
🪛 Ruff (0.8.2)
603-603: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `quantmsutils/mzml/mzml_statistics.py` (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused
Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
(F401)
603-603: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (3)
quantmsutils/mzml/mzml_statistics.py (3)
32-33: Good job adding a global logging configuration. The chosen logging format provides valuable info (timestamp, function name, message). Make sure to adapt the verbosity level based on the environment (development vs. production).
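A sketch of a `basicConfig` call along those lines; the module's exact format string is an assumption:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(funcName)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

def process_spectra() -> None:
    # funcName in the format string will render as "process_spectra" here
    logger.info("processing started")

process_spectra()
```

In production the level could be driven by an environment variable, e.g. `os.environ.get("LOG_LEVEL", "INFO")`, rather than hard-coded.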
36-54: Schema definition appears consistent. The fields defined in `create_ms_schema()` properly align with how they are populated downstream. This ensures consistent data typing in the Parquet output.
56-66: ID-only schema is well-defined. Each field (scan, MS level, mz & intensity arrays) has an appropriate type. This separates ID-only data from the main data effectively.
Actionable comments posted: 0
♻️ Duplicate comments (3)
tests/test_commands.py (3)
16-16: ⚠️ Potential issue: Fix the missing test file reference that's causing pipeline failures.
The file `BSA1_F1_test_ms_info.parquet` doesn't exist in the test_data directory according to the pipeline errors. This is causing the test at line 129 to fail. You have two options:
- Create this reference file in the test_data directory
- Modify the test to use an existing file instead
94-102: ⚠️ Potential issue: Fix the failing psmconvert command
The CI pipeline shows this test is failing with exit code 2 instead of the expected 0. The previous reviewer identified this may be due to a missing dependency (`numpy`). Ensure that:
- All dependencies are correctly installed
- The command arguments are correct and valid
- The implementation of the psmconvert command works with these inputs
🧰 Tools
🪛 GitHub Actions: Python application
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' did not execute successfully.
🪛 GitHub Actions: Python package
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.
128-130: ⚠️ Potential issue: Fix reference data handling
The test is trying to compare with a reference file (`BSA1_F1_test_ms_info.parquet`) that doesn't exist, causing pipeline failures. Either:
- Add the missing reference file to the repository
- Generate a reference file as part of the test setup
- Use a different approach that doesn't require a preexisting reference file
🧰 Tools
🪛 GitHub Actions: Python application
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
🪛 GitHub Actions: Python package
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
🧹 Nitpick comments (1)
tests/test_commands.py (1)
67-68: Consider adding output validation. The test verifies the command executes successfully, but doesn't validate the content of the output files. Consider adding assertions to verify specific output characteristics.

```diff
  result = run_cli_command("diann2mztab", args)
  assert result.exit_code == 0
- # Additional assertions could check for expected output files
+ # Verify output file exists and has expected content
+ output_file = DIANN_TEST_DIR / "output.mztab"
+ assert output_file.exists(), "Output file was not created"
+ # Add assertions to check file content
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `.gitignore` (1 hunks)
- `tests/test_commands.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .gitignore
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' did not execute successfully.
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (6)
tests/test_commands.py (6)
8-18: Good use of constants for better maintainability! Defining path constants at the top of the file is an excellent practice that improves readability and makes future path updates easier.
21-28: Well-designed helper function!
Creating a reusable `run_cli_command` function reduces duplication across the test suite and makes the code more maintainable.
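The shared-helper pattern being praised can be sketched as follows. Note that `fake_cli` and its `Result` class are illustrative stand-ins: the real helper wraps `click.testing.CliRunner` and the project's `cli` group, which this self-contained sketch does not depend on.

```python
# Sketch of a reusable CLI-invocation helper for tests.
# `fake_cli` is a stand-in dispatcher playing the role of the real click CLI.

class Result:
    """Minimal stand-in for click.testing.Result."""
    def __init__(self, exit_code, output):
        self.exit_code = exit_code
        self.output = output

def fake_cli(command, args):
    # Stand-in for the real `cli` group: "--help" always succeeds,
    # known subcommands exit 0, unknown ones exit 2 (click's usage-error code).
    if "--help" in args:
        return Result(0, f"Usage: {command} [OPTIONS]")
    return Result(0 if command in {"mzmlstats", "psmconvert"} else 2, "")

def run_cli_command(command, args=None):
    """Single entry point all tests share: build the invocation once,
    return a Result the test can assert on."""
    return fake_cli(command, args or [])

result = run_cli_command("mzmlstats", ["--help"])
assert result.exit_code == 0
print(result.output)  # → Usage: mzmlstats [OPTIONS]
```

Every test then reduces to one call plus assertions, which is what eliminates the duplication the review points out.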
31-50: Good use of parameterization!
Consolidating CLI help message tests using pytest's parametrize is a great improvement that follows testing best practices.
🧰 Tools
🪛 GitHub Actions: Python application
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
🪛 GitHub Actions: Python package
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
108-117: Good test fixture setup!
Using a pytest fixture for setup and teardown is an excellent practice. This ensures tests start with a clean state and don't interfere with each other.
146-149: Good validation of output structure!
Checking for the presence of expected columns is a good validation approach.
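That style of schema check can be sketched without pandas; the expected column set and the sample row below are illustrative stand-ins for the real Parquet output.

```python
# Sketch of column-presence validation on a tabular result.
# EXPECTED_COLUMNS mirrors the kind of schema the test asserts on;
# the row dict is a made-up stand-in for the real Parquet data.
EXPECTED_COLUMNS = {"scan", "ms_level", "num_peaks", "rt"}

rows = [
    {"scan": "controllerType=0 controllerNumber=1 scan=1",
     "ms_level": 1, "num_peaks": 520, "rt": 12.3},
]

actual_columns = set(rows[0].keys())
missing = EXPECTED_COLUMNS - actual_columns
assert not missing, f"Expected column(s) {sorted(missing)} missing from output"
print("all expected columns present")
```

Reporting the whole set of missing columns at once (rather than asserting on each name separately) gives a more useful failure message when the schema drifts.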
150-160: Well-structured skipped test
Marking tests that require large files as skipped by default is a good practice. This allows local testing without slowing down CI pipelines.
Actionable comments posted: 3
🧹 Nitpick comments (1)
quantmsutils/mzml/mzml_statistics.py (1)
613-615: Consider preserving the original exception cause.
Raising `click.Abort()` directly loses the traceback context. Using `raise click.Abort() from e` provides better debugging information, though it’s optional.
🧰 Tools
🪛 Ruff (0.8.2)
615-615: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
(B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`tests/test_data/tmt_erwinia_1ulsike_top10hcd_isol2_45stepped_60min_01_sage_ms2rescore.idxml.feature_names.tsv` is excluded by `!**/*.tsv`
📒 Files selected for processing (3)
- `quantmsutils/mzml/mzml_statistics.py` (1 hunks)
- `quantmsutils/psm/psm_conversion.py` (5 hunks)
- `tests/test_commands.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- quantmsutils/psm/psm_conversion.py
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 102-102: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.
[error] 135-135: AssertionError: Expected column scan missing from output in test_mzml_statistics_bsa.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 102-102: AssertionError: assert 2 == 0. The command 'psmconvert' returned an exit code of 2.
[error] 135-135: AssertionError: Expected column scan missing from output.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused
Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
(F401)
615-615: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (6)
tests/test_commands.py (6)
9-17: Good use of constants for maintainability.
Defining all file and directory constants at the top promotes clarity and eases maintenance.
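The constants pattern being praised might look like the following sketch; the directory layout and file names are illustrative (they echo the file names seen in the pipeline logs, not the actual module contents).

```python
# Centralized path constants: one place to update if the test-data
# layout ever changes. Names and layout here are illustrative.
from pathlib import Path

TEST_DATA_DIR = Path("tests") / "test_data"
BSA_MZML_FILE = TEST_DATA_DIR / "BSA1_F1_test.mzML"
BSA_MS_INFO_FILE = TEST_DATA_DIR / "BSA1_F1_test_ms_info.parquet"

print(BSA_MS_INFO_FILE.name)  # → BSA1_F1_test_ms_info.parquet
```

Building derived paths with `/` on `Path` objects keeps the constants cross-platform, which is one reason the review highlights `pathlib` over raw strings.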
20-28: Helper function is well-structured.
Centralizing CLI invocation logic in `run_cli_command` helps avoid duplication and simplifies testing.
30-49: CLI help message tests look good.
Parameterizing the test over multiple commands is a neat, scalable approach.
🧰 Tools
🪛 GitHub Actions: Python application
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
🪛 GitHub Actions: Python package
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
51-68: DIA-NN to mzTab test is clear.
The test arguments and usage align well with the documented functionality.
70-88: Samplesheet validation tests are properly organized.
The separation of concerns and explicit checks make these tests easy to extend.
120-128: Solid baseline for mzML statistics testing.
The test covers file creation checks and ensures the broad functionality is correct.
```python
result = run_cli_command("psmconvert", args)
assert result.exit_code == 0
```
Investigate failing 'psmconvert' command.
The pipeline indicates the test returns exit code 2 instead of 0. Confirm that the input files exist and all dependencies (e.g., numpy) are installed. Also ensure the command invocation is correct.
🧰 Tools
🪛 GitHub Actions: Python application
[error] 102-102: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.
🪛 GitHub Actions: Python package
[error] 102-102: AssertionError: assert 2 == 0. The command 'psmconvert' returned an exit code of 2.
```python
    SUMMED_PEAK_INTENSITY,
    PRECURSOR_RT,
    PRECURSOR_TOTAL_INTENSITY,
    PRECURSOR_INTENSITY_WINDOW,
```
🛠️ Refactor suggestion
Remove unused import.
PRECURSOR_INTENSITY_WINDOW is never referenced, so it should be removed to avoid clutter.
```diff
-    PRECURSOR_INTENSITY_WINDOW,
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    PRECURSOR_INTENSITY_WINDOW,
```
🧰 Tools
🪛 Ruff (0.8.2)
29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused
Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
(F401)
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/test_commands.py (2)
20-27: Helper function reduces code duplication
The `run_cli_command` function is a good abstraction that eliminates repetitive code. It handles both help message testing and command execution with arguments.
Consider adding a return type annotation to make the function's interface clearer:

```diff
-def run_cli_command(command, args=None):
+def run_cli_command(command, args=None) -> click.testing.Result:
```
138-147: Good use of skip marker for local-only tests
Using `pytest.mark.skip` for tests that should only run locally with larger files is a good practice. This prevents these tests from slowing down CI pipelines while still keeping them in the codebase for local verification.
Consider adding a more specific reason for skipping:

```diff
-@pytest.mark.skip("Test to be run locally, with big files")
+@pytest.mark.skip(reason="Uses large files that would slow down CI pipelines and consume excessive resources")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `tests/test_commands.py` (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (5)
tests/test_commands.py (5)
9-18: Well-organized constants improve maintainability
I like how you've defined all path and file constants at the top of the file. This improves maintainability and makes it easier to update paths if needed. The constants follow a clear naming convention, making their purpose obvious.
As a minor suggestion, consider adding a docstring at the top of the file to explain its purpose and the overall testing strategy.
30-49: Excellent use of parametrized testing
Converting individual test methods into a single parametrized test is a best practice that reduces code duplication and makes adding new command tests easier.
The docstrings are clear and the assertions are appropriate for verifying help messages.
🧰 Tools
🪛 GitHub Actions: Python application
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
🪛 GitHub Actions: Python package
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
106-119: Good test setup with fixture for cleanup
Using a pytest fixture with `autouse=True` for setup and teardown is a clean approach. This ensures each test runs in a clean environment by removing output files before the test and allowing for cleanup after.
I particularly like how you're explicitly checking for and removing existing files to prevent test interference.
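The cleanup half of such a fixture boils down to the following pattern; the helper name and the parquet file name are illustrative, and a temporary directory stands in for the real test-data folder so the sketch is self-contained.

```python
# Sketch of the setup step an autouse cleanup fixture performs:
# remove any stale output files so each test starts from a clean state.
import tempfile
from pathlib import Path

def clean_output_files(paths):
    """Delete each path if present; absent files are silently skipped."""
    for p in paths:
        Path(p).unlink(missing_ok=True)  # no error if the file is absent

tmp = Path(tempfile.mkdtemp())
stale = tmp / "BSA1_F1_test_ms_info.parquet"
stale.write_bytes(b"old run")  # simulate leftovers from a previous run

clean_output_files([stale, tmp / "never_created.parquet"])
print(stale.exists())  # → False
```

In a pytest fixture this function would run before `yield` (setup) and again after it (teardown), so a failing test cannot leave artifacts that skew the next run.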
1-147: Details
❓ Verification inconclusive
Address deprecation warnings in the pipeline
The pipeline shows deprecation warnings that should be addressed:
FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.While these don't appear directly in this file, they're triggered during test execution and should be fixed in the relevant implementation files.
🏁 Script executed:
```bash
#!/bin/bash
# Find where the deprecated methods are used
echo "Searching for deprecated DataFrame.applymap usage..."
rg -l "\.applymap\(" --type py

echo "Searching for deprecated is_categorical_dtype usage..."
rg -l "is_categorical_dtype" --type py
```
Length of output: 305
Deprecation Warnings: Manual Verification Recommended
Our search for direct usages of `DataFrame.applymap` and `is_categorical_dtype` in the repository (including in key modules such as `quantmsutils/quantmsutilsc.py`) did not yield any matches. Given that these warnings are observed during test runs, it's possible they stem from indirect calls (e.g., through dependencies or deeper library internals).
- Please manually verify if these deprecated calls are coming from our own code or from an external dependency.
- If found within our implementations, update:
  - `DataFrame.applymap` to `DataFrame.map`
  - The usage of `is_categorical_dtype` to an appropriate type-check using `isinstance(dtype, pd.CategoricalDtype)`
- Focus especially on the implementation files that transform or manipulate data, such as in `quantmsutils/quantmsutilsc.py`.
🪛 GitHub Actions: Python application
[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
120-136:Details
❓ Verification inconclusive
Test has comprehensive assertions but had past failures
The test has solid assertions checking both the exit code and the existence of expected output files, followed by content validation.
Past review comments indicated an issue with the 'scan' column missing from output. The current code checks for this (line 135), which should prevent that issue from recurring.
🏁 Script executed:
```bash
#!/bin/bash
# Check if TMT files exist
echo "Checking test data files..."
ls -la ${TEST_DATA_DIR}/TMT* 2>/dev/null || echo "TMT files not found"

# Check whether the test can run without errors
echo "Running mzml statistics test..."
python -c "
import os
import sys
from pathlib import Path

import pandas as pd
from click.testing import CliRunner

from quantmsutils.quantmsutilsc import cli

TMT_MZML_FILE = '${TMT_MZML_FILE}'
TMT_MS_INFO_FILE = '${TMT_MS_INFO_FILE}'

# Remove output files if they exist
if os.path.exists(TMT_MS_INFO_FILE):
    os.unlink(TMT_MS_INFO_FILE)

runner = CliRunner()
result = runner.invoke(cli, ['mzmlstats', '--ms2_file', '--ms_path', TMT_MZML_FILE])
print(f'Exit code: {result.exit_code}')
if result.exception:
    print(f'Exception: {result.exception}')
    print(f'Traceback: {result.exc_info}')

if os.path.exists(TMT_MS_INFO_FILE):
    print('Output file created successfully')
    df = pd.read_parquet(TMT_MS_INFO_FILE)
    print(f'Columns: {df.columns.tolist()}')
    print(f'Row count: {len(df)}')
else:
    print('Output file not created')
"
```
Length of output: 1210
Action Required: Verify Test Environment Dependencies
The updated test implementation now robustly validates the output—including checking for the existence of the “scan” column—to address the past issue. However, manual verification is necessary because our test script execution revealed an environment misconfiguration: Pandas was missing from the testing environment (i.e. `ModuleNotFoundError: No module named 'pandas'`). Additionally, the test data files (TMT files) were not found during the check.
- Please ensure that all required dependencies (e.g., Pandas) are installed in the testing environment.
- Verify that the test data file paths (e.g., `${TEST_DATA_DIR}/TMT*`) are correctly configured and accessible.

Once these issues are resolved and the environment is properly set up, rerun the tests to confirm that the assertions (including the check for the “scan” column) pass as expected.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`tests/test_data/STATIC_TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01_ms2_info.parquet` is excluded by `!**/*.parquet`
📒 Files selected for processing (1)
- `tests/test_commands.py` (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (7)
tests/test_commands.py (7)
9-18: Good use of constants for improved maintainability.
Defining constants at the top of the file makes the code more maintainable and reduces the risk of path-related errors. Using the `Path` object from `pathlib` is also a good practice for cross-platform compatibility.
21-28: Well-designed helper function for CLI command execution.
The `run_cli_command` function nicely abstracts the common pattern of creating a CLI runner and invoking commands, making the test code more concise and following the DRY principle.
31-50: Excellent use of pytest parameterization for testing help messages.
Converting individual test functions into a parameterized test reduces code duplication and makes it easier to add new commands to test in the future.
107-120: Good use of pytest fixture for test setup and teardown.
Using a fixture with `autouse=True` ensures that test files are properly cleaned up before and after tests run, preventing interference between test runs.
138-148: Good practice using pytest.mark.skip for resource-intensive tests.
Marking tests that require large files or significant resources with `pytest.mark.skip` is a good practice. It allows other tests to run quickly while still keeping the test code available for local debugging.
1-148: Overall refactoring improves test organization and maintainability.
The refactoring of this test file follows best practices by:
- Organizing related tests into classes
- Using fixtures for setup/teardown
- Applying parameterization to reduce duplication
- Providing clear docstrings for tests
- Using consistent naming patterns
This structure makes the tests more maintainable and easier to extend in the future.
134-137: Details
❓ Verification inconclusive
Ensure 'scan' column is correctly generated in the output.
Previous pipeline failures indicated that the 'scan' column was missing from the output. This test now checks for its presence, but verify that the implementation correctly generates this column.
🏁 Script executed:
```bash
#!/bin/bash
# Run the mzmlstats command to generate the output file
python -m quantmsutils.quantmsutilsc mzmlstats --ms2_file --ms_path "${TMT_MZML_FILE}"

# Check if the output file exists and contains the expected columns
echo "Checking output file for 'scan' column:"
python -c "
import pandas as pd

file_path = '${TMT_MS_INFO_FILE}'
try:
    df = pd.read_parquet(file_path)
    if 'scan' in df.columns:
        print(f\"✓ 'scan' column is present in {file_path}\")
        print(f\"Column values (first 5 rows): {df['scan'].head().tolist()}\")
    else:
        print(f\"✗ 'scan' column is missing from {file_path}\")
        print(f\"Available columns: {df.columns.tolist()}\")
except Exception as e:
    print(f\"Error reading file: {e}\")
"
```
Length of output: 1197
ACTION REQUIRED: Ensure dependency installation and verify 'scan' column generation
The execution failed due to missing modules (`numpy` and `pandas`), so we couldn’t confirm whether the output contains the `'scan'` column. Please make sure all required dependencies are installed in the test environment and re-run the pipeline. Once dependencies are in place, manually verify that the output file (referenced by `${TMT_MS_INFO_FILE}`) correctly includes the `'scan'` column as intended.
```python
class TestPSMConversion:
    """
    Test class for PSM conversion commands, it takes an ms2 file in parquet and an idXML file
    and converts it to a parquet file with PSM information.
    """

    def test_convert_psm(self):
        """Test converting PSM data"""
        args = [
            "--idxml", str(TMT_IDXML_FILE),
            "--ms2_file", str(TMT_STATIC_MS2_INFO_FILE),
        ]
        result = run_cli_command("psmconvert", args)
        assert result.exit_code == 0
        # Could add assertions to check for expected output files
```
💡 Verification agent
🧩 Analysis chain
Verify that dependencies for psmconvert are correctly installed.
Previous pipeline failures indicated issues with missing dependencies (e.g., numpy) for the psmconvert command. While the test structure has been improved, ensure that all required dependencies are properly installed in the testing environment.
🏁 Script executed:
```bash
#!/bin/bash
# Check if numpy is installed and its version
echo "Checking for numpy installation:"
pip list | grep numpy || echo "numpy not found!"

# Check if the test files exist
echo "Checking if test files exist:"
ls -la ${TMT_IDXML_FILE} ${TMT_STATIC_MS2_INFO_FILE} 2>/dev/null || echo "Test files not found"

# Try importing numpy in a Python script
echo "Attempting to import numpy:"
python -c "import numpy; print(f'Numpy version: {numpy.__version__}')" || echo "Failed to import numpy"
```
Length of output: 1492
Missing Dependency: Ensure numpy is installed in the testing environment
The latest test run confirms that numpy is not installed (output shows “numpy not found!” and a failure when attempting to import numpy). Please update the dependency configuration (e.g., add numpy to requirements.txt or environment.yml) so that the psmconvert command has all necessary packages during testing.
Actionable comments posted: 0
🧹 Nitpick comments (6)
README.md (6)
35-36: Header Formatting: Standardize "mzML Statistics" Headline.
Consider capitalizing and formatting the header as "mzML Statistics" (instead of "mzml statistics") to match the conventional capitalization of the mzML format for clarity and consistency.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-37: Typographical Error in Section Description.
There is a typo in the phrase "gow they are computed." It should be corrected to "how they are computed."
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
39-43: Details Section Clarity.
The details section is well-structured and clearly explains the purpose and output of the `mzmlstats` functionality. For further clarity, consider adding a brief example demonstrating the output filename format.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315(MD013, line-length)
44-49: Field Descriptions and Markdown Formatting.
The list describing each field is comprehensive and informative. However, please review the punctuation and spacing to address minor markdownlint warnings (e.g., trailing spaces and line-length issues) as mentioned in the static analysis hints. Paying attention to these details will further improve readability.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: -scan: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). -ms_level: The MS level of the signal, 1 for MS an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. -num_peaks: The number of peaks in the MS. Compute ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -base_peak_intensity`: The max intensity in the spectrum (MS o...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). -summed_peak_intensities: The sum of all intensities in the spect...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). -rt: The retention time of the spectrum, cap...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335(MD013, line-length)
51-57: MS/MS Signal Field Details Consistency.
The additional fields for MS/MS signals are clearly described. Ensure that all field names (e.g., `precursor_charge`, `precursor_mz`, `precursor_rt`, and `precursor_total_intensity`) are formatted uniformly with backticks and consistent punctuation.
🪛 LanguageTool
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: -precursor_charge: The charge of the precursor ion, if the...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -precursor_mz`: The m/z of the precursor ion, if the si...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -precursor_intensity`: The intensity of the precursor ion, if ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. -precursor_rt: The retention time of the precursor ion...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. -precursor_total_intensity: The total intensity of the precursor io...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
53-53: Line length
Expected: 80; Actual: 148(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112(MD013, line-length)
58-69: Note Block and Code Fencing Adjustments.
The note section effectively explains how precursor-related information is computed. Consider the following improvements:
- Ensure that the fenced code block is surrounded by blank lines (per MD031 guidelines).
- Correct the typo "pyopnems" to "pyopenms" to accurately reference the library.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
60-60: Line length
Expected: 80; Actual: 233(MD013, line-length)
61-61: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
64-64: Line length
Expected: 80; Actual: 97(MD013, line-length)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `README.md` (1 hunks)
- `quantmsutils/utils/constants.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- quantmsutils/utils/constants.py
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). - base_peak_intensity`: The max intensity in the spectrum (MS o...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities: The sum of all intensities in the spect...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt: The retention time of the spectrum, cap...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge: The charge of the precursor ion, if the...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). - precursor_mz`: The m/z of the precursor ion, if the si...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). - precursor_intensity`: The intensity of the precursor ion, if ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt: The retention time of the precursor ion...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity: The total intensity of the precursor io...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
README.md
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272
(MD013, line-length)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315
(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335
(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148
(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137
(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268
(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98
(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112
(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
60-60: Line length
Expected: 80; Actual: 233
(MD013, line-length)
61-61: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
64-64: Line length
Expected: 80; Actual: 97
(MD013, line-length)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (1)
README.md (1)
71-73: Contributions and Issues Section.
The "Contributions and issues" section is clear and directs users appropriately to the GitHub repository for further engagement. No changes are needed in this part.
🧰 Tools
🪛 LanguageTool
[typographical] ~73-~73: A comma between “please” and the verb is normally not needed. If you want to signal a pause, consider using an em dash instead.
Context: ... Contributions and issues are welcome. Please, open an issue in the [GitHub repository](htt...(PLEASE_UNNECESSARY_COMMA)
🪛 markdownlint-cli2 (0.17.2)
73-73: Line length
Expected: 80; Actual: 196(MD013, line-length)
Actionable comments posted: 0
🧹 Nitpick comments (3)
README.md (3)
35-36: Header Capitalization & Consistency
The new header "`#### mzml statistics`" is a clear indicator for this section. For consistency and clarity—especially given standard usage in proteomics—it would be better to capitalize it as "`#### mzML Statistics`".
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-38: Typo & Clarity in Descriptive TextThe introductory text contains a minor typographical error: "
gow they are computed" should be corrected to "how they are computed". Additionally, you might consider rephrasing the sentence for clarity. For example, starting with "The quantms-utils package provides multiple scripts…" could improve the flow.Proposed Diff:
- Here are some details about the formats, the fields they contain and gow they are computed. + Here are some details about the formats, the fields they contain and how they are computed.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
35-38: Markdown Formatting Improvements
Static analysis reports indicate minor markdown formatting issues in these new lines (e.g., trailing spaces on line 35 and an excessive line length on line 37). Addressing these by removing extra whitespace and breaking long lines into shorter, more readable segments will further improve the documentation’s clarity.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
README.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). - base_peak_intensity`: The max intensity in the spectrum (MS o...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities: The sum of all intensities in the spect...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt: The retention time of the spectrum, cap...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge: The charge of the precursor ion, if the...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). - precursor_mz`: The m/z of the precursor ion, if the si...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). - precursor_intensity`: The intensity of the precursor ion, if ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt: The retention time of the precursor ion...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Possible missing comma found.
Context: ...ntion time of the precursor ion, if the signal is MS/MS. See note below. - `precursor_...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity: The total intensity of the precursor io...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~60-~60: Possible missing preposition found.
Context: ...ng the first precursor in the spectrum. The following columns intensity (if not a...
(AI_HYDRA_LEO_MISSING_IN)
[uncategorized] ~76-~76: Loose punctuation mark.
Context: ...w they are estimated and used: - scan: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). - ms_level: The MS level of the signal, all of them...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Possible missing comma found.
Context: ...el: The MS level of the signal, all of them will be 2. - mz_array`: The m/z array ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...nal, all of them will be 2. - mz_array: The m/z array of the peaks in the MS/MS...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...ectrum.get_peaks(). - intensity_array`: The intensity array of the peaks in the...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
README.md
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272
(MD013, line-length)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315
(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335
(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148
(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137
(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268
(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98
(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112
(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
60-60: Line length
Expected: 80; Actual: 233
(MD013, line-length)
61-61: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
64-64: Line length
Expected: 80; Actual: 97
(MD013, line-length)
74-74: Line length
Expected: 80; Actual: 313
(MD013, line-length)
76-76: Line length
Expected: 80; Actual: 334
(MD013, line-length)
78-78: Line length
Expected: 80; Actual: 140
(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 153
(MD013, line-length)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
Actionable comments posted: 0
🧹 Nitpick comments (3)
README.md (1)
39-69: Comprehensive documentation of MS info outputs

The detailed explanation of the MS info output format, including column definitions and their computation methods, is excellent. This level of detail will help users understand and properly utilize the data.
However, there is a minor typo in line 61: "pyopnems" should be "pyopenms".
```diff
-precursor_spectrum = mzml_exp.getSpectrum(precursor_spectrum_index)
-precursor_rt = precursor_spectrum.getRT()
-purity = oms.PrecursorPurity().computePrecursorPurity(precursor_spectrum, precursor, 100, True)
-precursor_intensity = purity.target_intensity
-total_intensity = purity.total_intensity
+precursor_spectrum = mzml_exp.getSpectrum(precursor_spectrum_index)
+precursor_rt = precursor_spectrum.getRT()
+purity = oms.PrecursorPurity().computePrecursorPurity(precursor_spectrum, precursor, 100, True)
+precursor_intensity = purity.target_intensity
+total_intensity = purity.total_intensity
```
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: -scan: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). -ms_level: The MS level of the signal, 1 for MS an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. -num_peaks: The number of peaks in the MS. Compute ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -base_peak_intensity`: The max intensity in the spectrum (MS o...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). -summed_peak_intensities: The sum of all intensities in the spect...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). -rt: The retention time of the spectrum, cap...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: -precursor_charge: The charge of the precursor ion, if the...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -precursor_mz`: The m/z of the precursor ion, if the si...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms withs...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -precursor_intensity`: The intensity of the precursor ion, if ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. -precursor_rt: The retention time of the precursor ion...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. -precursor_total_intensity: The total intensity of the precursor io...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
60-60: Line length
Expected: 80; Actual: 233(MD013, line-length)
61-61: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
64-64: Line length
Expected: 80; Actual: 97(MD013, line-length)
quantmsutils/mzml/mzml_statistics.py (2)
209-316: Comprehensive spectrum transformation with better error handling

The `transform_mzml_spectrum` method provides a more structured approach to processing spectra, with clear separation of MS1 and MS2 handling. The code now properly handles cases where peaks or precursor information might be missing.

However, there's one improvement opportunity: the code could benefit from better-structured exception handling.
Consider adding a try-except block around the main transformation logic to catch and log any unexpected errors that might occur during spectrum processing:
```diff
 def transform_mzml_spectrum(
     self,
     i: int,
     spectrum: oms.MSSpectrum,
     mzml_exp: oms.MSExperiment,
 ) -> None:
+    try:
         # Extract peaks data
         mz_array, intensity_array = spectrum.get_peaks()
         peak_count = len(mz_array)
         scan_id = self._extract_scan_id(spectrum)

         # ... remaining implementation ...

         self.batch_data.append(row_data)

         # Write batch when it reaches specified size
         if len(self.batch_data) >= self.batch_size:
             self._write_batch()
+    except Exception as e:
+        logger.error(f"Error processing spectrum {i}: {e}")
+        # Continue processing other spectra
```
566-615: Enhanced CLI command with better options and error handling

The CLI command now includes better documentation, support for MS2 file generation, and improved error handling. However, there's an issue with the exception handling in line 615.
When catching and re-raising exceptions, it's better to use `raise ... from e` to preserve the original traceback:

```diff
 except Exception as e:
     logger.error(f"Error processing file {ms_path}: {e}")
-    raise click.Abort()
+    raise click.Abort() from e
```
🪛 Ruff (0.8.2)
615-615: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
README.md(1 hunks)quantmsutils/mzml/mzml_statistics.py(1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). - base_peak_intensity`: The max intensity in the spectrum (MS o...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities: The sum of all intensities in the spect...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt: The retention time of the spectrum, cap...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge: The charge of the precursor ion, if the...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). - precursor_mz`: The m/z of the precursor ion, if the si...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms with s...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). - precursor_intensity`: The intensity of the precursor ion, if ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt: The retention time of the precursor ion...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity: The total intensity of the precursor io...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~74-~74: Possible missing comma found.
Context: ...o produce a file containing all the MS2 spectra including the intesities and masses of ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~76-~76: Loose punctuation mark.
Context: ...w they are estimated and used: - scan: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). - ms_level: The MS level of the signal, all of them...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...nal, all of them will be 2. - mz_array: The m/z array of the peaks in the MS/MS...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...ectrum.get_peaks(). - intensity_array`: The intensity array of the peaks in the...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
README.md
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272
(MD013, line-length)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315
(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335
(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148
(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137
(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268
(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98
(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112
(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
60-60: Line length
Expected: 80; Actual: 233
(MD013, line-length)
61-61: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
64-64: Line length
Expected: 80; Actual: 97
(MD013, line-length)
74-74: Line length
Expected: 80; Actual: 331
(MD013, line-length)
76-76: Line length
Expected: 80; Actual: 334
(MD013, line-length)
78-78: Line length
Expected: 80; Actual: 140
(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 153
(MD013, line-length)
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused
Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
(F401)
615-615: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (14)
README.md (2)
35-38: Well-structured introduction to mzML statistics functionality

The new section provides a clear introduction to the mzML statistics functionality, highlighting its purpose within the quantms ecosystem. This documentation will be valuable for users.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
71-81: Clear explanation of MS2 file format

The documentation for the MS2 output format is concise and clearly explains the contents of the MS2 info parquet file. This will be helpful for users working with MS2 data.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~74-~74: Possible missing comma found.
Context: ...o produce a file containing all the MS2 spectra including the intesities and masses of ...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~76-~76: Loose punctuation mark.
Context: ...w they are estimated and used: -scan: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). -ms_level: The MS level of the signal, all of them...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...nal, all of them will be 2. -mz_array: The m/z array of the peaks in the MS/MS...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...ectrum.get_peaks(). -intensity_array`: The intensity array of the peaks in the...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
74-74: Line length
Expected: 80; Actual: 331(MD013, line-length)
76-76: Line length
Expected: 80; Actual: 334(MD013, line-length)
78-78: Line length
Expected: 80; Actual: 140(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 153(MD013, line-length)
quantmsutils/mzml/mzml_statistics.py (12)
1-33: Improved logging implementation

Adding proper logging with configurable levels is a good improvement that will help with troubleshooting and monitoring. The module-level logger with a consistent format will make it easier to track the execution flow.
🧰 Tools
🪛 Ruff (0.8.2)
29-29:
quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOWimported but unusedRemove unused import:
quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW(F401)
29-29: Remove unused import

The `PRECURSOR_INTENSITY_WINDOW` constant is imported but never used in the code.

```diff
-    PRECURSOR_INTENSITY_WINDOW,
```

🧰 Tools
🪛 Ruff (0.8.2)
29-29:
quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOWimported but unusedRemove unused import:
quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW(F401)
36-66: Well-structured schema definitions

The addition of dedicated functions for schema creation improves code organization and maintainability. These schemas clearly define the structure of the output files as documented in the README.
68-141: Enhanced Bruker file processing with better error handling

The updated `batch_write_bruker_d` function now includes proper logging, batch processing, and robust error handling. The function ensures that all schema fields are properly populated, even when they don't exist in the source data.
143-161: Useful helper function for SQLite column checking

The `column_exists` function is a good addition that makes the code more robust by allowing it to adapt to different SQLite schema structures.
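A minimal sketch of such a helper, assuming the common `PRAGMA table_info` approach (the project's actual signature and behavior may differ):

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info returns one row per column; the column name is at
    # index 1 of each row. (Hypothetical helper for illustration.)
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (id INTEGER, rt REAL)")
print(column_exists(conn, "frames", "rt"))        # True
print(column_exists(conn, "frames", "pressure"))  # False
```

This pattern lets the reader adapt queries to Bruker analysis.tdf files whose schema varies across instrument software versions.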
169-202: Improved constructor with better parameter documentation

The updated constructor now includes support for MS2 file generation and has comprehensive parameter documentation. This makes the code more maintainable and easier to understand.
322-340: Improved scan ID extraction

The `_extract_scan_id` method is a good addition that handles different formats of scan IDs in a consistent way. This will make the output files more consistent and easier to work with.
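A sketch of this kind of scan-ID normalization, assuming Thermo-style native IDs such as `controllerType=0 controllerNumber=1 scan=4125` (the project's actual parsing rules may cover more vendor formats):

```python
import re


def extract_scan_id(native_id: str) -> str:
    # Pull the numeric scan accession out of a "scan=<n>" token; fall back
    # to the raw native ID when no such token is present.
    # (Illustrative helper, not the module's real implementation.)
    match = re.search(r"scan=(\d+)", native_id)
    return match.group(1) if match else native_id


print(extract_scan_id("controllerType=0 controllerNumber=1 scan=4125"))  # 4125
print(extract_scan_id("sample=1 period=1 cycle=2 experiment=3"))  # full ID returned
```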
341-397: Well-structured precursor data extraction

The `_extract_first_precursor_data` method properly handles precursor information extraction, including fallbacks for when precursor information is not explicitly provided. The code follows good practices by including comprehensive error handling and logging.
398-436: Efficient batch writing with error handling

The `_write_batch` method now includes proper error handling and logging, making it more robust. The method efficiently writes batches of data to the output files.
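The accumulate-then-flush pattern behind such a method can be sketched generically (illustrative only; the real `_write_batch` writes to Parquet through its configured writers):

```python
from typing import Callable, List


class BatchBuffer:
    """Sketch of the batch-write pattern: rows accumulate in memory and are
    flushed once batch_size is reached, with a final flush at finalization."""

    def __init__(self, batch_size: int, write: Callable[[List[dict]], None]):
        self.batch_size = batch_size
        self.write = write
        self.rows: List[dict] = []

    def append(self, row: dict) -> None:
        self.rows.append(row)
        if len(self.rows) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            self.write(self.rows)
            self.rows = []


batches: List[int] = []
buf = BatchBuffer(batch_size=3, write=lambda rows: batches.append(len(rows)))
for i in range(7):
    buf.append({"scan": i})
buf.flush()  # finalize: write the remaining partial batch
print(batches)  # [3, 3, 1]
```

The final `flush()` mirrors why a `finalize` step matters: without it the trailing partial batch would never reach disk.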
437-453: Improved finalization process

The `finalize` method now properly handles resource cleanup, ensuring that all writers are closed and references are cleared, which helps prevent resource leaks.
455-524: Enhanced mzML streaming with comprehensive error handling

The `batch_write_mzml_streaming` function now includes better error handling, acquisition date extraction, and supports generating MS2 files. The function is well-structured and follows good practices for logging and error reporting.
527-564: Improved file path resolution

The `resolve_ms_path` function now handles different file extensions and provides better logging for path resolution. This makes the tool more user-friendly by automatically finding the correct file even if the extension is not exactly as specified.
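The extension-tolerant lookup described can be sketched as follows (a hypothetical helper; the real `resolve_ms_path` also logs its resolution steps and may support additional formats):

```python
from pathlib import Path


def resolve_ms_path(path_str: str, extensions=(".mzML", ".mzml", ".d")) -> Path:
    # Use the path as given when it exists; otherwise try sibling files that
    # share the stem but carry one of the known MS extensions.
    path = Path(path_str)
    if path.exists():
        return path
    for ext in extensions:
        candidate = path.with_suffix(ext)
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"No MS file found for {path_str}")
```

For example, asking for `run1.raw` would still resolve to an existing `run1.mzML` in the same directory.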
Actionable comments posted: 0
🧹 Nitpick comments (4)
README.md (4)
35-38: Fix Typos and Grammar in the mzML Statistics Section

The new "mzml statistics" section has a couple of language issues:
- The phrase “quantms-utils have multiple scripts…” should be corrected to “quantms-utils has multiple scripts…” for proper subject–verb agreement.
- The word “gow” in “…and gow they are computed” is a typo; it should be “how”.
Consider applying this diff:
```diff
-quantms-utils have multiple scripts to generate mzML stats. These files are used by multiple tools and packages within quantms ecosystem for quality control, mzTab generation, etc. Here are some details about the formats, the fields they contain and gow they are computed.
+quantms-utils has multiple scripts to generate mzML stats. These files are used by multiple tools and packages within the quantms ecosystem for quality control, mzTab generation, etc. Here are some details about the formats, the fields they contain and how they are computed.
```
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
40-70: Ensure Consistent Library Naming and Punctuation

Within the "MS info and details" block:
- The note section refers to “pyopnems” which is likely a typo; it should be “pyopenms” to correctly reference the library.
- Additionally, several lines trigger static analysis warnings (e.g., loose punctuation and lengthy lines). Although these issues do not affect functionality, revising them will enhance readability.
A diff suggestion for the library naming issue:
```diff
-For all the precursor-related information, we are using the first precursor in the spectrum. The following columns `intensity` (if not annotated), `precursor_rt`, and `precursor_total_intensity` we use the following pyopnems code:
+For all the precursor-related information, we are using the first precursor in the spectrum. The following columns `intensity` (if not annotated), `precursor_rt`, and `precursor_total_intensity` we use the following pyopenms code:
```
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: -scan: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). -ms_level: The MS level of the signal, 1 for MS an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. -num_peaks: The number of peaks in the MS. Compute ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -base_peak_intensity`: The max intensity in the spectrum (MS o...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). -summed_peak_intensities: The sum of all intensities in the spect...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). -rt: The retention time of the spectrum, cap...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: -precursor_charge: The charge of the precursor ion, if the...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -precursor_mz`: The m/z of the precursor ion, if the si...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms withs...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -precursor_intensity`: The intensity of the precursor ion, if ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. -precursor_rt: The retention time of the precursor ion...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Possible missing comma found.
Context: ...ntion time of the precursor ion, if the signal is MS/MS. See note below. - `precursor_...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. -precursor_total_intensity: The total intensity of the precursor io...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
61-61: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
61-61: Line length
Expected: 80; Actual: 233(MD013, line-length)
62-62: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
65-65: Line length
Expected: 80; Actual: 97(MD013, line-length)
72-82: Correct Typo in MS2 DetailsIn the "MS2 info and details" section on line 75, the word “intesities” is misspelled. It should be “intensities”:
-`mzmlstats` allows the user to produce a file containing all the MS2 spectra including the intesities and masses of every peak. +`mzmlstats` allows the user to produce a file containing all the MS2 spectra including the intensities and masses of every peak.🧰 Tools
🪛 LanguageTool
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...w they are estimated and used: -scan: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). -ms_level: The MS level of the signal, all of them...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...nal, all of them will be 2. -mz_array: The m/z array of the peaks in the MS/MS...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~80-~80: Loose punctuation mark.
Context: ...ectrum.get_peaks(). -intensity_array`: The intensity array of the peaks in the...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
75-75: Line length
Expected: 80; Actual: 331(MD013, line-length)
77-77: Line length
Expected: 80; Actual: 334(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 140(MD013, line-length)
80-80: Line length
Expected: 80; Actual: 153(MD013, line-length)
35-86: Refine Markdown Formatting

Several lines in the new sections are triggering markdownlint warnings (e.g., trailing spaces, excessive line lengths, and loose punctuation). Although these are minor formatting issues, addressing them will improve the overall readability and maintainability of the README. Consider reviewing these aspects using markdownlint tools.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: -scan: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). -ms_level: The MS level of the signal, 1 for MS an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. -num_peaks: The number of peaks in the MS. Compute ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -base_peak_intensity`: The max intensity in the spectrum (MS o...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). -summed_peak_intensities: The sum of all intensities in the spect...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). -rt: The retention time of the spectrum, cap...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: -precursor_charge: The charge of the precursor ion, if the...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -precursor_mz`: The m/z of the precursor ion, if the si...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms withs...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -precursor_intensity`: The intensity of the precursor ion, if ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. -precursor_rt: The retention time of the precursor ion...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Possible missing comma found.
Context: ...ntion time of the precursor ion, if the signal is MS/MS. See note below. - `precursor_...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. -precursor_total_intensity: The total intensity of the precursor io...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...w they are estimated and used: -scan: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). -ms_level: The MS level of the signal, all of them...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...nal, all of them will be 2. -mz_array: The m/z array of the peaks in the MS/MS...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~80-~80: Loose punctuation mark.
Context: ...ectrum.get_peaks(). -intensity_array`: The intensity array of the peaks in the...(UNLIKELY_OPENING_PUNCTUATION)
[typographical] ~86-~86: A comma between “please” and the verb is normally not needed. If you want to signal a pause, consider using an em dash instead.
Context: ... Contributions and issues are welcome. Please, open an issue in the [GitHub repository](htt...(PLEASE_UNNECESSARY_COMMA)
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272
(MD013, line-length)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315
(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335
(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148
(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137
(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268
(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98
(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112
(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
61-61: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
61-61: Line length
Expected: 80; Actual: 233
(MD013, line-length)
62-62: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
65-65: Line length
Expected: 80; Actual: 97
(MD013, line-length)
75-75: Line length
Expected: 80; Actual: 331
(MD013, line-length)
77-77: Line length
Expected: 80; Actual: 334
(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 140
(MD013, line-length)
80-80: Line length
Expected: 80; Actual: 153
(MD013, line-length)
86-86: Line length
Expected: 80; Actual: 196
(MD013, line-length)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- README.md (1 hunks)
- quantmsutils/psm/psm_conversion.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- quantmsutils/psm/psm_conversion.py
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). - base_peak_intensity`: The max intensity in the spectrum (MS o...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities: The sum of all intensities in the spect...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt: The retention time of the spectrum, cap...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge: The charge of the precursor ion, if the...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). - precursor_mz`: The m/z of the precursor ion, if the si...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms with s...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). - precursor_intensity`: The intensity of the precursor ion, if ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt: The retention time of the precursor ion...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Possible missing comma found.
Context: ...ntion time of the precursor ion, if the signal is MS/MS. See note below. - `precursor_...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity: The total intensity of the precursor io...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...w they are estimated and used: - scan: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). - ms_level: The MS level of the signal, all of them...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...nal, all of them will be 2. - mz_array: The m/z array of the peaks in the MS/MS...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~80-~80: Loose punctuation mark.
Context: ...ectrum.get_peaks(). - intensity_array`: The intensity array of the peaks in the...
(UNLIKELY_OPENING_PUNCTUATION)
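The LanguageTool excerpts above quote README text defining per-spectrum columns such as num_peaks, base_peak_intensity, and summed_peak_intensities, captured via pyopenms `spectrum.get_peaks()`. As a rough illustration of those column definitions only (not the project's actual implementation — the function name and plain-list inputs are assumptions made here), the statistics reduce to:

```python
# Illustrative sketch of the per-spectrum statistics described in the
# README excerpts above. Operates on plain lists instead of pyopenms
# spectrum objects; the function name is hypothetical.

def spectrum_stats(mz_array, intensity_array):
    """Compute basic peak statistics for one spectrum."""
    if len(mz_array) != len(intensity_array):
        raise ValueError("mz and intensity arrays must have equal length")
    return {
        "num_peaks": len(mz_array),
        "base_peak_intensity": max(intensity_array) if intensity_array else None,
        "summed_peak_intensities": sum(intensity_array),
    }

if __name__ == "__main__":
    print(spectrum_stats([100.1, 200.2, 300.3], [10.0, 50.0, 25.0]))
```

In the real tool these values would come from each spectrum returned by the mzML reader, one row per scan.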
🪛 markdownlint-cli2 (0.17.2)
README.md
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272
(MD013, line-length)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315
(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335
(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148
(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137
(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268
(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98
(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112
(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
61-61: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
61-61: Line length
Expected: 80; Actual: 233
(MD013, line-length)
62-62: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
65-65: Line length
Expected: 80; Actual: 97
(MD013, line-length)
75-75: Line length
Expected: 80; Actual: 331
(MD013, line-length)
77-77: Line length
Expected: 80; Actual: 334
(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 140
(MD013, line-length)
80-80: Line length
Expected: 80; Actual: 153
(MD013, line-length)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
Summary by CodeRabbit
Documentation
New Features
Chores
- .qodo added to .gitignore to prevent tracking of specific files.
- .markdownlint.json added for Markdown linting rules.
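The contents of the new .markdownlint.json are not shown in this review. A hypothetical configuration that would silence the MD013 line-length warnings flagged above, while keeping the other rules at their defaults, might look like:

```json
{
  "default": true,
  "MD013": false,
  "MD009": { "br_spaces": 2 }
}
```

Here `"MD013": false` disables the line-length check entirely, and the MD009 `br_spaces` option tolerates two trailing spaces used as hard line breaks; the repository's actual rule choices may differ.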