Initial commit pyLucXor#4
Conversation
|
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
|||||||||||||||||||||||||
There was a problem hiding this comment.
Actionable comments posted: 21
🧹 Nitpick comments (39)
pyLucXor/README.md (2)
170-173: Specify a language for the fenced code block (markdownlint MD040).Add a language to the citation block for consistency with linters and better rendering.
Apply this diff:
-``` +```text Fermin, D., et al. (2011). LuciPHOr: algorithm for phosphorylation site localization with false localization rate estimation using target-decoy approach. Molecular & Cellular Proteomics, 10(3), M110.003830.--- `30-34`: **Replace placeholder repository URL in install instructions.** The placeholder `<repository-url>` can cause copy-paste errors for users. Example fix: ```diff -git clone <repository-url> +git clone https://github.com/bigbio/onsite.gitIf this is a monorepo or subdirectory, add a note clarifying the path to the package root.
pyLucXor/constants.py (2)
59-66: Mixing modified residue masses into AA_MASSES via lowercase keys can be brittle.Encoding PTMs by overloading residue case makes parsing fragile and opaque. Prefer explicit modification handling (e.g., a sequence + PTM map or using MOD_MASSES and dedicated logic).
Consider removing lowercase entries from AA_MASSES and centralizing PTM mass adjustments in peptide/psm logic. If you keep this, document the convention prominently and ensure all parsers/generators adhere to it.
95-100: Decoy mass augmentation mutates AA_MASSES; ensure no double-add and document.The guard prevents double-insertion per interpreter session, but the side-effectful mutation on import should be documented since it affects any consumer importing constants.
Add a comment above the loop explaining idempotency and intent, or move to a function that returns an augmented copy when needed to avoid global mutation.
pyLucXor/config.py (2)
62-70: Alignsetsemantics withupdate: warn on unknown keys.Currently
setsilently introduces new keys whileupdatewarns on unknown keys. This inconsistency is surprising.Apply this diff:
def set(self, key: str, value: Any) -> None: """ Set a configuration value. @@ - self.config[key] = value + if key in self.config: + self.config[key] = value + else: + logger.warning(f"Unknown configuration key: {key}")
93-101: Also guard__setitem__against unknown keys for consistency.Keep behavior consistent across all mutation paths.
Apply this diff:
def __setitem__(self, key: str, value: Any) -> None: """ Set a configuration value using dictionary syntax. @@ - self.config[key] = value + if key in self.config: + self.config[key] = value + else: + logger.warning(f"Unknown configuration key: {key}")pyLucXor/globals.py (2)
5-11: Remove unused import Optional.
Optionalis not used.Apply this diff:
-from typing import List, Dict, Optional +from typing import List, Dict
81-99: Use direct reverse map for decoy symbol lookup.You re-derive the reverse mapping via iteration. Use AA_DECOY_MAP for O(1) lookup and simpler code.
Apply this diff:
-from .constants import DECOY_AA_MAP +from .constants import AA_DECOY_MAP @@ - ret = "" - src_char = c.upper() - - for k, v in DECOY_AA_MAP.items(): - if v.upper() == src_char: - ret = k - break - - return ret + return AA_DECOY_MAP.get(c.upper(), "")pyLucXor/peak.py (1)
119-127: Make from_dict robust to missing keys.Use .get() with sensible defaults to ease forward/backward compatibility of serialized data.
Apply this diff:
- peak = cls(data["mz"], data["raw_intensity"], data["norm_intensity"]) - peak.rel_intensity = data["rel_intensity"] - peak.matched = data["matched"] - peak.dist = data["dist"] - peak.matched_ion_str = data["matched_ion_str"] - peak.matched_ion_mz = data["matched_ion_mz"] - peak.score = data["score"] - peak.intensity_score = data["intensity_score"] - peak.dist_score = data["dist_score"] + peak = cls( + data["mz"], + data["raw_intensity"], + data.get("norm_intensity", 0.0), + ) + peak.rel_intensity = data.get("rel_intensity", 0.0) + peak.matched = data.get("matched", False) + peak.dist = data.get("dist", 0.0) + peak.matched_ion_str = data.get("matched_ion_str", "") + peak.matched_ion_mz = data.get("matched_ion_mz", 0.0) + peak.score = data.get("score", 0.0) + peak.intensity_score = data.get("intensity_score", 0.0) + peak.dist_score = data.get("dist_score", 0.0)pyLucXor/spectrum.py (4)
9-10: Remove unused imports to satisfy linter and reduce import overhead
Dict,List,Optional, andpyopenmsare not used in this module.-from typing import Dict, List, Tuple, Optional -import pyopenms +from typing import Tuple
43-66: Vectorize peak initialization for correctness and speed; also clarify max_i_index semanticsYou can replace the loop-based zero-intensity filtering and manual array population with a vectorized approach. This both simplifies the code and ensures
max_i_indexis the index in the filtered arrays (current code stores the index from the original arrays, which can be confusing downstream).Is
max_i_indexintended to be an index into the filtered arrays (self.raw_intensity) or the original input arrays? The change below makes it index into the filtered arrays.- # Filter out zero intensity peaks - cand_peaks = [] - for i in range(N): - if intensity_array[i] > 0: - cand_peaks.append(i) - - self.N = len(cand_peaks) - - # Initialize arrays - self.mz = np.zeros(self.N) - self.raw_intensity = np.zeros(self.N) - self.rel_intensity = np.zeros(self.N) - self.norm_intensity = np.zeros(self.N) - - # Record peak data - for i, idx in enumerate(cand_peaks): - intensity = intensity_array[idx] - if intensity > self.max_i: - self.max_i = intensity - self.max_i_index = idx - - self.mz[i] = mz_array[idx] - self.raw_intensity[i] = intensity + # Filter out zero intensity peaks (vectorized) + mz_array = np.asarray(mz_array) + intensity_array = np.asarray(intensity_array) + mask = intensity_array[:N] > 0 + self.N = int(np.count_nonzero(mask)) + + # Initialize arrays + self.mz = mz_array[:N][mask].astype(float, copy=False) + self.raw_intensity = intensity_array[:N][mask].astype(float, copy=False) + self.rel_intensity = np.zeros(self.N, dtype=float) + self.norm_intensity = np.zeros(self.N, dtype=float) + + # Record maxima + if self.N > 0: + self.max_i_index = int(np.argmax(self.raw_intensity)) + self.max_i = float(self.raw_intensity[self.max_i_index])
83-101: Use numpy median and vectorized log for normalizationEquivalent, faster, and safer. Also guards against accidental non-positive medians.
- # Sort relative intensities - sorted_intensities = sorted(self.rel_intensity) - - # Calculate median - mid = self.N // 2 - if self.N % 2 == 0: # even number of peaks - a = sorted_intensities[mid - 1] - b = sorted_intensities[mid] - median_i = (a + b) / 2.0 - else: # odd number of peaks - median_i = sorted_intensities[mid] - - # Calculate normalized intensities - for i in range(self.N): - d = self.rel_intensity[i] / median_i - if d > 0: # avoid log(0) - self.norm_intensity[i] = np.log(d) - else: - self.norm_intensity[i] = float('-inf') + # Compute median using numpy + median_i = float(np.median(self.rel_intensity)) + if median_i <= 0: + # Defensive: if median is non-positive, set to -inf + self.norm_intensity.fill(float("-inf")) + return + + # Vectorized log normalization; avoid log(0) by using where/out + self.norm_intensity = np.full_like(self.rel_intensity, float("-inf")) + np.log(self.rel_intensity / median_i, + out=self.norm_intensity, + where=self.rel_intensity > 0)
131-134: Optional: speed up m/z index lookupFor large spectra, a vectorized search avoids Python loops; if
self.mzis sorted, consider binary search, otherwise vectorized absolute diff works.- for i in range(self.N): - if abs(self.mz[i] - mz) < 1e-6: # floating point comparison - return i - return -1 + idxs = np.where(np.abs(self.mz - mz) < 1e-6)[0] + return int(idxs[0]) if idxs.size else -1pyLucXor/flr.py (5)
9-11: Drop unused imports (typing and concurrent.futures)These symbols aren’t used here; remove them to appease linters and speed imports.
-from typing import List, Dict, Any, Optional, Tuple -from concurrent.futures import ThreadPoolExecutor, as_completed +from typing import List, Tuple
394-403: Guard against tiny denominators when computing FDR ratiosDivision by very small
g_auc_f1/l_auc_f1can produce inf/NaN. Clamp denominator withTINY_NUM. You already useTINY_NUMin KDE; extend the safety here.- ratio = (Ndecoy2/Nreal2) * (g_auc_f0 / g_auc_f1) + denom = max(g_auc_f1, TINY_NUM) + ratio = (Ndecoy2 / Nreal2) * (g_auc_f0 / denom) self.global_fdr[i] = ratio @@ - ratio = (Ndecoy2/Nreal2) * (l_auc_f0 / l_auc_f1) + denom = max(l_auc_f1, TINY_NUM) + ratio = (Ndecoy2 / Nreal2) * (l_auc_f0 / denom) self.local_fdr[i] = ratio
125-138: Handle small-sample bandwidth edge cases explicitlyWhen N <= 1 or variance is 0, bandwidth should be 0 (or a small epsilon), not NaN/inf. Make this explicit.
if data_type == REAL: sigma = np.sqrt(self.delta_score_var_pos) N = float(self.n_real) - x = np.power(N, 0.2) - result = 1.06 * (sigma / x) + if N <= 1 or sigma == 0.0: + result = 0.0 + else: + result = 1.06 * (sigma / (N ** 0.2)) self.bw_real = result @@ elif data_type == DECOY: sigma = np.sqrt(self.delta_score_var_neg) N = float(self.n_decoy) - x = np.power(N, 0.2) - result = 1.06 * (sigma / x) + if N <= 1 or sigma == 0.0: + result = 0.0 + else: + result = 1.06 * (sigma / (N ** 0.2)) self.bw_decoy = result
205-211: Treat non-finite/negative bandwidth as invalidIf
bwbecomes NaN/inf or negative, fall back toTINY_NUMdensities to avoid propagating invalid values.- if N == 0 or bw == 0: + if N == 0 or not np.isfinite(bw) or bw <= 0: # Avoid division by zero if data_type == DECOY: self.f0 = np.full(self.NMARKS, TINY_NUM) else: self.f1 = np.full(self.NMARKS, TINY_NUM) return
460-481: Minorization slope divisions can divide by zeroBoth denominators can hit zero in degenerate cases, producing infs. Add guards.
- slope = (0.0 - min_val) / (self.max_delta_score * 1.1 - x[min_idx]) + denom = (self.max_delta_score * 1.1 - x[min_idx]) + slope = (0.0 - min_val) / denom if denom != 0 else 0.0 @@ - slope = max_val / (x[max_idx] - x[-1]) + denom2 = (x[max_idx] - x[-1]) + slope = max_val / denom2 if denom2 != 0 else 0.0pyLucXor/__init__.py (1)
22-25: Avoid F401 warnings for CIDModel/HCDModel lazy exportsReturning via
locals()[name]still trips some linters. Use a small mapping for clarity and to placate static analysis.- elif name in ["CIDModel", "HCDModel"]: - from .models import CIDModel, HCDModel - return locals()[name] + elif name in ("CIDModel", "HCDModel"): + from .models import CIDModel, HCDModel + return {"CIDModel": CIDModel, "HCDModel": HCDModel}[name]pyLucXor/core.py (3)
8-10: Clean up unused imports
numpyandtyping(Dict, Any, Optional) are not used in this file.-import numpy as np import pyopenms -from typing import List, Dict, Any, Optional +from typing import List
131-135: Avoid double assignment of FLR values after process_round2
process_round2is expected to assign FLR viafind_closest_flr. Callingassign_flr_to_psmsagain may overwrite those values and currently assigns FLR to decoys (fixed separately). Consider removing this to keep a single source of truth.Would you like me to adjust the stage-2 flow to rely solely on
process_round2for FLR assignment?
200-203: CSV export writes modified peptide equal to unmodified peptideUse the modified representation if available.
- f.write(f"{psm.scan_num},{psm.peptide.peptide},{psm.peptide.peptide},{psm.psm_score:.4f},{psm.delta_score:.4f},{psm.global_flr:.4f},{psm.local_flr:.4f},{1 if psm.is_decoy else 0}\n") + mod_pep = getattr(psm.peptide, "mod_peptide", psm.peptide.peptide) + f.write(f"{psm.scan_num},{psm.peptide.peptide},{mod_pep},{psm.psm_score:.4f},{psm.delta_score:.4f},{psm.global_flr:.4f},{psm.local_flr:.4f},{1 if psm.is_decoy else 0}\n")pyLucXor/cli.py (6)
10-11: Prune unused imports
json,NTERM_MOD,CTERM_MOD, andAA_MASSESare not used in this file.-import json @@ -from pyLucXor.constants import ( - NTERM_MOD, - CTERM_MOD, - AA_MASSES, - DEFAULT_CONFIG -) +from pyLucXor.constants import DEFAULT_CONFIGAlso applies to: 29-33
528-531: Flatten nested if for readabilityMinor readability improvement; no behavior change.
- if hasattr(psm, 'delta_score') and not np.isnan(psm.delta_score): - if psm.delta_score > global_flr_calculator.min_delta_score: - global_flr_calculator.add_psm(psm.delta_score, psm.is_decoy) + if hasattr(psm, 'delta_score') and not np.isnan(psm.delta_score) \ + and psm.delta_score > global_flr_calculator.min_delta_score: + global_flr_calculator.add_psm(psm.delta_score, psm.is_decoy)
395-395: Rename unused loop index variable
iis unused; use_to signal intent and silence linters.- for i, pep_id in enumerate(pep_ids, 1): + for _, pep_id in enumerate(pep_ids, 1):
666-668: Avoid bare except in MGF TITLE parsingCatching all exceptions hides real errors; scope to expected ones.
- except: + except (ValueError, IndexError): logger.warning(f"Cannot extract scan number from title: {line}")
691-693: Avoid bare except when parsing peak linesBe explicit about parsing failures.
- except: + except ValueError: logger.warning(f"Cannot parse peak data: {line}")
521-542: Consider skipping decoy FLR assignment in round 1 mappingAfter
record_flr_estimates, callingassign_flr_to_psmswill assign FLR to decoys unlessassign_flr_to_psmsis fixed (see flr.py). Once fixed, this is okay. Otherwise, skip assigning here and only assign in round 2.Do you want me to refactor the flow to avoid assigning FLR in round 1 entirely and rely strictly on round 2 mapping?
pyLucXor/parallel.py (3)
181-187: Thread chunking can create highly imbalanced work for small listsUsing len(psms) // num_threads can create many 1-element tasks and underutilize threads for small inputs. Consider reusing get_optimal_thread_count to bound threads to data size.
Apply this diff:
- if num_threads is None: - num_threads = os.cpu_count() or DEFAULT_NUM_THREADS + if num_threads is None: + num_threads = get_optimal_thread_count(len(psms))
96-113: Remove unused locals and narrow exception in SpectrumMatchingWorker.match_spectrum_peptide
spectrum_native_idandrtare unused; the innerexcept Exception as edoesn’t usee.Apply this diff:
- # Get spectrum basic information - spectrum_native_id = getattr(psm.spectrum, 'native_id', None) - rt = getattr(psm.spectrum, 'rt', None) + # Spectrum basic information is attached below via getattr on psm/psm.spectrum as needed @@ - try: - matched_peaks = psm.peptide.match_peaks(psm.spectrum) - except Exception as e: - matched_peaks = [] + try: + matched_peaks = psm.peptide.match_peaks(psm.spectrum) + except Exception: + matched_peaks = []
11-11: Drop unused import
threadingis unused in this module.-import threadingpyLucXor/psm.py (3)
781-786: Simplify and harden decoy detectionUse any(...) directly; it’s clearer and avoids early-return logic errors.
- for aa in unmod_seq: - if aa in DECOY_AA_MAP: - return True - - return False + return any(aa in DECOY_AA_MAP for aa in unmod_seq)
1216-1219: Remove unused local variable
original_toleranceis assigned but never used.- original_tolerance = self.config.get('fragment_mass_tolerance', 0.1) temp_config = self.config.copy()
7-28: Prune unused imports (keeps runtime trim and static analysis clean)A number of imports are unused in this module.
-import math -from typing import Dict, List, Optional, Tuple, Set, Any, Union +import math +from typing import Dict, List, Optional, Tuple, Any, Union @@ -import pyopenms from pyopenms import AASequence, Modification @@ - NTERM_MOD, CTERM_MOD, AA_MASSES, DECOY_AA_MAP, AA_DECOY_MAP, NEUTRAL_LOSSES, MIN_DELTA_SCORE, PROTON_MASS, WATER_MASS, - DECOY_AMINO_ACIDS, PHOSPHO_MOD_MASS, OXIDATION_MASS + NTERM_MOD, CTERM_MOD, AA_MASSES, DECOY_AA_MAP, PROTON_MASS, WATER_MASS, + PHOSPHO_MOD_MASS, OXIDATION_MASS ) -from .peak import Peak from .peptide import Peptide from .spectrum import Spectrum -from .flr import FLRCalculator from .globals import get_decoy_symbolpyLucXor/peptide.py (2)
687-701: Remove unused debug block in match_peaks or emit it to logs
output_datais built but never used. Either remove it or log/emit when debug is enabled.- if use_config.get('debug', False): - output_data = { - 'matched_peaks': matched_peaks, - 'theoretical_ions': { - 'y_ions': y_ions, - 'b_ions': b_ions - }, - 'peptide': self.peptide, - 'modified_peptide': self.mod_peptide, - 'charge': self.charge - } + if use_config.get('debug', False): + logger.debug(f"Matched peaks: {len(matched_peaks)} | peptide={self.peptide} z={self.charge}")
521-586: Potential index mismatch in get_permutations for ambiguous peptidesYou iterate sites from self.peptide (which contains marker substrings) but then index into self.mod_peptide. These indices likely diverge once markers are present, producing incorrect lowercase positions.
Consider deriving candidate sites from a “marker-free” sequence and mapping to mod_peptide indices consistently. For example, build an index map from original peptide positions (excluding markers) to mod_peptide indices before flipping case. I can draft this refactor if this method is still in use; otherwise consider removing it to reduce confusion since PSM.generate_permutations covers the needed logic.
pyLucXor/models.py (3)
936-964: Remove unused local and simplify CID distance log-density
muis assigned but not used; the implementation assumes mean 0. Keep it concise.- mu = 0.0 - var = charge_model.var_dist_b + var = charge_model.var_dist_b @@ - log_prob = -0.5 * np.log(2 * np.pi * var) - 0.5 * (x ** 2) / var + log_prob = -0.5 * np.log(2 * np.pi * var) - 0.5 * (x ** 2) / var return log_prob
8-18: Trim unused imports
typing.Anyand constantsALGORITHM_CID/ALGORITHM_HCDare unused. Keep imports tight.-from typing import Dict, List, Optional, Tuple, Any +from typing import Dict, List, Optional, Tuple @@ -from .constants import ALGORITHM_CID, ALGORITHM_HCD, TINY_NUM +from .constants import TINY_NUM
535-553: Style: collapse simple if/else paddings with ternaries (optional)The padding of min_i/max_i blocks can be simplified; not functionally required but improves readability.
- if min_i < 0: - min_i = min_i + (min_i * padding) - else: - min_i = min_i - (min_i * padding) + min_i = min_i + (min_i * padding) if min_i < 0 else min_i - (min_i * padding) @@ - if max_i < 0: - max_i = max_i - (max_i * padding) - else: - max_i = max_i + (max_i * padding) + max_i = max_i - (max_i * padding) if max_i < 0 else max_i + (max_i * padding)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (14)
pyLucXor/README.md(1 hunks)pyLucXor/__init__.py(1 hunks)pyLucXor/cli.py(1 hunks)pyLucXor/config.py(1 hunks)pyLucXor/constants.py(1 hunks)pyLucXor/core.py(1 hunks)pyLucXor/flr.py(1 hunks)pyLucXor/globals.py(1 hunks)pyLucXor/models.py(1 hunks)pyLucXor/parallel.py(1 hunks)pyLucXor/peak.py(1 hunks)pyLucXor/peptide.py(1 hunks)pyLucXor/psm.py(1 hunks)pyLucXor/spectrum.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (10)
pyLucXor/flr.py (2)
pyLucXor/core.py (1)
add_psm(31-38)pyLucXor/globals.py (1)
record_flr_estimates(33-42)
pyLucXor/globals.py (2)
pyLucXor/flr.py (1)
record_flr_estimates(642-663)pyLucXor/psm.py (1)
clear_scores(1015-1037)
pyLucXor/psm.py (5)
pyLucXor/peak.py (1)
Peak(10-129)pyLucXor/peptide.py (4)
get_precursor_mass(165-196)get_unmodified_sequence(883-907)build_ion_ladders(207-268)match_peaks(587-702)pyLucXor/spectrum.py (5)
Spectrum(14-134)is_empty(112-119)median_normalize_spectra(75-101)calc_relative_intensity(70-73)get_peaks(103-110)pyLucXor/flr.py (2)
add_psm(61-81)find_closest_flr(736-764)pyLucXor/globals.py (2)
globals(13-79)get_decoy_symbol(81-99)
pyLucXor/parallel.py (3)
pyLucXor/psm.py (3)
score_permutations(556-733)process(322-452)process_round2(454-488)pyLucXor/cli.py (1)
process_psms(741-773)pyLucXor/peptide.py (1)
match_peaks(587-702)
pyLucXor/__init__.py (4)
pyLucXor/cli.py (1)
PyLuciPHOr2(40-773)pyLucXor/peptide.py (1)
Peptide(24-907)pyLucXor/psm.py (1)
PSM(31-1367)pyLucXor/models.py (2)
CIDModel(787-964)HCDModel(967-1122)
pyLucXor/core.py (3)
pyLucXor/psm.py (5)
PSM(31-1367)process(322-452)clear_scores(1015-1037)get_results(916-926)update_peptide_id(1279-1321)pyLucXor/flr.py (5)
FLRCalculator(15-798)add_psm(61-81)calculate_flr_estimates(596-640)record_flr_estimates(642-663)assign_flr_to_psms(665-705)pyLucXor/config.py (3)
LucXorConfig(14-113)get(49-60)to_dict(72-79)
pyLucXor/peptide.py (4)
pyLucXor/globals.py (2)
globals(13-79)get_decoy_symbol(81-99)pyLucXor/peak.py (1)
Peak(10-129)pyLucXor/spectrum.py (1)
get_peaks(103-110)pyLucXor/models.py (8)
get_charge_model(884-894)get_charge_model(1065-1075)get_log_np_density_int(662-723)get_log_np_density_int(896-934)get_log_np_density_int(1077-1099)get_log_np_density_dist_pos(725-769)get_log_np_density_dist_pos(936-964)get_log_np_density_dist_pos(1101-1122)
pyLucXor/cli.py (6)
pyLucXor/psm.py (3)
PSM(31-1367)charge(1244-1248)process(322-452)pyLucXor/peptide.py (2)
Peptide(24-907)build_ion_ladders(207-268)pyLucXor/models.py (4)
CIDModel(787-964)HCDModel(967-1122)build(802-834)build(985-1013)pyLucXor/spectrum.py (2)
Spectrum(14-134)get_peaks(103-110)pyLucXor/flr.py (6)
FLRCalculator(15-798)add_psm(61-81)calculate_flr_estimates(596-640)record_flr_estimates(642-663)assign_flr_to_psms(665-705)save_delta_score_flr_mapping(711-734)pyLucXor/parallel.py (3)
parallel_psm_processing(170-201)get_optimal_thread_count(240-261)process_psms(31-34)
pyLucXor/constants.py (1)
pyLucXor/config.py (1)
update(36-47)
pyLucXor/models.py (4)
pyLucXor/peak.py (1)
Peak(10-129)pyLucXor/config.py (1)
get(49-60)pyLucXor/psm.py (2)
charge(1244-1248)_match_peaks(1190-1227)pyLucXor/spectrum.py (1)
get_peaks(103-110)
🪛 Ruff (0.12.2)
pyLucXor/flr.py
9-9: typing.Dict imported but unused
Remove unused import
(F401)
9-9: typing.Any imported but unused
Remove unused import
(F401)
9-9: typing.Optional imported but unused
Remove unused import
(F401)
10-10: concurrent.futures.ThreadPoolExecutor imported but unused
Remove unused import
(F401)
10-10: concurrent.futures.as_completed imported but unused
Remove unused import
(F401)
pyLucXor/globals.py
6-6: typing.Optional imported but unused
Remove unused import: typing.Optional
(F401)
pyLucXor/spectrum.py
9-9: typing.Dict imported but unused
Remove unused import
(F401)
9-9: typing.List imported but unused
Remove unused import
(F401)
9-9: typing.Optional imported but unused
Remove unused import
(F401)
10-10: pyopenms imported but unused
Remove unused import: pyopenms
(F401)
pyLucXor/psm.py
9-9: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
16-16: pyopenms imported but unused
Remove unused import: pyopenms
(F401)
20-20: .constants.AA_DECOY_MAP imported but unused
Remove unused import
(F401)
20-20: .constants.NEUTRAL_LOSSES imported but unused
Remove unused import
(F401)
20-20: .constants.MIN_DELTA_SCORE imported but unused
Remove unused import
(F401)
21-21: .constants.DECOY_AMINO_ACIDS imported but unused
Remove unused import
(F401)
23-23: .peak.Peak imported but unused
Remove unused import: .peak.Peak
(F401)
26-26: .flr.FLRCalculator imported but unused
Remove unused import: .flr.FLRCalculator
(F401)
506-506: Loop control variable sites not used within loop body
Rename unused sites to _sites
(B007)
525-528: Combine if branches using logical or operator
Combine if branches
(SIM114)
574-577: Use ternary operator tolerance = base_tolerance * 2.0 if is_decoy else base_tolerance instead of if-else-block
(SIM108)
610-613: Combine if branches using logical or operator
Combine if branches
(SIM114)
706-706: Loop control variable s not used within loop body
(B007)
782-786: Use return any(aa in DECOY_AA_MAP for aa in unmod_seq) instead of for loop
Replace with return any(aa in DECOY_AA_MAP for aa in unmod_seq)
(SIM110)
1089-1092: Combine if branches using logical or operator
Combine if branches
(SIM114)
1180-1184: Use return any(aa in DECOY_AA_MAP for aa in sequence) instead of for loop
Replace with return any(aa in DECOY_AA_MAP for aa in sequence)
(SIM110)
1217-1217: Local variable original_tolerance is assigned to but never used
Remove assignment to unused variable original_tolerance
(F841)
1265-1265: Loop control variable mod_positions not used within loop body
Rename unused mod_positions to _mod_positions
(B007)
1272-1272: Loop control variable mod_positions not used within loop body
Rename unused mod_positions to _mod_positions
(B007)
pyLucXor/parallel.py
11-11: threading imported but unused
Remove unused import: threading
(F401)
106-106: Local variable spectrum_native_id is assigned to but never used
Remove assignment to unused variable spectrum_native_id
(F841)
107-107: Local variable rt is assigned to but never used
Remove assignment to unused variable rt
(F841)
112-112: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
pyLucXor/__init__.py
23-23: .models.CIDModel imported but unused
(F401)
23-23: .models.HCDModel imported but unused
(F401)
pyLucXor/core.py
8-8: numpy imported but unused
Remove unused import: numpy
(F401)
10-10: typing.Dict imported but unused
Remove unused import
(F401)
10-10: typing.Any imported but unused
Remove unused import
(F401)
10-10: typing.Optional imported but unused
Remove unused import
(F401)
pyLucXor/peptide.py
9-9: typing.Set imported but unused
Remove unused import
(F401)
9-9: typing.Tuple imported but unused
Remove unused import
(F401)
9-9: typing.Union imported but unused
Remove unused import
(F401)
12-12: pyopenms imported but unused
Remove unused import: pyopenms
(F401)
15-15: .constants.AA_DECOY_MAP imported but unused
Remove unused import
(F401)
16-16: .constants.ION_TYPES imported but unused
Remove unused import
(F401)
16-16: .constants.NEUTRAL_LOSSES imported but unused
Remove unused import
(F401)
281-281: Loop control variable idx not used within loop body
Rename unused idx to _idx
(B007)
689-689: Local variable output_data is assigned to but never used
Remove assignment to unused variable output_data
(F841)
895-898: Combine if branches using logical or operator
Combine if branches
(SIM114)
pyLucXor/cli.py
10-10: json imported but unused
Remove unused import: json
(F401)
29-29: pyLucXor.constants.NTERM_MOD imported but unused
Remove unused import
(F401)
30-30: pyLucXor.constants.CTERM_MOD imported but unused
Remove unused import
(F401)
31-31: pyLucXor.constants.AA_MASSES imported but unused
Remove unused import
(F401)
395-395: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
528-529: Use a single if statement instead of nested if statements
(SIM102)
666-666: Do not use bare except
(E722)
691-691: Do not use bare except
(E722)
pyLucXor/models.py
11-11: typing.Any imported but unused
Remove unused import: typing.Any
(F401)
16-16: .constants.ALGORITHM_CID imported but unused
Remove unused import
(F401)
16-16: .constants.ALGORITHM_HCD imported but unused
Remove unused import
(F401)
545-548: Use ternary operator min_i = min_i + min_i * padding if min_i < 0 else min_i - min_i * padding instead of if-else-block
Replace if-else-block with min_i = min_i + min_i * padding if min_i < 0 else min_i - min_i * padding
(SIM108)
550-553: Use ternary operator max_i = max_i - max_i * padding if max_i < 0 else max_i + max_i * padding instead of if-else-block
Replace if-else-block with max_i = max_i - max_i * padding if max_i < 0 else max_i + max_i * padding
(SIM108)
957-957: Local variable mu is assigned to but never used
Remove assignment to unused variable mu
(F841)
🪛 LanguageTool
pyLucXor/README.md
[grammar] ~23-~23: There might be a mistake here.
Context: ...on
Prerequisites
- Python 3.7+
- PyOpenMS
- NumPy
- SciPy
Instal...
(QB_NEW_EN)
[grammar] ~24-~24: There might be a mistake here.
Context: ...erequisites
- Python 3.7+
- PyOpenMS
- NumPy
- SciPy
Install from sourc...
(QB_NEW_EN)
[grammar] ~25-~25: There might be a mistake here.
Context: ...es
- Python 3.7+
- PyOpenMS
- NumPy
- SciPy
Install from source
(QB_NEW_EN)
---
[grammar] ~48-~48: There might be a mistake here.
Context: ...trum`: Input spectrum file (mzML format)
- `-id, --input_id`: Input identification file (idXML forma...
(QB_NEW_EN)
---
[grammar] ~49-~49: There might be a mistake here.
Context: ...Input identification file (idXML format)
- `-out, --output`: Output file (idXML format)
#### Opt...
(QB_NEW_EN)
---
[grammar] ~54-~54: There might be a mistake here.
Context: ...tation method (CID or HCD, default: CID)
- `--fragment_mass_tolerance`: Fragment mass tolerance (default: 0.5)...
(QB_NEW_EN)
---
[grammar] ~55-~55: There might be a mistake here.
Context: ...: Fragment mass tolerance (default: 0.5)
- `--fragment_error_units`: Tolerance units (Da or ppm, default: D...
(QB_NEW_EN)
---
[grammar] ~56-~56: There might be a mistake here.
Context: ...Tolerance units (Da or ppm, default: Da)
- `--min_mz`: Minimum m/z value to consider (default...
(QB_NEW_EN)
---
[grammar] ~57-~57: There might be a mistake here.
Context: ...m m/z value to consider (default: 150.0)
- `--target_modifications`: Target modifications (default: ["Phosp...
(QB_NEW_EN)
---
[grammar] ~58-~58: There might be a mistake here.
Context: ...pho (S)", "Phospho (T)", "Phospho (Y)"])
- `--neutral_losses`: Neutral loss definitions (default: ["s...
(QB_NEW_EN)
---
[grammar] ~59-~59: There might be a mistake here.
Context: ...ions (default: ["sty -H3PO4 -97.97690"])
- `--decoy_mass`: Mass for decoy generation (default: 79...
(QB_NEW_EN)
---
[grammar] ~60-~60: There might be a mistake here.
Context: ...or decoy generation (default: 79.966331)
- `--max_charge_state`: Maximum charge state (default: 5)
- `...
(QB_NEW_EN)
---
[grammar] ~61-~61: There might be a mistake here.
Context: ...tate`: Maximum charge state (default: 5)
- `--max_peptide_length`: Maximum peptide length (default: 40)
...
(QB_NEW_EN)
---
[grammar] ~62-~62: There might be a mistake here.
Context: ...h`: Maximum peptide length (default: 40)
- `--max_num_perm`: Maximum permutations (default: 16384)
...
(QB_NEW_EN)
---
[grammar] ~63-~63: There might be a mistake here.
Context: ...`: Maximum permutations (default: 16384)
- `--modeling_score_threshold`: Minimum score for modeling (default: 0...
(QB_NEW_EN)
---
[grammar] ~64-~64: There might be a mistake here.
Context: ...nimum score for modeling (default: 0.95)
- `--min_num_psms_model`: Minimum PSMs for modeling (default: 50...
(QB_NEW_EN)
---
[grammar] ~65-~65: There might be a mistake here.
Context: ... Minimum PSMs for modeling (default: 50)
- `--num_threads`: Number of threads (default: 4)
- `--r...
(QB_NEW_EN)
---
[grammar] ~66-~66: There might be a mistake here.
Context: ...threads`: Number of threads (default: 4)
- `--rt_tolerance`: Retention time tolerance (default: 0.0...
(QB_NEW_EN)
---
[grammar] ~67-~67: There might be a mistake here.
Context: ...Retention time tolerance (default: 0.01)
- `--debug`: Enable debug mode
- `--log_file`: Cus...
(QB_NEW_EN)
---
[grammar] ~68-~68: There might be a mistake here.
Context: ...t: 0.01)
- `--debug`: Enable debug mode
- `--log_file`: Custom log file path
#### Example
...
(QB_NEW_EN)
---
[grammar] ~111-~111: There might be a mistake here.
Context: ...m:
### Stage 1: FLR Estimation (RN=0)
- Process all PSMs with both real and deco...
(QB_NEW_EN)
---
[grammar] ~116-~116: There might be a mistake here.
Context: ...ns
### Stage 2: FLR Assignment (RN=1)
- Re-process PSMs using only real permutat...
(QB_NEW_EN)
---
[grammar] ~123-~123: There might be a mistake here.
Context: ...tool generates an idXML file containing:
- **Luciphor_delta_score**: Main localizatio...
(QB_NEW_EN)
---
[grammar] ~124-~124: There might be a mistake here.
Context: ...r_delta_score**: Main localization score
- **Luciphor_pep_score**: Peptide identifica...
(QB_NEW_EN)
---
[grammar] ~125-~125: There might be a mistake here.
Context: ...ep_score**: Peptide identification score
- **Luciphor_global_flr**: Global false loca...
(QB_NEW_EN)
---
[grammar] ~126-~126: There might be a mistake here.
Context: ...al_flr**: Global false localization rate
- **Luciphor_local_flr**: Local false locali...
(QB_NEW_EN)
---
[grammar] ~133-~133: There might be a mistake here.
Context: ...method**: CID or HCD specific parameters
- **Mass tolerances**: Fragment and precurso...
(QB_NEW_EN)
---
[grammar] ~134-~134: There might be a mistake here.
Context: ...: Fragment and precursor mass tolerances
- **Modification definitions**: Target PTMs ...
(QB_NEW_EN)
---
[grammar] ~135-~135: There might be a mistake here.
Context: ...itions**: Target PTMs and neutral losses
- **Modeling parameters**: Score thresholds ...
(QB_NEW_EN)
---
[grammar] ~136-~136: There might be a mistake here.
Context: ... Score thresholds and minimum PSM counts
- **Performance settings**: Threading and me...
(QB_NEW_EN)
---
[grammar] ~141-~141: There might be a mistake here.
Context: ...ensive logging with configurable levels:
- **INFO**: General progress information
- ...
(QB_NEW_EN)
---
[grammar] ~142-~142: There might be a mistake here.
Context: ...- **INFO**: General progress information
- **DEBUG**: Detailed processing information...
(QB_NEW_EN)
---
[grammar] ~143-~143: There might be a mistake here.
Context: ...DEBUG**: Detailed processing information
- **WARNING**: Non-critical issues
- **ERRO...
(QB_NEW_EN)
---
[grammar] ~144-~144: There might be a mistake here.
Context: ...tion
- **WARNING**: Non-critical issues
- **ERROR**: Critical errors
Log files ar...
(QB_NEW_EN)
---
[grammar] ~151-~151: There might be a mistake here.
Context: ...c thread optimization based on PSM count
- **Memory efficient**: Streaming processing...
(QB_NEW_EN)
---
[grammar] ~152-~152: There might be a mistake here.
Context: ... Streaming processing for large datasets
- **Scalable**: Handles datasets with thousa...
(QB_NEW_EN)
---
[grammar] ~157-~157: There might be a mistake here.
Context: ...penMS**: Mass spectrometry data handling
- **NumPy**: Numerical computations
- **Sci...
(QB_NEW_EN)
---
[grammar] ~158-~158: There might be a mistake here.
Context: ...ing
- **NumPy**: Numerical computations
- **SciPy**: Statistical functions
- **Stan...
(QB_NEW_EN)
---
[grammar] ~159-~159: There might be a mistake here.
Context: ...ions
- **SciPy**: Statistical functions
- **Standard library**: argparse, logging, j...
(QB_NEW_EN)
---
[style] ~177-~177: Consider using a less common alternative to make your writing sound more unique and professional.
Context: ...tributing
Contributions are welcome! Please feel free to submit issues and pull requests.
## ...
(FEEL_FREE_TO_STYLE_ME)
</details>
</details>
<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>
<details>
<summary>pyLucXor/README.md</summary>
170-170: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
</details>
<details>
<summary>🔇 Additional comments (5)</summary><blockquote>
<details>
<summary>pyLucXor/README.md (2)</summary>
`65-67`: **Docs default for --num_threads disagrees with code (6 vs 4).**
README lists default num_threads=4, but DEFAULT_CONFIG in pyLucXor/constants.py sets it to 6. Please align one of them to avoid user confusion.
Apply one of the following diffs:
- If 6 is intended (change README):
```diff
- - `--num_threads`: Number of threads (default: 4)
+ - `--num_threads`: Number of threads (default: 6)
- If 4 is intended (change constants.py; see comment in constants.py too):
- "num_threads": 6,
+ "num_threads": 4,121-128: Clarify FLR field names to match output tag casing.
The README uses bold keys like Luciphor_delta_score, etc. Ensure the exact tag names and casing match the idXML writer implementation to prevent downstream parsing issues.
If the writer uses different keys (e.g., LuciPHOr vs Luciphor), update the README accordingly.
pyLucXor/constants.py (1)
121-136: Default num_threads=6 conflicts with README (4). Pick one.
This default should match the documented default in README and CLI help.
Apply if 4 is intended:
- "num_threads": 6,
+ "num_threads": 4,Or update README to 6 (see corresponding README comment).
pyLucXor/peptide.py (1)
7-20: Prune unused imports to reduce noise
[ suggest_nitpick ]
-from typing import Dict, List, Set, Tuple, Optional, Any, Union
+from typing import Dict, List, Tuple, Optional, Any, Union
@@
-import pyopenms
@@
-from .constants import (
- NTERM_MOD, CTERM_MOD, AA_MASSES, DECOY_AA_MAP, AA_DECOY_MAP,
- WATER_MASS, PROTON_MASS, ION_TYPES, NEUTRAL_LOSSES
-)
+from .constants import (
+ NTERM_MOD, CTERM_MOD, AA_MASSES, DECOY_AA_MAP,
+ WATER_MASS, PROTON_MASS
+)pyLucXor/models.py (1)
821-859: Model training builds matched peaks by calling private PSM._match_peaks
This reaches into a private method and duplicates tolerance logic. Consider using the public Peptide.match_peaks with the PSM’s peptide and spectrum to reduce coupling.
[ suggest_optional_refactor ]
Happy to propose a small wrapper on PSM to expose a public matching utility if desired.
| if name == "LucXor": | ||
| from .core import LucXor | ||
| return LucXor | ||
| elif name == "PyLuciPHOr2": |
There was a problem hiding this comment.
getattr exposes non-existent symbol 'LucXor'
.core defines CoreProcessor, not LucXor. Importing LucXor will fail at runtime.
- if name == "LucXor":
- from .core import LucXor
- return LucXor
+ if name == "LucXor":
+ # Expose CoreProcessor under the historical/public name "LucXor"
+ from .core import CoreProcessor as LucXor
+ return LucXorOptionally, you can also expose CoreProcessor explicitly:
+ elif name == "CoreProcessor":
+ from .core import CoreProcessor
+ return CoreProcessor📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if name == "LucXor": | |
| from .core import LucXor | |
| return LucXor | |
| elif name == "PyLuciPHOr2": | |
| if name == "LucXor": | |
| # Expose CoreProcessor under the historical/public name "LucXor" | |
| from .core import CoreProcessor as LucXor | |
| return LucXor | |
| elif name == "CoreProcessor": | |
| from .core import CoreProcessor | |
| return CoreProcessor | |
| elif name == "PyLuciPHOr2": |
🤖 Prompt for AI Agents
In pyLucXor/__init__.py around lines 10 to 13, __getattr__ tries to import and
return a symbol named "LucXor" but core.py defines CoreProcessor instead; change
the import to from .core import CoreProcessor and return CoreProcessor (or, if
you intended the public name to remain LucXor, import CoreProcessor and return
it under the LucXor alias), and optionally add an explicit exposure (e.g., set
LucXor = CoreProcessor) so the module only exposes existing symbols.
| peptide = Peptide(sequence, {}, config, charge=charge) | ||
|
|
||
| # Build ion ladders | ||
| self.logger.info("Building ion ladders...") | ||
| peptide.build_ion_ladders() | ||
|
|
||
| # Create PSM object with spectrum | ||
| self.logger.info("Creating PSM object...") | ||
| psm = PSM(peptide, spectrum, config) | ||
|
|
There was a problem hiding this comment.
Fix Peptide and PSM constructor calls (wrong argument order/arity)
These calls currently pass config into the mod_pep/scan_num positional parameters, breaking object initialization.
- peptide = Peptide(sequence, {}, config, charge=charge)
+ peptide = Peptide(sequence, config, charge=charge)
@@
- psm = PSM(peptide, spectrum, config)
+ psm = PSM(peptide, spectrum, config=config)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| peptide = Peptide(sequence, {}, config, charge=charge) | |
| # Build ion ladders | |
| self.logger.info("Building ion ladders...") | |
| peptide.build_ion_ladders() | |
| # Create PSM object with spectrum | |
| self.logger.info("Creating PSM object...") | |
| psm = PSM(peptide, spectrum, config) | |
| peptide = Peptide(sequence, config, charge=charge) | |
| # Build ion ladders | |
| self.logger.info("Building ion ladders...") | |
| peptide.build_ion_ladders() | |
| # Create PSM object with spectrum | |
| self.logger.info("Creating PSM object...") | |
| psm = PSM(peptide, spectrum, config=config) | |
🤖 Prompt for AI Agents
In pyLucXor/cli.py around lines 296 to 305, the Peptide and PSM constructors are
being called with config passed positionally into parameters that expect
mods/scan_num, breaking initialization; change the calls to pass the mods and
scan_num in their correct positional places (or omit them) and supply config
explicitly as a keyword argument (e.g., Peptide(sequence, mods_dict,
charge=charge, config=config) and PSM(peptide, spectrum, scan_num=...,
config=config) or equivalent according to the actual constructor signatures) so
config is not passed into the wrong positional parameters.
| # Initialize with default configuration | ||
| self.config = DEFAULT_CONFIG.copy() | ||
|
|
There was a problem hiding this comment.
Shallow copy of DEFAULT_CONFIG leaks nested mutations to the global default.
Lists/dicts inside DEFAULT_CONFIG are shared with all instances. Mutating them (e.g., append to target_modifications) will corrupt the module default and other instances.
Apply this diff:
+from copy import deepcopy
@@
- # Initialize with default configuration
- self.config = DEFAULT_CONFIG.copy()
+ # Initialize with a deep copy of the default configuration
+ self.config = deepcopy(DEFAULT_CONFIG)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Initialize with default configuration | |
| self.config = DEFAULT_CONFIG.copy() | |
| from copy import deepcopy | |
| # Initialize with a deep copy of the default configuration | |
| self.config = deepcopy(DEFAULT_CONFIG) |
🤖 Prompt for AI Agents
In pyLucXor/config.py around lines 29 to 31, the code uses DEFAULT_CONFIG.copy()
which performs a shallow copy and leaks nested mutable objects (lists/dicts)
between instances; replace the shallow copy with a deep copy (e.g., import copy
and use copy.deepcopy(DEFAULT_CONFIG)) so each Config instance gets its own
independent nested structures, and ensure the copy module is imported at the top
of the file.
| # Physical constants | ||
| WATER = 18.01056 | ||
| PROTON = 1.00728 No newline at end of file |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Duplicate physical constants (WATER/PROTON) may diverge from precise values.
WATER_MASS/PROTON_MASS are precise, while WATER/PROTON are truncated. To avoid divergence, alias them to the precise constants.
Apply this diff:
-# Physical constants
-WATER = 18.01056
-PROTON = 1.00728
+# Physical constants (aliases for precise values)
+WATER = WATER_MASS
+PROTON = PROTON_MASS📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Physical constants | |
| WATER = 18.01056 | |
| PROTON = 1.00728 | |
| # Physical constants (aliases for precise values) | |
| WATER = WATER_MASS | |
| PROTON = PROTON_MASS |
🤖 Prompt for AI Agents
In pyLucXor/constants.py around lines 197 to 199, the file defines WATER and
PROTON with truncated values while there are precise constants WATER_MASS and
PROTON_MASS elsewhere; replace the hardcoded truncated values by aliasing WATER
= WATER_MASS and PROTON = PROTON_MASS so the two names refer to the single
precise values and cannot drift apart.
| # Process PSM (generate only real permutations, no decoys) | ||
| psm.process_stage2(self.config.to_dict()) | ||
|
|
There was a problem hiding this comment.
Broken API call: PSM has process_round2(...), not process_stage2(...)
Per PSM docs, second-round processing should call process_round2(flr_calculator); current call will raise AttributeError and ignores FLR mapping.
- # Process PSM (generate only real permutations, no decoys)
- psm.process_stage2(self.config.to_dict())
+ # Process PSM (real permutations only) and assign FLR from mapping
+ psm.process_round2(self.flr_calculator)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Process PSM (generate only real permutations, no decoys) | |
| psm.process_stage2(self.config.to_dict()) | |
| # Process PSM (real permutations only) and assign FLR from mapping | |
| psm.process_round2(self.flr_calculator) |
🤖 Prompt for AI Agents
In pyLucXor/core.py around lines 121 to 123, the code calls a non-existent
psm.process_stage2(...) which will raise an AttributeError and omits the FLR
mapping; replace this call with the correct API
psm.process_round2(flr_calculator) and pass the FLR calculator/mapping derived
from the current config (e.g., use an existing self.flr_calculator or construct
one from self.config before the call), ensuring the variable is properly
initialized and documented so process_round2 receives the expected FLR
calculator.
| if seq[i:i+9] == "(Phospho)": | ||
| i += 9 | ||
| elif seq[i:i+9] == "(Oxidation)": | ||
| i += 9 | ||
| else: |
There was a problem hiding this comment.
Incorrect oxidation marker length in _validate_permutation
Same 11-char issue here.
- elif seq[i:i+9] == "(Oxidation)":
- i += 9
+ elif seq[i:i+11] == "(Oxidation)":
+ i += 11📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if seq[i:i+9] == "(Phospho)": | |
| i += 9 | |
| elif seq[i:i+9] == "(Oxidation)": | |
| i += 9 | |
| else: | |
| if seq[i:i+9] == "(Phospho)": | |
| i += 9 | |
| elif seq[i:i+11] == "(Oxidation)": | |
| i += 11 | |
| else: |
🧰 Tools
🪛 Ruff (0.12.2)
525-528: Combine if branches using logical or operator
Combine if branches
(SIM114)
🤖 Prompt for AI Agents
In pyLucXor/psm.py around lines 525 to 529, the branch that advances past the
"(Oxidation)" marker uses i += 9 (same incorrect fixed length as the phospho
case); update that to use the correct length for "(Oxidation)" (or, better,
compute it dynamically with len("(Oxidation)") ) so the index advances by the
full marker length instead of the current wrong value.
| def normalize_spectrum(self) -> None: | ||
| """Normalize spectrum intensities.""" | ||
| if self.spectrum is None: | ||
| self.logger.warning("No spectrum data available for normalization") | ||
| return | ||
|
|
||
| # Get peaks | ||
| mz_values, intensities = self.spectrum.get_peaks() | ||
|
|
||
| if len(intensities) == 0: | ||
| self.logger.warning("No intensity values available for normalization") | ||
| return | ||
|
|
||
| # Check data validity | ||
| valid_mask = np.isfinite(intensities) & (intensities > 0) | ||
| if not np.any(valid_mask): | ||
| self.logger.warning("No valid intensity values found for normalization") | ||
| return | ||
|
|
||
| # Only use valid intensity values | ||
| valid_intensities = intensities[valid_mask] | ||
| valid_mz_values = mz_values[valid_mask] | ||
|
|
||
| # Log intensity statistics | ||
| self.logger.debug(f"Intensity stats before normalization: min={np.min(valid_intensities):.2f}, " | ||
| f"max={np.max(valid_intensities):.2f}, mean={np.mean(valid_intensities):.2f}") | ||
|
|
||
| # Normalize to sum of 1 | ||
| total = np.sum(valid_intensities) | ||
| if total > 0: | ||
| normalized_intensities = valid_intensities / total | ||
|
|
||
| # Update spectrum with normalized values | ||
| self.spectrum.set_peaks(valid_mz_values, normalized_intensities) | ||
|
|
||
| self.logger.debug(f"Normalized {len(normalized_intensities)} intensity values") | ||
| else: | ||
| self.logger.warning("Total intensity is zero, skipping normalization") | ||
|
|
There was a problem hiding this comment.
Fix logger attribute access in normalize_spectrum
self.logger is not defined on PSM; use the module logger. Calls will raise AttributeError if this method is ever invoked.
Apply this diff:
- if self.spectrum is None:
- self.logger.warning("No spectrum data available for normalization")
+ if self.spectrum is None:
+ logger.warning("No spectrum data available for normalization")
@@
- if len(intensities) == 0:
- self.logger.warning("No intensity values available for normalization")
+ if len(intensities) == 0:
+ logger.warning("No intensity values available for normalization")
@@
- if not np.any(valid_mask):
- self.logger.warning("No valid intensity values found for normalization")
+ if not np.any(valid_mask):
+ logger.warning("No valid intensity values found for normalization")
@@
- self.logger.debug(f"Intensity stats before normalization: min={np.min(valid_intensities):.2f}, "
+ logger.debug(f"Intensity stats before normalization: min={np.min(valid_intensities):.2f}, "
f"max={np.max(valid_intensities):.2f}, mean={np.mean(valid_intensities):.2f}")
@@
- self.logger.debug(f"Normalized {len(normalized_intensities)} intensity values")
+ logger.debug(f"Normalized {len(normalized_intensities)} intensity values")
else:
- self.logger.warning("Total intensity is zero, skipping normalization")
+ logger.warning("Total intensity is zero, skipping normalization")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def normalize_spectrum(self) -> None: | |
| """Normalize spectrum intensities.""" | |
| if self.spectrum is None: | |
| self.logger.warning("No spectrum data available for normalization") | |
| return | |
| # Get peaks | |
| mz_values, intensities = self.spectrum.get_peaks() | |
| if len(intensities) == 0: | |
| self.logger.warning("No intensity values available for normalization") | |
| return | |
| # Check data validity | |
| valid_mask = np.isfinite(intensities) & (intensities > 0) | |
| if not np.any(valid_mask): | |
| self.logger.warning("No valid intensity values found for normalization") | |
| return | |
| # Only use valid intensity values | |
| valid_intensities = intensities[valid_mask] | |
| valid_mz_values = mz_values[valid_mask] | |
| # Log intensity statistics | |
| self.logger.debug(f"Intensity stats before normalization: min={np.min(valid_intensities):.2f}, " | |
| f"max={np.max(valid_intensities):.2f}, mean={np.mean(valid_intensities):.2f}") | |
| # Normalize to sum of 1 | |
| total = np.sum(valid_intensities) | |
| if total > 0: | |
| normalized_intensities = valid_intensities / total | |
| # Update spectrum with normalized values | |
| self.spectrum.set_peaks(valid_mz_values, normalized_intensities) | |
| self.logger.debug(f"Normalized {len(normalized_intensities)} intensity values") | |
| else: | |
| self.logger.warning("Total intensity is zero, skipping normalization") | |
| def normalize_spectrum(self) -> None: | |
| """Normalize spectrum intensities.""" | |
| if self.spectrum is None: | |
| logger.warning("No spectrum data available for normalization") | |
| return | |
| # Get peaks | |
| mz_values, intensities = self.spectrum.get_peaks() | |
| if len(intensities) == 0: | |
| logger.warning("No intensity values available for normalization") | |
| return | |
| # Check data validity | |
| valid_mask = np.isfinite(intensities) & (intensities > 0) | |
| if not np.any(valid_mask): | |
| logger.warning("No valid intensity values found for normalization") | |
| return | |
| # Only use valid intensity values | |
| valid_intensities = intensities[valid_mask] | |
| valid_mz_values = mz_values[valid_mask] | |
| # Log intensity statistics | |
| logger.debug(f"Intensity stats before normalization: min={np.min(valid_intensities):.2f}, " | |
| f"max={np.max(valid_intensities):.2f}, mean={np.mean(valid_intensities):.2f}") | |
| # Normalize to sum of 1 | |
| total = np.sum(valid_intensities) | |
| if total > 0: | |
| normalized_intensities = valid_intensities / total | |
| # Update spectrum with normalized values | |
| self.spectrum.set_peaks(valid_mz_values, normalized_intensities) | |
| logger.debug(f"Normalized {len(normalized_intensities)} intensity values") | |
| else: | |
| logger.warning("Total intensity is zero, skipping normalization") |
| def reduce_nl_peak(self, precursor_nl_mass: float = 0.0) -> None: | ||
| """Reduce neutral loss peaks. | ||
|
|
||
| Args: | ||
| precursor_nl_mass: Precursor neutral loss mass | ||
| """ | ||
| if self.spectrum is None: | ||
| self.logger.warning("No spectrum data available for neutral loss reduction") | ||
| return | ||
|
|
||
| # Get peaks | ||
| mz_values, intensities = self.spectrum.get_peaks() | ||
|
|
||
| if len(intensities) == 0: | ||
| self.logger.warning("No intensity values available for neutral loss reduction") | ||
| return | ||
|
|
||
| # Find peaks to reduce | ||
| nl_indices = [] | ||
| for i, mz in enumerate(mz_values): | ||
| # Check if peak is within neutral loss mass window | ||
| if abs(mz - precursor_nl_mass) < 0.5: # Using 0.5 Da window | ||
| nl_indices.append(i) | ||
|
|
||
| if nl_indices: | ||
| # Reduce intensity of neutral loss peaks | ||
| for idx in nl_indices: | ||
| intensities[idx] *= 0.1 # Reduce to 10% of original intensity | ||
|
|
||
| # Update spectrum | ||
| self.spectrum.set_peaks(mz_values, intensities) | ||
|
|
||
| self.logger.debug(f"Reduced {len(nl_indices)} neutral loss peaks") | ||
| else: | ||
| self.logger.debug("No neutral loss peaks found") | ||
|
|
There was a problem hiding this comment.
Fix logger attribute access in reduce_nl_peak
Same self.logger issue here.
- if self.spectrum is None:
- self.logger.warning("No spectrum data available for neutral loss reduction")
+ if self.spectrum is None:
+ logger.warning("No spectrum data available for neutral loss reduction")
@@
- if len(intensities) == 0:
- self.logger.warning("No intensity values available for neutral loss reduction")
+ if len(intensities) == 0:
+ logger.warning("No intensity values available for neutral loss reduction")
@@
- self.logger.debug(f"Reduced {len(nl_indices)} neutral loss peaks")
+ logger.debug(f"Reduced {len(nl_indices)} neutral loss peaks")
else:
- self.logger.debug("No neutral loss peaks found")
+ logger.debug("No neutral loss peaks found")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def reduce_nl_peak(self, precursor_nl_mass: float = 0.0) -> None: | |
| """Reduce neutral loss peaks. | |
| Args: | |
| precursor_nl_mass: Precursor neutral loss mass | |
| """ | |
| if self.spectrum is None: | |
| self.logger.warning("No spectrum data available for neutral loss reduction") | |
| return | |
| # Get peaks | |
| mz_values, intensities = self.spectrum.get_peaks() | |
| if len(intensities) == 0: | |
| self.logger.warning("No intensity values available for neutral loss reduction") | |
| return | |
| # Find peaks to reduce | |
| nl_indices = [] | |
| for i, mz in enumerate(mz_values): | |
| # Check if peak is within neutral loss mass window | |
| if abs(mz - precursor_nl_mass) < 0.5: # Using 0.5 Da window | |
| nl_indices.append(i) | |
| if nl_indices: | |
| # Reduce intensity of neutral loss peaks | |
| for idx in nl_indices: | |
| intensities[idx] *= 0.1 # Reduce to 10% of original intensity | |
| # Update spectrum | |
| self.spectrum.set_peaks(mz_values, intensities) | |
| self.logger.debug(f"Reduced {len(nl_indices)} neutral loss peaks") | |
| else: | |
| self.logger.debug("No neutral loss peaks found") | |
| def reduce_nl_peak(self, precursor_nl_mass: float = 0.0) -> None: | |
| """Reduce neutral loss peaks. | |
| Args: | |
| precursor_nl_mass: Precursor neutral loss mass | |
| """ | |
| if self.spectrum is None: | |
| logger.warning("No spectrum data available for neutral loss reduction") | |
| return | |
| # Get peaks | |
| mz_values, intensities = self.spectrum.get_peaks() | |
| if len(intensities) == 0: | |
| logger.warning("No intensity values available for neutral loss reduction") | |
| return | |
| # Find peaks to reduce | |
| nl_indices = [] | |
| for i, mz in enumerate(mz_values): | |
| # Check if peak is within neutral loss mass window | |
| if abs(mz - precursor_nl_mass) < 0.5: # Using 0.5 Da window | |
| nl_indices.append(i) | |
| if nl_indices: | |
| # Reduce intensity of neutral loss peaks | |
| for idx in nl_indices: | |
| intensities[idx] *= 0.1 # Reduce to 10% of original intensity | |
| # Update spectrum | |
| self.spectrum.set_peaks(mz_values, intensities) | |
| logger.debug(f"Reduced {len(nl_indices)} neutral loss peaks") | |
| else: | |
| logger.debug("No neutral loss peaks found") |
🤖 Prompt for AI Agents
In pyLucXor/psm.py around lines 967 to 1002, the method uses self.logger which
doesn't match the class's logger attribute; replace self.logger with the actual
logger attribute used elsewhere in this class (e.g., self._logger or self.log)
and update all occurrences in this method accordingly; if the class has no
logger attribute, add one in __init__ like self._logger =
logging.getLogger(__name__) and ensure import logging is present, then use that
attribute (and adjust the two warning/debug calls in this method to use it).
| def get_spectrum_peaks(self) -> Tuple[np.ndarray, np.ndarray]: | ||
| """Get spectrum peaks. | ||
|
|
||
| Returns: | ||
| Tuple of (m/z array, intensity array) | ||
| """ | ||
| if self.spectrum is None: | ||
| self.logger.warning("No spectrum data available") | ||
| return np.array([]), np.array([]) | ||
|
|
||
| return self.spectrum.get_peaks() | ||
|
|
There was a problem hiding this comment.
Fix logger attribute access in get_spectrum_peaks
Same self.logger issue again.
- if self.spectrum is None:
- self.logger.warning("No spectrum data available")
+ if self.spectrum is None:
+ logger.warning("No spectrum data available")
return np.array([]), np.array([])📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def get_spectrum_peaks(self) -> Tuple[np.ndarray, np.ndarray]: | |
| """Get spectrum peaks. | |
| Returns: | |
| Tuple of (m/z array, intensity array) | |
| """ | |
| if self.spectrum is None: | |
| self.logger.warning("No spectrum data available") | |
| return np.array([]), np.array([]) | |
| return self.spectrum.get_peaks() | |
| def get_spectrum_peaks(self) -> Tuple[np.ndarray, np.ndarray]: | |
| """Get spectrum peaks. | |
| Returns: | |
| Tuple of (m/z array, intensity array) | |
| """ | |
| if self.spectrum is None: | |
| logger.warning("No spectrum data available") | |
| return np.array([]), np.array([]) | |
| return self.spectrum.get_peaks() |
🤖 Prompt for AI Agents
In pyLucXor/psm.py around lines 1003 to 1014, the method uses self.logger which
is not the correct logger attribute for this class; change the access to the
logger attribute used elsewhere in the class (e.g., self._logger) so the warning
call uses the actual logger instance, leaving the warning message and return
behavior unchanged; if the class uses a different logger property name, use that
exact name instead to keep consistency.
| def clear_scores(self) -> None: | ||
| """ | ||
| Clear PSM scores | ||
| """ | ||
| logger.debug(f"Clearing scores for PSM {self.scan_num}") | ||
|
|
||
| # Clear permutation score mappings | ||
| self.pos_permutation_score_map.clear() | ||
| self.neg_permutation_score_map.clear() | ||
|
|
||
| # Reset score-related attributes | ||
| self.delta_score = np.nan | ||
| self.pep_score = np.nan | ||
| self.global_flr = np.nan | ||
| self.local_flr = np.nan | ||
|
|
There was a problem hiding this comment.
Typo: clearing psm_score uses undefined pep_score
self.pep_score doesn’t exist elsewhere; this should reset self.psm_score.
- self.delta_score = np.nan
- self.pep_score = np.nan
- self.global_flr = np.nan
- self.local_flr = np.nan
+ self.delta_score = np.nan
+ self.psm_score = np.nan
+ self.global_flr = np.nan
+ self.local_flr = np.nan📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def clear_scores(self) -> None: | |
| """ | |
| Clear PSM scores | |
| """ | |
| logger.debug(f"Clearing scores for PSM {self.scan_num}") | |
| # Clear permutation score mappings | |
| self.pos_permutation_score_map.clear() | |
| self.neg_permutation_score_map.clear() | |
| # Reset score-related attributes | |
| self.delta_score = np.nan | |
| self.pep_score = np.nan | |
| self.global_flr = np.nan | |
| self.local_flr = np.nan | |
| def clear_scores(self) -> None: | |
| """ | |
| Clear PSM scores | |
| """ | |
| logger.debug(f"Clearing scores for PSM {self.scan_num}") | |
| # Clear permutation score mappings | |
| self.pos_permutation_score_map.clear() | |
| self.neg_permutation_score_map.clear() | |
| # Reset score-related attributes | |
| self.delta_score = np.nan | |
| self.psm_score = np.nan | |
| self.global_flr = np.nan | |
| self.local_flr = np.nan | |
🤖 Prompt for AI Agents
In pyLucXor/psm.py around lines 1015 to 1030, the clear_scores method resets
self.pep_score, which is a typo because the class uses self.psm_score elsewhere;
change the reset to self.psm_score = np.nan (and remove or do not add pep_score)
so the correct attribute is cleared; ensure no other references to pep_score
remain and run tests to confirm attribute consistency.
There was a problem hiding this comment.
Pull Request Overview
This PR introduces pyLucXor, a complete Python package implementation for phosphosite localization analysis with False Localization Rate (FLR) calculation. The implementation provides both a command-line interface and Python API for processing peptide-spectrum matches (PSMs) to determine the most likely phosphorylation sites on peptides.
Key changes include:
- Complete two-stage PSM processing workflow with statistical model training and FLR estimation
- Comprehensive peptide sequence analysis with fragment ion generation for both CID and HCD fragmentation
- Parallel processing framework supporting multi-threading for improved performance
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| pyLucXor/psm.py | Core PSM class with scoring logic and two-stage processing workflow |
| pyLucXor/models.py | Statistical models for CID/HCD fragmentation with kernel density estimation |
| pyLucXor/peptide.py | Peptide sequence analysis with ion ladder generation and peak matching |
| pyLucXor/flr.py | FLR calculator implementing statistical modeling for false localization rates |
| pyLucXor/spectrum.py | Mass spectrum data processing with normalization and peak handling |
| pyLucXor/core.py | Main processing orchestrator coordinating the two-stage workflow |
| pyLucXor/parallel.py | Parallel processing infrastructure with worker classes |
| pyLucXor/peak.py | Peak representation for mass spectrometry data |
| pyLucXor/globals.py | Global state management utilities |
| pyLucXor/constants.py | Physical constants and configuration parameters |
| pyLucXor/config.py | Configuration management with validation |
Comments suppressed due to low confidence (1)
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| """ | ||
| for i in range(self.N): | ||
| if abs(self.mz[i] - mz) < 1e-6: # floating point comparison | ||
| return i |
There was a problem hiding this comment.
The hardcoded tolerance value 1e-6 for m/z comparison should be configurable or defined as a named constant to improve maintainability and allow for different precision requirements.
| return i | |
| for i in range(self.N): | |
| if abs(self.mz[i] - mz) < self.MZ_TOLERANCE: # floating point comparison | |
| return i |
|
|
||
|
|
||
| if is_decoy: | ||
| tolerance = base_tolerance * 2.0 # decoyPadding = 2.0 |
There was a problem hiding this comment.
The hardcoded decoy padding multiplier 2.0 should be defined as a named constant to improve code maintainability and make the algorithm parameters more explicit.
| tolerance = base_tolerance * 2.0 # decoyPadding = 2.0 | |
| tolerance = base_tolerance * DECOY_PADDING_MULTIPLIER # decoyPadding |
| for i in range(self.spectrum.N): | ||
| if abs(self.spectrum.mz[i] - nl_mz) < 0.5: # Use 0.5 Da window | ||
| # Reduce intensity to 10% of original | ||
| self.spectrum.raw_intensity[i] *= 0.1 |
There was a problem hiding this comment.
The hardcoded neutral loss intensity reduction factor 0.1 should be defined as a named constant or made configurable to improve maintainability.
| self.spectrum.raw_intensity[i] *= 0.1 | |
| factor = self.config.get('neutral_loss_intensity_factor', DEFAULT_NEUTRAL_LOSS_INTENSITY_FACTOR) | |
| self.spectrum.raw_intensity[i] *= factor |
| # Check data validity | ||
| valid_mask = np.isfinite(intensities) & (intensities > 0) | ||
| if not np.any(valid_mask): | ||
| self.logger.warning("No valid intensity values found for normalization") |
There was a problem hiding this comment.
The method accesses self.logger which is not defined. It should use the module-level logger instead.
| self.logger.warning("No valid intensity values found for normalization") | |
| logger.warning("No spectrum data available for normalization") | |
| return | |
| # Get peaks | |
| mz_values, intensities = self.spectrum.get_peaks() | |
| if len(intensities) == 0: | |
| logger.warning("No intensity values available for normalization") | |
| return | |
| # Check data validity | |
| valid_mask = np.isfinite(intensities) & (intensities > 0) | |
| if not np.any(valid_mask): | |
| logger.warning("No valid intensity values found for normalization") |
| mz_values, intensities = self.spectrum.get_peaks() | ||
|
|
||
| if len(intensities) == 0: | ||
| self.logger.warning("No intensity values available for neutral loss reduction") |
There was a problem hiding this comment.
The method accesses self.logger which is not defined. It should use the module-level logger instead.
| self.logger.warning("No intensity values available for neutral loss reduction") | |
| logger.warning("No spectrum data available for neutral loss reduction") | |
| return | |
| # Get peaks | |
| mz_values, intensities = self.spectrum.get_peaks() | |
| if len(intensities) == 0: | |
| logger.warning("No intensity values available for neutral loss reduction") |
| Tuple of (m/z array, intensity array) | ||
| """ | ||
| if self.spectrum is None: | ||
| self.logger.warning("No spectrum data available") |
There was a problem hiding this comment.
The method accesses self.logger which is not defined. It should use the module-level logger instead.
| self.logger.warning("No spectrum data available") | |
| logger.warning("No spectrum data available") |
| """ | ||
| Check if sequence is a decoy sequence | ||
| """ | ||
| return any(aa.islower() for aa in seq) |
There was a problem hiding this comment.
The decoy sequence detection logic using lowercase letters is fragile and could lead to false positives. Consider using a more explicit decoy marking mechanism or documenting this assumption clearly.
| return any(aa.islower() for aa in seq) | |
| Check if sequence is a decoy sequence. | |
| Uses explicit decoy symbol from get_decoy_symbol(). | |
| """ | |
| decoy_symbol = get_decoy_symbol() | |
| return decoy_symbol in seq |
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| # Default thread count |
There was a problem hiding this comment.
[nitpick] The default thread count should ideally be based on system capabilities rather than a hardcoded value, or at least document why 4 was chosen as the default.
| # Default thread count | |
| # Default thread count | |
| # Fallback to 4 threads if os.cpu_count() is unavailable. | |
| # 4 is chosen as a reasonable default for most systems to balance parallelism and resource usage. |
| # Limit negative distribution size | ||
| limit_n = nb + ny | ||
| if limit_n < 50000: # constants.MIN_NUM_NEG_PKS | ||
| limit_n += 50000 |
There was a problem hiding this comment.
The hardcoded value 50000 should be imported from constants module as indicated by the comment, rather than using a magic number.
| limit_n += 50000 | |
| if limit_n < MIN_NUM_NEG_PKS: # constants.MIN_NUM_NEG_PKS | |
| limit_n += MIN_NUM_NEG_PKS |
| for i in range(self.n_real): | ||
| x = self.pos[i] | ||
| if x < 0.1: | ||
| x = 0.1 |
There was a problem hiding this comment.
The hardcoded minimum delta score threshold 0.1 should use the instance variable self.min_delta_score instead of a magic number for consistency.
| x = 0.1 | |
| if x < self.min_delta_score: | |
| x = self.min_delta_score |
Initial commit pyLucXor
Initial commit pyLucXor
PR Type
Enhancement, Documentation
Description
• Complete implementation of pyLucXor, a Python package for peptide-spectrum match (PSM) analysis and modification site localization
• Core PSM processing engine with two-stage workflow: Stage 1 for FLR estimation with decoys, Stage 2 for final scoring
• Statistical models for CID and HCD fragmentation scoring with kernel density estimation and parallel training
• Comprehensive peptide sequence analysis with fragment ion generation (b-ions, y-ions) and neutral loss handling
• False localization rate (FLR) calculation using statistical modeling and kernel density estimation
• Parallel processing framework with configurable thread pools for improved performance
• Command-line interface with extensive configuration options and multi-threading support
• Mass spectrum data processing with peak normalization and intensity calculations
• Configuration management system with validation and default value handling
• Complete documentation including installation, usage examples, and API reference
Diagram Walkthrough
File Walkthrough
10 files
psm.py
Complete PSM class implementation with scoring and FLR calculationpyLucXor/psm.py
• Implements PSM (Peptide-Spectrum Match) class with comprehensive
functionality for handling peptide-spectrum matches
• Provides methods
for processing PSMs, generating permutations, scoring, and calculating
delta scores and FLR values
• Includes support for modification
handling, spectrum matching, and both CID/HCD fragmentation methods
•
Implements two-stage processing workflow with decoy generation for FLR
estimation
models.py
Statistical models for CID and HCD fragmentation scoringpyLucXor/models.py
• Implements CID and HCD statistical models (
CIDModel,HCDModel) forscoring peptide-spectrum matches
• Provides model data classes
(
ModelData_CID,ModelData_HCD) with intensity and distance parametercalculations
• Includes non-parametric density estimation using kernel
density estimation with NumPy vectorization
• Supports parallel model
training using ThreadPoolExecutor for improved performance
parallel.py
Parallel processing framework for PSM analysispyLucXor/parallel.py
• Implements parallel processing infrastructure with worker classes
for different tasks
• Provides
ScoringWorker,PSMProcessingWorker,SpectrumMatchingWorkerfor concurrent execution• Includes utility
functions for optimal thread count calculation and generic parallel
processing
• Supports batch processing of PSMs with configurable
thread pools
core.py
Core processing engine for PSM workflow orchestrationpyLucXor/core.py
• Implements
CoreProcessorclass as main orchestrator for PSMprocessing workflow
• Provides two-stage processing: Stage 1 for FLR
estimation with decoys, Stage 2 for final scoring
• Includes methods
for result writing in both idXML and CSV formats
• Integrates FLR
calculation and PSM score management
peptide.py
Core peptide sequence analysis and fragment ion generationpyLucXor/peptide.py
• Implements comprehensive Peptide class for peptide sequence
representation and analysis
• Provides fragment ion generation (b-ions
and y-ions) with neutral loss handling
• Includes permutation
generation for modification site localization
• Supports both target
and decoy sequence processing with FLR calculation
flr.py
False localization rate calculation and statistical modelingpyLucXor/flr.py
• Implements FLRCalculator class for false localization rate
estimation
• Uses kernel density estimation for statistical modeling
of delta scores
• Provides both global and local FLR calculations with
minorization
• Supports two-round FLR calculation workflow with
mapping persistence
cli.py
Command-line interface and main processing workflowpyLucXor/cli.py
• Implements complete command-line interface for pyLuciPHOr2 tool
•
Provides comprehensive argument parsing with extensive configuration
options
• Implements two-stage processing workflow with model training
and FLR calculation
• Includes multi-threading support and file I/O
handling for mzML and idXML formats
spectrum.py
Mass spectrum data representation and processingpyLucXor/spectrum.py
• Implements Spectrum class for mass spectrometry data handling
•
Provides peak intensity normalization and relative intensity
calculations
• Includes median-based normalization for spectral data
processing
• Supports peak finding and indexing operations
peak.py
Mass spectrometry peak representation and matchingpyLucXor/peak.py
• Implements Peak class representing individual mass spectrometry
peaks
• Provides peak matching information and scoring attributes
•
Includes serialization methods for data persistence
• Supports peak
comparison and hash operations
globals.py
Global state management and utility functionspyLucXor/globals.py
• Implements global variables management for PSM processing
• Provides
FLR estimate recording and assignment functionality
• Includes decoy
symbol mapping utilities
• Supports PSM score clearing for multi-round
processing
3 files
constants.py
Constants and configuration definitions for mass spectrometrypyLucXor/constants.py
• Defines comprehensive constants including amino acid masses,
modification masses, and decoy mappings
• Provides algorithm types,
physical constants, and default configuration parameters
• Includes
ion types, neutral losses, and score type definitions
• Sets up decoy
amino acid mapping with mass calculations
__init__.py
Package initialization with lazy loading and API exportspyLucXor/init.py
• Implements package initialization with lazy loading using
getattr• Defines package version and exports main classes
•
Provides clean API access to core functionality
• Sets up module-level
imports for
LucXor,PyLuciPHOr2,Peptide,PSM, and model classesconfig.py
Configuration management and parameter handlingpyLucXor/config.py
• Implements LucXorConfig class for configuration management
•
Provides dictionary-like interface for parameter access
• Includes
configuration validation and default value handling
• Supports dynamic
configuration updates
1 files
README.md
Complete package documentation and user guidepyLucXor/README.md
• Provides comprehensive documentation for pyLucXor package
• Includes
installation instructions, usage examples, and API documentation
•
Documents algorithm workflow, configuration options, and output
formats
• Contains performance notes, dependencies, and contribution
guidelines
Summary by CodeRabbit
New Features
Documentation