Pdex 0.2.0#70

Merged
noamteyssier merged 41 commits into main from pdex-0.2.0 on Feb 25, 2026

@noamteyssier
Collaborator

close #67
close #66
close #65
close #52
close #51
close #46
close #18
close #9

@gemini-code-assist
Contributor

Summary of Changes

Hello @noamteyssier, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces pdex version 0.2.0, representing a significant architectural overhaul of the library. The primary goal was to simplify the public API, enhance performance through Numba-accelerated mathematical operations, and improve maintainability. The changes streamline differential expression analysis in single-cell genomics by providing a unified function with flexible comparison modes, while also updating the development environment and documentation to support this new direction.

Highlights

  • Major API Refactor: The core functionality has been consolidated into a single pdex function, replacing the previous parallel_differential_expression and pseudobulk_dex functions. This new function supports three distinct comparison modes: 'ref' (each group vs. a single reference), 'all' (each group vs. all remaining cells), and 'on_target' (each group vs. its specific target gene).
  • Performance Enhancements: Mathematical operations, including fold change, percent change, pseudobulk calculations, and Mann-Whitney U tests, have been centralized in a new _math.py module and heavily optimized using Numba JIT compilation. This leverages the numba-mwu library for accelerated statistical tests.
  • Dependency and Environment Updates: The project now requires Python 3.11 or higher. Key dependencies like anndata, numba, pandas, polars, pyarrow, scipy, and tqdm have been updated to newer versions. Old dependencies such as adpbulk and pydeseq2 have been removed, reflecting the streamlined architecture.
  • Improved Documentation and AI Guidance: The README.md has been completely rewritten to reflect the new API and usage patterns. A new CLAUDE.md file has been added to provide specific guidance for AI code assistants, ensuring better understanding and interaction with the repository's structure and logic.
  • Streamlined Testing Framework: The testing suite has been overhauled, removing old integration and benchmarking tests (tests/bench_expr.py, tests/test_integration.py, tests/test_parallel.py, tests/test_pbdex.py). New, more focused tests have been introduced (tests/conftest.py, tests/test_internals.py, tests/test_math.py, tests/test_pdex.py) to validate the new pdex function and its internal components, including sparse data handling and geometric mean calculations.
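
To make the 'ref' and 'all' comparison modes concrete, here is a minimal sketch on synthetic data. It does not call pdex itself: the `compare` helper, the group labels, and the single-gene layout are hypothetical, and SciPy's `mannwhitneyu` stands in for the numba-mwu implementation. The 'on_target' mode is left out because it needs a group-to-gene mapping that is not described here.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical toy data: 90 cells, one gene, three groups of 30 cells each.
groups = np.repeat(["control", "A", "B"], 30)
# Group "A" is given a higher mean expression than the other two groups.
expr = rng.poisson(lam=np.where(groups == "A", 8.0, 3.0)).astype(float)


def compare(groups, expr, mode, reference="control"):
    """Return {group: p-value} for one gene under the chosen comparison mode."""
    results = {}
    for g in np.unique(groups):
        if mode == "ref":
            # Each group vs. a single reference group; the reference is skipped.
            if g == reference:
                continue
            other = expr[groups == reference]
        elif mode == "all":
            # Each group vs. all remaining cells.
            other = expr[groups != g]
        else:
            raise ValueError(f"unsupported mode in this sketch: {mode!r}")
        test = mannwhitneyu(expr[groups == g], other, alternative="two-sided")
        results[str(g)] = float(test.pvalue)
    return results


ref_pvals = compare(groups, expr, "ref")
all_pvals = compare(groups, expr, "all")
print(ref_pvals)
print(all_pvals)
```

In 'ref' mode the reference group produces no row of its own, while in 'all' mode every group, including the control, is tested against the cells outside it.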

Changelog
  • .python-version
    • Updated the pinned Python version from 3.10 to 3.13.
  • CLAUDE.md
    • Added a new file providing detailed guidance for Claude Code AI on project overview, commands, architecture, performance design, output schema, public API, and dependencies.
  • README.md
    • Rewrote the entire README to reflect the new pdex API, its modes, parameters, and output format.
    • Removed sections detailing the old parallel_differential_expression function, backed vs. in-memory AnnData handling, and various statistical metrics (Wilcoxon, Anderson-Darling, T-Test).
  • pyproject.toml
    • Updated the project version from 0.1.28 to 0.2.0.
    • Increased the minimum required Python version from 3.10 to 3.11.
    • Updated various dependencies to newer versions, including anndata, numba, numpy, pandas, polars, pyarrow, scipy, and tqdm.
    • Removed adpbulk and pydeseq2 from core dependencies.
    • Added numba-mwu as a new dependency.
    • Updated dev dependencies, removing pyright and pytest-benchmark, and adding ty.
  • ruff.toml
    • Removed the ruff configuration file.
  • src/pdex/__init__.py
    • Rewrote the module to introduce a single pdex entry point function.
    • Implemented core logic for validating inputs, identifying reference groups, building group-gene maps, and isolating matrix data.
    • Added support for 'ref', 'all', and 'on_target' comparison modes.
    • Integrated logging and warning mechanisms for is_log1p auto-detection.
    • Removed previous pseudobulk_dex and parallel_differential_expression imports and logic.
  • src/pdex/_math.py
    • Added a new module to centralize Numba JIT-compiled mathematical functions.
    • Included _log1p_col_mean, _expm1_vec, _expm1_vec_mean for efficient array operations.
    • Implemented bulk_matrix_arithmetic, pseudobulk, and bulk_matrix_geometric for pseudobulk calculations.
    • Provided fold_change and percent_change functions.
    • Added an mwu dispatcher for Mann-Whitney U tests, supporting sparse and dense inputs.
  • src/pdex/_parallel.py
    • Removed the entire module, indicating a shift away from its previous parallelization and Wilcoxon ranksum implementations.
  • src/pdex/_pseudobulk.py
    • Removed the entire module, deprecating the pseudobulk_dex function and its dependencies on adpbulk and pydeseq2.
  • src/pdex/_single_cell.py
    • Removed the entire module, deprecating the parallel_differential_expression function.
  • src/pdex/_utils.py
    • Replaced the guess_is_log function with _detect_is_log1p for improved log1p transformation detection.
    • Added set_numba_threadpool to configure Numba's global thread pool.
  • tests/bench_expr.py
    • Removed the benchmarking test file.
  • tests/conftest.py
    • Added a new conftest file to provide shared pytest fixtures.
    • Included fixtures for synthetic AnnData objects, covering dense, sparse, on-target, log1p-transformed, and backed data scenarios.
  • tests/test_integration.py
    • Removed the integration test file.
  • tests/test_internals.py
    • Added a new test file for internal helper functions.
    • Included tests for _validate_groupby, _identify_reference_index, and _unique_groups.
  • tests/test_math.py
    • Added a new test file for the _math.py module.
    • Included tests for fold_change, percent_change, and bulk_matrix_geometric.
  • tests/test_parallel.py
    • Removed the test file for the _parallel.py module.
  • tests/test_pbdex.py
    • Removed the test file for the pseudobulk_dex function.
  • tests/test_pdex.py
    • Rewrote the test file to focus on the new pdex function.
    • Added comprehensive tests for 'ref', 'all', and 'on_target' modes, including validation, output structure, membership counts, p-value ranges, and agreement with SciPy statistics.
    • Included tests for sparse data, log1p handling, geometric mean calculations, and backed AnnData objects.
  • tests/test_utils.py
    • Updated tests to reflect changes in _utils.py, specifically for set_numba_threadpool.
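
The pseudobulk, fold change, and percent change entries under src/pdex/_math.py can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the module's code: the function names below are hypothetical, fold change is assumed to be the ratio of group means, percent change is assumed to be the relative difference, and the geometric variant is assumed to average in log1p space and map back with expm1 (suggested only by the helper names `_log1p_col_mean` and `_expm1_vec`). The real functions are Numba JIT-compiled and may differ.

```python
import numpy as np


def arithmetic_bulk_sketch(matrix):
    """Per-gene arithmetic mean over cells (rows are cells, columns are genes)."""
    return matrix.mean(axis=0)


def geometric_bulk_sketch(matrix):
    """Per-gene mean taken in log1p space, then mapped back with expm1."""
    return np.expm1(np.log1p(matrix).mean(axis=0))


def fold_change_sketch(target_mean, reference_mean):
    """Assumed definition: ratio of target mean to reference mean."""
    return target_mean / reference_mean


def percent_change_sketch(target_mean, reference_mean):
    """Assumed definition: relative difference with respect to the reference."""
    return (target_mean - reference_mean) / reference_mean


# Hypothetical counts: two cells by two genes per group.
reference = np.array([[1.0, 3.0], [3.0, 3.0]])
target = np.array([[4.0, 3.0], [8.0, 3.0]])

ref_mean = arithmetic_bulk_sketch(reference)  # [2.0, 3.0]
tgt_mean = arithmetic_bulk_sketch(target)     # [6.0, 3.0]

fc = fold_change_sketch(tgt_mean, ref_mean)     # [3.0, 1.0]
pc = percent_change_sketch(tgt_mean, ref_mean)  # [2.0, 0.0]
ref_geo = geometric_bulk_sketch(reference)      # [sqrt(8) - 1, 3.0]
print(fc, pc, ref_geo)
```

The geometric value never exceeds the arithmetic mean, so the choice between the two changes the reported effect size for genes with skewed counts.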
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci.yml
Activity
  • The pull request title 'Pdex 0.2.0' indicates a new version release.
  • The pull request description lists multiple close # references, suggesting that this update addresses several previously identified issues or feature requests.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a major refactoring and version bump to 0.2.0. The core logic has been completely rewritten into a more streamlined API with the pdex() function and its different modes (ref, all, on_target). The implementation is simplified, removing the pydeseq2 dependency in favor of numba-mwu for Mann-Whitney U tests, and replacing the manual shared-memory multiprocessing with Numba's threading capabilities. The documentation and test suite have also been significantly improved. My review identifies a critical issue with dependency versions in pyproject.toml that will prevent installation, and some minor documentation inconsistencies regarding FDR calculation that could be clarified for users.

Comment thread pyproject.toml
Comment on lines +9 to +17
"anndata>=0.12.10",
"numba>=0.64.0",
"numba-mwu>=0.1.1",
"numpy>=2.4.2",
"pandas>=2.3.3",
"polars>=1.38.1",
"pyarrow>=23.0.1",
"scipy>=1.17.1",
"tqdm>=4.67.3",
Contributor

critical

The version specifiers for several dependencies appear to be for future, unreleased versions. For example, numpy>=2.4.2, pandas>=2.3.3, polars>=1.38.1, anndata>=0.12.10, etc., are not available on PyPI. This will cause installation to fail for users.

Please correct these to valid, available versions. You might want to use the latest stable versions or a less restrictive specifier like >= with a known good version (e.g., numpy>=2.0.0).

Suggested change
- "anndata>=0.12.10",
- "numba>=0.64.0",
- "numba-mwu>=0.1.1",
- "numpy>=2.4.2",
- "pandas>=2.3.3",
- "polars>=1.38.1",
- "pyarrow>=23.0.1",
- "scipy>=1.17.1",
- "tqdm>=4.67.3",
+ "anndata>=0.10.0",
+ "numba>=0.59.0",
+ "numba-mwu>=0.1.1",
+ "numpy>=2.0.0",
+ "pandas>=2.2.0",
+ "polars>=0.20.0",
+ "pyarrow>=16.0.0",
+ "scipy>=1.13.0",
+ "tqdm>=4.66.0",
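
The concern in this comment comes down to how a `>=` lower bound is evaluated: if no published release is at least as new as the bound, the installer has nothing to pick. A minimal sketch of that check follows, using plain tuple comparison on numeric releases only. The helper names and the "newest available" version are hypothetical, chosen to illustrate the comment's scenario rather than to state what is actually on PyPI; real installers apply the full Python version specifier rules.

```python
def parse_version(text):
    """Turn '2.4.2' into (2, 4, 2). Handles plain numeric releases only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_lower_bound(requirement, available):
    """Check an available version against a 'name>=X.Y.Z' requirement string."""
    _name, minimum = requirement.split(">=")
    return parse_version(available) >= parse_version(minimum)


# Hypothetical scenario: assume the newest installable numpy release is 2.1.0.
newest_available = "2.1.0"

checks = {
    req: satisfies_lower_bound(req, newest_available)
    for req in ("numpy>=2.4.2", "numpy>=2.0.0")
}
print(checks)
```

Under that assumption the stricter bound cannot be met while the looser one can, which is the trade-off between the two sides of the suggested change.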

Comment thread CLAUDE.md Outdated
Comment thread README.md Outdated
noamteyssier and others added 2 commits February 25, 2026 13:50
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@noamteyssier noamteyssier merged commit 3284e73 into main Feb 25, 2026
14 checks passed
@noamteyssier noamteyssier deleted the pdex-0.2.0 branch February 25, 2026 21:51