Skip to content

Chores#2

Merged
Xmaster6y merged 5 commits intomainfrom
chores
Aug 14, 2025
Merged

Chores#2
Xmaster6y merged 5 commits intomainfrom
chores

Conversation

@Xmaster6y
Copy link
Copy Markdown
Owner

@Xmaster6y Xmaster6y commented Aug 14, 2025

What does this PR do?

Various chores.

Linked Issues

  • Closes #?
  • #?

Summary by Sourcery

Revise benchmark configurations, broaden Python compatibility, and introduce a benchmark testing harness

Enhancements:

  • Adjust default parameters and impact ranges for GPT2 and Integrated Gradients benchmark tasks and tidy task ordering in get_stats

Build:

  • Lower Python requirement to >=3.10 and add Python 3.10 classifier

CI:

  • Add Python 3.10 to CI workflow matrix

Tests:

  • Add scripts/bench/test_stats.py and SLURM launch script for automated benchmark testing

Chores:

  • Increase SLURM time limit for get-stats.sh

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Aug 14, 2025

Reviewer's Guide

This PR bundles maintenance updates across benchmark scripts, packaging requirements, CI configurations, scheduler settings, and core code comments, and introduces a new test harness for benchmark statistics.

Entity relationship diagram for TASKS dictionary reordering in get_stats.py

erDiagram
  TASKS {
    string gpt2_cache
    string integrated_gradients
    string lrp
    string mlp_intervene
  }
  gpt2_cache ||--o{ tdhook : uses
  gpt2_cache ||--o{ nnsight : uses
  gpt2_cache ||--o{ transformer_lens : uses
  integrated_gradients ||--o{ tdhook : uses
  integrated_gradients ||--o{ captum_add : uses
  integrated_gradients ||--o{ captum : uses
  lrp ||--o{ tdhook : uses
  lrp ||--o{ zennit : uses
  mlp_intervene ||--o{ tdhook : uses
  mlp_intervene ||--o{ nnsight : uses
Loading

Class diagram for impact_parameters and default_parameters changes in gpt2_cache and integrated_gradients

classDiagram
  class GPT2CacheParameters {
    +model_size: ["gpt2", "gpt2-medium", "gpt2-large"]
    +batch_size: [10, 50, 100, 500]
    +variation: ["all"]
  }
  class GPT2CacheDefaults {
    +model_size: "gpt2"
    +batch_size: 50
    +variation: "all"
  }
  class IntegratedGradientsParameters {
    +width: [100, 1000, 5000, 10000]
    +height: [5, 10, 20, 50]
    +batch_size: [10, 100, 1000, 5000]
    +variation: ["multiply", "no-multiply"]
  }
Loading

File-Level Changes

Change Details Files
Refined benchmark task parameters
  • Reduced GPT2 model sizes and adjusted its batch size and variation key
  • Corrected batch size values and formatting in integrated_gradients task
scripts/bench/tasks/gpt2_cache/__init__.py
scripts/bench/tasks/integrated_gradients/__init__.py
Reordered TASKS mapping for stats collection
  • Moved gpt2_cache to the top of TASKS
  • Swapped positions of integrated_gradients and lrp entries
scripts/bench/get_stats.py
Broadened Python version support
  • Lowered required Python minimum from 3.11 to 3.10
  • Added Python 3.10 classifier
  • Expanded CI matrix to include 3.10
pyproject.toml
.github/workflows/ci.yml
Tweaked scheduler timeouts
  • Increased SBATCH runtime limit for get-stats.sh from 1h to 10h
scripts/bench/launch/get-stats.sh
Added benchmark stats test harness
  • Created test_stats.py to automate benchmarking runs and save results
  • Introduced test-stats.sh SBATCH launcher script
  • Added uv.lock for dependency consistency
scripts/bench/test_stats.py
scripts/bench/launch/test-stats.sh
uv.lock
Annotated code with TODOs for future enhancements
  • Added TODO to move version metadata to packaging
  • Inserted TODO to support kwargs in gradient attribution hook
src/tdhook/__init__.py
src/tdhook/attribution/gradient_attribution/common.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • SEEDS and TASKS are duplicated in get_stats.py and test_stats.py—consider centralizing these constants to reduce maintenance overhead.
  • The numeric literals "10_00" and "50_00" in the integrated_gradients impact_parameters are unconventional—prefer standard grouping like 1_000 and 5_000 for clarity.
  • There are lingering TODOs for version metadata and init_target kwargs in tdhook—address these or track them as issues to prevent accumulating technical debt.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- SEEDS and TASKS are duplicated in get_stats.py and test_stats.py—consider centralizing these constants to reduce maintenance overhead.
- The numeric literals "10_00" and "50_00" in the integrated_gradients impact_parameters are unconventional—prefer standard grouping like 1_000 and 5_000 for clarity.
- There are lingering TODOs for version metadata and init_target kwargs in tdhook—address these or track them as issues to prevent accumulating technical debt.

## Individual Comments

### Comment 1
<location> `scripts/bench/tasks/integrated_gradients/__init__.py:11` </location>
<code_context>

 impact_parameters = {
-    "model_size": ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"],
-    "batch_size": [100, 1000, 10_000, 100_000],
-    "variations": ["all", "specific"],
+    "model_size": ["gpt2", "gpt2-medium", "gpt2-large"],
</code_context>

<issue_to_address>
Possible typo in batch_size values: '10_00' and '50_00' should likely be '1000' and '5000'.

Please update the batch size values to '1000' and '5000' to avoid confusion and ensure consistency.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread scripts/bench/tasks/integrated_gradients/__init__.py
@Xmaster6y Xmaster6y merged commit de3f5b7 into main Aug 14, 2025
3 checks passed
@Xmaster6y Xmaster6y deleted the chores branch August 14, 2025 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant