⚡ Bolt: [performance improvement] Replace slow pandas iterrows() with itertuples() or to_dict() by alinelena · Pull Request #565 · ddmms/ml-peg

alinelena · 2026-05-23T06:39:29Z

💡 What:
Replaced the use of pandas.DataFrame.iterrows() with df.itertuples(index=False, name=None) and df.to_dict('records') across the bulk_crystal and conformers calculation modules.

🎯 Why:
iterrows() is a known pandas performance anti-pattern. It builds a new pd.Series object for every row, which adds substantial type-checking and memory-allocation overhead. This is especially costly in tight loops during benchmark setup and parsing. Iterating over native Python tuples (itertuples) or dictionaries (to_dict) avoids the per-row Series construction and is considerably faster.
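
As a minimal sketch of the three iteration styles named above (the DataFrame and its column names `name` and `energy` are hypothetical, not taken from the ml-peg modules), the following shows that each style yields the same values per row:

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({"name": ["a", "b", "c"], "energy": [1.0, 2.5, -0.5]})

# Original pattern: every row is materialised as a pd.Series.
via_iterrows = [(row["name"], row["energy"]) for _, row in df.iterrows()]

# Plain tuples, unpacked positionally in column order.
via_itertuples = [
    (name, energy) for name, energy in df.itertuples(index=False, name=None)
]

# Plain dicts keyed by column name.
via_records = [(rec["name"], rec["energy"]) for rec in df.to_dict("records")]

print(via_iterrows == via_itertuples == via_records)
```

Note that with `name=None` the tuples are unpacked by position, so the loop depends on the column order of the DataFrame, whereas `to_dict('records')` keeps access by column name.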

📊 Impact:
Iteration over the affected dataframes is expected to be roughly 10x to 50x faster. Overall benchmark runtime is still dominated by the MLIP calculations, so the gain is confined to the parsing/setup phases, which should run with noticeably lower CPU overhead and memory footprint.

🔬 Measurement:
This can be verified by running the tests (e.g., test_solvmpconf196) and profiling the time spent in the setup phases before the heavy calculator inference. The loops read the same index and column values as before, so behavior should be functionally identical.
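
One way to check the iteration cost in isolation is a small `timeit` comparison. This is only a sketch under stated assumptions: the synthetic DataFrame, its columns `x` and `y`, and the helper functions are hypothetical and are not part of the ml-peg test suite. Actual ratios will depend on the pandas version, row count, and dtypes.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for a benchmark table (hypothetical columns).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.random(2_000), "y": rng.random(2_000)})


def sum_iterrows(frame):
    """Accumulate x*y using iterrows (one pd.Series per row)."""
    total = 0.0
    for _, row in frame.iterrows():
        total += row["x"] * row["y"]
    return total


def sum_itertuples(frame):
    """Accumulate x*y using plain positional tuples."""
    total = 0.0
    for x, y in frame.itertuples(index=False, name=None):
        total += x * y
    return total


def sum_records(frame):
    """Accumulate x*y using a list of per-row dicts."""
    total = 0.0
    for rec in frame.to_dict("records"):
        total += rec["x"] * rec["y"]
    return total


# Time each variant on the same frame; the results are printed, not asserted.
timings = {
    fn.__name__: timeit.timeit(lambda fn=fn: fn(df), number=3)
    for fn in (sum_iterrows, sum_itertuples, sum_records)
}
for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

All three helpers compute the same sum, so the comparison isolates the iteration method rather than the work done per row.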

PR created automatically by Jules for task 7457440406182944128 started by @alinelena

Refactored multiple loops across the codebase that were previously using `pd.DataFrame.iterrows()` to use more efficient iteration methods: `df.itertuples(index=False, name=None)` and `df.to_dict('records')`. `iterrows()` is notoriously slow due to the overhead of creating a `pd.Series` object for every single row. By switching to tuples or native dictionaries, the inner loops execute significantly faster.

Co-authored-by: alinelena <3306823+alinelena@users.noreply.github.com>

google-labs-jules · 2026-05-23T06:39:31Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

alinelena wants to merge 1 commit into main from bolt/optimize-pandas-iteration-7457440406182944128
