
⚡ Bolt: [performance improvement] Replace slow pandas iterrows() with itertuples() or to_dict()#565

Open
alinelena wants to merge 1 commit into main from bolt/optimize-pandas-iteration-7457440406182944128


@alinelena
Collaborator

💡 What:
Replaced the use of pandas.DataFrame.iterrows() with df.itertuples(index=False, name=None) and df.to_dict('records') across the bulk_crystal and conformers calculation modules.
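A minimal sketch of the substitution, using a hypothetical two-column frame (the real column names in the bulk_crystal and conformers modules are not shown in this PR). All three loops collect the same values; only the way each row is accessed changes:

```python
import pandas as pd

# Hypothetical frame; "name" and "energy" are illustrative column names only.
df = pd.DataFrame({"name": ["a", "b", "c"], "energy": [-1.5, -2.0, -0.5]})

# Before: iterrows() yields (index, pd.Series) pairs; columns accessed by label.
before = [(row["name"], row["energy"]) for _, row in df.iterrows()]

# After, option 1: plain tuples in column order, unpacked positionally.
# index=False drops the row index; name=None returns tuples instead of namedtuples.
after_tuples = [(name, energy) for name, energy in df.itertuples(index=False, name=None)]

# After, option 2: one plain dict per row, so access by column name is kept.
after_records = [(rec["name"], rec["energy"]) for rec in df.to_dict("records")]

assert before == after_tuples == after_records
```

`itertuples(index=False, name=None)` suits loops that unpack a fixed set of columns; `to_dict("records")` keeps label-based access at the cost of building the whole list of dicts up front.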

🎯 Why:
iterrows() is a well-known pandas performance anti-pattern: it constructs a new pd.Series for every row, which adds per-row object-allocation and dtype-inference overhead. This matters most in tight loops such as those in benchmark setup and parsing. Iterating over plain Python tuples (itertuples) or dictionaries (to_dict) avoids that per-row Series construction and is substantially faster.
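A quick way to see which per-row object each method produces; the natoms/energy frame is illustrative only:

```python
import pandas as pd

# Illustrative frame, not taken from the benchmark modules.
df = pd.DataFrame({"natoms": [2, 8], "energy": [-1.5, -7.25]})

# iterrows(): every row is materialised as a freshly constructed pd.Series.
_, first_row = next(df.iterrows())
print(type(first_row).__name__)  # Series

# itertuples(index=False, name=None): every row is a plain Python tuple.
first_tuple = next(df.itertuples(index=False, name=None))
print(type(first_tuple).__name__)  # tuple

# to_dict("records"): a list of plain dicts, one per row.
records = df.to_dict("records")
print(type(records[0]).__name__)  # dict
```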

📊 Impact:
Iteration over these DataFrames is expected to be roughly 10x to 50x faster; this is an estimate, not a measurement taken for this PR. Overall benchmark run time is still dominated by the MLIP calculations, so the gain is confined to the parsing/setup phases, which should see noticeably lower CPU overhead and fewer per-row temporary objects.
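A rough micro-benchmark sketch for checking the estimate on synthetic data. The frame size, column names and loop body are assumptions; actual speedups will depend on the real DataFrames and loop bodies:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the benchmark DataFrames (assumption: two float columns).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.random(5_000), "y": rng.random(5_000)})


def sum_iterrows(frame: pd.DataFrame) -> float:
    total = 0.0
    for _, row in frame.iterrows():  # builds a pd.Series per row
        total += row["x"] * row["y"]
    return total


def sum_itertuples(frame: pd.DataFrame) -> float:
    total = 0.0
    for x, y in frame.itertuples(index=False, name=None):  # plain tuples
        total += x * y
    return total


def sum_records(frame: pd.DataFrame) -> float:
    total = 0.0
    for rec in frame.to_dict("records"):  # list of plain dicts
        total += rec["x"] * rec["y"]
    return total


# Best of three single runs per variant, in seconds.
timings = {
    fn.__name__: min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    for fn in (sum_iterrows, sum_itertuples, sum_records)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s ({timings['sum_iterrows'] / seconds:.1f}x vs iterrows)")
```

Taking the minimum of several repeats reduces noise from other processes; it measures only the loop itself, not the surrounding benchmark setup.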

🔬 Measurement:
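This can be checked by running the tests (e.g., test_solvmpconf196) and profiling the time spent in the setup phases before the expensive calculator inference begins. The loop logic is intended to be unchanged: label access such as row['col'] becomes positional unpacking (itertuples) or dictionary lookup (to_dict). One caveat: iterrows() packs each row into a single Series with one common dtype, so integer columns in a mixed int/float frame come back as floats, whereas itertuples() and to_dict('records') keep each column's own type.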
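A small sketch of that dtype caveat, on a hypothetical mixed int/float frame:

```python
import pandas as pd

# Hypothetical mixed int/float frame.
df = pd.DataFrame({"natoms": [2, 8], "energy": [-1.5, -7.25]})

# iterrows() packs the row into one Series with a single common dtype,
# so the integer column is upcast to float64.
_, row = next(df.iterrows())
print(row.dtype, row["natoms"])  # float64 2.0

# itertuples() reads each column separately, so the integer stays an integer.
natoms, energy = next(df.itertuples(index=False, name=None))
print(type(natoms).__name__, natoms)  # int 2

# to_dict("records") likewise does not upcast the integer column.
print(isinstance(df.to_dict("records")[0]["natoms"], float))  # False
```

Any loop that relied on the float upcasting (for example, formatting natoms as a float) would need an explicit conversion after the switch.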


PR created automatically by Jules for task 7457440406182944128 started by @alinelena

Refactored multiple loops across the codebase that were previously using `pd.DataFrame.iterrows()` to use more efficient iteration methods: `df.itertuples(index=False, name=None)` and `df.to_dict('records')`.

`iterrows()` is notoriously slow because it creates a `pd.Series` object for every row. Switching to tuples or native dictionaries removes that overhead, so the inner loops run significantly faster.

Co-authored-by: alinelena <3306823+alinelena@users.noreply.github.com>
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.
