Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importer - Add system-setting for batch size #26125

Merged

Conversation

MegaphoneJon
Copy link
Contributor

Overview

Given the extended conversations on #24151 etc., it seems like people want at least 50 contacts per import.
Unfortunately, I've found that even 50 is too many on Pantheon, where our complicated greeting processing means that we get frequent timeouts (50 contacts take just about 60 seconds to import). Seemed like the obvious solution was to make this configurable.

Before

Hard-coded import batch size is 50 (or 5 or 100, depending on your version).

After

Import batch size is a setting.

@civibot
Copy link

civibot bot commented Apr 28, 2023

(Standard links)

@civibot civibot bot added the master label Apr 28, 2023
@eileenmcnaughton
Copy link
Contributor

@MegaphoneJon I feel like it would be better as an option on the import screen? Although perhaps if it were it would still need this setting to set the default to prevent mishaps?

@totten totten changed the title configurable import batch size Importer - Add system-setting for batch size May 2, 2023
@totten
Copy link
Member

totten commented May 2, 2023

(@eileenmcnaughton) I feel like it would be better as an option on the import screen? Although perhaps if it were it would still need this setting to set the default to prevent mishaps?

Yeah, I suppose the optimal size can be influenced by both (a) static factors in the hosting environment (RAM, HTTP+PHP timeouts) and (b) dynamic factors in the specific data-set (e.g. quantity and width of columns). Agree the setting could evolve into a default whenever there's a patch to per-job setting.

Implementation looks sensible. I did some quick r-run with "Import Contacts" and a few different values of import_batch_size (e.g. 3 and 150), and the effect seemed right.

Perhaps somewhere in the future we might consider adaptive sizing -- e.g. use a formula which relates memory_limit, max_execution_time, baseline memory_get_usage(), the #columns, the length of the rows, etc. However, tuning that formula feels like an adventure. I guess if anyone wants to take that adventure right now, then that could be a reason to hold back.

But more likely, that's a future-hypothetical, and I don't think it should block this. IMHO, this looks better than status-quo and is KISS-compliant. Merge ready.

@totten totten added the merge ready PR will be merged after a few days if there are no objections label May 2, 2023
@MegaphoneJon
Copy link
Contributor Author

I've been wavering back and forth on whether to move this setting to the import screen. Tim points out it could be dependent on the specific dataset - but IMO it's not good end-user UX. It's an overly technical field for end users, and contributes to the "Civi is so complicated" feeling. I propose we keep it on settings for now and revisit later.

@seamuslee001
Copy link
Contributor

I support this I think it is in similar thought stream to the Default Pager size setting that we have https://github.com/civicrm/civicrm-core/blob/master/settings/Search.setting.php#L230 which does in a way a similar but different job in that the pager size controls how many records to show on the screen and in some screens that is overridable but it is there to provide a sensible default

@eileenmcnaughton eileenmcnaughton merged commit 4976021 into civicrm:master May 2, 2023
3 checks passed
@totten
Copy link
Member

totten commented May 2, 2023

Definitely agree that a setting on the import-screen would make the UX feel more complicated. +1 for some "wait-and-see" before do anything more complicated.

FWIW, the appeal of an adaptive formula is that it doesn't require configuration. Here's an example algorithm for adapting the batch-size with some notes about its (debatable) assumptions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master merge ready PR will be merged after a few days if there are no objections
Projects
None yet
4 participants