
Add handling of usage to DAO generator #25874

Merged · merged 1 commit into civicrm:master on Mar 22, 2023

Conversation

@eileenmcnaughton (Contributor) commented on Mar 20, 2023

Overview

Add handling of usage to DAO generator

Before

The use of the DAO field import is overloaded to imply import, export, usable for dedupe matching & a whole bunch of other things I haven't yet figured out...

After

A new usage key on fields.

During generation it will ALWAYS be added to the DAOs for each field, like:

  'usage' => [
    'import' => FALSE,
    'export' => TRUE,
    'duplicate_matching' => FALSE,
    'token' => FALSE,
  ],

If it is not in the XML, it will make explicit what was implied anyway (sketched below) - ie

  • export is TRUE if export or import is true
  • import is TRUE if import is true
  • duplicate_matching is TRUE if import is true
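
A minimal sketch of that implied fallback (hypothetical variable names, not the generator's actual code):

  // Hypothetical sketch: derive the explicit 'usage' flags from the legacy
  // overloaded import/export metadata when <usage> is absent from the XML.
  $import = !empty($legacyField['import']);
  $export = !empty($legacyField['export']);
  $usage = [
    'import' => $import,
    'export' => $import || $export,
    'duplicate_matching' => $import,
    'token' => FALSE,
  ];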

Technical Details

  • I have only included one regenerated DAO with 1 explicit field & the rest 'determined'. I would propose to do the rest of the DAOs as a follow-up PR once merged (XMLs only need to be updated as needed)

  • I explicitly declared usage for each field. I prefer this explicitness but the counter-argument is more file size for tar-balling or more use in memory / serializing

  • This doesn't add any usage - any code that consumes it should do so in a way that falls back to the key not being there (not necessarily in a notice-free way); see the sketch after this list

  • It would be nice if sites could override these the way they do settings - ie easy for a site admin without an extension

  • I kinda wonder if we want to allow for more nuance later, ie:

  'usage' => [
    'import' => ['is_available' => FALSE],
    'export' => ['is_available' => FALSE],
    'duplicate_matching' => ['is_available' => FALSE],
  ],
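
Following up on the fallback point above, a hypothetical consumer-side sketch (the DAO class and flag access are illustrative only, not code from this PR):

  // Read a usage flag while tolerating DAOs that have not been regenerated
  // and therefore lack the 'usage' key entirely.
  foreach (CRM_Core_DAO_Address::fields() as $name => $field) {
    $isExportable = $field['usage']['export'] ?? !empty($field['export']);
    // ... act on $isExportable ...
  }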

Comments

@colemanw as discussed on chat


@eileenmcnaughton (Contributor, Author):

test this please

@eileenmcnaughton (Contributor, Author):

@colemanw I just hit another argument for YAPOM - the import code uses DAO_Import::fields() & then parses to re-format into apiv4 field names (currently unique names but the goal was custom fields too) - because I can't filter apiv4 getFields for import
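
For illustration, a hypothetical sketch of what userland filtering could look like once getFields exposes the new metadata (a 'usage' key in the getFields output is an assumption here, not current behaviour):

  // Hypothetical: filter APIv4 getFields output down to import-capable
  // fields, falling back to the legacy 'import' flag where 'usage' is absent.
  $fields = \Civi\Api4\Contact::getFields(FALSE)->execute();
  $importable = array_filter($fields->getArrayCopy(), function (array $field) {
    return $field['usage']['import'] ?? !empty($field['import']);
  });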

@colemanw (Member):

I like the direction of this. Wish I'd thought of it years ago. As it stands, there's going to be a lot of fudging to accommodate old daos without these flags, but eventually we can add noisy deprecations.

<import> is one of the most scary/mysterious pieces of metadata in the schema so I've never touched it. It's mighty brave of you to do so @eileenmcnaughton

@eileenmcnaughton (Contributor, Author):

@colemanw do you have any thoughts on the second & last points in the technical details?

@colemanw (Member):

@eileenmcnaughton I agree with the 2nd point, because not including it for every field leads to ambiguity (does missing data == false or does it mean the dao needs to be regenerated?)

To the last point, I prefer the flatter structure you've got now. We can always add more keys to the array, but I think nesting in more arrays is overkill.

@eileenmcnaughton (Contributor, Author):

@colemanw OK - I can't think of a use-case for more nuance at the moment - it's more that transforming from a bool to an array later is tricky - but as you say there could be new array entries

Review comment (Contributor) on lines 391 to 395:

  if ($usage['import'] === 'TRUE') {
    $field['import'] = $usage['import'];
  }
  if ($usage['export'] === 'TRUE') {
    $field['export'] = $usage['export'];

Suggested change:

  $field['import'] = $usage['import'] === 'TRUE' ? $usage['import'] : NULL;
  $field['export'] = $usage['export'] === 'TRUE' ? $usage['export'] : NULL;

@eileenmcnaughton (Contributor, Author):

@seamuslee001 @colemanw it has passed!

The follow-on steps from here go in a few different directions - fixing the API to return & preferably filter on usage (I probably need some help from @colemanw there) and fixing some of the DAO functions like importableFields to use usage, along with metadata cleanup & DAO regeneration - but I think we should merge this one as-is first since those spider-web off in different directions
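
As one illustration of that second direction, a hypothetical sketch (not code from this PR) of how a helper along the lines of importableFields could consult the new key while older DAOs still lack it:

  // Hypothetical: prefer the explicit usage flag, fall back to the legacy
  // overloaded 'import' flag for DAOs that have not been regenerated yet.
  function getImportableFields(string $daoClass): array {
    return array_filter($daoClass::fields(), function (array $field) {
      return $field['usage']['import'] ?? !empty($field['import']);
    });
  }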

Review comment by @eileenmcnaughton (Contributor, Author) on:

  <usage>
    <import>false</import>
    <export>true</export>
    <duplicate_matching>false</duplicate_matching>

I don't want to have to set tests going again now - but will add token to this when I do a metadata pass.

@totten (Member) commented on Mar 21, 2023

Overall, I also like the concept here. Definitely think it's better to split into a separate subkey.

(@eileenmcnaughton) I kinda wonder if we want to allow for more nuance later ie

(@colemanw) I prefer the flatter structure you've got now. We can always add more keys to the array,

+1 for the flatter structure.

I explicitly declared usage for each field. I prefer this explicitness but the counter-argument is more file size for tar-balling or more use in memory / serializing

Yeah, the tarball... I wouldn't worry about that. For the memory, let's ballpark it and see how we feel.

  • Model size

    • There are ~1,800 fields in the database (per git grep '<field>' xml/schema/ | wc) spread across ~180 entities. That means an average of ~10 fields per entity.
    • I hacked my CRM_Core_ClassLoader to log every time it loads a _DAO_ class, and then I clicked around a few screens (login, view dashboard, search participants, edit a contact, etc). Across 48 HTTP requests, the average (mean) number of DAOs loaded per request was 55.
  • Size of usage metadata

    • Serialized Measure: I copied the usage from one of the Address.php fields, and it weighs in at 80-100 bytes (in json or serialize format). I'm not certain how it looks with PHP opcodes, but it's probably a similar ballpark. Call it 100 bytes.
    • Theoretical Measure: Suppose you encoded with a more compact format. There are 4 flags. Each flag could be mapped to a 1-byte value (number/letter), and you test flags with strpos() !== FALSE (see the sketch after this list). That's up to 4 bytes. But add some room for growth. Call it 10 bytes.
    • Empirical Measure: But then I tried a quick php-cli call to load CRM/Core/DAO/Address.php. Per memory_get_usage(), it now requires 19kb more. This was consistent across multiple calls. It has 29 fields. That means actual memory usage grows by 650b per field.
      • It does make sense for this to be larger -- e.g. you read the PHP file; generate opcodes; then run the opcodes. The extra usage gets reflected in multiple ways.
      • An even better measurement would be to update 10 DAOs and then measure memory_get_usage() in php-fpm SAPI. But I'm getting lazy...
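
For illustration, a minimal sketch of the compact encoding described under the theoretical measure (the flag letters are invented for the example; nothing like this is proposed by the PR):

  // Hypothetical: pack the four flags into a short string and test them with
  // strpos() !== FALSE. 'i' = import, 'e' = export, 'd' = duplicate_matching,
  // 't' = token.
  $usage = 'ed'; // exportable and duplicate-matchable, not importable
  $isImportable = strpos($usage, 'i') !== FALSE; // FALSE
  $isExportable = strpos($usage, 'e') !== FALSE; // TRUE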

Putting those numbers together, the ballpark looks like this:

  Description              Full Model                          Average Model
                           (e.g. build full metadata cache)    (e.g. average HTTP request)
  Entities                 180                                 55
  Fields                   1800                                550
  Serialized Size (100b)   180kb                               55kb
  Theoretical Size (10b)   18kb                                6kb
  Empirical Size (650b)    1.1mb                               350kb

I think you're right to ask the question, but I'm not really sure of the answer. At 18kb -- no problem. At 350kb or 1mb, hard to say. I'm not sure how to formulate a cut-off.

@eileenmcnaughton (Contributor, Author):

@totten is there anything to be read into this test run seeming to take a bit longer than some other test runs? (I looked at 2 others & they took less time)

@totten (Member) commented on Mar 21, 2023

(@eileenmcnaughton) is there anything to be read into this test run seeming to take a bit longer than some other test runs (I looked at 2 others & they took less time)

Not that I can see. There's some natural variation due to hardware differences, overall load, etc -- and (eg) the recent runs of phpunit-crm and phpunit-api3 landed on mid-tier (bknix-XXXXX) and low-tier (rabul-t450s) hardware, and the timings were not unusual for them. If there is a performance impact in the patch, then it's not dramatic. (It's small enough that one would need multiple/controlled runs to detect it.)

@colemanw (Member):

Ok, I like the proposed structure, so I will merge this so that @eileenmcnaughton can continue progressing on it.

@colemanw merged commit ab906d7 into civicrm:master on Mar 22, 2023
@colemanw deleted the usage branch on March 22, 2023 at 12:13
colemanw added a commit to colemanw/civicrm-core that referenced this pull request on Mar 27, 2023
eileenmcnaughton added a commit that referenced this pull request on Mar 27, 2023: "Regenerate DAOs with usage from #25874"