New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve typing performance by splitting attributes in state tree. #12312

Merged
merged 2 commits into from Dec 20, 2018

Conversation

@sgomes
Copy link
Contributor

sgomes commented Nov 26, 2018

Description

Attributes have been moved to attributesByClientId in order to reduce the impact of typing throughout the state tree (See #11782).

Together with this, several selectors have been rewritten to take advantage of this separation by depending only on byClientId, which is now a less expensive branch of the state tree.

How has this been tested?

Unit tests and manual verification.

Performance has been tested by recording the quick typing of a short word in a large document and measuring the time taken handling each input event.

Screenshots

Not applicable.

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.

@sgomes sgomes requested a review from youknowriad Nov 26, 2018

@youknowriad youknowriad requested a review from WordPress/gutenberg-core Nov 26, 2018

@@ -61,12 +62,10 @@ export function AlignmentToolbar( { isCollapsed, value, onChange, alignmentContr
);
}

const getClientIdOnly = memize( ( clientId ) => ( { clientId } ) );

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

Instead of using memize here I wonder if we should do a shallow comparison in withBlockEditContext instead as it would benefit more components. (similar to how we do in withSelect)

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

I'll remove this optimization for now; it's distracting from the main purpose of the PR anyway.

[ blockId ]: { attributes: flattenedBlocks[ blockId ].attributes },
} ), {} );
}

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

A nice little refactoring was suggested here by @noisysocks #12268 (comment)

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

Great idea! Done.

...state,
[ action.clientId ]: {
...state[ action.clientId ],
...omit( action.updates, 'attributes' ),

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

In my PR, I did a little optimization here where I avoid generating a new state if the only key in action.updates is attributes

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

Good idea! Added that in.


case 'UPDATE_BLOCK':
// Ignore updates if block isn't known
if ( ! state[ action.clientId ] ) {

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

We should avoid generating a new state here if there's no attributes updates in action.updates

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

Yup, another useful early return.

...state,
[ action.clientId ]: {
...state[ action.clientId ],
attributes: { ...action.updates.attributes },

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

The attributes key is redundant here, we can just remove it and update the state to be { [ clientID ]: attributes } instead of { [ clientID ]: { attributes } }

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

I opted for maintaining the attributes key as a way of having the same general shape in both byClientId and attributesByClientId, but I'm not married to that choice.

I'm happy to change that if you feel strongly about it, but do let me know, as it will involve many changes to the test files.

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

Not a blocker for me.

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

I opted for maintaining the attributes key as a way of having the same general shape in both byClientId and attributesByClientId, but I'm not married to that choice.

But they're not meant to be the same sort of thing. The former is a block object, the latter (by its name) is exclusively attributes.

What else are we planning to include here? The spread of ...state[ action.clientId ] seems to imply that there might be other non-attribute properties.

Given the name attributesByClientId, I would assume the contents of the object are in-fact attributes.

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

No worries, I'll make the change.

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

Done in latest commit.

@@ -439,7 +440,7 @@ export function isEditedPostSaveable( state ) {
* @return {boolean} Whether post has content.
*/
export function isEditedPostEmpty( state ) {
const blocks = getBlocksForSerialization( state );
const blocks = getBlocksForSerializationWithoutAttributes( state );

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

For more clarity, maybe this could be refactored to return just the blockNames : getBlocksNamesForSerialization

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

Changed. Also removed the export.

*
* @return {Object} Parsed block object.
*/
export const getBlockWithoutAttributes = createSelector(

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

I'm not certain why we need all these new selectors getBlockWithoutAttributes getBlocksWithoutAttributes, getBlocksByClientIdWithoutAttributes. I'd like to avoid exposing new selectors in this PR. Is this only for getBlocksForSerializationWithoutAttributes?

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

It gets used directly or indirectly in a bunch of other selectors (such as getBlockNamesForSerialization, getInserterItems, etc.) , but you're right that we don't need to export these new selectors.

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

The latest commit removes the new selectors, in exchange for the approach of using the same full selectors everywhere, but changing the dependencies where possible. This way, they won't invalidate on attribute changes, but that's ok if the attributes are not being used.

@@ -1944,7 +2085,7 @@ export const getInserterItems = createSelector(
},
( state, rootClientId ) => [
state.blockListSettings[ rootClientId ],
state.editor.present.blocks,
state.editor.present.blocks.byClientId,

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

I think we probably need .order here as well right?

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

Actually, I'm not certain

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

From what I can tell, block order should not affect inserter items, but I could be wrong. Is there someone with more domain-specific knowledge of inserter items that could weigh in?

This comment has been minimized.

@gziolo

gziolo Nov 26, 2018

Member

Nested blocks might have some impact on that, but as noted in the other PR, I would love us to explore whether we can skip it here.

This comment has been minimized.

@noisysocks

noisysocks Nov 27, 2018

Member

getInserterItems() relies on .order in two ways:

  1. It uses getClientIdsWithDescendants() which in turn uses .order. This is to set an inserter item's disabled attribute which is true when a block has already been used.
  2. It uses canIncludeReusableBlockInInserter() which in turn uses isAncestorOf() which in turn uses .order This is to stop reusable column blocks from being inserted into themselves which causes an infinite loop.

(1) should be safe to ignore because inserting or removing a block modifies .byClientId as well. (2) I'm less sure about—I think it's safe to ignore because one cannot create that infinite loop scenario by only re-ordering blocks.

This comment has been minimized.

@sgomes

sgomes Nov 27, 2018

Contributor

I'm not too concerned with the performance implications of depending on .order, from what I've seen, so I'll add in that dependency just to be on the safe side. If this were to produce any issues due to the intricate dependencies you're describing, it could get extremely difficult to debug, so it's not worth the risk.

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

so I'll add in that dependency just to be on the safe side.

It still needs to be added?

This comment has been minimized.

@sgomes

sgomes Nov 29, 2018

Contributor

Had forgotten to push my changes.

@sgomes sgomes force-pushed the update/improve-typing-performance branch 3 times, most recently from a26dce0 to de80620 Nov 26, 2018

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 26, 2018

All tests should now pass, and the code should be ready for a full review.

Addressed all existing comments and fixed everything except for removing the nesting of attributes inside attributesByClientId. @youknowriad: if you'd still like me to go ahead with that let me know, and I'll create a new commit in this PR for that.

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 26, 2018

With the current set of fixes, performance is improved by responding to keyboard input events ~17% faster.

@youknowriad

This comment has been minimized.

Copy link
Contributor

youknowriad commented Nov 26, 2018

Unit tests are passing but the e2e tests are not. I'll see if I can find why.

@sgomes sgomes force-pushed the update/improve-typing-performance branch from 25f39ef to 85ff9fb Nov 26, 2018

getBlockDependantsCacheBust( state, clientId ),
state.editor.present.blocks.order[ clientId ],

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

What's the reasoning here? I think we're depending on order in general and not only order[ clientId ] because the nesting in this state is only one level deep.

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

Unless I'm mistaken, since getBlockDependantsCacheBustcalls getBlock for each child, we'll recursively depend on the entire set of IDs. Therefore, each getBlock only needs to worry about its own direct children, where it comes to order.

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

Unless I'm mistaken, since getBlockDependantsCacheBustcalls getBlock for each child, we'll recursively depend on the entire set of IDs. Therefore, each getBlock only needs to worry about its own direct children, where it comes to order.

Isn't this already encompassed by including getBlockDependantsCacheBust as a dependant? i.e. this line shouldn't be necesary at all?

This comment has been minimized.

@sgomes

sgomes Nov 29, 2018

Contributor

Correct, it's redundant. Removed.

( state ) => [
state.editor.present.blocks,
]
( state, rootClientId ) => {

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

I think this should always be state.editor.present.blocks because even if we have a rootClientId we still want to access the attributes and content of the nested blocks. I suspect this change is the culprit breaking the e2e tests.

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

Oh, right, because of the nesting! Excellent catch!

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

Modified this to reuse the dependants of getBlock if a rootClientId is provided.

state.editor.present.edits.meta,
state.initialEdits.meta,
state.currentPost.meta,
state.editor.present.blocks,
...mapClientIds( clientIds, ( clientId ) => getBlock( state, clientId ) ),

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

Do we really need to call getBlock here`. I suspect this harm performance more than it helps.

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

I think in general the dependencies should be left as simple as possible. Even if state.editor.present.blocks changes often, it's probably more performant to run the selector again instead of doing the less-performant memoization on each call. I have a similar concern for getMultiSelectedBlocks

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

No problem. If this affects things too much we can always change it back.

...state,
[ action.clientId ]: {
...state[ action.clientId ],
...omit( action.updates, 'attributes' ),

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

This value is already assigned above as changes. Avoiding a second omit would be ideal.

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

Good catch!

return state;
},

attributesByClientId( state = {}, action ) {

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

Bikeshedding: The naming of the properties of blocks already doesn't feel very consistent, and I'm worried this moves us further in that direction. If order is also keyed by client IDs, why not name it orderByClientId ? Or conversely, name this one just attributes ? What are the contents byClientId anyways, particularly relevant to the proposed changes of pulling some values out. It makes me wonder if we'd be better served by granularity on all of the properties, rather than some ambiguous "byClientId" block-ish object shape, i.e.

{
	blocks: {
		isValid, # Object<string, boolean>
		name, # Object<string, string>
		attributes, # Object<string, Object>
		order, # Object<string, Array>
	}
}

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

I think this is probably a good idea :) each property is mapped by ClientID, making it easier to memoize selectors based on the properties used. Maybe better as a follow-up to keep the size of this PR contained?

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

Maybe better as a follow-up to keep the size of this PR contained?

I raised because I think the proposed changes shine light on the fact that we might be better served by considering all of these block values in the same way (hash keyed by client ID), which could lead us to some solution for generalizing the reducer implementation rather than copy/paste-ing large parts of it across as we're doing here. I'd worry we'd continue this pattern in some possible future refactor if set as a precedent.

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

I'm happy to make the change either in this PR or in a follow-up one. Let me know the final decision (when and which of the proposed conventions to go for) and I'll make it happen :)

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

Let's try to do it here then.

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

As a more general comment, following the proposed approach would essentially encode the various properties of a block as separate branches in the state tree. The benefit is performance and separation, as discussed, but this comes at the cost of more difficulty in adding more properties to block in the future (or changing them), I would say.

In my mind, I'd hoped it'd be some way we could programmatically generate the objects (reducers) based on some set of expected keys (isValid, name, attributes, and I suppose originalContent and any others), and that by doing it generically we would avoid any incurred cost in adding additional properties in the future, since it'd just be a key to add to the set.

This would assume that the logic for handling each key is consistent, which I'm not entirely confident is something we can assume (at least for attributes?). In which case, the refactoring may be more difficult and thus potentially not worthwhile.

The big issue for me was in evaluating this part of state, how would you or anyone else answer the prompt "What is represented in the value of byClientId?" It's making less sense over time as we pull out separate properties to be able to call it as anything remotely resembling a block object.

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

More practically speaking, in my mind I'd had in mind some higher-order reducer akin to onSubKey, specific to handling the reducer actions relevant for blocks, on each block key.

So something roughly like:

BLOCK_KEYS = [ 'name', 'isValid', 'originalContent', 'attributes' /* ? */ ]
blocks: combineReducers( BLOCK_KEYS.map( ( key ) => [ key, onBlockSubKey( key ) ] ) ) )

This comment has been minimized.

@sgomes

sgomes Nov 27, 2018

Contributor

Right, that may be doable with a few exceptions here and there. I'll take a look!

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

If order is also keyed by client IDs, why not name it orderByClientId ? Or conversely, name this one just attributes ?

Even if we're deferring a greater refactor, I'm still inclined to align these two. Thoughts on renaming to just attributes?

This comment has been minimized.

@sgomes

sgomes Nov 29, 2018

Contributor

Sure, no problem! I have no strong opinions there. Added a commit with the change.

@@ -754,6 +766,11 @@ export const getGlobalBlockCount = createSelector(
]
);

const mapClientIds = ( clientIds, fn ) => map(
castArray( clientIds ),
( clientId ) => fn( clientId )

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

Minor: This line could be just fn ?

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

The method doesn't exist anymore.

@@ -754,6 +766,11 @@ export const getGlobalBlockCount = createSelector(
]
);

const mapClientIds = ( clientIds, fn ) => map(

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

Not loving a randomly-placed, un-documented, un-tested utility function located here. Thinking either placement in a utilities file, or placement at top of function to have some arrangement of selectors vs. non-selectors. Is it just handling the coercion of castArray?

I don't see a ton of value in pulling this out as a function vs. just inlining.

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

Removed, no longer needed.

@@ -1140,7 +1156,7 @@ const isAncestorOf = createSelector(
return possibleAncestorId === idToCheck;
},
( state ) => [
state.editor.present.blocks,
state.editor.present.blocks.order,

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

Are these revisions relevant to the proposed change? Or are these things which are technically separate issues?

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

Actually, these changes are one of the important changes in this PR as we don't invalidate these selectors unless the inserted blocks (order) change instead of any attribute change.

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

I can see it would have an important impact on the overall benefit of the memoization; but is it specifically related to pulling out attributes to a separate state path?

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

Actually, you're right, this one in particular is not directly related.

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

I'm happy enough to create a separate PR for this, but it doesn't seem like it would be outlandish to keep it in this PR either.

@@ -273,6 +300,48 @@ const withBlockReset = ( reducer ) => ( state, action ) => {
return reducer( state, action );
};

/**
* Higher-order reducer which targets the combined blocks reducer and handles

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

What does "handles" entail? It's not immediately obvious to me as a reviewer now, let alone a future hypothetical maintainer.

This comment has been minimized.

@youknowriad

youknowriad Nov 26, 2018

Contributor

To clarify a bit: This HoR only updates the attributes of blocks not the "byClientId" part but we need to access "byClientId" to know the name of the blocks as we only want to change the core/block blocks.

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

I'll add a clarification that this HoR exists because it needs access to both branches of the state tree simultaneously.

newState.attributesByClientId = {};

Object.keys( state.byClientId ).forEach( ( blockId ) => {
const block = { ...state.byClientId[ blockId ] };

This comment has been minimized.

@aduth

aduth Nov 26, 2018

Member

Why do we create shallow clones?

This comment has been minimized.

@sgomes

sgomes Nov 26, 2018

Contributor

That's a good question. This must have stayed behind as I was figuring out the reducer. Sorry about that. Fixed!

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 26, 2018

Current status: all tests pass, improvement vs. master (at branching point) at 20%.

@noisysocks
Copy link
Member

noisysocks left a comment

This is working well in my testing. Nice work! 😍

I left some mostly stylistic comments.

newState.attributesByClientId = {};

Object.keys( state.byClientId ).forEach( ( blockId ) => {
const block = newState.byClientId[ blockId ];

This comment has been minimized.

@noisysocks

noisysocks Nov 27, 2018

Member

nit: _.forEach could simplify this a little as it passes the key to the given callback.

forEach( state.byClientId, ( block, blockId ) => {

This comment has been minimized.

@sgomes

sgomes Nov 27, 2018

Contributor

Thanks! I don't think I had ever used lodash's forEach before.

* @return {Function} Enhanced reducer function.
*/
const withSaveReusableBlock = ( reducer ) => ( state, action ) => {
let newState = state;

This comment has been minimized.

@noisysocks

noisysocks Nov 27, 2018

Member

Is this variable necessary? We could re-assign state, instead. That would make it a little clearer below on lines 327 and 328 that state.byClientId[ blockId ] is the same object as newState.byClientId[ blockId ].

This comment has been minimized.

@sgomes

sgomes Nov 27, 2018

Contributor

Yup, absolutely right. If we're just reassigning, we don't need the extra variable.


Object.keys( state.byClientId ).forEach( ( blockId ) => {
const block = newState.byClientId[ blockId ];
let attributes = { ...state.attributesByClientId[ blockId ] };

This comment has been minimized.

@noisysocks

noisysocks Nov 27, 2018

Member

We shouldn't need to clone this object both here and then again on line 333.

This comment has been minimized.

@sgomes

sgomes Nov 27, 2018

Contributor

Fixed as part of using mapValues.

newState = { ...state };
newState.attributesByClientId = {};

Object.keys( state.byClientId ).forEach( ( blockId ) => {

This comment has been minimized.

@noisysocks

noisysocks Nov 27, 2018

Member

nit: Naming this param clientId would be more consistent with how we refer to these strings elsewhere in Gutenberg.

This comment has been minimized.

@sgomes

sgomes Nov 27, 2018

Contributor

👍

return state;
}

newState = { ...state };

This comment has been minimized.

@noisysocks

noisysocks Nov 27, 2018

Member

If I'm honest, I'm having a hard time keeping track of the object cloning and mutation in this function. Are we optimising for performance here? SAVE_REUSABLE_BLOCK_SUCCESS is dispatched relatively rarely, so I'm not sure that optimising this is worth sacrificing readability.

Could we use mapValues, as before?

newState.attributesByClientId = mapValues( newState.attributesByClientId, ( attributes, clientId ) => {
	const { name } = newState.byClientId[ clientId ];
	if ( name === 'core/block' && attributes.ref === id ) {
		return {
			...attributes,
			ref: updatedId,
		};
	}

	return attributes;
} );

This comment has been minimized.

@sgomes

sgomes Nov 27, 2018

Contributor

I over-engineered this reducer as I was figuring out what it was doing, and never went back to see if I could rewrite it in a simpler way. Thanks for that, it's much better! 👍

if ( nextAttributes === state[ action.clientId ].attributes ) {
// Do nothing if only attributes change.
const changes = omit( action.updates, 'attributes' );
if ( keys( changes ).length === 0 ) {

This comment has been minimized.

@noisysocks

noisysocks Nov 27, 2018

Member

nit: Could use _.isEmpty here.

if ( isEmpty( changes ) ) {

This comment has been minimized.

@sgomes

sgomes Nov 27, 2018

Contributor

Done.

@@ -1069,6 +1078,15 @@ export const getMultiSelectedBlockClientIds = createSelector(
],
);

const mapMultiSelectedBlockClientIds = ( state, fn ) => {

This comment has been minimized.

@noisysocks

noisysocks Nov 27, 2018

Member

What's the benefit of moving this to its own helper function?

This comment has been minimized.

@noisysocks

noisysocks Nov 27, 2018

Member

We generally prefer to avoid using abbreviations when naming variables in Gutenberg as they aren't all that descriptive for future maintainers. Something like callback or iteratee would be a better name for fn here.

This comment has been minimized.

@sgomes

sgomes Nov 27, 2018

Contributor

There's no advantage anymore. This was being used in multiple places, but after other changes, that's no longer the case. Good catch! Moving into getMultiSelectedBlocks.

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

Moving into getMultiSelectedBlocks.

It still needs to be moved?

This comment has been minimized.

@sgomes

sgomes Nov 29, 2018

Contributor

Looks like I had forgotten to push my changes.

@@ -1944,7 +2085,7 @@ export const getInserterItems = createSelector(
},
( state, rootClientId ) => [
state.blockListSettings[ rootClientId ],
state.editor.present.blocks,
state.editor.present.blocks.byClientId,

This comment has been minimized.

@noisysocks

noisysocks Nov 27, 2018

Member

getInserterItems() relies on .order in two ways:

  1. It uses getClientIdsWithDescendants() which in turn uses .order. This is to set an inserter item's disabled attribute which is true when a block has already been used.
  2. It uses canIncludeReusableBlockInInserter() which in turn uses isAncestorOf() which in turn uses .order This is to stop reusable column blocks from being inserted into themselves which causes an infinite loop.

(1) should be safe to ignore because inserting or removing a block modifies .byClientId as well. (2) I'm less sure about—I think it's safe to ignore because one cannot create that infinite loop scenario by only re-ordering blocks.

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 27, 2018

Thanks for the review, @noisysocks ! I made the changes you suggested and will now explore separating all block keys into their own branch (not just attributes), as suggested by @aduth.

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 27, 2018

I've successfully implemented the changes suggested by @aduth in my local machine, but as a result, performance has absolutely tanked.

I'll spend some time tomorrow trying to figure out why, but in the interest of keeping the PR in a good state, I'm not submitting those changes until I figure out the root cause and whether it's fixable.

@aduth

This comment has been minimized.

Copy link
Member

aduth commented Nov 27, 2018

@sgomes Could you put the contents of git diff | pbcopy into a Gist so that others could potentially evaluate in the meantime?

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 28, 2018

@sgomes Could you put the contents of git diff | pbcopy into a Gist so that others could potentially evaluate in the meantime?

Here's the WIP diff: https://gist.github.com/sgomes/1b38919a07ba02a0712e390fa6b263cc

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 28, 2018

I spent all of yesterday implementing and validating the proposed changes by @aduth, and a considerable amount of time today trying to understand why they don't perform as expected.

My conclusion so far is that getBlock isn't memoizing as expected, but I can't figure out why, as the items that are added to the dependants array stay equal by reference, as far as I can tell.

My suggestion is to review the PR as it is, and leave the proposed changes for later if needed for clarity, which I believe was the main argument. Changes of this nature are very well contained, affecting only the reducers, the selectors, and their corresponding tests, so it doesn't seem to me like too much of an issue to change things twice if needed.

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 28, 2018

It appears that rememo doesn't handle native-value (e.g. string, boolean) dependants the way I expected it would; that is, by directly comparing them with their past values, taking into account the getDependant arguments.

It's not clear to me whether this is a bug in rememo or an intentional design decision, but the fact that it tries to record the lastDependents (but does so apparently regardless of the getDependant arguments) leads me to believe it might be a bug. I haven't fully grokked the library code, though, so I could be wrong.

@aduth

This comment has been minimized.

Copy link
Member

aduth commented Nov 28, 2018

It appears that rememo doesn't handle native-value (e.g. string, boolean) dependants the way I expected it would; that is, by directly comparing them with their past values, taking into account the dependant function arguments.

Would you be able to put together a minimal example demonstrating the issue? I'm not sure I follow what's unexpected.

FYI, the rememo package is my own, so I'd be glad to help clarify or resolve any issues.

There is some distinct treatment (more an optimization) for objects and arrays as dependants, since these can be used as keys of a WeakMap to avoid explicitly invalidating a cache value.

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 28, 2018

Would you be able to put together a minimal example demonstrating the issue? I'm not sure I follow what's unexpected.

No problem, here you go: https://codepen.io/anon/pen/zMmPJo?editors=0011

Let me know if there's anything unclear about the example!

@aduth

This comment has been minimized.

Copy link
Member

aduth commented Nov 28, 2018

Thanks for the example @sgomes . On further evaluation, I think this is working as intended, though I would agree that it is quite confusing, and is one of the reasons I was reluctant to provide arguments to the getDependants function (in fact, the very earliest implementation upon which it was inspired did not pass them).

The result of getDependants is effectively a statement of "if any of these values change, then clear the cache". "Clear the cache" in a naive case can mean the entire cache for the selector, but it was "enhanced" (arguably in retrospect 😄 ) to limit the cache invalidation when the dependants are all object-like. I think this explains the difference in behavior you're seeing for object vs. primitive types.

See also: aduth/rememo#1

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 28, 2018

I see. That's... problematic. 😅 As far as I can tell that mostly invalidates the state structure you suggested, since it would effectively not be memoizable per-clientId, unless I'm missing something? At least without custom memoization.

blocks: {
  name: { A: 'foo', B: 'bar' },
  attributes: { A: {...}, B: {...} },
  isValid: { A: true, B: true },
  ...
  order: [ ... ]
}

I mean, from what you're saying, per-clientId memoization in getBlock currently only works because all of the dependants in there happen to be non-primitive types, correct?

FWIW, wp-calypso has a simple createSelector library that achieves memoization based on the dependant function parameters, by generating a cache key based on a simple join of the parameters. The restriction there becomes that the parameters should all be primitive types, but that's usually not an issue.

In any case, this all makes me wonder more whether it's possible to keep this PR as-is for the performance benefits, and look into how to achieve the perfect state structure later? 😁

@aduth

This comment has been minimized.

Copy link
Member

aduth commented Nov 28, 2018

I mean, from what you're saying, per-clientId memoization in getBlock currently only works because all of the dependants in there happen to be non-primitive types, correct?

Yes, that's correct.

As far as I can tell that mostly invalidates the state structure you suggested, since it would effectively not be memoizable per-clientId, unless I'm missing something? At least without custom memoization.

I suppose this is true, yes.

FWIW, wp-calypso has a simple createSelector library that achieves memoization based on the dependant function parameters, by generating a cache key based on a simple join of the parameters. The restriction there becomes that the parameters should all be primitive types, but that's usually not an issue.

It still destroys the entire cache if dependants references change:

https://github.com/Automattic/wp-calypso/blob/77f67b116e45d1c1cb75e50aae77b01a34fa39c1/client/lib/create-selector/index.js#L102-L104

This is where Calypso's treeSelect is meant to serve to have more granular cache invalidation:

https://github.com/Automattic/wp-calypso/tree/master/client/lib/tree-select
Automattic/wp-calypso#20547

rememo is effectively implemented as an amalgamation of the two, adopting optimizations like those found in treeSelect when it detects being possible.

In any case, this all makes me wonder more whether it's possible to keep this PR as-is for the performance benefits, and look into how to achieve the perfect state structure later? 😁

I'll plan to give this another pass over, but yes, I'd generally be agreeable to a more immediate resolution.

*
* @param {Array} blocks Blocks to flatten.
* @param {*} blocks Blocks to flatten.

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

Why was the type changed to a wildcard * ? It's an Array.

This comment has been minimized.

@sgomes

sgomes Nov 29, 2018

Contributor

That's odd, I don't remember changing that. I need to check my IDE settings for JSDoc. Good catch, thanks!

*
* @param {Array} blocks Blocks to flatten.
* @param {*} blocks Blocks to flatten.
* @param {*} transform Transforming function to be applied to each block.

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

It's not a wildcard *, it's a Function.

This comment has been minimized.

@sgomes

sgomes Nov 29, 2018

Contributor

👍

@@ -1069,6 +1078,15 @@ export const getMultiSelectedBlockClientIds = createSelector(
],
);

const mapMultiSelectedBlockClientIds = ( state, fn ) => {

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

Moving into getMultiSelectedBlocks.

It still needs to be moved?

@@ -1944,7 +2085,7 @@ export const getInserterItems = createSelector(
},
( state, rootClientId ) => [
state.blockListSettings[ rootClientId ],
state.editor.present.blocks,
state.editor.present.blocks.byClientId,

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

so I'll add in that dependency just to be on the safe side.

It still needs to be added?

return state;
},

attributesByClientId( state = {}, action ) {

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

If order is also keyed by client IDs, why not name it orderByClientId ? Or conversely, name this one just attributes ?

Even if we're deferring a greater refactor, I'm still inclined to align these two. Thoughts on renaming to just attributes?

getBlockDependantsCacheBust( state, clientId ),
state.editor.present.blocks.order[ clientId ],

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

Unless I'm mistaken, since getBlockDependantsCacheBustcalls getBlock for each child, we'll recursively depend on the entire set of IDs. Therefore, each getBlock only needs to worry about its own direct children, where it comes to order.

Isn't this already encompassed by including getBlockDependantsCacheBust as a dependant? i.e. this line shouldn't be necesary at all?

];
}
return [
getBlock.getDependants( state, rootClientId ),

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

This is wrong because it will produce a nested array. getDependants returns a newly-generated array every time it's called, so the dependants will always be different. This should either (a) spread ... into the return array or (b) be returned directly as the return value.

This comment has been minimized.

@sgomes

sgomes Nov 29, 2018

Contributor

Excellent catch! Silly mistake; fixed.

// Skip update if nothing has been changed. The reference will
// match the original block if `reduce` had no changed values.
if ( nextAttributes === state[ action.clientId ].attributes ) {
// Do nothing if only attributes change.

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

What does / should this do if the update is for innerBlocks?

This comment has been minimized.

@sgomes

sgomes Nov 29, 2018

Contributor

Is it valid semantics to modify inner blocks with an UPDATE_BLOCK instead of MOVE_* and other actions? I assumed not, since that possibility isn't covered in the existing code. Or rather, I assumed no special attention would need to be paid to it. Should I add a check to discard any innerBlocks changes?

This comment has been minimized.

@aduth

aduth Dec 4, 2018

Member

Is it valid semantics to modify inner blocks with an UPDATE_BLOCK instead of MOVE_* and other actions?

It's a downside of the "update" action being so generic, but hypothetically it should be able to expect receiving any properties of a block object (the value returned by createBlock).

This comment has been minimized.

@aduth

aduth Dec 4, 2018

Member

While I think it's something we should respect, we already don't have consideration for a changed innerBlocks from the order subtree, so I think it'd be fine to defer.

};

case 'REPLACE_BLOCKS':
if ( ! action.blocks ) {

This comment has been minimized.

@aduth

aduth Nov 28, 2018

Member

I know this was inherited, but looking at it, it seems quite wrong. We call replaceBlocks with an empty array when, for example, removing a paragraph:

// If before content is omitted, treat as intent to delete block.
onReplace( [] );

If I'm reading this right, this behavior would leave lingering state for the removed block?

This comment has been minimized.

@sgomes

sgomes Nov 29, 2018

Contributor

An empty array is truthy, so it seems to me that this should work correctly, unless I'm missing something?

This comment has been minimized.

@aduth

aduth Nov 29, 2018

Member

An empty array is truthy, so it seems to me that this should work correctly, unless I'm missing something?

That's a good point. Makes me wonder why the condition exists at all then 🤷‍♂️

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 29, 2018

@aduth I believe I addressed all comments, even though the GitHub UI makes it look otherwise. Let me know if I missed anything!

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 29, 2018

E2E tests seem to be consistently timing out. Not sure if this reflects a problem in the code or merely CI issues.

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Nov 29, 2018

This time the measured improvement is ~26%.

@aduth

This comment has been minimized.

Copy link
Member

aduth commented Dec 3, 2018

There were end-to-end failures across the board last week which have since been resolved. They should be cleared up if you rebase your branch or merge the latest master.

In the meantime, I'll give this another review pass.

@aduth

This comment has been minimized.

Copy link
Member

aduth commented Dec 3, 2018

I think there's some reconciliation needed between this and the since-merged #11899, which had introduced a new getBlockAttributes selector.

@sgomes sgomes force-pushed the update/improve-typing-performance branch from 3fbd342 to 0a2fc82 Dec 4, 2018

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Dec 4, 2018

Squashed all commits to simplify rebasing. Working on that now.

Improve typing performance by splitting attributes in state tree.
Attributes have been moved to `attributesByClientId` in order to
reduce the impact of typing throughout the state tree.
See #11782.

Review fixes pass #1

Simplify block flattening code in reducers.

Remove alignment toolbar optimization; should be a different PR

Fix minor bug in test

Fix failing getBlockDependantsCacheBust test

Remove new `*WithoutAttributes` selectors.

We'll go with a different approach: use the existing selectors, but
keep the dependencies as they are. The attributes may get stale, but
it doesn't matter if they're not being used.

Change attributesById structure to not have an inner attributes

Simplifying some selector dependencies

Simplify withSaveReusableBlock a bit further and clarify its existence

Reuse constant instead of running omit again

Remove no longer needed mapClientIds

Simplifying reducers after review comments.

Further changes to selectors after review

Fix types in JSDoc

Renaming attributesByClientId to attributes

Further selector fixes after review

@sgomes sgomes force-pushed the update/improve-typing-performance branch from 0a2fc82 to 99f8a2b Dec 4, 2018

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Dec 4, 2018

Rebased on master and made to work well with getBlockAttributes.

@iseulde

This comment has been minimized.

Copy link
Member

iseulde commented Dec 4, 2018

@sgomes Will this help with garbage collection at random function calls that I'm seeing in master, sometimes multiple times per key press, other times after several key presses?

Sometimes this adds up to over 14MB and I've seen this to take over 130ms.

screen shot 2018-12-04 at 12 33 57

screen shot 2018-12-04 at 13 01 10

@sgomes

This comment has been minimized.

Copy link
Contributor

sgomes commented Dec 4, 2018

@iseulde It's possible, but that's hard to answer definitively; garbage collection is somewhat opaque and hard to trace to what's taking place in memory exactly. This PR does reduce the amount of work done when typing (particularly where it comes to selectors), but it's difficult to say whether that translates into any benefits in terms of fewer reallocations taking place.

My advice would be to measure the PR branch versus master several times and see if there seems to be a statistical difference between the two.

},
};

case 'UPDATE_BLOCK_ATTRIBUTES':

This comment has been minimized.

@aduth

aduth Dec 4, 2018

Member

Kinda makes me wish we'd normalized UPDATE_BLOCK_ATTRIBUTES to use UPDATE_BLOCK, but it's also not something I think we ought to be addressing here.

state.editor.present.blocks,
]
( state, rootClientId ) => {
if ( ! rootClientId ) {

This comment has been minimized.

@aduth

aduth Dec 4, 2018

Member

Given the prior noted observation about dependants caching not occurring per arguments set, I'm not sure this is a change we should want to make. Is it even relevant to the changes, or could it at least be considered separately?

This comment has been minimized.

@sgomes

sgomes Dec 4, 2018

Contributor

Nothing like measuring :) I couldn't find any meaningful difference between the two, so I reverted back to the simpler code as per your suggestion.

@aduth

aduth approved these changes Dec 4, 2018

Copy link
Member

aduth left a comment

Looks good 👍 Let's hold off on merging until the code freeze is released.

@SchneiderSam

This comment has been minimized.

Copy link

SchneiderSam commented Dec 9, 2018

Technically I have no idea, but I wonder if it doesn't make sense to add this PR to the 5.0.1 roadmap, especially since this bug should be solved with 5.0.1:

screenshot_9

If that has nothing to do with it: Sorry ;-)

You are great! Thank you very much for your great work!!!

@aduth

This comment has been minimized.

Copy link
Member

aduth commented Dec 11, 2018

Given it is higher risk than other related performance effort, it's been slated for 4.8 (WP 5.0.2) rather than the more-immediate 5.0.1.

@youknowriad youknowriad merged commit 4f6fdbd into master Dec 20, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@youknowriad youknowriad deleted the update/improve-typing-performance branch Dec 20, 2018

@youknowriad

This comment has been minimized.

Copy link
Contributor

youknowriad commented Dec 20, 2018

Awesome work on this PR @sgomes Thanks.

youknowriad added a commit that referenced this pull request Jan 3, 2019

Improve typing performance by splitting attributes in state tree. (#1…
…2312)

* Improve typing performance by splitting attributes in state tree.

Attributes have been moved to `attributesByClientId` in order to
reduce the impact of typing throughout the state tree.
See #11782.

Review fixes pass #1

Simplify block flattening code in reducers.

Remove alignment toolbar optimization; should be a different PR

Fix minor bug in test

Fix failing getBlockDependantsCacheBust test

Remove new `*WithoutAttributes` selectors.

We'll go with a different approach: use the existing selectors, but
keep the dependencies as they are. The attributes may get stale, but
it doesn't matter if they're not being used.

Change attributesById structure to not have an inner attributes

Simplifying some selector dependencies

Simplify withSaveReusableBlock a bit further and clarify its existence

Reuse constant instead of running omit again

Remove no longer needed mapClientIds

Simplifying reducers after review comments.

Further changes to selectors after review

Fix types in JSDoc

Renaming attributesByClientId to attributes

Further selector fixes after review

* Revert changes to getBlocks dependants
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment