Skip to content

Increased importer perf by 50% and allow adding nested relations by foreign key#9431

Merged
kirrg001 merged 2 commits intoTryGhost:masterfrom
kirrg001:improve-importer
Feb 20, 2018
Merged

Increased importer perf by 50% and allow adding nested relations by foreign key#9431
kirrg001 merged 2 commits intoTryGhost:masterfrom
kirrg001:improve-importer

Conversation

@kirrg001
Copy link
Copy Markdown
Contributor

@kirrg001 kirrg001 commented Jan 28, 2018

no issue

Speed improvement

  • change behaviour from updating user references after the actual import to update the user reference before the actual import
    • updating user references after the import is way less case intense
    • that was the initial decision for updating the references afterwards (was easier to implement)
    • but that does not play well with adding nested relations by identifier (foreign key)
  • the refactoring is required for multiple authors
    • if we e.g. store invalid author id's, we won't be able to add a belongs-to-many relation for multiple authors
    • bookshelf-relations is generic and always tries to find a matching target before attching a model
    • invalid user references in the database are in general not acceptable
    • we have to keep backwards compatibility in the importer
    • e.g. you import author_id, we have to detect that and insert the correct post_authors relation
    • but if we try to add posts_authors with an invalid author_id, bookshelf-relations will complain, because it ensures that adding nested relations requires a valid and existent target (it does not allow adding broken references to the database)

This change has a very good side affect

  • 17mb takes on master ~1,5m
    • on this branch it takes ~42seconds 🙊
  • 40mb times out on master
    • on this branch it takes 1,5m
  • memory looks like the same

Insert nested tags by foreign key

  • because of the importer change, we can proof that adding nested relations by foreign key works
  • tags runs already through bookshelf-relations
  • replace logic for preparing nested tags
    • we added nested relations by name
    • this is not needed, because the matching tag was already imported
    • also: the notation post.tags=[{name: 'x'}] doesn't even make use of the target tag id
    • bookshelf-relations is smart and can handle that case, but this notation looks like you would like to add a tag
    • adding nested tags by name can also trouble, because name is not even unique (!)
  • what we do now
    • we simply would like to add the relationship to the database
    • use same approach as base class
    • add posts_tags to target post model
    • update identifiers
    • insert relation by foreign key tag_id e.g. post.tags=[{tag_id: '...'}]
  • bump bookshelf-relations to 0.1.10 (which supports attaching relations by foreign key)

  • tested example files in the browser
  • push side-fixes to master straight (!)

@cobbspur I've assigned you. To A: share knowledge that we refactored a piece of the importer and B: it would be great if you could do a second browser test and compare the result 👍 The importer still works like the same, it's just faster. You don't have to review the code.

@kirrg001 kirrg001 mentioned this pull request Jan 28, 2018
45 tasks
Comment thread core/server/models/post.js Outdated

tags: function tags() {
return this.belongsToMany('Tag').withPivot('sort_order').query('orderBy', 'sort_order', 'ASC');
return this.belongsToMany('Tag', 'posts_tags', 'post_id', 'tag_id').withPivot('sort_order').query('orderBy', 'sort_order', 'ASC');

This comment was marked as abuse.

"created_by": 1,
"updated_at": null,
"updated_by": null
"updated_by": 10

This comment was marked as abuse.

@kirrg001
Copy link
Copy Markdown
Contributor Author

kirrg001 commented Feb 14, 2018

@cobbspur Could you please reserve a little time to test my PR? Thanks 👍 No code review needed.

@kirrg001 kirrg001 force-pushed the improve-importer branch 3 times, most recently from 1e4e22a to c133b46 Compare February 20, 2018 07:53
no issue

- change behaviour from updating user references after the actual import to update the user reference before the actual import
  - updating user references after the import is way less case intense
  - that was the initial decision for updating the references afterwards
  - but that does not play well with adding nested relations by identifier
- the refactoring is required for multiple authors
  - if we e.g. store invalid author id's, we won't be able to add a belongs-to-many relation for multiple authors
  - bookshelf-relations is generic and always tries to find a matching target before attching a model
  - invalid user references won't work anymore
- this change has a very good side affect
  - 17mb takes on master ~1,5seconds
    - on this branch it takes ~45seconds
  - also the memory usage is way lower and stabler
  - 40mb takes 1,6s (times out on master)
no issue

- replace logic for preparing nested tags
- if you have nested tags in your file, we won't update or update the target tag
- we simply would like to add the relationship to the database
- use same approach as base class
  - add `posts_tags` to target post model
  - update identifiers
  - insert relation by foreign key `tag_id`
- bump bookshelf-relations to 0.1.10
@kirrg001
Copy link
Copy Markdown
Contributor Author

Was tested by @kevinansfield, @AileenCGN and @cobbspur.
Will self-merge.

@kirrg001 kirrg001 merged commit 68d8154 into TryGhost:master Feb 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants