Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ghost Importer: Current State, Problems and Future Improvements #5422

Closed
10 of 18 tasks
cobbspur opened this issue Jun 11, 2015 · 2 comments
Closed
10 of 18 tasks

Ghost Importer: Current State, Problems and Future Improvements #5422

cobbspur opened this issue Jun 11, 2015 · 2 comments
Labels
import / upgrade life-cycle stuff: Importing, exporting, upgrading and migrating server / core Issues relating to the server or core of Ghost

Comments

@cobbspur
Copy link
Member

cobbspur commented Jun 11, 2015

This Issue is to pick up where Issue #4605 left off and consolidate the issues surrounding the importer into one place. It is also to generate a conversation on what people would like/expect to see from the importer.

Current State

As it stands the Importer can handle zip files containing images, markdown files and the official JSON formatted file but not raw JSON. It should probably be opened up to take raw JSON data as well.

Problems

There is some feedback at the moment if the import fails such as a post with an unknown user_id but there are other cases where it simply says the import has failed with no other useful information so we need to investigate all possible causes of failure and display more information if possible.

There is currently no visual feedback of potential problems. We check the import file for posts and tags that have a matching slug, title or name but we do not display this information or compare it with the current posts and tags in the database which leads to duplications. This can be found as well in Issue #4685

There is no visual feedback of what the import file contained (I am about to make a PR for this to get the ball rolling - it is fairly rough and ready at the moment but better no information at all)

There are comments in the codebase about implementing a Dry-run. i.e. testing the import file, comparing with the current database then providing feedback to the user prior to actually performing the import. We could decide to complete the whole import process if no problems are found or have a check box to let the user see details of the import first either way such as number of posts etc.

The feedback could come with options on how to merge the data - e.g. for a duplicate post - the user could either:

  • use current database version of post
  • use importer version of post
  • use both posts (incrementing onto current slug as currently happens)
  • use post with most recent published_at/updated_at value

The UI needs some love as well. I was playing around with moving the importer into a modal. This is okay but I am not sure it is really required, perhaps the feedback would be nicer in a modal rather than the labs page. There was some discussion on the UI in #3927

Large import files can really mess with Ghost - presumably the import times out somewhere so this needs further investigation and either files have to be intelligently chopped up/chunked somehow.

There was also discussion at some point about creating an export or backup (more like the migrations backup) when importing - this could be automatic or provided as an option to the user and this is something I think is necessary for the delete content button. A Backup is probably better than export (maybe choose a limit on backups in the codebase or through the admin-UI). Discussion on this would be great - such as whether people agree or disagree and preferred approach if any.

Future improvements

  • It would be nice if we could have specific importers from various other platforms either via some automated check of the file/page or via a dropdown menu.
  • Image acquistion - Checking through posts for images and downloading them automatically could be tricky but would greatly enhance the migration process.
  • Create flows in modals or otherwise for users to fix through problems found - e.g. if posts have an unknown user_id they could see the users available and select one, or create a new user rather than having to dig into the import file.

Required tasks

Optional/Future tasks

  • add all weird edge cases as tests
  • Image Acquisition
  • Importing JSON database with wrong date format throws deprecation warning #8015: importing JSON database with wrong date format throws deprecation warning
  • discuss/implement backups
  • fix importer for large import files
  • split the importer processing up to allow for a dry-run option
  • provide more controls to user on how to merge duplicates
  • warnings we can add: two users are imported with same email, but different slug (slug is regenerated)
  • optimisation: trigger an export before import
@ErisDS ErisDS added epic import / upgrade life-cycle stuff: Importing, exporting, upgrading and migrating labels Jun 11, 2015
cobbspur added a commit to cobbspur/Ghost that referenced this issue Jun 16, 2015
Issue TryGhost#5422

- change db.api response to post
- show details of import file
- show problems with import file
- remove notifications
@ErisDS ErisDS added import / upgrade life-cycle stuff: Importing, exporting, upgrading and migrating and removed import / upgrade life-cycle stuff: Importing, exporting, upgrading and migrating labels Mar 20, 2017
@kirrg001 kirrg001 added this to the 1.0.0 Beta Ready milestone Mar 20, 2017
@ErisDS ErisDS modified the milestones: 1.0.0 Beta Ready, 1.0.0 Launch Apr 3, 2017
@kirrg001 kirrg001 removed this from the 1.0.0 Backlog milestone May 12, 2017
@kirrg001 kirrg001 removed their assignment May 12, 2017
@kirrg001 kirrg001 added the server / core Issues relating to the server or core of Ghost label May 12, 2017
kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017
refs TryGhost#5422

- it was only supported to add roles by id e.g. roles = [1] or by object e.g. roles = [{....}]
- we use the name strategy also for tags e.g. tags = ['my-tag']
- we support this for roles as well, this makes it easier when importing related user roles (because usually roles already exists in the database and the related id's are wrong e.g. roles_users)
kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017
refs TryGhost#5422

- if the json file contains null values for created_at/updated_at
- can be handled on model layer and not on the importer layer
kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017
refs TryGhost#5422

- post or tag slugs are always safe strings
- if e.g. you import a null slug, we can handle this, no need to crash or to cover this on import layer
kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017
refs TryGhost#5422

- basically nothing has changed here, just the none usage of the old importer and the result of problems/warnings while importing
kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017
refs TryGhost#5422

- this is the new integration of the importer
- i have used a class inheritance mechanismn to achieve an easier readability and maintenace
- most of the logic bits were copied from the old importer or modified/easified
- here a list:
    - schema validation (happens on model layer) was ignored
    - allow to import unknown user id's (see TryGhost#8365)
    - most of the duplication handling happens on model layer (we can use the power of unique fields and errors from the database)
- this PR has not concentrated on clear error messages or logs (this comes tomorrow)
- i've added debug messages
- the import is splitted into three steps:
  - beforeImport
    --> prepares the data to import, sorts out relations (roles, tags), detects fields (for LTS)
  - doImport
    --> does the actual import
  - afterImport
    --> updates the data after successful import e.g. update all user reference fields e.g. published_by (compares the imported data with the current state of the database)
    --> the background why we are doing it after the import is that we don't have to write complicated logic to detect outdated user references before we know which user get's which id
    --> plus: we don't care about the order of inserting the data
    --> keep in mind: we never ever import id's, that's why user references are usually outdated
    --> e.g. you import a post with `published_by:2`, but user 2 get's a new ObjectId 'X', we have to update the reference
kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017
kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017
refs TryGhost#5422

- simply move to the correct folder
kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017
refs TryGhost#5422

- the markdown field will be removed soon, see TryGhost#8479
- if you import images the importer crashes if markdown is null e.g. the welcome post has no markdown field anymore (results in null)
kirrg001 added a commit to kirrg001/Ghost-Admin that referenced this issue May 23, 2017
cobbspur pushed a commit that referenced this issue May 23, 2017
refs #5422

- we can support null titles after this PR if we want
- user model: fix getAuthorRole
- user model: support adding roles by name
- we support this for roles as well, this makes it easier when importing related user roles (because usually roles already exists in the database and the related id's are wrong e.g. roles_users)
- base model: support for null created_at or updated_at values
- post or tag slugs are always safe strings
- enable an import of a null slug, no need to crash or to cover this on import layer
- add new DataImporter logic
    - uses a class inheritance mechanism to achieve an easier readability and maintenance
    - schema validation (happens on model layer) was ignored
    - allow to import unknown user id's (see #8365)
    - most of the duplication handling happens on model layer (we can use the power of unique fields and errors from the database)
- the import is splitted into three steps:
  - beforeImport
    --> prepares the data to import, sorts out relations (roles, tags), detects fields (for LTS)
  - doImport
    --> does the actual import
  - afterImport
    --> updates the data after successful import e.g. update all user reference fields e.g. published_by (compares the imported data with the current state of the database)
- import images: markdown can be null
- show error message when json handler can't parse file
- do not request gravatar if email is null
- return problems/warnings after successful import
- optimise warnings in importer
- do not return warnings for role duplications, no helpful information
- error handler: return context information of error
- we show the affected json entries as one line in the UI
- show warning for: detected duplicated tag
- schema validation: fix valueMustBeBoolean translation
- remove context property from json parse error
kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 23, 2017
@kirrg001 kirrg001 added the later [triage] Things we intend to work but are not immediate priority label May 23, 2017
cobbspur pushed a commit that referenced this issue May 23, 2017
refs #6967, refs #5422

- imports posts tags by post id and sort order
- test: fix order problem (sqlite)
kevinansfield pushed a commit to TryGhost/Admin that referenced this issue May 24, 2017
refs TryGhost/Ghost#5422
- handles errors and warnings from returned from the server and improves visual display
- adds a reset so that errors are cleared when leaving the labs screen
- removes the unnecessary "Import failed" alert - we already show the errors on the screen, no point bugging the user even further
@ErisDS
Copy link
Member

ErisDS commented Jul 31, 2017

@kirrg001 is this issue up to date? You fixed so many things with the importer already, I wonder if its worth closing this and waiting to see what problems people run into in the real world. Also we have both #8756 and #8517 open - all tracking remaining issues in the importer. I think we can reduce issue noise here 😁

@kirrg001
Copy link
Contributor

Yeah it is up-to-date. I kept it open for future improvements - see list on top. But yeah agree, let's close it - if we want to add some more improvements from the list on top, we can still look it up here.

@kirrg001 kirrg001 removed the later [triage] Things we intend to work but are not immediate priority label Mar 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
import / upgrade life-cycle stuff: Importing, exporting, upgrading and migrating server / core Issues relating to the server or core of Ghost
Projects
None yet
Development

No branches or pull requests

3 participants