Ghost Importer: Current State, Problems and Future Improvements #5422

cobbspur · 2015-06-11T10:09:44Z

This Issue is to pick up where Issue #4605 left off and consolidate the issues surrounding the importer into one place. It is also to generate a conversation on what people would like/expect to see from the importer.

Current State

As it stands the Importer can handle zip files containing images, markdown files and the official JSON formatted file but not raw JSON. It should probably be opened up to take raw JSON data as well.

Problems

There is some feedback at the moment if the import fails such as a post with an unknown user_id but there are other cases where it simply says the import has failed with no other useful information so we need to investigate all possible causes of failure and display more information if possible.

There is currently no visual feedback of potential problems. We check the import file for posts and tags that have a matching slug, title or name but we do not display this information or compare it with the current posts and tags in the database which leads to duplications. This can be found as well in Issue #4685

There is no visual feedback of what the import file contained (I am about to make a PR for this to get the ball rolling - it is fairly rough and ready at the moment but better no information at all)

There are comments in the codebase about implementing a Dry-run. i.e. testing the import file, comparing with the current database then providing feedback to the user prior to actually performing the import. We could decide to complete the whole import process if no problems are found or have a check box to let the user see details of the import first either way such as number of posts etc.

The feedback could come with options on how to merge the data - e.g. for a duplicate post - the user could either:

use current database version of post
use importer version of post
use both posts (incrementing onto current slug as currently happens)
use post with most recent published_at/updated_at value

The UI needs some love as well. I was playing around with moving the importer into a modal. This is okay but I am not sure it is really required, perhaps the feedback would be nicer in a modal rather than the labs page. There was some discussion on the UI in #3927

Large import files can really mess with Ghost - presumably the import times out somewhere so this needs further investigation and either files have to be intelligently chopped up/chunked somehow.

There was also discussion at some point about creating an export or backup (more like the migrations backup) when importing - this could be automatic or provided as an option to the user and this is something I think is necessary for the delete content button. A Backup is probably better than export (maybe choose a limit on backups in the codebase or through the admin-UI). Discussion on this would be great - such as whether people agree or disagree and preferred approach if any.

Future improvements

It would be nice if we could have specific importers from various other platforms either via some automated check of the file/page or via a dropdown menu.
Image acquistion - Checking through posts for images and downloading them automatically could be tricky but would greatly enhance the migration process.
Create flows in modals or otherwise for users to fix through problems found - e.g. if posts have an unknown user_id they could see the users available and select one, or create a new user rather than having to dig into the import file.

Required tasks

refactor the importer: right now we have two different folders for the importer logic - sort that out
improve logs and error information for the user
investigate import failures with no information
Tags lost order after exporting and importing #6967: tags loosing order after importing
Prevent sending notification to Slack when importing blog data #7275: prevent sending notification to Slack when importing blog data
Duplicate entry found. Multiple values of '' found for . #8349
Can't import my backup #8365
Look at Importer: Improve Post and Tag de-duplication #4685
there were issues with duplicate posts_tags, authors with duplicate slugs and I think there was an issue where there were duplicate tags in the import file after sanitisation

Optional/Future tasks

add all weird edge cases as tests
Image Acquisition
Importing JSON database with wrong date format throws deprecation warning #8015: importing JSON database with wrong date format throws deprecation warning
discuss/implement backups
fix importer for large import files
split the importer processing up to allow for a dry-run option
provide more controls to user on how to merge duplicates
warnings we can add: two users are imported with same email, but different slug (slug is regenerated)
optimisation: trigger an export before import

Issue TryGhost#5422 - change db.api response to post - show details of import file - show problems with import file - remove notifications

refs TryGhost#5422 - it was only supported to add roles by id e.g. roles = [1] or by object e.g. roles = [{....}] - we use the name strategy also for tags e.g. tags = ['my-tag'] - we support this for roles as well, this makes it easier when importing related user roles (because usually roles already exists in the database and the related id's are wrong e.g. roles_users)

refs TryGhost#5422 - if the json file contains null values for created_at/updated_at - can be handled on model layer and not on the importer layer

refs TryGhost#5422 - post or tag slugs are always safe strings - if e.g. you import a null slug, we can handle this, no need to crash or to cover this on import layer

refs TryGhost#5422 - basically nothing has changed here, just the none usage of the old importer and the result of problems/warnings while importing

refs TryGhost#5422 - this is the new integration of the importer - i have used a class inheritance mechanismn to achieve an easier readability and maintenace - most of the logic bits were copied from the old importer or modified/easified - here a list: - schema validation (happens on model layer) was ignored - allow to import unknown user id's (see TryGhost#8365) - most of the duplication handling happens on model layer (we can use the power of unique fields and errors from the database) - this PR has not concentrated on clear error messages or logs (this comes tomorrow) - i've added debug messages - the import is splitted into three steps: - beforeImport --> prepares the data to import, sorts out relations (roles, tags), detects fields (for LTS) - doImport --> does the actual import - afterImport --> updates the data after successful import e.g. update all user reference fields e.g. published_by (compares the imported data with the current state of the database) --> the background why we are doing it after the import is that we don't have to write complicated logic to detect outdated user references before we know which user get's which id --> plus: we don't care about the order of inserting the data --> keep in mind: we never ever import id's, that's why user references are usually outdated --> e.g. you import a post with `published_by:2`, but user 2 get's a new ObjectId 'X', we have to update the reference

refs TryGhost#5422

refs TryGhost#5422 - simply move to the correct folder

refs TryGhost#5422 - the markdown field will be removed soon, see TryGhost#8479 - if you import images the importer crashes if markdown is null e.g. the welcome post has no markdown field anymore (results in null)

refs TryGhost/Ghost#5422

refs #5422 - we can support null titles after this PR if we want - user model: fix getAuthorRole - user model: support adding roles by name - we support this for roles as well, this makes it easier when importing related user roles (because usually roles already exists in the database and the related id's are wrong e.g. roles_users) - base model: support for null created_at or updated_at values - post or tag slugs are always safe strings - enable an import of a null slug, no need to crash or to cover this on import layer - add new DataImporter logic - uses a class inheritance mechanism to achieve an easier readability and maintenance - schema validation (happens on model layer) was ignored - allow to import unknown user id's (see #8365) - most of the duplication handling happens on model layer (we can use the power of unique fields and errors from the database) - the import is splitted into three steps: - beforeImport --> prepares the data to import, sorts out relations (roles, tags), detects fields (for LTS) - doImport --> does the actual import - afterImport --> updates the data after successful import e.g. update all user reference fields e.g. published_by (compares the imported data with the current state of the database) - import images: markdown can be null - show error message when json handler can't parse file - do not request gravatar if email is null - return problems/warnings after successful import - optimise warnings in importer - do not return warnings for role duplications, no helpful information - error handler: return context information of error - we show the affected json entries as one line in the UI - show warning for: detected duplicated tag - schema validation: fix valueMustBeBoolean translation - remove context property from json parse error

closes TryGhost#6967, refs TryGhost#5422

refs #6967, refs #5422 - imports posts tags by post id and sort order - test: fix order problem (sqlite)

refs TryGhost/Ghost#5422 - handles errors and warnings from returned from the server and improves visual display - adds a reset so that errors are cleared when leaving the labs screen - removes the unnecessary "Import failed" alert - we already show the errors on the screen, no point bugging the user even further

ErisDS · 2017-07-31T17:32:18Z

@kirrg001 is this issue up to date? You fixed so many things with the importer already, I wonder if its worth closing this and waiting to see what problems people run into in the real world. Also we have both #8756 and #8517 open - all tracking remaining issues in the importer. I think we can reduce issue noise here 😁

kirrg001 · 2017-07-31T17:54:00Z

Yeah it is up-to-date. I kept it open for future improvements - see list on top. But yeah agree, let's close it - if we want to add some more improvements from the list on top, we can still look it up here.

ErisDS added epic import / upgrade life-cycle stuff: Importing, exporting, upgrading and migrating labels Jun 11, 2015

cobbspur mentioned this issue Jun 12, 2015

Add visual indicators of problems and details of import file #5424

Closed

cobbspur added a commit to cobbspur/Ghost that referenced this issue Jun 16, 2015

Add visual indicators of problems and details of import file

05bc88a

Issue TryGhost#5422 - change db.api response to post - show details of import file - show problems with import file - remove notifications

ErisDS mentioned this issue Jun 11, 2016

Ghost Importer: Image Import and other Improvements #4605

Closed

6 tasks

ErisDS added import / upgrade life-cycle stuff: Importing, exporting, upgrading and migrating and removed import / upgrade life-cycle stuff: Importing, exporting, upgrading and migrating labels Mar 20, 2017

ErisDS assigned kirrg001 Mar 20, 2017

kirrg001 added this to the 1.0.0 Beta Ready milestone Mar 20, 2017

ErisDS modified the milestones: 1.0.0 Beta Ready, 1.0.0 Launch Apr 3, 2017

kirrg001 mentioned this issue Apr 18, 2017

Duplicate entry found. Multiple values of '' found for . #8349

Closed

kirrg001 removed this from the 1.0.0 Backlog milestone May 12, 2017

kirrg001 removed their assignment May 12, 2017

kirrg001 added the server / core Issues relating to the server or core of Ghost label May 12, 2017

kirrg001 assigned kirrg001 and cobbspur May 15, 2017

kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017

🎨 base model: if created_at or updated_at are null

d491852

refs TryGhost#5422 - if the json file contains null values for created_at/updated_at - can be handled on model layer and not on the importer layer

kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017

🎨 support safeString with null

cd375e5

refs TryGhost#5422 - post or tag slugs are always safe strings - if e.g. you import a null slug, we can handle this, no need to crash or to cover this on import layer

kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017

🎨 update tests

b35e0b7

refs TryGhost#5422 - basically nothing has changed here, just the none usage of the old importer and the result of problems/warnings while importing

kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017

🔥 remove old import folder

639ab4e

refs TryGhost#5422

kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 22, 2017

🎨 move test files

685db7f

refs TryGhost#5422 - simply move to the correct folder

kirrg001 mentioned this issue May 22, 2017

🐛 care about post tags order when importing #8481

Merged

kirrg001 mentioned this issue May 22, 2017

Importer: Improve Post and Tag de-duplication #4685

Closed

kirrg001 added a commit to kirrg001/Ghost-Admin that referenced this issue May 23, 2017

🎨 optimise importer result

4228d1e

refs TryGhost/Ghost#5422

kirrg001 added a commit to kirrg001/Ghost that referenced this issue May 23, 2017

🐛 care about post tags order when importing

09aca11

closes TryGhost#6967, refs TryGhost#5422

kirrg001 mentioned this issue May 23, 2017

😝 optimise importer warnings/errors TryGhost/Admin#711

Merged

kirrg001 added the later [triage] Things we intend to work but are not immediate priority label May 23, 2017

kirrg001 unassigned kirrg001 and cobbspur May 23, 2017

cobbspur pushed a commit that referenced this issue May 23, 2017

🐛 Ensure post tags sort order is correct when importing (#8481)

925d721

refs #6967, refs #5422 - imports posts tags by post id and sort order - test: fix order problem (sqlite)

kirrg001 closed this as completed Jul 31, 2017

kirrg001 removed the later [triage] Things we intend to work but are not immediate priority label Mar 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ghost Importer: Current State, Problems and Future Improvements #5422

Ghost Importer: Current State, Problems and Future Improvements #5422

cobbspur commented Jun 11, 2015 •

edited by naz

Loading

ErisDS commented Jul 31, 2017

kirrg001 commented Jul 31, 2017

Ghost Importer: Current State, Problems and Future Improvements #5422

Ghost Importer: Current State, Problems and Future Improvements #5422

Comments

cobbspur commented Jun 11, 2015 • edited by naz Loading

Current State

Problems

Future improvements

Required tasks

Optional/Future tasks

ErisDS commented Jul 31, 2017

kirrg001 commented Jul 31, 2017

cobbspur commented Jun 11, 2015 •

edited by naz

Loading