-
-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ghost Importer: Current State, Problems and Future Improvements #5422
Comments
Issue TryGhost#5422 - change db.api response to post - show details of import file - show problems with import file - remove notifications
refs TryGhost#5422 - it was only supported to add roles by id e.g. roles = [1] or by object e.g. roles = [{....}] - we use the name strategy also for tags e.g. tags = ['my-tag'] - we support this for roles as well, this makes it easier when importing related user roles (because usually roles already exists in the database and the related id's are wrong e.g. roles_users)
refs TryGhost#5422 - if the json file contains null values for created_at/updated_at - can be handled on model layer and not on the importer layer
refs TryGhost#5422 - post or tag slugs are always safe strings - if e.g. you import a null slug, we can handle this, no need to crash or to cover this on import layer
refs TryGhost#5422 - basically nothing has changed here, just the none usage of the old importer and the result of problems/warnings while importing
refs TryGhost#5422 - this is the new integration of the importer - i have used a class inheritance mechanismn to achieve an easier readability and maintenace - most of the logic bits were copied from the old importer or modified/easified - here a list: - schema validation (happens on model layer) was ignored - allow to import unknown user id's (see TryGhost#8365) - most of the duplication handling happens on model layer (we can use the power of unique fields and errors from the database) - this PR has not concentrated on clear error messages or logs (this comes tomorrow) - i've added debug messages - the import is splitted into three steps: - beforeImport --> prepares the data to import, sorts out relations (roles, tags), detects fields (for LTS) - doImport --> does the actual import - afterImport --> updates the data after successful import e.g. update all user reference fields e.g. published_by (compares the imported data with the current state of the database) --> the background why we are doing it after the import is that we don't have to write complicated logic to detect outdated user references before we know which user get's which id --> plus: we don't care about the order of inserting the data --> keep in mind: we never ever import id's, that's why user references are usually outdated --> e.g. you import a post with `published_by:2`, but user 2 get's a new ObjectId 'X', we have to update the reference
refs TryGhost#5422 - simply move to the correct folder
refs TryGhost#5422 - the markdown field will be removed soon, see TryGhost#8479 - if you import images the importer crashes if markdown is null e.g. the welcome post has no markdown field anymore (results in null)
refs #5422 - we can support null titles after this PR if we want - user model: fix getAuthorRole - user model: support adding roles by name - we support this for roles as well, this makes it easier when importing related user roles (because usually roles already exists in the database and the related id's are wrong e.g. roles_users) - base model: support for null created_at or updated_at values - post or tag slugs are always safe strings - enable an import of a null slug, no need to crash or to cover this on import layer - add new DataImporter logic - uses a class inheritance mechanism to achieve an easier readability and maintenance - schema validation (happens on model layer) was ignored - allow to import unknown user id's (see #8365) - most of the duplication handling happens on model layer (we can use the power of unique fields and errors from the database) - the import is splitted into three steps: - beforeImport --> prepares the data to import, sorts out relations (roles, tags), detects fields (for LTS) - doImport --> does the actual import - afterImport --> updates the data after successful import e.g. update all user reference fields e.g. published_by (compares the imported data with the current state of the database) - import images: markdown can be null - show error message when json handler can't parse file - do not request gravatar if email is null - return problems/warnings after successful import - optimise warnings in importer - do not return warnings for role duplications, no helpful information - error handler: return context information of error - we show the affected json entries as one line in the UI - show warning for: detected duplicated tag - schema validation: fix valueMustBeBoolean translation - remove context property from json parse error
refs TryGhost/Ghost#5422 - handles errors and warnings from returned from the server and improves visual display - adds a reset so that errors are cleared when leaving the labs screen - removes the unnecessary "Import failed" alert - we already show the errors on the screen, no point bugging the user even further
@kirrg001 is this issue up to date? You fixed so many things with the importer already, I wonder if its worth closing this and waiting to see what problems people run into in the real world. Also we have both #8756 and #8517 open - all tracking remaining issues in the importer. I think we can reduce issue noise here 😁 |
Yeah it is up-to-date. I kept it open for future improvements - see list on top. But yeah agree, let's close it - if we want to add some more improvements from the list on top, we can still look it up here. |
This Issue is to pick up where Issue #4605 left off and consolidate the issues surrounding the importer into one place. It is also to generate a conversation on what people would like/expect to see from the importer.
Current State
As it stands the Importer can handle zip files containing images, markdown files and the official JSON formatted file but not raw JSON. It should probably be opened up to take raw JSON data as well.
Problems
There is some feedback at the moment if the import fails such as a post with an unknown user_id but there are other cases where it simply says the import has failed with no other useful information so we need to investigate all possible causes of failure and display more information if possible.
There is currently no visual feedback of potential problems. We check the import file for posts and tags that have a matching slug, title or name but we do not display this information or compare it with the current posts and tags in the database which leads to duplications. This can be found as well in Issue #4685
There is no visual feedback of what the import file contained (I am about to make a PR for this to get the ball rolling - it is fairly rough and ready at the moment but better no information at all)
There are comments in the codebase about implementing a Dry-run. i.e. testing the import file, comparing with the current database then providing feedback to the user prior to actually performing the import. We could decide to complete the whole import process if no problems are found or have a check box to let the user see details of the import first either way such as number of posts etc.
The feedback could come with options on how to merge the data - e.g. for a duplicate post - the user could either:
The UI needs some love as well. I was playing around with moving the importer into a modal. This is okay but I am not sure it is really required, perhaps the feedback would be nicer in a modal rather than the labs page. There was some discussion on the UI in #3927
Large import files can really mess with Ghost - presumably the import times out somewhere so this needs further investigation and either files have to be intelligently chopped up/chunked somehow.
There was also discussion at some point about creating an export or backup (more like the migrations backup) when importing - this could be automatic or provided as an option to the user and this is something I think is necessary for the delete content button. A Backup is probably better than export (maybe choose a limit on backups in the codebase or through the admin-UI). Discussion on this would be great - such as whether people agree or disagree and preferred approach if any.
Future improvements
Required tasks
Optional/Future tasks
The text was updated successfully, but these errors were encountered: