Import & Export Updates. Content Migration and Import Overhaul for 5.6 & 5.7. #1991

Closed
aembler opened this Issue Feb 19, 2015 · 15 comments

aembler commented Feb 19, 2015

Content Import, Export & Migration

Background

Concrete5 versions 5.6 and 5.7 employ a similar system for content population: the Concrete5 Content Import Format (CIF). This is an XML description of a site’s content, including pages, areas, attributes, blocks, files and more. This XML language is primarily used when installing Concrete5’s sample content, but it can also be used to add items like attributes, single pages, block types and more during a package install.

Example: Creating a page, and specifying a content block inside that page:

<concrete5-cif>
  <page name="Portfolio" path="/portfolio" filename="" pagetype="portfolio" template="full" user="admin" description="" package="">
    <attributes>
      <attributekey handle="exclude_subpages_from_nav">
        <value>1</value>
      </attributekey>
    </attributes>
    <area name="Main">
      <blocks>
        <block type="content" name="">
          <data table="btContentLocal">
            <record>
              <content><![CDATA[Hello World!]]></content>
            </record>
          </data>
        </block>
      </blocks>
    </area>
  </page>
</concrete5-cif>

This simple snippet of CIF XML specifies a page named Portfolio, found at /portfolio in the sitemap. Its page type is the one with the handle “portfolio,” and its page template is the one with the handle “full.” The page will have its “exclude_subpages_from_nav” checkbox attribute set to true, and one content block will be added to its Main area.
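
To make that mapping concrete, here is a minimal sketch in Python (not concrete5's own PHP importer) that reads the snippet with the standard library and pulls out the same facts the paragraph describes. The helper name summarize_cif_page and the shape of the returned dictionary are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

CIF_SNIPPET = """
<concrete5-cif>
  <page name="Portfolio" path="/portfolio" filename="" pagetype="portfolio"
        template="full" user="admin" description="" package="">
    <attributes>
      <attributekey handle="exclude_subpages_from_nav">
        <value>1</value>
      </attributekey>
    </attributes>
    <area name="Main">
      <blocks>
        <block type="content" name="">
          <data table="btContentLocal">
            <record>
              <content><![CDATA[Hello World!]]></content>
            </record>
          </data>
        </block>
      </blocks>
    </area>
  </page>
</concrete5-cif>
"""


def summarize_cif_page(page):
    """Return a plain dict describing one <page> element (hypothetical helper)."""
    # Attribute keys are identified by handle, never by numeric ID.
    attributes = {
        key.get("handle"): key.findtext("value")
        for key in page.findall("./attributes/attributekey")
    }
    # Each block is reported together with the area it lives in.
    blocks = [
        (area.get("name"), block.get("type"))
        for area in page.findall("./area")
        for block in area.findall("./blocks/block")
    ]
    return {
        "name": page.get("name"),
        "path": page.get("path"),
        "pagetype": page.get("pagetype"),
        "template": page.get("template"),
        "attributes": attributes,
        "blocks": blocks,
    }


root = ET.fromstring(CIF_SNIPPET.strip())
summary = summarize_cif_page(root.find("page"))
print(summary)
```

Everything the sketch reads is addressed by handle, path or name, which is the portability property the format is aiming for.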

Goals

The Content Import Format was designed with the following goals in mind:

Portability: you should be able to take the XML generated by one Concrete5 site and easily import it into another site without worrying about the state of a database or a site environment.
Readability: snippets of CIF XML should be easy to understand and modify.
Limited reliance on IDs. Most items in Concrete5 have a unique numerical ID, but the CIF specifically avoids this approach, instead opting for other unique identifiers like page path, file names, handles, etc… This makes the file more portable and easier to update. What few IDs are used are generated at runtime, and have no bearing on items once they’re added to a Concrete5 site; they are simply used for relating bits of content to each other within the file itself.

Generation

Currently there is a single tool for Concrete5 CIF generation: the Sample Content Generator add-on. This add-on can be found in the 5.6 marketplace here:

http://www.concrete5.org/marketplace/addons/sample-content-generator

and in a 5.7 version on GitHub:

https://github.com/concrete5/addon_sample_content_generator/tree/5.7

This add-on is distributed as a Concrete5 package. When a developer installs this package in their Concrete5 site, it adds a Dashboard page that lets an administrator export the entire site in the CIF format. This includes everything: the Dashboard, all attributes, all pages and stacks, and all files.

Limitations

The paragraph above that describes the main features of the Sample Content Generator also sums up its glaring weakness: there are no tools to filter any of the content exported by the Sample Content Generator. The entire Dashboard is exported, even if 99% of the time the developer building this CIF XML has no need for the Dashboard. All attribute keys are exported – not just the ones added after installation. This means that any time content is exported from a Concrete5 site, a significant amount of pruning and cleanup must take place before that content can be used with a package or as the starting point for a new Concrete5 install.

Import and Export Update

Goals

  • Ability to import new content and functionality from an XML file into an existing concrete5.7 site without breaking existing content, e.g.
    • Import a 3rd party blog into a 5.7 site.
    • Import legacy data into new page types.
  • Ability to more easily create XML segments from existing concrete5 sites for mixing and matching when creating a new concrete5.7 site with sample content, e.g.
    • Build sites with custom content elements and functionality based on a series of choices made in an installer script.
  • Ability to more easily pack up changes that touch both code and content, and apply them to a live site, e.g.
    • Launch a new searchable white papers section on a live site that can’t go down: pack up the data as XML, add the scripts, then import and move everything while staying live.
    • Create a meaningful, smaller staging copy of a huge site by dropping repetitive content elements from the XML, testing with a few dozen records instead of millions.

To achieve these goals, we will change the Sample Content Generator into the following:

Content Exporter Add-On

  • The “Sample Content Generator” add-on becomes the “Content Exporter” add-on.
  • Rather than blindly exporting all items within a Concrete5 site, this add-on allows for the export of a specific portion of content. This could be segmented by type (e.g. “Export all attribute keys.”, “Export all dashboard pages”) or by section of the site (e.g. exporting all pages beneath the “/blog” page).
  • Export Interface: any items that are exportable (e.g. block types, attribute types, attribute keys) can be selected in an export interface.
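
As a rough illustration of section-based export (for example, all pages beneath “/blog”), the following Python sketch filters an already-exported CIF document by page path. It is not the add-on's implementation: the function name export_section is hypothetical, the sample pages are invented, and it assumes page elements sit directly under the root element as in the example earlier in this post.

```python
import xml.etree.ElementTree as ET

# Invented sample export; real exports contain far more than page elements.
SITE_EXPORT = """
<concrete5-cif>
  <page name="Home" path="/" pagetype="page" template="full"/>
  <page name="Blog" path="/blog" pagetype="blog" template="full"/>
  <page name="First Post" path="/blog/first-post" pagetype="blog_entry" template="full"/>
  <page name="Blogroll" path="/blogroll" pagetype="page" template="full"/>
</concrete5-cif>
"""


def export_section(cif_xml, section_path):
    """Return a new CIF document containing only pages at or beneath section_path."""
    source = ET.fromstring(cif_xml.strip())
    result = ET.Element(source.tag)
    prefix = section_path.rstrip("/") + "/"
    for page in source.findall("page"):
        path = page.get("path", "")
        # Keep the section root and its descendants; "/blogroll" must not match "/blog".
        if path == section_path or path.startswith(prefix):
            result.append(page)
    return ET.tostring(result, encoding="unicode")


blog_only = export_section(SITE_EXPORT, "/blog")
print(blog_only)
```

Comparing against the prefix with a trailing slash, rather than the bare path, is what keeps sibling pages with similar names out of the export.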

Content Importer Add-On

While CIF XML files will still be used to power Starting Point packages as they are today (and thereby still used as before to create Concrete5 sample content), we will create a new add-on that can import this XML as well, with much more flexibility and reliability. The usage of the Content Importer add-on will typically be as follows:

  1. Through the dashboard, an administrator creates a new “Content Import Operation.” An ImportBatch object will be created.
  2. An administrator then uploads a content file and specifies what kind of file they’re uploading. (Note: by default this will be CIF XML, but this architecture leaves the door open for additional file types in the future, for migration from other CMSs.)
  3. All contents of the uploaded file will be transformed into PortableContentItem objects which will be stored against this ImportBatch.
  4. Upon refresh, the actions of the current ImportBatch and its objects will be displayed to the administrator in human-readable terms. For example: “Create Collection Attribute Key ‘Start Date’ (blog_entry_start_date)”, “Create Block Type ‘Blog Entry List’”, “Create Page Type ‘Blog Listing’”, “Create Page ‘First Post’ under ‘/blog/’”.
  5. At any time an administrator can also upload a ZIP of files. These files must have the same filenames as any files found in the CIF XML files. These files will eventually be imported into the file manager and mapped to PortableContentItem objects based on their file names.
  6. Any PortableContentItem objects that are incomplete or otherwise problematic will be listed as such. This might include pages that reference block types that don’t exist, etc…
  7. At any time the administrator can deselect a PortableContentItem object by clicking on a checkbox next to it. This will persist and ensure that the object won’t be imported when an actual import takes place.
  8. At any time an administrator can choose to import the entire batch. First, they will choose whether they want to clear out the entire contents of the site and import the content into it, or whether they want to import it into the Imports section of the sitemap, which, like drafts, is a protected area in the Dashboard. Starting the import will lock the batch and insert all PortableContentItem objects into the Concrete5 queue, at which point they will be imported into the proper location.
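
The workflow above can be sketched in a few lines of Python. This is only a model of the proposed behaviour, not the planned PHP classes: ImportBatch and PortableContentItem are the names used in this proposal, but every field and method below (description, requires_block_types, problems, deselect, run) is an assumption, as is the block type handle image_slider.

```python
from dataclasses import dataclass, field


@dataclass
class PortableContentItem:
    description: str                 # human-readable action shown to the administrator
    requires_block_types: tuple = () # block type handles this item depends on
    selected: bool = True            # unchecked items are skipped at import time


@dataclass
class ImportBatch:
    items: list = field(default_factory=list)
    locked: bool = False

    def problems(self, installed_block_types):
        """Map item description -> missing block types, for incomplete items."""
        report = {}
        for item in self.items:
            missing = [t for t in item.requires_block_types
                       if t not in installed_block_types]
            if missing:
                report[item.description] = missing
        return report

    def deselect(self, description):
        """Persistently exclude an item, as the checkbox in step 7 would."""
        if self.locked:
            raise RuntimeError("batch is locked; selection can no longer change")
        for item in self.items:
            if item.description == description:
                item.selected = False

    def run(self):
        """Lock the batch and return the items that would be placed on the queue."""
        self.locked = True
        return [item for item in self.items if item.selected]


batch = ImportBatch(items=[
    PortableContentItem('Create Page Type "Blog Listing"'),
    PortableContentItem('Create Page "First Post" under "/blog/"',
                        requires_block_types=("content", "image_slider")),
])
print(batch.problems(installed_block_types={"content"}))
batch.deselect('Create Page "First Post" under "/blog/"')
queued = batch.run()
print([item.description for item in queued])
```

Keeping the problem report separate from the selection state means an administrator can review what is wrong before deciding what to leave out.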

Goals for the Content Importer Add-On and Content Import Architectural Update

Must: Update the \Concrete\Core\Backup\ContentImporter and \Concrete\Core\Backup\ContentExporter classes to use PortableContentItem objects rather than directly work with XML.

Must: Add additional OutputFormatter objects that can take PortableContentItem objects and translate them into CIF XML.

Must: Add adapters for content formats. CIF XML is just one type of adapter. These handle the creation of PortableContentItem objects from different formats.

Must: The CIF XML format must still be used for Concrete5 sample content.

Must: The Content Exporter add-on must work with 5.6 sites.

Must: The Content Exporter add-on must work with 5.7 sites.

Must: The Content Importer add-on must work with 5.7 sites.

Must: The 5.7 Content Importer add-on must import content generated from both the 5.6 Content Exporter add-on and the 5.7 Content Exporter add-on.

Must: The Content Importer add-on must provide a way to import content over an existing site or into an Imports area of the Sitemap.

Must: Both Content Exporter add-ons should provide filter controls for choosing what types of content to output.

Should: A WordPress Content Import Adapter should be available that can translate WordPress’s content export format (XML?) into PortableContentItem objects.

Should: The Content Exporter add-on and the Content Importer add-on should be usable together for building out an entire sub-section of a single site. For example, create a product catalog on a development server, export it with the Content Exporter add-on, import it on the live site in one go, and voila: you’ve just created content.

Could: An ImportBatch may allow administrators to upload more than one content file within it, prior to running the full import.

Could: Provide a mechanism to map certain block types to other block types during import. For example, take a custom content block and map to the core Content block.

Could: Content import could be scriptable via the command line or a deployment tool like Rocketeer.

Could: The Content Importer add-on could provide a way to rollback a recently applied import.

Won’t: Import any Permissions.

Won’t: Import any Users.

Won’t: There will not be a 5.6 version of the Content Importer add-on.
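
The adapter and formatter requirements above can be illustrated with a small Python sketch. It only shows the shape of the idea: PortableContentItem is the name used in this proposal, while CifXmlAdapter, CsvAdapter, CifXmlFormatter, the ADAPTERS registry and the “name,path” CSV format are all invented here for illustration and are not part of concrete5.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class PortableContentItem:
    kind: str   # e.g. "page"
    name: str
    path: str


class CifXmlAdapter:
    """Builds items from CIF XML (only <page> elements, for brevity)."""

    def parse(self, text):
        root = ET.fromstring(text.strip())
        return [
            PortableContentItem("page", page.get("name"), page.get("path"))
            for page in root.iter("page")
        ]


class CsvAdapter:
    """Builds items from a made-up 'name,path' CSV format (illustration only)."""

    def parse(self, text):
        reader = csv.DictReader(io.StringIO(text.strip()))
        return [PortableContentItem("page", row["name"], row["path"]) for row in reader]


class CifXmlFormatter:
    """Translates items back into CIF XML."""

    def format(self, items):
        root = ET.Element("concrete5-cif")
        for item in items:
            if item.kind == "page":
                ET.SubElement(root, "page", name=item.name, path=item.path)
        return ET.tostring(root, encoding="unicode")


# One adapter per input format; CIF XML is just one of them.
ADAPTERS = {"cif": CifXmlAdapter(), "csv": CsvAdapter()}

csv_input = "name,path\nPortfolio,/portfolio\nFirst Post,/blog/first-post\n"
items = ADAPTERS["csv"].parse(csv_input)
cif_output = CifXmlFormatter().format(items)
round_tripped = ADAPTERS["cif"].parse(cif_output)
print(cif_output)
```

Because both directions go through the same item objects, a round trip (format, then parse again) is an easy check that what is exported can also be imported.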

@aembler aembler self-assigned this Feb 19, 2015

@aembler aembler added this to the Feature Release milestone Feb 19, 2015

Member

hissy commented Feb 19, 2015

This sounds really nice. I'm very interested in this project.

I already made another importer add-on that works with the current ContentImporter class.
https://github.com/hissy/addon_cif_importer

And here is a WordPress plugin that exports content in the concrete5 CIF format.
https://github.com/hissy/wp-c5-exporter

Member

aembler commented Feb 19, 2015

Very cool! Yeah, I saw your add-on in the marketplace. Ideally, everything will continue to work with the same output format from WordPress. We're just going to make getting the content into your concrete5 site a lot less error-prone, help you actually step through the process, give you options if something doesn't map 100% perfectly, etc.

Member

hissy commented Feb 20, 2015

Cool.

Should: A Wordpress Content Import Adapter should be available that can translate Wordpress’s content export format (XML?) into PortableContentItem objects.

I'd like to make this adapter.

Contributor

Remo commented Feb 24, 2015

What's the reason for not exporting/importing users? I thought the passwords were compatible. We have quite a few sites with more than 100 users and one with more than 100k. At some point we must have a way to migrate them.

Setting the permissions again is okay for most of our projects. It takes quite some time, but cleaning them up should improve things too.

Contributor

joe-meyer commented Feb 24, 2015

I'm also curious about the users not being imported. That curiosity also extends to groups. I would have thought these two would have been trivial to import, but maybe I'm missing some technical limitation?

Member

aembler commented Feb 25, 2015

It just seems like a separate problem that can be solved any number of ways. Many times when migrating content you're not going to want to bring over permissions exactly as they are, and I'm not sure you want XML files hanging around with password salts in them.

Member

aembler commented Feb 25, 2015

I think there's a need there, definitely – but I think it could be solved by its own custom add-on.

Contributor

Remo commented Feb 25, 2015

I'm not sure I see why this should be a different problem. I don't care whether it's an additional add-on or not, but there has to be a way to migrate users and groups. The point about password salts is definitely valid, but whether it's a second add-on or not, I'll have those things in a file. Adding a warning should IMHO be good enough; whoever exports or imports content should be able to handle things like that.

rosie607 Mar 20, 2015

I also absolutely need the ability to import users. I have several sites with far too many users to add manually. I can't migrate these sites without this functionality. Is anyone currently working on this add-on?


@aembler aembler modified the milestones: Future, Feature Release Mar 30, 2015


sday commented Apr 5, 2015

I also need a user migration tool. I can live with manual content migration, but with several thousand users, I absolutely cannot ask them to re-signup. Can I just match up the previous password salt and import users with a script? I don't want to go through the trouble of doing this if there is a gotcha.

Thanks
-Steve

RazorCommerce May 2, 2015

We're currently building importers for Razor Commerce, and our goal is to have a systematic way to structure both import and export. A couple of ideas to pass along from what we've built so far:

  1. Migrate means data in/out. Use the term Migrate or Migration and make that the namespace. While import and export are mostly separate in practice, there are opportunities to perform certain operations on both and to create methods that keep data consistent or share tests. Think in terms of "use Migration" and being able to do something like:

     $migrate = new Migration();
     $migrate->setType('import', 'Page');
     $testResult = $migrate->test($xml); // $xml is from a file or a string
     if ($testResult->pass()) {
         $migrate->run();
     }

  2. Migrate Types. We've set up a base class Migrate\Type\Type, and each importer extends that base Type, as in Migrate\Type\ProductImport. We only aim to import one thing per importer; the same goes for export. This is in contrast to looping over nested content the way the C5 installer does today. That's not to say you can't build a more complex importer, or use the single-object importers as a base and run them within a sequential import. The point is that, for testing purposes, the more you break things down into small imports, the easier it is to handle testing and avoid fatal errors.

  3. Migrate Test Pattern. The testing pattern that has emerged in Razor is that Migrate\Test\Test is the main testing object. When you make a new Migrate object it loads a MigrateTest object. There is only one test object per migration, and within it there are multiple MigrateTestConditions. These conditions are defined in the MigrateType: when you build a MigrateType you define a conditions() method where you set up the MigrateTestConditions and return them as an array. When a migration is tested, each condition is run by the MigrateTest, which returns a MigrateTestResult object. This gives us a consistent result that we then display to the user, saying that the test failed, passed, or passed with warnings. We can also display the count of objects to import or flag importables that have issues. The way we handle pass/fail is that each condition only returns true or false. Each condition is set to be required or optional. If a condition fails, we check whether it is required using $mtc->required(), and if it is, we fail the entire import. If an optional condition fails, the import can continue but we output a warning to the user.

  4. Execution/Result. Once tests are run we do the actual import or export. The method that does the processing is defined in the MigrateType. Once again, this gets us away from having to make anything other than the MigrateType for each migration. As with MigrateTest, the execute method returns a standardized MigrateResult object, which is then output to the user.

  5. Format Consistency. A major challenge in most systems that offer imports/exports is that their exports don't import: you can run an export one minute and have it fail as an import the next. That's true right now for C5.6: export objects, then try importing them, and more often than not they fail with a fatal error. Why? Often because the formats differ; for example, the exported date format is different from the one the import requires. Even the structure of the XML is sometimes slightly different, or dependencies are not met.
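
The required/optional handling described under "Migrate Test Pattern" might look roughly like the following. This is a Python sketch for readers of the thread, not Razor Commerce's PHP code: the class names echo the ones mentioned above, but the fields, the run_test function and the sample conditions are assumptions, and the result method is called passed() because pass is a reserved word in Python.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MigrateTestCondition:
    label: str
    check: Callable[[dict], bool]   # returns only True or False
    required: bool = True           # optional conditions produce warnings, not failures


@dataclass
class MigrateTestResult:
    errors: list = field(default_factory=list)     # failed required conditions
    warnings: list = field(default_factory=list)   # failed optional conditions

    def passed(self):
        # The migration may proceed as long as no required condition failed.
        return not self.errors


def run_test(record, conditions):
    """Run every condition against one record and collect the outcome."""
    result = MigrateTestResult()
    for condition in conditions:
        if condition.check(record):
            continue
        target = result.errors if condition.required else result.warnings
        target.append(condition.label)
    return result


conditions = [
    MigrateTestCondition("has a name", lambda r: bool(r.get("name"))),
    MigrateTestCondition("has a description",
                         lambda r: bool(r.get("description")), required=False),
]

result = run_test({"name": "Widget"}, conditions)
print(result.passed(), result.warnings)
```

Collecting errors and warnings in one result object is what allows the three outcomes mentioned above (failed, passed, passed with warnings) to be reported consistently.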


It would be great to see a system where you can export your data, then turn right around and import it, or do the reverse. Full in/out capacity. That is what we are striving for with Razor Commerce. Of course, achieving it starts with a focus on testing. It might also require some options, such as how to handle updating existing records on import.

Regarding User imports, we plan to build a Customer Importer shortly; it's on our list of importers, and it will use the Migrate system described here within Razor Commerce. Customers are essentially just C5 users, so I think it will be possible to install Razor, use the Customer Importer and then uninstall. Somebody inclined to pull our Migrate section out and create a stand-alone User importer out of it would be welcome to do that as well.


tduncandesign Jun 25, 2015

How would you change this XML so that the following content populates an existing composer form content block named Description? Simply setting the name attribute doesn't do it.

<block type="content" name="">
  <data table="btContentLocal">
    <record>
      <content><![CDATA[Hello World!]]></content>
    </record>
  </data>
</block>

2keepConcrete5 Jul 4, 2015

Happy Independence Day for the U.S. citizens here -

While I applaud the efforts toward these high-end goals, I would recommend taking the project goals down a notch and focusing on developing a map that enables the translation of the C5.6 exported file to the C5.7.3 import file. With a simple yet complete Excel map for this scenario, we can then create a script to translate the file into the format expected as input by hissy's addon_cif_importer. Does anyone have the variable specifications for each of these? I suppose there must be a way for me to figure this out, but if someone's already got that information, it would greatly speed things up for me.

Currently I am of the mindset that I NEED to upgrade my concrete5 to 7.4 as soon as possible. I HAVE developed a process to do so, but I need to create this map first. I am definitely open to suggestions.

Thanks to all for the consideration,

Rob Smith
TeachingSupportDesk.com


idlenexusgaming Oct 11, 2015

What is the status on an exporter replacement for Sample Content Generator? My customers are getting antsy and I can't justify the cost/time for pruning xml files.


Collaborator

mlocati commented Oct 11, 2015

I think that @aembler is working on it: https://github.com/concrete5/migration-tool
