
POC PHPBB3->PHPBB3 Migration Tool w/ Webscraper ->MySql


bevansm/migration-station


PHPBB3 to PHPBB3 Migration

A PHPBB3 to PHPBB3 migration tool heavily inspired by the Python crawlers at https://www.phpbb.com/community/viewtopic.php?f=65&t=1761395. It only supports a subset of features, but it does provide ways around issues like gcaptcha logins & custom nested forums.

This will recursively migrate everything from the given base URL to the given destination forum/subforum (to ease the migration of dependent communities), and will only map over users who have posted in the given forum.
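As a rough illustration of the "only users who have posted" rule, here is a minimal sketch of a recursive walk over a nested forum tree. The `ForumNode` shape and the `collectPosters` function are hypothetical names made up for this example; they are not taken from the Migrator code.

```typescript
// Hypothetical in-memory shape for a scraped forum; not the tool's real types.
interface ForumNode {
  name: string;
  topics: { title: string; posts: { author: string }[] }[];
  subforums: ForumNode[];
}

// Walk the forum and all nested subforums, collecting every author who has
// at least one post. Users with no posts never enter the set.
function collectPosters(forum: ForumNode, seen: Set<string> = new Set<string>()): Set<string> {
  for (const topic of forum.topics) {
    for (const post of topic.posts) {
      seen.add(post.author);
    }
  }
  for (const sub of forum.subforums) {
    collectPosters(sub, seen);
  }
  return seen;
}

const exampleForum: ForumNode = {
  name: "Root",
  topics: [{ title: "Welcome", posts: [{ author: "alice" }, { author: "bob" }] }],
  subforums: [
    {
      name: "Archive",
      topics: [{ title: "Old game", posts: [{ author: "alice" }, { author: "carol" }] }],
      subforums: [],
    },
  ],
};

const posters = collectPosters(exampleForum);
```

Passing the same `Set` down the recursion keeps each user unique even when they post in several subforums.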

Output results can be found in the out folder. They will be in the form of:

  - paginated "Insert" queries for posts
  - a user creation query, with ghost accounts and non-functional passwords
  - a forum structure query to create the new forum & topics
  - a dump json object, with no posts
  - a dump posts object
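To give a feel for what "paginated" insert queries means, here is a minimal sketch that splits posts into fixed-size pages and emits one multi-row INSERT per page. The `PostRow` interface, the helper names, and the reduced column list are assumptions for illustration; the files the tool actually writes may be laid out differently.

```typescript
// Hypothetical, trimmed-down post row; the real phpbb_posts table has many more columns.
interface PostRow {
  postId: number;
  topicId: number;
  forumId: number;
  posterId: number;
  postTime: number; // unix timestamp
  subject: string;
  text: string;
}

// Naive string literal escaping for MySQL: double backslashes and single quotes.
function sqlString(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "''") + "'";
}

// Emit one INSERT statement per page of at most pageSize posts.
function paginatedInserts(posts: PostRow[], pageSize: number): string[] {
  if (pageSize < 1) {
    throw new Error("pageSize must be at least 1");
  }
  const queries: string[] = [];
  for (let i = 0; i < posts.length; i += pageSize) {
    const rows = posts.slice(i, i + pageSize).map(
      (p) =>
        `(${p.postId}, ${p.topicId}, ${p.forumId}, ${p.posterId}, ${p.postTime}, ` +
        `${sqlString(p.subject)}, ${sqlString(p.text)})`
    );
    queries.push(
      "INSERT INTO phpbb_posts " +
        "(post_id, topic_id, forum_id, poster_id, post_time, post_subject, post_text) VALUES\n" +
        rows.join(",\n") +
        ";"
    );
  }
  return queries;
}
```

Smaller pages keep each statement under the import tool's size limits, at the cost of more statements to run.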

Note that this can produce a large amount of output data, and will generate a high volume of calls to an outbound forum.

Please treat this as a POC. It performed the migration in a way that we needed, and it provides a decent basis for a TypeScript wrapper of the phpbb api (look in clients).

Notes

Options

The Migrator class has a handful of options, many of which exist for quick and easy debugging. Note that forcing early execution (users, posts, topics, etc.) may result in malformed or incomplete sets of SQL. Use with caution.

BBCode Parsing

BBCode parsing is hit and miss. You can play around with the values in Parser.ts to generate better bitfields for "legacy" (phpbb 3.0.x and 3.1.x) codes (the ids should map to the values inside of the phpbb_bbcodes table), but this doesn't support the "new" parsing algorithms from phpbb 3.3.x. We're also using an out-of-the-box html2bbcode parser to handle pages from locked threads; while this handles standard bbcode fine, it breaks on most other things, and currently doesn't map [list] to a [list] element (it maps to [ul] and [ol] instead).
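For reference, here is a minimal sketch of how a legacy bitfield can be rebuilt from a list of bbcode ids. It assumes the phpbb 3.0.x/3.1.x layout as we understand it: bbcode id n sets bit 7 - (n mod 8) of byte floor(n / 8), and the bytes are then base64 encoded. `bbcodeBitfield` and `toBase64` are hypothetical helper names, not functions from Parser.ts, so check the results against a bitfield stored in your own phpbb_posts table before relying on them.

```typescript
const B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Small dependency-free base64 encoder so the sketch runs in any JS runtime.
function toBase64(bytes: number[]): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    out += B64_ALPHABET.charAt(b0 >> 2);
    out += B64_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += i + 1 < bytes.length ? B64_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// Build a legacy-style bitfield: one bit per bbcode id used in the post,
// most significant bit first within each byte (assumed layout, see above).
function bbcodeBitfield(bbcodeIds: number[]): string {
  const bytes: number[] = [];
  for (const id of bbcodeIds) {
    const byteIndex = id >> 3;
    while (bytes.length <= byteIndex) {
      bytes.push(0);
    }
    bytes[byteIndex] = (bytes[byteIndex] ?? 0) | (1 << (7 - (id & 7)));
  }
  return toBase64(bytes);
}
```

Under that assumption, a post that only uses bbcode id 1 produces `QA==`, and a post with no bbcodes produces an empty string.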

We do, however, manage to pull out all the codes & append identifiers to tags, so re-parsing posts from the output json object shouldn't be too horrible. The best way to do an export is always, always to ask the DBA/admin of your home site if they'd run a simple SQL export for you:

    Hi, [x].

    We are looking to move our community to a different forum. Would you be willing to run a query for us so we can export the posts on the forum?

    SELECT * FROM phpbb_posts WHERE forum_id IN ([AN ARRAY OF YOUR FORUM IDS HERE])

    SELECT * FROM phpbb_bbcodes

    We appreciate your time,

    [z]

It will always be quicker, and less resource intensive, for them to run that query and dump the results in a bucket someplace than for you to run the script. The script hits their DB indirectly thousands of times, where a direct query hits it once. From there, you can rehash the bitfields & update your bbcode as needed.

If not, then a webscraper is inevitable. Hopefully this one helps, or at least provides a starting point for your use case.

migrate_user.sql

Strictly intended to map over ghost users generated by this sql. It doesn't do anything fancy. You might have to look around the phpbb/area51 boards if you have a different use case for migration.

Use

This code is somewhat specialized for our use case, which was exporting a single subforum (with no dependents) from a phpbb forum. You can find that forum here (exported posts in the "Game Archive"). Hence, there are no guarantees that this code is bug-free. Treat it as a POC.

  1. Fill out the env file as appropriate
  2. Run this script to generate users, forum data, etc.
  3. Jam the resulting SQL commands through phpMyAdmin
  4. Navigate to the Admin Control Panel (ACP) and sync forum stats.
