Python tool to migrate WP contents to JSON for use in harpjs


Author: Ethan J. Eldridge

Python tool to migrate WP contents to JSON for use in harp

Current Features:

  1. Creates JSON for Posts
  2. Creates JSON for Comments
  3. Creates JSON for Nav
  4. Creates JSON for Pages
  5. Creates .md files for the Posts
  6. Creates .md files for the pages
  7. If PULL_TYPES is enabled, pulls down the entire wp_posts table and converts it into usable _data files.
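
As a rough illustration of features 1 and 5, the sketch below writes one .md file per post plus a _data.json keyed by slug, which is the layout Harp reads metadata from. This is not the script's actual code: `write_harp_dir` is a hypothetical helper, the rows are plain dicts using the standard wp_posts column names, and it assumes Python 3.

```python
import json
import os


def write_harp_dir(rows, out_dir):
    """Write one .md file per row and a _data.json keyed by post slug."""
    os.makedirs(out_dir, exist_ok=True)
    data = {}
    for row in rows:
        slug = row["post_name"]
        # Metadata goes into _data.json; the body goes into <slug>.md.
        data[slug] = {"title": row["post_title"], "date": row["post_date"]}
        md_path = os.path.join(out_dir, slug + ".md")
        with open(md_path, "w", encoding="utf-8") as fh:
            fh.write(row["post_content"])
    json_path = os.path.join(out_dir, "_data.json")
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    return data
```

Calling `write_harp_dir(rows, "blog")` with rows fetched from the posts table would leave a `blog` folder holding the markdown files and their _data.json side by side.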

To Run:

Fill out the database credentials at the top of the file, then run the script.
You'll now have an example.jade file, and a few directories with _data.json inside.

Once you've run the script, you'll get a folder for each of the _data.json files. This makes things a bit easier to coordinate, and you can see from the example jade file how to access the information pulled from your blog.

It's pretty heavy on I/O from all the writes, but I pulled down a sizable WordPress database (347915 rows in the postmeta table and 34617 in the posts table) in less than a minute, so it works alright.

If you'd like to try it out:

  1. Install WordPress
  2. Install Harp
  3. Download this script
  4. Configure it to your liking using the options below
  5. Run the script!
  6. Move the folders and files into your harp site area.

Some configuration details:

Configuration is at the top of the script; you'll need to enter your database credentials. Optionally, you can fully configure the script using the constants below:

Configuration of Script

| Constant | What it does |
| --- | --- |
| MYSQL_HOST | Defines the host of the database to connect to. |
| MYSQL_USER | Defines the user to connect to the database as. |
| MYSQL_PASS | Defines the password to the database. |
| MYSQL_DB | Defines the name of the database to connect to on the host. |
| WP_PREFIX | The prefix of your WordPress tables; typically this is `wp_`. |
| ONLY_PUBLISHED | Only retrieve posts and pages that have been published. |
| GENERATE_PAGES | Generate a markdown file for each page pulled from the WP database. These will exist in the PAGES_DIR. |
| GENERATE_POSTS | Generate a markdown file for each post pulled from the WP database. These will exist in the BLOG_DIR. |
| ROOT_DIR | Where to generate all the files this script creates. Leave empty (the default) to use the directory the script is run from. |
| ENCODING | The encoding used to decode content from the database. I've defaulted it to latin to handle some annoying unicode errors. |
| OUTPUT_ENCODING | The encoding used to encode the _data.json files. |
| STRIP_NON_ASCII | Strips non-ASCII characters out of the data being written into _data.json. |
| PULL_TYPES | Set this to true and all post types will be pulled out of the database, with _data files created for each. If you use this, the `*_DIR` constants are ignored. |
| PAGES_DIR | The directory name where the pages will be stored. |
| BLOG_DIR | The directory name where the blog posts will be stored. |
| NAV_DIR | The directory where the navigation JSON will be stored. |
| COMMENTS_DIR | The directory where comments will be stored. |
| EXAMPLE_FILE | The name of the file that will be generated to show some of the posts and pages. |
| STOP_ON_ERR | Boolean value that causes errors to stop the script. |
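
To make the three encoding constants a little more concrete, here is a minimal sketch of how they could interact. The values shown (`latin-1`, `utf-8`, `True`) and the helper names `clean_text` and `encode_output` are assumptions for illustration, not taken from the script, and the code assumes Python 3.

```python
ENCODING = "latin-1"        # assumed value: how bytes from the database are decoded
OUTPUT_ENCODING = "utf-8"   # assumed value: how _data.json content is encoded
STRIP_NON_ASCII = True      # assumed value: drop characters outside ASCII


def clean_text(raw, encoding=ENCODING, strip_non_ascii=STRIP_NON_ASCII):
    """Decode raw database bytes and optionally drop non-ASCII characters."""
    text = raw.decode(encoding)
    if strip_non_ascii:
        text = "".join(ch for ch in text if ord(ch) < 128)
    return text


def encode_output(text, encoding=OUTPUT_ENCODING):
    """Encode cleaned text for writing to disk."""
    return text.encode(encoding)
```

Decoding as latin-1 never fails because every byte maps to a character, which is one plausible reason for choosing it as a default when a database holds mixed encodings.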


To Do:

  • Taxonomy?
  • Add nav stuff to the PULL_TYPES area as well to help out with navigation
  • How does one pull in the comments to a post?
  • Use getopt to make cmd line arguments instead of constants
  • More examples!
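
For the getopt idea in the list above, one possible shape is sketched here. Nothing in it exists in the script yet: the option names, the `DEFAULTS` values, and `parse_args` are all hypothetical, and only a handful of the constants are covered.

```python
import getopt

# Hypothetical defaults standing in for the constants at the top of the script.
DEFAULTS = {
    "MYSQL_HOST": "localhost",
    "MYSQL_USER": "",
    "MYSQL_PASS": "",
    "MYSQL_DB": "",
    "ONLY_PUBLISHED": False,
}

# Maps each value-taking long option to the constant it overrides.
OPTION_NAMES = {
    "--host": "MYSQL_HOST",
    "--user": "MYSQL_USER",
    "--password": "MYSQL_PASS",
    "--database": "MYSQL_DB",
}


def parse_args(argv):
    """Return a config dict with command line values layered over DEFAULTS."""
    config = dict(DEFAULTS)
    long_opts = ["host=", "user=", "password=", "database=", "only-published"]
    opts, _unused = getopt.getopt(argv, "", long_opts)
    for flag, value in opts:
        if flag == "--only-published":
            config["ONLY_PUBLISHED"] = True
        else:
            config[OPTION_NAMES[flag]] = value
    return config
```

In the script this would be called as `parse_args(sys.argv[1:])`, with the rest of the code reading from the returned dict instead of module-level constants.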