WordPress HTML Cleanup

A Next.js application that cleans WordPress-exported content for Webflow compatibility, using Airtable as the data source. It automates converting WordPress REST API JSON wrappers, HTML entities, and WordPress-specific markup into Webflow-compatible content.

Features

  • WordPress Content Cleanup: Removes JSON wrappers, decodes HTML entities, and cleans WordPress-specific markup
  • Airtable Integration: Connects to Airtable to fetch and update content records
  • Progress Tracking: Real-time progress monitoring with detailed statistics
  • Rate Limiting: Respects Airtable API rate limits with intelligent batching
  • Quality Assurance: Provides QA notes and validation for cleaned content
  • Webflow Cloud Ready: Optimized for deployment on Webflow Cloud

What Gets Cleaned

  1. Character Encoding: Converts curly quotes to straight quotes, normalizes escape sequences
  2. WordPress JSON Wrappers: Removes {"rendered":"content"} formatting
  3. HTML Entities: Decodes &amp;amp;, &amp;lt;, &amp;gt;, etc. to readable characters (&, <, >)
  4. HTML Tags: Removes disallowed tags (div, section, etc.) and wraps embed tags
  5. Image URLs: Fixes relative image paths to absolute URLs
  6. Code Blocks: Cleans syntax highlighting bloat from WordPress code blocks
  7. Figure Tags: Updates to Webflow-compatible classes
  8. HTML Spacing: Normalizes whitespace and removes unnecessary spacing
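
As an illustration of steps 1, 2, 3, and 5 above, here is a minimal sketch. All function names (`unwrapRendered`, `decodeEntities`, `straightenQuotes`, `absolutizeImageUrls`, `cleanContent`) are hypothetical and not taken from this repository, and the order of the steps is an assumption. Entities are decoded before quotes are straightened so that numeric entities such as `&#8220;` are also normalized.

```typescript
// Hypothetical helpers sketching a few cleanup steps; not the project's actual code.

/** Step 2: unwrap {"rendered":"..."} when the input is such a JSON object. */
function unwrapRendered(raw: string): string {
  const trimmed = raw.trim();
  if (trimmed.charAt(0) !== "{") return raw;
  try {
    const parsed: unknown = JSON.parse(trimmed);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { rendered?: unknown }).rendered === "string"
    ) {
      return (parsed as { rendered: string }).rendered;
    }
  } catch {
    // Not valid JSON: treat the input as plain HTML.
  }
  return raw;
}

// Small illustrative entity table; a real implementation would cover many more.
const NAMED_ENTITIES: Record<string, string> = {
  amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00A0",
};

/** Step 3: decode named and numeric entities in a single pass. */
function decodeEntities(text: string): string {
  return text.replace(/&(#x[0-9a-fA-F]+|#\d+|[a-zA-Z]+);/g, (match: string, body: string) => {
    if (body.charAt(0) === "#") {
      const isHex = body.charAt(1) === "x";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Simplification: only Basic Multilingual Plane code points are decoded.
      return code > 0 && code <= 0xffff ? String.fromCharCode(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match; // Unknown entities are left untouched.
  });
}

/** Step 1: convert curly quotes to straight quotes. */
function straightenQuotes(text: string): string {
  return text.replace(/[\u2018\u2019]/g, "'").replace(/[\u201C\u201D]/g, '"');
}

/** Step 5: prefix root-relative <img> paths (a single leading "/") with a domain. */
function absolutizeImageUrls(html: string, imageDomain: string): string {
  const base = imageDomain.replace(/\/+$/, ""); // drop trailing slashes
  return html.replace(
    /(<img\b[^>]*?\bsrc=")(\/(?!\/)[^"]*)"/g,
    (_match: string, prefix: string, path: string) => `${prefix}${base}${path}"`
  );
}

/** Runs the sketched steps in one assumed order. */
function cleanContent(raw: string, imageDomain: string): string {
  const html = unwrapRendered(raw);
  return absolutizeImageUrls(straightenQuotes(decodeEntities(html)), imageDomain);
}
```

Decoding in a single regex pass means double-encoded input such as `&amp;lt;` becomes `&lt;` rather than `<`, so an already-escaped entity is not decoded twice.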

Prerequisites

  • Node.js 18+
  • Airtable API key
  • Airtable base with content to clean

Installation

  1. Clone the repository:
     git clone https://github.com/8020admin/wordpress-html-cleanup.git
     cd wordpress-html-cleanup
  2. Install dependencies:
     npm install
  3. Run the development server:
     npm run dev
  4. Open http://localhost:3000 in your browser.

Usage

1. API Key Setup

2. Configuration

  • Select your Airtable base and table
  • Choose the source field containing WordPress content
  • Select the output field for cleaned content
  • Optionally choose a notes field for QA information

3. Processing

  • The app will fetch all records from the selected table
  • Clean each record's content according to the specification
  • Update the records in batches with rate limiting
  • Show real-time progress and statistics

4. Results

  • View a comprehensive summary of the cleanup process
  • See which records had issues and what was cleaned
  • Start a new cleanup or review the results

Deployment

Webflow Cloud

This app is optimized for Webflow Cloud deployment:

  1. Build the app:
     npm run build
  2. Deploy to Webflow Cloud:

    • Connect your GitHub repository to Webflow Cloud
    • Set environment variables in Webflow Cloud dashboard
    • Deploy using the Webflow CLI or dashboard
  3. Environment Variables:

    • Set any required environment variables in Webflow Cloud
    • API keys should be stored securely in Webflow Cloud variables

Other Platforms

The app can also be deployed to:

  • Vercel
  • Netlify
  • AWS
  • Any Node.js hosting platform

API Endpoints

/api/airtable/bases

  • GET: Fetch available Airtable bases
  • Headers: x-airtable-api-key

/api/airtable/tables

  • GET: Fetch tables for a specific base
  • Headers: x-airtable-api-key
  • Query: baseId

/api/airtable/fields

  • GET: Fetch fields for a specific table
  • Headers: x-airtable-api-key
  • Query: baseId, tableId

/api/cleanup

  • POST: Start the cleanup process
  • Headers: x-airtable-api-key, Content-Type: application/json
  • Body: Setup configuration with base, table, and field selections
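
As a rough illustration of calling one of the GET endpoints above, the sketch below builds the URL and headers for /api/airtable/fields. The helper name `buildFieldsRequest` and the `RouteRequest` type are hypothetical, and the local origin is assumed from the development setup. The response shape is not documented here, so the sketch stops at building the request.

```typescript
// Hypothetical helper: builds the request for GET /api/airtable/fields.
interface RouteRequest {
  url: string;
  headers: Record<string, string>;
}

function buildFieldsRequest(
  origin: string,   // e.g. "http://localhost:3000" during development
  apiKey: string,   // sent in the x-airtable-api-key header
  baseId: string,
  tableId: string
): RouteRequest {
  const query =
    "baseId=" + encodeURIComponent(baseId) +
    "&tableId=" + encodeURIComponent(tableId);
  return {
    url: origin + "/api/airtable/fields?" + query,
    headers: { "x-airtable-api-key": apiKey },
  };
}

// Possible usage (Node.js 18+ or a browser):
//   const req = buildFieldsRequest("http://localhost:3000", myKey, "appXXXX", "tblXXXX");
//   const res = await fetch(req.url, { headers: req.headers });
```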

Configuration

The cleanup process can be customized through the configuration object:

interface CleanupConfig {
  imageDomain?: string;           // Domain for fixing image URLs
  allowedTags?: Set<string>;      // HTML tags to preserve
  disallowedTags?: string[];      // HTML tags to remove
  embedTags?: string[];           // Tags to wrap in embed containers
  preserveSchema?: boolean;       // Whether to preserve schema fields
  strictMode?: boolean;           // Enable strict validation
  maxContentLength?: number;      // Maximum content length to process
  enableQANotes?: boolean;        // Generate QA notes
}
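
For orientation, a configuration value might look like the sketch below. Every value is a hypothetical placeholder rather than a documented default, and the interface is repeated only so the snippet compiles on its own.

```typescript
// Repeated from the interface above so this snippet is self-contained.
interface CleanupConfig {
  imageDomain?: string;
  allowedTags?: Set<string>;
  disallowedTags?: string[];
  embedTags?: string[];
  preserveSchema?: boolean;
  strictMode?: boolean;
  maxContentLength?: number;
  enableQANotes?: boolean;
}

// Hypothetical example values; the project's real defaults may differ.
const exampleConfig: CleanupConfig = {
  imageDomain: "https://example.com",                    // placeholder domain
  allowedTags: new Set(["p", "h2", "h3", "ul", "ol", "li", "a", "img", "figure"]),
  disallowedTags: ["div", "section"],                    // tags named under "What Gets Cleaned"
  embedTags: ["iframe"],                                 // assumed embed tag
  strictMode: false,
  maxContentLength: 100000,                              // assumed limit
  enableQANotes: true,
};
```

All fields are optional, so a caller would presumably supply only the options it wants to override.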

Rate Limiting

The app throttles its requests to stay within Airtable's API limits:

  • Processes records in batches of 10
  • 200ms delay between batches
  • Automatic retry logic for failed requests
  • Progress tracking with error handling
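
The batching described above can be sketched as follows. The batch size and delay come from the list; the names `chunk`, `sleep`, `processInBatches`, and the `updateBatch` callback are hypothetical. These numbers are consistent with Airtable's limits of 10 records per write request and 5 requests per second per base.

```typescript
const BATCH_SIZE = 10;       // records per update request
const BATCH_DELAY_MS = 200;  // pause between batches

/** Splits an array into consecutive slices of at most `size` items. */
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

/**
 * Sends batches one at a time, waiting between them.
 * `updateBatch` is a caller-supplied function (hypothetical) that writes
 * one batch of records to Airtable.
 */
async function processInBatches<T>(
  items: T[],
  updateBatch: (batch: T[]) => Promise<void>,
  onProgress?: (done: number, total: number) => void
): Promise<void> {
  const batches = chunk(items, BATCH_SIZE);
  let done = 0;
  for (let i = 0; i < batches.length; i++) {
    await updateBatch(batches[i]);
    done += batches[i].length;
    if (onProgress) onProgress(done, items.length);
    // No need to wait after the final batch.
    if (i < batches.length - 1) await sleep(BATCH_DELAY_MS);
  }
}
```

Processing batches sequentially keeps the request rate predictable, at the cost of some throughput compared with parallel requests.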

Error Handling

  • Graceful degradation when cleanup fails
  • Detailed error messages and logging
  • Retry mechanisms for transient failures
  • Validation of input data and API responses
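
One common way to retry transient failures is exponential backoff, sketched below. The repository's actual retry policy is not described here, so the attempt count, the delays, and the names `backoffDelayMs`, `isTransientStatus`, and `withRetry` are all assumptions.

```typescript
/** Delay before the next attempt: base * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

/** Treats HTTP 429 (rate limited) and 5xx responses as worth retrying. */
function isTransientStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

/**
 * Runs `operation`, retrying while `shouldRetry` approves the error and
 * attempts remain. Rethrows the last error once retries are exhausted.
 */
async function withRetry<T>(
  operation: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (!shouldRetry(error) || attempt === maxAttempts - 1) break;
      const delay = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Errors that are not transient, such as an invalid API key, are rethrown immediately instead of being retried.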

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Support

For issues and questions:

Acknowledgments

  • Based on the comprehensive WordPress to Webflow cleanup specification
  • Built with Next.js, TypeScript, and Tailwind CSS
  • Uses Airtable API for data management
  • Optimized for Webflow Cloud deployment
