Stdlib: strip_tags() — allowed-tags subset for HTML sanitization #299

@PurHur

Problem

MiniWebApp templates (#67, #246) and comment fields need a simple way to strip HTML while keeping an allow-list of tags. Today only htmlspecialchars() is reliable in AOT (#124), yet apps often call strip_tags($html, '<p><br>') on Zend.

Goal

Implement strip_tags(string $str, ?string $allowed_tags = null) in the VM with a documented subset (no full HTML5 parser). JIT/AOT parity is optional for v1 as long as it is tracked as a follow-up.
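
To make "documented subset" concrete, here is a minimal userland sketch of one possible scanner. The function name strip_tags_subset and its rules are assumptions for illustration, not the VM builtin: a tag starts at "<" followed by a letter or "/", ends at the next ">", and is kept only if its name appears in the allow-list. Comments, PHP tags, and ">" inside quoted attributes are deliberately not handled, so results can differ from Zend on such input.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a strip_tags() subset (not the VM builtin).
 * Assumption: a tag is "<" + letter or "/", up to the next ">".
 */
function strip_tags_subset(string $str, ?string $allowed_tags = null): string
{
    // Parse an allow-list such as '<p><a><br>' into a set of lowercase names.
    $allowed = [];
    if ($allowed_tags !== null
        && preg_match_all('/<([a-zA-Z][a-zA-Z0-9]*)>/', $allowed_tags, $m)) {
        foreach ($m[1] as $name) {
            $allowed[strtolower($name)] = true;
        }
    }

    $out = '';
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $ch = $str[$i];
        $next = $i + 1 < $len ? $str[$i + 1] : '';
        // Anything that does not look like a tag start is copied through.
        if ($ch !== '<' || !($next === '/' || ctype_alpha($next))) {
            $out .= $ch;
            $i++;
            continue;
        }
        $end = strpos($str, '>', $i);
        if ($end === false) {
            // Unterminated tag: drop the remainder instead of failing.
            break;
        }
        $tag = substr($str, $i, $end - $i + 1);
        // Keep the tag only when its name (after an optional "/") is allowed.
        if (preg_match('/^<\/?([a-zA-Z][a-zA-Z0-9]*)/', $tag, $tm)
            && isset($allowed[strtolower($tm[1])])) {
            $out .= $tag;
        }
        $i = $end + 1;
    }
    return $out;
}
```

With these rules, strip_tags_subset("<b>x</b><i>y</i>", "<b>") yields <b>x</b>y, a lone "<" as in "a < b" is left untouched, and an unterminated tag such as "x<b unclosed" is cut at the "<" rather than raising an error.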

Scope

  • VM builtin in ext/standard/ (or existing string module)
  • Default: strip all tags; optional allow-list like '<p><a><br>'
  • PHPT: basic tags removed, allowed tags preserved, malformed input does not crash
  • Update docs/capabilities.md via script/capability-matrix.php
  • Register in UnsupportedRegistry until implemented (if lint hits call sites)
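
The PHPT cases listed above could start from a small table of inputs and expected outputs, checked first against Zend's strip_tags() as the reference. The helper run_cases and the case list are illustrative assumptions, not existing project code; for malformed input the sketch only requires a string result, since the exact output is left to the documented subset.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs the cases against any strip_tags-compatible
 * callable and returns the inputs whose output did not match.
 */
function run_cases(callable $fn): array
{
    $cases = [
        // [input, allowed_tags, expected]
        ['<p>Hello <b>world</b></p>', null, 'Hello world'],
        ['<p>Hi<br>there</p><i>x</i>', '<p><br>', '<p>Hi<br>there</p>x'],
        ['<b>x</b><i>y</i>', '<b>', '<b>x</b>y'],
    ];
    $failures = [];
    foreach ($cases as [$input, $allowed, $expected]) {
        if ($fn($input, $allowed) !== $expected) {
            $failures[] = $input;
        }
    }
    // Malformed input: only require that a string comes back (no crash).
    if (!is_string($fn('text <b unclosed', null))) {
        $failures[] = 'text <b unclosed';
    }
    return $failures;
}

// Reference run on Zend PHP; an empty array means every case matched.
var_dump(run_cases('strip_tags'));
```

The same table can later be pointed at the VM implementation, so the reference behaviour and the new builtin are checked by identical cases.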

Acceptance criteria

docker run --rm -v "$(pwd):/compiler" -w /compiler php-compiler:22.04-dev \
  php bin/vm.php -r 'echo strip_tags("<b>x</b><i>y</i>", "<b>");'

Prints <b>x</b>y (or documented equivalent): the allowed <b> tags are preserved and the <i> tags are stripped.

Verification (local / Docker only)

./script/ci-local.sh --filter strip_tags

No GitHub Actions required.

Dependencies

Links

Metadata

    Labels

    enhancement (New feature or request); phase-4:stdlib (Phase 4 – stdlib for web apps)