Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XmlSerializer configurable amount of spaces when $prettyPrint=true #303

Closed
llaville opened this issue May 31, 2023 · 6 comments
Closed

XmlSerializer configurable amount of spaces when $prettyPrint=true #303

llaville opened this issue May 31, 2023 · 6 comments
Labels

Comments

@llaville
Copy link

Hello,

Because I used 4 spaces indentation (for my XML documents) rather than default 2 spaces provided by the DOMDocument, I wanted a solution to configure the XMLSerializer.

As we cannot alter indendation via DOMDocument

I propose two alternatives :

  1. the most simple and easy way (through a regex like suggested https://stackoverflow.com/questions/3325488/php-increase-indentation-of-domdocument-savexml)

  2. Change DOMDocument by XMLWriter that is able to configure it (for example : https://github.com/phar-io/manifest/blob/2.0.3/src/ManifestSerializer.php#L43-L44)

I've already tested solution 1 and it works as expected :

Patching:
https://github.com/CycloneDX/cyclonedx-php-library/blob/v2.1.2/src/Core/Serialization/XmlSerializer.php#L78

        $xml = $document->saveXML();
        $xml = \preg_replace('/^[ ]+(?=<)/m','$0$0', $xml);

Of course adding an option argument on class constructor to make it configurable at runtime.

What do you think ?

@jkowalleck
Copy link
Member

jkowalleck commented May 31, 2023

did you consider configuring your project so that this one bom.xml file uses 2 spaces, regardless of the other configs?
your editor might even support EditorConfig which makes this possible with ease.

@jkowalleck jkowalleck changed the title [Feature] The XmlSerializer should be able to configure it XmlSerializer configurable amount of spaces when $prettyPrint=true May 31, 2023
@jkowalleck
Copy link
Member

jkowalleck commented May 31, 2023

Relevant API:

/**
* Serialize a {@see realNormalize() normalized} version of a {@see \CycloneDX\Core\Models\Bom}.
*
* @param TNormalizedBom $normalizedBom a version of the Bom that was normalized for serialization
*
* @throws Exception
*
* @SuppressWarnings(PHPMD.BooleanArgumentFlag)
*/
abstract protected function realSerialize(/* TNormalizedBom */ $normalizedBom, ?bool $prettyPrint): string;
// no typehint for `$normalizedBom` parameter, as it is not actually `mixed` but a templated type.

@jkowalleck
Copy link
Member

jkowalleck commented May 31, 2023

i think the regex solution is potentially dangerous. it could add unwanted spaces in here:

<!--- [...] -->
<description>
  <![CDATA[the text in here is free to 
  write whatever they want, without escaping...
including XML special chars like the following
  <-- this line is possible and starts with an `<`

Depending on the definition of `description` in an XSD adding spaces here can change the content.
`xs:normalizedString` VS `xs:string` - see https://www.w3schools.com/XML/schema_dtypes_string.asp

  ]]>
</description>
<!--- [...] -->

I did not use XMLWriter when I implemented XML serialization, because I wanted typed return values for the normalizer. Therefore, I did not want to create a wrapper-Class just to represent the capabilities of an XML element, which already existed in PHP as a class, and can be serialized via libXML already.


I actually considered having a configurable amount of spaces when serializing.
The goal was just not worth the effort and trade-offs.
Furthermore, the amount of spaces does not change the quality of the document, so I skipped this feature.
I thought: if anybody wanted an alternative amount of spaces, then they could use external JSON/XML document reformatters that are feature-complete and work perfect in their domain.

@llaville
Copy link
Author

llaville commented May 31, 2023

I propose to add (as last alternative) a tidy optional process.

Here are an example of my implementation :

<?php

namespace Bartlett\Manifests\Helper;

use CycloneDX\Core\Serialization\XmlSerializer;
use CycloneDX\Core\Serialization\DOM\NormalizerFactory;

class ManifestSerializer extends XmlSerializer
{
    protected readonly array $tidyConfig;

    public function __construct(
        NormalizerFactory $normalizerFactory,
        private string $xmlVersion = '1.0',
        private string $xmlEncoding = 'UTF-8',
        protected bool $withTidyRepair = true
    ) {
        parent::__construct($normalizerFactory, $xmlVersion, $xmlEncoding);

        $this->setTidyConfig(
            array(
                'input-xml' => true,
                'indent-attributes' => false,
                'wrap' => false,
                'indent-cdata' => true,
                'indent' => true,
                'indent-spaces' => 4
            )
        );
    }

    public function setTidyConfig(array $config)
    {
        $this->tidyConfig = $config;
    }

    protected function realSerialize(/* TNormalizedBom */ $normalizedBom, ?bool $prettyPrint): string
    {
        $document = new \DOMDocument($this->xmlVersion, $this->xmlEncoding);
        $document->appendChild(
            $document->importNode(
                $normalizedBom,
                true
            )
        );

        if (null !== $prettyPrint) {
            $document->formatOutput = $prettyPrint;
        }

        // option LIBXML_NOEMPTYTAG might lead to errors in consumers, do not use it.
        $xml = $document->saveXML();
        \assert(false !== $xml);


        if (!$this->withTidyRepair) {
            return $xml;
        }

        $clean = \tidy::repairString(
            $xml,
            $this->tidyConfig
        );

        if (\is_string($clean)) {
            return $clean;
        }

        // fallback to original version
        return $xml;
    }
}

And (example) results without tidy repair :

<?xml version="1.0" encoding="UTF-8"?>
<bom xmlns="http://cyclonedx.org/schema/bom/1.4" version="1" serialNumber="urn:uuid:4ee86da4-a5ef-4654-be46-0481c8f9080d">
  <metadata>
    <timestamp><![CDATA[2023-05-31T14:22:23Z]]></timestamp>
    <tools>
      <tool>
        <vendor><![CDATA[bartlett]]></vendor>
        <name><![CDATA[manifests]]></name>
        <version><![CDATA[dev-master]]></version>
      </tool>
    </tools>
  </metadata>
  <components>
    <component type="library" bom-ref="pkg:composer/clue/graph-composer@v1.1.0">
      <group><![CDATA[clue]]></group>
      <name><![CDATA[graph-composer]]></name>
      <version><![CDATA[v1.1.0]]></version>
      <purl><![CDATA[pkg:composer/clue/graph-composer@v1.1.0]]></purl>
      <properties>
        <property name="cdx:composer:package:sourceReference"><![CDATA[eff70fe2af7704b15cf675fcad663abe42034153]]></property>
        <property name="cdx:composer:package:distReference"><![CDATA[eff70fe2af7704b15cf675fcad663abe42034153]]></property>
        <property name="cdx:composer:package:isDevRequirement"><![CDATA[false]]></property>
      </properties>
    </component>
  </components>
  <dependencies>
    <dependency ref="pkg:composer/clue/graph-composer@v1.1.0"/>
  </dependencies>
</bom>

And (example) results with tidy repair :

<?xml version="1.0" encoding="utf-8"?>
<bom xmlns="http://cyclonedx.org/schema/bom/1.4" version="1" serialNumber="urn:uuid:39b92711-c69b-448f-9d51-f68135735507">
    <metadata>
        <timestamp>
            <![CDATA[2023-05-31T14:15:47Z]]>
        </timestamp>
        <tools>
            <tool>
                <vendor>
                    <![CDATA[bartlett]]>
                </vendor>
                <name>
                    <![CDATA[manifests]]>
                </name>
                <version>
                    <![CDATA[dev-master]]>
                </version>
            </tool>
        </tools>
    </metadata>
    <components>
        <component type="library" bom-ref="pkg:composer/clue/graph-composer@v1.1.0">
            <group>
                <![CDATA[clue]]>
            </group>
            <name>
                <![CDATA[graph-composer]]>
            </name>
            <version>
                <![CDATA[v1.1.0]]>
            </version>
            <purl>
                <![CDATA[pkg:composer/clue/graph-composer@v1.1.0]]>
            </purl>
            <properties>
                <property name="cdx:composer:package:sourceReference">
                    <![CDATA[eff70fe2af7704b15cf675fcad663abe42034153]]>
                </property>
                <property name="cdx:composer:package:distReference">
                    <![CDATA[eff70fe2af7704b15cf675fcad663abe42034153]]>
                </property>
                <property name="cdx:composer:package:isDevRequirement">
                    <![CDATA[false]]>
                </property>
            </properties>
        </component>
    </components>
    <dependencies>
        <dependency ref="pkg:composer/clue/graph-composer@v1.1.0" />
    </dependencies>
</bom>

@llaville
Copy link
Author

BTW, when class is not marked as final, I'd like to have protected visibility to properties. That will avoid inheritance issue !

@jkowalleck
Copy link
Member

tidy is not part of PHP core.
Introducing this as a dependency would be a breaking change.

@CycloneDX CycloneDX locked and limited conversation to collaborators May 31, 2023
@jkowalleck jkowalleck converted this issue into discussion #304 May 31, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Projects
None yet
Development

No branches or pull requests

2 participants