MXParser tokenization fails when PI is before first tag #7

Closed
belingueres opened this issue Mar 29, 2020 · 3 comments

Comments

@belingueres
Contributor

Consider these valid XML documents:

<?a?>
<test>nnn</test>

and

<?xml version="1.0" encoding="UTF-8"?>
<?a?>
<test>nnn</test>

The tests below fail when the PI is parsed: nextToken() returns START_DOCUMENT instead of PROCESSING_INSTRUCTION.


    @Test
    public void testProcessingInstructionTokenizeBeforeFirstTag()
        throws Exception
    {
        String input = "<?a?><test>nnn</test>";

        MXParser parser = new MXParser();
        parser.setInput( new StringReader( input ) );

        // The PI before the root element should be the first token reported.
        assertEquals( XmlPullParser.PROCESSING_INSTRUCTION, parser.nextToken() );
        assertEquals( XmlPullParser.START_TAG, parser.nextToken() );
        assertEquals( XmlPullParser.TEXT, parser.nextToken() );
        assertEquals( XmlPullParser.END_TAG, parser.nextToken() );
    }

    @Test
    public void testProcessingInstructionTokenizeAfterXMLDeclAndBeforeFirstTag()
        throws Exception
    {
        String input = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><?a?><test>nnn</test>";

        MXParser parser = new MXParser();
        parser.setInput( new StringReader( input ) );

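        // The XML declaration and the PI that follows it are each expected as a PROCESSING_INSTRUCTION token.
        assertEquals( XmlPullParser.PROCESSING_INSTRUCTION, parser.nextToken() );
        assertEquals( XmlPullParser.PROCESSING_INSTRUCTION, parser.nextToken() );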
        assertEquals( XmlPullParser.START_TAG, parser.nextToken() );
        assertEquals( XmlPullParser.TEXT, parser.nextToken() );
        assertEquals( XmlPullParser.END_TAG, parser.nextToken() );
    }

@belingueres
Contributor Author

I think the problem is in the parsePI() method: it returns false when parsing the XML declaration (<?xml ...?>), which sets the event to PROCESSING_INSTRUCTION, but returns true when it encounters any other PI, which sets the event to START_DOCUMENT. The logic should be exactly the inverse.
However, inverting it breaks several tests (and surely several apps).
A middle ground might be to always return false from parsePI(), so that the parser always returns PROCESSING_INSTRUCTION. WDYT?
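
To make the two behaviours easier to compare, here is a minimal sketch of the decision described above. It is a toy model, not the real MXParser code: the class PiEventSketch, its Event enum, and the methods reportedBehaviour() and proposedBehaviour() are hypothetical names made up for illustration, and the parsePI() here only mimics the true/false result described in this comment.

```java
/** Toy model of the token decision discussed above; NOT the real MXParser code. */
final class PiEventSketch {

    /** Only the two events relevant to this issue (hypothetical enum, not XmlPullParser's int constants). */
    enum Event { START_DOCUMENT, PROCESSING_INSTRUCTION }

    /**
     * Hypothetical stand-in for parsePI(): mimics the result described in the
     * comment above, i.e. false for the XML declaration and true for any other PI.
     */
    static boolean parsePI(String target) {
        return !"xml".equals(target);
    }

    /** Behaviour as reported: a true result is mapped to START_DOCUMENT. */
    static Event reportedBehaviour(String target) {
        return parsePI(target) ? Event.START_DOCUMENT : Event.PROCESSING_INSTRUCTION;
    }

    /** Suggested middle ground: every PI yields PROCESSING_INSTRUCTION, whatever parsePI() returns. */
    static Event proposedBehaviour(String target) {
        parsePI(target); // the PI is still consumed; only the flag is ignored
        return Event.PROCESSING_INSTRUCTION;
    }

    public static void main(String[] args) {
        // "xml" stands for the XML declaration, "a" for the <?a?> PI from the test inputs.
        for (String target : new String[] { "xml", "a" }) {
            System.out.println(target + ": reported=" + reportedBehaviour(target)
                + ", proposed=" + proposedBehaviour(target));
        }
    }
}
```

Under these assumptions, only the <?a?> case changes between the two variants; the XML declaration is reported as PROCESSING_INSTRUCTION either way, which is what the second test expects.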

@michael-o
Member

Just tested, tests still fail.

@gnodet transferred this issue from codehaus-plexus/plexus-utils on Apr 19, 2023
@gnodet
Member

gnodet commented May 10, 2023

@belingueres @michael-o I came up with the same fix. Also, according to the Javadoc for START_DOCUMENT, this event should only be returned before the parser has actually parsed anything, which points in the same direction.
The PR does that and fixes both tests.

@gnodet closed this as completed in 792f947 on May 14, 2023