
xread XError 2 when parsing XML declaration #4

Closed
robstewart57 opened this Issue · 3 comments

2 participants

@robstewart57

How should one use Text.XML.HXT.Parser.XmlParsec.xread to parse a String that contains an XML declaration? For example, this does not parse:

<?xml version="1.0" encoding="ISO-8859-1"?>

<shiporder orderid="889923"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="shiporder.xsd">
  <orderperson>John Smith</orderperson>
  <shipto>
    <name>Ola Nordmann</name>
    <address>Langgt 23</address>
    <city>4000 Stavanger</city>
    <country>Norway</country>
  </shipto>
</shiporder>

Here's my code:

import Text.XML.HXT.Parser.XmlParsec (xread)

main :: IO ()
main = do
  str <- readFile "xml_test.xml"
  let x = xread str
  print x

Error:

[NTree (XError 2 "\"string: \"<?xml version=\\\"1.0\\\" encoding=\\\"ISO-8859-1...\"\" (line 1, column 6):\nunexpected xml\nexpecting legal XML name character\n") []]

@UweSchmidt
Owner
UweSchmidt closed this
@robstewart57

Thanks. I'm not familiar with how to compose arrows, so to ask another way, here is some Haskell code that demonstrates my problem. Executing test1, which tries to parse a document starting with the <?xml ...?> declaration, fails. test2 succeeds: it parses the same XML document, but with that declaration omitted. Any suggestions on how to introduce a pure arrow so that xmlDoc1 parses successfully, the way xmlDoc2 does in test2?

module HXTRport where

import Text.XML.HXT.Core

data GParseState = GParseState { stateGenId :: Int } deriving(Show)

-- | Doesn't work
test1 :: (GParseState,[XmlTree])
test1 = runSLA xread initState xmlDoc1
  where
    initState = GParseState { stateGenId = 0 }

{- ERROR running test1
(GParseState {stateGenId = 0},[NTree (XError 2 "\"string: \"<?xml version=\\\"1.0\\\"?><rdf:RDF xmlns:rdf=...\"\" (line 1, column 6):\nunexpected xml\nexpecting legal XML name character\n") []])
-}

-- | Works
test2 :: (GParseState,[XmlTree])
test2 = runSLA xread initState xmlDoc2
  where
    initState = GParseState { stateGenId = 0 }

{- Output of running test2
(GParseState {stateGenId = 0},[NTree (XTag "rdf:RDF" [NTree (XAttr "xmlns:rdf") [NTree (XText "http://www.w3.org/1999/02/22-rdf-syntax-ns#") []]]) [NTree (XTag "rdf:Description" [NTree (XAttr "rdf:about") [NTree (XText "http://example.com/123") []]]) [NTree (XTag "rdf:value" []) [NTree (XText "xxx") []]]]])
-}

xmlDoc1 :: String
xmlDoc1 = "<?xml version=\"1.0\"?>" ++
    "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">" ++
      "<rdf:Description rdf:about=\"http://example.com/123\">" ++
        "<rdf:value>xxx</rdf:value>" ++
      "</rdf:Description>" ++
    "</rdf:RDF>"

xmlDoc2 :: String
xmlDoc2 =
    "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">" ++
      "<rdf:Description rdf:about=\"http://example.com/123\">" ++
        "<rdf:value>xxx</rdf:value>" ++
      "</rdf:Description>" ++
    "</rdf:RDF>"
@UweSchmidt
Owner

Hi Rob,

Parsing a whole document with xread does not work. xread only parses XML content, not a complete document with a prolog such as the <?xml ...?> declaration.
To enable parsing a whole document in a pure context, I've added a new xreadDoc function.
So if you change

test1 = runSLA xread initState xmlDoc1

into

test1 = runSLA xreadDoc initState xmlDoc1

you will get a result list of 2 elements: the first for the "<?xml ...?>" declaration and the second
with the content. If you want to get rid of everything around the interesting "rdf:RDF"
element, use

test1 = runSLA (xreadDoc >>> isElem) initState xmlDoc1
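Because nothing in this example actually uses the GParseState, the same arrows can also be run with the stateless runLA. The sketch below assumes an hxt release that includes xreadDoc. The helpers allTrees, rootElems and rdfValues are hypothetical names introduced for illustration. It shows the unfiltered result, the isElem filter, and one way to reach the text inside rdf:value.

```haskell
module Main where

import Text.XML.HXT.Core

xmlDoc1 :: String
xmlDoc1 = "<?xml version=\"1.0\"?>" ++
    "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">" ++
      "<rdf:Description rdf:about=\"http://example.com/123\">" ++
        "<rdf:value>xxx</rdf:value>" ++
      "</rdf:Description>" ++
    "</rdf:RDF>"

-- Hypothetical helper: every top-level tree produced by xreadDoc
-- (one for the XML declaration, one for the rdf:RDF element).
allTrees :: String -> [XmlTree]
allTrees = runLA xreadDoc

-- Hypothetical helper: keep only element nodes, dropping the declaration.
rootElems :: String -> [XmlTree]
rootElems = runLA (xreadDoc >>> isElem)

-- Hypothetical helper: the text content of every rdf:value element.
-- hasName matches the name as written in the document, prefix included.
rdfValues :: String -> [String]
rdfValues = runLA (xreadDoc >>> deep (hasName "rdf:value") >>> getChildren >>> getText)

main :: IO ()
main = do
  print (length (allTrees xmlDoc1))
  print (length (rootElems xmlDoc1))
  print (rdfValues xmlDoc1)
```

The same arrow expressions also work unchanged under runSLA. runLA simply avoids carrying a state value that is never read.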

There is a corresponding hreadDoc for HTML. This extension is in
hxt-9.3.1.2, which I've just uploaded to Hackage.
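As a rough sketch of the HTML counterpart, the pipeline below parses a whole HTML document with hreadDoc and collects the text of its title element. It assumes hxt 9.3.1.2 or later. The input string and the titleText helper are hypothetical.

```haskell
module Main where

import Text.XML.HXT.Core

-- Hypothetical sample input.
htmlDoc :: String
htmlDoc = "<html><head><title>Hello</title></head><body><p>Some text</p></body></html>"

-- Hypothetical helper: parse a whole HTML document with hreadDoc,
-- find every <title> element and return its text children.
titleText :: String -> [String]
titleText = runLA (hreadDoc >>> deep (hasName "title") >>> getChildren >>> getText)

main :: IO ()
main = print (titleText htmlDoc)
```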

Uwe
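The first program in this issue called XmlParsec.xread directly on a String read from a file. The following is a minimal sketch of the same fix for that program, assuming an hxt release that includes xreadDoc. It wraps the arrow with runLA so that it can still be used as a plain function. parseDocString is a hypothetical name.

```haskell
module Main where

import Text.XML.HXT.Core

-- Hypothetical helper: parse a complete XML document, including its
-- "<?xml ...?>" declaration, as a pure String -> [XmlTree] function.
parseDocString :: String -> [XmlTree]
parseDocString = runLA xreadDoc

main :: IO ()
main = do
  str <- readFile "xml_test.xml"
  print (parseDocString str)
```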
