Support all readers and writers that pandoc does
This uses pandoc's own getReader and getWriter functions, so it doesn't
need to be constantly updated with newer formats.

Furthermore, libpandoc now also supports readers and writers that are
not pure (that is, ones that run in IO).

Signed-off-by: Shahbaz Youssefi <ShabbyX@gmail.com>
ShabbyX committed Feb 14, 2016
1 parent d2a96d1 commit eb25c89
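Supporting non-pure writers amounts to lifting every writer variant into one `IO`-returning shape, which is what the `PureStringWriter`, `IOStringWriter` and `IOByteStringWriter` branches in `src/LibPandoc.hs` do. A minimal sketch of that idea, assuming a hypothetical, simplified `Writer` type with plain type parameters in place of pandoc's `WriterOptions` and `Pandoc` (this is not pandoc's own definition):

```haskell
-- Hypothetical, simplified stand-in for pandoc's writer variants.
-- `opts` plays the role of WriterOptions and `doc` the role of Pandoc.
data Writer opts doc
  = PureStringWriter (opts -> doc -> String)
  | IOStringWriter (opts -> doc -> IO String)

-- Lift every variant to the same IO-returning shape, so the caller
-- only ever has to deal with one kind of writer.
toIOWriter :: Writer opts doc -> (opts -> doc -> IO String)
toIOWriter (PureStringWriter w) = \o d -> return (w o d)
toIOWriter (IOStringWriter w)   = w
```

A byte-string variant would add one more case that runs the action and converts its result to a `String`, as the diff does with `BLC.unpack`.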
Showing 6 changed files with 49 additions and 106 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,3 +1,3 @@
/dist
bin
TestResult.xml
*.swp
46 changes: 6 additions & 40 deletions README.md
@@ -93,48 +93,13 @@ the buffer and `user_data` is the last argument of the `pandoc` function,
similar to `user_data` of the reader. The writer function must write the
contents of the buffer as the output of the conversion by Pandoc.


#### Input and Output Formats

Input and output formats depend on Pandoc version the library is built
against. They are passed as strings. Possible values include (TODO: needs
to be verified and possibly extended):

- For reader:

* docbook
* html
* latex
* markdown
* mediawiki
* native
* rst
* ~~texmath~~
* textile

- For writer:

* asciidoc
* context
* docbook
* ~~docx~~
* ~~epub~~
* ~~fb2~~
* html
* latex
* man
* markdown
* mediawiki
* native
* ~~odt~~
* opendocument
* org
* rst
* rtf
* texinfo
* textile

Note: Some read and write types supported by Pandoc striked above are not yet
supported by libpandoc.
against. They are passed as strings. Reader and writer names are the same
as understood by Pandoc, for example `"html"` or `"markdown"`.


#### JSON Settings

@@ -158,7 +123,8 @@ fields that have non-default values have to be provided.
## Changelog

* 0.8
- Switched to JSON internal representation instead of XML
- Switched to JSON format for settings instead of XML
- Completed support for all Pandoc readers and writers
- Updated to Pandoc version 1.16 and higher
* 0.7 - Updated to Pandoc version 1.13 and higher
* 0.6 - Updated to Pandoc version 1.10 and higher
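Because format names are now handed straight to pandoc, resolving a name either yields a converter or an error message, and the first error is what `pandoc` returns to the C caller. A minimal sketch of that lookup, assuming a hypothetical association list; the table contents and the `Unknown reader:` / `Unknown writer:` wording are illustrative assumptions, not pandoc's actual format list or messages:

```haskell
-- Hypothetical resolver: look a format name up in a table and return
-- either the matching converter or an error message for the C caller.
resolve :: String -> [(String, a)] -> String -> Either String a
resolve kind table name =
  maybe (Left ("Unknown " ++ kind ++ ": " ++ name)) Right (lookup name table)

-- Combine the reader and writer lookups. Either's Applicative instance
-- returns the first Left it meets, so a bad reader name is reported
-- before a bad writer name.
pickPair :: Either String r -> Either String w -> Either String (r, w)
pickPair r w = (,) <$> r <*> w
```

The ordering of `pickPair` matches the `(Left e, _)` then `(_, Left e)` cases in the new `pandoc` function.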
4 changes: 2 additions & 2 deletions libpandoc.cabal
@@ -7,7 +7,7 @@ Maintainer: Shahbaz Youssefi <shabbyx@gmail.com>
Synopsis: Pandoc as a shared object or DLL.
Build-Type: Simple
Executable libpandoc.so
If os(windows)
if os(windows)
CPP-Options: -DWIN32
Extensions: ForeignFunctionInterface
Build-Depends: base >= 4.6,
@@ -32,6 +32,6 @@ Executable libpandoc.so
Ghc-Options: -no-hs-main -shared
if !os(windows)
Ghc-Options: -dynamic
extra-libraries: HSrts-ghc$compiler
extra-libraries: HSrts-ghc7.10.2
else
extra-libraries: HSrts
84 changes: 27 additions & 57 deletions src/LibPandoc.hs
@@ -26,22 +26,21 @@ module LibPandoc (pandoc, LibPandocSettings(..), defaultLibPandocSettings) where

import Control.Arrow ((>>>))
import Control.Exception (catch, Exception(..), SomeException(..))
import Control.Monad.Except (MonadError(..))
import Control.Monad ((>=>), liftM)
import qualified Data.ByteString.Lazy.Char8 as BLC
import qualified Data.Char as Char
import qualified Data.List as List
import qualified Data.Map as Map
import Data.Maybe
import Data.String (IsString)
import Data.Typeable (typeOf)
import Foreign
import Foreign.C.String
import Foreign.C.Types
import LibPandoc.IO
import LibPandoc.Settings
import System.IO.Unsafe
import Text.Pandoc
import Text.Pandoc.Error
import Text.Pandoc.MediaBag
import Text.JSON
import Text.JSON.Generic (toJSON,fromJSON)

@@ -51,55 +50,9 @@ type CPandoc = CInt -> CString -> CString -> CString
-> IO CString

foreign export ccall "pandoc" pandoc :: CPandoc
foreign export ccall "increase" increase :: CInt -> IO CInt
foreign import ccall "dynamic" peekReader :: FunPtr CReader -> CReader
foreign import ccall "dynamic" peekWriter :: FunPtr CWriter -> CWriter

increase :: CInt -> IO CInt
increase x = return (x + 1)

readNativeWrapper :: ReaderOptions -> String -> Either PandocError Pandoc
readNativeWrapper options = readNative

getInputFormat :: String -> Maybe (ReaderOptions -> String -> Either PandocError Pandoc)
getInputFormat x =
case map Char.toLower x of
"docbook" -> Just readDocBook
"html" -> Just readHtml
"latex" -> Just readLaTeX
"markdown" -> Just readMarkdown
"mediawiki" -> Just readMediaWiki
"native" -> Just readNativeWrapper
"rst" -> Just readRST
-- "texmath" -> Just readTeXMath TODO: disabled until I figure out how to convert it to ReaderOptions -> String -> Pandoc
"textile" -> Just readTextile
_ -> Nothing

getOutputFormat :: String -> Maybe (WriterOptions -> Pandoc -> String)
getOutputFormat x =
case map Char.toLower x of
"asciidoc" -> Just writeAsciiDoc
"context" -> Just writeConTeXt
"docbook" -> Just writeDocbook
-- "docx" -> Just writeDocx TODO: The following are disabled because they return IO types
-- "epub" -> Just writeEPUB TODO: Which I do not know yet how to mix with the non IO type
-- "fb2" -> Just writeFB2
"html" -> Just writeHtmlString
"latex" -> Just writeLaTeX
"man" -> Just writeMan
"markdown" -> Just writeMarkdown
"mediawiki" -> Just writeMediaWiki
"native" -> Just writeNative
-- "odt" -> Just writeODT
"opendocument" -> Just writeOpenDocument
"org" -> Just writeOrg
"rst" -> Just writeRST
"rtf" -> Just writeRTF
"texinfo" -> Just writeTexinfo
"textile" -> Just writeTextile
_ -> Nothing


-- | Gives preferential treatment to first argument (should be user options)
joinJSON :: JSValue -> JSValue -> JSValue
joinJSON (JSArray a) (JSArray b) = JSArray (List.zipWith joinJSON a b)
@@ -124,17 +77,34 @@ getSettings settings = do

pandoc :: CPandoc
pandoc bufferSize input output settings reader writer userData = do
let r = peekReader reader
w = peekWriter writer
let cr = peekReader reader
cw = peekWriter writer
i <- peekCString input
o <- peekCString output
s <- getSettings settings
case (getInputFormat i, getOutputFormat o) of
(Nothing, _) -> newCString "Invalid input format."
(_, Nothing) -> newCString "Invalid output format."
(Just read, Just write) ->
do let run = read (readerOptions s) >>> handleError >>> write (writerOptions s)
result <- tryMaybe (transform (decodeInt bufferSize) run r w userData)

read <- return $ getReader i >>= \rd -> case rd of
-- if the reader returns just a string, add an empty media bag to it
StringReader r -> Right $ \o s -> r o s >>= \p -> return $ p >>= \x -> Right (x, mempty::MediaBag)
-- otherwise, the output (Pandoc, MediaBag) is already fine
ByteStringReader r -> Right $ \o s -> r o (BLC.pack s)
-- Note: currently, the media bag is actually later thrown away. Perhaps in the future there could be support for it

write <- return $ getWriter o >>= \wr -> case wr of
-- if the writer returns just a string, add an IO
PureStringWriter w -> Right $ \o s -> return $ w o s
-- if it returns an IO string, it's fine
IOStringWriter w -> Right w
-- if it returns an IO byte string, convert it to IO string
IOByteStringWriter w -> Right $ \o s -> w o s >>= \s -> return $ BLC.unpack s

case (read, write) of
(Left e, _) -> newCString e
(_, Left e) -> newCString e
(Right r, Right w) ->
do let run = (r (readerOptions s) >=> return . liftM fst) >>> liftM handleError >>> \x -> x >>= w (writerOptions s)
-- run takes the media bag out of reader, as it is currently unused. Since the reader returns IO, everything is lifted
result <- tryMaybe (transform (decodeInt bufferSize) run cr cw userData)
case result of
Just (SomeException res) -> newCString (show (typeOf res) ++ ": " ++ show res)
Nothing -> return nullPtr
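With readers and writers both running in `IO`, the conversion passed to `transform` now has type `String -> IO String`. A minimal sketch of that pipeline under stated assumptions: `Doc` and `Media` are hypothetical stand-ins for `Pandoc` and `MediaBag`, and `orThrow` turns a failed read into an `IO` exception in place of `handleError` (whose exact behaviour is not shown in this diff). As in the diff, the media is dropped with `fst`.

```haskell
import Control.Exception (SomeException, try)
import Control.Monad ((>=>))

-- Hypothetical stand-ins for pandoc's Pandoc and MediaBag types.
type Doc = [String]
type Media = [String]

-- A reader in IO that can fail and also returns media (none here).
reader :: String -> IO (Either String (Doc, Media))
reader s
  | null (words s) = return (Left "empty input")
  | otherwise      = return (Right (words s, []))

-- A writer in IO.
writer :: Doc -> IO String
writer = return . unwords . reverse

-- Sketch-only error handling: a failed read becomes an IO exception.
orThrow :: Either String a -> IO a
orThrow = either (ioError . userError) return

-- Read, discard the media, then write. Same shape transform expects.
run :: String -> IO String
run = reader >=> orThrow >=> (writer . fst)

-- Catch any exception and turn it into a message, roughly what pandoc
-- does with tryMaybe and show before handing the string back to C.
runReporting :: String -> IO (Either String String)
runReporting s = do
  r <- try (run s)
  return $ case r of
    Left e  -> Left (show (e :: SomeException))
    Right o -> Right o
```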
17 changes: 12 additions & 5 deletions src/LibPandoc/IO.hs
@@ -41,7 +41,7 @@ type CWriter = CString -> CInt -> Ptr () -> IO ()
-- | parameter is the size of the buffer in bytes. The number of
-- | bytes returned by the reader and passed to the writer should not
-- | exceed the buffer size.
transform :: Int -> (String -> String) -> CReader -> CWriter -> Ptr () -> IO ()
transform :: Int -> (String -> IO String) -> CReader -> CWriter -> Ptr () -> IO ()
transform bufferSize transformer reader writer userData =
withBuffer bufferSize $ \rbuf ->
withBuffer bufferSize $ \wbuf ->
@@ -65,8 +65,15 @@ readStream (Buffer (buf, size)) reader userData = unsafeInterleaveIO result wher
k <- reader buf userData
fmap (Utf8.decodeString s ++) (loop (decodeInt k))

writeStream :: Buffer -> CWriter -> String -> Ptr() -> IO ()
writeStream (Buffer (buf, size)) writer text userData = loop text where
{--
writeStream' :: Buffer -> CWriter -> IO String -> Ptr() -> IO ()
writeStream' buf writer text userData = do
text' <- text
writeStream buf writer text' userData
--}

writeStream :: Buffer -> CWriter -> IO String -> Ptr() -> IO ()
writeStream (Buffer (buf, size)) writer text userData = text >>= loop where
buffer = castPtr buf
loop text = do
let (head, tail) = splitAt (div size 4) text
@@ -78,7 +85,7 @@ writeStream (Buffer (buf, size)) writer text userData = loop text where
_ -> loop tail

decodeInt :: CInt -> Int
decodeInt x = fromInteger (toInteger x)
decodeInt = fromInteger . toInteger

encodeInt :: Int -> CInt
encodeInt x = fromInteger (toInteger x)
encodeInt = fromInteger . toInteger
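`writeStream` now receives an `IO String`, runs it once, and sends the text in pieces of `div size 4` characters. Dividing by 4 works because a UTF-8 code point takes at most 4 bytes, so each piece's encoding fits in a `size`-byte buffer. The sketch below checks that property; `chunksFor`, `writeStreamSketch` and `collectChunks` are hypothetical names, the callback only simulates the C writer, and the hand-written UTF-8 encoder exists only to keep the example dependency-free. Unlike the loop in the diff, it rejects buffers smaller than 4 bytes, where a piece of `div size 4` characters would be empty.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.IORef (modifyIORef, newIORef, readIORef)
import Data.Word (Word8)

-- Hand-written UTF-8 encoder for one Char (1 to 4 bytes).
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    lead k = fromIntegral (n `shiftR` k)
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- Split text into pieces of at most (size `div` 4) characters; since one
-- code point needs at most 4 bytes, each piece fits in `size` bytes.
chunksFor :: Int -> String -> [String]
chunksFor size
  | size < 4  = error "chunksFor: buffer must hold at least 4 bytes"
  | otherwise = go
  where
    go [] = []
    go s  = let (h, t) = splitAt (size `div` 4) s in h : go t

-- Like the new writeStream: run the IO action once, then hand each
-- encoded piece to the callback (standing in for the C writer).
writeStreamSketch :: Int -> ([Word8] -> IO ()) -> IO String -> IO ()
writeStreamSketch size emit text =
  text >>= mapM_ (emit . concatMap encodeChar) . chunksFor size

-- Collect everything the callback receives, in order, for inspection.
collectChunks :: Int -> IO String -> IO [[Word8]]
collectChunks size text = do
  ref <- newIORef []
  writeStreamSketch size (\bs -> modifyIORef ref (bs :)) text
  reverse <$> readIORef ref
```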
2 changes: 1 addition & 1 deletion src/pandoc.h
@@ -39,7 +39,7 @@ void pandoc_exit();
/*
* Calls `pandoc` with given input and output formats and streams.
* Returns a `NULL` on success, or a `NULL`-terminated error message
* on failure. Settings is an XML string conforming to a schema
* on failure. Settings is a JSON string conforming to a schema
* distributed with `libpandoc`. Settings can be `NULL`. All strings
* should be encoded as UTF-8. User data is any pointer.
*/
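The header comment fixes the calling convention: `NULL` means success, anything else is a NUL-terminated error message. On the Haskell side this maps an optional error onto a `CString`, as the end of `pandoc` does with `newCString` and `nullPtr`. This sketch deliberately uses `GHC.Foreign.newCString` with an explicit `utf8` encoding, because the header asks for UTF-8 while `Foreign.C.String.newCString` encodes with the current locale; that choice, and releasing the message with `free`, are assumptions of the sketch rather than something the header states.

```haskell
import Foreign.C.String (CString)
import Foreign.Marshal.Alloc (free)
import Foreign.Ptr (nullPtr)
import qualified GHC.Foreign as GF
import System.IO (utf8)

-- NULL signals success; otherwise return a freshly allocated,
-- NUL-terminated UTF-8 message. Assumption of this sketch: the caller
-- releases the message with free().
resultToCString :: Maybe String -> IO CString
resultToCString Nothing    = return nullPtr
resultToCString (Just msg) = GF.newCString utf8 msg

-- What a caller would do with the result: read the message, if any,
-- then release it.
consumeResult :: CString -> IO (Maybe String)
consumeResult p
  | p == nullPtr = return Nothing
  | otherwise    = do
      msg <- GF.peekCString utf8 p
      free p
      return (Just msg)
```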
