Migrate docs to scalawilliam.com
ScalaWilliam committed Mar 21, 2021
1 parent bdfef53 commit 10d4ee6
Showing 1 changed file with 2 additions and 114 deletions.
116 changes: 2 additions & 114 deletions README.md
@@ -1,6 +1,7 @@
XML Streaming for Scala (xs4s) [![Maven Central](https://img.shields.io/maven-central/v/com.scalawilliam/xs4s-core_2.13.svg)](https://maven-badges.herokuapp.com/maven-central/com.scalawilliam/xs4s-core_2.13)
====


## Capabilities

xs4s offers the following capabilities:
@@ -14,120 +15,7 @@ assert(xs4s.XML.loadString("<test/>") == <test/>)
- An integration with FS2 and ZIO for pure FP streaming.
- Large file streaming, such as multi-gigabyte XML files, for example GZIPped files straight from Wikipedia, without running out of memory.


## How it does it
It uses the standard XML streaming API (StAX), backed by Woodstox (https://github.com/FasterXML/woodstox). As it reads events, it gradually forms a partial tree; when a user-supplied function (the "query") matches, it materialises that partial tree into a full tree and returns it to the user.
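
To make the event-by-event idea concrete, here is a minimal sketch that uses only the JDK's `javax.xml.stream` (StAX) API. It is not xs4s's implementation: the `StaxSketch` object and `textsOf` function are hypothetical names, and instead of building `scala.xml.Elem` trees it only collects the text inside matching elements. It illustrates how a reader can hold state for the element it is currently inside and emit a result once that element closes.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import scala.collection.mutable.ListBuffer

// Hypothetical illustration; not part of xs4s.
object StaxSketch {

  /** Collects the text of every element called `name`, one event at a time. */
  def textsOf(xml: String, name: String): List[String] = {
    val reader =
      XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
    val results = ListBuffer.empty[String]
    val current = new StringBuilder // text gathered for the element in progress
    var depth   = 0                 // > 0 while inside a matching element
    try {
      while (reader.hasNext) {
        reader.next() match {
          // Entering a matching element, or a child nested inside one
          case XMLStreamConstants.START_ELEMENT
              if depth > 0 || reader.getLocalName == name =>
            depth += 1
          // Text is only kept while inside a matching element
          case XMLStreamConstants.CHARACTERS | XMLStreamConstants.CDATA
              if depth > 0 =>
            current.append(reader.getText)
          // Leaving the matching element: emit what was gathered
          case XMLStreamConstants.END_ELEMENT if depth > 0 =>
            depth -= 1
            if (depth == 0) {
              results += current.toString
              current.clear()
            }
          case _ => ()
        }
      }
    } finally reader.close()
    results.toList
  }
}
```

Only the element in progress is held in memory, which is the property that makes this style suitable for very large inputs.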

## Getting started

Add the following to your build.sbt (compatible with Scala 3.0.0-RC1, Scala 2.13 and 2.12 series):

```sbt
libraryDependencies += "com.scalawilliam" %% "xs4s-core" % "0.8.7"
libraryDependencies += "com.scalawilliam" %% "xs4s-fs2" % "0.8.7"
libraryDependencies += "com.scalawilliam" %% "xs4s-zio" % "0.8.7"
```

## Examples

### FS2 Streaming

Then, you can implement functions such as the following ([BriefFS2Example](example/src/main/scala/xs4s/example/brief/BriefFS2Example.scala) - note the explicit types are for clarity):

```scala
/**
  * @param byteStream Could be, for example, fs2.io.readInputStream(inputStream)
  * @param blocker    obtained with Blocker[IO]
  */
def extractAnchorTexts(byteStream: Stream[IO, Byte], blocker: Blocker)(
    implicit cs: ContextShift[IO]): Stream[IO, String] = {

  /** Extract all elements called 'anchor' */
  val anchorElementExtractor: XmlElementExtractor[Elem] =
    XmlElementExtractor.filterElementsByName("anchor")

  /** Turn into XMLEvent */
  val xmlEventStream: Stream[IO, XMLEvent] =
    byteStream.through(byteStreamToXmlEventStream(blocker))

  /** Collect all the anchors as [[scala.xml.Elem]] */
  val anchorElements: Stream[IO, Elem] =
    xmlEventStream.through(anchorElementExtractor.toFs2PipeThrowError)

  /** And finally extract the text contents for each Elem */
  anchorElements.map(_.text)
}
```

### ZIO Streaming

Then, you can implement functions such as the following ([BriefZIOExample](example/src/main/scala/xs4s/example/brief/BriefZIOExample.scala) - note the explicit types are for clarity):

```scala
/**
  * @param byteStream Could be, for example, zio.stream.Stream.fromInputStream(inputStream)
  */
def extractAnchorTexts[R <: Blocking](byteStream: ZStream[R, IOException, Byte]): ZStream[R, Throwable, String] = {

  /** Extract all elements called 'anchor' */
  val anchorElementExtractor: XmlElementExtractor[Elem] =
    XmlElementExtractor.filterElementsByName("anchor")

  /** Turn into XMLEvent */
  val xmlEventStream: ZStream[R, Throwable, XMLEvent] =
    byteStream.via(byteStreamToXmlEventStream()(_))

  /** Collect all the anchors as [[scala.xml.Elem]] */
  val anchorElements: ZStream[R, Throwable, Elem] =
    xmlEventStream.via(anchorElementExtractor.toZIOPipeThrowError)

  /** And finally extract the text contents for each Elem */
  anchorElements.map(_.text)
}
```

### Plain `Iterator` streaming

Alternatively, there is a plain-Scala API, useful where you interoperate with legacy Java code or are not yet comfortable with pure FP ([BriefPlainScalaExample](example/src/main/scala/xs4s/example/brief/BriefPlainScalaExample.scala)):

```scala
def extractAnchorTexts(sourceFile: File): Unit = {
  val anchorElementExtractor: XmlElementExtractor[Elem] =
    XmlElementExtractor.filterElementsByName("anchor")
  val xmlEventReader = XMLStream.fromFile(sourceFile)
  try {
    val elements: Iterator[Elem] =
      xmlEventReader.extractWith(anchorElementExtractor)
    val text: Iterator[String] = elements.map(_.text)
    text.foreach(println)
  } finally xmlEventReader.close()
}
```
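
As a rough illustration of why an `Iterator` suits this kind of streaming, the sketch below wraps the JDK's `javax.xml.stream.XMLEventReader` in a Scala `Iterator`, so events are pulled from the parser only when the consumer asks for them. This is an assumption-laden sketch rather than how xs4s is implemented; `EventIteratorSketch`, `events` and `startElementNames` are hypothetical names.

```scala
import java.io.StringReader
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.events.XMLEvent

// Hypothetical illustration; not part of xs4s.
object EventIteratorSketch {

  /** Exposes parser events lazily: nothing is read until `next()` is called. */
  def events(xml: String): Iterator[XMLEvent] = {
    val reader =
      XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml))
    new Iterator[XMLEvent] {
      def hasNext: Boolean = reader.hasNext
      def next(): XMLEvent = reader.nextEvent()
    }
  }

  /** Names of all start elements, in document order. */
  def startElementNames(xml: String): Iterator[String] =
    events(xml).collect {
      case e if e.isStartElement => e.asStartElement.getName.getLocalPart
    }
}
```

Because `collect` and `map` on an `Iterator` are lazy, chaining them keeps memory use proportional to a single event rather than to the whole document.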

### Advanced Wikipedia example

This example counts the popularity of Wikipedia anchors from their `abstract` documentation.

It combines several steps:
- Reading from a streaming URL
- Passing the bytes through a GZip decoder
- Parsing the XML
- Running a map-reduce over the Wikipedia data
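
The decode, parse and count steps can be sketched with JDK classes alone. The snippet below is a simplified stand-in for the real examples, not a copy of them: `GzipCountSketch`, `gzip` and `countAnchors` are hypothetical names, the input is gzipped in memory instead of being fetched from a URL, and it assumes each `anchor` element contains only text.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import scala.collection.mutable

// Hypothetical illustration; not part of xs4s.
object GzipCountSketch {

  /** Demo helper only: gzips a string in memory. */
  def gzip(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out    = new GZIPOutputStream(buffer)
    try out.write(text.getBytes(StandardCharsets.UTF_8))
    finally out.close()
    buffer.toByteArray
  }

  /** Decompresses, parses event by event, and counts each anchor text. */
  def countAnchors(compressed: InputStream): Map[String, Int] = {
    val reader = XMLInputFactory
      .newInstance()
      .createXMLStreamReader(new GZIPInputStream(compressed))
    val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
    try {
      while (reader.hasNext) {
        val isAnchorStart =
          reader.next() == XMLStreamConstants.START_ELEMENT &&
            reader.getLocalName == "anchor"
        // getElementText requires text-only content (assumed here)
        if (isAnchorStart) counts(reader.getElementText.trim) += 1
      }
    } finally reader.close()
    counts.toMap
  }
}
```

A caller would pass any `InputStream` of gzipped XML, for example `GzipCountSketch.countAnchors(new ByteArrayInputStream(GzipCountSketch.gzip(xml)))`.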

The main example is in [FindMostPopularWikipediaKeywordsFs2App](example/src/main/scala/xs4s/example/FindMostPopularWikipediaKeywordsFs2App.scala) or [FindMostPopularWikipediaKeywordsZIOApp](example/src/main/scala/xs4s/example/FindMostPopularWikipediaKeywordsZIOApp.scala).
There is also a plain Scala example (using `Iterator`) in [FindMostPopularWikipediaKeywordsPlainScalaApp](example/src/main/scala/xs4s/example/FindMostPopularWikipediaKeywordsPlainScalaApp.scala).

```bash
$ git clone https://github.com/ScalaWilliam/xs4s.git
$ sbt "examples/runMain xs4s.example.FindMostPopularWikipediaKeywordsFs2App"
$ sbt "examples/runMain xs4s.example.FindMostPopularWikipediaKeywordsZIOApp"
$ sbt "examples/runMain xs4s.example.FindMostPopularWikipediaKeywordsPlainScalaApp"
```

This handles files from 100MB to 4GB without running out of memory, and it does so quickly. It converts XML streams into Scala XML trees on demand, which you can then query.
Find the full documentation at --> **https://www.scalawilliam.com/xml-streaming-for-scala/**

## Authors & Contributors
- @ScalaWilliam <https://www.scalawilliam.com/>