streaming-rss-html

A library for reading public RSS feeds using Spark Streaming.

Usage example

Run a demo via:

# compile scala, run tests, build fat jar
sbt assembly

# run on spark
spark-submit --class RSSDemo target/scala-2.11/streaming-rss-html-assembly-0.0.1.jar http://somehost/somepath/to/rss

Add to your own project by adding this dependency in your build.sbt:

libraryDependencies ++= Seq(
  //...
  "com.github.catalystcode" %% "streaming-rss-html" % "1.0.2",
  //...
)

How does it work?

Currently, this RDDInputDStream polls the given RSS feed at the specified rate. Scraping any HTML content that the feed links to is left to the caller.
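To make the polling idea concrete, here is a minimal sketch in plain Scala with no Spark dependency. It is not the library's implementation: `Entry`, `FeedPoller`, `pollOnce`, and `start` are hypothetical names, and the de-duplication by entry id is an assumption about what a polling reader would typically want, not something this README states.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import scala.collection.mutable

// Hypothetical stand-in for a parsed RSS item; not the library's type.
final case class Entry(id: String, title: String)

// Calls `fetch` on each poll and forwards only unseen entries to `emit`.
final class FeedPoller(fetch: () => Seq[Entry], emit: Entry => Unit) {
  private val seen = mutable.Set.empty[String]

  // One polling cycle: read the feed, emit entries whose id is new.
  def pollOnce(): Unit = synchronized {
    fetch().foreach { e =>
      if (seen.add(e.id)) emit(e) // add returns true only for new ids
    }
  }

  // Runs pollOnce at a fixed rate; the caller must shut the scheduler down.
  def start(periodSeconds: Long): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task = new Runnable { def run(): Unit = pollOnce() }
    scheduler.scheduleAtFixedRate(task, 0L, periodSeconds, TimeUnit.SECONDS)
    scheduler
  }
}
```

In this sketch `fetch` would wrap whatever downloads and parses the feed, and `emit` would hand each entry to downstream processing, which is where the caller's own HTML scraping would go.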

Release process

  1. Configure your credentials via the SONATYPE_USER and SONATYPE_PASSWORD environment variables.
  2. Update version.sbt
  3. Run sbt, then from the sbt shell run:
sonatypeOpen "enter staging description here"
publishSigned
sonatypeRelease
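For step 1, one common way to pass environment variables to sbt as publishing credentials is sketched below. This is an assumption about a typical setup, not a copy of this repository's build definition; the realm and host shown are the usual values for Sonatype OSSRH and may differ for your account.

```scala
// build.sbt fragment (hypothetical; the project's actual settings may differ)
credentials += Credentials(
  "Sonatype Nexus Repository Manager",         // realm, assumed OSSRH default
  "oss.sonatype.org",                          // host, assumed OSSRH default
  sys.env.getOrElse("SONATYPE_USER", ""),      // user name from the environment
  sys.env.getOrElse("SONATYPE_PASSWORD", "")   // password from the environment
)
```

Reading the values from the environment keeps the credentials out of version control.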

About

Spark connector for RSS and HTML sources.
