## RESTful Service
**Goal:** This notebook shows how Apache Toree can be used for more than exploratory work. It turns our previous notebook into a backend service that we can invoke from the browser or from other applications.

This notebook exposes our Hacker News data explorations as a RESTful service. This is done externally through a Jupyter extension called the [Jupyter Kernel Gateway](https://github.com/jupyter-incubator/dashboards). If you are running this demo with `docker-compose`, you should be able to access the endpoints at:

* http://localhost:9999/story/12476597/comments
* http://localhost:9999/story/12476597/links
* http://localhost:9999/story/12476597/words

You can substitute the id of any valid story if you would like to verify the demo :D.
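
As a rough sketch of how another application might call these endpoints, the snippet below builds the URL for a given story and fetches the raw JSON body. The names `baseUrl`, `endpointUrl` and `fetch` are introduced here for illustration only, and `fetch` assumes the gateway is already running on port 9999 as in the `docker-compose` setup.

```scala
import scala.io.Source

// Base URL from the docker-compose setup described above.
val baseUrl = "http://localhost:9999"

// Build the URL for one of the three resources: "comments", "links" or "words".
def endpointUrl(storyId: Long, resource: String): String =
  s"$baseUrl/story/$storyId/$resource"

// Fetch the raw JSON response body as a string.
// This only works while the gateway is running.
def fetch(storyId: Long, resource: String): String = {
  val source = Source.fromURL(endpointUrl(storyId, resource), "UTF-8")
  try source.mkString finally source.close()
}
```

Calling `fetch(12476597L, "words")` would then return the same JSON you see when opening the third URL above in a browser.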

In [None]:
%adddeps org.jsoup jsoup 1.9.2 --transitive
%adddeps com.github.seratch hackernews4s_2.10 0.6.0 --transitive

In [None]:
val sqlC = sqlContext

In [None]:
import hackernews4s.v0._
import sqlC.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.nodes.Element
import scala.collection.JavaConversions._
import org.apache.spark.ml.feature.{StopWordsRemover, Tokenizer}
import org.apache.spark.sql.DataFrame
import org.apache.spark.rdd.RDD

The code in the next cell is the same as in the previous notebook. The interesting bits come after this cell.

In [None]:
case class Comment(story: Long, id: Long, text: String)

val tokenizer = new Tokenizer().setInputCol("_1").setOutputCol("words")
val remover = new StopWordsRemover().setInputCol("words").setOutputCol("filteredWords")

// A function that walks a story's comment tree recursively and returns all of its comments as a flat list
val getComments: (Item) => Seq[Comment] = (story: Item) => {
    def _getComments:  (Item) => Seq[Comment] = (item: Item) => {
        val commentIds = item.commentIds
        if(commentIds.size == 0){
            Seq(Comment(story.id.id, item.id.id, item.text))
        } else {
            val comments: Seq[Comment] = commentIds.flatMap((itemId: ItemId) => { 
                _getComments(HackerNews.getItem(itemId).get)
            })
            if("Story".equals(item.itemType.toString)){
                comments
            } else {
                Comment(story.id.id, item.id.id, item.text) +: comments
            }
            
        }   
    }
    
    _getComments(story)
}

val getItemText: (Comment) => String = (comment: Comment) => {
    Jsoup.parse(comment.text).text()
}
val getItemLinks: (Comment) => Seq[String] = (comment: Comment) => {
    val aTags: List[Element] = Jsoup.parse(comment.text).select("a").toList
    aTags.map((link: Element) => {
        link.attr("href")
    })
}

def getStoryComments(storyId: Int) = {
    val story = Seq(HackerNews.getItem(ItemId(storyId)).get)
    sc.parallelize(story).flatMap((item: Item) => {
        getComments(item)
    })
}   

def getCommentLinks(comments: RDD[Comment]) = {
    comments.flatMap((comment:Comment) => {
        getItemLinks(comment)
    })
}

def getCommentWordCounts(comments: RDD[Comment]) = {
    val textDF = comments.map((comment:Comment) => {
        getItemText(comment)
    }).toDF
    val tokenizedComments = tokenizer.transform(textDF)
    val filteredWordCountsDF = remover.transform(tokenizedComments)
    val terms = filteredWordCountsDF.flatMap((row: Row) =>{
        row.getSeq[String](2)
    })
    val wordCounts = terms.map((word: String) => {
        (word, 1)
    }).reduceByKey(_+_)
    wordCounts
}
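
For readers without a Spark cluster at hand, the computation behind the word counts can be approximated with plain Scala collections. This is only a rough sketch of the same idea (lowercase and split on whitespace, drop stop words, count, keep the most frequent). `stopWords` is a tiny hypothetical list rather than the one `StopWordsRemover` ships with, and `topWords` is a name introduced here for illustration.

```scala
// Tiny hypothetical stop-word list; Spark's StopWordsRemover uses a much larger one.
val stopWords: Set[String] = Set("a", "and", "is", "the")

// Lowercase and split on whitespace, drop stop words, count each word,
// and keep the n most frequent ones.
def topWords(texts: Seq[String], n: Int): Seq[(String, Int)] =
  texts
    .flatMap(_.toLowerCase.split("\\s+"))
    .filter(word => word.nonEmpty && !stopWords.contains(word))
    .groupBy(identity)
    .map { case (word, occurrences) => (word, occurrences.size) }
    .toSeq
    .sortBy { case (_, count) => -count }
    .take(n)
```

Unlike the Spark version, this runs on a single machine and makes no attempt to distribute the work; it is meant purely to show what the pipeline computes.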

This `JsonHelper` object is what we will use to serialize the results from our previous code. It outputs our data structures as JSON, allowing other applications and users to consume the data.

In [None]:
object JsonHelper extends Serializable {
    import play.api.libs.json._
    
    implicit val commentWrites = new Writes[Comment] {
        def writes(comment: Comment) = Json.obj(
        "story" -> comment.story,
        "id" -> comment.id,
        "text" -> comment.text
        )
    }
    
    implicit val tupleWrites = new Writes[(String, Int)] {
        def writes(tuple: (String, Int)) = Json.obj(
            tuple._1 -> tuple._2
        )
    }
    
    def jsonComments(REQUEST: String) = {
        val req = Json.parse(REQUEST)
        val storyId = (req \ "path" \ "story_id").as[String].toInt
        val comments = getStoryComments(storyId).collect()
        Json.toJson(comments)
    }
    
    def jsonLinks(REQUEST: String) = {
        val req = Json.parse(REQUEST)
        val storyId = (req \ "path" \ "story_id").as[String].toInt
        val links = getCommentLinks(getStoryComments(storyId)).collect()
        Json.toJson(links)
    }
    
    def jsonWords(REQUEST: String) = {
        val req = Json.parse(REQUEST)
        val storyId = (req \ "path" \ "story_id").as[String].toInt
        val words = getCommentWordCounts(getStoryComments(storyId)).sortBy((wordCount: (String, Int)) => {
            wordCount._2
        }, ascending=false).take(50)
        Json.toJson(words)
    }
}

Below we have annotated our cells with comments to register them as RESTful endpoints. The three endpoints take a __`story_id`__ as a path parameter. This is parsed out in our **`JsonHelper`** object above. The three endpoints are:

* GET /story/:story_id/comments: Gets all of the comments for a story
* GET /story/:story_id/links: Gets all of the links in the comments of a story
* GET /story/:story_id/words: Gets all of the word counts for the comments of a story

The Jupyter Kernel Gateway automatically injects a JSON string stored in the variable **`REQUEST`**. This is done before each cell is invoked.

If we wanted to test this code locally we could create **`REQUEST`** inline in a code cell.
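
A minimal sketch of such a cell is shown below. It assumes that only the `path` -> `story_id` field matters, since that is the only part of the request `JsonHelper` reads (and it reads it as a string); a real request from the gateway may carry more fields. `mockRequest` is a hypothetical helper introduced here for illustration.

```scala
// Hypothetical helper: builds a minimal stand-in for the JSON string
// that the Kernel Gateway would normally inject as REQUEST.
// The story id is quoted because JsonHelper reads it with .as[String].
def mockRequest(storyId: Long): String =
  s"""{"path": {"story_id": "$storyId"}}"""

// Local stand-in for the injected variable.
val REQUEST: String = mockRequest(12476597L)
```

With this cell evaluated first, the endpoint cells below can be run directly in the notebook. A cell like this is only meant for local testing.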

In [None]:
// GET /story/:story_id/comments
println(JsonHelper.jsonComments(REQUEST))

In [None]:
// GET /story/:story_id/links
println(JsonHelper.jsonLinks(REQUEST))

In [None]:
// GET /story/:story_id/words
println(JsonHelper.jsonWords(REQUEST))