
Add option to track multiple pages in one stream #2

Merged
merged 2 commits into from Jul 20, 2017

Conversation

@c-w
Contributor

@c-w c-w commented Jul 19, 2017

Resolves #1


@erikschlegel erikschlegel left a comment

The pageIds PR looks good. Perhaps we can create a separate PR to address the enhancement to bifurcate FB posts and comments.


override def fetchFacebookResponse(after: Date): ResponseList[Post] = {
protected def fetchFacebookResponse(after: Date): ResponseList[Post] = {
logDebug(s"Fetching posts for $pageId since $after")
facebook.getPosts(pageId, createReading(after))

What happens to comments that were posted since `after` on posts older than that? Keep in mind that FB comments are the most useful piece of data for trend detection in fortis. Posts can keep receiving comments for a very long time.

Contributor Author

@c-w c-w Jul 20, 2017

Posts and Comments are linked in the Facebook API. As such, you can get a post for date T and then look up its linked comments, which may have been posted at a date after T.
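A minimal sketch of the lookup described above, using simplified stand-in types rather than the facebook4j classes. `Post`, `Comment`, `LinkedComments`, and `commentsSince` are hypothetical names introduced for illustration; in the real receiver the linked comments would come from the Facebook API.

```scala
import java.util.Date

// Hypothetical stand-ins for the Facebook API types; not the facebook4j classes.
final case class Comment(id: String, message: String, createdTime: Date)
final case class Post(id: String, createdTime: Date, comments: Seq[Comment])

object LinkedComments {
  // Returns the comments linked to `post` that were created strictly after `after`,
  // regardless of when the post itself was created.
  def commentsSince(post: Post, after: Date): Seq[Comment] =
    post.comments.filter(_.createdTime.after(after))
}
```

With a post created before T that carries one comment from before T and one from after T, `commentsSince` returns only the later comment, even though the post itself predates T.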

storageLevel: StorageLevel = StorageLevel.MEMORY_ONLY
): ReceiverInputDStream[FacebookPost] = {
createPageStreams(ssc, auth, Set(pageId), fields, pollingSchedule, pollingWorkers, storageLevel)
}
}

We're going to need to expose a createPageCommentsStream utility that focuses on comments newly posted since the previously processed time. This stream should crawl comments via facebook4j's getPostComments: http://facebook4j.github.io/javadoc/facebook4j/api/PostMethods.html#getPostComments(java.lang.String, facebook4j.Reading)
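One way such a stream could track "comments since the previously processed time" is to keep a watermark and advance it after each poll. The sketch below is only an illustration under assumptions: `PageComment`, `CommentPoller`, and the `fetch` function are hypothetical, and `fetch` stands in for an API call such as facebook4j's getPostComments.

```scala
import java.util.Date

// Hypothetical simplified comment type; not the facebook4j Comment class.
final case class PageComment(postId: String, id: String, createdTime: Date)

// Tracks the creation time of the newest comment seen so far, so that each
// poll only emits comments created after the newest one from earlier polls.
// `fetch` is an assumed stand-in for the API call that lists a post's comments.
final class CommentPoller(fetch: String => Seq[PageComment], start: Date) {
  private var lastSeen: Date = start

  def poll(postIds: Set[String]): Seq[PageComment] = {
    val fresh = postIds.toSeq
      .flatMap(fetch)
      .filter(_.createdTime.after(lastSeen))
      .sortBy(_.createdTime.getTime)
    // Advance the watermark only when something new arrived.
    fresh.lastOption.foreach(c => lastSeen = c.createdTime)
    fresh
  }
}
```

Because the comparison is strict, a comment that shares its timestamp with the current watermark would be skipped; a real implementation might also deduplicate by comment id to cover that case.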

Contributor Author

@c-w c-w Jul 20, 2017

Okay. Will do in a follow-up PR, see #3 for tracking.

ssc: StreamingContext,
auth: FacebookAuth,
pageId: String,
pageIds: Set[String],
fields: Set[String] = Set("message", "place", "caption", "from", "name", "comments"),

Let's remove "comments" from the default fields, since it should be populated by createPageCommentsStream.

Contributor Author

@c-w c-w Jul 20, 2017

This is now implemented in #4.

@c-w c-w merged commit 868f99c into master Jul 20, 2017
2 checks passed
@c-w c-w deleted the track-multiple-pages branch Jul 20, 2017

2 participants