Spark Streaming library for reading chat messages from

Spark-Streaming-Twitch: live stream data with Spark Streaming

Twitch-Streamer uses Twitch's Chat and IRC API to stream in messages from specified channels. It is a lightweight wrapper over Spark Streaming and Twitch's Chat IRC data feed. The goal of this project is to use the strengths of Spark Streaming so that others can analyze Twitch's live stream chat rooms.

Full Documentation


Clients (yourself!) interact directly with the Scala classes, so these are the important ones.

Find them here


The nitty gritty internals of the project were written in Java.

Find them here


Getting Started

Using Maven


Using SBT

libraryDependencies += "com.andrewgapic" %% "spark-streaming-twitch" % "1.0.0"


Built With

This project is built with Java 1.8, Scala 2.11.8, and Spark Streaming 2.11.8.


$ git clone
$ cd twitch-streamer/
$ mvn clean install

Note: If you're interested in helping develop the project further, there are plenty of features and optimizations left to do. For example, concurrency could be improved: the entire stream is currently fed through one thread. Other asynchronous calls could potentially run on their own threads as well.


Twitch-Streamer can be used with either Scala or Java, and was built with this notion in mind. Generally, Scala is a better fit since Spark was built in Scala; but, as always, use your favourite language. Twitch-Streamer uses the Builder pattern to construct a new Receiver object; please refer to the scaladocs for a complete listing of mutator methods.


  1. You can obtain the twitch_client_id by registering your application here: Alternatively, you can follow the instructions here:
  2. Your twitch_username can be any string that hasn't joined the IRC chat already. Typically, I just use my actual username.
  3. Your twitch_password can be retrieved here: ttp:// after your application has been registered.
twitch_client_id <clientid>
twitch_username <username>
twitch_password <go here:> It's an oauth: password, not your twitch account password.
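
As a rough illustration of the three settings above, the sketch below parses `key value` lines and checks that the password carries the `oauth:` prefix. The class `TwitchCredentials` and its `parse` method are hypothetical and not part of the library; how the library itself reads these values is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for checking the three settings; not part of the library. */
final class TwitchCredentials {
    private static final String[] REQUIRED_KEYS = {
        "twitch_client_id", "twitch_username", "twitch_password"
    };

    /** Parses "key value" lines and rejects missing keys or a non-OAuth password. */
    static Map<String, String> parse(String text) {
        Map<String, String> values = new HashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Split on the first run of whitespace only, so the value is kept whole.
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Missing value for " + parts[0]);
            }
            values.put(parts[0], parts[1]);
        }
        for (String key : REQUIRED_KEYS) {
            if (!values.containsKey(key)) {
                throw new IllegalArgumentException("Missing " + key);
            }
        }
        if (!values.get("twitch_password").startsWith("oauth:")) {
            throw new IllegalArgumentException("twitch_password must be an oauth: token");
        }
        return values;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        String config = "twitch_client_id abc123\n"
            + "twitch_username someuser\n"
            + "twitch_password oauth:xyz\n";
        System.out.println(parse(config).get("twitch_username"));
    }
}
```

Failing early on a password without the `oauth:` prefix catches the common mistake of pasting the account password instead of the token.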


Twitch-Streamer introduces an abstraction called a Message. It transforms a line of text from Twitch's IRC chat into a Message, which allows clients to get the author of the message, the channel name, and the actual message content.
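
To make the idea concrete, here is a minimal sketch of turning one raw IRC line into author, channel, and content. It assumes the usual Twitch chat line shape, `:<user>!<user>@<user>.tmi.twitch.tv PRIVMSG #<channel> :<text>`. The class `ChatLine` is a hypothetical stand-in, not the library's actual Message class, whose fields and parsing may differ.

```java
import java.util.Optional;

/** Hypothetical stand-in for the library's Message; not its actual class. */
final class ChatLine {
    final String author;
    final String channel;
    final String content;

    ChatLine(String author, String channel, String content) {
        this.author = author;
        this.channel = channel;
        this.content = content;
    }

    /** Parses one raw IRC line; returns empty for anything that is not a chat PRIVMSG. */
    static Optional<ChatLine> parse(String raw) {
        if (raw == null || !raw.startsWith(":")) {
            return Optional.empty();
        }
        String marker = " PRIVMSG #";
        int command = raw.indexOf(marker);
        int bang = raw.indexOf('!');
        // The '!' must sit inside the prefix, before the command.
        if (command < 0 || bang < 0 || bang > command) {
            return Optional.empty();
        }
        int channelStart = command + marker.length();
        int textSeparator = raw.indexOf(" :", channelStart);
        if (textSeparator < 0) {
            return Optional.empty();
        }
        String author = raw.substring(1, bang);
        String channel = raw.substring(channelStart, textSeparator);
        String content = raw.substring(textSeparator + 2);
        return Optional.of(new ChatLine(author, channel, content));
    }

    public static void main(String[] args) {
        ChatLine.parse(":ronni!ronni@ronni.tmi.twitch.tv PRIVMSG #dallas :cool stream")
            .ifPresent(m -> System.out.println(m.author + " in " + m.channel + ": " + m.content));
    }
}
```

Returning an empty `Optional` for non-chat lines (such as server PING lines) lets the caller drop them without special cases.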

Scala API

import com.andrewgapic.spark.streaming.TwitchStreamBuilder
import org.apache.spark.streaming.dstream.ReceiverInputDStream

// ssc is an existing StreamingContext
val gamesSet: Set[String] = Set("League+of+Legends")
val stream: ReceiverInputDStream[Message] = new TwitchStreamBuilder().setGames(gamesSet).build(ssc)

Java API

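import com.andrewgapic.spark.streaming.TwitchStreamBuilder;
import java.util.HashSet;
import java.util.Set;
Set<String> gamesSet = new HashSet<>();
gamesSet.add("League+of+Legends"); // jssc below is an existing JavaStreamingContext
JavaReceiverInputDStream<Message> stream = new TwitchStreamBuilder().setGames(gamesSet).build(jssc);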

More advanced usage (Scala)

Note: spaces in game names must be replaced by a + character. This will be done automatically in future versions.

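import com.andrewgapic.spark.streaming.TwitchStreamBuilder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import scala.concurrent.duration._

val sparkConf = new SparkConf().setAppName("TwitchTest")
val ssc = new StreamingContext(sparkConf, Seconds(2))
val gamesSet: Set[String] = Set("League+of+Legends")
val channelsSet: Set[String] = Set("TSM_Dyrus")
val language: String = "en" // English
val storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
val schedulingInterval: FiniteDuration = 600.seconds // refresh channels every 10 minutes
val stream: ReceiverInputDStream[Message] = new TwitchStreamBuilder()
  .setGames(gamesSet) // mutators for the other values above are listed in the scaladocs
  .build(ssc)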
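
Until the library replaces spaces automatically, a small helper can prepare game names before they are passed to the builder. The sketch below is one way to do it; `GameNames`, `encode`, and `encodeAll` are hypothetical names and not part of the library.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; not part of the library. */
final class GameNames {
    /** Trims the name and replaces each run of whitespace with a single '+'. */
    static String encode(String gameName) {
        return gameName.trim().replaceAll("\\s+", "+");
    }

    /** Encodes every name, keeping insertion order and dropping duplicates. */
    static Set<String> encodeAll(List<String> gameNames) {
        Set<String> encoded = new LinkedHashSet<>();
        for (String name : gameNames) {
            encoded.add(encode(name));
        }
        return encoded;
    }

    public static void main(String[] args) {
        // prints [League+of+Legends, Dota+2]
        System.out.println(encodeAll(Arrays.asList("League of Legends", "Dota 2")));
    }
}
```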


There are two examples in the examples folder; one in Scala (ChannelAndWordsCount), and one in Java (JavaWordsCount). The Scala example does two things: displays the top 15 words by word frequency (ignores stopwords), and displays the top channels by message frequency. The Java example only displays word frequency.
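
The word-frequency part of these examples boils down to counting non-stopwords and keeping the most frequent ones. The sketch below shows that counting logic on a plain list of messages, without Spark, so it is not the code from the examples folder. The class `TopWords` and its tiny stopword list are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Illustrative only: the same idea as the word-count examples, minus Spark. */
final class TopWords {
    // Tiny illustrative stopword list; the list used by the real examples is not shown here.
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("the", "a", "is", "to", "and"));

    /** Returns up to `limit` words, most frequent first; ties are broken alphabetically. */
    static List<String> top(List<String> messages, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String message : messages) {
            for (String word : message.toLowerCase().split("\\W+")) {
                if (!word.isEmpty() && !STOPWORDS.contains(word)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts.entrySet().stream()
            .sorted((a, b) -> {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            })
            .limit(limit)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("GG the game", "gg wp", "the wp GG");
        // prints [gg, wp]
        System.out.println(top(messages, 2));
    }
}
```

In a streaming job the same counting would typically be applied per batch or per window rather than to a fixed list.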

Bugs and Feedback

For bugs, questions, and discussions, please use GitHub Issues.


Copyright 2017, Andrew Gapic.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
