Latest commit 1b76c0c Dec 3, 2019

Tweet Archives Unleashed Toolkit (twut)


An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
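Line-oriented JSON (often called JSON Lines) means one complete JSON object, here one tweet, per line. The following is a minimal sketch of how Spark consumes that format. It uses only Spark's own DataFrame reader, not twut's API. The two sample tweets, the app name "jsonl-sketch", and the field names (id_str, text, user.screen_name, taken from Twitter's v1.1 tweet JSON) are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; inside spark-shell, getOrCreate returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("jsonl-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Two hypothetical tweets, one self-contained JSON object per line,
// as they would appear in a line-oriented archive file.
val lines = Seq(
  """{"id_str":"1","text":"Hello #spark","user":{"screen_name":"alice"}}""",
  """{"id_str":"2","text":"Archiving #tweets","user":{"screen_name":"bob"}}"""
).toDS()

// spark.read.json treats each line as one record and infers the schema,
// including the nested "user" struct.
val tweets = spark.read.json(lines)

tweets.select($"id_str", $"user.screen_name", $"text").show(false)
```

Because each line is parsed independently, Spark can split a large uncompressed archive across partitions without first reading the whole file.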


Getting Started

Until there is a release, you will need to clone the repository, build it, and pass the resulting fat jar to Apache Spark:

$ git clone
$ cd twut
$ mvn clean install
$ /path/to/spark/bin/spark-shell --jars /path/to/twut-0.0.1-SNAPSHOT-fatjar.jar

Spark should start with output similar to the following:

Spark context Web UI available at
Spark context available as 'sc' (master = local[*], app id = local-1575383157031).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-preview

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
Type in expressions to have them evaluated.
Type :help for more information.
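Once the shell is up, analysis is ordinary Spark code typed at the scala> prompt; enter multi-line snippets with :paste, since the Scala 2 REPL does not continue a line that starts with a leading dot. The sketch below counts hashtags using only built-in Spark SQL functions, not twut's own API. It assumes Twitter's v1.1 tweet layout, where entities.hashtags is an array of objects with a text field, and it uses a small hypothetical inline sample so it runs without an archive file.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, desc, explode, lower}

// Inside spark-shell this returns the existing session; standalone, it creates a local one.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample; a real archive would be loaded with
// spark.read.json("/path/to/tweets.jsonl") instead (path is a placeholder).
val lines = Seq(
  """{"id_str":"1","entities":{"hashtags":[{"text":"Spark"},{"text":"twut"}]}}""",
  """{"id_str":"2","entities":{"hashtags":[{"text":"spark"}]}}""",
  """{"id_str":"3","entities":{"hashtags":[]}}"""
).toDS()
val tweets = spark.read.json(lines)

// entities.hashtags is an array of structs; selecting .text yields an array
// of strings, which explode turns into one row per hashtag (empty arrays yield no rows).
val topHashtags = tweets
  .select(explode(col("entities.hashtags.text")).as("hashtag"))
  .groupBy(lower(col("hashtag")).as("hashtag"))
  .count()
  .orderBy(desc("count"))

topHashtags.show(10, false)
```

Lower-casing before grouping treats #Spark and #spark as the same hashtag; drop the lower call if case matters for your analysis.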


Documentation! Or, how do I use this?

Once built or downloaded, you can follow the basic set of recipes and tutorials here.


Licensed under the Apache License, Version 2.0.


This work is primarily supported by the Andrew W. Mellon Foundation. Other financial and in-kind support comes from the Social Sciences and Humanities Research Council, Compute Canada, the Ontario Ministry of Research, Innovation, and Science, York University Libraries, Start Smart Labs, and the Faculty of Arts and David R. Cheriton School of Computer Science at the University of Waterloo.

Any opinions, findings, and conclusions or recommendations expressed are those of the researchers and do not necessarily reflect the views of the sponsors.
