This repository has been archived by the owner. It is now read-only.
A repository for all spark jobs running on fortis

This repository is outdated and was migrated to project-fortis.

project-fortis-spark

A repository for Project Fortis's data processing pipeline, built on Apache Spark.

What's this?

This project contains a Spark Streaming job that ingests data into the Fortis system. Specifically, we:

  1. Ingest data in real time from sources such as Twitter, Facebook, online radio, newspapers, Instagram, and TadaWeb.
  2. Analyze and augment the raw data with intelligence like sentiment analysis, entity extraction, place recognition, or image understanding.
  3. Narrow down the stream of events based on user-defined geo-areas, target keywords and blacklisted terms.
  4. Perform trend detection and aggregate the metrics that back Project Fortis.

At the end of the ingestion pipeline, we publish the events and various aggregations to Cassandra.

Development setup

# set up variables from deployment environment
export HA_PROGRESS_DIR="..."
export APPINSIGHTS_INSTRUMENTATIONKEY="..."
export FORTIS_FEATURE_SERVICE_HOST="..."
export FORTIS_MODELS_DIRECTORY="..."
export FORTIS_CENTRAL_ASSETS_HOST="..."
export FORTIS_SERVICEBUS_NAMESPACE="..."
export FORTIS_SERVICEBUS_CONFIG_QUEUE="..."
export FORTIS_SERVICEBUS_POLICY_NAME="..."
export FORTIS_SERVICEBUS_POLICY_KEY="..."

# compile scala, run tests, build fat jar
export JAVA_OPTS="-Xmx2048M"
sbt assembly

# run on spark
spark-submit --driver-memory 4g target/scala-2.11/project-fortis-spark-assembly-0.0.1.jar
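
Because the setup above depends on a number of environment variables, a pre-flight check can catch a missing value before the build or the Spark job starts. The following is a minimal sketch, not part of the project: the function name check_fortis_env is hypothetical, and it assumes that every variable listed above must be non-empty, which the README does not state explicitly.

```shell
#!/bin/sh
# Variable names copied from the export list above.
# Assumption: all of them are required and must be non-empty.
REQUIRED_VARS="HA_PROGRESS_DIR APPINSIGHTS_INSTRUMENTATIONKEY FORTIS_FEATURE_SERVICE_HOST FORTIS_MODELS_DIRECTORY FORTIS_CENTRAL_ASSETS_HOST FORTIS_SERVICEBUS_NAMESPACE FORTIS_SERVICEBUS_CONFIG_QUEUE FORTIS_SERVICEBUS_POLICY_NAME FORTIS_SERVICEBUS_POLICY_KEY"

# Hypothetical helper: prints each unset or empty variable to stderr
# and returns 1 if any is missing, 0 otherwise.
check_fortis_env() {
  missing=0
  for name in $REQUIRED_VARS; do
    # Indirect lookup; eval is safe here because the names come
    # from the fixed list above, not from user input.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Once the function is defined in the current shell, running `check_fortis_env && sbt assembly` would start the build only when every variable is set.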