Outages processor

Tech stack

Java 1.8
Spark 3.3.0
Kryo serializers

How to run

(make sure to run these with Java 8; check with mvn -v)

Run the tests:

mvn clean test

Build the application package:

mvn clean package

Next, submit your job:

mvn clean package && spark-submit --class org.cannotsay.Main --master 'local[*]' --packages com.databricks:spark-xml_2.12:0.15.0 --conf spark.serializer=org.apache.spark.serializer.KryoSerializer ./target/XMLOutagesSparkReader-1.0-SNAPSHOT.jar

Configurations

Input path

Add your XML input files under the raw zone at: src/java/resources/raw.

Output path

Find your outputs under the trusted zone at: src/java/resources/trusted.

Zones

Raw, where the raw inputs live
Staging, where the processed/in-process data lives
Trusted, where the final/trusted/processed data lives
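The three zones can be pictured as directories under one base path. The sketch below is only an illustration: the raw and trusted paths come from the Configurations section, while the staging directory name and the `Zone` and `ZoneDemo` types are assumptions, not code from this repository.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** The three data zones described above, each mapped to a directory. */
enum Zone {
    RAW("raw"),          // raw inputs (path given in this README)
    STAGING("staging"),  // ASSUMPTION: directory name is not stated in this README
    TRUSTED("trusted");  // final outputs (path given in this README)

    // Base directory taken from the Configurations section.
    private static final Path BASE = Paths.get("src", "java", "resources");

    private final String dirName;

    Zone(String dirName) {
        this.dirName = dirName;
    }

    /** Directory of this zone, relative to the project root. */
    Path path() {
        return BASE.resolve(dirName);
    }
}

class ZoneDemo {
    public static void main(String[] args) {
        // Print each zone next to its directory.
        for (Zone zone : Zone.values()) {
            System.out.println(zone + " -> " + zone.path());
        }
    }
}
```

Keeping the paths in one place like this would make it easier to move a zone later, for example to a different file system.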

About

Use Spark OutagesRawXMLReader to read the XML file
Use Spark OutagesWriter to write contents to the staging area in JSON format.
- JSON is used for its readability; Parquet would be the better choice.
Use Spark OutagesStagingStreamReader to read the JSON contents into a Spark stream.
Process data with OutagesStreamProcessor
Start stream querying (application will keep running in the background)
For each stream batch, call OutageSink to filter the contents and route them to either the business or the customer trusted area.
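The last step above, filtering a batch and routing it to one of two trusted areas, can be sketched in plain Java. This is not the actual OutageSink and it does not use the Spark API: in the application the same idea would run once per Spark micro-batch. The README does not state the routing rule, so the `segment` field and the `OutageRecord`, `RoutedBatch` and `OutageSinkSketch` types are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java stand-in for one outage row; field names are hypothetical. */
class OutageRecord {
    final String id;
    final String segment; // ASSUMPTION: "business" or "customer"; anything else is dropped

    OutageRecord(String id, String segment) {
        this.id = id;
        this.segment = segment;
    }
}

/** Holds the rows routed to each trusted area for one batch. */
class RoutedBatch {
    final List<OutageRecord> business = new ArrayList<>();
    final List<OutageRecord> customer = new ArrayList<>();
}

class OutageSinkSketch {
    /** Filters one batch and routes each remaining row to its trusted area. */
    static RoutedBatch route(List<OutageRecord> batch) {
        RoutedBatch out = new RoutedBatch();
        for (OutageRecord record : batch) {
            if ("business".equals(record.segment)) {
                out.business.add(record);
            } else if ("customer".equals(record.segment)) {
                out.customer.add(record);
            }
            // Rows with any other (or missing) segment are filtered out.
        }
        return out;
    }

    public static void main(String[] args) {
        List<OutageRecord> batch = Arrays.asList(
                new OutageRecord("a", "business"),
                new OutageRecord("b", "customer"),
                new OutageRecord("c", null));
        RoutedBatch routed = route(batch);
        System.out.println("business rows: " + routed.business.size());
        System.out.println("customer rows: " + routed.customer.size());
    }
}
```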

Future improvements

Parse postal codes as a list of elements
Parse locations as a list of elements
Switch the data format from JSON to Parquet for better performance.
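As a rough idea of what parsing postal codes as a list could look like, the sketch below splits a single delimited string into separate elements. It assumes the codes arrive as one string separated by `;`, which this README does not confirm; `PostalCodeParser` and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PostalCodeParser {
    /** Splits a delimited postal-code string into trimmed, non-empty elements. */
    static List<String> parse(String raw) {
        List<String> codes = new ArrayList<>();
        if (raw == null) {
            return codes; // no value: return an empty list rather than null
        }
        // ASSUMPTION: codes are separated by ';' in the source data.
        for (String part : raw.split(";")) {
            String code = part.trim();
            if (!code.isEmpty()) {
                codes.add(code);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(parse("1011AB; 1012CD;;"));
    }
}
```

The same approach would apply to locations, given whatever delimiter the source data actually uses.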
