
NerdonblooR/SparkHyperCubeJoin


Apache Spark

This repository is a fork of Apache Spark. We are adding a new join algorithm, HyperCube join, which optimizes certain multi-way joins so that all tables are joined in a single round of shuffling. The original repository is https://github.com/apache/spark.
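The sketch below illustrates the general idea behind a HyperCube join. It uses the standard HyperCube (also called "shares") partitioning scheme on the triangle query R(a, b) ⋈ S(b, c) ⋈ T(c, a). It is not this repository's implementation. Plain Scala collections stand in for Spark's shuffle, and the triangle query, the `Shares` grid, and the `routeR`/`routeS`/`routeT`/`join` helpers are hypothetical names chosen for the example. Reducers form a pa × pb × pc grid. Each tuple is hashed on the join attributes it has and replicated along the dimension it lacks, so every matching triple meets at exactly one reducer after one shuffle.

```scala
// Hypothetical sketch of HyperCube ("shares") routing for the triangle query
// Q(a, b, c) = R(a, b) JOIN S(b, c) JOIN T(c, a), simulated with plain collections
// instead of a real Spark shuffle.
object HyperCubeSketch {
  // Number of hash buckets ("shares") per join attribute; reducers form a pa x pb x pc grid.
  final case class Shares(pa: Int, pb: Int, pc: Int) {
    require(pa > 0 && pb > 0 && pc > 0, "every share must be positive")
    val numReducers: Int = pa * pb * pc
    // Flatten a grid coordinate (i, j, l) into a single reducer id.
    def id(i: Int, j: Int, l: Int): Int = (i * pb + j) * pc + l
  }

  // Hash a join value into one of n buckets; floorMod keeps the result non-negative.
  private def h(v: Int, n: Int): Int = Math.floorMod(v, n)

  // R(a, b) knows a and b, so it is replicated along the C dimension.
  def routeR(g: Shares, a: Int, b: Int): Seq[Int] =
    (0 until g.pc).map(l => g.id(h(a, g.pa), h(b, g.pb), l))

  // S(b, c) knows b and c, so it is replicated along the A dimension.
  def routeS(g: Shares, b: Int, c: Int): Seq[Int] =
    (0 until g.pa).map(i => g.id(i, h(b, g.pb), h(c, g.pc)))

  // T(c, a) knows c and a, so it is replicated along the B dimension.
  def routeT(g: Shares, c: Int, a: Int): Seq[Int] =
    (0 until g.pb).map(j => g.id(h(a, g.pa), j, h(c, g.pc)))

  // Simulated shuffle: send each tuple to every reducer its route names, grouped by reducer id.
  private def shuffle(rel: Seq[(Int, Int)], route: (Int, Int) => Seq[Int]): Map[Int, Seq[(Int, Int)]] =
    rel
      .flatMap { case (x, y) => route(x, y).map(id => (id, (x, y))) }
      .groupBy(_._1)
      .map { case (id, pairs) => (id, pairs.map(_._2)) }

  // One round: shuffle all three relations, then join locally at each reducer.
  // A result (a, b, c) can only be produced at reducer (h(a), h(b), h(c)), so there are no duplicates.
  def join(g: Shares, r: Seq[(Int, Int)], s: Seq[(Int, Int)], t: Seq[(Int, Int)]): Seq[(Int, Int, Int)] = {
    val rAt = shuffle(r, (a, b) => routeR(g, a, b))
    val sAt = shuffle(s, (b, c) => routeS(g, b, c))
    val tAt = shuffle(t, (c, a) => routeT(g, c, a))
    (0 until g.numReducers).flatMap { id =>
      for {
        (a, b)   <- rAt.getOrElse(id, Seq.empty)
        (b2, c)  <- sAt.getOrElse(id, Seq.empty) if b2 == b
        (c2, a2) <- tAt.getOrElse(id, Seq.empty) if c2 == c && a2 == a
      } yield (a, b, c)
    }
  }
}
```

With this routing, R is copied pc times, S pa times, and T pb times, so the shuffle volume is |R|·pc + |S|·pa + |T|·pb. Choosing the shares to keep that volume low for a given number of reducers is where most of the tuning in a scheme like this would go. The trade-off against a cascade of binary joins is one shuffle with some replication versus several shuffles of intermediate results.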

From the Apache Spark GitHub repository:

Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.