Skip to content

Latest commit

 

History

History
53 lines (40 loc) · 8.52 KB

File metadata and controls

53 lines (40 loc) · 8.52 KB

Apache Spark

  • Apache Spark is an open-source framework & compute engine that is known for its speed, easy-to-use nature in the field of big data processing and analysis.
  • It also has built-in modules for graph processing, machine learning, streaming, SQL etc.
  • The spark execution engine supports in-memory computation and cyclic data flow.
  • It can select the distributed data, create a map and then reduce the values to give proper data.

Spark Stack

⭐ Real world use cases of Apache Spark

Remarks
Zomato - HLD Design Generate recommendations based on continuous streams of user activity, order activity, pricing and promo changes etc.
Uber Driver Allocation Select driver based on continuous streams of driver locations, user location etc.

Features

Feature Remarks
Resilient Distributed Dataset (RDD) RDD is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel.
Directed acyclic graph (DAG) Read more
Accumulator A simpler value of Accumulable where the result type being accumulated is the same as the types of elements being merged, i.e. variables that are only "added" to through an associative operation and can therefore be efficiently supported in parallel. They can be used to implement counters (as in MapReduce) or sums. Spark natively supports accumulators of numeric value types, and programmers can add support for new types.
Spark UI Apache Spark provides a suite of web user interfaces (UIs) that you can use to monitor the status and resource consumption of your Spark cluster.
Spark Core
MLlib/GraphX
Spark Streaming Read more
Spark SQL Read more
Associative Property Associative property is defined as, when more than two numbers are added or multiplied, the result remains the same, irrespective of how they are grouped.

Cluster Manager

Cluster Manager
Standalone cluster manager (spark's own cluster manager)
Hadoop Yarn
Apache Mesos
Kubernates

Read more

Data Sources

Apache Spark vs Others

References