SudhansuTaparia/BigData

Big Data Technologies

This repository collects some of what I have learned about Big Data technologies, especially Spark and GraphX.

SPARK

Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.

https://spark.apache.org/
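To make the DAG idea a little more concrete, here is a minimal plain-Python sketch. It is not Spark's API: `LazyDataset` and its methods are hypothetical names invented for this illustration. It only shows the general pattern of recording transformations first and running them when a result is requested. The recorded plan here is a simple chain, which is the linear special case of a DAG.

```python
from typing import Callable, Iterable, List, Tuple


class LazyDataset:
    """Hypothetical toy class: records transformations, runs them on collect()."""

    def __init__(self, source: Iterable, plan: Tuple = ()):
        self._source = source
        self._plan = plan  # recorded steps, in the order they were requested

    def map(self, fn: Callable) -> "LazyDataset":
        # Nothing is computed here; the step is only appended to the plan.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    def plan(self) -> List[str]:
        return [kind for kind, _ in self._plan]

    def collect(self) -> list:
        # The "action": walk the recorded plan and actually process the data.
        data = iter(self._source)
        for kind, fn in self._plan:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


calls = []


def square(x: int) -> int:
    calls.append(x)  # lets us observe when the function really runs
    return x * x


ds = LazyDataset(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
calls_before_action = list(calls)  # still empty: nothing has executed yet
result = ds.collect()              # squares 1..5, then keeps the odd ones
```

Because execution is deferred, an engine that sees the whole plan up front has room to reorder or combine steps before running them. This toy version does no such optimization.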

GRAPHX

GraphX is Apache Spark's API for graphs and graph-parallel computation. GraphX unifies ETL, exploratory analysis, and iterative graph computation within a single system. You can view the same data as both graphs and collections, transform and join graphs with RDDs efficiently, and write custom iterative graph algorithms using the Pregel API.

https://spark.apache.org/graphx/
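The Pregel model mentioned above can be sketched in plain Python. This is not GraphX code: the function name `pregel_shortest_paths` and the data layout are assumptions made for this illustration, and it assumes non-negative edge weights. It mimics the three pieces a Pregel-style computation is built from: sending messages along edges, merging messages that arrive at the same vertex, and a vertex program that updates vertex state. The example computes single-source shortest paths.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source, destination, weight)


def pregel_shortest_paths(
    vertices: List[str], edges: List[Edge], source: str
) -> Dict[str, float]:
    """Toy Pregel-style loop; assumes non-negative edge weights."""
    # Vertex state: best known distance from the source vertex.
    dist = {v: math.inf for v in vertices}
    dist[source] = 0.0
    active = {source}  # vertices whose state changed in the last superstep

    while active:
        # Send phase: each active vertex offers a distance to its out-neighbours.
        inbox: Dict[str, float] = {}
        for src, dst, weight in edges:
            if src in active:
                candidate = dist[src] + weight
                # Merge phase: keep only the smallest message per destination.
                inbox[dst] = min(inbox.get(dst, math.inf), candidate)

        # Vertex program: adopt a message only if it improves the state.
        active = set()
        for v, msg in inbox.items():
            if msg < dist[v]:
                dist[v] = msg
                active.add(v)
    return dist


vertices = ["A", "B", "C", "D", "E"]
edges = [("A", "B", 1.0), ("A", "C", 4.0), ("B", "C", 2.0), ("C", "D", 1.0)]
distances = pregel_shortest_paths(vertices, edges, "A")
```

The loop stops once a superstep delivers no improving message. Vertex `E` has no incoming edges, so its distance stays infinite.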

SPARK SQL

Spark SQL is Apache Spark's module for working with structured data. Spark SQL lets you query structured data inside Spark programs, using either SQL or a familiar DataFrame API. It is usable from Java, Scala, Python, and R.

https://spark.apache.org/sql/

SPARK STREAMING

Spark Streaming makes it easy to build scalable fault-tolerant streaming applications. Spark Streaming brings Apache Spark's language-integrated API to stream processing, letting you write streaming jobs the same way you write batch jobs. It supports Java, Scala and Python.

https://spark.apache.org/streaming/
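The idea of writing streaming jobs the same way as batch jobs can be sketched in plain Python. This is not the Spark Streaming API: `word_count` and `micro_batches` are hypothetical helpers written for this illustration. The sketch also groups records by count, whereas Spark Streaming groups them by time interval. The point is only that the same batch function is reused on each small batch of the stream.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def word_count(lines: Iterable[str]) -> Counter:
    """Batch logic: count words in a collection of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an incoming stream of lines into small fixed-size batches."""
    batch: List[str] = []
    for line in stream:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


lines = ["spark streaming", "spark sql", "graphx", "spark mllib", "graphx pregel"]

# Batch job: process everything at once.
batch_result = word_count(lines)

# Streaming job: apply the same function to each micro-batch as it arrives.
running = Counter()
per_batch = []
for batch in micro_batches(iter(lines), batch_size=2):
    counts = word_count(batch)
    per_batch.append(counts)
    running.update(counts)
```

After the stream is exhausted, the running total matches the result of the one-shot batch job.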

MLLIB

MLlib is Apache Spark's scalable machine learning library. MLlib fits into Spark's APIs and interoperates with NumPy in Python (as of Spark 0.9) and R libraries (as of Spark 1.5). You can use any Hadoop data source (e.g. HDFS, HBase, or local files), making it easy to plug into Hadoop workflows.

https://spark.apache.org/mllib/

Please go through the PPTs and let me know if you think additional information would help.
