Using hash-table-based indexes to optimise joins in Apache Spark
Spark Data Source package for reading data warehouse exports from Site Catalyst; written for Apache Spark v1.6 and earlier, and also compatible with Spark 2.0 and above.
Apache Spark v2.0.0 application, written in Scala, that maps given latitude/longitude values to the nearest latitude/longitude values in a given set, using broadcast indexes of the available geo-coordinates.
A project with examples of a few commonly used data manipulation, processing, and transformation APIs in Apache Spark 2.0.0
Data pipeline examples using Oozie, Spark, and Hive on the Cloudera VM and on AWS EC2 (branch aws-ec2)
Data Structures and Algorithms for Coding Interviews (Java & Scala)
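The hash-table-based join idea in the first repository above can be illustrated in plain Scala. This is a minimal sketch, not the repository's implementation. It assumes the smaller input fits in memory, builds a hash index from join key to matching rows, and then probes that index once per row of the larger input to produce an inner join. The names `HashJoinSketch`, `buildIndex`, and `hashJoin` are hypothetical.

```scala
object HashJoinSketch {
  // Build a hash index (join key -> all values with that key) over the smaller input.
  def buildIndex[K, V](rows: Seq[(K, V)]): Map[K, Seq[V]] =
    rows.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Probe the index with each row of the larger input.
  // Emits one (key, left, right) row per match; unmatched left rows are dropped (inner join).
  def hashJoin[K, A, B](large: Seq[(K, A)], small: Seq[(K, B)]): Seq[(K, A, B)] = {
    val index = buildIndex(small)
    large.flatMap { case (k, a) =>
      index.getOrElse(k, Seq.empty).map(b => (k, a, b))
    }
  }

  def main(args: Array[String]): Unit = {
    val orders = Seq((1, "order-a"), (2, "order-b"), (1, "order-c"), (3, "order-d"))
    val customers = Seq((1, "alice"), (2, "bob"))
    hashJoin(orders, customers).foreach(println)
  }
}
```

In a Spark job, an index like this would typically be built once on the driver, shipped to executors with `SparkContext.broadcast`, and probed inside `mapPartitions`. This avoids shuffling the larger dataset, at the cost of holding the index in each executor's memory.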
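The nearest-coordinate mapping described in the geo-coordinate repository above can be sketched in plain Scala, assuming a simple grid index. Reference points are bucketed into cells of a fixed size in degrees, and a query is compared by haversine distance against the points in its own cell and the eight neighbouring cells. The grid scheme, the `GridIndex` class, and the `cellDeg` parameter are hypothetical and may differ from what the repository actually broadcasts. The lookup is also approximate: a slightly closer point just outside the 3x3 block of cells can be missed.

```scala
object NearestCoordinateSketch {
  final case class Coord(lat: Double, lon: Double)

  // Great-circle distance in kilometres (haversine formula, mean Earth radius 6371 km).
  def haversineKm(a: Coord, b: Coord): Double = {
    val dLat = math.toRadians(b.lat - a.lat)
    val dLon = math.toRadians(b.lon - a.lon)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(a.lat)) * math.cos(math.toRadians(b.lat)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * 6371.0 * math.asin(math.sqrt(h))
  }

  // Hypothetical grid index: reference coordinates bucketed into cells of `cellDeg` degrees.
  // Serializable so that, in Spark, it could be wrapped in a broadcast variable.
  final class GridIndex(points: Seq[Coord], cellDeg: Double) extends Serializable {
    private def cellOf(c: Coord): (Int, Int) =
      (math.floor(c.lat / cellDeg).toInt, math.floor(c.lon / cellDeg).toInt)

    private val cells: Map[(Int, Int), Seq[Coord]] = points.groupBy(cellOf)

    // Search the query's cell and its 8 neighbours; fall back to a full scan if all are empty.
    def nearest(q: Coord): Option[Coord] = {
      val (r, c) = cellOf(q)
      val nearby = for {
        dr <- -1 to 1
        dc <- -1 to 1
        p <- cells.getOrElse((r + dr, c + dc), Seq.empty)
      } yield p
      val candidates: Seq[Coord] = if (nearby.nonEmpty) nearby else points
      if (candidates.isEmpty) None else Some(candidates.minBy(p => haversineKm(q, p)))
    }
  }

  def main(args: Array[String]): Unit = {
    val refs = Seq(Coord(51.5074, -0.1278), Coord(48.8566, 2.3522), Coord(40.7128, -74.0060))
    val index = new GridIndex(refs, cellDeg = 1.0)
    println(index.nearest(Coord(51.5, -0.1)))
  }
}
```

A smaller `cellDeg` makes each lookup cheaper but increases how often the fallback full scan is needed. An exact variant would keep widening the ring of searched cells until no unsearched cell could hold a closer point.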
54 contributions in the last year
The transform name was hard-coded to UnionAll for the union-all transform, and doesn't pick up the call-site name as the transform name. Adding uni…