Implementation of minimal map reduce sliding aggregation algorithm in pyspark.

GaspardIV/SparkSlidingAggregation

SparkSlidingAggregation

An implementation of the minimal MapReduce sliding aggregation algorithm in PySpark.

Authors of the algorithm: Yufei Tao, Wenqing Lin, Xiaokui Xiao

Links to the paper describing the algorithm:

https://dl.acm.org/doi/10.1145/2463676.2463719

https://www.cse.cuhk.edu.hk/~taoyf/paper/sigmod13-mr.pdf

The input data is the Yellow Taxi Trip Records (CSV) for January 2021 from https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page. For each record, I've computed the average ride distance and the average passenger occupancy over the last 1000 rides. The algorithm is minimal and follows the one from the paper. It uses the Spark RDD API in Python.
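To make the idea concrete, here is a minimal sketch of the partition-wise window computation in plain Python. It is not the code from this repository and does not use Spark: the "machines" are simulated as list slices, and the names `sliding_averages`, `window`, and `num_partitions` are hypothetical. It assumes the records are already ordered. Each window is split into a suffix of the partition where it starts, the totals of any partitions fully inside it, and a prefix of the partition where it ends.

```python
from itertools import accumulate


def sliding_averages(values, window, num_partitions):
    """Return, for each position i, the mean of values[max(0, i-window+1) : i+1]."""
    n = len(values)
    if n == 0:
        return []
    size = -(-n // num_partitions)  # ceiling division: items per simulated machine
    parts = [values[i:i + size] for i in range(0, n, size)]

    # Local work on each simulated machine.
    # prefix[k][j] = sum of the first j items of partition k
    prefix = [[0] + list(accumulate(p)) for p in parts]
    # suffix[k][j] = sum of partition k from local index j to its end
    suffix = [[pre[-1] - x for x in pre] for pre in prefix]
    # One small summary per partition; in a distributed run these would be shared.
    totals = [pre[-1] for pre in prefix]

    result = []
    for i in range(n):
        lo = max(0, i - window + 1)      # global index where the window starts
        k_lo, j_lo = divmod(lo, size)    # partition and local index of the start
        k_i, j_i = divmod(i, size)       # partition and local index of the end
        if k_lo == k_i:
            # Window lies inside a single partition.
            total = prefix[k_i][j_i + 1] - prefix[k_i][j_lo]
        else:
            # Suffix of the first partition + whole middle partitions + prefix of the last.
            total = suffix[k_lo][j_lo] + sum(totals[k_lo + 1:k_i]) + prefix[k_i][j_i + 1]
        result.append(total / (i - lo + 1))
    return result


print(sliding_averages([1, 2, 3, 4, 5, 6], window=3, num_partitions=2))
```

In this sketch the first `window - 1` records are averaged over a shorter window. That is a choice made here for simplicity, not something the description above specifies.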
