This repository provides Scotty, a framework for efficient window aggregations in out-of-order stream processing.
Efficient k-Means in OpenCL
Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing
This repository contains our work on efficient SIMD vectorization methods for hashing in OpenCL. It was first published at the 21st International Conference on Extending Database Technology (EDBT) in March 2018.
I²: Interactive Real-Time Visualization for Streaming Data
2018 Streamline Hackathon Boilerplate
DBPL 2017: The 16th International Symposium on Database Programming Languages (Website)
Computation using data flow graphs for scalable machine learning
Repository for the code and documents of the Streamline hackathon in Munich
IMWA Framework: Code for a framework to calculate impact measures for Wikipedia authors
Mirror of Apache Zeppelin
Implicit Parallelism for Scalable Data Analysis.
Peel bundle for benchmarking an Apache Flink job that includes broadcast DataSets
Experiments for FLINK-2237
Temporary Java prototype to support text generation.
A set of CLI tools for batch management of Scrum infrastructure (GitHub, Trello).
Myriad Parallel Data Generator Toolkit
A database profiler for Myriad Toolkit
A project containing a set of unit tests for the Myriad toolkit.
Slides and Code for the Large Scala Data Mining Tutorial
A parallel implementation of the TPC-H data generator using the Myriad toolkit.
Simple generator of synthetic clickstreams
A parallel generator for the TeraSort benchmark using the Myriad toolkit.
A parallel generator of long-tailed random word sequences using the Myriad toolkit.
Virtual machine for testing provided by the DOPA project.
Synthetic generator of nested sets data modelling package delivery logs.