lakeFS - Data version control for your data lake | Git for data
Updated Jun 2, 2024 - Go
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Open source platform for the machine learning lifecycle
A minimal, responsive and feature-rich Jekyll theme for my technical writing.
A curated list of awesome tools and libraries for specific domains
REST API for Apache Spark on K8S or YARN
The Tech Canvas Experimenters Hub is an interdisciplinary repository for collaborative projects spanning fields such as hardware (e.g., Arduino UNO), financial engineering, machine learning, and natural language processing, along with the mathematical foundations underlying each of them.
DeSQL is an interactive step-through debugging technique for DISC-backed SQL queries. This approach allows users to inspect constituent parts of a query and their corresponding intermediate data interactively, similar to watchpoints in gdb-like debuggers.
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
Fully managed Apache Parquet implementation
Experiment tracking server focused on speed and scalability
The Proxima platform.
📘 FIWARE 306: Real-time Processing of Context Data using Apache Spark
A library to transform Scala product types and schemas from different systems into other schemas. Any implemented type automatically gets methods to convert it into each of the other types and vice versa. For example, a Spark schema can be transformed into a BigQuery table.
Code for "Efficient Data Processing in Spark" Course
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Data transformation framework for ETL processing with SQL-like syntax and GIS extensions, based on Apache Spark
Includes notes on using Apache Spark in general, notes on using Spark for physics, how to run TPC-DS on PySpark, how to create histograms with Spark, tools for CPU performance testing, Jupyter notebook examples for Spark, and examples for Oracle and other database systems.
Free High-Quality Financial Data in Azure
Simple and Distributed Machine Learning
Created by Matei Zaharia
Released May 26, 2014