Datasets are collections of files with the same schema that reside in
storage. kartothek offers a metadata definition to handle these datasets
efficiently. In addition, the kartothek.io module provides building
blocks to create and modify these datasets. I/O, tracking of dataset
partitions, and selection of subsets of data are handled transparently.
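To make the idea of "files with the same schema plus metadata that tracks partitions" concrete, here is a minimal, standard-library-only sketch. It is not kartothek's API or its on-disk format: the function names `store_dataset` and `read_dataset`, the `dataset.json` metadata file, and the use of CSV files are all assumptions chosen to keep the example self-contained.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical schema: every file in the toy dataset shares these columns.
SCHEMA = ["country", "day", "value"]

EXAMPLE_ROWS = [
    {"country": "DE", "day": "2020-01-01", "value": "1"},
    {"country": "DE", "day": "2020-01-02", "value": "2"},
    {"country": "US", "day": "2020-01-01", "value": "3"},
]


def store_dataset(root, rows, partition_on):
    """Write one CSV file per partition value and record the files in metadata."""
    root = Path(root)
    partitions = {}
    for row in rows:
        partitions.setdefault(row[partition_on], []).append(row)

    metadata = {"schema": SCHEMA, "partition_on": partition_on, "partitions": {}}
    for key, part_rows in partitions.items():
        filename = f"{partition_on}={key}.csv"
        with open(root / filename, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=SCHEMA)
            writer.writeheader()
            writer.writerows(part_rows)
        metadata["partitions"][key] = filename

    # The metadata file is the single place that knows which files exist.
    (root / "dataset.json").write_text(json.dumps(metadata, indent=2))
    return metadata


def read_dataset(root, only=None):
    """Read rows back, opening only the partitions named in `only` (all if None)."""
    root = Path(root)
    metadata = json.loads((root / "dataset.json").read_text())
    rows = []
    for key, filename in metadata["partitions"].items():
        if only is not None and key not in only:
            continue  # skipped without opening the file, thanks to the metadata
        with open(root / filename, newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store_dataset(tmp, EXAMPLE_ROWS, partition_on="country")
        print(read_dataset(tmp, only={"DE"}))
```

The point of the sketch is the division of labour: callers name a dataset location and a subset, and the metadata decides which files are touched. CSV stores every value as text, which is why the example rows use string values throughout.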
Installers for the latest released version are available on the Python Package Index (PyPI) and on conda.
    # Install with pip
    pip install kartothek

    # Install with conda
    conda install -c conda-forge kartothek
What is a (real) Kartothek?
A Kartothek (or, in more modern terms, a Zettelkasten or Katalogkasten) is a tool for organizing (high-level) information extracted from a source of information.