34 changes: 17 additions & 17 deletions dataframely/collection.py
@@ -63,7 +63,7 @@ class Collection(BaseCollection, ABC):
represent "semantic objects" which cannot be represented in a single data frame due
to 1-N relationships that are managed in separate data frames.

A collection must only have type annotations for :class:`~dataframely.LazyFrame`s
A collection must only have type annotations for :class:`~dataframely.LazyFrame`
with known schema:

.. code:: python
@@ -786,20 +786,20 @@ def read_parquet(
Parquet files may have been written with Hive partitioning.
validation: The strategy for running validation when reading the data:

- ``"allow"`: The method tries to read the schema data from the parquet
- ``"allow"``: The method tries to read the schema data from the parquet
files. If the stored collection schema matches this collection
schema, the collection is read without validation. If the stored
schema mismatches this schema, no metadata can be found in
the parquets, or the files have conflicting metadata,
this method automatically runs :meth:`validate` with ``cast=True``.
- ``"warn"`: The method behaves similarly to ``"allow"``. However,
- ``"warn"``: The method behaves similarly to ``"allow"``. However,
it prints a warning if validation is necessary.
- ``"forbid"``: The method never runs validation automatically and only
returns if the metadata stores a collection schema that matches
this collection.
- ``"skip"``: The method never runs validation and simply reads the
data, entrusting the user that the schema is valid. _Use this option
carefully_.
data, entrusting the user that the schema is valid. *Use this option
carefully*.

kwargs: Additional keyword arguments passed directly to
:meth:`polars.read_parquet`.
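
As a hedged sketch of how the validation strategies above might be used (the schema, collection, and directory names are hypothetical and not taken from this change):

```python
import dataframely as dy


# Hypothetical schema and collection, for illustration only.
class UserSchema(dy.Schema):
    user_id = dy.UInt64(primary_key=True, nullable=False)
    username = dy.String(nullable=False)


class UserCollection(dy.Collection):
    users: dy.LazyFrame[UserSchema]


# "allow": read without validation if the stored schema matches this
# collection, otherwise fall back to validate(cast=True) automatically.
collection = UserCollection.read_parquet("data/users/", validation="allow")

# "forbid": never validate automatically; only succeeds if the stored
# metadata matches this collection.
collection = UserCollection.read_parquet("data/users/", validation="forbid")
```

With `validation="warn"` the behaviour matches `"allow"` but emits a warning whenever the fallback validation runs, which can be a useful middle ground for pipelines that tolerate re-validation but want visibility into it.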
@@ -849,20 +849,20 @@ def scan_parquet(
Parquet files may have been written with Hive partitioning.
validation: The strategy for running validation when reading the data:

- ``"allow"`: The method tries to read the schema data from the parquet
- ``"allow"``: The method tries to read the schema data from the parquet
files. If the stored collection schema matches this collection
schema, the collection is read without validation. If the stored
schema mismatches this schema, no metadata can be found in
the parquets, or the files have conflicting metadata,
this method automatically runs :meth:`validate` with ``cast=True``.
- ``"warn"`: The method behaves similarly to ``"allow"``. However,
- ``"warn"``: The method behaves similarly to ``"allow"``. However,
it prints a warning if validation is necessary.
- ``"forbid"``: The method never runs validation automatically and only
returns if the metadata stores a collection schema that matches
this collection.
- ``"skip"``: The method never runs validation and simply reads the
data, entrusting the user that the schema is valid. _Use this option
carefully_.
data, entrusting the user that the schema is valid. *Use this option
carefully*.

kwargs: Additional keyword arguments passed directly to
:meth:`polars.scan_parquet` for all members.
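
The lazy counterpart follows the same pattern; a minimal sketch, again with a hypothetical collection and assuming that annotated members are exposed as lazy frames on the returned instance:

```python
import dataframely as dy


class UserSchema(dy.Schema):
    user_id = dy.UInt64(primary_key=True, nullable=False)


class UserCollection(dy.Collection):
    users: dy.LazyFrame[UserSchema]


# "skip" trusts the stored data entirely: nothing is validated and members are
# only scanned (mirroring polars.scan_parquet), so this is cheap but should be
# reserved for data written by a trusted process.
collection = UserCollection.scan_parquet("data/users/", validation="skip")

# Assumption: annotated members are accessible as attributes and stay lazy
# until collected.
users = collection.users.collect()
```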
@@ -947,20 +947,20 @@ def scan_delta(
source: The location or DeltaTable to read from.
validation: The strategy for running validation when reading the data:

- ``"allow"`: The method tries to read the schema data from the parquet
- ``"allow"``: The method tries to read the schema data from the parquet
files. If the stored collection schema matches this collection
schema, the collection is read without validation. If the stored
schema mismatches this schema, no metadata can be found in
the parquets, or the files have conflicting metadata,
this method automatically runs :meth:`validate` with ``cast=True``.
- ``"warn"`: The method behaves similarly to ``"allow"``. However,
- ``"warn"``: The method behaves similarly to ``"allow"``. However,
it prints a warning if validation is necessary.
- ``"forbid"``: The method never runs validation automatically and only
returns if the metadata stores a collection schema that matches
this collection.
- ``"skip"``: The method never runs validation and simply reads the
data, entrusting the user that the schema is valid. _Use this option
carefully_.
data, entrusting the user that the schema is valid. *Use this option
carefully*.

kwargs: Additional keyword arguments passed directly to :meth:`polars.scan_delta`.

@@ -1010,20 +1010,20 @@ def read_delta(
source: The location or DeltaTable to read from.
validation: The strategy for running validation when reading the data:

- ``"allow"`: The method tries to read the schema data from the parquet
- ``"allow"``: The method tries to read the schema data from the parquet
files. If the stored collection schema matches this collection
schema, the collection is read without validation. If the stored
schema mismatches this schema, no metadata can be found in
the parquets, or the files have conflicting metadata,
this method automatically runs :meth:`validate` with ``cast=True``.
- ``"warn"`: The method behaves similarly to ``"allow"``. However,
- ``"warn"``: The method behaves similarly to ``"allow"``. However,
it prints a warning if validation is necessary.
- ``"forbid"``: The method never runs validation automatically and only
returns if the metadata stores a collection schema that matches
this collection.
- ``"skip"``: The method never runs validation and simply reads the
data, entrusting the user that the schema is valid. _Use this option
carefully_.
data, entrusting the user that the schema is valid. *Use this option
carefully*.

kwargs: Additional keyword arguments passed directly to :meth:`polars.read_delta`.
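
Both Delta Lake readers above accept the same `validation` argument; a hedged sketch with a made-up table location and collection:

```python
import dataframely as dy


class UserSchema(dy.Schema):
    user_id = dy.UInt64(primary_key=True, nullable=False)


class UserCollection(dy.Collection):
    users: dy.LazyFrame[UserSchema]


# Lazy scan of a Delta table, warning if the fallback validation has to run.
lazy_collection = UserCollection.scan_delta("path/to/users_delta", validation="warn")

# Eager read of the same table, refusing to fall back to validation.
eager_collection = UserCollection.read_delta("path/to/users_delta", validation="forbid")
```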

7 changes: 7 additions & 0 deletions docs/conf.py
@@ -43,8 +43,15 @@
"sphinx.ext.autodoc",
"sphinx.ext.linkcode",
"sphinxcontrib.apidoc",
"myst_parser",
]

myst_parser_config = {"myst_enable_extensions": ["rst_eval_roles"]}
source_suffix = {
".rst": "restructuredtext",
".txt": "markdown",
".md": "markdown",
}
numpydoc_class_members_toctree = False

apidoc_module_dir = "../dataframely"
50 changes: 26 additions & 24 deletions docs/index.rst → docs/index.md
@@ -1,41 +1,41 @@
Dataframely
============
# Dataframely

Dataframely is a Python package to validate the schema and content of `polars <https://pola.rs/>`_ data frames.
Its purpose is to make data pipelines more robust by ensuring that data meet expectations and more readable by adding schema information to data frame type hints.
Dataframely is a Python package to validate the schema and content of [polars](https://pola.rs/) data frames.
Its purpose is to make data pipelines more robust by ensuring that data meet expectations and more readable by adding
schema information to data frame type hints.

Features
--------
## Features

- Declaratively define schemas as classes with arbitrary inheritance structure
- Specify column-specific validation rules (e.g. nullability, minimum string length, ...)
- Specify cross-column and group validation rules with built-in support for checking the primary key property of a column set
- Specify cross-column and group validation rules with built-in support for checking the primary key property of a
column set
- Specify validation constraints across collections of interdependent data frames
- Validate data frames softly by simply filtering out rows violating rules instead of failing hard
- Introspect validation failure information for run-time failures
- Enhanced type hints for validated data frames allowing users to clearly express expectations about inputs and outputs (i.e., contracts) in data pipelines
- Integrate schemas with external tools (e.g., ``sqlalchemy`` or ``pyarrow``)
- Enhanced type hints for validated data frames allowing users to clearly express expectations about inputs and
outputs (i.e., contracts) in data pipelines (see the sketch after this list)
- Integrate schemas with external tools (e.g., `sqlalchemy` or `pyarrow`)
- Generate test data that comply with a schema or collection of schemas and its validation rules
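
As a small illustration of the type-hint feature above (the function and column names are made up for this sketch):

```python
import dataframely as dy
import polars as pl


class UserSchema(dy.Schema):
    user_id = dy.UInt64(primary_key=True, nullable=False)
    username = dy.String(nullable=False)


def distinct_usernames(users: dy.LazyFrame[UserSchema]) -> pl.LazyFrame:
    # The dy.LazyFrame[UserSchema] annotation documents the contract of this
    # step: callers are expected to pass validated data, and readers can see
    # at a glance which columns (and nullability guarantees) it relies on.
    return users.select(pl.col("username").unique())
```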

Contents
========
## Contents

.. toctree::
:caption: Contents
:maxdepth: 2
```{toctree}
:caption: Contents
:maxdepth: 2

Installation <sites/installation.rst>
Quickstart <sites/quickstart.rst>
Real-world Example <sites/examples/real-world.ipynb>
Features <sites/features/index.rst>
FAQ <sites/faq.rst>
Development Guide <sites/development.rst>
Versioning <sites/versioning.rst>
sites/installation
sites/quickstart
sites/examples/real-world
sites/features/index.md
sites/faq.md
sites/development.md
sites/versioning.md
```

API Documentation
=================
## API Documentation

.. toctree::
```{toctree}
:caption: API Documentation
:maxdepth: 1

@@ -45,3 +45,5 @@ API Documentation
Random Data Generation <_api/dataframely.random>
Failure Information <_api/dataframely.failure>
Schema <_api/dataframely.schema>

```
46 changes: 46 additions & 0 deletions docs/sites/development.md
@@ -0,0 +1,46 @@
# Development

Thanks for deciding to work on `dataframely`!
You can create a development environment with the following steps:

## Environment Installation

```bash
git clone https://github.com/Quantco/dataframely
cd dataframely
pixi install
```

Next make sure to install the package locally and set up pre-commit hooks:

```bash
pixi run postinstall
pixi run pre-commit-install
```

## Running the tests

```bash
pixi run test
```

You can adjust the `tests/` path to run tests in a specific directory or module.

## Documentation

We use [Sphinx](https://www.sphinx-doc.org/en/master/index.html) together
with [MyST](https://myst-parser.readthedocs.io/), and write user documentation in markdown.
If you are not yet familiar with this setup,
the [MyST docs for Sphinx](https://myst-parser.readthedocs.io/en/v0.17.2/sphinx/intro.html) are a good starting point.

When updating the documentation, you can compile a local build of the
documentation and then open it in your web browser using the commands below:

```bash
# Run build
pixi run -e docs postinstall
pixi run docs

# Open documentation
open docs/_build/html/index.html
```
49 changes: 0 additions & 49 deletions docs/sites/development.rst

This file was deleted.

30 changes: 30 additions & 0 deletions docs/sites/faq.md
@@ -0,0 +1,30 @@
# FAQ

Whenever you come across something that surprised you or required some non-trivial
thinking, please add it here.

## How do I define additional unique keys in a `dy.Schema`?

By default, `dataframely` only supports defining a single non-nullable (composite) primary key in `dy.Schema`.
However, in some scenarios it may be useful to define additional unique keys, which may include nullable fields and
which are enforced on top of the primary key.

Consider the following example, which demonstrates two rules: one for validating that a field is entirely unique, and
another for validating that a field, when provided, is unique.

```python
class UserSchema(dy.Schema):
user_id = dy.UInt64(primary_key=True, nullable=False)
username = dy.String(nullable=False)
email = dy.String(nullable=True) # Must be unique, or null.

@dy.rule(group_by=["username"])
def unique_username() -> pl.Expr:
"""Username, a non-nullable field, must be total unique."""
return pl.len() == 1

@dy.rule()
def unique_email_or_null() -> pl.Expr:
"""Email must be unique, if provided."""
return pl.col("email").is_null() | pl.col("email").is_unique()
```
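
To see what these two rule expressions check, here is a plain-polars sketch of their logic on a tiny, made-up data frame (this intentionally bypasses dataframely's validation machinery and only illustrates the expressions):

```python
import polars as pl

df = pl.DataFrame(
    {
        "user_id": [1, 2, 3],
        "username": ["alice", "alice", "bob"],
        "email": [None, None, "bob@example.com"],
    }
)

# unique_username: grouped by username, pl.len() == 1 must hold for each group.
# The "alice" group has two rows, so the expression is False for that group.
username_check = df.group_by("username").agg(ok=(pl.len() == 1))

# unique_email_or_null: null emails pass; non-null emails must not repeat.
# Both null emails pass via is_null(), and "bob@example.com" is unique.
email_check = df.select(ok=pl.col("email").is_null() | pl.col("email").is_unique())
```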
30 changes: 0 additions & 30 deletions docs/sites/faq.rst

This file was deleted.

5 changes: 5 additions & 0 deletions docs/sites/features/index.md
@@ -0,0 +1,5 @@
# Features

```{toctree}
primary-keys
```
7 changes: 0 additions & 7 deletions docs/sites/features/index.rst

This file was deleted.
