16 changes: 16 additions & 0 deletions docs/_api/dataframely.columns.rst
@@ -25,6 +25,14 @@ dataframely.columns.array module
:show-inheritance:
:undoc-members:

dataframely.columns.binary module
---------------------------------

.. automodule:: dataframely.columns.binary
:members:
:show-inheritance:
:undoc-members:

dataframely.columns.bool module
-------------------------------

@@ -33,6 +41,14 @@ dataframely.columns.bool module
:show-inheritance:
:undoc-members:

dataframely.columns.categorical module
--------------------------------------

.. automodule:: dataframely.columns.categorical
:members:
:show-inheritance:
:undoc-members:

dataframely.columns.datetime module
-----------------------------------

8 changes: 8 additions & 0 deletions docs/_api/dataframely.testing.rst
@@ -41,6 +41,14 @@ dataframely.testing.rules module
:show-inheritance:
:undoc-members:

dataframely.testing.storage module
----------------------------------

.. automodule:: dataframely.testing.storage
:members:
:show-inheritance:
:undoc-members:

dataframely.testing.typing module
---------------------------------

3 changes: 2 additions & 1 deletion docs/index.rst
@@ -22,11 +22,12 @@ Contents

.. toctree::
:caption: Contents
:maxdepth: 1
:maxdepth: 2

Installation <sites/installation.rst>
Quickstart <sites/quickstart.rst>
Real-world Example <sites/examples/real-world.ipynb>
Features <sites/features/index.rst>
FAQ <sites/faq.rst>
Development Guide <sites/development.rst>
Versioning <sites/versioning.rst>
7 changes: 7 additions & 0 deletions docs/sites/features/index.rst
@@ -0,0 +1,7 @@
Features
========

.. toctree::
:maxdepth: 1

primary-keys.rst
47 changes: 47 additions & 0 deletions docs/sites/features/primary-keys.rst
@@ -0,0 +1,47 @@
Primary keys
============

Defining primary keys in ``dy.Schema``
--------------------------------------

When working with tabular data, it is often useful to define a `primary key <https://en.wikipedia.org/wiki/Primary_key>`_. A primary key is a set of one or more columns whose combined values form a unique identifier for a record in a table.

Dataframely supports marking columns as part of the primary key when defining a ``dy.Schema`` by setting ``primary_key=True`` on the respective columns.

.. note::

   Primary key columns must not be nullable.

Single primary keys
^^^^^^^^^^^^^^^^^^^

For example, when managing data about users, we might use an ``id`` column to uniquely identify users:

::

    class UserSchema(dy.Schema):
        id = dy.String(primary_key=True)
        name = dy.String()

When we later validate data with this schema, ``dataframely`` checks that the values of the primary key are unique, i.e. there are no two users with the same value of ``id``. Having multiple users with the same ``name`` but different ``id`` would be allowed in this case.

Composite primary keys
^^^^^^^^^^^^^^^^^^^^^^

In another scenario, we might be tracking line items on invoices. We have many invoices, and each invoice may contain any number of line items. To uniquely identify a line item, we need to specify the invoice as well as the line item's position within the invoice. To encode this, we set ``primary_key=True`` on both the ``invoice_id`` and ``item_id`` columns:

::

    class LineItemSchema(dy.Schema):
        invoice_id = dy.Int64(primary_key=True)
        item_id = dy.Int64(primary_key=True)
        price = dy.Decimal()

Validation will now ensure that all pairs of (``invoice_id``, ``item_id``) are unique.
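
To make the composite uniqueness rule concrete, the following is a minimal plain-Python sketch of what such a check amounts to. It is an illustration only and not how ``dataframely`` implements validation; the helper ``duplicated_keys`` and the sample rows are hypothetical.

```python
from collections import Counter


def duplicated_keys(rows, key_columns):
    """Return the primary key tuples that occur more than once in ``rows``.

    Hypothetical helper for illustration; not part of dataframely.
    """
    # Build one tuple per row from the key columns and count occurrences.
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up sample data: the last two rows share (invoice_id, item_id) == (2, 1).
line_items = [
    {"invoice_id": 1, "item_id": 1, "price": "9.99"},
    {"invoice_id": 1, "item_id": 2, "price": "4.50"},
    {"invoice_id": 2, "item_id": 1, "price": "9.99"},
    {"invoice_id": 2, "item_id": 1, "price": "1.00"},
]

print(duplicated_keys(line_items, ["invoice_id", "item_id"]))  # [(2, 1)]
```

Note that ``invoice_id`` on its own repeats in this data without being a violation; only a repeated pair counts as a duplicate.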


Primary keys in ``dy.Collection``
---------------------------------

The central idea behind ``dy.Collection`` is to unify multiple tables relating to the same set of underlying entities.
This is useful because it allows us to write filters (``dy.filter``) that use information from multiple tables to identify whether the underlying entity is valid or not. If any such filters are defined, ``dataframely`` requires the tables in a ``dy.Collection`` to have an overlapping primary key, i.e. there must be at least one column that is part of the primary key in every table.
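
The overlap requirement can be pictured as a set intersection over the primary key columns of each table. The sketch below is plain Python under stated assumptions: the helper ``common_primary_key`` and the ``invoices`` table are hypothetical, and the code does not reflect how ``dataframely`` performs this check internally.

```python
def common_primary_key(primary_keys_by_table):
    """Return the columns that are part of the primary key in every table.

    Hypothetical helper for illustration; not part of dataframely.
    """
    key_sets = [set(columns) for columns in primary_keys_by_table.values()]
    if not key_sets:
        return []
    return sorted(set.intersection(*key_sets))


# Assumed example: a hypothetical ``invoices`` table keyed by ``invoice_id``
# next to the line item table keyed by (``invoice_id``, ``item_id``).
tables = {
    "invoices": ["invoice_id"],
    "line_items": ["invoice_id", "item_id"],
}

common = common_primary_key(tables)
print(common)  # ['invoice_id']
```

An empty result would mean the tables share no primary key column, which is the situation the requirement above rules out once filters are defined.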