# Sets and Lists

Data pipelines often need to group rows together. For example, you might represent a *set of neurons* as a *single row* in order to build a row-oriented analysis pipeline for neuronal populations. `djutils` provides `schema.set` and `schema.list` to do this with minimal boilerplate code.

---

To start, let's create example lookup tables `Neuron` and `Compartment`:

In [1]:
from djutils import Schema

schema = Schema("djutils_tutorials_3")

@schema.lookup
class Neuron:
    definition = """
    neuron_id     : int          # neuron id
    """
    
    contents = [
        [0], [1], [2], [3], [4], [5], [6], [7]
    ]
    
    
@schema.lookup
class Compartment:
    definition = """
    compartment   : varchar(64)  # neuron compartment
    """
    
    contents = [
        ["soma"], ["axon"], ["dendrite"]
    ]

Connecting ewang@at-database.ad.bcm.edu:3306


In [2]:
Neuron()

| neuron_id (neuron id) |
|---|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |


In [3]:
Compartment()

| compartment (neuron compartment) |
|---|
| axon |
| dendrite |
| soma |


### `schema.set`

To create a table that handles sets of rows from `Neuron * Compartment`, decorate a class with `schema.set`:

In [4]:
@schema.set
class NeuronCompartmentSet:
    keys = [Neuron, Compartment]
    name = "ncset"

The `schema.set` decorator creates a `NeuronCompartmentSet` lookup table with the primary key `{name}_id` (set id) and secondary keys `members` (number of members in the set) and `{name}_ts` (timestamp of set creation).

In [5]:
NeuronCompartmentSet.heading

# 
ncset_id             : char(32)                     # ncset
---
members              : int unsigned                 # number of members
ncset_ts=CURRENT_TIMESTAMP : timestamp                    # automatic timestamp

In addition, `NeuronCompartmentSet.Member` and `NeuronCompartmentSet.Note` part tables are created, which store the members of each set, and optional notes, respectively.

In [6]:
NeuronCompartmentSet.Member.heading

# 
ncset_id             : char(32)                     # ncset
neuron_id            : int                          # neuron id
compartment          : varchar(64)                  # neuron compartment
---
ncset_index          : int unsigned                 # set index

In [7]:
NeuronCompartmentSet.Note.heading

# 
ncset_id             : char(32)                     # ncset
note                 : varchar(1024)                # note for set
---
note_ts=CURRENT_TIMESTAMP : timestamp                    # automatic timestamp

---
Use the `fill` method to create sets.

Suppose we want to create a set of the axons of neurons 0 to 2. Here is how to do it:

In [8]:
key = NeuronCompartmentSet.fill(
    'neuron_id >= 0 and neuron_id <= 2 and compartment="axon"'
)

Insert set with 3 keys? [yes, no]:  yes


{'ncset_id': '894a1d3327254d4b9d01ebcd05d75a00'} inserted.


The `fill` method prompts the user for confirmation, hashes the rows to produce a unique key for the set, displays that key, and returns it.

In [9]:
key

{'ncset_id': '894a1d3327254d4b9d01ebcd05d75a00'}
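To make the idea of hashing rows into a key concrete, here is a minimal sketch in plain Python. It does not reproduce the hashing scheme that `djutils` actually uses, which this tutorial does not show. The helper `set_key` is hypothetical, and MD5 is chosen only because its hex digest has 32 characters, matching the `char(32)` type of `ncset_id`.

```python
import hashlib
import json


def set_key(rows):
    """Hypothetical helper: derive an order-independent 32-character id for a set of rows."""
    # Serialize each row with sorted attribute names, then sort the serialized
    # rows so that the order in which members are supplied does not matter.
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.md5("\n".join(canonical).encode("utf-8")).hexdigest()


rows = [
    {"neuron_id": 0, "compartment": "axon"},
    {"neuron_id": 1, "compartment": "axon"},
    {"neuron_id": 2, "compartment": "axon"},
]

key_a = set_key(rows)
key_b = set_key(list(reversed(rows)))  # same members, different order
```

Under this assumed scheme, the same members always map to the same key regardless of order, which is what lets a set be looked up again by its contents.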

We can see that `NeuronCompartmentSet` now has one row, corresponding to the set that we just inserted:

In [10]:
NeuronCompartmentSet()

| ncset_id (ncset) | members (number of members) | ncset_ts (automatic timestamp) |
|---|---|---|
| 894a1d3327254d4b9d01ebcd05d75a00 | 3 | 2023-07-08 11:04:39 |


And we can use the key to examine the members of the set:

In [11]:
NeuronCompartmentSet.Member & key

| ncset_id (ncset) | neuron_id (neuron id) | compartment (neuron compartment) | ncset_index (set index) |
|---|---|---|---|
| 894a1d3327254d4b9d01ebcd05d75a00 | 0 | axon | 0 |
| 894a1d3327254d4b9d01ebcd05d75a00 | 1 | axon | 1 |
| 894a1d3327254d4b9d01ebcd05d75a00 | 2 | axon | 2 |


A safer way to access the set members is the `members` attribute of the restricted table. It returns the same rows as above, but first checks that the number of rows in `NeuronCompartmentSet.Member` matches the `members` value in `NeuronCompartmentSet`:

In [12]:
(NeuronCompartmentSet & key).members

| ncset_id (ncset) | neuron_id (neuron id) | compartment (neuron compartment) | ncset_index (set index) |
|---|---|---|---|
| 894a1d3327254d4b9d01ebcd05d75a00 | 0 | axon | 0 |
| 894a1d3327254d4b9d01ebcd05d75a00 | 1 | axon | 1 |
| 894a1d3327254d4b9d01ebcd05d75a00 | 2 | axon | 2 |
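The kind of consistency check that `members` performs can be sketched in plain Python. This is an illustration, not the `djutils` implementation: `checked_members` is a hypothetical helper, and the rows are ordinary dictionaries rather than database tables.

```python
def checked_members(master_row, member_rows):
    """Hypothetical helper: return member rows only if their count matches `members`."""
    expected = master_row["members"]
    found = len(member_rows)
    if found != expected:
        # A mismatch means the part table is incomplete or has extra rows
        raise ValueError(f"expected {expected} members, found {found}")
    # Return the members ordered by their index within the set
    return sorted(member_rows, key=lambda row: row["ncset_index"])


master = {"ncset_id": "894a1d3327254d4b9d01ebcd05d75a00", "members": 3}
members = [
    {"neuron_id": 2, "compartment": "axon", "ncset_index": 2},
    {"neuron_id": 0, "compartment": "axon", "ncset_index": 0},
    {"neuron_id": 1, "compartment": "axon", "ncset_index": 1},
]

ordered = checked_members(master, members)
```

Restricting the part table directly would silently return whatever rows are present, whereas a check like this fails loudly if a member row is missing.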


We can also view notes for the set:

In [13]:
NeuronCompartmentSet.Note & key

| ncset_id (ncset) | note (note for set) | note_ts (automatic timestamp) |
|---|---|---|


Oops, we forgot to add a note. No worries: we can add notes to existing sets by passing a `note` argument to the `fill` method.

In [14]:
key = NeuronCompartmentSet.fill(
    'neuron_id >= 0 and neuron_id <= 2 and compartment="axon"', 
    note="axons of neurons 0 to 2, inclusive",
)

{'ncset_id': '894a1d3327254d4b9d01ebcd05d75a00'} already exists.
Note for {'ncset_id': '894a1d3327254d4b9d01ebcd05d75a00'} inserted.


Now we have a note for our set:

In [15]:
NeuronCompartmentSet.Note & key

| ncset_id (ncset) | note (note for set) | note_ts (automatic timestamp) |
|---|---|---|
| 894a1d3327254d4b9d01ebcd05d75a00 | axons of neurons 0 to 2, inclusive | 2023-07-08 11:04:46 |


And the set members remain the same:

In [16]:
(NeuronCompartmentSet & key).members

| ncset_id (ncset) | neuron_id (neuron id) | compartment (neuron compartment) | ncset_index (set index) |
|---|---|---|---|
| 894a1d3327254d4b9d01ebcd05d75a00 | 0 | axon | 0 |
| 894a1d3327254d4b9d01ebcd05d75a00 | 1 | axon | 1 |
| 894a1d3327254d4b9d01ebcd05d75a00 | 2 | axon | 2 |


---

For any set that already exists in `NeuronCompartmentSet`, use the `get` method to retrieve the key that corresponds to a restriction.

In [17]:
key = NeuronCompartmentSet.get(
    'neuron_id >= 0 and neuron_id <= 2 and compartment="axon"', 
)
key

{'ncset_id': '894a1d3327254d4b9d01ebcd05d75a00'}

Note that `get` raises a djutils `MissingError` if the set does not exist in the table.

In [18]:
key = NeuronCompartmentSet.get(
    'neuron_id >= 1 and neuron_id <= 3 and compartment="dendrite"', 
)

MissingError: Set does not exist.
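The difference between `fill` (create if absent) and `get` (look up or fail) can be illustrated with a small in-memory model. Everything in this sketch is a stand-in: `InMemorySets` and `get_or_none` are hypothetical, and `MissingError` is defined locally rather than imported, because the tutorial does not show where `djutils` exposes it.

```python
class MissingError(Exception):
    """Local stand-in for the djutils error of the same name."""


class InMemorySets:
    """Hypothetical in-memory model of a set table: maps a set of rows to a key."""

    def __init__(self):
        self._keys = {}

    def fill(self, rows):
        members = frozenset(rows)
        # Reuse the existing key if this exact set was already stored
        if members not in self._keys:
            self._keys[members] = f"set-{len(self._keys)}"
        return self._keys[members]

    def get(self, rows):
        members = frozenset(rows)
        if members not in self._keys:
            raise MissingError("Set does not exist.")
        return self._keys[members]


def get_or_none(table, rows):
    """Return the key for an existing set, or None if the set was never filled."""
    try:
        return table.get(rows)
    except MissingError:
        return None


table = InMemorySets()
axons = [(0, "axon"), (1, "axon"), (2, "axon")]
dendrites = [(1, "dendrite"), (2, "dendrite"), (3, "dendrite")]

key = table.fill(axons)
```

Catching the error, as `get_or_none` does, is one way to test for a set's existence without creating it.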

---

### `schema.list`

Sets are unordered, and the `{name}_index` is simply the sorted order of member primary keys. To group an ordered sequence of rows, we can use the `list` table design.

In [19]:
@schema.list
class NeuronCompartmentList:
    keys = [Neuron, Compartment]
    name = "ncset"

The master and part tables created by `schema.list` are similar to those created by `schema.set`, with one major difference: the `{name}_index` of a list is part of the primary key of the `Member` table and is user-defined, whereas `schema.set` assigns it automatically from the sorted order of the members.
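The two indexing rules can be compared side by side in plain Python. This is only a sketch: the helpers `set_index` and `list_index` are hypothetical, rows are modelled as `(neuron_id, compartment)` tuples, and tuple ordering stands in for the sort order of the primary keys.

```python
def set_index(rows):
    """Sets: the index is the position of each row in sorted primary-key order."""
    return {row: index for index, row in enumerate(sorted(set(rows)))}


def list_index(rows):
    """Lists: the index is the position at which the user supplied each row."""
    return list(enumerate(rows))


rows = [(1, "axon"), (0, "axon"), (2, "axon")]

as_set = set_index(rows)
# {(0, 'axon'): 0, (1, 'axon'): 1, (2, 'axon'): 2}

as_list = list_index(rows)
# [(0, (1, 'axon')), (1, (0, 'axon')), (2, (2, 'axon'))]
```

Reordering `rows` leaves `as_set` unchanged but changes `as_list`, which is why a list needs the index in its primary key.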

In [20]:
NeuronCompartmentList.heading

# 
ncset_id             : char(32)                     # ncset
---
members              : int unsigned                 # number of members
ncset_ts=CURRENT_TIMESTAMP : timestamp                    # automatic timestamp

In [21]:
NeuronCompartmentList.Member.heading

# 
ncset_id             : char(32)                     # ncset
ncset_index          : int unsigned                 # list index
---
neuron_id            : int                          # neuron id
compartment          : varchar(64)                  # neuron compartment

In [22]:
NeuronCompartmentList.Note.heading

# 
ncset_id             : char(32)                     # ncset
note                 : varchar(1024)                # note for list
---
note_ts=CURRENT_TIMESTAMP : timestamp                    # automatic timestamp

To create a list of rows, we pass a list of restrictions to the `fill` method, where each restriction selects a **single row**:

In [23]:
key = NeuronCompartmentList.fill(
    [
        'neuron_id=1 and compartment="axon"',
        'neuron_id=0 and compartment="axon"', 
        'neuron_id=2 and compartment="axon"',
    ],
    note="example list of neuron axons"
)

Insert list with 3 keys? [yes, no]:  yes


{'ncset_id': '073485bf41e3ceb9e0cda0a16906f177'} inserted.
Note for {'ncset_id': '073485bf41e3ceb9e0cda0a16906f177'} inserted.
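The single-row requirement can be sketched as a validation step in plain Python. How `fill` itself reacts to a restriction that matches zero or several rows is not shown in this tutorial, so the `ValueError` below is an assumption, and `resolve_single` and `build_list` are hypothetical helpers that use predicates in place of SQL restrictions.

```python
def resolve_single(rows, predicate):
    """Hypothetical helper: return the one row matching `predicate`, or raise."""
    matches = [row for row in rows if predicate(row)]
    if len(matches) != 1:
        raise ValueError(f"restriction matched {len(matches)} rows, expected exactly 1")
    return matches[0]


def build_list(rows, predicates):
    """Resolve each restriction to a single row, preserving the order given."""
    return [resolve_single(rows, predicate) for predicate in predicates]


# All rows of Neuron * Compartment, modelled as (neuron_id, compartment) tuples
all_rows = [(n, c) for n in range(8) for c in ("soma", "axon", "dendrite")]

ordered_rows = build_list(
    all_rows,
    [
        lambda row: row == (1, "axon"),
        lambda row: row == (0, "axon"),
        lambda row: row == (2, "axon"),
    ],
)
```

A restriction such as `compartment="axon"` alone would match eight rows here, so it could not identify a single list position.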


In [24]:
NeuronCompartmentList()

| ncset_id (ncset) | members (number of members) | ncset_ts (automatic timestamp) |
|---|---|---|
| 073485bf41e3ceb9e0cda0a16906f177 | 3 | 2023-07-08 11:05:01 |


In [25]:
(NeuronCompartmentList & key).members

| ncset_id (ncset) | ncset_index (list index) | neuron_id (neuron id) | compartment (neuron compartment) |
|---|---|---|---|
| 073485bf41e3ceb9e0cda0a16906f177 | 0 | 1 | axon |
| 073485bf41e3ceb9e0cda0a16906f177 | 1 | 0 | axon |
| 073485bf41e3ceb9e0cda0a16906f177 | 2 | 2 | axon |


In [26]:
NeuronCompartmentList.Note & key

| ncset_id (ncset) | note (note for list) | note_ts (automatic timestamp) |
|---|---|---|
| 073485bf41e3ceb9e0cda0a16906f177 | example list of neuron axons | 2023-07-08 11:05:01 |


In [27]:
key = NeuronCompartmentList.get(
    [
        'neuron_id=1 and compartment="axon"',
        'neuron_id=0 and compartment="axon"', 
        'neuron_id=2 and compartment="axon"',
    ]
)
key

{'ncset_id': '073485bf41e3ceb9e0cda0a16906f177'}