### Arches QuerySets tutorial

This tutorial shows the approach `arches-querysets` uses to adapt the Arches data model to the Django ORM and the rest of the Django ecosystem.

#### Test data
We'll use a sample resource model called "Datatype Lookups". To set up the test data, replace the settings module below with the one specifying the database you wish to use for this tutorial. Run `manage.py setup_db` first if necessary. (This step is not included in the notebook out of caution.)

>[!TIP]
> If you're having trouble, make sure your IDE has the Python interpreter for your virtual environment selected.

In [1]:
import os

import django
from django.core.management import call_command

# Simulate manage.py
os.environ["DJANGO_SETTINGS_MODULE"] = "arches_querysets.settings"
os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"
django.setup()

call_command("delete_test_data")
call_command("add_test_data")

Deleting test data...
Finished!
Creating test data...
Finished!


#### Design

The system allows you to request resources and tiles with node values aliased by their node aliases, and with nested tiles nested under the alias for that child nodegroup's grouping node. This reflection of the `TileModel.data` attribute becomes a new attribute `aliased_data`.

This is done by subclassing the `ResourceInstance` and `TileModel` models from core Arches. Because they provide methods to retrieve and query nested tiles, the names of the proxy models are:
- `ResourceInstance` -> `ResourceTileTree`
- `TileModel` -> `TileTree`

> [!NOTE]
> These models do not subclass the proxy models `Resource` and `Tile` from core Arches, because, like those models, these models have their own distinct approach for fetching data.

In TypeScript notation, the nesting of data looks as follows:

```ts
interface ResourceTileTree {
    aliased_data: AliasedData;
    resourceinstanceid: string | null;
    name: string;
    // further resourceinstance fields...
}

interface TileTree {
    aliased_data: AliasedData;
    nodegroup: string;
    parenttile: string | null;
    provisionaledits: object | null;
    resourceinstance: string;
    sortorder: number;
    tileid: string | null;
}

interface AliasedData {
    [node_alias: string]: AliasedNodeData | AliasedNodegroupData;
}

interface AliasedNodeData {
    display_value: string;
    node_value: unknown;
    details: unknown[];
}

type AliasedNodegroupData = TileTree | TileTree[] | null;
```

Of note:
- The key-value pairs on any `AliasedData` representation will be a mixture of single nodes in the current nodegroup and nodegroup aliases for child nodegroups that unpack to nested tiles.
- Cardinality-1 nodegroups will be represented as `null` or a `TileTree`.
- Cardinality-n nodegroups will be represented as a possibly empty array of `TileTree`s.
- Node values are always grouped under the alias of the grouping node for the nodegroup ("nodegroup alias"). That is, no data-collecting nodes will appear *directly* under the tile.
- Node values are represented as an object of three key-value pairs. `display_value` and `details` are read-only.
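
To make the shape concrete, here is a minimal Python sketch that walks a structure shaped like the interfaces above and flattens it into dotted alias paths. It uses plain dicts rather than the real model instances. The helper name `collect_display_values`, the sample data, and the rule for telling a nested tile from a node value (the presence of an `aliased_data` key) are all assumptions made for illustration.

```python
from typing import Any, Dict


def collect_display_values(aliased_data: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten nested aliased data into {dotted.alias.path: display_value}."""
    found: Dict[str, str] = {}
    for alias, value in aliased_data.items():
        path = f"{prefix}{alias}"
        if value is None:
            # Empty cardinality-1 nodegroup: nothing to collect.
            continue
        if isinstance(value, list):
            # Cardinality-n nodegroup: a possibly empty list of tiles.
            for index, tile in enumerate(value):
                found.update(
                    collect_display_values(tile["aliased_data"], f"{path}[{index}].")
                )
        elif "aliased_data" in value:
            # Cardinality-1 nodegroup: a single nested tile.
            found.update(collect_display_values(value["aliased_data"], f"{path}."))
        else:
            # Three-key node value: keep only the display value.
            found[path] = value["display_value"]
    return found


# Hypothetical sample data following the ResourceTileTree shape.
resource = {
    "name": "Resource referencing 42",
    "aliased_data": {
        "datatypes_1": {
            "tileid": None,
            "aliased_data": {
                "number_alias": {"display_value": "42", "node_value": 42, "details": []},
                "datatypes_1_child": None,
            },
        },
        "datatypes_n": [
            {
                "tileid": None,
                "aliased_data": {
                    "number_alias_n": {"display_value": "42", "node_value": 42, "details": []},
                },
            }
        ],
    },
}

flat = collect_display_values(resource["aliased_data"])
```

Here `flat` maps `datatypes_1.number_alias` and `datatypes_n[0].number_alias_n` to their display values, which shows how node values always sit one level below a nodegroup alias.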

#### Entrypoint: get_tiles()

The `ResourceTileTree` and `TileTree` models have [custom model managers](https://docs.djangoproject.com/en/5.2/topics/db/managers/#custom-managers) that encapsulate the custom querying behavior.

They both provide a `get_tiles()` entrypoint. This method allows you to use node aliases as ORM lookups and tells the ORM how to shape the data once it is fetched.

Query for the two test resources. One test resource has node values referencing the number 42 for most nodes; the other has a value of `None` in every node:

In [2]:
from arches_querysets.models import *

ResourceTileTree.get_tiles(graph_slug="datatype_lookups")

<ResourceTileTreeQuerySet [<ResourceTileTree: Datatype Lookups: Resource referencing 42 (4d9cd2e0-1a08-4e8d-9606-58482c1f520a)>, <ResourceTileTree: Datatype Lookups: Resource referencing None (41a0cbc5-7d51-4260-bed6-7ab59417e535)>]>

Filter on node alias values to exclude the resource with only `None` for node values. The node aliases for the test model follow this pattern:
- number_alias
- number_alias_n
- number_alias_child
- number_alias_n_child

In [3]:
from pprint import pprint

resource_with_data = ResourceTileTree.get_tiles(graph_slug="datatype_lookups").exclude(number_alias=None).get()
pprint(resource_with_data.aliased_data)
# This resource has two top nodegroups: "datatypes_1" and "datatypes_n"

AliasedData(datatypes_1=<TileTree: datatypes_1 (025d4b11-4c50-4925-8ab1-8191b6551b13)>,
            datatypes_n=[<TileTree: datatypes_n (8ead52f8-5abd-4f4b-b4a4-b2237d170bdc)>])


Inspect each tile's `aliased_data`:

In [4]:
pprint(resource_with_data.aliased_data.datatypes_1.aliased_data)
pprint(resource_with_data.aliased_data.datatypes_n[0].aliased_data)

AliasedData(concept_alias=<Value: Value object (d8c60bf4-e786-11e6-905a-b756ec83dad5)>,
            geojson_feature_collection_alias=None,
            concept_list_alias=[<Value: Value object (d8c60bf4-e786-11e6-905a-b756ec83dad5)>],
            number_alias=42,
            resource_instance_alias=<ResourceInstance: Datatype Lookups: Resource referencing 42 (4d9cd2e0-1a08-4e8d-9606-58482c1f520a)>,
            resource_instance_list_alias=[<ResourceInstance: Datatype Lookups: Resource referencing 42 (4d9cd2e0-1a08-4e8d-9606-58482c1f520a)>],
            file_list_alias=[{'accepted': True,
                              'altText': {'en': {'direction': 'ltr',
                                                 'value': 'Illustration of '
                                                          'recent '
                                                          'accessibility '
                                                          'improvements'}},
                              'attributio

Notice that some node values are richer than the pure tile representation. For instance, this concept node value is a `Value` instance rather than a `UUID`:

In [5]:
print(resource_with_data.aliased_data.datatypes_1.aliased_data.concept_alias)

Value object (d8c60bf4-e786-11e6-905a-b756ec83dad5)


This happens because the `ConceptDataType` registered by arches-querysets in its `datatypes` directory has a `to_python()` method that is called when materializing the QuerySet. As a project implementer, you can register your own datatypes to provide similar functionality, e.g. to provide a `LocalizedString` class for string values.
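
As a rough illustration of the idea, the sketch below converts a raw localized-string tile value into a small `LocalizedString` wrapper. Everything in it is hypothetical: the class, its methods, and the standalone `to_python()` function are stand-ins for whatever a real datatype would implement, and the registration step is left out because it depends on your project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalizedString:
    """Hypothetical wrapper for a value like {'en': {'value': ..., 'direction': ...}}."""

    translations: dict

    def in_language(self, language: str, default: str = "") -> str:
        # Return the text for one language, or the default if it is missing.
        entry = self.translations.get(language)
        return entry["value"] if entry else default

    def __str__(self) -> str:
        # Fall back to the first available language, if any.
        for entry in self.translations.values():
            return entry["value"]
        return ""


def to_python(tile_value):
    """Assumed hook shape: raw tile value in, richer Python object (or None) out."""
    if tile_value is None:
        return None
    return LocalizedString(translations=tile_value)


value = to_python({"en": {"value": "forty-two", "direction": "ltr"}})
```

With a wrapper like this, calling code can write `value.in_language("en")` instead of indexing into nested dictionaries.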

If you don't need these richer Python representations (perhaps because you only care about the display value), pass `as_representation=True`, as the Django REST Framework (DRF) integration always does, to get the "three-key" object mentioned above:

> [!NOTE]
> Depending on the datatype, getting the representation instead of the Python object may be more or less performant.

In [6]:
resource_as_repr = ResourceTileTree.get_tiles(
    graph_slug="datatype_lookups", as_representation=True
).exclude(number_alias=None).get()

print(resource_as_repr.aliased_data.datatypes_1.aliased_data.concept_alias)

{'node_value': 'd8c60bf4-e786-11e6-905a-b756ec83dad5', 'display_value': 'Arches', 'details': [{'concept_id': '00000000-0000-0000-0000-000000000001', 'language_id': 'en', 'value': 'Arches', 'valueid': 'd8c60bf4-e786-11e6-905a-b756ec83dad5', 'valuetype_id': 'prefLabel'}]}


#### Filtering

`.get_tiles()` fetches node aliases for the requested graph, prepares ORM expressions for each one, and passes the expressions to [`QuerySet.alias()`](https://docs.djangoproject.com/en/5.2/ref/models/querysets/#alias) so that the node alias can be used like any standard Django field name.

>[!NOTE]
> `.alias()` is like `.annotate()`, but lazier: if you don't actually *use* the alias in a filter, order by, or aggregate, Django just drops the alias entirely instead of including it in the SQL.

>[!NOTE]
> Fetching node aliases for a graph is the only database query done inside `.get_tiles()`. If your code calls `.get_tiles()` multiple times, for better performance you can provide a QuerySet of nodes yourself via the `nodes` argument.

Filter on the `string_alias` node, using [key and path transforms](https://docs.djangoproject.com/en/5.2/topics/db/queries/#key-index-and-path-transforms) to interrogate JSON values:

In [7]:
test_resources = ResourceTileTree.get_tiles("datatype_lookups")
resource = test_resources.filter(string_alias__en__value__startswith="forty").first()
resource.aliased_data.datatypes_1.aliased_data.string_alias

{'en': {'value': 'forty-two', 'direction': 'ltr'}}

But what if you want a `startswith` search on a value in *any* language? arches-querysets registers additional datatype-specific lookups, such as `any_lang_startswith` for localized strings, that can be used with the Django `__` syntax for joining lookups:

In [8]:
resource = test_resources.filter(string_alias__any_lang_startswith="forty").first()
resource.aliased_data.datatypes_1.aliased_data.string_alias

{'en': {'value': 'forty-two', 'direction': 'ltr'}}
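
To picture what an "any language" match means, here is a pure-Python model of the semantics the lookup name suggests. It is only an illustration: the real lookups are compiled to SQL by the ORM, the function below is hypothetical, the `ignore_case` flag stands in for the `i`-prefixed variant, and the Spanish translation is invented sample data.

```python
from typing import Optional


def any_lang_startswith(localized: Optional[dict], prefix: str, *, ignore_case: bool = False) -> bool:
    """Return True if the value in any language starts with `prefix`."""
    if not localized:
        return False
    for entry in localized.values():
        text = entry.get("value") or ""
        if ignore_case:
            if text.lower().startswith(prefix.lower()):
                return True
        elif text.startswith(prefix):
            return True
    return False


string_value = {
    "en": {"value": "forty-two", "direction": "ltr"},
    "es": {"value": "cuarenta y dos", "direction": "ltr"},  # invented for the example
}

matches_spanish = any_lang_startswith(string_value, "cuarenta")
matches_case_sensitive = any_lang_startswith(string_value, "Forty")
matches_case_insensitive = any_lang_startswith(string_value, "Forty", ignore_case=True)
```

By contrast, the key-transform filter `string_alias__en__value__startswith` only ever inspects the `en` entry.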

The following custom lookups are available as of 96b8e802d367b53c786e2bc02fdd121835b5e008:


String datatype (cardinality 1 & cardinality N):
- `any_lang_contains`
- `any_lang_icontains`
- `any_lang_startswith`
- `any_lang_istartswith`

Cardinality-N non-localized-string:
- `any_contains`
- `any_icontains`

Resource Instance datatype:
- `id`

Cardinality-N resource instance:
- `ids_contain`

Resource Instance list datatype:
- `contains`

Cardinality-N resource instance list:
- `ids_contain`

>[!NOTE]
> Custom lookups are defined in `lookups.py` and tested in `test_lookups.py`.

For resources, *shallow* queries on nodes for nested tiles are possible without needing to specify the parents:

In [9]:
resource = test_resources.filter(non_localized_string_alias_child__isnull=False).first()
child_tile = resource.aliased_data.datatypes_1.aliased_data.datatypes_1_child
print(child_tile.aliased_data.non_localized_string_alias_child)

child-1-value


From there, you can backtrack to the parent:

In [10]:
child_tile.parent

<TileTree: datatypes_1 (025d4b11-4c50-4925-8ab1-8191b6551b13)>

#### Updating
You can save back any value accepted by the datatype's `transform_value_for_tile()`.

This example uses `.save(force_admin=True)` because without it, `save()` falls back to the anonymous user, which lacks Resource Editor permissions. (The edits might seem to "go missing", but they actually end up in provisional edits.) In a request/response cycle, provide `user=request.user`. (The DRF API handles all this for you.)

In [11]:
resource.aliased_data.datatypes_1.aliased_data.string_alias = 'forty-three'
resource.save(force_admin=True)
resource.aliased_data.datatypes_1.aliased_data.string_alias

{'en': {'value': 'forty-three', 'direction': 'ltr'}}

In [12]:
resource.aliased_data.datatypes_1.aliased_data.string_alias = {'en': {'value': 'forty-four', 'direction': 'ltr'}}
resource.save(force_admin=True)
resource.aliased_data.datatypes_1.aliased_data.string_alias

{'en': {'value': 'forty-four', 'direction': 'ltr'}}

In [13]:
resource.aliased_data.datatypes_1.aliased_data.string_alias = {
    "display_value": "",  # ignored
    "node_value": {'en': {'value': 'forty-five', 'direction': 'ltr'}},
    "details": [],
}
resource.save(force_admin=True)
resource.aliased_data.datatypes_1.aliased_data.string_alias

{'en': {'value': 'forty-five', 'direction': 'ltr'}}

Those three assignments use equivalent input shapes for the same kind of value. This one fails, though:

In [16]:
resource.aliased_data.datatypes_1.aliased_data.string_alias = object()
# uncomment to see failure
# resource.save(force_admin=True)
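
The accepted input shapes can be pictured with a small normalizer. This is a hypothetical stand-in, not the library's `transform_value_for_tile()`; it assumes a default language of `en` and a direction of `ltr`, and it raises `TypeError` for anything it does not recognize.

```python
def normalize_string_value(value, language="en", direction="ltr"):
    """Coerce the three input shapes shown above into a localized-string dict."""
    if isinstance(value, str):
        # Bare string: wrap it under the assumed default language.
        return {language: {"value": value, "direction": direction}}
    if isinstance(value, dict):
        if "node_value" in value:
            # Three-key object: only node_value is writable, so unwrap and retry.
            return normalize_string_value(value["node_value"], language, direction)
        # Already a localized-string dict: pass it through unchanged.
        return value
    raise TypeError(f"Unsupported string value: {value!r}")


expected = {"en": {"value": "forty-five", "direction": "ltr"}}
from_string = normalize_string_value("forty-five")
from_dict = normalize_string_value(expected)
from_three_key = normalize_string_value(
    {"display_value": "", "node_value": expected, "details": []}
)

# An arbitrary object is rejected, mirroring the failing example above.
try:
    normalize_string_value(object())
    rejected = False
except TypeError:
    rejected = True
```

All three conversions yield the same dictionary, which is why the assignments above are interchangeable.
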

The `save()` method used here is a drop-in replacement for the `save()` methods on `Resource` and `Tile`, so it performs side effects like indexing and writing to the edit log.

In [17]:
from arches.app.models.models import EditLog
EditLog.objects.filter(resourceinstanceid=resource.pk).count()

6

#### Tailoring the results

As in ordinary Django usage, you can avoid the overhead of instantiating model instances (and fetching graph objects to aid with display value calculations) by requesting just `.values()`:

In [42]:
results = TileTree.objects.get_tiles(
    "datatype_lookups", nodegroup_alias="datatypes_n"
).values("number_alias_n", "concept_list_alias_n")
for result in results:
    print(result)

{'number_alias_n': 42.0, 'concept_list_alias_n': ['d8c60bf4-e786-11e6-905a-b756ec83dad5']}
{'number_alias_n': None, 'concept_list_alias_n': None}


Aggregate functions from Django are available:

In [43]:
from django.db.models import Max
TileTree.objects.get_tiles("datatype_lookups", "datatypes_1").aggregate(my_max=Max("number_alias"))

{'my_max': 42.0}

You can limit the depth of the child tiles that are fetched. The default is 20.

In [44]:
resource = ResourceTileTree.objects.get_tiles("datatype_lookups", depth=0).filter(number_alias=42).get()
print(resource.aliased_data.datatypes_1.aliased_data.datatypes_1_child)

resource = ResourceTileTree.objects.get_tiles("datatype_lookups").filter(number_alias=42).get()
resource.aliased_data.datatypes_1.aliased_data.datatypes_1_child


None


<TileTree: datatypes_1_child (fa193373-e971-4f1e-8adc-d92e6ea8fe85)>
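
The effect of `depth` can be modeled with a short sketch that nests flat tile records under their parents and stops descending after a given number of levels. It does not show how the library fetches tiles; the function `build_tile_tree`, the `children` key, and the sample tile ids are hypothetical. Treating `depth=0` as "top-level tiles only" is an assumption chosen to match the output above.

```python
from collections import defaultdict


def build_tile_tree(flat_tiles, depth=20):
    """Nest flat tile dicts under their parents, descending at most `depth` levels."""
    children_by_parent = defaultdict(list)
    for tile in flat_tiles:
        children_by_parent[tile["parenttile"]].append(tile)

    def attach(tile, remaining):
        node = {"tileid": tile["tileid"], "children": []}
        if remaining > 0:
            # Only descend while there is depth budget left.
            node["children"] = [
                attach(child, remaining - 1)
                for child in children_by_parent[tile["tileid"]]
            ]
        return node

    # Top-level tiles are the ones without a parent.
    return [attach(tile, depth) for tile in children_by_parent[None]]


tiles = [
    {"tileid": "parent", "parenttile": None},
    {"tileid": "child", "parenttile": "parent"},
    {"tileid": "grandchild", "parenttile": "child"},
]

shallow = build_tile_tree(tiles, depth=0)
one_level = build_tile_tree(tiles, depth=1)
full = build_tile_tree(tiles)
```

With `depth=0` the parent has no children attached, with `depth=1` the child appears but the grandchild does not, and with the default the whole chain is nested.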

#### API Integration

