10875 Includes ontology property in search strings #10876

whatisgalen · 2024-05-05T07:09:51Z

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Description of Change

(Quality-of-life improvement) Consolidates redundant logic into a base datatype method, get_nodevalue_as_list(), which the resource-instance and concept datatypes now call.
Includes ontologyProperty and inverseOntologyProperty (along with the tile's nodegroupid and provisional flag) in the document "strings", so that if a search term matches the text value of a relationship, it shows up in the suggested terms and can be used as a term query.
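A minimal sketch of the two changes above, written as standalone functions rather than as the PR's actual datatype methods. It assumes a resource-instance node value is a list of dicts that may carry ontologyProperty and inverseOntologyProperty keys, and that each entry in the document's strings list is a dict with string, nodegroup_id, and provisional keys. The names get_nodevalues and append_relationship_strings are hypothetical.

```python
from typing import Any, Dict, List


def get_nodevalues(nodevalue: Any) -> List[Any]:
    """Normalize a tile node value to a list (hypothetical helper).

    None becomes [], a list is returned unchanged, and any other single
    value is wrapped in a one-element list.
    """
    if nodevalue is None:
        return []
    if isinstance(nodevalue, list):
        return nodevalue
    return [nodevalue]


def append_relationship_strings(
    document: Dict[str, Any],
    nodevalue: Any,
    nodegroup_id: str,
    provisional: bool,
) -> None:
    """Append each non-empty relationship property to document["strings"].

    Assumption: every related-resource entry is a dict that may hold
    "ontologyProperty" and "inverseOntologyProperty" keys.
    """
    strings = document.setdefault("strings", [])
    for related in get_nodevalues(nodevalue):
        for key in ("ontologyProperty", "inverseOntologyProperty"):
            value = related.get(key)
            if value:  # skip missing or empty relationship values
                strings.append(
                    {
                        "string": value,
                        "nodegroup_id": nodegroup_id,
                        "provisional": provisional,
                    }
                )
```

Empty or missing relationship values are skipped, so a blank relationship in the tile adds nothing to the index.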

Why does this matter?

Resource-instance datatype tiledata isn't searchable in the term search except via the displayname of a related resource instance. The ontology property (relationship) is arguably as important as, if not more important than, the displayname of related instances. Given that a resource's relationships currently aren't exposed to search at all, adding them to term search as searchable strings seemed a logical start.

Issues Solved

#10875

Checklist

Unit tests pass locally with my changes
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Ticket Background

Sponsored by: @scholiumtech
Found by: @whatisgalen
Tested by: @
Designed by: @whatisgalen

Further comments

Note that #10898 should be merged first as this PR has cherry-picked commits from that PR


chiatt

Since the option to describe relationships with concepts was introduced, most Arches instances likely rely on concept labels rather than ontology properties, because concepts can be more descriptive and are human-readable. This is true even for Arches for Science, which has semantic models and once used ontology relationships but has since migrated to concept relationships.

I mention this because this PR only supports ontology properties, but it probably should support concept relationships as well.

arches/app/datatypes/base.py

whatisgalen · 2024-05-08T20:21:34Z

Replying to @chiatt's comment above about concept relationships:

I had forgotten that relationships could be defined from collections. I tried to test it out, but it didn't work (#10894). I think we should absolutely include those values in the document strings, though.
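Indexing concept-based relationships means telling them apart from ontology-based ones before anything is written to the strings. A rough sketch, assuming that a relationship chosen from a collection is stored as a concept value UUID while an ontology relationship is stored as a property URI or name. The lookup_label callable is a hypothetical stand-in for whatever preflabel lookup is used.

```python
import uuid
from typing import Callable, Optional


def relationship_to_string(
    value: str,
    lookup_label: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the text to index for a relationship value (hypothetical helper).

    Assumption: concept-based relationships store a value UUID, while
    ontology-based ones store a property URI or name. UUIDs are resolved
    through lookup_label; anything else is indexed verbatim.
    """
    if not value:
        return None
    try:
        uuid.UUID(value)
    except ValueError:
        return value  # not a UUID: treat it as an ontology property string
    return lookup_label(value)  # may be None if no label is found
```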


whatisgalen · 2024-05-09T00:41:20Z

Replying again to @chiatt's comment about concept relationships:

One approach to the lookup needed to index concept value labels as strings (rather than their ids) is to leverage the ConceptDataType.get_pref_label method. To access that from the ResourceInstanceDataType class, PR #10898 would need to be merged.
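The cross-datatype access that #10898 proposed amounts to a factory that hands a reference to itself to every datatype it builds, so one datatype can fetch another, as in self.datatype_factory.get_instance('concept') in the diff further down. The toy below only illustrates that pattern: the class bodies, the datatype_factory keyword, the registry dict, and the per-name caching are assumptions, not the code from #10898.

```python
from typing import Dict, Optional, Type


class BaseDataType:
    """Hypothetical base class that keeps a reference to its factory."""

    def __init__(self, datatype_factory: Optional["DataTypeFactory"] = None):
        self.datatype_factory = datatype_factory


class ConceptDataType(BaseDataType):
    def get_pref_label(self, valueid: str) -> str:
        # Stand-in for the real lookup against the concept model.
        return f"label for {valueid}"


class ResourceInstanceDataType(BaseDataType):
    def relationship_label(self, valueid: str) -> str:
        # Reach a sibling datatype through the shared factory.
        concept = self.datatype_factory.get_instance("concept")
        return concept.get_pref_label(valueid)


class DataTypeFactory:
    """Hypothetical factory that caches one instance per datatype name."""

    def __init__(self, registry: Dict[str, Type[BaseDataType]]):
        self._registry = registry
        self._instances: Dict[str, BaseDataType] = {}

    def get_instance(self, name: str) -> BaseDataType:
        if name not in self._instances:
            # Pass the factory in so the new instance can reach its siblings.
            self._instances[name] = self._registry[name](datatype_factory=self)
        return self._instances[name]
```

The thread ultimately dropped this indirection in favor of calling the concept model function directly (commit 784583a), since the datatype method only wrapped that function.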


whatisgalen · 2024-05-09T07:20:09Z

I realized that the concept datatype actually just calls a method from the concept model here. While we could have the ResourceInstanceDataType do the same thing, I think it's better to go through the concept datatype as an intermediary anyway.


philtweir · 2024-05-23T09:35:17Z

This looks good to me - this might be a bit of a daft question, but would you have an example of a motivating use-case -- I wasn't sure if I was missing something, since the ontology property is defined on the model, but would this be, for example, if I wanted to find all monuments with a child monument present?

chiatt · 2024-05-23T19:08:12Z

arches/app/datatypes/datatypes.py

+        concept_datatype_instance = self.datatype_factory.get_instance('concept')
+        concept_preflabel = concept_datatype_instance.get_pref_label(relationship_valueid)


This will always return the preflabel in en-US because that's the default. You'll need to get the app's active language using something like this:

    from django.utils.translation import get_language

    lang = get_language()

All the datatype is doing in get_pref_label is calling a concept method. You could just call that same method and then you don't need the datatype:

    from arches.app.models.concept import get_preflabel_from_valueid

    get_preflabel_from_valueid(relationship_valueid, lang)['value']
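Combining the two suggestions above, a hedged sketch of a lookup that uses the active language and tolerates a missing result (the concern later addressed by commit e26d058). To keep it runnable without Django or Arches, both functions are passed in as parameters; in Arches they would presumably be django.utils.translation.get_language and arches.app.models.concept.get_preflabel_from_valueid, and the dict-with-"value" return shape is assumed from the snippet above.

```python
from typing import Any, Callable, Dict, Optional


def resolve_relationship_label(
    valueid: str,
    get_language: Callable[[], str],
    get_preflabel_from_valueid: Callable[[str, str], Optional[Dict[str, Any]]],
) -> Optional[str]:
    """Return the preferred label of valueid in the active language, or None.

    Both callables are injected (hypothetically) so the sketch runs without
    Django or Arches installed.
    """
    lang = get_language()  # e.g. the language of the current request
    result = get_preflabel_from_valueid(valueid, lang)
    if not result:
        return None  # no preflabel found for this value and language
    return result.get("value") or None  # treat an empty label as missing
```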

whatisgalen

Good catch on the language, fixed in commit 64690bf.

Replying to @chiatt's point that get_pref_label only calls a concept method:

Yeah, I was reflecting on that for a while. To be honest, it still feels ambiguous to me whether to call the concept model's method or the concept datatype's method (which I recognize calls the former). Calling the model directly from a different datatype feels like an end-run around the concept datatype, as if the datatype ought to be the intermediary/authority for getting a preflabel from a valueid, but that case is hard to justify given that it simply calls the model method.

@chiatt I guess, all else being equal, do you know whether the v8 migration from concepts to the reference datatype would favor calling the model or the datatype?

chiatt

I think it's better to just use the concept method rather than the concept datatype. If the resource instance datatype needed a really complex method from the concept datatype, then it might be different, because it could require repeating a lot of code. However, in this case the Concept model has exactly what you need without writing anything new. Because of that, using the datatype isn't any DRYer than just using the Concept method. Instead, it adds an extra layer of code between the resource-instance datatype and the Concept method.

When we move over to using controlled lists in v8, things should be pretty similar. The ControlledList model could just have a method that returns the prefLabel. The datatype doesn't actually need a 'get_pref_label' method, because the tile actually stores the label in its data. So when we start using controlled list values to describe relationships, we could also use the ControlledList model the way you use the Concept model.

whatisgalen

@chiatt changes pushed to latest.

whatisgalen · 2024-05-23T19:46:01Z

Replying to @philtweir's question above about a motivating use-case:

@philtweir So if the relationship type (present in resource-instance node tile data) matters to your search, this PR lets you search for instances of, say, the "Historic Resource" resource model and include "is district contributor" as a search term. If your resource-instance tile data has such a relationship, you're now filtering on relationships straight from term search rather than through advanced search. Combined with my other PR #10871, this would actually filter the results not just to a string match of "is district contributor" but to a nodegroup-based field match (node values for the "Related Entities" node must contain the relationship "is district contributor").

Of course, if your graphs are meticulously semantically modeled, you presumably already know which relationship types populate which resource-instance nodes. However, if your graphs are "flexibly" semantically modeled, i.e. the relationship type isn't pre-populated for resource-instance node values and there may be a range of possible values for related-resource-instance.relationshiptype (either from a collection or from a list of ontology relationships), then adding that relationship type to the Elasticsearch document for the resource instance makes a previously unsearchable property of resource-instance node tiledata searchable.
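The difference between a plain string match and the nodegroup-scoped match described above can be sketched as Elasticsearch query DSL built in Python. This assumes strings is mapped as a nested field with string and nodegroup_id subfields, matching the entries this PR appends; relationship_term_query is a hypothetical helper, not the query Arches or #10871 actually generates.

```python
from typing import Any, Dict, List, Optional


def relationship_term_query(term: str, nodegroup_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a nested query matching term against indexed document strings.

    Without nodegroup_id this is a plain phrase match; with it, the phrase
    must occur in a string entry indexed from that nodegroup.
    """
    must: List[Dict[str, Any]] = [{"match_phrase": {"strings.string": term}}]
    if nodegroup_id is not None:
        must.append({"term": {"strings.nodegroup_id": nodegroup_id}})
    return {"nested": {"path": "strings", "query": {"bool": {"must": must}}}}
```

Because both clauses sit inside a single nested query, Elasticsearch requires them to match the same strings entry, which is what makes the scoped form stricter than matching the phrase anywhere in the document.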


whatisgalen added 2 commits May 5, 2024 16:07

renames get_id_list method in r-i dt to more verbose get_related_resource_instance_list, re #10875 (60de0c5)

implements get_search_terms for r-i datatype to include ontology properties, re #10875 (ec80a1e)

whatisgalen marked this pull request as draft May 5, 2024 07:09

whatisgalen added 7 commits May 5, 2024 16:18

creates base datatype method get_nodevalue_as_list, re #10875 (25bb789)

concept dt calls base get_nodevalue_as_list method in append_to_doc, re #10875 (ff3c68d)

r-i dt calls base method get_nodevalue_as_list, re #10875 (6e88e6e)

cover edge case of nodevalue=None, re #10875 (b5d06ae)

commits ontology meta from r-i dt to document strings, re #10875 (3d2f6e1)

rm get_search_terms implement from r-i dt, not necessary, re #10875 (2c763e5)

nit, re #10875 (9a56cde)

whatisgalen changed the title from "10875 ri ontology indexed" to "10875 Includes ontology property in search strings on Resource" May 5, 2024

whatisgalen marked this pull request as ready for review May 6, 2024 23:44

rm and not in doc strings; impossible str lookup against list of dicts, re #10875 (fdff09a)

whatisgalen requested a review from apeters May 7, 2024 18:29

whatisgalen assigned apeters May 7, 2024

chiatt requested changes May 8, 2024

arches/app/datatypes/base.py (outdated, resolved)

whatisgalen added 3 commits May 8, 2024 17:14

renames get_nodevalues_as_list -> get_nodevalues, re #10875 (433adc9)

includes the instantiating datatype factory as class property on BaseDataType.__init__, re #10897 (f6a4d40)

sends DataTypeFactory instance as kwarg into instantiation of datatype class, re #10897 (a9a2f93)

includes kwarg datatype_instance on BaseConceptDataType __init__ override, re #10897 (e07830d)

whatisgalen mentioned this pull request May 9, 2024

Individual datatype instances from DataTypeFactory cannot access each other #10897

Closed

whatisgalen added 5 commits May 8, 2024 18:00

correct typo of kwarg in baseConceptDatatype, re #10897 (d79f1b9)

includes datatypefactory kwarg in datatype-specific __init__ overrides, re #10897 (a1a3ad4)

Merge branch '10897_datatype_factory_instance_access' into 10875_ri_ontology_indexed: merging #10898 into feature branch to enable concept string indexing (6c19c1e)

creates method in ResourceInstanceDataType to leverage ConceptDataType to lookup preflabel, re #10875 (c90a1ed)

looks up concept preflabel for when concept used as r-i relationship, re #10875 (d57933a)

whatisgalen added a commit that referenced this pull request May 9, 2024

in-commit merge of #10876 (23bc300)

whatisgalen linked an issue May 11, 2024 that may be closed by this pull request

Make ResourceInstance datatype ontology metadata searchable in ES #10875

Open

whatisgalen added 3 commits May 22, 2024 19:52

Merge branch 'dev/7.6.x' into 10897_datatype_factory_instance_access: merges latest from dev/7.6.x (c9d6df5)

commits datatype_factory ref to geojson dt, re #10897 (dc32188)

Merge branch '10897_datatype_factory_instance_access' into 10875_ri_ontology_indexed: merging latest from dependency branch (1d65310)

chiatt requested changes May 23, 2024


includes active language in arches for preflabel, re #10875 (64690bf)

chiatt mentioned this pull request May 23, 2024

10897 datatype factory cross-instance access #10898

Closed


whatisgalen added 2 commits May 23, 2024 14:23

rm changes from #10898, calls concept model method instead of datatype preflabel mthd, re #10875 (784583a)

handles missing preflabel result in ri-get-preflabel, re #10875 (e26d058)

whatisgalen changed the title from "10875 Includes ontology property in search strings on Resource" to "10875 Includes ontology property in search strings" Jul 8, 2024

merge latest from dev/7.6.x into branch, re #10875 (c3ef478)
