Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

10875 Includes ontology property in search strings #10876

Open
wants to merge 26 commits into
base: dev/7.6.x
Choose a base branch
from

Conversation

whatisgalen
Copy link
Member

@whatisgalen whatisgalen commented May 5, 2024

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Description of Change

  • (quality of life improvement) consolidates redundant logic into base datatype method get_nodevalue_as_list(), implemented in resource-instance and concept datatypes
  • includes ontologyProperty and inverseOntologyProperty (and the tile nodegroupid, whether provisional) in the document "strings" so that if a search term matches the text value of a relationship, it will show up in the suggested terms and be usable as a term query

Why does this matter?

Resource-instance datatype tiledata isn't searchable in the term search except via the resource displayname of a related instance. The ontology property (relationship) is arguably just as important if not more than the displayname of related instances. Given that there is zero exposure of any resource's relationship to search, adding it in to term search as a searchable string seemed a logical start.

Issues Solved

#10875

Checklist

  • Unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Ticket Background

Further comments

Note that #10898 should be merged first as this PR has cherry-picked commits from that PR

@whatisgalen whatisgalen marked this pull request as draft May 5, 2024 07:09
@whatisgalen whatisgalen changed the title 10875 ri ontology indexed 10875 Includes ontology property in search strings on Resource May 5, 2024
@whatisgalen whatisgalen marked this pull request as ready for review May 6, 2024 23:44
Copy link
Member

@chiatt chiatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since concept relationships were introduced to describe relationships, most Arches instances will likely rely on concept labels rather than ontology properties because concepts can be more descriptive and are human readable. This is true even for Arches for Science which has semantic models and once had ontology relationships, but has since migrated to using concept relationships.

I mention this because this PR only supports ontology properties, but probably should support concept relationships as well.

arches/app/datatypes/base.py Outdated Show resolved Hide resolved
@whatisgalen
Copy link
Member Author

whatisgalen commented May 8, 2024

Since concept relationships were introduced to describe relationships, most Arches instances will likely rely on concept labels rather than ontology properties because concepts can be more descriptive and are human readable. This is true even for Arches for Science which has semantic models and once had ontology relationships, but has since migrated to using concept relationships.

I mention this because this PR only supports ontology properties, but probably should support concept relationships as well.

I had forgotten that defining relationships out of collections was a thing. I tried to test it out but it didn't work (#10894). I think we should absolutely include the values in the document strings, though.

@whatisgalen
Copy link
Member Author

whatisgalen commented May 9, 2024

Since concept relationships were introduced to describe relationships, most Arches instances will likely rely on concept labels rather than ontology properties because concepts can be more descriptive and are human readable. This is true even for Arches for Science which has semantic models and once had ontology relationships, but has since migrated to using concept relationships.

I mention this because this PR only supports ontology properties, but probably should support concept relationships as well.

One approach to accomplish the lookup needed in order to index concept value labels as strings (vs their ids) is to leverage ConceptDataType.get_pref_label method. To access that from the ResourceInstanceDataType class, PR #10898 would need to be merged.

@whatisgalen
Copy link
Member Author

I realized that the concept datatype actually just calls a method from the concept model here. While we could have the ResourceInstanceDataType do the same thing, I think it's better to go through the concept datatype as an intermediary anyway.

whatisgalen added a commit that referenced this pull request May 9, 2024
@philtweir
Copy link
Contributor

This looks good to me - this might be a bit of a daft question, but would you have an example of a motivating use-case -- I wasn't sure if I was missing something, since the ontology property is defined on the model, but would this be, for example, if I wanted to find all monuments with a child monument present?

Comment on lines 2413 to 2414
concept_datatype_instance = self.datatype_factory.get_instance('concept')
concept_preflabel = concept_datatype_instance.get_pref_label(relationship_valueid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will always return the preflabel in en-US because that's the default. You'll need to get the apps active language using something like this:

from django.utils.translation import get_language
lang = get_language()

All the datatype is doing in get_pref_label is calling a concept method. You could just call that same method and then you don't need the datatype:

from arches.app.models.concept import get_preflabel_from_valueid
get_preflabel_from_valueid(valid, lang)['value']

Copy link
Member Author

@whatisgalen whatisgalen May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the language, fixed in commit 64690bf

All the datatype is doing in get_pref_label is calling a concept method. You could just call that same method and then you don't need the datatype

Yea I was reflecting on that for a little while. To be honest it still feels ambiguous to me whether the concept model's method should be called, or whether the concept datatype's method (which I recognize calls the former) should be called. Something about calling the model directly from a different datatype feels like an end-run around the concept-datatype, like it ought to be the intermediary/authority to get a preflabel from a valueid, but it's hard to justify that case given that it simply calls the model method.

@chiatt I guess with all else being equal, do you know if the v8 migration from concepts to reference datatype would favor calling the model vs datatype?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better just to use the concept method than the concept datatype. If the resource instance datatype needed a really complex method from the concept datatype, then it might be different because it could require repeating a lot of code. However, in this case the Concept model has exactly what you need without writing anything new. Because of that, using the datatype isn't any DRYer than just using the Concept method. Instead, it adds an extra layer of code between the resource-instance datatype and the Concept method.

When we move over to using controlled lists in v8, things should be pretty similar. The ControlledList model could just have a method that returns the prefLabel. The datatype doesn't actually need a 'get_pref_label' method because the tile actually stores the label in its data. So when we start using controlled list values to describe relationships then could also use the ControlledList model like you do the Concept model.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chiatt changes pushed to latest

@whatisgalen
Copy link
Member Author

whatisgalen commented May 23, 2024

This looks good to me - this might be a bit of a daft question, but would you have an example of a motivating use-case -- I wasn't sure if I was missing something, since the ontology property is defined on the model, but would this be, for example, if I wanted to find all monuments with a child monument present?

@philtweir So if the relationship type (present in resource-instance node tile data) matters to your search, this PR lets you search for instances of, say "Historic Resource" resource model, and lets you include "is district contributor" as a search term. If your resource-instance tile data had such a relationship, you're now filtering based on relationships straight from term search as opposed to in the advanced search. Combined with my other PR #10871 this would actually filter the results not just to a string match of "is district contributor", but a nodegroup-based field match (node values for "Related Entities" node must contain a relationship: "is district contributor").

If of course your graphs are meticulously semantically modeled, you presumably already know what relationship types are populating which resource-instance nodes. However if your graphs are "flexibly" semantically modeled, i.e. the relationship type isn't pre-populated for resource-instance node values, and you might have a range of possible values for related-resource-instance.relationshiptype (either from a collection or from a list of ontology relationships), incorporating that relationship type onto the ElasticSearch document for the resourceinstance makes a previously un-searchable property of resource-instance node tiledata now searchable.

@whatisgalen whatisgalen changed the title 10875 Includes ontology property in search strings on Resource 10875 Includes ontology property in search strings Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make ResourceInstance datatype ontology metadata searchable in ES
4 participants