-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
10875 Includes ontology property in search strings #10876
base: dev/7.6.x
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since concept relationships were introduced to describe relationships, most Arches instances will likely rely on concept labels rather than ontology properties because concepts can be more descriptive and are human readable. This is true even for Arches for Science which has semantic models and once had ontology relationships, but has since migrated to using concept relationships.
I mention this because this PR only supports ontology properties, but probably should support concept relationships as well.
I had forgotten that defining relationships out of collections was a thing. I tried to test it out but it didn't work (#10894). I think we should absolutely include the values in the document strings, though. |
One approach to accomplish the lookup needed in order to index concept value labels as strings (vs their ids) is to leverage |
I realized that the concept datatype actually just calls a method from the concept model here. While we could have the |
merges latest from dev/7.6.x
…ntology_indexed merging latest from dependency branch
This looks good to me - this might be a bit of a daft question, but would you have an example of a motivating use-case -- I wasn't sure if I was missing something, since the ontology property is defined on the model, but would this be, for example, if I wanted to find all monuments with a child monument present? |
arches/app/datatypes/datatypes.py
Outdated
concept_datatype_instance = self.datatype_factory.get_instance('concept') | ||
concept_preflabel = concept_datatype_instance.get_pref_label(relationship_valueid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will always return the preflabel in en-US
because that's the default. You'll need to get the apps active language using something like this:
from django.utils.translation import get_language
lang = get_language()
All the datatype is doing in get_pref_label
is calling a concept method. You could just call that same method and then you don't need the datatype:
from arches.app.models.concept import get_preflabel_from_valueid
get_preflabel_from_valueid(valid, lang)['value']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch on the language, fixed in commit 64690bf
All the datatype is doing in get_pref_label is calling a concept method. You could just call that same method and then you don't need the datatype
Yea I was reflecting on that for a little while. To be honest it still feels ambiguous to me whether the concept model's method should be called, or whether the concept datatype's method (which I recognize calls the former) should be called. Something about calling the model directly from a different datatype feels like an end-run around the concept-datatype, like it ought to be the intermediary/authority to get a preflabel from a valueid, but it's hard to justify that case given that it simply calls the model method.
@chiatt I guess with all else being equal, do you know if the v8 migration from concepts to reference datatype would favor calling the model vs datatype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better just to use the concept method than the concept datatype. If the resource instance datatype needed a really complex method from the concept datatype, then it might be different because it could require repeating a lot of code. However, in this case the Concept model has exactly what you need without writing anything new. Because of that, using the datatype isn't any DRYer than just using the Concept method. Instead, it adds an extra layer of code between the resource-instance datatype and the Concept method.
When we move over to using controlled lists in v8, things should be pretty similar. The ControlledList model could just have a method that returns the prefLabel. The datatype doesn't actually need a 'get_pref_label' method because the tile actually stores the label in its data. So when we start using controlled list values to describe relationships then could also use the ControlledList model like you do the Concept model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chiatt changes pushed to latest
@philtweir So if the relationship type (present in resource-instance node tile data) matters to your search, this PR lets you search for instances of, say "Historic Resource" resource model, and lets you include "is district contributor" as a search term. If your resource-instance tile data had such a relationship, you're now filtering based on relationships straight from term search as opposed to in the advanced search. Combined with my other PR #10871 this would actually filter the results not just to a string match of "is district contributor", but a nodegroup-based field match (node values for "Related Entities" node must contain a relationship: "is district contributor"). If of course your graphs are meticulously semantically modeled, you presumably already know what relationship types are populating which resource-instance nodes. However if your graphs are "flexibly" semantically modeled, i.e. the relationship type isn't pre-populated for resource-instance node values, and you might have a range of possible values for related-resource-instance.relationshiptype (either from a collection or from a list of ontology relationships), incorporating that relationship type onto the ElasticSearch document for the resourceinstance makes a previously un-searchable property of resource-instance node tiledata now searchable. |
Types of changes
Description of Change
get_nodevalue_as_list()
, implemented in resource-instance and concept datatypesontologyProperty
andinverseOntologyProperty
(and the tile nodegroupid, whether provisional) in the document"strings"
so that if a search term matches the text value of a relationship, it will show up in the suggested terms and be usable as a term queryWhy does this matter?
Resource-instance datatype tiledata isn't searchable in the term search except via the resource displayname of a related instance. The ontology property (relationship) is arguably just as important if not more than the displayname of related instances. Given that there is zero exposure of any resource's relationship to search, adding it in to term search as a searchable string seemed a logical start.
Issues Solved
#10875
Checklist
Ticket Background
Further comments
Note that #10898 should be merged first as this PR has cherry-picked commits from that PR