dcterms:subject (free-text) #175
Comments
I'd like to reopen this but don't have the permissions. I'll provide the rationale upon reopening (it would make it easier to follow on my part). |
What is the rationale? |
@zednis is it a problem for dcterms:subject to contain multiple objects? Once I gain some clarification on it, I am all right with closing. |
Two things:
from http://dublincore.org/documents/dcmi-terms/#terms-subject
Most terms in dcterms are intended to be used with non-literal values. This is what separates Dublin Core terms properties from regular Dublin Core properties, which are intended to be used with literal values. Guidance on usage is available at http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dcterms:subject The correct representation in RDF for
This is why The correct representation in RDF for
|
@bduggan this issue should be reopened and we should modify your usage of |
I would propose removing dcterms:subject from the rdf. This falls into the larger category of controlled vocabularies which is work in progress. Again, this data was provided as a free form text field, and we do not want to make any assertions about splitting the contents and making URIs for various phrases there. |
I think this information is too important to just drop. What if we have a student work on a post-processing step to split the free-text contents into separate text keywords? We can then use either |
The lists in the attributes are not constrained, and there are no definitions. An example: "Precipitation, projections, seasonal, CMIP5, RCP2.6" -- collections of keywords and a model and a scenario. I would argue that treating these as dcterms:subject's is not a good idea -- better would be for the image to be associated with /model/cmip5 and /scenario/rcp2.6. Once we start properly managing controlled vocabularies we will want definitions for each of the terms as well as how they relate to other controlled vocabularies. I think these lists are important to keep in mind and should help inform our discussions about representing (and curating) controlled vocabularies. |
I don't want to just throw this data out. I think it makes sense to use We should do some data science to post-process the free-text data we have into delimited keywords, but that's why we have students ;-) |
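For illustration, the post-processing step discussed above could look roughly like the following. This is only a minimal sketch, assuming the attributes strings are comma-delimited like the example quoted earlier in the thread; `split_keywords` is a hypothetical helper and not part of GCIS. It only separates the phrases and does not decide whether a phrase is a keyword, a model, or a scenario.

```python
from typing import List


def split_keywords(attributes: str) -> List[str]:
    """Split a comma-delimited free-text string into trimmed keywords.

    Hypothetical helper: assumes commas are the only delimiter.
    Empty entries and case-insensitive duplicates are dropped,
    and the original order and spelling are preserved.
    """
    seen = set()
    keywords = []
    for part in attributes.split(","):
        keyword = part.strip()
        key = keyword.lower()
        if keyword and key not in seen:
            seen.add(key)
            keywords.append(keyword)
    return keywords


# Example string quoted earlier in this thread.
example = "Precipitation, projections, seasonal, CMIP5, RCP2.6"
keywords = split_keywords(example)
print(keywords)
```

Each resulting keyword would still need to be reviewed before being mapped to anything like a controlled vocabulary term.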
I'm not so worried about the effort of splitting these up, I'm more worried about making a category mistake with this data. Besides models and scenarios, there are also regions and even platforms/instruments: things for which we already have URIs. Anyway, eventually, yes, we do want URIs for terms, perhaps under "/term" (in your example). To support this, we need to decide what is returned from the /term endpoint, i.e., what are the attributes and relationships of a term? (Not in the lexicon sense but in the controlled vocabulary sense.) Also, I wasn't suggesting throwing it out, just leaving it out of the RDF for now; it'll still be in the database and we can add it once we iron out our representation of controlled vocabularies. |
@bduggan is right. Think engineered. Think lean and automated. And think of an operational GCIS ontology that is automatically generated on a weekly schedule using that lean automated approach. Manual work is not an option; we do not have students for new work like this nor should we assume we shall in the future. Actually, I am okay with throwing this "data" out. Based on two years of GCIS work, Attributes is a spotty, usually blank, catch-all that few bother to fill out. When they do, the values are amazingly varied and not terribly useful. Perhaps we should remove the Attributes field completely. |
I'd be all right with removing the "attributes" field completely given the lack of criteria for the process of assigning them. Let me confirm first that this is so and will update. |
OK, I have been persuaded to leave this content out of the RDF for now. I support Brian's suggestion of revisiting this and eventually bringing it back with controlled vocabularies. |
Works for me. On Wed, Jun 24, 2015 at 2:26 PM, Stephan Zednik notifications@github.com
Justin Goldstein, Ph.D. O: (202) 419-3496 e-mail: jgoldstein AT usgcrp Dot gov |
Inspired by the discussion of free-form text strings at:
#150
hence I'm broaching it at this point.
Regarding the use of "attributes" for images, e.g.:
http://data.globalchange.gov/image/ff6a7a8e-d886-4b30-acd7-a3538a787baf
Note that the line beginning with "dcterms:subject" also contains multiple objects.
I can think of the use case where someone wishes to query all images pertaining to "precipitation." As written now at:
http://data.globalchange.gov/image/ff6a7a8e-d886-4b30-acd7-a3538a787baf
(1) Will this image come up given the multiple objects?
(2) If not, could we fix this?
Thanks.
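To make question (1) concrete, here is a minimal sketch using plain Python tuples in place of a real triple store. It assumes the attributes are published as one literal holding the whole comma-separated string; the subject `/image/example`, the literal value, and the helper `images_with_subject` are all hypothetical, and the actual GCIS serialization may differ.

```python
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def images_with_subject(triples, keyword):
    """Return subjects whose dcterms:subject literal equals the keyword.

    The comparison is a whole-literal match, made case-insensitive here
    for convenience; it is not a substring search.
    """
    wanted = keyword.lower()
    return sorted(
        {s for s, p, o in triples if p == DCTERMS_SUBJECT and o.lower() == wanted}
    )


# Assumption: the free-text field is emitted as a single literal.
single_literal = [
    (
        "/image/example",
        DCTERMS_SUBJECT,
        "Precipitation, projections, seasonal, CMIP5, RCP2.6",
    )
]

# Alternative: one literal per comma-separated keyword.
split_literals = [
    ("/image/example", DCTERMS_SUBJECT, kw.strip())
    for kw in single_literal[0][2].split(",")
]

print(images_with_subject(single_literal, "precipitation"))  # []
print(images_with_subject(split_literals, "precipitation"))  # ['/image/example']
```

Under these assumptions, a whole-literal match on "precipitation" only finds the image once each keyword is its own object.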