Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR to handle dataset information in TIMDEX #125

Merged
merged 6 commits into from
Feb 27, 2024

Conversation

ghukill
Copy link
Contributor

@ghukill ghukill commented Feb 15, 2024

Purpose and background context

This PR introduces a new Architecture Decision Record (ADR) that proposes to add a new form field to the TIMDEX data model.

This discussion was kicked off when seeking to understand what TIMDEX fields to use for facet filters in a new geospatial TIMDEX UI. Reaching a decision on where to store this data in a TIMDEX record will unblock work for DataEng (work to store the data) and EngX (utilize the data for facets).

Rendered markdown for current version of the ADR: https://github.com/MITLibraries/transmogrifier/blob/GDT-193-adr-handle-data-forms/docs/adrs/0002-field-for-data-type-form-information.md

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

How this addresses that need:
* Adds new ADR markdown in 'docs/adrs'

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-193
In this way, the TIMDEX data model is lacking a field that can be used to describe the "spatial type or structure of a dataset". Maybe stated more generally, a field that describes
the structure of the data _inside_ the format noted in `format`.

It is also worth noting that data values for both facets -- present in fields `dct_format_s` and `gbl_resourceType_sm` from the Aardvark records -- will be controlled values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if ResourceType is more or less confusing than Form for the new object name? I do think Form works and I will support it, but the fact that at least one other use case of this type of data refers to it as ResourceType seems worth explicitly considering.

Copy link
Contributor Author

@ghukill ghukill Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JPrevost - do you think ResourceType could still take values formerly used in literary_form? e.g.

"resource_type":[
  {
    "value":"Nonfiction",
    "kind":"Literary Form"
  }
]

I'd also wonder how this would to content_type which we define in the spec as:

"Type of content of item; e.g., "still image", "text", etc."

Sketching an example:

{
  "content_type": "Geospatial data",   # this feels pretty well established
  "format": "Shapefile",
  "resource_type": [
    {
      "value":"Polygon",
      "kind": "Data Type"
    },
    {
      "value":"Vector",
      "kind": "Data Type"
    }
]
}

Given the GIS data, it seems that whatever we name this field it should allow multiple values (some MIT and OGM records have 3-4 "Data Type" qualities in the metadata). I'd worry that content_type and resource_type seem conceptually similar, but content_type is single value string, while resource_type is multivalued object. And, that format kind of sits between them hierarchically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think literary_form could still collapse into this object with either the name form or resourceType. I'm not sure which I prefer yet fwiw. I think form has a meaning in our current language that is not how we are using it here. I also think that we aren't using it wrong fwiw, but it isn't the first meaning I would intuit when trying to use the data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also noting that our current content_type appears to be an array of Strings and not just a single String.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also nothing that our mappings for our sources should be confirmed.

We have thesis records from Alma and DSpace for the same thesis that have different content_types, literary_forms, and formats.

This is not directly related to if we create a new object, but the lack of consistency across sources feels important to note as we consider what these closely related fields mean.

Example:

{
  "source": "DSpace@MIT",
  "title": "Gardens of resistance",
  "contributors": [
    {
      "kind": "author",
      "value": "Jhaveri, Nynika\n            (Nynika P.)"
    },
    {
      "kind": "other",
      "value": "Massachusetts Institute of Technology. Department of Architecture."
    },
    {
      "kind": "department",
      "value": "Massachusetts Institute of Technology. Department of Architecture"
    }
  ],
  "contentType": [
    "Thesis"
  ],
  "literaryForm": null,
  "format": "electronic resource"
},
{
  "source": "MIT Alma",
  "title": "Gardens of resistance",
  "contributors": [
    {
      "kind": "Not specified",
      "value": "Jhaveri, Nynika (Nynika P.)"
    },
    {
      "kind": "Not specified",
      "value": "Massachusetts Institute of Technology. Department of Architecture"
    }
  ],
  "contentType": [
    "Manuscript language material"
  ],
  "literaryForm": "Nonfiction",
  "format": null
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also noting that our current content_type appears to be an array of Strings and not just a single String.

Good catch! Thanks. Crossing that out from above.

Copy link
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in agreement with the core proposal here. I want to consider if Form is the best label for this new Object.

I may have some very small additional details to add some more context and once I have had a chance to do that and we have considered Form vs ResourceType vs ? I will feel confident in approving this as-written.

Solid write up, thanks!

* "Data Type"
* values like: `[Polygon, Point, Raster, Image, etc.]`
* "Format"
* values like: `[Shapefile, TIFF, GeoTIFF, JPEG, etc.]`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, these feel more like file_formats, a field which was not used in MITAardvark but maybe should have been?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that theoretically allow:

content_type = Geospatial data
format = [Polygon, Point, Raster, Image, etc.]
file_formats = [Shapefile, TIFF, GeoTIFF, JPEG, etc.]

Or do we want to keep file_formats focused on MIME type values like application/pdf?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. I think that might be another option to document?

i.e. move [Shapefile, TIFF, GeoTIFF, JPEG, etc.] to file_formats... and [Polygon, Point, Raster, Image, etc.] slide into format... no data model changes required?

(I'm not saying I'd pick that option, but it feels like an option to document)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can write that up and push a commit, if that works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggesting @ehanson8. file_formats felt like a non-starter because other Transmog sources were writing mimetypes... but would probably be relatively easy to update them.

I'm liking that proposal quite a bit, but did want to note that would mean the "Format" filter in the UI would actually be coming from file_formats and "Data Type" filter would be coming from format. Would probably want to document that in the UI why this mapping was selected.

For any other sources that write to format, I'm assuming we might want to consider if the values they are writing might then belong in file_formats?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed commit with the 4th option

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the addition of the fourth option as one to consider in the ADR.

Comment on lines +127 to +147
#### Option 4 - Use `file_formats` for current `format` values and `format` for Data Type values

In this approach, the current `MITAardvark.format` values would shift to the previously unused `MITAardvark.file_formats` property and the Data Type values would be stored in `MITAardvark.format`

Example:
```json

{
"content_type": "Geospatial data",
"format": ["Polygon", "Point", "Raster", "Image"],
"file_formats": ["Shapefile", "TIFF", "GeoTIFF", "JPEG"]
}
```

Pros:
* does not require TIMDEX data model changes

Cons:
* `file_formats` has previously only stored MIME type values, such as `application/pdf`
* may require explanation of the facet mapping in the UI documentation
* may require updates of other transform classes for consistency
Copy link
Contributor Author

@ghukill ghukill Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm finding Option 4 pretty appealing now.

One question: what if we had a record for a print map? If the Aardvark record had dct_format_s="Print map", are we okay with this value mapping to file_formats?

Maybe helpful for discussion, this spreadsheet has dct_format_s and gbl_resourceType_sm values across all MIT and OGM records.

In support of Option 4, I'm hard pressed to find a value that isn't a digital file format in essence (where even msot of the complex ones could be mapped to "Online resource" or something similar).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question: if we go this route, are we okay losing mimetype as a value in a TIMDEX record? My inclination is that it would be okay as I'd imagine this field is not widely used yet.

If we did want specifics about mimetypes, perhaps that could get added to the links section? Seems more meaningful for a mimetype to point precisely to an actual resource vs the record as an abstract whole.

Copy link
Contributor

@ehanson8 ehanson8 Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did want specifics about mimetypes, perhaps that could get added to the links section? Seems more meaningful for a mimetype to point precisely to an actual resource vs the record as an abstract whole.

Agree that including it with the actual download link would be more useful, Link.file_type attribute?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question: what if we had a record for a print map? If the Aardvark record had dct_format_s="Print map", are we okay with this value mapping to file_formats?

If we do go this route, we may want to change the name from file_formats so it's more broadly applicable but I don't have a suggestion yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question: what if we had a record for a print map? If the Aardvark record had dct_format_s="Print map", are we okay with this value mapping to file_formats?

If we do go this route, we may want to change the name from file_formats so it's more broadly applicable but I don't have a suggestion yet

I've had time to look more closely at the spreadsheet linked above, and I don't think we'll encounter "Print Maps", at least in the gismit or gisogm sources. So maybe this is moot? I think it's safe to say that the values we get back from dct_format_s in the MITAardvark files will uniformly be digital in nature. And therefore, file_formats could be applicable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did want specifics about mimetypes, perhaps that could get added to the links section? Seems more meaningful for a mimetype to point precisely to an actual resource vs the record as an abstract whole.

Agree that including it with the actual download link would be more useful, Link.file_type attribute?

If we did repurpose file_formats, and therefore needed to move those mimetype data points to another field, and we did like putting them on the links entries, I might propose mimetype explicitly; no ambiguity there.

If we go that route, I think we'd want to make sure the links made sense though. If we're just pointing someone back to an Alma record, or ASpace page, I'm not sure that an link.mimetype=application/pdf makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I can weigh in on what to do with the mimetype values at the moment, but as far as the rest of this choice is concerned, I think I'd prefer option 4 over the others - at least as I understand them now.

To be clear, this option would leave the current literary_form field untouched?

@jonavellecuerdo
Copy link
Contributor

jonavellecuerdo commented Feb 20, 2024

@ghukill and @ehanson8 Thank you for writing up detailed descriptions for possible solutions. Here are my thoughts:

  • Option 1: Didn't feel strongly; he reason stated in the "Cons".
  • Option 2: Didn't feel strongly; I did think the term "Form" was a bit confusing.
  • Option 3: Kinda' dug this solution; But I still thought "Form" didn't suit "data types" or "file formats".
  • Option 4: Definitely felt like this was the most understandable solution. ✅

Question: Would it be a big lift if we were to:

  • Create a new class transmogrifier.models.FileFormat:
    class FileFormat:
       value: str = field(validator=instance_of(str))  # Required subfield
       kind: str | None = field(default=None, validator=optional(instance_of(str)))
    
  • Update TIMDEX model TimdexRecord
    file_formats: list[str] | None = field(default=None, validator=optional(list_of(FileFormat)))
    

Such that....we can specify---

FileFormat(value="application/pdf", kind="MIME")
FileFormat(value="Shapefile", kind="Image file format")

?

@ghukill
Copy link
Contributor Author

ghukill commented Feb 20, 2024

@ghukill and @ehanson8 Thank you for writing up detailed descriptions for possible solutions. Here are my thoughts:

  • Option 1: Didn't feel strongly; he reason stated in the "Cons".
  • Option 2: Didn't feel strongly; I did think the term "Form" was a bit confusing.
  • Option 3: Kinda' dug this solution; But I still thought "Form" didn't suit "data types" or "file formats".
  • Option 4: Definitely felt like this was the most understandable solution. ✅

Question: Would it be a big lift if we were to:

  • Create a new class transmogrifier.models.FileFormatType:
    class FileFormatType:
       value: str = field(validator=instance_of(str))  # Required subfield
       kind: str | None = field(default=None, validator=optional(instance_of(str)))
    
  • Update TIMDEX model TimdexRecord
    file_formats: list[str] | None = field(default=None, validator=optional(list_of(FileFormatType)))
    

Such that....we can specify---

FileFormatType(value="application/pdf", kind="MIME")
FileFormatType(value="Shapefile", kind="Image file format")

?

A couple thoughts here...

If we did this, I'd propose naming the class class FileFormat which matches the CamelCase style of other field-specific classes that slow into a snake_case form in the final record.

As for this change itself, a couple of considerations:

  • for any sources that currently write file_formats, we'd have to update transformation and reharvest to ensure they now have a complex object with value/kind
  • I think we might struggle with programatically applying kind to non-mimetype values, where it's a bit subjective how to classify them.

Having been down this road a bit with preservation systems, I think we might want to stay out of the game of trying to juggle "friendly" and mimetype forms. There are python library we could lean on to give us one from the other, but there are always edge cases.

Zooming out a bit -- something I think Option 4 kind of raises as a question -- perhaps we don't need mimetypes at all? Or, not needed for file_formats? And/or, add mimetype to links only as floated above?

To me, the appeal of Option 4 would be:

  • no structural changes to TIMDEX data model; file_formats remains a list of strings
  • update sources that do currently write to file_formats to write "human" friendly forms, e.g. "PDF" instead of application/pdf which would then match the "human" friendly forms that Aardvark would introduce like GeoTIFF or Shapefile.

@ghukill
Copy link
Contributor Author

ghukill commented Feb 20, 2024

As if this discussion doesn't have enough context, wanted to add a bit more.

After some normalization of terms in the GeoHarvester (specifically dct_format_s to these terms, and gbl_resourceType_sm to these and these terms), here are the counts across MIT and OGM records: https://docs.google.com/spreadsheets/d/10oNtwD-3HagVHFfa00sB8eWs5EVmmWMiLSq2q08VU-w/edit#gid=61712783.

This gives a pretty good preview of what kind of actual values we're looking to slot into the data model, and see in the UI.

One observation: for the OGM gbl_resourceType_sm field, you can see a longer tail of values than the MIT records. This is because the Aardvark schema allows for either LOC or OGM terms here, and clearly many institutions do this. It suggests to me that Geoblacklight -- the model we are basing our "Format" and "Data Type" facet filters on -- likely takes a subset from this field to populate the "Data Type" filter.

I would propose we filter this down in Transmogrifier to only allow the values we want to see for "Data Types", e.g. "Polygon", "Point", etc., and then put the rest into subjects, as they are quite nicely formed and we even have some namespace/provenance to understand where they come from.

@jonavellecuerdo
Copy link
Contributor

@ghukill and @ehanson8 Thank you for writing up detailed descriptions for possible solutions. Here are my thoughts:

  • Option 1: Didn't feel strongly; he reason stated in the "Cons".
  • Option 2: Didn't feel strongly; I did think the term "Form" was a bit confusing.
  • Option 3: Kinda' dug this solution; But I still thought "Form" didn't suit "data types" or "file formats".
  • Option 4: Definitely felt like this was the most understandable solution. ✅

Question: Would it be a big lift if we were to:

  • Create a new class transmogrifier.models.FileFormatType:
    class FileFormatType:
       value: str = field(validator=instance_of(str))  # Required subfield
       kind: str | None = field(default=None, validator=optional(instance_of(str)))
    
  • Update TIMDEX model TimdexRecord
    file_formats: list[str] | None = field(default=None, validator=optional(list_of(FileFormatType)))
    

Such that....we can specify---

FileFormatType(value="application/pdf", kind="MIME")
FileFormatType(value="Shapefile", kind="Image file format")

?

A couple thoughts here...

If we did this, I'd propose naming the class class FileFormat which matches the CamelCase style of other field-specific classes that slow into a snake_case form in the final record.

As for this change itself, a couple of considerations:

  • for any sources that currently write file_formats, we'd have to update transformation and reharvest to ensure they now have a complex object with value/kind
  • I think we might struggle with programatically applying kind to non-mimetype values, where it's a bit subjective how to classify them.

Having been down this road a bit with preservation systems, I think we might want to stay out of the game of trying to juggle "friendly" and mimetype forms. There are python library we could lean on to give us one from the other, but there are always edge cases.

Zooming out a bit -- something I think Option 4 kind of raises as a question -- perhaps we don't need mimetypes at all? Or, not needed for file_formats? And/or, add mimetype to links only as floated above?

To me, the appeal of Option 4 would be:

  • no structural changes to TIMDEX data model; file_formats remains a list of strings
  • update sources that do currently write to file_formats to write "human" friendly forms, e.g. "PDF" instead of application/pdf which would then match the "human" friendly forms that Aardvark would introduce like GeoTIFF or Shapefile.

Given the considerations you mentioned, it seems like the additional changes I proposed would prove challenging, so please feel ignore! I do like the idea of "updat[ing] sources that do currently write to file_formats to write "human" friendly forms". Curious to see what others think! 🤓

Copy link
Member

@matt-bernhardt matt-bernhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns with the third option, which I've tried to explain above. Ultimately, though, I don't know that I'm familiar enough with this problem space to attempt to veto any of the four options.

I have a separate question about our change processes which this has brought up for me - how significant an undertaking is it for the data model to change? Several of these options include "pro" statements about not requiring the data model to change, and I want to make sure I understand that prospect accurately. Are we avoiding changes to the data model because:

  • Doing so actively causes problems, or
  • Because it is a complex process that must be undertaken thoughtfully (and which takes significant time), or
  • Because we've yet to make these sorts of data model changes, so the process is completely undefined?

* "Data Type"
* values like: `[Polygon, Point, Raster, Image, etc.]`
* "Format"
* values like: `[Shapefile, TIFF, GeoTIFF, JPEG, etc.]`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the addition of the fourth option as one to consider in the ADR.

```

Pros:
* allows collapsing of `literary_form` field; noting some shared sentiment that this field might be too source-specific for TIMDEX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't know that I feel strongly enough to veto this option, I do have concerns.

While I agree that, in hindsight, literary_form field seems to over-fit the data model to one data source, I'm not sure how comfortable I am creating a vocabulary that encompasses all of these values: nonfiction, fiction, point, polygon, raster, image

Terms like point, polygon, and raster feel like they address the internal structure of the record, but nonfiction and fiction are more about the content (akin to genre). Equivalent literary terms feel like they might be prose, verse, or drama (which are values that I don't think appear in any current field?)

image is a value I also struggle with in this set, as I understand it to be closely related to (if not synonymous with) raster. I vaguely remember seeing raster data that is not an image, so maybe there's something here I'm missing though.

Copy link
Member

@JPrevost JPrevost Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matt-bernhardt would your concern be addressed by the different kind provided by this Object?

Example:

GIS record

{
    "form": [
        {
            "value": "Polygon",
            "kind": "Data Type"
        },
        {
            "value": "Vector",
            "kind": "Data Type"
        }
    ]
}

Literary form example

{
    "form": [
        {
            "value": "Fiction",
            "kind": "Literary Form"
        }
    ]
}

I remain unsure myself if kind addresses the "sameness" across fields. When I imagine using this data in a User Interface, I find myself agreeing with your concern that the geo values looks super weird along with the literary form values. We could address that by exposing aggregations that are filtered to the separate kind values but then it starts to feel slightly weird again of why are we combining things in OpenSearch that we would then split out again in GraphQL and thus nobody would ever use the combined data. It definitely starts to feel like if we are concerned we'll never use the data together -- because it is indeed "different" -- we need to consider if it ever should have been together in the first place. I'm still not sure.

Comment on lines +127 to +147
#### Option 4 - Use `file_formats` for current `format` values and `format` for Data Type values

In this approach, the current `MITAardvark.format` values would shift to the previously unused `MITAardvark.file_formats` property and the Data Type values would be stored in `MITAardvark.format`

Example:
```json

{
"content_type": "Geospatial data",
"format": ["Polygon", "Point", "Raster", "Image"],
"file_formats": ["Shapefile", "TIFF", "GeoTIFF", "JPEG"]
}
```

Pros:
* does not require TIMDEX data model changes

Cons:
* `file_formats` has previously only stored MIME type values, such as `application/pdf`
* may require explanation of the facet mapping in the UI documentation
* may require updates of other transform classes for consistency
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I can weigh in on what to do with the mimetype values at the moment, but as far as the rest of this choice is concerned, I think I'd prefer option 4 over the others - at least as I understand them now.

To be clear, this option would leave the current literary_form field untouched?

Why these changes are being introduced:

After multiple discussions about the various options outlined in the first version,
it was decided to take another pass at the ADR context and decision.  This reflects
some discussions to emphasize the data model as an evolving one, in which this
decision may feed into future decisions, while hopefully solving an immediate
issue for a TIMDEX source.

How this addresses that need:
* Reworks ADR 0002

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-200
@ghukill
Copy link
Contributor Author

ghukill commented Feb 23, 2024

Please find in this new commit a fairly major rework of the ADR:

  • attempt to tighten up background / context
  • removed multiple options after discussions, proposing single option approach w/ revisions, discussion, etc.

The new proposal is a synthesis of previous options, and discussions this week. The data model changes would be minimal or even non-existent, but there is data model discussion in the ADR and implications for future work.

Thanks again to all for the engagement!

@ghukill
Copy link
Contributor Author

ghukill commented Feb 23, 2024

Please find in this new commit a fairly major rework of the ADR:

  • attempt to tighten up background / context
  • removed multiple options after discussions, proposing single option approach w/ revisions, discussion, etc.

The new proposal is a synthesis of previous options, and discussions this week. The data model changes would be minimal or even non-existent, but there is data model discussion in the ADR and implications for future work.

Thanks again to all for the engagement!

I wanted to point out one subtle change that might go unnoticed in this reworked form.

Previously, examples for "Data Type" had shown "Polygon" or "Raster", where now they show the more correct "Polygon data" or "Raster data". This conforms to the allowed terms for those fields in Aardvark, and while Geoblacklight instances remove the "data" suffix for their "Data Type" filters, I'd propose that we keep it.

This new proposal puts these values in content_type, and therefore will sit alongside values like "Short stories" or "Personal papers". In this way, "Polygon data" is more meaningful than just the opaque "Polygon". This actually requires fewer changes or normalizations in code, but is a slight devitation from other Geoblacklight interfaces which have this "Data Type" UI filter.

@ghukill ghukill force-pushed the GDT-193-adr-handle-data-forms branch from 2348b16 to bb82f62 Compare February 23, 2024 15:53
@jazairi
Copy link

jazairi commented Feb 23, 2024

Previously, examples for "Data Type" had shown "Polygon" or "Raster", where now they show the more correct "Polygon data" or "Raster data". This conforms to the allowed terms for those fields in Aardvark, and while Geoblacklight instances remove the "data" suffix for their "Data Type" filters, I'd propose that we keep it.

Agreed. Including the 'data' suffix feels friendlier to use cases beyond GeoData.

Copy link

@jazairi jazairi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support this decision. While losing Geospatial data is a shame, I agree that considering content_class is out of scope of this ADR. (Though, I think I'd support adding that field, too.) In the meantime, if we need to identify whether a record is geospatial, we should be able to infer as much from source.

Thank you to the DataEng team for your thoughtful consideration of this issue, and to Graham for the clear and detailed write-up!

@jonavellecuerdo
Copy link
Contributor

@ghukill Option 5 sounds like the best option. I agree with @jazairi 's comment that the values for source imply the "content class" for a given resource, so there isn't a loss of information if we let the "Geospatial metadata" values go for GIS sources.

Also--thank you for the detailed ADR! I appreciate your time and effort into capturing all the ideas/learnings that came out of our teamwide discussions. These notes will serve as a helpful reference to understand the reasoning behind our TIMDEX data model(s). 📓

Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also in favor of option 5, the added examples were very helpful

Copy link
Member

@matt-bernhardt matt-bernhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for facilitating this conversation, everyone. Based on the way the document now reads, I'd support choosing option 5 from those laid out. I'm also likely in favor of the future addition of a content_class field.

The example values across a variety of sources was particularly helpful in coming to terms with this.

I see that the Decision area is still left as TBD, so I'm not sure that I'm in a position to approve as-is - but I think I'm with the others so far who have said they support that choice.

Copy link
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like my approval to signal a specific option. Could you update the status to Approved and the decision to Option 5 and dismiss all current reviews.

I do support option 5, but it's unclear if all of the other approvals are approving option 5 or something else because the document doesn't have a decision listed :)

Thanks!

docs/adrs/0002-field-for-data-type-form-information.md Outdated Show resolved Hide resolved
docs/adrs/0002-field-for-data-type-form-information.md Outdated Show resolved Hide resolved
@ghukill ghukill force-pushed the GDT-193-adr-handle-data-forms branch from 82cac11 to 44e9ef9 Compare February 27, 2024 19:46
@ghukill
Copy link
Contributor Author

ghukill commented Feb 27, 2024

I'd like my approval to signal a specific option. Could you update the status to Approved and the decision to Option 5 and dismiss all current reviews.

I do support option 5, but it's unclear if all of the other approvals are approving option 5 or something else because the document doesn't have a decision listed :)

Thanks!

Thanks for the ADR logistical help here @JPrevost! ADR marked as "Accepted", decision logged, and all reviewers have been re-requested.

@ghukill ghukill merged commit eb13daf into main Feb 27, 2024
3 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants