NPGObjects:ObjectNumber #10

Closed
steads opened this Issue Jul 3, 2016 · 26 comments

8 participants
@steads

steads commented Jul 3, 2016

ObjectNumber: This does not seem to have a mapping. It looks like the primary identifier for the artwork, so it is a very important field and would form an important access point for integration. Suggested mapping: E22 -> P48 has preferred identifier (is preferred identifier of) -> E42 Identifier

@azaroth42

azaroth42 commented Jul 5, 2016

IMO, this should be used as a natural key in the URI of the resource as a constant slug. Having it mapped as an Identifier is also important for discovery.

@VladimirAlexiev

VladimirAlexiev commented Jul 14, 2016

From my experience with the British Museum, ObjectNumbers/AccessionNumbers sometimes change, and sometimes are not unique.
In contrast, the database ID (ObjectID) is unique and will change only if two records are merged (in such case one of the respective ObjectNumber/AccessionNumber will disappear also).

So I think both should be mapped, with a P2 type to distinguish them. Note: this search also returns Persistent ID and Unique ID (and URI).

  • aat:300312355 accession number
  • aat:300404626 identifier (identification number)

ObjectID should be used in the LOD URL.

@edgartdata, what do you think?
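
For readers following along, here is a minimal sketch of what "map both, with a P2 type to distinguish them" could look like as plain triples. The `example.org` namespace, the URI layout, the use of `rdfs:label` for the value, and the sample record are all assumptions for illustration, not the project's actual mapping. Linking the accession number with P48 follows the suggestion in the opening comment; linking ObjectID with P1 is likewise an assumption.

```python
# Hypothetical namespace and URI layout, for illustration only.
BASE = "http://example.org/npg/"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
AAT = "http://vocab.getty.edu/aat/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

ACCESSION_NUMBER = AAT + "300312355"  # aat: accession number
IDENTIFIER = AAT + "300404626"        # aat: identifier (identification number)


def identifier_triples(object_id, object_number):
    """Return (subject, predicate, object) triples for one object record."""
    obj = f"{BASE}object/{object_id}"
    triples = [(obj, RDF_TYPE, CRM + "E22_Man-Made_Object")]
    specs = [
        # (URI slug, value, P2 type, property linking object to identifier)
        ("accession-number", object_number, ACCESSION_NUMBER, "P48_has_preferred_identifier"),
        ("object-id", str(object_id), IDENTIFIER, "P1_is_identified_by"),
    ]
    for slug, value, aat_type, link in specs:
        ident = f"{obj}/{slug}"
        triples.append((obj, CRM + link, ident))
        triples.append((ident, RDF_TYPE, CRM + "E42_Identifier"))
        triples.append((ident, CRM + "P2_has_type", aat_type))
        triples.append((ident, RDFS_LABEL, value))
    return triples


if __name__ == "__main__":
    # Made-up record: ObjectID 1001 with accession number "EX.2016.1".
    for s, p, o in identifier_triples(1001, "EX.2016.1"):
        print(s, p, o)
```

A consumer that only cares about accession numbers can filter on the P2 type and ignore the database ID, which is the point raised later in the thread.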

@steads

steads commented Jul 14, 2016

Nobody knows or can guess the internal database identifier and so it is next to useless for integration. Object Numbers/Accession numbers, even if they change, are published and so can be used for integration with other data sources.

@edgartdata

edgartdata commented Jul 29, 2016

Couldn't we make the case for internal database identifiers being useful for some applications? Whenever we need to query objects outside of TMS/our internal database we use ObjectID, as in image file names for example. If these internal # are clearly typed as such, whoever is querying for Object Numbers/Accession numbers could just ignore them?

@azaroth42

azaroth42 commented Jul 29, 2016

I don't follow @steads objection that the internal database reference can't be guessed or known, I'm afraid. Could you expand a little bit for me please? If they're published, then they can be known. If they're not published, then clearly they can't :) So long as the database references don't change, they seem like a reasonable key to me.

@steads

steads commented Jul 29, 2016

The point is that internal database references are just that, internal to the information system. Do you publish in your exhibition catalogues the internal database reference number? Do people without access to the information system send you queries saying can you give me the information about the item with an internal database reference number 2210? No, typically not, because it is a number internal to that system.
Also when you move to a new system the number may change and do you care? I would suggest not as this was a number used inside a piece of software you no longer use. However your Accession Numbers will appear in external documentation like books and exhibition catalogues and PhD theses and will stay stable for very long periods of time (maybe for as long as the Institution exists!).
Now there is no doubt you can use internal database reference numbers to coin URIs but how do I know what the URI is going to be for an object I have found in a 1947 exhibition catalogue? I perhaps only have an image, a few short notes that intrigue me and the Accession Number. If the URI is coined using the Accession Number then I can follow the published rules for coining and have a reasonable chance of getting to the correct record straight away. Or I can have a machine use the scanned text of the catalogue to make some intelligent guesses on my behalf about which current records match with which objects in the exhibition catalogue (may not be perfect but at least an intelligent guess).
To lampoon the process a bit, why not assign a hash of all the meta-data for the object including a JPG of the object and assign that as the core of the URI; it would be unique and as long as I keep a copy of that data and JPG I can always recreate the hash and no-one could ever confuse it with another object. It would be useless as a tool for integration though because I would have to know everything about the object before I could find it.

@azaroth42

azaroth42 commented Jul 29, 2016

You should not need to, or indeed expect to, know what the URI is going to be a priori for any resource on the web and especially not for Linked Data. You only know that CNN's homepage is http://cnn.com/ because you've memorized that string, which is why short domain names are increasingly valuable as the address space (and human memory ability) is limited. So assigning the hash of the metadata about the object would be a very valid, though unnecessarily expensive, way to generate URIs.

The benefits of using the internal database reference are:

  • Guaranteed to be locally unique, and thus globally unique when appended to a URI pattern
  • Almost certain to be persistent between exports
  • Very likely to be persistent between re-imports of the data when migrating between systems
  • Will never be subject to political pressure to modify it, as it's automatically generated and assigned
  • Accession numbers, being manually assigned, are subject to human error and to modification (per Vlad's comment)
@steads

steads commented Jul 29, 2016

Your expectations of integration and mine are obviously different. I want users to get from other information systems that have access only to pre-LOD data to my data rather than provide a resource that people can link to in the future. More co-references exist from before LOD, so building only for the future is an engineer's solution, not the approach we champion in the CRM-SIG.

@azaroth42

azaroth42 commented Jul 29, 2016

How are your users going to know the rest of the URI pattern, even if they know the accession number? For example, assume that you know that there's an object at the Getty with accession number "G01.003" from your 1947 exhibition catalogue ... now what?

With all due respect to the SIG for the great work it has done, I'll take my chances with the web architecture being somewhat more popular and successful than CRM, over a similar period of time :)

@steads

steads commented Jul 29, 2016

I rather assumed that you would follow some rules for coining your URIs and that these would be in the public domain. I did not realise that these would be secret in some way.
I am not advocating not using web architecture but harnessing it so that users and their pet machine based agents can find things out.

@azaroth42

azaroth42 commented Jul 29, 2016

They wouldn't be secret, but given an accession number, there's still the rest of the URI pattern to memorize for every institution. The web architecture explicitly states that URIs are to be treated as opaque strings (see https://www.w3.org/TR/webarch/#uri-opacity) and that agents should not try to infer information from them, or construct them by intuiting patterns.

Instead, I would expect users to search for the accession number and be taken to some web resource, that may or may not have that accession number as part of its URI. Hopefully in a system that understands what an accession number is, and where to find it in structured data, rather than just a free text search. Software agents don't care about the URI other than as a uniquely identifying string, so no worries on that side.
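
To make the "search, don't construct" approach concrete, here is a rough sketch of a typed identifier lookup. The class name, the normalization rule (trim and lowercase), and the sample URIs are assumptions for illustration; a real system would query a triple store or search index instead of an in-memory dictionary.

```python
from collections import defaultdict

ACCESSION_NUMBER = "aat:300312355"  # accession number
IDENTIFIER = "aat:300404626"        # identifier (identification number)


class IdentifierIndex:
    """Maps (identifier type, value) to resource URIs treated as opaque strings."""

    def __init__(self):
        self._by_key = defaultdict(set)

    @staticmethod
    def _normalize(value):
        # Assumed normalization: ignore surrounding whitespace and letter case.
        return value.strip().lower()

    def add(self, uri, id_type, value):
        self._by_key[(id_type, self._normalize(value))].add(uri)

    def find(self, id_type, value):
        # Returns a list because accession numbers are not always unique.
        return sorted(self._by_key.get((id_type, self._normalize(value)), ()))


if __name__ == "__main__":
    index = IdentifierIndex()
    # The URI does not contain the accession number; the index bridges the two.
    index.add("http://example.org/object/2210", ACCESSION_NUMBER, "G01.003")
    index.add("http://example.org/object/2210", IDENTIFIER, "2210")
    print(index.find(ACCESSION_NUMBER, "G01.003"))
```

Because the lookup is keyed on the identifier type as well as the value, a search for accession number "2210" will not accidentally match a database ID with the same digits.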

@steads

steads commented Jul 29, 2016

OK, that sounds good. So when I want to link to something I have the Accession Number for, all I have to do is search for the Accession Number, copy the URI from the resource I get taken to, and add that to my record to establish the link.
Where do I search? Is there a special museum-centric resource that will maintain this information, or am I just using generic web resources?

@azaroth42

azaroth42 commented Jul 29, 2016

Hopefully in a production version of the AAC's browse application! 😄 Or even better, in a structured-data aware Google. Google are increasing their ability to process more than text all the time, particularly through the schema.org ontology.

We're now wildly wildly off topic, but a mapping from the conceptual reference model to a set of more widely known classes and predicates rather than inventing yet another RDF ontology would, in my opinion, be much more successful. A discussion for 🍻 rather than text :)

@steads

steads commented Jul 29, 2016

🍻 Agreed. You should come to a SIG meeting and present your ideas and real-world solutions.
I suspect my problem is whenever I get to this point the solution is "in the cloud" and just around the corner. Whereas I can roughly see how to produce "http://gettymuseum.data.org/E42Identifier/G01.003" now.
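
As an illustration of the published coining rule described here, a small sketch follows. The `/E42Identifier/` path segment and the host come from the example URI quoted above; the function names and the choice to percent-encode every reserved character are assumptions, not an agreed convention.

```python
from urllib.parse import quote, unquote


def coin_identifier_uri(base, accession_number):
    """Build an identifier URI from a published accession number."""
    # safe="" percent-encodes "/" and spaces so the number stays one path segment.
    slug = quote(accession_number.strip(), safe="")
    return base.rstrip("/") + "/E42Identifier/" + slug


def accession_from_uri(uri):
    """Recover the accession number from a URI coined by the rule above."""
    return unquote(uri.rsplit("/", 1)[-1])


if __name__ == "__main__":
    uri = coin_identifier_uri("http://gettymuseum.data.org", "G01.003")
    print(uri)
    print(accession_from_uri(uri))
```

Note that this only works if the rule is published and the accession number is unique and stable, which is exactly what the rest of the thread debates.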

@VladimirAlexiev

VladimirAlexiev commented Aug 1, 2016

I think two threads are intermixed here (and a third can be added):

  1. whether to emit the internal Object id as an Identifier
  2. whether to use the Object id or the Accession number in the URL
  3. Which should be emitted as Preferred Identifier, the Object id or Accession number

About 1, I think that having more identifiers cannot hurt, while skipping a stable identifier may very well hurt.
Steve, a current example is Wikidata, eg see the bottom of the page describing this painting: https://www.wikidata.org/wiki/Q2267759. It has CONA ID and RKD Images ID: both of these are internal IDs. They give you links into these respective aggregators, which are extremely valuable both to human researchers, and for data integration.

Wikidata has ongoing coreferencing efforts to over 150 catalogs, and RKD Images is one of them, eg see https://tools.wmflabs.org/mix-n-match/?mode=catalog&catalog=29&offset=0&show_noq=0&show_autoq=0&show_userq=1&show_na=0#the_start.

Database IDs may not be found in ancient catalogs but may well be found in future catalogs. So let's emit them rather than skip them.

@steads

steads commented Aug 1, 2016

I have given my opinion about the utility in integration scenarios of mapping internal system IDs. If you intend to publish them for general consumption then they should be mapped as E42.

@workergnome

workergnome commented Aug 2, 2016

Regarding @VladimirAlexiev's question number two, I know at CMOA we decided not to use accession numbers because the institution was unwilling to guarantee that they were unique. We have instances where we have reused them (due to their having both semantic meaning and being used as an internal ID).

I'd use them as the preferred identifier though, since they are the most commonly (human) used identifier for the object.

@si-npg

Collaborator

si-npg commented Aug 2, 2016

Both ObjectID and Object Number (=accession number) are unique within our database. Object numbers can be changed, but that happens VERY rarely for accessioned objects.
ObjectID is not used anywhere outside of TMS. Currently, all our web URL's reference Object Number. If we map ObjectID, it will no longer be an exclusively internal field, but I agree that may be a good thing, since our accession number system is a little odd and leaves room for confusion. But I also agree that using the accession number (i.e. Object Number) as the preferred identifier is preferable.

@VladimirAlexiev

VladimirAlexiev commented Aug 3, 2016

all our web URL's reference Object Number

Then I vote those should also be used in your LOD URLs

@rhao

Contributor

rhao commented Sep 13, 2016

Most sheets only have ObjectID, not ObjectNumber. For example, NPGObjThesTerms2, NPGDimsParsedUpdate2May, and NPGObjTitles2. I found one sheet (there may be more) that only has ObjectNumber: NPGObjProvenance. So I think we need to continue to use ObjectID in the ObjectURI, since it seems to be the only way to connect back to the object.

Does that make sense? I'm not sure what everyone's consensus was on this issue. Here are my main questions:

  1. Is it safe to use "object/" + ObjectID as the ObjectURI, since almost every sheet uses ObjectID, not ObjectNumber, as the object identifier?
  2. Should I still map ObjectNumber? As an E42 Identifier, or otherwise?
  3. What should I do about NPGObjProvenance, since it's not tied to ObjectID?
@VladimirAlexiev

VladimirAlexiev commented Sep 14, 2016

  1. You could still use ObjectNumber, if you make a join to the main table. But using ObjectId that's available in all tables would be easier.
  2. Yes, map it. E42, and connect by has_preferred_identifier
  3. Post a separate issue. I for one don't remember examining this table
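
The join mentioned in point 1 might look roughly like the sketch below, which attaches ObjectID to rows of a sheet that only carries ObjectNumber. The column names `ObjectID` and `ObjectNumber` come from this thread; the `Provenance` column, the function name, and the sample rows are hypothetical. Rows whose ObjectNumber is missing or ambiguous in the main table are reported separately instead of being guessed.

```python
def attach_object_ids(objects_rows, sheet_rows):
    """Join sheet rows to the main Objects table on ObjectNumber."""
    number_to_ids = {}
    for row in objects_rows:
        number_to_ids.setdefault(row["ObjectNumber"], []).append(row["ObjectID"])

    joined, unmatched = [], []
    for row in sheet_rows:
        ids = number_to_ids.get(row["ObjectNumber"], [])
        if len(ids) == 1:
            # Exactly one match: safe to copy the ObjectID across.
            joined.append({**row, "ObjectID": ids[0]})
        else:
            # No match, or a reused ObjectNumber: needs manual review.
            unmatched.append(row)
    return joined, unmatched


if __name__ == "__main__":
    # Made-up sample data.
    objects = [
        {"ObjectID": 1, "ObjectNumber": "EX.1"},
        {"ObjectID": 2, "ObjectNumber": "EX.2"},
    ]
    provenance = [
        {"ObjectNumber": "EX.1", "Provenance": "Gift of a donor"},
        {"ObjectNumber": "EX.9", "Provenance": "Unknown"},
    ]
    joined, unmatched = attach_object_ids(objects, provenance)
    print(joined)
    print(unmatched)
```
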
@si-npg

Collaborator

si-npg commented Sep 14, 2016

Sorry for excluding ObjectID from the NPGObjProvenance table--that was just an oversight. Can it be taken from the Objects table, or should I re-export?

@caknoblock

Contributor

caknoblock commented Sep 14, 2016

If you can re-export the table with the ObjectID, that would be easiest.

@si-npg

Collaborator

si-npg commented Sep 15, 2016

@si-npg

Collaborator

si-npg commented Sep 15, 2016

Can the new export I just attached be easily uploaded by someone there?

@rhao

Contributor

rhao commented Sep 15, 2016

@si-npg - I just uploaded the file. Thanks for updating!
To sum up, I'm going to use ObjectID in the ObjectURI (question 1), I will map ObjectNumber as E42 (2), and ObjProvenance has been resolved with the new file (3).
