Determine ProvONE index fields #66

Closed
15 of 16 tasks
mbjones opened this issue Jan 5, 2015 · 6 comments

@mbjones
Member

mbjones commented Jan 5, 2015

The ProvONE model has many fields; determine which should be indexed in Solr, and how.

Discussion today led us to the following new fields for the Solr index:

  • prov_wasDerivedFrom
    • example: compiledData.1.1 prov_wasDerivedFrom data.1.1
    • example: figure.2.1 prov_wasDerivedFrom data.1.1
  • prov_generatedByProgram
    • inferred via prov:wasGeneratedBy, prov:qualifiedAssociation, prov:hadPlan
    • example: compiledData.1.1 prov_generatedByProgram rScript.1.1
  • prov_generatedByExecution
    • example: compiledData.1.1 prov_generatedByExecution execution.1.1 (this is the prov:wasGeneratedBy property)
  prov:used fields:
  • prov_usedByProgram, inferred via prov:used, prov:qualifiedAssociation, and prov:hadPlan
    • example: data.1.1 prov_usedByProgram rScript.1.1
  • prov_usedByExecution, which is the inverse of prov:used
    • example: data.1.1 prov_usedByExecution execution.1.1
  • prov:wasAssociatedWith maps to multiple fields: two that aggregate all user identifiers, and several for specific types of user identifier
    • prov_usedByUser
    • prov_generatedByUser
    • specific fields will be
      • prov_usedByOrcid
      • prov_usedByDataONEDN
      • prov_usedByFoafName
      • prov_generatedByOrcid
      • prov_generatedByDataONEDN
      • prov_generatedByFoafName
    • example: data.1.1 prov_usedByFoafName "Matthew Jones"
    • example: data.1.1 prov_generatedByOrcid http://orcid.org/0000-0003-0077-4738
  • prov_wasExecutedByExecution, which is inferred from prov:qualifiedAssociation and prov:hadPlan
    • example: rScript.1.1 prov_wasExecutedByExecution execution.1.1
  ProvONE type fields:
  • prov_instanceOfClass, multivalued, with each value stored as a URI string
    • similar to Ben's annotation field
    • Value would be a URI that represents an OWL class, with potential values such as the URIs for p1:User, p1:Program, p1:Visualization, prov:Plan, prov:Entity, p1:Execution, p1:Document
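A minimal sketch of how an indexer might derive several of the fields above by walking prov:wasDerivedFrom, prov:wasGeneratedBy, prov:used, prov:qualifiedAssociation, and prov:hadPlan. It assumes the statements have already been parsed into (subject, predicate, object) tuples with prefixed names, that the blank-node ID `_:assoc1` is illustrative, and that prov_instanceOfClass is populated from rdf:type; the helper names (`objects`, `subjects`, `plans_of`, `prov_fields`) are hypothetical and not part of any DataONE indexer.

```python
from collections import defaultdict

# Illustrative (subject, predicate, object) triples with prefixed names.
# Assumption: a real indexer would parse the package RDF instead.
TRIPLES = [
    ("compiledData.1.1", "prov:wasDerivedFrom", "data.1.1"),
    ("compiledData.1.1", "prov:wasGeneratedBy", "execution.1.1"),
    ("execution.1.1", "prov:used", "data.1.1"),
    ("execution.1.1", "prov:qualifiedAssociation", "_:assoc1"),
    ("_:assoc1", "prov:hadPlan", "rScript.1.1"),
    ("rScript.1.1", "rdf:type", "p1:Program"),
]


def objects(triples, s, p):
    """All o such that (s, p, o) is asserted."""
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]


def subjects(triples, p, o):
    """All s such that (s, p, o) is asserted."""
    return [s for (s, p2, o2) in triples if p2 == p and o2 == o]


def plans_of(triples, execution):
    """Programs reached via prov:qualifiedAssociation then prov:hadPlan."""
    return [
        plan
        for assoc in objects(triples, execution, "prov:qualifiedAssociation")
        for plan in objects(triples, assoc, "prov:hadPlan")
    ]


def prov_fields(triples, pid):
    """Derive the proposed prov_* Solr fields for a single object."""
    fields = defaultdict(list)
    fields["prov_wasDerivedFrom"] += objects(triples, pid, "prov:wasDerivedFrom")
    # Assumption: class membership is read from rdf:type statements.
    fields["prov_instanceOfClass"] += objects(triples, pid, "rdf:type")
    for ex in objects(triples, pid, "prov:wasGeneratedBy"):
        fields["prov_generatedByExecution"].append(ex)
        fields["prov_generatedByProgram"] += plans_of(triples, ex)
    for ex in subjects(triples, "prov:used", pid):
        fields["prov_usedByExecution"].append(ex)
        fields["prov_usedByProgram"] += plans_of(triples, ex)
    # Reverse walk: executions whose association had this object as its plan.
    for assoc in subjects(triples, "prov:hadPlan", pid):
        fields["prov_wasExecutedByExecution"] += subjects(
            triples, "prov:qualifiedAssociation", assoc
        )
    # Drop empty fields so the Solr document only carries asserted values.
    return {name: values for name, values in fields.items() if values}


if __name__ == "__main__":
    for pid in ("compiledData.1.1", "data.1.1", "rScript.1.1"):
        print(pid, prov_fields(TRIPLES, pid))
```

With these triples, compiledData.1.1 gets prov_generatedByProgram rScript.1.1, data.1.1 gets prov_usedByProgram rScript.1.1, and rScript.1.1 gets prov_wasExecutedByExecution execution.1.1, matching the examples in the list.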

Discussion: Do we use DataONE account URIs for Agent properties?

Yes, or any other supported identifier. The indexer will accept any of these values, then look up the other identifiers recorded for that person in the DataONE user portal and add those to the index as well. The model would use any of: hasOrcid, hasDN, foaf:name, etc. The RDF subject URI could be an anonymous blank node with each of these properties.
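A rough sketch of the agent lookup described above: given whichever identifier appears in the RDF, fill in the others from an account record, then populate both the type-specific fields and the aggregate prov_usedByUser / prov_generatedByUser field. The `ACCOUNTS` list is a hypothetical stand-in for the DataONE user portal, its identifiers are made-up placeholders, and `lookup_account` / `agent_fields` are illustrative names only.

```python
# Identifier types named in the field list; each maps to prov_<role>By<type>.
ID_TYPES = ("Orcid", "DataONEDN", "FoafName")

# Hypothetical stand-in for the DataONE user portal (placeholder data only).
ACCOUNTS = [
    {
        "Orcid": "http://orcid.org/0000-0000-0000-0001",
        "DataONEDN": "CN=Jane Example,O=Example Org,C=US",
        "FoafName": "Jane Example",
    },
]


def lookup_account(id_type, value):
    """Return all known identifiers for a person, or just the one given."""
    for account in ACCOUNTS:
        if account.get(id_type) == value:
            return account
    return {id_type: value}


def agent_fields(role, id_type, value):
    """Build prov_usedBy* or prov_generatedBy* fields for one agent.

    role is "used" or "generated"; id_type is one of ID_TYPES.
    """
    if role not in ("used", "generated"):
        raise ValueError(f"unknown role: {role}")
    account = lookup_account(id_type, value)
    fields = {}
    for t in ID_TYPES:
        if t in account:
            fields[f"prov_{role}By{t}"] = [account[t]]
    # The aggregate field collects every identifier, whatever its type.
    fields[f"prov_{role}ByUser"] = [account[t] for t in ID_TYPES if t in account]
    return fields


print(agent_fields("used", "FoafName", "Jane Example"))
```

An identifier the portal does not know is still indexed, just without the extra values filled in.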

@mbjones mbjones added this to the WBS 2.9.6 Provenance Index and Query Service milestone Jan 5, 2015
@mbjones
Member Author

mbjones commented Jan 12, 2015

Reviewed these properties on today's sem-prov call.

I have now incorporated these into the PROVAnnotation design documents in SHA 69ceea7.

Open a new enhancement ticket if additional properties should be created, or a bug if one of these should be changed or revised.

@mbjones mbjones closed this as completed Jan 12, 2015
@laurenwalker
Contributor

I propose we also index prov:used as "used."

@leinfelder
Contributor

Just a thought, but perhaps the index fields should be a little more distinguishable, e.g. with a namespace prefix: "prov_used".

The bare term - especially with a word like "used" - seems prone to confusion, collision, and misuse.

On Jan 15, 2015, at 4:59 PM, Lauren Walker wrote:

I propose we also index prov:used as "used."



@mbjones
Member Author

mbjones commented Jan 20, 2015

Good idea. We should also be clear about the namespace in the term definition.

@csjx
Member

csjx commented Feb 5, 2015

For the UI, we want to be able to indicate that a metadata document describing a data file has provenance information. We decided that producing two more fields in the index would help:

  • prov_hasSources
  • prov_hasDerivations

These would list the source PIDs and derivation PIDs for the data file this metadata describes. The UI will be able to display the total number of each based on the list size, along with an icon indicating that provenance information is available. I'll open another ticket to add these.
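One possible way to compute the two fields, assuming the prov:wasDerivedFrom statements are available as (derived, source) PID pairs. The function names are hypothetical, and `provenance_badge` only illustrates how the UI could use the list sizes for the counts and the icon.

```python
def source_and_derivation_fields(derived_from, described_pids):
    """Compute prov_hasSources / prov_hasDerivations for a metadata document.

    derived_from: (derived_pid, source_pid) pairs, one per prov:wasDerivedFrom.
    described_pids: PIDs of the data files the metadata document describes.
    """
    described = set(described_pids)
    sources, derivations = [], []
    for derived, source in derived_from:
        # A described file was derived from `source`, so `source` is upstream.
        if derived in described and source not in sources:
            sources.append(source)
        # Something was derived from a described file, so it is downstream.
        if source in described and derived not in derivations:
            derivations.append(derived)
    return {"prov_hasSources": sources, "prov_hasDerivations": derivations}


def provenance_badge(doc):
    """Illustrative UI summary: counts from list sizes plus an icon flag."""
    n_src = len(doc["prov_hasSources"])
    n_der = len(doc["prov_hasDerivations"])
    return {"sources": n_src, "derivations": n_der, "show_icon": n_src + n_der > 0}


# The prov_wasDerivedFrom examples from the original field list.
DERIVED_FROM = [
    ("compiledData.1.1", "data.1.1"),
    ("figure.2.1", "data.1.1"),
]

doc = source_and_derivation_fields(DERIVED_FROM, ["data.1.1"])
print(provenance_badge(doc))  # {'sources': 0, 'derivations': 2, 'show_icon': True}
```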

@csjx
Member

csjx commented Feb 5, 2015

To clarify where fields come from, we'll prefix them:
All fields will start with prov_ to differentiate them in the Solr index (and make searching easier for provenance-enabled data packages). I'll open another ticket to add the prefix to the names.
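A small sketch of applying the prefix convention when building Solr queries, so that a PROV term such as prov:used maps to the prov_used field suggested earlier in the thread. The helpers (`index_field`, `solr_clause`, `any_of`) are hypothetical; the quoting follows the Solr standard query parser's field:"phrase" syntax, and rScript.2.1 is a made-up PID for illustration.

```python
PREFIX = "prov_"


def index_field(term):
    """Map a PROV term ("prov:used", "used", "prov_used") to its Solr field name."""
    if term.startswith("prov:"):
        term = term[len("prov:"):]
    return term if term.startswith(PREFIX) else PREFIX + term


def solr_clause(term, value):
    """Build a field:"phrase" clause for the Solr standard query parser."""
    # Escape backslashes first, then double quotes, inside the quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index_field(term)}:"{escaped}"'


def any_of(term, values):
    """OR together clauses for several values of one field."""
    return " OR ".join(solr_clause(term, v) for v in values)


print(solr_clause("prov:used", "data.1.1"))  # prov_used:"data.1.1"
print(any_of("usedByProgram", ["rScript.1.1", "rScript.2.1"]))
```

Normalizing every term through one function keeps the bare names (like "used") from ever reaching the index, which is the collision risk raised above.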
