Introduce fields in schema to describe synthetic data #64

Merged: 10 commits, May 10, 2024

Conversation

@WardLT (Contributor) commented May 8, 2024

Our initial schemas focused only on data from battery cycling experiments. This PR is a first step towards expanding the types of data and adds fields associated with describing modeling data.

While I'm at it, I'm starting to map the fields I create to the BattINFO ontology (see #30).

Fixes #62

To do list:

  • Check with @jsimonclark that our approach to mapping schema fields isn't terrible. We can go through further refinements later
  • Capture software name, type and version number

@coveralls commented May 8, 2024

Pull Request Test Coverage Report for Build 9006896709

Details

  • 71 of 73 (97.26%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+1.0%) to 83.0%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| batdata/schemas/ontology.py | 43 | 45 | 95.56% |

Totals (coverage status)
Change from base Build 8989303579: 1.0%
Covered Lines: 747
Relevant Lines: 900

💛 - Coveralls

@WardLT (Contributor, Author) commented May 8, 2024

@jsimonclark, if you (or someone else from BattINFO) do get a chance to look at this, here are the two parts of relevance:

@jsimonclark

@WardLT, this is looking really neat!

It looks like you are using BattINFO as a source for things like labels and elucidations, and as a "see also" reference for some of your fields. The mapping is done by adding an iri field to the class description, as you did for nominal_capacity. Is that a correct summary?
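For readers unfamiliar with the pattern, attaching an IRI to a schema field might look roughly like the sketch below. It uses standard-library dataclasses as a stand-in and is not the PR's actual code; the class name, the `iri_mapping` helper, and the `example.org` IRI are all placeholders (the real BattINFO IRI is not given in this thread).

```python
from dataclasses import dataclass, field, fields
from typing import Dict, Optional

# Placeholder namespace: NOT a real BattINFO identifier
PLACEHOLDER_NS = "https://example.org/battinfo#"


@dataclass
class BatteryDescription:
    """Hypothetical metadata class, for illustration only."""

    # A field without an ontology mapping
    manufacturer: Optional[str] = field(
        default=None,
        metadata={"description": "Manufacturer of the battery"},
    )
    # A field that carries an "iri" entry pointing at an ontology term
    nominal_capacity: Optional[float] = field(
        default=None,
        metadata={
            "description": "Rated capacity of the battery",
            "iri": PLACEHOLDER_NS + "NominalCapacity",
        },
    )


def iri_mapping(cls) -> Dict[str, str]:
    """Collect {field name: IRI} for every field that declares an IRI."""
    return {f.name: f.metadata["iri"] for f in fields(cls) if "iri" in f.metadata}


mapping = iri_mapping(BatteryDescription)
```

Only fields that declare an `iri` appear in the mapping, so unmapped fields such as `manufacturer` can be added to the ontology incrementally.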

This is a good approach to start integrating BattINFO terms. The real power of RDF comes when you start making graphs, which opens things up for semantic querying via SPARQL, machine-readable linking with other sources, and easier automation. JSON-LD is usually my recommended way of doing that, because it is easy to understand and can build on existing JSON infrastructure. Let me know if you'd like to discuss and we can set up a call.

@jsimonclark

Also, for bibliographic and general metadata, keep resources like schema.org and dcterms in mind. I'm especially a fan of schema.org because it strikes a good balance of simplicity and features - and it helps with findability in Google or other search engines. For example, here you could map to schema:manufacturer.
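As a rough illustration of the JSON-LD suggestion, an existing JSON record can be turned into linked data by adding an `@context` that maps its keys to IRIs. The sketch below is an assumption-laden example, not batdata code: `to_jsonld` and the record values are hypothetical, and the `example.org` IRI is a placeholder for whatever BattINFO term applies. Only `https://schema.org/manufacturer` is a real term.

```python
import json
from typing import Any, Dict


def to_jsonld(record: Dict[str, Any], term_iris: Dict[str, str]) -> Dict[str, Any]:
    """Wrap a plain JSON record with an @context mapping its keys to IRIs."""
    # Only include context entries for keys that are present in the record
    context = {key: iri for key, iri in term_iris.items() if key in record}
    return {"@context": context, **record}


term_iris = {
    "manufacturer": "https://schema.org/manufacturer",
    # Placeholder IRI: NOT a real BattINFO identifier
    "nominal_capacity": "https://example.org/battinfo#NominalCapacity",
}

# Hypothetical metadata record
record = {"manufacturer": "Example Cells Inc.", "nominal_capacity": 1.1}

doc = to_jsonld(record, term_iris)
print(json.dumps(doc, indent=2))
```

The payload keys stay unchanged, so existing JSON tooling keeps working while RDF-aware tools can resolve each key through the context.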

Contributor commented:

I don't think I understand what this is used for...

Contributor commented:

I think I get it now, oops...

@victorventuri (Contributor)

I think this looks fine, but I have not tested any of it. My main question is: we can specify the model that generated synthetic data in the metadata, but we still have multiple cells per HDF5 file, so how do we attribute cell-specific details to each cell? For instance, in the main synthetic data we are looking at, the degradation parameters vary between the cells we store. Ideally, we would be able to recover what those are. Should there be a sub-metadata for each cell as well?
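One possible shape for such per-cell sub-metadata is sketched below, purely to make the question concrete. Every name here (`CellMetadata`, `DatasetMetadata`, `parameters_for`, `capacity_fade_rate`, the model name, and the values) is hypothetical, and this is not a design adopted in the PR.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CellMetadata:
    """Hypothetical per-cell record holding cell-specific parameters."""

    degradation_parameters: Dict[str, float] = field(default_factory=dict)


@dataclass
class DatasetMetadata:
    """Hypothetical file-level metadata shared by all cells."""

    model_name: str
    # Maps a cell index within the file to that cell's own metadata
    cells: Dict[int, CellMetadata] = field(default_factory=dict)

    def parameters_for(self, cell_id: int) -> Dict[str, float]:
        """Recover the degradation parameters used for one cell."""
        return self.cells[cell_id].degradation_parameters


metadata = DatasetMetadata(
    model_name="example-degradation-model",
    cells={
        0: CellMetadata({"capacity_fade_rate": 0.010}),
        1: CellMetadata({"capacity_fade_rate": 0.015}),
    },
)
```

Keeping the shared model description at the file level and only the varying parameters per cell avoids duplicating the model metadata for every cell.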

@WardLT (Contributor, Author) commented May 10, 2024

@victorventuri, I agree that we've got some holes in the implementation of "multiple cells per HDF5" and I've opened an issue to record discussion on that: #67.

I'll go ahead and merge this PR then work on the multi-cell support in another branch.

@WardLT WardLT merged commit de9c0ea into main May 10, 2024
2 checks passed
@WardLT (Contributor, Author) commented May 10, 2024

Thanks for the feedback, @jsimonclark! You've got my goals right.

I'll see if I can link to dcterms while I continue fleshing out ontology support.

I will probably need to talk to you more about taking the next step towards better RDF/JSON-LD support. That's a new technology stack for me and I would be glad to learn from you!

Successfully merging this pull request may close these issues.

Metadata fields related to synthetic data
4 participants