[Feature request]: Homogenization of data structures and physical representations #104

laserkelvin · 2024-01-20T00:20:43Z

Feature/behavior summary

To ensure consistency in modeling, each dataset in Open MatSciML Toolkit should provide uniform (or near-uniform) kinds of data. For example, it should be clear whether the coordinates provided are fractional or Cartesian, and every dataset should carry enough information to represent each data sample in a physically meaningful way, such as periodic boundary conditions (for use in e.g. shift vectors).
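To illustrate why the coordinate convention and the cell both matter, here is a minimal sketch of the fractional-to-Cartesian conversion. It assumes the cell stores the three lattice vectors as rows; the function name `frac_to_cart` and the example cell are hypothetical and not taken from the toolkit.

```python
from typing import List, Sequence

Vector = Sequence[float]


def frac_to_cart(frac: Vector, cell: Sequence[Vector]) -> List[float]:
    """Convert one fractional coordinate to Cartesian.

    Assumes `cell` holds the three lattice vectors as rows, so that
    cart = f_a * a + f_b * b + f_c * c.
    """
    return [sum(frac[i] * cell[i][k] for i in range(3)) for k in range(3)]


# Hypothetical orthorhombic cell with edge lengths 4, 5 and 6 (arbitrary units).
cell = [[4.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 6.0]]
print(frac_to_cart([0.5, 0.5, 0.5], cell))  # [2.0, 2.5, 3.0]
```

Without the cell, the fractional position above cannot be placed in space, which is why a sample that only stores fractional coordinates is not enough on its own.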

Request attributes

Would this be a refactor of existing code?
Does this proposal require new package dependencies?
Would this change break backwards compatibility?
Does this proposal include a new model?
Does this proposal include a new dataset?
Does this proposal include a new task/workflow?

Related issues

No response

Solution description

A good place to start would be to make sure each devset, and subsequently any serialized datasets we have, conform to the following:

Check whether the coordinates are fractional or not (if there are values outside of 0 and 1, then they're likely Cartesian)
Check that we have enough information to create a Lattice object; this can be just a cell key, or the lattice parameters, as in Materials Project
Print and list out the keys in each sample and construct a table of them, so that we can help contribute to [Feature request]: Standardized data structure for datasets #97
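The three checks above could be sketched roughly as follows. This is only an illustration using plain dictionaries as samples: the key names (`pos`, `cell`, `a`, `b`, `c`, `alpha`, `beta`, `gamma`), the dataset names, and all function names are assumptions, not the toolkit's actual schema or API. Note that the coordinate check is a one-way heuristic: values outside [0, 1] suggest Cartesian coordinates, but values inside that range do not prove the coordinates are fractional.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Sequence

# Hypothetical key names; real datasets may use different ones.
CELL_KEYS = ("cell",)
LATTICE_PARAM_KEYS = ("a", "b", "c", "alpha", "beta", "gamma")


def looks_fractional(coords: Sequence[Sequence[float]], tol: float = 1e-6) -> bool:
    """Return True if every value lies in [0, 1] (within a small tolerance)."""
    return all(-tol <= x <= 1.0 + tol for row in coords for x in row)


def has_lattice_info(sample: Mapping[str, Any]) -> bool:
    """Return True if the sample has a cell key or a full set of lattice parameters."""
    if any(key in sample for key in CELL_KEYS):
        return True
    return all(key in sample for key in LATTICE_PARAM_KEYS)


def survey_keys(datasets: Mapping[str, Iterable[Mapping[str, Any]]]) -> Dict[str, List[str]]:
    """Map each sample key to the sorted names of the datasets that contain it."""
    table = defaultdict(set)
    for name, samples in datasets.items():
        for sample in samples:
            for key in sample:
                table[key].add(name)
    return {key: sorted(names) for key, names in sorted(table.items())}


# Toy data standing in for two devsets.
datasets = {
    "dataset_a": [{"pos": [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
                   "cell": [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]}],
    "dataset_b": [{"pos": [[0.0, 0.0, 0.0], [1.2, 3.4, 0.7]], "energy": -1.0}],
}

for key, names in survey_keys(datasets).items():
    print(f"{key:<8} {', '.join(names)}")
```

The dictionary returned by `survey_keys` is effectively the key table: one row per key, listing which datasets provide it, which makes gaps such as a missing cell easy to spot.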

We should also check other projects, like ColabFit, to see to what extent we can conform to community standards, too.

Additional notes

Can't assign Bin yet, but it would be good for Bin to aggregate information, and for him and @melo-gonzo to help craft PRs to address things after the survey is done.

bmuaz · 2024-01-23T04:23:15Z

I had the same thoughts about the data structures and will be happy to work on it with Carmelo.

laserkelvin added the good first issue, data, and code maintenance labels Jan 20, 2024

laserkelvin assigned melo-gonzo Jan 20, 2024

melo-gonzo mentioned this issue May 16, 2024: [Feature request]: Reconciling multi task models with ase Calculator interface. #217 (Closed)

laserkelvin mentioned this issue Nov 12, 2024: Introducing structured model outputs #316 (Merged)
