This repository has been archived by the owner on Aug 5, 2021. It is now read-only.

Final Schema Definition? #196

Closed
IanLee1521 opened this issue Dec 1, 2016 · 9 comments

@IanLee1521

Has there been any finalization in the schema (#41)? Currently there are some discrepancies between:

Specifically, there are differences in formats, but also in the fields being used (e.g. govwideReuseproject vs governmentWideReuseProject, projectTags vs tags).
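To make the field-name discrepancy concrete, here is a minimal sketch of how a consumer might normalize the two spellings mentioned above. It assumes the second name in each pair (governmentWideReuseProject, tags) is the one to keep, which this thread has not settled; `FIELD_ALIASES` and `normalize_project` are hypothetical names used only for illustration.

```python
# Assumption: the second spelling in each pair is treated as canonical.
# Only the two pairs named in this issue are listed.
FIELD_ALIASES = {
    "govwideReuseproject": "governmentWideReuseProject",
    "projectTags": "tags",
}


def normalize_project(project: dict) -> dict:
    """Return a copy of `project` with aliased keys renamed."""
    normalized = {}
    for key, value in project.items():
        # Keys without an alias are copied through unchanged.
        normalized[FIELD_ALIASES.get(key, key)] = value
    return normalized
```

A scraper could run each project through this before merging entries from sources that use different spellings.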

Additionally, I believe there are still some open questions over how to generate a multi-organization code.json file in an agency (see: #187). Perhaps the sample code.json file should be updated to be multi-agency / multi-organization? I can mock up a pull request that would demonstrate that (at least based on opinions / discussion over on #187).

Another related concern, which I created a separate issue for (#195), is about project tags, and whether we should curate the list of possible tags versus allowing anything.

These are questions I've had as I develop @llnl/scraper while generating responses for the 120-day deadline.

@jasonduley

jasonduley commented Dec 1, 2016

@IanLee1521 thanks for mentioning this. We'd like to have the ability to group multiple organizations, with a project array of repositories for each.

IanLee1521 added a commit to IanLee1521/code-gov-web that referenced this issue Dec 1, 2016
@IanLee1521

@jasonduley -- Sure! As I described on #187 (comment), I envision something where the final code.json file is a list of agency + organization + array_of_projects objects:

[{
    "agency": "ABC",
    "organization": "FOO",
    "projects": []
}, {
    "agency": "ABC",
    "organization": "BAR",
    "projects": []
}, {
    "agency": "XYZ",
    "organization": "BAZ",
    "projects": []
}]

That's been what I've been picturing at least. 😄
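As a rough sketch of how a generator could produce that layout, the snippet below groups flat project records by agency and organization. The input shape (each record carrying its own "agency" and "organization" keys) and the `group_projects` helper are assumptions for illustration, not something the current tooling is known to do.

```python
import json
from collections import defaultdict


def group_projects(projects):
    """Group flat project records into agency + organization entries."""
    grouped = defaultdict(list)
    for project in projects:
        key = (project["agency"], project["organization"])
        # Keep everything except the grouping keys inside the project object.
        body = {
            k: v for k, v in project.items()
            if k not in ("agency", "organization")
        }
        grouped[key].append(body)
    # Dicts preserve insertion order, so groups appear in first-seen order.
    return [
        {"agency": agency, "organization": org, "projects": items}
        for (agency, org), items in grouped.items()
    ]


# Hypothetical sample records.
flat = [
    {"agency": "ABC", "organization": "FOO", "name": "alpha"},
    {"agency": "ABC", "organization": "BAR", "name": "beta"},
    {"agency": "ABC", "organization": "FOO", "name": "gamma"},
]
print(json.dumps(group_projects(flat), indent=4))
```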

@IanLee1521

IanLee1521 commented Dec 1, 2016

FWIW, I went ahead and took a crack at an updated metadata schema based on many of the discussions we've been having: master...IanLee1521:project-vs-projects

Figured that talking in code (documentation?) would be the easiest way to discuss any updates / changes going forward.

@jfredrickson5

+1 for @IanLee1521's proposed schema with an effective way to group by agency/org.

I've been looking at code.json files that several agencies have posted to their websites. There is a mix of various formats in use. Although we've posted our code.json file per the OMB requirement, I'm unsure if we are using the correct format. It would be great if we could have a formal schema so that we can automatically validate our data.

A formal schema with a defined id would also allow for easy migration between revisions of the schema. Some agencies could be using an older schema while others have already migrated to a current one, and Code.gov would be able to support all of them at once.
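A minimal sketch of what supporting several revisions at once could look like is below. Everything in it is hypothetical: the "schemaId" key, the two id strings, the "entries" wrapper, and the per-revision checks are placeholders, and a real setup would more likely validate against a published JSON Schema document than against hand-written functions.

```python
def validate_v1(doc):
    """Placeholder check for an older revision: a flat list of projects."""
    return isinstance(doc.get("projects"), list)


def validate_v2(doc):
    """Placeholder check for a newer revision: agency/org grouped entries."""
    entries = doc.get("entries")
    required = {"agency", "organization", "projects"}
    return isinstance(entries, list) and all(
        isinstance(entry, dict) and required <= set(entry)
        for entry in entries
    )


# Hypothetical schema ids mapped to the check for that revision.
VALIDATORS = {
    "code-json/1": validate_v1,
    "code-json/2": validate_v2,
}


def validate(doc):
    """Pick the validator that matches the document's declared schema id."""
    schema_id = doc.get("schemaId")
    validator = VALIDATORS.get(schema_id)
    if validator is None:
        raise ValueError(f"unsupported schema id: {schema_id!r}")
    return validator(doc)
```

With a declared id, an agency still on the older layout and one on the newer layout can both be checked by the same entry point.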

@IanLee1521

Hi @jfredrickson5, good points. I just updated the branch / pull request (#200) to convert list -> array and integer -> number, which appear to be the official primitive types in the formal schema.

One side thought I have. If we are going to move in the direction of something more formal, would it make sense to convert the openSourceProject and governmentWideReuseProject fields from numbers to booleans? To me that would make more sense to someone stumbling upon the project out of the blue.
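For existing files, the conversion could be as small as the sketch below. It assumes the numeric values in use are 0 and 1, which is my reading and not something confirmed here; `numbers_to_booleans` is a hypothetical helper.

```python
# The two fields named above.
BOOLEAN_FIELDS = ("openSourceProject", "governmentWideReuseProject")


def numbers_to_booleans(project: dict) -> dict:
    """Return a copy of `project` with the 0/1 fields converted to booleans."""
    converted = dict(project)
    for field in BOOLEAN_FIELDS:
        if field not in converted:
            continue
        value = converted[field]
        # Assumption: only 0 and 1 are valid; anything else is rejected.
        if value not in (0, 1):
            raise ValueError(f"{field} must be 0 or 1, got {value!r}")
        converted[field] = bool(value)
    return converted
```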

@jasonduley

we are hoping to make the attribute "governmentWideReuseProject" an enum instead of a boolean. For NASA we have the following states:

  • is approved for government-wide reuse
  • is not approved for government-wide reuse
  • is pending review

the first two are available now with the boolean value, but as we add projects into our code.json and those projects have yet to enter our software release process, we'd need the 3rd option.
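Roughly, the three states could be modeled like the sketch below. The enum member names and string values are made up for illustration, and mapping a missing value to "pending review" is an assumption about how existing entries would be migrated.

```python
from enum import Enum


class ReuseStatus(Enum):
    """Hypothetical values for governmentWideReuseProject."""
    APPROVED = "approved"
    NOT_APPROVED = "notApproved"
    PENDING_REVIEW = "pendingReview"


def from_legacy_boolean(value):
    """Map the current boolean onto the first two states.

    Assumption: a missing value (None) means the project has not yet
    gone through the release process, so it is treated as pending.
    """
    if value is None:
        return ReuseStatus.PENDING_REVIEW
    return ReuseStatus.APPROVED if value else ReuseStatus.NOT_APPROVED
```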

@IanLee1521

Nod, that seems reasonable to me.

Another possible one I heard, talking with some folks at LLNL recently, was the idea of an "inventoried but release pending publication" or something similar, to support scientists that create code as part of preparation for a research paper submission, which has not yet been submitted / accepted / published.

This is a case that I don't recall having seen anywhere, so might require some further discussion.

lukad03 pushed a commit that referenced this issue Dec 12, 2016
@jqnatividad

The team should consider adding more metadata detailing security certifications of a given version of an OSS project, leveraging nvd.nist.gov and cve.mitre.org.

We're currently deploying CKAN at several govt agencies at the state level, and we find ourselves having to go through an expensive security regimen repeatedly, which adds to the cost of the project.

@DanielJDufour

DanielJDufour commented Dec 4, 2018

Thanks for the comments everyone. As this conversation is quite old and we're no longer tracking issues here (we're about to archive this repo), I'm going to close this. Feel free to open up new discussion on the schema at https://github.com/GSA/code-gov/issues . Thanks! :-)
