Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static keys in all json #811

Merged
merged 2 commits into from
Sep 19, 2016
Merged

Static keys in all json #811

merged 2 commits into from
Sep 19, 2016

Conversation

arlimus
Copy link
Contributor

@arlimus arlimus commented Jul 4, 2016

Why? Self-documenting API and data fields.

I have recently received the question: "Why do we need filenames and paths in those groups field?" to which I responded "They are actually IDs, it's just paths by convenience right now." Our current output format carries 2 inherent assumptions, which users might not understand: controls and groups are actually maps/dicts that connect IDs to their items.

Changes

This MR essentially turns:

      "groups": {
        "controls/meta.rb": {
          "title": "SSH Server Configuration",
          "controls": [
            "ssh-1"
          ]
        },
        "controls/example.rb": {
...

into

      "groups": [
        {
          "id": "controls/meta.rb",
          "title": "SSH Server Configuration",
          "controls": [
            "ssh-1"
          ]
        },
        {
          "id": "controls/example.rb",
...

and also changes:

      "controls": {
        "ssh-1": {
          "title": "Allow only SSH Protocol 2",
...

into

      "controls": [
        {
          "id": "ssh-1",
          "title": "Allow only SSH Protocol 2",
...

Added bonuses

  • Easier interaction with multiple tools like jq and elasticsearch
  • Easier javascript interaction

Negative points

  • Controls and groups are not unique by design. (with arrays you may have multiple entries with same IDs)
  • Any processing that looks for control and group IDs will need more work to find the right item (it is simpler to access via map/dict). e.g. connecting inspec exec --format json-min with inspec json

Please join the discussion below

@arlimus arlimus changed the title static keys in all json DISCUSS static keys in all json Jul 4, 2016
@chris-rock
Copy link
Contributor

Thanks @arlimus for bringing up that question. I try to self-answer some questions and feedback is welcome to get the appropriate output.

What is the purpose of the json format?

It is a data-exchange format

Then the next questions comes into my mind: What are the properties of a data-exchange format?

  • easy to consume eg. easy to parse file
  • data-integrity

Form file parsing perspective, I see the point that it is easier to iterate over an array of elements, on the other hand the access of single elements is way more difficult. Therefore the pros and cons are on the same level here.

Especially data-integrity is difficult to achieve (see https://en.wikipedia.org/wiki/Data_integrity). By using arrays instead of unique keys, users could manipulate the content and add multiple entries with the same id (had a great discussion with @arlimus about that topic). Now the application needs to handle all edge cases, therefore extra logic is required on top of the json parser. Data-integrity is a core property that we added in the first place, to ensure the data is not corrupted. This allows us to reduce the code that ensures the data is correct and leads to simpler and more secure application software. By relying on this safety mechanism of a json parser, its easy to parse the format in all languages. Now only sha sums are required to ensure the data has not been tampered with. Using arrays, would lead to more insecure application code.

Therefore I vote against the change.

@alexpop
Copy link
Contributor

alexpop commented Jul 5, 2016

@arlimus, profiles would also need to be converted to an array if we are to make this change

@arlimus
Copy link
Contributor Author

arlimus commented Jul 5, 2016

Great catch Alex, thank you!!

@mhedgpeth
Copy link

I would vote for data integrity over reportability. I think you keep the core data model as correct as possible and then transform it into easier to read formats for other tools as needed. I agree with @chris-rock against the change.

@arlimus
Copy link
Contributor Author

arlimus commented Jul 7, 2016

Let me grab that data-integrity argument for a sec: I don't believe the Array vs Hash discussion helps us in a significant way to improve or deteriorate data integrity. It is indeed true that additional entries can be added to arrays, that essentially lead to duplicate IDs in controls; However this is only a tiny piece of the landscape of integrity. For example, taking the argument that attackers might manipulate via ID's in arrays vs hashes: the scenario would permit manipulation via IDs either way, which could lead to manipulated entries in the Hash, just as much as in the Array (with the only difference being that the array protects against duplication).

If we want to protect data integrity for profiles and results (and we do, this is planned in the future) we should tackle it with measures that address it (e.g. signatures with integrity information).

@mhedgpeth
Copy link

@arlimus I wasn't thinking in terms of attack protection but in terms of ensuring that you have uniqueness of keys, so data integrity of the keys and no duplication. I think you're right that the signing and protection-oriented features are outside of the scope of this discussion.

@chris-rock chris-rock added this to the 1.0.0 milestone Aug 8, 2016
@chris-rock chris-rock modified the milestones: 1.0.0, 0.31.0 Aug 8, 2016
arlimus added a commit that referenced this pull request Aug 8, 2016
(1) The field is not yet optimal, the calculations are great!
(2) Changing this field should go together with all other breaking json changes, especially if #811 results in a change.
@chris-rock
Copy link
Contributor

In order to find a solution for the question, we decided to treat our CLI JSON results similar to API endpoints. In order to increase the quality and stability, we played with the latest version of JSON Schema v4. It is very difficult to verify hash objects, but easy to verify array objects.

In addition to the schema verification, dealing with arrays is easier in Javascript and jq.

Therefore I change my opinion from Hash to Array.

@chris-rock chris-rock modified the milestones: 0.31.0, 0.32.0 Aug 18, 2016
@arlimus arlimus added the Type: RFC Community survey for a proposal label Aug 22, 2016
@arlimus arlimus modified the milestones: 1.0.0, 0.32.0 Aug 22, 2016
@arlimus
Copy link
Contributor Author

arlimus commented Sep 14, 2016

Decision

Decision for 1.0 JSON changes regarding this discussion:

{
  "version": "0.34.1",
  "profiles": {
    "profile": {
      "controls": {
        "ssh-1": {
~
      "groups": {
        "controls/meta.rb": {
~
        },
~

changes to:

{
  "version": "0.34.1",
  "profiles": [
    {
      "name": "profile",  // only name here
~
      "controls": [
        {
          "id": "control-xyz",
~
      "groups": [
        {
          "id": "controls/meta.rb",
~
        },
~

i.e. all hash-of-hashes change to array-of-hashes

@arlimus arlimus modified the milestones: 0.35.0, 1.0.0 Sep 14, 2016
@arlimus arlimus added the ready label Sep 15, 2016
@arlimus arlimus self-assigned this Sep 15, 2016
@arlimus arlimus force-pushed the dr/json-arrays branch 2 times, most recently from 7946a1e to 8e06e8d Compare September 16, 2016 23:20
@arlimus arlimus changed the title DISCUSS static keys in all json Static keys in all json Sep 17, 2016
@chris-rock chris-rock modified the milestones: 0.35.0, 0.36.0 Sep 19, 2016
@arlimus arlimus force-pushed the dr/json-arrays branch 3 times, most recently from c34d4ad to 83d4aac Compare September 19, 2016 11:16
@chris-rock
Copy link
Contributor

Thanks to all, this was a great discussion. We decided to switch to an array based-solution, because it makes handling in various languages a lot easier.

💯 @arlimus

@chris-rock chris-rock merged commit 8564e2d into master Sep 19, 2016
@chris-rock chris-rock deleted the dr/json-arrays branch September 19, 2016 14:57
@chris-rock chris-rock modified the milestones: 0.36.0, 1.0.0 Sep 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: RFC Community survey for a proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants