Skip to content

Conversation

@mfranceschit
Copy link
Contributor

  • Created a common interface for AWS schemas following the pattern at the GCP and Azure providers.
  • Required changes to enable new RulesEngine optimization.

arn: String! @id @search(by: [hash])
accountId: String! @search(by: [hash, regexp])
region: String @search(by: [hash, regexp])
cgId: String @search(by: [hash])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not required? why do we need it in connections then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why. As I told you, it is an uncommon case, but it exists. It appears so infrequently that it is better to leave it that way instead of creating a new base interface that only we will use once or twice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see! thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this cg id for cloudtrail causes us a lot of issues. I wonder if it makes sense to reconsider our solution here. 🤔 Idk what we could do outside of limiting cloudtrails to only show up in the account they belong to. (I believe this is only an issue for cloudtrails in multi accounts)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aacotroneo We decided to use the id and remove the cgId to reduce edge cases like cloudtrail.

@hjaraujof
Copy link
Contributor

hjaraujof commented Apr 4, 2022

⚠️
Please keep in mind that with this directive(found on the base interfaces):

@generate(
    query: { get: false, query: false, aggregate: false }
    mutation: { add: false, delete: false }
    subscription: false
) 

you are purposely disabling all query, mutation, and subscription GraphQL APIs to be able to implement it on a service type(dodge a couple of DGraph errors). The directive is inherited as well.

You should re-enable all the query and the add mutation APIs for the main type for every one of the services that are implementing any of the base interfaces.

accountId: String! @search(by: [hash])
arn: String! @id @search(by: [hash, regexp])
region: String @search(by: [hash, regexp])
cgId: String @search(by: [hash, regexp])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to add this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we're trying to standardize the common attributes that an AWS schema could have. Notice that is nullable because for this service the cgId doesn't perform a role. To keep the solution agnostic at the RulesEngine, we're using the extraFields field to create connections, take a look here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if its a standard field would it not be in the base schema and not here? I am just wondering why it is being added here but not to other schemas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would this change the 'data contract'? At EP, we assume that every resource has a unique id. is it not possible to keep that nice property? (we'll need to code special cases to deal with this too)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aacotroneo sadly the reason is due mainly to cloudtrail. because cloudtrail can live in multiple accounts (but still has the arn of the account where it was created), our multi account scanning had an issue. Either create a new id (which we did which is cgId) or only show the cloudtrail in the account it "belongs" to. (which I considered but was suggested from @ckoning that users will want to see it in both accounts as it would show in both in the management console). I am wondering if we want to reconsider this decision. Currently we use cgId because cloudtrail does have an id and we thought it may be confusing to users if the id field for cloudtrail in CG was NOT the actual cloudtrail id. Its a difficult problem with no easy solutions (outside of composite ids which dgraph doesnt have). @ckoning now that this problem has had some time to sit, what do you think would be best here?

Copy link
Contributor

@aacotroneo aacotroneo Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that's an unneeded restriction, and it's harder to reason about as a user of CG:

contract 1:

id - an opaque string that uniquely identifies a resource across accounts

contract 2:

id - a value that corresponds directly to an aws id. when that value is not unique across accounts refer to cgID
cgID: an opaque string that uniquely represents a resource across accounts, only meaningful when IDs clash

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I am not a fan of the cgId paradigm. I would prefer to not have to use it. But idk if the trade off of losing the value of contract 2 (part 1) would be worth moving to contract 1. I preferred only having the cloudtrail show up in the account that owned it (although I understand the drawbacks of that as well). Moving the random string to the id also brings with it another issue. The cgId value is only initialized in the format function, but the connections functions deal with the raw data. So the connections files would have no access to the value needed to form connections to the cloudtrail service. I suppose we could initialize it in the data function... 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer contract 1. It's easier to implement and it follows a clear pattern, especially because of Dgraph limitations. That opaque string could be an ARN for most of the cases and a unique identifier for those services that don't have one. I'm leaning toward standardized how providers behave between them, which would allow keeping our paradigm agnostic, even move forward with the code generators projects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets chat on a huddle for this so we can discuss pros and cons.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like we are going to use a generated string as the id for cloudtrail instances after a convo with @mfranceschit and see how that goes.

ckoning added 2 commits April 5, 2022 08:56
Release Github Actions pipelines to Beta channel
Releases Github Actions pipelines to main branch
@mfranceschit mfranceschit changed the base branch from main to alpha April 6, 2022 14:31
@tyler-dunkel tyler-dunkel self-requested a review April 6, 2022 20:08
@tyler-dunkel tyler-dunkel merged commit 1ad58e3 into alpha Apr 6, 2022
@hjaraujof hjaraujof deleted the feature/CG-1026 branch April 6, 2022 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants