Use airbyte service account when creating pods #11697

Conversation

@mohamagdy (Contributor) commented Apr 4, 2022

What

When a source/destination pod is created, no service account is assigned (the default service account is used).

How

This PR changes that behaviour to use the airbyte service account, which helps avoid adding the AWS Access Key ID and Secret for managing AWS resources and allows using OIDC roles instead (wired to service accounts).
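
For illustration, here is a minimal sketch of what assigning a non-default service account to a job pod could look like with the Fabric8 Kubernetes client (assuming the worker builds pods with Fabric8; the pod name, image tag, and `airbyte-admin` account name are placeholders, not the exact values used by this PR):

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;

public class PodWithServiceAccountSketch {

  // Builds a pod spec that runs under a named service account instead of "default".
  // With EKS IAM Roles for Service Accounts (IRSA), the IAM role annotated on that
  // account can then be assumed by the connector container without static keys.
  public static Pod build(String serviceAccountName) {
    return new PodBuilder()
        .withNewMetadata()
          .withName("destination-redshift-job") // placeholder pod name
        .endMetadata()
        .withNewSpec()
          .withServiceAccountName(serviceAccountName) // e.g. "airbyte-admin" (placeholder)
          .addNewContainer()
            .withName("main")
            .withImage("airbyte/destination-redshift:dev") // placeholder image tag
          .endContainer()
          .withRestartPolicy("Never")
        .endSpec()
        .build();
  }
}
```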

Pre-merge Checklist

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • docs/SUMMARY.md
  • PR name follows PR naming conventions

@github-actions bot added the area/platform (issues related to the platform) and area/worker (related to worker) labels on Apr 4, 2022
@jrhizor (Contributor) left a comment

The orchestrator piece is currently only used in our cloud platform. For injecting arbitrary service accounts, it seems like we should be reading an environment variable with the name of the service account in EnvConfigs and then using that as an input to the KubeProcessFactory.
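
As a rough sketch of that shape (hypothetical class and method names, not the actual Airbyte signatures; the environment variable name is the one proposed later in this thread):

```java
import java.util.Optional;

// Hypothetical sketch of the suggestion above: EnvConfigs exposes the service account
// name read from the environment, and the process factory receives it as an input.
class EnvConfigsSketch {

  // Falls back to "default" so existing deployments keep today's behaviour.
  String getJobKubeServiceAccountName() {
    return Optional.ofNullable(System.getenv("CONNECTOR_KUBE_SERVICE_ACCOUNT_NAME"))
        .orElse("default");
  }
}

class KubeProcessFactorySketch {

  private final String serviceAccountName;

  KubeProcessFactorySketch(String serviceAccountName) {
    this.serviceAccountName = serviceAccountName;
  }

  // When launching a job pod, the factory would set
  // spec.serviceAccountName = this.serviceAccountName on the pod it creates.
}
```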

@mohamagdy does that sound reasonable to you?

@mohamagdy force-pushed the kube-pods-with-airbyte-service-account branch from f7acad8 to 27eda41 on April 5, 2022 16:17
@mohamagdy (Contributor, Author) commented

Hey @jrhizor, thanks for taking a look. Yeah, that makes total sense. I made the changes and would love to test them. Is there an easy way to test? I was thinking of building the airbyte-worker Docker image, pushing it to a private DockerHub account, and deploying a dummy Airbyte app on Kubernetes. What do you think?

@jrhizor (Contributor) commented Apr 5, 2022

@mohamagdy I left one comment. I think this is on the right track except for that one suggestion.

For testing Kubernetes-related changes, we generally run things locally using Docker for Desktop. https://docs.airbyte.com/contributing-to-airbyte/developing-on-kubernetes has some more info.

@davinchia (Contributor) commented

@mohamagdy I'm trying to understand what you are trying to do. The job pods use the default Kubernetes service account for added security, since the default Kube service account has permissions similar to an unauthenticated user. We don't want the job pods to assume anything about the runtime environment, since they should be runtime agnostic.

The orchestrator pod has higher permissions since it explicitly creates other job pods. However, this is a Cloud-only feature and not exposed to OSS users (for now).

With that context, what issue here are you trying to solve for?

@mohamagdy (Contributor, Author) commented

Hey @davinchia, what I want to introduce here is using service accounts with OIDC roles (https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html). A use case is the Redshift destination that writes to S3. In the Redshift destination, there is an option to add the AWS_ACCESS_KEY_ID and AWS_SECRET_KEY. This change avoids that and instead introduces roles the connector pods can assume. Right now, the connector pods (e.g. the Redshift destination pod) run with serviceAccount: default. Does that make sense?
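
To make the use case concrete, a minimal sketch (not code from the Redshift connector; the class name and region are illustrative) of how a connector could obtain S3 credentials through the pod's service account via the AWS SDK v2 default provider chain:

```java
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class IrsaS3ClientSketch {

  public static S3Client build() {
    // When the pod runs under a service account annotated with an IAM role (IRSA),
    // EKS injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into the container.
    // The default credentials provider chain picks these up, so no
    // AWS_ACCESS_KEY_ID / AWS_SECRET_KEY needs to be set in the connector config.
    return S3Client.builder()
        .region(Region.US_EAST_1) // illustrative region
        .credentialsProvider(DefaultCredentialsProvider.create())
        .build();
  }
}
```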

@davinchia (Contributor) commented Apr 7, 2022

@mohamagdy thanks for explaining. I'm not sure this is the right approach. This would also need destination-redshift connector changes to make the credential variables optional.

I'm guessing we want to do something like this. However, in that case, we want to be able to specify service accounts per connector and/or connection and not mount the same service account on all pods. This will require some thinking on our part, as we have to make sure the approach supports all our various deployment options, e.g. Docker, GKE, EKS, etc.

Is this blocking? I think the best way forward is to either comment on #5942 or create a separate issue focusing on the redshift destination. Either we discuss there and figure out a good solution before implementing or someone on our side can help implement this.

FYI @grishick any thoughts here?

@mohamagdy (Contributor, Author) commented

@davinchia thank you too for reviewing. Regarding the Redshift destination configuration, as far as I can see the AWS credentials (key and secret) are already optional. This works in non-K8s environments where the right AWS role is assumed, but on K8s, as the comment you posted notes, it requires OIDC. Luckily, from what I can see, this change applies to K8s deployments only and doesn't touch other deployments.

Is this blocking?

Well, we are trying to remove the AWS access key ID and secret from the Redshift connector (and everywhere else they are used), and this is one step of that cleanup, as using an AWS access key ID and secret might not be the best approach security-wise.

@davinchia (Contributor) commented

@mohamagdy you are right, these are already optional fields. Thanks for pointing that out.

The downside of this change is that the service account is applied to all pods, which is probably less secure than specifying the AWS key and secret in the config JSON. The right way to do this is to apply service account changes at the connector or connection level, but that's trickier since it requires our connector configs to interact with our process layer abstractions. This is why I think we should rethink this.

@mohamagdy (Contributor, Author) commented

@davinchia I see your point. You suggest having an option per connector to choose the service account to assume when the pod launches. I think this is a good idea, but it needs more work, as you mentioned. At the same time, I am not sure why using service accounts would be less secure than AWS credentials: the AWS credentials are both exposed and might be as powerful as the role that the service account would assume (that's what I'm thinking).

Also, creating a service account per connector requires some changes in the Helm charts to create service accounts on demand (currently I see a single service account for the whole chart: https://github.com/airbytehq/airbyte/blob/master/charts/airbyte/values.yaml#L25-L28).

@davinchia (Contributor) commented Apr 7, 2022

@mohamagdy it's less secure because, with our current setup, every pod will have the service account mounted, and thus access to Redshift/whatever roles the service account has. With the secret keys, only the pod running the job has access.

The creds are mounted per job pod and stored in the DB. What do you mean when you say the AWS credentials are exposed?

Also creating a service account per connector requires some changes in the HelmCharts to create service accounts on demand (currently I see a single service account for the whole chart

Yeap. I would actually leave this out of the chart and let OSS users manage this separately. Otherwise the chart will get complicated. We also have to figure out how to associate spec configuration with pod creation.

Because there are multiple pieces here, I suggest closing this PR and moving this to an issue.

@mohamagdy (Contributor, Author) commented Apr 7, 2022

I got your point. It will be the same "security" concern for the default airbyte-admin service account.

What I meant by "the AWS credentials are exposed" is that if anyone has access to the connector pods, they can easily read the AWS credentials.

And I agree regarding the complexity of adding "dynamic" service accounts in the Helm chart. This PR, from my point of view, adds the flexibility to choose between using AWS credentials or OIDC on K8s, by setting the CONNECTOR_KUBE_SERVICE_ACCOUNT_NAME environment variable and creating the right service account with the right roles attached.

@davinchia (Contributor) commented

I got your point. It will be the same "security" concern for the default airbyte-admin service account.

Yeap. We don't have this concern today because we don't assign any service account to the job pods besides the default account, which has no permissions. The orchestrator pod is special because it's in charge of creating other pods. There is less risk in assigning it a service account because its code is tightly controlled. Connector code typically has less strict requirements compared to core code, since anyone can and is encouraged to contribute a connector, and the Airbyte protocol also makes it easy to use custom connectors as long as they implement the protocol.

What I meant by the AWS credentials are exposed, if anyone has access to the connector pods, he can easily read the AWS credentials.

Agree. However, the changes presented here only help in the read-only Kube access case. If a user has write/edit access, these changes make it easier for someone to abuse credentials, since the service account is generally available in Kube (e.g. creating a pod vs. finding a completed pod and reading its creds). In addition, since we know this is not an approach we want to take, we would be adding tech debt that will make it more confusing for users in the future when we roll out a better solution.

@davinchia (Contributor) commented Apr 9, 2022

I am closing this and moving the discussion to #11866.

@davinchia closed this Apr 9, 2022
Labels: area/platform, area/worker, community, kubernetes