Adding GPU E2E Test #171

Closed
wants to merge 8 commits into from

Paramadon (Contributor) commented May 9, 2024

Issue #, if available:
There is currently an integration test for GPU, but no end-to-end (e2e) test.
Description of changes:
The main difference from the integration test is that this e2e test installs the EKS add-on instead of mocking the DCGM exporter and its DaemonSets/Services.

Steps in the e2e test:

  1. Create a beta cluster with node groups that use a GPU instance type and a GPU AMI.
  2. Install the EKS add-on.
  3. Run the GPU test.
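The three steps above can be sketched as an ordered command plan. This is a minimal Go sketch under stated assumptions, not the PR's actual workflow. The `go test` flags match the command quoted in the review hunk further down this conversation. The Terraform directory, cluster name, and test path are hypothetical. Using `aws eks create-addon` with the `amazon-cloudwatch-observability` add-on for step 2 is also an assumption, since the PR only says "install the EKS add-on". The function only builds the commands and does not execute them.

```go
package main

import (
	"fmt"
	"strings"
)

// e2eConfig holds the inputs for one e2e run.
// All field values used below are hypothetical examples.
type e2eConfig struct {
	TerraformDir string // hypothetical: directory of the EKS GPU Terraform module
	ClusterName  string // hypothetical cluster name
	TestDir      string // hypothetical Go test package path
	GPUType      string // value for -eksGpuType, e.g. "nvidia"
}

// e2ePlan returns the ordered commands for the three e2e steps.
// It builds argument slices only; nothing is executed here.
func e2ePlan(c e2eConfig) [][]string {
	return [][]string{
		// Step 1: create the beta cluster with GPU node groups (GPU instance type + GPU AMI).
		{"terraform", "-chdir=" + c.TerraformDir, "apply", "-auto-approve"},
		// Step 2: install the EKS add-on (assumed: amazon-cloudwatch-observability via the AWS CLI).
		{"aws", "eks", "create-addon", "--cluster-name", c.ClusterName, "--addon-name", "amazon-cloudwatch-observability"},
		// Step 3: run the GPU test against the real add-on, with the flags used in this PR.
		{"go", "test", c.TestDir, "-eksClusterName", c.ClusterName, "-computeType=EKS", "-v",
			"-eksDeploymentStrategy=DAEMON", "-eksGpuType=" + c.GPUType},
	}
}

func main() {
	plan := e2ePlan(e2eConfig{
		TerraformDir: "terraform/eks/gpu",
		ClusterName:  "cwagent-gpu-e2e",
		TestDir:      "./test/gpu",
		GPUType:      "nvidia",
	})
	for i, cmd := range plan {
		fmt.Printf("step %d: %s\n", i+1, strings.Join(cmd, " "))
	}
}
```

Each step is kept as an argument slice rather than one shell string, so it could be handed to `exec.Command(cmd[0], cmd[1:]...)` without shell quoting issues.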

E2E Test

https://github.com/aws/amazon-cloudwatch-agent-operator/actions/runs/9033457562

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

musa-asad previously approved these changes May 9, 2024
```sh
go test ${var.test_dir} -eksClusterName ${aws_eks_cluster.this.name} -computeType=EKS -v -eksDeploymentStrategy=DAEMON -eksGpuType=nvidia

# Get all pods and describe them
kubectl get pods --all-namespaces -o wide > pods.txt
```

Why do you need to write it to a file?
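The `go test` command in the review hunk above passes custom flags (`-eksClusterName`, `-computeType`, `-eksDeploymentStrategy`, `-eksGpuType`) through to the test binary. `go test` itself consumes `-v`. Below is a hypothetical sketch of how a test binary could read those flags with Go's standard `flag` package. The struct and function names are illustrative and are not the test framework's actual flag registration.

```go
package main

import (
	"flag"
	"fmt"
)

// gpuTestFlags mirrors the custom flags on the go test command line.
// Hypothetical sketch: not the framework's real flag definitions.
type gpuTestFlags struct {
	ClusterName        string
	ComputeType        string
	DeploymentStrategy string
	GPUType            string
}

// parseGPUTestFlags parses the pass-through flags and requires a cluster name.
func parseGPUTestFlags(args []string) (gpuTestFlags, error) {
	var f gpuTestFlags
	fs := flag.NewFlagSet("gpu-e2e", flag.ContinueOnError)
	fs.StringVar(&f.ClusterName, "eksClusterName", "", "name of the EKS cluster under test")
	fs.StringVar(&f.ComputeType, "computeType", "", "compute platform, e.g. EKS")
	fs.StringVar(&f.DeploymentStrategy, "eksDeploymentStrategy", "", "agent deployment strategy, e.g. DAEMON")
	fs.StringVar(&f.GPUType, "eksGpuType", "", "GPU vendor, e.g. nvidia")
	if err := fs.Parse(args); err != nil {
		return f, err
	}
	if f.ClusterName == "" {
		return f, fmt.Errorf("-eksClusterName is required")
	}
	return f, nil
}

func main() {
	f, err := parseGPUTestFlags([]string{
		"-eksClusterName", "my-cluster", "-computeType=EKS",
		"-eksDeploymentStrategy=DAEMON", "-eksGpuType=nvidia",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", f)
}
```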

```go
"github.com/aws/amazon-cloudwatch-agent-test/test/test_runner"
)

type GPUTestSuite struct {
```

As we discussed, you don't need a test suite for this e2e test. The purpose of a test suite is to group similar tests so they can be executed together, and no other test will be added to this suite. Other accelerated compute instance types (e.g. Trainium) will each have their own test, with a different instance type in the Terraform file, that runs a different test.
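The reviewer's point, one standalone test per accelerator type instead of a shared suite, can be sketched as a lookup from accelerator type to its own instance type and test entry point. The instance types (`g4dn.xlarge` for NVIDIA, `trn1.2xlarge` for Trainium) and the test names are hypothetical examples. They are not values from this PR.

```go
package main

import "fmt"

// acceleratorTest describes one standalone e2e test: its own Terraform
// instance type and its own test entry point, with no shared suite.
// Instance types and test names are hypothetical examples.
type acceleratorTest struct {
	InstanceType string
	TestName     string
}

var acceleratorTests = map[string]acceleratorTest{
	"nvidia":   {InstanceType: "g4dn.xlarge", TestName: "TestNvidiaGPU"},
	"trainium": {InstanceType: "trn1.2xlarge", TestName: "TestTrainium"},
}

// testFor returns the single standalone test configured for an accelerator type.
func testFor(accelerator string) (acceleratorTest, error) {
	cfg, ok := acceleratorTests[accelerator]
	if !ok {
		return acceleratorTest{}, fmt.Errorf("no e2e test for accelerator %q", accelerator)
	}
	return cfg, nil
}

func main() {
	for _, a := range []string{"nvidia", "trainium"} {
		cfg, err := testFor(a)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: instance=%s test=%s\n", a, cfg.InstanceType, cfg.TestName)
	}
}
```

Each entry would correspond to its own Terraform configuration. The selected test could then be run in isolation with `go test -run '^TestNvidiaGPU$'`.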

Paramadon closed this Jul 23, 2024

3 participants