chore: Update Spark operator example to use new bootstrap flag for instance store volume RAID0 configuration #237
Thanks for updating the blueprint, @bryantbiggs. I left a few comments.
kube-proxy = {
coredns = {}
kube-proxy = {}
vpc-cni = {
Do we not need the VPC CNI policies to be added to this add-on? Is it a default now?
no - ref #244
aws_for_fluentbit_cw_log_group = {
  create          = true
  use_name_prefix = false
  name            = "/${local.name}/aws-fluentbit-logs" # Add-on creates this log group
This name is added without a prefix so that we can use the same name in the cloudwatch_log_group variable below to write the logs. Any better ideas?
re-added
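One way to keep the two references from drifting apart is to define the name once and reuse it. This is a minimal sketch, not the blueprint's actual code: the local `fluentbit_log_group_name`, the value of `local.name`, and the values file path are hypothetical, and it assumes the Fluent Bit Helm values are rendered with `templatefile` and take a `cloudwatch_log_group` variable, as the comment above suggests.

```hcl
locals {
  # Hypothetical cluster name, only here to make the sketch self-contained.
  name = "spark-operator"

  # Single source of truth for the log group name.
  fluentbit_log_group_name = "/${local.name}/aws-fluentbit-logs"

  # Intended for the add-on's aws_for_fluentbit_cw_log_group argument.
  aws_for_fluentbit_cw_log_group = {
    create          = true
    use_name_prefix = false # keep the name exact so it can be reused below
    name            = local.fluentbit_log_group_name
  }

  # Intended for the add-on's Helm configuration.
  # The file path and template variable name are assumptions.
  aws_for_fluentbit = {
    values = [templatefile("${path.module}/helm-values/aws-for-fluentbit-values.yaml", {
      cloudwatch_log_group = local.fluentbit_log_group_name
    })]
  }
}
```

With `use_name_prefix = false` the created log group and the name written into the Helm values come from the same local, so a rename only has to happen in one place.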
enable_aws_load_balancer_controller = true
aws_load_balancer_controller = {
  version = "1.4.7"
  timeout = "300"
}

enable_ingress_nginx = true
ingress_nginx = {
  version = "4.5.2"
  timeout = "300"
  values  = [templatefile("${path.module}/helm-values/nginx-values.yaml", {})]
}
These two add-ons are used to set up the Spark Live UI with path-based routing for each Spark job. Keep them, but add a comment explaining that they are there to build the Spark Live UI with the Spark Operator config.
Can we get the NLB DNS name as an output from this add-on and add it here?
data-on-eks/analytics/terraform/spark-k8s-operator/helm-values/spark-operator-values.yaml
Line 42 in ad33f03
#ingressUrlFormat: '<ENTER_NLB_DNS_NAME/CUSTOM_DOMAIN_NAME>/{{$appName}}'
re-added
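On the question of surfacing the NLB DNS name: a possible approach, sketched here under assumptions rather than taken from the add-on, is to read the Service that ingress-nginx creates and expose its load balancer hostname as an output. The Service name `ingress-nginx-controller` and namespace `ingress-nginx` are assumed chart defaults and may differ in this blueprint; the output name is hypothetical.

```hcl
# Assumption: the ingress-nginx chart creates a LoadBalancer Service with
# this name and namespace. Adjust both to match the actual release.
data "kubernetes_service" "ingress_nginx" {
  metadata {
    name      = "ingress-nginx-controller"
    namespace = "ingress-nginx"
  }
  # In a real configuration this lookup would likely need a depends_on
  # pointing at the module that installs ingress-nginx.
}

# Hypothetical output name.
output "ingress_nginx_nlb_dns_name" {
  description = "DNS name of the load balancer in front of ingress-nginx"
  value       = data.kubernetes_service.ingress_nginx.status[0].load_balancer[0].ingress[0].hostname
}
```

The resulting value is what would replace the `<ENTER_NLB_DNS_NAME/CUSTOM_DOMAIN_NAME>` placeholder in `ingressUrlFormat`. Because the load balancer only exists after ingress-nginx is installed, feeding it into the Spark Operator values in the same apply may require ordering the two add-ons explicitly.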
data "aws_iam_policy_document" "fluent_bit" {
  statement {
    sid       = ""
    effect    = "Allow"
    resources = ["arn:${data.aws_partition.current.partition}:s3:::${module.s3_bucket.s3_bucket_id}/*"]

    actions = [
      "s3:ListBucket",
      "s3:PutObject",
      "s3:PutObjectAcl",
      "s3:GetObject",
      "s3:GetObjectAcl",
      "s3:DeleteObject",
      "s3:DeleteObjectVersion"
    ]
  }

  statement {
    sid       = ""
    effect    = "Allow"
    resources = ["arn:${data.aws_partition.current.partition}:logs:${data.aws_region.current.id}:${data.aws_caller_identity.current.account_id}:log-group:*"]

    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:DescribeLogGroups",
      "logs:DescribeLogStreams",
      "logs:PutLogEvents",
      "logs:PutRetentionPolicy",
    ]
Are these policies defaults now in our FluentBit add-on? If so, we can remove them.
yes, we had the cloudwatch permissions and I just added the S3 permissions in aws-ia/terraform-aws-eks-blueprints-addons#203
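A side note on the S3 statement in the removed policy above: `s3:ListBucket` is a bucket-level action, so a resource ending in `/*` (object ARNs only) does not grant it. The sketch below shows the usual split between bucket-level and object-level statements. It is a generic illustration and not the policy shipped in the add-on; the variable `s3_bucket_arn` and the statement IDs are hypothetical.

```hcl
# Hypothetical input: ARN of the bucket Fluent Bit writes to.
variable "s3_bucket_arn" {
  type = string
}

data "aws_iam_policy_document" "fluent_bit_s3" {
  statement {
    sid       = "ListBucket"
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = [var.s3_bucket_arn] # bucket-level action: bucket ARN without "/*"
  }

  statement {
    sid       = "ObjectAccess"
    effect    = "Allow"
    actions   = ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"]
    resources = ["${var.s3_bucket_arn}/*"] # object-level actions: objects in the bucket
  }
}
```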
@@ -81,7 +81,7 @@ spec:
   executor:
     volumeMounts:
       - name: spark-local-dir-1
-        mountPath: /data1
+        mountPath: /mnt/k8s-disks
Did you test any of these examples after the new RAID0 config? The pod mountPath (e.g., /data1) is a new directory, and it's different from the mountPoint (/mnt/k8s-disks).
Reverted the /data1 changes. I did test the pyspark-pi-job.yaml:
k get pods -n spark-team-a -w
NAME                               READY   STATUS              RESTARTS   AGE
pyspark-pi-karpenter-driver        0/1     Pending             0          0s
pyspark-pi-karpenter-driver        0/1     Pending             0          3s
pyspark-pi-karpenter-driver        0/1     Pending             0          47s
pyspark-pi-karpenter-driver        0/1     ContainerCreating   0          48s
pyspark-pi-karpenter-driver        1/1     Running             0          3m11s
pythonpi-8fbf89894a5269c5-exec-1   0/1     Pending             0          0s
pythonpi-8fbf89894a5269c5-exec-2   0/1     Pending             0          0s
pythonpi-8fbf89894a5269c5-exec-1   0/1     Pending             0          1s
pythonpi-8fbf89894a5269c5-exec-2   0/1     Pending             0          1s
pythonpi-8fbf89894a5269c5-exec-1   0/1     Pending             0          41s
pythonpi-8fbf89894a5269c5-exec-2   0/1     Pending             0          41s
pythonpi-8fbf89894a5269c5-exec-1   0/1     ContainerCreating   0          42s
pythonpi-8fbf89894a5269c5-exec-2   0/1     ContainerCreating   0          42s
pythonpi-8fbf89894a5269c5-exec-1   1/1     Running             0          112s
pythonpi-8fbf89894a5269c5-exec-2   1/1     Running             0          112s
pythonpi-8fbf89894a5269c5-exec-1   1/1     Terminating         0          118s
pythonpi-8fbf89894a5269c5-exec-2   1/1     Terminating         0          118s
pyspark-pi-karpenter-driver        0/1     Completed           0          5m20s
pyspark-pi-karpenter-driver        0/1     Completed           0          5m22s
pythonpi-8fbf89894a5269c5-exec-2   0/1     Terminating         0          2m5s
pythonpi-8fbf89894a5269c5-exec-2   0/1     Terminating         0          2m5s
pythonpi-8fbf89894a5269c5-exec-2   0/1     Terminating         0          2m5s
pythonpi-8fbf89894a5269c5-exec-1   0/1     Terminating         0          2m5s
pythonpi-8fbf89894a5269c5-exec-1   0/1     Terminating         0          2m5s
pythonpi-8fbf89894a5269c5-exec-1   0/1     Terminating         0          2m5s
pyspark-pi-karpenter-driver        0/1     Terminating         0          8m6s
pyspark-pi-karpenter-driver        0/1     Terminating         0          8m6s
And the ./taxi-trip-execute.sh example:
k get pods -n spark-team-a -w
NAME               READY   STATUS            RESTARTS   AGE
taxi-trip          0/1     Pending           0          7s
taxi-trip          0/1     Pending           0          44s
taxi-trip          0/1     Init:0/1          0          45s
taxi-trip          0/1     Init:0/1          0          55s
taxi-trip          0/1     PodInitializing   0          57s
taxi-trip          1/1     Running           0          84s
taxi-trip-exec-1   0/1     Pending           0          0s
taxi-trip-exec-2   0/1     Pending           0          0s
taxi-trip-exec-3   0/1     Pending           0          0s
taxi-trip-exec-1   0/1     Pending           0          0s
taxi-trip-exec-4   0/1     Pending           0          0s
taxi-trip-exec-2   0/1     Pending           0          1s
taxi-trip-exec-3   0/1     Pending           0          1s
taxi-trip-exec-4   0/1     Pending           0          1s
taxi-trip-exec-3   0/1     Pending           0          46s
taxi-trip-exec-2   0/1     Pending           0          46s
taxi-trip-exec-1   0/1     Pending           0          46s
taxi-trip-exec-3   0/1     Init:0/1          0          47s
taxi-trip-exec-1   0/1     Init:0/1          0          47s
taxi-trip-exec-2   0/1     Init:0/1          0          47s
taxi-trip-exec-4   0/1     Pending           0          52s
taxi-trip-exec-4   0/1     Init:0/1          0          52s
taxi-trip-exec-4   0/1     Init:0/1          0          53s
taxi-trip-exec-4   0/1     PodInitializing   0          54s
taxi-trip-exec-1   0/1     Init:0/1          0          56s
taxi-trip-exec-2   0/1     Init:0/1          0          56s
taxi-trip-exec-3   0/1     Init:0/1          0          57s
taxi-trip-exec-1   0/1     PodInitializing   0          57s
taxi-trip-exec-2   0/1     PodInitializing   0          57s
taxi-trip-exec-3   0/1     PodInitializing   0          58s
taxi-trip-exec-1   1/1     Running           0          77s
taxi-trip-exec-2   1/1     Running           0          77s
taxi-trip-exec-3   1/1     Running           0          77s
taxi-trip-exec-4   1/1     Running           0          81s
taxi-trip-exec-1   1/1     Terminating       0          11m
taxi-trip-exec-2   1/1     Terminating       0          11m
taxi-trip-exec-3   1/1     Terminating       0          11m
taxi-trip-exec-4   1/1     Terminating       0          11m
taxi-trip-exec-4   0/1     Terminating       0          11m
taxi-trip-exec-4   0/1     Terminating       0          11m
taxi-trip-exec-4   0/1     Terminating       0          11m
taxi-trip-exec-2   0/1     Terminating       0          11m
taxi-trip-exec-2   0/1     Terminating       0          11m
taxi-trip-exec-2   0/1     Terminating       0          11m
taxi-trip-exec-3   0/1     Terminating       0          11m
taxi-trip-exec-3   0/1     Terminating       0          11m
taxi-trip-exec-3   0/1     Terminating       0          11m
taxi-trip-exec-1   0/1     Terminating       0          11m
taxi-trip-exec-1   0/1     Terminating       0          11m
taxi-trip-exec-1   0/1     Terminating       0          11m
taxi-trip          0/1     Completed         0          13m
taxi-trip          0/1     Completed         0          13m
taxi-trip          0/1     Terminating       0          16m
taxi-trip          0/1     Terminating       0          16m
@bryantbiggs Thanks for the update 👍🏼
We may have to update the website doc for this blueprint and replace /local1 with /mnt/k8s-disks before merging this PR.
…es are updated and kept in sync