This repository was archived by the owner on Mar 13, 2023. It is now read-only.
generated from amazon-archives/__template_Apache-2.0
Enable Slurm REST API #197
Merged
mtfranchetto
merged 3 commits into
aws-samples:enable-slurm-rest-api
from
rkilpadi:main
Aug 25, 2022
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| +++ | ||
| title = "g. Slurm REST API 🌀" | ||
| weight = 27 | ||
| +++ | ||
|
|
||
| Enable the Slurm REST API. Requires Slurm Accounting. | ||
|
|
||
| ## Step 1 - Set up Slurm Accounting | ||
|
|
||
| Slurm Accounting is required to enable the Slurm REST API. Follow the [instructions](https://pcluster.cloud/02-tutorials/02-slurm-accounting.html) to enable Slurm Accounting but **do not begin cluster creation** after completing Step 4. | ||
|
|
||
| ## Step 2 - Create a Security Group to allow inbound HTTPS traffic | ||
|
|
||
| By default, your cluster will not be able to accept incoming HTTPS requests to the REST API. You will need to [create a security group](https://console.aws.amazon.com/ec2/v2/home?#CreateSecurityGroup:) to change this. | ||
|
|
||
| 1. Under `Security group name`, enter "Slurm REST API" (or another name of your choosing) | ||
| 2. Ensure `VPC` matches the cluster's VPC | ||
| 3. Delete any outbound rules that may have been automatically generated | ||
| 4. Add an inbound rule and select `HTTPS` under `Type` and `My IP` under `Source` | ||
| 5. Click `Create security group` | ||
|
|
||
|  | ||
|
|
||
| ## Step 3 - Configure your cluster | ||
|
|
||
| In your cluster configuration, return to the Head Node section and add your security group. | ||
|
|
||
|  | ||
|
|
||
| Under `Advanced options`, you should have already added a script for Slurm Accounting. | ||
| In the same multi-runner, click `Add Script` and select `Slurm REST API`. | ||
|
|
||
|  | ||
|
|
||
| Create your cluster. Make sure you followed the Slurm Accounting tutorial for the rest of the configuration. | ||
|
|
||
| ## Step 4 - Submit a job | ||
|
|
||
| Once the cluster has been successfully created, go to the `Job Scheduling` tab and select `Submit Job` | ||
|
|
||
| Choose a name for your job and the number of nodes to run on. Then select `Run a script (manual entry)` and enter `srun sleep 30` on line 2, under `#!/bin/bash`. | ||
|
|
||
|  | ||
|
|
||
| Click `submit`. If the job was successful, it should be listed as `COMPLETED` in about 30 seconds. | ||
|
|
||
| ### Troubleshooting | ||
|
|
||
| If jobs aren't submitting, the most likely cause is the security group. Try manually adding your IP address as an inbound source on the security group you created. | ||
| If you're still running into issues, you can select `Any IPv4` as the source (**WARNING:** this opens HTTPS on the head node to any IPv4 address). |
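
As a rough connectivity check for the troubleshooting steps above, the following Ruby sketch builds an authenticated request for the REST API behind the NGINX proxy. The header names `X-SLURM-USER-NAME` and `X-SLURM-USER-TOKEN` are the ones referenced in the commented lines of `nginx.conf` in this PR. Everything else is an assumption for illustration: the helper names, the example host and user, and in particular the API version segment in the default path, which has to match the Slurm release on the cluster. The token would be the one the recipe stores in AWS Secrets Manager under `slurm_token_<stack_name>`.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Build a GET request for a slurmrestd endpoint behind the NGINX proxy.
# The default path is an assumption: the version segment (v0.0.38) must match
# the API version shipped with the Slurm release installed on the cluster.
def build_slurm_request(host, user, token, path: '/slurm/v0.0.38/ping')
  uri = URI::HTTPS.build(host: host, path: path)
  request = Net::HTTP::Get.new(uri)
  request['X-SLURM-USER-NAME'] = user
  request['X-SLURM-USER-TOKEN'] = token
  request
end

# Send the request over HTTPS. Certificate verification is disabled only
# because the recipe in this PR generates a self-signed certificate; with a
# trusted certificate, keep verification on.
def send_slurm_request(request, open_timeout: 5)
  uri = request.uri
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  http.open_timeout = open_timeout
  http.request(request)
end
```

Calling `send_slurm_request(build_slurm_request(head_node_ip, user, token))` from the machine whose IP was added to the security group should return a response if the inbound rule is correct; a connection timeout points back at the security group.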
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,79 @@ | ||
| user nginx; | ||
| worker_processes auto; | ||
| error_log /var/log/nginx/error.log; | ||
| pid /run/nginx.pid; | ||
|
|
||
| # Load dynamic modules. See /usr/share/doc/nginx/README.dynamic. | ||
| include /usr/share/nginx/modules/*.conf; | ||
|
|
||
| events { | ||
| worker_connections 1024; | ||
| } | ||
|
|
||
| http { | ||
| log_format main '$remote_addr - $remote_user [$time_local] "$request" ' | ||
| '$status $body_bytes_sent "$http_referer" ' | ||
| '"$http_user_agent" "$http_x_forwarded_for"'; | ||
|
|
||
| access_log /var/log/nginx/access.log main; | ||
|
|
||
| sendfile on; | ||
| tcp_nopush on; | ||
| tcp_nodelay on; | ||
| keepalive_timeout 65; | ||
| types_hash_max_size 4096; | ||
|
|
||
| include /etc/nginx/mime.types; | ||
| default_type application/octet-stream; | ||
|
|
||
| # Load modular configuration files from the /etc/nginx/conf.d directory. | ||
| # See http://nginx.org/en/docs/ngx_core_module.html#include | ||
| # for more information. | ||
| include /etc/nginx/conf.d/*.conf; | ||
|
|
||
| # Settings for a TLS enabled server. | ||
|
|
||
| server { | ||
| listen 443 ssl http2; | ||
| listen [::]:443 ssl http2; | ||
| server_name _; | ||
| root /usr/share/nginx/html; | ||
|
|
||
| location / { | ||
| proxy_pass http://slurmrestd; | ||
|
|
||
| # auth_request /validate/; | ||
| # auth_request_set $user_name $upstream_http_x_slurm_user_name; | ||
| # auth_request_set $user_token $upstream_http_x_slurm_user_token; | ||
| # proxy_set_header X-SLURM-USER-NAME $user_name; | ||
| # proxy_set_header X-SLURM-USER-TOKEN $user_token; | ||
| proxy_set_header Host $host; | ||
| proxy_set_header X-Real-IP $remote_addr; | ||
| proxy_set_header X-Forwarded-Port $server_port; | ||
| proxy_pass_request_headers on; | ||
| } | ||
|
|
||
| ssl_certificate /etc/ssl/certs/nginx-selfsigned.crt; | ||
| ssl_certificate_key /etc/ssl/certs/nginx-selfsigned.key; | ||
| ssl_client_certificate /etc/ssl/certs/ca.crt; | ||
| ssl_session_cache shared:SSL:1m; | ||
| ssl_session_timeout 10m; | ||
| ssl_ciphers HIGH:!aNULL:!MD5; | ||
| ssl_prefer_server_ciphers on; | ||
|
|
||
| # Load configuration files for the default server block. | ||
| include /etc/nginx/default.d/*.conf; | ||
|
|
||
| error_page 404 /404.html; | ||
| location = /404.html { | ||
| } | ||
|
|
||
| error_page 500 502 503 504 /50x.html; | ||
| location = /50x.html { | ||
| } | ||
| } | ||
|
|
||
| upstream slurmrestd { | ||
| server unix:/var/spool/socket/slurmrestd.sock; | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,166 @@ | ||
| require 'json' | ||
| return if node['cluster']['node_type'] != 'HeadNode' | ||
|
|
||
| slurm_etc = '/opt/slurm/etc' | ||
| socket_location = '/var/spool/socket' | ||
| state_save_location = '/var/spool/slurm.state' | ||
| key_location = state_save_location + '/jwt_hs256.key' | ||
| certs_location = '/etc/ssl/certs' | ||
| key_and_crt_name = 'nginx-selfsigned' | ||
| id = 2005 | ||
|
|
||
| # Configure Slurm for JWT authentication | ||
| ruby_block 'Create JWT key file' do | ||
| block do | ||
| shell_out!("dd if=/dev/random of=#{key_location} bs=32 count=1") | ||
| end | ||
| end | ||
|
|
||
| file key_location do | ||
| owner 'slurm' | ||
| group 'slurm' | ||
| mode '0600' | ||
| end | ||
|
|
||
| directory state_save_location do | ||
| owner 'slurm' | ||
| group 'slurm' | ||
| mode '0755' | ||
| end | ||
|
|
||
| ruby_block 'Add JWT configuration to slurm.conf' do | ||
| block do | ||
| file = Chef::Util::FileEdit.new("#{slurm_etc}/slurm.conf") | ||
| file.insert_line_after_match(/^AuthType=/, "AuthAltParameters=jwt_key=#{key_location}") | ||
| file.insert_line_after_match(/^AuthType=/, "AuthAltTypes=auth/jwt") | ||
| file.write_file | ||
| end | ||
| not_if "grep -q auth/jwt #{slurm_etc}/slurm.conf" | ||
| end | ||
|
|
||
| ruby_block 'Add JWT configuration to slurmdbd.conf' do | ||
| block do | ||
| file = Chef::Util::FileEdit.new("#{slurm_etc}/slurmdbd.conf") | ||
| file.insert_line_after_match(/^AuthType=/, "AuthAltParameters=jwt_key=#{key_location}") | ||
| file.insert_line_after_match(/^AuthType=/, "AuthAltTypes=auth/jwt") | ||
| file.write_file | ||
| end | ||
| not_if "grep -q auth/jwt #{slurm_etc}/slurmdbd.conf" | ||
| end | ||
|
|
||
| service 'slurmctld' do | ||
| action :restart | ||
| end | ||
|
|
||
| ruby_block 'Generate JWT token and create/update AWS secret' do | ||
| block do | ||
| token_name = "slurm_token_" + node['cluster']['stack_name'] | ||
| region = node['cluster']['region'] | ||
|
|
||
| jwt_token = shell_out!("/opt/slurm/bin/scontrol token lifespan=9999999999 \ | ||
| | grep -oP '^SLURM_JWT\\s*\\=\\s*\\K(.+)'").stdout.strip | ||
|
|
||
| begin | ||
| # Try to create the secret first; shell_out! raises if the command fails. | ||
| shell_out!("aws secretsmanager create-secret \ | ||
| --name #{token_name} \ | ||
| --region #{region} \ | ||
| --secret-string #{jwt_token}" | ||
| ) | ||
| rescue | ||
| # Creation failed (typically because the secret already exists), so update it instead. | ||
| shell_out!("aws secretsmanager update-secret \ | ||
| --secret-id #{token_name} \ | ||
| --region #{region} \ | ||
| --secret-string #{jwt_token}" | ||
| ) | ||
| end | ||
| end | ||
| end | ||
|
|
||
| # NGINX installation and configuration | ||
| package 'nginx' do | ||
| action :install | ||
| end | ||
|
|
||
| ruby_block 'Generate self-signed key' do | ||
| block do | ||
| shell_out!("sudo openssl req -x509 -nodes -days 36500 -newkey rsa:2048 \ | ||
| -keyout /etc/ssl/certs/nginx-selfsigned.key \ | ||
| -out /etc/ssl/certs/nginx-selfsigned.crt \ | ||
| -subj '/CN=#{node['cluster']['stack_name']}'" | ||
| ) | ||
| end | ||
| end | ||
|
|
||
| group 'nginx' do | ||
| comment 'nginx group' | ||
| gid id + 1 | ||
| system true | ||
| end | ||
|
|
||
| user 'nginx' do | ||
| comment 'nginx user' | ||
| uid id + 1 | ||
| gid id + 1 | ||
| system true | ||
| end | ||
|
|
||
| file '/etc/nginx/nginx.conf' do | ||
| owner 'nginx' | ||
| group 'nginx' | ||
| mode '0644' | ||
| content ::File.open('/tmp/slurm_rest_api/nginx.conf').read | ||
| end | ||
|
|
||
| service 'nginx' do | ||
| action :start | ||
| end | ||
|
|
||
| # Enable slurmrestd | ||
| # TODO: Not idempotent if user is in process | ||
| group 'slurmrestd' do | ||
| comment 'slurmrestd group' | ||
| gid id | ||
| system true | ||
| end | ||
|
|
||
| user 'slurmrestd' do | ||
| comment 'slurmrestd user' | ||
| uid id | ||
| gid id | ||
| system true | ||
| end | ||
|
|
||
| directory socket_location do | ||
| owner 'nginx' | ||
| group 'nginx' | ||
| mode '0777' | ||
| end | ||
|
|
||
| file '/etc/systemd/system/slurmrestd.service' do | ||
| owner 'slurmrestd' | ||
| group 'slurmrestd' | ||
| mode '0644' | ||
| content ::File.open('/tmp/slurm_rest_api/slurmrestd.service').read | ||
| end | ||
|
|
||
| service 'slurmrestd' do | ||
| action :start | ||
| end | ||
|
|
||
| ruby_block 'Wait for slurmrestd' do | ||
| block do | ||
| iter=0 | ||
| until ::File.exist?("#{socket_location}/slurmrestd.sock") || iter >= 20 do | ||
| sleep 1 | ||
| iter += 1 | ||
| end | ||
| raise "Timeout waiting for slurmrestd startup" unless ::File.exist?("#{socket_location}/slurmrestd.sock") | ||
| end | ||
| end | ||
|
|
||
| ruby_block 'Modify socket permissions' do | ||
| notifies :start, 'service[slurmrestd]', :before | ||
| block do | ||
| shell_out!("chmod 0666 #{socket_location}/slurmrestd.sock").run_command | ||
| end | ||
| end | ||
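
The two `Add JWT configuration` blocks in the recipe above amount to inserting two lines after `AuthType=` unless `auth/jwt` is already present. A minimal plain-Ruby sketch of that edit, written as a pure function so it can be checked outside of Chef, might look like the following. The function name `add_jwt_auth` is hypothetical and not part of this PR; the recipe itself uses `Chef::Util::FileEdit` with a `not_if` guard.

```ruby
# Insert the JWT auth settings directly after the first AuthType= line.
# Returns the text unchanged when auth/jwt is already configured or when no
# AuthType= line exists, so repeated calls do not duplicate the settings.
def add_jwt_auth(conf_text, key_path)
  return conf_text if conf_text.include?('auth/jwt')

  lines = conf_text.lines
  index = lines.index { |line| line.start_with?('AuthType=') }
  return conf_text if index.nil?

  # Make sure the matched line ends with a newline before appending after it.
  lines[index] = "#{lines[index].chomp}\n"
  lines.insert(index + 1,
               "AuthAltTypes=auth/jwt\n",
               "AuthAltParameters=jwt_key=#{key_path}\n")
  lines.join
end
```

The resulting order (`AuthType`, `AuthAltTypes`, `AuthAltParameters`) matches what the recipe's two `insert_line_after_match` calls produce.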
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| [Unit] | ||
| Description=Slurm REST daemon | ||
| After=network.target munge.service slurmctld.service | ||
| ConditionPathExists=/opt/slurm/etc/slurm.conf | ||
| Documentation=man:slurmrestd(8) | ||
|
|
||
| [Service] | ||
| Type=simple | ||
| User=slurmrestd | ||
| Group=slurmrestd | ||
| Environment="SLURM_JWT=daemon" | ||
| ExecStart=/opt/slurm/sbin/slurmrestd unix:/var/spool/socket/slurmrestd.sock -a rest_auth/jwt | ||
| ExecReload=/bin/kill -HUP $MAINPID | ||
|
|
||
| [Install] | ||
| WantedBy=multi-user.target |
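
The unit above binds `slurmrestd` to a Unix socket, and the recipe polls for that socket before changing its permissions. One way to express that wait with a wall-clock deadline rather than an iteration counter is sketched below. `wait_for_path` is a hypothetical helper, not code from this PR, and the default timeout and interval are assumptions chosen to mirror the recipe's 20 one-second iterations.

```ruby
# Poll until `path` exists or `timeout` seconds have elapsed.
# Returns true if the path appeared in time, false otherwise.
def wait_for_path(path, timeout: 20, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until File.exist?(path)
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
  true
end
```

Inside a `ruby_block`, the caller could raise when `wait_for_path("#{socket_location}/slurmrestd.sock")` returns false, which keeps the success check tied to the socket itself rather than to the loop counter.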
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| #!/bin/bash | ||
|
|
||
| set -x | ||
| set -e | ||
|
|
||
| mkdir -p /tmp/slurm_rest_api | ||
| pushd /tmp/slurm_rest_api | ||
|
|
||
| # Copy Slurm configuration files | ||
| source_path=https://raw.githubusercontent.com/aws-samples/pcluster-manager/main/resources/files | ||
| files=(slurmrestd.service slurm_rest_api.rb nginx.conf) | ||
| for file in "${files[@]}" | ||
| do | ||
| wget -qO- "${source_path}/sacct/${file}" > "${file}" | ||
| done | ||
|
|
||
| sudo cinc-client \ | ||
| --local-mode \ | ||
| --config /etc/chef/client.rb \ | ||
| --log_level auto \ | ||
| --force-formatter \ | ||
| --no-color \ | ||
| --chef-zero-port 8889 \ | ||
| -j /etc/chef/dna.json \ | ||
| -z slurm_rest_api.rb | ||
|
|
||
| set +e |