@tilne
Contributor

@tilne tilne commented Dec 22, 2020

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

lukeseawalker and others added 30 commits November 19, 2020 13:03
The new parameter allows specifying a custom node package for the createami test.
It takes a custom node URL, which is needed when a version bump is done and the node package is not yet available on PyPI.

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
Signed-off-by: Enrico Usai <usai@amazon.com>
* Refactor src and tests structure according to https://docs.pytest.org/en/stable/goodpractices.html
* Differentiated between tests with coverage and tests without coverage. Tests without coverage run with
  all supported Python versions and test against the installed version of the CLI (installed from
  the sdist package). Tests with coverage run only for Python 3.8 and are executed against the
  package installed in development mode.
* Added Python 3.9 to tests
* Grouped all travis tasks in a single stage so that the run is faster
* Updated setup.py file to reflect the new structure and to add missing project information

Signed-off-by: Francesco De Martino <fdm@amazon.com>
Common tests with develop are put in common.yaml, included through jinja

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
…ws#2234)

1. Test `additional_sg` in the config file is added to head and compute nodes
2. Test `ssh_from` in the config file applies to the pcluster security group of the head node
3. Test `vpc_security_group_id` in the config file overwrites security group of head and compute nodes, FSx, and EFS

Signed-off-by: Hanwen <hanwenli@amazon.com>
* This test verifies that the FSx file system launched by pcluster has the deployment type the user set in the pcluster config file.
* FSx file systems have three deployment types in commercial regions: SCRATCH_1 (default), SCRATCH_2, and PERSISTENT_1

Signed-off-by: Yulei Wang <yuleiwan@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
Previously we only continued to poll when the state was one of
CREATING or TRANSFERRING. According to the boto3 docs, we should also
handle the PENDING state.

Signed-off-by: Tim Lane <tilne@amazon.com>
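
The polling change described in the commit above can be sketched roughly as follows. This is only an illustration: `get_state`, `wait_until_settled`, and the timing parameters are hypothetical names, and the actual boto3 call and resource being polled are not given in the commit message, so they are left abstract here.

```python
import time

# States in which the resource is still in progress and polling should continue.
# PENDING is the state added by the change; the other two were already handled.
IN_PROGRESS_STATES = {"PENDING", "CREATING", "TRANSFERRING"}


def wait_until_settled(get_state, poll_interval=5.0, max_attempts=60):
    """Poll get_state() until it returns a state outside IN_PROGRESS_STATES.

    get_state is a hypothetical zero-argument callable standing in for the
    real API call. Returns the final state, or raises TimeoutError if the
    resource is still in progress after max_attempts polls.
    """
    for _ in range(max_attempts):
        state = get_state()
        if state not in IN_PROGRESS_STATES:
            return state
        time.sleep(poll_interval)
    raise TimeoutError("resource did not leave an in-progress state")
```

Keeping the in-progress states in one set means that handling a further state later is a one-line change rather than another condition in the loop.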
Add 1 second sleep to give time to sqswatcher to reconfigure the master with np = max_nodes * node_slots
This operation is performed right after sqswatcher removes the compute nodes from the scheduler

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
This removes the need to call the CloudFormation API at every Docker container launch.
In order to do so a dependency on the head node substack has been introduced for the
AWS Batch substack. This makes the cluster creation slower by around 40% when awsbatch
is selected as the scheduler.

Signed-off-by: Francesco De Martino <fdm@amazon.com>
If `custom_node` was not specified, the `env` variable was referenced before assignment.

Signed-off-by: Enrico Usai <usai@amazon.com>
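
A minimal sketch of this class of bug and its usual fix, assuming a hypothetical `build_env` helper and a hypothetical `CUSTOM_NODE_URL` key (neither is taken from the actual test code): the variable is initialized before the conditional, so it is always bound when it is used later.

```python
def build_env(custom_node=None):
    """Return the environment for a hypothetical build step.

    Initializing env before the conditional avoids the UnboundLocalError
    that occurs when the variable is assigned only inside the `if` branch
    and then read unconditionally.
    """
    env = {}  # always defined, even when custom_node is not given
    if custom_node:
        env["CUSTOM_NODE_URL"] = custom_node  # hypothetical key name
    return env
```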
This makes sure we always download the latest for Amazon Linux 2

Signed-off-by: Francesco De Martino <fdm@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
This test uses `troposphere` to create CloudFormation stacks for `efs`, `mount target`, and an instance that writes an empty file with a random name into the EFS. The test then verifies that, when the existing `efs` is provided through `efs_fs_id` in the `pcluster` config file, the created cluster can read the randomly named file and share files between the head node and compute nodes.

Signed-off-by: Hanwen <hanwenli@amazon.com>
Add support for io2 volume type for EBS section and Raid section, add integration test to test different volume types

Signed-off-by: chenwany <chenwany@amazon.com>
Signed-off-by: chenwany <chenwany@amazon.com>
Signed-off-by: Enrico Usai <usai@amazon.com>
Signed-off-by: Enrico Usai <usai@amazon.com>
Signed-off-by: Enrico Usai <usai@amazon.com>
Signed-off-by: Enrico Usai <usai@amazon.com>
Signed-off-by: Enrico Usai <usai@amazon.com>
When running pcluster in a region with free tier, the default instance type is set to the free tier instance type. When running pcluster in the China (BJS) region or the AWS GovCloud (US) regions, the default instance type is t3.micro.
Free tier is not available in the China (BJS) region or the AWS GovCloud (US) regions.
For more information about free tier, please see https://aws.amazon.com/free/free-tier-faqs/

Signed-off-by: Hanwen <hanwenli@amazon.com>
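
The selection rule in the commit above could look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, the free tier instance type is passed in rather than looked up, and the region check assumes that China (BJS) is `cn-north-1` and that AWS GovCloud (US) region names start with `us-gov-`.

```python
# Fallback used where free tier is not available (value from the commit message).
NON_FREE_TIER_DEFAULT = "t3.micro"


def has_free_tier(region):
    """Assumption: only China (BJS) and the GovCloud (US) regions lack free tier."""
    return not (region == "cn-north-1" or region.startswith("us-gov-"))


def default_instance_type(region, free_tier_instance_type):
    """Pick the default instance type for a region.

    free_tier_instance_type is whatever the caller determined to be the
    free tier instance type; how it is looked up is outside this sketch.
    """
    if has_free_tier(region):
        return free_tier_instance_type
    return NON_FREE_TIER_DEFAULT
```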
Move half of the p4d tests to PDX (us-west-2)

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
)

Compute instance type parameter is not rendered if scheduler is Slurm. This caused the error `Parameters: [ComputeInstanceType] must have values` in CloudFormation because a value was still expected.
With this commit we set "NONE" as default to prevent this value being silently used as if set by the user.

Signed-off-by: ddeidda <ddeidda@amazon.com>
The reason for this change is that not all regions support c4.xlarge.
Support for the C5 family is broader.

What does this change solve? It allows the test to run in regions where C4 isn't available

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
What does the change solve? It allows the IAM policies test to run in regions where AWS Batch is not available

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
This config will be used as a test bed for a new region.

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
The `network_interfaces_count` parameter depends on `compute_instance_type`, hence it could fail if this parameter is not specified in the config file.
Since the default instance type will always have 1 network interface we can safely return 1 when compute_instance_type is not specified.

Signed-off-by: ddeidda <ddeidda@amazon.com>
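
A minimal sketch of the fallback described in the commit above, assuming a hypothetical `lookup` callable that returns the number of network interfaces for an instance type (the real lookup is not shown in the commit message):

```python
def network_interfaces_count(compute_instance_type, lookup):
    """Return the number of network interfaces to configure.

    compute_instance_type may be None or empty when the parameter is not
    set in the config file. In that case the default instance type is
    used, which has a single network interface, so 1 is returned without
    calling lookup at all.
    """
    if not compute_instance_type:
        return 1
    return lookup(compute_instance_type)
```

Returning early keeps the lookup from ever receiving a missing instance type, which is the failure the commit guards against.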
The test using p4d.24xlarge with slurm scheduler is already performed by the test_hit_efa test
Change test_sit_efa to use sge and move it to us-west-2
Remove warning when using p4d.24xlarge with scheduler != slurm

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
yuleiwan and others added 25 commits December 14, 2020 22:32
Signed-off-by: Yulei Wang <yuleiwan@amazon.com>
The new EFA installer provides the EFA kmod for all supported OSs except for Centos8. This commit adds a validator to prevent EFA from being enabled on ARM architectures with Centos8.

Signed-off-by: ddeidda <ddeidda@amazon.com>
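
A validator of the kind described in the commit above might be shaped like this. The function name, the parameter names, and the literal values `"centos8"` and `"arm64"` are assumptions for illustration, not the actual ParallelCluster validator.

```python
def efa_os_arch_validator(efa_enabled, base_os, architecture):
    """Return a list of error messages; an empty list means the config is valid.

    Rejects only the combination named in the commit message:
    EFA enabled on an ARM architecture with CentOS 8.
    """
    errors = []
    if efa_enabled and base_os == "centos8" and architecture == "arm64":
        errors.append(
            "EFA cannot be enabled on the arm64 architecture with centos8."
        )
    return errors
```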
* Remove the ban of using p4d as head node

Signed-off-by: Hanwen <hanwenli@amazon.com>

* Update CHANGELOG.md

Co-authored-by: Francesco De Martino <demartinof@icloud.com>
* Modify hit_scaling tests to test logic when clustermgtd is down
* Computemgtd should terminate any instance in DOWN or POWER_SAVE state, or if slurmctld is down
* ResumeProgram should not launch any instance if clustermgtd is down

Signed-off-by: Rex <shuningc@amazon.com>
Signed-off-by: Yulei Wang <yuleiwan@amazon.com>
aws#2304)

1. Add `iam_lambda_role` parameter to the config file. If specified, this role will be attached to all Lambda function resources created by CloudFormation Templates.
2. If both `ec2_iam_role` and `iam_lambda_role` are provided, and the scheduler is `sge`, `torque`, or `slurm`, no role will be created by `pcluster` commands. Note that if `awsbatch` is the scheduler, a role will still be created during `pcluster create`.
3. Integration tests: Extract some functions (role creation, policy creation) from `storage.kms_key_factory` to `conftest`. The code in `kms_key_factory` is kept untouched to limit the scale of this commit.

Signed-off-by: Hanwen <hanwenli@amazon.com>
Signed-off-by: chenwany <chenwany@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
The final number returned from `lspci -n` can be different from 0.

Signed-off-by: ddeidda <ddeidda@amazon.com>
P4d is now supported also as head node.

Signed-off-by: ddeidda <ddeidda@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
GPUs from manufacturers other than NVIDIA (e.g. AMD) are currently not supported in ParallelCluster.
With this patch we introduce a warning message that is printed when GPUs from a manufacturer other than NVIDIA are detected, and we prevent them from being set in compute resources.

Signed-off-by: ddeidda <ddeidda@amazon.com>
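
The behavior described in the commit above can be sketched as follows, assuming a hypothetical `usable_gpu_count` helper that receives the detected manufacturer and GPU count (how these are detected is not covered by the commit message):

```python
import warnings


def usable_gpu_count(manufacturer, gpu_count):
    """Return the number of GPUs to expose in compute resources.

    GPUs from a manufacturer other than NVIDIA trigger a warning and are
    reported as 0, so they are never set in compute resources.
    """
    if gpu_count and manufacturer.upper() != "NVIDIA":
        warnings.warn(
            f"GPUs from {manufacturer} are not supported and will be ignored."
        )
        return 0
    return gpu_count
```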
Signed-off-by: Rex <shuningc@amazon.com>
When P4d instances are used as head node, the parameter use_public_ips must be set to true in order for the public IP to be assigned to the instance.

Signed-off-by: ddeidda <ddeidda@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
Signed-off-by: Francesco De Martino <fdm@amazon.com>
Signed-off-by: Tim Lane <tilne@amazon.com>
Modify the IOPS and size ranges to unblock users from creating io2 Block Express volumes

Signed-off-by: chenwany <chenwany@amazon.com>
Changelog
```
  - EFA configuration: ``efa-config-1.7`` (from efa-config-1.5)
  - EFA profile: ``efa-profile-1.3`` (from efa-profile-1.1)
  - EFA kernel module: ``efa-1.10.2`` (no change)
  - RDMA core: ``rdma-core-31.2amzn`` (from rdma-core-31.amzn0)
  - Libfabric: ``libfabric-1.11.1amzn1.0`` (from libfabric-1.11.1amzn1.1)
  - Open MPI: ``openmpi40-aws-4.1.0`` (from openmpi40-aws-4.0.5)
```

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
Build Number 597
aws-parallelcluster-cookbook Git hash: d5378bb60f7810bb2f467e5ada9589cc8607ee2e
aws-parallelcluster-node Git hash: ae7c4b123d18399361b85e31473ad9ee53b21e45

Signed-off-by: ParallelCluster AMI bot <ec2-ds9-dev@amazon.com>
@tilne tilne added the skip-changelog-update Disables the check that enforces changelog updates in PRs label Dec 22, 2020
@codecov

codecov bot commented Dec 22, 2020

Codecov Report

Merging #2333 (f38d0a8) into release-2.10 (b518228) will increase coverage by 0.01%.
The diff coverage is 78.76%.

@@               Coverage Diff                @@
##           release-2.10    #2333      +/-   ##
================================================
+ Coverage         61.81%   61.83%   +0.01%     
================================================
  Files                39       40       +1     
  Lines              6060     6186     +126     
================================================
+ Hits               3746     3825      +79     
- Misses             2314     2361      +47     
| Impacted Files | Coverage Δ |
|---|---|
| cli/src/awsbatch/awsbhosts.py | 0.00% <ø> (ø) |
| cli/src/awsbatch/awsbkill.py | 0.00% <ø> (ø) |
| cli/src/awsbatch/awsbout.py | 0.00% <ø> (ø) |
| cli/src/awsbatch/awsbqueues.py | 0.00% <ø> (ø) |
| cli/src/awsbatch/awsbstat.py | 93.30% <ø> (ø) |
| cli/src/awsbatch/awsbsub.py | 0.00% <ø> (ø) |
| cli/src/awsbatch/common.py | 42.78% <0.00%> (ø) |
| cli/src/awsbatch/utils.py | 67.34% <ø> (ø) |
| cli/src/pcluster/cli.py | 0.00% <ø> (ø) |
| ...uster/cli_commands/compute_fleet_status_manager.py | 94.04% <ø> (ø) |
| ... and 36 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b518228...f38d0a8.

@tilne
Contributor Author

tilne commented Dec 22, 2020

Since it appears to be stuck, I'm going to disable the Travis checks on this branch.

The CFN linter failure is expected.

Merging.

@tilne tilne merged commit 73a9ad5 into aws:release-2.10 Dec 22, 2020
Labels

skip-changelog-update Disables the check that enforces changelog updates in PRs

8 participants