diff --git a/lang/en/docs/infrastructure/clusters/aws.md b/lang/en/docs/infrastructure/clusters/aws.md index 0ce6e8ee4..0d94f5bbd 100644 --- a/lang/en/docs/infrastructure/clusters/aws.md +++ b/lang/en/docs/infrastructure/clusters/aws.md @@ -4,37 +4,32 @@ This page contains information about clusters hosted on Amazon Web Services[^1] ## Clusters -The following table provides information about available clusters on Amazon Web Services (AWS) cloud computing platform. The latest cluster status can be found on Clusters page in web application. +The following table provides information about available clusters on the Amazon Web Services (AWS) cloud computing platform. +The latest cluster status can be found on the Clusters +page in the web application. -| Name | Master Hostname | Location | -| :---: | :---: | :---: | -| cluster-001 | master-production-20160630-cluster-001.exabyte.io | West US | +| Name | Master Hostname | Location | +|:-------------:|:---------------------------------------------------:|:--------:| +| `cluster-002` | `master-production-20250821-cluster-002.mat3ra.com` | West US | ## Queues -The list of currently enabled queues is given below. Price per core hour is shown in relation to the [relative unit price](../../pricing/service-levels.md#comparison-table) and is subject to change at any time. Total number of nodes can be increased upon [request](../../ui/support.md). - -| Name | Category[^2] | Mode[^3] | Charge Policy[^4] | Price | Max Nodes per Job+ | Max Nodes Total | -| :---: | :---: | :---: | :---: | :---: | :---: | :---: | -| D | debug | debug | core-seconds | 2.251 | 1 | 10 | -| OR | ordinary | regular | core-seconds | 1.000 | 1 | 10 | -| OR4 | ordinary | regular | core-seconds | 1.126 | 1 | 20 | -| OR8 | ordinary | regular | core-seconds | 1.126 | 1 | 20 | -| OR16 | ordinary | regular | core-seconds | 1.126 | 1 | 20 | -| OF | ordinary | fast | core-hours | 1.000 | ≤5 | 100 | -| OFplus| ordinary | fast | core-hours | 0.962 | ≤5 | 10 | -| SR | saving | regular | core-seconds | 0.200 | 1 | 10 | -| SR4 | saving | regular | core-seconds | 0.225 | 1 | 20 | -| SR8 | saving | regular | core-seconds | 0.225 | 1 | 20 | -| SR16 | saving | regular | core-seconds | 0.225 | 1 | 20 | -| SF | saving | fast | core-hours | 0.200 | ≤5 | 100 | -| SFplus| saving | fast | core-hours | 0.379 | ≤5 | 10 | -| GOF | ordinary | fast | core-hours | 8.655 | ≤5 | 10 | -| G4OF | ordinary | fast | core-hours | 8.655 | ≤5 | 10 | -| G8OF | ordinary | fast | core-hours | 8.655 | ≤5 | 10 | -| GSF | saving | fast | core-hours | 3.370 | ≤5 | 10 | -| G4SF | saving | fast | core-hours | 4.158 | ≤5 | 10 | -| G8SF | saving | fast | core-hours | 4.335 | ≤5 | 10 | +The list of currently enabled queues is given below. Price per core hour is shown in relation to +the [relative unit price](../../pricing/service-levels.md#comparison-table) and is subject to change at any time. Total +number of nodes can be increased upon [request](../../ui/support.md).
+ +| Name | Category[^2] | Mode[^3] | Charge Policy[^4] | Price | Max Nodes per Job+ | Max Nodes Total | +|:------:|:------------:|:--------:|:-----------------:|:-----:|:-----------------------------:|:---------------:| +| D | debug | debug | core-seconds | 2.251 | 1 | 10 | +| OR | ordinary | regular | core-seconds | 1.000 | 1 | 10 | +| OF | ordinary | fast | core-hours | 1.000 | 10 | 100 | +| OFplus | ordinary | fast | core-hours | 0.962 | 5 | 10 | +| SR | saving | regular | core-seconds | 0.200 | 1 | 10 | +| SF | saving | fast | core-hours | 0.200 | 10 | 100 | +| SFplus | saving | fast | core-hours | 0.379 | 5 | 10 | +| GOF | ordinary | fast | core-hours | 8.655 | 5 | 10 | +| GSF | saving | fast | core-hours | 1.731 | 5 | 10 | +| G4OF | ordinary | fast | core-hours | 8.655 | 5 | 10 | + please contact support to inquire about attempting a larger node count per job @@ -42,31 +37,22 @@ The list of currently enabled queues is given below. Price per core hour is show The following table contains hardware specifications for the above queues. -| Name | CPU[^5] | Cores per Node | GPU[^6] | GPU per Node | Memory (GB) | Bandwidth (Gbps) | -| :---: | :---: | :---: | :---: | :---: | :---: | :---: | -| D | c-3 | 8 | - | - | 15 | ≤10 | -| OR | c-3 | 36 | - | - | 60 | ≤10 | -| OR4 | c-3 | 4 | - | - | 7.5 | ≤10 | -| OR8 | c-3 | 8 | - | - | 15 | ≤10 | -| OR16 | c-3 | 16 | - | - | 30 | ≤10 | -| OF | c-3 | 36 | - | - | 60 | 10 | -| OFplus| c-5 | 72 | - | - | 144 | 25 | -| SR | c-3 | 36 | - | - | 60 | 10 | -| SR4 | c-3 | 4 | - | - | 7.5 | ≤10 | -| SR8 | c-3 | 8 | - | - | 15 | ≤10 | -| SR16 | c-3 | 16 | - | - | 30 | ≤10 | -| SF | c-3 | 36 | - | - | 60 | 10 | -| SFplus| c-5 | 72 | - | - | 144 | 25 | -| GOF | c-4 | 8 | g-1 | 1 | 61 | 10 | -| G4OF | c-4 | 32 | g-1 | 4 | 244 | 10 | -| G8OF | c-4 | 64 | g-1 | 8 | 488 | 25 | -| GSF | c-4 | 8 | g-1 | 1 | 61 | 10 | -| G4SF | c-4 | 32 | g-1 | 4 | 244 | 10 | -| G8SF | c-4 | 64 | g-1 | 8 | 488 | 25 | - +| Name | CPU[^5] | Cores per Node | GPU[^6] | GPU per Node | Memory (GB) | Bandwidth (Gbps) | Instance Type | +|:------:|:-------:|:--------------:|:-------:|:------------:|:-----------:|:----------------:|:-------------------:| +| D | c-3 | 4 | - | - | 15 | ≤10 | c4.2xlarge | +| OR | c-3 | 36 | - | - | 60 | ≤10 | c4.8xlarge | +| OF | c-3 | 36 | - | - | 60 | 10 | c4.8xlarge | +| OFplus | c-5 | 72 | - | - | 192 | 100 | c5n.18xlarge | +| SR | c-3 | 36 | - | - | 60 | 10 | c4.8xlarge | +| SF | c-3 | 36 | - | - | 60 | 10 | c4.8xlarge | +| SFplus | c-5 | 72 | - | - | 192 | 100 | c5n.18xlarge | +| GOF | c-8 | 8 | g-3 | 8 | 1152 | 400 | p4d.24xlarge | +| GSF | c-8 | 8 | g-3 | 8 | 1152 | 400 | p4d.24xlarge | +| G4OF | c-4 | 32 | g-4 | 1 | 256 | 10 | p5.4xlarge | !!! note "Hyper-threading" - Hyper-threading[^7] is enabled on all AWS compute nodes by default. It is recommended to use half of available cores on each compute node (e.g 18 cores on OF queue) if the application does not benefit from the extra virtual cores. +Hyper-threading[^7] is enabled on all AWS compute nodes by default. It is recommended to use half of available cores on +each compute node (e.g. 18 cores on OF queue) if the application does not benefit from the extra virtual cores. 
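As a worked example of the hyper-threading guidance above, a minimal batch-script sketch for the OF queue could look as follows. This is illustrative only: the directive values (queue, node and core counts, walltime) are placeholders, and complete, tested scripts are provided under [Sample Scripts](../../jobs-cli/batch-scripts/sample-scripts.md).

```bash
#!/bin/bash
# Illustrative sketch: request one 36-core OF node and launch on half of the
# cores (18) when the application does not benefit from hyper-threading.
#PBS -q OF
#PBS -l nodes=1:ppn=36
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR

# Loading the module sets $EXEC_CMD to the Apptainer exec command
# (see Jobs via CLI > Batch Scripts > Apptainer & Environment Modules).
module load espresso
mpirun -np 18 $EXEC_CMD pw.x -in pw.in > pw.out
```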
## Links diff --git a/lang/en/docs/infrastructure/clusters/azure.md b/lang/en/docs/infrastructure/clusters/azure.md index a6ef99162..fb148f516 100644 --- a/lang/en/docs/infrastructure/clusters/azure.md +++ b/lang/en/docs/infrastructure/clusters/azure.md @@ -4,70 +4,59 @@ This page contains information about clusters hosted on Microsoft Azure[^1] and ## Clusters -The following table provides information about available clusters on Microsoft Azure cloud computing platform. The latest cluster status can be found on Clusters page in web application. +The following table provides information about available clusters on Microsoft Azure cloud computing platform. The +latest cluster status can be found on Clusters page +in web application. -| Name | Hostname | Location | -| :---: | :---: | :---: | -| cluster-007 | master-production-20160630-cluster-007.exabyte.io | East US | +| Name | Hostname | Location | +|:-----------:|:-------------------------------------------------:|:--------:| +| cluster-003 | master-production-20250821-cluster-003.mat3ra.com | East US | ## Queues -The list of currently enabled queues is given below. Price per core hour is shown in relation to the [relative unit price](../../pricing/service-levels.md#comparison-table) and is subject to change at any time. Total number of nodes can be increased upon [request](../../ui/support.md). - -| Name | Category[^2] | Mode[^3] | Charge Policy[^4] | Price | Max Nodes per Job+ | Max Nodes Total | -| :---: | :---: | :---: | :---: | :---: | :---: | :---: | -| D | debug | debug | core-seconds | 4.002 | 1 | 10 | -| OR | ordinary | regular | core-seconds | 1.275 | 1 | 10 | -| OF | ordinary | fast | core-hours | 1.275 | ≤5 | 100 | -| OFplus| ordinary | fast | core-hours | 1.275 | 5 | 10 | -| SR | saving | regular | core-seconds | 0.379 | 1 | 10 | -| SF | saving | fast | core-hours | 0.379 | 1* | 100 | -| SFplus | saving | fast | core-hours | 0.379 | 5 | 10 | -| GPOF | ordinary | fast | core-hours | 6.110 | ≤5 | 10 | -| GP2OF | ordinary | fast | core-hours | 6.110 | ≤5 | 10 | -| GP4OF | ordinary | fast | core-hours | 6.110 | ≤5 | 10 | -| GPSF | saving | fast | core-hours | 1.222 | ≤5 | 10 | -| GP2SF | saving | fast | core-hours | 1.222 | ≤5 | 10 | -| GP4SF | saving | fast | core-hours | 1.222 | ≤5 | 10 | +The list of currently enabled queues is given below. Price per core hour is shown in relation to +the [relative unit price](../../pricing/service-levels.md#comparison-table) and is subject to change at any time. Total +number of nodes can be increased upon [request](../../ui/support.md). 
-+ please contact support to inquire about attempting a larger node count per job - -* presently the infrastructure limitations are not allowing for the multi-node communication in SF queue, so only single-node jobs should be attempted (as of Oct 2022) +| Name | Category[^2] | Mode[^3] | Charge Policy[^4] | Price | Max Nodes per Job+ | Max Nodes Total | +|:------:|:------------:|:--------:|:-----------------:|:-----:|:-----------------------------:|:---------------:| +| D | debug | ordinary | core-seconds | 4.002 | 1 | 10 | +| OR | regular | ordinary | core-seconds | 1.275 | 1 | 10 | +| SR | regular | saving | core-seconds | 0.379 | 1 | 10 | +| OF | fast | ordinary | core-hours | 1.275 | 5 | 100 | +| SF | fast | saving | core-hours | 0.379 | 5 | 100 | +| GPOF | fast | ordinary | core-hours | 6.110 | 5 | 10 | +| GPSF | fast | saving | core-hours | 1.222 | 5 | 10 | ++ please contact support to inquire about attempting a larger node count per job ## Hardware Specifications -The following table contains hardware specifications for the above queues. - -| Name | CPU[^5] | Cores per Node | GPU[^6] | GPU per Node | Memory (GB) | Bandwidth (Gb/sec) | -| :---: | :---: | :---: | :---: | :---: | :---: | :---: | -| D | c-7 | 16 | - | - | 32 | ≤10 | -| OR | c-6 | 44 | - | - | 352 | 100 | -| OF | c-6 | 44 | - | - | 352 | 100 | -| OFplus| c-6 | 44 | - | - | 352 | 100 | -| SR | c-6 | 44 | - | - | 352 | 100 | -| SF | c-6 | 44 | - | - | 352 | 100 | -| SFPlus| c-6 | 44 | - | - | 352 | 100 | -| GPOF | c-2 | 6 | g-2 | 1 | 112 | 10 | -| GP2OF | c-2 | 12 | g-2 | 2 | 224 | 10 | -| GP4OF | c-2 | 24 | g-2 | 4 | 448 | 10 | -| GPSF | c-2 | 6 | g-2 | 1 | 112 | 10 | -| GP2SF | c-2 | 12 | g-2 | 2 | 224 | 10 | -| GP4SF | c-2 | 24 | g-2 | 4 | 448 | 10 | +The following table contains hardware specifications for the above queues. 
+ +| Name | Cores per Node | GPU per Node | Memory (GB) | Bandwidth (Gb/sec) | VM Size | +|:------:|:--------------:|:------------:|:-----------:|:------------------:|:------------------------:| +| D | 8 | - | 2 | ≤10 | Standard_F8s_v2 | +| OR | 44 | - | 352 | 100 | Standard_HC44rs | +| OF | 44 | - | 352 | 100 | Standard_HC44rs | +| SR | 44 | - | 352 | 100 | Standard_HC44rs | +| SF | 44 | - | 352 | 100 | Standard_HC44rs | +| GPOF | 40 | 1 | 320 | 40 | Standard_NC40ads_H100_v5 | +| GPSF | 40 | 1 | 320 | 40 | Standard_NC40ads_H100_v5 | ## Links [^1]: [Microsoft Azure, Website](https://azure.microsoft.com/en-us/) -[^2]: [Queue Cost Categories, Website](../resource/category.md#cost-categories) +[^2]: [Queue Cost Categories, this documentation](../resource/category.md#cost-categories) -[^3]: [Queue Provision Modes, Website](../resource/category.md#provision-modes) +[^3]: [Queue Provision Modes, this documentation](../resource/category.md#provision-modes) -[^4]: [Charge polices, Website](../resource/queues.md#charge-policies) +[^4]: [Charge policies, this documentation](../resource/queues.md#charge-policies) -[^5]: [CPU types, Website](hardware.md#cpu-types) +[^5]: [CPU types, this documentation](hardware.md#cpu-types) -[^6]: [GPU types, Website](hardware.md#gpu-types) +[^6]: [GPU types, this documentation](hardware.md#gpu-types) [^7]: [Azure high performance compute virtual machines, Website](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/sizes-hpc) diff --git a/lang/en/docs/infrastructure/clusters/cluster-101.md b/lang/en/docs/infrastructure/clusters/cluster-101.md new file mode 100644 index 000000000..cdc778ed2 --- /dev/null +++ b/lang/en/docs/infrastructure/clusters/cluster-101.md @@ -0,0 +1,40 @@ +# Cluster-101 + +## Overview + +This cluster is hosted on Microsoft Azure[^1] infrastructure and is intended to provide free compute resources. + +| Name | Hostname | Location | +|:-----------:|:-------------------------------------------------:|:--------:| +| cluster-101 | master-production-20250821-cluster-101.mat3ra.com | East US | + +## Queues + +The list of currently enabled queues is given below. + +The price factor is shown in relation to the [relative unit price](../../pricing/service-levels.md#comparison-table). + +A price factor of 10E-4 means that the cost of using this queue is 0.0001 times the cost of using the base queue (OR). +This is intended to provide free compute resources while still keeping track of resource usage. + +| Name | Category[^2] | Mode[^3] | Charge Policy[^4] | Price Factor | Max Nodes per Job+ | Max Nodes Total | +|:----:|:------------:|:--------:|:-----------------:|:------------:|:-----------------------------:|:---------------:| +| D | debug | ordinary | core-seconds | 10E-4 | 1 | 1 | +| OR | regular | saving | core-seconds | 10E-4 | 1 | 0 | +| SR | regular | saving | core-seconds | 10E-4 | 1 | 1 | +| OF | fast | saving | core-hours | 10E-4 | 5 | 0 | +| SF | fast | saving | core-hours | 10E-4 | 5 | 1 | +| GPOF | fast | saving | core-hours | 10E-4 | 5 | 0 | +| GPSF | fast | saving | core-hours | 10E-4 | 5 | 1 | + ++ please contact support to inquire about attempting a larger node count per job + +## Hardware Specifications + +See [Azure Queues](azure.md) for more information.
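A minimal submission sketch for this free-tier cluster is shown below. It follows the PBS-style batch scripts described under [Jobs via Command Line](../../jobs-cli/overview.md); the walltime and application module are placeholders, and the single-node limit reflects the free-tier conditions in [Community Programs](../../other/community-programs.md).

```bash
#!/bin/bash
# Free-tier sketch for cluster-101: a single node on the debug (D) queue.
#PBS -q D
#PBS -l nodes=1
#PBS -l walltime=00:30:00

cd $PBS_O_WORKDIR

module load espresso   # sets $EXEC_CMD to the Apptainer exec command
mpirun -np $PBS_NP $EXEC_CMD pw.x -in pw.in > pw.out
```

Submit the script with `qsub` as usual; usage is accounted at the reduced price factor listed above.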
+ +## Links + +[^1]: [Microsoft Azure, Website](https://azure.microsoft.com/en-us/) + +///FOOTNOTES GO HERE/// diff --git a/lang/en/docs/infrastructure/clusters/google.md b/lang/en/docs/infrastructure/clusters/google.md new file mode 100644 index 000000000..486ee4a52 --- /dev/null +++ b/lang/en/docs/infrastructure/clusters/google.md @@ -0,0 +1,57 @@ +# Google Cloud Platform + +This page contains information about clusters hosted on Google Cloud Platform[^1] and their hardware specifications. + +## Clusters + +The following table provides information about available clusters on the Google Cloud Platform (GCP). The +latest cluster status can be found on the Clusters page +in the web application. + +| Name | Hostname | Location | +|:-----------:|:-------------------------------------------------:|:----------:| +| cluster-001 | master-production-20250821-cluster-001.mat3ra.com | US-Central | + +## Queues + +The list of currently enabled queues is given below. Price per core hour is shown in relation to +the [relative unit price](../../pricing/service-levels.md#comparison-table) and is subject to change at any time. Total +number of nodes can be increased upon [request](../../ui/support.md). + +| Name | Category[^2] | Mode[^3] | Charge Policy[^4] | Price | Max Nodes per Job+ | Max Nodes Total | +|:----:|:------------:|:--------:|:-----------------:|:-----:|:-----------------------------:|:---------------:| +| D | debug | ordinary | core-seconds | 4.002 | 1 | 1 | +| OR | regular | ordinary | core-seconds | 1.275 | 1 | 1 | +| OF | fast | ordinary | core-hours | 1.275 | 5 | 2 | +| GOF | fast | ordinary | core-hours | 6.110 | 5 | 1 | + ++ please contact support to inquire about attempting a larger node count per job + +## Hardware Specifications + +The following table contains hardware specifications for the above queues. + +| Name | Cores per Node | GPU per Node | Memory (GB) | Bandwidth (Gb/sec) | VM Size | +|:----:|:--------------:|:------------:|:-----------:|:------------------:|:--------------:| +| D | 2 | - | 15 | ≤10 | n1-standard-4 | +| OR | 4 | - | 15 | ≤10 | h3-standard-88 | +| OF | 4 | - | 15 | 10 | h3-standard-88 | +| GOF | 12 | 1 | 85 | 100 | a2-highgpu-1g | + +## Links + +[^1]: [Google Cloud Platform, Website](https://cloud.google.com/) + +[^2]: [Queue Cost Categories, this documentation](../resource/category.md#cost-categories) + +[^3]: [Queue Provision Modes, this documentation](../resource/category.md#provision-modes) + +[^4]: [Charge policies, this documentation](../resource/queues.md#charge-policies) + +[^5]: [CPU types, this documentation](hardware.md#cpu-types) + +[^6]: [GPU types, this documentation](hardware.md#gpu-types) + +[^7]: [Google Cloud VM instances, Website](https://cloud.google.com/compute/docs/machine-types) + +///FOOTNOTES GO HERE/// diff --git a/lang/en/docs/infrastructure/clusters/hardware.md b/lang/en/docs/infrastructure/clusters/hardware.md index e39154b20..ad48897e6 100644 --- a/lang/en/docs/infrastructure/clusters/hardware.md +++ b/lang/en/docs/infrastructure/clusters/hardware.md @@ -1,6 +1,6 @@ # Hardware Specifications -Our computing resources are hosted by trusted vendors: [Amazon Web Services](aws.md) and [Microsoft Azure](azure.md). +Our computing resources are hosted by trusted vendors: [Amazon Web Services](aws.md) and [Microsoft Azure](azure.md). We also support IBM SoftLayer[^1], Rackspace[^2], and Google Cloud[^3], and can deploy capacity there on short notice. The following shows the CPU and GPU hardware specification on aforementioned vendors.
@@ -9,34 +9,37 @@ The following shows the CPU and GPU hardware specification on aforementioned ven The following table shows different types of CPUs available in our platform. -| Name | Type | Processor Base Frequency (GHz) | -| :---: | :---: | :---: | -| c-1 | Intel Xeon E5-2667-v3[^4] | 3.20 | -| c-2 | Intel Xeon E5-2690-v4[^5] | 2.60 | -| c-3 | Intel Xeon E5-2666-v3[^6] | 2.90 | -| c-4 | Intel Xeon E5-2686-v4[^7] | 2.30 | -| c-5 | Intel Xeon Platinum[^6a] | 3.00 | -| c-6 | Intel Xeon Platinum 8168[^8] | 2.70 | -| c-7 | Intel Xeon E5-2673-v3[^11] | 2.40 | +| Name | Type | Processor Base Frequency (GHz) | +|:----:|:------------------------------:|:------------------------------:| +| c-1 | Intel Xeon E5-2667-v3[^4] | 3.20 | +| c-2 | Intel Xeon E5-2690-v4[^5] | 2.60 | +| c-3 | Intel Xeon E5-2666-v3[^6] | 2.90 | +| c-4 | Intel Xeon E5-2686-v4[^7] | 2.30 | +| c-5 | Intel Xeon Platinum[^6a] | 3.00 | +| c-6 | Intel Xeon Platinum 8168[^8] | 2.70 | +| c-7 | Intel Xeon E5-2673-v3[^11] | 2.40 | +| c-8 | Intel Xeon Platinum 8275L[^11] | 3.00 | +| c-9 | AMD EPYC 7R13[^12] | 3.60 | ## GPU Types The following table shows different types of GPUs the GPU-enabled compute nodes are provisioned with. -| Name | Type | -| :---: | :---: | -| g-1 | NVIDIA V100[^9] | -| g-2 | NVIDIA P100[^10] | - +| Name | Type | +|:----:|:----------------:| +| g-3 | NVIDIA A100[^9] | +| g-4 | NVIDIA H100[^10] | ## Available Resources -As of Apr, 2018 our major compute and storage systems (per cluster) are as explained below. The total number of cores is administratively limited by our agreements with the cloud vendors, and cen be extended further upon request. Elastically grown file system lets us reach to 8 exabytes (EB) of disk space per single cluster. +As of 2025 our major compute and storage systems (per cluster) are as explained below. +The total number of cores is administratively limited by our agreements with the cloud vendors, and +can be extended further upon request. 
-| Provider | Total cores | Total Memory (GB) | Total Disk (EB) | -| :--------- | :--------: | :---------------: | :-------------: | -| AWS | 36,000 | 60,000 | 8 | -| Azure | 10,000 | 20,000 | 8 | +| Provider | Total cores | Total Memory (GB) | Total Disk (EB) | +|:---------|:-----------:|:-----------------:|:---------------:| +| AWS | 36,000 | 60,000 | 8 | +| Azure | 10,000 | 20,000 | 8 | ## Links @@ -58,10 +61,14 @@ As of Apr, 2018 our major compute and storage systems (per cluster) are as expla [^8]: [HC-Series, Microsoft Azure documentation](https://docs.microsoft.com/en-us/azure/virtual-machines/hc-series) -[^9]: [NVIDIA Tesla V100, online product documentation](https://www.nvidia.com/en-us/data-center/tesla-v100/) +[^9]: [NVIDIA A100, online product documentation](https://www.nvidia.com/en-us/data-center/a100/) -[^10]: [NVIDIA Tesla P100, online product documentation](https://www.nvidia.com/en-us/data-center/tesla-p100/) +[^10]: [NVIDIA H100, online product documentation](https://www.nvidia.com/en-us/data-center/h100/) [^11]: [F-Series VM Sizes, Azure](https://azure.microsoft.com/en-us/blog/f-series-vm-size/) +[^11]: [p4d.24xlarge VM Sizes, AWS](https://aws.amazon.com/ec2/instance-types/p4/) + +[^12]: [p5.4xlarge VM Sizes, AWS](https://aws.amazon.com/ec2/instance-types/p5/) + ///FOOTNOTES GO HERE/// diff --git a/lang/en/docs/jobs-cli/batch-scripts/apptainer.md b/lang/en/docs/jobs-cli/batch-scripts/apptainer.md new file mode 100644 index 000000000..9cdaafb41 --- /dev/null +++ b/lang/en/docs/jobs-cli/batch-scripts/apptainer.md @@ -0,0 +1,57 @@ +# Apptainer and Environment Modules + +On the new platform, environment modules integrate with Apptainer to provide consistent, containerized runtimes for HPC applications. When you `module load` an application, the module system: + +- Resolves and loads required dependencies (e.g., `gcc`, `mpi`) +- Sets per-application environment variables (e.g., `$EXEC_CMD_VASP`, `$EXEC_CMD_QE`) +- Updates a convenience variable `$EXEC_CMD` to the most recently loaded application's command +- Maintains `$EXEC_CMDS` as a colon-separated list of loaded application exec variables (e.g., `EXEC_CMD_VASP:EXEC_CMD_QE`) + +## Example session + +```bash +>>>> module load espresso/6.3-gcc-openmpi-openblas +The module gcc/11.2.0 is loaded +The module mpi/ompi-4.1.1 is loaded +The module espresso/6.3-gcc-openmpi-openblas is loaded + +Loading espresso/6.3-gcc-openmpi-openblas + Loading requirement: gcc/11.2.0 mpi/ompi-4.1.1 + +>>>> echo $EXEC_CMD +apptainer exec --bind /export,/scratch,/dropbox,/cluster-001-share \ + /export/compute/software/applications/espresso/6.3-gcc-openmpi-openblas/image.sif + +>>>> echo $EXEC_CMDS +EXEC_CMD_QE + +>>>> echo $EXEC_CMD_QE +apptainer exec --bind /export,/scratch,/dropbox,/cluster-001-share \ + /export/compute/software/applications/espresso/6.3-gcc-openmpi-openblas/image.sif + +>>>> module load vasp/5.4.4-gcc-openmpi-openblas-fftw-scalapack +The module vasp/5.4.4-gcc-openmpi-openblas-fftw-scalapack is loaded + +>>>> echo $EXEC_CMDS +EXEC_CMD_VASP:EXEC_CMD_QE + +>>>> echo $EXEC_CMD +apptainer exec --bind /export,/scratch,/dropbox,/cluster-001-share \ + /export/compute/software/applications/vasp/5.4.4-gcc-openmpi-openblas-fftw-scalapack/image.sif + +>>>> echo $EXEC_CMD_VASP +apptainer exec --bind /export,/scratch,/dropbox,/cluster-001-share \ + /export/compute/software/applications/vasp/5.4.4-gcc-openmpi-openblas-fftw-scalapack/image.sif +``` + +Notes: +- The Apptainer command binds common platform directories into the container (e.g., 
`/export`, `/scratch`, `/dropbox`, and the cluster share such as `/cluster-001-share`). +- `$EXEC_CMD` always points to the last loaded application's container exec command. +- Use per-app variables (e.g., `$EXEC_CMD_VASP`, `$EXEC_CMD_QE`) when you need to be explicit in job scripts. + +## Using in job scripts + +See: +- [Jobs via Command Line](../overview.md) +- [Batch Scripts > General Structure](general-structure.md) +- [Batch Scripts > Sample Scripts](sample-scripts.md) diff --git a/lang/en/docs/jobs-cli/batch-scripts/sample-scripts.md b/lang/en/docs/jobs-cli/batch-scripts/sample-scripts.md index 5db49325c..934990dd7 100644 --- a/lang/en/docs/jobs-cli/batch-scripts/sample-scripts.md +++ b/lang/en/docs/jobs-cli/batch-scripts/sample-scripts.md @@ -23,7 +23,8 @@ This example requests 1 node with 2 processors (cores) for 10 minutes, in the De cd $PBS_O_WORKDIR module load espresso -mpirun -np $PBS_NP pw.x -in pw.input +# $EXEC_CMD is set by the environment module +mpirun -np $PBS_NP $EXEC_CMD pw.x -in pw.input ``` ## On-demand regular (OR) @@ -44,5 +45,6 @@ This example requests 1 node and 16 cores for 10 minutes, on the OR [queue](../. cd $PBS_O_WORKDIR module load vasp -mpirun -np $PBS_NP vasp +# $EXEC_CMD is set by the environment module +mpirun -np $PBS_NP $EXEC_CMD vasp ``` diff --git a/lang/en/docs/jobs-cli/overview.md b/lang/en/docs/jobs-cli/overview.md index 00cd9580f..939817811 100644 --- a/lang/en/docs/jobs-cli/overview.md +++ b/lang/en/docs/jobs-cli/overview.md @@ -14,6 +14,10 @@ We describe the accounting aspects of Job submission via CLI, such as specifying The actions pertaining to Jobs submission and execution under the CLI are reviewed [in this section](actions/overview.md) of the documentation. Other general actions concerning the CLI, such as the loading of modules, the compilation of new applications or the creation of new python environments, are described [separately](../cli/actions/overview.md). +## Apptainer and Environment Modules + +For the new platform, CLI workflows use Apptainer-backed modules that set `$EXEC_CMD` variables for containerized execution. See: [Apptainer and Environment Modules](batch-scripts/apptainer.md) + ## [Tutorials](../tutorials/jobs-cli/overview.md) We provide tutorials guiding the user through the complete procedure for submitting jobs via CLI, and subsequently retrieving the corresponding results under the [Web Interface](../ui/overview.md) of our platform. These tutorials are introduced [here](../tutorials/jobs-cli/overview.md). diff --git a/lang/en/docs/migrating-to-new-platform.md b/lang/en/docs/migrating-to-new-platform.md new file mode 100644 index 000000000..35055aa9e --- /dev/null +++ b/lang/en/docs/migrating-to-new-platform.md @@ -0,0 +1,92 @@ +# Migrating to the New Mat3ra Platform (Q4 2025) + +> Last updated Oct 7, 2025. + +We are excited to announce the immediate availability of the new iteration of the Mat3ra platform. This is a significant upgrade over previous monthly releases and introduces new infrastructure, compute options, and improvements to the overall user experience. + +!!! note "Important": For a smooth transition, both the old and the new platform versions will remain operational through the end of 2025. See the timeline below. + +--- + +## What’s New + +- **Free compute tier (cluster-101)** + - A new always-available free tier via `cluster-101`. + - Read more: [Free Tier and Community Access](other/community-programs.md) and [Cluster-101](infrastructure/clusters/cluster-101.md). 
+ +- **New cloud infrastructure setup** + - Google Cloud Platform via `cluster-001` — see [GCP clusters](infrastructure/clusters/google.md) + - Amazon Web Services via `cluster-002` — see [AWS clusters](infrastructure/clusters/aws.md) + - Microsoft Azure via `cluster-003` — see [Azure clusters](infrastructure/clusters/azure.md) + - Updated set of instance types across providers, including larger CPU and GPU options + +- **Operating system upgrade** + - Base OS updated to RHEL 9 across the new platform infrastructure + +- **Apptainer-based application deployment** + - Modernized packaging and runtime isolation for scientific applications + - See: [Jobs via Command Line](jobs-cli/overview.md) for updated CLI usage and examples + - We welcome contributions for additional applications your workflows require + +--- + +## Where to Access + +- New platform (now available): `https://platform-new.mat3ra.com` +- Current/old platform: `https://platform.mat3ra.com` +- New documentation: `https://docs-new.mat3ra.com` +- New login node: `https://login-new.mat3ra.com` + +You can sign in to the new platform with the same credentials you use for the current platform. + +--- + +## Migration Timeline + +- **Now → Oct 31, 2025** + - Both old and new platforms are live. You can explore and begin migrating at your convenience. + +- **On/after Nov 1, 2025** + - `https://platform.mat3ra.com` will route to the new platform + - The old platform will remain accessible at `https://platform-old.mat3ra.com` + - Equivalent routing changes will apply to related URLs (docs, login) + +- **On/after Jan 1, 2026** + - Only the new platform will remain available + +--- + +## What Gets Migrated Automatically + +- Application data stored in the platform database (entities, metadata, workflows, settings) is migrated automatically. + +## What Does NOT Migrate Automatically + +- Runtime files and bulk data stored on disk do not migrate automatically. Due to updated infrastructure libraries and layout in the new environment, data migration from legacy cluster homes and shares is handled on a case-by-case basis. + +> Tip: Review data locations under [Data on Disk > Directory Structure](data-on-disk/directories.md) and [Infrastructure > Login Node Directories](infrastructure/login/directories.md) to plan your migration. + +--- + +## Recommended Migration Steps + +1. Sign in to the new platform at `https://platform-new.mat3ra.com` and verify access +2. Review your workflows for compatibility; see updated examples under [Jobs via Command Line](jobs-cli/overview.md) +3. Identify runtime data to transfer (e.g., project files, job outputs) from legacy cluster homes +4. Contact us for assistance with bulk data migration and best practices +5. Validate your workflows on the new clusters (e.g., `cluster-001`, `cluster-002`, `cluster-003`, `cluster-101`) + +## CLI and Environment Modules with Apptainer + +See examples of module loading, `$EXEC_CMD`, and containerized execution in: [Apptainer and Environment Modules](jobs-cli/batch-scripts/apptainer.md) + +### Using in job scripts + +Load the application module and prepend `$EXEC_CMD` to the executable invocation in your batch script; a minimal sketch is included at the end of this page, and complete, tested examples are available under [Sample Scripts](jobs-cli/batch-scripts/sample-scripts.md). + +## Contact and Support + +- For migration help (data movement, cluster selection, workflow updates), contact your Mat3ra representative or reach us via the Support Widget in the platform header or `support@mat3ra.com`. +- If you require a specific application or environment, please let us know — we welcome contributions and requests to expand supported software.
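### Appendix: Minimal Job Script Sketch

The sketch referenced under "Using in job scripts" above. The queue, resources, and module version are placeholders borrowed from the CLI examples; adapt them to the cluster and application you are validating.

```bash
#!/bin/bash
# Placeholder queue/resources; see the cluster pages above for current values.
#PBS -q OR
#PBS -l nodes=1:ppn=16
#PBS -l walltime=00:30:00

cd $PBS_O_WORKDIR

# Loading the module sets $EXEC_CMD (and a per-application variable such as
# $EXEC_CMD_VASP) to the Apptainer exec command for the containerized binary.
module load vasp/5.4.4-gcc-openmpi-openblas-fftw-scalapack
mpirun -np $PBS_NP $EXEC_CMD_VASP vasp
```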
diff --git a/lang/en/docs/other/community-programs.md b/lang/en/docs/other/community-programs.md index db59b89cd..19217fb72 100644 --- a/lang/en/docs/other/community-programs.md +++ b/lang/en/docs/other/community-programs.md @@ -1,15 +1,43 @@ # Community Programs -> Last updated: Aug 22, 2025 +> Last updated: Oct 7, 2025 -## Limited Free Compute for the Platform Users. +## 1. Free Compute Tier for the Platform Users. -!!!note "The Limited Free Compute Program starting in 2025." -The Limited Free Compute Program is starting in 2025. Users are still welcome to submit their information. +In collaboration with our cloud provider partners, we are offering a Limited Free Compute program for our platform +users. -The program is targeted for users with current academic affiliations, however, we will consider private parties as well. -Anyone can participate. For the selected users, we will provide computational resources free of charge on a case-by-case -basis. +The primary goal of this program is to support academic research and education in the field of materials science and +enable the community to explore the capabilities of our platform. + +The program is targeted at users with current academic affiliations; however, we welcome private parties as well. + +Free-tier access is available through [cluster-101](../infrastructure/clusters/cluster-101.md) only. + +### 1.1. Free Compute Eligibility. + +To be eligible for the Limited Free Compute program, users must meet the following criteria: + +- Be a registered user of the platform. +- Have a valid academic affiliation (e.g., student, faculty, researcher) or be a private party interested in exploring the + platform capabilities. +- If affiliated with an academic institution, use an email address associated with the institution (i.e., hosted on a + ".edu" domain) during the registration. +- (Optional) Provide information about the nature of the anticipated work (e.g., research topic, educational purpose, + etc.) +- Agree to the conditions outlined below. + +### 1.2. Applying for Free Compute. + +As of now, no separate application is needed. If you meet the eligibility criteria above, you can start using the +platform and its resources right away. + +We will review the user base periodically and reach out to the users who meet the eligibility criteria to inform them +about the program and its benefits. + +Consequently, we will screen out users who do not meet the criteria. + +### 1.3. Free Compute Conditions. -Free access is limited to certain compute resources only and is subject to other limitations as below. We will consider -adjusting the limitations according to the user feedback received. Contact "support@mat3ra.com" for this. +Free access is limited to certain compute resources and is subject to the limitations below. -#### Limitations +#### 1.3.1. Limitations -To be specified later. Similar to the below: +The current limitations are as follows: | Feature | Explanation | |:-----------------------:|:------------------:| | Max nodes per job | 1 | -| Max cores per job | 4 | -| Max job walltime | 24 hours | -| Max job queued per user | 4 | +| Max cores per job | Per Queue Policy | +| Max job walltime | Per Queue Policy | +| Max job queued per user | 10 | | Available Queues | "D" only | | Available Resources | "cluster-101" only | | Included Disk Quota | 10 Gb | -#### Acknowledgements +For more information, please see [cluster-101](../infrastructure/clusters/cluster-101.md). + +## 2. 
Acknowledgements -Any/all published work derived from the Limited Free Access program must include the following Acknowledgement text and -citation below. +Any/all published work derived from any of the Community Programs listed here must include the following acknowledgement +and citation. **Acknowledgement text** ```text -The authors performed this work partially or in full using the Exabyte.io -platform, a web-based computational ecosystem for the development of new -materials and chemicals [REFERENCE TO THE BELOW CITATION]. +The authors performed this work partially or in full using the Mat3ra.com +platform [REFERENCE TO THE BELOW CITATION]. ``` **Citation** ```text -Timur Bazhirov, "Data-centric online ecosystem for digital materials science", -arxiv.org preprint, 2019, https://arxiv.org/abs/1902.10838 +Timur Bazhirov, "Data-centric online ecosystem for digital materials science.", +arxiv.org preprint, 2019, https://arxiv.org/abs/1902.10838 ``` In Bibtex format: ```bibtex -@article{Exabyte.io-Platform-Reference, +@article{Mat3ra-Platform-Reference, title={Data-centric online ecosystem for digital materials science}, author={Bazhirov, Timur}, journal={arxiv.org/abs/1902.10838}, @@ -79,8 +109,18 @@ In Bibtex format: } ``` -#### Publicity +## 3. Publicity. + +We plan to select some of the work performed under the Free Compute Tier to be highlighted in the online +publication sources, similar to the below: + +- [Enabling new Science through Accessible Cloud HPC](https://www.mat3ra.com/news-and-blog-posts/enabling-new-science-through-accessible-modeling-and-simulations) +- [Scientific Computing on Cloud Infrastructure](https://blogs.oracle.com/cloud-infrastructure/post/exabyteio-for-scientific-computing-on-oracle-cloud-infrastructure-hpc) + +We will contact select users in advance to request permission for this. If you are interested in having your work +highlighted, please let us know by sending an email to `support@mat3ra.com`. + +## 4. Feedback. -We plan to select some of the work performed under the Limited Free Access program to be highlighted in the online -publication sources together with the cloud provider(s) enabling the computational infrastructure. -We will contact the users in advance to request permission for this. +We welcome feedback from the users of the Community Programs listed here. Please send your feedback, suggestions, and +any issues you encounter to `support@mat3ra.com`. diff --git a/lang/en/docs/tutorials/jobs-cli/job-cli-example.md b/lang/en/docs/tutorials/jobs-cli/job-cli-example.md index dcd658d76..35513d9cc 100644 --- a/lang/en/docs/tutorials/jobs-cli/job-cli-example.md +++ b/lang/en/docs/tutorials/jobs-cli/job-cli-example.md @@ -112,7 +112,8 @@ Secondly, we prepare the [Batch Script](../../jobs-cli/batch-scripts/overview.md module add espresso cd $PBS_O_WORKDIR -mpirun -np $PBS_NP pw.x -in pw.in > pw.out +# $EXEC_CMD is set by the environment module +mpirun -np $PBS_NP $EXEC_CMD pw.x -in pw.in > pw.out ``` Just like before, we are using template variables again instead of the [project](../../jobs/projects.md) name and email. Variables starting with `$PBS` are automatically set by the [resource manager](../../infrastructure/resource/overview.md), and are known as the ["PBS Directives"](../../jobs-cli/batch-scripts/directives.md). 
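As a quick sanity check before submitting, you can confirm what the loaded module resolved `$EXEC_CMD` to (this mirrors the session shown in [Apptainer and Environment Modules](../../jobs-cli/batch-scripts/apptainer.md)):

```bash
module add espresso
echo $EXEC_CMD   # expected: an "apptainer exec --bind ... image.sif" command
```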
@@ -258,7 +259,7 @@ EOF module add espresso cd \$PBS_O_WORKDIR -mpirun -np \$PBS_NP pw.x -in srzro3_${celldm1}.in | tee srzro3_${celldm1}.out +mpirun -np \$PBS_NP \$EXEC_CMD pw.x -in srzro3_${celldm1}.in | tee srzro3_${celldm1}.out EOF qsub run_QE_${celldm1}.pbs done diff --git a/lang/en/docs/tutorials/jobs-cli/qe-gpu.md b/lang/en/docs/tutorials/jobs-cli/qe-gpu.md index edede9f5a..cc2705319 100644 --- a/lang/en/docs/tutorials/jobs-cli/qe-gpu.md +++ b/lang/en/docs/tutorials/jobs-cli/qe-gpu.md @@ -41,7 +41,7 @@ with 8 OpenMP threads. ```bash module load espresso/7.4-cuda-12.4-cc-70 export OMP_NUM_THREADS=8 -mpirun -np 1 pw.x -npool 1 -ndiag 1 -in pw.cuo.scf.in > pw.cuo.gpu.scf.out +mpirun -np 1 $EXEC_CMD pw.x -npool 1 -ndiag 1 -in pw.cuo.scf.in > pw.cuo.gpu.scf.out ``` 6. Finally, we can submit our job using: diff --git a/mkdocs.yml b/mkdocs.yml index 8403c7701..79bdc1c12 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -30,7 +30,7 @@ extra_javascript: copyright: Exabyte Inc. All rights reserved. | Back to platform extra: - version: "2025.8.21" + version: "2025.9.25" preload_javascript: - /extra/js/preload_hotjar.js - /extra/js/preload.js @@ -121,6 +121,7 @@ plugins: nav: - Home: index.md + - Migrating to New Platform, Q4.2025: migrating-to-new-platform.md # INTRODUCTION - Getting Started: @@ -627,6 +628,7 @@ nav: - General Structure: jobs-cli/batch-scripts/general-structure.md - Directives: jobs-cli/batch-scripts/directives.md - Working Directory: jobs-cli/batch-scripts/directories.md + - Apptainer & Environment Modules: jobs-cli/batch-scripts/apptainer.md - Sample Scripts: jobs-cli/batch-scripts/sample-scripts.md - Actions: - Overview: jobs-cli/actions/overview.md