Azure server #689

Open · wants to merge 31 commits into base: main

Changes from all commits (31 commits):
dfa290d  ARC Env: RapidFuzz (calvinp0, Jul 30, 2023)
b65c34d  [WIP] Basis Set Dataframe (calvinp0, Jul 30, 2023)
83cd331  Adapter Update: SSH fix, Submit Memory correction, SLURM submit memor… (calvinp0, Jul 30, 2023)
6437460  QChem IRC Software Support Recognition (calvinp0, Jul 30, 2023)
64893cd  SSH Improvement (calvinp0, Jul 30, 2023)
7948473  XYZ to Smiles: Warning Update to Possible Valence (calvinp0, Jul 30, 2023)
6b3b610  Vectors: Reading coords that are in string format using regex (calvinp0, Jul 30, 2023)
808ba16  Species: Get number of heavy atoms for TSGuess (calvinp0, Jul 30, 2023)
18f2b4a  [WIP] Getting the Internal Coordinates for QChem - Still not operational (calvinp0, Jul 30, 2023)
2bf315c  submit.py & settings.py: Updated for SLURM and AZURE (calvinp0, Jul 30, 2023)
00e4b74  Scheduler: Import JobError, JobError Exception, Rerunning Job, Removi… (calvinp0, Jul 30, 2023)
fe06f9e  Parser: QUESTION - RAISE_ERROR, parse_normal_mode_displacement QChem,… (calvinp0, Jul 30, 2023)
197e248  QChem Adapter (calvinp0, Jul 30, 2023)
0baf191  trsh QCHEM & MOLPRO (calvinp0, Jul 30, 2023)
b00bca6  Molpro Adapter (calvinp0, Jul 30, 2023)
48a612f  [WIP] QChem Test (calvinp0, Jul 30, 2023)
5ac4317  [WIP] Molpro Test (calvinp0, Jul 30, 2023)
e346f8d  main_test (calvinp0, Jul 30, 2023)
65186c8  level_test (calvinp0, Jul 30, 2023)
76de095  [Temp] Change CACHE_NUMBER (calvinp0, Jul 30, 2023)
c69d700  Adapter: total submit memory fix (calvinp0, Jul 30, 2023)
84ec8cb  QChem Adapter: Updated Tests (calvinp0, Jul 30, 2023)
46541b1  TrshMolpro: Parse new array length memory (calvinp0, Jul 30, 2023)
75942f5  TrshTEST (calvinp0, Jul 30, 2023)
a569939  mainTEST (calvinp0, Jul 30, 2023)
45681e1  parserTEST: Updated normal mode displacement test (calvinp0, Jul 30, 2023)
cb5ff4c  submitTEST (calvinp0, Jul 30, 2023)
90a5c89  parser: Fixed parse_trajectory for QCHEM (calvinp0, Jul 30, 2023)
dfdcf13  Adapter: Submit Memory Fix (calvinp0, Jul 30, 2023)
e9954a5  TrshQCHEM: Added Minimization Error (calvinp0, Jul 31, 2023)
123279e  parser: Fixed elif statement in parse_normal_mode_displacement - chan… (calvinp0, Jul 31, 2023)
.github/workflows/cont_int.yml (2 changes: 1 addition & 1 deletion)

@@ -144,7 +144,7 @@ jobs:
key: conda-${{ runner.os }}--${{ runner.arch }}-arcenv-${{ env.CACHE_NUMBER}}
env:
# Increase this value to reset cache if etc/example-environment.yml has not changed
-          CACHE_NUMBER: 0
+          CACHE_NUMBER: 1
Member: Can this commit be removed once it did its trick once?
id: cache-arc-env
- name: Update environment
run: mamba env update -n arc_env -f environment.yml
.gitignore (5 changes: 5 additions & 0 deletions)

@@ -36,6 +36,7 @@ scratch/

# csv database
*.csv
!basis_sets.csv
Member: Can you explain the "!"?

Member Author: So that whenever someone runs "git add .", it will make sure to add basis_sets.csv. Basically, it is excluded from the *.csv rule.


# iPython files
*.ipynb_*
@@ -52,6 +53,10 @@ timer.dat
# .vscode
.vscode

# files created via testing
nul
run.out

# .trunk folder
.trunk

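The negation rule discussed above can be modeled in a few lines. The sketch below is a simplified version of git's ignore matching (rules apply in order and the last match wins; it ignores directory-level patterns and other gitignore features):

```python
import fnmatch

def is_ignored(path: str, rules: list) -> bool:
    """Simplified gitignore evaluation: apply rules in order; the last matching
    rule wins, and a rule starting with '!' re-includes the path."""
    ignored = False
    for rule in rules:
        negated = rule.startswith('!')
        pattern = rule[1:] if negated else rule
        if fnmatch.fnmatch(path, pattern):
            ignored = not negated
    return ignored

rules = ['*.csv', '!basis_sets.csv']
print(is_ignored('data.csv', rules))        # True: matched by *.csv
print(is_ignored('basis_sets.csv', rules))  # False: re-included by the ! rule
```

This mirrors why "git add ." picks up basis_sets.csv while still skipping other CSV files.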
arc/job/adapter.py (61 changes: 45 additions & 16 deletions)

@@ -772,10 +772,13 @@ def set_cpu_and_mem(self):
f'exceeds {100 * job_max_server_node_memory_allocation}% of the the maximum node memory on '
f'{self.server}. Setting it to {job_max_server_node_memory_allocation * max_mem:.2f} GB.')
self.job_memory_gb = job_max_server_node_memory_allocation * max_mem
-            total_submit_script_memory = self.job_memory_gb * 1024 * 1.05  # MB
+            total_submit_script_memory = self.job_memory_gb * 1024 * 1.05 if (self.job_memory_gb * 1024 * 1.05) <= (max_mem * 1024) else max_mem * 1024  # MB
Member: Can you squash? I commented about this on the original commit.
self.job_status[1]['keywords'].append('max_total_job_memory') # Useful info when troubleshooting.
else:
-            total_submit_script_memory = self.job_memory_gb * 1024 * 1.1  # MB
+            if max_mem is None:
+                total_submit_script_memory = self.job_memory_gb * 1024 * 1.1
+            else:
+                total_submit_script_memory = self.job_memory_gb * 1024 * 1.1 if (self.job_memory_gb * 1024 * 1.1) <= (max_mem * 1024) else max_mem * 1024  # MB
# Determine amount of memory in submit script based on cluster job scheduling system.
cluster_software = servers[self.server].get('cluster_soft').lower() if self.server is not None else None
if cluster_software in ['oge', 'sge', 'htcondor']:
@@ -785,8 +788,8 @@ def set_cpu_and_mem(self):
# In PBS, "#PBS -l select=1:ncpus=8:mem=12000000" specifies the memory for all cores to be 12 MB.
self.submit_script_memory = math.ceil(total_submit_script_memory) * 1E6 # in Bytes
elif cluster_software in ['slurm']:
-            # In Slurm, "#SBATCH --mem-per-cpu=2000" specifies the memory **per cpu/thread** to be 2000 MB.
-            self.submit_script_memory = math.ceil(total_submit_script_memory / self.cpu_cores)  # in MB
+            # In Slurm, "#SBATCH --mem=2000" specifies the memory to be 2000 MB.
Member: Please make the comment more explicit by specifying something like "the memory for all cores", as for PBS above.

If you change this behavior, do we need to make any changes to our submit scripts?

self.submit_script_memory = math.ceil(total_submit_script_memory) # in MB
self.set_input_file_memory()
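The clamping that the two ternaries in this hunk implement can be expressed once with min(). This is a sketch with hypothetical names, not ARC's actual API:

```python
import math
from typing import Optional

def submit_script_memory_mb(job_memory_gb: float,
                            max_mem_gb: Optional[float],
                            overhead: float = 1.1) -> int:
    """Requested submit-script memory in MB, padded by an overhead factor and
    clamped to the node maximum when it is known."""
    requested_mb = job_memory_gb * 1024 * overhead
    if max_mem_gb is not None:
        requested_mb = min(requested_mb, max_mem_gb * 1024)
    return math.ceil(requested_mb)

print(submit_script_memory_mb(14, None))  # 15770 (14 GB * 1.1 overhead, in MB)
print(submit_script_memory_mb(14, 12))    # 12288, clamped to the 12 GB node maximum
```

Collapsing the two branches this way also removes the duplicated `* 1024 * overhead` arithmetic that the reviewer asked to squash.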

def as_dict(self) -> dict:
@@ -942,18 +945,25 @@ def _get_additional_job_info(self):
if cluster_soft in ['oge', 'sge', 'slurm', 'pbs', 'htcondor']:
local_file_path_1 = os.path.join(self.local_path, 'out.txt')
local_file_path_2 = os.path.join(self.local_path, 'err.txt')
local_file_path_3 = os.path.join(self.local_path, 'job.log')
if self.server != 'local' and self.remote_path is not None and not self.testing:
local_file_path_3 = None
for files in self.files_to_upload:
if 'job.sh' in files.values():
local_file_path_3 = os.path.join(self.local_path, 'job.log')
if self.server != 'local' and self.remote_path is not None:
remote_file_path_1 = os.path.join(self.remote_path, 'out.txt')
remote_file_path_2 = os.path.join(self.remote_path, 'err.txt')
remote_file_path_3 = os.path.join(self.remote_path, 'job.log')
remote_file_path_3 = None
for files in self.files_to_upload:
if 'job.sh' in files.values():
remote_file_path_3 = os.path.join(self.remote_path, 'job.log')
with SSHClient(self.server) as ssh:
for local_file_path, remote_file_path in zip([local_file_path_1,
local_file_path_2,
local_file_path_3],
[remote_file_path_1,
remote_file_path_2,
remote_file_path_3]):

local_files_to_zip = [local_file_path_1, local_file_path_2]
remote_files_to_zip = [remote_file_path_1, remote_file_path_2]
if local_file_path_3 and remote_file_path_3:
local_files_to_zip.append(local_file_path_3)
remote_files_to_zip.append(remote_file_path_3)
for local_file_path, remote_file_path in zip(local_files_to_zip, remote_files_to_zip):
try:
ssh.download_file(remote_file_path=remote_file_path,
local_file_path=local_file_path)
Expand All @@ -963,10 +973,21 @@ def _get_additional_job_info(self):
f'flags with stdout and stderr of out.txt and err.txt, respectively '
f'(e.g., "#SBATCH -o out.txt"). Error message:')
logger.warning(e)
-            for local_file_path in [local_file_path_1, local_file_path_2, local_file_path_3]:
+            for local_file_path in [path for path in [local_file_path_1, local_file_path_2, local_file_path_3] if path]:
Member: Could be simpler to keep the above version and add "if path" to the condition below.

if os.path.isfile(local_file_path):
-                with open(local_file_path, 'r') as f:
-                    lines = f.readlines()
+                with open(local_file_path, 'rb') as f:
Member: Do you mind outsourcing this r vs rb block into a small function of its own, e.g., in common.py? Would be great to also add a unit test to the new function.

# Read the file
first_bytes = f.read()
# Check if the bytes contain a null byte
has_null_byte = b'\x00' in first_bytes
# Use the appropriate mode based on whether the file is binary or not
mode = 'rb' if has_null_byte else 'r'
# Read the file contents using the determined mode
lines = first_bytes.decode('utf-8')
if mode == 'r':
with open(local_file_path, 'r') as f:
lines = f.readlines()

content += ''.join([line for line in lines])
content += '\n'
else:
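A standalone helper along the lines the reviewer requests could look like the sketch below (the function name and its placement in common.py are hypothetical, not the PR's code). It sniffs for a null byte and decodes leniently when the file looks binary:

```python
def read_lines_tolerant(path: str) -> list:
    """Read a possibly binary file: if the raw bytes contain a null byte,
    decode with replacement characters instead of failing; otherwise read as text."""
    with open(path, 'rb') as f:
        raw = f.read()
    if b'\x00' in raw:
        # Binary-looking content: decode leniently so the log stays inspectable.
        return raw.decode('utf-8', errors='replace').splitlines(keepends=True)
    with open(path, 'r') as f:
        return f.readlines()
```

Note that the diff's version calls decode('utf-8') with strict error handling, which can raise UnicodeDecodeError on genuinely binary content; errors='replace' avoids that failure mode.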
@@ -1346,6 +1367,14 @@ def troubleshoot_server(self):
if run_job:
# resubmit job
self.execute()

def remove_remote_files(self):
Member: Where is this being called?

Contributor: Here:
https://github.com/ReactionMechanismGenerator/ARC/blob/b7cdbb5fe6ebe1c7404afdd2b9d744a54277cce0/arc/scheduler.py#L977

"""
Remove the remote files.
"""
if (self.server != 'local' and self.server is not None):
Member: No need for the parentheses.
with SSHClient(self.server) as ssh:
ssh.remove_dir(self.remote_path)

def troubleshoot_queue(self) -> bool:
"""Troubleshoot queue errors.
arc/job/adapter_test.py (2 changes: 1 addition & 1 deletion)

@@ -354,7 +354,7 @@ def test_set_cpu_and_mem(self):
self.job_4.cpu_cores = None
self.job_4.set_cpu_and_mem()
self.assertEqual(self.job_4.cpu_cores, 8)
-        expected_memory = math.ceil((14 * 1024 * 1.1) / self.job_4.cpu_cores)
+        expected_memory = math.ceil((14 * 1024 * 1.1))
self.assertEqual(self.job_4.submit_script_memory, expected_memory)
self.job_4.server = 'local'

arc/job/adapters/molpro.py (35 changes: 33 additions & 2 deletions)

@@ -9,6 +9,7 @@
import os
from typing import TYPE_CHECKING, List, Optional, Tuple, Union
import socket
import re

CodeQL code-scanning notice: Unused import. Import of 're' is not used.

from mako.template import Template

@@ -156,6 +157,7 @@
self.execution_type = execution_type or 'queue'
self.command = 'molpro'
self.url = 'https://www.molpro.net/'
self.core_change = None

if species is None:
raise ValueError('Cannot execute Molpro without an ARCSpecies object.')
@@ -326,7 +328,7 @@
Set the input_file_memory attribute.
"""
# Molpro's memory is per cpu core and in MW (mega word; 1000 MW = 7.45 GB on a 64-bit machine)
-    # The conversion from mW to GB was done using this (https://deviceanalytics.com/words-to-bytes-converter/)
+    # The conversion from mW to GB was done using this (c)
Member: ?

# specifying a 64-bit architecture.
#
# See also:
@@ -335,8 +337,37 @@
# 800,000,000 bytes (800 mb).
# Formula - (100,000,000 [Words]/( 800,000,000 [Bytes] / (job mem in gb * 1000,000,000 [Bytes])))/ 1000,000 [Words -> MegaWords]
# The division by 1E6 is for converting into MWords
# Due to Zeus's configuration, there is only 1 nproc so the memory should not be divided by cpu_cores.
# Due to Zeus's configuration, there is only 1 nproc so the memory should not be divided by cpu_cores.
self.input_file_memory = math.ceil(self.job_memory_gb / (7.45e-3 * self.cpu_cores)) if 'zeus' not in socket.gethostname() else math.ceil(self.job_memory_gb / (7.45e-3))
# We need to check if ess_trsh_methods=['cpu'] and ess_trsh_methods=['molpro_memory:] exists
# If it does, we need to reduce the cpu_cores
if self.ess_trsh_methods is not None:
if 'cpu' in self.ess_trsh_methods and any('molpro_memory:' in method for method in self.ess_trsh_methods):
current_cpu_cores = self.cpu_cores
max_memory = self.job_memory_gb
memory_values = []
for item in self.ess_trsh_methods:
if 'molpro_memory:' in item:
memory_value = item.split('molpro_memory:')[1]
memory_values.append(float(memory_value))

if memory_values:
min_memory_value = min(memory_values)
required_cores = math.floor(max_memory / (min_memory_value * 7.45e-3))
if self.core_change is None:
self.core_change = required_cores
elif self.core_change == required_cores:
# We have already done this
# Reduce the cores by 1
required_cores -= 1
if required_cores < current_cpu_cores:
self.cpu_cores = required_cores
logger.info(f'Changing the number of cpu_cores from {current_cpu_cores} to {self.cpu_cores}')
self.input_file_memory = math.ceil(self.job_memory_gb / (7.45e-3 * self.cpu_cores)) if 'zeus' not in socket.gethostname() else math.ceil(self.job_memory_gb / (7.45e-3))
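The word-to-gigabyte arithmetic behind these lines (1 word = 8 bytes on a 64-bit machine, so 1 MW = 8e6 bytes ≈ 7.45e-3 GiB, and 1000 MW ≈ 7.45 GB) can be sanity-checked with a short sketch; the single_proc flag stands in for the "zeus" hostname special case:

```python
import math

MW_TO_GB = 7.45e-3  # 1 MW = 1e6 words * 8 bytes/word = 8e6 bytes, i.e. ~7.45e-3 GiB

def molpro_memory_mw(job_memory_gb: float, cpu_cores: int, single_proc: bool = False) -> int:
    """Molpro's memory keyword in MW: the job memory divided across cores,
    unless the host runs a single process (nproc == 1)."""
    divisor = MW_TO_GB if single_proc else MW_TO_GB * cpu_cores
    return math.ceil(job_memory_gb / divisor)

print(molpro_memory_mw(14.0, 8))                    # 235 MW per core on 8 cores
print(molpro_memory_mw(14.0, 8, single_proc=True))  # 1880 MW when nproc == 1
```

The core-reduction loop above then works backward through the same formula: given the smallest previously attempted molpro_memory value, it computes how many cores the total job memory can actually feed.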


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please keep just one line break between methods




def execute_incore(self):
"""
arc/job/adapters/molpro_test.py (2 changes: 1 addition & 1 deletion)

@@ -141,7 +141,7 @@ def test_set_files(self):
'source': 'path',
'make_x': False},
]
-        job_1_files_to_download = [{'file_name': 'input.out',
+        job_1_files_to_download = [{'file_name':'output.out',
'local': os.path.join(self.job_1.local_path, output_filenames[self.job_1.job_adapter]),
'remote': os.path.join(self.job_1.remote_path, output_filenames[self.job_1.job_adapter]),
'source': 'path',