Azure server #689

Open
wants to merge 31 commits into
base: main

Commits on Mar 18, 2024

  1. ARC Env: RapidFuzz

    Added the Python package rapidfuzz to the environment so that the QChem Adapter can perform basis set matching
    calvinp0 committed Mar 18, 2024
    dfa290d
  2. [WIP] Basis Set Dataframe

    Built a dataframe of the basis set combinations that QChem requires, along with the correct format of each for the input file. Additionally, ensured that the dataframe is included in the git push. This is a WIP, as there may be more basis sets to add or fix.
    calvinp0 committed Mar 18, 2024
    b65c34d
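The matching idea behind the basis set dataframe can be sketched as follows. This is only an illustration, not the code in this PR: the table rows, the `match_basis` name, and the cutoff value are hypothetical, and the standard-library `difflib` stands in for rapidfuzz so that the snippet has no extra dependencies.

```python
import difflib
from typing import Optional

# Hypothetical lookup table: normalised user spelling -> spelling for the input file.
# The rows are illustrative only and are not taken from the PR's dataframe.
BASIS_TABLE = {
    "def2-tzvp": "def2-TZVP",
    "cc-pvtz": "cc-pVTZ",
    "aug-cc-pvtz": "aug-cc-pVTZ",
}


def match_basis(user_basis: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known basis set format, or None if nothing is close enough."""
    key = user_basis.strip().lower().replace(" ", "")
    if key in BASIS_TABLE:
        # Exact match after normalisation, no fuzzy search needed.
        return BASIS_TABLE[key]
    # Fuzzy fallback (difflib here; the PR uses rapidfuzz for this step).
    close = difflib.get_close_matches(key, list(BASIS_TABLE), n=1, cutoff=cutoff)
    return BASIS_TABLE[close[0]] if close else None
```

Returning `None` rather than the best low-scoring candidate is a deliberate choice in this sketch: a wrong basis set silently written to an input file is harder to notice than a missing one.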
  3. Adapter Update: SSH fix, Submit Memory correction, SLURM submit memory change, SSH File Download Correction, Null Byte Read, Remote Remove Files

    1.  Adapter: Submit script format now uses the user-provided username {un}, converts memory to an 'int', and offers {server_nodes} as an option if required
    2.  Adapter: Total submit memory is now adjusted so that, when troubleshooting a job, it never goes OVER the maximum submission memory allowed on the node/server
    3.  Adapter: SLURM submit memory - Now using `#SBATCH --mem` as the parameter, since it defines the TOTAL memory of the submission (per node)
    4.  Adapter: SSH file download - Depending on the scheduler, we do not always expect to download or upload certain files via SSH. This change recognises which files will be uploaded or downloaded based on the user's scheduler choice
    5.  Adapter: Null bytes can appear in files. More specifically, if QChem has an error, it can produce null bytes in the out.txt/err.txt files, which then require a different way of reading the file contents. This is not 100% foolproof, though, and may need extra work
    6.  Adapters: In the #390 branch, SSH had improvements that were not merged. I have brought over one improvement from that branch, where remote files are removed once they are downloaded to the local computer
    calvinp0 committed Mar 18, 2024
    83cd331
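One way to read a file that may contain null bytes, as described in item 5 above, is to read it in binary mode and strip the bytes before decoding. The sketch below assumes UTF-8 text, and the function name is hypothetical; it is not necessarily how the adapter does it.

```python
from pathlib import Path
from typing import List


def read_lines_without_null_bytes(path: str) -> List[str]:
    """Read a text file in binary mode, drop NUL bytes, and return its lines."""
    raw = Path(path).read_bytes()
    cleaned = raw.replace(b"\x00", b"")
    # errors="replace" keeps the read from failing on other stray bytes.
    return cleaned.decode("utf-8", errors="replace").splitlines()
```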
  4. QChem IRC Software Support Recognition

    ARC can now recognise that IRC is also supported by QChem
    calvinp0 committed Mar 18, 2024
    6437460
  5. SSH Improvement

    Inspired by branch #390
    
    1. SSH: Updated decorator to use the correct connect function
    2. SSH: If the user provides a server that is not in servers.keys() and the server is also not None, an error is raised to inform the user that they need to fix the server keys
    3. SSH: When a submission to a scheduler includes an incorrect memory specification, the user is now warned that the requested memory is not supported and needs to be checked. May need to make this a ValueError instead of a logger warning
    4. SSH: Slight adjustment to checking whether there is an stdout after a submission attempt
    5. SSH: Some servers require private keys. Originally the code was incorrectly adding the private key to the SSH paramiko function. It has now been changed so that system keys are loaded and then, if the user provides a private key, it is included in the connect function
    6. SSH: Updated default arguments to `get_last_modified_time`
    7. SSH: Changed how the cluster soft value is lowercased
    8. SSH: Added a function to remove the directory on the remote server
    9. SSH: Azure SLURM has an extra status called 'CF', which means configuring (for the node). It can take 10-15 minutes or so before the node is online, and we now make sure to capture this. HOWEVER, a node can get stuck in 'CF' status. We now check for this by taking the time the node has been active, splitting the time up correctly (different time formats are possible), and, if it is above 15 minutes, running the command `scontrol show node {node_id}`. If the stdout includes the phrase 'NOT_RESPONDING', we return 'errored'
    calvinp0 committed Mar 18, 2024
    64893cd
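The stuck-in-'CF' check from item 9 above can be sketched as below. It assumes the elapsed time arrives in SLURM's `[D-][HH:]MM:SS` style and that the `scontrol show node` output has already been fetched as a string. The function names and the 'configuring' return value are hypothetical; only 'errored' comes from the commit message.

```python
def slurm_elapsed_minutes(elapsed: str) -> float:
    """Convert 'MM:SS', 'HH:MM:SS', or 'D-HH:MM:SS' to minutes."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in elapsed.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours (and minutes) with zero
    hours, minutes, seconds = parts
    return days * 1440 + hours * 60 + minutes + seconds / 60


def classify_configuring_job(elapsed: str, scontrol_stdout: str,
                             limit_minutes: float = 15.0) -> str:
    """Return 'errored' for a node stuck in CF, else the hypothetical 'configuring'."""
    stuck = slurm_elapsed_minutes(elapsed) > limit_minutes
    if stuck and "NOT_RESPONDING" in scontrol_stdout:
        return "errored"
    return "configuring"
```

In a real check the `scontrol` command would only be run once the time limit has been exceeded; passing its output in as a string here simply keeps the sketch free of SSH calls.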
  6. 7948473
  7. Vectors: Reading coords that are in string format using regex

    ARC may sometimes pass coords in string format. To deal with this, a regex function formats the string into a tuple. An error is returned if a formatted tuple cannot be produced
    calvinp0 committed Mar 18, 2024
    6b3b610
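A minimal sketch of regex-based coordinate parsing follows. The exact string format ARC passes is not stated in the commit, so this assumes a single point written like "(x, y, z)"; the function name and the choice to raise a `ValueError` are also assumptions.

```python
import re
from typing import Tuple

# Matches signed decimals with an optional exponent, e.g. -1.5, 2e-1, .25
_FLOAT = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def coords_from_string(text: str) -> Tuple[float, float, float]:
    """Extract exactly three floats from a string such as '(0.0, -1.5, 2e-1)'."""
    numbers = _FLOAT.findall(text)
    if len(numbers) != 3:
        raise ValueError(f"Expected 3 coordinates, found {len(numbers)} in {text!r}")
    x, y, z = (float(n) for n in numbers)
    return (x, y, z)
```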
  8. Species: Get number of heavy atoms for TSGuess

    The original code did not function correctly, and it was never used, which is why it made it into production unnoticed. It has now been changed to properly return the actual number of heavy atoms
    calvinp0 committed Mar 18, 2024
    808ba16
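For reference, counting heavy atoms amounts to counting every atom that is not hydrogen. The sketch below works on a plain sequence of element symbols, which is an assumption; the real method operates on ARC's own species data, and the function name here is hypothetical.

```python
from typing import Sequence


def count_heavy_atoms(symbols: Sequence[str]) -> int:
    """Count atoms whose element symbol is not hydrogen."""
    return sum(1 for symbol in symbols if symbol.strip().upper() != "H")
```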
  9. [WIP] Getting the Internal Coordinates for QChem - Still not operational

    Needs further development and understanding
    calvinp0 committed Mar 18, 2024
    18f2b4a
  10. 2bf315c
  11. Scheduler: Import JobError, JobError Exception, Rerunning Job, Removing Remote Files, Checking Opt Jobs, Troubleshooting Conformers, Question Regarding not okay freq, Rerunning fine opt jobs

    1. Scheduler: Now imports JobError
    2. Scheduler: Fixed adding trsh to the args
    3. Scheduler: Added JobError exception for determining job status
    4. Scheduler: Now removing remote jobs at the end of the scheduler - !!!MAY NEED TO SEE IF THIS HAS AN EFFECT ON NON SSH JOBS!!!
    5. Scheduler: Getting the recent opt job name via properly checking if the opt job was considered done (This was not done before)
    6. Scheduler: TODO - We attempt to troubleshoot a frequency we deem not okay. Yet there is no specific troubleshooting method, so why do we do this?
    7. Scheduler: Properly troubleshoot a job
    8. Scheduler: Fix conformer troubleshoot if it was a TS conformer
    calvinp0 committed Mar 18, 2024
    00e4b74
  12. Parser: QUESTION - RAISE_ERROR, parse_normal_mode_displacement QChem, parse_1d_scan_coords QChem, parse_trajectory QChem, parse_args QChem

    1. parser: TODO - Why do we set raise error to true for normal mode displacement parsing? It causes the function to raise a not implemented error even though the parsing is implemented
    2. parser: Now can parse the normal mode displacement of QCHEM
    3. parser: Now can parse the 1d scan coords of QCHEM
    4. parser: Can now parse trajectory of QCHEM
    5. parser: Can now parse arguments in the scan input file QCHEM
    6. parser: NEED to fix parse_ic_info for QCHEM
    calvinp0 committed Mar 18, 2024
    fe06f9e
  13. QChem Adapter

    0. QChem Adapter: Import - Pandas, ARC_PATH, rapidfuzz
    1. QChem Adapter: Input Template now supports IRC and {trsh} args and ensures IQMOL_FCHK is false (This can be set to true BUT be aware this .fchk file can be rather large)
    2. QChem Adapter: write_input_file - basis set is now matched via the software_input_matching function
    3. QChem Adapter: write_input_file - QChem now supports D3 method. We should look at working with other DFT_D methods in the future. More specifically there are other types of D3 methods
    4. QChem Adapter: write_input_file - Correctly pass in troubleshooting arguments into the input file
    5. QChem Adapter: write_input_file - Capitalised TRUE/FALSE in UNRESTRICTED parameter
    6. QChem Adapter: write_input_file - Removed the scan job type and moved it to another section of the input file writing
    7. QChem Adapter: write_input_file - If scan is set, the job_type is PES_SCAN. We also set the fine to be XC_GRID 3. However, we may need to look into changing the tolerances later
    8. QChem Adapter: write_input_file - We now write correctly the torsional scans for the input file for a scan
    9. QChem Adapter: write_input_file - IRC is now supported. However, this input file means we run two jobs from the one input file: a FREQ job and then IRC. This currently works but will need improvements as more users use it
    10. QChem Adapter: write_input_file - Ensuring that the SCF CONVERGENCE is 10^-8 for certain job types
    11. QChem Adapter: [NEWFUNCTION] generate_scan_angles - To support PES SCAN jobs, this function takes the required angle we want to scan and the step size, and returns a start and end angle between -180 and 180 that ensure we hit the required angle during the stepping
    12. QChem Adapter: [NEWFUNCTION] software_input_matching - Since QCHEM has different formatting for basis sets, this function tries to take the user's format of the basis set and match it against a dataframe (which should always be updated if it's missing a format). This uses rapidfuzz, the new package in the ARC environment
    calvinp0 committed Mar 18, 2024
    197e248
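The idea behind `generate_scan_angles` (item 11 above) can be illustrated as follows: step backwards from the required angle as far as possible without going below -180, and forwards without going above 180, so that the required angle lies exactly on the scan grid. This is a sketch under that reading of the commit message; the signature and the range check are assumptions, not the PR's implementation.

```python
import math
from typing import Tuple


def generate_scan_angles(required_angle: float, step: float) -> Tuple[float, float]:
    """Return (start, end) within [-180, 180] such that stepping from start hits required_angle."""
    if step <= 0:
        raise ValueError("step must be positive")
    if not -180 <= required_angle <= 180:
        raise ValueError("required_angle must lie within [-180, 180]")
    steps_down = math.floor((required_angle + 180) / step)  # whole steps towards -180
    steps_up = math.floor((180 - required_angle) / step)    # whole steps towards +180
    return (required_angle - steps_down * step, required_angle + steps_up * step)
```

For example, a required angle of 45 with a step of 8 gives a start of -179 and an end of 173, and 45 is reached after 28 steps from the start.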
  14. trsh QCHEM & MOLPRO

    1. TrshQChem: Fixed error checking in QChem output files. It would originally mistake the error for an SCF failure because of which error strings it looked for in the lines
    2. TrshQChem: FlexNet Licensing Error - If the license server is not working this will cause ARC to stop
    3. TrshQChem: Max Optimisation Cycles is now properly checked for in the output file
    4. TrshQChem: Max Iteration Cycles is identified now if there is a failure during SCF convergence
    5. TrshMolpro: Molpro reports memory errors that need to be troubleshot differently from what we did originally. Now we look for how much the memory needs to be increased for Molpro to run successfully. This is done through regex pattern matching. We also check for a triples memory increase if required
    6. Trsh: determine_job_log_memory_issue - Sometimes the job log can have null bytes in it, usually a QCHEM issue, which means we need to read the file differently
    7. TrshQChem: trsh_ess_job - QCHEM's trsh has been reworked so that it now combines troubleshooting attempts if they were attempted previously. For example, if we troubleshot the max opt cycle but now need to turn on SYM IGNORE, it will include both of these statements in the troubleshooting input file
    8. TrshMolpro: trsh_ess_job - Molpro required a change in how we troubleshoot the memory. If we get a memory error, it is because either the MWords per process is not enough, even though we have provided an adequate amount of memory to the submit script, OR the MWords per process is enough but the TOTAL MEMORY (MWords * CPUS) > Max Node Memory, in which case CORES has to be reduced.
    9. TrshSSH: trsh_job_on_server - Now uses a 'with' statement so the client is closed when exiting the 'with' statement
    calvinp0 committed Mar 18, 2024
    0baf191
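Combining troubleshooting attempts, as described in item 7 above, can be pictured as an order-preserving merge of the methods already tried with the newly required ones. The method names, the mapping, and the cycle count below are hypothetical placeholders, not the identifiers or values used in ARC.

```python
from typing import Dict, List

# Hypothetical mapping from a troubleshooting method name to a line for the input file.
TRSH_KEYWORDS: Dict[str, str] = {
    "max_cycles": "GEOM_OPT_MAX_CYCLES 250",
    "sym_ignore": "SYM_IGNORE TRUE",
}


def combine_trsh(previous: List[str], new: List[str]) -> List[str]:
    """Merge earlier and new troubleshooting methods, keeping order and dropping repeats."""
    return list(dict.fromkeys(previous + new))


def trsh_block(methods: List[str]) -> str:
    """Render the merged methods as lines for the troubleshooting section of the input."""
    return "\n".join(TRSH_KEYWORDS[method] for method in methods)
```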
  15. Molpro Adapter

    Molpro Adapter: Molpro needs a different touch when troubleshooting its memory. Here, when setting the input file memory, we determine whether the MWords per process was enough but the total memory was too high. If that's the case, we reduce the number of processes requested while maintaining the memory per process
    calvinp0 committed Mar 18, 2024
    b00bca6
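The arithmetic behind reducing the process count while keeping the memory per process can be sketched as below. It assumes 8-byte words (so 1 MWord is 8 MB) and decimal gigabytes; the function name and parameters are hypothetical.

```python
MB_PER_MWORD = 8  # assumption: one word is 8 bytes, so 1 MWord = 8 MB


def fit_processes(mwords_per_process: int, requested_cpus: int,
                  max_node_memory_gb: float) -> int:
    """Return the largest process count whose total memory fits on the node."""
    per_process_mb = mwords_per_process * MB_PER_MWORD
    max_cpus = int(max_node_memory_gb * 1000 // per_process_mb)
    if max_cpus < 1:
        raise ValueError("A single process already exceeds the node memory")
    # Never ask for more processes than were originally requested.
    return min(requested_cpus, max_cpus)
```

With 2000 MWords per process (16 GB) on a 64 GB node, a request for 16 processes is reduced to 4, and the memory per process is left unchanged.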
  16. [WIP] QChem Test

    QChem Test Module - Needs tests and fix ups
    calvinp0 committed Mar 18, 2024
    48a612f
  17. [WIP] Molpro Test

    Adjusted the file name to download from input.out to output.out
    Need to create a test for the mem per process change
    calvinp0 committed Mar 18, 2024
    5ac4317
  18. main_test

    Fixed up main_test.py due to the addition of azure
    calvinp0 committed Mar 18, 2024
    e346f8d
  19. level_test

    Will require some additional tests
    calvinp0 committed Mar 18, 2024
    65186c8
  20. [Temp] Change CACHE_NUMBER

    Change the cache number so that rapidfuzz is installed in the environment and the cache of ARC production is not used
    calvinp0 committed Mar 18, 2024
    76de095
  21. Adapter: total submit memory fix

    total submit memory calculation fixed if max_mem is None
    calvinp0 committed Mar 18, 2024
    c69d700
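The `max_mem is None` case can be illustrated with a small capping helper. This is a sketch of the general idea only: the function name, the units, and the choice to return an `int` (which mirrors the submit script change earlier in this PR) are assumptions.

```python
from typing import Optional


def total_submit_memory(requested_mb: float, max_mem_mb: Optional[float]) -> int:
    """Cap the requested memory at the node maximum, if a maximum is defined."""
    total = requested_mb
    if max_mem_mb is not None:
        # Only compare against the limit when one exists; min(x, None) would raise a TypeError.
        total = min(total, max_mem_mb)
    return int(total)
```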
  22. QChem Adapter: Updated Tests

    calvinp0 committed Mar 18, 2024
    84ec8cb
  23. 46541b1
  24. TrshTEST

    Updated the tests for Trsh
    calvinp0 committed Mar 18, 2024
    75942f5
  25. mainTEST

    Updated the tests for main.py
    calvinp0 committed Mar 18, 2024
    a569939
  26. 45681e1
  27. submitTEST

    Updated the server assertion
    calvinp0 committed Mar 18, 2024
    cb5ff4c
  28. 90a5c89
  29. Adapter: Submit Memory Fix

    calvinp0 committed Mar 18, 2024
    dfdcf13
  30. TrshQCHEM: Added Minimization Error

    'Error within run_minimization with minimization method' - Not certain what this error requires, and also if we should troubleshoot it if the job type is a 'conformer'. For now, we will re-run the job under the same conditions and if it fails again, we will declare it not possible to troubleshoot
    remove 'break'
    calvinp0 committed Mar 18, 2024
    e9954a5
  31. 123279e