PBS Pro v18.1.3 Release Notes

Bugs fixed in this release are as follows:
3ffce9b default_qsub_arguments -V not working (18.1)
72e45a2 Scheduler may oversubscribe node while confirming reservation

@subhasisb released this Jun 2, 2018

PBS Pro v18.1.2 Release Notes

Bugs fixed in this patch are as follows:

86bf85e PP-756: Upgrading from an older version to the latest mainline version fails to start the PBS daemons
51c1875 remove extra destroying of a lock in tpp

@subhasisb released this May 11, 2018

PBS Pro v18.1.1 Release Notes

18.1 delivers faster scheduling, better job performance, improved utilization, new policy controls and improvements focused on making administration easier. For details on the new features, please see the design documents in the PBS Pro contributor’s portal Project Documentation space at https://pbspro.atlassian.net/wiki/spaces/PD/pages

New features include:

  • Support for Intel Xeon Phi KNL processors on the Cray platform

  • Job equivalence class optimization improves scheduling speed by an order of magnitude for workloads with many thousands of small jobs

  • MultiSched: multiple schedulers each handle separate partitions, with per-partition scheduling policies, speeding scheduling while retaining a single global view of the whole cluster

  • Cgroups integration provides kernel-level resource usage limits, reporting, and isolation, ensuring that jobs sharing nodes run as fast as possible without interfering with each other

  • Green Provisioning(tm) enables power management (profiles, on/off, capping, ramp rate limiting) and energy accounting as supported on systems from Cray and HPE

  • Soft walltime resource can be set to improve backfill calculations and increase utilization

  • Jobs can shrink, releasing nodes that are no longer being used, e.g., during post-processing or stage-out, effectively increasing utilization

  • Job suspend/resume allows specifying which resources are released/retained, allowing better bin packing to ensure new jobs actually fit on semi-vacated nodes

  • Fairshare values can be used in the job_sort_formula for defining scheduling priority

  • Advance and standing reservations can have their start and end times adjusted (via the new pbs_ralter) to accommodate changes in operational schedules (such as late data arrival or longer calculations due to numerical instability)

  • Write stdout/err files directly to their final destinations or automatically delete them, eliminating the overhead (and possible faults) associated with maintaining local copies

  • Enhanced qstat output (JSON, CSV, and single-line formats) makes it more admin- and script-friendly

  • New admin suspend allows node maintenance for nodes associated with a job or set of jobs, e.g., to repair a filesystem issue without killing jobs

  • New periodic server hook event expands plugin framework

  • Improvements in handling secondary groups, authorizing MoMs, placement set creation, job array resilience, per-chunk provisioning, and execjob_prologue execution

  • Improved diagnostics: Debuginfo RPM package, hostname and interface logging each time a log file is opened, new pbs_snapshot tool captures full system state

  • New platform support: ARM64, Cray XC Series, Ubuntu, and Windows
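The idea behind the job equivalence class optimization can be illustrated with a small sketch. This is a minimal Python model under stated assumptions, not the PBS scheduler's implementation: the `Job` fields, the `equivalence_key()` grouping (queue, owner, resource request) and the `can_run()` callback are all hypothetical. What it shows is the core effect: once one job in a class fails to fit, the remaining identical jobs are skipped without being re-evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Job:
    """Hypothetical, simplified job record (real PBS jobs carry many more attributes)."""
    job_id: str
    owner: str
    queue: str
    select: str  # resource request, e.g. "1:ncpus=1"


def equivalence_key(job: Job) -> Tuple[str, str, str]:
    # Assumption: jobs are equivalent when queue, owner and resource request match.
    return (job.queue, job.owner, job.select)


def schedule(jobs: Iterable[Job], can_run: Callable[[Job], bool]) -> List[str]:
    """Start jobs in order, skipping every job whose class already failed to fit.

    Successful checks still happen per job, but a failed check is remembered for
    the whole class, so thousands of identical small jobs cost one failed check.
    """
    blocked = set()
    started = []
    for job in jobs:
        key = equivalence_key(job)
        if key in blocked:
            continue  # an identical job already failed this cycle; don't re-check
        if can_run(job):
            started.append(job.job_id)
        else:
            blocked.add(key)
    return started
```

For example, with three free CPUs and five identical one-CPU jobs from one user, the fourth job's failed check blocks the fifth without evaluating it, while a job from a different user is still checked because it belongs to another class.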

Numerous bug fixes and test improvements. The following bugs were fixed on top of the 18.1 beta release:

9a9b82b Update PBS Pro version to 18.1.1
461b4d8 hook fail_action 'offline_vnodes' is not functional
61397fd Job IDs are not unique after a server restart
f976708 PP-1252: daemonized qsub causes foreground qsub failures when it decides to quit.
f1a131e PP-1222: Add skip test for the case when server and mom are on the same host
f04ecec Additional PTL test cases for PP-337 Multiple schedulers
13568cb orphaned subjobs at job_history_duration expiry and a race scenario
22cdb01 Take away xeon_phi_provision hook from non cray platform
e3e0153 In SGI power, refactor log messages and log levels
57fa19e PP-1262: Fix compiler warnings generated with gcc(version 5.4.0) flags = '-g -O2 -Wall -Werror -fPIC -D_LARGEFILE_SOURCE -Wno-unused-result'
ccdb684 Addressed server and mom leaks involving qselect, resourcedef, dependencies, hooks.
db30f2d Make alps reservation query requests tunable
af385fa PP-1239: win_postinstall failed to register and start the services in windows 10
c2d3d46 After Powering on of the node, job goes into Held state
acd556b Updated cgroups tests in pbs_cgroups_hook.py to be compatible on CRAY
dd10d2a Add tests for hyperthreading on a Cray X* series system.
8a88adf PP-1275: Remove max_attempts specified in TestHookSwig test suite
c318d59 Add deepcopy support to pbs.size type
b00c8db PP-1273: PTL fails to start remote mom
342b938 Fix server memory leaks and remove wrong suppressions
bc14a92 PAP-7760: Automate t7-t12 of mom dynamic resource test
aa158da Subjob in X state should not be allowed to qdel
81ecd25 Fixed race condition in log match
f626203 Removing python-magic dependency from PBS Snapshot
fd9a5cd Retain previous scheduling cycle's configuration of Multisched as much as necessary.
82bfedc Add additional tests for server dynamic resources
6a4de9c TestNodePartition is failing due to unknown node error
3c44349 Added a section for log output in the template
3264fa1 Node attribute declarations for ptl testlib
9557790 Valgrind suppress potential unfreed memory in mom and server that are tracked globally.
39ebdfd Fix test_preemption race condition
39278f2 Increase sleep time in test_multiple_job_preemption_order
21db98e PP-1088: Fix TestQstat_json test 'TestQstat_json.test_qstat_bf_json_valid'
3ab72eb PP-1267: Fix script error in TestReservations.test_sched_cycle_starts_on_resv_end test
02c7142 PP-1259 : Deleted subjobs get requeued after server restart or failover
837dc1a Automate t1-t6 of mom dynamic resource test

Pre-release

@subhasisb released this Apr 17, 2018

PBS Pro v18.1 Beta Release Notes
Updated 2018-04-19

18.1 delivers faster scheduling, better job performance, improved utilization, new policy controls and improvements focused on making administration easier. For details on the new features, please see the design documents in the PBS Pro contributor’s portal Project Documentation space at https://pbspro.atlassian.net/wiki/spaces/PD/pages

New features include:

  • Job equivalence class optimization improves scheduling speed by an order of magnitude for workloads with many thousands of small jobs

  • MultiSched: multiple schedulers each handle separate partitions, with per-partition scheduling policies, speeding scheduling while retaining a single global view of the whole cluster

  • Cgroups integration provides kernel-level resource usage limits, reporting, and isolation, ensuring that jobs sharing nodes run as fast as possible without interfering with each other

  • Green Provisioning(tm) enables power management (profiles, on/off, capping, ramp rate limiting) and energy accounting as supported on systems from Cray and HPE

  • Soft walltime resource can be set to improve backfill calculations and increase utilization

  • Jobs can shrink, releasing nodes that are no longer being used, e.g., during post-processing or stage-out, effectively increasing utilization

  • Job suspend/resume allows specifying which resources are released/retained, allowing better bin packing to ensure new jobs actually fit on semi-vacated nodes

  • Fairshare values can be used in the job_sort_formula for defining scheduling priority

  • Advance and standing reservations can have their start and end times adjusted (via the new pbs_ralter) to accommodate changes in operational schedules (such as late data arrival or longer calculations due to numerical instability)

  • Write stdout/err files directly to their final destinations or automatically delete them, eliminating the overhead (and possible faults) associated with maintaining local copies

  • Enhanced qstat output (JSON, CSV, and single-line formats) makes it more admin- and script-friendly

  • New admin suspend allows node maintenance for nodes associated with a job or set of jobs, e.g., to repair a filesystem issue without killing jobs

  • New periodic server hook event expands plugin framework

  • Improvements in handling secondary groups, authorizing MoMs, placement set creation, job array resilience, per-chunk provisioning, and execjob_prologue execution

  • Improved diagnostics: Debuginfo RPM package, hostname and interface logging each time a log file is opened, new pbs_snapshot tool captures full system state

  • New platform support: ARM64, Cray XC Series, Ubuntu, and Windows

  • Numerous bug fixes and test improvements
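A script consuming the new JSON output of qstat might look like the sketch below. It assumes that `qstat -f -F json` emits a top-level `"Jobs"` object keyed by job ID, whose values hold job attributes such as `job_state`; the `SAMPLE` text is hypothetical and only shaped after that assumption, so verify the structure against your own server's output before relying on it.

```python
import json
from collections import Counter

# Hypothetical sample shaped like `qstat -f -F json` output (assumed structure,
# not captured from a real server).
SAMPLE = """
{
  "pbs_version": "18.1.1",
  "Jobs": {
    "101.server": {"Job_Owner": "alice@login1", "job_state": "R", "queue": "workq"},
    "102.server": {"Job_Owner": "alice@login1", "job_state": "Q", "queue": "workq"},
    "103.server": {"Job_Owner": "bob@login1", "job_state": "Q", "queue": "long"}
  }
}
"""


def jobs_by_state(qstat_json: str) -> Counter:
    """Count jobs per job_state in qstat JSON text."""
    data = json.loads(qstat_json)
    # Default to an empty mapping in case the output carries no "Jobs" key
    # (assumed possible, e.g. when no jobs exist).
    jobs = data.get("Jobs", {})
    return Counter(attrs.get("job_state", "?") for attrs in jobs.values())
```

On a live system the text could come from `subprocess.run(["qstat", "-f", "-F", "json"], capture_output=True, text=True).stdout`; treat that invocation as an assumption and confirm it against the qstat man page for your version.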

Detailed release notes (via “git log --oneline v14.1.2..v18.1.beta”):

e3d870f PP-1216: TestReservations.test_sched_cycle_starts_on_resv_end test failed with race condition
827f059 PP-1067: pbs_topologyinfo enhancements
ebcd34f fix node oversubscription during preemption
7345f22 PP-824: Cray - on/off functionality
e63c974 PP-706: Automatically create KNL specific information
b32aa31 memory leak fixes in server
ee3072d fixed timeout for pbs_hook_alarm_large_multinode_job.py and moved to resilience folder updated pbs_soft_walltime race condition in test_soft_held
9195cbf Scheduler reports preempted job as Can Never Run for one cycle
ca3f48f mail fork error is not logged
65d60a1 Fix resourcedef file not getting sent to moms due to fix in commit 360de29.
cff78a9 Add multi-sched support to pbs_snapshot
d7862b1 Fix _assign_resources fail to assign mem only vnode
ab2e810 PP-1009: Enhancing sudo usage in PTL
0f11da0 Add test for walltime set to zero after pbs_mom_restart
807be8d PP-1250: Fix rpp pileup due to processing only 3 messages per server iteration
c25a772 Free the attribute so it doesn't leak memory
bcea516 PP-1239: win_postinstall failed to register and start the services in windows 10
6a9e75f PP-1050: pbs_mom not coming up after powering on the compute node on mom installation on ICE
4d15de0 PP-479: Running subjobs to be able to survive a pbs_server restart
ec94015 Adding code to update ncpus and verifying subjob state in TestRunJobHook
c4c3c33 fix to parse new value aarch64 for basil processor architecture
1412c05 Fix several server memory leaks and add to valgrind suppressions file for known non-leak allocations
587b809 Fixed timing issue in mom_hook_sync test
16deb6d PP-1130: Add Cray specific PTL smoke tests
c36837d Do not enable the PBS_translate_mpp hook when reverting to default
4682075 array subjob once rejected by runjob hook will never run
360de29 Fixed problem where hook fails to run after event change
8223334 Added additional test cases for server dynamic resources(t5-t8)
3a0ad71 Re-enable the PBS_alps_inventory_check hook on Cray X* systems
2706eb9 Fix race condition between cgroup periodic and begin handler Use mom_priv/hooks/hook_data/cgroup_files as temp jobid list
6f5663c sometime test_soft_then_hard failing due to race condition
2c15fd7 Additional tests for prime and dedicated queues and qmgr sched attrs
52f723e Automated tests for verifying server dynamic resources(t1-t4)
61b204a don't enable PBS_translate_mpp hook
59cc49b Removing checksign from Travis
fcfe6e8 pbsnodes -r does not clear the state from mom structure
f7dbb2d PP-1078: On Cray PTL job submission fails while using create_script function inside test
8af069f PP-1242: Windows pbs_mom.exe crash in check_pwd()
13a3681 PP-735: Additional test cases for Power Awareness on SGI
2de68be PP-1223: test_root_owned_script.Test_RootOwnedScript failed with error Permission denied: '/root'
f54f8ef PP-1238: All schedulers are querying for few of default schedulers attributes instead of its own
2e8b6aa PP:1177 remove sudo from run_cmd for TestQstat_oneline_json_dsv
cab95ed PP-1241: pbs_habitat fails when psql -V command has multi-line output
1a72ead PP-1240: Invalid JSON output
e12ee7e PP-1067: Utilize new topology information to accurately count devices (sockets, MICs, GPUs) on nodes
8318de8 PP-1070: Revert ifl call pbs_statsched signature back to pre-multisched version.
3df8099 pbs_fairshare log_match failures
418497a PP-905: Ctrl+C During Interactive Job Startup Race Condition on Cray
c5e180b Increased timeout of test.
c6117a9 PP-1152: wait longer for cycle to end
bb53b58 Add automated test for cgroup race condition crash
98235b6 PP-747: Equivalence class tests with scheduler partitions
d6eb67f Adding additional tests for pbsfs on multi scheds
3798b6b Delete cgroup again in execjob_end_handler in case execjob_epilogue failed
6d0856a Fixed .gitignore to ignore make dist files
80dbbc6 PP-1030: Pbs_account fails to validate domained USER2 from logon domained USER1
9e1d511 Remove cached ip addresses when node is deleted
9ad0640 Fairshare for multisched
a6eede2 Fix for equiv classes not working with limited resources not in resources line
de468ce updated sched_config value in test_fairshare_formula5 for pbs_fairshare.py
49f9ae3 PP-1024: Secondary server stuck in Starting state even after Primary server is killed, with an error in tpp_em_wait()
9623775 Fixing invalid writes reported by valgrind for PP-746
57ae1a7 Fix for invalid read of mem from req struct after its free'd
713b5d9 scheduler errors out if server_dyn_res script returns unexpected value
c8ce3a4 Removes casting of NULLs and zeros
20c3ac7 PP-773: Array jobs do not send emails when specified for -m abe
8cff32d Remove capturing rscs_all file from pbs_snapshot
681b32a updated tests to be multi node and added more regression tests
a82225e fixing sched_config errors for pbs_conf_resv_stale_vnode test
617c5e5 PP-1108: Update PTL fw to keep data consistent between PTL and PBS
595956b PP-855: PTL uses wrong default path for PBS_HOME
1e4f95d Add sanitize build support in Travis
718b719 PP-1210: Expect function should validate server_state when scheduling is turned off
4b9ed9f Fix for wrong usage of memset and few unbalanced python refcount
1589327 PBS Comm fails to open log file under valgrind
31f4bc0 Fix cgroup hook race condition crash Moved file locking inside main() to avoid gaps Changed parse_cgroup_config() to a static method so it can be called before a CgroupUtil object is instantiated Determine assigned resources after cleaning up stale set of cgroups in execjob_begin_handler
4043c52 PP-1068: race condition in test_alter_standing_resv_start_time_before_run
d46352b Make PBS init script valgrind compatible and fix for temp file removal
2b69b46 PP-1184: putenv calls are flawed
f5d983d Skip the test for non-Cray platforms. Corrected log_match() message to have Mom's hostname
3296689 Fix debug build and update travis matrix to add debug build
4207d16 PP-1023: FIFO not working
7b1e199 Remove version field from PR template
d53b298 PP-1219: Build fails with -fsanitize=address
e69763b PP-1221: PBS Scheduler can crash if query_queue_info returns NULL
8a3eeb5 PP-1021: PBSPro Windows debug executables frequently crash
0ae5917 PP-747: Modify the sched object to have additional options to configure pbs_sched
757a7d4 Update OSS version from 17.1.0 to 18.1.0
0f7d011 PP-1006: AOE provisioning enhancement.
2208e9d PP-1164 On windows walltime is not seen in qstat output or accounting log
e328499 PP-1209: Incorrect termination of buffer
5303f79 Make the ALPS reservation_mode default to none
cded060 PP-1094 Fixed file permission errors when accessing hooks/tmp
b378695 PP-1193: TestClientNagles is failing due to incorrect command being executed
f2a8b92 PP-1149: Adjust the sleep time for TestSoftWalltime.test_soft_held
46b2988 PP-1211: Using eligible_time, parent array job misses the stime, comment, and 'S' accounting record.
b67d9a8 Fix for AppVeyor build failure
3e6cb09 PP-1098: PTL should cleanup job folders
1b2e942 Fix bug introduced by TPP changes to register all ip address.
15d854d PP-1201: deleted job that was not cleared from list, causes server crash in post_discard_job()
9f5472f PP-1016, PP-1043: choose components of pbs to launch in docker container using env variables
2316429 PP-1117: The used walltime is set to zero after pbs_mom restart.
fc8a118 PP-1124: Running a multinode job, sister mom crashes
0b4314f PP-1176 Add checks for successful mom hook copy
7a9d0a1 PP-1005: Address compiler warnings with gcc -Wall (part 2)
ba0165e PP-783: Fix for application provisioning
2093dc3 Add Code of Conduct and Contributing guidelines for Github Community
0067b43 Update PULL_REQUEST_TEMPLATE.md
bc90660 PP-1189: Preemption does not happen if resource in contention is a server/queue level resource
ca536b1 PP-1092: Adding a call to TestFunctional.setUp as the tests were failing on testbed systems
d42ddec PP-56: svr_clean_job_history can cause server quasihangs and spurious failovers
e6c9439 Added a line in win_postinstall script that notifies script will take a few minutes to run.
f9bb781 PP-1191: Compilation errors while building PBSPro OSS for Ubuntu 16.04 LTS
bbadc8f PP-1153 race condition in log match for TestPbsHookAlarmLargeMultinodeJob.test_epi_hook
494b4d6 PP-1166: mom hook updates forgotten if acks not received before timeout
3fa460d PP-1188: Add easy valgrind support for PBS
9580f1c PP-1092: TestJobRouting has typo for setUp function
27681ac PP-1187: Miscellaneous fixes on TPP usage in net_server.c
03acb3b PP-1040: Moms cannot communicate with one another in a cloud configuration when cloud nodes resolve each other's hostnames to IP addresses not known to the PBS server/comm
af8eaee PP-1131: Missing vnode configuration in “TestReservations.test_sched_cycle_starts_on_resv_end” test causes a reservation failure
88bed72 PP-1137: Stack corruption in pbs_account.exe
4c2ee06 PP-1190 pbs_server crash inside python garbage collection routines
ec5837d PP-1060: Job with future start time immediately start accruing eligible_time
682bf7d PP-1174: Fix copyright headers in several files
d6bd1d6 PP-1132, PP-1133, PP-1134, PP-1135, PP-1136, PP-1126: Windows build scripts enhancement
648c9f6 PP-1111 Calling pbs_benchpress with a bad argument runs tests
d83c8d9 PP-1119 Creating Reservations Uses Memory after Free
c285f2f PP-1059: added one more regression case
467ed3f PP-1005: Address compiler warnings with gcc -Wall
29467dd PP-1109: After a server restart, all running jobs disappear from pbsnodes
e8f7882 PP-1160: Reduce server quasihang in vnode_available when checking more than 1000 instances of standing reservations.
b586ae3 PP-1178: Fix test TestResvStaleVnode to include priority level in set_sched_config
fd13a2a PP-1105: qsub: standard input notice
d5296b7 PP-826 vnode_pool is displayed in pbsnodes -av output on Windows
5509dea PP-1121 pbs_probe buffer overflow
ef69b0e PP-1162: update pbs_snapshot path in TestPBSSnapshot
1cca382 PP-843: Ignoring validation of cray login node license attribute.
ad002d5 PP-1032-1127: preempted jobs aren't put in their own equivalence classes.
7764e2d PP-1079 Submitting reservations are invalid if only specifying endtime and duration
42a3258 PP-1071, PP-1072: Text gets jumbled and spills into borders in Windows installer graphic; win_postinstall message updates
df7c86b PP-1158:TestQmgr fails saying "No such file or directory"
a48c19a PP-1156 Job Obit use memory after free
b549884 PP-772: Added new tests
2b1c680 PP-1144: PTL-Interactive job timeout if jobid has FQDN hostname
7bf925d PP-1093: pbs_server crashes when recovering database after default_qsub_arguments is set
37ac640 Merge branch 'master' of git://github.com/ptosco/pbspro into ptosco-master
cd86e03 Merge branch 'PP-1101' of git://github.com/brewlius-cesar/pbspro into brewlius-cesar-PP-1101
0a795f5 PP-1096 daemon protect (Linux OOM protection function) is wrong
a3c55eb PP-1099: Add error checking in Scheduler for resources
d3ad767 Address TravisCI build failure for OpenSUSE
c66b0a1 Restore ability to build with -DDEBUG flag
c8dcd17 PP-1059: Jobs in queues with resources_available limits (e.g., resvs) may not run due to equiv classes
13ec422 PP-891: Added tests for power provisioning on cray
3569105 PP-404: Address remaining comments for cgroups hook after checkin (part 2)
2dfc2ba PP-772: fix compiler warning
998d3ca PP-1100: Build wineditline with MSVC rather than GCC to avoid linking errors
3516e5f PP-966: Scheduler confirms resv on stale/down nodes and resv stays confirmed
0df1c0d PP-1102: Windows Debug configuration has unresolved external symbols
4494144 PP-1095: starting "pbs_mom -p" with running job results in SIGSEGV
1724245 PP-1089-1090: Skipping tests instead of failing them when no.of MoMs hosts provided is not as per the expectations of the test.
3a1a04e PP-985: Long sched cycles/ALPS broken on Cray CLE6 because PAGG ids set to non-unique session IDs
de39feb PP-844: Fixed test_normal_user_unable_to_see_res_released ptl test
a6d027b PP-772: improve accuracy of walltime
e44cd6b PP-1081: Arrayjob 'E' accounting record has the 'start' value set to 0
95db2d0 PP-481: Added a new test and fixed another test
0e2e634 PP-954: Skip test if expected number of MoMs is not provided
b456904 PP-938: Action functions do not handle interdependent value validity set at the same time
fafc5f9 PP-482: additional QA test for soft walltime
d2bbd06 PP-844: Make resource_release_list and resources_released attribute visible to operator/manager only
b86a939 PP-516: Additional tests for direct write and remove files
7b8ee87 PP-770: state_count queued jobs count goes negative after server restart
f9583b5 PP-843: pbsnode reports mem in b(bytes) instead of kb for cray login node.
19766c4 PP-983: mom crash due to memory leak on HUP
e2d0cae PP-1066: PTL's is_cray() call fails to detect cray build
896537e PP-1038: pbsnodes produces nonconforming json
2114586 PP-1069: Remove the example inventory hook from unsupported directory
fe9c814 PP-964: Wrong use of BASIL string size used in snprintf
22f59bd PP-1055: Fairshare problems between cycles
e4adb21 PP-896: pbs_snapshot Option -d <pbs_diag> is not working
15445d2 PP-1061: Preemption does not happen when a job is qrun
df510e7 PP-935: Move pbs_snapshot from unsupported/ to sbin/
69de242 PP-662: rebased and undid the merge commit
2905a00 PP-560: scheduler leaks error structure during preemption
63f7e5c PP-1057: Address new compiler warnings
1b116db PP-1017: do not set job substate prerun in while holding the job
8bbaf86 PP-946: On Cray a PTL script setting select specification on a job using set_attributes method makes the job fail to run
ca53054 PP-911: Support for sub-job index in runjob event hook for Array jobs
1694485 PP-482: Soft Walltime
1a02257 PP-875, PP-959, PP-965: PTL log_match current defaults to 1 for max_attempts and interval and also relevant tests
c98f5e9 PP-923: qrun failing if job's formula value is below threshold
160a472 Additional PTL tests for PP-864.
dc70261 PP-941: print queue name in queuejob hook debug input file
6c636d3 PP-1034: Scheduler uses uninitialized memory. PP-1035: scheduler memory leak in query_resv(). PP-1037: Server crashed after requesting multi-chunk reservation with scatter.
ddac541 PP-481: make prologue hooks execute on all sister moms all the time
46ed217 PP-1027: Add Man page functionality test in PTL smoke test
3247d85 PP-1029: Travis needs to be updated to install man command
fccf2ce PP-927: Don't call pbs_postinstall on CLE6.0
1a46097 PP-979: Upgrade from PBS Schema version 1.2 to 1.3 is failing
a3ad1a0 PP-994: fix link to docker multi-stage build documentation
0e1a24b PP-1022: double free with unsupported batch request on mom side
ca67226 PP-35: Failover file locks are broken if file timestamps don't move for a long enough period (STONITH)
5477806 PP-1014: move git clone and exec build.sh from base to build dockerfile
a4fad72 PP-668: execjob_launch fails to setup the environment properly if one of the environment variables has a \n or a comma or quotes + Variable_List corruption when job is read from .JB file from disk
4339641 PP-930: PP-931 Duplicate files && usage message correction
343e19f PP-729: Failover after power loss on primary makes secondary try to connect to data service on primary -- never gives up
aa1e7c2 PP-852: It is possible to indefinitely delay the server ping_nodes using a client that registers/deregisters with pbs_comm
eee66d3 PP-971: MIN_STACK_LIMIT for stacksize not working
eb9e918 PP-1012: cgroup test references wrong variable
cc657ca PP-864: Ability to suspend jobs on Cray
f007774 PP-998: fix incorrect comment of job array on subjob rejection add a new ptl test
5a32120 PP-1004: remove sudo for ulimit from pbs_hook_debug_nocrash.py
bc48f95 PP-339 - additional QA test for node ramp down
c63bebe PP-995: update job object and other race conditions in pbs_accumulate_resc_used.py
a0f53f9 PP-698: Update copyright year in the source headers
4f1a84f PP-984: cput and mem limits kill job even if the value is equal to the limit
9685360 PP-970: mom file descriptor leak on resourcedef update when mom hooks present
36bdf3e PP-977: pbs_admin_suspend.py test_offline needs to be updated to not reuse job object
0ef58d2 PP-800: qsub performance improvement opportunity: env_array_to_varlist() called unnecessarily if -v or -V are not specified, costly if environment is large
5946ba1 PP-794: PBS does not update job comment while resuming a suspended job
3124c75 PP-992: add a readme for dockerfiles
2c2e09e PP-990: modify ONBUILD line somehow dockerhub vm complained about it
217b786 PP-991: Travis build: Add --no-gpg-checks to zypper install for pbspro rpm
a252e35 PP-989: reduce docker image size and build time use multistage build and reduce number of layers create a separate base image for building
8374f6b PP-404: Address remaining comments for cgroups hook after checkin
11b4806 PP-942: Compilation failure when -DDEBUG is defined
032b529 PP-516: Direct write of the job's stdout/err files
20f8df3 PP-891: PBSPro Power Awareness for Cray
e75209d PP-921: qsub can crash based on -v value
1a3a310 PP-662: Allow an admin to modify the start/end times of a reservation. PP-663: A scheduling cycle should also be triggered by the end of an advance reservation. PP-906: Consider other nodes when extending reservation. PP-703: Enhance accounting logs with complete information about reservations. PP-701: Capture timezone for standing reservations in accounting logs.
0cdf323 PP-162: scheduler keeps waiting even after session id has arrived from job launch
91702ce PP-960: Add Dockerfile for centos7
12b4525 PP-950: PP-929: PBS Pro (windows) services are not able to start due to missing libical dll's
2a564c3 PP-610: On a Cray X-series, periodically synchronize PBS with ALPS inventory
397a56f PP-750: Add a new node attribute named "partition" to the node object
ae13f6a PP-863: Added additional PTL tests
55015db PP-809 Basil Inventory PTL test case.
e6eca7d PP-882: Windows build fails
e31c40a PP-783: race condition in mom hook transport initiated by provisioning
d7cf35a PP-880: EVP_CIPHER_CTX_free get called twice on same variable
751bcd6 PP-909: Update make-ug of PTL to allow test execution as a non root user
fa06c03 PP-765: Possibility to allow all moms in acl (Modified PTL test case)
1ce14fe PP-878: pbs_swigify is broken
9c54ada PP-915: pbs_snapshot doesn't anonymize Authorized_Users & Authorized_Groups
322883f PP-339: Node Ramp Down Feature
4e1e92c PP-60: Using kill -HUP pgrep pbs_mom could result in signal being sent to children of main MOM
97326aa PP-718: Added more regressions for fairshare formula
0a38365 PP-902: PTL fails to parse mom config if more than one spaces are present on single line
6f91c9f PP-890, PP-893: get_log_files() & Scheduler.cycles() are broken
fbc1b97 PP-870: memory leak fix in scheduler
a6fc64d PP-863: scheduler won't run jobs in the same equiv class if there is a suspended job in the list, but isn't the first job
bc42307 PP-806: Add a new queue attribute named "partition"
779becd PP-887: PTL: Support for interactive job is broken
e0050da PP-868: Add a note to sched_config : help_starving_jobs being deprecated
5465759 PP-885: small memory leak in scheduler - equiv_class_resdef is not freed
bb29561 PP-735: PBSPro Power Awareness for SGI
ac48b75 PP-617:PP-788: Test for acl_groups and acl_resv_groups considers only primary group
395301c PP-809 Support Basil 1.3 and 1.4
c8416ea PP-854: PBSPro for windows needs an installer
5f1bee4 PP-867: Add a note to sched_config : mom_resources being deprecated
8cac15e PP-879: Add backward compatibility to older version of openssl
3251e48 PP-851: Extend PTL infrastructure to support parallel scheduler
a653129 PP-856: Implement AppVeyor CI for Windows
c77f65e PP-865: PTL doesn't unassociate nodes from queues in revert_to_defaults()
a3ca87f PP-866: Windows build is not compiled due to error C2079: 'scheduler' uses undefined struct 'sched'
b789e98 PP-858: Revert "PP-803: Run tests from newly added or modified test file along with smoke tests in Travis"
28a590e PP-746: Extend PBS to support a list of scheduler objects
8fe986d PP-850: fix for scheduler crash while resuming suspended job
ab8616c PP-785: PTL should use same version as PBS PP-778: Bump up PBS version to 17
9e23646 PP-862:pbs_snapshot errors out if there are custom resources
d2fcca9 PP-360: make libpbs depend on libcrypto and libpthread
b961560 PP-839: PTL framework should not execute dynamically generated script as SUDO
b8058f3 PP-849: fix for putting node in maintenance upon admin-suspend
7573d4c PP-741: Need a default job specific to Cray in PTL
214fee3 PP-828: preempted jobs can cause other jobs in their same equiv class to not run
a80187d PP-842: Scheduler doesn't detect old user/group limits when creating equiv classes
299b6d3 PP-719: Enhance setUp in PTL specifically for Cray platforms
658c228 PP-803: Run tests from newly added or modified test file along with smoke tests in Travis
92bc657 PP-758: Add pbs_snapshot tool ..
8585c74 PP-306: Refactor references to postgres commands in pbs_habitat
7dda921 PP-718: Add fairshare support to the formula
770ef27 PP-834: fixed releasing queue/server level resource on suspension
ae04a77 PP-835: Travis failing due to new updated Trusty images
d7c5f3d PP-480: Added more regressions for job equivalence classes
8e607ba PP-780: mom execjob_begin hook alarms when running a large multi-node job
fa1e435 PP-790: qmgr output has inconsistent whitespaces
ee7c331 PP-788: If a user's primary group is not the same as their user name then they are not able to submit reservations when acl_resv_group_enable is set to true
929af52 PP-777: Tracejob output truncated on jobs using many vnodes
e22764c PP-786: PTL sends signal to wrong pid in signal()
57b5059 PP-734: ability to release limited resources for suspended jobs
e277be7 PP-776: mom's chk_file_sec messages missing when syslog is being used
23612f5 PP-765: possibility to allow all moms in acl
9830a7b PP-781: Pass appropriate output directory to pbs_diag while saving post analysis data in PTL
b69e65d PP-789: address a compiler warning in node_manager.c
c031a13 PP-306: Refactor references to postgres commands in pbs_habitat
cacfde4 PP-736: update log_match in PTL to report failure
08a5327 PP-699: Calls to memmove() should cast third argument need as size_t, not int
15b3646 PP-480: Job Equivalence Class Optimization
deb3da6 PP-698:Updated copyright year in source file headers Fixed inconsistency in copyright headers
934abfc PP-721: Error observed when post analysis data is saved in PTL test execution that has failed tests
8093817 PP-587: One MoM reports the compute node info to server
085bc2c PP-740: Add Cray platform detection in PTL
12d5a5f PP-745: Fix for single node job, string type resource encoding.
19417ab PP-738: Wrong test count displayed when PTL fails in setUpClass
350e4dd PP-722: PTL post data analysis option max-postdata-threshold is not working as expected
cf9ec98 PP-737: expect failure causes a test error rather than failure
afdc229 PP-728: Fixed upper case in some dependency library names
5deaac8 PP-489: Updating PTL test case for interface information in daemon logs
7380adb PP-764: sched attribute opt_backfill_fuzzy can't be set
06b7c85 PP-484:PP-485:Tests for qstat dsv and json output
59ab832 PP-755: modulefile and pbs.service not in .gitignore
2b5c4e4 PP-489: Daemon logs should have interface and host name information
ddac9d0 PP-702, PP-754: Installations and upgrades on Cray XC CLE 5.2
929f4c4 PP-732: scheduler's attribute allows to control unset resources in placement sets
52fa00b PP-730: log_err, log_joberr messages do not appear in syslog if PBS_LOCALLOG is not set and PBS_SYSLOG is set to 1 in pbs.conf
b1b084b PP-739: Add 'Delegate=yes' to pbs.service file
f602d47 PP-731: Server crashes in fault tolerant mode
2de2049 PP-726: Systemd PBS service startup times out if there are tens of thousands of jobs present
dbf81f1 PP-211: Scheduler dumps core for suspended jobs..
b0a8fd4 PP-682 : pbs_comm drops new connection request for registered node
849a145 PP-239,PP-659,PP-708,PP-709,PP-710,PP-711,PP-712: Cpuset support to PTL
f6b15fa PP-742: Scheduler crashes during re-confirmation..
b5c1593 PP-512: Memory leak in server when writing 'E' accounting record
90f8dea PP-84: Scheduler crashing, race condition
204d686 PP-683: Sending large job scripts over TPP causes issues with TPP. Should use direct TCP connection in those cases.
ded681d PP-483, PP-484, PP-485: qstat output enhancement
63732c9 PP-489: As an admin, I would like to have the daemon log its hostname each time the log file gets opened, so that I can know easily and for sure which host produced a log file I am looking at
88b09a4 PP-693, PP-704: qstat -Bf is incorrectly called for every vnode during a PTL test run; Few test cases in PTL test suite "SmokeTest" refer to server instead of mom object
5f0f9d5 pp-669: update pbs_accumulate_resc_used.py
41bdd80 PP-657: scheduler infinite loop after subjob fails to start on node
adfeded PP-590: Correcting PTL to fail when MoM other than MS sends resources_used info to the server.
862cff7 PP-654: jobs not getting backfilled when server backfill depth is zero but queue is non-zero
c5c9207 PP-696: pbs.PERIODIC is documented but not defined in python module
1186de4 PP-653: scheduler hangs while resuming a job during qdel -Wforce
a651a59 PP-493: Error numbers returned are wrong when file open fails
44ee9cc PP-671: logging via syslog does not work
5d64f19 PP-665: On Cray X-series, accelerator_memory is not set when vnode_per_numa is false (or unset)
08d3ed0 PP-679: Travis for OpenSuse is broken
3e323c3 PP-302: Implement save of post-run analysis data
cae0605 PP-649: qdel not working with seq_num.server
0e88df1 PP-602: pidcache to cull list of processes to poll in MOM polling is broken by PID recycling; pidcache_truncate doesn't allow for new kthreads
6f8fe1d PP-568: Fix for correct evaluation of Array Job state.
d7822cf PP-586: Added to PTL tests.
b167e0c PP-588: Python hooks resc_used accumulation: could not load json or invalid resource name upon HUP
8f5ecee PP-589: server crashing if there is a hook updating resources unknown to server
3b1af3d PP-656: only last hook of same kind can modify Resource_List
d46648d PP-664: Travis builds failing on centos
c724dc2 PP-586: On a Cray X-series, create a vnode per compute node.
df74f20 PP-655: can't compile under Windows due to checkin of PP-409
ecf1b1f PP-648: sgigenvnodelist.awk fails on SLES 12 and RHEL 7
c19021d PP-638: Online manual page updates
3a951b0 PP-374: PTL for Job with a root-owned script runs even if root_reject_scripts is set to true
2997944 PP-594: Job submitted as 'pbsadmin' throws error 'bad user' while copying the files in mom-logs
07fc110 PP-652: only first queuejob hook is processed
2d2da72 PP-409: allow mom hooks to accumulate resources_used values beside cput, mem, cpupercent.
aaf1e80 PP-426: Server periodic hook infrastructure
0fd1037 PP-591: pbs_mom not coming up after switching on compute node on cpuset machines
2c81973 PP-617 acl_groups on queue considers only primary group
6ee82ea PP-615 comment can't be set on queue
1219c83 PP-590: Jobs may have resources_used.xxx = 0 when job has restarted due to node_fail_requeue if there are execjob_begin and execjob_end hooks in use
cb7be7a PP-502: Fix for emails/acct log getting truncated.
6928dfa PP-240: Add proper document in PTL framework - phase 1
84fee1d PP-578: server memory leak for hooks load test
20a2936 PP-580: qorder between jobs submitted to different queue in the same second does not work
108b728 PP-592: Restarting pbs_server causes duplicate accounting records for reservations
e17c0f0 PP-593: Remove spurious whitespace causing autotools warning message
e6fcf54 PP-499: Remove irrelevant code from pbs_postinstall script
45af8df PP-373: Fix for long -lselect resource request
83df5e0 PP-575: pbs_test_qorder.py script is not present in the correct directory
3a26b85 PP-584: PBS_SCP should only get set in pbs.conf on fresh installs, not upgrades
5614258 PP-491: Very short "qsub --" style jobs sometimes do not show stdout/err, race condition
5d3ad4f PP-582: PBS commands throwing error on a Client type of installation node
5dd48e3 PP-583: Windows build is broken
36eb78d PP-520: When error occurs while adjusting entity usage count for one limit, other limits should be left unaffected for enque job
6c5063a PP-496: Support PBS config environment variables like PBS_PRIMARY , PBS_SECONDARY, PBS_LEAF_ROUTERS
9f3fce2 PP-501: Refactoring scheduler code & ...
99d6bef PP-577: travis_retry may fail every time on a retry
520c9c4 PP-524: add travis CI support for debian
9090587 PP-82: After qordering the jobs, jobs are not considered in the correct order
a9bf7dd PP-227: Increased floating licenses and sockets to 10,000,000
1c526e3 PP-526: Configure fails, while checking libical library headers
8867a41 PP-509: Compile with_libical for archlinux
7cfcfa1 PP-513: When checking job_state=R it should check substate=42 as well by default
d00bb79 PP-521: Changes to PTL to increase job_requeue_timeout to make the test pass on slow machines.
5583be8 Fixed PP-511: pbs_ical correct icalerror_errors_are_fatal
b76befe PP-510: Libpbs,server makefile repair python call from environment variable PYTHON
a17da80 PP-514 pbsdsh fails when execjob_launch hook exists
4a7c6b3 PP-503: Update PTL test skip wrapper function for cray platform
854d2c7 PP-465: Changing the error message for qrerun timeout
0d4ea02 PP-495: pbsnodes -l output trims the hostname to 20 characters
648ad3d PP-505: Fix pbs_client_nagle_performance.py
e456727 PP-459: Code refactoring for pbs ifl interfaces
f387df8 PP-368: Registering data service to systemd in /etc/init.d/pbs is broken on failover configs
2fe4505 PP-374: Job with a root-owned script runs even if root_reject_scripts is set to true
a3406a7 PP-500: Fix abysmal qrerun performance with large spool file staging
6733daa PP-472: PP-492: Admin must be able to set the PBS_HOME, PBS_SERVER and PBS_DATA_SERVICE_USER value at install time & pbs_postinstall fails due to hard coded path.
ada357c Merge branch 'pp-477' of git://github.com/vinodchitrali/pbspro into PP-477
9595431 Merge branch 'PP-371' of git://github.com/gajendrasharmagit/pbspro into gajendrasharmagit-PP-371
e232c3c PP-371: Remove AIX specific netwins resource and PBS_ibwins hook
9047d26 PP-478: PTL test directory structure - rename "failover" test directory as "resilience"
ece22a9 PP-477: Server crash due to invalid memory access
f126bda PP-400: prevent invisible resources from being printed in email
898b486 PP-384: Changes needed for compiling PBS on Ubuntu OS.
47d5eb5 PP-349: changes to the reading of /proc//stat fields * there is speculation that unnecessary reading of some /proc//stat fields might have caused previous mysterious crashes; we now process only those fields we require (but note that we continue to use stdio to read the files)
b4be57a PP-418: pbs_password fails to create or update user password
ce4ce5c added more regressions to existing pbs_admin_suspend.py for pp-389
2e93243 PP-406: Error reporting code path fails to report all of the errors when setting entity count
221bfc2 PP-461: Create PTL test directory structure as per guidelines
95d3035 PP-446: Overlay Upgrade is failing
77b6bca PP-351: Correcting PTL test case to check for substate = 42 instead of state = 'R'
6e54810 PP-441: Issue with starting PBS when home directory is changed/not present after install
c78f085 PP-442: Fix for multiple execjob_launch hook
7620b5c PP-454: Fix commands printing additional error about nodelay when connect fails
4495b7d PP-452: Remove PTL code related to interface that doesn't exist in PBS anymore
d4ac6d3 PP-453: add header to PTL test file added through PP-367.
81f6169 PP-351: R records should have information on resource usage if the job started running and was rerun/requeued.
9785c1a PP-451: Fix commands/server communication delays caused by Nagle's
2068e5f PP-389: Allow the admin to suspend jobs for node maintenance
de5815e Merge branch 'kim' of git://github.com/lcnja/pbspro into lcnja-kim
c0ecf10 PP-440: Server crash on Windows when job blocking ends
abd7222 PP-423: Travis should check for PEP8 compliancy of Python code changes
3d0beec PP-367: Fix for sending only small job files (size <= 2MB) over TPP on job rerun. Issue: send_job_exec now in main PBS server thread can be blocked for very long leading to server hangs
a9b3df7 PP-402: remove netlibtoc.lib liblmx-altair.lib dependency from the windows project files
502100c PP-401: preemption not working for host level resources
7bf5252 Merging code change branch for PP-421 with master
17cc500 PP-408: Appending path to link and load dynamic library in pbs_server
14829c9 Fixed incorrect use of memcpy
4f790ba Merge in pp370 (#130) with latest master
b657643 PP-381: Pbs_spawn fails when user installed python path is used
a5dbad0 PP-397: jsdl-hpcpa:Executable tag is missing in qstat -f output
d798b18 PP-369: Fix for invalid state count value
819121e PP-383: Add smoke tags to all smoke test cases in pbs_smoketest.py
f296a8a PP-407: Replace bash with sh in .travis.yml file
f103d64 PP-410: Improvement to server logging ...
2da82d2 PP-326,PP-327,PP-328,PP-331,PP-332,PP-333,PP-335: Initial checkin of hook to support cgroup functionality
4429fd5 PP-407: Replace bash with sh in .travis.yml file
a1dd21d PP-399: PBS Pro spec file should allow sendmail or postfix to satisfy requirements
9a4b05d PP-379: Systemd pbs unit file should refer path from pbs.conf file
b48e2ab PBS-15641: Windows MoM could call pbs_demux.exe and mom_open_demux.exe w/ unsuitable shell if job sets Shell_Path_List
e29d908 PP-380: Fix for gtags file in .gitignore
4e81710 PP-345: cpuset_destroy_delay ignores child processes
7583084 PP-352: Error when unsetting ncpus, that is ['ncpus'] = None, in modifyjob hook
0217561 PP-382: Support for Postgres-9.3.12 on Windows platform
4b6a921 PP-350: Fix for multiple execjob_launch hooks Addressed comments from Arun.
a14a08b PP-377: Support Python2.7 on Windows
67d5bd5 PP-376: Update pull request template to include link to jira ticket instead of Jira id
e8f67df PP-375:Update Swig on Windows
a596df7 PP-370: Update pull request template to explicitly clarify about PTL/manual tests
bfea4d3 PP-357 Make configure.ac backward compatible with older versions of autotools
4e224b9 PP-311: Move to AES broke upgrade of DB that uses DES
c1c36cb PP-324: Add fairshare enhancement test case to PTL smoke test suite
eea3fa0 PP-270: Node/queue association is ignored when node_group_key is only set at the server level
9eed59c PP-344: qstat build broken on Windows
c1e031a PP-361: Fix functions isAdminPrivilege(), setuser_with_password()....
c84194f PP-280: Replace init script with unit file for systems that support systemd process management
9db6742 PP-123: PBS schedules jobs on nodes without accounting for the reservation on the node
04c2569 PP-358: Update installation instructions
a691252 PP-250: Hook debug causes file descriptor leak that crashes PBS server
a7cf99b PP-301: Scheduler does not import python math module...
b8fdf3e PP-148: fix for pbs_tclsh dumps core
cf10c91 PP-343: Fix PBS Build error on Windows
8e594d9 PP-356: Integrate comments from merge review
0292e61 PP-279 Setting 'o.Execution_Time = None ' fails with error "Error evaluating Python script, exec_time could not be parsed"
61e71ac PP-314: Wrong variable used in fix for PP-277
77ede4a PP-297: Update server dynamic resource smoke test case in PTL
3e68ed1 Merge branch 'pp-322' of git://github.com/bhroam/pbspro into bhroam-pp-322
09a9400 Merge branch 'pp-276' of git://github.com/bayucan/pbspro into bayucan-pp-276
2698694 PP-309: fix for strict_ordering not working
6827305 PP-310: suspended job not resuming due to ghost jobs
ed9ac8d PP-322: Merge NASA Mods since 13.0 into mainline
168740a PP-276: install the swig-generated pbs_ifl.py file
9dd1614 PP-254: merging doxygen changes for pending files
8617c41 PP-249: merging pbs.conf and pbs.conf.version
7f493d0 PP-231: qstat performance improvement
81ff30a PP-307: spec file must specify libexecdir on SUSE systems
0286bbb Removed automatically generated files and update .gitignore
b6947ba PP-149: Fix scheduler performance while preempting jobs
8357696 PP-294: Update pull request template to show checklist
dccd7ab Fix for PP-265: Extend qterm support for mom and sched
c5292f7 Merge branch 'debian_initialization_patches' of https://github.com/CESNET/pbspro into CESNET-debian_initialization_patches
cfaacf2 PP-293: Add check for signed commit in TravisCI
71104c8 PP-291 Fix for postgresql database paths
1340e65 compiling on Debian
f176417 Update README.md for first OSS release
6995008 Merge branch 'pp260' of git://github.com/mike0042/pbspro into pull_request_merge
2456862 PP-260: Fix a memory leak in svr_func.c
de57d8e PP-277: Multinode jobs may fail to start
8c6c580 Fix for PP-262: Add pull request template
7872675 pp-263: Hook throwing error 'SwigPyObject' object has no attribute 'name'
767ccc3 PP-274: Build PBS Pro under non-OHPC OBS instance
2b626f8 To fix issue while setting PBS_HOME directory in configure script
124289a Fix for PP-248 Multi node job fails to start ...
c2f733b Minor updates to INSTALL instructions
61bb037 Creating initial README.md
da42f97 PP-256: Fix build on SLES10 and RHEL5 (#22)

@subhasisb subhasisb released this Dec 8, 2017

pbspro release 14.1.2

pbspro-14.1.2.tar.gz - ready for an admin to compile and install (./configure ; make ...)
Source code (zip and tar.gz) - archive of the source code

@subhasisb subhasisb released this Jun 17, 2016 · 5 commits to release_14_1_branch since this release

pbspro release 14.1.0

May 16, 2016
OpenHPC OBS snapshot of PBS Pro 13.1.800 on May 15, 2016