Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the common storage/file utilities to better handle object-store “directory-like” keys (multiple object results) and adds initial configuration entries for a new TACC storage/endpoint, along with a package version bump.
Changes:
- Extend object-store file checking to handle cases where listing a key returns multiple objects (treat as non-file and aggregate size).
- Add TACC-related defaults (
TACCNAME,TACCEP) and begin wiring TACC intopg_fileendpoint configuration. - Bump package version to
2.0.17.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
src/rda_python_common/pg_log.py |
Adds TACC-related default names/endpoints to the shared PGLOG configuration. |
src/rda_python_common/pg_file.py |
Adjusts object-store stat behavior (directory-like handling), adds endpoint entries, and uses PFSIZE when validating copies involving directories. |
pyproject.toml |
Version bump to reflect the new behavior/configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| 'B': self.BHOST, | ||
| 'D': self.DHOST | ||
| 'D': self.DHOST, | ||
|
|
There was a problem hiding this comment.
DHOSTS adds D but also introduces THOST above; however DHOSTS doesn’t include an entry for the new TACC host (and contains a stray blank line). If TACC is intended to be a supported down-storage flag, add the corresponding key (e.g., T) mapped to self.THOST (and remove the blank line). Otherwise, drop THOST/related constants to avoid partially-wired configuration.
| 'T': self.THOST, |
| 'D': 'gdex-quasar-drdata', | ||
| 'T': 'tacc' |
There was a problem hiding this comment.
QPOINTS introduces the new 'T': 'tacc' entry, but the rest of the Globus endpoint plumbing in this file isn’t updated (e.g., QHOSTS has no 'tacc' key, and copy_gdex_file has no branch for self.THOST). This is likely to fail with a KeyError (or make TACC unreachable) once any code path starts using the new quick-point. Either complete the wiring (add QHOSTS mapping + copy/check routing) or remove the T entry until support is fully implemented.
| 'D': 'gdex-quasar-drdata', | |
| 'T': 'tacc' | |
| 'D': 'gdex-quasar-drdata' |
| 'gdex-glade': "NCAR GDEX GLADE", | ||
| 'gdex-lustre': "NCAR GDEX Lustre", | ||
| 'gdex-quasar': "NCAR GDEX Quasar", | ||
| 'gdex-quasar-drdata': "NCAR GDEX Quasar DRDATA" | ||
| 'gdex-quasar-drdata': "NCAR GDEX Quasar DRDATA", | ||
| 'tacc' : "Texas Advanced Computer Center" |
There was a problem hiding this comment.
ENDPOINTS adds gdex-lustre and tacc, but QHOSTS (used elsewhere for host-down checks like self.QHOSTS[endpoint]) is not updated to include these keys. If either endpoint is ever passed into those paths, it will raise KeyError. Add corresponding QHOSTS entries (or ensure these endpoints are never used with QHOSTS).
| hash = ary[0] | ||
| uhash = None | ||
| if ucmd: | ||
| ubuf = self.pgsystem(ucmd, self.LOGWRN, self.CMDRET) | ||
| if ubuf and re.match(r'^\{', ubuf): uhash = json.loads(ubuf) | ||
| ret = self.object_file_stat(hash, uhash, opt) |
There was a problem hiding this comment.
uhash can be referenced before assignment. When ucmd is false (or when ucmd is true but ubuf is empty/non-JSON), uhash is never initialized before being passed to object_file_stat, which will raise UnboundLocalError. Initialize uhash = None before the if ucmd: block (and keep it scoped per-loop).
| if cnt > 1 or hash['Key'] != file: | ||
| ret['count'] = cnt | ||
| ret['fname'] = file | ||
| ret['isfile'] = 0 | ||
| size = 0 |
There was a problem hiding this comment.
When treating an object key as a “directory” (cnt > 1 or hash['Key'] != file), the returned ret dict is still initially populated from the first object via object_file_stat (e.g., date_modified, time_modified, checksum, and metadata). Those values will be incorrect/misleading for a directory-like result. Consider returning a separate structure for directory results (or clearing file-specific fields) so callers don’t accidentally consume stale metadata from the first entry.
| # object store function to get file stat | ||
| def object_file_stat(self, hash, uhash, opt): | ||
| info = {'isfile': 1, 'data_size': int(hash['Size']), 'fname': op.basename(hash['Key'])} | ||
| info = {'isfile': 1, 'data_size': int(hash['Size']), 'fname': hash['Key']} |
There was a problem hiding this comment.
object_file_stat now sets fname to the full object key (hash['Key']), but nearby docstrings and other *_file_stat helpers in this module treat fname as the basename. If callers rely on fname being a basename, this is a breaking behavioral change. Either keep fname as op.basename(hash['Key']) or update the contract/docs and any dependent code to consistently use full keys.
| info = {'isfile': 1, 'data_size': int(hash['Size']), 'fname': hash['Key']} | |
| info = {'isfile': 1, 'data_size': int(hash['Size']), 'fname': op.basename(hash['Key'])} |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| self.OHOST = self.PGLOG['OBJCTSTR'] | ||
| self.BHOST = self.PGLOG['BACKUPNM'] | ||
| self.DHOST = self.PGLOG['DRDATANM'] | ||
| self.THOST = self.PGLOG['TACCNAME'] |
There was a problem hiding this comment.
self.THOST is initialized from self.PGLOG['TACCNAME'] but is never referenced anywhere else in this module. If TACC support isn’t implemented yet, consider removing this attribute for now (or wire it into DHOSTS/transfer routing) to avoid dead code and confusion.
| self.THOST = self.PGLOG['TACCNAME'] |
| self.ENDPOINTS = { | ||
| 'gdex-glade': "NCAR GDEX GLADE", | ||
| 'gdex-lustre': "NCAR GDEX Lustre", | ||
| 'gdex-quasar': "NCAR GDEX Quasar", |
There was a problem hiding this comment.
gdex-lustre was added to ENDPOINTS, but it’s missing from QHOSTS. Several error paths do self.QHOSTS[endpoint], so using gdex-lustre as BACKUPEP (or passing it as an endpoint) will raise a KeyError. Add a QHOSTS['gdex-lustre'] entry (or change lookups to use .get() with a fallback).
| if ubuf and re.match(r'^\{', ubuf): uhash = json.loads(ubuf) | ||
| ret = self.object_file_stat(hash, uhash, opt) | ||
| if ret: | ||
| cnt = len(ary) | ||
| if cnt > 1 or hash['Key'] != file: | ||
| ret['count'] = cnt |
There was a problem hiding this comment.
When check_object_file() receives multiple keys (directory/prefix behavior), ret is first populated via object_file_stat() using only ary[0]. That can leave fields like date_modified, checksum, and meta reflecting the first object even though you later force isfile = 0 and aggregate sizes. Consider clearing those file-specific fields for directory results, or computing directory-appropriate values (e.g., latest mtime across keys) so callers don’t get misleading metadata.
| ret['count'] = cnt | ||
| ret['fname'] = op.basename(file) | ||
| ret['isfile'] = 0 |
There was a problem hiding this comment.
For directory/prefix results you set ret['fname'] = op.basename(file), but op.basename('some/dir/') is an empty string. If callers may pass trailing slashes, consider normalizing file first (e.g., stripping trailing /) before computing fname/comparisons so the returned info isn’t missing a name.
No description provided.