Skip to content

check object for directory#85

Merged
zaihuaji merged 3 commits intomainfrom
hua-work-common
Mar 3, 2026
Merged

check object for directory#85
zaihuaji merged 3 commits intomainfrom
hua-work-common

Conversation

@zaihuaji
Copy link
Copy Markdown
Collaborator

@zaihuaji zaihuaji commented Mar 2, 2026

No description provided.

Copilot AI review requested due to automatic review settings March 2, 2026 23:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the common storage/file utilities to better handle object-store “directory-like” keys (multiple object results) and adds initial configuration entries for a new TACC storage/endpoint, along with a package version bump.

Changes:

  • Extend object-store file checking to handle cases where listing a key returns multiple objects (treat as non-file and aggregate size).
  • Add TACC-related defaults (TACCNAME, TACCEP) and begin wiring TACC into pg_file endpoint configuration.
  • Bump package version to 2.0.17.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
src/rda_python_common/pg_log.py Adds TACC-related default names/endpoints to the shared PGLOG configuration.
src/rda_python_common/pg_file.py Adjusts object-store stat behavior (directory-like handling), adds endpoint entries, and uses PFSIZE when validating copies involving directories.
pyproject.toml Version bump to reflect the new behavior/configuration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

'B': self.BHOST,
'D': self.DHOST
'D': self.DHOST,

Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DHOSTS adds D but also introduces THOST above; however DHOSTS doesn’t include an entry for the new TACC host (and contains a stray blank line). If TACC is intended to be a supported down-storage flag, add the corresponding key (e.g., T) mapped to self.THOST (and remove the blank line). Otherwise, drop THOST/related constants to avoid partially-wired configuration.

Suggested change
'T': self.THOST,

Copilot uses AI. Check for mistakes.
Comment thread src/rda_python_common/pg_file.py Outdated
Comment on lines +95 to +96
'D': 'gdex-quasar-drdata',
'T': 'tacc'
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QPOINTS introduces the new 'T': 'tacc' entry, but the rest of the Globus endpoint plumbing in this file isn’t updated (e.g., QHOSTS has no 'tacc' key, and copy_gdex_file has no branch for self.THOST). This is likely to fail with a KeyError (or make TACC unreachable) once any code path starts using the new quick-point. Either complete the wiring (add QHOSTS mapping + copy/check routing) or remove the T entry until support is fully implemented.

Suggested change
'D': 'gdex-quasar-drdata',
'T': 'tacc'
'D': 'gdex-quasar-drdata'

Copilot uses AI. Check for mistakes.
Comment thread src/rda_python_common/pg_file.py Outdated
Comment on lines +104 to +108
'gdex-glade': "NCAR GDEX GLADE",
'gdex-lustre': "NCAR GDEX Lustre",
'gdex-quasar': "NCAR GDEX Quasar",
'gdex-quasar-drdata': "NCAR GDEX Quasar DRDATA"
'gdex-quasar-drdata': "NCAR GDEX Quasar DRDATA",
'tacc' : "Texas Advanced Computer Center"
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENDPOINTS adds gdex-lustre and tacc, but QHOSTS (used elsewhere for host-down checks like self.QHOSTS[endpoint]) is not updated to include these keys. If either endpoint is ever passed into those paths, it will raise KeyError. Add corresponding QHOSTS entries (or ensure these endpoints are never used with QHOSTS).

Copilot uses AI. Check for mistakes.
Comment on lines 1425 to 1429
hash = ary[0]
uhash = None
if ucmd:
ubuf = self.pgsystem(ucmd, self.LOGWRN, self.CMDRET)
if ubuf and re.match(r'^\{', ubuf): uhash = json.loads(ubuf)
ret = self.object_file_stat(hash, uhash, opt)
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhash can be referenced before assignment. When ucmd is false (or when ucmd is true but ubuf is empty/non-JSON), uhash is never initialized before being passed to object_file_stat, which will raise UnboundLocalError. Initialize uhash = None before the if ucmd: block (and keep it scoped per-loop).

Copilot uses AI. Check for mistakes.
Comment on lines +1432 to +1436
if cnt > 1 or hash['Key'] != file:
ret['count'] = cnt
ret['fname'] = file
ret['isfile'] = 0
size = 0
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When treating an object key as a “directory” (cnt > 1 or hash['Key'] != file), the returned ret dict is still initially populated from the first object via object_file_stat (e.g., date_modified, time_modified, checksum, and metadata). Those values will be incorrect/misleading for a directory-like result. Consider returning a separate structure for directory results (or clearing file-specific fields) so callers don’t accidentally consume stale metadata from the first entry.

Copilot uses AI. Check for mistakes.
Comment thread src/rda_python_common/pg_file.py Outdated
# object store function to get file stat
def object_file_stat(self, hash, uhash, opt):
info = {'isfile': 1, 'data_size': int(hash['Size']), 'fname': op.basename(hash['Key'])}
info = {'isfile': 1, 'data_size': int(hash['Size']), 'fname': hash['Key']}
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object_file_stat now sets fname to the full object key (hash['Key']), but nearby docstrings and other *_file_stat helpers in this module treat fname as the basename. If callers rely on fname being a basename, this is a breaking behavioral change. Either keep fname as op.basename(hash['Key']) or update the contract/docs and any dependent code to consistently use full keys.

Suggested change
info = {'isfile': 1, 'data_size': int(hash['Size']), 'fname': hash['Key']}
info = {'isfile': 1, 'data_size': int(hash['Size']), 'fname': op.basename(hash['Key'])}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/rda_python_common/pg_file.py Outdated
self.OHOST = self.PGLOG['OBJCTSTR']
self.BHOST = self.PGLOG['BACKUPNM']
self.DHOST = self.PGLOG['DRDATANM']
self.THOST = self.PGLOG['TACCNAME']
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.THOST is initialized from self.PGLOG['TACCNAME'] but is never referenced anywhere else in this module. If TACC support isn’t implemented yet, consider removing this attribute for now (or wire it into DHOSTS/transfer routing) to avoid dead code and confusion.

Suggested change
self.THOST = self.PGLOG['TACCNAME']

Copilot uses AI. Check for mistakes.
Comment on lines 102 to 105
self.ENDPOINTS = {
'gdex-glade': "NCAR GDEX GLADE",
'gdex-lustre': "NCAR GDEX Lustre",
'gdex-quasar': "NCAR GDEX Quasar",
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gdex-lustre was added to ENDPOINTS, but it’s missing from QHOSTS. Several error paths do self.QHOSTS[endpoint], so using gdex-lustre as BACKUPEP (or passing it as an endpoint) will raise a KeyError. Add a QHOSTS['gdex-lustre'] entry (or change lookups to use .get() with a fallback).

Copilot uses AI. Check for mistakes.
Comment on lines 1438 to +1443
if ubuf and re.match(r'^\{', ubuf): uhash = json.loads(ubuf)
ret = self.object_file_stat(hash, uhash, opt)
if ret:
cnt = len(ary)
if cnt > 1 or hash['Key'] != file:
ret['count'] = cnt
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When check_object_file() receives multiple keys (directory/prefix behavior), ret is first populated via object_file_stat() using only ary[0]. That can leave fields like date_modified, checksum, and meta reflecting the first object even though you later force isfile = 0 and aggregate sizes. Consider clearing those file-specific fields for directory results, or computing directory-appropriate values (e.g., latest mtime across keys) so callers don’t get misleading metadata.

Copilot uses AI. Check for mistakes.
Comment on lines +1443 to +1445
ret['count'] = cnt
ret['fname'] = op.basename(file)
ret['isfile'] = 0
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For directory/prefix results you set ret['fname'] = op.basename(file), but op.basename('some/dir/') is an empty string. If callers may pass trailing slashes, consider normalizing file first (e.g., stripping trailing /) before computing fname/comparisons so the returned info isn’t missing a name.

Copilot uses AI. Check for mistakes.
@zaihuaji zaihuaji merged commit b4b238a into main Mar 3, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants