features: inspect_h5 keys, faster imports, archive UX #8

Merged
youngdashu merged 5 commits into main from feat/add-h5-parsing-content-with-names
Mar 23, 2026
@youngdashu
Collaborator

feat: inspect_h5 keys, faster imports, archive UX

What changed

  • inspect_h5: In content mode, optional --name (comma-separated PDB keys) limits what is loaded and shown; order matches the list. Missing keys get a warning.
  • CLI / imports: Lazy imports in command_parser, lazy embedder classes in EmbedderType, and deferred Embedding import in StructuresDataset so startup is lighter.
  • Dataset path: If you pass a file as the dataset path, it must be .json.
  • create_archive: Shows a progress bar and collects results as workers finish; the final zip name includes the dataset dir. No extra per-PDB files are written to disk beyond the existing zip/tar flow.
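
As a rough illustration of the inspect_h5 --name behaviour described above (filter to the requested keys, keep the requested order, warn on missing ones), here is a minimal sketch. It is not the code from this PR: `select_keys` is a hypothetical helper, and a plain list of strings stands in for the keys of an opened H5 file.

```python
import warnings


def select_keys(available, name_arg):
    """Return the requested keys that exist, in the order they were requested.

    `available` stands in for the keys of an opened H5 file (assumption);
    `name_arg` is the raw comma-separated string given to --name, or None.
    """
    if name_arg is None:
        return list(available)  # no filter: keep everything

    # Split on commas and drop empty entries such as a trailing comma.
    requested = [k.strip() for k in name_arg.split(",") if k.strip()]
    present = set(available)

    selected = []
    for key in requested:
        if key in present:
            selected.append(key)
        else:
            # Missing keys are reported but do not abort the run.
            warnings.warn(f"key not found in file: {key}")
    return selected
```

Iterating over the requested list rather than the file's keys is what makes the output order match the --name argument.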

Quick check

  • Run inspect_h5 --mode content --name A,B, and confirm that passing --name without content mode errors.
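
For reviewers who want to see what that check exercises, here is a minimal argparse sketch of the rule "--name is only valid in content mode". It assumes an argparse-based CLI; the `summary` mode name and the `parse_args` helper are placeholders, not taken from the repository.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="inspect_h5")
    # "summary" is a placeholder for whatever non-content modes exist.
    parser.add_argument("--mode", choices=["summary", "content"], default="summary")
    parser.add_argument("--name", default=None,
                        help="comma-separated keys (content mode only)")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject --name outside content mode; parser.error exits with status 2.
    if args.name is not None and args.mode != "content":
        parser.error("--name requires --mode content")
    return args
```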

@youngdashu youngdashu merged commit f6152f6 into main Mar 23, 2026
1 check failed

1 participant