Bringing in latest master branch changes #1

Merged
rmumme2 merged 14 commits into deadlywrong:derm_set from sunlabuiuc:master
Apr 8, 2026
@rmumme2 rmumme2 commented Apr 8, 2026

No description provided.

jhnwu3 and others added 14 commits March 13, 2026 15:50
* add new docs

* index

* overview page added

* clean up and fix old details
* Fixed repo to be able to run TUEV/TUAB + updated example scripts

* Args need to be passed correctly

* Minor fixes and precomputed STFT logic

* Fix the test files to reflect codebase changes

* Args update
* Fixed repo to be able to run TUEV/TUAB + updated example scripts

* Args need to be passed correctly

* Minor fixes and precomputed STFT logic

* Fix the test files to reflect codebase changes

* Args update

* test script fixes
…x device/contiguity issues (#901)

1. Fix bare .squeeze() calls that silently remove the batch dimension
   when batch_size=1, causing wrong results during single-sample inference:
   - concare.py: .squeeze() → .squeeze(dim=-1) and .squeeze(dim=1)
   - agent.py: .squeeze() → .squeeze(dim=-1) or removed (already 1-D after .sum/.mean)

2. Add weights_only=True to all torch.load() calls for PyTorch 2.6+
   compatibility and security (prevents arbitrary code execution via
   pickle deserialization):
   - trainer.py, biot.py, tfm_tokenizer.py (2 calls), kg_base.py

3. Add .contiguous() before pack_padded_sequence in RNNLayer to prevent
   cuDNN errors with non-contiguous input tensors (fixes #800)

4. Fix StageNet device mismatch — tensors were created on CPU instead of
   the input tensor's device, causing crashes during GPU training:
   - torch.zeros/ones(...) → torch.zeros/ones(..., device=device)
   - time == None → time is None (PEP8)
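
The four fixes above can be sketched in isolation as follows. This is a minimal illustration, not the PyHealth code itself: the tensor shapes and the helper name `init_state` are hypothetical and chosen only to make each pitfall visible.

```python
import io

import torch

# Fix 1: a bare .squeeze() removes every size-1 dimension, including the
# batch dimension when batch_size == 1. Naming the dimension keeps the batch.
logits = torch.randn(1, 1)        # hypothetical (batch=1, features=1) output
bare = logits.squeeze()           # 0-D tensor: the batch dimension is gone
safe = logits.squeeze(dim=-1)     # 1-D tensor of length 1: batch preserved

# Fix 2: weights_only=True restricts unpickling to tensors and plain
# containers instead of arbitrary Python objects.
buffer = io.BytesIO()
torch.save({"weight": torch.ones(2, 2)}, buffer)
buffer.seek(0)
state = torch.load(buffer, weights_only=True)

# Fix 3: transposing returns a non-contiguous view; .contiguous() copies it
# into a dense layout before it is handed to packed-sequence/cuDNN code.
x = torch.randn(4, 3, 5).transpose(0, 1)
x_contig = x.contiguous()


# Fix 4: allocate new tensors on the input's device rather than on the CPU.
def init_state(inputs: torch.Tensor, hidden_dim: int) -> torch.Tensor:
    """Hypothetical helper: zero state living on the same device as `inputs`."""
    return torch.zeros(inputs.size(0), hidden_dim, device=inputs.device)


h0 = init_state(torch.randn(2, 3), hidden_dim=4)
```

With `batch_size=1`, `bare` has no dimensions left, so downstream code that indexes or concatenates along the batch axis fails or silently misbehaves, while `safe` keeps its length-1 batch axis.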
… reproducible splits (#902)

Three fixes that directly affect the trustworthiness of research results:

1. regression.py: kl_divergence computation mutated the input arrays
   (x, x_rec) in-place via clamping and normalization. When multiple
   metrics were requested (e.g., ["kl_divergence", "mse", "mae"]),
   mse/mae were computed on the modified arrays, producing incorrect
   values. Fixed by operating on copies.

2. trainer.py: model.eval() was called inside the per-batch loop in
   inference(), redundantly setting eval mode on every batch. Moved
   to before the loop — called once as intended.

3. splitter.py: all split functions used np.random.seed() which mutates
   the global numpy random state. This causes cross-contamination when
   multiple splits are called sequentially, making experiments
   non-reproducible. Replaced all 7 occurrences with
   np.random.default_rng(seed) which creates an isolated RNG instance.
   The existing sample_balanced() already used default_rng correctly.
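
The first and third fixes can be illustrated with a small NumPy sketch. The function names (`kl_divergence_inplace`, `kl_divergence_safe`, `split_indices`) and the sample arrays are hypothetical and are not taken from regression.py or splitter.py; they only reproduce the pattern described above.

```python
import numpy as np


def kl_divergence_inplace(x, x_rec, eps=1e-8):
    # Buggy pattern: clipping with out= and using /= rewrite the caller's arrays.
    np.clip(x, eps, None, out=x)
    np.clip(x_rec, eps, None, out=x_rec)
    x /= x.sum()
    x_rec /= x_rec.sum()
    return float(np.sum(x * np.log(x / x_rec)))


def kl_divergence_safe(x, x_rec, eps=1e-8):
    # Fixed pattern: np.clip without out= returns new arrays, so the inputs
    # are left untouched for any metric computed afterwards.
    p = np.clip(x, eps, None)
    q = np.clip(x_rec, eps, None)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def mse(x, x_rec):
    return float(np.mean((x - x_rec) ** 2))


x = np.array([2.0, 2.0, 4.0])
x_rec = np.array([1.0, 3.0, 4.0])
mse_reference = mse(x, x_rec)

kl_divergence_safe(x, x_rec)
mse_after_safe = mse(x, x_rec)              # unchanged

x_bad, x_rec_bad = x.copy(), x_rec.copy()
kl_divergence_inplace(x_bad, x_rec_bad)
mse_after_inplace = mse(x_bad, x_rec_bad)   # computed on normalized arrays


def split_indices(n, seed):
    # An isolated generator: shuffling here never touches np.random's global state.
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    rng.shuffle(idx)
    return idx


np.random.seed(0)
expected_draw = np.random.rand()
np.random.seed(0)
first_split = split_indices(10, seed=123)
actual_draw = np.random.rand()              # same value as expected_draw
second_split = split_indices(10, seed=123)  # identical to first_split
```

In the in-place variant the MSE is computed on clipped, normalized arrays and no longer matches the reference value, which is the failure mode described for requests such as `["kl_divergence", "mse", "mae"]`.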
The GRASP model was completely non-functional in PyHealth 2.0 because it
still used the legacy 1.x BaseModel constructor and removed helper
methods (get_label_tokenizer, add_feature_transform_layer,
prepare_labels, padding2d/3d).

Changes:
- Rewrite GRASP.__init__ to use the 2.0 pattern (matching ConCare):
  - super().__init__(dataset=dataset) instead of passing feature_keys/label_key/mode
  - EmbeddingModel(dataset, embedding_dim) replaces manual type dispatch
  - self.get_output_size() without arguments
  - Auto-derive feature_keys, label_key, mode from dataset schemas
- Rewrite GRASP.forward to use EmbeddingModel:
  - embedded, masks = self.embedding_model(kwargs, output_mask=True)
  - Labels from kwargs[self.label_key].to(self.device)
  - Eliminates ~60 lines of manual tokenization/padding/embedding
- Remove eliminated parameters: feature_keys, label_key, mode, use_embedding
- Update imports: SampleEHRDataset → SampleDataset, add EmbeddingModel
- Update docstring examples to 2.0 API
- Update __main__ block to use create_sample_dataset
- Add tests/core/test_grasp.py with 8 test cases covering:
  initialization, forward/backward, embed extraction, GRU/LSTM backbones

GRASPLayer (the algorithm core) is unchanged.
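
The constructor pattern described above, where the model takes only the dataset and derives its configuration from the dataset schemas, can be sketched with stand-in classes. Everything below is hypothetical: `StubDataset`, `StubBaseModel`, `StubGRASP`, and the attribute names `input_schema` and `output_schema` are assumptions made for illustration, and no PyHealth classes are imported or reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class StubDataset:
    """Hypothetical stand-in for a sample dataset that carries its schemas."""

    input_schema: dict = field(default_factory=dict)
    output_schema: dict = field(default_factory=dict)


class StubBaseModel:
    """Hypothetical base class: configuration is derived from the dataset."""

    def __init__(self, dataset):
        self.dataset = dataset
        # Feature keys, label key, and mode are no longer constructor arguments.
        self.feature_keys = list(dataset.input_schema)
        self.label_key = next(iter(dataset.output_schema))
        self.mode = dataset.output_schema[self.label_key]


class StubGRASP(StubBaseModel):
    """Hypothetical model following the dataset-only constructor pattern."""

    def __init__(self, dataset, embedding_dim=128):
        super().__init__(dataset=dataset)
        self.embedding_dim = embedding_dim


dataset = StubDataset(
    input_schema={"conditions": "sequence", "procedures": "sequence"},
    output_schema={"label": "binary"},
)
model = StubGRASP(dataset)
```

The design choice this illustrates is that callers cannot pass a `feature_keys`, `label_key`, or `mode` that disagrees with the dataset, because those values exist in only one place.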
… page so that new users can find an easier-to-navigate, less documentation-heavy page for locating what they're looking for (#910)
* Fixed repo to be able to run TUEV/TUAB + updated example scripts

* Args need to be passed correctly

* Minor fixes and precomputed STFT logic

* Fix the test files to reflect codebase changes

* Args update

* test script fixes

* dataset path update

* fix contrawr - small change

* divide by 0 error

* Incorporate tfm logic

* Fix label stuff

* tuab fixes

* fix metrics

* aggregate alphas

* Fix splitting and add tfm weights

* fix tfm+tuab

* updates scripts and haoyu splitter

* fix conflict

* Remove weightfiles from tracking and add to .gitignore

Weight files are large binaries distributed separately; untrack all
existing .pth files under weightfiles/ and add weightfiles/ to
.gitignore so they are excluded from future commits and the PR.

Made-with: Cursor
* feat: add optional dependency groups for graph and NLP extras (#890)

Add [project.optional-dependencies] to pyproject.toml so users can
install domain-specific dependencies via pip extras:

  pip install pyhealth[graph]   # torch-geometric for GraphCare, KG
  pip install pyhealth[nlp]     # editdistance, rouge_score, nltk

The codebase already uses try/except ImportError with HAS_PYG flags
for torch-geometric, and the NLP metrics define their required
versions in each scorer class. This change exposes those dependencies
through standard Python packaging so pip can resolve them.

Version pins match the requirements declared in the code:
- editdistance~=0.8.1 (pyhealth/nlp/metrics.py:356)
- rouge_score~=0.1.2 (pyhealth/nlp/metrics.py:415)
- nltk~=3.9.1 (pyhealth/nlp/metrics.py:397)
- torch-geometric>=2.6.0 (compatible with PyTorch 2.7)

Closes #890

* fix: move optional-dependencies after scalar fields to fix TOML structure

Move [project.optional-dependencies] from between dependencies and
license (line 49) to after keywords (line 62), before [project.urls].

In TOML, a sub-table header like [project.optional-dependencies]
closes the parent [project] table, so placing it before license and
keywords caused those fields to be excluded from [project]. This
broke CI validation.

Verified with tomllib that all project fields (name, license,
keywords, optional-dependencies, urls) parse correctly under
[project].
* init commit

* RNN memory fix

* add example scripts here

* more bug fixes?

* commit to see new changes

* add test cases

* fix basemodel leakage of args

* fixes to tests and examples

* more examples

* reduce unnecessary checks, enable crashing when a cache is invalid

* fix nested sequence rnn problems

* fixes for the concare and transformer model exploding in memory

* fix concare merge conflict again

* fix for 3D channel for CNN

* update and delete defunct docs

* better loc comparisons and also a bunch of model fixes hopefully

* test case updates to match our bug fixes

* fix instability in calibration tests for CP


TL;DR: Fixes a variety of dataset-loading bugs, runtime bugs, and splits for TUEV/TUAB, and adds a number of performance fixes for Transformer and ConCare. We can always iterate on these fixes later.
Bypassing PR review because of speed and reviewer-bottleneck constraints.
…#935)

The v2.0 MIMIC3Dataset/MIMIC4Dataset (based on BaseDataset) no longer
accepts code_mapping, dev, or refresh_cache parameters. These were
part of the legacy BaseEHRDataset API.

Update README.rst, example scripts, and leaderboard utilities to use
the current v2.0 API.

Note: task file docstrings and pyhealth/datasets/mimicextract.py
still reference code_mapping but are left for separate PRs since
mimicextract.py has not yet been migrated to v2.0.

Fixes #535
@rmumme2 rmumme2 merged commit 503edb3 into deadlywrong:derm_set Apr 8, 2026
5 participants