Bringing in latest master branch changes by rmumme2 · Pull Request #1 · deadlywrong/PyHealth

rmumme2 · 2026-04-08T20:03:46Z

No description provided.

…x device/contiguity issues (#901)

1. Fix bare .squeeze() calls that silently remove the batch dimension when batch_size=1, causing wrong results during single-sample inference:
   - concare.py: .squeeze() → .squeeze(dim=-1) and .squeeze(dim=1)
   - agent.py: .squeeze() → .squeeze(dim=-1) or removed (already 1-D after .sum/.mean)
2. Add weights_only=True to all torch.load() calls for PyTorch 2.6+ compatibility and security (prevents arbitrary code execution via pickle deserialization):
   - trainer.py, biot.py, tfm_tokenizer.py (2 calls), kg_base.py
3. Add .contiguous() before pack_padded_sequence in RNNLayer to prevent cuDNN errors with non-contiguous input tensors (fixes #800)
4. Fix StageNet device mismatch (tensors were created on CPU instead of the input tensor's device, causing crashes during GPU training):
   - torch.zeros/ones(...) → torch.zeros/ones(..., device=device)
   - time == None → time is None (PEP8)
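A minimal sketch of the batch-dimension and device issues described in the commit message above, assuming PyTorch is installed. The tensor names and shapes are hypothetical and are not taken from concare.py, agent.py, or StageNet.

```python
import torch

# Hypothetical per-sample scores with shape (batch, features, 1), batch_size=1.
scores = torch.zeros(1, 5, 1)

# Bare squeeze() drops every size-1 dimension, including the batch dimension.
bare = scores.squeeze()
# squeeze(dim=-1) drops only the trailing dimension and keeps the batch axis.
explicit = scores.squeeze(dim=-1)

bare_shape = tuple(bare.shape)          # batch axis is gone
explicit_shape = tuple(explicit.shape)  # batch axis is preserved

# Allocate helper tensors on the input's device instead of the default (CPU),
# so the same code works when `scores` lives on a GPU.
mask = torch.ones(scores.shape[0], scores.shape[1], device=scores.device)
```

With batch sizes larger than 1 both calls return the same shape, which is why this kind of bug tends to surface only during single-sample inference.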

… reproducible splits (#902)

Three fixes that directly affect the trustworthiness of research results:

1. regression.py: kl_divergence computation mutated the input arrays (x, x_rec) in-place via clamping and normalization. When multiple metrics were requested (e.g., ["kl_divergence", "mse", "mae"]), mse/mae were computed on the modified arrays, producing incorrect values. Fixed by operating on copies.
2. trainer.py: model.eval() was called inside the per-batch loop in inference(), redundantly setting eval mode on every batch. Moved to before the loop, so it is called once as intended.
3. splitter.py: all split functions used np.random.seed() which mutates the global numpy random state. This causes cross-contamination when multiple splits are called sequentially, making experiments non-reproducible. Replaced all 7 occurrences with np.random.default_rng(seed) which creates an isolated RNG instance. The existing sample_balanced() already used default_rng correctly.
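The first and third fixes above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the code in regression.py or splitter.py: the function bodies, the eps value, and the helper name shuffled_indices are all hypothetical.

```python
import numpy as np

def kl_divergence(x, x_rec, eps=1e-10):
    """KL(x || x_rec) computed without touching the caller's arrays."""
    # np.clip without out= returns a new array, so x and x_rec are not mutated.
    p = np.clip(x, eps, None)
    q = np.clip(x_rec, eps, None)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def mse(x, x_rec):
    return float(np.mean((x - x_rec) ** 2))

x = np.array([0.0, 2.0, 2.0])
x_rec = np.array([1.0, 1.0, 2.0])
x_before = x.copy()

kl = kl_divergence(x, x_rec)
err = mse(x, x_rec)  # still computed on the original, unclamped values

def shuffled_indices(n, seed):
    # default_rng builds an isolated generator; the global NumPy state is untouched.
    rng = np.random.default_rng(seed)
    return rng.permutation(n)

# Draw a reference value from the global RNG, then repeat with a split in between.
np.random.seed(0)
expected_global = np.random.rand()

np.random.seed(0)
first_split = shuffled_indices(10, seed=42)
observed_global = np.random.rand()  # same value as expected_global
second_split = shuffled_indices(10, seed=42)
```

Because each call builds its own generator, the same seed yields the same split regardless of what other code has done to the global random state.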

The GRASP model was completely non-functional in PyHealth 2.0 because it still used the legacy 1.x BaseModel constructor and helper methods that have since been removed (get_label_tokenizer, add_feature_transform_layer, prepare_labels, padding2d/3d).

Changes:
- Rewrite GRASP.__init__ to use the 2.0 pattern (matching ConCare):
  - super().__init__(dataset=dataset) instead of passing feature_keys/label_key/mode
  - EmbeddingModel(dataset, embedding_dim) replaces manual type dispatch
  - self.get_output_size() without arguments
  - Auto-derive feature_keys, label_key, mode from dataset schemas
- Rewrite GRASP.forward to use EmbeddingModel:
  - embedded, masks = self.embedding_model(kwargs, output_mask=True)
  - Labels from kwargs[self.label_key].to(self.device)
  - Eliminates ~60 lines of manual tokenization/padding/embedding
- Remove eliminated parameters: feature_keys, label_key, mode, use_embedding
- Update imports: SampleEHRDataset → SampleDataset, add EmbeddingModel
- Update docstring examples to 2.0 API
- Update __main__ block to use create_sample_dataset
- Add tests/core/test_grasp.py with 8 test cases covering: initialization, forward/backward, embed extraction, GRU/LSTM backbones

GRASPLayer (the algorithm core) is unchanged.

* Fixed repo to be able to run TUEV/TUAB + updated example scripts
* Args need to be passed correctly
* Minor fixes and precomputed STFT logic
* Fix the test files to reflect codebase changes
* Args update
* test script fixes
* dataset path update
* fix contrawr - small change
* divide by 0 error
* Incorporate tfm logic
* Fix label stuff
* tuab fixes
* fix metrics
* aggregate alphas
* Fix splitting and add tfm weights
* fix tfm+tuab
* updates scripts and haoyu splitter
* fix conflict
* Remove weightfiles from tracking and add to .gitignore

Weight files are large binaries distributed separately; untrack all existing .pth files under weightfiles/ and add weightfiles/ to .gitignore so they are excluded from future commits and the PR.

Made-with: Cursor

* feat: add optional dependency groups for graph and NLP extras (#890)

  Add [project.optional-dependencies] to pyproject.toml so users can install domain-specific dependencies via pip extras:

    pip install pyhealth[graph]  # torch-geometric for GraphCare, KG
    pip install pyhealth[nlp]    # editdistance, rouge_score, nltk

  The codebase already uses try/except ImportError with HAS_PYG flags for torch-geometric, and the NLP metrics define their required versions in each scorer class. This change exposes those dependencies through standard Python packaging so pip can resolve them.

  Version pins match the requirements declared in the code:
  - editdistance~=0.8.1 (pyhealth/nlp/metrics.py:356)
  - rouge_score~=0.1.2 (pyhealth/nlp/metrics.py:415)
  - nltk~=3.9.1 (pyhealth/nlp/metrics.py:397)
  - torch-geometric>=2.6.0 (compatible with PyTorch 2.7)

  Closes #890

* fix: move optional-dependencies after scalar fields to fix TOML structure

  Move [project.optional-dependencies] from between dependencies and license (line 49) to after keywords (line 62), before [project.urls]. In TOML, a sub-table header like [project.optional-dependencies] closes the parent [project] table, so placing it before license and keywords caused those fields to be excluded from [project]. This broke CI validation.

  Verified with tomllib that all project fields (name, license, keywords, optional-dependencies, urls) parse correctly under [project].

* init commit
* RNN memory fix
* add example scripts here
* more bug fixes?
* commit to see new changes
* add test cases
* fix basemodel leakage of args
* fixes to tests and examples
* more examples
* reduce unnecessary checks, enable crashing on when a cache is invalid
* fix nested sequence rnn problems
* fixes for the concare and transformer model exploding in memory
* fix concare merge conflict again
* fix for 3D channel for CNN
* update and delete defunct docs
* better loc comparisons and also a bunch of model fixes hopefully
* test case updates to match our bug fixes
* fix instability in calibration tests for CP

tldr; Fixes a variety of dataset loading, run bugs, splits for TUEV/TUAB, adds a good number of performance fixes for Transformer and Concare. We can always iterate on our fixes later.

…#935)

The v2.0 MIMIC3Dataset/MIMIC4Dataset (based on BaseDataset) no longer accepts code_mapping, dev, or refresh_cache parameters. These were part of the legacy BaseEHRDataset API. Update README.rst, example scripts, and leaderboard utilities to use the current v2.0 API.

Note: task file docstrings and pyhealth/datasets/mimicextract.py still reference code_mapping but are left for separate PRs since mimicextract.py has not yet been migrated to v2.0.

Fixes #535

jhnwu3 and others added 14 commits March 13, 2026 15:50

Update/core docs (#889)

b29ad0d

* add new docs
* index
* overview page added
* clean up and fix old details

[Conformal EEG] TUEV/TUAB Compatibility (#894)

2d12b1f

* Fixed repo to be able to run TUEV/TUAB + updated example scripts
* Args need to be passed correctly
* Minor fixes and precomputed STFT logic
* Fix the test files to reflect codebase changes
* Args update

Updated Conformal Test Scripts (#895)

c746b24

* Fixed repo to be able to run TUEV/TUAB + updated example scripts
* Args need to be passed correctly
* Minor fixes and precomputed STFT logic
* Fix the test files to reflect codebase changes
* Args update
* test script fixes

making the PyHealth Research Initiative page way less confusing and d…

4cc526f

…ense (#907) just doc things

add new reference to the top of the pyhealth page for our new project…

f4b65d7

… page so users who join can hopefully find what they're looking for on a page that is easier to navigate and less documentation heavy (#910)

concare fix (#920)

ed56212

Bypassing a PR review, because of speed/reviewer bottleneck reasons.

fix pixi warning and version format for backend (#917)

e857c91

rmumme2 merged commit 503edb3 into deadlywrong:derm_set Apr 8, 2026

rmumme2 merged 14 commits into deadlywrong:derm_set from sunlabuiuc:master


5 participants
