Bringing in latest master branch changes #1
Merged
rmumme2 merged 14 commits into deadlywrong:derm_set on Apr 8, 2026
* add new docs
* index
* overview page added
* clean up and fix old details
* Fixed repo to be able to run TUEV/TUAB + updated example scripts
* Args need to be passed correctly
* Minor fixes and precomputed STFT logic
* Fix the test files to reflect codebase changes
* Args update
* Fixed repo to be able to run TUEV/TUAB + updated example scripts
* Args need to be passed correctly
* Minor fixes and precomputed STFT logic
* Fix the test files to reflect codebase changes
* Args update
* test script fixes
…x device/contiguity issues (#901)

1. Fix bare .squeeze() calls that silently remove the batch dimension when batch_size=1, causing wrong results during single-sample inference:
   - concare.py: .squeeze() → .squeeze(dim=-1) and .squeeze(dim=1)
   - agent.py: .squeeze() → .squeeze(dim=-1) or removed (already 1-D after .sum/.mean)
2. Add weights_only=True to all torch.load() calls for PyTorch 2.6+ compatibility and security (prevents arbitrary code execution via pickle deserialization):
   - trainer.py, biot.py, tfm_tokenizer.py (2 calls), kg_base.py
3. Add .contiguous() before pack_padded_sequence in RNNLayer to prevent cuDNN errors with non-contiguous input tensors (fixes #800)
4. Fix StageNet device mismatch: tensors were created on CPU instead of the input tensor's device, causing crashes during GPU training:
   - torch.zeros/ones(...) → torch.zeros/ones(..., device=device)
   - time == None → time is None (PEP8)
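The batch-dimension problem in item 1 can be reproduced in a few lines. The following is a minimal sketch, not code from concare.py or agent.py: it uses NumPy arrays so it runs without a GPU stack, on the assumption that only the shape behavior matters here (NumPy's `axis=` plays the role of torch's `dim=`). The function names and the (batch, steps, 1) shape are hypothetical.

```python
import numpy as np


def pool_bare(scores):
    # Bare squeeze: drops every size-1 axis, including a batch axis of size 1.
    return scores.squeeze()


def pool_explicit(scores):
    # Explicit axis: drops only the trailing singleton axis.
    return scores.squeeze(axis=-1)


# Hypothetical attention scores shaped (batch, steps, 1).
batch_of_four = np.zeros((4, 5, 1))
single_sample = np.zeros((1, 5, 1))

shape_bare_batch = pool_bare(batch_of_four).shape            # batch axis survives
shape_bare_single = pool_bare(single_sample).shape           # batch axis is lost
shape_explicit_single = pool_explicit(single_sample).shape   # batch axis survives

print(shape_bare_batch, shape_bare_single, shape_explicit_single)
```

With a batch of four the two variants agree, which is why the bug only shows up during single-sample inference.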
… reproducible splits (#902)

Three fixes that directly affect the trustworthiness of research results:

1. regression.py: kl_divergence computation mutated the input arrays (x, x_rec) in-place via clamping and normalization. When multiple metrics were requested (e.g., ["kl_divergence", "mse", "mae"]), mse/mae were computed on the modified arrays, producing incorrect values. Fixed by operating on copies.
2. trainer.py: model.eval() was called inside the per-batch loop in inference(), redundantly setting eval mode on every batch. Moved to before the loop, so it is called once as intended.
3. splitter.py: all split functions used np.random.seed(), which mutates the global numpy random state. This causes cross-contamination when multiple splits are called sequentially, making experiments non-reproducible. Replaced all 7 occurrences with np.random.default_rng(seed), which creates an isolated RNG instance. The existing sample_balanced() already used default_rng correctly.
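Fixes 1 and 3 share a root cause: hidden side effects on state the caller still depends on. The sketch below illustrates both patterns under stated assumptions. None of these functions are the actual regression.py or splitter.py code; `kl_inplace`, `kl_on_copies`, `split_global`, and `split_isolated` are hypothetical names, and the clamping constant is made up.

```python
import numpy as np


def kl_inplace(x, x_rec, eps=1e-8):
    # Buggy pattern: out= and /= overwrite the caller's arrays.
    np.clip(x, eps, None, out=x)
    np.clip(x_rec, eps, None, out=x_rec)
    x /= x.sum()
    x_rec /= x_rec.sum()
    return float(np.sum(x * np.log(x / x_rec)))


def kl_on_copies(x, x_rec, eps=1e-8):
    # Fixed pattern: np.clip without out= returns new arrays.
    p = np.clip(x, eps, None)
    q = np.clip(x_rec, eps, None)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def mse(x, x_rec):
    return float(np.mean((x - x_rec) ** 2))


def split_global(n, seed):
    # Legacy pattern: reseeds NumPy's process-wide RNG as a side effect.
    np.random.seed(seed)
    return np.random.permutation(n)


def split_isolated(n, seed):
    # Fixed pattern: a private Generator; the global state is never touched.
    rng = np.random.default_rng(seed)
    return rng.permutation(n)


x = np.array([1.0, 2.0, 3.0])
x_rec = np.array([1.5, 2.5, 2.0])
mse_before = mse(x, x_rec)

kl_on_copies(x, x_rec)
mse_after_copy_kl = mse(x, x_rec)        # unchanged: inputs were not mutated

a, b = x.copy(), x_rec.copy()
kl_inplace(a, b)
mse_after_inplace_kl = mse(a, b)         # differs: a and b were normalized

np.random.seed(0)
reference_draw = np.random.rand()

np.random.seed(0)
split_isolated(10, 42)
draw_after_isolated = np.random.rand()   # global stream undisturbed

np.random.seed(0)
split_global(10, 42)
draw_after_global = np.random.rand()     # global stream was reseeded
```

The isolated variant stays deterministic per seed while leaving any other code that draws from the global stream unaffected.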
The GRASP model was completely non-functional in PyHealth 2.0 because it still used the legacy 1.x BaseModel constructor and relied on helper methods that have since been removed (get_label_tokenizer, add_feature_transform_layer, prepare_labels, padding2d/3d).

Changes:
- Rewrite GRASP.__init__ to use the 2.0 pattern (matching ConCare):
  - super().__init__(dataset=dataset) instead of passing feature_keys/label_key/mode
  - EmbeddingModel(dataset, embedding_dim) replaces manual type dispatch
  - self.get_output_size() without arguments
  - Auto-derive feature_keys, label_key, mode from dataset schemas
- Rewrite GRASP.forward to use EmbeddingModel:
  - embedded, masks = self.embedding_model(kwargs, output_mask=True)
  - Labels from kwargs[self.label_key].to(self.device)
  - Eliminates ~60 lines of manual tokenization/padding/embedding
- Remove eliminated parameters: feature_keys, label_key, mode, use_embedding
- Update imports: SampleEHRDataset → SampleDataset, add EmbeddingModel
- Update docstring examples to 2.0 API
- Update __main__ block to use create_sample_dataset
- Add tests/core/test_grasp.py with 8 test cases covering: initialization, forward/backward, embed extraction, GRU/LSTM backbones

GRASPLayer (the algorithm core) is unchanged.
…ense (#907) just doc things
… page so that users who join can, hopefully, find what they're looking for on a page that is easier to navigate and less documentation-heavy (#910)
* Fixed repo to be able to run TUEV/TUAB + updated example scripts
* Args need to be passed correctly
* Minor fixes and precomputed STFT logic
* Fix the test files to reflect codebase changes
* Args update
* test script fixes
* dataset path update
* fix contrawr - small change
* divide by 0 error
* Incorporate tfm logic
* Fix label stuff
* tuab fixes
* fix metrics
* aggregate alphas
* Fix splitting and add tfm weights
* fix tfm+tuab
* updates scripts and haoyu splitter
* fix conflict
* Remove weightfiles from tracking and add to .gitignore

  Weight files are large binaries distributed separately; untrack all existing .pth files under weightfiles/ and add weightfiles/ to .gitignore so they are excluded from future commits and the PR.

  Made-with: Cursor
* feat: add optional dependency groups for graph and NLP extras (#890)

  Add [project.optional-dependencies] to pyproject.toml so users can install domain-specific dependencies via pip extras:

    pip install pyhealth[graph]   # torch-geometric for GraphCare, KG
    pip install pyhealth[nlp]     # editdistance, rouge_score, nltk

  The codebase already uses try/except ImportError with HAS_PYG flags for torch-geometric, and the NLP metrics define their required versions in each scorer class. This change exposes those dependencies through standard Python packaging so pip can resolve them.

  Version pins match the requirements declared in the code:
  - editdistance~=0.8.1 (pyhealth/nlp/metrics.py:356)
  - rouge_score~=0.1.2 (pyhealth/nlp/metrics.py:415)
  - nltk~=3.9.1 (pyhealth/nlp/metrics.py:397)
  - torch-geometric>=2.6.0 (compatible with PyTorch 2.7)

  Closes #890

* fix: move optional-dependencies after scalar fields to fix TOML structure

  Move [project.optional-dependencies] from between dependencies and license (line 49) to after keywords (line 62), before [project.urls]. In TOML, a sub-table header like [project.optional-dependencies] closes the parent [project] table, so placing it before license and keywords caused those fields to be excluded from [project]. This broke CI validation.

  Verified with tomllib that all project fields (name, license, keywords, optional-dependencies, urls) parse correctly under [project].
* init commit
* RNN memory fix
* add example scripts here
* more bug fixes?
* commit to see new changes
* add test cases
* fix basemodel leakage of args
* fixes to tests and examples
* more examples
* reduce unnecessary checks, enable crashing on when a cache is invalid
* fix nested sequence rnn problems
* fixes for the concare and transformer model exploding in memory
* fix concare merge conflict again
* fix for 3D channel for CNN
* update and delete defunct docs
* better loc comparisons and also a bunch of model fixes hopefully
* test case updates to match our bug fixes
* fix instability in calibration tests for CP

tldr; Fixes a variety of dataset loading, run bugs, splits for TUEV/TUAB, adds a good number of performance fixes for Transformer and Concare. We can always iterate on our fixes later.
Bypassing a PR review, because of speed/reviewer bottleneck reasons.
…#935) The v2.0 MIMIC3Dataset/MIMIC4Dataset (based on BaseDataset) no longer accepts code_mapping, dev, or refresh_cache parameters. These were part of the legacy BaseEHRDataset API. Update README.rst, example scripts, and leaderboard utilities to use the current v2.0 API. Note: task file docstrings and pyhealth/datasets/mimicextract.py still reference code_mapping but are left for separate PRs since mimicextract.py has not yet been migrated to v2.0. Fixes #535
No description provided.