
v1.4.3: implemented sample packing for cehrbert (#104)


github-actions released this 03 May 15:18
f2df8ce
* implemented sample packing for cehrbert

* clamped concept_values between -10 and 10; fixed a bug when generating time_embeddings

* added compute_cehrbert_features.py for extracting features

* removed the sort patient sequence mapping

* updated sample_packing_sampler.py

* updated the index_date and age_at_index data types in hf_dataset_collator.py

* Synchronized bert flash attention layer with the eager implementation

* removed torch_dtype when loading the model

* put back upad_input in src/cehrbert/models/hf_models/hf_cehrbert.py

* removed squeeze in compute_cehrbert_features

* switched to CehrBertForPreTraining for computing features

* added an option to get the features by averaging over the entire sequence for each sample

* added train_with_cehrbert_features

* added device to start_indices and end_indices

* switched to a vectorized implementation

* corrected the logic for extracting CLS embeddings in sample packing

* implemented sample packing for cehrbert sample packing

* fixed the integration test

* adjusted finetuning model for sample packing

* do not use sample packing for running predictions

* updated transformers version

* updated the integration tests

* fixed a data loading bug in streaming
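Several items above (the packing sampler, the `start_indices`/`end_indices` change, CLS extraction under sample packing, and averaging features over each sample) fit together in one flow. The following is a minimal sketch of that flow, not cehrbert's implementation. It assumes NumPy arrays stand in for the torch tensors the package uses. It also assumes greedy first-fit-decreasing as the packing strategy, which is not necessarily what `sample_packing_sampler.py` does. Finally, it assumes each sample's first token is its CLS token. The function names `pack_samples`, `segment_boundaries`, and `packed_features` are hypothetical.

```python
import numpy as np


def pack_samples(lengths, max_tokens):
    """Greedy first-fit-decreasing: group sample indices into bins of <= max_tokens tokens.

    Hypothetical helper; the real sampler may use a different strategy.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins, loads = [], []  # sample indices per bin, and tokens used per bin
    for i in order:
        n = lengths[i]
        if n > max_tokens:
            raise ValueError(f"sample {i} has {n} tokens, more than max_tokens={max_tokens}")
        for b, load in enumerate(loads):
            if load + n <= max_tokens:  # first bin with enough room
                bins[b].append(i)
                loads[b] += n
                break
        else:  # no existing bin fits, so open a new one
            bins.append([i])
            loads.append(n)
    return bins


def segment_boundaries(lengths):
    """Start (inclusive) and end (exclusive) token offsets of each sample in one packed row."""
    lengths = np.asarray(lengths, dtype=np.int64)
    if np.any(lengths <= 0):
        raise ValueError("every packed sample must have at least one token")
    end_indices = np.cumsum(lengths)
    start_indices = end_indices - lengths
    return start_indices, end_indices


def packed_features(hidden, lengths):
    """Per-sample CLS embeddings and mean-pooled embeddings from one packed row.

    hidden: (total_tokens, hidden_dim) hidden states of a single packed sequence;
    any trailing padding after the last sample is ignored.
    """
    start_indices, end_indices = segment_boundaries(lengths)
    # CLS embedding: hidden state at each sample's first position (vectorized gather).
    cls = hidden[start_indices]
    # Mean pooling: segment sums via reduceat over the unpadded tokens, then divide by length.
    sums = np.add.reduceat(hidden[: end_indices[-1]], start_indices, axis=0)
    means = sums / (end_indices - start_indices)[:, None]
    return cls, means


if __name__ == "__main__":
    print(pack_samples([5, 3, 7, 2, 4], max_tokens=8))
    hidden = np.arange(20, dtype=np.float64).reshape(10, 2)  # 8 real tokens + 2 padding
    cls, means = packed_features(hidden, [5, 3])
    print(cls)
    print(means)
```

Here `pack_samples([5, 3, 7, 2, 4], max_tokens=8)` yields `[[2], [0, 1], [4, 3]]`, so five samples fit in three rows instead of five. Computing boundaries from cumulative lengths keeps the gather and the pooling fully vectorized. `end_indices` with a leading 0 prepended is the cumulative-lengths (`cu_seqlens`) layout that variable-length flash attention kernels take to keep packed samples from attending to one another.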