New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial refactoring of tensor_from_file #224
Conversation
@mklarqvist I like this format! Seems very clean. How will the |
Would we structure partners tmaps as follows?
|
Looks good. I don't think we need to formally impose any structure. As long as there is some structure there. |
I'm in favor of a dirty solution for now: invoking the string literal using This seems to work fine: def test_tm_resolve(test: str):
if isinstance(test,str) == False:
raise TypeError(f"Input name must be a string. Given: {type(test)}")
if len(test) == 0:
raise ValueError(f"Input name cannot be empty.")
if '.'.join(test.split('.')[0:2]) != 'ml4cvd.tensormap':
raise ValueError(f"TensorMaps must reside in the path 'ml4cvd.tensormap.*'. Given: {test}")
import importlib
try:
i = importlib.import_module('.'.join(test.split('.')[:-1]))
except ModuleNotFoundError:
raise ModuleNotFoundError(f"Could not resolve library {'.'.join(test.split('.')[:-1])} for target tensormap {test}")
try:
tm = getattr(i,test.split('.')[-1])
except AttributeError:
raise AttributeError(f"Module {'.'.join(test.split('.')[:-1])} has no TensorMap called {test.split('.')[-1]}")
return tm Example 1
Example 2
Example 3
What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this big refactor. I think we can flatten the hierarchy a bit especially if the whole path is going to need to be specified on the command line. Before merging the python introspection to import from strings should be integrated into arguments.py _get_tmap
function.
let's just prepend 'ml4cvd.tensormap' rather than requiring it in: |
Also should add a type check ensuring the import is a |
This makes sense to me. I agree with Sam that we should add a type check ensuring the import is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting rid of the messy TMAPS
dictionary is a great idea, but a big change! Before merging, make sure that all the tests pass and add new ones for the new command line TensorMap
interface. Here is some very brief pytest
unit testing info https://github.com/broadinstitute/ml/blob/d41308c01649a964627b7fd2037e6981b43f2786/ml4cvd/DATA_MODELING_TESTS.md#unit-tests
I prefer this way
because then we don't force |
Everyone happy with this? def tensormap_lookup(module_string: str, prefix: str = None):
if isinstance(module_string, str) == False:
raise TypeError(f"Input name must be a string. Given: {type(module_string)}")
if len(module_string) == 0:
raise ValueError(f"Input name cannot be empty.")
path_string = module_string
if prefix:
if isinstance(prefix, str) == False:
raise TypeError(f"Prefix must be a string. Given: {type(prefix)}")
if len(prefix) == 0:
raise ValueError(f"Prefix cannot be set to an emtpy string.")
path_string = '.'.join([prefix,module_string])
else:
if '.'.join(path_string.split('.')[0:2]) != 'ml4cvd.tensormap':
raise ValueError(f"TensorMaps must reside in the path 'ml4cvd.tensormap.*'. Given: {module_string}")
import importlib
try:
i = importlib.import_module('.'.join(path_string.split('.')[:-1]))
except ModuleNotFoundError:
raise ModuleNotFoundError(f"Could not resolve library {'.'.join(path_string.split('.')[:-1])} for target tensormap {module_string}")
try:
tm = getattr(i,path_string.split('.')[-1])
except AttributeError:
raise AttributeError(f"Module {'.'.join(path_string.split('.')[:-1])} has no TensorMap called {path_string.split('.')[-1]}")
from ml4cvd.TensorMap import TensorMap
if isinstance(tm, TensorMap) == False:
raise TypeError(f"Target value is not a TensorMap object. Returned: {type(tm)}")
return tm |
I like this except I don't see the reason to put the import of |
* revise tmaps to get all partners ecgs * remove duplicate import * do not propagate error to all ecgs in hd5 * update cardiac surgery tmaps * _md tmaps * tweaks to partners tensorize and ingest scripts (#210) * dynamic tmap shape in tmap constructor * modify label tmaps * explore with tmap shape None * not empty string validator * defer str dtype * better tensor dtype str * slightly optimize by assuming dtype str * parallel explore with multiprocess * fix counter using unsightly global variable * fix datetime, fix prt axes * move explore to explorations * dont fail missing on more tmaps * fix voltage reshample shape * move date constants and clean up imports * dont normalize every loop * add 'newest' tmaps for single ecgs * tmap shape when time_series_limit is set * infer tmap shape dynamic if time_series_limit not None * comments explaining dynamic shaping
* quick fix for _newest * clearer ternary * fix typo
…ect TMaps (#223) * calls build_cardiac_surgery_tensor_maps - closes #222 * adds path to CARDIAC_SURGERY_OUTCOMES_CSV * address Paolos input on PR #223 * backup file in case I mess up the merge conflicts * syncs defines with master * syncs recipes with master -- moves explore to explorations * syncs with master * syncs with master * syncs with master * syncs with master * resolves merge conflicts, imports decompress * restores explorations from master prior to black formatting * orders imports by length in arguments.py
* tensor maps for partners with dynamic ECGs
Implements a new command line argument --tsv_style genetics (not active by default) that makes the TSVs output byinfer, infer_hidden and explore compatible with genetic association software (e.g., Plink and BOLT).
ecg by time
* clinical and full view ECG plotting * add msttcorefonts to docker image * tmaps * import order, top panel func name, remove constrained layout warning * Pre-requirements not to have errors * Better comment for pre_requirements * period Co-authored-by: Paolo Di Achille <paolo.diachille@gmail.com>
* stats q
* xref base * rename clean cols, escape col names * source -> tensors, add help to args * plot changes, require time range with time * add fpath to xref output, if found * cleanup logic, remove pandasql * advanced time windowing * cleanup
Lots of changes to model factory! List here https://docs.google.com/document/d/15sZsTnDVOiBPlW16FX36IzEx-FS7p0YxQ7alY_-npjE/edit?usp=sharing
* rewrite get_test_train_valid_paths * new train valid test paths with balance csv * bug squashing and tests * overlapping csv tests * rename tot->ratio_sum/use_ratios * cleanup comments * fix test case where train/valid/test < sample, fix ratio rescaling * more explicit param docs * todo for balance csv tests * goodbye test_modulo, require at least 2 args * space between functions * field for partners ecg tensorizer * refactor arg handling * simplify to override ratios and document * and -> or
* sts tmaps exact length * fix time selection and add new sts tmaps * use normalizer
* categorical voltage tmaps * time_series_set * revert time_series_set * None check * _tensors_to_df overhaul * comment * revisions and readability
1. The function _cardiac_surgery_str2date no longer hard-codes the date formatting. Rather, it is set in and imported from defines.py, and used as a default. It can be easily overriden when calling this function. 2. all str2date functions are now grouped together to keep the script organized.
CARCIAC --> CARDIAC
* wip rehaul tensorizer * wip tensorizer rehaul * rehaul tensor writer, get metadata in tensorize * bug fixes * bug fixes and correctness regarding missingness * new voltage qc tmaps * default sampling frequency tmaps calculate based on length * multiprocess -> multiprocessing
Also * Remove obsolete instructions now that ml4cvd has the needed packages and permissions. * Use newer hd5 for ECGs. * Switch to local paths when on a ML4CVD VM.
rebased |
File changes look a bit nasty because of rebasing |
Can you try |
This is an incomplete pull request to start a conversation regarding the refactoring of the
TMAP
dictionary into separate submodules (issue #143 ). For example,TMAPS['ecg_rest_raw']
is now defined asml4cvd.tensormap.ukbb.ecg.ecg.ecg_rest_raw
. This introduces more meaningful semantics and imports libraries only required for the target application.In this initial commit, only
TMAP
definitions and their associated functions intensor_from_file
are covered.The current proposed structure looks like this: