
Contributor

Nic-Ma commented Jun 9, 2021

Fixes #2310.

Description

This PR adds the load_csv_datalist API to load extra information from CSV files.
Users can combine this datalist with the image, label, and other items, and pass the result to a Dataset.
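A minimal sketch of the idea, assuming pandas and a CSV whose rows are in the same order as an existing image/label datalist. The helper names read_csv_rows and merge_datalists are hypothetical illustrations, not the signature of MONAI's load_csv_datalist.

```python
import io
from typing import Dict, List, Optional, Sequence

import pandas as pd


def read_csv_rows(src, columns: Optional[Sequence[str]] = None) -> List[Dict]:
    """Hypothetical helper: parse a CSV with pandas and return one dict per row."""
    df = pd.read_csv(src)
    if columns is not None:
        df = df[list(columns)]  # keep only the requested columns
    return df.to_dict(orient="records")


def merge_datalists(base: List[Dict], extra: List[Dict]) -> List[Dict]:
    """Combine two datalists item by item; assumes both lists are in the same order."""
    if len(base) != len(extra):
        raise ValueError(f"length mismatch: {len(base)} vs {len(extra)}")
    return [{**b, **e} for b, e in zip(base, extra)]


# toy inputs: the CSV rows are assumed to line up with the image/label items
csv_text = "subject_id,age,sex\ns0,42,F\ns1,57,M\n"
images = [
    {"image": "img0.nii.gz", "label": "seg0.nii.gz"},
    {"image": "img1.nii.gz", "label": "seg1.nii.gz"},
]
datalist = merge_datalists(images, read_csv_rows(io.StringIO(csv_text), columns=["age", "sex"]))
```

Merging by position is the simplest choice; joining on a key column such as subject_id would be more robust. The resulting list of dicts has the shape a dictionary-based dataset expects, e.g. monai.data.Dataset(data=datalist, transform=...).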

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Contributor Author

Nic-Ma commented Jun 9, 2021

Thanks to Seyed for the example code and use cases. I completely rewrote my previous local code to use pandas.
I will complete this PR as soon as possible.

Thanks.

Signed-off-by: Nic Ma <nma@nvidia.com>
Nic-Ma force-pushed the 2310-csv-datalist branch 2 times, most recently from b861d79 to 61208cf June 11, 2021 12:24
Signed-off-by: Nic Ma <nma@nvidia.com>
Nic-Ma force-pushed the 2310-csv-datalist branch from 61208cf to 58a0fa7 June 11, 2021 12:26
Nic-Ma added 3 commits June 11, 2021 20:27
Signed-off-by: Nic Ma <nma@nvidia.com>
Signed-off-by: Nic Ma <nma@nvidia.com>
Contributor Author

Nic-Ma commented Jun 11, 2021

/black

monai-bot and others added 3 commits June 11, 2021 14:26
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
Signed-off-by: Nic Ma <nma@nvidia.com>
Signed-off-by: Nic Ma <nma@nvidia.com>
Contributor Author

Nic-Ma commented Jun 11, 2021

/black

Nic-Ma marked this pull request as ready for review June 11, 2021 15:50
Nic-Ma changed the title from "[WIP] 2310 Add CSV datalist" to "2310 Add load_csv_datalist utility API" Jun 11, 2021
Nic-Ma requested review from ericspod, rijobro and wyli June 11, 2021 15:51
Contributor Author

Nic-Ma commented Jun 11, 2021

Hi @wyli @ericspod,

This PR is ready for review. Once it is merged, I will also develop a tutorial showing how to use it together with load_decathlon_datalist().

Thanks.

Contributor

wyli commented Jun 14, 2021

I think the CSV reading should be implemented with the MONAI Dataset API, with an option to partially load large CSV files, e.g. https://discuss.pytorch.org/t/how-to-use-dataset-larger-than-memory/37785

Contributor Author

Nic-Ma commented Jun 14, 2021

Hi @wyli,

Thanks for your suggestion.
I will investigate how to build a Dataset on top of this CSV utility and support partial loading.

Thanks.

Contributor Author

Nic-Ma commented Jun 16, 2021

Hi @wyli,

I want to double-check your suggestion: by partially loading a large CSV in a dataset, do you mean loading only chunks of the CSV for training, or still using the whole dataset for training but not loading any data beforehand, i.e. opening the CSV file each time to read one row based on the dataset's shuffled index?
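The second option could look roughly like this sketch: a hypothetical map-style reader (the class name LazyCSVRows is made up) that records the byte offset of every row once, then seeks to and parses a single row on each __getitem__. It only implements __len__ and __getitem__, the protocol a PyTorch-style map dataset needs, and it assumes UTF-8 text with one record per physical line (no quoted fields containing newlines).

```python
import csv
import os
import tempfile
from typing import Dict, List


class LazyCSVRows:
    """Hypothetical map-style reader: index row offsets once, then read one row per lookup."""

    def __init__(self, path: str):
        self.path = path
        self.offsets: List[int] = []
        with open(path, "rb") as f:
            # the first line is the header; parse it with the csv module to handle quoting
            self.header = next(csv.reader([f.readline().decode("utf-8")]))
            while True:
                pos = f.tell()
                line = f.readline()
                if not line:  # end of file
                    break
                if line.strip():  # skip blank lines
                    self.offsets.append(pos)

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, index: int) -> Dict[str, str]:
        # reopen per call so no file handle is shared across DataLoader workers
        with open(self.path, "rb") as f:
            f.seek(self.offsets[index])
            line = f.readline().decode("utf-8")
        values = next(csv.reader([line]))
        return dict(zip(self.header, values))  # values stay strings; cast downstream if needed


with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("subject_id,age\ns0,42\ns1,57\ns2,33\n")
rows = LazyCSVRows(tmp.name)
third = rows[2]
print(len(rows), third)  # 3 {'subject_id': 's2', 'age': '33'}
os.remove(tmp.name)
```

The trade-off is one small disk read per item instead of holding the whole table in memory; the offset index itself still costs one integer per row.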

Thanks.

Contributor Author

Nic-Ma commented Jun 16, 2021

pandas.read_csv() provides two ways to read a CSV file:

  1. By default, it reads the whole content at once.
  2. If chunksize is set, it returns an iterator over chunks.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
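A small illustration of the two read_csv modes on an in-memory CSV. The iter_records generator is a hypothetical helper showing how the chunked mode could back a streaming (iterable-style) dataset that never holds the whole file in memory.

```python
import io

import pandas as pd

csv_text = "subject_id,age\n" + "".join(f"s{i},{20 + i}\n" for i in range(5))

# 1. default: the whole file is parsed into one DataFrame held in memory
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (5, 2)

# 2. chunksize: read_csv returns an iterator yielding DataFrames of at most `chunksize` rows
chunk_lengths = [len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)]
print(chunk_lengths)  # [2, 2, 1]


def iter_records(src, chunksize: int):
    """Hypothetical streaming helper: yield one dict per row, keeping only one chunk in memory."""
    for chunk in pd.read_csv(src, chunksize=chunksize):
        yield from chunk.to_dict(orient="records")


ages = [r["age"] for r in iter_records(io.StringIO(csv_text), chunksize=2)]
```

Note that the chunked mode only supports sequential access, so global shuffling across the whole file is not possible without an extra buffer or index.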

@ericspod @rijobro @wyli What's the typical use case for a very big CSV file during training?

Thanks in advance.

Signed-off-by: Nic Ma <nma@nvidia.com>
Contributor Author

Nic-Ma commented Jun 21, 2021

/black

Contributor Author

Nic-Ma commented Jun 21, 2021

I couldn't figure out why the Python 3.6 min_tests failed. @ericspod @wyli @rijobro, do any of you Python experts have an idea?

Thanks in advance.

@ericspod
Member

I'd suggest changing the assert to include an error message that states the contents of err_mod, so we can see what's going on, rather than printing err_mod every time. It looks like MONAI isn't being loaded at all; it would help to detect this failure, retry with import monai, and use the resulting exception to see what's going on.
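A rough sketch of that suggestion. The helper names find_import_errors and root_cause, and the module list, are hypothetical; the actual min_tests script may collect err_mod differently. The useful property is that an assert's message expression is only evaluated when the assertion fails, so the detailed report costs nothing on success.

```python
import importlib
import traceback
from typing import List, Tuple


def find_import_errors(module_names: List[str]) -> List[Tuple[str, str]]:
    """Try importing each module and collect (name, error) pairs instead of printing them."""
    err_mod = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as e:  # any import-time failure is recorded, not raised
            err_mod.append((name, f"{type(e).__name__}: {e}"))
    return err_mod


def root_cause(package: str) -> str:
    """Import the top-level package directly; return its full traceback, or '' on success."""
    try:
        importlib.import_module(package)
    except Exception:
        return traceback.format_exc()
    return ""


# placeholder module list; a real test would list the submodules under test
err_mod = find_import_errors(["json", "os.path"])
# the f-string (including the re-import of "monai") is only evaluated if the assert fails
assert not err_mod, f"{len(err_mod)} module(s) failed to import: {err_mod}\n{root_cause('monai')}"
```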

Signed-off-by: Nic Ma <nma@nvidia.com>
Nic-Ma force-pushed the 2310-csv-datalist branch from 5dd3e44 to 48d4ef7 June 21, 2021 13:26
Contributor Author

Nic-Ma commented Jun 21, 2021

/black

monai-bot and others added 2 commits June 21, 2021 13:30
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
Signed-off-by: Nic Ma <nma@nvidia.com>
Contributor Author

Nic-Ma commented Jun 21, 2021

/black

Signed-off-by: Nic Ma <nma@nvidia.com>
wyli self-assigned this Jun 21, 2021
Contributor Author

Nic-Ma commented Jun 21, 2021

Hi @ericspod,

Thanks for your suggestions. I printed out the error message locally and solved the issue.

@wyli The GPU tests failed with the following error:


Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/__w/MONAI/MONAI/tests/test_gmm.py", line 323, in test_cuda
    gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count)
  File "/__w/MONAI/MONAI/monai/networks/layers/gmm.py", line 42, in __init__
    "gmm", {"CHANNEL_COUNT": channel_count, "MIXTURE_COUNT": mixture_count, "MIXTURE_SIZE": mixture_size}
  File "/__w/MONAI/MONAI/monai/_extensions/loader.py", line 91, in load_module
    verbose=verbose_build,
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/__w/MONAI/MONAI/monai/_extensions/loader.py", line 37, in timeout
    raise TimeoutError(message)
TimeoutError: Build appears to be blocked. Is there a stopped process building the same extension?

Should I wait a while and try again?

Thanks.

Signed-off-by: Nic Ma <nma@nvidia.com>
Contributor Author

Nic-Ma commented Jun 21, 2021

/black

Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
Contributor

wyli commented Jun 21, 2021

Looks like multiple ninja builds sharing the same build cache are still causing some issues. Any idea, @charliebudd?

wyli mentioned this pull request Jun 21, 2021
wyli reopened this Jun 21, 2021
Contributor

wyli left a comment


Thanks. Please add some basic support for missing values; we can do another iteration to update the modules.
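One way basic missing-value support could look with pandas, as a sketch: read_csv already maps empty fields and markers such as "NA" to NaN, na_values adds extra markers, and fillna replaces them with per-column defaults. The -1 sentinel and the defaults dict (mean age, "unknown" sex) are hypothetical choices for illustration.

```python
import io

import pandas as pd

csv_text = "subject_id,age,sex\ns0,42,F\ns1,,M\ns2,NA,\ns3,-1,F\n"

# empty fields and "NA" are treated as missing by default; na_values adds "-1" as an extra marker
df = pd.read_csv(io.StringIO(csv_text), na_values=["-1"])

# hypothetical per-column defaults used to fill the gaps
defaults = {"age": df["age"].mean(), "sex": "unknown"}
records = df.fillna(value=defaults).to_dict(orient="records")
```

Whether to fill, drop incomplete rows (df.dropna()), or keep NaN for a later transform to handle is a design decision the thread leaves for a follow-up iteration.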

Copy link
Contributor Author

Nic-Ma commented Jun 22, 2021

/black

Nic-Ma enabled auto-merge (squash) June 22, 2021 07:11
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
Nic-Ma merged commit 075bccd into Project-MONAI:dev Jun 22, 2021
Nic-Ma deleted the 2310-csv-datalist branch July 2, 2021 23:36


Development

Successfully merging this pull request may close these issues.

Provide a basic function to load datalist from CSV file (25/June)

4 participants