Dataset #7

Open · sungggat opened this issue Apr 3, 2023 · 8 comments

sungggat commented Apr 3, 2023

How can I get the CINC2021 dataset? How do I download the dataset from the URL you provided in the benchmarks? I could not find prepare_dataset.py here, though I did find it in the original repo.

wenh06 (Collaborator) commented Apr 4, 2023

Just call the download method. Of course, you may also download the zip files from Google Cloud with other tools and uncompress them manually. The prepare_dataset function in the original repo existed because I had to keep the files in specific subfolders to maintain the paths. The _ls_rec method has since been updated so that the paths are maintained in a pandas DataFrame, which makes the file moving in prepare_dataset unnecessary; it has therefore been removed.
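
For reference, a minimal sketch of the download call; the reader class name and constructor argument are inferred from this thread and torch_ecg's database readers, so treat them as assumptions rather than the exact API:

```python
# Minimal sketch, assuming the CINC2021 database reader exported from
# torch_ecg.databases and its download method (inferred from this thread)
from torch_ecg.databases import CINC2021

reader = CINC2021(db_dir="data/CINC2021")  # db_dir: where the data should live
reader.download()  # fetch the challenge zip files and uncompress them into db_dir
```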

sungggat (Author) commented Apr 11, 2023

I downloaded the CINC 2021 dataset from https://physionet.org/content/challenge-2021/#files . I want to run trainer.py from benchmarks/cinc2021. I also added ds_train and ds_val:

```python
TrainCfg.db_dir = 'data/CINC2021/physionet.org/files/challenge-2021/1.0.3/training/'

ds_train = CINC2021(TrainCfg, training=True, lazy=True)
ds_val = CINC2021(TrainCfg, training=False, lazy=True)
```

I am getting the error below:

```
  File "trainer.py", line 423, in <module>
    ds_train = CINC2021(TrainCfg, training=True, lazy=True)
  File "/workspace/torch_ecg/benchmarks/train_crnn_cinc2021/dataset.py", line 101, in __init__
    self.config.train_ratio, force_recompute=False
  File "/workspace/torch_ecg/benchmarks/train_crnn_cinc2021/dataset.py", line 306, in _train_test_split
    self.reader.all_records[t], dynamic_ncols=True, mininterval=1.0
TypeError: len() takes no keyword arguments
```

wenh06 (Collaborator) commented Apr 11, 2023

It's a typo in this file, which probably crept in during a copy-paste (from torch_ecg/databases/datasets/cinc2021/cinc2021_dataset.py). The closing bracket of the len call was missing and was added in the wrong place (perhaps by Copilot?). It is now corrected in 20203ca.
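
For readers hitting the same TypeError, here is a hedged illustration of the misplaced bracket, reconstructed from the traceback above rather than copied from the repo:

```python
# The closing bracket of len() was misplaced, so tqdm's keyword arguments
# ended up inside the len() call (a reconstruction, not the repo's exact lines)
from tqdm.auto import tqdm

records = ["A0001", "A0002", "A0003"]

# broken: raises TypeError: len() takes no keyword arguments
# n = len(records, dynamic_ncols=True, mininterval=1.0)

# fixed: close len() first, then pass the keyword arguments to tqdm
for rec in tqdm(records, total=len(records), dynamic_ncols=True, mininterval=1.0):
    pass  # process each record
```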

AK-mehr commented Mar 25, 2024

Hi, I'm trying to run trainer.py for train_hybrid_cpsc2020. I have downloaded the CPSC 2020 dataset and specified the data path inside cfg.py like this:

```python
BaseCfg.db_dir = 'D:/AUT/Data_Lab/Implementation/TinyML/data/TrainingSet/'
```

TrainingSet contains two subfolders, namely data and ref, each of which holds 10 .mat files, but I come across this error whenever I run trainer.py:

File "C:\Users\AK\miniconda3\envs\cpsc\Lib\site-packages\torch\utils\data\dataloader.py", line 350, in init
sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AK\miniconda3\envs\cpsc\Lib\site-packages\torch\utils\data\sampler.py", line 143, in init
raise ValueError(f"num_samples should be a positive integer value, but got num_samples={self.num_samples}")
ValueError: num_samples should be a positive integer value, but got num_samples=0
Any advice on how I can fix this?

wenh06 (Collaborator) commented Mar 26, 2024

It seems that the data reader did not find the recording files. The CPSC2020 data reader searches for the recordings and annotation files using the following method:

```python
    def _ls_rec(self) -> None:
        """Find all records in the database directory
        and store them (path, metadata, etc.) in some private attributes.
        """
        self._df_records = pd.DataFrame()
        n_records = 10
        all_records = [f"A{i:02d}" for i in range(1, 1 + n_records)]
        self._df_records["path"] = [path for path in self.db_dir.rglob(f"*.{self.rec_ext}") if path.stem in all_records]
        self._df_records["record"] = self._df_records["path"].apply(lambda x: x.stem)
        self._df_records.set_index("record", inplace=True)

        all_annotations = [f"R{i:02d}" for i in range(1, 1 + n_records)]
        df_ann = pd.DataFrame()
        df_ann["ann_path"] = [path for path in self.db_dir.rglob(f"*.{self.ann_ext}") if path.stem in all_annotations]
        df_ann["record"] = df_ann["ann_path"].apply(lambda x: x.stem.replace("R", "A"))
        df_ann.set_index("record", inplace=True)
        # take the intersection by the index of `df_ann` and `self._df_records`
        self._df_records = self._df_records.join(df_ann, how="inner")

        if len(self._df_records) > 0:
            if self._subsample is not None:
                size = min(
                    len(self._df_records),
                    max(1, int(round(self._subsample * len(self._df_records)))),
                )
                self._df_records = self._df_records.sample(n=size, random_state=DEFAULTS.SEED, replace=False)

        self._all_records = self._df_records.index.tolist()
        self._all_annotations = self._df_records["ann_path"].apply(lambda x: x.stem).tolist()
```

In theory, you can pass any parent directory of the data as db_dir, since pathlib.Path.rglob searches recursively.
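
A quick way to verify this (the path below is the one from your comment; adjust as needed):

```python
# rglob searches recursively, so any ancestor of the data/ and ref/
# subfolders works as db_dir
from pathlib import Path

db_dir = Path("D:/AUT/Data_Lab/Implementation/TinyML/data/TrainingSet")
print(sorted(p.stem for p in db_dir.rglob("*.mat")))
# expected: ['A01', ..., 'A10', 'R01', ..., 'R10'] if the files are in place
```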

wenh06 (Collaborator) commented Mar 26, 2024

I think I know the reason now. The CPSC2020 dataset uses sliced recordings, since the original recordings are fairly long. You should therefore call the persistence method first; it takes quite a long time to slice the recordings.
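
Something along these lines; note that the dataset class name and where the persistence method lives are assumptions based on the benchmark's dataset.py, not verified here:

```python
# Hedged sketch: build the dataset, then slice and cache the recordings
# (class name and method location assumed from benchmarks/train_hybrid_cpsc2020)
from cfg import TrainCfg
from dataset import CPSC2020

ds_train = CPSC2020(TrainCfg, training=True)
ds_train.persistence()  # slices the long recordings into segments; slow on first run
```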

AK-mehr commented Mar 26, 2024

Thank you for your guidance. It seems that training requires a CNN.h5 and a CRNN.h5 file located in the signal_processing/ecg_rpeaks_dl_models directory, but I only have the corresponding JSON files. It's worth noting that I've only run trainer.py. Should I do anything before running trainer.py? Could you please help me with this one as well?

wenh06 (Collaborator) commented Mar 26, 2024

I added automatic downloading of these models, which you can find at https://opensz.oss-cn-beijing.aliyuncs.com/ICBEB2020/file/CPSC2019-opensource.zip. However, these models were trained with a much older version of Keras, so one might have trouble loading them. I have also removed the auto-loading of deep learning models in the signal_processing module.

The changes are currently in the dev branch and will be merged into the master branch soon.
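
If you want to try loading them anyway, the usual Keras pattern for an architecture JSON plus HDF5 weights looks like this; the file names are taken from this thread, and whether the load succeeds under a modern Keras is exactly the uncertainty mentioned above:

```python
# Hedged sketch: rebuild the model from its JSON architecture, then load
# the HDF5 weights; may fail if the files were saved with a very old Keras
from tensorflow.keras.models import model_from_json

with open("CNN.json") as f:
    cnn = model_from_json(f.read())
cnn.load_weights("CNN.h5")
```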
