About splitting the data #1

Open
WayneWu01 opened this issue Mar 22, 2022 · 12 comments

Comments

@WayneWu01

How should I split the data according to your files? The code you provided uses all the patients for random splitting. Several images correspond to one patient; should I put all of them in the same folder?

@Calvin-Pang
Owner

If you mean the first-stage patch-level training, the answer is YES. In patch-level training, we put all MSS patches in one folder, no matter which patient they belong to. The same goes for MSI. So at the patch level we only have six folders: train/MSI, train/MSS, validation/MSI, validation/MSS, test/MSI, test/MSS.
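The folder layout described above could be set up along these lines. This is only a minimal sketch: the helper names `make_layout` and `place_patch` are hypothetical and not part of the MAg repository, the sixth folder is assumed to be test/MSS, and the split and label of each patch are assumed to be known already (for example from a patient-level split list).

```python
import shutil
from pathlib import Path

SPLITS = ("train", "validation", "test")
LABELS = ("MSI", "MSS")


def make_layout(root):
    """Create the six <split>/<label> folders under root and return their paths."""
    root = Path(root)
    folders = [root / split / label for split in SPLITS for label in LABELS]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


def place_patch(patch_path, root, split, label):
    """Copy one patch into <root>/<split>/<label>/, ignoring which patient it came from."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    dest_dir = Path(root) / split / label
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(patch_path).name
    shutil.copy2(patch_path, dest)
    return dest
```

Calling `make_layout("data")` once and then `place_patch(path, "data", "test", "MSS")` for each patch would give a flat per-class layout. Copying rather than moving keeps the original download intact.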

@WayneWu01
Author

So, taking TCGA-AZ-4615 in the CRC_DX MSI test set as an example: there are 61 PNGs in the original dataset, and I should put all of them in one folder for the MSI test set, right?

@Calvin-Pang
Owner

Yes, just use your trained patch-level models to get the predicted scores of these 61 pngs. And you can use any aggregation method (MAg, counting, averaging and so on) to get the patient's predicted result (MSI or MSS).
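As an illustration of the simpler aggregation options mentioned above, here is a minimal sketch of averaging and counting. It does not implement MAg. It assumes each patch file name contains a TCGA patient barcode such as `TCGA-AZ-4615`, that each patch score is a predicted MSI probability, and that 0.5 is the decision threshold. The function names and the reading of "counting" as a majority vote over patches are assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumption: patch file names embed the patient barcode, e.g. "...-TCGA-AZ-4615-...".
PATIENT_RE = re.compile(r"TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}")


def patient_id(filename):
    """Extract the TCGA patient barcode from a patch file name."""
    match = PATIENT_RE.search(filename)
    if match is None:
        raise ValueError(f"no TCGA patient ID found in {filename!r}")
    return match.group(0)


def aggregate_patient_scores(patch_scores, method="average", threshold=0.5):
    """Turn {patch file name: MSI probability} into {patient: (score, label)}.

    method="average": patient score is the mean patch probability.
    method="count":   patient score is the fraction of patches at or above threshold.
    """
    per_patient = defaultdict(list)
    for name, score in patch_scores.items():
        per_patient[patient_id(name)].append(score)

    results = {}
    for pid, scores in per_patient.items():
        if method == "average":
            patient_score = sum(scores) / len(scores)
        elif method == "count":
            patient_score = sum(s >= threshold for s in scores) / len(scores)
        else:
            raise ValueError(f"unknown method: {method!r}")
        label = "MSI" if patient_score >= threshold else "MSS"
        results[pid] = (patient_score, label)
    return results
```

For the patient discussed here, the input would hold one entry per patch PNG, and the output would hold a single entry keyed by `TCGA-AZ-4615`.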

@PLMMZY67

Sorry to bother you. Why do I find 84 PNGs for patient 'TCGA-AZ-4615' in the CRC_DX MSIMUT_test folder, not 61?

@PLMMZY67

PLMMZY67 commented Mar 26, 2022 via email

@WayneWu01
Author

WayneWu01 commented Mar 26, 2022 via email

Could you share the data with me? The already-split one.

@Calvin-Pang
Owner

Hello, I checked my data again, and in my experiment there are 84 PNGs for TCGA-AZ-4615 in the MSIMUT test set. Maybe you typed the wrong ID? I did find a patient with 61 PNGs (TCGA-AZ-4315, in the MSS test set).
I am sorry that I cannot share any image data with you, because in my experiments I put all images in one folder and use the file names listed in JSON or XLSX files to load them for training or testing. My patient split XLSX files are at https://github.com/Calvin-Pang/MAg/tree/main/name_patient
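One way to check per-patient counts like the 84 and 61 mentioned above is to tally the PNGs in a folder by patient. This is a minimal sketch, assuming the patient barcode (for example `TCGA-AZ-4615`) appears in each file name; the function name and the example folder path are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: patch file names embed the patient barcode, e.g. "...-TCGA-AZ-4615-...".
PATIENT_RE = re.compile(r"TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}")


def count_patches_per_patient(folder):
    """Count the .png files directly inside folder, grouped by TCGA patient barcode."""
    counts = Counter()
    for png in Path(folder).glob("*.png"):
        match = PATIENT_RE.search(png.name)
        if match is not None:  # files without a recognizable barcode are skipped
            counts[match.group(0)] += 1
    return counts
```

For example, `count_patches_per_patient("CRC_DX/MSIMUT_test")["TCGA-AZ-4615"]` would give the number of patches found for that patient. Comparing the counts between two copies of the dataset can also reveal files lost during download or extraction.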

@WayneWu01
Author

WayneWu01 commented Mar 26, 2022 via email

@Calvin-Pang
Owner

I am sorry, I don't know what's wrong with your split. My guess is that some data was lost when downloading or unzipping the dataset.
I just found my split dataset in my Google Drive. You can download it, and I hope it helps.
CRC_DX: https://drive.google.com/drive/folders/1sQR_4_ZjOW8IWk8cMdsF2MTMOmfS9Y06?usp=sharing
STAD: https://drive.google.com/drive/folders/1ntb9MLvBx7ptyEA3dGhpQiM1nKg-jVCH?usp=sharing

@WayneWu01
Author

WayneWu01 commented Mar 30, 2022 via email

Do you have the train_mobilenet.py file? I couldn't find it.

@Calvin-Pang
Owner

Hello, I checked my files and found that train_mobilenet.py is just a modified train.py that was used for an experiment with frozen parameters. The results showed that freezing the parameters was meaningless, so please just use train.py for all models (resnet, mobilenet, ...).
