Hannover Medical School COVID-19 image repository #4

Closed
armiro opened this issue May 21, 2020 · 4 comments
Labels
dataset expansion Some new data source are to be added to dataset

Comments

@armiro
Owner

armiro commented May 21, 2020

The repository holds 172 PNG images. The originals are high-quality DICOM (DCM) files, which were converted to NIfTI and downscaled so they could be hosted in a GitHub repo:
https://github.com/ml-workgroup/covid-19-image-repository

The maintainers plan to add more images to this repo shortly, along with CT scans.

@armiro armiro added the dataset expansion Some new data source are to be added to dataset label May 21, 2020
@armiro
Owner Author

armiro commented May 27, 2020

Images were added, but there are some issues and tips to note:

  • Not all images are COVID-19 positive. There is no "is_covid" label or anything similar, so positive images are identified by a rule of thumb explained by the repo owner here.
  • Some patient IDs (about 5 images) are misinterpreted by Excel as numbers in scientific notation because they contain the exponent character e.
  • As of 12/May/2020 the dataset has 172 images in PNG format. Following the guideline mentioned in the issue above, 164 images are COVID-19 positive and 8 images are normal CXRs.
  • The images are of high quality but darker than the images in our dataset, so brightness adjustment is an important augmentation to apply here.
  • According to the issues in the repo, there are some bugs in calculating admission offsets, so the number of positive CXRs may change. I've looked at the positive images, and most of them clearly show patchy opacities and consolidation. The normal images look normal.
  • Dr. Cohen's dataset has removed some of the images it imported from Hannover's dataset, and I can't find any specific reason for that. This dataset includes all the images in Hannover's so far.
  • Many images had black regions outside the CXR. The majority of the original images were therefore cropped, and some of them were straightened.
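
The cropping and brightness points above can be illustrated with a minimal sketch. It is not the code used for this dataset: the function names, the darkness threshold, and the brightness factor are all assumptions, and the image is represented as a plain list of rows of 0-255 grayscale values to keep the example dependency-free.

```python
def crop_black_border(image, threshold=10):
    """Crop rows and columns whose pixels are all at or below `threshold`.

    `image` is a grayscale image given as a list of rows of 0-255 ints.
    The threshold value is an assumption, not a value from the dataset.
    """
    rows = [i for i, row in enumerate(image) if max(row) > threshold]
    cols = [j for j in range(len(image[0]))
            if max(row[j] for row in image) > threshold]
    if not rows or not cols:
        return image  # entirely dark image: leave it untouched
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def adjust_brightness(image, factor=1.3):
    """Scale every pixel by `factor` and clip the result to the 8-bit range."""
    return [[min(255, round(p * factor)) for p in row] for row in image]


# Tiny synthetic example: a 2x2 "radiograph" surrounded by a black frame.
padded = [
    [0, 0, 0, 0],
    [0, 60, 90, 0],
    [0, 120, 200, 0],
    [0, 0, 0, 0],
]
cropped = crop_black_border(padded)
brightened = adjust_brightness(cropped, factor=1.3)
```

In an augmentation pipeline the factor would typically be drawn at random from a small range rather than fixed.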

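For the patient ID problem, one possible approach is to read the metadata with a tool that keeps every field as text and to flag the IDs a spreadsheet would likely convert. The sketch below assumes a CSV file with `patient_id` and `image_id` columns; the second sample row and the helper name are hypothetical.

```python
import csv
import io
import re

# Matches strings such as "1234e567" (digits, one "e", digits), which
# spreadsheet tools tend to read as numbers in scientific notation.
SCI_NOTATION_LIKE = re.compile(r"^\d+[eE]\d+$")


def at_risk_ids(rows, column="patient_id"):
    """Return the IDs in `column` that would likely be converted to numbers."""
    return [row[column] for row in rows if SCI_NOTATION_LIKE.match(row[column])]


# Hypothetical metadata excerpt; only the first row's IDs come from this thread.
sample = io.StringIO(
    "patient_id,image_id\n"
    "d3fb252e,88859dc1\n"
    "1234e567,0a1b2c3d\n"
)
rows = list(csv.DictReader(sample))  # the csv module keeps fields as strings
risky = at_risk_ids(rows)
print(risky)
```

IDs flagged this way can then be checked by hand against the image file names.
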
@armiro armiro closed this as completed May 27, 2020
@armiro armiro reopened this May 27, 2020
@armiro
Owner Author

armiro commented May 28, 2020

The issue is reopened so that images from upcoming releases can be added. The next set, comprising 21 CXRs, has been added.

The lack of proper disease labelling worries me. For instance, the CXR with patient_id=d3fb252e and image_id=88859dc1 has an admission offset of 0 and none of the other metrics reported. It is assumed to be COVID-19 positive, yet the CXR does not show specific findings related to pneumonia.

Images must come with a radiological description to be viable for labelling. This specific image was added with - at the end of the image name. I recheck every image one by one before adding it to our dataset to make sure it shows signs of pneumonia, although I am not an experienced radiologist.

I have asked the repo maintainer about this problem here.

@armiro armiro closed this as completed May 20, 2021
@danshirron

Is running Hannover_data_loader.py still needed, or are its images already integrated into the current dataset?
If it is needed, running the script overwrites some images in the existing dataset. Is that the expected behaviour?

@armiro
Owner Author

armiro commented May 27, 2021

Hey @danshirron

Nope, not needed. The chest_xray_images/covid19 folder has all the images included. It is also possible to download all the images at once from the FigShare page.

Regarding the Hannover dataset, I updated the code based on the second version of their dataset. They might have changed some names, but this dataset has all the needed data from their set.
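
For anyone who still runs a loader against an existing copy of the dataset, overwrites like the ones mentioned above can be avoided by skipping files that are already present. The following is a minimal sketch and is not taken from Hannover_data_loader.py; the function name and the flat folder of PNG files are assumptions.

```python
import shutil
from pathlib import Path


def copy_without_overwrite(src_dir, dst_dir):
    """Copy PNG files from src_dir to dst_dir, skipping names that already exist.

    Returns the sorted list of file names that were skipped.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    skipped = []
    for path in sorted(src.glob("*.png")):
        target = dst / path.name
        if target.exists():
            skipped.append(path.name)  # keep the existing image untouched
            continue
        shutil.copy2(path, target)
    return skipped
```

The returned list makes it easy to review which incoming images collided with existing file names before deciding what to do with them.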

2 participants