codeamt/mle-capstone-data


Generating COVIDx Dataset

Data preprocessing submodule for Udacity's Machine Learning Engineer Nanodegree program.

Generates the latest COVIDx dataset for modeling. The dataset comes from the benchmark research model first presented in [1].

Repo Contents

1 directory, 6 files

Generating the COVIDx Training Set

There are 2 ways to generate the COVIDx Dataset:

The Data Pre-Processing Notebook:

The data preprocessing notebook covidnet_data_processing.ipynb in this repo includes additional steps for generating .csv labeling files for modeling.
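As a rough illustration of what a .csv labeling file can look like, here is a minimal sketch. The column names `filename` and `label`, the function name `write_label_file`, and the example rows are assumptions made for illustration; the notebook's actual output format may differ.

```python
import csv


def write_label_file(rows, csv_path):
    """Write (filename, label) pairs to csv_path with a header row.

    The header names are hypothetical, not taken from the notebook.
    """
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "label"])
        writer.writerows(rows)
```

For example, `write_label_file([("img_001.png", "normal")], "train_labels.csv")` would produce a two-line file: the header and one labeled example.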

Setting up and Running data-cli-tool:

What you'll need:

  • Linux-based system with Python 3.7+ installed
  • virtualenv installed
  • A Kaggle Authentication Key (kaggle.json file)
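Before running the tool, it can help to confirm the prerequisites above. The following is a minimal sketch, not part of this repo: the function name `check_prerequisites` is hypothetical, and the check assumes the standard Kaggle key file layout, a JSON object with `username` and `key` fields.

```python
import json
import sys
from pathlib import Path


def check_prerequisites(kaggle_file, min_version=(3, 7)):
    """Return a list of problems found; an empty list means ready to run."""
    problems = []
    # The README asks for Python 3.7 or newer.
    if sys.version_info < min_version:
        problems.append("Python %d.%d+ is required" % min_version)
    path = Path(kaggle_file).expanduser()
    if not path.is_file():
        problems.append(f"Kaggle key file not found: {path}")
        return problems
    try:
        creds = json.loads(path.read_text())
    except json.JSONDecodeError:
        problems.append(f"{path} is not valid JSON")
        return problems
    if not isinstance(creds, dict):
        problems.append(f"{path} does not contain a JSON object")
        return problems
    # Assumption: a Kaggle key file carries these two fields.
    for field in ("username", "key"):
        if not creds.get(field):
            problems.append(f"{path} is missing the '{field}' field")
    return problems
```

Calling `check_prerequisites("/path/to/your/kaggle.json")` returns an empty list when everything looks in order, or a list of messages describing what to fix.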

Running Locally (Linux):

In a terminal, clone the repo with git if you don't already have it on your system. Then change into the repo directory, create and activate a virtual environment, and run the Python script:

pip3 install virtualenv
git clone https://github.com/codeamt/mle-capstone-data.git
cd mle-capstone-data && virtualenv .
source bin/activate
python3 get_covidx.py --kaggle_file "/path/to/your/kaggle.json"

Be sure to upload and extract the output zip file of this pipeline phase to the environment/notebook you use for the modeling phase.
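If the modeling environment is a Python notebook, the extraction step can be done with the standard library. This is a minimal sketch: the function name `extract_dataset` is hypothetical, and the name and location of the output zip file are not stated here, so the path passed in is a placeholder you need to supply.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the archive's member names."""
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

A call such as `extract_dataset("/path/to/output.zip", "data")` unpacks the archive into a `data` folder next to the notebook and returns the list of extracted entries for a quick sanity check.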

About the Data

This set aggregates and deduplicates examples to construct COVIDxv3 from the following sources:

For notes on previous versions of the dataset, refer to the more detailed documentation in the original COVID-Net repo.
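How the deduplication is performed is not described here. As a generic illustration only, one common approach when merging image sources is to drop files whose bytes are identical. The sketch below uses a SHA-256 content hash; the function name `deduplicate_by_content` is hypothetical, and this is not necessarily how get_covidx.py or the original COVID-Net scripts deduplicate.

```python
import hashlib
from pathlib import Path


def deduplicate_by_content(paths):
    """Keep the first file seen for each distinct SHA-256 content digest."""
    seen = set()
    unique = []
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique
```

Content hashing only catches byte-identical copies. Re-encoded or resized versions of the same image would need a different strategy, such as matching on patient or file identifiers from the source metadata.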

Chest Radiography Images Distribution

[1] L. Wang and A. Wong, “COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest Radiography Images,” arXiv:2003.09871 [cs, eess], Mar. 2020. [Online]. Available: http://arxiv.org/abs/2003.09871
