WadhwaniAI/cough-against-covid-data

CoughAgainstCovid Official Dataset Repository

This is the official repository for accessing the data used by the project CoughAgainstCovid.

Data Description

Due to privacy constraints, we cannot release the original raw audio waveforms. Instead, we release spectrograms, which are 2D time-frequency representations of the audio. The spectrograms were created from the raw audio waveforms by applying the following transforms in order:

  1. ToTensor
  2. Resample (44.1 kHz to 16 kHz)
  3. Background Noise (from the ESC-50 dataset)
  4. Spectrogram (n_fft=512, win_length=512, hop_length=160)
  5. MelScale (n_mels=64, f_min=125, f_max=7500)
  6. AmplitudeToDB
  7. ToNumpy
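
The Spectrogram, MelScale and AmplitudeToDB steps above can be sketched in plain NumPy to make the listed parameters concrete. This is a minimal sketch, not the project's preprocessing code: it assumes the audio has already been resampled to 16 kHz, omits the ESC-50 background-noise step, and fills in settings the list does not state with torchaudio-style defaults (periodic Hann window, centered frames with reflect padding, power spectrogram, HTK mel scale without filter normalization, no top_db clipping). All function and constant names are hypothetical.

```python
import numpy as np

# Parameters from the transform list; audio is ASSUMED to be resampled to 16 kHz already.
SAMPLE_RATE = 16_000
N_FFT = 512
WIN_LENGTH = 512
HOP_LENGTH = 160
N_MELS = 64
F_MIN, F_MAX = 125.0, 7500.0


def hz_to_mel(f):
    # HTK mel formula (assumed; this is torchaudio's MelScale default)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank():
    """Unnormalized triangular mel filters of shape (n_freqs, n_mels)."""
    n_freqs = N_FFT // 2 + 1
    fft_freqs = np.linspace(0.0, SAMPLE_RATE / 2, n_freqs)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(F_MIN), hz_to_mel(F_MAX), N_MELS + 2))
    fb = np.zeros((n_freqs, N_MELS))
    for i in range(N_MELS):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[:, i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(waveform):
    """Return a (n_mels, n_frames) log-mel spectrogram in dB for a 1D waveform."""
    x = np.asarray(waveform, dtype=np.float64)
    # Centered frames (assumed): reflect-pad n_fft // 2 samples on each side
    x = np.pad(x, N_FFT // 2, mode="reflect")
    n_frames = 1 + (len(x) - N_FFT) // HOP_LENGTH
    # Periodic Hann window; win_length == n_fft, so no window padding is needed
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(WIN_LENGTH) / WIN_LENGTH)
    frames = np.stack(
        [x[t * HOP_LENGTH : t * HOP_LENGTH + N_FFT] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n=N_FFT, axis=1)) ** 2  # (n_frames, n_freqs)
    mel = power @ mel_filterbank()  # (n_frames, n_mels)
    # AmplitudeToDB on a power spectrogram: 10 * log10, clamped at 1e-10
    db = 10.0 * np.log10(np.maximum(mel, 1e-10))
    return db.T
```

For one second of 16 kHz audio this yields 101 frames of 64 mel bins. The axis order of the released arrays is not stated here, so check `.shape` on a real file before relying on it.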

We share the 2D NumPy arrays (.npy files) for all the audio recordings collected.
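
Each released file can be read back with `numpy.load`. A minimal sketch: `load_spectrogram` is a hypothetical helper, and because the axis order of the arrays is not documented here, it only checks that the array is two-dimensional.

```python
from pathlib import Path

import numpy as np


def load_spectrogram(path):
    """Load one released spectrogram and check that it is a 2D array."""
    # A .npy file holds a single array; np.load keeps allow_pickle=False by default
    spec = np.load(Path(path))
    if spec.ndim != 2:
        raise ValueError(f"expected a 2D spectrogram, got shape {spec.shape}")
    return spec
```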

Accessing/Downloading the Data

To access and download the spectrograms:

  1. Fill in the form and attach the signed document. You will receive a text file with the download links within 10-15 minutes.
  2. Download the text file, rename it (for example, to links.txt), and save it somewhere you can access it.
  3. Run prepare.py to download and unzip the data. This takes roughly 1-2 hours, depending on your download speed.
# Download the data (prepare.py uses wget) and unzip it into ~/data
python prepare.py -lp path_to_links_file -od ~/data

Args:
    links_path (-lp): Path to the text file containing the links to the zip files.
    output_dir (-od): Path to the output directory. It will be created if it does not exist.

Running this script downloads the data and unzips it into the output directory. The spectrograms will then be located at output_dir/spectrograms/.
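
Once prepare.py finishes, a quick inventory of the output directory can confirm the files landed where expected. A minimal sketch: `index_spectrograms` is a hypothetical helper that assumes only what is stated above (files under output_dir/spectrograms/ with the .npy extension); since the folder layout inside the zips is not documented, it searches subfolders too.

```python
from pathlib import Path


def index_spectrograms(output_dir):
    """Map each .npy file stem under output_dir/spectrograms/ to its path."""
    spec_dir = Path(output_dir).expanduser() / "spectrograms"
    if not spec_dir.is_dir():
        raise FileNotFoundError(f"no spectrograms directory at {spec_dir}")
    # rglob also descends into subfolders; if two files share a stem,
    # the one that sorts last wins
    return {p.stem: p for p in sorted(spec_dir.rglob("*.npy"))}
```

For example, `len(index_spectrograms("~/data"))` gives the number of spectrogram files found.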

Metadata Details

We provide a metadata file (attributes.csv) with supplementary information about the patients. The table below lists the columns in the CSV file.

| Attribute | Column Name in CSV | Description |
|---|---|---|
| Patient ID | patient_id | Unique Identifier |
| Patient Age | enroll_patient_age | Continuous |
| Health Worker | enroll_health_worker | Discrete |
| Temperature | enroll_patient_temperature | Continuous |
| Travel History | enroll_travel_history | Discrete |
| Presence of Cough | enroll_cough | Discrete |
| Presence of Shortness of Breath | enroll_shortness_of_breath | Discrete |
| Presence of Fever | enroll_fever | Discrete |
| Days with Cough | enroll_days_with_cough | Continuous |
| Days with Shortness of Breath (SOB) | enroll_days_with_shortness_of_breath | Continuous |
| Days with Fever | enroll_days_with_fever | Continuous |
| Contact with Covid Confirmed Case | enroll_contact_with_confirmed_covid_case | Discrete |
| Comorbidities | enroll_comorbidities | Discrete |
| Patient Respiratory Rate | enroll_patient_respiratory_rate | Continuous |
| Smoking Habits | enroll_habits | Discrete |
| Cough Relief Measures | enroll_cough_relief_measures | Discrete |
| State | testresult_state | Discrete |
| Test Facility | testresult_facility | Discrete |
| Test Time | testresult_end_time | DateTime |
| Covid Result | testresult_covid_test_result | Discrete |
| Covid Test Type | testresult_diagnostics_test_type | Discrete |
| Audio Recording (aaaaaa sound) | aaaaa_recording | File Name |
| Audio Recording (oooooo sound) | ooooo_recording | File Name |
| Audio Recording (eeeeee sound) | eeeee_recording | File Name |
| Audio Recording (a sound) | a_sound | File Name |
| Audio Recording (e sound) | e_sound | File Name |
| Audio Recording (o sound) | o_sound | File Name |
| Audio Recording (Cough Sound 1) | cough_1 | File Name |
| Audio Recording (Cough Sound 2) | cough_2 | File Name |
| Audio Recording (Cough Sound 3) | cough_3 | File Name |
| Audio Recording (Breathing) | breathing | File Name |
| Audio Recording (1 to 10 Counting) | audio_1_to_10 | File Name |
| Audio Recording (Room) | room_sound | File Name |
| Audio Recording (Room Recording) | room_recording | File Name |
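
The Description column indicates how each field should be parsed. A minimal sketch using only the standard-library csv module: `load_attributes` and `to_float` are hypothetical helpers. They convert the columns marked Continuous to float (blank or non-numeric cells become None) and leave the Discrete, DateTime and File Name columns as strings, since their exact encodings are not documented here.

```python
import csv
from pathlib import Path

# Columns documented as Continuous in the metadata table.
CONTINUOUS_COLUMNS = (
    "enroll_patient_age",
    "enroll_patient_temperature",
    "enroll_days_with_cough",
    "enroll_days_with_shortness_of_breath",
    "enroll_days_with_fever",
    "enroll_patient_respiratory_rate",
)


def to_float(value):
    """Parse a continuous value; blank, missing or non-numeric cells become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_attributes(path):
    """Read attributes.csv into a list of dicts, converting continuous columns to float."""
    with open(Path(path), newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        for col in CONTINUOUS_COLUMNS:
            if col in row:
                row[col] = to_float(row[col])
    return rows
```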

Cough sounds were collected for all 7,169 patients, but the other sounds were collected for only some of them. A recording exists for a patient only if its filename appears in the corresponding column of this metadata file.
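
Because a blank filename cell means the recording does not exist, selecting the patients who have a given sound is a simple filter. A minimal sketch: `patients_with_recording` is a hypothetical helper, and how a filename in the CSV maps to a specific .npy file under spectrograms/ is not documented here, so it returns the raw filenames.

```python
import csv
from pathlib import Path

# Recording columns from the metadata table; each cell holds a file name or is blank.
RECORDING_COLUMNS = (
    "aaaaa_recording", "ooooo_recording", "eeeee_recording",
    "a_sound", "e_sound", "o_sound",
    "cough_1", "cough_2", "cough_3",
    "breathing", "audio_1_to_10", "room_sound", "room_recording",
)


def patients_with_recording(csv_path, column):
    """Return (patient_id, file name) pairs for patients whose `column` cell is filled."""
    if column not in RECORDING_COLUMNS:
        raise ValueError(f"unknown recording column: {column}")
    with open(Path(csv_path), newline="", encoding="utf-8") as f:
        # Rows with a blank or missing cell have no recording of this type
        return [
            (row["patient_id"], row[column].strip())
            for row in csv.DictReader(f)
            if (row.get(column) or "").strip()
        ]
```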

The dataset paper will be released soon.
