NeuroLexDiagnostics/train-diseases

train-diseases

This repository is for training models related to disease prediction. It summarizes our lab's work in this area and makes it accessible to the public. If you'd like to contribute to this repo, please let us know!

Take a second to watch this video below on why we started our company and why we think this research is impactful.

NeuroLex intro video

how to collaborate

the TRIBE Model

We created the TRIBE model to work with outstanding individuals toward our mission: building a universal voice test that refers patients to specialists faster. Fellows come from many different backgrounds: undergraduates, graduate students, faculty members, physicians, engineers, computer scientists, and other professionals.

Fellows contribute to this repo by pursuing either a data science project that models existing datasets or a research project that collects more data for an existing or new use case. Research demos are important to us because many of our datasets have very few samples. We are focused on curating a larger dataset and open-sourcing it to advance this work.

If you're interested, apply to the next TRIBE here. You can read this FAQ and watch a previous demo day below to get a better feel for the program. If you have any additional questions, please reach out to Jim Schwoebel at js@neurolex.co.

TRIBE 2 Demo Day

how to engineer new datasets

why youtube?

We have found that YouTube is a reliable place to get labeled speech data if you know what to search for. For example, to find data on Parkinson’s disease, you could search for something like “Parkinson’s disease: my story”. You’ll quickly find many people who have shared their stories of living with Parkinson’s disease. You can then manually annotate these videos, cut them around the patients’ voice segments, and use this data to train machine learning models without any formal IRB.

getting started

This repository makes it seamless to build custom voice-based disease datasets using YouTube.

To get started, clone this repository:

git clone git@github.com:NeuroLexDiagnostics/train-diseases.git
cd train-diseases 
open template.xlsx

Now fill out the spreadsheet (template.xlsx) in the current directory. This template lets you quickly label 20-second segments of voice data with age (e.g. twenties), gender (e.g. male), accent (e.g. British), audio quality (e.g. good/bad), and location (indoor vs. outdoor). Note that you can make a new spreadsheet or expand upon an existing spreadsheet in this repository (in the spreadsheets directory):

  • addiction.xlsx
  • adhd.xlsx
  • als.xlsx
  • anxiety.xlsx
  • autism.xlsx
  • cold.xlsx
  • controls.xlsx
  • depression.xlsx
  • depression_labels.xlsx
  • dyslexia.xlsx
  • glioblastoma.xlsx
  • gravesdisease.xlsx
  • multiple_sclerosis.xlsx
  • parkinsons.xlsx
  • postpartum_depression.xlsx
  • schizophrenia.xlsx
  • sleep_apnea.xlsx
  • stressed.xlsx
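To make the labeling scheme concrete, here is a minimal sketch of how one labeled row could be represented and sanity-checked before scraping. The field names (`url`, `start_seconds`, and so on) and the `validate` helper are hypothetical; they are not the actual column headers of template.xlsx, so check the template itself for the real layout.

```python
from dataclasses import dataclass
from typing import List

SEGMENT_SECONDS = 20  # segment length mentioned in this README


@dataclass
class SegmentLabel:
    """One labeled segment. Field names are hypothetical, not the template's real columns."""
    url: str              # link to the source video
    start_seconds: float  # where the patient's voice segment begins
    age: str              # e.g. "twenties"
    gender: str           # e.g. "male"
    accent: str           # e.g. "British"
    audio_quality: str    # "good" or "bad"
    location: str         # "indoor" or "outdoor"

    @property
    def end_seconds(self) -> float:
        # Every segment is assumed to have the same fixed length.
        return self.start_seconds + SEGMENT_SECONDS


def validate(label: SegmentLabel) -> List[str]:
    """Return a list of human-readable problems; an empty list means the row looks fine."""
    problems = []
    if not label.url.startswith("http"):
        problems.append("url should be a full link")
    if label.start_seconds < 0:
        problems.append("start_seconds must be non-negative")
    if label.audio_quality not in {"good", "bad"}:
        problems.append("audio_quality should be 'good' or 'bad'")
    if label.location not in {"indoor", "outdoor"}:
        problems.append("location should be 'indoor' or 'outdoor'")
    return problems
```

Checking rows like this before running the scraper can catch typos in the categorical labels early, which matters when a dataset only has a handful of samples.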

Once you fill out this spreadsheet and save it to the cloned directory's spreadsheets folder, type this into the terminal:

python3 setup.py
python3 yscrape.py

After this, the videos should start downloading, and they will be converted to audio files in a folder named after the Excel sheet you type in (e.g. files from glioblastoma.xlsx will be put into the folder glioblastoma).
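As a rough illustration of the naming convention and the video-to-audio step, the sketch below derives the output folder from a spreadsheet name and builds an ffmpeg command that would cut a 20-second audio clip. This is an assumption about how such a step could look, not a description of what yscrape.py actually does; `output_folder` and `ffmpeg_cut_command` are hypothetical helpers, and the command is only built here, not executed.

```python
from pathlib import Path
from typing import List


def output_folder(spreadsheet: str) -> str:
    """Folder name derived from the spreadsheet file name (extension dropped)."""
    return Path(spreadsheet).stem


def ffmpeg_cut_command(video: str, start_seconds: float, out_wav: str,
                       duration: float = 20.0) -> List[str]:
    """Build (but do not run) an ffmpeg command that extracts one audio segment.

    -ss seeks to the start time, -t limits the duration, -vn drops the video stream.
    """
    return [
        "ffmpeg",
        "-ss", str(start_seconds),
        "-t", str(duration),
        "-i", video,
        "-vn",
        out_wav,
    ]


folder = output_folder("glioblastoma.xlsx")
command = ffmpeg_cut_command("video.mp4", 30, f"{folder}/0.wav")
print(folder)
print(" ".join(command))
```

If ffmpeg is installed, a command list like this could be passed to `subprocess.run(command, check=True)` to produce the clip.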

If you get stuck, you can watch a quick tutorial on how this process works in the video below.

IRB-related studies

We also have options to collect datasets through an IRB-approved study. If you are interested in doing this, please contact Reza Hosseini Ghomi (MD/MSE), our Chief Medical Officer, at reza@neurolex.co to see if your project idea is feasible.

references

TRIBE program

Ongoing studies

Research papers (ongoing)
