HIE ML analysis

Source code

git clone git@github.com:MRCIEU/hie-ml.git
cd hie-ml

Build Docker image

docker build -t hie-ml .

Prepare data

See prepare_data.stata

Extract data

Three versions of the data are extracted. The first (A) has all variables with >5% missing values removed, the second (B) is imputed from the most recent complete data-point prior to that birth, and the third (C) is imputed using mode values.
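The three strategies can be illustrated on toy records. This is a minimal sketch and not the code in extract_features.py: the function names, the column names and the values are made up, rows are assumed to be ordered by date of birth, and (B) is read here as carrying the last observed value forward column by column, which is only one possible reading of "most recent complete data-point".

```python
from collections import Counter

# Toy records ordered by date of birth; None marks a missing value.
# Column names and values are hypothetical.
rows = [
    {"complete": 1, "gappy": 0},
    {"complete": 0, "gappy": None},
    {"complete": 0, "gappy": 1},
    {"complete": 1, "gappy": 1},
]


def drop_sparse_columns(rows, max_missing=0.05):
    """(A) Keep only columns whose share of missing values is <= max_missing."""
    keep = [
        col for col in rows[0]
        if sum(r[col] is None for r in rows) / len(rows) <= max_missing
    ]
    return [{col: r[col] for col in keep} for r in rows]


def impute_from_previous(rows):
    """(B) Fill each gap with the most recent value observed in an earlier row."""
    last_seen = {}
    out = []
    for r in rows:
        filled = {}
        for col, value in r.items():
            if value is None:
                value = last_seen.get(col)  # stays None if nothing was seen yet
            else:
                last_seen[col] = value
            filled[col] = value
        out.append(filled)
    return out


def impute_mode(rows):
    """(C) Fill each gap with the most frequent observed value of that column."""
    modes = {}
    for col in rows[0]:
        observed = [r[col] for r in rows if r[col] is not None]
        modes[col] = Counter(observed).most_common(1)[0][0] if observed else None
    return [
        {col: (modes[col] if v is None else v) for col, v in r.items()}
        for r in rows
    ]
```

On the toy data, (A) drops the "gappy" column, (B) fills the gap with the previous row's value and (C) fills it with the column mode.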

Derived variables are:

_cohort – 1 (born in the first, derivation cohort) or 0 (born in the second, testing cohort)
_hie – 1 for HIE, 0 for not
_id
_lapgar – 1 for a low Apgar score, 0 for not
_ne – Another measure of brain injury (not used at present)
_neonataldeath – Not used at present
_perinataldeath – 1 for perinatal death, 0 for not
_resus – 1 for resuscitation at birth, 0 for not
_stillborn – Not used at present
_yearofbirth – Year of birth
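The _cohort flag lends itself to a derivation/testing split. The sketch below shows that use on toy records; it is an assumption about how the flag is meant to be applied, not a description of what models.py does, and the helper name and records are hypothetical.

```python
# Toy records; only the variable names (_id, _cohort, _hie) come from the list above.
records = [
    {"_id": 1, "_cohort": 1, "_hie": 0},
    {"_id": 2, "_cohort": 1, "_hie": 1},
    {"_id": 3, "_cohort": 0, "_hie": 0},
]


def split_by_cohort(records):
    """Return (derivation, testing) lists based on the _cohort flag."""
    derivation = [r for r in records if r["_cohort"] == 1]
    testing = [r for r in records if r["_cohort"] == 0]
    return derivation, testing


derivation, testing = split_by_cohort(records)
```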

Variable names are composed as follows:

First letter is either a (antenatal), g (growth) or I (intrapartum)
Second letter is the type of entry: c (categorical), o (ordinal) or l (linear)
Then _NAME (most have one given)
Then _####, the number identifying where the extraction was performed in the Variable File
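The naming scheme can be decoded with a regular expression. This is a sketch under assumptions: the example names in the usage note are invented, the number is taken to be a run of digits of any length, and the stage letter is matched case-insensitively because the text writes the intrapartum prefix as a capital I.

```python
import re

STAGES = {"a": "antenatal", "g": "growth", "i": "intrapartum"}
TYPES = {"c": "categorical", "o": "ordinal", "l": "linear"}

# Stage letter, type letter, optional _NAME, then _ followed by digits.
PATTERN = re.compile(r"^([agi])([col])(?:_(.+))?_(\d+)$", re.IGNORECASE)


def parse_variable(name):
    """Split a variable name into its parts; return None if it does not fit the scheme."""
    match = PATTERN.match(name)
    if match is None:
        return None  # e.g. derived variables such as _hie
    stage, kind, label, number = match.groups()
    return {
        "stage": STAGES[stage.lower()],
        "type": TYPES[kind.lower()],
        "name": label,     # None when no _NAME part is present
        "number": number,  # kept as a string to preserve leading zeros
    }
```

For a hypothetical name such as "ac_smoking_0123" this gives an antenatal, categorical variable called "smoking" with number "0123"; derived variables such as "_hie" return None.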

docker run -it -v `pwd`:/app hie-ml python extract_features.py

Features

Select features

for data in "antenatal" "antenatal_growth" "antenatal_intrapartum"; do
    for model in "RFE" "ElasticNet" "Lasso" "SVC" "Tree"; do
        docker run -it -d -v `pwd`:/app hie-ml \
        python feature_selection.py \
        --data "$data" \
        --outcome "_hie" \
        --model "$model"
    done
done

Plot method correlation

Rscript feature_selection_plot.R

Models

# pool jobs
for data in "antenatal" "antenatal_growth" "antenatal_intrapartum"; do
    for model in "LR" "RF" "SVC" "NB" "NN"; do
        for fmodel in "ElasticNet"; do
            for nfeatures in 20 40 60; do
                f=data/"$data"_hie_"$fmodel"_n"$nfeatures"_"$model"_prob.csv
                if [ ! -f "$f" ]; then
                    # print the command instead of running it, so todo.sh holds one job per line
                    echo docker run -it --cpus 1 -d -v `pwd`:/app hie-ml \
                    python models.py \
                    --data "$data" \
                    --outcome "_hie" \
                    --model "$model" \
                    --fmodel "$fmodel" \
                    --nfeatures "$nfeatures"
                fi
            done
        done
    done
done > todo.sh

# run n jobs concurrently
head -n 20 todo.sh | bash
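The same job list can be built in Python, which may be easier to extend than nested shell loops. This sketch mirrors the grid and the output file name used above; the function name pending_jobs and its parameters are hypothetical, and it only builds command strings without starting any container.

```python
import itertools
import os
import shlex

DATASETS = ["antenatal", "antenatal_growth", "antenatal_intrapartum"]
MODELS = ["LR", "RF", "SVC", "NB", "NN"]
FMODELS = ["ElasticNet"]
NFEATURES = [20, 40, 60]


def pending_jobs(data_dir="data", workdir=None):
    """Return one docker command per combination whose output file is missing."""
    workdir = workdir or os.getcwd()
    jobs = []
    for data, model, fmodel, n in itertools.product(DATASETS, MODELS, FMODELS, NFEATURES):
        out = os.path.join(data_dir, f"{data}_hie_{fmodel}_n{n}_{model}_prob.csv")
        if os.path.exists(out):
            continue  # result already present, nothing to do
        cmd = [
            "docker", "run", "-it", "--cpus", "1", "-d",
            "-v", f"{workdir}:/app", "hie-ml",
            "python", "models.py",
            "--data", data,
            "--outcome", "_hie",
            "--model", model,
            "--fmodel", fmodel,
            "--nfeatures", str(n),
        ]
        jobs.append(shlex.join(cmd))
    return jobs


if __name__ == "__main__":
    # One command per line, in the same order as the shell loops.
    print("\n".join(pending_jobs()))
```

Redirecting the printed lines to todo.sh gives a file that can be consumed in batches with head, as shown above.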

ROC

Rscript roc-forest.R
