Tools in Python to quickly start using the UK-BioBank dataset before UKB RAP.

TemryL/UKB-Tools

UKB-Tools

Introduction

This repository provides Python tools to quickly start using the UK-BioBank dataset before UKB RAP. The repository is organized as follows:

├── commands/
│   ├── create_data.py
│   ├── create_eu_set.py
│   └── get_newest_baskets.py
└── ukb_tools/
    ├── preprocess/
    │   ├── filtering.py
    │   ├── labeling.py
    │   └── utils.py
    ├── __init__.py
    ├── data.py
    ├── logger.py
    └── tools.py

Installation

Clone the repository:

git clone https://github.com/TemryL/UKB-Tools.git

Move to the directory:

cd UKB-Tools

Create and activate a virtual environment that uses Python 3.11, then install the dependencies:

pip install -r requirements.txt

Usage

UK-BioBank data is organized by projects and baskets. Each project ID can have several basket IDs associated with it: when someone requests new fields or a data update under the same project ID, a new basket is created. Data cannot be merged across projects, because participant IDs (eids) are randomized per project. Data across baskets of the same project can be merged, and it is preferable to take each UKB field from the most recent basket that contains it.
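The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the code in `commands/get_newest_baskets.py`: the function name, the basket IDs, and the basket contents are made up, and it assumes that a larger basket ID means a more recent basket, which may not match how the repository decides recency.

```python
def newest_basket_per_field(basket_fields, wanted_fields):
    """Map each wanted field to the newest basket that contains it.

    basket_fields: dict of basket ID (int) -> set of field IDs in that basket.
    Assumption: a larger basket ID means a more recent basket.
    """
    mapping = {}
    for field in wanted_fields:
        candidates = [b for b, fields in basket_fields.items() if field in fields]
        if candidates:  # fields found in no basket are simply left out
            mapping[field] = max(candidates)
    return mapping


# Hypothetical baskets for a single project (made-up IDs and contents).
baskets = {
    1001: {31, 3066},
    1002: {31, 131369},
    1003: {3066},
}

result = newest_basket_per_field(baskets, [31, 131369, 3066, 21001])
print(result)
```

Field 21001 is absent from every hypothetical basket, so it does not appear in the result; a real tool would probably want to report such fields explicitly.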

Suppose we want to create a dataset with UKB fields 31, 131369 and 3066. Store the field IDs in a text file, one per line, as follows:

ukb_fields.txt:

31
131369
3066
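For reference, a file in this format can be read back with a short helper. The sketch below uses a hypothetical `parse_field_list` function and assumes one integer field ID per line, with blank lines ignored; the repository's own parsing may differ.

```python
def parse_field_list(text):
    """Return field IDs as ints, ignoring blank lines and surrounding spaces."""
    return [int(line) for line in text.splitlines() if line.strip()]


# Inline copy of the example file contents, so the sketch runs without files.
fields = parse_field_list("31\n131369\n3066\n")
print(fields)
```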

Run the following command to retrieve, for a given project ID, the most recent basket that contains the given UKB fields:

python commands/get_newest_baskets.py ${/dir/to/ukb_folder} ${project_id} ${data/ukb_fields.txt} ${data/field_to_basket.json}

The results will be stored in a JSON file as follows:

field_to_basket.json:

{
    "31": "project_52887_41230",
    "131369": "project_52887_676883",
    "3066": "project_52887_669338"
}
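To see which fields have to be read from which basket, the mapping can be inverted. The following is a minimal sketch using only the standard library; it works on an inline copy of the example mapping rather than on a file, and the variable names are illustrative.

```python
import json
from collections import defaultdict

# Inline copy of the example mapping, so the sketch runs without any files.
mapping_text = """
{
    "31": "project_52887_41230",
    "131369": "project_52887_676883",
    "3066": "project_52887_669338"
}
"""

field_to_basket = json.loads(mapping_text)

# Invert the mapping: basket -> list of fields to read from that basket.
basket_to_fields = defaultdict(list)
for field, basket in field_to_basket.items():
    basket_to_fields[basket].append(field)

for basket, fields in basket_to_fields.items():
    print(basket, fields)
```

In this example every field comes from a different basket, so each basket maps to a single field; with larger field lists, several fields will typically share a basket.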

Finally, to merge the data in a single CSV file, run the following command:

python commands/create_data.py ${/dir/to/ukb_folder} ${data/field_to_basket.json} ${data.csv}
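Conceptually, the merge is a join of per-basket tables on the participant ID. The sketch below illustrates that idea with the standard library only; it is not the implementation of `commands/create_data.py`. The eids and values are made up, and the column names (`eid`, `31-0.0`, `3066-0.0`) are assumptions about the layout of the basket extracts.

```python
import csv
import io

# Two made-up basket extracts from the same project, hence sharing eids.
basket_a = "eid,31-0.0\n1000001,0\n1000002,1\n"
basket_b = "eid,3066-0.0\n1000002,5\n1000003,7\n"


def read_rows(text):
    """Return {eid: {column: value}} for one CSV extract."""
    return {row["eid"]: row for row in csv.DictReader(io.StringIO(text))}


def merge_on_eid(*tables):
    """Outer-join tables on eid, keeping columns in first-seen order."""
    merged = {}
    columns = ["eid"]
    for table in tables:
        for eid, row in table.items():
            merged.setdefault(eid, {"eid": eid}).update(row)
            for col in row:
                if col not in columns:
                    columns.append(col)
    return columns, [merged[eid] for eid in sorted(merged)]


columns, rows = merge_on_eid(read_rows(basket_a), read_rows(basket_b))

# Write the merged table as CSV; missing values are left empty (restval).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

An outer join is used here so that participants present in only one basket are kept with empty values; whether the repository keeps or drops such participants is not stated above.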

Contribute

Feel free to contribute to this repo by fixing issues, improving performance, or adding new features!
