BERT

BERT (Batch-Effect Reduction Trees) offers flexible and efficient batch-effect correction of omics data, while providing maximum tolerance to missing values. As such, BERT is a valuable preprocessing tool for data analysis workflows. By providing BERT via Bioconductor, we make this tool available to a wider research community. An accompanying research paper is currently under preparation and will be made public soon.

BERT addresses the same fundamental data integration challenges as the HarmonizR package, which was released on Bioconductor in November 2023. However, various algorithmic modifications and optimizations give BERT shorter execution times, better data coverage and greater flexibility than HarmonizR. Moreover, BERT offers a more user-friendly design and a less error-prone input format.

Please note that our package BERT is neither affiliated with nor related to Bidirectional Encoder Representations from Transformers as published by Google.

This GitHub README provides only a brief introduction to BERT and we refer the reader to the Bioconductor vignette for more details and more thorough explanations.

System Requirements

BERT supports all major operating systems, i.e. Linux (e.g., Ubuntu 22.04 LTS), Microsoft Windows (e.g., Windows 10 and Windows 11) and macOS (e.g., Monterey and Ventura). Further, it has been tested to work on all major CPU architectures (x86_64, x64, arm64). The Bioconductor version requires R version 4.4. All other relevant software dependencies are specified in the source DESCRIPTION file along with their respective version numbers and will be installed automatically.

Installation Guide

To install BERT, start R (version "4.4") and enter

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("BERT")

The installation time may vary greatly depending on the bandwidth and latency of your internet connection, as well as on which R packages are already installed. We expect the installation to take at most 20 minutes.

Example Usage

BERT provides functionality to generate simulated data with missing values and batch-effects. This data is correctly formatted for direct batch-effect correction using BERT.

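library(BERT)
# Simulate a dataset with missing values and batch effects
dataset_raw <- generate_dataset(features = 60, batches = 10, samplesperbatch = 10, mvstmt = 0.1, classes = 2)
# Apply batch-effect correction to the simulated data
dataset_corrected <- BERT(dataset_raw)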

For this example, the average silhouette width (ASW) with respect to batch should decrease after correction, whereas the ASW with respect to class label should increase. We expect the above example to run in at most 20 seconds.
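To illustrate what the ASW measures, the following minimal sketch computes it on a small toy matrix that is independent of BERT. It relies on `silhouette()` from the `cluster` package, which ships with standard R distributions. The helper `average_silhouette_width`, the toy data and the per-batch mean centering are assumptions made purely for illustration; in particular, mean centering is a crude stand-in and is not the correction method used by BERT.

```r
library(cluster)  # provides silhouette(); a recommended package bundled with R

# Hypothetical helper (not part of BERT): mean silhouette width of the
# samples (rows of `features`) with respect to a grouping vector.
average_silhouette_width <- function(features, grouping) {
  d <- dist(features)                                 # Euclidean distances
  sil <- silhouette(as.integer(factor(grouping)), d)  # per-sample widths
  mean(sil[, "sil_width"])
}

# Toy data (not generated by BERT): 2 batches, 2 classes, 5 samples each.
set.seed(1)
batch <- rep(c(1, 2), each = 10)
label <- rep(rep(c("A", "B"), each = 5), times = 2)
features <- matrix(rnorm(20 * 4), nrow = 20)
features <- features + 5 * (batch == 2)            # strong additive batch shift
features[, 1] <- features[, 1] + (label == "B")    # weaker class signal

# Crude stand-in for a correction: center every feature within each batch.
centered <- features
for (b in unique(batch)) {
  rows <- batch == b
  centered[rows, ] <- scale(features[rows, , drop = FALSE],
                            center = TRUE, scale = FALSE)
}

asw_batch_raw      <- average_silhouette_width(features, batch)
asw_batch_centered <- average_silhouette_width(centered, batch)
asw_label_raw      <- average_silhouette_width(features, label)
asw_label_centered <- average_silhouette_width(centered, label)
```

An ASW close to 1 means that samples sit much closer to their own group than to the nearest other group, while values around 0 indicate overlapping groups. In this toy setting, the batch ASW is high before centering because the batch shift dominates the distances, and it drops once the shift is removed.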

Usage

For details on how to use BERT, please refer to the vignette.

Issues

Please report any issues in the GitHub forum, the Bioconductor forum, or contact the authors directly.

License

This code is published under the GPLv3.0 License.

Reference

Please cite our manuscript if you use BERT in your research:

High Performance Data Integration for Large-Scale Analyses of Incomplete Omic Profiles Using Batch-Effect Reduction Trees (BERT)

About

This is a read-only mirror of the git repos at https://bioconductor.org
