ORscraper: An R Package for Extracting Data from Oncomine Reporter’s Clinical Reports

Overview

ORscraper is an R package designed to extract relevant medical information from clinical reports generated by the Oncomine Reporter software. This package is intended for healthcare professionals and researchers working with genetic data who need to automate the extraction and processing of information from report files. ORscraper provides tools to identify biopsies, extract genetic variants and pathogenicity classifications, filter relevant data, and query databases such as NCBI ClinVar.

Installation

Install the released version of ORscraper from CRAN:

install.packages("ORscraper")

You can also install ORscraper from GitHub using the following R code:

# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}

# Install ORscraper from GitHub
devtools::install_github("SamuelGonzalez0204/ORscraper")

Basic Usage

Below is a basic example of how to use ORscraper to extract information from PDF files:

library(ORscraper)

# Read content from a PDF file
example_pdf <- system.file("extdata", "100.1-example.pdf", package = "ORscraper")
lines <- read_pdf_content(example_pdf)

# Read content from mutation tables
genesFile <- system.file("extdata", "Genes.xlsx", package = "ORscraper")
genes <- readxl::read_excel(genesFile)  # read_excel() is provided by the readxl package
mutations <- unique(genes$GEN)

# Extract mutation values from the PDF text
genes_mut <- c()
pathogenicities <- c()
tableValues <- extract_values_from_tables(lines, mutations)
genes_mut <- c(genes_mut, tableValues[1])
pathogenicities <- c(pathogenicities, tableValues[2])

# Filter only pathogenic mutations
pathogenic_mutations <- filter_pathogenic_only(pathogenicities, genes_mut)

print(pathogenic_mutations)
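
To make the filtering step more concrete, here is a minimal base-R sketch on toy data. It assumes the gene names and pathogenicity labels are parallel character vectors of equal length. The helper `keep_pathogenic()` and the placeholder values are hypothetical and written only for illustration; this is not the package's implementation of filter_pathogenic_only(), which may handle its inputs differently.

```r
# Toy data: parallel vectors of gene names and pathogenicity labels.
# These values are placeholders invented for this illustration.
toy_genes  <- c("GENE_A", "GENE_B", "GENE_C")
toy_labels <- c("Pathogenic", "Benign", "Pathogenic")

# Hypothetical helper (not part of ORscraper): keep the genes whose
# label is exactly "Pathogenic", ignoring missing labels.
keep_pathogenic <- function(labels, genes) {
  stopifnot(length(labels) == length(genes))
  genes[!is.na(labels) & labels == "Pathogenic"]
}

toy_result <- keep_pathogenic(toy_labels, toy_genes)
print(toy_result)
```

The argument order (labels first, genes second) mirrors the call in the example above, but the real function's return structure should be checked in its help page.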

Main Functions

The ORscraper package includes several key functions:

classify_biopsy(): Analyzes biopsy identifiers and categorizes them based on predefined rules.
extract_chip_id(): Extracts chip values from filenames matching specific patterns.
extract_fusions(): Identifies and extracts fusion variants from text lines.
extract_intermediate_values(): Searches for a specific text pattern and extracts consecutive values.
extract_values_from_tables(): Extracts information such as mutations, pathogenicity, and frequencies from tables in reports.
extract_values_start_end(): Extracts values based on start and end markers.
filter_pathogenic_only(): Filters mutations, retaining only those marked as “Pathogenic.”
read_pdf_content(): Extracts the content of a PDF and splits it into individual lines.
read_pdf_files(): Scans a directory and retrieves all PDF files.
search_ncbi_clinvar(): Queries the NCBI ClinVar database for germline classifications.
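
As an illustration of the marker-based extraction idea behind functions such as extract_values_start_end(), the following base-R sketch pulls the text found between a start marker and an end marker in each line. The helper name `between_markers()`, the marker strings, and the sample lines are all assumptions made for this example; the package's own function may take different arguments and apply different matching rules.

```r
# Hypothetical helper (not part of ORscraper): for each line, return the
# trimmed text between the first `start` marker and the next `end` marker,
# or NA when either marker is missing.
between_markers <- function(lines, start, end) {
  vapply(lines, function(line) {
    s <- as.integer(regexpr(start, line, fixed = TRUE))
    if (s == -1L) return(NA_character_)
    # Keep only the text that follows the start marker
    rest <- substring(line, s + nchar(start))
    e <- as.integer(regexpr(end, rest, fixed = TRUE))
    if (e == -1L) return(NA_character_)
    trimws(substr(rest, 1, e - 1L))
  }, character(1), USE.NAMES = FALSE)
}

# Made-up lines standing in for text read from a report
sample_lines <- c("Sample ID: ABC-123 Date: 2024", "no markers here")
ids <- between_markers(sample_lines, "Sample ID:", "Date:")
print(ids)
```

Fixed-string matching is used here so that characters such as `.` or `(` in a marker are not interpreted as regular-expression syntax.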
