wb_to_df gets startCol #330

Merged 3 commits on Sep 22, 2022
13 changes: 13 additions & 0 deletions NEWS.md
@@ -2,6 +2,8 @@

 ## New features

+* New argument `startCol` in read to data frame functions `wb_to_df()`, `wb_read()` and `read_xlsx()`. [330](https://github.com/JanMarvin/openxlsx2/issues/330)
+
 * New function `wb_colour()` to ease working with colour vectors used in `openxlsx2` styles. [292](https://github.com/JanMarvin/openxlsx2/issues/292)

 * Deprecated `get_cell_style()` and `set_cell_style()` in favor of newly introduced wrapper functions `wb_get_cell_style()` and `wb_set_cell_style()`. [306](https://github.com/JanMarvin/openxlsx2/issues/306)
@@ -40,6 +42,14 @@

 * Various (mostly internal) changes to `conditional_formatting`. Created `style_mgr` integration for `dxf` (cf-styles) and cleaned up internal code. The syntax has changed slightly, see [conditional formatting vignette](https://janmarvin.github.io/openxlsx2/articles/conditional-formatting.html) for reference. Add `whitespace` argument to `read_xml()`. [268](https://github.com/JanMarvin/openxlsx2/issues/268)

+## Breaking changes
+
+* Order of arguments in reading functions `wb_to_df()`, `wb_read()` and `read_xlsx()` has changed.
+
+
+***************************************************************************
+
+
 # openxlsx2 0.2.1

 ## New features
@@ -142,6 +152,9 @@
 * `$append_sheet_rels()` for `self$worksheet_rels[[sheet]]`
 * `$get_worksheet()` to replace `$ws()`

+
+***************************************************************************
+
 # openxlsx2 0.2.0

 * Added a `NEWS.md` file to track changes to the package.
24 changes: 14 additions & 10 deletions R/readWorkbook.R
@@ -3,8 +3,8 @@
 #' @description Read data from an Excel file or Workbook object into a data.frame
 #' @param xlsxFile An xlsx file, Workbook object or URL to xlsx file.
 #' @param sheet The name or index of the sheet to read data from.
-#' @param startRow first row to begin looking for data. Empty rows at the top of a file are always skipped,
-#' regardless of the value of startRow.
+#' @param startRow first row to begin looking for data.
+#' @param startCol first column to begin looking for data.
 #' @param colNames If `TRUE`, the first row of data will be used as column names.
 #' @param skipEmptyRows If `TRUE`, empty rows are skipped else empty rows after the first row containing data
 #' will return a row of NAs.
@@ -69,13 +69,14 @@ read_xlsx <- function(
   xlsxFile,
   sheet,
   startRow = 1,
-  colNames = TRUE,
+  startCol = NULL,
   rowNames = FALSE,
-  detectDates = TRUE,
+  colNames = TRUE,
   skipEmptyRows = FALSE,
   skipEmptyCols = FALSE,
   rows = NULL,
   cols = NULL,
+  detectDates = TRUE,
   namedRegion,
   na.strings = "#N/A",
   na.numbers = NA,
@@ -93,13 +94,14 @@ read_xlsx <- function(
     xlsxFile,
     sheet = sheet,
     startRow = startRow,
-    colNames = colNames,
+    startCol = startCol,
     rowNames = rowNames,
-    detectDates = detectDates,
+    colNames = colNames,
     skipEmptyRows = skipEmptyRows,
     skipEmptyCols = skipEmptyCols,
     rows = rows,
     cols = cols,
+    detectDates = detectDates,
     named_region = namedRegion,
     na.strings = na.strings,
     na.numbers = na.numbers,
@@ -127,13 +129,14 @@ wb_read <- function(
   xlsxFile,
   sheet = 1,
   startRow = 1,
-  colNames = TRUE,
+  startCol = NULL,
   rowNames = FALSE,
-  detectDates = TRUE,
+  colNames = TRUE,
   skipEmptyRows = FALSE,
   skipEmptyCols = FALSE,
   rows = NULL,
   cols = NULL,
+  detectDates = TRUE,
   namedRegion,
   na.strings = "NA",
   na.numbers = NA
@@ -148,13 +151,14 @@ wb_read <- function(
     xlsxFile = xlsxFile,
     sheet = sheet,
     startRow = startRow,
-    colNames = colNames,
+    startCol = startCol,
     rowNames = rowNames,
-    detectDates = detectDates,
+    colNames = colNames,
     skipEmptyRows = skipEmptyRows,
     skipEmptyCols = skipEmptyCols,
     rows = rows,
     cols = cols,
+    detectDates = detectDates,
     named_region = namedRegion,
     na.strings = na.strings,
     na.numbers = na.numbers
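Because `startCol` now sits in the fourth position and `colNames` and `detectDates` have moved, calls that pass these arguments by position can bind values to different arguments than before. A minimal sketch of the effect, using two hypothetical stub functions (`old_order()` and `new_order()` are illustrations, not part of openxlsx2) that only mimic the leading argument order before and after this change:

```r
# Hypothetical stubs: they copy only the leading argument order of the
# reading functions before and after this PR and return what was bound.
old_order <- function(xlsxFile, sheet, startRow = 1,
                      colNames = TRUE, rowNames = FALSE) {
  list(colNames = colNames, rowNames = rowNames)
}

new_order <- function(xlsxFile, sheet, startRow = 1,
                      startCol = NULL, rowNames = FALSE, colNames = TRUE) {
  list(startCol = startCol, rowNames = rowNames, colNames = colNames)
}

# The same positional call binds the fourth value to a different argument.
old_order("file.xlsx", 1, 1, FALSE)$colNames  # FALSE
new_order("file.xlsx", 1, 1, FALSE)$startCol  # FALSE, while colNames stays TRUE

# Named arguments are unaffected by the reordering.
new_order("file.xlsx", sheet = 1, colNames = FALSE)$colNames  # FALSE
```

Passing everything after `sheet` by name is one way to keep existing calls stable across the reordering.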
28 changes: 25 additions & 3 deletions R/wb_functions.R
@@ -173,6 +173,7 @@ style_is_posix <- function(cellXfs, numfmt_date) {
 #' @param skipEmptyCols If TRUE, empty columns are skipped.
 #' @param skipEmptyRows If TRUE, empty rows are skipped.
 #' @param startRow first row to begin looking for data.
+#' @param startCol first column to begin looking for data.
 #' @param rows A numeric vector specifying which rows in the Excel file to read. If NULL, all rows are read.
 #' @param cols A numeric vector specifying which columns in the Excel file to read. If NULL, all columns are read.
 #' @param definedName (deprecated) Character string with a definedName. If no sheet is selected, the first appearance will be selected.
@@ -257,13 +258,14 @@ wb_to_df <- function(
   xlsxFile,
   sheet,
   startRow = 1,
-  colNames = TRUE,
+  startCol = NULL,
   rowNames = FALSE,
-  detectDates = TRUE,
-  skipEmptyCols = FALSE,
+  colNames = TRUE,
   skipEmptyRows = FALSE,
+  skipEmptyCols = FALSE,
   rows = NULL,
   cols = NULL,
+  detectDates = TRUE,
   na.strings = "#N/A",
   na.numbers = NA,
   fillMergedCells = FALSE,
@@ -376,6 +378,7 @@ wb_to_df <- function(
   keep_rows <- rownames(z)

   maxRow <- max(as.numeric(keep_rows))
+  maxCol <- max(col2int(keep_cols))

   if (startRow > 1) {
     keep_rows <- as.character(seq(startRow, maxRow))
@@ -407,6 +410,25 @@ wb_to_df <- function(
     }
   }

+  if (!is.null(startCol)) {
+    keep_cols <- int2col(seq(col2int(startCol), maxCol))
+
+    if (!all(keep_cols %in% colnames(z))) {
+      keep_col <- keep_cols[!keep_cols %in% colnames(z)]
+
+      z[keep_col] <- NA_character_
+      tt[keep_col] <- NA_character_
+
+      # return expected order of columns
+      z <- z[keep_cols]
+      tt <- tt[keep_cols]
+    }
+
+
+    z <- z[, colnames(z) %in% keep_cols, drop = FALSE]
+    tt <- tt[, colnames(tt) %in% keep_cols, drop = FALSE]
+  }
+
   if (!is.null(cols)) {
     keep_cols <- int2col(cols)

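The `startCol` branch in this hunk can be read as three steps: build the column range from `startCol` to the last used column, pad columns missing from the sheet with `NA`, then drop everything outside the range. The sketch below replays those steps on a toy data frame in base R. `col2int_sketch()`, `int2col_sketch()` and `select_from_start_col()` are hypothetical stand-ins written for this illustration; the first two are simplified replacements for the package helpers `col2int()` and `int2col()` and only handle the single-letter columns A to Z.

```r
# Hypothetical, simplified stand-ins for col2int() and int2col();
# they only cover single-letter columns A..Z.
col2int_sketch <- function(x) {
  if (is.numeric(x)) return(as.integer(x))
  match(toupper(x), LETTERS)
}
int2col_sketch <- function(x) LETTERS[x]

# Toy stand-in for the internal data frame `z`: data lives in E and F only.
z <- data.frame(E = c("4", "4", "7"), F = c("2", "10", "4"),
                stringsAsFactors = FALSE)

select_from_start_col <- function(z, startCol) {
  maxCol <- max(col2int_sketch(colnames(z)))
  keep_cols <- int2col_sketch(seq(col2int_sketch(startCol), maxCol))

  # Columns inside the requested range but absent from the sheet become NA.
  missing_cols <- keep_cols[!keep_cols %in% colnames(z)]
  if (length(missing_cols) > 0) {
    z[missing_cols] <- NA_character_
    z <- z[keep_cols]  # restore spreadsheet column order
  }

  # Drop every column left of startCol.
  z[, colnames(z) %in% keep_cols, drop = FALSE]
}

names(select_from_start_col(z, 1))    # "A" "B" "C" "D" "E" "F"
names(select_from_start_col(z, "F"))  # "F"
```

One property of this approach worth knowing: `seq(from, to)` counts downwards when `from` is larger than `to`, so a `startCol` beyond the last used column yields a descending range rather than an empty one. Whether that case is handled elsewhere is not visible in this diff.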
14 changes: 8 additions & 6 deletions man/read_xlsx.Rd
14 changes: 8 additions & 6 deletions man/wb_read.Rd
17 changes: 10 additions & 7 deletions man/wb_to_df.Rd

Some generated files are not rendered by default.


12 changes: 12 additions & 0 deletions tests/testthat/test-read_from_created_wb.R
@@ -111,3 +111,15 @@ test_that("dims != rows & cols", {
   expect_equal("4", rownames(got6))

 })
+
+test_that("read startCol", {
+
+  wb <- wb_workbook()$add_worksheet()$add_data(x = cars, startCol = "E")
+
+  got <- wb_to_df(wb, startCol = 1, colNames = FALSE)
+  expect_equal(LETTERS[1:6], names(got))
+
+  got <- wb_to_df(wb, startCol = "F", colNames = FALSE)
+  expect_equal(LETTERS[6], names(got))
+
+})