readcsvs

A single-function R package that reads multiple CSV files and combines them into one data frame using DuckDB's parallel CSV reader.

Overview

readcsvs::read_csvs() uses DuckDB to read and union all CSV files matching a pattern, adding a filename column that records the source file of each row. It normalizes column names and accepts custom quote characters, delimiters, skipped lines, and headerless files.

Installation

# Install devtools if needed
install.packages("devtools")

# Install package from GitHub
devtools::install_github("RayTellis/readcsvs")

Usage

read_csvs(
  pattern = "*",
  quote = "\"",
  delim = ",",
  skip = 0,
  header = TRUE
)

Parameters

Parameter	Type	Default	Description
`pattern`	character	`"*"`	File pattern to match (without `.csv` extension). Supports wildcards.
`quote`	character	`"\""`	Character used to quote fields in the CSV files.
`delim`	character	`","`	Field delimiter character.
`skip`	integer	`0`	Number of lines to skip at the beginning of each file.
`header`	logical	`TRUE`	Whether the first row contains column names.

Features

Efficient parallel processing: Uses DuckDB's optimized CSV reader
Automatic file combining: Unions multiple CSVs with union_by_name = true, so columns are matched by name rather than by position
Filename tracking: Adds a filename column showing the source file (without path or extension)
Column normalization: Automatically normalizes column names across files
Flexible matching: Supports wildcard patterns to select specific files
Clean resource management: Automatically disconnects from DuckDB on exit
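
The features above correspond closely to options of DuckDB's read_csv table function. The following is a minimal sketch of how such a reader could be assembled with the DBI and duckdb packages. It is not the package's actual source: the names build_csv_query() and read_csvs_sketch() are hypothetical, and the use of normalize_names = true is an assumption about how column normalization is done.

```r
# Hypothetical helper: build the DuckDB query for a glob pattern.
# Not taken from the readcsvs source; shown for illustration only.
build_csv_query <- function(pattern = "*", quote = "\"", delim = ",",
                            skip = 0, header = TRUE) {
  # Double any single quotes so values are safe inside SQL string literals.
  esc <- function(x) gsub("'", "''", x, fixed = TRUE)
  sprintf(
    paste0(
      "SELECT * FROM read_csv('%s.csv', ",
      "union_by_name = true, filename = true, normalize_names = true, ",
      "quote = '%s', delim = '%s', skip = %d, header = %s)"
    ),
    esc(pattern), esc(quote), esc(delim),
    as.integer(skip), tolower(as.character(isTRUE(header)))
  )
}

# Hypothetical reader: run the query and tidy the filename column.
read_csvs_sketch <- function(...) {
  con <- DBI::dbConnect(duckdb::duckdb())
  # Disconnect even if the query fails.
  on.exit(DBI::dbDisconnect(con, shutdown = TRUE), add = TRUE)
  out <- DBI::dbGetQuery(con, build_csv_query(...))
  # DuckDB reports the file path; keep only the base name without extension.
  out$filename <- tools::file_path_sans_ext(basename(out$filename))
  out
}
```

Keeping the query construction separate from the database call makes it possible to inspect the generated SQL, for example with cat(build_csv_query("sales_*")), without opening a connection.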

Examples

Read all CSV files in the current directory

data <- read_csvs()

Read files matching a specific pattern

# Read all files starting with "sales_"
sales_data <- read_csvs(pattern = "sales_*")

# Read a single file (customers_2024.csv)
customer_data <- read_csvs(pattern = "customers_2024")

Handle different CSV formats

# Tab-delimited files
tsv_data <- read_csvs(pattern = "data_*", delim = "\t")

# Files with single quotes
quoted_data <- read_csvs(pattern = "export_*", quote = "'")

# Skip the first two lines and treat the files as having no header row
no_header_data <- read_csvs(pattern = "raw_*", skip = 2, header = FALSE)

Output

Returns a data frame with:

A filename column containing the source file name (without path or .csv extension)
All columns from the matched CSV files
Rows from all files combined
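
Because every row carries its source in the filename column, the combined data frame can be summarized or split per file with base R. The sketch below uses a small stand-in data frame rather than a real read_csvs() result; the amount column and the file names are invented for illustration.

```r
# Stand-in for the result of read_csvs(); only `filename` is a column the
# package is documented to add, the rest is made up for this example.
data <- data.frame(
  filename = c("sales_2023", "sales_2023", "sales_2024"),
  amount   = c(10, 20, 5)
)

# Count the rows contributed by each source file.
rows_per_file <- table(data$filename)

# Split the combined data back into one data frame per source file.
by_file <- split(data, data$filename)
```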

Notes

The .csv extension is automatically added to the pattern, so you don't need to include it
Column names are normalized across all files for consistent merging
Files are matched relative to the current working directory
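
Since the extension is appended and matching is relative to the working directory, it can help to check which files a pattern would pick up before reading them. The helper below is a hypothetical sketch using base R's list.files() and utils::glob2rx(); preview_matches() is not part of the package, and DuckDB's own glob matching may differ from it in edge cases.

```r
# Hypothetical helper: list files in `dir` whose names match "<pattern>.csv".
preview_matches <- function(pattern = "*", dir = ".") {
  # glob2rx() converts a wildcard pattern into an anchored regular expression.
  list.files(dir, pattern = utils::glob2rx(paste0(pattern, ".csv")))
}

# Example: files in the current working directory that "sales_*" would match.
sales_files <- preview_matches("sales_*")
```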
