db2pq is an R package for moving data from PostgreSQL into Apache Parquet files.
It is designed for both general PostgreSQL sources and the WRDS PostgreSQL service.
- Export a single PostgreSQL table to Parquet.
- Export all tables in a PostgreSQL schema to Parquet.
- Update WRDS Parquet files only when the source table is newer.
- Read and manage
last_modifiedmetadata embedded in Parquet files. - Archive and restore historical versions of Parquet files.
# install.packages("pak")
pak::pak("iangow/db2pqr")You can pass a WRDS username directly for a first call:
library(db2pq)
wrds_update_pq("dsi", "crsp", wrds_id = "your_wrds_id")For repeated use, configure the WRDS username and PostgreSQL password outside
the call. The authentication
article
documents the WRDS_ID, .pgpass, and wrds::wrds_set_credentials() paths.
The remaining WRDS examples assume that setup is in place.
wrds_update_pq("dsi", "crsp", force = TRUE)wrds_update_pq("dsi", "crsp", use_sas = TRUE)SSH setup is only needed for this SAS metadata path. See the WRDS SSH setup article for the key-based setup used by that option.
wrds_schema_to_pq("crsp")wrds_sql_to_pq(
"SELECT permno, date, ret FROM crsp.dsf WHERE date >= '2024-01-01'",
table_name = "dsf_recent",
schema = "crsp"
)db_to_pq(
table_name = "company",
schema = "comp",
keep = c("gvkey", "conm"),
rename = c(conm = "company_name")
)pq_last_modified(schema = "crsp")The stable default transfer path uses DBI/RPostgres. The optional ADBC path can
be selected with transfer_method = "adbc" when adbi and a PostgreSQL ADBC
driver are installed:
adbc_diagnostics()
wrds_update_pq("dsi", "crsp", transfer_method = "adbc")If ADBC reports an SSL/libpq error, use transfer_method = "dbi" or install a
current SSL-capable adbcpostgresql build.
Files are organized as:
<DATA_DIR>/<schema>/<table>.parquet
For example:
~/pq_data/crsp/dsi.parquet
The DATA_DIR environment variable sets the root directory. It can also be
passed directly as data_dir to any function.
When archive = TRUE, replaced files are moved to:
<DATA_DIR>/<schema>/archive/<table>_<timestamp>.parquet
MIT License. See LICENSE.md.