Data Package Manager for R
dpmr has three core functions:
datapackage_init: initialises a new data package from an R data frame and (optionally) a metadata list.
datapackage_install: installs a data package either stored locally or remotely, e.g. on GitHub.
datapackage_info: reads a data package's metadata (stored in its datapackage.json file) into the R Console and (optionally) as a list.
Create Data Packages
To initiate a barebones data package called My_Data_Package in the current working directory, use:
library(dpmr)

# Create fake data
A <- B <- C <- sample(1:20, size = 20, replace = TRUE)
ID <- sort(rep('a', 20))
Data <- data.frame(ID, A, B, C)

datapackage_init(df = Data, package_name = 'My_Data_Package')
This will create a data package with barebones metadata in a datapackage.json file. You can then edit this by hand.
Alternatively, you can create a list with the metadata in R and have it included with the data package:
meta_list <- list(name = 'My_Data_Package',
                  title = 'A fake data package',
                  last_updated = Sys.Date(),
                  version = '0.1',
                  license = data.frame(type = 'PDDL-1.0',
                                       url = 'http://opendatacommons.org/licenses/pddl/'),
                  sources = data.frame(name = 'Fake', web = "No URL, it's fake."))

datapackage_init(df = Data, meta = meta_list)
Note that if you don't include the resources fields in your metadata list, they will be added automatically. These fields identify the data files' paths.
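For orientation, the Data Package specification describes resources as an array of objects, each of which can record a data file's path. The base-R sketch below writes a minimal datapackage.json of that shape by hand. It assumes the data file sits at data/Data.csv inside the package directory; the helper write_minimal_datapackage is hypothetical and not part of dpmr, and the file dpmr itself generates may contain other fields.

```r
# Hypothetical helper (not part of dpmr): write a minimal datapackage.json
# whose "resources" entry records the name and path of one CSV data file.
write_minimal_datapackage <- function(pkg_dir, pkg_name, data_file) {
  dir.create(pkg_dir, showWarnings = FALSE, recursive = TRUE)
  # Resource name derived from the file name, e.g. "data/Data.csv" -> "data"
  res_name <- tolower(sub("\\.[^.]*$", "", basename(data_file)))
  json <- sprintf(
    paste0(
      '{\n',
      '  "name": "%s",\n',
      '  "resources": [\n',
      '    {\n',
      '      "name": "%s",\n',
      '      "path": "%s"\n',
      '    }\n',
      '  ]\n',
      '}'
    ),
    pkg_name, res_name, data_file
  )
  out <- file.path(pkg_dir, "datapackage.json")
  writeLines(json, out)
  invisible(out)
}

# Usage: the data/Data.csv location is an assumption for illustration
pkg_dir <- file.path(tempdir(), "My_Data_Package")
json_path <- write_minimal_datapackage(pkg_dir, "My_Data_Package", "data/Data.csv")
cat(readLines(json_path), sep = "\n")
```

Writing the JSON with sprintf keeps the sketch dependency-free; in practice a JSON library is the safer way to produce the file.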
Installing Data Packages
To load a data package called gdp stored in the current working directory, use:
gdp_data <- datapackage_install(path = 'gdp/')
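Before installing from a local directory, it can help to confirm that the directory really is a data package, i.e. that it has a datapackage.json file at its top level. The helper is_datapackage_dir below is a hypothetical base-R check, not part of dpmr; it only tests for the file's presence and does not validate its contents.

```r
# Hypothetical helper (not part of dpmr): TRUE when `path` is a directory
# containing a datapackage.json file at its top level.
is_datapackage_dir <- function(path) {
  dir.exists(path) && file.exists(file.path(path, "datapackage.json"))
}

# Usage: only call datapackage_install() when the check passes
if (is_datapackage_dir("gdp/")) {
  gdp_data <- datapackage_install(path = "gdp/")
} else {
  message("gdp/ does not look like a data package")
}
```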
From the web
You can install a package stored remotely using its URL. In this example we directly download the gdp data package from GitHub using the URL for its zip file:
URL <- 'https://github.com/datasets/gdp/archive/master.zip'
gdp_data <- datapackage_install(path = URL)
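If you would rather inspect a remote package before installing it, one base-R alternative is to download and unzip the archive yourself and then pass the extracted folder to datapackage_install(). This sketch assumes GitHub's convention that a branch archive unpacks into a single top-level folder named <repo>-<branch> (here gdp-master); the helpers github_archive_dir and download_datapackage are hypothetical and not part of dpmr.

```r
# Hypothetical helper: derive the folder a GitHub branch archive unpacks to,
# e.g. ".../datasets/gdp/archive/master.zip" -> "gdp-master".
github_archive_dir <- function(url) {
  parts <- strsplit(url, "/", fixed = TRUE)[[1]]
  repo <- parts[length(parts) - 2]
  branch <- sub("\\.zip$", "", parts[length(parts)])
  paste(repo, branch, sep = "-")
}

# Hypothetical helper: download and unzip the archive into `dest`,
# returning the path of the extracted data package directory.
download_datapackage <- function(url, dest = tempdir()) {
  zip_file <- file.path(dest, basename(url))
  utils::download.file(url, zip_file, mode = "wb")
  utils::unzip(zip_file, exdir = dest)
  file.path(dest, github_archive_dir(url))
}

# Usage (requires network access):
# pkg_path <- download_datapackage('https://github.com/datasets/gdp/archive/master.zip')
# gdp_data <- datapackage_install(path = pkg_path)
```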
Get Data Package Metadata
Use datapackage_info to read a data package's metadata into R:
# Print information when the working directory is a data package
datapackage_info()
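Without dpmr at hand, a rough base-R stand-in for viewing the metadata is to echo the raw datapackage.json text. The hypothetical helper show_datapackage_json below does only that: unlike datapackage_info() it does not parse the JSON or return it as a list (that would require a JSON parser).

```r
# Hypothetical helper (not part of dpmr): print the raw contents of a data
# package's datapackage.json and return its lines invisibly.
show_datapackage_json <- function(path = ".") {
  json_file <- file.path(path, "datapackage.json")
  if (!file.exists(json_file)) {
    stop("No datapackage.json found in ", normalizePath(path, mustWork = FALSE))
  }
  lines <- readLines(json_file, warn = FALSE)
  cat(lines, sep = "\n")
  invisible(lines)
}

# Usage: print the metadata of the gdp package in the working directory
# show_datapackage_json('gdp/')
```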
To-do for v0.2
datapackage_update for updating a data package's data and metadata.
Specify data variable descriptions in meta list.
Load inline data from the datapackage.json file.
Load data from a GitHub repo using GitHub usernames and repos.
Integrate Octopub API.
Licensed under GPL-3