R Package - Download and Prepare C14 Dates from Different Source Databases

c14bazAAR

c14bazAAR is an R package to query different openly accessible radiocarbon date databases. It allows basic data cleaning, calibration and merging. It serves as the back end of our neolithicRC WebApp. If you're not familiar with R, the WebApp or other tools (such as GoGet) to search for radiocarbon dates might be better suited to your needs.

If you want to use data downloaded with c14bazAAR or neolithicRC for your research, you have to cite the source databases. Most databases have a preferred citation format, which may change over time with new versions and publications. Please check the respective homepages to find out more. The output of c14bazAAR does not contain the full citations of the individual dates, only a short reference tag. For further information you have to consult the source databases.

Installation

c14bazAAR is not on CRAN yet, but you can install it from GitHub. To do so, run the following lines in your R console:

if(!require('devtools')) install.packages('devtools')
devtools::install_github("ISAAKiel/c14bazAAR")

The package depends on many other packages, many of which are only necessary for specific tasks. Functions that require packages you don't have installed yet will stop and ask you to install them. Please do so with install.packages(), which downloads and installs the respective packages from CRAN.

How to use

The package contains a set of getter functions (see below) to query the databases. Not every available variable from every archive is downloaded. Instead, c14bazAAR focuses on a selection of the most important and most common variables to achieve a certain degree of standardization. The downloaded dates are stored in the custom S3 class c14_date_list, which acts as a wrapper around the tibble class and provides specific class methods.

One (almost) complete workflow to download and prepare all dates can be triggered like this:

library(c14bazAAR)
library(magrittr)

get_all_dates() %>%
  calibrate() %>%
  mark_duplicates() %>%
  classify_material() %>%
  finalize_country_name() %>%
  coordinate_precision()

Running all of this takes quite some time and is probably not necessary for your use case. Here's a list of the main tasks c14bazAAR can handle, so you can pick what you need:

Download

c14bazAAR contains a growing selection of getter functions to download radiocarbon date databases. Here's a list of all available getters. You can download all dates at once with get_all_dates(). The getters download the data, adjust the variable selection according to a defined variable key and transform the resulting list into a c14_date_list.

See ?get_dates for more information.

x <- get_all_dates()

Calibration

The calibrate() function calibrates all valid dates in a c14_date_list individually with Bchron::BchronCalibrate(). It provides two different types of output: calprobdistr and calrange.

See ?calibrate for more information.

x %>% calibrate()

Material classification

Most 14C databases provide some information about the material sampled for the individual date. Unfortunately this information is often very specific and makes filtering operations difficult for large datasets. The function classify_material() relies on a custom made classification to simplify this data.

See ?classify_material for more information and look here for a change log of the thesaurus.

x %>% classify_material()
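
The idea of a thesaurus-based classification can be illustrated with a minimal base R sketch. It is not the implementation used by classify_material(): the function classify_material_sketch(), the thesaurus entries and the column names labnr, material and material_class are assumptions made up for this example.

```r
# Hypothetical thesaurus: maps raw material descriptions to coarse classes.
# The entries are illustrative only and are not taken from c14bazAAR.
material_thesaurus <- data.frame(
  raw = c("charcoal", "charred grain", "human bone", "bone (cattle)"),
  class = c("charcoal", "plant remains", "bone", "bone"),
  stringsAsFactors = FALSE
)

# Look up each raw value after trimming whitespace and lowercasing it.
# Values without a thesaurus entry become NA.
classify_material_sketch <- function(material, thesaurus) {
  key <- tolower(trimws(material))
  thesaurus$class[match(key, thesaurus$raw)]
}

# Made-up example dates (column names are assumptions).
dates <- data.frame(
  labnr = c("A-1", "A-2", "A-3"),
  material = c("Charcoal ", "human bone", "shell"),
  stringsAsFactors = FALSE
)
dates$material_class <- classify_material_sketch(dates$material, material_thesaurus)
```

Unknown terms such as "shell" stay NA in this sketch, which makes gaps in the thesaurus easy to spot.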

Country attribution

Filtering 14C dates by country is useful for a first spatial limitation and especially important if no coordinates are documented. Most databases provide the variable country, but they don't rely on a unified naming convention and therefore use various terms to represent the same entity. The function standardize_country_name() tries to unify semantically equivalent terms by string comparison with the curated country name list countrycode::codelist and a custom-made thesaurus. Beyond that, it turned out to be much more reliable to look at the coordinates to determine the country.

That's what the function determine_country_by_coordinate() does. It joins the position with country polygons from rworldxtra::countriesHigh to get reliable country attribution.

The function finalize_country_name() finally combines the initial country information in the database and the results of the two previous functions to forge a single column country_final. If the necessary columns are missing, it calls the previous functions automatically.

See ?country_attribution for more information.

x %>%
  standardize_country_name() %>%
  determine_country_by_coordinate() %>%
  finalize_country_name()
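
To illustrate what unifying country names by string comparison can look like, here is a minimal base R sketch based on edit distance. It is only an approximation of the idea, not the method used by standardize_country_name(): the function standardize_country_sketch(), the short reference list and the threshold max_dist are assumptions for this example.

```r
# Hypothetical reference list; the package uses countrycode::codelist instead.
reference_names <- c("Germany", "France", "United Kingdom")

# Match each input to the closest reference name by edit distance.
# Inputs further away than max_dist (an assumed threshold) become NA.
# The sketch expects non-missing character input.
standardize_country_sketch <- function(country, reference, max_dist = 2) {
  # Edit distance between every input (rows) and every reference name (columns).
  d <- adist(tolower(country), tolower(reference))
  best <- apply(d, 1, which.min)
  best_dist <- d[cbind(seq_along(country), best)]
  ifelse(best_dist <= max_dist, reference[best], NA_character_)
}

standardized <- standardize_country_sketch(
  c("germany", "Frnace", "UK"),
  reference_names
)
```

The abbreviation "UK" is too far from every reference name to be matched by distance alone. Cases like this are where an additional thesaurus of known variants is needed.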

Duplicates

Some of the source databases already contain duplicated dates, and you will certainly get more if you combine different databases. As a result of the long history of these archives, which includes even mutual absorption, duplicates make up a significant proportion of combined datasets. The function mark_duplicates() adds a column duplicate_group to the c14_date_list that assigns duplicates found by lab code comparison a common group number. This should allow you to make an educated decision about which dates to discard.

For an automatic removal there's the function remove_duplicates(). It boils down all dates in a duplicate_group to one entry. Unequal values become NA. All variants for all columns are documented within a string in the column duplicate_remove_log.

See ?duplicates for more information.

x %>%
  mark_duplicates() %>%
  remove_duplicates()
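
The two steps can be sketched in base R as follows. This is a simplified illustration under assumptions, not the code behind mark_duplicates() and remove_duplicates(): the example data, the helper collapse_group() and the exact matching of lab codes are made up here, and the log column duplicate_remove_log is left out.

```r
# Made-up example dates; labnr holds the lab code (column names are assumptions).
dates <- data.frame(
  labnr = c("KIA-1234", "KIA-1234", "OxA-77", "Poz-5"),
  c14age = c(5000, 5000, 4200, 3100),
  site = c("Site A", "Site a", "Site B", "Site C"),
  stringsAsFactors = FALSE
)

# Step 1: lab codes that occur more than once get a shared group number.
# Dates without a duplicate keep NA.
is_dup <- dates$labnr %in% dates$labnr[duplicated(dates$labnr)]
dates$duplicate_group <- NA_integer_
dates$duplicate_group[is_dup] <- match(
  dates$labnr[is_dup],
  unique(dates$labnr[is_dup])
)

# Step 2: collapse one duplicate group to a single row.
# A value is kept only if all rows of the group agree on it, otherwise it is NA.
collapse_group <- function(group) {
  as.data.frame(
    lapply(group, function(col) if (length(unique(col)) == 1) col[1] else NA),
    stringsAsFactors = FALSE
  )
}
collapsed <- collapse_group(dates[is_dup, ])
```

In this example the two entries agree on labnr and c14age but spell the site differently, so site becomes NA in the collapsed row.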

Coordinate precision

The function coordinate_precision() calculates the precision of the coordinate information. It relies on the number of digits in the columns lat and lon. The mean of the inaccuracy on the x and y axes in meters is stored in the additional column coord_precision.

See ?coordinate_precision for more information.

x %>% coordinate_precision()
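
How a precision in meters can be derived from the number of digits is shown in the following base R sketch. It is an assumption about one plausible approach, not the formula used by coordinate_precision(): the helpers count_decimals() and coord_precision_sketch() are hypothetical, and the value of about 111320 m per degree is a common approximation.

```r
# Count the decimal digits of a number as it is printed by as.character().
# Assumes ordinary coordinate values (no scientific notation).
count_decimals <- function(x) {
  txt <- as.character(x)
  ifelse(grepl(".", txt, fixed = TRUE), nchar(sub("^[^.]*\\.", "", txt)), 0L)
}

# Mean of the north-south and east-west inaccuracy in meters.
coord_precision_sketch <- function(lat, lon) {
  meters_per_degree <- 111320  # approximate length of one degree of latitude
  # One unit in the last documented digit, converted to meters.
  lat_m <- meters_per_degree * 10^(-count_decimals(lat))
  # Degrees of longitude shrink with the cosine of the latitude.
  lon_m <- meters_per_degree * cos(lat * pi / 180) * 10^(-count_decimals(lon))
  (lat_m + lon_m) / 2
}

decimals <- count_decimals(c(52.52, 48, 51.123456))
precision <- coord_precision_sketch(lat = 0, lon = 10.5)
```

A coordinate documented without decimals is only accurate to roughly a hundred kilometers, while each additional digit improves the precision by a factor of ten.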

Conversion

A c14_date_list can be directly converted to other R data structures. So far only as.sf() is implemented. The sf package provides great tools to manipulate and plot spatial vector data. This simplifies certain spatial operations with the date point cloud.

See ?as.sf for more information.

x %>% as.sf()

Technical functions

c14_date_lists are constructed with as.c14_date_list. This function takes data.frames or tibbles and adds the c14_date_list class tag. It also calls order_variables() to establish a certain variable order and enforce_types() which converts all variables to the correct data type. There are custom print() and format() methods for c14_date_lists.

The fuse() function row-binds multiple c14_date_lists.

See ?as.c14_date_list and ?fuse.

x1 <- data.frame(
  c14age = 2000,
  c14std = 30
) %>% as.c14_date_list()

x2 <- fuse(x1, x1)
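
The general pattern of a class tag on top of a table, with its own print() method and a row-binding helper, can be sketched in base R. This is not the c14bazAAR code: the class name date_list_sketch and the functions as_date_list_sketch() and fuse_sketch() are hypothetical, and the sketch wraps a plain data.frame instead of a tibble.

```r
# Constructor: add a custom class tag to a data.frame.
as_date_list_sketch <- function(x) {
  stopifnot(is.data.frame(x))
  # Put the custom class in front so its methods are dispatched first.
  class(x) <- c("date_list_sketch", class(x))
  x
}

# Custom print method for the new class.
print.date_list_sketch <- function(x, ...) {
  cat("Radiocarbon date list with", nrow(x), "dates\n")
  invisible(x)
}

# Row-bind several date lists and restore the class tag afterwards.
fuse_sketch <- function(...) {
  parts <- lapply(list(...), as.data.frame)
  as_date_list_sketch(do.call(rbind, parts))
}

y1 <- as_date_list_sketch(data.frame(c14age = 2000, c14std = 30))
y2 <- fuse_sketch(y1, y1)
```

Because the class tag sits in front of data.frame, all ordinary data.frame operations keep working on the object.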

Databases

To suggest other archives to be queried you can join the discussion here.

Contributing

If you would like to contribute to this project, please start by reading our Guide to Contributing. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

License

The code in this project is released under the terms and conditions of the GNU General Public License, Version 2. The source databases are published under different licenses.