(Click on the logo above to play the Video)
Paper: The potential of intra-annual density information for crossdating of short tree-ring series
Thesis: Cross-Dating of Intra-Annual Wood Density Series
Further Information: Appendix
The correct calendar dates of wood pieces can be determined from their annual rings. Classically, one measures the widths or maximum densities of the individual rings and shifts the resulting series along a chronology with known dates to determine the exact calendar years. Such established approaches, however, do not work well for short pieces, i.e. wood pieces containing only a few annual rings. Our new approach is different: it uses series of densities within each ring, and it has been shown to date even short wood pieces correctly. This makes it one of the most accurate approaches in dendrochronology today!
Different libraries are used for visualization, simplification, testing and fast code execution.
Essential libraries for fast and clean code:
Library | Version | Description |
---|---|---|
fitdistrplus | 1.09 | fitting distributions to data |
plyr | 1.8.4 | data reconversion with C++ calls |
rlist | 0.4.6.1 | to build complex data-structures and avoid unnecessary code |
stringr | 1.2.0 | for consistent usage of string-operations |
These libraries can be installed with the command `install.packages(c("fitdistrplus", "plyr", "rlist", "stringr"))`. You may have to install exactly the versions listed above for the code to execute correctly!
In addition, rJava and the Java runtime have to be installed to execute MICA; a manual can be found here. Java 8 Update 152 together with R 3.4.3 were successfully tested within this project. It should be enough to execute the command `install.packages("rJava")`.
MICA 2.02 is extracted into the `libraries` folder. Note that the script `mica-functions.R` was extended for this project, so it cannot be replaced by the original version without breaking the code.
For visualizations and unit-testing, the following libraries are necessary:
Library | Version | Description |
---|---|---|
ggplot2 | 2.2.1 | grammar of graphics based plotting system |
gridExtra | 2.3 | arranging ggplots in grids |
reshape2 | 1.4.3 | fast reshaping with C++ calls |
scales | 0.4.1 | custom axes for ggplot |
testthat | 2.0.0 | unit-testing maths, loading and visualization |
VennDiagram | 1.6.2 | to create Venn- and Euler-diagrams |
These libraries can be installed with the command `install.packages(c("ggplot2", "gridExtra", "reshape2", "scales", "testthat", "VennDiagram"))`; additionally, each package in the table is linked to its repository, paper or homepage. You may have to install exactly the versions listed above for the code to execute correctly!
Hint: During the installation, the dependencies of `ggplot2` may not be installed correctly. In this case, run `install.packages("ggplot2")` again.
After you have installed the necessary libraries, the different approaches can easily be tested. Set the working directory to the folder containing the file `Main.R`. The function `Main.interface()`, which is executed by the `Main.main` function, contains preset examples for testing the approaches. Uncomment the lines of the approach you want to execute (Ctrl+Shift+C), then select all code in `Main.R` (Ctrl+A) and press Ctrl+Enter. The execution takes a few minutes; meanwhile, the number of finished samples and other progress information is written to the console as visual feedback.
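In the R console, the workflow above amounts to roughly the following sketch (the project path is a placeholder, and it is assumed that sourcing `Main.R` has the same effect as running the whole file with Ctrl+A, Ctrl+Enter):

```r
# Placeholder path -- adjust to the folder that contains Main.R.
setwd("~/projects/crossdating")

# Running the whole of Main.R executes the examples that are
# currently uncommented in Main.interface().
source("Main.R")
```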
To execute the unit tests, go into the folder `tests` and execute the file `run_tests.R`. Do not set this folder as the working directory, and do not forget to install `testthat` and the other libraries required for the visualizations!
The interface allows you to execute each base approach presented in the theoretical part, and thereby to date samples of intra-annual wood-density series. As output you get a matrix with the most probable dates together with the corresponding scores and p-values.
The inputs are `*.csv` files with the following structure:
year | density | characteristic |
---|---|---|
1992 | 2.016 | 166 |
1992 | 2.433 | 166 |
1992 | 2.881 | 166 |
1993 | 2.043 | 128 |
1993 | 2.383 | 128 |
⋮ | ⋮ | ⋮ |
Rows sharing the same year correspond to one density profile. The `characteristic` column is optional, depending on the approach: a two-step approach is implemented which can use ring widths or maximum densities to speed up the dating procedure, and this column is used to store those ring widths or maximum densities.
Samples you want to date must have a similar format. Concretely, the column `year` is replaced by a column called `part` with numbers 1, 2, 3, ... identifying the different profiles. For testing purposes, a chronology and samples in the presented formats were prepared under `\input\interface\pass_1`.
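As a minimal sketch, such input files could be produced with base R as follows; the file names and the sample values are made up, and the values for the chronology are taken from the example table above. Whether the interface expects exactly this CSV dialect (separator, quoting) is an assumption; the sketch only illustrates the column layout:

```r
# One row per density value, grouped by calendar year; the optional
# characteristic column stores e.g. ring widths or maximum densities.
chronology <- data.frame(
  year           = c(1992, 1992, 1992, 1993, 1993),
  density        = c(2.016, 2.433, 2.881, 2.043, 2.383),
  characteristic = c(166, 166, 166, 128, 128)
)
write.csv(chronology, "chronology.csv", row.names = FALSE)

# A sample to be dated uses "part" instead of "year" to number
# its profiles (values are invented for illustration).
sample <- data.frame(
  part    = c(1, 1, 2, 2),
  density = c(2.1, 2.5, 2.0, 2.4)
)
write.csv(sample, "sample.csv", row.names = FALSE)
```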
The output of the interface functions is a matrix with the following structure:
sample | pValue | rank1 | score1 | rank2 | … |
---|---|---|---|---|---|
1041_MICA-cons | 0.06209708 | 1960 | 41.6288 | 1947 | … |
1051_MICA-cons | 0.01127934 | 1987 | 14.9864 | 1940 | … |
1201_MICA-cons_1 | 0.06737666 | 1957 | 24.6861 | 1955 | … |
1201_MICA-cons_2 | 0.066093 | 1941 | 24.6395 | 1962 | … |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | … |
where the `pValue` column as well as the `score` columns are only optionally available (in the Two-Step and Bucket approaches). The first column shows the name of the dated sample; next to it is the p-value for the rank-1 prediction. Then follows a column with the rank-1 prediction (= most probable start year) and a column with the corresponding score. The user can specify the number of predictions, so there can also be rank-3 or rank-4 predictions.
This matrix can automatically be stored as a `*.csv` file: the `Interface` class provides an option to store the computed data in the `output` folder.
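Assuming the matrix was saved, a stored result could be inspected like this (the file name is a placeholder, not necessarily the name the project actually writes):

```r
# Placeholder file name inside the output folder.
dates <- read.csv("output/dates.csv")

# Extract the rank-1 prediction (most probable start year) per sample,
# together with its score and p-value where available.
dates[, c("sample", "pValue", "rank1", "score1")]
```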
The project provides a file called `Main.R`. It contains the function `Main.interface`, which holds examples for each of the approaches:

- `Interface.computeDatesConsensusApproach(..)`: executes the Consensus Approach per sample
- `Interface.computeDatesBucketApproach(..)`: executes the Bucket Approach per sample
- `Interface.computeDatesPerTreeApproach(..)`: executes the Per-Tree Approach per sample
- `Interface.computeDatesVotingApproach(..)`: executes the Voting Approach per sample
- `Interface.computeDatesTwoStepApproach(..)`: executes first a fast Points-Based Approach (correlation coefficient / t-value based) and afterwards the Bucket Approach on the potentially correct years found by the Points-Based Approach
Hint: Computed chronologies and samples must not contain gaps. Check that at least one consensus profile or bucket is available for each year!
`Interface.computeDatesConsensusApproach(consensusPath, consensusName, samplesPath, scoreType, bestYearsMax, save, fileName)`

`Interface.computeDatesConsensusApproach(..)` executes the Consensus Approach per sample.

Input parameters:

- `consensusPath {string}`: the path to the consensus
- `consensusName {string}`: the name of the consensus file
- `samplesPath {string}`: the path to the samples which should be dated
- `scoreType {string}`: the score type used for the computation (`"a"` = y-based, `"b"` = slope-based, `"c"` = z-scores y-based, `"d"` = z-scores slope-based)
- `bestYearsMax {numeric}`: how many of the best-ranked years should be stored
- `save {logical}`: whether the list of possible dates should be stored in the project's `output` folder
- `fileName {string}`: the filename without extension for the stored per-sample dates file

Output:

- `{matrix}`: the matrix of possible dates (see Output)
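A hypothetical call could look as follows; all paths, names and argument values are placeholders chosen for illustration, assuming the prepared test data under `input/interface/pass_1` follows the layout described above:

```r
# Hypothetical call -- paths, consensus name and file name are placeholders.
dates <- Interface.computeDatesConsensusApproach(
  consensusPath = "input/interface/pass_1",          # assumed test-data folder
  consensusName = "chronology",                      # assumed consensus file name
  samplesPath   = "input/interface/pass_1/samples",  # assumed samples folder
  scoreType     = "a",    # y-based scoring
  bestYearsMax  = 4,      # keep the four best-ranked years
  save          = TRUE,   # write the result into the output folder
  fileName      = "dates_consensus"
)
```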
`Interface.computeDatesPerTreeApproach(consensusPath, consensusName, samplesPath, scoreType, bestYearsMax, save, fileName)`

`Interface.computeDatesPerTreeApproach(..)` executes the Per-Tree Approach per sample.

Input parameters:

- `consensusPath {string}`: the path to the consensus
- `consensusName {string}`: the name of the consensus file
- `samplesPath {string}`: the path to the samples which should be dated
- `scoreType {string}`: the score type used for the computation (`"a"` = y-based, `"b"` = slope-based, `"c"` = z-scores y-based, `"d"` = z-scores slope-based)
- `bestYearsMax {numeric}`: how many of the best-ranked years should be stored
- `save {logical}`: whether the list of possible dates should be stored in the project's `output` folder
- `fileName {string}`: the filename without extension for the stored per-sample dates file

Output:

- `{matrix}`: the matrix of possible dates (see Output)
`Interface.computeDatesBucketApproach(bucketsPath, samplesPath, scoreType, innerFunc, outerFunc, bestYearsMax, qualityMeasures = "", save, fileName)`

`Interface.computeDatesBucketApproach(..)` executes the Bucket Approach per sample.

Hint: The chronology must have a length of at least 30 to get proper p-values.

Input parameters:

- `bucketsPath {string}`: the path to the per-tree consensi which should be used
- `samplesPath {string}`: the path to the samples which should be dated
- `scoreType {string}`: the score type used for the computation (`"a"` = y-based, `"b"` = slope-based, `"c"` = z-scores y-based, `"d"` = z-scores slope-based)
- `innerFunc {function}`: the function applied to the per-bucket scores to obtain a score for the given position (e.g. the minimum function `min`)
- `outerFunc {function}`: the function applied to the per-sample scores to obtain a final score for the position (e.g. the summation function `sum`)
- `bestYearsMax {numeric}`: how many of the best-ranked years should be stored
- `qualityMeasures {string}`: which quality measures should be active (combine multiple options: `""` = none, `"p"` = p-values, `"s"` = scores, `"ps"` or `"sp"` = scores and p-values)
- `save {logical}`: whether the list of possible dates should be stored in the project's `output` folder
- `fileName {string}`: the filename without extension for the stored per-sample dates file

Output:

- `{matrix}`: the matrix of possible dates
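A hypothetical call could look as follows; paths and the file name are placeholders. Note how `innerFunc` and `outerFunc` take plain R functions:

```r
# Hypothetical call -- paths and file name are placeholders.
dates <- Interface.computeDatesBucketApproach(
  bucketsPath     = "input/interface/pass_1/buckets",  # assumed buckets folder
  samplesPath     = "input/interface/pass_1/samples",  # assumed samples folder
  scoreType       = "c",   # z-scores, y-based
  innerFunc       = min,   # worst per-bucket score per position
  outerFunc       = sum,   # sum the per-sample scores
  bestYearsMax    = 4,
  qualityMeasures = "ps",  # compute both p-values and scores
  save            = FALSE,
  fileName        = "dates_bucket"
)
```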
`Interface.computeDatesVotingApproach(bucketsPath, samplesPath, scoreType, topYearsCount, approach = "", minimumLength, bestYearsMax, save, fileName)`

`Interface.computeDatesVotingApproach(..)` executes the Voting Approach per sample.

Hint 1: It is possible that `NA`s are returned for rank predictions, since, for example, all votes could have gone to a single year.

Hint 2: If `minimumLength` is not set correctly, the approach will need exponential time.

Input parameters:

- `bucketsPath {string}`: the path to the per-tree consensi which should be used
- `samplesPath {string}`: the path to the samples which should be dated
- `scoreType {string}`: the score type used for the computation (`"a"` = y-based, `"b"` = slope-based, `"c"` = z-scores y-based, `"d"` = z-scores slope-based)
- `topYearsCount {numeric}`: the number of years selected per column
- `approach {string}`: which approaches should be active (combine multiple options: `""` = none, `"p"` = powerset approach)
- `minimumLength {numeric}`: the minimum sample length considered in the powerset table (`-1` = no limit)
- `bestYearsMax {numeric}`: how many of the best-ranked years should be stored
- `save {logical}`: whether the list of possible dates should be stored in the project's `output` folder
- `fileName {string}`: the filename without extension for the stored per-sample dates file

Output:

- `{matrix}`: the matrix of possible dates
`Interface.computeDatesTwoStepApproach(bucketsPath, samplesPath, scoreTypeRingWidths, scoreTypeBuckets, topYearsCount, bestYearsMax, qualityMeasures, save, fileName)`

`Interface.computeDatesTwoStepApproach(..)` executes first a fast Points-Based Approach (correlation coefficient / t-value based) and afterwards the Bucket Approach on the number of potentially correct years found by the Points-Based Approach.

Hint: `topYearsCount` has to be set to at least 30 to get a proper distribution for the p-values!

Input parameters:

- `bucketsPath {string}`: the path to the per-tree consensi which should be used
- `samplesPath {string}`: the path to the samples which should be dated
- `scoreTypeCharacteristic {string}`: the score type used during the Points-Based Approach (`"p"` = Pearson's Rho, `"t"` = Kendall's Tau, `"r"` = Spearman's Rho, `"v"` = t-value)
- `scoreTypeBucket {string}`: the score type used during the Bucket Approach (`"a"` = y-based, `"b"` = slope-based, `"c"` = z-scores y-based, `"d"` = z-scores slope-based)
- `topYearsCount {numeric}`: the number of top years stored by the characteristic approach (at least 2)
- `bestYearsMax {numeric}`: how many of the best-ranked years should be stored
- `qualityMeasures {string}`: which quality measures should be active (combine multiple options: `""` = none, `"p"` = p-values, `"s"` = scores, `"ps"` or `"sp"` = scores and p-values)
- `save {logical}`: whether the list of possible dates should be stored in the project's `output` folder
- `fileName {string}`: the filename without extension for the stored per-sample dates file

Output:

- `{matrix}`: the matrix of possible dates
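A hypothetical call could look as follows; paths and the file name are placeholders, and the argument names are taken from the signature above:

```r
# Hypothetical call -- paths and file name are placeholders.
dates <- Interface.computeDatesTwoStepApproach(
  bucketsPath         = "input/interface/pass_1/buckets",  # assumed folder
  samplesPath         = "input/interface/pass_1/samples",  # assumed folder
  scoreTypeRingWidths = "p",   # Pearson's Rho in the fast first step
  scoreTypeBuckets    = "a",   # y-based scoring in the Bucket step
  topYearsCount       = 30,    # >= 30 for a proper p-value distribution
  bestYearsMax        = 4,
  qualityMeasures     = "p",   # p-values only
  save                = FALSE,
  fileName            = "dates_twostep"
)
```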
R allows `=` as the assignment operator, but we use `<-` for assignments instead.

The class name (here `Plotter`) is also the name of the class file, and it is prefixed to each property and function, e.g.

```r
Plotter.__extendedMode <- TRUE;

Plotter.getLogNormalDistributionPlot <- function(scores, fit, color, print) {
  ...
}
```

like in C++.
Almost all constant values, like paths, strings, symbols and filenames, are stored in `Defaults.R`; everything except functional strings like `dotted` or `histogram` is kept there.
To avoid sourcing the same class twice (which would deactivate breakpoints), every class file defines an import boolean, written in capital letters and named after the class:

```r
EXERCISE_1_IMPORTED <- TRUE; # to avoid a reimport by the "Main.R"-class after sourcing this file
```

This import boolean is set when the class is sourced, for example after setting a breakpoint. If you then execute the `Main` class source code, it automatically checks whether the boolean already exists, so a breakpoint set in class `Exercise1` is not deactivated. `Exercise1` is not re-sourced because of an existence check in `Main`:

```r
if(!exists("EXERCISE_1_IMPORTED")) source("Exercise1.R");
```

Without this technique you would have to comment out the sourcing `source("Exercise1.R")` of the class `Exercise1` in the class `Main` every time you want to debug `Exercise1`.
Hint: It is not allowed to have two classes with the same name!
Visibility follows the Python programming style: functions are merely marked as protected or private by naming. Functions whose name starts with two underscores after the class prefix, e.g.

```r
Exercise1.__getSubpatterns <- function(patternY, subpatternsIntervals) {
  ...
}
```

are private functions, and functions with one underscore, e.g.

```r
Alignment._createAlignment <- function(path, sequenceA, sequenceB) {
  ...
}
```

are protected functions.