initial release
njahn82 committed Jun 17, 2014
0 parents commit 52eac8bed8c7cb859fee1f217106220c341b607f
Showing 6 changed files with 10,348 additions and 0 deletions.
## About

The aim of this repository is:

- to share a copy of DOAJ journal master list (downloaded January 2014) -> [data/doajJournalList.csv](data/doajJournalList.csv)
- to release a Bielefeld University Paid APCs 2012/13 Dataset under an Open Database License -> [data/unibi12-13.csv](data/unibi12-13.csv)
- to demonstrate how reporting on fee-based Open Access publishing can be made more transparent and reproducible

This documentation is written in Markdown and embeds source code written in R.

You can run `knit()` from the knitr package to reproduce this file.



The current DOAJ journal master list, which can be downloaded as `csv`, lacks information on whether article processing charges (APCs) apply. In response to Ulrich Herb's concerns in his blog post "The prevalence of Open Access publication fees", we decided to share the copy that we used in January 2014 for reporting Open Access Gold publications as part of the DFG Open-Access Publishing Programme.

### Load DOAJ master list

```{r}
doaj <- read.csv("data/doajJournalList.csv", header = TRUE, sep = ",")
```

This list contains `r nrow(doaj)` registered Open Access journals.

The following columns are available:
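
One way to list the columns is sketched below on a made-up two-row extract, so that it runs without the data file. The headers `Title`, `Publication fee` and `Added on date` are assumptions for illustration, not a statement about the real DOAJ file. The sketch also shows why the code in this document refers to `Publication.fee`: by default, `read.csv()` turns spaces in headers into dots.

```r
# Made-up extract; the headers are hypothetical, not copied from the DOAJ file
csv.text <- "Title,Publication fee,Added on date
Journal A,Yes,2010-05-01
Journal B,No,2012-11-23"

# read.csv() passes headers through make.names(), so spaces become dots
toy <- read.csv(text = csv.text, header = TRUE, sep = ",")

colnames(toy)  # column names as R sees them
nrow(toy)      # number of journals in the extract
str(toy)       # column types at a glance
```

Replacing `toy` with `doaj` gives the same overview for the full list.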


### A closer look on APCs

This table summarises whether a journal requires APCs or not:

```{r, results='asis', tab.cap ='DOAJ APCs'}
# count journals per fee category and add the percentage share
apc.table <- data.frame(table(unlist(doaj$Publication.fee)))
apc.table$Perc <- apc.table$Freq / sum(apc.table$Freq) * 100
colnames(apc.table) <- c("Fee", "Journals", "Share")
# print as a Markdown table, most frequent category first
knitr::kable(apc.table[order(apc.table$Journals, decreasing = TRUE), ], row.names = FALSE, digits = 2)
```
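
To make the share calculation easier to follow, here is a minimal sketch on six invented fee values. The category labels and the names `fee`, `counts`, `shares` and `toy.table` are assumptions for illustration. It uses `prop.table()` as an alternative to dividing by the sum by hand.

```r
# Hypothetical fee values for six journals (not taken from the DOAJ file)
fee <- c("No", "Yes", "No", "Conditional", "No", "Yes")

counts <- table(fee)                # journals per category
shares <- prop.table(counts) * 100  # percentage of all journals

toy.table <- data.frame(Fee = names(counts),
                        Journals = as.vector(counts),
                        Share = as.vector(shares))
# most frequent category first
toy.table <- toy.table[order(toy.table$Journals, decreasing = TRUE), ]
print(toy.table, row.names = FALSE, digits = 3)
```

The shares always add up to 100, which is a quick sanity check on the real table as well.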

You may also wish to explore the growth of DOAJ by APC applicability.

```{r simpleplot, echo=TRUE, fig.height=8/2.54, fig.width=18/2.54, warning = FALSE, message = FALSE}
library(ggplot2)
# assumption: name of the date column (elided in the source); check colnames(doaj)
date.col <- "Added.on.date"
# exclude journals with blank APC info
doaj <- doaj[doaj$Publication.fee != "", ]
doaj <- droplevels(doaj)
# transform to date object
doaj[[date.col]] <- as.Date(doaj[[date.col]], format = "%Y-%m-%d")
# relevel: most frequent fee category first
doaj$Publication.fee <- factor(doaj$Publication.fee,
  levels = names(sort(table(doaj$Publication.fee), decreasing = TRUE)))
# prepare df for ggplotting: journals per fee category and date
tt <- data.frame(table(doaj$Publication.fee, doaj[[date.col]]))
# running total per fee category (reconstructed; the source line is truncated)
tt$Cumsum <- ave(tt$Freq, tt$Var1, FUN = cumsum)
ggplot(tt, aes(as.Date(Var2), Cumsum, group = Var1)) +
  geom_line(aes(colour = Var1)) +
  xlab("Date added to DOAJ") + ylab("Cumulative Sum") +
  scale_colour_brewer("Publication Fee", palette = 2, type = "qual") +
  theme_bw(base_size = 16)
```
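
The plot relies on a cumulative count per fee category. One way to compute such a count is sketched below on five invented journals, without plotting. The column names `fee` and `added` and all values are assumptions for illustration.

```r
# Hypothetical journals with the date they were added (made-up values)
toy <- data.frame(
  fee   = c("No", "No", "Yes", "No", "Yes"),
  added = as.Date(c("2010-01-01", "2010-01-01", "2010-01-01",
                    "2011-06-15", "2012-03-01"))
)

# one row per fee category and date; Freq counts journals added on that date
tt <- data.frame(table(toy$fee, toy$added))
# running total within each fee category, in date order
tt$Cumsum <- ave(tt$Freq, tt$Var1, FUN = cumsum)
tt
```

Because `table()` also creates rows for dates on which a category gained no journal, the running total stays flat there instead of leaving a gap in the line.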

As this little exploration demonstrates, the DOAJ journal master list is an excellent source for studying, among other things, the longitudinal growth of Open Access journal publishing. Reusing such a file also makes a study reproducible.

We hope that DOAJ will continue its bulk download service in order to provide a convenient way to monitor Open Access journal publishing. At Bielefeld University Library, for instance, we use the `csv` download to meet our DFG reporting requirements.

## Bielefeld University Paid APCs 2012/13 Dataset

With this dataset, we provide information on Article Processing Charges (APCs) paid by the Open Access Publication Fund of Bielefeld University. The fund is supported by the German Research Foundation (DFG) under its Open-Access Publishing Programme.

### Load Data

To load the data in R:

```{r}
my.df <- read.csv("data/unibi12-13.csv", header = TRUE, sep = ",")
```

The dataset covers the accounting periods 2012 and 2013 (`Period`). For every article, the `doi` is provided. Journal titles (`Journal`), publisher names (`Publisher`) and International Standard Serial Numbers (`issn.1` and `issn.2`) are normalized according to the CrossRef metadata store. In addition, `PUBID` references the copy stored in the institutional repository PUB - Publications at Bielefeld University.

With this step, we follow the Wellcome Trust's example of sharing data on paid APCs. See:

Neylon, Cameron (2014): Wellcome Trust Article Processing Charges by Article 2012/13.

### Paid APCs

Column `EURO` contains the article processing charge paid for each paper in euros, including VAT.

In total, we paid `r sum(my.df$EURO)` EUR.


DFG funding covered 75%.

```{r}
sum(my.df$EURO) * 0.75
```

#### APC per accounting period

To obtain the yearly costs for 2012 and 2013:

```{r}
tapply(my.df$EURO, my.df$Period, sum)
```
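
The arithmetic behind the total, the DFG share and the yearly costs can be checked by hand on a small example. The four payments below are invented. Only the column names `Period` and `EURO` and the 75% rate come from this document.

```r
# Hypothetical payments (values invented for illustration)
toy <- data.frame(
  Period = c(2012, 2012, 2013, 2013),
  EURO   = c(1000, 1400, 1200, 800)
)

total     <- sum(toy$EURO)                      # all payments: 4400
dfg.share <- total * 0.75                       # part covered by DFG funding: 3300
per.year  <- tapply(toy$EURO, toy$Period, sum)  # 2012: 2400, 2013: 2000

total
dfg.share
per.year
```

`tapply()` returns a named vector, so a single year can be looked up with `per.year["2012"]`.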

#### APC per Publisher

To calculate the APCs paid per publisher and accounting period and print the result as a Markdown table:

```{r, results='asis', tab.cap ='Bielefeld University publication fund: APCs paid in EURO (incl VAT)'}
# relevel publishers so that the one with the most articles comes first
my.df$Publisher <- factor(my.df$Publisher,
  levels = names(sort(table(my.df$Publisher), decreasing = TRUE)))
# calculate and print
knitr::kable(tapply(my.df$EURO, list(my.df$Publisher, my.df$Period), sum))
```

Mean APCs paid:

```{r, results='asis', tab.cap ='Bielefeld University publication fund: Mean APCs paid in EURO (incl VAT)'}
knitr::kable(tapply(my.df$EURO, list(my.df$Publisher, my.df$Period), mean))
```
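
When `tapply()` is given two grouping variables, it returns a matrix with publishers as rows and periods as columns. The sketch below uses invented publishers and amounts to show what that matrix looks like, in particular for a publisher with no payment in one period.

```r
# Hypothetical payments; publisher names and amounts are invented
toy <- data.frame(
  Publisher = c("Publisher A", "Publisher A", "Publisher B", "Publisher A"),
  Period    = c(2012, 2012, 2012, 2013),
  EURO      = c(1000, 1400, 900, 1200)
)

sums  <- tapply(toy$EURO, list(toy$Publisher, toy$Period), sum)
means <- tapply(toy$EURO, list(toy$Publisher, toy$Period), mean)

# Publisher B has no 2013 payment, so that cell is NA rather than 0
sums
means
```

An `NA` in the real tables therefore means "no article paid for in that period", not a charge of zero.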

### Licence

The Bielefeld University Paid APCs 2012/13 Dataset is made available under the Open Database License. Any rights in individual contents of the database are licensed under the Database Contents License.

### Contact

Najko Jahn < najko.jahn [ at ] >

Dirk Pieper < dirk.pieper [ at ] >
