-
Notifications
You must be signed in to change notification settings - Fork 118
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
0 parents
commit 52eac8b
Showing
6 changed files
with
10,348 additions
and
0 deletions.
There are no files selected for viewing
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,166 @@ | ||
## About | ||
|
||
The aim of this repository is: | ||
|
||
- to share a copy of DOAJ journal master list (downloaded January 2014) -> [data/doajJournalList.csv](data/doajJournalList.csv) | ||
- to release a Bielefeld University Paid APCs 2012/13 Dataset under an Open Database License -> [data/unibi12-13.csv](data/unibi12-13.csv) | ||
- to demonstrate how reporting on fee-based Open Access publishing can be made more transparent and reproducible | ||
|
||
This documentation is written in Markdown and embeds source code written in [R](http://www.r-project.org/). | ||
|
||
You can run ```knit()``` to reproduce this file | ||
|
||
``` | ||
require(knitr) | ||
knit(readme.Rmd) | ||
``` | ||
|
||
## DOAJ | ||
|
||
The current [DOAJ journal master list](http://doaj.org/faq#metadata) that can be downloaded as `csv` lacks information on the applicability of author processing charges (APC). Answering Ulrich Herbs concerns in his blog post [The prevalence of Open Access publication fees](http://www.scinoptica.com/pages/topics/the-prevalence-of-open-access-publication-fees.php), we decided to share our copy that we used for reporting Open Access Gold publications as part of the [DFG Open-Access Publishing Programme](http://www.dfg.de/en/research_funding/programmes/infratructure/lis/funding_opportunities/open_access_publishing/index.html) in January 2014. | ||
|
||
### Load DOAJ master list | ||
|
||
```{r} | ||
doaj <- read.csv("data/doajJournalList.csv", header = TRUE, sep =",") | ||
``` | ||
This list contains | ||
|
||
```{r} | ||
length(unique(doaj$ISSN)) | ||
``` | ||
registered Open Access journals. | ||
|
||
The following columns are available: | ||
|
||
```{r} | ||
colnames(doaj) | ||
``` | ||
|
||
### A closer look on APCs | ||
|
||
This table summarises whether a journal requires APCs or not: | ||
|
||
```{r, results='asis', tab.cap ='DOAJ APCs'} | ||
apc.table <- data.frame(table(unlist(doaj$Publication.fee))) | ||
apc.table$Perc <- apc.table$Freq / sum(apc.table$Freq) * 100 | ||
colnames(apc.table) <-c("Fee", "Journals", "Share") | ||
knitr::kable(apc.table[order(apc.table$Journals, decreasing = T),], row.names = FALSE, digits = 2) | ||
``` | ||
|
||
You may also wish to explore the growth of DOAJ by APC applicability. | ||
|
||
```{r simpleplot, echo=TRUE, fig.height=8/2.54, fig.width=18/2.54, warning = FALSE, message = FALSE} | ||
#exclude journals with blank APC info | ||
doaj <- doaj[!doaj$Publication.fee == "",] | ||
doaj <- droplevels(doaj) | ||
# transform to date object | ||
doaj$Added.on.date <- as.Date(doaj$Added.on.date, format="%Y-%m-%d") | ||
# relevel | ||
doaj$Publication.fee <- factor (doaj$Publication.fee, | ||
levels = c(rownames(data.frame(rev(sort(table(doaj$Publication.fee))))))) | ||
#prepare df for ggplotting | ||
tt <- data.frame(table(doaj$Publication.fee, doaj$Added.on.date)) | ||
tt$Cumsum<-ave(tt$Freq,tt$Var1,FUN=cumsum) | ||
#plot | ||
ggplot2::ggplot(tt, aes(as.Date(Var2),Cumsum, group = Var1)) + | ||
geom_line(aes(colour = Var1, show_guide=FALSE)) + | ||
xlab("Date added to DOAJ") + ylab("Cumulative Sum") + | ||
scale_colour_brewer("Publication Fee",palette=2, type="qual") + | ||
theme_bw(base_size= 16) + | ||
opts(legend.key=theme_rect(fill="white",colour="white")) | ||
``` | ||
|
||
As this little exploration demonstrates, the DOAJ journal title master list is an excellent source (not only) to study the longitudinal growth of Open Access journal publishing. As we have shown, by reusing such a file, you can also make your study reproducible. | ||
|
||
We hope that DOAJ will continue its bulk download service in order to provide an convenient way to monitor Open Access journal publishing. At Bielefeld University Library, for instance, we use the `csv` download to answer our DFG reporting requirements. | ||
|
||
## Bielefeld University Paid APCs 2012/13 Dataset | ||
|
||
With this dataset, we are providing information on paid Article Processing Charges (APC) by the [Open Access Publication Fund of Bielefeld University](http://oa.uni-bielefeld.de/publikationsfonds.html). The fund is supported by the German Research Foundation (DFG) under its [Open-Access Publishing Programme](http://www.dfg.de/en/research_funding/programmes/infrastructure/lis/funding_opportunities/open_access_publishing/index.html). | ||
|
||
|
||
### Load Data | ||
|
||
To load the data in R: | ||
```{r} | ||
my.df <- read.csv("data/unibi12-13.csv", header = TRUE, sep =",") | ||
head(my.df) | ||
``` | ||
|
||
The dataset covers the accounting periods 2012 and 2013 (`Period`). For every single article, the `doi` is provided. Journal titles (`Journal`), publisher names (`Publisher`) and the International Standard Serial Numbers (`issn.1` and `issn.2`) are normalized according to CrossRef metadata store. In addition, `PUBID` references the copy stored in the institutional repository [PUB - Publications at Bielefeld University](http://pub.uni-bielefeld.de/en). | ||
|
||
With this step, we follow Wellcome Trust example to share data on paid APCs. See: | ||
|
||
Neylon, Cameron (2014): Wellcome Trust Article Processing Charges by Article 2012/13. | ||
[https://github.com/cameronneylon/apcs](https://github.com/cameronneylon/apcs) | ||
|
||
|
||
### Paid APCs | ||
|
||
Column `EURO` contains the author processing charge for each paper including VAT. | ||
|
||
In total, we paid | ||
|
||
```{r} | ||
sum(my.df$EURO) | ||
``` | ||
|
||
for | ||
```{r} | ||
nrow(my.df) | ||
``` | ||
articles. | ||
|
||
|
||
DFG funding covered 75%. | ||
|
||
```{r} | ||
sum(my.df$EURO) * 0.75 | ||
``` | ||
|
||
#### APC per accounting period | ||
|
||
Column `EURO` contains the author processing charge paid for each paper in Euro including VAT. | ||
|
||
To obtain the yearly costs for 2012 and 2013: | ||
|
||
```{r} | ||
tapply(my.df$EURO, my.df$Period, sum) | ||
``` | ||
|
||
|
||
#### APC per Publisher | ||
|
||
To calculate APC paid per publisher and accounting period and print it as pretty markdown table: | ||
|
||
```{r, results='asis', tab.cap ='Bielefeld University publication fund: APCs paid in EURO (incl VAT)'} | ||
# relevel publishers according to sorting | ||
my.df$Publisher <- factor (my.df$Publisher, | ||
levels = c(rownames(data.frame(rev(sort(table(my.df$Publisher))))))) | ||
# calculate and print | ||
knitr::kable(tapply(my.df$EURO, list(my.df$Publisher, my.df$Period), sum)) | ||
``` | ||
|
||
Mean APCs paid: | ||
|
||
```{r, results='asis', tab.cap ='Bielefeld University publication fund: Mean APCs paid in EURO (incl VAT)'} | ||
knitr::kable(tapply(my.df$EURO, list(my.df$Publisher, my.df$Period), mean)) | ||
``` | ||
|
||
### Licence | ||
|
||
The Bielefeld University Paid APCs 2012/13 Dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/ | ||
|
||
### Contact | ||
|
||
Najko Jahn < najko.jahn [ at ] uni-bielefeld.de > | ||
|
||
Dirk Pieper < dirk.pieper [ at ] uni-bielefeld.de > |
Oops, something went wrong.