---
title: "Authorship Poll Analysis"
author: "Meghan Duffy"
date: "June 15, 2016"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
# What are the current views on last and corresponding authorship in ecology?
## Introduction
Who is the last author on a paper? Is it the person who did the least work? Or is it the PI of the lab where the work was done? When I started grad school (in 2000), the norm in ecology was still that the last author on a paper is the person who did the least work. But, more recently, it has seemed to me that the norm is that the last author on a paper is the "senior" author (usually the PI). However, if you talk with other ecologists about the topic, it's clear that there's variation in views, and that not everyone is on the same page.
This project started out as a poll on the Dynamic Ecology blog, but has led to a bigger project. The code here has been updated to reflect what is in the manuscript on this (in revision for Ecology & Evolution), rather than the original blog posts.
## Literature review
I used a combination of Web of Science data and manually searching through journals to determine the number of authors and corresponding authorship for papers in Ecology from 1956-2016 (every 10 years from 1956-1996, every five years from 2001-2016) and in American Naturalist, Evolution, and Oikos in 2001, 2006, 2011, and 2016.
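
As a minimal sketch of how an author count can be derived from a Web of Science export, the chunk below counts semicolon separators in an `AU`-style field, mirroring the commented-out `str_count()` lines in the analysis chunk. The `toy_wos` data frame and its contents are hypothetical and are not part of the real dataset.

```{r, sketch-author-count, echo=TRUE}
library(stringr)

# Hypothetical records in the Web of Science style, where the AU field
# separates author names with semicolons (toy data for illustration only).
toy_wos <- data.frame(
  AU = c("Smith, J", "Smith, J; Lee, K", "Smith, J; Lee, K; Garcia, M"),
  stringsAsFactors = FALSE
)

# Number of authors = number of separators + 1
toy_wos$numberauthors <- str_count(toy_wos$AU, ";") + 1
toy_wos$numberauthors
```

One caveat of this approach is that an empty `AU` field would also be counted as one author, so records with missing author fields would need to be checked separately.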
## Making figures and doing analyses for revised version of manuscript (incorporating suggestions from the reviewers)
```{r, plotting data on number authors over time}
library(stringr)
library(ggplot2)
library(dplyr)
library(cowplot)
#WoSdata <- read.csv("WoSAuthorshipData.csv")
#WoSdata$numberauthors <- (str_count(WoSdata$AU, ";") +1)
#WoSdata$numberemails <- (str_count(WoSdata$EM, ";") +1)
#write.csv(WoSdata, file = "WoSdata.csv")
## moved out of R to do some manipulations, now moving back in
WoSdata <- read.csv("WoSdata2.csv")
WoSdata <- subset(WoSdata, DT == "Article"| DT == "Article; Data Paper" | DT == "Article; Proceedings Paper" | DT == "Review")
newtimeplot <- WoSdata %>%
filter(Journal == c("Ecology")) %>%
ggplot(aes(x=Year,y=numberauthors,fill=Journal,group=interaction(Journal,Year))) +
scale_fill_manual(values=c("white","#999999")) +
geom_boxplot(width=4) +
theme_bw() +
ggtitle("Number of authors of articles in Ecology, 1956-2016") +
theme(plot.title = element_text(hjust = 0.5)) +
scale_x_continuous(breaks=c(1956,1966,1976,1986,1996,2006,2016)) +
ylab("Number of authors")
#newtimeplot
save_plot("newtimeplot.jpg", newtimeplot, base_width = 6, base_height = 4)
newtimeplotalljournals <- WoSdata %>%
filter(Year>2000) %>%
ggplot(aes(x=Year,y=numberauthors,fill=Journal,group=interaction(Journal,Year))) +
scale_fill_manual(values=c("white","#999999","#F0E442","#56B4E9")) +
geom_boxplot(width=4) +
theme_bw() +
ggtitle("Number of authors per article, 2001-2016") +
theme(plot.title = element_text(hjust = 0.5)) +
scale_x_continuous(breaks=c(2001,2006,2011,2016)) +
ylab("Number of authors")
#newtimeplotalljournals
save_plot("newtimeplotalljournals.jpg", newtimeplotalljournals, base_width = 6, base_height = 4)
#combine plot with just ecology for all years with plot for all journals for 2001-2016
newtimeplotgrid2panel <- plot_grid(newtimeplot, newtimeplotalljournals,
labels = c("A", "B"), ncol = 1, nrow = 2,
rel_heights = c(0.5, 0.5))
newtimeplotgrid2panel
save_plot("newtimeplotgrid2panel.jpg", newtimeplotgrid2panel, base_width = 8, base_height = 8)
```
```{r, checking colorblind friendliness of that figure}
#library(devtools)
#devtools::install_github("wilkelab/cowplot")
#install.packages("colorspace", repos = "http://R-Forge.R-project.org")
#devtools::install_github("clauswilke/colorblindr")
library(colorblindr)
cvd_grid(newtimeplotalljournals)
```
```{r, stats for data on number of authors over time}
ecologyWoSdata <- subset(WoSdata, Journal == "Ecology")
ecologyfit <- glm(numberauthors ~ Year, family = poisson(), data = ecologyWoSdata)
summary(ecologyfit)
recentWoSdata <- subset(WoSdata, Year > 2000)
recentfit <- glm(numberauthors ~ Year + Journal + Year*Journal, family = poisson(), data = recentWoSdata)
summary(recentfit)
recentfitnointeraction <- glm(numberauthors ~ Year + Journal, family = poisson(), data = recentWoSdata)
# test for significant interaction
anova(recentfit,recentfitnointeraction, test="Chi")
# test for effect of journal
recentfitnojournal <- glm(numberauthors ~ Year, family = poisson(), data = recentWoSdata)
anova(recentfitnojournal,recentfitnointeraction, test="Chi")
# test for effect of year
recentfitnoyear <- glm(numberauthors ~ Journal, family = poisson(), data = recentWoSdata)
anova(recentfitnoyear,recentfitnointeraction, test="Chi")
## summary statistics
WoSdata %>%
filter(Journal == c("Ecology")) %>%
group_by(Year) %>%
summarise(median(numberauthors),mean(numberauthors))
```
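
The term-by-term tests above all follow the same pattern: fit a model with and without the term of interest, then compare the two with a likelihood-ratio (chi-square) test. Here is a minimal sketch of that pattern on simulated counts; the `toy_counts` data, the journal names "A" and "B", and the simulated effect size are all assumptions made for illustration.

```{r, sketch-nested-poisson, echo=TRUE}
set.seed(1)

# Simulated data (hypothetical): 200 papers across four years and two journals
toy_counts <- data.frame(
  Year = rep(c(2001, 2006, 2011, 2016), each = 50),
  Journal = rep(c("A", "B"), times = 100)
)

# Author counts that rise with year; adding 1 ensures at least one author
toy_counts$numberauthors <- 1 +
  rpois(nrow(toy_counts), lambda = exp(0.05 * (toy_counts$Year - 2001)))

# Model with the term of interest (Year) and the nested model without it
toy_full <- glm(numberauthors ~ Year + Journal, family = poisson(), data = toy_counts)
toy_noyear <- glm(numberauthors ~ Journal, family = poisson(), data = toy_counts)

# Likelihood-ratio test for the Year term (one degree of freedom)
toy_lrt <- anova(toy_noyear, toy_full, test = "Chisq")
toy_lrt
```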
```{r, figure showing corresponding authorship over time}
###Important note: original WoS data was inaccurate for 2001 for all journals (lots of papers without email addresses when there was one on the actual article) and for 2006 for AmNat (lots of papers with "all" email addresses, when most of those papers had one designated for correspondence). So, I manually went through the first 900 pages of each journal in 2001 (and in 2006 for AmNat) to record authorship. I did not record the number of authors while doing this, so this dataset below (with "Corresponding" at the end) cannot be used to make a figure related to number of authors.
WoSdataCor <- read.csv("WoSdataCorresponding.csv")
WoSdataCor <- subset(WoSdataCor, DT == "article" | DT == "Article"| DT == "Article; Data Paper" | DT == "Article; Proceedings Paper" | DT == "brief communication" | DT == "briefcomm" | DT == "comment" | DT == "e-article" | DT == "forum" | DT == "letter" | DT == "nathistmisc" | DT == "note" | DT == "Note" | DT == "opinion" | DT == "perspective" | DT == "report" | DT == "review" | DT == "Review" | DT == "synthesis")
recentWoSdataCor <- subset(WoSdataCor, Year > 2000)
newtimesummary <- recentWoSdataCor %>%
filter(Correspondence != "unknown") %>%
group_by(Journal, Year,Correspondence) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
# summary statistics for 2016 (averaged across all papers; see newtimesummary for stats broken down by journal)
recentWoSdataCor %>%
filter(Correspondence != "unknown") %>%
filter(Year == 2016) %>%
group_by(Correspondence) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
Journal = c("AmNat","AmNat","AmNat","AmNat","AmNat","AmNat","Ecology","Ecology","Ecology","Ecology","Ecology","Ecology","Ecology","Ecology","Ecology","Evolution","Evolution","Oikos","Oikos","Oikos","Oikos","Oikos","Oikos","Oikos","Oikos","Oikos","Oikos")
Year = c(2001,2001,2011,2011,2016,2016,2001,2001,2006,2006,2011,2011,2016,2016,2016,2006,2016,2001,2001,2006,2006,2011,2011,2011,2016,2016,2016)
Correspondence = c("middle","ND","all","ND","all","ND","all","other","all","other","all","other","all","ND","other","ND","ND","all","other","all","ND","all","other","ND","all","other","ND")
n = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
rel.freq = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
missingdfnew <- data.frame(Journal, Year, Correspondence, n, rel.freq)
newtimesummary <- bind_rows(newtimesummary,missingdfnew)
newtimesummaryamnat <- newtimesummary %>%
filter(Journal == c("AmNat"))
newtimecorrlineplotamnat <- newtimesummaryamnat %>%
filter(Year>2000) %>%
mutate(Correspondence = as.character(Correspondence)) %>%
mutate(Correspondence = factor(Correspondence, levels=c("first","middle","last","ND","all","other"))) %>%
ggplot(aes(x=Year,y=rel.freq,color= Correspondence)) +
geom_line(stat='identity',aes(linetype=Correspondence),size=1) +
geom_point(aes(shape=Correspondence), size=4) +
ggtitle("American Naturalist") +
scale_x_continuous(breaks=c(2001,2006,2011,2016)) +
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73", "#000000","#0072B2")) +
ylab("Percent") +
ylim(0,100)
newtimesummaryecol <- newtimesummary %>%
filter(Journal == c("Ecology"))
newtimecorrlineplotecol <- newtimesummaryecol %>%
filter(Year>2000) %>%
mutate(Correspondence = as.character(Correspondence)) %>%
mutate(Correspondence = factor(Correspondence, levels=c("first","middle","last","ND","all","other"))) %>%
ggplot(aes(x=Year,y=rel.freq,color= Correspondence)) +
geom_line(stat='identity',aes(linetype=Correspondence),size=1) +
geom_point(aes(shape=Correspondence), size=4) +
ggtitle("Ecology") +
scale_x_continuous(breaks=c(2001,2006,2011,2016)) +
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73", "#000000","#0072B2")) +
ylab("Percent") +
ylim(0,100)
newtimesummaryevol <- newtimesummary %>%
filter(Journal == c("Evolution"))
newtimecorrlineplotevol <- newtimesummaryevol %>%
filter(Year>2000) %>%
mutate(Correspondence = as.character(Correspondence)) %>%
mutate(Correspondence = factor(Correspondence, levels=c("first","middle","last","ND","all","other"))) %>%
ggplot(aes(x=Year,y=rel.freq,color= Correspondence)) +
geom_line(stat='identity',aes(linetype=Correspondence),size=1) +
geom_point(aes(shape=Correspondence), size=4) +
ggtitle("Evolution") +
scale_x_continuous(breaks=c(2001,2006,2011,2016)) +
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73", "#000000","#0072B2")) +
ylab("Percent") +
ylim(0,100)
newtimesummaryoikos <- newtimesummary %>%
filter(Journal == c("Oikos"))
newtimecorrlineplotoikos <- newtimesummaryoikos %>%
filter(Year>2000) %>%
mutate(Correspondence = as.character(Correspondence)) %>%
mutate(Correspondence = factor(Correspondence, levels=c("first","middle","last","ND","all","other"))) %>%
ggplot(aes(x=Year,y=rel.freq,color= Correspondence)) +
geom_line(stat='identity',aes(linetype=Correspondence),size=1) +
geom_point(aes(shape=Correspondence), size=4) +
ggtitle("Oikos") +
scale_x_continuous(breaks=c(2001,2006,2011,2016)) +
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73", "#000000","#0072B2")) +
ylab("Percent") +
ylim(0,100)
newfourpanellineplot <- ggdraw() +
draw_plot(newtimecorrlineplotamnat,0,0.75,1.0,0.25) +
draw_plot(newtimecorrlineplotecol,0,0.50,1.0,0.25) +
draw_plot(newtimecorrlineplotevol,0,0.25,1.0,0.25) +
draw_plot(newtimecorrlineplotoikos,0,0,1.0,0.25)
newfourpanellineplot
save_plot("newfourpanellineplot.jpg", newfourpanellineplot, base_width = 8, base_height = 13)
```
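
The zero-count rows in the chunk above were added by hand so that the lines in each panel drop to zero where a category is absent. A possible alternative, sketched here on hypothetical toy data, is `tidyr::complete()`, which fills in every missing combination. Note that this is not what the analysis does: `complete()` adds all combinations of journal, year, and category, whereas the hand-built `missingdfnew` adds only selected rows.

```{r, sketch-zero-fill, echo=TRUE}
library(dplyr)
library(tidyr)

# Hypothetical records: one row per paper (toy data for illustration only)
toy_cor <- data.frame(
  Journal = c("A", "A", "A", "B", "B"),
  Year = c(2001, 2001, 2016, 2001, 2016),
  Correspondence = c("first", "last", "first", "first", "last"),
  stringsAsFactors = FALSE
)

toy_summary <- toy_cor %>%
  group_by(Journal, Year, Correspondence) %>%
  summarise(n = n()) %>%
  # summarise() drops the last grouping level, so sum(n) is per Journal-Year
  mutate(rel.freq = round(100 * n / sum(n), 0)) %>%
  ungroup() %>%
  # add a zero row for every Journal x Year x Correspondence combination not observed
  complete(Journal, Year, Correspondence, fill = list(n = 0L, rel.freq = 0))

toy_summary
```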
```{r, stats for first as corresponding author position over time}
#step 1: make corresponding first author into a binary variable, analyze that to look at differences in that between journals and across time.
recentWoSdataCor$Cor01 <- ifelse(recentWoSdataCor$Correspondence=="first",1,0)
corrfit <- glm(Cor01 ~ Year + Journal + Year*Journal, family = binomial(), data = recentWoSdataCor)
summary(corrfit)
# test for significant interaction
corrfitnointeraction <- glm(Cor01 ~ Year + Journal, family = binomial(), data = recentWoSdataCor)
anova(corrfit,corrfitnointeraction, test="Chi")
# test for effect of journal
corrfitnojournal <- glm(Cor01 ~ Year, family = binomial(), data = recentWoSdataCor)
anova(corrfitnojournal,corrfitnointeraction, test="Chi")
# test for effect of year
corrfitnoyear <- glm(Cor01 ~ Journal, family = binomial(), data = recentWoSdataCor)
anova(corrfitnoyear,corrfitnointeraction, test="Chi")
```
```{r, stats for last as corresponding author position over time}
#step 1: make corresponding last author into a binary variable, analyze that to look at differences in that between journals and across time.
recentWoSdataCor$LastCor01 <- ifelse(recentWoSdataCor$Correspondence=="last",1,0)
lastcorrfit <- glm(LastCor01 ~ Year + Journal + Year*Journal, family = binomial(), data = recentWoSdataCor)
summary(lastcorrfit)
# test for significant interaction
lastcorrfitnointeraction <- glm(LastCor01 ~ Year + Journal, family = binomial(), data = recentWoSdataCor)
anova(lastcorrfit,lastcorrfitnointeraction, test="Chi")
# test for effect of journal
lastcorrfitnojournal <- glm(LastCor01 ~ Year, family = binomial(), data = recentWoSdataCor)
anova(lastcorrfitnojournal,lastcorrfitnointeraction, test="Chi")
# test for effect of year
lastcorrfitnoyear <- glm(LastCor01 ~ Journal, family = binomial(), data = recentWoSdataCor)
anova(lastcorrfitnoyear,lastcorrfitnointeraction, test="Chi")
```
```{r, looking at what covaries with corresponding authorship in the literature}
# import new data set that combines information on corresponding author position with information on where the reprint author lives (lined these up in excel; original files downloaded from WoS didn't have reprint author addresses in all cases)
# did this just using 2016 data, since interested in current practices for this
#WoSdata2016 <- read.csv("WoSdata2016.csv")
#WoSdata2016 <- subset(WoSdata2016, DT == "Article" | DT == "Article; Data Paper" | DT == "Review")
#WoSdata2016$Cor01 <- ifelse(WoSdata2016$Correspondence=="first",1,0)
# I don't think this required the stringi package and have deleted the line of code loading that. If this is broken in the future, that would be a place to start trouble-shooting
#WoSdata2016$Country <- word(WoSdata2016$RP, -1)
#WoSdata2016$Country <- as.factor(WoSdata2016$Country)
## moving it out to excel to check things and put into regions
#write.csv(WoSdata2016, file = "WoSdata2016Country.csv")
WoSdata2016Regions <- read.csv("WoSdata2016CountryRegion.csv")
WoSdata2016Regions$LastCor01 <- ifelse(WoSdata2016Regions$Correspondence=="last",1,0)
WoSdata2016Regions$Cor01 <- ifelse(WoSdata2016Regions$Correspondence=="first",1,0)
WoSdata2016EurNA <- subset(WoSdata2016Regions, Region == "Europe" | Region == "North America")
WoSdata2016RegionsEnough <- subset(WoSdata2016Regions, Region == "Europe" | Region == "North America" | Region == "Asia" | Region == "Oceania")
####
# Note: there are several different columns related to "region", since it's not clear whether it makes more sense to go based on continents or more based on cultural differences (e.g., should there be a Latin America region, or should those countries be divided between North America and South America, based on continental plates?) One option was to go with World Bank regions, but that combined Oceania & Asia, which didn't seem to make as much sense from a scientific culture perspective. In the end, I decided to stick with regions that mostly correspond to continents (with the exception being that New Zealand is included with Australia in "Oceania".)
####
regionsummarylast <- WoSdata2016Regions %>%
group_by(Region,LastCor01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
regionsummarylast
regionsummaryfirst <- WoSdata2016Regions %>%
group_by(Region,Cor01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
regionsummaryfirst
# created a table manually in excel
authorshiptableregion <- read.csv("authorshiptableregion.csv")
regionplot <- authorshiptableregion %>%
ggplot(aes(x=Region.short,y=rel.freq,color=Correspondence,fill=Correspondence)) +
geom_bar(stat="identity",position=position_dodge()) +
ylab("Percentage of corresponding authors \nwho are first or last author") +
xlab("Region") +
ggtitle("Analysis of corresponding authorship of papers published in 2016 \nin American Naturalist, Ecology, Evolution, and Oikos") +
annotate("text", x = 1, y = 105, label = "n = 10", size=3.5) +
annotate("text", x = 2, y = 105, label = "n = 52", size=3.5) +
annotate("text", x = 3, y = 105, label = "n = 296", size=3.5) +
annotate("text", x = 4, y = 105, label = "n = 438", size=3.5) +
annotate("text", x = 5, y = 105, label = "n = 70", size=3.5) +
annotate("text", x = 6, y = 105, label = "n = 25", size=3.5) +
scale_color_manual(values=c("#999999", "#56B4E9")) +
scale_fill_manual(values=c("#999999", "#56B4E9"))
regionplot
regionfitlastEurNA <- glm(LastCor01 ~ Region, family = binomial(link=logit), data = WoSdata2016EurNA)
summary(regionfitlastEurNA)
regionfitfirstEurNA <- glm(Cor01 ~ Region, family = binomial(link=logit), data = WoSdata2016EurNA)
summary(regionfitfirstEurNA)
regionfitlastenough <- glm(LastCor01 ~ Region, family = binomial(link=logit), data = WoSdata2016RegionsEnough)
summary(regionfitlastenough)
regionfitfirstenough <- glm(Cor01 ~ Region, family = binomial(link=logit), data = WoSdata2016RegionsEnough)
summary(regionfitfirstenough)
```
```{r, now looking at how number of authors relates to first and last corresponding authorship}
# First, getting rid of papers with only 1 author because there's no potential for last authorship there (based on my definition of last authorship)
WoSdata2016RegionsNot1 <- subset(WoSdata2016Regions, numberauthors>1)
## Looking at things related to numbers of authors:
numberauthorfitfirstordered <- glm(Cor01 ~ ordered(binnedauthors), family = binomial(link="logit"), data = WoSdata2016RegionsNot1)
summary(numberauthorfitfirstordered)
numberauthorfitlastordered <- glm(LastCor01 ~ ordered(binnedauthors), family = binomial(link="logit"), data = WoSdata2016RegionsNot1)
summary(numberauthorfitlastordered)
#What if bin from 7 authors & up (vs. 10 & up, which previous analysis did)
numberauthorfitfirstordered7 <- glm(Cor01 ~ ordered(binnedauthors7), family = binomial(link="logit"), data = WoSdata2016RegionsNot1)
summary(numberauthorfitfirstordered7)
numberauthorfitlastordered7 <- glm(LastCor01 ~ ordered(binnedauthors7), family = binomial(link="logit"), data = WoSdata2016RegionsNot1)
summary(numberauthorfitlastordered7)
numberauthorcorrplot <- WoSdata2016Regions %>%
ggplot(aes(x=numberauthors,y=LastCor01)) +
geom_point(alpha=0.1) +
ggtitle("Number of authors vs. last author as corresponding author") +
ylab("Is the last author the corresponding author?")
numberauthorcorrplot
numberauthorsummary <- WoSdata2016Regions %>%
group_by(binnedauthors) %>%
summarise(n=n())
numberauthorsummary
numberauthorsummarylast <- WoSdata2016Regions %>%
group_by(binnedauthors,LastCor01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
numberauthorsummarylast
numberauthorsummaryfirst <- WoSdata2016Regions %>%
group_by(binnedauthors,Cor01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
numberauthorsummaryfirst
# Now with 7+ instead of 10+ as the break
numberauthorsummarylast7 <- WoSdata2016Regions %>%
group_by(binnedauthors7,LastCor01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
numberauthorsummarylast7
numberauthorsummaryfirst7 <- WoSdata2016Regions %>%
group_by(binnedauthors7,Cor01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
numberauthorsummaryfirst7
# created a table manually in excel
authorshiptable <- read.csv("authorshiptable.csv")
authorshiptable$Numberauthors <- factor(authorshiptable$Numberauthors,
c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10+"))
authorshiptable$Numberauthors7 <- factor(authorshiptable$Numberauthors7,
c("0", "1", "2", "3", "4", "5", "6", "7+"))
numberauthorscorrplotv2 <- authorshiptable %>%
ggplot(aes(x=Numberauthors,y=rel.freq,color=Correspondence,fill=Correspondence)) +
geom_bar(stat="identity",position=position_dodge()) +
ylab("Percentage of corresponding authors \nwho are first or last author") +
xlab("Number of authors") +
annotate("text", x = 1, y = 105, label = "48", size=3.5) +
annotate("text", x = 2, y = 105, label = "196", size=3.5) +
annotate("text", x = 3, y = 105, label = "185", size=3.5) +
annotate("text", x = 4, y = 105, label = "150", size=3.5) +
annotate("text", x = 5, y = 105, label = "114", size=3.5) +
annotate("text", x = 6, y = 105, label = "74", size=3.5) +
annotate("text", x = 7, y = 105, label = "45", size=3.5) +
annotate("text", x = 8, y = 105, label = "23", size=3.5) +
annotate("text", x = 9, y = 105, label = "11", size=3.5) +
annotate("text", x = 10, y = 105, label = "45", size=3.5) +
scale_color_manual(values=c("#999999", "#56B4E9")) +
scale_fill_manual(values=c("#999999", "#56B4E9"))
numberauthorscorrplotv2
numberauthorscorrplotv7 <- authorshiptable %>%
ggplot(aes(x=Numberauthors7,y=rel.freq,color=Correspondence,fill=Correspondence)) +
geom_bar(stat="identity",position=position_dodge()) +
ylab("Percentage of corresponding authors \nwho are first or last author") +
xlab("Number of authors") +
annotate("text", x = 1, y = 105, label = "48", size=3.5) +
annotate("text", x = 2, y = 105, label = "196", size=3.5) +
annotate("text", x = 3, y = 105, label = "185", size=3.5) +
annotate("text", x = 4, y = 105, label = "150", size=3.5) +
annotate("text", x = 5, y = 105, label = "114", size=3.5) +
annotate("text", x = 6, y = 105, label = "74", size=3.5) +
annotate("text", x = 7, y = 105, label = "124", size=3.5) +
scale_color_manual(values=c("#999999", "#56B4E9")) +
scale_fill_manual(values=c("#999999", "#56B4E9"))
numberauthorscorrplotv7
```
```{r, combined plot using WoS data to look at regional diffs and effects of number of authors}
WoSCorrespondencePlot <- plot_grid(regionplot, numberauthorscorrplotv2, labels = c("A", "B"), nrow = 2, align = "v")
WoSCorrespondencePlot
save_plot("WoSCorrespondencePlot.jpg", WoSCorrespondencePlot, base_width = 8, base_height = 12)
```
## The poll
The poll had four main questions:
1. For ecology papers, do you consider the last author to be the senior author?
2. Which of the following statements most closely matches the current norms in ecology in terms of who is corresponding author?
3. Which of the following statements would be best practice in terms of who is corresponding author?
4. If someone includes a statement on his/her CV indicating they have used a first/last author emphasis, do you pay attention to that?
It also asked about the respondent's primary research area, whether their research is primarily basic or applied, how frequently they conduct interdisciplinary research, how many years post-PhD they are, where they live, and what their current department is.
The poll first appeared on 6 April 2016 and ran for two weeks.
### Data manipulation related to the poll
Four blank entries were deleted. I am a bad person and used Excel to add numeric codes for the different answers. The key is:
For the question about whether last author is the senior author:
```
1 = No
2 = It depends, but probably no
3 = Not sure, but probably no
4 = Not sure, but probably yes
5 = It depends, but probably yes
6 = Yes
```
For the question about current corresponding author practices:
```
1 = The corresponding author is the person that has taken responsibility for fielding questions about the paper post-publication
2 = The corresponding author is the person with the most stable contact info and/or internet access
3 = The corresponding author is usually the person who uploaded the files (usually the first author)
4 = The corresponding author is usually the senior author
5 = The corresponding author uploaded the files, managed the revisions and wrote the response to reviewers, and took responsibility for the paper after publication
```
For the question about best corresponding author practices:
```
1 = The corresponding author should be the person that has taken responsibility for fielding questions about the paper post-publication
2 = The corresponding author should be the person with the most stable contact info and/or internet access
3 = The corresponding author should be whichever person uploaded the files (usually the first author)
4 = The corresponding author should be the senior author
5 = The corresponding author should be the person who uploaded the files, managed the revisions and wrote the response to reviewers, and took responsibility for the paper after publication
```
For the question about the CV statement:
```
1 = No
2 = I have never seen this, but would probably not pay attention to it
3 = I have never seen this, but would probably pay attention to it
4 = Yes
```
For the question about research area:
```
1 = Ecology (primarily field-based)
2 = Ecology (primarily wet-lab based, including molecular ecology)
3 = Ecology (primarily computational-based)
4 = Evolutionary biology (primarily molecular)
5 = Evolutionary biology (primarily organismal)
6 = Biology other than EEB
7 = Outside biology
```
For the basic vs. applied question:
```
1 = basic
2 = applied
```
For the interdisciplinarity question:
```
1 = Never
2 = Rarely
3 = Sometimes
4 = Often
5 = Always
```
For the question about years since PhD:
```
1 = 0 (current students and people without a PhD should choose this)
2 = 1-5
3 = 6-10
4 = 11-15
5 = 16-20
6 = >20
7 = I do not have a PhD and am not a current student
```
For the question about where the respondent lives:
```
1 = Africa
2 = Asia
3 = Australia
4 = Europe
5 = North America
6 = South America
```
Department:
```
1 = An EEB department (or similar)
2 = A biology department
3 = A natural resources department (or similar)
4 = other
```
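
The numeric codes in the keys above can also be mapped back to their text labels inside R. Here is a minimal sketch using `factor()` with the key for the senior-author question; the `toy_codes` vector is hypothetical, and the real poll data is not touched here.

```{r, sketch-code-labels, echo=TRUE}
# Labels taken from the key for the senior-author question, in code order 1-6
last_senior_labels <- c(
  "No",
  "It depends, but probably no",
  "Not sure, but probably no",
  "Not sure, but probably yes",
  "It depends, but probably yes",
  "Yes"
)

# Hypothetical coded responses, including one missing answer
toy_codes <- c(6, 1, 5, 6, NA, 4)

# Attach the labels; missing answers stay NA and unused options stay as levels
toy_labelled <- factor(toy_codes, levels = 1:6, labels = last_senior_labels)
table(toy_labelled)
```

Keeping the levels in code order means tables and plots list the options from "No" to "Yes" rather than alphabetically.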
### Basic overview of responses
After removing the four blank responses, there were 1122 responses to the poll. What did the respondents look like?
```{r, import, message=FALSE, warning=FALSE}
# Import data
polldata <- read.csv("AuthorshipPollResults.csv", na.strings=".")
# load libraries needed to run code
library(knitr)
require(likert)
library(reshape2)
library(magrittr)
library(forcats)
library(lubridate)
library(gridExtra)
```
```{r}
PrimaryResearchArea <-
polldata %>%
filter(!is.na(PrimaryResearch)) %>%
group_by(PrimaryResearch) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
kable(PrimaryResearchArea, caption = "Primary Research Area of Respondents")
BasicAppliedSplit <-
polldata %>%
filter(!is.na(BasicApplied)) %>%
group_by(BasicApplied) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
kable(BasicAppliedSplit)
Interdisciplinarity <-
polldata %>%
filter(!is.na(Interdisciplinary)) %>%
group_by(Interdisciplinary) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
Reorder <- function(x, ordering=c(1,3,5,4,2))
factor(x, levels(x)[ordering])
Interdisciplinarity$Interdisciplinary <- Reorder(Interdisciplinarity$Interdisciplinary)
kable(Interdisciplinarity[order(Interdisciplinarity$Interdisciplinary), ])
YearsPostPhD <-
polldata %>%
filter(!is.na(YearssincePhD)) %>%
group_by(YearssincePhD) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
Reorder <- function(x, ordering=c(2,6,3,4,5,1,7))
factor(x, levels(x)[ordering])
YearsPostPhD$YearssincePhD <- Reorder(YearsPostPhD$YearssincePhD)
kable(YearsPostPhD[order(YearsPostPhD$YearssincePhD), ])
# Note: the years-since-PhD table above renders the year ranges (e.g., 1-5) as dates instead of ranges. Need to fix that.
Continent <-
polldata %>%
filter(!is.na(WhereLive)) %>%
group_by(WhereLive) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
kable(Continent)
Department <-
polldata %>%
filter(!is.na(Dept01)) %>%
group_by(Dept01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
kable(Department)
```
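
Each table in the chunk above repeats the same steps: drop missing answers, count responses per category, and convert the counts to rounded percentages. One way to cut that repetition, sketched here under the assumption that dplyr 1.0 or later is installed, is a small helper function. `percent_table()` and `toy_poll` are hypothetical names and are not part of the original analysis.

```{r, sketch-percent-helper, echo=TRUE}
library(dplyr)

# Hypothetical helper: counts and rounded percentages for one column,
# ignoring missing answers
percent_table <- function(data, column) {
  data %>%
    filter(!is.na(.data[[column]])) %>%
    group_by(across(all_of(column))) %>%
    summarise(n = n()) %>%
    mutate(rel.freq = round(100 * n / sum(n), 0))
}

# Toy responses (illustration only)
toy_poll <- data.frame(
  BasicApplied = c("basic", "basic", "applied", NA, "basic"),
  stringsAsFactors = FALSE
)

toy_table <- percent_table(toy_poll, "BasicApplied")
toy_table
```

With the real data, a call such as `percent_table(polldata, "WhereLive")` would be expected to give the same table as the corresponding block above, though that has not been checked against the poll file.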
### Results for the four main questions
### Q1: "For ecology papers, do you consider the last author to be the senior author?"
```{r, q1-fig}
LastSeniorSum <-
polldata %>%
filter(!is.na(LastSenior01)) %>%
group_by(LastSenior01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
LastSeniorSum
#Note: most frequent response (option 6) is 43%
#Sum of all the "yes" options = 86%
as.factor(LastSeniorSum$LastSenior01)
require(cowplot)
LastSeniorSum$LastSenior01c <- as.factor(LastSeniorSum$LastSenior01)
lastseniorplot <- ggplot(LastSeniorSum,aes(x=LastSenior01c,y=rel.freq,fill=LastSenior01c)) +
geom_bar(stat="identity") +
scale_fill_manual(values=c("#D8B365","#DFC27D","#F6E8C3","#C7EAE5","#80CDC1","#5AB4AC")) +
ylab("Percent of responses") +
theme(legend.position="none") +
coord_flip() +
theme(axis.text.y = element_blank()) +
annotate("text", x = 1, y = 9, label = "No", hjust=0, size=3.5) +
annotate("text", x = 2, y = 9, label = "It depends, but probably no", hjust=0, size=3.5) +
annotate("text", x = 3, y = 9, label = "Not sure, but probably no", hjust=0, size=3.5) +
annotate("text", x = 4, y = 9, label = "Not sure, but probably yes", hjust=0, size=3.5) +
annotate("text", x = 5, y = 1, label = "It depends, but probably yes", hjust=0, size=3.5) +
annotate("text", x = 6, y = 1, label = "Yes", hjust=0, size=3.5) +
ggtitle("For ecology papers, \ndo you consider the last author \nto be the senior author?") +
theme(plot.title=element_text(size=12)) +
theme(axis.title.y=element_blank()) +
theme(axis.title.x=element_text(size=12)) +
theme(axis.ticks.y = element_blank())
lastseniorplot
save_plot("lastseniorplot.jpg", lastseniorplot)
```
### Q2: "Which of the following statements most closely matches the current norms in ecology in terms of who is corresponding author?"
```{r, q2}
CurrentCorrespondingSum <-
polldata %>%
filter(!is.na(CorrespondingCurrent01)) %>%
group_by(CorrespondingCurrent01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
CurrentCorrespondingSum
#most frequent answer (option 5) = 54%
#option 3 = 19%, option 1 = 16%
```
### Q3: "Which of the following statements would be best practice in terms of who is corresponding author?"
```{r, q3}
BestCorrespondingSum <-
polldata %>%
filter(!is.na(CorrespondingBest01)) %>%
group_by(CorrespondingBest01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
BestCorrespondingSum
#most frequent answer (option 5) = 61%
#option 1 = 24%
```
```{r, plot comparing current and best practices}
BestCorrespondingSum$Corresponding01 <- BestCorrespondingSum$CorrespondingBest01
BestCorrespondingSum$bestcurrent <- c("best","best","best","best","best")
CurrentCorrespondingSum$Corresponding01 <- CurrentCorrespondingSum$CorrespondingCurrent01
CurrentCorrespondingSum$bestcurrent <- c("current","current","current","current","current")
CombinedCorrespondingSum <- bind_rows(BestCorrespondingSum,CurrentCorrespondingSum)
combinedcorrespondingplot <- ggplot(CombinedCorrespondingSum,aes(x=Corresponding01,y=rel.freq,fill= bestcurrent)) +
geom_bar(stat="identity",position=position_dodge()) +
ylab("Percent of responses") +
coord_flip() +
theme(axis.text.y = element_blank()) +
annotate("text", x = 1, y = 24.5, label = "The corresponding author is/should be the \nperson that has taken responsibility for fielding \nquestions about the paper post-publication",hjust=0,size=4) +
annotate("text", x = 2, y = 5, label = "The corresponding author is/should be the person with the most stable \ncontact info and/or internet access",hjust=0,size=4) +
annotate("text", x = 3, y = 20, label = "The corresponding author is/should be whichever \nperson uploaded the files (usually the first author)",hjust=0,size=4) +
annotate("text", x = 4, y = 8, label = "The corresponding author is/should be the senior author",hjust=0,size=4) +
annotate("text", x = 5, y = 1, color="black", label = "The corresponding author is/should be the person who uploaded \nthe files, managed the revisions and wrote the response \nto reviewers, and took responsibility for the paper after publication",hjust=0,size=4) +
theme(plot.title=element_text(size=12)) +
theme(axis.title.y=element_blank()) +
theme(axis.title.x=element_text(size=12)) +
theme(axis.ticks.y = element_blank()) +
scale_fill_manual(values=c("#999999", "#56B4E9")) +
guides(fill = guide_legend(reverse=TRUE))
combinedcorrespondingplot
save_plot("combinedcorrespondingplot.jpg", combinedcorrespondingplot, base_width = 8, base_height = 6)
## NOTE: The above figure is Figure 6 for the manuscript.
```
### Q4: "If someone includes a statement on his/her CV indicating they have used a first/last author emphasis, do you pay attention to that?"
```{r, q4-fig}
CVStatementSum <-
polldata %>%
filter(!is.na(CVStatement01)) %>%
group_by(CVStatement01) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
CVStatementSum
#most people haven't seen such a statement (options 2+3 = 70%)
#most people either pay attention to such a statement or think they would: (options 3+4 = 71%)
#but the proportion of people who would ignore such a statement isn't exactly trivial!
CVStatementSum$CVStatement01c <- as.factor(CVStatementSum$CVStatement01)
CVstatementplot <- ggplot(CVStatementSum,aes(x=CVStatement01c,y=rel.freq,fill=CVStatement01c)) +
geom_bar(stat="identity") +
scale_fill_manual(values=c("#D8B365","#F6E8C3","#C7EAE5","#5AB4AC")) +
ylab("Percent of responses") +
theme(legend.position="none") +
coord_flip() +
theme(axis.text.y = element_blank()) +
annotate("text", x = 1, y = 1, color="black", label = "No",hjust=0,size=3.5) +
annotate("text", x = 2, y = 1, color="black", label = "I have never seen this, \nbut would probably not \npay attention to it",hjust=0,size=3.5) +
annotate("text", x = 3, y = 1, color="black", label = "I have never seen this, but would probably \npay attention to it",hjust=0,size=3.5) +
annotate("text", x = 4, y = 1, color="black", label = "Yes",hjust=0,size=3.5) +
ggtitle("If someone includes a statement on his/her \nCV indicating they have used a first/last \nauthor emphasis, do you pay attention to that?") +
theme(plot.title=element_text(size=12)) +
theme(axis.title.y=element_blank()) +
theme(axis.title.x=element_text(size=12)) +
theme(axis.ticks.y = element_blank())
save_plot("CVstatementplot.jpg", CVstatementplot)
CVstatementplot
```
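The summary at the top of the chunk above turns response codes into rounded percentages. As a rough illustration, here is the same calculation in base R on made-up codes (`toy_codes` is hypothetical, not poll data). It also shows that percentages rounded row by row need not add up to exactly 100.

```r
# Hypothetical response codes (1-4) with one missing answer; not poll data.
toy_codes <- c(1, 1, 2, 2, 3, 3, 4, NA)

# table() drops NA by default, mirroring filter(!is.na(...)) in the chunk above.
toy_counts <- table(toy_codes)

# Percent of non-missing responses per option, rounded to whole numbers.
toy_relfreq <- round(100 * toy_counts / sum(toy_counts), 0)

toy_summary <- data.frame(
  option   = as.integer(names(toy_counts)),
  n        = as.vector(toy_counts),
  rel.freq = as.vector(toy_relfreq)
)
print(toy_summary)

# Each row is rounded separately, so the total can drift away from 100
# (here 29 + 29 + 29 + 14 = 101).
sum(toy_summary$rel.freq)
```

This is worth keeping in mind when adding up options from the rounded table, as in the comments about combined options above.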
```{r, combined Figure 4 for paper}
figure4 <- plot_grid(lastseniorplot, CVstatementplot, labels = c("A", "B"), nrow = 2, align = "v")
figure4
save_plot("figure4.jpg", figure4, base_width = 4, base_height = 8)
```
## Looking at cross-tabs
### Does whether people view the last author as the senior author vary based on age, country, research area, and/or department?
```{r, cross tabs related to last authorship}
## first, making a "molecular" variable
## (all comparisons use the numeric PrimaryResearch01 codes; PrimaryResearch holds the text labels, so comparing it with a number never matches)
polldata$molecular <- ifelse(polldata$PrimaryResearch01 == 2 | polldata$PrimaryResearch01 == 4, "molecular",
ifelse(polldata$PrimaryResearch01 == 1 | polldata$PrimaryResearch01 == 5,"organismal","other"
))
## make an ecology vs. evolution variable:
polldata$ecoevo <- ifelse(polldata$PrimaryResearch01 < 4, "ecology",
ifelse(polldata$PrimaryResearch01 == 4 | polldata$PrimaryResearch01 == 5,"evolution","other"
))
# next, make a variable for department type
polldata$depttype <- ifelse(polldata$Dept01 == 1, "eeb",
ifelse(polldata$Dept01 == 2, "bio",
ifelse(polldata$Dept01 == 3, "natres","other"
)))
## subsetting only the last author data, but first filter by the grouping variables of interest
## then renaming factors, renaming question
lastdata <-
polldata %>%
filter(CorrespondingBest01 != "NA", LastSenior != "NA",
WhereLive != "NA", BasicApplied != "NA",
depttype != "NA", PrimaryResearch != "NA", Inter01 != "NA") %>%
select(LastSenior)
# order the data
lastdata$LastSenior <- factor(lastdata$LastSenior,
c("No",
"Not sure, but probably no",
"It depends, but probably no",
"It depends, but probably yes",
"Not sure, but probably yes",
"Yes"))
# changing the column name to the question
colnames(lastdata)[1] <- "Is the last author the senior author?"
## subsetting the likert data AND the grouping variables
lastdata_grouping <-
polldata %>%
filter(CorrespondingBest01 != "NA", LastSenior != "NA",
WhereLive != "NA", BasicApplied != "NA",
depttype != "NA", PrimaryResearch != "NA", Inter01 != "NA") %>%
select(molecular,
WhereLive, BasicApplied,
depttype, PrimaryResearch,
ecoevo, Interdisciplinary, YearssincePhD)
## ordering the grouping factors
## then renaming the primary research responses
lastdata_grouping$molecular <- factor(lastdata_grouping$molecular,
c("molecular",
"organismal",
"other"))
lastdata_grouping$Interdisciplinary <- factor(lastdata_grouping$Interdisciplinary,
c("Always",
"Often",
"Sometimes",
"Rarely",
"Never"))
levels(lastdata_grouping$PrimaryResearch) <- c("Biology Other", "Comp Ecol", "Field Ecol", "Lab Mol Ecol", "Mol Evol", "Organismal Evol", "Other")
## recoding career stage: "10-Jun", "15-Nov" and "5-Jan" in the raw data are presumably spreadsheet date conversions of "6-10", "11-15" and "1-5"
lastdata_grouping$YearssincePhD <- lastdata_grouping %>%
use_series(YearssincePhD) %>%
plyr::mapvalues(., c(">20","0 (current students should choose this)","10-Jun","15-Nov", "16-20", "5-Jan", "I do not have a PhD and am not a current student"), c(">20", "0","6-10","11-15", "16-20", "1-5", "no PhD, not student"))
#lastdata_grouping$PrimaryResearch <- lastdata_grouping %>%
# use_series(PrimaryResearch) %>%
# plyr::mapvalues(., c("Biology other than EEB","Ecology (primarily computational-based)",
# "Ecology (primarily field-based)","Ecology (primarily wet-lab based, including molecular ecology)",
# "Evolutionary biology (primarily molecular)", "Evolutionary biology (primarily organismal)", "Outside biology"),
# c("Biology Other", "Comp Ecol","Field Ecol","Lab Mol Ecol", "Mol Evol", "Organismal Evol", "Other"))
lastdata_grouping$YearssincePhD <- factor(lastdata_grouping$YearssincePhD,
c("0",
"1-5",
"6-10",
"11-15",
"16-20",
">20",
"no PhD, not student"))
```
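The filters in the chunk above test columns against the string `"NA"`. A minimal sketch of why that still drops missing values, and how it compares with an explicit `is.na()` test, using a hypothetical toy data frame (the column name mirrors the poll data, the values do not) and base `subset()` as a stand-in for `dplyr::filter()`:

```r
# Hypothetical toy data; not poll data.
toy <- data.frame(
  id         = 1:4,
  LastSenior = c("Yes", NA, "No", "Yes"),
  stringsAsFactors = FALSE
)

# Comparing against the string "NA" returns NA (not FALSE) for missing values...
toy$LastSenior != "NA"

# ...and subset(), like dplyr::filter(), keeps only rows where the test is TRUE,
# so the missing row is dropped.
kept_string_test <- subset(toy, LastSenior != "NA")

# The explicit test keeps the same rows and states the intent directly.
kept_isna_test <- subset(toy, !is.na(LastSenior))

identical(kept_string_test, kept_isna_test)

# The two tests differ only if a column holds the literal text "NA":
toy_literal <- transform(toy, LastSenior = c("Yes", "NA", "No", "Yes"))
nrow(subset(toy_literal, LastSenior != "NA"))   # literal "NA" row is dropped
nrow(subset(toy_literal, !is.na(LastSenior)))   # literal "NA" row is kept
```

So the two forms should select the same respondents as long as missing answers were read in as real `NA` values rather than as text.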
```{r, q1-4panelcrosstabfig, warning=FALSE}
counts = lastdata_grouping %>%
count(depttype) %>%
mutate(variable=NA)
colnames(lastdata)[1] <- "Department type"
likert_32 <- likert(lastdata, grouping = lastdata_grouping$depttype)
depttypelikertplot = plot(likert_32) +
geom_text(data=counts,
aes(label=format(n,big.mark=","), x=depttype, y=145),
size=3, colour="black", hjust=1) +
scale_y_continuous(limits=c(-100,150)) +
coord_flip(ylim=c(-110,110)) +
theme(plot.margin=unit(c(0.2,2,0.2,0.2),"cm"))
depttypelikertplot <- ggplot_gtable(ggplot_build(depttypelikertplot))
depttypelikertplot$layout$clip <- "off"
depttypelikertplot.mpg <- plot(depttypelikertplot)
plot(depttypelikertplot)
counts = lastdata_grouping %>%
count(PrimaryResearch) %>%
mutate(variable=NA)
colnames(lastdata)[1] <- "Primary Research Area"
likert_52 <- likert(lastdata, grouping = lastdata_grouping$PrimaryResearch)
primaryresearchlikertplot = plot(likert_52) +
geom_text(data=counts,
aes(label=format(n,big.mark=","), x=PrimaryResearch, y=145),
size=3, colour="black", hjust=1) +
scale_y_continuous(limits=c(-100,150)) +
coord_flip(ylim=c(-110,110)) +
theme(plot.margin=unit(c(0.2,2,0.2,0.2),"cm")) + theme(legend.position = "none")
primaryresearchlikertplot <- ggplot_gtable(ggplot_build(primaryresearchlikertplot))
primaryresearchlikertplot$layout$clip <- "off"
counts = lastdata_grouping %>%
count(YearssincePhD) %>%
mutate(variable=NA)
colnames(lastdata)[1] <- "Years since PhD"
likert_92 <- likert(lastdata, grouping = lastdata_grouping$YearssincePhD)
yearslikertplot = plot(likert_92) +
geom_text(data=counts,
aes(label=format(n,big.mark=","), x=YearssincePhD, y=145),
size=3, colour="black", hjust=1) +
scale_y_continuous(limits=c(-100,150)) +
coord_flip(ylim=c(-110,110)) +
theme(plot.margin=unit(c(0.2,2,0.2,0.2),"cm")) + theme(legend.position = "none")
yearslikertplot <- ggplot_gtable(ggplot_build(yearslikertplot))
yearslikertplot$layout$clip <- "off"
counts = lastdata_grouping %>%
count(WhereLive) %>%
mutate(variable=NA)
colnames(lastdata)[1] <- "Geographic location"
likert_22 <- likert(lastdata, grouping = lastdata_grouping$WhereLive)
wherelivelikertplot = plot(likert_22) +
geom_text(data=counts,
aes(label=format(n,big.mark=","), x=WhereLive, y=145),
size=3, colour="black", hjust=1) +
scale_y_continuous(limits=c(-100,150)) +
coord_flip(ylim=c(-110,110)) +
theme(plot.margin=unit(c(0.2,2,0.2,0.2),"cm")) + theme(legend.position = "none")
wherelivelikertplot <- ggplot_gtable(ggplot_build(wherelivelikertplot))
wherelivelikertplot$layout$clip <- "off"
lastplotgrid4panel <- plot_grid(yearslikertplot, wherelivelikertplot, primaryresearchlikertplot, depttypelikertplot,
labels = c("A", "B", "C", "D"), ncol = 1, nrow = 4,
rel_heights = c(0.25, 0.25, 0.25, 0.25))
save_plot("lastplotgrid4panel.jpg", lastplotgrid4panel, base_width = 9, base_height = 9)
lastplotgrid4panel
```
```{r, stats related to views on last authorship}
## CAREER STAGE
polldata$LastSeniorYes <- ifelse(polldata$LastSenior01>4,1,0)
polldatacareerstageanalysis <- subset(polldata, PhD01 < 7)
polldatacareerstageanalysisfit <- glm(LastSeniorYes ~ ordered(PhD01), family = binomial(link="logit"), data = polldatacareerstageanalysis)
summary(polldatacareerstageanalysisfit)
# for comparison: the same model with years since PhD as an unordered factor
polldatacareerstageanalysisfitfactor <- glm(LastSeniorYes ~ factor(PhD01), family = binomial(link="logit"), data = polldatacareerstageanalysis)
summary(polldatacareerstageanalysisfitfactor)
# and, again for comparison, with years since PhD as a numeric predictor
polldatacareerstageanalysisfitnumeric <- glm(LastSeniorYes ~ PhD01, family = binomial(link="logit"), data = polldatacareerstageanalysis)
summary(polldatacareerstageanalysisfitnumeric)
## WHERE LIVE
polldatawhereliveanalysis <- subset(polldata, WhereLive01 == 4 | WhereLive01 == 5)
polldatawherelivefit <- glm(LastSeniorYes ~ WhereLive, family = binomial(link="logit"), data = polldatawhereliveanalysis)
summary(polldatawherelivefit)
## ECOLOGY VS. EVOLUTION
polldataecoevoanalysis <- subset(polldata, ecoevo != "other")
polldataecoevofit <- glm(LastSeniorYes ~ ecoevo, family = binomial(link="logit"), data = polldataecoevoanalysis)
summary(polldataecoevofit)
## DEPT TYPE
polldatadeptfit <- glm(LastSeniorYes ~ factor(Dept01), family = binomial(link="logit"), data = polldata)
summary(polldatadeptfit)
## BASIC VS. APPLIED
polldatabasicappliedfit <- glm(LastSeniorYes ~ BasicApplied, family = binomial(link="logit"), data = polldata)
summary(polldatabasicappliedfit)
## INTERDISCIPLINARITY
polldatainterfit <- glm(LastSeniorYes ~ ordered(Inter01), family = binomial(link="logit"), data = polldata)
summary(polldatainterfit)
## COMPARING TO OUTPUT OF ONE BIG MODEL (note: This model isn't quite the right form, because I'm making the ordinal variables into factors, but want to see if there's general consistency in the approaches)
fullmodelfit <- glm(LastSeniorYes ~ factor(PhD01) + WhereLive + ecoevo + factor(Dept01) + BasicApplied + factor(Inter01), family = binomial(link="logit"), data = polldata)
summary(fullmodelfit)
```
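The career-stage models above fit the same predictor three ways: `ordered()`, `factor()` and plain numeric. The sketch below uses simulated data (the variables `stage` and `yes` are hypothetical, not poll data) to illustrate how the three codings relate. The ordered and unordered versions describe the same model with different contrasts, while the numeric version constrains the effect to a single slope.

```r
set.seed(1)  # simulated data only; nothing here comes from the poll

# Hypothetical career-stage codes 1-6 and a yes/no outcome whose probability
# rises with stage.
stage <- rep(1:6, each = 100)
yes   <- rbinom(length(stage), size = 1, prob = plogis(-1 + 0.3 * stage))
sim   <- data.frame(stage = stage, yes = yes)

fit_ordered <- glm(yes ~ ordered(stage), family = binomial(link = "logit"), data = sim)
fit_factor  <- glm(yes ~ factor(stage),  family = binomial(link = "logit"), data = sim)
fit_numeric <- glm(yes ~ stage,          family = binomial(link = "logit"), data = sim)

# ordered() uses polynomial contrasts: .L = linear trend, .Q = quadratic, ...
names(coef(fit_ordered))

# factor() uses treatment contrasts: each level is compared with level 1.
names(coef(fit_factor))

# Same fitted model, different parameterisation, so the deviances agree.
all.equal(deviance(fit_ordered), deviance(fit_factor), tolerance = 1e-6)

# The numeric version is a constrained special case (a single slope),
# so it can be compared with the factor version by a likelihood-ratio test.
anova(fit_numeric, fit_factor, test = "Chisq")
```

Under this reading, the `.L` term in the ordered fit is the one to look at for a monotonic trend across career stages.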
### Does whether people pay attention to a CV statement vary based on age, country, research area, department and/or their views on last authorship?
```{r, CV statement crosstabs}
## subsetting only the statement data, but first filter by the grouping variables of interest
## then renaming factors, renaming question
statementdata <-
polldata %>%
filter(CVStatement01 != "NA", LastSenior != "NA",
WhereLive != "NA", BasicApplied != "NA",
depttype != "NA", PrimaryResearch != "NA") %>%
select(CVStatement)
# order the data
statementdata$CVStatement <- factor(statementdata$CVStatement,
c("No",
"I have never seen this, but would probably not pay attention to it",
"I have never seen this, but would probably pay attention to it",
"Yes"))
statementdata$CVStatement <- statementdata %>%
use_series(CVStatement) %>%
plyr::mapvalues(., c("No","I have never seen this, but would probably not pay attention to it","I have never seen this, but would probably pay attention to it","Yes"), c("No","Not seen, no","Not seen, yes","Yes"))
# changing the column name to the question
colnames(statementdata)[1] <- "Do you pay attention to a CV statement?"
## subsetting the likert data AND the grouping variables
polldata$youngold <- ifelse(polldata$PhD01 < 3, "young",
ifelse(polldata$PhD01 == 3, "middle",
ifelse(polldata$PhD01>3,"old","neither"
)))
statementdata_grouping <-
polldata %>%
filter(CVStatement01 != "NA", LastSenior != "NA",
WhereLive != "NA", BasicApplied != "NA",
depttype != "NA", PrimaryResearch != "NA") %>%
select(molecular, youngold,
WhereLive, BasicApplied,
depttype, PrimaryResearch, LastSenior, LastSenior01)
## ordering the grouping factors
## then renaming the primary research responses
statementdata_grouping$molecular <- factor(statementdata_grouping$molecular,
c("molecular",
"organismal",
"other"))
statementdata_grouping$youngold <- factor(statementdata_grouping$youngold,
c("young",
"middle",
"old"))
levels(statementdata_grouping$PrimaryResearch)
levels(statementdata_grouping$PrimaryResearch) <- c("Biology Other", "Comp Ecol", "Field Ecol", "Lab Mol Ecol", "Mol Evol", "Organismal Evol", "Other")
statementdata_grouping$LastSenior <- factor(statementdata_grouping$LastSenior,
c("No",
"Not sure, but probably no",
"It depends, but probably no",
"It depends, but probably yes",
"Not sure, but probably yes",
"Yes"))
statementdata_grouping$LastSeniorYesNo <- ifelse(statementdata_grouping$LastSenior01 < 4, "No", "Yes")
```
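In the nested `ifelse()` above, every `PhD01` code above 3 is labelled "old", which appears to include code 7 (respondents with no PhD who are not current students). If that group should instead be left out, as it is in the career-stage models (`PhD01 < 7`), one option is `cut()`. This is only a sketch on hypothetical codes, and it assumes the 1-7 coding listed in the key further down.

```r
# Hypothetical career-stage codes using the 1-7 coding
# (7 = "I do not have a PhD and am not a current student"); not poll data.
toy_PhD01 <- c(1, 2, 3, 4, 5, 6, 7, NA)

# Intervals are (0,2], (2,3] and (3,6]; anything outside them (here code 7)
# becomes NA rather than being counted as "old".
toy_youngold <- cut(toy_PhD01,
                    breaks = c(0, 2, 3, 6),
                    labels = c("young", "middle", "old"))

data.frame(PhD01 = toy_PhD01, youngold = toy_youngold)
```

`cut()` returns a factor with the levels already in the order given, so no separate reordering step is needed.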
```{r, q4-crosstabfig, warning=FALSE}
## Plots
colnames(statementdata)[1] <- "Career stage"
likert_32 <- likert(lastdata, grouping = lastdata_grouping$depttype)
plot(likert_32)
likert_13 <- likert(statementdata, grouping = statementdata_grouping$youngold)
plot(likert_13)
likert_23 <- likert(statementdata, grouping = statementdata_grouping$WhereLive)
plot(likert_23)
likert_33 <- likert(statementdata, grouping = statementdata_grouping$depttype)
plot(likert_33)
likert_43 <- likert(statementdata, grouping = statementdata_grouping$BasicApplied)
plot(likert_43)
likert_53 <- likert(statementdata, grouping = statementdata_grouping$PrimaryResearch)
plot(likert_53)
likert_63 <- likert(statementdata, grouping = statementdata_grouping$molecular)
plot(likert_63)
likert_73 <- likert(statementdata, grouping = statementdata_grouping$LastSenior)
plot(likert_73)
likert_83 <- likert(statementdata, grouping = statementdata_grouping$LastSeniorYesNo)
plot(likert_83)
likert_13.mpg <- plot(likert_13) + theme(legend.position = "none")
likert_23.mpg <- plot(likert_23) + theme(legend.position = "none")
likert_33.mpg <- plot(likert_33) + theme(legend.position = "none")
likert_43.mpg <- plot(likert_43) + theme(legend.position = "none")
likert_53.mpg <- plot(likert_53) + theme(legend.position = "none")
likert_63.mpg <- plot(likert_63) + theme(legend.position = "none")
likert_73.mpg <- plot(likert_73)
likert_83.mpg <- plot(likert_83)
statementplotgrid <- plot_grid(likert_13.mpg, likert_23.mpg, likert_33.mpg, likert_43.mpg, likert_53.mpg, likert_63.mpg, likert_73.mpg, likert_83.mpg,
labels = c("A", "B", "C", "D", "E", "F", "G", "H"), ncol = 2, nrow = 4,
rel_widths = c(0.5, 0.5),
rel_heights = c(0.3, 0.3, 0.35))
save_plot("statementplotgrid.jpg", statementplotgrid, base_width = 17, base_height = 15)
#ideally, would fix the above plot so that the title for each panel was the factor used to break down responses. But since there doesn't seem to be much going on with this, I'm just going to move on.
```
### Do views on corresponding authorship vary based on age, country, research area, and/or department?
```{r, cross tabs related to current corresponding authorship practices}
## subsetting only the corresponding author data, but first filter by the grouping variables of interest
## then renaming factors, renaming question
corrcurrdata <-
polldata %>%
filter(CorrespondingBest01 != "NA", LastSenior != "NA",
WhereLive != "NA", BasicApplied != "NA",
depttype != "NA", PrimaryResearch != "NA", Inter01 != "NA") %>%
select(CorrespondingCurrent,CorrespondingCurrent01,YearssincePhD,PhD01)
corrcurrdata$PhDShort <- corrcurrdata$PhD01
corrcurrdata$PhDShort <- plyr::mapvalues(corrcurrdata$PhDShort, c(1, 2, 3, 4, 5, 6, 7), c("student","1 to 5", "6 to 10", "11 to 15", "16 to 20", "over 20", "no PhD"))
corrcurrdata$CorrespondingCurrentShort <- corrcurrdata$CorrespondingCurrent01
corrcurrdata$CorrespondingCurrentShort <- plyr::mapvalues(corrcurrdata$CorrespondingCurrentShort, c(1, 2, 3, 4, 5), c("questions","stable contact info", "uploaded files", "senior author", "full responsibility"))
currentsummary <- corrcurrdata %>%
filter(PhD01 < 7) %>%
group_by(PhDShort,CorrespondingCurrentShort) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
currentsummary$CorrespondingCurrentShort <- as.factor(currentsummary$CorrespondingCurrentShort)
currentsummary$PhDShort <- factor(currentsummary$PhDShort,
c("student","1 to 5", "6 to 10", "11 to 15", "16 to 20", "over 20"))
currentlineplot <- currentsummary %>%
filter(CorrespondingCurrentShort != "NA") %>%
ggplot(aes(x=PhDShort,y=rel.freq,color= CorrespondingCurrentShort,group=CorrespondingCurrentShort)) +
geom_line(stat='identity',aes(linetype=CorrespondingCurrentShort),size=1) +
geom_point(aes(shape=CorrespondingCurrentShort), size=4) +
ggtitle("Views of people at different career stages on \ncurrent corresponding authorship practices") +
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73","#0072B2")) +
ylab("Percent") +
xlab("Years since PhD") +
guides(colour = guide_legend("Corresponding authorship"), linetype = guide_legend("Corresponding authorship"),
shape = guide_legend("Corresponding authorship"))
currentlineplot
save_plot("currentlineplot.jpg", currentlineplot, base_width = 8, base_height = 4)
#1 = The corresponding author is the person that has taken responsibility for fielding questions about the paper post-publication
#2 = The corresponding author is the person with the most stable contact info and/or internet access
#3 = The corresponding author is usually the person who uploaded the files (usually the first author)
#4 = The corresponding author is usually the senior author
#5 = The corresponding author uploaded the files, managed the revisions and wrote the response to reviewers, and took responsibility for the paper after publication
# questions = The corresponding author is the person that has taken responsibility for fielding questions about the paper post-publication
# stable contact info = The corresponding author is the person with the most stable contact info and/or internet access
# uploaded files = The corresponding author is usually the person who uploaded the files (usually the first author)
# senior author = The corresponding author is usually the senior author
# full responsibility = The corresponding author uploaded the files, managed the revisions and wrote the response to reviewers, and took responsibility for the paper after publication
#1 = 0 (current students and people without a PhD should choose this)
#2 = 1-5
#3 = 6-10
#4 = 11-15
#5 = 16-20
#6 = >20
#7 = I do not have a PhD and am not a current student
corrcurrdataecoevo <-
polldata %>%
filter(CorrespondingBest01 != "NA", LastSenior != "NA",
WhereLive != "NA", BasicApplied != "NA",
depttype != "NA", PrimaryResearch != "NA", Inter01 != "NA") %>%
select(CorrespondingCurrent,CorrespondingCurrent01,ecoevo)
corrcurrdataecoevo$CorrespondingCurrentShort <- corrcurrdataecoevo$CorrespondingCurrent01
corrcurrdataecoevo$CorrespondingCurrentShort <- plyr::mapvalues(corrcurrdataecoevo$CorrespondingCurrentShort, c(1, 2, 3, 4, 5), c("questions","stable contact info", "uploaded files", "senior author", "full responsibility"))
currentsummaryecoevo <- corrcurrdataecoevo %>%
filter(ecoevo != "other") %>%
group_by(ecoevo,CorrespondingCurrentShort) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
currentsummaryecoevo$CorrespondingCurrentShort <- as.factor(currentsummaryecoevo$CorrespondingCurrentShort)
ecoevo = c("evolution")
CorrespondingCurrentShort = c("stable contact info")
n = c(0)
rel.freq = c(0)
missingdfecoevo <- data.frame(ecoevo, CorrespondingCurrentShort, n, rel.freq)
currentsummaryecoevo <- bind_rows(currentsummaryecoevo,missingdfecoevo)
currentlineplotecoevo <- currentsummaryecoevo %>%
filter(CorrespondingCurrentShort != "NA") %>%
ggplot(aes(x=ecoevo,y=rel.freq,color= CorrespondingCurrentShort,group=CorrespondingCurrentShort)) +
geom_line(stat='identity',aes(linetype=CorrespondingCurrentShort),size=1) +
geom_point(aes(shape=CorrespondingCurrentShort), size=4) +
ggtitle("Views of ecologists vs. evolutionary biologists on \ncurrent corresponding authorship practices") +
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73","#0072B2")) +
ylab("Percent")
currentbarplotecoevo <- currentsummaryecoevo %>%
filter(CorrespondingCurrentShort != "NA") %>%
ggplot(aes(x=ecoevo,y=rel.freq,fill= CorrespondingCurrentShort)) +
geom_bar(stat='identity',aes(fill=CorrespondingCurrentShort),position=position_dodge(),color="black") +
ggtitle("Views of ecologists vs. evolutionary biologists on \ncurrent corresponding authorship practices") +
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73","#0072B2")) +
ylab("Percent") +
xlab("Main research area") +
guides(fill=guide_legend(title="Corresponding authorship"))
currentbarplotecoevo
save_plot("currentbarplotecoevo.jpg", currentbarplotecoevo, base_width = 8, base_height = 4)
corrcurrdatadepttype <-
polldata %>%
filter(CorrespondingBest01 != "NA", LastSenior != "NA",
WhereLive != "NA", BasicApplied != "NA",
depttype != "NA", PrimaryResearch != "NA", Inter01 != "NA") %>%
select(CorrespondingCurrent,CorrespondingCurrent01,depttype)
corrcurrdatadepttype$CorrespondingCurrentShort <- corrcurrdatadepttype$CorrespondingCurrent01
corrcurrdatadepttype$CorrespondingCurrentShort <- plyr::mapvalues(corrcurrdatadepttype$CorrespondingCurrentShort, c(1, 2, 3, 4, 5), c("questions","stable contact info", "uploaded files", "senior author", "full responsibility"))
currentsummarydepttype <- corrcurrdatadepttype %>%
filter(depttype != "other") %>%
group_by(depttype,CorrespondingCurrentShort) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
currentsummarydepttype$CorrespondingCurrentShort <- as.factor(currentsummarydepttype$CorrespondingCurrentShort)
currentbarplotdepttype <- currentsummarydepttype %>%
filter(CorrespondingCurrentShort != "NA") %>%
ggplot(aes(x=depttype,y=rel.freq,fill= CorrespondingCurrentShort)) +
geom_bar(stat='identity',aes(fill=CorrespondingCurrentShort),position=position_dodge(),color="black") +
ggtitle("Views of people in different department types on \ncurrent corresponding authorship practices") +
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73","#0072B2")) +
ylab("Percent") +
xlab("Department type") +
guides(fill=guide_legend(title="Corresponding authorship"))
currentbarplotdepttype
save_plot("currentbarplotdepttype.jpg", currentbarplotdepttype, base_width = 8, base_height = 4)
corrcurrdataregion <-
polldata %>%
filter(CorrespondingBest01 != "NA", LastSenior != "NA",
WhereLive != "NA", BasicApplied != "NA",
depttype != "NA", PrimaryResearch != "NA", Inter01 != "NA") %>%
filter(WhereLive01 < 6) %>%
filter(WhereLive01 > 3) %>%
select(CorrespondingCurrent,CorrespondingCurrent01,WhereLive)
corrcurrdataregion$CorrespondingCurrentShort <- corrcurrdataregion$CorrespondingCurrent01
corrcurrdataregion$CorrespondingCurrentShort <- plyr::mapvalues(corrcurrdataregion$CorrespondingCurrentShort, c(1, 2, 3, 4, 5), c("questions","stable contact info", "uploaded files", "senior author", "full responsibility"))
currentsummaryregion <- corrcurrdataregion %>%
group_by(WhereLive,CorrespondingCurrentShort) %>%
summarise(n=n()) %>%
mutate(rel.freq = round(100 * n/sum(n), 0))
currentbarplotregion <- currentsummaryregion %>%
filter(CorrespondingCurrentShort != "NA") %>%
ggplot(aes(x=WhereLive,y=rel.freq,fill= CorrespondingCurrentShort)) +
geom_bar(stat='identity',aes(fill=CorrespondingCurrentShort),position=position_dodge(),color="black") +
ggtitle("Views of people in Europe and North America on \ncurrent corresponding authorship practices") +
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73","#0072B2")) +
ylab("Percent") +
xlab("Location") +
guides(fill=guide_legend(title="Corresponding authorship"))
currentbarplotregion
save_plot("currentbarplotregion.jpg", currentbarplotregion, base_width = 8, base_height = 4)
```
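The ecology vs. evolution summary above adds a zero row by hand for the one combination nobody chose. As a sketch of an alternative, declaring the full set of factor levels lets `table()` report empty combinations as 0 automatically. The data and the column name `answer` below are hypothetical; only the labels mirror those used above.

```r
# Hypothetical toy responses; not poll data.
toy <- data.frame(
  ecoevo = c("ecology", "ecology", "ecology", "evolution", "evolution"),
  answer = c("questions", "stable contact info", "senior author",
             "questions", "senior author")
)

# Declaring all levels up front lets table() report empty cells as 0.
toy$ecoevo <- factor(toy$ecoevo, levels = c("ecology", "evolution"))
toy$answer <- factor(toy$answer,
                     levels = c("questions", "stable contact info", "senior author"))

toy_counts <- as.data.frame(table(toy$ecoevo, toy$answer), responseName = "n")
names(toy_counts)[1:2] <- c("ecoevo", "answer")

# Percent within each research area, matching the grouped mutate() above.
toy_counts$rel.freq <- round(100 * toy_counts$n /
                               ave(toy_counts$n, toy_counts$ecoevo, FUN = sum), 0)
toy_counts
```

Within a dplyr pipeline, `tidyr::complete()` with a `fill` value is another way to get the same result.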
```{r, combining plots related to corresponding authorship views into one}
correspondingplotgrid4panel <- plot_grid(currentlineplot, currentbarplotecoevo, currentbarplotdepttype, currentbarplotregion,
labels = c("A", "B", "C", "D"), ncol = 1, nrow = 4,
rel_heights = c(0.25, 0.25, 0.25, 0.25))
save_plot("correspondingplotgrid4panel.jpg", correspondingplotgrid4panel, base_width = 8, base_height = 13)
```
```{r, stats for corresponding authorship question}
## CAREER STAGE
polldata$FullResYes <- ifelse(polldata$CorrespondingCurrent01==5,1,0)
polldatacareerstageanalysis2 <- subset(polldata, PhD01 < 7)
polldatacareerstageanalysisfit2 <- glm(FullResYes ~ ordered(PhD01), family = binomial(link="logit"), data = polldatacareerstageanalysis2)
summary(polldatacareerstageanalysisfit2)
# for comparison: the same model with years since PhD as an unordered factor
polldatacareerstageanalysisfitfactor2 <- glm(FullResYes ~ factor(PhD01), family = binomial(link="logit"), data = polldatacareerstageanalysis2)
summary(polldatacareerstageanalysisfitfactor2)
# and, again for comparison, with years since PhD as a numeric predictor
polldatacareerstageanalysisfitnumeric2 <- glm(FullResYes ~ PhD01, family = binomial(link="logit"), data = polldatacareerstageanalysis2)
summary(polldatacareerstageanalysisfitnumeric2)
## WHERE LIVE
polldatawhereliveanalysis2 <- subset(polldata, WhereLive01 == 4 | WhereLive01 == 5)
polldatawherelivefit2 <- glm(FullResYes ~ WhereLive, family = binomial(link="logit"), data = polldatawhereliveanalysis2)
summary(polldatawherelivefit2)
## ECOLOGY VS. EVOLUTION
polldataecoevoanalysis2 <- subset(polldata, ecoevo != "other")
polldataecoevofit2 <- glm(FullResYes ~ ecoevo, family = binomial(link="logit"), data = polldataecoevoanalysis2)
summary(polldataecoevofit2)
## DEPT TYPE
polldatadepttypeanalysis2 <- subset(polldata, Dept01 < 4)
polldatadeptfit2 <- glm(FullResYes ~ factor(Dept01), family = binomial(link="logit"), data = polldatadepttypeanalysis2)
summary(polldatadeptfit2)
## COMPARING TO OUTPUT OF ONE BIG MODEL (note: This model isn't quite the right form, because I'm making the ordinal variables into factors, but want to see if there's general consistency in the approaches)
fullmodelfit2 <- glm(FullResYes ~ factor(PhD01) + WhereLive + ecoevo + factor(Dept01), family = binomial(link="logit"), data = polldata)
summary(fullmodelfit2)
```
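The `summary()` calls above report coefficients on the log-odds scale. One possible way to express them as odds ratios with Wald intervals is sketched below on simulated data (`group` and `yes` are hypothetical, not poll data); this is an illustration, not part of the manuscript analysis.

```r
set.seed(2)  # simulated data only; nothing here comes from the poll

# Hypothetical two-group predictor and binary outcome.
group <- rep(c("ecology", "evolution"), each = 200)
yes   <- rbinom(length(group), size = 1,
                prob = ifelse(group == "evolution", 0.6, 0.4))
sim   <- data.frame(group = group, yes = yes)

fit <- glm(yes ~ group, family = binomial(link = "logit"), data = sim)

# Coefficients are log-odds; exponentiate to get odds ratios.
# confint.default() gives Wald intervals on the log-odds scale.
odds_ratios <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(odds_ratios, 2)
```

The row for the group term is the odds of a "yes" in one group relative to the reference group. The intercept row is the baseline odds, not a ratio.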