ε-index

R function to calculate the ε-index of a researcher's relative citation performance

Prof Corey J. A. Bradshaw
Global Ecology, Flinders University, Adelaide, Australia
September 2021
e-mail

Existing citation-based indices used to rank research performance do not permit a fair comparison of researchers among career stages or disciplines, nor do they treat women and men equally. We designed the ε-index, which is simple to calculate, is based on open-access data, corrects for disciplinary variation, can be adjusted for career breaks, and sets a sample-specific threshold above and below which a researcher is deemed to be performing above or below expectation.

Code accompanies the article:

BRADSHAW, CJA, JM CHALKER, SA CRABTREE, BA EIJKELKAMP, JA LONG, JR SMITH, K TRINAJSTIC, V WEISBECKER. 2021. A fairer way to compare researchers at any career stage and in any discipline using open-access citation data. PLoS One 16(9): e0257141. doi:10.1371/journal.pone.0257141

--
DIRECTIONS

Create a .csv file of exactly the same format as the example file in this repository ('datasample.csv'):

COLUMN 1: personID — any character identification of an individual researcher (can be a name)
COLUMN 2: gender — researcher's gender ("F" or "M")
COLUMN 3: i10 — researcher's i10 index (# papers with ≥ 10 citations); must be > 0
COLUMN 4: h — researcher's h-index
COLUMN 5: maxcit — number of citations of researcher's most cited peer-reviewed paper
COLUMN 6: firstyrpub — the year of the researcher's first published peer-reviewed paper

Import the sample .csv file, or your own file in the format indicated above. First set the working directory to the folder containing 'datasample.csv' with the 'setwd()' command:
```
 setwd("/path") # where /path is the directory path on your machine
 example.dat <- read.csv("datasample.csv", header=TRUE)
```
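
Before passing the data to the function, it can help to confirm that the file matches the format above. The sketch below uses a hypothetical helper, checkEpsilonInput(), that is not part of the repository; it assumes the column headers are exactly personID, gender, i10, h, maxcit and firstyrpub, and it only checks the constraints stated in the column list.

```r
# Hypothetical helper (not part of the repository): checks a data frame against
# the six-column input format described above. Assumes the column headers are
# exactly personID, gender, i10, h, maxcit and firstyrpub.
checkEpsilonInput <- function(dat,
                              current.year = as.integer(format(Sys.Date(), "%Y"))) {
  required <- c("personID", "gender", "i10", "h", "maxcit", "firstyrpub")
  missing.cols <- setdiff(required, names(dat))
  if (length(missing.cols) > 0) {
    return(paste("missing column(s):", paste(missing.cols, collapse = ", ")))
  }
  num.cols <- c("i10", "h", "maxcit", "firstyrpub")
  problems <- character(0)
  if (!all(vapply(dat[num.cols], is.numeric, logical(1)))) {
    problems <- c(problems, "i10, h, maxcit and firstyrpub must be numeric")
  }
  if (anyNA(dat[required])) problems <- c(problems, "missing values present")
  if (!all(dat$gender %in% c("F", "M"))) problems <- c(problems, "gender must be 'F' or 'M'")
  if (any(dat$i10 <= 0, na.rm = TRUE)) problems <- c(problems, "i10 must be > 0")
  if (any(dat$firstyrpub > current.year, na.rm = TRUE)) {
    problems <- c(problems, "firstyrpub is later than the current year")
  }
  problems  # character(0) means no problems were found
}

# Toy data (made-up values) with one deliberate error: i10 = 0 for researcher "B"
toy <- data.frame(personID = c("A", "B"), gender = c("F", "M"),
                  i10 = c(12, 0), h = c(10, 4), maxcit = c(250, 60),
                  firstyrpub = c(2005, 2015))
checkEpsilonInput(toy)   # [1] "i10 must be > 0"
```

In practice you would call checkEpsilonInput(example.dat) right after read.csv().
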
Alternatively, you can harvest the necessary citation data automatically from Google Scholar using the function getProfiledatFunc (in 'getProfileFunc.R'), which returns an object that can be passed directly to epsilonIndexFunc (in 'epsilonIndexFunc.R'):

i. Define a vector of Google Scholar user IDs (the 12-character 'user=' value in a researcher's profile URL on scholar.google.com), e.g.,
```
  ids <- c("1sO0O3wAAAAJ","ZBUju2QAAAAJ","oGAui-IAAAAJ","cpJnEYIAAAAJ","ptDEg44AAAAJ","PJYrOvQAAAAJ","4UxbBYIAAAAJ") 
```
ii. Then define a 'genders' vector of the same length and in the same order, e.g.,
```
  genders <- c("M","M","F","M","M","F","F")
```
iii. Load getProfiledatFunc by submitting the function code in 'getProfileFunc.R' to the R console

iv. Create the input data object that epsilonIndexFunc will use, e.g.,
```
  example.dat <- getProfiledatFunc(ids, genders)
```
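
Because 'genders' must have the same length as 'ids' (and, presumably, the same order), a quick check before step iv can catch length or ordering mistakes. The helper checkScholarInputs() below is hypothetical (not part of the repository) and tests only the conventions stated above: 12-character IDs and 'F'/'M' gender codes.

```r
# Hypothetical pre-flight check (not part of the repository) for the inputs to
# getProfiledatFunc(); stopifnot() raises an error if any condition fails.
checkScholarInputs <- function(ids, genders) {
  stopifnot(
    length(ids) == length(genders),  # one gender per Google Scholar ID
    all(nchar(ids) == 12),           # 12-character Google Scholar user IDs
    anyDuplicated(ids) == 0,         # each researcher listed only once
    all(genders %in% c("F", "M"))    # same coding as the 'gender' column
  )
  invisible(TRUE)
}

ids <- c("1sO0O3wAAAAJ","ZBUju2QAAAAJ","oGAui-IAAAAJ","cpJnEYIAAAAJ",
         "ptDEg44AAAAJ","PJYrOvQAAAAJ","4UxbBYIAAAAJ")
genders <- c("M","M","F","M","M","F","F")
checkScholarInputs(ids, genders)                 # passes silently
print(data.frame(id = ids, gender = genders))    # eyeball the ID-gender pairing
```
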
Note: The estimated year of first publication (Y₁) can be wrong, because the function does not distinguish peer-reviewed from non-peer-reviewed entries in Google Scholar, nor can it exclude clearly erroneous entries in a researcher's publication history. We recommend checking the harvested year of first publication manually for every researcher in the sample. For example, id=ptDEg44AAAAJ returns Y₁ = 1791, whereas this researcher's true year of first publication is 1982.
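
A simple range check catches impossible values such as Y₁ = 1791, although it cannot detect plausible-looking but wrong years, so the manual check is still needed. flagFirstYear() below is a hypothetical helper (not part of the repository); it assumes the harvested data use the personID and firstyrpub columns of the input format, and its default lower bound of 1900 is an arbitrary illustrative choice.

```r
# Hypothetical helper (not part of the repository): returns the rows whose
# first-publication year falls outside [earliest, latest]. The default
# earliest = 1900 is an arbitrary, illustrative assumption.
flagFirstYear <- function(dat, earliest = 1900,
                          latest = as.integer(format(Sys.Date(), "%Y"))) {
  bad <- dat$firstyrpub < earliest | dat$firstyrpub > latest
  dat[which(bad), c("personID", "firstyrpub"), drop = FALSE]
}

# Toy harvested data (made-up IDs; 1791 mimics the erroneous case above)
harvested <- data.frame(personID = c("researcherA", "researcherB"),
                        firstyrpub = c(1791, 2001))
flagFirstYear(harvested)   # flags researcherA

# After checking the researcher's record by hand, overwrite the harvested value
harvested$firstyrpub[harvested$personID == "researcherA"] <- 1982
flagFirstYear(harvested)   # no rows flagged
```
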

Load the function (epsilonIndexFunc, in 'epsilonIndexFunc.R') in R by submitting the entire function code (lines 20 to 212) to the R console.

Simply run the function as follows:

```
 epsilonIndexFunc(dat.samp=example.dat, bygender=c('no','yes'), sort.index=c('e', 'd', 'ep', 'dp'))
```

where 'bygender' ('no' or 'yes') indicates whether to calculate the gender-debiased index, and 'sort.index' selects the index used to sort the final results table (default = 'e').

Possible values for 'sort.index': 'e' = pooled; 'ep' = normalised; 'd' = gender-debiased; 'dp' = normalised gender-debiased

If there are too few individuals per gender to estimate a gender-specific index, we recommend selecting bygender='no' and not using or sorting by the gender-debiased index (option 'd'). If the individuals in the sample are not all from approximately the same discipline, we recommend not using or sorting by either of the two normalised indices (options 'ep' or 'dp').
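
Counting the researchers of each gender before choosing 'bygender' is straightforward. suggestByGender() below is a hypothetical helper (not part of the repository), and its default minimum of five individuals per gender is an illustrative assumption, not a threshold given in the article.

```r
# Hypothetical helper (not part of the repository): tabulates researchers by
# gender and suggests a 'bygender' value. min.n = 5 is an assumed threshold.
suggestByGender <- function(dat, min.n = 5) {
  counts <- table(factor(dat$gender, levels = c("F", "M")))
  print(counts)
  if (all(counts >= min.n)) "yes" else "no"
}

# Toy sample: six women and two men (made-up)
toy <- data.frame(gender = c(rep("F", 6), rep("M", 2)))
suggestByGender(toy)              # prints the counts (F = 6, M = 2), then returns "no"
suggestByGender(toy, min.n = 2)   # returns "yes"
```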

The output includes the following columns:

person: researcher's ID (specified by user)
gender: F=female; M=male
yrs.publ: number of years since first peer-reviewed article
gender.eindex: ε-index relative to others of the same gender in the sample
expectation: whether above or below expectation based on chosen index (default is 'e' = pooled index)
m-quotient: h-index ÷ yrs.publ
h-index: h-index
debiased.e.prime.index: scaled gender.eindex (gender ε′-index)
gender.rank: rank from gender.eindex (1 = highest)
rnk.debiased: gender-debiased rank (1 = highest)
pooled.eindex: ε-index generated from the entire sample (not gender-specific)
e.prime.index: scaled pooled.eindex (ε′-index)
pooled.rnk: rank from pooled.eindex (1 = highest)

In addition, if sort.index = 'ep', the output includes:

eprime.rnk: rank from scaled pooled.eindex (ε′-index)

or, if sort.index = 'dp':

eprime.debiased.rnk: rank from scaled gender.eindex (gender ε′-index)
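
The m-quotient and the "1 = highest" rank columns follow simple conventions that can be reproduced from the input data. The sketch below is a hypothetical illustration, not code from 'epsilonIndexFunc.R': it assumes yrs.publ is a reference year minus firstyrpub (the repository's code may define it differently, e.g., by adding 1), and it gives tied values the best shared rank.

```r
# Hypothetical illustration (not code from epsilonIndexFunc.R).
# Assumption: yrs.publ = ref.year - firstyrpub; a researcher whose first paper
# appeared in ref.year would give yrs.publ = 0 and an infinite m-quotient.
mQuotient <- function(h, firstyrpub, ref.year) {
  yrs.publ <- ref.year - firstyrpub
  h / yrs.publ
}

# Rank so that 1 = highest value; tied values share the best (lowest) rank
rankHighestFirst <- function(x) rank(-x, ties.method = "min")

h <- c(20, 35, 12)                 # made-up h-indices
firstyrpub <- c(2006, 1996, 2011)  # made-up first-publication years
m <- mQuotient(h, firstyrpub, ref.year = 2021)
m                     # [1] 1.333333 1.400000 1.200000
rankHighestFirst(m)   # [1] 2 1 3
```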

You can export the output to a file like this:

```
 out <- epsilonIndexFunc(dat.samp=example.dat, sort.index='e')
 write.table(out, file="rank.output.csv", sep=",", dec=".", row.names=FALSE, col.names=TRUE)
```
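
If you prefer, write.csv() (which fixes sep = "," and dec = ".") produces essentially the same file. The sketch below uses a made-up toy data frame in place of 'out' and a temporary file, then reads the file back to confirm the export.

```r
# Toy stand-in for 'out' (made-up values; the real output has the columns listed above)
out <- data.frame(person = c("A", "B"), pooled.rnk = c(2, 1))

outfile <- file.path(tempdir(), "rank.output.csv")
# write.csv() wraps write.table() with sep = "," and dec = ".";
# row.names = FALSE keeps R's row numbers out of the file
write.csv(out, file = outfile, row.names = FALSE)

back <- read.csv(outfile)
str(back)   # 2 rows, columns person and pooled.rnk
```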

cjabradshaw/EpsilonIndex
