Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
data
img
.gitignore
README.Rmd
README.md
user2017.pdf
user2017.sla

README.md

useR!2017 poster

By David Selby and David Firth, Department of Statistics, University of Warwick.

A poster for the useR!2017 conference in Brussels.

Title

Ranking influential communities in networks

Abstract

Which scientific fields export the most intellectual influence, through recent research, to other fields? Citation behaviour varies greatly across disciplines, making inter-field comparisons difficult. Recent approaches to measuring influence in networks, based on the PageRank algorithm, take the source of citations (or recommendations or links) into account.

By aggregating all publications in each Web of Science field into "super-journals", we model the exchange of citations between fields. We then fit a Bradley–Terry paired comparisons model—which can be shown to be equivalent to scaled PageRank—to measure the relative influence of academic communities. Uncertainty intervals are provided to quantify the robustness of the ranking. All analyses were performed in R.

Which field is top of the table? Visit the poster to find out!

Keywords: PageRank, Bradley–Terry model, networks, ranking, bibliometrics

Design inspiration

Alberto Cairo's two-page spread on the Galileo telescope is a good basis for a layout. This features one huge diagram just left of centre, with an introductory paragraph on the far left, a self-contained series of fun facts along the bottom and a set of more detailed diagrams on the right. See also Giants of the Ocean.

See also the diagram in Figure 3 of Fast unfolding of communities in large networks by Blondel et al (2008), where we have a large node-link diagram with communities aggregated into nodes, then a 'zoom' on a selected community, represented as its constituent nodes and links.

Since useR!2017 specifies posters must be A0 in portrait orientation, we can't implement landscape layouts like those mentioned above. While using some of those ideas, we might consider Colin Purrington's results area design for portrait posters.

Another useful resource is the Better Posters blog.

A large dose of inspiration may also be drawn from Dorling Kindersley Eyewitness books.

Method

Unfortunately, some of the raw data is property of Thomson Reuters / Clarivate Analytics so I cannot republish it in this repository. Nonetheless, hopefully you can get a general idea of what I have done from the source code and results.

We will need the following packages.

library(igraph)
library(ggraph)

Data wrangling

First we load the 2013 Web of Science citation data into R.

WoS <- readRDS('../thesis/data/thomsonreuters.Rds')

We are only interested in citations from 2013 to publications from 2003–2012. Furthermore, some miscellaneous journals are unhelpfully classified as ALL OTHERS and we want to omit any reference to these as well. Citations of count (weight) equal to zero should be deleted to avoid connectivity problems later.

WoS <- data.frame(from = WoS$Citing, to = WoS$Cited, weight = WoS$AllYears - WoS$Earlier)
WoS <- WoS[- union(grep('ALL OTHERS', WoS$from), grep('ALL OTHERS', WoS$to)), ]
WoS <- subset(WoS, weight > 0)

It's important to put the citing journal in the first column, the cited journal second and call the citation counts weight, because of how the igraph function graph_from_data_frame works. We will now turn our data into an igraph object.

ig <- graph_from_data_frame(WoS)

We now have a weighted, directed graph.

Before we run our community detection algorithm, there is some other housekeeping to do. A number of journals either gave out or received zero citations during the study period. These will break the graph up into disconnected components, so we want to remove all these singletons.

Firstly, we calculate the strongly-connected components of the graph. To be strongly connected to the rest of the network, a journal must both cite and be cited by others.

strong <- components(ig, mode = 'strong')
core <- which(strong$csize > 1)
if (length(core) > 1) stop('There should be only one core component')

The whole graph contains 12,604 nodes (journals). Of these, 10,713 belong to our "core" strongly-connected component and 1,891 are singletons, either weakly connected or completely disconnected from the rest of the graph. Let's get rid of these singleton nodes.

ig <- induced_subgraph(ig, which(strong$membership == core))

We now have 10,713 journals in our network.

Super journals

We will put our directed, weighted graph through the Infomap algorithm, as implemented in the igraph package. Results can be nondeterministic, so we will fix the random seed for reproducible results. Community detection can take a long time, so you might want to cache this chunk!

set.seed(2017)
infomap <- cluster_infomap(ig)

The algorithm returns 112 communities; the largest contains 971 journals and the smallest contains 2. The mean community size is 96 journals.

The journals of community will be aggregated into a super-journal representing all incoming and outgoing citations for that community. Edge weights are summed and other edge and vertex attributes are ignored.

sj <- contract.vertices(ig, membership(infomap), 'ignore')
sj <- simplify(sj, remove.multiple = TRUE, remove.loops = TRUE,
               edge.attr.comb = list(weight = 'sum', 'ignore'))

For later reference, each super-journal will be assigned a unique ID.

V(sj)$name <- 1:vcount(sj)

Visualisation

A nice way to visualise these graphs is with the ggraph package. Each node will be represented by a point, proportional to its PageRank score, and each edge will be represented by an arc between these points.

There is a problem, however: ggraph, ggplot2 and our PDF reader won't like it if we try to plot 9,145,683 arcs in a single graphic!

(Another approach is to have a single arc for each pair of nodes, with opacity or width proportional to the number of citations, but this doesn't look very good in my opinion.)

Calculating the positions of nodes in our graphic will involve multidimensional scaling of the correlation matrix. Let's extract the weighted adjacency matrix, paying attention to the fact that igraph considers citations to travel from rows to columns.

xtab <- Matrix::t(as_adjacency_matrix(sj, attr = 'weight'))

We will scale the citations counts so no pair of nodes has more than 1000 arcs drawn between them on the final graphic.

small_xtab <- as.matrix(xtab)
diag(small_xtab) <- 0 # ignore self-citations
scalefactor <- 1000 / max(small_xtab)
small_xtab <- ceiling(small_xtab * scalefactor)

Then we can turn this back into an igraph object for visualisation. A useful attribute is the PageRank, which we can use to make nodes with greater total influence appear larger on the plot.

viz_ig <- graph_from_adjacency_matrix(t(small_xtab))
V(viz_ig)$PageRank <- page.rank(sj)$vector

Now let's perform multidimensional scaling to generate some coordinates.

my_mds <- create_layout(viz_ig,
                        layout = 'igraph',
                        algorithm = 'mds',
                        dist = 1 - cor(as.matrix(xtab)))

Let's make some plots!

ggraph(my_mds) +
  geom_edge_fan0(alpha = .01, colour = '#4F94CD') +
  geom_node_point(aes(size = PageRank), fill = '#4F94CD', pch = 21, colour = 'white') +
  coord_fixed() +
  scale_x_reverse() + # flip horizontally
  theme_graph() +
  theme(legend.position = 'none')

On its own, our visualisation does not imply much because we don't know which community is which. We need labels on the communities for that. I have gone through and manually assigned plausible labels to the communities generated by Infomap.

labels <- read.csv('data/cluster_names.csv', stringsAsFactors = FALSE)
V(viz_ig)$field <- V(sj)$field <- labels$field[match(V(viz_ig)$name, labels$community)]

We can then have a labelled plot for reference, or even show a selection of "interesting" labels on the main graph, while omitting most of them to avoid clutter.

# Hide plot labels for very small fields
my_mds$flabel <- ifelse(rank(my_mds$PageRank) > 32, as.character(my_mds$field), NA)

ggraph(my_mds) +
  geom_edge_fan0(alpha = .007, colour = 'tomato2') + #4F94CD #2D7B95
  geom_node_point(aes(size = PageRank), fill = 'tomato2', pch = 21, colour = 'white') +
  geom_node_label(aes(label = flabel),
                  size = 2,
                  repel = TRUE,
                  family = 'Gill Sans MT Condensed',
                  fontface = 'bold',
                  colour = 'tomato2',
                  # Label options
                  segment.alpha = .5,
                  segment.size = 0.2,
                  label.r = unit(0.1, 'lines'),
                  label.size = NA, # no label border
                  label.padding = unit(0.1, 'lines'),
                  fill = rgb(1, 1, 1, .5)) +
  coord_fixed() +
  scale_x_reverse() +
  theme_graph() +
  theme(legend.position = 'none')

Within-field analysis

Now, let's take a particular field out of the network and examine its inner structure. Varin et al. (2016) set a precedent to study statistics journals, so let's have a look at statistics.

statistics <- labels$community[labels$field == 'statistics']
stats_ig <- induced_subgraph(ig, which(membership(infomap) == statistics))
V(stats_ig)$PageRank <- page.rank(stats_ig)$vector

There are 77 journals in the statistics subgraph, in 1 strongly-connected component(s).

At the moment we only have Thomson Reuters' abbreviations for the journal names. Some are more obvious than others! We have the full journal titles, but some are a bit too long, so we will use a table of custom short-ish titles.

titles <- read.csv('data/stats_titles.csv', stringsAsFactors = FALSE)
V(stats_ig)$title <- titles$Short[match(V(stats_ig)$name, titles$JCR)]

We can visualise the network as before.

stats_xtab <- Matrix::t(as_adjacency_matrix(stats_ig))
stats_layout <- create_layout(stats_ig,
                              layout = 'igraph',
                              algorithm = 'mds',
                              dist = 1 - cor(as.matrix(stats_xtab)))

ggraph(stats_layout) +
  geom_edge_fan0(alpha = .05, colour = 'tomato2') +
  geom_node_point(aes(size = PageRank), fill = 'tomato2', pch = 21, colour = 'white') +
  geom_node_text(aes(label = title), size = 3,
                 repel = TRUE,
                 family = 'Gill Sans MT Condensed',
                 fontface = 'bold',
                 colour = 'tomato2',
                 segment.alpha = .2) +
  coord_fixed() +
  theme_graph() +
  theme(legend.position = 'none')

And here is a ranking of these statistical journals, by Bradley–Terry score.

(stats_ranks <- dplyr::arrange(
  data.frame(journal = V(stats_ig)$title,
             Scroogefactor = scrooge::Scroogefactor(stats_xtab),
             PageRank = V(stats_ig)$PageRank,
             BradleyTerry = scrooge::BTscores(stats_xtab),
             rank = rank(-scrooge::BTscores(stats_xtab), ties.method = 'first')),
  desc(BradleyTerry)
))
journal Scroogefactor PageRank BradleyTerry rank
JRSS-A 0.0282376 0.0103465 0.0283302 1
American Stat 0.0270228 0.0057930 0.0270249 2
JRSS-B 0.0256999 0.0454141 0.0259910 3
Canada J Stats 0.0239206 0.0083131 0.0241060 4
Annals 0.0236113 0.1100324 0.0240983 5
Biometrika 0.0228638 0.0498885 0.0230736 6
JRSS-C 0.0210658 0.0089601 0.0213421 7
Technometrics 0.0209600 0.0139015 0.0208967 8
Stat Sci 0.0209032 0.0153774 0.0206936 9
Biostatistics 0.0209463 0.0158200 0.0205330 10
Lifetime Data Analysis 0.0182751 0.0058858 0.0199515 11
J Stat Software 0.0201867 0.0162305 0.0197136 12
JASA 0.0194846 0.0835435 0.0194837 13
Stats & Comp 0.0199332 0.0109201 0.0193991 14
JCGS 0.0198372 0.0190766 0.0193987 15
Scan J Stats 0.0188394 0.0169175 0.0182769 16
Biometrics 0.0184227 0.0371235 0.0181814 17
J Machine Learning Res 0.0178295 0.0302504 0.0172690 18
Machine Learning 0.0164286 0.0057000 0.0171842 19
Stat Meth Med Res 0.0168654 0.0072661 0.0167750 20
Bernoulli 0.0163526 0.0165876 0.0162539 21
Int Stats Rev 0.0161866 0.0050933 0.0162027 22
Stat Neerlandica 0.0157788 0.0038321 0.0161538 23
J Time Series 0.0164238 0.0075043 0.0160605 24
J Stat Plan & Infer 0.0166547 0.0368382 0.0159328 25
ANZ J Stats 0.0156540 0.0046732 0.0158607 26
Stat Model 0.0157907 0.0047046 0.0157993 27
Ann Inst Stat Maths 0.0157902 0.0083345 0.0155305 28
Extremes 0.0149530 0.0057294 0.0153994 29
Bayesian Analysis 0.0154541 0.0114376 0.0153939 30
Test 0.0162304 0.0075401 0.0152143 31
J Qual Tech 0.0133428 0.0078417 0.0144551 32
Prob Eng & Inf Sci 0.0129198 0.0038546 0.0141473 33
Biometrical J 0.0149472 0.0107017 0.0141281 34
Stats & Prob Letters 0.0146409 0.0227675 0.0137136 35
Stats in Medicine 0.0145437 0.0454388 0.0136752 36
J Nonpar Stats 0.0137078 0.0080410 0.0135831 37
Comp Stats & Data Analysis 0.0146516 0.0420382 0.0134674 38
J Multivariate Analysis 0.0142435 0.0318751 0.0132759 39
Envir & Eco Stats 0.0129528 0.0046764 0.0131239 40
Environometrics 0.0120375 0.0072662 0.0129543 41
Scand Actuarial J 0.0119172 0.0049733 0.0124851 42
Stoch Models 0.0112702 0.0041752 0.0123314 43
Ann Appl Stats 0.0135044 0.0192067 0.0120295 44
Methods & Comp in Appl Prob 0.0108873 0.0045467 0.0115274 45
Metrika 0.0094645 0.0055462 0.0102697 46
Stat Method 0.0088005 0.0046021 0.0092631 47
ESAIM Prob & Stats 0.0092176 0.0036076 0.0092150 48
Comm Stats: Sim & Comp 0.0095515 0.0060782 0.0091717 49
J Stats Comp & Sim 0.0092627 0.0082753 0.0091212 50
Comm Stats: Theory & Methods 0.0094081 0.0118277 0.0090699 51
Seq Analysis 0.0088882 0.0036688 0.0089746 52
EJ Stats 0.0097919 0.0154410 0.0089544 53
J Agr Bio & Envir Stats 0.0089445 0.0045417 0.0089017 54
REVSTAT 0.0086771 0.0028658 0.0083682 55
SORT 0.0079263 0.0021687 0.0083124 56
Statistics 0.0079254 0.0050826 0.0082635 57
Stats & Interface 0.0087419 0.0040690 0.0079890 58
J Applied Stats 0.0082399 0.0066011 0.0078485 59
Appl Stoch Models in Business 0.0067881 0.0035229 0.0077174 60
Stats Papers 0.0065067 0.0060219 0.0069310 61
Qual Eng 0.0063973 0.0043052 0.0068593 62
Pakistan J Stats 0.0058241 0.0040568 0.0064576 63
Int J Biostatistics 0.0073174 0.0051220 0.0063488 64
Brazil J Prob & Stats 0.0056185 0.0025517 0.0059057 65
Pharm Stats 0.0055106 0.0043995 0.0058420 66
J Biopharm Stats 0.0059426 0.0077599 0.0057170 67
R Journal 0.0063932 0.0026105 0.0054341 68
Adv Stat Analysis 0.0051192 0.0027805 0.0052610 69
Stat Methods & Appl 0.0044261 0.0028058 0.0050218 70
Qual Rel Eng Int 0.0044372 0.0060060 0.0046339 71
Comp Stats 0.0048448 0.0035324 0.0046084 72
J Korean SS 0.0045534 0.0032268 0.0045248 73
Stats in Biopharm Res 0.0038185 0.0033177 0.0044349 74
Rev Colomb Estad 0.0041089 0.0023813 0.0043742 75
Adv Data Analysis & Class 0.0032452 0.0025590 0.0036667 76
Qual Tech & Quant Mgmt 0.0021391 0.0022255 0.0021176 77

And now for something completely different. What if we actually have a single non-statistics "super-journal" in the statistics network?

mapping <- match(V(ig)$name, V(stats_ig)$name)
mapping[is.na(mapping)] <- max(mapping, na.rm = TRUE) + 1
stats_others <- contract.vertices(ig, mapping, list(name = 'first', 'ignore'))
V(stats_others)$name[vcount(stats_others)] <- "(All others)"
others_xtab <- Matrix::t(as_adjacency_matrix(stats_others))
diag(others_xtab) <- 0

dplyr::arrange(
  data.frame(journal = V(stats_others)$name,
             Scroogefactor = scrooge::Scroogefactor(others_xtab),
             PageRank = page.rank(stats_others)$vector,
             BradleyTerry = scrooge::BTscores(others_xtab)),
  desc(BradleyTerry)
)
journal Scroogefactor PageRank BradleyTerry
J STAT SOFTW 0.0597587 0.0159115 0.0662567
J R STAT SOC B 0.0533165 0.0320152 0.0584122
AM STAT 0.0498796 0.0084807 0.0547457
BIOMETRIKA 0.0343575 0.0368557 0.0361016
J COMPUT GRAPH STAT 0.0275142 0.0169518 0.0283282
ANN STAT 0.0266800 0.0694021 0.0278383
STAT COMPUT 0.0268367 0.0123934 0.0275116
BIOMETRICS 0.0250801 0.0277151 0.0266368
J AM STAT ASSOC 0.0237224 0.0560967 0.0251152
MACH LEARN 0.0224210 0.0070761 0.0246029
J MACH LEARN RES 0.0212466 0.0186240 0.0225928
BIOSTATISTICS 0.0214237 0.0136015 0.0218223
CAN J STAT 0.0225656 0.0100984 0.0206261
STAT METHODS MED RES 0.0188755 0.0085729 0.0203618
TECHNOMETRICS 0.0194398 0.0148760 0.0189859
STAT SCI 0.0190649 0.0139816 0.0189028
BIOMETRICAL J 0.0178737 0.0118255 0.0173150
J R STAT SOC C-APPL 0.0179341 0.0102346 0.0172444
J R STAT SOC A STAT 0.0171675 0.0091808 0.0170781
STAT MED 0.0158302 0.0336120 0.0167550
COMPUT STAT DATA AN 0.0161689 0.0384759 0.0164907
J QUAL TECHNOL 0.0141561 0.0111754 0.0154762
SCAND J STAT 0.0165599 0.0157602 0.0147462
ANN I STAT MATH 0.0154795 0.0098950 0.0141590
INT STAT REV 0.0142590 0.0073603 0.0136896
J STAT PLAN INFER 0.0141285 0.0328774 0.0133039
EXTREMES 0.0133718 0.0081363 0.0128179
LIFETIME DATA ANAL 0.0127270 0.0081576 0.0125918
BAYESIAN ANAL 0.0137490 0.0115487 0.0124839
J TIME SER ANAL 0.0137731 0.0093792 0.0124744
STAT MODEL 0.0132021 0.0074011 0.0117373
TEST-SPAIN 0.0128800 0.0098124 0.0102774
J BIOPHARM STAT 0.0096748 0.0098427 0.0101378
J NONPARAMETR STAT 0.0123958 0.0098934 0.0098247
ENVIRONMETRICS 0.0096255 0.0093404 0.0097203
AUST NZ J STAT 0.0108198 0.0074397 0.0093956
PHARM STAT 0.0083334 0.0073263 0.0093387
STAT NEERL 0.0104667 0.0067418 0.0090792
ENVIRON ECOL STAT 0.0088240 0.0071893 0.0089873
PROBAB ENG INFORM SC 0.0081377 0.0066973 0.0086932
J MULTIVARIATE ANAL 0.0097456 0.0262258 0.0086846
SORT-STAT OPER RES T 0.0074250 0.0054547 0.0081761
REVSTAT-STAT J 0.0079691 0.0061630 0.0079729
METRIKA 0.0077236 0.0085714 0.0078223
BERNOULLI 0.0086403 0.0148244 0.0073900
J AGR BIOL ENVIR ST 0.0076118 0.0071660 0.0072628
SCAND ACTUAR J 0.0068412 0.0073625 0.0072348
STAT PROBABIL LETT 0.0082754 0.0212788 0.0071910
METHODOL COMPUT APPL 0.0066574 0.0071288 0.0069508
STOCH MODELS 0.0061108 0.0066631 0.0065910
STATISTICS 0.0068530 0.0081674 0.0064932
STAT METHODOL 0.0068281 0.0076906 0.0063508
(All others) 0.0056965 0.0115052 0.0061721
QUAL ENG 0.0057950 0.0076824 0.0061600
COMMUN STAT-SIMUL C 0.0071682 0.0090134 0.0060598
J STAT COMPUT SIM 0.0063426 0.0112566 0.0058592
SEQUENTIAL ANAL 0.0067499 0.0071786 0.0057831
COMMUN STAT-THEOR M 0.0059140 0.0140319 0.0054953
R J 0.0065317 0.0059580 0.0052631
QUAL RELIAB ENG INT 0.0049446 0.0092773 0.0052533
INT J BIOSTAT 0.0062151 0.0072493 0.0052001
REV COLOMB ESTAD 0.0039183 0.0060033 0.0045831
ANN APPL STAT 0.0057245 0.0152338 0.0045302
ESAIM-PROBAB STAT 0.0048405 0.0063724 0.0045107
STAT BIOPHARM RES 0.0040342 0.0063369 0.0043468
STAT PAP 0.0041720 0.0095298 0.0040877
BRAZ J PROBAB STAT 0.0039294 0.0057803 0.0039361
STAT INTERFACE 0.0054177 0.0067122 0.0038533
APPL STOCH MODEL BUS 0.0035906 0.0064436 0.0038035
ASTA-ADV STAT ANAL 0.0043451 0.0061006 0.0037645
J APPL STAT 0.0038169 0.0094288 0.0035409
ADV DATA ANAL CLASSI 0.0035820 0.0058913 0.0031111
ELECTRON J STAT 0.0049902 0.0129883 0.0030936
QUAL TECHNOL QUANT M 0.0030537 0.0055776 0.0026304
STAT METHOD APPL-GER 0.0025070 0.0061124 0.0025973
J KOREAN STAT SOC 0.0024670 0.0063257 0.0021873
COMPUTATION STAT 0.0025795 0.0064853 0.0021572
PAK J STAT 0.0012715 0.0068723 0.0012420

We can try plotting it, too.

stats_others <- simplify(stats_others, remove.loops = TRUE, remove.multiple = FALSE) # remove self-citations
V(stats_others)$PageRank <- page.rank(stats_others)$vector

other_layout <- create_layout(stats_others,
                              layout = 'igraph',
                              algorithm = 'mds',
                              dist = 1 - cor(as.matrix(others_xtab)))

ggraph(other_layout) +
  geom_edge_fan0(alpha = .01, colour = '#4F94CD') +
  geom_node_point(aes(size = PageRank), fill = '#4F94CD', pch = 21, colour = 'white') +
  geom_node_text(aes(label = name), size = 3,
                 repel = TRUE,
                 family = 'Gill Sans MT Condensed',
                 fontface = 'bold',
                 colour = '#4F94CD',
                 segment.alpha = .2) +
  coord_fixed() +
  theme_graph() +
  theme(legend.position = 'none')

Ranking

We can penalise the Bradley–Terry model by adding a "player zero" who cites/is cited by every player/journal/field at a constant rate (say 1/2). This will help reduce the chance of outliers (such as journals for which we have very little citation data, or fields containing very few journals) from shooting to the top or the bottom of a Bradley–Terry scores league table.

zero_cite <- .15 * sum(xtab) / 2 / nrow(xtab) ## 2ka = .15n
penalised_xtab <- rbind(zero_cite, cbind(zero_cite, xtab)) # add zeroth player

library(scrooge)
field_ranks <- data.frame(
  field = V(sj)$field,
  PageRank = PageRank(penalised_xtab)[-1],
  BradleyTerry = BTscores(penalised_xtab)[-1],
  Scroogefactor = Scroogefactor(penalised_xtab)[-1]
)
field_ranks$rank <- rank(-field_ranks$BradleyTerry, ties.method = 'first')
dplyr::arrange(field_ranks, desc(BradleyTerry))
field PageRank BradleyTerry Scroogefactor rank
economics 0.0095262 0.0198675 0.0198085 1
statistics 0.0052172 0.0182253 0.0184844 2
rheumatology 0.0075946 0.0154524 0.0149113 3
medicine 0.0785679 0.0147539 0.0140610 4
surgery 0.0187227 0.0147393 0.0140006 5
nephrology 0.0091929 0.0147153 0.0141108 6
psychometrics 0.0030642 0.0146652 0.0153434 7
oncology 0.0298477 0.0142217 0.0134646 8
neuroscience 0.0483922 0.0138453 0.0131796 9
urology 0.0054990 0.0134027 0.0129622 10
politics 0.0050798 0.0129090 0.0134032 11
orthopaedics 0.0081823 0.0127964 0.0125031 12
parasitology 0.0200139 0.0127915 0.0122454 13
biomedical sciences 0.1391158 0.0126568 0.0119706 14
psychiatry 0.0162007 0.0125158 0.0119816 15
psychology 0.0157250 0.0123666 0.0119232 16
management 0.0082283 0.0123594 0.0118723 17
vascular surgery 0.0032286 0.0120399 0.0120587 18
obstetrics 0.0077170 0.0117973 0.0113524 19
opthalmology 0.0049835 0.0117134 0.0113681 20
sports medicine 0.0062347 0.0115786 0.0112158 21
marketing 0.0044868 0.0112366 0.0109821 22
sociology 0.0065198 0.0109829 0.0108478 23
computer graphics 0.0028843 0.0106927 0.0107898 24
plastic surgery 0.0036232 0.0104481 0.0102228 25
radiology 0.0104202 0.0101617 0.0098458 26
dentistry 0.0048420 0.0095513 0.0092427 27
operational research 0.0058023 0.0094057 0.0092595 28
rhinology 0.0048558 0.0093847 0.0091205 29
philosophy 0.0024042 0.0093150 0.0098626 30
zoology 0.0212266 0.0093070 0.0090279 31
communication 0.0032187 0.0090557 0.0091353 32
dermatology 0.0046755 0.0089293 0.0085348 33
social anthropology 0.0027730 0.0088286 0.0090860 34
mathematical finance 0.0023882 0.0087307 0.0091967 35
music 0.0023110 0.0086923 0.0090142 36
hypnosis 0.0022330 0.0086700 0.0091130 37
history of science 0.0025406 0.0086569 0.0088251 38
evaluation 0.0022738 0.0086527 0.0090182 39
toxicology 0.0082432 0.0086072 0.0080802 40
lighting 0.0022472 0.0085945 0.0089657 41
psychoanalysis 0.0022574 0.0085334 0.0088749 42
transfusion 0.0027156 0.0085100 0.0084292 43
social history 0.0025049 0.0084987 0.0088589 44
French sociology 0.0022331 0.0084802 0.0088812 45
rehabilitation 0.0022275 0.0084746 0.0088672 46
onomastics/history of mathematics 0.0022206 0.0084621 0.0088660 47
natural history 0.0022199 0.0084576 0.0088621 48
Russian sociology 0.0022237 0.0084546 0.0088548 49
history of education 0.0022232 0.0084451 0.0088458 50
history of economics 0.0022352 0.0084432 0.0088541 51
religion 0.0022335 0.0084329 0.0087993 52
complexity 0.0031541 0.0084078 0.0086762 53
translation 0.0022264 0.0083900 0.0087861 54
agriculture 0.0139732 0.0083396 0.0081027 55
German sociology 0.0022213 0.0083082 0.0086561 56
linguistics 0.0027522 0.0082932 0.0083216 57
civil engineering 0.0022271 0.0082824 0.0086966 58
petrochemical engineering 0.0023708 0.0082457 0.0085083 59
Brazilian/French philosophy 0.0022209 0.0082120 0.0084957 60
leisure studies 0.0023499 0.0082049 0.0084187 61
Latin American studies 0.0022265 0.0081671 0.0085518 62
Romanian/Italian ethics 0.0022224 0.0081632 0.0084943 63
leather 0.0022236 0.0081602 0.0085846 64
anthropology 0.0032005 0.0081559 0.0080893 65
policy 0.0027347 0.0080814 0.0081088 66
international law 0.0024032 0.0080547 0.0084432 67
occupational therapy 0.0023249 0.0080541 0.0081471 68
informatics 0.0099826 0.0080403 0.0078628 69
electrical engineering 0.0035042 0.0078939 0.0078633 70
automation 0.0022552 0.0078823 0.0082833 71
geotechnology 0.0027123 0.0078614 0.0080759 72
engineering design 0.0024121 0.0078412 0.0081312 73
forensics 0.0026537 0.0078035 0.0076939 74
education 0.0047743 0.0077080 0.0075149 75
mycology 0.0025491 0.0076594 0.0076372 76
logic 0.0024510 0.0076156 0.0078584 77
bibliometrics 0.0027331 0.0075872 0.0077041 78
anatomy 0.0024009 0.0075638 0.0076517 79
information systems 0.0036066 0.0075381 0.0072705 80
electronics 0.0037840 0.0075347 0.0074561 81
Brazilian healthcare 0.0024541 0.0075112 0.0073808 82
social work 0.0028226 0.0074891 0.0074850 83
biomaterials 0.0112502 0.0073331 0.0067864 84
logistics 0.0034656 0.0072290 0.0071273 85
human geography 0.0045769 0.0072145 0.0071259 86
ecology 0.0061912 0.0069880 0.0068437 87
astrophysics 0.0042515 0.0069680 0.0064905 88
nuclear engineering 0.0024137 0.0069329 0.0073273 89
radioactivity 0.0031276 0.0069110 0.0070901 90
law reviews 0.0026354 0.0068964 0.0067880 91
mathematics 0.0086304 0.0068269 0.0069352 92
learning disabilities 0.0031019 0.0067851 0.0066117 93
marine engineering 0.0023518 0.0066475 0.0070585 94
geology 0.0139842 0.0064870 0.0061994 95
social justice 0.0024211 0.0063083 0.0060054 96
veterinary medicine 0.0057741 0.0062613 0.0060098 97
spectroscopy 0.0107509 0.0061040 0.0058136 98
tourism 0.0025631 0.0060081 0.0060159 99
tribology 0.0037282 0.0059347 0.0058167 100
measurement 0.0025076 0.0057733 0.0061204 101
engineering management 0.0025191 0.0054657 0.0055476 102
environmental science 0.0197079 0.0054299 0.0051928 103
food science 0.0125544 0.0054239 0.0051509 104
chemistry & physics 0.0441748 0.0053294 0.0049305 105
robotics 0.0044990 0.0049737 0.0050510 106
material science 0.0073490 0.0049279 0.0049302 107
agronomy 0.0024144 0.0048210 0.0050570 108
energy 0.0072781 0.0047346 0.0046635 109
textiles 0.0023452 0.0047212 0.0052096 110
wood 0.0024436 0.0045605 0.0049897 111
particle physics 0.0051697 0.0039993 0.0039264 112
lastplace <- max(field_ranks$rank)
interestingfields <- c('mathematics',
                       'informatics',
                       'biomedical sciences',
                       'medicine',
                       'chemistry & physics',
                       'zoology',
                       'textiles')

library(ggrepel)
ggplot(field_ranks) +
  aes(rank, 100*BradleyTerry, label = field) +
  geom_point(colour = 'tomato2', size = 1) +
  geom_text_repel(data = subset(field_ranks, rank < 5 | field %in% interestingfields),
                  nudge_y = .15,
                  nudge_x = -5,
                  segment.alpha = .25,
                  family = 'Gill Sans MT Condensed', 
                  fontface = 'bold',
                  colour = 'tomato2',
                  point.padding = unit(0.1, 'lines')
                  ) +
  scale_x_reverse(name = NULL,
                  labels = scales::ordinal,
                  breaks = c(1, seq(20, lastplace, by = 20)),
                  minor_breaks = c(1, seq(5, lastplace, by = 5)),
                  limits = c(lastplace, 1),
                  expand = c(0.02, 0)) +
  scale_y_continuous(NULL,
                     position = 'right') +
  theme_bw() +
  theme(text = element_text(family = 'Gill Sans MT Condensed', face = 'bold'),
        axis.text = element_text(colour = 'tomato2'),
        panel.grid = element_blank(),
        panel.border = element_blank(),
        axis.ticks = element_line(colour = 'tomato2'),
        axis.line = element_line(colour = 'tomato2'),
        plot.background = element_rect(fill = 'transparent', colour = NA),
        panel.background = element_rect(fill = 'transparent', colour = NA))
interestingjournals <- c('R Journal',
                         'Stat Science',
                         'Biometrika',
                         #'J Stat Software',
                         'Annals',
                         'Stats in Medicine',
                         #'Biostatistics',
                         'Machine Learning',
                         'JRSS-B',
                         'Statistics',
                         #'J Applied Stats',
                         'Metrika',
                         'JCGS')
lastplace2 <- max(stats_ranks$rank)

ggplot(stats_ranks) +
  aes(rank, 100*BradleyTerry, label = journal) +
  geom_point(colour = '#2D7B95', size = 1) +
  geom_text_repel(data = subset(stats_ranks, rank <= 3 | journal %in% interestingjournals),
                  nudge_y = .15,
                  nudge_x = -5,
                  segment.alpha = .25,
                  family = 'Gill Sans MT Condensed',
                  fontface = 'bold',
                  colour = '#2D7B95',
                  point.padding = unit(0.2, 'lines')
                  ) +
  scale_x_reverse(name = NULL,
                  labels = scales::ordinal,
                  breaks = c(1, seq(20, lastplace2, by = 20)),
                  minor_breaks = c(1, seq(5, lastplace2, by = 5)),
                  limits = c(lastplace2, 1),
                  expand = c(0.02, 0)) +
  scale_y_continuous(NULL,
                     position = 'right') +
  theme_bw() +
  theme(text = element_text(family = 'Gill Sans MT Condensed', face = 'bold'),
        axis.text = element_text(colour = '#2D7B95'),
        panel.grid = element_blank(),
        panel.border = element_blank(),
        axis.ticks = element_line(colour = '#2D7B95'),
        axis.line = element_line(colour = '#2D7B95'),
        plot.background = element_rect(fill = 'transparent', colour = NA),
        panel.background = element_rect(fill = 'transparent', colour = NA))