# Summary

This dashboard was made to give insight into the research performance of higher education institutions (HEIs) in the UK using data from the 2021 Research Excellence Framework (REF 2021). The dashboard is intended to satisfy the needs of two groups of users:

1. graduate students and researchers looking to evaluate opportunities and
2. academic administrators looking to identify and address departmental strengths and weaknesses.

REF 2021 was a comprehensive assessment of research quality in higher education institutions, conducted jointly by the four UK higher education funding bodies. Its results are used to allocate around £2 billion per year to universities for research. Each HEI was required to submit documentation and evidence characterising its research activity over the period 1 January 2014 to 31 December 2020. These submissions were grouped into 34 units of assessment (UoAs) that roughly correspond to research disciplines and were graded with a qualitative five-point system.

Each submission made by an institution into a UoA contained a standard set of information in relation to three elements or “sub-profiles”: the quality of research **outputs**, the quality of the **impact** of research and the quality of the **environment** to support research and impact. Each submission was assessed in terms of these three elements, which were combined to produce an **overall** quality “profile” awarded to each submission.

This resulted in a dataset of ratings assigned to submissions by 156 HEIs across the UoAs, broken down by (sub-)profile. Each rating indicates the percentage of a HEI’s submission for a subject (UoA) that achieved a specific quality level in each profile. For example, the University of Glasgow had its outputs for Mathematics rated as 50.4% “world-leading”, 48.8% “internationally excellent”, 0.8% “recognised internationally”, 0.0% “recognised nationally” and 0.0% “below nationally recognised standard or not suitable for assessment”. (See “The dataset and the dashboard terminology” section below for more details.)

In order to clearly compare and rank performance between or within institutions, a score derived from a weighted combination of the ratings across the five-point system is useful. However, the published data do not immediately permit rankings as ratings are given as percentages obtained in the qualitative five-point system, as the above example shows. Furthermore, the weightings used by funding bodies (not included in the REF data) and education consultancies are inadequate for meeting the needs of the researchers, graduate students and administrators listed previously.

# Project goals

In order to facilitate comparisons between and within institutions, in this project, I set out to

1. produce a suitable weighting scheme that combined these ratings into scores
2. visualise the data most relevant to the user groups and
3. encourage comparisons between similar institutions.

The idea is to be able to identify the top universities for research in one or more selected disciplines, as well as to select an institution and find out what its particular strengths and weaknesses are when compared against suitably similar institutions. Comparison by similarity is implemented by filtering by number of staff and by comprehensiveness in the range of disciplines researched.

This solves a number of problems faced by other ranking systems. When ranking across all (or multiple) subjects, an institution that offers a single subject will be compared against another that offers 30 subjects across all domains. For example, excelling in Business and Management Studies alone places The London Business School as the top institution in the UK when no filters are applied. The approach taken by the popular ranking systems results in another problem: when ranking performance in a particular subject, an excellent institution that specialises in the Social Sciences, for example, receives a low rank for a subject it offers because of its small size. (This is explained in more detail below.)

# Features of the Dashboard

The right-hand side of the dashboard is prominently occupied by a table that displays, for each institution, the score, number of subjects (i.e., UoAs) offered, (FTE) staff numbers and coarse-grained rank. This can be filtered by buttons in the top section to view rankings by the desired quality profile, allowing an administrator to inspect why their overall score for a given subject was low, for example.

Filtering by comprehensiveness and (FTE) staff numbers is also easily achieved by buttons in the top section, affording a researcher direct control over the kind of institutions that are compared. For example, one researcher may prefer to work in a highly comprehensive institution, where multi-disciplinary opportunities abound, while another researcher may prefer smaller institutions even if they are highly specialised. Similarly, an administrator may want to compare their institution’s performance in a subject with institutions of a similar size.

At the top left-hand side, one can select an institution from a drop-down list. Below that is a bar chart with scores for all subjects. The selected institution’s scores for each subject are displayed side-by-side with the average scores of the other (possibly filtered) institutions in the data set. One can then click on a subject in the bar chart to sort the ranking table on the right of the dashboard by performance in the selected subject. Additionally, the score percentile for each subject of the selected institution within the filtered group is displayed at the tip of the bars.
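As a rough illustration of the percentile shown at the tip of each bar, the sketch below assumes that the percentile is the share of institutions in the filtered group whose score is at or below that of the selected institution. That definition, the function name `score_percentile` and the example scores are assumptions made for illustration; the dashboard may compute the figure differently.

```python
def score_percentile(selected_score, group_scores):
    """Percentage of the group scoring at or below the selected score.

    Assumed definition for illustration only; other percentile
    conventions (e.g. strictly below) would give different values.
    """
    if not group_scores:
        raise ValueError("group_scores must not be empty")
    at_or_below = sum(1 for score in group_scores if score <= selected_score)
    return 100 * at_or_below / len(group_scores)


# Hypothetical subject scores for a filtered group of five institutions,
# including the selected institution itself (score 2.8).
group = [2.1, 2.5, 2.8, 3.0, 3.4]
print(score_percentile(2.8, group))  # 60.0
```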

One can also see, in three separate charts in the centre of the dashboard, how a selected institution compares with the group in terms of the distribution of staff across subject groups, the distribution of scores across the quality profiles, and the distribution of ratings across the quality levels. The staff distribution is likely to be relevant for a researcher or graduate student, while the other two may be more relevant to administrators.

A simple colour scheme is used throughout, with data pertaining to the selected institution in blue and data for the group of filtered institutions in green.

# Problems with existing ranking systems

The most prominent providers of rankings that assess the research quality of HEIs globally<sup><sup>[\[1\]](#footnote-32578)</sup></sup> are Times Higher Education (THE), Quacquarelli Symonds (QS), and Academic Ranking of World Universities (ARWU). These rankings are frequently criticised as inadequate for making meaningful comparisons, as they tend to focus heavily on reputation as well as on metrics that disfavour small or young institutions and undervalue institutions with substantial social science and humanities output. These rankings are based almost exclusively on proxies for quality, such as number of publications in the journals _Nature_ and _Science_, numbers of Nobel prize winners from generations ago, and reputation votes by academics. Additionally, their metrics are heavily influenced by institution size and usually include teaching and student learning factors, and hence are less suited to evaluating research quality.

Aside from the traditional rankings, other weighting schemes have been applied to the REF 2021 data to evaluate UK institutions. In particular, the public organisations in the UK that commission the REF have a system that combines the raw rating data to determine how to allocate billions of pounds to HEIs. The process includes many details relevant to that purpose which we will not go into here; however, one step is to assign 4 points to the highest quality rating (“world-leading”), 1 point to the next level (“internationally excellent”) and 0 to the remaining three (“recognised internationally”, “recognised nationally”, and “below nationally recognised standard or not suitable for assessment”). That is, the scheme has the respective weights 4, 1, 0, 0 and 0. This is designed to focus rewards on only the best performers. Thus, an institution that has most of its research deemed to be “internationally excellent” or “recognised internationally” will perform poorly, and one that has all its outputs rated as “recognised internationally” will receive a score of 0, the same as an institution with all of its outputs rated at the lowest level: “below nationally recognised standard or not suitable for assessment”. Though these outcomes may be suitable for the intentions of the funding bodies, this is a very strict system that is unable to discriminate between good and poor performance, a distinction that is pertinent to my target users, especially administrators.
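The effect of this strict weighting can be seen in a minimal sketch. The function name `weighted_score`, the division by 100 (so that percentage ratings give a score between 0 and 4) and the two example profiles are illustrative assumptions; the sketch covers only the weighting step and not the rest of the funding calculation.

```python
# Weights for the five quality levels, ordered from 4★ down to Unclassified,
# as described above for the funding-style scheme.
FUNDING_WEIGHTS = (4, 1, 0, 0, 0)


def weighted_score(ratings, weights):
    """Combine percentage ratings (4★ ... Unclassified) into a single score.

    Dividing by 100 is an assumed normalisation that yields a 0 to 4 scale.
    """
    return sum(r * w for r, w in zip(ratings, weights)) / 100


# Hypothetical profiles, given as percentages at each quality level.
all_two_star = (0, 0, 100, 0, 0)      # everything "recognised internationally"
all_unclassified = (0, 0, 0, 0, 100)  # everything at the lowest level

print(weighted_score(all_two_star, FUNDING_WEIGHTS))      # 0.0
print(weighted_score(all_unclassified, FUNDING_WEIGHTS))  # 0.0
```

Both profiles receive the same score, which is the loss of discrimination described above.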

The Times Higher Education consultancy (THE) has also produced a ranking table based on REF 2021 that uses a far more lenient weighting system (weights: 4, 3, 2, 1, 0). They then multiply the weighted score by the number of staff to produce a final score called the “research power”. Ostensibly, research power is used because ranking by the size-independent raw weighted scores results in obscure, small and highly specialised institutions appearing high in the rankings. Unsurprisingly, this ranking largely correlates with staff numbers. Furthermore, the distribution of the scores is also very skewed—such that there’s little to separate the high-performing institutions while the low-performing end has high discrimination.
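To see why a staff-multiplied score tracks size, consider the following sketch. It assumes only what is stated above (a weighted score with weights 4, 3, 2, 1, 0 multiplied by FTE staff); the function name `research_power`, the division by 100 and both example institutions are hypothetical, and any further details of the published methodology are not reproduced.

```python
LENIENT_WEIGHTS = (4, 3, 2, 1, 0)  # 4★ down to Unclassified


def research_power(ratings, fte_staff, weights=LENIENT_WEIGHTS):
    """Weighted quality score multiplied by FTE staff (illustrative only)."""
    quality = sum(r * w for r, w in zip(ratings, weights)) / 100
    return quality * fte_staff


# Hypothetical institutions: a small, very strong unit and a large, average one.
small_strong = research_power((80, 20, 0, 0, 0), fte_staff=10)
large_average = research_power((20, 50, 30, 0, 0), fte_staff=200)

# The larger unit wins by a wide margin despite its lower quality profile.
print(large_average > small_strong)  # True
```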

# How this system is an improvement

The REF data, based directly on evaluating representative publications combined with the holistic inclusion of research impact and environment, is an improvement on the indirect, reputation- and legacy-based approach that the popular rankings use. The evaluation and rating process was also thoroughly audited through various mechanisms, ensuring consistency in assessment standards within and between sub-panels. Additionally, the REF allows institutions that focus on social sciences, like the London School of Economics, to get a fairer evaluation, as outputs that are appropriate for the discipline, such as books, are used rather than publications in typically STEM outlets, like the journal _Nature_.

Furthermore, I chose a weighting system (4, 1.5, 1, 0.5, 0) that is balanced between being strict enough to heavily reward top performers, reflecting the intentions of the organisations that commissioned the data, and still rewarding quality at all levels of the five-point system. This achieves a distribution that is approximately normal and hence successfully discriminates between institutions in any score range. The normality of this scheme is consistently superior to that of the “official” and “lenient” ones described above. This holds when the data are filtered by profile (i.e., outputs, impact and environment) or by subject group (e.g., Arts and Humanities), when they are unfiltered, and when there is no institution grouping, in which case the distribution of scores is over every individual subject rated in each university (1881 in total).
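The three schemes can be compared on the University of Glasgow Mathematics outputs profile quoted in the summary. This is a minimal sketch: the weights are those given in the text, while the function name `weighted_score`, the scheme labels and the division by 100 (giving a 0 to 4 scale) are assumptions made for illustration and may differ from the scale the dashboard displays.

```python
# Quality-level weights, ordered from 4★ down to Unclassified.
SCHEMES = {
    "official": (4, 1, 0, 0, 0),
    "lenient": (4, 3, 2, 1, 0),
    "balanced": (4, 1.5, 1, 0.5, 0),  # the scheme chosen for this dashboard
}


def weighted_score(ratings, weights):
    """Combine percentage ratings into a score on an assumed 0 to 4 scale."""
    return sum(r * w for r, w in zip(ratings, weights)) / 100


# University of Glasgow, Mathematics, outputs (percentages from the summary).
glasgow_maths_outputs = (50.4, 48.8, 0.8, 0.0, 0.0)

for name, weights in SCHEMES.items():
    print(name, round(weighted_score(glasgow_maths_outputs, weights), 3))
# official 2.504
# lenient 3.496
# balanced 2.756
```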

This dashboard also allows one to filter the data by comprehensiveness of institutions and size (number of staff), thereby facilitating more meaningful comparisons between HEIs that are similar. For example, the School of Oriental and African Studies has all its subjects under the Social Sciences or Arts and Humanities groups (giving a comprehensiveness of 2). Comparing this institution with a fully comprehensive one like Anglia Ruskin University, which has all subject groups represented, seems rather imprudent, despite the two institutions having similar staff numbers. The popular ranking systems appear to address this by multiplying by the number of staff but, as discussed in the previous section, this emphasis on size dilutes the potential for the data to be used to obtain insights about quality. I believe this challenge is better addressed with filtering.

The dashboard was made with the idea that when it comes to ranking performance across multiple subjects it is better to compare institutions that are specialised to similar degrees. This deals with cases like The London Business School without diluting the effectiveness of the data. On the other hand, when comparing HEIs by an individual subject, it may be suitable to include institutions of any degree of specialisation/comprehensiveness but use size to include only institutions of interest—comparing the average output of 5 FTE staff versus 300 FTE staff working in a discipline hardly seems appropriate.
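The filtering idea can be sketched as follows. The `Institution` record, the `filter_similar` function and the example institutions are all hypothetical and stand in for whatever the dashboard uses internally; only the two filter criteria (FTE staff and comprehensiveness) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Institution:
    name: str
    fte_staff: float
    comprehensiveness: int  # number of subject groups reported (1 to 4)


def filter_similar(institutions, min_staff, max_staff, comprehensiveness):
    """Keep institutions within a staff range and with an allowed comprehensiveness."""
    return [
        inst
        for inst in institutions
        if min_staff <= inst.fte_staff <= max_staff
        and inst.comprehensiveness in comprehensiveness
    ]


# Hypothetical institutions, not taken from the REF data.
institutions = [
    Institution("HEI A", fte_staff=150, comprehensiveness=2),
    Institution("HEI B", fte_staff=160, comprehensiveness=4),
    Institution("HEI C", fte_staff=20, comprehensiveness=2),
    Institution("HEI D", fte_staff=250, comprehensiveness=2),
]

similar = filter_similar(institutions, min_staff=100, max_staff=300, comprehensiveness={2})
print([inst.name for inst in similar])  # ['HEI A', 'HEI D']
```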

# The dataset and the dashboard terminology

The dataset covers submissions to REF 2021 made by 156 UK HEIs across 34 discipline-based units of assessment, characterising research activity in all disciplines over the period from 1 January 2014 to 31 December 2020. The results of the evaluation programme were distilled into an XLSX file containing 7506 rows and 14 columns detailing the ratings obtained for each subject across quality levels and quality profiles, together with data on FTE staff numbers and submission rates.

Aside from where indicated, all terms and definitions originate from the REF 2021 framework.

## Rank

I added a rank category to the data to facilitate comparison—results of HEI evaluations are most often reported in terms of their rankings. However, some ranking providers have been criticised for using excessively precise rankings, as though a 0.01% difference in their scoring system was meaningful. I agree with this criticism, especially since a lot of subjectivity is involved in the process at all stages of the data acquisition and analysis. Therefore, I coarse-grain the rankings by rounding off to the nearest whole percent. (Sorting by “score” instead of “rank” will allow one to quickly infer a more fine-grained ranking, if desired.)
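One possible reading of this coarse-graining is sketched below. It assumes that each score is expressed as a percentage of the maximum possible score (taken here to be 4), rounded to the nearest whole percent, and that institutions with the same rounded value share a rank. These details, the function name `coarse_ranks` and the example scores are assumptions for illustration and may not match the dashboard exactly.

```python
def coarse_ranks(scores, max_score=4.0):
    """Assign shared ranks to institutions whose rounded percentages tie.

    scores maps institution name to score; rank 1 is the highest.
    """
    rounded = {name: round(100 * score / max_score) for name, score in scores.items()}
    distinct = sorted(set(rounded.values()), reverse=True)
    rank_of = {value: position for position, value in enumerate(distinct, start=1)}
    return {name: rank_of[value] for name, value in rounded.items()}


# Hypothetical scores: HEI B and HEI C differ only in the second decimal place.
scores = {"HEI A": 3.12, "HEI B": 2.77, "HEI C": 2.756, "HEI D": 2.52}
print(coarse_ranks(scores))
# {'HEI A': 1, 'HEI B': 2, 'HEI C': 2, 'HEI D': 3}
```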

## Subjects

These include subjects in the conventional sense, such as “Chemistry”, “Philosophy”, etc. as well as small groupings such as “Psychology, Psychiatry and Neuroscience” and “Agriculture, Food and Veterinary Sciences”. While these are referred to as “units of assessment” in the data, I use the label “subjects” to facilitate comprehension of the dashboard and minimise the need for looking up definitions.

## Subject groups

- Medicine, health and life sciences
- Engineering, physical & mathematical sciences
- Social sciences
- Arts and humanities

(These are respectively referred to as the “main panels” A, B, C and D in REF publications.)

## Comprehensiveness

This is also a category not present in the original data. It is defined as the number of subject groups in which a HEI has made submissions, and so ranges from 1 to 4. Filtering by comprehensiveness allows one to compare institutions that are more similar to each other.
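A minimal sketch of how such a value could be derived is given below, assuming the data are available as (institution, subject group) pairs, one per submission. The function name, the input shape and the example rows are hypothetical.

```python
from collections import defaultdict


def comprehensiveness(submissions):
    """Count the distinct subject groups in which each institution submitted.

    submissions is an iterable of (institution, subject_group) pairs.
    """
    groups = defaultdict(set)
    for institution, subject_group in submissions:
        groups[institution].add(subject_group)
    return {institution: len(found) for institution, found in groups.items()}


# Hypothetical submissions, one row per subject submitted.
rows = [
    ("HEI A", "Social sciences"),
    ("HEI A", "Social sciences"),
    ("HEI A", "Arts and humanities"),
    ("HEI B", "Medicine, health and life sciences"),
]
print(comprehensiveness(rows))  # {'HEI A': 2, 'HEI B': 1}
```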

## Quality levels

### Overall profile

The overall profile can be summarised as evaluating originality, significance and rigour, with starred levels corresponding to the following: 4★: world-leading, 3★: internationally excellent, 2★: recognised internationally, 1★: recognised nationally, Unclassified: below nationally recognised standard or not suitable for assessment.

### Output sub-profile

The defined levels are identical to the overall profile above, but displayed here in full:

| Quality level | Description |
| --- | --- |
| 4★  | Quality that is **world-leading** in terms of originality, significance and rigour. |
| 3★  | Quality that is **internationally excellent** in terms of originality, significance and rigour but which **falls short of the highest standards** of excellence. |
| 2★  | Quality that is **recognised internationally** in terms of originality, significance and rigour. |
| 1★  | Quality that is **recognised nationally** in terms of originality, significance and rigour. |
| Unclassified | Quality that **falls below the standard of nationally recognised** work. Or work which **does not meet the published definition of research** for the purposes of this assessment. |

### Impact sub-profile

The starred levels denote quality as follows:

| Quality level | Description |
| --- | --- |
| 4★  | **Outstanding** impacts in terms of their reach and significance. |
| 3★  | **Very considerable** impacts in terms of their reach and significance. |
| 2★  | **Considerable** impacts in terms of their reach and significance. |
| 1★  | **Recognised but modest** impacts in terms of their reach and significance. |
| Unclassified | The impact is of **little or no reach and significance**; or the impact was **not eligible**; or the impact was **not underpinned by excellent research** produced by the submitted unit. |

### Environment sub-profile

The starred levels denote quality as follows:

| Quality level | Description |
| --- | --- |
| 4★  | An environment that is **conducive to producing research of world-leading quality** and **enabling outstanding impact**, in terms of its vitality and sustainability. |
| 3★  | An environment that is **conducive to producing research of internationally excellent quality** and **enabling very considerable impact**, in terms of its vitality and sustainability. |
| 2★  | An environment that is **conducive to producing research of internationally recognised quality** and **enabling considerable impact**, in terms of its vitality and sustainability. |
| 1★  | An environment that is **conducive to producing research of nationally recognised quality** and **enabling recognised but modest impact**, in terms of its vitality and sustainability. |
| Unclassified | An environment that is **not conducive to producing research of nationally recognised quality** or **enabling impact of reach and significance**. |

## Quality Profiles

### Outputs

Outputs are the published or publicly available products of research, which can take many forms. These include books, monographs, chapters in books and journal articles as well as performances, exhibitions and other practice research outputs, software, patents, conference proceedings, translations, and digital and visual media. Outputs were assessed against three criteria: originality, significance, and rigour.

### Impact

HEIs were required to submit impact case studies that demonstrate the impacts their research has had beyond academia. Impact is defined as the effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia. This was assessed against two criteria: reach and significance.

### Environment

HEIs were required to submit narrative evidence of the environment to support research and enable impact within each unit, alongside data on research income, research income in kind, and completed doctoral degrees. Environment refers to “the environment for supporting research and enabling impact within each submitting unit”. It was assessed against two criteria: vitality and sustainability.

### Overall

This is the aggregate rating obtained by weighting the profiles as follows: outputs 60%, impact 25%, and environment 15%.

These weights are applied together with the quality level weightings only when aggregating to get the overall score. Therefore, when filtering the dashboard by a sub-profile (e.g., by impact only), just the quality level weightings are applied.
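The two layers of weighting can be sketched as follows. The profile weights (60/25/15) and the quality level weights (4, 1.5, 1, 0.5, 0) come from the text; the function names, the division by 100 and the example sub-profiles are assumptions for illustration. The sketch also ignores any rounding applied when the published overall profile was produced.

```python
PROFILE_WEIGHTS = {"outputs": 0.60, "impact": 0.25, "environment": 0.15}
QUALITY_WEIGHTS = (4, 1.5, 1, 0.5, 0)  # 4★ down to Unclassified


def quality_score(ratings, weights=QUALITY_WEIGHTS):
    """Score for one (sub-)profile from its percentage ratings (0 to 4 scale)."""
    return sum(r * w for r, w in zip(ratings, weights)) / 100


def overall_score(sub_profiles):
    """Aggregate the three sub-profile scores using the profile weights."""
    return sum(
        PROFILE_WEIGHTS[name] * quality_score(ratings)
        for name, ratings in sub_profiles.items()
    )


# Hypothetical submission, as percentages at each quality level.
submission = {
    "outputs": (50, 40, 10, 0, 0),
    "impact": (60, 40, 0, 0, 0),
    "environment": (100, 0, 0, 0, 0),
}

# Filtering by a single sub-profile uses the quality weights only.
print(round(quality_score(submission["impact"]), 2))  # 3.0
# The overall score also applies the 60/25/15 profile weights.
print(round(overall_score(submission), 2))  # 2.97
```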

## FTE Staff

Full-time equivalent (FTE) is a measurement that combines the hours worked by employees to determine the number of full-time employees it would take to work those hours. For example, if a HEI considers 37.5 hours full time, an employee working 7.5 hours per week would be 0.2 FTE staff and five such employees would be 1 FTE staff.
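The arithmetic in this example can be written out as a short sketch. The function name `fte` and the default of 37.5 full-time hours are taken from the example above purely for illustration.

```python
def fte(hours_per_week, full_time_hours=37.5):
    """Convert weekly hours worked into a full-time equivalent."""
    return hours_per_week / full_time_hours


one_employee = fte(7.5)           # one employee on 7.5 hours per week
five_employees = 5 * fte(7.5)     # five such employees combined

print(round(one_employee, 2))     # 0.2
print(round(five_employees, 2))   # 1.0
```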

For REF 2021, the staff counted were all those with a contract of employment of at least 0.2 FTE who had significant responsibility for research during the evaluation period.

Any reference to “size” or just “staff” in any part of this project refers to FTE staff.

# References

<https://www.ukri.org/publications/research-england-how-we-fund-higher-education-providers/how-we-fund-higher-education-providers/>

<https://www.sfc.ac.uk/our-funding/university-funding/>

<https://2021.ref.ac.uk/guidance-on-results/guidance-on-ref-2021-results/index.html>

<https://researchsupport.admin.ox.ac.uk/ref-2021-guidance-and-framework>

<https://www.timeshighereducation.com/news/ref-2021-times-higher-educations-table-methodology>

<https://www.thecompleteuniversityguide.co.uk/sector/insights/university-and-subject-league-tables-methodology>

<https://www.researchprofessionalnews.com/rr-news-uk-universities-2022-5-ref2021-the-top-10/>

<https://www.hepi.ac.uk/wp-content/uploads/2016/12/Hepi_International-university-rankings-For-good-or-for-ill-REPORT-89-10_12_16_Screen.pdf>

# Footnotes

1. Rankings exclusively focused on UK HEIs such as _The Complete University Guide_ and _The Guardian_ ranking tend to be focused on student experience and outcomes. [↑](#footnote-ref-32578)