PLOS Subject Area Thesaurus
Latest commit 6eeaf2e, Feb 23, 2017, by rdrysdale: Update to plosthes.2016-4


The "Subject Areas" panel on each article page displays a set of terms selected for that article based on its content. The Subject Areas belong to a thesaurus of over 10,000 terms that Access Innovations initially built for us during 2012. The thesaurus took into account the controlled vocabulary of classification terms that had been in use in PLOS Editorial Manager, and the entire corpus of PLOS articles was analysed to ensure that it covers the research domain comprehensively.

Specific terms are associated with articles by Machine Aided Indexing (MAI), which identifies text strings in each article and matches them to Subject Area terms from the thesaurus. MAI ranks the matches by how frequently each term is hit in the article text, and the top eight terms are selected for display. Previously the eight terms were presented in alphabetical order; we are now introducing weightings to the Subject Area panel, so that the order of Subject Areas reflects the frequency of hits in the MAI process. This new presentation will be introduced progressively, beginning with the main Article tab.
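As a rough illustration of frequency-based selection, the Python sketch below counts whole-word hits for each thesaurus term in an article and keeps the top eight, ordered by hit count. It assumes a hypothetical mini-thesaurus (`THESAURUS`) and simple case-insensitive regex matching in place of the real MAI software and Rulebase. The alphabetical tie-break is also an assumption, since the ordering of terms with equal hit counts is not specified.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: Subject Area term -> text strings that count as a hit.
# This is an illustrative stand-in, not an excerpt of the PLOS thesaurus or Rulebase.
THESAURUS = {
    "Retinitis pigmentosa": ["retinitis pigmentosa"],
    "Photoreceptors": ["photoreceptor", "photoreceptors"],
    "Gene therapy": ["gene therapy"],
    "Mice": ["mouse", "mice"],
}


def count_hits(text, thesaurus):
    """Count case-insensitive, whole-word matches of each term's strings in the text."""
    lowered = text.lower()
    hits = Counter()
    for term, strings in thesaurus.items():
        for s in strings:
            hits[term] += len(re.findall(r"\b" + re.escape(s) + r"\b", lowered))
    return hits


def top_subject_areas(text, thesaurus, n=8):
    """Return up to n matched terms, most hits first; ties are broken alphabetically (assumed)."""
    hits = count_hits(text, thesaurus)
    matched = [term for term, count in hits.items() if count > 0]
    return sorted(matched, key=lambda term: (-hits[term], term))[:n]


if __name__ == "__main__":
    article = (
        "Retinitis pigmentosa (RP) degrades photoreceptors. "
        "We treated retinitis pigmentosa in mice with gene therapy; "
        "photoreceptor survival improved, suggesting retinitis pigmentosa is treatable."
    )
    print(top_subject_areas(article, THESAURUS))
    # ['Retinitis pigmentosa', 'Photoreceptors', 'Gene therapy', 'Mice']
```

Here the singular and plural forms of "photoreceptor" both count toward one term, a small example of the string-to-term mapping that the real Rulebase handles far more thoroughly.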

The MAI process uses a Rulebase to guide Subject Area selection. Whereas identifying a phrase such as "Retinitis pigmentosa" is relatively straightforward for software, the issue is more complex for a word such as "Sodium", where the relevant Subject Area might be "Voltage-gated sodium channels", any of several sodium compounds, or just "Sodium" the element. The rules for terms such as "Sodium" are therefore compound, with conditional statements that disambiguate these different contexts. While the vast majority of terms are indexed effectively, ambiguities remain for some terms, and part of our ongoing work is to identify these cases and modify the Rulebase accordingly.
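To make the idea of a compound, conditional rule concrete, here is a hypothetical rule for the trigger word "sodium", written as an ordered list of (condition, Subject Area) clauses in which the first satisfied clause wins. The clause order, the three-word context window and the names `SODIUM_RULE` and `apply_rule` are all assumptions for illustration; this is not the syntax or logic of the actual MAI Rulebase.

```python
import re


def _next_word(words, i):
    """The word after position i, or an empty string at the end of the text."""
    return words[i + 1] if i + 1 < len(words) else ""


# Hypothetical rule for the trigger word "sodium": ordered (condition, Subject Area)
# clauses. Each condition receives the word list, the trigger's position and the set
# of words in a small surrounding window; the first true condition wins.
SODIUM_RULE = [
    (lambda words, i, window: "channel" in window or "channels" in window,
     "Voltage-gated sodium channels"),
    (lambda words, i, window: _next_word(words, i) == "chloride",
     "Sodium chloride"),
    (lambda words, i, window: True,  # unconditional default: the element itself
     "Sodium"),
]


def apply_rule(text, trigger, rule, window_size=3):
    """Return the Subject Area chosen for each occurrence of `trigger` in `text`."""
    words = re.findall(r"[a-z0-9-]+", text.lower())
    assigned = []
    for i, word in enumerate(words):
        if word != trigger:
            continue
        window = set(words[max(0, i - window_size): i + window_size + 1])
        for condition, subject_area in rule:
            if condition(words, i, window):
                assigned.append(subject_area)
                break
    return assigned


if __name__ == "__main__":
    print(apply_rule("Sodium chloride was added before recording from sodium channels.",
                     "sodium", SODIUM_RULE))
    # ['Sodium chloride', 'Voltage-gated sodium channels']
```

Putting the most specific clauses first and a catch-all last keeps every occurrence assigned to exactly one Subject Area; ambiguous cases that slip through are the ones a real Rulebase would refine with further conditions.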

The Subject Area terms are related to each other through a system of broader/narrower term relationships. The thesaurus structure is a polyhierarchy, so a term can have more than one broader term: for example, the Subject Area "White blood cells" has two broader terms, "Blood cells" and "Immune cells". The hierarchy is up to ten tiers deep, and every term traces back to one or more of the top-tier Subject Areas, such as "Biology and life sciences" or "Social sciences".
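A polyhierarchy can be modelled as a map from each term to a list of its broader terms. The sketch below uses a hypothetical fragment in which only the "White blood cells" relationships and the top-tier "Biology and life sciences" come from the text; the intermediate links are illustrative. It enumerates every path from a term up to a top-tier Subject Area, assuming the hierarchy contains no cycles.

```python
# Hypothetical fragment of a polyhierarchy: term -> list of its broader terms.
# The intermediate terms are illustrative, not taken from the PLOS thesaurus.
BROADER = {
    "White blood cells": ["Blood cells", "Immune cells"],
    "Blood cells": ["Cell biology"],
    "Immune cells": ["Immunology"],
    "Cell biology": ["Biology and life sciences"],
    "Immunology": ["Biology and life sciences"],
    "Biology and life sciences": [],
}


def paths_to_top(term, broader):
    """Every broader-term path from `term` up to a top-tier term (one with no broader terms).

    Assumes the hierarchy has no cycles.
    """
    parents = broader.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + path for parent in parents for path in paths_to_top(parent, broader)]


def top_tier_terms(term, broader):
    """The distinct top-tier Subject Areas that `term` traces back to."""
    return sorted({path[-1] for path in paths_to_top(term, broader)})


def max_tier(term, broader):
    """Tier of `term` along its longest path, counting the top tier as 1."""
    return max(len(path) for path in paths_to_top(term, broader))


if __name__ == "__main__":
    for path in paths_to_top("White blood cells", BROADER):
        print(" -> ".join(path))
```

Because a term can have several broader terms, it can sit at different depths on different paths; `max_tier` reports the longest one, which is only one reasonable (assumed) reading of a term's tier.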

The Subject Area terms can be used to access PLOS articles via Advanced Search. Because the MAI Rulebase accommodates synonyms, a Subject-specific Search for "Highly active antiretroviral therapy" retrieves, in one step, articles based on MAI matches to "HAART" (a known synonym) as well as to "Highly active antiretroviral therapy" itself. Additionally, because we select only the most frequently MAI-indexed Subject Areas for each article, a Subject Area search returns only articles where that term ranks highly enough to be among the top eight MAI-retrieved terms, and so is of relatively high significance to the article. By contrast, a query term in an "All fields" search returns every article that contains even a single match to the query term.
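The difference between a Subject Area search and an "All fields" search can be sketched in a few lines. In this hypothetical model, indexing maps every synonym to its preferred term before ranking (so "HAART" hits count toward "Highly active antiretroviral therapy"), a Subject Area search checks each article's top-n terms, and an All fields search matches the literal query string anywhere in the text. The term table, sample articles and function names are invented for illustration and are not the PLOS Search implementation.

```python
import re
from collections import Counter

# Hypothetical lookup: text string -> preferred Subject Area term.
# "haart" is listed as a synonym of the preferred term.
STRING_TO_TERM = {
    "highly active antiretroviral therapy": "Highly active antiretroviral therapy",
    "haart": "Highly active antiretroviral therapy",
    "hiv": "HIV",
    "malaria": "Malaria",
    "mosquito": "Mosquitoes",
    "mosquitoes": "Mosquitoes",
}

# Hypothetical article texts keyed by an illustrative article ID.
ARTICLES = {
    "A1": "Patients started HAART early. HAART lowered HIV load.",
    "A2": "Adherence to highly active antiretroviral therapy was high.",
    "A3": "Malaria transmission by mosquitoes: malaria incidence, mosquito density, "
          "and one patient on HAART.",
}


def index_article(text, n=8):
    """Top-n preferred terms by hit count; synonym hits count toward the preferred term."""
    lowered = text.lower()
    hits = Counter()
    for string, term in STRING_TO_TERM.items():
        hits[term] += len(re.findall(r"\b" + re.escape(string) + r"\b", lowered))
    matched = [term for term, count in hits.items() if count > 0]
    return sorted(matched, key=lambda term: (-hits[term], term))[:n]


def subject_search(query, articles, n=8):
    """Articles whose top-n Subject Areas include the query's preferred term."""
    preferred = STRING_TO_TERM.get(query.lower(), query)
    return sorted(aid for aid, text in articles.items() if preferred in index_article(text, n))


def all_fields_search(query, articles):
    """Articles containing at least one whole-word match to the literal query."""
    pattern = r"\b" + re.escape(query.lower()) + r"\b"
    return sorted(aid for aid, text in articles.items() if re.search(pattern, text.lower()))


if __name__ == "__main__":
    print(subject_search("HAART", ARTICLES, n=2))
    print(all_fields_search("HAART", ARTICLES))
```

With `n=2` standing in for the top-eight cutoff on these tiny articles, `subject_search("HAART", ARTICLES, n=2)` returns A1 and A2, including A2, which never mentions "HAART". `all_fields_search("HAART", ARTICLES)` returns A1 and A3; A3 mentions HAART only once in passing, and that term does not make its top two.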

The hierarchical nature of the thesaurus also enhances article retrieval in Search. Search retrieves not only all articles indexed by MAI with the query Subject Area term itself, but also all articles indexed with any Subject Area term that sits below it on the same broader/narrower term path. Thus, in the "White blood cells" example, a query for either "Blood cells" or "Immune cells" will return all articles indexed with "White blood cells".
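Hierarchical retrieval amounts to expanding the query term downward to all of its narrower terms and then matching any of them. This sketch inverts a hypothetical broader-term map into a narrower-term map and walks it with a depth-first search. The sample hierarchy, the article index and the function names are illustrative assumptions, not the PLOS Search implementation.

```python
from collections import defaultdict

# Hypothetical fragment: term -> its broader terms (a polyhierarchy).
BROADER = {
    "White blood cells": ["Blood cells", "Immune cells"],
    "Red blood cells": ["Blood cells"],
    "Blood cells": [],
    "Immune cells": [],
}

# Hypothetical index: article ID -> Subject Areas assigned to it by MAI.
INDEX = {
    "A1": ["White blood cells"],
    "A2": ["Red blood cells"],
    "A3": ["Immune cells"],
    "A4": ["Blood cells"],
}


def narrower_map(broader):
    """Invert term -> broader terms into term -> narrower terms."""
    narrower = defaultdict(list)
    for term, parents in broader.items():
        for parent in parents:
            narrower[parent].append(term)
    return narrower


def expand_down(term, narrower):
    """The term plus every term below it on any broader/narrower path (iterative DFS)."""
    seen = {term}
    stack = [term]
    while stack:
        for child in narrower.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def hierarchical_search(query, index, broader):
    """Articles indexed with the query term or any term narrower than it."""
    terms = expand_down(query, narrower_map(broader))
    return sorted(aid for aid, areas in index.items() if terms & set(areas))


if __name__ == "__main__":
    print(hierarchical_search("Blood cells", INDEX, BROADER))
    print(hierarchical_search("Immune cells", INDEX, BROADER))
```

Querying "Blood cells" returns A1, A2 and A4, and querying "Immune cells" returns A1 and A3. The expansion runs only downward, so a query for "White blood cells" does not return A4, which is indexed only with the broader "Blood cells".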

The Subject Area terms on all article pages are hyperlinked and following the link returns a listing of all articles to which that Subject Area term applies. Subject Area terms can also be used as the basis for Saved Searches, RSS feeds and PLOS ONE customized Journal Alerts.

The content of the PLOS thesaurus and the Rulebase that governs the application of Subject Areas to the articles is constantly under review. We incorporate analysis of the Subject Area panel "feedback" clicks into this review process, and are always happy to receive specific input on Subject Area terms and their application by email. We update the thesaurus behind the PLOS sites several times each year.

Questions and comments about Subject Areas and the PLOS thesaurus can be sent to