# Exploring The Dimensions Search Language (DSL) - Deep Dive

This tutorial provides a detailed walkthrough of the most important features of the [Dimensions Search Language](https://docs.dimensions.ai/dsl/). 

This tutorial is based on the [Query Syntax](https://docs.dimensions.ai/dsl/language.html) section of the official documentation, so it can be used as an interactive version of that section: you can run and modify the DSL queries presented there.

## What is the Dimensions Search Language?

The DSL aims to capture the type of interaction with Dimensions data
that users are accustomed to performing graphically via the [web
application](https://app.dimensions.ai/), and enable web app developers, power users, and others to
carry out such interactions by writing query statements in a syntax
loosely inspired by SQL but particularly suited to our specific domain
and data organization.

**Note:** this notebook uses the Python programming language, but the **DSL queries themselves are not Python-specific** and can be reused with any other API client. 



## Prerequisites

This notebook assumes you have installed the [Dimcli](https://pypi.org/project/dimcli/) library and are familiar with the *Getting Started* tutorial.


In [1]:
!pip install dimcli --quiet 

import dimcli
from dimcli.shortcuts import *
import json
import sys
import pandas as pd
#

print("==\nLogging in..")
# https://github.com/digital-science/dimcli#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
  import getpass
  USERNAME = getpass.getpass(prompt='Username: ')
  PASSWORD = getpass.getpass(prompt='Password: ')    
  dimcli.login(USERNAME, PASSWORD, ENDPOINT)
else:
  USERNAME, PASSWORD  = "", ""
  dimcli.login(USERNAME, PASSWORD, ENDPOINT)
dsl = dimcli.Dsl()

==
Logging in..
Dimcli - Dimensions API Client (v0.6.9.2)
Connected to endpoint: https://app.dimensions.ai - DSL version: 1.25
Method: dsl.ini file



## Sections Index 

1. Basic query structure
2. Full-text searching
3. Field searching
4. Searching for researchers
5. Returning results 
6. Aggregations

## 1. Basic query structure

DSL queries consist of two required components: a `search` phrase that
indicates the scientific records to be searched, and one or
more `return` phrases which specify the contents and structure of the
desired results.

The simplest valid DSL query is of the form `search <source> return <result>`:

In [3]:
%%dsldf 
search grants return  grants limit 5

Returned Grants: 5 (total = 5310256)


Unnamed: 0,language,title_language,active_year,project_num,start_year,funding_org_name,id,title,start_date,original_title,funders,end_date
0,en,en,[2021],2018-HRSI-1548,2021,New Brunswick Health Research Foundation,grant.8690978,APPROACH to Enriching the Real World Evidence ...,2021-11-30,APPROACH to Enriching the Real World Evidence ...,"[{'id': 'grid.484521.e', 'acronym': 'NBHRF', '...",
1,en,en,[2021],1301720F,2021,Fund for Scientific Research,grant.8950252,Molecular mechanism of DNA double strand break...,2021-10-01,Mécanismes moléculaires de la formation et la ...,"[{'id': 'grid.424470.1', 'acronym': 'FRS FNRS'...",
2,en,en,"[2021, 2022, 2023]",M 2734,2021,FWF Austrian Science Fund,grant.8715161,Life as concept and as science,2021-10-01,Life as concept and as science,"[{'id': 'grid.25111.36', 'acronym': 'FWF', 'ci...",2023-09-30
3,en,en,"[2021, 2022, 2023]",892933,2021,European Commission,grant.8964235,Scintillation Light For New Physics with Liqui...,2021-09-01,Scintillation Light For New Physics with Liqui...,"[{'id': 'grid.270680.b', 'acronym': 'EC', 'cit...",2023-08-31
4,en,en,"[2021, 2022, 2023]",893021,2021,European Commission,grant.8963889,Jet quenching for heavy-ion collisions at the LHC,2021-09-01,Jet quenching for heavy-ion collisions at the LHC,"[{'id': 'grid.270680.b', 'acronym': 'EC', 'cit...",2023-08-31


### `search source`

A query must begin with the word `search` followed by a `source` name, i.e. the name of a type of scientific `record`, such as `grants` or `publications`.

**What are the sources available?** See the [data sources](https://docs.dimensions.ai/dsl/data-sources.html) section of the documentation. 

Alternatively, we can use the 'schema' API ([describe](https://docs.dimensions.ai/dsl/data-sources.html#metadata-api)) to return this information programmatically:

In [4]:
dsl.query("describe schema")

<dimcli.DslDataset object #4635749392. Dict keys: 'sources', 'entities'>

A more useful query might also make use of the optional `for` and
`where` phrases to limit the set of records returned.

In [5]:
%%dsldf 
search grants  for "lung cancer" 
    where active_year=2000 
return  grants  limit 5

Returned Grants: 5 (total = 1734)


Unnamed: 0,project_num,end_date,start_date,original_title,start_year,title_language,id,funding_org_name,funders,active_year,language,title
0,F32HL010455,2002-01-01,2000-12-31,ROLE OF CD44 ISOFORMS IN ENDOTHELIAL CELL DAMAGE,2000,en,grant.2386513,National Heart Lung and Blood Institute,"[{'id': 'grid.279885.9', 'country_name': 'Unit...","[2000, 2001, 2002]",en,ROLE OF CD44 ISOFORMS IN ENDOTHELIAL CELL DAMAGE
1,R01HL063695,2004-11-30,2000-12-18,"ESTROGEN, ANGIOGENESIS AND ENDOTHELIAL PROGENI...",2000,en,grant.2537116,National Heart Lung and Blood Institute,"[{'id': 'grid.279885.9', 'country_name': 'Unit...","[2000, 2001, 2002, 2003, 2004]",en,"ESTROGEN, ANGIOGENESIS AND ENDOTHELIAL PROGENI..."
2,R01HL066221,2007-11-30,2000-12-18,GENETIC ANALYSIS OF EPHRIN-EPH SIGNALING IN AN...,2000,en,grant.2537801,National Heart Lung and Blood Institute,"[{'id': 'grid.279885.9', 'country_name': 'Unit...","[2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007]",en,GENETIC ANALYSIS OF EPHRIN-EPH SIGNALING IN AN...
3,R01HL062244,2017-12-31,2000-12-15,Synthetic Heparan Sulfate: Probing Biosynthesi...,2000,en,grant.2536777,National Heart Lung and Blood Institute,"[{'id': 'grid.279885.9', 'country_name': 'Unit...","[2000, 2001, 2002, 2003, 2004, 2005, 2006, 200...",en,Synthetic Heparan Sulfate: Probing Biosynthesi...
4,R01CA088932,2019-03-31,2000-12-01,Regulation of Telomerase by Sphingolipid Signa...,2000,en,grant.2475193,National Cancer Institute,"[{'id': 'grid.48336.3a', 'country_name': 'Unit...","[2000, 2001, 2002, 2003, 2004, 2005, 2006, 200...",en,Regulation of Telomerase by Sphingolipid Signa...
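The way the `search`, `for`, `where` and `return` phrases fit together can be sketched as a small string-building function. This is only an illustration: `build_query` is a hypothetical helper, not part of Dimcli, and it assumes the search term contains no double quotes that would need escaping.

```python
def build_query(source, term=None, filters=None, limit=None):
    """Assemble a DSL query string from its phrases (hypothetical helper)."""
    # The search phrase is required and names the source.
    parts = [f"search {source}"]
    # Optional full-text phrase; assumes `term` has no embedded double quotes.
    if term:
        parts.append(f'for "{term}"')
    # Optional filter phrase, passed through as written.
    if filters:
        parts.append(f"where {filters}")
    # The return phrase is required; here it returns the searched source.
    parts.append(f"return {source}")
    if limit is not None:
        parts.append(f"limit {limit}")
    return " ".join(parts)


q = build_query("grants", term="lung cancer", filters="active_year=2000", limit=5)
print(q)
```

The resulting string could then be passed to `dsl.query(...)` in place of the cell magic used above.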


### `return` result (source or facet)

The most basic `return` phrase consists of the keyword `return` followed
by the name of a `record` or `facet` to be returned. 

This must be the
name of the `source` used in the `search` phrase, or the name of a
`facet` of that source.

In [6]:
%%dsldf
search grants for "laryngectomy" 
return grants limit 5

Returned Grants: 5 (total = 110)


Unnamed: 0,start_date,title,end_date,title_language,project_num,id,funders,original_title,funding_org_name,start_year,language,active_year
0,2019-08-15,Wearable silent speech technology to enhance i...,2024-07-31,en,R01DC016621,grant.8554260,"[{'id': 'grid.214431.1', 'types': ['Facility']...",Wearable silent speech technology to enhance i...,National Institute on Deafness and Other Commu...,2019,en,"[2019, 2020, 2021, 2022, 2023, 2024]"
1,2019-04-01,Construction of a nursing system leading to im...,2023-03-31,en,19H03937,grant.8428997,"[{'id': 'grid.54432.34', 'types': ['Nonprofit'...",Construction of a nursing system leading to im...,Japan Society for the Promotion of Science,2019,ja,"[2019, 2020, 2021, 2022, 2023]"
2,2019-04-01,Development of self-directed TE shunt speech t...,2022-03-31,en,19K10927,grant.8441322,"[{'id': 'grid.54432.34', 'types': ['Nonprofit'...",Development of self-directed TE shunt speech t...,Japan Society for the Promotion of Science,2019,ja,"[2019, 2020, 2021, 2022]"
3,2019-04-01,Development of an olfactory improvement progra...,2021-03-31,ja,19K19574,grant.8422934,"[{'id': 'grid.54432.34', 'types': ['Nonprofit'...",喉頭がん、下咽頭がんにより喉頭摘出術を受けた患者に対する嗅覚向上プログラムの開発,Japan Society for the Promotion of Science,2019,ja,"[2019, 2020, 2021]"
4,2019-03-01,Early postoperative complications of laryngect...,2019-05-01,lv,AP-44/19,grant.9013618,"[{'id': 'grid.453247.3', 'types': ['Government...",Agrīnās laringektomiju pēcoperācijas komplikāc...,Ministry of Education and Science,2019,lv,[2019]


For example, let's see which *facets* are available for the *grants* source:

In [7]:
fields = dsl.query("describe schema")['sources']['grants']['fields']
[x for x in fields if fields[x]['is_facet']]

['category_uoa',
 'category_for',
 'category_hrcs_hc',
 'category_hra',
 'category_rcdc',
 'language',
 'funder_countries',
 'research_org_state_codes',
 'category_hrcs_rac',
 'research_org_cities',
 'start_year',
 'funders',
 'funding_currency',
 'research_org_countries',
 'active_year',
 'funding_org_city',
 'funding_org_name',
 'language_title',
 'funding_org_acronym',
 'research_orgs',
 'researchers',
 'category_icrp_cso',
 'category_icrp_ct',
 'category_bra']
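The same filtering pattern works for any boolean flag in the schema metadata. The sketch below runs offline against a hand-written dictionary that mimics the shape of `dsl.query("describe schema")['sources']['grants']['fields']`; the `fields_with_flag` helper and the `type` values in `mock_fields` are assumptions made for illustration, not copied from the live schema.

```python
def fields_with_flag(fields, flag):
    """Return the sorted names of fields whose metadata has `flag` set to True."""
    return sorted(name for name, meta in fields.items() if meta.get(flag))


# Illustrative stand-in for the 'fields' dictionary returned by "describe schema".
mock_fields = {
    "title": {"type": "string", "is_facet": False, "is_filter": False},
    "start_year": {"type": "integer", "is_facet": True, "is_filter": True},
    "funders": {"type": "organizations", "is_facet": True, "is_filter": True},
}

facets = fields_with_flag(mock_fields, "is_facet")
print(facets)
```

With a live connection, the real `fields` dictionary from the cell above could be passed in instead of `mock_fields`.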

## 2. Full-text Searching

Full-text search or keyword search finds all instances of a term
(keyword) in a document, or group of documents. 

Full-text search works
by using search indexes, which can target specific sections of a
document, e.g. its *abstract*, *authors*, *full text*, etc.

In [8]:
%%dsldf 
search publications 
    in full_data for "moon landing" 
return publications limit 5

Returned Publications: 5 (total = 168408)


Unnamed: 0,type,pages,author_affiliations,year,id,title
0,chapter,14-30,"[[{'first_name': 'Alessandro', 'last_name': 'B...",2020,pub.1127643502,"1. Into the Woods (Via Cuma 320, Bacoli)"
1,chapter,82-103,,2020,pub.1125153646,ANDRIY BONDAR
2,chapter,160-174,"[[{'first_name': 'Laura', 'last_name': 'Marcus...",2020,pub.1126253233,15. H. G. Wells at Uppark
3,chapter,224-240,"[[{'first_name': 'Jacob L.', 'last_name': 'Mac...",2020,pub.1127632269,12. The Silence of Aeneid 6 in Augustine’s Con...
4,chapter,232-271,"[[{'first_name': 'Alison', 'last_name': 'Finla...",2020,pub.1125633591,Skald Sagas in their Literary Context 2: Possi...


### 2.1 `in [search index]`

This optional phrase consists of the particle `in` followed by a term indicating a `search index`, specifying for example whether the search
is limited to full text, title and abstract only, or title only. 

In [9]:
%%dsldf 
search grants 
    in title_abstract_only for "something" 
return grants limit 5

Returned Grants: 5 (total = 9677)


Unnamed: 0,start_date,title,end_date,title_language,project_num,id,funders,original_title,funding_org_name,start_year,language,active_year
0,2020-10-01,SaTC: CORE: Medium: Collaborative: Hardening O...,2024-09-30,en,1954521,grant.9046367,"[{'id': 'grid.457785.c', 'types': ['Government...",SaTC: CORE: Medium: Collaborative: Hardening O...,Directorate for Computer & Information Science...,2020,en,"[2020, 2021, 2022, 2023, 2024]"
1,2020-10-01,SaTC: CORE: Medium: Collaborative: Hardening O...,2024-09-30,en,1955270,grant.9046432,"[{'id': 'grid.457785.c', 'types': ['Government...",SaTC: CORE: Medium: Collaborative: Hardening O...,Directorate for Computer & Information Science...,2020,en,"[2020, 2021, 2022, 2023, 2024]"
2,2020-10-01,SaTC: CORE: Medium: Collaborative: Hardening O...,2024-09-30,en,1954712,grant.9046384,"[{'id': 'grid.457785.c', 'types': ['Government...",SaTC: CORE: Medium: Collaborative: Hardening O...,Directorate for Computer & Information Science...,2020,en,"[2020, 2021, 2022, 2023, 2024]"
3,2020-09-30,The Cosmology of the Early and Late Universe,2023-09-29,en,ST/T000732/1,grant.8673892,"[{'id': 'grid.14467.30', 'types': ['Government...",The Cosmology of the Early and Late Universe,Science and Technology Facilities Council,2020,en,"[2020, 2021, 2022, 2023]"
4,2020-09-01,Decoding the Infrared Spectra of High Frequenc...,2023-08-31,en,1900095,grant.8966252,"[{'id': 'grid.457875.c', 'types': ['Government...",Decoding the Infrared Spectra of High Frequenc...,Directorate for Mathematical & Physical Sciences,2020,en,"[2020, 2021, 2022, 2023]"


For example, let's see which *search fields* are available for the *grants* source:

In [10]:
dsl.query("describe schema")['sources']['grants']['search_fields']

['concepts',
 'title_abstract_only',
 'title_only',
 'noun_phrases',
 'investigators',
 'full_data']

In [11]:
%%dsldf 
search grants 
    in full_data for "graphene AND computer AND iron" 
return grants limit 5

Returned Grants: 5 (total = 10)


Unnamed: 0,start_date,title,end_date,title_language,project_num,id,funders,original_title,funding_org_name,start_year,language,active_year
0,2019-01-01,Weyl and Dirac semimetals and beyond - predict...,2021-12-31,en,19-43-04129,grant.8413990,"[{'id': 'grid.454869.2', 'types': ['Nonprofit'...",Weyl and Dirac semimetals and beyond - predict...,Russian Science Foundation,2019,en,"[2019, 2020, 2021]"
1,2018-01-01,Project of the organization of the 18th Intern...,2018-12-31,ru,18-02-20097,grant.8731867,"[{'id': 'grid.452899.b', 'types': ['Government...",Проект организации 18-ой Международной конфере...,Russian Foundation for Basic Research,2018,ru,[2018]
2,2016-02-22,Subject subsidy for maintaining the research p...,2016-12-31,pl,4491/E-370/S/2016,grant.7397800,"[{'id': 'grid.425823.a', 'types': ['Government...",Dotacja podmiotowa na utrzymanie potencjału ba...,Ministry of Science and Higher Education,2016,pl,[2016]
3,2015-02-19,Subject subsidy for maintaining the research p...,2015-12-31,pl,4491/E-370/S/2015,grant.7397795,"[{'id': 'grid.425823.a', 'types': ['Government...",Dotacja podmiotowa na utrzymanie potencjału ba...,Ministry of Science and Higher Education,2015,pl,[2015]
4,2014-04-09,Intentional grant for conducting in 2014 the F...,2014-12-31,pl,4491/E-370/M/2014,grant.7397490,"[{'id': 'grid.425823.a', 'types': ['Government...",Dotacja celowa na prowadzenie w 2014 przez Wyd...,Ministry of Science and Higher Education,2014,pl,[2014]


Special search indexes for person names let you run full-text
searches on publication `authors` or grant `investigators`. See the
*Searching for researchers* section below for more information
on how searches work in this case.

In [12]:
%dsldf search publications in authors for "\"Jennifer A Doudna\"" return publications limit 5

Returned Publications: 5 (total = 323)


Unnamed: 0,title,author_affiliations,issue,id,year,volume,type,pages,journal.id,journal.title
0,Machine learning predicts new anti-CRISPR prot...,"[[{'first_name': 'Simon', 'last_name': 'Eitzin...",9.0,pub.1125959258,2020,48.0,article,4698-4708,jour.1018982,Nucleic Acids Research
1,Author Correction: Phage-assisted evolution of...,"[[{'first_name': 'Michelle F.', 'last_name': '...",,pub.1127737872,2020,,article,1-1,jour.1115214,Nature Biotechnology
2,Huge and variable diversity of episymbiotic CP...,"[[{'first_name': 'Christine Y', 'last_name': '...",,pub.1127645424,2020,,preprint,2020.05.14.094862,jour.1293558,bioRxiv
3,Cancer-specific loss of TERT activation sensit...,"[[{'first_name': 'Alexandra M', 'last_name': '...",,pub.1127163455,2020,,preprint,2020.04.25.061606,jour.1293558,bioRxiv
4,Blueprint for a Pop-up SARS-CoV-2 Testing Lab,[[{'first_name': 'Innovative Genomics Institut...,,pub.1126635310,2020,,article,2020.04.11.20061424,jour.1369542,medRxiv


### 2.2 `for "search term"`

This optional phrase consists of the keyword `for` followed by a
`search term` `string`, enclosed in double quotes (`"`).

Strings in double quotes can contain nested quotes escaped by a
backslash `\`. This will ensure that the string in nested double quotes
is searched for as if it was a single phrase, not multiple words.

An example of a phrase: `"\"Machine Learning\""` : results must contain
`Machine Learning` as a phrase.

In [13]:
%dsldf search publications for "\"Machine Learning\"" return publications limit 5

Returned Publications: 5 (total = 1139898)


Unnamed: 0,pages,id,type,title,author_affiliations,year,volume,issue,journal.id,journal.title
0,243-248,pub.1124666091,chapter,Towards maritime traffic coordination in the e...,"[[{'first_name': 'Eetu', 'last_name': 'Heikkil...",2020,,,,
1,1726672,pub.1125710665,article,Recognizing hotspots in Brief Eclectic Psychot...,"[[{'first_name': 'Sytske', 'last_name': 'Wiege...",2020,11.0,1.0,jour.1045059,European Journal of Psychotraumatology
2,41-54,pub.1126735888,article,Capacitated vehicle routing problem with colum...,"[[{'first_name': 'Baze University Abuja', 'las...",2020,3.0,1.0,jour.1365688,Open Journal of Discrete Applied Mathematics
3,219-250,pub.1124034443,chapter,Die Erfassung und Messung von Bedeutungsstrukt...,"[[{'first_name': 'Jan', 'last_name': 'Goldenst...",2020,,,,
4,83-94,pub.1124677880,chapter,Korean Technical Innovation: toward Autonomous...,"[[{'first_name': 'Yongwon', 'last_name': 'Kwon...",2020,,,,


An example of multiple keywords: `"Machine Learning"`. Without the nested quotes, the two
keywords are searched for independently rather than as a phrase.

In [14]:
%dsldf search publications for "Machine Learning" return publications limit 5

Returned Publications: 5 (total = 2400834)


Unnamed: 0,pages,id,type,title,author_affiliations,year
0,84-118,pub.1124947017,chapter,4. Visualizing the Division of Labor: William ...,"[[{'first_name': 'John', 'last_name': 'Barrell...",2020
1,65-368,pub.1127396158,chapter,Documents,,2020
2,87-139,pub.1125380179,chapter,I. THE PHILOSOPHY OF SUCCESS,"[[{'first_name': 'Heinrich Robert', 'last_name...",2020
3,243-248,pub.1124666091,chapter,Towards maritime traffic coordination in the e...,"[[{'first_name': 'Eetu', 'last_name': 'Heikkil...",2020
4,1-60,pub.1124109965,chapter,George Eliot’s Spinoza. An introduction,"[[{'first_name': 'Benedictus de', 'last_name':...",2020


Note: Special characters, such as any of `^ " : ~ \ [ ] { } ( ) ! | & +` must be escaped by a backslash `\`. Also, please note escaping rules in
[Python](http://python-reference.readthedocs.io/en/latest/docs/str/escapes.html) (or other languages). For example, when writing a query with escaped quotes, such as `search publications for "\"phrase 1\" AND \"phrase 2\""`, in Python, it is necessary to escape the backslashes as well, so it
would look like: `'search publications for "\\"phrase 1\\" AND \\"phrase 2\\""'`. 

See the [official docs](https://docs.dimensions.ai/dsl/language.html#for-search-term) for more details.
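The Python side of this escaping rule can be checked without calling the API. The snippet below is a small sketch comparing three ways of writing the same query in Python source: doubled backslashes, a raw string, and a plain string where the backslashes are silently dropped.

```python
# Doubled backslashes: each \\ in the source becomes one backslash in the string.
escaped = 'search publications for "\\"phrase 1\\" AND \\"phrase 2\\""'

# A raw string keeps backslashes exactly as typed.
raw = r'search publications for "\"phrase 1\" AND \"phrase 2\""'

# Plain string: Python reads \" as an escaped quote, so the backslash is lost
# and the DSL no longer sees the nested quotes as escaped.
lossy = 'search publications for "\"phrase 1\" AND \"phrase 2\""'

print(escaped)
print(escaped == raw)
print("\\" in lossy)
```

Raw strings are often the more readable option when a query is pasted from the documentation.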

### 2.3 Boolean Operators

A search term can consist of multiple keywords or phrases connected using
Boolean logic operators, e.g. `AND`, `OR` and `NOT`.

In [15]:
%dsldf search publications for "(dose AND concentration)" return publications limit 5

Returned Publications: 5 (total = 5259655)


Unnamed: 0,title,id,year,type,pages,author_affiliations,issue,volume,journal.id,journal.title
0,ANHANG. Part 2,pub.1126070644,2020,chapter,802-1094,,,,,
1,England in 1845 and in 1885,pub.1126070808,2020,chapter,61-66,,,,,
2,Translational studies of estradiol and progest...,pub.1124948447,2020,article,1723857,"[[{'first_name': 'Antonia V', 'last_name': 'Se...",1.0,11.0,jour.1045059,European Journal of Psychotraumatology
3,7. Conservation of the Amsterdam Sunflowers: F...,pub.1125801745,2020,chapter,175-206,"[[{'first_name': 'Ella', 'last_name': 'Hendrik...",,,,
4,New findings questioning the construct validit...,pub.1124216519,2020,article,1708145,"[[{'first_name': 'Julian D', 'last_name': 'For...",1.0,11.0,jour.1045059,European Journal of Psychotraumatology


When specifying Boolean operators with keywords such as `AND`, `OR` and
`NOT`, the keywords must appear in all uppercase. 

The operators available are shown in the table below.

| Boolean Operator | Alternative Symbol | Description                                                                                                                                                                 |
|------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `AND`            | `&&`               | Requires both terms on either side of the Boolean operator to be present for a match.                                                                                       |
| `NOT`            | `!`                | Requires that the following term not be present.                                                                                                                            |
| `OR`             | `\|\|`             | Requires that either term (or both terms) be present for a match.                                                                                                           |
|                  | `+`                | Requires that the following term be present.                                                                                                                                |
|                  | `-`                | Prohibits the following term (that is, matches on fields or documents that do not include that term). The `-` operator is functionally similar to the Boolean operator `!`. |
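When queries are generated from lists of keywords, the operators in the table can be combined programmatically. The helpers below (`any_of`, `all_of`, `require`, `prohibit`) are hypothetical names introduced for this sketch, not Dimcli functions, and they assume plain keywords that need no escaping.

```python
def any_of(*terms):
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Join terms with AND and wrap the group in parentheses."""
    return "(" + " AND ".join(terms) + ")"


def require(term):
    """Prefix a term with + so that it must be present."""
    return f"+{term}"


def prohibit(term):
    """Prefix a term with - so that it must be absent."""
    return f"-{term}"


expr = f"{any_of('dose', 'concentration')} AND ({prohibit('malaria')} {require('africa')})"
query = f'search publications for "{expr}" return publications limit 5'
print(query)
```

The operators are written in uppercase because the DSL only recognizes `AND`, `OR` and `NOT` in that form.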

In [16]:
%dsldf search publications for "(dose OR concentration) AND (-malaria +africa)" return publications limit 5

Returned Publications: 5 (total = 1355217)


Unnamed: 0,type,pages,year,id,title,author_affiliations
0,chapter,65-368,2020,pub.1127396158,Documents,
1,chapter,129-143,2020,pub.1124248733,8. India in the Early Nuclear Age,"[[{'first_name': 'Campbell', 'last_name': 'Cra..."
2,chapter,155-174,2020,pub.1127822864,The Economy of Detainability,"[[{'first_name': 'Nicholas', 'last_name': 'De ..."
3,chapter,634-688,2020,pub.1124248682,17. Institutions for Infrastructure in Develop...,"[[{'first_name': 'Antonio', 'last_name': 'Esta..."
4,chapter,285-304,2020,pub.1124946791,16. The Neuroethology of Birdsong,"[[{'first_name': 'Eliot A.', 'last_name': 'Bre..."


Combining keywords and Boolean operators lets you construct rather sophisticated queries. For example, here's a real-world query used to extract publications related to COVID-19. 

In [70]:
q_inner = """ "2019-nCoV" OR "COVID-19" OR "SARS-CoV-2" OR "HCoV-2019" OR "hcov" OR "NCOVID-19" OR  
    "severe acute respiratory syndrome coronavirus 2" OR "severe acute respiratory syndrome corona virus 2" 
    OR (("coronavirus"  OR "corona virus") AND (Wuhan OR China OR novel)) """

# tip: dsl_escape is a dimcli utility function for escaping special characters 
q_outer = f"""search publications in full_data for "{dsl_escape(q_inner)}" return publications"""
print(q_outer)

dsl.query(q_outer)

search publications in full_data for " \"2019-nCoV\" OR \"COVID-19\" OR \"SARS-CoV-2\" OR \"HCoV-2019\" OR \"hcov\" OR \"NCOVID-19\" OR  
    \"severe acute respiratory syndrome coronavirus 2\" OR \"severe acute respiratory syndrome corona virus 2\" 
    OR ((\"coronavirus\"  OR \"corona virus\") AND (Wuhan OR China OR novel)) " return publications
Returned Publications: 20 (total = 99186)


<dimcli.DslDataset object #4639883024. Records: 20/99186>

### 2.4 Wildcard Searches

The DSL supports single and multiple character wildcard searches within
single terms. Wildcard characters can be applied to single terms, but
not to search phrases.

In [17]:
%dsldf search publications in title_only for "ital? malaria" return publications limit 5

Returned Publications: 5 (total = 142)


Unnamed: 0,title,author_affiliations,id,year,type,pages,journal.id,journal.title,volume,issue
0,"Seasons in Italy: Northern European travelers,...","[[{'first_name': 'Benjamin', 'last_name': 'Rei...",pub.1124231018,2020,article,1-20,jour.1141817,Journal of Tourism and Cultural Change,,
1,Updated guidelines for malaria prophylaxis in ...,"[[{'first_name': 'Guido', 'last_name': 'Caller...",pub.1123222257,2020,article,101544,jour.1034401,Travel Medicine and Infectious Disease,33.0,
2,Clinical management of imported malaria in Ita...,"[[{'first_name': 'Luciana', 'last_name': 'Lepo...",pub.1125332077,2020,article,28-33,jour.1089291,Microbiologica,43.0,1.0
3,Investigation on potential malaria vectors (An...,"[[{'first_name': 'Valentina', 'last_name': 'Ta...",pub.1113815431,2019,article,151,jour.1030597,Malaria Journal,18.0,1.0
4,Increasing imported malaria in children and ad...,"[[{'first_name': 'Fiorenza', 'last_name': 'Pan...",pub.1113201846,2019,article,34-39,jour.1034401,Travel Medicine and Infectious Disease,29.0,


In [18]:
%dsldf search publications in title_only for "it* malaria" return publications limit 5

Returned Publications: 5 (total = 1498)


Unnamed: 0,type,pages,author_affiliations,issue,volume,year,id,title,journal.id,journal.title
0,article,24.0,"[[{'first_name': 'Monica P.', 'last_name': 'Sh...",1.0,19.0,2020,pub.1124106064,The effectiveness of older insecticide-treated...,jour.1030597,Malaria Journal
1,article,109809.0,"[[{'first_name': 'Berge', 'last_name': 'Tsanou...",,136.0,2020,pub.1126819455,Modeling pyrethroids repellency and its role o...,jour.1026215,Chaos Solitons & Fractals
2,article,100333.0,"[[{'first_name': 'Toussaint', 'last_name': 'Ro...",,33.0,2020,pub.1124902730,Severe-malaria infection and its outcomes amon...,jour.1042240,Spatial and Spatio-temporal Epidemiology
3,article,,"[[{'first_name': 'Arif Jamal', 'last_name': 'S...",,67.0,2020,pub.1127964785,Neurological disorder and psychosocial aspects...,jour.1006696,Folia Parasitologica
4,preprint,,"[[{'first_name': 'Jifar', 'last_name': 'Hassen...",,,2020,pub.1127968073,Urban Malaria Prevalence and Its Associated Ri...,jour.1380788,Research Square


| Wildcard Search Type                                             | Special Character | Example                                                                                                                                                                                                                         |
|------------------------------------------------------------------|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Single character - matches a single character                    | `?`               | The search string `te?t` would match both `test` and `text`.                                                                                                                                                                    |
| Multiple characters - matches zero or more sequential characters | `*`               | The wildcard search: `tes*` would match `test`, `testing`, and `tester`. You can also use wildcard characters in the middle of a term. For example: `te*t` would match `test` and `text`. `*est` would match `pest` and `test`. |
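The behaviour of the two wildcard characters can be explored offline with Python's standard `fnmatch` module, whose `?` and `*` have the same basic meaning. This is only an analogy: the DSL matches against indexed terms on the server, so details such as case handling may differ, and the `matching` helper and word list are invented for this sketch.

```python
from fnmatch import fnmatchcase


def matching(pattern, words):
    """Return the words that match a ?/* wildcard pattern (case-sensitive)."""
    return [w for w in words if fnmatchcase(w, pattern)]


words = ["test", "text", "testing", "tester", "pest", "toast"]

print(matching("te?t", words))  # ? stands for exactly one character
print(matching("tes*", words))  # * stands for zero or more characters
print(matching("*est", words))  # a wildcard can also lead the term
```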

### 2.5 Proximity Searches

A proximity search looks for terms that are within a specific distance
from one another.

To perform a proximity search, add the tilde character `~` and a numeric
value to the end of a search phrase. For example, to search for
`formal` and `model` within 10 words of each other in a document, use
the search:

In [19]:
%dsldf search publications for "\"formal model\"~10" return publications limit 5

Returned Publications: 5 (total = 468787)


Unnamed: 0,pages,id,type,title,author_affiliations,year,volume,issue,journal.id,journal.title
0,84-102,pub.1124248667,chapter,2. Clientelistic Politics and Economic Develop...,"[[{'first_name': 'Pranab', 'last_name': 'Bardh...",2020,,,,
1,1726722,pub.1125320181,article,Building cooperative learning to address alcoh...,"[[{'first_name': 'Oladapo', 'last_name': 'Olad...",2020,13.0,1.0,jour.1041075,Global Health Action
2,xi-xvi,pub.1125144025,chapter,Foreword,,2020,,,,
3,137-159,pub.1125788857,chapter,6. Hierarchy and Power in the Tropical Forest,"[[{'first_name': 'Irving', 'last_name': 'Goldm...",2020,,,,
4,136-161,pub.1125789336,chapter,6. The Structure and Workings of Employer-Prom...,"[[{'first_name': 'Joseph F.', 'last_name': 'Ge...",2020,,,,


In [20]:
%dsldf search publications for "\"digital humanities\"~5  +ontology" return publications limit 5

Returned Publications: 5 (total = 7345)


Unnamed: 0,pages,id,type,title,author_affiliations,volume,year,issue,journal.id,journal.title
0,89,pub.1127423858,article,Citizen science in the social sciences and hum...,"[[{'first_name': 'Loreta', 'last_name': 'Taugi...",6.0,2020,1.0,jour.1136613,Palgrave Communications
1,471-478,pub.1127978306,proceeding,Atlante dei siti fortificati della provincia d...,"[[{'first_name': 'Maurizio', 'last_name': 'Tos...",,2020,,,
2,,pub.1127498852,monograph,Emerging Extended Reality Technologies For Ind...,"[[{'first_name': 'Jolanda G.', 'last_name': 'T...",,2020,,,
3,185-196,pub.1124901249,article,Sparse Low Rank Factorization for Deep Neural ...,"[[{'first_name': 'Sridhar', 'last_name': 'Swam...",398.0,2020,,jour.1128607,Neurocomputing
4,585-604,pub.1120871378,article,WebKey: a graph-based method for event detecti...,"[[{'first_name': 'Elham', 'last_name': 'Rasoul...",54.0,2020,3.0,jour.1327483,Journal of Intelligent Information Systems


The distance referred to here is the number of term movements needed to match the specified phrase.  
In the `formal model` example above, if `formal` and `model` were 10 spaces apart in a
field, but `model` appeared before `formal`, more than 10 term movements
would be required to bring the terms together and position `model` to
the right of `formal` with a space in between.
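A proximity term is just an escaped phrase followed by `~` and the distance, so it can be generated from Python. The `proximity_term` helper below is a hypothetical name used for this sketch; it assumes the phrase itself contains no characters that need escaping.

```python
def proximity_term(phrase, distance):
    """Build an escaped phrase followed by ~distance (hypothetical helper)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # The nested quotes are escaped with a backslash, as the DSL requires.
    return f'\\"{phrase}\\"~{distance}'


term = proximity_term("formal model", 10)
query = f'search publications for "{term}" return publications limit 5'
print(query)
```

Additional operators, such as `+ontology` in the second example, can be appended to the term before it is wrapped in the outer quotes.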

## 3. Field Searching

Field searching lets you use a specific `field` of a `source` as a
query filter. For example, this can be a
[literal](supported-types.ipynb) field such as the *type* of a
publication, its *date*, *mesh terms*, etc., or it can be an
[entity](data-entities.ipynb) field, such as the *journal title* of a
publication, the *country name* of its author affiliations, etc.

**What are the fields available for each source?** See the [data sources](https://docs.dimensions.ai/dsl/data-sources.html) section of the documentation. 

Alternatively, we can use the 'schema' API ([describe](https://docs.dimensions.ai/dsl/data-sources.html#metadata-api)) to return this information programmatically: 

In [21]:
%dsldocs publications  

Unnamed: 0,sources,field,type,description,is_filter,is_entity,is_facet
0,publications,altmetric,float,Altmetric attention score.,True,False,False
1,publications,altmetric_id,integer,AltMetric Publication ID,True,False,False
2,publications,authors,json,Ordered list of authors names and their affili...,True,False,False
3,publications,book_doi,string,The DOI of the book a chapter belongs to (note...,True,False,False
4,publications,book_series_title,string,"The title of the book series book, belong to.",False,False,False
5,publications,book_title,string,The title of the book a chapter belongs to (no...,False,False,False
6,publications,category_bra,categories,`Broad Research Areas <https://app.dimensions....,True,True,True
7,publications,category_for,categories,`ANZSRC Fields of Research classification <htt...,True,True,True
8,publications,category_hra,categories,`Health Research Areas <https://app.dimensions...,True,True,True
9,publications,category_hrcs_hc,categories,`HRCS - Health Categories <https://app.dimensi...,True,True,True


### 3.1 `where`

This optional phrase consists of the keyword `where` followed by a
`filters` phrase consisting of DSL filter expressions, as described
below.

In [22]:
%dsldf search publications where type = "book" return publications limit 5

Returned Publications: 5 (total = 289608)


Unnamed: 0,id,type,title,year
0,pub.1125300609,book,Duoethnography in English Language Teaching,2020
1,pub.1108455576,book,The Indo-Aryans of Ancient South Asia,2020
2,pub.1031251220,book,Scholia in Aeschinem,2020
3,pub.1124703342,book,Learning to Read Talmud,2020
4,pub.1125300607,book,Sociolinguistic Perspectives on Migration Control,2020


If a `for` phrase is also used in a filtered query, the
system will first apply the filters, and then search the resulting
restricted set of documents for the `search term`.

In [23]:
%dsldf search publications for "malaria" where type = "book" return publications limit 5

Returned Publications: 5 (total = 12374)


Unnamed: 0,type,year,id,title
0,book,2020,pub.1127956583,Food Microbiology and Biotechnology
1,book,2020,pub.1127885675,Armed Conflict Survey 2020
2,book,2020,pub.1127764124,"Textiles, Identity and Innovation: In Touch"
3,book,2020,pub.1127540316,Phagocytes and Cellular Immunity
4,book,2020,pub.1127312535,Pharmaceutical Drug Product Development and Pr...


### 3.2 `in`

For convenience, the DSL also supports shorthand notation for filters
where a particular field should be restricted to a specified range or
list of values (although the same logic may be expressed using complex
filters as shown below).

Syntax: a **range filter** consists of the `field` name, the keyword `in`, and a
range of values enclosed in square brackets (`[]`), where the range
consists of a `low` value, colon `:`, and a `high` value.

In [24]:
%%dsldf 
search grants 
    for "malaria" 
    where start_year in [ 2010 : 2015 ] 
return grants limit 5

Returned Grants: 5 (total = 3046)


Unnamed: 0,language,title_language,active_year,project_num,start_year,funding_org_name,id,title,start_date,original_title,funders,end_date
0,en,en,"[2015, 2016, 2017]",R21AI120981,2015,National Institute of Allergy and Infectious D...,grant.4729738,Bloodborne tropical pathogen detection using m...,2015-12-28,Bloodborne tropical pathogen detection using m...,"[{'id': 'grid.419681.3', 'acronym': 'NIAID', '...",2017-11-30
1,en,en,"[2015, 2016, 2017, 2018, 2019]",R21AI120973,2015,National Institute of Allergy and Infectious D...,grant.4729736,Field-deployable Assay for Differential Diagno...,2015-12-24,Field-deployable Assay for Differential Diagno...,"[{'id': 'grid.419681.3', 'acronym': 'NIAID', '...",2019-02-28
2,en,en,"[2015, 2016, 2017, 2018]",R21AI109439,2015,National Institute of Allergy and Infectious D...,grant.4729699,T cell driven antigen discovery for vaccine ca...,2015-12-21,T cell driven antigen discovery for vaccine ca...,"[{'id': 'grid.419681.3', 'acronym': 'NIAID', '...",2018-11-30
3,en,en,"[2015, 2016, 2017, 2018]",91488,2015,Volkswagen Foundation,grant.4854433,Senior Fellowship for Dr. Eduardo Samo Gudo: E...,2015-12-18,Senior Fellowship for Dr. Eduardo Samo Gudo: E...,"[{'id': 'grid.452969.5', 'acronym': 'Volkswage...",2018-12-18
4,en,en,"[2015, 2016, 2017, 2018, 2019]",MIS-311250,2015,National Institute of Food and Agriculture,grant.8821176,"Biology, Ecology & Management of Emerging Dise...",2015-12-10,"Biology, Ecology & Management of Emerging Dise...","[{'id': 'grid.482914.2', 'acronym': 'NIFA', 's...",2019-09-30


Syntax: a **list filter** consists of the `field` name, the keyword `in`, and a list
of one or more `value`s enclosed in square brackets (`[]`), where
values are separated by commas (`,`):

In [25]:
%%dsldf 
search grants 
    for "malaria" 
    where research_org_name in [ "UC Berkeley", "UC Davis", "UCLA"  ] 
return grants limit 5

Returned Grants: 0
Field 'research_org_name' is deprecated in favor of research_orgs. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
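
Both `in` shorthands can also be assembled programmatically, which is handy when the values come from a Python list. The sketch below is a minimal illustration and not part of Dimcli: the helper names `range_filter`, `list_filter` and `range_as_comparisons` are hypothetical, the field name in the list example is only a placeholder, and the expansion into `>=`/`<=` assumes that both range bounds are inclusive (the range output above does include grants starting in 2015, the upper bound).

```python
import json

def range_filter(field, low, high):
    """Build a DSL range filter such as: start_year in [ 2010 : 2015 ]."""
    return f"{field} in [ {low} : {high} ]"

def list_filter(field, values):
    """Build a DSL list filter; strings are double-quoted, numbers left bare."""
    # json.dumps adds the double quotes and backslash-escapes nested quotes.
    items = [json.dumps(v, ensure_ascii=False) if isinstance(v, str) else str(v)
             for v in values]
    return f"{field} in [ {', '.join(items)} ]"

def range_as_comparisons(field, low, high):
    """Same logic written as a complex filter (assumes inclusive bounds)."""
    return f"{field} >= {low} and {field} <= {high}"

print(range_filter("start_year", 2010, 2015))
# The field name below is a placeholder; check the schema for the current name.
print(list_filter("research_orgs.name", ["UC Berkeley", "UC Davis", "UCLA"]))
print(range_as_comparisons("start_year", 2010, 2015))
```

The resulting strings can be dropped into the `where` part of a query in place of a hand-written filter.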


### 3.3 `count` - filter function

The filter function `count` is supported on some fields in
[publications](publications.ipynb) (e.g. `researchers` and
`research_orgs`).

The example below shows how to use this filter:

In [26]:
%%dsldf 
search publications 
    for "malaria" 
    where count(research_orgs) > 5 
return research_orgs limit 5

Returned Research_orgs: 5


Unnamed: 0,id,count,country_name,name,longitude,state_name,city_name,latitude,linkout,types,acronym
0,grid.4991.5,1477,United Kingdom,University of Oxford,-1.25401,Oxfordshire,Oxford,51.753437,[http://www.ox.ac.uk/],[Education],
1,grid.8991.9,1396,United Kingdom,London School of Hygiene & Tropical Medicine,-0.1307,Camden,London,51.5209,[http://www.lshtm.ac.uk/],[Education],LSHTM
2,grid.38142.3c,1015,United States,Harvard University,-71.11665,Massachusetts,Cambridge,42.377052,[http://www.harvard.edu/],[Education],
3,grid.21107.35,814,United States,Johns Hopkins University,-76.62028,Maryland,Baltimore,39.328888,[https://www.jhu.edu/],[Education],JHU
4,grid.7445.2,730,United Kingdom,Imperial College London,-0.175478,Westminster,London,51.4986,[http://www.imperial.ac.uk/],[Education],


Publications with more than 50 researchers:

In [27]:
%%dsldf 
search publications 
    for "malaria" 
    where count(researchers) > 50 
return publications limit 5

Returned Publications: 5 (total = 190)


Unnamed: 0,id,type,title,author_affiliations,year,journal.id,journal.title,pages,volume,issue
0,pub.1127418736,article,Mapping geographical inequalities in childhood...,"[[{'first_name': 'Robert C', 'last_name': 'Rei...",2020,jour.1077219,The Lancet,,,
1,pub.1127157285,article,Frequency and management of maternal infection...,"[[{'first_name': 'Mercedes', 'last_name': 'Bon...",2020,jour.1048786,The Lancet Global Health,e661-e671,8.0,5.0
2,pub.1126151286,article,Genetic tool development in marine protists: e...,"[[{'first_name': 'Drahomíra', 'last_name': 'Fa...",2020,jour.1033763,Nature Methods,481-494,17.0,5.0
3,pub.1127247220,article,A SARS-CoV-2 protein interaction map reveals t...,"[[{'first_name': 'David E.', 'last_name': 'Gor...",2020,jour.1018957,Nature,1-13,,
4,pub.1125560167,article,Triple artemisinin-based combination therapies...,"[[{'first_name': 'Rob W', 'last_name': 'van de...",2020,jour.1077219,The Lancet,1345-1360,395.0,10233.0


Funders of publications with more than one researcher (the `count` column gives the number of such publications per funder):

In [28]:
%%dsldf 
search publications
where count(researchers) > 1
return funders limit 5

Returned Funders: 5


Unnamed: 0,id,count,types,city_name,longitude,name,country_name,linkout,acronym,latitude,state_name
0,grid.419696.5,1758857,[Government],Beijing,116.33983,National Natural Science Foundation of China,China,[http://www.nsfc.gov.cn/publish/portal1/],NSFC,40.005177,
1,grid.270680.b,645606,[Government],Brussels,4.36367,European Commission,Belgium,[http://ec.europa.eu/index_en.htm],EC,50.85165,
2,grid.424020.0,565529,[Government],Beijing,116.316284,Ministry of Science and Technology of the Peop...,China,[http://www.most.gov.cn/eng/],MOST,39.827835,
3,grid.48336.3a,554556,[Government],Rockville,-77.10119,National Cancer Institute,United States,[http://www.cancer.gov/],NCI,39.004326,Maryland
4,grid.54432.34,525799,[Nonprofit],Tokyo,139.74039,Japan Society for the Promotion of Science,Japan,[http://www.jsps.go.jp/],JSPS,35.68716,


International collaborations: number of publications, per funder, with more than one author and affiliations located in more than one country.

In [29]:
%%dsldf 
search publications
where count(researchers) > 1
and count(research_org_countries) > 1
return funders limit 5

Returned Funders: 5


Unnamed: 0,id,count,types,city_name,longitude,name,country_name,linkout,acronym,latitude
0,grid.419696.5,433678,[Government],Beijing,116.33983,National Natural Science Foundation of China,China,[http://www.nsfc.gov.cn/publish/portal1/],NSFC,40.005177
1,grid.270680.b,331110,[Government],Brussels,4.36367,European Commission,Belgium,[http://ec.europa.eu/index_en.htm],EC,50.85165
2,grid.424150.6,150024,[Facility],Bonn,7.147797,German Research Foundation,Germany,[http://www.dfg.de/en/],DFG,50.69934
3,grid.424020.0,143572,[Government],Beijing,116.316284,Ministry of Science and Technology of the Peop...,China,[http://www.most.gov.cn/eng/],MOST,39.827835
4,grid.54432.34,132520,[Nonprofit],Tokyo,139.74039,Japan Society for the Promotion of Science,Japan,[http://www.jsps.go.jp/],JSPS,35.68716


Domestic collaborations: number of publications, per funder, with more than one author and all affiliations located in exactly one country.

In [30]:
%%dsldf 
search publications
where count(researchers) > 1
and count(research_org_countries) = 1
return funders limit 5

Returned Funders: 5


Unnamed: 0,id,count,types,city_name,longitude,name,country_name,linkout,acronym,latitude,state_name
0,grid.419696.5,1285232,[Government],Beijing,116.33983,National Natural Science Foundation of China,China,[http://www.nsfc.gov.cn/publish/portal1/],NSFC,40.005177,
1,grid.424020.0,411382,[Government],Beijing,116.316284,Ministry of Science and Technology of the Peop...,China,[http://www.most.gov.cn/eng/],MOST,39.827835,
2,grid.48336.3a,406370,[Government],Rockville,-77.10119,National Cancer Institute,United States,[http://www.cancer.gov/],NCI,39.004326,Maryland
3,grid.54432.34,361012,[Nonprofit],Tokyo,139.74039,Japan Society for the Promotion of Science,Japan,[http://www.jsps.go.jp/],JSPS,35.68716,
4,grid.280785.0,314257,[Facility],Bethesda,-77.09938,National Institute of General Medical Sciences,United States,[http://www.nigms.nih.gov/Pages/default.aspx],NIGMS,38.997833,Maryland
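
The `count` columns of the three funder queries above can be combined into a rough collaboration profile for a single funder. The arithmetic below is a sketch using the NSFC figures copied from those outputs; the variable names are illustrative, and the "unclassified" remainder is an inference (publications matched by the first query but by neither country filter, presumably because no country information is available for them).

```python
# Publication counts for NSFC (grid.419696.5), copied from the outputs above.
multi_author = 1758857    # count(researchers) > 1
international = 433678    # ... and count(research_org_countries) > 1
domestic = 1285232        # ... and count(research_org_countries) = 1

international_share = international / multi_author
domestic_share = domestic / multi_author
# Publications counted by the first query but by neither country filter.
unclassified = multi_author - international - domestic

print(f"International: {international_share:.1%}")
print(f"Domestic:      {domestic_share:.1%}")
print(f"Unclassified:  {unclassified} publications")
```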


### 3.4 Filter Operators

A simple filter expression consists of a `field` name, an in-/equality
operator `op`, and the desired field `value`. 

The `value` must be a
`string` enclosed in double quotes (`"`) or an integer (e.g. `1234`).

The available operators are:

| `op`           | meaning                                                                                  |
|----------------|------------------------------------------------------------------------------------------|
| `=`            | *is* (or *contains* if the given `field` is multi-value)                                 |
| `!=`           | *is not*                                                                                 |
| `>`            | *is greater than*                                                                        |
| `<`            | *is less than*                                                                           |
| `>=`           | *is greater than or equal to*                                                            |
| `<=`           | *is less than or equal to*                                                               |
| `~`            | *partially matches* (see partial-string-matching below) |
| `is empty`     | *is empty* (see emptiness-filters below)                      |
| `is not empty` | *is not empty* (see emptiness-filters below)                  |

A couple of examples:

In [31]:
%dsldf search datasets where year > 2010 and year < 2012 return datasets limit 5

Returned Datasets: 5 (total = 38341)


Unnamed: 0,authors,keywords,id,title,year,journal.id,journal.title
0,"[{'name': 'Minna Väliranta', 'orcid': ''}, {'n...",[PANGAEA],10993892,(Table 1) Radiocarbon ages of samples taken fr...,2011,jour.1020344,Journal of Biogeography
1,"[{'name': 'Charles-Edouard Thuróczy', 'orcid':...",[PANGAEA],10993247,Average fluorescence and dissolved iron and Fe...,2011,jour.1023157,Deep Sea Research Part II Topical Studies in O...
2,"[{'name': 'Charles-Edouard Thuróczy', 'orcid':...",[PANGAEA],10993244,(Table 1) Average fluorescence in the surface ...,2011,,
3,"[{'name': 'Charles-Edouard Thuróczy', 'orcid':...",[PANGAEA],10993241,Dissolved and dissolvable iron concentrations ...,2011,jour.1312079,Journal of Geophysical Research
4,"[{'name': 'Jean-François Therrien', 'orcid': '...",[PANGAEA],10993193,(Table 1) Movement parameters of nine adult fe...,2011,jour.1023041,Journal of Avian Biology


In [32]:
%dsldf search patents where assignees != "grid.410484.d" return patents limit 5

Returned Patents: 5 (total = 39704493)


Unnamed: 0,publication_date,inventor_names,granted_year,filing_status,assignees,assignee_names,id,title,times_cited,year
0,2009-12-09,"[TUMBACK, STEFAN, SCHNELLE, KLAUS-PETER]",2009.0,Grant,"[{'id': 'grid.6584.f', 'city_name': 'Stuttgart...","[Robert Bosch GmbH, BOSCH GMBH ROBERT]",EP-1409282-B1,METHODS FOR OPERATING A MOTOR VEHICLE DRIVEN B...,0,2001
1,2009-12-10,"[SHKEDI, ROY]",,Application,,[SHKEDI ROY],WO-2009149128-A2,TARGETED TELEVISION ADVERTISEMENTS ASSOCIATED ...,1,2009
2,2009-12-09,"[RIVIELLO, JOHN, M., REY, MARIA, A.]",2009.0,Grant,"[{'id': 'grid.418190.5', 'acronym': 'Life Tech...","[Dionex Corp, DIONEX CORP]",EP-0868664-B1,MULTI-CYCLE LOOP INJECTION FOR TRACE ANALYSIS ...,0,1996
3,2009-12-09,"[TANAKA, EIJI, HIGASHI, TAMIO, KITAMURA, TAKAN...",2009.0,Grant,"[{'id': 'grid.471210.1', 'city_name': 'Tokyo',...","[Kuraray Co Ltd, KURARAY CO]",EP-0861808-B1,Waste water treatment apparatus,1,1998
4,2009-12-09,"[NAKAI, MICHIHIRO, SHIMA, KENSUKE, HIDAKA, HIR...",2009.0,Grant,"[{'id': 'grid.471143.4', 'city_name': 'Tokyo',...","[Fujikura Ltd, FUJIKURA LTD]",EP-0805365-B1,Optical waveguide grating and production metho...,0,1997
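
The rules above (a fixed set of operators, and values that are either double-quoted strings or integers) lend themselves to a small validating helper. The following is a sketch under those stated rules only; `simple_filter` and the two operator sets are hypothetical names, not part of Dimcli or the DSL.

```python
import json

COMPARISON_OPS = {"=", "!=", ">", "<", ">=", "<=", "~"}
EMPTINESS_OPS = {"is empty", "is not empty"}

def simple_filter(field, op, value=None):
    """Render one DSL filter expression from a field, an operator and a value."""
    if op in EMPTINESS_OPS:
        return f"{field} {op}"  # emptiness filters take no value
    if op not in COMPARISON_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    if isinstance(value, str):
        rendered = json.dumps(value, ensure_ascii=False)  # adds double quotes
    elif isinstance(value, int) and not isinstance(value, bool):
        if op == "~":
            raise TypeError("the ~ operator requires a string value")
        rendered = str(value)
    else:
        raise TypeError("value must be a string or an integer")
    return f"{field} {op} {rendered}"

# Rebuild the two example queries shown above.
filters = [simple_filter("year", ">", 2010), simple_filter("year", "<", 2012)]
print("search datasets where " + " and ".join(filters) + " return datasets limit 5")
print("search patents where " + simple_filter("assignees", "!=", "grid.410484.d")
      + " return patents limit 5")
```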


### 3.5 Partial string matching with `~`

The `~` operator indicates that the given `field` need only partially,
instead of exactly, match the given `string` (the `value` used with this
operator must be a `string`, not an integer).

For example, the filter `where research_orgs.name~"Saarland Uni"` would
match both the organization named "Saarland University" and the one
named "Universitätsklinikum des Saarlandes", and any other organization
whose name includes the terms "Saarland" and "Uni" (the order is
unimportant). 

In [33]:
%%dsldf 
search patents 
    where assignee_names ~ "IBM" 
return assignees limit 5

Returned Assignees: 5


Unnamed: 0,id,count,city_name,name,country_name
0,grid.410484.d,329418,Armonk,IBM (United States),United States
1,grid.471366.1,22089,George Town,GlobalFoundries (Cayman Islands),Cayman Islands
2,grid.14648.3f,5071,Winchester,IBM (United Kingdom),United Kingdom
3,grid.420451.6,3555,Mountain View,Google,United States
4,grid.472772.3,2717,Beijing,Lenovo (China),China
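
The matching rule described above (every term must appear, in any order) can be approximated locally for intuition. This is a rough model only: `partial_match` is a hypothetical helper, and it assumes, as the two Saarland examples suggest, that a term may match the beginning of a longer word. The real `~` operator runs against the Dimensions search index, whose tokenization and normalization may differ.

```python
import re

def partial_match(name, query):
    """True if every query term is a case-insensitive prefix of some word in name."""
    words = [w.lower() for w in re.findall(r"\w+", name)]
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    return all(any(w.startswith(t) for w in words) for t in terms)

for org in ["Saarland University",
            "Universitätsklinikum des Saarlandes",
            "University of Oxford"]:
    print(org, "->", partial_match(org, "Saarland Uni"))
```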


### 3.6 Emptiness filters `is empty`

To select records in which a specific field is present, or records in
which a field is empty, use filters such as
`where research_orgs is not empty` or `where issn is empty`.

In [34]:
%%dsldf
search publications 
    for "iron graphene" 
    where researchers is empty 
    and research_orgs is not empty 
return publications[id+title+researchers+research_orgs+type] limit 5

Returned Publications: 5 (total = 2066)


Unnamed: 0,type,research_orgs,id,title
0,article,"[{'id': 'grid.411507.6', 'country_name': 'Indi...",pub.1127980991,Sensitive determination of kojic acid in tomat...
1,article,"[{'id': 'grid.411507.6', 'country_name': 'Indi...",pub.1127901191,Copper oxide immobilized clay nano architectur...
2,article,"[{'id': 'grid.33764.35', 'country_name': 'Chin...",pub.1125095130,Molecular Dynamics Simulations of Melting Iron...
3,article,"[{'id': 'grid.411510.0', 'country_name': 'Chin...",pub.1124438091,Sulfur-Doped Alkylated Graphene Oxide as High-...
4,article,"[{'id': 'grid.410726.6', 'country_name': 'Chin...",pub.1127875464,Application of Raman spectroscopy to probe fun...


## 4. Searching for Researchers

The DSL offers different mechanisms for searching for researchers (e.g.
publication authors, grant investigators), each of them presenting
specific advantages.

### 4.1 Exact name searches

Special full-text indices allow you to look up a researcher's first name and
surname **exactly as they appear in the source documents** they derive from.

This approach has a broad scope, as it searches the full
collection of Dimensions documents irrespective of whether a
researcher was successfully disambiguated (and hence given a Dimensions
ID). On the other hand, this approach only matches names as they
appear in the source document, so different spellings or initials are
not necessarily returned by a single query. 

```
search in [authors|investigators|inventors]
```

It is possible to look up publication authors using a specific
`search index` called `authors`. 

This method expects case-insensitive
phrases, in the format `"<first name> <last name>"` or in reverse order. Note
that nested quotes inside a double-quoted string must always be
escaped by a backslash `\`.

In [35]:
%dsldf search publications in authors for "\"Charles Peirce\"" return publications limit 5

Returned Publications: 5 (total = 229)


Unnamed: 0,title,author_affiliations,id,year,type,pages
0,26. Assurance through Reasoning,"[[{'first_name': 'Charles S.', 'last_name': 'P...",pub.1123488542,2019,chapter,565-585
1,Abbreviations of Peirce’s Works and Archives,"[[{'first_name': 'Charles S.', 'last_name': 'P...",pub.1123488550,2019,chapter,x-xii
2,5. On Logical Graphs,"[[{'first_name': 'Charles S.', 'last_name': 'P...",pub.1123488521,2019,chapter,211-261
3,12. Peripatetic Talks,"[[{'first_name': 'Charles S.', 'last_name': 'P...",pub.1123488528,2019,chapter,348-366
4,14. On the First Principles of Logical Algebra,"[[{'first_name': 'Charles S.', 'last_name': 'P...",pub.1123488530,2019,chapter,385-398
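
When the query is assembled in Python rather than typed into a magic cell, the nested quotes need the same backslash escaping. The sketch below rebuilds the query string shown above; `exact_author_query` is a hypothetical helper, and the commented-out call assumes the `dsl` object created in the Prerequisites section.

```python
def exact_author_query(name, limit=5):
    """Build an exact-name author search with backslash-escaped inner quotes."""
    if '"' in name or "\\" in name:
        raise ValueError("this sketch does not handle quotes or backslashes in names")
    # The phrase is wrapped in escaped quotes inside the outer double quotes.
    return (f'search publications in authors for "\\"{name}\\"" '
            f'return publications limit {limit}')

q = exact_author_query("Charles Peirce")
print(q)
# With an authenticated session one could then run, for example:
# result = dsl.query(q)
```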


Initials can be used instead of the first name. These are examples of
valid researcher search phrases:

-   `\"Peirce, Charles S.\"`
-   `\"Charles S. Peirce\"`
-   `\"CS Peirce\"`
-   `\"Peirce CS\"`
-   `\"C S Peirce\"`
-   `\"Peirce C S\"`
-   `\"C Peirce\"`
-   `\"Peirce C\"`
-   `\"Charles Peirce\"`
-   `\"Peirce Charles\"`

**Warning**: To produce valid results, an author or investigator search
query must contain **at least two components** (e.g., name and
surname, either in full or as initials).
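
Because exact-name search only matches the forms that appear in source documents, it can help to generate several spellings and query each in turn. The helper below is a sketch that reproduces the ten forms listed above from a first name, a surname and an optional middle initial; `name_variants` is a hypothetical function, not part of Dimcli.

```python
def name_variants(first, last, middle_initial=""):
    """Return spellings of a name in the forms listed in this section."""
    if not first or not last:
        # The search needs at least two components (e.g. name and surname).
        raise ValueError("both a first name (or initial) and a surname are required")
    f = first[0]
    variants = [f"{first} {last}", f"{last} {first}", f"{f} {last}", f"{last} {f}"]
    if middle_initial:
        m = middle_initial[0]
        variants += [
            f"{last}, {first} {m}.",
            f"{first} {m}. {last}",
            f"{f}{m} {last}",
            f"{last} {f}{m}",
            f"{f} {m} {last}",
            f"{last} {f} {m}",
        ]
    return variants

for variant in name_variants("Charles", "Peirce", "S"):
    print(variant)
```

Each variant still has to be wrapped in escaped quotes before it is placed in a `for` phrase.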

Investigator search works like *authors* search, except that it searches `grants` and
`clinical_trials` using a separate search index, `investigators`, and
`patents` using the index `inventors`.

In [36]:
%%dsldf 
search clinical_trials in investigators for "\"John Smith\"" 
return clinical_trials limit 5

Returned Clinical_trials: 2 (total = 2)


Unnamed: 0,active_years,id,investigator_details,title
0,"[2008, 2009, 2010, 2011, 2012, 2013, 2014, 201...",NCT00689533,"[[John M Flynn, MD, Principal Investigator, Ch...",VEPTR Implantation to Treat Children With Earl...
1,,NCT01241149,"[[Ellie Mentler, MD, Principal Investigator, U...",Prospective Evaluation of Symptom Resolution i...


In [37]:
%%dsldf 
search grants in investigators for "\"Satoko Shimazaki\"" 
return grants limit 5

Returned Grants: 4 (total = 4)


Unnamed: 0,start_date,title,end_date,title_language,project_num,id,funders,original_title,funding_org_name,start_year,language,active_year
0,2020-09-01,"Kabuki Actors, Print Technology, and the Theat...",2021-08-31,en,FEL-263245-19,grant.7925589,"[{'id': 'grid.422239.c', 'types': ['Government...","Kabuki Actors, Print Technology, and the Theat...",National Endowment for the Humanities,2020,en,"[2020, 2021]"
1,2018-04-01,Genealogy research on female saints in the Pal...,2021-03-31,ja,18K00431,grant.7527261,"[{'id': 'grid.54432.34', 'types': ['Nonprofit'...",古・中英語期における女性聖人伝の系譜研究：Aelfricのテクストと言語を中心に,Japan Society for the Promotion of Science,2018,ja,"[2018, 2019, 2020, 2021]"
2,2015-04-01,Images of Women in the Old English Lives of Sa...,2018-03-31,en,15K02313,grant.5858713,"[{'id': 'grid.54432.34', 'types': ['Nonprofit'...",Images of Women in the Old English Lives of Sa...,Japan Society for the Promotion of Science,2015,en,"[2015, 2016, 2017, 2018]"
3,2012-04-01,Reception and Transfromation of the Images of ...,2015-03-31,en,24520310,grant.6086985,"[{'id': 'grid.54432.34', 'types': ['Nonprofit'...",Reception and Transfromation of the Images of ...,Japan Society for the Promotion of Science,2012,en,"[2012, 2013, 2014, 2015]"


In [38]:
%%dsldf 
search patents in inventors for "\"John Smith\"" 
return patents limit 5

Returned Patents: 5 (total = 501)


Unnamed: 0,title,times_cited,filing_status,publication_date,id,year,assignee_names,assignees,inventor_names,granted_year
0,Diagnostic method,0,Application,2002-10-31,US-20020160362-A1,2001,"[AstraZeneca AB, SMITH JOHN CRAIG]","[{'id': 'grid.418151.8', 'city_name': 'Södertä...",[John Smith],
1,Automotive heat exchanger,0,Grant,2006-03-22,GB-2384299-B,2002,"[Llanelli Radiators Ltd, Calsonic Kansei UK Lt...","[{'id': 'grid.472810.8', 'city_name': 'Llanell...",[SMITH JOHN],2006.0
2,Microelectronic assemblies with composite cond...,2,Application,2005-06-23,US-20050133900-A1,2005,"[Tessera Inc, TESSERA INC]","[{'id': 'grid.455499.0', 'city_name': 'San Jos...",[John Smith],
3,A lockable safety insert for an electrical dom...,0,Grant,2004-11-03,IE-S20030195-A2,2003,[SMITH JOHN],,[SMITH JOHN],2004.0
4,Ammunition cartridge,0,Application,2014-10-22,GB-2513101-A,2013,"[Eley Ltd, ELEY LTD]",,[SMITH JOHN],


### 4.2 Fuzzy Searches

This type of search is similar to *full-text
search*, with the difference that it
allows searching by only a part of a name, e.g. only the 'last name' of
a person, by using the `where` clause. 

**Note** At this moment, this type of search is only available for
`publications`. Other sources will add this option in the future.

For example:

In [39]:
%%dsldf 
search publications where authors = "Hawking" 
return publications[id+doi+title+authors] limit 5


Generally speaking, using a `where` clause to search authors is less
precise than using the relevant exact-search syntax. 

On the other hand, using a
`where` clause can be handy if one wants to **combine an author search
with another full-text search index**.

For example:

In [40]:
%%dsldf 
search publications 
    in title_abstract_only for "dna replication" 
    where authors = "smith"  
return publications limit 5

Returned Publications: 5 (total = 1527)


Unnamed: 0,pages,id,type,title,author_affiliations,volume,year,issue,journal.id,journal.title
0,37,pub.1124910780,article,Genetic associations with clozapine-induced my...,"[[{'first_name': 'Paul', 'last_name': 'Lacaze'...",10,2020,1.0,jour.1045271,Translational Psychiatry
1,46,pub.1125664041,article,An epigenome-wide association study of posttra...,"[[{'first_name': 'Mark W.', 'last_name': 'Logu...",12,2020,1.0,jour.1042271,Clinical Epigenetics
2,11,pub.1124060243,article,Longitudinal epigenome-wide association studie...,"[[{'first_name': 'Clara', 'last_name': 'Snijde...",12,2020,1.0,jour.1042271,Clinical Epigenetics
3,250-256,pub.1126387158,article,Molecular Targeting of Cancer-Associated PCNA ...,"[[{'first_name': 'Shanna J.', 'last_name': 'Sm...",17,2020,,jour.1052368,Molecular Therapy - Oncolytics
4,rna.073114.119,pub.1125466205,article,Reciprocal monoallelic expression of ASAR lncR...,"[[{'first_name': 'Michael', 'last_name': 'Hesk...",26,2020,6.0,jour.1114285,RNA


### 4.3 Using the disambiguated Researchers database

The Dimensions [Researchers](https://docs.dimensions.ai/dsl/datasource-researchers.html) source is a database of
researcher information algorithmically extracted and disambiguated from
all of the other content sources (publications, grants, clinical trials,
etc.).

By using the `researchers` source it is possible to match an
'aggregated' person object linking together multiple publication
authors, grant investigators, etc., irrespective of the form their
names take in the original source documents.

However, this database does not contain all of the author and investigator information
available in Dimensions. 

Think, for example, of authors of older publications,
authors with very common names that are difficult to disambiguate, or
very new authors who have only one or a few publications. In such cases,
using the full-text authors search might be more
appropriate.

Examples:

In [41]:
%%dsldf 
search researchers for "\"Satoko Shimazaki\"" 
return researchers[basics+obsolete] 

Returned Researchers: 4 (total = 4)


Unnamed: 0,id,obsolete,first_name,last_name,research_orgs
0,ur.014307627665.09,0,Satoko,Shimazaki,"[{'id': 'grid.19006.3e', 'types': ['Education'..."
1,ur.010537333602.30,1,Satoko,Shimazaki,
2,ur.07751146721.59,0,Satoko,Shimazaki,
3,ur.015527473602.63,0,Satoko,Shimazaki,"[{'id': 'grid.266190.a', 'types': ['Education'..."


NOTE: pay attention to the `obsolete` field, which indicates the researcher ID status. 0 means that the researcher ID is still **active**, 1 means that the researcher ID is **no longer valid**. This is due to the ongoing process of refinement of Dimensions researchers. 

Hence the query above is best written like this:

In [42]:
%%dsldf 
search researchers where obsolete=0 for "\"Satoko Shimazaki\"" 
return researchers[basics+obsolete] 

Returned Researchers: 3 (total = 3)


Unnamed: 0,id,research_orgs,first_name,last_name,obsolete
0,ur.014307627665.09,"[{'id': 'grid.19006.3e', 'acronym': 'UCLA', 's...",Satoko,Shimazaki,0
1,ur.07751146721.59,,Satoko,Shimazaki,0
2,ur.015527473602.63,"[{'id': 'grid.266190.a', 'acronym': 'UCB', 'st...",Satoko,Shimazaki,0
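
If a result set was retrieved without the `obsolete=0` filter, the same clean-up can be applied afterwards in Python. The sketch below uses the four records from the unfiltered query above, reduced to plain dictionaries; the names `records` and `active_ids` are illustrative.

```python
# Records copied from the unfiltered output above (id and obsolete flag only).
records = [
    {"id": "ur.014307627665.09", "obsolete": 0},
    {"id": "ur.010537333602.30", "obsolete": 1},
    {"id": "ur.07751146721.59", "obsolete": 0},
    {"id": "ur.015527473602.63", "obsolete": 0},
]

# Keep only researcher IDs that are still active (obsolete == 0).
active_ids = [r["id"] for r in records if r["obsolete"] == 0]
print(active_ids)
```

Filtering in the query itself remains preferable, since it avoids transferring records that will be discarded.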


With `Researchers`, one can use other fields as well:

In [43]:
%%dsldf 
search researchers 
    where obsolete=0 and last_name="Shimazaki" 
return researchers[basics] limit 5

Returned Researchers: 5 (total = 468)


Unnamed: 0,id,research_orgs,first_name,last_name
0,ur.013510032403.65,"[{'id': 'grid.419075.e', 'acronym': 'ARC', 'st...",Tatsuo,Shimazaki
1,ur.010700310627.87,"[{'id': 'grid.471199.3', 'city_name': 'Kyoto',...",Tomomi,Shimazaki
2,ur.011035131473.19,"[{'id': 'grid.415776.6', 'acronym': 'NIPH', 'c...",Dai,Shimazaki
3,ur.016627632300.80,,Koji,Shimazaki
4,ur.013205240215.48,"[{'id': 'grid.420062.2', 'city_name': 'Tokyo',...",Toshiyuki,Shimazaki


## 5. Returning results

After the `search` phrase, a query must contain one or more `return`
phrases, specifying the content and format of the information that
should be returned.



### 5.1 Returning Multiple Sources

Multiple results cannot be returned in a single `return` phrase. To return several results from one query, use a separate `return` phrase for each:

In [44]:
%%dsldf 
search publications 
return funders limit 5 
return research_orgs limit 5 
return year

Returned Year: 20
Returned Funders: 5
Returned Research_orgs: 5


Unnamed: 0,id,count
0,2019,5488513
1,2018,5118512
2,2017,4789054
3,2016,4403893
4,2015,4219869
5,2014,4077817
6,2013,3885112
7,2012,3624608
8,2011,3506759
9,2010,3090707



### 5.2 Returning Specific Fields

For control over which information from each given `record` will be
returned, a `source` or `entity` name in the `results` phrase can be
optionally followed by a specification of `fields` and `fieldsets` to be
included in the JSON results for each retrieved record.

The fields specification may be an arbitrary list of `field` names
enclosed in brackets (`[`, `]`), with field names separated by a plus
sign (`+`). A minus sign (`-`) can be used to exclude a `field` or a
`fieldset` from the result. Field names listed within brackets must
be "known" to the DSL, and therefore only a subset of fields may be used
in this syntax (see note below).

In [45]:
%%dsldf 
search grants 
return grants[grant_number + title + language] limit 5

Returned Grants: 5 (total = 5310256)


Unnamed: 0,grant_number,title,language
0,2018-HRSI-1548,APPROACH to Enriching the Real World Evidence ...,en
1,1301720F,Molecular mechanism of DNA double strand break...,en
2,M 2734,Life as concept and as science,en
3,892933,Scintillation Light For New Physics with Liqui...,en
4,893021,Jet quenching for heavy-ion collisions at the LHC,en


In [46]:
%%dsldf 
search clinical_trials 
return clinical_trials [id+ title + acronym + phase] limit 5

Returned Clinical_trials: 5 (total = 562451)


Unnamed: 0,id,title,phase,acronym
0,NCT00249756,Re-Entry MTC for Offenders With MICA Disorders,,
1,NCT00249782,"A Phase II, Randomized, Partial-Blind, Paralle...",Phase 2,
2,NCT00249795,A Parallel Randomized Controlled Evaluation of...,Phase 3,ACTIVE I
3,NCT00249847,A Feasibility Study of Positron Emission Tomog...,,
4,NCT00249860,A Multicentre Phase III Study of Interferon-be...,Phase 3,
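
A field specification can also be composed from Python lists. The sketch below joins included names with `+` and appends excluded names with `-`, following the description above; `fields_spec` is a hypothetical helper, and the exact placement of excluded names is an assumption based on that description rather than on a tested query.

```python
def fields_spec(include, exclude=()):
    """Build a bracketed DSL field specification, e.g. [id+title+acronym+phase]."""
    if not include:
        raise ValueError("at least one field or fieldset must be included")
    spec = "+".join(include)
    # Excluded fields or fieldsets are appended with a leading minus sign.
    spec += "".join(f"-{name}" for name in exclude)
    return f"[{spec}]"

query = ("search clinical_trials return clinical_trials"
         + fields_spec(["id", "title", "acronym", "phase"]) + " limit 5")
print(query)
print(fields_spec(["basics"], exclude=["title"]))
```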


**Shortcuts: `fieldsets`**

The fields specification may be the name of a pre-defined `fieldset`
(e.g. `extras`, `basics`). These are shortcuts that can be handy when testing out new queries, for example. 

NOTE: In general, when writing code for integrations or long-standing extraction scripts, it is **best to return specific fields rather than a predefined set**. This also has the advantage of making queries faster by avoiding the extraction of unnecessary data.
    

In [47]:
%%dsldf 
search grants 
return grants [basics] limit 5 

Returned Grants: 5 (total = 5310256)
Field 'project_num' is deprecated in favor of grant_number. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'title_language' is deprecated in favor of language_title. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,start_date,title,language,project_num,id,funders,original_title,funding_org_name,start_year,title_language,active_year,end_date
0,2021-11-30,APPROACH to Enriching the Real World Evidence ...,en,2018-HRSI-1548,grant.8690978,"[{'id': 'grid.484521.e', 'types': ['Nonprofit'...",APPROACH to Enriching the Real World Evidence ...,New Brunswick Health Research Foundation,2021,en,[2021],
1,2021-10-01,Molecular mechanism of DNA double strand break...,en,1301720F,grant.8950252,"[{'id': 'grid.424470.1', 'types': ['Nonprofit'...",Mécanismes moléculaires de la formation et la ...,Fund for Scientific Research,2021,en,[2021],
2,2021-10-01,Life as concept and as science,en,M 2734,grant.8715161,"[{'id': 'grid.25111.36', 'types': ['Nonprofit'...",Life as concept and as science,FWF Austrian Science Fund,2021,en,"[2021, 2022, 2023]",2023-09-30
3,2021-09-01,Scintillation Light For New Physics with Liqui...,en,892933,grant.8964235,"[{'id': 'grid.270680.b', 'types': ['Government...",Scintillation Light For New Physics with Liqui...,European Commission,2021,en,"[2021, 2022, 2023]",2023-08-31
4,2021-09-01,Jet quenching for heavy-ion collisions at the LHC,en,893021,grant.8963889,"[{'id': 'grid.270680.b', 'types': ['Government...",Jet quenching for heavy-ion collisions at the LHC,European Commission,2021,en,"[2021, 2022, 2023]",2023-08-31


In [48]:
%%dsldf 
search publications 
return publications [basics+times_cited] limit 5 

Returned Publications: 5 (total = 110113720)
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,type,times_cited,pages,author_affiliations,year,id,title,journal.id,journal.title,issue,volume
0,article,0,1-18,"[[{'first_name': 'Nihal', 'last_name': 'ATA TU...",2020,pub.1125931386,Visual research on the trustability of classic...,jour.1142190,Hacettepe Journal of Mathematics and Statistics,,
1,chapter,0,21-48,"[[{'first_name': 'Nienke', 'last_name': 'Bakke...",2020,pub.1125801740,2. The Sunflowers in Perspective,,,,
2,chapter,0,333-349,,2020,pub.1125632078,Literature,,,,
3,monograph,32,,"[[{'first_name': 'Jochen', 'last_name': 'Taupi...",2020,pub.1096916023,Die Standesordnungen der freien Berufe,,,,
4,article,0,1711335,"[[{'first_name': 'Nathaly', 'last_name': 'Aya ...",2020,pub.1124196727,The gender responsiveness of social marketing ...,jour.1041075,Global Health Action,1.0,13.0


The fields specification may also be `all`, to indicate that all fields
available for the given `source` should be returned.

In [49]:
%%dsldf
search publications 
return publications [all] limit 5 

Returned Publications: 5 (total = 110113720)
Field 'FOR_first' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'RCDC' is deprecated in favor of category_rcdc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_RAC' is deprecated in favor of category_hrcs_rac. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_HC' is deprecated in favor of category_hrcs_hc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html

Unnamed: 0,date_inserted,references,reference_ids,researchers,linkout,id,pages,concepts_scores,concepts,year,...,volume,RCDC,category_rcdc,research_org_country_names,research_org_countries,research_org_names,research_org_cities,category_hra,issue,research_orgs
0,2020-03-28,"[pub.1107763504, pub.1061471419, pub.100981774...","[pub.1107763504, pub.1061471419, pub.100981774...","[{'id': 'ur.015425340575.47', 'first_name': 'N...",https://dergipark.org.tr/tr/download/article-f...,pub.1125931386,1-18,"[{'concept': 'Cox regression', 'relevance': 0....","[Cox regression, regression, research, method,...",2020,...,,,,,,,,,,
1,2020-03-22,,,,,pub.1125801740,21-48,"[{'concept': 'perspective', 'relevance': 0.055...","[perspective, sunflower]",2020,...,,,,,,,,,,
2,2020-03-15,,,,,pub.1125632078,333-349,,,2020,...,,,,,,,,,,
3,2017-12-07,,,,,pub.1096916023,,,,2020,...,,,,,,,,,,
4,2020-01-21,"[pub.1038918292, pub.1013186597, pub.101488649...","[pub.1038918292, pub.1013186597, pub.101488649...","[{'id': 'ur.07430064243.75', 'first_name': 'Na...",https://www.tandfonline.com/doi/pdf/10.1080/16...,pub.1124196727,1711335,"[{'concept': 'social marketing interventions',...","[social marketing interventions, tropical dise...",2020,...,13.0,"[{'id': '498', 'name': 'Behavioral and Social ...","[{'id': '498', 'name': 'Behavioral and Social ...",[Switzerland],"[{'id': 'CH', 'name': 'Switzerland'}]","[Universita della Svizzera Italiana, Graduate ...","[{'id': 2657896, 'name': 'Zürich'}, {'id': 265...","[{'id': '3903', 'name': 'Population & Society'}]",1.0,"[{'id': 'grid.424404.2', 'types': ['Education'..."


### 5.3 Returning Facets

In addition to returning source records matching a query, it is possible
to *facet* on the [entity](data-entities.ipynb) fields related to a
particular source and return only those entity values as an aggregated
view of the related source data. This operation is similar to a
*group by* or *pivot table*.

**Warning:** Faceting can return a maximum of 1000 results. This is to ensure
adequate performance with all queries. Furthermore, although the `limit`
operator is allowed, the `skip` operator cannot be used.
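
To make the *group by* analogy concrete, the sketch below counts records per organization in plain Python. The records, the `grid.A`-style ids and the `facet_counts` helper are all made up for illustration; real faceting is performed by the API, not client-side.

```python
from collections import Counter

# Hypothetical, hand-made records standing in for publications;
# each one lists the (invented) research organization ids it is linked to.
records = [
    {"id": "pub.1", "research_orgs": ["grid.A", "grid.B"]},
    {"id": "pub.2", "research_orgs": ["grid.A", "grid.B"]},
    {"id": "pub.3", "research_orgs": ["grid.C", "grid.A"]},
    {"id": "pub.4", "research_orgs": []},
]

def facet_counts(records, field, limit=20):
    """Count how many records mention each value of `field`, most frequent first."""
    counts = Counter()
    for record in records:
        # set() makes sure a record is counted once per distinct value
        counts.update(set(record.get(field, [])))
    return counts.most_common(limit)

top_orgs = facet_counts(records, "research_orgs", limit=2)
print(top_orgs)
```

The `limit` argument plays the same role as `limit` in a facet `return` phrase: only the top values by `count` are kept.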

In [50]:
%%dsldf 
search publications 
    for "coronavirus" 
return research_orgs limit 5

Returned Research_orgs: 5


Unnamed: 0,id,count,country_name,name,longitude,state_name,city_name,latitude,linkout,types,acronym
0,grid.194645.b,984,China,University of Hong Kong,114.13708,Hong Kong,Hong Kong,22.283287,[http://www.hku.hk/],[Education],HKU
1,grid.21107.35,827,United States,Johns Hopkins University,-76.62028,Maryland,Baltimore,39.328888,[https://www.jhu.edu/],[Education],JHU
2,grid.38142.3c,760,United States,Harvard University,-71.11665,Massachusetts,Cambridge,42.377052,[http://www.harvard.edu/],[Education],
3,grid.25879.31,725,United States,University of Pennsylvania,-75.19322,Pennsylvania,Philadelphia,39.952457,[http://www.upenn.edu/],[Education],
4,grid.4991.5,703,United Kingdom,University of Oxford,-1.25401,Oxfordshire,Oxford,51.753437,[http://www.ox.ac.uk/],[Education],


In [51]:
%%dsldf 
search publications 
    for "coronavirus" 
return research_org_countries limit 5
return year limit 5
return category_for limit 5

Returned Category_for: 5
Returned Research_org_countries: 5
Returned Year: 5


Unnamed: 0,id,count,name
0,2211,61716,11 Medical and Health Sciences
1,2206,21254,06 Biological Sciences
2,3114,19179,1108 Medical Microbiology
3,3053,15688,1103 Clinical Sciences
4,3177,15199,1117 Public Health and Health Services


For control over the organization and headers of the JSON query results,
the `return` keyword in a return phrase may be followed by the keyword
`in` and then a `group` name for this group of results, where the group
name is enclosed in double quotes (`"`).

Also, one can define `aliases` that replace the default JSON field names with names provided by the user. 

See the [official documentation](https://docs.dimensions.ai/dsl/language.html#aliases) for more details about this feature. 

In [52]:
%%dsldf 
search publications 
return in "facets" funders 
return in "facets" research_orgs

Returned Facets: 2


Unnamed: 0,funders,research_orgs
0,"[{'id': 'grid.419696.5', 'count': 1951296, 'ty...","[{'id': 'grid.26999.3d', 'count': 325233, 'typ..."


### 5.4 What the query statistics refer to - sources VS facets

When performing a DSL search, a `_stats` object is returned, which contains useful information such as the total number of records matching the search. 

In [53]:
%%dsldf 
search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
return publications limit 5

Returned Publications: 5 (total = 3768)


Unnamed: 0,type,pages,author_affiliations,issue,volume,year,id,title,journal.id,journal.title
0,article,18124-18131,"[[{'first_name': 'Siewteng', 'last_name': 'Sim...",12.0,3.0,2018,pub.1110885950,Development of Organo-Dispersible Graphene Oxi...,jour.1157000,ACS Omega
1,proceeding,,"[[{'first_name': 'T.', 'last_name': 'Miyagi', ...",,,2018,pub.1110925389,Nuclear Ab Initio Calculations with the Unitar...,,
2,article,29200-29209,"[[{'first_name': 'Taro', 'last_name': 'Toyoda'...",51.0,122.0,2018,pub.1110369527,"Anisotropic Crystal Growth, Optical Absorption...",jour.1038386,The Journal of Physical Chemistry C
3,article,28491-28496,"[[{'first_name': 'Liang', 'last_name': 'Wang',...",50.0,122.0,2018,pub.1110271601,Indium Zinc Oxide Electron Transport Layer for...,jour.1038386,The Journal of Physical Chemistry C
4,article,43682-43690,"[[{'first_name': 'Ami', 'last_name': 'Nomura',...",50.0,10.0,2018,pub.1110222625,Chalcopyrite ZnSnSb2: A Promising Thermoelectr...,jour.1041450,ACS Applied Materials & Interfaces




It is important to note though that the **total number always refers to the main source, never the facets** one is searching for. 

For example, in this query we return `researchers` linked to publications: 

In [54]:
%%dsldf 
search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
return researchers limit 5

Returned Researchers: 5


Unnamed: 0,id,count,research_orgs,first_name,last_name,orcid_id
0,ur.01055753603.27,138,"[grid.14003.36, grid.266298.1, grid.258806.1, ...",Shuzi Shuzi,Hayase,
1,ur.011212042763.67,102,"[grid.258806.1, grid.27476.30, grid.462727.2]",Masayuki,Hikita,
2,ur.01144540527.52,98,"[grid.258806.1, grid.177174.3, grid.11135.37, ...",Ting-Li,Ma,[0000-0002-3310-459X]
3,ur.07644453127.11,96,"[grid.258806.1, grid.471634.3, grid.11417.32, ...",M Kozako M,Kozako,
4,ur.016357156077.09,91,"[grid.54432.34, grid.454850.8, grid.268415.c, ...",Huimin,Lu,[0000-0001-9794-3221]


**Note:** a facet query returns at most 1000 results (due to performance limitations), so if more than 1000 facet values exist it is not possible to know their total number. 
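
As a sketch of how this distinction might be handled in code, the helper below reads both numbers from a response dictionary. The dictionary is hand-written to mimic the shape of a DSL JSON reply and is not real API output; the `total_count` key inside `_stats` is an assumption that should be checked against an actual response, and `summarize` is a hypothetical helper.

```python
def summarize(response, result_key, source="publications"):
    """Describe what the two counts in a DSL-style response refer to."""
    returned = len(response.get(result_key, []))
    # Assumption: the total for the main source lives under _stats.total_count
    total = response.get("_stats", {}).get("total_count")
    return (f"{returned} {result_key} returned; "
            f"{total} {source} matched the search")

# Hand-written stand-in for the response to a facet query (invented ids).
fake_response = {
    "_stats": {"total_count": 3768},
    "researchers": [{"id": "ur.1", "count": 138}, {"id": "ur.2", "count": 102}],
}

message = summarize(fake_response, "researchers")
print(message)
```

The total describes the publications matched by the `search` phrase, while the number of facet rows is simply the length of the returned list.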

### 5.5 Paginating Results

At the end of a `return` phrase, the user can specify the maximum number
of results to be returned and the number of top records to skip over
before returning the first result record. This makes it possible, for
example, to return large result sets page by page (i.e. "paging" results),
as described below.

This is done using the keyword `limit` followed by the maximum number of
results to return, optionally followed by the keyword `skip` and the
number of results to skip (the offset).

In [55]:
%%dsldf 
search publications return publications limit 10

Returned Publications: 10 (total = 110113720)


Unnamed: 0,title,author_affiliations,id,year,type,pages,journal.id,journal.title,issue,volume
0,Visual research on the trustability of classic...,"[[{'first_name': 'Nihal', 'last_name': 'ATA TU...",pub.1125931386,2020,article,1-18,jour.1142190,Hacettepe Journal of Mathematics and Statistics,,
1,2. The Sunflowers in Perspective,"[[{'first_name': 'Nienke', 'last_name': 'Bakke...",pub.1125801740,2020,chapter,21-48,,,,
2,Literature,,pub.1125632078,2020,chapter,333-349,,,,
3,Die Standesordnungen der freien Berufe,"[[{'first_name': 'Jochen', 'last_name': 'Taupi...",pub.1096916023,2020,monograph,,,,,
4,The gender responsiveness of social marketing ...,"[[{'first_name': 'Nathaly', 'last_name': 'Aya ...",pub.1124196727,2020,article,1711335,jour.1041075,Global Health Action,1.0,13.0
5,To start or to complete? – Challenges in imple...,"[[{'first_name': 'Mahendra M', 'last_name': 'R...",pub.1124099280,2020,article,1704540,jour.1041075,Global Health Action,1.0,13.0
6,Long-term trends in seasonality of mortality i...,"[[{'first_name': 'Benjamin-Samuel', 'last_name...",pub.1124649186,2020,article,1717411,jour.1041075,Global Health Action,1.0,13.0
7,"Eine Warnung an alle, dy sych etwaz duncken: D...","[[{'first_name': 'Ulla', 'last_name': 'William...",pub.1125632729,2020,chapter,167-190,,,,
8,Marienklagen und Pietà,"[[{'first_name': 'Georg', 'last_name': 'Satzin...",pub.1125635978,2020,chapter,241-276,,,,
9,Johannes Taulers Via negationis,"[[{'first_name': 'Walter', 'last_name': 'Haug'...",pub.1125632704,2020,chapter,76-93,,,,


If paging information is not provided, the default values
`limit 20 skip 0` are used, so a phrase such as `return grants` is
equivalent to `return grants limit 20 skip 0`.

Combining `limit` and `skip` across multiple queries enables paging or
batching of results; e.g. to retrieve 30 grant records divided into 3
pages of 10 records each, the following three queries could be used:

```
return grants limit 10           => get 1st 10 records for page 1 (skip 0, by default)
return grants limit 10 skip 10   => get next 10 for page 2; skip the 10 we already have
return grants limit 10 skip 20   => get another 10 for page 3, for a total of 30
```
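
The three queries above can also be generated programmatically. The following is a minimal sketch: `paged_queries` is a hypothetical helper, and the `search grants for "malaria"` phrase is only a placeholder search.

```python
def paged_queries(base_query, source, total, page_size=10):
    """Yield one DSL query string per page, combining limit and skip."""
    for skip in range(0, total, page_size):
        # The last page may be shorter than page_size
        limit = min(page_size, total - skip)
        yield f"{base_query} return {source} limit {limit} skip {skip}"

# Placeholder search phrase; 30 records split into 3 pages of 10
queries = list(paged_queries('search grants for "malaria"', "grants", total=30))
for q in queries:
    print(q)
```

Each generated string could then be passed to `dsl.query(...)`. Dimcli also provides `dsl.query_iterative(...)`, which takes care of this looping for source queries.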

### 5.6 Sorting Results

A sort order for the results in a given `return` phrase can be specified
with the keyword `sort by` followed by the name of:

* a `field` (in the case that a `source` is being requested)
* an `indicator` (aggregation) (in the case that one or more facets are being requested).

By default, the result set of full text
queries (`search ... for "full text query"`) is sorted by "relevance".
Additionally, it is possible to specify the sort order, using the `asc` or
`desc` keywords. By default, descending order is selected.

In [56]:
%%dsldf 
search grants 
    for "nanomaterials"
return grants sort by title desc limit 5 

Returned Grants: 5 (total = 17719)


Unnamed: 0,project_num,end_date,start_date,original_title,start_year,title_language,id,funding_org_name,funders,active_year,language,title
0,2018/29/N/ST5/01240,2022-03-31,2019-04-01,x,2019,pl,grant.8518592,National Science Center,"[{'id': 'grid.436846.b', 'country_name': 'Pola...","[2019, 2020, 2021, 2022]",pl,x
1,280331443,,2015-01-01,Transmissionselektronenmikroskop,2015,en,grant.4841519,German Research Foundation,"[{'id': 'grid.424150.6', 'country_name': 'Germ...",[2015],en,Transmissionselektronenmikroskop
2,220923099,,2012-01-01,Transmissionselektronenmikroskop,2012,en,grant.4823271,German Research Foundation,"[{'id': 'grid.424150.6', 'country_name': 'Germ...",[2012],de,Transmissionselektronenmikroskop
3,3E120109,2015-06-13,2011-06-16,Snowcontrol.,2011,en,grant.6774902,Belgian Federal Science Policy Office,"[{'id': 'grid.425119.a', 'country_name': 'Belg...","[2011, 2012, 2013, 2014, 2015]",en,Snowcontrol.
4,245513494,,2014-01-01,Röntgenquelle,2014,en,grant.4834305,German Research Foundation,"[{'id': 'grid.424150.6', 'country_name': 'Germ...",[2014],de,Röntgenquelle


In [57]:
%%dsldf  
search grants  
    for "nanomaterials"
return grants  sort by relevance desc limit 5

Returned Grants: 5 (total = 17719)


Unnamed: 0,start_date,title,end_date,title_language,project_num,id,funders,original_title,funding_org_name,start_year,language,active_year
0,2012-06-01,Optically-active chiral nanomaterials,2013-05-31,en,11/W.1/I2065,grant.3984032,"[{'id': 'grid.437854.9', 'types': ['Nonprofit'...",Optically-active chiral nanomaterials,Science Foundation Ireland,2012,en,"[2012, 2013]"
1,2016-04-01,Polymer Nanomaterials,2017-03-31,en,617505,grant.6973622,"[{'id': 'grid.452912.9', 'types': ['Government...",Polymer Nanomaterials,Natural Sciences and Engineering Research Council,2016,en,"[2016, 2017]"
2,2016-04-01,Polymer Nanomaterials,2017-03-31,en,617153,grant.6973270,"[{'id': 'grid.452912.9', 'types': ['Government...",Polymer Nanomaterials,Natural Sciences and Engineering Research Council,2016,en,"[2016, 2017]"
3,2013-04-01,Polymer Nanomaterials,2014-03-31,en,543663,grant.3643972,"[{'id': 'grid.452912.9', 'types': ['Government...",Polymer Nanomaterials,Natural Sciences and Engineering Research Council,2013,en,"[2013, 2014]"
4,2010-04-01,Polymer Nanomaterials,2011-03-31,en,454382,grant.2865162,"[{'id': 'grid.452912.9', 'types': ['Government...",Polymer Nanomaterials,Natural Sciences and Engineering Research Council,2010,en,"[2010, 2011]"


Number of citations per publication

In [58]:
%%dsldf  
search publications
return publications  [doi + times_cited] 
    sort by times_cited limit 5

Returned Publications: 5 (total = 110023255)


Unnamed: 0,times_cited,doi
0,230793,
1,196708,10.1038/227680a0
2,178696,10.1016/0003-2697(76)90527-3
3,87448,10.1006/meth.2001.1262
4,82895,10.1103/physrevlett.77.3865


Recent citations per publication.

**Note:** `recent_citations` is the number of citations accrued in the last two-year period. A single value is stored per document and the year window rolls over in July.

In [59]:
%%dsldf 
search publications
return publications [doi + recent_citations]
    sort by recent_citations limit 5

Returned Publications: 5 (total = 110023255)


Unnamed: 0,recent_citations,doi
0,29381,10.1006/meth.2001.1262
1,22006,10.1103/physrevlett.77.3865
2,21376,10.1176/appi.books.9780890425596
3,20907,10.1109/cvpr.2016.90
4,20077,10.1191/1478088706qp063oa


When a facet is being returned, the `indicator` used in the
`sort` phrase must either be `count` (the default, such that
`sort by count` is unnecessary), or one of the indicators specified in
the `aggregate` phrase, i.e. one whose values are being computed in the
faceting operation. 
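
The rule above can be expressed as a small check performed before a query is sent. This is only a sketch: `facet_return` is a hypothetical helper that builds the `return` phrase as a string, and it does not know about other server-side restrictions on individual indicators.

```python
def facet_return(facet, aggregates=(), sort_by="count", limit=20):
    """Build a facet `return` phrase, rejecting sort indicators that were not aggregated."""
    # `count` is always available; anything else must appear in `aggregate`
    allowed = {"count", *aggregates}
    if sort_by not in allowed:
        raise ValueError(
            f"cannot sort by {sort_by!r}; choose one of {sorted(allowed)}"
        )
    phrase = f"return {facet}"
    if aggregates:
        phrase += " aggregate " + ", ".join(aggregates)
    return f"{phrase} sort by {sort_by} limit {limit}"

phrase = facet_return(
    "research_orgs", ["altmetric_median", "rcr_avg"], sort_by="rcr_avg", limit=5
)
print(phrase)
```

Asking for `sort_by="citations_total"` without listing it among the aggregates raises a `ValueError` instead of producing an invalid query.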


In [60]:
%%dsldf 
search publications 
    for "nanomaterials"
return research_orgs 
    aggregate altmetric_median, rcr_avg sort by rcr_avg limit 5 

Returned Research_orgs: 5


Unnamed: 0,id,count,rcr_avg,altmetric_median,types,city_name,longitude,name,country_name,linkout,latitude,acronym,state_name
0,grid.11444.34,1,207.839996,343.0,[Facility],Shanghai,121.467255,Shanghai Institute of Hypertension,China,[http://www.china-sih.com/],31.211678,,
1,grid.11485.39,1,207.839996,343.0,[Nonprofit],London,-0.106269,Cancer Research UK,United Kingdom,[http://www.cancerresearchuk.org/],51.531322,CRUK,
2,grid.11642.30,1,207.839996,343.0,[Education],Saint-Denis,55.48455,University of La Réunion,Reunion,[http://www.univ-reunion.fr/university-of-reun...,-20.901735,,
3,grid.120073.7,1,207.839996,343.0,[Healthcare],Cambridge,0.14,Addenbrooke's Hospital,United Kingdom,[http://www.cuh.org.uk/addenbrookes-hospital],52.176,,Cambridgeshire
4,grid.20931.39,1,207.839996,343.0,[Education],London,-0.134,Royal Veterinary College,United Kingdom,[http://www.rvc.ac.uk/],51.5368,RVC,


## 6. Aggregations

In a `return` phrase requesting one or more `facet` results, aggregation
operations to perform during faceting can be specified after the facet
name(s) by using the keyword `aggregate` followed by a comma-separated
list of one or more `indicator` names corresponding to the `source`
being searched.
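
Conceptually, an aggregation computes one value per facet entry from the records grouped under it. The sketch below reproduces that idea client-side with the standard library; the rows and ids are invented, and the indicator names are reused only as dictionary keys.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (organization id, rcr, altmetric score) rows; values are invented.
rows = [
    ("grid.A", 1.0, 4),
    ("grid.A", 3.0, 10),
    ("grid.B", 2.0, 1),
    ("grid.A", 2.0, 7),
]

# Group the rows by organization, as faceting on research_orgs would
groups = defaultdict(list)
for org, rcr, altmetric in rows:
    groups[org].append((rcr, altmetric))

# Compute one summary per group: count plus the two requested indicators
summary = {
    org: {
        "count": len(values),
        "rcr_avg": mean(rcr for rcr, _ in values),
        "altmetric_median": median(alt for _, alt in values),
    }
    for org, values in groups.items()
}
print(summary)
```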

In [61]:
%%dsldf
search publications 
    where year > 2010 
return research_orgs  
    aggregate rcr_avg, altmetric_median limit 5

Returned Research_orgs: 5


Unnamed: 0,id,count,rcr_avg,altmetric_median,country_name,name,longitude,state_name,city_name,latitude,linkout,types,acronym
0,grid.17063.33,140923,1.692821,4.0,Canada,University of Toronto,-79.395,Ontario,Toronto,43.661667,[http://www.utoronto.ca/],[Education],
1,grid.38142.3c,136543,2.213127,5.0,United States,Harvard University,-71.11665,Massachusetts,Cambridge,42.377052,[http://www.harvard.edu/],[Education],
2,grid.11899.38,132248,1.045882,2.0,Brazil,University of São Paulo,-46.730103,,São Paulo,-23.563051,[http://www5.usp.br/en/],[Education],USP
3,grid.83440.3b,120731,1.906856,4.0,United Kingdom,University College London,-0.133982,,London,51.52447,[http://www.ucl.ac.uk/],[Education],UCL
4,grid.26999.3d,119074,1.181334,2.0,Japan,University of Tokyo,139.76222,,Tokyo,35.713333,[http://www.u-tokyo.ac.jp/en/],[Education],UT


**What are the metrics/aggregations available?** See the data sources documentation for information about available [indicators](https://docs.dimensions.ai/dsl/datasource-publications.html#publications-indicators).  

Alternatively, we can use the 'schema' API ([describe](https://docs.dimensions.ai/dsl/data-sources.html#metadata-api)) to return this information programmatically:

In [62]:
schema = dsl.query("describe schema")
# for each source, print the name and description of its metrics (indicators)
for source_name, source_info in schema['sources'].items():
    print("SOURCE:", source_name)
    for metric in source_info['metrics'].values():
        print("--", metric['name'], " => ", metric['description'])

SOURCE: publications
-- count  =>  Total count
-- altmetric_median  =>  Median Altmetric attention score
-- altmetric_avg  =>  Altmetric attention score mean
-- citations_total  =>  Aggregated number of citations
-- citations_avg  =>  Arithmetic mean of citations
-- citations_median  =>  Median of citations
-- recent_citations_total  =>  For a given article, in a given year, the number of citations accrued in the last two year period. Single value stored per document, year window rolls over in July.
-- rcr_avg  =>  Arithmetic mean of `relative_citation_ratio` field.
-- fcr_gavg  =>  Geometric mean of `field_citation_ratio` field (note: This field cannot be used for sorting results).
SOURCE: grants
-- count  =>  Total count
-- funding  =>  Total funding amount, in USD.
SOURCE: patents
-- count  =>  Total count
SOURCE: clinical_trials
-- count  =>  Total count
SOURCE: policy_documents
-- count  =>  Total count
SOURCE: researchers
-- count  =>  Total count
SOURCE: organizations
-- count  

**NOTE** In addition to any specified aggregations, `count` is always computed
and reported when facet results are requested.

In [63]:
%%dsldf
search grants 
    for "5g network" 
return funders 
    aggregate count, funding sort by funding limit 5 

Returned Funders: 5


Unnamed: 0,id,count,funding,acronym,city_name,name,types,longitude,linkout,latitude,country_name,state_name
0,grid.270680.b,175,834354500.0,EC,Brussels,European Commission,[Government],4.36367,[http://ec.europa.eu/index_en.htm],50.85165,Belgium,
1,grid.421091.f,68,52650403.0,EPSRC,Swindon,Engineering and Physical Sciences Research Cou...,[Government],-1.784602,[https://www.epsrc.ac.uk/],51.567093,United Kingdom,England
2,grid.457785.c,106,49446108.0,NSF CISE,Arlington,Directorate for Computer & Information Science...,[Government],-77.111,[http://www.nsf.gov/dir/index.jsp?org=CISE],38.88058,United States,Virginia
3,grid.55047.33,5,47182381.0,NCRD,Warsaw,National Centre for Research and Development,[Government],21.00763,[http://www.ncbr.gov.pl/en/],52.227455,Poland,
4,grid.457810.f,73,24371660.0,NSF ENG,Arlington,Directorate for Engineering,[Government],-77.111,[http://www.nsf.gov/dir/index.jsp?org=ENG],38.88058,United States,Virginia


Aggregated total number of citations

In [64]:
%%dsldf
search publications
    for "ontologies"
return funders 
    aggregate citations_total 
    sort by citations_total  limit 5

Returned Funders: 5


Unnamed: 0,id,count,citations_total,types,city_name,longitude,name,country_name,linkout,state_name,acronym,latitude
0,grid.48336.3a,12083,807005.0,[Government],Rockville,-77.10119,National Cancer Institute,United States,[http://www.cancer.gov/],Maryland,NCI,39.004326
1,grid.280785.0,11603,777080.0,[Facility],Bethesda,-77.09938,National Institute of General Medical Sciences,United States,[http://www.nigms.nih.gov/Pages/default.aspx],Maryland,NIGMS,38.997833
2,grid.280128.1,4424,575386.0,[Facility],Bethesda,-77.09693,National Human Genome Research Institute,United States,[https://www.genome.gov/],Maryland,NHGRI,38.996967
3,grid.270680.b,18022,548865.0,[Government],Brussels,4.36367,European Commission,Belgium,[http://ec.europa.eu/index_en.htm],,EC,50.85165
4,grid.52788.30,4838,418936.0,[Nonprofit],London,-0.135005,Wellcome Trust,United Kingdom,[http://www.wellcome.ac.uk/],,WT,51.525867


Arithmetic mean number of citations

In [65]:
%%dsldf
search publications
return funders 
    aggregate citations_avg 
    sort by citations_avg limit 5

Returned Funders: 5


Unnamed: 0,id,count,citations_avg,types,city_name,longitude,name,country_name,linkout,state_name,latitude
0,grid.478308.0,169,276.136095,[Nonprofit],Washington D.C.,-77.03973,Alexander & Margaret Stewart Trust,United States,[http://www.stewart-trust.org/],District of Columbia,38.90116
1,grid.453780.d,143,186.685315,[Nonprofit],Washington D.C.,-77.03952,Accelerate Brain Cancer Cure,United States,[http://www.abc2.org/],District of Columbia,38.90672
2,grid.478789.d,568,164.917254,[Other],Las Vegas,-115.29985,Donald W. Reynolds Foundation,United States,[http://www.dwreynolds.org/],Nevada,36.19046
3,grid.417710.4,181,162.027624,[Company],Rockville,-77.20376,Human Genome Sciences (United States),United States,[http://www.hgsi.com],Maryland,39.09665
4,grid.429197.0,719,146.849791,[Other],New City,-73.982895,Helen Hay Whitney Foundation,United States,[http://www.hhwf.org/],New York,41.15845


Geometric mean of FCR


In [66]:
%%dsldf
search publications
return funders 
    aggregate fcr_gavg limit 5

Returned Funders: 5


Unnamed: 0,id,fcr_gavg,count,acronym,city_name,name,types,longitude,linkout,latitude,country_name,state_name
0,grid.419696.5,2.304725,1951296,NSFC,Beijing,National Natural Science Foundation of China,[Government],116.33983,[http://www.nsfc.gov.cn/publish/portal1/],40.005177,China,
1,grid.270680.b,3.281903,677891,EC,Brussels,European Commission,[Government],4.36367,[http://ec.europa.eu/index_en.htm],50.85165,Belgium,
2,grid.424020.0,2.523239,612579,MOST,Beijing,Ministry of Science and Technology of the Peop...,[Government],116.316284,[http://www.most.gov.cn/eng/],39.827835,China,
3,grid.48336.3a,4.901802,584689,NCI,Rockville,National Cancer Institute,[Government],-77.10119,[http://www.cancer.gov/],39.004326,United States,Maryland
4,grid.54432.34,2.258015,574493,JSPS,Tokyo,Japan Society for the Promotion of Science,[Nonprofit],139.74039,[http://www.jsps.go.jp/],35.68716,Japan,
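
For readers unfamiliar with the geometric mean, the sketch below compares it with the arithmetic mean on a few invented ratio values. It shows the textbook definition only; how the API treats zero or missing FCR values when computing `fcr_gavg` is not covered here.

```python
from statistics import geometric_mean, mean

# Invented ratio values, one of them much larger than the rest
ratios = [0.5, 1.0, 2.0, 32.0]

arithmetic = mean(ratios)
geometric = geometric_mean(ratios)  # n-th root of the product of the values

print("arithmetic mean:", arithmetic)
print("geometric mean:", round(geometric, 3))
```

The geometric mean is pulled up far less by the single large value, which is why it is often preferred for skewed, ratio-like indicators.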


Median Altmetric Attention Score

In [67]:
%%dsldf 
search publications
return funders aggregate altmetric_median 
    sort by altmetric_median limit 5 

Returned Funders: 5


Unnamed: 0,id,count,altmetric_median,types,city_name,longitude,name,country_name,linkout,acronym,latitude,state_name
0,grid.258806.1,6,309.0,[Education],Kitakyushu,130.8392,Kyushu Institute of Technology,Japan,[https://www.kyutech.ac.jp/english/],KIT,33.894436,
1,grid.470711.4,2,110.5,[Nonprofit],Edinburgh,-3.219597,Chest Heart and Stroke Scotland,United Kingdom,[http://www.chss.org.uk/],CHSS,55.946075,
2,grid.443873.f,5,99.0,[Nonprofit],Chicago,-87.62648,LUNGevity Foundation,United States,[http://www.lungevity.org/],LUNG,41.878674,Illinois
3,grid.473856.b,2,66.0,[Government],Washington D.C.,-77.01637,Administration for Children and Families,United States,[https://www.acf.hhs.gov/],ACF,38.88594,District of Columbia
4,grid.473769.8,1,33.0,[Nonprofit],Bethesda,-77.09788,Bladder Cancer Advocacy Network,United States,[http://www.bcan.org/],BCAN,38.988724,Maryland
