# Exploring The Dimensions Search Language (DSL) - Deep Dive

This notebook takes you through all the most important features of the [Dimensions Search Language](https://docs.dimensions.ai/dsl/). 

The notebook is based on the [Query Syntax](https://docs.dimensions.ai/dsl/language.html) section of the official documentation. It can be seen as an interactive version of the documentation, as it lets you run the queries included in that section using Python.

## What is the Dimensions Search Language?

The DSL aims to capture the type of interaction with Dimensions data
that users are accustomed to performing graphically via the [web
application](https://app.dimensions.ai/), and enable web app developers, power users, and others to
carry out such interactions by writing query statements in a syntax
loosely inspired by SQL but particularly suited to our specific domain
and data organization.

**Note:** this notebook uses the Python programming language, but the **DSL queries are not Python-specific** and can be reused with any other API client.



## Prerequisites

This notebook assumes you have installed the [Dimcli](https://pypi.org/project/dimcli/) library and are familiar with the *Getting Started* tutorial.


In [298]:

# @markdown # Get the API library and login
# @markdown Click the 'play' button on the left (or shift+enter) after entering your API credentials

username = "" #@param {type: "string"}
password = "" #@param {type: "string"}
endpoint = "https://app.dimensions.ai" #@param {type: "string"}


!pip install dimcli plotly tqdm -U --quiet
import dimcli
from dimcli.shortcuts import *
dimcli.login(username, password, endpoint)
dsl = dimcli.Dsl()

#
# load common libraries
import time
import sys
import json
import pandas as pd
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated
from tqdm.notebook import tqdm as progress

#
# charts libs
import plotly.express as px
if 'google.colab' not in sys.modules:
  # make js dependencies local / needed by html exports
  from plotly.offline import init_notebook_mode
  init_notebook_mode(connected=True)

DimCli v0.6.7 - Succesfully connected to <https://app.dimensions.ai> (method: dsl.ini file)



## Sections Index 

1. Basic query structure
2. Full-text searching
3. Field searching
4. Searching for researchers
5. Returning results 
6. Aggregations

## 1. Basic query structure

DSL queries consist of two required components: a `search` phrase that
indicates the scientific records to be searched, and one or
more `return` phrases which specify the contents and structure of the
desired results.

The simplest valid DSL query is of the form `search <source> return <result>`:

In [299]:
%%dsldf 
search grants return  grants limit 5

Returned Grants: 5 (total = 5263527)


Unnamed: 0,id,start_date,project_num,start_year,title_language,title,original_title,funders,language,active_year,funding_org_name,end_date
0,grant.8690978,2021-11-30,2018-HRSI-1548,2021,en,APPROACH to Enriching the Real World Evidence ...,APPROACH to Enriching the Real World Evidence ...,"[{'id': 'grid.484521.e', 'state_name': 'New Br...",en,[2021],New Brunswick Health Research Foundation,
1,grant.8950252,2021-10-01,1301720F,2021,en,Molecular mechanism of DNA double strand break...,Mécanismes moléculaires de la formation et la ...,"[{'id': 'grid.424470.1', 'linkout': ['http://w...",en,[2021],Fund for Scientific Research,
2,grant.8715161,2021-10-01,M 2734,2021,en,Life as concept and as science,Life as concept and as science,"[{'id': 'grid.25111.36', 'linkout': ['https://...",en,"[2021, 2022, 2023]",FWF Austrian Science Fund,2023-09-30
3,grant.8963889,2021-09-01,893021,2021,en,Jet quenching for heavy-ion collisions at the LHC,Jet quenching for heavy-ion collisions at the LHC,"[{'id': 'grid.270680.b', 'linkout': ['http://e...",en,"[2021, 2022, 2023]",European Commission,2023-08-31
4,grant.8964235,2021-09-01,892933,2021,en,Scintillation Light For New Physics with Liqui...,Scintillation Light For New Physics with Liqui...,"[{'id': 'grid.270680.b', 'linkout': ['http://e...",en,"[2021, 2022, 2023]",European Commission,2023-08-31


### `search source`

A query must begin with the word `search` followed by a `source` name, i.e. the name of a type of scientific `record`, such as `grants` or `publications`.

**What are the sources available?** See the [data sources](https://docs.dimensions.ai/dsl/data-sources.html) section of the documentation. 

Alternatively, we can use the 'schema' API ([describe](https://docs.dimensions.ai/dsl/data-sources.html#metadata-api)) to return this information programmatically:

In [300]:
dsl.query("describe schema")

<dimcli.Dataset object #4874351440. Dict keys: 'sources', 'entities'>

A more useful query might also make use of the optional `for` and
`where` phrases to limit the set of records returned.

In [301]:
%%dsldf 
search grants  for "lung cancer" 
    where active_year=2000 
return  grants  limit 5

Returned Grants: 5 (total = 1728)


Unnamed: 0,id,start_date,project_num,end_date,start_year,title_language,title,original_title,funders,language,active_year,funding_org_name
0,grant.2386513,2000-12-31,F32HL010455,2002-01-01,2000,en,ROLE OF CD44 ISOFORMS IN ENDOTHELIAL CELL DAMAGE,ROLE OF CD44 ISOFORMS IN ENDOTHELIAL CELL DAMAGE,"[{'id': 'grid.279885.9', 'state_name': 'Maryla...",en,"[2000, 2001, 2002]",National Heart Lung and Blood Institute
1,grant.2537116,2000-12-18,R01HL063695,2004-11-30,2000,en,"ESTROGEN, ANGIOGENESIS AND ENDOTHELIAL PROGENI...","ESTROGEN, ANGIOGENESIS AND ENDOTHELIAL PROGENI...","[{'id': 'grid.279885.9', 'state_name': 'Maryla...",en,"[2000, 2001, 2002, 2003, 2004]",National Heart Lung and Blood Institute
2,grant.2537801,2000-12-18,R01HL066221,2007-11-30,2000,en,GENETIC ANALYSIS OF EPHRIN-EPH SIGNALING IN AN...,GENETIC ANALYSIS OF EPHRIN-EPH SIGNALING IN AN...,"[{'id': 'grid.279885.9', 'state_name': 'Maryla...",en,"[2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007]",National Heart Lung and Blood Institute
3,grant.2536777,2000-12-15,R01HL062244,2017-12-31,2000,en,Synthetic Heparan Sulfate: Probing Biosynthesi...,Synthetic Heparan Sulfate: Probing Biosynthesi...,"[{'id': 'grid.279885.9', 'state_name': 'Maryla...",en,"[2000, 2001, 2002, 2003, 2004, 2005, 2006, 200...",National Heart Lung and Blood Institute
4,grant.2475327,2000-12-01,R01CA089614,2016-06-30,2000,en,Redox Regulation of the rho-GTPases in the vas...,Redox Regulation of the rho-GTPases in the vas...,"[{'id': 'grid.48336.3a', 'state_name': 'Maryla...",en,"[2000, 2001, 2002, 2003, 2004, 2005, 2006, 200...",National Cancer Institute
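
The phrases seen so far (`search`, `for`, `where`, `return`, `limit`) can also be assembled programmatically before the string is passed to `dsl.query`. The following is a minimal sketch: `build_query` is a hypothetical helper written for this notebook, not part of Dimcli, and it does not validate source or field names.

```python
def build_query(source, term=None, where=None, result=None, limit=None):
    """Assemble a DSL query string from its phrases (hypothetical helper)."""
    parts = ["search " + source]
    if term is not None:
        # the search term is always enclosed in double quotes
        parts.append('for "' + term + '"')
    if where is not None:
        parts.append("where " + where)
    # by default, return the same source that was searched
    parts.append("return " + (result or source))
    if limit is not None:
        parts.append("limit " + str(limit))
    return " ".join(parts)


query = build_query("grants", term="lung cancer", where="active_year=2000", limit=5)
print(query)
```

The resulting string reproduces the query used in the cell above and could then be run with `dsl.query(query)`.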


### `return` result (source or facet)

The most basic `return` phrase consists of the keyword `return` followed
by the name of a `record` or `facet` to be returned. This must be the
name of the `source` used in the `search` phrase, or the name of a
`facet` of that source.

In [302]:
%%dsldf
search grants for "laryngectomy" 
return grants limit 5

Returned Grants: 5 (total = 104)


Unnamed: 0,funders,active_year,original_title,start_year,end_date,project_num,id,funding_org_name,start_date,title_language,title,language
0,"[{'id': 'grid.214431.1', 'longitude': -77.1868...","[2019, 2020, 2021, 2022, 2023, 2024]",Wearable silent speech technology to enhance i...,2019,2024-07-31,R01DC016621,grant.8554260,National Institute on Deafness and Other Commu...,2019-08-15,en,Wearable silent speech technology to enhance i...,en
1,"[{'id': 'grid.54432.34', 'longitude': 139.7403...","[2019, 2020, 2021]",喉頭がん、下咽頭がんにより喉頭摘出術を受けた患者に対する嗅覚向上プログラムの開発,2019,2021-03-31,19K19574,grant.8422934,Japan Society for the Promotion of Science,2019-04-01,ja,Development of an olfactory improvement progra...,ja
2,"[{'id': 'grid.450640.3', 'longitude': -47.9478...","[2019, 2020, 2021, 2022]",Desenvolvimento de uma prótese de voz,2019,2022-02-28,,grant.8475371,National Council for Scientific and Technologi...,2019-02-18,pt,Development of a voice prosthesis,pt
3,"[{'id': 'grid.457810.f', 'longitude': -77.111,...","[2018, 2019, 2020]",EAGER: Design of an Active Voice Box Prosthesi...,2018,2020-07-31,1836333,grant.7672598,Directorate for Engineering,2018-08-01,en,EAGER: Design of an Active Voice Box Prosthesi...,en
4,"[{'id': 'grid.214431.1', 'longitude': -77.1868...","[2018, 2019]",EMG Voice Restoration,2018,2019-10-31,R43DC017097,grant.7552012,National Institute on Deafness and Other Commu...,2018-05-16,en,EMG Voice Restoration,en


For example, let's see which *facets* are available for the *grants* source:

In [303]:
fields = dsl.query("describe schema")['sources']['grants']['fields']
[x for x in fields if fields[x]['is_facet']]

['research_org_state_codes',
 'category_hrcs_hc',
 'category_hrcs_rac',
 'category_hra',
 'funding_org_name',
 'active_year',
 'research_org_countries',
 'funding_org_acronym',
 'category_icrp_cso',
 'category_icrp_ct',
 'language',
 'funder_countries',
 'funding_org_city',
 'researchers',
 'research_orgs',
 'category_rcdc',
 'funding_currency',
 'funders',
 'start_year',
 'research_org_cities',
 'category_bra',
 'language_title',
 'category_for']
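
The same filtering can be wrapped in a small reusable function. This is only a sketch: `facet_names` is a hypothetical helper, and `toy_schema` is a hand-written stand-in that mirrors the shape of the `describe schema` response used above (it is not real API output), so that the example runs without an API connection.

```python
def facet_names(schema, source):
    """Return the sorted names of all facet fields for a source (hypothetical helper)."""
    fields = schema["sources"][source]["fields"]
    return sorted(name for name, info in fields.items() if info["is_facet"])


# Hand-written stand-in for the "describe schema" response (illustrative only)
toy_schema = {
    "sources": {
        "grants": {
            "fields": {
                "title": {"is_facet": False},
                "funders": {"is_facet": True},
                "start_year": {"is_facet": True},
            }
        }
    }
}

print(facet_names(toy_schema, "grants"))
```

With a live connection, the object returned by `dsl.query("describe schema")` could be passed in place of `toy_schema`, since it supports the same key lookups used in the cell above.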

## 2. Full-text Searching

Full-text search or keyword search finds all instances of a term
(keyword) in a document, or group of documents. 

Full-text search works
by using search indexes, which can target specific sections of a
document, e.g. its *abstract*, *authors*, *full text*, etc.

In [304]:
%%dsldf 
search publications 
    in full_data for "Apollo 11" 
return publications limit 5

Returned Publications: 5 (total = 216069)


Unnamed: 0,issue,year,id,author_affiliations,title,type,pages,volume,journal.id,journal.title
0,2.0,2020,pub.1124928412,"[[{'first_name': 'Camila Aline', 'last_name': ...",Fantastic Creatures and where to find them in ...,article,235-252,32.0,jour.1148021,Classica - Revista Brasileira de Estudos Cláss...
1,,2020,pub.1125380177,,I. THE MEETING OF EAST AND WEST,chapter,1-47,,,
2,,2020,pub.1126068534,,Karl Marx . Der 18. Brumaire des Louis Bonaparte,chapter,679-761,,,
3,,2020,pub.1124034710,"[[{'first_name': 'Bodo', 'last_name': 'Birk', ...",Ausnahmezustand – Vom barocken Festspiel zur E...,chapter,291-320,,,
4,,2020,pub.1124265303,"[[{'first_name': 'Gordon', 'last_name': 'Teske...",Chapter 8. The Period Concept and Seventeenth-...,chapter,145-162,,,


### 2.1 `in [search index]`

This optional phrase consists of the particle `in` followed by a term indicating a `search index`, specifying for example whether the search
is limited to full text, title and abstract only, or title only. 

In [305]:
%%dsldf 
search grants 
    in title_abstract_only for "something" 
return grants limit 5

Returned Grants: 5 (total = 9316)


Unnamed: 0,funders,active_year,original_title,start_year,end_date,project_num,id,funding_org_name,start_date,title_language,title,language
0,"[{'id': 'grid.14467.30', 'longitude': -1.78553...","[2020, 2021, 2022, 2023]",The Cosmology of the Early and Late Universe,2020,2023-09-29,ST/T000732/1,grant.8673892,Science and Technology Facilities Council,2020-09-30,en,The Cosmology of the Early and Late Universe,en
1,"[{'id': 'grid.457875.c', 'longitude': -77.1109...","[2020, 2021, 2022, 2023]",Decoding the Infrared Spectra of High Frequenc...,2020,2023-08-31,1900095,grant.8966252,Directorate for Mathematical & Physical Sciences,2020-09-01,en,Decoding the Infrared Spectra of High Frequenc...,en
2,"[{'id': 'grid.452896.4', 'longitude': 4.359973...","[2020, 2021, 2022, 2023, 2024, 2025]",Overcoming stellar activity in radial velocity...,2020,2025-06-30,865624,grant.8964099,European Research Council,2020-07-01,en,Overcoming stellar activity in radial velocity...,en
3,"[{'id': 'grid.457875.c', 'longitude': -77.1109...","[2020, 2021, 2022, 2023, 2024, 2025]",CAREER: Theory of Membrane Shape Sensing at th...,2020,2025-06-30,1945141,grant.8832391,Directorate for Mathematical & Physical Sciences,2020-07-01,en,CAREER: Theory of Membrane Shape Sensing at th...,en
4,"[{'id': 'grid.457810.f', 'longitude': -77.111,...","[2020, 2021, 2022, 2023, 2024, 2025]",CAREER: Modeling Human Gait to Optimize Exoske...,2020,2025-05-31,1943561,grant.8833367,Directorate for Engineering,2020-06-01,en,CAREER: Modeling Human Gait to Optimize Exoske...,en


For example, let's see which *search fields* are available for the *grants* source:

In [306]:
dsl.query("describe schema")['sources']['grants']['search_fields']

['investigators', 'full_data', 'concepts', 'title_only', 'title_abstract_only']

In [307]:
%%dsldf 
search grants 
    in full_data for "graphene AND computer AND iron" 
return grants limit 5

Returned Grants: 5 (total = 10)


Unnamed: 0,funders,active_year,original_title,start_year,end_date,project_num,id,funding_org_name,start_date,title_language,title,language
0,"[{'id': 'grid.454869.2', 'longitude': 37.64049...","[2019, 2020, 2021]",Weyl and Dirac semimetals and beyond - predict...,2019,2021-12-31,19-43-04129,grant.8413990,Russian Science Foundation,2019-01-01,en,Weyl and Dirac semimetals and beyond - predict...,en
1,"[{'id': 'grid.452899.b', 'longitude': 37.57781...",[2018],Проект организации 18-ой Международной конфере...,2018,2018-12-31,18-02-20097,grant.8731867,Russian Foundation for Basic Research,2018-01-01,ru,Project of the organization of the 18th Intern...,ru
2,"[{'id': 'grid.425823.a', 'longitude': 21.02022...",[2016],Dotacja podmiotowa na utrzymanie potencjału ba...,2016,2016-12-31,4491/E-370/S/2016,grant.7397800,Ministry of Science and Higher Education,2016-02-22,pl,Subject subsidy for maintaining the research p...,pl
3,"[{'id': 'grid.425823.a', 'longitude': 21.02022...",[2015],Dotacja podmiotowa na utrzymanie potencjału ba...,2015,2015-12-31,4491/E-370/S/2015,grant.7397795,Ministry of Science and Higher Education,2015-02-19,pl,Subject subsidy for maintaining the research p...,pl
4,"[{'id': 'grid.425823.a', 'longitude': 21.02022...",[2014],Dotacja celowa na prowadzenie w 2014 przez Wyd...,2014,2014-12-31,4491/E-370/M/2014,grant.7397490,Ministry of Science and Higher Education,2014-04-09,pl,Intentional grant for conducting in 2014 the F...,pl


Special search indexes for person names allow full-text
searches on publication `authors` or grant `investigators`. Please see the
*Searching for researchers* section below for more information
on how searches work in this case.

In [308]:
%dsldf search publications in authors for "\"Jennifer A Doudna\"" return publications limit 5

Returned Publications: 5 (total = 315)


Unnamed: 0,year,id,author_affiliations,title,type,journal.id,journal.title,pages,issue,volume
0,2020,pub.1125959258,"[[{'first_name': 'Simon', 'last_name': 'Eitzin...",Machine learning predicts new anti-CRISPR prot...,article,jour.1018982,Nucleic Acids Research,,,
1,2020,pub.1126635310,[[{'first_name': 'Innovative Genomics Institut...,Blueprint for a Pop-up SARS-CoV-2 Testing Lab,article,jour.1369542,medRxiv,2020.04.11.20061424,,
2,2020,pub.1125548532,"[[{'first_name': 'Kyle E.', 'last_name': 'Watt...",Potent CRISPR-Cas9 inhibitors from Staphylococ...,article,jour.1082971,Proceedings of the National Academy of Science...,6531-6539,12.0,117.0
3,2020,pub.1125325301,"[[{'first_name': 'Ivan E.', 'last_name': 'Ivan...",Cas9 interrogates DNA in discrete steps modula...,article,jour.1082971,Proceedings of the National Academy of Science...,5853-5860,11.0,117.0
4,2020,pub.1125677167,"[[{'first_name': 'Michelle F.', 'last_name': '...",Phage-assisted evolution of an adenine base ed...,article,jour.1115214,Nature Biotechnology,1-9,,


### 2.2 `for "search term"`

This optional phrase consists of the keyword `for` followed by a
`search term` `string`, enclosed in double quotes (`"`).

Strings in double quotes can contain nested quotes escaped by a
backslash `\`. This will ensure that the string in nested double quotes
is searched for as if it was a single phrase, not multiple words.

An example of a phrase: `"\"Machine Learning\""` : results must contain
`Machine Learning` as a phrase.

In [309]:
%dsldf search publications for "\"Machine Learning\"" return publications limit 5

Returned Publications: 5 (total = 1105282)


Unnamed: 0,year,id,author_affiliations,title,type,pages,issue,volume,journal.id,journal.title
0,2020,pub.1124674945,"[[{'first_name': 'Tom Arne', 'last_name': 'Ped...",Towards Simulation-based Verification of Auton...,chapter,1-13,,,,
1,2020,pub.1124670304,"[[{'first_name': 'Jon Arne', 'last_name': 'Glo...",Trustworthy versus Explainable AI in Autonomou...,chapter,37-47,,,,
2,2020,pub.1125933988,"[[{'first_name': 'Baze University', 'last_name...",Solving some variants of vehicle routing probl...,article,63-73,1.0,4.0,jour.1317852,Open Journal of Mathematical Sciences
3,2020,pub.1124109981,"[[{'first_name': 'AlexanderVE', 'last_name': '...","Statistics, Data Mining, and Machine Learning ...",monograph,,,,,
4,2020,pub.1124669794,"[[{'first_name': 'Zhongyi', 'last_name': 'Sui'...",Empirical analysis of complex network for mari...,chapter,24-36,,,,


An example of multiple keywords: `"Machine Learning"`. Without the nested
quotes, the keywords are searched for independently.

In [310]:
%dsldf search publications for "Machine Learning" return publications limit 5

Returned Publications: 5 (total = 2339984)


Unnamed: 0,pages,year,id,author_affiliations,type,title
0,297-312,2020,pub.1124034464,"[[{'first_name': 'Bettina', 'last_name': 'Frit...",chapter,„…the trace of the other in the self“
1,201-218,2020,pub.1124034459,"[[{'first_name': 'Susanne', 'last_name': 'Ress...",chapter,Positioned struggles over history
2,333-463,2020,pub.1125380184,,chapter,III. BRĀHMANISM
3,1-13,2020,pub.1124674945,"[[{'first_name': 'Tom Arne', 'last_name': 'Ped...",chapter,Towards Simulation-based Verification of Auton...
4,507-801,2020,pub.1126070643,,chapter,ANHANG. Part 1


Note: Special characters, such as any of `^ " : ~ \ [ ] { } ( ) ! | & +` must be escaped by a backslash `\`. Also, please note escaping rules in
[Python](http://python-reference.readthedocs.io/en/latest/docs/str/escapes.html) (or other languages). For example, when writing a query with escaped quotes, such as `search publications for "\"phrase 1\" AND \"phrase 2\""`, in Python, it is necessary to escape the backslashes as well, so it
would look like: `'search publications for "\\"phrase 1\\" AND \\"phrase 2\\""'`. 

See the [official docs](https://docs.dimensions.ai/dsl/language.html#for-search-term) for more details. 
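
The escaping rules above can be checked directly in Python. The sketch below shows that a raw string and a string with doubled backslashes hold the same value, and adds two hypothetical helpers (`escape_term` and `phrase`, not part of Dimcli) that apply the backslash escaping to the special characters listed in the note.

```python
# Characters that the DSL treats as special inside a search term
SPECIAL_CHARS = set('^":~\\[]{}()!|&+')


def escape_term(text):
    """Prefix every special character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def phrase(text):
    """Wrap text in escaped double quotes so it is searched as one phrase."""
    return '\\"' + escape_term(text) + '\\"'


# A raw string and a string with doubled backslashes are the same Python value
raw = r'search publications for "\"phrase 1\" AND \"phrase 2\""'
doubled = 'search publications for "\\"phrase 1\\" AND \\"phrase 2\\""'
print(raw == doubled)

# The same query, built with the helpers
built = 'search publications for "' + phrase("phrase 1") + " AND " + phrase("phrase 2") + '"'
print(built)
print(escape_term("C++"))
```

Raw strings (`r'...'`) are often the more readable option, because the query appears exactly as it would be written in the DSL.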




### 2.3 Boolean Operators

A search term can consist of multiple keywords or phrases connected using
boolean logic operators, e.g. `AND`, `OR` and `NOT`.

In [311]:
%dsldf search publications for "(dose AND concentration)" return publications limit 5

Returned Publications: 5 (total = 5214760)


Unnamed: 0,pages,year,id,author_affiliations,type,title,issue,volume,journal.id,journal.title
0,125-158,2020,pub.1125801743,"[[{'first_name': 'Letizia', 'last_name': 'Moni...",chapter,5. Chemical Alteration and Colour Changes in t...,,,,
1,219-238,2020,pub.1124265256,"[[{'first_name': 'Samantha', 'last_name': 'Ash...",chapter,Chapter 11. Political Obligation and the Rule ...,,,,
2,25-49,2020,pub.1125630589,"[[{'first_name': 'Margaret Clunies', 'last_nam...",chapter,The Skald Sagas as a Genre: Definitions and Ty...,,,,
3,1694347,2020,pub.1124196768,"[[{'first_name': 'Anne M', 'last_name': 'de Gr...",article,Effectiveness of a peer-refugee delivered psyc...,1.0,11.0,jour.1045059,European Journal of Psychotraumatology
4,1665-1850,2020,pub.1126070754,,chapter,Le socialisme en Allemagne (Fragment du brouil...,,,,


When specifying Boolean operators with keywords such as `AND`, `OR` and
`NOT`, the keywords must appear in all uppercase. 

The operators available are shown in the table below.

| Boolean Operator | Alternative Symbol | Description                                                                                                                                                                 |
|------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `AND`            | `&&`               | Requires both terms on either side of the Boolean operator to be present for a match.                                                                                       |
| `NOT`            | `!`                | Requires that the following term not be present.                                                                                                                            |
| `OR`             | `\|\|`             | Requires that either term (or both terms) be present for a match.                                                                                                           |
|                  | `+`                | Requires that the following term be present.                                                                                                                                |
|                  | `-`                | Prohibits the following term (that is, matches on fields or documents that do not include that term). The `-` operator is functionally similar to the Boolean operator `!`. |

In [312]:
%dsldf search publications for "(dose OR concentration) AND (-malaria +africa)" return publications limit 5

Returned Publications: 5 (total = 1332804)


Unnamed: 0,year,id,title,type,pages,author_affiliations
0,2020,pub.1124946803,Literature Cited,chapter,487-564,
1,2020,pub.1125629784,7. The Aryan question: Some general considerat...,chapter,177-191,"[[{'first_name': 'Arvind', 'last_name': 'Sharm..."
2,2020,pub.1126070451,Heft 4. Exzerpte zur Geschichte Spaniens aus W...,chapter,727-852,
3,2020,pub.1124248677,"12. Culture, Institutions, and Development",chapter,414-448,"[[{'first_name': 'Gerard', 'last_name': 'Rolan..."
4,2020,pub.1124248673,8. Trade-Related Institutions and Development,chapter,255-307,"[[{'first_name': 'Jaime', 'last_name': 'de Mel..."
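
Boolean expressions like the one above can be composed from smaller pieces. The helpers below (`all_of`, `any_of`, `require`, `exclude`) are hypothetical names introduced for illustration; they only concatenate strings using the operators from the table and do not escape special characters.

```python
def all_of(*terms):
    """Join terms with AND, wrapped in parentheses."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms):
    """Join terms with OR, wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def require(term):
    """Mark a term as required using the + prefix."""
    return "+" + term


def exclude(term):
    """Mark a term as prohibited using the - prefix."""
    return "-" + term


term = any_of("dose", "concentration") + " AND (" + exclude("malaria") + " " + require("africa") + ")"
query = 'search publications for "' + term + '" return publications limit 5'
print(query)
```

Note that the operator keywords are written in uppercase, as the DSL requires.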


### 2.4 Wildcard Searches

The DSL supports single and multiple character wildcard searches within
single terms. Wildcard characters can be applied to single terms, but
not to search phrases.

In [313]:
%dsldf search publications in title_only for "ital? malaria" return publications limit 5

Returned Publications: 5 (total = 142)


Unnamed: 0,pages,year,id,author_affiliations,type,title,journal.id,journal.title,volume,issue
0,1-20,2020,pub.1124231018,"[[{'first_name': 'Benjamin', 'last_name': 'Rei...",article,"Seasons in Italy: Northern European travelers,...",jour.1141817,Journal of Tourism and Cultural Change,,
1,101544,2020,pub.1123222257,"[[{'first_name': 'Guido', 'last_name': 'Caller...",article,Updated guidelines for malaria prophylaxis in ...,jour.1034401,Travel Medicine and Infectious Disease,33.0,
2,28-33,2020,pub.1125332077,"[[{'first_name': 'Luciana', 'last_name': 'Lepo...",article,Clinical management of imported malaria in Ita...,jour.1089291,Microbiologica,43.0,1.0
3,151,2019,pub.1113815431,"[[{'first_name': 'Valentina', 'last_name': 'Ta...",article,Investigation on potential malaria vectors (An...,jour.1030597,Malaria Journal,18.0,1.0
4,34-39,2019,pub.1113201846,"[[{'first_name': 'Fiorenza', 'last_name': 'Pan...",article,Increasing imported malaria in children and ad...,jour.1034401,Travel Medicine and Infectious Disease,29.0,


In [314]:
%dsldf search publications in title_only for "it* malaria" return publications limit 5

Returned Publications: 5 (total = 1491)


Unnamed: 0,pages,id,title,volume,type,year,author_affiliations,issue,journal.id,journal.title
0,24,pub.1124106064,The effectiveness of older insecticide-treated...,19.0,article,2020,"[[{'first_name': 'Monica P.', 'last_name': 'Sh...",1.0,jour.1030597,Malaria Journal
1,109809,pub.1126819455,Modeling pyrethroids repellency and its role o...,136.0,article,2020,"[[{'first_name': 'Berge', 'last_name': 'Tsanou...",,jour.1026215,Chaos Solitons & Fractals
2,,pub.1126848860,Efficacy of Artemisinin-lumefantrine for the t...,,preprint,2020,"[[{'first_name': 'Gabriel M.', 'last_name': 'K...",,jour.1380788,Research Square
3,jbc.ra120.012646,pub.1126501008,Genetic ablation of the mitoribosome in the ma...,,article,2020,"[[{'first_name': 'Liqin', 'last_name': 'Ling',...",,jour.1077138,Journal of Biological Chemistry
4,271,pub.1126492745,Analysis of the Role of TpUB05 Antigen from Th...,9.0,article,2020,"[[{'first_name': 'Jerome Nyhalah', 'last_name'...",4.0,jour.1047674,Pathogens


| Wildcard Search Type                                             | Special Character | Example                                                                                                                                                                                                                         |
|------------------------------------------------------------------|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Single character - matches a single character                    | `?`               | The search string `te?t` would match both `test` and `text`.                                                                                                                                                                    |
| Multiple characters - matches zero or more sequential characters | `*`               | The wildcard search: `tes*` would match `test`, `testing`, and `tester`. You can also use wildcard characters in the middle of a term. For example: `te*t` would match `test` and `text`. `*est` would match `pest` and `test`. |
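
The wildcard semantics in the table can be tried out locally. The sketch below uses Python's standard `fnmatch` module, whose `?` and `*` happen to have the same meaning for these simple patterns. It is only an analogy to build intuition, and it makes no claim about how the DSL search engine matches terms internally.

```python
from fnmatch import fnmatchcase


def matching(pattern, words):
    """Return the words that match a ?/* wildcard pattern in full."""
    return [word for word in words if fnmatchcase(word, pattern)]


words = ["test", "text", "testing", "tester", "pest"]

print(matching("te?t", words))  # single-character wildcard
print(matching("tes*", words))  # trailing multi-character wildcard
print(matching("*est", words))  # leading multi-character wildcard
```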

### 2.5 Proximity Searches

A proximity search looks for terms that are within a specific distance
from one another.

To perform a proximity search, add the tilde character `~` and a numeric
value to the end of a search phrase. For example, to search for
`formal` and `model` within 10 words of each other in a document, use
the search:

In [315]:
%dsldf search publications for "\"formal model\"~10" return publications limit 5

Returned Publications: 5 (total = 463369)


Unnamed: 0,pages,id,title,type,year,author_affiliations
0,189-217,pub.1124248671,"6. Institutions, Development, and Growth: Wher...",chapter,2020,"[[{'first_name': 'Steven N.', 'last_name': 'Du..."
1,294-312,pub.1124248742,17. History and the Unanswered Questions of th...,chapter,2020,"[[{'first_name': 'Francis J.', 'last_name': 'G..."
2,198-220,pub.1124946785,11. Song Traditions in Indigo Buntings: Origin...,chapter,2020,"[[{'first_name': 'Robert B.', 'last_name': 'Pa..."
3,42-71,pub.1125789332,2. Trends in Private Sector Grievance Arbitration,chapter,2020,"[[{'first_name': 'Dennis R.', 'last_name': 'No..."
4,595-633,pub.1124248681,"16. Institutions, Firm Financing, and Growth",chapter,2020,"[[{'first_name': 'Meghana', 'last_name': 'Ayya..."


In [316]:
%dsldf search publications for "\"digital humanities\"~5  +ontology" return publications limit 5

Returned Publications: 5 (total = 6617)


Unnamed: 0,year,id,author_affiliations,title,type,pages,volume,journal.id,journal.title,issue
0,2020,pub.1124427860,"[[{'first_name': 'Vincent', 'last_name': 'Jail...","Describing, comparing and analysing digital ur...",article,e00135,17.0,jour.1151279,Digital Applications in Archaeology and Cultur...,
1,2020,pub.1125401563,"[[{'first_name': 'Luis A.', 'last_name': 'Pine...",Practical non-monotonic knowledge-base system ...,article,102214,57.0,jour.1121462,Information Processing & Management,3.0
2,2020,pub.1126922287,,Transformative Digital Humanities,monograph,,,,,
3,2020,pub.1126879229,"[[{'first_name': 'Alba', 'last_name': 'Silva',...",Mobile Devices and Mobile Content,chapter,35-50,,,,
4,2020,pub.1126879230,"[[{'first_name': 'Xosé', 'last_name': 'López-G...",New Narratives in the Age of Visualization,chapter,51-63,,,,


The distance referred to here is the number of term movements needed to match the specified phrase.
In the `"formal model"` example above, if `formal` and `model` were 10 positions apart in a
field, but `model` appeared before `formal`, more than 10 term movements
would be required to bring the terms together and place `model` to
the right of `formal` with a space in between.
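
A proximity term is simply an escaped phrase followed by `~` and the distance, so it is easy to generate. The `proximity` function below is a hypothetical helper, sketched on the assumption that the distance should be a non-negative whole number.

```python
def proximity(phrase_text, distance):
    """Build an escaped phrase with a ~distance suffix (hypothetical helper)."""
    if not isinstance(distance, int) or distance < 0:
        raise ValueError("distance must be a non-negative integer")
    return '\\"' + phrase_text + '\\"~' + str(distance)


query = 'search publications for "' + proximity("formal model", 10) + '" return publications limit 5'
print(query)
```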

### 2.6 Term Boosting

The DSL computes a relevance level for matching documents based on the
terms found. To boost a term, append the caret symbol `^` and a boost
factor (a number) to the term you are searching for. The higher
the boost factor, the more relevant the term will be.

Boosting lets you control the relevance of a document by
boosting its terms. For example, if you are searching for `formal model` and
you want the term `formal` to be more relevant, add the `^` symbol
and the boost factor immediately after the term:

In [317]:
%dsldf search publications for "formal^4 model" return publications limit 5 

Returned Publications: 5 (total = 4351279)


Unnamed: 0,year,id,author_affiliations,title,type,pages
0,2020,pub.1122051578,"[[{'first_name': 'Andrea', 'last_name': 'Crow'...",CHAPTER X. Mobilizing Academic Labor,chapter,192-205
1,2020,pub.1123020742,"[[{'first_name': 'Jelena', 'last_name': 'Rafai...",An Economic Survey of the Kingdom of Yugoslavia,chapter,35-56
2,2020,pub.1124487011,"[[{'first_name': 'Olivier', 'last_name': 'Roy'...",7. The Failure of Political Islam Revisited,chapter,167-180
3,2020,pub.1124034442,"[[{'first_name': 'Achim', 'last_name': 'Oberg'...",Netzwerke,chapter,191-218
4,2020,pub.1124034535,"[[{'first_name': 'Thomas', 'last_name': 'Hilge...","Helden, Freaks und dunkle Ritter: Batman in Ho...",chapter,177-214


This will make documents with the term `formal` appear more relevant.
You can also boost phrase terms, as in this example:

In [318]:
%dsldf search publications for "\"formal model\"^4 \"Lindenmayer system\"" return publications limit 5  

Returned Publications: 5 (total = 1149)


Unnamed: 0,pages,issue,year,id,author_affiliations,type,title,volume,journal.id,journal.title
0,54-65,1,2020,pub.1120164618,"[[{'first_name': 'Brenden M.', 'last_name': 'L...",article,People Infer Recursive Visual Concepts from Ju...,3,jour.1320442,Computational Brain & Behavior
1,,,2020,pub.1113605618,,book,From Astrophysics to Unconventional Computatio...,35,,
2,,,2020,pub.1126386421,,book,"Applications of Evolutionary Computation, 23rd...",12104,,
3,1615-1627,9,2019,pub.1117308212,"[[{'first_name': 'Cédric', 'last_name': 'Gauch...",article,GAUCHEREL et al.,10,jour.1044080,Methods in Ecology and Evolution
4,107-131,1-2,2019,pub.1113061099,"[[{'first_name': 'José L.', 'last_name': 'Besa...",article,"Math and Music, Models and Metaphors: Alberto ...",38,jour.1138316,Contemporary Music Review


By default, the boost factor is 1. Although the boost factor must be
positive, it can be less than 1 (for example, it could be 0.2).
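
The rule that a boost factor must be positive can be enforced when generating boosted terms. `boost` is a hypothetical helper, shown here as a sketch rather than an official utility.

```python
def boost(term, factor):
    """Append ^factor to a term or escaped phrase (hypothetical helper)."""
    if factor <= 0:
        raise ValueError("the boost factor must be positive")
    # the "g" format keeps 4 as "4" and 0.2 as "0.2"
    return term + "^" + format(factor, "g")


print(boost("formal", 4) + " model")
print(boost('\\"formal model\\"', 4) + ' \\"Lindenmayer system\\"')
print(boost("formal", 0.2))
```

The first two lines printed correspond to the search terms used in the two boosting examples above.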

## 3. Field Searching

Field searching allows you to use a specific `field` of a `source` as a
query filter. For example, this can be a
[literal](supported-types.ipynb) field such as the *type* of a
publication, its *date*, *mesh terms*, etc. Or it can be an
[entity](data-entities.ipynb) field, such as the *journal title* for a
publication, the *country name* of its author affiliations, etc.

**What are the fields available for each source?** See the [data sources](https://docs.dimensions.ai/dsl/data-sources.html) section of the documentation. 

Alternatively, we can use the 'schema' API ([describe](https://docs.dimensions.ai/dsl/data-sources.html#metadata-api)) to return this information programmatically: 

In [319]:
%dsldocs publications  

Unnamed: 0,sources,field,type,description,is_filter,is_entity,is_facet
0,publications,altmetric,float,Altmetric attention score.,True,False,False
1,publications,altmetric_id,integer,AltMetric Publication ID,True,False,False
2,publications,authors,json,Ordered list of authors names and their affili...,True,False,False
3,publications,book_doi,string,The DOI of the book a chapter belongs to (note...,True,False,False
4,publications,book_series_title,string,"The title of the book series book, belong to.",False,False,False
5,publications,book_title,string,The title of the book a chapter belongs to (no...,False,False,False
6,publications,category_bra,categories,`Broad Research Areas <https://app.dimensions....,True,True,True
7,publications,category_for,categories,`ANZSRC Fields of Research classification <htt...,True,True,True
8,publications,category_hra,categories,`Health Research Areas <https://app.dimensions...,True,True,True
9,publications,category_hrcs_hc,categories,`HRCS - Health Categories <https://app.dimensi...,True,True,True


### 3.1 `where`

This optional phrase consists of the keyword `where` followed by a
`filters` phrase consisting of DSL filter expressions, as described
below.

In [320]:
%dsldf search publications where type = "book" return publications limit 5

Returned Publications: 5 (total = 286539)


Unnamed: 0,id,title,type,year
0,pub.1124034527,Hollywood im Zeitalter des Post Cinema,book,2020
1,pub.1124034601,Wagner - Weimar - Eisenach,book,2020
2,pub.1124946767,Ecology and Evolution of Acoustic Communicatio...,book,2020
3,pub.1124907235,Weibliche Herrschaft im 18. Jahrhundert,book,2020
4,pub.1125300612,Contents Tourism and Pop Culture Fandom,book,2020


If a `for` phrase is also used in a filtered query, the
system will first apply the filters, and then search the resulting
restricted set of documents for the `search term`.

In [321]:
%dsldf search publications for "malaria" where type = "book" return publications limit 5

Returned Publications: 5 (total = 12194)


Unnamed: 0,year,id,type,title
0,2020,pub.1126879214,book,The Routledge History Handbook of Central and ...
1,2020,pub.1126251587,book,21st Century Nanoscience – A Handbook
2,2020,pub.1126499453,book,Iteration
3,2020,pub.1126265096,book,Médecins Sans Frontières and Humanitarian Situ...
4,2020,pub.1126251608,book,Microbial Mitigation of Stress Response of Foo...
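
When queries like the two above are generated from code rather than typed by hand, the optional `for` and `where` phrases can be assembled with a small helper. The sketch below is illustrative only: `build_search` is a hypothetical function (it is not part of Dimcli), and it only builds the query string without calling the API.

```python
def build_search(source, term=None, where=None, limit=5):
    """Assemble a DSL query string; `term` and `where` are both optional."""
    parts = [f"search {source}"]
    if term is not None:
        # Escape embedded double quotes so the DSL string stays well-formed.
        escaped = term.replace('"', '\\"')
        parts.append(f'for "{escaped}"')
    if where is not None:
        parts.append(f"where {where}")
    parts.append(f"return {source} limit {limit}")
    return " ".join(parts)


query = build_search("publications", term="malaria", where='type = "book"')
print(query)
# With a logged-in Dimcli session (the `dsl` object created at the top of this
# notebook), the string could then be run with something like:
#   dsl.query(query).as_dataframe()
```

Keeping the query construction separate from its execution makes it easy to log or inspect the exact DSL statement before sending it.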


### 3.2 `in`

For convenience, the DSL also supports shorthand notation for filters
where a particular field should be restricted to a specified range or
list of values (although the same logic may be expressed using complex
filters as shown below).

Syntax: a **range filter** consists of the `field` name, the keyword `in`, and a
range of values enclosed in square brackets (`[]`), where the range
consists of a `low` value, colon `:`, and a `high` value.

In [322]:
%%dsldf 
search grants 
    for "malaria" 
    where start_year in [ 2010 : 2015 ] 
return grants limit 5

Returned Grants: 5 (total = 2965)


Unnamed: 0,id,start_date,project_num,end_date,start_year,title_language,title,original_title,funders,language,active_year,funding_org_name
0,grant.4729738,2015-12-28,R21AI120981,2017-11-30,2015,en,Bloodborne tropical pathogen detection using m...,Bloodborne tropical pathogen detection using m...,"[{'id': 'grid.419681.3', 'state_name': 'Maryla...",en,"[2015, 2016, 2017]",National Institute of Allergy and Infectious D...
1,grant.4729736,2015-12-24,R21AI120973,2019-02-28,2015,en,Field-deployable Assay for Differential Diagno...,Field-deployable Assay for Differential Diagno...,"[{'id': 'grid.419681.3', 'state_name': 'Maryla...",en,"[2015, 2016, 2017, 2018, 2019]",National Institute of Allergy and Infectious D...
2,grant.4729699,2015-12-21,R21AI109439,2018-11-30,2015,en,T cell driven antigen discovery for vaccine ca...,T cell driven antigen discovery for vaccine ca...,"[{'id': 'grid.419681.3', 'state_name': 'Maryla...",en,"[2015, 2016, 2017, 2018]",National Institute of Allergy and Infectious D...
3,grant.4854433,2015-12-18,91488,2018-12-18,2015,en,Senior Fellowship for Dr. Eduardo Samo Gudo: E...,Senior Fellowship for Dr. Eduardo Samo Gudo: E...,"[{'id': 'grid.452969.5', 'linkout': ['https://...",en,"[2015, 2016, 2017, 2018]",Volkswagen Foundation
4,grant.8821176,2015-12-10,MIS-311250,2019-09-30,2015,en,"Biology, Ecology & Management of Emerging Dise...","Biology, Ecology & Management of Emerging Dise...","[{'id': 'grid.482914.2', 'state_name': 'Distri...",en,"[2015, 2016, 2017, 2018, 2019]",National Institute of Food and Agriculture


Syntax: a **list filter** consists of the `field` name, the keyword `in`, and a list
of one or more `value`s enclosed in square brackets (`[]`), where
values are separated by commas (`,`):

In [323]:
%%dsldf 
search grants 
    for "malaria" 
    where research_org_name in [ "UC Berkeley", "UC Davis", "UCLA"  ] 
return grants limit 5

Returned Grants: 0
Field 'research_org_name' is deprecated in favor of research_orgs. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
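
Both shorthand forms are easy to generate from Python values. The following is a minimal sketch with hypothetical helper names (`range_filter`, `list_filter`, `range_as_comparisons`); the last helper spells out the longer comparison form and assumes that the range is inclusive at both ends, which should be checked against the official documentation.

```python
def range_filter(field, low, high):
    """Range shorthand: field in [ low : high ]."""
    return f"{field} in [ {low} : {high} ]"


def list_filter(field, values):
    """List shorthand: field in [ v1, v2, ... ]; strings are double-quoted."""
    rendered = ", ".join(f'"{v}"' if isinstance(v, str) else str(v) for v in values)
    return f"{field} in [ {rendered} ]"


def range_as_comparisons(field, low, high):
    """Longer equivalent of a range filter (ASSUMPTION: inclusive bounds)."""
    return f"{field} >= {low} and {field} <= {high}"


print(range_filter("start_year", 2010, 2015))
print(list_filter("type", ["book", "chapter"]))
print(range_as_comparisons("start_year", 2010, 2015))
```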


### 3.3 `count` - filter function

The filter function `count` is supported on some fields in
[publications](publications.ipynb) (e.g. `researchers` and
`research_orgs`).

Use of this filter is shown in the example below:

In [324]:
%%dsldf 
search publications 
    for "malaria" 
    where count(research_orgs) > 5 
return research_orgs limit 5

Returned Research_orgs: 5


Unnamed: 0,id,count,state_name,city_name,latitude,name,types,country_name,linkout,longitude,acronym
0,grid.4991.5,1445,Oxfordshire,Oxford,51.753437,University of Oxford,[Education],United Kingdom,[http://www.ox.ac.uk/],-1.25401,
1,grid.8991.9,1374,Camden,London,51.5209,London School of Hygiene & Tropical Medicine,[Education],United Kingdom,[http://www.lshtm.ac.uk/],-0.1307,LSHTM
2,grid.38142.3c,976,Massachusetts,Cambridge,42.377052,Harvard University,[Education],United States,[http://www.harvard.edu/],-71.11665,
3,grid.21107.35,802,Maryland,Baltimore,39.328888,Johns Hopkins University,[Education],United States,[https://www.jhu.edu/],-76.62028,JHU
4,grid.7445.2,721,Westminster,London,51.4986,Imperial College London,[Education],United Kingdom,[http://www.imperial.ac.uk/],-0.175478,


Publications with more than 50 researchers.

In [325]:
%%dsldf 
search publications 
    for "malaria" 
    where count(researchers) > 50 
return publications limit 5

Returned Publications: 5 (total = 173)


Unnamed: 0,pages,year,id,author_affiliations,type,title,journal.id,journal.title,issue,volume
0,1-14,2020,pub.1126151286,"[[{'first_name': 'Drahomíra', 'last_name': 'Fa...",article,Genetic tool development in marine protists: e...,jour.1033763,Nature Methods,,
1,1345-1360,2020,pub.1125560167,"[[{'first_name': 'Rob W', 'last_name': 'van de...",article,Triple artemisinin-based combination therapies...,jour.1077219,The Lancet,10233.0,395.0
2,60-79,2020,pub.1122257132,[[{'first_name': 'GBD 2017 Lower Respiratory I...,article,Quantifying risks and interventions that have ...,jour.1030033,The Lancet Infectious Diseases,1.0,20.0
3,1749-1768,2019,pub.1121303277,"[[{'first_name': 'Christina', 'last_name': 'Fi...",article,"Global, Regional, and National Cancer Incidenc...",jour.1051466,JAMA Oncology,12.0,5.0
4,1457-1973,2019,pub.1121951380,"[[{'first_name': 'Andrea', 'last_name': 'Cossa...",article,Guidelines for the use of flow cytometry and c...,jour.1054998,European Journal of Immunology,10.0,49.0


Number of publications with more than one researcher, counted per funder.

In [326]:
%%dsldf 
search publications
where count(researchers) > 1
return funders limit 5

Returned Funders: 5


Unnamed: 0,id,count,longitude,linkout,types,name,country_name,city_name,acronym,latitude,state_name
0,grid.419696.5,1711981,116.33983,[http://www.nsfc.gov.cn/publish/portal1/],[Government],National Natural Science Foundation of China,China,Beijing,NSFC,40.005177,
1,grid.270680.b,629227,4.36367,[http://ec.europa.eu/index_en.htm],[Government],European Commission,Belgium,Brussels,EC,50.85165,
2,grid.48336.3a,551777,-77.10119,[http://www.cancer.gov/],[Government],National Cancer Institute,United States,Rockville,NCI,39.004326,Maryland
3,grid.424020.0,550809,116.316284,[http://www.most.gov.cn/eng/],[Government],Ministry of Science and Technology of the Peop...,China,Beijing,MOST,39.827835,
4,grid.54432.34,518605,139.74039,[http://www.jsps.go.jp/],[Nonprofit],Japan Society for the Promotion of Science,Japan,Tokyo,JSPS,35.68716,


International collaborations: number of publications, counted per funder, with more than one author and affiliations located in more than one country.

In [327]:
%%dsldf 
search publications
where count(researchers) > 1
and count(research_org_countries) > 1
return funders limit 5

Returned Funders: 5


Unnamed: 0,id,count,city_name,latitude,name,acronym,types,country_name,linkout,longitude
0,grid.419696.5,416514,Beijing,40.005177,National Natural Science Foundation of China,NSFC,[Government],China,[http://www.nsfc.gov.cn/publish/portal1/],116.33983
1,grid.270680.b,321979,Brussels,50.85165,European Commission,EC,[Government],Belgium,[http://ec.europa.eu/index_en.htm],4.36367
2,grid.424150.6,147782,Bonn,50.69934,German Research Foundation,DFG,[Facility],Germany,[http://www.dfg.de/en/],7.147797
3,grid.424020.0,138046,Beijing,39.827835,Ministry of Science and Technology of the Peop...,MOST,[Government],China,[http://www.most.gov.cn/eng/],116.316284
4,grid.54432.34,130524,Tokyo,35.68716,Japan Society for the Promotion of Science,JSPS,[Nonprofit],Japan,[http://www.jsps.go.jp/],139.74039


Domestic collaborations: number of publications, counted per funder, with more than one author and affiliations all located in a single country.

In [328]:
%%dsldf 
search publications
where count(researchers) > 1
and count(research_org_countries) = 1
return funders limit 5

Returned Funders: 5


Unnamed: 0,id,count,city_name,latitude,name,acronym,types,country_name,linkout,longitude,state_name
0,grid.419696.5,1254361,Beijing,40.005177,National Natural Science Foundation of China,NSFC,[Government],China,[http://www.nsfc.gov.cn/publish/portal1/],116.33983,
1,grid.48336.3a,404524,Rockville,39.004326,National Cancer Institute,NCI,[Government],United States,[http://www.cancer.gov/],-77.10119,Maryland
2,grid.424020.0,401857,Beijing,39.827835,Ministry of Science and Technology of the Peop...,MOST,[Government],China,[http://www.most.gov.cn/eng/],116.316284,
3,grid.54432.34,355863,Tokyo,35.68716,Japan Society for the Promotion of Science,JSPS,[Nonprofit],Japan,[http://www.jsps.go.jp/],139.74039,
4,grid.280785.0,312686,Bethesda,38.997833,National Institute of General Medical Sciences,NIGMS,[Facility],United States,[http://www.nigms.nih.gov/Pages/default.aspx],-77.09938,Maryland
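
The counts from these queries can be combined on the client side, for example to estimate what share of each funder's multi-author publications are international collaborations. The sketch below uses only the counts printed in the outputs above (queries `In [326]` and `In [327]`); because each query returned just five funders, only funders present in both lists can be compared.

```python
# Publication counts per funder ID, copied from the query outputs above.
multi_author = {
    "grid.419696.5": 1711981,  # National Natural Science Foundation of China
    "grid.270680.b": 629227,   # European Commission
    "grid.424020.0": 550809,   # Ministry of Science and Technology (China)
    "grid.54432.34": 518605,   # Japan Society for the Promotion of Science
}
international = {
    "grid.419696.5": 416514,
    "grid.270680.b": 321979,
    "grid.424020.0": 138046,
    "grid.54432.34": 130524,
}

# Share of multi-author publications with affiliations in more than one country.
shares = {
    funder_id: international[funder_id] / multi_author[funder_id]
    for funder_id in international
    if funder_id in multi_author
}

for funder_id, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{funder_id}: {share:.1%}")
```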


### 3.4 Filter Operators

A simple filter expression consists of a `field` name, an in-/equality
operator `op`, and the desired field `value`. 

The `value` must be a
`string` enclosed in double quotes (`"`) or an integer (e.g. `1234`).

The available operators are:

| `op`           | meaning                                                                                  |
|----------------|------------------------------------------------------------------------------------------|
| `=`            | *is* (or *contains* if the given `field` is multi-value)                                 |
| `!=`           | *is not*                                                                                 |
| `>`            | *is greater than*                                                                        |
| `<`            | *is less than*                                                                           |
| `>=`           | *is greater than or equal to*                                                            |
| `<=`           | *is less than or equal to*                                                               |
| `~`            | *partially matches* (see section 3.5 below)                                              |
| `is empty`     | *is empty* (see section 3.6 below)                                                       |
| `is not empty` | *is not empty* (see section 3.6 below)                                                   |

A couple of examples:

In [329]:
%dsldf search datasets where year > 2010 and year < 2012 return datasets limit 5

Returned Datasets: 5 (total = 38337)


Unnamed: 0,id,title,authors,year,keywords,journal.id,journal.title
0,10993892,(Table 1) Radiocarbon ages of samples taken fr...,"[{'name': 'Minna Väliranta', 'orcid': ''}, {'n...",2011,[PANGAEA],jour.1020344,Journal of Biogeography
1,10993247,Average fluorescence and dissolved iron and Fe...,"[{'name': 'Charles-Edouard Thuróczy', 'orcid':...",2011,[PANGAEA],jour.1023157,Deep Sea Research Part II Topical Studies in O...
2,10993244,(Table 1) Average fluorescence in the surface ...,"[{'name': 'Charles-Edouard Thuróczy', 'orcid':...",2011,[PANGAEA],,
3,10993241,Dissolved and dissolvable iron concentrations ...,"[{'name': 'Charles-Edouard Thuróczy', 'orcid':...",2011,[PANGAEA],jour.1312079,Journal of Geophysical Research
4,10993193,(Table 1) Movement parameters of nine adult fe...,"[{'name': 'Jean-François Therrien', 'orcid': '...",2011,[PANGAEA],jour.1023041,Journal of Avian Biology


In [330]:
%dsldf search patents where assignees != "grid.410484.d" return patents limit 5

Returned Patents: 5 (total = 39704493)


Unnamed: 0,assignees,filing_status,publication_date,year,inventor_names,id,granted_year,times_cited,assignee_names,title
0,"[{'id': 'grid.6584.f', 'name': 'Robert Bosch (...",Grant,2009-12-09,2001,"[TUMBACK, STEFAN, SCHNELLE, KLAUS-PETER]",EP-1409282-B1,2009.0,0,"[Robert Bosch GmbH, BOSCH GMBH ROBERT]",METHODS FOR OPERATING A MOTOR VEHICLE DRIVEN B...
1,,Application,2009-12-10,2009,"[SHKEDI, ROY]",WO-2009149128-A2,,1,[SHKEDI ROY],TARGETED TELEVISION ADVERTISEMENTS ASSOCIATED ...
2,"[{'id': 'grid.418190.5', 'name': 'Thermo Fishe...",Grant,2009-12-09,1996,"[RIVIELLO, JOHN, M., REY, MARIA, A.]",EP-0868664-B1,2009.0,0,"[Dionex Corp, DIONEX CORP]",MULTI-CYCLE LOOP INJECTION FOR TRACE ANALYSIS ...
3,"[{'id': 'grid.471210.1', 'name': 'Kuraray (Jap...",Grant,2009-12-09,1998,"[TANAKA, EIJI, HIGASHI, TAMIO, KITAMURA, TAKAN...",EP-0861808-B1,2009.0,1,"[Kuraray Co Ltd, KURARAY CO]",Waste water treatment apparatus
4,"[{'id': 'grid.471143.4', 'name': 'Fujikura (Ja...",Grant,2009-12-09,1997,"[NAKAI, MICHIHIRO, SHIMA, KENSUKE, HIDAKA, HIR...",EP-0805365-B1,2009.0,0,"[Fujikura Ltd, FUJIKURA LTD]",Optical waveguide grating and production metho...
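
The rules in the operator table can be encoded in a small validating helper, so that malformed filters are caught before a query is sent. This is a sketch under stated assumptions: `filter_expr` is a hypothetical name, and the type checks simply mirror the constraints described in this notebook (string or integer values, and string-only values for `~`).

```python
COMPARISON_OPS = {"=", "!=", ">", "<", ">=", "<=", "~"}
EMPTINESS_OPS = {"is empty", "is not empty"}


def filter_expr(field, op, value=None):
    """Render one simple DSL filter expression as a string."""
    if op in EMPTINESS_OPS:
        # Emptiness filters take no value.
        return f"{field} {op}"
    if op not in COMPARISON_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (str, int)):
        raise TypeError("value must be a string or an integer")
    if isinstance(value, int):
        if op == "~":
            raise TypeError("the ~ operator needs a string value")
        return f"{field} {op} {value}"
    escaped = value.replace('"', '\\"')
    return f'{field} {op} "{escaped}"'


filters = [filter_expr("year", ">", 2010), filter_expr("year", "<", 2012)]
query = f"search datasets where {' and '.join(filters)} return datasets limit 5"
print(query)
```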


### 3.5 Partial string matching with `~`

The `~` operator indicates that the given `field` need only partially,
instead of exactly, match the given `string` (the `value` used with this
operator must be a `string`, not an integer).

For example, the filter `where research_orgs.name~"Saarland Uni"` would
match both the organization named "Saarland University" and the one
named "Universitätsklinikum des Saarlandes", and any other organization
whose name includes the terms "Saarland" and "Uni" (the order is
unimportant). 

In [366]:
%%dsldf 
search patents 
    where assignee_names ~ "IBM" 
return assignees limit 5

Returned Assignees: 5


Unnamed: 0,id,count,city_name,name,country_name
0,grid.410484.d,329418,Armonk,IBM (United States),United States
1,grid.471366.1,22089,George Town,GlobalFoundries (Cayman Islands),Cayman Islands
2,grid.14648.3f,5071,Winchester,IBM (United Kingdom),United Kingdom
3,grid.420451.6,3555,Mountain View,Google,United States
4,grid.472772.3,2717,Beijing,Lenovo (China),China
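
To build an intuition for the "all terms, any order" behaviour described above, the matching rule can be imitated locally. The function below is only a rough approximation written for this notebook (a case-insensitive substring test per term); it is not how the Dimensions search engine is implemented, and real results may differ.

```python
def partially_matches(name, query):
    """Rough local imitation: every query term occurs somewhere in the name."""
    name_lower = name.lower()
    return all(term in name_lower for term in query.lower().split())


candidates = [
    "Saarland University",
    "Universitätsklinikum des Saarlandes",
    "University of Oxford",
]
for name in candidates:
    print(name, "->", partially_matches(name, "Saarland Uni"))
```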


### 3.6 Emptiness filters `is empty`

To keep only records in which a specific field has a value, or only those
in which it is empty, use a filter such as
`where research_orgs is not empty` or `where issn is empty`.

In [367]:
%%dsldf
search publications 
    for "iron graphene" 
    where researchers is empty 
    and research_orgs is not empty 
return publications[id+title+researchers+research_orgs+type] limit 5

Returned Publications: 5 (total = 1951)


Unnamed: 0,research_orgs,title,type,id
0,"[{'id': 'grid.412030.4', 'city_name': 'Tianjin...",Facile Approach to Prepare rGO@Fe3O4 Microsphe...,article,pub.1126764136
1,"[{'id': 'grid.6734.6', 'state_name': 'Berlin',...",Specific adsorption sites and conditions deriv...,article,pub.1126829056
2,"[{'id': 'grid.8547.e', 'state_name': 'Shanghai...",Adsorptive removal of tetracycline by sustaina...,article,pub.1124956822
3,"[{'id': 'grid.411510.0', 'city_name': 'Xuzhou'...",Sulfur-Doped Alkylated Graphene Oxide as High-...,article,pub.1124438091
4,"[{'id': 'grid.33764.35', 'state_name': 'Heilon...",Molecular Dynamics Simulations of Melting Iron...,article,pub.1125095130


## 4. Searching for Researchers

The DSL offers different mechanisms for searching for researchers (e.g.
publication authors, grant investigators), each of them presenting
specific advantages.

### 4.1 Exact name searches

Special full-text indices allow you to look up a researcher's first name and
surname **exactly as they appear in the source documents** they derive from.

This approach has a broad scope, as it searches the full
collection of Dimensions documents irrespective of whether a
researcher was successfully disambiguated (and hence given a Dimensions
ID). On the other hand, it only matches names as they
appear in the source document, so different spellings or initials are
not necessarily returned by a single query. 

```
search in [authors|investigators|inventors]
```

You can look up publication authors using a specific
`search index` called `authors`. 

This method expects case-insensitive
phrases in the format `"<first name> <last name>"`, or in reverse order. Note
that nested quotes inside a double-quoted string must always be
escaped with a backslash `\`.

In [368]:
%dsldf search publications in authors for "\"Charles Peirce\"" return publications limit 5

Returned Publications: 5 (total = 195)


Unnamed: 0,pages,id,title,type,year,author_affiliations
0,202-237,pub.1110025899,The Law of Mind 1,chapter,2017,"[[{'first_name': 'Charles S', 'last_name': 'Pe..."
1,1-3,pub.1106516825,Proem The Rules of Philosophy 1,chapter,2017,"[[{'first_name': 'Charles S', 'last_name': 'Pe..."
2,157-178,pub.1110025908,The Architecture of Theories 1,chapter,2017,"[[{'first_name': 'Charles S', 'last_name': 'Pe..."
3,106-130,pub.1110025906,The Order of Nature 1,chapter,2017,"[[{'first_name': 'Charles S', 'last_name': 'Pe..."
4,179-201,pub.1110025909,The Doctrine of Necessity Examined 1,chapter,2017,"[[{'first_name': 'Charles S', 'last_name': 'Pe..."
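
When the query is built as a Python string instead of being typed into a magic cell, the backslashes themselves have to be written as `\\` (or a raw string has to be used). The helper below is a hypothetical convenience function, shown only to make the two levels of quoting explicit.

```python
def exact_phrase(text):
    """Wrap `text` in escaped inner quotes plus the outer DSL string quotes."""
    return '"\\"' + text + '\\""'


query = (
    f"search publications in authors for {exact_phrase('Charles Peirce')} "
    "return publications limit 5"
)
print(query)
```

The printed string should be identical to the query used in the cell above, with one backslash before each inner quote.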


Initials can also be used instead of the first name. These are examples of
valid researcher search phrases:

-   `\"Peirce, Charles S.\"`
-   `\"Charles S. Peirce\"`
-   `\"CS Peirce\"`
-   `\"Peirce CS\"`
-   `\"C S Peirce\"`
-   `\"Peirce C S\"`
-   `\"C Peirce\"`
-   `\"Peirce C\"`
-   `\"Charles Peirce\"`
-   `\"Peirce Charles\"`

**Warning**: In order to produce valid results, an author or investigator search
query must contain **at least two components** (e.g., name and
surname, either in full or initials).
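
Since exact name searches only match one spelling at a time, it can help to generate the common variants of a name and query each of them. The sketch below reproduces the ten forms listed above for a first name, a middle initial and a surname; `name_variants` and `has_two_components` are hypothetical helpers, and the component check is a simple whitespace-based reading of the warning.

```python
def name_variants(first, last, middle_initial=None):
    """Return common spellings of a name for exact name searches."""
    f = first[0]
    variants = [
        f"{first} {last}",
        f"{last} {first}",
        f"{f} {last}",
        f"{last} {f}",
    ]
    if middle_initial:
        m = middle_initial
        variants += [
            f"{last}, {first} {m}.",
            f"{first} {m}. {last}",
            f"{f}{m} {last}",
            f"{last} {f}{m}",
            f"{f} {m} {last}",
            f"{last} {f} {m}",
        ]
    return variants


def has_two_components(phrase):
    """True if the phrase contains at least two whitespace-separated parts."""
    return len(phrase.replace(",", " ").split()) >= 2


variants = name_variants("Charles", "Peirce", "S")
for phrase in variants:
    print(phrase)
```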

Investigator search works like *authors* search, except that it searches `grants` and
`clinical_trials` using a separate search index, `investigators`; 
`patents` are searched using the index `inventors`.

In [369]:
%%dsldf 
search clinical_trials in investigators for "\"John Smith\"" 
return clinical_trials limit 5

Returned Clinical_trials: 2 (total = 2)


Unnamed: 0,title,id,investigator_details,active_years
0,VEPTR Implantation to Treat Children With Earl...,NCT00689533,"[[John M Flynn, MD, Principal Investigator, Ch...","[2008, 2009, 2010, 2011, 2012, 2013, 2014, 201..."
1,Prospective Evaluation of Symptom Resolution i...,NCT01241149,"[[Ellie Mentler, MD, Principal Investigator, U...",


In [370]:
%%dsldf 
search grants in investigators for "\"Satoko Shimazaki\"" 
return grants limit 5

Returned Grants: 4 (total = 4)


Unnamed: 0,id,start_date,project_num,end_date,start_year,title_language,title,original_title,funders,language,active_year,funding_org_name
0,grant.7925589,2020-09-01,FEL-263245-19,2021-08-31,2020,en,"Kabuki Actors, Print Technology, and the Theat...","Kabuki Actors, Print Technology, and the Theat...","[{'id': 'grid.422239.c', 'state_name': 'Distri...",en,"[2020, 2021]",National Endowment for the Humanities
1,grant.7527261,2018-04-01,18K00431,2021-03-31,2018,ja,Genealogy research on female saints in the Pal...,古・中英語期における女性聖人伝の系譜研究：Aelfricのテクストと言語を中心に,"[{'id': 'grid.54432.34', 'linkout': ['http://w...",ja,"[2018, 2019, 2020, 2021]",Japan Society for the Promotion of Science
2,grant.5858713,2015-04-01,15K02313,2018-03-31,2015,en,Images of Women in the Old English Lives of Sa...,Images of Women in the Old English Lives of Sa...,"[{'id': 'grid.54432.34', 'linkout': ['http://w...",en,"[2015, 2016, 2017, 2018]",Japan Society for the Promotion of Science
3,grant.6086985,2012-04-01,24520310,2015-03-31,2012,en,Reception and Transfromation of the Images of ...,Reception and Transfromation of the Images of ...,"[{'id': 'grid.54432.34', 'linkout': ['http://w...",en,"[2012, 2013, 2014, 2015]",Japan Society for the Promotion of Science


In [371]:
%%dsldf 
search patents in inventors for "\"John Smith\"" 
return patents limit 5

Returned Patents: 5 (total = 501)


Unnamed: 0,id,times_cited,assignee_names,title,publication_date,inventor_names,filing_status,year,assignees,granted_year
0,US-20020160362-A1,0,"[AstraZeneca AB, SMITH JOHN CRAIG]",Diagnostic method,2002-10-31,[John Smith],Application,2001,"[{'id': 'grid.418151.8', 'country_name': 'Swed...",
1,GB-2384299-B,0,"[Llanelli Radiators Ltd, Calsonic Kansei UK Lt...",Automotive heat exchanger,2006-03-22,[SMITH JOHN],Grant,2002,"[{'id': 'grid.472810.8', 'country_name': 'Unit...",2006.0
2,US-20050133900-A1,2,"[Tessera Inc, TESSERA INC]",Microelectronic assemblies with composite cond...,2005-06-23,[John Smith],Application,2005,"[{'id': 'grid.455499.0', 'country_name': 'Unit...",
3,IE-S20030195-A2,0,[SMITH JOHN],A lockable safety insert for an electrical dom...,2004-11-03,[SMITH JOHN],Grant,2003,,2004.0
4,GB-2513101-A,0,"[Eley Ltd, ELEY LTD]",Ammunition cartridge,2014-10-22,[SMITH JOHN],Application,2013,,
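
The pairing between sources and name indices used in the examples above can be captured in a lookup table, which avoids sending a query with the wrong index. This is a minimal sketch: `PERSON_INDEX` and `person_search` are hypothetical names, and the mapping covers only the four sources shown in this section.

```python
# Name-search index for each source, as used in the examples above.
PERSON_INDEX = {
    "publications": "authors",
    "grants": "investigators",
    "clinical_trials": "investigators",
    "patents": "inventors",
}


def person_search(source, name, limit=5):
    """Build an exact name search query for the given source."""
    try:
        index = PERSON_INDEX[source]
    except KeyError:
        raise ValueError(f"no name index known for source {source!r}") from None
    # The name is wrapped in escaped quotes inside the outer DSL string.
    return f'search {source} in {index} for "\\"{name}\\"" return {source} limit {limit}'


print(person_search("patents", "John Smith"))
print(person_search("grants", "Satoko Shimazaki"))
```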


### 4.2 Fuzzy Searches

This type of search is similar to *full-text
search*, with the difference that it
allows searching by only a part of a name, e.g. only the 'last name' of
a person, by using the `where` clause. 

**Note** At this moment, this type of search is only available for
`publications`. Other sources will add this option in the future.

For example:

In [372]:
%%dsldf 
search publications where authors = "Hawking" 
return publications[id+doi+title+authors] limit 5


Generally speaking, using a `where` clause to search authors is less
precise than using the relevant exact-search syntax. 

On the other hand, using a
`where` clause can be handy if one wants to **combine an author search
with another full-text search index**.

For example:

In [373]:
%%dsldf 
search publications 
    in title_abstract_only for "dna replication" 
    where authors = "smith"  
return publications limit 5

Returned Publications: 5 (total = 1511)


Unnamed: 0,pages,id,title,volume,type,year,author_affiliations,issue,journal.id,journal.title
0,11.0,pub.1124060243,Longitudinal epigenome-wide association studie...,12.0,article,2020,"[[{'first_name': 'Clara', 'last_name': 'Snijde...",1.0,jour.1042271,Clinical Epigenetics
1,46.0,pub.1125664041,An epigenome-wide association study of posttra...,12.0,article,2020,"[[{'first_name': 'Mark W.', 'last_name': 'Logu...",1.0,jour.1042271,Clinical Epigenetics
2,37.0,pub.1124910780,Genetic associations with clozapine-induced my...,10.0,article,2020,"[[{'first_name': 'Paul', 'last_name': 'Lacaze'...",1.0,jour.1045271,Translational Psychiatry
3,,pub.1126888400,Telomere Length and Autism Spectrum Disorder W...,,article,2020,"[[{'first_name': 'Candace R.', 'last_name': 'L...",,jour.1039397,Autism Research
4,1760.0,pub.1126670480,Biophysical Screens Identify Fragments That Bi...,25.0,article,2020,"[[{'first_name': 'Troy E.', 'last_name': 'Mess...",7.0,jour.1312072,Molecules


### 4.3 Using the disambiguated Researchers database

The Dimensions [Researchers](https://docs.dimensions.ai/dsl/datasource-researchers.html) source is a database of
researcher information algorithmically extracted and disambiguated from
all of the other content sources (publications, grants, clinical trials,
etc.).

By using the `researchers` source it is possible to match an
'aggregated' person object linking together multiple publication
authors, grant investigators, etc., irrespective of the form their
names take in the original source documents.

However, this database does not contain all the author and investigator information
available in Dimensions. 

Think, for example, of authors of older publications,
authors with very common names that are difficult to disambiguate, or
very new authors who have only one or a few publications. In such cases,
using the full-text authors search might be more
appropriate.

Examples:

In [374]:
%%dsldf 
search researchers for "\"Satoko Shimazaki\"" 
return researchers[basics+obsolete] 

Returned Researchers: 4 (total = 4)


Unnamed: 0,last_name,first_name,research_orgs,id,obsolete
0,Shimazaki,Satoko,"[{'id': 'grid.19006.3e', 'state_name': 'Califo...",ur.014307627665.09,
1,Shimazaki,Satoko,"[{'id': 'grid.266190.a', 'state_name': 'Colora...",ur.015527473602.63,
2,Shimazaki,Satoko,,ur.010537333602.30,1.0
3,Shimazaki,Satoko,,ur.07751146721.59,


NOTE: pay attention to the `obsolete` field, which indicates the status of the researcher ID: 0 means that the researcher ID is still **active**, 1 means that it is **no longer valid**. IDs become obsolete because of the ongoing process of refinement of Dimensions researchers. 

Hence the query above is best written like this:

In [375]:
%%dsldf 
search researchers where obsolete=0 for "\"Satoko Shimazaki\"" 
return researchers[basics+obsolete] 

Returned Researchers: 3 (total = 3)


Unnamed: 0,research_orgs,id,last_name,first_name
0,"[{'id': 'grid.266190.a', 'longitude': -105.265...",ur.015527473602.63,Shimazaki,Satoko
1,"[{'id': 'grid.19006.3e', 'longitude': -118.444...",ur.014307627665.09,Shimazaki,Satoko
2,,ur.07751146721.59,Shimazaki,Satoko
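
If researcher records have already been downloaded without the `obsolete=0` filter, the same clean-up can be done on the client side. The sketch below uses the four IDs from the first query output above; the blank `obsolete` cells of that table are represented here as `None`, which is an assumption about how missing values are handled, not a statement about the raw API response.

```python
# Records copied from the unfiltered query output above (blank cells as None).
records = [
    {"id": "ur.014307627665.09", "obsolete": None},
    {"id": "ur.015527473602.63", "obsolete": None},
    {"id": "ur.010537333602.30", "obsolete": 1},
    {"id": "ur.07751146721.59", "obsolete": None},
]

# Keep every record that is not explicitly flagged as obsolete.
active = [record for record in records if record.get("obsolete") != 1]

for record in active:
    print(record["id"])
```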


With `Researchers`, one can use other fields as well:

In [403]:
%%dsldf 
search researchers 
    where obsolete=0 and last_name="Shimazaki" 
return researchers[basics] limit 5

Returned Researchers: 5 (total = 470)


Unnamed: 0,id,last_name,first_name,research_orgs
0,ur.01214370624.48,Shimazaki,Makiko,
1,ur.0651721117.33,Shimazaki,Megumi,"[{'id': 'grid.268397.1', 'longitude': 131.4687..."
2,ur.01061154476.35,Shimazaki,Reiri,"[{'id': 'grid.136304.3', 'longitude': 140.1034..."
3,ur.013166423052.62,Shimazaki,Shingo,"[{'id': 'grid.62167.34', 'longitude': 139.5582..."
4,ur.011036721776.57,Shimazaki,Toshiyuki,"[{'id': 'grid.411497.e', 'longitude': 130.3662..."


## 5. Returning results

After the `search` phrase, a query must contain one or more `return`
phrases, specifying the content and format of the information that
should be returned.



### 5.1 Returning Multiple Sources

A single `return` phrase returns one result set. To return multiple result sets, include several `return` phrases in the same query:

In [404]:
%%dsldf 
search publications 
return funders limit 5 
return research_orgs limit 5 
return year

Returned Research_orgs: 5
Returned Year: 20
Returned Funders: 5


Unnamed: 0,id,count,city_name,latitude,name,acronym,types,country_name,linkout,longitude,state_name
0,grid.26999.3d,323521,Tokyo,35.713333,University of Tokyo,UT,[Education],Japan,[http://www.u-tokyo.ac.jp/en/],139.76222,
1,grid.17063.33,294497,Toronto,43.661667,University of Toronto,,[Education],Canada,[http://www.utoronto.ca/],-79.395,Ontario
2,grid.38142.3c,294239,Cambridge,42.377052,Harvard University,,[Education],United States,[http://www.harvard.edu/],-71.11665,Massachusetts
3,grid.214458.e,258058,Ann Arbor,42.278305,University of Michigan,UM,[Education],United States,[https://www.umich.edu/],-83.73822,Michigan
4,grid.19006.3e,250895,Los Angeles,34.072224,"University of California, Los Angeles",UCLA,[Education],United States,[http://www.ucla.edu/],-118.4441,California
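
A query with several `return` phrases can also be generated from a list of the desired results. The helper below is a hypothetical sketch (`build_multi_return` is not a Dimcli function); each entry pairs a result name with an optional `limit`.

```python
def build_multi_return(source, returns, where=None):
    """Build a query with one `return` phrase per (name, limit) pair."""
    lines = [f"search {source}"]
    if where:
        lines.append(f"where {where}")
    for name, limit in returns:
        phrase = f"return {name}"
        if limit is not None:
            phrase += f" limit {limit}"
        lines.append(phrase)
    return "\n".join(lines)


query = build_multi_return(
    "publications",
    [("funders", 5), ("research_orgs", 5), ("year", None)],
)
print(query)
```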



### 5.2 Returning Specific Fields

For control over which information from each given `record` will be
returned, a `source` or `entity` name in the `results` phrase can
optionally be followed by a specification of the `fields` and `fieldsets` to be
included in the JSON results for each retrieved record.

The fields specification may be an arbitrary list of `field` names
enclosed in brackets (`[`, `]`), with field names separated by a plus
sign (`+`). A minus sign (`-`) can be used to exclude a `field` or a
`fieldset` from the result. Field names listed within brackets must
be "known" to the DSL, and therefore only a subset of fields may be used
in this syntax (see note below).

In [378]:
%%dsldf 
search grants 
return grants[grant_number + title + language] limit 5

Returned Grants: 5 (total = 5263527)


Unnamed: 0,title,language,grant_number
0,APPROACH to Enriching the Real World Evidence ...,en,2018-HRSI-1548
1,Molecular mechanism of DNA double strand break...,en,1301720F
2,Life as concept and as science,en,M 2734
3,Jet quenching for heavy-ion collisions at the LHC,en,893021
4,Scintillation Light For New Physics with Liqui...,en,892933


In [379]:
%%dsldf 
search clinical_trials 
return clinical_trials [id+ title + acronym + phase] limit 5

Returned Clinical_trials: 5 (total = 555467)


Unnamed: 0,id,title,phase,acronym
0,NCT00249756,Re-Entry MTC for Offenders With MICA Disorders,,
1,NCT00249782,"A Phase II, Randomized, Partial-Blind, Paralle...",Phase 2,
2,NCT00249795,A Parallel Randomized Controlled Evaluation of...,Phase 3,ACTIVE I
3,NCT00249808,"A Multicentre, Open Label Phase IIIb/IV Study ...",Phase 4,
4,NCT00249847,A Feasibility Study of Positron Emission Tomog...,,
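
The bracketed fields specification can be generated from Python lists. The sketch below uses a hypothetical helper, `fields_spec`; the way exclusions are rendered (a `-name` suffix after the included names) is an assumption based on the description above and should be verified against the official documentation before use.

```python
def fields_spec(include, exclude=()):
    """Render a fields specification such as [id+title] or [basics-pages]."""
    if not include:
        raise ValueError("at least one field or fieldset must be included")
    spec = "+".join(include)
    for name in exclude:
        # ASSUMPTION: excluded names are appended with a leading minus sign.
        spec += f"-{name}"
    return f"[{spec}]"


spec = fields_spec(["grant_number", "title", "language"])
query = f"search grants return grants{spec} limit 5"
print(query)
print(fields_spec(["basics", "times_cited"], exclude=["pages"]))
```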


**Shortcuts: `fieldsets`**

The fields specification may be the name of a pre-defined `fieldset`
(e.g. `extras`, `basics`). These are shortcuts that can be handy when testing out new queries, for example. 

NOTE: In general, when writing code used in integrations or long-standing extraction scripts, it is **best to return specific fields rather than a predefined set**. This also has the advantage of making queries faster by avoiding the extraction of unnecessary data.
    

In [380]:
%%dsldf 
search grants 
return grants [basics] limit 5 

Returned Grants: 5 (total = 5263527)
Field 'project_num' is deprecated in favor of grant_number. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'title_language' is deprecated in favor of language_title. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,funders,active_year,original_title,start_year,project_num,id,funding_org_name,start_date,title_language,title,language,end_date
0,"[{'id': 'grid.484521.e', 'longitude': -66.6605...",[2021],APPROACH to Enriching the Real World Evidence ...,2021,2018-HRSI-1548,grant.8690978,New Brunswick Health Research Foundation,2021-11-30,en,APPROACH to Enriching the Real World Evidence ...,en,
1,"[{'id': 'grid.424470.1', 'longitude': 4.370708...",[2021],Mécanismes moléculaires de la formation et la ...,2021,1301720F,grant.8950252,Fund for Scientific Research,2021-10-01,en,Molecular mechanism of DNA double strand break...,en,
2,"[{'id': 'grid.25111.36', 'longitude': 16.35238...","[2021, 2022, 2023]",Life as concept and as science,2021,M 2734,grant.8715161,FWF Austrian Science Fund,2021-10-01,en,Life as concept and as science,en,2023-09-30
3,"[{'id': 'grid.270680.b', 'longitude': 4.36367,...","[2021, 2022, 2023]",Jet quenching for heavy-ion collisions at the LHC,2021,893021,grant.8963889,European Commission,2021-09-01,en,Jet quenching for heavy-ion collisions at the LHC,en,2023-08-31
4,"[{'id': 'grid.270680.b', 'longitude': 4.36367,...","[2021, 2022, 2023]",Scintillation Light For New Physics with Liqui...,2021,892933,grant.8964235,European Commission,2021-09-01,en,Scintillation Light For New Physics with Liqui...,en,2023-08-31


In [381]:
%%dsldf 
search publications 
return publications [basics+times_cited] limit 5 

Returned Publications: 5 (total = 109295145)
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,pages,id,times_cited,title,type,year,author_affiliations
0,1071-1080,pub.1123548159,0,§ 33c. Vorbehalt der Gegenseitigkeit,chapter,2020,
1,1-46,pub.1123550575,0,Einleitung,chapter,2020,
2,258-276,pub.1122051582,0,CHAPTER XIV. Siting Absence,chapter,2020,"[[{'first_name': 'Nicole', 'last_name': 'Gerva..."
3,,pub.1123390607,0,Calling Philosophers Names,monograph,2020,"[[{'first_name': 'Christopher', 'last_name': '..."
4,207-228,pub.1123015862,0,"The End of the Socialist (An-)Economy, Money, ...",chapter,2020,"[[{'first_name': 'Jurij', 'last_name': 'Murašo..."


The fields specification may also be `all`, to indicate that all fields
available for the given `source` should be returned.

In [382]:
%%dsldf
search publications 
return publications [all] limit 5 

Returned Publications: 5 (total = 109295145)
Field 'open_access' is deprecated in favor of open_access_categories. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_HC' is deprecated in favor of category_hrcs_hc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'RCDC' is deprecated in favor of category_rcdc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'category_ua' is deprecated in favor of category_uoa. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR_first' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl

Unnamed: 0,book_title,open_access_categories,book_doi,id,title,times_cited,doi,pages,publisher,date_inserted,year,type,date,authors,author_affiliations,terms,concepts
0,Wertpapiererwerbs- und Übernahmegesetz,"[{'id': 'closed', 'name': 'Closed', 'descripti...",10.9785/9783504386382,pub.1123548159,§ 33c. Vorbehalt der Gegenseitigkeit,0,10.9785/9783504386382-043,1071-1080,Verlag Dr. Otto Schmidt,2019-12-23,2020,chapter,2020-12-31,,,,
1,Wertpapiererwerbs- und Übernahmegesetz,"[{'id': 'closed', 'name': 'Closed', 'descripti...",10.9785/9783504386382,pub.1123550575,Einleitung,0,10.9785/9783504386382-005,1-46,Verlag Dr. Otto Schmidt,2019-12-23,2020,chapter,2020-12-31,,,,
2,Women Mobilizing Memory,"[{'id': 'closed', 'name': 'Closed', 'descripti...",10.7312/alti19184,pub.1122051582,CHAPTER XIV. Siting Absence,0,10.7312/alti19184-016,258-276,Columbia University Press,2019-10-26,2020,chapter,2020-12-31,"[{'first_name': 'Nicole', 'last_name': 'Gervas...","[[{'first_name': 'Nicole', 'last_name': 'Gerva...",[absence],[absence]
3,,"[{'id': 'closed', 'name': 'Closed', 'descripti...",,pub.1123390607,Calling Philosophers Names,0,10.1515/9780691197425,,De Gruyter,2019-12-15,2020,monograph,2020-12-31,"[{'first_name': 'Christopher', 'last_name': 'M...","[[{'first_name': 'Christopher', 'last_name': '...",,
4,Cultures of Economy in South-Eastern Europe,"[{'id': 'closed', 'name': 'Closed', 'descripti...",10.14361/9783839450260,pub.1123015862,"The End of the Socialist (An-)Economy, Money, ...",0,10.14361/9783839450260-012,207-228,Transcript Verlag,2019-12-01,2020,chapter,2020-12-31,"[{'first_name': 'Jurij', 'last_name': 'Murašov...","[[{'first_name': 'Jurij', 'last_name': 'Murašo...",[end],[end]


### 5.3 Returning Facets

In addition to returning source records matching a query, it is possible
to *facet* on the [entity](data-entities.ipynb) fields related to a
particular source and return only those entity values as an aggregated
view of the related source data. This operation is similar to a
*group by* or a *pivot table*.

**Warning:** Faceting returns at most 1000 results. This is to ensure
adequate performance with all queries. Furthermore, although the `limit`
operator is allowed, the `skip` operator cannot be used.
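
To make the group-by analogy concrete, here is a minimal sketch in plain Python that counts a few made-up records by country. The records and the `facet_counts` helper are illustrative assumptions, not API output; the real faceting is computed by the API.

```python
from collections import Counter

# Hypothetical records, made up for illustration (not real API output)
publications = [
    {"id": "pub.1", "country": "China"},
    {"id": "pub.2", "country": "United States"},
    {"id": "pub.3", "country": "China"},
    {"id": "pub.4", "country": "Japan"},
    {"id": "pub.5", "country": "China"},
]

def facet_counts(records, field, limit=20):
    """Group records by `field` and return the most frequent values first."""
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    return [{"id": value, "count": n} for value, n in counts.most_common(limit)]

# Keep only the two most frequent countries, like `return ... limit 2`
top = facet_counts(publications, "country", limit=2)
print(top)
```

Each returned item carries the facet value and a `count`, which mirrors the `id` and `count` columns of the facet results in this section.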

In [383]:
%%dsldf 
search publications 
    for "coronavirus" 
return research_orgs limit 5

Returned Research_orgs: 5


Unnamed: 0,id,count,state_name,linkout,country_name,latitude,types,acronym,longitude,name,city_name
0,grid.194645.b,892,Hong Kong,[http://www.hku.hk/],China,22.283287,[Education],HKU,114.13708,University of Hong Kong,Hong Kong
1,grid.21107.35,676,Maryland,[https://www.jhu.edu/],United States,39.328888,[Education],JHU,-76.62028,Johns Hopkins University,Baltimore
2,grid.25879.31,612,Pennsylvania,[http://www.upenn.edu/],United States,39.952457,[Education],,-75.19322,University of Pennsylvania,Philadelphia
3,grid.416738.f,563,Georgia,[http://www.cdc.gov/],United States,33.798817,[Government],CDC,-84.3256,Centers for Disease Control and Prevention,Atlanta
4,grid.38142.3c,547,Massachusetts,[http://www.harvard.edu/],United States,42.377052,[Education],,-71.11665,Harvard University,Cambridge


In [384]:
%%dsldf 
search publications 
    for "coronavirus" 
return research_org_countries limit 5
return year limit 5
return category_for limit 5

Returned Category_for: 5
Returned Research_org_countries: 20
Returned Year: 20


Unnamed: 0,id,count,name
0,2211,48186,11 Medical and Health Sciences
1,2206,19903,06 Biological Sciences
2,3114,17594,1108 Medical Microbiology
3,3053,10857,1103 Clinical Sciences
4,3177,9930,1117 Public Health and Health Services


For control over the organization and headers of the JSON query results,
the `return` keyword in a return phrase may be followed by the keyword
`in` and then a `group` name for this group of results, where the group
name is enclosed in double quotes (`"`).

Also, one can define `aliases` that replace the default JSON field names with names provided by the user. 

See the [official documentation](https://docs.dimensions.ai/dsl/language.html#aliases) for more details about this feature. 
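
As a small sketch of the `return in "group"` syntax, the hypothetical helper below (`return_in_group` is not part of Dimcli) assembles one return phrase per facet under a shared group name. It only builds the query text and does not call the API.

```python
def return_in_group(group, facets):
    """Build one `return in "<group>" <facet>` phrase per facet name."""
    if '"' in group:
        # Guard chosen for this sketch: the group name is wrapped in double quotes
        raise ValueError("group name must not contain double quotes")
    return "\n".join(f'return in "{group}" {facet}' for facet in facets)

query = "search publications\n" + return_in_group("facets", ["funders", "research_orgs"])
print(query)
```

The resulting string can be pasted into a `%%dsldf` cell or passed to `dsl.query(...)`.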

In [405]:
%%dsldf 
search publications 
return in "facets" funders 
return in "facets" research_orgs

Returned Facets: 2


Unnamed: 0,research_orgs,funders
0,"[{'id': 'grid.26999.3d', 'count': 323521, 'lin...","[{'id': 'grid.419696.5', 'count': 1884970, 'li..."


### 5.4 What the query statistics refer to - sources VS facets

When performing a DSL search, a `_stats` object is returned, which contains useful information such as the total number of records available for the search. 

In [386]:
%%dsldf 
search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
return publications limit 5

Returned Publications: 5 (total = 3798)


Unnamed: 0,pages,id,title,volume,type,year,author_affiliations,issue,journal.id,journal.title
0,18124-18131,pub.1110885950,Development of Organo-Dispersible Graphene Oxi...,3.0,article,2018,"[[{'first_name': 'Siewteng', 'last_name': 'Sim...",12.0,jour.1157000,ACS Omega
1,29200-29209,pub.1110369527,"Anisotropic Crystal Growth, Optical Absorption...",122.0,article,2018,"[[{'first_name': 'Taro', 'last_name': 'Toyoda'...",51.0,jour.1038386,The Journal of Physical Chemistry C
2,,pub.1110925389,Nuclear Ab Initio Calculations with the Unitar...,,proceeding,2018,"[[{'first_name': 'T.', 'last_name': 'Miyagi', ...",,,
3,28491-28496,pub.1110271601,Indium Zinc Oxide Electron Transport Layer for...,122.0,article,2018,"[[{'first_name': 'Liang', 'last_name': 'Wang',...",50.0,jour.1038386,The Journal of Physical Chemistry C
4,43682-43690,pub.1110222625,Chalcopyrite ZnSnSb2: A Promising Thermoelectr...,10.0,article,2018,"[[{'first_name': 'Ami', 'last_name': 'Nomura',...",50.0,jour.1041450,ACS Applied Materials & Interfaces




It is important to note, though, that the **total number always refers to the main source, never to the facets** being returned. 

For example, in this query we return `researchers` linked to publications: 

In [406]:
%%dsldf 
search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
return researchers limit 5

Returned Researchers: 5


Unnamed: 0,id,count,research_orgs,last_name,first_name,orcid_id
0,ur.01055753603.27,138,"[grid.136593.b, grid.410825.a, grid.482504.f, ...",Hayase,Shuzi Shuzi,
1,ur.011212042763.67,101,"[grid.462727.2, grid.258806.1, grid.27476.30]",Hikita,Masayuki,
2,ur.01144540527.52,98,"[grid.11135.37, grid.177174.3, grid.411485.d, ...",Ma,Ting-Li,[0000-0002-3310-459X]
3,ur.07644453127.11,95,"[grid.462727.2, grid.4444.0, grid.258806.1, gr...",Kozako,M Kozako M,
4,ur.016357156077.09,90,"[grid.16821.3c, grid.1024.7, grid.454850.8, gr...",Lu,Huimin,[0000-0001-9794-3221]


**NOTE** At most 1000 facet results can be returned (due to performance limitations), so if there are more than 1000 it is not possible to know the total number. 
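
The difference can be sketched with two trimmed-down, hypothetical payloads. The `total_count` key inside `_stats` is an assumption about the raw JSON shape, and the `summarize` helper is made up for illustration.

```python
# Hypothetical, trimmed-down payloads; only the shape matters here
source_result = {
    "_stats": {"total_count": 3798},
    "publications": [{"id": "pub.1110885950"}, {"id": "pub.1110369527"}],
}
facet_result = {
    "_stats": {"total_count": 3798},  # still counts publications, not researchers
    "researchers": [{"id": "ur.01055753603.27", "count": 138}],
}

def summarize(result, key):
    """Return (records returned under `key`, total for the searched source)."""
    return len(result[key]), result["_stats"]["total_count"]

print(summarize(source_result, "publications"))
print(summarize(facet_result, "researchers"))
```

In both cases the second number describes the publications matched by the search, regardless of what the `return` phrase asks for.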

### 5.5 Paginating Results

At the end of a `return` phrase, the user can specify the maximum number
of results to be returned and the number of top records to skip over
before returning the first result record. This makes it possible to
return large result sets page by page (i.e. "paging" results), as described below.

This is done using the keyword `limit` followed by the maximum number of
results to return, optionally followed by the keyword `skip` and the
number of results to skip (the offset).

In [388]:
%%dsldf 
search publications return publications limit 10

Returned Publications: 10 (total = 109295145)


Unnamed: 0,pages,id,title,type,year,author_affiliations
0,1071-1080,pub.1123548159,§ 33c. Vorbehalt der Gegenseitigkeit,chapter,2020,
1,1-46,pub.1123550575,Einleitung,chapter,2020,
2,258-276,pub.1122051582,CHAPTER XIV. Siting Absence,chapter,2020,"[[{'first_name': 'Nicole', 'last_name': 'Gerva..."
3,,pub.1123390607,Calling Philosophers Names,monograph,2020,"[[{'first_name': 'Christopher', 'last_name': '..."
4,207-228,pub.1123015862,"The End of the Socialist (An-)Economy, Money, ...",chapter,2020,"[[{'first_name': 'Jurij', 'last_name': 'Murašo..."
5,,pub.1123390647,Utopophobia,monograph,2020,"[[{'first_name': 'David', 'last_name': 'Estlun..."
6,viii-viii,pub.1123553347,Bearbeiterverzeichnis,chapter,2020,
7,137-153,pub.1123555428,§ 4. Aufgaben und Befugnisse,chapter,2020,
8,15-220,pub.1122620908,§ 2 Rechtsformvergleich und besondere Erschein...,chapter,2020,"[[{'first_name': 'Thomas', 'last_name': 'Muell..."
9,463-732,pub.1122620912,§ 6 Laufende Besteuerung der Gesellschaft und ...,chapter,2020,"[[{'first_name': 'Petra', 'last_name': 'Eckl',..."


If paging information is not provided, the default values
`limit 20 skip 0` are used, so `search publications return publications`
is equivalent to `search publications return publications limit 20 skip 0`.

Combining `limit` and `skip` across multiple queries enables paging or
batching of results; e.g. to retrieve 30 grant records divided into 3
pages of 10 records each, the following three queries could be used:

```
return grants limit 10           => get 1st 10 records for page 1 (skip 0, by default)
return grants limit 10 skip 10   => get next 10 for page 2; skip the 10 we already have
return grants limit 10 skip 20   => get another 10 for page 3, for a total of 30
```
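
The same paging scheme can be generated programmatically. The sketch below uses a hypothetical `paged_queries` helper that only builds the query strings; it does not call the API.

```python
def paged_queries(base_query, total, page_size=10):
    """Yield one DSL query string per page, using `limit` and `skip`."""
    for skip in range(0, total, page_size):
        # The last page may be shorter than page_size
        limit = min(page_size, total - skip)
        yield f"{base_query} limit {limit} skip {skip}"

queries = list(paged_queries("search grants return grants", total=30))
for q in queries:
    print(q)
```

Each string could then be passed to `dsl.query(...)` in turn. Dimcli also provides a `query_iterative` method that automates this kind of loop for source queries.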

### 5.6 Sorting Results

A sort order for the results in a given `return` phrase can be specified
with the keyword `sort by` followed by the name of:
* a `field` (when a `source` is being requested)
* an `indicator` (aggregation) (when one or more facets are being requested).

By default, the result set of full-text
queries (`search ... for "full text query"`) is sorted by relevance.
Additionally, the sort order can be specified with the `asc` or
`desc` keywords. By default, descending order is selected.
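
As a sketch of how the clause is put together, the hypothetical `sort_clause` helper below builds a `sort by` phrase and rejects anything other than the two documented direction keywords. It produces query text only.

```python
def sort_clause(field, order="desc"):
    """Build a `sort by` clause; `order` must be 'asc' or 'desc'."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return f"sort by {field} {order}"

# Ascending sort on a source field
query = f'search grants for "nanomaterials" return grants {sort_clause("title", "asc")} limit 5'
print(query)
```

The helper defaults to `desc`, matching the default order described above.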

In [407]:
%%dsldf 
search grants 
    for "nanomaterials"
return grants sort by title desc limit 5 

Returned Grants: 5 (total = 17425)


Unnamed: 0,start_year,end_date,original_title,start_date,id,title_language,active_year,funders,funding_org_name,project_num,title,language
0,2019,2022-03-31,x,2019-04-01,grant.8518592,pl,"[2019, 2020, 2021, 2022]","[{'id': 'grid.436846.b', 'city_name': 'Krakow'...",National Science Center,2018/29/N/ST5/01240,x,pl
1,2012,,Transmissionselektronenmikroskop,2012-01-01,grant.4823271,en,[2012],"[{'id': 'grid.424150.6', 'city_name': 'Bonn', ...",German Research Foundation,220923099,Transmissionselektronenmikroskop,de
2,2015,,Transmissionselektronenmikroskop,2015-01-01,grant.4841519,en,[2015],"[{'id': 'grid.424150.6', 'city_name': 'Bonn', ...",German Research Foundation,280331443,Transmissionselektronenmikroskop,en
3,2011,2015-06-13,Snowcontrol.,2011-06-16,grant.6774902,en,"[2011, 2012, 2013, 2014, 2015]","[{'id': 'grid.425119.a', 'city_name': 'Brussel...",Belgian Federal Science Policy Office,3E120109,Snowcontrol.,en
4,2014,,Röntgenquelle,2014-01-01,grant.4834305,en,[2014],"[{'id': 'grid.424150.6', 'city_name': 'Bonn', ...",German Research Foundation,245513494,Röntgenquelle,de


In [408]:
%%dsldf  
search grants  
    for "nanomaterials"
return grants  sort by relevance desc limit 5

Returned Grants: 5 (total = 17425)


Unnamed: 0,funders,active_year,original_title,start_year,end_date,project_num,id,funding_org_name,start_date,title_language,title,language
0,"[{'id': 'grid.437854.9', 'longitude': -6.24968...","[2012, 2013]",Optically-active chiral nanomaterials,2012,2013-05-31,11/W.1/I2065,grant.3984032,Science Foundation Ireland,2012-06-01,en,Optically-active chiral nanomaterials,en
1,"[{'id': 'grid.452912.9', 'longitude': -75.6925...","[2016, 2017]",Polymer Nanomaterials,2016,2017-03-31,617153,grant.6973270,Natural Sciences and Engineering Research Council,2016-04-01,en,Polymer Nanomaterials,en
2,"[{'id': 'grid.452912.9', 'longitude': -75.6925...","[2016, 2017]",Polymer Nanomaterials,2016,2017-03-31,617505,grant.6973622,Natural Sciences and Engineering Research Council,2016-04-01,en,Polymer Nanomaterials,en
3,"[{'id': 'grid.452912.9', 'longitude': -75.6925...","[2014, 2015]",Polymer Nanomaterials,2014,2015-03-31,557300,grant.4167216,Natural Sciences and Engineering Research Council,2014-04-01,en,Polymer Nanomaterials,en
4,"[{'id': 'grid.452912.9', 'longitude': -75.6925...","[2014, 2015]",Polymer Nanomaterials,2014,2015-03-31,557542,grant.4168751,Natural Sciences and Engineering Research Council,2014-04-01,en,Polymer Nanomaterials,en


Number of citations per publication

In [409]:
%%dsldf  
search publications
return publications  [doi + times_cited] 
    sort by times_cited limit 5

Returned Publications: 5 (total = 109295145)


Unnamed: 0,times_cited,doi
0,230371,
1,196273,10.1038/227680a0
2,177891,10.1016/0003-2697(76)90527-3
3,86003,10.1006/meth.2001.1262
4,81735,10.1103/physrevlett.77.3865


Recent citations per publication.
**Note:** recent citations are the number of citations accrued in the last two-year period. A single value is stored per document, and the year window rolls over in July.

In [411]:
%%dsldf 
search publications
return publications [doi + recent_citations]
    sort by recent_citations limit 5

Returned Publications: 5 (total = 109295145)


Unnamed: 0,recent_citations,doi
0,27944,10.1006/meth.2001.1262
1,20853,10.1103/physrevlett.77.3865
2,20215,10.1176/appi.books.9780890425596
3,18436,10.1191/1478088706qp063oa
4,15617,10.1016/0003-2697(76)90527-3


When a facet is being returned, the `indicator` used in the
`sort` phrase must either be `count` (the default, such that
`sort by count` is unnecessary), or one of the indicators specified in
the `aggregate` phrase, i.e. one whose values are being computed in the
faceting operation. 
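This rule can be checked before a query is sent. The sketch below uses a hypothetical `facet_return` helper that builds a facet return phrase and raises an error when the sort indicator is neither `count` nor one of the requested aggregates. It also excludes `fcr_gavg`, which the schema output in this notebook describes as not usable for sorting.

```python
# The schema description in this notebook says this indicator cannot be sorted on
UNSORTABLE = {"fcr_gavg"}

def facet_return(facet, aggregates=(), sort_by="count", limit=20):
    """Build a facet `return` phrase, checking that `sort_by` is usable."""
    allowed = {"count", *aggregates} - UNSORTABLE
    if sort_by not in allowed:
        raise ValueError(f"cannot sort by {sort_by!r}; choose one of {sorted(allowed)}")
    parts = [f"return {facet}"]
    if aggregates:
        parts.append("aggregate " + ", ".join(aggregates))
    if sort_by != "count":  # `count` is the default, so it can be omitted
        parts.append(f"sort by {sort_by}")
    parts.append(f"limit {limit}")
    return " ".join(parts)

phrase = facet_return("research_orgs", ["altmetric_median", "rcr_avg"],
                      sort_by="rcr_avg", limit=5)
print(phrase)
```
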


In [412]:
%%dsldf 
search publications 
    for "nanomaterials"
return research_orgs 
    aggregate altmetric_median, rcr_avg sort by rcr_avg limit 5 

Returned Research_orgs: 5


Unnamed: 0,id,count,rcr_avg,altmetric_median,city_name,latitude,name,types,country_name,linkout,longitude,acronym,state_name
0,grid.11444.34,1,207.350006,338.0,Shanghai,31.211678,Shanghai Institute of Hypertension,[Facility],China,[http://www.china-sih.com/],121.467255,,
1,grid.11485.39,1,207.350006,338.0,London,51.531322,Cancer Research UK,[Nonprofit],United Kingdom,[http://www.cancerresearchuk.org/],-0.106269,CRUK,
2,grid.11642.30,1,207.350006,338.0,Saint-Denis,-20.901735,University of La Réunion,[Education],Reunion,[http://www.univ-reunion.fr/university-of-reun...,55.48455,,
3,grid.120073.7,1,207.350006,338.0,Cambridge,52.176,Addenbrooke's Hospital,[Healthcare],United Kingdom,[http://www.cuh.org.uk/addenbrookes-hospital],0.14,,Cambridgeshire
4,grid.20931.39,1,207.350006,338.0,London,51.5368,Royal Veterinary College,[Education],United Kingdom,[http://www.rvc.ac.uk/],-0.134,RVC,


## 6. Aggregations

In a `return` phrase requesting one or more `facet` results, aggregation
operations to perform during faceting can be specified after the facet
name(s) by using the keyword `aggregate` followed by a comma-separated
list of one or more `indicator` names corresponding to the `source`
being searched.
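
To illustrate what such indicators compute, here is a minimal sketch over made-up records using only the standard library. The records and field names (`funder`, `times_cited`, `fcr`) are assumptions for illustration; the API computes the real values and may treat missing data differently.

```python
from collections import defaultdict
from statistics import mean, median, geometric_mean

# Hypothetical publication records, made up for illustration
pubs = [
    {"funder": "A", "times_cited": 10, "fcr": 2.0},
    {"funder": "A", "times_cited": 30, "fcr": 8.0},
    {"funder": "B", "times_cited": 4, "fcr": 1.0},
    {"funder": "B", "times_cited": 6, "fcr": 4.0},
    {"funder": "B", "times_cited": 20, "fcr": 2.0},
]

# Facet step: group the records by funder
groups = defaultdict(list)
for p in pubs:
    groups[p["funder"]].append(p)

# Aggregate step: compute one row of indicators per facet value
rows = []
for funder, items in groups.items():
    cites = [p["times_cited"] for p in items]
    rows.append({
        "id": funder,
        "count": len(items),
        "citations_total": sum(cites),
        "citations_avg": mean(cites),
        "citations_median": median(cites),
        "fcr_gavg": geometric_mean([p["fcr"] for p in items]),
    })

for row in rows:
    print(row)
```

The indicator names mirror those listed for `publications` in the schema output of this section.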

In [413]:
%%dsldf
search publications 
    where year > 2010 
return research_orgs  
    aggregate rcr_avg, altmetric_median limit 5

Returned Research_orgs: 5


Unnamed: 0,id,count,rcr_avg,altmetric_median,state_name,city_name,latitude,name,types,country_name,linkout,longitude,acronym
0,grid.17063.33,138610,1.69156,3.0,Ontario,Toronto,43.661667,University of Toronto,[Education],Canada,[http://www.utoronto.ca/],-79.395,
1,grid.38142.3c,134298,2.211888,5.0,Massachusetts,Cambridge,42.377052,Harvard University,[Education],United States,[http://www.harvard.edu/],-71.11665,
2,grid.11899.38,129887,1.045246,2.0,,São Paulo,-23.563051,University of Sao Paulo,[Education],Brazil,[http://www5.usp.br/en/],-46.730103,USP
3,grid.83440.3b,118825,1.905526,4.0,,London,51.52447,University College London,[Education],United Kingdom,[http://www.ucl.ac.uk/],-0.133982,UCL
4,grid.26999.3d,117806,1.180117,2.0,,Tokyo,35.713333,University of Tokyo,[Education],Japan,[http://www.u-tokyo.ac.jp/en/],139.76222,UT


**What are the metrics/aggregations available?** See the data sources documentation for information about available [indicators](https://docs.dimensions.ai/dsl/datasource-publications.html#publications-indicators).  

Alternatively, we can use the 'schema' API ([describe](https://docs.dimensions.ai/dsl/data-sources.html#metadata-api)) to return this information programmatically:

In [395]:
schema = dsl.query("describe schema")
# for each source, print the name and description of its metrics
for source_name, source_info in schema['sources'].items():
    print("SOURCE:", source_name)
    for metric in source_info['metrics'].values():
        print("--", metric['name'], " => ", metric['description'])

SOURCE: publications
-- count  =>  Total count
-- altmetric_median  =>  Median Altmetric attention score
-- altmetric_avg  =>  Altmetric attention score mean
-- citations_total  =>  Aggregated number of citations
-- citations_avg  =>  Arithmetic mean of citations
-- citations_median  =>  Median of citations
-- recent_citations_total  =>  For a given article, in a given year, the number of citations accrued in the last two year period. Single value stored per document, year window rolls over in July.
-- rcr_avg  =>  Arithmetic mean of `relative_citation_ratio` field.
-- fcr_gavg  =>  Geometric mean of `field_citation_ratio` field (note: This field cannot be used for sorting results).
SOURCE: grants
-- count  =>  Total count
-- funding  =>  Total funding amount, in USD.
SOURCE: patents
-- count  =>  Total count
SOURCE: clinical_trials
-- count  =>  Total count
SOURCE: policy_documents
-- count  =>  Total count
SOURCE: researchers
-- count  =>  Total count
SOURCE: organizations
-- count  

**NOTE** In addition to any specified aggregations, `count` is always computed
and reported when facet results are requested.

In [415]:
%%dsldf
search grants 
    for "5g network" 
return funders 
    aggregate count, funding sort by funding limit 5 

Returned Funders: 5


Unnamed: 0,id,count,funding,city_name,latitude,name,acronym,types,country_name,linkout,longitude,state_name
0,grid.270680.b,175,834354500.0,Brussels,50.85165,European Commission,EC,[Government],Belgium,[http://ec.europa.eu/index_en.htm],4.36367,
1,grid.421091.f,68,52650403.0,Swindon,51.567093,Engineering and Physical Sciences Research Cou...,EPSRC,[Government],United Kingdom,[https://www.epsrc.ac.uk/],-1.784602,England
2,grid.457785.c,104,49173505.0,Arlington,38.88058,Directorate for Computer & Information Science...,NSF CISE,[Government],United States,[http://www.nsf.gov/dir/index.jsp?org=CISE],-77.111,Virginia
3,grid.55047.33,5,47182381.0,Warsaw,52.227455,National Centre for Research and Development,NCRD,[Government],Poland,[http://www.ncbr.gov.pl/en/],21.00763,
4,grid.457810.f,71,24096160.0,Arlington,38.88058,Directorate for Engineering,NSF ENG,[Government],United States,[http://www.nsf.gov/dir/index.jsp?org=ENG],-77.111,Virginia


Aggregated total number of citations

In [417]:
%%dsldf
search publications
    for "ontologies"
return funders 
    aggregate citations_total 
    sort by citations_total  limit 5

Returned Funders: 5


Unnamed: 0,id,count,citations_total,state_name,city_name,latitude,name,acronym,types,country_name,linkout,longitude
0,grid.48336.3a,11902,787593.0,Maryland,Rockville,39.004326,National Cancer Institute,NCI,[Government],United States,[http://www.cancer.gov/],-77.10119
1,grid.280785.0,11460,761933.0,Maryland,Bethesda,38.997833,National Institute of General Medical Sciences,NIGMS,[Facility],United States,[http://www.nigms.nih.gov/Pages/default.aspx],-77.09938
2,grid.280128.1,4379,563695.0,Maryland,Bethesda,38.996967,National Human Genome Research Institute,NHGRI,[Facility],United States,[https://www.genome.gov/],-77.09693
3,grid.270680.b,17489,530976.0,,Brussels,50.85165,European Commission,EC,[Government],Belgium,[http://ec.europa.eu/index_en.htm],4.36367
4,grid.52788.30,4777,409828.0,,London,51.525867,Wellcome Trust,WT,[Nonprofit],United Kingdom,[http://www.wellcome.ac.uk/],-0.135005


Arithmetic mean number of citations

In [418]:
%%dsldf
search publications
return funders 
    aggregate citations_avg 
    sort by citations_avg limit 5

Returned Funders: 5


Unnamed: 0,id,count,citations_avg,longitude,linkout,types,state_name,name,country_name,city_name,latitude
0,grid.478308.0,165,279.181818,-77.03973,[http://www.stewart-trust.org/],[Nonprofit],District of Columbia,Alexander & Margaret Stewart Trust,United States,Washington D.C.,38.90116
1,grid.453780.d,142,186.457746,-77.03952,[http://www.abc2.org/],[Nonprofit],District of Columbia,Accelerate Brain Cancer Cure,United States,Washington D.C.,38.90672
2,grid.478789.d,567,163.790123,-115.29985,[http://www.dwreynolds.org/],[Other],Nevada,Donald W. Reynolds Foundation,United States,Las Vegas,36.19046
3,grid.417710.4,180,161.838889,-77.20376,[http://www.hgsi.com],[Company],Maryland,Human Genome Sciences (United States),United States,Rockville,39.09665
4,grid.429197.0,718,145.075209,-73.982895,[http://www.hhwf.org/],[Other],New York,Helen Hay Whitney Foundation,United States,New City,41.15845


Geometric mean of FCR


In [419]:
%%dsldf
search publications
return funders 
    aggregate fcr_gavg limit 5

Returned Funders: 5


Unnamed: 0,id,fcr_gavg,count,linkout,country_name,latitude,types,acronym,longitude,name,city_name,state_name
0,grid.419696.5,2.310093,1884970,[http://www.nsfc.gov.cn/publish/portal1/],China,40.005177,[Government],NSFC,116.33983,National Natural Science Foundation of China,Beijing,
1,grid.270680.b,3.277453,659906,[http://ec.europa.eu/index_en.htm],Belgium,50.85165,[Government],EC,4.36367,European Commission,Brussels,
2,grid.424020.0,2.523333,592418,[http://www.most.gov.cn/eng/],China,39.827835,[Government],MOST,116.316284,Ministry of Science and Technology of the Peop...,Beijing,
3,grid.48336.3a,4.873172,581356,[http://www.cancer.gov/],United States,39.004326,[Government],NCI,-77.10119,National Cancer Institute,Rockville,Maryland
4,grid.54432.34,2.255033,566646,[http://www.jsps.go.jp/],Japan,35.68716,[Nonprofit],JSPS,139.74039,Japan Society for the Promotion of Science,Tokyo,


Median Altmetric Attention Score

In [420]:
%%dsldf 
search publications
return funders aggregate altmetric_median 
    sort by altmetric_median limit 5 

Returned Funders: 5


Unnamed: 0,id,count,altmetric_median,linkout,country_name,latitude,types,acronym,longitude,name,city_name,state_name
0,grid.258806.1,6,306.0,[https://www.kyutech.ac.jp/english/],Japan,33.894436,[Education],KIT,130.8392,Kyushu Institute of Technology,Kitakyushu,
1,grid.470711.4,2,111.5,[http://www.chss.org.uk/],United Kingdom,55.946075,[Nonprofit],CHSS,-3.219597,Chest Heart and Stroke Scotland,Edinburgh,
2,grid.443873.f,5,101.0,[http://www.lungevity.org/],United States,41.878674,[Nonprofit],LUNG,-87.62648,LUNGevity Foundation,Chicago,Illinois
3,grid.473856.b,2,51.0,[https://www.acf.hhs.gov/],United States,38.88594,[Government],ACF,-77.01637,Administration for Children and Families,Washington D.C.,District of Columbia
4,grid.473769.8,1,33.0,[http://www.bcan.org/],United States,38.988724,[Nonprofit],BCAN,-77.09788,Bladder Cancer Advocacy Network,Bethesda,Maryland
