# Exploring The Dimensions Search Language (DSL) - Deep Dive

This tutorial provides a detailed walkthrough of the most important features of the [Dimensions Search Language](https://docs.dimensions.ai/dsl/). 

This tutorial is based on the [Query Syntax](https://docs.dimensions.ai/dsl/language.html) section of the official documentation, so it can be used as an interactive companion to it: you can try out the various DSL queries presented there.

## What is the Dimensions Search Language?

The DSL aims to capture the type of interaction with Dimensions data
that users are accustomed to performing graphically via the [web
application](https://app.dimensions.ai/), and enable web app developers, power users, and others to
carry out such interactions by writing query statements in a syntax
loosely inspired by SQL but particularly suited to our specific domain
and data organization.

**Note:** this notebook uses the Python programming language; however, the **DSL queries are not Python-specific** and can be reused with any other API client. 



## Prerequisites

This notebook assumes you have installed the [Dimcli](https://pypi.org/project/dimcli/) library and are familiar with the *Getting Started* tutorial.


In [1]:
!pip install dimcli --quiet 

import dimcli
from dimcli.shortcuts import *
import json
import sys
import pandas as pd
#

print("==\nLogging in..")
# https://github.com/digital-science/dimcli#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
  import getpass
  USERNAME = getpass.getpass(prompt='Username: ')
  PASSWORD = getpass.getpass(prompt='Password: ')    
  dimcli.login(USERNAME, PASSWORD, ENDPOINT)
else:
  USERNAME, PASSWORD  = "", ""
  dimcli.login(USERNAME, PASSWORD, ENDPOINT)
dsl = dimcli.Dsl()

==
Logging in..
Dimcli - Dimensions API Client (v0.7.4.2)
Connected to: https://app.dimensions.ai - DSL v1.27
Method: dsl.ini file



## Sections Index 

1. Basic query structure
2. Full-text searching
3. Field searching
4. Searching for researchers
5. Returning results 
6. Aggregations

## 1. Basic query structure

DSL queries consist of two required components: a `search` phrase that
indicates the scientific records to be searched, and one or
more `return` phrases which specify the contents and structure of the
desired results.

The simplest valid DSL query is of the form `search <source> return <result>`:

In [2]:
%%dsldf 
search grants return  grants limit 5

Returned Grants: 5 (total = 5514056)
Time: 0.61s


Unnamed: 0,funders,title,start_year,title_language,original_title,project_num,funding_org_name,language,start_date,id,active_year,end_date
0,"[{'id': 'grid.420488.2', 'city_name': 'The Hag...",Sensing alarm responses of ungulate herds to p...,2021,en,Sensing alarm responses of ungulate herds to p...,RAAK.PRO02.048,Dutch Research Council,en,2021-12-27,grant.6946936,[2021],
1,"[{'id': 'grid.270680.b', 'city_name': 'Brussel...",Functional analysis of ribosome heterogeneity ...,2021,en,Functional analysis of ribosome heterogeneity ...,890218,European Commission,en,2021-12-01,grant.9064785,"[2021, 2022, 2023]",2023-11-30
2,"[{'id': 'grid.484521.e', 'state_name': 'New Br...",APPROACH to Enriching the Real World Evidence ...,2021,en,APPROACH to Enriching the Real World Evidence ...,2018-HRSI-1548,New Brunswick Health Research Foundation,en,2021-11-30,grant.8690978,[2021],
3,"[{'id': 'grid.270680.b', 'city_name': 'Brussel...",Knowledge Transfer in Global Gender Programmes...,2021,en,Knowledge Transfer in Global Gender Programmes...,894029,European Commission,en,2021-10-01,grant.9064813,"[2021, 2022, 2023, 2024]",2024-09-30
4,"[{'id': 'grid.424470.1', 'city_name': 'Brussel...",Molecular mechanism of DNA double strand break...,2021,en,Mécanismes moléculaires de la formation et la ...,1301720F,Fund for Scientific Research,en,2021-10-01,grant.8950252,[2021],


### `search source`

A query must begin with the word `search` followed by a `source` name, i.e. the name of a type of scientific `record`, such as `grants` or `publications`.

**What are the sources available?** See the [data sources](https://docs.dimensions.ai/dsl/data-sources.html) section of the documentation. 

Alternatively, we can use the 'schema' API ([describe](https://docs.dimensions.ai/dsl/data-sources.html#metadata-api)) to return this information programmatically:

In [3]:
dsl.query("describe schema")

<dimcli.DslDataset object #4399011200. Dict keys: 'sources', 'entities'>

A more useful query might also make use of the optional `for` and
`where` phrases to limit the set of records returned.

In [4]:
%%dsldf 
search grants  for "lung cancer" 
    where active_year=2000 
return  grants  limit 5

Returned Grants: 5 (total = 1745)
Time: 0.50s


Unnamed: 0,funders,title,end_date,start_year,title_language,original_title,project_num,funding_org_name,language,start_date,id,active_year
0,"[{'id': 'grid.279885.9', 'state_name': 'Maryla...",ROLE OF CD44 ISOFORMS IN ENDOTHELIAL CELL DAMAGE,2002-01-01,2000,en,ROLE OF CD44 ISOFORMS IN ENDOTHELIAL CELL DAMAGE,F32HL010455,National Heart Lung and Blood Institute,en,2000-12-31,grant.2386513,"[2000, 2001, 2002]"
1,"[{'id': 'grid.279885.9', 'state_name': 'Maryla...","ESTROGEN, ANGIOGENESIS AND ENDOTHELIAL PROGENI...",2004-11-30,2000,en,"ESTROGEN, ANGIOGENESIS AND ENDOTHELIAL PROGENI...",R01HL063695,National Heart Lung and Blood Institute,en,2000-12-18,grant.2537116,"[2000, 2001, 2002, 2003, 2004]"
2,"[{'id': 'grid.279885.9', 'state_name': 'Maryla...",GENETIC ANALYSIS OF EPHRIN-EPH SIGNALING IN AN...,2007-11-30,2000,en,GENETIC ANALYSIS OF EPHRIN-EPH SIGNALING IN AN...,R01HL066221,National Heart Lung and Blood Institute,en,2000-12-18,grant.2537801,"[2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007]"
3,"[{'id': 'grid.279885.9', 'state_name': 'Maryla...",Synthetic Heparan Sulfate: Probing Biosynthesi...,2017-12-31,2000,en,Synthetic Heparan Sulfate: Probing Biosynthesi...,R01HL062244,National Heart Lung and Blood Institute,en,2000-12-15,grant.2536777,"[2000, 2001, 2002, 2003, 2004, 2005, 2006, 200..."
4,"[{'id': 'grid.419213.c', 'state_name': 'New Je...",SmokeLess States Program - Implementation,2001-02-28,2000,en,SmokeLess States Program - Implementation,41067,Robert Wood Johnson Foundation,en,2000-12-01,grant.8616620,"[2000, 2001]"
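The phrase order described above (`search`, then optional `for` and `where`, then `return`) can also be assembled programmatically. The following is a minimal sketch; `build_query` is a hypothetical helper written for this tutorial, not part of Dimcli, and it inserts the search term verbatim, so any nested quotes must already be escaped.

```python
def build_query(source, term=None, where=None, result=None, limit=None):
    """Assemble a DSL query string (hypothetical helper, not a Dimcli function)."""
    parts = [f"search {source}"]
    if term:
        # The term is inserted as-is: escape nested quotes before calling.
        parts.append(f'for "{term}"')
    if where:
        parts.append(f"where {where}")
    # By default, return the same source that was searched.
    parts.append(f"return {result or source}")
    if limit is not None:
        parts.append(f"limit {limit}")
    return " ".join(parts)


query = build_query("grants", term="lung cancer", where="active_year=2000", limit=5)
print(query)
```

The resulting string can then be passed to `dsl.query(...)`, as done elsewhere in this notebook.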


### `return` result (source or facet)

The most basic `return` phrase consists of the keyword `return` followed
by the name of a `record` or `facet` to be returned. 

This must be the
name of the `source` used in the `search` phrase, or the name of a
`facet` of that source.

In [5]:
%%dsldf
search grants for "laryngectomy" 
return grants limit 5

Returned Grants: 5 (total = 115)
Time: 0.50s


Unnamed: 0,start_date,language,id,original_title,title_language,title,active_year,start_year,funding_org_name,end_date,project_num,funders
0,2020-04-01,ja,grant.9201764,喉頭全摘出者の家族の術後生活への移行を促進する外来での生活支援プログラムの開発,ja,Development of an outpatient life support prog...,"[2020, 2021, 2022, 2023, 2024]",2020,Japan Society for the Promotion of Science,2024-03-31,20K10777,"[{'id': 'grid.54432.34', 'types': ['Nonprofit'..."
1,2019-09-29,en,grant.8674095,UKRI CDT in SLT- Continuous End-to-End Streami...,en,UKRI CDT in SLT- Continuous End-to-End Streami...,"[2019, 2020, 2021, 2022, 2023]",2019,Engineering and Physical Sciences Research Cou...,2023-09-28,2268211,"[{'id': 'grid.421091.f', 'types': ['Government..."
2,2019-08-15,en,grant.8554260,Wearable silent speech technology to enhance i...,en,Wearable silent speech technology to enhance i...,"[2019, 2020, 2021, 2022, 2023, 2024]",2019,National Institute on Deafness and Other Commu...,2024-07-31,R01DC016621,"[{'id': 'grid.214431.1', 'types': ['Facility']..."
3,2019-04-01,ja,grant.8428997,Construction of a nursing system leading to im...,en,Construction of a nursing system leading to im...,"[2019, 2020, 2021, 2022, 2023]",2019,Japan Society for the Promotion of Science,2023-03-31,19H03937,"[{'id': 'grid.54432.34', 'types': ['Nonprofit'..."
4,2019-04-01,ja,grant.8422934,喉頭がん、下咽頭がんにより喉頭摘出術を受けた患者に対する嗅覚向上プログラムの開発,ja,Development of an olfactory improvement progra...,"[2019, 2020, 2021]",2019,Japan Society for the Promotion of Science,2021-03-31,19K19574,"[{'id': 'grid.54432.34', 'types': ['Nonprofit'..."


For example, let's see which *facets* are available for the *grants* source:

In [6]:
fields = dsl.query("describe schema")['sources']['grants']['fields']
[x for x in fields if fields[x]['is_facet']]

['category_hrcs_rac',
 'active_year',
 'funding_org_acronym',
 'category_rcdc',
 'funder_countries',
 'funders',
 'research_org_state_codes',
 'start_year',
 'research_orgs',
 'research_org_countries',
 'funding_org_name',
 'researchers',
 'language',
 'category_icrp_cso',
 'category_sdg',
 'category_uoa',
 'category_for',
 'language_title',
 'category_bra',
 'category_hrcs_hc',
 'category_hra',
 'research_org_cities',
 'funding_currency',
 'category_icrp_ct',
 'funding_org_city']
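The same lookup can be wrapped in a small reusable function. This is only a sketch: `facet_names` is a hypothetical helper, and `sample_schema` is a tiny hand-written stand-in that mimics the shape of the `describe schema` response used above (it is not real API output), so the example runs without a connection.

```python
def facet_names(schema, source):
    """Return the sorted facet names of a source (hypothetical helper)."""
    fields = schema["sources"][source]["fields"]
    return sorted(name for name in fields if fields[name]["is_facet"])


# Hand-written stand-in with the same nesting as the schema response.
sample_schema = {
    "sources": {
        "grants": {
            "fields": {
                "active_year": {"is_facet": True},
                "funders": {"is_facet": True},
                "title": {"is_facet": False},
            }
        }
    }
}

print(facet_names(sample_schema, "grants"))
```

With a live session, the stand-in would be replaced by the result of `dsl.query("describe schema")`.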

## 2. Full-text Searching

Full-text search, or keyword search, finds all instances of a term
(keyword) in a document or group of documents. 

Full-text search works
by using search indexes, which can target specific sections of a
document, e.g. its *abstract*, *authors*, *full text*, etc.

In [7]:
%%dsldf 
search publications 
    in full_data for "moon landing" 
return publications limit 5

Returned Publications: 5 (total = 174854)
Time: 1.33s


Unnamed: 0,title,pages,author_affiliations,year,id,type
0,Bringing My Wife and Children to the Field,185-206,"[[{'first_name': 'Leberecht', 'last_name': 'Fu...",2020,pub.1128623295,chapter
1,UJ to Frederick and Maud Clapp,3-250,,2020,pub.1130258832,chapter
2,"1. Into the Woods (Via Cuma 320, Bacoli)",14-30,"[[{'first_name': 'Alessandro', 'last_name': 'B...",2020,pub.1127643502,chapter
3,1898–1899 Movies and Entrepreneurs,66-90,"[[{'first_name': 'Patrick', 'last_name': 'Loug...",2020,pub.1126778139,chapter
4,2. Grand Steerage,51-81,"[[{'first_name': 'Barry', 'last_name': 'Naught...",2020,pub.1129002686,chapter


### 2.1 `in [search index]`

This optional phrase consists of the particle `in` followed by a term indicating a `search index`, specifying for example whether the search
is limited to full text, title and abstract only, or title only. 

In [8]:
%%dsldf 
search grants 
    in title_abstract_only for "something" 
return grants limit 5

Returned Grants: 5 (total = 10001)
Time: 0.53s


Unnamed: 0,start_year,funding_org_name,end_date,active_year,language,start_date,funders,title_language,id,original_title,title,project_num
0,2021,European Commission,2023-08-31,"[2021, 2022, 2023]",en,2021-09-01,"[{'id': 'grid.270680.b', 'name': 'European Com...",en,grant.9064570,Deciphering fundamental constraints on pathoge...,Deciphering fundamental constraints on pathoge...,890630
1,2021,European Research Council,2025-12-31,"[2021, 2022, 2023, 2024, 2025]",en,2021-01-01,"[{'id': 'grid.452896.4', 'name': 'European Res...",en,grant.8964099,Overcoming stellar activity in radial velocity...,Overcoming stellar activity in radial velocity...,865624
2,2021,Swedish Research Council for Health Working Li...,2022-12-31,"[2021, 2022]",en,2021-01-01,"[{'id': 'grid.434365.3', 'name': 'Swedish Rese...",en,grant.9242822,Everyday Violence: Understanding and preventin...,Everyday Violence: Understanding and preventin...,2020-01152_Forte
3,2021,European Commission,2022-12-31,"[2021, 2022]",en,2021-01-01,"[{'id': 'grid.270680.b', 'name': 'European Com...",en,grant.9065705,Political Dynamics of Slow-Onset Disasters: Co...,Political Dynamics of Slow-Onset Disasters: Co...,897656
4,2020,Directorate for Computer & Information Science...,2024-09-30,"[2020, 2021, 2022, 2023, 2024]",en,2020-10-01,"[{'id': 'grid.457785.c', 'name': 'Directorate ...",en,grant.9046367,SaTC: CORE: Medium: Collaborative: Hardening O...,SaTC: CORE: Medium: Collaborative: Hardening O...,1954521


For example, let's see which *search fields* are available for the *grants* source:

In [9]:
dsl.query("describe schema")['sources']['grants']['search_fields']

['title_only', 'investigators', 'title_abstract_only', 'full_data', 'concepts']
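To see how the choice of index affects a search, one option is to generate the same query once per index and compare the totals each one reports. The sketch below only builds the query strings; `queries_per_index` is a hypothetical helper, and the index names are taken from the list above.

```python
def queries_per_index(source, term, indexes):
    """Build one DSL query per search index (hypothetical helper)."""
    return {
        index: f'search {source} in {index} for "{term}" return {source} limit 1'
        for index in indexes
    }


queries = queries_per_index(
    "grants", "graphene", ["title_only", "title_abstract_only", "full_data"]
)
for index, query in queries.items():
    print(f"{index}: {query}")
```

Each string can be run with `dsl.query(...)`; narrower indexes such as `title_only` would normally be expected to return fewer records than `full_data`.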

In [10]:
%%dsldf 
search grants 
    in full_data for "graphene AND computer AND iron" 
return grants limit 5

Returned Grants: 5 (total = 10)
Time: 0.51s


Unnamed: 0,start_year,funding_org_name,end_date,active_year,language,start_date,funders,title_language,id,original_title,title,project_num
0,2019,Russian Science Foundation,2021-12-31,"[2019, 2020, 2021]",en,2019-01-01,"[{'id': 'grid.454869.2', 'name': 'Russian Scie...",en,grant.8413990,Weyl and Dirac semimetals and beyond - predict...,Weyl and Dirac semimetals and beyond - predict...,19-43-04129
1,2018,Russian Foundation for Basic Research,2018-12-31,[2018],ru,2018-01-01,"[{'id': 'grid.452899.b', 'name': 'Russian Foun...",ru,grant.8731867,Проект организации 18-ой Международной конфере...,Project of the organization of the 18th Intern...,18-02-20097
2,2016,Ministry of Science and Higher Education,2016-12-31,[2016],pl,2016-02-22,"[{'id': 'grid.425823.a', 'name': 'Ministry of ...",pl,grant.7397800,Dotacja podmiotowa na utrzymanie potencjału ba...,Subject subsidy for maintaining the research p...,4491/E-370/S/2016
3,2015,Ministry of Science and Higher Education,2015-12-31,[2015],pl,2015-02-19,"[{'id': 'grid.425823.a', 'name': 'Ministry of ...",pl,grant.7397795,Dotacja podmiotowa na utrzymanie potencjału ba...,Subject subsidy for maintaining the research p...,4491/E-370/S/2015
4,2014,Ministry of Science and Higher Education,2014-12-31,[2014],pl,2014-04-09,"[{'id': 'grid.425823.a', 'name': 'Ministry of ...",pl,grant.7397490,Dotacja celowa na prowadzenie w 2014 przez Wyd...,Intentional grant for conducting in 2014 the F...,4491/E-370/M/2014


Special search indexes for person names make it possible to run full-text
searches on publication `authors` or grant `investigators`. Please see the
*Searching for researchers* section below for more information
on how searches work in this case.

In [11]:
%dsldf search publications in authors for "\"Jennifer A Doudna\"" return publications limit 5

Returned Publications: 5 (total = 332)
Time: 0.69s


Unnamed: 0,id,title,volume,pages,type,year,author_affiliations,journal.id,journal.title,issue
0,pub.1129492680,Engineering of Monosized Lipid-Coated Mesoporo...,114.0,358-368,article,2020,"[[{'first_name': 'Achraf', 'last_name': 'Noure...",jour.1034525,Acta Biomaterialia,
1,pub.1130231355,Site-Specific Bioconjugation through Enzyme-Ca...,,,article,2020,"[[{'first_name': 'Marco J.', 'last_name': 'Lob...",jour.1051962,ACS Central Science,
2,pub.1130116638,Chemistry of Class 1 CRISPR-Cas effectors: bin...,,jbc.rev120.007034,article,2020,"[[{'first_name': 'Tina Y.', 'last_name': 'Liu'...",jour.1077138,Journal of Biological Chemistry,
3,pub.1129110288,A scoutRNA Is Required for Some Type V CRISPR-...,79.0,416-424.e5,article,2020,"[[{'first_name': 'Lucas B.', 'last_name': 'Har...",jour.1117828,Molecular Cell,3.0
4,pub.1129776449,DNA capture by a CRISPR-Cas9–guided adenine ba...,369.0,566-571,article,2020,"[[{'first_name': 'Audrone', 'last_name': 'Lapi...",jour.1346339,Science,6503.0


### 2.2 `for "search term"`

This optional phrase consists of the keyword `for` followed by a
`search term` `string`, enclosed in double quotes (`"`).

Strings in double quotes can contain nested quotes escaped by a
backslash `\`. This will ensure that the string in nested double quotes
is searched for as if it was a single phrase, not multiple words.

An example of a phrase: `"\"Machine Learning\""` : results must contain
`Machine Learning` as a phrase.

In [12]:
%dsldf search publications for "\"Machine Learning\"" return publications limit 5

Returned Publications: 5 (total = 1217944)
Time: 1.88s


Unnamed: 0,type,pages,author_affiliations,id,year,title,volume,issue,journal.id,journal.title
0,chapter,243-248,"[[{'first_name': 'Eetu', 'last_name': 'Heikkil...",pub.1124666091,2020,Towards maritime traffic coordination in the e...,,,,
1,chapter,39-60,"[[{'first_name': 'Anya', 'last_name': 'Kamenet...",pub.1130268195,2020,2. DIY U,,,,
2,article,1726672,"[[{'first_name': 'Sytske', 'last_name': 'Wiege...",pub.1125710665,2020,Recognizing hotspots in Brief Eclectic Psychot...,11.0,1.0,jour.1045059,European Journal of Psychotraumatology
3,article,41-54,"[[{'first_name': 'Baze University Abuja', 'las...",pub.1126735888,2020,Capacitated vehicle routing problem with colum...,3.0,1.0,jour.1365688,Open Journal of Discrete Applied Mathematics
4,chapter,219-250,"[[{'first_name': 'Jan', 'last_name': 'Goldenst...",pub.1124034443,2020,Die Erfassung und Messung von Bedeutungsstrukt...,,,,


An example of multiple keywords: `"Machine Learning"`: without the nested quotes, the
keywords `Machine` and `Learning` are searched for independently.

In [13]:
%dsldf search publications for "Machine Learning" return publications limit 5

Returned Publications: 5 (total = 2524786)
Time: 1.53s


Unnamed: 0,type,pages,id,year,title,author_affiliations
0,chapter,65-368,pub.1127396158,2020,Documents,
1,chapter,114-125,pub.1127466829,2020,The influence of ecological constraints on the...,"[[{'first_name': 'André', 'last_name': 'Boyer'..."
2,chapter,84-118,pub.1124947017,2020,4. Visualizing the Division of Labor: William ...,"[[{'first_name': 'John', 'last_name': 'Barrell..."
3,chapter,217-276,pub.1126774980,2020,4 Hinduism,"[[{'first_name': 'Laurie L.', 'last_name': 'Pa..."
4,chapter,44-58,pub.1125150382,2020,3. Rural-Urban Divides and Digital Literacy in...,"[[{'first_name': 'Daariimaa', 'last_name': 'Ma..."


Note: special characters, i.e. any of `^ " : ~ \ [ ] { } ( ) ! | & +`, must be escaped with a backslash `\`. Also keep in mind the string-escaping rules of
[Python](http://python-reference.readthedocs.io/en/latest/docs/str/escapes.html) (or whichever language you use). For example, when writing a query with escaped quotes, such as `search publications for "\"phrase 1\" AND \"phrase 2\""`, as a regular Python string literal, the backslashes must be escaped as well, so it
would look like: `'search publications for "\\"phrase 1\\" AND \\"phrase 2\\""'`. 

See the [official docs](https://docs.dimensions.ai/dsl/language.html#for-search-term) for more details.
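The Python side of this escaping rule can be checked without calling the API. The short sketch below compares three ways of writing the same query as a string literal: with doubled backslashes, as a raw string (`r'...'`), and with single backslashes in a regular string, where Python consumes the backslashes before the DSL ever sees them.

```python
# Regular string literal: each backslash must be doubled.
escaped = 'search publications for "\\"phrase 1\\" AND \\"phrase 2\\""'

# Raw string literal: backslashes are kept exactly as typed.
raw = r'search publications for "\"phrase 1\" AND \"phrase 2\""'

# Regular string with single backslashes: Python turns \" into a plain quote.
unescaped = 'search publications for "\"phrase 1\" AND \"phrase 2\""'

print(escaped == raw)      # both forms keep the backslashes
print("\\" in unescaped)   # the backslashes are gone here
print(raw)
```

Raw strings are often the more readable choice when a query contains several escaped quotes.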

### 2.3 Boolean Operators

A search term can consist of multiple keywords or phrases connected using
Boolean logic operators, e.g. `AND`, `OR` and `NOT`.

In [14]:
%dsldf search publications for "(dose AND concentration)" return publications limit 5

Returned Publications: 5 (total = 5370106)
Time: 1.00s


Unnamed: 0,id,title,volume,issue,pages,type,year,author_affiliations,journal.id,journal.title
0,pub.1124948447,Translational studies of estradiol and progest...,11.0,1.0,1723857,article,2020,"[[{'first_name': 'Antonia V', 'last_name': 'Se...",jour.1045059,European Journal of Psychotraumatology
1,pub.1128226413,Interrupting traumatic memories in the emergen...,11.0,1.0,1750170,article,2020,"[[{'first_name': 'Sara A.', 'last_name': 'Free...",jour.1045059,European Journal of Psychotraumatology
2,pub.1128351891,7. Wetland Animal Ecology,,,242-284,chapter,2020,"[[{'first_name': 'Darold P.', 'last_name': 'Ba...",,
3,pub.1130114635,Correlation Of Calcium Levels With The Strengh...,,,174-181,chapter,2020,"[[{'first_name': 'Joserizal', 'last_name': 'Se...",,
4,pub.1125801745,7. Conservation of the Amsterdam Sunflowers: F...,,,175-206,chapter,2020,"[[{'first_name': 'Ella', 'last_name': 'Hendrik...",,


When specifying Boolean operators with keywords such as `AND`, `OR` and
`NOT`, the keywords must appear in all uppercase. 

The operators available are shown in the table below.

| Boolean Operator | Alternative Symbol | Description                                                                                                                                                                 |
|------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `AND`            | `&&`               | Requires both terms on either side of the Boolean operator to be present for a match.                                                                                       |
| `NOT`            | `!`                | Requires that the following term not be present.                                                                                                                            |
| `OR`             | `\|\|`             | Requires that either term (or both terms) be present for a match.                                                                                                           |
|                  | `+`                | Requires that the following term be present.                                                                                                                                |
|                  | `-`                | Prohibits the following term (that is, matches on fields or documents that do not include that term). The `-` operator is functionally similar to the Boolean operator `!`. |

In [15]:
%dsldf search publications for "(dose OR concentration) AND (-malaria +africa)" return publications limit 5

Returned Publications: 5 (total = 1402625)
Time: 0.88s


Unnamed: 0,type,pages,author_affiliations,id,year,title
0,chapter,634-688,"[[{'first_name': 'Antonio', 'last_name': 'Esta...",pub.1124248682,2020,17. Institutions for Infrastructure in Develop...
1,chapter,285-304,"[[{'first_name': 'Eliot A.', 'last_name': 'Bre...",pub.1124946791,2020,16. The Neuroethology of Birdsong
2,chapter,1-8,"[[{'first_name': 'John S.', 'last_name': 'Hend...",pub.1125788851,2020,"Introduction: Murra, Materialism, Anthropology..."
3,chapter,129-143,"[[{'first_name': 'Campbell', 'last_name': 'Cra...",pub.1124248733,2020,8. India in the Early Nuclear Age
4,chapter,100-114,"[[{'first_name': 'Isabelle', 'last_name': 'Roh...",pub.1128661435,2020,4. The Franco Regime and the Jews of North Afr...


Combining keywords and Boolean operators makes it possible to construct rather sophisticated queries. For example, here's a real-world query used to extract publications related to COVID-19. 

In [16]:
q_inner = """ "2019-nCoV" OR "COVID-19" OR "SARS-CoV-2" OR "HCoV-2019" OR "hcov" OR "NCOVID-19" OR  
    "severe acute respiratory syndrome coronavirus 2" OR "severe acute respiratory syndrome corona virus 2" 
    OR (("coronavirus"  OR "corona virus") AND (Wuhan OR China OR novel)) """

# tip: dsl_escape is a dimcli utility function for escaping special characters 
q_outer = f"""search publications in full_data for "{dsl_escape(q_inner)}" return publications"""
print(q_outer)

dsl.query(q_outer)

search publications in full_data for " \"2019-nCoV\" OR \"COVID-19\" OR \"SARS-CoV-2\" OR \"HCoV-2019\" OR \"hcov\" OR \"NCOVID-19\" OR  
    \"severe acute respiratory syndrome coronavirus 2\" OR \"severe acute respiratory syndrome corona virus 2\" 
    OR ((\"coronavirus\"  OR \"corona virus\") AND (Wuhan OR China OR novel)) " return publications
Returned Publications: 20 (total = 193181)
Time: 6.47s


<dimcli.DslDataset object #4662481968. Records: 20/193181>
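Long `OR` lists like the one above are easier to maintain when generated from a Python list. The following is a sketch under stated assumptions: `any_of` and `escape_quotes` are hypothetical helpers, and `escape_quotes` only backslash-escapes double quotes. It is a stand-in for illustration, not Dimcli's `dsl_escape`, which may handle more cases.

```python
def any_of(phrases):
    """Join phrases into a quoted OR expression (hypothetical helper)."""
    return " OR ".join(f'"{phrase}"' for phrase in phrases)


def escape_quotes(text):
    """Backslash-escape double quotes (stand-in, not dimcli's dsl_escape)."""
    return text.replace('"', '\\"')


inner = any_of(["2019-nCoV", "COVID-19", "SARS-CoV-2"])
query = (
    f'search publications in full_data for "{escape_quotes(inner)}" '
    "return publications"
)
print(query)
```

Keeping the term list separate from the query template makes it simple to add or remove synonyms without touching the escaping.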

### 2.4 Wildcard Searches

The DSL supports single and multiple character wildcard searches within
single terms. Wildcard characters can be applied to single terms, but
not to search phrases.

In [17]:
%dsldf search publications in title_only for "ital? malaria" return publications limit 5

Returned Publications: 5 (total = 144)
Time: 0.88s


Unnamed: 0,title,pages,author_affiliations,year,issue,id,type,volume,journal.id,journal.title
0,Non-imported malaria in Italy: paradigmatic ap...,857,"[[{'first_name': 'Daniela', 'last_name': 'Bocc...",2020,1.0,pub.1128245696,article,20.0,jour.1024954,BMC Public Health
1,A Cluster of Cryptic Plasmodium falciparum Mal...,,"[[{'first_name': 'Gaetano', 'last_name': 'Brin...",2020,,pub.1130290794,article,,jour.1023805,Vector-Borne and Zoonotic Diseases
2,"Seasons in Italy: Northern European travelers,...",1-20,"[[{'first_name': 'Benjamin', 'last_name': 'Rei...",2020,,pub.1124231018,article,,jour.1141817,Journal of Tourism and Cultural Change
3,Updated guidelines for malaria prophylaxis in ...,101544,"[[{'first_name': 'Guido', 'last_name': 'Caller...",2020,,pub.1123222257,article,33.0,jour.1034401,Travel Medicine and Infectious Disease
4,Clinical management of imported malaria in Ita...,28-33,"[[{'first_name': 'Luciana', 'last_name': 'Lepo...",2020,1.0,pub.1125332077,article,43.0,jour.1089291,Microbiologica


In [18]:
%dsldf search publications in title_only for "it* malaria" return publications limit 5

Returned Publications: 5 (total = 1541)
Time: 0.51s


Unnamed: 0,type,volume,pages,author_affiliations,id,year,issue,title,journal.id,journal.title
0,article,20.0,857,"[[{'first_name': 'Daniela', 'last_name': 'Bocc...",pub.1128245696,2020,1.0,Non-imported malaria in Italy: paradigmatic ap...,jour.1024954,BMC Public Health
1,article,19.0,24,"[[{'first_name': 'Monica P.', 'last_name': 'Sh...",pub.1124106064,2020,1.0,The effectiveness of older insecticide-treated...,jour.1030597,Malaria Journal
2,article,19.0,299,"[[{'first_name': 'Lemu', 'last_name': 'Golassa...",pub.1130290155,2020,1.0,The biology of unconventional invasion of Duff...,jour.1030597,Malaria Journal
3,article,13.0,348,"[[{'first_name': 'Richard', 'last_name': 'Echo...",pub.1129556766,2020,1.0,High insecticide resistances levels in Anophel...,jour.1039457,BMC Research Notes
4,article,,104530,"[[{'first_name': 'Kirti', 'last_name': 'Upmany...",pub.1130570962,2020,,Allelic variation of msp-3α gene in Plasmodium...,jour.1027256,Infection Genetics and Evolution


| Wildcard Search Type                                             | Special Character | Example                                                                                                                                                                                                                         |
|------------------------------------------------------------------|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Single character - matches a single character                    | `?`               | The search string `te?t` would match both `test` and `text`.                                                                                                                                                                    |
| Multiple characters - matches zero or more sequential characters | `*`               | The wildcard search: `tes*` would match `test`, `testing`, and `tester`. You can also use wildcard characters in the middle of a term. For example: `te*t` would match `test` and `text`. `*est` would match `pest` and `test`. |
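The `?` and `*` conventions in the table are the same ones used by Python's standard `fnmatch` module, so the examples can be checked locally. This sketch only illustrates the wildcard semantics on a hand-picked word list; it says nothing about how the Dimensions index tokenizes or matches text, and `matching_words` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matching_words(pattern, words):
    """Return the words matched by a ?/* wildcard pattern (hypothetical helper)."""
    return [word for word in words if fnmatchcase(word, pattern)]


words = ["test", "text", "testing", "tester", "pest", "toast"]

print(matching_words("te?t", words))  # ['test', 'text']
print(matching_words("tes*", words))  # ['test', 'testing', 'tester']
print(matching_words("te*t", words))  # ['test', 'text']
print(matching_words("*est", words))  # ['test', 'pest']
```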

### 2.5 Proximity Searches

A proximity search looks for terms that are within a specific distance
from one another.

To perform a proximity search, add the tilde character `~` and a numeric
value to the end of a search phrase. For example, to search for
`formal` and `model` within 10 words of each other in a document, use
the search:

In [19]:
%dsldf search publications for "\"formal model\"~10" return publications limit 5

Returned Publications: 5 (total = 483576)
Time: 2.48s


Unnamed: 0,title,pages,author_affiliations,year,id,type,issue,volume,journal.id,journal.title
0,1. The Political Economy of Environmental Just...,1-20,"[[{'first_name': 'H. Spencer', 'last_name': 'B...",2020,pub.1130374367,chapter,,,,
1,15. Organizational Governance,513-555,"[[{'first_name': 'Nicolai J.', 'last_name': 'F...",2020,pub.1130267294,chapter,,,,
2,2. Clientelistic Politics and Economic Develop...,84-102,"[[{'first_name': 'Pranab', 'last_name': 'Bardh...",2020,pub.1124248667,chapter,,,,
3,4. The Classification of Organizational Forms,84-110,"[[{'first_name': 'Martin', 'last_name': 'Ruef'...",2020,pub.1130269657,chapter,,,,
4,Building cooperative learning to address alcoh...,1726722,"[[{'first_name': 'Oladapo', 'last_name': 'Olad...",2020,pub.1125320181,article,1.0,13.0,jour.1041075,Global Health Action


In [20]:
%dsldf search publications for "\"digital humanities\"~5  +ontology" return publications limit 5

Returned Publications: 5 (total = 8109)
Time: 1.36s


Unnamed: 0,id,title,volume,issue,pages,type,year,author_affiliations,journal.id,journal.title
0,pub.1128167997,The gains of reduction in translational proces...,6.0,1.0,109,article,2020,"[[{'first_name': 'Anita', 'last_name': 'Wohlma...",jour.1136613,Palgrave Communications
1,pub.1127423858,Citizen science in the social sciences and hum...,6.0,1.0,89,article,2020,"[[{'first_name': 'Loreta', 'last_name': 'Taugi...",jour.1136613,Palgrave Communications
2,pub.1129593819,A methodology for multilayer networks analysis...,5.0,1.0,41,article,2020,"[[{'first_name': 'Maria', 'last_name': 'Malek'...",jour.1158525,Applied Network Science
3,pub.1127978306,Atlante dei siti fortificati della provincia d...,,,471-478,proceeding,2020,"[[{'first_name': 'Maurizio', 'last_name': 'Tos...",,
4,pub.1122198573,Semantic-based privacy settings negotiation an...,111.0,,879-898,article,2020,"[[{'first_name': 'Odnan Ref', 'last_name': 'Sa...",jour.1125399,Future Generation Computer Systems


The distance referred to here is the number of term movements needed to match the specified phrase.  
In the `"formal model"~10` example above, if `formal` and `model` were 10 spaces apart in a
field, but `model` appeared before `formal`, more than 10 term movements
would be required to move the terms together and position `model` to
the right of `formal` with a space in between.
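The idea of counting term movements can be sketched for a two-word phrase. This is a simplified model under explicit assumptions (whitespace tokenization, exact word matches, only two terms); `moves_needed` is a hypothetical helper, and the actual search engine's text analysis and matching may differ.

```python
def moves_needed(tokens, first, second):
    """Estimate the term movements needed to form the phrase 'first second'.

    Simplified two-term model (an assumption, not the engine's code).
    Returns the smallest count over all occurrences, or None if a word is missing.
    """
    first_positions = [i for i, token in enumerate(tokens) if token == first]
    second_positions = [i for i, token in enumerate(tokens) if token == second]
    if not first_positions or not second_positions:
        return None
    # In the phrase, `second` sits exactly one position after `first`,
    # so the cost is how far the real offset is from +1.
    return min(abs(s - f - 1) for f in first_positions for s in second_positions)


in_order = "a formal and testable model".split()
reversed_order = "model formal".split()

print(moves_needed(in_order, "formal", "model"))        # 2
print(moves_needed(reversed_order, "formal", "model"))  # 2
```

In this model, two intervening words cost two movements, and so does swapping two adjacent words that appear in the wrong order, which is why reversed terms need a larger distance value to match.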

## 3. Field Searching

Field searching makes it possible to use a specific `field` of a `source` as a
query filter. For example, this can be a
[literal](supported-types.ipynb) field such as the *type* of a
publication, its *date*, *mesh terms*, etc. Or it can be an
[entity](data-entities.ipynb) field, such as the *journal title* for a
publication, the *country name* of its author affiliations, etc.

**What are the fields available for each source?** See the [data sources](https://docs.dimensions.ai/dsl/data-sources.html) section of the documentation. 

Alternatively, we can use the 'schema' API ([describe](https://docs.dimensions.ai/dsl/data-sources.html#metadata-api)) to return this information programmatically: 

In [21]:
%dsldocs publications  

Unnamed: 0,sources,field,type,description,is_filter,is_entity,is_facet
0,publications,altmetric,float,Altmetric attention score.,True,False,False
1,publications,altmetric_id,integer,AltMetric Publication ID,True,False,False
2,publications,authors,json,Ordered list of authors names and their affili...,True,False,False
3,publications,book_doi,string,The DOI of the book a chapter belongs to (note...,True,False,False
4,publications,book_series_title,string,"The title of the book series book, belong to.",False,False,False
5,publications,book_title,string,The title of the book a chapter belongs to (no...,False,False,False
6,publications,category_bra,categories,`Broad Research Areas <https://dimensions.fres...,True,True,True
7,publications,category_for,categories,`ANZSRC Fields of Research classification <htt...,True,True,True
8,publications,category_hra,categories,`Health Research Areas <https://dimensions.fre...,True,True,True
9,publications,category_hrcs_hc,categories,`HRCS - Health Categories <https://dimensions....,True,True,True


### 3.1 `where`

This optional phrase consists of the keyword `where` followed by a
`filters` phrase consisting of DSL filter expressions, as described
below.

In [22]:
%dsldf search publications where type = "book" return publications limit 5

Returned Publications: 5 (total = 296478)
Time: 0.57s


Unnamed: 0,id,title,type,year,volume
0,pub.1125300609,Duoethnography in English Language Teaching,book,2020,
1,pub.1108455576,The Indo-Aryans of Ancient South Asia,book,2020,
2,pub.1125300607,Sociolinguistic Perspectives on Migration Control,book,2020,
3,pub.1108473781,Die Passion Christi in Literatur und Kunst des...,book,2020,
4,pub.1129458015,Neuromodulation for Facial Pain,book,2020,35.0


If a `for` phrase is also used in a filtered query, the
system will first apply the filters, and then search the resulting
restricted set of documents for the `search term`.

In [23]:
%dsldf search publications for "malaria" where type = "book" return publications limit 5

Returned Publications: 5 (total = 12497)
Time: 0.48s


Unnamed: 0,type,id,year,title
0,book,pub.1130620714,2020,Nano-Enabled Medical Applications
1,book,pub.1130505886,2020,"Human Ecology, Human Economy"
2,book,pub.1130318304,2020,Pharmaceutical Biocatalysis
3,book,pub.1129886893,2020,Wild Plants
4,book,pub.1130227719,2020,Medicine in the Twentieth Century
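
The `for` and `where` phrases shown above are optional and always appear in the same positions, so a query can be assembled from its parts. The following is a minimal sketch; `build_query` and its parameters are hypothetical names introduced for illustration and are not part of Dimcli or the DSL.

```python
def build_query(source, term=None, filters=None, limit=5):
    """Assemble a DSL query string (hypothetical helper, not a Dimcli API)."""
    parts = [f"search {source}"]
    if term is not None:
        # The full-text search term goes in a double-quoted `for` phrase.
        parts.append(f'for "{term}"')
    if filters:
        # Filter expressions follow the `where` keyword.
        parts.append(f"where {filters}")
    parts.append(f"return {source} limit {limit}")
    return " ".join(parts)


query = build_query("publications", term="malaria", filters='type = "book"')
print(query)
```

The resulting string should be usable wherever a DSL query is expected, for example with the `%dsldf` magic or with `dsl.query(query)` on the `dsl` object created in the login cell.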


### 3.2 `in`

For convenience, the DSL also supports shorthand notation for filters
where a particular field should be restricted to a specified range or
list of values (although the same logic may be expressed using complex
filters as shown below).

Syntax: a **range filter** consists of the `field` name, the keyword `in`, and a
range of values enclosed in square brackets (`[]`), where the range
consists of a `low` value, colon `:`, and a `high` value.

In [24]:
%%dsldf 
search grants 
    for "malaria" 
    where start_year in [ 2010 : 2015 ] 
return grants limit 5

Returned Grants: 5 (total = 3134)
Time: 0.52s


Unnamed: 0,funders,title,end_date,start_year,title_language,original_title,project_num,funding_org_name,language,start_date,id,active_year
0,"[{'id': 'grid.419681.3', 'state_name': 'Maryla...",Bloodborne tropical pathogen detection using m...,2017-11-30,2015,en,Bloodborne tropical pathogen detection using m...,R21AI120981,National Institute of Allergy and Infectious D...,en,2015-12-28,grant.4729738,"[2015, 2016, 2017]"
1,"[{'id': 'grid.419681.3', 'state_name': 'Maryla...",Field-deployable Assay for Differential Diagno...,2019-02-28,2015,en,Field-deployable Assay for Differential Diagno...,R21AI120973,National Institute of Allergy and Infectious D...,en,2015-12-24,grant.4729736,"[2015, 2016, 2017, 2018, 2019]"
2,"[{'id': 'grid.419681.3', 'state_name': 'Maryla...",T cell driven antigen discovery for vaccine ca...,2018-11-30,2015,en,T cell driven antigen discovery for vaccine ca...,R21AI109439,National Institute of Allergy and Infectious D...,en,2015-12-21,grant.4729699,"[2015, 2016, 2017, 2018]"
3,"[{'id': 'grid.452969.5', 'city_name': 'Hanover...",Senior Fellowship for Dr. Eduardo Samo Gudo: E...,2018-12-18,2015,en,Senior Fellowship for Dr. Eduardo Samo Gudo: E...,91488,Volkswagen Foundation,en,2015-12-18,grant.4854433,"[2015, 2016, 2017, 2018]"
4,"[{'id': 'grid.482914.2', 'state_name': 'Distri...","Biology, Ecology & Management of Emerging Dise...",2019-09-30,2015,en,"Biology, Ecology & Management of Emerging Dise...",,National Institute of Food and Agriculture,en,2015-12-10,grant.8821176,"[2015, 2016, 2017, 2018, 2019]"


Syntax: a **list filter** consists of the `field` name, the keyword `in`, and a list
of one or more `value`s enclosed in square brackets (`[]`), where
values are separated by commas (`,`):

In [25]:
%%dsldf 
search grants 
    for "malaria" 
    where research_org_name in [ "UC Berkeley", "UC Davis", "UCLA"  ] 
return grants limit 5

Returned Grants: 0
Time: 0.46s
Field 'research_org_name' is deprecated in favor of research_orgs. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
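
The two forms of the `in` shorthand can also be generated from Python values. This is a sketch only: the helper names are hypothetical, and the comparison-based equivalent assumes that both bounds of a range are inclusive (the range example above does return grants with `start_year` 2015, which is consistent with an inclusive upper bound).

```python
def range_filter(field, low, high):
    """Render a range filter: `field in [low:high]`."""
    return f"{field} in [{low}:{high}]"


def list_filter(field, values):
    """Render a list filter; string values are double-quoted, integers are not."""
    rendered = ", ".join(
        f'"{value}"' if isinstance(value, str) else str(value) for value in values
    )
    return f"{field} in [{rendered}]"


def range_as_comparisons(field, low, high):
    """Longer form of a range filter, ASSUMING both bounds are inclusive."""
    return f"{field} >= {low} and {field} <= {high}"


print(range_filter("start_year", 2010, 2015))
print(list_filter("type", ["book", "chapter"]))
print(range_as_comparisons("start_year", 2010, 2015))
```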


### 3.3 `count` - filter function

The filter function `count` is supported on some fields in
[publications](publications.ipynb) (e.g. `researchers` and
`research_orgs`).

The use of this filter is shown in the examples below:

In [26]:
%%dsldf 
search publications 
    for "malaria" 
    where count(research_orgs) > 5 
return research_orgs limit 5

Returned Research_orgs: 5
Time: 2.59s


Unnamed: 0,id,count,types,name,latitude,longitude,linkout,city_name,country_name,state_name,acronym
0,grid.4991.5,1571,[Education],University of Oxford,51.753437,-1.25401,[http://www.ox.ac.uk/],Oxford,United Kingdom,Oxfordshire,
1,grid.8991.9,1473,[Education],London School of Hygiene & Tropical Medicine,51.5209,-0.1307,[http://www.lshtm.ac.uk/],London,United Kingdom,Camden,LSHTM
2,grid.38142.3c,1095,[Education],Harvard University,42.377052,-71.11665,[http://www.harvard.edu/],Cambridge,United States,Massachusetts,
3,grid.21107.35,867,[Education],Johns Hopkins University,39.328888,-76.62028,[https://www.jhu.edu/],Baltimore,United States,Maryland,JHU
4,grid.7445.2,803,[Education],Imperial College London,51.4986,-0.175478,[http://www.imperial.ac.uk/],London,United Kingdom,Westminster,


Number of publications with more than 50 researchers.

In [27]:
%%dsldf 
search publications 
    for "malaria" 
    where count(researchers) > 50 
return publications limit 5

Returned Publications: 5 (total = 241)
Time: 0.78s


Unnamed: 0,id,title,volume,issue,pages,type,year,author_affiliations,journal.id,journal.title
0,pub.1130215447,The global distribution of lymphatic filariasi...,8.0,9.0,e1186-e1194,article,2020,"[[{'first_name': 'Aniruddha', 'last_name': 'De...",jour.1048786,The Lancet Global Health
1,pub.1126915860,Health sector spending and spending on HIV/AID...,396.0,10252.0,693-724,article,2020,[[{'first_name': 'Global Burden of Disease Hea...,jour.1077219,The Lancet
2,pub.1130211369,Mapping geographical inequalities in access to...,8.0,9.0,e1162-e1185,article,2020,"[[{'first_name': 'Aniruddha', 'last_name': 'De...",jour.1048786,The Lancet Global Health
3,pub.1129557093,Mapping geographical inequalities in oral rehy...,8.0,8.0,e1038-e1060,article,2020,[[{'first_name': 'Local Burden of Disease Diar...,jour.1048786,The Lancet Global Health
4,pub.1130303438,Use of hydroxychloroquine in hospitalised COVI...,,,,article,2020,"[[{'first_name': 'Augusto', 'last_name': 'Di C...",jour.1100229,European Journal of Internal Medicine


Funders ranked by their number of publications with more than one researcher.

In [28]:
%%dsldf 
search publications
where count(researchers) > 1
return funders limit 5

Returned Funders: 5
Time: 1.89s


Unnamed: 0,id,count,city_name,types,name,country_name,linkout,latitude,acronym,longitude,state_name
0,grid.419696.5,1870091,Beijing,[Government],National Natural Science Foundation of China,China,[http://www.nsfc.gov.cn/publish/portal1/],40.005177,NSFC,116.33983,
1,grid.270680.b,674156,Brussels,[Government],European Commission,Belgium,[http://ec.europa.eu/index_en.htm],50.85165,EC,4.36367,
2,grid.424020.0,597126,Beijing,[Government],Ministry of Science and Technology of the Peop...,China,[http://www.most.gov.cn/eng/],39.827835,MOST,116.316284,
3,grid.48336.3a,568639,Rockville,[Government],National Cancer Institute,United States,[http://www.cancer.gov/],39.004326,NCI,-77.10119,Maryland
4,grid.54432.34,542093,Tokyo,[Nonprofit],Japan Society for the Promotion of Science,Japan,[http://www.jsps.go.jp/],35.68716,JSPS,139.74039,


International collaborations: funders ranked by their number of publications with more than one author and affiliations located in more than one country.

In [29]:
%%dsldf 
search publications
where count(researchers) > 1
and count(research_org_countries) > 1
return funders limit 5

Returned Funders: 5
Time: 1.09s


Unnamed: 0,id,count,types,name,latitude,longitude,linkout,city_name,country_name,acronym
0,grid.419696.5,452873,[Government],National Natural Science Foundation of China,40.005177,116.33983,[http://www.nsfc.gov.cn/publish/portal1/],Beijing,China,NSFC
1,grid.270680.b,344656,[Government],European Commission,50.85165,4.36367,[http://ec.europa.eu/index_en.htm],Brussels,Belgium,EC
2,grid.424150.6,157949,[Facility],German Research Foundation,50.69934,7.147797,[http://www.dfg.de/en/],Bonn,Germany,DFG
3,grid.424020.0,149276,[Government],Ministry of Science and Technology of the Peop...,39.827835,116.316284,[http://www.most.gov.cn/eng/],Beijing,China,MOST
4,grid.54432.34,136016,[Nonprofit],Japan Society for the Promotion of Science,35.68716,139.74039,[http://www.jsps.go.jp/],Tokyo,Japan,JSPS


Domestic collaborations: funders ranked by their number of publications with more than one author and affiliations located in exactly one country.

In [30]:
%%dsldf 
search publications
where count(researchers) > 1
and count(research_org_countries) = 1
return funders limit 5

Returned Funders: 5
Time: 2.58s


Unnamed: 0,id,count,city_name,types,name,country_name,linkout,latitude,acronym,longitude,state_name
0,grid.419696.5,1373160,Beijing,[Government],National Natural Science Foundation of China,China,[http://www.nsfc.gov.cn/publish/portal1/],40.005177,NSFC,116.33983,
1,grid.424020.0,435916,Beijing,[Government],Ministry of Science and Technology of the Peop...,China,[http://www.most.gov.cn/eng/],39.827835,MOST,116.316284,
2,grid.48336.3a,415902,Rockville,[Government],National Cancer Institute,United States,[http://www.cancer.gov/],39.004326,NCI,-77.10119,Maryland
3,grid.54432.34,371463,Tokyo,[Nonprofit],Japan Society for the Promotion of Science,Japan,[http://www.jsps.go.jp/],35.68716,JSPS,139.74039,
4,grid.280785.0,326036,Bethesda,[Facility],National Institute of General Medical Sciences,United States,[http://www.nigms.nih.gov/Pages/default.aspx],38.997833,NIGMS,-77.09938,Maryland
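
The international and domestic collaboration queries above differ only in the test applied to `count(research_org_countries)`. A small sketch makes that explicit; `collaboration_query` is a hypothetical helper that simply reproduces the two query strings used in this section.

```python
def collaboration_query(kind, facet="funders", limit=5):
    """Build the collaboration queries of this section (hypothetical helper)."""
    country_tests = {
        # Affiliations in more than one country.
        "international": "count(research_org_countries) > 1",
        # Affiliations in exactly one country.
        "domestic": "count(research_org_countries) = 1",
    }
    if kind not in country_tests:
        raise ValueError(f"unknown collaboration kind: {kind!r}")
    return (
        "search publications "
        "where count(researchers) > 1 "
        f"and {country_tests[kind]} "
        f"return {facet} limit {limit}"
    )


print(collaboration_query("international"))
print(collaboration_query("domestic"))
```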


### 3.4 Filter Operators

A simple filter expression consists of a `field` name, an in-/equality
operator `op`, and the desired field `value`. 

The `value` must be a
`string` enclosed in double quotes (`"`) or an integer (e.g. `1234`).

The available operators are:

| `op`           | meaning                                                                                  |
|----------------|------------------------------------------------------------------------------------------|
| `=`            | *is* (or *contains* if the given `field` is multi-value)                                 |
| `!=`           | *is not*                                                                                 |
| `>`            | *is greater than*                                                                        |
| `<`            | *is less than*                                                                           |
| `>=`           | *is greater than or equal to*                                                            |
| `<=`           | *is less than or equal to*                                                               |
| `~`            | *partially matches* (see section 3.5 below)                                              |
| `is empty`     | *is empty* (see section 3.6 below)                                                       |
| `is not empty` | *is not empty* (see section 3.6 below)                                                   |

A couple of examples:

In [31]:
%dsldf search datasets where year > 2010 and year < 2012 return datasets limit 5

Returned Datasets: 5 (total = 38764)
Time: 0.53s


Unnamed: 0,keywords,id,authors,year,title
0,"[human populations, single nucleotide polymorp...",105,"[{'name': 'Blaise Li', 'orcid': '0000-0003-308...",2011,India Africa Asia HGDP HapMap frappe K3
1,"[human populations, single nucleotide polymorp...",106,"[{'name': 'Blaise Li', 'orcid': '0000-0003-308...",2011,India Africa Asia HGDP HapMap frappe K4
2,"[human populations, single nucleotide polymorp...",107,"[{'name': 'Blaise Li', 'orcid': '0000-0003-308...",2011,India Africa Asia HGDP HapMap frappe K5
3,"[human populations, single nucleotide polymorp...",108,"[{'name': 'Blaise Li', 'orcid': '0000-0003-308...",2011,India Africa Asia HGDP HapMap frappe K6
4,"[human populations, single nucleotide polymorp...",109,"[{'name': 'Blaise Li', 'orcid': '0000-0003-308...",2011,India Africa Asia HGDP HapMap frappe K7


In [32]:
%dsldf search patents where assignees != "grid.410484.d" return patents limit 5

Returned Patents: 5 (total = 40195054)
Time: 0.66s


Unnamed: 0,id,times_cited,title,assignees,granted_year,assignee_names,year,publication_date,inventor_names,filing_status
0,EP-1409282-B1,0,METHODS FOR OPERATING A MOTOR VEHICLE DRIVEN B...,"[{'id': 'grid.6584.f', 'name': 'Robert Bosch (...",2009,"[Robert Bosch GmbH, BOSCH GMBH ROBERT]",2001,2009-12-09,"[TUMBACK, STEFAN, SCHNELLE, KLAUS-PETER]",Grant
1,EP-0868664-B1,0,MULTI-CYCLE LOOP INJECTION FOR TRACE ANALYSIS ...,"[{'id': 'grid.418190.5', 'name': 'Thermo Fishe...",2009,"[Dionex Corp, DIONEX CORP]",1996,2009-12-09,"[RIVIELLO, JOHN, M., REY, MARIA, A.]",Grant
2,EP-0861808-B1,1,Waste water treatment apparatus,"[{'id': 'grid.471210.1', 'name': 'Kuraray (Jap...",2009,"[Kuraray Co Ltd, KURARAY CO]",1998,2009-12-09,"[TANAKA, EIJI, HIGASHI, TAMIO, KITAMURA, TAKAN...",Grant
3,EP-0805365-B1,0,Optical waveguide grating and production metho...,"[{'id': 'grid.471143.4', 'name': 'Fujikura (Ja...",2009,"[Fujikura Ltd, FUJIKURA LTD]",1997,2009-12-09,"[NAKAI, MICHIHIRO, SHIMA, KENSUKE, HIDAKA, HIR...",Grant
4,EP-1970973-B1,0,Method for thermal matching of a thermoelectri...,"[{'id': 'grid.426571.3', 'name': 'Imec the Net...",2009,[INTERUNIVERSITAIR MICROELEKTRONICA CENTRUM NE...,2007,2009-12-09,"[LEONOV, VLADIMIR]",Grant
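
The rules for simple filters (a field, an operator from the table, and a value that is either a double-quoted string or an integer) can be captured in a short validation sketch. The function name and error messages are hypothetical; only the operator list and the value rules come from the text above.

```python
BINARY_OPS = {"=", "!=", ">", "<", ">=", "<=", "~"}
UNARY_OPS = {"is empty", "is not empty"}


def simple_filter(field, op, value=None):
    """Render one filter expression (hypothetical helper, not a Dimcli API)."""
    if op in UNARY_OPS:
        # Emptiness filters take no value.
        if value is not None:
            raise ValueError(f"'{op}' does not take a value")
        return f"{field} {op}"
    if op not in BINARY_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    # Only strings and integers are accepted (bool is excluded explicitly).
    if isinstance(value, bool) or not isinstance(value, (str, int)):
        raise TypeError("value must be a string or an integer")
    if op == "~" and not isinstance(value, str):
        raise TypeError("the '~' operator requires a string value")
    rendered = f'"{value}"' if isinstance(value, str) else str(value)
    return f"{field} {op} {rendered}"


print(simple_filter("year", ">", 2010))
print(simple_filter("assignees", "!=", "grid.410484.d"))
print(simple_filter("issn", "is empty"))
```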


### 3.5 Partial string matching with `~`

The `~` operator indicates that the given `field` need only partially,
instead of exactly, match the given `string` (the `value` used with this
operator must be a `string`, not an integer).

For example, the filter `where research_orgs.name~"Saarland Uni"` would
match both the organization named "Saarland University" and the one
named "Universitätsklinikum des Saarlandes", and any other organization
whose name includes the terms "Saarland" and "Uni" (the order is
unimportant). 

In [33]:
%%dsldf 
search patents 
    where assignee_names ~ "IBM" 
return assignees limit 5

Returned Assignees: 5
Time: 2.04s


Unnamed: 0,id,count,name,city_name,country_name
0,grid.410484.d,336471,IBM (United States),Armonk,United States
1,grid.471366.1,22104,GlobalFoundries (Cayman Islands),George Town,Cayman Islands
2,grid.14648.3f,5139,IBM (United Kingdom),Winchester,United Kingdom
3,grid.420451.6,3555,Google,Mountain View,United States
4,grid.472772.3,2716,Lenovo (China),Beijing,China
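
To build intuition for the behaviour described above (every term must appear, in any order), here is a purely local approximation in Python. It is an assumption-laden illustration: it treats each term as a case-insensitive substring, which reproduces the "Saarland Uni" example but is not the actual matching algorithm used by the DSL.

```python
def partially_matches(name, query):
    """Local approximation of `~`: every query term occurs somewhere in the name.

    This is NOT the DSL implementation, only a sketch of the described behaviour.
    """
    lowered = name.lower()
    return all(term.lower() in lowered for term in query.split())


for org in (
    "Saarland University",
    "Universitätsklinikum des Saarlandes",
    "University of Oxford",
):
    print(org, partially_matches(org, "Saarland Uni"))
```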


### 3.6 Emptiness filters `is empty`

To select records in which a specific field has a value, or records in
which that field is empty, use filters such as
`where research_orgs is not empty` or `where issn is empty`.

In [34]:
%%dsldf
search publications 
    for "iron graphene" 
    where researchers is empty 
    and research_orgs is not empty 
return publications[id+title+researchers+research_orgs+type] limit 5

Returned Publications: 5 (total = 1883)
Time: 1.71s


Unnamed: 0,id,research_orgs,type,title
0,pub.1129668998,"[{'id': 'grid.440673.2', 'name': 'Changzhou Un...",article,"Removal of Toxic Heavy Metal Ions (Pb, Cr, Cu,..."
1,pub.1129771684,"[{'id': 'grid.412246.7', 'name': 'Northeast Fo...",article,Nanofluid-based pulsating heat pipe for therma...
2,pub.1129041696,"[{'id': 'grid.79703.3a', 'name': 'South China ...",article,Fabrication of the novel Ag-doped SnS2@InVO4 c...
3,pub.1130477930,"[{'id': 'grid.452276.0', 'name': 'Institute of...",article,Atomically-precise dopant-controlled single cl...
4,pub.1130537929,"[{'id': 'grid.79703.3a', 'name': 'South China ...",article,Crafting visible-light-absorbing dye-doped pha...


## 4. Searching for Researchers

The DSL offers different mechanisms for searching for researchers (e.g.
publication authors, grant investigators), each of them presenting
specific advantages.

### 4.1 Exact name searches

Special full-text indices allow looking up a researcher's first name and
surname **exactly as they appear in the source documents** they derive from.

This approach has a broad scope, as it searches the full
collection of Dimensions documents irrespective of whether a
researcher was successfully disambiguated (and hence given a Dimensions
ID). On the other hand, this approach will only match names as they
appear in the source document, so different spellings or initials are
not necessarily returned via a single query. 

```
search in [authors|investigators|inventors]
```

It is possible to look up publication authors using a specific
`search index` called `authors`. 

This method expects case-insensitive
phrases in the format `"<first name> <last name>"`, or in the reverse order. Note
that nested quotes inside a double-quoted string must always be
escaped with a backslash (`\`).

In [35]:
%dsldf search publications in authors for "\"Charles Peirce\"" return publications limit 5

Returned Publications: 5 (total = 144)
Time: 0.62s


Unnamed: 0,title,pages,author_affiliations,year,id,type
0,5. On Logical Graphs,211-261,"[[{'first_name': 'Charles S.', 'last_name': 'P...",2019,pub.1123488521,chapter
1,12. Peripatetic Talks,348-366,"[[{'first_name': 'Charles S.', 'last_name': 'P...",2019,pub.1123488528,chapter
2,Bibliography of Peirce’s References,642-651,"[[{'first_name': 'Charles S.', 'last_name': 'P...",2019,pub.1123488545,chapter
3,14. On the First Principles of Logical Algebra,385-398,"[[{'first_name': 'Charles S.', 'last_name': 'P...",2019,pub.1123488530,chapter
4,26. Assurance through Reasoning,565-585,"[[{'first_name': 'Charles S.', 'last_name': 'P...",2019,pub.1123488542,chapter


Initials can be used instead of the first name. These are examples of
valid researcher search phrases:

-   `\"Peirce, Charles S.\"`
-   `\"Charles S. Peirce\"`
-   `\"CS Peirce\"`
-   `\"Peirce CS\"`
-   `\"C S Peirce\"`
-   `\"Peirce C S\"`
-   `\"C Peirce\"`
-   `\"Peirce C\"`
-   `\"Charles Peirce\"`
-   `\"Peirce Charles\"`

**Warning**: In order to produce valid results, an author or investigator search
query must contain **at least two components** (e.g., name and
surname, either in full or initials).
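
The escaping rule and the two-component requirement above can both be handled when a query is generated from a plain name. The sketch below assumes a hypothetical helper, `exact_name_query`; it only builds the query string and rejects names with fewer than two components.

```python
def exact_name_query(source, index, name, limit=5):
    """Build an exact-name query with escaped nested quotes (hypothetical helper)."""
    if '"' in name or "\\" in name:
        raise ValueError("name must not contain quotes or backslashes")
    # Commas (as in "Peirce, Charles S.") do not count as components.
    components = name.replace(",", " ").split()
    if len(components) < 2:
        raise ValueError("an exact name search needs at least two components")
    # The phrase is wrapped in escaped quotes inside the outer double quotes.
    phrase = f'"\\"{name}\\""'
    return f"search {source} in {index} for {phrase} return {source} limit {limit}"


query = exact_name_query("publications", "authors", "Charles Peirce")
print(query)
```

Printing the result shows the same text as the `In [35]` cell above, with the inner quotes preceded by backslashes.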

Investigator search is similar to *authors* search, except that it searches `grants` and
`clinical trials` using a separate search index, `investigators`, and
`patents` using the index `inventors`.

In [36]:
%%dsldf 
search clinical_trials in investigators for "\"John Smith\"" 
return clinical_trials limit 5

Returned Clinical_trials: 3 (total = 3)
Time: 0.72s


Unnamed: 0,id,active_years,title,investigator_details
0,NCT00689533,"[2008, 2009, 2010, 2011, 2012, 2013, 2014, 201...",VEPTR Implantation to Treat Children With Earl...,"[[John M Flynn, MD, Principal Investigator, Ch..."
1,NCT01241149,,Prospective Evaluation of Symptom Resolution i...,"[[Ellie Mentler, MD, Principal Investigator, U..."
2,NCT04072380,"[2019, 2020]","A Phase 2, Double-blind, Placebo-controlled, P...","[[Rohith G. Patel, MD, Principal Investigator,..."


In [37]:
%%dsldf 
search grants in investigators for "\"Satoko Shimazaki\"" 
return grants limit 5

Returned Grants: 4 (total = 4)
Time: 0.54s


Unnamed: 0,funders,title,end_date,start_year,title_language,original_title,project_num,funding_org_name,language,start_date,id,active_year
0,"[{'id': 'grid.422239.c', 'state_name': 'Distri...","Kabuki Actors, Print Technology, and the Theat...",2022-08-31,2021,en,"Kabuki Actors, Print Technology, and the Theat...",FEL-263245-19,National Endowment for the Humanities,en,2021-09-01,grant.7925589,"[2021, 2022]"
1,"[{'id': 'grid.54432.34', 'city_name': 'Tokyo',...",Genealogy research on female saints in the Pal...,2021-03-31,2018,ja,古・中英語期における女性聖人伝の系譜研究：Aelfricのテクストと言語を中心に,18K00431,Japan Society for the Promotion of Science,ja,2018-04-01,grant.7527261,"[2018, 2019, 2020, 2021]"
2,"[{'id': 'grid.54432.34', 'city_name': 'Tokyo',...",Images of Women in the Old English Lives of Sa...,2018-03-31,2015,en,Images of Women in the Old English Lives of Sa...,15K02313,Japan Society for the Promotion of Science,en,2015-04-01,grant.5858713,"[2015, 2016, 2017, 2018]"
3,"[{'id': 'grid.54432.34', 'city_name': 'Tokyo',...",Reception and Transfromation of the Images of ...,2015-03-31,2012,en,Reception and Transfromation of the Images of ...,24520310,Japan Society for the Promotion of Science,en,2012-04-01,grant.6086985,"[2012, 2013, 2014, 2015]"


In [38]:
%%dsldf 
search patents in inventors for "\"John Smith\"" 
return patents limit 5

Returned Patents: 5 (total = 502)
Time: 0.75s


Unnamed: 0,title,publication_date,granted_year,assignee_names,year,inventor_names,times_cited,filing_status,id,assignees
0,A lockable safety insert for an electrical dom...,2004-11-03,2004.0,[SMITH JOHN],2003,[SMITH JOHN],0.0,Grant,IE-S20030195-A2,
1,Automotive heat exchanger,2006-03-22,2006.0,"[Llanelli Radiators Ltd, Calsonic Kansei UK Lt...",2002,[SMITH JOHN],0.0,Grant,GB-2384299-B,"[{'id': 'grid.472810.8', 'city_name': 'Llanell..."
2,Extractor,2007-10-25,,[SMITH JOHN A],2007,[John Smith],6.0,Application,US-20070245563-A1,
3,Boom utilized in a geometric end effector system,2018-02-06,2018.0,"[DESTACO Europe GmbH, CAPITAL FORMATION INC, D...",2014,[John Smith],,Grant,US-9884426-B2,"[{'id': 'grid.472738.d', 'city_name': 'Teltow'..."
4,Ammunition cartridge,2014-10-22,,"[Eley Ltd, ELEY LTD]",2013,[SMITH JOHN],0.0,Application,GB-2513101-A,
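
Since each source uses a different name index in the examples above, a lookup table helps avoid pairing a source with the wrong index. This is a sketch based only on the four sources shown in this section; `NAME_INDEX` and `name_index_for` are hypothetical names.

```python
# Source -> exact-name search index, as used in the examples of this section.
NAME_INDEX = {
    "publications": "authors",
    "grants": "investigators",
    "clinical_trials": "investigators",
    "patents": "inventors",
}


def name_index_for(source):
    """Return the name index for a source, or fail for sources not covered here."""
    try:
        return NAME_INDEX[source]
    except KeyError:
        raise ValueError(f"no name index known for source {source!r}") from None


for source in NAME_INDEX:
    print(f"search {source} in {name_index_for(source)} ...")
```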


### 4.2 Fuzzy Searches

This type of search is similar to *full-text
search*, with the difference that it
allows searching by only a part of a name, e.g. only the 'last name' of
a person, by using the `where` clause. 

**Note** At this moment, this type of search is only available for
`publications`. Other sources will add this option in the future.

For example:

In [39]:
%%dsldf 
search publications where authors = "Hawking" 
return publications[id+doi+title+authors] limit 5


Generally speaking, using a `where` clause to search authors is less
precise than using the relevant exact-search syntax. 

On the other hand, using a
`where` clause can be handy if one wants to **combine an author search
with another full-text search index**.

For example:

In [40]:
%%dsldf 
search publications 
    in title_abstract_only for "dna replication" 
    where authors = "smith"  
return publications limit 5

Returned Publications: 5 (total = 1544)
Time: 1.14s


Unnamed: 0,title,pages,author_affiliations,year,issue,id,type,volume,journal.id,journal.title
0,Identifying epigenetic biomarkers of establish...,95,"[[{'first_name': 'Ryan', 'last_name': 'Langdon...",2020,1,pub.1128835470,article,12,jour.1042271,Clinical Epigenetics
1,Genetic associations with clozapine-induced my...,37,"[[{'first_name': 'Paul', 'last_name': 'Lacaze'...",2020,1,pub.1124910780,article,10,jour.1045271,Translational Psychiatry
2,Genomic analyses of early responses to radiati...,8979,"[[{'first_name': 'Saket', 'last_name': 'Choudh...",2020,1,pub.1128124846,article,10,jour.1045337,Scientific Reports
3,An epigenome-wide association study of posttra...,46,"[[{'first_name': 'Mark W.', 'last_name': 'Logu...",2020,1,pub.1125664041,article,12,jour.1042271,Clinical Epigenetics
4,Longitudinal epigenome-wide association studie...,11,"[[{'first_name': 'Clara', 'last_name': 'Snijde...",2020,1,pub.1124060243,article,12,jour.1042271,Clinical Epigenetics


### 4.3 Using the disambiguated Researchers database

The Dimensions [Researchers](https://docs.dimensions.ai/dsl/datasource-researchers.html) source is a database of
researcher information algorithmically extracted and disambiguated from
all of the other content sources (publications, grants, clinical trials,
etc.).

By using the `researchers` source it is possible to match an
'aggregated' person object linking together multiple publication
authors, grant investigators, etc., irrespective of the form their
names take in the original source documents.

However, this database does not contain all the author and investigator information
available in Dimensions. 

Think, for example, of authors of older publications,
authors with very common names that are difficult to disambiguate, or
very new authors who have only one or a few publications. In such cases,
using the full-text authors search might be more
appropriate.

Examples:

In [41]:
%%dsldf 
search researchers for "\"Satoko Shimazaki\"" 
return researchers[basics+obsolete] 

Returned Researchers: 4 (total = 4)
Time: 1.24s


Unnamed: 0,id,first_name,last_name,obsolete,research_orgs
0,ur.07751146721.59,Satoko,Shimazaki,0,
1,ur.010537333602.30,Satoko,Shimazaki,1,
2,ur.014307627665.09,Satoko,Shimazaki,0,"[{'id': 'grid.19006.3e', 'types': ['Education'..."
3,ur.015527473602.63,Satoko,Shimazaki,0,"[{'id': 'grid.266190.a', 'types': ['Education'..."


NOTE: pay attention to the `obsolete` field, which indicates the status of the researcher ID. 0 means that the researcher ID is still **active**; 1 means that the researcher ID is **no longer valid**. This is due to the ongoing process of refinement of Dimensions researchers. 

Hence the query above is best written like this:

In [42]:
%%dsldf 
search researchers where obsolete=0 for "\"Satoko Shimazaki\"" 
return researchers[basics+obsolete] 

Returned Researchers: 3 (total = 3)
Time: 1.21s


Unnamed: 0,last_name,first_name,id,obsolete,research_orgs
0,Shimazaki,Satoko,ur.07751146721.59,0,
1,Shimazaki,Satoko,ur.014307627665.09,0,"[{'id': 'grid.19006.3e', 'name': 'University o..."
2,Shimazaki,Satoko,ur.015527473602.63,0,"[{'id': 'grid.266190.a', 'name': 'University o..."
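
When researcher records have already been retrieved without the `obsolete=0` filter, the same selection can be made on the client side. The sketch below uses the four IDs and `obsolete` values from the first output above, reduced to plain dictionaries; `active_only` is a hypothetical helper.

```python
# IDs and `obsolete` flags copied from the unfiltered query output above.
records = [
    {"id": "ur.07751146721.59", "obsolete": 0},
    {"id": "ur.010537333602.30", "obsolete": 1},
    {"id": "ur.014307627665.09", "obsolete": 0},
    {"id": "ur.015527473602.63", "obsolete": 0},
]


def active_only(rows):
    """Keep only records whose researcher ID is still active (obsolete == 0)."""
    return [row for row in rows if row.get("obsolete") == 0]


active_ids = [row["id"] for row in active_only(records)]
print(active_ids)
```

Filtering in the query itself is still preferable where possible, since it avoids transferring records that will be discarded.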


With `Researchers`, one can use other fields as well:

In [43]:
%%dsldf 
search researchers 
    where obsolete=0 and last_name="Shimazaki" 
return researchers[basics] limit 5

Returned Researchers: 5 (total = 454)
Time: 0.72s


Unnamed: 0,last_name,first_name,id,research_orgs
0,Shimazaki,Tatsuo,ur.013510032403.65,"[{'id': 'grid.419075.e', 'name': 'Ames Researc..."
1,Shimazaki,Tomomi,ur.010700310627.87,"[{'id': 'grid.471199.3', 'name': 'Murata (Japa..."
2,Shimazaki,Dai,ur.011035131473.19,"[{'id': 'grid.415776.6', 'name': 'National Ins..."
3,Shimazaki,Koji,ur.016627632300.80,
4,Shimazaki,Toshiyuki,ur.013205240215.48,"[{'id': 'grid.420062.2', 'name': 'Nissan Chemi..."


## 5. Returning results

After the `search` phrase, a query must contain one or more `return`
phrases, specifying the content and format of the information that
should be returned.



### 5.1 Returning Multiple Sources

A single `return` phrase cannot return multiple results. To return several results from the same search, add one `return` phrase per result:

In [44]:
%%dsldf 
search publications 
return funders limit 5 
return research_orgs limit 5 
return year

Returned Year: 20
Returned Research_orgs: 5
Returned Funders: 5
Time: 4.38s


Unnamed: 0,id,count
0,2019,5573486
1,2018,5172592
2,2017,4817375
3,2016,4426951
4,2015,4244304
5,2020,4166850
6,2014,4101478
7,2013,3909910
8,2012,3646455
9,2011,3527334
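
A query with several `return` phrases can be generated from a list of result names and optional limits. This is a minimal sketch with a hypothetical helper, `multi_return_query`; it reproduces the query used in the cell above.

```python
def multi_return_query(source, returns):
    """Build a query with one `return` phrase per (name, limit) pair.

    `limit` may be None, in which case no `limit` keyword is emitted.
    """
    if not returns:
        raise ValueError("at least one return phrase is required")
    phrases = [f"search {source}"]
    for name, limit in returns:
        phrase = f"return {name}"
        if limit is not None:
            phrase += f" limit {limit}"
        phrases.append(phrase)
    return " ".join(phrases)


query = multi_return_query(
    "publications",
    [("funders", 5), ("research_orgs", 5), ("year", None)],
)
print(query)
```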



### 5.2 Returning Specific Fields

For control over which information from each given `record` will be
returned, a `source` or `entity` name in the `results` phrase can be
optionally followed by a specification of `fields` and `fieldsets` to be
included in the JSON results for each retrieved record.

The fields specification may be an arbitrary list of `field` names
enclosed in brackets (`[`, `]`), with field names separated by a plus
sign (`+`). A minus sign (`-`) can be used to exclude a `field` or a
`fieldset` from the result. Field names thus listed within brackets must
be "known" to the DSL, and therefore only a subset of fields may be used
in this syntax (see note below).

In [45]:
%%dsldf 
search grants 
return grants[grant_number + title + language] limit 5

Returned Grants: 5 (total = 5514056)
Time: 0.46s


Unnamed: 0,grant_number,title,language
0,RAAK.PRO02.048,Sensing alarm responses of ungulate herds to p...,en
1,890218,Functional analysis of ribosome heterogeneity ...,en
2,2018-HRSI-1548,APPROACH to Enriching the Real World Evidence ...,en
3,894029,Knowledge Transfer in Global Gender Programmes...,en
4,1301720F,Molecular mechanism of DNA double strand break...,en


In [46]:
%%dsldf 
search clinical_trials 
return clinical_trials [id+ title + acronym + phase] limit 5

Returned Clinical_trials: 5 (total = 582398)
Time: 0.50s


Unnamed: 0,phase,id,title,acronym
0,,NCT02318264,Influence of Elastic Tape on Activation of the...,
1,,NCT02318290,Opioids Withdrawal Syndrome in Critically Ill ...,WAAICUP
2,Phase 2,NCT02318303,"A Double-blind, Randomized, Parallel-group, Co...",
3,,NCT02318316,"""Exhaled Breath Condensate"" in Allogeneic Stem...",
4,Phase 1,NCT02318329,"A Phase 1 Open-Label, Dose-Finding Study Evalu...",
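
The bracketed fields specification described in this section (names joined with `+`, exclusions introduced with `-`) can be produced from Python lists. The sketch below is illustrative only: `field_spec` is a hypothetical helper, and it does not check whether the names are actually known to the DSL for the given source.

```python
def field_spec(source, include, exclude=()):
    """Render `source[a+b-c]` from lists of included and excluded names."""
    if not include:
        raise ValueError("at least one field or fieldset must be included")
    spec = "+".join(include)
    for name in exclude:
        # Excluded fields or fieldsets are appended with a minus sign.
        spec += f"-{name}"
    return f"{source}[{spec}]"


print(field_spec("grants", ["grant_number", "title", "language"]))
print(field_spec("publications", ["basics"], exclude=["title"]))
```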


**Shortcuts: `fieldsets`**

The fields specification may be the name of a pre-defined `fieldset`
(e.g. `extras`, `basics`). These are shortcuts that can be handy when testing out new queries, for example. 

NOTE: In general, when writing code used in integrations or long-standing extraction scripts, it is **best to return specific fields rather than a predefined set**. This also has the advantage of making queries faster by avoiding the extraction of unnecessary data.
    

In [47]:
%%dsldf 
search grants 
return grants [basics] limit 5 

Returned Grants: 5 (total = 5514056)
Time: 0.57s
Field 'project_num' is deprecated in favor of grant_number. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'title_language' is deprecated in favor of language_title. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,language,start_date,project_num,title_language,funding_org_name,id,original_title,start_year,active_year,title,funders,end_date
0,en,2021-12-27,RAAK.PRO02.048,en,Dutch Research Council,grant.6946936,Sensing alarm responses of ungulate herds to p...,2021,[2021],Sensing alarm responses of ungulate herds to p...,"[{'id': 'grid.420488.2', 'name': 'Dutch Resear...",
1,en,2021-12-01,890218,en,European Commission,grant.9064785,Functional analysis of ribosome heterogeneity ...,2021,"[2021, 2022, 2023]",Functional analysis of ribosome heterogeneity ...,"[{'id': 'grid.270680.b', 'name': 'European Com...",2023-11-30
2,en,2021-11-30,2018-HRSI-1548,en,New Brunswick Health Research Foundation,grant.8690978,APPROACH to Enriching the Real World Evidence ...,2021,[2021],APPROACH to Enriching the Real World Evidence ...,"[{'id': 'grid.484521.e', 'name': 'New Brunswic...",
3,en,2021-10-01,894029,en,European Commission,grant.9064813,Knowledge Transfer in Global Gender Programmes...,2021,"[2021, 2022, 2023, 2024]",Knowledge Transfer in Global Gender Programmes...,"[{'id': 'grid.270680.b', 'name': 'European Com...",2024-09-30
4,en,2021-10-01,1301720F,en,Fund for Scientific Research,grant.8950252,Mécanismes moléculaires de la formation et la ...,2021,[2021],Molecular mechanism of DNA double strand break...,"[{'id': 'grid.424470.1', 'name': 'Fund for Sci...",


In [48]:
%%dsldf 
search publications 
return publications [basics+times_cited] limit 5 

Returned Publications: 5 (total = 112275335)
Time: 1.20s
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,year,id,pages,author_affiliations,volume,times_cited,issue,type,title,journal.id,journal.title
0,2020,pub.1130041027,1793599,"[[{'first_name': 'Thanos', 'last_name': 'Karat...",11.0,0,1.0,article,Adverse and benevolent childhood experiences i...,jour.1045059,European Journal of Psychotraumatology
1,2020,pub.1129454261,191-202,"[[{'first_name': 'Rafael', 'last_name': 'Valdi...",,0,,chapter,FACTORES PSICOSOCIALES ASOCIADOS A MENORES CON...,,
2,2020,pub.1125632078,333-349,,,0,,chapter,Literature,,
3,2020,pub.1124099280,1704540,"[[{'first_name': 'Mahendra M', 'last_name': 'R...",13.0,0,1.0,article,To start or to complete? – Challenges in imple...,jour.1041075,Global Health Action
4,2020,pub.1124649186,1717411,"[[{'first_name': 'Benjamin-Samuel', 'last_name...",13.0,1,1.0,article,Long-term trends in seasonality of mortality i...,jour.1041075,Global Health Action


The fields specification may also be the keyword `all`, to indicate that all fields
available for the given `source` should be returned.

In [49]:
%%dsldf
search publications 
return publications [all] limit 5 

Returned Publications: 5 (total = 112275334)
Time: 1.27s
Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'RCDC' is deprecated in favor of category_rcdc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_RAC' is deprecated in favor of category_hrcs_rac. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'category_ua' is deprecated in favor of category_uoa. Please refer to https://docs.dimensions.ai

Unnamed: 0,open_access_categories,pages,publisher,altmetric_id,type,title,year,recent_citations,doi,times_cited,...,authors,references,HRCS_RAC,volume,open_access,concepts_scores,journal.id,journal.title,research_org_state_names,research_org_state_codes
0,"[{'id': 'closed', 'description': 'No freely av...",333-349,De Gruyter,0,chapter,Literature,2020,0,10.1515/9783110823547-013,0,...,,,,,,,,,,
1,"[{'id': 'oa_all', 'description': 'Article is f...",1704540,Taylor & Francis,74041725,article,To start or to complete? – Challenges in imple...,2020,0,10.1080/16549716.2019.1704540,0,...,"[{'first_name': 'Mahendra M', 'last_name': 'Re...","[pub.1084776885, pub.1026226848, pub.100783600...","[{'id': '10801', 'name': '8.1 Organisation and...",13.0,"[Open Access - all, Open Access - publisher, O...","[{'concept': 'isoniazid preventive therapy', '...",jour.1041075,Global Health Action,,
2,"[{'id': 'oa_all', 'description': 'Article is f...",1717411,Taylor & Francis,75135566,article,Long-term trends in seasonality of mortality i...,2020,1,10.1080/16549716.2020.1717411,1,...,"[{'first_name': 'Benjamin-Samuel', 'last_name'...","[pub.1070577469, pub.1035360137, pub.111994906...",,13.0,"[Open Access - all, Open Access - publisher, O...","[{'concept': 'cause-specific mortality', 'rele...",jour.1041075,Global Health Action,[New Jersey],"[{'id': 'US-NJ', 'name': 'New Jersey'}]"
3,"[{'id': 'closed', 'description': 'No freely av...",167-190,De Gruyter,0,chapter,"Eine Warnung an alle, dy sych etwaz duncken: D...",2020,0,10.1515/9783110950762-012,0,...,"[{'first_name': 'Ulla', 'last_name': 'Williams...",,,,,,,,,
4,"[{'id': 'closed', 'description': 'No freely av...",241-276,De Gruyter,0,chapter,Marienklagen und Pietà,2020,0,10.1515/9783110922035-011,0,...,"[{'first_name': 'Georg', 'last_name': 'Satzing...",,,,,,,,,


### 5.3 Returning Facets

In addition to returning source records matching a query, it is possible
to *facet* on the [entity](data-entities.ipynb) fields related to a
particular source and return only those entity values as an aggregated
view of the related source data. This operation is similar to a
*group by* or *pivot table*.

**Warning** Faceting can return at most 1000 results. This is to ensure
adequate performance with all queries. Furthermore, although the `limit`
operator is allowed, the `skip` operator cannot be used.
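
To make the group-by analogy concrete, the following sketch emulates in plain Python what a facet query computes conceptually: it counts how many source records mention each entity and keeps the most frequent ones. The records, the `grid.A`-style identifiers and the `facet_counts` helper are all made up for illustration; the real computation happens server-side in the API.

```python
from collections import Counter

# Toy publication records (made-up data, not real API output)
publications = [
    {"id": "pub.1", "research_orgs": ["grid.A", "grid.B"]},
    {"id": "pub.2", "research_orgs": ["grid.A"]},
    {"id": "pub.3", "research_orgs": ["grid.B", "grid.C"]},
    {"id": "pub.4", "research_orgs": ["grid.A"]},
]

def facet_counts(records, field, limit=20):
    """Count records per entity value, most frequent first (hypothetical helper)."""
    counts = Counter()
    for record in records:
        # set() ensures a record is counted at most once per entity value
        for value in set(record.get(field, [])):
            counts[value] += 1
    return counts.most_common(limit)

top_orgs = facet_counts(publications, "research_orgs", limit=2)
print(top_orgs)
```

Each returned pair plays the role of one facet row: the entity identifier plus its `count`.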

In [50]:
%%dsldf 
search publications 
    for "coronavirus" 
return research_orgs limit 5

Returned Research_orgs: 5
Time: 0.53s


Unnamed: 0,id,count,name,latitude,state_name,types,linkout,country_name,longitude,city_name,acronym
0,grid.38142.3c,1394,Harvard University,42.377052,Massachusetts,[Education],[http://www.harvard.edu/],United States,-71.11665,Cambridge,
1,grid.21107.35,1288,Johns Hopkins University,39.328888,Maryland,[Education],[https://www.jhu.edu/],United States,-76.62028,Baltimore,JHU
2,grid.17063.33,1199,University of Toronto,43.661667,Ontario,[Education],[http://www.utoronto.ca/],Canada,-79.395,Toronto,
3,grid.4991.5,1183,University of Oxford,51.753437,Oxfordshire,[Education],[http://www.ox.ac.uk/],United Kingdom,-1.25401,Oxford,
4,grid.194645.b,1176,University of Hong Kong,22.283287,Hong Kong,[Education],[http://www.hku.hk/],China,114.13708,Hong Kong,HKU


In [51]:
%%dsldf 
search publications 
    for "coronavirus" 
return research_org_countries limit 5
return year limit 5
return category_for limit 5

Returned Research_org_countries: 5
Returned Year: 5
Returned Category_for: 5
Time: 0.60s


Unnamed: 0,id,count,name
0,US,44418,United States
1,CN,19128,China
2,GB,14325,United Kingdom
3,DE,8371,Germany
4,IT,8316,Italy


For control over the organization and headers of the JSON query results,
the `return` keyword in a return phrase may be followed by the keyword
`in` and then a `group` name for this group of results, where the group
name is enclosed in double quotes (`"`).

Also, one can define `aliases` that replace the default JSON field names with names provided by the user. 

See the [official documentation](https://docs.dimensions.ai/dsl/language.html#aliases) for more details about this feature. 

In [70]:
%%dsl
search publications 
return in "facets" funders 
return in "facets" research_orgs

Returned Facets: 2
Time: 2.77s


<dimcli.DslDataset object #4663838032. Records: 2/112275334>

### 5.4 What the query statistics refer to - sources VS facets

When performing a DSL search, a `_stats` object is returned, which contains useful information such as the total number of records available for the search. 

In [53]:
%%dsldf 
search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
return publications limit 5

Returned Publications: 5 (total = 3727)
Time: 0.55s


Unnamed: 0,type,volume,pages,author_affiliations,id,year,issue,title,journal.id,journal.title
0,article,3.0,18124-18131,"[[{'first_name': 'Siewteng', 'last_name': 'Sim...",pub.1110885950,2018,12.0,Development of Organo-Dispersible Graphene Oxi...,jour.1157000,ACS Omega
1,proceeding,,,"[[{'first_name': 'T.', 'last_name': 'Miyagi', ...",pub.1110925389,2018,,Nuclear Ab Initio Calculations with the Unitar...,,
2,article,122.0,29200-29209,"[[{'first_name': 'Taro', 'last_name': 'Toyoda'...",pub.1110369527,2018,51.0,"Anisotropic Crystal Growth, Optical Absorption...",jour.1038386,The Journal of Physical Chemistry C
3,article,122.0,28491-28496,"[[{'first_name': 'Liang', 'last_name': 'Wang',...",pub.1110271601,2018,50.0,Indium Zinc Oxide Electron Transport Layer for...,jour.1038386,The Journal of Physical Chemistry C
4,article,10.0,43682-43690,"[[{'first_name': 'Ami', 'last_name': 'Nomura',...",pub.1110222625,2018,50.0,Chalcopyrite ZnSnSb2: A Promising Thermoelectr...,jour.1041450,ACS Applied Materials & Interfaces




It is important to note, though, that the **total number always refers to the main source, never to the facets** being returned. 

For example, in this query we return `researchers` linked to publications: 

In [54]:
%%dsldf 
search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
return researchers limit 5

Returned Researchers: 5
Time: 0.86s


Unnamed: 0,id,count,last_name,first_name,research_orgs,orcid_id
0,ur.01055753603.27,140,Hayase,Shuzi Shuzi,"[grid.419082.6, grid.482504.f, grid.14003.36, ...",
1,ur.011212042763.67,102,Hikita,Masayuki,"[grid.27476.30, grid.462727.2, grid.258806.1]",
2,ur.01144540527.52,100,Ma,Ting-Li,"[grid.177174.3, grid.30055.33, grid.11135.37, ...",[0000-0002-3310-459X]
3,ur.07644453127.11,96,Kozako,M Kozako M,"[grid.462727.2, grid.471634.3, grid.258806.1, ...",
4,ur.016357156077.09,86,Lu,Huimin,"[grid.454850.8, grid.41156.37, grid.258806.1, ...",[0000-0001-9794-3221]


NOTE: facet results are capped at 1000 (for performance reasons), so if more than 1000 facet values exist, it is not possible to know their total number. 
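
As a rough illustration of the difference, the sketch below inspects two hand-written dictionaries that mimic the JSON returned for a source query and for a facet query. The `_stats` / `total_count` layout is an assumption made for illustration, and the rows are abbreviated copies of the examples above; it is worth checking the actual payload of your own queries, for instance via the `json` attribute of a Dimcli result. The `summarize` helper is hypothetical.

```python
# Hand-written payloads mimicking the API's JSON layout (assumed shape)
source_result = {
    "_stats": {"total_count": 3727},
    "publications": [{"id": "pub.1110885950"}, {"id": "pub.1110925389"}],
}
facet_result = {
    "_stats": {"total_count": 3727},
    "researchers": [
        {"id": "ur.01055753603.27", "count": 140},
        {"id": "ur.011212042763.67", "count": 102},
    ],
}

def summarize(result):
    """Return (main-source total, number of rows actually returned)."""
    total = result["_stats"]["total_count"]
    rows = sum(len(v) for k, v in result.items() if k != "_stats")
    return total, rows

source_summary = summarize(source_result)
facet_summary = summarize(facet_result)
print(source_summary, facet_summary)
```

In both cases the total counts publications, the main source; it says nothing about how many researchers exist overall.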

### 5.5 Paginating Results

At the end of a `return` phrase, the user can specify the maximum number
of results to be returned and the number of top records to skip over
before returning the first result record. This makes it possible, for
example, to return large result sets page by page (i.e. "paging" results),
as described below.

This is done using the keyword `limit` followed by the maximum number of
results to return, optionally followed by the keyword `skip` and the
number of results to skip (the offset).

In [55]:
%%dsldf 
search publications return publications limit 10

Returned Publications: 10 (total = 112275335)
Time: 0.46s


Unnamed: 0,title,pages,author_affiliations,year,issue,id,type,volume,journal.id,journal.title
0,Adverse and benevolent childhood experiences i...,1793599,"[[{'first_name': 'Thanos', 'last_name': 'Karat...",2020,1.0,pub.1130041027,article,11.0,jour.1045059,European Journal of Psychotraumatology
1,FACTORES PSICOSOCIALES ASOCIADOS A MENORES CON...,191-202,"[[{'first_name': 'Rafael', 'last_name': 'Valdi...",2020,,pub.1129454261,chapter,,,
2,Literature,333-349,,2020,,pub.1125632078,chapter,,,
3,To start or to complete? – Challenges in imple...,1704540,"[[{'first_name': 'Mahendra M', 'last_name': 'R...",2020,1.0,pub.1124099280,article,13.0,jour.1041075,Global Health Action
4,Long-term trends in seasonality of mortality i...,1717411,"[[{'first_name': 'Benjamin-Samuel', 'last_name...",2020,1.0,pub.1124649186,article,13.0,jour.1041075,Global Health Action
5,"Eine Warnung an alle, dy sych etwaz duncken: D...",167-190,"[[{'first_name': 'Ulla', 'last_name': 'William...",2020,,pub.1125632729,chapter,,,
6,Marienklagen und Pietà,241-276,"[[{'first_name': 'Georg', 'last_name': 'Satzin...",2020,,pub.1125635978,chapter,,,
7,Johannes Taulers Via negationis,76-93,"[[{'first_name': 'Walter', 'last_name': 'Haug'...",2020,,pub.1125632704,chapter,,,
8,"Die editorische Einheit ,Textstufe'",177-194,"[[{'first_name': 'Hermann', 'last_name': 'Zwer...",2020,,pub.1125636152,chapter,,,
9,ad Iliadis librum Ζ,123-221,,2020,,pub.1125636759,chapter,,,


If paging information is not provided, the default values
`limit 20 skip 0` are used, so a `return` phrase without paging keywords
is equivalent to the same phrase ending in `limit 20 skip 0`.

Combining `limit` and `skip` across multiple queries enables paging or
batching of results; e.g. to retrieve 30 grant records divided into 3
pages of 10 records each, the following three queries could be used:

```
return grants limit 10           => get 1st 10 records for page 1 (skip 0, by default)
return grants limit 10 skip 10   => get next 10 for page 2; skip the 10 we already have
return grants limit 10 skip 20   => get another 10 for page 3, for a total of 30
```
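
The same pattern can be generated programmatically. The sketch below only builds the query strings and does not contact the API; `paged_queries` is a hypothetical helper written for this tutorial, not part of Dimcli.

```python
def paged_queries(base_query, source, total, page_size=10):
    """Yield one DSL query string per page (hypothetical helper)."""
    for skip in range(0, total, page_size):
        # the last page may be shorter than page_size
        limit = min(page_size, total - skip)
        yield f"{base_query} return {source} limit {limit} skip {skip}"

pages = list(paged_queries("search grants", "grants", total=30, page_size=10))
for query in pages:
    print(query)
```

Each string could then be passed to `dsl.query(...)` in turn. Dimcli also provides a `query_iterative` method that automates this kind of loop; see the Dimcli documentation for its exact behavior.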

### 5.6 Sorting Results

A sort order for the results in a given `return` phrase can be specified
with the keyword `sort by` followed by the name of:

* a `field` (when a `source` is being requested)
* an `indicator` (aggregation) (when one or more facets are being requested)

By default, the result set of full text
queries (`search ... for "full text query"`) is sorted by "relevance".
Additionally, it is possible to specify the sort order using the `asc` or
`desc` keywords. By default, descending order is selected.
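
A minimal sketch of how these pieces fit together in a `return` phrase for a source, assuming the clause order used by the examples in this tutorial (fields, then `sort by`, then `limit` and `skip`). The `build_return_phrase` helper is hypothetical and only builds a string; it does not validate field names against the API.

```python
def build_return_phrase(source, fields=None, sort_by=None, order=None,
                        limit=None, skip=None):
    """Compose a `return` phrase for a source (hypothetical helper)."""
    if order not in (None, "asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    parts = ["return", source]
    if fields:
        parts.append("[" + "+".join(fields) + "]")
    if sort_by:
        parts.append(f"sort by {sort_by}")
        if order:  # when omitted, the API default (descending) applies
            parts.append(order)
    if limit is not None:
        parts.append(f"limit {limit}")
        if skip is not None:  # `skip` only makes sense after `limit`
            parts.append(f"skip {skip}")
    return " ".join(parts)

phrase = build_return_phrase("publications", fields=["doi", "times_cited"],
                             sort_by="times_cited", order="asc", limit=5)
print(phrase)
```

The resulting string can be appended to a `search ...` statement before sending the query.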

In [56]:
%%dsldf 
search grants 
    for "nanomaterials"
return grants sort by title desc limit 5 

Returned Grants: 5 (total = 18268)
Time: 0.51s


Unnamed: 0,start_date,language,id,original_title,title_language,title,active_year,start_year,funding_org_name,project_num,funders,end_date
0,2012-01-01,de,grant.4823271,Transmissionselektronenmikroskop,en,Transmissionselektronenmikroskop,[2012],2012,German Research Foundation,220923099,"[{'id': 'grid.424150.6', 'types': ['Facility']...",
1,2015-01-01,en,grant.4841519,Transmissionselektronenmikroskop,en,Transmissionselektronenmikroskop,[2015],2015,German Research Foundation,280331443,"[{'id': 'grid.424150.6', 'types': ['Facility']...",
2,2011-06-16,en,grant.6774902,Snowcontrol.,en,Snowcontrol.,"[2011, 2012, 2013, 2014, 2015]",2011,Belgian Federal Science Policy Office,3E120109,"[{'id': 'grid.425119.a', 'types': ['Government...",2015-06-13
3,2014-01-01,de,grant.4834305,Röntgenquelle,en,Röntgenquelle,[2014],2014,German Research Foundation,245513494,"[{'id': 'grid.424150.6', 'types': ['Facility']...",
4,2015-01-01,de,grant.4839883,Röntgendiffraktometer,en,Röntgendiffraktometer,[2015],2015,German Research Foundation,279250642,"[{'id': 'grid.424150.6', 'types': ['Facility']...",


In [57]:
%%dsldf  
search grants  
    for "nanomaterials"
return grants  sort by relevance desc limit 5

Returned Grants: 5 (total = 18268)
Time: 0.45s


Unnamed: 0,start_date,language,id,original_title,title_language,title,active_year,start_year,funding_org_name,end_date,project_num,funders
0,2012-06-01,en,grant.3984032,Optically-active chiral nanomaterials,en,Optically-active chiral nanomaterials,"[2012, 2013]",2012,Science Foundation Ireland,2013-05-31,11/W.1/I2065,"[{'id': 'grid.437854.9', 'types': ['Nonprofit'..."
1,2000-09-01,en,grant.3526883,NOVEL LANTHANIDE LUMINESCENT SYSTEMS: FROM SUP...,en,NOVEL LANTHANIDE LUMINESCENT SYSTEMS: FROM SUP...,"[2000, 2001, 2002, 2003]",2000,Foundation for Science and Technology,2003-12-31,35378,"[{'id': 'grid.22919.31', 'types': ['Nonprofit'..."
2,2003-03-01,en,grant.3531153,Transport properties and electrochemical appli...,en,Transport properties and electrochemical appli...,"[2003, 2004, 2005, 2006]",2003,Foundation for Science and Technology,2006-08-31,39381,"[{'id': 'grid.22919.31', 'types': ['Nonprofit'..."
3,2014-04-01,en,grant.4167216,Polymer Nanomaterials,en,Polymer Nanomaterials,"[2014, 2015]",2014,Natural Sciences and Engineering Research Council,2015-03-31,557300,"[{'id': 'grid.452912.9', 'types': ['Government..."
4,2012-01-01,en,grant.4849153,Novel biocomposite nanomaterials,en,Novel biocomposite nanomaterials,"[2012, 2013, 2014, 2015]",2012,Israel Science Foundation,2015-12-31,25813,"[{'id': 'grid.425339.a', 'types': ['Nonprofit'..."


Number of citations per publication

In [58]:
%%dsldf  
search publications
return publications  [doi + times_cited] 
    sort by times_cited limit 5

Returned Publications: 5 (total = 112275334)
Time: 1.70s


Unnamed: 0,times_cited,doi
0,231730,
1,197598,10.1038/227680a0
2,180841,10.1016/0003-2697(76)90527-3
3,91278,10.1006/meth.2001.1262
4,85717,10.1103/physrevlett.77.3865


Recent citations per publication.
Note: `recent_citations` refers to the number of citations accrued in the last two-year period. A single value is stored per document and the year window rolls over in July.

In [59]:
%%dsldf 
search publications
return publications [doi + recent_citations]
    sort by recent_citations limit 5

Returned Publications: 5 (total = 112275334)
Time: 1.24s


Unnamed: 0,recent_citations,doi
0,33085,10.1006/meth.2001.1262
1,25320,10.1109/cvpr.2016.90
2,24834,10.1103/physrevlett.77.3865
3,24068,10.1176/appi.books.9780890425596
4,23012,10.1191/1478088706qp063oa


When a facet is being returned, the `indicator` used in the
`sort` phrase must either be `count` (the default, such that
`sort by count` is unnecessary), or one of the indicators specified in
the `aggregate` phrase, i.e. one whose values are being computed in the
faceting operation. 
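
The rule above can be checked client-side before a query is sent. The sketch below is a hypothetical helper (`facet_return_phrase`), not part of Dimcli; it also rejects `fcr_gavg` as a sort key, because the schema description later in this tutorial flags that indicator as not usable for sorting.

```python
# Indicators flagged as not sortable in the schema description
UNSORTABLE = {"fcr_gavg"}

def facet_return_phrase(facet, aggregate=(), sort_by="count", limit=20):
    """Compose a facet `return` phrase, checking the sort indicator
    (hypothetical helper)."""
    if sort_by in UNSORTABLE:
        raise ValueError(f"'{sort_by}' cannot be used for sorting")
    if sort_by != "count" and sort_by not in aggregate:
        raise ValueError(f"'{sort_by}' must be 'count' or listed in aggregate")
    parts = ["return", facet]
    if aggregate:
        parts.append("aggregate " + ", ".join(aggregate))
    if sort_by != "count":  # `sort by count` is the default, so it is omitted
        parts.append(f"sort by {sort_by}")
    parts.append(f"limit {limit}")
    return " ".join(parts)

phrase = facet_return_phrase("research_orgs",
                             aggregate=["altmetric_median", "rcr_avg"],
                             sort_by="rcr_avg", limit=5)
print(phrase)
```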


In [60]:
%%dsldf 
search publications 
    for "nanomaterials"
return research_orgs 
    aggregate altmetric_median, rcr_avg sort by rcr_avg limit 5 

Returned Research_orgs: 5
Time: 3.12s


Unnamed: 0,id,count,rcr_avg,altmetric_median,types,name,latitude,longitude,linkout,city_name,country_name,acronym,state_name
0,grid.11444.34,1,207.399994,345.0,[Facility],Shanghai Institute of Hypertension,31.211678,121.467255,[http://www.china-sih.com/],Shanghai,China,,
1,grid.11485.39,1,207.399994,345.0,[Nonprofit],Cancer Research UK,51.531322,-0.106269,[http://www.cancerresearchuk.org/],London,United Kingdom,CRUK,
2,grid.11642.30,1,207.399994,345.0,[Education],University of La Réunion,-20.901735,55.48455,[http://www.univ-reunion.fr/university-of-reun...,Saint-Denis,Reunion,,
3,grid.120073.7,1,207.399994,345.0,[Healthcare],Addenbrooke's Hospital,52.176,0.14,[http://www.cuh.org.uk/addenbrookes-hospital],Cambridge,United Kingdom,,Cambridgeshire
4,grid.20931.39,1,207.399994,345.0,[Education],Royal Veterinary College,51.5368,-0.134,[http://www.rvc.ac.uk/],London,United Kingdom,RVC,


## 6. Aggregations

In a `return` phrase requesting one or more `facet` results, aggregation
operations to perform during faceting can be specified after the facet
name(s) by using the keyword `aggregate` followed by a comma-separated
list of one or more `indicator` names corresponding to the `source`
being searched.
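
Conceptually, each indicator is a statistic computed over the source records that fall into one facet value. The following sketch reproduces that idea in plain Python on made-up data; the funder names and citation counts are invented, the `aggregate_citations` helper is hypothetical, and the actual API computes these values server-side.

```python
from collections import defaultdict
from statistics import mean, median

# Toy (funder, times_cited) pairs, one per publication-funder link (made-up data)
links = [
    ("funder.X", 10), ("funder.X", 2), ("funder.X", 6),
    ("funder.Y", 40), ("funder.Y", 0),
]

def aggregate_citations(pairs):
    """Group citation counts by funder and compute count, total, mean, median."""
    groups = defaultdict(list)
    for funder, cited in pairs:
        groups[funder].append(cited)
    return {
        funder: {
            "count": len(values),
            "citations_total": sum(values),
            "citations_avg": mean(values),
            "citations_median": median(values),
        }
        for funder, values in groups.items()
    }

stats = aggregate_citations(links)
print(stats["funder.X"])
print(stats["funder.Y"])
```

The dictionary keys mirror the indicator names used by the `publications` source, so each inner dictionary corresponds to one row of a facet result.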

In [61]:
%%dsldf
search publications 
    where year > 2010 
return research_orgs  
    aggregate rcr_avg, altmetric_median limit 5

Returned Research_orgs: 5
Time: 14.61s


Unnamed: 0,id,count,rcr_avg,altmetric_median,name,latitude,state_name,types,linkout,country_name,longitude,city_name,acronym
0,grid.17063.33,146656,1.701046,4.0,University of Toronto,43.661667,Ontario,[Education],[http://www.utoronto.ca/],Canada,-79.395,Toronto,
1,grid.38142.3c,144250,2.230168,5.132735,Harvard University,42.377052,Massachusetts,[Education],[http://www.harvard.edu/],United States,-71.11665,Cambridge,
2,grid.11899.38,138910,1.050863,2.0,University of São Paulo,-23.563051,,[Education],[http://www5.usp.br/en/],Brazil,-46.730103,São Paulo,USP
3,grid.83440.3b,126466,1.914593,4.0,University College London,51.52447,,[Education],[http://www.ucl.ac.uk/],United Kingdom,-0.133982,London,UCL
4,grid.26999.3d,122350,1.185757,2.0,University of Tokyo,35.713333,,[Education],[http://www.u-tokyo.ac.jp/en/],Japan,139.76222,Tokyo,UT


**What are the metrics/aggregations available?** See the data sources documentation for information about available [indicators](https://docs.dimensions.ai/dsl/datasource-publications.html#publications-indicators).  

Alternatively, we can use the 'schema' API ([describe](https://docs.dimensions.ai/dsl/data-sources.html#metadata-api)) to return this information programmatically:

In [62]:
schema = dsl.query("describe schema")
# for each source, print the name and description of its metrics (indicators)
for source_name, source_info in schema['sources'].items():
    print("SOURCE:", source_name)
    for metric in source_info['metrics'].values():
        print("--", metric['name'], " => ", metric['description'])

SOURCE: publications
-- count  =>  Total count
-- altmetric_median  =>  Median Altmetric attention score
-- altmetric_avg  =>  Altmetric attention score mean
-- citations_total  =>  Aggregated number of citations
-- citations_avg  =>  Arithmetic mean of citations
-- citations_median  =>  Median of citations
-- recent_citations_total  =>  For a given article, in a given year, the number of citations accrued in the last two year period. Single value stored per document, year window rolls over in July.
-- rcr_avg  =>  Arithmetic mean of `relative_citation_ratio` field.
-- fcr_gavg  =>  Geometric mean of `field_citation_ratio` field (note: This field cannot be used for sorting results).
SOURCE: grants
-- count  =>  Total count
-- funding  =>  Total funding amount, in USD.
SOURCE: patents
-- count  =>  Total count
SOURCE: clinical_trials
-- count  =>  Total count
SOURCE: policy_documents
-- count  =>  Total count
SOURCE: researchers
-- count  =>  Total count
SOURCE: organizations
-- count  

**NOTE** In addition to any specified aggregations, `count` is always computed
and reported when facet results are requested.

In [63]:
%%dsldf
search grants 
    for "5g network" 
return funders 
    aggregate count, funding sort by funding limit 5 

Returned Funders: 5
Time: 0.47s


Unnamed: 0,id,count,funding,types,name,latitude,longitude,linkout,city_name,country_name,acronym,state_name
0,grid.270680.b,194,923867691.0,[Government],European Commission,50.85165,4.36367,[http://ec.europa.eu/index_en.htm],Brussels,Belgium,EC,
1,grid.421091.f,69,53295321.0,[Government],Engineering and Physical Sciences Research Cou...,51.567093,-1.784602,[https://www.epsrc.ac.uk/],Swindon,United Kingdom,EPSRC,England
2,grid.457785.c,113,51989327.0,[Government],Directorate for Computer & Information Science...,38.88058,-77.111,[http://www.nsf.gov/dir/index.jsp?org=CISE],Arlington,United States,NSF CISE,Virginia
3,grid.55047.33,8,50109038.0,[Government],National Centre for Research and Development,52.227455,21.00763,[http://www.ncbr.gov.pl/en/],Warsaw,Poland,NCRD,
4,grid.453115.7,33,29462562.0,[Government],Innovation and Technology Commission,22.28264,114.16658,[http://www.itc.gov.hk/en/about/org.htm],Hong Kong,China,ITC,


Aggregated total number of citations

In [64]:
%%dsldf
search publications
    for "ontologies"
return funders 
    aggregate citations_total 
    sort by citations_total  limit 5

Returned Funders: 5
Time: 1.18s


Unnamed: 0,id,count,citations_total,types,name,latitude,longitude,linkout,city_name,country_name,state_name,acronym
0,grid.48336.3a,13207,864977.0,[Government],National Cancer Institute,39.004326,-77.10119,[http://www.cancer.gov/],Rockville,United States,Maryland,NCI
1,grid.280785.0,12900,830574.0,[Facility],National Institute of General Medical Sciences,38.997833,-77.09938,[http://www.nigms.nih.gov/Pages/default.aspx],Bethesda,United States,Maryland,NIGMS
2,grid.280128.1,4857,608945.0,[Facility],National Human Genome Research Institute,38.996967,-77.09693,[https://www.genome.gov/],Bethesda,United States,Maryland,NHGRI
3,grid.270680.b,19178,588854.0,[Government],European Commission,50.85165,4.36367,[http://ec.europa.eu/index_en.htm],Brussels,Belgium,,EC
4,grid.52788.30,5530,447416.0,[Nonprofit],Wellcome Trust,51.525867,-0.135005,[http://www.wellcome.ac.uk/],London,United Kingdom,,WT


Arithmetic mean number of citations

In [65]:
%%dsldf
search publications
return funders 
    aggregate citations_avg 
    sort by citations_avg limit 5

Returned Funders: 5
Time: 2.17s


Unnamed: 0,id,count,citations_avg,name,latitude,state_name,types,linkout,country_name,longitude,city_name,acronym
0,grid.478308.0,185,260.87027,Alexander & Margaret Stewart Trust,38.90116,District of Columbia,[Nonprofit],[http://www.stewart-trust.org/],United States,-77.03973,Washington D.C.,
1,grid.453780.d,144,190.722222,Accelerate Brain Cancer Cure,38.90672,District of Columbia,[Nonprofit],[http://www.abc2.org/],United States,-77.03952,Washington D.C.,
2,grid.478789.d,586,168.203072,Donald W. Reynolds Foundation,36.19046,Nevada,[Other],[http://www.dwreynolds.org/],United States,-115.29985,Las Vegas,
3,grid.417710.4,182,164.71978,Human Genome Sciences (United States),39.09665,Maryland,[Company],[http://www.hgsi.com],United States,-77.20376,Rockville,
4,grid.484432.d,1,150.0,Macmillan Cancer Support,51.488003,,[Nonprofit],[https://www.macmillan.org.uk/],United Kingdom,-0.123164,London,Macmillan Cancer Support


Geometric mean of FCR


In [66]:
%%dsldf
search publications
return funders 
    aggregate fcr_gavg limit 5

Returned Funders: 5
Time: 3.48s


Unnamed: 0,id,fcr_gavg,count,types,name,latitude,longitude,linkout,city_name,country_name,acronym,state_name
0,grid.419696.5,2.33755,2048348,[Government],National Natural Science Foundation of China,40.005177,116.33983,[http://www.nsfc.gov.cn/publish/portal1/],Beijing,China,NSFC,
1,grid.270680.b,3.310395,706071,[Government],European Commission,50.85165,4.36367,[http://ec.europa.eu/index_en.htm],Brussels,Belgium,EC,
2,grid.424020.0,2.555937,641515,[Government],Ministry of Science and Technology of the Peop...,39.827835,116.316284,[http://www.most.gov.cn/eng/],Beijing,China,MOST,
3,grid.48336.3a,4.933944,598242,[Government],National Cancer Institute,39.004326,-77.10119,[http://www.cancer.gov/],Rockville,United States,NCI,Maryland
4,grid.54432.34,2.288957,587177,[Nonprofit],Japan Society for the Promotion of Science,35.68716,139.74039,[http://www.jsps.go.jp/],Tokyo,Japan,JSPS,


Median Altmetric Attention Score

In [67]:
%%dsldf 
search publications
return funders aggregate altmetric_median 
    sort by altmetric_median limit 5 

Returned Funders: 5
Time: 7.19s


Unnamed: 0,id,count,altmetric_median,city_name,types,name,country_name,linkout,latitude,acronym,longitude,state_name
0,grid.258806.1,8,150.5,Kitakyushu,[Education],Kyushu Institute of Technology,Japan,[https://www.kyutech.ac.jp/english/],33.894436,KIT,130.8392,
1,grid.470711.4,2,108.5,Edinburgh,[Nonprofit],Chest Heart and Stroke Scotland,United Kingdom,[http://www.chss.org.uk/],55.946075,CHSS,-3.219597,
2,grid.443873.f,5,96.0,Chicago,[Nonprofit],LUNGevity Foundation,United States,[http://www.lungevity.org/],41.878674,LUNG,-87.62648,Illinois
3,grid.473856.b,2,66.0,Washington D.C.,[Government],Administration for Children and Families,United States,[https://www.acf.hhs.gov/],38.88594,ACF,-77.01637,District of Columbia
4,grid.419979.b,2,44.0,Philadelphia,[Healthcare],Einstein Healthcare Network,United States,[http://www.einstein.edu/],40.036827,AEHN,-75.14314,Pennsylvania
