# Exploring The Dimensions Search Language (DSL) - Quick Intro

This Notebook takes you through the basics of using the Dimensions API.  

> See also: [official DSL documentation online](https://docs.dimensions.ai/dsl/)

In this tutorial we leverage the capabilities of the [Dimcli library](https://github.com/lambdamusic/dimcli) in the context of Jupyter Notebooks. Dimcli is an open source Python library that simplifies common operations like logging in, querying and displaying results. 


### Prerequisites

This notebook assumes you have installed the [Dimcli](https://pypi.org/project/dimcli/) library and are familiar with the *Getting Started* tutorial.


In [2]:
!pip install dimcli -U --quiet 

import dimcli
from dimcli.utils import *
import sys

print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
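if 'google.colab' in sys.modules:
  # on Google Colab, prompt for the API key interactively
  import getpass
  KEY = getpass.getpass(prompt='API Key: ')
  dimcli.login(KEY, ENDPOINT)
else:
  # elsewhere, an empty key lets Dimcli use the credentials in the local dsl.ini file
  KEY = ""
  dimcli.login(KEY, ENDPOINT)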
dsl = dimcli.Dsl()

==
Logging in..
Dimcli - Dimensions API Client (v0.8.2)
Connected to: https://app.dimensions.ai - DSL v1.28
Method: dsl.ini file


## What the query statistics refer to

When performing a DSL search, a `_stats` object is returned, which contains useful information such as the total number of records available for a search.

In [2]:
res1 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications""", verbose=False)
print(res1.stats) # shorthand for res1.json['_stats']

{'total_count': 3727}




It is important to note though that the **total number always refers to the main source** one is searching for, not necessarily the results being returned. For example, in this query we return `researchers` linked to publications: 

In [3]:
res2 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers""", verbose=False)
print(res2.stats)

{'total_count': 3727}


Still 3727 records! That's because the total count always refers to the main object type one is searching for, not to the *facet* being returned.


Tip: this basic information about the returned objects is also available via the `count_batch` and `count_total` attributes of the query results object.

In [4]:
result = dsl.query("""
     search publications
       for "malaria AND congo"
     return publications[basics]
     limit 30
""", verbose=False)
# print some stats using the Result object
print("Results in this batch: ", result.count_batch)
print("Results in total: ", result.count_total)
print("Errors: ",result.errors)

Results in this batch:  30
Results in total:  71812
Errors:  None
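To see why the two numbers differ, here is a minimal sketch working on a plain dictionary shaped like the raw JSON response (`res.json`): `_stats` holds the total for the searched source, while the returned list holds only the current batch. The helper `batch_and_total` and the sample data are hypothetical; in practice the `count_batch` and `count_total` attributes above are what you would use.

```python
from typing import Tuple


def batch_and_total(raw: dict, returned: str) -> Tuple[int, int]:
    """Return (records in this batch, total_count) from a DSL-style JSON dict.

    Assumes `raw` has a `_stats` dict with `total_count`, plus a list stored
    under the name of the returned source or facet (e.g. "publications").
    """
    batch = len(raw.get(returned, []))
    total = raw["_stats"]["total_count"]
    return batch, total


# Hypothetical response: 3 researchers returned, while total_count still
# counts the *publications* matched by the search.
raw = {
    "_stats": {"total_count": 3727},
    "researchers": [{"id": "ur.A"}, {"id": "ur.B"}, {"id": "ur.C"}],
}
print(batch_and_total(raw, "researchers"))  # (3, 3727)
```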


## Working with fields

Note: in the following examples we use the Dimcli magic commands `%%dsldf` (whole cell) and `%dsldf` (single line) for quicker querying; they display the results as a pandas DataFrame.

### Control the fields you return

In [5]:
%%dsldf 

search publications
return publications[id+title+year+doi]
limit 5

Returned Publications: 5 (total = 112275334)
Time: 1.41s

|   | title | year | id | doi |
|---|---|---|---|---|
| 0 | Literature | 2020 | pub.1125632078 | 10.1515/9783110823547-013 |
| 1 | To start or to complete? – Challenges in imple... | 2020 | pub.1124099280 | 10.1080/16549716.2019.1704540 |
| 2 | Long-term trends in seasonality of mortality i... | 2020 | pub.1124649186 | 10.1080/16549716.2020.1717411 |
| 3 | Eine Warnung an alle, dy sych etwaz duncken: D... | 2020 | pub.1125632729 | 10.1515/9783110950762-012 |
| 4 | Marienklagen und Pietà | 2020 | pub.1125635978 | 10.1515/9783110922035-011 |
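When the field list is assembled in code, the `source[field+field+...]` syntax used above can be generated from a Python list. A minimal sketch; `return_clause` is a hypothetical helper, not part of Dimcli.

```python
from typing import Iterable


def return_clause(source: str, fields: Iterable[str]) -> str:
    """Build a DSL return clause such as `return publications[id+title+year+doi]`.

    Field names are joined with `+`, as in the query above.
    """
    fields = list(fields)
    if not fields:
        # no fields listed: plain `return publications`, as in later examples
        return f"return {source}"
    return f"return {source}[{'+'.join(fields)}]"


query = "\n".join([
    "search publications",
    return_clause("publications", ["id", "title", "year", "doi"]),
    "limit 5",
])
print(query)
```

The resulting string could then be passed to `dsl.query(...)` like any other query.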


### Make a mistake, and the DSL will tell you which fields you could have used

In [6]:
%%dsldf 

search publications 
return publications[dois]
limit 100

Returned Errors: 1
Time: 0.45s
Semantic Error
Semantic errors found:
	Field / Fieldset 'dois' is not present in Source 'publications'. Available fields: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,altmetric,altmetric_id,author_affiliations,authors,book_doi,book_series_title,book_title,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,concepts,concepts_scores,date,date_inserted,dimensions_url,doi,field_citation_ratio,funder_countries,funders,id,issn,issue,journal,journal_lists,linkout,mesh_terms,open_access,open_access_categories,pages,pmcid,pmid,proceedings_title,publisher,recent_citations,reference_ids,referenced_pubs,references,relative_citation_ratio,research_org_cities,research_org_countries,research_org_country_names,research_org_names,research_org_state_codes,research_org_state_names,research_orgs,researchers,resulting_publication_doi,supporting_grant_ids,terms,times_cit
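The available names are embedded in the error text, so a typo can be matched against them programmatically. A minimal sketch using the standard library's `difflib`; the helper `suggest_fields` is hypothetical, and it assumes the message keeps the `Available fields:` wording shown above (the facet error later in this notebook uses `Available facets are:`, which the pattern also accepts).

```python
import difflib
import re
from typing import List


def suggest_fields(error_text: str, wrong: str, n: int = 3) -> List[str]:
    """Extract the name list from a DSL semantic error and suggest close matches."""
    # names follow "Available fields:" (or "Available facets are:"), comma-separated
    match = re.search(r"Available (?:fields:|facets are:)\s*(\S+)", error_text)
    if not match:
        return []
    available = [name for name in match.group(1).split(",") if name]
    return difflib.get_close_matches(wrong, available, n=n)


# Shortened copy of the error message above
error = ("Field / Fieldset 'dois' is not present in Source 'publications'. "
         "Available fields: FOR,book_doi,doi,id,title,year")
print(suggest_fields(error, "dois"))  # ['doi']
```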

### Get all fields

In [7]:
%%dsldf 

search publications 
  for "malaria"
return publications[all]
limit 1

Returned Publications: 1 (total = 786126)
Time: 0.92s
Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'RCDC' is deprecated in favor of category_rcdc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_RAC' is deprecated in favor of category_hrcs_rac. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'category_ua' is deprecated in favor of category_uoa. Please refer to https://docs.dimensions.ai/ds

|   | title | terms | type | recent_citations | referenced_pubs | year | category_ua | research_org_countries | research_org_cities | linkout | ... | category_uoa | publisher | RCDC | category_hra | altmetric_id | concepts | pmid | research_org_state_names | journal.id | journal.title |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Long-term trends in seasonality of mortality i... | [patterns, mortality, Sub-Saharan Africa, chan... | article | 1 | [{'id': 'pub.1070577469', 'doi': '10.2307/4148... | 2020 | [{'id': '30002', 'name': 'A02 Public Health, H... | [{'id': 'US', 'name': 'United States'}, {'id':... | [{'id': 2792073, 'name': 'Louvain-la-Neuve'}, ... | https://www.tandfonline.com/doi/pdf/10.1080/16... | ... | [{'id': '30002', 'name': 'A02 Public Health, H... | Taylor & Francis | [{'id': '547', 'name': 'Pediatric'}] | [{'id': '3903', 'name': 'Population & Society'}] | 75135566 | [cause-specific mortality, cause mortality, ep... | 32027239 | [New Jersey] | jour.1041075 | Global Health Action |
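Because `return publications[all]` includes deprecated fields, the warnings above can be parsed into a lookup table of replacements. A minimal sketch, assuming the warning wording stays exactly as printed above; `deprecated_fields` and the sample field list are hypothetical.

```python
import re
from typing import Dict

# Matches warnings like: Field 'terms' is deprecated in favor of concepts.
DEPRECATION = re.compile(r"Field '(\w+)' is deprecated in favor of (\w+)\.")


def deprecated_fields(warnings_text: str) -> Dict[str, str]:
    """Map each deprecated field name to the replacement named in the warning."""
    return dict(DEPRECATION.findall(warnings_text))


# Three of the warnings printed above
warnings_text = """\
Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
"""

mapping = deprecated_fields(warnings_text)
# Swap deprecated names for their replacements in a hypothetical field list
fields = ["id", "terms", "FOR", "doi"]
print([mapping.get(f, f) for f in fields])  # ['id', 'concepts', 'category_for', 'doi']
```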


## Full text search

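You can search the full text, the titles and abstracts only, or the titles only. The first example below searches the `concepts` index; the second restricts the search to titles and abstracts.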

In [8]:
%dsldf search publications in concepts for "situ detection OR malaria" return publications

Returned Publications: 20 (total = 201452)
Time: 1.08s

|   | title | pages | author_affiliations | year | issue | id | type | volume | journal.id | journal.title |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Breeding soundness examination and herd profic... | 840-855 | [[{'first_name': 'Víctor Fernando', 'last_name... | 2020 | 1.0 | pub.1130093974 | article | 19 | jour.1032284 | Italian Journal of Animal Science |
| 1 | In‐situ sol–gel synthesis of zirconia networks... | 49506 | [[{'first_name': 'Mohammad Hossein', 'last_nam... | 2020 | 46.0 | pub.1128193344 | article | 137 | jour.1135048 | Journal of Applied Polymer Science |
| 2 | Hierarchical NiCo2O4-MnO x -NF monolithic cata... | 147485 | [[{'first_name': 'Dongdong', 'last_name': 'Wan... | 2020 |  | pub.1130006001 | article | 532 | jour.1038686 | Applied Surface Science |
| 3 | Nanostructured selenium-doped biphasic calcium... | 13738 | [[{'first_name': 'Lei', 'last_name': 'Nie', 'c... | 2020 | 1.0 | pub.1130064403 | article | 10 | jour.1045337 | Scientific Reports |
| 4 | MRI of prostatic urethral mucinous urothelial ... | 68-70 | [[{'first_name': 'Neel', 'last_name': 'Patel',... | 2020 |  | pub.1128497262 | article | 68 | jour.1087975 | Clinical Imaging |
| 5 | In vitro modeling of the neurovascular unit: a... | 22 | [[{'first_name': 'Aditya', 'last_name': 'Bhale... | 2020 | 1.0 | pub.1125684242 | article | 17 | jour.1034919 | Fluids and Barriers of the CNS |
| 6 | Bacterial cellulose: From production optimizat... | 2598-2611 | [[{'first_name': 'Isabela', 'last_name': 'de A... | 2020 |  | pub.1129823711 | article | 164 | jour.1090368 | International Journal of Biological Macromolec... |
| 7 | Hydrogen sulfide removal from geothermal fluid... | 21 | [[{'first_name': 'S.', 'last_name': 'Regenspur... | 2020 | 1.0 | pub.1129357222 | article | 8 | jour.1136002 | Geothermal Energy |
| 8 | Manufacturing and characterization of in-situ ... | 101436 | [[{'first_name': 'A.M.', 'last_name': 'Vilarde... | 2020 |  | pub.1128995596 | article | 36 | jour.1147513 | Additive Manufacturing |
| 9 | Relative survival in early-stage cancers in th... | 49 | [[{'first_name': 'Avinash G.', 'last_name': 'D... | 2020 | 1.0 | pub.1127549599 | article | 13 | jour.1039771 | Journal of Hematology & Oncology |
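Column names such as `journal.id` and `journal.title` in these tables suggest that nested objects are flattened into dotted column names when results become a DataFrame, while lists (e.g. `author_affiliations`) stay in a single cell. The stdlib sketch below reproduces that naming on one record; it illustrates the idea and is not Dimcli's actual code. The `flatten` helper is hypothetical and the sample record reuses values from row 5 of the table above.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys; lists are left untouched."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # recurse into nested objects, e.g. journal -> journal.id
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


record = {
    "id": "pub.1125684242",
    "year": 2020,
    "journal": {"id": "jour.1034919", "title": "Fluids and Barriers of the CNS"},
}
print(flatten(record))
```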


In [9]:
%%dsldf 

search publications in title_abstract_only for "nanotechnology"
return publications
limit 3

Returned Publications: 3 (total = 78065)
Time: 0.60s

|   | title | pages | author_affiliations | year | issue | id | type | volume | journal.id | journal.title |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The inventions in nanotechnologies as practica... | 719-729 | [[{'first_name': 'L.A.', 'last_name': 'Ivanov'... | 2020 | 6 | pub.1125757275 | article | 11 | jour.1153140 | Nanotechnologies in Construction A Scientific ... |
| 1 | The Development of Antibiotics Based on Nanost... | 7618-7628 | [[{'first_name': 'Ayesha', 'last_name': 'Taj',... | 2020 | 12 | pub.1129669026 | article | 20 | jour.1297328 | Journal of Nanoscience and Nanotechnology |
| 2 | Interactions Between Remdesivir, Ribavirin, Fa... | 7311-7323 | [[{'first_name': 'Tiago da', 'last_name': 'Sil... | 2020 | 12 | pub.1129669004 | article | 20 | jour.1297328 | Journal of Nanoscience and Nanotechnology |
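The two queries above differ only in the search index named after `in`. A minimal sketch of a query builder that makes the index a parameter; `search_query` is hypothetical, and which indexes exist depends on the source (check the DSL documentation). It rejects terms containing double quotes, since exact phrases need escaping.

```python
from typing import Optional


def search_query(source: str, terms: str, index: Optional[str] = None,
                 returns: Optional[str] = None, limit: Optional[int] = None) -> str:
    """Compose a DSL full-text search like the two queries above.

    `index` is the search index after `in` (e.g. "concepts" or
    "title_abstract_only"); when omitted, no `in` clause is added.
    """
    if '"' in terms:
        raise ValueError("escape double quotes before building the query")
    first = f"search {source}" + (f" in {index}" if index else "")
    parts = [first, f'for "{terms}"', f"return {returns or source}"]
    if limit is not None:
        parts.append(f"limit {limit}")
    return "\n".join(parts)


print(search_query("publications", "nanotechnology",
                   index="title_abstract_only", limit=3))
```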


### A simple author search


In [10]:
%%dsldf 

search publications in authors for "\"Daniel Hook\""
return publications
limit 10

Returned Publications: 10 (total = 78)
Time: 0.55s

|   | id | title | volume | issue | pages | type | year | author_affiliations | journal.id | journal.title |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | pub.1124226668 | Dimensions: Bringing down barriers between sci... | 1.0 | 1.0 | 387-395 | article | 2020 | [[{'first_name': 'Christian', 'last_name': 'He... | jour.1377615 | Quantitative Science Studies |
| 1 | pub.1115957159 | Perception, prestige and PageRank | 14.0 | 5.0 | e0216783 | article | 2019 | [[{'first_name': 'David', 'last_name': 'Zeitly... | jour.1037553 | PLoS ONE |
| 2 | pub.1119449118 | The Price of Gold: Curiosity? |  |  |  | preprint | 2019 | [[{'first_name': 'Daniel W.', 'last_name': 'Ho... | jour.1371339 | arXiv |
| 3 | pub.1118864658 | Perception, Prestige and PageRank |  |  |  | preprint | 2019 | [[{'first_name': 'David', 'last_name': 'Zeitly... | jour.1371339 | arXiv |
| 4 | pub.1108567148 | PT Symmetry |  |  |  | monograph | 2019 | [[{'first_name': 'Carl M', 'last_name': 'Bende... |  |  |
| 5 | pub.1111011264 | Optical Fiber Sensor Design for Ground Slope M... | 0.0 |  | 1-4 | proceeding | 2018 | [[{'first_name': 'Daniel', 'last_name': 'Hook'... |  |  |
| 6 | pub.1106289502 | Dimensions: Building Context for Search and Ev... | 3.0 |  | 23 | article | 2018 | [[{'first_name': 'Daniel W.', 'last_name': 'Ho... | jour.1292498 | Frontiers in Research Metrics and Analytics |
| 7 | pub.1105321123 | Assessment of the interaction between a natura... | 41.0 |  | s45 | article | 2018 | [[{'first_name': 'Daniel', 'last_name': 'Hook'... | jour.1091476 | Contact Lens and Anterior Eye |
| 8 | pub.1085413261 | Characterization and quantitation of PVP conte... | 106.0 | 3.0 | 1064-1072 | article | 2018 | [[{'first_name': 'Andrew J.', 'last_name': 'Ho... | jour.1312091 | Journal of Biomedical Materials Research Part ... |
| 9 | pub.1085511076 | Behavior of eigenvalues in a region of broken ... | 95.0 | 5.0 | 052113 | article | 2017 | [[{'first_name': 'Carl M.', 'last_name': 'Bend... | jour.1053349 | Physical Review A |
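The author query wraps the name in escaped double quotes (`"\"Daniel Hook\""`) so that, presumably, it is matched as an exact phrase. Writing those backslashes by hand is error-prone; a minimal sketch lets `json.dumps` produce them. The helper `exact_phrase` is hypothetical, and the assumption that JSON string escaping matches DSL escaping is only intended for plain names without quotes or backslashes.

```python
import json


def exact_phrase(phrase: str) -> str:
    """Return a DSL search string that wraps `phrase` in escaped double quotes.

    json.dumps supplies the outer quotes and escapes the inner ones; this is
    assumed (not confirmed) to match DSL escaping for plain names.
    """
    return json.dumps(f'"{phrase}"', ensure_ascii=False)


term = exact_phrase("Daniel Hook")
print(term)  # "\"Daniel Hook\""
query = f"search publications in authors for {term} return publications limit 10"
print(query)
```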


### ...or search for a researcher by a specific ID

In [11]:
%%dsldf 

search publications 
where researchers.id = "ur.013514345521.07"
return publications[doi+researchers]
limit 1

Returned Publications: 1 (total = 16)
Time: 0.48s

|   | doi | researchers |
|---|---|---|
| 0 | 10.1038/s41385-020-0334-2 | [{'id': 'ur.015441462403.62', 'last_name': 'Be... |


## Sources vs Facets
One of the queries above uses the `researchers` facet of the `publications` source.

In general, a source query can return at most 1000 records at a time. For example, this query returns an error:

In [12]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 2000
  """)

Returned Errors: 1
Time: 0.45s
Semantic Error
Semantic errors found:
	Limit 2000 exceeds maximum allowed limit 1000


<dimcli.DslDataset object #4523680624. Errors: 1>

### You can paginate through *source* results up to 50000 rows

With [sources](https://docs.dimensions.ai/dsl/data-sources.html), you can use the [limit/skip syntax](https://docs.dimensions.ai/dsl/language.html#paginating-results) in order to paginate through results:

In [13]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 1000 skip 1000
  """)

Returned Publications: 1000 (total = 3727)
Time: 2.38s


<dimcli.DslDataset object #4506154128. Records: 1000/3727>
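To fetch all 3727 records, the `limit`/`skip` pairs can be computed from `total_count`. A minimal sketch; `pages` is a hypothetical helper, the 1000 cap comes from the error message above and the 50000 ceiling from this section's heading. Each generated clause would be appended to the `search ... where ...` part and sent with `dsl.query`.

```python
from typing import Iterator, Tuple

MAX_LIMIT = 1000     # largest `limit` a single query accepts (see the error above)
MAX_RECORDS = 50000  # how far source results can be paginated, per the heading above


def pages(total_count: int, page_size: int = MAX_LIMIT) -> Iterator[Tuple[int, int]]:
    """Yield (limit, skip) pairs covering up to MAX_RECORDS results of a source query."""
    if not 0 < page_size <= MAX_LIMIT:
        raise ValueError(f"page_size must be between 1 and {MAX_LIMIT}")
    reachable = min(total_count, MAX_RECORDS)
    for skip in range(0, reachable, page_size):
        # the last page only asks for the records that are left
        yield min(page_size, reachable - skip), skip


# total_count taken from the `_stats` of the query above
for limit, skip in pages(3727):
    print(f"return publications limit {limit} skip {skip}")
```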

### You can return at most 1000 `facet` rows

It is important to remember that [when using facets](https://docs.dimensions.ai/dsl/language.html#returning-facets) you cannot use the *skip* operation, so the maximum number of rows returned is always 1000.


In [14]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1 skip 1000
  """)

Returned Errors: 1
Time: 0.44s
Semantic Error
Semantic errors found:
	Offset is not supported for facet results


<dimcli.DslDataset object #4770861936. Errors: 1>

While this works...

In [15]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1000
  """)

Returned Researchers: 1000
Time: 1.64s


<dimcli.DslDataset object #4523681536. Records: 1000/3727>
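To catch these mistakes before sending a query, the paging rules seen so far can be encoded in a small client-side check. `check_paging` is a hypothetical helper that mirrors the error messages above; the 50000-row source ceiling is the one mentioned earlier, the wording of that last error is our own, and the DSL itself remains the authority on what is allowed.

```python
MAX_LIMIT = 1000         # from "Limit 2000 exceeds maximum allowed limit 1000"
MAX_SOURCE_ROWS = 50000  # source pagination ceiling mentioned earlier


def check_paging(returns_facet: bool, limit: int, skip: int = 0) -> None:
    """Raise ValueError for limit/skip combinations the DSL rejected above."""
    if limit > MAX_LIMIT:
        raise ValueError(f"Limit {limit} exceeds maximum allowed limit {MAX_LIMIT}")
    if returns_facet and skip:
        raise ValueError("Offset is not supported for facet results")
    if not returns_facet and skip + limit > MAX_SOURCE_ROWS:
        # message wording is hypothetical, not copied from the DSL
        raise ValueError(f"skip + limit exceeds {MAX_SOURCE_ROWS} rows")


check_paging(returns_facet=True, limit=1000)  # accepted, like the last query
try:
    check_paging(returns_facet=True, limit=1, skip=1000)
except ValueError as err:
    print(err)  # Offset is not supported for facet results
```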

### Just make a mistake, and you will get the complete list of available facets

In [16]:
dsl.query("""
search publications 
return years 
""")

Returned Errors: 1
Time: 0.46s
Semantic Error
Semantic errors found:
	Facet 'years' is not present in source 'publications'. Available facets are: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,experts,funder_countries,funders,journal,journal_lists,mesh_terms,open_access_categories,pf01,publisher,referenced_pubs,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,times_cited,type,year


<dimcli.DslDataset object #4779131712. Errors: 1>