# Exploring The Dimensions Search Language (DSL) - Quick Intro

This notebook takes you through the basics of using the Dimensions Search Language (DSL) via the Dimensions API.

> See also: [official DSL documentation online](https://docs.dimensions.ai/dsl/)

In this tutorial we leverage the capabilities of the [Dimcli library](https://github.com/lambdamusic/dimcli) in the context of Jupyter Notebooks. Dimcli is an open source Python library that simplifies common operations like logging in, querying and displaying results. 


In [1]:
import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))

==
CHANGELOG
This notebook was last run on Jan 24, 2022
==


### Prerequisites

This notebook assumes you have installed the [Dimcli](https://pypi.org/project/dimcli/) library and are familiar with the ['Getting Started' tutorial](https://api-lab.dimensions.ai/cookbooks/1-getting-started/1-Using-the-Dimcli-library-to-query-the-API.html).

In [1]:
!pip install dimcli -U --quiet 

import dimcli
from dimcli.utils import *
import sys

print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
  import getpass
  KEY = getpass.getpass(prompt='API Key: ')  
  dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
  KEY = ""
  dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()

Searching config file credentials for 'https://app.dimensions.ai' endpoint..


==
Logging in..
Dimcli - Dimensions API Client (v0.9.6)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.0
Method: dsl.ini file


## What the query statistics refer to

When you run a DSL search, the response includes a `_stats` object with some useful information, e.g. the total number of records matching the search.

In [2]:
res1 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications""", verbose=False)
print(res1.stats)  # shorthand for res1.json['_stats']

{'total_count': 5807}




It is important to note though that the **total number always refers to the main source** one is searching for, not necessarily the results being returned. For example, in this query we return `researchers` linked to publications: 

In [3]:
res2 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers""", verbose=False)
print(res2.stats)

{'total_count': 5807}


Still 5807 records! That's because the total count refers to the publications being searched, not to the *facet* (here, `researchers`) being returned.
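To make the distinction concrete without calling the API, here is a minimal sketch on plain dictionaries. The response layout (a `_stats` block plus a list named after what was returned) is modelled on the DSL's JSON output; the `count` key on facet rows, the IDs and the helper `describe` are hypothetical.

```python
# Hypothetical raw DSL responses (made-up data), shaped like the API's JSON:
# a "_stats" block plus a list keyed by the returned source or facet name.
source_response = {
    "_stats": {"total_count": 5807},
    "publications": [{"id": "pub.0000000001"}, {"id": "pub.0000000002"}],
}
facet_response = {
    "_stats": {"total_count": 5807},
    "researchers": [
        {"id": "ur.0000000001.01", "count": 12},
        {"id": "ur.0000000002.01", "count": 7},
        {"id": "ur.0000000003.01", "count": 3},
    ],
}


def describe(response, key):
    """Return (rows returned under `key`, total_count of the searched source)."""
    rows = len(response.get(key, []))
    total = response["_stats"]["total_count"]
    return rows, total


print(describe(source_response, "publications"))  # (2, 5807)
print(describe(facet_response, "researchers"))    # (3, 5807)
```

In both cases `total_count` stays at 5807, while the number of rows depends on what was returned.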


Tip: the same basic information is also available via the `count_batch` and `count_total` attributes of the query result object.

In [4]:
result = dsl.query("""
     search publications
       for "malaria AND congo"
     return publications[basics]
     limit 30
""", verbose=False)
# print some stats using the Result object
print("Results in this batch: ", result.count_batch)
print("Results in total: ", result.count_total)
print("Errors: ", result.errors)

Results in this batch:  30
Results in total:  86890
Errors:  None
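Conceptually, `count_batch` is the number of records in the current response (bounded by `limit`) and `count_total` is the `total_count` from `_stats`. The class below is a hypothetical illustration of that idea on a raw JSON dictionary; it is not how Dimcli implements these attributes.

```python
class MiniResult:
    """Hypothetical stand-in for a query result, NOT Dimcli's implementation.

    It only illustrates the difference between batch and total counts
    for a raw DSL JSON response stored as a dict.
    """

    def __init__(self, data, source):
        self.data = data      # raw JSON response
        self.source = source  # key holding the returned records, e.g. "publications"

    @property
    def count_batch(self):
        # Records actually returned in this response (bounded by `limit`)
        return len(self.data.get(self.source, []))

    @property
    def count_total(self):
        # Records matching the search overall, read from the "_stats" block
        return self.data.get("_stats", {}).get("total_count", 0)


# Made-up response: 30 records returned out of 86890 matches
data = {
    "_stats": {"total_count": 86890},
    "publications": [{"id": f"pub.{i}"} for i in range(30)],
}
r = MiniResult(data, "publications")
print(r.count_batch, r.count_total)  # 30 86890
```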


## Working with fields

Note: in the following examples we use the `%%dsldf` magic command (and its single-line form `%dsldf`) for quicker querying; it runs a query and returns the results as a pandas DataFrame.

### Control the fields you return

In [5]:
%%dsldf 

search publications
return publications[id+title+year+doi]
limit 5

Returned Publications: 5 (total = 124736479)
Time: 2.29s


| | doi | id | title | year |
|---|---|---|---|---|
| 0 | 10.13170/depik.10.3.22492 | pub.1144593888 | Profile of ectoparasites and biometric conditi... | 2022 |
| 1 | 10.1007/s11708-021-0812-6 | pub.1144587500 | Experimental study of stratified lean burn cha... | 2022 |
| 2 | 10.1145/3480027 | pub.1141731113 | Opportunities and Challenges in Code Search Tools | 2022 |
| 3 | 10.1145/3479393 | pub.1141731112 | Ransomware Mitigation in the Modern Era: A Com... | 2022 |
| 4 | 10.1145/3478680 | pub.1141731111 | Service Computing for Industry 4.0: State of t... | 2022 |
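When the list of fields is built in Python, a small helper can assemble the `return` clause, joining field names with `+` as in the cell above. `return_clause` and `build_query` are hypothetical names; the sketch only produces the query text, which you could then pass to `dsl.query`.

```python
def return_clause(source, fields):
    """Build a DSL return clause such as `return publications[id+title]`."""
    if not fields:
        # No explicit fields: the DSL falls back to its default fieldset
        return f"return {source}"
    return f"return {source}[{'+'.join(fields)}]"


def build_query(source, fields, limit=5):
    """Assemble a minimal search/return/limit query as plain text."""
    return f"search {source}\n{return_clause(source, fields)}\nlimit {limit}"


print(build_query("publications", ["id", "title", "year", "doi"]))
```

With an empty field list it falls back to a plain `return publications`, which (as in the queries further down) returns a default set of fields.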


### Make a mistake, and the DSL will tell you which fields you could have used

In [6]:
%%dsldf 

search publications 
return publications[dois]
limit 100

Returned Errors: 1
Time: 4.06s
1 QueryError found
Semantic errors found:
	Field / Fieldset 'dois' is not present in Source 'publications'. Available fields: abstract,acknowledgements,altmetric,altmetric_id,arxiv_id,authors,authors_count,book_doi,book_series_title,book_title,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_uoa,clinical_trial_ids,concepts,concepts_scores,date,date_inserted,date_online,date_print,dimensions_url,doi,field_citation_ratio,funder_countries,funders,id,issn,issue,journal,journal_lists,journal_title_raw,linkout,mesh_terms,open_access,pages,pmcid,pmid,proceedings_title,publisher,recent_citations,reference_ids,referenced_pubs,relative_citation_ratio,research_org_cities,research_org_countries,research_org_country_names,research_org_names,research_org_state_codes,research_org_state_names,research_orgs,researchers,resulting_publication_doi,source_title,subtitles,su
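Since the error message lists every valid name, you can also pick likely corrections programmatically. This sketch assumes the message format shown here (`Available fields: ...`, or `Available facets are: ...` for facets) and uses the standard library's `difflib.get_close_matches`; `suggest` is a hypothetical helper and the message below is a shortened copy.

```python
import difflib
import re


def suggest(name, error_message, n=3):
    """Suggest valid names from a DSL 'Available fields/facets' error message."""
    match = re.search(r"Available (?:fields|facets are):\s*(.+)", error_message)
    if not match:
        return []
    candidates = [c.strip() for c in match.group(1).split(",") if c.strip()]
    # Rank candidates by string similarity to the mistyped name
    return difflib.get_close_matches(name, candidates, n=n)


# Shortened copy of the error text above (the real list is much longer)
msg = ("Field / Fieldset 'dois' is not present in Source 'publications'. "
       "Available fields: abstract,altmetric,book_doi,date,doi,id,issn,title,year")
print(suggest("dois", msg))  # ['doi']
```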

## Full text search

You can run a full-text search against the full text, the title and abstract only, or the title only; other search indexes, such as `concepts` and `authors`, are also available.

In [8]:
%dsldf search publications in concepts for "situ detection OR malaria" return publications

Returned Publications: 20 (total = 238349)
Time: 2.24s


| | authors | id | pages | title | type | volume | year | journal.id | journal.title | issue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [{'affiliations': [{'city': 'Qingdao', 'city_i... | pub.1143924946 | 111-122 | In-situ constructing visible light CdS/Cd-MOF ... | article | 69 | 2022 | jour.1138885 | Particuology | |
| 1 | [{'affiliations': [{'city': 'Johor Bahru', 'ci... | pub.1141511490 | 27-34 | In situ biosynthesized silver nanoparticle-inc... | article | 67 | 2022 | jour.1138885 | Particuology | |
| 2 | [{'affiliations': [{'city': 'Tianjin', 'city_i... | pub.1141114620 | 59-70 | Thermodynamic and kinetic mechanism of phase t... | article | 66 | 2022 | jour.1138885 | Particuology | |
| 3 | [{'affiliations': [{'city': 'Montpellier', 'ci... | pub.1144553393 | 107498 | Small angle x-ray scattering to investigate th... | article | 127 | 2022 | jour.1096852 | Food Hydrocolloids | |
| 4 | [{'affiliations': [{'city': 'Beijing', 'city_i... | pub.1144437185 | 199-206 | Dual-function redox mediator enhanced lithium-... | article | 113 | 2022 | jour.1053018 | Journal of Material Science and Technology | |
| 5 | [{'affiliations': [{'city': 'Wollongong', 'cit... | pub.1144337486 | 90-104 | Effects of inter-layer remelting frequency on ... | article | 113 | 2022 | jour.1053018 | Journal of Material Science and Technology | |
| 6 | [{'affiliations': [{'city': 'Taipei', 'city_id... | pub.1144230882 | 100831 | Traditional Chinese medicine attenuates hospit... | article | 11 | 2022 | jour.1048721 | Integrative Medicine Research | 2.0 |
| 7 | [{'affiliations': [{'city': 'Wuhan', 'city_id'... | pub.1143825067 | 1-10 | Solar fuel generation over nature-inspired rec... | article | 112 | 2022 | jour.1053018 | Journal of Material Science and Technology | |
| 8 | [{'affiliations': [{'city': 'Pretoria', 'city_... | pub.1143661252 | 153-161 | Heat-treatment effect on anti-corrosion behavi... | article | 5 | 2022 | jour.1319579 | International Journal of Lightweight Materials... | 2.0 |
| 9 | [{'affiliations': [{'city': 'Aachen', 'city_id... | pub.1143575354 | 100081 | Adjustment of chemical composition with dissim... | article | 5 | 2022 | jour.1386545 | Journal of Advanced Joining Processes | |


In [9]:
%%dsldf 

search publications in title_abstract_only for "nanotechnology"
return publications
limit 3

Returned Publications: 3 (total = 98598)
Time: 1.14s


| | authors | id | pages | title | type | volume | year | journal.id | journal.title | issue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [{'affiliations': [{'city': 'Guangzhou', 'city... | pub.1143936192 | 334-361 | Energetics Systems and artificial intelligence... | article | 8 | 2022 | jour.1150945 | Energy Reports | |
| 1 | [{'affiliations': [{'name': 'CAS Key Laborator... | pub.1143460385 | 31-48 | Toxicity of manufactured nanomaterials | article | 69 | 2022 | jour.1138885 | Particuology | |
| 2 | [{'affiliations': [{'city': 'Huzhou', 'city_id... | pub.1144622580 | 978-983 | The Effect of Bone Morphogenetic Protein 2 (BM... | article | 12 | 2022 | jour.1047400 | Journal of Biomaterials and Tissue Engineering | 5.0 |
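The two cells above differ only in the index named after `in`. A small builder can check that name before a query is sent. `KNOWN_INDEXES` is an assumed subset: `concepts`, `title_abstract_only` and `authors` appear in this notebook, while `full_data` and `title_only` are taken from the DSL documentation and worth double-checking there. `search_in` is a hypothetical helper.

```python
# Assumed subset of publication search indexes (see the note above)
KNOWN_INDEXES = {"concepts", "title_abstract_only", "authors", "full_data", "title_only"}


def search_in(index, terms, limit=10):
    """Build a full-text query restricted to a single search index."""
    if index not in KNOWN_INDEXES:
        raise ValueError(f"unsupported index {index!r}; expected one of {sorted(KNOWN_INDEXES)}")
    if '"' in terms:
        # Keep the sketch simple: embedded double quotes would need escaping
        raise ValueError("terms must not contain double quotes")
    return f'search publications in {index} for "{terms}" return publications limit {limit}'


print(search_in("title_abstract_only", "nanotechnology", limit=3))
```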


### A simple author search


In [10]:
%%dsldf 

search publications in authors for "\"Daniel Hook\""
return publications
limit 10

Returned Publications: 10 (total = 85)
Time: 1.14s


| | authors | id | title | type | year | journal.id | journal.title | issue | pages | volume |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1143968248 | Connecting Scientometrics: Dimensions as a rou... | preprint | 2021 | jour.1371339 | arXiv | | | |
| 1 | [{'affiliations': [{'city': 'St Louis', 'city_... | pub.1142152310 | PT -symmetric classical mechanics | article | 2021 | jour.1043366 | Journal of Physics Conference Series | 1.0 | 012003 | 2038.0 |
| 2 | [{'affiliations': [{'city': 'Townsville', 'cit... | pub.1141486003 | Can I breastfeed my baby with Down syndrome? A... | article | 2021 | jour.1057714 | Journal of Paediatrics and Child Health | 12.0 | 1866-1880 | 57.0 |
| 3 | [{'affiliations': [{'city': 'London', 'city_id... | pub.1137191304 | Scaling Scientometrics: Dimensions on Google B... | article | 2021 | jour.1292498 | Frontiers in Research Metrics and Analytics | | 656233 | 6.0 |
| 4 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1136235066 | $PT$-symmetric classical mechanics | preprint | 2021 | jour.1371339 | arXiv | | | |
| 5 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1134860042 | Scaling Scientometrics: Dimensions on Google B... | preprint | 2021 | jour.1371339 | arXiv | | | |
| 6 | [{'affiliations': [{'city': 'London', 'city_id... | pub.1134491856 | Real-Time Bibliometrics: Dimensions as a Resou... | article | 2021 | jour.1292498 | Frontiers in Research Metrics and Analytics | | 595299 | 5.0 |
| 7 | [{'affiliations': [{'name': 'Digital Science'}... | pub.1124226668 | Dimensions: Bringing down barriers between sci... | article | 2020 | jour.1377615 | Quantitative Science Studies | 1.0 | 387-395 | 1.0 |
| 8 | [{'affiliations': [{'city': 'Oxford', 'city_id... | pub.1115957159 | Perception, prestige and PageRank | article | 2019 | jour.1037553 | PLOS ONE | 5.0 | e0216783 | 14.0 |
| 9 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1119449118 | The Price of Gold: Curiosity? | preprint | 2019 | jour.1371339 | arXiv | | | |
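In the author query above, the outer double quotes delimit the DSL search string, while the escaped inner quotes (`\"`) ask for an exact phrase match. A hypothetical helper can generate that escaping for any name:

```python
def exact_phrase(phrase):
    """Wrap `phrase` in escaped quotes so the DSL treats it as an exact phrase."""
    # Escape any double quotes already inside the phrase
    inner = phrase.replace('"', '\\"')
    # Outer quotes delimit the DSL string; \" marks the exact-phrase boundaries
    return '"\\"' + inner + '\\""'


def author_query(name, limit=10):
    """Build an exact-phrase author search like the cell above."""
    return f"search publications in authors for {exact_phrase(name)}\nreturn publications\nlimit {limit}"


print(author_query("Daniel Hook"))
```

This is handy in plain Python code: inside a normal (non-raw) string literal, `\"` becomes a bare `"`, so a query copied verbatim from the cell above would lose its backslashes unless you double them or use a raw string.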


### ...or search for a researcher by a specific ID

In [11]:
%%dsldf 

search publications 
where researchers.id = "ur.013514345521.07"
return publications[doi+researchers]
limit 1

Returned Publications: 1 (total = 22)
Time: 2.68s


| | doi | researchers |
|---|---|---|
| 0 | 10.1201/9781003042570-10 | [{'first_name': 'Rashi', 'id': 'ur.01001350755... |
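Since each record lists its linked researchers, you can sanity-check that the `researchers.id` filter did what you expected. The records below are made up (apart from the researcher ID used in the query) and shaped like the `researchers` column above; both helpers are hypothetical.

```python
def researcher_ids(publication):
    """Return the researcher IDs listed on one publication record."""
    return [r["id"] for r in publication.get("researchers", []) if "id" in r]


def all_include(publications, researcher_id):
    """True if every record lists `researcher_id` among its researchers."""
    return all(researcher_id in researcher_ids(p) for p in publications)


target = "ur.013514345521.07"  # the ID used in the query above
# Made-up records; the other researcher ID is hypothetical
records = [
    {"doi": "10.0000/example-1",
     "researchers": [{"first_name": "A", "id": "ur.000000000001.01"},
                     {"first_name": "B", "id": target}]},
    {"doi": "10.0000/example-2",
     "researchers": [{"first_name": "B", "id": target}]},
]
print(all_include(records, target))  # True
```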


## Sources vs Facets
One of the queries above uses the `researchers` facet of the `publications` source.

In general, source queries can return at most 1000 records per request. For example, this query returns an error:

In [12]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 2000
  """)

Returned Errors: 1
Time: 0.57s
1 QueryError found
Semantic errors found:
	Limit 2000 exceeds maximum allowed limit 1000


<dimcli.DslDataset object #4812964912. Errors: 1>

### You can paginate through *source* results up to 50000 rows

With [sources](https://docs.dimensions.ai/dsl/data-sources.html), you can use the [limit/skip syntax](https://docs.dimensions.ai/dsl/language.html#paginating-results) in order to paginate through results:

In [13]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 1000 skip 1000
  """)

Returned Publications: 1000 (total = 5807)
Time: 2.40s


<dimcli.DslDataset object #4407315520. Records: 1000/5807>
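The same limit/skip pattern can be generated in a loop. This sketch only builds the query strings, using `total_count` from a first response (5807 here) and capping at the 50,000 rows mentioned in the heading; `paginated_queries` is a hypothetical helper. Dimcli also provides `dsl.query_iterative`, which handles this looping for you.

```python
def paginated_queries(base_query, total, page_size=1000, max_records=50000):
    """Yield one DSL query per page using limit/skip.

    `total` would typically come from a first response's total_count;
    `max_records` reflects the 50,000-row pagination ceiling.
    """
    if not 1 <= page_size <= 1000:
        raise ValueError("page_size must be between 1 and 1000")
    stop = min(total, max_records)
    for skip in range(0, stop, page_size):
        yield f"{base_query} limit {page_size} skip {skip}"


base = ('search publications where year in [2013:2018] '
        'and research_orgs="grid.258806.1" return publications')
queries = list(paginated_queries(base, total=5807))
print(len(queries))  # 6
print(queries[-1])   # ends with: limit 1000 skip 5000
```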

### You can return at most 1000 *facet* rows

It is important to remember that [when using facets](https://docs.dimensions.ai/dsl/language.html#returning-facets) you cannot use the *skip* operation, so the maximum number of facet rows you can retrieve is always 1000.


In [14]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1 skip 1000
  """)

Returned Errors: 1
Time: 0.95s
1 QueryError found
Semantic errors found:
	Offset is not supported for facet results


<dimcli.DslDataset object #4811599632. Errors: 1>
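The two rules seen so far (a 1000-row limit per request, and no *skip* for facets) can be checked client-side before a query is sent. This is a hypothetical sketch mirroring the error messages shown above; the server remains the authority on what it accepts.

```python
MAX_LIMIT = 1000  # per-request cap reported by the DSL error above


def check_request(kind, limit, skip=0):
    """Client-side check mirroring the two DSL errors shown above.

    kind: "source" (e.g. publications) or "facet" (e.g. researchers).
    """
    if kind not in ("source", "facet"):
        raise ValueError(f"unknown kind: {kind!r}")
    if limit > MAX_LIMIT:
        raise ValueError(f"Limit {limit} exceeds maximum allowed limit {MAX_LIMIT}")
    if kind == "facet" and skip:
        raise ValueError("Offset is not supported for facet results")
    return True


print(check_request("source", 1000, skip=1000))  # True: sources can paginate
print(check_request("facet", 1000))              # True: facets without skip
```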

While this works...

In [15]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1000
  """)

Returned Researchers: 1000
Time: 2.94s


<dimcli.DslDataset object #4811691728. Records: 1000/5807>

### Just make a mistake, and you will get the complete list of available facets

In [16]:
dsl.query("""
search publications 
return years 
""")

Returned Errors: 1
Time: 0.74s
1 QueryError found
Semantic errors found:
	Facet 'years' is not present in source 'publications'. Available facets are: authors_count,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_uoa,funder_countries,funders,journal,journal_lists,mesh_terms,open_access,publisher,referenced_pubs,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,source_title,times_cited,type,year


<dimcli.DslDataset object #4811597088. Errors: 1>