# Exploring The Dimensions Search Language (DSL) - Quick Intro

This notebook walks through the basics of querying the Dimensions API with the Dimensions Search Language (DSL).

> See also: [official DSL documentation online](https://docs.dimensions.ai/dsl/)

In this tutorial we leverage the capabilities of the [Dimcli library](https://github.com/lambdamusic/dimcli) in the context of Jupyter Notebooks. Dimcli is an open source Python library that simplifies common operations like logging in, querying and displaying results. 


### Prerequisites

This notebook assumes you have installed the [Dimcli](https://pypi.org/project/dimcli/) library and are familiar with the *Getting Started* tutorial.

In [1]:
!pip install dimcli -U --quiet

In [2]:
username = "" 
password = "" 
endpoint = "https://app.dimensions.ai" 

# import all libraries and login
import dimcli
dimcli.login(username, password, endpoint)
dsl = dimcli.Dsl()

Dimcli - Dimensions API Client (v0.6.9)
Connected to endpoint: https://app.dimensions.ai - DSL version: 1.24
Method: dsl.ini file


## What the query statistics refer to

When you run a DSL search, the response includes a `_stats` object containing useful information, such as the total number of records matching the search.

In [3]:
res1 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications""", verbose=False)
print(res1.stats)  # shorthand for res1.json['_stats']

{'total_count': 3769}




It is important to note though that the **total number always refers to the main source** one is searching for, not necessarily the results being returned. For example, in this query we return `researchers` linked to publications: 

In [4]:
res2 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers""", verbose=False)
print(res2.stats)

{'total_count': 3769}


Still 3769 records! That's because the total count always refers to the main object type one is searching for (here, publications), not to the *facet* being returned.
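
To make the distinction concrete, here is a minimal sketch that works on a hand-written dictionary shaped like the JSON behind a query result. Only the `_stats` and `total_count` keys are taken from the output above; the researcher entries and the `summarize` helper are hypothetical and exist purely for illustration.

```python
# Hypothetical payload shaped like the JSON behind a query result.
# Only `_stats` / `total_count` come from the notebook output;
# the researcher entries are invented placeholders.
payload = {
    "_stats": {"total_count": 3769},
    "researchers": [
        {"id": "ur.placeholder.1"},
        {"id": "ur.placeholder.2"},
    ],
}


def summarize(payload, returned_key):
    """Return (records matching in the main source, rows actually returned)."""
    total = payload["_stats"]["total_count"]
    returned = len(payload.get(returned_key, []))
    return total, returned


total, returned = summarize(payload, "researchers")
print(f"publications matching the search: {total}")
print(f"researchers returned in this response: {returned}")
```

The two numbers answer different questions: the first counts publications, the second counts the facet rows that came back.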


Tip: this basic information about the objects returned is also available via the `count_batch` and `count_total` properties of the query results object.

In [5]:
result = dsl.query("""
     search publications
       for "malaria AND congo"
     return publications[basics]
     limit 30
""", verbose=False)
# print some stats using the Result object
print("Results in this batch: ", result.count_batch)
print("Results in total: ", result.count_total)
print("Errors: ",result.errors)

Results in this batch:  30
Results in total:  66828
Errors:  None


## Working with fields

Note: in the following examples we use the magic command `%%dsldf` for quicker querying. 

### Control the fields you return

In [6]:
%%dsldf 

search publications
return publications[id+title+year+doi]
limit 5

Returned Publications: 5 (total = 109848482)


Unnamed: 0,title,doi,year,id
0,Visual research on the trustability of classic...,10.15672/hujms.630402,2020,pub.1125931386
1,"5. ‘Martyrs of Love’. Genesis, Development and...",10.1515/9789048540211-008,2020,pub.1125801610
2,"Introduction: Murra, Materialism, Anthropology...",10.7591/9781501734977-002,2020,pub.1125788851
3,22. Structure and application of the slanting ...,10.7591/9781501737688-031,2020,pub.1125789246
4,4. Perpetual Contest,10.1515/9789048540211-007,2020,pub.1125801609
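
When the list of fields is decided at run time, the `return` clause can be assembled from a Python list. The helper below is a small sketch, not part of Dimcli: `return_clause` is a hypothetical name, and it simply joins field names with `+` as in the query above.

```python
def return_clause(source, fields):
    """Build a clause such as `return publications[id+title]`."""
    if not fields:
        # No explicit fields: let the DSL use its default fieldset.
        return f"return {source}"
    return f"return {source}[{'+'.join(fields)}]"


query = "\n".join([
    "search publications",
    return_clause("publications", ["id", "title", "year", "doi"]),
    "limit 5",
])
print(query)
```

The resulting string can then be passed to `dsl.query(query)` in place of a hand-written query.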


### Make a mistake, and the DSL will tell you which fields you could have used

In [7]:
%%dsldf 

search publications 
return publications[dois]
limit 100

Returned Errors: 1
Semantic Error
Semantic errors found:
	Field / Fieldset 'dois' is not present in Source 'publications'. Available fields: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,altmetric,altmetric_id,author_affiliations,authors,book_doi,book_series_title,book_title,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,concepts,date,date_inserted,doi,field_citation_ratio,funder_countries,funders,id,issn,issue,journal,journal_lists,linkout,mesh_terms,open_access,open_access_categories,pages,pmcid,pmid,proceedings_title,publisher,recent_citations,reference_ids,references,relative_citation_ratio,research_org_cities,research_org_countries,research_org_country_names,research_org_names,research_org_state_codes,research_org_state_names,research_orgs,researchers,resulting_publication_doi,supporting_grant_ids,terms,times_cited,title,type,volume,year and available fieldsets: all,basics,book,
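
Since the error message lists the valid field names, that list can be used to guess what was meant. The sketch below uses `difflib` from the Python standard library on a hand-copied subset of the fields above; `suggest_field` is a hypothetical helper and the 0.6 similarity cutoff is an arbitrary choice.

```python
import difflib

# A subset of the field names listed in the error message above.
available_fields = ["authors", "book_doi", "date", "doi", "id", "title", "year"]


def suggest_field(wrong_name, fields, n=1):
    """Return up to n valid field names that look similar to wrong_name."""
    return difflib.get_close_matches(wrong_name, fields, n=n, cutoff=0.6)


print(suggest_field("dois", available_fields))
```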

### Get all fields

In [8]:
%%dsldf 

search publications 
  for "malaria"
return publications[all]
limit 1

Returned Publications: 1 (total = 756489)
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'RCDC' is deprecated in favor of category_rcdc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'open_access' is deprecated in favor of open_access_categories. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_RAC' is deprecated in favor of category_hrcs_rac. Please refer to https://docs.dimensions.ai/dsl/releasen

Unnamed: 0,doi,open_access_categories,volume,date,date_inserted,recent_citations,type,category_sdg,altmetric,year,...,pmcid,title,research_org_country_names,FOR_first,pages,FOR,pmid,id,journal.id,journal.title
0,10.1080/16549716.2019.1711335,"[{'id': 'oa_all', 'description': 'Article is f...",13,2020-12-31,2020-01-21,0,article,[],13.0,2020,...,PMC7006634,The gender responsiveness of social marketing ...,[Switzerland],"[{'id': '2211', 'name': '11 Medical and Health...",1711335,"[{'id': '3177', 'name': '1117 Public Health an...",31955668,pub.1124196727,jour.1041075,Global Health Action


## Full text search

You can run a keyword search over the full text of a document, or restrict it to a specific index such as the title and abstract, concepts, or authors.

In [9]:
%dsldf search publications in concepts for "situ detection OR malaria" return publications

Returned Publications: 20 (total = 130031)


Unnamed: 0,author_affiliations,issue,year,type,title,id,pages,volume,journal.id,journal.title
0,"[[{'first_name': 'Jaco J.', 'last_name': 'Geuc...",1,2020,article,Unravelling three-dimensional adsorption geome...,pub.1125315284,28,3,jour.1319511,Communications Chemistry
1,"[[{'first_name': 'Shawn-Yu', 'last_name': 'Lin...",1,2020,article,An In-situ and Direct Confirmation of Super-Pl...,pub.1125822283,5209,10,jour.1045337,Scientific Reports
2,"[[{'first_name': 'Jiajun', 'last_name': 'Lu', ...",1,2020,article,Electric Field-Modulated Surface Enhanced Rama...,pub.1125830666,5269,10,jour.1045337,Scientific Reports
3,"[[{'first_name': 'Andrés', 'last_name': 'Gonza...",1,2020,article,Ninhydrin reaction with phenylethylamine: unav...,pub.1125164078,49,132,jour.1048512,Journal of Chemical Sciences
4,"[[{'first_name': 'Xiaokang', 'last_name': 'Wan...",1,2020,article,Mechanical breathing in organic electrochromics,pub.1124004513,211,11,jour.1043282,Nature Communications
5,"[[{'first_name': 'Garth R.', 'last_name': 'Ils...",1,2020,article,Finding cell-specific expression patterns in t...,pub.1125710140,4961,10,jour.1045337,Scientific Reports
6,"[[{'first_name': 'Gökhan', 'last_name': 'Gizer...",1,2020,article,Improved kinetic behaviour of Mg(NH2)2-2LiH do...,pub.1123884363,8,10,jour.1045337,Scientific Reports
7,"[[{'first_name': 'Aditya', 'last_name': 'Bhale...",1,2020,article,In vitro modeling of the neurovascular unit: a...,pub.1125684242,22,17,jour.1034919,Fluids and Barriers of the CNS
8,"[[{'first_name': 'Arthur', 'last_name': 'Leis'...",1,2020,article,Room temperature in-situ measurement of the sp...,pub.1124935086,2816,10,jour.1045337,Scientific Reports
9,"[[{'first_name': 'Greta', 'last_name': 'Giljan...",1,2020,article,Bacterioplankton reveal years-long retention o...,pub.1125615963,4715,10,jour.1045337,Scientific Reports


In [10]:
%%dsldf 

search publications in title_abstract_only for "nanotechnology"
return publications
limit 3

Returned Publications: 3 (total = 75811)


Unnamed: 0,title,author_affiliations,volume,issue,pages,type,year,id,journal.id,journal.title
0,The inventions in nanotechnologies as practica...,"[[{'first_name': 'L.A.', 'last_name': 'Ivanov'...",11,6,719-729,article,2020,pub.1125757275,jour.1153140,Nanotechnologies in Construction A Scientific ...
1,"Phyto-Synthesis, Characterization and Biologic...","[[{'first_name': 'Hamed A.', 'last_name': 'Ghr...",9,12,1628-1634,article,2020,pub.1127485410,jour.1047400,Journal of Biomaterials and Tissue Engineering
2,Exploring the configuration spaces of surface ...,"[[{'first_name': 'Daniel M.', 'last_name': 'Pa...",10,1,5868,article,2020,pub.1126103870,jour.1045337,Scientific Reports


### A simple author search


In [11]:
%%dsldf 

search publications in authors for "\"Daniel Hook\""
return publications
limit 10

Returned Publications: 10 (total = 78)


Unnamed: 0,id,title,volume,author_affiliations,type,year,issue,pages,journal.id,journal.title
0,pub.1124226668,Dimensions: Bringing down barriers between sci...,1.0,"[[{'first_name': 'Christian', 'last_name': 'He...",article,2020,1.0,387-395,jour.1377615,Quantitative Science Studies
1,pub.1115957159,"Perception, prestige and PageRank",14.0,"[[{'first_name': 'David', 'last_name': 'Zeitly...",article,2019,5.0,e0216783,jour.1037553,PLoS ONE
2,pub.1119449118,The Price of Gold: Curiosity?,,"[[{'first_name': 'Daniel W.', 'last_name': 'Ho...",preprint,2019,,,jour.1371339,arXiv
3,pub.1118864658,"Perception, Prestige and PageRank",,"[[{'first_name': 'David', 'last_name': 'Zeitly...",preprint,2019,,,jour.1371339,arXiv
4,pub.1108567148,PT Symmetry,,"[[{'first_name': 'Carl M', 'last_name': 'Bende...",monograph,2019,,,,
5,pub.1111011264,Optical Fiber Sensor Design for Ground Slope M...,0.0,"[[{'first_name': 'Daniel', 'last_name': 'Hook'...",proceeding,2018,,1-4,,
6,pub.1106289502,Dimensions: Building Context for Search and Ev...,3.0,"[[{'first_name': 'Daniel W.', 'last_name': 'Ho...",article,2018,,23,jour.1292498,Frontiers in Research Metrics and Analytics
7,pub.1105321123,Assessment of the interaction between a natura...,41.0,"[[{'first_name': 'Daniel', 'last_name': 'Hook'...",article,2018,,s45,jour.1091476,Contact Lens and Anterior Eye
8,pub.1085413261,Characterization and quantitation of PVP conte...,106.0,"[[{'first_name': 'Andrew J.', 'last_name': 'Ho...",article,2018,3.0,1064-1072,jour.1312091,Journal of Biomedical Materials Research Part ...
9,pub.1085511076,Behavior of eigenvalues in a region of broken ...,95.0,"[[{'first_name': 'Carl M.', 'last_name': 'Bend...",article,2017,5.0,052113,jour.1053349,Physical Review A
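
The escaped inner quotes in the query above are what turn the name into an exact phrase. When building such a query from Python rather than from a magic cell, the backslashes have to survive Python's own string escaping. The following is a minimal sketch under that assumption; `phrase` and `author_query` are hypothetical helpers, and names that themselves contain double quotes are not handled.

```python
def phrase(text):
    """Wrap text in backslash-escaped double quotes for an exact-phrase search."""
    return '\\"' + text + '\\"'


def author_query(name, limit=10):
    """Build an author search like the one in the cell above."""
    return (
        f'search publications in authors for "{phrase(name)}" '
        f"return publications limit {limit}"
    )


print(author_query("Daniel Hook"))
```

Printing the query before sending it is a quick way to check that the `\"` sequences are still present in the final string.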


### ...or search for a researcher by a specific ID

In [12]:
%%dsldf 

search publications 
where researchers.id = "ur.013514345521.07"
return publications[doi+researchers]
limit 1

Returned Publications: 1 (total = 15)


Unnamed: 0,doi,researchers
0,10.12928/telkomnika.v17i5.12802,"[{'id': 'ur.013505711524.10', 'first_name': 'R..."


## Sources VS Facets
One of the queries above is using the `researchers` facet of the `publications` source. 

In general, a source query can return at most 1000 records per call. For example, this query returns an error:

In [13]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 2000
  """)

Returned Errors: 1
Semantic Error
Semantic errors found:
	Limit 2000 exceeds maximum allowed limit 1000


<dimcli.DslDataset object #4733978448. Errors: 1>

### You can paginate through *source* results up to 50000 rows

With [sources](https://docs.dimensions.ai/dsl/data-sources.html), you can use the [limit/skip syntax](https://docs.dimensions.ai/dsl/language.html#paginating-results) in order to paginate through results:

In [14]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 1000 skip 1000
  """)

Returned Publications: 1000 (total = 3769)


<dimcli.DslDataset object #4748413712. Records: 1000/3769>
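
To walk through a whole result set, the same query can be repeated with an increasing `skip` value. The generator below is a sketch that only builds the query strings; it makes no API calls. `paged_queries` is a hypothetical helper, the two constants are taken from the limits quoted in this notebook, and it assumes that `skip 0` is accepted for the first page.

```python
MAX_LIMIT = 1000   # per-query limit reported by the error message above
MAX_ROWS = 50000   # pagination ceiling stated in this section's heading


def paged_queries(base_query, source, total, page_size=MAX_LIMIT):
    """Yield one query string per page, using the limit/skip syntax."""
    if not 0 < page_size <= MAX_LIMIT:
        raise ValueError(f"page_size must be between 1 and {MAX_LIMIT}")
    reachable = min(total, MAX_ROWS)  # rows beyond the ceiling cannot be paged
    for skip in range(0, reachable, page_size):
        yield f"{base_query} return {source} limit {page_size} skip {skip}"


base = 'search publications where year in [2013:2018] and research_orgs="grid.258806.1"'
queries = list(paged_queries(base, "publications", total=3769))
for q in queries:
    print(q)
```

Each string could then be passed to `dsl.query`, with `total` read from `count_total` on the first result.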

### You can return at most 1000 *facet* rows

It is important to remember that [when using facets](https://docs.dimensions.ai/dsl/language.html#returning-facets) you cannot use the *skip* operation so the maximum number of records is always 1000. 


In [15]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1 skip 1000
  """)

Returned Errors: 1
Semantic Error
Semantic errors found:
	Offset is not supported for facet results


<dimcli.DslDataset object #4748450064. Errors: 1>

While this works...

In [16]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1000
  """)

Returned Researchers: 1000


<dimcli.DslDataset object #4753798096. Records: 1000/3769>

### Just make a mistake, and you will get the complete list of available facets

In [17]:
dsl.query("""
search publications 
return years 
""")

Returned Errors: 1
Semantic Error
Semantic errors found:
	Facet 'years' is not present in source 'publications'. Available facets are: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,experts,funder_countries,funders,journal,journal_lists,mesh_terms,open_access_categories,pf01,publisher,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,times_cited,type,year


<dimcli.DslDataset object #4753799888. Errors: 1>