<a href="https://colab.research.google.com/github/digital-science/dimensions-api-lab/blob/master/3-workshops/2019-09-Rome-University-ISSI-conference/1-Exploring-the-Dimensions-Search-Language.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Dimensions API Lab In Google Colab"/></a>

# 1. Exploring The Dimensions Search Language (DSL)

This notebook takes you through the basics of using the Dimensions API with Jupyter notebooks.

> See also: [official DSL documentation online](https://docs.dimensions.ai/dsl/)

In this and the other notebooks in this tutorial we will be using the [Dimcli library](https://github.com/lambdamusic/dimcli). Dimcli is an open source Python library with commands and utilities that make it easier to interact with the Dimensions API from Python notebooks.

```python
from dimcli.shortcuts import dslquery
```

## The Basics

The `dslquery` function works in the same way as the query interface of the Dimensions API:
* submit your query
* get back JSON results
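To make "get back JSON results" concrete, here is a minimal sketch of the kind of dictionary a publications search returns, and of how the one-line summary that `dslquery` prints could be derived from it. The top-level keys `_stats` and `publications` match the dict keys shown in the outputs in this notebook; the nested `total_count` key and the `summarize` helper are assumptions made for illustration, not part of Dimcli's API.

```python
import json

# Hypothetical raw JSON shaped like a Dimensions API publications search.
# ASSUMPTION: the overall hit count lives under _stats.total_count.
raw = """
{
  "_stats": {"total_count": 1},
  "publications": [{"id": "pub.1104296509"}]
}
"""


def summarize(result: dict, source: str) -> str:
    """Build a summary line similar to the one dslquery prints (hypothetical helper)."""
    returned = len(result.get(source, []))
    total = result["_stats"]["total_count"]
    return f"Returned {source.capitalize()}: {returned} (total = {total})"


result = json.loads(raw)
print(summarize(result, "publications"))
# Returned Publications: 1 (total = 1)
```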

```python
# A basic query
dslquery("""
     search publications
       where id = "pub.1104296509"
     return publications[all]
     limit 1
""")
```

```
Returned Publications: 1 (total = 1)
```




## The Basics: control the fields you return

```python
# A basic query: control the fields you return
dslquery("""
     search publications
     return publications[id+title+year]
     limit 5
""")
```

```
Returned Publications: 5 (total = 103377517)

<dimcli.Result object #4792658000. Dict keys: '_stats', 'publications'>
```
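Since the return clause is just field names joined with `+`, it can be assembled from a Python list. The sketch below uses a hypothetical `build_query` helper (not part of Dimcli) that produces the same DSL text as the cell above; the resulting string could then be passed to `dslquery`.

```python
def build_query(source: str, fields: list[str], limit: int) -> str:
    """Assemble a simple DSL search string (hypothetical helper, not part of Dimcli)."""
    # Fields in a DSL return clause are joined with '+', e.g. publications[id+title+year]
    field_spec = "+".join(fields)
    return (
        f"search {source}\n"
        f"return {source}[{field_spec}]\n"
        f"limit {limit}"
    )


query = build_query("publications", ["id", "title", "year"], limit=5)
print(query)
# search publications
# return publications[id+title+year]
# limit 5
```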

## The Basics: make a mistake, and the DSL will tell you which fields you could have used

```python
# Make a mistake, and the DSL will tell you which fields are available
badquery = dslquery("""
     search publications
     return publications[dois]
     limit 100
""")

print(badquery['errors']['query']['details'][0])
```

```
Returned Errors: 1
Semantic errors found:
	Field / Fieldset 'dois' is not present in Source 'publications'. Available fields: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,abstract,altmetric,altmetric_id,author_affiliations,authors,book_doi,book_series_title,book_title,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_rcdc,concepts,date,date_inserted,doi,field_citation_ratio,funder_countries,funders,id,issn,issue,journal,journal_lists,linkout,mesh_terms,open_access,open_access_categories,pages,pmcid,pmid,proceedings_title,publisher,recent_citations,reference_ids,references,relative_citation_ratio,research_org_cities,research_org_countries,research_org_country_names,research_org_state_codes,research_org_state_names,research_orgs,researchers,supporting_grant_ids,terms,times_cited,title,type,volume,year and available fieldsets: all,basics,book,categories,extras
```
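The error detail is plain text, so the list of valid fields can be pulled out of it with a regular expression, and the standard library's `difflib` can then suggest the field that was probably meant. This is a sketch that assumes the message keeps the exact `Available fields: ... and available fieldsets: ...` wording shown above; `parse_available` is a hypothetical helper, and the message below is a shortened copy of the real one.

```python
import difflib
import re

# Shortened copy of the error detail printed above.
detail = (
    "Field / Fieldset 'dois' is not present in Source 'publications'. "
    "Available fields: FOR,abstract,doi,id,title,year "
    "and available fieldsets: all,basics,book,categories,extras"
)


def parse_available(message: str) -> tuple[list[str], list[str]]:
    """Split a DSL 'not present' error into (fields, fieldsets).

    ASSUMPTION: the message wording matches the example above.
    """
    match = re.search(
        r"Available fields: (?P<fields>\S+) and available fieldsets: (?P<fieldsets>\S+)",
        message,
    )
    if match is None:
        return [], []
    return match["fields"].split(","), match["fieldsets"].split(",")


fields, fieldsets = parse_available(detail)
print("doi" in fields, "dois" in fields)  # True False
print(difflib.get_close_matches("dois", fields))  # ['doi']
```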


```python
# A basic query: get all fields
dslquery("""
     search publications
     return publications[all]
     limit 1
""")
```

```
Returned Publications: 1 (total = 103377517)
```




## The Basics: you can search the full text, or only titles and abstracts

```python
dslquery("""
     search publications for "nanotechnology"
     return publications
     limit 1
""")
```

```
Returned Publications: 1 (total = 760669)

<dimcli.Result object #4792711056. Dict keys: '_stats', 'publications'>
```

```python
dslquery("""
     search publications in title_abstract_only for "nanotechnology"
     return publications
     limit 1
""")
```

```
Returned Publications: 1 (total = 40595)

<dimcli.Result object #4792741584. Dict keys: '_stats', 'publications'>
```
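The two queries above differ only in the optional `in <index>` part of the search clause. A minimal sketch using a hypothetical `search_clause` helper: with no index it reproduces the first query, and passing `title_abstract_only` reproduces the second. Only the index name used in this notebook appears here; the DSL documentation lists the others.

```python
from typing import Optional


def search_clause(source: str, term: str, index: Optional[str] = None) -> str:
    """Build the 'search ... for ...' part of a DSL query (hypothetical helper)."""
    # Without an explicit index the DSL uses its default search index.
    scope = f" in {index}" if index else ""
    return f'search {source}{scope} for "{term}"'


print(search_clause("publications", "nanotechnology"))
# search publications for "nanotechnology"
print(search_clause("publications", "nanotechnology", index="title_abstract_only"))
# search publications in title_abstract_only for "nanotechnology"
```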

## The Basics: a simple author search

```python
dslquery("""
     search publications in authors for "\\"Daniel Hook\\""
     return publications
     limit 10
""")
```

```
Returned Publications: 10 (total = 49)

<dimcli.Result object #4792820432. Dict keys: '_stats', 'publications'>
```
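The doubled backslashes are needed because the query is a normal (non-raw) Python string: Python turns each `\\"` into `\"`, so the DSL receives `"\"Daniel Hook\""`, a quoted term with escaped inner quotes so that the name is searched as a single phrase. One way to build that string without counting backslashes is `json.dumps`, which adds the outer quotes and escapes the inner ones. This is a sketch that assumes the DSL escapes quotes the same way JSON does, which holds for this simple case; `phrase` is a hypothetical helper.

```python
import json


def phrase(text: str) -> str:
    """Wrap text in escaped quotes for a phrase search (hypothetical helper).

    ensure_ascii=False keeps accented names as-is instead of \\uXXXX escapes.
    """
    return json.dumps(f'"{text}"', ensure_ascii=False)


term = phrase("Daniel Hook")
print(term)  # "\"Daniel Hook\""

# Same characters the notebook's "\\"Daniel Hook\\"" literal produces
print(term == '"\\"Daniel Hook\\""')  # True

query = f"search publications in authors for {term}\nreturn publications\nlimit 10"
```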

## ...or search for publications by a specific researcher ID

```python
# Publications by a given researcher ID, returning only the DOI and researchers fields
dslquery("""
     search publications
     where researchers.id = "ur.013514345521.07"
     return publications[doi+researchers]
     limit 1
""")
```

```
Returned Publications: 1 (total = 6)

<dimcli.Result object #4792748176. Dict keys: '_stats', 'publications'>
```
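Filtering on a single ID can be generalised to several IDs. The sketch below assumes the DSL's `in [...]` list syntax for filters (check the DSL documentation for the exact rules); `researchers_filter` is a hypothetical helper, and `ur.XXXX` is a placeholder rather than a real researcher ID.

```python
import json


def researchers_filter(ids: list[str]) -> str:
    """Build a 'where researchers.id in [...]' clause (hypothetical helper).

    ASSUMPTION: the DSL accepts a bracketed list of quoted IDs after 'in'.
    json.dumps renders the Python list as ["a", "b"], quoting each ID.
    """
    return f"where researchers.id in {json.dumps(ids)}"


clause = researchers_filter(["ur.013514345521.07", "ur.XXXX"])
print(clause)
# where researchers.id in ["ur.013514345521.07", "ur.XXXX"]
```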

## You can also return facets (aggregated counts) instead of records

```python
dslquery("""
     search publications
     return FOR_first
""")
```

```
Returned For_first: 20

<dimcli.Result object #4792822928. Dict keys: '_stats', 'FOR_first'>
```
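Each facet row is essentially a label plus a count, so simple summaries are a few lines of Python. This sketch assumes each row carries at least `name` and `count` keys; the rows below are placeholders, not real Dimensions figures. The shares are relative to the sum of the returned rows, which is not the same as the total number of publications, since a publication can have several categories or none.

```python
# Placeholder facet rows; ASSUMPTION: each row has at least "name" and "count".
rows = [
    {"id": "a", "name": "Category A", "count": 600},
    {"id": "b", "name": "Category B", "count": 300},
    {"id": "c", "name": "Category C", "count": 100},
]


def facet_shares(rows: list[dict]) -> list[tuple[str, float]]:
    """Return (name, share of the listed total) pairs, largest first."""
    total = sum(row["count"] for row in rows)
    ranked = sorted(rows, key=lambda row: row["count"], reverse=True)
    return [(row["name"], row["count"] / total) for row in ranked]


for name, share in facet_shares(rows):
    print(f"{name}: {share:.0%}")
# Category A: 60%
# Category B: 30%
# Category C: 10%
```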

## You can return up to 1000 facet rows

```python
dslquery("""
     search publications for "nanotechnology"
     return year limit 1000
""")
```

```
Returned Year: 59

<dimcli.Result object #4792697936. Dict keys: '_stats', 'year'>
```
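A year facet is a natural input for a timeline. The pandas sketch below assumes each row has the year under `id` and the number of hits under `count`, and that the facet limit was high enough to return every year; the rows are placeholders. Years with no facet row are filled with 0 so the series is continuous.

```python
import pandas as pd

# Placeholder year facet rows; ASSUMPTION: "id" holds the year, "count" the hits.
rows = [
    {"id": 2017, "count": 50},
    {"id": 2019, "count": 70},
    {"id": 2016, "count": 40},
]

timeline = (
    pd.DataFrame(rows)
    .rename(columns={"id": "year"})
    .set_index("year")["count"]
    .sort_index()
)
# Insert missing years (here 2018) with a count of 0
full_range = range(timeline.index.min(), timeline.index.max() + 1)
timeline = timeline.reindex(full_range, fill_value=0)
print(timeline)
```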

## Just make a mistake, and you will get the complete list of available facets

```python
dslquery("""
     search publications
     return years
""")
```

```
Returned Errors: 1

<dimcli.Result object #4791936528. Dict keys: 'errors'>
```
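When running many queries it helps to check for errors programmatically. This sketch follows the `result['errors']['query']['details']` path used earlier in this notebook and returns `None` when that path is missing; `first_error_detail` is a hypothetical helper, and the error text below is a placeholder rather than a real DSL message.

```python
from typing import Optional


def first_error_detail(result) -> Optional[str]:
    """Return the first DSL error detail, or None if there is none (hypothetical helper).

    Uses plain subscripting, the same access pattern as badquery['errors'] above.
    """
    try:
        return result["errors"]["query"]["details"][0]
    except (KeyError, IndexError, TypeError):
        return None


ok = {"_stats": {"total_count": 1}, "year": []}
bad = {"errors": {"query": {"details": ["placeholder error text"]}}}
print(first_error_detail(ok))   # None
print(first_error_detail(bad))  # placeholder error text
```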

## Dimcli makes it easier to turn JSON results into pandas DataFrames

Dimcli includes a few utilities that make it easier to transform Dimensions JSON data into pandas [dataframe objects](https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#dataframe). DataFrames are easy to sort, analyse, export as CSV and use within visualisation software.

>  [pandas](https://pandas.pydata.org/pandas-docs/stable/) is a popular software library written for the Python programming language for data manipulation and analysis.
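For comparison, plain pandas can already flatten a list of JSON records. The sketch below uses placeholder records and is not a description of how Dimcli's `as_dataframe` works internally; `pd.json_normalize` expands nested objects, such as a journal, into dotted column names.

```python
import pandas as pd

# Placeholder records shaped like items in result['publications'] (not real data).
records = [
    {"id": "pub.0000000001", "title": "First example", "year": 2019,
     "journal": {"id": "jour.0000001", "title": "Example Journal"}},
    {"id": "pub.0000000002", "title": "Second example", "year": 2018,
     "journal": {"id": "jour.0000001", "title": "Example Journal"}},
]

# json_normalize turns the nested "journal" dict into journal.id / journal.title columns
df = pd.json_normalize(records)
print(df.columns.tolist())
# ['id', 'title', 'year', 'journal.id', 'journal.title']
```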

```python
res = dslquery("""
     search publications for "nanotechnology"
     return publications[doi+title+altmetric] sort by altmetric
     limit 100
""")
```

```
Returned Publications: 100 (total = 760669)
```


```python
df = res.as_dataframe()
df.head(10)
```

|   | altmetric | doi | title |
|---|---|---|---|
| 0 | 3728 | 10.1038/nature21377 | Evidence for early life in Earth’s oldest hydr... |
| 1 | 2330 | 10.1126/science.aae0061 | Emergence of healing in the Antarctic ozone layer |
| 2 | 2290 | 10.1038/ncomms15261 | A bioprosthetic ovary created using 3D printed... |
| 3 | 2135 | 10.1038/s41562-018-0520-3 | Extreme opponents of genetically modified food... |
| 4 | 2104 | 10.1016/j.pbiomolbio.2018.03.004 | Cause of Cambrian Explosion - Terrestrial or c... |
| 5 | 1966 | 10.1126/science.aal1579 | Observation of the Wigner-Huntington transitio... |
| 6 | 1831 | 10.1002/advs.201900344 | 3D Printing of Personalized Thick and Perfusab... |
| 7 | 1564 | 10.1126/science.aam8743 | Water harvesting from air with metal-organic f... |
| 8 | 1540 | 10.1038/nature23282 | Hypothalamic stem cells control ageing speed p... |
| 9 | 1518 | 10.1038/s41550-019-0813-0 | Enabling Martian habitability with silica aero... |
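Once the results are in a DataFrame, the usual pandas operations apply. As a small sketch, the code below rebuilds the first five rows of the table above (DOIs and Altmetric scores copied from it, titles omitted), keeps the rows with a score above 2,200 and computes the mean score; the same expressions would work unchanged on the live `df`.

```python
import pandas as pd

# First five rows of the table above (titles omitted for brevity).
df = pd.DataFrame({
    "altmetric": [3728, 2330, 2290, 2135, 2104],
    "doi": [
        "10.1038/nature21377",
        "10.1126/science.aae0061",
        "10.1038/ncomms15261",
        "10.1038/s41562-018-0520-3",
        "10.1016/j.pbiomolbio.2018.03.004",
    ],
})

# Boolean indexing keeps only rows whose Altmetric score is above the threshold
high = df[df["altmetric"] > 2200]
print(high["doi"].tolist())
# ['10.1038/nature21377', '10.1126/science.aae0061', '10.1038/ncomms15261']

print(df["altmetric"].mean())  # 2517.4
```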


```python
# save the data to a CSV file
df.to_csv("query_results.csv", index=False)
```
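To check that nothing is lost on the way out, one option is a round trip through an in-memory buffer. This sketch uses `io.StringIO` so it does not touch the filesystem; the two-row frame is copied from the table above, and `index=False` matches the cell above so the DataFrame index is not written as an extra column.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "altmetric": [3728, 2330],
    "doi": ["10.1038/nature21377", "10.1126/science.aae0061"],
})

# Write to an in-memory buffer instead of a file, then read it back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.to_dict("list") == df.to_dict("list"))  # True
```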

---
# Activities

* Try modifying any of the queries above to see how the results change
* Open up a terminal window (New > Terminal) and type `dimcli` to launch the query console
    * you will be able to run queries interactively with autocomplete, which is a handy way to learn the DSL query language through trial and error!

---
# Want to learn more?

Check out the [Dimensions API Lab](https://digital-science.github.io/dimensions-api-lab/) website, which contains many tutorials and reusable Jupyter notebooks for scholarly data analytics. 