<a href="https://colab.research.google.com/github/digital-science/dimensions-api-lab/blob/master/3-workshops/2019-09-Rome-University-ISSI-conference/1-Exploring-the-Dimensions-Search-Language.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Dimensions API Lab In Google Colab"/></a>

# 1. Exploring The Dimensions Search Language (DSL)

This Notebook takes you through the basics of using the Dimensions API.  

> See also: [official DSL documentation online](https://docs.dimensions.ai/dsl/)

In this tutorial we leverage the capabilities of the [Dimcli library](https://github.com/lambdamusic/dimcli) in the context of Jupyter Notebooks. Dimcli is an open source Python library that simplifies common operations like logging in, querying and displaying results. 

Furthermore, we show how the data returned from the API can be explored interactively as data visualizations, using the freely available [plotly_express](https://plot.ly/python/plotly-express/) library. Data visualizations make it possible to highlight existing patterns in the data and to develop new insights.  

Have fun!

## Prerequisites: install the libraries and login

In [0]:
# if you haven't installed it already or are in Google Colab, run this cell!
!pip install dimcli plotly_express -U --quiet 


In [0]:
username = "" 
password = "" 
endpoint = "https://app.dimensions.ai" 

# import all libraries and login
import dimcli
import plotly_express as px
dimcli.login(username, password, endpoint=endpoint)
dsl = dimcli.Dsl()

DimCli v0.6.1 - Succesfully connected to <https://app.dimensions.ai> (method: manual login)


## 1. The Basics: interacting with the API

The `dsl.query` function works in just the same way as the Dimensions API interface in the application:
* submit your query
* get back a Dimcli.Result object, essentially a wrapper for the results that also contains the JSON payload

In [0]:
#A Basic Query  

dsl.query("""
     search publications
       where doi in ["10.1080/0194262X.2016.1181023", "10.1007/3-540-69728-4", "10.1007/978-3-319-91473-2_1"]
     return publications[basics]
     limit 1
""").json


### Exploring query results

The dimcli.Result object gives quick access to statistics about the data obtained:

In [0]:
result = dsl.query("""
     search publications
       where doi in ["10.1080/0194262X.2016.1181023", "10.1007/3-540-69728-4", "10.1007/978-3-319-91473-2_1"]
     return publications[basics]
     limit 30
""", verbose=False)
# print some stats using the Result object
print("Results in this batch: ", result.count_batch)
print("Results in total: ", result.count_total)
print("Errors: ",result.errors)

Results in this batch:  3
Results in total:  3
Errors:  None
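
To make the difference between the two counts concrete, here is a minimal sketch that derives them from a hand-made dictionary. The `_stats` / `total_count` layout is an assumption about the shape of the JSON payload, and `summarize` is a hypothetical helper; the IDs and years are those of the three records used in this notebook.

```python
# Hand-made stand-in for a DSL response. The "_stats"/"total_count" layout
# is an assumption made for illustration.
payload = {
    "_stats": {"total_count": 3},
    "publications": [
        {"id": "pub.1104043086", "year": 2018},
        {"id": "pub.1003593704", "year": 2016},
        {"id": "pub.1023026484", "year": 2007},
    ],
}

def summarize(payload, source="publications"):
    """Return (records in this batch, records matching the query overall)."""
    batch = len(payload.get(source, []))
    # fall back to the batch size if no total is reported
    total = payload.get("_stats", {}).get("total_count", batch)
    return batch, total

batch, total = summarize(payload)
print("Results in this batch: ", batch)
print("Results in total: ", total)
```

The two numbers coincide here because only three records match; with a broad query and a small `limit`, the batch is much smaller than the total.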


### Getting Pandas dataframes

DimCli includes a few utilities that make it easier to transform Dimensions JSON data into Pandas [dataframe objects](https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#dataframe). 

Dataframes are then easy to sort, analyse, export as CSV and use within visualisation software.

>  [pandas](https://pandas.pydata.org/pandas-docs/stable/) is a popular software library written for the Python programming language for data manipulation and analysis.

In [0]:
df = result.as_dataframe()
df.head()

Unnamed: 0,title,volume,author_affiliations,pages,year,type,id,issue,journal.id,journal.title
0,A Bibliometric Analysis of the Explainable Art...,853.0,"[[{'first_name': 'Jose M.', 'last_name': 'Alon...",3-15,2018,chapter,pub.1104043086,,,
1,Artificial Intelligence Research in India: A S...,35.0,"[[{'first_name': 'Rishabh', 'last_name': 'Shri...",136-151,2016,article,pub.1003593704,2.0,jour.1122594,Science & Technology Libraries
2,Visualizing the Structure of Science,,"[[{'first_name': 'Benjamín', 'last_name': 'Var...",,2007,monograph,pub.1023026484,,,


In [0]:
# the 'value_counts' method returns the distribution of a specific field eg publication [years]
df['year'].value_counts()

2007    1
2018    1
2016    1
Name: year, dtype: int64
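
For readers new to pandas, `value_counts` simply tallies how often each value occurs. A plain-Python sketch of the same idea, using the three publication years from the dataframe above:

```python
from collections import Counter

# publication years of the three records shown in the dataframe above
years = [2018, 2016, 2007]

# Counter maps each distinct value to its number of occurrences
year_counts = Counter(years)
for year, count in year_counts.most_common():
    print(year, count)
```

Each year appears once, which matches the output of `df['year'].value_counts()`.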

In [0]:
authors = result.as_dataframe_authors_affiliations()
authors.head()

Unnamed: 0,aff_id,aff_name,aff_city,aff_city_id,aff_country,aff_country_code,aff_state,aff_state_code,pub_id,researcher_id,first_name,last_name
0,grid.11794.3a,University of Santiago de Compostela,Santiago de Compostela,3109642,Spain,ES,,,pub.1104043086,ur.012624556256.28,Jose M.,Alonso
1,grid.7644.1,University of Bari Aldo Moro,Bari,3182351,Italy,IT,,,pub.1104043086,ur.012335351107.59,Ciro,Castiello
2,grid.7644.1,University of Bari Aldo Moro,Bari,3182351,Italy,IT,,,pub.1104043086,ur.013770463243.26,Corrado,Mencar


### Query shortcuts let you test things out quickly

DimCli includes a few [Python magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html) which make it much easier to interrogate the API.

In [0]:
%dsldocs grants

Unnamed: 0,sources,field,type,description,is_filter,is_entity,is_facet
0,grants,FOR,categories,`ANZSRC Fields of Research classification <htt...,True,True,True
1,grants,FOR_first,categories,`ANZSRC Fields of Research classification <htt...,True,True,True
2,grants,abstract,text,Abstract or summary from a grant proposal.,False,False,False
3,grants,active_year,integer,List of active years for a grant.,True,False,True
4,grants,category_bra,categories,`Broad Research Areas <https://app.dimensions....,True,True,True
5,grants,category_hra,categories,`Health Research Areas <https://app.dimensions...,True,True,True
6,grants,category_hrcs_hc,categories,`HRCS - Health Categories <https://app.dimensi...,True,True,True
7,grants,category_hrcs_rac,categories,`HRCS – Research Activity Codes <https://app.d...,True,True,True
8,grants,category_rcdc,categories,"`Research, Condition, and Disease Categorizati...",True,True,True
9,grants,concepts,text,Concepts describing the main topics of a grant...,False,False,False


The result of a 'magic' command is always stored in a variable called `dsl_last_results` (note: only the most recent query result is saved). 

In [0]:
dsl_last_results.count_total

708

Similarly, the `%%dsldf` magic returns a dataframe right away.

In [0]:
%%dsldf 
search publications 
  where doi in ["10.1080/0194262X.2016.1181023", "10.1007/3-540-69728-4", "10.1007/978-3-319-91473-2_1"]
return publications[basics]
limit 3

Returned Publications: 3 (total = 3)
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,title,volume,author_affiliations,pages,year,type,id,issue,journal.id,journal.title
0,A Bibliometric Analysis of the Explainable Art...,853.0,"[[{'first_name': 'Jose M.', 'last_name': 'Alon...",3-15,2018,chapter,pub.1104043086,,,
1,Artificial Intelligence Research in India: A S...,35.0,"[[{'first_name': 'Rishabh', 'last_name': 'Shri...",136-151,2016,article,pub.1003593704,2.0,jour.1122594,Science & Technology Libraries
2,Visualizing the Structure of Science,,"[[{'first_name': 'Benjamín', 'last_name': 'Var...",,2007,monograph,pub.1023026484,,,


## 2. Exploring the Dimensions Search Language (DSL)

In this section we'll take a look at the most important features of the Dimensions Search Language. 



### Control the fields you return

In [0]:
%%dsldf 

search publications
return publications[id+title+year+doi]
limit 5

Returned Publications: 5 (total = 106004287)


Unnamed: 0,doi,id,title,year
0,10.7312/alti19184,pub.1122051566,Women Mobilizing Memory,2020
1,10.7312/alti19184-022,pub.1122051588,CHAPTER XX. Making Memory,2020
2,10.7312/alti19184-010,pub.1122051576,CHAPTER VIII. Aquí,2020
3,10.7312/alti19184-011,pub.1122051577,CHAPTER IX. #NiUnaMenos (#NotOneWomanLess,2020
4,10.7312/alti19184-015,pub.1122051581,CHAPTER XIII. Instilling Interference,2020
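
The field list inside the square brackets is a `+`-separated string, so it can be generated from a Python list. A small sketch, where `return_clause` is a hypothetical helper and not part of Dimcli:

```python
def return_clause(source, fields):
    """Build a 'return source[f1+f2+...]' clause from a list of field names.

    Hypothetical helper for illustration; not part of Dimcli.
    """
    if not fields:
        # a bare 'return publications' is also used in this notebook
        return f"return {source}"
    return f"return {source}[{'+'.join(fields)}]"

clause = return_clause("publications", ["id", "title", "year", "doi"])
print(clause)  # return publications[id+title+year+doi]
```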


### Make a mistake, and the DSL will tell you which fields you could have used

In [0]:
%%dsldf 

search publications 
return publications[dois]
limit 100

Returned Errors: 1
Semantic Error
Semantic errors found:
	Field / Fieldset 'dois' is not present in Source 'publications'. Available fields: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,abstract,altmetric,altmetric_id,author_affiliations,authors,book_doi,book_series_title,book_title,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_rcdc,concepts,date,date_inserted,doi,field_citation_ratio,funder_countries,funders,id,issn,issue,journal,journal_lists,linkout,mesh_terms,open_access,open_access_categories,pages,pmcid,pmid,proceedings_title,publisher,recent_citations,reference_ids,references,relative_citation_ratio,research_org_cities,research_org_countries,research_org_country_names,research_org_state_codes,research_org_state_names,research_orgs,researchers,score,supporting_grant_ids,terms,times_cited,title,type,volume,year and available fieldsets: all,basics,book,categories,extras
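
The list of available fields in the error message can be used to guess what was meant. The sketch below uses Python's standard `difflib` module for that; `suggest_field` is a hypothetical helper, and the candidate list is only a subset of the fields named in the error above.

```python
import difflib

# a subset of the field names listed in the error message above
available_fields = [
    "abstract", "authors", "date", "doi", "id", "issn", "journal",
    "pages", "pmid", "title", "type", "volume", "year",
]

def suggest_field(name, candidates):
    """Return the candidate most similar to name, or None if nothing is close."""
    matches = difflib.get_close_matches(name, candidates, n=1)
    return matches[0] if matches else None

suggestion = suggest_field("dois", available_fields)
print(suggestion)
```

For the misspelled `dois`, the closest candidate is `doi`.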


### Get all fields

In [0]:
%%dsldf 

search publications 
  for "malaria"
return publications[all]
limit 1

Returned Publications: 1 (total = 716702)
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'open_access' is deprecated in favor of open_access_categories. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR_first' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_HC' is deprecated in favor of category_hrcs_hc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'RCDC' is deprecated in favor of category_rcdc. Please refer to https://docs.dimensions.ai/dsl/re

Unnamed: 0,research_orgs,category_hra,publisher,researchers,score,FOR,research_org_cities,research_org_state_codes,reference_ids,references,author_affiliations,authors,research_org_state_names,date_inserted,concepts,terms,supporting_grant_ids,category_for,volume,year,research_org_country_names,date,research_org_countries,journal_lists,FOR_first,doi,issn,pages,open_access_categories,title,funders,type,pmid,id,funder_countries,journal.id,journal.title
0,"[{'id': 'grid.24434.35', 'acronym': 'UNL ', 'n...","[{'id': '3900', 'name': 'Biomedical'}]",Elsevier,"[{'id': 'ur.0746317400.02', 'last_name': 'Obat...",0.114568,"[{'id': '2581', 'name': '0601 Biochemistry and...","[{'id': 5072006, 'name': 'Lincoln'}]","[{'id': 'US-NE', 'name': 'Nebraska'}]","[pub.1031637764, pub.1092735356, pub.104575255...","[pub.1031637764, pub.1092735356, pub.104575255...","[[{'first_name': 'Toshihiro', 'last_name': 'Ob...","[{'first_name': 'Toshihiro', 'last_name': 'Oba...",[Nebraska],2019-10-26,"[intermediates, metabolic pathways, pathway, e...","[intermediates, metabolic pathways, pathway, e...",[grant.7873949],"[{'id': '2581', 'name': '0601 Biochemistry and...",64,2020,[United States],2020-08-01,"[{'id': 'US', 'name': 'United States'}]","[ERA 2015, Norwegian register level 1, VABB-SH...","[{'id': '2206', 'name': '06 Biological Science...",10.1016/j.copbio.2019.09.013,"[0958-1669, 1879-0429]",55-61,"[{'id': 'closed', 'name': 'Closed', 'descripti...",Toward an evaluation of metabolite channeling ...,"[{'id': 'grid.457768.f', 'acronym': 'NSF BIO',...",article,31669681,pub.1122053200,"[{'id': 'US', 'name': 'United States'}]",jour.1100889,Current Opinion in Biotechnology


### You can restrict a search to a specific index, such as concepts or title and abstract

In [0]:
%dsldf search publications in concepts for "situ detection OR malaria" return publications

Returned Publications: 20 (total = 131414)


Unnamed: 0,title,volume,author_affiliations,pages,year,type,id,issue,journal.id,journal.title
0,Comparative Study on In Situ and Laboratory Te...,48,"[[{'first_name': 'Bijivemula Sudheer Kumar', '...",20170373,2020,article,pub.1110652249,6.0,jour.1044510,Journal of Testing and Evaluation
1,Rearrangement on surface structures by boride ...,45,"[[{'first_name': 'Shubiao', 'last_name': 'Xia'...",110-118,2020,article,pub.1121451463,,jour.1141184,Journal of Energy Chemistry
2,Facile preparation of N-doped corncob-derived ...,44,"[[{'first_name': 'Wei', 'last_name': 'Yan', 'c...",121-130,2020,article,pub.1120901265,,jour.1141184,Journal of Energy Chemistry
3,Assembling Amorphous (Fe-Ni)Co x -OH/Ni3S2 Nan...,263,"[[{'first_name': 'Qijun', 'last_name': 'Che', ...",118338,2020,article,pub.1122258450,,jour.1039901,Applied Catalysis B Environmental
4,Enhancing oxygen reduction performance of oxid...,263,"[[{'first_name': 'Fengjiao', 'last_name': 'Li'...",118297,2020,article,pub.1121921532,,jour.1039901,Applied Catalysis B Environmental
5,In-situ constructing Bi2S3 nanocrystals-modifi...,235,"[[{'first_name': 'Fei', 'last_name': 'Chang', ...",116171,2020,article,pub.1121498135,,jour.1043159,Separation and Purification Technology
6,Micro-Structural and Morphological Properties ...,497,"[[{'first_name': 'Heiddy P.', 'last_name': 'Qu...",165942,2020,article,pub.1121476796,,jour.1038953,Journal of Magnetism and Magnetic Materials
7,Evolution of magnetic anisotropy in cobalt fil...,497,"[[{'first_name': 'Khushboo', 'last_name': 'Buk...",165934,2020,article,pub.1121501604,,jour.1038953,Journal of Magnetism and Magnetic Materials
8,Development of a method to evaluate the tender...,308,"[[{'first_name': 'Yingying', 'last_name': 'Zha...",125648,2020,article,pub.1121831587,,jour.1086261,Food Chemistry
9,Deep learning-based retrieval of cyanobacteria...,110,"[[{'first_name': 'Inhyeok', 'last_name': 'Yim'...",105879,2020,article,pub.1122413487,,jour.1032186,Ecological Indicators


In [0]:
%%dsldf 

search publications in title_abstract_only for "nanotechnology"
return publications
limit 3

Returned Publications: 3 (total = 42268)


Unnamed: 0,type,year,pages,id,title,volume,issue,author_affiliations,journal.id,journal.title
0,article,2020,1-1,pub.1120774302,IEEE Open Journal of Nanotechnology (OJNANO) A...,1,,,,
1,article,2020,1-1,pub.1120849045,Announcing the IEEE Open Journal of Nanotechno...,1,,,,
2,article,2020,1993-2006,pub.1120909015,Toxicological Evaluation of Graphene-Family Na...,20,4.0,"[[{'first_name': 'Linlin', 'last_name': 'Chen'...",jour.1297328,Journal of Nanoscience and Nanotechnology


### A simple author search


In [0]:
%%dsldf 

search publications in authors for "\"Daniel Hook\""
return publications
limit 10

Returned Publications: 10 (total = 76)


Unnamed: 0,type,id,pages,volume,author_affiliations,issue,title,year,journal.id,journal.title
0,article,pub.1115957159,e0216783,14.0,"[[{'first_name': 'David', 'last_name': 'Zeitly...",5.0,"Perception, prestige and PageRank",2019,jour.1037553,PLoS ONE
1,preprint,pub.1119449118,,,"[[{'first_name': 'Daniel W.', 'last_name': 'Ho...",,The Price of Gold: Curiosity?,2019,jour.1371339,arXiv
2,preprint,pub.1118864658,,,"[[{'first_name': 'David', 'last_name': 'Zeitly...",,"Perception, Prestige and PageRank",2019,jour.1371339,arXiv
3,monograph,pub.1108567148,,,"[[{'first_name': 'Carl M', 'last_name': 'Bende...",,PT Symmetry,2019,,
4,proceeding,pub.1111011264,1-4,,"[[{'first_name': 'Daniel', 'last_name': 'Hook'...",,Optical Fiber Sensor Design for Ground Slope M...,2018,jour.1047781,2010 IEEE Sensors
5,article,pub.1106289502,23,3.0,"[[{'first_name': 'Daniel W.', 'last_name': 'Ho...",,Dimensions: Building Context for Search and Ev...,2018,jour.1292498,Frontiers in Research Metrics and Analytics
6,article,pub.1105321123,s45,41.0,"[[{'first_name': 'Daniel', 'last_name': 'Hook'...",,Assessment of the interaction between a natura...,2018,jour.1091476,Contact Lens and Anterior Eye
7,article,pub.1085413261,1064-1072,106.0,"[[{'first_name': 'Andrew J.', 'last_name': 'Ho...",3.0,Characterization and quantitation of PVP conte...,2018,jour.1312091,Journal of Biomedical Materials Research Part ...
8,article,pub.1085511076,052113,95.0,"[[{'first_name': 'Carl M.', 'last_name': 'Bend...",5.0,Behavior of eigenvalues in a region of broken ...,2017,jour.1053349,Physical Review A
9,preprint,pub.1118734450,,,"[[{'first_name': 'Carl M.', 'last_name': 'Bend...",,Behavior of eigenvalues in a region of broken-...,2017,jour.1371339,arXiv
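
Note the escaped inner quotes in the query above: they mark `Daniel Hook` as an exact phrase inside the quoted search string. When queries are built in Python, that escaping is easy to get wrong, so here is a minimal sketch. Both `exact_phrase` and `search_in` are hypothetical helpers, not part of Dimcli.

```python
def exact_phrase(text):
    """Wrap text in backslash-escaped double quotes, as in the query above."""
    return '\\"' + text + '\\"'

def search_in(source, index, terms):
    """Build 'search <source> in <index> for "<terms>"'. Hypothetical helper."""
    return f'search {source} in {index} for "{terms}"'

query = search_in("publications", "authors", exact_phrase("Daniel Hook"))
print(query)
# search publications in authors for "\"Daniel Hook\""
```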


### ...or search for a researcher by a specific ID

In [0]:
%%dsldf 

search publications 
where researchers.id = "ur.013514345521.07"
return publications[doi+researchers]
limit 1

Returned Publications: 1 (total = 11)


Unnamed: 0,researchers,doi
0,"[{'id': 'ur.013514345521.07', 'last_name': 'Pa...",10.1021/jacs.9b06036


### You can also query by facets

In [0]:
%%dsldf 

search publications 
return category_for 

Returned Category_for: 20


Unnamed: 0,id,count,name
0,2211,26066211,11 Medical and Health Sciences
1,2209,10606711,09 Engineering
2,3053,9326567,1103 Clinical Sciences
3,2206,8162871,06 Biological Sciences
4,2203,7232159,03 Chemical Sciences
5,2202,5766221,02 Physical Sciences
6,2201,4500401,01 Mathematical Sciences
7,2208,4467293,08 Information and Computing Sciences
8,3177,4175489,1117 Public Health and Health Services
9,2217,3351949,17 Psychology and Cognitive Sciences


### You can return up to 1000 facet rows

In [0]:
%%dsldf 

search publications for "nanotechnology"
return year limit 1000

Returned Year: 54


Unnamed: 0,id,count
0,2018,84307
1,2019,83156
2,2017,78540
3,2016,77944
4,2015,73890
5,2014,69014
6,2013,66151
7,2012,56831
8,2011,55804
9,2008,45726
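
In the output above the facet rows are ordered by count, not by year. For a time series it is usually more convenient to have them in chronological order. A plain-Python sketch using the ten rows shown above:

```python
# the ten rows shown above: (year, number of publications)
year_facet = [
    (2018, 84307), (2019, 83156), (2017, 78540), (2016, 77944), (2015, 73890),
    (2014, 69014), (2013, 66151), (2012, 56831), (2011, 55804), (2008, 45726),
]

# sort on the first element of each row, i.e. the year
chronological = sorted(year_facet, key=lambda row: row[0])
for year, count in chronological:
    print(year, count)
```

With a dataframe, the same reordering can be done by sorting on the `id` column, which holds the year in this facet.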


### Just make a mistake, and you will get the complete list of available facets

In [0]:
%%dsldf

search publications 
return years 

Returned Errors: 1
Semantic Error
Semantic errors found:
	Facet 'years' is not present in source 'publications'. Available facets are: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_rcdc,funder_countries,funders,journal,mesh_terms,open_access_categories,publisher,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,type,year


## 3. From DSL data to visualizations

The data returned by a DSL query has a standard format which is compatible with popular visualization tools e.g. plotly. 

In [0]:
%%dsldf 
search publications 
  for "\"machine learning\" AND vaccines" 
  where times_cited > 10 
return publications[basics+times_cited] limit 1000

### Build a simple histogram with plotly

In [0]:
px.histogram(dsl_last_results, x="journal.title", y="id", color="year")

### Plot citations against journals 

In [0]:
# use plotly_express to map citations against journals 
px.scatter(dsl_last_results, x="journal.title", y="times_cited", color="year")

---
## Want to learn more?

Check out the [Dimensions API Lab](https://digital-science.github.io/dimensions-api-lab/) website, which contains many tutorials and reusable Jupyter notebooks for scholarly data analytics. 