# Transforming API results into Pandas dataframes

DimCli includes a few utilities that make it easier to transform Dimensions JSON data into Pandas [dataframe objects](https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#dataframe). 

Dataframes are then easy to sort, analyse, export as CSV and use within visualisation software.

>  [pandas](https://pandas.pydata.org/pandas-docs/stable/) is a popular software library written for the Python programming language for data manipulation and analysis.

In [2]:
# @markdown Click the 'play' button on the left (or shift+enter) after entering your API credentials

username = "" #@param {type: "string"}
password = "" #@param {type: "string"}
endpoint = "https://app.dimensions.ai"

!pip install dimcli -U --quiet

# import all libraries and login
import pandas
import dimcli
dimcli.login(username, password, endpoint)
dsl = dimcli.Dsl()

DimCli v0.6.6.3 - Succesfully connected to <https://app.dimensions.ai> (method: dsl.ini file)


## 1. The `Dataset.as_dataframe` method

This utility method lets you quickly turn any query results into a dataframe. 

In [3]:
# we'll reuse this query later on 
query = """search publications for "graphene" 
            where year in [2013:2019] 
            return publications sort by times_cited limit 1000"""
res = dsl.query(query)

Returned Publications: 1000 (total = 405843)


In [4]:
df = res.as_dataframe()
df.head(10)

Unnamed: 0,author_affiliations,volume,pages,year,title,id,type,issue,journal.id,journal.title
0,"[[{'first_name': 'Manish', 'last_name': 'Chhow...",5.0,263-275,2013,The chemistry of two-dimensional layered trans...,pub.1050119463,article,4.0,jour.1041224,Nature Chemistry
1,"[[{'first_name': 'A. K.', 'last_name': 'Geim',...",499.0,419-425,2013,Van der Waals heterostructures,pub.1024857999,article,7459.0,jour.1018957,Nature
2,,,,2013,"Nanoenergy, Nanotechnology Applied for Energy ...",pub.1031762191,book,,,
3,"[[{'first_name': 'C.', 'last_name': 'Patrignan...",40.0,100001,2016,Review of Particle Physics,pub.1059158429,article,10.0,jour.1327822,Chinese Physics C
4,"[[{'first_name': 'John B.', 'last_name': 'Good...",135.0,1167-76,2013,The Li-ion rechargeable battery: a perspective.,pub.1019126274,article,4.0,jour.1081898,Journal of the American Chemical Society
5,"[[{'first_name': 'Andrea C.', 'last_name': 'Fe...",8.0,235-246,2013,Raman spectroscopy as a versatile tool for stu...,pub.1015305822,article,4.0,jour.1037429,Nature Nanotechnology
6,"[[{'first_name': 'Han', 'last_name': 'Liu', 'c...",8.0,4033-41,2014,Phosphorene: an unexplored 2D semiconductor wi...,pub.1009826879,article,4.0,jour.1038917,ACS Nano
7,"[[{'first_name': 'Sheneve Z.', 'last_name': 'B...",7.0,2898-926,2013,"Progress, challenges, and opportunities in two...",pub.1038090434,article,4.0,jour.1038917,ACS Nano
8,"[[{'first_name': 'Mingsheng', 'last_name': 'Xu...",113.0,3766-98,2013,Graphene-like two-dimensional materials.,pub.1022830330,article,5.0,jour.1077147,Chemical Reviews
9,"[[{'first_name': 'Oriol', 'last_name': 'Lopez-...",8.0,497-501,2013,Ultrasensitive photodetectors based on monolay...,pub.1023181230,article,7.0,jour.1037429,Nature Nanotechnology
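The dotted column names in the output above (`journal.id`, `journal.title`) come from flattening nested JSON. DimCli's internals are not shown in this notebook, so the following is only a minimal sketch of that kind of flattening. It assumes plain `pandas.json_normalize` and uses hypothetical records:

```python
import pandas as pd

# Hypothetical records mimicking the JSON shape of a publications result
records = [
    {"id": "pub.1", "year": 2013, "journal": {"id": "jour.1", "title": "Journal A"}},
    {"id": "pub.2", "year": 2013},  # e.g. a book, with no journal information
]

# Nested dicts become dotted columns; keys missing from a record become NaN
flat = pd.json_normalize(records)
print(sorted(flat.columns))
```

Records without a `journal` key simply get empty `journal.*` cells, which is consistent with the book row in the table above.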


Pandas dataframes offer a myriad of utilities for inspecting data. Check out the [official docs](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html) or search for a [pandas tutorial](https://www.google.com/search?q=pandas+tutorial) to learn more. 

In [5]:
# the table shape
df.shape

(1000, 10)

In [6]:
# the 'value_counts' method returns the distribution of a specific field eg publication [years]
df['year'].value_counts()

2013    357
2014    295
2015    201
2016     92
2017     47
2018      8
Name: year, dtype: int64

In [7]:
# eg distribution of publication [type]
df['type'].value_counts()

article      995
monograph      2
book           2
chapter        1
Name: type, dtype: int64

### Magic commands: `%dsldf` and `%dslloopdf` 

Dimcli includes a few [Python magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html) that make working with dataframes easier. 

Magic commands can be very useful when testing things out e.g. while trying out a new query, or checking what data is available in Dimensions on a certain topic. 

Moreover, if you hit **'tab'** after the command, you can also take advantage of a custom **Dimensions Search Language autocompleter**. 

**Single-line version** 

In [5]:
%dsldf search publications where journal.id="jour.1136447" return publications

Returned Publications: 20 (total = 943)


Unnamed: 0,author_affiliations,id,issue,journal.id,journal.title,pages,title,type,volume,year
0,,pub.1124361199,,jour.1136447,Nature Energy,1-1,Publisher Correction: Perovskites take steps t...,article,,2020
1,"[[{'first_name': 'Kimberly S.', 'last_name': '...",pub.1124340939,,jour.1136447,Nature Energy,1-11,Peer influence on household energy behaviours,article,,2020
2,"[[{'first_name': 'Jennifer', 'last_name': 'Wil...",pub.1124342004,,jour.1136447,Nature Energy,1-2,An electro-swing approach,article,,2020
3,"[[{'first_name': 'Xiaopeng', 'last_name': 'Zhe...",pub.1124190675,,jour.1136447,Nature Energy,1-10,Managing grains and interfaces via ligand anch...,article,,2020
4,"[[{'first_name': 'Abhishek', 'last_name': 'Kar...",pub.1124192098,,jour.1136447,Nature Energy,1-2,Capital cost subsidies through India’s Ujjwala...,article,,2020
5,"[[{'first_name': 'Giulia', 'last_name': 'Tregn...",pub.1124223699,1.0,jour.1136447,Nature Energy,2-2,High-speed films,article,5.0,2020
6,"[[{'first_name': 'Deepak', 'last_name': 'Rajag...",pub.1124048660,1.0,jour.1136447,Nature Energy,18-19,The United States can generate up to 3.2 EJ of...,article,5.0,2020
7,"[[{'first_name': 'Lee V.', 'last_name': 'White...",pub.1123426945,1.0,jour.1136447,Nature Energy,16-17,Varied health and financial impacts of time-of...,article,5.0,2020
8,"[[{'first_name': 'Nicky', 'last_name': 'Dean',...",pub.1124213396,1.0,jour.1136447,Nature Energy,5-5,Performance factors,article,5.0,2020
9,"[[{'first_name': 'Linan', 'last_name': 'Zhou',...",pub.1123828460,1.0,jour.1136447,Nature Energy,61-70,Light-driven methane dry reforming with single...,article,5.0,2020


**Multi-line version** 

You can split the query into multiple lines, only this time you need to use the `%%dsldf` command (two `%`): 

In [21]:
%%dsldf
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications[title+year+times_cited] sort by times_cited

Returned Publications: 20 (total = 3826)


Unnamed: 0,times_cited,title,year
0,423,Asymmetric Supercapacitors Using 3D Nanoporous...,2015
1,410,CH3NH3SnxPb(1-x)I3 Perovskite Solar Cells Cove...,2014
2,308,Pt‐Free Counter Electrode for Dye‐Sensitized S...,2014
3,237,Hierarchical Gaussian Descriptor for Person Re...,2016
4,233,Brain Intelligence: Go beyond Artificial Intel...,2018
5,231,Improved understanding of the electronic and e...,2014
6,211,Comparative study of ceramic and single crysta...,2013
7,205,Underwater image dehazing using joint trilater...,2014
8,182,Flexible Graphene-Based Supercapacitors: A Review,2016
9,164,Recent Progress of Counter Electrode Catalysts...,2014


Note: the autocompleter is available only with single-line queries.

**Loop versions**

Two more magic commands are available: 

* `%dslloopdf` (single-line) 
* `%%dslloopdf` (multi-line) 

These commands behave just like the ones above, only they trigger an **iterative query** that will attempt to extract all records available for a chosen DSL query up to the maximum limit of 50k.  
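The idea behind an iterative query can be sketched in plain Python. This is not DimCli's actual implementation: it assumes that results are paged with a page size and an offset, and `fetch_all`, `fetch_page` and `fake_fetch_page` are hypothetical names, with the latter standing in for a real API call.

```python
from typing import Callable, Dict, List

MAX_RECORDS = 50_000  # upper bound mentioned above


def fetch_all(fetch_page: Callable[[int, int], List[Dict]],
              page_size: int = 1000,
              max_records: int = MAX_RECORDS) -> List[Dict]:
    """Call fetch_page(limit, skip) repeatedly until no records are left."""
    records: List[Dict] = []
    skip = 0
    while skip < max_records:
        limit = min(page_size, max_records - skip)
        page = fetch_page(limit, skip)
        if not page:
            break  # nothing left to fetch
        records.extend(page)
        if len(page) < limit:
            break  # last, partially filled page
        skip += limit
    return records


# Stand-in for a real API call: 2,500 fake records served in slices
FAKE_DB = [{"id": f"pub.{i}"} for i in range(2500)]


def fake_fetch_page(limit: int, skip: int) -> List[Dict]:
    return FAKE_DB[skip:skip + limit]


all_records = fetch_all(fake_fetch_page)
print(len(all_records))
```

A real loop would typically also pause between requests to respect API rate limits; that detail is left out of the sketch.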


### Accessing data returned by magic queries

By default the results of magic command queries are saved into a variable called `dsl_last_results`:

In [22]:
type(dsl_last_results)

pandas.core.frame.DataFrame

In [24]:
dsl_last_results.describe()

Unnamed: 0,times_cited,year
count,20.0,20.0
mean,192.6,2015.05
std,94.126119,1.669384
min,104.0,2013.0
25%,128.25,2014.0
50%,158.0,2014.5
75%,231.5,2016.0
max,423.0,2018.0


## 2. Dataframe Methods for 'Publications' queries

What follows are specialized versions of the `as_dataframe` method for result sets composed of publication records. 

###  Extracting authors: `as_dataframe_authors`

Publication authors are usually returned by the Dimensions API inside a nested JSON object in the `author_affiliations` sub-key. 

> Note: the order of authors in the JSON is consistent with the ordering of authors in the original publication

This method lets you quickly extract that data and return a dataframe with **one row per author**.

In [8]:
authors = res.as_dataframe_authors()
authors.head()

Unnamed: 0,first_name,last_name,corresponding,orcid,current_organization_id,researcher_id,affiliations,pub_id
0,Manish,Chhowalla,True,,grid.430387.b,ur.0633062306.03,"[{'id': 'grid.430387.b', 'name': 'Rutgers, The...",pub.1050119463
1,Hyeon Suk,Shin,,,grid.42687.3f,ur.07617630407.83,"[{'id': 'grid.42687.3f', 'name': 'Ulsan Nation...",pub.1050119463
2,Goki,Eda,,,grid.4280.e,ur.01150450507.27,"[{'id': 'grid.4280.e', 'name': 'National Unive...",pub.1050119463
3,Lain-Jong,Li,,['0000-0002-4059-7783'],grid.45672.32,ur.01313340113.13,"[{'id': 'grid.28665.3f', 'name': 'Academia Sin...",pub.1050119463
4,Kian Ping,Loh,,['0000-0002-1491-743X'],grid.4280.e,ur.0752174033.73,"[{'id': 'grid.4280.e', 'name': 'National Unive...",pub.1050119463
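To illustrate what "one row per author" means, here is a minimal sketch that unnests a list-of-lists of author dictionaries by hand. It is not the DimCli implementation: the records are hypothetical and only mirror the shape visible in the `author_affiliations` column earlier in this notebook.

```python
import pandas as pd

# Hypothetical publications with a nested list-of-lists of author dicts
pubs = [
    {"id": "pub.1", "author_affiliations": [[
        {"first_name": "Ada", "last_name": "Smith", "affiliations": [{"id": "grid.1"}]},
        {"first_name": "Bo", "last_name": "Chen", "affiliations": []},
    ]]},
    {"id": "pub.2"},  # a record without author data
]

rows = []
for pub in pubs:
    for group in pub.get("author_affiliations") or []:
        for author in group:
            row = dict(author)          # copy so the source record is untouched
            row["pub_id"] = pub["id"]   # keep a link back to the publication
            rows.append(row)

authors_demo = pd.DataFrame(rows)
print(authors_demo[["first_name", "last_name", "pub_id"]])
```

Because rows are appended in the order they appear in the JSON, the author order of the original publication is preserved.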


Using the authors dataframe, we can easily get the top ten values for `current_organization_id`. 

In [9]:
authors['current_organization_id'].value_counts()[:10]

                 188
grid.59025.3b    144
grid.12527.33    116
grid.5379.8      105
grid.59053.3a     85
grid.19006.3e     78
grid.168010.e     76
grid.13402.34     72
grid.21940.3e     69
grid.116068.8     65
Name: current_organization_id, dtype: int64

> Explanation: the most frequent non-empty value turns out to be grid.59025.3b, i.e. [Nanyang Technological University in Singapore](https://www.grid.ac/institutes/grid.59025.3b). The first result has an empty label, meaning that for those authors Dimensions has no info about `current_organization_id`. 
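If the empty entry gets in the way, it can be filtered out before counting. The sketch below uses a hypothetical series and assumes that missing organizations are stored as empty strings, which seems likely since `value_counts` drops `NaN` values by default:

```python
import pandas as pd

# Hypothetical column with empty strings standing in for missing organizations
orgs = pd.Series(["grid.1", "", "grid.2", "grid.1", "", "grid.1"],
                 name="current_organization_id")

known = orgs[orgs != ""]          # keep only authors with a known organization
top_orgs = known.value_counts()   # sorted by count, most frequent first
print(top_orgs.head(10))
```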

### Extracting Affiliations: `as_dataframe_authors_affiliations`

As you can see from the results of the previous section, the `affiliations` of each author is yet another nested JSON object. 

> Note: the order of affiliations in the JSON is consistent with the affiliations order in the original publication

The `as_dataframe_authors_affiliations` method lets you quickly extract that affiliation data and return a dataframe with **one row per affiliation**.

This can be useful e.g. if one wants to count research organizations as they were *at the time of publication* (as opposed to `current_organization_id`, which is the *most recent organization* of a researcher). 

In [10]:
affiliations = res.as_dataframe_authors_affiliations()
affiliations.head()

Unnamed: 0,aff_id,aff_name,aff_city,aff_city_id,aff_country,aff_country_code,aff_state,aff_state_code,pub_id,researcher_id,first_name,last_name
0,grid.430387.b,"Rutgers, The State University of New Jersey",New Brunswick,5101720.0,United States,US,New Jersey,US-NJ,pub.1050119463,ur.0633062306.03,Manish,Chhowalla
1,grid.42687.3f,Ulsan National Institute of Science and Techno...,Ulsan,1833750.0,South Korea,KR,,,pub.1050119463,ur.07617630407.83,Hyeon Suk,Shin
2,grid.4280.e,National University of Singapore,Singapore,1880250.0,Singapore,SG,,,pub.1050119463,ur.01150450507.27,Goki,Eda
3,grid.28665.3f,Academia Sinica,Taipei,1668340.0,Taiwan,TW,,,pub.1050119463,ur.01313340113.13,Lain-Jong,Li
4,grid.4280.e,National University of Singapore,Singapore,1880250.0,Singapore,SG,,,pub.1050119463,ur.0752174033.73,Kian Ping,Loh


In [11]:
affiliations.describe(include="all")

Unnamed: 0,aff_id,aff_name,aff_city,aff_city_id,aff_country,aff_country_code,aff_state,aff_state_code,pub_id,researcher_id,first_name,last_name
count,6999.0,6999,6999.0,6999.0,6999,6999,6999.0,6999.0,6999,6999.0,6999,6999
unique,750.0,1061,460.0,463.0,53,53,56.0,56.0,984,4159.0,3415,1891
top,,Nanyang Technological University,,,China,CN,,,pub.1019661721,,Wei,Zhang
freq,925.0,220,925.0,930.0,1972,1972,5120.0,5120.0,105,159.0,62,294


Let's get the top ten values for `aff_id`. 

In [12]:
affiliations['aff_id'].value_counts()[:10]

                 925
grid.59025.3b    220
grid.12527.33    134
grid.5379.8      115
grid.19006.3e    100
grid.168010.e    100
grid.21940.3e     96
grid.59053.3a     91
grid.21729.3f     87
grid.116068.8     86
Name: aff_id, dtype: int64

> Explanation: the most frequent non-empty value is still [grid.59025.3b](https://www.grid.ac/institutes/grid.59025.3b), which suggests that many authors' current organization is the same one they were affiliated with when these articles were published. 
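One way to check this kind of claim more directly is to join the two dataframes and compare the columns row by row. This is only a sketch with hypothetical miniature dataframes; the column names follow the outputs shown above.

```python
import pandas as pd

# Hypothetical miniature versions of the authors and affiliations dataframes
authors_small = pd.DataFrame({
    "researcher_id": ["ur.1", "ur.2", "ur.3"],
    "pub_id": ["pub.1", "pub.1", "pub.2"],
    "current_organization_id": ["grid.1", "grid.2", "grid.9"],
})
affiliations_small = pd.DataFrame({
    "researcher_id": ["ur.1", "ur.2", "ur.3"],
    "pub_id": ["pub.1", "pub.1", "pub.2"],
    "aff_id": ["grid.1", "grid.2", "grid.3"],
})

# Match each affiliation row with the author row for the same person and paper
merged = affiliations_small.merge(authors_small, on=["researcher_id", "pub_id"])
merged["same_org"] = merged["aff_id"] == merged["current_organization_id"]

# Fraction of affiliation rows where the organization has not changed
share_same = merged["same_org"].mean()
print(round(share_same, 2))
```

On real data, rows with an empty `researcher_id` should be removed before merging, otherwise unrelated authors of the same publication would be matched with each other.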

Another example: we can now easily analyze the data by country too. 

In [13]:
affiliations['aff_country'].value_counts()[:10]

China             1972
United States     1570
                   925
South Korea        342
Singapore          327
United Kingdom     312
Australia          171
Japan              158
Canada             138
Germany            126
Name: aff_country, dtype: int64

> Explanation: China accounts for the largest number of affiliation rows in this dataset (1,972 out of 6,999), followed by the United States (1,570). Note that 925 rows have no country information. 

## 3. Dataframe Methods for 'Grants' queries

###  Extracting Funders: `as_dataframe_funders`

Grant funders are usually returned by the Dimensions API inside a nested JSON object in the `funders` sub-key. 

This method lets you quickly extract that data and return a dataframe with **one row per funder**.

In [19]:
# get a sample list of grants
query = """search grants for "malaria" return grants limit 1000"""
res = dsl.query(query)

Returned Grants: 1000 (total = 9204)


In [20]:
res.as_dataframe_funders().head(10)

Unnamed: 0,id,city_name,types,acronym,state_name,latitude,name,country_name,linkout,longitude,grant_id,grant_title,grant_start_date,grant_end_date
0,grid.421091.f,Swindon,[Government],EPSRC,England,51.567093,Engineering and Physical Sciences Research Cou...,United Kingdom,[https://www.epsrc.ac.uk/],-1.784602,grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30
1,grid.270680.b,Brussels,[Government],EC,,50.85165,European Commission,Belgium,[http://ec.europa.eu/index_en.htm],4.36367,grant.8585457,Estimating the Prevalence of AntiMicrobial Res...,2020-01-01,2021-12-31
2,grid.270680.b,Brussels,[Government],EC,,50.85165,European Commission,Belgium,[http://ec.europa.eu/index_en.htm],4.36367,grant.8586121,Earth observation service for preventive contr...,2019-11-01,2022-10-31
3,grid.454774.1,New Delhi,[Government],DBT,,28.601473,Department of Biotechnology,India,[http://www.dbtindia.nic.in/],77.23578,grant.8657420,Translational research and clinical developmen...,2019-10-07,2022-10-07
4,grid.248883.d,Ottawa,[Government],CIHR,Ontario,45.381893,Canadian Institutes of Health Research,Canada,[http://www.cihr-irsc.gc.ca/e/193.html],-75.745224,grant.8527034,Mechanisms of Leishmania dissemination and tra...,2019-10-01,2024-09-30
5,grid.52788.30,London,[Nonprofit],WT,,51.525867,Wellcome Trust,United Kingdom,[http://www.wellcome.ac.uk/],-0.135005,grant.8558648,Molecular mechanisms of carbohydrate uptake in...,2019-10-01,2022-09-30
6,grid.52788.30,London,[Nonprofit],WT,,51.525867,Wellcome Trust,United Kingdom,[http://www.wellcome.ac.uk/],-0.135005,grant.8103743,The Chemical Empire: A New History of Syntheti...,2019-10-01,2023-10-01
7,grid.425888.b,Bern,[Government],SNF,,46.94923,Swiss National Science Foundation,Switzerland,[http://www.snf.ch/en],7.432395,grant.8599112,Gauging Global Governance: The Effectiveness o...,2019-10-01,2022-09-30
8,grid.270680.b,Brussels,[Government],EC,,50.85165,European Commission,Belgium,[http://ec.europa.eu/index_en.htm],4.36367,grant.8585082,Understanding the roles of pathogen infection ...,2019-10-01,2021-09-30
9,grid.457875.c,Arlington,[Government],NSF MPS,Virginia,38.880566,Directorate for Mathematical & Physical Sciences,United States,[http://www.nsf.gov/dir/index.jsp?org=MPS],-77.11099,grant.8566624,Collaborative Research: Principal Component An...,2019-10-01,2022-09-30
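With one row per funder, counting grants per funding body becomes a simple group-by. The sketch below uses a hypothetical three-row dataframe with a subset of the columns shown above:

```python
import pandas as pd

# Hypothetical miniature version of the funders dataframe
funders_demo = pd.DataFrame({
    "name": ["European Commission", "Wellcome Trust", "European Commission"],
    "country_name": ["Belgium", "United Kingdom", "Belgium"],
    "grant_id": ["grant.1", "grant.2", "grant.3"],
})

# Number of distinct grants per funder, largest first
grants_per_funder = (
    funders_demo.groupby("name")["grant_id"]
    .nunique()
    .sort_values(ascending=False)
)
print(grants_per_funder)
```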


### Extracting investigators: `as_dataframe_investigators`

Grant investigators are usually returned by the Dimensions API inside a nested JSON object in the `investigator_details` sub-key. 

This method lets you quickly extract that data and return a dataframe with **one row per investigator**.

> NOTE: `investigator_details` is not returned by default in a grants query, so it must be requested explicitly in the `return` statement of the query

In [21]:
# get a sample list of grants
query = """search grants for "malaria" return grants[basics+investigator_details] limit 1000"""
res = dsl.query(query)

Returned Grants: 1000 (total = 9204)
Field 'project_num' is deprecated in favor of grant_number. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


In [22]:
res.as_dataframe_investigators().head(10)

Unnamed: 0,first_name,last_name,middle_name,id,role,affiliations,grant_id,grant_title,grant_start_date,grant_end_date
0,Anotida,Madzvamuse,,ur.01301306304.17,PI,"[{'country': 'United Kingdom', 'state_code': N...",grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30
1,Gift,Muchatibaya,,ur.01125437006.53,Co-PI,,grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30
2,Farai,Nyabadza,,ur.016431672433.40,Co-PI,"[{'country': 'South Africa', 'state_code': Non...",grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30
3,Zindoga,Mukandavire,,ur.01176116461.04,Co-PI,,grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30
4,Jasmina,Panovska-Griffiths,,ur.01037661532.12,Co-PI,,grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30
5,Edward,Lungu,,ur.0644520733.99,Co-PI,,grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30
6,Hatson John Boscoh,Njagarah,,ur.010541731643.13,Co-PI,,grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30
7,Eduard,Campillo Funollet,,ur.014162252725.80,Co-PI,"[{'country': 'United Kingdom', 'state_code': N...",grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30
8,K,White,,ur.015153160033.34,Co-PI,,grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30
9,Istvan,Kiss,Zoltan,ur.016546121033.78,Co-PI,"[{'country': 'United Kingdom', 'state_code': N...",grant.8558055,UK-Africa Postgraduate Advanced Study Institut...,2020-03-31,2021-03-30


## 4. Dataframe Methods for 'Concepts' queries

These methods can be used with all content types that support the extraction of concepts, i.e., `publications` or `grants`. See the [official documentation](https://docs.dimensions.ai/dsl/data-sources.html) for more details.

### Extracting Concepts: `as_dataframe_concepts`

The `as_dataframe_concepts` method lets you quickly extract all concepts attached to a record, **one row per concept**, which makes operations like counting or plotting the results easier.

NOTE: concepts are normalized keywords describing the main topics of a document, which are automatically derived from the full text using machine learning. In the JSON data, concepts are returned as an ordered list (i.e. the first items are the most relevant), like this one: 

```
{'concepts': ['electrochemical conversion',
  'conversion',
  'CO2',
  'formate',
  'formic acid',
  'acid'],
 'id': 'pub.1122072646'}
```

The `as_dataframe_concepts` method extracts all concepts data from JSON to a dataframe (this extraction step is functionally equivalent to pandas's [explode method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html)). Moreover, it automatically creates a number of metrics that can be used to carry out further analyses:

1. `concepts_count`: an integer representing the total number of concepts per single document.
2. `rank`: an integer representing the ranking of the concept within the list of concepts for a single document. E.g., the first concept has rank=1, while the fifth has rank=5.
3. `score`: a float representing the weighted importance of the concept within a document. This is obtained by normalizing its ranking against the total number of concepts for a single document. E.g., if a document has 10 concepts in total, the first concept gets score=1, the second score=0.9, etc.
4. `frequency`: an integer representing how often that concept occurs within the full results-set returned by a query, i.e. how many documents have that concept name. E.g., if a concept appears in 5 documents, frequency=5.
5. `rank_avg`: the average (mean) value of all ranks for a single concept, across the full set of documents returned by the query. 
6. `score_avg`: the average (mean) value of all scores for a single concept, across the full set of documents returned by a query. 
7. `score_sum`: the sum of all scores for a single concept, across the full set of documents returned by a query. 
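The metrics listed above can be reproduced with plain pandas, as in the sketch below. This is not DimCli's code. The `score` formula, `(concepts_count - rank + 1) / concepts_count`, is an assumption inferred from the description (10 concepts: 1, 0.9, ...), and the documents are hypothetical.

```python
import pandas as pd

# Hypothetical documents with ordered concept lists
docs = [
    {"id": "pub.1", "concepts": ["graphene", "oxide", "method", "sensor"]},
    {"id": "pub.2", "concepts": ["method", "graphene"]},
]

# One row per concept, keeping the original order within each document
concepts_long = (
    pd.DataFrame(docs)
    .explode("concepts")
    .rename(columns={"concepts": "concept"})
    .reset_index(drop=True)
)

# Per-document metrics
concepts_long["concepts_count"] = concepts_long.groupby("id")["concept"].transform("count")
concepts_long["rank"] = concepts_long.groupby("id").cumcount() + 1
concepts_long["score"] = (
    (concepts_long["concepts_count"] - concepts_long["rank"] + 1)
    / concepts_long["concepts_count"]
)

# Per-concept metrics across the whole result set
concepts_long["frequency"] = concepts_long.groupby("concept")["id"].transform("nunique")
concepts_long["rank_avg"] = concepts_long.groupby("concept")["rank"].transform("mean")
concepts_long["score_avg"] = concepts_long.groupby("concept")["score"].transform("mean")
concepts_long["score_sum"] = concepts_long.groupby("concept")["score"].transform("sum")

print(concepts_long)
```

In this toy example "graphene" is ranked first in one document and second in the other, so it ends up with `frequency=2` and `rank_avg=1.5`.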


By sorting and segmenting concepts using these parameters, it is possible to fine-tune the concept selection, so as to make it more suitable for the application at hand.

In [3]:
q = """search publications for "graphene" 
            where year=2019 
       return publications[id+title+year+concepts] limit 100"""
concepts = dsl.query(q).as_dataframe_concepts()
concepts.head()

Returned Publications: 100 (total = 101818)


Unnamed: 0,year,id,title,concept,concepts_count,rank,score,frequency,rank_avg,score_avg,score_sum
0,2019,pub.1122259223,Surface Modification of Water Purification Mem...,method,54,1,1.0,23,20.3,0.58,13.23
1,2019,pub.1122259223,Surface Modification of Water Purification Mem...,water supply,54,2,0.98,1,2.0,0.98,0.98
2,2019,pub.1122259223,Surface Modification of Water Purification Mem...,cycle,54,3,0.96,1,3.0,0.96,0.96
3,2019,pub.1122259223,Surface Modification of Water Purification Mem...,desalination,54,4,0.94,1,4.0,0.94,0.94
4,2019,pub.1122259223,Surface Modification of Water Purification Mem...,water reuse,54,5,0.93,1,5.0,0.93,0.93


In [4]:
concepts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3912 entries, 0 to 3911
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   year            3912 non-null   int64  
 1   id              3912 non-null   object 
 2   title           3912 non-null   object 
 3   concept         3912 non-null   object 
 4   concepts_count  3912 non-null   int64  
 5   rank            3912 non-null   int64  
 6   score           3912 non-null   float64
 7   frequency       3912 non-null   int64  
 8   rank_avg        3912 non-null   float64
 9   score_avg       3912 non-null   float64
 10  score_sum       3912 non-null   float64
dtypes: float64(4), int64(4), object(3)
memory usage: 336.3+ KB


E.g. sorting by **rank_avg** highlights concepts that are important but don't necessarily appear in many documents.

NOTE: `rank_avg` goes from 1 to N, where 1 is the best (highest) rank. So we want to sort in ascending order to show the most interesting concepts!

In [6]:
concepts_unique = concepts.drop_duplicates("concept")

In [11]:
concepts_unique.sort_values("rank_avg", ascending=True)

Unnamed: 0,year,id,title,concept,concepts_count,rank,score,frequency,rank_avg,score_avg,score_sum
1142,2019,pub.1123924941,Surface Enhanced CdSe/ZnS QD/SiNP Electrochemi...,study,52,1,1.00,2,1.0,1.00,2.00
796,2019,pub.1123764781,"Microstructure, Wettability, Corrosion Resista...",toxic metal ions,67,1,1.00,1,1.0,1.00,1.00
3777,2019,pub.1110197336,"Stretchable electronics: functional materials,...",stretchable electronics,50,1,1.00,1,1.0,1.00,1.00
1825,2019,pub.1123764874,Mercury Removal from Aqueous Solutions Using M...,pyrite,49,1,1.00,1,1.0,1.00,1.00
2206,2019,pub.1123769652,Indirect detection of 5-hydroxytryptamine and ...,solid-state electrochemiluminescence sensor,32,1,1.00,1,1.0,1.00,1.00
...,...,...,...,...,...,...,...,...,...,...,...
603,2019,pub.1110633851,Self-assembly as a key player for materials na...,delivery,98,93,0.06,1,93.0,0.06,0.06
604,2019,pub.1110633851,Self-assembly as a key player for materials na...,supramolecular differentiation,98,94,0.05,1,94.0,0.05,0.05
605,2019,pub.1110633851,Self-assembly as a key player for materials na...,molecular recognition,98,95,0.04,1,95.0,0.04,0.04
606,2019,pub.1110633851,Self-assembly as a key player for materials na...,molecular tuning,98,96,0.03,1,96.0,0.03,0.03


E.g. Sorting by **frequency** highlights concepts that are shared by many documents in our dataset (= the one generated by the original query).

In [12]:
concepts_unique.sort_values("frequency", ascending=False)

Unnamed: 0,year,id,title,concept,concepts_count,rank,score,frequency,rank_avg,score_avg,score_sum
220,2019,pub.1123010291,Chitosan for Sensors and Electrochemical Appli...,properties,35,17,0.54,35,18.83,0.63,21.93
90,2019,pub.1123764869,Design of Monovalent Ion Selective Membranes f...,materials,60,37,0.40,33,21.03,0.58,19.24
99,2019,pub.1123764869,Design of Monovalent Ion Selective Membranes f...,applications,60,46,0.25,32,26.69,0.44,14.07
73,2019,pub.1123764869,Design of Monovalent Ion Selective Membranes f...,effect,60,20,0.68,24,21.54,0.56,13.46
0,2019,pub.1122259223,Surface Modification of Water Purification Mem...,method,54,1,1.00,23,20.30,0.58,13.23
...,...,...,...,...,...,...,...,...,...,...,...
1326,2019,pub.1120513650,Smart polymers driven by multiple and tunable ...,viscoelasticity,44,27,0.41,1,27.00,0.41,0.41
1325,2019,pub.1120513650,Smart polymers driven by multiple and tunable ...,surface viscoelasticity,44,26,0.43,1,26.00,0.43,0.43
1324,2019,pub.1120513650,Smart polymers driven by multiple and tunable ...,casein,44,25,0.45,1,25.00,0.45,0.45
1322,2019,pub.1120513650,Smart polymers driven by multiple and tunable ...,remarkable adsorption,44,23,0.50,1,23.00,0.50,0.50


#### Which concepts metrics should you use? 

That depends on the data you have (e.g. how homogeneous it is) and on your project's goals. 

The various indicators available are meant to help you construct filtered lists of concepts more easily. But it's down to you to determine the right combination of `score`, `frequency` and `rank`, so that the 'right' concepts become more apparent! 
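As a starting point, the indicators can be combined as boolean filters. The thresholds below are hypothetical and need tuning for each dataset; the four rows reuse `frequency` and `score_avg` values from the outputs above.

```python
import pandas as pd

# A few concepts with metric values taken from the tables above
concepts_demo = pd.DataFrame({
    "concept": ["method", "properties", "stretchable electronics", "pyrite"],
    "frequency": [23, 35, 1, 1],
    "score_avg": [0.58, 0.63, 1.00, 1.00],
})

# Hypothetical thresholds; adjust them to your own data
MIN_FREQUENCY = 2
MIN_SCORE_AVG = 0.6

# Keep concepts that are both reasonably common and reasonably prominent
selected = concepts_demo[
    (concepts_demo["frequency"] >= MIN_FREQUENCY)
    & (concepts_demo["score_avg"] >= MIN_SCORE_AVG)
].sort_values("score_avg", ascending=False)
print(selected)
```

Requiring a minimum frequency removes concepts that rank first in a single document only, while the `score_avg` threshold removes generic terms that appear often but low in the lists.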

Tip: see also the [Topic Modeling Analysis](https://api-lab.dimensions.ai/cookbooks/2-publications/Simple-topic-analysis.html) notebook for more examples about this topic.



## Conclusions 

Moving Dimensions API results to pandas dataframes **makes it easier** to **analyze the data** and **answer research questions**. 

Note: the examples above only scratch the surface of what can be done with pandas! 