# Expert Identification with the Dimensions API - An Introduction

This notebook shows how to use the [expert identification](https://docs.dimensions.ai/dsl/expert-identification.html) workflow available via the Dimensions Analytics API.

In [1]:
import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))

==
CHANGELOG
This notebook was last run on Jan 25, 2022
==


## Prerequisites

This notebook assumes you have installed the [Dimcli](https://pypi.org/project/dimcli/) library and are familiar with the ['Getting Started' tutorial](https://api-lab.dimensions.ai/cookbooks/1-getting-started/1-Using-the-Dimcli-library-to-query-the-API.html).

In [2]:
!pip install dimcli --quiet 

import dimcli
from dimcli.utils import *

import json
import sys
import pandas as pd

print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
  import getpass
  KEY = getpass.getpass(prompt='API Key: ')  
  dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
  KEY = ""
  dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()

Searching config file credentials for 'https://app.dimensions.ai' endpoint..


==
Logging in..
Dimcli - Dimensions API Client (v0.9.6)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.0
Method: dsl.ini file


## At a glance

At its simplest, an expert search query looks like this:

In [3]:
%%dsl

identify experts from concepts "malaria OR \"effective malaria vaccine\" OR \"effective prevention\""
      using publications
      where year >= 2015
return experts[basics]

<dimcli.DslDataset object #4415239408. Dict keys: '_copyright', '_stats', '_version', 'experts'>

The query takes a list of **concepts** defining the expertise you're looking for, plus other parameters defining the pool of publications to be used, and it returns a list of researchers sorted by relevance. 

In [4]:
pd.DataFrame(dsl_last_results['experts'])

|   | docs_found | first_name | id | last_name | research_orgs | score | orcid_id |
|---|---|---|---|---|---|---|---|
| 0 | 5 | Martha | ur.01162445502.98 | Sedegah | [grid.4437.4, grid.94365.3d, grid.411439.a, gr... | 124.891458 | |
| 1 | 5 | James G | ur.01225135650.70 | Beeson | [grid.1002.3, grid.10223.32, grid.33058.3d, gr... | 112.559316 | [0000-0002-1018-7898] |
| 2 | 4 | Danielle I | ur.01323510115.98 | Stanisic | [grid.1022.1, grid.1049.c, grid.1042.7, grid.1... | 109.252085 | [0000-0003-3908-7468] |
| 3 | 5 | Kazutoyo | ur.01253714727.65 | Miura | [grid.94365.3d, grid.429651.d, grid.265107.7, ... | 101.304842 | [0000-0003-4455-2432] |
| 4 | 3 | Michael Francis | ur.0752141120.95 | Good | [grid.1008.9, grid.1043.6, grid.415913.b, grid... | 90.852515 | |
| 5 | 4 | Jack S | ur.01354757704.29 | Richards | [grid.1056.2, grid.1623.6, grid.416153.4, grid... | 80.522901 | [0000-0001-5786-6989] |
| 6 | 3 | Michael R | ur.01165702423.17 | Hollingdale | [grid.507680.c, grid.418352.9, grid.265436.0, ... | 80.158875 | |
| 7 | 3 | Eileen D | ur.0703623237.41 | Villasante | [grid.415913.b] | 80.158875 | |
| 8 | 4 | Carole A | ur.01153247161.33 | Long | [grid.419681.3, grid.94365.3d, grid.4991.5, gr... | 79.714262 | [0000-0002-3835-5443] |
| 9 | 3 | Harini D | ur.01066177176.10 | Ganeshan | [grid.415913.b, grid.201075.1] | 77.814179 | |


Often though, we start from some text and want to find experts relevant to that text (as opposed to starting from concepts).  

The expert identification workflow, in such a case, consists of two steps: 

1. Concept extraction from text
2. Expert identification using concepts

In the first step, concepts are extracted from a piece of text such as an abstract. The user can then review and modify the list of extracted concepts before feeding it into the actual expert identification query. In the following sections we go through these steps in detail.
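The two steps can be chained in a few lines. The sketch below is a hypothetical helper, not part of Dimcli: `find_experts_for_text` and `escape_quotes` are illustrative names, and `dsl` is assumed to behave like `dimcli.Dsl`, i.e. `dsl.query(q)` returns a result that can be indexed by key (`'extracted_concepts'`, `'experts'`), as in the cells of this notebook.

```python
from typing import Any, List


def escape_quotes(text: str) -> str:
    """Escape backslashes and double quotes so text can sit inside a DSL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def find_experts_for_text(dsl: Any, text: str, top_n: int = 10) -> List[dict]:
    """Hypothetical two-step workflow: extract concepts, then identify experts.

    `dsl` is assumed to behave like dimcli.Dsl: dsl.query(q) returns a result
    that can be indexed by key ('extracted_concepts', 'experts').
    """
    # Step 1: concept extraction; newlines are collapsed so the DSL string stays on one line
    flat_text = " ".join(text.split())
    res = dsl.query(f'extract_concepts("{escape_quotes(flat_text)}")')
    concepts = list(res["extracted_concepts"])[:top_n]

    # (A human reviewer could prune or edit `concepts` at this point.)

    # Step 2: expert identification, matching ANY concept via explicit OR connectors
    concepts_query = " OR ".join(f'"{c}"' for c in concepts)
    q = (
        f'identify experts from concepts "{escape_quotes(concepts_query)}" '
        "using publications return experts[basics]"
    )
    return list(dsl.query(q)["experts"])
```

With a logged-in client, a call such as `find_experts_for_text(dsl, some_abstract)` would run both queries; the gap between the two steps is where the concept list can be reviewed by hand.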

## Step 1: Concept Extraction

### What are concepts? 

Concepts are noun phrases automatically extracted from a document’s abstract; the rest of the Dimensions database is then used to weight their importance and relevance within the document’s field of study (see also the official documentation: [searching using concepts](https://docs.dimensions.ai/dsl/language.html#concepts-search-main)).

For instance, the phrases *machine learning* and *neural network* will be considered very relevant in a computer science paper, while *project* and *study* will receive low relevance scores because they are generic phrases.

### Extracting concepts with the DSL

Extracting concepts is implemented using the [extract_concepts DSL function](https://docs.dimensions.ai/dsl/functions.html#function-extract-concepts). This is the syntax:
```
extract_concepts("publication abstract")
```

This query will return a list of extracted concepts, ordered by weight, in descending order. For example:

In [5]:
abstract = """We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions, 
metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between 
valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and 
holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square 
centimeters per volt-second can be induced by applying gate voltage.
"""

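# collapse newlines so the abstract fits on a single line inside the DSL string
abstract = abstract.replace("\n", " ")

# escape any double quotes in the text before embedding it in the query
res = dsl.query(f"""extract_concepts("{dsl_escape(abstract)}")""")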

CONCEPTS = res['extracted_concepts']

pd.DataFrame(CONCEPTS)

|   | 0 |
|---|---|
| 0 | films |
| 1 | ambipolar electric field effect |
| 2 | two-dimensional semimetal |
| 3 | electric field effects |
| 4 | room temperature mobility |
| 5 | conductance band |
| 6 | field effects |
| 7 | graphitic films |
| 8 | centimeters |
| 9 | gate voltage |


## Step 2: Expert Identification

Concepts extracted in step 1 can then be used in `identify experts` queries, for example:

```
identify experts from concepts "+malaria OR \"effective malaria vaccine\" OR \"effective prevention\""
      using publications
      where research_org_countries is not empty
          and year >= 2013
return experts[basics]
      limit 20 skip 0
      annotate organizational, coauthorship overlap
          with ["ur.016204724721.35", "ur.012127355561.32"]
```

Returned experts are ordered by their **relevance**.
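To see how the clauses fit together, here is a minimal sketch of a query builder. `build_identify_experts_query` is a hypothetical helper, not a Dimcli function; it only assembles the string, and it encodes two assumptions taken from the notes in this section: concepts separated by spaces are combined with `AND`, and multi-word concepts are quoted, with the quotes escaped inside the DSL string.

```python
import json
from typing import Iterable, Optional, Sequence


def build_identify_experts_query(
    concepts: Sequence[str],
    connector: str = "AND",
    source: str = "publications",
    where: Optional[str] = None,
    fieldset: str = "basics",
    limit: int = 20,
    skip: int = 0,
    overlap_with: Optional[Iterable[str]] = None,
) -> str:
    """Assemble an `identify experts` query string (hypothetical helper, not in Dimcli)."""
    if connector not in ("AND", "OR"):
        raise ValueError("connector must be 'AND' or 'OR'")
    # AND is the default connector, so a plain space is enough; OR must be explicit
    joiner = " OR " if connector == "OR" else " "
    inner = joiner.join(f'"{c}"' for c in concepts)
    # escape backslashes and quotes so the phrase quotes survive inside the DSL string
    inner = inner.replace("\\", "\\\\").replace('"', '\\"')

    lines = [f'identify experts from concepts "{inner}"', f"using {source}"]
    if where:
        lines.append(f"where {where}")
    lines.append(f"return experts[{fieldset}] limit {limit} skip {skip}")
    if overlap_with:
        # a JSON list of researcher ids has the same shape as the DSL list literal
        ids = json.dumps(list(overlap_with))
        lines.append(f"annotate organizational, coauthorship overlap with {ids}")
    return "\n".join(lines)


print(build_identify_experts_query(
    ["malaria", "effective malaria vaccine"],
    connector="OR",
    where="year >= 2013",
    overlap_with=["ur.016204724721.35"],
))
```

Keeping each clause optional mirrors the DSL itself, where `where`, paging and `annotate` can all be omitted.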

A few important things to remember:

1. **Sources.** Expert identification can use either `publications` or `grants` (when not specified, publications are used).
2. **Default connector is AND.** When multiple concepts are provided, they are automatically combined into an `AND` query. To match any of the concepts, explicitly add `OR` connectors.
3. **Where conditions.** It is possible to specify `where` filters, but they are not required. The fields available for filtering are exactly the same as in standard `search` expressions.
4. **Pagination.** Similarly, the paging phrase is optional. By default, the top 20 experts are returned; using `limit`/`skip`, it is possible to retrieve up to a maximum of 200.
5. **Overlap annotations.** Annotating results with organizational and/or coauthorship overlap adds a JSON object to each identified expert. This object has two parts:
    * The **organizational** overlap is a boolean value that is true if the expert and the researchers from the query have the same current research organization.
    * The **coauthorship** overlap is the number of documents the expert has coauthored with any of the researchers provided in the query in the last three years.
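Since results are capped at 200 experts, fetching more than the default 20 means issuing several queries with different paging phrases. The generator below is a sketch that assumes the cap applies to `limit + skip` combined; `paging_phrases` and `MAX_EXPERTS` are hypothetical names.

```python
from typing import Iterator

MAX_EXPERTS = 200  # cap from the notes above; assumed here to bound limit + skip


def paging_phrases(total: int = MAX_EXPERTS, page_size: int = 20) -> Iterator[str]:
    """Yield DSL paging phrases ('limit N skip M') covering at most `total` experts."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total = min(total, MAX_EXPERTS)
    for skip in range(0, total, page_size):
        # the last page may be shorter so the final skip + limit never exceeds `total`
        yield f"limit {min(page_size, total - skip)} skip {skip}"


for phrase in paging_phrases(total=50):
    print(phrase)
# limit 20 skip 0
# limit 20 skip 20
# limit 10 skip 40
```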


### Example 1. Basic query using `concepts`

In [6]:
# take the top 15 concepts
some_concepts = " ".join(['"%s"' % x for x in CONCEPTS[:15]])

q = f"""
        identify experts 
            from concepts "{dsl_escape(some_concepts)}"
        return experts
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()


Query:
        identify experts 
            from concepts "\"films\" \"ambipolar electric field effect\" \"two-dimensional semimetal\" \"electric field effects\" \"room temperature mobility\" \"conductance band\" \"field effects\" \"graphitic films\" \"centimeters\" \"gate voltage\" \"semimetals\" \"electrons\" \"atoms\" \"holes\" \"square centimeter\""
        return experts
        


|   | docs_found | first_name | id | last_name | research_orgs | score | orcid_id |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Anatoly A | ur.011033016243.08 | Firsov | [grid.4886.2, grid.424048.e, grid.425037.7, gr... | 269.41174 | |
| 1 | 1 | Da | ur.01146544531.57 | Jiang | [grid.5379.8] | 269.41174 | |
| 2 | 1 | Sergey V | ur.011535264111.51 | Dubonos | [grid.5254.6, grid.510709.a, grid.5379.8, grid... | 269.41174 | |
| 3 | 1 | Konstantin Sergeevich | ur.01207120103.29 | Novoselov | [grid.5335.0, grid.423905.9, grid.425037.7, gr... | 269.41174 | [0000-0003-4972-5371] |
| 4 | 1 | Yuanbo | ur.0657076451.24 | Zhang | [grid.8547.e, grid.30389.31, grid.184769.5, gr... | 269.41174 | [0000-0003-1290-7980] |
| 5 | 1 | Andre Konstantin | ur.0721730631.45 | Geim | [grid.418975.6, grid.9026.d, grid.12527.33, gr... | 269.41174 | [0000-0003-2861-8331] |
| 6 | 1 | Sergey V | ur.07423561367.62 | Morozov | [grid.5379.8, grid.9026.d, grid.470117.4, grid... | 269.41174 | [0000-0003-3075-7787] |
| 7 | 1 | Irina V | ur.0767105504.29 | Grigorieva | [grid.418975.6, grid.500282.d, grid.418751.e, ... | 269.41174 | [0000-0001-5991-7778] |


### Example 2. Query with `OR` connectors

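Note: this time we try to return all expert fields using the syntax `experts[all]`. As the error below shows, the `all` fieldset is not available for experts: the only fieldsets listed are `basics` and `extras`, so request those explicitly instead (e.g. `experts[basics+extras]`).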

In [7]:
some_concepts = " OR ".join(['"%s"' % x for x in CONCEPTS[:15]])

q = f"""
        identify experts 
            from concepts "{dsl_escape(some_concepts)}"
        return experts[all]
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()


Query:
        identify experts 
            from concepts "\"films\" OR \"ambipolar electric field effect\" OR \"two-dimensional semimetal\" OR \"electric field effects\" OR \"room temperature mobility\" OR \"conductance band\" OR \"field effects\" OR \"graphitic films\" OR \"centimeters\" OR \"gate voltage\" OR \"semimetals\" OR \"electrons\" OR \"atoms\" OR \"holes\" OR \"square centimeter\""
        return experts[all]
        
1 QueryError found
Semantic errors found:
	Field / Fieldset 'all' is not present in Source 'researchers'. Available fields: current_research_org,dimensions_url,first_grant_year,first_name,first_publication_year,id,last_grant_year,last_name,last_publication_year,nih_ppid,obsolete,orcid_id,redirect,research_orgs,total_grants,total_publications and available fieldsets: basics,extras


### Example 3. Query with `where` filters
 

In [8]:
some_concepts = " ".join(['"%s"' % x for x in CONCEPTS[:10]])

q = f"""identify experts 
            from concepts "{dsl_escape(some_concepts)}"
            using publications
            where research_org_countries is not empty
              and year >= 2000
              and times_cited > 100
        return experts
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()

Query:
identify experts
            from concepts "\"films\" \"ambipolar electric field effect\" \"two-dimensional semimetal\" \"electric field effects\" \"room temperature mobility\" \"conductance band\" \"field effects\" \"graphitic films\" \"centimeters\" \"gate voltage\""
            using publications
            where research_org_countries is not empty
              and year >= 2000
              and times_cited > 100
        return experts
        


|   | docs_found | first_name | id | last_name | research_orgs | score | orcid_id |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Anatoly A | ur.011033016243.08 | Firsov | [grid.4886.2, grid.424048.e, grid.425037.7, gr... | 204.01543 | |
| 1 | 1 | Da | ur.01146544531.57 | Jiang | [grid.5379.8] | 204.01543 | |
| 2 | 1 | Sergey V | ur.011535264111.51 | Dubonos | [grid.5254.6, grid.510709.a, grid.5379.8, grid... | 204.01543 | |
| 3 | 1 | Konstantin Sergeevich | ur.01207120103.29 | Novoselov | [grid.5335.0, grid.423905.9, grid.425037.7, gr... | 204.01543 | [0000-0003-4972-5371] |
| 4 | 1 | Yuanbo | ur.0657076451.24 | Zhang | [grid.8547.e, grid.30389.31, grid.184769.5, gr... | 204.01543 | [0000-0003-1290-7980] |
| 5 | 1 | Andre Konstantin | ur.0721730631.45 | Geim | [grid.418975.6, grid.9026.d, grid.12527.33, gr... | 204.01543 | [0000-0003-2861-8331] |
| 6 | 1 | Sergey V | ur.07423561367.62 | Morozov | [grid.5379.8, grid.9026.d, grid.470117.4, grid... | 204.01543 | [0000-0003-3075-7787] |
| 7 | 1 | Irina V | ur.0767105504.29 | Grigorieva | [grid.418975.6, grid.500282.d, grid.418751.e, ... | 204.01543 | [0000-0001-5991-7778] |


### Example 4. Adding Overlap Annotations (e.g. for conflict-of-interest checks)


In [9]:
overlap_researchers = ["ur.011535264111.51", "ur.011033016243.08", "ur.01207120103.29"]

q = f"""
        identify experts 
            from concepts "{dsl_escape(some_concepts)}"
            using publications
            where research_org_countries is not empty
              and year >= 2000
        return experts
            annotate coauthorship, organizational overlap
            with {json.dumps(overlap_researchers)}
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()


Query:
        identify experts 
            from concepts "\"films\" \"ambipolar electric field effect\" \"two-dimensional semimetal\" \"electric field effects\" \"room temperature mobility\" \"conductance band\" \"field effects\" \"graphitic films\" \"centimeters\" \"gate voltage\""
            using publications
            where research_org_countries is not empty
              and year >= 2000
        return experts
            annotate coauthorship, organizational overlap
            with ["ur.011535264111.51", "ur.011033016243.08", "ur.01207120103.29"]
        


|   | docs_found | first_name | id | last_name | research_orgs | score | overlap.coauthorship | overlap.organizational | orcid_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Anatoly A | ur.011033016243.08 | Firsov | [grid.4886.2, grid.424048.e, grid.425037.7, gr... | 204.01543 | 0 | True | |
| 1 | 1 | Da | ur.01146544531.57 | Jiang | [grid.5379.8] | 204.01543 | 0 | False | |
| 2 | 1 | Sergey V | ur.011535264111.51 | Dubonos | [grid.5254.6, grid.510709.a, grid.5379.8, grid... | 204.01543 | 0 | True | |
| 3 | 1 | Konstantin Sergeevich | ur.01207120103.29 | Novoselov | [grid.5335.0, grid.423905.9, grid.425037.7, gr... | 204.01543 | 175 | True | [0000-0003-4972-5371] |
| 4 | 1 | Yuanbo | ur.0657076451.24 | Zhang | [grid.8547.e, grid.30389.31, grid.184769.5, gr... | 204.01543 | 1 | False | [0000-0003-1290-7980] |
| 5 | 1 | Andre Konstantin | ur.0721730631.45 | Geim | [grid.418975.6, grid.9026.d, grid.12527.33, gr... | 204.01543 | 26 | False | [0000-0003-2861-8331] |
| 6 | 1 | Sergey V | ur.07423561367.62 | Morozov | [grid.5379.8, grid.9026.d, grid.470117.4, grid... | 204.01543 | 7 | False | [0000-0003-3075-7787] |
| 7 | 1 | Irina V | ur.0767105504.29 | Grigorieva | [grid.418975.6, grid.500282.d, grid.418751.e, ... | 204.01543 | 8 | False | [0000-0001-5991-7778] |
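The overlap columns lend themselves to a simple conflict-of-interest filter. This is a sketch that assumes each expert record in the raw JSON carries a nested `overlap` object with `coauthorship` (an integer) and `organizational` (a boolean), which is what the flattened `overlap.coauthorship` and `overlap.organizational` columns above suggest; `without_conflicts` and its threshold are hypothetical, and the sample records reuse values from the table above.

```python
from typing import Dict, List


def without_conflicts(experts: List[Dict], max_coauthored: int = 0) -> List[Dict]:
    """Keep experts whose (assumed) overlap annotations suggest no conflict of interest."""
    kept = []
    for expert in experts:
        overlap = expert.get("overlap", {})
        if overlap.get("organizational", False):
            continue  # shares a current research organization with a listed researcher
        if overlap.get("coauthorship", 0) > max_coauthored:
            continue  # coauthored too many recent documents with the listed researchers
        kept.append(expert)
    return kept


sample = [
    {"last_name": "Firsov", "overlap": {"coauthorship": 0, "organizational": True}},
    {"last_name": "Jiang", "overlap": {"coauthorship": 0, "organizational": False}},
    {"last_name": "Zhang", "overlap": {"coauthorship": 1, "organizational": False}},
    {"last_name": "Geim", "overlap": {"coauthorship": 26, "organizational": False}},
]
print([e["last_name"] for e in without_conflicts(sample)])  # ['Jiang']
print([e["last_name"] for e in without_conflicts(sample, max_coauthored=1)])  # ['Jiang', 'Zhang']
```

Where to draw the line (zero shared papers, or a small tolerance) is a policy decision, which is why the threshold is a parameter here.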


### Example 5. Query with MUST/NOT Operators

By default, the string containing a list of concepts is interpreted as a sequence of `AND` clauses. That is, the query tries to match the highest number of concepts without any preference. 

It is possible to specify MUST/NOT rules with concepts by passing them via a string and using the `+` and `-` operators. 

Note: remember that concept phrases (i.e. concepts made of more than one word) need to be wrapped in quotes, and those quotes need to be escaped with a backslash (`\`). In the cells below, `dsl_escape` takes care of the escaping.
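A small builder can take care of the quoting, the `+`/`-` prefixes and the escaping. This is a hypothetical helper (`must_not_concepts`), shown as a sketch: it reproduces the string format, if not the exact ordering, of the concepts string used in the next cell.

```python
from typing import Sequence


def must_not_concepts(
    must: Sequence[str] = (),
    must_not: Sequence[str] = (),
    optional: Sequence[str] = (),
) -> str:
    """Build an escaped concepts string with +/- operators (hypothetical helper)."""
    # every concept is quoted, so multi-word phrases stay together
    parts = [f'+"{c}"' for c in must]
    parts += [f'-"{c}"' for c in must_not]
    parts += [f'"{c}"' for c in optional]
    if not parts:
        raise ValueError("at least one concept is required")
    raw = " ".join(parts)
    # escape backslashes, then quotes, so the string can be embedded in the DSL query
    return raw.replace("\\", "\\\\").replace('"', '\\"')


concepts = must_not_concepts(
    must=["ambipolar electric field effect", "films"],
    must_not=["graphitic films"],
    optional=["electric field effects"],
)
print(f'identify experts from concepts "{concepts}" using publications return experts')
```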


In [10]:
concepts = """ 
    +"ambipolar electric field effect" 
    -"graphitic films" 
    +"films"
    "electric field effects"
    """

q = f"""
identify experts 
    from concepts "{dsl_escape(concepts)}"
    using publications
return experts
"""

print("Query:\n======", q)

dsl.query(q).as_dataframe()



Query:
identify experts 
    from concepts " 
    +\"ambipolar electric field effect\" 
    -\"graphitic films\" 
    +\"films\"
    \"electric field effects\"
    "
    using publications
return experts



|   | docs_found | first_name | id | last_name | orcid_id | research_orgs | score |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Luc | ur.01005576245.93 | Henrard | [0000-0002-2564-1221] | [grid.5284.b, grid.121334.6, grid.6520.1] | 51.43535 |
| 1 | 1 | Sylvain | ur.01251242035.86 | Latil | | [grid.14095.39, grid.462531.7, grid.457336.0, ... | 51.43535 |
| 2 | 1 | Paul | ur.01000623240.81 | Syers | | [grid.164295.d] | 44.719536 |
| 3 | 1 | Nicholas Patrick | ur.01046736440.46 | Butch | [0000-0002-6083-8388] | [grid.8547.e, grid.507868.4, grid.94225.38, gr... | 44.719536 |
| 4 | 1 | John-Pierre | ur.01060352233.12 | Paglione | | [grid.8547.e, grid.507868.4, grid.440050.5, gr... | 44.719536 |
| 5 | 1 | Michael Sears | ur.01200656557.13 | Fuhrer | [0000-0001-6183-2773] | [grid.184769.5, grid.1002.3, grid.499241.3, gr... | 44.719536 |
| 6 | 1 | Dohun | ur.01205352017.54 | Kim | [0000-0001-9687-2089] | [grid.14003.36, grid.35541.36, grid.15444.30, ... | 44.719536 |
| 7 | 1 | Victor V | ur.01025667341.62 | Sysoev | [0000-0002-0372-1802] | [grid.446088.6, grid.263856.c, grid.78837.33, ... | 38.569305 |
| 8 | 1 | Mikhail A | ur.01245543252.06 | Shekhirev | [0000-0002-8381-1276] | [grid.14476.30, grid.24434.35, grid.166341.7] | 38.569305 |
| 9 | 1 | Alexey | ur.01276657166.76 | Lipatov | [0000-0001-5043-1616] | [grid.14476.30, grid.426324.5, grid.10420.37, ... | 38.569305 |


### Example 6. MUST together with AND/OR 

In [11]:
concepts = """ 
    (+"ambipolar electric field effect" -"graphitic films") OR 
    (+"films" -"electric field effects")
    """

q = f"""
identify experts 
    from concepts "{dsl_escape(concepts)}"
    using publications
return experts
"""

print("Query:\n======", q)

dsl.query(q).as_dataframe()



Query:
identify experts 
    from concepts " 
    (+\"ambipolar electric field effect\" -\"graphitic films\") OR 
    (+\"films\" -\"electric field effects\")
    "
    using publications
return experts



|   | docs_found | first_name | id | last_name | orcid_id | research_orgs | score |
|---|---|---|---|---|---|---|---|
| 0 | 3 | Pablo | ur.01034030721.03 | Jarillo-Herrero | [0000-0001-8217-8213] | [grid.159791.2, grid.5338.d, grid.116068.8, gr... | 78.260747 |
| 1 | 3 | Young Sang | ur.01342755473.89 | Lee | | [grid.69566.3a, grid.94225.38, grid.507868.4, ... | 78.260747 |
| 2 | 3 | Lan | ur.014670440227.86 | Wang | [0000-0001-7124-2718] | [grid.418788.a, grid.1007.6, grid.17635.36, gr... | 75.645182 |
| 3 | 3 | Shun-Qing | ur.0624630056.98 | Shen | [0000-0002-1954-5882] | [grid.8547.e, grid.450298.2, grid.464262.0, gr... | 75.645182 |
| 4 | 3 | Alexander S | ur.0646414360.09 | Sinitskii | [0000-0002-8688-3451] | [grid.24434.35, grid.170430.1, grid.1957.a, gr... | 69.48809 |
| 5 | 2 | Peng | ur.01150036175.42 | Ren | | [grid.59025.3b] | 51.162295 |
| 6 | 2 | Azat | ur.056250446.77 | Sulaev | | [grid.59025.3b] | 51.162295 |
| 7 | 2 | Bin | ur.0756673070.05 | Xia | | [grid.59025.3b] | 51.162295 |
| 8 | 2 | James Mitchell | ur.01275626274.52 | Tour | [0000-0002-8479-9328] | [grid.264756.4, grid.21940.3e, grid.254567.7, ... | 49.416673 |
| 9 | 2 | Christian F | ur.01010600302.93 | Kisielowski | | [grid.184769.5, grid.8385.6, grid.469490.6, gr... | 47.051859 |


### Example 7. Wildcard searches

In [12]:
concepts = """temperat* "ray diffraction" -magnet* """

q = f"""
identify experts 
    from concepts "{dsl_escape(concepts)}"
    using publications
return experts
"""

print("Query:\n======", q)

dsl.query(q).as_dataframe()

Query:
identify experts 
    from concepts "temperat* \"ray diffraction\" -magnet* "
    using publications
return experts



|   | docs_found | first_name | id | last_name | research_orgs | score | orcid_id |
|---|---|---|---|---|---|---|---|
| 0 | 4 | Akinori | ur.07620725665.51 | Katsui | [grid.26999.3d, grid.69566.3a, grid.265061.6, ... | 45.133475 | |
| 1 | 3 | Andrey V | ur.010274015357.59 | Khoroshilov | [grid.435216.7, grid.431939.5] | 34.575968 | [0000-0002-0678-1421] |
| 2 | 3 | Konstantin S | ur.014606545157.85 | Gavrichev | [grid.435216.7] | 34.575968 | [0000-0001-5304-3555] |
| 3 | 3 | Paul | ur.014146743075.39 | Hagenmuller | [grid.4795.f, grid.463879.7, grid.411840.8, gr... | 34.065056 | |
| 4 | 3 | Yi-Tai | ur.01261545713.97 | Qian | [grid.503014.3, grid.27255.37, grid.12527.33, ... | 33.943935 | |
| 5 | 3 | Jean Pierre | ur.012446305716.07 | Chaminade | [grid.4444.0, grid.461891.3, grid.5292.c, grid... | 33.930549 | |
| 6 | 2 | Tatyana V | ur.011457114721.52 | Dyachkova | [grid.426536.0, grid.465372.1, grid.446087.9] | 23.329706 | [0000-0001-6204-797X] |
| 7 | 2 | Sergey A | ur.015627070115.78 | Gromilov | [grid.4886.2, grid.4605.7, grid.415877.8, grid... | 23.329706 | |
| 8 | 2 | Elena V | ur.01264404625.74 | Boldyreva | [grid.4605.7, grid.424048.e, grid.4708.b, grid... | 23.184193 | [0000-0002-1401-2438] |
| 9 | 2 | Alexander P | ur.015443160631.46 | Tyutyunnik | [grid.426536.0, grid.4886.2, grid.10548.38, gr... | 23.063437 | [0000-0003-1360-0913] |


## Additional resources: shortcut functions included in Dimcli

Dimcli includes a number of 'shortcut' [Python functions](https://digital-science.github.io/dimcli/modules.html#module-dimcli.core.functions) that make it easier to work with the expert identification API. 


In [13]:
from dimcli.functions import extract_concepts, identify_experts, build_reviewers_matrix

### extract_concepts

A Python wrapper for the DSL function `extract_concepts` ([see source](https://digital-science.github.io/dimcli/modules.html#dimcli.core.functions.extract_concepts)).

Extracts concepts from any text. The input text is processed and the extracted concepts are returned ordered by relevance; as the output below shows, the `%%extract_concepts` magic also displays each concept's relevance score.

In [14]:
%%extract_concepts

We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
 metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
 valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
 holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
 centimeters per volt-second can be induced by applying gate voltage.

|   | concept | relevance |
|---|---|---|
| 0 | square centimeter | 0.681 |
| 1 | films | 0.669 |
| 2 | ambipolar electric field effect | 0.653 |
| 3 | two-dimensional semimetal | 0.646 |
| 4 | electric field effects | 0.628 |
| 5 | room temperature mobility | 0.621 |
| 6 | conductance band | 0.601 |
| 7 | graphitic films | 0.596 |
| 8 | field effects | 0.596 |
| 9 | centimeters | 0.587 |
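Because this output includes relevance scores, the list is easy to prune before running an expert search. The sketch below keeps concepts above a relevance threshold; `select_concepts`, the 0.6 cutoff and the 15-concept cap are illustrative choices, not Dimcli defaults, and the sample pairs are copied from the table above.

```python
from typing import List, Tuple

# (concept, relevance) pairs copied from the %%extract_concepts output above
SCORED = [
    ("square centimeter", 0.681),
    ("films", 0.669),
    ("ambipolar electric field effect", 0.653),
    ("two-dimensional semimetal", 0.646),
    ("electric field effects", 0.628),
    ("room temperature mobility", 0.621),
    ("conductance band", 0.601),
    ("graphitic films", 0.596),
    ("field effects", 0.596),
    ("centimeters", 0.587),
]


def select_concepts(
    scored: List[Tuple[str, float]], min_relevance: float = 0.6, max_concepts: int = 15
) -> List[str]:
    """Keep the most relevant concepts, highest relevance first (illustrative thresholds)."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [concept for concept, score in ranked if score >= min_relevance][:max_concepts]


print(select_concepts(SCORED, max_concepts=3))
# ['square centimeter', 'films', 'ambipolar electric field effect']
```

A threshold alone still keeps generic terms such as *square centimeter*, so a manual review of the list remains worthwhile.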


### identify_experts

A Python wrapper for the full expert identification workflow ([see source](https://digital-science.github.io/dimcli/modules.html#dimcli.core.functions.identify_experts)). 

This wrapper provides a simpler version of the expert identification API. It is meant as a convenient alternative for basic queries; for more options, use the API directly.

In [15]:
%%identify_experts

We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
 metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
 valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
 holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
 centimeters per volt-second can be induced by applying gate voltage.

|   | docs_found | first_name | first_publication_year | id | last_name | orcid_id | score | total_grants | total_publications | dimensions_url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17 | Daichi | 2000 | ur.01203703171.12 | Chiba | [0000-0002-6631-5131] | 720.802273 | 14 | 226 | https://app.dimensions.ai/discover/publication... |
| 1 | 12 | Ze Don | 1983 | ur.01055006635.53 | Kvon | | 564.498955 | 16 | 367 | https://app.dimensions.ai/discover/publication... |
| 2 | 14 | Nobuhiro | 1976 | ur.011513332561.53 | Ohta | | 547.722462 | 23 | 229 | https://app.dimensions.ai/discover/publication... |
| 3 | 12 | Tomohiro | 2008 | ur.01311211105.43 | Koyama | [0000-0003-4796-1776] | 497.447215 | 3 | 111 | https://app.dimensions.ai/discover/publication... |
| 4 | 10 | Teruo | 1993 | ur.012735754655.38 | Ono | | 407.495159 | 29 | 486 | https://app.dimensions.ai/discover/publication... |
| 5 | 7 | Pablo | 1999 | ur.01034030721.03 | Jarillo-Herrero | [0000-0001-8217-8213] | 402.322615 | 7 | 276 | https://app.dimensions.ai/discover/publication... |
| 6 | 9 | Kenji | 1987 | ur.010575643400.34 | Watanabe | [0000-0003-3701-8119] | 360.808606 | 13 | 2694 | https://app.dimensions.ai/discover/publication... |
| 7 | 9 | Takashi | 1989 | ur.0765715521.02 | Taniguchi | | 360.808606 | 24 | 2874 | https://app.dimensions.ai/discover/publication... |
| 8 | 8 | Eugene | 1989 | ur.0740560235.48 | Olshanetsky | [0000-0001-7027-9084] | 357.604337 | 0 | 98 | https://app.dimensions.ai/discover/publication... |
| 9 | 8 | Takahiro | 2002 | ur.014407221755.12 | Moriyama | [0000-0001-7071-0823] | 313.140915 | 10 | 181 | https://app.dimensions.ai/discover/publication... |


### Build a reviewers matrix

Generates a matrix of candidate reviewers for abstracts, using the expert identification workflow ([see source](https://digital-science.github.io/dimcli/modules.html#dimcli.core.functions.build_reviewers_matrix)).

If the input abstracts include identifiers, those are used in the resulting matrix. Alternatively, passing a simple list of strings results in a matrix where the identifiers are auto-generated from the order of the abstracts (the first one is 1, and so on).
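The id handling described above can be sketched as follows. `normalize_abstracts` is a hypothetical helper that mimics the documented behavior; it is not Dimcli's implementation.

```python
from typing import Dict, List, Union

Abstract = Union[str, Dict[str, str]]


def normalize_abstracts(abstracts: List[Abstract]) -> List[Dict[str, Union[int, str]]]:
    """Give every abstract an id: keep a supplied 'id', else use its 1-based position."""
    normalized = []
    for position, item in enumerate(abstracts, start=1):
        if isinstance(item, str):
            normalized.append({"id": position, "text": item})
        else:
            normalized.append({"id": item["id"], "text": item["text"]})
    return normalized


print([a["id"] for a in normalize_abstracts(["first text", "second text"])])  # [1, 2]
print([a["id"] for a in normalize_abstracts([{"id": "A1", "text": "..."}])])  # ['A1']
```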

In [16]:
abstracts = [
     {
     'id' : 'A1',
     'text' : """We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
 metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
 valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
 holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
 centimeters per volt-second can be induced by applying gate voltage."""
     },
     {
     'id' : "A2",
     'text' : """The physicochemical properties of a molecule-metal interface, in principle, can play a significant role in tuning the electronic properties
 of organic devices. In this report, we demonstrate an electrode engineering approach in a robust, reproducible molecular memristor that
 enables a colossal tunability in both switching voltage (from 130 mV to 4 V i.e. >2500% variation) and current (by ~6 orders of magnitude).
 This provides a spectrum of device design parameters that can be “dialed-in” to create fast, scalable and ultralow energy organic
 memristors optimal for applications spanning digital memory, logic circuits and brain-inspired computing."""
     }
 ]

In [17]:
candidates = ["ur.01146544531.57", "ur.011535264111.51", "ur.0767105504.29", "ur.011513332561.53", "ur.01055006635.53"]

In [18]:
build_reviewers_matrix(abstracts, candidates, verbose=False)

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.95s/it]


|   | researcher | A1 | A2 |
|---|---|---|---|
| 0 | ur.01146544531.57 | 0.0 | 0.0 |
| 1 | ur.011535264111.51 | 500.057833 | 237.479195 |
| 2 | ur.0767105504.29 | 860.072228 | 924.316053 |
| 3 | ur.011513332561.53 | 3235.742721 | 1140.205152 |
| 4 | ur.01055006635.53 | 2518.152591 | 1183.93619 |
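To turn the matrix into assignments, one option is to rank the candidates per abstract. This sketch assumes a higher score means a stronger match (as with the `score` column elsewhere in this notebook) and treats a score of 0 as no match; `top_reviewers` is a hypothetical helper, and the scores are copied from the matrix above.

```python
from typing import Dict, List

# researcher id -> {abstract id -> score}, copied from the matrix above
MATRIX: Dict[str, Dict[str, float]] = {
    "ur.01146544531.57": {"A1": 0.0, "A2": 0.0},
    "ur.011535264111.51": {"A1": 500.057833, "A2": 237.479195},
    "ur.0767105504.29": {"A1": 860.072228, "A2": 924.316053},
    "ur.011513332561.53": {"A1": 3235.742721, "A2": 1140.205152},
    "ur.01055006635.53": {"A1": 2518.152591, "A2": 1183.93619},
}


def top_reviewers(matrix: Dict[str, Dict[str, float]], abstract_id: str, k: int = 2) -> List[str]:
    """Return up to k researcher ids with the highest positive score for one abstract."""
    ranked = sorted(matrix, key=lambda rid: matrix[rid].get(abstract_id, 0.0), reverse=True)
    # a score of 0 is treated as "no match" and never proposed
    return [rid for rid in ranked if matrix[rid].get(abstract_id, 0.0) > 0][:k]


for abstract_id in ("A1", "A2"):
    print(abstract_id, top_reviewers(MATRIX, abstract_id))
# A1 ['ur.011513332561.53', 'ur.01055006635.53']
# A2 ['ur.01055006635.53', 'ur.011513332561.53']
```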
