# Building a simple rejected articles tracker

In this notebook we are going to:

* load a list of publication titles from a local CSV file
* match those titles against the Dimensions database
* save the matched items in a new CSV file
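The three steps above can be sketched end to end as follows. This is a minimal offline sketch, not the notebook's actual code: `match_title` is a hypothetical stub standing in for the Dimensions query, and in-memory buffers replace the real input and output CSV files.

```python
import io

import pandas as pd


def match_title(title):
    """Hypothetical stand-in for the Dimensions lookup: returns a dict for a match, or None."""
    if "Truth" in title:
        return {"id": "pub.HYPOTHETICAL", "title": title, "total": 12}
    return None


# 1. load titles (an in-memory CSV stands in for pubs_titles_to_match.csv)
source = io.StringIO("title\nThe Importance of Truth Telling and Trust\nAn unmatched title\n")
titles = pd.read_csv(source)["title"].to_list()

# 2. match each title, keeping the source string next to the result
rows = []
for t in titles:
    hit = match_title(t)
    if hit is not None:
        rows.append({"source_title": t, **hit})
matched = pd.DataFrame(rows)

# 3. save the matches to a new CSV (a buffer here instead of a file on disk)
target = io.StringIO()
matched.to_csv(target, index=False)
print(target.getvalue())
```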

## Prerequisites: Installing the Dimensions Library and Logging in

In [1]:

# @markdown # Get the API library and login
# @markdown Click the 'play' button on the left (or shift+enter) after entering your API credentials

username = "" #@param {type: "string"}
password = "" #@param {type: "string"}
endpoint = "https://app.dimensions.ai" #@param {type: "string"}


!pip install dimcli plotly tqdm -U --quiet
import dimcli
from dimcli.shortcuts import *
dimcli.login(username, password, endpoint)
dsl = dimcli.Dsl()

#
# load common libraries
import time
import sys
import json
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from tqdm.notebook import tqdm as progress

#
# charts libs
# import plotly_express as px
import plotly.express as px
if 'google.colab' not in sys.modules:
  # make JS dependencies local (needed by HTML exports)
  from plotly.offline import init_notebook_mode
  init_notebook_mode(connected=True)

DimCli v0.6.7 - Succesfully connected to <https://app.dimensions.ai> (method: dsl.ini file)
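The log above shows that this login actually picked up credentials from a `dsl.ini` file rather than the form fields. If you prefer not to type credentials into the notebook at all, one option is to read them from environment variables. This is a sketch under assumptions: the variable names `DIMENSIONS_USERNAME`, `DIMENSIONS_PASSWORD` and `DIMENSIONS_ENDPOINT` and the helper `get_credentials` are hypothetical, not a dimcli convention.

```python
import os


def get_credentials(default_endpoint="https://app.dimensions.ai"):
    """Read Dimensions credentials from hypothetical environment variables."""
    username = os.environ.get("DIMENSIONS_USERNAME", "")
    password = os.environ.get("DIMENSIONS_PASSWORD", "")
    endpoint = os.environ.get("DIMENSIONS_ENDPOINT", default_endpoint)
    return username, password, endpoint


username, password, endpoint = get_credentials()
# then log in as in the cell above: dimcli.login(username, password, endpoint)
```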


## Load a CSV file with publication titles 

In [2]:
data = pd.read_csv("pubs_titles_to_match.csv")
data.head(10)

| Unnamed: 0 | title |
|---|---|
| 0 | The Importance of Truth Telling and Trust |
| 1 | How municipalities support energy cooperatives... |
| 2 | Real-world study for the optimal charging of e... |
| 3 | The Impact of 3D Printing on Manufacturer-Reta... |
| 4 | Influences of supply chain finance on the mass... |
| 5 | Review of probabilistic load flow approaches f... |
| 6 | Economic analysis of the disassembling activit... |
| 7 | Third-party remanufacturing mode selection for... |
| 8 | Comparison of Different Electric Vehicle Integ... |
| 9 | A new fuzzy approach based on BWM and fuzzy pr... |
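The `Unnamed: 0` column suggests the CSV was saved together with its pandas index (an assumption about how the file was produced). Passing `index_col=0` to `pd.read_csv` reads that column back as the index instead of as data; a small in-memory example:

```python
import io

import pandas as pd

# A CSV written by DataFrame.to_csv() keeps the index as an unnamed first column
csv_text = ",title\n0,First title\n1,Second title\n"

df_default = pd.read_csv(io.StringIO(csv_text))
print(list(df_default.columns))  # ['Unnamed: 0', 'title']

# index_col=0 turns that first column back into the index
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df_indexed.columns))  # ['title']
```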


## Query to match publications in Dimensions

Note: newer versions of dimcli have better built-in string escaping than the helper defined below.

In [50]:
def dsl_escape(stringa, all=False):
    r"""
    Helper for escaping the full-text inner query string when it includes quotes. Usage:

    `search publications for "{dsl_escape(complex_q)}" return publications`

    E.g. imagine the query string:
    '"2019-nCoV" OR "COVID-19" OR "SARS-CoV-2" OR (("coronavirus"  OR "corona virus") AND (Wuhan OR China))'

    To embed it in a DSL query, written as a Python string literal it has to become:
    '\\"2019-nCoV\\" OR \\"COVID-19\\" OR \\"SARS-CoV-2\\" OR ((\\"coronavirus\\"  OR \\"corona virus\\") AND (Wuhan OR China))'

    With all=True, every DSL special character is escaped, not just double quotes:
    ^ " : ~ \ [ ] { } ( ) ! | & +

    See also: https://docs.dimensions.ai/dsl/language.html#for-search-term
    """
    if all:
        # prefix each DSL special character with a backslash
        # (str.translate maps each character once, so "\" is never escaped twice)
        special = '^":~\\[]{}()!|&+'
        escaped = stringa.translate(str.maketrans({c: "\\" + c for c in special}))
    else:
        # escape double quotes only
        escaped = stringa.translate(str.maketrans({'"': r'\"'}))
    return escaped


In [57]:
dsl_escape('Solar cells: a new technology?', True)

'Solar cells\\: a new technology?'
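For comparison, the same escaping can be written with a regular expression instead of `str.translate`. This is an alternative sketch, not part of dimcli (the name `dsl_escape_re` is hypothetical); it escapes the same characters as `dsl_escape(..., True)` and leaves others, such as `?`, untouched:

```python
import re

# The same set of DSL special characters handled by dsl_escape(..., all=True)
DSL_SPECIAL = '^":~\\[]{}()!|&+'
_SPECIAL_RE = re.compile("[" + re.escape(DSL_SPECIAL) + "]")


def dsl_escape_re(text):
    """Prefix every DSL special character with a backslash (hypothetical regex variant)."""
    # a function replacement avoids backslash handling in re.sub templates
    return _SPECIAL_RE.sub(lambda m: "\\" + m.group(0), text)


print(dsl_escape_re('Solar cells: a new technology?'))  # Solar cells\: a new technology?
```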

In [52]:
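q_template = """
search publications in title_abstract_only for "{}" return publications[id+doi+title+year+journal] limit 1
"""
q = '"2019-nCoV" OR "COVID-19" OR "SARS-CoV-2" OR (("coronavirus"  OR "corona virus") AND (Wuhan OR China))'
q_template.format(dsl_escape(q))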

'\nsearch publications in title_abstract_only for "\\"2019-nCoV\\" OR \\"COVID-19\\" OR \\"SARS-CoV-2\\" OR ((\\"coronavirus\\"  OR \\"corona virus\\") AND (Wuhan OR China))" return publications[id+doi+title+year+journal] limit 1\n'
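The doubled backslashes in the output above are only Python's `repr` of the string: the notebook echoes the value of the last expression using `repr`. The query text itself contains a single backslash before each quote, which `print` makes visible. A small illustration:

```python
escaped = 'Say \\"hi\\"'  # the actual characters are: Say \"hi\"

print(repr(escaped))  # 'Say \\"hi\\"'
print(escaped)        # Say \"hi\"
print(len('\\"'))     # 2 (one backslash plus one quote)
```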


In [54]:
dsl.query(q_template.format(dsl_escape(q)))

Returned Publications: 1 (total = 17496)


<dimcli.Dataset object #4577230224. Records: 1/17496>
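With `limit 1` the query keeps only the top-ranked hit, even when the total is large (17496 records here), so the first result is not guaranteed to be the same paper as the source title. One way to flag doubtful matches is to compare the source title with the returned one. A sketch using the standard library's `difflib`; the helper names and the 0.9 threshold are assumptions, not part of dimcli:

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Ratio in [0, 1] between two titles, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_probable_match(source_title, found_title, threshold=0.9):
    """Hypothetical rule: accept the hit only if the titles are nearly identical."""
    return title_similarity(source_title, found_title) >= threshold


print(is_probable_match("The Importance of Truth Telling and Trust",
                        "The importance of truth telling and trust."))  # True
print(is_probable_match("The Importance of Truth Telling and Trust",
                        "Trust in supply chains"))  # False
```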

In [55]:
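q_template = """
search publications in title_abstract_only for "{}" return publications[id+doi+title+year+journal] limit 1
"""

# dsl_escape(x, True) escapes every DSL special character: ^ " : ~ \ [ ] { } ( ) ! | & +

results = []  # one DataFrame per title that returned at least one hit
for x in data.title.to_list()[:10]:
    q = q_template.format(dsl_escape(x, True))
    print("===\n", q)
    res = dsl.query(q)
    if res.count_batch:
        results.append(res.as_dataframe())

# combine all matches once at the end (DataFrame.append returned a copy and was removed in pandas 2.0)
out = pd.concat(results, ignore_index=True) if results else pd.DataFrame()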



===
 
search publications in title_abstract_only for "The Importance of Truth Telling and Trust" return publications[id+doi+title+year+journal] limit 1

Returned Publications: 1 (total = 12)
===
 
search publications in title_abstract_only for "How municipalities support energy cooperatives\: survey results from Germany and Switzerland" return publications[id+doi+title+year+journal] limit 1

Returned Publications: 1 (total = 1)
===
 
search publications in title_abstract_only for "Real-world study for the optimal charging of electric vehicles" return publications[id+doi+title+year+journal] limit 1

Returned Publications: 1 (total = 31)
===
 
search publications in title_abstract_only for "The Impact of 3D Printing on Manufacturer-Retailer Supply Chains" return publications[id+doi+title+year+journal] limit 1

Returned Publications: 1 (total = 1)
===
 
search publications in title_abstract_only for "Influences of supply chain finance on the mass customization program\: risk attitudes and
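Two pandas details matter when accumulating results in a loop like the one above: `pd.DataFrame` without parentheses is the class itself, not an empty table, and `DataFrame.append` (removed in pandas 2.0) returned a new frame instead of modifying the original, so its result had to be reassigned. Collecting the per-query frames in a list and calling `pd.concat` once avoids both problems. A minimal sketch, with made-up frames standing in for `res.as_dataframe()`:

```python
import pandas as pd

# stand-ins for the per-query results returned by res.as_dataframe()
batches = [
    pd.DataFrame({"title": ["First match"], "year": [2019]}),
    pd.DataFrame(),  # a query with no hits
    pd.DataFrame({"title": ["Second match"], "year": [2020]}),
]

frames = [b for b in batches if not b.empty]  # skip empty results
out = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
print(out)
```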

## TODO

* fix escaping of special characters
* include the source title string in the results
* for each result, include how many items were found in total (e.g. 1 or 30)
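The last two items could be handled by adding the source title and the total hit count as extra columns. A minimal sketch, assuming the dimcli result object exposes the total as `count_total` alongside the `count_batch` used above (check this against your dimcli version); `FakeResult` and `annotate` are hypothetical names so the example runs offline:

```python
import pandas as pd


class FakeResult:
    """Hypothetical stand-in for a dimcli query result."""

    def __init__(self, rows, total):
        self._rows = rows
        self.count_batch = len(rows)
        self.count_total = total  # assumed attribute name for the total hit count

    def as_dataframe(self):
        return pd.DataFrame(self._rows)


def annotate(res, source_title):
    """Return the result rows with the source title and total hit count added."""
    df = res.as_dataframe()
    df.insert(0, "source_title", source_title)
    df["total_found"] = res.count_total
    return df


title = "Real-world study for the optimal charging of electric vehicles"
res = FakeResult([{"id": "pub.HYPOTHETICAL", "title": title}], total=31)
print(annotate(res, title))
```

In the real loop, `results.append(annotate(res, x))` would replace `results.append(res.as_dataframe())`.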