# Improving Coverage Rate for Author IDs by Modifying Previous Scripts

Note that in "Vtech_retrieve_auth_id_v0" I use

`df = frame.apply(enrich_w_auth, axis=1)`

This applies the `enrich_w_auth` function to each row of the VTech CSV file, and `enrich_w_auth` in turn calls `auth_looker` with that row's first and last name. See [this page](https://github.com/MatthewRGonzalez/Vitae/blob/main/scopus%20api%20work/scripts/Vtech_retrieve%20auth_id_v0.ipynb) for details.
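To make the row-wise pattern concrete, here is a minimal offline sketch. `fake_lookup` is a hypothetical stand-in for `auth_looker` (it makes no Scopus call, and its ID is made up), and the two-row frame is invented; only the `first`/`last` column names come from this notebook. Unlike `enrich_w_auth`, it returns the lookup result and assigns it as a column instead of mutating each row.

```python
import pandas as pd


def fake_lookup(first, last):
    # Hypothetical stand-in for auth_looker: a hard-coded table with a made-up ID
    table = {("Dipankar", "Chakravarti"): ["AUTHOR_ID:0000001"]}
    return table.get((first, last), [])


frame = pd.DataFrame({"first": ["Dipankar", "Jane"], "last": ["Chakravarti", "Doe"]})

# axis=1 passes one row (a Series) at a time; returning a list per row gives
# a Series of lists, which is stored as the new 'id_other' column
frame["id_other"] = frame.apply(lambda row: fake_lookup(row["first"], row["last"]), axis=1)
print(frame)
```

Rows with no match simply get an empty list, which keeps the column type consistent for later merging.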

The `auth_looker` function searches Scopus for authors with a given first and last name whose affiliation matches "Virginia Polytechnic Institute".

It came to my attention that this function did not capture every individual's author ID, because some authors list their affiliation as "Virginia Tech" or a similar variation (see, for example, "Dipankar Chakravarti").

To improve coverage, I believe I can run a similar function over the updated CSV file to capture auth_ids for authors whose affiliation name is a variant of "Virginia Polytechnic Institute", such as "Virginia Tech".
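One alternative, sketched here under assumptions, is to fold the affiliation variants into a single query rather than running a second pass. `build_author_query` and `AFFIL_VARIANTS` are hypothetical names, and the sketch assumes Scopus author search accepts `OR` and parentheses between `affil(...)` terms; check that against Scopus's search-syntax documentation before relying on it. The resulting string could then be passed to `ElsSearch(query, 'author')` the same way `auth_looker` does.

```python
# Hypothetical list of affiliation spellings to accept (assumption: extend as needed)
AFFIL_VARIANTS = ["Virginia Polytechnic Institute", "Virginia Tech"]


def build_author_query(first, last, affils=AFFIL_VARIANTS):
    # Join the variants as affil(A) OR affil(B) ...; the parentheses keep the
    # OR group from binding to the name clauses (assumed Scopus syntax)
    affil_clause = " OR ".join("affil(%s)" % a for a in affils)
    return "authlast(%s) AND authfirst(%s) AND (%s)" % (last, first, affil_clause)


q = build_author_query("Dipankar", "Chakravarti")
print(q)
```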

In [None]:
# Some imports
import pandas as pd

# Read in the CSV. It already has a column of auth_ids (`id`), but let's try to improve coverage...
frame = pd.read_csv('multiple_authors.csv')
frame.id


In [None]:
## Define a function to apply API calls over the dataframe

import json

from elsapy.elsclient import ElsClient
from elsapy.elssearch import ElsSearch

## Load configuration (contains my API key for Scopus)
with open("config.json") as con_file:
    config = json.load(con_file)

## Initialize client
client = ElsClient(config['apikey'])


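# Based almost entirely on the function from CV_Prog_beta.
# The try/except skips over errors from the API call; ids defaults to an
# empty list so the function always returns something.

def auth_looker(first, last):
    # Changed the affil value to "Virginia Tech" to increase coverage
    query = 'authlast(%s) and authfirst(%s) and affil(Virginia Tech)' % (last, first)
    ids = []
    try:
        auth_srch = ElsSearch(query, 'author')
        auth_srch.execute(client)
        # Skip result entries that carry no identifier (e.g. an empty result set)
        ids = [item['dc:identifier'] for item in auth_srch.results
               if 'dc:identifier' in item]
    except Exception:
        pass
    return ids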

# This function lets us use the elements of each dataframe row as arguments to
# auth_looker. Applied with DataFrame.apply, it "enriches" the original dataframe
# with a new 'id_other' column.


def enrich_w_auth(row):
    first_val = row['first']
    last_val = row['last']
    try:
        row['id_other'] = auth_looker(first_val, last_val)
    except Exception:
        row['id_other'] = []
    return row
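
Scopus author-search results carry the ID in a `dc:identifier` field of the form `AUTHOR_ID:<number>`. The hypothetical helper below is a sketch of pulling out the bare numbers while skipping entries without that field (elsapy may return such a placeholder entry when a search finds nothing); the sample records are invented.

```python
def extract_author_ids(results):
    # Hypothetical helper: results is a list of dicts shaped like ElsSearch(...).results
    ids = []
    for item in results:
        ident = item.get("dc:identifier")
        if not ident:
            # Skip entries without an identifier (e.g. an empty-result placeholder)
            continue
        # "AUTHOR_ID:12345" -> "12345"; leave any other format untouched
        ids.append(ident.split(":", 1)[1] if ident.startswith("AUTHOR_ID:") else ident)
    return ids


sample = [{"dc:identifier": "AUTHOR_ID:111"}, {"error": "Result set was empty"}]
print(extract_author_ids(sample))
```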



In [None]:
df = frame.apply(enrich_w_auth, axis = 1)

### Thoughts:

This isn't the most efficient or concise code, and I'm not sure it will work perfectly (I'll know after I run it in Syracuse). My expectation is that it will create a new column, `id_other`, filled in for individuals whose affiliation is captured by "Virginia Tech" rather than "Virginia Polytechnic Institute". Then I can clean the data and merge the two ID columns.
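A possible sketch of that cleaning and merging step, assuming `id` and `id_other` hold Python lists, string versions of lists (which is what a list becomes after a round trip through CSV), single ID strings, or missing values. `to_id_list`, `merge_ids` and the `all_ids` column are hypothetical names, and the sample frame is invented.

```python
import ast

import pandas as pd


def to_id_list(value):
    # Normalize one cell to a list of ID strings (assumed cell formats, see above)
    if isinstance(value, list):
        return [str(v) for v in value]
    if value is None or (isinstance(value, float) and pd.isna(value)) or value == "":
        return []
    if isinstance(value, str) and value.startswith("["):
        # e.g. "['AUTHOR_ID:1']" read back from a CSV
        return [str(v) for v in ast.literal_eval(value)]
    return [str(value)]


def merge_ids(row):
    # Union of both columns, keeping first-seen order and dropping duplicates
    merged = to_id_list(row["id"]) + to_id_list(row["id_other"])
    return list(dict.fromkeys(merged))


df = pd.DataFrame({
    "id": ["['AUTHOR_ID:1']", None],
    "id_other": [["AUTHOR_ID:1", "AUTHOR_ID:2"], ["AUTHOR_ID:3"]],
})
df["all_ids"] = df.apply(merge_ids, axis=1)
print(df["all_ids"].tolist())
```

Using `dict.fromkeys` for de-duplication keeps the original `id` values first, so the Virginia Polytechnic Institute matches stay at the front of each list.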