# Generate NSF Collaborators and Other Affiliations Information from Your Publications

[NSF grant proposal guidelines](https://www.nsf.gov/pubs/policydocs/pappg18_1/pappg_2.jsp) require you to list all potential reviewers who may have a conflict of interest in reviewing your proposal.

Some of these reviewers are relatively static (your Ph.D. advisor), but the bulk of them are people you've published with in the last 48 months. This notebook helps specifically with creating that section of the [COA form](https://www.nsf.gov/bfa/dias/policy/coa.jsp).

To start, you will need either:

1. A citation database of your publications

2. A Google Scholar profile ([see how to create](https://libguides.reading.ac.uk/boost/google-scholar-profile))

The output of the notebook will be tab-delimited data that can be copy/pasted into the NSF template.

## Step 1: Get Citations in RIS format

### You have a citation database of your publications

These instructions are generic because there are many different types of citation database. The logical steps are:

1. Run a query that selects all publications of which you are either an author or an editor. In JabRef this would be, e.g., `author=olney or editor=olney`

2. Select the publications matching your query and then save/export in [RIS format](https://en.wikipedia.org/wiki/RIS_%28file_format%29). In JabRef this would be `File -> Export selected entries -> Save dialogue: enter file name and choose RIS file type`
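
For orientation, an RIS export is a sequence of records, each made of `TAG  - value` lines and closed by an `ER  -` line. The following is a minimal sketch, using a made-up record (the names, year, and the `sample`/`parsed` identifiers are hypothetical), of how such lines split into a tag and its content at the first hyphen. Real exports carry many more tags.

```fsharp
// Hypothetical record for illustration only; real exports contain more tags.
let sample =
    [ "TY  - CONF"
      "AU  - Olney, Andrew M."
      "AU  - Doe, Jane"
      "Y1  - 2020///"
      "ER  - " ]

// Split each line at the first hyphen into (tag, content)
let parsed =
    sample
    |> List.map (fun (line : string) ->
        let i = line.IndexOf('-')
        (line.Substring(0, i).Trim(), line.Substring(i + 1).Trim()))

for (tag, content) in parsed do
    printfn "%s => %s" tag content
```

The `AU` lines are the ones that matter for co-authors; editors may appear under other tags depending on the exporting tool.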

### You have a Google Scholar profile

Note that Google Scholar does not have gold-standard citation information, but its author lists are very likely to be correct IF you have curated your profile. It is not unusual to have to purge publications that aren't yours on initial setup.

1. Install [Publish or Perish](https://harzing.com/resources/publish-or-perish)

2. Do `Menu -> Query -> New Google Scholar Query -> input your name` followed by `Query pane -> Right click your name -> Save to File -> Save to RIS`

## Step 2: Change the information below and run

//CHANGE THIS TO YOUR FILE PATH!
let filePath = """/z/aolney/grant_proposals/1-CV/nsf-coauthor/092520.ris"""

//CHANGE THIS TO YOUR LAST NAME!
let yourName = "Olney"

//OPTIONAL - IF YOU HAVE AN OLD NSF COA FORM WITH INSTITUTIONAL INFORMATION, THEN
// 1. SAVE AS TAB DELIMITED TEXT
// 2. PUT THE PATH BELOW
// OTHERWISE LEAVE EMPTY, E.G. ""
let oldPath = "" //"""/z/aolney/grant_proposals/1-CV/nsf-coauthor/coa-olney-datawhys-120819.csv"""

open System.IO
open System.Text.RegularExpressions

type RISElement =
    {
        Tag : string
        Content : string
    }

//remove initials and trim
let removeInitial name =
    Regex.Replace( name, @"[A-Z]\.","").Trim()
    
//Map of old names to institutions. Assume relevant rows start with [A-Z]:
let institutionMap =
    if oldPath <> "" then
        oldPath
        |> System.IO.File.ReadAllLines
        |> Seq.choose( fun row ->
            let s = row.Split('\t')
            if s.Length > 2 && s.[0].Trim().EndsWith(":") then
                let collaborator = s.[1] |> removeInitial                    
                let institution = s.[2].Trim()
                if collaborator <> "" && institution <> "" then
                    Some(collaborator, institution)
                else 
                    None
            else
                None
        )
        |> Map.ofSeq
    else
        Map.empty
    
let getEntries (entryText : string) =
    entryText.Split('\n')
    //remove rows without elements
    |> Seq.filter( fun row -> row.Contains("-"))
    //map row to RISElement
    |> Seq.map( fun element -> 
        let delimIndex = element.IndexOf("-")
        let tag = element.Substring(0, delimIndex).Trim()
        let content = element.Substring( delimIndex + 1 ).Trim() |> removeInitial
        { Tag = tag; Content = content}
    )
    
let authorTuples,editorTuples =
    //ER is end of record; use to group
    Regex.Split(
        filePath |> File.ReadAllText,
        @"^ER\s*-\s*",
        RegexOptions.Multiline
        )
    //Remove empty records
    |> Seq.filter( fun entryText -> entryText <> "" )
    //Map each record to its authors and editors; the 4-year filter is applied afterwards
    |> Seq.collect( fun entryText ->
        let entries = entryText |> getEntries
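        let year = 
            //Y1 is the primary date; PY (publication year) is the other standard RIS year tag
            match entries |> Seq.tryFind( fun entry -> entry.Tag = "Y1" || entry.Tag = "PY" ) with
            | Some(y) when y.Content.Length >= 4 -> y.Content.Substring(0,4) //year not month
            | _ -> "NA" //missing or malformed year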
            
        //We assume authors are always AU
        let authors = entries |> Seq.choose( fun entry -> if entry.Tag = "AU" then Some(entry) else None )
        //Tricky: RIS doesn't handle editors super well, e.g. Jabref makes them A2
        let editors = entries |> Seq.choose( fun entry -> if entry.Tag = "A2" || entry.Tag = "ED" then Some(entry) else None)
        let youEditor = editors |> Seq.exists(  fun entry -> entry.Content.Contains(yourName))
       
        let temp = new ResizeArray<string*string*string>()

        for author in authors do
            temp.Add( author.Content, year, "A:")
        
        //They are coeditors if you are among them
        if youEditor then
            for editor in editors do
                temp.Add( editor.Content, year, "E:")
            
        //collect flattens these
        temp
        )
    //Keep only the last 48 months (approximated as the last 4 calendar years)
    |> Seq.filter( fun (c,y,t) -> 
        match y |> System.Int32.TryParse with
        | true, v -> System.DateTime.Now.Year - v < 4
        | false, _ -> true //keep all NAs for user to fix manually
    )
    |> Seq.toList
    //Partition to separate authors and editors so we can have identical names on both
    |> List.partition (fun (c,y,t) -> t = "A:" )
   
//Get only the last year of collaboration for each individual
let getLastActive tuples =
    tuples
    |> List.groupBy (fun (c,y,t) -> c )
    |> List.map( fun (g,tuples) -> tuples |> Seq.maxBy( fun(c,y,t) -> y) ) //works on strings b/c 4 digit year
    |> List.sortBy( fun (c,y,t) -> t + c) 

//map to something printable
let output = 
    //Get last active year for authors and editors and concatenate
    (authorTuples |> getLastActive) @ (editorTuples |> getLastActive)
    |> Seq.map( fun (c,y,t) -> 
        let institution = 
            match institutionMap.TryFind c with
            | Some inst -> inst
            //Will create a hyperlink in excel to search the name in Google Scholar to get affiliation
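            //The name is URL-escaped so spaces and punctuation do not break the query
            | None -> "\"=HYPERLINK(\"\"https://scholar.google.com/scholar?hl=en&q=" + System.Uri.EscapeDataString(c) + "&btnG=\"\", \"\"SearchAffiliation\"\")\""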
            
        t + "\t" + c + "\t" + institution + "\t\t" + y )

//Done!
System.IO.File.WriteAllLines( filePath + "_OUTPUT.tsv", output )

printfn "DONE - CHECK YOUR OUTPUT FOLDER"

DONE - CHECK YOUR OUTPUT FOLDER
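
The cell above approximates the 48-month window by comparing calendar years rather than exact dates, so publications near the boundary deserve a manual check. As a rough sketch of that comparison in isolation, the hypothetical helper `isRecent` below takes the current year as a parameter (an assumption made here so the result is reproducible; the notebook itself uses `System.DateTime.Now.Year`).

```fsharp
// Hypothetical helper mirroring the notebook's year comparison.
// Unparseable years (e.g. "NA") are kept so they can be fixed by hand.
let isRecent (currentYear : int) (year : string) =
    match System.Int32.TryParse(year) with
    | true, v -> currentYear - v < 4
    | false, _ -> true

// With 2021 as the reference year, 2017 is dropped and the rest are kept
let kept = [ "2017"; "2018"; "2021"; "NA" ] |> List.filter (isRecent 2021)

printfn "%A" kept
```

Because only the year is compared, a paper from early in the oldest retained year may already be more than 48 months old.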


## Step 3: Copy/paste

The notebook writes its tab-delimited output to a file named after your RIS file with `_OUTPUT.tsv` appended. Open that file and copy/paste the rows into the NSF template.

Things to look out for:

- If you imported affiliations from a previous COA, double check any colleagues who have moved (esp. students).
- Double check for duplicates. This happens if colleagues are not consistently using a professional name.
- Add missing affiliations. `Ctrl+click` on the appropriate spreadsheet cell should launch a browser query to Google Scholar to make this easier.
- Remember authors and editors go in different sections.
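
The duplicate check can be partly automated. The following is a minimal sketch, not part of the original notebook, that groups names by last name and reports any last name appearing under more than one spelling. The names and the `lastName` and `possibleDuplicates` identifiers are hypothetical, and the sketch assumes names are in `Last, First` order.

```fsharp
// Hypothetical names as they might appear in the name column of the output
let names = [ "Olney, Andrew"; "Doe, Jane"; "Doe, J"; "Smith, Pat" ]

// Assumes "Last, First" order; compares last names case-insensitively
let lastName (name : string) =
    name.Split(',').[0].Trim().ToLowerInvariant()

// Keep only last names that occur under more than one distinct spelling
let possibleDuplicates =
    names
    |> List.distinct
    |> List.groupBy lastName
    |> List.filter (fun (_, group) -> List.length group > 1)

for (key, group) in possibleDuplicates do
    printfn "%s: %s" key (String.concat " | " group)
```

This also flags different people who share a last name, so treat the result as a list of candidates to review rather than confirmed duplicates.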