# Making the 2015-2016 KIPAC Publication List

Yao-Yuan Mao and Phil Marshall

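In this notebook we describe how we made the publication list for the KIPAC 2016 Annual Report. It uses the code in this repository, and produces some clearly labelled "2015-2016" products, including the plain-text publication list, the list of bibcodes, the list of titles, and the word cloud image.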


> WARNING: this notebook *looks* like it could just be run all the way through, but that's not quite true. Verifying the list of papers found requires human intervention and iteration. Don't just run this all the way through!

In [16]:
from PublicationListUtils import *

## Get the List of Members

This currently has to be maintained by hand. Phil took the 2014-2015 list that Yao had used, and updated it based on the KIPAC website, finding 201 faculty, postdocs, research associates, staff and students.

> This list does not match the list of KIPAC Members in the Annual Report! Phil used the website before remembering that he had a complete list in the report PDF... So the numbers below could be changed, if there is time. If you can still read this message, though, it has not been done!

In [2]:
# 2014-2015:
# url = 'https://docs.google.com/spreadsheets/d/1Ok5i25gibuLTHoRhGhmNMqTplMh--0-i4L_8cxzkUxI/export?format=csv&gid=0'

# 2015-2016 - same Google Sheets file, different worksheet:
url = 'https://docs.google.com/spreadsheets/d/1Ok5i25gibuLTHoRhGhmNMqTplMh--0-i4L_8cxzkUxI/export?format=csv&gid=147959823'

members = load_members_from_google_sheets(url)
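For orientation, here is a minimal sketch of what turning such a CSV export into a member list could look like. It is not the code in `PublicationListUtils`: the function name `parse_members_csv` and the column names `first_name` and `last_name` are assumptions made for illustration, and `key` is the only field this notebook is known to use later on. The sketch is written for Python 3 and parses an in-memory string rather than fetching the URL.

```python
import csv
import io

def parse_members_csv(text):
    """Parse CSV text into a list of member dicts (hypothetical column layout)."""
    members = []
    for row in csv.DictReader(io.StringIO(text)):
        first = row['first_name'].strip()
        last = row['last_name'].strip()
        if not last:
            continue  # skip blank rows left in the spreadsheet
        # 'key' is the display name that the notebook prints for each member
        members.append({'key': '{0} {1}'.format(first, last),
                        'first_name': first,
                        'last_name': last})
    return members

# Two real author names from this notebook plus one blank row
sample = "first_name,last_name\nYao-Yuan,Mao\nPhil,Marshall\n,\n"
members_demo = parse_members_csv(sample)
print([m['key'] for m in members_demo])
```

Keeping the member list in a spreadsheet means it can be edited by hand, which is why the blank-row check is worth having.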

## Find Everyone's Papers

First we query ADS for all papers "published" (either on arxiv or in a journal) during the report period. Phil chose a complete academic year, from September 1, 2015 to August 31, 2016. Note that this methodology leads to some double counting when reports are combined: a paper can appear on the arxiv in time for one report, and then again when it is published during the following year. We decided not to worry about this, on the grounds that both arxiv and journal publication are occasions for celebration!

In [3]:
articles = Articles(query_constraints={'pubdate':'["2015-09-01" TO "2016-08-31"]', 
                                       'database':'("astronomy" OR "physics")'})
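The `pubdate` constraint is just a string, so it is easy to mistype when the report period changes. A small helper like the following could build it from two dates. The name `pubdate_range` is hypothetical, not part of `PublicationListUtils`.

```python
from datetime import date

def pubdate_range(start, end):
    """Build a pubdate range string in the form used in the query above."""
    if end < start:
        raise ValueError('end date is before start date')
    return '["{0}" TO "{1}"]'.format(start.isoformat(), end.isoformat())

constraint = pubdate_range(date(2015, 9, 1), date(2016, 8, 31))
print(constraint)
```

For the 2015-2016 period this reproduces the string typed by hand in the cell above.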

Now we go through the member list, finding articles whose author list appears to match their names. The output of this command is the number of papers found for that member.

In [4]:
for member in members:
    print member['key'], articles.add(member)

Noah Kurinsky 2
Yajie Yuan 3
Hongjun An 6
Keith Thompson 9
Jean-Baptiste Ruffio 4
James Panetta 0
Ranjan Laha 7
Aaron Roodman 41
Gary Godfrey 18
Yao-Yuan Mao 9
Todd Hoeksema 0
Phil Marshall 17
Toshiya Namikawa 13
Warren Morningstar 2
Tony Johnson 5
Anders Borgland 4
Kirk Gilmore 1
Tom Shutt 7
Makoto Asai 6
Fatima Rubio da Costa 4
Ralf Kaehler 0
Tom Abel 4
Warit Mitthumsiri 0
Philipp Mertsch 1
Homer Neal 0
Elisabeth Krause 16
Marco Viero 8
Irina Zhuravleva 12
Patricia Burchat 11
Anja von der Linden 9
Kelly Stifter 1
Daniel Gruen 22
William East 8
James Grayson 7
Maria Elena Monzani 29
Scott Reid 0
Joshua Meyers 0
Tracy Usher 3
Wei Ji 1
Shantha Condamoor 0
Eva Silverstein 4
Ondrej Urban 2
Sean McLaughlin 0
Eric Charles 1
Sami Tantawi 9
Stewart Koppell 0
Matthew Wood 1
Christina Ignarra 8
Dan Akerib 9
Ki Won Yoon 10
Blas Cabrera 4
Richard Dubois 1
Sowmya Kamath 0
Dale Li 10
Greg Madejski 17
Norbert Werner 15
Saptarshi Chaudhuri 3
Lori White 0
Alice Allafort 0
Frederic Effenberger 5
James Russell 7
Ada

## Clean Up The List of Papers

Some articles are wrongly attributed to KIPAC members - these need to have their bibcodes added to the "remove" list. 
Other papers could be missing - these are hard to identify, other than by asking `everyone@kipac` to review the publication list! But when they are found, they can be added to the "white" list to force them into our final list of bibcodes.
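To see why wrong attributions happen, consider a loose matching rule that compares only last name and first initial. We have not checked what `articles.add` actually does, so the functions `author_signature` and `loosely_matches` below are purely illustrative, and the names in the example are made up.

```python
def author_signature(name):
    """Reduce 'Last, First Middle' to ('last', 'f') for loose matching."""
    last, _, given = name.partition(',')
    given = given.strip()
    return (last.strip().lower(), given[:1].lower())

def loosely_matches(member_name, author_list):
    """True if any author shares the member's last name and first initial."""
    signature = author_signature(member_name)
    return any(author_signature(a) == signature for a in author_list)

# 'Doe, John' is a different person, but the loose rule cannot tell
print(loosely_matches('Doe, Jane', ['Doe, John', 'Roe, R.']))
print(loosely_matches('Doe, Jane', ['Roe, J.']))
```

Any rule loose enough to catch abbreviated author names will also catch namesakes, which is why the remove list is needed.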

In [5]:
articles.whiten_member_collab()

In [6]:
with open('white_list.txt') as f:
    white_list = [l.strip() for l in f]
articles.white_list(white_list)

In [7]:
with open('remove_list.txt') as f:
    remove_list = [l.strip() for l in f]
articles.remove(remove_list)
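Conceptually, the two lists act like set operations on the matched bibcodes. The sketch below applies them in the same order as the cells above (white list first, then remove list). How the real `Articles` class treats a bibcode that appears on both lists is not something we have checked, so treat that ordering as an assumption. The helper names and the placeholder bibcodes are made up.

```python
def clean_bibcodes(lines):
    """Strip whitespace and drop blank lines from the contents of a list file."""
    return [l.strip() for l in lines if l.strip()]

def apply_lists(matched, white_list, remove_list):
    """Force white-listed bibcodes in, then take removed bibcodes out."""
    return (set(matched) | set(white_list)) - set(remove_list)

matched = ['bibcode-A', 'bibcode-B', 'bibcode-C']
white = clean_bibcodes(['bibcode-D\n', '\n'])   # trailing blank line is ignored
remove = clean_bibcodes(['bibcode-B\n'])
final = sorted(apply_lists(matched, white, remove))
print(final)
```

Note that the list-reading cells above keep blank lines as empty strings. Dropping them, as `clean_bibcodes` does, avoids passing an empty bibcode along.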

Even after applying the remove list and white list, there could still be papers left to check: the following call reports on the papers that need verification. 

> It's not clear what one is supposed to do here: Phil initially made a giant `white_list` by copying in all the bibcodes that needed verification, and then removed the few that did not look like ours - but Yao may have done something different...

In [8]:
require_verification = articles.get_require_verification()
for item in require_verification:
    print u'{0[bibcode]} [{1}?]\n"{0[title]}" by {0[first_author]} et al.\n'.format(item, u'? '.join(item['to_verify']))
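The format string in that cell is dense, so here it is applied to a single made-up item. The dictionary keys are the ones used in the cell above, while the bibcode, title and names are placeholders.

```python
item = {
    'bibcode': '2016arXiv000000000X',          # placeholder, not a real bibcode
    'title': 'A Made-Up Title',
    'first_author': 'Doe, J.',
    'to_verify': ['Member One', 'Member Two'],  # members whose authorship is in doubt
}

# {0[...]} indexes into the first argument; {1} is the joined member names
line = u'{0[bibcode]} [{1}?]\n"{0[title]}" by {0[first_author]} et al.\n'.format(
    item, u'? '.join(item['to_verify']))
print(line)
```

Each member name ends up followed by a question mark: the join supplies all but the last one, and the literal `?` in the format string supplies the final one.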

Let's see how many papers we have:

In [None]:
print 'total #', articles.get_count()
print 'arxiv #', articles.get_count(arxiv_only=True)
print 'awaits verification', len(require_verification)

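Here's a funny case: a paper that seems to have two bibcodes (same arxiv number, different final letter)? We copy the record stored under one bibcode to the other, so that both keys refer to the same entry: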

In [11]:
articles._bib['2016arXiv160708697E'] = articles._bib['2016arXiv160708697A']

## Writing Out the Publication List

Now that we have a polished list of bibcodes, we can write out the titles and authors and journals in various ways.
There are a few formats we might want to use: latex, html and plain text. David (the annual report designer) wanted plain text.

### LaTeX

In [12]:
with open('tex/entries.tex', 'w') as f:
    for l in articles.generate_formatted_output(AuthorsFormatter(name_formatter_tex), entry_formatter_tex):
        f.write(l + '\n')

### HTML

In [13]:
output = articles.generate_formatted_output(AuthorsFormatter(name_formatter_html), entry_formatter_html)

with open('html/template.html', 'r') as f:
    with open('html/index.html', 'w') as fo:
        for l in f:
            if l.strip() == '<!-- INSERT HERE -->':
                for o in output:
                    fo.write(o+'\n')
            else:
                fo.write(l)
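The HTML cell copies the template line by line and swaps the marker line for the formatted entries. The same logic on in-memory strings looks like this. The function name `fill_template` and the tiny template are made up for the example.

```python
def fill_template(template_lines, entries, marker='<!-- INSERT HERE -->'):
    """Copy template lines, replacing the marker line with one line per entry."""
    out = []
    for line in template_lines:
        if line.strip() == marker:
            out.extend(entry + '\n' for entry in entries)
        else:
            out.append(line)
    return ''.join(out)

template = ['<ul>\n', '  <!-- INSERT HERE -->\n', '</ul>\n']
html = fill_template(template, ['<li>Paper A</li>', '<li>Paper B</li>'])
print(html)
```

Because the comparison uses `line.strip()`, the marker is recognized whatever its indentation in the template, and the marker line itself is not copied to the output.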

### Plain Text

In [17]:
with open('text/entries.txt', 'w') as f:
    for l in articles.generate_formatted_output(AuthorsFormatter(name_formatter_text), entry_formatter_text):
        f.write(l + '\n')

This plain text file is our main product this year, so let's make a copy so that David can find it easily:

In [None]:
# ! cp text/entries.txt KIPAC_Publications_2015-2016.txt

# Actually I needed to strip out the SUP and SUB formatting, like this:

! cat text/entries.txt | sed s%'<SUB>'%''%g | sed s%'</SUB>'%''%g | sed s%'<SUP>'%''%g | sed s%'</SUP>'%''%g > KIPAC_Publications_2015-2016.txt 
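The same clean-up can be done in Python with a single regular expression, which may be easier to read than the chain of `sed` calls. This is an equivalent sketch, not what was actually run. The function name `strip_sub_sup` is made up.

```python
import re

# Matches <SUB>, </SUB>, <SUP> and </SUP>; case-sensitive, like the sed commands
TAG_RE = re.compile(r'</?SU[BP]>')

def strip_sub_sup(text):
    """Remove SUB/SUP tags, keeping the text between them."""
    return TAG_RE.sub('', text)

cleaned = strip_sub_sup('H<SUB>2</SUB>O at 10<SUP>4</SUP> K')
print(cleaned)
```

Note the side effect, which the `sed` version shares: exponents are flattened into the surrounding text, so `10<SUP>4</SUP>` reads as `104` in the plain text file.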

## ADS Metrics Analysis

For this we need to query ADS for all 547 papers, and then "Explore" them, choosing "Citation Metrics" from the menu. The "big query" needs a list of bibcodes, which we write out as follows:

In [19]:
with open('all_bibcodes.txt', 'w') as f:
    for l in articles.get_bibcodes():
        f.write(l + '\n')

In [None]:
! cp all_bibcodes.txt KIPAC_Publications_2015-2016_bibcodes.txt

The option to upload a list of bibcodes is kind of hard to find. It's in the ["Paper Form"](https://ui.adsabs.harvard.edu/#paper-form) tab of the ADS Bumblebee interface. Once there, you can paste a long list of bibcodes into the form box.

Phil saved the metric analysis output as a PDF (from a simple browser print operation) which can be viewed [here](https://github.com/KIPAC/PublicationList/raw/2015-2016/KIPAC_Publications_2015-2016_ADSmetrics.pdf). 

From this, Phil manually edited `tex/main.tex` to say the following (having commented out the 2014 version):

```tex
All 547 papers published by KIPAC members between September 1, 2015 and August 31, 2016.
These papers received more than 79,000 ``reads'' by the community, and by the end of the year they had already been cited by 2,701 other papers (at a mean citation rate of 8.2 per paper),
 \href{https://ui.adsabs.harvard.edu/#search/q=*%3A*&__qid=1d427b14eaa3b54b6bf4455466e67f49}{according to SAO/NASA ADS}.
KIPAC's $h$-index just for this year was 28 (meaning that we wrote 28 papers that had already received 28 citations by the end of August).
```
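For reference, the $h$-index quoted above is the largest number $h$ such that $h$ papers each have at least $h$ citations. ADS computed the real numbers for us; the sketch below only illustrates the definition, using made-up citation counts and hypothetical function names.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def mean_citations(citations):
    """Average number of citations per paper."""
    return sum(citations) / float(len(citations))

demo_counts = [10, 8, 5, 4, 3, 0]   # made-up citation counts for six papers
print(h_index(demo_counts), mean_citations(demo_counts))
```

In this toy list four papers have at least four citations each, but there are not five papers with at least five, so the $h$-index is 4.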

## Making a Word Cloud

We can visualize what KIPAC has been working on by stripping out the titles and passing them verbatim to some online wordcloud tools. Phil made a file containing only titles like this:

In [None]:
! cat KIPAC_Publications_2015-2016.txt | cut -d'"' -f2 | grep . > KIPAC_Publications_2015-2016_titles.txt
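The shell pipeline takes the text between the first pair of double quotes on each line and drops empty lines. A Python equivalent is sketched below. The function name `extract_titles` is made up, and the layout of the example entry is an assumption, since all that the `cut` command tells us is that the title sits inside the first pair of double quotes.

```python
def extract_titles(lines):
    """Mimic `cut -d'"' -f2 | grep .` on an iterable of text lines."""
    titles = []
    for line in lines:
        line = line.rstrip('\n')
        fields = line.split('"')
        # cut passes through, unchanged, any line that lacks the delimiter
        field = fields[1] if len(fields) > 1 else line
        if field:  # `grep .` drops empty lines
            titles.append(field)
    return titles

demo_lines = ['Doe, J. et al. "A Made-Up Title" Journal 1, 2 (2016)\n',
              '\n',
              'no quotes here\n']
titles = extract_titles(demo_lines)
print(titles)
```

The pass-through case matters: any line in the publication list without a double quote ends up in the titles file as-is, so it is worth skimming the result before feeding it to a word cloud tool.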

Phil used [wordle](http://www.wordle.net/create) to make the word cloud image. First he cropped out a small sample from the report PDF, and then pulled out colors using [this online picker](http://imagecolorpicker.com/). Here's the color sample image:
<img src="https://github.com/KIPAC/PublicationList/raw/2015-2016/color_sample.png" width=100px align="right"></img>


Turns out that our purple has (approximate) hex code `#371136` and our teal is `#114f4a`. Phil used the "Telephoto" font, and chose "rounder edges" and "horizontal" layout. Here's the final PNG, grabbed from the screen unfortunately, as the wordle "save as PNG" function failed (stupid Java).

<img src="https://github.com/KIPAC/PublicationList/raw/2015-2016/KIPAC_Publications_2015-2016_titles.png"></img>