# Publications markdown generator for academicpages

Takes a TSV of publications with metadata and converts them for use with [academicpages.github.io](https://academicpages.github.io). This is an interactive Jupyter notebook ([see more info here](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html)). The core Python code is also in `publications.py`. Run either one from the `markdown_generator` folder after replacing `publications.tsv` with one containing your data.

TODO: Make this work with BibTeX and other citation databases, rather than Stuart's non-standard TSV format and citation style.


## Data format

The TSV needs a header row and the following columns: pub_date, title, citation, venue, url_slug, and paper_url. Columns are matched by header name, so their order does not matter. An excerpt column is only needed if you uncomment the excerpt lines in the code below.

- `excerpt` and `paper_url` can be blank, but the others must have values. 
- `pub_date` must be formatted as YYYY-MM-DD.
- `url_slug` becomes the descriptive part of the .md filename and of the permalink for the paper's page. The .md file will be `YYYY-MM-DD-[url_slug].md` and the permalink will be `https://[yourdomain]/publication/YYYY-MM-DD-[url_slug]`.
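The sample data below reuses full titles as `url_slug`, so spaces, colons and parentheses end up in the filenames and permalinks. A colon followed by a space is also a problem inside an unquoted YAML value such as `permalink:`, because YAML parsers reject it. If you are free to change the slugs, a minimal sketch for deriving URL-safe ones might look like this; `slugify` and its `max_words` cutoff are hypothetical choices, not part of `publications.py`:

```python
import re
import unicodedata


def slugify(title: str, max_words: int = 8) -> str:
    """Turn a free-text title into a lowercase, hyphen-separated slug.

    Hypothetical helper, not part of publications.py.
    """
    # Decompose accented letters ("é" -> "e" + combining accent), then drop non-ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Keep runs of letters and digits; spaces, colons and brackets act as separators.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    # Truncate long titles so filenames and URLs stay short.
    return "-".join(words[:max_words])


if __name__ == "__main__":
    print(slugify("Barcode demixing through non-negative spatial regression (bardensr)"))
    # barcode-demixing-through-non-negative-spatial-regression-bardensr
```

Keeping only ASCII letters and digits is a deliberately blunt rule: it guarantees the slug is safe in a filename, a URL and a plain YAML value, at the cost of dropping punctuation.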

This is how the raw file looks. It is hard to read as plain text, so use a spreadsheet or a similar program to create and edit it.

```
!cat publications.tsv
```

```
pub_date	title	citation	venue	url_slug	paper_url
2021-01-01	Mixseq: mixture sequencing using compressed sensing for in-situ and in-vitro applications 	AM Zador, <b>AG Vaughan</b>	US Patent App. PCT/US2020/066,853 (2021)	Mixseq: mixture sequencing using compressed sensing for in-situ and in-vitro applications (Patent)	https://agvaughan.github.io/files/Vaughan_MIXSEQ_WO2021133911A1.pdf
2021-01-01	Barcode demixing through non-negative spatial regression (bardensr) 	S Chen, J Loper, X Chen, <b>AG Vaughan</b>, AM Zador, L Paninski	PLoS computational biology 17 (3), e1008256 5 (2021)	Barcode demixing through non-negative spatial regression (bardensr)	https://agvaughan.github.io/files/Chen_Vaughan_Paninski.pdf
2019-01-01	Frontal cortex neuron types categorically encode single decision variables 	J Hirokawa, <b>AG Vaughan</b>, P Masset, T Ott, A Kepecs	Nature 576 (7787), 446-451 75 (2019)	Frontal cortex neuron types categorically encode single decision variables (Nature)	https://agvaughan.gith
```

## Import pandas

We are using the very handy pandas library for dataframes.

```python
import pandas as pd
```

## Import TSV

Pandas makes this easy with the `read_csv` function. Because the file is a TSV, we set the separator to a tab, `\t`.

I found it important to keep this data in tab-separated form, because citations and venues contain a lot of commas, which easily break a comma-separated file. If you prefer another format, adapt the import statement: pandas also has `read_excel()`, `read_json()`, and others.

```python
publications = pd.read_csv("publications.tsv", sep="\t", header=0)
publications
```


| | pub_date | title | citation | venue | url_slug | paper_url |
|---|---|---|---|---|---|---|
| 0 | 2021-01-01 | Mixseq: mixture sequencing using compressed se... | AM Zador, <b>AG Vaughan</b> | US Patent App. PCT/US2020/066,853 (2021) | Mixseq: mixture sequencing using compressed se... | https://agvaughan.github.io/files/Vaughan_MIXSE... |
| 1 | 2021-01-01 | Barcode demixing through non-negative spatial ... | S Chen, J Loper, X Chen, <b>AG Vaughan</b>, AM... | PLoS computational biology 17 (3), e1008256 5 ... | Barcode demixing through non-negative spatial ... | https://agvaughan.github.io/files/Chen_Vaughan_... |
| 2 | 2019-01-01 | Frontal cortex neuron types categorically enco... | J Hirokawa, <b>AG Vaughan</b>, P Masset, T Ott... | Nature 576 (7787), 446-451 75 (2019) | Frontal cortex neuron types categorically enco... | https://agvaughan.github.io/files/Hirokawa_Vaug... |
| 3 | 2018-01-01 | A viral receptor complementation strategy to o... | SJ Li, <b>AG Vaughan</b>, JF Sturgill, A Kepecs | Neuron 98 (5), 905-917. e541 (2018) | A viral receptor complementation strategy to o... | https://agvaughan.github.io/files/Li_Vaughan_St... |
| 4 | 2017-01-01 | Rapid and tunable method to temporally control... | S Senturk, NH Shirole, DG Nowak, V Corbo, D Pa... | Nature Communications 8 (1), 1-10 104 (2017) | Rapid and tunable method to temporally control... | https://agvaughan.github.io/files/Senturk_2016.pdf |
| 5 | 2016-01-01 | A mechanosensory circuit that mixes opponent c... | AEB Chang, <b>AG Vaughan</b>, RI Wilson | Neuron 92 (4), 888-901 22 (2016) | A mechanosensory circuit that mixes opponent c... | https://agvaughan.github.io/files/Chang_Vaughan... |
| 6 | 2015-01-01 | Central neural circuitry mediating courtship s... | C Zhou, R Franconville, <b>AG Vaughan</b>, CC ... | Elife 4, e08477 68 (2015) | Central neural circuitry mediating courtship s... | https://agvaughan.github.io/files/Zhou_2016.pdf |
| 7 | 2014-01-01 | Neural pathways for the detection and discrimi... | <b>AG Vaughan</b>, C Zhou, DS Manoli, BS Baker | Current Biology 24 (10), 1039-1049 68 (2014) | Neural pathways for the detection and discrimi... | https://agvaughan.github.io/files/Vaughan_2014.pdf |
| 8 | 2010-01-01 | Sex and the single cell. II. There is a time a... | CC Robinett, <b>AG Vaughan</b>, JM Knapp, BS B... | PLoS biology 8 (5), e1000365 193 (2010) | Sex and the single cell. II. There is a time a... | https://agvaughan.github.io/files/Robinett_Vaug... |
| 9 | 2009-01-01 | Manipulation of an innate escape response in D... | G Zimmermann, L Wang, <b>AG Vaughan</b>, DS Ma... | PloS one 4 (4), e5100 27 (2009) | Manipulation of an innate escape response in D... | https://agvaughan.github.io/files/Zimmerman.pdf |
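With the default `read_csv` settings, a blank `paper_url` cell is read as `NaN`, which is why the loop below tests `len(str(item.paper_url)) > 5` (`str(float("nan"))` is `'nan'`). A minimal sketch of a stricter loader, assuming your TSV has the columns shown above; `load_publications`, `REQUIRED` and the sample rows are hypothetical. It reads every cell as a string, keeps blanks as `""`, and checks the rules from the Data format section before anything is written:

```python
import io

import pandas as pd

# Hypothetical two-row TSV; the second row has an empty paper_url.
SAMPLE_TSV = (
    "pub_date\ttitle\tcitation\tvenue\turl_slug\tpaper_url\n"
    "2021-01-01\tPaper A\tA Author, B Author\tJournal 1 (2021)\tpaper-a\thttps://example.org/a.pdf\n"
    "2020-01-01\tPaper B\tC Author\tJournal 2 (2020)\tpaper-b\t\n"
)

REQUIRED = ["pub_date", "title", "citation", "venue", "url_slug"]


def load_publications(source) -> pd.DataFrame:
    """Read the TSV as plain strings, leaving blank cells as "" instead of NaN."""
    df = pd.read_csv(source, sep="\t", header=0, dtype=str, keep_default_na=False)
    # Strip stray whitespace (several sample titles end with a space).
    df = df.apply(lambda col: col.str.strip())
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    empty = df[REQUIRED].eq("").any(axis=1)
    if empty.any():
        raise ValueError(f"empty required fields in rows: {empty[empty].index.tolist()}")
    bad_dates = ~df["pub_date"].str.fullmatch(r"\d{4}-\d{2}-\d{2}")
    if bad_dates.any():
        raise ValueError(f"pub_date not YYYY-MM-DD in rows: {bad_dates[bad_dates].index.tolist()}")
    return df


publications = load_publications(io.StringIO(SAMPLE_TSV))
print(publications["paper_url"].tolist())  # ['https://example.org/a.pdf', '']
```

With blanks kept as empty strings, a check such as `if item.paper_url:` would be enough to skip rows without a URL.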


## Escape special characters

YAML is very picky about what counts as a valid string, so we replace single and double quotes (and ampersands) with their HTML-encoded equivalents. This makes them less readable in the raw files, but they are parsed and rendered correctly.

```python
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
}

def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c, c) for c in text)
```
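The entity table leaves `<` and `>` alone, so the `<b>...</b>` markup in the citations survives and renders bold. If you would rather keep apostrophes readable in the raw files, YAML's own single-quoted style is another option: no backslash escapes are processed, and the only special character is the single quote itself, written twice. A minimal sketch; `yaml_single_quoted` is a hypothetical helper, not part of `publications.py`:

```python
def yaml_single_quoted(text: str) -> str:
    """Wrap text as a YAML single-quoted scalar (hypothetical helper).

    Inside single quotes, YAML does not interpret ":" or "#" or backslashes;
    a literal single quote is written as two single quotes.
    """
    return "'" + text.replace("'", "''") + "'"


print("title: " + yaml_single_quoted("It's a title: with a colon"))
# title: 'It''s a title: with a colon'
```

This only covers single-line values; it is an alternative to the HTML entities, not something the notebook uses.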

## Creating the markdown files

This is where the heavy lifting is done. The loop goes through every row of the TSV DataFrame and builds up one big string (`md`) holding the Markdown for that publication: first the YAML front matter, then the body of the publication's own page.

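```python
import os

# Build and write one Markdown file per row of the TSV.
for _, item in publications.iterrows():
    print(item)  # echo each row as it is processed

    md_filename = str(item.pub_date) + "-" + item.url_slug + ".md"
    html_filename = str(item.pub_date) + "-" + item.url_slug

    ## YAML variables

    # Escape the title as well: a raw double quote would end the quoted string early.
    md = "---\ntitle: \"" + html_escape(item.title) + '"\n'

    md += "collection: publications"

    # Quote the permalink: an unquoted ": " (as in slugs built from titles) is invalid YAML.
    # Inside single quotes, a literal ' is written as ''.
    md += "\npermalink: '/publication/" + html_filename.replace("'", "''") + "'"

    # if len(str(item.excerpt)) > 5:
    #     md += "\nexcerpt: '" + html_escape(item.excerpt) + "'"

    md += "\ndate: " + str(item.pub_date)

    md += "\nvenue: '" + html_escape(item.venue) + "'"

    # Blank cells are read as NaN and str(NaN) == "nan", so this length test
    # skips rows without a paper URL.
    if len(str(item.paper_url)) > 5:
        md += "\npaperurl: '" + item.paper_url + "'"

    md += "\ncitation: '" + html_escape(item.citation) + "'"

    md += "\n---"

    ## Markdown description for individual page

    # if len(str(item.excerpt)) > 5:
    #     md += "\n" + html_escape(item.excerpt) + "\n"

    if len(str(item.paper_url)) > 5:
        md += "\n[Download paper here](" + item.paper_url + ")\n"

    # md += "\nRecommended citation: " + item.citation

    # Keep only the last path component, so a "/" in url_slug cannot write outside _publications.
    md_filename = os.path.basename(md_filename)

    with open(os.path.join("../_publications", md_filename), "w", encoding="utf-8") as f:
        f.write(md)
```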

```
pub_date                                            2021-01-01
title        Mixseq: mixture sequencing using compressed se...
citation                           AM Zador, <b>AG Vaughan</b>
venue                 US Patent App. PCT/US2020/066,853 (2021)
url_slug     Mixseq: mixture sequencing using compressed se...
paper_url    https://agvaughan.github.io/files/Vaughan_MIXSE...
Name: 0, dtype: object
pub_date                                            2021-01-01
title        Barcode demixing through non-negative spatial ...
citation     S Chen, J Loper, X Chen, <b>AG Vaughan</b>, AM...
venue        PLoS computational biology 17 (3), e1008256 5 ...
url_slug     Barcode demixing through non-negative spatial ...
paper_url    https://agvaughan.github.io/files/Chen_Vaughan_...
Name: 1, dtype: object
pub_date                                            2019-01-01
title        Frontal cortex neuron types categorically enco...
citation     J Hirokawa, <b>AG Vaughan</b>, P Masset, T Ott...
venue  
```
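Because the loop mixes string building with file writing, it is hard to check its output without running it against the real folder. A minimal sketch that pulls the loop body into a pure function; `build_markdown` is a hypothetical name, it takes a plain dict keyed by the TSV column names, and it quotes and escapes every string value taken from the TSV, including the title and permalink:

```python
from collections.abc import Mapping

# Same entity table as html_escape_table above.
HTML_ESCAPE_TABLE = {"&": "&amp;", '"': "&quot;", "'": "&apos;"}


def html_escape(text: str) -> str:
    return "".join(HTML_ESCAPE_TABLE.get(c, c) for c in text)


def build_markdown(pub: Mapping[str, str]) -> tuple[str, str]:
    """Return (filename, markdown) for one publication.

    Hypothetical refactor: takes a plain dict keyed by the TSV column names,
    so it can be tested without pandas or the filesystem.
    """
    stem = f"{pub['pub_date']}-{pub['url_slug']}"
    paper_url = pub.get("paper_url", "").strip()
    lines = [
        "---",
        # Double-quoted YAML: the escaping removes any raw " from the title.
        f'title: "{html_escape(pub["title"].strip())}"',
        "collection: publications",
        # Single-quoted YAML: a literal ' is written as ''.
        "permalink: '/publication/" + stem.replace("'", "''") + "'",
        f"date: {pub['pub_date']}",
        f"venue: '{html_escape(pub['venue'])}'",
    ]
    if paper_url:
        lines.append(f"paperurl: '{paper_url}'")
    lines.append(f"citation: '{html_escape(pub['citation'])}'")
    lines.append("---")
    if paper_url:
        lines.append(f"\n[Download paper here]({paper_url})")
    return stem + ".md", "\n".join(lines) + "\n"


name, md = build_markdown({
    "pub_date": "2021-01-01",
    "title": 'A "quoted" title ',
    "citation": "A Author, <b>B Author</b>",
    "venue": "Journal 1 (2021)",
    "url_slug": "a-quoted-title",
    "paper_url": "",
})
print(name)
print(md)
```

In the loop, writing a file would then reduce to `open(os.path.join("../_publications", name), "w", encoding="utf-8")` followed by `f.write(md)`.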

These files are written to `../_publications/`, the `_publications` folder that sits next to the `markdown_generator` folder we're working from.

```
!ls ../_publications/
```

```
2009-01-01-Manipulation of an innate escape response in Drosophila: photoexcitation of acj6 neurons induces the escape response.md
2010-01-01-Sex and the single cell. II. There is a time and place for sex.md
2014-01-01-Neural pathways for the detection and discrimination of conspecific song in D. melanogaster.md
2015-01-01-Central neural circuitry mediating courtship song perception in male Drosophila.md
2016-01-01-A mechanosensory circuit that mixes opponent channels to produce selectivity for complex stimulus features.md
2017-01-01-Rapid and tunable method to temporally control gene editing based on conditional Cas9 stabilization.md
2018-01-01-A viral receptor complementation strategy to overcome CAV-2 tropism for efficient retrograde targeting of neurons.md
2019-01-01-Frontal cortex neuron types categorically encode single decision variables (Nature).md
2021-01-01-Barcode demixing through non-negative spatial regression (bardensr).md
2021-01-01-Mixseq: mixture sequencing using compr
```
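`!ls` is a Jupyter shell escape, so it is not available when you run `publications.py` directly. A minimal pure-Python equivalent, assuming the same `../_publications` layout; `list_publications` is a hypothetical helper. Because every filename starts with `YYYY-MM-DD`, plain string sorting also puts the files in date order:

```python
from pathlib import Path


def list_publications(folder: str = "../_publications") -> list[str]:
    """Return the names of the generated .md files, oldest first."""
    return sorted(p.name for p in Path(folder).glob("*.md"))


if __name__ == "__main__":
    folder = Path("../_publications")
    if folder.is_dir():
        for name in list_publications(str(folder)):
            print(name)
```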

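```
!cat ../_publications/2009-10-01-paper-title-number-1.md
```

```
cat: ../_publications/2009-10-01-paper-title-number-1.md: No such file or directory
```

This cell still points at a placeholder filename that the TSV above never generates, hence the error. To view a generated page, substitute one of the filenames listed above.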
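The generated filenames contain spaces, and some contain colons, so on the shell they have to be quoted. Reading them from Python sidesteps that. A minimal sketch, assuming the same folder layout; `read_front_matter` is a hypothetical helper that returns the YAML lines between the two `---` markers without parsing them, which is enough to eyeball what the loop wrote:

```python
from pathlib import Path


def read_front_matter(path: Path) -> list[str]:
    """Return the lines between the opening and closing '---' of a generated file."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0] != "---":
        raise ValueError(f"{path.name}: no front matter")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError(f"{path.name}: front matter is never closed") from None
    return lines[1:end]


if __name__ == "__main__":
    files = sorted(Path("../_publications").glob("*.md"))
    if files:
        # Path objects need no shell quoting, whatever characters the name contains.
        for line in read_front_matter(files[0]):
            print(line)
```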
