# Talks markdown generator for agvaughan

Takes a TSV of talks with metadata and converts them for use with [agvaughan.github.io](https://agvaughan.github.io). This is an interactive Jupyter notebook ([see more info here](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html)). The core Python code is also in `talks.py`. Run either from the `markdown_generator` folder after replacing `talks.tsv` with one containing your data.

TODO: Make this work with BibTeX and other databases, rather than Stuart's non-standard TSV format and citation style.

In [28]:
import pandas as pd
import os

## Data format

The TSV needs a header row and the following columns: `title`, `type`, `url_slug`, `venue`, `date`, `location`, `talk_url`, `description`. Many of these fields can be blank, but every column must be present in the TSV.

- Fields that cannot be blank: `title`, `url_slug`, `date`. All others can be blank; `type` defaults to "Talk".
- `date` must be formatted as YYYY-MM-DD.
- `url_slug` will be the descriptive part of the .md filename and of the permalink URL for the page about the talk.
    - The .md file will be `YYYY-MM-DD-[url_slug].md` and the permalink will be `https://[yourdomain]/talks/YYYY-MM-DD-[url_slug]`
    - The combination of `url_slug` and `date` must be unique, as it will be the basis for your filenames
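
The rules above can be checked before generating any files. The following is a minimal sketch, not part of `talks.py`: it uses only the standard-library `csv` module instead of pandas, and the names `validate_talks`, `REQUIRED_COLUMNS`, `NON_BLANK`, and the two sample rows are made up for illustration.

```python
import csv
import io
import re
from datetime import datetime

# Column and field rules taken from the list above.
REQUIRED_COLUMNS = ["title", "type", "url_slug", "venue", "date",
                    "location", "talk_url", "description"]
NON_BLANK = ["title", "url_slug", "date"]
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate_talks(tsv_text):
    """Return a list of human-readable problems found in the TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field in NON_BLANK:
            if not (row[field] or "").strip():
                problems.append(f"line {line_no}: {field} is blank")

        date = (row["date"] or "").strip()
        if date:
            # Require zero-padded YYYY-MM-DD and a real calendar date.
            valid = bool(DATE_PATTERN.match(date))
            if valid:
                try:
                    datetime.strptime(date, "%Y-%m-%d")
                except ValueError:
                    valid = False
            if not valid:
                problems.append(f"line {line_no}: date {date!r} is not YYYY-MM-DD")

        # The date/url_slug pair becomes the filename, so it must be unique.
        key = (date, (row["url_slug"] or "").strip())
        if key in seen:
            problems.append(f"line {line_no}: duplicate date/url_slug {key}")
        seen.add(key)
    return problems


# Hypothetical sample: one valid row, one with a blank slug and an unpadded date.
sample = (
    "title\ttype\turl_slug\tvenue\tdate\tlocation\ttalk_url\tdescription\n"
    "Good talk\tTalk\tgood-talk\tCosyne\t2017-02-23\tSalt Lake City, UT\t\t\n"
    "Bad talk\tTalk\t\tCosyne\t2017-2-23\tSalt Lake City, UT\t\t\n"
)
for problem in validate_talks(sample):
    print(problem)
```

Running a check like this first means a blank `url_slug` or an unpadded date is reported up front rather than showing up later as an odd filename.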

This is how the raw file looks. It isn't pretty, so use a spreadsheet or similar program to create and edit it.

In [29]:
!cat talks.tsv

title	type	url_slug	venue	date	location	talk_url	description
"Compressive approaches for neuronal reconstruction using DNA barcodes Alexander Vaughan, Xiaoyin Chen, Anthony Zador, Cold Spring Harbor Laboratory"	Talk		Cosyne	2017-2-23	"Salt Lake City, UT"		
"Categorical representations of decision variables within OFC Alexander Vaughan, Junya Hirokawa, Adam Kepecs, Cold Spring Harbor Laboratory . . . ."	Talk		Cosyne	2015-3-5	"Salt Lake City, UT"		
"Electrons, fluorophores, and nucleotides: bridging the gaps in high-throughput connectomics"	Workshop (Organizer)		Cosyne Workshops	2017-2-27	"Snowbird, UT"		
The perils and promises (and methods) of sequence-based connectomics	Workshop Talk (Organizer)		Cosyne Workshops	2017-2-27	"Snowbird, UT"		
Structured and cell-type-specific encoding of decision variables in orbitofrontal cortex	Talk		Annual Meeting of the Japan Neuroscience Society	2020-07-31	"Kobe, JP"		
Categorical representations of decision variables within Oribito frontal cortex	T

## Import TSV

Pandas makes this easy with the `read_csv` function. We are using a TSV, so we specify the separator as a tab, or `\t`.

I found it important to put this data in a tab-separated values format, because this kind of data contains a lot of commas, and fields in a comma-separated file can be split in the wrong places. However, you can modify the import statement, as pandas also has `read_excel()`, `read_json()`, and others.
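
To make the comma problem concrete, here is a small sketch using the standard-library `csv` module rather than pandas. The record is a shortened, unquoted version of one row from `talks.tsv`, trimmed to three fields for illustration.

```python
import csv
import io

# One illustrative record: the title and the location both contain commas.
line = "Electrons, fluorophores, and nucleotides\tCosyne Workshops\tSnowbird, UT\n"

# Splitting on tabs keeps each field intact.
as_tsv = next(csv.reader(io.StringIO(line), delimiter="\t"))

# Splitting the same unquoted text on commas breaks it in the wrong places.
as_csv = next(csv.reader(io.StringIO(line), delimiter=","))

print(len(as_tsv), as_tsv)
print(len(as_csv), as_csv)
```

With the tab delimiter the record parses as three fields. With the comma delimiter it falls apart into four fragments, because nothing marks the commas inside the title and location as data.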

In [30]:
talks = pd.read_csv("talks.tsv", sep="\t", header=0)
talks = talks.fillna(' ', inplace=False)
talks

| | title | type | url_slug | venue | date | location | talk_url | description |
|---|---|---|---|---|---|---|---|---|
| 0 | Compressive approaches for neuronal reconstruc... | Talk | | Cosyne | 2017-2-23 | Salt Lake City, UT | | |
| 1 | Categorical representations of decision variab... | Talk | | Cosyne | 2015-3-5 | Salt Lake City, UT | | |
| 2 | Electrons, fluorophores, and nucleotides: brid... | Workshop (Organizer) | | Cosyne Workshops | 2017-2-27 | Snowbird, UT | | |
| 3 | The perils and promises (and methods) of seque... | Workshop Talk (Organizer) | | Cosyne Workshops | 2017-2-27 | Snowbird, UT | | |
| 4 | Structured and cell-type-specific encoding of ... | Talk | | Annual Meeting of the Japan Neuroscience Society | 2020-07-31 | Kobe, JP | | |
| 5 | Categorical representations of decision variab... | Talk | | Annual Meeting of the Society for Neuroscience | 2016-7-1 | San Diego, CA | | |
| 6 | The representational content of rat orbitofron... | Talk | | CSHL Symposium | 2014-05-01 | Cold Spring Harbor, NY | | |
| 7 | Interview with a Neuroscientist - Alex Vaughan | Podcast | | Numenta - On Intelligence | 2018-10-24 | Podcast | | |
| 8 | Fairness in AI – Lessons from Practice | Invited Talk | | Wharton Customer Analytics | 2021-5-17 | Conference | | |
| 9 | Molecular Tools For High-Throughput Neuronal C... | Talk | | Kavli Futures Symposium | 2018-10-27 | Santa Monica, California | | |


## Escape special characters

YAML is very picky about what it accepts as a valid string, so we replace single and double quotes (and ampersands) with their HTML-encoded equivalents. This makes them less readable in raw form, but they are parsed and rendered nicely.

In [31]:
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;"
    }

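def html_escape(text):
    # Escape real strings character by character; any non-string value
    # (for example a NaN from a blank cell) becomes the literal "False".
    if isinstance(text, str):
        return "".join(html_escape_table.get(c, c) for c in text)
    return "False"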
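
As a quick illustration of what this escaping does to a string, here is an equivalent sketch built on `str.translate`. It is an alternative formulation, not the notebook's own function; the name `escape_for_yaml` and the sample sentence are hypothetical.

```python
# Same three replacements as the table above, expressed as a translation map.
ESCAPES = str.maketrans({"&": "&amp;", '"': "&quot;", "'": "&apos;"})


def escape_for_yaml(text):
    """Replace &, double quotes, and single quotes with HTML entities."""
    # translate() maps each input character once, so the "&" inside an
    # inserted entity such as "&quot;" is never escaped a second time.
    return text.translate(ESCAPES)


print(escape_for_yaml('The "perils" & promises of Alex\'s method'))
# The &quot;perils&quot; &amp; promises of Alex&apos;s method
```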

## Creating the markdown files

This is where the heavy lifting is done. This loops through all the rows in the TSV dataframe, then starts to concatenate a big string (`md`) that contains the markdown for each talk. It does the YAML metadata first, then does the description for the individual page.
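
To show the shape of the output for a single row, here is a simplified sketch of the front-matter step. It takes a plain dict rather than a dataframe row, treats any empty value as missing (the notebook's loop instead tests `len(...) > 3`), and leaves out HTML escaping. The function name `build_front_matter` and the example talk are hypothetical.

```python
def build_front_matter(talk):
    """Return (filename, front matter text) for one talk given as a dict."""
    slug = talk["date"] + "-" + talk["url_slug"]
    lines = [
        "---",
        'title: "' + talk["title"] + '"',
        "collection: talks",
        # Fall back to "Talk" when no type is given.
        'type: "' + (talk.get("type") or "Talk") + '"',
        "permalink: /talks/" + slug,
    ]
    if talk.get("venue"):
        lines.append('venue: "' + talk["venue"] + '"')
    lines.append("date: " + talk["date"])
    if talk.get("location"):
        lines.append('location: "' + talk["location"] + '"')
    lines.append("---")
    return slug + ".md", "\n".join(lines) + "\n"


# Made-up example row; the slug and title are placeholders.
example = {
    "title": "Example talk",
    "type": "",
    "url_slug": "example-talk",
    "venue": "Cosyne",
    "date": "2017-02-23",
    "location": "Salt Lake City, UT",
}
filename, front_matter = build_front_matter(example)
print(filename)
print(front_matter)
```

The filename follows the `YYYY-MM-DD-[url_slug].md` pattern, and the text between the two `---` lines is what Jekyll reads as metadata for the talk page.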

In [32]:
loc_dict = {}

for row, item in talks.iterrows():
    
    print(item)
    
    md_filename = str(item.date) + "-" + item.url_slug + ".md"
    html_filename = str(item.date) + "-" + item.url_slug 
    year = item.date[:4]
    
    md = "---\ntitle: \"" + html_escape(item.title) + '"\n'
    md += "collection: talks" + "\n"
    
    if len(str(item.type)) > 3:
        md += 'type: "' + item.type + '"\n'
    else:
        md += 'type: "Talk"\n'
    
    md += "permalink: /talks/" + html_filename + "\n"
    
    if len(str(item.venue)) > 3:
        md += 'venue: "' + item.venue + '"\n'
        
    if len(str(item.date)) > 3:
        md += "date: " + str(item.date) + "\n"
    
    if len(str(item.location)) > 3:
        md += 'location: "' + str(item.location) + '"\n'
           
    md += "---\n"
    
    
    if len(str(item.talk_url)) > 3:
        md += "\n[More information here](" + item.talk_url + ")\n" 
        
    
    if len(str(item.description)) > 3:
        md += "\n" + html_escape(item.description) + "\n"
        
        
    md_filename = os.path.basename(md_filename)
    #print(md)
    
    with open("../_talks/" + md_filename, 'w') as f:
        f.write(md)

title          Compressive approaches for neuronal reconstruc...
type                                                        Talk
url_slug                                                        
venue                                                     Cosyne
date                                                   2017-2-23
location                                      Salt Lake City, UT
talk_url                                                        
description                                                     
Name: 0, dtype: object
title          Categorical representations of decision variab...
type                                                        Talk
url_slug                                                        
venue                                                     Cosyne
date                                                    2015-3-5
location                                      Salt Lake City, UT
talk_url                                                        
de

These files are in the `_talks` directory, which sits next to the folder we're working from (one level up, then into `_talks`).

In [33]:
!ls ../_talks

2014-05-01- .md 2017-2-23- .md  2018-10-27- .md 2020-07-31- .md 2022-3-1- .md
2015-3-5- .md   2017-2-27- .md  2018-7-29- .md  2021-5-17- .md
2016-7-1- .md   2018-10-24- .md 2018-9-30- .md  2022-2-23- .md


In [34]:
!cat ../_talks/2013-03-01-tutorial-1.md

cat: ../_talks/2013-03-01-tutorial-1.md: No such file or directory
