# Talks markdown generator for academicpages

Takes a TSV of talks with metadata and converts each row into a markdown file for use with [academicpages.github.io](https://academicpages.github.io). This is an interactive Jupyter notebook ([see more info here](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html)). The core Python code is also in `talks.py`. Run either from the `markdown_generator` folder after replacing `talks.tsv` with one containing your data.

TODO: Make this work with BibTeX and other databases, rather than Stuart's non-standard TSV format and citation style.

In [1]:
import pandas as pd
import os

## Data format

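The TSV needs to have the following columns: `title`, `type`, `url_slug`, `venue`, `date`, `location`, `talk_url`, `description`, with a header row at the top. Many of these fields can be blank, but the columns must be present in the TSV. The `talks.tsv` used here also carries two extra columns, `pdf_url` and `cont_title`; the talks loop further down reads `pdf_url`.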

- Fields that cannot be blank: `title`, `url_slug`, `date`. All other fields can be blank. `type` defaults to "Talk".
- `date` must be formatted as YYYY-MM-DD.
- `url_slug` will be the descriptive part of the .md file name and the permalink URL for the page about the talk.
    - The .md file will be `YYYY-MM-DD-[url_slug].md` and the permalink will be `https://[yourdomain]/talks/YYYY-MM-DD-[url_slug]`
    - The combination of `url_slug` and `date` must be unique, as it will be the basis for your filenames
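
The rules above can be checked before generating any files. The following is a minimal sketch, not part of the original notebook: `check_talks`, `REQUIRED_COLUMNS`, and `NON_BLANK` are hypothetical names, and the checks only cover the eight columns and three rules listed in this section.

```python
import pandas as pd

# Assumption: these lists mirror the column and non-blank rules stated above.
REQUIRED_COLUMNS = ["title", "type", "url_slug", "venue", "date",
                    "location", "talk_url", "description"]
NON_BLANK = ["title", "url_slug", "date"]

def check_talks(df):
    """Return a list of human-readable problems found in a talks dataframe."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        # Without the columns, the remaining checks cannot run.
        return ["missing columns: " + ", ".join(missing)]
    for col in NON_BLANK:
        blank = df[col].isna() | (df[col].astype(str).str.strip() == "")
        for idx in df.index[blank.to_numpy()]:
            problems.append(f"row {idx}: '{col}' is blank")
    # Dates must look like YYYY-MM-DD (format check only, not a calendar check).
    good_date = df["date"].astype(str).str.match(r"^\d{4}-\d{2}-\d{2}$")
    for idx in df.index[~good_date.to_numpy()]:
        problems.append(f"row {idx}: date {df.at[idx, 'date']!r} is not YYYY-MM-DD")
    # date + url_slug becomes the filename, so the pair must be unique.
    dupes = df.duplicated(subset=["date", "url_slug"], keep=False)
    for idx in df.index[dupes.to_numpy()]:
        problems.append(f"row {idx}: duplicate date + url_slug")
    return problems
```

Calling `check_talks(pd.read_csv("talks.tsv", sep="\t", header=0))` returns an empty list when the file follows the rules; otherwise each entry names the offending row.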

This is how the raw file looks. It isn't pretty, so use a spreadsheet or a similar program to create and edit it.

In [2]:
!cat talks.tsv

title	type	url_slug	venue	date	location	talk_url	description	pdf_url	cont_title
Emergence of Quantum Phases in Novel Materials	Attended	Emergence 2021	Instituto de Ciencia de Materiales de Madrid (ICMM), CSIC	September 2021	Madrid, Spain	https://wp.icmm.csic.es/emergence/	Graduate Summer School focused on the effects of interactions and topology in materials and low-dimensional systems.		
Frontiers in Condensed Matter Physics	Poster	QDev 2022	Niels Bohr Institute, University of Copenhagen	July 2022	Copenhagen, Denmark	https://qdev.nbi.ku.dk/summerschool/qdevnbia-summer-school-2022/	Graduate Summer School that covers selected topics from the frontier of condensed matter, selected by high-profile invited teachers based on their ongoing research.	https://carlosp24.github.io/files/Poster2023_04.pdf	Theory of Caroli-de Gennes-Matricon analogs in full-shell hybrid nanowires
YouMat2023	Poster	YouMat 2023	Instituto de Ciencia de Materiales de Madrid (ICMM), CSIC	May 2023	Madrid, Spain	https://

## Import TSV

Pandas makes this easy with the `read_csv` function. We are using a TSV, so we specify the separator as a tab, or `\t`.

I found it important to put this data in a tab-separated values format, because this kind of data contains a lot of commas and comma-separated values break easily when a field is not quoted correctly. However, you can modify the import statement, as pandas also has `read_excel()`, `read_json()`, and others.

In [3]:
talks = pd.read_csv("talks.tsv", sep="\t", header=0)
talks

Unnamed: 0,title,type,url_slug,venue,date,location,talk_url,description,pdf_url,cont_title
0,Emergence of Quantum Phases in Novel Materials,Attended,Emergence 2021,Instituto de Ciencia de Materiales de Madrid (...,September 2021,"Madrid, Spain",https://wp.icmm.csic.es/emergence/,Graduate Summer School focused on the effects ...,,
1,Frontiers in Condensed Matter Physics,Poster,QDev 2022,"Niels Bohr Institute, University of Copenhagen",July 2022,"Copenhagen, Denmark",https://qdev.nbi.ku.dk/summerschool/qdevnbia-s...,Graduate Summer School that covers selected to...,https://carlosp24.github.io/files/Poster2023_0...,Theory of Caroli-de Gennes-Matricon analogs in...
2,YouMat2023,Poster,YouMat 2023,Instituto de Ciencia de Materiales de Madrid (...,May 2023,"Madrid, Spain",https://www.icmm.csic.es/es/icmm/i-seminario-d...,First Seminar for young Materials Researchers ...,https://carlosp24.github.io/files/Poster2023_0...,Theory of Caroli-de Gennes-Matricon analogs in...
3,QuantumMatter 2023,Poster,QuantumMatter 2023,Phantoms Foundation,May 2023,"Madrid, Spain",https://www.quantumconf.eu/2023/,International Conference aiming to gather the ...,https://carlosp24.github.io/files/Poster2023_0...,Theory of Caroli-de Gennes-Matricon analogs in...
4,Bound States in Superconducting Nanodevices,Poster,Bound States 2023,TopSquad and AndQC collaborations,June 2023,"Budapest, Hungary",https://www.boundstates2023.eu/,Workshop on Andreev and Majorana bound states ...,https://carlosp24.github.io/files/Poster2023_0...,Theory of Caroli-de Gennes-Matricon analogs in...
5,Emergence of Quantum Phases in Novel Materials,Poster,Emergence 2023,Instituto de Ciencia de Materiales de Madrid (...,September 2023,"Madrid, Spain",https://wp.icmm.csic.es/emergence/,Graduate Summer School focused on the effects ...,https://carlosp24.github.io/files/Poster2023_0...,Majorana zero modes in full-shell hybrid nanow...
6,Workshop on Superconductor-Semiconductor Hybrids,Attended,Copenhagen 2024,"Niels Bohr Institute, University of Copenhagen",March 2024,"Copenhagen, Denmark",https://qdev.nbi.ku.dk/,Workshop sponsored by the ERC Synergy grant NO...,,
7,European School on Superconductivity and Magne...,Poster,SuperQmap 2024,SuperQmap COST action,April 2024,"Gandía, Spain",https://superqumap.eu/european-school-on-super...,Interdisciplinary Graduate School on the found...,https://carlosp24.github.io/files/Poster2024_0...,Phenomenology of Majorana zero modes in full-s...
8,Quantum matter for Quantum Technologies Workshop,Poster,Spice 2024,SPICE,May 2024,"Mainz, Germany",https://www.spice.uni-mainz.de/qmqt-home/,,https://carlosp24.github.io/files/Poster2024_0...,Phenomenology of Majorana zero modes in full-s...


## Escape special characters

YAML is strict about what counts as a valid string, so we replace single and double quotes (and ampersands) with their HTML-encoded equivalents. This makes the raw files less readable, but the strings are parsed and rendered correctly.

In [4]:
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;"
    }

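def html_escape(text):
    """Replace &, " and ' in a string with their HTML entities."""
    if isinstance(text, str):
        return "".join(html_escape_table.get(c, c) for c in text)
    else:
        # Non-string input (e.g. NaN from a blank cell) yields the string "False".
        return "False"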
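
As a quick illustration of what the escaping does, here is a small self-contained sketch. It repeats the table and helper from the cell above so that it runs on its own; the input strings are made-up examples.

```python
# Copy of the escape table and helper from the cell above, repeated so this runs standalone.
html_escape_table = {"&": "&amp;", '"': "&quot;", "'": "&apos;"}

def html_escape(text):
    if isinstance(text, str):
        return "".join(html_escape_table.get(c, c) for c in text)
    return "False"

print(html_escape('Andreev & "Majorana" bound states'))
# Andreev &amp; &quot;Majorana&quot; bound states
print(html_escape("European Researchers' Night"))
# European Researchers&apos; Night
print(html_escape(float("nan")))  # pandas reads a blank cell as NaN
# False
```

The last call shows why the loops below test a field's length before escaping it: a blank cell is not a string, so escaping it directly would write the literal text `False` into the page.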

## Creating the markdown files

This is where the heavy lifting is done. The code loops through all the rows in the TSV dataframe and, for each row, concatenates a big string (`md`) that contains the markdown for that entry. It writes the YAML metadata first, then the description for the individual page.

In [5]:
loc_dict = {}

for row, item in talks.iterrows():
    
    md_filename = str(item.date) + "-" + item.url_slug + ".md"
    html_filename = str(item.date) + "-" + item.url_slug 
    year = item.date[:4]
    
    md = "---\ntitle: \""   + item.title + '"\n'
    md += "collection: talks" + "\n"
    
    if len(str(item.type)) > 3:
        md += 'type: "' + item.type + '"\n'
    else:
        md += 'type: "Talk"\n'
    
    md += "permalink: /talks/" + html_filename + "\n"
    
    if len(str(item.venue)) > 3:
        md += 'venue: "' + item.venue + '"\n'
        
    # Guard the date line on the date field itself, not on location.
    if len(str(item.date)) > 3:
        md += "date: " + str(item.date) + "\n"
    
    if len(str(item.location)) > 3:
        md += 'location: "' + str(item.location) + '"\n'
           
    md += "---\n"
    
    if len(str(item.pdf_url)) > 3:
        md += "\n[Check my contribution here.](" + item.pdf_url + ")\n"

     

    if len(str(item.description)) > 3:
        md += "\n" + html_escape(item.description) + "\n"

    if len(str(item.talk_url)) > 3:
        md += "\n[More info here.](" + item.talk_url + ")\n"
        
        
    md_filename = os.path.basename(md_filename)
    #print(md)
    
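    # Write as UTF-8 so accented characters (e.g. "Gandía") survive on every platform.
    with open("../_talks/" + md_filename, 'w', encoding="utf-8") as f: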
        f.write(md)
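
To inspect the front matter before any file is written, the same steps can be wrapped in a function that only returns a string. This is a sketch under assumptions, not the notebook's own code: `is_filled` and `talk_front_matter` are hypothetical helpers, blank cells are detected by type rather than by the `len(str(...)) > 3` test used above, `date` is always written because the data format requires it, and the example row is invented.

```python
def is_filled(value):
    """True when a cell holds real text (pandas reads blank cells as NaN floats)."""
    return isinstance(value, str) and value.strip() != ""

def talk_front_matter(item):
    """Build the YAML front matter for one row, given as a dict or a pandas Series."""
    slug = str(item["date"]) + "-" + item["url_slug"]
    talk_type = item["type"] if is_filled(item["type"]) else "Talk"
    lines = [
        "---",
        'title: "' + item["title"] + '"',
        "collection: talks",
        'type: "' + talk_type + '"',
        "permalink: /talks/" + slug,
    ]
    if is_filled(item["venue"]):
        lines.append('venue: "' + item["venue"] + '"')
    lines.append("date: " + str(item["date"]))
    if is_filled(item["location"]):
        lines.append('location: "' + item["location"] + '"')
    lines.append("---")
    return "\n".join(lines) + "\n"

# Hypothetical row with a blank type, so the "Talk" default applies.
row = {"title": "Example talk", "type": float("nan"), "url_slug": "example-talk",
       "venue": "Example University", "date": "2023-05-01", "location": "Madrid, Spain"}
print(talk_front_matter(row))
# ---
# title: "Example talk"
# collection: talks
# type: "Talk"
# permalink: /talks/2023-05-01-example-talk
# venue: "Example University"
# date: 2023-05-01
# location: "Madrid, Spain"
# ---
```

Because the function has no side effects, it can be called on a single row of the dataframe (for example `talks.iloc[0]`) to preview the result.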

## Outreach

In [6]:
outreach = pd.read_csv("outreach.tsv", sep="\t", header=0)
outreach

Unnamed: 0,title,type,url_slug,venue,date,location,talk_url,description
0,European Researchers' Nigth,Logistics,ERN2022,Instituto de Ciencia de Materiales de Madrid (...,September 2022,"Madrid, Spain",https://lanochedelosinvestigadores.es/,Participation as a member of the logistics tea...
1,Ciencia en la Calle,Workshopper,Ciudad Real 2023,Casa de la Ciencia,June 2023,"Ciudad Real, Spain",https://casadelaciencia.es/,Science fair for families and general public.
2,European Researchers' Night 2023,Workshopper,ERN2023,Instituto de Ciencia de Materiales de Madrid (...,September 2023,"Madrid, Spain",https://lanochedelosinvestigadores.es/,Event for primary school and high-school stude...
3,Semana de la Ciencia (Science Week),Workshopper,Semana de la Ciencia 2023,CSIC,November 2023,"Madrid, Spain",https://www.semanadelaciencia.csic.es/,Scientific outreach event for primary school a...
4,Feria Madrid es Ciencia 2024 (Madrid Science F...,Workshopper,FeriaMadrid2024,Comunidad de Madrid,March 2024,"Madrid, Spain",https://www.madrimasd.org/feriamadridesciencia/,Scientific Fair for all ages students and gene...


In [7]:
loc_dict = {}

for row, item in outreach.iterrows():
    
    md_filename = str(item.date) + "-" + item.url_slug + ".md"
    html_filename = str(item.date) + "-" + item.url_slug 
    year = item.date[:4]
    
    md = "---\ntitle: \""   + item.title + '"\n'
    md += "collection: outreach" + "\n"
    
    if len(str(item.type)) > 3:
        md += 'type: "' + item.type + '"\n'
    else:
        md += 'type: "Outreach event"\n'
    
    md += "permalink: /outreach/" + html_filename + "\n"
    
    if len(str(item.venue)) > 3:
        md += 'venue: "' + item.venue + '"\n'
        
    # Guard the date line on the date field itself, not on location.
    if len(str(item.date)) > 3:
        md += "date: " + str(item.date) + "\n"
    
    if len(str(item.location)) > 3:
        md += 'location: "' + str(item.location) + '"\n'
           
    md += "---\n"

    if len(str(item.description)) > 3:
        md += "\n" + html_escape(item.description) + "\n"
        
    if len(str(item.talk_url)) > 3:
        md += "\n[More info here.](" + item.talk_url + ")\n"
        
    md_filename = os.path.basename(md_filename)
    #print(md)
    
    # Write as UTF-8 so accented characters survive on every platform.
    with open("../_outreach/" + md_filename, 'w', encoding="utf-8") as f:
        f.write(md)