# Publications markdown generator for academicpages

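Takes a TSV of publications with metadata and converts them for use with [academicpages.github.io](academicpages.github.io). This is an interactive Jupyter notebook ([see more info here](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html)). The core python code is also in `publications.py`. Run either one from the `markdown_generator` folder after replacing `publications.tsv` with a file containing your data.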

TODO: Make this work with BibTex and other databases of citations, rather than Stuart's non-standard TSV format and citation style.

## Data format

The TSV needs to have the following columns: pub_date, title, venue, excerpt, citation, url_slug, and paper_url, with a header row at the top.

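- `excerpt` and `paper_url` can be blank, but the others must have values.
- `pub_date` must be formatted as YYYY-MM-DD.
- `url_slug` will be the descriptive part of the .md file and the permalink URL for the page about the paper. The .md file will be `YYYY-MM-DD-[url_slug].md` and the permalink will be `https://[yourdomain]/publication/YYYY-MM-DD-[url_slug]`.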
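
The rules above can be checked before any files are generated. The following is a minimal sketch, not part of the original notebook: `REQUIRED_COLUMNS`, `OPTIONAL_COLUMNS`, `validate_publications`, and the sample row are hypothetical names and data introduced for illustration, and the date check assumes `pub_date` is read as a string.

```python
import re

import pandas as pd

# Hypothetical helper, not part of the original notebook.
# Columns the generator loop reads; excerpt and paper_url may be blank.
REQUIRED_COLUMNS = ["pub_date", "title", "venue", "excerpt",
                    "citation", "url_slug", "paper_url"]
OPTIONAL_COLUMNS = {"excerpt", "paper_url"}
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate_publications(df):
    """Return a list of human-readable problems found in the dataframe."""
    problems = []
    for col in REQUIRED_COLUMNS:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems  # the row checks need every column to be present
    for idx, row in df.iterrows():
        for col in REQUIRED_COLUMNS:
            if col in OPTIONAL_COLUMNS:
                continue
            if pd.isna(row[col]) or str(row[col]).strip() == "":
                problems.append(f"row {idx}: {col} is empty")
        if not DATE_PATTERN.match(str(row["pub_date"])):
            problems.append(f"row {idx}: pub_date is not YYYY-MM-DD")
    return problems


# Made-up sample data with a malformed date.
sample = pd.DataFrame([{
    "pub_date": "2022-1-5", "title": "Example title", "venue": "Example venue",
    "excerpt": "", "citation": "Example citation", "url_slug": "example-title",
    "paper_url": "",
}])
print(validate_publications(sample))
```

An empty list means the dataframe satisfies the column and date rules listed above.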

This is how the raw file looks (it doesn't look pretty, so use a spreadsheet or other program to edit and create it).

In [6]:
!cat publications.tsv

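
The `!cat` shell escape depends on a `cat` executable being on the path, which a default Windows command prompt does not provide. As a sketch of a platform-independent alternative, the file can be read with plain Python. The `show_file` helper is a hypothetical name, and UTF-8 is an assumption about how the file is encoded.

```python
from pathlib import Path


def show_file(path, encoding="utf-8"):
    """Print a text file's contents and return them (hypothetical helper)."""
    text = Path(path).read_text(encoding=encoding)
    print(text)
    return text


# Assumes publications.tsv sits in the current working directory.
if Path("publications.tsv").exists():
    show_file("publications.tsv")
```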


## Import pandas

We are using the very handy pandas library for dataframes.

In [7]:
import pandas as pd

## Import TSV

Pandas makes this easy with the read_csv function. We are using a TSV, so we specify the separator as a tab, or `\t`.

I found it important to put this data in a tab-separated values format, because there are a lot of commas in this kind of data and comma-separated values can get messed up. However, you can modify the import statement, as pandas also has read_excel(), read_json(), and others.

In [8]:
publications = pd.read_csv("publications.tsv", sep="\t", header=0)
publications


| Unnamed: 0 | pub_date | title | venue | excerpt | citation | url_slug | paper_url |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2022-11-20 | Key Node Identification Method Integrating Inf... | The Computer Journal | This paper is about the number 1. The number 2... | XiaoYang Liu, LuYuan Gao. (2022). "Key Node Id... | Key Node Identification Method Integrating Inf... | https://github.com/LuYuanGao1017/LuYuanGao1017... |
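
To illustrate why a tab separator is the safer choice for this kind of data, the sketch below parses a small in-memory TSV whose citation field contains commas. The sample row is made up for illustration, and `io.StringIO` merely stands in for a file on disk.

```python
import io

import pandas as pd

# Made-up sample row; the citation contains commas but no tabs.
raw = (
    "pub_date\ttitle\tcitation\n"
    "2020-01-01\tExample title\tDoe, J., Roe, R. (2020). Example title.\n"
)

df = pd.read_csv(io.StringIO(raw), sep="\t", header=0)

# Each field stays intact because only tabs split columns.
print(df.shape)               # (1, 3)
print(df.loc[0, "citation"])  # Doe, J., Roe, R. (2020). Example title.
```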


## Escape special characters

YAML is very picky about what it accepts as a valid string, so we replace single and double quotes (and ampersands) with their HTML-encoded equivalents. This makes them less readable in raw form, but they are parsed and rendered nicely.

In [9]:
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;"
    }

def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c,c) for c in text)
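
As a quick check of what the escaping produces, the sketch below repeats the table and function so that it runs on its own, then applies them to a made-up string. The comparison with the standard library's `html.escape` is an addition for context: it also escapes `<` and `>`, and it encodes the single quote as `&#x27;` rather than `&apos;`.

```python
import html

html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
}


def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c, c) for c in text)


sample = 'Smith & Jones\' "Key" paper'  # made-up example string
print(html_escape(sample))
# Smith &amp; Jones&apos; &quot;Key&quot; paper

# Standard-library alternative, shown only for comparison.
print(html.escape(sample, quote=True))
# Smith &amp; Jones&#x27; &quot;Key&quot; paper
```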

## Creating the markdown files

This is where the heavy lifting is done. The loop goes through every row of the TSV dataframe and concatenates a big string (```md```) that holds the markdown for that publication. It writes the YAML metadata first, then the description for the individual page.

In [10]:
import os
for row, item in publications.iterrows():
    
    md_filename = str(item.pub_date) + "-" + item.url_slug + ".md"
    html_filename = str(item.pub_date) + "-" + item.url_slug
    year = item.pub_date[:4]
    
    ## YAML variables
    
    md = "---\ntitle: \""   + item.title + '"\n'
    
    md += """collection: publications"""
    
    md += """\npermalink: /publication/""" + html_filename
    
    if len(str(item.excerpt)) > 5:
        md += "\nexcerpt: '" + html_escape(item.excerpt) + "'"
    
    md += "\ndate: " + str(item.pub_date) 
    
    md += "\nvenue: '" + html_escape(item.venue) + "'"
    
    if len(str(item.paper_url)) > 5:
        md += "\npaperurl: '" + item.paper_url + "'"
    
    md += "\ncitation: '" + html_escape(item.citation) + "'"
    
    md += "\n---"
    
    ## Markdown description for individual page
        
    if len(str(item.excerpt)) > 5:
        md += "\n" + html_escape(item.excerpt) + "\n"
    
    if len(str(item.paper_url)) > 5:
        md += "\n[Download paper here](" + item.paper_url + ")\n" 
        
    md += "\nRecommended citation: " + item.citation
    
    md_filename = os.path.basename(md_filename)
       
    with open("../_publications/" + md_filename, 'w', encoding="utf-8") as f:
        f.write(md)
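
To see what a generated file looks like without writing to disk, here is a sketch that builds the same YAML front matter and body for a single made-up record. `build_markdown` is a hypothetical helper that mirrors the loop above; it is not part of `publications.py`, and the sample values are invented.

```python
import pandas as pd

html_escape_table = {"&": "&amp;", '"': "&quot;", "'": "&apos;"}


def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c, c) for c in text)


def build_markdown(item):
    """Hypothetical helper: return (filename, markdown) for one row."""
    stem = str(item.pub_date) + "-" + item.url_slug
    # A blank cell reads as NaN, whose string form "nan" fails this length test.
    has_excerpt = len(str(item.excerpt)) > 5
    has_paper = len(str(item.paper_url)) > 5

    # YAML front matter
    lines = ["---", 'title: "' + item.title + '"', "collection: publications",
             "permalink: /publication/" + stem]
    if has_excerpt:
        lines.append("excerpt: '" + html_escape(item.excerpt) + "'")
    lines.append("date: " + str(item.pub_date))
    lines.append("venue: '" + html_escape(item.venue) + "'")
    if has_paper:
        lines.append("paperurl: '" + item.paper_url + "'")
    lines.append("citation: '" + html_escape(item.citation) + "'")
    lines.append("---")
    md = "\n".join(lines)

    # Markdown description for the individual page
    if has_excerpt:
        md += "\n" + html_escape(item.excerpt) + "\n"
    if has_paper:
        md += "\n[Download paper here](" + item.paper_url + ")\n"
    md += "\nRecommended citation: " + item.citation
    return stem + ".md", md


# Invented sample record with a blank paper_url.
row = pd.Series({
    "pub_date": "2020-01-01", "title": "Example title",
    "venue": "Example Journal", "excerpt": "A short example excerpt.",
    "citation": 'Doe, J. (2020). "Example title."',
    "url_slug": "example-title", "paper_url": "",
})
filename, md = build_markdown(row)
print(filename)
print(md)
```

Returning the string instead of writing it makes the output easy to inspect or test before any file in `../_publications/` is touched.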

These files are in the `_publications` directory, one level up from where we're working.

In [11]:
!ls ../_publications/



In [12]:
!cat ../_publications/2009-10-01-paper-title-number-1.md

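
`!ls` and `!cat` likewise depend on Unix tools being installed. A sketch of an equivalent that uses only the standard library follows. `list_markdown_files` is a hypothetical helper, the `../_publications` path is taken from the loop above, and UTF-8 is an assumption about how the files were written.

```python
from pathlib import Path


def list_markdown_files(directory):
    """Return the sorted names of .md files in a directory (hypothetical helper)."""
    return sorted(p.name for p in Path(directory).glob("*.md"))


out_dir = Path("../_publications")
if out_dir.is_dir():
    names = list_markdown_files(out_dir)
    print("\n".join(names))
    if names:
        # Show the first generated file; undecodable bytes are replaced
        # rather than raising an error.
        print((out_dir / names[0]).read_text(encoding="utf-8", errors="replace"))
```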
