# Generating PDFs from HTML with Python

## Installing dependencies

Download wkhtmltopdf from https://wkhtmltopdf.org/index.html and install it on the system.

Note: with a 64-bit Python, use the 64-bit build of wkhtmltopdf.

In total you need:

1. pdfkit
2. wkhtmltopdf (the Python package; optional, since pdfkit does not import it)
3. wkhtmltopdf (the Windows installer), with its path configured

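How it fits together: your program calls pdfkit, and pdfkit runs the wkhtmltopdf executable (wkhtmltopdf.exe on Windows) as a subprocess to convert the HTML to PDF. The executable therefore has to be installed and findable through PATH, or passed to pdfkit explicitly.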
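Since pdfkit ultimately runs the wkhtmltopdf executable, a quick sanity check is to look it up on PATH and ask it for its version. A minimal sketch, assuming the binary is named `wkhtmltopdf` (on Windows `shutil.which` also matches `wkhtmltopdf.exe`); the helper name `wkhtmltopdf_version` is hypothetical:

```python
import shutil
import subprocess


def wkhtmltopdf_version(name="wkhtmltopdf"):
    """Return the output of `<name> --version`, or None if it is not on PATH.

    Hypothetical helper: it looks up the executable the same way a shell would.
    """
    exe = shutil.which(name)  # full path of the executable, or None
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip()


if __name__ == "__main__":
    version = wkhtmltopdf_version()
    print(version or "wkhtmltopdf not found on PATH")
```

If this prints "not found", the PATH steps below (or an explicit path passed to pdfkit) are needed.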

### Windows

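Add wkhtmltopdf's `bin` directory to the PATH environment variable. The change only applies to programs started afterwards, so restart to be safe. From an administrator command prompt (`/M` writes the system-wide variable), run: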

```bat
SETX PATH "%PATH%;C:\Program Files\wkhtmltopdf\bin" /M
```

To set it only for the current command-prompt session, run:

```bat
set PATH=C:\Program Files\wkhtmltopdf\bin;%PATH%
```

### Linux

```shell
sudo apt-get install wkhtmltopdf
```

### Installing the third-party Python libraries

```shell
pip install pdfkit
pip install wkhtmltopdf
```

## Generating the HTML

Install jinja2 and use it to render the HTML template dynamically:

```shell
pip install jinja2
```

Build the HTML template yourself according to your needs, following Jinja2's documentation on writing templates.
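As a starting point, a minimal sketch of a report template and how it is rendered. The template text, the `title`/`rows` variables and the `render_report` helper are hypothetical; `DictLoader` keeps the example self-contained, whereas a real project would typically load templates from disk with `FileSystemLoader`:

```python
from jinja2 import DictLoader, Environment, select_autoescape

# Hypothetical template: a title plus a table filled by a {% for %} loop.
# The <meta charset> line helps wkhtmltopdf render non-ASCII text correctly.
TEMPLATES = {
    "report.html": """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{{ title }}</title></head>
<body>
  <h1>{{ title }}</h1>
  <table>
    {% for name, value in rows %}
    <tr><td>{{ name }}</td><td>{{ value }}</td></tr>
    {% endfor %}
  </table>
</body>
</html>""",
}

env = Environment(
    loader=DictLoader(TEMPLATES),
    autoescape=select_autoescape(["html", "xml"]),  # escapes <, >, & in .html templates
)


def render_report(title, rows):
    """Render report.html with a title and a list of (name, value) pairs."""
    return env.get_template("report.html").render(title=title, rows=rows)


if __name__ == "__main__":
    print(render_report("Weekly report", [("Visits", 120), ("Errors", "<none>")]))
```

The returned string is what gets handed to `pdfkit.from_string`.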

## Converting HTML to PDF with Python

```python
import pdfkit
from jinja2 import Environment, FileSystemLoader, select_autoescape


def convert_pdf(html, pdf_file):
    """Convert an HTML string to a PDF file via pdfkit/wkhtmltopdf."""
    options = {
        'page-size': 'Letter',
        'margin-top': '0.75in',
        'margin-right': '0.75in',
        'margin-bottom': '0.75in',
        'margin-left': '0.75in',
        'encoding': "UTF-8",
        # Repeatable options take a list of (name, value) tuples
        'custom-header': [
            ('Accept-Encoding', 'gzip')
        ],
        'cookie': [
            ('cookie-name1', 'cookie-value1'),
            ('cookie-name2', 'cookie-value2'),
        ],
        'outline-depth': 10,
    }
    # To convert an HTML file on disk instead:
    # pdfkit.from_file(html_path, pdf_file, options=options)
    pdfkit.from_string(html, pdf_file, options=options)


# Load templates from ../app/templates and auto-escape .html/.xml templates
env = Environment(
    loader=FileSystemLoader('../app/templates'),
    autoescape=select_autoescape(['html', 'xml'])
)

template = env.get_template('report.html')
html = template.render()  # render() already returns a str; no encode/decode needed
convert_pdf(html, 'out3.pdf')
```

### Controlling page breaks

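When an HTML page is long, how do you control where the PDF starts a new page?

wkhtmltopdf honors the CSS page-break properties. Adding `page-break-inside: avoid;` to a div's style keeps that block from being split across two pages; to force a new page at a specific point, use `page-break-before: always;` or `page-break-after: always;`. For example, this rule keeps each div on a single page: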

```css
div { width: 800px; min-height: 1362px; margin: auto; page-break-inside: avoid; }
```
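When the document is assembled from several sections, the forced breaks can be emitted while generating the HTML. A minimal sketch, assuming wkhtmltopdf honors `page-break-after: always`; the `join_pages` helper, its input and the `page` class name are hypothetical:

```python
def join_pages(sections):
    """Wrap each HTML fragment in a div and force a page break after all but the last.

    Every div also avoids internal breaks. The last one gets no forced break,
    so the PDF does not end with an empty page.
    """
    parts = []
    for i, fragment in enumerate(sections):
        style = "page-break-inside: avoid;"
        if i < len(sections) - 1:
            style += " page-break-after: always;"
        parts.append(f'<div class="page" style="{style}">{fragment}</div>')
    return "\n".join(parts)


if __name__ == "__main__":
    print(join_pages(["<h1>Summary</h1>", "<h1>Details</h1>"]))
```

The result can be inserted into a template or passed straight to `pdfkit.from_string`.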

### Common problems

#### The `no such file or directory: b''` error

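1. When Python raises this error, an executable (.exe) the code needs to call could not be run: either it is not installed, or its directory has not been added to the PATH environment variable.
2. Sometimes the error only goes away once the executable's path is passed to pdfkit explicitly: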

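```python
import pdfkit

# Tell pdfkit exactly where wkhtmltopdf.exe is instead of relying on PATH
config = pdfkit.configuration(wkhtmltopdf=r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")
pdfkit.from_url(url, name, configuration=config)  # url: page to convert; name: output PDF path
```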
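Rather than hard-coding the path in every call, the lookup can be done once: try PATH first, then fall back to known install locations. A minimal sketch; `find_wkhtmltopdf` is hypothetical and the fallback list only holds the default Windows location used above:

```python
import os
import shutil

# Fallback locations tried when wkhtmltopdf is not on PATH (assumed default Windows install)
DEFAULT_CANDIDATES = (r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",)


def find_wkhtmltopdf(candidates=DEFAULT_CANDIDATES):
    """Return the path of the wkhtmltopdf executable, or raise FileNotFoundError."""
    on_path = shutil.which("wkhtmltopdf")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "wkhtmltopdf not found: install it or add its bin directory to PATH"
    )
```

The result can then be passed as `pdfkit.configuration(wkhtmltopdf=find_wkhtmltopdf())`, and a missing install fails with a clear message instead of the cryptic `b''` error.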

### wkhtmltopdf options

```text
Global Options:
      --collate                       Collate when printing multiple copies (default)
      --no-collate                    Do not collate when printing multiple copies
      --cookie-jar <path>             Read and write cookies from and to the supplied cookie jar file
      --copies <number>               Number of copies to print into the pdf file (default 1)
  -d, --dpi <dpi>                     Change the dpi explicitly (this has no effect on X11 based systems)
                                      (default 96)
  -H, --extended-help                 Display more extensive help, detailing less common command switches
  -g, --grayscale                     PDF will be generated in grayscale
  -h, --help                          Display help
      --htmldoc                       Output program html help
      --image-dpi <integer>           When embedding images scale them down to this dpi
                                      (default 600)
      --image-quality <integer>       When jpeg compressing images use this quality (default 94)
      --license                       Output license information and exit
  -l, --lowquality                    Generates lower quality pdf/ps. Useful to shrink the result document space
      --manpage                       Output program man page
  -B, --margin-bottom <unitreal>      Set the page bottom margin
  -L, --margin-left <unitreal>        Set the page left margin (default 10mm)
  -R, --margin-right <unitreal>       Set the page right margin (default 10mm)
  -T, --margin-top <unitreal>         Set the page top margin
  -O, --orientation <orientation>     Set orientation to Landscape or Portrait (default Portrait)
      --page-height <unitreal>        Page height
  -s, --page-size <Size>              Set paper size to: A4, Letter, etc. (default A4)
      --page-width <unitreal>         Page width
      --no-pdf-compression            Do not use lossless compression on pdf objects
  -q, --quiet                         Be less verbose
      --read-args-from-stdin          Read command line arguments from stdin
      --readme                        Output program readme
      --title <text>                  The title of the generated pdf file (The title of the first document is used if not specified)
  -V, --version                       Output version information and exit

# The outline is the bookmark panel shown on the left side of the PDF viewer
Outline Options:
      --dump-default-toc-xsl          Dump the default TOC xsl style sheet to stdout
      --dump-outline <file>           Dump the outline to a file
      --outline                       Put an outline into the pdf (default)
      --no-outline                    Do not put an outline into the pdf
      --outline-depth <level>         Set the depth of the outline (default 4)

Page Options:
      --allow <path>                  Allow the file or files from the specified
                                      folder to be loaded (repeatable)
      --background                    Do print background (default)
      --no-background                 Do not print background
      --bypass-proxy-for <value>      Bypass proxy for host (repeatable)
      --cache-dir <path>              Web cache directory
      --checkbox-checked-svg <path>   Use this SVG file when rendering checked checkboxes
      --checkbox-svg <path>           Use this SVG file when rendering unchecked checkboxes
      --cookie <name> <value>         Set an additional cookie (repeatable), value should be url encoded.
      --custom-header <name> <value>  Set an additional HTTP header (repeatable)
      --custom-header-propagation     Add HTTP headers specified by
                                      --custom-header for each resource request.
      --no-custom-header-propagation  Do not add HTTP headers specified by
                                      --custom-header for each resource request.
      --debug-javascript              Show javascript debugging output
      --no-debug-javascript           Do not show javascript debugging output (default)
      --default-header                Add a default header, with the name of the
                                      page to the left, and the page number to the right, this is short for:
                                      --header-left='[webpage]'
                                      --header-right='[page]/[toPage]' --top 2cm
                                      --header-line
      --encoding <encoding>           Set the default text encoding, for input
      --disable-external-links        Do not make links to remote web pages
      --enable-external-links         Make links to remote web pages (default)
      --disable-forms                 Do not turn HTML form fields into pdf form fields (default)
      --enable-forms                  Turn HTML form fields into pdf form fields
      --images                        Do load or print images (default)
      --no-images                     Do not load or print images
      --disable-internal-links        Do not make local links
      --enable-internal-links         Make local links (default)
  -n, --disable-javascript            Do not allow web pages to run javascript
      --enable-javascript             Do allow web pages to run javascript (default)
      --javascript-delay <msec>       Wait some milliseconds for javascript finish (default 200)
      --keep-relative-links           Keep relative external links as relative external links
      --load-error-handling <handler> Specify how to handle pages that fail to load:
                                      abort, ignore or skip (default abort)
      --load-media-error-handling <handler> Specify how to handle media files that fail to load:
                                      abort, ignore or skip (default ignore)
      --disable-local-file-access     Do not allowed conversion of a local file
                                      to read in other local files, unless explicitly allowed with --allow
      --enable-local-file-access      Allowed conversion of a local file to read in other local files. (default)
                                      # note: wkhtmltopdf 0.12.6 and later block local file access by default
      --minimum-font-size <int>       Minimum font size
      --exclude-from-outline          Do not include the page in the table of contents and outlines
      --include-in-outline            Include the page in the table of contents and outlines (default)
      --page-offset <offset>          Set the starting page number (default 0)
      --password <password>           HTTP Authentication password
      --disable-plugins               Disable installed plugins (default)
      --enable-plugins                Enable installed plugins (plugins will likely not work)
      --post <name> <value>           Add an additional post field (repeatable)
      --post-file <name> <path>       Post an additional file (repeatable)
      --print-media-type              Use print media-type instead of screen
      --no-print-media-type           Do not use print media-type instead of screen (default)
  -p, --proxy <proxy>                 Use a proxy
      --radiobutton-checked-svg <path> Use this SVG file when rendering checked radiobuttons
      --radiobutton-svg <path>        Use this SVG file when rendering unchecked radiobuttons
      --resolve-relative-links        Resolve relative external links into absolute links (default)
      --run-script <js>               Run this additional javascript after the page is done loading (repeatable)
      --disable-smart-shrinking       Disable the intelligent shrinking strategy
                                      used by WebKit that makes the pixel/dpi ratio none constant
      --enable-smart-shrinking        Enable the intelligent shrinking strategy
                                      used by WebKit that makes the pixel/dpi ratio none constant (default)
      --stop-slow-scripts             Stop slow running javascripts (default)
      --no-stop-slow-scripts          Do not Stop slow running javascripts
      --disable-toc-back-links        Do not link from section header to toc (default)
      --enable-toc-back-links         Link from section header to toc
      --user-style-sheet <url>        Specify a user style sheet, to load with every page
      --username <username>           HTTP Authentication username
      --viewport-size <>              Set viewport size if you have custom scrollbars or css attribute overflow to emulate window size
      --window-status <windowStatus>  Wait until window.status is equal to this string before rendering page
      --zoom <float>                  Use this zoom factor (default 1)

Headers And Footer Options:
      --footer-center <text>          Centered footer text
      --footer-font-name <name>       Set footer font name (default Arial)
      --footer-font-size <size>       Set footer font size (default 12)
      --footer-html <url>             Adds a html footer
      --footer-left <text>            Left aligned footer text
      --footer-line                   Display line above the footer
      --no-footer-line                Do not display line above the footer (default)
      --footer-right <text>           Right aligned footer text
      --footer-spacing <real>         Spacing between footer and content in mm (default 0)

      --header-center <text>          Centered header text
      --header-font-name <name>       Set header font name (default Arial)
      --header-font-size <size>       Set header font size (default 12)
      --header-html <url>             Adds a html header
      --header-left <text>            Left aligned header text
      --header-line                   Display line below the header
      --no-header-line                Do not display line below the header (default)
      --header-right <text>           Right aligned header text
      --header-spacing <real>         Spacing between header and content in mm (default 0)
      --replace <name> <value>        Replace [name] with value in header and footer (repeatable)

TOC Options:
      --disable-dotted-lines          Do not use dotted lines in the toc
      --toc-header-text <text>        The header text of the toc (default Table of Contents)
      --toc-level-indentation <width> For each level of headings in the toc indent by this length (
                                      default 1em) # em is relative to the font size of the current element
      --disable-toc-links             Do not link from toc to sections
      --toc-text-size-shrink <real>   For each level of headings in the toc the font
                                      is scaled by this factor (default 0.8)
      --xsl-style-sheet <file>        Use the supplied xsl style sheet for printing the table of content
```
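The header and footer switches are passed to pdfkit like any other option. A minimal sketch of a page-number footer, assuming wkhtmltopdf's `[page]` and `[topage]` substitution variables; `footer_options` and its default values are hypothetical:

```python
def footer_options(left_text="", font_size=9):
    """Build pdfkit options for a footer: optional left text, 'page / total' on the right."""
    options = {
        "footer-right": "[page] / [topage]",  # substituted by wkhtmltopdf on every page
        "footer-font-size": str(font_size),
        "footer-spacing": "5",                # mm between the footer and the content
        "footer-line": None,                  # None = switch passed without a value
    }
    if left_text:
        options["footer-left"] = left_text
    return options


if __name__ == "__main__":
    print(footer_options("Weekly report"))
```

Merge the result into the main options dict, e.g. `{**options, **footer_options("Weekly report")}`, and leave a non-zero `margin-bottom` so the footer has room on the page.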

### pdfkit options in Python

```python
# pdfkit options: each key is a wkhtmltopdf long option without the leading "--"
OPTIONS = {
    'page-size': 'A4',
    'margin-top': '0in',
    'margin-right': '0in',
    'margin-bottom': '0in',
    'margin-left': '0in',
    'encoding': "UTF-8",
}
```
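pdfkit turns each key of an options dict like this into a wkhtmltopdf long option by prefixing `--`; a value of `None` produces a bare switch, and a list of `(name, value)` tuples repeats the option. The simplified sketch below mimics that mapping to show which command line results. It is an illustration, not pdfkit's actual code, and `options_to_args` is hypothetical:

```python
def options_to_args(options):
    """Turn a pdfkit-style options dict into a wkhtmltopdf argument list (simplified)."""
    args = []
    for key, value in options.items():
        flag = key if key.startswith("-") else "--" + key
        if isinstance(value, list):
            # Repeatable options such as cookie/custom-header: one flag per pair
            for name, val in value:
                args += [flag, str(name), str(val)]
        elif value is None or value == "":
            args.append(flag)  # bare switch, e.g. --no-outline
        else:
            args += [flag, str(value)]
    return args


if __name__ == "__main__":
    opts = {"page-size": "A4", "no-outline": None, "cookie": [("session", "abc")]}
    print(["wkhtmltopdf", *options_to_args(opts), "input.html", "out.pdf"])
```

Reading the options this way makes it easy to try a setting on the command line first and then copy it into the dict.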

### Previewing the Jinja2 template in a browser

```python
import webbrowser
from datetime import datetime
from pathlib import Path

from jinja2 import Environment, FileSystemLoader, select_autoescape

templates_path = ''  # directory containing report.html ('' = current directory)
env = Environment(loader=FileSystemLoader(templates_path),
                  autoescape=select_autoescape(['html', 'xml']))


def get_report_html(site=None, monitor_type=None, timespan=None, engine=None):
    report = env.get_template('report.html')
    # file:// URI of static/css/mystyle.css in the parent of this script's directory
    base_dir = Path(__file__).resolve().parent.parent
    css_path = (base_dir / 'static' / 'css' / 'mystyle.css').as_uri()
    print(css_path)
    # date_time was undefined here; assumed to be the report generation time
    date_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    html = report.render({'css_path': css_path, 'date_time': date_time})
    return html


def open_webbrowser(html):
    # Save the rendered HTML as dst.html and open it by its absolute file:// URI
    dst = Path('dst.html').resolve()
    dst.write_text(html, encoding='utf-8')
    webbrowser.open(dst.as_uri())


if __name__ == '__main__':
    html = get_report_html()
    open_webbrowser(html)
```