add: export a single web page as PDF #3085

Open
congmingyige opened this issue Mar 31, 2021 · 5 comments

Comments

@congmingyige

congmingyige commented Mar 31, 2021

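The problem I ran into

I want to read and annotate the pages in a note-taking app (for example GoodNotes on iPad), e.g. to look through them in the dorm at noon.

The solution I would like

Convert all the HTML pages of the site to PDF.

Alternatives I would also accept

My earlier, unfinished approach:
Features:

  1. Convert every page under https://oi-wiki.org/ to PDF and store each PDF at the corresponding location
  2. Run the program several times (to guard against errors)

Language: Python

Thoughts:

  1. Is there software that can export all the HTML?

To do:

  1. Handling page updates

Version:
2021.3.31 ver1 cgb

On my side:

  1. pdfkit.from_file has problems
  2. pdfkit.from_url fails if the output file at the target path is currently open

User steps:

  1. Install pdfkit and wkhtmltopdf
  2. Download the project from GitHub and place it in path_read
    Note: the path must not contain Chinese characters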
import os

import pdfkit

path_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
## Input path (local copy of the site)
path_read = r'D:\OI-wiki'
## Output path
path_write = r'D:\OI-wiki_to_pdf'
website_dir = 'https://oi-wiki.org/'
## Whether the last pass still found a missing or too-small PDF
## (decides whether to run another pass)
pending = True
## Maximum number of passes
times = 10
while pending and times > 0:
    pending = False
    for root, dirs, files in os.walk(path_read):
        for file in files:
            ## Only index.html files matter
            if file == 'index.html':
                ## Path relative to path_read, without the '.html' extension
                category_1 = os.path.join(root, file)[len(path_read)+1:-5]

                ## Quirk of this site: if the folder holds only this one file,
                ## move the PDF one level up and name it after the folder
                if len(files) + len(dirs) == 1:
                    path_1 = os.path.join(path_write, os.path.dirname(category_1) + '.pdf')
                else:
                    path_1 = os.path.join(path_write, category_1) + '.pdf'
                ## URLs need '/' instead of '\\'
                website_1 = website_dir + category_1.replace('\\', '/') + '.html'
                os.makedirs(os.path.dirname(path_1), exist_ok=True)

                print(website_1, path_1)  ## continuous output, to spot problems

                ## todo: check whether the page exists

                ## Convert (again) if the PDF is missing or suspiciously small
                if not os.path.exists(path_1) or os.path.getsize(path_1) < 4000:
                    pending = True
                    try:
                        pdfkit.from_url(website_1, path_1, configuration=config)
                    except OSError as exc:
                        ## Keep going; the next pass retries this page
                        print('failed:', website_1, exc)
    times -= 1
    ## Total size: 440 MB

    ## todo: test that the files correspond one to one
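
The mapping from a local index.html to its URL and PDF location is the part of the script that is easiest to get wrong, and it can be checked without running wkhtmltopdf at all. Below is a minimal sketch of that mapping as a pure function. The names `page_targets`, `SITE` and `only_entry` are made up for illustration, and it assumes relative paths already use '/' separators.

```python
import posixpath

SITE = 'https://oi-wiki.org/'  # same base URL as in the script above


def page_targets(rel_index, only_entry):
    """Map a relative 'dir/index.html' path to (url, relative PDF path).

    rel_index  -- path relative to the local copy, with '/' separators
    only_entry -- True if index.html is the only entry in its folder
    """
    stem = rel_index[:-len('.html')]            # e.g. 'basic/interaction/index'
    url = SITE + stem + '.html'
    if only_entry:
        # Single-file folder: name the PDF after the folder itself
        pdf = posixpath.dirname(stem) + '.pdf'  # e.g. 'basic/interaction.pdf'
    else:
        pdf = stem + '.pdf'
    return url, pdf
```

With the mapping isolated like this, the "one to one" todo could be approached by collecting the PDF paths for all pages and checking that no path appears twice.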
            

Problem 1:

https://oi-wiki.org/basic/interaction/index.html D:\OI-wiki_to_pdf\basic\interaction.pdf
Loading pages (1/6)
Warning: A finished ResourceObject received a loading finished signal. This might be an indication of an iframe taking too long to load.
Warning: A finished ResourceObject received a loading progress signal. This might be an indication of an iframe taking too long to load.
Counting pages (2/6)
Resolving links (4/6)                                                       
Loading headers and footers (5/6)                                           
Printing pages (6/6)

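Sometimes some of the output files are very small, about 2 KB.

Solution 1a:
Process them again.

Solution 1b:
Looking at the content reveals a redirect: <script type="text/javascript">location.href="/graph/cut"</script>
todo: so the page has to be visited with a simulated browser.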
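
Short of simulating a full browser, one possible workaround is to detect this kind of redirect stub and convert the target URL instead. The sketch below only recognizes the `location.href="..."` form quoted above. `redirect_target` and `REDIRECT_RE` are hypothetical names, and it assumes the function is only applied to the small stub pages, since a normal page might assign `location.href` for other reasons.

```python
import re
from urllib.parse import urljoin

# Matches the redirect stub quoted above, e.g. location.href="/graph/cut"
REDIRECT_RE = re.compile(r"""location\.href\s*=\s*["']([^"']+)["']""")


def redirect_target(html, page_url):
    """Return the absolute URL a stub page redirects to, or None."""
    match = REDIRECT_RE.search(html)
    if match is None:
        return None
    # Resolve relative or root-relative targets against the page's own URL
    return urljoin(page_url, match.group(1))
```

If this returns a URL, that URL could be passed to the converter in place of the original one.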

Problem 2:

OSError: wkhtmltopdf exited with non-zero code 3221226505. error:
Loading pages (1/6)
[=====================================================>      ] 89%

Solution 2:
The error halts the program, so wrap the conversion call in try/except.
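
A rough sketch of what that try/except could look like with a bounded number of retries per page. `convert_with_retries` is a made-up helper, not part of pdfkit. It takes the conversion function as a parameter so it can be tried without wkhtmltopdf installed, and it catches `OSError` because that is the exception type shown in the log above.

```python
def convert_with_retries(convert, url, pdf_path, attempts=3):
    """Call convert(url, pdf_path) until it succeeds or attempts run out.

    Returns True on success, False if every attempt raised OSError.
    """
    for attempt in range(1, attempts + 1):
        try:
            convert(url, pdf_path)
            return True
        except OSError as exc:
            # Report the failure and let the loop try again
            print(f'attempt {attempt} failed for {url}: {exc}')
    return False
```

In the script this might be called with something like `lambda u, p: pdfkit.from_url(u, p, configuration=config)` as the `convert` argument.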

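Problem 3:
Stuck at https://oi-wiki.org/graph/tree-diameter/index.html D:\OI-wiki_to_pdf\graph\tree-diameter.pdf

I don't know why; perhaps it keeps waiting on the connection to the page.

Solution 3:

  1. Run it again
  2. (Later) run with a simulated browser and set a limit on the response time
    https://www.jb51.net/article/141647.htm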
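
As an alternative to browser simulation, a time limit can also be put on the converter process itself. This is a sketch under the assumption that wkhtmltopdf is invoked directly as `wkhtmltopdf <url> <output>` rather than through pdfkit. `run_with_timeout` is a hypothetical helper built on the standard `subprocess.run` timeout.

```python
import subprocess


def run_with_timeout(command, seconds):
    """Run command; return True if it exits with code 0 within `seconds`.

    subprocess.run kills the child process when the timeout expires,
    so a hung conversion cannot block the whole batch.
    """
    try:
        result = subprocess.run(command, timeout=seconds)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

A call might then look like `run_with_timeout([path_to_wkhtmltopdf, url, pdf_path], 120)`, where `path_to_wkhtmltopdf` is the executable path and the 120 seconds is an arbitrary choice; pages that return False can be retried on a later pass.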
@welcome

welcome bot commented Mar 31, 2021

Thank you for your interest in OI Wiki! Remember to express yourself clearly in the issue.

@Enter-tainer
Member

Enter-tainer commented Mar 31, 2021

@congmingyige see https://github.com/OI-wiki/OI-Wiki-export

(OI-wiki/OI-Wiki-export: a tool for exporting OI-Wiki as a print-quality PDF.)

@Enter-tainer
Member

We currently have a solution for exporting the whole site, but exporting a single page does not seem to be solved yet.

@congmingyige
Author

congmingyige commented Mar 31, 2021 via email

@GavinZhengOI GavinZhengOI changed the title from "add: convert web pages to PDF for easier note-taking" to "add: export a single web page as PDF" on Jul 17, 2021
@NachtgeistW
Contributor

Uh, if you don't mind, you could try Ctrl+P...
