A simple crawler for quickly parsing book information.
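As a rough illustration of the "parse book information" idea, here is a minimal standard-library sketch. It assumes the crawler downloads a product page and pulls fields out of its HTML. The helpers fetch_html and extract_title are hypothetical. The real tenlong.py and books.py may use other libraries (see requirements.txt) and extract much more than the page title.

```python
import urllib.request
from html.parser import HTMLParser


def fetch_html(url, timeout=10):
    """Download a page and decode it with the charset the server declares."""
    # Assumption: a browser-like User-Agent; some shops reject urllib's default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._chunks = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only start collecting at the first <title> we meet.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._chunks).strip()

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)


def extract_title(html_text):
    """Return the stripped <title> text, or None if the page has none."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title
```

A typical call would be extract_title(fetch_html(url)) with url set to a product page such as https://www.tenlong.com.tw/products/9781491915325.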
-
Install pyenv and pyenv-virtualenv.
-
Create a virtualenv named py3.
[ jonny@xenial ~/vcs/crawler-book-outline ] $ pyenv virtualenv 3.9.6 py3
-
Use the py3 virtualenv in this directory.
[ jonny@xenial ~/vcs/crawler-book-outline ] $ pyenv local py3
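pyenv local py3 records the name py3 in a .python-version file in the current directory. When PYENV_VERSION is not set, pyenv looks for that file in the current directory and then in each parent directory. The hypothetical helpers below are a simplified sketch of that lookup, only meant to show why the py3 virtualenv is picked up anywhere inside this project.

```python
from pathlib import Path


def find_python_version_file(start):
    """Return the nearest .python-version at or above `start`, or None.

    Simplified model of pyenv's lookup: check the directory itself first,
    then walk up through its parents.
    """
    directory = Path(start).resolve()
    for candidate_dir in (directory, *directory.parents):
        candidate = candidate_dir / ".python-version"
        if candidate.is_file():
            return candidate
    return None


def local_version_name(start):
    """Read the first version name (e.g. "py3") written by `pyenv local`."""
    version_file = find_python_version_file(start)
    if version_file is None:
        return None
    # The file may list several names separated by whitespace; take the first.
    names = version_file.read_text().split()
    return names[0] if names else None
```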
-
Install packages with pip.
(py3) [ jonny@xenial ~/vcs/crawler-book-outline ] $ pip3 install -r requirements.txt
-
Run the tenlong.com.tw crawler with an ISBN-13.
(py3) [ jonny@xenial ~/vcs/crawler-book-outline ] $ python3 tenlong.py 9781491915325
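An ISBN-13 carries its own check digit, so a bad argument can be rejected before any request is made. The sketch below validates the check digit (weights alternate 1 and 3) and builds the product URL using the https://www.tenlong.com.tw/products/<ISBN> pattern that appears later in this README. The function names are hypothetical, and whether tenlong.py validates its argument this way is an assumption.

```python
def is_valid_isbn13(isbn):
    """Check the ISBN-13 check digit; hyphens are ignored."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weighted sum of the first 12 digits: weights 1, 3, 1, 3, ...
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def tenlong_url(isbn):
    """Build a tenlong.com.tw product URL from a valid ISBN-13."""
    digits = isbn.replace("-", "")
    if not is_valid_isbn13(digits):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    return f"https://www.tenlong.com.tw/products/{digits}"
```

For 9781491915325 the weighted sum of the first 12 digits is 125, so the check digit is (10 - 5) % 10 = 5, which matches the last digit.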
-
Run the books.com.tw crawler with a URL.
(py3) [ jonny@xenial ~/vcs/crawler-book-outline ] $ python3 books.py https://www.books.com.tw/products/0010810939
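Because books.py accepts either a full URL or a bare product number, a URL argument has to be reduced to its product number at some point. Here is a minimal sketch of that step, assuming product pages always live under /products/<number> on www.books.com.tw as in the example above. books_product_id_from_url is a hypothetical name, not a function from books.py.

```python
from urllib.parse import urlparse


def books_product_id_from_url(url):
    """Extract the product number from a books.com.tw product URL."""
    parsed = urlparse(url)
    if parsed.hostname not in ("www.books.com.tw", "books.com.tw"):
        raise ValueError(f"not a books.com.tw URL: {url!r}")
    # Expected path: /products/<number>, optionally with a trailing slash.
    # Any query string (e.g. ?sloc=...) is kept out of parsed.path by urlparse.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2 or parts[0] != "products":
        raise ValueError(f"unexpected product path: {parsed.path!r}")
    return parts[1]
```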
-
Run the books.com.tw crawler with a product number.
(py3) [ jonny@xenial ~/vcs/crawler-book-outline ] $ python3 books.py 0010810939
ISBN-13 arguments are not supported yet for books.com.tw.
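Since ISBN-13 is not supported for books.com.tw, it helps to tell an ISBN apart from a product number and fail with a clear message. The sketch below assumes product numbers are 10-character alphanumeric codes like 0010810939 and that ISBN-13 values start with 978 or 979. Both patterns and the name classify_books_arg are assumptions for illustration, not taken from books.py.

```python
import re

# Assumption: books.com.tw product numbers are 10 alphanumeric characters.
PRODUCT_ID = re.compile(r"[A-Za-z0-9]{10}")
# ISBN-13 values use the 978 or 979 prefix followed by 10 more digits.
ISBN13 = re.compile(r"97[89][0-9]{10}")


def classify_books_arg(arg):
    """Return the product number, or raise for input books.py cannot handle yet."""
    value = arg.strip()
    if ISBN13.fullmatch(value):
        raise NotImplementedError("ISBN-13 arguments are not supported for books.com.tw yet")
    if PRODUCT_ID.fullmatch(value):
        return value
    raise ValueError(f"unrecognised argument: {arg!r}")
```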
-
Open index.html with Firefox on GNU/Linux.
(py3) [ jonny@xenial ~/vcs/crawler-book-outline ] $ firefox index.html
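The clean page comes from a local index.html. Assuming the crawler writes what it extracted (here, a title and a chapter list) into that file, a minimal renderer could look like the sketch below. render_outline and write_index are hypothetical names, and the real page layout may differ. Every value is passed through html.escape so characters such as & or < in book data cannot break the markup.

```python
import html
from pathlib import Path


def render_outline(title, chapters):
    """Build a bare HTML page with the book title and its chapter list."""
    safe_title = html.escape(title)
    items = "\n".join(f"    <li>{html.escape(ch)}</li>" for ch in chapters)
    return (
        "<!DOCTYPE html>\n"
        '<html>\n<head><meta charset="utf-8">'
        f"<title>{safe_title}</title></head>\n"
        f"<body>\n  <h1>{safe_title}</h1>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    )


def write_index(title, chapters, path="index.html"):
    """Write the rendered page to disk as UTF-8."""
    Path(path).write_text(render_outline(title, chapters), encoding="utf-8")
```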
-
Now the content of https://www.tenlong.com.tw/products/9781491915325 shows up as a clean page.
The Evernote Web Clipper does not support local files, so we serve the page with Nginx and clip it from there.
-
Run an Nginx container that serves the current directory.
docker run --name nginx -v "$(pwd)":/usr/share/nginx/html/ -p 80:80 -d nginx
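In this command, -v "$(pwd)":/usr/share/nginx/html/ mounts the current directory at the document root of the official nginx image, and -p 80:80 publishes container port 80 on host port 80. A file's path relative to the current directory therefore becomes its URL path on localhost, and the default config serves index.html for /. The hypothetical helper below just spells out that mapping.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import quote


def served_url(local_file, mounted_dir, host="localhost", port=80):
    """Map a file under the mounted directory to the URL Nginx serves it at."""
    # Raises ValueError if local_file is not inside mounted_dir.
    relative = Path(local_file).resolve().relative_to(Path(mounted_dir).resolve())
    # URL paths always use "/" and need percent-encoding for spaces etc.
    url_path = quote(PurePosixPath(*relative.parts).as_posix())
    netloc = host if port == 80 else f"{host}:{port}"
    return f"http://{netloc}/{url_path}"
```

Run from the project directory, served_url("index.html", ".") gives http://localhost/index.html.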
-
Open the page served by Nginx with Firefox on GNU/Linux.
(py3) [ jonny@xenial ~/vcs/crawler-book-outline ] $ firefox http://localhost
-
Finally, we can clip the book information to Evernote with the Evernote Web Clipper.
Copyright (c) chusiang from 2017-2024 under the MIT license.