Crawler Book Info
A sample crawler for quickly parsing book information.
Initialization
- Install virtualenv.
[ jonny@xenial ~/vcs/crawler-book-outline ] $ sudo pip3 install virtualenv
- Create a virtualenv.
[ jonny@xenial ~/vcs/crawler-book-outline ] $ virtualenv -p python3 .venv
- Activate the virtualenv.
[ jonny@xenial ~/vcs/crawler-book-outline ] $ . .venv/bin/activate
- Install the required packages with pip.
(.venv) [ jonny@xenial ~/vcs/crawler-book-outline ] $ pip3 install -r requirements.txt
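Before running the crawlers, you may want to confirm that the `.venv` created above is really the active interpreter. The following is a minimal sketch using only the standard library; the file name `check_venv.py` is hypothetical and not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 point sys.prefix at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
```

Running `python3 check_venv.py` inside the activated `(.venv)` shell should report `True`; outside of it, `False`.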
Usage
tenlong.com.tw
- Run the crawler with an ISBN-13.
(.venv) [ jonny@xenial ~/vcs/crawler-book-outline ] $ python3 tenlong.py 9781491915325
- (Optional) Run the crawler via make.
(.venv) [ jonny@xenial ~/vcs/crawler-book-outline ] $ make tenlong 9781491915325
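`tenlong.py` expects an ISBN-13, and the product page for the example book lives at https://www.tenlong.com.tw/products/9781491915325. The sketch below checks an ISBN-13 check digit before building such a URL, assuming every tenlong.com.tw product page follows that same pattern. `is_valid_isbn13` and `tenlong_url` are hypothetical helpers for illustration, not functions taken from `tenlong.py`.

```python
import re


def is_valid_isbn13(isbn: str) -> bool:
    """Check the length, digits and check digit of an ISBN-13 (hyphens allowed)."""
    digits = isbn.replace("-", "")
    if re.fullmatch(r"[0-9]{13}", digits) is None:
        return False
    # The first 12 digits are weighted 1, 3, 1, 3, ...
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10 == int(digits[12])


def tenlong_url(isbn: str) -> str:
    """Build a tenlong.com.tw product URL (assumed pattern: /products/<ISBN-13>)."""
    if not is_valid_isbn13(isbn):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    return "https://www.tenlong.com.tw/products/" + isbn.replace("-", "")
```

For example, `tenlong_url("9781491915325")` yields the product URL shown in the View Result section, while a mistyped last digit raises `ValueError` before any request is made.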
books.com.tw
- Run the crawler with a URL.
(.venv) [ jonny@xenial ~/vcs/crawler-book-outline ] $ python3 books.py https://www.books.com.tw/products/0010810939
- Run the crawler with a product number.
(.venv) [ jonny@xenial ~/vcs/crawler-book-outline ] $ python3 books.py 0010810939
ISBN-13 arguments are not supported yet for books.com.tw.
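Since `books.py` accepts either a full product URL or the bare product number, both forms have to be reduced to the same identifier. The sketch below shows one way to do that. It assumes product numbers are 10 alphanumeric characters, like `0010810939`, which also means a 13-digit ISBN is rejected. `books_product_id` is a hypothetical helper, not the actual code in `books.py`.

```python
import re
from urllib.parse import urlparse

# Assumption: books.com.tw product numbers look like 0010810939 (10 characters).
PRODUCT_ID = r"[0-9A-Za-z]{10}"


def books_product_id(arg: str) -> str:
    """Return the books.com.tw product number from a URL or a bare number."""
    if arg.startswith(("http://", "https://")):
        # https://www.books.com.tw/products/0010810939 -> path "/products/0010810939";
        # any query string is kept out of .path by urlparse.
        match = re.fullmatch(rf"/products/({PRODUCT_ID})/?", urlparse(arg).path)
        if match is None:
            raise ValueError(f"unrecognised books.com.tw URL: {arg!r}")
        return match.group(1)
    if re.fullmatch(PRODUCT_ID, arg) is None:
        # ISBN-13 arguments (13 digits) end up here, since they are not supported.
        raise ValueError(f"not a books.com.tw product number: {arg!r}")
    return arg
```

Both invocations from the examples above, the URL and `0010810939`, resolve to the same product number.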
View Result
- Open the HTML file with Firefox on GNU/Linux.
(.venv) [ jonny@xenial ~/vcs/crawler-book-outline ] $ firefox index.html
- We can now see a clean version of https://www.tenlong.com.tw/products/9781491915325.
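`firefox index.html` assumes Firefox is installed and on the `PATH`. As a more portable alternative, a minimal sketch with Python's standard `webbrowser` module can open the same `index.html` in whatever default browser the system has configured. The file name `open_result.py` and both helper names are hypothetical.

```python
import webbrowser
from pathlib import Path


def result_uri(path: str = "index.html") -> str:
    """Turn a local HTML file path into an absolute file:// URI."""
    return Path(path).resolve().as_uri()


def open_result(path: str = "index.html") -> bool:
    """Open the generated page in the default browser."""
    # webbrowser.open returns False when no usable browser could be launched.
    return webbrowser.open(result_uri(path))


if __name__ == "__main__":
    open_result()
```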
Run local Nginx for Evernote Web Clipper
The Evernote Web Clipper does not support local files, so we serve the page with Nginx and clip it from there.
- Run the Nginx container.
$ docker run --name nginx -v "$(pwd)":/usr/share/nginx/html/ -p 80:80 -d nginx
- Open the page with Firefox on GNU/Linux.
(.venv) [ jonny@xenial ~/vcs/crawler-book-outline ] $ firefox http://localhost
- (Optional) Run the Nginx container via make.
$ make run_containers
- (Optional) Open the web page via make.
$ make review_serve
- Finally, clip the information to Evernote with the Evernote Web Clipper.
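If Docker is not available, Python's standard library can serve the project directory over HTTP as well. Presumably the Evernote Web Clipper can then clip the page just like the Nginx one, though that is an assumption not verified here. The minimal sketch below assumes port 8000 is free, since binding port 80 normally requires root. `make_server` is a hypothetical helper, not part of this repository.

```python
import functools
import http.server


def make_server(directory: str = ".", port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Create an HTTP server that serves static files from `directory`."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    # Bind to the loopback interface only; port 0 lets the OS pick a free port.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    server = make_server(".", 8000)
    print("Serving on http://localhost:8000/ (Ctrl+C to stop)")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

With the server running, open http://localhost:8000/ in the browser instead of http://localhost.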
License
Copyright (c) chusiang from 2017-2022 under the MIT license.