Crawler Book Info

A sample crawler that quickly parses book information.

Initialization

  1. Install pyenv and pyenv-virtualenv.

  2. Create a virtualenv named py3.

    [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ pyenv virtualenv 3.9.6 py3
  3. Use the py3 virtualenv in this directory.

    [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ pyenv local py3
  4. Install packages with pip.

    (py3) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ pip3 install -r requirements.txt

Usage

tenlong.com.tw

  1. Run the crawler with an ISBN-13.

    (py3) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ python3 tenlong.py 9781491915325
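Because the crawler takes an ISBN-13 as its argument, it can help to check the number before running it. The following is a minimal sketch of the standard ISBN-13 check-digit rule; the function name `is_valid_isbn13` is hypothetical and is not part of `tenlong.py`.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Return True if `isbn` is 13 digits with a correct check digit.

    Hypothetical helper for illustration; not taken from tenlong.py.
    """
    # Allow the hyphenated form, e.g. 978-1-4919-1532-5.
    digits = isbn.replace("-", "").strip()
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    # A valid ISBN-13 has a weighted sum that is divisible by 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0
```

For the ISBN used above, `is_valid_isbn13("9781491915325")` returns `True`.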

books.com.tw

  1. Run the crawler with a URL.

    (py3) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ python3 books.py https://www.books.com.tw/products/0010810939
  2. Run the crawler with a product number.

    (py3) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ python3 books.py 0010810939

ISBN-13 arguments are not supported yet for books.com.tw.
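Since `books.py` accepts either a full URL or a bare product number, one way to handle both is to normalize the argument to a product URL first. This is only a sketch under assumptions: the helper name `to_product_url`, the alphanumeric product-number pattern, and the URL template (taken from the example URL above) are illustrative and may differ from what `books.py` actually does.

```python
import re
from urllib.parse import urlparse

# Assumed URL template, based on the example URL in this README.
BOOKS_PRODUCT_URL = "https://www.books.com.tw/products/{}"


def to_product_url(arg: str) -> str:
    """Turn a product URL or a bare product number into a product URL.

    Hypothetical helper for illustration; not taken from books.py.
    """
    arg = arg.strip()
    if arg.startswith(("http://", "https://")):
        parsed = urlparse(arg)
        host = parsed.hostname or ""
        # The path excludes any query string, so tracking parameters are dropped.
        match = re.fullmatch(r"/products/([A-Za-z0-9]+)/?", parsed.path)
        if not host.endswith("books.com.tw") or match is None:
            raise ValueError(f"not a books.com.tw product URL: {arg}")
        return BOOKS_PRODUCT_URL.format(match.group(1))
    # Assumption: product numbers are plain alphanumeric strings.
    if re.fullmatch(r"[A-Za-z0-9]+", arg):
        return BOOKS_PRODUCT_URL.format(arg)
    raise ValueError(f"not a product number or URL: {arg}")
```

With this sketch, both `to_product_url("0010810939")` and `to_product_url("https://www.books.com.tw/products/0010810939")` yield the same URL.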

View Result

  1. Open the HTML file in Firefox on GNU/Linux.

    (py3) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ firefox index.html

  2. The page shows the content of https://www.tenlong.com.tw/products/9781491915325, now in a clean form.

Run local Nginx for Evernote Web Clipper

The Evernote Web Clipper does not support local files, so serve the page with Nginx and clip it from there.

  1. Run an Nginx container.

    docker run --name nginx -v "$(pwd)":/usr/share/nginx/html/ -p 80:80 -d nginx
  2. Open the served page in Firefox on GNU/Linux.

    (py3) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ firefox http://localhost
  3. Finally, clip the information to Evernote with the Evernote Web Clipper.
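If Docker is not available, the same idea (serving the generated HTML over HTTP so that the Web Clipper sees a normal web page) can be sketched with Python's standard library. This is an alternative to the Nginx container, not the method this README uses; the function name `serve_directory` and port 8000 are assumptions for illustration.

```python
import functools
import http.server
import threading


def serve_directory(directory: str = ".", port: int = 8000):
    """Serve `directory` over HTTP on 127.0.0.1 in a background thread.

    Hypothetical helper for illustration; returns the server so the
    caller can stop it with server.shutdown().
    """
    # Bind the handler to the chosen directory (supported since Python 3.7).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # The thread is a daemon, so it stops when the main program exits.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

From a shell, `python3 -m http.server 8000 --bind 127.0.0.1` run in the directory that contains `index.html` has a similar effect; the page is then reachable at http://127.0.0.1:8000.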

License

Copyright (c) 2017-2024 chusiang. Released under the MIT license.
