Crawler Book Info


A sample crawler for quickly parsing book information.

Initialization

  1. Install virtualenv.

    [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ sudo pip3 install virtualenv
    
  2. Create a virtualenv.

    [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ virtualenv -p python3 .venv
    
  3. Enter the virtualenv.

    [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ . .venv/bin/activate
    
  4. Install packages with pip.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ pip3 install -r requirements.txt
    

Usage

tenlong.com.tw

  1. Run the crawler with an ISBN-13.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ python3 tenlong.py 9781491915325
    
  2. (Optional) Run the crawler via make.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ make telong 9781491915325
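
Since the tenlong.com.tw crawler takes an ISBN-13, it can help to check the number before running it. The following is a minimal sketch and not part of this repository; the function name `is_valid_isbn13` is hypothetical. It applies the standard ISBN-13 check-digit rule (weights alternate 1 and 3, and the weighted sum must be divisible by 10).

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Return True if `isbn` is 13 digits long and its check digit is correct.

    Hypothetical helper for illustration; hyphens and spaces are ignored.
    """
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    # A valid ISBN-13 has a weighted sum that is a multiple of 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0
```

For example, `is_valid_isbn13("9781491915325")` accepts the ISBN used above, while changing its last digit makes the check fail.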
    

books.com.tw

  1. Run the crawler with a URL.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ python3 books.py https://www.books.com.tw/products/0010810939
    
  2. Run the crawler with a product number.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ python3 books.py 0010810939
    

The books.com.tw crawler does not support ISBN-13 arguments yet.
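
The two invocations above point at the same product. As a rough illustration of how a URL and a bare product number could be reduced to one form, here is a small sketch. The helper `product_number` is hypothetical and is not taken from `books.py`, whose actual argument handling may differ.

```python
from urllib.parse import urlparse


def product_number(arg: str) -> str:
    """Reduce a product URL or a bare product number to the product number.

    Hypothetical helper for illustration; not the logic used by books.py.
    """
    if "://" not in arg:
        # Already a bare product number such as 0010810939.
        return arg
    # Use the last non-empty path segment, e.g. /products/0010810939.
    segments = [s for s in urlparse(arg).path.split("/") if s]
    if not segments:
        raise ValueError(f"no product number found in URL: {arg}")
    return segments[-1]
```

With this sketch, both `product_number("https://www.books.com.tw/products/0010810939")` and `product_number("0010810939")` return the same string.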

View Result

  1. Open the HTML file with Firefox on GNU/Linux.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ firefox index.html
    


  2. The page now shows the book information from https://www.tenlong.com.tw/products/9781491915325 in a clean layout.

Run local Nginx for Evernote Web Clipper

The Evernote Web Clipper does not support local files, so we serve the page with Nginx and clip it from there.

  1. Run the Nginx container.

    $ docker run --name nginx -v "$(pwd)":/usr/share/nginx/html/ -p 80:80 -d nginx
    
  2. Open the page with Firefox on GNU/Linux.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ firefox http://localhost
    
  3. (Optional) Run the Nginx container via make.

    $ make run_containers
    
  4. (Optional) Open the page via make.

    $ make review_serve
    
  5. Finally, clip the information to Evernote with the Evernote Web Clipper.
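
If Docker is not available, the same directory could probably be served with Python's standard library instead of Nginx. This is an alternative sketch and not something the repository provides: `make_server` is a hypothetical name, port 8000 is an arbitrary choice (the Nginx steps above use port 80), and whether the Web Clipper behaves identically against this server is untested here.

```python
import functools
import http.server


def make_server(directory: str = ".", port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Build an HTTP server that serves `directory` on localhost only.

    Hypothetical helper for illustration; requires Python 3.7 or newer.
    """
    # Bind the handler to the directory that contains index.html.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server().serve_forever()` from the project directory would then make the generated page reachable at http://localhost:8000/index.html until the process is interrupted.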

License

Copyright (c) 2017-2022 chusiang, released under the MIT license.