ItoSchum/LINE_Blog_img_crawler

LINE Blog Image Crawler

Initial purpose: crawling Uesaka Sumire's (上坂すみれ) LINE Blog images

Feature

  • Crawling LINE Blog Archive images (Archive Image ONLY)
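
As a rough illustration of what collecting images from a page involves, the sketch below gathers the src of every img tag in an HTML string. It is not the script's actual code: it uses the standard library's html.parser instead of bs4 so it runs without extra installs, and the names ImageCollector and image_sources, as well as the sample HTML, are hypothetical.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html):
    """Return the list of image URLs found in the given HTML text."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# Hypothetical sample markup, not taken from a real LINE Blog page.
sample = '<article><h1>Title</h1><img src="https://example.com/a.jpg"><img alt="no src"></article>'
print(image_sources(sample))  # ['https://example.com/a.jpg']
```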

Requirement

  • Python3
  • bs4 (aka BeautifulSoup 4.x)

How to Install

Python3

  • macOS:
    • Built in, so no download is needed
    • Or run brew install python3 to install a non-system Python 3
  • Windows:
  • Linux:
    • Use the package manager
      • e.g.
        • Debian: apt install python3
        • SUSE: zypper install python3

bs4 (aka BeautifulSoup 4.x)

  • macOS:

    • Run the command pip3 install bs4 in Terminal
  • Windows:

  • Linux:

    • Install pip with the package manager: sudo apt install python3-pip

    • Or run the commands below

       # Download the installation script
       curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

       # Run the installation script
       sudo python3 get-pip.py

    • Then run pip3 install bs4
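
After installing, it can be useful to confirm that the interpreter you will run the script with can actually find bs4. The following is a minimal sketch of such a check; the function name bs4_available is hypothetical and not part of this project.

```python
import importlib.util
import sys


def bs4_available():
    """Return True if the bs4 package can be found by this interpreter."""
    return importlib.util.find_spec("bs4") is not None


if __name__ == "__main__":
    # Print the interpreter version and whether bs4 is importable.
    print("Python version:", sys.version.split()[0])
    print("bs4 installed:", bs4_available())
```

Save it to a file and run it with python3 so the check uses the same interpreter as the crawler.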
      

Usage

  1. Open any terminal app (e.g. Terminal.app on macOS or CMD on Windows) and run the command python3 [SCRIPT_PATH]

  2. When "Please Input the URL:" is displayed

    • Enter the web address of the page whose images you want to download.
    • Attention: The web address must be a LINE Blog Archive page, which means the URL has the form: https://lineblog.me/[PERSON_NAME]/archive/[YEAR]-[MONTH].html
  3. When "Please Input the Saving Path:" is displayed

    • Enter the path where you want to save the images, in the format shown in the example.
      • e.g. ~/[YOUR_DIRNAME]
    • If the path doesn't exist, it is created automatically; otherwise the existing directory is used.
    • Whether or not the path already contains images downloaded earlier by this script, the images are downloaded again, and files that share the same name are replaced.
  4. When "Would yout like to have directory names without artile title? (Y/N)" is displayed

    • Enter Y or N, depending on whether you want the directories created without the article title.
  5. When "Please Choose Mode:" is displayed

    • Input 0: Current Page's Latest Article Only
      • Downloads the images of the first article on the blog page at the given URL.
    • Input 1: Current Page Only
      • Downloads the images of every article on the blog page at the given URL.
    • Input 2: All Related Pages
      • Downloads the images of every article on the related pages linked from the navigation bar of the page at the given URL.
    • Input 3: Current Page with Specific Position
      • Downloads the images of one specific article on the blog page at the given URL, selected by your next input.
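
The prompts above can be sketched as three small helpers: one that checks the archive URL format, one that prepares the saving path, and one that picks articles for a mode. This is an illustration under stated assumptions, not the script's actual code. The function names, the 4-digit year and 1- or 2-digit month in the pattern, the 1-based position for mode 3, and the name example_name are all hypothetical.

```python
import re
from pathlib import Path

# Assumed shape: https://lineblog.me/[PERSON_NAME]/archive/[YEAR]-[MONTH].html
# (4-digit year and 1- or 2-digit month are assumptions).
ARCHIVE_URL = re.compile(r"https://lineblog\.me/([^/]+)/archive/(\d{4})-(\d{1,2})\.html")


def parse_archive_url(url):
    """Return (person_name, year, month), or None if the URL is not an archive URL."""
    match = ARCHIVE_URL.fullmatch(url.strip())
    if match is None:
        return None
    name, year, month = match.groups()
    return name, int(year), int(month)


def prepare_save_dir(raw_path):
    """Expand '~' and create the directory only if it is missing."""
    path = Path(raw_path).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    return path


def select_articles(articles, mode, position=None):
    """Pick articles from one page for modes 0, 1 and 3.

    Mode 2 would apply the mode 1 rule to every related page, so it is not
    handled here. The 1-based position for mode 3 is an assumption.
    """
    if mode == 0:
        return articles[:1]
    if mode == 1:
        return list(articles)
    if mode == 3:
        if position is None or not 1 <= position <= len(articles):
            raise ValueError("position is out of range")
        return [articles[position - 1]]
    raise ValueError(f"unsupported mode: {mode}")


print(parse_archive_url("https://lineblog.me/example_name/archive/2018-05.html"))
# ('example_name', 2018, 5)
print(select_articles(["newest", "older", "oldest"], 3, position=2))
# ['older']
```

Checking the URL before any download starts gives an early, clear failure for non-archive addresses, and mkdir with exist_ok=True matches the described behavior of reusing an existing directory.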

Demo

demo.png
