LINE Blog Image Crawler

Initial purpose: crawling Uesaka Sumire's (上坂すみれ) LINE Blog images

Feature

Crawls images from LINE Blog archive pages (archive pages ONLY)
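Here is a minimal sketch of what archive-image crawling involves. It collects image URLs from an archive page's HTML and groups them per article. It assumes each post is wrapped in an <article> element, which is a guess and not confirmed LINE Blog markup. It uses the standard library's html.parser instead of bs4 so that it runs with no extra packages. The names ImageCollector and extract_article_images are hypothetical and are not taken from the script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute image URLs, grouped by article element.

    Assumption (hypothetical markup): each blog post is wrapped in an
    <article> tag. Adjust article_tag if the real page differs.
    """

    def __init__(self, base_url, article_tag="article"):
        super().__init__()
        self.base_url = base_url
        self.article_tag = article_tag
        self.articles = []  # one list of image URLs per top-level article
        self._depth = 0     # how deep we are inside article tags

    def handle_starttag(self, tag, attrs):
        if tag == self.article_tag:
            if self._depth == 0:
                self.articles.append([])  # a new top-level article starts
            self._depth += 1
        elif tag == "img" and self._depth > 0:
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the archive page URL
                self.articles[-1].append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag == self.article_tag and self._depth > 0:
            self._depth -= 1


def extract_article_images(html, base_url):
    """Return a list of image-URL lists, one per article on the page."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.articles
```

Images outside any article, such as sidebar banners, are ignored. Because relative src values are resolved with urljoin, the result can be handed directly to a downloader.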

Requirement

Python3
bs4 (aka BeautifulSoup 4.x)

How to Install

Python3

macOS:
- Built in, no download needed
- Or run brew install python3 to install a separate (non-system) Python3 with Homebrew
Windows:
- Download Python3 manually from https://www.python.org/downloads/windows/
Linux:
- Use the package manager
  - e.g.
    - Debian: apt install python3
    - SUSE: zypper install python3

bs4 (aka BeautifulSoup 4.x)

macOS:
- Run the command pip3 install bs4 in Terminal
Windows:
- pip is generally included in the Python3 installer; run pip install bs4 in CMD
- If pip is missing, download it manually from https://pypi.org/project/pip/
Linux:
- Install pip with the package manager, e.g. Debian/Ubuntu: sudo apt install python3-pip
- Or install pip with the official installer script:

 # Download the installer script
 curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

 # Run the installer script
 sudo python3 get-pip.py

- Then run pip3 install bs4
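After installing, you can confirm the requirements with a quick check like the one below. This is a sketch, and the function name missing_requirements is hypothetical. It calls importlib.util.find_spec to test whether bs4 can be found without actually importing it.

```python
import importlib.util
import sys


def missing_requirements():
    """Return the import names of required packages that cannot be found."""
    # "bs4" is the import name of BeautifulSoup 4.x
    required = ["bs4"]
    return [name for name in required if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_requirements()
    if missing:
        print("Missing:", ", ".join(missing), "(try: pip3 install bs4)")
    else:
        print("All requirements found")
```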

Usage

Open a terminal app (e.g. Terminal.app on macOS, CMD on Windows) and run python3 [SCRIPT_PATH], e.g. python3 LineBlog_img_crawler.py
When "Please Input the URL:" is displayed
- Enter the URL of the page you want to download images from.
  - e.g. https://lineblog.me/uesaka_sumire/archives/2018-12.html
- Attention: The URL must point to a LINE Blog archive page, i.e. it must look like https://lineblog.me/[PERSON_NAME]/archives/[YEAR]-[MONTH].html
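You can check the archive URL format before crawling. The sketch below is minimal. It assumes the pattern is exactly https://lineblog.me/[PERSON_NAME]/archives/[YEAR]-[MONTH].html, with a four-digit year and a two-digit month. parse_archive_url is a hypothetical helper, and the script may validate its input differently.

```python
import re

# Assumed format: https://lineblog.me/<name>/archives/<YYYY>-<MM>.html
ARCHIVE_URL = re.compile(
    r"^https://lineblog\.me/(?P<name>[^/]+)/archives/(?P<year>\d{4})-(?P<month>\d{2})\.html$"
)


def parse_archive_url(url):
    """Return (name, year, month) for a valid archive URL, otherwise None."""
    match = ARCHIVE_URL.match(url.strip())
    if match is None:
        return None
    month = int(match.group("month"))
    if not 1 <= month <= 12:
        return None  # e.g. "2018-13" matches the pattern but is not a month
    return match.group("name"), int(match.group("year")), month
```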
When "Please Input the Saving Path:" is displayed
- Enter the directory where the images should be saved.
  - e.g. ~/[YOUR_DIRNAME]
- If the directory doesn't exist, it is created automatically; otherwise, the existing one is used.
- If the directory already contains images (whether downloaded earlier by this script or not), the images are still downloaded, and any existing file with the same name is replaced.
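The sketch below shows how the directory and overwrite behavior described above could work. It assumes each image is saved under the last segment of its URL path. The script's actual file naming may differ, and all three function names are hypothetical. os.makedirs with exist_ok=True creates any missing directories and reuses an existing one. Opening a file in "wb" mode replaces any file with the same name.

```python
import os
from urllib.parse import urlparse


def prepare_save_dir(path):
    """Expand "~", create the directory if needed, and return its absolute path."""
    full = os.path.abspath(os.path.expanduser(path))
    # exist_ok=True reuses an existing directory instead of raising an error
    os.makedirs(full, exist_ok=True)
    return full


def image_target(save_dir, image_url):
    """Return the path an image is saved to (assumed: last URL path segment)."""
    name = os.path.basename(urlparse(image_url).path)
    return os.path.join(save_dir, name)


def save_image(path, data):
    """Write image bytes; "wb" truncates any existing file of the same name."""
    with open(path, "wb") as f:
        f.write(data)
```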
When "Would you like to have directory names without article title? (Y/N)" is displayed
- Enter Y to create directories without the article title in their names, or N to include the title.
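Here is one way a Y/N answer could translate into a directory name. Both the "<date> <title>" naming scheme and the helper names are assumptions for illustration. Characters that Windows forbids in file names are replaced with "_" so that article titles can be used safely.

```python
import re

# Characters not allowed in Windows file names ("/" is also invalid on Unix)
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def parse_yes_no(answer):
    """Map a Y/N answer (case-insensitive) to True/False."""
    normalized = answer.strip().upper()
    if normalized not in ("Y", "N"):
        raise ValueError("please answer Y or N")
    return normalized == "Y"


def article_dir_name(date, title, without_title):
    """Build a per-article directory name (hypothetical scheme).

    Y (without_title=True) gives "<date>"; N gives "<date> <title>".
    """
    if without_title:
        return date
    # Replace unsafe characters, then drop trailing dots/spaces (invalid on Windows)
    safe_title = _INVALID.sub("_", title).strip().rstrip(". ")
    return f"{date} {safe_title}" if safe_title else date
```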
When "Please Choose Mode:" is displayed
- Input 0: Current Page's Latest Article Only
  - Downloads the images of the first article on the page at the given URL.
- Input 1: Current Page Only
  - Downloads the images of every article on the page at the given URL.
- Input 2: All Related Pages
  - Downloads the images of every article on all pages linked from the navigation bar of the page at the given URL.
- Input 3: Current Page with Specific Position
  - Downloads the images of one specific article on the page at the given URL; you choose the article in the next prompt.
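Each of the four modes amounts to choosing a subset of articles. This minimal sketch rests on three assumptions. First, the articles on a page are listed newest first, so mode 0 takes the first one. Second, mode 3 positions are 1-based. Third, mode 2 covers the current page plus the pages returned by fetch_related_pages, a hypothetical callable that stands in for following the navigation bar links.

```python
def select_articles(mode, articles, position=None, fetch_related_pages=None):
    """Return the articles to download for a given mode (0-3).

    articles: articles parsed from the current page, assumed newest first.
    position: 1-based article index, used only by mode 3 (assumption).
    fetch_related_pages: hypothetical callable returning one article list
        per related page linked from the navigation bar (mode 2 only).
    """
    if mode == 0:
        return articles[:1]  # latest article only
    if mode == 1:
        return list(articles)  # every article on this page
    if mode == 2:
        if fetch_related_pages is None:
            raise ValueError("mode 2 needs fetch_related_pages")
        selected = list(articles)
        for page_articles in fetch_related_pages():
            selected.extend(page_articles)
        return selected
    if mode == 3:
        if position is None or not 1 <= position <= len(articles):
            raise ValueError("position must be between 1 and %d" % len(articles))
        return [articles[position - 1]]
    raise ValueError("unknown mode: %r" % (mode,))
```

Keeping the selection separate from fetching and downloading means the same download loop can serve all four modes.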

Repository: ItoSchum/LINE_Blog_img_crawler

Folders and files

Latest commit

History

Repository files navigation

LINE Blog Image Crawler

Feature

Requirement

How to Install

Python3

bs4 (aka BeautifulSoup 4.x)

Usage

Demo

About

Topics

Resources

License

Stars

Watchers

Forks

Languages