spy-sample: Python Scrapy Learning Program

Powered by Scrapy · GitHub license

NOTE: This project is for LEARNING, TESTING and EDUCATIONAL PURPOSES ONLY. It is NOT intended for practical or production use.

Development framework:

  • Python version: v3.7

  • Scrapy version: v1.8 (see the Scrapy documentation for details about v1.8)

Create the virtual environment:

python3 -m venv venv

Activate venv and run:

# Activate venv mode
source venv/bin/activate

# Install packages before the first run. This is a one-time action
pip install -r requirements.txt

# Change to the project directory and run Scrapy
cd [project_dir] # './spytest' or './spyimg'
scrapy [command ...] # See the sample commands below

# Deactivate venv mode
deactivate
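
Before running `pip install`, it can help to confirm that the venv is actually active. The following is a minimal sketch, not part of the project: the helper name `in_virtualenv` is hypothetical, and it assumes an environment created with `python3 -m venv` as shown above.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"venv active: {sys.prefix}")
    else:
        print("No venv detected; run 'source venv/bin/activate' first.")
```

Run it with the same `python3` that will run Scrapy, so the check reflects the interpreter that is actually used.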

Packages info

Install packages with pip inside the virtual environment. All packages are listed in requirements.txt.

# Check out `requirements.txt`
cat requirements.txt

# Export the package list from the virtual environment to `requirements.txt`
pip freeze > requirements.txt
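
`pip freeze` writes one `name==version` pin per line. As a rough sketch of how such a file could be inspected from Python, the snippet below parses pinned lines into a dictionary. The function name `parse_pinned` and the sample content are illustrative assumptions, not the project's real `requirements.txt`, and the parser deliberately ignores anything that is not an exact `==` pin.

```python
from typing import Dict


def parse_pinned(text: str) -> Dict[str, str]:
    """Map package name -> pinned version for 'name==version' lines."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Illustrative content only; the real file in the repository may differ.
SAMPLE = """\
# sample pins
Scrapy==1.8.0
example-package==0.1.0
"""

if __name__ == "__main__":
    for name, version in sorted(parse_pinned(SAMPLE).items()):
        print(f"{name} {version}")
```

To check the real file, the text could be read with `open("requirements.txt").read()` and passed to the same function.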

Sample CLI commands

# Jump into the project directory './spytest' or './spyimg'
cd ./spytest    # or, cd ./spyimg

# List all spiders belonging to the project
scrapy list
  • spytest

    # Fetch data from the default URL.
    scrapy crawl --nolog xmlsample -o xmlsample.csv
    
    # Fetch data from 'https://www.feng.com/rss.xml' (looked up in the 'avaliable_sites' list in 'xmlsample.py') and write it to a JSON file
    scrapy crawl xmlsample -a target=feng.com -o xmlsample.json
    scrapy crawl csvsample -o csvsample.json
    scrapy crawl sitemapsample -o sitemapsample.csv
    • Deprecated spiders 🕷: cptrack, tttrack, uspstrack
  • spyimg

    scrapy crawl --nolog feimgs_svgrepo -a cat=wechat
    scrapy crawl --nolog feimgs_pornpics -a url=https://www.pornpics.com/galleries/met-art-diana-a-nika-b-35320148/
    • 🕷 feimgs_imagefap (suitable for galleries with fewer than 10 pages of photos)
    scrapy crawl --nolog feimgs_imagefap -a url=https://www.imagefap.com/pictures/11922724/les1506
    scrapy crawl --nolog feimgs_imagefap2 -a url=https://www.imagefap.com/gallery/11922185
    • Deprecated spiders 🕷: feimgs_mtrtsy, feimgs_kkrtys, feimgs_ojbk
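
Several commands above pass values with `-a name=value` (for example `-a target=feng.com`). Scrapy hands each such pair to the spider's `__init__` as a keyword argument. The sketch below mimics that lookup with the standard library only, so it runs without Scrapy installed. The names `AVAILABLE_SITES`, `DEFAULT_URL`, `resolve_start_url` and `SpiderArgsSketch` are hypothetical stand-ins; the real 'avaliable_sites' list and default URL in 'xmlsample.py' may be structured differently.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the 'avaliable_sites' list in 'xmlsample.py'.
AVAILABLE_SITES: Dict[str, str] = {
    "feng.com": "https://www.feng.com/rss.xml",
}

# Placeholder: the project's real default URL is not stated in this README.
DEFAULT_URL = "https://example.com/rss.xml"


def resolve_start_url(target: Optional[str] = None) -> str:
    """Translate the value of '-a target=...' into a feed URL."""
    if not target:
        return DEFAULT_URL  # no '-a target=...' given on the command line
    try:
        return AVAILABLE_SITES[target]
    except KeyError:
        known = ", ".join(sorted(AVAILABLE_SITES))
        raise ValueError(f"unknown target {target!r}; expected one of: {known}") from None


class SpiderArgsSketch:
    """Plain class imitating how a spider receives '-a' pairs as keyword arguments."""

    def __init__(self, target: Optional[str] = None, **kwargs: str) -> None:
        self.start_urls: List[str] = [resolve_start_url(target)]


if __name__ == "__main__":
    # Roughly what 'scrapy crawl xmlsample -a target=feng.com' would pass in.
    print(SpiderArgsSketch(target="feng.com").start_urls)
```

In a real spider, the same `__init__` signature would sit on a `scrapy.Spider` subclass and call `super().__init__(**kwargs)`. Rejecting unknown targets early means a mistyped site name fails with a clear message instead of an empty crawl.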


··· Last Modified on 26 January 2024 ···

··· Created on 12 October 2019 ···