dyslab/spy-sample

spy-sample: Python Scrapy Learning Program

Powered by Scrapy | GitHub license

NOTE: This project is ONLY for LEARNING, TESTING and EDUCATIONAL PURPOSES. It is NOT intended for practical use in any specific application.

Development framework:

  • Python version: v3.7

  • Scrapy version: v1.8 (see the Scrapy v1.8 documentation for details)

Create the virtual environment:

python3 -m venv venv

Activate venv and run:

# Activate venv mode
source venv/bin/activate

# Install packages before the first run. This is a one-time action
pip install -r requirements.txt

# Change to the project directory and run a Scrapy command
cd [project_dir] # './spytest' or './spyimg'
scrapy [command ...] # See the sample commands below

# Deactivate venv mode
deactivate

Packages info

Install packages with pip inside the virtual environment. All packages are listed in requirements.txt.

# Check out `requirements.txt`
cat requirements.txt

# Export the list of packages installed in the virtual environment to `requirements.txt`
pip freeze > requirements.txt
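
To check which versions are pinned without reading the file by eye, the `name==version` lines that `pip freeze` writes can be parsed with a few lines of Python. This is only a minimal sketch: the helper names `parse_pins` and `pinned_version` are hypothetical, the sample contents are made up rather than copied from this repository, and only the `==` pin style is handled.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as "Scrapy==1.8.0". Other specifier styles (>=, ~=, URLs)
# are deliberately ignored in this sketch.
_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([^\s;#]+)")


def parse_pins(text: str) -> List[Tuple[str, str]]:
    """Return (package, version) pairs for every 'name==version' line."""
    pins = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _PIN.match(line)
        if match:
            pins.append((match.group(1), match.group(2)))
    return pins


def pinned_version(text: str, package: str) -> Optional[str]:
    """Look up one package's pinned version (case-insensitive name match)."""
    for name, version in parse_pins(text):
        if name.lower() == package.lower():
            return version
    return None


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "# example only\nScrapy==1.8.0\nPillow==6.2.1\n"
    print(parse_pins(sample))
    print(pinned_version(sample, "scrapy"))
```

In practice the text would come from `open("requirements.txt").read()` run in the repository root.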

Sample Script CLI Commands

# Jump into the project directory './spytest' or './spyimg'
cd ./spytest    # or, cd ./spyimg

# List all spiders that belong to the project
scrapy list
  • spytest

    # Fetch data from the default URL.
    scrapy crawl --nolog xmlsample -o xmlsample.csv
    
    # Fetch data from 'https://www.feng.com/rss.xml' (selected from the list 'avaliable_sites' in 'xmlsample.py') and write it to a JSON file
    scrapy crawl xmlsample -a target=feng.com -o xmlsample.json
    scrapy crawl csvsample -o csvsample.json
    scrapy crawl sitemapsample -o sitemapsample.csv
    • Deprecated spiders 🕷: cptrack, tttrack, uspstrack
  • spyimg

    scrapy crawl --nolog feimgs_svgrepo -a cat=wechat
    scrapy crawl --nolog feimgs_pornpics -a url=https://www.pornpics.com/galleries/met-art-diana-a-nika-b-35320148/
    • 🕷 feimgs_imagefap (suitable for galleries with fewer than 10 pages of photos)
    scrapy crawl --nolog feimgs_imagefap -a url=https://www.imagefap.com/pictures/11922724/les1506
    scrapy crawl --nolog feimgs_imagefap2 -a url=https://www.imagefap.com/gallery/11922185
    • Deprecated spiders 🕷: feimgs_mtrtsy, feimgs_kkrtys, feimgs_ojbk
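
The commands above all follow the same pattern: an optional `--nolog` to suppress log output, the spider name, zero or more `-a NAME=VALUE` spider arguments, and an optional `-o FILE` whose export format Scrapy infers from the file extension. The sketch below assembles such a command as an argument list from Python. The helper name `build_crawl_command` is hypothetical and not part of this repository or of Scrapy; it only builds the list and does not run anything.

```python
import shlex
from typing import Dict, List, Optional


def build_crawl_command(
    spider: str,
    spider_args: Optional[Dict[str, str]] = None,
    output: Optional[str] = None,
    nolog: bool = False,
) -> List[str]:
    """Assemble an argv list for `scrapy crawl` (hypothetical helper)."""
    argv = ["scrapy", "crawl"]
    if nolog:
        argv.append("--nolog")  # suppress Scrapy's log output
    argv.append(spider)
    for key, value in (spider_args or {}).items():
        argv += ["-a", f"{key}={value}"]  # one -a NAME=VALUE per spider argument
    if output:
        argv += ["-o", output]  # export file; format inferred from the extension
    return argv


if __name__ == "__main__":
    cmd = build_crawl_command("xmlsample", {"target": "feng.com"}, "xmlsample.json")
    # Quote each part so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
    # prints: scrapy crawl xmlsample -a target=feng.com -o xmlsample.json
```

To actually run the result, something like `subprocess.run(cmd, cwd="./spytest", check=True)` should work, assuming the virtual environment is active and `cwd` points at the matching project directory.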


··· Last Modified on 26 January 2024 ···

··· Created on 12 October 2019 ···
