NOTE: This project is ONLY for LEARNING, TESTING, and EDUCATIONAL PURPOSES. It is NOT intended for practical use for any specific purpose.
- Python version: v3.7
- Scrapy version: v1.8 (see the Scrapy v1.8 documentation for details)
# Create a virtual environment
python3 -m venv venv
# Activate the virtual environment
source venv/bin/activate
# Install packages before the first run (one-time action)
pip install -r requirements.txt
# Change to the project directory and run Scrapy
cd [project_dir] # './spytest' or './spyimg'
scrapy [command ...] # See the commands below
# Deactivate the virtual environment
deactivate
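The same setup can be scripted from Python with the standard-library `venv` module. The sketch below is a minimal illustration, not part of this project: the helper names `create_venv` and `venv_python` are hypothetical. It shows that calling the environment's own interpreter uses the environment's packages without running `activate` first.

```python
import sys
import venv
from pathlib import Path


def create_venv(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment, similar to `python3 -m venv <env_dir>`."""
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return Path(env_dir)


def venv_python(env_dir: str) -> Path:
    """Return the path of the interpreter inside the environment.

    Running this interpreter directly picks up the environment's
    packages, so no `source venv/bin/activate` step is needed.
    """
    on_windows = sys.platform == "win32"
    bin_dir = "Scripts" if on_windows else "bin"
    exe = "python.exe" if on_windows else "python"
    return Path(env_dir) / bin_dir / exe
```

For example, `subprocess.run([str(venv_python("venv")), "-m", "pip", "install", "-r", "requirements.txt"])` would install the requirements into that environment.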
Install packages with pip inside the virtual environment. All packages are listed in requirements.txt.
# Show the contents of `requirements.txt`
cat requirements.txt
# Export the package list from the virtual environment to `requirements.txt`
pip freeze > requirements.txt
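`pip freeze` writes one `name==version` pin per line. As a rough sketch of how such a file could be inspected from Python, the hypothetical helper `parse_requirements` below collects the pins into a dictionary. It handles only exact `==` pins and skips everything else. The version numbers in the usage note are illustrative, not taken from this project's `requirements.txt`.

```python
from typing import Dict


def parse_requirements(text: str) -> Dict[str, str]:
    """Map lower-cased package name -> pinned version for `name==version` lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # skip blank lines and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins
```

Calling `parse_requirements("Scrapy==1.8.0\nlxml==4.4.1\n")` yields a dictionary with the keys `scrapy` and `lxml`, which makes it easy to compare two environments.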
# Jump into the project directory './spytest' or './spyimg'
cd ./spytest # or, cd ./spyimg
# List all spiders that belong to the project
scrapy list
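`scrapy list` prints one spider name per line, so its output is easy to consume from a script. The following sketch assumes Scrapy is installed and on `PATH`. The function names `parse_spider_list` and `list_spiders` are hypothetical helpers, not part of this project.

```python
import subprocess
from typing import List


def parse_spider_list(output: str) -> List[str]:
    """Split the output of `scrapy list` into spider names, ignoring blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_spiders(project_dir: str) -> List[str]:
    """Run `scrapy list` inside project_dir and return the spider names."""
    result = subprocess.run(
        ["scrapy", "list"],
        cwd=project_dir,              # same effect as `cd project_dir` first
        stdout=subprocess.PIPE,
        universal_newlines=True,      # decode stdout to str (Python 3.7 compatible)
        check=True,                   # raise if scrapy exits with an error
    )
    return parse_spider_list(result.stdout)
```

For instance, `list_spiders("./spytest")` would return the names of the spiders in the `spytest` project.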
- spytest
- 🕷 xmlsample
# Fetch data from the default URL
scrapy crawl --nolog xmlsample -o xmlsample.csv
# Fetch data from 'https://www.feng.com/rss.xml', selected from the list 'avaliable_sites' in 'xmlsample.py', and write it to a JSON file
scrapy crawl xmlsample -a target=feng.com -o xmlsample.json
- 🕷 csvsample
scrapy crawl csvsample -o csvsample.json
- 🕷 sitemapsample
scrapy crawl sitemapsample -o sitemapsample.csv
- Deprecated spiders 🕷: cptrack, tttrack, uspstrack
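Scrapy passes each `-a name=value` pair to the spider's constructor as a keyword argument, which is how `-a target=feng.com` reaches `xmlsample`. The sketch below shows one way such a target could be mapped to a feed URL. It is an assumption about the approach, not the actual code in `xmlsample.py`: the names `AVAILABLE_SITES`, `DEFAULT_TARGET`, and `resolve_start_url` are hypothetical, and only the `feng.com` entry comes from this README.

```python
from typing import Dict, Optional

# Hypothetical lookup table. Only the feng.com entry is taken from the README;
# the "example" entry is a placeholder standing in for the unknown default URL.
AVAILABLE_SITES: Dict[str, str] = {
    "example": "https://example.com/rss.xml",
    "feng.com": "https://www.feng.com/rss.xml",
}
DEFAULT_TARGET = "example"  # assumption: the real default is not documented here


def resolve_start_url(target: Optional[str] = None,
                      sites: Optional[Dict[str, str]] = None) -> str:
    """Return the feed URL for a `-a target=...` value, or the default one."""
    table = AVAILABLE_SITES if sites is None else sites
    key = target or DEFAULT_TARGET
    try:
        return table[key]
    except KeyError:
        raise ValueError(
            "unknown target {!r}; choose one of {}".format(key, sorted(table))
        )
```

Failing early with a list of valid targets gives a clearer error than letting the crawl start with an empty or wrong URL.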
- spyimg
- 🕷 feimgs_svgrepo (See the demos in ./spyimg/feimgs_svgrepo_demos/README.md)
scrapy crawl --nolog feimgs_svgrepo -a cat=wechat
- 🕷 feimgs_pornpics
scrapy crawl --nolog feimgs_pornpics -a url=https://www.pornpics.com/galleries/met-art-diana-a-nika-b-35320148/
- 🕷 feimgs_imagefap (Suitable for galleries with fewer than 10 pages of photos)
scrapy crawl --nolog feimgs_imagefap -a url=https://www.imagefap.com/pictures/11922724/les1506
- 🕷 feimgs_imagefap2 (Suitable for all galleries)
scrapy crawl --nolog feimgs_imagefap2 -a url=https://www.imagefap.com/gallery/11922185
- Deprecated spiders 🕷: feimgs_mtrtsy, feimgs_kkrtys, feimgs_ojbk
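All of the commands above share the shape `scrapy crawl [--nolog] <spider> [-a key=value ...] [-o file]`. When launching crawls from a script, the argument list can be assembled programmatically. The helper `build_crawl_command` below is a hypothetical sketch of that idea, not a utility shipped with this project.

```python
from typing import Dict, List, Optional


def build_crawl_command(spider: str,
                        args: Optional[Dict[str, str]] = None,
                        output: Optional[str] = None,
                        nolog: bool = False) -> List[str]:
    """Build the argv list for a `scrapy crawl` invocation."""
    cmd = ["scrapy", "crawl"]
    if nolog:
        cmd.append("--nolog")              # suppress Scrapy's log output
    cmd.append(spider)
    for key, value in (args or {}).items():
        cmd += ["-a", "{}={}".format(key, value)]  # one -a per spider argument
    if output:
        cmd += ["-o", output]              # feed format follows the file extension
    return cmd
```

For example, `build_crawl_command("feimgs_svgrepo", {"cat": "wechat"}, nolog=True)` reproduces the `feimgs_svgrepo` command above as a list. Passing that list to `subprocess.run` avoids shell-quoting problems with URLs that contain characters such as `&` or `?`.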
··· Last Modified on 26 January 2024 ···
··· Created on 12 October 2019 ···