ScrapySplashWrapper

IMPORTANT: This project is no longer used by lookyloo. It has been superseded by the playwright capture module because splash is unmaintained and fails to properly capture a growing number of websites (it uses a WebKit build from 2016). If you rely on this dependency for anything, you should look at the playwright capture module, and/or consider forking and maintaining this project yourself, as it will no longer be monitored.

A wrapper that uses scrapy and splash to crawl a website.

Usage

Warning: this requires a splash instance (docker is recommended).

usage: scraper [-h] [-s SPLASH] -u URL [-d DEPTH] [-o OUTPUT] [-ua USERAGENT]
               [--debug]

Crawl a URL.

optional arguments:
  -h, --help            show this help message and exit
  -s SPLASH, --splash SPLASH
                        Splash URL to use for crawling.
  -u URL, --url URL     URL to crawl
  -d DEPTH, --depth DEPTH
                        Depth of the crawl.
  -o OUTPUT, --output OUTPUT
                        Output directory
  -ua USERAGENT, --useragent USERAGENT
                        User-Agent to use for crawling
  --debug               Enable debug mode on scrapy/splash
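
The options above can be combined into a single invocation. The following is a minimal sketch, not part of the project itself: `build_scraper_command` is a hypothetical helper, and it assumes the `scraper` command is on your PATH and that a splash instance is reachable at `http://127.0.0.1:8050` (adjust to your setup). The target URL and output directory are placeholder values.

```python
import shlex
from typing import List, Optional


def build_scraper_command(
    url: str,
    splash: Optional[str] = None,
    depth: Optional[int] = None,
    output: Optional[str] = None,
    useragent: Optional[str] = None,
    debug: bool = False,
) -> List[str]:
    """Assemble an argv list for the `scraper` CLI using the flags from the help text."""
    cmd = ["scraper", "-u", url]  # -u/--url is the only required option
    if splash is not None:
        cmd += ["-s", splash]  # Splash URL to use for crawling
    if depth is not None:
        cmd += ["-d", str(depth)]  # depth of the crawl
    if output is not None:
        cmd += ["-o", output]  # output directory
    if useragent is not None:
        cmd += ["-ua", useragent]  # User-Agent to use for crawling
    if debug:
        cmd.append("--debug")  # enable debug mode on scrapy/splash
    return cmd


# Assumed values: a local splash instance and a placeholder target site.
command = build_scraper_command(
    "https://www.example.com",
    splash="http://127.0.0.1:8050",
    depth=1,
    output="output",
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once a splash instance is available; building the arguments as a list avoids shell quoting problems with URLs and User-Agent strings.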
