
Scrapy documentation

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Getting help

Having trouble? We'd like to help!

First steps

intro/overview

Understand what Scrapy is and how it can help you.

intro/install

Get Scrapy installed on your computer.

intro/tutorial

Write your first Scrapy project.

intro/examples

Learn more by playing with a pre-made Scrapy project.

Basic concepts

topics/commands

Learn about the command-line tool used to manage your Scrapy project.

topics/spiders

Write the rules to crawl your websites.

topics/selectors

Extract data from web pages using XPath or CSS expressions.

topics/shell

Test your extraction code in an interactive environment.

topics/items

Define the data you want to scrape.

topics/loaders

Populate your items with the extracted data.

topics/item-pipeline

Post-process and store your scraped data.

topics/feed-exports

Output your scraped data using different formats and storages.
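Feeds are configured with the `FEEDS` setting, which maps each output URI to its options. The settings fragment below is a sketch with illustrative file names. It writes every scraped item to a JSON file and a CSV file at the same time.

```python
# settings.py (or a spider's custom_settings); file names are illustrative.
FEEDS = {
    "items.json": {"format": "json", "encoding": "utf8", "indent": 4},
    "items.csv": {"format": "csv"},
}
```

For a one-off run, `scrapy crawl <spider> -O items.json` writes a single feed from the command line and overwrites the file if it already exists.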

topics/request-response

Understand the classes used to represent HTTP requests and responses.

topics/link-extractors

Convenient classes to extract links to follow from pages.

topics/settings

Learn how to configure Scrapy and see all available settings.

topics/exceptions

See all available exceptions and their meaning.

Built-in services

topics/logging

Learn how to use Python's built-in logging with Scrapy.

topics/stats

Collect statistics about your scraping crawler.

topics/email

Send email notifications when certain events occur.

topics/telnetconsole

Inspect a running crawler using a built-in Python console.

topics/webservice

Monitor and control a crawler using a web service.

Solving specific problems

faq

Get answers to most frequently asked questions.

topics/debug

Learn how to debug common problems with your Scrapy spider.

topics/contracts

Learn how to use contracts for testing your spiders.

topics/practices

Get familiar with some common Scrapy practices.

topics/broad-crawls

Tune Scrapy for crawling many domains in parallel.

topics/developer-tools

Learn how to scrape with your browser's developer tools.

topics/dynamic-content

Read webpage data that is loaded dynamically.

topics/leaks

Learn how to find and get rid of memory leaks in your crawler.

topics/media-pipeline

Download files and/or images associated with your scraped items.

topics/deploy

Deploy your Scrapy spiders and run them on a remote server.

topics/autothrottle

Adjust crawl rate dynamically based on load.
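AutoThrottle is switched on and tuned through settings. The fragment below is a sketch. The numbers are illustrative values, not recommendations taken from this page.

```python
# settings.py: enable AutoThrottle; values are illustrative only.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0         # initial download delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 60.0          # ceiling for the delay under high latency
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0  # average parallel requests to remote sites
AUTOTHROTTLE_DEBUG = False             # True logs throttling stats per response
```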

topics/benchmarking

Check how Scrapy performs on your hardware.

topics/jobs

Learn how to pause and resume crawls for large spiders.

topics/coroutines

Use the coroutine syntax (async def).

topics/asyncio

Use asyncio and asyncio-powered libraries.

Extending Scrapy

topics/architecture

Understand the Scrapy architecture.

topics/downloader-middleware

Customize how pages get requested and downloaded.

topics/spider-middleware

Customize the input and output of your spiders.

topics/extensions

Extend Scrapy with your custom functionality.

topics/api

Use the core API in extensions and middlewares to extend Scrapy functionality.

topics/signals

See all available signals and how to work with them.

topics/exporters

Quickly export your scraped items to a file (XML, CSV, etc.).

All the rest

news

See what has changed in recent Scrapy versions.

contributing

Learn how to contribute to the Scrapy project.

versioning

Understand Scrapy versioning and API stability.