Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Having trouble? We'd like to help!
- Try the FAQ; it has answers to some common questions.
- Looking for specific information? Try the Index or Module Index.
- Ask or search for questions on Stack Overflow using the scrapy tag.
- Ask or search for questions in the Scrapy subreddit.
- Search for questions in the archives of the scrapy-users mailing list.
- Ask a question in the #scrapy IRC channel.
- Report bugs with Scrapy in our issue tracker.
intro/overview
Understand what Scrapy is and how it can help you.
intro/install
Get Scrapy installed on your computer.
intro/tutorial
Write your first Scrapy project.
intro/examples
Learn more by playing with a pre-made Scrapy project.
topics/commands
Learn about the command-line tool used to manage your Scrapy project.
topics/spiders
Write the rules to crawl your websites.
topics/selectors
Extract the data from web pages using XPath or CSS selectors.
topics/shell
Test your extraction code in an interactive environment.
topics/items
Define the data you want to scrape.
topics/loaders
Populate your items with the extracted data.
topics/item-pipeline
Post-process and store your scraped data.
topics/feed-exports
Output your scraped data using different formats and storages.
topics/request-response
Understand the classes used to represent HTTP requests and responses.
topics/link-extractors
Convenient classes to extract links to follow from pages.
topics/settings
Learn how to configure Scrapy and see all available settings.
topics/exceptions
See all available exceptions and their meaning.
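As a rough illustration of how the items and item pipeline pages fit together, here is a minimal sketch. It assumes a dataclass-based item (dataclass objects are one of the item types Scrapy accepts) and a pipeline written against the long-documented `process_item(self, item, spider)` signature; check the item pipeline page for your Scrapy version. The `ProductItem` and `PriceToFloatPipeline` names and the `"$1,299.00"` price format are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ProductItem:
    """Hypothetical item; Scrapy accepts dataclass objects as items."""
    name: str
    price: Union[str, float, None] = None


class PriceToFloatPipeline:
    """Hypothetical pipeline that normalizes a scraped price string.

    Scrapy calls process_item() for every item a spider yields; the method
    returns the (possibly modified) item so the next component receives it.
    """

    def process_item(self, item, spider):
        if isinstance(item.price, str):
            # Strip surrounding whitespace, a leading "$", and thousands separators.
            cleaned = item.price.strip().lstrip("$").replace(",", "")
            item.price = float(cleaned) if cleaned else None
        return item


# Calling the pipeline directly, outside a crawl, only to show its effect.
# spider=None is acceptable here because this sketch never uses the spider.
cleaned_item = PriceToFloatPipeline().process_item(
    ProductItem("Widget", "$1,299.00"), spider=None
)
print(cleaned_item)  # ProductItem(name='Widget', price=1299.0)
```

In a project, a pipeline like this would be enabled through the `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"myproject.pipelines.PriceToFloatPipeline": 300}` (the module path is hypothetical), where the integer in the 0-1000 range sets the order in which pipelines run. To discard an item instead of returning it, a pipeline raises `scrapy.exceptions.DropItem`.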
topics/logging
Learn how to use Python's built-in logging in Scrapy.
topics/stats
Collect statistics about your scraping crawler.
topics/email
Send email notifications when certain events occur.
topics/telnetconsole
Inspect a running crawler using a built-in Python console.
topics/webservice
Monitor and control a crawler using a web service.
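The logging and stats services are typically used together from inside a component. The sketch below assumes a pipeline that obtains the running crawler through the `from_crawler()` class method, records a custom counter with the stats collector's `inc_value()`, and logs through Python's standard `logging` module. The `ItemCounterPipeline` name and the `"custom/items_seen"` stat key are hypothetical, and the `_StandIn*` classes exist only so the sketch runs without a real crawl.

```python
import logging

logger = logging.getLogger(__name__)


class ItemCounterPipeline:
    """Hypothetical pipeline that records a custom stat and logs each item."""

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        # crawler.stats is the stats collector of the running crawl.
        return cls(crawler.stats)

    def process_item(self, item, spider):
        # "custom/items_seen" is a hypothetical key chosen for this sketch.
        self.stats.inc_value("custom/items_seen")
        logger.debug("Processed item: %r", item)
        return item


class _StandInStats:
    """Minimal stand-in for Scrapy's stats collector (demo only)."""

    def __init__(self):
        self._stats = {}

    def inc_value(self, key, count=1, start=0):
        self._stats[key] = self._stats.get(key, start) + count

    def get_value(self, key, default=None):
        return self._stats.get(key, default)


class _StandInCrawler:
    """Minimal stand-in exposing only the .stats attribute used above."""

    def __init__(self):
        self.stats = _StandInStats()


crawler = _StandInCrawler()
pipeline = ItemCounterPipeline.from_crawler(crawler)
for item in ({"id": 1}, {"id": 2}):
    pipeline.process_item(item, spider=None)
print(crawler.stats.get_value("custom/items_seen"))  # 2
```

In a real crawl the stand-ins are unnecessary: Scrapy passes its own crawler to `from_crawler()`, and custom stats appear alongside the built-in ones in the stats dumped when the crawl finishes.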
faq
Get answers to the most frequently asked questions.
topics/debug
Learn how to debug common problems of your Scrapy spider.
topics/contracts
Learn how to use contracts for testing your spiders.
topics/practices
Get familiar with some common Scrapy practices.
topics/broad-crawls
Tune Scrapy for crawling many domains in parallel.
topics/developer-tools
Learn how to scrape with your browser's developer tools.
topics/dynamic-content
Read webpage data that is loaded dynamically.
topics/leaks
Learn how to find and get rid of memory leaks in your crawler.
topics/media-pipeline
Download files and/or images associated with your scraped items.
topics/deploy
Deploy your Scrapy spiders and run them on a remote server.
topics/autothrottle
Adjust crawl rate dynamically based on load.
topics/benchmarking
Check how Scrapy performs on your hardware.
topics/jobs
Learn how to pause and resume crawls for large spiders.
topics/coroutines
Use the coroutine syntax.
topics/asyncio
Use asyncio and asyncio-powered libraries.
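Several of these topics come down to a few lines in a project's `settings.py`. The fragment below is a sketch, not a recommendation: it shows the AutoThrottle settings, `JOBDIR` for pausing and resuming, the asyncio-based Twisted reactor, and the built-in images pipeline. The directory names `"crawls/example-1"` and `"images"` are hypothetical, and the numeric values are illustrative.

```python
# settings.py: a sketch of settings related to the topics above.
# Values are illustrative; directory names are hypothetical.

# AutoThrottle: adjust download delays dynamically based on server load.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0         # initial download delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 60.0          # upper bound on the delay under high latency
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average requests to send in parallel to remote sites

# Jobs: persist crawl state in this directory so the crawl can be paused and resumed.
JOBDIR = "crawls/example-1"

# asyncio: run Scrapy on the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Media pipeline: download images referenced by items into IMAGES_STORE.
ITEM_PIPELINES = {"scrapy.pipelines.images.ImagesPipeline": 1}
IMAGES_STORE = "images"
```

`JOBDIR` can also be passed per run on the command line, as in `scrapy crawl somespider -s JOBDIR=crawls/somespider-1`; reusing the same directory resumes the crawl. The images pipeline additionally requires the Pillow library.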
topics/architecture
Understand the Scrapy architecture.
topics/downloader-middleware
Customize how pages get requested and downloaded.
topics/spider-middleware
Customize the input and output of your spiders.
topics/extensions
Extend Scrapy with your custom functionality.
topics/api
Use the Core API in extensions and middlewares to extend Scrapy functionality.
topics/signals
See all available signals and how to work with them.
topics/exporters
Quickly export your scraped items to a file (XML, CSV, etc.).
news
See what has changed in recent Scrapy versions.
contributing
Learn how to contribute to the Scrapy project.
versioning
Understand Scrapy versioning and API stability.