ScrapeOps Proxy Middleware for Scrapy

The ScrapeOps Scrapy Proxy Middleware is an easy way to integrate the ScrapeOps Proxy API into your Scrapy spiders.

To use the middleware, you first need a free ScrapeOps API key if you don't have one already.

Install Using Pip

To install the ScrapeOps Scrapy Proxy Middleware simply use pip:

pip install scrapeops-scrapy-proxy-sdk

Integrating Into Your Scrapy Project

Integrate the middleware into your Scrapy project by adding the following to your settings.py file:

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'
SCRAPEOPS_PROXY_ENABLED = True

DOWNLOADER_MIDDLEWARES = {
    'scrapeops_scrapy_proxy_sdk.scrapeops_scrapy_proxy_sdk.ScrapeOpsScrapyProxySdk': 725,
}

Now when you run your spiders, the requests will be automatically sent through the ScrapeOps Proxy API.
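
Because settings.py is an ordinary Python module, the API key does not have to be hard-coded. The following is a minimal sketch that reads it from an environment variable instead. The variable name SCRAPEOPS_API_KEY and the choice to disable the proxy when the key is missing are assumptions made for illustration, not behaviour documented by the middleware.

```python
# settings.py
import os

# Assumption: the key is exported in the shell as SCRAPEOPS_API_KEY.
# The environment variable name is illustrative, not required by the SDK.
SCRAPEOPS_API_KEY = os.environ.get('SCRAPEOPS_API_KEY', '')

# Only switch the proxy on when a key was actually found.
SCRAPEOPS_PROXY_ENABLED = bool(SCRAPEOPS_API_KEY)
```

The DOWNLOADER_MIDDLEWARES entry stays exactly as shown above.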

Enabling Advanced Functionality

The ScrapeOps Proxy API supports a range of more advanced features that you can enable by adding extra query parameters to your request.
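
To make "extra query parameters" concrete, here is a rough sketch of how a feature such as 'country' could end up in the URL sent to a proxy API. The endpoint constant and the build_proxy_url helper are hypothetical names used for illustration; the middleware builds the real request for you and may do so differently.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; check the ScrapeOps docs for the real one.
PROXY_ENDPOINT = 'https://proxy.scrapeops.io/v1/'


def build_proxy_url(api_key, target_url, extra_params=None):
    """Hypothetical helper: wrap target_url in a proxy API URL."""
    params = {'api_key': api_key, 'url': target_url}
    # Advanced features are simply more query parameters.
    params.update(extra_params or {})
    return PROXY_ENDPOINT + '?' + urlencode(params)


proxy_url = build_proxy_url(
    'YOUR_API_KEY',
    'https://quotes.toscrape.com/page/1/',
    {'country': 'us'},
)
```

Each extra feature you enable becomes one more key/value pair at the end of the query string.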

You can enable them through the ScrapeOps Scrapy Proxy Middleware in three ways:

Method #1: Global Project Settings

You can apply proxy settings to every spider in your project by adding a SCRAPEOPS_PROXY_SETTINGS dictionary to your settings.py file with the extra features you want to enable.

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'
SCRAPEOPS_PROXY_ENABLED = True
SCRAPEOPS_PROXY_SETTINGS = {'country': 'us'}

DOWNLOADER_MIDDLEWARES = {
    'scrapeops_scrapy_proxy_sdk.scrapeops_scrapy_proxy_sdk.ScrapeOpsScrapyProxySdk': 725,
}

Method #2: Spider Settings

You can apply proxy settings to every request a single spider makes by adding a SCRAPEOPS_PROXY_SETTINGS dictionary, with the extra features you want to enable, to the spider's custom_settings attribute.

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    custom_settings = {
        'SCRAPEOPS_PROXY_SETTINGS': {'country': 'us'}
    }

    def start_requests(self):
        urls = [
            'https://quotes.toscrape.com/page/1/',
            'https://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        pass

Method #3: Request Settings

You can apply proxy settings to an individual request by adding the extra features you want to enable to that request's meta parameter.

When using this method, you need to prefix the feature key with 'sops_'. So to enable 'country': 'uk', you would use 'sops_country': 'uk'.

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'https://quotes.toscrape.com/page/1/',
            'https://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, meta={'sops_country': 'uk'}, callback=self.parse)

    def parse(self, response):
        pass
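
The prefix rule can be pictured as a simple filter over the request's meta dictionary. The sketch below is an assumption about how such a mapping might work, using a hypothetical extract_proxy_params helper; it is not the middleware's actual code.

```python
SOPS_PREFIX = 'sops_'


def extract_proxy_params(meta):
    """Hypothetical helper: keep only 'sops_' keys and strip the prefix."""
    return {
        key[len(SOPS_PREFIX):]: value
        for key, value in meta.items()
        if isinstance(key, str) and key.startswith(SOPS_PREFIX)
    }


# Ordinary Scrapy meta keys are left out; only prefixed keys are treated as proxy features.
meta = {'sops_country': 'uk', 'download_timeout': 30}
params = extract_proxy_params(meta)
```

Here params contains only the 'country' feature, so unrelated meta keys such as download_timeout would not be forwarded as proxy features under this scheme.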

A full list of advanced features can be found in the ScrapeOps Proxy API documentation.

Concurrency Management

When using Scrapy with the ScrapeOps Proxy, make sure you don't exceed the concurrency limit of the plan you are using.

For example, if you were using the Free Plan which has a concurrency limit of 1 thread, then you would set CONCURRENT_REQUESTS=1 in your settings.py file.

For maximum performance you would also ensure that DOWNLOAD_DELAY is set to zero in your settings.py file (this is the default setting).

## settings.py

CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 0
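
If different spiders run under different plans, the same two settings can be derived from the plan's limit instead of being typed by hand. This is a small sketch under that assumption; concurrency_settings is a hypothetical helper, not part of the SDK or Scrapy.

```python
def concurrency_settings(plan_concurrency_limit):
    """Hypothetical helper: Scrapy settings that stay within a plan's limit."""
    if plan_concurrency_limit < 1:
        raise ValueError('plan_concurrency_limit must be at least 1')
    return {
        # Never run more requests in parallel than the plan allows.
        'CONCURRENT_REQUESTS': plan_concurrency_limit,
        # No artificial delay between requests (Scrapy's default).
        'DOWNLOAD_DELAY': 0,
    }


# Free Plan example from the text above: a concurrency limit of 1.
free_plan_settings = concurrency_settings(1)
```

The resulting dictionary could be assigned to a spider's custom_settings attribute, or its values copied into settings.py.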
