post request with proxies doesn't work #1695

@DoctorEvil92

Description

Hi, I had an issue the other day with a GET request and proxies, which was resolved by passing `http_client=ImpitHttpClient(http3=False)`.
Now I'm having an issue with a POST request (without any custom headers) that works fine with vanilla Python `requests` but fails through crawlee. I'm not sure if I'm doing something wrong; from what I see in your docs this should work. To reproduce it, put your proxy link in `proxy_url` and have it target Colombian IP addresses.

import asyncio
from datetime import timedelta
from urllib.parse import urlencode

import requests

from crawlee import ConcurrencySettings, Request
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
from crawlee.http_clients import ImpitHttpClient
from crawlee.proxy_configuration import ProxyConfiguration
from crawlee.router import Router
from crawlee.sessions import SessionPool



async def main():
    # proxy URL (left blank to avoid exposing my credentials)
    proxy_url = ""

    # first do a vanilla requests POST through the same proxy
    r = requests.post(
        "https://www.exito.com/api/graphql?operationName=GetCities",
        timeout=30,
        proxies={"http": proxy_url, "https": proxy_url},
        json={"operationName": "GetCities", "variables": {"channel": 11, "shippingType": "PE"}},
    )
    print("Content length with vanilla requests:", len(r.text))


    # define router
    router = Router[BeautifulSoupCrawlingContext]()
    @router.handler("MAIN")
    async def main_handler(context: BeautifulSoupCrawlingContext) -> None:
        print("inside handler")
        response = await context.http_response.read()
        print(response[:50])
    
    # then try to do a request with crawlee with the same proxy
    crawler = BeautifulSoupCrawler(
        request_handler=router,
        concurrency_settings=ConcurrencySettings(desired_concurrency=1, max_concurrency=1),
        max_request_retries=15,
        http_client=ImpitHttpClient(http3=False),
        session_pool=SessionPool(
            max_pool_size=1,
            create_session_settings={
                'max_usage_count': 999999999,
                'max_age': timedelta(hours=999999),
                'max_error_score': 100000,
            },
        ),
        proxy_configuration=ProxyConfiguration(proxy_urls=[proxy_url]),
    )

    # run it
    await crawler.run([
        Request.from_url(
            url="https://www.exito.com/api/graphql?operationName=GetCities",
            method='POST',
            payload=urlencode({"operationName": "GetCities", "variables": {"channel": 11, "shippingType": "PE"}}).encode(),
            label='MAIN',
        )
    ])


if __name__ == '__main__':
    asyncio.run(main())

Valid response should have about 9k characters.
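One difference between the two requests worth ruling out (an observation, not a confirmed cause of the proxy failure): `requests.post(..., json=...)` serializes the body as JSON and sets `Content-Type: application/json`, while the crawlee `Request` above builds its payload with `urlencode(...)`, which produces a form-urlencoded string and stringifies the nested `variables` dict. A stdlib-only sketch of the two bodies:

```python
import json
from urllib.parse import urlencode

body = {"operationName": "GetCities", "variables": {"channel": 11, "shippingType": "PE"}}

# what requests.post(..., json=body) sends (Content-Type: application/json)
json_body = json.dumps(body)

# what the crawlee Request above sends: form-urlencoded,
# with the nested dict collapsed via str() before percent-encoding
form_body = urlencode(body)

print(json_body)
print(form_body)
```

So even with the proxy aside, the GraphQL endpoint receives a differently-encoded body from the two clients; testing the crawlee request with `payload=json.dumps(body).encode()` and an explicit JSON content-type header would make the comparison apples-to-apples.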

Labels: t-tooling (Issues with this label are in the ownership of the tooling team.)