@vdusek commented Jan 23, 2024

Description

  • Based on the bug report by @JJetmar on Slack:

There is a question regarding the RequestQueue and Python SDK:
Link to Slack Conversation

I tried what I could, but I couldn't save any custom attributes to a Request in the RequestQueue via the Python SDK:

await rq.add_request(request={'url': url, 'method': 'GET', 'userData': { 'myTest': 'test' } })

What I get is:

{
  "id": "7R37ec5G62ZfQHW",
  "json": "{\n  \"url\": \"https://www.apify.com/\",\n  \"method\": \"GET\",\n  \"id\": \"7R37ec5G62ZfQHW\",\n  \"uniqueKey\": \"https://www.apify.com/\",\n  \"userData\": {\n    \"scrapy_request\": \"gASVMgIAAAAAAAB9lCiMA3VybJSMFWh0dHBzOi8vd3d3LmFwaWZ5LmNvbZSMCGNhbGxiYWNrlE6M\\nB2VycmJhY2uUTowHaGVhZGVyc5R9lChDBkFjY2VwdJRdlEM/dGV4dC9odG1sLGFwcGxpY2F0aW9u\\nL3hodG1sK3htbCxhcHBsaWNhdGlvbi94bWw7cT0wLjksKi8qO3E9MC44lGFDD0FjY2VwdC1MYW5n\\ndWFnZZRdlEMCZW6UYUMKVXNlci1BZ2VudJRdlEMjU2NyYXB5LzIuMTEuMCAoK2h0dHBzOi8vc2Ny\\nYXB5Lm9yZymUYUMPQWNjZXB0LUVuY29kaW5nlF2UQw1nemlwLCBkZWZsYXRllGF1jAZtZXRob2SU\\njANHRVSUjARib2R5lEMAlIwHY29va2llc5R9lIwEbWV0YZR9lCiMEGFwaWZ5X3JlcXVlc3RfaWSU\\njA83UjM3ZWM1RzYyWmZRSFeUjBhhcGlmeV9yZXF1ZXN0X3VuaXF1ZV9rZXmUjBVodHRwczovL3d3\\ndy5hcGlmeS5jb22UjBBkb3dubG9hZF90aW1lb3V0lEdAZoAAAAAAAIwNZG93bmxvYWRfc2xvdJSM\\nDXd3dy5hcGlmeS5jb22UjBBkb3dubG9hZF9sYXRlbmN5lEc/3EanAAAAAHWMCGVuY29kaW5nlIwF\\ndXRmLTiUjAhwcmlvcml0eZRLAIwLZG9udF9maWx0ZXKUiYwFZmxhZ3OUXZSMCWNiX2t3YXJnc5R9\\nlHUu\\n\"\n  },
  "method": "GET",
  "orderNo": null,
  "retryCount": 0,
  "uniqueKey": "https://www.apify.com/",
  "url": "https://www.apify.com/"
}

When I decoded the scrapy_request attribute, I get:

'2}(urlhttps://www.apify.comcallbacknerrbacknheaders/}(CAccept]C?text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8aCAccept-Language]CenaC
User-Agent]C#Scrapy/2.11.0 (+https://scrapy.org/)aCAccept-Encoding]C
gzip, deflateaumethodGETbodyCcookies}meta}(apify_request_id7R37ec5G62ZfQHWapify_request_unique_keyhttps://www.apify.comdownload_timeoutG@f/
download_slot
www.apify.comdownload_latencyG?ÜF§uencodingutf-8priorityK�dont_filterflags]	cb_kwargs}u.

This still doesn't contain the myTest custom attribute.

So, if I understand it correctly, this is not currently supported via the Python SDK, right?
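
The decoding step described in the report can be reproduced roughly as follows. This is a minimal sketch that assumes the scrapy_request value is a base64-encoded pickle of a plain dict (the newline-wrapped base64 text and the pickle-like decoded output suggest this, but it is an assumption). The helper names and the sample dict are hypothetical and are not part of the SDK.

```python
import base64
import pickle


def encode_scrapy_request(request_dict: dict) -> str:
    """Hypothetical helper: pickle a dict and wrap it as newline-wrapped base64 text."""
    return base64.encodebytes(pickle.dumps(request_dict)).decode("ascii")


def decode_scrapy_request(encoded: str) -> dict:
    """Hypothetical helper: reverse the encoding above.

    Only unpickle data you trust; pickle can execute arbitrary code on load.
    """
    # b64decode ignores the embedded newlines by default (validate=False).
    return pickle.loads(base64.b64decode(encoded))


# Assumed sample mirroring a few fields visible in the report.
sample = {
    "url": "https://www.apify.com",
    "method": "GET",
    "meta": {"apify_request_id": "7R37ec5G62ZfQHW"},
}

encoded = encode_scrapy_request(sample)
decoded = decode_scrapy_request(encoded)

# Inspect the top-level keys to check whether a custom attribute made it through.
print(sorted(decoded))
```

Decoding into a dict, rather than reading the raw bytes as text, makes it easier to check for a key such as myTest directly.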

Review

  • @jirimoravcik Could you please review this, since you were investigating the problem? Thank you.
  • Additional notes to the changes:
    • I moved request-related functions from scrapy/utils.py to a separate module.
    • I rewrote the unit test files and added new test cases there (regarding the optional fields userData and headers).
    • I tried to structure the commits meaningfully, so hopefully you can use them to review the changes to the to_{apify, scrapy}_request functions.
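
To illustrate the optional userData and headers fields mentioned in the notes above, here is a rough sketch of a conversion that keeps caller-supplied userData next to the serialized request. It is not the merged implementation: the function name to_apify_request_sketch, the use of plain dicts in place of Scrapy Request objects, and the meta["userData"] location are all assumptions made for illustration.

```python
import base64
import pickle
from typing import Any


def to_apify_request_sketch(scrapy_request: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical converter: plain-dict stand-in for a Scrapy request -> Apify-style dict."""
    apify_request: dict[str, Any] = {
        "url": scrapy_request["url"],
        "method": scrapy_request.get("method", "GET"),
    }

    # Optional field: copy headers only when present and non-empty.
    if scrapy_request.get("headers"):
        apify_request["headers"] = dict(scrapy_request["headers"])

    # Optional field: start from caller-supplied userData so custom keys survive.
    # (Assumption: custom data travels under meta["userData"].)
    user_data = dict(scrapy_request.get("meta", {}).get("userData", {}))

    # Store the serialized original request next to the custom keys,
    # instead of replacing userData with it.
    user_data["scrapy_request"] = base64.b64encode(
        pickle.dumps(scrapy_request)
    ).decode("ascii")
    apify_request["userData"] = user_data
    return apify_request


example = to_apify_request_sketch(
    {
        "url": "https://www.apify.com/",
        "method": "GET",
        "meta": {"userData": {"myTest": "test"}},
    }
)
print(example["userData"]["myTest"])
```

The point of the sketch is that the serialized request is merged into the existing userData rather than overwriting it, which is the behavior the bug report found missing.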

@vdusek added this to the 81st sprint - Tooling team milestone Jan 23, 2024
@vdusek self-assigned this Jan 23, 2024
@vdusek added the adhoc, bug, and t-tooling labels Jan 23, 2024
@vdusek requested a review from jirimoravcik January 23, 2024 17:46
@vdusek force-pushed the scrapy-request-fix branch from 01d380f to 5e19b6e January 23, 2024 17:52
@vdusek force-pushed the scrapy-request-fix branch from 5e19b6e to 7a48669 January 23, 2024 18:27
@vdusek merged commit 1c68f62 into master Jan 23, 2024
@vdusek deleted the scrapy-request-fix branch January 23, 2024 22:05