Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ASI-880] Async fetching cid metadata #2575

Merged
merged 62 commits into from
Mar 31, 2022
Merged

[ASI-880] Async fetching cid metadata #2575

merged 62 commits into from
Mar 31, 2022

Conversation

isaacsolo
Copy link
Contributor

@isaacsolo isaacsolo commented Feb 20, 2022

Description

This change speeds up fetching CID metadata. Before, we were using nested thread pool executors with the requests library. This has costly overhead with making many connections to gateway endpoints for each CID and getting slow responses.

Using async here allows us to avoid blocking I/O and flattens the requests so all CID requests are made at once in a single thread and we process responses as they come in.

if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
       print("Use Threads")
else:
    print("Multi Processing")

http://masnun.rocks/2016/10/06/async-python-the-different-forms-of-concurrency/

Tests

Tested on sandbox. Benchmarking performance fetching CID metadata for a block.

Before (changes from release branch + log)
https://github.com/AudiusProject/audius-protocol/commits/is-release-log

{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 3 CIDs in 0:00:00.263058 seconds", "timestamp": "2022-02-23 06:26:52,763"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 1 CIDs in 0:00:00.135504 seconds", "timestamp": "2022-02-23 06:26:54,846"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 5 CIDs in 0:00:00.367519 seconds", "timestamp": "2022-02-23 06:26:58,755"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 1 CIDs in 0:00:00.119488 seconds", "timestamp": "2022-02-23 06:27:01,225"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 6 CIDs in 0:00:00.420103 seconds", "timestamp": "2022-02-23 06:27:02,968"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 4 CIDs in 0:00:00.279827 seconds", "timestamp": "2022-02-23 06:24:18,706"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 3 CIDs in 0:00:00.229922 seconds", "timestamp": "2022-02-23 06:24:20,128"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 6 CIDs in 0:00:00.419742 seconds", "timestamp": "2022-02-23 06:24:21,965"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 13 CIDs in 0:00:00.824180 seconds", "timestamp": "2022-02-23 06:24:24,584"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 6 CIDs in 0:00:00.337738 seconds", "timestamp": "2022-02-23 06:24:28,940"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 7 CIDs in 0:00:00.388013 seconds", "timestamp": "2022-02-23 06:24:31,264"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 7 CIDs in 0:00:00.468619 seconds", "timestamp": "2022-02-23 06:24:34,507"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 4 CIDs in 0:00:00.272060 seconds", "timestamp": "2022-02-23 06:24:37,540"}

After:

{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 1 CIDs in 0:00:00.095177 seconds", "timestamp": "2022-02-23 06:02:55,534"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 5 CIDs in 0:00:00.167837 seconds", "timestamp": "2022-02-23 06:02:56,627"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 7 CIDs in 0:00:00.166352 seconds", "timestamp": "2022-02-23 06:02:59,290"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 11 CIDs in 0:00:00.240517 seconds", "timestamp": "2022-02-23 06:03:02,137"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 4 CIDs in 0:00:00.126204 seconds", "timestamp": "2022-02-23 06:03:05,840"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 2 CIDs in 0:00:00.141051 seconds", "timestamp": "2022-02-23 06:03:08,646"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 3 CIDs in 0:00:00.111742 seconds", "timestamp": "2022-02-23 06:03:09,938"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 3 CIDs in 0:00:00.221083 seconds", "timestamp": "2022-02-23 06:03:11,895"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 9 CIDs in 0:00:00.263251 seconds", "timestamp": "2022-02-23 06:03:14,363"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 8 CIDs in 0:00:00.279897 seconds", "timestamp": "2022-02-23 06:03:17,840"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 4 CIDs in 0:00:00.252667 seconds", "timestamp": "2022-02-23 07:04:54,381"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 10 CIDs in 0:00:00.240785 seconds", "timestamp": "2022-02-23 07:04:56,257"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 3 CIDs in 0:00:00.115498 seconds", "timestamp": "2022-02-23 07:04:59,530"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 8 CIDs in 0:00:00.175569 seconds", "timestamp": "2022-02-23 07:05:02,178"}

The longest fetch I've seen with async is for 12 CIDs.
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 12 CIDs in 0:00:00.319813 seconds", "timestamp": "2022-02-23 07:05:25,287"}
{"levelno": 20, "level": "INFO", "msg": "index.py | finished fetching 17 CIDs in 0:00:00.289499 seconds", "timestamp": "2022-02-23 07:44:39,623"}


How will this change be monitored?

Soaking on sandbox and prod. Verify CIDs are fetched, block diff is low.

rickyrombo and others added 30 commits February 14, 2022 16:52
* Revert "[AUD-1374] [AUD-1372] Add new favorites endpoints and include referenced metadata (#2385)" (#2484)

This reverts commit 8b9b0bd.

* Harden openresty code (#2496)

* Fix notification prev entries

* Fix test

* Fix lint

* Update single entry query and lint fix

Co-authored-by: Marcus Pasell <rickyrombo@users.noreply.github.com>
Co-authored-by: Cheran <cheran.v.senthil@gmail.com>
Co-authored-by: KJ Shanks <kyle.j.shanks@gmail.com>
…2513) (#2518)

* Explicitly log transactions where a user is not found, and set user_id = 0
* Can be rectified in a follow up, but unblocks indexing while still maintaining a record of disbursement
* Recalculate challenges

* Use real table
* Bump discovery version

* Bump content version
* Fix is current

* Fix user history

* Fix revert case

* Fix or query

* Lint

Co-authored-by: Dheeraj Manjunath <dheerajmanju1@gmail.com>
* Add is_current in additional missing spot

* Fix comment

* Fix broken query

* Fix lint
* Upgrade libs in content-node to 1.2.80

* Add to code
* add fetch ipfs metadata timeout and optimize user indexing

* remove extra logs
* Fix autocomplete regression

* Actually fix
* Update SPL dependency

* Fix balance indexing

* Fix

* Fix circular dep

* Fix lint

* Union type
* Process user operations in parallel, ensuring that per user operations are still handled serially to negate potential inconsistencies
* Timeout and threadpool changes

* Fix retrieved_metadata bug

* Fix lint

* PR revisions

* Lower block processing window
* Gut IPFS

* Fixup

* add cache log

* Pin deps for flask

* add colon

* Put index_network_peers back + cleanup

Co-authored-by: Raymond Jacobson <ray@audius.co>
Co-authored-by: Isaac <isaac@audius.co>
* Add index_blocks performance tracking

* Add trailing indexing perf stats
Copy link
Contributor

@joaquincasares joaquincasares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for sharing this new async/await form of concurrency as well as the related blog post.

Nothing stood out as detrimental in my review and I like the new flow!

@piazzatron
Copy link
Contributor

i didn't have any changes related to asyncio so whatever you decide. honestly i'm fine with this too. whatever is easiest

from what i've read, create_task is maybe more idiomatic but they should be functionally identical. no preference here

Copy link
Contributor

@piazzatron piazzatron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome – mostly just have little nits and a general request for adding types where possible. Really excited to get this in, tyyyyy for patience on slow review time

discovery-provider/src/tasks/index.py Outdated Show resolved Hide resolved
discovery-provider/src/tasks/index.py Outdated Show resolved Hide resolved
discovery-provider/src/utils/ipfs_lib.py Show resolved Hide resolved
discovery-provider/src/utils/ipfs_lib.py Outdated Show resolved Hide resolved
discovery-provider/src/utils/ipfs_lib.py Show resolved Hide resolved
discovery-provider/src/utils/ipfs_lib.py Outdated Show resolved Hide resolved
discovery-provider/src/utils/ipfs_lib.py Outdated Show resolved Hide resolved
discovery-provider/src/utils/ipfs_lib.py Outdated Show resolved Hide resolved
discovery-provider/src/utils/ipfs_lib.py Show resolved Hide resolved
discovery-provider/src/tasks/index.py Show resolved Hide resolved
@isaacsolo isaacsolo changed the base branch from release-v0.3.52 to master March 28, 2022 21:35
@gitguardian
Copy link

gitguardian bot commented Mar 28, 2022

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

@isaacsolo isaacsolo merged commit 6c13dc3 into master Mar 31, 2022
@isaacsolo isaacsolo deleted the is-improve-fetch branch March 31, 2022 16:49
@AudiusProject AudiusProject deleted a comment from linear bot Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet