Skip to content
This repository has been archived by the owner on Jan 16, 2023. It is now read-only.

Fix boto3 pre_should_copy_hook closing the connection too soon #220

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vschoener
Copy link

@vschoener vschoener commented Apr 21, 2022

I've been testing collecfast for a while and I've spotted an issue with the connection being closed too soon.

After digging around and trying to understand conversations about thread-safe, I've checked about S3Boto3Storage and I can confirm it's thread-safe as well.

You can check the log here https://pypi.org/project/django-storages/ : 1.6.4 (2017-07-27)
The conversation I found about it #107

As S3Boto3Storage is safe to use with thread, we can simply drop the logic about resetting the connection.
If you keep this logic, it closes the main connection of s3 client and some files are not getting synced at all. You have to run it multiple time to make it works.

I've tested locally by commenting off the code there, and it was working as expected https://github.com/antonagestam/collectfast/blob/master/collectfast/management/commands/collectstatic.py#L106

Here are my settings:

COLLECTFAST_ENABLED = os.getenv("COLLECTFAST_ENABLED") == "true"

# Default strategy for collectfast
COLLECTFAST_STRATEGY = os.getenv("COLLECTFAST_STRATEGY", "collectfast.strategies.filesystem.FileSystemStrategy")

# Define how much collectfast should store entries in the cache. We assume there are 2400 static file in the project
MAX_ENTRIES = int(os.getenv("MAX_ENTRIES", 300))

# Set which cache to use.
COLLECTFAST_CACHE = os.getenv("COLLECTFAST_CACHE", "default")

# Enable Parallel Uploads. Do not try to exceed this value with AWS. It will close the connection
COLLECTFAST_THREADS = int(os.getenv("COLLECTFAST_THREADS", 10))

COLLECTFAST_DEBUG = os.getenv("COLLECTFAST_DEBUG") == "true"

COLLECTFAST_REDIS_CACHE = {
    "BACKEND": "django_redis.cache.RedisCache",
    "LOCATION": os.environ.get("COLLECTFAST_REDIS_URL", "redis://redis/0"),
    "KEY_PREFIX": "collectfast",
    "OPTIONS": {
        "SOCKET_CONNECT_TIMEOUT": int(os.getenv("REDIS_SOCKET_CONNECT_TIMEOUT", default=2)),
        "SOCKET_TIMEOUT": int(os.getenv("REDIS_SOCKET_TIMEOUT", default=2)),
        "MAX_ENTRIES": MAX_ENTRIES,
        "IGNORE_EXCEPTIONS": os.getenv("COLLECTFAST_IGNORE_EXCEPTIONS") == "true",
    },
}

And my .env

STATICFILES_STORAGE="storages.backends.s3boto3.S3Boto3Storage"
COLLECTFAST_STRATEGY="collectfast.strategies.boto3.Boto3Strategy"
MAX_ENTRIES=3000
COLLECTFAST_ENABLED=true
COLLECTFAST_THREAD=20
COLLECTFAST_DEBUG=true
AWS_DEFAULT_ACL=private

COLLECTFAST_CACHE=collectfast_redis_cache
COLLECTFAST_REDIS_URL="redis://0.0.0.0:6379/0"
COLLECTFAST_IGNORE_EXCEPTIONS="true"

WDYT about that?

@codeclimate
Copy link

codeclimate bot commented Apr 21, 2022

Code Climate has analyzed commit 469da6c and detected 0 issues on this pull request.

View more on Code Climate.

@vschoener
Copy link
Author

@antonagestam is it possible for you to take a look at this even if you don't maintain the repo anymore? :)

@antonagestam
Copy link
Owner

antonagestam commented Apr 22, 2022

Hi @vschoener and thanks for your contribution. This looks sound, I'm guessing boto3 has gained thread safety since this was first implemented.

However, I'm not really willing to merge anything until the repository is in a maintainable state again. That means having a passing test suite separated from my personal AWS and GCP accounts specifically (the current ones have expired). I'd be happy to guide anyone willing to step and make that happen along the way, and to review PRs that make that happen.

There's a previous attempt at that here, that is unfortunately to be regarded as abandoned at this point, probably: #218

@vschoener
Copy link
Author

Hey @antonagestam,

Thanks for replying to this PR. Really appreciate. I'm a bit running out of time to improve the test part, I would love to contribute deeper about that and make it easier to maintain but I also have some requirements to make threads work in a short amount of time.

Is that something really important? Maybe because the tests will fail as you don't have AWS/GCP account anymore right?

I've seen the PR in progress trying to mock things around, it seems abandoned indeed.

@antonagestam
Copy link
Owner

Is that something really important?

Yes, I consider it fundamental.

but I also have some requirements to make threads work in a short amount of time.

I appreciate that, and your effort to contribute. I guess nothing should be stopping you from using your fork until things improve though, right?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants