-
Notifications
You must be signed in to change notification settings - Fork 0
Timx 530 prep work and s3 client #156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,110 @@ | ||
| """tests/test_s3client.py""" | ||
|
|
||
| # ruff: noqa: PLR2004, SLF001 | ||
|
|
||
| import pytest | ||
|
|
||
| from timdex_dataset_api.utils import S3Client | ||
|
|
||
|
|
||
| def test_s3client_init(): | ||
| """Test S3Client initialization.""" | ||
| client = S3Client() | ||
| assert client.resource is not None | ||
|
|
||
|
|
||
| def test_s3client_init_with_minio_env(caplog, monkeypatch): | ||
| """Test S3Client initialization with MinIO environment variables.""" | ||
| caplog.set_level("DEBUG") | ||
|
|
||
| monkeypatch.setenv("MINIO_S3_ENDPOINT_URL", "http://localhost:9000") | ||
| monkeypatch.setenv("MINIO_USERNAME", "minioadmin") | ||
| monkeypatch.setenv("MINIO_PASSWORD", "minioadmin") | ||
| monkeypatch.setenv("MINIO_REGION", "us-east-1") | ||
|
|
||
| client = S3Client() | ||
| assert client.resource is not None | ||
| assert "MinIO env vars detected, using for S3Client" in caplog.text | ||
|
|
||
|
|
||
| def test_split_s3_uri(): | ||
| """Test _split_s3_uri method.""" | ||
| client = S3Client() | ||
| bucket, key = client._split_s3_uri("s3://timdex/path/to/file.txt") | ||
| assert bucket == "timdex" | ||
| assert key == "path/to/file.txt" | ||
|
|
||
|
|
||
| def test_split_s3_uri_invalid(): | ||
| """Test _split_s3_uri method with invalid URI.""" | ||
| client = S3Client() | ||
| with pytest.raises(ValueError, match="Invalid S3 URI"): | ||
| client._split_s3_uri("timdex/path/to/file.txt") | ||
|
|
||
|
|
||
| def test_upload_download_file(mock_s3_resource, tmp_path): | ||
| """Test upload_file and download_file methods.""" | ||
| client = S3Client() | ||
|
|
||
| # Create a test file | ||
| test_file = tmp_path / "test.txt" | ||
| test_file.write_text("test content") | ||
|
|
||
| # Upload the file | ||
| s3_uri = "s3://timdex/test.txt" | ||
| client.upload_file(test_file, s3_uri) | ||
|
|
||
| # Download the file to a different location | ||
| download_path = tmp_path / "downloaded.txt" | ||
| client.download_file(s3_uri, download_path) | ||
|
|
||
| # Verify the content | ||
| assert download_path.read_text() == "test content" | ||
|
|
||
|
|
||
| def test_delete_file(mock_s3_resource, tmp_path): | ||
| """Test delete_file method.""" | ||
| client = S3Client() | ||
|
|
||
| # Create and upload a test file | ||
| test_file = tmp_path / "test.txt" | ||
| test_file.write_text("test content") | ||
| s3_uri = "s3://timdex/test.txt" | ||
| client.upload_file(test_file, s3_uri) | ||
|
|
||
| # Delete the file | ||
| client.delete_file(s3_uri) | ||
|
|
||
| # Verify the file is deleted | ||
| bucket = mock_s3_resource.Bucket("timdex") | ||
| objects = list(bucket.objects.all()) | ||
| assert len(objects) == 0 | ||
|
|
||
|
|
||
| def test_delete_folder(mock_s3_resource, tmp_path): | ||
| """Test delete_folder method.""" | ||
| client = S3Client() | ||
|
|
||
| # Create and upload test files | ||
| for i in range(3): | ||
| test_file = tmp_path / f"test{i}.txt" | ||
| test_file.write_text(f"test content {i}") | ||
| s3_uri = f"s3://timdex/folder/test{i}.txt" | ||
| client.upload_file(test_file, s3_uri) | ||
|
|
||
| # Upload a file outside the folder | ||
| other_file = tmp_path / "other.txt" | ||
| other_file.write_text("other content") | ||
| client.upload_file(other_file, "s3://timdex/other.txt") | ||
|
|
||
| # Delete the folder | ||
| deleted_keys = client.delete_folder("s3://timdex/folder/") | ||
|
|
||
| # Verify only folder contents are deleted | ||
| assert len(deleted_keys) == 3 | ||
| assert all(key.startswith("folder/") for key in deleted_keys) | ||
|
|
||
| bucket = mock_s3_resource.Bucket("timdex") | ||
| objects = list(bucket.objects.all()) | ||
| assert len(objects) == 1 | ||
| assert objects[0].key == "other.txt" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,76 @@ | ||
| """timdex_dataset_api/utils.py""" | ||
|
|
||
| import logging | ||
| import os | ||
| import pathlib | ||
| from urllib.parse import urlparse | ||
|
|
||
| import boto3 | ||
| from mypy_boto3_s3.service_resource import S3ServiceResource | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class S3Client: | ||
| def __init__( | ||
| self, | ||
| ) -> None: | ||
| self.resource = self._create_resource() | ||
jonavellecuerdo marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| def _create_resource(self) -> S3ServiceResource: | ||
| """Instantiate a boto3 S3 resource. | ||
| If env var MINIO_S3_ENDPOINT_URL is set, assume using local set of MinIO env vars. | ||
| """ | ||
| endpoint_url = os.getenv("MINIO_S3_ENDPOINT_URL") | ||
| if endpoint_url: | ||
| logger.debug("MinIO env vars detected, using for S3Client.") | ||
| return boto3.resource( | ||
| "s3", | ||
| endpoint_url=endpoint_url, | ||
| aws_access_key_id=os.getenv("MINIO_USERNAME"), | ||
| aws_secret_access_key=os.getenv("MINIO_PASSWORD"), | ||
| region_name=os.getenv("MINIO_REGION", "us-east-1"), | ||
| ) | ||
| return boto3.resource("s3") | ||
|
|
||
| def download_file(self, s3_uri: str, local_path: str | pathlib.Path) -> None: | ||
| bucket, key = self._split_s3_uri(s3_uri) | ||
| local_path = pathlib.Path(local_path) | ||
| local_path.parent.mkdir(parents=True, exist_ok=True) | ||
| self.resource.Bucket(bucket).download_file(key, str(local_path)) | ||
| logger.info(f"Downloaded {s3_uri} to {local_path}") | ||
|
|
||
| def upload_file(self, local_path: str | pathlib.Path, s3_uri: str) -> None: | ||
| bucket, key = self._split_s3_uri(s3_uri) | ||
| local_path = pathlib.Path(local_path) | ||
| self.resource.Bucket(bucket).upload_file(str(local_path), key) | ||
| logger.info(f"Uploaded {local_path} to {s3_uri}") | ||
|
|
||
| def delete_file(self, s3_uri: str) -> None: | ||
| bucket, key = self._split_s3_uri(s3_uri) | ||
| self.resource.Object(bucket, key).delete() | ||
| logger.info(f"Deleted {s3_uri}") | ||
|
|
||
| def delete_folder(self, s3_uri: str) -> list[str]: | ||
| """Delete all objects whose keys start with the given prefix.""" | ||
| bucket, prefix = self._split_s3_uri(s3_uri) | ||
| bucket_obj = self.resource.Bucket(bucket) | ||
| receipt = bucket_obj.objects.filter(Prefix=prefix).delete() | ||
|
|
||
| deleted_keys = [] | ||
| for request in receipt: | ||
| deleted_keys.extend([item["Key"] for item in request["Deleted"]]) | ||
| logger.info(f"Deleted {deleted_keys}") | ||
| return deleted_keys | ||
|
|
||
| @staticmethod | ||
| def _split_s3_uri(s3_uri: str) -> tuple[str, str]: | ||
| """Validate and split an S3 URI into (bucket, key).""" | ||
| parsed = urlparse(s3_uri) | ||
| if parsed.scheme != "s3" or not parsed.netloc or not parsed.path: | ||
| raise ValueError(f"Invalid S3 URI: {s3_uri!r}") | ||
|
|
||
| bucket = parsed.netloc | ||
| key = parsed.path.lstrip("/") # strip leading slash from /key | ||
| return bucket, key | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does line 37 set the logging level to
logging.WARNINGonly to be set tologging.DEBUGa few lines after? 🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 37 sets the
WARNINGforlogging.basicConfig()which is kind of global. But then we set the actualtimdex_dataset_apilogger instance toDEBUG.I'll admit, there are probably other ways to do this... but... it's works nicely locally for dev work like so, quite literally from something I was just working on:
This ensures
DEBUGlogging from this library, but everything else is pretty quiet.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! Got it. There's a part of me that thinks
logger->tda_loggerbut 'tis a non-blocking suggestion!