devcontainers setup #76

Open
wants to merge 24 commits into
base: main

weber8thomas
Collaborator

@weber8thomas weber8thomas commented May 24, 2024

Current setup

  • devcontainer.json draft: I first tried using the docker-compose setup as the root container for GH Codespaces, but had trouble troubleshooting what was going on. I switched to dind (Docker in Docker) and now instantiate the docker-compose stack afterwards.

  • Created a setup_codespaces.sh script to structure the actions to run:

    • Populate .env based on the secrets defined at the GH repo level
    • Clone the depictio-data example data repo
    • Create a python venv for the depictio-cli
    • Trigger the create-user-and-return-token command from the depictio-cli to create the "Paul Cezanne" user and register the token & configuration in ~/.depictio/config.yaml
    • To be optimised and changed: read that token to complete the .env, then restart the docker-compose stack. No user management is done at the frontend level at the moment, but that should be harmonised in the future.
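
A minimal sketch of the "read that token to complete the .env" step, assuming the .env file uses plain KEY=value lines. The helper name upsert_env_var is hypothetical and not part of depictio. Reading the token out of ~/.depictio/config.yaml is left out, since that file's layout is not shown here.

```python
from pathlib import Path


def upsert_env_var(env_path: Path, key: str, value: str) -> None:
    """Set KEY=value in a .env file, replacing an existing entry or appending one."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    updated = False
    for i, line in enumerate(lines):
        # Only touch real assignments of exactly this key (not comments,
        # and not longer keys that merely start with the same prefix).
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    env_path.write_text("\n".join(lines) + "\n")
```

It could be called as upsert_env_var(Path(".env"), "AUTH_TMP_TOKEN", token) before restarting the docker-compose stack.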

Issues

  • As mentioned above, the code needs restructuring so that depictio no longer relies on AUTH_TMP_TOKEN at startup; the token should instead be created afterwards and used in the GUI/CLI.

  • Everything works on my side until the step where polars tries to write to the minio depictio-bucket created at the beginning of the setup command. Once the data is aggregated and polars pushes the delta-table to S3, I get the following error:

Error occurred while writing Delta table: Generic S3 error: Client error with status 403 Forbidden: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidTokenId</Code><Message>The security token included in the request is invalid</Message><Key>delta-table/_delta_log/_last_checkpoint</Key><BucketName>new-bucket</BucketName><Resource>/new-bucket/delta-table/_delta_log/_last_checkpoint</Resource><RequestId>17D273FF272218D4</RequestId><HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId></Error>

This did not occur on my local setup.

I tried to debug it with the following:

from minio import Minio
from minio.error import S3Error
import io
import polars as pl
import deltalake

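# Clear AWS credentials inherited from the environment.
# Caveat: in a notebook, each `!` command runs in its own subshell, so these
# lines do not change the environment of the running Python process.
!unset AWS_ACCESS_KEY_ID
!unset AWS_SECRET_ACCESS_KEY
!unset AWS_SESSION_TOKEN
!unset AWS_SECURITY_TOKEN

# Bucket used by all the steps below (name as reported in the result)
bucket_name = "new-bucket"
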
# Create a new MinIO client with the new user's credentials
new_user_client = Minio(
    "localhost:9000",
    access_key="newuser",
    secret_key="newuser123",
    secure=False
)

# Verify write access by uploading a file
file_name = "testfile.txt"
content = b"Hello, this is a test file."
content_stream = io.BytesIO(content)

try:
    new_user_client.put_object(bucket_name, file_name, data=content_stream, length=len(content))
    print(f"File '{file_name}' uploaded successfully to bucket '{bucket_name}'.")
except S3Error as e:
    print(f"Error occurred during upload: {e}")

# Verify read access by downloading the file
try:
    response = new_user_client.get_object(bucket_name, file_name)
    data = response.read()
    print(f"File '{file_name}' downloaded successfully with content: {data.decode('utf-8')}")
except S3Error as e:
    print(f"Error occurred during download: {e}")

# Write a Delta Lake table (note: this uses the "minio" credentials, not newuser's)
minio_storage_options = {
    "endpoint_url": "http://localhost:9000",
    "aws_access_key_id": "minio",
    "aws_secret_access_key": "minio123",
    "use_ssl": "False",
    "region_name": "us-east-1",
    "signature_version": "s3v4",
    "aws_s3_allow_unsafe_rename": "true",
    "aws_s3_use_arn_region": "true",
    "aws_allow_http": "true",
}

df = pl.DataFrame(
    {
        "foo": [1, 2, 3, 4, 5],
        "bar": [6, 7, 8, 9, 10],
        "ham": ["a", "b", "c", "d", "e"],
    }
)

try:
    deltalake.write_deltalake(
        f"s3://{bucket_name}/delta-table/",
        data=df.to_pandas(),
        mode="overwrite",
        overwrite_schema=True,
        storage_options=minio_storage_options,
    )
    print(f"Delta table written successfully to bucket '{bucket_name}'.")
except Exception as e:
    print(f"Error occurred while writing Delta table: {e}")

Result:

File 'testfile.txt' uploaded successfully to bucket 'new-bucket'.
File 'testfile.txt' downloaded successfully with content: Hello, this is a test file.
Error occurred while writing Delta table: Generic S3 error: Client error with status 403 Forbidden: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidTokenId</Code><Message>The security token included in the request is invalid</Message><Key>delta-table/_delta_log/_last_checkpoint</Key><BucketName>new-bucket</BucketName><Resource>/new-bucket/delta-table/_delta_log/_last_checkpoint</Resource><RequestId>17D273FF272218D4</RequestId><HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId></Error>

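The minio & boto3 APIs work, but the deltalake interface does not. Note that deltalake does not go through botocore or s3fs here: in the log below, its request has the User-Agent object_store/0.9.1.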
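
One difference worth ruling out: the `!unset` lines in the snippet above run in a subshell when executed from a notebook, so the Python process keeps whatever AWS_* variables it inherited. A minimal sketch of doing the cleanup inside the process instead, assuming (not confirmed here) that deltalake picks up extra credentials such as a session token from environment variables. The helper name clear_aws_credential_vars is hypothetical.

```python
import os

# The same four variables the snippet above tries to unset.
AWS_CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_SECURITY_TOKEN",
)


def clear_aws_credential_vars(environ=os.environ):
    """Remove AWS credential variables from `environ` and return the names removed."""
    removed = []
    for name in AWS_CREDENTIAL_VARS:
        # pop() with a default does nothing if the variable is not set.
        if environ.pop(name, None) is not None:
            removed.append(name)
    return removed
```

Calling clear_aws_credential_vars() before write_deltalake and printing its return value would show which of these variables, if any, were set in the Codespaces process.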

I captured the request log of the MinIO container; results below:

localhost:9000 [REQUEST s3.PutObject] [2024-05-24T15:03:56.694] [Client IP: 172.18.0.1]
localhost:9000 PUT /new-bucket/testfile.txt
localhost:9000 Proto: HTTP/1.1
localhost:9000 Host: localhost:9000
localhost:9000 X-Amz-Content-Sha256: d736345dab82fb01e17b25306ebfabe6c22e00b691a7b8007ad1c70609f36d19
localhost:9000 X-Amz-Date: 20240524T150356Z
localhost:9000 Accept-Encoding: identity
localhost:9000 Authorization: AWS4-HMAC-SHA256 Credential=newuser/20240524/us-east-1/s3/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-content-sha256;x-amz-date, Signature=d8a231bd155590a30441dd1826d996331262d341bd15b5909cee42eb02fc1576
localhost:9000 Content-Length: 27
localhost:9000 Content-Type: application/octet-stream
localhost:9000 User-Agent: MinIO (Linux; x86_64) minio-py/7.2.7
localhost:9000 <BLOB>
localhost:9000 [RESPONSE] [2024-05-24T15:03:56.737] [ Duration 42.756ms TTFB 0s ↑ 133 B  ↓ 0 B ]
localhost:9000 200 OK
localhost:9000 Accept-Ranges: bytes
localhost:9000 Content-Length: 0
localhost:9000 ETag: "c11eb9c412703a01d12867e962f3c96a"
localhost:9000 Server: MinIO
localhost:9000 Vary: Origin,Accept-Encoding
localhost:9000 X-Xss-Protection: 1; mode=block
localhost:9000 Strict-Transport-Security: max-age=31536000; includeSubDomains
localhost:9000 X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
localhost:9000 X-Amz-Request-Id: 17D274FCB4BCF59E
localhost:9000 X-Content-Type-Options: nosniff
localhost:9000 <BLOB>
localhost:9000 
127.0.0.1:9000  [OS os.OpenFileR] [2024-05-24T15:03:56.745] /data/new-bucket/testfile.txt/xl.meta 46.938µs
127.0.0.1:9000  [STORAGE storage.ReadXL] [2024-05-24T15:03:56.745] /data new-bucket testfile.txt 78.255µs
localhost:9000 [REQUEST s3.GetObject] [2024-05-24T15:03:56.745] [Client IP: 172.18.0.1]
localhost:9000 GET /new-bucket/testfile.txt
localhost:9000 Proto: HTTP/1.1
localhost:9000 Host: localhost:9000
localhost:9000 Content-Length: 0
localhost:9000 User-Agent: MinIO (Linux; x86_64) minio-py/7.2.7
localhost:9000 X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
localhost:9000 X-Amz-Date: 20240524T150356Z
localhost:9000 Accept-Encoding: identity
localhost:9000 Authorization: AWS4-HMAC-SHA256 Credential=newuser/20240524/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=df62d3d0db2c653ef5f706e4c859e105643481b95c2cb123d400ccde54e116b5
localhost:9000 <BLOB>
localhost:9000 [RESPONSE] [2024-05-24T15:03:56.746] [ Duration 856µs TTFB 827.332µs ↑ 93 B  ↓ 27 B ]
localhost:9000 200 OK
localhost:9000 Server: MinIO
localhost:9000 Strict-Transport-Security: max-age=31536000; includeSubDomains
localhost:9000 Vary: Origin,Accept-Encoding
localhost:9000 X-Amz-Request-Id: 17D274FCB7CB0D5E
localhost:9000 Accept-Ranges: bytes
localhost:9000 Content-Type: application/octet-stream
localhost:9000 ETag: "c11eb9c412703a01d12867e962f3c96a"
localhost:9000 Last-Modified: Fri, 24 May 2024 15:03:56 GMT
localhost:9000 X-Content-Type-Options: nosniff
localhost:9000 Content-Length: 27
localhost:9000 X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
localhost:9000 X-Xss-Protection: 1; mode=block
localhost:9000 <BLOB>
localhost:9000 
localhost:9000 [REQUEST s3.GetObject] [2024-05-24T15:03:56.784] [Client IP: 172.18.0.1]
localhost:9000 GET /new-bucket/delta-table/_delta_log/_last_checkpoint
localhost:9000 Proto: HTTP/1.1
localhost:9000 Host: localhost:9000
localhost:9000 Content-Length: 0
localhost:9000 User-Agent: object_store/0.9.1
localhost:9000 X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
localhost:9000 X-Amz-Date: 20240524T150356Z
localhost:9000 X-Amz-Security-Token: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI2NjUwNTQ2Yjk4ZmMxMzMyYWI2NTVlZTYiLCJleHAiOjE3OTQzMDA1MjN9.EBk3JsSgO6BBURfvNbFAj4_nvZCi6vjSSRyG6r3roBPKlVAhcl6K8crprxVM2X71kq6rRQhuk_zLOmXLUq2xaQHZ3kJbRdIBpqtJOfAXIURN8_M6wtY_iEjPLVBWI0a9WZKuvQ8DMnrcTV6_RbufcQwzMsxi63qvEOkoT_VZL06v1qBceXYztmiEKgf-Br-rr2CQ32A1GJ23d6kDsXJHCwBrCL2FA3qW1vIb0WLgCcQCR9jsi4v8us-i6VUkcMz65J0AiIILZ-82XkgJFQI8BACiaeXY_kvvAbjuVf_3auEVI9K0R-7luQng1gRyuLwZ8-KL7j6pAkHW72zLmQ7ZbQ
localhost:9000 Accept: */*
localhost:9000 Authorization: AWS4-HMAC-SHA256 Credential=minio/20240524/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=60e3b79daa0adf24c7be4c85758b45bc9e2559070a96f3e5d3f7b1c3c67002aa
localhost:9000 <BLOB>
localhost:9000 [RESPONSE] [2024-05-24T15:03:56.785] [ Duration 151µs TTFB 136.595µs ↑ 105 B  ↓ 430 B ]
localhost:9000 403 Forbidden
localhost:9000 Accept-Ranges: bytes
localhost:9000 Content-Length: 430
localhost:9000 Content-Type: application/xml
localhost:9000 Server: MinIO
localhost:9000 X-Amz-Request-Id: 17D274FCBA223127
localhost:9000 X-Content-Type-Options: nosniff
localhost:9000 X-Xss-Protection: 1; mode=block
localhost:9000 Strict-Transport-Security: max-age=31536000; includeSubDomains
localhost:9000 Vary: Origin,Accept-Encoding
localhost:9000 X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
localhost:9000 <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidTokenId</Code><Message>The security token included in the request is invalid</Message><Key>delta-table/_delta_log/_last_checkpoint</Key><BucketName>new-bucket</BucketName><Resource>/new-bucket/delta-table/_delta_log/_last_checkpoint</Resource><RequestId>17D274FCBA223127</RequestId><HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId></Error>
localhost:9000 
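
Of the three requests above, only the failing object_store one carries an X-Amz-Security-Token header, and its value has the three dot-separated segments of a JWT. A minimal sketch for inspecting the token's claims, which may help identify where it comes from (for example, whether it is the depictio token rather than one issued by MinIO; that is a guess, not something the log confirms). decode_jwt_payload is a hypothetical helper and does not verify the signature.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload = parts[1]
    # JWT segments are base64url without '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Passing the header value from the log to decode_jwt_payload returns its claims as a dict, which can then be compared with the token stored in ~/.depictio/config.yaml.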

Development

Successfully merging this pull request may close these issues.

devcontainer setup
3 participants