
Airflow support for S3 compatible storages #9474

Closed
vaddisrinivas opened this issue Jun 22, 2020 · 9 comments

@vaddisrinivas

Hi,

Curious to know about the support for S3-compatible storages like Dell ECS, MinIO, etc.

Thanks

@boring-cyborg

boring-cyborg bot commented Jun 22, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@dimon222
Contributor

Yes, it works as long as you specify the endpoint URL. However, serving logs in the UI seems to be broken for some users on the latest release, 1.10.10 (it works fine in 1.10.9).
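For context, a minimal sketch of what "specify the endpoint URL" means at the boto3 level (the client library Airflow's S3Hook uses under the hood); the endpoint and credentials below are placeholders, not real values:

```python
import boto3

# Point the S3 client at an S3-compatible service instead of AWS.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.com:9000",  # hypothetical MinIO endpoint
    aws_access_key_id="my-access-key",
    aws_secret_access_key="my-secret-key",
)
print(s3.list_buckets())
```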

@vaddisrinivas
Author

vaddisrinivas commented Jun 22, 2020

It doesn't work for me, even though I keep providing the HOST in the connection along with the other relevant parameters.
If there is an alternative, please help with that.
Also @dimon222, could you please share a sample connection screenshot or instructions, along with the configuration you enabled, to facilitate remote logging?
Thanks.

@dispensable

dispensable commented Jun 24, 2020

We are currently using Ceph RGW as our Airflow cluster's logging backend. It works perfectly, but setting up the connection is a little tricky: you should not set the host in the Host form field. Instead, set the host in the Extra field with {"host": "http://YOUR_S3_URL:PORT"} and leave the Host/Port/Schema form fields blank.
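For reference, a minimal sketch (assuming Airflow 1.10.x import paths) of creating the same connection programmatically instead of through the GUI; the conn_id, credentials, and endpoint are placeholders:

```python
import json

from airflow import settings
from airflow.models import Connection

# Hypothetical connection: the host goes into Extra, not the Host field.
conn = Connection(
    conn_id="my_s3_logs",
    conn_type="s3",
    login="my-access-key",      # access key id (Login field)
    password="my-secret-key",   # secret access key (Password field)
    extra=json.dumps({"host": "http://YOUR_S3_URL:PORT"}),
)

session = settings.Session()
session.add(conn)
session.commit()
```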

@vaddisrinivas
Author

Hi @dispensable, will try this and update the ticket ASAP!

@vaddisrinivas
Author

Hi @dispensable / all,

How do I pass the access_key_id and other parameters to connect to it? Can someone please help with that?

@dimon222
Contributor

dimon222 commented Jun 26, 2020

> Hi @dispensable / all,
>
> How do I pass the access_key_id and other parameters to connect to it? Can someone please help with that?

Use the Username/Password fields in the respective connection.
Extra args are, I believe, not supported apart from the host mentioned above.
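To illustrate how those fields map, a hedged sketch: in Airflow 1.10's AwsHook (which S3Hook extends), the connection's Login and Password are used as the AWS credentials, and "host" in Extra becomes the endpoint; the conn_id here is a placeholder:

```python
from airflow.hooks.base_hook import BaseHook

conn = BaseHook.get_connection("my_s3_logs")   # hypothetical conn_id
aws_access_key_id = conn.login                 # Login/Username field
aws_secret_access_key = conn.password          # Password field
endpoint_url = conn.extra_dejson.get("host")   # {"host": "..."} from Extra
```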

@vaddisrinivas
Author

Hi @dispensable / @dimon222, thanks for your help.

I have SUCCESSFULLY enabled pushing Airflow logs to an S3-compatible bucket by following what was mentioned above.
In brief:

  • created a new connection with Extra -> {"host": "http://myhost:myport", "aws_access_key_id": "myaccesskey/username", "aws_secret_access_key": "myreallybigsecretkey"}

  • then modified the configuration by adding:
    AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: connectionthatwassetearlier
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "s3://bucketname/pathorfolderonBucket"

  • restarted Airflow! (a sanity-check sketch follows below)
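As a quick sanity check around the restart, a minimal sketch (assuming Airflow 1.10.x import paths; the conn_id and bucket name are the placeholders from above):

```python
from airflow.hooks.S3_hook import S3Hook

# Uses the connection configured above; prints True if the bucket is reachable.
hook = S3Hook(aws_conn_id="connectionthatwassetearlier")
print(hook.check_for_bucket("bucketname"))
```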

@Asgoret

Asgoret commented Jun 30, 2020

Everywhere here it's HTTP, but what about HTTPS? Doesn't Airflow support HTTPS S3 endpoints? I get a very odd error (An error occurred (InvalidAccessKeyId) when calling the ListBuckets operation: The AWS Access Key Id you provided does not exist in our records) when using an HTTPS-based endpoint. In dev, with HTTP, everything works perfectly.
cc @dimon222

OK, here is how it works.
Connection in the GUI:

Name: Some name
Type: S3
Host: <empty>
Schema: <empty>
Login: Your ID from minio
Password: Your password key
Port: <empty>
Extras: {"host": "https://domain"}
