GH-39262: [C++][Azure][FS] Add default credential auth configuration #39263

Merged

Tom-Newton (Contributor) commented Dec 17, 2023

Rationale for this change

Default credential is a useful auth option.

What changes are included in this PR?

Implement AzureOptions::ConfigureDefaultCredential plus a little bit of plumbing to go around it.
Created a simple test.

Are these changes tested?

Added a simple unit test checking that everything initialises without errors. This does not actually test a successful authentication. To do a real authentication with Azure, I think we would need to run the test against real Blob Storage and create various identities, which are non-trivial to set up. Personally, I think this is OK because all the complexity is abstracted away by the Azure SDK.
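The test can pass without a server because, as far as we understand the Azure SDK, constructing a credential does not fetch a token. Token acquisition is deferred until the first request is sent. The following is a minimal, self-contained sketch of that pattern. `LazyTokenCredential` and `FakeFileSystem` are hypothetical stand-ins, not Arrow or Azure SDK types.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical credential: the token provider is called only when a token is
// requested, never when the credential is constructed.
class LazyTokenCredential {
 public:
  explicit LazyTokenCredential(std::function<std::string()> fetch_token)
      : fetch_token_(std::move(fetch_token)) {}

  std::string GetToken() {
    ++token_requests_;
    return fetch_token_();
  }

  int token_requests() const { return token_requests_; }

 private:
  std::function<std::string()> fetch_token_;
  int token_requests_ = 0;
};

// Hypothetical filesystem: construction only stores its inputs, so it
// succeeds even when no server or identity is reachable.
class FakeFileSystem {
 public:
  FakeFileSystem(std::string account_name,
                 std::shared_ptr<LazyTokenCredential> credential)
      : account_name_(std::move(account_name)),
        credential_(std::move(credential)) {}

  // Stand-in for a real operation: returns a description of the request it
  // would send, which is the first point where a token is needed.
  std::string CreateDir(const std::string& path) {
    return "Bearer " + credential_->GetToken() + " -> " + account_name_ + "/" +
           path;
  }

 private:
  std::string account_name_;
  std::shared_ptr<LazyTokenCredential> credential_;
};
```

Constructing `FakeFileSystem` never calls the token provider. Only `CreateDir` does, which is why an initialisation-only test cannot detect a misconfigured identity.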

Are there any user-facing changes?

@Tom-Newton force-pushed the tomnewton/azure_default_credential/GH-39262 branch from eab9db9 to 56d796f on December 17, 2023 16:09
@Tom-Newton marked this pull request as ready for review on December 17, 2023 16:25
cpp/src/arrow/filesystem/azurefs_test.cc (outdated, resolved)
cpp/src/arrow/filesystem/azurefs.h (outdated, resolved)
```cpp
// Excerpt from cpp/src/arrow/filesystem/azurefs_test.cc under review.
options.backend = AzureBackend::kAzurite;  // Irrelevant for this test because it
                                           // doesn't connect to the server.
ARROW_EXPECT_OK(options.ConfigureDefaultCredential("dummy-account-name"));
EXPECT_OK_AND_ASSIGN(auto default_credential_fs, AzureFileSystem::Make(options));
```
Member

It seems that we can use Azurite with the DefaultAzureCredential: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=visual-studio%2Cblob-storage#azure-sdks

Can we do an operation (CreateDir()?) and check the result to verify whether this filesystem is valid or not?

Contributor Author

I gave it a try but couldn't get the SSL setup working. I'll give it another go, but I don't have much hope of making it work in CI even if I get it working locally.

Member

OK. I'll also try it.

Member

I tried it and now understand why you mentioned SSL.
`Azure::Identity::DefaultAzureCredential` uses a bearer token, and the Azure SDK for C++ rejects it on an `http` endpoint with: `Bearer token authentication is not permitted for non TLS protected (https) endpoints.`
If we want to use `DefaultAzureCredential` with Azurite, we need to generate a key and certificate pair and use it to serve HTTPS.
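The rejection can be modelled as a simple scheme check. The sketch below is hypothetical (`CheckBearerTokenTransport` is not an SDK function) and only recognises a lowercase `https://` prefix. It illustrates why Azurite's default `http://127.0.0.1:10000/devstoreaccount1` blob endpoint is refused while an HTTPS endpoint is accepted.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the Azure SDK: models the rule that a
// bearer token may only be attached to requests sent over https.
// Returns an empty string when the URL is acceptable, or an error message.
std::string CheckBearerTokenTransport(const std::string& url) {
  const std::string kHttpsPrefix = "https://";
  // Simplification: only a lowercase "https://" prefix is recognised.
  if (url.compare(0, kHttpsPrefix.size(), kHttpsPrefix) == 0) {
    return "";
  }
  return "Bearer token authentication is not permitted for non TLS protected "
         "(https) endpoints.";
}
```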

I looked at how to configure this in the Azure SDK for C++. It seems that we need to set `BlobClientOptions::Transport::Transport`.

If we set `BlobClientOptions::Transport::Transport`, we need to specify either the curl-based or the WinHTTP-based HTTP transport implementation, and the two have different TLS configurations...

How about using `TestAzureHierarchicalNSFileSystem` to test `DefaultAzureCredential`? If we use the real Azure service, we don't need a custom TLS configuration.

Contributor Author

It would definitely be possible to test against real Blob Storage, but it would take a significant amount of manual configuration to set up identities for all the different authentication methods.

Then we would need to provide details of these identities to `TestAzureHierarchicalNSFileSystem`: either they would always be required, or we would need to add a new version for each auth method, e.g. `TestAzureHierarchicalNSFileSystemWithServicePrincipal`.

Also, some auth methods are not feasible to test. For example, managed identity can only work on Azure VMs and workload identity can only work in Kubernetes.
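`DefaultAzureCredential` is a chained credential. It tries several sources in turn and uses the first one that yields a token, which is why methods such as managed identity or workload identity can only succeed in the environments that provide them. The following is a minimal sketch of that chaining, assuming hypothetical `CredentialSource` and `GetTokenFromChain` types. The source names and their order are illustrative only. The real list and order are defined by the Azure SDK.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: each source either produces a token or reports
// (via std::nullopt) that it is unavailable in the current environment.
struct CredentialSource {
  std::string name;
  std::function<std::optional<std::string>()> try_get_token;
};

struct ChainResult {
  std::string source;  // which source succeeded; empty if none did
  std::string token;
};

// Try each source in order and return the first token obtained.
ChainResult GetTokenFromChain(const std::vector<CredentialSource>& chain) {
  for (const auto& source : chain) {
    if (auto token = source.try_get_token()) {
      return {source.name, *token};
    }
  }
  return {"", ""};
}
```

If only the managed-identity source can produce a token, the chain returns that token. If no source succeeds, `source` is empty, which corresponds to authentication failing at request time.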

Personally, I don't think a more comprehensive test is worthwhile, because there is so little complexity outside the Azure SDK.

Contributor

I think this test is enough, because as @pitrou said the other day: "we are not re-implementing the Azure SDK".

Member

> Also there are some Auth methods that are not feasible to test. For example managed identity can only work on Azure VMs and workload identity can only work in kubernetes.

Oh...

OK. I see.

@kou changed the title from "GH-39262: [C++][Azure][FS] default credential auth" to "GH-39262: [C++][Azure][FS] Add default credential auth configuration" on Dec 18, 2023
Tom-Newton and others added 2 commits December 18, 2023 13:21
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Member

@kou left a comment

+1

@kou merged commit 659b231 into apache:main on Dec 19, 2023
36 of 37 checks passed

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 659b231.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

kou pushed a commit that referenced this pull request Dec 21, 2023
…39319)

### Rationale for this change
Workload identity is a useful Azure authentication method.

### What changes are included in this PR?
Implement `AzureOptions::ConfigureWorkloadIdentityCredential`

### Are these changes tested?
Added a simple test initialising a filesystem using `ConfigureWorkloadIdentityCredential`. This is not the most comprehensive test, but it's the same as what we agreed on for #39263.

### Are there any user-facing changes?
Workload identity authentication is now supported. 

* Closes: #39318

Authored-by: Thomas Newton <thomas.w.newton@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
felipecrv pushed a commit that referenced this pull request Dec 23, 2023
…39321)

### Rationale for this change
Managed identity is a useful Azure authentication method. Also, I failed to set `account_name` correctly for several auth methods (I think this got lost in a rebase, and then I copy-pasted the broken code).

### What changes are included in this PR?
- Make filesystem initialisation fail if `account_name_.empty()`. This prevents the account name configuration bug we had. Also added a test asserting that filesystem initialization fails in this case. 
- Remove account name configuration from all auth configs, in favour of setting it separately from the auth configuration.
- Implement `AzureOptions::ConfigureManagedIdentityCredential`
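The account-name check can be sketched as follows. The sketch assumes hypothetical simplified types (`FakeAzureOptions`, `ValidateOptions`) rather than Arrow's real `AzureOptions`. The account name is an ordinary field set independently of the credential, and validation rejects an empty name before any filesystem is built.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for Arrow's AzureOptions.
enum class CredentialKind { kNone, kDefault, kWorkloadIdentity, kManagedIdentity };

struct FakeAzureOptions {
  // Set directly by the caller, independently of the credential choice.
  std::string account_name;
  CredentialKind credential_kind = CredentialKind::kNone;

  // Choosing a credential does not touch the account name.
  void UseManagedIdentity() { credential_kind = CredentialKind::kManagedIdentity; }
};

// Returns an error message if a filesystem must not be created from these
// options, or std::nullopt if they pass validation.
std::optional<std::string> ValidateOptions(const FakeAzureOptions& options) {
  if (options.account_name.empty()) {
    return "account name must be set before creating the filesystem";
  }
  return std::nullopt;
}
```

Failing validation up front turns a silently missing account name into an immediate, explicit error.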

### Are these changes tested?
Added a simple test initialising a filesystem using `ConfigureManagedIdentityCredential`. This is not the most comprehensive test, but it's the same as what we agreed on for #39263.

### Are there any user-facing changes?
Managed identity authentication is now supported. 

* Closes: #39320

Authored-by: Thomas Newton <thomas.w.newton@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
clayburn pushed a commit to clayburn/arrow that referenced this pull request Jan 23, 2024
…ation (apache#39263)

### Rationale for this change
Default credential is a useful auth option. 

### What changes are included in this PR?
Implement `AzureOptions::ConfigureDefaultCredential` plus a little bit of plumbing to go around it. 
Created a simple test. 

### Are these changes tested?
Added a simple unittest that everything initialises happily. This does not actually test a successful authentication. I think to do a real authentication with Azure we would need to run the test against real blob storage and we would need to create various identities which are non-trivial to create. Personally I think this is ok because all the complexity is abstracted away by the Azure SDK. 

### Are there any user-facing changes?

* Closes: apache#39262

Lead-authored-by: Thomas Newton <thomas.w.newton@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
clayburn also pushed the commits for apache#39319 and apache#39321, which referenced this pull request, to clayburn/arrow on Jan 23, 2024.
dgreiss pushed the same three commits (apache#39263, apache#39319, apache#39321) to dgreiss/arrow on Feb 19, 2024.
Development

Successfully merging this pull request may close these issues.

[C++][FS][Azure] Default credential authentication
3 participants