
HDDS-6995. Update ranger-intg to v2.3.0 #3603

Merged: 11 commits into apache:master on Jul 21, 2022

Conversation

@DaveTeng0 (Contributor)

What changes were proposed in this pull request?

Update ranger-intg to v2.3.0

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-6995

How was this patch tested?

Manual tests using a build from the developer's machine.

@DaveTeng0 (Contributor, Author)

Hey @errose28, @smengcl, please help review this small change, thanks!

@smengcl (Contributor) commented Jul 18, 2022

Thanks @DaveTeng0 for the patch.

The CI dependency job is failing and looks related: https://github.com/apache/ozone/runs/7366664984

So it looks like ranger-intg 2.3.0 introduced a new dependency jcl-over-slf4j.jar. You could either:

  1. follow the instructions in the job log and add it to the jar-report.txt; or
  2. exclude this transitive dependency if it is unnecessary for RangerClient (which is what we use right now for S3 Multi-Tenancy feature) to function properly at runtime. e.g. https://github.com/apache/ozone/pull/3408/files#diff-9b97a58fd4c3b119bd214f9f27495be6556634fe83c3287dc1dec70cb209423cR159-R184

We might want to try approach (2) first, as jcl-over-slf4j seems to have contaminated Ozone's classpath and is breaking this UT (unit test) utility method (not sure if this is the only broken method, but it is worrying): https://github.com/apache/ozone/runs/7366517427

Error:  org.apache.hadoop.ozone.om.request.volume.TestOMVolumeSetQuotaRequest.testValidateAndUpdateCacheWithQuota  Time elapsed: 0.24 s  <<< ERROR!
java.lang.ClassCastException: org.apache.commons.logging.impl.SLF4JLocationAwareLog cannot be cast to org.apache.commons.logging.impl.Log4JLogger
	at org.apache.ozone.test.GenericTestUtils$LogCapturer.captureLogs(GenericTestUtils.java:256)
	at org.apache.hadoop.ozone.om.request.volume.TestOMVolumeSetQuotaRequest.testValidateAndUpdateCacheWithQuota(TestOMVolumeSetQuotaRequest.java:189)

As long as the CI passes after the new dependency exclusion we should be good (TestMultiTenantAccessController verifies if RangerClient works properly). Otherwise we would try to fix the UT method GenericTestUtils$LogCapturer.captureLogs and use (1) -- though again not sure if there would be other impacts, we'd have to see.
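
For reference, a minimal sketch of what option (2) could look like, assuming the exclusion is added to the ranger-intg dependency in whichever Ozone module pom declares it (the surrounding dependency entry and its placement here are illustrative, not the actual pom contents):

```xml
<dependency>
  <groupId>org.apache.ranger</groupId>
  <artifactId>ranger-intg</artifactId>
  <exclusions>
    <!-- Sketch of option (2): keep the transitive jcl-over-slf4j jar off the
         classpath, only if RangerClient does not need it at runtime. -->
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>jcl-over-slf4j</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```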

@errose28 (Contributor)

I'm not sure excluding the dependency is the correct fix. It looks like those failing tests are incorrectly using the Apache Commons LogFactory to get the loggers to scan, instead of slf4j's LoggerFactory. I think something like this might be better.
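
To illustrate the direction being suggested, a minimal sketch of a test-side change (which logger the test actually captures, and whether captureLogs already accepts an slf4j Logger or needs such an overload added, are assumptions here rather than confirmed details from this thread):

```java
import org.apache.hadoop.ozone.om.request.volume.OMVolumeSetQuotaRequest;
import org.apache.ozone.test.GenericTestUtils.LogCapturer;
import org.slf4j.LoggerFactory;

// Today the failing tests obtain the logger via commons-logging, roughly:
//   LogCapturer.captureLogs(LogFactory.getLog(OMVolumeSetQuotaRequest.class));
// With jcl-over-slf4j on the classpath, LogFactory.getLog() returns an
// SLF4JLocationAwareLog, so the Log4JLogger cast inside captureLogs() throws
// the ClassCastException shown above.
//
// Sketch of the suggested fix: look the logger up through slf4j instead,
// assuming captureLogs has (or gains) an overload taking org.slf4j.Logger.
LogCapturer logs = LogCapturer.captureLogs(
    LoggerFactory.getLogger(OMVolumeSetQuotaRequest.class));
```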

@smengcl (Contributor) commented Jul 18, 2022

Thanks @errose28. But do we want to introduce the extra jar? RangerClient looks fine without it.

I do agree that the incorrect usage should be fixed. Maybe we could open another jira for it?

@errose28 (Contributor)

I think we should not exclude a dependency unless it is actually breaking something. Since the test logging issues were on our end, I think we should leave jcl-over-slf4j in and make those small fixes in this PR, but cc @smengcl for a second opinion.

@smengcl (Contributor) commented Jul 18, 2022

@errose28 Alrighty. Let's not exclude jcl-over-slf4j and fix the test util then.

@DaveTeng0 (Contributor, Author)

> @errose28 Alrighty. Let's not exclude jcl-over-slf4j and fix the test util then.

Sure! Working on it!

@smengcl (Contributor) left a review

+1 pending CI

@errose28 (Contributor) left a review

Thanks for the additional fixes in this PR @DaveTeng0. LGTM as well.

@neils-dev (Contributor)

Thanks for this PR @DaveTeng0. As we discussed in the community meeting, @errose28, can we verify that the Ranger client integration works well with the s3gateway over gRPC transport? Run the CI with the gRPC s3gateway settings:
ozone.om.s3.grpc.server_enabled = true
ozone.om.transport.class = org.apache.hadoop.ozone.om.protocolPB.GrpcOmTransportFactory
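
For reference, the same two settings expressed in ozone-site.xml form would look roughly like this (a sketch; which component each property actually needs to be set on is covered later in the thread):

```xml
<!-- Sketch only: the gRPC s3gateway settings named above, in XML form. -->
<property>
  <name>ozone.om.s3.grpc.server_enabled</name>
  <value>true</value>
</property>
<property>
  <name>ozone.om.transport.class</name>
  <value>org.apache.hadoop.ozone.om.protocolPB.GrpcOmTransportFactory</value>
</property>
```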

Also does the update to 2.3.0 work well both with and without multi-tenant enabled?

@DaveTeng0 (Contributor, Author)

CI completed.

@errose28 (Contributor)

Hi @neils-dev. @smengcl tested the new ranger-intg jar version with gRPC in a secure cluster and found some issues when testing gRPC transport with TLS. I will let him add the details. The new Ranger client version worked the same as the old one, though.

> Also does the update to 2.3.0 work well both with and without multi-tenant enabled?

This version change only affects the Ranger client in the ranger-intg jar, which OM uses to write changes to Ranger for multi-tenancy. It does not affect the existing Ranger integration outside of multi-tenancy, where OM only reads changes that have been made in Ranger. So the answer is yes: it works with multi-tenancy enabled, as we tested, and the client is simply not used when multi-tenancy is disabled.

@smengcl (Contributor) commented Jul 21, 2022

> Thanks for this PR @DaveTeng0. As we discussed in the community meeting, @errose28, can we verify that the Ranger client integration works well with the s3gateway over gRPC transport? Run the CI with the gRPC s3gateway settings:
> ozone.om.s3.grpc.server_enabled = true
> ozone.om.transport.class = org.apache.hadoop.ozone.om.protocolPB.GrpcOmTransportFactory
>
> Also does the update to 2.3.0 work well both with and without multi-tenant enabled?

Hey @neils-dev, sorry for the wait. We were dealing with an issue on our end:

We bumped netty (from 4.1.74) to 4.1.77/4.1.78 internally without bumping netty-tcnative. Our build system failed to pick up the netty-tcnative 2.0.48 jar correctly in the resulting package, leading to a missing netty_tcnative native library during OM startup (found once netty debugging was enabled). As a result, the client timed out reaching any of the OMs when hdds.grpc.tls.enabled and ozone.om.s3.grpc.server_enabled were both set to true. The issue is now resolved by bumping netty-tcnative to 2.0.52 (just a few hours ago; thanks @adoroszlai).
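
As a rough sketch of the kind of version alignment that resolved it, expressed as Maven properties (the property names here are illustrative only; the actual internal build files are not shown in this thread):

```xml
<!-- Sketch only: keep netty and netty-tcnative in lockstep when bumping either.
     Property names are illustrative, not Ozone's actual pom properties. -->
<properties>
  <netty.version>4.1.78.Final</netty.version>
  <!-- bumped to 2.0.52 to match the newer netty, after the 2.0.48 jar was not
       packaged correctly in the internal build -->
  <netty-tcnative.version>2.0.52.Final</netty-tcnative.version>
</properties>
```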

I have briefly tested S3 gateway with and without a tenant (default s3v volume). Works as expected.

Env:

  1. hdds.grpc.tls.enabled=true is set cluster-wide
  2. ozone.om.s3.grpc.server_enabled=true is set on OM (only affects OM)
  3. ozone.om.transport.class=org.apache.hadoop.ozone.om.protocolPB.GrpcOmTransportFactory is set on S3g
  4. Restarted the Ozone service and confirmed via the /conf endpoint that the configs were applied on the leader OM and S3g
$ kinit -kt /path/to/om.keytab om
$ ozone getconf confKey hdds.grpc.tls.enabled
true
$ ozone tenant create tenant1 --om-service-id=ozone1
22/07/21 01:44:14 INFO rpc.RpcClient: Creating Tenant: 'tenant1', with new volume: 'tenant1'
$ ozone tenant user assign --tenant=tenant1 hive --om-service-id=ozone1
export AWS_ACCESS_KEY_ID='tenant1$hive'
export AWS_SECRET_ACCESS_KEY='<RANDOMACCESSKEY>'
$ kdestroy
$ export AWS_ACCESS_KEY_ID='tenant1$hive'
$ export AWS_SECRET_ACCESS_KEY='<RANDOMACCESSKEY>'
$ alias awsc='aws s3api --endpoint https://<S3G>:9879 --ca-bundle /path/to/cacerts.pem'
$ awsc list-buckets
{
    "Buckets": []
}
$ awsc create-bucket --bucket buck1
{
    "Location": "https://<S3G>:9879/buck1"
}
$ awsc list-buckets
{
    "Buckets": [
        {
            "Name": "buck1",
            "CreationDate": "2022-07-21T01:49:23.022000+00:00"
        }
    ]
}
$ awsc list-objects --bucket buck1
$ awsc put-object --bucket buck1 --key awscliv2-uploaded.zip --body awscliv2.zip
$ awsc list-objects --bucket buck1
{
    "Contents": [
        {
            "Key": "awscliv2-uploaded.zip",
            "LastModified": "2022-07-21T01:54:12.548000+00:00",
            "ETag": "2022-07-21T01:54:12.548Z",
            "Size": 47048038,
            "StorageClass": "STANDARD"
        }
    ]
}
$ awsc get-object --bucket buck1 --key awscliv2-uploaded.zip awscliv2-got.zip
{
    "AcceptRanges": "bytes",
    "LastModified": "2022-07-21T01:54:12+00:00",
    "ContentLength": 47048038,
    "CacheControl": "no-cache",
    "ContentType": "application/octet-stream",
    "Expires": "2022-07-21T01:54:45+00:00",
    "Metadata": {}
}
$ sha256sum *.zip
bb8f11423aaa00be3a18f2cbf301d1d835e3ab17f0d91404ef5ee627ef216e58  awscliv2-got.zip
bb8f11423aaa00be3a18f2cbf301d1d835e3ab17f0d91404ef5ee627ef216e58  awscliv2.zip
$ kinit -kt /path/to/om.keytab om
$ ozone sh bucket list /tenant1
[ {
  "metadata" : { },
  "volumeName" : "tenant1",
  "name" : "buck1",
  "storageType" : "DISK",
  "versioning" : false,
  "usedBytes" : 141144114,
  "usedNamespace" : 1,
  "creationTime" : "2022-07-21T01:49:23.022Z",
  "modificationTime" : "2022-07-21T01:49:23.022Z",
  "quotaInBytes" : -1,
  "quotaInNamespace" : -1,
  "bucketLayout" : "LEGACY",
  "owner" : "hive",
  "link" : false
} ]
$ ozone sh key list /tenant1/buck1
[ {
  "volumeName" : "tenant1",
  "bucketName" : "buck1",
  "name" : "awscliv2-uploaded.zip",
  "dataSize" : 47048038,
  "creationTime" : "2022-07-21T01:54:11.868Z",
  "modificationTime" : "2022-07-21T01:54:12.548Z",
  "replicationConfig" : {
    "replicationFactor" : "THREE",
    "requiredNodes" : 3,
    "replicationType" : "RATIS"
  }
} ]
$ ozone fs -ls -R ofs://ozone1/tenant1/
drwxrwxrwx   - om om          0 2022-07-21 01:49 ofs://ozone1/tenant1/buck1
-rw-rw-rw-   3 om om   47048038 2022-07-21 01:54 ofs://ozone1/tenant1/buck1/awscliv2-uploaded.zip
# Known issue: the listing shows the currently logged-in user as the bucket/key owner, even though it isn't the actual owner.

Similarly tested with the default s3v volume; works fine. The RangerClient version used here is an internal 2.3.0-based build.

Thanks,
Siyao

@neils-dev (Contributor)

No problem, I know you guys are busy. Thank you, @smengcl, for testing out the Ranger client upgrade with the S3G using the gRPC transport. And thanks for letting me know how it was tested (and of course any problems you had putting it together :) ).

Glad to learn the ranger integration with and without tenants works as expected. Thanks again! - Neil

@smengcl (Contributor) commented Jul 21, 2022

@neils-dev You're welcome! :D

@smengcl (Contributor) commented Jul 21, 2022

Thanks @DaveTeng0 for the patch. Thanks @neils-dev for raising the jira. Thanks @errose28 for the review. CI passed. Will merge shortly.

@smengcl merged commit 2795fda into apache:master on Jul 21, 2022