Unable to create Iceberg format from a Hudi source when using an S3 bucket as the tableBasePath in the config file #432

Closed
buddhayan opened this issue May 8, 2024 · 0 comments

@buddhayan

I encountered an issue while attempting to convert a Hudi table to Iceberg format. When I provide a local file path as the tableBasePath, the conversion works fine. However, when I use an S3 bucket as the tableBasePath, I hit the error below. I'm testing this from an AWS Cloud9 (EC2) instance. Please review the config file and error message provided, and advise if there's something I'm missing.

I followed the documentation (Creating your first interoperable table) to build utilities-0.1.0-SNAPSHOT-bundled.jar and the people Hudi dataset, then ran the command below from the AWS Cloud9 instance terminal:
java -jar utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig my_config.local.yaml

Config file my_config.local.yaml:

sourceFormat: HUDI
targetFormats:
  - ICEBERG
datasets:
  -
    tableBasePath: s3://bucket-name-eu-west-1/temp/xtable_data/people/
    tableName: people
    partitionSpec: city:VALUE
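
As a sanity check before looking at the error, the Hudi metadata directory can be listed to confirm the table actually exists at this path (a minimal sketch, assuming the AWS CLI is configured on the same Cloud9 instance):

aws s3 ls s3://bucket-name-eu-west-1/temp/xtable_data/people/.hoodie/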

Error:

~/environment/xtable-poc $ java -jar utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig my_config.local.yaml
WARNING: Runtime environment or build system does not support multi-release JARs. This will impact location-based features.
2024-05-08 18:17:36 INFO  org.apache.xtable.utilities.RunSync:148 - Running sync for basePath s3://bucket-name-eu-west-1/temp/xtable_data/people/ for following table formats [ICEBERG]
2024-05-08 18:17:36 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://bucket-name-eu-west-1/temp/xtable_data/people
2024-05-08 18:17:36 WARN  org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-05-08 18:17:37 WARN  org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-05-08 18:17:37 ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl:555 - Error getting localhost name. Using 'localhost'...
java.net.UnknownHostException: ip-**-**-**-***: ip-**-**-**-***: Name or service not known
        at java.net.InetAddress.getLocalHost(InetAddress.java:1670) ~[?:?]
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.getHostname(MetricsSystemImpl.java:553) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configureSystem(MetricsSystemImpl.java:489) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:485) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:188) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:163) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.s3a.S3AInstrumentation.getMetricsSystem(S3AInstrumentation.java:249) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.s3a.S3AInstrumentation.registerAsMetricsSource(S3AInstrumentation.java:272) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.s3a.S3AInstrumentation.<init>(S3AInstrumentation.java:229) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:519) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:116) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hudi.common.table.HoodieTableMetaClient.getFs(HoodieTableMetaClient.java:308) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:139) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:692) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hudi.common.table.HoodieTableMetaClient.access$000(HoodieTableMetaClient.java:85) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:774) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.hudi.HudiConversionSourceProvider.getConversionSourceInstance(HudiConversionSourceProvider.java:42) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.hudi.HudiConversionSourceProvider.getConversionSourceInstance(HudiConversionSourceProvider.java:31) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:92) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
Caused by: java.net.UnknownHostException: ip-**-**-**-***: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[?:?]
        at java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:930) ~[?:?]
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1543) ~[?:?]
        at java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848) ~[?:?]
        at java.net.InetAddress.getAllByName0(InetAddress.java:1533) ~[?:?]
        at java.net.InetAddress.getLocalHost(InetAddress.java:1665) ~[?:?]
        ... 25 more
2024-05-08 18:17:37 WARN  org.apache.hadoop.fs.s3a.SDKV2Upgrade:39 - Directly referencing AWS SDK V1 credential provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain. AWS SDK V1 credential providers will be removed once S3A is upgraded to SDK V2
2024-05-08 18:17:38 INFO  org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://bucket-name-eu-west-1/temp/xtable_data/people/.hoodie/hoodie.properties
Exception in thread "main" java.lang.NoSuchMethodError: 'java.lang.Object org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(org.apache.hadoop.fs.statistics.DurationTracker, org.apache.hadoop.util.functional.CallableRaisingIOE)'
        at org.apache.hadoop.fs.s3a.Invoker.onceTrackingDuration(Invoker.java:147)
        at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:282)
        at org.apache.hadoop.fs.s3a.S3AInputStream.lambda$lazySeek$1(S3AInputStream.java:435)
        at org.apache.hadoop.fs.s3a.Invoker.lambda$maybeRetry$3(Invoker.java:284)
        at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
        at org.apache.hadoop.fs.s3a.Invoker.lambda$maybeRetry$5(Invoker.java:408)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
        at org.apache.hadoop.fs.s3a.Invoker.maybeRetry(Invoker.java:404)
        at org.apache.hadoop.fs.s3a.Invoker.maybeRetry(Invoker.java:282)
        at org.apache.hadoop.fs.s3a.Invoker.maybeRetry(Invoker.java:326)
        at org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek(S3AInputStream.java:427)
        at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:545)
        at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
        at java.base/java.io.DataInputStream.read(DataInputStream.java:100)
        at java.base/java.util.Properties$LineReader.readLine(Properties.java:502)
        at java.base/java.util.Properties.load0(Properties.java:418)
        at java.base/java.util.Properties.load(Properties.java:407)
        at org.apache.hudi.common.table.HoodieTableConfig.fetchConfigs(HoodieTableConfig.java:352)
        at org.apache.hudi.common.table.HoodieTableConfig.<init>(HoodieTableConfig.java:278)
        at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:141)
        at org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:692)
        at org.apache.hudi.common.table.HoodieTableMetaClient.access$000(HoodieTableMetaClient.java:85)
        at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:774)
        at org.apache.xtable.hudi.HudiConversionSourceProvider.getConversionSourceInstance(HudiConversionSourceProvider.java:42)
        at org.apache.xtable.hudi.HudiConversionSourceProvider.getConversionSourceInstance(HudiConversionSourceProvider.java:31)
        at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:92)
        at org.apache.xtable.utilities.RunSync.main(RunSync.java:169)
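
The UnknownHostException earlier in the log is a non-fatal warning: the instance's EC2-style hostname does not resolve locally, so Hadoop's metrics system falls back to 'localhost'. A common workaround (a sketch, assuming an Amazon Linux host where /etc/hosts is editable) is to map the hostname to the loopback address:

# Map the local hostname to 127.0.0.1 so InetAddress.getLocalHost() resolves (needs sudo).
echo "127.0.0.1 $(hostname)" | sudo tee -a /etc/hosts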
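The fatal NoSuchMethodError at the end points to mismatched Hadoop artifacts on the classpath: the S3A code path calls IOStatisticsBinding.invokeTrackingDuration, which only exists in newer hadoop-common releases, so the hadoop-common classes actually loaded are likely older than the hadoop-aws/S3A classes calling them. A diagnostic sketch, assuming the shaded bundle kept its Maven metadata (paths are illustrative):

# List which Hadoop artifacts and versions were packaged into the bundle.
unzip -l utilities-0.1.0-SNAPSHOT-bundled.jar | grep 'org.apache.hadoop.*pom.properties'
# Print the bundled hadoop-common version, if its pom.properties survived shading.
unzip -p utilities-0.1.0-SNAPSHOT-bundled.jar \
  'META-INF/maven/org.apache.hadoop/hadoop-common/pom.properties' | grep version

If the versions disagree, aligning hadoop-aws with the bundled hadoop-common (or vice versa) when building the jar should make the missing method available.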
buddhayan closed this as not planned on May 8, 2024.