Docs: Changing java api quickstart to use recommended no-arg constructor #3253

Merged
rdblue merged 2 commits into apache:master from samredai:javaquickstart on Oct 11, 2021

Conversation

Contributor

@samredai samredai commented Oct 8, 2021

This is tied to #3235 and updates the Java Quickstart to use the recommended no-arg constructor instead of the deprecated one. The logic in the quickstart also depends on an NPE fix that was recently merged.

@github-actions github-actions bot added the docs label Oct 8, 2021
properties.put("warehouse", "...");
properties.put("uri", "...");

catalog.initialize("hive", properties);
Contributor

We know that this currently fails in 0.12.0 because it doesn't pass a `Configuration`, right?

Contributor Author

Yes, it currently fails, but I think the failure is isolated to when `toString()` is called, which is hard to avoid in a REPL. The failure actually happens before `initialize()`, seemingly right at construction. (I'm going to try to confirm that last part.)

Contributor Author

Confirmed that `initialize` is guaranteed to fail without a conf. Updating this soon...

catalog.initialize("hive", properties);
```

Alternatively, you can configure the Hive Catalog using Spark's Hadoop configuration.
Contributor

I wouldn't say that adding the conf is an alternative, because that implies you don't need to pass the catalog properties. Catalog properties are separate config, so you should always pass them to configure the catalog. The Hive connection URI and warehouse location are defaulted for Hive, but that's not a normal thing. Other catalogs pretty much ignore the `Configuration` except to load a Hadoop `FileSystem` internally.

Contributor Author

Ah, I see! I missed that part, but it makes total sense. I'll update this.
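
For readers following this thread, here is a minimal sketch of the distinction the review comment draws: catalog properties are always passed explicitly, and the Hadoop-style configuration only supplies a fallback. This is not Iceberg code. The class `PropertyResolutionSketch`, its `resolve` helper, the host names, and the use of a plain `Map` in place of a Hadoop `Configuration` are all assumptions made for illustration, and the exact fallback behavior of the real Hive catalog may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not Iceberg code: models a catalog whose properties
// are passed explicitly, with a Hadoop-style conf used only as a fallback.
class PropertyResolutionSketch {
    // Returns the catalog property if present, otherwise the conf value under fallbackKey.
    static String resolve(Map<String, String> properties, String key,
                          Map<String, String> conf, String fallbackKey) {
        String value = properties.get(key);
        return value != null ? value : conf.get(fallbackKey);
    }

    public static void main(String[] args) {
        // A plain Map stands in for the Hadoop Configuration (assumption).
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.metastore.uris", "thrift://conf-host:9083");

        // Explicit catalog properties take precedence over the conf value.
        Map<String, String> properties = new HashMap<>();
        properties.put("uri", "thrift://explicit-host:9083");
        System.out.println(resolve(properties, "uri", conf, "hive.metastore.uris"));

        // With no "uri" property, the value falls back to the conf.
        Map<String, String> empty = new HashMap<>();
        System.out.println(resolve(empty, "uri", conf, "hive.metastore.uris"));
    }
}
```

The point of the sketch is only that the two sources of configuration are separate inputs, so passing a conf does not replace passing properties.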

- The Hive catalog connects to a Hive MetaStore to keep track of Iceberg tables. This example uses Spark's Hadoop configuration to get a Hive catalog:
+ The Hive catalog connects to a Hive MetaStore to keep track of Iceberg tables.
+ You can initialize a Hive Catalog with a name and some properties.
+ (see: [Catalog properties](https://iceberg.apache.org/configuration/#catalog-properties))
Contributor

Should we call out that `initialize` will currently fail unless `setConf` is called first?

Contributor Author

Done! Added a note right before the code snippet:

**note:** Currently, `setConf` is always required for hive catalogs, but this will change in the future.
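
The ordering constraint in that note can be sketched roughly as follows. This is a self-contained model, not the Iceberg `HiveCatalog`: the class `ConfRequiredCatalogSketch` is hypothetical, a plain `Map` stands in for the Hadoop `Configuration`, and throwing `IllegalStateException` is a choice made here for clarity rather than the failure mode of the real class. Only the method names `setConf` and `initialize` and the property keys `warehouse` and `uri` come from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not the Iceberg HiveCatalog.
class ConfRequiredCatalogSketch {
    private Map<String, String> conf; // stands in for a Hadoop Configuration (assumption)
    private String name;

    void setConf(Map<String, String> conf) {
        this.conf = conf;
    }

    void initialize(String name, Map<String, String> properties) {
        // Models the constraint from the note: a conf must be set before initialize.
        if (conf == null) {
            throw new IllegalStateException("setConf must be called before initialize");
        }
        this.name = name;
        // Copy the explicit "uri" property into the conf when one is given.
        if (properties.containsKey("uri")) {
            conf.put("hive.metastore.uris", properties.get("uri"));
        }
    }

    String name() {
        return name;
    }

    public static void main(String[] args) {
        ConfRequiredCatalogSketch catalog = new ConfRequiredCatalogSketch();
        Map<String, String> properties = new HashMap<>();
        properties.put("warehouse", "...");
        properties.put("uri", "...");

        try {
            catalog.initialize("hive", properties); // no conf set yet
        } catch (IllegalStateException e) {
            System.out.println("failed as expected: " + e.getMessage());
        }

        Map<String, String> conf = new HashMap<>();
        catalog.setConf(conf);                      // set the conf first
        catalog.initialize("hive", properties);     // now succeeds
        System.out.println("initialized catalog: " + catalog.name());
    }
}
```

With the real classes, the quickstart under review follows the same order: construct with the no-arg constructor, call `setConf`, then call `initialize` with a name and properties.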

@rdblue rdblue merged commit 8b84f66 into apache:master Oct 11, 2021
Contributor

rdblue commented Oct 11, 2021

Thanks, @samredai!

RussellSpitzer pushed a commit to RussellSpitzer/iceberg that referenced this pull request Oct 29, 2021

3 participants