Notice / Common Errors #118

Closed
kecheung opened this issue Dec 6, 2022 · 0 comments

kecheung (Collaborator) commented Dec 6, 2022

Third Party Platforms

The jar provided in releases is built and designed for Azure Synapse. The connector is free to use, but usage with third party platforms is provided "as-is", with no guarantee of support or that it will work with your platform. However, the code is open sourced for contributions should you feel there are improvements that can be made.

All releases will be made at https://github.com/Azure/spark-cdm-connector/releases, not on Maven.

Example

If you want to use Databricks, you will have to build the jar yourself or use the jars we publish in the releases. Credential passthrough will not work. As mentioned in #108, "I have received a reply from a Databricks team, they have informed that a credentials pass-through cannot be used in a scheduled tasks, so the problem is in Databricks, not library."

Credential passthrough

As referenced in #134, credential passthrough is a Synapse-specific feature. Use app registration or SAS token auth if you are not using Synapse.
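
For example, outside Synapse a read with app registration (service principal) credentials can look roughly like the sketch below. The storage account, manifest path, entity, and credential values are placeholders; the option names follow the connector's usual read options.

```scala
// Sketch: read a CDM entity with app registration auth instead of
// credential passthrough. All values below are placeholders.
val df = spark.read.format("com.microsoft.cdm")
  .option("storage", "mystorageaccount.dfs.core.windows.net")   // ADLS Gen2 account (placeholder)
  .option("manifestPath", "mycontainer/root.manifest.cdm.json") // manifest location (placeholder)
  .option("entity", "MyEntity")                                  // entity name (placeholder)
  .option("appId", "<application-id>")                           // app registration credentials
  .option("appKey", "<application-secret>")
  .option("tenantId", "<tenant-id>")
  .load()
```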

CDM Connector save change

If you upgrade from Spark 2 to Spark 3, the CDM connector's save behavior changes. When the entity or the manifest does not yet exist, a dataframe write with SaveMode.Append or SaveMode.Overwrite throws something like NoSuchTableException: Manifest doesn't exist. root.manifest.cdm.json.

The solution is to remove the .mode(SaveMode.Append) / .mode(SaveMode.Overwrite) call, as in the sketch below.
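
A minimal sketch of the change (storage, manifest, and entity values are placeholders):

```scala
// Spark 3: this throws NoSuchTableException if the manifest/entity does not exist yet
//   df.write.format("com.microsoft.cdm")
//     .option(...)                // same options as below
//     .mode(SaveMode.Overwrite)   // <-- remove this
//     .save()

// Works: drop the .mode(...) call and let the connector create the entity on first write
df.write.format("com.microsoft.cdm")
  .option("storage", "mystorageaccount.dfs.core.windows.net")   // placeholder
  .option("manifestPath", "mycontainer/root.manifest.cdm.json") // placeholder
  .option("entity", "MyEntity")                                  // placeholder
  .save()
```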

Jar "doesn't work" / java.lang.NoClassDefFoundError

See the first point. If you got a similar error on your Spark cluster, you probably used the wrong jar version. Some of the classes involved exist only in Spark 2 or only in Spark 3, which is why the class cannot be found. The table below maps jar versions to Spark versions. If you don't know which version of the CDM connector you have, run the Scala expression com.microsoft.cdm.BuildInfo.version (see the snippet after the table).

CDM Version      Spark Version
0.x              2.4
spark3.1-1.x     3.1.x
spark3.2-1.x     3.2.x
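
For example, from a notebook cell or spark-shell:

```scala
// Prints the connector's version string; compare it against the table above.
println(com.microsoft.cdm.BuildInfo.version)
```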

Example scenarios:

  • Spark 3 jar with Spark 2 cluster
    java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
    
  • Spark 2 jar with Spark 3 cluster
    java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/catalog/SupportsCatalogOptions
    

Reading a table gives: java.util.NoSuchElementException

See: Spark 3.3: Reading a table gives: java.util.NoSuchElementException: None.get #138.

kecheung closed this as completed Dec 6, 2022
Azure locked and limited conversation to collaborators Dec 6, 2022
kecheung pinned this issue Dec 6, 2022