
fieldNames - AttributeError: Not available before 1.0.0 sedona version #1247

Closed
AlexTelloG opened this issue Feb 21, 2024 · 4 comments · Fixed by #1343

Comments

@AlexTelloG

Expected behavior

When creating a spatial_rdd from a PySpark dataframe, I should be able to access the fieldNames attribute with the following command:

spatial_rdd.fieldNames

This should give me the names of the additional columns included in the Spark dataframe.

Actual behavior

I can no longer access the fieldNames attribute of the resulting RDD. The following error appears:

```
AttributeError: Not available before 1.0.0 sedona version
```

This is surprising because the Sedona version being used is not 1.0.0 but 1.4.1 or higher. This also used to work without problems in previous versions of Sedona.

I ran into this issue today while migrating from Sedona 1.4 to the latest release, which deprecates the use of SedonaRegistrator.

Steps to reproduce the problem

Create a spatial_rdd and attempt to read the fieldNames attribute:

```python
# df is a PySpark DataFrame with 'local_id' and 'location' columns
spatial_rdd = Adapter.toSpatialRdd(df.select('local_id', 'location'), 'location')
spatial_rdd.analyze()

print(f'showing spatial_rdd.fieldNames: {spatial_rdd.fieldNames}')
```

Settings

Sedona version = 1.4.1 or higher

Apache Spark version = 3.4.1 or higher

Apache Flink version = ?

API type = Python

Scala version =

JRE version = 1.8, 1.11?

Python version = 3.9

Environment = Standalone

This is part of the Spark config for spark-submit:

```
--packages org.apache.sedona:sedona-spark-3.4_2.12:1.5.1,
org.datasyslab:geotools-wrapper:1.4.0-28.2,
uk.co.gresearch.spark:spark-extension_2.13:2.11.0-3.4
```

@Kontinuation
Member

You can use the spark-shaded dependency org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1 instead of org.apache.sedona:sedona-spark-3.4_2.12:1.5.1.

The Sedona Python binding cannot figure out the version number when the shaded jar is not being used (https://github.com/apache/sedona/blob/sedona-1.4.1/python/sedona/core/jvm/config.py#L207). Maybe it is a problem and we should fix it.
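To illustrate the failure mode described above, here is a hypothetical sketch (not Sedona's actual implementation; see the config.py link above for the real code): if the binding scans jar filenames for a version pattern that only the shaded artifact matches, the unshaded jar yields no version, and the binding then treats the version as unknown, triggering the "Not available before 1.0.0" guard.

```python
import re

# Hypothetical illustration only: the pattern and helper below are not
# Sedona's real code, they just mimic jar-name-based version detection
# that recognizes the shaded artifact but not the unshaded one.
SHADED_JAR_PATTERN = re.compile(
    r"sedona-spark-shaded-[\d.]+_[\d.]+-(\d+\.\d+\.\d+)\.jar"
)

def detect_sedona_version(jar_names):
    """Return the version parsed from a shaded jar name, or None if unknown."""
    for name in jar_names:
        match = SHADED_JAR_PATTERN.search(name)
        if match:
            return match.group(1)
    return None  # unshaded jar: version stays unknown

print(detect_sedona_version(["sedona-spark-shaded-3.4_2.12-1.5.1.jar"]))  # 1.5.1
print(detect_sedona_version(["sedona-spark-3.4_2.12-1.5.1.jar"]))  # None
```

With a scheme like this, the unshaded 1.5.1 jar is indistinguishable from a pre-1.0.0 install, which matches the error the reporter sees.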

@AlexTelloG
Author

I ran into two bugs today and document them here for your consideration.

  1. With the shaded dependency, specifically org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1, I run into another issue: an unresolved dependency, edu.ucar#cdm-core;5.4.2: not found. This appears to be a known and already-reported bug. The workaround for now seems to be downgrading to 1.4.1.

  2. The same seems to happen when reproducing the spark setup from your notebooks, for example located here.

@jiayuasu
Member

@AlexTelloG Since you are using standalone Spark with the --packages option, you can append one more option: --repositories https://artifacts.unidata.ucar.edu/repository/unidata-all. This will solve the edu.ucar#cdm-core;5.4.2: not found issue.
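Putting the two suggestions together, a spark-submit invocation might look like the sketch below. It is based on the packages listed earlier in this thread; the script name my_job.py is a placeholder, and versions should be adjusted to your environment.

```shell
# Sketch: shaded Sedona package plus the extra Unidata repository
# that resolves the edu.ucar#cdm-core transitive dependency.
spark-submit \
  --packages org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1,org.datasyslab:geotools-wrapper:1.4.0-28.2 \
  --repositories https://artifacts.unidata.ucar.edu/repository/unidata-all \
  my_job.py
```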

@AlexTelloG
Author

Thank you so much for the quick replies and help, I really appreciate it. Also, thanks for the amazing work on this library!
