Unable to use sedona.global.charset in ShapefileReader #1345
Comments
@adamaps If you are running Sedona in a cluster mode, this needs to be set via In your case, you might want to try this:
spark.executorEnv is a runtime config that can be set after your SparkSession or SedonaContext has been initiated.
|
Thank you for the quick response, @jiayuasu! I tested the following:
conf = SparkConf()
conf.set("sedona.global.charset", "utf8") # I have other conf settings not shown here
spark = SparkSession.builder.config(conf=conf).getOrCreate()
sedona = SedonaContext.create(spark)
And I tested both of the following:
spark.conf.set("spark.executorEnv.sedona.global.charset","utf8")
sedona.conf.set("spark.executorEnv.sedona.global.charset","utf8")
Unfortunately I still see the same issue in both cases.
|
The dataframe loaded from the sample shapefile:
|
Thank you, @Kontinuation! 🎉 I can confirm that setting the following configuration parameter in PySpark worked for my local setup. And thanks @jiayuasu for updating the docs.
conf.set("spark.driver.extraJavaOptions", "-Dsedona.global.charset=utf8")
Running on AWS/Glue is still causing issues, but this seems specific to our setup.
|
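For anyone landing here later, the fix confirmed above can be sketched as a small helper that builds the relevant Spark settings. Only the `spark.driver.extraJavaOptions` entry is confirmed in this thread; the `spark.executor.extraJavaOptions` entry is an assumption for cluster deployments, and the helper names (`charset_jvm_option`, `charset_conf_entries`, `build_spark_conf`) are hypothetical, not part of Sedona or Spark.

```python
# Sketch: pass sedona.global.charset to the JVM as a system property.
# Helper names are hypothetical; only the driver entry is confirmed in this thread.


def charset_jvm_option(charset: str = "utf8") -> str:
    """Return the -D flag that sets sedona.global.charset in a JVM."""
    return f"-Dsedona.global.charset={charset}"


def charset_conf_entries(charset: str = "utf8") -> dict:
    """Return Spark config key/value pairs that carry the charset flag."""
    option = charset_jvm_option(charset)
    return {
        # Confirmed above for a local PySpark setup.
        "spark.driver.extraJavaOptions": option,
        # Assumption: executors in a cluster need the same flag.
        "spark.executor.extraJavaOptions": option,
    }


def build_spark_conf(charset: str = "utf8"):
    """Create a SparkConf holding the charset entries (requires pyspark)."""
    # Imported lazily so the helpers above work without pyspark installed.
    from pyspark import SparkConf

    conf = SparkConf()
    for key, value in charset_conf_entries(charset).items():
        conf.set(key, value)
    return conf


if __name__ == "__main__":
    # Print the entries so they can be copied into spark-defaults or job parameters.
    for key, value in charset_conf_entries().items():
        print(f"{key}={value}")
```

The resulting conf would be passed to `SparkSession.builder.config(conf=...)` before the session is created, as in the earlier snippet. Note that setting `extraJavaOptions` replaces any value already configured for that key, so existing JVM flags would need to be combined into one space-separated string.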
Expected behavior
ShapefileReader.readToGeometryRDD(sedona_context, shp_file) should use the sedona.global.charset configuration property set in the spark session when reading shapefiles containing non-ASCII characters.
E.g. A shapefile containing an attribute value "Ariñiz/Aríñez" should appear in a dataframe as "Ariñiz/Aríñez".
Actual behavior
ShapefileReader.readToGeometryRDD(sedona_context, shp_file) is not using the charset configuration property set in the spark context.
E.g. A shapefile containing an attribute value "Ariñiz/Aríñez" appears in a dataframe as "Ariñiz/ArÃñez" instead.
Steps to reproduce the problem
I can confirm that ("sedona.global.charset", "utf8") appears in the configuration settings by using:
I also tried setting the charset property after creating the sedona context as follows (although this appears to be an older solution):
Please confirm how to set this configuration property correctly.
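The garbling described under "Actual behavior" is the kind of output that typically appears when UTF-8 bytes are decoded with a single-byte charset. The sketch below reproduces that mechanism in plain Python, assuming the attribute bytes are UTF-8 and are decoded as ISO-8859-1; that cause is an assumption, not something confirmed in this report, and the exact characters displayed may differ (for example, byte 0xAD decodes to an invisible soft hyphen).

```python
# Sketch: show how UTF-8 text turns into mojibake when decoded as ISO-8859-1.
# The decoding mismatch is an assumed cause, used here only for illustration.

# "Ariñiz/Aríñez" written with escapes so the code points are unambiguous.
original = "Ari\u00f1iz/Ar\u00ed\u00f1ez"

# Bytes as they would be stored if the attribute table were UTF-8 encoded.
utf8_bytes = original.encode("utf-8")

# Decoding with the wrong charset: every non-ASCII character becomes two.
garbled = utf8_bytes.decode("iso-8859-1")

# Decoding with the matching charset recovers the original text.
restored = utf8_bytes.decode("utf-8")

if __name__ == "__main__":
    print("original:", original)
    print("garbled: ", garbled)
    print("restored:", restored)
```

If a loaded attribute value matches `garbled` rather than `original`, that suggests the charset setting did not reach the JVM that parsed the file.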
Settings
Sedona version = 1.5.1
Apache Spark version = 3.3.0
API type = Python
Python version = 3.10
Environment = AWS Glue 4.0 using sedona-spark-shaded-3.0_2.12-1.5.1.jar and geotools-wrapper-1.5.1-28.2.jar