JNI Hadoop file wrappers truncate paths containing literal # #2257

@zhtttylz

Describe the bug
Auron's JNI Hadoop file wrappers may truncate paths that contain a literal #.
JniBridge rebuilds each path via new Path(new URI(path)). java.net.URI parses everything after the first # as the URI fragment rather than as part of the path, so Hadoop receives a truncated path.

For example, this intended path:

hdfs://mycluster/auron-it-hdfs-rbf-repro/raw#mini.txt

is opened as:

/auron-it-hdfs-rbf-repro/raw
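The split can be illustrated without a cluster. The sketch below uses Python's urllib.parse.urlsplit as a stand-in for java.net.URI; it is an analogy for the generic URI rule that # starts the fragment, not the code path inside JniBridge.

```python
from urllib.parse import urlsplit

# Path string from the example above.
intended = "hdfs://mycluster/auron-it-hdfs-rbf-repro/raw#mini.txt"

# Generic URI parsing treats the first '#' as the start of the fragment,
# so the path component ends just before it.
parts = urlsplit(intended)

print("path     =", parts.path)
print("fragment =", parts.fragment)
```

The path component stops at raw, and mini.txt lands in the fragment, which matches the truncated path shown above.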

To Reproduce

from pyspark.sql import SparkSession


spark = SparkSession.builder.appName("auron-hdfs-hash-repro").getOrCreate()
jvm = spark._jvm
conf = spark._jsc.hadoopConfiguration()

path = "hdfs://mycluster/auron-it-hdfs-rbf-repro/raw#mini.txt"
hadoop_path = jvm.org.apache.hadoop.fs.Path(path)
fs = hadoop_path.getFileSystem(conf)

print("path=", path)

if fs.exists(hadoop_path):
    fs.delete(hadoop_path, False)

out = fs.create(hadoop_path, True)
out.write(bytearray(b"data"))
out.close()
print("plain_hadoop_create_ok")

plain_in = fs.open(hadoop_path)
plain_in.close()
print("plain_hadoop_open_ok")

# Same path string, now through Auron's JNI wrapper, which re-parses it as a URI.
jvm.org.apache.auron.jni.JniBridge.openFileAsDataInputWrapper(fs, path).close()
print("jni_open_ok")

spark.stop()

The relevant output is:

[Image: screenshot of the script output, not available in this text]

Expected behavior
Auron should preserve Hadoop Path(String) semantics when opening or creating files through the JNI Hadoop wrappers.
JniBridge.openFileAsDataInputWrapper and JniBridge.createFileAsDataOutputWrapper should open or create the file at the exact path string passed by the caller, including any literal # characters in the Hadoop path.
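For comparison, here is a rough sketch of what Path(String) semantics mean for this input. It assumes Hadoop's Path(String) splits off only the scheme and authority and treats everything after them as the path, with no fragment handling. The helper split_hadoop_style is hypothetical and simplified; it is not Hadoop or Auron code.

```python
def split_hadoop_style(path_string):
    """Simplified, hypothetical emulation of scheme/authority/path splitting.

    Assumption: nothing after the authority is interpreted, so '#' stays
    in the path instead of starting a fragment.
    """
    scheme = None
    authority = None
    start = 0

    # A scheme is present only if ':' appears before the first '/'.
    colon = path_string.find(":")
    slash = path_string.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        scheme = path_string[:colon]
        start = colon + 1

    # An authority follows a leading '//' and runs to the next '/'.
    if path_string.startswith("//", start) and len(path_string) - start > 2:
        next_slash = path_string.find("/", start + 2)
        auth_end = next_slash if next_slash > 0 else len(path_string)
        authority = path_string[start + 2:auth_end]
        start = auth_end

    # Everything that remains is the path, '#' included.
    return scheme, authority, path_string[start:]


scheme, authority, path = split_hadoop_style(
    "hdfs://mycluster/auron-it-hdfs-rbf-repro/raw#mini.txt"
)
print(scheme, authority, path)
```

Under that assumption the path component keeps #mini.txt, which is the behavior the JNI wrappers are expected to match.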
