
dfs.encrypt.data.transfer not performing SSL handshake #20

Open
douggie opened this issue Jun 13, 2019 · 0 comments

Comments


douggie commented Jun 13, 2019

Hi,

When running a secure (Kerberos) HDFS cluster with SSL encryption, the handshake with the namenode is performed correctly, and commands that only interact with the namenode (e.g. hdfs dfs -ls) work perfectly.

However, when any communication is directed to the datanodes to read or write blocks, it seems the HDFSLib3 library no longer performs the handshake and therefore cannot access the blocks on the datanodes.

The other thing we noticed is that the client hdfs-site.xml and the cluster both have a replication factor of 2, yet when putting files HDFSLib3 defaults to a replication factor of 3 even though it reads the value 2 from hdfs-site.xml, and we have to set the replication factor to 2 explicitly. Could this be related to hdfs-site.xml not being read correctly, so that dfs.encrypt.data.transfer is also not picked up when reading/writing files?

Reading and writing files using the hdfs dfs command line works without issue.

Does HDFSLib3 support dfs.encrypt.data.transfer?
Is this a duplicate of dask/hdfs3#146? The issue with using pyarrow is that the JNI wrapper leaks significant memory when running a large number of parallel jobs, whereas in our testing libhdfs3 does not, hence we are keen to use it.

What could cause HDFSLib3 to ignore the values in core-site.xml and hdfs-site.xml when reading/writing files?
Code sample


from hdfs3 import HDFileSystem

hdfs = HDFileSystem(host=host, port=port, user=user,
                    ticket_cache=ticket_cache)

# this works: it only talks to the namenode
print(hdfs.ls('/'))

# reading/writing files fails to add/read blocks due to the encryption error
test_file = '/tmp/testfile.csv'

# hdfs.open with the default replication=0 makes libhdfs3 try to find
# 3 replica nodes instead of 2 (the value in dfs.replication)
with hdfs.open(test_file, mode='wb', replication=2) as f:
    f.write(b'Name, Amount\nAlice, 10\nJohn, 20')  # this fails to write blocks
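
To rule out the config files simply not being found, the sketch below forces the same values through explicitly. It assumes hdfs3's pars keyword (documented as a dict of extra configuration key/value pairs passed through to libhdfs3) and the LIBHDFS3_CONF environment variable that libhdfs3 uses to locate its client config; the path is a placeholder for our cluster.

import os
from hdfs3 import HDFileSystem

# point libhdfs3 directly at the client config before connecting
# (placeholder path; libhdfs3 reads the file named by LIBHDFS3_CONF)
os.environ['LIBHDFS3_CONF'] = '/etc/hadoop/conf/hdfs-site.xml'

# also pass the data-transfer settings explicitly via pars, which
# hdfs3 forwards to libhdfs3 as raw configuration key/value pairs
hdfs = HDFileSystem(host=host, port=port, user=user,
                    ticket_cache=ticket_cache,
                    pars={'dfs.encrypt.data.transfer': 'true',
                          'hadoop.rpc.protection': 'privacy',
                          'dfs.replication': '2'})

# if this write succeeds, the problem is config discovery rather than
# missing encryption support in libhdfs3
with hdfs.open('/tmp/testfile.csv', mode='wb') as f:
    f.write(b'Name, Amount\nAlice, 10\nJohn, 20')

If the explicit pars connection still fails with the same datanode error, that would point at libhdfs3 itself not implementing the encrypted transfer handshake rather than at config discovery.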

Log Excerpt

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to read expected encryption handshake from client at /xxx.xxx.xxx.xxx:47984. Perhaps the client is running an older version of Hadoop which does not support encryption

hdfs-site.xml settings

  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.algorithm</name>
    <value></value> <!-- leave empty for AES -->
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.cipher.suites</name>
    <value>AES/CTR/NoPadding</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
    <value>256</value> <!-- can also be set to 128 or 192 -->
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.https.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>