HDFS for Go


This is a native Go client for HDFS. It connects directly to the NameNode using the protocol buffers API.

It tries to be idiomatic by aping the stdlib os package, where possible, and implements the interfaces from it, including os.FileInfo and os.PathError.

Here's what it looks like in action:

```go
client, err := hdfs.New("namenode:8020")
if err != nil {
	log.Fatal(err)
}

file, err := client.Open("/mobydick.txt")
if err != nil {
	log.Fatal(err)
}

buf := make([]byte, 59)
if _, err := file.ReadAt(buf, 48847); err != nil {
	log.Fatal(err)
}

fmt.Println(string(buf))
// => Abominable are the tumblers into which he pours his poison.
```

For complete documentation, check out the Godoc.

The hdfs Binary

Along with the library, this repo contains a commandline client for HDFS. Like the library, its primary aim is to be idiomatic, by enabling your favorite unix verbs:

```
$ hdfs --help
Usage: hdfs COMMAND
The flags available are a subset of the POSIX ones, but should behave similarly.

Valid commands:
  ls [-lah] [FILE]...
  rm [-rf] FILE...
  mv [-fT] SOURCE... DEST
  mkdir [-p] FILE...
  touch [-amc] FILE...
  chmod [-R] OCTAL-MODE FILE...
  chown [-R] OWNER[:GROUP] FILE...
  cat SOURCE...
  head [-n LINES | -c BYTES] SOURCE...
  tail [-n LINES | -c BYTES] SOURCE...
  du [-sh] FILE...
  checksum FILE...
  get SOURCE [DEST]
  getmerge SOURCE DEST
  put SOURCE DEST
```

Since it doesn't have to wait for the JVM to start up, it's also a lot faster than hadoop fs:

```
$ time hadoop fs -ls / > /dev/null

real  0m2.218s
user  0m2.500s
sys   0m0.376s

$ time hdfs ls / > /dev/null

real  0m0.015s
user  0m0.004s
sys   0m0.004s
```

Best of all, it comes with bash tab completion for paths!

Installing the commandline client

Grab a tarball from the releases page and extract it wherever you like.

To configure the client, make sure one or both of these environment variables point to your Hadoop configuration (core-site.xml and hdfs-site.xml). On systems with Hadoop installed, they should already be set.

```
$ export HADOOP_HOME="/etc/hadoop"
$ export HADOOP_CONF_DIR="/etc/hadoop/conf"
```
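If you're setting up configuration by hand, a minimal core-site.xml that points the client at a single NameNode might look like the sketch below (the hostname and port are placeholders for your cluster's values):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>
```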

To install tab completion globally on Linux, copy or link the bash_completion file which comes with the tarball into the right place:

```
$ ln -sT bash_completion /etc/bash_completion.d/gohdfs
```

By default on non-kerberized clusters, the HDFS user is set to the currently-logged-in user. You can override this with another environment variable:

```
$ export HADOOP_USER_NAME=username
```

Using the commandline client with Kerberos authentication

Like hadoop fs, the commandline client expects a ccache file in the default location: /tmp/krb5cc_<uid>. That means it should 'just work' to use kinit:

```
$ kinit bob@EXAMPLE.COM
$ hdfs ls /
```

If that doesn't work, try setting the KRB5CCNAME environment variable to wherever you have the ccache saved.

Compatibility

This library uses "Version 9" of the HDFS protocol, which means it should work with Hadoop distributions based on 2.2.x and above. The tests run against CDH 5.x and HDP 2.x.

Acknowledgements

This library is heavily indebted to snakebite.
