Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crozzy auto config #41

Merged
merged 7 commits into from
Apr 12, 2016
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions client.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,14 @@ func New(address string) (*Client, error) {
return nil, err
}

if address == "" {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good but we should update the documentation for the method.

var nnErr error
address, nnErr = GetNamenodeFromConfig()

if nnErr != nil {
return nil, nnErr
}
}
return NewForUser(address, username)
}

Expand Down
95 changes: 95 additions & 0 deletions client_auto.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
package hdfs
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably just be conf.go.


import (
"encoding/xml"
"errors"
"io/ioutil"
"net/url"
"os"
"path"
"strings"
)

type Property struct {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and Result don't need to be capitalized, since they aren't exported. Also I would name Result something like propertyList.

Name string `xml:"name"`
Value string `xml:"value"`
}

type Result struct {
Property []Property `xml:"property"`
}

type HadoopConf map[string]string
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment, since it's exported.


var ErrUnresolvedNamenode = errors.New("Couldn't find a namenode address in any config.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go errors should be shorthand, uncapitalized and unpunctuated, like "no namenode address in config".


// Get Hadoop Properties - try to open a conf file, marshal the results
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments for methods should be in the standard go form "LoadHadoopConfig loads the hadoop config from the given path..." etc.

// into a Result object and return the Properties of that object.
func LoadHadoopConfig(path string) (HadoopConf, error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick with conf instead of config in the name, for consistency.

The HADOOP_HOME/HADOOP_CONF_DIR stuff below is potentially useful for other properties, too. How about we have that be the default behavior for this method, if path is an empty string? Then we don't need GetNamenodeFromConfig at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it'd be good to put the behavior in LoadHadoopConfig although I think GetNamenodeFromConfig is useful for keeping the logic that checks HADOOP_NAMENODE if no namenode is found in the hadoop config.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm is HADOOP_NAMENODE a real hadoop property? If not, we can just skip that.

result := Result{}
f, err := ioutil.ReadFile(path)
if err != nil {
return nil, err
}

xmlErr := xml.Unmarshal(f, &result)
if xmlErr != nil {
return nil, xmlErr
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should wrap this error so that the caller gets more information. something like fmt.Errorf("reading %s: %s", path, xmlErr) would do it.

}

hadoopConf := make(HadoopConf)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super minor, but you can save some allocations if you pass the size argument in here, since we know it in advance: make(HadoopConf, len(result))

for _, prop := range result.Property {
hadoopConf[prop.Name] = prop.Value
}
return hadoopConf, nil
}

func (conf HadoopConf) Namenodes() []string {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment, since it's exported.

var nns []string
for key, value := range conf {
if strings.Contains(key, "fs.defaultFS") {
nnUrl, _ := url.Parse(value)
nns = append(nns, nnUrl.Host)
}
if strings.HasPrefix(key, "dfs.namenode.rpc-address") {
nns = append(nns, value)
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should dedupe, I think. You can do that by writing to a map[string]bool first, and then pulling out the keys into a slice to return.

}
return nns
}

// Return first namenode address we find in the hadoop config files
// else we try and return HADOOP_NAMENODE env var value else err
func GetNamenodeFromConfig() (string, error) {
hadoopHome := os.Getenv("HADOOP_HOME")
hadoopConfDir := os.Getenv("HADOOP_CONF_DIR")
var tryPaths []string
if hadoopHome != "" {
hdfsPath := path.Join(hadoopHome, "conf", "hdfs-site.xml")
corePath := path.Join(hadoopHome, "conf", "core-site.xml")
tryPaths = append(tryPaths, []string{hdfsPath, corePath}...)
}
if hadoopConfDir != "" {
confHdfsPath := path.Join(hadoopConfDir, "hdfs-site.xml")
confCorePath := path.Join(hadoopConfDir, "core-site.xml")
tryPaths = append(tryPaths, []string{confHdfsPath, confCorePath}...)
}
var nameNodes []string
for _, tryPath := range tryPaths {
hadoopConf, err := LoadHadoopConfig(tryPath)
if err == nil {
nameNodes = append(nameNodes, hadoopConf.Namenodes()...)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just return once we find the config, rather than concatenating the results from all the config files. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering the same, but figured all the nodes might be useful for adding some kind of namenode-failover support. I'm happy with either, we can be pretty sure we will loop a max of 4 times so performance increases will be small. Up to you

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not worried about the perf, just the semantics. I think we should keep this behavior (load everything) but put it in the method above, like I was saying. That way you end up with one big HadoopConf that has all the properties from all the files. We just need to pick a resolution order (HADOOP_CONF_DIR before HADOOP_HOME, I guess?) and stick with it.

}
}

var address string
if len(nameNodes) > 0 {
address = nameNodes[0]
} else if os.Getenv("HADOOP_NAMENODE") != "" {
address = os.Getenv("HADOOP_NAMENODE")
} else {
return "", ErrUnresolvedNamenode
}

return address, nil
}
31 changes: 31 additions & 0 deletions client_auto_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
package hdfs

import (
"os"
"strings"
"testing"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)

func TestAutoClientEnvVar(t *testing.T) {
_, err := GetNamenodeFromConfig()
assert.Nil(t, err)
}

func TestAutoClientHadoopHome(t *testing.T) {
pwd, _ := os.Getwd()
os.Setenv("HADOOP_HOME", strings.Join([]string{pwd, "test"}, "/"))
nn, _ := GetNamenodeFromConfig()
require.Contains(t, []string{"hadoop-namenode-01:8020", "hadoop-namenode-02:8020"}, nn)
os.Setenv("HADOOP_HOME", "")
}

func TestAutoClientHadoopConfDir(t *testing.T) {
pwd, _ := os.Getwd()
os.Setenv("HADOOP_CONF_DIR", strings.Join([]string{pwd, "test"}, "/"))
nn, _ := GetNamenodeFromConfig()
assert.EqualValues(t, "testnode:9000", nn)
os.Setenv("HADOOP_CONF_DIR", "")
}
50 changes: 50 additions & 0 deletions test/conf/hdfs-site.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>tests</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.ha.namenodes.tests</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.ha.namenodes.tests</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.tests.nn1</name>
<value>hadoop-namenode-01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.tests.nn2</name>
<value>hadoop-namenode-02:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.tests.nn1</name>
<value>hadoop-namenode-01:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.tests.nn2</name>
<value>hadoop-namenode-02:50070</value>
</property>
</configuration>
24 changes: 24 additions & 0 deletions test/core-site.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://testnode:9000</value>
</property>
</configuration>