/
ServerSetup.md.vm
199 lines (150 loc) · 7.26 KB
/
ServerSetup.md.vm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
Hadoop HDFS over HTTP - Server Setup
====================================
This page explains how to quickly setup HttpFS with Pseudo authentication against a Hadoop cluster with Pseudo authentication.
Install HttpFS
--------------
~ $ tar xzf httpfs-${project.version}.tar.gz
Configure HttpFS
----------------
By default, HttpFS assumes that Hadoop configuration files (`core-site.xml & hdfs-site.xml`) are in the HttpFS configuration directory.
If this is not the case, add to the `httpfs-site.xml` file the `httpfs.hadoop.config.dir` property set to the location of the Hadoop configuration directory.
Configure Hadoop
----------------
Edit Hadoop `core-site.xml` and defined the Unix user that will run the HttpFS server as a proxyuser. For example:
```xml
<property>
<name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name>
<value>httpfs-host.foo.com</value>
</property>
<property>
<name>hadoop.proxyuser.#HTTPFSUSER#.groups</name>
<value>*</value>
</property>
```
IMPORTANT: Replace `#HTTPFSUSER#` with the Unix user that will start the HttpFS server.
Restart Hadoop
--------------
You need to restart Hadoop for the proxyuser configuration to become active.
Start/Stop HttpFS
-----------------
To start/stop HttpFS, use `hdfs --daemon start|stop httpfs`. For example:
hadoop-${project.version} $ hdfs --daemon start httpfs
NOTE: The script `httpfs.sh` is deprecated. It is now just a wrapper of
`hdfs httpfs`.
Test HttpFS is working
----------------------
$ curl -sS 'http://<HTTPFSHOSTNAME>:14000/webhdfs/v1?op=gethomedirectory&user.name=hdfs'
{"Path":"\/user\/hdfs"}
HttpFS Configuration
--------------------
HttpFS preconfigures the HTTP port to 14000.
HttpFS supports the following [configuration properties](./httpfs-default.html) in the HttpFS's `etc/hadoop/httpfs-site.xml` configuration file.
HttpFS over HTTPS (SSL)
-----------------------
Enable SSL in `etc/hadoop/httpfs-site.xml`:
```xml
<property>
<name>httpfs.ssl.enabled</name>
<value>true</value>
<description>
Whether SSL is enabled. Default is false, i.e. disabled.
</description>
</property>
```
Configure `etc/hadoop/ssl-server.xml` with proper values, for example:
```xml
<property>
<name>ssl.server.keystore.location</name>
<value>${user.home}/.keystore</value>
<description>Keystore to be used. Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.password</name>
<value></value>
<description>Must be specified.</description>
</property>
<property>
<name>ssl.server.keystore.keypassword</name>
<value></value>
<description>Must be specified.</description>
</property>
```
The SSL passwords can be secured by a credential provider. See
[Credential Provider API](../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
You need to create an SSL certificate for the HttpFS server. As the `httpfs` Unix user, using the Java `keytool` command to create the SSL certificate:
$ keytool -genkey -alias jetty -keyalg RSA
You will be asked a series of questions in an interactive prompt. It will create the keystore file, which will be named **.keystore** and located in the `httpfs` user home directory.
The password you enter for "keystore password" must match the value of the
property `ssl.server.keystore.password` set in the `ssl-server.xml` in the
configuration directory.
The answer to "What is your first and last name?" (i.e. "CN") must be the hostname of the machine where the HttpFS Server will be running.
Start HttpFS. It should work over HTTPS.
Using the Hadoop `FileSystem` API or the Hadoop FS shell, use the `swebhdfs://` scheme. Make sure the JVM is picking up the truststore containing the public key of the SSL certificate if using a self-signed certificate.
For more information about the client side settings, see [SSL Configurations for SWebHDFS](../hadoop-project-dist/hadoop-hdfs/WebHDFS.html#SSL_Configurations_for_SWebHDFS).
NOTE: Some old SSL clients may use weak ciphers that are not supported by the HttpFS server. It is recommended to upgrade the SSL client.
Deprecated Environment Variables
--------------------------------
The following environment variables are deprecated. Set the corresponding
configuration properties instead.
Environment Variable | Configuration Property | Configuration File
----------------------------|------------------------------|--------------------
HTTPFS_HTTP_HOSTNAME | httpfs.http.hostname | httpfs-site.xml
HTTPFS_HTTP_PORT | httpfs.http.port | httpfs-site.xml
HTTPFS_MAX_HTTP_HEADER_SIZE | hadoop.http.max.request.header.size and hadoop.http.max.response.header.size | httpfs-site.xml
HTTPFS_MAX_THREADS | hadoop.http.max.threads | httpfs-site.xml
HTTPFS_SSL_ENABLED | httpfs.ssl.enabled | httpfs-site.xml
HTTPFS_SSL_KEYSTORE_FILE | ssl.server.keystore.location | ssl-server.xml
HTTPFS_SSL_KEYSTORE_PASS | ssl.server.keystore.password | ssl-server.xml
HTTP Default Services
---------------------
Name | Description
-------------------|------------------------------------
/conf | Display configuration properties
/jmx | Java JMX management interface
/logLevel | Get or set log level per class
/logs | Display log files
/stacks | Display JVM stacks
/static/index.html | The static home page
/prof | Async Profiler endpoint
To control the access to servlet `/conf`, `/jmx`, `/logLevel`, `/logs`,
`/stacks` and `/prof`, configure the following properties in `httpfs-site.xml`:
```xml
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
<description>Is service-level authorization enabled?</description>
</property>
<property>
<name>hadoop.security.instrumentation.requires.admin</name>
<value>true</value>
<description>
Indicates if administrator ACLs are required to access
instrumentation servlets (JMX, METRICS, CONF, STACKS, PROF).
</description>
</property>
<property>
<name>httpfs.http.administrators</name>
<value></value>
<description>ACL for the admins, this configuration is used to control
who can access the default servlets for HttpFS server. The value
should be a comma separated list of users and groups. The user list
comes first and is separated by a space followed by the group list,
e.g. "user1,user2 group1,group2". Both users and groups are optional,
so "user1", " group1", "", "user1 group1", "user1,user2 group1,group2"
are all valid (note the leading space in " group1"). '*' grants access
to all users and groups, e.g. '*', '* ' and ' *' are all valid.
</description>
</property>
```