
NUTCH-559 - NTLM, Basic and Digest Authentication schemes for web/proxy. Contributed by Susam Pal.

git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/trunk@608972 13f79535-47bb-0310-9956-ffa450edef68
1 parent 7f29db1 commit f10381127eff805c83d5d7cdd1073bb3994730a1 Tacettin Guney committed Jan 4, 2008
@@ -179,6 +179,9 @@ Unreleased changes (1.0-dev)
61. NUTCH-586 - Add option to run compiled classes without job file
(enis via ab)
+62. NUTCH-559 - NTLM, Basic and Digest Authentication schemes for web/proxy
+ server. (Susam Pal via dogacan)
+
Release 0.9 - 2007-04-02
1. Changed log4j configuration to log to stdout on commandline
@@ -0,0 +1,61 @@
+<?xml version="1.0"?>
+<!--
+ This is the authentication configuration file for protocol-httpclient.
+ Different credentials for different authentication scopes can be
+ configured in this file. If a set of credentials is configured for a
+ particular authentication scope (i.e. particular host, port number,
+ scheme and realm), then that set of credentials would be sent only to
+ servers falling under the specified authentication scope. Apart from
+ this, at most one set of credentials can be configured as 'default'.
+
+ When authentication is required to fetch a resource from a web-server,
+ the authentication-scope is determined from the host, port, scheme and
+ realm (if present) obtained from the URL of the page and the
+ authentication headers in the HTTP response. If it matches any
+ 'authscope' in this configuration file, then the 'credentials' for
+ that 'authscope' are used for authentication. Otherwise, the 'default'
+ set of credentials is used, if present (with an exception described
+ in the next paragraph). If any attribute of an 'authscope' is
+ missing, it matches all values for that attribute.
+
+ If there are several pages having different authentication realms and
+ schemes on the same web-server (same host and port, but different
+ realms and schemes), and credentials for one or more of the realms and
+ schemes for that web-server are specified, then the 'default'
+ credentials are ignored completely for that web-server (for that
+ host and port). Credentials to handle all remaining realms and schemes
+ for that server can be specified explicitly by adding an extra
+ 'authscope' tag for that server with the 'realm' and 'scheme'
+ attributes omitted.
+ This is demonstrated by the last 'authscope' tag for 'example:8080' in
+ the following example.
+
+ Example:-
+ <credentials username="susam" password="masus">
+ <default realm="sso"/>
+ <authscope host="192.168.101.33" port="80" realm="login"/>
+ <authscope host="example" port="8080" realm="blogs"/>
+ <authscope host="example" port="8080" realm="wiki"/>
+ <authscope host="example" port="80" realm="quiz" scheme="NTLM"/>
+ </credentials>
+ <credentials username="admin" password="nimda">
+ <authscope host="example" port="8080"/>
+ </credentials>
+
+ In the above example, 'example:8080' server has pages with multiple
+ authentication realms. The first set of credentials would be used for
+ 'blogs' and 'wiki' authentication realms. The second set of
+ credentials would be used for all other realms. For 'login' realm of
+ '192.168.101.33', the first set of credentials would be used. For any
+ other realm of '192.168.101.33' authentication would not be done. For
+ the NTLM authentication required by 'example:80', the first set of
+ credentials would be used. For 'sso' realms of all other servers, the
+ first set of credentials would be used, since it is configured as
+ 'default'.
+
+ NTLM does not use the notion of realms. The domain name may be
+ specified as the value for 'realm' attribute in case of NTLM.
+-->
+
+<auth-configuration>
+
+</auth-configuration>
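
The matching rules described in the comment of httpclient-auth.xml can be sketched in plain Java. This is only an illustrative model, not the code of protocol-httpclient: the class `AuthScopeSketch`, its `Scope` type and the `select` method are hypothetical names, and the rule that the most specific matching 'authscope' wins is an assumption made so that the 'blogs' and 'wiki' entries take precedence over the catch-all entry for 'example:8080'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the rules in httpclient-auth.xml; not the plugin's code.
class AuthScopeSketch {

    // One 'authscope' (or 'default') entry. A null field models a missing
    // attribute, which matches any value.
    static final class Scope {
        final String host, realm, scheme, username;
        final Integer port;

        Scope(String host, Integer port, String realm, String scheme, String username) {
            this.host = host; this.port = port; this.realm = realm;
            this.scheme = scheme; this.username = username;
        }

        boolean matches(String h, int p, String r, String s) {
            return (host == null || host.equals(h))
                && (port == null || port == p)
                && (realm == null || realm.equals(r))
                && (scheme == null || scheme.equalsIgnoreCase(s));
        }

        // Number of attributes present; assumed tie-breaker between matches.
        int specificity() {
            int n = 0;
            if (host != null) n++;
            if (port != null) n++;
            if (realm != null) n++;
            if (scheme != null) n++;
            return n;
        }
    }

    private final List<Scope> scopes = new ArrayList<>();
    private Scope defaultScope; // at most one 'default' entry

    void addScope(String host, int port, String realm, String scheme, String username) {
        scopes.add(new Scope(host, port, realm, scheme, username));
    }

    void setDefault(String realm, String scheme, String username) {
        defaultScope = new Scope(null, null, realm, scheme, username);
    }

    // Returns the username whose credentials would be sent, or null if none.
    String select(String host, int port, String realm, String scheme) {
        Scope best = null;
        boolean serverConfigured = false;
        for (Scope s : scopes) {
            if (host.equals(s.host) && s.port != null && s.port == port) {
                serverConfigured = true; // 'default' is ignored for this host:port
            }
            if (s.matches(host, port, realm, scheme)
                    && (best == null || s.specificity() > best.specificity())) {
                best = s;
            }
        }
        if (best != null) return best.username;
        if (!serverConfigured && defaultScope != null
                && defaultScope.matches(host, port, realm, scheme)) {
            return defaultScope.username;
        }
        return null;
    }

    // Builds the configuration from the example in the XML comment.
    static AuthScopeSketch example() {
        AuthScopeSketch cfg = new AuthScopeSketch();
        cfg.setDefault("sso", null, "susam");
        cfg.addScope("192.168.101.33", 80, "login", null, "susam");
        cfg.addScope("example", 8080, "blogs", null, "susam");
        cfg.addScope("example", 8080, "wiki", null, "susam");
        cfg.addScope("example", 80, "quiz", "NTLM", "susam");
        cfg.addScope("example", 8080, null, null, "admin");
        return cfg;
    }

    public static void main(String[] args) {
        AuthScopeSketch cfg = example();
        System.out.println(cfg.select("example", 8080, "blogs", "Basic"));
        System.out.println(cfg.select("example", 8080, "intranet", "Basic"));
        System.out.println(cfg.select("192.168.101.33", 80, "other", "Basic"));
    }
}
```

Under these assumptions the three calls in `main` select the first set of credentials, the second set, and no credentials at all, which mirrors the walkthrough at the end of the XML comment.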
@@ -119,6 +119,15 @@
</property>
<property>
+ <name>http.agent.host</name>
+ <value></value>
+ <description>Name or IP address of the host on which the Nutch crawler
+ is running. Currently this is used by the 'protocol-httpclient'
+ plugin.
+ </description>
+</property>
+
+<property>
<name>http.timeout</name>
<value>10000</value>
<description>The default network timeout, in milliseconds.</description>
@@ -155,6 +164,48 @@
</property>
<property>
+ <name>http.proxy.username</name>
+ <value></value>
+ <description>Username for proxy. This will be used by
+ 'protocol-httpclient', if the proxy server requests basic, digest
+ and/or NTLM authentication. To use this, 'protocol-httpclient' must
+ be present in the value of 'plugin.includes' property.
+ NOTE: For NTLM authentication, do not prefix the username with the
+ domain, i.e. 'susam' is correct whereas 'DOMAIN\susam' is incorrect.
+ </description>
+</property>
+
+<property>
+ <name>http.proxy.password</name>
+ <value></value>
+ <description>Password for proxy. This will be used by
+ 'protocol-httpclient', if the proxy server requests basic, digest
+ and/or NTLM authentication. To use this, 'protocol-httpclient' must
+ be present in the value of 'plugin.includes' property.
+ </description>
+</property>
+
+<property>
+ <name>http.proxy.realm</name>
+ <value></value>
+ <description>Authentication realm for proxy. Do not define a value
+ if realm is not required or authentication should take place for any
+ realm. NTLM does not use the notion of realms; for NTLM
+ authentication, specify the domain name as the value for this
+ property. To use this, 'protocol-httpclient' must be present in the
+ value of the 'plugin.includes' property.
+ </description>
+</property>
+
+<property>
+ <name>http.auth.file</name>
+ <value>httpclient-auth.xml</value>
+ <description>Authentication configuration file for
+ 'protocol-httpclient' plugin.
+ </description>
+</property>
+
+<property>
<name>http.verbose</name>
<value>false</value>
<description>If true, HTTP will log more verbosely.</description>
@@ -89,6 +89,7 @@
<ant dir="languageidentifier" target="test"/>
<ant dir="lib-http" target="test"/>
<ant dir="ontology" target="test"/>
+ <ant dir="protocol-httpclient" target="test"/>
<!--ant dir="parse-ext" target="test"/-->
<ant dir="parse-html" target="test"/>
<!-- <ant dir="parse-mp3" target="test"/> -->
@@ -27,6 +27,23 @@
<fileset dir="${nutch.root}/build">
<include name="**/lib-http/*.jar" />
</fileset>
+ <fileset dir="${nutch.root}/lib/jetty-ext">
+ <include name="*.jar"/>
+ <exclude name="ant.jar"/>
+ </fileset>
+ <pathelement location="${build.dir}/test/conf"/>
</path>
+ <target name="deps-test">
+ <copy toDir="${build.test}">
+ <fileset dir="${src.test}" excludes="**/*.java"/>
+ </copy>
+ </target>
+
+ <!-- for junit test -->
+ <mkdir dir="${build.test}/data" />
+ <copy todir="${build.test}/data">
+ <fileset dir="jsp"/>
+ </copy>
+
</project>
@@ -0,0 +1,77 @@
+<%--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+--%>
+<%--
+ This JSP demonstrates basic authentication. When this JSP page is
+ requested with no query parameters, then the user must enter the
+ username as 'userx' and password as 'passx' when prompted for
+ authentication. Apart from this there are a few other test cases,
+ which can be used by passing a test case number as query parameter in
+ the following manner: basic.jsp?case=1, basic.jsp?case=2, etc.
+ The credentials for each test case can be easily figured out from the
+ code below.
+
+ Author: Susam Pal
+--%>
+<%@ page
+ import = "sun.misc.BASE64Decoder"
+%>
+<%
+ String authHeader = request.getHeader("Authorization");
+ String realm = null;
+ String username = null;
+ String password = null;
+ int testCase = 0;
+ try {
+ testCase = Integer.parseInt(request.getParameter("case"));
+ } catch (Exception ex) {
+ // do nothing
+ }
+ switch (testCase) {
+ case 1:
+ realm = "realm1"; username = "user1"; password = "pass1";
+ break;
+
+ case 2:
+ realm = "realm2"; username = "user2"; password = "pass2";
+ break;
+
+ default:
+ realm = "realmx"; username = "userx"; password = "passx";
+ break;
+ }
+
+ boolean authenticated = false;
+ if (authHeader != null && authHeader.toUpperCase().startsWith("BASIC")) {
+ String creds[] = new String(new BASE64Decoder().decodeBuffer(
+ authHeader.substring(6))).split(":", 2);
+ if (creds.length == 2 && creds[0].equals(username) && creds[1].equals(password))
+ authenticated = true;
+ }
+ if (!authenticated) {
+ response.setHeader("WWW-Authenticate", "Basic realm=\"" + realm + "\"");
+ response.sendError(response.SC_UNAUTHORIZED);
+ } else {
+%>
+<html>
+<head><title>Basic Authentication Test</title></head>
+<body>
+<p>Hi <%= username %>, you have been successfully authenticated.</p>
+</body>
+</html>
+<%
+ }
+%>
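
The header handling in basic.jsp relies on `sun.misc.BASE64Decoder`, an internal JDK class. As a rough sketch of the same Basic scheme using only the public `java.util.Base64` API (Java 8 or later), the following shows how a client could build the `Authorization` header and how a server could decode it. The class `BasicAuthSketch` and its methods are hypothetical names for illustration and are not part of the commit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper illustrating the Basic scheme; not part of the commit.
class BasicAuthSketch {

    // Builds the value of an Authorization header for the given credentials.
    static String encode(String username, String password) {
        String pair = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Returns {username, password}, or null if the header is not a
    // well-formed Basic Authorization header.
    static String[] decode(String header) {
        if (header == null || !header.regionMatches(true, 0, "Basic ", 0, 6)) {
            return null;
        }
        try {
            byte[] raw = Base64.getDecoder().decode(header.substring(6).trim());
            // Split on the first colon only, so passwords may contain ':'.
            String[] creds = new String(raw, StandardCharsets.UTF_8).split(":", 2);
            return creds.length == 2 ? creds : null;
        } catch (IllegalArgumentException e) {
            return null; // payload was not valid Base64
        }
    }

    public static void main(String[] args) {
        String header = encode("userx", "passx");
        String[] creds = decode(header);
        System.out.println(header + " -> " + creds[0] + " / " + creds[1]);
    }
}
```

Returning null for malformed input, rather than indexing into the split result directly, avoids an exception when a client sends a payload without a colon.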
@@ -0,0 +1,65 @@
+<%--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+--%>
+<%--
+ This JSP tests whether the client can remember cookies. When the JSP
+ is fetched for the first time without any query parameters, it sets
+ a few cookies in the client. On a second request, with the query
+ parameter 'cookie=yes', it checks whether the client has sent all
+ the cookies. If the cookies are found, an HTTP 200 response is returned.
+ If the cookies are not found, an HTTP 403 response is returned.
+
+ Author: Susam Pal
+--%>
+<%
+ String cookieParam = request.getParameter("cookie");
+ if (!"yes".equals(cookieParam)) { // Send cookies
+ response.addCookie(new Cookie("var1", "val1"));
+ response.addCookie(new Cookie("var2", "val2"));
+%>
+<html>
+<head><title>Cookies Set</title></head>
+<body><p>Cookies have been set.</p></body>
+</html>
+<%
+ } else { // Check cookies
+ int cookiesCount = 0;
+
+ Cookie[] cookies = request.getCookies();
+ if (cookies != null) {
+ for (int i = 0; i < cookies.length; i++) {
+ if (cookies[i].getName().equals("var1")
+ && cookies[i].getValue().equals("val1"))
+ cookiesCount++;
+
+ if (cookies[i].getName().equals("var2")
+ && cookies[i].getValue().equals("val2"))
+ cookiesCount++;
+ }
+ }
+
+ if (cookiesCount != 2) {
+ response.sendError(response.SC_FORBIDDEN);
+ } else {
+%>
+<html>
+<head><title>Cookies Found</title></head>
+<body><p>Cookies found!</p></body>
+</html>
+<%
+ }
+ }
+%>
@@ -0,0 +1,71 @@
+<%--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+--%>
+<%--
+ This JSP tests digest authentication. It generates an HTTP 401
+ response with a WWW-Authenticate header for digest authentication and
+ checks the user-name supplied by the client. It does not check the
+ other parameters and hashes, because controlled JUnit tests are run
+ against this page and only the proper submission of credentials needs
+ to be tested.
+
+ Author: Susam Pal
+--%>
+<%@ page
+ import = "java.util.StringTokenizer"
+ import = "java.util.HashMap"
+%>
+<%
+ String username = "digest_user";
+ String authHeader = request.getHeader("Authorization");
+
+ boolean authenticated = false;
+ if (authHeader != null && authHeader.toUpperCase().startsWith("DIGEST")) {
+ HashMap map = new HashMap();
+ StringTokenizer tokenizer = new StringTokenizer(
+ authHeader.substring(7).trim(), ",");
+ while (tokenizer.hasMoreTokens()) {
+ String[] param = tokenizer.nextToken().trim().split("=", 2);
+ if (param[1].charAt(0) == '"') {
+ param[1] = param[1].substring(1, param[1].length() - 1);
+ }
+ map.put(param[0], param[1]);
+ }
+
+ if (username.equals((String)map.get("username")))
+ authenticated = true;
+ }
+
+ if (!authenticated) {
+ String realm = "realm=\"realm1\"";
+ String qop = "qop=\"auth,auth-int\"";
+ String nonce = "nonce=\"dcd98b7102dd2f0e8b11d0f600bfb0c093\"";
+ String opaque = "opaque=\"5ccc069c403ebaf9f0171e9517f40e41\"";
+
+ response.setHeader("WWW-Authenticate", "Digest " + realm + ", "
+ + qop + ", " + nonce + ", " + opaque);
+ response.sendError(response.SC_UNAUTHORIZED);
+ } else {
+%>
+<html>
+<head><title>Digest Authentication Test</title></head>
+<body>
+<p>Hi <%= username %>, you have been successfully authenticated.</p>
+</body>
+</html>
+<%
+ }
+%>
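
digest.jsp splits the `Authorization` header on commas, which is sufficient for the controlled test but would break on a quoted value that itself contains a comma (for example `qop="auth,auth-int"`). A possible alternative, shown here only as a sketch, is to pull out `name=value` pairs with a regular expression. The class `DigestHeaderSketch` and its `parse` method are hypothetical names, and the pattern does not handle backslash-escaped quotes inside quoted values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for Digest header parameters; not part of the commit.
class DigestHeaderSketch {

    // name = "quoted value"  or  name = token
    private static final Pattern PARAM =
            Pattern.compile("(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([^,\\s]+))");

    // Returns the parameters of a Digest header, or an empty map if the
    // header is missing or uses another scheme.
    static Map<String, String> parse(String header) {
        Map<String, String> params = new LinkedHashMap<>();
        if (header == null || !header.regionMatches(true, 0, "Digest ", 0, 7)) {
            return params;
        }
        Matcher m = PARAM.matcher(header.substring(7));
        while (m.find()) {
            // Group 2 holds a quoted value, group 3 an unquoted token.
            params.put(m.group(1), m.group(2) != null ? m.group(2) : m.group(3));
        }
        return params;
    }

    public static void main(String[] args) {
        String header = "Digest username=\"digest_user\", realm=\"realm1\", "
                + "qop=auth, nc=00000001";
        System.out.println(parse(header).get("username"));
    }
}
```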