Add NTLM support in order to parse IIS sites #52

Open
wants to merge 2 commits into from

2 participants

@mrcanard

I was trying to crawl a website at work with your awesome Anemone spider and had to fight a little with the NTLM protocol to make it work.

Other people in corporate environments may hit the same problem, hence this pull request.

Thanks for your great work on Anemone, by the way!

@mtkd

I was fighting to get Anemone working with IIS using basic auth last week.

I hit the same problem with curl today, which was fixed by adding --ntlm, and I've just realised that NTLM was probably the issue I had with Anemone.
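
For anyone wanting to try this branch, here is a minimal usage sketch. The option names (`:use_ntlm`, `:ntlm_user`, `:ntlm_domain`, `:ntlm_password`) come from the diff in this pull request; the helper name `crawl_with_ntlm`, the environment variable names and every value are placeholders, and the crawl itself assumes this branch of Anemone and the ruby-ntlm gem are installed.

```ruby
# Option names are taken from the diff in this pull request.
# All values below are placeholders (assumptions), not real credentials.
NTLM_CRAWL_OPTIONS = {
  :use_ntlm      => true,
  :ntlm_user     => ENV.fetch('NTLM_USER', 'jdoe'),
  :ntlm_domain   => ENV.fetch('NTLM_DOMAIN', 'CORP'),
  :ntlm_password => ENV.fetch('NTLM_PASSWORD', 'changeme')
}

# Hypothetical helper: crawls start_url and returns the URLs it visited.
def crawl_with_ntlm(start_url, options = NTLM_CRAWL_OPTIONS)
  # Loaded lazily so that this file can be read without the gems installed.
  require 'anemone'
  visited = []
  Anemone.crawl(start_url, options) do |anemone|
    anemone.on_every_page { |page| visited << page.url.to_s }
  end
  visited
end
```

Calling `crawl_with_ntlm("http://intranet.example/")` (a made-up host) would then run the crawl with NTLM enabled.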

Commits on Apr 24, 2012
  1. @mrcanard

    NTLM parameters in core.rb

    mrcanard authored
  2. @mrcanard

    Add NTLM support

    mrcanard authored
Showing with 26 additions and 6 deletions.
  1. +1 −0  anemone.gemspec
  2. +9 −1 lib/anemone/core.rb
  3. +16 −5 lib/anemone/http.rb
1  anemone.gemspec
@@ -14,6 +14,7 @@ spec = Gem::Specification.new do |s|
s.add_dependency("nokogiri", ">= 1.3.0")
s.add_dependency("robotex", ">= 1.0.0")
+ s.add_development_dependency "ruby-ntlm", ">=0.0.1"
s.add_development_dependency "rake", ">=0.9.2"
s.add_development_dependency "rdoc", ">=3.12"
s.add_development_dependency "rspec", ">=2.8.0"
10 lib/anemone/core.rb
@@ -55,7 +55,15 @@ class Core
# proxy server port number
:proxy_port => false,
# HTTP read timeout in seconds
- :read_timeout => nil
+ :read_timeout => nil,
+ # Are we using NTLM protocol ?
+ :use_ntlm => false,
+ # NTLM user name
+ :ntlm_user => nil,
+ # NTLM domain name
+ :ntlm_domain => nil,
+ # NTLM password
+ :ntlm_password => nil
}
# Create setter methods for all options to be called from the crawl block
21 lib/anemone/http.rb
@@ -132,11 +132,22 @@ def get_response(url, referer = nil)
retries = 0
begin
start = Time.now()
- # format request
- req = Net::HTTP::Get.new(full_path, opts)
- # HTTP Basic authentication
- req.basic_auth url.user, url.password if url.user
- response = connection(url).request(req)
+ req = nil
+ response = nil
+ if ! @opts[:use_ntlm]
+ # format request
+ req = Net::HTTP::Get.new(full_path, opts)
+ # HTTP Basic authentication
+ req.basic_auth url.user, url.password if url.user
+ response = connection(url).request(req)
+ else
+ require 'ntlm/http'
+ # format request
+ req = Net::HTTP::Get.new(full_path, opts)
+ # NTLM authentication
+ req.ntlm_auth(@opts[:ntlm_user], @opts[:ntlm_domain], @opts[:ntlm_password])
+ response = connection(url).request(req)
+ end
finish = Time.now()
response_time = ((finish - start) * 1000).round
@cookie_store.merge!(response['Set-Cookie']) if accept_cookies?
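
The two branches in the hunk above build the same GET request and differ only in the authentication call, so the request could be built in one place. The following is a sketch of that idea, not the code in this pull request: `build_get_request` is a hypothetical helper, the check for missing NTLM options is an addition that the diff does not contain, and `ntlm_auth` is assumed to be provided by `require 'ntlm/http'` exactly as the diff uses it.

```ruby
require 'net/http'
require 'uri'

NTLM_OPTION_KEYS = [:ntlm_user, :ntlm_domain, :ntlm_password]

# Hypothetical helper: builds one GET request and applies either NTLM or
# HTTP Basic authentication, depending on opts[:use_ntlm].
def build_get_request(url, opts, headers = {})
  req = Net::HTTP::Get.new(url.request_uri, headers)
  if opts[:use_ntlm]
    # Assumption: fail early instead of handing nil values to ntlm_auth.
    missing = NTLM_OPTION_KEYS.select { |key| opts[key].nil? }
    unless missing.empty?
      raise ArgumentError, "missing NTLM options: #{missing.join(', ')}"
    end
    require 'ntlm/http' # from the ruby-ntlm gem, as in the diff
    req.ntlm_auth(opts[:ntlm_user], opts[:ntlm_domain], opts[:ntlm_password])
  elsif url.user
    # HTTP Basic authentication from the credentials embedded in the URL
    req.basic_auth(url.user, url.password)
  end
  req
end
```

With a helper like this, `get_response` would only need `response = connection(url).request(build_get_request(url, @opts, opts))`, where `opts` is the header hash already used in the diff. Note also that the gemspec hunk declares ruby-ntlm with `add_development_dependency`, while `require 'ntlm/http'` runs at crawl time, so it may need to be a regular dependency.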