
Add NTLM support in order to parse IIS sites #52

Open
wants to merge 2 commits into from

2 participants

mrcanard Matt Kydd
mrcanard

I was trying to parse a website at work using your awesome Anemone spider and had to fight a little with the NTLM protocol to make it work.

I think other corporate users may run into the same problem, hence this pull request.

Thanks for your great work on Anemone, by the way!

Matt Kydd

I was fighting to get Anemone working with IIS using basic auth last week.

I hit the same problem with curl today, which was fixed by adding --ntlm, and I've just realised that NTLM was probably the issue I had with Anemone.
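The curl experience above is consistent with how IIS typically behaves when Windows authentication is enabled: the 401 response lists `NTLM` (often next to `Negotiate`) in its `WWW-Authenticate` headers, and plain Basic credentials are not accepted. A minimal sketch for checking this before turning on `:use_ntlm` follows. The helpers `offered_auth_schemes`, `ntlm_offered?` and `ntlm_required?` are hypothetical names, not part of Anemone or this PR, and the header parsing is a rough heuristic rather than a full challenge parser.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Anemone or this PR): extract the
# authentication scheme names from WWW-Authenticate header values.
# Rough heuristic: it does not fully parse quoted auth-params.
def offered_auth_schemes(www_authenticate_values)
  Array(www_authenticate_values)
    .flat_map { |value| value.split(',') }
    .map { |part| part.strip.split(/\s+/, 2).first.to_s }
    .reject { |token| token.empty? || token.include?('=') }
    .map(&:downcase)
    .uniq
end

# True when the server lists NTLM among its challenges.
def ntlm_offered?(www_authenticate_values)
  offered_auth_schemes(www_authenticate_values).include?('ntlm')
end

# Hypothetical probe: fetch a URL without credentials and report whether
# the 401 response offers NTLM. Requires network access to the site.
def ntlm_required?(url)
  response = Net::HTTP.get_response(URI(url))
  response.is_a?(Net::HTTPUnauthorized) &&
    ntlm_offered?(response.get_fields('WWW-Authenticate'))
end
```

For example, `ntlm_required?('http://intranet.example/')` (placeholder host) returning `true` suggests the site needs the NTLM path added in this PR rather than Basic auth.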

Commits on Apr 24, 2012
  1. mrcanard

    NTLM parameters in core.rb

    mrcanard authored
  2. mrcanard

    Add NTLM support

    mrcanard authored
Showing with 26 additions and 6 deletions.
  1. +1 −0  anemone.gemspec
  2. +9 −1 lib/anemone/core.rb
  3. +16 −5 lib/anemone/http.rb
1  anemone.gemspec
@@ -14,6 +14,7 @@ spec = Gem::Specification.new do |s|
s.add_dependency("nokogiri", ">= 1.3.0")
s.add_dependency("robotex", ">= 1.0.0")
+ s.add_dependency("ruby-ntlm", ">= 0.0.1")
s.add_development_dependency "rake", ">=0.9.2"
s.add_development_dependency "rdoc", ">=3.12"
s.add_development_dependency "rspec", ">=2.8.0"
10 lib/anemone/core.rb
@@ -55,7 +55,15 @@ class Core
# proxy server port number
:proxy_port => false,
# HTTP read timeout in seconds
- :read_timeout => nil
+ :read_timeout => nil,
+ # Use NTLM authentication?
+ :use_ntlm => false,
+ # NTLM user name
+ :ntlm_user => nil,
+ # NTLM domain name
+ :ntlm_domain => nil,
+ # NTLM password
+ :ntlm_password => nil
}
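Because `Core` creates a setter for every key in its default options (per the comment on the next line), the four new NTLM keys should also be settable from inside the crawl block without extra code. The sketch below illustrates that mechanism with a hypothetical, trimmed-down `MiniCore` class; it is an illustration under that assumption, not Anemone's actual `Core`.

```ruby
# Hypothetical stand-in for Anemone::Core, trimmed to the options touched by this PR.
class MiniCore
  DEFAULT_OPTS = {
    :read_timeout => nil,
    :use_ntlm => false,
    :ntlm_user => nil,
    :ntlm_domain => nil,
    :ntlm_password => nil
  }

  attr_reader :opts

  def initialize(opts = {})
    # Caller-supplied options override the defaults.
    @opts = DEFAULT_OPTS.merge(opts)
    # Let the caller adjust options through the generated setters.
    yield self if block_given?
  end

  # One setter per default option, so newly added keys get setters automatically.
  DEFAULT_OPTS.keys.each do |key|
    define_method "#{key}=" do |value|
      @opts[key.to_sym] = value
    end
  end
end
```

With the PR applied, the equivalent for a real crawl would be passing the keys to `Anemone.crawl(url, :use_ntlm => true, :ntlm_user => ..., :ntlm_domain => ..., :ntlm_password => ...)` or setting `anemone.use_ntlm = true` and friends inside the crawl block.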
# Create setter methods for all options to be called from the crawl block
21 lib/anemone/http.rb
@@ -132,11 +132,22 @@ def get_response(url, referer = nil)
retries = 0
begin
start = Time.now()
- # format request
- req = Net::HTTP::Get.new(full_path, opts)
- # HTTP Basic authentication
- req.basic_auth url.user, url.password if url.user
- response = connection(url).request(req)
+ req = nil
+ response = nil
+ if ! @opts[:use_ntlm]
+ # format request
+ req = Net::HTTP::Get.new(full_path, opts)
+ # HTTP Basic authentication
+ req.basic_auth url.user, url.password if url.user
+ response = connection(url).request(req)
+ else
+ require 'ntlm/http'
+ # format request
+ req = Net::HTTP::Get.new(full_path, opts)
+ # NTLM authentication
+ req.ntlm_auth(@opts[:ntlm_user], @opts[:ntlm_domain], @opts[:ntlm_password])
+ response = connection(url).request(req)
+ end
finish = Time.now()
response_time = ((finish - start) * 1000).round
@cookie_store.merge!(response['Set-Cookie']) if accept_cookies?
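Both branches of the new `if`/`else` build the same `Net::HTTP::Get` and send it the same way; only the authentication step differs. One possible tidier shape, sketched below, builds the request once and applies exactly one scheme. `build_get_request` is a hypothetical helper name; the `ntlm_auth(user, domain, password)` call is the one the PR itself uses from the ruby-ntlm gem, and it is only reached when `:use_ntlm` is set.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper, not part of Anemone or this PR: build the GET request
# once, then apply at most one authentication scheme.
def build_get_request(url, full_path, headers, opts)
  req = Net::HTTP::Get.new(full_path, headers)
  if opts[:use_ntlm]
    # Loaded lazily so ruby-ntlm is only needed when NTLM is enabled.
    require 'ntlm/http'
    # NTLM authentication, as in the PR
    req.ntlm_auth(opts[:ntlm_user], opts[:ntlm_domain], opts[:ntlm_password])
  elsif url.user
    # HTTP Basic authentication from credentials embedded in the URL
    req.basic_auth(url.user, url.password)
  end
  req
end
```

Inside `get_response` this would reduce the block to `req = build_get_request(url, full_path, opts, @opts)` followed by a single `response = connection(url).request(req)`, where `opts` holds the request headers and `@opts` the crawler options.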